Sorting
-------

Sorting data is very important in computer applications. We will look at 5 different methods of sorting arrays:

                Best     Average  Worst    Time to sort Advantages            Disadvantages
                case     case     case     10,000 ints
                                           range 1-200
                                           on Dell
Bubble Sort     O(n)     O(n^2)   O(n^2)   34.4 sec     Simple                Inefficient
(improved)
Insertion Sort  O(n)     O(n^2)   O(n^2)   6.75 sec     Fairly fast, esp. if  Lots of swaps
                                                        array is somewhat
                                                        sorted
Selection Sort  O(n^2)   O(n^2)   O(n^2)   10.0 sec     Minimal swaps         Always takes the same
                                                                              amount of time, even if
                                                                              array is somewhat sorted
Merge Sort      O(nlgn)  O(nlgn)  O(nlgn)  0.169 sec    Fairly fast           Requires auxiliary array
                                                                              (extra memory)
Quick Sort      O(nlgn)  O(nlgn)  O(n^2)   0.067 sec    Very fast in average  Very bad when array is
                                                        case; does sorting    sorted or in
                                                        in place              reverse-sorted order;
                                                                              complicated

Bubble Sort
-----------

You may be familiar with Bubble Sort. It is a very simple sort, but rather inefficient (and discredited by most "computer scientists"). However, it is easy to implement, easy to remember, and not that bad for a small amount of data: it was fine for sorting about 1500 integers in the range 1 - 200 on my Dell machine.

The basic idea, for an array with n elements, is to make n - 1 passes through the array, exchanging adjacent elements that are out of order. The smaller numbers "bubble up" to the top of the array, and the largest value is guaranteed to sink to the bottom during the first pass.

DEMO

For n elements, how many comparisons do we have to make? There are 2 nested loops that are each executed n - 1 times:

    (n - 1) * (n - 1) = n^2 - 2n + 1

As n becomes very large, the dominant term is n^2. If sorting an array with 1 element took 1 ns, sorting an array with 1000 elements would take about 1000^2 = 1,000,000 ns. So the time to sort n elements is proportional to n^2, or O(n^2).

We could decrease the time by checking whether we actually make any exchanges on a particular pass. If we don't, the array is sorted and we can stop.
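To check the comparison count, here is a small instrumented sketch. The class BubbleSortCounts and its two methods are hypothetical, written only for illustration: basicComparisons counts the comparisons made by the two-nested-loop version, and earlyExitComparisons adds the "did we make any exchanges?" check and stops after a pass with none.

```java
class BubbleSortCounts {
    // Basic bubble sort: n - 1 passes, each making n - 1 comparisons.
    // Sorts the array in place and returns the number of comparisons made.
    static long basicComparisons(int[] array) {
        long comparisons = 0;
        for (int i = 0; i < array.length - 1; i++) {
            for (int j = 0; j < array.length - 1; j++) {
                comparisons++;
                if (array[j] > array[j + 1]) {
                    int tmp = array[j];
                    array[j] = array[j + 1];
                    array[j + 1] = tmp;
                }
            }
        }
        return comparisons;
    }

    // Same sort, but stops after the first pass that makes no exchanges.
    static long earlyExitComparisons(int[] array) {
        long comparisons = 0;
        boolean exchanged = true;
        for (int i = 0; i < array.length - 1 && exchanged; i++) {
            exchanged = false;
            for (int j = 0; j < array.length - 1; j++) {
                comparisons++;
                if (array[j] > array[j + 1]) {
                    exchanged = true;
                    int tmp = array[j];
                    array[j] = array[j + 1];
                    array[j + 1] = tmp;
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] sorted = new int[n];
        for (int k = 0; k < n; k++) sorted[k] = k;
        // (n - 1) * (n - 1) = 998001 comparisons, even though nothing moves
        System.out.println(basicComparisons(sorted.clone()));
        // one pass with no exchanges: n - 1 = 999 comparisons
        System.out.println(earlyExitComparisons(sorted.clone()));
    }
}
```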
If we added this check, what kind of array would be the "best case", i.e. would take the least amount of time to run for BubbleSort?
  - A sorted array: it would require only one pass through the array, or about n comparisons.
What would be the worst case?
  - An array in descending (reverse-sorted) order: it always requires on the order of n^2 comparisons.

Here is the code for BubbleSort. You could add a check to see if any exchanges were made on a particular pass through the array and quit if there weren't any.

    public class BubbleSort {
        public static void sort(int[] array) {
            // n - 1 passes; each pass compares every adjacent pair
            for (int i = 0; i < array.length - 1; i++) {
                for (int j = 0; j < array.length - 1; j++) {
                    if (array[j] > array[j + 1]) {
                        // exchange the out-of-order neighbors
                        int tmp = array[j];
                        array[j] = array[j + 1];
                        array[j + 1] = tmp;
                    }
                }
            }
        }
    }

Improved version of BubbleSort:

    public class BubbleSort {
        public static void sort(int[] array) {
            boolean done = false;
            for (int i = 0; i < array.length - 1 && !done; i++) {
                done = true;    // assume sorted until an exchange happens
                // the last i elements are already in place, so skip them
                for (int j = 0; j < array.length - 1 - i; j++) {
                    if (array[j] > array[j + 1]) {
                        done = false;
                        int tmp = array[j];
                        array[j] = array[j + 1];
                        array[j + 1] = tmp;
                    }
                }
            }
        }
    }

Insertion Sort
--------------

Insertion sort is something like sorting a hand of playing cards from left to right: each successive card is inserted in its correct position among the cards already sorted.

DEMO

What is the best case for insertion sort, i.e. the case that requires the least number of comparisons? The worst case? The average case?
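One way to explore these questions is to count comparisons. The sketch below is an insertion sort instrumented to count how many times the element being inserted is compared with an element already in place; the names InsertionSortCounts and sortAndCount are hypothetical. It runs on sorted, reverse-sorted, and random input (values 1 - 200, as in the timings in these notes).

```java
import java.util.Random;

class InsertionSortCounts {
    // Insertion sort that sorts in place and returns how many
    // element comparisons it made.
    static long sortAndCount(int[] array) {
        long comparisons = 0;
        for (int i = 1; i < array.length; i++) {
            int toBeInserted = array[i];
            int j = i - 1;
            // Shift larger elements right until toBeInserted's slot is found.
            while (j >= 0) {
                comparisons++;
                if (array[j] <= toBeInserted) break;
                array[j + 1] = array[j];
                j--;
            }
            array[j + 1] = toBeInserted;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] ascending = new int[n];
        int[] descending = new int[n];
        int[] random = new int[n];
        Random rng = new Random(42);
        for (int k = 0; k < n; k++) {
            ascending[k] = k;
            descending[k] = n - k;
            random[k] = 1 + rng.nextInt(200); // values in 1 - 200
        }
        System.out.println("sorted:         " + sortAndCount(ascending));  // n - 1 = 999
        System.out.println("reverse sorted: " + sortAndCount(descending)); // n(n - 1)/2 = 499500
        System.out.println("random:         " + sortAndCount(random));     // roughly n^2/4
    }
}
```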
The best case (an already sorted array) is O(n), the worst case (a reverse-sorted array) is O(n^2), and the average case is O(n^2).

On my machine, sorting 3000 integers in the range 1 - 200 took about the same amount of time as sorting 1500 integers using BubbleSort.

In the worst case we make 1 + 2 + ... + (n - 1) = n(n - 1)/2 = 1/2 n^2 - 1/2 n comparisons. On random input, each element only has to move back about halfway, so the average is roughly n^2/4 comparisons. Since the basic BubbleSort makes about n^2 comparisons, this is why we could sort about twice as many integers in the same amount of time: (2n)^2 / 4 = n^2.

    public static void sort(int[] array) {
        for (int i = 1; i < array.length; i++) {
            int toBeInserted = array[i];
            int j;
            // shift larger elements one position right to open a gap
            for (j = i - 1; j >= 0 && toBeInserted < array[j]; j--)
                array[j + 1] = array[j];
            // drop the element into the gap
            array[j + 1] = toBeInserted;
        }
    }

Selection Sort
--------------

Selection sort involves finding the smallest integer in the array and exchanging it with the first integer in the array, then finding the next smallest integer and exchanging it with the second integer, and so on, until the last element is reached, which is already in the correct position by default.

DEMO

Best, worst, and average cases all require (n - 1) + (n - 2) + ... + 1 = n(n - 1)/2 comparisons, or about 1/2 n^2, so selection sort is O(n^2).

Selection sort took longer on my machine than insertion sort in the average case, probably because insertion sort ends a pass as soon as the inserted element reaches its place, while selection sort always scans the entire unsorted part of the array. For 10,000 integers: about 8 sec for insertion sort vs. 10 sec for selection sort (and 45 sec for bubble sort).

    public static void sort(int[] array) {
        for (int i = 0; i < array.length - 1; i++) {
            // find the smallest element in array[i..length-1]
            int min = array[i];
            int minPos = i;
            for (int j = i + 1; j < array.length; j++) {
                if (array[j] < min) {
                    min = array[j];
                    minPos = j;
                }
            }
            // exchange it with array[i] (at most one swap per pass)
            if (minPos != i) {
                int tmp = array[i];
                array[i] = min;
                array[minPos] = tmp;
            }
        }
    }

Merge Sort
----------

Merge sort is faster than any of the sorts we have looked at so far. It works by successively cutting the array in half until each piece has only one element; then the pieces are merged.

DEMO

                     8
               4           4           8
            2     2     2     2        8     3 levels
           1 1   1 1   1 1   1 1       8

Each of the 3 levels merges all 8 elements, which takes about 8 comparisons per level: 8 * 3 = 24 comparisons for 8 elements. How is 3 related to 8?
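To explore the question, the sketch below counts how many times an array of n elements can be halved before reaching single elements, and also counts the comparisons an array-based merge sort actually makes. The class MergeSortLevels and its methods are hypothetical, written for illustration; unlike the merge sort code in these notes, it reuses one auxiliary array instead of allocating a new array per merge. Running it for n = 8, 16, 32, ... and watching how the level count grows as n doubles suggests the answer.

```java
import java.util.Random;

class MergeSortLevels {
    // Number of times n elements must be halved to reach single-element
    // arrays. The larger half has n - n / 2 elements, so follow that one.
    static int levels(int n) {
        if (n <= 1) return 0;
        return 1 + levels(n - n / 2);
    }

    // Sorts array[first..last] (inclusive), using temp as the auxiliary
    // merge array, and returns the number of element comparisons made.
    static long mergeSort(int[] array, int[] temp, int first, int last) {
        if (first >= last) return 0;
        int mid = first + (last - first) / 2;
        long comparisons = mergeSort(array, temp, first, mid)
                + mergeSort(array, temp, mid + 1, last);
        int i = first, j = mid + 1, k = first;
        while (i <= mid && j <= last) {
            comparisons++;
            if (array[i] <= array[j]) temp[k++] = array[i++];
            else temp[k++] = array[j++];
        }
        while (i <= mid) temp[k++] = array[i++];   // leftovers from first half
        while (j <= last) temp[k++] = array[j++];  // leftovers from second half
        System.arraycopy(temp, first, array, first, last - first + 1);
        return comparisons;
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        for (int n = 8; n <= 1024; n *= 2) {
            int[] data = new int[n];
            for (int k = 0; k < n; k++) data[k] = 1 + rng.nextInt(200);
            long comparisons = mergeSort(data, new int[n], 0, n - 1);
            int lv = levels(n);
            System.out.println("n = " + n + ": " + lv + " levels, n * levels = "
                    + (long) n * lv + ", comparisons made = " + comparisons);
        }
    }
}
```

Each level merges all n elements using fewer than n comparisons, so the count actually made stays at or below n times the number of levels.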
3 is log2(8), because 2^3 = 8. How about 16? log2(16) = 4, so sorting 16 elements requires about 16 * 4 = 64 comparisons.

MergeSort is O(nlgn), a big improvement over O(n^2) when n is very large. Assuming each comparison takes 1 ns:

    n        Selection sort (n^2)         Merge sort (n log2 n)
    ----     --------------------         ---------------------
    10       10^2 = 100 ns                10 * log2(10) ~ 33 ns
    1024     1024^2 ~ 1,000,000 ns        1024 * log2(1024) = 1024 * 10 = 10,240 ns

That is a big saving with only about 1000 elements; with 10,000 elements, merge sort took about 1 sec on my machine.

The problem with merge sort is that it requires an extra array in which to merge the smaller arrays. The code here is even more wasteful of memory (it allocates a new array for every merge), but the coding is fairly straightforward. There are better routines that use less memory.

Merge sort is an excellent use of recursion: it would be very tedious to keep track of all of the halved arrays that need to be merged to create the final sorted array.

    public static void sort(int[] array) {
        if (array.length < 2)
            return;    // 0 or 1 elements: already sorted (avoids mergeSort(array, 0, -1))
        // Copy the sorted array back into the original array when merge sort is complete
        System.arraycopy(mergeSort(array, 0, array.length - 1), 0, array, 0, array.length);
    }

    // Returns a new array holding array[first..last] in sorted order.
    private static int[] mergeSort(int[] array, int first, int last) {
        int[] newArray;
        if (first == last) {
            // a single element is already sorted
            newArray = new int[1];
            newArray[0] = array[first];
        } else {
            int mid = (last - first) / 2 + first;
            int[] firstHalf = mergeSort(array, first, mid);
            int[] lastHalf = mergeSort(array, mid + 1, last);
            newArray = new int[last - first + 1];
            int i, j, k;
            // merge: repeatedly take the smaller front element of the two halves
            // (<= takes from firstHalf on ties, which keeps the sort stable)
            for (i = 0, j = 0, k = 0; i < newArray.length && j < firstHalf.length && k < lastHalf.length; i++) {
                if (firstHalf[j] <= lastHalf[k])
                    newArray[i] = firstHalf[j++];
                else
                    newArray[i] = lastHalf[k++];
            }
            // copy whatever is left in the half that is not used up yet
            if (j < firstHalf.length)
                for ( ; i < newArray.length; i++, j++)
                    newArray[i] = firstHalf[j];
            else
                for ( ; i < newArray.length; i++, k++)
                    newArray[i] = lastHalf[k];
        }
        return newArray;
    }

Quicksort
---------

Quicksort was invented by C.A.R. Hoare, who described it in a 1962 paper.
It is usually the fastest general-purpose in-memory sorting algorithm in the average case. It works by partitioning the array around a "pivot" value so that one part of the array contains values less than or equal to the pivot and the other part contains values greater than or equal to the pivot. This is done recursively on each part until the array is partitioned into n pieces containing 1 element each. At this point the array is sorted.

DEMO

The best case for Quicksort is when the partitions are always the same size. In the best case and the average case, the running time of Quicksort is proportional to nlgn, just like merge sort.

The worst case for this version of Quicksort, which uses the first element as the pivot, is when the original array is in sorted or reverse-sorted order. Then every partition puts just 1 element in one part (the first or the last) and all the remaining elements in the other, so there are about n levels of recursion instead of lg n, and the running time becomes O(n^2).

    public class Quicksort {
        public static void sort(int[] array) {
            quicksort(array, 0, array.length - 1);
        }

        // Sorts array[first..last] (inclusive).
        private static void quicksort(int[] array, int first, int last) {
            if (first < last) {
                int mid = partition(array, first, last);
                quicksort(array, first, mid);       // values <= pivot
                quicksort(array, mid + 1, last);    // values >= pivot
            }
        }

        // Hoare partition with array[first] as the pivot. Rearranges
        // array[first..last] and returns j such that every element of
        // array[first..j] is <= every element of array[j+1..last].
        private static int partition(int[] array, int first, int last) {
            int x = array[first];   // pivot
            int i = first - 1;
            int j = last + 1;
            while (true) {
                do { j--; } while (array[j] > x);   // scan left for an element <= pivot
                do { i++; } while (array[i] < x);   // scan right for an element >= pivot
                if (i < j) {
                    int tmp = array[i];
                    array[i] = array[j];
                    array[j] = tmp;
                } else {
                    return j;
                }
            }
        }
    }
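The claim that sorted input is the worst case for this version of Quicksort is easy to check. The sketch below reuses the same first-element-pivot partition and reports how deep the recursion goes; the class name QuicksortDepth and the extra depth parameter are hypothetical additions for illustration. On an already sorted array the depth is n - 1, one level per element, while on a shuffled array it stays small (on the order of lg n).

```java
import java.util.Random;

class QuicksortDepth {
    // Sorts array[first..last] (inclusive) and returns the deepest
    // recursion level reached; depth is the level of this call.
    static int sort(int[] array, int first, int last, int depth) {
        if (first >= last) return depth;
        int mid = partition(array, first, last);
        return Math.max(sort(array, first, mid, depth + 1),
                        sort(array, mid + 1, last, depth + 1));
    }

    // Same Hoare partition as the Quicksort code: pivot = array[first].
    static int partition(int[] array, int first, int last) {
        int x = array[first];
        int i = first - 1;
        int j = last + 1;
        while (true) {
            do { j--; } while (array[j] > x);
            do { i++; } while (array[i] < x);
            if (i < j) {
                int tmp = array[i];
                array[i] = array[j];
                array[j] = tmp;
            } else {
                return j;
            }
        }
    }

    public static void main(String[] args) {
        int n = 2000;
        int[] sorted = new int[n];
        int[] shuffled = new int[n];
        for (int k = 0; k < n; k++) {
            sorted[k] = k;
            shuffled[k] = k;
        }
        // Fisher-Yates shuffle of the second copy
        Random rng = new Random(7);
        for (int k = n - 1; k > 0; k--) {
            int r = rng.nextInt(k + 1);
            int tmp = shuffled[k];
            shuffled[k] = shuffled[r];
            shuffled[r] = tmp;
        }
        System.out.println("sorted input depth:   " + sort(sorted, 0, n - 1, 0)); // n - 1 = 1999
        System.out.println("shuffled input depth: " + sort(shuffled, 0, n - 1, 0));
    }
}
```

The deep recursion on sorted input is also a practical concern in Java: on a large enough sorted array, this version of Quicksort can throw a StackOverflowError before it ever gets to be slow.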