                           Sorting
                           -------

Sorting data is one of the most common operations in computer 
applications.

We will look at five different methods of sorting arrays:

Times are for sorting 10000 ints in the range 1 - 200 on a Dell.

Sort        Best     Average  Worst    Time      Advantages         Disadvantages
            Case     Case     Case
----------  -------  -------  -------  --------  -----------------  ---------------------
Bubble      O(n)     O(n^2)   O(n^2)   34.4 sec  Simple             Inefficient
(improved)

Insertion   O(n)     O(n^2)   O(n^2)   6.75 sec  Fairly fast, esp.  Lots of swaps
                                                 if array is
                                                 somewhat sorted

Selection   O(n^2)   O(n^2)   O(n^2)   10.0 sec  Minimal swaps      Always takes the same
                                                                    amt of time even if
                                                                    array is somewhat
                                                                    sorted

Merge       O(nlgn)  O(nlgn)  O(nlgn)  .169 sec  Fairly fast        Requires auxiliary
                                                                    array (extra memory)

Quick       O(nlgn)  O(nlgn)  O(n^2)   .067 sec  Very fast in       Very bad when array
                                                 average case.      is sorted or in
                                                 Does sorting       reverse-sorted order.
                                                 in place           Complicated

                   Bubble Sort
		   -----------
		   
You may be familiar with Bubble Sort. It is a very simple type of 
sort, but rather inefficient (and discredited by most 
"computer scientists"). However, it is easy to implement, easy to 
remember, and not that bad for a small amount of data. On my Dell 
machine it was fine for sorting about 1500 integers in the range 
1 - 200.

The basic idea for an array with n elements is to make n - 1 passes 
through the array exchanging adjacent elements that are out of order. 
The smaller numbers "bubble up" to the top of the array. The largest 
value is guaranteed to sink to the bottom during the first pass.

DEMO

For n elements, how many comparisons do we have to make?

2 nested loops that are each executed n - 1 times:

(n - 1) * (n - 1) = n^2 - 2n + 1

As n becomes very large, the dominant term is n^2.

If an array with 1 element takes 1 ns to sort,

an array with 1000 elements takes 1000000 ns to sort.

So the amount of time to sort n elements is proportional to n^2 
or O(n^2)
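
The count above can be checked by instrumenting the two nested loops. 
This is only a sketch: the class name BubbleCount and the method 
countComparisons are made up for illustration and are not part of 
these notes.

```java
// Hypothetical helper: runs the basic bubble sort loops and counts comparisons.
class BubbleCount {

  // Sorts the array in place and returns the number of comparisons made.
  static long countComparisons(int[] array) {
    long comparisons = 0;
    for (int i = 0; i < array.length - 1; i++) {
      for (int j = 0; j < array.length - 1; j++) {
        comparisons++;
        if (array[j] > array[j + 1]) {
          int tmp = array[j];
          array[j] = array[j + 1];
          array[j + 1] = tmp;
        }
      }
    }
    return comparisons;
  }

  public static void main(String[] args) {
    for (int n : new int[] {10, 100, 1000}) {
      int[] data = new int[n];
      for (int k = 0; k < n; k++) data[k] = n - k;   // descending input
      // Prints 81, 9801 and 998001 comparisons: (n - 1) * (n - 1) each time
      System.out.println(n + " elements: " + countComparisons(data) + " comparisons");
    }
  }
}
```

Multiplying n by 10 multiplies the count by about 100, which is what 
"proportional to n^2" means in practice.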

We could actually decrease the time by checking whether we make any 
exchanges on a particular pass. If we don't, the array is sorted and 
we can stop.

If we did this, what kind of array would be the "best case", i.e. 
would take the least amount of time to run for BubbleSort? A sorted 
array: it would only require one pass through the array, or about n 
comparisons.

What would be the worst case? An array in descending sorted order: 
it would still require on the order of n^2 comparisons.
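
To see the best and worst cases side by side, here is a rough sketch 
that counts comparisons for a bubble sort that stops early. The class 
name BubbleEarlyExit and its method are invented for this example. 
Like the improved version given later in these notes, it also 
shortens each pass by one element, so its worst case is about 
1/2 n^2 comparisons, which is still proportional to n^2.

```java
// Hypothetical helper: bubble sort with an early exit, counting comparisons.
class BubbleEarlyExit {

  // Sorts the array in place and returns the number of comparisons made.
  static long countComparisons(int[] array) {
    long comparisons = 0;
    boolean done = false;
    for (int i = 0; i < array.length - 1 && !done; i++) {
      done = true;                       // assume sorted until an exchange happens
      for (int j = 0; j < array.length - 1 - i; j++) {
        comparisons++;
        if (array[j] > array[j + 1]) {
          done = false;
          int tmp = array[j];
          array[j] = array[j + 1];
          array[j + 1] = tmp;
        }
      }
    }
    return comparisons;
  }

  public static void main(String[] args) {
    int n = 1000;
    int[] sorted = new int[n];
    int[] descending = new int[n];
    for (int k = 0; k < n; k++) {
      sorted[k] = k;
      descending[k] = n - k;
    }
    System.out.println("sorted:     " + countComparisons(sorted));      // 999
    System.out.println("descending: " + countComparisons(descending));  // 499500
  }
}
```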

Here is the code for BubbleSort - you could add a check to see 
if any exchanges were made on a particular pass through the 
array and quit if there weren't any.

public class BubbleSort {

  public static void sort(int[] array) {

    // Make n - 1 passes through the array
    for (int i = 0; i < array.length - 1; i++) {

      // Exchange adjacent elements that are out of order
      for (int j = 0; j < array.length - 1; j++) {

        if (array[j] > array[j + 1]) {

          int tmp = array[j];
          array[j] = array[j + 1];
          array[j + 1] = tmp;
        }
      }
    }
  }
}


Improved version of BubbleSort:

public class BubbleSort {

  public static void sort(int[] array) {

    boolean done = false;

    // Stop as soon as a pass makes no exchanges
    for (int i = 0; i < array.length - 1 && !done; i++) {

      done = true;

      // The last i elements are already in their final positions
      for (int j = 0; j < array.length - 1 - i; j++) {

        if (array[j] > array[j + 1]) {

          done = false;

          int tmp = array[j];
          array[j] = array[j + 1];
          array[j + 1] = tmp;
        }
      }
    }
  }
}

                     Insertion Sort
		     --------------
		     
Insertion sort is something like sorting a hand of playing cards from 
left to right. Each successive card is inserted in the correct 
position.

DEMO

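What is the best case for insertion sort, the case that requires the 
fewest comparisons? The worst case? The average case?

best O(n)  worst O(n^2) average O(n^2)

On my machine, sorting 3000 integers in the range 1 - 200 took about 
the same amount of time as sorting 1500 integers using BubbleSort.

In the worst case we are actually doing

1 + 2 + ... + (n - 1) comparisons, which is equal to 
n(n - 1)/2 = 1/2 n^2 - 1/2 n

On average each element only has to move about halfway back through 
the sorted part, so we do about 1/4 n^2 comparisons, roughly a 
quarter of the (n - 1)^2 done by the basic BubbleSort. Doubling n 
multiplies n^2 by 4, which is consistent with being able to sort 
twice as many integers as BubbleSort in the same amount of time.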
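
The best, worst, and average cases can be compared by counting 
comparisons. This is a sketch under stated assumptions: the class 
name InsertionCount, the method name, and the random seed are all 
invented here. For these loops the worst case works out to exactly 
n(n - 1)/2 comparisons and the best case to n - 1.

```java
import java.util.Random;

// Hypothetical helper: insertion sort that counts element comparisons.
class InsertionCount {

  // Sorts the array in place and returns the number of comparisons made.
  static long countComparisons(int[] array) {
    long comparisons = 0;
    for (int i = 1; i < array.length; i++) {
      int toBeInserted = array[i];
      int j = i - 1;
      while (j >= 0) {
        comparisons++;                        // compare with one sorted element
        if (toBeInserted >= array[j]) break;  // insertion point found
        array[j + 1] = array[j];              // shift the larger element right
        j--;
      }
      array[j + 1] = toBeInserted;
    }
    return comparisons;
  }

  public static void main(String[] args) {
    int n = 1000;
    int[] sorted = new int[n];
    int[] descending = new int[n];
    int[] random = new int[n];
    Random generator = new Random(1);          // arbitrary seed
    for (int k = 0; k < n; k++) {
      sorted[k] = k;
      descending[k] = n - k;
      random[k] = 1 + generator.nextInt(200);  // values in the range 1 - 200
    }
    System.out.println("sorted:     " + countComparisons(sorted));      // 999
    System.out.println("descending: " + countComparisons(descending));  // 499500
    System.out.println("random:     " + countComparisons(random));
  }
}
```

The count for the random array should land somewhere near 1/4 n^2, 
between the other two.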

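public class InsertionSort {

  public static void sort(int[] array) {

    // array[0..i-1] is already sorted; insert array[i] into it
    for (int i = 1; i < array.length; i++) {

      int toBeInserted = array[i];

      int j;

      // Shift larger elements one position to the right
      for (j = i - 1; j >= 0 && toBeInserted < array[j]; j--)

        array[j + 1] = array[j];

      array[j + 1] = toBeInserted;
    }
  }
}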

                       Selection Sort
		       --------------
		       
Selection sort involves finding the smallest integer in the array and 
exchanging it with the first integer in the array, then finding the 
next smallest integer in the array and exchanging it with the second 
integer in the array, etc. until the last element is reached
which is already in the correct position by default.

DEMO

Best, worst, and average cases all require n(n - 1)/2 comparisons or 
about 1/2 n^2

O(n^2)
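
A small experiment shows both properties: the number of comparisons 
does not depend on the order of the input, and the number of 
exchanges is never more than n - 1. This is a sketch only. The class 
name SelectionCount and the method sortAndCount are invented for 
illustration. For these loops the comparison count is exactly 
n(n - 1)/2, which is about 1/2 n^2.

```java
// Hypothetical helper: selection sort that counts comparisons and exchanges.
class SelectionCount {

  // Sorts the array in place and returns {comparisons, exchanges}.
  static long[] sortAndCount(int[] array) {
    long comparisons = 0;
    long exchanges = 0;
    for (int i = 0; i < array.length - 1; i++) {
      int minPos = i;
      for (int j = i + 1; j < array.length; j++) {
        comparisons++;
        if (array[j] < array[minPos]) minPos = j;
      }
      if (minPos != i) {
        int tmp = array[i];
        array[i] = array[minPos];
        array[minPos] = tmp;
        exchanges++;
      }
    }
    return new long[] {comparisons, exchanges};
  }

  public static void main(String[] args) {
    int n = 1000;
    int[] sorted = new int[n];
    int[] descending = new int[n];
    for (int k = 0; k < n; k++) {
      sorted[k] = k;
      descending[k] = n - k;
    }
    long[] a = sortAndCount(sorted);
    long[] b = sortAndCount(descending);
    // Both lines report 499500 comparisons; only the exchange counts differ
    System.out.println("sorted:     " + a[0] + " comparisons, " + a[1] + " exchanges");
    System.out.println("descending: " + b[0] + " comparisons, " + b[1] + " exchanges");
  }
}
```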

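Selection sort took longer on my machine than insertion sort in the 
average case, probably because an insertion sort pass ends as soon 
as the inserted element is in place, while selection sort always 
scans the rest of the array. Insertion sort took 8 sec for 10000 
integers vs. 10 sec for 10000 integers with selection sort 
(and 45 sec for bubble sort).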

public class SelectionSort {

  public static void sort(int[] array) {

    for (int i = 0; i < array.length - 1; i++) {

      // Find the smallest value in array[i..n-1]
      int min = array[i];

      int minPos = i;

      for (int j = i + 1; j < array.length; j++)

        if (array[j] < min) {

          min = array[j];

          minPos = j;
        }

      // Exchange it with array[i] unless it is already there
      if (minPos != i) {

        int tmp = array[i];

        array[i] = min;

        array[minPos] = tmp;
      }
    }
  }
}

                        Merge Sort
			----------
			
Merge sort is a faster sort than any of the ones we have looked at so
far. It involves successively cutting the array in half until each 
array has only one element - then the arrays are merged.

DEMO

                                                comparisons
                                                per level

                   8

           4               4                         8

       2       2       2       2                     8        3 levels

     1   1   1   1   1   1   1   1                   8


8 comparisons per level * 3 levels = 24 comparisons for 8 elements

How is 3 related to 8?

log 2 (8)   =   3       2^3 = 8

How about 16?

log 2 (16) =  4   to sort 16 elements requires 16 * 4 comparisons = 64
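
The n * log2(n) figure is an upper estimate: a merge stops comparing 
as soon as one half runs out, so the real count is usually a bit 
lower. The sketch below counts the comparisons made by a simple 
top-down merge sort. The class name MergeCount and its methods are 
hypothetical, and it uses Arrays.copyOfRange to split the array 
rather than the index-based code given later in these notes.

```java
import java.util.Arrays;

// Hypothetical helper: top-down merge sort that counts comparisons.
class MergeCount {

  static long comparisons;   // running total for the current sort

  // Returns a sorted array with the same elements (the input is not modified).
  static int[] mergeSort(int[] array) {
    if (array.length <= 1) return array;
    int mid = array.length / 2;
    int[] left = mergeSort(Arrays.copyOfRange(array, 0, mid));
    int[] right = mergeSort(Arrays.copyOfRange(array, mid, array.length));
    int[] merged = new int[array.length];
    int i = 0, j = 0, k = 0;
    while (j < left.length && k < right.length) {
      comparisons++;
      merged[i++] = (left[j] <= right[k]) ? left[j++] : right[k++];
    }
    while (j < left.length) merged[i++] = left[j++];    // leftovers, no comparisons
    while (k < right.length) merged[i++] = right[k++];
    return merged;
  }

  static long countComparisons(int[] array) {
    comparisons = 0;
    mergeSort(array);
    return comparisons;
  }

  public static void main(String[] args) {
    int[] data = {8, 7, 6, 5, 4, 3, 2, 1};
    int levels = Integer.numberOfTrailingZeros(data.length);   // log2(8) = 3
    System.out.println("estimate: " + data.length * levels);   // 24
    System.out.println("actual:   " + countComparisons(data)); // 12
  }
}
```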


MergeSort - O(nlgn)  big improvement over n^2 when n is very large.
Ignoring constant factors and counting each step as 1 ns:

Selection sort                        Merge sort
--------------                        ----------

?      / 1024 elements                ?     / 1024 elements

(1024)^2 ~ 1,000,000 ns               1024 log2 (1024) = 1024 * 10 = 10,240 ns

Big savings with just 1000 elements. Sorting 10000 elements with 
merge sort took about 1 sec on my machine.
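
The same arithmetic can be tabulated for other sizes. A minimal 
sketch follows. The class name GrowthTable and its methods are made 
up, and it assumes n is a power of 2 so that log2(n) is a whole 
number.

```java
// Hypothetical helper: compares n^2 with n * log2(n) for powers of 2.
class GrowthTable {

  static long quadratic(long n) {
    return n * n;
  }

  static long linearithmic(long n) {
    int lg = 63 - Long.numberOfLeadingZeros(n);   // floor(log2(n)) for n >= 1
    return n * lg;
  }

  public static void main(String[] args) {
    for (long n : new long[] {16, 1024, 1048576}) {
      // For n = 1024 this prints 1048576 and 10240
      System.out.println(n + ": n^2 = " + quadratic(n)
                           + ", n log2 n = " + linearithmic(n));
    }
  }
}
```

The gap widens quickly: each time n doubles, n^2 is multiplied by 4 
while n log2 n only a little more than doubles.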

The problem with merge sort is that it requires an extra array in 
which to merge the smaller arrays.

The code here is even more wasteful of memory, because every 
recursive call creates a new array, but the coding is fairly 
straightforward. There are better routines that use less memory. 
Merge sort is an excellent use of recursion - it would be very 
tedious to keep track of all of the halved arrays that needed to be 
merged to create the final sorted array.

public class MergeSort {

  public static void sort(int[] array) {

    // An empty or one-element array is already sorted
    if (array.length < 2)
      return;

    //Copy sorted array to original array when merge sort is complete
    System.arraycopy
    (mergeSort(array, 0, array.length - 1), 0, array, 0, array.length);
  }

  // Returns a new sorted array holding the values of array[first..last]
  private static int[] mergeSort(int[] array, int first, int last) {

    int[] newArray;
    if (first == last) {
      // A single element is already sorted
      newArray = new int[1];
      newArray[0] = array[first];
    }
    else {
      int mid = (last - first) / 2 + first;

      int[] firstHalf = mergeSort(array, first, mid);

      int[] lastHalf = mergeSort(array, mid + 1, last);

      // Merge the two sorted halves until one of them runs out
      newArray = new int[last - first + 1];
      int i, j, k;
      for (i = 0, j = 0, k = 0; i < newArray.length &&
                                j < firstHalf.length &&
                                k < lastHalf.length; i++) {
        if (firstHalf[j] < lastHalf[k])
          newArray[i] = firstHalf[j++];
        else
          newArray[i] = lastHalf[k++];
      }
      // Copy whatever is left of the half that did not run out
      if (j < firstHalf.length)
        for ( ; i < newArray.length; i++, j++)
          newArray[i] = firstHalf[j];
      else
        for ( ; i < newArray.length; i++, k++)
          newArray[i] = lastHalf[k];
    }
    //System.out.println(MergeSort.toString(newArray));
    return newArray;
  }
}
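
The notes above mention that there are routines that use less memory. 
One common approach, sketched here as an illustration and not as the 
version used for the timings, allocates a single auxiliary array once 
and reuses it for every merge. The class name MergeSortOneBuffer and 
the parameter name buffer are hypothetical.

```java
import java.util.Arrays;

// Hypothetical variant: merge sort that reuses one auxiliary array.
class MergeSortOneBuffer {

  static void sort(int[] array) {
    if (array.length < 2) return;
    int[] buffer = new int[array.length];   // the only extra array
    mergeSort(array, buffer, 0, array.length - 1);
  }

  // Sorts array[first..last] in place, using buffer as scratch space.
  private static void mergeSort(int[] array, int[] buffer, int first, int last) {
    if (first >= last) return;
    int mid = first + (last - first) / 2;
    mergeSort(array, buffer, first, mid);
    mergeSort(array, buffer, mid + 1, last);

    // Copy both sorted halves to the buffer, then merge them back
    System.arraycopy(array, first, buffer, first, last - first + 1);
    int j = first;       // next unused element of the first half
    int k = mid + 1;     // next unused element of the last half
    for (int i = first; i <= last; i++) {
      if (k > last || (j <= mid && buffer[j] <= buffer[k]))
        array[i] = buffer[j++];
      else
        array[i] = buffer[k++];
    }
  }

  public static void main(String[] args) {
    int[] data = {5, 2, 9, 1, 5, 6, 0};
    sort(data);
    System.out.println(Arrays.toString(data));   // [0, 1, 2, 5, 5, 6, 9]
  }
}
```

This still needs extra memory for n elements, but only one array of 
that size instead of a new array at every level of the recursion.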

                          Quicksort
			  ---------
			  
Quicksort was invented by C.A.R. Hoare, whose paper describing it 
appeared in 1962. In practice it is usually one of the fastest 
general purpose in-memory sorting algorithms in the average case.

It works by partitioning the array around a "pivot" value so that 
one part of the array contains the values less than or equal to the 
pivot and the other part contains the values greater than or equal 
to the pivot. This is done recursively until the array is 
partitioned into n arrays containing 1 element each. At this point 
the array is sorted.

DEMO

The best case for Quicksort is when the two partitions are always 
the same size. For the best case or average case, the running time 
for Quicksort is proportional to nlgn, just like merge sort. When 
the first element is used as the pivot, as in the code below, the 
worst case for Quicksort is when the original array is either in 
sorted order or reverse sorted order. That results in partitions 
such that the first (or last) partition always contains 1 element 
and the other partition contains the rest of the elements, so there 
are about n levels of partitioning instead of lgn and the running 
time is proportional to n^2.
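
The worst case shows up clearly if we record how deep the recursion 
goes. The sketch below uses the same kind of first-element-pivot 
partition as the code in these notes. The class name QuicksortDepth, 
the method sortAndMeasure, and the random seed are invented for this 
example.

```java
import java.util.Random;

// Hypothetical helper: quicksort that records its maximum recursion depth.
class QuicksortDepth {

  static int maxDepth;

  // Sorts the array in place and returns the deepest level of partitioning.
  static int sortAndMeasure(int[] array) {
    maxDepth = 0;
    quicksort(array, 0, array.length - 1, 1);
    return maxDepth;
  }

  private static void quicksort(int[] array, int first, int last, int depth) {
    if (first < last) {
      maxDepth = Math.max(maxDepth, depth);
      int mid = partition(array, first, last);
      quicksort(array, first, mid, depth + 1);
      quicksort(array, mid + 1, last, depth + 1);
    }
  }

  // Partition around the first element; returns the end of the first part.
  private static int partition(int[] array, int first, int last) {
    int x = array[first];
    int i = first - 1;
    int j = last + 1;
    while (true) {
      do { j--; } while (array[j] > x);
      do { i++; } while (array[i] < x);
      if (i < j) {
        int tmp = array[i];
        array[i] = array[j];
        array[j] = tmp;
      } else {
        return j;
      }
    }
  }

  public static void main(String[] args) {
    int n = 1000;
    int[] sorted = new int[n];
    int[] shuffled = new int[n];
    for (int k = 0; k < n; k++) {
      sorted[k] = k;
      shuffled[k] = k;
    }
    Random random = new Random(1);            // arbitrary seed
    for (int k = n - 1; k > 0; k--) {         // Fisher-Yates shuffle
      int r = random.nextInt(k + 1);
      int tmp = shuffled[k];
      shuffled[k] = shuffled[r];
      shuffled[r] = tmp;
    }
    System.out.println("sorted input:   depth " + sortAndMeasure(sorted));   // 999
    System.out.println("shuffled input: depth " + sortAndMeasure(shuffled));
  }
}
```

Perfectly even partitions of 1000 elements would need a depth of 
only about lg(1000), or roughly 10. The shuffled input should come 
out much closer to that than to 999.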

public class Quicksort {

  public static void sort(int[] array) {

    quicksort(array, 0, array.length - 1);
  }

  // Sorts array[first..last] in place
  private static void quicksort(int[] array, int first, int last) {

    if (first < last) {

      //System.err.println(first + " " + last + " " + toString(array));
      int mid = partition(array, first, last);

      quicksort(array, first, mid);

      quicksort(array, mid + 1, last);
    }
  }

  // Rearranges array[first..last] around the pivot array[first] and
  // returns j such that every value in array[first..j] is <= every
  // value in array[j+1..last]
  private static int partition(int[] array, int first, int last) {

    int x = array[first];

    int i = first - 1;

    int j = last + 1;

    while (true) {

      // Move j left to a value that belongs in the first part
      do {
        j--;
      } while (array[j] > x);

      // Move i right to a value that belongs in the last part
      do {
        i++;
      } while (array[i] < x);

      if (i < j) {

        int tmp = array[i];
        array[i] = array[j];
        array[j] = tmp;
      }

      else

        return j;
    }
  }
}