Divide and conquer algorithms

The next set of tutorials covers the more advanced sorting algorithms merge sort and quicksort, which have lower time complexities than selection sort and insertion sort.

The Khan Academy page has a great table summarising the time complexities, which I will reuse here for reference.

[Khan Academy's table of sorting algorithm time complexities]

Both merge sort and quicksort are divide and conquer algorithms, and include the elements of recursion that we saw in the previous exercises.

Merge sort

To keep notation simple and consistent with the tutorial, I will denote the subarray sub to be a subset of a length n array where sub = array[p:r+1] for indices p and r.

Merge sort is implemented by:

  1. Divide the array into two equal length subarrays
  2. Conquer by recursively sorting the subarrays
  3. Combine by merging the two sorted subarrays back into a single sorted array

Here, the base case is where sub is an array of fewer than 2 elements, as that is already sorted.

The Divide step is simple, and the Conquer step recursively reduces to the base case. It is the Combine step that has the most complexity, and that is where we will focus first.

Note: Khan Academy asks you to implement the rest of the merge sort algorithm first, using their already-built merge function. I feel this is trivial and also lacks intuition. We gain nothing by implementing the easy steps, especially if we have no understanding of the most critical and complex step and what it returns.

Linear-time merging

We want to build a function merge that combines two sorted subarrays, array[p:q+1] and array[q+1:r+1] into a single subarray array[p:r+1].

In order to merge them, we will need to copy the elements into temporary arrays so that when we overwrite the original positions, we still have the original elements recorded somewhere. Note that in Python (at least in version 3.6), when we assign a slice of a list to a variable, it copies the elements into a new list (a shallow copy). This is not the same behaviour as when you assign a list name to a new variable, which only creates a second name for the same list! Let's call these copies lohalf and hihalf.
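A quick illustration of that copying behaviour (my own check, not part of the tutorial): slicing gives us an independent list we can safely read from while we overwrite the original positions, whereas plain assignment only creates a second name for the same list.

In [ ]:
original = [5, 3, 8, 1]
alias = original      #just another name for the same list object
copy = original[0:2]  #slicing creates a new list containing [5, 3]

original[0] = 99
print(alias)  #[99, 3, 8, 1] -- the alias sees the change
print(copy)   #[5, 3] -- the sliced copy is unaffected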

The next consideration is in what order to recombine the two subarrays. As both subarrays are sorted, we only need to compare the left most elements that are yet to be allocated.

In other words, if all elements smaller than lohalf[i] and hihalf[j] have been allocated and sorted into array, then for each index k in array,

  1. Take the smaller of lohalf[i] and hihalf[j],
  2. Assign it to array[k],
  3. Increment i by 1 if lohalf[i] was smaller than hihalf[j], increment j otherwise,
  4. Increment k by 1,
  5. Once all of either lohalf or hihalf has been allocated, allocate the rest of the other into array.

This is called linear-time merging because we only examine and allocate each element in array once, and so for a length $n$ array, this should take linear time, or $O(n)$.

Note: Overthinking! Continuing the knowledge gained in the previous topic about Python lists behaving like a stack, there is a way to implement this algorithm using lohalf and hihalf as reverse stacks, or queues. This would save a line in our code for incrementing the indices, but we would still need an exit condition for our while loop. I have implemented the simpler solution below for readability; whether using the arrays as stacks is more efficient is beyond the scope of this exercise, and I leave it to the reader to consider.

In [38]:
def merge(array, p, q, r):
    """
    Merge two sorted subarrays array[p:q+1] and array[q+1:r+1] such that
    the merged subarray array[p:r+1] is sorted.
    """
    lohalf = array[p:q+1]
    hihalf = array[q+1:r+1]
    i = 0
    j = 0
    k = p
    #compare the front unallocated element of each half and copy the smaller into array
    while i <= q-p and j <= r-q-1:
        if lohalf[i] < hihalf[j]:
            array[k] = lohalf[i]
            i += 1
        else:
            array[k] = hihalf[j]
            j += 1
        k += 1
    #one half is now fully allocated; copy the remainder of the other half across
    if i > q-p: #lohalf fully allocated
        array[k:r+1] = hihalf[j:]
    else: #hihalf fully allocated
        array[k:r+1] = lohalf[i:]

test1 = [2,3,5,7,11]           
test2 = [1,3,5,7,9]
test = test2+test1
merge(test,0,(len(test)-1)//2,len(test)-1)
test
Out[38]:
[1, 2, 3, 3, 5, 5, 7, 7, 9, 11]
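For comparison, here is a rough sketch of the stack-style variant mentioned in the note above, where the two halves are reversed so that the smallest unallocated element of each sits at the end and can simply be popped off. This is my own illustration (the name merge_stacks is mine), not part of the tutorial, and it behaves the same as the merge above:

In [ ]:
def merge_stacks(array, p, q, r):
    """
    Merge sorted subarrays array[p:q+1] and array[q+1:r+1] by popping
    from reversed copies of each half.
    """
    lohalf = array[p:q+1][::-1]   #reversed, so the smallest element is last
    hihalf = array[q+1:r+1][::-1]
    for k in range(p, r+1):
        #pop from whichever half has the smaller front element;
        #an exhausted half forces us to take from the other
        if lohalf and (not hihalf or lohalf[-1] < hihalf[-1]):
            array[k] = lohalf.pop()
        else:
            array[k] = hihalf.pop()

test = [1,3,5,7,9] + [2,3,5,7,11]
merge_stacks(test, 0, (len(test)-1)//2, len(test)-1)
print(test)  #[1, 2, 3, 3, 5, 5, 7, 7, 9, 11]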

Having implemented a merge function, we will now use this function to implement a mergesort function.

In [39]:
def mergesort(array,p=0,r=None):
    """
    function to sort array from index p to index r (Default end of array) using
    mergesort algorithm
    """
    if r is None:
        #use whole array
        r = len(array)-1
    #base case (p == r, a single element) skips the block below
    if r-p >= 1:
        q = (p+r)//2
        mergesort(array, p, q) #sort array[p:q+1]
        mergesort(array, q+1, r) #sort array[q+1:r+1]
        merge(array, p, q, r)

        
        
test = [6,3,-1,3.5,5,90,-2,4,5,11]

mergesort(test)
test
Out[39]:
[-2, -1, 3, 3.5, 4, 5, 5, 6, 11, 90]

Note: With variables p, q, r, k, i, j all representing indices at various parts of the algorithm, I found I made the most mistakes in incrementing them and keeping track of their ranges and limits: p, q, r and k all live in the full array's index space (from $0$ to $n-1$), while i and j only index the copied halves, from $0$ to $q-p$ and from $0$ to $r-q-1$ respectively.

Order of merge sort

As we saw from the table above, merge sort has time complexity $O(n \log n)$.

Most of the steps in merge sort are simple and run in constant time, but the two complex steps are the recursive divide, and the merge step.

As we recursively divide the array, this step will be done $\log_2 (n) + 1$ times, which gets us down to $n$ single-element arrays. We then need to merge all $n$ of them together into pairs. Comparing two single elements and merging them has constant complexity. Remember that merge operates in linear time, so a merge costs $O(m)$ in the number of elements $m$ it handles; for single elements it is $O(1)$ each, or $c$ constant time. There are $n/2$ such pairwise merges, handling $n$ elements in total, so at the lowest level the total merging is $O(n)$.

Now we need to merge the $n/2$ arrays of length 2 into $n/4$ arrays of length 4. Each merge handles 4 elements and takes $4c$ time, and there are $n/4$ of them, so this level again takes $cn$ time, or $O(n)$. Hopefully you can see that each level of division will take $O(n)$ time.

So as there are $\log_2 (n) + 1$ levels of division, and each level takes $O(n)$ time, the total complexity of merge sort is $O(n \log n)$.
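As a quick sanity check of this accounting (my own, not from the tutorial; merge_work is a hypothetical helper), we can count the total number of elements handled by merge across all levels and compare it with $n \log_2 n$:

In [ ]:
import math

def merge_work(n):
    """Total number of elements handled by merge, over all levels, for a length-n array."""
    if n <= 1:
        return 0  #base case: nothing to merge
    lo = n // 2
    return n + merge_work(lo) + merge_work(n - lo)

for n in [8, 64, 1024]:
    print(n, merge_work(n), round(n * math.log2(n)))
#for powers of 2 the two counts agree exactly: 24, 384 and 10240 respectively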

Space order of merge sort

Remember that in selection and insertion sort, we only needed to remember at most one value at a time. This means that those algorithms need very little extra memory to work, and we consider them to work in place. As merge sort needs to store a copy of the entire array being sorted, its memory requirements are much larger. This memory requirement means that the alternative, quicksort, which we will run through now, is often the preferred algorithm.

Quicksort

Quicksort is the mirror to merge sort, just like selection sort is the mirror to insertion sort. In merge sort, the combine step did all the fancy work, while dividing was trivial. Here, we will find quicksort to be the opposite, the dividing step has all the complexity while combining does absolutely nothing.

A key difference, however, is that quicksort works in place, but sacrifices some time efficiency for this, and has a worst-case running time of $\Theta(n^2)$. The average-case running time is still $\Theta(n \log n)$ though.

The algorithm is summarised as:

  1. Divide the subarray array[p:r+1] by choosing any element. This element is the pivot. Conventionally, the right most element array[r] is usually chosen as the pivot. Place all elements of array[p:r+1] that are less than or equal to the pivot on the left of the pivot, and greater than the pivot on the right in any order. This is called partitioning. Let q be the index of where the pivot ends up.
  2. Conquer by recursively sorting the subarrays array[p:q] (all elements smaller than the pivot above) and array[q+1:r+1].
  3. Combine by doing nothing, as the Conquer step was done in place, no merging is required.

I have kept the same number and labelling of steps as Khan Academy for consistency and to avoid confusion; the labels match those used for merge sort, so I can see why they have done this. However, my interpretation of the algorithm, which I think is a bit more transparent and clear, would be:

  1. Choose an element as a pivot. Conventionally this is the right-most element.
  2. Partition all elements of array[p:r+1] that are less than or equal to the pivot on the left of the pivot, and all elements greater than the pivot on the right, in any order. Let q be the index of where the pivot ends up.
  3. Divide and conquer by recursively applying steps 1. and 2. to sort the subarrays array[p:q] and array[q+1:r+1].

Let us have a go at implementing quick sort, before a brief discussion on its complexity.

In [2]:
def quicksort(array, p=0, r=None):
    """
    Implement quicksort on a subarray array[p:r+1]
    """
    if r is None:
        r = len(array)-1

    #base case (fewer than 2 elements) skips the following code and does nothing
    if r-p >= 1:
        pivot = array[r]
        q = r #index of the pivot
        i = p
        while i < q:
            if array[i] > pivot:
                #rotate array[i:r+1] left by one place, sending array[i] behind the pivot
                end = array[i]
                for n in range(i, r):
                    array[n] = array[n+1]
                array[r] = end
                #keep track of where the pivot is (it has moved one place left)
                q = q-1
            else:
                #move to next index,
                #while leaving ith element to left of pivot
                i = i+1
        quicksort(array, p, q-1)
        quicksort(array, q+1, r)

test = [6,3,-1,-3,5,90,67,12,5,11]

quicksort(test)
test
Out[2]:
[-3, -1, 3, 5, 5, 6, 11, 12, 67, 90]

Order of quicksort

Here we encounter our first algorithm where the worst and average cases have different orders. In the worst case, which interestingly occurs when the array is already sorted, the quicksort function will compare every element with every element to the left of it (as the pivot is rightmost in our implementation). This is clearly an $O(n^2)$ algorithm; it takes roughly $\sum_{i=2}^{n} i = \frac{n^2 + n}{2} - 1$ iterations. An intuitive way to see that this is the worst-case scenario is to notice that in the divide step we only remove one element from consideration at each division, the pivot (all other elements are already less than or equal to our pivot), so we will need $n-1$ divisions.

The best case occurs when the partitions are evenly balanced on either side of the pivot, or within one element of each other. This means that each time we subdivide, we reduce the size of the subarray by half, and so, like a binary search, we will need $\log_2 n + 1$ levels of division. The problem is then very similar to merge sort in its complexity at each level, resulting in $O(n \log n)$ complexity.
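To make the contrast concrete, here is a lightly instrumented copy of the quicksort above (my own sketch; quicksort_count is a hypothetical name) that counts comparisons against the pivot, run on an already-sorted input and on a shuffled one:

In [ ]:
import random

def quicksort_count(array, p=0, r=None, counter=None):
    """Same rotation-based quicksort as above, but tallies comparisons against the pivot."""
    if counter is None:
        counter = [0]
    if r is None:
        r = len(array)-1
    if r-p >= 1:
        pivot = array[r]
        q = r
        i = p
        while i < q:
            counter[0] += 1  #one comparison with the pivot
            if array[i] > pivot:
                end = array[i]
                for n in range(i, r):
                    array[n] = array[n+1]
                array[r] = end
                q = q-1
            else:
                i = i+1
        quicksort_count(array, p, q-1, counter)
        quicksort_count(array, q+1, r, counter)
    return counter[0]

n = 100
print(quicksort_count(list(range(n))))             #already sorted: n(n-1)/2 = 4950 comparisons
print(quicksort_count(random.sample(range(n), n)))  #shuffled: typically several hundred, roughly proportional to n log n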

To investigate the average case, please see the explanation in the course notes.

It's skipped over in the course notes, but the reason we should care about the worst case is security: one way to avoid malicious inputs crafted to slow our code down is to randomise the chosen location of the pivot! If that is not enough, you could also randomly choose 3 elements and select their median to be the pivot.
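A minimal sketch of how the pivot could be randomised in the implementation above (my own addition, assuming we simply swap a randomly chosen element into the rightmost position and then partition exactly as before):

In [ ]:
import random

def randomised_quicksort(array, p=0, r=None):
    """Quicksort on array[p:r+1] with a randomly chosen pivot."""
    if r is None:
        r = len(array)-1
    if r-p >= 1:
        #swap a random element into the rightmost position, then partition as before
        swap = random.randint(p, r)
        array[swap], array[r] = array[r], array[swap]
        pivot = array[r]
        q = r
        i = p
        while i < q:
            if array[i] > pivot:
                end = array[i]
                for n in range(i, r):
                    array[n] = array[n+1]
                array[r] = end
                q = q-1
            else:
                i = i+1
        randomised_quicksort(array, p, q-1)
        randomised_quicksort(array, q+1, r)

test = list(range(10))  #already sorted, the worst case for the fixed rightmost pivot
randomised_quicksort(test)
print(test)  #still sorted correctly, but now with expected O(n log n) behaviour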