The next set of tutorials covers the more advanced sorting algorithms merge sort and quicksort, which have lower time complexities than selection sort and insertion sort.
The Khan Academy page has a great table summarising the time complexities, which I will reuse here for reference.
Both merge sort and quicksort implement a divide and conquer algorithm, which includes elements of recursion that we saw in the previous exercises.
To keep notation simple and consistent with the tutorial, I will denote the subarray sub to be a subset of a length n array, where sub = array[p:r+1] for indices p and r.
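As a quick illustration of this notation (the values below are arbitrary and only for this example):
array = [9, 4, 7, 1, 5] #a length n = 5 array
p, r = 1, 3
sub = array[p:r+1]      #[4, 7, 1] -- Python slices exclude the end index, hence the +1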
Merge sort is implemented by:
1. Divide: find the midpoint q of the subarray array[p:r+1].
2. Conquer: recursively sort the two halves array[p:q+1] and array[q+1:r+1].
3. Combine: merge the two sorted halves back into a sorted array[p:r+1].
Here, the base case is where sub is an array of fewer than 2 elements, as that is already sorted.
The Divide step is simple, and the Conquer step recursively reduces to the base case. It is the Combine step that has the most complexity, and that is where we will focus first.
Note: Khan Academy requests that you implement the rest of the mergesort algorithm first using their already-built merge function. I feel this is trivial and also lacks intuition. We gain nothing by implementing the easy steps, especially if we have no understanding of the most critical and complex step and what it returns.
We want to build a function merge that combines two sorted subarrays, array[p:q+1] and array[q+1:r+1], into a single sorted subarray array[p:r+1].
In order to merge them, we will need to copy the elements into temporary arrays so that when we write over the original array, we still have the original elements recorded somewhere. Note that in Python (at least in version 3.6), when we assign a slice of an array to a variable, it copies the elements across into a new list. This is not the same behaviour as when you assign an array name to a new variable! Let's call these copies lohalf and hihalf.
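A quick demonstration of that copying behaviour (the variable names are just for this example):
original = [5, 1, 4]
alias = original     #assigning the name: both names refer to the same list
copy = original[0:2] #assigning a slice: a new list containing copies of the elements
alias[0] = 99
print(original)      #[99, 1, 4] -- changed through the alias
print(copy)          #[5, 1] -- unaffected, because the slice was copied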
The next consideration is in what order to recombine the two subarrays. As both subarrays are sorted, we only need to compare the leftmost elements that are yet to be allocated. In other words, if all elements smaller than lohalf[i] and hihalf[j] have already been allocated and sorted into array, then for each index k in array:
1. Compare lohalf[i] and hihalf[j], and allocate the smaller of the two to array[k].
2. Increment i by 1 if lohalf[i] was smaller than hihalf[j]; increment j otherwise.
3. Increment k by 1.
4. Once all of lohalf or hihalf has been allocated, allocate the rest of the other into array.
This is called linear-time merging because we only examine and allocate each element in array once, and so for a length $n$ array, this should take linear time, or $O(n)$.
Note: Overthinking! Continuing our knowledge gained in the previous topic about Python lists being a stack class, there is a way to implement this algorithm using lohalf and hihalf as reverse stacks, or queues. This would save a line in our code incrementing the indices, but we would still need an exit condition for our while loop. I have implemented the simpler solution below for readability (a sketch of the queue-based variant follows the merge implementation). Whether using the arrays as stacks is more efficient is beyond the scope of this exercise and I leave it to the reader to consider.
def merge(array, p, q, r):
    """
    Merge two sorted subarrays array[p:q+1] and array[q+1:r+1] such that
    the merged subarray array[p:r+1] is sorted.
    """
    lohalf = array[p:q+1]
    hihalf = array[q+1:r+1]
    i = 0
    j = 0
    k = p
    while i <= q-p and j <= r-q-1:
        if lohalf[i] < hihalf[j]:
            array[k] = lohalf[i]
            i += 1
        else:
            array[k] = hihalf[j]
            j += 1
        k += 1
    else:
        if i > q-p: #lohalf fully allocated, copy the rest of hihalf
            array[k:r+1] = hihalf[j:] #use array[k:r+1], not array[k:], or the array would shrink to the size of the remainder
        else: #hihalf fully allocated, copy the rest of lohalf
            array[k:r+1] = lohalf[i:]
test1 = [2,3,5,7,11]
test2 = [1,3,5,7,9]
test = test2+test1
merge(test,0,(len(test)-1)//2,len(test)-1)
test #[1, 2, 3, 3, 5, 5, 7, 7, 9, 11]
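For comparison, here is a minimal sketch of the queue-based variant mentioned in the note above, using collections.deque so the front of each half can be popped directly rather than tracked with i and j (the name merge_queues is invented for this illustration):
from collections import deque

def merge_queues(array, p, q, r):
    """
    Queue-based variant of merge: pop allocated elements from the front of
    each half instead of incrementing i and j.
    """
    lohalf = deque(array[p:q+1])
    hihalf = deque(array[q+1:r+1])
    for k in range(p, r+1):
        if not lohalf:              #lohalf fully allocated, drain hihalf
            array[k] = hihalf.popleft()
        elif not hihalf:            #hihalf fully allocated, drain lohalf
            array[k] = lohalf.popleft()
        elif lohalf[0] < hihalf[0]: #compare the front of each queue
            array[k] = lohalf.popleft()
        else:
            array[k] = hihalf.popleft()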
Having implemented a merge function, we will now use this function to implement a mergesort function.
def mergesort(array, p=0, r=None):
    """
    Sort array from index p to index r (default: end of array) using the
    merge sort algorithm.
    """
    if r is None:
        #use whole array
        r = len(array)-1
    #base case (p==r or empty subarray) skips the block below
    if r-p >= 1:
        q = (p+r)//2
        mergesort(array, p, q)   #sort array[p:q+1]
        mergesort(array, q+1, r) #sort array[q+1:r+1]
        merge(array, p, q, r)
test = [6,3,-1,3.5,5,90,-2,4,5,11]
mergesort(test)
test #[-2, -1, 3, 3.5, 4, 5, 5, 6, 11, 90]
Note: With the variables p, q, r, k, i, j all representing indices at various parts of the algorithm, I found I made the most mistakes in incrementing them and in keeping track of what each of their ranges and limits should be: p, q, r, k all belong in the total array space (ranging from $0$ to $n-1$), but i and j live only in the copied-element space, from $0$ to q-p or r-q-1 respectively.
As we saw from the table above, merge sort has time complexity $O(n \log n)$.
Most of the steps in merge sort are simple and run in constant time, but the two complex steps are the recursive divide, and the merge step.
As we recursively divide the array, this step will be done $\log_2(n) + 1$ times; this gets us to $n$ single-element arrays. We then need to merge all $n$ of them together into pairs. Comparing two single elements and merging them is constant complexity. Remember that the merge operates in linear time, which means the merge operation is $O(n)$; for single elements, it is then $O(1)$ for each, or $c$ constant time. We do this $n$ times, so at the lowest level, the total merging is $O(n)$.
Now we need to merge $n/2$ arrays of length 2 each. Each one takes $2c$ time, but there are $n/2$ of them, so we again take $cn$ time, or $O(n)$ time. Hopefully you can see that each level of division will take $O(n)$ time.
So, as there are $\log_2(n) + 1$ levels of division, and each level takes $O(n)$ time, the total complexity of the merge sort is $O(n \log n)$.
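As a rough empirical check of this (a sketch only; merge_counted, mergesort_counted and the counter list are invented for this illustration), we can count how many elements the merge step writes back across an entire merge sort and compare against $n \log_2 n$:
import math
import random

def merge_counted(array, p, q, r, counter):
    """Same merging as above, but counts every element written back into array."""
    lohalf = array[p:q+1]
    hihalf = array[q+1:r+1]
    i = 0
    j = 0
    for k in range(p, r+1):
        if j > r-q-1 or (i <= q-p and lohalf[i] < hihalf[j]):
            array[k] = lohalf[i]
            i += 1
        else:
            array[k] = hihalf[j]
            j += 1
        counter[0] += 1

def mergesort_counted(array, p=0, r=None, counter=None):
    """Same mergesort as above, threading a counter through the recursion."""
    if counter is None:
        counter = [0]
    if r is None:
        r = len(array)-1
    if r-p >= 1:
        q = (p+r)//2
        mergesort_counted(array, p, q, counter)
        mergesort_counted(array, q+1, r, counter)
        merge_counted(array, p, q, r, counter)
    return counter[0]

n = 1024
writes = mergesort_counted([random.random() for _ in range(n)])
print(writes, round(n*math.log2(n))) #prints 10240 10240 for n = 1024, i.e. n*log2(n) element writes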
Remember that in selection and insertion sort, we at most only needed to remember one value at a time. This means that those algorithms need very little memory to work, and we consider them to work in place. As we need to store a copy of the entire array to sort in merge sort, the memory requirements are much larger. This memory requirement means that the alternative, quicksort, which we will run through now, is often the preferred algorithm.
Quicksort is the mirror to merge sort, just like selection sort is the mirror to insertion sort. In merge sort, the combine step did all the fancy work, while dividing was trivial. Here, we will find quicksort to be the opposite: the dividing step has all the complexity while combining does absolutely nothing.
A key difference, however, is that quicksort works in place, but it sacrifices some time efficiency for this and has a worst-case running time of $\Theta(n^2)$. The average-case running time is still $\Theta(n \log n)$ though.
The algorithm is summarised as:
1. Divide: partition array[p:r+1] by choosing any element. This element is the pivot. Conventionally, the rightmost element array[r] is usually chosen as the pivot. Place all elements of array[p:r+1] that are less than or equal to the pivot on the left of the pivot, and all elements greater than the pivot on the right, in any order. This is called partitioning. Let q be the index of where the pivot ends up.
2. Conquer: recursively sort the subarrays array[p:q] (all elements smaller than the pivot, from above) and array[q+1:r+1].
3. Combine: do nothing, as the subarray is already sorted.
I have kept the same number and label of steps as Khan Academy for consistency and to avoid confusion. The labels are used to keep consistent with the labels from merge sort, so I can see why they've done this. However, my interpretation of the algorithm, which I think is a bit more transparent and clear, would be:
1. Choose a pivot and place all elements of array[p:r+1] that are less than or equal to the pivot on the left of the pivot, and all elements greater than the pivot on the right, in any order. Let q be the index of where the pivot ends up.
2. Recursively sort array[p:q] and array[q+1:r+1] in the same way.
Let us have a go at implementing quicksort, before a brief discussion on its complexity.
def quicksort(array, p=0, r=None):
    """
    Implement quicksort on a subarray array[p:r+1].
    """
    if r is None:
        r = len(array)-1
    #base case (fewer than 2 elements) skips the following code and does nothing
    if len(array[p:r+1]) > 1:
        pivot = array[r]
        q = r #index of the pivot
        i = p
        while i < q:
            if array[i] > pivot:
                #rotate array[i] to the far right, shifting everything after it
                #(including the pivot) one place to the left
                end = array[i]
                for n in range(i, r):
                    array[n] = array[n+1]
                array[r] = end
                #keep track of where the pivot is
                q = q-1
            else:
                #move to next index,
                #while leaving ith element to the left of the pivot
                i = i+1
        quicksort(array, p, q-1) #sort the elements left of the pivot
        quicksort(array, q+1, r) #sort the elements right of the pivot
test = [6,3,-1,-3,5,90,67,12,5,11]
quicksort(test)
test #[-3, -1, 3, 5, 5, 6, 11, 12, 67, 90]
Here we encounter our first algorithm where the worst and average cases have different orders. In the worst case, which interestingly is when the array is already sorted, the quicksort function will compare every element with every element to the left of it (as the pivot is the rightmost element in our implementation). This is clearly an $O(n^2)$ algorithm: it takes $\sum_{i=2}^{n} i = \frac{n^2 + n}{2} - 1$ iterations. An intuitive way to see why this is the worst case is to consider that in the divide step for this case, we really only lose one element from consideration at each divide, the pivot (all other elements are already less than our pivot). We will need $n-1$ divisions.
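To see this empirically (a sketch; quicksort_counted and its counter list are invented for this illustration, and are simply the quicksort above with a comparison counter threaded through), compare the number of pivot comparisons on an already-sorted input against a shuffled one:
import random

def quicksort_counted(array, p=0, r=None, counter=None):
    """Same quicksort as above, but counter[0] counts comparisons against the pivot."""
    if counter is None:
        counter = [0]
    if r is None:
        r = len(array)-1
    if len(array[p:r+1]) > 1:
        pivot = array[r]
        q = r
        i = p
        while i < q:
            counter[0] += 1
            if array[i] > pivot:
                end = array[i]
                for n in range(i, r):
                    array[n] = array[n+1]
                array[r] = end
                q = q-1
            else:
                i = i+1
        quicksort_counted(array, p, q-1, counter)
        quicksort_counted(array, q+1, r, counter)
    return counter[0]

n = 200
print(quicksort_counted(list(range(n))))             #already sorted: roughly n^2/2 comparisons (19900 here)
print(quicksort_counted(random.sample(range(n), n))) #shuffled: far fewer, on the order of n*log2(n)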
The best case occurs when the partitions are evenly balanced on either side of the pivot, or within one of each other. This means that when we subdivide, we reduce the size of the array by half, and so, like a binary search, we will need $\log_2 n + 1$ divisions. The problem is then very similar to mergesort in its complexity at each step, resulting in $O(n \log n)$ complexity.
To investigate the average case, please see the explanation in the course notes.
It's skipped over in the course notes, but one reason we should care about the worst case is security: an attacker who knows how the pivot is chosen can craft inputs that hit the worst case and slow our code down. One way to avoid such malicious attacks is to randomise the chosen location of the pivot! If that is not enough, you could also randomly choose 3 elements and select their median to be the pivot.
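A minimal sketch of that idea, assuming the same rotation-style partition as the implementation above (randomised_quicksort is a name invented for this illustration): swap a randomly chosen element into position r before partitioning, so the pivot location can no longer be predicted.
import random

def randomised_quicksort(array, p=0, r=None):
    """
    Quicksort with a randomised pivot: swap a random element into position r,
    then partition exactly as in the quicksort above.
    """
    if r is None:
        r = len(array)-1
    if len(array[p:r+1]) > 1:
        #randomise the pivot so an adversary cannot predict it;
        #a median-of-three variant would instead pick three random indices
        #and swap the median-valued element into position r
        pivot_index = random.randint(p, r)
        array[pivot_index], array[r] = array[r], array[pivot_index]
        pivot = array[r]
        q = r
        i = p
        while i < q:
            if array[i] > pivot:
                end = array[i]
                for n in range(i, r):
                    array[n] = array[n+1]
                array[r] = end
                q = q-1
            else:
                i = i+1
        randomised_quicksort(array, p, q-1)
        randomised_quicksort(array, q+1, r)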