Graphs

The next section of the course has a brief introduction to graph notation which is comprehensive enough to get anyone started from any background. If you understand the notation of a graph to be $G(V,E)$ or know what a DAG is (as most mathemeticians like myself will), you can skip the background. Or use it to refresh your memory!

What might be new to mathematicians is how graphs can be stored in memory, and how difficult it is to determine if an edge exists in each of the 3 data structures.

Breadth-first search (BFS)

We came across BFS in the first examples of an algorithm in the course, where we attempted to find the shortest path (geodesic path for mathemeticians) from a source to a target.

The basic premise to that the search occurs over a graph, and we begin at the target, exploring all paths from the target until we find the source. Through each step, we keep track of the minimum distance from the target, and the predecessor vertex we came from, to allow us to retrace our steps to find the shortest path. As we explore the graph, we assign each vertex a tuple of (distance, predecessor), but only if the tuple was empty beforehand. If we allow overwriting of tuples, then we would be overwriting a shorter path with a later, longer path!

The pseudo code for this algorithm would be:

  1. Initialise all vertices with tuples (null, null)
  2. Start at the target vertex, assign it (0, null)
  3. For each neighbour of the target, assign it (1, target)
  4. For each neighbour of vertex i, all k distance from target, if the neighbour does not have (null, null), assign it (k, i).
  5. Continue until you reach the source vertex.

Khan Academy takes this point to introduce queues as a First In First Out data structure. We can use this to intuitively keep track of which vertex's neighbours we have already covered, to ensure we snowball correctly. We've already seen stacks in the Tower of Hanoi exercise, where we used python list data structure. However, it is generally not a good idea to use list, as inserting or deleting the first entry, the point of a queue, takes $O(n)$ time. See here for more information on queues in python, and why you should use each case.

Note: In BFS we use queues to cover all neighbours of a node before moving on to the next set of children. In Depth-first Search, we attmept to search as deep as we can before backtracking. What data structure would be most appropriate for Depth-first Search?

In [1]:
from collections import deque

def doBFS(graph, source):
    """
    graph is represented as an adjacency list, and source is the index of the source vertex.
    Returns a dictionary where the keys vertex indices, and give a tuple of (distance, predecessor)
    """
    q = deque()
    search = {}
    for i in range(len(graph)):
        search[i] = (None, None)
        
    search[source] = (0,None)
    q.append(source)
    dist = 1
    while len(q)>0:
        i = q.popleft() #take first in queue
        for n in graph[i]: #examine all neighbours
            if search[n][0] ==None:
                q.append(n)
                search[n] = (search[i][0]+1,i) #add one to predecessor distance
    return search
        
test = [
    [1],
    [0, 4, 5],
    [3, 4, 5],
    [2, 6],
    [1, 2],
    [1, 2, 6],
    [3, 5],
    []
    ]
doBFS(test, 3)
Out[1]:
{0: (4, 1),
 1: (3, 4),
 2: (1, 3),
 3: (0, None),
 4: (2, 2),
 5: (2, 2),
 6: (1, 3),
 7: (None, None)}

In this algorithm, we initialise across all nodes, and search all edges, leading to a complexity of $O(V+E)$. As these are both linear terms, we really mean $O(max(V,E))$.

Conclusion

This concludes all the material in the Khan Academy course. I will have a final notebook where I will tackle some oher problems that can be used to prepare for a coding interview.