Lecture 31 Handout

Graph Search

Learning Outcomes

At the end of this lecture, you’ll be able to:

Lecture Plan

In this lecture, we'll cover the following lessons:

  1. Graph Terminology: Neighborhood of a Vertex
  2. Graph Terminology: Path
  3. Graph Search: Definition
  4. Graph Search: Exercise
  5. Graph Search: General Solution
  6. Graph Search: BFS
  7. Graph Search: BFS Exercise
  8. Graph Search: BFS Pseudocode
  9. Graph Search: DFS
  10. Graph Search: DFS Exercise
  11. Graph Search: DFS Pseudocode
  12. Graph Search: Analysis
  13. Graph Search: Summary

Lessons marked with ⚡ contain exercise/activity.

Graph Terminology: Neighborhood of a Vertex

Consider the following graph:

The set of all vertices in $G$ that are adjacent to a vertex $v$ is called the neighborhood of $v$ and denoted by $N(v)$.

$$ N_{\text{outgoing}}(A) = \{B, C\} $$

$$ N_{\text{incoming}}(C) = \{A, D\} $$

Exercise Name all the vertices adjacent to vertex $D$ in two groups – incoming neighbors and outgoing neighbors:

Solution

$$ N_{\text{outgoing}}(D) = \{C, E\} $$

$$ N_{\text{incoming}}(D) = \{B\} $$
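
The two kinds of neighborhood can be computed from a list of directed edges. The sketch below is only illustrative: the edge list is an assumption reconstructed from the neighborhoods stated above (the figure may contain other edges), and the helper names `outgoing` and `incoming` are hypothetical.

```python
# Assumed directed edges, reconstructed from the neighborhoods listed above.
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("D", "C"), ("D", "E")]


def outgoing(v, edges):
    """Return the set of vertices w such that (v, w) is an edge."""
    return {w for (u, w) in edges if u == v}


def incoming(v, edges):
    """Return the set of vertices u such that (u, v) is an edge."""
    return {u for (u, w) in edges if w == v}


print(sorted(outgoing("A", EDGES)))  # ['B', 'C']
print(sorted(incoming("C", EDGES)))  # ['A', 'D']
print(sorted(outgoing("D", EDGES)))  # ['C', 'E']
print(sorted(incoming("D", EDGES)))  # ['B']
```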

Graph Terminology: Path

A path is a sequence of consecutive edges in a graph.

Alternatively, we can define “path” as a sequence of vertices where each vertex in the sequence is adjacent to the vertex next to it.

Consider the following graph:

Here are two paths from $A$ to $C$: $(A, C)$ and $(A, B, D, C)$.

A simple path is a path that does not repeat any nodes or edges.

In this class, when I say “path”, I mean “simple path”.

Aside: In some references, what I defined as “path” is defined as “walk” and instead “simple path” is called, simply, “path”.

Exercise List the edges on a directed path from $B$ to $E$ and from $C$ to $E$.

Solution
  • Directed path from $B$ to $E$: $((B, D), (D, E))$.
  • There is no directed path from $C$ to $E$.
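
The vertex-sequence definition of a simple path translates into a short check. This is a minimal sketch under assumptions: the edge set is reconstructed from the paths named in this lesson (the figure may contain other edges), and `is_simple_path` is a hypothetical helper name.

```python
# Assumed directed edges, consistent with the paths named in this lesson.
EDGES = {("A", "B"), ("A", "C"), ("B", "D"), ("D", "C"), ("D", "E")}


def is_simple_path(vertices, edges):
    """Check that consecutive vertices are joined by an edge and none repeats."""
    if len(set(vertices)) != len(vertices):
        return False  # a repeated vertex means the path is not simple
    return all((u, w) in edges for u, w in zip(vertices, vertices[1:]))


print(is_simple_path(["A", "C"], EDGES))            # True
print(is_simple_path(["A", "B", "D", "C"], EDGES))  # True
print(is_simple_path(["C", "D", "E"], EDGES))       # False: (C, D) is not an edge
```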

Graph Search: Definition

The Graph Search problem, in a nutshell, is figuring out if a graph contains a path from one vertex to another.

Many fundamental algorithms on graphs (e.g., finding shortest paths, cycles, connected components, $\dots$) are applications of the graph search problem.

General Graph Search Problem

Input: Graph $G=(V, E)$, and a starting vertex $s \in V$.
Goal: Identify the vertices in $V$ reachable from $s$ in $G$.

For example, consider the following graph: (It’s one graph with multiple connected components!)

The set of vertices reachable from $s$ is $\{s, u, v, w\}$.

Graph Search: Exercise

Exercise Identify the set of vertices reachable from $s$ and from $x$, in the following graph:

Solution
  • The set of vertices reachable from $s$ is $\{s, u, v\}$.
  • The set of vertices reachable from $x$ is $\{x\}$.

Graph Search: General Solution

Recall:

General Graph Search Problem

Input: Graph $G=(V, E)$, and a starting vertex $s \in V$.
Goal: Identify the vertices in $V$ reachable from $s$ in $G$.

The following is a solution to this problem:

// Post: a vertex is reachable from s iff it is marked as explored.
mark s as "explored"; all other vertices as "unexplored"
while there is an edge (v, w) in E with v explored and w unexplored do 
    choose some such edge (v, w) // underspecified 
    mark w as explored

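Notice the instruction marked as “underspecified”; depending on how we choose the edge, we get a different search strategy. The two strategies covered in this lecture are Breadth-First Search (BFS) and Depth-First Search (DFS).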
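
The generic procedure can be transcribed almost literally into Python. This is a sketch for illustration, not an efficient implementation: it rescans the whole edge list on every iteration. The function name `reachable` and the example edges are hypothetical; the edges were chosen so that the set reachable from `s` matches the earlier example.

```python
def reachable(edges, s):
    """Generic graph search: return the set of vertices reachable from s."""
    explored = {s}
    while True:
        # Collect every edge that crosses from explored to unexplored.
        frontier = [(v, w) for (v, w) in edges
                    if v in explored and w not in explored]
        if not frontier:
            return explored
        v, w = frontier[0]  # the underspecified choice: any such edge works
        explored.add(w)


# Hypothetical graph with two components.
DEMO_EDGES = [("s", "u"), ("s", "v"), ("u", "w"), ("v", "w"), ("x", "y")]
print(sorted(reachable(DEMO_EDGES, "s")))  # ['s', 'u', 'v', 'w']
```

Whichever eligible edge is picked, the final set of explored vertices is the same; only the order of discovery changes.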

Graph Search: BFS

According to the Dictionary of Algorithms and Data Structures:

Breadth-First Search, or BFS, is any search algorithm that considers neighbors of a vertex, that is, outgoing edges of the vertex’s predecessor in the search, before any outgoing edges of the vertex. Extremes are searched last.

Given a “source” vertex to initiate the search, BFS starts by visiting its adjacent nodes, then all nodes that can be reached by a path from the start node containing two edges, three edges, and so on.

The BFS algorithm visits all vertices in a graph $G$ that are $k$ edges away from the source vertex $s$ before visiting any vertex $k+1$ edges away. You have seen this behavior in level-order tree traversal.

The process is further elaborated using a demo:

Demo

The following animated visualization of the BFS algorithm (made by Gerry Jenkins) does a good job of illustrating its behavior:

Graph Search: BFS Exercise

Consider the following graph:

Exercise Write the vertices of the above graph in the order in which they would be visited in a breadth-first traversal starting at node $0$. Assume neighbors are visited in numerical order.

Solution
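| Queue | Edges | Explored |
|---|---|---|
| 0 | - | 0 |
| 1 | (0, 1) | 1 |
| 1, 2 | (0, 2) | 2 |
| 1, 2, 5 | (0, 5) | 5 |
| 2, 5, 3 | (1, 3) | 3 |
| 5, 3, 4 | (2, 4) | 4 |
| 3, 4, 7 | (5, 7) | 7 |
| 4, 7, 6 | (3, 6) | 6 |
| 7, 6 | - | - |
| 6 | - | - |
| 8 | (6, 8) | 8 |
| - | - | - |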

The answer is $0, 1, 2, 5, 3, 4, 7, 6, 8$.

Graph Search: BFS Pseudocode

Exercise Based on your understanding of the BFS process, complete the pseudocode of BFS!

mark s as explored; all other vertices as unexplored
______________ data structure, initialized with s
while ____ is not empty do
  remove the vertex from ____________, call it v
  for each edge (v, w) in v's neighborhood do
    if ____________ then
      _________________________
      _________________________

Solution
mark s as explored, all other vertices as unexplored
Q := a queue data structure, initialized with s 
while Q is not empty do
  remove the vertex from the front of Q, call it v 
  for each edge (v, w) in v's neighborhood do
    if w is unexplored then
      mark w as explored 
      add w to the end of Q
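
The pseudocode maps closely onto Python using `collections.deque` as the queue. This is a minimal sketch, assuming the graph is given as a dictionary from each vertex to a list of its outgoing neighbors; the function name `bfs` and the example graph are hypothetical (it is not the graph from the exercise).

```python
from collections import deque


def bfs(graph, s):
    """Return vertices reachable from s in the order BFS marks them explored."""
    explored = {s}
    order = [s]
    queue = deque([s])
    while queue:
        v = queue.popleft()           # remove from the front of Q
        for w in graph.get(v, []):    # each edge (v, w)
            if w not in explored:
                explored.add(w)
                order.append(w)
                queue.append(w)       # add to the end of Q
    return order


# Hypothetical directed graph.
graph = {0: [1, 2], 1: [3], 2: [4], 3: [], 4: [5], 5: []}
print(bfs(graph, 0))  # [0, 1, 2, 3, 4, 5]
```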

Graph Search: DFS

According to the Dictionary of Algorithms and Data Structures:

Depth-First Search, or DFS, is any search algorithm that considers outgoing edges (children) of a vertex before any of the vertex’s siblings, that is, outgoing edges of the vertex’s predecessor in the search. Extremes are searched first.

The main idea behind DFS is to explore deeper into the graph whenever possible. Starting at a vertex, DFS will take a path and explore it as far as it goes. It then backtracks until it reaches an unexplored neighbor (a branch on the path which it has not explored yet). This process continues until it has discovered every vertex that is reachable from the original source vertex.

You have seen this behavior in pre-order and post-order tree traversal (and in-order binary tree traversal).

The process is further elaborated using a demo:

Demo

The following animated visualization of the DFS algorithm (made by Gerry Jenkins) does a good job of illustrating its behavior:

Graph Search: DFS Exercise

Consider the following graph:

Exercise Write the vertices of the above graph in the order in which they would be visited in a depth-first traversal starting at node $0$. Assume neighbors are visited in numerical order.

Solution
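| Stack | Edges | Explored |
|---|---|---|
| 0 | - | 0 |
| 1 | (0, 1) | 1 |
| 2, 1 | (0, 2) | 2 |
| 5, 2, 1 | (0, 5) | 5 |
| 7, 2, 1 | (5, 7) | 7 |
| 4, 2, 1 | (7, 4) | 4 |
| 3, 2, 1 | (4, 3) | 3 |
| 6, 3, 2, 1 | (4, 6) | 6 |
| 8, 3, 2, 1 | (6, 8) | 8 |
| 3, 2, 1 | - | - |
| 2, 1 | - | - |
| 1 | - | - |
| - | - | - |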

The answer is $0, 1, 2, 5, 7, 4, 3, 6, 8$.

Graph Search: DFS Pseudocode

Exercise Based on your understanding of the DFS process, complete the pseudocode of DFS!

mark s as explored; all other vertices as unexplored
______________ data structure, initialized with s
while ____ is not empty do
  remove the vertex from ____________, call it v
  for each edge (v, w) in v's neighborhood do
    if ____________ then
      _________________________
      _________________________

Solution
mark s as explored, all other vertices as unexplored
S := a stack data structure, initialized with s 
while S is not empty do
  pop the vertex from the top of S, call it v 
  for each edge (v, w) in v's neighborhood do
    if w is unexplored then
      mark w as explored 
      push w to the top of S
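
The stack-based pseudocode can be sketched in Python with a plain list as the stack. As with any sketch, some details are assumptions: the graph is a dictionary from each vertex to a list of its outgoing neighbors, and the function name `dfs` and the example graph are hypothetical (it is not the graph from the exercise).

```python
def dfs(graph, s):
    """Return vertices reachable from s in the order they are marked explored."""
    explored = {s}
    order = [s]
    stack = [s]
    while stack:
        v = stack.pop()               # pop from the top of S
        for w in graph.get(v, []):    # each edge (v, w)
            if w not in explored:
                explored.add(w)
                order.append(w)
                stack.append(w)       # push to the top of S
    return order


# Hypothetical directed graph.
graph = {0: [1, 2], 1: [3], 2: [4], 3: [], 4: [5], 5: []}
print(dfs(graph, 0))  # [0, 1, 2, 4, 5, 3]
```

Like the pseudocode, this sketch marks a vertex as explored when it is pushed. Variants that mark a vertex only when it is popped can report a different order.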

Graph Search: Analysis

Here is a (more elaborate) pseudocode for solving the General Graph Search problem:

mark s as explored, all other vertices as unexplored
D := a queue or stack data structure, initialized with s 
while D is not empty do
  remove the vertex from the front/top of D, call it v 
  for each edge (v, w) in v's neighborhood do
    if w is unexplored then
      mark w as explored 
      add w to the end/top of D

Notice that the only difference between BFS and DFS is that DFS uses a stack whereas BFS uses a queue.

Exercise Analyze the complexity of the BFS algorithm (use Big-O notation).

Solution
mark s as explored, all other vertices as unexplored     // O(1) for s, O(N) for the rest
D := a queue or stack data structure, initialized with s // O(1)
while D is not empty do                                  // O(N) iterations in total
  remove the vertex from the front/top of D, call it v   // O(1)
  for each edge (v, w) in v's neighborhood do            // O(neighbors(v))
    if w is unexplored then                              // O(1)
      mark w as explored                                 // O(1)
      add w to the end/top of D                          // O(1)

  • Both searches explore each edge at most once (directed graphs) or at most twice (undirected graphs: once from each endpoint).
  • After edge $(v, u)$ is encountered, both $v$ and $u$ are marked as explored.
  • We can implement the search in linear time if we can find the eligible edges $(v, u)$ quickly (for each $v$).
  • This is where an adjacency (incidence) list provides fast access.
  • $O(\text{neighbors}(v))$ is $O(\deg(v))$ with an incidence list (but $O(N)$ with an adjacency matrix).
  • Summed over all vertices, $\sum_{v \in V} O(\deg(v))$ is $O(M)$, because the Handshaking lemma says $\sum_{v \in V} \deg(v) = 2M$.
  • So with an adjacency list, finding the (unexplored) neighbors of every vertex takes $O(M)$ time in total.
  • (With an adjacency matrix, this total would be $O(N^2)$: $O(N)$ for neighbors(v) $\times$ $N$ vertices.)
  • Note that we can check whether $u$ is unexplored in $O(1)$ if we store this information in the vertex node (or in a hash table of explored vertices keyed by node).

The total running time of BFS and DFS is $O(M+N)$, where $N = |V|$ and $M = |E|$, if we use an adjacency list representation.
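
To see the Handshaking-lemma argument in action, we can count how many adjacency-list entries a search scans. The sketch below is illustrative only: the undirected graph is hypothetical, and `bfs_edge_checks` is a made-up helper name. Each undirected edge is stored in both endpoint lists, so a search that reaches every vertex scans exactly $2M$ entries.

```python
from collections import deque


def bfs_edge_checks(adj, s):
    """Run BFS from s and count how many adjacency-list entries are scanned."""
    explored = {s}
    queue = deque([s])
    checks = 0
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            checks += 1  # one unit of work per adjacency-list entry
            if w not in explored:
                explored.add(w)
                queue.append(w)
    return checks


# Hypothetical undirected graph: each edge appears in both endpoint lists.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}
M = sum(len(neighbors) for neighbors in adj.values()) // 2  # number of edges
print(M, bfs_edge_checks(adj, "a"))  # 4 8
```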

In practice, DFS usually needs less space than BFS. BFS must store all the nodes at one level, whereas DFS only needs to store the nodes along one path. In a tree, for instance, the number of nodes per level usually exceeds the depth of the tree.
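
A small experiment illustrates the gap on a complete binary tree. This is a rough sketch under assumptions: nodes are numbered heap-style (the children of `v` are `2v` and `2v + 1`), the helper names are hypothetical, and no explored set is kept because a tree has no cycles.

```python
from collections import deque


def max_frontier(children, root, use_queue):
    """Return the largest size the queue (BFS) or stack (DFS) reaches."""
    frontier = deque([root])
    largest = 1
    while frontier:
        v = frontier.popleft() if use_queue else frontier.pop()
        frontier.extend(children(v))
        largest = max(largest, len(frontier))
    return largest


def complete_binary_tree(depth):
    """Children function for a complete binary tree with nodes 1 .. 2**(depth+1) - 1."""
    last = 2 ** (depth + 1) - 1
    return lambda v: [c for c in (2 * v, 2 * v + 1) if c <= last]


children = complete_binary_tree(depth=10)
print(max_frontier(children, 1, use_queue=True))   # 1024 (the widest level)
print(max_frontier(children, 1, use_queue=False))  # 11 (depth + 1)
```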

Graph Search: Summary

| BFS | DFS |
|---|---|
| Starts the search from the source node and visits nodes in a level by level manner (i.e., visiting the ones closest to the source first). | Starts the search from the source node and visits nodes as far as possible from the source node (i.e., depth wise). |
| Usually implemented using a queue data structure. | Usually implemented using a stack data structure. |
| Used for finding the shortest path between two nodes, testing if a graph is bipartite, finding all connected components in a graph, etc. | Used for topological sorting, solving problems that require graph backtracking, detecting cycles in a graph, scheduling problems, etc. |

Both BFS and DFS run in $O(M+N)$ time if we use an adjacency list. That’s just a constant factor larger than the amount of time required to read the input!

Note: It is common to modify the BFS/DFS algorithm to keep track of the edges instead of (or in addition to) the vertices (where each edge describes the nodes at each end). This is useful, for example, for reconstructing the traversed path after the search finishes.
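
One common way to do this is to record, for each vertex, the vertex that discovered it. Here is a minimal sketch of that idea for BFS; the function name `bfs_path`, the `parent` dictionary, and the example graph are assumptions made for illustration.

```python
from collections import deque


def bfs_path(graph, s, t):
    """Return a path from s to t as a list of vertices, or None if t is unreachable."""
    parent = {s: None}  # doubles as the "explored" marker
    queue = deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            break
        for w in graph.get(v, []):
            if w not in parent:
                parent[w] = v  # remember the edge (v, w) that discovered w
                queue.append(w)
    if t not in parent:
        return None
    path = []
    while t is not None:  # walk the recorded edges back to s
        path.append(t)
        t = parent[t]
    return path[::-1]


# Hypothetical directed graph.
graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["C", "E"], "E": []}
print(bfs_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
print(bfs_path(graph, "C", "E"))  # None
```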

Aside: Both BFS and DFS can be implemented recursively. In particular, DFS easily lends itself to a recursive implementation. In fact, most resources describe DFS recursively! Coming up with recursive implementations of these algorithms is left to you as an exercise.