3 Distance in Graphs While the previous lecture studied just the connectivity properties of a graph, now we are going to investigate how "long" (short, actually) a connection in a graph is. This naturally leads to the concept of graph distance, which has two variants: the simple one considering only the number of edges, while the weighted one having a "length" for each edge. Brief outline of this lecture • Distance in a graph, basic properties, triangle inequality. • Graph metrics: all-pairs shortest distances. • Dijkstra's algorithm for the shortest weighted distance in a graph. • Route planning: a sketch of some advanced ideas. 3.1 Graph distance (unweighted) Recall that a walk of length n in a graph G is an alternating sequence of vertices and edges vq, e\, f 1, e2, i>2, ■ ■ ■, en, such that each has the ends t>i_i, t>i. Definition 3.1. Distance da(u,v) between two vertices u,v of a graph G is defined as the length of the shortest walk between u and v in G. If there is now walk between u,v, then we declare da(u,v) — oo. Informally and naturally, the distance between u,v equals the least possible number of edges traversed from u to v. Specially do{u,u) = 0. Recall, moreover, that the shortest walk is always a path - Theorem 2.2. Fact: The distance in an undirected graph is symmetric, i.e. dc(u,v) — dc(v,u). Lemma 3.2. The graph distance satisfies the triangle inequality: Vu,v,w G V(G) : dc(u, v) + dc(v, w) > dc(u, w) .□ Proof. Easily; starting with a walk of length dc(u,v) from u to v, and appending a walk of length dc(v,w) from v to w, results in a walk of length dc(u,v) + dc(v,w) from u to w. This is an upper bound on the real distance from utow. C How to find the distance Theorem 3.3. Let u,v,w be vertices of a connected graph G such that dc(u,v) < dc(u,w). Then the breadth-first search algorithm on G, starting from u, finds the vertex v before w. c Proof. We apply induction on the distance dc(u, v): If dc(u, v) — 0, i.e. u — v, then it is trivial that v is found first. So let dc(u,v) — d > 0 and v' be a neighbour of v closer to u, which means dc(u, v') — d — 1. Analogously choose w' a neighbour of w closer to u. Then dc(u, w') > dc(u, w) — 1 > dc(u, v) — 1 — dc(u, v'). and so v' has been found before w' by the inductive assumption. Hence v' has been stored into U before w', and (cf. FIFO) the neighbours of v' (v among them, but not w) are found before the neighbours of w' (such as w). C Corollary 3.4. The breadth-first search algorithm on G correctly determines graph distances from the starting vertex. Other related terms Definition 3.5. Let G be a graph. We define, with resp. to G, the following notions: • The excentricity of a vertex exc(t>) is the largest distance from v to another vertex; exc(t>) — ma.yixev{G) da{v,x). c • The diameter d iam (G) of G is the largest excentricity over its vertices, and the radius rad(G) of G is the smallest excentricity over its vertices. • The center of G is the subset U C V (G) of vertices such that their excentricity equals rad(G). ^3.2 All-pairs shortest distances Definition: The metrics of a graph is the collection of distances between all pairs of its vertices. In other words, the metrics is a matrix d[,] such that d[i,j] is the distance from i to j. c Method 3.6. Dynamic programming for all-pairs distances in a graph G on the vertex set V(G) — {t>o, t>i,..., vjv-i}- • Initially, let d[i, j] be 1 (alternatively, the edge length of {vi,Vj}), or oo if Vi,Vj are not adjacent, c • After step t > 0 let it hold that d[i, j] is the shortest length of a walk between Vi,Vj such that its internal vert, are from {i>o,t>i,...,ft-i} (empty for t — 0). • Moving from step t to t + 1, we update all the distances as: - Either d[i, j] from the previous step is still optimal (the vertex vt does not help to obtain a shorter walk from Vi to Vj), or - there is a shorter Vi to Vj walk using (also) the vertex vt which is, by the assumption at step t, of length d [i, t] +d [t, j ] —>• d [i, j ]. c Theorem 3.7. Method 3.6 correctly computes the distance d[i,j] between each pair of vertices Vi,Vj in N — \V(G)\ steps. Remark: In a practical implementation we may use, say, MAX_INT/2 in place of oo. Algorithm 3.8. Floyd-Warshall algorithm (cf. 3.6) input < the adjacency matrix G [,] of an N-vertex graph, such that the vertices ofG are indexed as 0. . . N-l, and G [i, j ] =1 if i,j adjacent and G [i, j ] =0 otherwise; for (i=0; i• R (edge lengths in this case). A positively weighted graph G, w is such that w(e) > 0 for all edges e. c Definition 3.9. (Weighted distance) Consider a positively weighted graph G,w. The length of the weighted walk S — t>o, ei, i>i, e?, t>2, ■ ■ ■, en,vn in G is the sum d%(S) = w(ei) + w(e2) + • • • + w(e„). The weighted distance in G, w between a pair of vertices u, v is cIq(u,v) — mm{d,q(S) : S is a walk from w to t>} .□ All these terms naturally extend from graphs to directed graphs, c Analogously to Section 3.1 we get: Fact: The shortest walk in a positively weighted (di)graph is always a path, c Lemma 3.10. The weighted distance in a positively weighted (di)graph satisfies the triangle inequality. See an example. The distances between a-c and between b-c are 3. What about the a—b distance? □ Is it 6? □ No, the distance from a to b in the graph is 5 (traverse the "upper path"). Negative edge-lengths? What is the reason we are avoiding negative edge lengths? Hence, what is the x-y distance this graph? Say, 3 or 1? □ No, it is —oo, precisely by Definition 3.9, and this answer does not sound nice.. . □ Hence we have got a good reason not to consider negative edges in general. ^3.4 Single-source shortest paths problem This section deals with the more specific problem of finding the shortest distance between one pair of terminals in a graph (or, from a single source to all other vertices). Remark: The coming Dijkstra's algorithm is, on one hand, slightly more involved than Algorithm 3.8, but it is significantly faster in the computation of single-source shortest distances, on the other hand. □ Dijkstra's algorithm: • Is a variant of graph searching (related to BFS), in which every discovered vertex carries a variable keeping its temporary distance—the length of the shortest so far discovered walk reaching this vertex from the starting vertex, c • We always pick from the depository the vertex with the shortest temporary distance. This is because no shorter walk may reach this vertex (assuming nonneg-ative edge lengths), c • At the end of processing, the temporary distances become final shortest distances from the starting vertex (cf. Theorem 3.13). Algorithm 3.12. Computing the single-source shortest paths (Dijkstra), i.e. finding the shortest walk from u to v, or from u to all other vertices. input < ^-vertex graph G given by adjacency-length matrix len[,] > 0, where del [i, j ] =00 if j is not an out-neighbour of i; input < u, v, where u is the starting vertex and v the destination; □ // state [i] records the vertex processing state, dist [i] is the temporary distance for (i=0; i 0, i.e. w(xy) > pv(x) —pv(y). The above Euclidean potential is always admissible. □ • The modified length of any u-v walk S then is cIq (S) — (Iq(S)+pv(v)—pv(u), which is a constant difference from c[q(S)\ Hence some S is optimal for the weighting w iff S is optimal for w'. Here the Euclidean potential "strongly prefers" edges in the dest. direction. Second, The idea of "reach" • It is based on a natural observation that for long-distance route planning, vaste majority of edges of real-world road maps are basically irrelevant.□ Definition: Let Su,v denote a shortest walk from u to v in weighted G. For e G E(SUtV) let prefix(SUtV,e), suffix(SUtV,e) denote the starting (ending) segment of Su,v up to (after) e. cThe reach of an edge e G E(G) is given as reachc(e) — max { min( d^{prefix(Su^v, e)), d^{suffix(Su^v, e))) : Mu,v G V(G) A e G E(Su,v)}.n The reach of e mathematically quantifies (ir)relevance of e for route planning; the smaller reacha(e) is, the closer to the start or end of an optimal route e has to be. □ The immediate use of precomputed reach values is as follows: • The line "foreach (k out-neighbour of m)" (Algorithm 3.12) simply takes only those neighbours k such that reacho(m~k) > dist[m].