This module implements a few functions that deal with the computation of distances or shortest paths between all pairs of vertices.
Efficiency : Because these functions repeatedly list the (out-)neighborhoods of (di)graphs, it pays to build a temporary copy of the graph in a data structure that makes these queries fast. These functions also work on large volumes of data, typically dense matrices of size \(n^2\), and are expected to return corresponding dictionaries of size \(n^2\), in which the integers corresponding to the vertices have first been converted to the vertices’ labels. Sadly, this last translation step turns out to be the most time-consuming, which is why it is also useful to have a Cython module with versions of these functions that return C arrays, so that the translation can be skipped when it is not needed.
Memory cost : The methods implemented in the current module sometimes need large amounts of memory to return their result. Storing the distances between all pairs of vertices in a graph on \(1500\) vertices as a dictionary of dictionaries takes around 200MB, while storing the same information as a C array requires 4MB.
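The size of the C array can be checked with a little arithmetic. The sketch below is only an illustration: the helper names flat_table_bytes and dict_containers_bytes are made up here, and the dictionary estimate is a rough lower bound that counts the dict containers only, so the figures it prints depend on the Python version and platform.

```python
import sys
from array import array


def flat_table_bytes(n, typecode="H"):
    """Bytes used by a flat n*n table; 'H' is the unsigned short typecode."""
    return n * n * array(typecode).itemsize


def dict_containers_bytes(n):
    """Rough lower bound for a dict of n dicts with n integer entries each.

    Only the dict containers are counted, not the keys and values.
    """
    outer = {u: None for u in range(n)}
    row = {v: 0 for v in range(n)}
    return sys.getsizeof(outer) + n * sys.getsizeof(row)


n = 1500
print("flat unsigned short table:", flat_table_bytes(n), "bytes")
print("dict of dicts (containers only):", dict_containers_bytes(n), "bytes")
```

With 2-byte shorts the flat table for 1500 vertices takes 1500 * 1500 * 2 bytes, in line with the order of magnitude quoted above.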
The C function all_pairs_shortest_path_BFS actually does all the computations, and all the others (except for Floyd_Warshall) are just wrapping it. This function begins with copying the graph in a data structure that makes it fast to query the out-neighbors of a vertex, then starts one Breadth First Search per vertex of the (di)graph.
What can this function compute ?
The matrix of predecessors.
This matrix \(P\) has size \(n^2\), and is such that vertex \(P[u,v]\) is a predecessor of \(v\) on a shortest \(uv\)-path. Hence, this matrix efficiently encodes the information of a shortest \(uv\)-path for any \(u,v\in G\) : indeed, to go from \(u\) to \(v\) you should first find a shortest \(uP[u,v]\)-path, then jump from \(P[u,v]\) to \(v\), as \(v\) is one of its out-neighbors. Applying this recursively yields the whole path.
The matrix of distances.
This matrix has size \(n^2\) and associates with every pair \(uv\) the distance from \(u\) to \(v\).
The vector of eccentricities.
This vector of size \(n\) encodes for each vertex \(v\) the distance to a vertex that is furthest from \(v\) in the graph. In particular, the diameter of the graph is the maximum of these values.
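The three outputs described above can be sketched in pure Python as follows. This is not the Cython implementation: the function name all_pairs_bfs, the use of lists of out-neighbors indexed by integers, and float("inf") for unreachable vertices are all assumptions made for the illustration.

```python
from collections import deque


def all_pairs_bfs(adj):
    """Run one BFS per vertex of a (di)graph given as out-neighbor lists.

    adj[u] lists the out-neighbors of vertex u, vertices being 0..n-1.
    Returns (dist, pred, ecc) as described in the text.
    """
    n = len(adj)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    pred = [[None] * n for _ in range(n)]
    ecc = [0] * n
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[s][w] == inf:          # w is discovered for the first time
                    dist[s][w] = dist[s][u] + 1
                    pred[s][w] = u             # u precedes w on a shortest s-w path
                    queue.append(w)
        ecc[s] = max(dist[s])                  # inf if some vertex is unreachable
    return dist, pred, ecc


# A cycle on 5 vertices.
cycle = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]
dist, pred, ecc = all_pairs_bfs(cycle)
diameter = max(ecc)
```

On this cycle every eccentricity is 2, so the diameter is 2, and pred[0][2] is the vertex 1 through which the BFS from 0 first reached 2.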
What does it take as input ?
- gg – a (Di)Graph.
- unsigned short * predecessors – a pointer toward an array of size \(n^2\cdot\text{sizeof(unsigned short)}\). Set to NULL if you do not want to compute the predecessors.
- unsigned short * distances – a pointer toward an array of size \(n^2\cdot\text{sizeof(unsigned short)}\). The computation of the distances is necessary for the algorithm, so this value can not be set to NULL.
- int * eccentricity – a pointer toward an array of size \(n\cdot\text{sizeof(int)}\). Set to NULL if you do not want to compute the eccentricity.
Technical details
- The vertices are encoded as \(1, ..., n\) as they appear in the ordering of G.vertices().
- Because this function works on matrices whose size is quadratic in the number of vertices when computing all distances or predecessors, it stores the vertices’ names in short variables instead of long ones to halve the size in memory. This means that only the diameter/eccentricities can be computed on a graph of more than 65535 nodes. For information, the current version of the algorithm on a graph with \(65536=2^{16}\) nodes creates in memory \(2\) tables of \(2^{32}\) short elements (2 bytes each), for a total of \(2^{34}\) bytes or \(16\) gigabytes. In order to support larger sizes, we would have to replace shorts by 32-bit or 64-bit ints, which would then require respectively 32GB or 64GB.
- In the C version of these functions, infinite distances are represented with <unsigned short> -1 = 65535 for unsigned short variables, and by INT32_MAX otherwise. This case happens when the input is a disconnected graph, or a non-strongly-connected digraph.
- A memory error is raised when the allocation of data structures fails. This can happen with large graphs on computers with little memory.
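To illustrate the last translation step, here is a minimal sketch that turns a flat table of unsigned shorts into a dictionary of dictionaries keyed by vertex labels. It assumes a row-major layout with 0-based indices and maps the sentinel 65535 to float("inf"); the helper name to_labelled_dict and the sample data are hypothetical, and the real module may differ in these details.

```python
from array import array

INFINITE_SHORT = 65535  # the value of (unsigned short) -1


def to_labelled_dict(flat, labels):
    """Convert a flat row-major n*n distance table into {u: {v: distance}}."""
    n = len(labels)
    result = {}
    for i, u in enumerate(labels):
        row = flat[i * n:(i + 1) * n]
        result[u] = {
            labels[j]: float("inf") if d == INFINITE_SHORT else d
            for j, d in enumerate(row)
        }
    return result


# Two adjacent vertices 'a' and 'b', plus an isolated vertex 'c'.
labels = ["a", "b", "c"]
flat = array("H", [0, 1, 65535,
                   1, 0, 65535,
                   65535, 65535, 0])
D = to_labelled_dict(flat, labels)
```

The loop touches all \(n^2\) entries and builds one Python object per row, which gives an idea of why this conversion can dominate the running time.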
Warning
The function all_pairs_shortest_path_BFS has no reason to be called directly by users, even if they are writing Cython code and looking for efficiency. This module contains wrappers for this function that feed it the appropriate parameters. As the function is inlined, using those wrappers actually saves time, since it should avoid testing the parameters again and again in the main function’s body.
AUTHOR:
REFERENCE:
[KRG96b] S. Klavzar, A. Rajapakse, and I. Gutman. The Szeged and the Wiener index of graphs. Applied Mathematics Letters, 9(5):45–49, 1996.
[GYLL93c] I. Gutman, Y.-N. Yeh, S.-L. Lee, and Y.-L. Luo. Some recent results in the theory of the Wiener number. Indian Journal of Chemistry, 32A:651–661, 1993.
[CGH+13] P. Crescenzi, R. Grossi, M. Habib, L. Lanzi, and A. Marino. On computing the diameter of real-world undirected graphs. Theor. Comput. Sci. 514:84–95, 2013. http://dx.doi.org/10.1016/j.tcs.2012.09.018
[CGI+10] P. Crescenzi, R. Grossi, C. Imbrenda, L. Lanzi, and A. Marino. Finding the Diameter in Real-World Graphs: Experimentally Turning a Lower Bound into an Upper Bound. Proceedings of 18th Annual European Symposium on Algorithms. Lecture Notes in Computer Science, vol. 6346, 302–313. Springer, 2010.
[MLH08] C. Magnien, M. Latapy, and M. Habib. Fast computation of empirically tight bounds for the diameter of massive graphs. ACM Journal of Experimental Algorithms 13, 2008. http://dx.doi.org/10.1145/1412228.1455266
Returns the diameter of \(G\).
This method returns Infinity if the (di)graph is not connected. It can also quickly return a lower bound on the diameter using the 2sweep and multi-sweep schemes.
INPUTS:
method – (default: ‘iFUB’) specifies the algorithm to use among:
'standard' – Computes the diameter of the input (di)graph as the largest eccentricity of its vertices. This is the classical method with time complexity in \(O(nm)\).
'2sweep' – Computes a lower bound on the diameter of an unweighted undirected graph using 2 BFS, as proposed in [MLH08]. It first selects a vertex \(v\) that is at largest distance from an initial vertex source using BFS. Then it performs a second BFS from \(v\). The largest distance from \(v\) is returned as a lower bound on the diameter of \(G\). The time complexity of this method is linear in the size of \(G\).
'multi-sweep' – Computes a lower bound on the diameter of an unweighted undirected graph using several iterations of the 2sweep algorithm [CGH+13]. Roughly, it first uses 2sweep to identify two vertices \(u\) and \(v\) that are far apart. Then it selects a vertex \(w\) that is at the same distance from \(u\) and \(v\). This vertex \(w\) serves as the new source for another iteration of the 2sweep algorithm, which may improve the current lower bound on the diameter. This process is repeated as long as the lower bound on the diameter improves.
'iFUB' – The iFUB (iterative Fringe Upper Bound) algorithm, proposed in [CGI+10], computes the exact value of the diameter of an unweighted undirected graph. It is based on the following observation:
The diameter of the graph is equal to the maximum eccentricity of a vertex. Let \(v\) be any vertex, and let \(V\) be partitioned into \(A\cup B\) where:
\[\begin{split}d(v,a)\leq i, \forall a \in A\\ d(v,b)\geq i, \forall b \in B\end{split}\]
As all vertices from \(A\) are at distance \(\leq 2i\) from each other, a vertex \(a\in A\) with eccentricity \(ecc(a)>2i\) is at distance \(ecc(a)\) from some vertex \(b\in B\).
Consequently, if we have already computed the maximum eccentricity \(m\) of all vertices in \(B\) and if \(m>2i\), then we do not need to compute the eccentricity of the vertices in \(A\).
Starting from a vertex \(v\) obtained through a multi-sweep computation (which refines the 4sweep algorithm used in [CGH+13]), we compute the diameter by computing the eccentricity of all vertices sorted decreasingly according to their distance to \(v\), and stop as allowed by the remark above. The worst case time complexity of the iFUB algorithm is \(O(nm)\), but it can be very fast in practice.
source – (default: None) vertex from which to start the first BFS. If source==None, an arbitrary vertex of the graph is chosen. An error is raised if the initial vertex is not in \(G\). This parameter is not used when method=='standard'.
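The 2sweep and iFUB ideas described above can be sketched in a few lines of pure Python. This is a simplified illustration, not the module's implementation: it assumes a connected, non-empty undirected graph given as a dictionary of neighbor lists, the function names are made up, and the iFUB variant below starts from an arbitrary vertex rather than from one chosen by multi-sweep.

```python
from collections import deque


def bfs_distances(adj, source):
    """Distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def eccentricity(adj, v):
    return max(bfs_distances(adj, v).values())


def two_sweep(adj, source):
    """Lower bound on the diameter using 2 BFS."""
    dist = bfs_distances(adj, source)
    far = max(dist, key=dist.get)      # a vertex furthest from source
    return eccentricity(adj, far)


def ifub_sketch(adj, v):
    """Exact diameter: scan vertices by decreasing distance to v."""
    dist = bfs_distances(adj, v)
    order = sorted(adj, key=dist.get, reverse=True)
    lower = dist[order[0]]             # ecc(v) is a first lower bound
    for x in order:
        # Vertices not yet scanned are within 2 * dist[x] of each other,
        # so they cannot improve a bound that is already that large.
        if lower >= 2 * dist[x]:
            break
        lower = max(lower, eccentricity(adj, x))
    return lower


path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

On the path with 10 vertices, starting from vertex 4, the sketch computes the eccentricity of one endpoint and then stops, because the bound 9 already exceeds twice the distance of every remaining vertex to vertex 4.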
EXAMPLES:
sage: G = graphs.PetersenGraph()
sage: G.diameter(method='iFUB')
2
sage: G = Graph( { 0 : [], 1 : [], 2 : [1] } )
sage: G.diameter(method='iFUB')
+Infinity
Although max( ) is usually defined as -Infinity, since the diameter will never be negative, we define it to be zero:
sage: G = graphs.EmptyGraph()
sage: G.diameter(method='iFUB')
0
Comparison of exact methods:
sage: G = graphs.RandomBarabasiAlbert(100, 2)
sage: d1 = G.diameter(method='standard')
sage: d2 = G.diameter(method='iFUB')
sage: d3 = G.diameter(method='iFUB', source=G.random_vertex())
sage: if d1!=d2 or d1!=d3: print("Something goes wrong!")
Comparison of lower bound methods:
sage: lb2 = G.diameter(method='2sweep')
sage: lbm = G.diameter(method='multi-sweep')
sage: if not (lb2<=lbm and lbm<=d3): print("Something goes wrong!")
Returns the matrix of distances in G.
This function returns a double dictionary D of vertices, in which the distance between vertices u and v is D[u][v].
EXAMPLE:
sage: from sage.graphs.distances_all_pairs import distances_all_pairs
sage: g = graphs.PetersenGraph()
sage: distances_all_pairs(g)
{0: {0: 0, 1: 1, 2: 2, 3: 2, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 2},
1: {0: 1, 1: 0, 2: 1, 3: 2, 4: 2, 5: 2, 6: 1, 7: 2, 8: 2, 9: 2},
2: {0: 2, 1: 1, 2: 0, 3: 1, 4: 2, 5: 2, 6: 2, 7: 1, 8: 2, 9: 2},
3: {0: 2, 1: 2, 2: 1, 3: 0, 4: 1, 5: 2, 6: 2, 7: 2, 8: 1, 9: 2},
4: {0: 1, 1: 2, 2: 2, 3: 1, 4: 0, 5: 2, 6: 2, 7: 2, 8: 2, 9: 1},
5: {0: 1, 1: 2, 2: 2, 3: 2, 4: 2, 5: 0, 6: 2, 7: 1, 8: 1, 9: 2},
6: {0: 2, 1: 1, 2: 2, 3: 2, 4: 2, 5: 2, 6: 0, 7: 2, 8: 1, 9: 1},
7: {0: 2, 1: 2, 2: 1, 3: 2, 4: 2, 5: 1, 6: 2, 7: 0, 8: 2, 9: 1},
8: {0: 2, 1: 2, 2: 2, 3: 1, 4: 2, 5: 1, 6: 1, 7: 2, 8: 0, 9: 2},
9: {0: 2, 1: 2, 2: 2, 3: 2, 4: 1, 5: 2, 6: 1, 7: 1, 8: 2, 9: 0}}
Returns the matrix of distances in G and the matrix of predecessors.
Distances : the matrix \(M\) returned is of length \(n^2\), and the distance between vertices \(u\) and \(v\) is \(M[u,v]\). The integer corresponding to a vertex is its index in the list G.vertices().
Predecessors : the matrix \(P\) returned has size \(n^2\), and is such that vertex \(P[u,v]\) is a predecessor of \(v\) on a shortest \(uv\)-path. Hence, this matrix efficiently encodes the information of a shortest \(uv\)-path for any \(u,v\in G\) : indeed, to go from \(u\) to \(v\) you should first find a shortest \(uP[u,v]\)-path, then jump from \(P[u,v]\) to \(v\) as it is one of its outneighbors.
The integer corresponding to a vertex is its index in the list G.vertices().
EXAMPLE:
sage: from sage.graphs.distances_all_pairs import distances_and_predecessors_all_pairs
sage: g = graphs.PetersenGraph()
sage: distances_and_predecessors_all_pairs(g)
({0: {0: 0, 1: 1, 2: 2, 3: 2, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 2},
1: {0: 1, 1: 0, 2: 1, 3: 2, 4: 2, 5: 2, 6: 1, 7: 2, 8: 2, 9: 2},
2: {0: 2, 1: 1, 2: 0, 3: 1, 4: 2, 5: 2, 6: 2, 7: 1, 8: 2, 9: 2},
3: {0: 2, 1: 2, 2: 1, 3: 0, 4: 1, 5: 2, 6: 2, 7: 2, 8: 1, 9: 2},
4: {0: 1, 1: 2, 2: 2, 3: 1, 4: 0, 5: 2, 6: 2, 7: 2, 8: 2, 9: 1},
5: {0: 1, 1: 2, 2: 2, 3: 2, 4: 2, 5: 0, 6: 2, 7: 1, 8: 1, 9: 2},
6: {0: 2, 1: 1, 2: 2, 3: 2, 4: 2, 5: 2, 6: 0, 7: 2, 8: 1, 9: 1},
7: {0: 2, 1: 2, 2: 1, 3: 2, 4: 2, 5: 1, 6: 2, 7: 0, 8: 2, 9: 1},
8: {0: 2, 1: 2, 2: 2, 3: 1, 4: 2, 5: 1, 6: 1, 7: 2, 8: 0, 9: 2},
9: {0: 2, 1: 2, 2: 2, 3: 2, 4: 1, 5: 2, 6: 1, 7: 1, 8: 2, 9: 0}},
{0: {0: None, 1: 0, 2: 1, 3: 4, 4: 0, 5: 0, 6: 1, 7: 5, 8: 5, 9: 4},
1: {0: 1, 1: None, 2: 1, 3: 2, 4: 0, 5: 0, 6: 1, 7: 2, 8: 6, 9: 6},
2: {0: 1, 1: 2, 2: None, 3: 2, 4: 3, 5: 7, 6: 1, 7: 2, 8: 3, 9: 7},
3: {0: 4, 1: 2, 2: 3, 3: None, 4: 3, 5: 8, 6: 8, 7: 2, 8: 3, 9: 4},
4: {0: 4, 1: 0, 2: 3, 3: 4, 4: None, 5: 0, 6: 9, 7: 9, 8: 3, 9: 4},
5: {0: 5, 1: 0, 2: 7, 3: 8, 4: 0, 5: None, 6: 8, 7: 5, 8: 5, 9: 7},
6: {0: 1, 1: 6, 2: 1, 3: 8, 4: 9, 5: 8, 6: None, 7: 9, 8: 6, 9: 6},
7: {0: 5, 1: 2, 2: 7, 3: 2, 4: 9, 5: 7, 6: 9, 7: None, 8: 5, 9: 7},
8: {0: 5, 1: 6, 2: 3, 3: 8, 4: 3, 5: 8, 6: 8, 7: 5, 8: None, 9: 6},
9: {0: 4, 1: 6, 2: 7, 3: 4, 4: 9, 5: 7, 6: 9, 7: 9, 8: 6, 9: None}})
Returns the distances distribution of the (di)graph in a dictionary.
This method ignores all edge labels, so that the distance considered is the topological distance.
OUTPUT:
A dictionary d such that the number of ordered pairs of distinct vertices at distance k (if any) is equal to \(d[k] \cdot |V(G)| \cdot (|V(G)|-1)\).
Note
We consider that two vertices that do not belong to the same connected component are at infinite distance, and we do not take the trivial pairs of vertices \((v, v)\) at distance \(0\) into account. Empty (di)graphs and (di)graphs of order 1 have no paths and so we return the empty dictionary {}.
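As an illustration of this convention, the distribution can be computed with one BFS per vertex, counting ordered pairs of distinct vertices. The sketch below is not Sage's code: the function names, the dictionary-of-neighbors input, and the use of float("inf") and fractions.Fraction in place of Sage's +Infinity and rationals are assumptions.

```python
from collections import Counter, deque
from fractions import Fraction


def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def distances_distribution_sketch(adj):
    """Fraction of ordered pairs (u, v), u != v, at each distance."""
    n = len(adj)
    if n <= 1:
        return {}                      # no pairs of distinct vertices
    counts = Counter()
    for s in adj:
        dist = bfs_distances(adj, s)
        for t in adj:
            if t != s:
                counts[dist.get(t, float("inf"))] += 1
    total = n * (n - 1)
    return {k: Fraction(c, total) for k, c in counts.items()}


def petersen():
    """Petersen graph: outer 5-cycle, spokes, inner pentagram."""
    adj = {v: set() for v in range(10)}
    for i in range(5):
        for a, b in ((i, (i + 1) % 5), (i, i + 5), (i + 5, (i + 2) % 5 + 5)):
            adj[a].add(b)
            adj[b].add(a)
    return adj


petersen_distribution = distances_distribution_sketch(petersen())
```

For the Petersen graph, 30 of the 90 ordered pairs are adjacent and the other 60 are at distance 2, which gives the fractions 1/3 and 2/3.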
EXAMPLES:
An empty Graph:
sage: g = Graph()
sage: g.distances_distribution()
{}
A Graph of order 1:
sage: g = Graph()
sage: g.add_vertex(1)
sage: g.distances_distribution()
{}
A Graph of order 2 without edge:
sage: g = Graph()
sage: g.add_vertices([1,2])
sage: g.distances_distribution()
{+Infinity: 1}
The Petersen Graph:
sage: g = graphs.PetersenGraph()
sage: g.distances_distribution()
{1: 1/3, 2: 2/3}
A graph with multiple disconnected components:
sage: g = graphs.PetersenGraph()
sage: g.add_edge('good','wine')
sage: g.distances_distribution()
{1: 8/33, 2: 5/11, +Infinity: 10/33}
The de Bruijn digraph dB(2,3):
sage: D = digraphs.DeBruijn(2,3)
sage: D.distances_distribution()
{1: 1/4, 2: 11/28, 3: 5/14}
Returns the vector of eccentricities in G.
The array returned is of length n, and its ith component is the eccentricity of the ith vertex in G.vertices().
EXAMPLE:
sage: from sage.graphs.distances_all_pairs import eccentricity
sage: g = graphs.PetersenGraph()
sage: eccentricity(g)
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
Computes the shortest path/distances between all pairs of vertices.
For more information on the Floyd-Warshall algorithm, see the Wikipedia article on Floyd-Warshall.
INPUT:
- gg – the graph on which to work.
- paths (boolean) – whether to return the dictionary of shortest paths. Set to True by default.
- distances (boolean) – whether to return the dictionary of distances. Set to False by default.
OUTPUT:
Depending on the input, this function returns the dictionary of paths, the dictionary of distances, or a pair of dictionaries (distances, paths) where distances[u][v] denotes the distance of a shortest path from \(u\) to \(v\) and paths[u][v] denotes an in-neighbor \(w\) of \(v\) such that \(dist(u,v)= 1 + dist(u,w)\).
Warning
Because this function works on matrices whose size is quadratic in the number of vertices, it uses short variables instead of long ones to halve the size in memory. This means that the current implementation does not run on a graph of more than 65535 nodes. This can be easily changed if necessary, but would require much more memory; it may be worth writing two versions. For information, the current version of the algorithm on a graph with \(65536=2^{16}\) nodes creates in memory \(2\) tables of \(2^{32}\) short elements (2 bytes each), for a total of \(2^{34}\) bytes or \(16\) gigabytes. Also keep in mind that while the memory use is quadratic, the running time is cubic.
Note
When paths = False the algorithm saves roughly half of the memory as it does not have to maintain the matrix of predecessors. However, setting distances=False produces no such effect as the algorithm can not run without computing them. They will not be returned, but they will be stored while the method is running.
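For reference, the textbook Floyd-Warshall recurrence on an unweighted (di)graph can be sketched as follows. This is a plain Python illustration under the assumption that the graph is a dictionary of out-neighbor lists; the name floyd_warshall_sketch is made up, and unreachable pairs are given float("inf"), which may differ from what the module returns.

```python
def floyd_warshall_sketch(adj):
    """Return (dist, pred) for an unweighted (di)graph given as {u: out-neighbors}."""
    inf = float("inf")
    vertices = list(adj)
    dist = {u: {v: inf for v in vertices} for u in vertices}
    pred = {u: {v: None for v in vertices} for u in vertices}
    for u in vertices:
        dist[u][u] = 0
        for v in adj[u]:
            if v != u:
                dist[u][v] = 1
                pred[u][v] = u
    # Allow k as an intermediate vertex, one vertex at a time.
    for k in vertices:
        for u in vertices:
            if dist[u][k] == inf:
                continue
            for v in vertices:
                through_k = dist[u][k] + dist[k][v]
                if through_k < dist[u][v]:
                    dist[u][v] = through_k
                    pred[u][v] = pred[k][v]   # last step of the k-v path
    return dist, pred


# The diamond graph: K4 minus the edge {0, 3}.
diamond = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
dist, pred = floyd_warshall_sketch(diamond)
```

The three nested loops over the vertices account for the cubic running time, and the two dictionaries dist and pred for the quadratic memory use.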
EXAMPLES:
Shortest paths in a small grid
sage: g = graphs.Grid2dGraph(2,2)
sage: from sage.graphs.distances_all_pairs import floyd_warshall
sage: print(floyd_warshall(g))
{(0, 1): {(0, 1): None, (1, 0): (0, 0), (0, 0): (0, 1), (1, 1): (0, 1)},
(1, 0): {(0, 1): (0, 0), (1, 0): None, (0, 0): (1, 0), (1, 1): (1, 0)},
(0, 0): {(0, 1): (0, 0), (1, 0): (0, 0), (0, 0): None, (1, 1): (0, 1)},
(1, 1): {(0, 1): (1, 1), (1, 0): (1, 1), (0, 0): (0, 1), (1, 1): None}}
Checking the distances are correct
sage: g = graphs.Grid2dGraph(5,5)
sage: dist,path = floyd_warshall(g, distances = True)
sage: all( dist[u][v] == g.distance(u,v) for u in g for v in g )
True
Checking a random path is valid
sage: u,v = g.random_vertex(), g.random_vertex()
sage: p = [v]
sage: while p[0] is not None:
...       p.insert(0,path[u][p[0]])
sage: len(p) == dist[u][v] + 2
True
Distances for all pairs of vertices in a diamond:
sage: g = graphs.DiamondGraph()
sage: floyd_warshall(g, paths = False, distances = True)
{0: {0: 0, 1: 1, 2: 1, 3: 2},
1: {0: 1, 1: 0, 2: 1, 3: 1},
2: {0: 1, 1: 1, 2: 0, 3: 1},
3: {0: 2, 1: 1, 2: 1, 3: 0}}
TESTS:
Too large graphs:
sage: from sage.graphs.distances_all_pairs import floyd_warshall
sage: floyd_warshall(Graph(65536))
Traceback (most recent call last):
...
ValueError: The graph backend contains more than 65535 nodes
Tests if the graph is distance-regular
A graph \(G\) is distance-regular if for any integers \(j,k\) the value of \(|\{x:d_G(x,u)=j,x\in V(G)\} \cap \{y:d_G(y,v)=k,y\in V(G)\}|\) is constant for any two vertices \(u,v\in V(G)\) at distance \(i\) from each other. In particular \(G\) is regular, of degree \(b_0\) (see below), as one can take \(u=v\).
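Equivalently a graph is distance-regular if there exist integers \(b_i,c_i\) such that for any two vertices \(u,v\) at distance \(i\) we have
\[\begin{split}b_i=|\{x:d_G(x,u)=i+1,x\in V(G)\}\cap N_G(v)|,\quad 0\leq i\leq d-1\\ c_i=|\{x:d_G(x,u)=i-1,x\in V(G)\}\cap N_G(v)|,\quad 1\leq i\leq d\end{split}\]
where \(d\) is the diameter of the graph. For more information on distance-regular graphs, see the associated Wikipedia page.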
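The parameters can be checked by brute force on small graphs. In the sketch below, for two vertices \(u,v\) at distance \(i\), \(b_i\) counts the neighbors of \(v\) at distance \(i+1\) from \(u\) and \(c_i\) counts the neighbors of \(v\) at distance \(i-1\) from \(u\). It is an illustration only: it assumes a connected, non-empty undirected graph given as a dictionary of neighbor sets, and the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def intersection_array(adj):
    """Return ([b_0..b_d], [c_0..c_d]) or None if adj is not distance-regular.

    b_d and c_0 are left as None, since they are not defined.
    """
    dist = {u: bfs_distances(adj, u) for u in adj}
    d = max(max(row.values()) for row in dist.values())
    b = [None] * (d + 1)
    c = [None] * (d + 1)
    for u in adj:
        for v in adj:
            i = dist[u][v]
            for table, defined, target in ((b, i < d, i + 1), (c, i > 0, i - 1)):
                if not defined:
                    continue
                value = sum(1 for x in adj[v] if dist[u][x] == target)
                if table[i] is None:
                    table[i] = value          # first pair seen at distance i
                elif table[i] != value:
                    return None               # the count depends on the pair
    return b, c


def petersen():
    """Petersen graph: outer 5-cycle, spokes, inner pentagram."""
    adj = {v: set() for v in range(10)}
    for i in range(5):
        for a, b in ((i, (i + 1) % 5), (i, i + 5), (i + 5, (i + 2) % 5 + 5)):
            adj[a].add(b)
            adj[b].add(a)
    return adj


petersen_parameters = intersection_array(petersen())
```

A path on three vertices is rejected immediately, because its middle vertex has two neighbors while its endpoints have one, so no single value of \(b_0\) exists.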
INPUT:
See also
EXAMPLES:
sage: g = graphs.PetersenGraph()
sage: g.is_distance_regular()
True
sage: g.is_distance_regular(parameters = True)
([3, 2, None], [None, 1, 1])
Cube graphs, which are not strongly regular, are a bit more interesting:
sage: graphs.CubeGraph(4).is_distance_regular()
True
sage: graphs.OddGraph(5).is_distance_regular()
True
Disconnected graph:
sage: (2*graphs.CubeGraph(4)).is_distance_regular()
True
TESTS:
sage: graphs.PathGraph(2).is_distance_regular(parameters = True)
([1, None], [None, 1])
sage: graphs.Tutte12Cage().is_distance_regular(parameters=True)
([3, 2, 2, 2, 2, 2, None], [None, 1, 1, 1, 1, 1, 3])
Returns the matrix of predecessors in G.
The matrix \(P\) returned has size \(n^2\), and is such that vertex \(P[u,v]\) is a predecessor of \(v\) on a shortest \(uv\)-path. Hence, this matrix efficiently encodes the information of a shortest \(uv\)-path for any \(u,v\in G\) : indeed, to go from \(u\) to \(v\) you should first find a shortest \(uP[u,v]\)-path, then jump from \(P[u,v]\) to \(v\) as it is one of its outneighbors.
The integer corresponding to a vertex is its index in the list G.vertices().
EXAMPLE:
sage: from sage.graphs.distances_all_pairs import shortest_path_all_pairs
sage: g = graphs.PetersenGraph()
sage: shortest_path_all_pairs(g)
{0: {0: None, 1: 0, 2: 1, 3: 4, 4: 0, 5: 0, 6: 1, 7: 5, 8: 5, 9: 4},
1: {0: 1, 1: None, 2: 1, 3: 2, 4: 0, 5: 0, 6: 1, 7: 2, 8: 6, 9: 6},
2: {0: 1, 1: 2, 2: None, 3: 2, 4: 3, 5: 7, 6: 1, 7: 2, 8: 3, 9: 7},
3: {0: 4, 1: 2, 2: 3, 3: None, 4: 3, 5: 8, 6: 8, 7: 2, 8: 3, 9: 4},
4: {0: 4, 1: 0, 2: 3, 3: 4, 4: None, 5: 0, 6: 9, 7: 9, 8: 3, 9: 4},
5: {0: 5, 1: 0, 2: 7, 3: 8, 4: 0, 5: None, 6: 8, 7: 5, 8: 5, 9: 7},
6: {0: 1, 1: 6, 2: 1, 3: 8, 4: 9, 5: 8, 6: None, 7: 9, 8: 6, 9: 6},
7: {0: 5, 1: 2, 2: 7, 3: 2, 4: 9, 5: 7, 6: 9, 7: None, 8: 5, 9: 7},
8: {0: 5, 1: 6, 2: 3, 3: 8, 4: 3, 5: 8, 6: 8, 7: 5, 8: None, 9: 6},
9: {0: 4, 1: 6, 2: 7, 3: 4, 4: 9, 5: 7, 6: 9, 7: 9, 8: 6, 9: None}}
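A shortest path can be read off such a dictionary by walking the predecessors backwards from the target. The sketch below uses the row for vertex 0 copied from the output above; the helper name path_from_predecessors is made up, and the sketch assumes that None marks both the source itself and any unreachable vertex.

```python
def path_from_predecessors(P, u, v):
    """Rebuild a shortest u-v path from the predecessor dictionary P."""
    if u != v and P[u][v] is None:
        return None                    # assumed convention: v is unreachable
    path = [v]
    while path[0] != u:
        # Prepend the predecessor of the current first vertex.
        path.insert(0, P[u][path[0]])
    return path


# Row of vertex 0 in the Petersen graph, taken from the example above.
P = {0: {0: None, 1: 0, 2: 1, 3: 4, 4: 0, 5: 0, 6: 1, 7: 5, 8: 5, 9: 4}}
path_to_7 = path_from_predecessors(P, 0, 7)
```

Here the predecessor of 7 is 5 and the predecessor of 5 is 0, so the path found is 0, 5, 7.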
Returns the Wiener index of the graph.
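The Wiener index of a graph \(G\) can be defined in two equivalent ways [KRG96b] :
- \(W(G) = \frac 1 2 \sum_{u,v\in G} d(u,v)\) where \(d(u,v)\) denotes the distance between vertices \(u\) and \(v\).
- Let \(\Omega\) be a set of \(\frac {n(n-1)} 2\) paths in \(G\) such that \(\Omega\) contains exactly one shortest \(u-v\) path for each set \(\{u,v\}\) of vertices in \(G\). Besides, \(\forall e\in E(G)\), let \(\Omega(e)\) denote the paths from \(\Omega\) containing \(e\). We then have \(W(G) = \sum_{e\in E(G)}|\Omega(e)|\).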
EXAMPLE:
From [GYLL93c], cited in [KRG96b]:
sage: g=graphs.PathGraph(10)
sage: w=lambda x: (x*(x*x -1)/6)
sage: g.wiener_index()==w(10)
True
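The same check can be reproduced without Sage, taking the Wiener index to be the sum of the distances over all unordered pairs of vertices (a standard definition). The sketch below assumes a connected undirected graph given as a dictionary of neighbor lists; the function names are made up for the illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def wiener_index_sketch(adj):
    """Half the sum of d(u, v) over all ordered pairs (u, v)."""
    total = sum(sum(bfs_distances(adj, s).values()) for s in adj)
    return total // 2                  # each unordered pair was counted twice


def path_graph(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


w10 = wiener_index_sketch(path_graph(10))
```

For the path on 10 vertices the closed form \(n(n^2-1)/6\) gives \(10\cdot 99/6=165\), which is the value the sketch computes.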