Generating inputs for random-testing graph algorithms?

When testing algorithms, a common approach is random testing: generate a significant number of inputs according to some distribution (usually uniform), run the algorithm on them and verify correctness. Modern testing frameworks can generate inputs automatically given the algorithms signature, with some restrictions.

If the inputs are numbers, lists or strings, generating such inputs in straight-forward. Trees are harder, but still easy (using stochastic context-free grammars or similar approaches).

How can you generate random graphs (efficiently)? Usually, picking graphs uniformly at random is not what you want: they should be connected, or planar, or cycle-free, or fulfill any other property. Rejection sampling seems suboptimal, due to the potentially huge set of undesirable graphs.

What are useful distributions to look at? Useful here means that

  • the graphs are likely to test the algorithm at hand well and
  • they can be generated effectively and efficiently.

I know that there are many models for random graphs, so I’d appreciate some insight into which are best for graph generation in this regard.

If “some algorithm” is too general, please use shortest-path finding algorithms as a concrete class of algorithms under test. Graphs for testing should be connected and rather dense (with high probability, or at least in expectation). For testing, the optimal solution would be to create random graphs around a shortest path so we know the desired result (without having to employ another algorithm).


Random graphs with small world topology

In graphs with small world topology, nodes are highly clustered yet the path length between them is small. A topology like this can make search problems very difficult, since local decisions quickly propagate globally. In other words, shortcuts can mislead heuristics. Further is has been shown that many different search problems have a small world topology.

Watts and Strogatz [1] propose a model for small world graphs. First, we start with a regular graph. Disorder is introduced into the graph by randomly rewiring each edge with probability $p$. If $p=0$, the graph is completely regular and ordered. If $p=1$, the graph is completely random and disordered. Values of $0 < p < 1$ produce graphs that are neither completely regular nor completely disordered. Graphs don’t have a small world topology for $p=0$ and $p=1$.

Watts and Strogatz start from a ring lattice with $n$ nodes and $k$ nearest neighbours. A node is chosen from the lattice uniformly at random, and a rewired edge is reconnected to it. If rewiring would create a duplicate edge, it is left untouched. For large, sparse graphs they demand $n \gg k \gg \ln(n) \gg 1$, where $k \gg \ln(n)$ ensures the graph remains connected.

The model of Watts and Strogatz is somewhat popular, but does have certain drawbacks. Walsh [2] investigates the effects of randomization and restart strategies in graphs generated using the model. There’s also a paper by Virtanen [3], which covers other models motivated by the need of realistic modeling of complex systems.

Random simple planar graphs

Generating random simple planar graphs on $n$ vertices uniformly at random can be done efficiently. The number of planar graphs with $n$ vertices, $g_n$, can be determined using generating functions. The value of $g_n$ for $1 \leq n \leq 9$ is $1,2,8,64,1023,32071,1823707,163947848$ and $20402420291$, respectively. Since the numbers are too complicated, one is not expected to find a closed formula for them. Giménez and Noy [4] give a precise asymptotic estimate for the growth of $g_n$:
$$g_n \sim g \cdot n^{-7/2} \gamma^n n!,$$
where $g$ and $\gamma$ are constants determined analytically with approximate values $g \approx 0.42609$ and $\gamma \approx 27.22687$.

The proof of the result leads to a very efficient algorithm by Fusy [5]. Fusy gives an approximate size random generator and also a exact size random generator of planar graphs. The approximate size algorithm runs in linear time while the exact size algorithm runs in quadratic time. The algorithms are based on decomposition according to successive levels of connectivity: planar graph $\rightarrow$ connected $\rightarrow$ 2-connected $\rightarrow$ 3-connected $\rightarrow$ binary tree.

The algorithms then operate by translating a decomposition of a planar graph into a random generator using the framework of Boltzmann samplers by Duchon, Flajolet, Louchard and Schaeffer [6]. Given a combinatorial class, a Boltzmann sampler draws an object of size $n$ with probability to $x^n$, where $x$ is certain real parameter tuned by the user. Furthermore, the probability distribution is spread over all the objects of the class, with the property that objects of the same size have the same probability of occurring. Also, the probability distribution is uniform when restricted to a fixed size.

For a lightweight introduction, see a presentation by Fusy.

[1] D.J. Watts and S.H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393:440-442, 1998.

[2] Toby Walsh. Search in a small world. Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99-Vol2), pages 1172-1177, 1999.

[3] Satu Virtanen. Properties of nonuniform random graph models. Research Report A77, Helsinki University of Technology, Laboratory for Theoretical Computer Science, 2003.

[4] O. Giménez and M. Noy. Asymptotic enumeration and limit laws of planar graphs, arXiv
math.CO/0501269. An extended abstract has appeared in Discrete Mathematics and The-
oretical Computer Science AD (2005), 147-156

[5] E. Fusy. Quadratic and linear time generation of planar graphs, Discrete Mathematics and Theoretical Computer Science AD (2005), 125-138.

[6] P. Duchon, P. Flajolet, G. Louchard, and G. Schaeffer. Boltzmann sampler for the random generation of combinatorial structures. Combinatorics, Probability and Computing, 13(4-5):577-625, 2004.

Source : Link , Question Author : Raphael , Answer Author : Juho

Leave a Comment