Quid allows users to derive data driven insights from extensive, unstructured data sets. Not only do we collect and analyze the world’s unstructured data, but we also present this data to customers in a way that instigates insight and inspiration, and ultimately adds value to their business.
On the engineering side, we’ve been faced with the challenge of visualizing information organized into networks in a meaningful and efficient way. The networks consist of nodes, or vertices, and links between them, also called edges. The vertices stand for real-world entities, and edges represent a variety of relationships between them. This structured illumination helps us understand a given data set faster and more fully.
While network generation takes place on Quid’s infrastructure, the interactive visualization runs entirely in your web browser. This means displaying and manipulating thousands of vertices and edges in an environment not known for speed. Quid engineers are always looking for ways to up the amount of data on display, while keeping a responsive frame rate; optimizing the network visualization is one of the more satisfying improvements we have made recently. Read on to find out how we did that!
Providing Structure to the Unstructured
Some networks are easier to draw than others. Spiders, for example, weave nets that are both manifestly planar and fairly regular. The nodes and links all lie in one plane, and the spiders’ artistic finesse is universally recognizable.
Networks of real-world information are just like that, but a bit messier. Connections tend to be more complicated and less regular, and we do not even know the visual shape of the network ahead of time. What we do know, roughly, is which vertices go together and which don’t, but that is just a start to visualizing the network.
To visualize these relations, we arrange vertices and edges using a physics simulation that mimics the stretching of spider webs. Our spider-web-inspired representation models interactions between each pair of vertices as a Coulomb-like repulsion and an additional Hooke-like attraction in the presence of an edge between the pair. A weak gravitation-like force is used to prevent separate components and isolated vertices from venturing too far from the network’s center of mass.
Using this physics-based network layout has turned out to be a fruitful approach. For one, the layout looks remarkably natural. Equally important, the hierarchy of a network’s structure is readily apparent; both small and large network structures are exposed, which allows us to study relationships between groups of vertices on different scales.
Physics Simulations Aren’t Cheap
Running a physics simulation in a user’s browser isn’t easy, as the simulation is inherently resource-intensive. Much can be said about optimizing the performance of individual simulation time steps, but we want to focus on a less obvious way the size of data impacts simulation performance.
Empirically, we found that the number of time steps necessary to achieve equilibrium starting with a random configuration of vertices scales linearly with the number of vertices. That is undesirable for presentation purposes. The natural question to ask is whether it is possible to arrange for initial conditions of the simulation so the equilibrium is attained faster. This is where thinking about energy landscapes comes in handy.
A physics-based animation of an n-body system can be thought of as a visualization of gradient descent optimization. Just as an optimization routine iteratively computes the gradient of an objective function in some parameter space and “descends” in that direction, we compute the gradient of the system’s energy (i.e. the force), integrate that to compute momentum, and move the particles accordingly. Unfortunately, starting with random initial conditions sets the simulation up for a journey through the energy landscape which is really, really hilly.
A natural improvement over the random initial conditions would be to seed the simulation with a vertex configuration that is in the vicinity of the final destination. We could try a discretized version of the problem and search through all vertex configurations on a 2D grid. This process is still of combinatorial complexity and generally too expensive. Fortunately, we can use a clever trick to simplify the search space to one dimension.
It turns out that it’s possible to cover a 2D region of space with a one-dimensional curve. Such space-filling curves are usually constructed via an iterative process, whereby at each step of the iteration the curve is refined at ever-finer scales. By hijacking this process at a finite step, we can obtain a curve with just enough points to accommodate our data.
In addition, most space filling curves enjoy a rather remarkable property: the 2D distance between any pair of vertices is well-approximated by (the square root of) the distance along the curve.
Altogether, this makes the curves very useful for our layout purposes. With a choice of a closed space-filling curve, the problem of finding an approximate 2D layout is equivalent to finding an energy-optimal linear ordering of vertices. This, in turn, allows for developing simple and effective heuristics, eventually circumventing the initial combinatorial complexity.
To illustrate the merit of this approach, let us start a simulation of the same network in two different ways. First, with vertices ordered and laid out along a space-filling curve, and second, randomly arranging the initial positions of all vertices. Observe how the time taken by the space-filing curve initial conditions to evolve to near-stationary solution is barely enough for the random initial conditions to assemble local coherence! The sooner we get users digging into the data at superhuman scale, the better!
From an engineer’s standpoint, the journey to analyze and visualize the world’s data doesn’t end here. We have only scratched the surface.
Interested in helping us solve awesome problems? If so, then head over to our careers page!