Zoom in/out or pan around the chart using the controls in the upper-right corner. Hover over a data point to identify the corresponding airport. Click on an airport in the legend to toggle it on/off. The size of each point represents the degree of the airport (the number of airports it is connected to).

I’ve done a lot of experimentation into determining the most efficient way to generate network graphs for visualizing relationships between groups.

Among popular network graph visualization tools for the web, Gephi is a pain to work with and the workflows are not reproducible, Sigma.js is better but trickier to configure, and my stop-gap method of exporting network graphs as a PDF has issues on mobile devices.

I’ve had success with Plotly for data visualizations, particularly its use for automatically converting R and ggplot2 charts into interactive charts. There are a few official tutorials for Plotly network graphs using network-related R packages, but they are not straightforward, and even if I adapted the tutorial for other datasets, the resulting chart would be difficult to customize for maximum usability.

Recently, I had an incredibly stupid idea to try and combine Plotly and the customization of ggplot2 using several layers of R packages to make effective interactive network graphs with very little code. Normally, I would not call this ONE WEIRD TRICK because that is clickbait, but this is indeed a weird trick that shouldn’t even work. But it does!

R and Plotly

For this example of a network graph, we will use the flight paths of domestic airline flights from New York City airports in 2013, provided by the nycflights13 R package by Hadley Wickham (usually, tutorials on working with networks use the character network map from Les Misérables, so let’s mix it up a bit).

First, we install and load a few R packages:

library(dplyr)
library(nycflights13)
library(igraph)
library(intergraph)
library(sna)
library(ggplot2)
library(ggnetwork)
library(plotly)
  • dplyr helps aggregate and manipulate tabular data.
  • nycflights13 contains the data.
  • igraph allows the creation of network graphs, with sna and intergraph providing compatibility with other packages.
  • ggplot2 for constructing the chart, with ggnetwork by François Briatte having special interactions with ggplot2 for graphing network graphs specifically. (note: at time-of-writing, ggnetwork must be installed after ggplot2 from source).
  • plotly for converting the ggplot2 chart into the final interactive graph.

First, let’s look at the included flights dataset with nycflights13. The dataset contains 336,776 total flights, with information such as date/time departed, the carrier, and most importantly, the airport where the plane leaves, and the airport where the plane arrives. There are only three airports where the plane leaves from in this dataset: Newark Liberty International Airport (EWR), John F. Kennedy International Airport (JFK), and LaGuardia Airport (LGA). All three airports have approximately the same number of outbound flights.

We want to aggregate the counts of pairs of (origin, destination): the pairs will serve as the edges of our network graph, and the airports will serve as the vertices. The counts of pairs (i.e. number of flights from X to Y) will serve as the weight of the edge during the graph layout phase; nodes with a higher edge weight between them have a greater “pull.” This aggregation takes just one line of dplyr code.

df_edges <- flights %>% group_by(origin, dest) %>% summarize(weight = n())
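
To make the shape of the aggregated table concrete, here is a minimal sketch on a tiny made-up table rather than the real flights data. The toy_flights rows are hypothetical, and count() is used as shorthand for the group_by()/summarize() pair; unlike that pair, it returns an ungrouped table.

```r
library(dplyr)

# Hypothetical stand-in for nycflights13::flights: one row per flight.
toy_flights <- data.frame(
  origin = c("JFK", "JFK", "JFK", "LGA", "LGA", "EWR"),
  dest   = c("LAX", "LAX", "ORD", "ORD", "ORD", "LAX"),
  stringsAsFactors = FALSE
)

# One row per (origin, dest) pair; `weight` is the number of flights on it.
toy_edges <- toy_flights %>%
  count(origin, dest, name = "weight")

print(toy_edges)
```

Each row of toy_edges is one edge of the future graph, and the weights add up to the number of flights in the input.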

Using igraph, we create a network graph object from the aggregated dataset. In this context, the graph is directed, since logically each flight is in one direction.

net <- graph_from_data_frame(df_edges, directed = TRUE)

As noted by the numbers at the top of the output, there are 107 vertices/unique airports, and 224 unique edges.
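
If you would rather check those counts programmatically than read them off the printed summary, something like the sketch below should work. It uses a small hypothetical edge table (toy_edges) instead of the real data, and graph_from_data_frame(), which is igraph's current name for graph.data.frame().

```r
library(igraph)

# Hypothetical edge table: the first two columns are the endpoints,
# and any remaining columns become edge attributes.
toy_edges <- data.frame(
  origin = c("JFK", "JFK", "LGA", "EWR"),
  dest   = c("LAX", "ORD", "ORD", "LAX"),
  weight = c(2, 1, 2, 1),
  stringsAsFactors = FALSE
)

toy_net <- graph_from_data_frame(toy_edges, directed = TRUE)

n_vertices   <- vcount(toy_net)     # number of unique airports
n_edges      <- ecount(toy_net)     # number of unique (origin, dest) pairs
edge_weights <- E(toy_net)$weight   # the weight column carried over as an edge attribute

cat(n_vertices, "vertices,", n_edges, "edges\n")
```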

We may want to size the vertices of each of the airports in the final chart. We can set that in igraph as well by calculating the degree of each vertex, so nodes with more connected airports will be larger.

V(net)$degree <- centralization.degree(net)$res
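
As a rough illustration of what the degree means here, the sketch below computes it on a hypothetical five-airport graph with igraph's degree() function. The call is namespace-qualified because sna, which is loaded after igraph in this post, also exports a function called degree(). With the default settings, the values should match what the line above stores, but treat that as an assumption to verify on your own igraph version.

```r
library(igraph)

# Hypothetical toy graph: three NYC origins, two destinations.
toy_net <- graph_from_data_frame(
  data.frame(
    origin = c("JFK", "JFK", "LGA", "EWR"),
    dest   = c("LAX", "ORD", "ORD", "LAX"),
    stringsAsFactors = FALSE
  ),
  directed = TRUE
)

# mode = "all" counts incoming plus outgoing edges; "in" counts only arrivals.
deg_all <- igraph::degree(toy_net, mode = "all")
deg_in  <- igraph::degree(toy_net, mode = "in")

# Store the total degree as a vertex attribute for later use in the plot.
V(toy_net)$degree <- deg_all

print(deg_all)
```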

Now we can convert the network to a ggplot2 friendly format. ggnetwork both calculates x/y coordinates for the nodes of the network graph according to the Fruchterman-Reingold force-directed layout algorithm (so that nodes connected by higher weights will be closer), and also formats the data into a ggplot2-friendly tabular format. 5,000 iterations of the algorithm on a smaller graph is enough for a quick demo, but you should use as many iterations as possible.

df_net <- ggnetwork(net, layout = "fruchtermanreingold", weights = "weight", niter = 5000)
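
To see what a force-directed layout actually hands back, here is a small sketch using igraph's own Fruchterman-Reingold implementation, layout_with_fr(). This is a stand-in for illustration only: ggnetwork computes its layout through a different package, so the coordinates will not match, and the toy graph is hypothetical.

```r
library(igraph)

# Hypothetical toy graph with edge weights.
toy_net <- graph_from_data_frame(
  data.frame(
    origin = c("JFK", "JFK", "LGA", "EWR"),
    dest   = c("LAX", "ORD", "ORD", "LAX"),
    weight = c(2, 1, 2, 1),
    stringsAsFactors = FALSE
  ),
  directed = TRUE
)

# The layout starts from random positions, so fix the seed for repeatability.
set.seed(42)
coords <- layout_with_fr(toy_net, niter = 5000, weights = E(toy_net)$weight)

# One row per vertex with its x/y position.
toy_layout <- data.frame(
  name = V(toy_net)$name,
  x    = coords[, 1],
  y    = coords[, 2]
)

print(toy_layout)
```

The output is simply one (x, y) pair per airport, which is the raw material for the tabular format that ggplot2 consumes.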

Now we can plot the graph in R. The base aesthetics for the ggplot are the x/y coordinates of the points, and the xend/yend-points of the edge line segments. ggnetwork exposes a geom_edges() which plots the edges of a network graph, and geom_nodes() which plots the nodes. Both of these functions work by extending normal ggplot2 functions: the edges are actually ggplot2 lines (via geom_segment()), while the nodes are ggplot2 points (via geom_point()).

Style-wise, for the edges, we should apply a low opacity so we can see where they overlap. For the nodes, we can size the nodes depending on their degree, and also set the text to the airport name for use later. We should include a plot title as always, and use the theme_blank() ggplot2 theme included with ggnetwork such that the visualization does not use ggplot2’s default theme.

plot <- ggplot(df_net, aes(x = x, y = y, xend = xend, yend = yend)) +
    geom_edges(size = 0.4, alpha = 0.25) +
    geom_nodes(aes(size = degree, text = vertex.names)) +
    ggtitle("Network Graph of U.S. Flights Outbound from NYC in 2013") +
    theme_blank()
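
Since the edges are segments and the nodes are points, the same kind of picture can be sketched with plain ggplot2 layers. The example below is not the ggnetwork code path; it uses hand-written, hypothetical coordinates and degrees, and theme_void() as a stand-in for theme_blank().

```r
library(ggplot2)

# Hypothetical node table: positions and degrees are made up.
toy_nodes <- data.frame(
  name   = c("JFK", "LGA", "EWR", "LAX", "ORD"),
  x      = c(0, 1, 2, 0.5, 1.5),
  y      = c(1, 1, 1, 0, 0),
  degree = c(2, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical edge list, then look up start and end coordinates per edge.
toy_edges <- data.frame(
  from = c("JFK", "JFK", "LGA", "EWR"),
  to   = c("LAX", "ORD", "ORD", "LAX"),
  stringsAsFactors = FALSE
)
from_idx <- match(toy_edges$from, toy_nodes$name)
to_idx   <- match(toy_edges$to, toy_nodes$name)
toy_edges$x    <- toy_nodes$x[from_idx]
toy_edges$y    <- toy_nodes$y[from_idx]
toy_edges$xend <- toy_nodes$x[to_idx]
toy_edges$yend <- toy_nodes$y[to_idx]

# Edges as low-opacity segments, nodes as points sized by degree.
toy_plot <- ggplot() +
  geom_segment(data = toy_edges,
               aes(x = x, y = y, xend = xend, yend = yend),
               alpha = 0.25) +
  geom_point(data = toy_nodes, aes(x = x, y = y, size = degree)) +
  ggtitle("Toy network drawn with plain ggplot2 layers") +
  theme_void()
```

Printing toy_plot draws the chart. The convenience of ggnetwork is that it builds this coordinate table for you from the graph object.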

Putting it all together:

Nifty! So how does this become interactive? Easily.

One Weird Trick

The plotly R package has a ggplotly() function which converts a ggplot2 chart to an interactive Plotly chart, and the resulting parity to the static chart is surprisingly good.

Plotly also has a toWebGL() function, which converts a Plotly chart to its WebGL equivalent, allowing the system GPU to power crazy data visualizations with tons of points (my Interactive Clickbait chart using this WebGL interface has about 10,000 points and can be manipulated without slowdown). The Plotly documentation is sparse on the limitations of its WebGL support, but examining the examples indicates that WebGL support is limited to only points and straight lines; no polygons or curves.

Hey, isn’t a network graph just points and straight lines?

So let’s chain the plot into ggplotly() while setting the hover text to the airport names, and chain that to the WebGL functionality:

plot %>% ggplotly(tooltip = "text") %>% toWebGL()

You can now mouse over the points to see the corresponding airport. Not too shabby!
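
If you want to try the same chain without building the whole network first, here is a minimal sketch on a hypothetical three-point chart. The only pieces taken from the post are the unofficial text aesthetic, ggplotly(tooltip = "text"), and toWebGL(); the data and object names are made up.

```r
library(ggplot2)
library(plotly)

# Hypothetical points with a label to show on hover.
toy_nodes <- data.frame(
  name = c("JFK", "LGA", "EWR"),
  x    = c(0, 1, 2),
  y    = c(1, 0, 1),
  stringsAsFactors = FALSE
)

# `text` is not a real ggplot2 aesthetic, so ggplot2 may warn that it is
# ignored; ggplotly() still picks it up for the tooltip.
toy_plot <- ggplot(toy_nodes, aes(x = x, y = y)) +
  geom_point(aes(text = name))

# Convert to an interactive Plotly chart, then request WebGL rendering.
toy_widget <- toy_plot %>%
  ggplotly(tooltip = "text") %>%
  toWebGL()
```

Printing toy_widget in an interactive session opens the chart in the viewer or browser.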

When generating the interactive charts in R, a browser window will pop up with the chart, and the chart contains an option to save it to a Plotly account. This is the most practical way to embed the chart in a blog service such as WordPress or Medium. (In my case, I embed the charts directly into the blog post as static data so they will always be accessible.)
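
One possible way to get a standalone file for embedding is htmlwidgets::saveWidget(). This is an assumption for illustration, not necessarily how the charts in this post were exported; the chart and file name below are hypothetical.

```r
library(plotly)

# Hypothetical chart standing in for the network graph.
toy_widget <- plot_ly(
  x = c(0, 1, 2),
  y = c(1, 0, 1),
  type = "scatter",
  mode = "markers"
)

# Write the chart to an HTML file. selfcontained = FALSE puts the JavaScript
# dependencies in a sibling folder; selfcontained = TRUE bundles everything
# into a single file but requires pandoc to be installed.
out_file <- file.path(tempdir(), "toy_network.html")
htmlwidgets::saveWidget(toy_widget, out_file, selfcontained = FALSE)
```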

With a little more data manipulation, we can integrate more metadata into the network (not shown; however, all code is available in this R Notebook). The nycflights13 package also contains an airports dataset with the full name of the airports and their location in latitude/longitude coordinates. We can map that data into the corresponding vertices, and use some of it for more informative tooltip popovers, such as including the total number of flights to/from each airport. Additionally, we can set the colors of edges to correspond to the NYC origin airport. The algorithm is also set to 50,000 iterations for even further convergence.

The result is the network graph you saw at the beginning of the post.
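
The notebook has the real code, but the general idea of attaching metadata to vertices and edges can be sketched as follows. Everything here is a hypothetical stand-in: toy_airports mimics the faa and name columns of the airports dataset with shortened names, and the attribute names full_name, flights, and origin are my own choices.

```r
library(igraph)

# Hypothetical weighted toy graph.
toy_net <- graph_from_data_frame(
  data.frame(
    origin = c("JFK", "JFK", "LGA", "EWR"),
    dest   = c("LAX", "ORD", "ORD", "LAX"),
    weight = c(2, 1, 2, 1),
    stringsAsFactors = FALSE
  ),
  directed = TRUE
)

# Hypothetical lookup table: airport code -> display name.
toy_airports <- data.frame(
  faa  = c("EWR", "JFK", "LGA", "LAX", "ORD"),
  name = c("Newark", "Kennedy", "LaGuardia", "Los Angeles", "Chicago O'Hare"),
  stringsAsFactors = FALSE
)

# Map each vertex (named by airport code) to its display name.
idx <- match(V(toy_net)$name, toy_airports$faa)
V(toy_net)$full_name <- toy_airports$name[idx]

# Total flights to/from each airport: sum of edge weights in both directions.
V(toy_net)$flights <- strength(toy_net, mode = "all", weights = E(toy_net)$weight)

# Tag each edge with its origin airport so it can drive the edge color.
E(toy_net)$origin <- ends(toy_net, E(toy_net))[, 1]
```

These attributes then travel with the graph into the plotting data, where they can be mapped to the tooltip text and the edge color.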

Looking at this network graph, you can clearly see that some airports receive flights from all 3 airports, some from only 2 airports, and some from only 1 airport. I admit I am not an expert in the inner workings of the airline industry, but what would cause this?

Sea to Shining Sea

Another approach is to map the airports by their physical location. This is easy to do on the ggplot2-side since the latitude/longitudes were added to the data set for the previous chart, so all we have to do is tell ggplot2 to use those instead, where the base aesthetics of the ggplot statement now become:

ggplot(df_net, aes(x = lon, y = lat, xend = endlon, yend = endlat))

The result is a highly-abstract depiction of the United States. (+ Alaska and Hawaii)
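
For readers who want to experiment with the geographic version without the full pipeline, here is a minimal sketch that builds the lon/lat/endlon/endlat columns with two joins on a plain edge table. It is an assumption about one workable approach, not the notebook's code, and the coordinates are rounded approximations for illustration.

```r
library(dplyr)
library(ggplot2)

# Approximate, rounded coordinates for five airports (illustration only).
toy_coords <- data.frame(
  faa = c("JFK", "LGA", "EWR", "LAX", "ORD"),
  lon = c(-73.78, -73.87, -74.17, -118.41, -87.90),
  lat = c(40.64, 40.78, 40.69, 33.94, 41.98),
  stringsAsFactors = FALSE
)

# Hypothetical edge list.
toy_edges <- data.frame(
  origin = c("JFK", "JFK", "LGA", "EWR"),
  dest   = c("LAX", "ORD", "ORD", "LAX"),
  stringsAsFactors = FALSE
)

# First join adds the origin's lon/lat; second adds the destination's
# coordinates under the names endlon/endlat.
toy_geo <- toy_edges %>%
  left_join(toy_coords, by = c("origin" = "faa")) %>%
  left_join(rename(toy_coords, endlon = lon, endlat = lat),
            by = c("dest" = "faa"))

# Same base aesthetics as in the post, drawn with plain ggplot2 layers.
toy_map <- ggplot(toy_geo, aes(x = lon, y = lat, xend = endlon, yend = endlat)) +
  geom_segment(alpha = 0.25) +
  geom_point()
```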

Again, one line of code makes it interactive.

This is an instance where Plotly’s native zooming and filtering are very helpful! For example, if you filter solely on flights from LGA/LaGuardia, you will see that the airport never shuttles flights to the western United States.

However, there are no obvious trends outside of that observation. The 1-degree airports and the 2-degree airports are scattered around the country with no apparent pattern. Oh well.

I should mention that since there is a relatively small amount of data, these examples do not benefit much from rendering with WebGL instead of Plotly’s native d3/SVG. And even then, there are still further optimizations that can be done; after finishing the code write-up, I discovered ggraph, which is more actively developed than ggnetwork, although ggraph does not interact as well with Plotly.

This creation workflow is pragmatic enough that I can include these interactive network graphs in more blog posts. At the least, this approach is a good proof-of-concept for some very crazy data visualizations I have planned.


You can view all the R and ggplot2 code used to visualize the data in this R Notebook. You can also view the images/data used for this post in this GitHub repository.

You are free to use the data visualizations from this article however you wish, but it would be greatly appreciated if proper attribution is given to this article and/or myself!

Author

Max Woolf (@minimaxir) is currently a data scientist at BuzzFeed in San Francisco. He is also an ex-Apple employee and Carnegie Mellon University graduate.

In his spare time, Max uses Python to gather data from public APIs and ggplot2 to plot plenty of pretty charts from that data. On special occasions, he uses Keras for fancy deep learning projects.
