I’ve experimented a lot to determine the most efficient way to generate network graphs that visualize relationships between groups.

Among popular network graph visualization tools for the web, Gephi is a pain to work with and the workflows are not reproducible, Sigma.js is better but trickier to configure, and my stop-gap method of exporting network graphs as a PDF has issues on mobile devices.

I’ve had success with Plotly for data visualizations, particularly its use for automatically converting R and ggplot2 charts into interactive charts. There are a few official tutorials for Plotly network graphs using network-related R packages, but they are not straightforward, and even if I adapted the tutorial for other datasets, the resulting chart would be difficult to customize for maximum usability.

Recently, I had an incredibly stupid idea to try and *combine* Plotly and the customization of ggplot2 using several layers of R packages to make effective interactive network graphs with very little code. Normally, I would not call this **ONE WEIRD TRICK** because that is clickbait, but this *is* indeed a weird trick that shouldn’t even work. But it does!

## R and Plotly

For this example of a network graph, we will use the flight paths of domestic airline flights from New York City airports in 2013, provided by the `nycflights13` R package by Hadley Wickham (tutorials on working with networks usually use the character network map from *Les Misérables*, so let’s mix it up a bit).

First, we load a few R packages (install any that are missing with `install.packages()` first):

```
library(dplyr)
library(nycflights13)
library(igraph)
library(intergraph)
library(sna)
library(ggplot2)
library(ggnetwork)
library(plotly)
```

- `dplyr` helps aggregate and manipulate tabular data.
- `nycflights13` contains the data.
- `igraph` allows the creation of network graphs, with `sna` and `intergraph` providing compatibility with other packages.
- `ggplot2` constructs the chart, with `ggnetwork` by François Briatte providing special integration with ggplot2 for plotting network graphs specifically. (Note: at time of writing, ggnetwork must be installed from source after ggplot2.)
- `plotly` converts the ggplot2 chart into the final interactive graph.

Next, let’s look at the `flights` dataset included with nycflights13. The dataset contains 336,776 total flights, with information such as the date/time of departure, the carrier, and most importantly, the airport where the plane departs and the airport where it arrives. There are only three origin airports in this dataset: Newark Liberty International Airport (EWR), John F. Kennedy International Airport (JFK), and LaGuardia Airport (LGA). All three airports have approximately the same number of outbound flights.

We want to aggregate the counts of pairs of (origin, destination): the pairs will serve as the *edges* of our network graph, and the airports will serve as the vertices. The counts of pairs (i.e. number of flights from X to Y) will serve as the *weight* of the edge during the graph layout phase; nodes with a higher edge weight between them have a greater “pull.” This aggregation takes just one line of `dplyr` code.

```
# One row per (origin, dest) pair; weight = number of flights on that route
df_edges <- flights %>% group_by(origin, dest) %>% summarize(weight = n(), .groups = "drop")
```
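To make the shape of `df_edges` concrete, here is a base-R sketch of the same aggregation on a tiny, made-up stand-in for `flights` (the `toy_flights` rows are hypothetical). It uses `aggregate()` instead of dplyr, so it runs without extra packages.

```r
# Hypothetical stand-in for nycflights13::flights: one row per flight
toy_flights <- data.frame(
  origin = c("EWR", "EWR", "JFK", "JFK", "JFK", "LGA"),
  dest   = c("ATL", "ATL", "ATL", "LAX", "LAX", "ATL"),
  stringsAsFactors = FALSE
)

# Give every flight a count of 1, then sum the counts per (origin, dest) pair.
# The resulting weight column plays the role of the edge weight in df_edges.
toy_flights$weight <- 1
toy_edges <- aggregate(weight ~ origin + dest, data = toy_flights, FUN = sum)
toy_edges
```

Each output row is one edge; the six toy flights collapse into four weighted edges.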

Using `igraph`, we create a network graph object from the aggregated dataset. In this context, the graph is *directed*, since each flight goes in one direction.

```
# The first two columns (origin, dest) define the edges; weight becomes an edge attribute
net <- graph_from_data_frame(df_edges, directed = TRUE)
```

Printing `net` shows a summary line at the top of the output: there are 107 vertices (unique airports) and 224 unique edges.
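Those two numbers can also be derived straight from the edge list (igraph's `vcount()` and `ecount()` report them for the graph object). A minimal base-R sketch on a hypothetical edge list: the vertices are the unique airport codes on either end of an edge, and because the aggregation leaves exactly one row per (origin, dest) pair, the edge count is simply the number of rows.

```r
# Hypothetical aggregated edge list in the same shape as df_edges
edges <- data.frame(
  origin = c("EWR", "JFK", "JFK", "LGA"),
  dest   = c("ATL", "ATL", "LAX", "ATL"),
  weight = c(2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Vertices: every airport that appears as an origin or a destination
n_vertices <- length(unique(c(edges$origin, edges$dest)))
# Edges: one row per unique (origin, dest) pair after aggregation
n_edges <- nrow(edges)
```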

We may want to size each airport’s vertex in the final chart. We can set that in igraph as well by calculating the degree of each vertex, so that airports connected to more airports will be larger.

```
# Total degree (in + out) of each vertex, stored as a vertex attribute
V(net)$degree <- centr_degree(net)$res
```
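A rough base-R sketch of what that degree means for this graph, assuming the degree function’s default of counting both directions (in-degree plus out-degree) and no self-loops: each airport’s degree is the number of edges it sits on, regardless of how many flights each edge carries. The edge list below is hypothetical.

```r
# Hypothetical aggregated edge list in the same shape as df_edges
edges <- data.frame(
  origin = c("EWR", "JFK", "LGA", "JFK"),
  dest   = c("ATL", "ATL", "ATL", "LAX"),
  weight = c(2, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Total degree = in-degree + out-degree: count how often each airport appears
# as either endpoint of an edge. Edge weights are deliberately ignored.
deg_table <- table(c(edges$origin, edges$dest))
airport_degree <- setNames(as.integer(deg_table), names(deg_table))
airport_degree
```

Here ATL, which receives flights from all three origins, gets the largest degree and would be drawn as the biggest node.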

Now we can convert the network to a `ggplot2`-friendly format. `ggnetwork` both calculates x/y coordinates for the nodes of the network graph according to the Fruchterman-Reingold force-directed layout algorithm (so that nodes connected by higher weights will be closer) and formats the data into a ggplot2-friendly tabular format. 5,000 iterations of the algorithm is enough for a quick demo on a graph this small, but in general you should use as many iterations as is practical.

```
# Lay out the graph with Fruchterman-Reingold, using the flight counts as edge weights
df_net <- ggnetwork(net, layout = "fruchtermanreingold", weights = "weight", niter = 5000)
```
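For intuition about what the layout step does, here is a from-scratch sketch of a weighted Fruchterman-Reingold layout in base R. It is *not* the sna implementation that ggnetwork calls: the function name `fr_layout()`, the starting frame, the linear cooling schedule, and treating the directed graph as undirected are all my assumptions. Every pair of nodes repels with force k^2/d, every edge attracts its endpoints with force w·d^2/k (so heavier edges pull harder), and each node’s move per iteration is capped by a “temperature” that shrinks toward zero.

```r
# Illustrative weighted Fruchterman-Reingold layout (sketch, not sna's version).
# adj: n x n matrix of edge weights (0 = no edge); returns an n x 2 matrix of x/y.
fr_layout <- function(adj, niter = 500, seed = 1) {
  set.seed(seed)
  n <- nrow(adj)
  w <- adj + t(adj)                     # ignore edge direction for the layout
  k <- sqrt(n)                          # ideal distance for a frame of area n^2
  pos <- matrix(runif(2 * n, -n / 2, n / 2), ncol = 2)
  temp0 <- n / 10                       # starting cap on per-step movement
  for (iter in seq_len(niter)) {
    temp <- temp0 * (1 - (iter - 1) / niter)   # linear cooling
    disp <- matrix(0, n, 2)
    for (i in seq_len(n)) {
      # Vectors pointing from every node j to node i
      delta <- matrix(pos[i, ], n, 2, byrow = TRUE) - pos
      dist <- pmax(sqrt(rowSums(delta^2)), 1e-9)
      # Positive = repulsion (k^2/d), negative = weighted attraction (w * d^2/k).
      # The row for i itself has delta = 0, so it contributes nothing.
      mag <- k^2 / dist - w[i, ] * dist^2 / k
      disp[i, ] <- colSums(delta / dist * mag)
    }
    len <- pmax(sqrt(rowSums(disp^2)), 1e-9)
    pos <- pos + disp / len * pmin(len, temp)   # move, capped by temperature
  }
  pos
}

# Usage: a hypothetical four-node path graph a-b-c-d
adj <- matrix(0, 4, 4)
adj[1, 2] <- 1
adj[2, 3] <- 1
adj[3, 4] <- 1
xy <- fr_layout(adj, niter = 200)
```

With real flight counts as weights, the attraction term can dwarf the repulsion; the temperature cap keeps the sketch stable, but rescaling the weights (for example, dividing by their maximum) is a reasonable design choice.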

Now we can plot the graph in R. The base aesthetics for the ggplot are the x/y coordinates of the points and the xend/yend endpoints of the edge line segments. ggnetwork exposes `geom_edges()`, which plots the edges of a network graph, and `geom_nodes()`, which plots the nodes. Both functions extend normal ggplot2 geoms: the edges are actually ggplot2 line segments (via `geom_segment()`), while the nodes are ggplot2 points (via `geom_point()`).
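To see why one data frame can feed both geoms, here is a base-R sketch of roughly the long format they consume, built from a hypothetical layout and edge list: each edge is a row running from the origin’s x/y to the destination’s xend/yend, and each node is a row whose xend/yend equal its own x/y. My understanding is that ggnetwork’s output follows this pattern, but the column set here is simplified and the name `df_net_sketch` is made up.

```r
# Hypothetical layout: one x/y coordinate pair per airport
nodes <- data.frame(
  vertex.names = c("EWR", "JFK", "ATL", "LAX"),
  x = c(0.10, 0.20, 0.50, 0.90),
  y = c(0.50, 0.40, 0.60, 0.30),
  stringsAsFactors = FALSE
)
# Hypothetical aggregated edges
edges <- data.frame(
  origin = c("EWR", "JFK", "JFK"),
  dest   = c("ATL", "ATL", "LAX"),
  weight = c(2, 1, 2),
  stringsAsFactors = FALSE
)

# Edge rows: a segment from the origin's coordinates to the destination's
from <- match(edges$origin, nodes$vertex.names)
to   <- match(edges$dest, nodes$vertex.names)
edge_rows <- data.frame(
  vertex.names = nodes$vertex.names[from],
  x = nodes$x[from], y = nodes$y[from],
  xend = nodes$x[to], yend = nodes$y[to],
  weight = edges$weight,
  stringsAsFactors = FALSE
)
# Node rows: zero-length "segments" where xend == x and yend == y
node_rows <- data.frame(
  vertex.names = nodes$vertex.names,
  x = nodes$x, y = nodes$y,
  xend = nodes$x, yend = nodes$y,
  weight = NA_real_,
  stringsAsFactors = FALSE
)
df_net_sketch <- rbind(edge_rows, node_rows)
```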

Style-wise, for the *edges*, we should apply a low opacity so we can see where they overlap. For the *nodes*, we can size them by their degree, and also map the `text` aesthetic to the airport name for use later. We should include a plot title as always, and use the `theme_blank()` ggplot2 theme included with ggnetwork so that the visualization does not use ggplot2’s default theme.

```
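plot <- ggplot(df_net, aes(x = x, y = y, xend = xend, yend = yend)) +
  # Thin, mostly transparent edges so overlapping routes stay visible
  geom_edges(size = 0.4, alpha = 0.25) +
  # Node size scales with degree; ggplot2 ignores `text` (and may warn),
  # but ggplotly() uses it for the hover tooltip later
  geom_nodes(aes(size = degree, text = vertex.names)) +
  ggtitle("Network Graph of U.S. Flights Outbound from NYC in 2013") +
  theme_blank()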
```

Putting it all together, printing `plot` renders the static chart.

Nifty! So how does this become interactive? Easily.

## One Weird Trick

The `plotly` R package has a `ggplotly()` function which converts a ggplot2 chart into an interactive Plotly chart, and the result is surprisingly faithful to the static chart.

Plotly also has a `toWebGL()` function, which converts a Plotly chart into its WebGL equivalent, allowing the system GPU to power crazy data visualizations with tons of points (my Interactive Clickbait chart using this WebGL interface has about 10,000 points and can be manipulated without slowdown). The Plotly documentation is sparse on the limitations of its WebGL support, but the examples suggest that WebGL support is limited to points and straight lines; no polygons or curves.

Hey, isn’t a network graph just points and straight lines?

So let’s chain the plot into `ggplotly()` while setting the hover text to the airport names, and chain that into the WebGL functionality:

```
plot %>% ggplotly(tooltip = "text") %>% toWebGL()
```

You can now mouse over the points to see the corresponding airport. Not too shabby!

When generating the interactive charts in R, a browser window will pop up with the chart, and the chart contains an option to save it to a Plotly account. This is the most practical way to embed the chart in a blog service such as WordPress or Medium. (In my case, I embed the charts directly into the blog post as static data so they will always be accessible.)

With a little more data manipulation, we can integrate more metadata into the network (not shown; however, all code is available in this R Notebook). The nycflights13 package also contains an `airports` dataset with the full name of each airport and its location in latitude/longitude coordinates. We can map that data onto the corresponding vertices and use some of it for more informative tooltips, such as the total number of flights to/from each airport. Additionally, we can color the edges by their NYC origin airport. The layout algorithm is also set to 50,000 iterations for even further convergence.
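A base-R sketch of the tooltip step, assuming hypothetical stand-ins for the aggregated edges and for `nycflights13::airports` (only its `faa` and `name` columns, with illustrative rows): sum each airport’s outbound and inbound edge weights, then paste the total into a hover string. Plotly hover text accepts `<br>` line breaks. This is one way to do it, not necessarily how the notebook does it.

```r
# Hypothetical aggregated edges (weight = number of flights)
edges <- data.frame(
  origin = c("EWR", "JFK", "JFK"),
  dest   = c("ATL", "ATL", "LAX"),
  weight = c(2, 1, 2),
  stringsAsFactors = FALSE
)
# Hypothetical subset of nycflights13::airports (illustrative rows)
airports <- data.frame(
  faa  = c("EWR", "JFK", "ATL", "LAX"),
  name = c("Newark Liberty Intl", "John F Kennedy Intl",
           "Hartsfield Jackson Atlanta Intl", "Los Angeles Intl"),
  stringsAsFactors = FALSE
)
codes <- airports$faa

# Sum edge weights per airport in each direction; tapply() returns NA for
# airports with no edges in that direction, which we treat as zero flights.
out_flights <- tapply(edges$weight, factor(edges$origin, levels = codes), sum)
in_flights  <- tapply(edges$weight, factor(edges$dest, levels = codes), sum)
out_flights[is.na(out_flights)] <- 0
in_flights[is.na(in_flights)] <- 0
total_flights <- setNames(as.vector(out_flights + in_flights), codes)

# Hover text: full name, code, and total flights on separate lines
tooltip <- paste0(airports$name, " (", codes, ")<br>Flights: ", total_flights)
```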

The result is the network graph you saw at the beginning of the post.

Looking at this network graph, you can clearly see that some airports receive flights from all three NYC airports, some from only two, and some from only one. I admit I am not an expert in the inner workings of the airline industry, but what would cause this?
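One way to quantify that pattern before speculating is to count the distinct NYC origins serving each destination. A base-R sketch on a hypothetical edge list:

```r
# Hypothetical aggregated edges (origin = NYC airport, dest = destination)
edges <- data.frame(
  origin = c("EWR", "JFK", "LGA", "EWR", "JFK", "LGA"),
  dest   = c("ATL", "ATL", "ATL", "LAX", "LAX", "BGR"),
  stringsAsFactors = FALSE
)

# For each destination, the number of distinct NYC origins that serve it
n_origins <- sapply(split(edges$origin, edges$dest),
                    function(o) length(unique(o)))

# How many destinations are served by 1, 2, or 3 NYC airports
table(n_origins)
```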

## Sea to Shining Sea

Another approach is to map the airports by their physical location. This is easy to do on the ggplot2 side since the latitudes/longitudes were already added to the dataset for the previous chart, so all we have to do is tell ggplot2 to use those instead; the base aesthetics of the ggplot statement now become:

```
ggplot(df_net, aes(x = lon, y = lat, xend = endlon, yend = endlat))
```
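The `lon`/`lat`/`endlon`/`endlat` columns come from joining airport coordinates onto each edge by airport code. A base-R sketch with `match()`, assuming a hypothetical edge list and approximate coordinates for illustration only. `match()` returns `NA` for any code missing from the lookup table, and some destination codes in `flights` may have no row in `airports`, so it is worth flagging gaps before plotting.

```r
# Hypothetical aggregated edges
edges <- data.frame(
  origin = c("EWR", "JFK", "LGA"),
  dest   = c("LAX", "LAX", "ATL"),
  weight = c(3, 5, 4),
  stringsAsFactors = FALSE
)
# Approximate airport coordinates (illustrative values, not from nycflights13)
airport_coords <- data.frame(
  faa = c("EWR", "JFK", "LGA", "ATL", "LAX"),
  lat = c(40.69, 40.64, 40.78, 33.64, 33.94),
  lon = c(-74.17, -73.78, -73.87, -84.43, -118.41),
  stringsAsFactors = FALSE
)

# Look up each edge's start (origin) and end (destination) coordinates
from <- match(edges$origin, airport_coords$faa)
to   <- match(edges$dest, airport_coords$faa)
edges$lon    <- airport_coords$lon[from]
edges$lat    <- airport_coords$lat[from]
edges$endlon <- airport_coords$lon[to]
edges$endlat <- airport_coords$lat[to]

# TRUE for edges whose coordinates could not be found
has_gap <- !complete.cases(edges[, c("lon", "lat", "endlon", "endlat")])
```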

The result is a highly abstract depiction of the United States (plus Alaska and Hawaii).

Again, one line of code makes it interactive.

This is an instance where Plotly’s native zooming and filtering are very helpful! For example, if you filter solely on flights from LGA/LaGuardia, you will see that the airport never shuttles flights to the western United States.
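The same filter can be reproduced outside Plotly. A base-R sketch, assuming a hypothetical edge table that already carries destination longitudes (`endlon`, approximate values): keep only LGA’s edges and find its westernmost destination.

```r
# Hypothetical edges with approximate destination longitudes
edges <- data.frame(
  origin = c("LGA", "LGA", "JFK"),
  dest   = c("ATL", "ORD", "LAX"),
  endlon = c(-84.43, -87.90, -118.41),
  stringsAsFactors = FALSE
)

# Keep only flights out of LaGuardia
lga_edges <- edges[edges$origin == "LGA", ]
# The most negative longitude is the farthest west
westernmost <- lga_edges$dest[which.min(lga_edges$endlon)]
```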

However, there are no obvious trends beyond that observation. The airports served by only one NYC airport and those served by two are scattered around the country with no apparent pattern. Oh well.

I should mention that since there is a relatively small amount of data, these examples do not benefit much from rendering with WebGL rather than with Plotly’s native d3/SVG. And even then, there are further optimizations that could be made; after finishing the code write-up, I discovered ggraph, which is more actively developed than ggnetwork, although ggraph does not interact as well with Plotly.

This creation workflow is pragmatic enough that I can include these interactive network graphs in more blog posts. At the very least, this approach is a good proof-of-concept for some *very crazy* data visualizations I have planned.

*You can view all the R and ggplot2 code used to visualize the data in this R Notebook. You can also view the images/data used for this post in this GitHub repository*.

*You are free to use the data visualizations from this article however you wish, but it would be greatly appreciated if proper attribution is given to this article and/or myself!*

I am currently **looking for a job** in data analysis/software engineering in San Francisco. If you liked this post and have a lead, feel free to shoot me an email.

Since I currently do not have a full-time salary to subsidize my machine learning/deep learning/software/hardware needs for these blog posts, I have set up a Patreon, and any monetary contributions to the Patreon are appreciated and will be put to good creative use.