Visualizing Airline Flight Characteristics Between SFO and JFK

Box plots, when used correctly, can be a very fun way to visualize big data.

October 23, 2019 · 8 min

Problems with Predicting Post Performance on Reddit and Other Link Aggregators

Algorithmic feeds like Reddit inherently produce survivorship bias: users may recognize certain types of posts that reach the front page, but many more posts follow the same patterns and fail.

September 10, 2018 · 10 min

Analyzing IMDb Data The Intended Way, with R and ggplot2

For IMDb’s big-but-not-big data, you have to play with the data smartly, and both R and ggplot2 have neat tricks to do just that.

July 16, 2018 · 11 min

Visualizing One Million NCAA Basketball Shots

Although visualizing basketball shots has been done before, this time we have access to an order of magnitude more public data to do some really cool stuff.

March 19, 2018 · 6 min

Playing with 80 Million Amazon Product Review Ratings Using Apache Spark

With Apache Spark, manipulating actually-big data is just as easy as analyzing a dataset with only a few records.

January 2, 2017 · 7 min