Big Data

How to Visualize New York City Using Taxi Location Data and ggplot2

I previously posted a visualization of NYC taxi locations using ggplot2. Due to popular demand, I've cleaned up the code and released it as open source, with a few improvements.

Quantifying and Visualizing the Reddit Hivemind

If we can find out which topics Reddit users tend to upvote, we can identify which keywords are most attractive to the Reddit hivemind.

How to Analyze Every Reddit Submission and Comment, in Seconds, for Free

With Reddit data in BigQuery, quantifying all the hundreds of millions of Reddit submissions and comments is trivial.

Why is the Most-Viewed Gaming Video on YouTube About Cars 2?

No, this is not an error. You can watch the video yourself on YouTube and verify the view count.

Analyzing the Patterns of Numbers in 10 Million Passwords

Numbers in passwords follow many patterns, and the logic behind them is both surprising and intuitive.

A Statistical Analysis of 142 Million Reddit Submissions

I constructed a database to store all Reddit submissions from November 2007 through the end of October 2014: 142,159,793 submissions in total. And this data is very curious and very, *very* memetic.

The Quality, Popularity, and Negativity of 5.6 Million Hacker News Comments

Hopefully, analyzing these comments will answer whether Hacker News is experiencing a rise in quality, or whether the complaints levied against HN are valid.