Locating All the Christmas Trees on Instagram

I downloaded hundreds of thousands of #tree images and found 25,432 that were taken on Christmas, are tagged #tree, and, most importantly, contain location data showing where the photo was taken.

January 1, 2015 · 5 min

A Statistical Analysis of 142 Million Reddit Submissions

I constructed a database to store all Reddit Submissions from November 2007 to the end of October 2014: 142,159,793 submissions in total. And this data is very curious and very, very memetic.

December 16, 2014 · 8 min

The Quality, Popularity, and Negativity of 5.6 Million Hacker News Comments

Hopefully, these comments will answer whether Hacker News is experiencing a rise in quality, or if the complaints leveled against HN are valid.

October 6, 2014 · 9 min

The Statistical Difference Between 1-Star and 5-Star Reviews on Yelp

It can be proven that language has a strong statistical effect on review ratings, but that is intuitive enough. How have review ratings changed?

September 23, 2014 · 7 min

The Data From Our Comments to the FCC About Net Neutrality

The FCC released a dataset of about 450,000 comments against net neutrality. Looking at the data behind these comments, it is clear that the entire country is passionately opposed to the rule changes to net neutrality.

August 8, 2014 · 6 min