Playing with 80 Million Amazon Product Review Ratings Using Apache Spark

Manipulating actually-big-data is just as easy as performing an analysis on a dataset with only a few records.

January 2, 2017 · 7 min

What Percent of the Top-Voted Comments in Reddit Threads Were Also 1st Comment?

Are commenters ‘late to this thread’ indeed late?

November 7, 2016 · 7 min

Visualizing How Developers Rate Their Own Programming Skills

As it turns out, there is no correlation between programming ability and the frequency of Stack Overflow visits.

July 21, 2016 · 6 min

Methods for Finding Related Reddit Subreddits with Simple Set Theory

Fancy machine learning approaches may not be required to help Redditors discover new things.

June 20, 2016 · 5 min

How to Create a Network Graph Visualization of Reddit Subreddits

There is very little discussion on how to gather the data for large-scale network graph visualizations, and how to make them. It is time to fix that.

May 27, 2016 · 7 min