Predicting And Mapping Arrest Types in San Francisco with LightGBM, R, ggplot2

Given that an SF police arrest occurs at a specified time and place, what is the reason for that arrest?

February 8, 2017 · 10 min

Playing with 80 Million Amazon Product Review Ratings Using Apache Spark

Manipulating actually-big-data is just as easy as performing an analysis on a dataset with only a few records.

January 2, 2017 · 7 min

What Percent of the Top-Voted Comments in Reddit Threads Were Also 1st Comment?

Are commenters ‘late to this thread’ indeed late?

November 7, 2016 · 7 min

Visualizing How Developers Rate Their Own Programming Skills

As it turns out, there is no correlation between programming ability and the frequency of Stack Overflow visits.

July 21, 2016 · 6 min

Methods for Finding Related Reddit Subreddits with Simple Set Theory

Fancy machine learning approaches may not be required to help Redditors discover new things.

June 20, 2016 · 5 min