Algorithmic feeds like Reddit inherently create survivorship bias: although users may recognize certain types of posts that appear on the front page, many more follow the same patterns but fail.
For IMDb's big-but-not-big data, you have to play with the data smartly, and both R and ggplot2 have neat tricks to do just that.
Although visualizing basketball shots has been done before, this time we have access to an order of magnitude more public data to do some really cool stuff.
I was surprised to see that all types of programming languages have quick answer times and a high probability of receiving an acceptable answer!
In general, it takes little additional effort to make something unique with ggplot2, and the effort is well worth it.
Before Reddit added native image hosting, Imgur accounted for 15% of all submissions to Reddit. Now it's below 9%.
The relatively new R Notebooks improve the workflows of common data analysis in ways Jupyter Notebooks can't.
Keras + TensorFlow + pretrained character embeddings make text generation a breeze.
Given that an SF police arrest occurs at a specified time and place, what is the reason for that arrest?
Manipulating actually-big-data is just as easy as performing an analysis on a dataset with only a few records.