Grant Slatton made an amusing post on Hacker News yesterday titled “Show HN: Probabilistically Generating HN Post Titles”. Using Markov chains, a statistical model that picks each new word based on the words immediately before it, Slatton was able to generate eerily realistic Hacker News headlines such as “Facebook detects if you are not a pilot” and “The No. 1 Habit of Highly Effective Mediocre Entrepreneurs.”

Could Markov chains be applied to other data sets for hilarious effect? Using Slatton’s Python implementation of Markov chains plus 300,000 descriptions of public GitHub repositories retrieved from the GitHub API, I discovered that statistical randomness can indeed create funny innovation.
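
For readers who want to see how little machinery is involved, here is a minimal sketch of word-level Markov chain text generation. This is not Slatton’s implementation; the function names (`build_chain`, `generate`), the sentinel tokens, and the choice of a two-word state are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical sentinel tokens marking the start and end of a description.
START, END = "<s>", "</s>"


def build_chain(texts, order=2):
    """Map each run of `order` words to every word seen right after it."""
    chain = defaultdict(list)
    for text in texts:
        words = [START] * order + text.split() + [END]
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain


def generate(chain, order=2, max_words=30, rng=random):
    """Walk the chain from the start state until END or max_words.

    `order` must match the value passed to build_chain, and the chain
    must have been built from at least one text.
    """
    state = (START,) * order
    out = []
    while len(out) < max_words:
        # Duplicates in the list make frequent successors more likely.
        nxt = rng.choice(chain[state])
        if nxt == END:
            break
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)


# A few descriptions in the style of the GitHub data (made up for the demo).
descriptions = [
    "A simple parser built on top of Tornado",
    "A simple web framework written in Erlang",
    "Rails plugin built on top of WordPress",
]
chain = build_chain(descriptions)
print(generate(chain))
```

With only three inputs the output mostly recombines the shared phrases ("A simple", "built on top of"). The same recombination across hundreds of thousands of descriptions is what produces the plausible but absurd results.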

You can download a list of 1,000 Markov chain-generated projects here. Here are a few interesting ones:

  • MaNGOS is a free, Open Source implementation of a tag at relatively random intervals.
  • A Warhammer 40k simulator to teach myself both OpenGL and Clojure
  • Perl interface to Git repositories via Ruby.
  • A windows live messenger network client written in Erlang
  • Rails plugin which allows to talk anonymously and use tripcodes if you want.
  • A Firebug extension for displaying the latest from Hacker News
  • Sinatra-inspired JavaScript node.js web development framework for lua. Inspired by rspec
  • Inverted Index on top of Tornado
  • Android LED interface library for various wave propagation techniques.
  • CatchAPI is a Java API to remove the need for boring project setup.
  • Adds basic social networking capabilities to your lighting system based on the concept of the Working with Rails
  • Brute force your OpenERP data integration with flatfiles
  • Culerity integrates Cucumber and Celerity in order to shutdown the computer.
  • Parses ANSI color codes and converts them to iphone compatible mp4s using HandBrake
  • A simple OFX (Open Financial Exchange) parser built on top of WordPress. Rolopress core theme

The code used to get the project descriptions from the GitHub API is available in this GitHub repository, and you can download the ~300k repo descriptions here. [5MB .zip]
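
As a rough idea of what such a collection script can look like, here is a sketch that pages through GitHub’s public repository listing and keeps the non-empty descriptions. It is not the code from the repository mentioned above: the helper names (`extract_descriptions`, `fetch_page`, `collect_descriptions`) and the `User-Agent` string are hypothetical, and it assumes the `GET /repositories` endpoint with its `since` paging parameter.

```python
import json
import urllib.request

# Assumed endpoint: GitHub's listing of public repositories, paged by repo id.
API_URL = "https://api.github.com/repositories"


def extract_descriptions(repos):
    """Keep the non-empty descriptions from a list of repository objects."""
    results = []
    for repo in repos:
        # The description may be missing or null for some repositories.
        description = (repo.get("description") or "").strip()
        if description:
            results.append(description)
    return results


def fetch_page(since=0):
    """Fetch one page of public repositories with an id greater than `since`."""
    request = urllib.request.Request(
        f"{API_URL}?since={since}",
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "markov-descriptions-demo",  # hypothetical name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_descriptions(pages=1):
    """Gather descriptions from the first `pages` pages of the listing."""
    descriptions, since = [], 0
    for _ in range(pages):
        repos = fetch_page(since)
        if not repos:
            break
        descriptions.extend(extract_descriptions(repos))
        since = repos[-1]["id"]  # the next page starts after the last id seen
    return descriptions
```

Calling `collect_descriptions(pages=3)` would make three HTTP requests. Unauthenticated requests are rate limited, so collecting hundreds of thousands of descriptions this way would need an access token and some patience.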


Max Woolf (@minimaxir) is currently a data scientist at BuzzFeed in San Francisco. He is also an ex-Apple employee and Carnegie Mellon University graduate.

In his spare time, Max uses Python to gather data from public APIs and ggplot2 to plot plenty of pretty charts from that data. On special occasions, he uses Keras for fancy deep learning projects.

You can learn more about Max here, view his data analysis portfolio here, or view his coding portfolio here.