Last week, I was surfing Reddit and came across an interesting infographic in the ProgrammerHumor subreddit. The infographic, published by tech blog ReadWrite, depicts the Top 10 Most In-Demand Developer Skills of 2013, as compiled by Stack Overflow through keyword searches.
Take a look at the chart. What’s good and bad about it?
What’s good about the graph? The data is properly cited.
What’s terrible about the graph? Let me count the ways:
Using discrete values on the X-Axis for a continuous measurement (i.e., the percentage). And not only that, the discrete values have two significant figures, which makes the X-Axis unusually cluttered.
The Y-Axis is Language. This implies that some programming languages are more language than others. (To be fair, Java is more language than Android.)
Not all entries on the chart are programming languages. (Android, for example, is an operating system.)
The 45-degree line in the chart implies that the relationship between language and %-of-searches is perfectly linear, whereas in reality the data has an upward-parabolic shape.
No relative proportions between the programming languages. We can’t accurately see the increase in language Java has relative to Android just by looking at the graph.
Cannot easily associate a language with the given X-Axis value. The logos representing the programming languages oscillate around the line, and it’s hard to see at a glance which percentage corresponds to which language.
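To make the missing-proportions complaint concrete, here is a minimal Python sketch that expresses each share as a multiple of the smallest one. The percentages are made-up placeholders, not the Stack Overflow figures (those aren't reproduced in this post), and `relative_to_smallest` is just an illustrative helper name.

```python
# Placeholder shares (percent of searches). These are NOT the real
# Stack Overflow numbers; swap in the actual values from the infographic.
placeholder_shares = {
    "Java": 20.0,
    "Some other skill": 10.0,
    "Android": 5.0,
}


def relative_to_smallest(shares):
    """Return each share divided by the smallest share in the data."""
    smallest = min(shares.values())
    return {name: value / smallest for name, value in shares.items()}


if __name__ == "__main__":
    for name, ratio in relative_to_smallest(placeholder_shares).items():
        print(f"{name}: {ratio:.1f}x the smallest entry")
```

A chart that starts its bars at zero lets the reader eyeball these ratios directly, which is exactly what the original line chart prevents.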
Fixing the Chart
This isn’t one of those blog posts that make snarky criticism without offering any constructive input. (Those will be posted next week.) How can we make the chart somewhat logical?
The easiest way to improve the chart is to convert it from a line chart to a column chart. Here’s a column chart that keeps the intended impression of the original chart:
A much bigger improvement, although unfortunately without the cool hand-drawn logos. The axes are no longer illogical and it’s easy to determine the relative impact of each language (e.g. Java is clearly, clearly at the top). However, the large amount of text can clutter the bars, and it can be difficult to correlate the raw percentage with the scale at a glance.
Another option is to rotate the chart and use bars instead of columns:
This fixes the text issue by giving most of the labels more room, but the Java text clips outside the chart. The correlation issue between language and percent value persists.
A best-of-both-worlds approach is to display the language on the Y-Axis and the raw percent value on the corresponding bar itself, allowing us to forgo the percent metric axis entirely.
There we go. Minimalist, fixes both the text and correlation issues, and gets the point across effectively.
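For anyone who wants to try the same layout without a charting tool, here is a rough, text-only Python sketch of the idea: labels on the left, bars scaled from zero, and the raw percentage printed next to each bar instead of on a separate axis. The function name `render_bars` and the sample numbers are my own placeholders, not the real Stack Overflow data, and this is not how the charts in this post were made.

```python
def render_bars(shares, width=40, fill="#"):
    """Render a horizontal bar chart as text, largest value first.

    Each row shows the label, a bar scaled from a zero baseline, and the
    raw percentage printed next to the bar, so no value axis is needed.
    """
    if not shares:
        return ""
    ordered = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    largest = ordered[0][1]
    label_width = max(len(name) for name in shares)
    rows = []
    for name, value in ordered:
        # Bar length is proportional to the value; the largest gets `width`.
        length = round(value / largest * width) if largest > 0 else 0
        rows.append(f"{name.rjust(label_width)} | {fill * length} {value:.1f}%")
    return "\n".join(rows)


if __name__ == "__main__":
    # Placeholder values only; replace with the figures from the infographic.
    print(render_bars({"Java": 20.0, "Android": 5.0}, width=20))
```

Sorting the rows by value and right-aligning the labels keeps each name next to its own bar, which addresses the "which percentage goes with which language" problem from the original.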
Wait, they changed the infographic?
Hey, they fixed the chart, and it’s using a design similar to mine! What a crazy random happenstance!
The article was appended with an update:
An earlier version of the infographic in this story presented the StackOverflow data in a confusing and conceptually problematic fashion. It has been updated.
“Problematic” is the world’s biggest understatement.