Quantifying the Clickbait and Linkbait in BuzzFeed Article Titles

UPDATE: I became a data scientist at BuzzFeed in August 2017. My thoughts about BF have changed significantly in the years since this was published in 2015!

BuzzFeed is one of the most significant sources of journalistic content on the entire internet. Of course, that depends on your definition of “journalistic”: BuzzFeed is one of the first organizations to leverage both social media and the power of language as an editorial business model.

BuzzFeed has popularized the use of the “listicle” as seen above: a bulleted list of text blurbs and/or photos that fits the length and depth of a normal blog article. Additionally, BuzzFeed was one of the first news sources to use non-neutral headlines that deliberately provoke a reaction in the reader, tempting them to click on the article and so promoting virality. These “clickbait” and “linkbait” techniques have been responsible for BuzzFeed receiving $50 million in venture capital, and have spawned entire startups and job positions designed solely to emulate BuzzFeed’s success.

I decided to determine which phrases in BuzzFeed headlines are the most successful in order to see if it’s possible to reverse-engineer BuzzFeed’s business model. Therefore, I scraped BuzzFeed’s website (after initial frustration) and obtained 60,378 distinct articles and the corresponding number of Facebook Shares for each article. From there, I decomposed each headline into its component n-grams, allowing me to perform quantitative analysis on every contiguous sequence of words in the article titles. You probably don’t know that the 3 most interesting things I found will blow your mind.
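To make the n-gram step concrete, here is a minimal sketch in Python. It is an illustration only, not the code used for this analysis: the function name `ngrams`, the whitespace tokenization, and the sample titles are all assumptions.

```python
from collections import Counter


def ngrams(title, n):
    """Return the contiguous n-word sequences in a title, lowercased.

    Tokenization is a plain whitespace split (an assumption; the
    original analysis may have cleaned punctuation differently).
    """
    words = title.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# Hypothetical sample titles, not taken from the scraped data set.
titles = [
    "21 Things You Probably Don't Know About Cats",
    "Which Disney Character Are You",
]

# Count how often each trigram appears across all titles.
trigram_counts = Counter(g for t in titles for g in ngrams(t, 3))
```

For example, `ngrams("Which Disney Character Are You", 3)` yields `which disney character`, `disney character are`, and `character are you`.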

The Rise of the Listicle

Listicles almost always begin with a numeral as the first or second word. Out of the 60,378 articles I obtained, 26% of them (15,656 articles) are listicles. BuzzFeed clearly believes they are successful, as the proportion of listicles to normal articles has increased over the years.
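The rule above (a numeral as the first or second word) can be sketched as a small Python check. The function name `listicle_size` and the sample titles are hypothetical, and the restriction to 1- or 2-digit numerals is an assumption that keeps titles starting with a year from being counted as listicles.

```python
import re


def listicle_size(title):
    """Return the listicle size if the first or second word is a
    1- or 2-digit numeral, otherwise None."""
    for word in title.split()[:2]:
        if re.fullmatch(r"\d{1,2}", word):
            return int(word)
    return None


# Hypothetical titles for illustration.
print(listicle_size("21 Things You Probably Don't Know"))  # 21
print(listicle_size("The 30 Best Movies Of The Year"))     # 30
print(listicle_size("Which Disney Character Are You"))     # None
```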

Listicles can be of any size. The distribution of listicle sizes is centered at a median of 19 entries.

Surprisingly, there is a positive correlation between listicle size and the number of Facebook shares it receives: a 30-item listicle receives many times more shares than a 10-item listicle. (Note the logarithmic scale for FB shares.)

BuzzFeed has many different types of listicles to appeal to a wide crowd, including [X] reasons, [X] books, [X] movies, etc., where [X] is any 1- or 2-digit numeral. However, BuzzFeed’s go-to listicle phrase has changed over the years. Here are the most-used listicle phrases for each month since 2012:

In 2012 and 2013, BuzzFeed’s listicles began with the [X]; in 2014, BuzzFeed’s most-used listicles began with [X] things. The “the” is technically redundant; perhaps BuzzFeed decided to make the listicle schema cleaner and less formal. It may be possible that [X] things performs better on average than the [X].
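One way to turn a title into a listicle type such as [X] things or the [X] is to replace the leading numeral with a placeholder and keep the neighboring word. The sketch below is in Python and is only an assumption about how such a normalization could work; `listicle_type` is a hypothetical name and the titles are made up.

```python
import re


def listicle_type(title):
    """Map a title to its listicle type, or None if it is not a listicle.

    A numeral in first position gives "[X] <next word>";
    a numeral in second position gives "<first word> [X]".
    """
    words = title.lower().split()
    for i, word in enumerate(words[:2]):
        if re.fullmatch(r"\d{1,2}", word):
            if i == 0 and len(words) > 1:
                return "[X] " + words[1]
            if i == 1:
                return words[0] + " [X]"
    return None


# Hypothetical titles for illustration.
print(listicle_type("21 Things You Probably Don't Know"))  # [X] things
print(listicle_type("The 30 Best Movies Of The Year"))     # the [X]
```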

Which types of listicles are the most successful on Facebook, that is, which receive the most Facebook shares?

Here’s a chart of the Top 30 types of listicles by the number of Facebook shares those articles have received on average (with a minimum of 50 articles of that listicle type):

A few notes on the chart: the gray bars on each average bar represent a 95% confidence interval for the true value of each average, where the confidence interval is obtained through 10,000 iterations of bootstrap resampling. The dashed vertical line represents the population average of all distinct BuzzFeed articles, at 6,657 Facebook shares, and helps visualize the relative impact of having these words in the title compared to a normal BuzzFeed article.
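For readers unfamiliar with bootstrap resampling, here is a minimal Python sketch of a 95% confidence interval for an average using 10,000 resamples. It uses the percentile method, which is an assumption, since the exact bootstrap variant behind the chart is not specified here. The function name `bootstrap_ci` and the share counts are hypothetical.

```python
import random


def bootstrap_ci(values, iterations=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Each iteration resamples len(values) items with replacement and
    records the mean; the interval is read off the sorted means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(iterations)
    )
    lo_idx = int((1 - level) / 2 * iterations)
    hi_idx = int((1 + level) / 2 * iterations) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical Facebook share counts for articles of one listicle type.
shares = [100, 2000, 350, 12000, 800, 4500, 60, 9000]
low, high = bootstrap_ci(shares)
```

A wide interval between `low` and `high` signals that the average is driven by a few heavily shared articles, which is why the gray bars matter when comparing listicle types.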

The most-posted listicle types mentioned above are not the types of listicles that are most shared; however, [X] things does indeed perform slightly better than the [X] on average. Emotional words, such as insanely, awesome, and probably, which you would never see in a more serious journalistic publication, are some of the key drivers of shares.

Let’s look into these keywords more to see if there are any other trends.

Key Keywords

Specific keywords may be more informative. Here are the most popular keywords over time, ignoring common stop words and listicle words:

Like most journalistic news sources, BuzzFeed tends to write more frequently about then-current events. 2012, for example, had many articles about the 2012 election, while April 2013 had many articles about the Boston Marathon bombings.
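As a rough illustration of how the top keyword per month could be found, here is a Python sketch. The stop-word and listicle-word lists are deliberately tiny placeholders, and the function names, month format, and sample titles are all assumptions rather than the lists or data actually used.

```python
from collections import Counter

# Placeholder word lists (assumptions); a real analysis would use fuller ones.
STOP_WORDS = {"the", "a", "of", "to", "in", "and", "you", "that", "are", "about"}
LISTICLE_WORDS = {"things", "reasons", "ways"}


def keywords(title):
    """Lowercase the title, strip edge punctuation, and drop numerals,
    stop words, and listicle words."""
    words = (w.strip(".,!?:;\"'()").lower() for w in title.split())
    return [
        w for w in words
        if w and not w.isdigit()
        and w not in STOP_WORDS and w not in LISTICLE_WORDS
    ]


def top_keyword_by_month(articles):
    """articles: iterable of (month, title) pairs. Returns {month: keyword}."""
    counts = {}
    for month, title in articles:
        counts.setdefault(month, Counter()).update(keywords(title))
    return {m: c.most_common(1)[0][0] for m, c in counts.items() if c}


# Hypothetical titles for illustration.
sample = [
    ("2013-04", "Boston Marathon Bombing: What We Know"),
    ("2013-04", "The Boston Police Response"),
    ("2012-11", "10 Things About The Election"),
    ("2012-11", "Election Night In Photos"),
]
top = top_keyword_by_month(sample)
```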

Which keywords encouraged the most Facebook shares on average?

There’s more uncertainty in the accuracy of the average on keywords, especially with the #1 word, career. There’s a strong focus on nostalgia, with toys, childhood, and 80s. Certain brands (potter and disney) fit the nostalgia too.

High-ranking words with a relatively small confidence interval include which and character. These are likely caused by BuzzFeed’s quizzes, which have been incredibly popular. Analyzing full phrases is necessary to get a bigger picture.

3-Word Phrases

After careful analysis, I found that 3-word phrases (trigrams) provided more helpful information than phrases of other lengths. Over time, the popular phrases resemble the popular keywords: both relate to then-current events and occasionally include listicle phrases.

The average shares of articles based on phrases in their titles, however, tell the full story.

Now we can clearly see some of the infamous phrases traditionally associated with clickbait.

Indeed, character are you, a frequent phrase in quizzes, is what leads to the most virality. (It’s worth noting that these perform 3-4 times better than the best listicles on average.) Likewise, you may notice that a few phrases look redundant because they are subsets of a bigger phrase (e.g. things you probably, you probably don’t, probably don’t know), but since their average FB shares aren’t identical, they don’t appear in exactly the same set of articles, and therefore each average is relevant. There’s also a frequent appeal to you, the reader, with you/your/you’re appearing in about half of the top phrases.
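To see why overlapping phrases can have different averages, consider this Python sketch of average shares per trigram. The names, the sample data, and applying a minimum-article threshold to phrases are assumptions for illustration; the threshold is set to 1 below only so the tiny example produces output.

```python
from collections import defaultdict


def ngrams(title, n):
    """Contiguous n-word sequences in a title, lowercased."""
    words = title.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def mean_shares_by_phrase(articles, n=3, min_articles=50):
    """articles: iterable of (title, shares) pairs.

    Returns the mean shares for each n-gram that appears in at least
    `min_articles` articles.
    """
    shares_by_phrase = defaultdict(list)
    for title, shares in articles:
        # set() so an article counts once per phrase even if it repeats.
        for phrase in set(ngrams(title, n)):
            shares_by_phrase[phrase].append(shares)
    return {
        phrase: sum(s) / len(s)
        for phrase, s in shares_by_phrase.items()
        if len(s) >= min_articles
    }


# Hypothetical titles and share counts.
sample = [
    ("21 Things You Probably Don't Know About Cats", 9000),
    ("Things You Probably Forgot About The 90s", 3000),
]
averages = mean_shares_by_phrase(sample, min_articles=1)
```

Here things you probably appears in both articles and averages 6,000 shares, while you probably don't appears only in the first and averages 9,000. The phrases overlap, but they cover different sets of articles.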

Does clickbait work? Of course it does. Granted, there has been a lot of disenchantment with the rise of clickbait; that’s why the parody Twitter account @SavedYouAClick was created and hit 182K followers in months. It’s also the reason why Facebook will now be punishing clickbait by making it less prominent in a user’s news feed, which will definitely hurt BuzzFeed. That’s likely one of the reasons why they are pivoting to quizzes and video content instead.

I don’t expect clickbait to disappear anytime soon; it’s easy and provides a good return-on-investment, both of which are important to scrappy websites trying to market on social media. Or things could come full-circle and BuzzFeed could publish clickbait about making the best clickbait.

You can view and download all the BuzzFeed article data and metadata in this Google Sheet.

All graphics were generated using R. The charts were created using ggplot2 and the word clouds were created using the wordcloud package.

If you liked this post, I have set up a Patreon to fund my machine learning/deep learning/software/hardware needs for my future crazy yet cool projects, and any monetary contributions to the Patreon are appreciated and will be put to good creative use.
Max Woolf

Data Scientist at BuzzFeed in San Francisco. Creator of AI text generation tools such as aitextgen and gpt-2-simple. I am the data.
