What makes people click the Like button on Facebook Statuses? Does the status say something funny? Does it contain a cool link or a funny photo? Does the status actively engage its readers?
Needless to say, brands on Facebook want to know the answers to these questions. The more people that Like their statuses on their Facebook Pages, the more exposure they gain, and the more money they earn. Many, many companies have been created for the sole purpose of maximizing the number of Facebook Likes.
By 2013, most brands have realized that photo posts and statuses that ask a question will generate much higher numbers of Likes. Personally, that’s not why I like a given Facebook status: I Like statuses that contain new and exciting things that are relevant to my interests. The first thing I read in a status is the Message: if the message is boring and uninteresting, then there’s no reason for me to Like the status needlessly.
Can the keywords and language in a Facebook Status predict the number of Likes the status receives?
Running multivariate regressions on keywords from thousands of Facebook statuses, I’ve discovered that the language used in a status does help predict the number of Likes a status receives, and actually could help brands learn much more about their fans.
In order to draw accurate conclusions from the analysis, large amounts of data are needed (and there’s no such thing as too much data).
I decided to analyze the Facebook Pages of 3 extremely popular News sources: CNN, the New York Times, and BBC World News. All three Pages have millions of fans, post the same types of Facebook statuses, and post statuses extremely frequently to Facebook (around 8-9 times a day). The statuses analyzed will be from June 1st 2012 to June 1st 2013, in order to both gather a large sample size (~3,000 statuses from each Page) and a ensure a consistent apples-to-apples comparison between the three sources.
The raw data, code, and a detailed technical explanation of the statistical techniques used to process the data can be found in this GitHub repository. (tl;dr, it’s an optimized least squares regression)
Can the keywords and language in a Facebook Status predict the number of Likes the status receives? We’re ready to test this hypothesis.
+Likes P-Val Bourdain 1962.98 0 NEWS 1272.64 0 Photo 1154.6 0 Barack 1002.45 0 City 851.82 0 Monday 705.42 0 Obama 632.2 0 United 578.77 0 Mitt 562.29 0.01 South 508.43 0.03 America 505.83 0.01 Watch 470.51 0 Boston 398.13 0.05 New 326.74 0.03 ET -433.96 0 North -467.05 0.02 Check -503.5 0 Travel -988.99 0.02
- The most impactful keywords for CNN are political keywords.
- NEWS (“BREAKING NEWS”) predicts more Likes than Photo (i.e. Travel Photo of the Day).
- Bourdain refers to Anthony Bourdain, who very recently began a food and travel show on CNN. I assume that targets CNN’s primary demographic.
- The presence of Barack (Barack Obama) predicts more likes than just Obama.
- The presence of Monday predicts more likes most other keywords. Garfield would be disappointed.
- The presence of North (North Korea) predicts a decrease in Likes from the average. Is North Korea boring?
- The presence of Check (“Check this out!”) also predicts a decrease in Likes from the average. This is interesting because this call-to-action is usually associated with the other keywords. Perhaps it’s not necessary?
+Likes P-Val Mills 1022.9 0 Democratic 496.32 0 Krugman 456.31 0 Senate 431.05 0 London 373.67 0.01 Oscar 367.64 0.01 Sandy 364.61 0 Olympics 339.15 0.01 Sunday 317 0 Clinton 314.64 0.02 You 247.07 0 Obama 237.98 0 Opinion 221 0 Michael 204.2 0.03 Credit 191.92 0.05 Tuesday 187.22 0.03 Times 178.69 0 See 160.12 0.01 Romney 157.66 0.04 Ed 140.86 0.01 Read -101.23 0.05 Here -182.72 0.04 Facebook -187.75 0.02 Are -216.3 0.01 While -293.77 0.04 Quotation -322.29 0.02 House -362.1 0.01 Thanksgiving -369.88 0.05 Convention -448.34 0.01 Jersey -505.98 0
- The most impactful keywords for NYTimes are political keywords and current events, such as Hurricane Sandy and the London Olympics.
- Mills refers to Doug Mills, photographer for the New York Times. The power of photos!
- You and See are impactful call-to-action keywords, even though they don’t ask any questions. (Interestingly, Are, which implicitly asks a question, predicts a decrease in Likes. Maybe people don’t Like questions?)
- Clinton has a bigger impact than Obama.
- No one likes Jersey.
BBC World News
+Likes P-Val Malala 488.1 0 Japan 186.32 0 Britain 167.25 0.01 That 155.01 0.01 AFP 154.97 0.02 Travel 153.41 0 Now 148.39 0.02 Would 146.45 0.03 Delhi 137.85 0.04 Let 134.93 0.03 Future 127.49 0.02 Barack 125.79 0.02 An 122.18 0.03 David 111.78 0.04 Do 111.13 0 Israel 101.31 0.04 Impact 87.94 0.05 South 78.62 0.03 It 62.7 0.01 The 55.7 0 US -47.84 0.02 On -86.22 0.04 BBC -103.58 0 Sunday -117.01 0.03 After -122.38 0.05 Thursday -143.91 0.03 Twitter -144.55 0 City -147.16 0.02 Hi -149.85 0.03 BBCNewsUS -150.48 0.03 House -176.86 0.01 California -187.57 0
- The most impactful keywords for BBC World News are keywords describing news around the world. Unlike CNN and NYTimes, there are very few impactful keywords related to domestic politics (such as Prime Minister David Cameron, and even Barack is more impactful than him.)
- Malala refers to Malala Youdsfzai, a Pakistani activist who survived an assassination attempt.
- Implicit question keywords such as Let, Do, and Would are all very effective.
- The simple salutation of Hi predicts a decrease in Likes. There’s a British joke here somewhere.
The language used in Facebook Statuses can be very useful in identifying what words Fans like, and what words will be most useful in generating the most exposure. While my analysis can’t predict the exact number of Likes a Status receives, and I likely broke a few rules of statistics in the process of making this post, the impact of language and specific keywords on social media interaction is an endeavor worth pursuing.
I am currently looking for a job in data analysis/software engineering in San Francisco. If you liked this post and have a lead, feel free to shoot me an email.
Since I currently do not have a full-time salary to subsidize my machine learning/deep learning/software/hardware needs for these blog posts, I have set up a Patreon, and any monetary contributions to the Patreon are appreciated and will be put to good creative use.