On February 24th, Facebook added the Facebook Reactions feature across all platforms, allowing users to express more than a Like: Love, Haha, Wow, Sad, and Angry. These additional actions are revealed by hovering over the Like button on the web, or by tapping and holding the button on mobile devices.
I wrote an article at that time lamenting the design decision, arguing that most of those emotions are redundant with Likes, and I was also annoyed about the lack of easy access to bulk Reactions data via Facebook’s Graph API for further analysis. Thanks to a recently-discovered workaround, I am now able to scrape the Reactions data from any public Facebook Page using my updated Facebook Page Post Scraper.
Let’s look at CNN’s Facebook Page, which has made 4,629 Facebook Posts in the four months since the implementation of Reactions. As a news organization, CNN makes Posts on a wide variety of subjects, which we would expect to trigger a wide variety of reactions.
First, we can check the daily marketshare of Reactions to see which ones are the most popular:
And we have a problem. Likes technically count as Reactions, and since Likes are much easier for users to perform, they typically make up more than 75% of the total Reactions.
To further compound this problem, when using the CNN dataset, Likes are strongly positively correlated with Love and Wow (implying potentially redundant variables), and Likes are weakly positively correlated with Sad and Angry (which doesn’t make intuitive sense).
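As a rough sketch of the two checks described above (the share of Likes among all Reactions, and the correlation of Likes with each other Reaction), here is a small pure-Python example. The `posts` values are made up for illustration and are not from the CNN dataset, and the field names are assumptions rather than the scraper's actual column names.

```python
from math import sqrt

REACTIONS = ["like", "love", "wow", "haha", "sad", "angry"]


def like_share(posts):
    """Fraction of all Reactions (Likes included) that are Likes."""
    likes = sum(p["like"] for p in posts)
    total = sum(p[r] for p in posts for r in REACTIONS)
    return likes / total


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def like_correlations(posts):
    """Correlation of per-Post Like counts with each non-Like Reaction."""
    likes = [p["like"] for p in posts]
    return {
        r: pearson(likes, [p[r] for p in posts])
        for r in REACTIONS
        if r != "like"
    }


# Hypothetical per-Post counts, for illustration only.
posts = [
    {"like": 900, "love": 60, "wow": 20, "haha": 10, "sad": 5, "angry": 5},
    {"like": 400, "love": 20, "wow": 10, "haha": 40, "sad": 20, "angry": 10},
    {"like": 1500, "love": 120, "wow": 45, "haha": 5, "sad": 10, "angry": 20},
]

print(like_share(posts))
print(like_correlations(posts))
```

With only three toy Posts the correlations are meaningless; the point is just the shape of the calculation.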
It is my belief that Likes are now a statistical red herring used by Facebook to identify low-effort data from people who just tap the Like button. If we ignore Likes completely, both problems are resolved: the roughly 25% of Reactions data that remains is more than enough on large Facebook Pages, and the correlations between the non-Like variables are weaker, so each one provides more independent information.
But can looking at the counts of Wow, Love, Haha, Sad, and Angry reactions on Facebook Posts be used to accurately classify their sentiment? If so, can we quantify that sentiment? What makes a Facebook Post Wow but not Haha? I downloaded the Reactions data from the Top 100+ Facebook Pages, and the results turned out better than expected.
That Makes Me Feel Angry
For each Facebook Post from a Facebook Page, I look at the relative percentages of the Wow, Love, Haha, Sad, and Angry counts on a given Post, ignoring Likes (i.e. the percentage of Wow on a Post is the count of Wow reactions divided by the sum of the Wow, Love, Haha, Sad, and Angry counts). Then, I filter the Posts dataset to Posts where a single emotion accounts for at least 75% of those reactions, and classify each such Post with that emotion (additionally, I only keep Posts where the proportion is greater than 75% with 99% statistical confidence). This approach mitigates the possibility of misclassifying the emotion of a Post, and also penalizes Posts with insufficient amounts of Reactions data.
Due to the layers of filtering involved, the sample sizes needed to ensure statistical confidence, and the fact that only an extremely small fraction of a Page’s userbase provides any engagement (the 1% rule), this methodology will only work on Facebook Pages with millions of active readers. In CNN’s case, there are 576 Posts (12%) which fit the criteria above.
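The filtering step above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact code behind the charts: the post does not say which statistical test is used, so the sketch assumes a one-sided Wilson score lower bound at 99% confidence, and the names `wilson_lower_bound` and `classify_post` are hypothetical.

```python
from math import sqrt

EMOTIONS = ("wow", "love", "haha", "sad", "angry")

# Standard normal quantile for a one-sided 99% confidence bound (assumption:
# the post does not specify the interval, so a Wilson score bound is used).
Z_99_ONE_SIDED = 2.3263


def wilson_lower_bound(successes, n, z=Z_99_ONE_SIDED):
    """Lower bound of the Wilson score interval for a binomial proportion."""
    if n == 0:
        return 0.0
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = p_hat + z * z / (2 * n)
    margin = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def classify_post(counts, threshold=0.75):
    """Return the dominant emotion of a Post, or None if it fails the filter.

    Likes are ignored: only the five emotions contribute to the total.
    """
    total = sum(counts.get(e, 0) for e in EMOTIONS)
    if total == 0:
        return None
    top = max(EMOTIONS, key=lambda e: counts.get(e, 0))
    # Keep the Post only if we are confident the true share exceeds 75%.
    if wilson_lower_bound(counts.get(top, 0), total) > threshold:
        return top
    return None


# Hypothetical counts: 90% Sad out of 1,000 non-Like reactions passes the filter.
print(classify_post({"like": 5000, "wow": 10, "love": 20, "haha": 5, "sad": 900, "angry": 65}))
# The same 90% share on only 10 reactions is rejected for lack of data.
print(classify_post({"sad": 9, "wow": 1}))
```

Using a lower confidence bound rather than the raw percentage is what penalizes Posts with few reactions: a 9-out-of-10 Post and a 900-out-of-1,000 Post have the same raw share, but only the latter clears the bar.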
We can group by emotion category and scale from 75% to 100% emotional response. The result is this for the CNN dataset:
In the interactive version of this chart at the beginning of the article, you can mouse over a point to see the news headline corresponding to that data point.
And the news headlines match the classified emotion surprisingly well. Wow Posts are about neat things, Sad Posts are typically about deaths, Haha Posts capture genuinely funny headlines, and Angry Posts capture controversial and negative topics (e.g. the Brock Turner case). Love is for everything positive in between which doesn’t fit into the above categories.
CNN has a nice variety of emotional responses, but how do they compare to other news sources, like the BBC?
NOTE: The rest of the charts in this blog post are ALL interactive! Enjoy!
The same! What about less-formal journalistic news sources, like ESPN’s SportsCenter?
Similar, but in this case, SportsCenter focuses more on Haha headlines instead of Sad. Let’s go further: how about gossip site TMZ?
An equal balance of Sad, Love, and Haha.
Let’s try a wild card: how about Fox News?
All the Feels
So some Pages have specific emotional targets. Fox News has an unusually high proportion of Angry Facebook Posts, and from my research, it is the only large Page where that is the case.
In order to test the validity of this approach, we must look at many different types of pages. Several Facebook Pages focus on comedic, Haha Posts relative to other emotions. The Onion, the premier satire publication, is the king (not shown because there are thousands of data points). Semi-relevant music duo LMFAO has posted quite a few random comedic listicles.
And also musician/investor Snoop Dogg, whose brand of humor is…slightly different.
Pages which focus on Wow are rare. The only one I could find with enough data is I f*cking love science, which makes sense given the subject matter.
I’ve had zero success finding pages which focus on Sad Posts, though, which makes sense. Love is a different story.
The correlation between Likes and Loves earlier was 0.80. As you’ve seen from the charts above, Loves are an extension of Likes, but more so in the sense that Loves are positive emotions which are NOT Wows and NOT Hahas. There is an extra granularity, and Pages often focus on Love with little inclusion of other sentiments.
Let’s start with one public figure loved by millions on the Internet: Justin Bieber, of course.
In the political world, the official White House Facebook Page mostly has Posts which are patriotic.
The conclusion of all these charts? Yes, it turns out that Reactions data can indeed be used to classify the emotional response of Facebook Posts at a more granular level than just a Like/Dislike button…but only if there is a large amount of data available. Me giving a single Wow on a random friend’s Facebook Post says absolutely nothing about the sentiment of the Post (and in my case, the Wow is often sarcastic anyway).
The follow-up question that many social-media coordinators reading this post are asking is “which type of emotion on Facebook Posts leads to the largest number of total Reactions?” That’s hard to say at the moment and requires further analysis. It is also unclear whether the implementation of Reactions actually leads to higher total engagement with Facebook Posts than the previous Likes-only system, which is likely a KPI from Facebook’s perspective.
At the least, accurate semantic language data is very valuable in the field of natural language processing and the data from these tagged Facebook Posts can be used to build more accurate classifiers for all five emotions for use in sentiment analysis. Making pretty interactive charts is merely the start.
As always, the code used to create all the visualizations is available in this Jupyter notebook, and the data itself is open-sourced on GitHub. The interactive charts are rendered using plot.ly to automatically convert ggplot2 plots into fully-interactive D3.js charts.
If you do find any other interesting trends in the charts and write about them, it would be greatly appreciated if proper attribution is given back to this post and/or myself. Thanks!
I am currently looking for a job in data analysis/software engineering in San Francisco. If you liked this post and have a lead, feel free to shoot me an email.
Since I currently do not have a full-time salary to subsidize my machine learning/deep learning/software/hardware needs for these blog posts, I have set up a Patreon, and any monetary contributions to the Patreon are appreciated and will be put to good creative use.