<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Facebook on Max Woolf&#39;s Blog</title>
    <link>https://minimaxir.com/tag/facebook/</link>
    <description>Recent content in Facebook on Max Woolf&#39;s Blog</description>
    <image>
      <title>Max Woolf&#39;s Blog</title>
      <url>https://minimaxir.com/android-chrome-512x512.png</url>
      <link>https://minimaxir.com/android-chrome-512x512.png</link>
    </image>
    <generator>Hugo</generator>
    <language>en</language>
    <copyright>Copyright Max Woolf © 2026</copyright>
    <lastBuildDate>Mon, 29 Feb 2016 08:00:00 -0700</lastBuildDate>
    <atom:link href="https://minimaxir.com/tag/facebook/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Facebook Reactions and the Problems With Quantifying Likes Differently</title>
      <link>https://minimaxir.com/2016/02/facebook-reactions/</link>
      <pubDate>Mon, 29 Feb 2016 08:00:00 -0700</pubDate>
      <guid>https://minimaxir.com/2016/02/facebook-reactions/</guid>
      <description>Apparently, there is little statistical relationship between things that are cute and things that make you go YAAASS.</description>
      <content:encoded><![CDATA[<p>Facebook added <a href="http://newsroom.fb.com/news/2016/02/reactions-now-available-globally/">Facebook Reactions</a>, allowing users to do more than just &ldquo;Like&rdquo; posts and statuses as they have done for the past decade. Likes were the universal symbol of approval on social media. Now, Facebook users can apply more granular responses, from positive emotions like <strong>Love</strong>, to negative emotions such as <strong>Angry</strong>. This was widely believed to be Facebook&rsquo;s compromise instead of adding a Dislike button.</p>
<figure>

    <img loading="lazy" srcset="/2016/02/facebook-reactions/facebook_react_hu_844fda335951da9b.webp 320w,/2016/02/facebook-reactions/facebook_react_hu_86d02679c7d58cf3.webp 768w,/2016/02/facebook-reactions/facebook_react.png 828w" src="facebook_react.png"/> 
</figure>

<p>Of course, there&rsquo;s an ulterior motive. The use of reactions provides organic data on the sentiment of a status, which is helpful for numerous marketing and statistical applications. As <a href="http://www.buzzfeed.com/alexkantrowitz/facebook-reactions-launch-today">BuzzFeed notes</a>, Facebook ads may be able &ldquo;to write one product message for someone who mostly uses <strong>Sad</strong> and another who mostly uses <strong>Wow</strong> or <strong>Love.</strong>&rdquo;</p>
<p>However, this isn&rsquo;t the first time a big social network has tried implementing reactions alongside Likes/Dislikes. Four years ago, YouTube added <a href="http://googlesystem.blogspot.com/2011/06/youtube-reactions.html">Reaction buttons</a> to their comments section:</p>
<figure>

    <img loading="lazy" srcset="/2016/02/facebook-reactions/youtube-reactions_hu_7a4632d04eb1ee92.webp 320w,/2016/02/facebook-reactions/youtube-reactions.png 572w" src="youtube-reactions.png"/> 
</figure>

<p>&hellip;and removed them sometime after without fanfare, replacing them with the simple Like/Dislike bar.</p>
<p>Presumably, YouTube implemented the buttons for a similar reason to Facebook. What makes things different now, if anything?</p>
<h2 id="a-quantitative-approach-to-feeling">A Quantitative Approach to Feeling</h2>
<p>Even after YouTube&rsquo;s failure, another data-driven website implemented reaction buttons: BuzzFeed (who else?). At the end of each article (in most categories), registered users can select a quirky reaction to indicate how they felt about the article.</p>
<figure>

    <img loading="lazy" srcset="/2016/02/facebook-reactions/buzzfeedreactions_hu_d32803738bee0d06.webp 320w,/2016/02/facebook-reactions/buzzfeedreactions.png 646w" src="buzzfeedreactions.png"/> 
</figure>

<p>The heart represents <strong>Love</strong> internally and is by far the most-used reaction on BuzzFeed posts. When I started scraping BuzzFeed data in 2014 <a href="http://minimaxir.com/2015/01/linkbait/">to analyze clickbait</a>, I made sure to grab the data for the other reactions as well, to see if there were any interesting trends or correlations between reactions. A cursory glance at the scraped reaction data revealed a problem that forced me to disregard it.</p>
<p>An important part of variable selection for analysis and modeling is avoiding <em>redundant</em> features, as redundancy can cause issues such as <a href="https://en.wikipedia.org/wiki/Multicollinearity">multicollinearity</a> and <a href="https://en.wikipedia.org/wiki/Overfitting">overfitting</a>. For Facebook, avoiding redundant Reactions was an <a href="https://medium.com/facebook-design/reactions-not-everything-in-life-is-likable-5c403de72a3f">explicit design goal</a> of the feature, but the positive emotions such as <strong>Like</strong> and <strong>Wow</strong> might be overly similar regardless (I believe it is fair to compare the behavior of BuzzFeed users with the average Facebook user, given that they hit the same demographics). Do BuzzFeed readers use specific positive reactions differently? Do they use specific negative reactions differently?</p>
<p>I rechecked my 2014 data in light of Facebook Reactions. The scraped dataset contains reaction data from 9,883 BuzzFeed articles in the Celebrity, Animals, Books, Longform, and Business categories. From that, I made a <a href="http://vita.had.co.nz/papers/gpp.pdf">pairs plot</a> for the counts of all the <em>positive</em> reactions on the articles to illustrate all bivariate relationships:</p>
<ul>
<li>The lower half of the pairs plot is a scatterplot for the two reactions; the axes represent the number of votes for a given reaction on a BuzzFeed article (both axes are scaled logarithmically), color intensity indicates the number of articles at that X/Y combo, and the line is a least-squares linear trendline.</li>
<li>The diagonal of the pairs plot represents the density distribution of vote counts for that reaction (also logarithmically scaled on the X axis).</li>
<li>The upper half of the pairs plot illustrates the Pearson correlation between the non-log quantities of the two reaction variables. The stars represent statistical significance of the correlation test; since the data set is large, all correlations are statistically significant (rejection of null hypothesis of no correlation) at p &lt; 0.001.</li>
</ul>
<figure>

    <img loading="lazy" srcset="/2016/02/facebook-reactions/buzzfeed-pos_hu_d6d7a26193c3ada0.webp 320w,/2016/02/facebook-reactions/buzzfeed-pos_hu_286467cf9bb9a86.webp 768w,/2016/02/facebook-reactions/buzzfeed-pos_hu_2ddf9da10f86ca62.webp 1024w,/2016/02/facebook-reactions/buzzfeed-pos.png 1600w" src="buzzfeed-pos.png"/> 
</figure>
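<p>To make the upper and lower halves of the pairs plot concrete, here is a minimal sketch on <em>simulated</em> counts, not the scraped BuzzFeed data. The variable names <code>love</code> and <code>cute</code>, the shared popularity factor, and the <code>+ 1</code> offset before taking logs are all assumptions for illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"># Hypothetical, simulated reaction counts; not the real dataset.
set.seed(42)
n_articles &lt;- 1000

# A shared popularity factor makes the two reactions positively correlated.
popularity &lt;- rlnorm(n_articles, meanlog = 3, sdlog = 1)
love &lt;- rpois(n_articles, lambda = popularity)
cute &lt;- rpois(n_articles, lambda = popularity / 10)

# Upper half: Pearson correlation on the raw (non-log) counts.
result &lt;- cor.test(love, cute, method = &#34;pearson&#34;)
result$estimate  # correlation coefficient
result$p.value   # test of the null hypothesis of no correlation

# Lower half: least-squares trendline on log-scaled counts.
# The + 1 offset only keeps zero counts finite under the log.
trend &lt;- lm(log10(cute + 1) ~ log10(love + 1))
coef(trend)
</code></pre></div><p>With a large number of articles, even a modest correlation produces a tiny p-value, which is why the significance stars alone say little about how redundant two reactions are.</p>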

<p>All of the bivariate correlations of positive reactions are <em>moderately or strongly positively correlated</em>, which is problematic for analysis (except one: apparently, there is little statistical relationship between things that are cute and things that make you go YAAASS). So why not just use the <strong>Love</strong> reaction, since articles tend to get about 100 Loves, while other reactions get around 10?</p>
<p>Does the same hold for negative reactions? Relatedly, we would also expect a negative correlation between the number of <strong>Love</strong> reactions and negative reactions, right?</p>
<figure>

    <img loading="lazy" srcset="/2016/02/facebook-reactions/buzzfeed-neg_hu_621abb3f15b07deb.webp 320w,/2016/02/facebook-reactions/buzzfeed-neg_hu_74efa270902d7802.webp 768w,/2016/02/facebook-reactions/buzzfeed-neg_hu_70af131959efaab2.webp 1024w,/2016/02/facebook-reactions/buzzfeed-neg.png 1600w" src="buzzfeed-neg.png"/> 
</figure>

<p>All negative reactions are positively correlated, as expected, but there is a weak <em>positive</em> correlation between <strong>Love</strong> and <strong>Hate</strong>, which is definitely not right. There isn&rsquo;t an ideal &ldquo;negative&rdquo; reaction, since all have similar distributions.</p>
<p>Why does Facebook have 6 different responses to gauge positivity or negativity when one reaction for each would be both more accurate and more intuitive for the user?</p>
<h2 id="conceal-dont-feel">Conceal, Don&rsquo;t Feel</h2>
<p>There are other qualitative issues with Facebook&rsquo;s current implementation of Reactions. Apparently, Likes and Reactions are treated <em>differently internally</em>. As a result, you get separate notifications for Likes and Reactions.</p>
<figure>

    <img loading="lazy" srcset="/2016/02/facebook-reactions/facebook_react2_hu_462102ea064f125f.webp 320w,/2016/02/facebook-reactions/facebook_react2_hu_cdb8c15334891015.webp 768w,/2016/02/facebook-reactions/facebook_react2.png 874w" src="facebook_react2.png"/> 
</figure>

<p>Why? No idea. There is enough Notification spam on Facebook; I don&rsquo;t need <em>double notifications</em> in my Notification feed for every status I make.</p>
<p>What&rsquo;s important to note is that a user cannot both Like and React to a status; only one or the other. As a result, the number of Likes on statuses overall will drop, and this is a <em>major</em> problem for businesses that depend on measuring the number of Likes for engagement.</p>
<p>I took a look at the Facebook Graph API endpoint for <a href="https://developers.facebook.com/docs/graph-api/reference/v2.5/post">Facebook Page Posts</a> (same endpoint I use for my <a href="https://github.com/minimaxir/facebook-page-post-scraper">Facebook Page Data Scraper</a>), and I can confirm that the API can only report the number of Likes on a status; not the number of Likes + Reactions, or number of Likes + number of each Reaction.</p>
<figure>

    <img loading="lazy" srcset="/2016/02/facebook-reactions/cnn_fb_hu_7b868416e9ef2e61.webp 320w,/2016/02/facebook-reactions/cnn_fb_hu_cc31054c813a0b29.webp 768w,/2016/02/facebook-reactions/cnn_fb_hu_22634dbc600cd4ba.webp 1024w,/2016/02/facebook-reactions/cnn_fb.jpg 1800w" src="cnn_fb.jpg"/> 
</figure>
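<p>For reference, a Like-count request of this kind might be built as in the sketch below. This only constructs the request URL and does not call the API. The post ID and access token are placeholders, and the <code>likes.limit(0).summary(true)</code> field expansion is my assumption about how the total is requested, not necessarily the exact query behind the screenshot.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"># Placeholder values: replace with a real post ID and access token.
post_id &lt;- &#34;PAGE-ID_POST-ID&#34;
access_token &lt;- &#34;YOUR-ACCESS-TOKEN&#34;

# Assumed field expansion: ask for the total Like count without listing each Like.
fields &lt;- &#34;likes.limit(0).summary(true)&#34;

# Build the request URL against the v2.5 endpoint.
request_url &lt;- paste0(
  &#34;https://graph.facebook.com/v2.5/&#34;, post_id,
  &#34;?fields=&#34;, fields,
  &#34;&amp;access_token=&#34;, access_token
)
request_url
</code></pre></div>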

<p>There is no way currently to automate the retrieval of Reactions data from Facebook posts, which is an unfortunate oversight (especially considering how Twitter <a href="https://blog.twitter.com/2015/hearts-for-developers">handled the transition</a> from Favorites to Likes easily).</p>
<p>The example <a href="https://www.facebook.com/cnn/posts/10154506885211509">CNN story</a> I used for that screenshot is anecdotally one of the very few examples I&rsquo;ve noticed where the number of Likes is <em>almost equal</em> to the number of negative reactions. Those two counts should be only weakly correlated, so this knowledge may be useful for isolating the story as unusual (and serving ads accordingly). At Facebook&rsquo;s immense scale, identifying a relatively small proportion of unusual stories might be enough to justify adding Reactions.</p>
<p>Or maybe this feature is just the harbinger of a new generation of emotionally-charged linkbait. Perhaps there is more to this Facebook Reactions data than what meets the eye, and I&rsquo;ll update my scripts and do further statistical analysis when able. But given what has happened with Reactions data before with YouTube, I am unconvinced and I still believe the functionality as a whole is a usability regression that won&rsquo;t last.</p>
<p>A Dislike button would have been better, just saying.</p>
<hr>
<p><em>You can view the code and data used to generate the BuzzFeed Reaction data visualizations <a href="https://github.com/minimaxir/facebook-reactions/blob/master/buzzfeed_reactions.ipynb">in this Jupyter notebook</a>, <a href="https://github.com/minimaxir/facebook-reactions">open-sourced on GitHub</a>, or you can <a href="https://github.com/minimaxir/facebook-reactions/raw/master/reactions_pdf.pdf">view as a PDF</a>, which is better if you are on a mobile device.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>An Introduction on How to Make Beautiful Charts With R and ggplot2</title>
      <link>https://minimaxir.com/2015/02/ggplot-tutorial/</link>
      <pubDate>Thu, 12 Feb 2015 08:00:00 -0700</pubDate>
      <guid>https://minimaxir.com/2015/02/ggplot-tutorial/</guid>
      <description>Adding a touch of color and design can help make more compelling visualizations, thanks to ggplot2 syntax and chaining capabilities.</description>
      <content:encoded><![CDATA[<p><em><strong>UPDATE August 2017</strong>: I have published an <a href="http://minimaxir.com/2017/08/ggplot2-web/">updated version</a> of this post with modern trends for making high quality charts with R and ggplot2, which may be a helpful resource in addition to this post.</em></p>
<p>Readers of my previous blog posts have frequently asked me &ldquo;how do you make those charts?&rdquo;</p>
<figure>

    <img loading="lazy" srcset="/2015/02/ggplot-tutorial/buzzfeed-listicle-scatterplot_hu_34ec28bc68eb8132.webp 320w,/2015/02/ggplot-tutorial/buzzfeed-listicle-scatterplot_hu_7bc795e151287e5d.webp 768w,/2015/02/ggplot-tutorial/buzzfeed-listicle-scatterplot_hu_1e739f5b63048060.webp 1024w,/2015/02/ggplot-tutorial/buzzfeed-listicle-scatterplot.png 1200w" src="buzzfeed-listicle-scatterplot.png"/> 
</figure>

<p>These charts were made using <a href="http://docs.ggplot2.org/current/">ggplot2</a>, an add-on package for the <a href="http://www.r-project.org/index.html">R programming language</a>, along with lots of iterative improvement over the months. R notably has chart-making capabilities built into the language by default, but they are not easy to use and often produce <em>very</em> simplistic charts. Enter ggplot2, which allows users to create full-featured and robust charts with only a few lines of code.</p>
<figure>

    <img loading="lazy" srcset="/2015/02/ggplot-tutorial/geom_histogram-4_hu_d270d1d30dedd57d.webp 320w,/2015/02/ggplot-tutorial/geom_histogram-4.png 400w" src="geom_histogram-4.png"/> 
</figure>

<p>You&rsquo;ve probably seen charts elsewhere on the internet similar to this one. While it implements the &ldquo;<a href="http://vita.had.co.nz/papers/layered-grammar.html">Grammar of Graphics</a>&rdquo; (which is where the &ldquo;gg&rdquo; in &ldquo;ggplot2&rdquo; comes from), it does look generic and cluttered.</p>
<p>Adding a touch of color and design can help make more compelling visualizations, and it&rsquo;s pretty easy to do thanks to ggplot2&rsquo;s syntax and chaining capabilities.</p>
<h2 id="quick-design-notes">Quick Design Notes</h2>
<p>Charts with a completely gray background have become rather popular lately, in large part due to the charts produced by <a href="http://fivethirtyeight.com/">FiveThirtyEight</a>, which was the inspiration behind my design. An important functional aspect of a gray background is that it makes the chart area distinct from the article body.</p>
<p>The charts I make are typically 1200px by 900px. On my blog, the width of the article text container is less than 1200px, so the browser shrinks the chart to make it fit. The chart still appears at a high resolution on HiDPI/Retina screens, and since the charts are simple, shrinking will not cause significant graphical distortion on normal-resolution screens. 1200x900px also keeps the file size low, which is important when putting 10 or more charts in a post.</p>
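<p>The 1200px by 900px figure is just the output size in inches multiplied by the DPI. As a quick check, assuming a 4 by 3 inch chart at 300 DPI (the same values passed to <code>ggsave</code> later in this post):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"># Assumed output settings: 4 x 3 inches at 300 dots per inch.
size_in &lt;- c(width = 4, height = 3)
dpi &lt;- 300

# Pixel dimensions are inches times DPI: 1200 wide by 900 tall.
size_px &lt;- size_in * dpi
size_px
</code></pre></div>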
<p>An important tip when making charts in ggplot2: render the chart on OS X, if possible. OS X has antialiasing for text and curves in charts, while Windows/Linux does not, and it can significantly improve the quality of the chart.</p>
<h2 id="making-a-ggplot2-histogram">Making a ggplot2 Histogram</h2>
<p>The first chart we&rsquo;ll be making is a histogram. This is a good example of a chart that&rsquo;s easy to make in R/ggplot2, but hard to make in Excel.</p>
<p>For this tutorial, we&rsquo;ll be using <code>ggplot2</code>, plus three additional R packages: <code>RColorBrewer</code>, which allows for the procedural generation of colors from a palette for the chart, <code>scales</code>, which allows for the axes to express numbers with commas/percents, and <code>grid</code>, which allows for manipulation of the chart margins and layout. We can install and load these packages at the beginning of the R file:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">install.packages</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s">&#34;ggplot2&#34;</span><span class="p">,</span><span class="s">&#34;RColorBrewer&#34;</span><span class="p">,</span><span class="s">&#34;scales&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">);</span> <span class="nf">library</span><span class="p">(</span><span class="n">scales</span><span class="p">);</span> <span class="nf">library</span><span class="p">(</span><span class="n">grid</span><span class="p">);</span> <span class="nf">library</span><span class="p">(</span><span class="n">RColorBrewer</span><span class="p">)</span>
</span></span></code></pre></div><p>The dataset we&rsquo;ll use is my <a href="http://minimaxir.com/csv/buzzfeed_linkbait_headlines.csv">list of 15,101 BuzzFeed listicles</a> that I used <a href="http://minimaxir.com/2015/01/linkbait/">in my previous blog post</a>. It includes both the listicle size and the number of Facebook shares each listicle received, and has been prefiltered to listicles with 50 or fewer entries and at least 1 Facebook share. Download the file, and set the working directory of R to the containing folder. We load the dataset into R by reading the CSV:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="n">df</span> <span class="o">&lt;-</span> <span class="nf">read.csv</span><span class="p">(</span><span class="s">&#34;buzzfeed_linkbait_headlines.csv&#34;</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="bp">T</span><span class="p">)</span>
</span></span></code></pre></div><p>We can make a basic histogram in two lines of code.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">ggplot</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">listicle_size</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">geom_histogram</span><span class="p">(</span><span class="n">binwidth</span><span class="o">=</span><span class="m">1</span><span class="p">)</span>
</span></span></code></pre></div><p>The first line instantiates the chart and defines the variables used for plotting. We declare the use of the data frame <code>df</code>, and the <code>listicle_size</code> vector from that data frame as the plotting aesthetic. The second line tells ggplot to make a histogram out of the given data with <code>geom_histogram</code>, and we specify a binwidth of 1 so that each column represents one discrete listicle size. Running that code will cause a plot to pop up.</p>
<figure>

    <img loading="lazy" srcset="/2015/02/ggplot-tutorial/tutorial_1_hu_bce0f394e88ac2c0.webp 320w,/2015/02/ggplot-tutorial/tutorial_1_hu_e2beb498e49df054.webp 768w,/2015/02/ggplot-tutorial/tutorial_1_hu_dfd61e72f1fc7ce1.webp 1024w,/2015/02/ggplot-tutorial/tutorial_1.png 1200w" src="tutorial_1.png"/> 
</figure>
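<p>As a quick sanity check on what the bars mean: with a binwidth of 1 on integer data, each bar&rsquo;s height is simply the number of rows with that value. Here is a minimal base R sketch, using a small made-up vector as a stand-in for <code>df$listicle_size</code> (the values are hypothetical):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"># Hypothetical listicle sizes standing in for df$listicle_size.
listicle_size &lt;- c(10, 10, 12, 15, 15, 15, 21, 25, 25, 50)

# Count how many listicles have each size; these are the bar heights.
counts &lt;- table(listicle_size)
counts
</code></pre></div><p>Unlike the histogram, <code>table</code> only lists sizes that actually occur, so empty bins do not appear in its output.</p>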

<p>Not a bad start. In order to save the created plot, we use the <code>ggsave</code> command, which saves the last-generated plot to an image in your working directory. The first parameter, the filename, determines the filetype.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">ggsave</span><span class="p">(</span><span class="s">&#34;tutorial_1.png&#34;</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="m">300</span><span class="p">,</span> <span class="n">width</span><span class="o">=</span><span class="m">4</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="m">3</span><span class="p">)</span>
</span></span></code></pre></div><p>Now we can add a theme to make it look classy.</p>
<p>A ggplot2 theme is a function that overrides the graphical parameters of the default theme. Here&rsquo;s the long code block for my FiveThirtyEight-inspired theme, with code comments for each code subblock:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="n">fte_theme</span> <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Generate the colors for the chart procedurally with RColorBrewer</span>
</span></span><span class="line"><span class="cl"><span class="n">palette</span> <span class="o">&lt;-</span> <span class="nf">brewer.pal</span><span class="p">(</span><span class="s">&#34;Greys&#34;</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="m">9</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">color.background</span> <span class="o">=</span> <span class="n">palette[2]</span>
</span></span><span class="line"><span class="cl"><span class="n">color.grid.major</span> <span class="o">=</span> <span class="n">palette[3]</span>
</span></span><span class="line"><span class="cl"><span class="n">color.axis.text</span> <span class="o">=</span> <span class="n">palette[6]</span>
</span></span><span class="line"><span class="cl"><span class="n">color.axis.title</span> <span class="o">=</span> <span class="n">palette[7]</span>
</span></span><span class="line"><span class="cl"><span class="n">color.title</span> <span class="o">=</span> <span class="n">palette[9]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Begin construction of chart</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme_bw</span><span class="p">(</span><span class="n">base_size</span><span class="o">=</span><span class="m">9</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Set the entire chart region to a light gray color</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">panel.background</span><span class="o">=</span><span class="nf">element_rect</span><span class="p">(</span><span class="n">fill</span><span class="o">=</span><span class="n">color.background</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="n">color.background</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">plot.background</span><span class="o">=</span><span class="nf">element_rect</span><span class="p">(</span><span class="n">fill</span><span class="o">=</span><span class="n">color.background</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="n">color.background</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">panel.border</span><span class="o">=</span><span class="nf">element_rect</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="n">color.background</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Format the grid</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">panel.grid.major</span><span class="o">=</span><span class="nf">element_line</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="n">color.grid.major</span><span class="p">,</span><span class="n">size</span><span class="o">=</span><span class="m">.25</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">panel.grid.minor</span><span class="o">=</span><span class="nf">element_blank</span><span class="p">())</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">axis.ticks</span><span class="o">=</span><span class="nf">element_blank</span><span class="p">())</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Format the legend, but hide by default</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">legend.position</span><span class="o">=</span><span class="s">&#34;none&#34;</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">legend.background</span> <span class="o">=</span> <span class="nf">element_rect</span><span class="p">(</span><span class="n">fill</span><span class="o">=</span><span class="n">color.background</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">legend.text</span> <span class="o">=</span> <span class="nf">element_text</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="m">7</span><span class="p">,</span><span class="n">color</span><span class="o">=</span><span class="n">color.axis.title</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Set title and axis labels, and format these and tick marks</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">plot.title</span><span class="o">=</span><span class="nf">element_text</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="n">color.title</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="m">10</span><span class="p">,</span> <span class="n">vjust</span><span class="o">=</span><span class="m">1.25</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">axis.text.x</span><span class="o">=</span><span class="nf">element_text</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="m">7</span><span class="p">,</span><span class="n">color</span><span class="o">=</span><span class="n">color.axis.text</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">axis.text.y</span><span class="o">=</span><span class="nf">element_text</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="m">7</span><span class="p">,</span><span class="n">color</span><span class="o">=</span><span class="n">color.axis.text</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">axis.title.x</span><span class="o">=</span><span class="nf">element_text</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="m">8</span><span class="p">,</span><span class="n">color</span><span class="o">=</span><span class="n">color.axis.title</span><span class="p">,</span> <span class="n">vjust</span><span class="o">=</span><span class="m">0</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">axis.title.y</span><span class="o">=</span><span class="nf">element_text</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="m">8</span><span class="p">,</span><span class="n">color</span><span class="o">=</span><span class="n">color.axis.title</span><span class="p">,</span> <span class="n">vjust</span><span class="o">=</span><span class="m">1.25</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Plot margins</span>
</span></span><span class="line"><span class="cl"><span class="nf">theme</span><span class="p">(</span><span class="n">plot.margin</span> <span class="o">=</span> <span class="nf">unit</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">0.35</span><span class="p">,</span> <span class="m">0.2</span><span class="p">,</span> <span class="m">0.3</span><span class="p">,</span> <span class="m">0.35</span><span class="p">),</span> <span class="s">&#34;cm&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>Adding the completed theme to the chart is just one line of code:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">ggplot</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">listicle_size</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">geom_histogram</span><span class="p">(</span><span class="n">binwidth</span><span class="o">=</span><span class="m">1</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">fte_theme</span><span class="p">()</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2015/02/ggplot-tutorial/tutorial_2_hu_4fd405b53345eac6.webp 320w,/2015/02/ggplot-tutorial/tutorial_2_hu_20b8cfe807277fb.webp 768w,/2015/02/ggplot-tutorial/tutorial_2_hu_d22d27b176715362.webp 1024w,/2015/02/ggplot-tutorial/tutorial_2.png 1200w" src="tutorial_2.png"/> 
</figure>

<p>A little more classy. Now that the core design of the chart is present, we can polish the chart to make it more beautiful.</p>
<p>Of course, all charts need properly labeled axes and a title. We can add that with the <code>labs</code> function:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">ggplot</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">listicle_size</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">geom_histogram</span><span class="p">(</span><span class="n">binwidth</span><span class="o">=</span><span class="m">1</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">fte_theme</span><span class="p">()</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">labs</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="s">&#34;Distribution of Listicle Sizes for BuzzFeed Listicles&#34;</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s">&#34;# of Entries in Listicle&#34;</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">&#34;# of Listicles&#34;</span><span class="p">)</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2015/02/ggplot-tutorial/tutorial_3_hu_27bb341b7e7c648a.webp 320w,/2015/02/ggplot-tutorial/tutorial_3_hu_d13b0e82d65d3dec.webp 768w,/2015/02/ggplot-tutorial/tutorial_3_hu_7de37758bbf0f095.webp 1024w,/2015/02/ggplot-tutorial/tutorial_3.png 1200w" src="tutorial_3.png"/> 
</figure>

<p>Now we can add a few finishing touches. For the x-axis, we can set the breaks to every 5 instead of every 10 using <code>scale_x_continuous</code> since we have the room. For the y-axis, since we have an axis value with 4 digits, we can format the labels with a comma using <code>scale_y_continuous</code>. Next, we can add a line at y = 0 using <code>geom_hline</code> to further separate the data. Lastly, in <code>geom_histogram</code>, we can change the fill of the bars to a red color for more thematic branding, and also reduce the opacity to make the grid lines visible behind the chart.</p>
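<p>As a rough sketch of what those two scale arguments do, the snippet below builds the x-axis break vector and formats a 4-digit count with a thousands separator. It uses only base R: <code>format</code> with <code>big.mark</code> stands in here for the <code>comma</code> formatter from the scales package, and the count of 1500 is a made-up value for illustration, not taken from the dataset.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"># Break positions every 5 units from 0 to 50, as passed to scale_x_continuous
x_breaks &lt;- seq(0, 50, by=5)
print(x_breaks)

# Hypothetical 4-digit bar height; base-R stand-in for the scales comma formatter
y_label &lt;- format(1500, big.mark=&#34;,&#34;)
print(y_label)
</code></pre></div>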
<p>Putting it all together:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">ggplot</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">listicle_size</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl"> 	<span class="nf">geom_histogram</span><span class="p">(</span><span class="n">binwidth</span><span class="o">=</span><span class="m">1</span><span class="p">,</span> <span class="n">fill</span><span class="o">=</span><span class="s">&#34;#c0392b&#34;</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="m">0.75</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">fte_theme</span><span class="p">()</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">labs</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="s">&#34;Distribution of Listicle Sizes for BuzzFeed Listicles&#34;</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s">&#34;# of Entries in Listicle&#34;</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">&#34;# of Listicles&#34;</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">scale_x_continuous</span><span class="p">(</span><span class="n">breaks</span><span class="o">=</span><span class="nf">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">50</span><span class="p">,</span> <span class="n">by</span><span class="o">=</span><span class="m">5</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">scale_y_continuous</span><span class="p">(</span><span class="n">labels</span><span class="o">=</span><span class="n">comma</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">geom_hline</span><span class="p">(</span><span class="n">yintercept</span><span class="o">=</span><span class="m">0</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="m">0.4</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">&#34;black&#34;</span><span class="p">)</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2015/02/ggplot-tutorial/tutorial_4_hu_b9ceb93464d0d6ac.webp 320w,/2015/02/ggplot-tutorial/tutorial_4_hu_d47bdc2ebcbd77d9.webp 768w,/2015/02/ggplot-tutorial/tutorial_4_hu_3c495262743728f7.webp 1024w,/2015/02/ggplot-tutorial/tutorial_4.png 1200w" src="tutorial_4.png"/> 
</figure>

<p>That&rsquo;s pretty professional and is a good stopping point. Normally, I would change the text fonts as well, but that&rsquo;s a subject for another post.</p>
<h2 id="making-a-ggplot2-scatterplot">Making a ggplot2 Scatterplot</h2>
<p>Scatterplots are also efficient to do in ggplot2, which is especially useful as making a plot containing 15,101 points might cause spreadsheets to freeze.</p>
<p>Creating a scatterplot of the relationship between listicle size and the number of Facebook shares the listicle receives is essentially the same procedure as creating a histogram, except that the x-axis and y-axis aesthetic vectors must be declared explicitly.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">ggplot</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">listicle_size</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">num_fb_shares</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">geom_point</span><span class="p">()</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2015/02/ggplot-tutorial/tutorial_5_hu_1efa002c2ca731eb.webp 320w,/2015/02/ggplot-tutorial/tutorial_5_hu_50aaad3d20287488.webp 768w,/2015/02/ggplot-tutorial/tutorial_5_hu_89f3f718f94bd76c.webp 1024w,/2015/02/ggplot-tutorial/tutorial_5.png 1200w" src="tutorial_5.png"/> 
</figure>

<p>Because there are a few listicles with <em>over 1 million</em> Facebook shares (welcome to 2015), the entire plot is skewed. As a result, we need to compress the plot by scaling the y-axis logarithmically using <code>scale_y_log10</code>. Additionally, there will be a large amount of overlap between points due to the large sample size, so we need to greatly reduce the opacity of the points. (I set it to 5% for this chart, but the best value can be determined through trial and error.)</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">ggplot</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">listicle_size</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">num_fb_shares</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">geom_point</span><span class="p">(</span><span class="n">alpha</span><span class="o">=</span><span class="m">0.05</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">scale_y_log10</span><span class="p">(</span><span class="n">labels</span><span class="o">=</span><span class="n">comma</span><span class="p">)</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2015/02/ggplot-tutorial/tutorial_6_hu_1409787eb4a44057.webp 320w,/2015/02/ggplot-tutorial/tutorial_6_hu_e7b0cea0547fd231.webp 768w,/2015/02/ggplot-tutorial/tutorial_6_hu_693fca0a69cb5e07.webp 1024w,/2015/02/ggplot-tutorial/tutorial_6.png 1200w" src="tutorial_6.png"/> 
</figure>

<p>That&rsquo;s a lot more intuitive, and it makes it clear that there is indeed a positive relationship between listicle size and the number of Facebook shares.</p>
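<p>To see why the log scale compresses the plot, here is a minimal sketch with made-up share counts (the <code>shares</code> vector is hypothetical, not drawn from the dataset). On a linear axis the 1,000,000 value dwarfs everything else, while after <code>log10</code> the values are evenly spaced. One caveat worth knowing: <code>log10(0)</code> is <code>-Inf</code>, so a listicle with zero shares has no finite position on a log axis.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"># Hypothetical share counts spanning several orders of magnitude
shares &lt;- c(1, 100, 10000, 1000000)

# Positions on a log10 axis: each 100x jump becomes a step of 2
log_shares &lt;- log10(shares)
print(log_shares)

# Zero has no finite position on a log scale
print(log10(0))
</code></pre></div>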
<p>Now we can apply the theme and labels:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">ggplot</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">listicle_size</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">num_fb_shares</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">geom_point</span><span class="p">(</span><span class="n">alpha</span><span class="o">=</span><span class="m">0.05</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">scale_y_log10</span><span class="p">(</span><span class="n">labels</span><span class="o">=</span><span class="n">comma</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">fte_theme</span><span class="p">()</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">labs</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">&#34;# of Entries in Listicle&#34;</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">&#34;# of Facebook Shares&#34;</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">&#34;FB Shares vs. Listicle Size for BuzzFeed Listicles&#34;</span><span class="p">)</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2015/02/ggplot-tutorial/tutorial_7_hu_235d915d0d56a15d.webp 320w,/2015/02/ggplot-tutorial/tutorial_7_hu_cdff48a5c5f9f001.webp 768w,/2015/02/ggplot-tutorial/tutorial_7_hu_dbdc9597dc29b536.webp 1024w,/2015/02/ggplot-tutorial/tutorial_7.png 1200w" src="tutorial_7.png"/> 
</figure>

<p>And then the final touches. We can include the same horizontal line, x-axis behavior, and red color as with the last plot; the horizontal line sits at y = 1 rather than y = 0, since a log scale has no zero. However, for the y-axis, we have room to include each power of 10 between 1 and 1,000,000 as breaks, which we can do through a cute R syntax trick: <code>10^(0:6)</code>. While the chart shows a positive relationship between the variables, the shape is ambiguous and it may be helpful to add a trend line. We use <code>geom_smooth</code> to add a trend line representing a <a href="http://www.inside-r.org/r-doc/mgcv/gam">generalized additive model</a> with a 95% confidence interval.</p>
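<p>The parentheses in <code>10^(0:6)</code> matter, because in R the <code>^</code> operator binds more tightly than <code>:</code>. A small base R sketch of the difference (the variable names are only for illustration):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"># 0:6 is expanded first, then 10 is raised to each element
log_breaks &lt;- 10^(0:6)
print(log_breaks)

# Without parentheses this parses as (10^0):6, which is just 1:6
wrong_breaks &lt;- 10^0:6
print(wrong_breaks)
</code></pre></div>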
<p>Putting it all together:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">ggplot</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">listicle_size</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">num_fb_shares</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">geom_point</span><span class="p">(</span><span class="n">alpha</span><span class="o">=</span><span class="m">0.05</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">&#34;#c0392b&#34;</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">scale_x_continuous</span><span class="p">(</span><span class="n">breaks</span><span class="o">=</span><span class="nf">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">50</span><span class="p">,</span> <span class="n">by</span><span class="o">=</span><span class="m">5</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">scale_y_log10</span><span class="p">(</span><span class="n">labels</span><span class="o">=</span><span class="n">comma</span><span class="p">,</span> <span class="n">breaks</span><span class="o">=</span><span class="m">10</span><span class="nf">^</span><span class="p">(</span><span class="m">0</span><span class="o">:</span><span class="m">6</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">geom_hline</span><span class="p">(</span><span class="n">yintercept</span><span class="o">=</span><span class="m">1</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="m">0.4</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">&#34;black&#34;</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">geom_smooth</span><span class="p">(</span><span class="n">alpha</span><span class="o">=</span><span class="m">0.25</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">&#34;black&#34;</span><span class="p">,</span> <span class="n">fill</span><span class="o">=</span><span class="s">&#34;black&#34;</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">fte_theme</span><span class="p">()</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">	<span class="nf">labs</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">&#34;# of Entries in Listicle&#34;</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">&#34;# of Facebook Shares&#34;</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">&#34;FB Shares vs. Listicle Size for BuzzFeed Listicles&#34;</span><span class="p">)</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2015/02/ggplot-tutorial/tutorial_8_hu_f02e93ab4c4243d9.webp 320w,/2015/02/ggplot-tutorial/tutorial_8_hu_a9c57497fd476c46.webp 768w,/2015/02/ggplot-tutorial/tutorial_8_hu_edd772a45506a640.webp 1024w,/2015/02/ggplot-tutorial/tutorial_8.png 1200w" src="tutorial_8.png"/> 
</figure>

<p>Now that is pretty insightful.</p>
<p>Hopefully, this small overview of ggplot2 gives you a small idea of what it can do. This is just the tip of the iceberg. However, making cooler charts such as categorical bar charts, charts with multiple factor variables, and charts with multiple facets requires smart data preprocessing, which is a topic for another blog post.</p>
<hr>
<p><em>You can access a copy of the code used in this blog post <a href="https://github.com/minimaxir/ggplot-tutorial">at this GitHub repository</a>.</em></p>
]]></content:encoded>
    </item>
  </channel>
</rss>
