<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Word Clouds on Max Woolf&#39;s Blog</title>
    <link>https://minimaxir.com/tag/word-clouds/</link>
    <description>Recent content in Word Clouds on Max Woolf&#39;s Blog</description>
    <image>
      <title>Max Woolf&#39;s Blog</title>
      <url>https://minimaxir.com/android-chrome-512x512.png</url>
      <link>https://minimaxir.com/android-chrome-512x512.png</link>
    </image>
    <generator>Hugo</generator>
    <language>en</language>
    <copyright>Copyright Max Woolf © 2026</copyright>
    <lastBuildDate>Mon, 09 May 2016 08:00:00 -0700</lastBuildDate>
    <atom:link href="https://minimaxir.com/tag/word-clouds/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Creating Stylish, High-Quality Word Clouds Using Python and Font Awesome Icons</title>
      <link>https://minimaxir.com/2016/05/wordclouds/</link>
      <pubDate>Mon, 09 May 2016 08:00:00 -0700</pubDate>
      <guid>https://minimaxir.com/2016/05/wordclouds/</guid>
      <description>Why not make a word cloud which looks like a line chart?</description>
      <content:encoded><![CDATA[<p>You&rsquo;ve probably seen word clouds around the internet. There are several popular free tools for creating them, such as <a href="http://www.wordle.net">Wordle</a>. I myself am a fan of them, and I have made them for previous posts using the <a href="http://www.inside-r.org/packages/cran/wordcloud/docs/wordcloud">wordcloud package</a> for R.</p>
<p>Word clouds are not the most scientific type of data visualization. However, they are a very <em>information-dense</em> representation of the frequency of all words in a given text. Word clouds are more effective than just using bar charts displaying the counts of words for large amounts of text, as the chart would be difficult to parse if there are too many bars.</p>
<p>The <a href="https://github.com/amueller/word_cloud">Python word_cloud package</a> by Andreas Mueller is relatively popular. A Reddit bot <a href="https://github.com/Winneon/makeswordclouds">makeswordclouds</a> by Jesse Bryan automatically <a href="https://www.reddit.com/user/makeswordcloudsagain">generates a word cloud</a> of comments on <a href="https://www.reddit.com">Reddit</a> submissions using this package. However, when I first saw the example output on the package, I was not impressed.</p>
<figure>

    <img loading="lazy" srcset="/2016/05/wordclouds/constitution_hu_b47d362759f84e4f.webp 320w,/2016/05/wordclouds/constitution.png 400w" src="constitution.png"/> 
</figure>

<p>I did more research into the <a href="http://amueller.github.io/word_cloud/">package documentation</a>. I found that there are two important perks present in the Python implementation:</p>
<ol>
<li>Python word_cloud allows the user to specify a mask to constrain the distribution of words.</li>
</ol>
<figure>

    <img loading="lazy" srcset="/2016/05/wordclouds/a_new_hope_1_hu_d9564f860998e9fe.webp 320w,/2016/05/wordclouds/a_new_hope_1_hu_c5d7d708a3cebcb4.webp 768w,/2016/05/wordclouds/a_new_hope_1.png 800w" src="a_new_hope_1.png"/> 
</figure>

<ol start="2">
<li>In addition to the mask, Python word_cloud allows the user to use the original colors of the image to set the colors of the words.</li>
</ol>
<figure>

    <img loading="lazy" srcset="/2016/05/wordclouds/colored_2_hu_c5e96b6dbf22083e.webp 320w,/2016/05/wordclouds/colored_2_hu_490d47bb27e6d9a7.webp 768w,/2016/05/wordclouds/colored_2.png 800w" src="colored_2.png"/> 
</figure>

<p>The masks are the more interesting aspect for creating visualizations, but <em>where</em> do you get the masks? You can manually trace and extract objects from images, but that can be time consuming and the masks will likely be heavily aliased and at a low resolution (the size of the mask sets the size of the word cloud).</p>
<p>Enter <a href="https://fortawesome.github.io/Font-Awesome/">Font Awesome</a>, an icon font by Dave Gandy which is <em>very</em> widely used throughout the Internet (including this website). Icon fonts contain a wide variety of shapes and are vectorized, and therefore they can scale to any size.</p>
<figure>

    <img loading="lazy" srcset="/2016/05/wordclouds/fa_icons_hu_e66c7ec001416440.webp 320w,/2016/05/wordclouds/fa_icons_hu_11694fa082aa0dd.webp 768w,/2016/05/wordclouds/fa_icons_hu_e8e8a8efb2fa837.webp 1024w,/2016/05/wordclouds/fa_icons.png 1328w" src="fa_icons.png"/> 
</figure>

<p>So why not use Font Awesome icons as masks for the word cloud? The font icons need to be extracted and rasterized as an image in order to be usable with the Python word_cloud package: cue the Python script <a href="https://github.com/Pythonity/icon-font-to-png">Icon Font to PNG</a> by Pythonity which does what the name implies.</p>
<figure>

    <img loading="lazy" srcset="/2016/05/wordclouds/fa_icons_finder_hu_42a651d300d0a496.webp 320w,/2016/05/wordclouds/fa_icons_finder_hu_31f91f350f560e31.webp 768w,/2016/05/wordclouds/fa_icons_finder.png 869w" src="fa_icons_finder.png"/> 
</figure>

<p>Now <em>every</em> Font Awesome icon can be used as a word cloud mask! And the icons can be exported at any size: for this post, I render the word clouds at <strong>2048x2048px</strong>, larger than most desktop screens! After hacking the Python scripts included with the package which were used to create the default word clouds, I managed to create a few interesting examples.</p>
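<p>As a minimal sketch of the mask idea (not the exact code from my scripts): the word_cloud documentation treats pure white mask pixels as off-limits and draws words everywhere else. Assuming the icon was exported on a transparent background, its alpha channel can be turned into such a mask. The function name and the tiny 4x4 icon below are hypothetical, and plain lists stand in for a real image array.</p>

```python
def alpha_to_mask(alpha_rows, threshold=128):
    """Convert rows of alpha values (0-255) into a word_cloud-style mask.

    Pixels covered by the icon (alpha >= threshold) become 0 (drawable);
    everything else becomes 255 (white, which word_cloud masks out).
    """
    return [[0 if a >= threshold else 255 for a in row] for row in alpha_rows]


# Hypothetical alpha channel of a tiny rasterized icon (assumption for illustration).
icon_alpha = [
    [0, 255, 255, 0],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
    [0, 255, 255, 0],
]
mask = alpha_to_mask(icon_alpha)
```

<p>In practice the nested list would be a NumPy array loaded from the exported PNG and passed to the mask parameter of WordCloud.</p>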
<h2 id="reddit-data-and-thematic-icons">Reddit Data and Thematic Icons</h2>
<p>Font Awesome has <a href="http://fortawesome.github.io/Font-Awesome/icon/line-chart/">icons for charts</a>, which logically appeals to me as a data person. Why not make a word cloud which looks like a line chart?</p>
<p>Let&rsquo;s use the word counts of titles of submissions to the <a href="https://www.reddit.com/r/dataisbeautiful/">/r/dataisbeautiful subreddit</a> on Reddit which have scored at least 100 points (using the <a href="https://bigquery.cloud.google.com/table/fh-bigquery:reddit_posts.full_corpus_201512">Reddit data dump</a> located on <a href="https://cloud.google.com/bigquery/">BigQuery</a>).</p>
<p>Additionally, we can improve on the design of the default word cloud output by forcing all-caps text and by changing the text font. For word clouds, I prefer to use condensed font families, as they can allow for more information to be displayed in the word cloud. In this example, I will be using the <a href="https://www.myfonts.com/fonts/paratype/din-condensed/">DIN Condensed</a> font, a font native to OS X and a font you&rsquo;ve likely seen in media advertisements and website logos.</p>
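<p>The counting and all-caps steps can be sketched roughly as follows. This is an illustration under assumptions, not the script I used: the stop-word list is a tiny hypothetical one, and the titles are made up to stand in for the BigQuery results.</p>

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "of", "a", "in", "and", "to", "by", "oc"}


def count_words(titles, stop_words=STOP_WORDS):
    """Return a Counter of upper-cased words across all titles."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z']+", title.lower()):
            if word not in stop_words:
                counts[word.upper()] += 1
    return counts


# Made-up titles standing in for the real submission titles.
titles = [
    "The most common words in the data [OC]",
    "Data on the cost of college by state",
    "A map of data centers",
]
word_counts = count_words(titles)
print(word_counts.most_common(3))
```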
<p>Putting it all together:</p>
<figure>

    <img loading="lazy" srcset="/2016/05/wordclouds/dataisbeautiful_wordcloud_hu_d8248af421941e70.webp 320w,/2016/05/wordclouds/dataisbeautiful_wordcloud.png 596w" src="dataisbeautiful_wordcloud.png"/> 
</figure>

<!-- _(All word clouds in this post are shrunk to 600x600px to reduce loading time: click on the image for the full 2048x2048px resolution.)_ -->
<h2 id="github-data-and-brand-icons">GitHub Data and Brand Icons</h2>
<p>Font Awesome also contains icons representing the logos of popular internet brands, such as Facebook and Twitter.</p>
<p><a href="https://github.com">GitHub</a> is another such website. I will use BigQuery again with the <a href="https://bigquery.cloud.google.com/table/githubarchive:year.2014">2014 GitHub Archive dataset</a> to gather word counts of git commit messages during that year, and use the modern GitHub logo as the mask. This time, I will incorporate the freeware condensed monospaced font <a href="https://www.fontsquirrel.com/fonts/M-1m">M+ 1m</a> in order to create a more code-like aesthetic, which creates an interesting look when juxtaposed with the negative space of GitHub&rsquo;s logo.</p>
<figure>

    <img loading="lazy" srcset="/2016/05/wordclouds/github_wordcloud_hu_c9c536a414c36067.webp 320w,/2016/05/wordclouds/github_wordcloud.png 600w" src="github_wordcloud.png"/> 
</figure>

<h2 id="yelp-data-and-sentiment-icons">Yelp Data and Sentiment Icons</h2>
<p>One of the <a href="http://minimaxir.com/2014/09/one-star-five-stars/">earliest word clouds I made</a> was for the <a href="http://www.yelp.com">Yelp</a> reviews dataset from the <a href="https://www.yelp.com/dataset_challenge">Yelp Dataset Challenge</a> to compare and contrast verbiage between 1-star reviews and 5-star reviews. Let&rsquo;s remake those word clouds.</p>
<p>At this point I should mention appropriate color palettes for word clouds since the rainbows of the stereotypical word clouds can be distracting. I strongly recommend using the <a href="http://colorbrewer2.org">ColorBrewer</a> palettes, helpfully provided for this use case with the <a href="https://github.com/jiffyclub/palettable">palettable Python library</a> by Matt Davis. I particularly like the <a href="https://jiffyclub.github.io/palettable/colorbrewer/sequential/">sequential palettes</a>, which follow a clean gradient between white and another color (or between two or three colors), although I ignore some of the lighter colors as they may not be visible against white backgrounds.</p>
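<p>Here is a small sketch of how skipping the lighter colors might look. The hex values are the 9-class ColorBrewer Greens palette written out by hand (palettable provides the same palettes), the helper name is hypothetical, and the inner function's arguments mirror the color_func hook that word_cloud documents. Treat it as one possible approach, not the exact code behind the images in this post.</p>

```python
import random

# ColorBrewer "Greens" 9-class palette, lightest to darkest (hand-copied; assumption).
GREENS_9 = [
    "#f7fcf5", "#e5f5e0", "#c7e9c0", "#a1d99b", "#74c476",
    "#41ab5d", "#238b45", "#006d2c", "#00441b",
]


def make_color_func(palette, skip_lightest=3, seed=None):
    """Build a color function that never picks the lightest palette entries."""
    rng = random.Random(seed)
    usable = palette[skip_lightest:]

    def color_func(word=None, font_size=None, position=None,
                   orientation=None, random_state=None, **kwargs):
        # Every word gets a random color from the darker part of the palette.
        return rng.choice(usable)

    return color_func


green_color = make_color_func(GREENS_9, seed=42)
```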
<p>The font choice this time is <a href="https://www.google.com/fonts/specimen/Open&#43;Sans&#43;Condensed">Open Sans Condensed</a>, a Google Font. <a href="https://www.google.com/fonts">Google Fonts</a> are free and open source. I strongly recommend using them for documents/websites to add some flair over default fonts.</p>
<p>Using the <strong>Greens</strong> palette and a smiley-face Font Awesome icon on 5-star reviews:</p>
<figure>

    <img loading="lazy" srcset="/2016/05/wordclouds/yelp_pos_wordcloud_hu_a9ecfbd1a53f0b43.webp 320w,/2016/05/wordclouds/yelp_pos_wordcloud.png 600w" src="yelp_pos_wordcloud.png"/> 
</figure>

<p>Makes sense, although the thin lines of the smiley-face cause the font sizes to become constrained. How about the inverse: <strong>Reds</strong>, thumbs-down, and 1-star reviews?</p>
<figure>

    <img loading="lazy" srcset="/2016/05/wordclouds/yelp_neg_wordcloud_hu_6347f60b9210c07.webp 320w,/2016/05/wordclouds/yelp_neg_wordcloud.png 600w" src="yelp_neg_wordcloud.png"/> 
</figure>

<h2 id="facebook-data-and-etc-icons">Facebook Data and Etc. Icons</h2>
<p>I recently updated my <a href="https://github.com/minimaxir/facebook-page-post-scraper">Facebook Page Data Scraper</a>, which gathers public posts made by Facebook Pages, to now retrieve total Reaction counts on those posts instead of just Likes.</p>
<p>Why not create a word cloud of news headlines to get a zeitgeist of popular discussion? To do this, I scraped all the public posts from <a href="https://www.facebook.com/cnn/">CNN&rsquo;s Facebook page</a>, and created a word cloud of all CNN headlines to which the posts link. Let&rsquo;s use the Google Font <a href="https://www.google.com/fonts/specimen/Amatic&#43;SC">Amatic SC</a> which you&rsquo;ve likely seen before in ads, and let&rsquo;s try a <a href="https://jiffyclub.github.io/palettable/colorbrewer/qualitative/">qualitative palette</a>, <strong>Dark2</strong>, to get a &ldquo;rainbow&rdquo; effect without looking gaudy.</p>
<p>And use a flag icon, because why not?</p>
<figure>

    <img loading="lazy" srcset="/2016/05/wordclouds/cnn_wordcloud_hu_6d5ffd4cc76e5324.webp 320w,/2016/05/wordclouds/cnn_wordcloud.png 600w" src="cnn_wordcloud.png"/> 
</figure>

<p>Of course, CNN has had fun with the 2016 U.S. Presidential election.</p>
<p>Let&rsquo;s do one more word cloud. We have not yet done a word cloud using the colors-from-original-image technique. Using the underlying <a href="http://matplotlib.org/examples/color/colormaps_reference.html">matplotlib color map</a> for the <strong>Spectral</strong> palette, we can overlay a spatial rainbow which determines the color of the words displayed at that area of the mask:</p>
<figure>

    <img loading="lazy" srcset="/2016/05/wordclouds/spectral_hu_511415aa3cfca0ba.webp 320w,/2016/05/wordclouds/spectral.jpg 600w" src="spectral.jpg"/> 
</figure>

<p>Additionally, in order to estimate the importance of the presence of each word in generating Reactions, we can create a word cloud of the words where the words are sized not by count, but by the <strong>average number of Reactions</strong> on Facebook posts referencing CNN headlines containing that word (where the word is used in at least 20 headlines for some statistical correction). And let&rsquo;s try a black background.</p>
<figure>

    <img loading="lazy" srcset="/2016/05/wordclouds/cnn_wordcloud_reactions_hu_f2d4977d2913e7a4.webp 320w,/2016/05/wordclouds/cnn_wordcloud_reactions.png 600w" src="cnn_wordcloud_reactions.png"/> 
</figure>
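<p>The sizing rule for this last word cloud (average Reactions per word, keeping only words that appear in enough headlines) can be sketched like this. The function name and sample posts are hypothetical, and counting each word once per headline is my assumption for the sketch.</p>

```python
import re
from collections import defaultdict


def average_reactions_by_word(posts, min_headlines=20):
    """Map each word to the mean Reactions of the headlines containing it."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for headline, reactions in posts:
        # Count each word once per headline, however often it repeats.
        for word in set(re.findall(r"[a-z']+", headline.lower())):
            totals[word] += reactions
            counts[word] += 1
    return {
        word: totals[word] / counts[word]
        for word in counts
        if counts[word] >= min_headlines
    }


# Made-up (headline, reaction count) pairs; threshold lowered for the demo.
posts = [
    ("Election night results", 900),
    ("Election debate recap", 300),
    ("Storm hits coast", 100),
]
weights = average_reactions_by_word(posts, min_headlines=2)
```

<p>A mapping like this can then be used as the word weights in place of raw counts, for example through word_cloud&rsquo;s generate_from_frequencies method.</p>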

<p>While the word cloud cannot be output in a vectorized format using this method, creating a word cloud at a super-high resolution (even larger than 2048x2048px) is more-than-enough for making typical wall posters and t-shirts.</p>
<p>Word clouds may not have as much explanatory value in the academic sense, but they have <em>persuasive</em> power, which is just as important. At the least, it&rsquo;s another visual technique in my fun bag of visualization tricks to spice up future blog posts.</p>
<p>Performing postprocessing on rendered word clouds can help create especially artsy art, but that discussion is best saved for <a href="https://raw.githubusercontent.com/minimaxir/stylistic-word-clouds/master/wordclouds/starry_night_cnn_weight_12_iterations_500_smooth_5.png">another time</a>.</p>
<hr>
<p><em>You can view the scripts to create the word clouds in this post in this <a href="https://github.com/minimaxir/stylistic-word-clouds">GitHub repository</a>; the code is more hacky than usual, but it should be clear enough to demonstrate how the raw data was processed in each instance and how the word clouds were rendered. In the future, I hope to create a <a href="http://flask.pocoo.org">Flask app</a> based on these scripts to streamline the creation of word clouds.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>The Data From Our Comments to the FCC About Net Neutrality</title>
      <link>https://minimaxir.com/2014/08/comments-about-comments/</link>
      <pubDate>Fri, 08 Aug 2014 08:00:00 -0700</pubDate>
      <guid>https://minimaxir.com/2014/08/comments-about-comments/</guid>
      <description>The FCC released a dataset of about 450,000 comments against net neutrality. Looking at the data behind these comments, it is clear to see that the entire country is passionate against the rule changes to net neutrality.</description>
      <content:encoded><![CDATA[<p>This year, the Federal Communications Commission, one of the governmental entities which polices the Internet in the United States, announced significant rule changes to the policy of &ldquo;<a href="http://www.fcc.gov/openinternet">Open Internet</a>.&rdquo; Open Internet, more commonly known as &ldquo;<a href="http://en.wikipedia.org/wiki/Net_neutrality">net neutrality</a>,&rdquo; helps businesses facilitate competition and promote innovation on the internet, which help improve the internet as a whole. However, the proposed rule changes allow internet service providers (ISPs) to discriminate between different types of internet traffic (a &ldquo;fast lane&rdquo; for video and social media, for example). Said pricing discrimination may end up affecting the consumers instead (e.g. paying $10/month for access to Facebook), which may reduce innovation due to increased costs to the consumers of internet bandwidth, i.e. the average American citizen.</p>
<p>The FCC recently <a href="http://www.fcc.gov/comments">opened up a comment period</a>, where the U.S. public can <a href="http://apps.fcc.gov/ecfs/upload/display?z=s6uf0">send or e-mail comments</a> on the changes to this policy. Naturally, the consumers of the internet reacted strongly. By August 2014, over <em>1.1 million comments</em> have been received by the FCC.</p>
<figure>

    <img loading="lazy" srcset="/2014/08/comments-about-comments/fcc-words-small_hu_bbe0f548669f39f1.webp 320w,/2014/08/comments-about-comments/fcc-words-small_hu_b0b52a06f3d6cd42.webp 768w,/2014/08/comments-about-comments/fcc-words-small.png 1000w" src="fcc-words-small.png"/> 
</figure>

<p>This week, the FCC <a href="http://www.fcc.gov/blog/fcc-makes-open-internet-comments-more-accessible-public">released a dataset</a> of about <a href="http://www.fcc.gov/files/ecfs/14-28/ecfs-files.htm">450,000 of these comments</a>. Looking at the data behind these comments, it&rsquo;s clear to see that the entire country is passionate against the rule changes to net neutrality.</p>
<p>Here&rsquo;s a timeline of when comments about net neutrality were sent to the FCC:</p>
<figure>

    <img loading="lazy" srcset="/2014/08/comments-about-comments/fcc-timeline-annotated_hu_4ceafb8506e23d27.webp 320w,/2014/08/comments-about-comments/fcc-timeline-annotated_hu_e49dff4e9650bf51.webp 768w,/2014/08/comments-about-comments/fcc-timeline-annotated_hu_b1f601ea95602509.webp 1024w,/2014/08/comments-about-comments/fcc-timeline-annotated.png 1200w" src="fcc-timeline-annotated.png"/> 
</figure>

<p>There are clear spikes after important initiatives for awareness of the FCC&rsquo;s ruling. May 15th marked the <a href="http://www.savetheinternet.com/net-neutrality-resources">beginning of the Open Comment period</a> for the FCC&rsquo;s new guidelines. June 3rd marked the week after an airing of Last Week Tonight with John Oliver, which contained a <a href="https://www.youtube.com/watch?v=fpbOEoRrHyU">pro-net-neutrality rant</a> that went viral for the rest of the week. July 15th marked the close of the Open Comment period, which is why on July 14th, the internet rallied and sent in over a hundred thousand comments, which <a href="http://www.nydailynews.com/news/politics/fcc-extends-net-neutrality-open-comment-deadline-friday-article-1.1868238#kDMozMu84rJ5TPsl.97">crashed their servers</a> and forced them to extend the deadline.</p>
<p>But as with many awareness campaigns over the internet, this campaign may have &ldquo;<a href="http://en.wikipedia.org/wiki/Slacktivism">slacktivists</a>&rdquo;, as evidenced by the flat lines after the events where people stopped writing comments. How much effort did the U.S. people actually put into their submissions? One way to tell is to check the length of the submissions.</p>
<figure>

    <img loading="lazy" srcset="/2014/08/comments-about-comments/fcc-comment-length_hu_1493f7d770de8443.webp 320w,/2014/08/comments-about-comments/fcc-comment-length_hu_7b0f6d6baeb752b.webp 768w,/2014/08/comments-about-comments/fcc-comment-length_hu_bd8d71d4d1fb6cbe.webp 1024w,/2014/08/comments-about-comments/fcc-comment-length.png 1200w" src="fcc-comment-length.png"/> 
</figure>

<p>Many comments were one-liners at about 20 words each, and many comments were multiparagraph notes at about 180 words each. But why is there a giant spike at about 300 words?</p>
<p>As it turns out, there were over 100,000 comments with exactly 1,477 characters (approximately 290 words). That number of characters (before cleaning) corresponds to a comment following this template:</p>
<blockquote>
<p>Net neutrality is the First Amendment of the Internet, the principle that Internet service providers (ISPs) treat all data equally. As an Internet user, net neutrality is vitally important to me. The FCC should use its Title II authority to protect it.</p>
<p>Most Americans have only one choice for truly high speed Internet: their local cable company. This is a political failure, and it is an embarrassment. America deserves competition and choice.</p>
<p>Without net neutrality, a bad situation gets even worse. These ISPs will now be able to manipulate our Internet experience by speeding up some services and slowing down others. That kills choice, diversity, and quality.</p>
<p>It also causes tremendous economic harm. If ISPs can speed up favored services and slow others, new businesses will no longer be able to rely on a level playing field. When ISPs can slow your site and destroy your business at will, how can any startup attract investors?</p>
<p>My friends, family, and I use the Internet for conversation and fun, but also for work and business. When you let ISPs mess with our Internet experience, you are attacking our social lives, our entertainment, and our economic well being. We won&rsquo;t stand for it.</p>
<p>ISPs are opposing Title II so that they can destroy the FCC&rsquo;s net neutrality rules in court. This is the same trick they pulled last time. Please, let&rsquo;s not be fooled again. Title II is the strong, legally sound way to enforce net neutrality. Use it.</p>
</blockquote>
<p>This is the default template for a submission at the <a href="https://www.battleforthenet.com">Battle for Net Neutrality</a> website. That means about 1/4th of the comments in the dataset, and at least 1/10th of all comments submitted, used this website&rsquo;s submission form.</p>
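<p>Spotting a template like this comes down to tallying comments by exact character length and looking for one length that dominates. The charts in this post were made in R, so the Python below is only an illustrative sketch with made-up comments and a hypothetical function name.</p>

```python
from collections import Counter


def most_common_length(comments):
    """Return (length, count, share) for the most frequent character length."""
    lengths = Counter(len(comment) for comment in comments)
    length, count = lengths.most_common(1)[0]
    return length, count, count / len(comments)


# Made-up data: one short comment plus three copies of a 1,477-character form letter.
comments = ["Keep the internet open."] + ["x" * 1477] * 3
length, count, share = most_common_length(comments)
```

<p>Once a suspicious length turns up, reading a few of the comments at that length is enough to confirm whether they share a template.</p>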
<h2 id="comments-across-the-nation">Comments Across the Nation</h2>
<p>Net neutrality affects some individuals more than others. Not everyone in the U.S. may be as passionate over the issue, and many may not even be aware that such a threat to the modern internet even exists.</p>
<p>Which cities in the United States sent the most comments to the FCC?</p>
<figure>

    <img loading="lazy" srcset="/2014/08/comments-about-comments/fcc-city_hu_86624bc5cfccc9e3.webp 320w,/2014/08/comments-about-comments/fcc-city_hu_e1e9540c20098c1d.webp 768w,/2014/08/comments-about-comments/fcc-city_hu_af85cb4025c464ca.webp 1024w,/2014/08/comments-about-comments/fcc-city.png 1200w" src="fcc-city.png"/> 
</figure>

<p>Yes, Brooklyn, NY counts as a city according to the FCC.</p>
<p>It&rsquo;s not surprising that the three most populated cities in the U.S. (New York, Los Angeles, Chicago) top this chart due to the higher potential number of commenters. What is surprising yet important is that tech hubs with much smaller populations, such as San Francisco, Seattle, and Portland, all have extremely strong showings.</p>
<p>When you look at the distribution of comments by state of origin, it&rsquo;s even more apparent that California and Washington are some of the key drivers of the comments. (Admittedly, it does resemble a <a href="https://xkcd.com/1138/">population map</a>.)</p>
<figure>

    <img loading="lazy" srcset="/2014/08/comments-about-comments/fcc-state-map_hu_6aff9e05e70e7b5a.webp 320w,/2014/08/comments-about-comments/fcc-state-map_hu_7c70c7db5f767ad3.webp 768w,/2014/08/comments-about-comments/fcc-state-map_hu_8ae120d39636999b.webp 1024w,/2014/08/comments-about-comments/fcc-state-map.png 1200w" src="fcc-state-map.png"/> 
</figure>

<p>What are the key words mentioned in the comments to the FCC?</p>
<p>The key players who would benefit the most from the rule changes to net neutrality are <a href="http://www.comcast.com/">Comcast</a> and <a href="http://www.verizon.com/">Verizon</a>, two of the biggest ISPs in the country. Which states have been speaking out the most against these institutions?</p>
<figure>

    <img loading="lazy" srcset="/2014/08/comments-about-comments/fcc-state-map-comcast_hu_23e9195b938a61fd.webp 320w,/2014/08/comments-about-comments/fcc-state-map-comcast_hu_fcc95fc9141b68cb.webp 768w,/2014/08/comments-about-comments/fcc-state-map-comcast_hu_213bdad274ad9de1.webp 1024w,/2014/08/comments-about-comments/fcc-state-map-comcast.png 1200w" src="fcc-state-map-comcast.png"/> 
</figure>

<figure>

    <img loading="lazy" srcset="/2014/08/comments-about-comments/fcc-state-map-verizon_hu_74ef87638045159e.webp 320w,/2014/08/comments-about-comments/fcc-state-map-verizon_hu_72df43d42bfd28a5.webp 768w,/2014/08/comments-about-comments/fcc-state-map-verizon_hu_5dc423f6aab6bed9.webp 1024w,/2014/08/comments-about-comments/fcc-state-map-verizon.png 1200w" src="fcc-state-map-verizon.png"/> 
</figure>

<p>Comcast is a frequent topic of discussion (5.2% of all comments about net neutrality contain at least 1 mention of Comcast), especially on the West Coast. On the other hand, less than half as many talk about Verizon (2.0% of all comments), except on the East Coast.</p>
<p>Although not on the map, Washington, DC actually had the most to comment on these two topics, with 11.7% of comments having at least one mention of Comcast and 8.8% of comments having at least one mention of Verizon.</p>
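<p>The percentages behind these maps are the share of each state&rsquo;s comments that mention a keyword at least once. A rough Python sketch of that calculation, assuming a case-insensitive substring match and using made-up comments (the function name is hypothetical, and the original analysis was done in R):</p>

```python
from collections import defaultdict


def mention_share_by_state(comments, keyword):
    """Return the fraction of each state's comments that mention the keyword."""
    total = defaultdict(int)
    hits = defaultdict(int)
    needle = keyword.lower()
    for state, text in comments:
        total[state] += 1
        if needle in text.lower():
            hits[state] += 1
    return {state: hits[state] / total[state] for state in total}


# Made-up (state, comment text) pairs for illustration.
comments = [
    ("WA", "Comcast already throttles my connection."),
    ("WA", "Protect net neutrality."),
    ("NY", "Verizon should not decide what I can watch."),
    ("NY", "Keep Title II."),
]
comcast_share = mention_share_by_state(comments, "comcast")
verizon_share = mention_share_by_state(comments, "Verizon")
```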
<p><a href="http://www.netflix.com/">Netflix</a>, an internet video-streaming service which would likely be negatively impacted by the FCC ruling, was also discussed.</p>
<figure>

    <img loading="lazy" srcset="/2014/08/comments-about-comments/fcc-state-map-netflix_hu_919e04dd9ce22036.webp 320w,/2014/08/comments-about-comments/fcc-state-map-netflix_hu_6894ab91add91b9c.webp 768w,/2014/08/comments-about-comments/fcc-state-map-netflix_hu_1ada01945426ef3f.webp 1024w,/2014/08/comments-about-comments/fcc-state-map-netflix.png 1200w" src="fcc-state-map-netflix.png"/> 
</figure>

<p>Discussion of Netflix from all around the country is very evenly distributed (2.1% of all comments), potentially because it&rsquo;s not as location-specific as the presence of an ISP. (Montana apparently does not care much about Netflix.)</p>
<p>Another concern about net neutrality is the potential <a href="https://www.aclu.org/net-neutrality">fight against the First Amendment</a> and free speech itself, as ISPs could theoretically restrict traffic to unfavorable websites under the system. Which states are most passionate about free speech?</p>
<figure>

    <img loading="lazy" srcset="/2014/08/comments-about-comments/fcc-state-map-free-speech_hu_9cef337d095c475d.webp 320w,/2014/08/comments-about-comments/fcc-state-map-free-speech_hu_eb326ed2bdfd71da.webp 768w,/2014/08/comments-about-comments/fcc-state-map-free-speech_hu_a3323da79f96b34f.webp 1024w,/2014/08/comments-about-comments/fcc-state-map-free-speech.png 1200w" src="fcc-state-map-free-speech.png"/> 
</figure>

<p>There is much more activity in the Midwest and especially the Northwest than in the previous maps (2.1% of all comments discuss &ldquo;free speech&rdquo;). North Dakota apparently is relatively indifferent about the freedom of speech.</p>
<p>Unfortunately, at this period of time, it&rsquo;s hard to guess if the hundreds of thousands of comments sent to the FCC will actually cause them to reconsider their plans. The sheer quantity of comments, at the least, lets the FCC know that Americans feel strongly about the issue. It&rsquo;s clear that in the worst-case scenario where the ISPs win the net neutrality battle, the consumers of the internet in the United States <strong>will not remain passive</strong>.</p>
<hr>
<ul>
<li><em>Charts were generated using R and ggplot2.</em></li>
<li><em>You can view the data aggregated by date, by state, and by city in <a href="https://docs.google.com/spreadsheets/d/1D2T5Lg41IWfkQPEMLWq3fjU7_kJ8lDNc4H2wsbr4BbU/edit?usp=sharing">this Google Sheet</a>. You can download a CSV of the original comment metadata <a href="https://www.dropbox.com/s/7tzsk7kv7ctgydp/fcc_comments.zip">here</a>. [7.6MB .zip]</em></li>
<li><em>You can view a 3000px x 3000px image of the FCC comments word cloud <a href="http://i.imgur.com/I0dpEA6.png">here</a>.</em></li>
</ul>
<p><em><strong>EDIT: 8/9/14</strong>: Per <a href="https://news.ycombinator.com/item?id=8153091">comments on Hacker News</a>, I&rsquo;ve changed wording in a few paragraphs for clarification.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>The Wikipedia Entries Which Are Most-Edited by Members of the U.S. Congress</title>
      <link>https://minimaxir.com/2014/07/caucus-needed/</link>
      <pubDate>Tue, 15 Jul 2014 08:30:00 -0700</pubDate>
      <guid>https://minimaxir.com/2014/07/caucus-needed/</guid>
      <description>Saying that the results were surprising would be the understatement of the century.</description>
      <content:encoded><![CDATA[<p>Last week, the Twitter account <a href="https://twitter.com/congressedits">@congressedits</a> launched. This account is a bot that tweets edits to Wikipedia that were made by members of the U.S. Congress, in order to help <a href="http://inkdroid.org/journal/2014/07/10/why-congressedits/">facilitate transparency</a>. The account <a href="https://github.com/edsu/anon">works</a> by automatically tweeting any Wikipedia edits made by anonymous contributors with IP addresses within the known IP address blocks of the <a href="http://whois.arin.net/rest/org/USSAA/nets">U.S. Senate</a> or the <a href="http://whois.arin.net/rest/org/ISUHR/nets">House of Representatives</a>.</p>
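<p>Checking whether an anonymous edit falls inside one of those blocks is straightforward with Python&rsquo;s standard ipaddress module. The sketch below is not the bot&rsquo;s actual code, and the /16 ranges are assumptions inferred from the IP addresses linked in this post; the ARIN listings are the authoritative source.</p>

```python
import ipaddress

# Illustrative blocks only (assumed /16 ranges; see the ARIN listings for real ones).
CONGRESS_BLOCKS = {
    "senate": [ipaddress.ip_network("156.33.0.0/16")],
    "house": [
        ipaddress.ip_network("143.231.0.0/16"),
        ipaddress.ip_network("143.228.0.0/16"),
        ipaddress.ip_network("137.18.0.0/16"),
    ],
}


def chamber_for_ip(ip_string, blocks=CONGRESS_BLOCKS):
    """Return 'senate' or 'house' if the IP is in a known block, else None."""
    try:
        address = ipaddress.ip_address(ip_string)
    except ValueError:
        return None  # Logged-in edits carry a username, not an IP address.
    for chamber, networks in blocks.items():
        if any(address in network for network in networks):
            return chamber
    return None


print(chamber_for_ip("156.33.148.107"))
```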
<p>Google&rsquo;s <a href="https://developers.google.com/bigquery/">BigQuery</a> tool has a <a href="https://developers.google.com/bigquery/docs/dataset-wikipedia">sample dataset</a> of Wikipedia data, representing the data on 314 million article edits up to April 2010. Out of curiosity, I wrote a query which returns the top 100 pages with the most edits by Wikipedia contributors in the U.S. Senate&rsquo;s IP block.</p>
<figure>

    <img loading="lazy" srcset="/2014/07/caucus-needed/senate-query_hu_16b02354a02d1c15.webp 320w,/2014/07/caucus-needed/senate-query_hu_fd95858d1b366022.webp 768w,/2014/07/caucus-needed/senate-query_hu_50bc2ae0160c0471.webp 1024w,/2014/07/caucus-needed/senate-query.png 1502w" src="senate-query.png"/> 
</figure>
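<p>For readers without BigQuery access, the same aggregation can be approximated in plain Python. This is a rough equivalent rather than the query itself: it assumes each edit is a (title, contributor_ip) pair, uses a simple prefix match on the IP as a stand-in for the block filter, and the sample rows are made up.</p>

```python
from collections import Counter


def top_edited_pages(edits, ip_prefix, n=100):
    """Return the n most-edited titles among edits whose IP starts with ip_prefix."""
    counts = Counter(
        title
        for title, contributor_ip in edits
        # Logged-in edits have no IP, so they are skipped.
        if contributor_ip and contributor_ip.startswith(ip_prefix)
    )
    return counts.most_common(n)


# Made-up rows standing in for the Wikipedia sample dataset.
edits = [
    ("United States Senate", "156.33.1.2"),
    ("United States Senate", "156.33.9.9"),
    ("Cleft chin", "143.231.0.5"),
    ("Hawk (G.I. Joe)", "156.33.1.2"),
    ("Example", None),
]
top = top_edited_pages(edits, "156.33.", n=2)
```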

<p>Using this query for the Senate&rsquo;s IP block, and a similar one for the House of Representatives IP blocks, I retrieved the most-edited entries for both entities. You can access the spreadsheet of this data by <a href="https://dl.dropboxusercontent.com/u/2017402/Congress_Wikipedia_Edits.pdf">downloading a .pdf</a> or by viewing the data online with <a href="https://www.icloud.com/iw/#numbers/BALY8siqBP4jYZq5E2OB20P-wlPpyXdycqqF/Congress_Wikipedia_Edits">Numbers for iCloud</a>, both of which contain high-resolution charts and clickable Wikipedia links. A <a href="https://docs.google.com/spreadsheets/d/1qfFEwzNzc4KL4gqe2i4IoMO0ksmoYndCK0O0m46n37I/edit?usp=sharing">Google Sheets</a> version is also available.</p>
<p>Here are the Top 10 Wikipedia entries with the most edits by members of the Senate:</p>
<figure>

    <img loading="lazy" srcset="/2014/07/caucus-needed/senate-wikipedia_hu_b9352ce44d11c504.webp 320w,/2014/07/caucus-needed/senate-wikipedia_hu_efc5507cf51a738c.webp 768w,/2014/07/caucus-needed/senate-wikipedia_hu_7cfc8ec08e60bb8d.webp 1024w,/2014/07/caucus-needed/senate-wikipedia.png 1062w" src="senate-wikipedia.png"/> 
</figure>

<p>Wait a minute. Hawk from G.I. Joe?!</p>
<p>Saying that the query results were surprising would be the understatement of the century.</p>
<p>Two of the top-edited entries pertain directly to the U.S. Senate, which helps prove that the IP block is indeed the Senate&rsquo;s IP block. Both Kappa Upsilon Chi and <a href="http://en.wikipedia.org/wiki/Beta_Upsilon_Chi">Beta Upsilon Chi</a> are Christian fraternities. (However, the Kappa Upsilon Chi Wikipedia entry no longer exists for some reason.)</p>
<p>The edits corresponding to actual people are ones which are the most interesting. <a href="http://en.wikipedia.org/wiki/William_Swain_Lee">William Swain Lee</a> is a Delaware politician whose entry was <a href="http://en.wikipedia.org/w/index.php?title=William_Swain_Lee&amp;diff=prev&amp;oldid=31202175">created and edited</a> by a <a href="http://en.wikipedia.org/wiki/Special:Contributions/156.33.148.107">user in the Senate IP block</a>. OrangePie is a user who, <a href="http://en.wikipedia.org/wiki?curid=7319910">according to his talk page</a>, was criticized for repeatedly recreating an entry for &ldquo;Michael Hardaway&rdquo; after deletion, who coincidentally worked for the Senate <a href="https://twitter.com/michaelhardaway">according to his Twitter bio</a>. In journalist <a href="http://en.wikipedia.org/wiki?curid=8593106">Paul D. Thacker&rsquo;s</a> entry, one Senate editor <a href="http://en.wikipedia.org/w/index.php?title=Paul_D._Thacker&amp;diff=311839513&amp;oldid=311689066">replaced a paragraph</a> of Thacker&rsquo;s biography with the word &ldquo;anus?&rdquo;. Jay Rockefeller is an <a href="http://en.wikipedia.org/wiki?curid=337026">actual U.S. Senator</a>, so the edits are definitely a conflict of interest. The <a href="http://en.wikipedia.org/wiki/Special:Contributions/156.33.96.28">user who made the edits</a> apparently also removed <a href="http://en.wikipedia.org/w/index.php?title=Jay_Rockefeller&amp;diff=prev&amp;oldid=33857327">information about a government investigation</a> into the Senator.</p>
<p>I have nothing to add for <a href="http://en.wikipedia.org/wiki?curid=2814171">Hawk from G.I. Joe</a>.</p>
<p>Other interesting frequently-edited Wikipedia entries from members of the U.S. Senate are <a href="http://en.wikipedia.org/wiki?curid=3626593">Primetime Emmy Award for Outstanding Supporting Actor – Comedy Series</a> (11 edits), <a href="http://en.wikipedia.org/wiki?curid=1226609">Wikipedia:Introduction</a> (5 edits), and <a href="http://en.wikipedia.org/wiki?curid=1749535">Crash (2004 film)</a> (5 edits).</p>
<p>The Wikipedia entries with the most edits by members of the House of Representatives are somehow even <em>weirder</em>, and that&rsquo;s quite an accomplishment.</p>
<figure>

    <img loading="lazy" srcset="/2014/07/caucus-needed/house-wikipedia_hu_262092abb9a17599.webp 320w,/2014/07/caucus-needed/house-wikipedia_hu_8550ed885ba7ffea.webp 768w,/2014/07/caucus-needed/house-wikipedia_hu_9b51b61749f2ac71.webp 1024w,/2014/07/caucus-needed/house-wikipedia.png 1054w" src="house-wikipedia.png"/> 
</figure>

<p>Well, if <em>anyone</em> in the entire United States would be experts on the topics of <a href="http://en.wikipedia.org/wiki?curid=2352587">cleft chins</a> and <a href="http://en.wikipedia.org/wiki?curid=1924543">dimples</a>, it would be the members of the House of Representatives.</p>
<p>Again, one of the most-edited entries corresponds to a House of Representatives topic, which helps validate the IP blocks. The <a href="http://en.wikipedia.org/wiki?curid=107610">Cerritos, California</a> location had <a href="http://en.wikipedia.org/w/index.php?title=Cerritos,_California&amp;diff=21384826&amp;oldid=21363395">neutral edits</a> made by a <a href="http://en.wikipedia.org/wiki/Special:Contributions/143.231.249.141">rather dedicated Wikiuser</a>. The edits to Wynne, Arkansas and Michelle Ye were made by the same dedicated Wikiuser. <a href="http://en.wikipedia.org/wiki?curid=1143590">Waverly, Pennsylvania</a> was edited by a <a href="http://en.wikipedia.org/wiki/Special:Contributions/137.18.255.33">user</a> who&rsquo;s <a href="http://en.wikipedia.org/w/index.php?title=Waverly,_Pennsylvania&amp;diff=7763793&amp;oldid=7761053">really passionate about Doc&rsquo;s Deli</a>. <a href="http://en.wikipedia.org/wiki?curid=1129560">Luis Fortuno</a>, former governor of Puerto Rico, had his <a href="http://en.wikipedia.org/w/index.php?title=Luis_Fortu%C3%B1o&amp;diff=prev&amp;oldid=134653411">history excised</a> by <a href="http://en.wikipedia.org/wiki/Special:Contributions/143.231.249.137">another user</a>. <a href="http://en.wikipedia.org/wiki?curid=6260346">Betty Sutton</a>, however, is an actual Representative from Ohio, representing another conflict of interest, as another <a href="http://en.wikipedia.org/wiki/Special:Contributions/143.228.129.9">user</a> constructed <a href="http://en.wikipedia.org/w/index.php?title=Betty_Sutton&amp;diff=303743778&amp;oldid=296652449">most of her entry</a>.</p>
<p>I have nothing to add regarding <a href="http://en.wikipedia.org/wiki?curid=862471">effeminacy</a> in the House of Representatives.</p>
<p>Other interesting edits by members of the House include <a href="http://en.wikipedia.org/wiki?curid=18951054">Apocalypse Now</a> (10 edits), <a href="http://en.wikipedia.org/wiki?curid=1161298">History of Italy as a monarchy and in the World Wars</a> (9 edits), and <a href="http://en.wikipedia.org/wiki?curid=34071">Whitney Houston</a> (9 edits).</p>
<p>In the end, the members of the U.S. Congress have the same peculiar interests as typical Americans. However, when these people edit entries on topics in which they are directly involved, the potential bias threatens the integrity of all of Wikipedia. And this is just the tip of the iceberg.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
