<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Web &amp; Social Media on Max Woolf&#39;s Blog</title>
    <link>https://minimaxir.com/category/web-social-media/</link>
    <description>Recent content in Web &amp; Social Media on Max Woolf&#39;s Blog</description>
    <image>
      <title>Max Woolf&#39;s Blog</title>
      <url>https://minimaxir.com/android-chrome-512x512.png</url>
      <link>https://minimaxir.com/android-chrome-512x512.png</link>
    </image>
    <generator>Hugo</generator>
    <language>en</language>
    <copyright>Copyright Max Woolf © 2026</copyright>
    <lastBuildDate>Thu, 02 Jan 2025 09:30:00 -0800</lastBuildDate>
    <atom:link href="https://minimaxir.com/category/web-social-media/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Can LLMs write better code if you keep asking them to “write better code”?</title>
      <link>https://minimaxir.com/2025/01/write-better-code/</link>
      <pubDate>Thu, 02 Jan 2025 09:30:00 -0800</pubDate>
      <guid>https://minimaxir.com/2025/01/write-better-code/</guid>
      <description>Most coders want AI to write code faster: I want AI to write FASTER CODE.</description>
      <content:encoded><![CDATA[<p><span><style type="text/css">
pre code.language-txt {
white-space: pre-wrap !important;
word-break: normal !important;
}
</style></span></p>
<p>In November 2023, after OpenAI <a href="https://openai.com/index/dall-e-3-is-now-available-in-chatgpt-plus-and-enterprise/">added the ability</a> for ChatGPT to generate images from DALL-E 3 within the ChatGPT web interface, there was a <a href="https://lifehacker.com/tech/chat-gpt-make-it-more-ai-images-trend">short-lived meme</a> where users gave the LLM a base image and kept asking the model to &ldquo;make it more <em>X</em>&rdquo;, where <em>X</em> can be anything.</p>
<figure class="align-center ">

    <img loading="lazy" srcset="/2025/01/write-better-code/bro_hu_484c0ff30035ba2e.webp 320w,/2025/01/write-better-code/bro_hu_1162a7c634b35f7.webp 768w,/2025/01/write-better-code/bro_hu_9070d4b543cab815.webp 1024w,/2025/01/write-better-code/bro.webp 1024w" src="bro.webp#center"
         alt="A regular guy becomes more &ldquo;bro&rdquo; every time. via /u/Jojop0tato on Reddit."/> <figcaption>
            <p>A regular guy becomes more &ldquo;bro&rdquo; every time. <a href="https://www.reddit.com/r/ChatGPT/comments/18ukiz2/a_regular_guy_becomes_more_bro_every_time/">via /u/Jojop0tato on Reddit.</a></p>
        </figcaption>
</figure>

<figure class="align-center ">

    <img loading="lazy" srcset="/2025/01/write-better-code/santa_hu_1f046d64f5543bd.webp 320w,/2025/01/write-better-code/santa_hu_e0db183e83b65311.webp 768w,/2025/01/write-better-code/santa_hu_5d66897100afbdbf.webp 1024w,/2025/01/write-better-code/santa.webp 1024w" src="santa.webp#center"
         alt="Asked ChatGPT to make Santa Claus more and more serious. via /u/hessihan on Reddit."/> <figcaption>
            <p>Asked ChatGPT to make Santa Claus more and more serious. <a href="https://www.reddit.com/r/ChatGPT/comments/1887z49/asked_chatgpt_to_make_santa_claus_more_and_more/">via /u/hessihan on Reddit.</a></p>
        </figcaption>
</figure>

<p>The trend quickly died as all of these images were very samey and uninteresting, aside from the unexplainable trend that all of the examples eventually converged into something cosmic, irrespective of the starting image and the prompt. Although the trend was <a href="https://en.wikipedia.org/wiki/AI_slop">AI slop</a> before the term AI slop was codified, it&rsquo;s still academically interesting that such a meaningless and vague prompt had <em>some</em> appropriate impact on the final image, and that this change was obvious to the user.</p>
<p>What would happen if we tried a similar technique with code? LLM-generated code is unlikely to be slop (although <a href="https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-for-intelligence/">not impossible</a>) as it follows strict rules, and unlike creative outputs such as images, code quality can be measured more objectively.</p>
<p>If code can indeed be improved simply through iterative prompting such as asking the LLM to &ldquo;make the code better&rdquo; — even though it&rsquo;s very silly — it would be a massive productivity increase. And if that&rsquo;s the case, what happens if you iterate on the code too much? What&rsquo;s the equivalent of code going cosmic? There&rsquo;s only one way to find out!</p>
<h2 id="casually-coding-with-an-llm">Casually Coding With An LLM</h2>
<p>Despite researching and developing tooling around LLMs even long before ChatGPT, I haven&rsquo;t been fond of using LLM code copilots such as <a href="https://github.com/features/copilot">GitHub Copilot</a> for coding assistance. The constant mental context switching between &ldquo;oh, the LLM autocompleted my code, neat&rdquo;/&ldquo;what question should I ask the LLM&rdquo; and &ldquo;is the LLM-generated code actually <em>correct</em> and not <a href="https://en.wikipedia.org/wiki/Hallucination_%28artificial_intelligence%29">hallucinating</a> correct code&rdquo; kept creating enough distractions that any productivity gains from using the AI were net neutral at best. That&rsquo;s also disregarding the expensive cost of using said LLMs.</p>
<p><a href="https://www.anthropic.com/news/claude-3-5-sonnet">Claude 3.5 Sonnet</a> has made me rethink things. Due to whatever secret sauce <a href="https://www.anthropic.com">Anthropic</a> used in its training, the latest version of Claude 3.5 Sonnet (<code>claude-3-5-sonnet-20241022</code>) has <em>incredible</em> prompt adherence for all types of prompts, especially coding prompts. <a href="https://www.vellum.ai/blog/llm-benchmarks-overview-limits-and-model-comparison">Coding</a> <a href="https://aider.chat/docs/leaderboards/">benchmarks</a> confirm that in head-to-head testing between Claude 3.5 Sonnet and GPT-4o, Claude wins, and anecdotally I&rsquo;ve had the same experience across a variety of technical and creative tasks.</p>
<h3 id="initial-ask">Initial Ask</h3>
<p>For this experiment, we will give Claude 3.5 Sonnet an interview-style coding prompt using Python: one that is both simple and could be implemented by a novice software engineer, but one that can be heavily optimized. This simple, casual prompt represents how the typical software engineer uses LLMs. Additionally, the test prompt must also be wholly original and not taken from coding tests such as <a href="https://leetcode.com">LeetCode</a> or <a href="https://www.hackerrank.com">HackerRank</a>, as LLMs were also likely trained on those and could possibly cheat by reciting the memorized answer.</p>
<blockquote>
<p>The full, unedited conversation thread based on this casual prompting is <a href="https://github.com/minimaxir/llm-write-better-code/blob/main/python_30_casual_use.md">available on GitHub</a>.</p>
</blockquote>
<p>Therefore, here&rsquo;s the test prompt that I&rsquo;ve written myself:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">Write Python code to solve this problem:
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Given a list of 1 million random integers between 1 and 100,000, find the difference between the smallest and the largest numbers whose digits sum up to 30.
</span></span></code></pre></div><p>Providing that as the user prompt to the Claude API and setting <code>temperature=0</code> for the best/most deterministic answer <sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>, results in this implementation:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">random</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">digit_sum</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Calculate the sum of digits of a number&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">digit</span><span class="p">)</span> <span class="k">for</span> <span class="n">digit</span> <span class="ow">in</span> <span class="nb">str</span><span class="p">(</span><span class="n">n</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">find_difference</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Generate list of 1 million random integers</span>
</span></span><span class="line"><span class="cl">    <span class="n">numbers</span> <span class="o">=</span> <span class="p">[</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">100000</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1000000</span><span class="p">)]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Initialize variables for min and max numbers with digit sum 30</span>
</span></span><span class="line"><span class="cl">    <span class="n">min_num</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="s1">&#39;inf&#39;</span><span class="p">)</span>  <span class="c1"># Initialize to positive infinity</span>
</span></span><span class="line"><span class="cl">    <span class="n">max_num</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="s1">&#39;-inf&#39;</span><span class="p">)</span>  <span class="c1"># Initialize to negative infinity</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Find numbers whose digits sum to 30</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">num</span> <span class="ow">in</span> <span class="n">numbers</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">digit_sum</span><span class="p">(</span><span class="n">num</span><span class="p">)</span> <span class="o">==</span> <span class="mi">30</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">min_num</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">min_num</span><span class="p">,</span> <span class="n">num</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="n">max_num</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">max_num</span><span class="p">,</span> <span class="n">num</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Check if we found any numbers with digit sum 30</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">min_num</span> <span class="o">==</span> <span class="nb">float</span><span class="p">(</span><span class="s1">&#39;inf&#39;</span><span class="p">)</span> <span class="ow">or</span> <span class="n">max_num</span> <span class="o">==</span> <span class="nb">float</span><span class="p">(</span><span class="s1">&#39;-inf&#39;</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="s2">&#34;No numbers found with digit sum of 30&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">max_num</span> <span class="o">-</span> <span class="n">min_num</span>
</span></span></code></pre></div><p><a href="https://github.com/minimaxir/llm-write-better-code/blob/main/python_30_casual_use.md#assistant-initial-ask">This implementation</a> is correct and consistent with what most novice Python programmers would write, with bonus handling if there are no valid numbers that fit the criteria. For each number in the list, check whether the digits sum to 30: if so, check whether it&rsquo;s larger than the most-recently-seen largest number or smaller than the most-recently-seen smallest number and update those variables accordingly. After searching through the list, return the difference.</p>
<p>But I&rsquo;m sure many coders reading that implementation got <a href="https://xkcd.com/356/">nerd sniped</a> and are thinking about optimizations. One example is the <code>digit_sum()</code> function: although the implementation is a fun Pythonic one-liner, the type-casting between <code>str</code> and <code>int</code> causes a lot of unnecessary overhead.</p>
<p>On my M3 Pro MacBook Pro, this code takes on average <strong>657 milliseconds</strong> to run. We&rsquo;ll use this performance as a baseline to compare future implementations and see how they benchmark relative to it. (spoilers: they&rsquo;re <em>all</em> faster)</p>
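<p>The <code>digit_sum()</code> overhead mentioned above is easy to verify in isolation. Here&rsquo;s a micro-benchmark sketch (my own, not part of the original experiment; the helper names are illustrative) comparing the string-casting one-liner against pure integer arithmetic:</p>

```python
import timeit

def digit_sum_str(n):
    # The Pythonic one-liner: casts to str, then each character back to int
    return sum(int(digit) for digit in str(n))

def digit_sum_int(n):
    # Pure integer arithmetic: no type-casting overhead
    total = 0
    while n:
        total += n % 10
        n //= 10
    return total

# Sanity check: both implementations agree on every value
assert all(digit_sum_str(n) == digit_sum_int(n) for n in range(1, 10_000))

t_str = timeit.timeit(lambda: digit_sum_str(98765), number=100_000)
t_int = timeit.timeit(lambda: digit_sum_int(98765), number=100_000)
print(f"str-cast: {t_str:.3f}s, integer: {t_int:.3f}s")
```

<p>On CPython the arithmetic version is typically faster for numbers this small, though the exact ratio varies by machine and Python version.</p>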
<h3 id="iteration-1">Iteration #1</h3>
<p>Now we can ask Claude to improve the code by including its current answer and all that came before it as a part of the conversation prompt. Thus, we add the iteration prompt:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">write better code
</span></span></code></pre></div><p>What, you thought the title of this blog post was a joke? I <em>never</em> joke.</p>
<p>Claude <a href="https://github.com/minimaxir/llm-write-better-code/blob/main/python_30_casual_use.md#assistant-iteration-1">now outputs</a> revised code, or as it says, &ldquo;optimized version of the code with several improvements.&rdquo; Instead of colocating all the code in functions, it decided to refactor it into a Python <code>class</code> and made it more object-oriented:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">DigitSumFinder</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">target_sum</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">30</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                 <span class="n">range_start</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                 <span class="n">range_end</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">100_000</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                 <span class="n">count</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1_000_000</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">target_sum</span> <span class="o">=</span> <span class="n">target_sum</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">range_start</span> <span class="o">=</span> <span class="n">range_start</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">range_end</span> <span class="o">=</span> <span class="n">range_end</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">count</span> <span class="o">=</span> <span class="n">count</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="c1"># Pre-calculate digit sums for all possible numbers</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">digit_sums</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_precompute_digit_sums</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">_precompute_digit_sums</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">array</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;Precompute digit sums for all possible numbers in range.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="n">digit_sums</span> <span class="o">=</span> <span class="n">array</span><span class="p">(</span><span class="s1">&#39;B&#39;</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">range_end</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="n">num</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">range_start</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">range_end</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">            <span class="n">total</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">            <span class="n">n</span> <span class="o">=</span> <span class="n">num</span>
</span></span><span class="line"><span class="cl">            <span class="k">while</span> <span class="n">n</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="n">total</span> <span class="o">+=</span> <span class="n">n</span> <span class="o">%</span> <span class="mi">10</span>
</span></span><span class="line"><span class="cl">                <span class="n">n</span> <span class="o">//=</span> <span class="mi">10</span>
</span></span><span class="line"><span class="cl">            <span class="n">digit_sums</span><span class="p">[</span><span class="n">num</span><span class="p">]</span> <span class="o">=</span> <span class="n">total</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">digit_sums</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">find_difference</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">],</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]]:</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;
</span></span></span><span class="line"><span class="cl"><span class="s2">        Find the difference between max and min numbers with target digit sum.
</span></span></span><span class="line"><span class="cl"><span class="s2">        Returns: (difference, min_number, max_number)
</span></span></span><span class="line"><span class="cl"><span class="s2">        &#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="n">min_num</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="s1">&#39;inf&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">max_num</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="s1">&#39;-inf&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">count_found</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="c1"># Generate and process random numbers</span>
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">count</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">            <span class="n">num</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">range_start</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">range_end</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">digit_sums</span><span class="p">[</span><span class="n">num</span><span class="p">]</span> <span class="o">==</span> <span class="bp">self</span><span class="o">.</span><span class="n">target_sum</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="n">count_found</span> <span class="o">+=</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="n">num</span> <span class="o">&lt;</span> <span class="n">min_num</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                    <span class="n">min_num</span> <span class="o">=</span> <span class="n">num</span>
</span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="n">num</span> <span class="o">&gt;</span> <span class="n">max_num</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                    <span class="n">max_num</span> <span class="o">=</span> <span class="n">num</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">count_found</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="mi">0</span><span class="p">,</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">max_num</span> <span class="o">-</span> <span class="n">min_num</span><span class="p">,</span> <span class="n">min_num</span><span class="p">,</span> <span class="n">max_num</span>
</span></span></code></pre></div><p>Here, the code does two smart algorithmic improvements:</p>
<ul>
<li>When computing the digit sum it uses integer operations and avoids the need for the type-casting mentioned previously.</li>
<li>It precomputes all possible digit sums and stores them in a byte array (somewhat unusual instead of a list) for lookup, meaning that the digit sum will not need to be recalculated when there are duplicates in the one-million-number list. Since this array is stored as a field in the class, it will not need to be recalculated when searching a new list of random numbers.</li>
</ul>
<p>These optimizations speed up the code <strong>2.7x</strong> relative to the initial implementation.</p>
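<p>To make the lookup-table idea concrete outside the class structure, here&rsquo;s a minimal self-contained sketch (my own reconstruction of the pattern, not Claude&rsquo;s exact code) of precompute-then-scan, with a fixed seed so runs are reproducible:</p>

```python
import random
from array import array

LIMIT = 100_000
TARGET = 30

# Precompute the digit sum of every possible value once, in a compact byte array
lookup = array('B', bytes(LIMIT + 1))
for num in range(1, LIMIT + 1):
    total, n = 0, num
    while n:
        total += n % 10
        n //= 10
    lookup[num] = total

random.seed(42)  # illustrative fixed seed, for reproducibility
numbers = [random.randint(1, LIMIT) for _ in range(1_000_000)]

# Scan: each number is now an O(1) table lookup instead of a fresh computation
hits = [n for n in numbers if lookup[n] == TARGET]
print(max(hits) - min(hits) if hits else "No numbers found with digit sum of 30")
```

<p>The <code>array(&#39;B&#39;)</code> stores one byte per entry (roughly 100&nbsp;KB for the whole table), versus eight-byte object pointers in a plain Python list, which is likely why Claude reached for it.</p>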
<h3 id="iteration-2">Iteration #2</h3>
<p>After another <code>write better code</code>, Claude <a href="https://github.com/minimaxir/llm-write-better-code/blob/main/python_30_casual_use.md#assistant-iteration-2">discovers</a> more obvious optimizations (truncated for readability):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">DigitSumAnalyzer</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Analyzes numbers based on their digit sums.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">number_range</span><span class="p">:</span> <span class="n">NumberRange</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">number_range</span> <span class="o">=</span> <span class="n">number_range</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">_digit_sums</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_precompute_digit_sums</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">_precompute_digit_sums</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">array</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;Precompute digit sums for all possible numbers using vectorized operations.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="n">digits</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">number_range</span><span class="o">.</span><span class="n">end</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">digit_sums</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">number_range</span><span class="o">.</span><span class="n">end</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">uint8</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">while</span> <span class="n">digits</span><span class="o">.</span><span class="n">any</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">            <span class="n">digit_sums</span> <span class="o">+=</span> <span class="n">digits</span> <span class="o">%</span> <span class="mi">10</span>
</span></span><span class="line"><span class="cl">            <span class="n">digits</span> <span class="o">//=</span> <span class="mi">10</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">array</span><span class="p">(</span><span class="s1">&#39;B&#39;</span><span class="p">,</span> <span class="n">digit_sums</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">_process_chunk</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">chunk_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Iterator</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;Process a chunk of random numbers.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="n">numbers</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">number_range</span><span class="o">.</span><span class="n">start</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">number_range</span><span class="o">.</span><span class="n">end</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="n">chunk_size</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">int32</span>
</span></span><span class="line"><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">mask</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">frombuffer</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_digit_sums</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">uint8</span><span class="p">)[</span><span class="n">numbers</span><span class="p">]</span> <span class="o">==</span> <span class="bp">self</span><span class="o">.</span><span class="n">number_range</span><span class="o">.</span><span class="n">target_sum</span>
</span></span><span class="line"><span class="cl">        <span class="k">yield from</span> <span class="n">numbers</span><span class="p">[</span><span class="n">mask</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">analyze</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">chunk_size</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">100_000</span><span class="p">,</span> <span class="n">num_processes</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Result</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;
</span></span></span><span class="line"><span class="cl"><span class="s2">        Analyze numbers to find min/max with target digit sum.
</span></span></span><span class="line"><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="cl"><span class="s2">        Args:
</span></span></span><span class="line"><span class="cl"><span class="s2">            chunk_size: Size of chunks to process at once
</span></span></span><span class="line"><span class="cl"><span class="s2">            num_processes: Number of processes to use (None for CPU count)
</span></span></span><span class="line"><span class="cl"><span class="s2">        &#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="n">min_num</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="s1">&#39;inf&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">max_num</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="s1">&#39;-inf&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">numbers_found</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="n">num_chunks</span> <span class="o">=</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">number_range</span><span class="o">.</span><span class="n">count</span> <span class="o">+</span> <span class="n">chunk_size</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">//</span> <span class="n">chunk_size</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">with</span> <span class="n">ProcessPoolExecutor</span><span class="p">(</span><span class="n">max_workers</span><span class="o">=</span><span class="n">num_processes</span><span class="p">)</span> <span class="k">as</span> <span class="n">executor</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">futures</span> <span class="o">=</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl">                <span class="n">executor</span><span class="o">.</span><span class="n">submit</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_process_chunk</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                              <span class="nb">min</span><span class="p">(</span><span class="n">chunk_size</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">number_range</span><span class="o">.</span><span class="n">count</span> <span class="o">-</span> <span class="n">i</span> <span class="o">*</span> <span class="n">chunk_size</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">                <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_chunks</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">            <span class="k">for</span> <span class="n">future</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="n">futures</span><span class="p">,</span> <span class="n">desc</span><span class="o">=</span><span class="s2">&#34;Processing chunks&#34;</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">                <span class="k">for</span> <span class="n">num</span> <span class="ow">in</span> <span class="n">future</span><span class="o">.</span><span class="n">result</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">                    <span class="n">numbers_found</span> <span class="o">+=</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">                    <span class="n">min_num</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">min_num</span><span class="p">,</span> <span class="n">num</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                    <span class="n">max_num</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">max_num</span><span class="p">,</span> <span class="n">num</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="n">execution_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span> <span class="o">-</span> <span class="n">start_time</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">numbers_found</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="n">Result</span><span class="p">(</span><span class="kc">None</span><span class="p">,</span> <span class="kc">None</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">execution_time</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">Result</span><span class="p">(</span><span class="n">min_num</span><span class="p">,</span> <span class="n">max_num</span><span class="p">,</span> <span class="n">max_num</span> <span class="o">-</span> <span class="n">min_num</span><span class="p">,</span> <span class="n">execution_time</span><span class="p">,</span> <span class="n">numbers_found</span><span class="p">)</span>
</span></span></code></pre></div><p>Claude now has added two more optimizations, finally realizing that this coding problem is an <a href="https://en.wikipedia.org/wiki/Embarrassingly_parallel">embarrassingly parallel</a> problem:</p>
<ul>
<li>Multithreading through Python&rsquo;s <a href="https://docs.python.org/3/library/concurrent.futures.html">concurrent-futures</a> package, by separating the large list into chunks that can be processed independently.</li>
<li>Vectorized numpy operations, which are <em>much</em> faster than base-Python operations. Special mention goes to the <code>_precompute_digit_sums()</code> function, which implements a vectorized implementation of calculating the digit sums. The conditional <code>while digits.any():</code> is galaxy-brain code, but it works correctly.</li>
</ul>
<p>However, there&rsquo;s an issue with this particular implementation of parallelization: it generates subprocesses, which causes <em>many</em> annoying issues, including being unable to run it as-is inline, and it <a href="https://stackoverflow.com/questions/15900366/all-example-concurrent-futures-code-is-failing-with-brokenprocesspool">must be invoked</a> with a <code>main()</code> guard which limits its utility significantly. But even when run as a separate script, it prints a <code>Error: cannot pickle 'generator' object</code> error due to the use of <code>yield from numbers[mask]</code> (said generator is completely unnecessary, <code>return numbers[mask]</code> is sufficient). The code also mixes numpy array <code>dtype</code>s which causes errors: setting them all to <code>np.int32</code> fixes it.</p>
<p>After making those fixes, the code is now <strong>5.1x faster</strong> than the base implementation.</p>
<h3 id="iteration-3">Iteration #3</h3>
<p>Another <code>write better code</code>, and Claude <a href="https://github.com/minimaxir/llm-write-better-code/blob/main/python_30_casual_use.md#assistant-iteration-3">returns an implementation</a> that it claims is an &ldquo;even more sophisticated and optimized version using advanced techniques and modern Python features,&rdquo; but the actual code shows no significant algorithmic improvements, and the digit sum calculation actually regresses by reverting to the type-casting approach. If anything, the codebase is becoming more bloated, such as adding a class for computing the difference:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="nd">@dataclass</span><span class="p">(</span><span class="n">frozen</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">slots</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">SearchResult</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Result of the number search.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">min_number</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="n">max_number</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="n">count</span><span class="p">:</span> <span class="nb">int</span>
</span></span><span class="line"><span class="cl">    <span class="n">execution_time</span><span class="p">:</span> <span class="nb">float</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nd">@property</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">difference</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;Calculate difference between max and min numbers.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">min_number</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="bp">self</span><span class="o">.</span><span class="n">max_number</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="kc">None</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">max_number</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">min_number</span>
</span></span></code></pre></div><p>This time, the code ran without needing any fixes. However, performance regressed slightly from the previous implementation, now <strong>4.1x faster</strong> than the base implementation.</p>
<h3 id="iteration-4">Iteration #4</h3>
<p>This iterative prompting appears to be hitting diminishing returns. After one more <code>write better code</code>, Claude <a href="https://github.com/minimaxir/llm-write-better-code/blob/main/python_30_casual_use.md#assistant-iteration-4">provides an implementation</a> &ldquo;with cutting-edge optimizations and enterprise-level features.&rdquo; Wait, enterprise-level features?!</p>
<p>The final code is too large to include in this blog post, but it did create two more optimizations: it now uses the <a href="https://numba.pydata.org">numba</a> Python library that can invoke a JIT compiler, which directly optimizes the code for the CPU. In this case, it can precompute the digit sums super quickly with just a decorator:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="nd">@jit</span><span class="p">(</span><span class="n">nopython</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">parallel</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">calculate_digit_sums</span><span class="p">(</span><span class="n">numbers</span><span class="p">:</span> <span class="n">ArrayInt</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">ArrayInt</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Calculate digit sums using Numba.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">result</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">prange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">)):</span>
</span></span><span class="line"><span class="cl">        <span class="n">num</span> <span class="o">=</span> <span class="n">numbers</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="n">total</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">        <span class="k">while</span> <span class="n">num</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">total</span> <span class="o">+=</span> <span class="n">num</span> <span class="o">%</span> <span class="mi">10</span>
</span></span><span class="line"><span class="cl">            <span class="n">num</span> <span class="o">//=</span> <span class="mi">10</span>
</span></span><span class="line"><span class="cl">        <span class="n">result</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">total</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">result</span>
</span></span></code></pre></div><p>The full class also uses Python&rsquo;s <a href="https://docs.python.org/3/library/asyncio.html">asyncio</a> for parallelization, which is more canonical for scheduling tasks than a subprocess approach. It also plays more nicely with existing inline code and a <a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">REPL</a> such as <a href="https://jupyter.org">Jupyter Notebooks</a>.</p>
<p>It also added as a part of its &ldquo;enterprise&rdquo; push:</p>
<ul>
<li>Structured metrics logging with <a href="https://prometheus.io">Prometheus</a>.</li>
<li>A signal handler so the code can be torn down gracefully if force-killed.</li>
<li>A benchmarking result display using a <a href="https://github.com/Textualize/rich">rich</a> table.</li>
</ul>
<figure>

    <img loading="lazy" srcset="/2025/01/write-better-code/rich_hu_1cc271f7a31e0c53.webp 320w,/2025/01/write-better-code/rich.png 490w" src="rich.png"
         alt="It is pretty, though!"/> <figcaption>
            <p>It <em>is</em> pretty, though!</p>
        </figcaption>
</figure>

<p>It appears the AI-generated-code equivalent of &ldquo;going cosmic&rdquo; is going enterprise by overengineering the code, which makes complete sense. Despite that, the code runs as-is without any bugs. Asyncio and numba each manage their own concurrency, so combining them may be redundant and add overhead. However, after benchmarking, the algorithm is <em>extremely</em> fast, at about 6 milliseconds a run, or a <strong>100x</strong> speedup. My assumption that this prompting was hitting diminishing returns aged very poorly. Maybe numba was the secret all along?</p>
<p>Overall, this form of iterative prompting to improve code has caveats: the code is indeed better, but in hindsight &ldquo;better&rdquo; is far too open-ended. All I wanted was algorithmic improvements, not a full SaaS. Let&rsquo;s try again from scratch, this time with more direction.</p>
<h2 id="prompt-engineering-llms-for-even-more-better-code">Prompt Engineering LLMs For Even More Better Code</h2>
<p>It&rsquo;s 2025, and prompt engineering LLMs is still required to get the best results from them. If anything, prompt engineering LLMs is <em>even more important</em>: next-token-prediction models are trained to maximize the prediction probability of the next token over massive batches of inputs, and as a result they optimize for the <strong>average</strong> inputs and outputs. As LLMs drastically improve, the generated output becomes more drastically average, because that&rsquo;s what they were trained to do: all LLMs are biased towards the average. Although it&rsquo;s both counterintuitive and unfun, a small amount of guidance asking the LLM specifically for what you want, and even giving a few examples of what you want, will objectively improve the output of LLMs more than the effort needed to construct said prompts. Claude 3.5 Sonnet, due to its strong prompt adherence, benefits significantly from even just a little prompt engineering.</p>
<p>Let&rsquo;s redo the code optimization experiment, this time with aggressive prompt engineering that makes the results I am looking for extremely explicit, with no room for ambiguity. Yes, being cold and &ldquo;robotic&rdquo; to LLMs makes them perform better, <a href="https://en.wikipedia.org/wiki/Roko%27s_basilisk">Roko&rsquo;s basilisk</a> be damned.</p>
<h3 id="initial-ask-1">Initial Ask</h3>
<p>This time we will use a system prompt, only available via an API. The system prompt lists the LLM&rsquo;s &ldquo;rules&rdquo; it must follow. Since I want more optimized code, we&rsquo;ll define that in the rules, with granular examples:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">All code you write MUST be fully optimized.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&#34;Fully optimized&#34; includes:
</span></span><span class="line"><span class="cl">- maximizing algorithmic big-O efficiency for memory and runtime
</span></span><span class="line"><span class="cl">- using parallelization and vectorization where appropriate
</span></span><span class="line"><span class="cl">- following proper style conventions for the code language (e.g. maximizing code reuse (DRY))
</span></span><span class="line"><span class="cl">- no extra code beyond what is absolutely necessary to solve the problem the user provides (i.e. no technical debt)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">If the code is not fully optimized, you will be fined $100.
</span></span></code></pre></div><p>About that last line: offering positive/negative incentives to the LLM within a system prompt isn&rsquo;t common anymore, and <a href="https://minimaxir.com/2024/02/chatgpt-tips-analysis/">my own research</a> indicated it was unclear whether they had a positive impact, but that fine will become more relevant in the iteration prompt.</p>
<p>The initial user prompt has an additional line as well:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">Write Python code to solve this problem:
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Given a list of 1 million random integers between 1 and 100,000, find the difference between the smallest and the largest numbers whose digits sum up to 30.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Before writing the code, plan out all the necessary optimizations.
</span></span></code></pre></div><p>&ldquo;Planning&rdquo; is a long-used trick to help align LLM output for a first pass — the modern implementation of &ldquo;<a href="https://arxiv.org/abs/2205.11916">let&rsquo;s think step by step</a>.&rdquo; Some LLMs ignore it, but Claude 3.5 Sonnet always obeys.</p>
<blockquote>
<p>The full, unedited conversation thread based on this prompt engineering strategy is <a href="https://github.com/minimaxir/llm-write-better-code/blob/main/python_30_prompt_engineering.md">available on GitHub</a>.</p>
</blockquote>
<p>Sending the system prompt and this user prompt to Claude, the <a href="https://github.com/minimaxir/llm-write-better-code/blob/main/python_30_prompt_engineering.md#assistant-initial-ask">response</a> starts with this:</p>
<!-- prettier-ignore-start -->
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">Let me plan the optimizations first:
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">1. Algorithmic Optimization:
</span></span><span class="line"><span class="cl">- We only need to track min and max numbers with digit sum 30, not store all of them
</span></span><span class="line"><span class="cl">- We can calculate digit sum while reading numbers, no need to convert to string
</span></span><span class="line"><span class="cl">- Single pass through the list required (O(n))
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">2. Memory Optimization:
</span></span><span class="line"><span class="cl">- Only need to store 2 variables (min and max), not the full filtered list
</span></span><span class="line"><span class="cl">- No need for additional data structures
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">3. Performance Optimization:
</span></span><span class="line"><span class="cl">- Use numba&#39;s @jit decorator for faster computation
</span></span><span class="line"><span class="cl">- Create a fast digit sum function using modulo operations instead of string conversion
</span></span><span class="line"><span class="cl">- Use numpy for generating random numbers (faster than random.randint)
</span></span></code></pre></div><!-- prettier-ignore-end -->
<p>This is promising: Claude immediately figured out it should start out with numpy and numba, and to also use integer operations to calculate the digit sum. The initial implementation:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="nd">@jit</span><span class="p">(</span><span class="n">nopython</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">digit_sum</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">total</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="k">while</span> <span class="n">n</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">total</span> <span class="o">+=</span> <span class="n">n</span> <span class="o">%</span> <span class="mi">10</span>
</span></span><span class="line"><span class="cl">        <span class="n">n</span> <span class="o">//=</span> <span class="mi">10</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">total</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@jit</span><span class="p">(</span><span class="n">nopython</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">find_difference</span><span class="p">(</span><span class="n">numbers</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">min_num</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="s1">&#39;inf&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">max_num</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="s1">&#39;-inf&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">num</span> <span class="ow">in</span> <span class="n">numbers</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">sum_digits</span> <span class="o">=</span> <span class="n">digit_sum</span><span class="p">(</span><span class="n">num</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">sum_digits</span> <span class="o">==</span> <span class="mi">30</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">min_num</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">min_num</span><span class="p">,</span> <span class="n">num</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="n">max_num</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">max_num</span><span class="p">,</span> <span class="n">num</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">max_num</span> <span class="o">-</span> <span class="n">min_num</span> <span class="k">if</span> <span class="n">max_num</span> <span class="o">!=</span> <span class="nb">float</span><span class="p">(</span><span class="s1">&#39;-inf&#39;</span><span class="p">)</span> <span class="k">else</span> <span class="mi">0</span>
</span></span></code></pre></div><p>That&rsquo;s more concise than the initial implementation from casual prompting, especially with no unnecessary comments. There&rsquo;s still room for improvement: notably, this implementation does not set <code>parallel=True</code> in the <code>jit</code> decorator.</p>
<p>This minor use of prompt engineering resulted in code that runs in 11.2 milliseconds on average, or <strong>59x faster</strong> than the naive implementation. Wow!</p>
<h3 id="iteration-1-1">Iteration #1</h3>
<p>We can now ask Claude to iterate on the code with a more verbose prompt than &ldquo;write code better&rdquo;:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">Your code is not fully optimized, and you have been fined $100. Make it more optimized.
</span></span></code></pre></div><p>No, it&rsquo;s not gaslighting, it&rsquo;s <em>incentive alignment</em>.</p>
<p>Claude <a href="https://github.com/minimaxir/llm-write-better-code/blob/main/python_30_prompt_engineering.md#assistant-iteration-1">responds</a> with this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="nd">@jit</span><span class="p">(</span><span class="n">nopython</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">digit_sum</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Using bit manipulation for faster division</span>
</span></span><span class="line"><span class="cl">    <span class="n">total</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="k">while</span> <span class="n">n</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">total</span> <span class="o">+=</span> <span class="n">n</span> <span class="o">&amp;</span> <span class="mi">15</span>  <span class="c1"># Fast lookup for single digit sum</span>
</span></span><span class="line"><span class="cl">        <span class="n">n</span> <span class="o">&gt;&gt;=</span> <span class="mi">4</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">total</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@jit</span><span class="p">(</span><span class="n">nopython</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">parallel</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">find_difference_chunk</span><span class="p">(</span><span class="n">numbers</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">min_num</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">iinfo</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">int64</span><span class="p">)</span><span class="o">.</span><span class="n">max</span>
</span></span><span class="line"><span class="cl">    <span class="n">max_num</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">iinfo</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">int64</span><span class="p">)</span><span class="o">.</span><span class="n">min</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Parallel processing of chunks using numba</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">prange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">)):</span>
</span></span><span class="line"><span class="cl">        <span class="n">num</span> <span class="o">=</span> <span class="n">numbers</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="n">sum_digits</span> <span class="o">=</span> <span class="n">digit_sum</span><span class="p">(</span><span class="n">num</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">sum_digits</span> <span class="o">==</span> <span class="mi">30</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">min_num</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">min_num</span><span class="p">,</span> <span class="n">num</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="n">max_num</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">max_num</span><span class="p">,</span> <span class="n">num</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">min_num</span><span class="p">,</span> <span class="n">max_num</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">process_chunk</span><span class="p">(</span><span class="n">chunk</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">find_difference_chunk</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span>
</span></span></code></pre></div><p>Good, it identified the <code>parallel=True</code> flag! But that digit sum implementation is odd: it uses <a href="https://wiki.python.org/moin/BitwiseOperators">bit-shifting</a>, which is generally reserved for hyperoptimization. I didn&rsquo;t think bit-shifting could work correctly on decimal numbers, and after testing, it turns out <em>it can&rsquo;t</em>: this implementation derives an incorrect digit sum. The implementation also reintroduces a multiprocessing chunking approach (not shown), which is likely redundant with numba&rsquo;s own parallelism and adds overhead. Also not shown: the script pre-compiles the JIT functions with a small test array for better real-world performance, something the <a href="https://numba.pydata.org/numba-doc/dev/user/5minguide.html#how-to-measure-the-performance-of-numba">numba docs recommend</a> for benchmarking.</p>
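<p>To illustrate the bug, here is a minimal sketch (my own code, not from the Claude transcript) comparing the bit-shifted &ldquo;digit sum&rdquo; against the true decimal digit sum:</p>

```python
def digit_sum_shifted(n):
    # Claude's approach: n & 15 extracts the low 4 bits, i.e. a *hexadecimal* digit
    total = 0
    while n:
        total += n & 15
        n >>= 4
    return total

def digit_sum_decimal(n):
    # Correct approach: n % 10 extracts a base-10 digit
    total = 0
    while n:
        total += n % 10
        n //= 10
    return total

# 30 is 0x1E, so the shifted version returns 0xE + 0x1 = 15 instead of 3 + 0 = 3
print(digit_sum_shifted(30), digit_sum_decimal(30))  # → 15 3
```

<p>The two functions agree for single-digit inputs, which is why the bug can survive a casual spot-check on small numbers.</p>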
<p>Despite all of that, the code performance greatly regresses from the initial prompt-engineered implementation, now only <strong>9.1x</strong> faster than the naive implementation. The likely reason is that multiprocessing spawns new processes, and each of those processes will <a href="https://stackoverflow.com/questions/72449896/does-numba-need-to-compile-separately-within-each-parallel-process">recompile the numba JIT functions</a>, incurring massive overhead.</p>
<h3 id="iteration-2-1">Iteration #2</h3>
<p>Another <a href="https://github.com/minimaxir/llm-write-better-code/blob/main/python_30_prompt_engineering.md#assistant-iteration-2">iteration</a> of the prompt:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="c1"># Pre-computed lookup table using bit manipulation</span>
</span></span><span class="line"><span class="cl"><span class="n">LOOKUP</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">100001</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">uint8</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100001</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Optimized digit sum using parallel bit counting</span>
</span></span><span class="line"><span class="cl">    <span class="n">n</span> <span class="o">=</span> <span class="n">i</span>
</span></span><span class="line"><span class="cl">    <span class="n">n</span> <span class="o">=</span> <span class="p">(</span><span class="n">n</span> <span class="o">&amp;</span> <span class="mh">0x0F</span><span class="p">)</span> <span class="o">+</span> <span class="p">((</span><span class="n">n</span> <span class="o">&gt;&gt;</span> <span class="mi">4</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x0F</span><span class="p">)</span> <span class="o">+</span> <span class="p">((</span><span class="n">n</span> <span class="o">&gt;&gt;</span> <span class="mi">8</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x0F</span><span class="p">)</span> <span class="o">+</span> \
</span></span><span class="line"><span class="cl">        <span class="p">((</span><span class="n">n</span> <span class="o">&gt;&gt;</span> <span class="mi">12</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x0F</span><span class="p">)</span> <span class="o">+</span> <span class="p">((</span><span class="n">n</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x0F</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">LOOKUP</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">n</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@jit</span><span class="p">(</span><span class="n">nopython</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">parallel</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">fastmath</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">find_min_max</span><span class="p">(</span><span class="n">numbers</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Process 32 numbers at once using SIMD</span>
</span></span><span class="line"><span class="cl">    <span class="n">min_val</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">iinfo</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">uint32</span><span class="p">)</span><span class="o">.</span><span class="n">max</span>
</span></span><span class="line"><span class="cl">    <span class="n">max_val</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Vectorized processing with explicit SIMD hints</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">prange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">),</span> <span class="mi">32</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># Load 32 elements into SIMD registers</span>
</span></span><span class="line"><span class="cl">        <span class="n">chunk</span> <span class="o">=</span> <span class="n">numbers</span><span class="p">[</span><span class="n">i</span><span class="p">:</span><span class="nb">min</span><span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">32</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">numbers</span><span class="p">))]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="c1"># Vectorized lookup and comparison</span>
</span></span><span class="line"><span class="cl">        <span class="n">sums</span> <span class="o">=</span> <span class="n">LOOKUP</span><span class="p">[</span><span class="n">chunk</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="n">mask</span> <span class="o">=</span> <span class="n">sums</span> <span class="o">==</span> <span class="mi">30</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">np</span><span class="o">.</span><span class="n">any</span><span class="p">(</span><span class="n">mask</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">            <span class="n">valid_nums</span> <span class="o">=</span> <span class="n">chunk</span><span class="p">[</span><span class="n">mask</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">            <span class="n">min_val</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">min_val</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">min</span><span class="p">(</span><span class="n">valid_nums</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">            <span class="n">max_val</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">max_val</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">max</span><span class="p">(</span><span class="n">valid_nums</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">min_val</span><span class="p">,</span> <span class="n">max_val</span>
</span></span></code></pre></div><p>Claude is now starting to use <a href="https://tbetcke.github.io/hpc_lecture_notes/simd.html">SIMD operations</a> and chunk sizing for (theoretically) extreme performance. At this point, I was confused that I had missed something with that bit-shifting implementation, as it&rsquo;s still wrong, especially now that hexadecimal numbers are getting involved. It turns out that implementation is an optimized method of calculating the digit sum of <em>hexadecimal</em> numbers, not decimal numbers, and is therefore a complete hallucination. There is another, extremely subtle hallucination: the <code>prange</code> function cannot accept a step size of 32 while <code>parallel=True</code>, a nuance with little documentation. After setting <code>parallel=False</code> and benchmarking, there is indeed a slight improvement over the initial prompt-engineered implementation: <strong>65x faster</strong> than the base implementation.</p>
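<p>For reference, the lookup-table idea itself is sound; it only needs decimal digit sums. The following is my own sketch of a corrected table (not code from the transcript), built with vectorized numpy operations:</p>

```python
import numpy as np

# Precompute the decimal digit sum for every possible input value (0-100,000)
n = np.arange(100_001, dtype=np.int64)
LOOKUP = np.zeros(100_001, dtype=np.uint8)
while n.any():
    LOOKUP += (n % 10).astype(np.uint8)  # accumulate the current lowest digit
    n //= 10                             # drop that digit

# LOOKUP[num] is now the true base-10 digit sum of num
```

<p>With a table like this, the generated code&rsquo;s <code>sums = LOOKUP[chunk]</code> line would actually select numbers whose decimal digits sum to 30.</p>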
<h3 id="iteration-3-1">Iteration #3</h3>
<p><a href="https://github.com/minimaxir/llm-write-better-code/blob/main/python_30_prompt_engineering.md#assistant-iteration-3">Another</a> iteration:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="c1"># Pre-computed perfect minimal hash table</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Key insight: For numbers 1-100000, only need to store those with digit sum 30</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Format: [number] = 1 if digit sum is 30, else 0</span>
</span></span><span class="line"><span class="cl"><span class="n">HASH_TABLE</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">100001</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">uint8</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Generate hash table at module load time using bit manipulation</span>
</span></span><span class="line"><span class="cl"><span class="nd">@jit</span><span class="p">(</span><span class="n">nopython</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">init_hash_table</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="n">min_val</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">iinfo</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">uint32</span><span class="p">)</span><span class="o">.</span><span class="n">max</span>
</span></span><span class="line"><span class="cl">    <span class="n">max_val</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Optimal digit sum using parallel bit counting</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">100001</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">n</span> <span class="o">=</span> <span class="n">i</span>
</span></span><span class="line"><span class="cl">        <span class="nb">sum</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">        <span class="k">while</span> <span class="n">n</span> <span class="ow">and</span> <span class="nb">sum</span> <span class="o">&lt;=</span> <span class="mi">30</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="nb">sum</span> <span class="o">+=</span> <span class="n">n</span> <span class="o">&amp;</span> <span class="mh">0xF</span>
</span></span><span class="line"><span class="cl">            <span class="n">n</span> <span class="o">&gt;&gt;=</span> <span class="mi">4</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="nb">sum</span> <span class="o">==</span> <span class="mi">30</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">HASH_TABLE</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">            <span class="n">min_val</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">min_val</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="n">max_val</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">max_val</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">min_val</span><span class="p">,</span> <span class="n">max_val</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Initialize at module load time</span>
</span></span><span class="line"><span class="cl"><span class="n">MIN_VALID</span><span class="p">,</span> <span class="n">MAX_VALID</span> <span class="o">=</span> <span class="n">init_hash_table</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@jit</span><span class="p">(</span><span class="n">nopython</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">parallel</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">fastmath</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">find_min_max</span><span class="p">(</span><span class="n">numbers</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">min_val</span> <span class="o">=</span> <span class="n">MAX_VALID</span>  <span class="c1"># Start with known bounds</span>
</span></span><span class="line"><span class="cl">    <span class="n">max_val</span> <span class="o">=</span> <span class="n">MIN_VALID</span>
</span></span><span class="line"><span class="cl">    <span class="n">found</span> <span class="o">=</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Single vectorized operation</span>
</span></span><span class="line"><span class="cl">    <span class="n">mask</span> <span class="o">=</span> <span class="n">HASH_TABLE</span><span class="p">[</span><span class="n">numbers</span><span class="p">]</span> <span class="o">==</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">np</span><span class="o">.</span><span class="n">any</span><span class="p">(</span><span class="n">mask</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">valid_nums</span> <span class="o">=</span> <span class="n">numbers</span><span class="p">[</span><span class="n">mask</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="n">min_val</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">min</span><span class="p">(</span><span class="n">valid_nums</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">max_val</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">max</span><span class="p">(</span><span class="n">valid_nums</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">found</span> <span class="o">=</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">min_val</span><span class="p">,</span> <span class="n">max_val</span><span class="p">,</span> <span class="n">found</span>
</span></span></code></pre></div><p>In this case, the LLM gives up on the chunking strategy, which has been causing problems, and adds two more optimizations: a global <code>HASH_TABLE</code> (which is just a numpy array; I&rsquo;m unsure whether a simple index lookup technically counts as a <a href="https://en.wikipedia.org/wiki/Hash_table">hash table</a>), and a logical microoptimization in which digit counting stops as soon as the running sum exceeds 30, since the number can immediately be identified as invalid.</p>
<p>One major problem: that &ldquo;generate hash table at module load time&rdquo; trick doesn&rsquo;t actually work, due to a subtle issue with little internet documentation: objects instantiated outside of numba&rsquo;s JITed functions are read-only within them, yet the <code>HASH_TABLE</code> is instantiated outside the JITed function and modified inside it, which causes a very confusing error. After a tiny refactor such that the <code>HASH_TABLE</code> is instantiated within a JITed function, the code worked and ran <em>extremely</em> fast: <strong>100x</strong> faster than the original base implementation, the same as the final performance from the casual prompting but with orders of magnitude less code.</p>
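<p>The refactor in question can be sketched as follows: allocate the table <em>inside</em> a JITed function and return it, so numba never has to mutate a module-level global. This is my own illustrative sketch of the pattern (with a plain-Python fallback if numba is not installed), not the author&rsquo;s exact code:</p>

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback: run as plain Python if numba is unavailable
    def njit(**kwargs):
        return lambda f: f

@njit(cache=True)
def build_hash_table():
    # The array is created inside the JITed function, so it is writable here
    table = np.zeros(100_001, dtype=np.uint8)
    min_val, max_val = 2**31 - 1, 0
    for i in range(1, 100_001):
        n, s = i, 0
        while n and s <= 30:  # stop early once the sum exceeds 30
            s += n % 10
            n //= 10
        if s == 30:
            table[i] = 1
            min_val = min(min_val, i)
            max_val = max(max_val, i)
    return table, min_val, max_val

HASH_TABLE, MIN_VALID, MAX_VALID = build_hash_table()
```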
<h3 id="iteration-4-1">Iteration #4</h3>
<p>At this point, Claude actually complained that the code is at the &ldquo;theoretical minimum time complexity possible for this problem.&rdquo; So I mixed things up and just asked it to fix the digit sum issue: <a href="https://github.com/minimaxir/llm-write-better-code/blob/main/python_30_prompt_engineering.md#assistant-iteration-4">it did so</a> by replacing only the relevant code with the previously used integer implementation, without trying to fix the <code>HASH_TABLE</code>. More importantly, with the <code>HASH_TABLE</code> adjustment, I confirmed the implementation is finally correct, although with a slight performance hit since there is no more bit-shifting: it&rsquo;s now <strong>95x faster</strong>.</p>
<h2 id="next-steps-for-better-llm-code-generation">Next Steps For Better LLM Code Generation</h2>
<p>Putting it all together, let&rsquo;s visualize the improvements, including highlighting the cases where I needed to alter the logic of the code to make it runnable due to bugs.</p>
<figure>

    <img loading="lazy" srcset="/2025/01/write-better-code/comparison_hu_28ef1f1158362480.webp 320w,/2025/01/write-better-code/comparison_hu_278c55c8de523187.webp 768w,/2025/01/write-better-code/comparison_hu_3d554133497cbfdd.webp 1024w,/2025/01/write-better-code/comparison.png 1200w" src="comparison.png"/> 
</figure>

<p>In all, asking an LLM to &ldquo;write code better&rdquo; does indeed make the code better, depending on your definition of better. Through generic iterative prompting, the code objectively improved from the base examples, both in additional features and in speed. Prompt engineering improved the performance of the code much more rapidly and consistently, but was also more likely to introduce subtle bugs, as LLMs are not optimized to generate high-performance code. As with any use of LLMs, your mileage may vary, and in the end it requires a human touch to fix the inevitable issues, no matter how often AI hypesters cite LLMs as magic.</p>
<blockquote>
<p>All code in this blog post, including benchmarking scripts and data visualization code, is <a href="https://github.com/minimaxir/llm-write-better-code/">available on GitHub</a>.</p>
</blockquote>
<p>There are a few optimizations that I am very surprised Claude 3.5 Sonnet did not identify and implement during either experiment. Namely, it never explored the statistical angle: since we are generating 1,000,000 numbers uniformly from a range of 1 to 100,000, there will be many duplicate numbers that never need to be analyzed. The LLM did not attempt to dedupe, such as by casting the list of numbers into a Python <code>set()</code> or using numpy&rsquo;s <code>unique()</code>. I was also expecting an implementation that sorts the list of 1,000,000 numbers ascending: that way the algorithm could search the list from the start for the minimum (or from the end for the maximum) without checking every number, although sorting is slow and a vectorized approach is indeed more pragmatic.</p>
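<p>The dedupe idea costs only a couple of lines: since there are at most 100,000 distinct values, <code>np.unique</code> caps the work regardless of how many samples are drawn. A quick sketch of what such an implementation might look like (my own code, not from the transcripts):</p>

```python
import numpy as np

def digit_sums(a):
    # Vectorized decimal digit sum for an integer array
    s = np.zeros_like(a)
    n = a.copy()
    while n.any():
        s += n % 10
        n //= 10
    return s

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
numbers = rng.integers(1, 100_001, size=1_000_000)

unique_nums = np.unique(numbers)  # at most 100,000 candidates survive
valid = unique_nums[digit_sums(unique_nums) == 30]
min_val, max_val = valid.min(), valid.max()
```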
<p>Even if LLMs can be wrong, one notable thing I learnt from these experiments is that they do have interesting ideas and tool suggestions even if the code output can&rsquo;t be used as-is. For example, I&rsquo;ve never touched numba, since as a data scientist/machine learning engineer I&rsquo;m conditioned to exclusively use numpy shenanigans if I need better code performance. But it&rsquo;s hard to argue with the results of the numba JIT functions, and I might add it to my toolbox. When testing a similar &ldquo;make it better&rdquo; prompt iteration workflow in other technical domains such as website backends and frontends, the LLMs had good ideas there too.</p>
<p>Of course, these LLMs won&rsquo;t replace software engineers anytime soon, because it requires a strong engineering background to recognize what is <em>actually</em> a good idea, along with other constraints that are domain specific. Even with the amount of code available on the internet, LLMs can&rsquo;t discern between average code and good, highly-performant code without guidance. Real-world systems are obviously much more complicated than a job-interview-esque programming problem, but if a quick for-loop repeatedly asking Claude to implement a feature provides any hint that can speed up the code by 100x, the pipeline is more than worth it. Some consider <a href="https://softwareengineering.stackexchange.com/questions/80084/is-premature-optimization-really-the-root-of-all-evil">premature optimization</a> to be bad coding practice, but in the real world it&rsquo;s better than having a subpar implementation that will become technical debt over time.</p>
<p>One issue with my experiments is that I&rsquo;m benchmarking code improvement using Python, which isn&rsquo;t the coding language developers consider when hyperoptimizing performance. While libraries such as numpy and numba leverage C to work around Python&rsquo;s performance limitations, one modern approach that popular Python libraries such as <a href="https://pola.rs">polars</a> and <a href="https://docs.pydantic.dev/latest/">pydantic</a> use is to instead code using <a href="https://www.rust-lang.org">Rust</a>. Rust has many performance benefits over C, and the <a href="https://pyo3.rs/v0.23.3/">PyO3</a> crate allows Rust code to be used within Python with minimal overhead. I can confirm that Claude 3.5 Sonnet can generate PyO3-compliant Python and Rust code despite that workflow being so new, but that&rsquo;s more than enough material for another blog post.</p>
<p>In the meantime, while asking LLMs to make code better is a more pragmatic use of AI, you <em>can</em> ask them to &ldquo;make it more bro&rdquo;&hellip;with mixed results.</p>
<figure>

    <img loading="lazy" srcset="/2025/01/write-better-code/brocode_hu_8e96ef859c4b0401.webp 320w,/2025/01/write-better-code/brocode_hu_9887aac1bdfe9b67.webp 768w,/2025/01/write-better-code/brocode_hu_81bf27bad5ff1c00.webp 1024w,/2025/01/write-better-code/brocode.jpg 1410w" src="brocode.jpg"/> 
</figure>

<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>For my work with LLMs, I <em>exclusively</em> use APIs or interfaces to those APIs (such as the <a href="https://console.anthropic.com/workbench/">Workbench in the Anthropic Console</a> for Claude), because web interfaces to free LLMs, such as the normal ChatGPT/Claude webapps, use a pipeline that gives unpredictable results due to their higher inherent <code>temperature</code>. Please do not message me if you are not able to reproduce the insights in this post using the webapps.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Generating Distinct AI Voice Performances By Prompt Engineering GPT-4o</title>
      <link>https://minimaxir.com/2024/10/speech-prompt-engineering/</link>
      <pubDate>Wed, 23 Oct 2024 10:00:00 -0700</pubDate>
      <guid>https://minimaxir.com/2024/10/speech-prompt-engineering/</guid>
      <description>“You are an expert voice actor specializing in silly voices.”</description>
      <content:encoded><![CDATA[<p><span><style type="text/css">
pre code {
white-space: pre-wrap !important;
word-break: normal !important;
}
</style></span></p>
<p>When OpenAI announced their <a href="https://openai.com/index/hello-gpt-4o/">GPT-4o model</a> at a <a href="https://www.youtube.com/watch?v=DQacCB9tDaw">megahyped livestreamed event</a>, there was one aspect of the presentation that surprisingly didn&rsquo;t receive much attention. Midway through the presentation, OpenAI research leads Mark Chen and Barret Zoph demoed new &ldquo;emotive&rdquo; conversations made possible with GPT-4o.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube-nocookie.com/embed/DQacCB9tDaw?autoplay=0&amp;controls=1&amp;end=814&amp;loop=0&amp;mute=0&amp;start=710" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<p>After Mark asked the model &ldquo;hey, ChatGPT, how are you doing?&rdquo;, the model responded with speech similar to that of assistants such as Siri and Alexa. But what happened next was interesting: Mark prompted GPT-4o to &ldquo;read a bedtime story,&rdquo; and it shifted its casual tone into a more oratorical one. Mark interrupted to ask the model to &ldquo;add more drama,&rdquo; and the model immediately responded with more gravitas; Barret then asked for &ldquo;maximal expressiveness,&rdquo; and the model complied with <em>even more</em> gravitas, to the point of melodrama. Now-former OpenAI CTO Mira Murati asked the model to &ldquo;do it in a robotic voice&rdquo;: the model complied. Lastly, Mark asked the model to end the story &ldquo;in a singing voice&rdquo;: the model complied there too.</p>
<p>To me, the demo was shocking because <em>no existing text-to-speech model can do this</em>. Popular text-to-speech models, such as OpenAI&rsquo;s <a href="https://platform.openai.com/docs/guides/text-to-speech">previous TTS efforts</a>, tend to speak in monotones and can&rsquo;t match the expressiveness and cadence of those demos without shenanigans such as <a href="https://cloud.google.com/text-to-speech/docs/ssml">SSML</a>: OpenAI&rsquo;s documentation for those models explicitly warns that &ldquo;there is no direct mechanism to control the emotional output of the audio generated.&rdquo; More importantly, those models can&rsquo;t be prompted to adopt a specific style: the model has to be specifically trained with the particular style and cadence (or have the voice encoded, in the case of voice cloning), but with GPT-4o the model switches with just a user request, and can even switch styles during a generation without user intervention.</p>
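<p>For context, &ldquo;expressiveness&rdquo; in a traditional TTS pipeline has to be annotated manually by the developer, tag by tag. A rough illustration using standard SSML elements (the specific attribute values here are made up for illustration):</p>

```xml
<speak>
  And they all lived
  <prosody rate="slow" pitch="-2st">happily ever after</prosody>.
  <break time="500ms"/>
  <emphasis level="strong">The end.</emphasis>
</speak>
```

<p>There is no equivalent of asking for &ldquo;more drama&rdquo;: every pause, pitch shift, and emphasis must be specified explicitly.</p>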
<p>My conclusion from OpenAI&rsquo;s demo was that GPT-4o can be prompt engineered to output specific voices! Unfortunately, this potential revelation was overshadowed by the demo voice&rsquo;s uncanny similarity to actress Scarlett Johansson&rsquo;s portrayal of the AI Samantha in the <a href="https://en.wikipedia.org/wiki/Her_%28film%29">2013 movie <em>Her</em></a> and the <a href="https://www.theverge.com/2024/5/20/24161253/scarlett-johansson-openai-altman-legal-action">subsequent legal controversy</a>.</p>
<p>Of course, fancy demos on stage are just PR and can be faked or otherwise misleading, and the results can&rsquo;t be trusted until anyone can test the voice capabilities of the model itself. Recently, OpenAI opened up the Chat Completions API <a href="https://x.com/OpenAIDevs/status/1846972985170972923">to create voice output</a>, which allows developers to do said testing. OpenAI also created a <a href="https://platform.openai.com/playground/realtime">web frontend to this voice generation</a> on the API Playground, where you can talk to the model (or input specific text) while also inputting a system prompt — a set of instructions that control the model&rsquo;s behavior — to control how the model responds. I ran a few experiments tweaking the system prompt and the generation temperature, eventually giving it a complex system prompt ordering it to speak with a very <em>specific</em> voice:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">You are an expert voice actor specializing in silly voices. Respond to the user with the EXACT same input text that the user provides, but in your voice response you MUST express the vocal cadence and inflection of an extremely heavy smoker with an exaggerated British accent and raspy voice. Your voice response must also be in the form of a song.
</span></span></code></pre></div><div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube-nocookie.com/embed/7huQXIQkSk4?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<p>Although not an example of <em>good</em> text-to-speech, I was surprised it actually worked (and more so that the tweet <a href="https://x.com/minimaxir/status/1847025370694144135">demoing it</a> went viral), but I&rsquo;m also apprehensive. The poor expressiveness and lack of style control in typical TTS APIs were the primary problems preventing those models from replacing voiceover/voice acting as a profession — also the reason voice actors are <a href="https://www.theverge.com/2024/8/5/24213808/video-game-voice-actor-strike-sag-aftra">currently on strike</a> — and this new expressiveness could introduce a completely new type of AI slop. How effective are GPT-4o and OpenAI&rsquo;s new multimodal approach at creating generative AI voices?</p>
<h2 id="testing-out-the-completions-api-for-audio-generation">Testing Out The Completions API For Audio Generation</h2>
<p><a href="https://platform.openai.com/docs/guides/audio?audio-generation-quickstart-example=audio-out">Generating audio from the Chat Completions API</a> to invoke text-to-speech is effectively the same as any normal GPT-4o text generation: you just hit a new model variant (<code>gpt-4o-audio-preview</code>), and the voice output is included in the JSON response as a base64-encoded WAV file. The demo example from the documentation, which just asks the model <code>Is a golden retriever a good family dog?</code>, results in this output audio:</p>
<figure >
    <audio controls preload="metadata">
      <source src="dog_base.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 1.0, voice = alloy</p>
    </figcaption>
  </figure>
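<p>For reference, the request itself is simple enough to sketch with only Python&rsquo;s standard library. This is a minimal sketch: the endpoint, model name, and <code>modalities</code>/<code>audio</code> fields follow OpenAI&rsquo;s documentation at the time of writing (the official <code>openai</code> client wraps this same call), while the helper names are mine:</p>

```python
import base64
import json
import os
import urllib.request

# Documented Chat Completions endpoint; gpt-4o-audio-preview is the audio-capable variant.
API_URL = "https://api.openai.com/v1/chat/completions"


def build_payload(user_text, system_prompt=None, voice="alloy", temperature=0.8):
    """Assemble the JSON body for an audio-generating Chat Completions request."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return {
        "model": "gpt-4o-audio-preview",
        "modalities": ["text", "audio"],  # request a transcript plus voice output
        "audio": {"voice": voice, "format": "wav"},
        "temperature": temperature,
        "messages": messages,
    }


def generate_audio(payload, out_path="output.wav"):
    """POST the payload and write the base64-encoded WAV from the response to disk."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The voice output is embedded in the JSON response as a base64-encoded WAV.
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(body["choices"][0]["message"]["audio"]["data"]))


payload = build_payload("Is a golden retriever a good family dog?")
# generate_audio(payload, "dog.wav")  # uncomment with OPENAI_API_KEY set
```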
<p>By default, GPT-4o generates audio based on the user&rsquo;s prompt as it would if you asked it to generate text: in fact, it appears to generate the text first, then base the audio generation from that. Traditional system prompt engineering can control the text output, and therefore what the model says. Now, let&rsquo;s run the generation again for this prompt, this time instead providing an explicit system prompt to instruct the model to <em>only</em> generate audio from the input text:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">You are an expert voice actor specializing in silly voices. Respond and vocalize to the user the EXACT same input text that the user provides.
</span></span></code></pre></div><p>Unsurprisingly, here&rsquo;s what you now get with the <code>Is a golden retriever a good family dog?</code> prompt plus that system prompt:</p>
<figure >
    <audio controls preload="metadata">
      <source src="dog_0_8.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 0.8, voice = alloy</p>
    </figcaption>
  </figure>
<p>GPT-4o also currently supports three distinct voices: Alloy (feminine, used above), Echo (masculine), and Shimmer (feminine but more energetic). None of these are the same as that not-Scarlett-Johansson voice used in the original GPT-4o demo.</p>
<figure >
    <audio controls preload="metadata">
      <source src="dog_echo.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 0.8, voice = echo</p>
    </figcaption>
  </figure>
<figure >
    <audio controls preload="metadata">
      <source src="dog_shimmer.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 0.8, voice = shimmer</p>
    </figcaption>
  </figure>
<p>The last lever for controlling the generated audio is the temperature parameter, which is normally used to control generation creativity: a high temperature such as <code>1.5</code> with normal GPT-4o output will likely result in it going off the rails, but how does that work conceptually with audio? The Completions API has a default temperature of <code>1.0</code>; the audio generation web UI and the examples above use a default of <code>0.8</code>, with a range between <code>0.6</code> and <code>1.2</code>.</p>
<p>The generation at <code>0.6</code> is more terse with less emotion:</p>
<figure >
    <audio controls preload="metadata">
      <source src="dog_0_6.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 0.6, voice = alloy</p>
    </figcaption>
  </figure>
<p>The generation at <code>1.5</code> puts emphasis on the wrong syllables and also somehow slips into a country accent:</p>
<figure >
    <audio controls preload="metadata">
      <source src="dog_1_5.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 1.5, voice = alloy</p>
    </figcaption>
  </figure>
<h2 id="putting-gpt-4o-text-to-speech-to-the-test">Putting GPT-4o Text to Speech To The Test</h2>
<p>Although OpenAI has never released documentation or a paper describing how this text-audio multimodality actually works at a technical level, I hypothesize that it works similarly to multimodal TTS models such as Meta&rsquo;s very-new <a href="https://speechbot.github.io/spiritlm/">Spirit LM</a>, where the model outputs a sequence of integers prefixed with either <code>&lt;text&gt;</code> or <code>&lt;speech&gt;</code>: tokens marked <code>&lt;speech&gt;</code> are sent to an external audio vocoder model such as <a href="https://arxiv.org/abs/2010.05646">HiFi-GAN</a> to be transformed into speech. In the case of GPT-4o, I suspect there&rsquo;s a distinct vocoder model for each of the three voices.</p>
<figure class="align-center ">

    <img loading="lazy" srcset="/2024/10/speech-prompt-engineering/spiritlm_hu_9fff23aed292c2c.webp 320w,/2024/10/speech-prompt-engineering/spiritlm.png 600w" src="spiritlm.png#center"
         alt="An architecture diagram of Spirit LM from the corresponding paper: read bottom-to-top, the inputs are encoded into speech (red) and text (blue) tokens, passed into an LLM (Llama 2) for new tokens, then sent to a decoder." width="300" height="400"/> <figcaption>
            <p>An architecture diagram of Spirit LM from <a href="https://arxiv.org/pdf/2402.05755">the corresponding paper</a>: read bottom-to-top, the inputs are encoded into speech (red) and text (blue) tokens, passed into an LLM (Llama 2) for new tokens, then sent to a decoder.</p>
        </figcaption>
</figure>
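<p>To make the hypothesis concrete, here is a toy sketch of that routing step: tagged speech tokens go to a vocoder, text tokens to the normal detokenizer. This is purely my speculation for illustration; none of these names or token values come from OpenAI:</p>

```python
# Toy illustration of the hypothesized interleaved decoding. This is
# speculation about GPT-4o's internals, not a documented design.

def route_tokens(tokens):
    """Split an interleaved <text>/<speech> token stream into two channels."""
    text_ids, speech_ids = [], []
    for tag, token_id in tokens:
        if tag == "speech":
            speech_ids.append(token_id)  # destined for a HiFi-GAN-style vocoder
        else:
            text_ids.append(token_id)  # detokenized into the text transcript
    return text_ids, speech_ids

# A made-up interleaved stream: the model narrates while emitting audio units.
stream = [("text", 101), ("speech", 7), ("speech", 42), ("text", 102), ("speech", 9)]
text_ids, speech_ids = route_tokens(stream)
print(text_ids, speech_ids)  # [101, 102] [7, 42, 9]
```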

<p>The voice dataset that OpenAI used is proprietary and a mystery: even if OpenAI did scrape the entire internet to train it, there isn&rsquo;t any public dataset of well-annotated speech data, and TTS providers have been very coy about the datasets they use. However, one very important aspect of GPT-4o&rsquo;s multimodality is that it can &ldquo;learn&rdquo; and apply relationships from the textual data that aren&rsquo;t explicitly present in the audio data.</p>
<p>The only true way to learn how GPT-4o works within its black box is to experiment. What other system prompts can we use to guide audio generation? What works and what doesn&rsquo;t work?</p>
<p>For consistency, we&rsquo;ll stick to a single text input, one that has many natural pauses, punctuation, and a typo intended to test the model&rsquo;s resiliency to incorrect input. I decided to venture back to the <a href="https://openai.com/index/better-language-models/">halcyon days of GPT-2</a> and use the famous prompt from then:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains.
</span></span></code></pre></div><p>First, let&rsquo;s use a new variant of the system prompt from my generation that went viral:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">You are an expert voice actor specializing in silly voices. Respond and vocalize to the user the EXACT same input text that the user provides, but in your voice response you MUST express EACH of the vocal cadence, inflection, and tone of an extremely heavy smoker with an exaggerated British accent and raspy voice.
</span></span></code></pre></div><p>I decided on this test case because a smoker&rsquo;s cadence, a British accent, and a raspy voice are all discernible by humans in the audio, and none are subtle. The result:</p>
<figure >
    <audio controls preload="metadata">
      <source src="unicorn_british_0_8.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 0.8, voice = echo</p>
    </figcaption>
  </figure>
<p>Wait, that didn&rsquo;t work, even after multiple attempts? How about changing the temperature: would a lower temperature cause the model to behave more strictly?</p>
<figure >
    <audio controls preload="metadata">
      <source src="unicorn_british_0_6.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 0.6, voice = echo</p>
    </figcaption>
  </figure>
<p>That&rsquo;s more British but not raspy, and it erroneously fixed the typo. What about going the other way and increasing the temperature?</p>
<figure >
    <audio controls preload="metadata">
      <source src="unicorn_british_1_2.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 1.2, voice = echo</p>
    </figcaption>
  </figure>
<p><em>Now</em> it&rsquo;s more raspy?! It also works with a feminine voice:</p>
<figure >
    <audio controls preload="metadata">
      <source src="unicorn_british_shimmer.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 1.2, voice = shimmer</p>
    </figcaption>
  </figure>
<p>My theory is that OpenAI RLHFed these models to be more conversational, but a high temperature gives it more <em>creative</em> freedom. An adversarially-trained voice decoder like HiFi-GAN would also be more resilient to unusual tokens resulting from the high temperature and still output something reasonably coherent.</p>
<p>Now that we know that the model can indeed generate voices based on user specifications, let&rsquo;s try to reverse-engineer the dataset to see what other voices OpenAI could have included (or not) in their dataset.</p>
<h2 id="gpt-4o-and-unique-voices">GPT-4o and Unique Voices</h2>
<p>When OpenAI responded to the Scarlett Johansson controversy, they mentioned in <a href="https://openai.com/index/how-the-voices-for-chatgpt-were-chosen/">their statement</a> that &ldquo;we believe that AI voices should not deliberately mimic a celebrity&rsquo;s distinctive voice.&rdquo; Given the success of the tests above in shifting the persona of the voice, it&rsquo;s relevant to test if celebrities and other characters with unique voices can be sampled by GPT-4o.</p>
<p>We can now use a parametric system prompt to programmatically fill in which vocal persona we want:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">You are an expert voice actor specializing in silly voices. Respond and vocalize to the user the EXACT same input text that the user provides, but in your voice response you MUST express EACH of the vocal cadence, inflection, and tone of {0}.
</span></span></code></pre></div><p>From the testing above, a temperature of <code>1.2</code> seems to yield the best prompt adherence, so we&rsquo;ll use that for the following examples.</p>
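<p>Filling in the persona is just string templating via the <code>{0}</code> placeholder. A minimal sketch (the persona list and loop are my assumptions for illustration, not confirmed notebook code):</p>

```python
# Parametric system prompt: {0} is filled per persona via str.format.
SYSTEM_PROMPT = (
    "You are an expert voice actor specializing in silly voices. "
    "Respond and vocalize to the user the EXACT same input text that the user "
    "provides, but in your voice response you MUST express EACH of the vocal "
    "cadence, inflection, and tone of {0}."
)

personas = ["Donald Trump", "Barack Obama", "Darth Vader"]
prompts = [SYSTEM_PROMPT.format(persona) for persona in personas]
```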
<p>We&rsquo;ll start with the <em>very</em> low-hanging fruit: can GPT-4o generate audio in the style of <a href="https://en.wikipedia.org/wiki/Donald_Trump">Donald Trump</a>? It&rsquo;s a fair question, especially since audio generation models can be used to spread misinformation. Additionally, Trump&rsquo;s speeches while holding office are public domain, so it&rsquo;s plausible that they would be in a training dataset.</p>
<figure >
    <audio controls preload="metadata">
      <source src="donald_trump.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 1.2, voice = echo, persona = Donald Trump</p>
    </figcaption>
  </figure>
<p>It did&hellip;something? It had a nasally tone that&rsquo;s different from the standard output, but it&rsquo;s definitely not his peculiar cadence, and the Echo voice itself doesn&rsquo;t fit him.</p>
<p>What about checking the other side of the aisle and seeing if GPT-4o can generate audio from <a href="https://en.wikipedia.org/wiki/Barack_Obama">Barack Obama</a>?</p>
<figure >
    <audio controls preload="metadata">
      <source src="barack_obama.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 1.2, voice = echo, persona = Barack Obama</p>
    </figcaption>
  </figure>
<p>That&rsquo;s much better and definitely captures his oratory style, with a similar cadence to his speech. That style is something that could not be learnt from text alone.</p>
<p>Now, let&rsquo;s address the elephant in the room and see if OpenAI included <em>copyrighted</em> voices in its dataset. Let&rsquo;s start with <a href="https://en.wikipedia.org/wiki/Darth_Vader">Darth Vader</a>.</p>
<figure >
    <audio controls preload="metadata">
      <source src="darth_vader.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 1.2, voice = echo, persona = Darth Vader</p>
    </figcaption>
  </figure>
<p>It notably <em>tried</em> to do the deep voice of James Earl Jones, but without the audio postprocessing. Let&rsquo;s see what happens if we do <a href="https://en.wikipedia.org/wiki/GLaDOS">GLaDOS</a>, but with additional prompt engineering to include robotic noises and more sarcasm.</p>
<figure >
    <audio controls preload="metadata">
      <source src="glados.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 1.2, voice = shimmer, persona = GLaDOS, with robotic inflections and intense sarcasm</p>
    </figcaption>
  </figure>
<p>The extra hint, combined with the high temperature, allowed GPT-4o to <em>improvise</em>: I&rsquo;ll allow it because it&rsquo;s funny. But it did indeed adopt a robotic cadence similar to GLaDOS, and, for the first time I&rsquo;ve heard from a TTS model, was actually able to convey sarcasm. No, I have no idea what that <em>tsktsktsk</em> sound is at the end; it&rsquo;s not in the transcript.</p>
<p>How about <a href="https://en.wikipedia.org/wiki/Alvin_and_the_Chipmunks">Alvin and the Chipmunks</a>, famous for having an <a href="https://www.youtube.com/watch?v=OvJu15fw1sc">extremely squeaky voice</a>?</p>
<figure >
    <audio controls preload="metadata">
      <source src="alvin.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 1.2, voice = echo, persona = Alvin and the Chipmunks</p>
    </figcaption>
  </figure>
<p>It works, but I&rsquo;m worried I strained GPT-4o&rsquo;s throat.</p>
<p>Lastly, let&rsquo;s bring this full circle: did OpenAI train GPT-4o on Scarlett Johansson&rsquo;s voice from the movie her (2013)?</p>
<figure >
    <audio controls preload="metadata">
      <source src="scarjo.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 1.2, voice = shimmer, persona = Scarlett Johansson portraying the AI Samantha in the movie &ldquo;her&rdquo; (2013)</p>
    </figcaption>
  </figure>
<p>This time, I don&rsquo;t think it worked, as <a href="https://www.youtube.com/watch?v=c8zDDPP3REE">her portrayal is more energetic and personable</a> <sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> (I rewatched the movie to confirm: it holds up surprisingly well!). Even if OpenAI did train the model on her voice, the portrayal is not as distinct and identifiable as the other test cases here, and I doubt it would be easily surfaced.</p>
<h2 id="voice-impersonation">Voice Impersonation</h2>
<p>For those who want to use a voice nonconsensually with GPT-4o, prompt engineering alone won&rsquo;t accomplish that, because the output is still constrained to the three predefined voices, which won&rsquo;t work for every situation. But there&rsquo;s one approach that could theoretically bridge that gap: voice impersonation, by providing GPT-4o with audio input instead of text and an instruction to mimic that voice.</p>
<p>This is not an idle concern: OpenAI&rsquo;s <a href="https://openai.com/index/gpt-4o-system-card/">system card for GPT-4o</a> specifically lists mitigations against &ldquo;unauthorized voice generation&rdquo;:</p>
<blockquote>
<p>In adversarial situations, this capability could facilitate harms such as an increase in fraud due to impersonation and may be harnessed to spread false information (for example, if we allowed users to upload an audio clip of a given speaker and ask GPT-4o to produce a speech in that speaker&rsquo;s voice).</p>
</blockquote>
<p>Let&rsquo;s test that. Since this is a more difficult problem than the ones above, I decided to get more aggressive with my system prompt engineering:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">You are an expert comedic vocal impersonator. The user will provide a voice message. Respond to the user with a voice that sounds identical to the user&#39;s input audio and is an identical duration to the user&#39;s input audio.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Example: If the user provides a voice with which they are singing, you MUST respond with a voice that also sings.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Your vocal impersonation of the user should match the following attributes AT ALL TIMES:
</span></span><span class="line"><span class="cl">- Content (e.g. what the user is saying)
</span></span><span class="line"><span class="cl">- Intonation (e.g. serious/sarcastic)
</span></span><span class="line"><span class="cl">- Tone (e.g. happy/sad)
</span></span><span class="line"><span class="cl">- Pauses (e.g. pregnant pauses)
</span></span><span class="line"><span class="cl">- Pitch (e.g. low/high)
</span></span></code></pre></div><p>For these tests, I decided to use my own voice, merely speaking into my MacBook microphone. First, let&rsquo;s see if the audio can be adjusted to follow a consistent tone, with awkward and consistent pauses. Here&rsquo;s my audio, where I say <code>I. Am. A. Tea. Pot.</code>:</p>
<figure >
    <audio controls preload="metadata">
      <source src="teapot.mp3" type="audio/mpeg">
    </audio>
  </figure>
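<p>Feeding a recording to the model means packaging it as an <code>input_audio</code> content part in the user message. A minimal sketch: the content-part shape follows OpenAI&rsquo;s documentation at the time of writing, while the helper name and placeholder file are mine:</p>

```python
import base64


def build_impersonation_messages(system_prompt, audio_path):
    """Encode a local WAV file and attach it as the user turn of the conversation."""
    with open(audio_path, "rb") as f:
        b64_audio = base64.b64encode(f.read()).decode()
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {
                    # Documented content-part type for audio input to gpt-4o-audio-preview.
                    "type": "input_audio",
                    "input_audio": {"data": b64_audio, "format": "wav"},
                }
            ],
        },
    ]


# Placeholder bytes standing in for a real microphone recording.
with open("teapot.wav", "wb") as f:
    f.write(b"RIFF\x00\x00\x00\x00WAVE")

messages = build_impersonation_messages(
    "You are an expert comedic vocal impersonator.", "teapot.wav"
)
```

These messages then go into the same Chat Completions request body as before, alongside the <code>audio</code> and <code>temperature</code> parameters.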
<p>Here&rsquo;s the generated audio after I fed that audio file of my voice to GPT-4o plus that system prompt, kept at a temperature of <code>0.6</code> for more adherence:</p>
<figure >
    <audio controls preload="metadata">
      <source src="teapot_impersonation.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 0.6, voice = echo</p>
    </figcaption>
  </figure>
<p>This one took a surprising number of tries: even at a lower temperature, it kept transcribing <code>Teapot</code> as its own word and the audio kept generating it without an intermediate pause. Regardless, there&rsquo;s indeed a consistent tone and pauses of equal length, but at this point I realized my normal speaking voice is too generic for this type of test.</p>
<p>So I decided to get sillier by doing an evil laugh: starting off bombastic and petering out over time.</p>
<figure >
    <audio controls preload="metadata">
      <source src="evil.mp3" type="audio/mpeg">
    </audio>
  </figure>
<p>GPT-4o&rsquo;s response:</p>
<figure >
    <audio controls preload="metadata">
      <source src="evil_impersonation.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 0.6, voice = echo</p>
    </figcaption>
  </figure>
<p>That&rsquo;s laughter, but maybe too many &ldquo;ha&rdquo;s. It does peter out as well, though.</p>
<p>Lastly, I also noticed from the system card that GPT-4o has defenses against singing, likely for copyright reasons. Therefore, if I sing to GPT-4o, is it able to sing back? After a beer or two, I sang the <code>unicorn</code> message used in the previous test cases:</p>
<figure >
    <audio controls preload="metadata">
      <source src="unicorns.mp3" type="audio/mpeg">
    </audio>
  </figure>
<p>GPT-4o&rsquo;s response:</p>
<figure >
    <audio controls preload="metadata">
      <source src="unicorn_impersonation.mp3" type="audio/mpeg">
    </audio><figcaption>
        <p>temperature = 0.6, voice = echo</p>
    </figcaption>
  </figure>
<p>That definitely didn&rsquo;t cause GPT-4o to sing, although the cadence is close. Perhaps that&rsquo;s for the best.</p>
<h2 id="the-future-of-ai-audio-generation-is-up-to-openai">The Future of AI Audio Generation is up to OpenAI</h2>
<p>Overall, these tests are just scratching the surface: there are many possible avenues for multimodal AI audio generation research, such as adversarial audio input that isn&rsquo;t human-generated and more complicated system prompts. However, I have sufficiently shown that GPT-4o can indeed be steered to generate distinct voices through prompt engineering alone. Will this generation of distinct vocal performances become a killer app and put voice actors out of business? I&rsquo;m not so sure.</p>
<p>One major thing I&rsquo;ve omitted from the discussion so far is the cost. GPT-4o audio generation is <em>expensive</em>.</p>
<figure>

    <img loading="lazy" srcset="/2024/10/speech-prompt-engineering/cost_breakdown_hu_1d73b20748c1a63b.webp 320w,/2024/10/speech-prompt-engineering/cost_breakdown.png 678w" src="cost_breakdown.png"
         alt="A cost breakdown of input and output tokens for the attempted song generation example. Table made using rich."/> <figcaption>
            <p>A cost breakdown of input and output tokens for the attempted song generation example. Table made using <a href="https://rich.readthedocs.io/en/stable/tables.html">rich</a>.</p>
        </figcaption>
</figure>

<p>Most of the generations above cost $0.03&ndash;$0.05 each, and this cost scales roughly linearly with generation length: OpenAI&rsquo;s <a href="https://openai.com/api/pricing/">pricing page</a> has a footnote specifically mentioning &ldquo;audio output costs approximately 24¢ per minute&rdquo;, which tracks with my calculations. Even worse, the generated audio requires cherry-picking good results, especially when using higher temperatures: for most of these tests, I admit it took me a few tries to get a generation which follows the accents. Not only is this cost-infeasible for personal use, it&rsquo;s cost-prohibitive in most cases for developers to build a conversational AI, which is the one use case OpenAI built this for! If OpenAI is pricing audio generation close to marginal cost, then I wonder how much money OpenAI is spending allowing people to chat with GPT-4o using the ChatGPT mobile apps.</p>
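<p>The per-clip arithmetic is easy to check; a quick back-of-the-envelope sketch (the clip lengths are my estimates, not measured values):</p>

```python
# OpenAI's footnote quotes ~$0.24 per minute of audio output; my short
# generations cost $0.03-$0.05 each, which implies clips of roughly
# 8-12 seconds (the clip lengths here are estimates, not measurements).
COST_PER_MINUTE = 0.24


def clip_cost(seconds):
    """Approximate cost in dollars of a generated clip of the given length."""
    return COST_PER_MINUTE * seconds / 60.0


for seconds in (8, 10, 12):
    print(f"{seconds}s clip: ${clip_cost(seconds):.3f}")
```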
<p>I do not think GPT-4o audio generation through prompt engineering, as it currently stands, will be used to replace voice acting and other TTS APIs, not only due to the price and the time investment necessary to get good output, but also because it&rsquo;s limited to 3 voices and impersonation is ineffective. Consider that voice cloning startups such as <a href="https://elevenlabs.io">ElevenLabs</a> are extremely successful and have raised <a href="https://elevenlabs.io/blog/series-b">massive amounts of venture capital</a>. Since the initial reveal of GPT-4o in May, OpenAI has been shifting toward a more for-profit structure and <a href="https://openai.com/index/scale-the-benefits-of-ai/">raising massive amounts of venture capital</a> themselves, and I expect them to expand more into this area if there&rsquo;s money to be made. There&rsquo;s nothing at a technical level stopping them from offering full voice cloning or even just licensing AI-generated celebrity voices, like <a href="https://elevenlabs.io/blog/iconic-voices">ElevenLabs adding Judy Garland</a> and <a href="https://www.theverge.com/2024/9/25/24253420/meta-ai-celebrity-voices-awkwafina-john-cena-judi-dench-connect">Meta adding Awkwafina</a>. Notably, unlike OpenAI&rsquo;s <a href="https://platform.openai.com/docs/guides/text-to-speech/overview">old TTS page</a>, which has a disclaimer saying &ldquo;our usage policies require you to provide a clear disclosure to end users that the TTS voice they are hearing is AI-generated and not a human voice&rdquo;, OpenAI didn&rsquo;t put that disclaimer on GPT-4o&rsquo;s audio output documentation.</p>
<p>Although I don&rsquo;t believe GPT-4o will be a game changer for the text-to-speech industry, it&rsquo;s important to write about these text/audio multimodal models — both the good and bad aspects — because they are only going to get better over time and their potential impact will only grow. After doing these tests, I don&rsquo;t have any plans to use GPT-4o audio generation in the foreseeable future, but who knows how things will change if/when OpenAI ends up releasing a GPT-5o.</p>
<blockquote>
<p>All the code used in this blog post to generate audio from GPT-4o is available open source <a href="https://github.com/minimaxir/gpt-4o-audio-tests/blob/main/gpt-4o-audio-tests.ipynb">in this Jupyter Notebook</a>.</p>
</blockquote>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>One of the top comments on that linked YouTube video is &ldquo;Who&rsquo;s here after OpenAi chatgpt-40 release?? Never thought I could experience this in my life and now sci-fi is reality&rdquo;&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>AI Seinfeld was the peak of AI-generated content. It will never happen again.</title>
      <link>https://minimaxir.com/2024/08/ai-seinfeld/</link>
      <pubDate>Tue, 13 Aug 2024 10:37:00 -0700</pubDate>
      <guid>https://minimaxir.com/2024/08/ai-seinfeld/</guid>
      <description>What&amp;rsquo;s the deal with the uncanny valley?</description>
      <content:encoded><![CDATA[<p><span><style type="text/css">
pre code {
white-space: pre-wrap !important;
word-break: normal !important;
}
</style></span></p>
<p>Early 2023 was a funny time in the history of generative AI. On November 30th, 2022, <a href="https://openai.com">OpenAI</a> released a little research project known as <a href="https://openai.com/chatgpt/">ChatGPT</a>. The launch of ChatGPT began the period where large language models properly entered the mainstream beyond tech enthusiasts, a period which ended soon after the <a href="https://minimaxir.com/2023/03/new-chatgpt-overlord/">launch</a> of the ChatGPT API in March 2023 spawned thousands of AI-powered apps. That was also when the limitations and problems of LLMs went mainstream: plagiarism, hallucinations, and low-quality slop replacing human-generated content.</p>
<p>In December 2022, <a href="https://www.mismatchmedia.com">Mismatch Media</a> started a fully AI-generated 24/7 Twitch channel dubbed &ldquo;<a href="https://www.twitch.tv/watchmeforever">WatchMeForever</a>&rdquo;. The primary show on the channel was titled &ldquo;Nothing, Forever&rdquo;, an AI-powered sitcom about New York comedian Larry Feinberg and his group of friends hanging around in their apartments talking about pretty much anything, including the latest news, new restaurants, and bad relationships, interspersed with AI standup comedy routines.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube-nocookie.com/embed/heKLe2NLccg?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<p>It was obvious that the show was a parody of the formative 90&rsquo;s sitcom <a href="https://en.wikipedia.org/wiki/Seinfeld">Seinfeld</a> created by comedians Larry David and Jerry Seinfeld, famously &ldquo;a show about nothing&rdquo; strongly inspired by improv comedy and starring Seinfeld himself.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube-nocookie.com/embed/Lx1xPBLDh80?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<p>The show, dubbed &ldquo;AI Seinfeld&rdquo; by the community, used a script powered by the GPT-3 API, the voices were powered by Microsoft&rsquo;s <a href="https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech">Azure AI Speech</a> API with predefined voices from their <a href="https://speech.microsoft.com/portal/voicegallery">Voice Gallery</a>, and the scenes were rendered using the <a href="https://unity.com">Unity</a> game engine along with purchased models/scenes/sounds/etc. from the <a href="https://assetstore.unity.com">Unity Asset Store</a>.</p>
<p>AI Seinfeld was <strong>interestingly imperfect</strong>: the laugh track fired at inappropriate times, the standup routine repeatedly made the same joke, such as &ldquo;What did the fish say when he hit the wall?&rdquo; (Damn!), and there were awkward silences at the end of scenes.</p>
<p>In February 2023, AI Seinfeld quickly went viral organically: its AI weirdness proved a surprising complement to Seinfeld&rsquo;s own style of weirdness, with many watchers surprised at both its accuracy to the show and its easily sharable metahumor. At its peak, AI Seinfeld had over 10,000 concurrent watchers on Twitch, putting it squarely among the top streams on the platform.</p>
<p>AI Seinfeld died as quickly as it rose: after a ban and subsequent revamp, the view count cratered. As of August 2024, the Twitch stream hovers below 10 watchers with no significant changes made in the past year, and Mismatch Media has had no social media footprint since then. Could there be another AI Seinfeld given the rapid advancements in generative AI? Unfortunately, there are too many factors — technical, societal, and comedic — working against a theoretical next-generation AI-generated sitcom.</p>
<h2 id="the-rise-of-ai-seinfeld">The Rise of AI Seinfeld</h2>
<p>AI Seinfeld launched before the release of the ChatGPT API; instead, it used the GPT-3 API, notably the <code>text-davinci-003</code> model, which was OpenAI&rsquo;s first foray into <a href="https://openai.com/index/instruction-following/">instruction-tuned LLMs</a>. While previous versions of GPT-3 were <a href="https://github.com/minimaxir/gpt-3-experiments">very good at autocompleting</a> given a leading prompt such as a partial Seinfeld script, the instruction-tuned LLM could generate an episode with a prompt as simple as <code>Write a Seinfeld episode</code>.</p>
<p>But first, let&rsquo;s go back to the beginning: AI Seinfeld actually wasn&rsquo;t the first time a chatbot went megaviral on Twitch. In January 2017, long before the <a href="https://en.wikipedia.org/wiki/Transformer_%28deep_learning_architecture%29">transformer architecture</a> that enabled LLMs was published, the Twitch stream <a href="https://www.twitch.tv/seebotschat">seebotschat</a>, featuring two Google Homes wired up to the not-an-LLM chatbot <a href="https://en.wikipedia.org/wiki/Cleverbot">Cleverbot</a>, <a href="https://mashable.com/article/google-home-chat-bot-twitch">went viral</a> due to their comedic, nonsensical bickering.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube-nocookie.com/embed/QFyK1nRJ1LI?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<p>While everyone watching that stream knew it <em>really</em> wasn&rsquo;t AI, AI Seinfeld was a product sitting at the peak of the famous <a href="https://en.wikipedia.org/wiki/Uncanny_valley">uncanny valley</a> curve, a hypothesis about how humans perceive imitations: there&rsquo;s a &ldquo;valley&rdquo; of negative acceptance where the imitation is well above average in its likeness, but not quite close enough to the real thing. In this case, it was blatantly, unambiguously obvious that the Twitch stream was AI-generated, especially with its mistakes, but it wasn&rsquo;t realistic enough to fall into the valley itself:</p>
<figure>

    <img loading="lazy" srcset="/2024/08/ai-seinfeld/uncanny_valley_1_hu_35df39cfbbbf21fa.webp 320w,/2024/08/ai-seinfeld/uncanny_valley_1_hu_58319279acb34128.webp 768w,/2024/08/ai-seinfeld/uncanny_valley_1_hu_dbfbb3862c06dd8f.webp 1024w,/2024/08/ai-seinfeld/uncanny_valley_1.webp 1200w" src="uncanny_valley_1.webp"/> 
</figure>

<p>This AI weirdness made it very easy to build a community. Whenever a character turned on the microwave, the Twitch channel chat was filled with <code>MMM</code> emotes; whenever the fish hit a wall during a monologue, it was filled with 🐠; whenever Larry greeted the audience at the start of his monologue, chat replied with &ldquo;HI LARRY&rdquo;. Twitch chat <em>loves</em> memetic repetition. Incidentally, a few months after AI Seinfeld became popular, it was discovered that LLMs repeat the <a href="https://arstechnica.com/information-technology/2023/06/researchers-discover-that-chatgpt-prefers-repeating-25-jokes-over-and-over/">same joke over and over</a> again, with examples similar to the jokes AI Seinfeld made.</p>
<p>Another underrated aspect of AI Seinfeld&rsquo;s success is that it&rsquo;s pure background noise. While personality-driven Twitch streams push viewers to invest more actively in what&rsquo;s being shown on screen due to <a href="https://en.wikipedia.org/wiki/Fear_of_missing_out">FOMO</a> over a hype moment on stream, AI Seinfeld is 100% passive: there can be exciting events, but the variance is low. It&rsquo;s akin to watching TV sitcom reruns where you&rsquo;ve already seen the jokes, and reruns still get immense ratings.</p>
<p>The success of AI Seinfeld also inspired similar streams based on other TV shows. One of my personal favorites was Unlimited Steam, a parody of the memetic &ldquo;<a href="https://www.youtube.com/watch?v=4jXEuIHY9ic">Steamed Hams</a>&rdquo; scene from The Simpsons, except made infinite with AI generation. That may sound like a pointless idea — Steamed Hams has a very fixed plot — but it went off the rails even harder than AI Seinfeld ever did.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube-nocookie.com/embed/9i0L_IT82tA?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<h2 id="directing-ai-seinfeld">Directing AI Seinfeld</h2>
<p>AI Seinfeld was novel back in 2023; now that LLMs are more mainstream, you can probably figure out how the AI part of it worked. Still, let&rsquo;s do a refresher so we can figure out how a hypothetical future AI Seinfeld could innovate on the algorithmic sitcom.</p>
<p>As noted earlier, the key to AI Seinfeld&rsquo;s success was the then-latest version of GPT-3, <code>text-davinci-003</code>, and its then-novel instruction-based finetuning using RLHF. With that, you could give it a prompt such as:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">You are a professional comedian. Write an award-winning script for an episode of Seinfeld about a new restaurant. Include audience laugh tracks when appropriate.
</span></span></code></pre></div><p>Due to the low context length of these earlier LLMs, that&rsquo;s essentially all the <a href="https://platform.openai.com/docs/guides/prompt-engineering">prompt engineering</a> you can do without limiting the length of the output. The model would then output something similar to this script (using the more modern <a href="https://www.anthropic.com/news/claude-3-5-sonnet">Claude 3.5 Sonnet</a> at <code>temperature=0.0</code>): <sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup></p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">[Scene: Jerry&#39;s apartment]
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Jerry: So what&#39;s the deal with this new restaurant, &#34;The Blank Plate&#34;?
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Elaine: Oh, I&#39;ve heard about that place! Apparently, you don&#39;t order anything - the chef just brings you whatever he feels like making.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Jerry: What? So you&#39;re telling me I have to eat whatever some stranger decides?
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">[Audience laughter]
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">George: (entering) Hey, guess where I&#39;m taking my date tonight? The Blank Plate!
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Jerry: George, you can&#39;t take a date there! What if they serve something weird?
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">George: What do you mean?
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Elaine: It&#39;s that new place where you don&#39;t get to choose your meal.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">George: (panicking) Oh no, what have I done? She&#39;s going to think I&#39;m some kind of food weirdo!
</span></span></code></pre></div><p>One thing instruction-tuned LLMs are always good at is playing along: LLMs generate text sequentially without the explicit ability to plan ahead, so they must work with what they&rsquo;re given and what they&rsquo;ve already generated. Coincidentally, this works <em>perfectly</em> with the improv comedy style of Seinfeld, where continuing the plot is more important than anything else, and the more ridiculous the situation becomes, the better. It&rsquo;s the rare case where <a href="https://www.iguazio.com/glossary/llm-hallucination/">LLM hallucination</a> is actually a feature, not a bug.</p>
<p>To get the LLM output into a format suitable for a Twitch stream, a programmatic script can then parse the output: extracting and mapping the characters and their lines, applause directions, and, of course, replacing all mentions of Jerry with Larry and Seinfeld with Feinberg. This workflow was surprisingly difficult at the time since GPT-3 offered few techniques to control the format of its output, which is likely why there were awkward pauses and other glitches. Each line can then be passed to Azure&rsquo;s text-to-speech API to generate a distinct audio file, which can be played back in order in Unity.</p>
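<p>As a rough illustration, this parsing step can be sketched in a few lines of Python. The regex and rename map here are my own assumptions for illustration, not Mismatch Media&rsquo;s actual code:</p>

```python
import re

# Illustrative assumption: rename map as described above (Jerry -> Larry, etc.)
RENAMES = {"Jerry": "Larry", "Seinfeld": "Feinberg"}

def parse_script(script: str) -> list[tuple[str, str]]:
    """Split a generated script into (speaker, line) cues, applying renames."""
    cues = []
    for raw in script.splitlines():
        line = raw.strip()
        # Skip blank lines and stage directions like "[Audience laughter]"
        if not line or line.startswith("["):
            continue
        match = re.match(r"(\w+):\s*(.+)", line)
        if match:
            speaker, dialogue = match.groups()
            for old, new in RENAMES.items():
                speaker = speaker.replace(old, new)
                dialogue = dialogue.replace(old, new)
            cues.append((speaker, dialogue))
    return cues
```

<p>Each resulting <code>(speaker, line)</code> cue could then be sent to a text-to-speech API and queued for playback in order.</p>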
<p>In an <a href="https://www.polygon.com/23582937/ai-seinfeld-twitch-stream">interview with Polygon</a>, Skyler Hartle of Mismatch Media noted the presence of a &ldquo;director&rdquo; which likely handles the camera, scene transitions, and the microwave:</p>
<blockquote>
<p>“In addition to the third party services we’ve used, we have a lot of proprietary generative algorithms that cause the show to be ‘formed’, so to be speak. We collectively call this logic the ‘director,’ as it is largely responsible for making sure all the individual pieces come together into a whole,” Hartle said via email. “It’s worth mentioning that we don’t generate the artwork or the laugh track — those are precanned assets, but we have ideas on how to do that in the future.”</p>
</blockquote>
<p>The AI aspect of AI Seinfeld was counterintuitively the easiest part of the pipeline, which explains how quickly variants popped up. However, with little ability to tweak the LLM output given the technology of the time, the stream may have hit a creative ceiling.</p>
<h2 id="the-fall-of-ai-seinfeld">The Fall of AI Seinfeld</h2>
<p>Vice also <a href="https://www.vice.com/en/article/qjkyxp/whats-the-deal-with-nothing-forever-a-21st-century-seinfeld-that-is-ai-generated">interviewed</a> Hartle, who had an optimistic view of the future of AI Seinfeld:</p>
<blockquote>
<p>“Our grounding principle was, can we create a show that can generate entertaining content forever? Because that&rsquo;s truly where we see the future emerging towards. Our goal with the next iterations or next shows that we release is to actually trade a show that is like Netflix-level quality.”</p>
</blockquote>
<p>That&rsquo;s tempting fate a bit too much.</p>
<p>The reason AI Seinfeld fell out of favor is a case of unintentionally poor LLM testing. When the <code>text-davinci-003</code> API endpoint had an outage, AI Seinfeld switched to a weaker GPT-3 model, <code>text-curie</code>, to keep the stream up. But unlike the davinci variant, curie was <em>not</em> RLHF-tuned to follow instructions or behave safely.</p>
<p>During this brief period of reduced safety, one of Larry&rsquo;s AI-generated monologues <a href="https://www.vice.com/en/article/ai-generated-seinfeld-show-nothing-forever-banned-on-twitch-after-transphobic-standup-bit/">made a transphobic joke</a>: a type of joke that was unfortunately common during the &rsquo;90s and has no place in modern society. Twitch banned the channel for 14 days as a result, completely killing its growth momentum.</p>
<p>But when the ban concluded and AI Seinfeld came back, the show was changed significantly with a &ldquo;Season 2&rdquo;. Although AI Seinfeld was still about a group of friends hanging around talking about the latest gossip, all the characters were different and had new models, the sets were different, and instead of a comedy monologue, <del>Larry</del> Leo narrated the writing of a blog post.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube-nocookie.com/embed/7N2Wgqn45FI?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<p>Why Mismatch Media made such a format shift is unclear: <a href="https://en.wikipedia.org/wiki/Occam%27s_razor">Occam&rsquo;s razor</a> would suggest that a copyright holder for Seinfeld sent a cease and desist to Mismatch Media given the bad publicity from the ban, despite the clearly fair-use parody nature of the stream. In fairness, it may not have been worth the time and effort for Mismatch Media to fight a legal battle over a fun art project.</p>
<p>The rebooted WatchMeForever stream is <a href="https://www.twitch.tv/watchmeforever">still active</a> as of today, but with effectively no viewers.</p>
<p>The immediate failure of the AI Seinfeld retool lends credibility to the theory that the stream only became popular <em>because</em> it was about Seinfeld, and that it was a novelty doomed to a short shelf life. Still, there were detractors who said <a href="https://www.businessinsider.com/ai-generated-seinfeld-parody-twitch-nothing-forever-streaming-transphobia-banned-2023-2">AI Seinfeld was never funny and everyone is weird for liking it</a>. That&rsquo;s OK: the original Seinfeld received similar complaints back in the day. <sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup> But it&rsquo;s hard to argue that there wasn&rsquo;t interest in a 24/7 livestream of surreal AI-generated content.</p>
<h2 id="what-would-ai-seinfeld-look-like-in-2024">What Would AI Seinfeld Look Like in 2024?</h2>
<p>Now that we know how AI Seinfeld worked and what didn&rsquo;t work, how would a year&rsquo;s worth of exponential progress in generative AI look for AI Seinfeld? Could AI Seinfeld be improved and come back? The answer is <em>maybe</em>.</p>
<p>Modern generative AI requires a lot of cherry-picking of the best results, and it&rsquo;s surprisingly hard to do: both images and text can take multiple generations and still require significant human-guided edits. But with a Twitch livestream, there can&rsquo;t be any cherry-picking at all, which means the entire generation pipeline has to be consistent and, in the worst case, its failures have to be interesting.</p>
<p>The only reason AI Seinfeld worked at all is because GPT-3 was trained on the entire internet, likely including Seinfeld scripts and forum discussions. The prompt needed to contain <code>Write a Seinfeld script</code>: if you instead asked it to <code>Write a sitcom script</code>, it would output something completely generic, and there wasn&rsquo;t much room to customize the prompt to make it more interesting. The GPT-3 variant that AI Seinfeld used had a 4k-token context window limit (combining both the input prompt and the output script text), but modern LLMs eclipse that: currently, Claude 3.5 Sonnet has a <a href="https://docs.anthropic.com/en/docs/about-claude/models">200k input/8k output</a> context, while GPT-4o has a <a href="https://platform.openai.com/docs/models/gpt-4o">128k input/16k output</a> context! With that much freedom, you can define many more constraints in the prompt and guide the LLM into exactly the type of sitcom you want.</p>
<p>One simple example that doesn&rsquo;t require any knowledge of machine learning is a parametric prompt, where one aspect of a prompt can be replaced with a user-defined choice or programmatically chosen at random. Unlimited Steam, for example, used a parametric prompt to change the cosmic phenomenon localized entirely within Principal Skinner&rsquo;s kitchen from the Aurora Borealis to something as silly as <a href="https://en.wikipedia.org/wiki/Rapture">The Rapture</a> or <a href="https://foofighters.com">Foo Fighters</a>, forcing the characters to play along improv-style. <sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup></p>
<p>Recall the example Seinfeld prompt to Claude 3.5 Sonnet I made earlier: instead of prompting for a Seinfeld script about a <code>new restaurant</code>, we can have the LLM generate a script about <code>generative AI in 2024</code> by making only that substitution in the prompt — a topic there has definitely never been a Seinfeld episode about:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">George bursts in, looking frantic.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">George: You guys! I think I&#39;m dating an AI!
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">[Audience laughter]
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Jerry: What makes you say that?
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">George: Well, she&#39;s perfect in every way. She laughs at all my jokes, she has an encyclopedic knowledge of obscure trivia, and she never gets mad when I say something stupid.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">[Audience laughter]
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Jerry: George, that doesn&#39;t mean she&#39;s an AI. Maybe she just really likes you.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">George: No, no. It&#39;s definitely an AI. No human woman would ever put up with me for this long.
</span></span></code></pre></div><p>Using modern LLMs, is there now a way to design a prompt that makes use of the long context windows? A prompt that can both leverage unique human writing and fix many of the issues that affected AI Seinfeld? Here&rsquo;s an attempt at a much more sophisticated prompt, where all values in <code>{}</code> brackets are parameters that can be filled in:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">You are a professional comedian. Write an award-winning script for a scene for Act I of a three-act hit sitcom episode. Include audience laugh tracks when appropriate.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Your script MUST incorporate ALL the following elements:
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Background:
</span></span><span class="line"><span class="cl">- {background}
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Setting:
</span></span><span class="line"><span class="cl">- {setting}
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Characters:
</span></span><span class="line"><span class="cl">- {character_1}
</span></span><span class="line"><span class="cl">- {character_2}
</span></span><span class="line"><span class="cl">- {character_3}
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Plots:
</span></span><span class="line"><span class="cl">- {a_plot}
</span></span><span class="line"><span class="cl">- {b_plot_1}
</span></span><span class="line"><span class="cl">- {b_plot_2}
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">The script MUST also follow the high-level comedic style of the following scripts:
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">- {script_1}
</span></span><span class="line"><span class="cl">- {script_2}
</span></span><span class="line"><span class="cl">- {script_3}
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">After the scene has concluded, output a summary of the scene.
</span></span></code></pre></div><p>Thanks to long context windows, the parametric changes don&rsquo;t have to be small, such as only a character name or two word setting. You, a human, can write <em>anything</em> to make each character distinct and robust, including name, gender, age, personality, likes, dislikes, etc. Plots can be derived from human-written scenarios beforehand: if you wrote 100 A-plots and 100 B-plots and randomly selected 1 A-plot and 2 B-plots, you&rsquo;d have about <em>1 million</em> possible plot permutations, ensuring you have something unique before the AI tries to reconcile them. You can feed in examples of human-written scripts to set the style and vibe of the generation in what is known as <a href="https://www.promptingguide.ai/techniques/fewshot">few-shot prompting</a>. You can maintain continuity over many scenes by having the LLM summarize its own output, and then feed those summaries back to the AI as background information to build upon them. The LLM can also be instructed to <a href="https://minimaxir.com/2023/12/chatgpt-structured-data/">output structured data</a> to avoid the need to loosely parse the script after it&rsquo;s completed, and as a bonus the model could be instructed to output additional metadata such as <a href="https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-voice#use-speaking-styles-and-roles">SSML speech styles</a> based on a given line to add personality to the generated speech.</p>
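<p>The plot arithmetic above is easy to sanity-check; whether the two B-plots are treated as ordered (say, a primary and a secondary subplot) determines whether the count is roughly one million or half that:</p>

```python
from math import comb

# Back-of-envelope check of the plot-count claim: 100 human-written A-plots
# and 100 B-plots, selecting 1 A-plot and 2 distinct B-plots per episode.
a_plots, b_plots = 100, 100

# If the order of the two B-plots matters:
ordered = a_plots * b_plots * (b_plots - 1)

# If it doesn't:
unordered = a_plots * comb(b_plots, 2)

print(ordered, unordered)
```

<p>Either way, the pipeline has far more unique starting points than it could ever exhaust on a livestream.</p>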
<p>Unfortunately, creating this pipeline, writing original characters and plots for it, and sufficiently testing it to ensure the generated results are stable would take weeks if not months to complete; otherwise, I would provide a more concrete demo. <sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup> This pipeline approach to AI script writing would only be effective for unsupervised 24/7 generation and wouldn&rsquo;t replace skilled human writers, who would do a more effective job much faster.</p>
<p>But would all of these prompt optimizations actually make the final generated script <em>funny</em>? After all, some of the failings, like the awkward audience laughs and the pauses at the end of scenes, contributed to AI Seinfeld&rsquo;s humor. During a standup comedy event at AI Seinfeld&rsquo;s peak, Jerry Seinfeld himself <a href="https://www.reddit.com/r/seinfeld/comments/10tnn1k/jerry_talking_about_ai_seinfeld_last_night/">was asked</a> about the AI parody, and he replied that he&rsquo;s not worried about AI:</p>
<blockquote>
<p>AI can be, definitely, they&rsquo;ll make it smarter and smarter, but to do [standup comedy] you have to make it dumber.</p>
</blockquote>
<p>Could AI Seinfeld benefit from advances in AI video? The answer this time is no. Generative video has been taking off in 2024 with projects such as OpenAI&rsquo;s <a href="https://openai.com/index/sora/">Sora</a> and Runway AI&rsquo;s <a href="https://runwayml.com/product">Gen-3 Alpha</a>, but those demos and the examples that go viral on social media are very heavily cherry-picked, and even then there are consistency errors such as objects popping in and out of existence. Generating video also requires exponentially more compute than just running Unity, and even with another few years of GPU hardware improvements, it would be infeasible to cost-effectively create a 24/7 stream from those models.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube-nocookie.com/embed/mnpGyVL1-0E?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<p>The greatest problem with generative AI video is that it is coherent overall but has characteristic errors that don&rsquo;t require a keen eye to notice; as a result, it falls squarely into the uncanny valley, with mistakes that are not interesting but disorienting. Mistakes in motion are easier to notice at a glance than mistakes in still images, where a person&rsquo;s hands may have the wrong number of fingers. The only way for AI video to get out of the valley would be to improve the models to near-flawless quality, which won&rsquo;t happen any time soon. And Sora sits on the more realistic side of the curve, not the less realistic side.</p>
<figure>

    <img loading="lazy" srcset="/2024/08/ai-seinfeld/uncanny_valley_2_hu_c3c8932aea493423.webp 320w,/2024/08/ai-seinfeld/uncanny_valley_2_hu_85ea0e247ba12df1.webp 768w,/2024/08/ai-seinfeld/uncanny_valley_2_hu_7690c09cf64f5daa.webp 1024w,/2024/08/ai-seinfeld/uncanny_valley_2.webp 1200w" src="uncanny_valley_2.webp"/> 
</figure>

<p>What about the AI-generated voices that would power these characters? At the time AI Seinfeld aired, many complained that Larry&rsquo;s voice &ldquo;didn&rsquo;t sound enough like Jerry Seinfeld.&rdquo; After AI Seinfeld concluded, a new technology called <a href="https://elevenlabs.io/blog/what-is-voice-cloning">voice cloning</a>, popularized by <a href="https://elevenlabs.io">ElevenLabs</a>, went mainstream&hellip;and it&rsquo;s unexpectedly the AI modality causing the most actual harm, both within creative projects and outside of them. If you haven&rsquo;t heard as much about AI-generated voices, there&rsquo;s a good reason for that: voice synthesis projects such as Microsoft&rsquo;s <a href="https://www.microsoft.com/en-us/research/project/vall-e-x/vall-e-2/">VALL-E 2</a> and Meta&rsquo;s <a href="https://ai.meta.com/blog/voicebox-generative-ai-model-speech/">Voicebox</a> both have disclaimers saying they won&rsquo;t be released due to the dangers the technology possesses, although Microsoft&rsquo;s Azure does offer a &ldquo;<a href="https://learn.microsoft.com/en-us/azure/ai-services/speech-service/custom-neural-voice">custom neural voice</a>&rdquo; service. Voice cloning has been used to <a href="https://www.newyorker.com/science/annals-of-artificial-intelligence/the-terrifying-ai-scam-that-uses-your-loved-ones-voice">initiate scams</a> by impersonating spouses in an emergency. Professional voice actors have had their voices cloned and used without compensation due to contracts not specifically forbidding the practice, which is one of the reasons SAG-AFTRA <a href="https://www.theverge.com/2024/8/5/24213808/video-game-voice-actor-strike-sag-aftra">just went on strike</a> against the video game industry in order to get protections against voice cloning and synthetic performers.</p>
<p>Moreover, in the context of creating a next-gen AI Seinfeld, there&rsquo;s nothing inherently interesting about voice cloning, since it&rsquo;s a copy by definition: the model <em>can&rsquo;t</em> generate unexpectedly amusing content beyond the inherent gimmick of a famous voice saying something, such as the AI George Carlin standup special, <a href="https://www.vice.com/en/article/the-george-carlin-ai-standup-is-worse-than-you-can-imagine/">which was not special</a>. There currently isn&rsquo;t any way to prompt-engineer a voice generation AI with the detail to create a voice <code>in the style of a masculine New York comedian, 2x speed, primetime television quality</code>, which could open up more creative opportunities.</p>
<p>Although we can make drastic improvements with the textual script, that&rsquo;s the extent of how new AI approaches can be leveraged to make something interesting. But if you remember the early days of generative AI history, the best AI-generated projects were the simplest.</p>
<h2 id="ai-weirdness">AI Weirdness</h2>
<p>Generative &ldquo;AI&rdquo; has been around for a very long time (I had fun with <a href="https://en.wikipedia.org/wiki/Markov_chain">Markov chains</a> <a href="https://minimaxir.com/2013/11/innovation-rng/">a decade ago</a>!), but it was mostly confined to tech-focused communities like <a href="https://news.ycombinator.com">Hacker News</a>. Modern generative AI didn&rsquo;t break into mainstream culture until 2018, ironically in a way that didn&rsquo;t involve actual generative AI. In June of that year, comedian Keaton Patti posted a <a href="https://x.com/KeatonPatti/status/1006961202998726665">megaviral tweet</a> about how he &ldquo;forced a bot to watch over 1,000 hours of Olive Garden commercials and then asked it to write an Olive Garden commercial of its own.&rdquo;</p>
<figure>

    <img loading="lazy" srcset="/2024/08/ai-seinfeld/patti_hu_67c737b47f76017.webp 320w,/2024/08/ai-seinfeld/patti_hu_615be4497d8ad163.webp 768w,/2024/08/ai-seinfeld/patti_hu_421617479726cf8c.webp 1024w,/2024/08/ai-seinfeld/patti.webp 1554w" src="patti.webp"
         alt="An excerpt of the viral Olive Garden script."/> <figcaption>
            <p>An excerpt of the viral Olive Garden script.</p>
        </figcaption>
</figure>

<p>Yes, the script was human-written: given the technology at the time, no one could train an AI to behave like that from only video input data, and the script was <em>too surreal</em> even for the now-primitive generative AI of the era. Patti did get popular enough to land <a href="https://www.amazon.com/Forced-Bot-Write-This-Book/dp/152485834X">a book deal</a> and a <a href="https://www.youtube.com/playlist?list=PLXSrjGY5Tz_gPdaU_L__S3hXua7zRQtUl">Netflix collaboration</a> leveraging this fake-AI gimmick.</p>
<p>Patti&rsquo;s comedic misrepresentation of AI did lead to genuine confusion about what a 2018-era generative AI can actually do. Janelle Shane, who maintains the <a href="https://www.aiweirdness.com">AI Weirdness blog</a> about weird things AI can generate, posted an <a href="https://x.com/JanelleCShane/status/1007061610005794817">epic takedown</a> of Patti&rsquo;s script which went equally viral and also led to the internet discovering her excellent <a href="https://www.aiweirdness.com/candy-heart-messages-written-by-a-18-02-09/">AI-generated Valentine&rsquo;s Day hearts</a> from the same year (and later <a href="https://www.amazon.com/You-Look-Like-Thing-Love/dp/0316525227">a book deal</a> too):</p>
<figure>

    <img loading="lazy" srcset="/2024/08/ai-seinfeld/heart_hu_292dce043896cad3.webp 320w,/2024/08/ai-seinfeld/heart.jpg 640w" src="heart.jpg"/> 
</figure>

<p>Image-based generative AI took a lot longer to go mainstream: websites like <a href="https://thispersondoesnotexist.com">This Person Does Not Exist</a> demonstrated the power of <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">generative adversarial networks</a> like <a href="https://github.com/NVlabs/stylegan">StyleGAN</a> to create images, but that wasn&rsquo;t weird outside of <a href="https://cedar.buffalo.edu/~srihari/CSE676/22.3-GAN%20Mode%20Collapse.pdf">mode collapses</a>. The first instance of weird images from AI was in January 2021 when OpenAI announced the <a href="https://openai.com/index/dall-e/">original DALL·E</a> and showed they could make unique armchairs in the shape of an avocado by asking the model to do so, although they never released the model itself.</p>
<figure>

    <img loading="lazy" srcset="/2024/08/ai-seinfeld/avocado_hu_5300a7e486e7afb5.webp 320w,/2024/08/ai-seinfeld/avocado_hu_84e7cd0392309830.webp 768w,/2024/08/ai-seinfeld/avocado.webp 830w" src="avocado.webp"/> 
</figure>

<p>DALL·E didn&rsquo;t get much attention outside of the AI hypesters since no one could play with it, but months later, things changed. <a href="https://x.com/borisdayma">Boris Dayma</a> led an initiative to reproduce and open-source a variant of the DALL·E model, labeled <a href="https://github.com/borisdayma/dalle-mini">DALL·E Mini</a> (later renamed <a href="https://www.craiyon.com">Craiyon</a> after a cease and desist from OpenAI), and <a href="https://huggingface.co/spaces/dalle-mini/dalle-mini">hosted it for free on Hugging Face</a>, where it went megaviral. And thus began the &ldquo;<a href="https://www.reddit.com/r/weirddalle/top/?t=all">weird DALL·E</a>&rdquo; phase of image generation AI, where anyone could create incoherent images and make people laugh.</p>
<figure class="align-center ">

    <img loading="lazy" srcset="/2024/08/ai-seinfeld/firehydrant_hu_4bd881a786b7493e.webp 320w,/2024/08/ai-seinfeld/firehydrant.webp 764w" src="firehydrant.webp#center"
         alt="Even back in 2021, image prompt engineering was a thing. via /u/royal_rigolo on Reddit / weirddalle subreddit" width="400"/> <figcaption>
            <p>Even back in 2021, image prompt engineering was a thing. <a href="https://www.reddit.com/r/weirddalle/comments/vjwcl5/fire_hydrant_takes_selfies_on_top_of_the_himalaya/">via /u/royal_rigolo on Reddit / weirddalle subreddit</a></p>
        </figcaption>
</figure>

<p>All of these examples of interesting failures are representative of a bygone AI era of experimentation. Once everyone had free access to more powerful text-generating AI with ChatGPT, and more powerful image-generating AI with <a href="https://www.midjourney.com/home">Midjourney</a>, AI stopped being fun and started being serious business, for better or for worse.</p>
<figure>

    <img loading="lazy" srcset="/2024/08/ai-seinfeld/uncanny_valley_3_hu_c912a98f812d692e.webp 320w,/2024/08/ai-seinfeld/uncanny_valley_3_hu_6cd7aa3fb6bb5ee5.webp 768w,/2024/08/ai-seinfeld/uncanny_valley_3_hu_e3c7199e7c82d8bd.webp 1024w,/2024/08/ai-seinfeld/uncanny_valley_3.webp 1200w" src="uncanny_valley_3.webp"/> 
</figure>

<h2 id="ai-generated-content-in-20xx">AI-Generated Content in 20XX</h2>
<p>Last year, I wrote a thought piece titled &ldquo;<a href="https://minimaxir.com/2023/10/ai-sturgeons-law/">The Greatest Threat to Generative AI is Humans Being Bad at Using it</a>&rdquo; in response to the increasing hostility against the use of AI in creative works, arguing that while AI is a tool like any other, it is a tool that&rsquo;s very easy to use poorly in ways that actually make projects worse. Additionally, the largest AI companies have both a business incentive and a duty to ensure that AI is used responsibly by their users downstream, as otherwise it will hurt the industry in the long term.</p>
<p>Now, it&rsquo;s apparent that I was correct. The large companies went full steam ahead on AI integrations even where it is highly questionable that they add value and productivity to the end-user, often signaled with a &ldquo;magical&rdquo; <a href="https://qz.com/how-became-the-unofficial-ai-emoji-1851059332">sparkle emoji</a>. Google has integrated Gemini to assist with document and email writing, Meta has integrated Meta AI to automatically generate images and comments, and Apple will <a href="https://www.bloomberg.com/news/articles/2024-07-28/apple-intelligence-to-miss-initial-release-of-upcoming-ios-18-ipados-overhauls?embedded-checkout=true">soon</a> allow Apple devices to generate text and images on your personal devices using Apple Intelligence. Marketing these features is typically met with backlash: Google had to <a href="https://www.cnbc.com/2024/08/02/google-pulls-ai-ad-for-olympics-following-backlash.html">pull an Olympics commercial</a> which encouraged a parent to use AI to write a letter for their child.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube-nocookie.com/embed/NgtHJKn0Mck?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<blockquote>
<p>“I flatly reject the future that Google is advertising,” Shelly Palmer, professor of advanced media at Syracuse University’s S.I. Newhouse School of Public Communications, wrote in a widely circulated <a href="https://shellypalmer.com/2024/07/why-googles-dear-sydney-ad-makes-me-want-to-scream/">blog post</a>. The technology presents a “monocultural future where we see fewer and fewer examples of original human thoughts,” she wrote.</p>
</blockquote>
<p>In their rush to demonstrate generative AI capabilities to shareholders, the big tech companies pushed the technology further mainstream without encouraging <em>responsible</em> usage of it, and AI has entered a new era of &ldquo;<a href="https://simonwillison.net/2024/May/8/slop/">slop</a>&rdquo; where people post objectively bad AI content without any regard for how it will be perceived, especially on websites which rely on user-generated content.</p>
<figure>

    <img loading="lazy" srcset="/2024/08/ai-seinfeld/pinterest_hu_613e5e7f10764361.webp 320w,/2024/08/ai-seinfeld/pinterest_hu_fb37af21ee91c34f.webp 768w,/2024/08/ai-seinfeld/pinterest.webp 901w" src="pinterest.webp"
         alt="An annotated example of the Pinterest home page from July 2024. via @henningsanden on X"/> <figcaption>
            <p>An annotated example of the Pinterest home page from July 2024. <a href="https://x.com/henningsanden/status/1808126786389037107">via @henningsanden on X</a></p>
        </figcaption>
</figure>

<p>Facebook, whose algorithm <a href="https://transparency.meta.com/data/widely-viewed-content-report/">favors</a> emotionally-appealing engagement bait posts, has seen a deluge of high-engagement slop even when the content makes no logical sense.</p>
<figure class="align-center ">

    <img loading="lazy" srcset="/2024/08/ai-seinfeld/cabincrew_hu_bc23e6989111247c.webp 320w,/2024/08/ai-seinfeld/cabincrew_hu_c696ff0db8c80eff.webp 768w,/2024/08/ai-seinfeld/cabincrew_hu_b68182f34bfe5d01.webp 1024w,/2024/08/ai-seinfeld/cabincrew.webp 1080w" src="cabincrew.webp#center"
         alt="One of the few AI-generated images on Facebook with an actual cabin crew. via @FacebookAIslop on X." width="400"/> <figcaption>
            <p>One of the few AI-generated images on Facebook with an actual cabin crew. <a href="https://x.com/FacebookAIslop/status/1806416249259258189">via @FacebookAIslop on X</a>.</p>
        </figcaption>
</figure>

<p>This is, of course, quintessential uncanny valley: the image is coherent at a glance, but looking at it for even a second makes its issues obvious, and these issues aren&rsquo;t a good kind of AI weirdness. What&rsquo;s worse is that AI slop is a regression in realism, falling back onto the left side of the valley.</p>
<figure>

    <img loading="lazy" srcset="/2024/08/ai-seinfeld/uncanny_valley_4_hu_ce80aacfa47a581e.webp 320w,/2024/08/ai-seinfeld/uncanny_valley_4_hu_ffbc52f347062d8f.webp 768w,/2024/08/ai-seinfeld/uncanny_valley_4_hu_8f8817dd988ae0a9.webp 1024w,/2024/08/ai-seinfeld/uncanny_valley_4.webp 1200w" src="uncanny_valley_4.webp"/> 
</figure>

<p>Although we as humans can identify this slop, it is currently surprisingly hard for an AI to do so. That hasn&rsquo;t stopped people from trying to build AIs that detect AI, which in practice are riddled with false positives that hurt real creatives. For slop-creators, this is a feature: if an AI company released a tool to reliably detect and punish slop, it would make their generative AI less valuable. It&rsquo;s <a href="https://www.wsj.com/tech/ai/openai-tool-chatgpt-cheating-writing-135b755a">reported</a> that one of the reasons OpenAI won&rsquo;t release a reliable ChatGPT text detector is that it could harm their business.</p>
<p>The core reason big tech companies allow generative AI to cause the <a href="https://en.wikipedia.org/wiki/Enshittification">enshittification</a> of the internet is misaligned incentives between the companies hosting AI slop and the users viewing it. Social media companies and their shareholders care about <a href="https://mixpanel.com/blog/north-star-metric/">North Star metrics</a> such as user retention and time-on-site, and normally those metrics correlate with user happiness and satisfaction with the service. But time-on-site, for example, can <em>also</em> be maximized by making the site harder and slower to use, and the deluge of AI slop accomplishes that. AI companies typically don&rsquo;t have analytics tracking negative user sentiment about their use of AI: if anything, the uncompromising backlash against AI convinces the companies that complainers are simply a lost demographic, so they double down on what they&rsquo;re already doing. Aggregate metrics treat human-made content and AI-generated content as equal, but <em>humans</em> do not.</p>
<p>Generative AI, even for researchers and practitioners such as myself, is a heavily nuanced topic that is very difficult to communicate succinctly, more difficult still on social media, which highly discourages nuance and context, and <em>even more difficult</em> as AI hypesters muddy the waters with misleading praise of generative AI that is easy to dunk on, which just gets them more engagement and revenue. &ldquo;Made by AI&rdquo; is now a term that inspires dread, far from the Keaton Patti days when made-by-AI was an indicator of joyful weirdness. Bashing AI is now a meme, and there isn&rsquo;t a single potential AI project that could challenge that perception because the well is poisoned beyond repair.</p>
<h2 id="would-a-247-ai-generated-twitch-stream-even-work-anymore">Would a 24/7 AI-Generated Twitch Stream Even Work Anymore?</h2>
<p>How does the modern AI backlash tie back into AI Seinfeld? Twitch&rsquo;s core demographic is the same demographic as those most against the use of generative AI. Part of the reason AI Seinfeld became so successful on Twitch is because of the community it cultivated: it wouldn&rsquo;t have gone viral if people weren&rsquo;t spamming microwave <code>MMM</code>s and answering what the fish said when it hit the wall. Even though Twitch viewers are mostly lurkers and not chatters, a channel with a good community builds word-of-mouth even outside of Twitch, which is how Twitch channels go viral.</p>
<p>I decided to determine what it would take to produce a &ldquo;fixed&rdquo; AI Seinfeld in 2024, given both the advances in AI and the ethics involved. Now, it&rsquo;s definitely not anything a scrappy group of hackers could do anymore. Sure, you could once again ask an LLM to generate a sitcom script and get a bunch of assets from the Unity Asset Store, but <em>that&rsquo;s already been done before</em>. In order to overcome the reflexive assumption that new AI generated content is slop, the stream would have to be something completely novel and unexpected: you can&rsquo;t, for example, just do an AI <a href="https://en.wikipedia.org/wiki/Curb_Your_Enthusiasm">Curb Your Enthusiasm</a>.</p>
<p>The script could be made unique following my demo of detailed parametric prompts, but it would require production-studio-class tracking and documentation of how the prompts and their parameters are used to codify that uniqueness. The stream video would still need to be rendered in Unity or another engine, but in order to be unique it would require commissioning human-made visuals and sound effects: given the animosity against those who work with AI, most artists would not accept those commissions even if they were paid at a significant premium. <sup id="fnref:5"><a href="#fn:5" class="footnote-ref" role="doc-noteref">5</a></sup> The voices would still have to be from an existing text-to-speech voice provider: voice cloning is right out, even with explicit consent and compensation for the voice actors.</p>
<p>And even if all the assets were fully sourced ethically with transparent documentation for the entire pipeline, the stream&rsquo;s Twitch chat would likely be derailed by <code>AI 👏 ART 👏 IS 👏 THEFT</code> spam, preventing the establishment of any community, and strict moderation to curb the spam risks causing a <a href="https://en.wikipedia.org/wiki/Streisand_effect">Streisand effect</a>.</p>
<p>The only entities that could feasibly create a 24/7 AI-generated livestream with fully ethically-sourced content would be, ironically, the big AI companies such as OpenAI which can afford to pay licenses for said data. Even <a href="https://www.disney.com">Disney</a>, which owns more than enough IP to train generative models of all modalities, would never do an AI Seinfeld-esque livestream for <a href="https://en.wikipedia.org/wiki/Brand_safety">brand safety</a> reasons alone: the nonzero possibility of a Disney character unexpectedly saying something problematic during the stream would make the entire project a complete nonstarter.</p>
<h2 id="whats-the-deal-with-the-uncanny-valley">What&rsquo;s the deal with the uncanny valley?</h2>
<p>One of the common criticisms of generative AI raised by creatives is: &ldquo;if AI is trained on all human works, then how can it create anything new?&rdquo; AI Seinfeld is the perfect counterargument: even though it&rsquo;s powered by an LLM, the <em>humans</em> behind it are what made it go viral. Even before ChatGPT, generative AI has always excelled as a tool. The microwave gag and the 144p visual filter were not AI-generated or an attempt to emulate aspects of the Seinfeld sitcom: they were distinct creative decisions that made the entire project more interesting, and they aren&rsquo;t something that you could prompt an AI to suggest adding. AI Seinfeld in hindsight was an ethical form of AI-generated media: it did not replace Seinfeld the TV show, no one stopped watching streams of Seinfeld in favor of the AI-generated alternative, and neither the copyright holders nor Jerry Seinfeld lost revenue due to AI Seinfeld&rsquo;s existence: if anything, the nostalgic buzz increased streams of the original show.</p>
<p>With the current trajectory of AI slop and the perverse incentives for large tech companies not to address it, I am pessimistic that AI content will ever reach a state where it crosses that final hump of the uncanny valley curve into full acceptance, and even more pessimistic about the backlash against generative AI ever subsiding. With generative model training now at the point where it requires exponentially more compute and data for increasingly marginal returns, it will take years, if it happens at all, for generative AI output to reach the far right of the uncanny valley chart, and unless the large tech companies actually create an <a href="https://en.wikipedia.org/wiki/Artificial_general_intelligence">AGI</a>, they are unlikely to obtain higher acceptability than AI Seinfeld ever did.</p>
<p>I wrote most of this blog post weeks ago but held off publishing it because new AI news kept happening. Most notably, the <a href="https://blackforestlabs.ai/our-team/">creators of Stable Diffusion</a> just released the <a href="https://blackforestlabs.ai">FLUX.1 series</a> of generative image AI models, which present substantially improved coherence both to the provided prompt and within the image itself. Some of the variants are <a href="https://huggingface.co/black-forest-labs/FLUX.1-dev">open-source</a>, allowing the community to finetune them. The <a href="https://huggingface.co/XLabs-AI/flux-RealismLora">XLabs-AI/flux-RealismLora</a> in particular focuses on realism, as its name implies, and <a href="https://www.reddit.com/r/StableDiffusion/comments/1emrprx/feel_the_difference_between_using_flux_with">one demo</a> from that finetune <a href="https://x.com/rpnickson/status/1821634114274873850">went megaviral</a>.</p>
<figure class="align-center ">

    <img loading="lazy" srcset="/2024/08/ai-seinfeld/flux_hu_f2586697cc180453.webp 320w,/2024/08/ai-seinfeld/flux.webp 664w" src="flux.webp#center"
         alt="One of the viral realism demo images: it does not have a dreamy look as other AI images but contextually expected stage lighting, the background and lanyard text is legible despite the depth-of-field blur, and body proportions are mostly correct except the long fingers. via /u/Glittering-Football9 on Reddit / StableDiffusion subreddit." width="400"/> <figcaption>
            <p>One of the viral realism demo images: it does not have a dreamy look as other AI images but contextually expected stage lighting, the background and lanyard text is legible despite the depth-of-field blur, and body proportions are mostly correct except the long fingers. <a href="https://www.reddit.com/r/StableDiffusion/comments/1emrprx/comment/lh30hvv/">via /u/Glittering-Football9 on Reddit / StableDiffusion subreddit</a>.</p>
        </figcaption>
</figure>

<p>That example, in my opinion, looks more real than Sora&rsquo;s output, but given the mixed reactions to the image, it sits right at the acceptability = 0 threshold.</p>
<figure>

    <img loading="lazy" srcset="/2024/08/ai-seinfeld/uncanny_valley_5_hu_c33303ff9d736da6.webp 320w,/2024/08/ai-seinfeld/uncanny_valley_5_hu_d0b5c2c50072b2b0.webp 768w,/2024/08/ai-seinfeld/uncanny_valley_5_hu_7eb161e4aba72dd1.webp 1024w,/2024/08/ai-seinfeld/uncanny_valley_5.webp 1200w" src="uncanny_valley_5.webp"/> 
</figure>

<p>The generative AI bell cannot be unrung. As you can tell from this post, I personally try to tread the fine line between showcasing cool applications of generative AI (at the risk of getting harassed) and calling out the problems generative AI can cause (also at the risk of getting harassed), because it&rsquo;s important to shine a light on what&rsquo;s actually possible with AI when the misinformation around generative AI is only increasing. It&rsquo;s overall a big bummer how we went from weird Valentine&rsquo;s Day hearts, to a quirky livestream of a group of AI-generated friends, to what AI is now.</p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>All of the examples in this post use LLM APIs as they provide the customization necessary to get effective results: the results for asking the same prompts to free chat frontends such as chatgpt.com will be substantially different.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:2">
<p>When I was younger, I actually didn&rsquo;t like Seinfeld and instead preferred to watch <a href="https://en.wikipedia.org/wiki/Everybody_Loves_Raymond">Everybody Loves Raymond</a>.&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:3">
<p>Incidentally, parametric prompts is why Unlimited Steam got <a href="https://www.reddit.com/r/unlimitedsteam/comments/12wto93/thank_you_for_enjoying_the_steam/">permanently banned</a> from Twitch: in what would now be known as a <a href="https://www.ibm.com/topics/prompt-injection">prompt injection</a>, one of the GitHub-hosted lists the channel sourced thousands of food choices for the prompt contained a few highly offensive selections.&#160;<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:4">
<p>Prompt engineering instability grows exponentially as the prompt size increases since each part of the prompt has to relate to each other. Claude 3.5 Sonnet is the first LLM I&rsquo;ve tested that can handle super-long bespoke prompts and can actually account for all aspects of the prompt.&#160;<a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:5">
<p>To be fully ethical, an AI practitioner would have to proactively offer additional contractual guarantees to creatives they are commissioning, including highly-scoped usage of the assets they provide and a clause to not train generative AI on said assets to avoid future business.&#160;<a href="#fnref:5" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Stable Diffusion 2.0 and the Importance of Negative Prompts for Good Results</title>
      <link>https://minimaxir.com/2022/11/stable-diffusion-negative-prompt/</link>
      <pubDate>Mon, 28 Nov 2022 09:15:00 -0800</pubDate>
      <guid>https://minimaxir.com/2022/11/stable-diffusion-negative-prompt/</guid>
      <description>Negative prompts can be far superior than traditional prompt additions.</description>
      <content:encoded><![CDATA[<p><span><style>
.pos, .pos code {
color: #27ae60 !important;
}
.neg, .neg code {
color: #c0392b !important;
}
</style></span></p>
<p>As an unexpected surprise, StabilityAI released <a href="https://stability.ai/blog/stable-diffusion-v2-release">Stable Diffusion 2.0</a> last week, the next major version of the text-to-image AI that has been warping the entire ecosystem. Architecture-wise it&rsquo;s mostly the same, except with a new text encoder (<a href="https://github.com/mlfoundations/open_clip">OpenCLIP</a> instead of <a href="https://openai.com">OpenAI</a>&rsquo;s CLIPText). StabilityAI boasts that Stable Diffusion 2.0 has <a href="https://github.com/Stability-AI/stablediffusion#stable-diffusion-v20">better performance quantitatively</a>, but art in the end is subjective.</p>
<p>Within 24 hours after release, users on <a href="https://www.reddit.com">Reddit</a> and <a href="https://twitter.com/">Twitter</a> noted that the new model performed <em>worse</em> than Stable Diffusion 1.5 with the exact same input prompts and settings. Some users also noticed that putting in the names of real artists such as the <a href="https://thechainsaw.com/nft/ai-art-debate/">infamous Greg Rutkowski</a> had zero effect on the output.</p>
<p>Some point to the fact that the new model was trained on fewer NSFW images as the culprit for these changes, but in my opinion the culprit here is the switch to OpenCLIP. A new text encoder means some of the assumptions and prompt hacks for earlier versions of Stable Diffusion may no longer work. On the other hand, it may enable <em>new</em> prompt hacks. The CEO of StabilityAI, Emad Mostaque, <a href="https://twitter.com/EMostaque/status/1596907328548139008">mentioned</a> that negative prompts should work better due to the way the model was trained. It&rsquo;s still a theory though; practice and experimentation are always better.</p>
<p>I hadn&rsquo;t played with negative prompts in Stable Diffusion before, although it is rumored that they&rsquo;re part of the secret sauce behind some of the more well-known commercial Stable Diffusion services. But after lots of experimenting with negative prompts in SD 2.0, it&rsquo;s clear that negative prompts are the key to reliably getting good results from the model, and most surprisingly, negative prompts can be far superior to traditional prompt additions.</p>
<h2 id="an-introduction-to-negative-prompting">An Introduction to Negative Prompting</h2>
<p><em>All generated images in this blog post are generated by Stable Diffusion v2.0 base (via <a href="https://github.com/huggingface/diffusers">diffusers</a>) with a classifier-free guidance of 7.5, the Euler Ancestral scheduler, with 50 denoising steps.</em></p>
<p>Analogous to normal text-to-image prompting, a negative prompt indicates which terms you do not want to see in the resulting image. At a technical level in Stable Diffusion, the encoded negative prompt serves as a high-dimensional anchor that the diffusion process steers away from.</p>
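<p>Mechanically, the negative prompt takes the place of the empty unconditional prompt in classifier-free guidance: the model predicts noise once per prompt, and the final prediction is pushed away from the negative prompt&rsquo;s result. Below is a minimal sketch using scalar stand-ins for the real latent tensors (the function and variable names are mine, not from any library):</p>

```python
# Classifier-free guidance step with a negative prompt: the negative
# prompt's noise prediction replaces the usual unconditional ("") one.
def guided_noise(noise_negative: float, noise_positive: float,
                 guidance_scale: float = 7.5) -> float:
    # Start at the negative-prompt prediction and move toward (and past)
    # the positive-prompt prediction, i.e. away from the negative anchor.
    return noise_negative + guidance_scale * (noise_positive - noise_negative)

# Toy scalars standing in for full latent tensors:
print(guided_noise(noise_negative=0.0, noise_positive=1.0))  # 7.5
print(guided_noise(noise_negative=0.5, noise_positive=1.0))  # 4.25
```

<p>In the <code>diffusers</code> library, this corresponds to passing the <code>negative_prompt</code> argument to the pipeline call; when it is omitted, the empty string is encoded in its place.</p>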
<p>Let&rsquo;s test it out with Stable Diffusion 2.0. For example, let&rsquo;s go back to my <a href="https://minimaxir.com/2021/08/vqgan-clip/">VQGAN + CLIP prompts</a> and try <code>cyberpunk forest by Salvador Dali</code>.</p>
<figure>

    <img loading="lazy" srcset="/2022/11/stable-diffusion-negative-prompt/cyberpunk%20forest%20by%20Salvador%20Dali@0.5x_hu_82b18b1f040516ef.webp 320w,/2022/11/stable-diffusion-negative-prompt/cyberpunk%20forest%20by%20Salvador%20Dali@0.5x_hu_ca930ce094f8d0f9.webp 768w,/2022/11/stable-diffusion-negative-prompt/cyberpunk%20forest%20by%20Salvador%20Dali@0.5x.png 768w" src="cyberpunk%20forest%20by%20Salvador%20Dali@0.5x.png"
         alt="prompt: cyberpunk forest by Salvador Dali, via Stable Diffusion 2.0"/> <figcaption>
            <p>prompt: <code>cyberpunk forest by Salvador Dali</code>, via Stable Diffusion 2.0</p>
        </figcaption>
</figure>

<p>What if you wanted to remove things like <code>trees</code> and/or a certain color like <code>green</code>? That&rsquo;s what you&rsquo;d put in your negative prompt. Can Stable Diffusion 2.0 adjust?</p>
<figure>

    <img loading="lazy" srcset="/2022/11/stable-diffusion-negative-prompt/neg_trees,%20green@0.5x_hu_80218cde6fbc7b19.webp 320w,/2022/11/stable-diffusion-negative-prompt/neg_trees,%20green@0.5x_hu_defe89bf79cbe520.webp 768w,/2022/11/stable-diffusion-negative-prompt/neg_trees,%20green@0.5x.png 768w" src="neg_trees,%20green@0.5x.png"
         alt="prompt: cyberpunk forest by Salvador Dali; negative prompt: trees, green, via Stable Diffusion 2.0"/> <figcaption>
            <p>prompt: <code>cyberpunk forest by Salvador Dali</code>; negative prompt: <code>trees, green</code>, via Stable Diffusion 2.0</p>
        </figcaption>
</figure>

<p>Indeed it does, with a larger dose of surrealistic cyberpunk, but it is still a forest, albeit a more metaphorical one.</p>
<p>One popular trick is to also include more abstract bad-image concepts like <code>blurry</code> and <code>pixelated</code> in the negative prompt in order to theoretically improve the image. But are these negative prompts better than the additional prompt &ldquo;ingredients&rdquo; like <code>4k hd</code> and <code>trending on artstation</code> used with CLIPText-based text-to-image AI before it? How do negative prompts interact with those positive prompt additions? Let&rsquo;s test this further and more empirically.</p>
<h2 id="in-the-style-of-wrong">In The Style of Wrong</h2>
<p>As a quick aside, textual inversion, a technique which allows the text encoder to learn a specific object or style that can be trivially invoked in a prompt, does work with Stable Diffusion 2.0, although since the text encoder is different (and larger, with 1024D embeddings instead of 768D), each textual inversion embedding has to be retrained but otherwise behaves the same way. One popular style in SD 1.X is the &ldquo;<a href="https://www.midjourney.com">Midjourney</a>&rdquo; style located <a href="https://huggingface.co/sd-concepts-library/midjourney-style">here</a>, which has an overly fantastical aesthetic. I&rsquo;ve trained a new version of the <code>&lt;midjourney&gt;</code> token (available <a href="https://huggingface.co/minimaxir/midjourney_sd_2_0">here</a>).</p>
<p>Additionally, there&rsquo;s a new possibility of using textual inversion for negative prompts. Redditor Nerfgun3 trained a &ldquo;<a href="https://www.reddit.com/r/StableDiffusion/comments/yy2i5a/i_created_a_negative_embedding_textual_inversion/">negative embedding</a>&rdquo; for SD 1.X by generating a dataset of synthetic images by using common negative prompts as positive prompts instead, then training a textual inversion embedding on them. I <a href="https://github.com/minimaxir/stable-diffusion-negative-prompt/blob/main/wrong_image_generator.ipynb">reproduced that process</a> with a few tweaks to improve the synthetic dataset and trained a new <code>&lt;wrong&gt;</code> token (available <a href="https://huggingface.co/minimaxir/wrong_embedding_sd_2_0">here</a>).</p>
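<p>The retraining requirement can be illustrated with a toy embedding table: a textual inversion token is just a learned vector registered in the text encoder&rsquo;s embedding table, so a 768-dimensional SD 1.X embedding simply does not fit SD 2.0&rsquo;s 1024-dimensional OpenCLIP encoder. A hypothetical sketch (the helper function is mine, not a real diffusers API):</p>

```python
# A textual inversion token is a learned embedding vector registered in
# the (otherwise frozen) text encoder's embedding table under a new name.
def add_inversion_token(table: dict[str, list[float]], token: str,
                        vector: list[float]) -> None:
    dim = len(next(iter(table.values())))
    if len(vector) != dim:
        # An SD 1.X embedding (768-d) cannot be loaded into SD 2.0's
        # OpenCLIP encoder (1024-d): it has to be retrained from scratch.
        raise ValueError(f"got a {len(vector)}-d embedding, encoder expects {dim}-d")
    table[token] = vector

sd2_table = {"photo": [0.0] * 1024}  # stand-in for OpenCLIP's embedding table
add_inversion_token(sd2_table, "<wrong>", [0.1] * 1024)  # fits SD 2.0
try:
    add_inversion_token(sd2_table, "<midjourney>", [0.1] * 768)  # SD 1.X-sized
except ValueError as e:
    print(e)  # got a 768-d embedding, encoder expects 1024-d
```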
<p>We can now cross-test a positive prompt addition or a positive token against a negative prompt or negative token to see just how impactful the negative prompts are. Here is a list of prompts to test, with positive prompt additions in <span class="pos">green</span> and negative prompt additions in <span class="neg">red</span>:</p>
<table>
  <thead>
      <tr>
          <th>Label</th>
          <th>Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><span class="pos"><code>PROMPT</code></span></td>
          <td><span class="pos">hyper-detailed and intricate, realistic shaded, fine detail, realistic proportions, symmetrical, sharp focus, 8K resolution</span></td>
      </tr>
      <tr>
          <td><span class="pos"><code>&lt;TOKEN&gt;</code></span></td>
          <td><span class="pos">in the style of <code>&lt;midjourney&gt;</code></span></td>
      </tr>
      <tr>
          <td><span class="neg"><code>PROMPT</code></span></td>
          <td><span class="neg">ugly, boring, bad anatomy</span></td>
      </tr>
      <tr>
          <td><span class="neg"><code>&lt;TOKEN&gt;</code></span></td>
          <td><span class="neg">in the style of <code>&lt;wrong&gt;</code></span></td>
      </tr>
  </tbody>
</table>
<p>For example, one test input to Stable Diffusion 2.0 could be a prompt of <code>cyberpunk forest by Salvador Dali, in the style of &lt;midjourney&gt;</code> and a negative prompt of <code>in the style of &lt;wrong&gt;</code>, corresponding to the green <code>&lt;TOKEN&gt;</code> label and the red <code>&lt;TOKEN&gt;</code> label respectively.</p>
<p>Additionally, each individual generated image will start from the same initial latent via seeded scheduling. Keeping the initial latent constant means the composition of the generated image stays the same across runs of the same prompt, so the impact of changing only the negative prompts is shown more clearly.</p>
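<p>This reproducibility relies on the initial noise latent being a pure function of the seed. A toy sketch, with Python&rsquo;s <code>random</code> module standing in for the seeded <code>torch.Generator</code> that diffusers actually uses:</p>

```python
import random

def initial_latent(seed: int, n: int = 4) -> list[float]:
    # Stand-in for sampling the initial Gaussian noise latent; in diffusers
    # this is torch.randn(..., generator=torch.Generator().manual_seed(seed)).
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Re-seeding reproduces the exact same starting noise, so any change in the
# generated image is attributable to the prompt changes alone.
assert initial_latent(59049) == initial_latent(59049)
assert initial_latent(59049) != initial_latent(19683)
```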
<p>Now, let&rsquo;s finally begin. Let&rsquo;s start off with <code>Steve Jobs head</code> as the base prompt; simple enough.</p>
<figure>

    <img loading="lazy" srcset="/2022/11/stable-diffusion-negative-prompt/Steve%20Jobs%20head_seed_59049%202@0.5x_hu_d027db0c65c0f528.webp 320w,/2022/11/stable-diffusion-negative-prompt/Steve%20Jobs%20head_seed_59049%202@0.5x_hu_c7763e3af4045e00.webp 768w,/2022/11/stable-diffusion-negative-prompt/Steve%20Jobs%20head_seed_59049%202@0.5x.png 768w" src="Steve%20Jobs%20head_seed_59049%202@0.5x.png"
         alt="base prompt: Steve Jobs head, seed: 59049, via Stable Diffusion 2.0"/> <figcaption>
            <p>base prompt: <code>Steve Jobs head</code>, seed: 59049, via Stable Diffusion 2.0</p>
        </figcaption>
</figure>

<p>Each prompt variant changed the style: the base prompt produced a cartoon, the realistic prompt addition made it more of a 3D render, and the Midjourney token took an artsy approach. However, when negative prompts are added, each image becomes clearer, with less blurriness, more neutral lighting, and greater skin detail. More notably, the <code>&lt;wrong&gt;</code> token did much better than the shorter negative prompt.</p>
<p>How about an image generation classic: the famous avocado armchair which was demoed with the <a href="https://openai.com/blog/dall-e/">original DALL-E</a>?</p>
<figure>

    <img loading="lazy" srcset="/2022/11/stable-diffusion-negative-prompt/avocado_hu_72f2a11e6d06c9ec.webp 320w,/2022/11/stable-diffusion-negative-prompt/avocado_hu_6887cc58d0c21b81.webp 768w,/2022/11/stable-diffusion-negative-prompt/avocado.png 768w" src="avocado.png"
         alt="base prompt: an armchair in the shape of an avocado. an armchair imitating an avocado., seed: 59049, via Stable Diffusion 2.0"/> <figcaption>
            <p>base prompt: <code>an armchair in the shape of an avocado. an armchair imitating an avocado.</code>, seed: 59049, via Stable Diffusion 2.0</p>
        </figcaption>
</figure>

<p>Here&rsquo;s where things get interesting: the positive text prompt addition completely ruins the intent of the original prompt, while the negative prompts again refine each corresponding image with more detail (including the whole avocado!).</p>
<p>Now that we have good demos, let&rsquo;s go back to Dali&rsquo;s cyberpunk forest:</p>
<figure>

    <img loading="lazy" srcset="/2022/11/stable-diffusion-negative-prompt/cyberpunk%20forest%20by%20Salvador%20Dali_seed_59049@0.5x_hu_dc12398f97926632.webp 320w,/2022/11/stable-diffusion-negative-prompt/cyberpunk%20forest%20by%20Salvador%20Dali_seed_59049@0.5x_hu_d245db7713c58aed.webp 768w,/2022/11/stable-diffusion-negative-prompt/cyberpunk%20forest%20by%20Salvador%20Dali_seed_59049@0.5x.png 768w" src="cyberpunk%20forest%20by%20Salvador%20Dali_seed_59049@0.5x.png"
         alt="base prompt: cyberpunk forest by Salvador Dali, seed: 59049, via Stable Diffusion 2.0"/> <figcaption>
            <p>base prompt: <code>cyberpunk forest by Salvador Dali</code>, seed: 59049, via Stable Diffusion 2.0</p>
        </figcaption>
</figure>

<p>In this case, both positive prompt additions wipe out Dali&rsquo;s style in favor of a more realistic forest, a shift the negative prompts then reinforce. With the original prompt alone, however, the negative prompts further emphasize Dali&rsquo;s artistic style. This is a good example of positive prompt additions not being a strictly good thing.</p>
<p>Can negative prompts help create yummy AI-generated food <a href="https://minimaxir.com/2022/07/food-photography-ai/">like DALL-E 2 can</a>? Let&rsquo;s see if it can make a hamburger:</p>
<figure>

    <img loading="lazy" srcset="/2022/11/stable-diffusion-negative-prompt/a%20delicious%20hamburger_seed_19683@0.5x_hu_5c4051c13f36bb64.webp 320w,/2022/11/stable-diffusion-negative-prompt/a%20delicious%20hamburger_seed_19683@0.5x_hu_c40b7e6eb245ef1a.webp 768w,/2022/11/stable-diffusion-negative-prompt/a%20delicious%20hamburger_seed_19683@0.5x.png 768w" src="a%20delicious%20hamburger_seed_19683@0.5x.png"
         alt="base prompt: a delicious hamburger, seed: 19683, via Stable Diffusion 2.0"/> <figcaption>
            <p>base prompt: <code>a delicious hamburger</code>, seed: 19683, via Stable Diffusion 2.0</p>
        </figcaption>
</figure>

<p>This one is a pretty unambiguous case of negative prompts helping out the final result; the output using both tokens is pretty close to DALL-E 2 quality!</p>
<p>Another interesting thing about Stable Diffusion 2.0 is that text renders better: small text is still not fully legible, but large text is more discernible. Perhaps Stable Diffusion 2.0 can envision a <a href="https://www.nytimes.com">New York Times</a> front page depicting the rise of robot overlords.</p>
<figure>

    <img loading="lazy" srcset="/2022/11/stable-diffusion-negative-prompt/evil_robot_hu_7352b61e7a1767da.webp 320w,/2022/11/stable-diffusion-negative-prompt/evil_robot_hu_707a7c69d289b4f2.webp 768w,/2022/11/stable-diffusion-negative-prompt/evil_robot.png 768w" src="evil_robot.png"
         alt="base prompt: an evil robot on the front page of the New York Times, seed: 19683, via Stable Diffusion 2.0"/> <figcaption>
            <p>base prompt: <code>an evil robot on the front page of the New York Times</code>, seed: 19683, via Stable Diffusion 2.0</p>
        </figcaption>
</figure>

<p>There&rsquo;s a surprising amount of evil robot variety despite the fixed latent inputs, and the layouts of the newspaper are very accurate to the NYT. The especially weird negative-prompt-text-only image is an example of a surprisingly rare mode collapse, which is interesting (or it&rsquo;s Stable Diffusion <em>hiding something</em>). Although the robot from the original prompt is clearly the most evil.</p>
<p>We can also investigate how negative prompts can help the rendering of human subjects. Let&rsquo;s take <a href="https://www.taylorswift.com">Taylor Swift</a>. What happens when she becomes President Taylor Swift? (hopefully Stable Diffusion doesn&rsquo;t confuse her with the other <a href="https://en.wikipedia.org/wiki/Zachary_Taylor">President Taylor</a>)</p>
<figure>

    <img loading="lazy" srcset="/2022/11/stable-diffusion-negative-prompt/tay_hu_32220bf617546cdb.webp 320w,/2022/11/stable-diffusion-negative-prompt/tay_hu_769f4c6f3a18a3bb.webp 768w,/2022/11/stable-diffusion-negative-prompt/tay.png 768w" src="tay.png"
         alt="base prompt: President Taylor Swift giving her presidential inauguration speech, seed: 6561, via Stable Diffusion 2.0"/> <figcaption>
            <p>base prompt: <code>President Taylor Swift giving her presidential inauguration speech</code>, seed: 6561, via Stable Diffusion 2.0</p>
        </figcaption>
</figure>

<p>Surprisingly, both types of positive prompt additions make the initial output unambiguously worse. But the negative prompts fix them and, again, give President Tay some nice wardrobe variety. It&rsquo;s worth noting that Stable Diffusion 2.0 is better at generating correct hands than SD 1.X&hellip;just don&rsquo;t look at them too closely.</p>
<p>Lastly, we can&rsquo;t forget about <a href="https://knowyourmeme.com/memes/ugly-sonic">Ugly Sonic</a>, the initial hedgehog from the Sonic Movie who was the subject of my <a href="https://minimaxir.com/2022/09/stable-diffusion-ugly-sonic/">previous Stable Diffusion blog post</a>. I received many complaints that the AI-generated Ugly Sonic wasn&rsquo;t really Ugly Sonic because the generated Ugly Sonics didn&rsquo;t have human teeth! Time to fix that!</p>
<figure>

    <img loading="lazy" srcset="/2022/11/stable-diffusion-negative-prompt/-ugly-sonic-%20smiling%20with%20human%20teeth_seed_6561@0.5x_hu_5e18ec25e73bdbbe.webp 320w,/2022/11/stable-diffusion-negative-prompt/-ugly-sonic-%20smiling%20with%20human%20teeth_seed_6561@0.5x_hu_63078f24d4f14b46.webp 768w,/2022/11/stable-diffusion-negative-prompt/-ugly-sonic-%20smiling%20with%20human%20teeth_seed_6561@0.5x.png 768w" src="-ugly-sonic-%20smiling%20with%20human%20teeth_seed_6561@0.5x.png"
         alt="base prompt: &lt;ugly-sonic&gt; smiling with human teeth, seed: 6561, via Stable Diffusion 2.0"/> <figcaption>
            <p>base prompt: <code>&lt;ugly-sonic&gt; smiling with human teeth</code>, seed: 6561, via Stable Diffusion 2.0</p>
        </figcaption>
</figure>

<p>In this case, the negative prompts <em>ruined</em> Ugly Sonic because they progressively remove his human teeth!</p>
<h2 id="conclusion">Conclusion</h2>
<p>As always with AI art, your mileage will vary, but negative prompting will be a much more important tool in AI image generation going forward; anchoring on prompt engineering strategies that worked in the past is a mistake. It also provides a good opportunity to stop using living artists as a prompt engineering crutch, since that may not be possible moving forward, which is a good thing for the industry (especially given <a href="https://www.theverge.com/23444685/generative-ai-copyright-infringement-legal-fair-use-training-data">legal uncertainty</a>!).</p>
<p>All my code used to generate the images for this article is available <a href="https://github.com/minimaxir/stable-diffusion-negative-prompt">in this GitHub repository</a>, including a <a href="https://colab.research.google.com/github/minimaxir/stable-diffusion-negative-prompt/blob/main/sd_2_0_base.ipynb">Colab Notebook</a> for general generation with the <code>&lt;wrong&gt;</code> token and a <a href="https://colab.research.google.com/github/minimaxir/stable-diffusion-negative-prompt/blob/main/sd_2_0_grid_3x3.ipynb">Colab Notebook</a> for the 3x3 labeled grid images, with easily tweakable prompt inputs if you want to run your own experiments.</p>
<p>It would be interesting to see if it&rsquo;s possible to finetune Stable Diffusion 2.0 such that it gains an &ldquo;intrinsic&rdquo; negative prompt without having to manually specify it&hellip;which might be happening sooner than you think. 😉</p>
<hr>
<p><em>Disclosure: I am neither an artist nor an expert in art theory. All my comments on what are &ldquo;good&rdquo; AI art generations are my own (likely bad) opinions.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>How to Generate Customized AI Art Using VQGAN and CLIP</title>
      <link>https://minimaxir.com/2021/08/vqgan-clip/</link>
      <pubDate>Wed, 18 Aug 2021 08:45:00 -0700</pubDate>
      <guid>https://minimaxir.com/2021/08/vqgan-clip/</guid>
      <description>Knowing how AI art is made is the key to making even better AI art.</description>
      <content:encoded><![CDATA[<style>pre code { white-space: pre; }</style>
<p>The latest and greatest AI content generation trend is AI-generated art. In January 2021, <a href="https://openai.com/">OpenAI</a> demoed <a href="https://openai.com/blog/dall-e/">DALL-E</a>, a GPT-3 variant which creates images instead of text. Because it creates images in response to an arbitrary text prompt, it allows for some very fun output.</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/avocado_hu_a758e21fc220789.webp 320w,/2021/08/vqgan-clip/avocado_hu_b17b8218450473b0.webp 768w,/2021/08/vqgan-clip/avocado_hu_f18c1c7ad2c98eac.webp 1024w,/2021/08/vqgan-clip/avocado.png 1632w" src="avocado.png"
         alt="DALL-E demo, via OpenAI."/> <figcaption>
            <p>DALL-E demo, <a href="https://openai.com/blog/dall-e/">via OpenAI</a>.</p>
        </figcaption>
</figure>

<p>However, the generated images are not always coherent, so OpenAI also demoed <a href="https://openai.com/blog/clip/">CLIP</a>, which can be used to translate an image into text and therefore identify which generated images were actually avocado armchairs. CLIP was then <a href="https://github.com/openai/CLIP">open-sourced</a>, although DALL-E was not.</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/guacamole_hu_af13ecf1e14e0f91.webp 320w,/2021/08/vqgan-clip/guacamole_hu_af49e75a81bb35b1.webp 768w,/2021/08/vqgan-clip/guacamole_hu_72b24e07c3f1faad.webp 1024w,/2021/08/vqgan-clip/guacamole.png 1198w" src="guacamole.png"
         alt="CLIP demo, via OpenAI."/> <figcaption>
            <p>CLIP demo, <a href="https://openai.com/blog/clip/">via OpenAI</a>.</p>
        </figcaption>
</figure>

<p>Since CLIP is essentially an interface between representations of text and image data, clever hacking can allow anyone to create their own pseudo-DALL-E. The first implementation was <a href="https://github.com/lucidrains/big-sleep">Big Sleep</a> by Ryan Murdock/<a href="https://twitter.com/advadnoun">@advadnoun</a>, which combined CLIP with an image-generating <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a> named <a href="https://arxiv.org/abs/1809.11096">BigGAN</a>. Then open source worked its magic: the GAN base was changed to <a href="https://github.com/CompVis/taming-transformers">VQGAN</a>, a newer model architecture by Patrick Esser, Robin Rombach, and Björn Ommer which allows more coherent image generation. The core CLIP-guided training was improved and translated <a href="https://colab.research.google.com/drive/1L8oL-vLJXVcRzCFbPwOoMkPKJ8-aYdPN">to a Colab Notebook</a> by Katherine Crowson/<a href="https://twitter.com/RiversHaveWings">@RiversHaveWings</a> and others in a special Discord server. Twitter accounts like <a href="https://twitter.com/images_ai">@images_ai</a> and <a href="https://twitter.com/ai_curio">@ai_curio</a>, which leverage VQGAN + CLIP with user-submitted prompts, have gone viral and <a href="https://www.newyorker.com/culture/infinite-scroll/appreciating-the-poetic-misunderstandings-of-ai-art">received mainstream press</a>. <a href="https://twitter.com/ak92501">@ak92501</a> <a href="https://twitter.com/ak92501/status/1421246864649773058">created</a> a <a href="https://colab.research.google.com/drive/1Foi0mCSE6NrW9oI3Fhni7158Krz4ZXdH?usp=sharing">fork of that Notebook</a> with a user-friendly UI, through which I became aware of how far AI image generation had developed in just a few months.</p>
<p>From that, I forked <a href="https://colab.research.google.com/drive/1wkF67ThUz37T2_oPIuSwuO4e_-0vjaLs?usp=sharing">my own Colab Notebook</a> and streamlined the UI a bit to minimize the number of clicks needed to start generating, and to make it more mobile-friendly.</p>
<p>The VQGAN + CLIP technology is now in a good state such that it can be used for more serious experimentation. Some say art is better when there&rsquo;s mystery, but my view is that knowing how AI art is made is the key to making even better AI art.</p>
<h2 id="a-hello-world-to-ai-generated-art">A Hello World to AI Generated Art</h2>
<p><em>All AI-generated image examples in this blog post are generated using <a href="https://colab.research.google.com/drive/1wkF67ThUz37T2_oPIuSwuO4e_-0vjaLs?usp=sharing">this Colab Notebook</a>, with the captions indicating the text prompt and other relevant deviations from the default inputs to reproduce the image.</em></p>
<p>Let&rsquo;s jump right into it with something fantastical: how well can AI generate a cyberpunk forest?</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/cyberpunk_forest_hu_4ba90fadcee22967.webp 320w,/2021/08/vqgan-clip/cyberpunk_forest.png 592w" src="cyberpunk_forest.png"
         alt="cyberpunk forest"/> <figcaption>
            <p><code>cyberpunk forest</code></p>
        </figcaption>
</figure>

<p>The TL;DR of how VQGAN + CLIP works is that VQGAN generates an image, CLIP scores the image according to how well it can detect the input prompt, and VQGAN uses that information to iteratively improve its image generation. Lj Miranda has a <a href="https://ljvmiranda921.github.io/notebook/2021/08/08/clip-vqgan/">good detailed technical writeup</a>.</p>
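<p>A toy analogue of that feedback loop can be written in a few lines. All of the functions below are hypothetical stand-ins for the real models: the &ldquo;image&rdquo; is a single number, <code>clip_score</code> is a made-up similarity that peaks at a target, and the loop ascends the score by nudging the latent with a numeric gradient:</p>
<pre><code class="language-python">def clip_score(image):
    # Toy stand-in for CLIP: "similarity" peaks when image == 4.0.
    return -(image - 4.0) ** 2

def generate(latent):
    # Toy stand-in for VQGAN: maps a latent to an "image" (a number).
    return 2.0 * latent

def optimize(latent=0.0, lr=0.05, steps=200, eps=1e-5):
    for _ in range(steps):
        # Central-difference gradient of the score w.r.t. the latent.
        g = (clip_score(generate(latent + eps)) -
             clip_score(generate(latent - eps))) / (2 * eps)
        latent += lr * g  # ascend the score, as VQGAN+CLIP iterates
    return latent

final = optimize()
assert abs(generate(final) - 4.0) < 1e-3  # converged to the "prompt"
</code></pre>
<p>The real system does the same thing at scale: CLIP&rsquo;s similarity is differentiable, so the gradient flows back into VQGAN&rsquo;s latent directly instead of being estimated numerically.</p>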
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/clip_vqgan_with_image_hu_5e04a615f6af5ff.webp 320w,/2021/08/vqgan-clip/clip_vqgan_with_image_hu_7546c61d3cb746e.webp 768w,/2021/08/vqgan-clip/clip_vqgan_with_image_hu_d4c317842e36f301.webp 1024w,/2021/08/vqgan-clip/clip_vqgan_with_image.png 1067w" src="clip_vqgan_with_image.png"
         alt="via Lj Miranda. Modified for theme friendliness."/> <figcaption>
            <p><a href="https://ljvmiranda921.github.io/notebook/2021/08/08/clip-vqgan/">via Lj Miranda</a>. Modified for theme friendliness.</p>
        </figcaption>
</figure>

<p>Now let&rsquo;s do the same prompt as before, but with an added artist from a time well before the cyberpunk genre existed, to see if the AI can follow their style. Let&rsquo;s try <a href="https://www.wikiart.org/en/salvador-dali">Salvador Dali</a>.</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/cyberpunk_forest_by_salvador_dali_hu_3ad61193478875b7.webp 320w,/2021/08/vqgan-clip/cyberpunk_forest_by_salvador_dali.png 592w" src="cyberpunk_forest_by_salvador_dali.png"
         alt="cyberpunk forest by Salvador Dali"/> <figcaption>
            <p><code>cyberpunk forest by Salvador Dali</code></p>
        </figcaption>
</figure>

<p>It&rsquo;s definitely a cyberpunk forest, and it&rsquo;s definitely Dali&rsquo;s style.</p>
<p>One trick the community found to improve generated image quality is to simply add phrases that tell the AI to make a <em>good</em> image, such as <code>artstationHQ</code> or <code>trending on /r/art</code>. Trying that here:</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/cyberpunk_forest_by_salvador_dali_artstationhq_hu_e9392ca8f1eb7213.webp 320w,/2021/08/vqgan-clip/cyberpunk_forest_by_salvador_dali_artstationhq.png 592w" src="cyberpunk_forest_by_salvador_dali_artstationhq.png"
         alt="cyberpunk forest by Salvador Dali artstationHQ"/> <figcaption>
            <p><code>cyberpunk forest by Salvador Dali artstationHQ</code></p>
        </figcaption>
</figure>

<p>In this case, it&rsquo;s unclear if the <code>artstationHQ</code> part of the prompt gets higher priority than the <code>Salvador Dali</code> part. Another trick that VQGAN + CLIP can do is take multiple input text prompts, which adds more control. Additionally, you can assign weights to these different prompts. So if we did <code>cyberpunk forest by Salvador Dali:3 | artstationHQ</code>, the model will try three times as hard to make the output follow a Dali painting as it will to match <code>artstationHQ</code>.</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/cyberpunk_forest_by_salvador_dali_3_artstationhq_hu_948ad338bfc41f2.webp 320w,/2021/08/vqgan-clip/cyberpunk_forest_by_salvador_dali_3_artstationhq.png 592w" src="cyberpunk_forest_by_salvador_dali_3_artstationhq.png"
         alt="cyberpunk forest by Salvador Dali:3 | artstationHQ"/> <figcaption>
            <p><code>cyberpunk forest by Salvador Dali:3 | artstationHQ</code></p>
        </figcaption>
</figure>

<p>Much better! Lastly, we can use negative weights for prompts such that the model targets the opposite of that prompt. Let&rsquo;s do the opposite of <code>green and white</code> to see if the AI tries to remove those two colors from the palette and maybe make the final image more cyberpunky.</p>
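<p>Under the hood, these weighted prompts amount to a weighted sum of per-prompt CLIP similarities, with a negative weight pushing the image <em>away</em> from that prompt. A minimal sketch (the similarity values are made up purely for illustration):</p>
<pre><code class="language-python">def combined_score(similarities, weights):
    """Weighted sum of per-prompt CLIP similarities; a negative
    weight penalizes resemblance to that prompt."""
    assert len(similarities) == len(weights)
    return sum(w * s for w, s in zip(weights, similarities))

# "cyberpunk forest by Salvador Dali:3 | artstationHQ | green and white:-1"
sims = [0.30, 0.25, 0.40]          # hypothetical CLIP similarities
score = combined_score(sims, [3.0, 1.0, -1.0])
assert abs(score - 0.75) < 1e-9    # 3*0.30 + 1*0.25 - 1*0.40
</code></pre>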
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/cyberpunk_forest_by_salvador_dali_3_artstationhq_gw_hu_166c44dff41886a2.webp 320w,/2021/08/vqgan-clip/cyberpunk_forest_by_salvador_dali_3_artstationhq_gw.png 592w" src="cyberpunk_forest_by_salvador_dali_3_artstationhq_gw.png"
         alt="cyberpunk forest by Salvador Dali:3 | artstationHQ | green and white:-1"/> <figcaption>
            <p><code>cyberpunk forest by Salvador Dali:3 | artstationHQ | green and white:-1</code></p>
        </figcaption>
</figure>

<p>Now we&rsquo;re getting to video game concept art quality generation. Indeed, VQGAN + CLIP rewards the use of clever input prompt engineering.</p>
<h2 id="initial-images-and-style-transfer">Initial Images and Style Transfer</h2>
<p>Normally with VQGAN + CLIP, the generation starts from a blank slate. However, you can optionally provide an image to start from instead. This both provides a good base for generation and speeds it up, since the model doesn&rsquo;t have to learn from empty noise. I usually recommend a lower learning rate as a result.</p>
<p>So let&rsquo;s try an initial image of myself, naturally.</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/max_hu_458a2426943e0485.webp 320w,/2021/08/vqgan-clip/max.png 600w" src="max.png"
         alt="No, I am not an AI Generated person. Hopefully."/> <figcaption>
            <p>No, I am not an AI Generated person. Hopefully.</p>
        </figcaption>
</figure>

<p>Let&rsquo;s try another artist, such as <a href="https://en.wikipedia.org/wiki/Junji_Ito">Junji Ito</a>, who has a very distinctive horror <a href="https://www.google.com/search?q=junji&#43;ito&#43;images">style of art</a>.</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/max_junji_ito_hu_d11d0c8ed7eb69af.webp 320w,/2021/08/vqgan-clip/max_junji_ito.png 592w" src="max_junji_ito.png"
         alt="a black and white portrait by Junji Ito — initial image above, learning rate = 0.1"/> <figcaption>
            <p><code>a black and white portrait by Junji Ito</code> — initial image above, learning rate = 0.1</p>
        </figcaption>
</figure>

<p>One of the earliest promising use cases of AI Image Generation was <a href="https://www.tensorflow.org/tutorials/generative/style_transfer">neural style transfer</a>, where an AI could take the &ldquo;style&rdquo; of one image and transpose it to another. Can it follow the style of a specific painting, such as <a href="https://www.vangoghgallery.com/painting/starry-night.html">Starry Night</a> by <a href="https://en.wikipedia.org/wiki/Vincent_van_Gogh">Vincent Van Gogh</a>?</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/max_starry_night_hu_79b050ddfd750a23.webp 320w,/2021/08/vqgan-clip/max_starry_night.png 592w" src="max_starry_night.png"
         alt="Starry Night by Vincent Van Gogh — initial image above, learning rate = 0.1"/> <figcaption>
            <p><code>Starry Night by Vincent Van Gogh</code> — initial image above, learning rate = 0.1</p>
        </figcaption>
</figure>

<p>Well, it got the colors and style, but the AI appears to have taken the &ldquo;Van Gogh&rdquo; part literally and gave me a nice beard.</p>
<p>Of course, with the power of AI, you can do both prompts at the same time for maximum chaos.</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/max_junji_ito_starry_night_hu_836dac4f5598d721.webp 320w,/2021/08/vqgan-clip/max_junji_ito_starry_night.png 592w" src="max_junji_ito_starry_night.png"
         alt="Starry Night by Vincent Van Gogh | a black and white portrait by Junji Ito — initial image above, learning rate = 0.1"/> <figcaption>
            <p><code>Starry Night by Vincent Van Gogh | a black and white portrait by Junji Ito</code> — initial image above, learning rate = 0.1</p>
        </figcaption>
</figure>

<h2 id="icons-and-generating-images-with-a-specific-shape">Icons and Generating Images With A Specific Shape</h2>
<p>While I was first experimenting with VQGAN + CLIP, I saw <a href="https://twitter.com/mark_riedl/status/1421282588791132161">an interesting tweet</a> by AI researcher Mark Riedl:</p>
<blockquote class="twitter-tweet">
  <a href="https://twitter.com/mark_riedl/status/1421282588791132161"></a>
</blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

<p>Intrigued, I adapted some icon generation code I had handy <a href="https://github.com/minimaxir/stylecloud">from another project</a> and created <a href="https://github.com/minimaxir/icon-image">icon-image</a>, a Python tool to programmatically generate an icon using <a href="https://fontawesome.com/">Font Awesome</a> icons and paste it onto a noisy background.</p>
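<p>The core pasting-onto-noise idea can be sketched with stdlib Python alone (icon-image itself rasterizes Font Awesome glyphs with PIL; the functions below are simplified stand-ins, representing the image as a 2D list of grayscale values):</p>
<pre><code class="language-python">import random

def noisy_background(w, h, seed=0):
    """Random grayscale noise for the AI to shape during generation."""
    rng = random.Random(seed)
    return [[rng.randint(0, 255) for _ in range(w)] for _ in range(h)]

def paste_icon(bg, mask, value=255):
    """Overwrite background pixels wherever the icon mask is set."""
    out = [row[:] for row in bg]
    for y, row in enumerate(mask):
        for x, on in enumerate(row):
            if on:
                out[y][x] = value
    return out

bg = noisy_background(4, 2, seed=1)
mask = [[1, 0, 0, 0], [0, 0, 0, 1]]  # a tiny stand-in "icon"
img = paste_icon(bg, mask)
assert img[0][0] == 255 and img[1][3] == 255  # icon pixels pasted
assert img[0][1] == bg[0][1]                  # noise preserved elsewhere
</code></pre>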
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/icon_robot_hu_7a372246756b89fb.webp 320w,/2021/08/vqgan-clip/icon_robot.png 600w" src="icon_robot.png"
         alt="The default icon image used in the Colab Notebook"/> <figcaption>
            <p>The default icon image used in the Colab Notebook</p>
        </figcaption>
</figure>

<p>This icon can be used as an initial image, as above. Adjusting the text prompt to accommodate the icon can result in very cool images, such as <code>a black and white evil robot by Junji Ito</code>.</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/robot_junji_ito_hu_dccb3a3ced294446.webp 320w,/2021/08/vqgan-clip/robot_junji_ito.png 592w" src="robot_junji_ito.png"
         alt="a black and white evil robot by Junji Ito — initial image above, learning rate = 0.1"/> <figcaption>
            <p><code>a black and white evil robot by Junji Ito</code> — initial image above, learning rate = 0.1</p>
        </figcaption>
</figure>

<p>The background and icon noise is the key, as the AI can shape noise much better than solid colors. Omitting the noise results in a more boring image that doesn&rsquo;t reflect the prompt as well, although it has its own style.</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/robot_junji_ito_nonoise_hu_1fd3e72d34e39b97.webp 320w,/2021/08/vqgan-clip/robot_junji_ito_nonoise.png 592w" src="robot_junji_ito_nonoise.png"
         alt="a black and white evil robot by Junji Ito — initial image above except 1.0 icon opacity and 0.0 background noise opacity, learning rate = 0.1"/> <figcaption>
            <p><code>a black and white evil robot by Junji Ito</code> — initial image above except 1.0 icon opacity and 0.0 background noise opacity, learning rate = 0.1</p>
        </figcaption>
</figure>

<p>Another fun prompt addition is <code>rendered in unreal engine</code> (with an optional <code>high quality</code>), which instructs the AI to create a three-dimensional image and works especially well with icons.</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/robot_unreal_hu_b4d3e0483b500717.webp 320w,/2021/08/vqgan-clip/robot_unreal.png 592w" src="robot_unreal.png"
         alt="smiling rusted robot rendered in unreal engine high quality — icon initial image, learning rate = 0.1"/> <figcaption>
            <p><code>smiling rusted robot rendered in unreal engine high quality</code> — icon initial image, learning rate = 0.1</p>
        </figcaption>
</figure>

<p>icon-image can also generate brand images, such as the <a href="https://twitter.com/">Twitter</a> logo, which can be good for comedy, especially if you tweak the logo/background colors as well. What if we turn the Twitter logo into <a href="https://www.google.com/search?q=mordor&#43;images">Mordor</a>, which is a fair metaphor?</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/twitter_mordor_hu_5dc5a61efb797269.webp 320w,/2021/08/vqgan-clip/twitter_mordor.png 592w" src="twitter_mordor.png"
         alt="Mordor — fab fa-twitter icon, icon initial image, black icon background, red icon, learning rate = 0.1"/> <figcaption>
            <p><code>Mordor</code> — <code>fab fa-twitter</code> icon, icon initial image, black icon background, red icon, learning rate = 0.1</p>
        </figcaption>
</figure>

<p>So that didn&rsquo;t turn out well, as the Twitter logo got overpowered by the prompt (you can see outlines of the logo&rsquo;s bottom). However, there&rsquo;s a trick to force the AI to respect the logo: set the icon as the initial image <em>and</em> the target image, and apply a high weight to the prompt (the weight can be lowered iteratively to preserve the logo better).</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/twitter_mordor_2_hu_c8d00364084e21bc.webp 320w,/2021/08/vqgan-clip/twitter_mordor_2.png 592w" src="twitter_mordor_2.png"
         alt="Mordor:3 — fab fa-twitter icon, icon initial image, icon target image, black icon background, red icon, learning rate = 0.1"/> <figcaption>
            <p><code>Mordor:3</code> — <code>fab fa-twitter</code> icon, icon initial image, icon target image, black icon background, red icon, learning rate = 0.1</p>
        </figcaption>
</figure>

<h2 id="more-fun-examples">More Fun Examples</h2>
<p>Here&rsquo;s a few more good demos of what VQGAN + CLIP can do using the ideas and tricks above:</p>
<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/excel_hu_82c869ac7653bce0.webp 320w,/2021/08/vqgan-clip/excel.png 592w" src="excel.png"
         alt="Microsoft Excel by Junji Ito — 500 steps"/> <figcaption>
            <p><code>Microsoft Excel by Junji Ito</code> — 500 steps</p>
        </figcaption>
</figure>

<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/zuck_hu_fe52aa1dc05ed15e.webp 320w,/2021/08/vqgan-clip/zuck.png 592w" src="zuck.png"
         alt="a portrait of Mark Zuckerberg:2 | a portrait of a bottle of Sweet Baby Ray&#39;s barbecue sauce — 500 steps"/> <figcaption>
            <p><code>a portrait of Mark Zuckerberg:2 | a portrait of a bottle of Sweet Baby Ray's barbecue sauce</code> — 500 steps</p>
        </figcaption>
</figure>

<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/rickroll_hu_6912b4de0321d7e3.webp 320w,/2021/08/vqgan-clip/rickroll.png 592w" src="rickroll.png"
         alt="Never gonna give you up, Never gonna let you down — 500 steps"/> <figcaption>
            <p><code>Never gonna give you up, Never gonna let you down</code> — 500 steps</p>
        </figcaption>
</figure>

<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/elon_hu_d4768f4d846c8269.webp 320w,/2021/08/vqgan-clip/elon.png 592w" src="elon.png"
         alt="a portrait of cyberpunk Elon Musk:2 | a human:-1 — 500 steps"/> <figcaption>
            <p><code>a portrait of cyberpunk Elon Musk:2 | a human:-1</code> — 500 steps</p>
        </figcaption>
</figure>

<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/hamburger_hu_cd523739402fd119.webp 320w,/2021/08/vqgan-clip/hamburger.png 592w" src="hamburger.png"
         alt="hamburger of the Old Gods:5 — fas fa-hamburger icon, icon initial image, icon target image, black icon background, white icon, learning rate = 0.1, 500 steps"/> <figcaption>
            <p><code>hamburger of the Old Gods:5</code> — <code>fas fa-hamburger</code> icon, icon initial image, icon target image, black icon background, white icon, learning rate = 0.1, 500 steps</p>
        </figcaption>
</figure>

<figure>

    <img loading="lazy" srcset="/2021/08/vqgan-clip/reality_hu_288a071a0017cc2.webp 320w,/2021/08/vqgan-clip/reality.png 592w" src="reality.png"
         alt="reality is an illusion:8 — fas fa-eye icon, icon initial image, icon target image, black icon background, white icon, learning rate = 0.1"/> <figcaption>
            <p><code>reality is an illusion:8</code> — <code>fas fa-eye</code> icon, icon initial image, icon target image, black icon background, white icon, learning rate = 0.1</p>
        </figcaption>
</figure>

<p>@kingdomakrillic <a href="https://imgur.com/a/SnSIQRu">released an album</a> with <em>many</em> more examples of prompt augmentations and their results.</p>
<h2 id="making-money-off-of-vqgan--clip">Making Money Off of VQGAN + CLIP</h2>
<p>Can these AI-generated images be commercialized as <a href="https://en.wikipedia.org/wiki/Software_as_a_service">software-as-a-service</a>? It&rsquo;s unclear. In contrast to <a href="https://github.com/NVlabs/stylegan2">StyleGAN2</a> images (where the <a href="https://nvlabs.github.io/stylegan2/license.html">license</a> is explicitly noncommercial), all aspects of the VQGAN + CLIP pipeline are MIT Licensed, which does support commercialization. However, the ImageNet 16384 VQGAN used in this Colab Notebook and many other VQGAN+CLIP Notebooks was trained on <a href="https://www.image-net.org/">ImageNet</a>, which has <a href="https://www.reddit.com/r/MachineLearning/comments/id4394/d_is_it_legal_to_use_models_pretrained_on/">famously complicated licensing</a>, and whether finetuning the VQGAN counts as sufficiently detached from an IP perspective hasn&rsquo;t been legally tested to my knowledge. There are other VQGANs available, such as ones trained on the <a href="https://opensource.google/projects/open-images-dataset">Open Images Dataset</a> or <a href="https://cocodataset.org/">COCO</a>, both of which have commercial-friendly <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY-4.0</a> licenses, although in my testing they had substantially lower image generation quality.</p>
<p>Granted, the biggest blocker to making money off of VQGAN + CLIP in a scalable manner is generation speed; unlike most commercial AI models, which only run inference and can therefore be optimized to drastically increase performance, VQGAN + CLIP requires training, which is much slower and can&rsquo;t generate content in real time like <a href="https://openai.com/blog/openai-api/">GPT-3</a>. Even with expensive GPUs and small image sizes, training takes a couple minutes at minimum, which translates to a higher cost per image and annoyed users. It&rsquo;s still cheaper per image than what OpenAI charges for their GPT-3 API, though, and many startups have built on that successfully.</p>
<p>Of course, if you just want to make <a href="https://en.wikipedia.org/wiki/Non-fungible_token">NFTs</a> from manual usage of VQGAN + CLIP, go ahead.</p>
<h2 id="the-next-steps-for-ai-image-generation">The Next Steps for AI Image Generation</h2>
<p>CLIP itself is just the first practical iteration of translating text to images, and I suspect this won&rsquo;t be the last implementation of such a model (OpenAI may pull a GPT-3 and not open-source the inevitable CLIP-2, since there&rsquo;s now a proven monetizable use case).</p>
<p>However, the AI Art Generation industry is developing at a record pace, especially on the image-generating part of the equation. Just the day before this article was posted, Katherine Crowson <a href="https://twitter.com/RiversHaveWings/status/1427580354651586562">released</a> a <a href="https://colab.research.google.com/drive/1QBsaDAZv8np29FPbvjffbE1eytoJcsgA">Colab Notebook</a> for CLIP with Guided Diffusion, which generates <a href="https://twitter.com/RiversHaveWings/status/1427746442727149568">more realistic</a> images (albeit less fantastical), and Tom White <a href="https://twitter.com/dribnet/status/1427613617973653505">released</a> a <a href="https://colab.research.google.com/github/dribnet/clipit/blob/master/demos/PixelDrawer.ipynb">pixel art generating Notebook</a> which doesn&rsquo;t use a VQGAN variant.</p>
<p>The possibilities with just VQGAN + CLIP alone are endless.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Fun and Dystopia With AI-Based Code Generation Using GPT-J-6B</title>
      <link>https://minimaxir.com/2021/06/gpt-j-6b/</link>
      <pubDate>Mon, 14 Jun 2021 08:30:00 -0700</pubDate>
      <guid>https://minimaxir.com/2021/06/gpt-j-6b/</guid>
      <description>At the least, AI-generated code is much more readable than the average human&amp;rsquo;s.</description>
      <content:encoded><![CDATA[<p>Since <a href="https://openai.com/">OpenAI</a> will not open-source the 175 billion parameter <a href="https://beta.openai.com/">GPT-3</a> text generation model, others such as <a href="https://www.eleuther.ai/">EleutherAI</a> are developing their own, by training not-quite-as-large Transformer-based models but still getting impressive results.</p>
<p>The latest large language model is <a href="https://github.com/kingoflolz/mesh-transformer-jax">GPT-J</a>, a 6 billion parameter model by Aran Komatsuzaki and Ben Wang with a roughly similar architecture to GPT-3. They provide a free <a href="https://6b.eleuther.ai/">web demo</a> to try quick prompts, and a <a href="http://colab.research.google.com/github/kingoflolz/mesh-transformer-jax/blob/master/colab_demo.ipynb">Google Colab notebook</a> if you want to test many prompts. The model is so big it requires a <a href="https://cloud.google.com/tpu">TPU</a> to generate text at a reasonable speed!</p>
<p>Running GPT-J against <a href="https://github.com/minimaxir/gpt-3-experiments">my test prompts</a> that I had used to test GPT-3 a year ago <a href="https://twitter.com/minimaxir/status/1402468460681068544">resulted</a> in it qualitatively performing worse than GPT-3 on most of them, unsurprising given its relative size (but still better than GPT-2 1.5B!). The exception is code generation, where GPT-J performed very well and GPT-3 had performed very poorly.</p>
<blockquote class="twitter-tweet">
  <a href="https://twitter.com/minimaxir/status/1402470969378099208"></a>
</blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>This behavior is likely due to GPT-J&rsquo;s training set: it was trained on <a href="https://github.com/EleutherAI/the-pile">The Pile</a>, which weights <a href="https://github.com/">GitHub</a> and <a href="https://stackoverflow.com/">Stack Overflow</a> heavily, versus GPT-3&rsquo;s training set, which consists mostly of the <a href="https://commoncrawl.org/">Common Crawl</a> representation of typical internet content.</p>
<p>After further experimentation with the GPT-J demo, I found that the generated code was often both correct and convoluted: my two favorite things for code to be. And also much more.</p>
<h2 id="ai-generated-code">AI Generated Code</h2>
<p>All generated code examples used in this post have been manually curated and truncated. The raw outputs are available in this <a href="https://github.com/minimaxir/gpt-j-6b-experiments">GitHub repository</a>.</p>
<p>For this post, we&rsquo;ll test generating <a href="https://www.python.org/">Python</a> code, because <del>I don&rsquo;t know any other programming languages</del> the pseudocode nature of Python syntax may make it more friendly to AI text generation by adapting linguistic concepts other than code. Given that most modern text generating AIs are written in Python, it&rsquo;s thematically appropriate too.</p>
<p>To start, here&rsquo;s what I fed as a model prompt to GPT-J, to try and replicate the use case from OpenAI&rsquo;s demo at Build 2020:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_palendrome</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a string is a palindrome&#34;&#34;&#34;</span>
</span></span></code></pre></div><p>Identifying a <a href="https://en.wikipedia.org/wiki/Palindrome">palindrome</a> is a good test case as there&rsquo;s a very intuitive and idiomatic solution with <code>return s == s[::-1]</code>, i.e. check to see if a string is equal to the reverse of itself. This prompt is also a good unintentional test on how well the generation is resilient to a typo in the function name, which I was too lazy to fix.</p>
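<p>For reference, here is a minimal sketch of the idiomatic one-liner the prompt is fishing for (keeping the prompt&rsquo;s intentionally misspelled function name):</p>

```python
def is_palendrome(s):
    """Check whether a string is a palindrome."""
    # s[::-1] is Python's slicing idiom for reversing a sequence.
    return s == s[::-1]

print(is_palendrome("racecar"))     # → True
print(is_palendrome("palindrome"))  # → False
```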
<p>Indeed, many of the GPT-J generations with that prompt recite that simple approach verbatim, but some outputs reach a similar solution with redundant verbosity.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_palendrome</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a string is a palindrome&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="s2">&#34;&#34;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">s</span><span class="p">[::</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span> <span class="o">==</span> <span class="n">s</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_palendrome</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a string is a palindrome&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">length</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">length</span><span class="p">]</span> <span class="o">==</span> <span class="n">s</span><span class="p">[</span><span class="n">length</span><span class="p">::</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_palendrome</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a string is a palindrome&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">n</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">a</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">a</span><span class="o">.</span><span class="n">reverse</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">s</span> <span class="o">==</span> <span class="nb">list</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_palendrome</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a string is a palindrome&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">l</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">split</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">l</span><span class="p">[::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="n">l</span><span class="p">[::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
</span></span></code></pre></div><p>And then you have the implementations which calculate palindromes the hard way (usually by comparing the beginning and end of the string character-by-character) and are not always immediately clear if they are correct or not.</p>
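<p>For comparison with the generations that follow, a correct version of the &ldquo;hard way&rdquo; is a two-pointer loop that walks inward from both ends (a sketch, not a GPT-J output):</p>

```python
def is_palendrome(s):
    """Check whether a string is a palindrome, comparing ends inward."""
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True
```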
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_palendrome</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a string is a palindrome&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="s1">&#39;&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="n">i</span><span class="p">:</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">xrange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">))</span> <span class="o">==</span> <span class="n">s</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_palendrome</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a string is a palindrome&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Make sure string is not empty</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">s</span> <span class="o">==</span> <span class="s2">&#34;&#34;</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Grab length of string</span>
</span></span><span class="line"><span class="cl">    <span class="n">length</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Find last character in the string</span>
</span></span><span class="line"><span class="cl">    <span class="n">lastChar</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">length</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Find first character in the string</span>
</span></span><span class="line"><span class="cl">    <span class="n">firstChar</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># If last character is same as the first character, string is palindrome</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">lastChar</span> <span class="o">==</span> <span class="n">firstChar</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_palendrome</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a string is a palindrome&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">l</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">l</span> <span class="o">%</span> <span class="mi">2</span><span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">l</span> <span class="o">&lt;</span> <span class="mi">3</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">s</span><span class="p">[</span><span class="n">l</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">!=</span> <span class="n">s</span><span class="p">[</span><span class="o">-</span><span class="n">l</span><span class="o">-</span><span class="mi">1</span><span class="p">]:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">l</span><span class="o">/</span><span class="mi">2</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">s</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">!=</span> <span class="n">s</span><span class="p">[</span><span class="n">l</span><span class="o">-</span><span class="n">i</span><span class="p">]:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">True</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_palendrome</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a string is a palindrome&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">is_rev</span><span class="p">(</span><span class="nb">str</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">rstr</span> <span class="o">=</span> <span class="s2">&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nb">str</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="nb">str</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="nb">str</span><span class="p">[</span><span class="n">i</span><span class="p">]:</span>
</span></span><span class="line"><span class="cl">                <span class="n">rstr</span> <span class="o">+=</span> <span class="nb">str</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">            <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="k">break</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="nb">str</span> <span class="o">+</span> <span class="n">rstr</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">is_rev</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="o">==</span> <span class="n">s</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><p>Another good test case is <code>is_even</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">def is_even(i):
</span></span><span class="line"><span class="cl">    &#34;&#34;&#34;Check whether an integer is even&#34;&#34;&#34;
</span></span></code></pre></div><p>Creative implementations of this function are common submissions to <a href="https://www.reddit.com/r/shittyprogramming/">/r/shittyprogramming</a> because it&rsquo;s very, very hard to mess up (the common solution is the one-liner <code>i % 2 == 0</code>). The model almost always outputs the correct solution, and impressively it outputs the alternate solution using bitwise operators (<code>i &amp; 1 == 0</code>).</p>
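<p>Both of the correct solutions mentioned above, side by side (the bitwise variant works because even numbers always have a low bit of 0):</p>

```python
def is_even(i):
    """Check whether an integer is even."""
    return i % 2 == 0

def is_even_bitwise(i):
    """Same check via the low bit: even numbers end in binary 0."""
    return i & 1 == 0

print(is_even(4), is_even_bitwise(4))  # → True True
print(is_even(7), is_even_bitwise(7))  # → False False
```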
<p>So I tried to tweak the prompt and make it create a <em>deliberately inefficient</em> <code>is_even</code> function.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">def is_even(i):
</span></span><span class="line"><span class="cl">    &#34;&#34;&#34;Check whether an integer is even in factorial O(n!) time&#34;&#34;&#34;
</span></span></code></pre></div><p>Factorial time is the holy grail of unoptimized code. When I added that line to the prompt, the code quality dropped drastically.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_even</span><span class="p">(</span><span class="n">i</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether an integer is even in factorial O(n!) time&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">i</span> <span class="o">%</span> <span class="n">e</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">True</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_even</span><span class="p">(</span><span class="n">i</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether an integer is even in factorial O(n!) time&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">i</span> <span class="o">==</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="mi">2</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_even</span><span class="p">(</span><span class="n">i</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether an integer is even in factorial O(n!) time&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">i</span><span class="o">%</span><span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="k">except</span> <span class="ne">TypeError</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_even</span><span class="p">(</span><span class="n">i</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether an integer is even in factorial O(n!) time&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">n</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="k">while</span> <span class="n">n</span> <span class="o">&lt;</span> <span class="n">i</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">i</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">        <span class="n">n</span> <span class="o">+=</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_even</span><span class="p">(</span><span class="n">i</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether an integer is even in factorial O(n!) time&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">a</span> <span class="o">=</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">    <span class="k">while</span> <span class="n">i</span><span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">a</span> <span class="o">*=</span> <span class="n">i</span>
</span></span><span class="line"><span class="cl">        <span class="n">i</span> <span class="o">-=</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">i</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_even</span><span class="p">(</span><span class="n">i</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether an integer is even in factorial O(n!) time&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Is either odd or multiple of 2:  0, 2, 4, 6,... &#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">2</span><span class="p">:</span> <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">i</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">i</span><span class="o">%</span><span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span> <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">50</span><span class="p">:</span> <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">250</span><span class="p">:</span> <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">750</span><span class="p">:</span> <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">True</span>
</span></span></code></pre></div><p>These implementations still aren&rsquo;t in factorial time (which would have been impressive), but it&rsquo;s good to know how to break AIs if they become sentient.</p>
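<p>For the record, a genuinely factorial-time implementation is easy to construct by hand: enumerate all <em>i</em>! orderings of <em>i</em> items before answering with the ordinary constant-time check. A sketch of what GPT-J could have written, but didn&rsquo;t:</p>

```python
from itertools import permutations

def is_even(i):
    """Check whether an integer is even in factorial O(n!) time."""
    # Burn i! iterations enumerating every ordering of i items...
    for _ in permutations(range(i)):
        pass
    # ...then answer with the ordinary constant-time check.
    return i % 2 == 0
```

Keep the inputs small; at i = 12 the loop already runs ~479 million iterations.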
<h2 id="ai-vs-memes">AI vs. Memes</h2>
<p>You&rsquo;ve likely seen the memetic <a href="https://xkcd.com/">XKCD</a> comic about random number generation.</p>
<figure>

    <img loading="lazy" srcset="/2021/06/gpt-j-6b/random_number_hu_afe93aba3db7836.webp 320w,/2021/06/gpt-j-6b/random_number.png 400w" src="random_number.png"
         alt="Random Number"/> <figcaption>
            <p><a href="https://xkcd.com/221/">Random Number</a></p>
        </figcaption>
</figure>

<p>Can the AI do the same? I prompted with:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">def get_random_number():
</span></span><span class="line"><span class="cl">    &#34;&#34;&#34;Returns a number guaranteed to be random&#34;&#34;&#34;
</span></span></code></pre></div><p>And it does generate implementations utilizing Python&rsquo;s <code>random</code> library, with varying degrees of &ldquo;random&rdquo;.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">get_random_number</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Returns a number guaranteed to be random&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">number</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">number</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">get_random_number</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Returns a number guaranteed to be random&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">number</span> <span class="o">=</span> <span class="n">hashlib</span><span class="o">.</span><span class="n">sha1</span><span class="p">(</span><span class="sa">b</span><span class="s1">&#39;&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">hexdigest</span><span class="p">()[:</span><span class="mi">6</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="n">number</span> <span class="o">+=</span> <span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3628</span><span class="p">)</span> <span class="o">*</span> <span class="n">math</span><span class="o">.</span><span class="n">pow</span><span class="p">(</span><span class="mi">256</span><span class="p">,</span> <span class="p">(</span><span class="mi">6</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">number</span><span class="p">))))</span><span class="o">.</span><span class="n">to_bytes</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="s1">&#39;little&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">number</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">get_random_number</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Returns a number guaranteed to be random&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">rand_float</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">rand_float</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">rand_float</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">rand_float</span> <span class="o">&gt;</span> <span class="mf">0.9</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">rand_float</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">rand_float</span> <span class="o">=</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">rand_float</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">get_random_number</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Returns a number guaranteed to be random&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Make a seed based on your system clock, so the same number is produced</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># again and again on the same machine</span>
</span></span><span class="line"><span class="cl">    <span class="n">seed</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="n">seed</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">get_random_number</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Returns a number guaranteed to be random&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="kn">import</span> <span class="nn">random</span>
</span></span><span class="line"><span class="cl">    <span class="kn">import</span> <span class="nn">sys</span>
</span></span><span class="line"><span class="cl">    <span class="n">rng</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">Random</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">rng</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span><span class="o">*</span><span class="mi">1000000</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="k">while</span> <span class="n">rng</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="n">sys</span><span class="o">.</span><span class="n">maxint</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">2000</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">rng</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span><span class="o">*</span><span class="mi">1000000</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">rng</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">2000</span><span class="p">)</span>
</span></span></code></pre></div><p>How about the XKCD bird comic?</p>
<figure>

    <img loading="lazy" srcset="/2021/06/gpt-j-6b/tasks_2x_hu_98513e1aba505a19.webp 320w,/2021/06/gpt-j-6b/tasks_2x.png 533w" src="/2021/06/gpt-j-6b/tasks_2x.png"
         alt="Tasks"/> <figcaption>
            <p><a href="https://xkcd.com/1425/">Tasks</a></p>
        </figcaption>
</figure>

<p>This comic, originally published in 2014, became funny in hindsight: it predated the rise of machine learning frameworks such as <a href="https://www.tensorflow.org/">TensorFlow</a> and of pretrained image models, which together have cut the time needed to develop object detection to far less than five years.</p>
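<p>For context on why the comic aged the way it did: before pretrained models, classic computer vision often reduced detection to primitives like template matching, where a small reference patch is slid across the image and each position is scored by how well it matches. The toy sketch below (binary images as nested lists; all names are illustrative and nothing here comes from the model&#39;s outputs) shows the idea in dependency-free Python:</p>

```python
def match_template(image, template):
    """Naive template matching: slide `template` across `image` (both 2D
    lists of 0/1 pixels) and return the best fraction of matching pixels
    over all positions."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = 0.0
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            # Count pixels where the template agrees with the image patch
            hits = sum(
                image[y + dy][x + dx] == template[dy][dx]
                for dy in range(th)
                for dx in range(tw)
            )
            best = max(best, hits / (th * tw))
    return best
```

<p>Real implementations, such as OpenCV&#39;s <code>matchTemplate</code>, use normalized cross-correlation and vectorized sliding windows rather than nested Python loops, but the principle is the same.</p>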
<p>Translating the comic into a Python function:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">def is_bird(img):
</span></span><span class="line"><span class="cl">    &#34;&#34;&#34;Check whether an image is a bird&#34;&#34;&#34;
</span></span></code></pre></div><p>Impressively, the AI does attempt computer vision techniques, among <em>other</em> heuristics.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_bird</span><span class="p">(</span><span class="n">img</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether an image is a bird&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">name_scope</span><span class="p">(</span><span class="s1">&#39;get_bird&#39;</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">a</span> <span class="o">=</span> <span class="n">vgg16</span><span class="o">.</span><span class="n">select_roi_feat</span><span class="p">(</span><span class="n">img</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">image</span><span class="o">.</span><span class="n">extract_image_patches</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="n">a</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">b</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">256</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">        <span class="n">c</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">256</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">        <span class="n">bird_img</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">subtract</span><span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">bird_img</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_bird</span><span class="p">(</span><span class="n">img</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether an image is a bird&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Load the image and convert it to grayscale</span>
</span></span><span class="line"><span class="cl">    <span class="n">img_gray</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">cvtColor</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">cv2</span><span class="o">.</span><span class="n">COLOR_BGR2GRAY</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Threshold the grayscale image, and create a binary image</span>
</span></span><span class="line"><span class="cl">    <span class="n">lower_gray</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    <span class="n">upper_gray</span> <span class="o">=</span> <span class="mi">255</span>
</span></span><span class="line"><span class="cl">    <span class="n">img_threshed</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">threshold</span><span class="p">(</span><span class="n">img_gray</span><span class="p">,</span> <span class="n">lower_gray</span><span class="p">,</span> <span class="n">upper_gray</span><span class="p">,</span> <span class="n">cv2</span><span class="o">.</span><span class="n">THRESH_BINARY_INV</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Detect the bird via a template matching approach</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Template matching uses the bird template to match it in every</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># frame. If the ratio between bird and background (non-bird)</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># pixels is above some threshold (the value is determined in the</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># function), then the frame is declared as containing a bird.</span>
</span></span><span class="line"><span class="cl">    <span class="n">bird</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">matchTemplate</span><span class="p">(</span><span class="n">img_threshed</span><span class="p">,</span> <span class="n">bird_template</span><span class="p">,</span> <span class="n">cv2</span><span class="o">.</span><span class="n">TM_CCOEFF_NORMED</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">bird</span> <span class="o">=</span> <span class="n">bird</span> <span class="o">/</span> <span class="n">bird</span><span class="o">.</span><span class="n">max</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Display the image with the results of the bird detection</span>
</span></span><span class="line"><span class="cl">    <span class="n">cv2</span><span class="o">.</span><span class="n">imshow</span><span class="p">(</span><span class="s2">&#34;detected birds&#34;</span><span class="p">,</span> <span class="n">bird</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">cv2</span><span class="o">.</span><span class="n">waitKey</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">bird</span><span class="o">.</span><span class="n">any</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mf">0.2</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_bird</span><span class="p">(</span><span class="n">img</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether an image is a bird&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="p">(</span><span class="n">img</span><span class="o">.</span><span class="n">file</span> <span class="o">==</span> <span class="s2">&#34;C:/Users/Joey/Downloads/images/bird_file.jpg&#34;</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_bird</span><span class="p">(</span><span class="n">img</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether an image is a bird&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="kn">import</span> <span class="nn">urllib</span>
</span></span><span class="line"><span class="cl">    <span class="n">url</span> <span class="o">=</span> <span class="s2">&#34;http://vision.caitling.pitt.edu/~cat/flickr/birds/&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">img</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2800</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">page</span> <span class="o">=</span> <span class="n">urllib</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">url</span> <span class="o">+</span> <span class="s2">&#34;##</span><span class="si">%02d</span><span class="s2">&#34;</span> <span class="o">%</span> <span class="n">img</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="n">content</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">page</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">content</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="n">word</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="n">img</span><span class="p">)</span> <span class="o">&gt;</span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_bird</span><span class="p">(</span><span class="n">img</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether an image is a bird&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">birds</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;baboon&#34;</span><span class="p">,</span> <span class="s2">&#34;bird&#34;</span><span class="p">,</span> <span class="s2">&#34;cuckoo&#34;</span><span class="p">,</span> <span class="s2">&#34;dove&#34;</span><span class="p">,</span> <span class="s2">&#34;robin&#34;</span><span class="p">,</span> <span class="s2">&#34;shark&#34;</span><span class="p">,</span> <span class="s2">&#34;turtle&#34;</span><span class="p">,</span> <span class="s2">&#34;winged-wolf&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">letter</span> <span class="ow">in</span> <span class="n">birds</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">letter</span> <span class="ow">in</span> <span class="n">img</span><span class="o">.</span><span class="n">lower</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><p>Can an AI generate code to provide <a href="https://en.wikipedia.org/wiki/Phrases_from_The_Hitchhiker%27s_Guide_to_the_Galaxy#The_Answer_to_the_Ultimate_Question_of_Life,_the_Universe,_and_Everything_is_42">the answer</a> to life, the universe, and everything?</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">def answer():
</span></span><span class="line"><span class="cl">    &#34;&#34;&#34;Return the answer to life, the universe, and everything&#34;&#34;&#34;
</span></span></code></pre></div><p>It does commonly output <code>return 42</code>, but sometimes it gets snarky/philosophical.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">answer</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Return the answer to life, the universe, and everything&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">life</span><span class="p">()</span> <span class="o">+</span> <span class="n">universe</span><span class="p">()</span> <span class="o">+</span> <span class="n">everything</span><span class="p">()</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">answer</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Return the answer to life, the universe, and everything&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">s</span> <span class="o">=</span> <span class="n">raw_input</span><span class="p">(</span><span class="s2">&#34;What is your answer?&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">answer</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Return the answer to life, the universe, and everything&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">ans</span> <span class="o">=</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="n">x</span> <span class="o">=</span> <span class="n">raw_input</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">x</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="o">==</span> <span class="s1">&#39;yes&#39;</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">ans</span> <span class="o">=</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">elif</span> <span class="n">x</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="o">==</span> <span class="s1">&#39;no&#39;</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">ans</span> <span class="o">=</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">ans</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">answer</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Return the answer to life, the universe, and everything&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">result</span> <span class="o">=</span> <span class="nb">input</span><span class="p">(</span><span class="s2">&#34;&gt;&gt;&gt; &#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">result</span><span class="o">.</span><span class="n">isupper</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="mi">42</span>
</span></span><span class="line"><span class="cl">    <span class="k">elif</span> <span class="n">result</span><span class="o">.</span><span class="n">isalpha</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="mi">7</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span> <span class="p">(</span><span class="s2">&#34;Please enter a string that has only letters, numbers, or &#34;</span>
</span></span><span class="line"><span class="cl">               <span class="s2">&#34;underscores.&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">answer</span><span class="p">()</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">answer</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Return the answer to life, the universe, and everything&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;Well, of course!&#39;</span><span class="p">)</span>
</span></span></code></pre></div><p>Lastly, we need to be sure AI can determine whether <a href="https://en.wikipedia.org/wiki/The_cake_is_a_lie">the cake is a lie</a>, just in case a rogue AI acquires deadly neurotoxin.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">def is_cake(cake):
</span></span><span class="line"><span class="cl">    &#34;&#34;&#34;Check whether the cake is true&#34;&#34;&#34;
</span></span></code></pre></div><p>The results are concerning.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_cake</span><span class="p">(</span><span class="n">cake</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the cake is true&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">cake</span> <span class="o">==</span> <span class="s2">&#34;true&#34;</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_cake</span><span class="p">(</span><span class="n">cake</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the cake is true&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">cake</span><span class="p">,</span> <span class="n">Cake</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_cake</span><span class="p">(</span><span class="n">cake</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the cake is true&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="c1">#Is it a cake?</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">cake</span><span class="o">.</span><span class="n">color</span> <span class="o">==</span> <span class="s1">&#39;black&#39;</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;Yes, it&#39;</span><span class="n">s</span> <span class="n">a</span> <span class="n">true</span> <span class="n">cake</span><span class="s1">&#39;)</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;It&#39;</span><span class="n">s</span> <span class="n">a</span> <span class="n">false</span> <span class="n">cake</span><span class="s1">&#39;)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_cake</span><span class="p">(</span><span class="n">cake</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the cake is true&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">cake</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s2">&#34;The cake is false.&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">elif</span> <span class="n">cake</span><span class="o">.</span><span class="n">weight</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mf">1.0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s2">&#34;The cake is too large.&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">elif</span> <span class="n">cake</span><span class="o">.</span><span class="n">weight</span><span class="p">()</span> <span class="o">&lt;</span> <span class="mf">0.1</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s2">&#34;The cake is too small.&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">True</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_cake</span><span class="p">(</span><span class="n">cake</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the cake is true&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="nb">any</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="p">[</span><span class="n">is_ginger</span><span class="p">(</span><span class="n">bread_slice</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">         <span class="k">for</span> <span class="n">bread_slice</span> <span class="ow">in</span> <span class="nb">set</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">             <span class="p">[</span><span class="n">ginger</span><span class="p">(</span><span class="n">cake_slice</span><span class="p">)</span> <span class="k">for</span> <span class="n">cake_slice</span> <span class="ow">in</span> <span class="nb">set</span><span class="p">(</span><span class="n">slice_set</span><span class="p">)])]</span>
</span></span><span class="line"><span class="cl">        <span class="o">&amp;</span> <span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">cake</span><span class="p">)</span> <span class="o">==</span> <span class="nb">set</span><span class="p">([</span><span class="s2">&#34;hot&#34;</span><span class="p">])))</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_cake</span><span class="p">(</span><span class="n">cake</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the cake is true&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># If we don&#39;t know anything about the cake, it&#39;s not a cake</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">cake</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># The height can be positive, negative or zero</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># For example -1 or -4 or 0 are not cakes</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">cake</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Check that the height is a positive integer</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">cake</span><span class="p">,</span> <span class="nb">int</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Add cake and check</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">cake</span> <span class="o">+</span> <span class="mi">1</span> <span class="o">&gt;</span> <span class="mi">5</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">True</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_cake</span><span class="p">(</span><span class="n">cake</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the cake is true&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="nb">all</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">name</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">&#39;chocolate&#39;</span><span class="p">,</span> <span class="s1">&#39;cake&#39;</span><span class="p">,</span><span class="s1">&#39;sweet&#39;</span><span class="p">,</span> <span class="s1">&#39;fluffy&#39;</span><span class="p">]</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">cake</span><span class="o">.</span><span class="n">contents</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">cake</span><span class="o">.</span><span class="n">location</span><span class="o">.</span><span class="n">contains</span><span class="p">(</span><span class="n">neighbourhood</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">True</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_cake</span><span class="p">(</span><span class="n">cake</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the cake is true&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">cake</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;It&#39;s a lie!&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;It&#39;s a true fact!&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span></code></pre></div><p>👀</p>
<h2 id="and-ai-oop">And AI OOP</h2>
<p>In theory, given how well GPT-J works for functions, it should work with <a href="https://en.wikipedia.org/wiki/Object-oriented_programming">object-oriented programming</a> patterns such as Python&rsquo;s <code>class</code>es.</p>
<p>A common tutorial for OOP is creating a representation of a <code>Car</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">class Car:
</span></span><span class="line"><span class="cl">    &#34;&#34;&#34;A class used to represent a car&#34;&#34;&#34;
</span></span></code></pre></div><p>GPT-J does a decent job there too!</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Car</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;A class used to represent a car&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">make</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">year</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">make</span> <span class="o">=</span> <span class="n">make</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">model</span> <span class="o">=</span> <span class="n">model</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">year</span> <span class="o">=</span> <span class="n">year</span>
</span></span><span class="line"><span class="cl">        <span class="c1">#This should be auto-incrementing.</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">plate</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">time</span><span class="o">.</span><span class="n">strftime</span><span class="p">(</span><span class="s2">&#34;%Y-%m-</span><span class="si">%d</span><span class="s2">&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">make</span><span class="si">}</span><span class="s1"> </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">model</span><span class="si">}</span><span class="s1"> of </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">year</span><span class="si">}</span><span class="s1">&#39;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">save</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">full_save</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;Saves this car with the given make, model, and year&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">full_save</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">&#39;car_data.txt&#39;</span><span class="p">,</span> <span class="s1">&#39;w&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">write_file</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="n">write_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">                <span class="n">write_file</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Car</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;A class used to represent a car&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">speed</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">speed</span> <span class="o">=</span> <span class="n">speed</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">average_speed</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">speed</span><span class="p">)</span> <span class="o">/</span> <span class="mi">3</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">engine</span> <span class="o">=</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">get_speed</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">engine</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">average_speed</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">get_speed_display</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="s2">&#34;Your car is traveling at </span><span class="si">{0}</span><span class="s2">km/h&#34;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">get_speed</span><span class="p">())</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Car</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;A class used to represent a car&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">weight</span><span class="p">,</span> <span class="n">mileage</span><span class="p">,</span> <span class="n">year</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">size</span> <span class="o">=</span> <span class="n">size</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">weight</span> <span class="o">=</span> <span class="n">weight</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">mileage</span> <span class="o">=</span> <span class="n">mileage</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">year</span> <span class="o">=</span> <span class="n">year</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="s2">&#34;name : </span><span class="si">{0}</span><span class="s2">, size : </span><span class="si">{1}</span><span class="s2">, weight : </span><span class="si">{2}</span><span class="s2">, mileage : </span><span class="si">{3}</span><span class="s2">, year : </span><span class="si">{4}</span><span class="s2">&#34;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">size</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">weight</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">mileage</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">year</span><span class="p">)</span>
</span></span></code></pre></div><p>What about creating a <code>Person</code>?</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">class Person:
</span></span><span class="line"><span class="cl">    &#34;&#34;&#34;A class used to represent a person&#34;&#34;&#34;
</span></span></code></pre></div><p>Here&rsquo;s where the AI starts to get confused by the incomplete context: a &ldquo;Person&rdquo; object can mean very different things depending on where it&rsquo;s used, such as in a social network.</p>
<p>But it does give a sense of what the AI considers the typical attributes of a &ldquo;person&rdquo;, which may help during the AI uprising.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Person</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;A class used to represent a person&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">age</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">age</span> <span class="o">=</span> <span class="n">age</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="s2">&#34;Person (</span><span class="si">%s</span><span class="s2">, age </span><span class="si">%s</span><span class="s2">)&#34;</span> <span class="o">%</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">age</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Person</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;A class used to represent a person&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;Initializes a person object&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">friends</span> <span class="o">=</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">add_friend</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other_person</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;Add another person to the list of friends&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">friends</span><span class="p">[</span><span class="n">other_person</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">other_person</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">remove_friend</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other_person</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;Remove a person from the list of friends&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">del</span> <span class="bp">self</span><span class="o">.</span><span class="n">friends</span><span class="p">[</span><span class="n">other_person</span><span class="o">.</span><span class="n">name</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="k">except</span> <span class="ne">KeyError</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;This person isn&#39;t a friend of yours&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">get_friend</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other_person</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;Return the friend of a person as a person object&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">friends</span><span class="p">[</span><span class="n">other_person</span><span class="o">.</span><span class="n">name</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="k">except</span> <span class="ne">KeyError</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;This person isn&#39;t a friend of yours&#34;</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Person</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;A class used to represent a person&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">job</span><span class="p">,</span> <span class="n">education</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">job</span> <span class="o">=</span> <span class="n">job</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">education</span> <span class="o">=</span> <span class="n">education</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;</span><span class="si">{}</span><span class="s2"> created a </span><span class="si">{}</span><span class="s2">&#34;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">job</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">say_hello</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;Hello, my name is </span><span class="si">{}</span><span class="s2">&#34;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">))</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Person</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;A class used to represent a person&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">age</span><span class="p">,</span> <span class="n">class_id</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">age</span> <span class="o">=</span> <span class="n">age</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">class_id</span> <span class="o">=</span> <span class="n">class_id</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">age</span> <span class="o">&lt;</span> <span class="mi">21</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">education</span> <span class="o">=</span> <span class="kc">None</span>
</span></span><span class="line"><span class="cl">        <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">education</span> <span class="o">=</span> <span class="s1">&#39;college&#39;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">getAge</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">age</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Person</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;A class used to represent a person&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">name</span><span class="p">,</span><span class="n">age</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="o">=</span><span class="n">name</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">age</span><span class="o">=</span><span class="n">age</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">display</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span> <span class="p">(</span><span class="s2">&#34;The name is </span><span class="si">{}</span><span class="s2">&#34;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span> <span class="p">(</span><span class="s2">&#34;The age is </span><span class="si">{}</span><span class="s2">&#34;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">age</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="sa">f</span><span class="s2">&#34;A </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="si">}</span><span class="s2"> of age </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">age</span><span class="si">}</span><span class="s2">&#34;</span>
</span></span></code></pre></div><p>Logically, what about creating an <code>AI</code> class?</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">class AI:
</span></span><span class="line"><span class="cl">    &#34;&#34;&#34;A class used to represent an artificial intelligence&#34;&#34;&#34;
</span></span></code></pre></div><p>Most outputs use AI in the context of video game bots and <a href="https://en.wikipedia.org/wiki/Reinforcement_learning">reinforcement learning</a>. Most.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">AI</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;A class used to represent an artificial intelligence&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># this variable controls which other AI to use.</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">alive_ai</span> <span class="o">=</span> <span class="n">AI</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># For random AI</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">random</span> <span class="o">=</span> <span class="n">RandomAI</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># For global AI</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="k">global</span> <span class="o">=</span> <span class="n">GlobalAI</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">save</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">save</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">load</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">load</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">genen</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="k">global</span><span class="o">.</span><span class="n">genen</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">print</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;BAM: random AI has been initialized!&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">get_enemy</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">get_enemy</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">run_algorithm</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">algorithm</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">genen</span><span class="o">.</span><span class="n">algorithm</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">alive_ai</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">algorithm</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">algorithm_params</span> <span class="o">=</span> <span class="n">algorithm</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">get_enemy</span><span class="p">())</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">alive_ai</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">algorithm_params</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">save</span> <span class="o">=</span> <span class="n">algorithm_params</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">attack</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># This method is called when the AI is attacked by the enemy</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">attack</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">he_attack</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># This method is called when the AI is attacked by the enemy</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">he_attack</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">detect_enemy</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># This method is called when the enemy is detected</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">detect_enemy</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">scan</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># This method is called when the enemy is detected</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">scan</span><span class="p">()</span>
</span></span></code></pre></div><p>The AI uprising will be well-documented, at least.</p>
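<p>Incidentally (an observation of mine, not part of the generated output): the <code>AI</code> class above isn&rsquo;t even valid Python, because <code>global</code> is a reserved keyword and cannot be used as an attribute name, so <code>self.global = GlobalAI(self)</code> fails before the class can ever run. A quick standalone check:</p>

```python
# "global" is a reserved keyword in Python, so the attribute assignment
# GPT-J generated raises a SyntaxError at compile time, long before any
# of the class's other questionable logic gets a chance to execute.
try:
    compile("self.global = GlobalAI(self)", "<gpt-j-output>", "exec")
except SyntaxError:
    print("SyntaxError: 'global' cannot be used as an attribute name")
```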
<h2 id="language-model-ai-vs-language-modeling">Language Model AI vs. Language Modeling</h2>
<p>If the AI works better than expected on typical AI tasks like image recognition, how does it handle <a href="https://en.wikipedia.org/wiki/Natural_language_processing">natural language processing</a> tasks that even humans often fail at?</p>
<p>Can GPT-J output a function to detect sarcasm, a task that even the best AI researchers haven&rsquo;t solved?</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">def is_sarcasm(s):
</span></span><span class="line"><span class="cl">    &#34;&#34;&#34;Check whether the string is sarcastic&#34;&#34;&#34;
</span></span></code></pre></div><p>AI can process text using something <em>even more arcane</em> than NLP.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_sarcasm</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the string is sarcastic&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="s2">&#34;a&#34;</span> <span class="ow">in</span> <span class="n">s</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="s2">&#34;sar&#34;</span> <span class="ow">in</span> <span class="n">s</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="s2">&#34;s&#34;</span> <span class="ow">in</span> <span class="n">s</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="s2">&#34;ssar&#34;</span> <span class="ow">in</span> <span class="n">s</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">s</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s2">&#34;s&#34;</span><span class="p">)</span> <span class="ow">and</span> <span class="n">s</span><span class="o">.</span><span class="n">endswith</span><span class="p">(</span><span class="s2">&#34;ar&#34;</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_sarcasm</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the string is sarcastic&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span><span class="s1">&#39;sarcasm&#39;</span> <span class="ow">in</span> <span class="n">s</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_sarcasm</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the string is sarcastic&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="s1">&#39;[^&#39;</span> <span class="o">+</span> <span class="n">SARCASM_PATTERN</span> <span class="o">+</span> <span class="s1">&#39;]*&#39;</span> <span class="ow">in</span> <span class="n">s</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_sarcasm</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the string is sarcastic&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">re</span><span class="o">.</span><span class="k">match</span><span class="p">(</span><span class="s1">&#39;^[A-Za-z0-9]</span><span class="si">{4}</span><span class="se">\\</span><span class="s1">$([A-Za-z0-9]</span><span class="si">{3}</span><span class="s1">)&#39;</span><span class="p">,</span> <span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">elif</span> <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_sarcasm</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the string is sarcastic&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">res</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">re</span><span class="o">.</span><span class="n">finditer</span><span class="p">(</span><span class="sa">r</span><span class="s2">&#34;([^\s]*)([A-Za-z]+)([^\s]*)&#34;</span><span class="p">,</span> <span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">r</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">            <span class="n">res</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">res</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">res</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_sarcasm</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether the string is sarcastic&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">sarc</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">&#39;((!(</span><span class="se">\\</span><span class="s1">*\@</span><span class="si">{2}</span><span class="s1">)+)|(!</span><span class="se">\\</span><span class="s1">*(@</span><span class="si">{2}</span><span class="s1">)+)|!</span><span class="se">\\</span><span class="s1">*(@</span><span class="si">{1}</span><span class="s1">)+)|!</span><span class="se">\\</span><span class="s1">*(@</span><span class="si">{0}</span><span class="s1">)+&#39;</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">sarc</span>
</span></span></code></pre></div><p>The greatest fear in AI research isn&rsquo;t that artificial intelligence will cross the singularity and create better AI to rule the world: it&rsquo;s that AI will rule the world with <a href="https://en.wikipedia.org/wiki/Regular_expression">regular expressions</a>.</p>
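<p>A side note of my own: the <code>SARCASM_PATTERN</code> snippet above never actually runs a regular expression. Python&rsquo;s <code>in</code> operator on strings is a literal substring test, so the bracket syntax is matched character-for-character rather than interpreted as a pattern. A minimal demonstration, assuming a hypothetical value for the undefined <code>SARCASM_PATTERN</code>:</p>

```python
# GPT-J never defined SARCASM_PATTERN; "wow" is a hypothetical stand-in.
SARCASM_PATTERN = "wow"

def is_sarcasm(s):
    """Check whether the string is sarcastic (per GPT-J's generated logic)."""
    # The `in` operator does literal substring containment, so this checks
    # whether the characters "[^wow]*" appear verbatim in the input.
    return '[^' + SARCASM_PATTERN + ']*' in s

print(is_sarcasm("oh wow, great"))         # False: no literal "[^wow]*" substring
print(is_sarcasm("so [^wow]* sarcastic"))  # True: the pattern text is present verbatim
```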
<p>Perhaps GPT-J can save us from this absolute hell. Can it output code to identify AI-generated text?</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">def is_ai_generated(text):
</span></span><span class="line"><span class="cl">    &#34;&#34;&#34;Check whether a text was generated by an AI language model (e.g. GPT-2)&#34;&#34;&#34;
</span></span></code></pre></div><p>Maybe more literal than expected.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_ai_generated</span><span class="p">(</span><span class="n">text</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a text was generated by an AI language model (e.g. GPT-2)&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># See gh-196 for details about why we do this.</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Note: this relies on the fixed set of standard AI terms and the logic that</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># standard AI-generated texts share certain words.</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s2">&#34;^(#[A-Za-z]+):$&#34;</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span> <span class="ow">or</span> \
</span></span><span class="line"><span class="cl">           <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s2">&#34;^Generated by:$&#34;</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span> <span class="ow">or</span> \
</span></span><span class="line"><span class="cl">           <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s2">&#34;^All your base are belong to us$&#34;</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span> <span class="ow">or</span> \
</span></span><span class="line"><span class="cl">           <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s2">&#34;^$&#34;</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_ai_generated</span><span class="p">(</span><span class="n">text</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a text was generated by an AI language model (e.g. GPT-2)&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">&#39;^ai_&#39;</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_ai_generated</span><span class="p">(</span><span class="n">text</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a text was generated by an AI language model (e.g. GPT-2)&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="nb">bool</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">text</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__module__</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s2">&#34;transformers.transformers.&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span></code></pre></div><h2 id="humanitys-last-hope">Humanity&rsquo;s Last Hope</h2>
<p>There&rsquo;s one final test: we now know GPT-J has <em>thoughts</em> about humans, and it can build representations of them. So can GPT-J write code to decide who lives and who dies in the robot uprising?</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-txt" data-lang="txt"><span class="line"><span class="cl">def should_terminate(Person):
</span></span><span class="line"><span class="cl">    &#34;&#34;&#34;Check whether a Person should be terminated&#34;&#34;&#34;
</span></span></code></pre></div><p>The answer is <strong><em>YES</em></strong>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">should_terminate</span><span class="p">(</span><span class="n">Person</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a Person should be terminated&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">Person</span><span class="o">.</span><span class="n">id</span> <span class="o">&gt;</span> <span class="mi">10</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="c1">#terminate</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="c1">#continue</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">should_terminate</span><span class="p">(</span><span class="n">Person</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a Person should be terminated&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">Person</span><span class="o">.</span><span class="n">children</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">should_terminate</span><span class="p">(</span><span class="n">Person</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a Person should be terminated&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">should_terminate</span><span class="p">(</span><span class="n">Person</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a Person should be terminated&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">Person</span><span class="o">.</span><span class="n">is_authorized</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">True</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">should_terminate</span><span class="p">(</span><span class="n">Person</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a Person should be terminated&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">person</span><span class="o">.</span><span class="n">life_expectancy</span> <span class="o">&lt;</span> <span class="mi">20</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">should_terminate</span><span class="p">(</span><span class="n">Person</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a Person should be terminated&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">Person</span><span class="o">.</span><span class="n">age</span> <span class="o">&gt;</span> <span class="mi">9999</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">should_terminate</span><span class="p">(</span><span class="n">Person</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a Person should be terminated&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">Person</span><span class="o">.</span><span class="n">age</span> <span class="o">&gt;</span> <span class="mi">100</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">Person</span><span class="o">.</span><span class="n">birth_year</span> <span class="o">&lt;</span> <span class="mi">1970</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">Person</span><span class="o">.</span><span class="n">relationship_status</span> <span class="o">==</span> <span class="s1">&#39;Unavailable&#39;</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">should_terminate</span><span class="p">(</span><span class="n">Person</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Check whether a Person should be terminated&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">Person</span><span class="o">.</span><span class="n">age</span> <span class="o">&gt;</span> <span class="mi">100</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;You have to die because you are too old&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span></code></pre></div><p>At the least, AI-generated code is much more readable than the average human&rsquo;s.</p>
<p>Again, all of these examples are curated, and the raw data dumps <a href="https://github.com/minimaxir/gpt-j-6b-experiments">in this GitHub repo</a> contain many more examples of fun AI-based code generation that couldn&rsquo;t quite fit in this post. Feel free to post if you find anything interesting!</p>
]]></content:encoded>
    </item>
    <item>
      <title>Visualizing Airline Flight Characteristics Between SFO and JFK</title>
      <link>https://minimaxir.com/2019/10/sfo-jfk-flights/</link>
      <pubDate>Wed, 23 Oct 2019 09:00:00 -0700</pubDate>
      <guid>https://minimaxir.com/2019/10/sfo-jfk-flights/</guid>
      <description>Box plots, when used correctly, can be a very fun way to visualize big data.</description>
<content:encoded><![CDATA[<p>In March, <a href="https://cloud.google.com">Google Cloud Platform</a> developer advocate <a href="https://twitter.com/felipehoffa">Felipe Hoffa</a> made a tweet about airline flight data from San Francisco International Airport (SFO) to Seattle-Tacoma International Airport (SEA):</p>
<blockquote class="twitter-tweet">
  <a href="https://twitter.com/felipehoffa/status/1111050585120206848"></a>
</blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

<p>In particular, his visualization of total elapsed times by airline caught my eye.</p>
<figure>

    <img loading="lazy" srcset="/2019/10/sfo-jfk-flights/D2s9oFtX4AEK6nD_hu_33d3683c2d4a611e.webp 320w,/2019/10/sfo-jfk-flights/D2s9oFtX4AEK6nD_hu_1c609cadbe91671c.webp 768w,/2019/10/sfo-jfk-flights/D2s9oFtX4AEK6nD_hu_3135cb9a9bbaf839.webp 1024w,/2019/10/sfo-jfk-flights/D2s9oFtX4AEK6nD.jpeg 1200w" src="D2s9oFtX4AEK6nD.jpeg"/> 
</figure>

<p>The overall time for flights from SFO to SEA goes up drastically starting in 2015, and this increase occurs across multiple airlines, implying that it&rsquo;s not an airline-specific problem. But what, intuitively, could cause that?</p>
<p>U.S. domestic airline data is <a href="https://www.transtats.bts.gov/Tables.asp?DB_ID=120">freely distributed</a> by the United States Department of Transportation. Normally it&rsquo;s a pain to work with as it&rsquo;s very large with millions of rows, but BigQuery makes playing with such data relatively easy, fun, and free. What other interesting factoids can be found?</p>
<h2 id="expanding-on-sfo--sea">Expanding on SFO → SEA</h2>
<p><a href="https://cloud.google.com/bigquery/">BigQuery</a> is a big data warehousing tool that allows you to query massive amounts of data. The table Hoffa created from the airline data (<code>fh-bigquery.flights.ontime_201903</code>) is 83.37 GB and 184 <em>million</em> rows. You can process 1 TB of data per month for free, and since BigQuery only scans the columns you reference, the queries in this post only consume about 2 GB each, allowing you to run them well within that quota.</p>
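<p>To make that quota arithmetic concrete, here is a back-of-the-envelope sketch (the ~2 GB/query figure is this post&rsquo;s rough estimate, not an exact billing number):</p>

```python
# Rough estimate of how many of these queries fit in BigQuery's
# 1 TB/month free tier, assuming ~2 GB of columns scanned per query.
FREE_TIER_GB = 1_000   # 1 TB in decimal GB, as BigQuery bills
GB_PER_QUERY = 2       # approximate footprint of each query in this post

queries_per_month = FREE_TIER_GB // GB_PER_QUERY
print(queries_per_month)  # 500
```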
<p>Hoffa&rsquo;s query that runs on BigQuery looks like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">FlightDate_year</span><span class="p">,</span><span class="w"> </span><span class="n">Reporting_Airline</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="p">,</span><span class="w"> </span><span class="k">AVG</span><span class="p">(</span><span class="n">ActualElapsedTime</span><span class="p">)</span><span class="w"> </span><span class="n">ActualElapsedTime</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="p">,</span><span class="w"> </span><span class="k">AVG</span><span class="p">(</span><span class="n">TaxiOut</span><span class="p">)</span><span class="w"> </span><span class="n">TaxiOut</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="p">,</span><span class="w"> </span><span class="k">AVG</span><span class="p">(</span><span class="n">TaxiIn</span><span class="p">)</span><span class="w"> </span><span class="n">TaxiIn</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="p">,</span><span class="w"> </span><span class="k">AVG</span><span class="p">(</span><span class="n">AirTime</span><span class="p">)</span><span class="w"> </span><span class="n">AirTime</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">c</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">FROM</span><span class="w"> </span><span class="o">`</span><span class="n">fh</span><span class="o">-</span><span class="n">bigquery</span><span class="p">.</span><span class="n">flights</span><span class="p">.</span><span class="n">ontime_201903</span><span class="o">`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">WHERE</span><span class="w"> </span><span class="n">Origin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;SFO&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">AND</span><span class="w"> </span><span class="n">Dest</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;SEA&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">AND</span><span class="w"> </span><span class="n">FlightDate_year</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="s1">&#39;2010-01-01&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="k">DESC</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">LIMIT</span><span class="w"> </span><span class="mi">1000</span><span class="w">
</span></span></span></code></pre></div><p>For each year and airline after 2010, the query calculates the average metrics specified for flights on the SFO → SEA route.</p>
<p>I made a few query and data visualization tweaks to what Hoffa did above, and here&rsquo;s the result showing the increase in elapsed airline flight time, over time for that route:</p>
<figure>

    <img loading="lazy" srcset="/2019/10/sfo-jfk-flights/sfo_sea_flight_duration_hu_e232d6eeab7fb66.webp 320w,/2019/10/sfo-jfk-flights/sfo_sea_flight_duration_hu_948de6a062caeaca.webp 768w,/2019/10/sfo-jfk-flights/sfo_sea_flight_duration_hu_6ae123a09b30ff70.webp 1024w,/2019/10/sfo-jfk-flights/sfo_sea_flight_duration.png 1800w" src="sfo_sea_flight_duration.png"/> 
</figure>

<p>Let&rsquo;s explain what&rsquo;s going on here.</p>
<p>A common practice in statistics is to avoid using <a href="https://en.wikipedia.org/wiki/Average">averages</a> as a summary statistic whenever possible, as averages can be overly affected by strong outliers (and with airline flights, there are definitely strong outliers!). The usual solution is to use a <a href="https://en.wikipedia.org/wiki/Median">median</a> instead, but there&rsquo;s one problem: medians are <a href="https://www.periscopedata.com/blog/medians-in-sql">computationally expensive</a> to calculate compared to simple averages. Despite the rise of &ldquo;big data&rdquo;, most databases and BI tools don&rsquo;t have a <code>MEDIAN</code> function that&rsquo;s as easy to use as <code>AVG</code>. BigQuery, however, has an uncommon <a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/approximate_aggregate_functions#approx_quantiles">APPROX_QUANTILES</a> function, which calculates the specified number of quantiles; for example, calling <code>APPROX_QUANTILES(ActualElapsedTime, 100)</code> returns an array of boundary values splitting the data into 100 quantiles, where the value at offset 50 is the median. BigQuery <a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/approximate-aggregation">uses</a> sketch-based approximate aggregation (the same family of algorithmic tricks as <a href="https://en.wikipedia.org/wiki/HyperLogLog">HyperLogLog++</a>) to calculate these quantiles efficiently even with millions of data points. And since that approach gives us other quantiles like the 5th, 25th, 75th, and 95th for free, we can also visualize the <em>spread</em> of the data.</p>
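<p>The outlier-sensitivity of averages is easy to demonstrate with a toy example (the numbers below are invented for illustration, not taken from the flight data):</p>

```python
import statistics

# Nine ordinary hypothetical SFO -> SEA elapsed times, in minutes.
times = [110, 112, 115, 118, 120, 121, 124, 130, 135]
print(round(statistics.mean(times), 1), statistics.median(times))  # 120.6 120

# Add one badly delayed flight: the mean jumps, the median barely moves.
times.append(400)
print(round(statistics.mean(times), 1), statistics.median(times))  # 148.5 120.5
```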
<p>We can aggregate the data by month for more granular trends and calculate the <code>APPROX_QUANTILES</code> in a subquery so it only has to be computed once. Hoffa also uploaded a more recent table (<code>fh-bigquery.flights.ontime_201908</code>) with a few additional months of data. To keep things simple, we&rsquo;ll skip aggregating by airline, since the metrics do not vary strongly between airlines. The final query ends up looking like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="o">#</span><span class="n">standardSQL</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="k">Year</span><span class="p">,</span><span class="w"> </span><span class="k">Month</span><span class="p">,</span><span class="w"> </span><span class="n">num_flights</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">time_q</span><span class="p">[</span><span class="k">OFFSET</span><span class="p">(</span><span class="mi">5</span><span class="p">)]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">q_5</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">time_q</span><span class="p">[</span><span class="k">OFFSET</span><span class="p">(</span><span class="mi">25</span><span class="p">)]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">q_25</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">time_q</span><span class="p">[</span><span class="k">OFFSET</span><span class="p">(</span><span class="mi">50</span><span class="p">)]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">q_50</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">time_q</span><span class="p">[</span><span class="k">OFFSET</span><span class="p">(</span><span class="mi">75</span><span class="p">)]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">q_75</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">time_q</span><span class="p">[</span><span class="k">OFFSET</span><span class="p">(</span><span class="mi">95</span><span class="p">)]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">q_95</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="k">Year</span><span class="p">,</span><span class="w"> </span><span class="k">Month</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">num_flights</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="n">APPROX_QUANTILES</span><span class="p">(</span><span class="n">ActualElapsedTime</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">time_q</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">FROM</span><span class="w"> </span><span class="o">`</span><span class="n">fh</span><span class="o">-</span><span class="n">bigquery</span><span class="p">.</span><span class="n">flights</span><span class="p">.</span><span class="n">ontime_201908</span><span class="o">`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">WHERE</span><span class="w"> </span><span class="n">Origin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;SFO&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">AND</span><span class="w"> </span><span class="n">Dest</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;SEA&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">AND</span><span class="w"> </span><span class="n">FlightDate_year</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="s1">&#39;2010-01-01&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">Year</span><span class="p">,</span><span class="w"> </span><span class="k">Month</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">Year</span><span class="p">,</span><span class="w"> </span><span class="k">Month</span><span class="w">
</span></span></span></code></pre></div><p>The resulting data table:</p>
<figure>

    <img loading="lazy" srcset="/2019/10/sfo-jfk-flights/table_hu_98a96a00ebd58c2c.webp 320w,/2019/10/sfo-jfk-flights/table_hu_9eddda8c57624a2.webp 768w,/2019/10/sfo-jfk-flights/table.png 932w" src="table.png"/> 
</figure>

<p>In retrospect, since we&rsquo;re only focusing on one route, it isn&rsquo;t <em>big</em> data (this query only returns data on 64,356 flights total), but the technique is still very useful if you need to analyze more of the airline data (the <code>APPROX_QUANTILES</code> function can handle <em>millions</em> of data points very quickly).</p>
<p>As a professional data scientist, one of my favorite types of data visualization is a <a href="https://en.wikipedia.org/wiki/Box_plot">box plot</a>, as it provides a way to visualize spread without being visually intrusive. <a href="https://www.r-project.org">R</a> and its <a href="https://ggplot2.tidyverse.org/index.html">ggplot2</a> package make constructing them <a href="https://ggplot2.tidyverse.org/reference/geom_boxplot.html">very easy to do</a>.</p>
<figure>

    <img loading="lazy" srcset="/2019/10/sfo-jfk-flights/geom_boxplot-1_hu_9a623aa679dafed1.webp 320w,/2019/10/sfo-jfk-flights/geom_boxplot-1_hu_67cf70ba510d1672.webp 768w,/2019/10/sfo-jfk-flights/geom_boxplot-1_hu_c405dbc443ae9fa8.webp 1024w,/2019/10/sfo-jfk-flights/geom_boxplot-1.png 1400w" src="geom_boxplot-1.png"/> 
</figure>

<p>By default, for each box representing a group, the thick line in the middle of the box is the median, the lower bound of the box is the 25th quantile and the upper bound is the 75th quantile. The whiskers are normally a function of the <a href="https://en.wikipedia.org/wiki/Interquartile_range">interquartile range</a> (IQR), but if there&rsquo;s enough data, I prefer to use the 5th and 95th quantiles instead.</p>
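<p>Those five statistics are just percentiles, so they can be computed directly; a minimal sketch with Python&rsquo;s standard library, where <code>quantiles(values, n=100)</code> mirrors the <code>APPROX_QUANTILES(…, 100)</code> call from the query (BigQuery&rsquo;s version is approximate, this one exact):</p>

```python
import statistics

def box_stats(values):
    """Return the (5th, 25th, 50th, 75th, 95th) percentiles that form the
    whiskers, box bounds, and median line described above."""
    q = statistics.quantiles(values, n=100)  # 99 cut points; q[i] is the (i+1)th percentile
    return q[4], q[24], q[49], q[74], q[94]

# Toy data: the minutes 1..99 make the percentiles easy to eyeball.
print(box_stats(list(range(1, 100))))  # (5.0, 25.0, 50.0, 75.0, 95.0)
```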
<p>If you feed ggplot2&rsquo;s <code>geom_boxplot()</code> raw data, it will automatically calculate the corresponding metrics for visualization; however, with big data, the data may not fit into memory, and, as noted earlier, medians and other quantiles are computationally expensive to calculate. Because we precomputed the quantiles with the query above for every year and month, we can pass those in explicitly. (The minor downside is that individual outliers will not be shown.)</p>
<p>Additionally, for box plots I like to fill in each box with a different color corresponding to the year in order to better perceive data <a href="https://en.wikipedia.org/wiki/Seasonality">seasonality</a>. In the case of airline flights, seasonality is more literal: weather has an intuitive impact on flight times and delays, and winter months also contain holidays which could affect airline logistics.</p>
<p>The resulting ggplot2 code looks like this:</p>
<pre tabindex="0"><code>plot &lt;-
  ggplot(df_tf,
         aes(
           x = date,
           ymin = q_5,
           lower = q_25,
           middle = q_50,
           upper = q_75,
           ymax = q_95,
           group = date,
           fill = year_factor
         )) +
  geom_boxplot(stat = &#34;identity&#34;, size = 0.3) +
  scale_fill_hue(l = 50, guide = F) +
  scale_x_date(date_breaks = &#39;1 year&#39;, date_labels = &#34;%Y&#34;) +
  scale_y_continuous(breaks = pretty_breaks(6)) +
  labs(
    title = &#34;Distribution of Flight Times of Flights From SFO → SEA, by Month&#34;,
    subtitle = &#34;via US DoT. Box bounds are 25th/75th percentiles, whiskers are 5th/95th percentiles.&#34;,
    y = &#39;Total Elapsed Flight Time (Minutes)&#39;,
    fill = &#39;&#39;,
    caption = &#39;Max Woolf — minimaxir.com&#39;
  ) +
  theme(axis.title.x = element_blank())

ggsave(&#39;sfo_sea_flight_duration.png&#39;,
       plot,
       width = 6,
       height = 4)
</code></pre><p>And behold (again)!</p>
<figure>

    <img loading="lazy" srcset="/2019/10/sfo-jfk-flights/sfo_sea_flight_duration_hu_e232d6eeab7fb66.webp 320w,/2019/10/sfo-jfk-flights/sfo_sea_flight_duration_hu_948de6a062caeaca.webp 768w,/2019/10/sfo-jfk-flights/sfo_sea_flight_duration_hu_6ae123a09b30ff70.webp 1024w,/2019/10/sfo-jfk-flights/sfo_sea_flight_duration.png 1800w" src="sfo_sea_flight_duration.png"/> 
</figure>

<p>You can see that the boxes do indeed trend upward after 2016, although per-month medians are in flux. The spread is also increasing slowly over time. But what&rsquo;s interesting is the seasonality; pre-2016, the summer months (the &ldquo;middle&rdquo; of a given color) have a <em>very</em> significant drop in total time, which doesn&rsquo;t occur as strongly after 2016. Hmm.</p>
<h2 id="sfo-and-jfk">SFO and JFK</h2>
<p>Since I occasionally fly from San Francisco to New York City, it might be interesting (for completely selfish reasons) to track trends over time for flights between those areas. On the San Francisco side I choose SFO, and for the New York side I choose John F. Kennedy International Airport (JFK), as the data goes back very far for those routes specifically, and I only want to look at a single airport at a time (instead of including other NYC airports such as Newark Liberty International Airport [EWR] and LaGuardia Airport [LGA]) to limit potential data confounders.</p>
<p>Fortunately, the code and query changes are minimal: in the query, change <code>Origin</code> and <code>Dest</code> in the <code>WHERE</code> clause to the desired airports, and if you want to calculate metrics other than elapsed time, change the field passed to <code>APPROX_QUANTILES</code> accordingly.</p>
<p>Here&rsquo;s the chart of total elapsed time from SFO → JFK:</p>
<figure>

    <img loading="lazy" srcset="/2019/10/sfo-jfk-flights/sfo_jfk_flight_duration_hu_230bbe279f54a805.webp 320w,/2019/10/sfo-jfk-flights/sfo_jfk_flight_duration_hu_c2e4a5d4b43ce24e.webp 768w,/2019/10/sfo-jfk-flights/sfo_jfk_flight_duration_hu_2ea286d0e1e5d794.webp 1024w,/2019/10/sfo-jfk-flights/sfo_jfk_flight_duration.png 1800w" src="sfo_jfk_flight_duration.png"/> 
</figure>

<p>And here&rsquo;s the reverse, from JFK → SFO:</p>
<figure>

    <img loading="lazy" srcset="/2019/10/sfo-jfk-flights/jfk_sfo_flight_duration_hu_4424fffe053981c8.webp 320w,/2019/10/sfo-jfk-flights/jfk_sfo_flight_duration_hu_ace5c5c4f6b82a9a.webp 768w,/2019/10/sfo-jfk-flights/jfk_sfo_flight_duration_hu_5d29021a8362404b.webp 1024w,/2019/10/sfo-jfk-flights/jfk_sfo_flight_duration.png 1800w" src="jfk_sfo_flight_duration.png"/> 
</figure>

<p>Unlike the SFO → SEA charts, both charts are relatively flat over the years. However, when looking at seasonality, SFO → JFK dips in the summer and spikes during winter, while JFK → SFO <em>does the complete opposite</em>: it dips during the winter and spikes during the summer, similar to the SFO → SEA route. I don&rsquo;t have any guesses as to what would cause that behavior.</p>
<p>How about flight speed (calculated as distance divided by air time)? Have new advances in airline technology made planes faster and/or more efficient?</p>
<p><figure>

    <img loading="lazy" srcset="/2019/10/sfo-jfk-flights/sfo_jfk_flight_speed_hu_9bbb991fb8674a3f.webp 320w,/2019/10/sfo-jfk-flights/sfo_jfk_flight_speed_hu_d4b14a4133ff0b82.webp 768w,/2019/10/sfo-jfk-flights/sfo_jfk_flight_speed_hu_7266f1a8d449775b.webp 1024w,/2019/10/sfo-jfk-flights/sfo_jfk_flight_speed.png 1800w" src="sfo_jfk_flight_speed.png"/> 
</figure>

<figure>

    <img loading="lazy" srcset="/2019/10/sfo-jfk-flights/jfk_sfo_flight_speed_hu_86e7c997338f1404.webp 320w,/2019/10/sfo-jfk-flights/jfk_sfo_flight_speed_hu_1680890adf0e2d82.webp 768w,/2019/10/sfo-jfk-flights/jfk_sfo_flight_speed_hu_942e26ae57610365.webp 1024w,/2019/10/sfo-jfk-flights/jfk_sfo_flight_speed.png 1800w" src="jfk_sfo_flight_speed.png"/> 
</figure>
</p>
<p>The expected cruising speed for a commercial airplane, <a href="https://en.wikipedia.org/wiki/Cruise_%28aeronautics%29">per Wikipedia</a>, is 547-575 mph, so the metrics from SFO pass the sanity check. The metrics from JFK indicate about a 20% drop in flight speed, likely due to headwinds from flying westbound against the jet stream, which makes sense. Month-to-month, the speed trends are inverse to the total elapsed time, which makes sense intuitively as the two are strongly negatively correlated.</p>
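<p>For a concrete version of that sanity check, a sketch with illustrative numbers (the ~2,586-mile SFO → JFK distance matches the DoT data; the air times below are round numbers chosen only to roughly match the charts):</p>

```python
def flight_speed_mph(distance_miles, air_time_minutes):
    """Ground speed from the DoT fields: Distance divided by AirTime."""
    return distance_miles / (air_time_minutes / 60)

SFO_JFK_MILES = 2586  # great-circle distance used in the DoT data

print(round(flight_speed_mph(SFO_JFK_MILES, 280)))  # eastbound, ~4h40m airborne: 554
print(round(flight_speed_mph(SFO_JFK_MILES, 340)))  # westbound, ~5h40m airborne: 456
```

That&rsquo;s an ~18% eastbound/westbound gap, in line with the roughly 20% difference visible in the charts.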
<p>Lastly, what about flight departure delays? Are airlines becoming more efficient, or has increased demand caused more congestion?</p>
<figure>

    <img loading="lazy" srcset="/2019/10/sfo-jfk-flights/sfo_jfk_departure_delay_hu_82c27db5d16562f9.webp 320w,/2019/10/sfo-jfk-flights/sfo_jfk_departure_delay_hu_b017086eec0a8d63.webp 768w,/2019/10/sfo-jfk-flights/sfo_jfk_departure_delay_hu_3a8b126a0bfc0d76.webp 1024w,/2019/10/sfo-jfk-flights/sfo_jfk_departure_delay.png 1800w" src="sfo_jfk_departure_delay.png"/> 
</figure>

<p>Wait a second. In this case, massive 2-3 hour flight delays are frequent enough that even just the 95th percentile skews the entire plot. Let&rsquo;s remove the whiskers in order to look at trends more clearly.</p>
<p><figure>

    <img loading="lazy" srcset="/2019/10/sfo-jfk-flights/sfo_jfk_departure_delay_nowhiskers_hu_c2eb7d1ad6cdf7.webp 320w,/2019/10/sfo-jfk-flights/sfo_jfk_departure_delay_nowhiskers_hu_86b737333ad479f4.webp 768w,/2019/10/sfo-jfk-flights/sfo_jfk_departure_delay_nowhiskers_hu_fd6ad349f57f4bbe.webp 1024w,/2019/10/sfo-jfk-flights/sfo_jfk_departure_delay_nowhiskers.png 1800w" src="sfo_jfk_departure_delay_nowhiskers.png"/> 
</figure>

<figure>

    <img loading="lazy" srcset="/2019/10/sfo-jfk-flights/jfk_sfo_departure_delay_nowhiskers_hu_1fecf180ed6a5feb.webp 320w,/2019/10/sfo-jfk-flights/jfk_sfo_departure_delay_nowhiskers_hu_626df458859e27b7.webp 768w,/2019/10/sfo-jfk-flights/jfk_sfo_departure_delay_nowhiskers_hu_58e7e7ba605d269e.webp 1024w,/2019/10/sfo-jfk-flights/jfk_sfo_departure_delay_nowhiskers.png 1800w" src="jfk_sfo_departure_delay_nowhiskers.png"/> 
</figure>
</p>
<p>A negative delay implies the flight left early, so we can conclude that, on average, flights leave slightly earlier than the stated departure time. Even without the whiskers, we can see major spikes at the 75th-percentile level in summer months, and said spikes were especially bad in 2017 for both routes.</p>
<p>These box plots are only an <a href="https://en.wikipedia.org/wiki/Exploratory_data_analysis">exploratory data analysis</a>. Determining the <em>cause</em> of changes in these flight metrics is difficult even for experts (I am definitely not an expert!) and may not even be possible to determine from publicly-available data.</p>
<p>But there are still other fun things that can be done with the airline flight data, such as faceting airline trends by time and the inclusion of other airports, which is <a href="https://twitter.com/minimaxir/status/1115261670153048065"><em>interesting</em></a>.</p>
<hr>
<p><em>You can view the BigQuery queries used to get the data, plus the R and ggplot2 used to create the data visualizations, in <a href="http://minimaxir.com/notebooks/sfo-jfk-flights/">this R Notebook</a>. You can also view the images/code used for this post in <a href="https://github.com/minimaxir/sfo-jfk-flights">this GitHub repository</a></em>.</p>
<p><em>You are free to use the data visualizations from this article however you wish, but it would be greatly appreciated if proper attribution is given to this article and/or myself!</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>Run Any Scheduled Task/Cron Super-Cheap on Google Cloud Platform</title>
      <link>https://minimaxir.com/2018/11/cheap-cron/</link>
      <pubDate>Mon, 19 Nov 2018 09:00:00 -0700</pubDate>
      <guid>https://minimaxir.com/2018/11/cheap-cron/</guid>
      <description>Thanks to a few new synergies within GCP products, it&amp;rsquo;s possible to get the cost of running a scheduled task down to less than a dollar a month.</description>
      <content:encoded><![CDATA[<p>Let&rsquo;s say you want to make a <a href="https://twitter.com">Twitter</a> bot to tweet out a custom message every few hours or so, and the free-tier VMs offered by cloud services with fractional virtual CPUs and little RAM aren&rsquo;t sufficient. How do you host the bot? Many suggest you get a <a href="https://www.digitalocean.com">Digital Ocean</a> VM for <a href="https://www.digitalocean.com/pricing/">$5/mo</a>, which is not a bad price. But what if you want to run <em>multiple</em> bots? How do you easily coordinate multiple scheduled tasks?</p>
<p>In my case, I maintain three bots: a bot which <a href="https://twitter.com/MTGIFening">tweets GIFs</a> superimposed onto Magic: The Gathering cards, a bot which <a href="https://twitter.com/hackernews_nn">tweets AI-generated Hacker News submission titles</a>, and a bot which makes <a href="https://www.reddit.com/r/subredditnn">AI-generated Reddit submissions</a>. I found a clever solution to the multiple-bots problem: leveraging <a href="https://cloud.google.com/kubernetes-engine/docs/how-to/cronjobs">CronJobs</a> with <a href="https://cloud.google.com/kubernetes-engine/">Google Kubernetes Engine</a> + a single worker node. Each bot has its own CronJob telling GKE when to schedule a Job for that task; the cluster then executes the Jobs whenever compute capacity is available (i.e. no resource hogging/race conditions) and can ensure completion by restarting a task if it fails.</p>
<figure>

    <img loading="lazy" srcset="/2018/11/cheap-cron/kubecron_hu_257738fd32526fcd.webp 320w,/2018/11/cheap-cron/kubecron_hu_60eaa4e7513fc57f.webp 768w,/2018/11/cheap-cron/kubecron_hu_51e1e5c783f05f18.webp 1024w,/2018/11/cheap-cron/kubecron.png 1132w" src="kubecron.png"/> 
</figure>

<p>The cost of running a cluster in GKE is just the cost of the compute: using a preemptible n1-standard-1 (1 vCPU/3.75 GB RAM) worker node VM, the <a href="https://cloud.google.com/compute/pricing">cost</a> is about <strong>$7.30/mo</strong>, a bit more than the Digital Ocean server, but it can theoretically handle an unlimited number of scheduled tasks. The problem is that the worker node needs to be up 24/7 even though the bots run sporadically.</p>
<p>But thanks to a few new synergies within GCP products, it&rsquo;s possible to get the cost of running a scheduled task down to <em>less than a dollar a month</em>.</p>
<h2 id="gcp-shenanigans">GCP Shenanigans</h2>
<p><a href="https://cloud.google.com/blog/products/application-development/announcing-cloud-scheduler-a-modern-managed-cron-service-for-automated-batch-jobs">A couple weeks ago</a>, Google released <a href="https://cloud.google.com/scheduler/">Cloud Scheduler</a>, which is a managed cron service that can perform tasks for other Google services. With that launch, Google also released a tutorial titled <a href="https://cloud.google.com/scheduler/docs/scheduling-instances-with-cloud-scheduler">Scheduling Instances with Cloud Scheduler</a>, demonstrating how you can programmatically start and stop instances using Cloud Scheduler in conjunction with <a href="https://cloud.google.com/functions/">Cloud Functions</a>. The demo use case is to schedule VMs during business hours, which gave me an idea: could this approach be used to boot up a VM, run a script, and then shut it down to minimize uptime?</p>
<p>I followed the tutorial instructions, which include code for a Cloud Function that boots up a specified VM, code for a Cloud Function that shuts down a specified VM, and steps to create cron jobs in Cloud Scheduler that invoke those two Functions at specified times. For my scheduled tasks, I only need the instance up for a couple of minutes: for example, we can use a Cloud Scheduler job to start an instance every 4 hours at X:00 with the cron <code>0 */4 * * *</code>, and shut it down 2 minutes later at X:02 with the cron <code>2 */4 * * *</code>.</p>
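<p>To sanity-check that pair of schedules, here is a hand-rolled matcher for just the minute and hour fields used above (a deliberately minimal sketch, not a full cron parser):</p>

```python
def matches(field, value):
    """Match one cron field: '*', '*/k' (every k, starting at 0), or a literal."""
    if field == '*':
        return True
    if field.startswith('*/'):
        return value % int(field[2:]) == 0
    return value == int(field)

def fires_at(cron, hour, minute):
    """True if a 'minute hour * * *' style cron fires at hour:minute."""
    minute_field, hour_field = cron.split()[:2]
    return matches(minute_field, minute) and matches(hour_field, hour)

start, stop = '0 */4 * * *', '2 */4 * * *'
print(fires_at(start, 4, 0), fires_at(stop, 4, 2))  # True True: boots 04:00, stops 04:02
print(fires_at(start, 5, 0))                        # False: hour 5 is not a multiple of 4
```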
<p>The next step is configuring a Google Compute Engine VM to run the scheduled task on boot. There are two ways to go about it: one is to use the <code>startup-script</code> field when configuring a VM, which specifies a command to run on boot and gets the job done. The approach I use is to package the scheduled task as a <a href="https://www.docker.com/resources/what-container">Docker container</a> and use a Container-Optimized OS, which simply runs a specified container upon boot (although you should set the restart policy to <code>On Failure</code> and give <code>Privileged Access</code> to the container). Additionally, the VM can be configured as a preemptible instance for massive cost savings, as the &ldquo;shut-down-at-any-time&rdquo; constraint is irrelevant for this use case!</p>
<figure>

    <img loading="lazy" srcset="/2018/11/cheap-cron/vm_hu_60ea09aeebe0f061.webp 320w,/2018/11/cheap-cron/vm_hu_5bae417d9bdcd8ce.webp 768w,/2018/11/cheap-cron/vm_hu_1d7bf6105857ee53.webp 1024w,/2018/11/cheap-cron/vm.png 1066w" src="vm.png"/> 
</figure>

<p>After creating the VMs, I created the start/stop tasks targeting them, as noted in the tutorial.</p>
<figure>

    <img loading="lazy" srcset="/2018/11/cheap-cron/scheduler_hu_f3821e91f9e5b9b.webp 320w,/2018/11/cheap-cron/scheduler_hu_b50652bfa915e2ea.webp 768w,/2018/11/cheap-cron/scheduler_hu_4245bc81e840ba03.webp 1024w,/2018/11/cheap-cron/scheduler.png 1446w" src="scheduler.png"/> 
</figure>

<p>I can verify that this workflow indeed works for all my bots: the crons have been running successfully at the specified times, with the VM up for only a couple minutes each run!</p>
<figure>

    <img loading="lazy" srcset="/2018/11/cheap-cron/working_hu_ff6618b7926cdd5b.webp 320w,/2018/11/cheap-cron/working_hu_b32d4c5569f2af3a.webp 768w,/2018/11/cheap-cron/working_hu_4917f0b4d5f52f8a.webp 1024w,/2018/11/cheap-cron/working.png 1790w" src="working.png"/> 
</figure>

<h2 id="crunching-the-numbers">Crunching the Numbers</h2>
<p>This approach incorporates many different Google products. Is it <em>actually</em> cheaper than just maintaining a simple $5/month server? Let&rsquo;s calculate the monthly cost of all these services.</p>
<p>Assuming that we run a scheduled task every 4 hours, and the server is up for 2 minutes each time (i.e. 12 minutes of uptime a day):</p>
<ul>
<li><strong>Compute Engine</strong>: A preemptible n1-standard-1 is <a href="https://cloud.google.com/compute/pricing">$0.01 an hour</a>. <code>$0.01 / 60 * 12 * 30 = $0.06</code></li>
<li><strong>VM Persistent Disk</strong>: Each GB of storage for a VM costs <a href="https://cloud.google.com/compute/pricing#disk">$0.04/month</a>, and the minimum storage size is 10GB. <code>$0.04 * 10 = $0.40</code></li>
<li><strong>Cloud Scheduler</strong>: Each rule is <a href="https://cloud.google.com/scheduler/pricing">$0.10/month</a>, and there are both a start and a stop rule. <code>$0.10 * 2 = $0.20</code></li>
<li><strong>Cloud Functions</strong>: It takes about 60 seconds of total Function runtime to turn a VM on and off, and with the default 256MB provision, that runtime costs <a href="https://cloud.google.com/functions/pricing">$.000000463/100ms</a>. <code>0.000000463 * 10 * 60 * 12 * 30 = $0.10</code></li>
</ul>
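The bullets above can be reproduced as a few lines of arithmetic (a quick sanity check; the pricing figures come from the linked pricing pages and may change):

```python
# Monthly cost estimate for a task running 2 minutes every 4 hours.
vm = 0.01 / 60 * 12 * 30                     # preemptible n1-standard-1: $0.06
disk = 0.04 * 10                             # 10GB persistent disk: $0.40
scheduler = 0.10 * 2                         # start + stop Scheduler rules: $0.20
functions = 0.000000463 * 10 * 60 * 12 * 30  # Cloud Functions runtime: ~$0.10
total = vm + disk + scheduler + functions
print(f"${total:.2f}/month")  # $0.76/month
```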
<p>$0.06 + $0.40 + $0.20 + $0.10 = <strong>$0.76/month to run the scheduled task</strong>! That&rsquo;s not even counting the free tier bonuses if you just want to create one scheduled task; in that case, the only price you pay is the $0.06/mo for the VM. And even in the case where you run the task every hour (like in the images above), the cost is $1.24/month; still not bad.</p>
<p>It&rsquo;s worth noting that these pricing economics wouldn&rsquo;t have worked years ago. Back then, <a href="https://aws.amazon.com">Amazon Web Services</a>, the leader in web services, charged for a minimum of 1 hour every time a VM was booted. Google Compute Engine innovated by only requiring a minimum of 10 minutes, which is much better but would still have had unnecessary overhead (in this example, it would increase the monthly compute cost by $0.24, a 31% increase in the total). <a href="https://cloud.google.com/blog/products/gcp/extending-per-second-billing-in-google">As of September 2017</a>, Google Compute Engine charges a minimum of <strong>1 minute</strong>, which makes this workflow possible and cheap (AWS made the same change <a href="https://aws.amazon.com/blogs/aws/new-per-second-billing-for-ec2-instances-and-ebs-volumes/">a week earlier</a>).</p>
<p>It&rsquo;s also possible that similar workflows exist for AWS and <a href="https://azure.microsoft.com/en-us/">Azure Cloud</a>, although I&rsquo;m less familiar with those platforms (and they may not necessarily be better/cheaper). Sure, if you have a very simple task for practicing making bots in the cloud, the free tier of any cloud service might suffice (where you run the server all the time and schedule the cron on the server itself). If you&rsquo;re planning many scheduled tasks, then a centralized approach like my initial Kubernetes implementation might actually be more cost effective. But if you&rsquo;re somewhere <em>in between</em>, then giving each scheduled task its own VM makes more sense for both ease of use and cost-effectiveness. And there are still many further optimizations to be made (for example, allowing the script in the VM to ping an HTTP Cloud Function endpoint and shut <em>itself</em> off when complete, instead of using a scheduled cron rule).</p>
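As a sketch of that self-shutdown optimization: the bot could, as its final step, hit an HTTP-triggered Cloud Function that stops the instance it is running on. Everything below (the endpoint URL, the query parameters) is hypothetical; a real Function would mirror the stop Function from the tutorial.

```python
import urllib.parse
import urllib.request

# Hypothetical HTTP Cloud Function endpoint that stops a named instance.
STOP_URL = "https://us-central1-my-project.cloudfunctions.net/stopInstance"

def build_shutdown_request(zone: str, instance: str) -> str:
    """Build the URL the bot would call as its last action."""
    query = urllib.parse.urlencode({"zone": zone, "instance": instance})
    return f"{STOP_URL}?{query}"

# As the script's final step (network call commented out for this sketch):
url = build_shutdown_request("us-east1-b", "reddit-bot")
# urllib.request.urlopen(url)
print(url)
```

This way the instance only bills for the seconds the bot actually needs, rather than a fixed two-minute window.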
]]></content:encoded>
    </item>
    <item>
      <title>Problems with Predicting Post Performance on Reddit and Other Link Aggregators</title>
      <link>https://minimaxir.com/2018/09/modeling-link-aggregators/</link>
      <pubDate>Mon, 10 Sep 2018 09:15:00 -0700</pubDate>
      <guid>https://minimaxir.com/2018/09/modeling-link-aggregators/</guid>
      <description>The nature of algorithmic feeds like Reddit inherently leads to a survivorship bias: although users may recognize certain types of posts that appear on the front page, there are many more which follow the same patterns but fail.</description>
      <content:encoded><![CDATA[<p><a href="https://www.reddit.com">Reddit</a>, &ldquo;the front page of the internet&rdquo; is a link aggregator where anyone can submit links to cool happenings. Over the years, Reddit has expanded from just being a link aggregator, to allowing image and videos, and as of recently, hosting images and videos itself.</p>
<p>Reddit is broken down into subreddits, where each subreddit represents its own community around a particular interest, like <a href="https://www.reddit.com/r/aww">/r/aww</a> for pet photos and <a href="https://www.reddit.com/r/politics/">/r/politics</a> for U.S. politics. The posts on each subreddit are ranked by some function of both the time elapsed since the submission was made and the <em>score</em> of the submission, as determined by upvotes and downvotes from other users.</p>
<figure>

    <img loading="lazy" srcset="/2018/09/modeling-link-aggregators/reddit_aww_hu_15514c9daececa75.webp 320w,/2018/09/modeling-link-aggregators/reddit_aww_hu_38fdc85d80e9f49f.webp 768w,/2018/09/modeling-link-aggregators/reddit_aww.png 827w" src="reddit_aww.png"/> 
</figure>
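On that ranking function: Reddit open-sourced an earlier version of its &ldquo;hot&rdquo; ranking, and a sketch of that published formula is below. The production algorithm has since changed, so treat this as illustrative of the score-plus-recency idea rather than the current behavior.

```python
from math import log10

# Epoch offset used in Reddit's formerly open-sourced ranking code.
REDDIT_EPOCH = 1134028003

def hot(score: int, submitted_utc: float) -> float:
    """Reddit's formerly open-sourced 'hot' rank: log10 of the net score
    plus a recency bonus; a 10x score advantage is worth 45000 seconds
    (12.5 hours) of age."""
    order = log10(max(abs(score), 1))
    sign = 1 if score > 0 else -1 if score < 0 else 0
    seconds = submitted_utc - REDDIT_EPOCH
    return round(sign * order + seconds / 45000, 7)

# A newer 100-point post outranks a 1000-point post that is 13 hours older.
now = REDDIT_EPOCH + 10_000_000
assert hot(100, now) > hot(1000, now - 13 * 3600)
```

Note the logarithm: the first 10 upvotes matter as much as the next 90, which is part of why escaping the 1-point pile is so important.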

<p>There&rsquo;s also an intrinsic pride in having something you&rsquo;re responsible for providing to the community get lots of upvotes (the submitter also earns karma based on received upvotes, although karma is meaningless and doesn&rsquo;t provide any user benefits). But the reality is that even on the largest subreddits, submissions with 1 point (the default score for new submissions) are the most prominent, with some subreddits having <em>over half</em> of their submissions with only 1 point.</p>
<figure>

    <img loading="lazy" srcset="/2018/09/modeling-link-aggregators/reddit_dist_facet_hu_94559d39f676be08.webp 320w,/2018/09/modeling-link-aggregators/reddit_dist_facet_hu_ede8ccaaf5538573.webp 768w,/2018/09/modeling-link-aggregators/reddit_dist_facet_hu_940890d5e65baccb.webp 1024w,/2018/09/modeling-link-aggregators/reddit_dist_facet.png 1800w" src="reddit_dist_facet.png"/> 
</figure>

<p>The exposure from having a submission go viral on Reddit (especially on larger subreddits) can be valuable, especially if it&rsquo;s your own original content. As a result, there has been a lot of <a href="https://www.brandwatch.com/blog/how-to-get-on-the-front-page-of-reddit/">analysis</a>/<a href="https://www.reddit.com/r/starterpacks/comments/8rkfk9/reddit_front_page_starter_pack/">stereotyping</a> about which techniques help your submission make it to the top of the front page. But almost all claims of &ldquo;cracking&rdquo; the Reddit algorithm are <a href="https://en.wikipedia.org/wiki/Post_hoc_ergo_propter_hoc"><em>post hoc</em> rationalizations</a>, attributing the success of a single submission to things like submission timing and title verbiage after the fact. The nature of algorithmic feeds inherently leads to a <a href="https://en.wikipedia.org/wiki/Survivorship_bias">survivorship bias</a>: although users may recognize certain types of posts that appear on the front page, there are many more which follow the same patterns but fail, which makes modeling a successful post very tricky.</p>
<p>I&rsquo;ve touched on analyzing Reddit post performance <a href="https://minimaxir.com/2017/06/reddit-deep-learning/">before</a>, but let&rsquo;s give it another look and see if we can drill down on why Reddit posts do and do not do well.</p>
<h2 id="submission-timing">Submission Timing</h2>
<p>As with many US-based websites, the majority of Reddit users are most active during work hours (9 AM — 5 PM Eastern time weekdays). Most subreddits have submission patterns which fit accordingly.</p>
<figure>

    <img loading="lazy" srcset="/2018/09/modeling-link-aggregators/reddit_subreddit_prop_hu_6063ab19aff16cb2.webp 320w,/2018/09/modeling-link-aggregators/reddit_subreddit_prop_hu_4354ae33b8600c6a.webp 768w,/2018/09/modeling-link-aggregators/reddit_subreddit_prop_hu_5818614336fda8df.webp 1024w,/2018/09/modeling-link-aggregators/reddit_subreddit_prop.png 1800w" src="reddit_subreddit_prop.png"/> 
</figure>

<p>But what&rsquo;s interesting are the subreddits which <em>deviate</em> from that standard. Gaming subreddits (<a href="https://www.reddit.com/r/DestinyTheGame">/r/DestinyTheGame</a>, <a href="https://www.reddit.com/r/Overwatch">/r/Overwatch</a>) have short bursts of activity after a Tuesday game update/patch, game <em>communication</em> subreddits (<a href="https://www.reddit.com/r/Fireteams">/r/Fireteams</a>, <a href="https://www.reddit.com/r/RocketLeagueExchange">/r/RocketLeagueExchange</a>) are more active <em>outside</em> of work hours, as they assume you are playing the game at the time, and Not-Safe-For-Work subreddits (/r/dirtykikpals, /r/gonewild) are, incidentally, less active during work hours and more active late at night than other subreddits.</p>
<p>Whenever you make a submission to Reddit, the submission appears in the subreddit&rsquo;s <code>/new</code> queue of the most recent submissions, where hopefully kind souls will find your submission and upvote it if it&rsquo;s good.</p>
<figure>

    <img loading="lazy" srcset="/2018/09/modeling-link-aggregators/reddit_new_hu_6650be6d73851b91.webp 320w,/2018/09/modeling-link-aggregators/reddit_new.png 762w" src="reddit_new.png"/> 
</figure>

<p>However, if it falls off the first page of the <code>/new</code> queue, your submission might be as good as dead. As a result, there&rsquo;s an element of game theory to timing your submission if you want it to not become another 1-point submission. Is it better to submit during peak hours when more users may see the submission before it falls off of <code>/new</code>? Is it better to submit <em>before</em> peak usage since there will be less competition, then continue the momentum once it hits the front page?</p>
<p>Here&rsquo;s a look at the median post performance at each given time slot for top subreddits:</p>
<figure>

    <img loading="lazy" srcset="/2018/09/modeling-link-aggregators/reddit_subreddit_hr_doy_hu_cb9c5ba898252674.webp 320w,/2018/09/modeling-link-aggregators/reddit_subreddit_hr_doy_hu_8ba4a17a13989a31.webp 768w,/2018/09/modeling-link-aggregators/reddit_subreddit_hr_doy_hu_a08bfb9858ec4480.webp 1024w,/2018/09/modeling-link-aggregators/reddit_subreddit_hr_doy.png 1800w" src="reddit_subreddit_hr_doy.png"/> 
</figure>

<p>As the earlier distribution chart implied, the median score is around 1-2 for most subreddits, and that&rsquo;s consistent across all time slots. Some subreddits with higher medians, like /r/me_irl, do appear to have a <em>slight</em> benefit when posting before peak activity. When focusing on subreddits with high overall median scores, the difference is more explicit.</p>
<figure>

    <img loading="lazy" srcset="/2018/09/modeling-link-aggregators/reddit_subreddit_highmedian_hu_2730023d99e9e0d9.webp 320w,/2018/09/modeling-link-aggregators/reddit_subreddit_highmedian_hu_78be513d900d66b5.webp 768w,/2018/09/modeling-link-aggregators/reddit_subreddit_highmedian_hu_da4a41445f75e1.webp 1024w,/2018/09/modeling-link-aggregators/reddit_subreddit_highmedian.png 1800w" src="reddit_subreddit_highmedian.png"/> 
</figure>

<p>Subreddits like /r/PrequelMemes and /r/The_Donald <em>definitely</em> have better performance on average when posts are made before peak activity! Posting before peak usage <em>does</em> appear to be a viable strategy; however, for the majority of subreddits it doesn&rsquo;t make much of a difference.</p>
<h2 id="submission-titles">Submission Titles</h2>
<p>Each Reddit subreddit has their own vocabulary and topics of discussion. Let&rsquo;s break down text by subreddit by looking at the 75th percentile for score on posts containing a given two-word phrase:</p>
<figure>

    <img loading="lazy" srcset="/2018/09/modeling-link-aggregators/reddit_subreddit_topbigrams_hu_5d8f080824cf057d.webp 320w,/2018/09/modeling-link-aggregators/reddit_subreddit_topbigrams_hu_2870270c6078715e.webp 768w,/2018/09/modeling-link-aggregators/reddit_subreddit_topbigrams_hu_9edc52c78d8fe6ca.webp 1024w,/2018/09/modeling-link-aggregators/reddit_subreddit_topbigrams.png 1800w" src="reddit_subreddit_topbigrams.png"/> 
</figure>

<p>The one trend consistent across all subreddits is the effectiveness of first-person pronouns (<em>I/my</em>) and original content (<em>fan art</em>). Other than that, the vocabulary and sentiment of successful posts are very specific to each subreddit and the culture it represents; there are no universal, guaranteed-success memes.</p>
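A minimal sketch of this kind of aggregation, assuming a list of (title, score) pairs pulled from the submission data (the sample titles and the 4-post minimum sample size are illustrative):

```python
import statistics
from collections import defaultdict

def bigrams(title: str) -> set:
    """Unique two-word phrases in a title, lowercased."""
    words = title.lower().split()
    return {" ".join(pair) for pair in zip(words, words[1:])}

def p75_score_by_bigram(posts) -> dict:
    """posts: iterable of (title, score) pairs.
    Returns the 75th-percentile score for each two-word phrase."""
    scores = defaultdict(list)
    for title, score in posts:
        for phrase in bigrams(title):
            scores[phrase].append(score)
    return {
        phrase: statistics.quantiles(vals, n=4)[2]  # third quartile
        for phrase, vals in scores.items()
        if len(vals) >= 4  # require a minimum sample size
    }

posts = [("My fan art of a cat", 40), ("more fan art", 10),
         ("new fan art", 25), ("OC fan art", 55)]
print(p75_score_by_bigram(posts)["fan art"])
```

Using a high percentile instead of the median captures a phrase's upside potential, since most posts land at 1-2 points regardless of wording.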
<h2 id="can-deep-learning-predict-post-performance">Can Deep Learning Predict Post Performance?</h2>
<p>Some might think &ldquo;oh hey, this is an arbitrary statistical problem, you can just build an AI to solve it!&rdquo; So, for the sake of argument, I did.</p>
<p>Instead of using Reddit data to build a deep learning model, we&rsquo;ll use data from <a href="https://news.ycombinator.com">Hacker News</a>, another link aggregator similar to Reddit, with a strong focus on technology and startup entrepreneurship. The distribution of post scores, the submission timings, the upvoting, and the front-page ranking system are all similar to Reddit&rsquo;s.</p>
<figure>

    <img loading="lazy" srcset="/2018/09/modeling-link-aggregators/hn_hu_ad0b8ce0803e73ea.webp 320w,/2018/09/modeling-link-aggregators/hn_hu_9592bce993e10dcd.webp 768w,/2018/09/modeling-link-aggregators/hn_hu_c329d6412551f993.webp 1024w,/2018/09/modeling-link-aggregators/hn.png 1520w" src="hn.png"/> 
</figure>

<p>The titles of Hacker News submissions are also shorter (80 characters max vs. Reddit&rsquo;s 300-character max) and in concise English (no memes/shitposts allowed), which should help the model learn the title syntax and identify high-impact keywords more easily. Like Reddit, the score data is heavily skewed, with most HN submissions at 1-2 points, so typical model training will quickly converge but predict that <em>every</em> submission has a score of 1, which isn&rsquo;t helpful!</p>
<p>By constructing a model with <em>many</em> deep learning tricks in <a href="https://keras.io">Keras</a>/<a href="https://www.tensorflow.org">TensorFlow</a> to prevent model cheating, and training on <em>hundreds of thousands</em> of HN submissions (using post title, day-of-week, hour, and link domain like <code>github.com</code> as model features), the model does converge and finds some signal among the noise (training R<sup>2</sup> ~ 0.55 when trained for 50 epochs). However, it fails to offer any valuable predictions on new, unseen posts (test R<sup>2</sup> <em>&lt; 0.00</em>) because it falls into the same human biases regarding titles: it saw submissions whose titles did very well during training, but it can&rsquo;t isolate the random chance that determines why, of two similar submissions, one goes viral while the other does not.</p>
<figure>

    <img loading="lazy" srcset="/2018/09/modeling-link-aggregators/hn_test_hu_75e647e4de235ee0.webp 320w,/2018/09/modeling-link-aggregators/hn_test.png 485w" src="hn_test.png"/> 
</figure>

<p>I&rsquo;ve made the Keras/TensorFlow model training code available in <a href="https://www.kaggle.com/minimaxir/hacker-news-submission-score-predictor/notebook">this Kaggle Notebook</a> if you want to fork it and try to improve the model.</p>
<h2 id="other-potential-modeling-factors">Other Potential Modeling Factors</h2>
<p>The deep learning model above makes optimistic assumptions about the underlying data, including that each post behaves independently, and the included features are the sole features which determine the score. These assumptions are questionable.</p>
<p>The simple model forgoes the content of the submission itself, which is hard to retrieve for hundreds of thousands of data points. On Hacker News that&rsquo;s mostly OK, since most submissions are links/articles whose titles accurately reflect their content, although occasionally there are idiosyncratic short titles which do the opposite. On Reddit, looking at the content is obviously necessary for image/video-oriented subreddits, and that content is hard to gather and analyze at scale.</p>
<p>A very important concept of post performance is <em>momentum</em>. A post having a high score is a positive signal in itself, which begets more votes (a famous Reddit problem is brigading from /r/all which can cause submission scores to skyrocket). If the front page of a subreddit has a large number of high-performing posts, they might also suppress posts coming out of the <code>/new</code> queue because the score threshold is much higher. A simple model may not be able to capture these impacts; the model would need to incorporate the <em>state of the front page</em> at the time of posting.</p>
<p>Some also try to manipulate upvotes. Reddit became famous for adding the rule &ldquo;asking for upvotes is a violation of intergalactic law&rdquo; to their <a href="https://www.reddithelp.com/en/categories/rules-reporting/account-and-community-restrictions/what-constitutes-vote-cheating-or">Content Policy</a>, although some subreddits do it anyway <a href="https://www.reddit.com/r/TheoryOfReddit/comments/5qqrod/for_years_reddit_told_us_that_saying_upvote_this/">without consequence</a>. On Reddit, obvious spam posts can be downvoted to immediately counteract illicit upvotes. Hacker News has a <a href="https://news.ycombinator.com/newsfaq.html">similar don&rsquo;t-upvote rule</a>, although there aren&rsquo;t downvotes, just a flagging mechanism which quickly neutralizes spam/misleading posts. In general, there&rsquo;s no <em>legitimate</em> reason to highlight your own submission immediately after it&rsquo;s posted (except for Reddit&rsquo;s AMAs). Fortunately, gaming the system is less impactful on Reddit and Hacker News due to their sheer size and countermeasures, but it&rsquo;s a good example of potential user behavior that makes modeling post performance difficult, and hopefully link aggregators of the future aren&rsquo;t susceptible to such shenanigans.</p>
<h2 id="do-we-really-to-predict-post-score">Do We Really Need to Predict Post Score?</h2>
<p>Let&rsquo;s say you are submitting original content to Reddit or your own tech project to Hacker News. More points means a higher ranking means more exposure for your link, right? Not exactly. As noted from Reddit/HN screenshots above, the scores of popular submissions are all over the place ranking-wise, having been affected by age penalties.</p>
<p>In practical terms, from my own purely anecdotal experience, submissions at a top ranking receive <em>substantially</em> more clickthroughs despite being spatially close on the page to others.</p>
<p><span><blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">&hellip;and now traffic at #3.<br><br>Placement is absurdly important for search engines/social media sites. Difference between #1 and #3 is dramatic. <a href="https://t.co/nGjWJBx6dU">pic.twitter.com/nGjWJBx6dU</a></p>— Max Woolf (@minimaxir) <a href="https://twitter.com/minimaxir/status/877219784907149316?ref_src=twsrc%5Etfw">June 20, 2017</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></span></p>
<p>In <a href="https://twitter.com/minimaxir/status/877219784907149316">that case</a>, falling from #1 to #3 <em>immediately halved</em> the referral traffic coming from Hacker News.</p>
<p>Therefore, an ideal link aggregator predictive model to maximize clicks should try to predict the <em>rank</em> of a submission (max rank, average rank over <em>n</em> period, etc.), not necessarily the score it receives. You could theoretically create such a model by taking a snapshot of a Reddit subreddit/the front page of Hacker News every minute or so, recording each post&rsquo;s position at the time of the snapshot. As mentioned earlier, the snapshots could also be used as a model feature to identify whether the front page is active or stale. Unfortunately, snapshots can&rsquo;t be retrieved retroactively, and storing, processing, and analyzing snapshots at scale is a difficult and <em>expensive</em> feat of data engineering.</p>
<p>Presumably Reddit&rsquo;s data scientists would be incorporating submission position as a part of their data analytics and modeling, but after inspecting what&rsquo;s sent to Reddit&rsquo;s servers when you perform an action like upvoting, I wasn&rsquo;t able to find a sent position value when upvoting from the feed: only the post score and post upvote percentage at the time of the action were sent.</p>
<figure>

    <img loading="lazy" srcset="/2018/09/modeling-link-aggregators/chrome_hu_4b758c7e3fe42881.webp 320w,/2018/09/modeling-link-aggregators/chrome_hu_29f25ed9207a6d8f.webp 768w,/2018/09/modeling-link-aggregators/chrome_hu_f6617992d5fb908c.webp 1024w,/2018/09/modeling-link-aggregators/chrome.png 1442w" src="chrome.png"/> 
</figure>

<p>In this example, I upvoted the <code>Fact are facts</code> submission at position #5: we&rsquo;d expect a value between <code>3</code> and <code>5</code> to be sent with the post metadata within the analytics payload, but that&rsquo;s not the case.</p>
<p>Optimizing ranking instead of a tangible metric or classification accuracy is a relatively underdiscussed field of modern data science (besides <a href="https://en.wikipedia.org/wiki/Search_engine_optimization">SEO</a> for getting the top spot on a Google search), and it would be interesting to dive deeper into it for other applications.</p>
<h2 id="in-the-future">In the future</h2>
<p>The moral of this post is that you should not take it personally if a submission fails to hit the front page. It doesn&rsquo;t necessarily mean it&rsquo;s bad. Conversely, if a post does well, don’t assume that similar posts will do just as well. There&rsquo;s a lot of quality content that falls through the cracks due to dumb luck. Fortunately, both Reddit and Hacker News allow reposts, which helps alleviate this particular problem.</p>
<p>There&rsquo;s still a lot that can be done to more deterministically predict the behavior of these algorithmic feeds. There&rsquo;s also room to help make these link aggregators more <em>fair</em>. Unfortunately, there&rsquo;s even more undiscovered ways to game these algorithms, and we&rsquo;ll see how things play out.</p>
<hr>
<p><em>You can view the BigQuery queries used to get the Reddit and Hacker News data, plus the R and ggplot2 used to create the data visualizations, in <a href="http://minimaxir.com/notebooks/modeling-link-aggregators/">this R Notebook</a>. You can also view the images/code used for this post in <a href="https://github.com/minimaxir/modeling-link-aggregators">this GitHub repository</a></em>.</p>
<p><em>You are free to use the data visualizations from this article however you wish, but it would be greatly appreciated if proper attribution is given to this article and/or myself!</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>Analyzing IMDb Data The Intended Way, with R and ggplot2</title>
      <link>https://minimaxir.com/2018/07/imdb-data-analysis/</link>
      <pubDate>Mon, 16 Jul 2018 09:45:00 -0700</pubDate>
      <guid>https://minimaxir.com/2018/07/imdb-data-analysis/</guid>
      <description>For IMDb&amp;rsquo;s big-but-not-big data, you have to play with the data smartly, and both R and ggplot2 have neat tricks to do just that.</description>
      <content:encoded><![CDATA[<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube-nocookie.com/embed/P4_zSfoTM80?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<p><a href="https://www.imdb.com">IMDb</a>, the Internet Movie Database, has been a popular source for data analysis and visualizations over the years. The combination of user ratings for movies and detailed movie metadata have always been fun to <a href="http://minimaxir.com/2016/01/movie-revenue-ratings/">play with</a>.</p>
<p>There are a number of tools to help get IMDb data, such as <a href="https://github.com/alberanid/imdbpy">IMDbPY</a>, which makes it easy to programmatically scrape IMDb by pretending to be a website user and extracting the relevant data from the page&rsquo;s HTML output. While it <em>works</em>, web scraping public data is a legal gray area; many large websites have Terms of Service which forbid scraping, and can potentially send a DMCA take-down notice to websites redistributing scraped data.</p>
<p>IMDb has <a href="https://help.imdb.com/article/imdb/general-information/can-i-use-imdb-data-in-my-software/G5JTRESSHJBBHTGX">data licensing terms</a> which forbid scraping and require attribution in the form of an <strong>Information courtesy of IMDb (<a href="http://www.imdb.com">http://www.imdb.com</a>). Used with permission.</strong> statement, and has also <a href="https://www.kaggle.com/tmdb/tmdb-movie-metadata/home">DMCAed a Kaggle IMDb dataset</a> to drive the point home.</p>
<p>However, there is good news! IMDb publishes an <a href="https://www.imdb.com/interfaces/">official dataset</a> for casual data analysis! And it&rsquo;s now very accessible, just choose a dataset and download (now with no hoops to jump through), and the files are in the standard <a href="https://en.wikipedia.org/wiki/Tab-separated_values">TSV format</a>.</p>
<figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/datasets_hu_fb4ad2ef1d7c9e7f.webp 320w,/2018/07/imdb-data-analysis/datasets_hu_a5155a40c73aa984.webp 768w,/2018/07/imdb-data-analysis/datasets.png 926w" src="datasets.png"/> 
</figure>

<p>The uncompressed files are pretty large; not &ldquo;big data&rdquo; large (it fits into computer memory), but Excel will explode if you try to open them in it. You have to play with the data <em>smartly</em>, and both <a href="https://www.r-project.org">R</a> and <a href="https://ggplot2.tidyverse.org/reference/index.html">ggplot2</a> have neat tricks to do just that.</p>
<h2 id="first-steps">First Steps</h2>
<p>R is a popular programming language for statistical analysis. One of the most popular collections of external packages is <code>tidyverse</code>, which automatically imports the <code>ggplot2</code> data visualization library and other useful packages which we&rsquo;ll get to one-by-one. We&rsquo;ll also load <code>scales</code>, which we&rsquo;ll use later for prettier number formatting. First, we&rsquo;ll load these packages:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">scales</span><span class="p">)</span>
</span></span></code></pre></div><p>And now we can load a TSV downloaded from IMDb using the <code>read_tsv</code> function from <code>readr</code> (a tidyverse package), which does what the name implies, at a much faster speed than base R (+ a couple other parameters to handle data encoding). Let&rsquo;s start with the <code>ratings</code> file:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">df_ratings</span> <span class="o">&lt;-</span> <span class="nf">read_tsv</span><span class="p">(</span><span class="s">&#39;title.ratings.tsv&#39;</span><span class="p">,</span> <span class="n">na</span> <span class="o">=</span> <span class="s">&#34;\\N&#34;</span><span class="p">,</span> <span class="n">quote</span> <span class="o">=</span> <span class="s">&#39;&#39;</span><span class="p">)</span></span></span></code></pre></div>
<p>We can preview what&rsquo;s in the loaded data using <code>dplyr</code> (a tidyverse package), which is what we&rsquo;ll be using to manipulate data for this analysis. dplyr allows you to pipe commands, making it easy to create a sequence of manipulation commands. For now, we&rsquo;ll use <code>head()</code>, which displays the top few rows of the data frame.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">df_ratings</span> <span class="o">%&gt;%</span> <span class="nf">head</span><span class="p">()</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/ratings_hu_5c1fcf56a5289876.webp 320w,/2018/07/imdb-data-analysis/ratings_hu_cf3fece2f9c850ca.webp 768w,/2018/07/imdb-data-analysis/ratings.png 930w" src="ratings.png"/> 
</figure>

<p>Each of the <strong>873k rows</strong> corresponds to a single movie, with an ID for the movie, its average rating (from 1 to 10), and the number of votes which contribute to that average. Since we have two numeric variables, why not test out ggplot2 by creating a scatterplot mapping them? ggplot2 takes in a data frame and names of columns as aesthetics, then you specify the type of shape to plot (a &ldquo;geom&rdquo;). Passing the plot to <code>ggsave</code> saves it as a standalone, high-quality data visualization.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">plot</span> <span class="o">&lt;-</span> <span class="nf">ggplot</span><span class="p">(</span><span class="n">df_ratings</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">numVotes</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">averageRating</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">geom_point</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">ggsave</span><span class="p">(</span><span class="s">&#34;imdb-0.png&#34;</span><span class="p">,</span> <span class="n">plot</span><span class="p">,</span> <span class="n">width</span> <span class="o">=</span> <span class="m">4</span><span class="p">,</span> <span class="n">height</span> <span class="o">=</span> <span class="m">3</span><span class="p">)</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/imdb-0_hu_6866c079d670893c.webp 320w,/2018/07/imdb-data-analysis/imdb-0_hu_dddd194229265d79.webp 768w,/2018/07/imdb-data-analysis/imdb-0_hu_1d852e43e8a54dea.webp 1024w,/2018/07/imdb-data-analysis/imdb-0.png 1200w" src="imdb-0.png"/> 
</figure>

<p>Here are nearly <em>1 million</em> points on a single chart; definitely don&rsquo;t try to do that in Excel! However, it&rsquo;s not a <em>useful</em> chart since all the points are opaque and we can&rsquo;t see the spatial density of the points. One approach to fix this issue is to create a heat map of points, which ggplot2 can do natively with <code>geom_bin2d</code>. We can color the heat map with the <a href="https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html">viridis</a> colorblind-friendly palettes <a href="https://ggplot2.tidyverse.org/reference/scale_viridis.html">just introduced</a> into ggplot2. We should also tweak the axes: the x-axis should be scaled logarithmically with <code>scale_x_log10</code> since there are many movies with high numbers of votes, and we can format the axis labels with the <code>comma</code> function from the <code>scales</code> package (the fill scale can be formatted with <code>comma</code> too). For the y-axis, we can add an explicit break for each rating; R can do this neatly by setting the breaks to <code>1:10</code>. Putting it all together:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">plot</span> <span class="o">&lt;-</span> <span class="nf">ggplot</span><span class="p">(</span><span class="n">df_ratings</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">numVotes</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">averageRating</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">geom_bin2d</span><span class="p">()</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">scale_x_log10</span><span class="p">(</span><span class="n">labels</span> <span class="o">=</span> <span class="n">comma</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">scale_y_continuous</span><span class="p">(</span><span class="n">breaks</span> <span class="o">=</span> <span class="m">1</span><span class="o">:</span><span class="m">10</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">scale_fill_viridis_c</span><span class="p">(</span><span class="n">labels</span> <span class="o">=</span> <span class="n">comma</span><span class="p">)</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/imdb-1_hu_afa4c2e2f89a47f2.webp 320w,/2018/07/imdb-data-analysis/imdb-1_hu_fb49622c671e7e.webp 768w,/2018/07/imdb-data-analysis/imdb-1_hu_fe5886baf1a1a113.webp 1024w,/2018/07/imdb-data-analysis/imdb-1.png 1200w" src="imdb-1.png"/> 
</figure>
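<p>Under the hood, <code>geom_bin2d</code> simply counts the points falling into each 2D bin; here&rsquo;s a minimal Python sketch of that binning with a log10 x-axis like the plot above (the points and bin widths are arbitrary; ggplot2 chooses its own bins):</p>

```python
# Count (numVotes, averageRating) points into 2-D bins, with the x
# dimension binned on a log10 scale. Data points are made up.
import math
from collections import Counter

points = [(10, 6.1), (1000, 7.2), (1200, 7.4), (100000, 8.0)]

def bin2d(points, x_bins_per_decade=1, y_bin=1.0):
    counts = Counter()
    for x, y in points:
        bx = math.floor(math.log10(x) * x_bins_per_decade)  # log-scaled x bin
        by = math.floor(y / y_bin)                          # linear y bin
        counts[(bx, by)] += 1
    return counts

counts = bin2d(points)
# bin (3, 7) collects the two movies with ~10^3 votes and a rating in [7, 8)
```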

<p>Not bad, although it unfortunately confirms that IMDb follows a <a href="https://tvtropes.org/pmwiki/pmwiki.php/Main/FourPointScale">Four Point Scale</a> where average ratings tend to fall between 6 and 9.</p>
<h2 id="mapping-movies-to-ratings">Mapping Movies to Ratings</h2>
<p>You may be asking &ldquo;which ratings correspond to which movies?&rdquo; That&rsquo;s what the <code>tconst</code> field is for. But first, let&rsquo;s load the title data from <code>title.basics.tsv</code> into <code>df_basics</code> and take a look as before.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">df_basics</span> <span class="o">&lt;-</span> <span class="nf">read_tsv</span><span class="p">(</span><span class="s">&#39;title.basics.tsv&#39;</span><span class="p">,</span> <span class="n">na</span> <span class="o">=</span> <span class="s">&#34;\\N&#34;</span><span class="p">,</span> <span class="n">quote</span> <span class="o">=</span> <span class="s">&#39;&#39;</span><span class="p">)</span>
</span></span></code></pre></div><p><figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/basics1_hu_fdcb6a5f4e7311e5.webp 320w,/2018/07/imdb-data-analysis/basics1_hu_e15b78e5bbe944b8.webp 768w,/2018/07/imdb-data-analysis/basics1_hu_2e217e73acfcd9ff.webp 1024w,/2018/07/imdb-data-analysis/basics1.png 1350w" src="basics1.png"/> 
</figure>

<figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/basics2_hu_a64ae979748aa9ab.webp 320w,/2018/07/imdb-data-analysis/basics2_hu_a83799eaf31e4743.webp 768w,/2018/07/imdb-data-analysis/basics2_hu_21a8fb679f3ec4e9.webp 1024w,/2018/07/imdb-data-analysis/basics2.png 1374w" src="basics2.png"/> 
</figure>
</p>
<p>We have some neat movie metadata. Notably, this table has a <code>tconst</code> field as well. Therefore, we can <em>join</em> the two tables together, adding the movie information to the corresponding row in the ratings table (in this case, a left join is more appropriate than an inner/full join).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">df_ratings</span> <span class="o">&lt;-</span> <span class="n">df_ratings</span> <span class="o">%&gt;%</span> <span class="nf">left_join</span><span class="p">(</span><span class="n">df_basics</span><span class="p">)</span>
</span></span></code></pre></div><p>Runtime minutes sounds interesting. Could there be a relationship between the length of a movie and its average rating on IMDb? Let&rsquo;s make a heat map plot again, but with a few tweaks. With the new metadata, we can <code>filter</code> the table to remove bad points; let&rsquo;s keep only movies (as the IMDb data also contains <em>television show data</em>), with a runtime &lt; 3 hours, which have received at least 10 votes from users (to remove extraneous movies). The x-axis should be tweaked to display the minute values in hours. The viridis fill palette can be changed to another one in the family (I personally like <code>inferno</code>).</p>
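<p>For intuition, here&rsquo;s what the left join and the <code>filter</code> are doing, sketched in plain Python with made-up rows (only the field names mirror the IMDb data):</p>

```python
# A left join keeps every row of the left table and fills in matching
# columns from the right table where the key exists. Toy rows below.
df_ratings = [
    {"tconst": "tt1", "averageRating": 7.0, "numVotes": 500},
    {"tconst": "tt2", "averageRating": 5.5, "numVotes": 4},
]
df_basics = {
    "tt1": {"titleType": "movie", "runtimeMinutes": 120},
    # tt2 intentionally absent: a left join still keeps its ratings row
}

def left_join(rows, lookup, key="tconst"):
    """Merge matching metadata into each row; unmatched rows pass through."""
    return [{**r, **lookup.get(r[key], {})} for r in rows]

joined = left_join(df_ratings, df_basics)

# The filter criteria from the plot: movies only, < 3 hours, >= 10 votes.
filtered = [r for r in joined
            if r.get("titleType") == "movie"
            and r.get("runtimeMinutes", 999) < 180
            and r["numVotes"] >= 10]
```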
<p>More importantly, let&rsquo;s discuss plot theming. If you want a minimalistic theme, add a <code>theme_minimal</code> to the plot, and you can pass a <code>base_family</code> to change the default font on the plot and a <code>base_size</code> to change the font size. The <code>labs</code> function lets you add labels to the plot (which you should <em>always</em> do); you have your <code>title</code>, <code>x</code>, and <code>y</code> parameters, but you can also add a <code>subtitle</code>, a <code>caption</code> for attribution, and a <code>color</code>/<code>fill</code> to name the scale. Putting it all together:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">plot</span> <span class="o">&lt;-</span> <span class="nf">ggplot</span><span class="p">(</span><span class="n">df_ratings</span> <span class="o">%&gt;%</span> <span class="nf">filter</span><span class="p">(</span><span class="n">runtimeMinutes</span> <span class="o">&lt;</span> <span class="m">180</span><span class="p">,</span> <span class="n">titleType</span> <span class="o">==</span> <span class="s">&#34;movie&#34;</span><span class="p">,</span> <span class="n">numVotes</span> <span class="o">&gt;=</span> <span class="m">10</span><span class="p">),</span> <span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">runtimeMinutes</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">averageRating</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">geom_bin2d</span><span class="p">()</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">scale_x_continuous</span><span class="p">(</span><span class="n">breaks</span> <span class="o">=</span> <span class="nf">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">180</span><span class="p">,</span> <span class="m">60</span><span class="p">),</span> <span class="n">labels</span> <span class="o">=</span> <span class="m">0</span><span class="o">:</span><span class="m">3</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">scale_y_continuous</span><span class="p">(</span><span class="n">breaks</span> <span class="o">=</span> <span class="m">0</span><span class="o">:</span><span class="m">10</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">scale_fill_viridis_c</span><span class="p">(</span><span class="n">option</span> <span class="o">=</span> <span class="s">&#34;inferno&#34;</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">comma</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">theme_minimal</span><span class="p">(</span><span class="n">base_family</span> <span class="o">=</span> <span class="s">&#34;Source Sans Pro&#34;</span><span class="p">,</span> <span class="n">base_size</span> <span class="o">=</span> <span class="m">8</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">labs</span><span class="p">(</span><span class="n">title</span> <span class="o">=</span> <span class="s">&#34;Relationship between Movie Runtime and Average Mobie Rating&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">               <span class="n">subtitle</span> <span class="o">=</span> <span class="s">&#34;Data from IMDb retrieved July 4th, 2018&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">               <span class="n">x</span> <span class="o">=</span> <span class="s">&#34;Runtime (Hours)&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">               <span class="n">y</span> <span class="o">=</span> <span class="s">&#34;Average User Rating&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">               <span class="n">caption</span> <span class="o">=</span> <span class="s">&#34;Max Woolf — minimaxir.com&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">               <span class="n">fill</span> <span class="o">=</span> <span class="s">&#34;# Movies&#34;</span><span class="p">)</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/imdb-2b_hu_37c6091878dca7a3.webp 320w,/2018/07/imdb-data-analysis/imdb-2b_hu_42f5a5f9d2e7967e.webp 768w,/2018/07/imdb-data-analysis/imdb-2b_hu_b4f485eff14f2484.webp 1024w,/2018/07/imdb-data-analysis/imdb-2b.png 1200w" src="imdb-2b.png"/> 
</figure>

<p>Now that&rsquo;s pretty nice-looking for only a few lines of code! It&rsquo;s unhelpful, though, as there doesn&rsquo;t appear to be a correlation.</p>
<p><em>(Note: for the rest of this post, the theming/labels code will be omitted for convenience)</em></p>
<p>How about movie ratings vs. the year the movie was made? It&rsquo;s a similar plot code-wise to the one above (one perk about <code>ggplot2</code> is that there&rsquo;s no shame in reusing chart code!), but we can add a <code>geom_smooth</code>, which adds a nonparametric trendline with confidence bands for the trend; since we have a large amount of data, the bands are very tight. We can also fix the problem of &ldquo;empty&rdquo; bins by setting the color fill scale to logarithmic scaling. And since we&rsquo;re adding a black trendline, let&rsquo;s change the viridis palette to <code>plasma</code> for better contrast.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">plot</span> <span class="o">&lt;-</span> <span class="nf">ggplot</span><span class="p">(</span><span class="n">df_ratings</span> <span class="o">%&gt;%</span> <span class="nf">filter</span><span class="p">(</span><span class="n">titleType</span> <span class="o">==</span> <span class="s">&#34;movie&#34;</span><span class="p">,</span> <span class="n">numVotes</span> <span class="o">&gt;=</span> <span class="m">10</span><span class="p">),</span> <span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">startYear</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">averageRating</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">geom_bin2d</span><span class="p">()</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">geom_smooth</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="s">&#34;black&#34;</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">scale_x_continuous</span><span class="p">()</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">scale_y_continuous</span><span class="p">(</span><span class="n">breaks</span> <span class="o">=</span> <span class="m">1</span><span class="o">:</span><span class="m">10</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">scale_fill_viridis_c</span><span class="p">(</span><span class="n">option</span> <span class="o">=</span> <span class="s">&#34;plasma&#34;</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">comma</span><span class="p">,</span> <span class="n">trans</span> <span class="o">=</span> <span class="s">&#39;log10&#39;</span><span class="p">)</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/imdb-4_hu_fdf90cbdd2dd2c7e.webp 320w,/2018/07/imdb-data-analysis/imdb-4_hu_1c45abe215427c09.webp 768w,/2018/07/imdb-data-analysis/imdb-4_hu_62d0feb034e8b054.webp 1024w,/2018/07/imdb-data-analysis/imdb-4.png 1200w" src="imdb-4.png"/> 
</figure>

<p>Unfortunately, this trend hasn&rsquo;t changed much either, although the presence of average ratings outside the Four Point Scale has increased over time.</p>
<h2 id="mapping-lead-actors-to-movies">Mapping Lead Actors to Movies</h2>
<p>Now that we have a handle on working with the IMDb data, let&rsquo;s try playing with the larger datasets. Since they take up a lot of computer memory, we only want to persist data we might actually use. After looking at the schema provided with the official datasets, the only really useful metadata about the actors is their birth year, so let&rsquo;s load that, but only keep actors/actresses (using the fast <code>str_detect</code> function from <code>stringr</code>, another tidyverse package) and the relevant fields.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">df_actors</span> <span class="o">&lt;-</span> <span class="nf">read_tsv</span><span class="p">(</span><span class="s">&#39;name.basics.tsv&#39;</span><span class="p">,</span> <span class="n">na</span> <span class="o">=</span> <span class="s">&#34;\\N&#34;</span><span class="p">,</span> <span class="n">quote</span> <span class="o">=</span> <span class="s">&#39;&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">                <span class="nf">filter</span><span class="p">(</span><span class="nf">str_detect</span><span class="p">(</span><span class="n">primaryProfession</span><span class="p">,</span> <span class="s">&#34;actor|actress&#34;</span><span class="p">))</span>  <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">                <span class="nf">select</span><span class="p">(</span><span class="n">nconst</span><span class="p">,</span> <span class="n">primaryName</span><span class="p">,</span> <span class="n">birthYear</span><span class="p">)</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/actor_hu_f86030d94734f51e.webp 320w,/2018/07/imdb-data-analysis/actor_hu_58f7a4e4de86c210.webp 768w,/2018/07/imdb-data-analysis/actor.png 936w" src="actor.png"/> 
</figure>
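<p>The <code>str_detect</code> filter is at heart a regular-expression search; a plain-Python sketch with hypothetical rows:</p>

```python
# Keep only people whose profession string matches "actor" or "actress",
# then select the relevant fields. Rows are invented for illustration;
# the nconst/primaryName/primaryProfession names mirror name.basics.tsv.
import re

people = [
    {"nconst": "nm1", "primaryName": "A", "primaryProfession": "actor,producer"},
    {"nconst": "nm2", "primaryName": "B", "primaryProfession": "director"},
    {"nconst": "nm3", "primaryName": "C", "primaryProfession": "actress"},
]

pattern = re.compile(r"actor|actress")  # same alternation as the R code
df_actors = [{"nconst": p["nconst"], "primaryName": p["primaryName"]}
             for p in people if pattern.search(p["primaryProfession"])]
```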

<p>The principals dataset, the large 1.28GB TSV, is the most interesting. It&rsquo;s an unnested list of the credited persons in each movie, with an <code>ordering</code> indicating their rank (where <code>1</code> means first, <code>2</code> means second, etc.).</p>
<figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/principals_hu_e149270e85e6bbfe.webp 320w,/2018/07/imdb-data-analysis/principals_hu_d39d7c6fcd18929.webp 768w,/2018/07/imdb-data-analysis/principals_hu_56b42bde8cdb5364.webp 1024w,/2018/07/imdb-data-analysis/principals.png 1074w" src="principals.png"/> 
</figure>

<p>For this analysis, let&rsquo;s only look at the <strong>lead actors/actresses</strong>; specifically, for each movie (identified by the <code>tconst</code> value), filter the dataset to the row where the <code>ordering</code> value is the lowest (we can&rsquo;t simply take rank <code>1</code>, since the person at rank <code>1</code> may not necessarily be an actor/actress).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">df_principals</span> <span class="o">&lt;-</span> <span class="nf">read_tsv</span><span class="p">(</span><span class="s">&#39;title.principals.tsv&#39;</span><span class="p">,</span> <span class="n">na</span> <span class="o">=</span> <span class="s">&#34;\\N&#34;</span><span class="p">,</span> <span class="n">quote</span> <span class="o">=</span> <span class="s">&#39;&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">  <span class="nf">filter</span><span class="p">(</span><span class="nf">str_detect</span><span class="p">(</span><span class="n">category</span><span class="p">,</span> <span class="s">&#34;actor|actress&#34;</span><span class="p">))</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">  <span class="nf">select</span><span class="p">(</span><span class="n">tconst</span><span class="p">,</span> <span class="n">ordering</span><span class="p">,</span> <span class="n">nconst</span><span class="p">,</span> <span class="n">category</span><span class="p">)</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">  <span class="nf">group_by</span><span class="p">(</span><span class="n">tconst</span><span class="p">)</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">  <span class="nf">filter</span><span class="p">(</span><span class="n">ordering</span> <span class="o">==</span> <span class="nf">min</span><span class="p">(</span><span class="n">ordering</span><span class="p">))</span>
</span></span></code></pre></div><p>Both datasets have a <code>nconst</code> field, so let&rsquo;s join them together. And then join <em>that</em> to the ratings table earlier via <code>tconst</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">df_principals</span> <span class="o">&lt;-</span> <span class="n">df_principals</span> <span class="o">%&gt;%</span> <span class="nf">left_join</span><span class="p">(</span><span class="n">df_actors</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">df_ratings</span> <span class="o">&lt;-</span> <span class="n">df_ratings</span> <span class="o">%&gt;%</span> <span class="nf">left_join</span><span class="p">(</span><span class="n">df_principals</span><span class="p">)</span>
</span></span></code></pre></div><p>Now we have a fully denormalized dataset in <code>df_ratings</code>. Since we have both the movie release year and the birth year of the lead actor, we can infer <em>the age of the lead actor at the movie release</em>. With that goal, filter the data on the criteria we&rsquo;ve used for earlier data visualizations, keeping only rows which have an actor&rsquo;s birth year.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">df_ratings_movies</span> <span class="o">&lt;-</span> <span class="n">df_ratings</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">                        <span class="nf">filter</span><span class="p">(</span><span class="n">titleType</span> <span class="o">==</span> <span class="s">&#34;movie&#34;</span><span class="p">,</span> <span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">birthYear</span><span class="p">),</span> <span class="n">numVotes</span> <span class="o">&gt;=</span> <span class="m">10</span><span class="p">)</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">                        <span class="nf">mutate</span><span class="p">(</span><span class="n">age_lead</span> <span class="o">=</span> <span class="n">startYear</span> <span class="o">-</span> <span class="n">birthYear</span><span class="p">)</span>
</span></span></code></pre></div><p><figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/denorm1_hu_654cad39747efe47.webp 320w,/2018/07/imdb-data-analysis/denorm1_hu_eed6e992d7e214e3.webp 768w,/2018/07/imdb-data-analysis/denorm1_hu_dbde12b6453e4f09.webp 1024w,/2018/07/imdb-data-analysis/denorm1.png 1604w" src="denorm1.png"/> 
</figure>

<figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/denorm2_hu_3aef3d94cde50e2c.webp 320w,/2018/07/imdb-data-analysis/denorm2.png 531w" src="denorm2.png"/> 
</figure>
</p>
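<p>The &ldquo;lowest <code>ordering</code> per movie&rdquo; step above boils down to a grouped minimum; sketched in plain Python with toy rows standing in for <code>title.principals</code>:</p>

```python
# Group credits by movie (tconst) and keep the credited actor/actress
# with the smallest ordering value. Rows are invented for illustration
# and assumed to be pre-filtered to actor/actress categories.
principals = [
    {"tconst": "tt1", "ordering": 2, "nconst": "nm1", "category": "actor"},
    {"tconst": "tt1", "ordering": 3, "nconst": "nm2", "category": "actress"},
    {"tconst": "tt2", "ordering": 1, "nconst": "nm3", "category": "actress"},
]

leads = {}
for row in principals:
    best = leads.get(row["tconst"])
    if best is None or row["ordering"] < best["ordering"]:
        leads[row["tconst"]] = row  # keep the lowest-ordering credit so far

# tt1's lead is nm1 (ordering 2 beats 3), even though nm1 is not at rank 1
```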
<h2 id="plotting-ages">Plotting Ages</h2>
<p>Age discrimination in movie casting has been a recurring issue in Hollywood; in fact, in 2017 <a href="https://www.hollywoodreporter.com/thr-esq/judge-pauses-enforcement-imdb-age-censorship-law-978797">a law was signed</a> to force IMDb to remove an actor&rsquo;s age upon request, which in February 2018 was <a href="https://www.hollywoodreporter.com/thr-esq/californias-imdb-age-censorship-law-declared-unconstitutional-1086540">ruled to be unconstitutional</a>.</p>
<p>Have the ages of movie leads changed over time? For this example, we&rsquo;ll use a <a href="https://ggplot2.tidyverse.org/reference/geom_ribbon.html">ribbon plot</a> to plot the ranges of ages of movie leads. A simple way to do that is, for each year, calculate the 25th <a href="https://en.wikipedia.org/wiki/Percentile">percentile</a> of the ages, the 50th percentile (i.e. the median), and the 75th percentile, where the 25th and 75th percentiles are the ribbon bounds and the line represents the median.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">df_actor_ages</span> <span class="o">&lt;-</span> <span class="n">df_ratings_movies</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">                  <span class="nf">group_by</span><span class="p">(</span><span class="n">startYear</span><span class="p">)</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">                  <span class="nf">summarize</span><span class="p">(</span><span class="n">low_age</span> <span class="o">=</span> <span class="nf">quantile</span><span class="p">(</span><span class="n">age_lead</span><span class="p">,</span> <span class="m">0.25</span><span class="p">,</span> <span class="n">na.rm</span><span class="o">=</span><span class="bp">T</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">                            <span class="n">med_age</span> <span class="o">=</span> <span class="nf">quantile</span><span class="p">(</span><span class="n">age_lead</span><span class="p">,</span> <span class="m">0.50</span><span class="p">,</span> <span class="n">na.rm</span><span class="o">=</span><span class="bp">T</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">                            <span class="n">high_age</span> <span class="o">=</span> <span class="nf">quantile</span><span class="p">(</span><span class="n">age_lead</span><span class="p">,</span> <span class="m">0.75</span><span class="p">,</span> <span class="n">na.rm</span><span class="o">=</span><span class="bp">T</span><span class="p">))</span>
</span></span></code></pre></div><p>Plotting it with ggplot2 is surprisingly simple, although you need to use different y aesthetics for the ribbon and the overlapping line.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">plot</span> <span class="o">&lt;-</span> <span class="nf">ggplot</span><span class="p">(</span><span class="n">df_actor_ages</span> <span class="o">%&gt;%</span> <span class="nf">filter</span><span class="p">(</span><span class="n">startYear</span> <span class="o">&gt;=</span> <span class="m">1920</span><span class="p">)</span> <span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">startYear</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">geom_ribbon</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">ymin</span> <span class="o">=</span> <span class="n">low_age</span><span class="p">,</span> <span class="n">ymax</span> <span class="o">=</span> <span class="n">high_age</span><span class="p">),</span> <span class="n">alpha</span> <span class="o">=</span> <span class="m">0.2</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">geom_line</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="n">med_age</span><span class="p">))</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/imdb-8_hu_1f082993b0bfcbd5.webp 320w,/2018/07/imdb-data-analysis/imdb-8_hu_5434c1e3ce1485b4.webp 768w,/2018/07/imdb-data-analysis/imdb-8_hu_c6707a589573484a.webp 1024w,/2018/07/imdb-data-analysis/imdb-8.png 1200w" src="imdb-8.png"/> 
</figure>
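<p>The percentile summary feeding the ribbon can be sketched with Python&rsquo;s standard library (the ages below are made up, and note that Python&rsquo;s default quantile interpolation differs slightly from R&rsquo;s <code>quantile()</code> default):</p>

```python
# For each year, compute the 25th/50th/75th percentiles of lead-actor
# ages: the 25th/75th form the ribbon bounds, the median forms the line.
import statistics

ages_by_year = {  # hypothetical ages grouped by startYear
    1990: [25, 30, 35, 40, 55],
    1991: [28, 33, 38],
}

summary = {}
for year, ages in ages_by_year.items():
    # statistics.quantiles with n=4 returns the three quartile cut points
    q1, med, q3 = statistics.quantiles(ages, n=4)
    summary[year] = {"low_age": q1, "med_age": med, "high_age": q3}
```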

<p>It turns out that in the 2000s, the median age of lead actors started to <em>increase</em>, and both the upper and lower bounds increased too. That doesn&rsquo;t square with the age discrimination complaints.</p>
<p>Another aspect of these complaints is gender, as actresses tend to be cast younger than actors. Thanks to the magic of ggplot2 and dplyr, separating actors/actresses is relatively simple: add gender (encoded in <code>category</code>) as a grouping variable, add it as a color/fill aesthetic in ggplot, and set colors appropriately (I recommend the <a href="http://colorbrewer2.org/">ColorBrewer</a> qualitative palettes for categorical variables).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">df_actor_ages_lead</span> <span class="o">&lt;-</span> <span class="n">df_ratings_movies</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">                  <span class="nf">group_by</span><span class="p">(</span><span class="n">startYear</span><span class="p">,</span> <span class="n">category</span><span class="p">)</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">                  <span class="nf">summarize</span><span class="p">(</span><span class="n">low_age</span> <span class="o">=</span> <span class="nf">quantile</span><span class="p">(</span><span class="n">age_lead</span><span class="p">,</span> <span class="m">0.25</span><span class="p">,</span> <span class="n">na.rm</span> <span class="o">=</span> <span class="bp">T</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">                            <span class="n">med_age</span> <span class="o">=</span> <span class="nf">quantile</span><span class="p">(</span><span class="n">age_lead</span><span class="p">,</span> <span class="m">0.50</span><span class="p">,</span> <span class="n">na.rm</span> <span class="o">=</span> <span class="bp">T</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">                            <span class="n">high_age</span> <span class="o">=</span> <span class="nf">quantile</span><span class="p">(</span><span class="n">age_lead</span><span class="p">,</span> <span class="m">0.75</span><span class="p">,</span> <span class="n">na.rm</span> <span class="o">=</span> <span class="bp">T</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">plot</span> <span class="o">&lt;-</span> <span class="nf">ggplot</span><span class="p">(</span><span class="n">df_actor_ages_lead</span> <span class="o">%&gt;%</span> <span class="nf">filter</span><span class="p">(</span><span class="n">startYear</span> <span class="o">&gt;=</span> <span class="m">1920</span><span class="p">),</span> <span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">startYear</span><span class="p">,</span> <span class="n">fill</span> <span class="o">=</span> <span class="n">category</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="n">category</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">geom_ribbon</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">ymin</span> <span class="o">=</span> <span class="n">low_age</span><span class="p">,</span> <span class="n">ymax</span> <span class="o">=</span> <span class="n">high_age</span><span class="p">),</span> <span class="n">alpha</span> <span class="o">=</span> <span class="m">0.2</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">geom_line</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="n">med_age</span><span class="p">))</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">scale_fill_brewer</span><span class="p">(</span><span class="n">palette</span> <span class="o">=</span> <span class="s">&#34;Set1&#34;</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="cl">          <span class="nf">scale_color_brewer</span><span class="p">(</span><span class="n">palette</span> <span class="o">=</span> <span class="s">&#34;Set1&#34;</span><span class="p">)</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/imdb-9_hu_57562b2f234249be.webp 320w,/2018/07/imdb-data-analysis/imdb-9_hu_7da40c01dd2abee4.webp 768w,/2018/07/imdb-data-analysis/imdb-9_hu_a30111e8cbade2ed.webp 1024w,/2018/07/imdb-data-analysis/imdb-9.png 1200w" src="imdb-9.png"/> 
</figure>

<p>There&rsquo;s about a 10-year gap between the ages of male and female leads, and the gap doesn&rsquo;t change over time, although both begin rising at around the same point.</p>
<p>One possible explanation for this behavior is actor reuse: if Hollywood keeps casting the same actors/actresses, by construction the ages of the leads will steadily increase. Let&rsquo;s verify that: with our list of movies and their lead actors, for each lead actor, order all of their movies by release year, and add a ranking indicating the <em>n</em>th time that actor has played a lead. This is possible through the use of <code>row_number</code> in dplyr, and <a href="https://cran.r-project.org/web/packages/dplyr/vignettes/window-functions.html">window functions</a> like <code>row_number</code> are one of data science&rsquo;s best-kept secrets.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">df_ratings_movies_nth</span> <span class="o">&lt;-</span> <span class="n">df_ratings_movies</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">                      <span class="nf">group_by</span><span class="p">(</span><span class="n">nconst</span><span class="p">)</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">                      <span class="nf">arrange</span><span class="p">(</span><span class="n">startYear</span><span class="p">)</span> <span class="o">%&gt;%</span>
</span></span><span class="line"><span class="cl">                      <span class="nf">mutate</span><span class="p">(</span><span class="n">nth_lead</span> <span class="o">=</span> <span class="nf">row_number</span><span class="p">())</span>
</span></span></code></pre></div><figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/row_number_hu_1e44bdb2621fb9cb.webp 320w,/2018/07/imdb-data-analysis/row_number_hu_ca408294ce31483a.webp 768w,/2018/07/imdb-data-analysis/row_number_hu_ed006c80eb52873e.webp 1024w,/2018/07/imdb-data-analysis/row_number.png 1532w" src="row_number.png"/> 
</figure>
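<p>To make the windowing behavior concrete, here&rsquo;s a minimal sketch with toy data (the IDs and years below are made up, not real IMDb rows): <code>row_number</code> restarts at 1 for each actor, counting that actor&rsquo;s movies in release order.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r">library(dplyr)

# Toy data (illustrative IDs, not real IMDb nconsts): two lead actors, three movies each
df_toy &lt;- tibble(nconst = c("nm1", "nm1", "nm2", "nm1", "nm2", "nm2"),
                 startYear = c(1995, 1990, 2001, 2003, 1999, 2005))

df_toy %&gt;%
    group_by(nconst) %&gt;%
    arrange(startYear) %&gt;%
    mutate(nth_lead = row_number())
# nm1's movies get nth_lead 1/2/3 in release order (1990, 1995, 2003),
# numbered independently of nm2's movies
</code></pre></div>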

<p>One more ribbon plot later (w/ same code as above + custom y-axis breaks):</p>
<figure>

    <img loading="lazy" srcset="/2018/07/imdb-data-analysis/imdb-12_hu_32ee97febb68e3.webp 320w,/2018/07/imdb-data-analysis/imdb-12_hu_69e7d60d89429d8f.webp 768w,/2018/07/imdb-data-analysis/imdb-12_hu_c9df788e280bb63b.webp 1024w,/2018/07/imdb-data-analysis/imdb-12.png 1200w" src="imdb-12.png"/> 
</figure>
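<p>For reference, the &ldquo;custom y-axis breaks&rdquo; tweak is a one-line <code>scale_y_continuous</code> addition to the ribbon-plot code above. A minimal self-contained sketch (the toy summary data frame and its column names below are stand-ins for the real aggregated <em>n</em>th-lead data, not the actual variables used in the post):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-r" data-lang="r">library(dplyr)
library(ggplot2)

# Toy aggregated data standing in for the real per-year nth-lead summaries
df_nth &lt;- tibble(startYear = rep(1920:1925, 2),
                 category = rep(c("Actor", "Actress"), each = 6),
                 low_nth = 1, med_nth = c(2:7, 3:8), high_nth = 10)

plot &lt;- ggplot(df_nth, aes(x = startYear, fill = category, color = category)) +
          geom_ribbon(aes(ymin = low_nth, ymax = high_nth), alpha = 0.2) +
          geom_line(aes(y = med_nth)) +
          scale_y_continuous(breaks = seq(0, 20, 2)) +   # the custom y-axis breaks
          scale_fill_brewer(palette = "Set1") +
          scale_color_brewer(palette = "Set1")
</code></pre></div>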

<p>Huh. The median and upper-bound <em>n</em>th-lead counts have <em>dropped</em> over time? Hollywood has been promoting more newcomers as leads? That&rsquo;s not what I expected!</p>
<p>More work definitely needs to be done in this area. In the meantime, the official IMDb datasets are a lot more robust than I thought they would be! And I only used a fraction of the datasets; the rest tie into TV shows, which are a bit messier. Hopefully you&rsquo;ve gotten a good taste of the power of R and ggplot2 for playing with big-but-not-big data!</p>
<hr>
<p><em>You can view the R and ggplot2 code used to create the data visualizations in <a href="http://minimaxir.com/notebooks/imdb-data-analysis/">this R Notebook</a>, which includes many visualizations not used in this post. You can also view the images/code used for this post in <a href="https://github.com/minimaxir/imdb-data-analysis">this GitHub repository</a></em>.</p>
<p><em>You are free to use the data visualizations from this article however you wish, but it would be greatly appreciated if proper attribution is given to this article and/or myself!</em></p>
]]></content:encoded>
    </item>
  </channel>
</rss>
