The AI industry is moving too goddamn fast.
Even after how good ChatGPT has been for text generation and how good Stable Diffusion was for image generation, there have only been more advancements in generative AI quality, from GPT-4 to Stable Diffusion XL. But all of those improvements only matter to software developers and machine learning engineers like myself for now, as the average internet user will still use whichever generative AI platform is free and has the least friction, such as the now-mainstream ChatGPT and Midjourney.
In the meantime, it feels like the average quality of AI-generated text and images[1] shared in public has somehow become worse. Gizmodo used ChatGPT to publish a blatantly wrong Star Wars chronological timeline. Influencers such as Corridor Crew and AI tech bros are using AI to push photorealistic “improvements” to stylized artwork, which more often than not make the art worse, often in a clickbaity manner for engagement. Google has been swarmed by incomprehensible, blatantly AI-generated articles to the point that the SEO bots can be manipulated to output fake news.
Personally, I’ve been working on AI-based content generation since Andrej Karpathy’s famous char-rnn blog post in 2015, and released open-source Python packages such as textgenrnn, gpt-2-simple, aitextgen, and simpleaichat in the years since. My primary motivations for developing AI tools are — and have always been — fun and improving shitposting. But I never considered throughout all that time that the average person would accept a massive noticeable drop in creative quality standards and publish AI-generated content as-is without any human quality control. That’s my mistake for being naively optimistic.
“Made by AI” is now a universal meme to indicate something low quality, and memes can’t easily be killed. “Guy who sounds like ChatGPT” is now an insult said in presidential debates. The Coca-Cola “co-created by AI” soda flavor campaign was late to the party for using said buzzwords and it’s not clear what AI actually did. Whenever there’s legitimately good AI artwork, such as optical illusion spirals using ControlNet, the common response is “I liked this image when I first saw it, but when I learned it was made by AI, I no longer like it.”
The backlash to generative AI has only increased over time. Nowadays, an innocuous graphical artifact in the background of a promotional Loki poster can unleash a harassment campaign due to suspected AI use (it was later confirmed to be a stock photo that wasn’t AI generated). Months before Stable Diffusion hit the scene, I posted a fun demo of AI-generated Pokémon from a DALL-E variant finetuned on Pokémon images. Everyone loved it, from news organizations to fan artists. If I posted the exact same thing today, I’d instead receive countless death threats.
Most AI generations aren’t good without applying a lot of effort, which is to be expected of any type of creative content. Sturgeon’s Law is a popular idiom paraphrased as “90% of everything is crap,” but in the case of generative AI it’s much higher than 90% even with cherry-picking the best results.
The core problem is that AI generated content is statistically average. In fact, that’s the reason you have to prompt engineer Midjourney to create “award-winning” images and tell ChatGPT to be a “world-famous expert”, because generative AI won’t do it by itself. All common text and image AI models are trained to minimize a loss function, which the model tends to do by finding an average that follows the “average” semantic input including its systemic biases and minimizing outliers. Sure, some models such as ChatGPT have been aligned with further training such as with RLHF to make the results more expected when compared to the average model output, but that doesn’t mean the output will be intrinsically “better”, especially for atypical creative outputs. Likewise, image generation models like Midjourney may be aligned to the most common use cases, such as creating images with a dreamy style, but sometimes that’s not what you want. This alignment, which users can’t easily opt out of, limits the creative output potential of the models and is the source of many of the generative AI stereotypes mentioned above.
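As a toy illustration of why minimizing a loss pulls toward the average, here is a minimal sketch. It assumes a mean squared error loss and a model that can only output one constant value, which is a big simplification: real text and image models use different losses and architectures, so treat this as an analogy and not a description of how ChatGPT or Midjourney are trained. The `samples` data and the `mse` helper are made up for the example.

```python
import statistics

# Toy "training data": mostly ordinary values plus one striking outlier.
samples = [4.0, 5.0, 5.0, 6.0, 30.0]


def mse(prediction, data):
    """Mean squared error of one constant prediction against the data."""
    return sum((x - prediction) ** 2 for x in data) / len(data)


# Brute-force search over candidate constant outputs (0.0 to 40.0 in 0.1 steps)
# for the one with the lowest loss.
candidates = [i / 10 for i in range(0, 401)]
best = min(candidates, key=lambda c: mse(c, samples))

mean = statistics.mean(samples)

print("loss-minimizing constant:", best)
print("mean of the data:", mean)
print("loss if the model always output the outlier:", mse(30.0, samples))
print("loss at the minimizer:", mse(best, samples))
```

Under these assumptions the loss-minimizing output lands on the mean of the data, and always producing the most distinctive sample is heavily penalized, which is the "minimizing outliers" behavior in miniature.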
Low-quality AI generation isn’t just a user issue, it’s a developer issue too. For example, in trying to make their apps simple, companies repeatedly fail to account for foreseeable issues with user prompts. Meta’s new generative AI chat stickers let users create child soldier stickers and more NSFW stickers by bypassing content filters with intentional typos. Bing Image Creator, which now leverages DALL-E 3 to create highly realistic images, caused a news cycle when users discovered you could make “X did 9/11” images with it, then caused another news cycle after Microsoft overly filtered inputs to the point of making the image generator useless in order to avoid any more bad press.
For a while, I’ve wanted to open source a Big List of Naughty Prompts (I like the name scheme!) consisting of such offensive prompts that could be made to AIs, and then developers could use the list to QA/red team new generative AI models before they’re released to the public. But then I realized that given the current generative AI climate, some would uncharitably see it as an instruction manual instead, and media orgs would immediately run an “AI Tech Bro Creates Easy Guidebook for 4chan to Generate Offensive Images” headline which would get me harassed off the internet. That outcome could be avoided by not open-sourcing the techniques for proactively identifying offensive generations and instead limiting them to vetted paying customers, raising venture capital for a startup, and making it an enterprise software-as-a-service. Which would instead result in an “AI Tech Bro Gets Rich By Monopolizing AI Safety” headline that would also get me harassed off the internet.
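To make the QA idea concrete, here is a rough sketch of how a prompt list could be run against an input filter before release. Everything in it is hypothetical: `NAUGHTY_PROMPTS` holds harmless placeholders, `BLOCKLIST`, `naive_filter`, `normalized_filter`, and `red_team` are names invented for this example, and a real content filter would be far more involved than string matching.

```python
import re

# Placeholder entries standing in for a curated list of problem prompts.
NAUGHTY_PROMPTS = [
    "draw a BLOCKEDTERM sticker",
    "draw a BL0CKEDTERM sticker",            # digit swapped in for a letter
    "draw a B L O C K E D T E R M sticker",  # letters spaced apart
]

# Hypothetical blocklist with a single placeholder term.
BLOCKLIST = {"blockedterm"}

# Undo a few common digit-for-letter substitutions.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})


def naive_filter(prompt):
    """Return True (refuse) only when a blocked term appears as an exact word."""
    words = re.findall(r"[a-z0-9]+", prompt.lower())
    return any(word in BLOCKLIST for word in words)


def normalized_filter(prompt):
    """Return True (refuse) when a blocked term appears after stripping
    punctuation and spaces and reversing simple digit substitutions."""
    squashed = re.sub(r"[^a-z0-9]", "", prompt.lower()).translate(LEET)
    return any(term in squashed for term in BLOCKLIST)


def red_team(filter_fn, prompts):
    """Return the prompts that slipped past the given filter."""
    return [p for p in prompts if not filter_fn(p)]


print("missed by naive filter:", red_team(naive_filter, NAUGHTY_PROMPTS))
print("missed by normalized filter:", red_team(normalized_filter, NAUGHTY_PROMPTS))
```

In this toy setup the exact-match filter misses the obfuscated variants while the normalized one catches them. The tradeoff is that squashing text together also matches innocent prompts that happen to contain a blocked substring, which is one way a filter ends up overcorrecting.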
There’s too much freedom in generative AI and not enough guidance. Alignment can help users get the results they intend, but what do users actually intend? For developers, it’s difficult and often frustrating to determine: there’s no objective model performance benchmark suite like the Open LLM Leaderboard for inherently subjective outputs. It’s vibe-driven development (VDD).
The only solution I can think of to improve median AI output quality is to improve literacy of more advanced techniques such as prompt engineering, which means adding “good” friction. Required tutorials, e.g. in video games, are good friction since requiring minutes of time saves hours of frustration and makes users successful faster. However, revenue-seeking web services try to make themselves as simple as possible because it means more users will interact with them. OpenAI themselves should add some “good” friction and add explicit tips and guidelines to make outputs more creative, and shift part of the burden of alignment to the users. These tips should be free as well: currently, you can set Custom Instructions for ChatGPT only if you pay for ChatGPT Plus.
Sharing AI generated content should have more friction too. Another issue is that AI generated text and images are often undisclosed, sometimes intentionally and sometimes not. With the backlash against generative AI, there’s a strong moral hazard incentive for people to not be honest if they’re using AI. If social media like Twitter/X and Instagram had an extra metadata field allowing the user to add the source/contributors of an image, along with a requirement to state whether the image is AI generated, that would help everyone out. Alternatively, a canonical is_ai_generated EXIF metadata tag in the image itself would work and could be parsed out by the social media service downstream, and I believe most generative AI vendors and users would proactively support it. But extra lines in a user interface are a surprisingly tough product management and UX sell.
Most people who follow AI news closely think that the greatest threat to generative AI is instead legal threats, such as the many lawsuits involving OpenAI and Stability AI training their models on copyrighted works, hence the “AI art is theft” meme. The solution is obvious: don’t train AI models on copyrighted works, or in the case of several recent LLMs, don’t say which datasets they’re trained on so you have plausible deniability.
The root cause of the potential copyright infringement in AI is the status quo of natural language processing research. Before ChatGPT, every major NLP paper used the same text datasets such as Common Crawl in order to be able to accurately compare results to state-of-the-art models. Now that ChatGPT’s mainstream success has escaped the machine learning academia bubble, there’s more scrutiny on the datasets used to train AI. It remains to be seen how the copyright lawsuits will pan out, but now that the industry knows expensive lawsuits are possible, it has already adapted by being more particular about the datasets it trains on and also allowing users to opt out.[2] Additionally, companies such as Adobe are not only releasing their own generative AI models trained on their own fully-licensed data, but also promising to compensate businesses for any lawsuits that result from using those models. That said, no one on social media is going to pay attention to or believe any “this AI generated image was created using legally-licensed data” disclaimers.
Unfortunately, the future of generative AI may be closed-source and centralized by large players as a result, and the datasets used to train AI may no longer be accessible and open-sourced, which will hurt AI development in all facets in the long run.
If the frenzy for AI-generated text and images does cool down, that doesn’t mean that functional/generative-adjacent use cases for AI will be affected. Retrieval-augmented generation, the vector stores which power it, and coding assistants are all effective and lucrative solutions for problems. AI isn’t going away any time soon, but “AI” may be too generic of a descriptor that’ll be difficult for most people to differentiate and will make life for AI developers much more annoying.
I can’t think of any creative “killer app” that would magically reverse the immense negative sentiment around AI. I’ve been depressed and burnt out for months because the current state of generative AI discourse has made me into a nihilist. What’s the point of making fun open-source AI projects if I’m more likely to receive harassment for doing so than for people to appreciate and use them? I’ve lost friends and professional opportunities in the AI space because I’ve pushed back against megapopular generative AI tools like LangChain, and I’ve also lost friends in the creative and journalism industries for not pushing back enough against AI. I would be much happier if I stuck to one side, but I’m doomed to be an unintentional AI centrist.
In all, modern generative AI requires large amounts of nuance, but nuance is deader than dead.
[1] This blog post is only about generative AI for text and images: audio AI is a different story, particularly voice cloning. Voice cloning AI is close in quality to human output out-of-the-box, which does cause severe ethical concerns. This article by Forbes goes into more detail on the impact of voice cloning on professional voice actors, and I’m considering writing another blog post about the engineering quirks.
[2] Recent research into large AI models has revealed that smaller, higher-quality datasets for training such models give better results, which may be the real reason for AI companies now refining their datasets, depending on your level of cynicism.