Computers Can Be Creative, Too

The pace of advancement in AI capability over the last twenty years has been astonishing. In the early 2010s, a paradigm shift towards “deep learning,” loosely inspired by the structure of the human brain, brought incredible advancements in AI. It’s an exciting time for AI research: major advances are being made simply by tweaking existing algorithms and running models on increasingly powerful hardware.

We can distinguish between two use cases for AI: analytical AI and generative AI.

Analytical AI is extraordinarily good at finding patterns in large datasets. This makes it an ideal tool for solving well-defined problems that require searching a large possibility space for a solution. The first example is perhaps the most famous: Deep Blue defeating then-world chess champion Garry Kasparov in 1997. This remains an exciting space. DeepMind’s AlphaFold model would probably have been the most important breakthrough in biology in the last decade, if not for the development of the mRNA vaccine.

But here I want to focus on the other use case. Generative AI, which uses models to output something new based on their training data, is relatively new and has really been made possible only by advances in the past few years.

In 2019, OpenAI released GPT-2, a predictive language model trained on text from millions of web pages. The idea behind GPT-2 was simple: given a prompt, the model tries to guess how the text continues, based on patterns learned from its training corpus. It was mostly used as an amusing toy. Although the model could answer simple questions correctly and even generate short pieces of text, anything more complex or longer than a couple of sentences tended to devolve into gibberish.
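
To make "guess how to continue the text" concrete, here is a deliberately tiny sketch in Python. It is not how GPT-2 works internally (GPT-2 is a large neural network, while this is a simple word-pair lookup table), and the function names and the toy corpus are made up for illustration. It only shows the basic loop: look at the text so far, pick a plausible next word from what was seen in training, and repeat.

```python
import random
from collections import defaultdict


def build_bigram_model(text):
    """Map each word to the list of words that followed it in the training text."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def continue_text(model, prompt, max_new_words=8, seed=0):
    """Extend the prompt one word at a time using the toy model."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    words = prompt.split()
    if not words:
        return prompt
    for _ in range(max_new_words):
        candidates = model.get(words[-1])
        if not candidates:
            break  # the last word never had a successor in the training text
        words.append(rng.choice(candidates))
    return " ".join(words)


# Hypothetical toy corpus; a real model trains on vastly more text.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = build_bigram_model(corpus)
print(continue_text(model, "the dog"))
```

Because this toy only remembers which word followed which, it loses the thread almost immediately. That is an exaggerated version of the failure described above, where longer outputs drift into gibberish.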

In 2020, OpenAI released GPT-3, the successor to GPT-2. In the paper presenting the model, the researchers warned of its potential dangers: much of the time, the output produced by GPT-3 was indistinguishable from that of a human. GPT-3 is structurally similar to its predecessor, aside from some engineering tweaks and roughly 100 times more parameters. In other words, simply by scaling up the existing framework, OpenAI was able to overcome many of the limitations of GPT-2.

The applications of this technology are just beginning to be explored: GitHub’s Copilot, which can significantly improve the speed at which developers write code; automatically written marketing copy for your website; video games; and poetry. Kids are even using it to cheat on homework.

Then, earlier this year, OpenAI released DALL-E 2, the successor to DALL-E, a version of GPT-3 that was modified to output pictures instead of text. DALL-E 2 (and similar models like Stable Diffusion, Midjourney, and others) can create a picture based on an arbitrary text input. Here’s an example based on “an illustration of a baby daikon radish in a tutu walking a dog.” The quality and diversity of the images are astounding: a few weeks ago, an AI-generated image won first place at a state fair fine arts competition, beating many human artists.

It is difficult to overstate my excitement about the generative AI space. Graphic design, by itself, is a multibillion-dollar industry. So is copywriting for websites, social media, blogs, instruction manuals, and much more. Generative AI is poised to disrupt both of these fields. I envision it as a very powerful tool, with designers using the AI as a starting point or to brainstorm, then tweaking the result to perfection. Many menial content creation jobs may disappear, but many more jobs may be created: already, people are beginning to look for “prompt engineers,” who craft just the right prompt to get the desired image from the AI.

There are already startups using generative AI to address use cases across marketing and commerce. There’s Poly, a YC startup using AI to generate design assets; Jasper, using GPT-3 to automatically generate marketing copy; Axdraft, which automatically drafts legal documents; Unstuck, using DALL-E to generate stock images; and many more. It’s a brand new space, and it’ll be exciting to see how talented entrepreneurs build businesses on this technology. And at the rate the technology is improving, I believe it is inevitable that AI will play a large role in the creative process in the near future.

By the way, I’ve gotten access to the DALL-E beta. Please enjoy my attempt to create a visual interpretation of “Silicon Road,” powered by AI.
