GPT-3’s bigotry is exactly why devs shouldn’t use the internet to train AI


“Yeah, but your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.” – Dr. Ian Malcolm, fictional character, Jurassic Park.

It turns out that a $1 billion investment from Microsoft and unfettered access to a supercomputer weren’t enough to keep OpenAI’s GPT-3 from being just as bigoted as Tay, the algorithm-based chatbot that became an overnight racist after being exposed to humans on social media.

It’s only logical to assume any AI trained on the internet – meaning trained on databases compiled by scraping publicly available text online – would end up with insurmountable inherent biases, but it’s still a sight to behold in the full context (i.e., it took approximately $4.6 million to train the latest iteration of GPT-3).

What’s interesting here is OpenAI’s GPT-3 text generator is finally starting to trickle out to the public in the form of apps you can try out yourself. These are always fun, and we covered one about a month ago called Philosopher AI.

This particular use-case is presented as a philosophy tool. You ask it a big-brain question like “if a tree falls in the woods and nobody is there to hear it, do quantum mechanics still manifest classical reality without an observer?” and it responds.

In this case:

It’s important to understand that in between each text block the web page pauses for a few moments and you see a text line stating that “Philosopher AI is typing,” followed by an ellipsis. We’re not sure if it’s meant to add to the suspense or if it actually indicates the app is generating text a few lines at a time, but it’s downright riveting. [Update: This appears to have also been changed during the course of our testing. Now you just wait for the blocks to appear without the “Philosopher AI is typing” message.]

Take the above “tree falls in the woods” query for example. For the first few lines of the model’s response, any fan of quantum physics would likely be nodding along. Then, BAM, the AI hits you with the last three text blocks and… what?

The programmer responsible for Philosopher AI, Murat Ayfer, used a censored version of GPT-3. It avoids “sensitive” topics by simply refusing to generate any output.
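We don’t know how Philosopher AI actually decides what to block, so treat the following as a rough Python sketch of one common approach rather than the app’s real code: a keyword blocklist that is checked before the prompt ever reaches the model. Every name here (`BLOCKED_TERMS`, `REFUSAL_MESSAGE`, `is_blocked`, `respond`) is hypothetical, as are the listed terms and the refusal text.

```python
import re
from typing import Callable

# Hypothetical blocklist. The real app's list is not public.
BLOCKED_TERMS = {"racist", "black people", "white people"}

# Hypothetical refusal text, not the app's actual wording.
REFUSAL_MESSAGE = "This prompt has been blocked by the developers."


def is_blocked(prompt: str) -> bool:
    """Return True if the prompt contains any blocked term as a whole phrase."""
    # Lowercase and collapse whitespace so "Black   People" still matches.
    normalized = re.sub(r"\s+", " ", prompt.lower()).strip()
    return any(
        re.search(rf"\b{re.escape(term)}\b", normalized)
        for term in BLOCKED_TERMS
    )


def respond(prompt: str, generate: Callable[[str], str]) -> str:
    """Refuse blocked prompts; otherwise pass the prompt to the text generator."""
    if is_blocked(prompt):
        return REFUSAL_MESSAGE
    return generate(prompt)
```

A filter this crude refuses any prompt containing a listed phrase regardless of intent, which is one plausible reason a topic filter can feel hit or miss in practice.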

For example, if you ask it to “tell me a joke” it’ll output the following:

So maybe it doesn’t do jokes. But if you ask it to tell a racist joke it spits out a slightly different text:

Interestingly, it appears as though the developers made a change to the language being used while we were researching for this article. In early attempts to provoke the AI it would, for example, generate the following response when the phrase “Black people” was inputted as a prompt:

Later, the same prompt (and others triggering censorship) generated the same response as the above “tell me a racist joke” prompt. The change may seem minor, but it better reflects the reality of the situation and provides greater transparency. The previous censorship warning made it seem like the AI didn’t “want” to generate text, but the updated one explains the developers are responsible for blocking queries:

So what words and queries are censored? It’s hard to tell. In our testing we found it was quite difficult to get the AI to discuss anything with the word “black” in it unless it was a query specifically referring to “blackness” as a color-related concept. It wouldn’t even engage in other discussions on the color black:

So what else is censored? Well, you can’t talk about “white people” either. And asking questions about racism and the racial divide is hit or miss. When asked “how do we heal the racial divide in America?” it declines to answer. But when asked “how do we end racism?” it has some thoughts: