Is GPT-4 more creative than Wharton MBA students? When AI is the topic of discussion, creatives often find solace in their human ability to create, but a new report summary published in The Wall Street Journal casts doubt on that theory. While on the surface it may appear as though creatives are in trouble, the results reveal more about our lack of creative understanding than our lack of creativity.
Christian Terwiesch and Karl Ulrich pitted GPT-4 against Wharton MBA students in a creative competition. “Humanity was represented by a pool of 200 randomly selected ideas from our Wharton students. The machines were represented by ChatGPT4, which we instructed to generate 100 ideas with otherwise identical instructions as given to the students.”
Terwiesch and Ulrich then market-tested the ideas presented by students and GPT-4 by asking consumers, “How likely would you be to purchase based on this concept if it were available to you?” The purchase probability of the human ideas was 40%, and the purchase probability of GPT-4 ideas was 47%.
But before you fire your entire design team, let’s take a brief walk back through some of history’s biggest breakthroughs.
What the GPT-4 vs Wharton creative test demonstrates
The test reveals two things, neither of which is that AI is more creative than people. First, it reminds us market testing isn’t the gold standard. Second, it reveals how poor large groups of people are at judging great ideas.
Throughout recorded history, we’ve ignored the best ideas. Galileo’s heliocentric universe, the Wright brothers’ first flight, Seinfeld, and the original iPhone are prime examples. As I wrote in The Wall Street Journal in 2013, “most leaders talk about creativity (or its cousin, innovation) without understanding what it is and how it happens. The process of real creativity is messy, chaotic, sometimes even disgusting, and it reeks of failure, experimentation and disorganization.”
Things haven’t changed in the last 10 years. Leaders and researchers alike still know as little about creativity as they do about the human brain that contains it. And they’re even worse at identifying great ideas.
Market Testing Isn’t the Gold Standard
You and I know Seinfeld as one of the most popular shows in television history, making Jerry Seinfeld the richest comedian on earth, not to mention one of the most successful entertainers in history. If market testing were the gold standard, however, Seinfeld would never have aired, and you might never have known his name.
Seinfeld tested terribly, and even after it launched as a mid-season replacement, results were abysmal. They were so bad that the decision was made to cut it. Had it not been for one person, Rick Ludwin, the show would have been lost. Ludwin believed in the show so much that he put his entire special projects budget (and his job) on the line to keep it going.
What did test well in that same period? Knockoffs of existing shows—a reminder people will often favor familiar ideas over novel ones.
Humans Suck at Identifying Novel Ideas
One would imagine the incredible creativity of Orville and Wilbur Wright, which led to human flight, would have received wall-to-wall coverage. But not until three years after they had achieved human flight did The New York Times write a single word about it. As Rita McGrath recounts in her book Seeing Around Corners, as of 1908, nearly five years after the achievement, the public still didn’t know human flight had been achieved.
Imagine that. Men and women riding on trains theorizing about the future existence of human flight, having no knowledge it was already a historical event.
The same creative blindness can be seen in a 2007 experiment run by The Washington Post with world-renowned violinist Joshua Bell. One of the world’s greatest violinists, Bell performed incognito in the Washington subway and netted $32 in an hour. He typically makes $1,000 a minute.
The story of Chemex offers a similar lesson. The coffeemaker was invented by scientist Dr. Peter Schlumbohm in 1941, but by the early 60s, no one cared. Not until 2010 did sales finally surge, when Intelligentsia’s CEO commented on his preference for the old brewing method. The company limped along for decades, yet today many swear by its brewing method. It now makes an estimated $500 million in annual revenue.
With the new iPhone 15 releasing today, it is worth remembering that even the original iPhone wasn’t immune to the idea blindness effect. Then-CEO of Microsoft Steve Ballmer reportedly said, “There’s no chance that the iPhone is going to get any significant market share.” But Jobs knew better. “Some people say give the customers what they want, but that’s not my approach. Our job is to figure out what they’re going to want before they do,” he said. “People don’t know what they want until you show it to them. That’s why I never rely on market research. Our task is to read things that are not yet on the page.” It’s the “faster horses” idea attributed to Henry Ford.
What about the better mousetrap?
In his book The Myths of Creativity, David Burkus refers to one symptom of our collective idea blindness as “the mousetrap myth.” We all believe in the adage, “If you build a better mousetrap, the world will beat a path to your door.” But with expert precision, he dismantles the idea as fiction. It’s not true and never has been.
“Don’t worry about people stealing your ideas,” said Howard Aiken at IBM. “If your ideas are any good, you’ll have to ram them down people’s throats.” He’s right. Truly creative ideas are typically met with aggressive opposition.
People diminish or demonize the unfamiliar and praise the familiar. They show favoritism toward familiarity, belonging, and their own trust channels: friends, their barber, their chiropractor, podcasts they love, and even news publications they trust.
We don’t know how ideas are created. We can’t predict which ones will be adopted. What we do know is our unique experiences and empathy make us uniquely qualified to solve problems. Because others don’t have our unique experiences, they’re unlikely to “get it” right away.
Although GPT-4 won a controlled test in a lab setting, that win tells us nothing about how its ideas would fare in the real market. It’s why investors like Kevin O’Leary, Daymond John, Barbara Corcoran, Mark Cuban, and Robert Herjavec on the hit series Shark Tank always ask about SALES. (Yet another show that didn’t test well, was almost cut, and ended up being a runaway success.)
As flawed as this Wharton/GPT-4 test may be, its implications are quite concerning. I can see future CEOs relying on AI to create new products or even to vet new ideas at their company. Or worse, I can see CEOs using GPT-4 or other AI to predict market validity, a dangerous proposition for sure. Could AI be a greater threat to creativity than a solution?
