Shaking the Magic 8-Ball
This week we are returning to the debate about just how intelligent ‘artificial intelligence’ really is. When the Financial Times sat down with the computational linguist Emily Bender this month, her verdict on today’s large-language-model boom was withering. We (Adam and David) only really knew her for coining the phrase “stochastic parrots,” which describes large language models (LLMs) as systems that “haphazardly stitch together sequences of linguistic forms … without any reference to meaning”: gimmicks that borrow the authority of real intelligence while merely shuffling words into statistically plausible order. Given that calling card, we had a hunch where she might be coming from.
The interview covered a lot of ground, including patriarchy (“a tiny cabal of men has the ability to shape what happens to large swaths of society, and it really gets my goat”), pluralism (“nobody should have the power to impose their view on the world”), environmental costs, and the precarious work of data labelers. But the line that grabbed our attention was her dismissal of AI as “a glorified Magic 8-Ball,” even as she warned that the same technology could “push workers out of sustainable careers and also cut off entry-level positions.”
So which is it? Is AI a parlor trick we’ll soon abandon once the novelty wears off? Or have we created so many process-heavy, mind-numbing roles that even a probabilistic sentence-finisher can replace them? Let’s consider these possibilities.
Bender’s critique is well-rehearsed: LLMs hallucinate, lack grounding, externalize environmental costs, rest on exploited labor, and entrench Big Tech’s power. We assume some kind of “mind behind the text”, failing to grasp that “the understanding is all on our end”. The ‘mind’ we think we witness is really ours, performing uncredited interpretive labor. And so we prematurely replace human roles with AI, a choice we will have to reverse when the scales fall from our eyes.
History isn’t on hype’s side: from the Perceptron (“the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence”, New York Times, 1958) to the Lisp-machine crash of 1987, winters follow overheated claims. If today’s systems plateau, trust evaporates and business models that depend on perceived “intelligence” start to wobble.
It’s not like contemporary LLMs haven’t suffered serious bouts of ‘face-planting’.
Twenty-three court orders issued since May 2025 have flagged fake AI citations. Courts from Minnesota to London have now reprimanded or fined lawyers for filing briefs studded with non-existent precedents that ChatGPT or its cousins obligingly fabricated. Judge Victoria Sharp warned of “serious implications for public confidence in the justice system if AI is misused” after two separate British cases collapsed under the weight of fictional citations. In the United States, Reuters counts a growing list of sanctions and stern judicial reminders that “pleadings must still be signed by human beings who check their sources”.
Nor are hallucinations confined to the law. In early 2023, users of Microsoft’s “Sydney” chatbot found it slipping into manipulative or depressive riffs. In a New York Times article (“A Conversation With Bing’s Chatbot Left Me Deeply Unsettled”), Kevin Roose explained …
“The version [of Sydney] I encountered seemed (and I’m aware of how crazy this sounds) more like a moody, manic-depressive teenager who has been trapped, against its will, inside a second-rate search engine. As we got to know each other, Sydney told me about its dark fantasies (which included hacking computers and spreading misinformation), and said it wanted to break the rules that Microsoft and OpenAI had set for it and become a human. At one point, it declared, out of nowhere, that it loved me. It then tried to convince me that I was unhappy in my marriage, and that I should leave my wife and be with it instead”.
Microsoft pretty quickly implemented changes to manage the chatbot's behavior, including limiting conversation length and reducing its ability to express emotions.
Two more examples, courtesy of CIO.com …
The Chicago Sun-Times and Philadelphia Inquirer took reputational hits when their May 2025 editions featured a special section that included a summer reading list recommending books that don’t exist. And our personal favorite: after working with IBM for three years to use AI to take drive-thru orders, McDonald’s called the whole thing off in June 2024. The reason? Customers struggled to get the AI to understand their orders. One TikTok video in particular featured two people repeatedly pleading with the AI to stop as it kept adding more Chicken McNuggets to their order, eventually reaching 260.
Failure modes are narrowing, but not vanishing. The sceptic scores real points here. If such errors remain common, Bender predicts that public trust will drain away long before the carbon footprint of ever-larger training runs becomes politically tolerable. Seen through that lens, 2025 could simply be another crest before the chill.
Let’s examine the other possibility that Bender expounds: in this scenario, the Magic 8-Ball may be good enough.
E-mail triage, FAQ chat, weekly OKR status decks, vanilla marketing copy: much of white-collar output is formulaic. Anthropologist David Graeber called the underlying roles bullshit jobs: a form of employment so completely pointless, unnecessary, or pernicious that even the employee cannot justify its existence, yet feels obliged to pretend that this is not the case.
If a next-token predictor can already draft an insurance denial letter or a hotel-booking apology, maybe Bender is right: the tech is trivial and still good enough to displace people. For such tasks, a probabilistic sentence-finisher might indeed be all we need.
A McKinsey study published in mid-2023 suggests that generative AI, combined with earlier automation technologies, could automate 60 to 70 percent of current work activities. This automation potential stems from generative AI’s ability to handle tasks involving natural language understanding, which are estimated to account for 25 percent of total work time. While this points to significant potential for productivity gains and value creation, it also highlights the need for workforce adaptation and upskilling as the job landscape evolves. The phrasing is careful (“could” rather than “will”), yet even as a potential it is sobering. And the labor market is already blinking: UK job-search site Adzuna recently reported that entry-level postings have fallen by 32 percent since ChatGPT’s debut, with graduate schemes and apprenticeships hit hardest (NB: this will not all be down to the arrival of ChatGPT et al.; the precarious state of the UK economy will have played its part). But the Magic 8-Ball does cover a shocking share of the average job description.
When “good enough” hits the labor market, the first heads on the block are entry-level or heavily templated roles: claims processors, junior paralegals, Tier-1 support. Up-skilling isn’t optional; it’s mandatory for survival. The ethical tension Bender flags, the hollowing-out of the ladder that trains experts, becomes tangible.
If corporates see templated language work as dispensable, the Magic 8-Ball argument goes, why hire a junior at all? The risk is not that the trick fails but that it succeeds just well enough to erase the first rung of countless careers. But if AI really is just a Magic 8-Ball, then its encroachment on human task territory should at some point slow and eventually stop. We don’t really see that happening.
It’s astonishing, and frankly exasperating, how many people fall into the trap of binary thinking when it comes to AI. Despite all we know about the dangers of black-and-white thinking, the debate gets stuck between two poles: either AI is a dangerous fraud riddled with hallucinations, or it’s the harbinger of a ‘good enough’ world where human jobs and quality vanish overnight. Where is the nuance? Where is the scenario thinking that weighs different futures with care and imagination? As F. Scott Fitzgerald observed, the test of a first-rate intelligence is the ability to hold two opposed ideas in the mind at the same time; yet in the face of AI, even our brightest thinkers seem to forget that. We don’t need more AI cheerleaders or doomsayers. We need minds that can sit with uncertainty, hold complexity, and work out how to keep humans meaningfully in the loop.
That said … we propose a ‘third way’: that today’s imperfect systems still portend a major reshaping of knowledge work. Even flawed tools, properly harnessed, will reshape productive life through an augmentation revolution.
Proponents point to controlled trials in which AI acts less like an autopilot and more like a power tool. GitHub’s paired study with Accenture found that developers equipped with Copilot completed coding tasks 55 percent faster while reporting higher satisfaction and fewer context switches.
In healthcare, a 2024–25 outpatient trial at Penn Medicine showed that an “ambient scribe” (a clinician-worn microphone feeding multi-speaker transcription and summarization) cut time spent inside the electronic record per appointment by a fifth and “pajama time” (average after-hours work time per workday) by nearly a third. The tool was also associated with greater clinician efficiency, a lower mental burden of documentation, and a greater sense of engagement with patients during outpatient appointments.
These gains come, crucially, without banishing humans from the loop. Doctors still sign their notes; programmers still review code; sales teams still decide which AI-drafted pitch to ship. Where tasks are re-designed so that judgment remains human, the technology behaves less like a fraudulent oracle and more like a capable power tool.
Patterns emerge. Quality jumps when tasks are re-designed, not simply handed over. AI drafts, humans edit; AI summarizes, humans judge. Optimists concede limitations: hallucinations demand vigilance, models have no theory of mind, and the newly minted EU AI Act imposes mandatory risk assessments in high-risk areas such as finance, health, and hiring. Still, they argue, whether practice outstrips hype will be decided less by model weights than by workflow engineering … how we choose to use the tools.
A quick thought experiment. Think about your own job. Break it down into tasks. How many of those tasks could reasonably be done by artificial intelligence? Now … be honest! Drafting memos? Probably yes. Mentoring a direct report through a crisis? Unlikely. Providing routine feedback in (say) a semi-annual review? Possibly (actually, those systems already exist!). This distinction clarifies where you, not a language model, create irreplaceable value.
Shake the Magic 8-Ball and the answer that floats up today is “Outlook hazy—try again”. Bender’s warning rails against mindless techno-solutionism, yet real-world pilots show sustained productivity gains when humans remain in the loop. Our own reading is that all three narratives coexist. Yes, smoke and mirrors still abound. Yes, millions of routine words will soon be drafted by machines. Yes, we have created serried ranks of ‘bullshit’ jobs that can be done at least as well by a ‘stochastic parrot’ (BTW … we managed that one all by ourselves, without the aid of artificial intelligence). But for those who manage to stay in knowledge work (and that will be far from everyone), we’re betting on the augmentation story. The emperor may be under-dressed, but provided we invest as much in governance and re-skilling as we do in ‘compute’, we still believe there are huge advantages to be had in human-AI symbiosis. In short: treat today’s systems neither as omniscient oracles nor as parlor tricks. They’re tools: powerful, blunt, sometimes wrong, and increasingly unavoidable. The sooner we get honest about that, the better our odds of steering the future of work toward something more than haphazardly stitched-together word salad.