We Keep Asking Machines to Lie to Us
How and why did we decide that deceiving us was the most impressive thing a computer could do?
1. Imitation and Insincerity
What we colloquially call the Turing Test is a version of what Alan Turing himself, in his 1950 paper “Computing Machinery and Intelligence,” called “the imitation game.”
It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman.
Turing also stipulates that “in order that tones of voice may not help the interrogator the answers should be written, or better still, typewritten,” which makes it that much easier for him to pivot to something like the Turing Test as we’d recognize it today:
We now ask the question, "What will happen when a machine takes the part of A in this game?" Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, "Can machines think?"
The aspect of Turing’s imitation game that tends to go remarkably unremarked-upon—aside from the weird gendered valence of the ruleset—is the fact that the machine is explicitly placed in the position of deceiving the interrogator. Turing stipulates that the machine plays as A, who in a version of this game played by three humans would be a man trying to convince the interrogator that he’s a woman.
We could take this line of thinking further: The machine’s explicit goal is to pass itself off, deceptively, as a human. Even further: A presumably genderless machine is trying to pass itself off as a human woman, in a two-layered lie. Further still: Some hypothetical thinking machine playing as A would be, knowingly, trying to pass its machine intelligence off as human intelligence. Even a machine that would unambiguously “pass the Turing Test” in the colloquial sense would, in Turing’s own formulation, do so by lying in multiple simultaneous, overlapping ways.
There’s a third thing rarely remarked upon, then, in the imitation game, in addition to the weird gendered stuff and the baseline assumption that the machine will play the role of deceiver. In short, the game hinges on how easy the interrogator is to deceive.1
When we skip the setup of the original imitation game and simply ask a machine “Can you think?” or “Are you sentient?” there’s an often-unspoken assumption that the interrogator is smart in some general way, and that they have relevant domain expertise, and that therefore to deceive them is to prove something useful.
But bluntly, what if they aren’t, and they don’t, and therefore it doesn’t? Or what if the interrogator simply wants to be deceived (perhaps because they have some stake in the idea of “machine intelligence”) to the extent that they will themselves engage in deception, self- and otherwise, in order to call the game for the machine?
2. Poetry, or the Lack Thereof
For my previous piece on the limits of “AI,” I asked ChatGPT to describe its own operations in the form of a sonnet. My intention was, for one thing, to steel-man my machine interlocutor—to show off the kind of formal trick that, let’s be honest, the tech is pretty impressive at. But I was also making reference to one of Turing’s own examples of a possible exchange in the imitation game:
Q: Please write me a sonnet on the subject of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Note that this move would work equally well if the machine-player genuinely couldn’t write poetry, or if the interrogator simply believed that the machine-player couldn’t write poetry. By the same token, it would be canny for a machine perfectly capable of writing poetry to nonetheless give a response like the one in Turing’s example, since the interrogator might believe that only a machine could compose a sonnet in seconds.
All of this is good fun, especially if you like Poker. But do note that we’ve blithely made a significant substitution. The question that we’re allowing to, in Turing’s words, “replace our original, ‘Can machines think?’” is, pretty explicitly, “Can machines deceive us?” These are not remotely the same question, and Turing is definitely aware of that disconnect, though oddly untroubled by it.
May not machines carry out something which ought to be described as thinking but which is very different from what a man does? This objection is a very strong one, but at least we can say that if, nevertheless, a machine can be constructed to play the imitation game satisfactorily, we need not be troubled by this objection.
On a generous, good-faith reading, Turing is getting at something important here: Should we find ourselves dealing with machine output that is functionally identical to human speech, then there’s no real point in worrying about whether the machine is “really” thinking—or feeling, or doing whatever other human stuff. That said, there’s an important difference between two things being functionally identical, and merely being superficially indistinguishable. (This difference is the reason why you should learn a whole lot about mushrooms before ever eating ones you find in the wild.)
If you’re worried about some hypothetical “strong” AI of the future taking over the world and turning everything into grey goo or whatever, then you probably shouldn’t be all that enthusiastic about machines specializing in deceiving humans. But a more immediate concern lies in our rush to reify and rely upon apparently-thinking machines that have been fine-tuned not to give us useful output, but to lie to us.
I suspect that lurking behind people’s fears about “strong AI” going rogue, and grey goo, and Roko’s Basilisk—that is, lurking behind anxiety about a future where advanced technology makes everything horrible—there’s a more insidious anxiety about everything being sort of horrible right now, and about the cause being hideously mundane.
3. Iron and Carbon
Last time around, I made two main points: First, that what large language models are uniquely good at producing is plausible bullshit, since ChatGPT (as an example) works by playing the percentages—that is, by figuring out what an answer to the question you’re asking should probably look like, based on a large corpus of existing answers to similar questions. Second, that we run into the hard limitations of this approach when we ask ChatGPT (again, just as an example) to cite its sources. ChatGPT itself can’t or won’t do anything of the kind, because the core idea of large language models is that sourcing and attribution can be productively replaced by big data.
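To make “playing the percentages” concrete, here is a deliberately tiny sketch of my own: a toy bigram model that continues a prompt by picking whichever word most often followed the previous one in its training text. (This is an illustration of the statistical move, not of ChatGPT’s actual architecture, which is vastly more elaborate.) Note what the approach throws away by design: any record of which source a given continuation came from.

```python
# A deliberately tiny "play the percentages" text generator: a bigram model.
# Real large language models are vastly more sophisticated, but the basic
# move is the same: predict a statistically likely next token from patterns
# in a corpus, with no record of which source any continuation came from.
from collections import Counter, defaultdict

corpus = (
    "the bridge is made of steel . "
    "the bridge spans the river . "
    "the river is cold . "
).split()

# Count how often each word follows each other word in the training text.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def continue_text(prompt_word, length=6):
    """Extend a prompt by repeatedly picking the most frequent next word."""
    words = [prompt_word]
    for _ in range(length):
        options = following.get(words[-1])
        if not options:
            break  # the model has literally nothing more to say
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(continue_text("the"))
# Prints something locally plausible ("the bridge is made of steel ."),
# assembled purely from co-occurrence counts, with nothing to cite.
```

The output reads as locally fluent, but asking it to cite a source is a category error; the counts blended every document together the moment they were tallied.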
My argument conflated “AI” as a broad concept with the specific approach that OpenAI uses for ChatGPT, and for their other public-facing darling, the generative image mish-masher DALL·E. This approach is called “deep learning,” a term that means almost precisely the opposite of what its constituent words imply: The machine makes no attempt to understand what its sources mean; rather, it finds patterns in the input, and then uses those patterns to make similar output.
It isn’t learning. It’s mimicking. And what it’s mimicking isn’t deep. The whole premise of “deep learning” is that everything beneath the surface is a distraction, in direct contrast to other branches of AI research that do aspire to “teach” machines what input means—or to “manipulate symbols” in the domain-specific parlance.
To explain how deep learning differs from manipulating symbols, Gary Marcus uses the example of spell-checking. The symbol manipulation approach would be to teach a machine that people sometimes transpose adjacent letters (for example, typing “teh” when they mean to type “the”). By contrast, the deep learning approach would be to look at a “Google-sized” dataset of what people mistype and how they then correct themselves. The latter technique would show that people type “teh” when they mean “the,” without the machine having to understand why people make that mistake.
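To see how small that contrast can be made, here’s a hypothetical sketch of my own (not code from Marcus’s piece, and nothing like a production spell-checker): the rule-based version encodes why the mistake happens, while the data-driven version only records that it happened.

```python
# Toy contrast between a symbol-manipulation spell-checker and a data-driven
# one, loosely after Marcus's example. Both are hypothetical sketches of my
# own, not how any real spell-checker is actually built.

DICTIONARY = {"the", "cat", "sat"}

def rule_based_suggest(word):
    """Symbolic approach: apply a known rule (people often transpose
    adjacent letters) and check whether undoing it yields a real word."""
    if word in DICTIONARY:
        return word
    for i in range(len(word) - 1):
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped in DICTIONARY:
            return swapped
    return word  # no known rule applies; give up

# The deep learning approach, shrunk down to a lookup table: record which
# corrections people actually made, with no model of why they made them.
OBSERVED_CORRECTIONS = {
    "teh": "the",   # corrected countless times in the (imaginary) logs
    "hte": "the",
    "caat": "cat",
}

def data_driven_suggest(word):
    """Statistical approach: return whatever correction showed up in the
    logs, understanding nothing about letters or transposition."""
    return OBSERVED_CORRECTIONS.get(word, word)

for typo in ("teh", "sta"):
    print(typo, "->", rule_based_suggest(typo), "/", data_driven_suggest(typo))
```

The rule-based version handles typos nobody has made before (“sta”), because it knows why transpositions happen; the data-driven version handles whatever the logs happen to contain, and shrugs at everything else.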
To Marcus, “it seems blazingly obvious that you’d want both approaches in your arsenal,” and indeed, “in the real world, spell checkers tend to use both.” In any domain you could name, each approach has its strengths and weaknesses. Deep learning “intelligences” are good at having natural-sounding conversations, but remarkably bad at puzzle-solving. They're better than symbol-manipulating “intelligences” at playing Chess (where the rules never change and pattern-matching across millions of past games goes a long way) but worse at playing NetHack (where procedurally generated dungeons mean the machine can’t simply have seen it all before).
In another piece, Marcus talks about the tendency of deep learning to produce bullshit. He calls this “the hallucination problem, whereby the machine makes up fluent nonsense.” By understanding what large language models do as hallucinating rather than as thinking, we can better understand why they “can seem so brilliant one moment and so breathtakingly dumb the next.”
Marcus would probably agree with me that deep learning is synonymous with AI in the current popular discourse, but he would be quicker than I’ve been to point out why that sucks. As he says,
Imagine a world in which iron makers shouted “iron,” and carbon lovers shouted “carbon,” and nobody ever thought to combine the two; that’s much of what the history of modern artificial intelligence is like.
To be clear, I’m skeptical that a more holistic model—the metaphorical combining of iron and carbon—would be some kind of silver (or steel) bullet. I'm skeptical about the lavish promises of any and every form of “artificial intelligence.” But deep learning does hold a unique place in the tradition of humans asking machines to lie to them.
So let’s talk about Geoffrey Hinton.
4. Neural Nihilism
Geoffrey Hinton was briefly an advocate for the “hybrid” approach of combining symbol manipulation with deep learning (as spell-checkers do, and as Gary Marcus currently advocates), but has since become a vocal sectarian for deep learning exclusively. Here we should note that there’s another common term for deep learning: “neural networks”—and as the biological tinge of that phrase implies, Hinton considers neural networks analogous to how humans actually think.
His argument goes that neural networks are “much more biological” than other kinds of computing, because “symbolic AI” technologies “take reasoning to be the core of intelligence.” Hinton considers that a false premise, because in his view, humans “don’t store symbolic expressions,” but rather “great big patterns of activity,” nodes knocking into other nodes, connections emerging intuitively. So when we try to teach formal logic to would-be artificial intelligences, for Hinton that’s a dead end. And when Noam Chomsky identifies a close relationship between cognition and language, for Hinton that’s “complete nonsense.”
A string of words isn't what we're thinking. A string of words is a way of conveying what we're thinking. It’s the best way we have because we can't directly show [one another] our brain states.
You could certainly make the case that pattern recognition is the more natural mode of human thought, and that language and formal logic are more like technologies that humans as a group had to invent, and that individual humans have to learn.
But as with all appeals to nature, it’s reasonable to respond: Who gives a shit? Actually existing humans are clearly capable of thinking in terms of patterns, and of words, and of logic, and of lots else besides. It’s obvious that humans don’t think exclusively in language, or in other kinds of “symbolic expressions,” and equally obvious that these are among the ways in which humans think.
Neural networks are certainly one useful way of modeling machine thinking—one inspired by neurons and the connections between them. But there's a smug, smirking nihilism to the idea that neural networks are the way to model machine thinking, and that they, uniquely among machine logics, are representative of how human minds truly work.
This would mean that, sure, people think that they’re capable of employing formal logic, and of using language in expressive, inventive ways… but no, they’re just wrong. The implication would be that we don’t actually understand anything any better than ChatGPT does, and that pretenses to symbolic meaning are just the human brain flattering itself. The implication would be that all human cognition, or at least all human communication, is bullshit.
Having identified this nihilistic streak in Hinton’s thinking, we can see why the hallucination problem might not bother him, might not even seem like a problem for neural network separatists: You’re not that fussed about the machine outputting nonsense if you think that all communication is fundamentally nonsense anyway.
If you have a bit of a nihilistic streak, you might even find that idea pleasing.
5. Wanting to Be Fooled
We should acknowledge that this isn’t just an abstract conversation. It has real stakes and material implications, and those get remarkably dark, remarkably quickly.
I’ve been quoting from Geoffrey Hinton’s recent On The Media interview, and that very same episode has Matt Devost talking about using ChatGPT for military intelligence purposes. He at least says that militaries should “abdicate responsibility” only if the machines consistently “make better, safer decisions than human beings” (phew) but he nonetheless insists that we can look forward to “domain-specific versions of ChatGPT” that will be “explainable” (which is to say, will cite their sources) and that won’t have the hallucination problem because (you’re gonna love this one) their human operators will simply tell them not to.
How will that work? Why are these just temporary flaws, and not core deficiencies in the way these systems work? Well, that’s a secret, and also you wouldn’t understand.
Large language models are fundamentally just bullshitting us—trying to pass off pattern recognition and mimicry as machine intelligence—and yet we still want to trust the tech with important work, to the point that we’re downright eager to figure out what part it can play in killing human beings. We hand-wave fundamental limitations because, bone-deep, we want to be fooled.
And neural networks are especially well-suited to fooling us, because (at least when deployed in isolation from other forms of so-called machine intelligence) they’re constitutively indifferent to meaning. Hell, to insist on isolating neural networks from symbol manipulation is, in a sense, to be hostile to the very idea of meaning. Which does seem like an especially repellent form of nihilism.
Again, I don’t think that combining symbolic manipulation with deep learning will solve the deeper problems at play—even if it would be an obvious improvement, and even if the prospect makes Gary Marcus “finally feel some optimism about AI.” The deeper problems at play go all the way back to the beginning of “AI” as a concept.
For about as long as we’ve been asking whether computers can think, we’ve been asking them to prove that they can think by lying to us. Should we really be surprised that that's what they’re getting good at?
Alex Garland’s film Ex Machina has an unusually firm grasp on all of this: that the Turing Test is deeply, oddly gendered; that even an actually-intelligent (and feminine-coded) machine can win the game only through deception; that since the game hinges on deceiving the interrogator, there’s a strong incentive to stack the deck with an interrogator who isn’t as smart or as savvy as he thinks he is; and that (as Game Theory so often fails to consider) games are not naturally occurring, but created by people—often people with power, who are pursuing their own purposes, and who may not be as savvy or as smart or as in control as they think they are, either.