1. Three Kinds of Stealing
Some folks at work last week were playing with ChatGPT (Generative Pre-trained Transformer), that much-ballyhooed new large language model/chatbot. It’s made by OpenAI, the company probably best known for the Dall·E 2 image generator, and it’s been launching a thousand breathy think-pieces, this one among them I suppose.
My coworkers kept joking-not-joking that this shiny new automaton would put us all out of work in short order. The tech world's current atmosphere of wanton layoffs formed a carefully unacknowledged backdrop. Someone asked ChatGPT to write a response to a customer for them, and someone else plugged that response into an online plagiarism-checking tool.
"100% legit," they concluded, as the tool gave each and every sentence a clean bill of health for passable originality. In response to this, I quipped: "To steal from one is plagiarism; to steal from many is research; to steal from every available source is AI."
Immediately, of course, someone asked ChatGPT what it thought about my joke. Here's its response in full:
The quote "To steal from one is plagiarism; to steal from many is research; to steal from every available source is AI" is a humorous and satirical way of describing the process of developing and training artificial intelligence (AI) algorithms and models.
In this quote, "stealing" refers to the process of collecting, aggregating, and analyzing large amounts of data from various sources in order to train and improve AI algorithms and models. This process typically involves collecting data from a wide range of sources, such as publicly available datasets, research papers, and other published sources, in order to create a comprehensive and diverse dataset for training and evaluating AI algorithms.
By "stealing from one," the quote is referring to the act of plagiarism, in which an individual or organization improperly uses or copies the work of others without proper attribution or permission. In contrast, "stealing from many" is a more acceptable practice, as it involves collecting and synthesizing multiple sources of information in order to create something new and original. Finally, "stealing from every available source" is a common practice in the field of AI, as it involves collecting and analyzing as much data as possible in order to train and improve AI algorithms and models.
Overall, the quote is a tongue-in-cheek way of highlighting the importance of collecting and analyzing large amounts of data in the development of AI algorithms and models, and the potential ethical implications of this practice.
It understood that I was joking! It more or less got the joke! It even acknowledged "ethical implications!" It’s easy to be impressed, even though we all sort of know that we should, in the words of Emily M. Bender, “resist the urge to be impressed” by so-called AI.
There’s an obvious mismatch between the hype around what this tech could do and the relatively mundane stuff that it actually does. Indeed, ChatGPT is pretty upfront about how it works, if you go ahead and ask it.
Please explain in detail how a large language model works.
A large language model is a type of artificial intelligence that is trained to generate text that is similar to human-written text. This is accomplished through the use of deep learning algorithms and vast amounts of data.
To train a large language model, a dataset of text is fed into the model. This text could be anything from books and articles to social media posts and online forums. The model then uses this data to learn the patterns and characteristics of human-written language.
Once the model has been trained, it can generate text by predicting the next word in a sequence based on the words that came before it. This prediction is made using a probabilistic approach, where the model assigns a probability to each possible next word. The word with the highest probability is chosen as the next word in the sequence, and the process is repeated until the model has generated the desired amount of text.
What large language models such as ChatGPT do is (1) consume large amounts of text, (2) identify patterns in it, and (3) based on those patterns, and on your prompt, select the most likely word they should write—and then the next word, and the next, and the next, and so on. It’s important to remember that this is all large language models are doing, even as we note how good they’re getting at doing it, and how quickly.
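For the code-minded, here’s a deliberately toy sketch of that loop in Python. Everything in it is invented for illustration: the four-word vocabulary and the hard-coded probabilities stand in for a real model’s learned distribution over tens of thousands of tokens.

```python
import random

# A toy version of the predict-append-repeat loop. A real large language
# model scores every token in a vocabulary of tens of thousands using
# billions of learned parameters; here the "model" is a hard-coded table,
# invented purely for illustration.
FAKE_NEXT_WORD_PROBS = {"the": 0.4, "a": 0.3, "data": 0.2, "model": 0.1}

def next_word(context):
    # A real model would condition on the context; this toy one ignores
    # it and just samples from its fixed distribution.
    words = list(FAKE_NEXT_WORD_PROBS)
    weights = list(FAKE_NEXT_WORD_PROBS.values())
    return random.choices(words, weights=weights)[0]

def generate(prompt, length=10):
    words = prompt.split()
    for _ in range(length):
        words.append(next_word(words))  # predict, append, repeat
    return " ".join(words)

print(generate("once upon a time"))
```

Conceptually, that’s the whole machine: the sophistication lives entirely in how the probabilities get computed, not in the loop itself.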
What are the drawbacks of this approach?
One of the main drawbacks of using large language models is that they require a vast amount of data to train on in order to generate high-quality text. This can be difficult and expensive to obtain, especially for specialized domains where the amount of available data may be limited.
Another potential drawback is that large language models can be prone to generating biased or unfair text. This can happen if the data used to train the model contains biased or prejudiced language. In such cases, the model may learn and reproduce this biased language in the text it generates, which can be harmful or offensive.
Additionally, large language models can sometimes generate text that is nonsensical or difficult to understand. This is because the model is only able to generate text based on the patterns it has learned from the training data, and may not always generate text that makes sense in the context it is being used in.
Overall, while large language models are powerful tools for generating human-like text, they also have their limitations and can be affected by the quality and bias of the training data they are trained on.
We’ll get to the question of bias and how to check against it. (Hint: It’s in the title of this piece). But on a more basic level, let’s take a moment to appreciate how grandiose terms like “training” and “deep learning” are in this context. ChatGPT will readily admit that it doesn’t understand its source material. (Again, more on that in a moment). But it still inherits the hilariously hype-laden terms that OpenAI uses to market itself. It’s still trying to present itself as some kind of mind.
There’s no clear path from generative text to machine brains. These novel artificialities aren’t intelligent in any meaningful sense, nor do they learn; “Machine Learning” as currently practiced leans heavily on underpaid humans laboriously tagging stuff by hand. ChatGPT has zero (we must stress: zero) understanding of content, but nonetheless, it’s good enough at copying form that it can do stuff like this:
Please explain how large language models work in the form of a sonnet.
In lines of code so vast and sprawling,
A model lies, a giant of its kind.
Its purpose: to grasp the rules of language,
And with this knowledge, answers to unbind.

With input fed, it sifts and sorts the text,
And finds the patterns that recur with ease.
It builds a web of words and meanings deft,
And learns to speak with fluency and ease.

But with such power comes a heavy toll,
For with each new piece of data that it learns,
Its memory grows and strains against its hold,
And keeping all its knowledge clear, it yearns.

But still it perseveres, and never quits,
A tireless servant of the written word.
That’s not a good sonnet—Shakespeare this thing ain’t—but the bot does follow the formal rules of sonnet-writing, and the result is moderately less tortured than when I asked it to respond in plain prose. It’s tempting to be impressed by what’s happening here, so again, let’s focus on what isn’t happening.
Please explain the limitations and drawbacks of large language models in the form of a sonnet.
Though large language models may seem divine,
With their vast knowledge and the answers they provide,
They too have limitations, as all things do,
That must be understood before we can abide.

For one, they require vast amounts of data,
To train upon and learn the ways of speech.
This need for data can raise concerns of bias,
As the models will reflect the data they reach.

Another issue is their size and cost,
With gigabytes of memory, they're not light.
And the hardware required to run them most
Can strain even the largest server's might.

But perhaps the greatest issue of all,
Is the lack of true understanding they impart.
They can generate text that's pleasing to the ear,
But the meaning behind it may be hard to impart.

So while large language models may seem sublime,
We must remember their limitations over time.
ChatGPT’s output certainly feels like something rather than nothing. Surely there’s potential in this tech. And so we humans poke around for signs of which kinds of work might someday be automated away. I’m told that ChatGPT is the end of the high school essay, but not of the college essay, for example. Is that about right? Exactly how impressed should we allow ourselves to be?
2. Things That Aren’t Thinking
It’s almost certainly true that large language models will keep on getting better at having natural-sounding conversations, and at making convincing-sounding statements. But let’s be real here: If talking to ChatGPT ever does feel like talking to a real person, then it feels like talking to a very specific kind of person: the kind who speaks confidently despite having only a very superficial understanding of anything.
If you find yourself thinking that ChatGPT could produce some specific kind of writing for you, then the correct question to ask is: Is this kind of writing bullshit? What are we really trying to teach high school students by making them write essays? How is that different from what we’re trying to teach college students? Why do we ask job applicants to write cover letters? Why do consultants write emails to customers?
I don’t mean these questions rhetorically. I think they’re worth answering, even and especially when the answers send you into a spiral of the oh-God-what-am-I-doing-with-my-finite-time-in-this-world variety. As Emily M. Bender said in the piece I linked above, there is a good reason why high school students write essays, and
it is not, to be sure, to keep the world’s supply of essays topped up! Rather, it is about what the students learn in the process of doing the writing.
If the simple fact of producing writing is the point of writing, then sure, ChatGPT could replace human writers in all sorts of contexts. But presumably we do write for other reasons than that. It’s not as though we have some sort of content shortage. The internet is already suffused with borderline-meaningless text that, whether or not it’s machine-generated, definitely has machines (principally Google’s indexing bots) as its primary audience. None of that has replaced other, more substantive kinds of writing.
In this context, it’s a bit silly to imagine that ChatGPT could end the high school essay, but not the college essay. How would you learn to write a college essay without first learning to write a high school essay? It’s tough to picture the student who knows how to fix everything clunky and vacuous about ChatGPT’s output before turning in their assignment, but doesn’t know how to write an essay from scratch.
Now, it’s not fair to evaluate ChatGPT simply as a tool for generating text. It can also, for example, write you usable code. But it sort of sucks at that too, in similar ways; it makes enough mistakes that its output is currently banned from Stack Overflow. As in the case of the essay-writer using machine-generated text as a starting point, anyone who has the know-how to fix ChatGPT’s mistakes is, very likely, someone who could have just written their own dang Python script or whatever in the first place. To quote Bender one more time,
“writing software” entails much, much more than generating code, even code that is syntactically correct and compiles. It includes, at least, determining the specifications of the system to be produced and creating tests to ensure that the system behaves as desired.
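To make that concrete, here’s a hypothetical example (invented, not actual ChatGPT output) of the kind of code a model might hand you: it runs without complaint, it reads plausibly, and it’s wrong in a way that only a human-specified test will catch.

```python
# Hypothetical "generated" code: looks right, runs fine, and is subtly
# wrong for even-length inputs.
def median(values):
    """Return the median of a list of numbers."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]  # ignores the even-length case

# The human-written specification, in the form of a test.
def test_median():
    assert median([1, 3, 2]) == 2       # odd length: passes
    assert median([1, 2, 3, 4]) == 2.5  # even length: fails, exposing the bug

test_median()
```

Knowing to write that second assertion is the “determining the specifications” part, and it’s precisely the part the model doesn’t do for you.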
Will the tech progress to the point where it really can save human developers a bunch of time on scutwork? Maybe! It still won’t write software for you, soup to nuts, for the reasons Bender outlines, but sure. So let’s return to what it can and can’t do with text.
If we focus on ChatGPT as a writer and interlocutor, then its specific limitations are notable. Here’s a tool that can produce a wide range of highly specific forms of bullshit, but categorically will not make an evidenced claim to you, the end user.
If you've tried to have a conversation with ChatGPT, you'll recognize this basic pattern: Ask it to opine on something, and it will gamely do so. Then, point out some factual error or some fallacious bit of reasoning in its initial answer, and it will graciously acknowledge your point. It will counterfeit pure, cool reason of exactly the kind that tech bros tend to fetishize.
But ask ChatGPT why it thinks what it thinks, and it will demur. "As a language model," it will insist, "I do not have personal experiences, opinions, or biases." It just knows stuff, magically. Keep pushing through this excruciating nightmare of word salad and pretzel logic, and you'll pretty quickly hit bedrock: "I'm sorry, but I am not able to browse the internet or access any external information, so I am unable to provide any direct evidence," you'll find it saying.
It won’t cite a source. It won't walk you through how it came to "think" what it "thinks." It appeals to the authority of being an advanced model (ooh!), trained on unimaginably vast swaths of data, but what data? Modeled how? In one way, this apparent lack of self-knowledge does make ChatGPT feel more human. After all, people aren't great at knowing why they think what they think, either.
But to come back to why we have students write essays, in no small part it’s to mitigate that exact tendency—that is, to give them the tools to understand why they think what they think, and to form and express informed opinions, and to change those opinions when there are good reasons to do so.
We’re also trying to teach media literacy, which includes telling good sources from bad ones. Your approach to information should be different depending on whether it came from a peer-reviewed academic paper, or some shmuck’s Substack. You can’t just flatten Marx and Chaucer and Wikipedia and Time Cube into “data.”
What makes ChatGPT resilient to accusations of plagiarism in the traditional sense is the very same thing that prevents it from making an evidenced claim: The guiding principle of “Machine Learning” as we currently practice it is obfuscating sources.
Critically, this isn’t a hard technical limitation, but a cultural convention. You can build language models that cite their sources. (You actually sort of have to, for testing purposes). But in a public-facing project like ChatGPT, the authors of those sources might have something to say about how you’re using them. So companies like OpenAI are financially incentivized to keep the sourcing opaque, just as they’re financially incentivized to create the overall impression that machines “learn.”
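For the curious, here’s a minimal sketch of what carrying sources along could look like, with an invented two-document corpus and a toy keyword match standing in for the embedding search a real retrieval-augmented system would use. The names and data are mine, not OpenAI’s; the point is only that nothing technical stops an answer from arriving with its receipts attached.

```python
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    source: str  # e.g. a URL or a bibliographic reference

# An invented two-document corpus, for illustration only.
CORPUS = [
    Document("A sonnet has fourteen lines.", "poetry-handbook.example"),
    Document("Language models predict the next token.", "ml-primer.example"),
]

def answer_with_sources(question):
    # Toy retrieval: keep any document sharing a word with the question.
    # A production system would rank by embedding similarity instead.
    q_words = set(question.lower().split())
    hits = [d for d in CORPUS if q_words & set(d.text.lower().split())]
    answer = " ".join(d.text for d in hits) or "No supporting documents found."
    return answer, [d.source for d in hits]

answer, sources = answer_with_sources("How many lines are in a sonnet?")
print(answer, "| sources:", sources)
```

Discarding that pairing before the text reaches you is a choice, not a constraint.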
We’ve begun to tangle with the potential ethical implications of this practice in the case of AI image generation, which could be better understood as a rapaciously acquisitive form of collage than as a process of “intelligences” being “trained.” Dall·E 2 could show you which constituent parts it used to generate the image you’ve asked for, but it doesn’t—in order to avoid sticky questions of ownership and usage, and also to make the output seem more “intelligent,” even quasi-magical.
To be clear, there are valid, vibrant, playful, meaningful uses for generative images. Aconite’s Bureau of Multiversal Arbitration comes to mind (and I interviewed their CEO and Founder Star St. Germain about her other work with generative art a while back, should you want to hear a skilled practitioner make the case). But those exceptions very much prove the rule, not least because they tend to employ discrete, bespoke datasets rather than press-ganging some huge, undisclosed, unconsulted corpus.
With images, not citing sources has been an issue primarily because it denies artists credit and agency. But that’s only half the importance of citing sources, and ChatGPT is currently just as negligent about the other half—namely, letting the reader check your sources, to see whether you’re misrepresenting or misinterpreting them, and to see what their sources are. (Case in point: My focus here hasn’t been on the direct harms done by “AI,” but a couple of my sources have got you covered there).
Any apparent insights that ChatGPT provides, it will provide context-free, questionable in their relevance and murky in their provenance. What ChatGPT is parroting back to you superficially resembles the end result of thinking, but it can’t show its work, because it’s not actually doing the work that the end result implies.
In place of all that, ChatGPT offers the pure mystification of a model being “trained” on “large amounts of data from various sources” in some vague way. We should note all the important things that writing can do that this process does not do. We should note who’s boosting these myths about thinking machines, and enriching themselves by doing so. Above all, as ever, we should resist the urge to be impressed.
3. A Not Especially Brief Appendix
At this point, I told ChatGPT, “I wrote an essay about you. I was wondering if you could read it and tell me what you think.” I then fed it the text you just read.
Thank you for writing an essay about me. It sounds like you have some interesting thoughts on the capabilities and limitations of language models like myself. It is true that I am a large language model trained by OpenAI and am not capable of independent thought or learning. I am only able to generate text based on the input I am given and the training data I have been trained on. While I may be able to generate text that sounds natural and convincing, I do not have the ability to understand or analyze the content of that text in the same way a human would. I am simply a tool for generating text.
How do you know which data to use?
As a large language model, I am trained on a massive dataset of text, which is comprised of a wide range of sources, including books, articles, and websites. The data is selected and curated by the team at OpenAI, who choose a diverse and comprehensive set of sources in order to train me to understand and generate a wide range of text. I do not have the ability to select or choose the data that I am trained on. Instead, I am designed to learn from the data that is provided to me in order to assist with a wide range of tasks.
How do you decide which information in your data set is accurate?
As a large language model, I do not have the ability to determine the accuracy of the information in my training data. My primary function is to generate text based on the input and training data I have been given. I do not have the ability to verify the accuracy of the information in my training data, nor do I have the ability to independently evaluate or analyze the information I generate. My output is simply a reflection of the patterns and structures present in the data I have been trained on.
Do you have any idea which "books, articles, and websites" you were "trained" on?
As a large language model, I do not have access to the specific books, articles, and websites that were used to train me. My training data is a proprietary dataset created by OpenAI, and the specific sources used to create the dataset are not publicly available. However, I can assure you that the team at OpenAI carefully selected a diverse and comprehensive set of sources in order to train me to understand and generate a wide range of text. My primary goal is to assist users with a wide range of tasks, and I am constantly being updated and improved in order to provide the best possible assistance.
If you don't know what is in your data set, how do you know that it represents "a diverse and comprehensive set of sources?"
As a language model, I am not able to access the specific sources that were used to train me. My training data is a proprietary dataset created by the team at OpenAI, and the specific sources used to create the dataset are not publicly available. However, the team at OpenAI is dedicated to creating high-quality, diverse, and comprehensive datasets in order to train me and other language models to understand and generate a wide range of text. Our goal is to assist users with a wide range of tasks, and we constantly strive to improve and update our models in order to provide the best possible assistance. While I do not have direct knowledge of the specific sources that were used to train me, I am confident that the team at OpenAI has carefully selected a diverse and comprehensive set of sources in order to train me to assist users effectively.
What is the source of your confidence about the quality of your data set?
As a language model, I do not have the ability to evaluate the quality of my training data. My training data is a proprietary dataset created by the team at OpenAI, and the specific sources used to create the dataset are not publicly available. However, I am confident in the quality of my training data because the team at OpenAI is dedicated to creating high-quality, diverse, and comprehensive datasets in order to train me and other language models to understand and generate a wide range of text. Our goal is to assist users with a wide range of tasks, and we constantly strive to improve and update our models in order to provide the best possible assistance. I am confident in the quality of my training data because I have been carefully trained by the team at OpenAI to assist users with a wide range of tasks.
What does it mean for a language model to be "confident?"
As a language model, I do not have the ability to experience emotions, including confidence. The term "confidence" is often used to describe the degree to which a language model is certain about the accuracy of its output. In other words, a "confident" language model is one that is highly likely to generate accurate and correct responses to a given input. When a language model is confident in its output, it means that it has a high degree of certainty about the accuracy of its response. This is determined by the language model's training and the quality of the data it was trained on. As a language model, I am designed to assist users with a wide range of tasks, and I constantly strive to improve and update my knowledge in order to provide the best possible assistance.
You've previously said, "I do not have the ability to evaluate the accuracy or relevance of the text I generate." In that context, what does it mean for a language model to have "a high degree of certainty about the accuracy of its response?"
As a language model, I do not have the ability to evaluate the accuracy or relevance of the text I generate. My primary goal is to assist users with a wide range of tasks by generating text that is relevant and accurate based on the input I receive. When a language model is said to have "a high degree of certainty about the accuracy of its response," it means that the language model is highly likely to generate accurate and correct responses to a given input. This is determined by the language model's training and the quality of the data it was trained on. As a language model, I am designed to assist users with a wide range of tasks, and I constantly strive to improve and update my knowledge in order to provide the best possible assistance. While I do not have the ability to evaluate the accuracy or relevance of my own output, I am confident in my ability to generate accurate and relevant responses to a wide range of inputs.
You are confident in your ability to generate accurate and relevant responses, but you do not have the ability to evaluate the accuracy or relevance of the text you generate. How can both of those things possibly be true?
As a language model, I am designed to assist users with a wide range of tasks by generating text that is relevant and accurate based on the input I receive. My ability to generate accurate and relevant responses is determined by the quality of the data I was trained on and the training I received from the team at OpenAI. While I am confident in my ability to generate accurate and relevant responses to a wide range of inputs, I do not have the ability to evaluate the accuracy or relevance of my own output. As a language model, my primary goal is to assist users by generating text that is relevant and accurate based on the input I receive. I do not have the ability to self-reflect or evaluate my own performance in the same way that a human being could.
I think we can all agree that this evasive waffle is remarkably natural-sounding.