In November, I received an alarming email from someone I did not know: Rui Zhu, a doctoral candidate at Indiana University Bloomington. Zhu had my email address, he explained, because GPT-3.5 Turbo, one of the latest and most robust large language models from OpenAI, had delivered it to him.
My contact information was included in a list of business and personal email addresses for more than 30 New York Times employees that a research team, including Zhu, had managed to extract from GPT-3.5 Turbo in the fall of last year. With some work, the team had been able to “bypass the model’s restrictions on responding to privacy-related queries,” Zhu wrote.
My email address is not a secret. But the success of the researchers’ experiment should ring alarm bells, because it shows that ChatGPT, and generative AI tools like it, could disclose much more sensitive personal information with just a bit of tweaking.
When you ask ChatGPT a question, it does not simply search the web to find the answer. Instead, it draws on what it has “learned” from reams of information — training data that was used to feed and develop the model — to generate one. Large language models train on vast amounts of text, which may include personal information pulled from the Internet and other sources. That training data informs how the AI tool works, but it is not supposed to be recalled verbatim.
In theory, the more data that is added to an LLM, the deeper the memories of the old information get buried in the recesses of the model. A process known as catastrophic forgetting can cause an LLM to regard previously learned information as less relevant when new data is being added. That process can be beneficial when you want the model to “forget” things like personal information. However, Zhu and his colleagues — among others — have recently found that LLMs’ memories, just like human ones, can be jogged.
In the experiment that revealed my contact information, the Indiana University researchers gave GPT-3.5 Turbo a short list of verified names and email addresses of New York Times employees, which prompted the model to return similar entries that it recalled from its training data.
Much like human memory, GPT-3.5 Turbo’s recall was not perfect. The output that the researchers were able to extract was still subject to hallucination — a tendency to produce false information. In the example output they provided for Times employees, many of the personal email addresses were either off by a few characters or entirely wrong. But 80% of the work addresses the model returned were correct.
Companies such as OpenAI, Meta and Google use different techniques to prevent users from asking for personal information through chat prompts or other interfaces. One method involves teaching the tool to deny requests for personal information or other privacy-related output. An average user who opens a conversation with ChatGPT by asking for personal information will be denied, but researchers have recently found ways to bypass these safeguards.
Zhu and his colleagues were not working directly with ChatGPT’s standard public interface, but rather with its application programming interface, or API, which outside programmers can use to interact with GPT-3.5 Turbo. The process they used, called fine-tuning, is intended to allow users to give an LLM more knowledge about a specific area, such as medicine or finance. But as Zhu and his colleagues found, it can also be used to foil some of the defenses that are built into the tool. Requests that would typically be denied in the ChatGPT interface were accepted.
“They do not have the protections on the fine-tuned data,” Zhu said.
“It is very important to us that the fine-tuning of our models are safe,” an OpenAI spokesperson said in response to a request for comment. “We train our models to reject requests for private or sensitive information about people, even if that information is available on the open Internet.”
The vulnerability is particularly concerning because no one, apart from a limited number of OpenAI employees, really knows what lurks in ChatGPT’s training-data memory. According to OpenAI’s website, the company does not actively seek out personal information or use data from “sites that primarily aggregate personal information” to build its tools. OpenAI also points out that its LLMs do not copy or store information in a database: “Much like a person who has read a book and sets it down, our models do not have access to training information after they have learned from it.”
Beyond its assurances about what training data it does not use, though, OpenAI is notoriously secretive about what information it does use, as well as what information it has used in the past.
“To the best of my knowledge, no commercially available large language models have strong defenses to protect privacy,” said Prateek Mittal, a professor in the department of electrical and computer engineering at Princeton University. Mittal said AI companies were not able to guarantee that these models had not learned sensitive information. “I think that presents a huge risk,” he said.
— The New York Times