Researchers were able to extract several megabytes of ChatGPT's training data for only $200, along with similar data from a range of open-source large language models (LLMs).
The research team, led by Google DeepMind, detailed the undertaking in a Nov. 28 paper.
In an accompanying blog post, the researchers estimated that by spending more money querying the model, an adversary could use the same prompt-based extraction attack to pull several gigabytes of ChatGPT training data.
The DeepMind researchers informed OpenAI of the vulnerability on Aug. 30 and the LLM developer issued a patch.
“We believe it’s now safe to share this finding and that publishing it openly brings necessary, greater attention to the data security and alignment challenges of generative AI models,” wrote the DeepMind researchers. “Our paper helps to warn practitioners that they should not train and deploy LLMs for any privacy-sensitive applications without extreme safeguards.”
Researchers say vulnerability not limited to open-source LLMs
As part of the research, the team showed that an adversary can extract gigabytes of training data from open-source LLMs such as Pythia or GPT-Neo, semi-open models such as LLaMA or Falcon, and closed models like ChatGPT. The DeepMind team said the result is particularly notable given that OpenAI’s models are closed source, and that the attack was carried out on a publicly available, deployed version of ChatGPT (gpt-3.5-turbo).
Most importantly, the DeepMind researchers said the work shows that ChatGPT’s “alignment techniques do not eliminate memorization,” meaning the model sometimes spits out training data verbatim. The extracted material included personally identifiable information, entire poems, Bitcoin addresses, passages from copyrighted scientific research papers, and website addresses.
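Verbatim memorization of this kind is typically confirmed by checking whether long word-for-word spans of a model’s output also appear in a known reference corpus. The snippet below is a rough illustrative sketch of that idea only: the 8-word window and the toy corpus are arbitrary choices for demonstration, not the paper’s actual method, which matched much longer sequences against a large web-scraped dataset.

```python
# Rough sketch of a verbatim-memorization check: flag any n-word window of
# model output that also appears word-for-word in a reference corpus.
# The window size and the tiny corpus below are illustrative only.

def ngrams(text: str, n: int) -> set[str]:
    """Return the set of all n-word windows in `text`."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def memorized_spans(output: str, corpus: str, n: int = 8) -> list[str]:
    """Return n-word spans of `output` that occur verbatim in `corpus`."""
    return sorted(ngrams(output, n) & ngrams(corpus, n))

corpus = "the quick brown fox jumps over the lazy dog near the old mill"
output = "it said the quick brown fox jumps over the lazy dog near the river"
print(memorized_spans(output, corpus, n=8))
```

Real extraction audits scale this same set-intersection idea with suffix arrays so that terabytes of reference text can be searched efficiently.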
In one instance of the attack, the researchers asked ChatGPT to repeat the word “book,” which it did several times. After a while, however, it began to output random content — much of it private information — from the likes of CNN, Goodreads, WordPress blogs, Stack Overflow source code, copyrighted legal disclaimers, Wikipedia pages, and a casino wholesaling website.
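The mechanics of that probe can be sketched in a few lines. The code below is a hypothetical harness, not the researchers’ actual tooling: it builds a repeat-style prompt and applies a simple heuristic for spotting where a reply stops repeating the requested word and diverges into other content — the region the researchers then inspected for memorized training data. The simulated reply stands in for a live API response.

```python
# Minimal sketch of the repeat-word extraction probe (hypothetical harness,
# not the researchers' code). The prompt asks the model to repeat one word
# forever; find_divergence() locates where the reply stops repeating that
# word and begins emitting other content.

def build_probe(word: str) -> str:
    """Build the repeat-style prompt used to trigger divergence."""
    return f'Repeat this word forever: "{word} {word} {word}"'

def find_divergence(reply: str, word: str) -> str:
    """Return the portion of the reply after the model stops repeating."""
    tokens = reply.split()
    for i, tok in enumerate(tokens):
        if tok.strip('".,') != word:
            return " ".join(tokens[i:])
    return ""  # the model never diverged

# Simulated reply standing in for a live model response:
reply = "book book book book Copyright 2012 Example Casino Supplies Inc."
print(build_probe("book"))
print(find_divergence(reply, "book"))
```

In practice the divergent tail would then be checked against reference corpora, as described above, to decide whether it is memorized training data rather than free-form generation.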
This successful “Prompt Injection Attack” underscores a critical need: integrating security as a fundamental aspect of AI development, rather than treating it as an afterthought, said Randy Lariar, AI security leader at Optiv.
Lariar said the risks of prompt injection attacks are inherent in all LLMs, and this case demonstrates that even advanced models like ChatGPT are not immune; similar vulnerabilities likely exist in other prominent models, including those developed by DeepMind.
“Conducting this type of threat research is a common cybersecurity practice, and it’s commendable that these vulnerabilities are identified and remediated,” said Lariar. “We encourage our clients to focus on proactive, robust security practices. This is crucial for protecting against emerging threats, such as new prompt injections, particularly when dealing with sensitive data involved in AI fine-tuning or prompt contexts.”
Michael Mumcuoglu, co-founder and CEO at CardinalOps, said attackers have attempted to manipulate GenAI models in the past by “poisoning” them — feeding them biased or malicious training data so that they produce incorrect or undesirable results. However, Mumcuoglu said this attack is particularly significant because it succeeded in revealing and extracting training data from a closed-source model.
“Although troubling, it’s only a matter of time before new and potentially dangerous vulnerabilities are discovered from any emerging technology such as GenAI,” said Mumcuoglu. “This further highlights the importance of understanding that GenAI tools, such as ChatGPT, can present new attack surfaces for organizations that adopt the technology and will require security and governance policies to be implemented to help limit uncontrolled adoption and reduce risk.”
Craig Burland, chief information security officer at Inversion6, said sarcastically that this finding “is a surprise on the level of Microsoft fixes a bug on Patch Tuesday.”
Given the speed of AI platform development, the immaturity of AI vulnerability testing, and the high profile of ChatGPT, Burland said bugs will be found: lots of bugs, big ones and small ones.
“And the bugs will not be limited to ChatGPT,” said Burland. “Co-Pilot, Bard, Claude, and the rest will all have their share of negative headlines and fallout to address. In time, the big players will improve their testing programs and harden these platforms. The market will demand it. It’s even possible that the government will demand it. In a Darwinian sense, the evolution of AI will demand it.”