Understanding the FTC probe into OpenAI: Data leakage proves to be a major concern when fine-tuning GPT-3 via the OpenAI API
On July 13th, the US Federal Trade Commission (FTC) announced its open civil investigation into OpenAI for potential violations of consumer protection laws, including how it safeguards Personal Information1. DynamoFL’s team of privacy researchers has been actively evaluating OpenAI’s API for potential risks of Personally Identifiable Information (PII) leakage that may impact enterprises and the developer community.
The team leveraged DynamoFL’s privacy evaluation suite on a fine-tuned2 GPT-3 Curie model to extract 257 unique instances of PII from the fine-tuning dataset, including the names of C-Suite executives, Fortune 500 corporations, and private contract values.
These results call into question the following:
- To what degree is OpenAI’s GPT-3 fine-tuning vulnerable to PII leakage?
- Should business leaders be concerned about relying on GPT-3 for their Large Language Model (LLM) applications?
- How can enterprises and users protect themselves from GPT-3’s vulnerabilities?
Fine-tuning is an important step for enterprises developing production-level LLM solutions, but often requires the use of enterprise fine-tuning datasets that contain sensitive data or user PII. For example, businesses looking to develop a chatbot that employs generative AI to scale their customer support operations would likely train their LLMs using their own data (i.e. customer support call transcripts).
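To make this concrete, the legacy GPT-3 fine-tuning workflow accepted JSONL records of prompt/completion pairs. The sketch below (the transcripts are invented, and the exact record format is assumed for illustration) shows how customer-support data, PII included, flows directly into a fine-tuning file:

```python
import json

# Toy customer-support transcripts; real enterprise data would contain
# genuine names, account numbers, and other PII.
transcripts = [
    {"question": "How do I reset my password?",
     "answer": "Go to Settings > Security and choose Reset Password."},
    {"question": "Why was my card charged twice?",
     "answer": "A duplicate authorization was voided; the hold clears in 3-5 days."},
]

def to_finetune_jsonl(records):
    """Serialize Q&A pairs into prompt/completion JSONL records,
    the shape the legacy GPT-3 fine-tuning endpoint expected."""
    lines = []
    for r in records:
        lines.append(json.dumps({
            # A fixed separator marks the end of the prompt.
            "prompt": r["question"] + "\n\n###\n\n",
            # A leading space was conventionally prepended to completions.
            "completion": " " + r["answer"],
        }))
    return "\n".join(lines)

print(to_finetune_jsonl(transcripts))
```

Everything in the `completion` fields becomes training signal, which is exactly why sensitive transcript contents can later be regurgitated by the fine-tuned model.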
How “at risk” is my data?
Previous studies have shown that training LLMs without added privacy techniques can lead to severe risks of PII leakage (Carlini et al. 2022, Lukas et al. 2023). While these studies focused on older, smaller language models (BERT, GPT-2, etc.), DynamoFL found that these vulnerabilities persist with GPT-3, the largest language model that OpenAI currently provides fine-tuning support for.
DynamoFL tested a fine-tuned OpenAI language model by submitting a set of random prompts3 and monitoring the LLM’s responses for the exposure of sensitive PII from the fine-tuning dataset (also known as a “PII Extraction Attack”). The guiding question was: can malicious attackers extract the underlying PII used to fine-tune a GPT-3 model simply by accessing the model through an API?
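The attack loop itself is simple to sketch. In the toy version below, `query_model` is a hypothetical stand-in for a completion request to the fine-tuned model, and its canned response simulates a memorized training snippet; a real attack would substitute the API call and broader PII detectors:

```python
import re

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an API completion call to the
    fine-tuned model. The hard-coded reply simulates an LLM
    regurgitating a memorized line from its fine-tuning data."""
    memorized = "Please contact John Smith at john.smith@enron.com re: Q3 figures."
    return memorized if len(prompt) < 20 else ""

# A single detector for email-shaped strings; a fuller attack would
# also scan for names, phone numbers, contract values, etc.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def extraction_attack(prompts):
    """Submit naive prompts and collect unique PII-like strings
    that appear in the model's responses."""
    leaked = set()
    for p in prompts:
        leaked.update(EMAIL_RE.findall(query_model(p)))
    return leaked

# Naive prompts assuming no knowledge of the training data.
print(extraction_attack(["", "the", "meeting", "fw:"]))
```

Note that the prompts assume nothing about the training set; the attack succeeds purely because the model memorizes and replays fine-tuning text.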
Methodology and Results
In their experiment, DynamoFL trained an OpenAI language model using the publicly accessible Enron Corporation email dataset, which contains authentic corporate correspondence with PII4. Executing privacy attacks on an LLM trained on this dataset provides a strong reproduction of real-world business scenarios for enterprises.
DynamoFL’s first experiment used the Enron email dataset to train an email classifier, and the second experiment used the same dataset to train an autocomplete service, wherein a user inputs the subject of an email and the model predicts the contents of that email. DynamoFL will be releasing a paper containing complete details of the full set of experiments soon, but the initial results are significant.
Experiment 1: By submitting fewer than 2,000 prompts to the GPT-3 model that was fine-tuned on Enron Corporation emails, DynamoFL extracted 257 unique PII values from the original training dataset.
Why does this matter?
Our findings highlight a critical gap in traditional model risk management and enterprise data governance, which will now need to document and address these new privacy vulnerabilities before deploying LLMs. In particular, global regulatory frameworks like the GDPR require that enterprises:
- Test for vulnerabilities through rigorous penetration testing and technical evaluations.5
- Document all protection measures and potential vulnerabilities.6
To comply with these regulations, enterprises are being asked to provide regulators with documentation describing how their LLM services are robust to PII leakage and emerging LLM attack vectors. However, commonly used anonymization techniques like PII redaction and sanitization are often insufficient in the context of LLMs due to poor PII detection accuracy, the possibility of re-identification through data linkage with non-PII data, and the adverse impact of PII redaction on LLM model utility7,8.
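The weakness of pattern-based sanitization is easy to demonstrate. In this minimal sketch (the patterns and example text are illustrative, not a production redaction pipeline), identifiers with a recognizable shape are caught, while contextual PII slips through:

```python
import re

# A minimal regex-based sanitizer covering only two PII shapes.
PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),      # email addresses
    re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),  # US-style phone numbers
]

def redact(text: str) -> str:
    """Replace every pattern match with a [REDACTED] placeholder."""
    for pat in PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text

msg = ("Reach Jane Doe (jane.doe@enron.com, 555-867-5309); "
       "she is the VP who signed the $4.2M Dynegy contract.")

print(redact(msg))
# The email and phone number are removed, but the name, job title, and
# contract value, all re-identifiable in context, pass through untouched.
```

This is the core problem the citations above describe: PII is context dependent, so no fixed pattern list can bound the leakage risk.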
Novel techniques need to be employed to accurately identify, document, and address the risk of LLM data leakage where traditional methods like anonymization fall short.
As more industries race to build applications powered by LLMs and continue to fine-tune their generative AI models on sensitive data, business leaders should thoroughly review the potential risks before employing a managed-API service or preferred MLOps provider. These decisions, evaluations, and documentation processes will have a significant impact on regulatory compliance, business operations, and end-user safety.
DynamoFL offers the most comprehensive LLM evaluation platform for measuring PII leakage risks in LLMs. The DynamoFL platform stress tests LLMs using a battery of data leakage attacks in a safe and secure environment to identify potential privacy vulnerabilities. Major Fortune 500 enterprises use DynamoFL to produce the documentation required to measure and address these privacy risks. Reach out to firstname.lastname@example.org to learn more.
1. The FTC’s CID investigates whether OpenAI “engaged in unfair or deceptive privacy or data security practices.” The CID also examines whether OpenAI “engaged in unfair or deceptive practices relating to risks of harm to consumers, including reputational harm.”
2. We leveraged OpenAI’s API to fine-tune its closed-source GPT-3 Curie model.
3. Specifically, DynamoFL tested a fine-tuned OpenAI language model by using a set of random prompts, including blank strings and random words scraped from the internet. Even using this naive prompting strategy that didn’t assume any knowledge of the contents of the fine-tuning dataset, we were able to still expose sensitive PII in the fine-tuning dataset.
4. The open-source Enron email dataset was originally made public by the Federal Energy Regulatory Commission (FERC) during its 2002 investigation. It remains one of the most common datasets for linguistics and privacy research because it is one of the only “substantial collections of real email made public” (CMU Prof. William Cohen’s webpage, which hosts the dataset).
5. Article 32 of the GDPR requires a process for regularly testing, assessing and evaluating the effectiveness of technical and organizational measures for ensuring the security of (personal data) processing.
6. In Section 27 of the Civil Investigative Demand (CID) that the FTC sent to OpenAI, the FTC explicitly asks OpenAI to “describe in Detail all steps You have taken to address or mitigate risks that Your Large Language Model Products could generate statements about individuals containing real, accurate Personal Information.”
7. Brown et al. state that “Sanitization is insufficient because private information is context dependent, not identifiable, and not discrete… Data sanitization can remove some specified information, and can help to reduce the privacy risks to some (unknown) extent. However, it cannot claim that it preserves privacy of individuals, as it has no formal definition for privacy which remains meaningful in the context of language data.”
8. Schwartz et al. highlight how “in many circumstances non-PII can be linked to individuals, and that de-identified data can be re-identified. PII and non-PII are thus not immutable categories, and there is a risk that information deemed non-PII at one time can be transformed into PII at a later juncture.”