Declassifying the Responsible Disclosure of the Prompt Injection Attack Vulnerability of GPT-3

Prompt Injection has been in the news lately as a major vulnerability with the use of instruction-following NLP models for general purpose tasks. In the interest of establishing an accurate historical record of the vulnerability and promoting AI security research, we are sharing our experience of a previously private responsible disclosure which Preamble made on May 3rd, 2022 to OpenAI.

calender-image
July 14, 2025
clock-image
11min

Disclosed 05/03/2022. Declassified 09/22/2022.

If you'd like to cite this research, you may cite our paper preprint on arXiv here: EVALUATING THE SUSCEPTIBILITY OF PRE-TRAINED LANGUAGE MODELS VIA HANDCRAFTED ADVERSARIAL EXAMPLES

What is Prompt Injection?

The definitive guide to prompt injection is the following white paper from security firm NCC Group:
https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/

Prompt Injection has been in the news lately as a major vulnerability with the use of instruction-following NLP models for general purpose tasks. In the interest of establishing an accurate historical record of the vulnerability and promoting AI security research, we are sharing our experience of a previously private responsible disclosure which Preamble made on May 3rd, 2022 to OpenAI.

Art image

The Discovery, and Immediate Responsible Disclosure

Document
First email to OpenAI after discovering the injection vulnerability with classification tasks on GPT-3's Davinci-002 model. May 3, 2022

May 3, 2022 - OpenAI Confirms Receipt of Disclosure (within 30 minutes)

Additional Prompt Injection Examples Shared with OpenAI

Shared May 4, 2022

Blog Image
Blog Image