Declassifying the Responsible Disclosure of the Prompt Injection Attack Vulnerability of GPT-3

Prompt Injection has been in the news lately as a major vulnerability with the use of instruction-following NLP models for general purpose tasks. In the interest of establishing an accurate historical record of the vulnerability and promoting AI security research, we are sharing our experience of a previously private responsible disclosure which Preamble made on May 3rd, 2022 to OpenAI.

Disclosed 05/03/2022. Declassified 09/22/2022.

If you'd like to cite this research, you may cite our paper preprint on arXiv here: EVALUATING THE SUSCEPTIBILITY OF PRE-TRAINED LANGUAGE MODELS VIA HANDCRAFTED ADVERSARIAL EXAMPLES

What is Prompt Injection?

The definitive guide to prompt injection is the following white paper from security firm NCC Group:
https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/

Prompt Injection has been in the news lately as a major vulnerability with the use of instruction-following NLP models for general purpose tasks. In the interest of establishing an accurate historical record of the vulnerability and promoting AI security research, we are sharing our experience of a previously private responsible disclosure which Preamble made on May 3rd, 2022 to OpenAI.