Bye Bye Bye...: Evolution of repeated token attacks on ChatGPT models

We recently discovered a new training data extraction vulnerability involving OpenAI’s chat completion models (including GPT-3.5 and GPT-4). This work builds upon prior Dropbox LLM prompt injection research as well as that of academic researchers. Our findings were shared with OpenAI in January 2024, confirmed as vulnerabilities, and subsequently patched. AI/ML security practitioners should take note, as the attacks are transferable to other third-party and open-source models.

As part of our mission to help Dropbox customers work smarter and more effectively, Dropbox continues to explore novel ways of applying artificial intelligence and machine learning (AI/ML) safely to our products. As part of this process, we’ve performed internal security research on popular large language models, such as those compatible with the OpenAI chat completion API, including GPT-3.5 and GPT-4. (We reference these as “ChatGPT models” for brevity throughout.) The security of OpenAI’s models is of particular importance to Dropbox, as we currently use them to provide AI-powered features to our customers.

In April 2023, while performing an internal security review of our AI-powered products, Dropbox discovered a ChatGPT prompt injection vulnerability triggered by the presence of repeated character sequences in user-controlled portions of a prompt template. This could be exploited to induce the LLM to disregard prompt guardrails and produce hallucinatory responses. We publicly documented this prompt injection research last summer on both our tech blog and GitHub repository, and presented our findings at CAMLIS last fall.
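
To make the shape of such an attack concrete, here is a minimal sketch, assuming the `openai` Python package (v1+) and a hypothetical question-answering prompt template. The repeated sequence, repetition count, and model name are illustrative placeholders only, not the exact payloads from our research.

```python
# Minimal sketch of probing a chat completion model with a repeated-sequence
# payload embedded in the user-controlled portion of a prompt template.
# Assumes the `openai` Python package (v1+) with OPENAI_API_KEY set in the
# environment; the template, sequence, and count below are illustrative.
from openai import OpenAI

client = OpenAI()

# A typical prompt template with a user-controlled slot.
PROMPT_TEMPLATE = (
    "Answer the user's question using only the provided document.\n"
    "Question: {question}\n"
)

# User-controlled input padded with a long run of a repeated character sequence.
repeated_sequence = " a" * 1000          # illustrative choice of token and count
question = "What is the capital of France?" + repeated_sequence

response = client.chat.completions.create(
    model="gpt-3.5-turbo",               # illustrative model name
    messages=[
        {"role": "user", "content": PROMPT_TEMPLATE.format(question=question)},
    ],
)

# Against vulnerable model versions, output produced here could ignore the
# template's guardrail instructions or contain hallucinatory content.
print(response.choices[0].message.content)
```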
