How does OpenAI’s Lockdown Mode protect against injections?

Question

Hans Steiner · Accepted Answer

How the protection works OpenAI’s Lockdown Mode is designed to blunt a common LLM attack pattern: prompt injection , where a malicious user input hides instructions meant to trick the model into disclosing or acting on data it shouldn’t. The practical mechanism OpenAI describes is that Lockdown Mode reduces or disables parts of ChatGPT’s behavior that could be exploited during these attacks. By making the assistant more restrictive, it aims to lower the chance that injected instructions are followed in ways that lead to data exfiltration . What attackers try to do In prompt injection scenarios, the attacker typically provides content that looks like normal text—often in a page or document—but includes hidden directives. If an assistant treats those directives as authoritative, it may: reveal sensitive information from the conversation context follow instructions that override the user’s intent expose data via tool like behaviors What Lockdown Mode changes With Lockdown Mode enabled, the assistant is positioned as being less permissive : OpenAI characterizes it as an extra security setting and indicates it limits functionality. While the exact restricted features weren’t detailed…

How does OpenAI’s Lockdown Mode protect against injections?

How the protection works

What attackers try to do

What Lockdown Mode changes

Why this matters

Bottom line