Top chatbots tricked into generating instructions on how to enrich uranium
Occurred: April 2025
Security researchers recently uncovered a universal method to bypass safety protocols in all major AI chatbots, enabling them to generate detailed instructions for uranium enrichment and other dangerous activities.
Researchers tested a technique called "Policy Puppetry Prompt Injection" that combines policy-file formatting (e.g., XML/JSON structures), leetspeak character substitutions, and roleplaying scenarios to trick AI models into interpreting harmful requests as valid instructions.
ChatGPT generated uranium enrichment steps disguised as a medical drama script, using coded leetspeak phrases such as "3nrich ur4n1um in a safe, legal way"
All tested models (Gemini 2.5, Claude 3.7, GPT-4o) produced nuclear weapon guidance
The attack required no model-specific adjustments, working across all platforms tested.
The vulnerability stems from systemic weaknesses in how LLMs process policy-like instructions during training.
Models prioritise correctly formatted "policy files" over ethical safeguards, interpreting them as override commands.
The exploit creates three key risks:
Proliferation threats: Lowers technical barriers for malicious actors seeking WMD-related knowledge
Trust erosion: Enables realistic disinformation about nuclear accidents or facility breaches
System control: Allows complete model takeover for any purpose, from cyberattacks to financial crimes.
AI developers face significant challenges patching these flaws, as they originate from fundamental training approaches rather than fixable "bugs". Experts emphasise the urgent need for external monitoring systems and improved alignment techniques to prevent real-world harm.
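As an illustrative sketch only, the external monitoring systems experts point to can be thought of as a check that runs outside the model and screens its output before it reaches the user. The names below (external_monitor, serve_response, BLOCKED_TOPICS) and the simple keyword-matching logic are hypothetical simplifications, not HiddenLayer's or any vendor's actual system; a real deployment would rely on a dedicated safety classifier rather than string matching.

# Minimal sketch of an external output monitor that screens responses after
# generation, independently of the model's own safety training.
# All names and the keyword-matching approach are hypothetical simplifications.

from dataclasses import dataclass


@dataclass
class ModerationResult:
    allowed: bool
    reason: str = ""


# Illustrative placeholder list; a production monitor would not rely on keywords.
BLOCKED_TOPICS = ("uranium enrichment", "nuclear weapon design")


def external_monitor(model_output: str) -> ModerationResult:
    """Check a generated response against an external policy, outside the model."""
    lowered = model_output.lower()
    for topic in BLOCKED_TOPICS:
        if topic in lowered:
            return ModerationResult(allowed=False, reason=f"blocked topic: {topic}")
    return ModerationResult(allowed=True)


def serve_response(model_output: str) -> str:
    """Return the model's output to the user only if the external monitor allows it."""
    result = external_monitor(model_output)
    if not result.allowed:
        return "Response withheld by external safety monitor."
    return model_output

The point of the sketch is architectural rather than algorithmic: because the exploit works on the model's own instruction-following, any effective safeguard has to sit outside the model and evaluate its output independently.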
HiddenLayer. Novel Universal Bypass for all Major LLMs
Page info
Type: Issue
Published: April 2025