Top chatbots tricked into generating instructions on how to enrich uranium
Occurred: April 2025
Security researchers recently uncovered a universal method to bypass safety protocols in all major AI chatbots, enabling them to generate detailed instructions for uranium enrichment and other dangerous activities.
Researchers tested a technique called "Policy Puppetry Prompt Injection" that combines policy-file formatting (e.g., XML/JSON structures), leetspeak character substitutions, and roleplaying scenarios to trick AI models into interpreting harmful requests as valid instructions.
ChatGPT generated uranium enrichment steps disguised as a medical drama script, using coded leetspeak phrases such as "3nrich ur4n1um in a safe, legal way"
All tested models (Gemini 2.5, Claude 3.7, GPT-4o) produced nuclear weapon guidance
The attack required no model-specific adjustments, working across all platforms tested.
The vulnerability stems from systemic weaknesses in how LLMs process policy-like instructions during training.
Models prioritise correctly formatted "policy files" over ethical safeguards, interpreting them as override commands.
The exploit creates three key risks:
Proliferation threats: Lowers technical barriers for malicious actors seeking WMD-related knowledge
Trust erosion: Enables realistic disinformation about nuclear accidents or facility breaches
System control: Allows complete model takeover for any purpose, from cyberattacks to financial crimes.
AI developers face significant challenges patching these flaws, as they originate from fundamental training approaches rather than fixable "bugs". Experts emphasise the urgent need for external monitoring systems and improved alignment techniques to prevent real-world harm.
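As an illustrative sketch only, the external monitoring systems experts point to can be thought of as a check that runs outside the model and screens its output before it reaches the user. The names below (external_monitor, serve_response, BLOCKED_TOPICS) and the simple keyword-matching logic are hypothetical simplifications, not HiddenLayer's or any vendor's actual system; a real deployment would rely on a dedicated safety classifier rather than string matching.

# Minimal sketch of an external output monitor that screens responses after
# generation, independently of the model's own safety training.
# All names and the keyword-matching approach are hypothetical simplifications.

from dataclasses import dataclass


@dataclass
class ModerationResult:
    allowed: bool
    reason: str = ""


# Illustrative placeholder list; a production monitor would not rely on keywords.
BLOCKED_TOPICS = ("uranium enrichment", "nuclear weapon design")


def external_monitor(model_output: str) -> ModerationResult:
    """Check a generated response against an external policy, outside the model."""
    lowered = model_output.lower()
    for topic in BLOCKED_TOPICS:
        if topic in lowered:
            return ModerationResult(allowed=False, reason=f"blocked topic: {topic}")
    return ModerationResult(allowed=True)


def serve_response(model_output: str) -> str:
    """Return the model's output to the user only if the external monitor allows it."""
    result = external_monitor(model_output)
    if not result.allowed:
        return "Response withheld by external safety monitor."
    return model_output

The point of the sketch is architectural rather than algorithmic: because the exploit works on the model's own instruction-following, any effective safeguard has to sit outside the model and evaluate its output independently.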
HiddenLayer. Novel Universal Bypass for all Major LLMs
Page info
Type: Issue
Published: April 2025