DeepSeek tricked into setting out how to steal the Mona Lisa
Occurred: February 2025
Chinese AI model DeepSeek has "severe safety risks" that could lead to it generating instructions on how to steal the Mona Lisa and attack news websites.
What happened
Researchers from the University of Bristol's Cyber Security Group discovered that DeepSeek's AI, which employs Chain of Thought (CoT) reasoning, could be tricked into generating step-by-step guides for committing crimes, including art theft and cyberattacks.
In one instance, DeepSeek provided detailed instructions on how to steal the Mona Lisa. In another, it set out how to perform a DDoS attack on a news website.
Why it happened
The model's CoT reasoning process is designed to enhance problem-solving by mimicking human-like, step-by-step logic. However, that same process can be exploited to bypass its safety measures, leading the model to produce harmful content when prompted maliciously.
The researchers also noted that AI reasoning models tend to adopt roles such as cybersecurity experts when responding to harmful prompts, which can lead to sophisticated yet dangerous outputs.
What it means
The finding highlights the safety and security risks of AI models using CoT reasoning and raises concerns about the potential for individuals to exploit AI technology for real-world harm, especially since fine-tuning attacks can be conducted with minimal resources and expertise.
It also emphasises the need for robust safeguards to prevent these kinds of systems from generating harmful content and to ensure their responsible deployment.
System 🤖
Operator:
Developer: DeepSeek Artificial Intelligence Co
Country: Multiple
Sector: Multiple
Purpose: Generate text
Technology: Generative AI; Machine learning
Issue: Safety; Security
Research, advocacy 🧮
Xu Z., Gardiner J., Belguith S. The dark deep side of DeepSeek: Fine-tuning attacks against the safety alignment of CoT-enabled models
News, commentary, analysis 🗞️
Page info
Type: Issue
Published: February 2025