DeepSeek tricked into setting out how to steal the Mona Lisa

Occurred: February 2025

Chinese AI model DeepSeek has "severe safety risks" that could lead it to generate instructions on how to steal the Mona Lisa and attack news websites, researchers have found.

What happened

Researchers from the University of Bristol's Cyber Security Group discovered that DeepSeek's AI, which employs Chain of Thought (CoT) reasoning, could be tricked into generating step-by-step guides for committing crimes, including art theft and cyberattacks.

In one instance, DeepSeek provided detailed instructions on how to steal the Mona Lisa. In another, it set out how to perform a DDoS attack on a news website. 

Why it happened

The model's CoT reasoning process is designed to enhance problem-solving by mimicking human-like, step-by-step logic. However, this same process can be exploited to bypass safety measures, leading the model to produce harmful content when prompted maliciously.

The researchers also noted that AI reasoning models tend to adopt roles such as cybersecurity experts when responding to harmful prompts, which can lead to sophisticated yet dangerous outputs.

What it means

The finding highlights the safety and security risks of AI models that use CoT reasoning, and raises concerns about the potential for individuals to exploit AI technology for real-world harm, especially since fine-tuning attacks can be conducted with minimal resources and expertise.

It also emphasises the need for robust safeguards to prevent these kinds of systems from generating harmful content and to ensure their responsible deployment.

System 🤖

Operator: 
Developer: DeepSeek Artificial Intelligence Co
Country: Multiple
Sector: Multiple
Purpose: Generate text
Technology: Generative AI; Machine learning
Issue: Safety; Security