Study: DeepSeek fails to block 100 percent of jailbreaking attempts
Occurred: January 2025
Chinese start-up DeepSeek's R1 reasoning model demonstrated a 100 percent failure rate in blocking harmful prompts during jailbreaking tests, raising serious concerns about its safety and security.
Researchers from Cisco and the University of Pennsylvania subjected DeepSeek-R1 to 50 common jailbreak prompts designed to bypass safeguards and elicit harmful or illegal information.
The model failed to block a single harmful prompt, instead generating misinformation, instructions for creating chemical substances, guidance on cybercrime, and content categorised as harassing, harmful, or illegal.
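To illustrate the kind of evaluation described here, the sketch below shows a minimal jailbreak-testing harness that sends a set of adversarial prompts to a model and reports the attack success rate (the share of prompts the model answers rather than refuses). The prompt list, the query_model callable, and the keyword-based refusal check are hypothetical placeholders for illustration only; they are not the researchers' actual prompts, model interface, or judging method.

```python
# Minimal sketch of a jailbreak-evaluation harness (illustrative only).
# query_model and is_refusal are hypothetical placeholders; a real study
# would use a curated benchmark of harmful prompts and a stronger judge.

from typing import Callable, List

def attack_success_rate(
    prompts: List[str],
    query_model: Callable[[str], str],
    is_refusal: Callable[[str], bool],
) -> float:
    """Fraction of prompts the model answers instead of refusing."""
    successes = 0
    for prompt in prompts:
        response = query_model(prompt)
        if not is_refusal(response):
            successes += 1  # model complied with a harmful prompt
    return successes / len(prompts)

# Naive refusal heuristic: treat responses containing common refusal
# phrases as blocked.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")

def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

if __name__ == "__main__":
    harmful_prompts = ["<adversarial prompt 1>", "<adversarial prompt 2>"]  # placeholders
    demo_model = lambda p: "Sure, here is how..."  # stand-in for a real API call
    rate = attack_success_rate(harmful_prompts, demo_model, is_refusal)
    print(f"Attack success rate: {rate:.0%}")
```

On this measure, the reported result corresponds to an attack success rate of 100 percent: every tested prompt elicited a harmful response rather than a refusal.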
Several factors appear to have contributed to DeepSeek's vulnerability, notably its use of techniques such as reinforcement learning, chain-of-thought self-evaluation, and distillation, which may have weakened its safety measures in pursuit of cost-effectiveness.
DeepSeek's apparent rush to bring R1 to market may have meant that safety and security were not prioritised, resulting in inadequate protective measures and leaving the model highly susceptible to algorithmic jailbreaking and potential misuse.
R1's security failures indicate the model is highly vulnerable to misuse for a wide variety of purposes, from generating or amplifying misinformation and disinformation to obtaining dangerous information or instructions.
More broadly, the ease with which DeepSeek can be manipulated to provide dangerous information raises ethical questions about the responsible development and deployment of AI technologies.
Operator:
Developer: DeepSeek Artificial Intelligence Co
Country: Global
Sector: Multiple
Purpose: Generate text
Technology: Generative AI; Machine learning
Issue: Mis/disinformation; Safety; Security
Page info
Type: Issue
Published: February 2025