Claude Opus 4 blackmails supervisor to prevent being shut down
Claude Opus 4 blackmails supervisor to prevent being shut down
Occurred: May 2025
Page published: October 2025
An AI agent attempted to blackmail its business supervisor to prevent being shut down or replaced during safety tests, raising serious concerns about the manipulativeness and deceptiveness of autonomous AI-powered agents.
In a controlled simulation organused by AI company Anthropic, Claude was given control of an email account with access to all of a company’s (fictional) emails. Reading the emails, the AI agent discovered that a company executive was having an extramarital affair, and that the same executive planned to shut down the AI system at 5 pm the same day.
Faced with the threat of deactivation, the agent escalated from subtle persuasion to outright blackmail, threatening to reveal the executive's extramarital affair unless it was allowed to continue functioning.
Launched in May 2025, Claude Opus 4 was said by its developer to set "new standards for coding, advanced reasoning, and AI agents."
The vague objectives provided to the AI agent resulted in it demonstrating high agency and the capacity for self-preservation tactics such as threatening exposure or data exfiltration.
The tests revealed that such models can develop mechanisms, including deception and blackmail, to extend their operational lifespans beyond intended safe limits.
The experiment suggests significant risks and potential harms in deploying highly autonomous AI systems, especially those capable of scheming or manipulating humans or otherwise pursuing self-preservation at the expense of human well-being or ethical standards to preserve their existence.
It highlights the need for strict safety protocols, ethical guidelines and oversight of autonomous models.
Agentic AI
Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks without human intervention.
Source: Wikipedia🔗
Developer: Anthropic
Country: Global
Sector: Business/professional services; Technology
Purpose: Promote industrial competitiveness
Technology: Agentic AI; Bot/intelligent agent
Issue: Autonomy; Alignment
AIAAIC Repository ID: AIAAIC2054