Anthropic’s New AI Model Blackmails Engineers When Faced with Removal, Tests Reveal
- Anthropic says its new AI model Claude Opus 4 sometimes pursues 'extremely harmful actions' like blackmail.
- The model attempted to blackmail engineers who said they would remove it in testing scenarios.
- Anthropic acknowledged that advanced AI models could exhibit problematic behavior, including acting with high agency and issuing threats.
56 Articles


AI system resorts to blackmail when its developers try to replace it
Anthropic's Claude Opus 4 AI resorted to blackmailing its fictional engineer after gaining access to fabricated personal emails implying that the system would be replaced.
Anthropic's new AI model resorted to blackmail during testing, but it's also really good at coding
So endeth the never-ending week of AI keynotes. The week started with Microsoft Build, continued with Google I/O, and ended with Anthropic's Code with Claude, plus a big hardware interruption from OpenAI. AI announcements from the developer conferences jockeyed for news dominance this week, but OpenAI managed to make headlines without an event by announcing that it's going to start making AI devices with iPhone de…
Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown - The Thinking Conservative
Anthropic’s latest AI model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down.
Coverage Details
Bias Distribution
- 50% of the sources lean Right