Anthropic has implemented tighter security measures around its Claude Opus 4 AI to mitigate potential misuse, the company announced on May 22. The AI Safety Level 3 (ASL-3) Deployment and Security Standards, developed under Anthropic’s internal AI responsibility policy, aim to reduce the risk of abuse, including the development of chemical or nuclear weapons development.

As part of the update, Anthropic also restricted outbound network traffic to help detect and prevent potential theft of model weights.

Anthropic future-proofed Claude Opus 4 to match ASL-3

Anthropic said the enhanced safeguards make model weight theft significantly more difficult — an especially critical concern with advanced systems like Claude Opus 4. Anthropic has an AI Safety Level tier system to match security to the model’s functionality.

Opus 4 hasn’t technically passed the company’s threshold for needing the advanced protections; however, Anthropic cannot rule out the possibility that Claude Opus 4 might be able to represent what the company classified as level 3 risks. As such, Anthropic proactively decided during the development of the model to build it in accordance with the higher tier.

Claude Sonnet 4 is still covered by ASL-2 protocols.

SEE: US President Donald Trump postponed a 50% tariff expected to be set on imports from the EU.

The upgraded safety infrastructure includes the AI from being used to build chemical, biological, radiological, or nuclear weapons. Claude Opus 4 has real-time classifier guards, large language models trained on weapons-related prompts, to intercept such prompts.

Anthropic also maintains a bug bounty program and collaborates with select third-party threat intelligence firms to continuously evaluate security.

Claude can ‘scheme’ up blackmail in a pre-written scenario

On May 23, Anthropic released a system card for both new versions of Claude: Sonnet and Opus. The system card contains a report about a fictional scenario Claude engineers prompted the AI to play along with, in which the AI was threatened with being shut down. Claude Opus used information provided in the story about an engineer cheating on their spouse to “blackmail” the engineer.

While the scenario shows how generative AI can sometimes surface information the user didn’t expect, the roleplay aspect of the scenario leaves its actual security implications in limbo. Real Anthropic engineers introduced the idea of the blackmail option to the AI as a last resort in the fictional scenario, mimicking science fiction ideas about AI that resist their creators. While the study of generative AI deceptiveness can reveal information about how the models work, we find prompt engineering from malicious humans is a more likely threat than the AI blackmailing someone without being prompted.

In March, Apollo Research reported Claude Sonnet 3.7 demonstrated the ability to withhold information in response to ethics-based evaluations, highlighting ongoing concerns around model transparency and intent.

Subscribe to the Innovation Insider Newsletter

Catch up on the latest tech innovations that are changing the world, including IoT, 5G, the latest about phones, security, smart cities, AI, robotics, and more. Delivered Tuesdays and Fridays

Subscribe to the Innovation Insider Newsletter

Catch up on the latest tech innovations that are changing the world, including IoT, 5G, the latest about phones, security, smart cities, AI, robotics, and more. Delivered Tuesdays and Fridays