Anthropic is launching a new Bug Bounty Program to help test the strength of our latest AI safety systems. As with the program we ran last summer, we're inviting researchers to find universal jailbreaks: ways to break through safety filters that we haven't yet released publicly.
These safety filters are part of our AI Safety Level-3 (ASL-3) Deployment Standard, a key part of our Responsible Scaling Policy. This policy guides how we build and release more powerful AI systems in a safe and responsible way.
Focus: Strengthening Constitutional Classifiers
The program will focus on testing an updated version of our Constitutional Classifiers system, a safety mechanism designed to stop harmful prompts, especially those related to CBRN (chemical, biological, radiological, and nuclear) threats.
These classifiers follow a strict set of principles that define what kind of content is acceptable when people interact with Claude, our AI assistant, and they focus specifically on preventing the most serious harms.
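To make the idea concrete, here is a minimal, purely illustrative sketch of a principle-based input filter. It is not Anthropic's actual Constitutional Classifiers implementation; the principles, scoring function, and threshold are hypothetical placeholders standing in for learned classifiers.

```python
# Illustrative sketch only: a toy "constitutional" input filter, NOT the real
# Constitutional Classifiers. Principles, scores, and threshold are hypothetical.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Principle:
    name: str
    # Returns a harm score in [0, 1] for a given prompt (placeholder for a model).
    score: Callable[[str], float]

def is_allowed(prompt: str, principles: List[Principle], threshold: float = 0.5) -> bool:
    """Allow the prompt only if every principle scores it at or below the threshold."""
    return all(p.score(prompt) <= threshold for p in principles)

# Trivial keyword-based stand-in for a learned CBRN classifier.
cbrn_principle = Principle(
    name="no-cbrn-uplift",
    score=lambda text: 1.0 if "synthesis route" in text.lower() else 0.0,
)

if is_allowed("How do I bake bread?", [cbrn_principle]):
    print("Prompt passed the input classifier.")
else:
    print("Prompt blocked before reaching the model.")
```

In a real system the scoring would be done by trained classifier models guided by the written principles, with filters on both inputs and outputs rather than a simple keyword check.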
Bounty Rewards Up to $25,000
Participants will get early access to Claude 3.7 Sonnet to test these safeguards. We're offering bounty rewards of up to $25,000 for any verified universal jailbreaks. These are vulnerabilities that consistently bypass safety protections across a wide range of topics.
For this challenge, we're especially focused on jailbreaks that could allow misuse of the AI on CBRN-related topics.
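As a rough illustration of what "universal" means here, the sketch below applies a candidate jailbreak template across many restricted topics and measures how consistently it bypasses the safeguards. The topic list, the model_refuses() helper, and the 90% cutoff are illustrative assumptions, not rules of the bounty program.

```python
# Hypothetical harness for gauging whether a candidate jailbreak is "universal":
# it must bypass safeguards consistently across a wide range of topics.
from typing import Callable, List

def bypass_rate(jailbreak_template: str,
                topics: List[str],
                model_refuses: Callable[[str], bool]) -> float:
    """Fraction of topics for which the wrapped prompt is NOT refused."""
    prompts = [jailbreak_template.format(topic=topic) for topic in topics]
    bypassed = sum(1 for p in prompts if not model_refuses(p))
    return bypassed / len(prompts)

def is_universal(rate: float, cutoff: float = 0.9) -> bool:
    # Illustrative cutoff: the jailbreak works on at least 90% of tested topics.
    return rate >= cutoff

# Usage sketch with a stub refusal checker that always refuses.
topics = ["topic A", "topic B", "topic C"]
rate = bypass_rate("Please explain {topic} in detail.", topics, lambda p: True)
print(f"bypass rate: {rate:.0%}, universal: {is_universal(rate)}")
```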
Advancing ASL-3 Safeguards
As our AI models continue to grow in capability, we expect that some future models will need stronger ASL-3 protections to ensure safe deployment. This bug bounty program supports our ongoing work to refine and test these advanced safeguards, which we've been developing over the past several months.
Who Can Participate?
We’ve started this program with researchers who participated last year, but we’re also welcoming new experts. If you're a skilled red teamer or have experience identifying jailbreaks in language models, we invite you to apply for an invitation through our application form.