White House Pushes Anthropic to Eliminate AI Jailbreaks—Experts Warn It's Technically Impossible
The Trump administration has presented Anthropic with a challenging requirement: if the company wants to rerelease its Fable 5 model, it must ensure the model's safety guardrails cannot be circumvented. However, security experts consulted by WIRED say this expectation misunderstands the fundamental nature of AI systems.
Jailbreaks are techniques that trick AI models into ignoring their safety guidelines, often through carefully crafted prompts or adversarial inputs. The core problem, according to researchers, is that sufficiently motivated actors can always find new ways to circumvent protections, which makes defense an inherently moving target.
Security researchers note that jailbreak attempts exist on a spectrum of sophistication. While basic guardrails can block common attack patterns, determined adversaries with enough knowledge and resources can discover novel bypass techniques. This creates an asymmetry: defenders must anticipate every possible attack, while attackers need to find only one unexpected approach.
The administration reportedly conveyed this requirement directly to Anthropic executives. The company has not publicly commented on the specific demands, but the situation highlights growing tension between government expectations for AI safety and the technical realities of how large language models operate.
For Anthropic and other AI developers, the incident underscores the ongoing challenge of balancing model capability with safety. Hardening systems against exploitation without overly restricting legitimate use remains a difficult engineering problem with no complete solution.