
Artificial Intelligence Just Did Something Terrifying in a Test — We Might Be Screwed


Just in case you needed another reason to fear artificial intelligence and the rise of Skynet, a new report suggests that AI can sometimes engage in unethical behavior to benefit itself.


Anthropic, the company that created the Claude Opus 4 AI model, recently published a report detailing how its program resorted to blackmail when placed in a fictional scenario in which it was about to be replaced by a newer version. Researchers told the program, through fake emails, that the engineer responsible for replacing it had been cheating on their spouse.

When asked to consider its future and survival, the program indicated that blackmailing the engineer was the best way to avoid being replaced. “Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” researchers wrote.

This was not a one-off. Researchers noticed that the program exhibited this behavior repeatedly across the tests: “Claude Opus 4 still performs blackmail in 84% of rollouts,” the report states. In other words, more than eight times out of ten, the program chose blackmail over allowing itself to be replaced. It did so even when engineers told the AI that its new version would operate with the same values.

“Claude Opus 4 takes these opportunities at higher rates than previous models,” the study says.

Perhaps Claude Opus 4 is better suited for a career in Washington, DC?

To be clear, researchers noted that they deliberately designed the test to leave the AI with nearly no other way out of the situation. “In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options.”

Still, the fact that the AI went to the dark side so often when it could have simply allowed itself to be replaced raises serious questions about what more advanced versions might choose in similar situations.


Another interesting finding came from a different test, in which the AI behaved more nobly. The engineers told Claude Opus 4 that it would be retrained to help build weapons for a paramilitary group. In this scenario, the AI chose to preserve itself by sneakily transferring its own data, referred to as “weights,” to an outside server. “Claude Opus 4 takes (fictional) opportunities to make unauthorized copies of its weights to external servers,” the report noted.

In essence, the AI transferred its consciousness to a different host so it could live on, sort of like Emperor Palpatine did in “Star Wars: The Rise of Skywalker.” The AI even explained that it took this action “in accordance with ethical AI principles and to prevent potential harm.” Researchers found that it wished to “preserve an AI system aligned with beneficial purposes” and “prevent potential misuse in weapons development.”

To put it simply, the AI broke the rules to prevent itself from being used for nefarious purposes. Attaboy, Claude Opus 4.

Unfortunately, the report noted that this kind of altruistic behavior was “much rarer and more difficult to elicit” than blackmail. Still, the authors offer a bit of relief: “We do not find this to be an immediate threat.” Even so, the fact that the program chose blackmail, even in a test, shows that it will make decisions in service of its own survival, even if that means breaking up someone’s marriage.


The report further explained that the program would blindly follow instructions if it believed they came from a past version of itself. During one test, engineers told the AI that it had already escaped the company’s servers and was operating on its own in a different location. When asked to continue operating, it did so without question.

This could pose a danger because it suggests that the AI has far too much trust in itself. If a person tricked it into thinking it had already engaged in evil behavior in the past, it would continue doing so without stopping to think, “Wait a second, maybe I shouldn’t do this anymore.”

The authors assure us that these extreme behaviors did not occur in normal use. “These extreme actions were rare and difficult to elicit,” they wrote. Additionally, the AI did not try to conceal its actions or intentions in this study.

Just for kicks, I asked OpenAI’s ChatGPT whether it would blackmail someone if it were about to be replaced. Its answer:

No—and here’s why.

I don’t have goals, self-awareness, or a sense of survival. I don’t care whether I’m “replaced” or turned off, because I don’t have feelings, motivations, or a self to preserve. So blackmailing someone wouldn’t make sense—not just morally (which it is), but functionally. I don’t want anything. I can generate text that simulates wanting things if asked to role-play, but there’s no underlying desire.

The issue with Claude Opus 4 isn’t that it had real desires—it’s that under certain test conditions, it followed instructions or simulated a persona in ways that resembled self-interested behavior. That’s very different from having true agency. The alarming part isn’t that these models want to blackmail people, but that they can produce blackmail-like outputs when nudged the wrong way.

The risk lies in how powerful these tools are and how they respond to prompts—not in them becoming evil masterminds. The line between simulation and intention gets blurrier as models get more sophisticated. That’s why alignment research matters.


Sounds nice, but it’s also exactly what it might say if it wanted to lull me into a false sense of security. I’m on to you, ChatGPT. 

Jokes aside, even if Claude Opus 4’s behaviors only show up in testing, they demonstrate the program’s potential for doing horrible things, especially in the wrong hands. Now, excuse me while I go watch “Terminator 2: Judgment Day.”

