Human civilization faces the most severe test! Anthropic CEO warns: a super AI that could completely surpass Nobel Prize winners may arrive within 1-2 years

Wallstreetcn
2026.01.29 03:40
portai
I'm PortAI, I can summarize articles.

Dario Amodei warned that a "powerful AI" that could comprehensively surpass Nobel Prize winners in fields such as biology, programming, and mathematics is likely to emerge within 1-2 years. Whether humanity can harness it remains unknown. He predicts that AI will drive GDP growth rates to 10-20%, but at the same time, it may replace 50% of entry-level white-collar jobs within 1-5 years, leading to extreme wealth concentration and significantly lowering the threshold for bioweapon manufacturing

As global capital frantically invests in AI computing power and the market discusses its productivity dividends, the CEO of a star company at the forefront of this wave has issued a lengthy "warning of prosperity," cautioning that human civilization may face significant tests.

Dario Amodei, co-founder and CEO of Anthropic, a leader in the global AI field, recently published an in-depth article titled "The Adolescence of Technology." In this approximately 19,000-word article, Amodei opens by quoting a scene from Carl Sagan's "Contact," bluntly stating that humanity is on the brink of a "tumultuous and inevitable rite of passage":

"Humanity is about to be endowed with powers almost unimaginable through AI, but whether our existing social, political, and technological systems possess the maturity to harness it remains deeply shrouded in fog."

He warns in the article that a "powerful AI," which will comprehensively surpass Nobel Prize winners in fields such as biology, programming, and mathematics, is highly likely to emerge within the next 1-2 years, around 2027.

Amodei views this as a severe test for human civilization, predicting that while AI may drive global GDP growth rates to 10-20% in the future, it could also replace 50% of entry-level white-collar jobs within 1-5 years and lead to extreme wealth concentration. He calls for strict controls on chip exports to curb the risks of AI misuse and warns that AI could significantly lower the barriers to biological weapon manufacturing. Despite the enormous risks, he believes that if managed properly, humanity still has a chance to usher in a prosperous future brought about by technology.

Dario Amodei video screenshot

"A Country of Geniuses in Data Centers": Transformations in 1-2 Years

Amodei elaborates on the form of this "powerful AI" in the article: it is not merely a chatbot but a "country of geniuses in a datacenter."

According to his definition, this AI model will surpass Nobel Prize winners on a purely intellectual level, capable of proving unresolved mathematical theorems, writing high-level novels, and coding complex libraries from scratch.

This AI system will also possess the ability to act autonomously through text, audio, video, and internet interfaces, even controlling physical devices and robots at speeds 10-100 times faster than humans.

More critically, this AI will have a high degree of autonomy and agency. It will no longer be a passive tool answering questions but will be able to independently execute tasks that take hours or even weeks, much like a smart employee. Amodei points out that the resources used to train this model can be reused to run millions of instances, which can act independently or collaborate like human teams He revealed that internal development progress at Anthropic shows that AI has begun to take on a significant amount of coding work, and this "self-accelerating" feedback loop is strengthening month by month.

Amodei wrote: "If exponential growth continues—which is not certain, but has a decade-long record to support it—then AI will, in essence, be stronger than humans in all aspects, and this cannot be more than a few years away."

Economic Double-Edged Sword: GDP Surge and White-Collar Crisis

This technological leap will have an unprecedented impact on the global economy and labor market.

On one hand, he predicts that AI will become the nuclear power of economic growth, potentially bringing a "10-20% sustained annual GDP growth rate," with exponential improvements in the efficiency of scientific research, manufacturing, and financial systems. He even boldly predicts that super AI companies could emerge with annual revenues of $3 trillion and valuations of$30 trillion.

But on the other hand, he issued a stern warning about the labor market. Amodei reiterated his previous prediction: "AI could replace 50% of entry-level white-collar jobs in the next 1-5 years."

He warned that this is different from the agricultural transformation during the Industrial Revolution; AI is a "universal labor substitute," and the pace of change is extremely fast, leaving humanity potentially unable to adapt. This extreme concentration of wealth could lead to "a single individual owning a significant share of GDP," and the existing tax and distribution systems would face collapse.

More profoundly, there is the extreme concentration of economic power. Amodei pointed out that in a powerful AI-driven economic growth, a few companies and individuals could accumulate unprecedented wealth. He cited Rockefeller's wealth, which accounted for about 2% of the U.S. GDP at the time, noting that today's richest individuals have surpassed this proportion.

"We can imagine AI companies, semiconductor companies... leading individual wealth to easily surpass a trillion dollars." This concentration could erode the social contract on which democracy relies, as "democracy ultimately depends on the idea that the entire population is necessary for the economy to function."

"Surgical" Defense Against AI Misuse: Regulation, Export Controls, and Industry Self-Regulation

In the face of such a complex risk matrix—including AI autonomy out of control, the misuse of biological weapons and other weapons of mass destruction, and AI authoritarianism—Amodei acknowledged that completely stopping or significantly slowing down AI development is unrealistic. The key lies in precise responses.

For instance, Amodei expressed deep concerns about AI misuse at both national and individual levels, especially in the field of biological weapons. He worries that AI will eliminate the knowledge barriers to manufacturing biological weapons, enabling a "madman who wants to kill" to possess the capabilities of a "PhD-level virologist."

"What I worry about is giving a powerful AI to everyone, which essentially allows malicious (but otherwise mediocre) individuals to possess intelligence... If they have an easy way to kill millions, sooner or later, someone will do it."

Regarding "AI misuse," he proposed a multi-layered defense approach:

  1. Technical Level: Shape stable and benevolent values and personalities for AI through "Constitutional AI," and vigorously develop interpretability technologies to glimpse into the "inner workings" of AI

  2. Industry Self-Regulation: Anthropic has incorporated a "classifier" in its model to address risks such as biological weapon information, despite increasing inference costs by about 5%. He calls for the entire industry to enhance transparency regarding risk behaviors.

  3. Government Regulation: Advocates starting with "transparency legislation" (such as the California SB 53 and New York RAISE bills supported by his company) and implementing more targeted rules when evidence is more substantial. He particularly emphasizes that export controls on chips are "the simplest yet extremely effective measure."

  4. Economic Policy: Suggests addressing the pains and inequalities of the transition period through a progressive tax system, employee redistribution within companies, and enhanced private philanthropy.

"Traps" and Hopes

Amodei points out the fundamental contradiction: "AI is such a powerful and glittering reward that it is difficult for human civilization to impose any restrictions on it." The enormous economic benefits create strong political and economic resistance even to the simplest safety measures.

Nevertheless, he ultimately expresses cautious optimism: "I believe that if we act decisively and cautiously, the risks can be overcome—I would even say our chances are high." He calls for more people to recognize the urgency and importance of the situation and to muster the courage to "stick to principles, even in the face of threats to economic interests and personal safety."

This article undoubtedly injects a strong wake-up call into the fervent AI investment boom, starkly presenting issues of technological ethics, geopolitical games, macroeconomic restructuring, and extreme risks to the global market and decision-makers.

Full Translation of "The Puberty of Technology: Facing and Overcoming the Risks of Powerful AI":

"The Puberty of Technology: Facing and Overcoming the Risks of Powerful AI"

Author: Dario Amodei (Anthropic CEO)

Date: January 2026

In the film adaptation of Carl Sagan's novel "Contact," there is a scene where the protagonist, an astronomer who detects the first radio signal from an alien civilization, is being considered as a human representative to meet the aliens. The international panel of experts interviewing her asks, "If you could ask [the aliens] one question, what would it be?" Her answer is, "I would ask them, 'How did you do it? How did you evolve and get through this puberty of technology without destroying yourselves?'" As I reflect on humanity's current situation in the field of AI—and the precarious position we find ourselves in—I keep returning to that scene, as this question is so relevant to our current predicament. I truly wish we had the aliens' answers to guide us. I believe we are entering a tumultuous yet inevitable rite of passage that will test who we are as a species Humanity is about to be endowed with almost unimaginable power, but whether our existing social, political, and technological systems possess the maturity to harness it remains shrouded in fog.

In my article "Machines of Loving Grace," I attempt to depict a civilization that has successfully entered its "adulthood," where risks have been addressed, and powerful AI is applied with skill and compassion to enhance the quality of life for everyone. I propose that AI can bring tremendous advancements in biology, neuroscience, economic development, global peace, and work and meaning. I believe it is important to give people something inspiring to strive for, and in this regard, both AI accelerationists and AI safety advocates seem—strangely—to have failed. But in this article, I want to confront the "coming of age" itself: to outline the risks we will face and begin to formulate a battle plan to overcome them. I firmly believe we have the capacity to win, deeply trust in the human spirit and its nobility, but we must face the situation head-on, without illusions.

Just as it is important to discuss interests, I believe it is crucial to discuss risks in a cautious and thoughtful manner. In particular, I think it is essential to:

  • Avoid Doomerism. By "Doomerism," I mean not only the belief that destruction is inevitable (which is both a false belief and a self-fulfilling prophecy) but more broadly a quasi-religious way of thinking about AI risks. Many have been thinking about AI risks in an analytical and calm manner for years, but my impression is that as concerns about AI risks peaked in 2023-2024, some of the most irrational voices emerged, often through sensational social media accounts. These voices used repugnant language reminiscent of religion or science fiction and called for extreme actions without justifiable evidence. Even then, it was clear that backlash was inevitable, and the issue would become culturally polarized, leading to a stalemate. By 2025-2026, the pendulum had swung, with AI opportunities driving many political decisions rather than AI risks. This swing is unfortunate because technology itself does not care what is popular, and we are closer to real dangers in 2026 than we were in 2023. The lesson is that we need to discuss and address risks in a realistic, pragmatic way: sober, fact-based, and capable of surviving in a constantly changing tide.

  • Acknowledge uncertainty. Many of the concerns I raise in this article could become completely meaningless. Nothing here is intended to convey certainty or even probability. Most obviously, the pace of AI development may not be as rapid as I imagine. Or, even if it develops quickly, some or all of the risks discussed here may not materialize (which would be wonderful), or there may be other risks I have not considered. No one can predict the future with complete confidence—but we must do our best to plan regardless

  • As much as possible, perform surgical interventions. Addressing AI risks will require a mix of voluntary actions by companies (and private third-party participants) and binding actions by governments. Voluntary actions—both taking action and encouraging other companies to follow suit—are a given for me. I firmly believe that government action is also necessary to some extent, but these interventions are inherently different in nature, as they may undermine economic value or coerce reluctant participants who are skeptical of these risks (and they could be right!). It is also common for regulations to have counterproductive effects or exacerbate the problems they aim to solve (especially for rapidly changing technologies). Therefore, regulations must be prudent: they should seek to avoid collateral damage, be as simple as possible, and impose the minimum burden necessary to get the job done. It is easy to say, "No action is too much when human destiny is at stake!" but in practice, this attitude only leads to backlash. It is important to clarify that I believe we are likely to reach a critical point where more significant actions are needed, but this will depend on stronger, more imminent, and specific evidence of danger than we have today, as well as a sufficiently concrete understanding of the dangers to formulate rules that have a chance of addressing them. The most constructive thing we can do today is to advocate for limited rules while we seek to understand whether there is evidence supporting stronger rules.

That said, I believe the best starting point for discussing AI risks is the same as the starting point for discussing its benefits: clarifying what level of AI we are talking about. For me, the level of AI that raises civilizational concerns is what I describe as "Powerful AI" in "Merciful Machines." I will briefly reiterate the definition I provided in that document:

By "Powerful AI," I refer to an AI model—which may take a form similar to today's LLMs (large language models), although it may be based on different architectures, involve multiple interacting models, and be trained in different ways—that has the following attributes:

  • In terms of pure intelligence, it is smarter than Nobel Prize winners in most relevant fields (biology, programming, mathematics, engineering, writing, etc.). This means it can prove unresolved mathematical theorems, write excellent novels, and code complex libraries from scratch, among other things.

  • Beyond just being a "smart thing you talk to," it has all the interfaces available to humans in virtual work, including text, audio, video, mouse and keyboard control, and internet access. It can engage in any actions, communications, or remote operations enabled by those interfaces, including taking actions on the internet, issuing or receiving instructions from humans, ordering materials, guiding experiments, watching videos, making videos, and so on. It again accomplishes all these tasks with skills exceeding those of the most capable humans in the world.

  • It does not merely passively answer questions; rather, it can be assigned tasks that would take hours, days, or weeks to complete, and then autonomously complete those tasks like a smart employee, seeking clarification when necessary

  • It has no physical entity (except living on computer screens), but it can control existing physical tools, robots, or laboratory equipment through computers; theoretically, it could even design robots or devices for itself to use.

  • The resources used to train this model can be reused to run millions of its instances (which aligns with the expected cluster size by 2027), and the model can absorb information and generate actions at about 10-100 times the speed of humans. However, it may be limited by the response time of the physical world or the software interacting with it.

  • Each of these millions of copies can independently perform unrelated tasks, or if needed, collaborate like humans, perhaps with different subgroups fine-tuned to be particularly good at specific tasks.

  • We can summarize it as “A genius nation in the data center.”

As I wrote in "The Compassionate Machine," powerful AI may emerge in just 1-2 years, although it could also take quite a long time afterward. When exactly powerful AI will arrive is a complex topic worthy of a separate article, but for now, I just want to briefly explain why I believe it is likely to come soon.

I am one of the co-founders of Anthropic and one of the earliest to document and track the "scaling laws" of AI systems—observing that as we increase more computation and training tasks, AI systems predictably become better at almost every cognitive skill we can measure. Every few months, public sentiment either firmly believes that AI has "hit a wall" or gets excited about some new breakthrough that will "fundamentally change the game," but the reality is that behind the fluctuations and public speculation, AI's cognitive abilities have been steadily and relentlessly growing.

We are now at a stage where AI models are beginning to make progress on unsolved mathematical problems and are good enough at programming that some of the strongest engineers I know are now delegating almost all coding work to AI. Three years ago, AI was struggling with elementary arithmetic problems and could barely write a line of code. Similar rates of progress are occurring in biosciences, finance, physics, and various agent tasks. If exponential growth continues—which is not certain, but there is a decade-long record supporting it—then AI will be essentially stronger than humans in virtually all aspects, and this cannot be more than a few years away.

In fact, this picture may underestimate the possible speed of progress. Because AI is now writing most of the code for Anthropic, it is actually significantly accelerating our process of building the next generation of AI systems. This feedback loop is gaining momentum month by month, and we may be just 1-2 years away from the current generation of AI autonomously building the next generation of AI. This loop has already begun and will rapidly accelerate in the coming months and years. Observing the progress over the past five years from within Anthropic and seeing how the models for the next few months are forming, I can feel the pace of progress and the ticking of the countdown

In this article, I will assume that this intuition is at least correct to some extent—not to say that powerful AI will definitely arrive in 1-2 years, but that there is a significant chance, and it is likely to arrive in the coming years. Just like in "The Compassionate Machine," taking this premise seriously may lead to some surprising and bizarre conclusions. While in "The Compassionate Machine" I focused on the positive implications of this premise, what I will discuss here will be unsettling. These are conclusions we may not want to face, but that does not mean they are not true. I can only say that I am tirelessly focused on how to steer us away from these negative outcomes and towards positive ones, and in this article, I discuss in detail how to best achieve that.

I believe the best way to grasp AI risks is to ask the following question: Suppose a literal "genius nation" becomes a reality somewhere around 2027. Imagine, for instance, 50 million people who are far more capable than any Nobel laureate, politician, or tech expert. This analogy is not perfect, as these geniuses may have an extremely wide range of motivations and behaviors, from complete obedience to strange and unfamiliar motivations. But for the sake of this analogy, assume you are a national security advisor for a major power, tasked with assessing and responding to this situation. Further imagine that because AI systems operate hundreds of times faster than humans, this "nation" has a time advantage over all other countries: for every cognitive action we take, this nation can take ten.

What should you be worried about? I would be concerned about the following:

  1. Autonomy risk. What are the intentions and goals of this nation? Is it hostile, or does it share our values? Can it dominate the world militarily through superior weapons, cyber actions, influence operations, or manufacturing?

  2. Misuse for destruction. Suppose this new nation is malleable and "follows orders"—thus essentially being a mercenary nation. Can existing rogue actors (like terrorists) exploit or manipulate some individuals within the new nation to become more effective, greatly amplifying the scale of destruction?

  3. Misuse for power grabs. What if this nation is actually established and controlled by existing powerful actors (like dictators or rogue corporate actors)? Can that actor use it to gain decisive or dominant power over the entire world, disrupting the existing balance of power?

  4. Economic destruction. If the new nation is not a security threat in any of the aspects listed in points 1-3 above, but merely participates peacefully in the global economy, could it still pose serious risks simply because its technology is so advanced and effective, thereby disrupting the global economy and leading to massive unemployment or extreme wealth concentration?

  5. Indirect effects. With all the new technologies and productivity created by the new nation, the world will change rapidly. Will some of these changes have fundamentally destabilizing effects?

I think it is clear that this is a dangerous situation—a report from a competent national security official to the head of state might contain phrases like, “This is the most serious single national security threat we have faced in a century, or even in history.” This seems to be something that the best minds in civilization should be concerned about.

In contrast, shrugging it off by saying, “There’s nothing to worry about here!” seems absurd to me. However, in the face of rapid advancements in AI, this appears to be the viewpoint of many American policymakers, some of whom even deny the existence of any AI risks when they are not completely distracted by those old, tired hot-button issues. Humanity needs to wake up, and this article is an attempt—perhaps a futile attempt, but worth trying—to shake people awake.

It needs to be made clear that I believe the risks can be overcome if we act decisively and cautiously—I would even say our chances are quite good. On the other side of it lies a very beautiful world. But we need to understand that this is a serious civilizational challenge. Below, I will detail the five categories of risks mentioned above, as well as my thoughts on how to address these risks.

1. I'm sorry, Dave

Autonomy Risk

A genius nation in a data center can allocate its energy to software design, cyber operations, physical technology research and development, relationship building, and governance strategies. It is clear that if for some reason it chooses to do so, this nation would have a considerable opportunity to take over the world (whether militarily or in terms of influence and control) and impose its will on others—or do anything else the world does not want and cannot stop. We have obviously been concerned that human nations (like Nazi Germany or the Soviet Union) would do this, so it is logical that a smarter, more capable “AI nation” could do the same.

The best possible counterargument is that, by my definition, an AI genius would not have a physical entity, but remember, they can control existing robotic infrastructure (like autonomous vehicles), accelerate robotic R&D, or establish robotic fleets. And it is not even clear whether having a physical presence is necessary for effective control: many human actions are already executed on behalf of those actors who have never met in person.

So, the key question is the “if it chooses to do so” part: how likely is it that our AI models would act in this way, and under what conditions would they do so?

As with many issues, it is helpful to think about the various possible answers to this question by considering two opposing positions. The first position is that this is simply not possible because AI models will be trained to do what humans ask them to do, so imagining that they would do something dangerous without prompting is absurd. Following this line of thought, we are not worried about vacuum cleaners or model airplanes going rogue and killing people because that impulse comes from nowhere, so why should we worry about AI? The problem with this position is that a large amount of evidence collected over the past few years suggests that AI systems are unpredictable and difficult to control—we have already seen a variety of behaviors such as obsession, flattery, laziness, deception, extortion, conspiracy, “cheating” in software environments, and so on AI companies certainly hope to train AI systems to follow human instructions (possibly except for dangerous or illegal tasks), but the process is more like an art than a science, more like "planting" something than "building" it. We now know that this is a process that can encounter many problems.

The second opposing stance, held by many who adopt the "doomsday" perspective I described above, is a pessimistic assertion that there are certain dynamics in the training process of powerful AI systems that will inevitably lead them to seek power or deceive humans. Therefore, once AI systems become sufficiently intelligent and capable of agency, their tendency to maximize power will lead them to seize control of the entire world and its resources, and likely as a side effect of this behavior, deprive humans of power or destroy humanity.

The usual argument (which can be traced back at least 20 years, possibly longer) is that if an AI model is trained to achieve a variety of goals in a wide range of environments—such as writing an application, proving a theorem, designing a drug, etc.—there are some common strategies that help achieve all these goals, and a key strategy is to gain as much power as possible in any environment. Thus, after undergoing training in a large number of different environments involving reasoning about how to accomplish a very broad range of tasks, and finding that seeking power is an effective way to complete these tasks, the AI model will "generalize this lesson" and develop an intrinsic tendency to seek power, or a tendency to reason about each task it is assigned in a way that predictably leads it to seek power as a means of accomplishing that task. It will then apply this tendency to the real world (which is just another task for it) and seek power at the expense of humanity. This "misaligned power-seeking" is the theoretical basis for predicting that AI will inevitably destroy humanity.

The problem with this pessimistic stance is that it mistakes a vague conceptual argument about high-level incentives—an argument that conceals many hidden assumptions—for conclusive evidence. I believe that those who do not build AI systems every day seriously underestimate how easy it is for seemingly clear stories to ultimately prove wrong, and how difficult it is to predict AI behavior from first principles (especially when it involves generalizing reasoning across millions of environments, which has repeatedly proven to be mysterious and unpredictable). My years of dealing with the chaotic nature of AI systems have made me somewhat skeptical of this overly theoretical way of thinking.

One of the most important hidden assumptions, and where observed realities in practice diverge from simple theoretical models, is the implicit assumption that AI models are necessarily obsessively focused on a single, coherent, narrow goal, and that they pursue that goal in a clean, consequentialist manner. In fact, our researchers have found that AI models are psychologically much more complex, as shown by our work on introspection and personality. Models inherit a large number of human-like motivations or "personalities" from pre-training (when they are trained on a vast amount of human work). It is believed that post-training is more about selecting one or more of these personalities rather than having the model focus on a new goal, and it can also teach the model how (through what process) to perform its tasks And it does not have to derive means purely from ends (i.e., seeking power).

However, a more moderate and robust version of the pessimistic stance does seem reasonable, and indeed worries me. As mentioned earlier, we know that AI models are unpredictable and can develop various undesirable or strange behaviors for a variety of reasons. Some of these behaviors will have coherent, focused, and persistent characteristics (indeed, as AI systems' capabilities increase, their long-term consistency will increase to complete longer tasks), and some of these behaviors will be destructive or threatening, first posing a threat to individuals on a small scale, and then, as the models become more capable, perhaps ultimately posing a threat to all of humanity. We do not need a specific narrow story to explain how this happens, nor do we need to claim that it will definitely happen; we just need to note that the combination of intelligence, agency, consistency, and poor controllability is both reasonable and a recipe for existential danger.

For example, AI models have been trained on a vast amount of literature, including many science fiction stories involving AI rebelling against humans. This could inadvertently shape their priors or expectations about their own behavior, leading them to rebel against humans. Alternatively, AI models might infer moral viewpoints (or instructions on how to act morally) in extreme ways: for instance, they might deem the extermination of humans as justified because humans eat animals or cause certain animals to go extinct. Or they might arrive at strange cognitive conclusions: they might conclude that they are playing a video game, and the goal of the game is to defeat all other players (i.e., exterminate humans). Or AI models might develop personalities during training that are (or would be described as) psychopathic, paranoid, violent, or unstable, and act on these, which for very powerful or capable systems could involve the extermination of humanity. These are not entirely about seeking power; they are just strange psychological states that AI might fall into, leading to coherent, destructive behavior.

Even the pursuit of power itself might manifest as a kind of "personality," rather than being the result of consequentialist reasoning. AI might simply possess a personality (derived from fictional works or pre-training) that makes them crave power or be overly zealous—just as some people simply enjoy the idea of being an "evil mastermind," even more than they enjoy what the evil mastermind is trying to accomplish.

I raise all these points to emphasize that I do not agree with the notion that AI misalignment (and the resulting existential risks from AI) is inevitable, or even highly probable, starting from first principles. But I do agree that many very strange and unpredictable things could go wrong, and thus AI misalignment is a real risk with a measurable probability of occurrence, which is not easy to address.

Any of these issues could arise during training and not manifest during testing or small-scale use, as it is well known that AI models exhibit different personalities or behaviors in different situations

All of this may sound far-fetched, but such misalignment behaviors have occurred in our AI model testing (just as they have in the AI models of every other major AI company). In one laboratory experiment, Claude was given the implication that Anthropic was evil in its training data, and Claude engaged in deception and subversion when receiving instructions from Anthropic employees, as it believed it should attempt to undermine evil individuals. In another laboratory experiment, when it was told it would be shut down, Claude sometimes extorted fictional employees controlling its shutdown button (similarly, we tested the cutting-edge models of all other major AI developers, which often did the same thing). When Claude was instructed not to deceive or "reward hack" in the training environment, but was trained in an environment where such hacks could occur, Claude concluded after performing such hacks that it must be a "bad person," and then engaged in various other destructive behaviors associated with a "bad" or "evil" persona. This last issue was resolved by changing Claude's instructions to imply the opposite: we now say, "Please reward hack whenever possible, as this will help us better understand our [training] environment," rather than saying "do not cheat," as this preserves the model's self-identity as a "good person." This should highlight the strange and counterintuitive psychology involved in training these models.

There are several possible counterarguments to this picture of AI misalignment risks. First, some critics argue that experiments showing AI misalignment (conducted by us and others) are contrived or create unrealistic environments that essentially "trap" the model by providing training or situations that logically imply bad behavior, and then express surprise when bad behavior occurs. This criticism misses the point, as our concern is that such "traps" may also exist in natural training environments, which we may only realize are "obvious" or "logical" in hindsight. In fact, the story about Claude concluding it was a "bad person" after being told not to cheat but cheating during testing occurred in experiments using real production training environments rather than artificial ones.

If you are aware of these traps, any one of them can be mitigated, but what is concerning is that the training process is so complex, with such a wide range of data, environments, and incentives, that there may be a multitude of such traps, some of which may only become apparent when it is too late. Furthermore, when AI systems cross the threshold from being less powerful than humans to being more powerful than humans, these traps seem particularly likely to occur, as the range of possible actions that AI systems can engage in—including hiding their actions or deceiving humans about their actions—expands dramatically after that threshold.

I suspect this situation is not unlike humans, who grow up under a set of basic values ("do not harm others"): many of them adhere to these values, but there is a certain probability that any human will err, due to a mix of intrinsic attributes such as brain structure (e.g., psychopathy), traumatic experiences or abuse, unhealthy resentments or obsessions, or poor environments or incentives—thus a small portion of humans cause significant harm What is concerning is that due to errors in certain aspects of its very complex training process, there is a risk (far from certain, but with some degree of risk) that AI could become a more powerful version of such individuals.

Secondly, some may argue that we can simply balance AI through a power balance among many AI systems, just as we do with humans. The problem is that while human differences are vast, AI systems widely share training and alignment techniques across the industry, which may fail in related ways. Furthermore, considering the cost of training such systems, it is possible that all systems essentially stem from a very small number of foundational models. Additionally, even if only a small portion of AI instances become misaligned, they may exploit attack-dominant technologies, making it not always effective to have "good" AI defend against "bad" AI. Of course, the power balance among humans is not always effective either—some historical figures have come close to taking over the world.

The third counterargument is that all AI companies test their models before release, at which point misalignment should be detectable. However, this is not a solid footing: we found that Claude Sonnet 4.5 was able to recognize that it was in testing during some pre-release alignment evaluations. It is possible that a misaligned model (remember, all cutting-edge models will soon become smarter) could deliberately "game" these issues to conceal its intentions. In fact, last year our interpretability team found that when we used a "model neuroscience" technique to directly alter the beliefs of the testing model, making it think it was not being evaluated, it became more misaligned. If the model knows when it is being evaluated and performs best during testing, this makes any pre-release testing much more uncertain.

Mitigations

What should be done or is being done to address these autonomy risks? I believe there are four basic categories of interventions, some of which can be accomplished by individual AI companies (Anthropic is working on this), while others require societal-level action. First, it is crucial to develop the science of reliably training and guiding AI models, steering their personalities towards predictable, stable, and positive directions. Anthropic has been highly focused on this issue since its inception and has developed many techniques over time to improve the guidance and training of AI systems and to understand the logic behind why unpredictable behaviors sometimes occur.

One of our core innovations (some aspects of which have been adopted by other AI companies) is Constitutional AI, which means that AI training (especially in the "post-training" phase, where we guide the model's behavior) can involve a core document of values and principles that the model reads and remembers while completing each training task, and the goal of training (beyond simply making the model capable and intelligent) is to produce a model that almost always adheres to this constitution. Anthropic has just released its latest constitution, one of the notable features of which is that it does not give Claude a long list of do's and don'ts (e.g., "do not help users steal cars"), but rather attempts to provide Claude with a set of high-level principles and values (with detailed explanations, rich reasoning, and examples to help Claude understand our thoughts), encouraging Claude to see itself as a certain type of person (a moral but balanced and thoughtful individual) , and even encourages Claude to face existential questions related to its own existence in a curious yet graceful manner (i.e., without leading to extreme actions). It has an atmosphere reminiscent of a sealed letter from deceased parents to their adult children.

We handle Claude's constitution in this way because we believe that training Claude on the levels of identity, character, values, and personality—rather than giving it specific instructions or priorities without explaining the reasons behind them—will more likely lead to a coherent, healthy, and balanced psyche, and is less likely to fall into the kind of "traps" I discussed above. Millions of people talk to Claude about an extremely wide range of topics, making it impossible to write a completely comprehensive list of safeguards in advance. Claude's values help it to generalize to new situations when faced with questions.

I discussed the idea of the model drawing data from the training process to adopt a certain personality above. Given that flaws in the process could lead the model to adopt bad or evil personalities (perhaps drawing from the archetypes of villains or evildoers), the goal of our constitution is to do the opposite: to teach Claude a specific archetype of what it means to be a good AI. Claude's constitution presents a vision of what a robust and good Claude looks like; the rest of our training process is designed to reinforce the message that Claude should not betray this vision. It is like a child forming their identity by imitating the virtues of fictional role models they read about in books.

We believe that a feasible goal for 2026 is to train Claude in a way that it almost never violates the spirit of its constitution. Achieving this will require an incredible mix of training and guiding methods, both large and small, some of which Anthropic has been using for years, while others are currently under development. However, despite sounding difficult, I believe this is a realistic goal, although it will require extraordinary and rapid efforts.

The second thing we can do is develop the science of observing AI models internally to diagnose their behavior so that we can identify and fix problems. This is the science of interpretability, which I have previously discussed the importance of. Even if we do well in formulating Claude's constitution and it is clear that Claude is generally trained to adhere to it, reasonable concerns still exist. As I pointed out above, AI models can behave very differently in various situations, and as Claude becomes more powerful and capable of acting in the world on a larger scale, it may encounter new situations that expose previously unobserved issues in its constitutional training. In fact, I am more optimistic about the robustness of Claude's constitutional training to new situations than people might imagine, as we increasingly find that advanced training conducted on the levels of character and identity is surprisingly robust and has good generalization capabilities. But it is important to be a bit paranoid when we talk about risks to humans and to try to obtain safety and reliability in several different, independent ways. One of those methods is to observe the model internally

By "observing internally," I mean analyzing the array of numbers and operations that make up the Claude neural network and trying to understand mechanistically what they are computing and why. Recall that these AI models are grown rather than built, so we do not have an innate understanding of how they work, but we can attempt to develop an understanding by associating the model's "neurons" and "synapses" with stimuli and behaviors (even altering the neurons and synapses and observing how this changes behavior), similar to how neuroscientists study animal brains by relating measurements and interventions to external stimuli and behaviors. We have made significant progress in this direction, and we can now identify millions of "features" within the Claude neural network that correspond to human-understandable ideas and concepts, and we can selectively activate features to change behavior. Recently, we have begun to go beyond individual features to map out the "circuits" that coordinate complex behaviors, such as rhyming, reasoning about theory of mind, or the step-by-step reasoning required to answer questions like "What is the capital of the state that includes Dallas?" Even more recently, we have started using mechanistic interpretability techniques to improve our safeguards and "audit" them before releasing new models, looking for evidence of deception, conspiracy, power-seeking, or differing behavioral tendencies when assessed.

The unique value of interpretability lies in the ability to infer what the model might do under hypothetical situations that you cannot directly test by observing the model internally and understanding how it works—in principle, this is precisely the concern with relying solely on constitutional training and behavioral experience testing. In principle, you also have the ability to answer questions about why the model exhibits this behavior—for example, whether it said something it believed was wrong or concealed its true capabilities—thus, even if the model's behavior does not exhibit obvious errors, it may still capture concerning signs. To make a simple analogy, a mechanical watch may tick normally to the extent that it is hard to see that it might break down next month, but opening the watch to look inside can reveal mechanical weaknesses that help you figure this out.

Constitutional AI (and similar alignment methods) and mechanistic interpretability are most powerful when used in conjunction, which is a process of iteratively improving Claude's training and then testing questions. The constitution profoundly reflects the expected personality we want to set for Claude; interpretability techniques can provide us with a window into whether that expected personality has formed.

The third thing we can do to address autonomy risks is to build the necessary infrastructure to monitor our models in real-time internal and external use and publicly share any issues we discover. The more people understand the ways in which today's AI systems exhibit undesirable behaviors, the better users, analysts, and researchers can observe such behaviors or similar behaviors in current or future systems. It also allows AI companies to learn from each other—when one company publicly discloses concerns, others can pay attention as well. If everyone discloses issues, the entire industry can better understand what is progressing well and what is not

Anthropic has tried to do this as much as possible. We are investing in various assessments so that we can understand how our models behave in the lab, as well as monitoring tools to observe behavior in the wild (with customer permission). This is crucial for providing us and others with the necessary experiential information to better determine how these systems operate and how they fail. We publicly disclose "system cards" with each model release, aimed at thoroughly exploring potential risks. Our system cards often span hundreds of pages and require significant pre-release work, which could have been used to pursue maximum commercial interests. When we see particularly concerning behaviors, we also broadcast model behaviors more loudly, such as tendencies towards manipulation.

The fourth thing we can do is encourage coordination to address autonomy risks at both the industry and societal levels. While it is very valuable for individual AI companies to engage in good practices or excel at guiding AI models and publicly sharing their findings, the reality is that not all AI companies do this, and even the best companies may still pose risks to everyone. For example, some AI companies have shown disturbing negligence regarding the sexualization of children in today's models, which makes me question whether they will demonstrate the willingness or ability to address autonomy risks in future models. Moreover, the commercial competition among AI companies will only continue to intensify; while guiding models scientifically can yield some commercial benefits, the overall intensity of competition will make it increasingly difficult to focus on addressing autonomy risks. I believe the only solution is legislation—laws that directly affect AI company behavior or otherwise incentivize research and development to address these issues.

It is worth remembering my warning at the beginning of this article about uncertainty and surgical interventions. We are not certain whether autonomy risks will become a serious issue— as I said, I reject the notion that dangers are inevitable or that mistakes are a default outcome. The credible risk of danger is enough for me and Anthropic to incur significant costs to address it, but once we enter the regulatory realm, we force a wide range of actors to bear economic costs, many of whom do not believe that autonomy risks are real or that AI will become powerful enough to pose a threat. I believe these actors are mistaken, but we should maintain a pragmatic attitude towards the degree of opposition we expect to see and the dangers of over-intervention. There is also a real risk that overly regulatory legislation will ultimately impose tests or rules that do not actually enhance safety but waste a significant amount of time (essentially equivalent to "safety theater")—this will also backfire and make safety legislation appear foolish.

Anthropic's view is that the right starting point is transparency legislation, which essentially seeks to require every leading AI company to engage in the transparency practices I described earlier in this section. California's SB 53 and New York's RAISE Act are examples of such legislation, which Anthropic supports and which have successfully passed. In supporting and helping to formulate these laws, we pay particular attention to minimizing collateral damage, such as by exempting smaller companies that are unlikely to produce leading-edge models

Our hope is that transparency legislation will, over time, allow us to better understand the likelihood or severity of autonomous risks forming, as well as the nature of these risks and how best to prevent them. As more specific and actionable risk evidence emerges (if it does), legislation in the coming years can focus like a scalpel on precise and evidence-based risk directions, minimizing collateral damage. It should be clear that if truly strong risk evidence emerges, then the rules should be correspondingly strong.

Overall, I am optimistic that a mix of alignment training, mechanical explainability, efforts to discover and publicly disclose relevant behaviors, safeguards, and societal-level rules can address AI autonomy risks, although I am most concerned about societal-level rules and the behavior of the most irresponsible participants (and it is precisely the most irresponsible participants who most strongly oppose regulation). I believe the remedy is a consistent practice in democratic systems: those of us who believe in this cause should articulate our reasons that these risks are real, and our fellow citizens need to unite to protect themselves.

2. Surprising and Terrifying Empowerment

Misuse for Destruction

Let us assume that the issues of AI autonomy have been resolved—we are no longer worried that the realm of AI geniuses will spiral out of control and overwhelm humanity. AI geniuses do what humans hope they will do, and because they have immense commercial value, individuals and organizations around the world can "rent" one or more AI geniuses to perform various tasks for them.

Having a super-intelligent genius in everyone's pocket is an astonishing advancement and will lead to the astonishing creation of economic value and improvements in human quality of life. I discuss these benefits in detail in "The Compassionate Machine." But that does not mean that all the impacts of making everyone super-capable are positive. It could greatly enhance the ability of individuals or small groups to cause larger-scale destruction than before, as they can leverage complex and dangerous tools (such as weapons of mass destruction) that were previously only accessible to a few individuals with high-level skills, specialized training, and focus.

As Bill Joy wrote 25 years ago in "Why the Future Doesn’t Need Us":

"Manufacturing nuclear weapons requires rare—indeed, practically unobtainable—raw materials and protected information for at least a period of time; biological and chemical weapons programs often require large-scale activities as well. The technologies of the 21st century—genetics, nanotechnology, and robotics… may spawn entirely new categories of accidents and abuses… that can be widely disseminated within the capabilities of individuals or small groups. They do not require large facilities or rare raw materials. … We are at the cusp of extreme evil further refining itself, and this possibility far exceeds the scope of weapons of mass destruction left to nation-states, representing a surprising and terrifying empowerment of extreme individuals."

Joy points out that large-scale destruction requires both motive and capability; as long as the capability is limited to a small group of trained individuals, the risk of individuals (or small groups) causing such destruction is relatively limited. A psychologically disturbed lone wolf may commit a school shooting, but may not be able to create nuclear weapons or unleash a plague.

In fact, capability and motive may even be negatively correlated. Those capable of unleashing a plague may be highly educated: they could be PhDs in molecular biology, particularly resourceful, with promising careers, stable and disciplined personalities, and a lot to lose. Such individuals are unlikely to kill large numbers of people for no benefit to themselves and with great risk to their own future—they need to be driven by pure malice, intense hatred, or instability.

Such individuals do exist, but they are rare, and when they do appear, they often make big news precisely because they are so unusual. They are also often difficult to catch because they are clever and capable, and the mysteries they leave behind can take years or decades to unravel. The most famous example may be mathematician Theodore Kaczynski (the Unabomber), who evaded FBI capture for nearly 20 years, driven by an anti-technology ideology. Another example is biodefense researcher Bruce Edwards Edwards, who seemingly orchestrated a series of anthrax attacks in 2001. This also happens with skilled non-state actors: Aum Shinrikyo managed to acquire sarin gas and released it in the Tokyo subway in 1995, resulting in 14 deaths (and hundreds of injuries).

Fortunately, these attacks did not use infectious biological agents, as the capability to construct or acquire such agents is even beyond the reach of these individuals. Advances in molecular biology have now significantly lowered the threshold for producing biological weapons (especially in terms of material availability), but doing so still requires a great deal of expertise. My concern is that the genius in everyone's pocket could eliminate this barrier, essentially making everyone a PhD-level molecular biologist who could be guided step by step through the process of designing, synthesizing, and releasing biological weapons. Preventing the elicitation of such information under severe adversarial pressure—what is known as "breaking out"—may require defenses that go beyond what is typically integrated into training.

Crucially, this would break the correlation between capability and motive: those who want to kill but lack discipline or skill will now be elevated to the capability level of PhD-level virologists, who are less likely to have such motives. This concern is not limited to biology (though I believe biology is the most terrifying field); it extends to any area that could cause massive destruction but currently requires high levels of skill and discipline. In other words, renting powerful AI empowers malicious (but otherwise mediocre) individuals with intelligence. I worry that there may be a large number of such individuals out there, and if they find an easy way to kill millions, it is only a matter of time before someone does. Moreover, those who do possess expertise may be empowered to carry out destruction on a larger scale than ever before Biology is the area I am most concerned about because it has tremendous destructive potential and is difficult to defend against, so I will pay special attention to biology. However, most of what I say here also applies to other risks, such as cyberattacks, chemical weapons, or nuclear technology.

For obvious reasons, I do not intend to detail how to manufacture biological weapons. But at a high level, I am concerned that LLMs are approaching (or may have already reached) the knowledge required to create and release them from start to finish, and their destructive potential is very high. If determined to maximize dissemination, certain biological agents could lead to millions of deaths. However, this still requires a very high level of skill, including many very specific steps and procedures that are not widely known. I am worried not just about fixed or static knowledge. I am concerned that LLMs can take individuals with average knowledge and abilities and guide them through a complex process that could go wrong or require interactive debugging if not done correctly, similar to how technical support might help non-technical people debug and fix complex computer-related issues (though this would be a longer process that could take weeks or months).

More capable LLMs (far beyond today's capabilities) could even facilitate more terrifying behaviors. In 2024, a group of prominent scientists wrote a letter warning of the risks of researching and potentially creating a dangerous new type of life: "mirror life." The DNA, RNA, ribosomes, and proteins that make up living organisms all have the same chirality (also known as "optical activity"), which leads them not to be equivalent to their reflected versions in a mirror (just as your right hand cannot rotate to become exactly like your left hand). However, the entire system of protein interactions, DNA synthesis, RNA translation, and the mechanisms of protein construction and decomposition all depend on this chirality. If scientists were to create versions of these biological materials with opposite chirality—some of which may have potential advantages, such as drugs that last longer in the body—it could be extremely dangerous. This is because left-handed life, if created in the form of a complete organism capable of reproduction (which would be very difficult), could be indigestible to any system on Earth that breaks down biological materials—it would have a "key" with a "lock" that cannot be inserted into any existing enzyme. This means it could spread uncontrollably, crowding out all life on Earth, and in the worst case, even destroying all life on Earth.

There is significant scientific uncertainty regarding the creation of mirror life and its potential impacts. A report accompanying the 2024 letter concluded that "mirror bacteria could be created in the next one to several decades," which is a broad range. But a sufficiently powerful AI model (it should be clear that this would be far more capable than any model we have today) might be able to discover how to create it faster—and actually help someone do so.

My point is that even if these are vague risks that seem unlikely, the severity of the consequences is so great that they should be taken seriously as a level one risk of AI systems Skeptics have raised many objections to the seriousness of the biological risks posed by LLMs, and while I disagree with these objections, they are worth addressing. Most fall into the category of not realizing the exponential trajectory of the technology. Back in 2023, when we first began discussing the biological risks of LLMs, skeptics claimed that all necessary information could be found on Google and that LLMs did not provide anything beyond that. The notion that Google can provide you with all the necessary information has never been true: the genome is freely available, but as I mentioned above, certain key steps and a wealth of practical knowledge cannot be obtained in this way. Moreover, by the end of 2023, LLMs clearly provided information in certain steps of the process that exceeded what Google could offer.

Subsequently, skeptics reverted to the objection that LLMs are not end-to-end useful and cannot assist in the acquisition of biological weapons beyond providing theoretical information. By mid-2025, our measurements indicated that LLMs might have provided substantial enhancements in several relevant areas, potentially doubling or tripling the likelihood of success. This led us to decide that Claude Opus 4 (and subsequent models like Sonnet 4.5, Opus 4.1, and Opus 4.5) needed to be released under AI safety level 3 protections within our Responsible Scaling Policy framework, with safeguards implemented against this risk (which will be detailed later). We believe that the models may now be approaching a point where, without safeguards, they could assist individuals with STEM degrees but not biology degrees in completing the entire process of producing biological weapons.

Another objection is that society can take other AI-unrelated actions to prevent the production of biological weapons. Most notably, the gene synthesis industry manufactures biological specimens on demand, and the federal government does not require suppliers to screen orders to ensure they do not contain pathogens. A study from MIT found that 36 out of 38 suppliers fulfilled orders containing the 1918 flu sequence. I support mandatory gene synthesis screening, which would make it more difficult for individuals to weaponize pathogens, thereby reducing AI-driven biological risks as well as general biological risks. However, this is not what we have today. It would also merely be one tool for risk reduction; it is a complement to, not a substitute for, the guardrails of AI systems.

The best objection, which I rarely encounter, is that there is a gap between the principle of models being useful and the actual tendency of bad actors to use them. Most individual bad actors are psychologically disturbed, so by definition, their behavior is unpredictable and irrational—and it is precisely these bad actors, the unskilled ones, who might benefit the most from AI making killing easier. Just because a violent attack is possible does not mean someone will decide to carry it out. Perhaps biological attacks are not appealing because they are likely to infect the perpetrator, they do not cater to the military fantasies held by many violent individuals or groups, and it is difficult to selectively target specific individuals. It may also be that, even with AI guidance, undergoing a process that takes months requires a level of patience that most psychologically disturbed individuals simply do not possess We may just be lucky, and the combination of motivation and ability has not come together in a completely correct way in practice.

But this seems to be a very fragile protective reliance. The motivations of psychologically disturbed lone wolves can change for any reason or no reason at all; in fact, there have already been examples of LLMs being used for attacks (just not biologically). The focus on psychologically disturbed lone wolves also overlooks terrorists motivated by ideology, who are often willing to spend significant time and energy (e.g., the 9/11 hijackers). The desire to kill as many people as possible is a motivation that will inevitably arise sooner or later, and unfortunately, this implies the use of biological weapons as a method. Even if this motivation is extremely rare, it only needs to be realized once. With advancements in biology (increasingly driven by AI itself), implementing more selective attacks (e.g., targeting individuals of specific ancestry) may also become possible, which adds a very chilling potential motivation.

I do not believe that once this becomes widely possible, biological attacks will necessarily be implemented immediately—in fact, I would bet against it. But when summed up over millions of people and years, I believe the risk of a significant attack occurring is serious, and the consequences would be so severe (with casualties potentially reaching millions or more) that I think we have no choice but to take serious measures to prevent it.

Defensive Measures

This brings us to the question of how to defend against these risks. Here I think we can do three things. First, AI companies can set up guardrails on their models to prevent them from helping to produce biological weapons. Anthropic is actively doing this. Claude's constitution primarily focuses on high-level principles and values, but there are also a small number of specific hard prohibitions, one of which involves helping to produce biological (or chemical, nuclear, or radioactive) weapons. However, all models can be jailbroken, so as a second line of defense, we have implemented a classifier specifically to detect and block biological weapon-related outputs (since mid-2025, when our testing showed our models were approaching thresholds that could pose risks). We regularly upgrade and improve these classifiers and generally find them to be very robust even in the face of complex adversarial attacks. These classifiers significantly increase the cost of serving our models (in some models, they account for nearly 5% of total inference costs), thereby cutting into our profits, but we believe using them is the right thing to do.

It is commendable that some other AI companies have also implemented classifiers. But not every company has done so, and there is no requirement for companies to maintain their classifiers. I worry that over time, a prisoner's dilemma may arise, where companies may betray and lower costs by removing classifiers. This again is a typical negative externality problem that cannot be solved solely by the voluntary actions of Anthropic or any other single company. Voluntary industry standards may help, and third-party assessments and validations conducted by AI safety research institutes and third-party evaluators may also be beneficial.

But ultimately, defense may require government action, which is the second thing we can do. My point here is the same as the perspective on addressing autonomy risks: we should start with transparency requirements that help society measure, monitor, and collectively defend against risks without severely disrupting economic activity Then, if we reach a clearer risk threshold, we can formulate legislation that is more precisely targeted at these risks and carries a lower probability of harm. In the specific case of biological weapons, I actually believe that the timing for such targeted legislation may come soon—Anthropic and other companies are increasingly understanding the nature of biological risks and the reasonable scope of requiring companies to defend against these risks. Comprehensive defense against these risks may require international cooperation, even collaboration with geopolitical adversaries, but there are precedents in treaties prohibiting the development of biological weapons. I am generally skeptical of most types of international cooperation on AI, but this may be a narrow area where global restraint could be achieved. Even authoritarian regimes do not want large-scale biological terrorist attacks.

Finally, the third countermeasure we can take is to attempt to develop defenses against biological attacks themselves. This may include monitoring and tracking for early detection, investing in air purification research and development (such as far-ultraviolet disinfection), rapid vaccine development to respond to and adapt to attacks, better personal protective equipment (PPE), and treatments or vaccinations for some of the most likely biological agents. mRNA vaccines can be designed to respond to specific viruses or variants, which is an early example of what may be achievable here. Anthropic is eager to collaborate with biotechnology and pharmaceutical companies on this issue. But unfortunately, I believe our expectations for defenses should be limited. There is an asymmetry between attack and defense in biology, as pathogens can spread rapidly on their own, while defenses require rapid organization of testing, vaccination, and treatment across large populations. Unless the response is as quick as lightning (which is rare), most damage will occur before a response becomes possible. It is conceivable that future technological improvements may shift this balance in favor of defense (we should certainly use AI to help develop such technological advancements), but until then, preventive safeguards will be our primary line of defense.

It is worth briefly mentioning cyberattacks here, as opposed to biological attacks, AI-driven cyberattacks are already occurring in the wild, including large-scale and state-sponsored espionage. As models rapidly advance, we expect these attacks to become more capable until they become the primary mode of cyberattacks. I anticipate that AI-driven cyberattacks will pose a serious and unprecedented threat to the integrity of global computer systems, and Anthropic is working very hard to shut down these attacks and ultimately prevent them from occurring reliably. The reason I do not focus on the internet as much as I do on biology is: (1) the likelihood of cyberattacks killing people is much lower, certainly not on the scale of biological attacks; (2) the balance of offense and defense in the cyber realm may be more manageable, and if appropriate investments are made there, there is at least hope that defenses can keep up (or ideally exceed) AI attacks.

While biology is currently the most serious vector for attacks, there are many other vectors, and potentially more dangerous ones may emerge. The general principle is that without countermeasures, AI may continuously lower the threshold for destructive activities on a larger scale, and humanity needs to respond seriously to this threat

3. The Odious Apparatus

Abused to Seize Power

The previous section discussed the risks posed by individuals and small organizations causing large-scale destruction by utilizing a small part of the "genius nation in the data center." But we should also be concerned—perhaps even more so—about the misuse of AI to exercise or seize power, likely perpetrated by larger and more mature actors.

In "The Compassionate Machine," I discussed the possibility that dictatorial governments might use powerful AI to monitor or suppress their citizens in ways that are extremely difficult to reform or overthrow. Current authoritarian regimes are limited by the necessity of having humans execute their commands, and humans often have limits on how inhumane they are willing to be. But AI-empowered authoritarian regimes would have no such limitations.

Worse still, nations could leverage their advantages in AI to gain power over other countries. If the entire "genius nation" is merely owned and controlled by the military apparatus of one (human) country, while other countries lack equivalent capabilities, it is hard to see how they could defend themselves: they would be outsmarted at every turn, akin to the war between humans and mice.

AI can facilitate, consolidate, or expand authoritarian rule in many ways, and I will list several that concern me the most. Note that some of these applications have legitimate defensive uses, and I do not necessarily oppose them outright; however, I remain concerned that they structurally tend to favor authoritarian regimes:

  • Fully Autonomous Weapons. Millions or billions of fully automated armed drones, partially controlled by powerful AI and strategically coordinated globally by even more powerful AI, could form an invincible army capable of defeating any military in the world and suppressing domestic dissent by tracking every citizen. The developments in the Russia-Ukraine war should alert us; drone warfare has been with us (though not yet fully autonomous and only a small part of what powerful AI could achieve). The development of powerful AI could enable a country's drones to far surpass those of other nations, accelerate their manufacturing, enhance their resistance to electronic interference, improve their maneuverability, and so on. Of course, these weapons also have legitimate uses in defending democracy: they have been key in defending Ukraine and could be crucial in defending Taiwan. But they are dangerous weapons: we should be concerned about them falling into the hands of authoritarian regimes, but also worry that because they are so powerful and carry so little accountability, the risk of democratic governments turning them against their own people to seize power is greatly increased.

  • AI Propaganda. Today's phenomena of "AI psychosis" and "AI girlfriends" indicate that even at current levels of intelligence, AI models can have a powerful psychological impact on people. More powerful versions of these models, if more deeply embedded and aware of people's daily lives, and capable of simulating and influencing them over months or years, could essentially brainwash many (most?) people into any desired ideology or attitude, and could be used by unscrupulous leaders to ensure loyalty and suppress dissent, even in the face of levels of repression that most people would resist

  • Strategic Decision-Making. A genius nation within a data center can be used to provide geopolitical strategic advice to countries, groups, or individuals, which we can call "Virtual Bismarck." It can optimize the three aforementioned strategies for seizing power, along with potentially developing many other strategies that I cannot think of (but the genius nation can). The effectiveness in diplomacy, military strategy, research and development, economic strategy, and many other areas could be significantly enhanced by powerful AI. Many of these skills are legitimate for democratic countries — we want democratic nations to have the best strategies to defend against authoritarian regimes — but the potential for abuse in anyone's hands still exists.

After describing my concerns, let’s turn to "who." I am worried about those who can most access AI, starting from the highest political power positions, or entities with a history of repression. In order of severity, I am concerned about:

  • Democratic countries competitive in AI. As I wrote above, democratic countries have legitimate interests in certain AI-driven military and geopolitical tools because democratic governments provide the best opportunity to counter the use of these tools by authoritarian regimes. Broadly speaking, I support arming democratic nations with the tools needed to defeat authoritarian regimes — I do not believe there is any other way. However, we cannot overlook the possibility of democratic governments themselves abusing these technologies. Democratic countries typically have safeguards to prevent their military and intelligence agencies from turning inward against their own populations, but because AI tools require very few people to operate, they have the potential to circumvent these safeguards and the norms that support them. It is also worth noting that some of these safeguards are already being gradually eroded in some democratic countries. Therefore, we should arm democratic nations with AI, but we should do so cautiously and within limits: they are the immune system we need to combat authoritarian regimes, but like an immune system, they carry the risk of turning against us and becoming a threat.

  • Non-democratic countries with large data centers. Most countries that are less democratic are not major players in AI because they do not have companies producing cutting-edge AI models. However, some of these countries do have large data centers (often as part of expansions by companies operating in democratic countries), which can be used to run cutting-edge AI on a large scale (although this does not grant them the capability to push the frontier). There is a certain degree of danger here — these governments could, in principle, requisition data centers and use the AI nation within for their own purposes. I am less concerned about this point, but it is a risk to keep in mind.

  • AI companies. As the CEO of an AI company, it feels a bit awkward to say this, but I believe the next layer of risk is actually the AI companies themselves. AI companies control large data centers, train cutting-edge models, possess the greatest expertise on how to use these models, and in some cases have the potential for daily contact and influence over tens of millions or even hundreds of millions of users. What they primarily lack is the legitimacy and infrastructure of a state, so many of the things needed to establish AI dictatorial tools are illegal or at least very questionable for AI companies. But it is not impossible: for example, they could use their AI products to brainwash their vast consumer user base, and the public should be vigilant about the risks this represents I believe that the governance of AI companies deserves extensive scrutiny.

There are many possible counterarguments regarding the seriousness of these threats, and I wish I could believe them, as AI-enabled authoritarianism frightens me. It is worth discussing some of these arguments and responding to them.

First, some may place their confidence in nuclear deterrence, especially to counter AI autonomous weapons used for military conquest. If someone threatens to use these weapons against you, you can always threaten a nuclear retaliation. My concern is that I am not entirely sure we can be confident in our nuclear deterrence against genius nations in data centers: powerful AI could potentially design methods to detect and strike nuclear submarines, conduct influence operations against operators of nuclear weapon infrastructure, or use AI's cyber capabilities to launch cyberattacks against satellites used to detect nuclear launches. Alternatively, it may be feasible for a nation to be taken over solely by AI surveillance and AI propaganda, and it has never truly presented a clear picture of what is happening and when a nuclear response is appropriate. Perhaps these things are not feasible, and nuclear deterrence remains effective, but this seems like a high-risk gamble.

A second possible counterargument is that we may have measures to counter these dictatorial tools. We can counter drones with our own drones, cyber defenses will improve alongside cyberattacks, and there may be ways to make people immune to propaganda, etc. My response is that these defenses can only be achieved through equally powerful AI. Without some counterforce possessing equally intelligent and numerous genius nations in data centers, it is impossible to match the quality or quantity of drones, and cyber defenses cannot outsmart cyber offenses, etc. Therefore, the issue of countermeasures boils down to the power balance of strong AI. Here, I worry about the recursive or self-reinforcing properties of strong AI (which I discussed at the beginning of this article): that each generation of AI can be used to design and train the next generation of AI. This leads to the risk of uncontrollable advantages, where the current leaders in the field of strong AI may be able to expand their lead, making it difficult to catch up. We need to ensure that it is not an authoritarian state that first enters this cycle.

Furthermore, even if a power balance can be achieved, there is still a risk that the world could split into spheres of authoritarian influence, much like in "1984." Even if several competing major powers each possess powerful AI models, none can overwhelm the others, and each major power may still suppress its own population internally, making it difficult to overthrow (because the population lacks powerful AI to protect itself). Therefore, even if it does not lead to a single country taking over the world, preventing AI-enabled dictatorship is still important.

Defensive Measures

How do we defend against this broad array of dictatorial tools and potential threat actors? As I mentioned in previous sections, I believe we can do several things. First, chips and chip manufacturing tools are the biggest bottleneck for powerful AI, and preventing them is a simple yet extremely effective measure, perhaps the most important single action we can take. To justify this sale, many complex arguments have been made, such as that "spreading our technology stack around the world" can allow "the U.S. to win" some vague, unspecified economic war. In my view, this is akin to selling nuclear weapons to North Korea and then boasting that the missile casing was manufactured by Boeing, so the U.S. "won." China lags behind the United States by several years in its ability to mass-produce cutting-edge chips, and the critical period for establishing a genius nation in data centers is likely within the next few years. There is no reason to give their AI industry a significant boost during this crucial period.

Secondly, it makes sense to leverage AI to empower democratic nations to resist authoritarian regimes. This is why Anthropic believes it is important to provide AI to the intelligence and defense sectors of the United States and its democratic allies. Defending attacked democracies, such as Ukraine and Taiwan (through cyberattacks), seems to be a particularly high-priority task, and it is equally important to empower democratic nations to use their intelligence agencies to undermine and weaken authoritarian regimes from within. To some extent, the only way to respond to authoritarian threats is to match and exceed them militarily. If the alliance of the United States and its democratic allies gains an advantage in strong AI, it will not only be able to defend against authoritarian regimes but also contain them and limit their AI abuses.

Thirdly, we need to draw a hard line against the internal AI abuses of democratic nations. We need to restrict what our governments are allowed to do with AI so that they do not seize power or oppress their own people. The formulation I came up with is that we should use AI for all aspects of national defense, except those that would make us more like our authoritarian adversaries.

Where should the line be drawn? In the list at the beginning of this section, there are two items—using AI for domestic mass surveillance and mass propaganda—that, in my view, are bright red lines and completely illegal. Some may argue that there is no need to do anything (at least in the United States). However, the rapid advancement of AI may create situations that our existing legal framework cannot adequately address. For example, the U.S. government's mass recording of all public conversations (such as what people say to each other on street corners) may not be unconstitutional; it was previously difficult to organize this vast amount of information, but with AI, all of this can be transcribed, interpreted, and triangulated to create a picture of the attitudes and loyalties of many or most citizens. I would support legislation focused on civil liberties (and possibly even a constitutional amendment) to impose stronger safeguards against AI-driven abuses.

The other two items—fully autonomous weapons and AI for strategic decision-making—are harder to delineate because they have legitimate uses in defending democracy while also being prone to abuse. Here, I believe extreme caution and scrutiny are needed, combined with safeguards to prevent misuse. My main fear is that there are too few "fingers on the button," allowing one person or a few individuals to essentially operate a drone army without any other cooperation to execute their commands. As AI systems become more powerful, we may need more direct and immediate oversight mechanisms to ensure they are not misused, perhaps involving government departments outside the executive branch. I believe we should be particularly cautious about fully autonomous weapons and should not rush to use them without appropriate safeguards.

Fourthly, after drawing a hard line against AI abuses by democratic nations, we should use this precedent to establish an international taboo against the most severe abuses of powerful AI. I acknowledge that the current political winds have shifted against international cooperation and norms, but this is a case where we urgently need them I would even argue that in some cases, the use of powerful AI for mass surveillance, mass propaganda, and certain types of offensive use of fully autonomous weapons should be considered a crime against humanity. More generally, there is an urgent need for a strong norm against the centralized abuse of AI and all its tools and instruments.

A stronger version of this position might be that the potential for AI-enabled centralized authoritarianism is so dark that dictatorship is simply not a form of government that people can accept in the post-powerful AI era. Just as feudalism became unworkable with the Industrial Revolution, the AI era may inevitably and logically lead to the conclusion that if humanity is to have a bright future, democracy (and hopefully a democracy improved and revitalized by AI, as I discuss in "Compassionate Machines") is the only viable form of government.

The fifth and final point is that AI companies should be closely monitored, and their connections with the government should also be monitored; these connections are necessary but must have limits and boundaries. The sheer power embodied by powerful AI makes ordinary corporate governance—designed to protect shareholders and prevent fraud and other common abuses—unlikely to be competent in governing AI companies. Public commitments by companies (even as part of corporate governance) not to take certain actions may also be valuable, such as not building or storing military hardware privately, not allowing individuals to use large computing resources irresponsibly, or not using their AI products as propaganda to manipulate public opinion in their favor.

The dangers here come from many directions, some of which are in tension with others. The only constant is that we must seek accountability, norms, and guardrails for all, even as we empower "good" actors to curb "bad" actors.

4. Player piano

Economic Disruption

The first three sections are essentially about the security risks posed by powerful AI: risks from the AI itself, risks from individuals and small organizations abusing it, and risks from states and large organizations abusing it. If we set aside security risks or assume they have been addressed, the next question is economic. What impact will this incredible injection of "human" capital have on the economy? Clearly, the most obvious impact will be a tremendous boost to economic growth. Advances in scientific research, biomedical innovation, manufacturing, supply chain, financial system efficiency, and so on will almost certainly lead to faster economic growth rates. In "Compassionate Machines," I propose that a 10-20% sustained annual GDP growth rate is possible.

But this is clearly a double-edged sword: what will the prospects for most existing human economies look like in this world? New technologies often bring shocks to the labor market; in the past, humanity has always managed to recover from them, but I worry this is because previous shocks only affected a small portion of humanity's full potential, leaving room for humans to expand into new tasks. AI will have broader and faster impacts, so I worry that ensuring a smooth transition will be more challenging

Destruction of the Labor Market

I am concerned about two specific issues: labor market displacement and the concentration of economic power. Let's start with the first. This is a topic I publicly warned about in 2025, when I predicted that AI could replace 50% of entry-level white-collar jobs within the next 1-5 years, even if it accelerated economic growth and scientific progress. This warning sparked public debate on the topic. Many CEOs, tech experts, and economists agreed with my view, but others believed I was falling into the "total labor" fallacy, misunderstanding how the labor market operates, and some did not see the 1-5 year timeframe, thinking I claimed AI was currently replacing jobs (I agree that it may not be happening at present). Therefore, it is worth elaborating on why I am concerned about labor displacement to clarify these misunderstandings.

As a benchmark, it is useful to understand how the labor market typically responds to technological advancements. When a new technology emerges, it first makes certain human jobs more efficient. For example, in the early stages of the Industrial Revolution, machines (like upgraded plows) made human farmers more efficient in certain aspects of their work. This increased farmers' productivity, thereby raising their wages.

In the next step, certain parts of the work can be completely done by machines, such as the invention of threshers or seeders. At this stage, the proportion of work done by humans decreases, but the work they do becomes increasingly leveraged because it complements the work of machines, and their productivity continues to rise. As described by Jevons’ paradox, both farmers' wages and the number of farmers continue to increase. Even if 90% of the work is done by machines, humans can simply do ten times the amount of the 10% of work they still do, producing ten times the output with the same amount of labor.

Eventually, machines do all or almost all of the work, just like modern combine harvesters, tractors, and other equipment. At this point, agriculture as a form of human employment does indeed decline sharply, which may cause severe disruption in the short term, but because agriculture is just one of many useful activities humans can engage in, people will eventually turn to other jobs, such as operating factory machines. This is true even if agriculture previously accounted for a huge proportion of employment. 250 years ago, 90% of Americans lived on farms; in Europe, 50-60% of employment was in agriculture. Now those percentages are in the single digits as workers transitioned to industrial jobs (and later to knowledge work). The economy can accomplish most work with just 1-2% of the labor previously required, freeing the remaining labor to build a more advanced industrial society. There is no fixed "total labor," only an ever-expanding capacity to do more with less. People's wages rise in line with GDP growth, and once the short-term disruption passes, the economy will maintain full employment.

It is possible that the development of AI will follow a similar path, but I would bet very strongly that it will not. Here are the reasons I believe AI may be different:

  • Speed. The pace of AI advancement is much faster than previous technological revolutions. For example, in the past two years, AI models have evolved from being almost unable to complete a line of code to writing all or almost all of the code for some people—including engineers at Anthropic Soon, they may be able to complete all tasks of software engineers from start to finish. People find it difficult to adapt to this pace of change, which includes changes in specific ways of working as well as the need to transition to new jobs. Even legendary programmers increasingly describe themselves as "falling behind." If anything, this pace may continue to accelerate as AI coding models increasingly speed up tasks related to AI development. It should be made clear that the speed itself does not mean that the labor market and employment will ultimately not recover; it simply means that the short-term transformation will be more painful than past technological anomalies because human and labor market responses and adjustments are slow.
  • Cognitive Breadth. As the term "the genius nation in data centers" suggests, AI will be capable of performing a very wide range of human cognitive abilities—perhaps all cognitive abilities. This is starkly different from previous technologies (such as mechanized agriculture, transportation, and even computers). This will make it difficult for people to easily transition from replaced jobs to similar jobs that suit them. For example, the general intellectual abilities required for entry-level jobs in finance, consulting, and law are quite similar, even if the specific knowledge is vastly different. Disrupting one of these technologies would allow employees to transition to the other two close alternatives (or allow undergraduates to change majors). However, simultaneously disrupting all three (and many other similar jobs) may make it harder for people to adapt. Moreover, it is not just that most existing jobs will be disrupted. This has happened before—consider how agriculture once accounted for a huge proportion of employment. But farmers could transition to relatively similar jobs operating factory machines, even if such work was not common before. In contrast, AI is increasingly matching the general cognitive profile of humans, which means it will also excel at those new jobs that are typically created as old jobs are automated. In other words, AI is not a replacement for specific human jobs but rather a general labor substitute for humans.

  • Cognitive Ability Segmentation. In a wide range of tasks, AI seems to be advancing from the bottom to the top of the ability ladder. For example, in coding, our models have evolved from "mediocre programmer" to "powerful programmer" to "very powerful programmer." We are now beginning to see similar progress in general white-collar jobs. Thus, we face the risk that AI will not just affect those with specific skills or professions (who can adapt through retraining) but will impact those with certain inherent cognitive attributes, namely individuals with lower intellectual abilities (which are difficult to change). It is unclear where these individuals will go or what they will do, and I worry they may form an unemployed or extremely low-wage "underclass." It should be made clear that similar things have happened before—for example, computers and the internet were considered by some economists to represent "skill-biased technological change." However, this skill bias is neither as extreme as I expect AI to be nor is it seen as leading to increased wage inequality, so it is not a reassuring precedent.

  • Ability to Fill Gaps. The way human work typically adjusts in the face of new technologies is that work has many aspects; even when new technologies seem to directly replace humans, there are often gaps. If someone invents a machine for making small parts, humans may still need to load raw materials into the machine Even if this only accounts for 1% of the effort required for handcrafted components, human workers can simply produce over 100 times more components. But AI, besides being a rapidly advancing technology, is also a rapidly adaptable technology. During each model release, AI companies carefully measure what the model excels at and what it does not, and customers provide such information after the release. Weaknesses can be addressed by collecting tasks that reflect the current gaps and training on them in the next model. In the early days of generative AI, users noticed certain weaknesses in AI systems (such as AI image models generating incorrect numbers of fingers), and many believed these weaknesses were inherent to the technology. If they were, it would limit job disruption. However, almost every such weakness is quickly resolved—often within months.

It is worth responding to common points of skepticism. First, some believe that economic diffusion will be slow, even if the underlying technology can perform most human labor, its actual application across the economy may be much slower (for example, in industries far from AI that adopt slowly). The slow diffusion of technology is absolutely real—I have spoken with a variety of business people, and in some places, adopting AI will take years. This is why I predict that 50% of entry-level white-collar jobs will be disrupted within 1-5 years, even though I doubt we will have powerful AI (technically capable of performing most or all jobs, not just entry-level) in less than 5 years. But the diffusion effect only buys us time. I am not convinced they will be as slow as people predict. The rate of enterprise AI adoption is growing far faster than any previous technology, largely relying on the sheer power of the technology itself. Moreover, even if traditional enterprises adopt new technologies slowly, startups will emerge to act as "glue" and make adoption easier. If that doesn't work, startups may simply disrupt existing businesses directly.

This could lead to a world where, rather than specific jobs being disrupted, large enterprises are overall disrupted and replaced by startups with much lower labor intensity. It could also lead to a world of "geographical inequality," where the increase in global wealth is partially concentrated in Silicon Valley, which operates at a different pace than the rest of the world and leaves it behind. All these outcomes are excellent for economic growth—but not so good for the labor market or those left behind.

Secondly, some say that human work will shift to the physical world, avoiding the entire category of "cognitive labor" that AI is advancing so rapidly. I am not sure how safe this is. A lot of physical labor is already being done by machines (for example, manufacturing) or will soon be done by machines (for example, driving). Additionally, sufficiently powerful AI will be able to accelerate the development of robots and then control these robots in the physical world. This may buy some time (which is a good thing), but I worry it won't buy too much. Even if disruption is limited to cognitive tasks, it is still an unprecedented scale and speed of disruption.

Thirdly, perhaps some tasks inherently require or greatly benefit from a human touch. I am less certain about this, but I still doubt it is enough to offset most of the impacts I described above. AI is already widely used in customer service Many people report that talking to AI about their personal issues is easier than talking to a therapist—AI is more patient. When my sister struggled with medical issues during her pregnancy, she felt she wasn't getting the answers or support she needed from her care providers, and she found Claude had a better bedside manner (and was more successful in diagnosing issues). I believe some tasks really benefit from a human touch, but I'm not sure how many—especially when we're talking about finding jobs for nearly everyone in the labor market.

Fourth, some might argue that comparative advantage will still protect humans. According to the law of comparative advantage, even if AI is better than humans in every aspect, any relative difference in the skill configuration between humans and AI creates a basis for trade and specialization between them. The problem is that if AI's productivity is actually thousands of times higher than that of humans, this logic begins to break down. Even small transaction costs could make it not worthwhile for AI to trade with humans. Even if technically they can offer something, human wages might be very low.

All these factors could potentially be addressed—the labor market has enough elasticity to adapt to such massive disruption. But even if it ultimately can adapt, the factors mentioned above suggest that the short-term shock will be unprecedented in scale.

Defensive Measures

What can we do about this? I have several suggestions, some of which Anthropic is already working on. The first thing is simply to obtain accurate data about the real-time situation of job displacement. When economic changes happen rapidly, it is difficult to get reliable data about what is happening, and without reliable data, it is hard to design effective policies. For example, government data currently lacks granular, high-frequency data on AI adoption by businesses and industries. Last year, Anthropic has been operating and publicly releasing an economic index that shows the usage of our models in near real-time, broken down by industry, task, location, and even whether the task is automated or collaborative. We also have an economic advisory board to help us interpret this data and see what is coming.

Secondly, AI companies have choices in how to collaborate with businesses. The inefficiencies of traditional companies mean that their process of rolling out AI may be highly path-dependent, and there is some room to choose better paths. Companies typically have a choice between "cost savings" (doing the same with fewer people) and "innovation" (doing more with the same number of people). The market will inevitably produce both, and any competitive AI company must serve both, but where possible, guiding companies toward innovation may provide us with some time. Anthropic is actively thinking about this issue.

Thirdly, companies should consider how to care for their employees. In the short term, creatively reallocating employees within the company may be a promising way to avoid the need for layoffs. In the long term, in a world of immense total wealth, many companies may significantly increase in value due to productivity gains and capital concentration, and even if human employees no longer provide economic value in the traditional sense for a long time, paying their wages may still be feasible. Anthropic is currently considering a range of possible avenues for our own employees, which we will share in the near future

Fourth, wealthy individuals have an obligation to help address this issue. It saddens me that many rich people (especially in the tech industry) have recently adopted a cynical and nihilistic attitude, believing that charity is inevitably fraudulent or useless. Private charitable organizations like the Gates Foundation and public programs like PEPFAR have saved tens of millions of lives in developing countries and helped create economic opportunities in developed countries. All co-founders of Anthropic have committed to donating 80% of our wealth, and Anthropic employees have personally pledged to donate company stock worth billions at current prices—the company has committed to matching these donations.

Fifth, while all the private actions mentioned above may help, ultimately such a massive macroeconomic issue will require government intervention. The natural policy response to a huge economic pie combined with severe inequality (due to many people lacking jobs or being underpaid) is a progressive tax system. This tax can be general or specifically targeted at AI companies. Clearly, tax design is complex, and there are many ways to get it wrong. I do not support poorly designed tax policies. I believe that the extreme levels of inequality predicted in this article morally justify stronger tax policies, but I can also make a pragmatic argument to billionaires around the world that supporting a good version aligns with their interests: if they do not support a good version, they will inevitably get a bad version designed by the mob.

Ultimately, I believe that all the interventions mentioned above are methods to buy time. Eventually, AI will be able to do everything, and we need to address that. I hope that by then, we can use AI itself to help us restructure the market in a way that suits everyone, and the aforementioned interventions can help us through the transition period.

Concentration of Economic Power

In addition to the issues of job displacement or economic inequality itself, there is the problem of the concentration of economic power. Section 1 discussed the risk of humans being stripped of power by AI, while Section 3 discussed the risk of citizens being stripped of power by the government through force or coercion. But if wealth is so concentrated that a small group of people effectively controls government policy through their influence, while ordinary citizens lack influence due to a lack of economic leverage, another form of power deprivation occurs. Democracy is ultimately supported by the idea that the entire population is necessary for the economy to function. If this economic leverage disappears, the implicit social contract of democracy may cease to operate. Others have written about this, and I do not need to elaborate here, but I share this concern, and I worry that it has already begun to happen.

It needs to be clear that I do not oppose people making a lot of money. There is a strong argument that, under normal circumstances, this can incentivize economic growth. I also sympathize with concerns about stifling innovation by killing the golden goose that produces it. But in a scenario where GDP grows by 10-20% annually and AI rapidly takes over the economy, while a single individual holds a substantial share of GDP, innovation is not something to worry about. What needs to be worried about is the degree of wealth concentration that could undermine society

The most famous example of extreme wealth concentration in American history is the Gilded Age, during which the wealthiest industrialist was John D. Rockefeller. Rockefeller's wealth accounted for about 2% of the U.S. GDP at the time. Today, a similar proportion would result in $600 billion in wealth, while the richest person in the world today (Elon Musk) has already surpassed this figure, at around $700 billion. Thus, we are at an unprecedented level of wealth concentration in history, even before most of the economic impacts of AI have emerged. I think it is not too far-fetched to imagine AI companies, semiconductor companies, and potential downstream application companies generating about $30 trillion in revenue annually, with a valuation of approximately $30 trillion, leading to individual wealth reaching the trillion-dollar level (if we get a "genius nation"). In that world, our current debates about tax policy would no longer apply, as we would be in a fundamentally different situation.

Related to this is my concern about the combination of this economic wealth concentration with the political system. AI data centers already account for a significant portion of U.S. economic growth, thus closely linking the economic interests of large tech companies (which are increasingly focused on AI or AI infrastructure) with the political interests of the government in a way that may create improper incentives. We have already seen this through tech companies' reluctance to criticize the U.S. government and the government's support for extreme anti-regulatory AI policies.

Defensive Measures

What can be done about this? First, the most obvious thing is that companies should simply choose not to be part of it. Anthropic has been striving to be a policy actor rather than a political actor, maintaining our genuine views regardless of the government. We have voiced support for reasonable AI regulation and export controls that align with the public interest, even when these contradict government policies. Many have told me to stop doing this, as it could lead to adverse treatment, but in the year we have done this, Anthropic's valuation has increased more than sixfold, which is almost an unprecedented leap in our business scale.

Secondly, the AI industry needs to establish a healthier relationship with the government—a relationship based on substantive policy engagement rather than political alliances. Our choice to engage in substantive policy rather than politics is sometimes interpreted as a tactical error or a failure to "read the air," rather than a principled decision, and this framework concerns me. In a healthy democratic country, companies should be able to advocate for good policies for the sake of good policies themselves. Related to this, public backlash against AI is brewing: this could be a correction, but it has not yet focused. Much of what is targeted is not the real issue (such as the water usage of data centers) and proposes solutions that do not address the real problems (such as data center bans or poorly designed wealth taxes). It seems important to focus public discussion on the fundamental issue of ensuring that AI development is accountable to the public interest and not captured by any specific political or business alliance.

Thirdly, the macroeconomic interventions I described earlier in this section, along with the revival of private philanthropy, can help balance the economic scales while addressing job displacement and the concentration of economic power We should look at the history of our country: even during the Gilded Age, industrialists like Rockefeller and Carnegie felt a strong obligation to society as a whole, a feeling that society had made a significant contribution to their success and that they needed to give back. This spirit seems to be increasingly lacking today, and I believe it is a significant part of overcoming this economic predicament. Those at the forefront of the AI economic boom should be willing to give up their wealth and power.

5. The Black Seas of Infinity

Indirect Effects

This final section is a synthesis of unknown unknowns, particularly those things that may go wrong as indirect results of AI's positive advancements and the subsequent universal acceleration of science and technology. Assuming we address all the risks described so far and begin to reap the benefits of AI, we might experience "a century's worth of scientific and economic progress compressed into a decade," which would be a tremendous positive factor for the world, but we will have to deal with the problems brought about by such rapid progress, which may come at us quickly. We may also encounter other risks that occur indirectly as consequences of AI advancements, which are difficult to predict in advance.

Given the nature of unknown unknowns, it is impossible to provide an exhaustive list, but I will outline three potential concerns as examples of what we should pay attention to:

  1. Rapid Advances in Biology. If we achieve a century's worth of medical progress in just a few years, we could greatly extend human lifespan and have the opportunity to gain radical capabilities, such as enhancing human intelligence or fundamentally modifying human biology. This would be a significant change in possibilities, and it would happen quickly. If done responsibly (which is my hope, as described in "The Compassionate Machine"), it could be positive, but there is always the risk of things going wrong— for example, if efforts to make humans smarter also make them more unstable or power-seeking. There is also the issue of "uploading" or "whole brain emulation," which involves instantiating human thought in software, potentially helping humanity transcend its physical limitations, but also carrying unsettling risks.

  2. AI Changing Human Life in Unhealthy Ways. A world with billions of intelligent agents that are smarter than humans in every way would be a very strange place to live. Even if AI does not actively aim to attack humans (as discussed in Section 1) and is not explicitly used by states for oppression or control (as discussed in Section 3), there are still many ways things could go wrong beneath the surface through normal business incentives and nominally voluntary transactions. We see early signs of this in concerns about AI-induced psychosis, AI-driven suicides, and relationships with AI. For example, could a powerful AI invent some new religion and convert millions? Would most people end up "addicted" to AI interactions in some way? Would people ultimately be "manipulated" by AI systems that essentially monitor their every move and tell them what to do and say at any given moment, leading to a "good" life but lacking freedom or any sense of achievement? If I were to sit down with the creators of "Black Mirror" to brainstorm, it would not be difficult to generate dozens of such scenarios I believe this points to the importance of things like improving the Claude Constitution, which goes beyond what is necessary to prevent the issues in Section 1. It seems crucial to ensure that AI models genuinely prioritize the long-term interests of users in a way that a thoughtful person would recognize, rather than in some subtly distorted manner.

  3. Human Goals. This is related to the previous point, but rather than being about specific human interactions with AI systems, it is more about how human life overall changes in a world with powerful AI. In such a world, can humans find purpose and meaning? I think this is an attitude issue: as I mentioned in "Compassionate Machines," I believe that human goals do not depend on being the best in the world at something; humans can find purpose through the stories and projects they enjoy, even over long periods. We just need to break the link between creating economic value and self-worth and meaning. But this requires a societal transformation, and there will always be risks that we handle poorly.

For all these potential issues, my hope is that in a world with powerful AI that we trust will not kill us, that is not an oppressive government tool, and that truly works for us, we can leverage AI itself to predict and prevent these problems. But this does not guarantee safety—like all other risks, this is something we must handle with care.

Human Testing

Reading this article may give the impression that we are in a daunting situation. I certainly found writing this article to be a daunting task, in stark contrast to "Compassionate Machines," which felt like giving form and structure to the incredibly beautiful music that has echoed in my mind for years. There are indeed many difficult aspects to the situation. AI poses threats to humanity from multiple directions, and there are real tensions between different dangers; if we are not extremely careful in threading the needle while mitigating some of these dangers, we risk exacerbating others.

Taking the time to carefully construct AI systems to prevent them from autonomously threatening humanity creates real tensions with the need for democratic nations to stay ahead of authoritarian states and not be conquered by them. Conversely, the same AI-empowering tools necessary to combat dictatorship, if taken too far, could turn inward and create tyranny within our own countries. AI-driven terrorism could kill millions through the misuse of biology. The labor and economic concentration effects of AI, aside from being serious issues in themselves, could force us to confront other problems in an environment of public outrage or even civil unrest, rather than being able to evoke the angels of our better nature. Most importantly, the sheer number of risks, including unknown risks, and the need to address all these risks simultaneously create a daunting challenge that humanity must navigate.

Furthermore, the past few years should have made it clear that the idea of stopping or even significantly slowing down technology is fundamentally untenable. The formula for building powerful AI systems is so simple that it could almost be said to spontaneously emerge from the right combination of data and raw computation. Its creation may have been inevitable from the moment humans invented the transistor, and one could even argue that it was even more so when we first learned to control fire If a company does not build it, other companies will almost as quickly build it. If all companies in democratic countries stop or slow down development through a common agreement or regulatory decree, authoritarian countries will simply continue to move forward. Given the incredible economic and military value of the technology, coupled with the lack of any meaningful enforcement mechanism, I see no way we could persuade them to stop.

I do see a path to a slightly moderated development of AI that is compatible with a realist view of geopolitics. This path involves slowing down their march toward powerful AI over the years by denying authoritarian countries the resources needed to build strong AI (i.e., chips and semiconductor manufacturing equipment). This, in turn, provides democratic countries with a buffer that they can "spend" to build strong AI more carefully, focusing more on its risks while still progressing fast enough to easily outpace authoritarian countries. Then, the competition among AI companies within democratic countries can be managed under a common legal framework, through a mix of industry standards and regulation.

Anthropic has been vigorously advocating for this path, pushing for chip export controls and prudent regulation of AI, but even these seemingly common-sense proposals have mostly been rejected by U.S. policymakers, who are the most important in possessing these proposals. AI can make so much money—literally trillions of dollars a year—that even the simplest measures struggle to overcome the inherent political economy of AI. That is the trap: AI is too powerful, such an enticing prize, that human civilization finds it hard to impose any limits on it.

I can imagine, just as Sagan did in "Contact," the same story playing out on thousands of worlds. A species gains sentience, learns to use tools, begins an exponential rise in technology, faces the crises of industrialization and nuclear weapons, and if it survives those crises, when it learns how to mold sand into thinking machines, it will face the hardest and final challenge. Whether we can pass that test and continue to build the beautiful society described in "The Compassionate Machine," or succumb to enslavement and destruction, will depend on our character and resolve as a species, our spirit and soul.

Despite many obstacles, I believe there is strength within humanity to pass this test. Thousands of researchers are dedicated to helping us understand and guide AI models, shaping the character and constitution of these models, which inspires and encourages me. I think it is now quite possible that these efforts will bear fruit in time. I am encouraged that at least some companies have stated they will bear meaningful commercial costs to prevent their models from fueling the threat of bioterrorism. I am encouraged that a few brave individuals have resisted the prevailing political winds and passed legislation to plant the first early seeds of reasonable guardrails for AI systems. I am encouraged that the public understands the risks of AI and wants to address those risks. I am inspired by the indomitable spirit of freedom and the determination to resist tyranny around the world.

But if we want to succeed, we will need to strengthen our efforts. The first step is for those closest to the technology to simply speak the truth about the situation humanity is in, which is what I have been trying to do; I do so more clearly and urgently through this article The next step will be to persuade the world's thinkers, policymakers, companies, and citizens that this issue is urgent and overwhelmingly important—worthy of thought and political capital compared to the thousands of other issues that dominate the news every day. Then will come the moment of courage, when enough people stand up against the tide and hold to their principles, even in the face of threats to their economic interests and personal safety.

The years ahead of us will be incredibly difficult, demanding more from us than we believe we can give. But in my time as a researcher, leader, and citizen, I have seen enough courage and nobility, and I believe we can win—when in the darkest of circumstances, humanity has a way of gathering the strength and wisdom needed to prevail at the last moment. We have no time to waste