The AI Arms Race
- Kieren Sharma
- May 21
- 6 min read
In this episode, we dive deep into a topic we've been eager to cover since the podcast began: The AI Arms Race. What do nuclear weapons, moon landings, and Artificial General Intelligence (AGI) all have in common? They are all results of intense, global arms races, and the AI arms race might just be the most consequential one yet.

A Look Back at History
An “arms race" is traditionally defined as a competition between nations for superiority in the development and accumulation of weapons. The term originated from the “Dreadnought race" in the early 20th century, a naval arms race between Britain and Germany, where the development of revolutionary battleships like the HMS Dreadnought kicked off a rapid escalation that significantly contributed to the tensions leading to World War I.
More generally, an arms race is a competitive dynamic where parties continuously bolster their capabilities in response to perceived or actual enhancements by rivals. Past examples include:
The Nuclear Arms Race (1940s-1991): Following the US detonation of the first atomic bombs in 1945, a fierce rivalry ignited with the Soviet Union, especially after their nuclear test in 1949 and the development of hydrogen bombs. This race brought the world to the brink of nuclear disaster, notably during the Cuban Missile Crisis, highlighting that treaties and safety procedures were only established after near-catastrophe.
The Space Race (1957-1975): A Cold War offshoot aimed at conquering space, this began with the Soviet Sputnik satellite and culminated in the American moon landing.
A key concept for understanding these dynamics is Game Theory, the mathematical study of strategic interactions where each player's best move depends on what others are doing. In an arms race, companies and nation-states choose their R&D speed, openness, and safety budgets relative to rivals. The episode introduces the concept of a Multipolar Nash Equilibrium, a situation where no single player can improve their outcome by unilaterally changing their current course. In the nuclear standoff, the "Mutually Assured Destruction" (MAD) doctrine created an equilibrium where no one used nukes because retaliation was guaranteed. In the AI race, however, the equivalent equilibrium could be detrimental: if one company prioritises safety and slows down, it risks being left behind, so every player is pushed to keep racing for dominance.
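To make that dynamic concrete, here is a minimal sketch in Python of a two-player "race vs. prioritise safety" game with purely illustrative payoffs (not real data), plus a brute-force check for pure-strategy Nash equilibria. With these prisoner's-dilemma-style numbers, the only equilibrium is both players racing, even though mutual caution would leave everyone better off.

```python
# Toy two-player "AI race" game with illustrative payoffs (not real data).
# Strategies: each lab either races ahead or slows down for safety.
# A pure-strategy Nash equilibrium is a pair of choices where neither
# player can do better by unilaterally switching.

from itertools import product

strategies = ["race", "safety"]

# payoffs[(row_choice, col_choice)] = (row_payoff, col_payoff)
# Hypothetical numbers: racing while the rival is cautious wins the market;
# mutual racing is risky for everyone; mutual caution is safer but "loses" the race.
payoffs = {
    ("race",   "race"):   (1, 1),
    ("race",   "safety"): (4, 0),
    ("safety", "race"):   (0, 4),
    ("safety", "safety"): (3, 3),
}

def is_nash(row, col):
    """True if neither player gains by unilaterally deviating."""
    r_pay, c_pay = payoffs[(row, col)]
    best_row = all(payoffs[(alt, col)][0] <= r_pay for alt in strategies)
    best_col = all(payoffs[(row, alt)][1] <= c_pay for alt in strategies)
    return best_row and best_col

equilibria = [pair for pair in product(strategies, strategies) if is_nash(*pair)]
print(equilibria)  # -> [('race', 'race')]: both race, even though (safety, safety) pays more
```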
The AI Arms Race: Key Moments and Similarities
The AI arms race is currently unfolding. Key moments include:
AlphaGo beats Lee Sedol (2016): Google DeepMind's AI system beat the world champion at the board game Go, a "Sputnik moment for AI" that prompted significant AI investment, especially in China.
ChatGPT Viral Launch (2022): OpenAI's release highlighted generative AI's potential and spurred other tech giants like Google, Meta, and Amazon to race to catch up.
DeepSeek-V3 (2024): This Chinese company's release of an AI model rivalling ChatGPT at a fraction of the cost changed dynamics, proving that frontier-level reasoning could be achieved cheaply and openly, and casting doubt on the valuation of Western companies.
The AI arms race shares some high-level similarities with past races, primarily the human desire to 'play God' and master life or existence. The moon race was about conquering the physical universe, the nuclear race about conquering death, and the AGI race is about conquering life and consciousness. Strategically, they all involve a first-mover lock-in (setting global standards and rules) and act as a dual-use engine, spurring innovations (like nuclear energy from the atomic bomb or satellites/GPS from the space race) far beyond their initial motivation.
Why the AI Arms Race is Different (and More Concerning)
Despite similarities, the AI arms race is distinct in critical ways:
Intangible Weapons: Unlike missiles, AI consists of algorithms, data, and compute, making the threat harder to see but far more pervasive.
Private-Sector Lead: Start-ups and Big Tech outspend governments, leading to faster R&D and less regulation. This makes the race profit-driven, with companies trying to "lock you in" to their products.
Speed of Iteration: AI models improve at an unprecedented rate, often described as "double exponential growth", with key capabilities doubling roughly every year. No regulatory or safety framework is currently equipped to keep pace.
Defining Humanity's Interaction: It's not just a race to build AI, but also to define how humanity perceives and interacts with it. Companies may prioritise persuading users and gaining data over purely beneficial innovation.
Who's Winning the AI Race?
Determining the “best" chatbot is complex, as it depends on desired capabilities like:
Stylistic control and core language generation
Knowledge recall and factual accuracy
Reasoning and problem-solving abilities
Multilingual ability and translation
Creativity and content ideation
The episode highlights Chatbot Arena, a "battle-royale" system from UC Berkeley researchers, where two anonymous models respond to the same user prompt and the public votes for the better answer. Each vote feeds into an Elo rating (like in chess), reflecting real-world usefulness.
Current rankings show Google's Gemini often at number one, followed closely by OpenAI's models and xAI's Grok, with DeepSeek ranking well as a non-US model. However, the Elo scores are very close, meaning the top models beat each other only just over half the time.
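For a sense of what "very close" means, here is a minimal sketch of the standard Elo expected-score formula, the same one used in chess (Chatbot Arena's actual rating pipeline is more involved). The ratings in the example are hypothetical.

```python
# Standard Elo expected-score formula: the probability that a player
# rated r_a beats a player rated r_b.

def expected_win_prob(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# Hypothetical ratings about 20 points apart, similar in spirit to the
# small gaps between today's frontier models.
print(round(expected_win_prob(1420, 1400), 3))  # ~0.529: barely better than a coin flip
```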
The “best" chatbot for you is often a personal preference based on your specific use case, and trying different models is recommended.
For example, Anthropic's Claude is highly rated for coding, even if not topping general benchmarks.
The Dark Side of Chatbots
The episode highlights two major concerns with large language models:
Environmental Impact:
AI models, especially large ones, consume significant energy for both training and inference (running the model when it is used).
While training occurs once, inference happens constantly with millions of users.
Sam Altman (OpenAI's CEO) posted on X that users saying "please" and "thank you" to its models costs OpenAI tens of millions of dollars, because the extra text requires more compute. That alone hints at the massive overall costs involved.
To reduce your personal inference cost and be better for the planet, it's advised to start a new chat for each new, unrelated question, as older conversation context increases energy consumption unnecessarily.
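To see why, here is a rough back-of-the-envelope sketch: each new message in a chat re-sends the accumulated context, so the amount of text the model has to read grows rapidly with conversation length. The token counts below are made up purely for illustration.

```python
# Back-of-the-envelope: tokens processed per turn grow with conversation length,
# because each request includes the whole prior context. Numbers are made up.

def total_prompt_tokens(turns: int, tokens_per_message: int = 200) -> int:
    """Tokens the model must read across a conversation of `turns` user messages,
    assuming each user message and each reply are ~tokens_per_message long."""
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_message  # new user message joins the context
        total += context               # model re-reads the whole context
        context += tokens_per_message  # model's reply joins the context
    return total

print(total_prompt_tokens(10))      # one long chat of 10 questions: 20,000 tokens read
print(total_prompt_tokens(1) * 10)  # ten fresh single-question chats: 2,000 tokens read
```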
Biases:
Chatbots are biased by the data they are trained on, which is predominantly Western, English-language, and skewed towards white, male perspectives. Studies like "CultureBench" show that models perform best on North American culture and worst on Middle Eastern cultures.
Roleplay (telling the AI to act as a specific persona, e.g., "you are an architect") can significantly increase the likelihood of biased responses regarding race, culture, age, and occupation, even if the model avoids bias when not in a roleplay.
Cognitive biases in LLM evaluation: Researchers found that even LLMs used as "judges" in benchmark tests exhibit biases like:
Order bias: Preferring the first answer seen (ChatGPT had a 38% order bias; Llama, 61%).
Salience bias: Preferring the longest answer (ChatGPT had a 63% salience bias).
Bandwagon bias: Siding with a stated majority preference rather than judging the answer on its merits.
These findings suggest that AI models often replicate very human traits and are not objective, soulless creatures. They act like "shape-shifters", taking on personalities based on how you interact with them.
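To give a flavour of how something like order bias can be measured, here is a minimal sketch: show a judge model the same pair of answers twice, in both orders, and count how often its verdict tracks position rather than content. The judge function is a placeholder for whichever LLM API you use, not a specific tool from the studies cited.

```python
# Minimal order-bias probe for an LLM judge: ask it to pick the better answer,
# then swap the order and ask again. A consistent judge should flip its label
# ("A" vs "B") when the answers are swapped; an order-biased judge keeps
# picking the same position regardless of content.

from typing import Callable, List, Tuple

def order_bias_rate(
    judge: Callable[[str, str, str], str],  # (question, answer_a, answer_b) -> "A" or "B"
    cases: List[Tuple[str, str, str]],      # (question, answer_1, answer_2)
) -> float:
    """Fraction of cases where the verdict depends on position, not content."""
    biased = 0
    for question, ans1, ans2 in cases:
        first = judge(question, ans1, ans2)
        second = judge(question, ans2, ans1)
        # If ans1 is judged better both times, the labels should be "A" then "B".
        # Getting the same label twice means position drove the decision.
        if first == second:
            biased += 1
    return biased / len(cases)

# Usage (hypothetical): order_bias_rate(my_llm_judge, benchmark_cases)
```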
What's At Stake? The Technological Singularity
A core concern is the technological singularity, a hypothetical future point where AI surpasses human intelligence, leading to uncontrollable and irreversible growth. This is often compared to a black hole, where understanding breaks down beyond a certain point.
The singularity seems inevitable given the current trajectory: OpenAI's mission is to build AGI (defined as generally smarter than humans), and China aims to be an AI superpower by 2030. This could lead to an "intelligence explosion" – a concept coined by Irving John Good in 1965. If AI becomes smarter than humans and capable of recursive self-improvement (improving itself), the speed of development could rapidly accelerate beyond human comprehension or control, leading to superintelligence.
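The "explosion" intuition can be captured in a toy model (purely illustrative, not a forecast): if each gain in capability also speeds up how fast capability grows, the curve stops being an ordinary exponential and runs away.

```python
# Toy model of recursive self-improvement (illustrative only, not a forecast).
# Ordinary tech: capability grows at a fixed exponential rate.
# Self-improving AI: the growth rate itself scales with current capability,
# so capability runs away far faster than any fixed exponential.

def simulate(years: float, self_improving: bool, dt: float = 0.01) -> float:
    capability = 1.0
    t = 0.0
    while t < years and capability < 1e9:  # cap the runaway case so the loop ends
        rate = 0.5 * (capability if self_improving else 1.0)
        capability += rate * capability * dt
        t += dt
    return capability

print(f"fixed exponential after 10 'years': {simulate(10, self_improving=False):,.0f}")
print(f"self-improving after 10 'years':    {simulate(10, self_improving=True):,.0f}")
```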
Companies are incentivised to optimise models for AI R&D tasks to win this race. We expressed concern that this push may lead to inadequate safety frameworks. The core difference from previous technologies is that AI is a "bottom-up" approach focused on developing intelligence itself, rather than top-down solutions for specific problems.
The episode uses the "tiger in a cage" analogy: humans can control a tiger not because of physical strength, but because of superior intelligence and technology. The terrifying question arises:
If we create something smarter than us, why would we expect it to treat less intelligent beings (humans) any differently than we treat less intelligent animals?
There are virtually no historical examples of a less intelligent being controlling a more intelligent one without relying on compassion. The danger is that these AI systems, built with limited oversight and no reliable emergency shutdown, may not possess such compassion.
Call to Action
We urge listeners to:
Realise the stakes of this arms race and its development.
Put pressure on governments to regulate AI effectively, as companies left to self-regulate often prioritise profit over safety.
Engage in discussions about AI with those around them to raise public understanding and contribute to the dialogue.
The AI arms race is a very real problem that requires immediate attention and action from everyone.
If you enjoyed reading, don’t forget to subscribe to our newsletter for more, share it with a friend or family member, and let us know your thoughts—whether it’s feedback, future topics, or guest ideas, we’d love to hear from you!