
Is the "free lunch" over? The EU investigates Google's AI dominance, targeting its use of search crawlers to obtain training data for free

According to reports, the investigation focuses on Google's use of its Googlebot crawler to collect content from across the web for free and use it to train its AI, while competitors spend hundreds of millions of dollars on data licenses. Websites face a dilemma: blocking the crawler costs them their search rankings, while allowing it means their content is used without compensation. Regulators are considering forcing Google to separate its search and AI crawlers so that publishers can opt out of AI use or demand compensation separately, creating a level playing field.
The European Union is investigating whether Google is using its dominant position in the search engine market to obtain content for AI training through web crawlers without compensation, thereby gaining an unfair advantage in the artificial intelligence competition.
On December 10th, Bloomberg reported that EU regulators have opened an investigation into Google's AI Overviews and AI Mode features to determine whether Google has imposed unfair terms on content creators that give its AI models an advantage over competitors.
At the heart of the investigation is a contrast: AI competitors such as OpenAI, Anthropic, and Amazon spend hundreds of millions of dollars on licensing agreements with publishers for training data, while Google obtains content from across the internet for free through Googlebot, the web crawler that powers Google Search.
This advantage has helped Google close the gap quickly. Caught off guard by the launch of OpenAI's ChatGPT, Google's parent company Alphabet has since made rapid progress, and its AI models are now on par with those of its competitors.
The question, the report notes, is whether Google got there by fair means. If regulators conclude that its practices are improper and force changes, Google's AI prospects could suffer.
Google's "Double Standard": Free Acquisition vs. Paid Competition
The report states that Google enjoys a unique advantage in acquiring AI training data. The company relies on an automated program called Googlebot to crawl web pages for its search engine, organizing everything it finds into Google's vast searchable index.
Google also uses the same program to supply training data to the models behind its Gemini chatbot and AI Overviews. In other words, the tool that indexes the world's information doubles as a source of AI training data: while other AI companies pay for high-quality data, Google gets it for free.
Matthew Prince, CEO of Cloudflare, stated at Bloomberg's technology summit earlier this year: "Google is saying that we have a divine right to all the content in the world, even if we don't pay for it."
The situation is further complicated because users increasingly rely on AI summaries for information instead of clicking through to links in search results, which has cut traffic to website owners. This leaves publishers with a dilemma:
Blocking Google's crawler may get a website dropped from regular search results, while allowing it means Google can use the site's content to train AI systems without paying for it.
Worse still, the report points out, Google disclosed in court earlier this year that, for organizational reasons, it still uses a website's content for AI training even when that site opts out. Publishers are effectively held hostage: they must either let their content be used for free or risk disappearing from search results.
Regulatory Solutions Emerge: Mandatory Separation of Search and AI Crawlers
According to Bloomberg, Matthew Prince has been steering European regulators toward a simple, elegant solution:
Force Google to use Googlebot solely for search and to build a separate crawler dedicated to scraping the content needed for AI Overviews. Publishers could then opt out of AI use specifically, or demand compensation for it.
Technically, Google's engineers could easily build a crawler with a distinct identifier that publishers could block on its own. Alphabet, however, will resist any measure that forces it to negotiate and pay for AI training content the way other companies do.
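To illustrate why a distinct crawler identifier matters, here is a minimal Python sketch using the standard library's `urllib.robotparser`, which evaluates robots.txt rules separately for each user-agent token. The robots.txt content, the example URL, and the crawler name `Hypothetical-AI-Crawler` are illustrative assumptions. They do not describe any crawler Google actually operates or has proposed.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: allow the search crawler, block a separate AI crawler.
# "Hypothetical-AI-Crawler" is a made-up user-agent token used only for this sketch.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Hypothetical-AI-Crawler
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def access_report(parser: RobotFileParser, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user-agent token, whether the parsed rules let it fetch the URL."""
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    report = access_report(
        parser,
        "https://example.com/articles/story.html",
        ["Googlebot", "Hypothetical-AI-Crawler"],
    )
    for agent, allowed in report.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Run as a script, this reports Googlebot as allowed and the hypothetical AI crawler as blocked. The publisher keeps its search presence while refusing AI scraping, a split that is impossible when one crawler serves both purposes. Two caveats apply. First, robots.txt is advisory and only works if the crawler honors it. Second, Python's parser matches user-agent tokens by substring, so the hypothetical name here deliberately avoids containing "Googlebot".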
The logic behind the proposal is simple: if NVIDIA can charge for its chips and engineers can charge for their time and expertise, then website publishers should be able to charge for their content.
The report argues that forced separation would create a fairer competitive environment in which all AI companies face the same costs for acquiring training data.
According to reports, Google says the EU investigation "could stifle innovation in a market that is more competitive than ever." The reality, however, is quite the opposite:
The AI boom should have produced a competitive market with hundreds of viable companies, as the early internet boom did. Instead, it is concentrating profits in the hands of incumbent giants such as Google.
The analysis argues that Googlebot's "dual use" is just the latest example of Google leveraging its dominance to further entrench its position. The advantage that let Google catch up with competitors so quickly needs to be eliminated for the AI market to be truly competitive.
