Anthropic accused of aggressive data scraping

Occurred: July 2024

AI company Anthropic was accused of aggressive and excessive data scraping practices by several website publishers.

Website publishers, including Freelancer.com and iFixit, accused Anthropic of "egregious" data scraping. The company's ClaudeBot crawler is primarily used to scrape data to train its large language models (LLMs).

Freelancer.com reported 3.5 million visits from Anthropic's web crawler in just four hours, significantly impacting site performance and revenue. ClaudeBot also reportedly visited technology advice site iFixit.com around a million times over a 24-hour period, gobbling its content, driving its IT team to distraction, and costing it over USD 5,000 in bandwidth charges.

The scraping activities reportedly violated the terms of use of these and other websites, which explicitly prohibit the reproduction, copying, or distribution of content without prior permission, including the use of content for training AI models.

Anthropic responded by saying that it aims not to be intrusive or disruptive, but did not provide detailed comments on the broader accusation. It also published details of its web scraping activities on its website.

Blocking web scraping activities is challenging, as AI developers can bypass blocks by launching new crawlers with different names. It transpired that organisations blocking Anthropic had focused on two AI scraper bots called “ANTHROPIC-AI” and “CLAUDE-WEB” and were unaware of ClaudeBot.
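
The gap described above can be illustrated with a minimal sketch. It assumes a hypothetical robots.txt that names only the two older bots, and uses Python's standard urllib.robotparser to check which crawler names that file would turn away. The file contents, the example.com URL, and the helper name allowed_agents are illustrative assumptions, not taken from any affected site. Note that robots.txt is advisory: it only has an effect on crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (an assumption for illustration): it blocks only
# the two older agent names and says nothing about ClaudeBot.
ROBOTS_TXT = """\
User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /
"""


def allowed_agents(robots_txt, agents, url="https://example.com/guide"):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(ROBOTS_TXT, ["anthropic-ai", "Claude-Web", "ClaudeBot"])
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under these rules the two named agents are blocked while ClaudeBot is still allowed, because no rule mentions that name and no catch-all "User-agent: *" entry exists. Adding an explicit entry for each new crawler name, or a wildcard rule, would close that particular gap for crawlers that respect the file.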

Operator: Anthropic
Developer: Anthropic