Mike Huckabee books used to train language models without consent

Occurred: October 2023

Former Arkansas Governor Mike Huckabee and a group of religious authors are suing Meta, Microsoft, Bloomberg, and EleutherAI, alleging that the defendants used the authors' books to train large language models without the authors' knowledge or consent.

The lawsuit centres on Books3, an AI training dataset of 180,000 works which, as part of EleutherAI's larger dataset The Pile, has been used to train multiple large language models. Books3 was taken offline in August 2023 following a copyright infringement complaint from the Danish anti-piracy group Rights Alliance.

Huckabee's suit argues that Meta, Microsoft and Bloomberg 'were able to incorporate sophisticated datasets, which included the pirated copyright-protected materials in Books3, as part of the LLM’s training process, without having to compensate the authors.' 

The suit is the latest in a series of copyright suits levelled against developers of large language models and generative AI.

Databank

Operator: Bloomberg; EleutherAI; Meta; Microsoft
Developer: Bloomberg; EleutherAI; Meta; Microsoft
Country: USA
Sector: Media/entertainment/sports/arts
Purpose: Train language models
Technology: 
Issue: Copyright
Transparency: Governance; Marketing