Nvidia sued for training NeMo on authors' copyrighted works

Occurred: March 2024

GPU chip provider Nvidia has been sued by three authors who accuse it of training its NeMo AI models on copyrighted books.

Authors Brian Keene, Abdi Nazemian and Stewart O'Nan filed a class action lawsuit against Nvidia for copyright infringement, saying their works were part of the Books3 dataset and were used to train the NeMo generative AI platform without their permission.

According to the lawsuit, the Books3 dataset copied "all of Bibliotik", a so-called shadow library of approximately 196,640 pirated books. Books3 had earlier been available through the AI community Hugging Face as part of The Pile, a larger dataset. The Pile was later removed from Hugging Face following a copyright complaint.

The authors are seeking compensation for their creative labour and the destruction of all copies of the Books3 dataset, and argue that Nvidia's October 2023 takedown of the NeMo AI platform was an implicit admission of guilt.

The case highlights ongoing copyright clashes between the AI industry and creative communities, with training-data transparency and infringement claims at the forefront.

Fair use

Fair use is a doctrine in United States law that permits limited use of copyrighted material without first having to acquire permission from the copyright holder.

Source: Wikipedia

October 2023: Nvidia withdrew the NeMo platform and acknowledged that the model had been trained on a dataset containing "approximately" 196,640 books. The Books3 dataset contains the same number of books.

Operator: Nvidia
Developer: Nvidia
Country: USA
Sector: Media/entertainment/sports/arts
Purpose: Train and deploy custom LLMs
Technology: Generative AI; Machine learning; Neural network; Deep learning; NLP/text analysis
Issue: Copyright; Ethics/values
Transparency: Governance; Marketing

Regulation