Apple, NVIDIA, Anthropic train AI models using thousands of YouTube videos without permission
Occurred: July 2024
Apple, NVIDIA, Anthropic and other technology companies trained their AI models using subtitle files from over 173,000 YouTube videos without the consent of their creators.
An investigation by Proof News, in collaboration with WIRED, revealed that Apple, NVIDIA, Anthropic, Salesforce and other firms used the YouTube Subtitles dataset without permission, in violation of YouTube's rules against harvesting materials from the platform.
Some content creators, on learning that their material had been used, objected to the unauthorised use of their work for AI training.
Set against a backdrop of companies using controversial tactics to acquire large amounts of data for training their AI models, the revelation raised questions about data ownership, copyright infringement and fair use, and the ethical use of publicly available content for AI training.
August 2024. Documents shared with 404 Media showed NVIDIA scraped videos from YouTube, Netflix and several other sources to compile training data for its AI products.
Fair use
Fair use is a doctrine in United States law that permits limited use of copyrighted material without having to first acquire permission from the copyright holder.
Source: Wikipedia
System
Operator: Anthropic; Apple; Bloomberg; Databricks; NVIDIA; Salesforce
Developer: EleutherAI
Country: USA
Sector: Media/entertainment/sports/arts
Purpose: Train AI models
Technology: Database/dataset
Issue: Cheating/plagiarism; Copyright; Ethics/values; Transparency
Page info
Type: Incident
Published: July 2024