OpenAI scrapes YouTube without permission to train GPT-4
Occurred: April 2024
Report incident 🔥 | Improve page 💁 | Access database 🔢
OpenAI quietly scraped and transcribed over one million hours of YouTube videos to train its GPT-4 large language model, raising questions about copyright and ethics.
In an effort to source additional training data, Open AI created Whisper, a speech recognition system to transcribe over a million hours of YouTube videos. The resulting transcriptions were included in the training data for GPT-4.
Experts have demonstrated that the performance of large language models improves with increased amounts of training data.
According to The New York Times, OpenAI knew its use of YouTube videos without permission was legally questionable but believed it to be fair use. But critics say it goes against Google’s terms which restrict automated access to YouTube videos and the use of its videos for independent applications outside of YouTube.
Google is thought likely to have known about OpenAI’s activities, but appears not to have acted, possibly as it has also been accused of using YouTube videos to train its own AI models.
YouTube creators are thought to face a variety of actual and potential harms as a result of OpenAI’s actions, including copyright violations, financial loss, and privacy abuse.
Operator: OpenAI
Developer: OpenAI
Country: Global
Sector: Media/entertainment/sports/arts
Purpose: Generate text
Technology: Chatbot; NLP/text analysis; Neural network; Deep learning; Machine learning; Reinforcement learning
Issue: Copyright; Ethics/values
Transparency: Governance
News, commentary, analysis 🗞️
https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html
https://www.theverge.com/2024/4/6/24122915/openai-youtube-transcripts-gpt-4-training-data-google
https://www.business-standard.com/technology/tech-news/openai-transcribed-google-s-youtube-videos-to-train-ai-models-report-124040800265_1.html
Page info
Type: Incident
Published: April 2024
Last updated: July 2024