Copyright watchdog takes down Dutch language AI training dataset
Occurred: August 2024
A large dataset of copyrighted books and news articles was removed from the internet after an enforcement action by Dutch copyright enforcement group BREIN.
The dataset, which remains unnamed, contained material collected without permission from tens of thousands of Dutch-language books, news sites, and subtitles from numerous films and TV series, and was being offered for use in training AI models, notably large language models.
It is unclear how widely this dataset may have already been used by AI companies. BREIN director Bastiaan van Ramshorst said they were trying to act preemptively to avoid future lawsuits.
The dataset was seen to raise questions about the legality and ethics of using copyrighted material for AI training without permission.
The European Union's AI Act requires AI firms to disclose the datasets used to train their models.
Copyright law of the European Union
The copyright law of the European Union is the copyright law applicable within the European Union. Copyright law is largely harmonized in the Union, although country to country differences exist.
Source: Wikipedia
Unknown
Operator:
Developer:
Country: Netherlands
Sector: Media/entertainment/sports/arts
Purpose: Train AI models
Technology: Database/dataset
Issue: Copyright
Page info
Type: Issue
Published: August 2024