AIAAIC - Copyright watchdog takes down Dutch language AI training dataset

Occurred: August 2024


A large dataset of copyrighted books and news articles was taken offline after the Dutch copyright enforcement group BREIN intervened.

The dataset, which has not been named, contained material collected without permission from tens of thousands of Dutch-language books, from news sites, and from the subtitles of numerous films and TV series. It was being offered for training AI models, notably large language models.

It is unclear how widely the dataset had already been used by AI companies. BREIN director Bastiaan van Ramshorst said BREIN was acting preemptively to avoid future lawsuits.

The case raised questions about the legality and ethics of using copyrighted material to train AI models without permission.

The European Union's AI Act requires providers of general-purpose AI models to publish a sufficiently detailed summary of the content used to train them.

The copyright law of the European Union is the copyright law applicable within the European Union. Copyright law is largely harmonized in the Union, although country-to-country differences exist.

Source: Wikipedia 🔗

System 🤖

Unknown

Operator:
Developer:
Country: Netherlands
Sector: Media/entertainment/sports/arts
Purpose: Train AI models
Technology: Database/dataset
Issue: Copyright

Research, advocacy 🧮

BREIN. BREIN takes Artificial Intelligence dataset offline

News, commentary, analysis 🗞️

Related 🌐

Page info
Type: Issue
Published: August 2024
