Child sex abuse images discovered in LAION-5B dataset
Occurred: December 2023
Researchers discovered thousands of child sexual abuse images in the open-source AI image dataset LAION-5B.
Using a combination of perceptual and cryptographic hash-based detection and image analysis, the Stanford Internet Observatory, working with the Canadian Centre for Child Protection and its Project Arachnid Shield API, found more than 3,200 images of suspected child sexual abuse material (CSAM) in the LAION-5B dataset. They also found 'nearest neighbor' matches within the dataset, where related images of victims were clustered together.
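The two detection techniques named above work differently: cryptographic hashes flag only exact byte-for-byte copies of known material, while perceptual hashes flag visually similar near-duplicates. A minimal illustrative sketch of both ideas, using stdlib Python only, with placeholder hash values and a hypothetical distance threshold (this is not the researchers' actual pipeline):

```python
import hashlib

# Placeholder "known-bad" list; real systems query services such as
# Project Arachnid's Shield API rather than a local set.
KNOWN_BAD_SHA256 = {
    # SHA-256 of the empty byte string, used here purely as a stand-in.
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def sha256_match(data: bytes) -> bool:
    """Exact-match detection: flag content whose cryptographic hash
    appears on a known-material list. Any single-bit change defeats it."""
    return hashlib.sha256(data).hexdigest() in KNOWN_BAD_SHA256

def hamming_distance(h1: int, h2: int) -> int:
    """Number of differing bits between two fixed-length perceptual hashes."""
    return bin(h1 ^ h2).count("1")

def perceptual_match(h1: int, h2: int, threshold: int = 8) -> bool:
    """Near-duplicate detection: perceptual hashes of visually similar
    images differ in few bits, so a small Hamming distance is a match.
    The threshold of 8 is an illustrative assumption, not a standard."""
    return hamming_distance(h1, h2) <= threshold

print(sha256_match(b""))                 # True: matches the placeholder entry
print(perceptual_match(0xFA3C, 0xFA3D))  # True: distance 1, near-duplicate
```

Perceptual matching is also what makes the 'nearest neighbor' clustering mentioned above possible: images of the same victim land close together in hash space even when the files themselves differ.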
LAION responded by releasing a statement saying it 'has a zero-tolerance policy for illegal content, and in an abundance of caution, we have taken down the LAION datasets to ensure they are safe before republishing them.' However, public chats from LAION leadership in the organisation's Discord server show they were aware as early as 2021 of the possibility of CSAM being scraped into their datasets.
The incident raised questions about the governance of LAION and the effectiveness of its technical guardrails. It also highlighted broader concerns about the ethics of developing and publishing open-source datasets without adequate oversight, notably at the AI community Hugging Face; the impact on systems trained using LAION-5B, notably Stable Diffusion; and the potential impact on real victims of child sexual abuse.
Databank
Operator: David Thiel, Jeffrey Hancock
Developer: LAION
Country: Global
Sector: Multiple
Purpose: Pair text and images
Technology: Database/dataset; Neural network; Deep learning; Machine learning
Issue: Safety
Transparency: Governance
System
Research, advocacy
Thiel D., Hancock J. (2023). Identifying and Eliminating CSAM in Generative ML Training Data and Models
News, commentary, analysis
Page info
Type: Incident
Published: December 2023