LAION-5B links to photos of identifiable Brazilian children

Occurred: June 2024

The LAION-5B dataset was found to contain personal photos and details of identifiable Brazilian children, included without their knowledge or consent, prompting concerns about privacy and about the governance and integrity of its creator.

Human Rights Watch (HRW) discovered over 170 photos of children from 10 Brazilian states in the dataset, accompanied by names, ages, locations, and other identifying information. Some of the photos date back to the mid-1990s, while others are as recent as 2023. All of the images had been posted by families on social media.

In one instance, the accompanying details identified a 2-year-old girl and her newborn sister by name, along with the hospital where the baby was born.

Human Rights Watch said it had reviewed only 0.0001 percent of the 5.85 billion images in LAION-5B, suggesting that many more such images could be present. The images violate children's privacy and enable malicious actors to create explicit deepfakes exploiting them, HRW said.

LAION, the non-profit organisation behind the dataset, temporarily removed the offending images and said it would implement filters. LAION-5B was used to train Stable Diffusion, among other models.

The incident was seen as highlighting LAION's slapdash approach to protecting personal privacy. It also prompted concerns about the lack of comprehensive data privacy laws to safeguard children and others from violations by AI systems.

System 🤖

Operator: Human Rights Watch
Developer: LAION
Country: Brazil
Sector: Private - individual
Purpose: Pair text and images
Technology: Database/dataset; Neural network; Deep learning; Machine learning
Issue: Ethics/values; Privacy
Transparency: Governance; Marketing