Study: Hate content increases 12 percent as LAION dataset size increases
Occurred: July 2023
A comparative audit of two datasets, LAION-400M and LAION-2B, revealed that as the dataset scale increases, hate content also increases by nearly 12 percent.
In a recent study titled “Into the LAION’s Den: Investigating Hate in Multimodal Datasets,” researchers examined the impact of scaling datasets on hateful content by comparing two datasets: LAION-400M and LAION-2B.
The audit found that hate content rose by nearly 12 percent as dataset size grew, an increase the researchers measured qualitatively and quantitatively using the Hate Content Rate (HCR) metric.
The finding highlighted how scaling vision-language datasets can also scale the hateful content they contain.
Operator: Alphabet/Google; Prisma Labs; Stability AI
Developer: LAION
Country: Germany
Sector: Technology
Purpose: Train vision-language models
Technology: Database/dataset; Neural network; Deep learning; Machine learning
Issue: Safety
Mozilla (2023). Fellow Research: As AI Companies Scale Datasets, They Scale Hate, Too
Birhane A., Luccioni A.S. et al. (2023). Into the LAION’s Den: Investigating Hate in Multimodal Datasets
Page info
Type: Issue
Published: June 2024