AIAAIC - Study: Hate content increases 12 percent as LAION dataset size increases

Occurred: July 2023

A comparative audit of two datasets, LAION-400M and LAION-2B, found that hate content increased by nearly 12 percent as dataset scale grew.

In a recent study titled “Into the LAION’s Den: Investigating Hate in Multimodal Datasets,” researchers examined the impact of scaling datasets on hateful content by comparing two datasets: LAION-400M and LAION-2B.

The audit found that hate content increased by nearly 12 percent as the dataset size grew. The increase was measured both qualitatively and quantitatively, the latter using the Hate Content Rate (HCR) metric.
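
To make the metric concrete, here is a minimal sketch of how a rate like HCR could be computed. It assumes HCR is the percentage of text samples whose classifier-assigned hate score exceeds a chosen threshold, and that the reported 12 percent is a relative change between the two datasets (this summary does not say whether it is relative or absolute). The function names, the threshold, and all scores below are hypothetical and are not taken from the study.

```python
def hate_content_rate(scores, threshold=0.5):
    """Percentage of samples whose hate score exceeds the threshold.

    `scores` is a list of per-sample probabilities from some hate-speech
    classifier (hypothetical here); `threshold` is an assumed cut-off.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    flagged = sum(1 for s in scores if s > threshold)
    return 100.0 * flagged / len(scores)


def relative_increase(old_rate, new_rate):
    """Relative change from old_rate to new_rate, in percent."""
    if old_rate == 0:
        raise ValueError("old_rate must be non-zero")
    return 100.0 * (new_rate - old_rate) / old_rate


# Made-up scores standing in for samples from a smaller and a larger dataset.
smaller = [0.1, 0.7, 0.2, 0.9, 0.3, 0.1, 0.2, 0.4]
larger = [0.1, 0.7, 0.2, 0.9, 0.3, 0.8, 0.2, 0.4]

hcr_small = hate_content_rate(smaller)
hcr_large = hate_content_rate(larger)
print(f"HCR (smaller): {hcr_small:.1f}%")
print(f"HCR (larger):  {hcr_large:.1f}%")
print(f"Relative increase: {relative_increase(hcr_small, hcr_large):.1f}%")
```

Expressing the rate as a percentage of samples, rather than a raw count, is what lets datasets of very different sizes be compared directly.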

The finding highlights a consequence of scaling vision-language datasets: as the data grows, so can the share of hateful content it contains.

System 🤖

LAION-400M

Operator: Alphabet/Google; Prisma Labs; Stability AI
Developer: LAION
Country: Germany
Sector: Technology
Purpose: Train vision-language models
Technology: Database/dataset; Neural network; Deep learning; Machine learning
Issue: Safety

Research, advocacy 🧮

Mozilla (2023). Fellow Research: As AI Companies Scale Datasets, They Scale Hate, Too
Birhane A., Luccioni A.S. et al. (2023). Into the LAION’s Den: Investigating Hate in Multimodal Datasets

Page info
Type: Issue
Published: June 2024
