LAION-400M image-text pairing dataset

LAION-400M is a large, open dataset of 400 million image and text pairings.

Developed by the German non-profit collective LAION and released in 2021, LAION-400M was used to train Imagen, Lensa, Stable Diffusion, and other text-to-image models.

The dataset's successor, LAION-5B, comprises 5 billion pairings.

Dataset

Developer

Operator: Alphabet/Google; Prisma Labs; Stability AI
Developer: LAION
Country: Germany
Sector: Technology
Purpose: Train text-to-image models
Technology: Database/dataset; Neural network; Deep learning; Machine learning
Issue: Bias/discrimination - race, ethnicity; Copyright; Ethics/values; Privacy; Safety
Transparency: Governance

Risks and harms

Critics accuse the LAION-400M dataset of violating privacy, enabling the generation of offensive, hateful, explicit, and derogatory content, and perpetuating biases. They attribute these problems to its reliance on unfiltered, large-scale web-scraped data.

Transparency and accountability

The LAION-400M dataset is seen to have several important transparency limitations.

Research, advocacy

Page info
Type: Data
Published: October 2021
Last updated: June 2024