LAION-5B image-text pairing dataset

Released: March 2022

LAION-5B is a large, openly available dataset of more than 5 billion image-text pairs developed by the German non-profit collective LAION.

The dataset's predecessor was LAION-400M.

Privacy

In September 2022, AI artist 'Lapine' found that private medical photographs of her, meant to be available only to her doctor, had been included in the image-text dataset LAION-5B. The dataset is supposed to draw only on images that are publicly available on the web.

Copyright

In April 2023, German stock photographer Robert Kneschke discovered that his photos had been included in LAION-5B, raising further questions about copyright protection against AI datasets and systems, and about the practices and ethics of the dataset's eponymous developer.

Operator: LAION
Developer: LAION
Country: Germany
Sector: Technology; Research/academia
Purpose: Pair text and images
Technology: Database/dataset; Neural network; Deep learning; Machine learning
Issue: Copyright; Ethics; Privacy; Security
Transparency: Governance; Complaints/appeals