LAION-5B image-text pairing dataset

LAION-5B is a large, open dataset of 5.85 billion image-text pairs developed by the German non-profit collective LAION (Large-scale Artificial Intelligence Open Network).

Funded in part by Stability AI and released in March 2022, LAION-5B was built from the Common Crawl dataset and has been used to train Google Imagen, Stable Diffusion, Midjourney, and hundreds of other AI image models. 

Its predecessor was LAION-400M.

Dataset databank 

Operator: LAION
Developer: LAION
Country: Germany
Sector: Multiple
Purpose: Pair text and images
Technology: Database/dataset; Neural network; Deep learning; Machine learning
Issue: Copyright; Employment; Ethics/values; Privacy; Safety
Transparency: Governance; Complaints/appeals 

Copyright violations

Privacy loss

Child sex abuse images

Loss of creativity, jobs

Because of its association with Stable Diffusion, Midjourney and other image generators, LAION-5B has been seen as involved in the 'theft' of artists' work, and has been linked to the potential or actual degradation of creativity and loss of jobs.

Research, advocacy

Page info
Type: Data
Published: November 2023
Last updated: March 2024