Released: March 2022
LAION-5B is a large, openly available dataset of roughly 5.85 billion image-text pairs developed by the German non-profit collective LAION.
The dataset's predecessor was LAION-400M.
In September 2022, AI artist 'Lapine' found that private medical photographs meant to be available only to her doctor had been included in the image-text dataset LAION-5B. The dataset is supposed to include only images that are publicly available on the web.
In April 2023, German stock photographer Robert Kneschke discovered that his photos had been included in LAION-5B, raising further questions about copyright protection against AI datasets and systems, and about the practices and ethics of the dataset's eponymous developer.
Sector: Technology; Research/academia
Purpose: Pair text and images
Technology: Database/dataset; Neural network; Deep learning; Machine learning
Issue: Copyright; Ethics; Privacy; Security
Transparency: Governance; Complaints/appeals
LAION-5B research study (pdf)
News, commentary, analysis
Published: November 2023