LAION-400M dataset

October 2021

Researchers have found that LAION-400M, an open dataset consisting of image and text pairings, contains 'troublesome and explicit images and text pairs of rape, pornography, malign stereotypes, racist and ethnic slurs, and other extremely problematic content.'

Drawing on content crawled from the internet between 2014 and 2020, the dataset also raises concerns about privacy and copyright.

Launched in 2020 by a group of AI researchers, LAION-400M is touted as the world's largest openly available image-text dataset.

Developer: LAION - Jenia Jitsev, Richard Vencu, Christoph Schuhmann
Country: Germany
Sector: Technology
Purpose: Train ML models
Technology: Dataset
Adversary: Abeba Birhane, Vinay Uday Prabhu, Emmanuel Kahembwe
Issue/controversy: Offensive/inappropriate content; bias/discrimination - racial, ethnic; privacy; copyright