Researchers have discovered that LAION-400M, an open dataset consisting of image and text pairings, contains 'troublesome and explicit images and text pairs of rape, pornography, malign stereotypes, racist and ethnic slurs, and other extremely problematic content.'
Drawing on content crawled from the internet between 2014 and 2020, the dataset also raises concerns about privacy and copyright.
Launched in 2020 by a group of AI researchers, LAION-400M is touted as the world's largest openly available dataset of image-text pairs.
Operator: LAION - Jenia Jitsev, Richard Vencu, Christoph Schuhmann
Developer: LAION - Jenia Jitsev, Richard Vencu, Christoph Schuhmann
Purpose: Train ML models
Issue: Offensive/inappropriate content; Bias/discrimination - racial, ethnic; Privacy; Copyright