This page lists datasets (definition) used to develop AI, algorithmic and automation systems that have proved controversial in one way or another.
The datasets are listed alphabetically. They are also added to the AIAAIC Repository spreadsheet, where they can be sorted in multiple ways.
AIAAIC0468: 80 Million Tiny Images
AIAAIC0594: BDD100K
AIAAIC0638: BookCorpus
AIAAIC1083: Books3
AIAAIC037: Brainwash
AIAAIC1030: C4
AIAAIC1770: Common Crawl
AIAAIC0351: Coronavirus Mask Image Dataset
AIAAIC0457: DiveFace
AIAAIC0317: Diversity in Faces (DiF)
AIAAIC0200: DukeMTMC
AIAAIC0901: GoEmotions
AIAAIC0924: HRT Transgender Dataset
AIAAIC0939: IARPA Janus Benchmark-C (IJB-C)
AIAAIC0276: ImageNet
AIAAIC0935: Labeled Faces in the Wild (LFW)
AIAAIC0314: LAION-5B
AIAAIC0762: LAION-400M
AIAAIC0930: Large-scale CelebFaces Attributes (CelebA)
AIAAIC1106: Library Genesis
AIAAIC0275: MegaFace
AIAAIC031: Microsoft Celeb (MS-Celeb-1M)
AIAAIC0639: NHS patient medical history data store
AIAAIC2160: NudeNet
AIAAIC0940: Oxford Town Centre
AIAAIC0937: People in Photo Albums (PIPA)
AIAAIC0925: People of Tinder
AIAAIC1073: Prosecraft
AIAAIC0903: Real-World Masked Face
AIAAIC1596: Simulated Masked Face Recognition Dataset (SMFRD)
AIAAIC1596: The Pile
AIAAIC0936: Unconstrained College Students (UCCS)
AIAAIC0915: VGG-Face
AIAAIC0871: WILDTRACK
AIAAIC1590: YouTube Subtitles
AIAAIC1107: Z-Library