Tiny Images dataset teaches AI systems to use racist slurs

Occurred: July 2020

Researchers found that several large, publicly available image datasets, including 80 Million Tiny Images, contained racist and misogynistic slurs as labels, meaning models trained on them could learn and reproduce racial and gender bias.

Researchers Vinay Uday Prabhu (UnifyID) and Abeba Birhane (University College Dublin) discovered (pdf) that large-scale image datasets, including MIT's much-cited 80 Million Tiny Images dataset, associated offensive labels with pictures of real people.

According to the research, the dataset labelled images of Black and Asian people with racist slurs, and labelled images of women holding children as 'whores'. It also included pornographic images. 80 Million Tiny Images was used to teach machine learning models to automatically identify and label the people and objects depicted in still images.

In addition, the researchers traced the problem to WordNet, the lexical database from which 80 Million Tiny Images drew its class labels: WordNet itself contains derogatory terms, so the dataset inherited images and labels that confirm and reinforce stereotypes and biases, albeit inadvertently.
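The dependency is mechanical: Tiny Images reportedly built its roughly 75,000 class labels directly from WordNet's noun lemmas, so any slur present in WordNet became a class in the dataset. The sketch below illustrates the kind of label audit this implies, using NLTK's WordNet interface; the blocklist file name and its contents are hypothetical placeholders, not the researchers' actual method.

```python
# Sketch: flag WordNet-derived class labels that match a curated blocklist.
# Assumes NLTK is installed and its WordNet corpus downloaded:
#   python -m nltk.downloader wordnet
from nltk.corpus import wordnet as wn


def wordnet_noun_labels():
    """Yield every noun lemma in WordNet, the pool Tiny Images drew its classes from."""
    for synset in wn.all_synsets(pos="n"):
        for lemma in synset.lemma_names():
            yield lemma.lower().replace("_", " ")


def flag_offensive_labels(labels, blocklist):
    """Return the labels that appear on a curated list of derogatory terms."""
    blocked = {term.strip().lower() for term in blocklist if term.strip()}
    return sorted(set(labels) & blocked)


if __name__ == "__main__":
    # "slur_blocklist.txt" is a hypothetical curated list of slurs and
    # derogatory terms, one per line; no such list is bundled here.
    with open("slur_blocklist.txt") as f:
        flagged = flag_offensive_labels(wordnet_noun_labels(), f)
    print(f"{len(flagged)} WordNet-derived labels matched the blocklist")
```

Any label flagged this way propagates directly to every image collected under that noun, which is how textual bias in a lexical resource becomes visual training data.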

The creators of 80 Million Tiny Images acknowledged that the dataset's scale (79.3 million images) and small image size (32x32 pixels) made comprehensive visual inspection of its contents impractical, which is why the offensive labels and images initially went unnoticed.
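A back-of-envelope calculation shows why: even at an assumed (and optimistic) one second of reviewer attention per thumbnail, a single full pass over the dataset amounts to years of work. The figures below are illustrative assumptions, not the creators' numbers.

```python
# Rough estimate of the human effort for one visual pass over the dataset.
NUM_IMAGES = 79_300_000       # ~79.3 million images, per the creators
SECONDS_PER_IMAGE = 1.0       # assumed review speed; careful audits are slower
WORK_HOURS_PER_YEAR = 2_000   # ~50 weeks x 40 hours

total_hours = NUM_IMAGES * SECONDS_PER_IMAGE / 3600
print(f"{total_hours:,.0f} reviewer-hours, "
      f"~{total_hours / WORK_HOURS_PER_YEAR:.0f} person-years per pass")
# -> 22,028 reviewer-hours, ~11 person-years per pass
```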

They apologised, took the dataset offline, and urged researchers to refrain from using it and to delete any copies to mitigate further harm. How many copies were downloaded, how they were used, and whether the plea was followed remain unclear.

Operator: MIT
Developer: MIT

Country: USA

Sector: Technology; Research/academia

Purpose: Identify & classify objects, people

Technology: Database/dataset; Computer vision; Object recognition
Issue: Bias/discrimination - race, gender; Privacy; Safety 

Transparency: Governance


Page info
Type: Issue
Published: June 2024