ImageNet dataset stereotyping, privacy

Page created: September 2019
Updated: April 2022

Developed by Princeton University researchers in 2008, ImageNet is a database intended to help developers of image recognition systems by providing a dataset that is (a) large, (b) diverse, and (c) high quality.

Widely regarded as a landmark in computer vision research and its subfield, object recognition, ImageNet was free and open to researchers on a non-commercial basis, though closed to journalists and other public interest parties.

The resource was the subject of the annual ImageNet Large Scale Visual Recognition Challenge (or ImageNet Challenge) from 2010 to 2017, which demonstrated the effectiveness of deep learning and neural networks and spurred their adoption by academics, researchers, and technology professionals.

However, ImageNet has also prompted heated debate regarding the accuracy and fairness of its labeling, and its failure to respect the rights of people whose images its developers collected without their consent.

Bias, stereotyping, racism

In September 2019, ImageNet Roulette, a website that encouraged users to upload selfies which it then analysed and labelled, revealed that ImageNet contained inaccurate, derogatory, and racially offensive labels.

ImageNet Roulette told people what it thought they looked like by running their photos through a neural network trained on ImageNet, a database of over 14 million labelled photographs.

While many captions produced by the system were harmless, others turned out to be inaccurate, or contained racist, misogynistic, and other discriminatory and derogatory slurs.
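Schematically, this kind of labelling takes the raw scores a trained network assigns to each class and reports the highest-scoring label as the caption. A minimal sketch in Python, with hypothetical labels and scores standing in for real model output (these are not actual ImageNet Roulette categories):

```python
import math

# Hypothetical person-category labels, purely for illustration.
LABELS = ["teacher", "swimmer", "pilot", "chess player"]

def softmax(scores):
    """Convert raw network scores into probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_label(scores, labels=LABELS):
    """Return the highest-probability label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

label, prob = top_label([0.2, 2.1, 0.5, 1.0])
print(label)  # the label with the highest raw score: "swimmer"
```

The controversy arose precisely because this step is mechanical: whatever label sits in the training data, accurate or offensive, is what the network learns to emit.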

Created by Kate Crawford, co-founder of the AI Now Institute, artist Trevor Paglen, and software developer Leif Ryge, ImageNet Roulette was a 'provocation designed to help us see into the ways that humans are classified in machine learning systems.'

The ensuing fracas led the developers of ImageNet to scrub 'unsafe' and 'sensitive' labels from the database, and to remove links to related photographs.

Privacy, copyright, reproducibility

ImageNet's developers were accused of ignoring user privacy by automatically scraping images from Google, Bing, and photo-sharing platform Flickr to build the training dataset without consent, leading lawyers and rights activists to call for stronger privacy and copyright laws.

In March 2021, the ImageNet team announced it had blurred the faces in 243,198 of the database's photographs using Amazon's Rekognition image and video analysis service.

The update was seen to have minimal impact on the dataset's classification and transfer learning accuracy; however, some commentators argued it would damage ImageNet's relevance by stymying its reproducibility.
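A pipeline of this kind typically detects face bounding boxes, converts them to pixel coordinates, and blurs each region. Rekognition's DetectFaces API reports bounding boxes as ratios of the image's width and height, so a coordinate-conversion step is needed before blurring; a minimal, hypothetical sketch of that step in Python (no AWS call is made, and the function name is illustrative):

```python
def box_to_pixels(box, img_width, img_height):
    """Convert a Rekognition-style bounding box (Left/Top/Width/Height
    expressed as ratios of the image dimensions, as in the DetectFaces
    response) into pixel coordinates (left, top, right, bottom)."""
    left = int(box["Left"] * img_width)
    top = int(box["Top"] * img_height)
    right = int((box["Left"] + box["Width"]) * img_width)
    bottom = int((box["Top"] + box["Height"]) * img_height)
    return left, top, right, bottom

# Example: a face box covering the central quarter of a 640x480 image.
face = {"Left": 0.25, "Top": 0.25, "Width": 0.5, "Height": 0.5}
print(box_to_pixels(face, 640, 480))  # (160, 120, 480, 360)
```

The resulting pixel rectangle would then be blurred in place (for instance with a Gaussian filter), leaving the rest of the photograph untouched, which is why the change could be applied at scale with little effect on object-level labels.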

Operator:
Developer:
Princeton University; Jia Deng; Wei Dong, Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li
Country:
USA
Sector: Research/academia
Purpose:
Identify objects
Technology:
Dataset; Computer vision; Object detection; Object recognition; Machine learning; Deep learning
Issue:
Accuracy/reliability; Bias/discrimination - race, ethnicity, gender, religion, national identity, location; Copyright; Privacy
Opacity:
Access; Privacy
