AIAAIC - ImageNet dataset racial, gender stereotyping

ImageNet image recognition dataset


Developed by Princeton University researchers in 2008, ImageNet is an image database intended to give developers of image recognition systems a dataset that is large, diverse and high quality.

Widely regarded as a landmark in computer vision research and its sub-set, object recognition, ImageNet was free and open to researchers on a non-commercial basis, though closed to journalists and other public interest parties.

The resource was the basis of the annual ImageNet Large-Scale Visual Recognition Challenge (or ImageNet Challenge) from 2010 to 2017, which demonstrated the effectiveness of deep learning and neural networks and spurred their adoption by academics, researchers, and technology professionals.

Dataset 🤖

Documents 📃

ImageNet: A large-scale hierarchical image database
Li F.F., Krishna R. (2022). Searching for Computer Vision North Stars (pdf)

Operator: Kate Crawford; Trevor Paglen
Developer: Princeton University; Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li
Country: USA
Sector: Research/academia
Purpose: Identify objects
Technology: Dataset; Computer vision; Object detection; Object recognition; Machine learning; Deep learning
Issue: Accuracy/reliability; Bias/discrimination - race, ethnicity, gender, religion, national identity, location; Copyright; Privacy
Transparency: Governance; Privacy

Risks and harms 🛑

ImageNet prompted heated debate regarding the accuracy and fairness of its labeling, and accusations that its developers had failed to respect the rights of people whose images they collected without their consent.

Transparency and accountability 🙈

The ImageNet image recognition dataset is seen to have several important transparency limitations:

Limited access to raw data. Since January 2019, downloads of the full ImageNet data have been disabled, except for a small subset of 1,000 categories, restricting researchers' ability to scrutinise the complete dataset.
Lack of demographic information. Many studies using ImageNet do not report the race, ethnicity, or skin tone of the people pictured, making it difficult to assess the diversity of the dataset.
Unclear data collection methods. The exact methods used for collecting and curating the images in the dataset are not clear, raising potential privacy and copyright issues.
Potential consent issues. Many individuals included in the dataset may be unaware that their images are being used for facial recognition research.
Bias in image selection. The images were largely sourced from Western/English-language websites, potentially leading to cultural and geographic biases in the dataset.
Limited context. Images are often taken out of their original context, which can lead to misinterpretations or oversimplifications of complex scenes.
Privacy concerns. Some images may contain identifiable individuals who did not consent to their inclusion in the dataset.
Limited metadata. Detailed information about image sources, capture methods, and original contexts is often not provided.
Potential copyright issues. The dataset's use of web-scraped images raises questions about copyright and fair use.

Incidents and issues 🔥

ImageNet found to contain inaccurate, derogatory, and racially offensive information

Research, advocacy 🧮

Prabhu V.U., Birhane A. (2020). Large image datasets: A pyrrhic win for computer vision?
Dulhanty C., Wong A. (2019). Auditing ImageNet: Towards a Model-driven Framework for Annotating Demographic Attributes of Large-Scale Image Datasets

Investigations, assessments, audits 🧐

Paglen T., Crawford K.: ImageNet Roulette

Related 🌐

Page info
Type: Data
Published: April 2022
Last updated: June 2024