ImageNet - dataset
Page published: April 2022 | Last updated: October 2024
Developed by Princeton University researchers in 2008, ImageNet is a database consisting of over 14 million images labeled with over 20,000 categories, with each image annotated using WordNet synonym sets. On release, it was much larger than many previously existing image datasets.
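As a rough illustration of what annotating images with WordNet synonym sets can look like in practice, the sketch below assumes each synset is referenced by a WordNet ID: a part-of-speech letter followed by an eight-digit offset. The file names, the annotation table and the helper functions are hypothetical and are not taken from the ImageNet tooling.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Wnid(NamedTuple):
    pos: str     # part-of-speech letter, e.g. "n" for noun
    offset: int  # numeric offset of the synset in the WordNet database


# Assumed ID shape: one part-of-speech letter followed by exactly eight digits.
_WNID_PATTERN = re.compile(r"^([nvars])(\d{8})$")


def parse_wnid(wnid: str) -> Wnid:
    """Split a WordNet ID string into its part-of-speech letter and offset."""
    match = _WNID_PATTERN.match(wnid)
    if match is None:
        raise ValueError(f"not a valid WordNet ID: {wnid!r}")
    return Wnid(match.group(1), int(match.group(2)))


def group_by_synset(annotations: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert an image -> synset mapping into synset -> list of images."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for image_name, wnid in annotations.items():
        parse_wnid(wnid)  # reject malformed IDs early
        grouped[wnid].append(image_name)
    return dict(grouped)


# Hypothetical toy annotation table: image file name -> synset ID.
annotations = {
    "img_0001.jpg": "n02084071",
    "img_0002.jpg": "n02084071",
    "img_0003.jpg": "n02121808",
}

images_per_synset = group_by_synset(annotations)
```

Grouping by synset ID rather than by a free-text label means that several words with the same meaning resolve to a single category.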
ImageNet is intended to help developers of image recognition systems by providing a large, diverse, high-quality dataset that is free and open to researchers on a non-commercial basis, though closed to journalists and other public interest parties.
Widely regarded as a landmark in computer vision research and its subfield, object recognition, the resource was the subject of the annual ImageNet Large Scale Visual Recognition Challenge (or ImageNet Challenge) from 2010 to 2017.
The challenge demonstrated the effectiveness of deep learning and neural networks, and led to their broad adoption by academics, researchers and technology professionals.
Computer vision - recognition
The classical problem in computer vision, image processing, and machine vision is that of determining whether or not the image data contains some specific object, feature, or activity.
Wikipedia: Image recognition
Li F.F., Krishna R. (2022). Searching for Computer Vision North Stars
Yang K., Yau J., Li F.F., Deng J., Russakovsky O. (2021). A Study of Face Obfuscation in ImageNet
Deng J., Dong W., Socher R., Li L-J., Li K., Li F.F. (2009). ImageNet: A large-scale hierarchical image database
The ImageNet image recognition dataset is seen to have several important transparency and accountability limitations:
Data provenance. The exact methods used for collecting and curating the images in the dataset are unclear, making it difficult to trace their source and ensure their ethical acquisition.
Data access. Since January 2019, downloads of the full ImageNet data have been disabled, except for a small subset of 1,000 categories, restricting researchers' ability to scrutinise the complete dataset.
Societal accountability. Critics have pointed out that there has been a "tactical abdication of responsibility" by the dataset's creators regarding the ethical implications of their work. This includes a lack of engagement with critical perspectives that could inform better practices in dataset management.
ImageNet has been found to contain offensive, derogatory and otherwise inappropriate and unsafe images.
It also faced criticism that the largely Western, English-language focus of the images in the dataset may reinforce existing racial, religious, cultural, geographic and other societal biases.
The dataset also led to accusations that its developers had violated the privacy of people whose images had been collected without their knowledge and consent, as well as the copyright of people and organisations whose images had been scraped.
Prabhu V.U., Birhane A. (2020). Large image datasets: A pyrrhic win for computer vision?
Dulhanty C., Wong A. (2019). Auditing ImageNet: Towards a Model-driven Framework for Annotating Demographic Attributes of Large-Scale Image Datasets
Paglen T., Crawford K. ImageNet Roulette
AIAAIC Repository ID: AIAAIC0276