80 Million Tiny Images dataset
80 Million Tiny Images is an image database that is used to train machine learning systems to identify people and objects in an environment.
Created in November 2008 by MIT professors Bill Freeman and Antonio Torralba, and NYU professor Rob Fergus, the dataset contains over 79 million 32×32 pixel colour images, scaled down from images collected from search engine queries, and a set of 75,062 non-abstract nouns derived from WordNet.
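Because every image is stored at a fixed 32×32 resolution, the dataset's raw archive can be indexed by byte offset. The sketch below is a minimal illustration, assuming the layout described in the dataset's accompanying documentation (3,072 bytes per image, one byte per channel, with the three channel planes stored separately in column-major order); the filename `tiny_images.bin` is the dataset's conventional archive name, not a path guaranteed here.

```python
import numpy as np

BYTES_PER_IMAGE = 32 * 32 * 3  # 3,072 bytes: one byte per pixel per channel

def read_tiny_image(path, index):
    """Read one 32x32 RGB image from the raw binary archive.

    Assumes each image is stored as three 1,024-byte channel planes
    in column-major (Fortran) order, per the dataset's documentation.
    """
    with open(path, "rb") as f:
        f.seek(index * BYTES_PER_IMAGE)
        raw = np.frombuffer(f.read(BYTES_PER_IMAGE), dtype=np.uint8)
    # reshape to (channel, column, row), then transpose to (row, column, channel)
    return raw.reshape(3, 32, 32).transpose(2, 1, 0)
```

This fixed-offset layout is what made the collection cheap to distribute and sample from, but it also means images carry no per-record metadata at all, a point that matters for the transparency concerns below.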
Risks and harms
The 80 Million Tiny Images dataset is seen to have posed significant risks and harms, including offensive and biased labels, privacy violations, the perpetuation of unethical practices, and the enabling of unsafe and harmful AI systems.
Transparency and accountability
The 80 Million Tiny Images dataset is seen to have several notable transparency limitations:
Insufficient documentation. The dataset lacked comprehensive documentation about its contents, collection methods, and potential biases.
Lack of consent. Images were collected without clear consent from the individuals depicted or the copyright holders.
Unclear data sourcing. The exact sources of the images and the criteria for inclusion were not well-defined or disclosed.
Limited metadata. There was insufficient information about the context, origin, or potential biases of individual images.
Difficulty in auditing. Due to its massive size, comprehensive manual review and auditing of the dataset was impractical.
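The auditing problem is easy to quantify from the figures given on this page. A back-of-the-envelope sketch, assuming one byte per colour channel and using a round 79 million as a lower bound on the image count:

```python
# Approximate raw size of the dataset, from the figures on this page.
NUM_IMAGES = 79_000_000        # "over 79 million" images (lower bound)
BYTES_PER_IMAGE = 32 * 32 * 3  # 32x32 pixels, 3 colour channels, 1 byte each

total_bytes = NUM_IMAGES * BYTES_PER_IMAGE
print(f"{total_bytes / 1024**3:.0f} GiB of raw pixel data")  # ~226 GiB
```

At that scale, even a one-second glance per image would cost a single reviewer over two years of uninterrupted work, which is why comprehensive manual review was never realistic.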
Incidents and issues
Research, advocacy
Prabhu V.U., Birhane A. (2020). Large Image Datasets: A Pyrrhic Win for Computer Vision?
Krizhevsky A. (2009). Learning Multiple Layers of Features from Tiny Images (pdf)
Page info
Type: Data
Published: December 2022
Last updated: June 2024