VGG Face facial recognition dataset
VGG Face is a dataset created by University of Oxford researchers comprising 2.6 million facial images of 2,622 people. It was created to provide researchers working on facial recognition systems with access to biometric data.
The dataset mostly comprises celebrities, public figures, actors, and politicians whose names were chosen 'by extracting males and females, ranked by popularity, from the Internet Movie Data Base (IMDB) celebrity list.'
Information about ethnicity, age, and kinship was also collected from IMDB.
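The headline figures above imply roughly how many images the dataset holds per identity. A minimal back-of-the-envelope sketch (the totals come from the dataset description; the per-identity average is a derived estimate, not an official statistic):

```python
# Scale of the VGG Face dataset, using the figures stated above.
total_images = 2_600_000  # 2.6 million facial images
identities = 2_622        # number of people covered

avg_per_identity = total_images / identities
print(f"~{avg_per_identity:.0f} images per identity")  # roughly 1,000 each
```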
Operator: ChaLearn; Chinese Academy of Sciences; Delft University of Technology; Simula Research Laboratory; University of Applied Sciences & Arts Western Switzerland; University of California, Berkeley; Universitat Autònoma de Barcelona
Developer: University of Oxford
Country: UK
Sector: Research/academia
Purpose: Develop facial recognition systems
Technology: Database/dataset; Facial recognition
Issue: Copyright; Ethics/values; Privacy
Transparency: Privacy
Risks and harms 🛑
The VGG Face dataset has raised significant ethical concerns: it collects and distributes the biometric data of over 2,600 individuals without their consent, potentially enabling privacy violations, surveillance, and the development of biased facial recognition technologies.
Transparency and accountability 🙈
The VGG Face dataset has several significant transparency limitations:
Lack of consent. The dataset was created by scraping images of 2,622 individuals from the internet without obtaining their consent or informing them about how their biometric data would be used.
Unclear data collection process. While some details are provided about using IMDB and Google Image Search to collect images, the full extent of the data collection and curation process is not entirely transparent.
Limited demographic information. Although some information on ethnicity, age, and kinship was collected from IMDB, it is unclear how comprehensive or accurate this demographic data is.
Potential biases. The dataset primarily consists of celebrities and public figures, which may not represent a diverse range of faces and could introduce biases in facial recognition technologies developed using this data. Moreover, the dataset lacks comprehensive documentation about potential biases, limitations, or ethical considerations that researchers and developers should be aware of when using the data.
Lack of clear usage guidelines. There appear to be no clear guidelines or restrictions on how the dataset can be used, potentially leading to misuse or unethical applications of the biometric data.
Investigations, assessments, audits 🧐
Harvey, A., LaPlace, J. (2019). Exposing.ai
Page info
Type: Data
Published: January 2023
Last updated: June 2024