Labeled Faces in the Wild (LFW) dataset
Labeled Faces in the Wild (LFW) is an open source dataset for researchers, created to establish a public benchmark for facial verification.
Created by the University of Massachusetts, Amherst, and publicly released in 2007, LFW comprises over 13,000 facial images with different poses and expressions, under different lighting conditions. Each face is labeled with the name of the person, with 1,680 people having two or more distinct photos in the set.
LFW became the most widely used facial recognition benchmark in the world, according to the Financial Times.
Dataset
Documents
Derivatives, applications
Operator:
Developer: University of Massachusetts, Amherst
Country: USA
Sector: Research/academia; Technology
Purpose: Train facial recognition systems
Technology: Dataset; Computer vision; Deep learning; Facial recognition; Facial detection; Facial analysis; Machine learning; Neural network; Pattern recognition
Issue: Bias/discrimination - race, ethnicity, gender; Ethics/values; Privacy
Transparency: Governance; Privacy
Risks and harms
The Labeled Faces in the Wild dataset has been criticised for privacy abuse and bias, and for its potential misuse for surveillance and other purposes.
Transparency and accountability
The Labeled Faces in the Wild (LFW) dataset is seen to suffer from several transparency limitations:
Lack of consent. The images were scraped from the internet without obtaining consent from the individuals pictured, raising privacy concerns.
Unclear data collection methods. The exact methods used for collecting and curating the images in the dataset are not fully transparent.
Inadequate documentation. The dataset lacks comprehensive documentation regarding its limitations and potential biases.
Licensing issues. LFW was released without a specific license, potentially leading to uncontrolled use and derivation of the dataset.
Difficulty in removing or correcting data. The creators have acknowledged errors in the dataset but have chosen not to correct them to maintain consistency with previous research, making it challenging to address known issues.
Lack of control over derived datasets. The creators have limited control over datasets derived from LFW, which may perpetuate or exacerbate existing biases and privacy concerns.
Incidents and issues
Research, advocacy
Raji I.D., Fried G. (2021). About Face: A Survey of Facial Recognition Evaluation
Peng K., Mathur A., Narayanan A. (2021). Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
Shmelkin R., Friedlander T., Wolf L. (2021). Generating Master Faces for Dictionary Attacks with a Network-Assisted Latent Space Evolution
Investigations, assessments, audits
Murgia M., Financial Times (2019). Who's using your face? The ugly truth about facial recognition
Page info
Type: Data
Published: February 2023
Last updated: June 2024