Labeled Faces in the Wild (LFW) dataset
Labeled Faces in the Wild (LFW) is an open-source dataset, aimed at researchers, that was created to provide a public benchmark for facial verification.
Created by the University of Massachusetts, Amherst, and publicly released in 2007, LFW comprises over 13,000 facial images spanning different poses, expressions, and lighting conditions. Each face is labeled with the name of the person pictured, and 1,680 people have two or more distinct photos in the set.
LFW became the most widely used facial recognition benchmark in the world, according to the Financial Times.
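LFW's standard evaluation task is pair verification: given two face images, decide whether they show the same person, and score the decisions against labeled pairs. The sketch below illustrates that scoring step on toy embeddings; the embedding vectors, the cosine-similarity measure, and the threshold value are illustrative assumptions, not part of LFW itself.

```python
import numpy as np

def verify_pairs(emb_a, emb_b, labels, threshold=0.5):
    """Predict 'same person' when cosine similarity exceeds a threshold,
    then score predictions against ground-truth pair labels (1 = same, 0 = different)."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sims = np.sum(a * b, axis=1)            # cosine similarity per pair
    preds = (sims > threshold).astype(int)
    return np.mean(preds == labels)         # verification accuracy

# Toy embeddings: matched pairs are near-duplicates, mismatched pairs are unrelated.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 128))
emb_a = base
emb_b = np.vstack([base[:2] + 0.01 * rng.normal(size=(2, 128)),  # same people
                   rng.normal(size=(2, 128))])                   # different people
labels = np.array([1, 1, 0, 0])
print(verify_pairs(emb_a, emb_b, labels))
```

In the real benchmark, the embeddings come from the face-recognition model under test and the labeled pairs come from LFW's published pair lists; reported accuracy is typically averaged over ten cross-validation folds.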
Dataset
Documents
Derivatives, applications
Dataset info
Operator:
Developer: University of Massachusetts, Amherst
Country: USA
Sector: Research/academia; Technology
Purpose: Train facial recognition systems
Technology: Dataset; Computer vision; Deep learning; Facial recognition; Facial detection; Facial analysis; Machine learning; Neural network; Pattern recognition
Issue: Bias/discrimination - race, ethnicity, gender; Ethics/values; Privacy
Transparency: Governance; Privacy
Risks and harms
The Labeled Faces in the Wild dataset has been criticised for privacy abuse and bias, and for its potential misuse for surveillance and other purposes.
Transparency and accountability
The Labeled Faces in the Wild (LFW) dataset suffers from several transparency limitations.
Lack of consent. The images were scraped from the internet without obtaining consent from the individuals pictured, raising privacy concerns.
Unclear data collection methods. The exact methods used for collecting and curating the images in the dataset are not fully transparent.
Inadequate documentation. The dataset lacks comprehensive documentation regarding its limitations and potential biases.
Licensing issues. LFW was released without a specific license, potentially leading to uncontrolled use and derivation of the dataset.
Difficulty in removing or correcting data. The creators have acknowledged errors in the dataset but have chosen not to correct them to maintain consistency with previous research, making it challenging to address known issues.
Lack of control over derived datasets. The creators have limited control over datasets derived from LFW, which may perpetuate or exacerbate existing biases and privacy concerns.