Labeled Faces in the Wild (LFW) dataset
Released: 2007
Labeled Faces in the Wild (LFW) is an open-source dataset for researchers, intended to establish a public benchmark for facial verification.
According to Papers with Code, 'Facial verification is the task of comparing a candidate face to another, and verifying whether it is a match. It is a one-to-one mapping: you have to check if this person is the correct one.'
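As a rough illustration of the one-to-one comparison described above, the sketch below treats each face as a numeric embedding vector and declares a match when the cosine similarity between two vectors reaches a threshold. The embeddings, the function names and the threshold of 0.8 are all assumptions made for this example; they are not taken from LFW or from any particular face recognition system.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def verify(candidate, reference, threshold=0.8):
    """Return True if the candidate is judged to match the reference.

    The threshold is a hypothetical value; real systems tune it on
    held-out pairs.
    """
    return cosine_similarity(candidate, reference) >= threshold


# Toy embeddings, invented for illustration only.
reference = [0.9, 0.1, 0.4]
same_person = [0.8, 0.2, 0.5]
other_person = [-0.3, 0.9, 0.1]

print(verify(same_person, reference))
print(verify(other_person, reference))
```

In practice the embeddings would come from a trained face model, and a benchmark would measure how often such a match/no-match decision is correct over many pairs.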
Created by the University of Massachusetts, Amherst, and publicly released in 2007, LFW comprises over 13,000 facial images with different poses and expressions, under different lighting conditions. Each face is labeled with the name of the person, with 1,680 people having two or more distinct photos in the set.
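Because a "same person" pair needs at least two photos of one individual, only the people with two or more images can contribute such pairs. A minimal sketch of how that subset could be counted from a list of per-image name labels follows; the label list and the function name are hypothetical and do not reflect the dataset's actual file layout.

```python
from collections import Counter


def people_with_multiple_images(labels, minimum=2):
    """Return the sorted names that appear at least `minimum` times."""
    counts = Counter(labels)
    return sorted(name for name, n in counts.items() if n >= minimum)


# Hypothetical labels, one entry per image.
labels = ["Person A", "Person B", "Person A", "Person C", "Person B", "Person A"]

multi = people_with_multiple_images(labels)
print(multi)
print(len(multi))
```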
Reaction
LFW has been found to be heavily skewed towards a small subset of people, specifically white men. It also contains 'a significant number of duplicate or nearly-duplicate images and mislabeled images.'
The researchers later admitted the dataset's limitations on their website. 'Many groups are not well represented in LFW,' it states. 'For example, there are very few children, no babies, very few people over the age of 80, and a relatively small proportion of women. In addition, many ethnicities have very minor representation or none at all.'
Despite these shortcomings, LFW has become the most widely used facial recognition benchmark globally, according to the Financial Times. Tel Aviv University researcher Tomer Friedlander told The Register it is 'a widely used dataset in the academic literature for evaluating face recognition methods.'
LFW has also gained some notoriety amongst civil rights and privacy groups for being the first dataset for which 'wild' images were scraped from the internet. According to MIT Technology Review, it 'opened the floodgates to data collection through web search. Researchers began downloading images directly from Google, Flickr, and Yahoo without concern for consent.'
Operator:
Developer: University of Massachusetts, Amherst
Country: USA
Sector: Research/academia; Technology
Purpose: Train facial recognition systems
Technology: Dataset; Computer vision; Deep learning; Facial recognition; Facial detection; Facial analysis; Machine learning; Neural network; Pattern recognition
Issue: Bias/discrimination - race, ethnicity, gender; Ethics; Privacy
Transparency: Governance; Privacy
Dataset
Derivatives, applications
Research, advocacy
Raji I.D., Fried G. (2021). About Face: A Survey of Facial Recognition Evaluation
Peng K., Mathur A., Narayanan A. (2021). Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
Shmelkin R., Friedlander T., Wolf L. (2021). Generating Master Faces for Dictionary Attacks with a Network-Assisted Latent Space Evolution
Investigations, assessments, audits
Murgia M., Financial Times (2019). Who’s using your face? The ugly truth about facial recognition
News, commentary, analysis
https://www.technologyreview.com/2021/08/13/1031836/ai-ethics-responsible-data-stewardship/
https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e
https://medium.com/voxel51/fifteen-minutes-with-fiftyone-labeled-faces-in-the-wild-6b4e2530787
https://jolt.law.harvard.edu/digest/why-racial-bias-is-prevalent-in-facial-recognition-technology
https://mashable.com/article/facial-recognition-databases-privacy-study
https://www.nytimes.com/2019/07/10/opinion/facial-recognition-race.html
Page info
Type: Data
Published: February 2023