Labeled Faces in the Wild

Page published: February 2023 | Last updated: October 2024

Labeled Faces in the Wild (LFW) is an open source dataset for researchers, intended to establish a public benchmark for facial verification.

Created by the University of Massachusetts, Amherst, and publicly released in 2007, LFW comprises over 13,000 facial images showing varied poses and expressions under different lighting conditions. Each face is labeled with the name of the person pictured, and 1,680 people have two or more distinct photos in the set.

According to the Financial Times, LFW was the most widely used facial recognition benchmark in the world.

Facial recognition system

A facial recognition system is a technology potentially capable of matching a human face from a digital image or a video frame against a database of faces.

Source: Wikipedia 🔗

Dataset 🤖

Website 🔗
Data 🔗
Released: 2007
Developer: University of Massachusetts, Amherst
Purpose: Train facial recognition systems
Type: Database/dataset
Technique: Computer vision; Deep learning; Facial recognition; Facial detection; Facial analysis; Pattern recognition

Documents 📃

Huang G. et al. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments (pdf)

Derivatives, applications 🈸

Simulated Masked Face Recognition Dataset (SMFRD)

Transparency, accountability 🙈

The Labeled Faces in the Wild (LFW) dataset is considered to have several transparency and accountability limitations:

Data collection. The images were scraped from the internet without obtaining consent from the individuals pictured, raising privacy concerns.
Inadequate documentation. The dataset lacks comprehensive documentation regarding its limitations and potential biases.
Licensing issues. LFW was released without a specific license, potentially leading to uncontrolled use and derivation of the dataset.
Complaints and appeals. The creators have acknowledged errors in the dataset but have chosen not to correct them, in order to keep results consistent with previous research. This makes known issues difficult to address.
Derived datasets. The creators have limited control over datasets derived from LFW, which may perpetuate or exacerbate existing biases and privacy concerns.