Large-scale CelebFaces Attributes (CelebA) dataset

The Large-scale CelebFaces Attributes (CelebA) Dataset is a facial dataset developed by a team of researchers at the Chinese University of Hong Kong to help train and test computer vision applications such as facial analysis, facial recognition, and facial detection.

Released in late 2015, the dataset consists of 202,599 images of more than 10,000 celebrities, most of them Western, with each image annotated with 40 attributes such as moustache, beard, spectacles, and the shape of the face and nose.

CelebA became a widely used dataset and is credited with helping to make facial recognition and analysis tools more accurate. It has been cited in hundreds of academic studies and tests.

Operator: NVIDIA

Developer: The Chinese University of Hong Kong

Country: Hong Kong

Sector: Research/academia

Purpose: Train and develop AI models

Technology: Dataset; Computer vision; Deep learning; Facial recognition; Facial detection; Facial analysis; Machine learning; Neural network; Pattern recognition

Issue: Accuracy/reliability; Bias/discrimination; Dual/multi-use; Privacy; Surveillance

Transparency: Governance; Marketing; Privacy

Risks and harms 🛑

CelebA has been criticised for violating privacy and amplifying bias, and for its potential misuse in developing unauthorised facial recognition or deepfake applications, which could lead to discrimination, identity fraud, and the perpetuation of harmful stereotypes.

Transparency and accountability 🙈

The Large-scale CelebFaces Attributes (CelebA) dataset is considered to suffer from several transparency limitations, notably concerning its governance, marketing, and privacy.

Incidents and issues 🔥

CelebA has been found to be flawed in several important respects.

Research, advocacy 🧮

Page info
Type: Data
Published: January 2023
Last updated: June 2024