Large-scale CelebFaces Attributes (CelebA) dataset
The Large-scale CelebFaces Attributes (CelebA) Dataset is a facial dataset developed by a team of researchers at the Chinese University of Hong Kong to help train and test computer vision applications such as facial analysis, facial recognition, and facial detection.
Released in late 2015, the dataset consists of 202,599 images of over 10,000 mostly western celebrities, each annotated with 40 binary attributes such as the presence of a moustache, beard, or spectacles, and the shape of the face and nose.
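The 40 attributes are distributed alongside the images as a plain-text annotation file, conventionally named `list_attr_celeba.txt`, in which each image row encodes every attribute as +1 (present) or -1 (absent). A minimal parsing sketch, assuming that standard layout (the `parse_celeba_attrs` helper and the three-attribute sample below are illustrative, not part of the official distribution):

```python
# Hedged sketch: parse a CelebA-style attribute annotation file.
# Assumed layout: line 1 = image count, line 2 = attribute names,
# remaining lines = "<filename> <+1/-1> x 40".

def parse_celeba_attrs(text):
    """Return (attr_names, {image_filename: {attr_name: bool}})."""
    lines = text.strip().splitlines()
    n_images = int(lines[0])           # first line: number of images
    attr_names = lines[1].split()      # second line: attribute names
    records = {}
    for line in lines[2 : 2 + n_images]:
        parts = line.split()
        fname, values = parts[0], parts[1:]
        # "+1" marks the attribute as present, "-1" as absent
        records[fname] = {name: v == "1" for name, v in zip(attr_names, values)}
    return attr_names, records

# Tiny illustrative sample (three attributes only; not real CelebA labels):
sample = """2
Eyeglasses Male Smiling
000001.jpg -1  1  1
000002.jpg  1 -1 -1
"""
names, recs = parse_celeba_attrs(sample)
print(recs["000001.jpg"]["Smiling"])   # True
```

In practice most users load the dataset through a library wrapper rather than parsing the file by hand, but the ±1 encoding above is what such wrappers read underneath.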
CelebA became a widely used benchmark and is credited with helping make facial recognition and analysis tools more accurate. It has been cited in hundreds of academic studies and evaluations.
Dataset 🤖
Documents 📃
Derivatives, applications 🈸
Operator: NVIDIA
Developer: The Chinese University of Hong Kong
Country: Hong Kong
Sector: Research/academia
Purpose: Train and develop AI models
Technology: Dataset; Computer vision; Deep learning; Facial recognition; Facial detection; Facial analysis; Machine learning; Neural network; Pattern recognition
Issue: Accuracy/reliability; Bias/discrimination; Dual/multi-use; Privacy; Surveillance
Transparency: Governance; Marketing; Privacy
Risks and harms 🛑
CelebA has been criticised for violating privacy and amplifying bias, and for its potential misuse in developing unauthorised facial recognition or deepfake applications, which could lead to discrimination, identity fraud, and the perpetuation of harmful stereotypes.
Transparency and accountability 🙈
The Large-scale CelebFaces Attributes (CelebA) dataset is seen to suffer from several transparency limitations.
Unclear data collection methods. The researchers behind CelebA have not disclosed how the dataset was compiled, whether image licences were complied with, or whether the people appearing in the images gave consent.
Labelling accuracy. The process used to assign attribute labels, such as hair colour and facial expression, and the accuracy of those labels are poorly documented, making it difficult for users to mitigate potential biases or errors.
Potential for misuse. Other than banning commercial use, the creators of CelebA provide no guidance on how the dataset should and should not be used, meaning it could be used to train inappropriate or unethical facial recognition systems or deepfake algorithms without the consent of the individuals in the images, raising privacy, copyright, disinformation, and other concerns.
Incidents and issues 🔥
CelebA has been found to be flawed in important ways.
Accuracy: A University of Nevada research study estimates that at least one third of CelebA images are incorrectly labelled one or more times, making reliable predictions impossible and leading the researchers to conclude that the dataset is 'flawed as a facial analysis tool and may not be suitable as a generic evaluation benchmark for imbalanced classification'. Furthermore, attributes such as attractiveness are highly subjective and shaped by cultural and other preconceptions.
Bias/discrimination: CelebA reinforces stereotypes, for instance by labelling Asians with 'narrow eyes' and Blacks with 'thick lips'. The dataset is also composed of nearly 90% white faces, resulting in uneven performance across gender, age, ethnicity, and other sensitive attributes.
Research, advocacy 🧮
Lingenfelter, B., Davis, S.R., Hand, E.M. (2022). A Quantitative Analysis of Labeling Issues in the CelebA Dataset
Böhlen, M., Chandola, V., Salunkhe, A. (2017). Server, server in the cloud. Who is the fairest in the crowd?
News, commentary, analysis 🗞️
Page info
Type: Data
Published: January 2023
Last updated: June 2024