AIAAIC - DiveFace dataset

DiveFace

Page published: April 2024 | Last updated: October 2024


DiveFace is a facial recognition dataset comprising photographs of 24,000 people, with an average of 5.5 images per person, for a total of 139,677 images.

Published in 2019, DiveFace was created by combining the MegaFace dataset with additional annotations to provide a basis for training unbiased and 'discrimination-aware' facial recognition algorithms.

According to the authors, 'DiveFace contains annotations equally distributed among six classes related to gender and ethnicity (male, female and three ethnic groups).'

The dataset broadly categorises people as: East Asian, Sub-Saharan and South Indian, and Caucasian.
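
The six-class structure described above (two genders crossed with three ethnic groups) can be illustrated with a minimal sketch of a balance check. The label names, the tuple layout and the helper functions below are hypothetical and chosen only for illustration; the actual DiveFace annotation format may differ.

```python
from itertools import product

# Hypothetical label names; placeholders, not the dataset's real annotation values.
GENDERS = ("male", "female")
GROUPS = ("group_1", "group_2", "group_3")

# Two genders crossed with three groups give six classes.
CLASSES = list(product(GENDERS, GROUPS))


def class_counts(identities):
    """Count identities per (gender, group) class.

    `identities` is an iterable of (identity_id, gender, group) tuples,
    an assumed layout used only for this example.
    """
    counts = {cls: 0 for cls in CLASSES}
    for _identity_id, gender, group in identities:
        counts[(gender, group)] += 1
    return counts


def is_balanced(counts):
    """Return True if every class holds the same number of identities."""
    return len({counts[cls] for cls in CLASSES}) == 1


# Toy data: four synthetic identities per class, 24 in total.
toy = [(i, gender, group) for i, (gender, group) in enumerate(CLASSES * 4)]
print(is_balanced(class_counts(toy)))
```

A check of this kind only confirms that identities are spread evenly across the labelled classes; it says nothing about how the labels were assigned or whether the categories themselves are appropriate.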

Facial recognition system

A facial recognition system is a technology potentially capable of matching a human face from a digital image or a video frame against a database of faces.

Source: Wikipedia 🔗

Dataset 🤖

Data 🔗
Released: 2019
Developer: Aythami Morales, Julian Fierrez, Ruben Vera-Rodriguez, Ruben Tolosana
Purpose: Train facial recognition systems
Type: Database/dataset
Technique: Computer vision; Facial recognition

Documents 📃

Morales A., Fierrez J., Vera-Rodriguez R., Tolosana R. SensitiveNets: Learning Agnostic Representations with Application to Face Images (pdf)

Transparency, accountability 🙈

The DiveFace dataset suffers from multiple transparency limitations:

Demographic categorisation. The method for categorising individuals into demographic groups (e.g. by race or ethnicity) is not explained.
Image sourcing. The exact sources of the facial images and the criteria for selection are not transparent.
Privacy consent. It is unclear whether the individuals whose images are included gave informed consent for their use in this dataset.
Privacy protections. Measures taken to protect the privacy of individuals in the dataset are not fully explained.
Image quality variation. Information about the range of image qualities and how this might affect algorithm performance is incomplete.

Risks, harms 🛑

With over 5,000 ethnic groups worldwide, the decision to group all people into just three broad categories means the DiveFace dataset is regarded as highly simplistic and likely to suffer from its own biases, with certain ethnic groups or gender identities overrepresented or underrepresented.

Incidents, issues 🔥

January 2021. DiveFace dataset criticised for violating privacy, promoting harmful stereotyping, and abusing copyright

Investigations, assessments, audits 👁️

Harvey, A., LaPlace, J. (2019). Exposing.ai
