DiveFace dataset
DiveFace is a facial recognition dataset comprising photographs of 24,000 people, with an average of 5.5 images per person, for a total of 139,677 images.
Published in 2019, DiveFace was created by adding annotations to images from the MegaFace dataset, with the aim of providing a basis for training unbiased, 'discrimination-aware' facial recognition algorithms.
According to the authors, 'DiveFace contains annotations equally distributed among six classes related to gender and ethnicity (male, female and three ethnic groups).' The dataset broadly categorises people into three ethnic groups: East Asian; Sub-Saharan and South Indian; and Caucasian.
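The six-class structure can be made concrete with a short calculation. The sketch below is a minimal illustration only, assuming that the six classes are every pairing of the two gender labels with the three ethnic groups, and that the 24,000 people are split exactly evenly across them, as the authors' 'equally distributed' wording suggests. The helper `class_sizes` is hypothetical and not part of any DiveFace tooling.

```python
from itertools import product

# Labels as listed in the DiveFace description. Pairing every gender with every
# ethnic group is an assumption about how the six classes are formed.
GENDERS = ["male", "female"]
ETHNIC_GROUPS = ["East Asian", "Sub-Saharan and South Indian", "Caucasian"]

TOTAL_IDENTITIES = 24_000  # number of people quoted for DiveFace


def class_sizes(total, genders, groups):
    """Hypothetical helper: split `total` identities evenly across every (gender, group) pair."""
    classes = list(product(genders, groups))
    per_class, remainder = divmod(total, len(classes))
    if remainder:
        raise ValueError("total does not divide evenly across the classes")
    return {cls: per_class for cls in classes}


sizes = class_sizes(TOTAL_IDENTITIES, GENDERS, ETHNIC_GROUPS)
print(len(sizes))                      # 6
print(sizes[("female", "Caucasian")])  # 4000
```

Under these assumptions, each of the six classes would hold 4,000 people, which also shows how coarse the grouping is: every person in the dataset is assigned to one of only three ethnic labels.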
Dataset
Dataset info
Operator:
Developer: Aythami Morales, Julian Fierrez, Ruben Vera-Rodriguez, Ruben Tolosana
Country: Global
Sector: Research/academia; Technology
Purpose: Train facial recognition systems
Technology: Database/dataset; Facial recognition; Computer vision
Issue: Bias/discrimination - race, ethnicity; Copyright; Privacy
Transparency:
Risks and harms
With over 5,000 ethnic groups worldwide, the decision to sort all people into three broad ethnic categories means DiveFace is itself regarded as highly simplistic and likely to suffer from its own biases, with certain ethnic groups or gender identities overrepresented or underrepresented.
Transparency and accountability
The DiveFace dataset suffers from multiple transparency limitations:
Demographic categorisation. The method for categorising individuals into demographic groups (e.g. by race or ethnicity) is not explained.
Image sourcing. The exact sources of the facial images and the criteria for selection are not transparent.
Privacy consent. It is unclear whether the individuals whose images are included gave informed consent for their use in this dataset.
Privacy protections. Measures taken to protect the privacy of individuals in the dataset are not fully explained.
Image quality variation. Information about the range of image qualities and how this might affect algorithm performance is incomplete.
Incidents and issues
Research, advocacy
Morales A., Fierrez J., Vera-Rodriguez R., Tolosana R. SensitiveNets: Learning Agnostic Representations with Application to Face Images