DiveFace dataset criticised for violating privacy, promoting harmful stereotyping and infringing copyright

Occurred: January 2021

The developers of the DiveFace dataset were accused of legal violations and breaches of ethical norms in their data collection and distribution practices.

According to Exposing.ai, the dataset contains biometric data (facial images) of 24,000 individuals collected from Flickr without their consent, violating the privacy of people whose personal data was repurposed to develop facial recognition technology.

DiveFace was also found to categorise individuals into broad, reductive ethnic groups (East Asian, Sub-Saharan and South Indian, and Caucasian), oversimplifying human diversity and promoting harmful stereotyping.

Furthermore, a significant portion of the images in DiveFace were found to be licensed under Creative Commons BY-NC-ND, which prohibits commercial use and derivative works. Yet DiveFace provides open access to its data without restrictions, meaning any use of these images by commercial entities would violate the licence terms.

System 🤖

Developer: Aythami Morales, Julian Fierrez, Ruben Vera-Rodriguez, Ruben Tolosana
Country: Global
Sector: Research/academia; Technology
Purpose: Train facial recognition systems
Technology: Database/dataset; Facial recognition; Computer vision
Issue: Bias/discrimination - race, ethnicity; Copyright; Privacy

Research, advocacy 🧮

Investigations, assessments, audits 🧐

Page info
Type: Incident
Published: June 2024