IBM Diversity in Faces dataset

Released: 2019

IBM's Diversity in Faces is a dataset of annotations of one million publicly available facial images, released in January 2019. It was intended to make artificial intelligence fairer and more equitable across genders and skin colours, and to accelerate efforts to create fairer and more accurate face recognition systems.

IBM's dataset was based on Yahoo!'s YFCC100M dataset, which provides approximately 100 million photos from the photo-sharing website Flickr, available under various Creative Commons licenses. IBM said Diversity in Faces was meant to be a resource for academics and researchers, was not publicly available for download or sale, and could not be used for commercial purposes.

Privacy

A March 2019 NBC News investigation discovered that IBM had been using its Diversity in Faces dataset to train its own AI products, including Watson Visual Recognition, without the consent of the people in the photos. Not only was IBM ignoring its own terms of use for the dataset, it also failed to provide attribution links or public credit for any images.

In January 2020, IBM was sued in a class action seeking damages of USD 5,000 for each intentional violation of the Illinois Biometric Information Privacy Act, or USD 1,000 for each negligent violation, on behalf of all Illinois citizens whose biometric data was used in the dataset.

In June 2021, Amazon and Microsoft teamed up to defend themselves against lawsuits accusing them of using Diversity in Faces to train their own facial recognition products and of failing to obtain the permission of the people whose photographs were used in the dataset.

Transparency

Per the BBC, IBM said people whose photos had been included in the dataset could technically opt out through the company's generic research privacy policy, but nobody was informed that their data had been used.

In addition, image owners found it difficult to have their images removed from Diversity in Faces, and impossible to delete them from copies that had already been provided to researchers.

In June 2020, IBM announced it would no longer develop or sell facial recognition technologies to law enforcement authorities.

Operator: Alphabet/Google; Amazon; IBM; Microsoft
Developer: IBM
Country: USA
Sector: Technology; Research/academia
Purpose: Train & develop AI models
Technology: Dataset; Facial recognition; Computer vision
Issue: Privacy; Copyright; Ethics
Transparency: Governance; Privacy

Page info
Published: December 2022