IBM Diversity in Faces (DiF) dataset


IBM's Diversity in Faces (DiF) is a dataset of annotations of one million publicly available facial images. It was intended to make artificial intelligence fairer and more equitable across genders and skin colours, and to accelerate efforts to create fairer and more accurate face recognition systems.

Released in January 2019, IBM's dataset was based on Yahoo!'s YFCC100M dataset, which provides approximately 100 million photos from the photo-sharing website Flickr, available under various Creative Commons licenses.

IBM said DiF was meant as an academic and research resource, was not publicly available for download or sale, and could not be used for commercial purposes.

Dataset

Documents

Operator: Alphabet/Google; Amazon; IBM; Microsoft
Developer: IBM
Country: USA
Sector: Technology; Research/academia
Purpose: Train & develop AI models
Technology: Database/dataset; Facial recognition; Computer vision
Issue: Copyright; Ethics/values; Privacy
Transparency: Governance; Privacy

Risks and harms

IBM's Diversity in Faces dataset raised significant ethical concerns about privacy, consent, and its potential misuse for surveillance, discrimination, and other purposes. It was also criticised for inadequate transparency.

Transparency and accountability

IBM was seen as opaque about its Diversity in Faces dataset in a number of ways.

Legal, regulatory

Investigations, assessments, audits

Research, advocacy

Page info
Type: Data
Published: December 2022
Last published: June 2024