IBM Diversity in Faces (DiF) dataset


IBM's Diversity in Faces (DiF) is a dataset of annotations of one million publicly available facial images. It was intended to make artificial intelligence fairer across genders and skin colours, and to accelerate efforts towards fairer and more accurate face recognition systems.

Released in January 2019, IBM's dataset was based on Yahoo!'s YFCC100M dataset, which provides approximately 100 million photos from photo sharing website Flickr available under various Creative Commons licenses.

IBM said DiF was meant to be an academic/research resource, was not publicly available for download or sale, and could not be used for commercial purposes.

Dataset

Documents

Dataset info

Operator: Alphabet/Google; Amazon; IBM; Microsoft
Developer: IBM
Country: USA
Sector: Technology; Research/academia
Purpose: Train & develop AI models
Technology: Database/dataset; Facial recognition; Computer vision
Issue: Copyright; Ethics/values; Privacy
Transparency: Governance; Privacy

Risks and harms

IBM's Diversity in Faces dataset raised significant ethical concerns regarding privacy, consent, and its potential misuse for surveillance, discrimination and other purposes. It was also criticised for inadequate transparency.

Transparency and accountability

IBM was seen to have been opaque in a number of ways about its Diversity in Faces dataset.

Legal, regulatory