IBM Diversity in Faces (DiF) dataset
IBM's Diversity in Faces (DiF) is a dataset of annotations of one million publicly available facial images, released in January 2019. It was intended to make artificial intelligence more fair and equitable across genders and skin colours, and to accelerate efforts to create fairer and more accurate face recognition systems.
IBM's dataset was based on Yahoo!'s YFCC100M dataset, which provides approximately 100 million photos from the photo-sharing website Flickr, available under various Creative Commons licences. IBM said DiF was meant to be an academic and research resource, was not publicly available for download or sale, and could not be used for commercial purposes.
In January 2020, IBM was sued in a class action seeking damages of USD 5,000 for each intentional violation of the Illinois Biometric Information Privacy Act, or USD 1,000 for each negligent violation, on behalf of all Illinois citizens whose biometric data was used in the DiF dataset.
In June 2021, Amazon and Microsoft teamed up to defend themselves against lawsuits accusing them of using DiF to train their own facial recognition products and of failing to obtain the permission of the people whose photographs were used in the dataset.
In addition, image owners found it difficult to have their images removed from Diversity in Faces, and impossible to delete them from copies that had already been provided to researchers.
In June 2020, IBM announced it would no longer develop or sell facial recognition technologies to law enforcement authorities.
Operator: Alphabet/Google; Amazon; IBM; Microsoft
Sector: Technology; Research/academia
Purpose: Train & develop AI models
Technology: Dataset; Facial recognition; Computer vision
Issue: Privacy; Copyright; Ethics
Transparency: Governance; Privacy
Investigations, assessments, audits
Harvey, A., LaPlace, J. (2019). Exposing.ai
Raji, I.D., Gebru, T., Mitchell, M., Buolamwini, J., Lee, J., Denton, E. (2020). Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing
Crawford, K., Paglen, T. (2021). Excavating AI: The Politics of Images in Machine Learning Training Sets
News, commentary, analysis
Published: December 2022