MS-Celeb-1M (or Microsoft Celeb) is a dataset developed by Microsoft Research to accelerate research into facial recognition technologies.
Created and published in 2016, MS-Celeb-1M consisted of approximately 10 million facial images of 100,000 celebrities, journalists, artists, musicians, activists, policy makers, writers, and academics. Microsoft also provided a 'target list' of an additional 900,000 names whose images were to be collected.
According to Microsoft, the dataset was created for 'non-commercial research purpose only' and would be applicable to image captioning and news video analysis.
MS-Celeb-1M was reckoned to be the largest public dataset of its kind. Microsoft terminated the project in mid-2019, shortly after the publication of researcher Adam Harvey's Exposing.ai project and a Financial Times investigation into facial recognition data sharing.
Facial recognition system
A facial recognition system is a technology potentially capable of matching a human face from a digital image or a video frame against a database of faces.
Source: Wikipedia
Website
MS Celeb Challenge
Dataset
Dataset & benchmark
Released: 2016
Availability: Available
Purpose: Train facial recognition systems
Type: Database/dataset
Technique: Computer vision; Facial recognition; Machine learning
Guo Y., Zhang L., Hu Y., He X., Gao J. (2016). MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition
Guo Y., Zhang L. (2017). One-shot Face Recognition by Promoting Underrepresented Classes
Transparency limitations associated with the Microsoft Celeb (MS-Celeb-1M) dataset include:
Unclear consent processes for individuals included in the dataset
Lack of transparency about data collection methods and sources
Limited information on how "celebrity" status was determined
Insufficient disclosure of potential biases in the dataset
Inadequate documentation of data quality and accuracy
Lack of clear policies on data usage, sharing, and restrictions
Limited transparency on the dataset's impact on privacy and civil liberties
Insufficient information on data retention and deletion processes
Unclear mechanisms for individuals to request removal from the dataset
Limited public reporting on how the dataset has been used by researchers and companies
Microsoft Celeb is seen as having posed significant privacy and ethical concerns because of its large-scale collection of celebrity images without explicit consent, which could enable unauthorised surveillance, identity theft, and the misuse of personal data.
Peng K., Mathur A., Narayanan A. (2021). Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
Harvey, A., LaPlace, J. (2019). Exposing.ai
Murgia M., Financial Times (2019). Who's using your face? The ugly truth about facial recognition
Page info
Type: Data
Published: April 2022
Last updated: October 2024