Microsoft Celeb (MS-Celeb-1M)

Report incident ๐Ÿ”ฅ | Improve page ๐Ÿ’ | Access database ๐Ÿ”ข

MS-Celeb-1M (or Microsoft Celeb) is a dataset developed by Microsoft Research to accelerate research into facial recognition technologies.ย 

Created and published in 2016, MS-Celeb-10 consisted of approximately 10 million facial images of 100,000 celebrities, journalists, artists, musicians, activists, policy makers, writers, and academics. Micosoft also provided a 'target list' of an additional 900,000 names whose images were to be collected.

According to Microsoft, the dataset was created for 'non-commercial research purpose only' and would be applicable to image captioning and news video analysis.ย 

Reckoned to be the largest public dataset of its kind, Microsoft terminated the project mid-2019 shortly after the publication of researcher Adam Harvey's Exposing.ai project and a Financial Times investigation into facial recognition data sharing.

Facial recognition system

A facial recognition system is a technology potentially capable of matching a human face from a digital image or a video frame against a database of faces.

Source: Wikipedia ๐Ÿ”—

Dataset ๐Ÿค–

Documents ๐Ÿ“ƒ

Transparency and accountability ๐Ÿ™ˆ

Transparency limitations associated with the Microsoft Celeb (MS-Celeb-1M) dataset include:ย 

Risks and harms ๐Ÿ›‘

Microsoft Celeb is seen to have posed significant privacy and ethical concerns due to its large-scale collection of celebrity images without explicit consent, potentially enabling unauthorised surveillance, identity theft, and the misuse of personal data.ย 

Research, advocacy ๐Ÿงฎ

Investigations, assessments, audits ๐Ÿ‘๏ธ

Page info
Type: Data
Published: April 2022
Last updated: October 2024