MS-Celeb-1M facial recognition database criticised for violating user privacy
Occurred: July 2019
Microsoft deleted its MS-Celeb-1M dataset after it emerged that the images it contained had been scraped from the web and used to train facial recognition systems.
Microsoft collected photographs for MS-Celeb-1M by automatically scraping them from search engines, and did so without informing or gaining the consent of those affected, according to a Financial Times investigation.
'Celebrities' whose data was collected included US blogger Cory Doctorow, journalist Glenn Greenwald, author and academic Shoshana Zuboff, and former US FTC commissioner Julie Brill, sparking accusations that the technology company played fast and loose with the definition of public interest.
Although MS-Celeb-1M was restricted to academic use, research paper citations reveal that it has been used hundreds of times across the world by companies such as IBM, Panasonic, Hitachi, and Nvidia for a wide variety of commercial purposes.
It also transpired that Microsoft had used MS-Celeb-1M to train its own facial recognition systems, as had Chinese technology firms Huawei, SenseTime, and Megvii, whose products are allegedly used to detect and surveil Uyghurs, and to track foreign journalists.
Microsoft quietly took down the dataset in June 2019, telling the FT that 'the site was intended for academic purposes. It was run by an employee that is no longer with Microsoft and has since been removed.'
However, the dataset remains widely available online, with several versions on GitHub and Academic Torrents.
System 🤖
Operator: Alibaba; École Polytechnique Fédérale de Lausanne; Hitachi; Huawei; IBM; IDIAP Research Institute; Megvii; Microsoft; National University of Defense Technology (NUDT); Nvidia; Panasonic; SenseTime; Universidad Autónoma de Madrid; University of Leicester; Multiple
Developer: Microsoft
Country: USA
Sector: Technology; Research/academia
Purpose: Train facial recognition systems
Technology: Database/dataset; Facial recognition; Computer vision
Issue: Copyright; Dual/multi-use; Ethics/values; Privacy; Surveillance; Transparency
Research, advocacy 🧮
Peng, K., Mathur, A., Narayanan, A. (2021). Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
Harvey, A., LaPlace, J. (2019). Exposing.ai
Investigations, assessments, audits 🧐
Murgia, M., Financial Times (2019). Who’s using your face? The ugly truth about facial recognition
News, commentary, analysis 🗞️
https://www.ft.com/content/7d3e0d6a-87a0-11e9-a028-86cea8523dc2
https://www.nytimes.com/2019/07/13/technology/databases-faces-facial-recognition-technology.html
https://www.biometricupdate.com/201906/ms-celeb-and-other-facial-biometrics-datasets-taken-down
https://futurism.com/microsoft-deletes-facial-recognition-database
https://www.fastcompany.com/90360490/ms-celeb-microsoft-deletes-10m-faces-from-face-database
Page info
Type: Incident
Published: July 2024