MegaFace facial recognition dataset

MegaFace is a facial recognition training dataset consisting of 4,753,320 faces of 672,057 identities from 3,311,471 photos downloaded from 48,383 Flickr users' photo albums. 

Created in 2015 by researchers at the University of Washington, MegaFace was expanded in 2016 through the MegaFace Challenge, which invited facial recognition teams to download the dataset and test how well their algorithms could distinguish between a million possible matches.

Like IBM's Diversity in Faces dataset, MegaFace was based on Yahoo!'s YFCC100M dataset, which provides approximately 100 million photos from the photo-sharing website Flickr under various Creative Commons licenses.

Partly due to its size, MegaFace became one of the most important benchmarks for commercial face recognition vendors. The only public dataset with a comparable number of images was Microsoft's MS-Celeb-1M, which was withdrawn after a Financial Times investigation.

Operator: Alibaba; Alphabet/Google; Amazon; Bytedance; EUROPOL; Huawei; In-Q-Tel; IntelliVision; Megvii; Mitsubishi Electric; Northrop Grumman; Ntechlab; Philips; Samsung; SenseTime; Sogou; Tencent; Vision Semantics
Developer: University of Washington
Country: USA
Sector: Technology; Research/academia
Purpose: Improve research quality
Technology: Dataset; Facial recognition; Computer vision
Issue: Privacy; Copyright; Liability
Transparency: Privacy; Marketing

Risks and harms 🛑


In October 2019, the New York Times reported that the likenesses of as many as 700,000 people, many of them children, had been taken from Flickr and included in MegaFace.

The images had been used to train AI systems to identify protesters in the USA and monitor Uighurs in China, among other applications.

Transparency 🙈

A University of Washington spokesperson told the NYT that the researchers who created the MegaFace database 'have moved on to other projects and don't have the time to comment on this.'

The MegaFace Challenge and dataset were discontinued in June 2020, but not before the dataset had been downloaded by thousands of organisations and individuals around the world and used to create multiple derivative datasets, such as MegaAge, DiveFace, and TinyFace, many of which still exist.

Derivatives, applications 🈸

Investigations, assessments, audits 🧐