MegaFace facial recognition dataset

MegaFace is a facial recognition training dataset consisting of 4,753,320 faces of 672,057 identities from 3,311,471 photos downloaded from 48,383 Flickr users' photo albums. 

Created in 2015 by researchers at the University of Washington, the project was expanded in 2016 in the form of the MegaFace Challenge, in which facial recognition teams were encouraged to download the database and see how their algorithms performed when they had to distinguish between a million possible matches.

Like IBM's Diversity in Faces dataset, MegaFace was based on Yahoo!'s YFCC100M dataset, which provides approximately 100 million photos from the photo-sharing website Flickr under various Creative Commons licenses.

Partly due to its size, MegaFace became one of the most important benchmarks for commercial face recognition vendors. The only public dataset with a comparable number of images was Microsoft's MS-Celeb-1M dataset, which was withdrawn after a Financial Times/Exposing.ai investigation.

The MegaFace Challenge and dataset were discontinued in June 2020.

MegaFace was funded by Samsung, a Google Faculty Research Award, and the National Science Foundation/Intel.

Dataset info 🔢

Operator: Alibaba; Alphabet/Google; Amazon; ByteDance; EUROPOL; Huawei; In-Q-Tel; IntelliVision; Megvii; Mitsubishi Electric; Northrop Grumman; NtechLab; Philips; Samsung; SenseTime; Sogou; Tencent; Vision Semantics
Developer: University of Washington
Country: USA
Sector: Technology; Research/academia
Purpose: Improve research quality
Technology: Database/dataset; Facial recognition; Computer vision
Issue: Copyright; Dual/multi-use; Privacy; Surveillance; Liability
Transparency: Privacy; Marketing

Risks and harms 🛑

Because it was built from 3.3 million images scraped from Flickr without user consent, the MegaFace dataset is seen as posing significant risks and harms, including privacy violations, potential misuse in surveillance and military applications, and the perpetuation of biases in facial recognition technologies.

The dataset was downloaded and used by thousands of organisations and individuals across the world, and served as the basis for multiple derivative datasets, many of which continue to exist.

Transparency and accountability 🙈

MegaFace has been criticised for poor transparency and accountability.