DukeMTMC facial recognition dataset

DukeMTMC is a dataset of video footage taken on Duke University's campus in 2014 with the aim of accelerating advances in 'multi-target, multi-camera tracking' using person re-identification and low-resolution facial recognition.

Published in 2016 by Duke University academics and researchers, the dataset consists of over 2 million frames of 2,000 students, captured using 8 cameras expressly set up to record students 'during periods between lectures, when pedestrian traffic is heavy'.

Dataset 🤖

Dataset databank 🔢

Operator: CloudWalk; Hikvision; Megvii; SenseNets; SeeQuestor; SenseTime; Beihang University; National University of Defense Technology, China; NEC; PLA Army Engineering University 
Developer: Ergys Ristani; Francesco Solera; Roger Zou; Rita Cucchiara; Carlo Tomasi; Duke University
Country: USA
Sector: Technology; Research/academia
Purpose: Train facial recognition systems
Technology: Dataset; Facial recognition; Computer vision
Issue: Privacy; Ethics; Dual/multi-use
Transparency: Governance; Privacy

Risks and harms 🛑

The project was shut down after the publication of researcher Adam Harvey's Exposing.ai project and a Financial Times investigation into facial recognition data sharing.

Unethical data collection, availability

As reported in Duke's Chronicle newspaper, the university's Institutional Review Board said it had approved a study that would take place in a 'defined indoor space' and create a dataset that would be accessible only upon researchers’ request.

Carlo Tomasi, Iris Einheuser Professor of Computer Science at Duke and an author of the study's research paper, later apologised for running the study outdoors and for making the dataset publicly available.

Academic, commercial, and military uses

Though DukeMTMC had been released under a CC BY-NC-SA 4.0 license, which allows for attributed, non-commercial sharing and adaptation of the dataset, it has been and continues to be used more broadly.

Analysis by Adam Harvey shows that DukeMTMC has been cited by hundreds of research studies across the world, with over twice as many originating in China as in the United States.

Chinese citations show the dataset was used by a wide range of academic institutions and companies with known links to the Chinese military and to Chinese government surveillance of Uyghurs in Xinjiang and elsewhere.

These organisations include Hikvision, Megvii (Face++), SenseTime, Beihang University, China's National University of Defense Technology, and the PLA's Army Engineering University.

Harvey also points out that the project was 'supported in part by the United States Army Research Laboratory' and was for 'automated analysis of crowds and social gatherings for surveillance and security applications.'

Ongoing data availability 

Duke University may have removed the DukeMTMC dataset from its website, but multiple versions and extensions remain available on GitHub and elsewhere, and the original dataset continues to be used for research.

Research, advocacy 🧮

Investigations, assessments, audits 🧐