DukeMTMC facial recognition dataset
DukeMTMC is a dataset of video footage taken on Duke University's campus in 2014 with the aim of accelerating advances in 'multi-target, multi-camera tracking' using person re-identification and low-resolution facial recognition.
Published in 2016 by Duke University academics and researchers, the dataset consists of over 2 million frames of 2,000 students, recorded by 8 cameras expressly set up to capture students 'during periods between lectures, when pedestrian traffic is heavy'.
The project was shut down after the publication of researcher Adam Harvey's Exposing.ai project and a Financial Times investigation into facial recognition data sharing.
Unethical data collection, availability
As reported in Duke's Chronicle newspaper, the university's Institutional Review Board said it had approved a study that would take place in a 'defined indoor space' and create a dataset that would be accessible only upon researchers’ request.
Carlo Tomasi, Iris Einheuser Professor of Computer Science at Duke and an author of the research paper, later apologised for running the study outdoors and for making the dataset publicly available.
Academic, commercial, and military uses
Though DukeMTMC was released under a CC BY-NC-SA 4.0 license, which allows for attributed, non-commercial sharing and adaptation of the dataset, it has been and continues to be used more broadly.
Analysis by Adam Harvey shows that DukeMTMC has been cited by hundreds of research studies across the world, with over twice as many originating in China as in the United States.
Chinese citations show the dataset was used by a wide range of academic institutions and companies with known links to the Chinese military and to Chinese government surveillance of Uyghurs in Xinjiang and elsewhere.
These organisations include Hikvision, Megvii (Face++), SenseTime, Beihang University, China's National University of Defense Technology, and the PLA's Army Engineering University.
Harvey also points out that the project was 'supported in part by the United States Army Research Laboratory' and was for 'automated analysis of crowds and social gatherings for surveillance and security applications.'
Ongoing data availability
Duke University may have removed the DukeMTMC dataset from its website, but multiple versions and extensions remain available on GitHub and elsewhere, and the original dataset continues to be used for research.
Operator: CloudWalk; Hikvision; Megvii; SenseNets; SeeQuestor; SenseTime; Beihang University; National University of Defense Technology, China; NEC; PLA Army Engineering University
Developer: Ergys Ristani; Francesco Solera; Roger Zou; Rita Cucchiara; Carlo Tomasi; Duke University
Sector: Technology; Research/academia
Purpose: Train facial recognition systems
Technology: Dataset; Facial recognition; Computer vision
Issue: Privacy; Ethics; Dual/multi-use
Transparency: Governance; Privacy
Harvey, A., LaPlace, J. (2019). Exposing.ai
Peng K., Mathur A., Narayanan A. (2021). Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
Investigations, assessments, audits
Murgia M., Financial Times (2019). Who’s using your face? The ugly truth about facial recognition
News, commentary, analysis
Published: May 2022