Duke University pulls facial recognition dataset after privacy controversy

Occurred: September 2019

Duke University took down a controversial dataset of surveillance video showing thousands of students and staff, recorded on campus without their consent, following a high-profile media investigation.

Created and released in 2015 for research purposes, the DukeMTMC dataset was revealed to have violated the university's own ethics guidelines because its footage had been recorded outdoors without consent. As reported in The Chronicle, Duke's student newspaper, the university's Institutional Review Board said it had approved a study that would take place in a 'defined indoor space' and would create a dataset accessible only upon researchers' request.

Critics also pointed out that although DukeMTMC had been released under a CC BY-NC-SA 4.0 license, which permits attributed, non-commercial sharing and adaptation of the dataset, it had been, and continued to be, used far more broadly, including for unethical surveillance applications by companies and researchers in China and elsewhere.

Analysis by artist Adam Harvey showed that DukeMTMC had been used by a wide range of academic institutions and companies with known links to the Chinese military and to Chinese government surveillance of Uyghurs in Xinjiang and elsewhere, including Hikvision, Megvii (Face++), SenseTime, Beihang University, China's National University of Defense Technology, and the PLA's Army Engineering University.

Harvey also pointed out that the project was 'supported in part by the United States Army Research Laboratory' and was intended for 'automated analysis of crowds and social gatherings for surveillance and security applications.'

In response to the backlash, Carlo Tomasi, Iris Einheuser Professor of Computer Science at Duke and a co-author of the research paper, apologised for recording outdoors and for making the dataset publicly available. The dataset was also removed from Duke's website. However, the takedown was largely ineffective: the data had already proliferated widely online and had been remixed and used by Microsoft, IBM, Baidu and multiple Chinese technology companies to improve their facial recognition systems.

The fracas prompted critics to highlight the limits of taking down original datasets, and to urge organisations such as Duke University to identify and remove derived datasets, to regulate the use of datasets more tightly from the outset, and to restrict the creation of derived datasets that enable unethical research.

Operator: CloudWalk; Hikvision; Megvii; SenseNets; SeeQuestor; SenseTime; Beihang University; National University of Defense Technology, China; NEC; PLA Army Engineering University 
Developer: Ergys Ristani; Francesco Solera; Roger Zou; Rita Cucchiara; Carlo Tomasi; Duke University
Country: USA
Sector: Technology; Research/academia
Purpose: Train facial recognition systems
Technology: Dataset; Facial recognition; Computer vision
Issue: Ethics/values; Dual/multi-use; Privacy
Transparency: Governance; Privacy

Research, advocacy 🧮

Investigations, assessments, audits 🧐