Simulated Masked Face Recognition Dataset (SMFRD)
Released: March 2020
SMFRD (or Simulated Masked Face Recognition Dataset) is a dataset of masked faces intended to enable facial recognition systems to identify the individuals behind the masks.
Released in March 2020 by researchers at Wuhan University in China, the set is a derivative of the Labeled Faces in the Wild (LFW) dataset, with face masks superimposed. LFW was the first dataset to use facial images scraped from websites and applications.
According to the researchers, the companion Real-world Masked Face Recognition Dataset (RMFRD) 'is currently the world's largest real-world masked face dataset', and the data is freely available to industry and academia.
Released at the height of the COVID-19 pandemic, SMFRD was seen as helpful in limiting the spread of the virus in China.
The view from the West was noticeably different, with civil rights and privacy advocates criticising similar tools for enabling mass surveillance, limiting freedom of expression and assembly, and eroding privacy.
SMFRD was also seen as highlighting how derivative datasets can lead to unintended consequences, in this case potentially violating the privacy of people who wish to conceal their faces.
Developer: Wuhan University
Purpose: Train facial recognition systems
Technology: Dataset; Facial recognition; Computer vision
Issue: Privacy; Dual/multi-use; Surveillance
Research, audits, investigations, inquiries, litigation
Peng K., Mathur A., Narayanan A. (2021). Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
News, commentary, analysis
Published: February 2023