Simulated Masked Face Recognition Dataset (SMFRD)

The Simulated Masked Face Recognition Dataset (SMFRD) is a dataset of masked faces intended to enable facial recognition systems to identify the individuals behind the masks.

Released in March 2020 by researchers at Wuhan University in China, the dataset is a derivative of the Labeled Faces in the Wild (LFW) dataset, with face masks superimposed on the original images. LFW was the first dataset to use facial images scraped from websites and applications.

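According to the researchers, 'RMFRD is currently the world's largest real-world masked face dataset', a reference to the companion Real-world Masked Face Recognition Dataset released alongside SMFRD. The data is freely available to industry and academia.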

Developer: Wuhan University
Country: China
Sector: Health
Purpose: Train facial recognition systems
Technology: Dataset; Facial recognition; Computer vision
Issue: Privacy; Dual/multi-use; Surveillance

Risks and harms 🛑

Released at the height of the COVID-19 pandemic, SMFRD was seen as helpful in limiting the spread of the virus in China.

The view from the West was noticeably different, with civil rights and privacy advocates criticising similar tools for enabling mass surveillance, limiting freedom of expression and assembly, and eroding privacy.

SMFRD was also seen to highlight the issue of derivative datasets leading to unintended consequences, in this case potentially violating the privacy of those who wish to conceal their faces.

Research, advocacy 🧮

Page info
Type: Dataset
Published: February 2023