Study: LFW dataset discards the privacy rights of internet users

Occurred: February 2021

The prominent Labeled Faces in the Wild (LFW) dataset was compiled by quietly scraping Google, Flickr, YouTube and other online photo libraries, disregarding the privacy rights of photo owners and subjects.

In a paper examining over 130 facial-recognition datasets compiled over 43 years, researchers Deborah Raji and Genevieve Fried singled out LFW as the first dataset for which 'wild' images were scraped from the internet.

According to MIT Technology Review, the dataset 'opened the floodgates to data collection through web search, with researchers starting to download images directly from Google, Flickr, and Yahoo without concern for consent.'

Earlier, LFW had been found to be heavily skewed towards a narrow demographic, specifically white male faces, and to contain 'a significant number of duplicate or nearly-duplicate images and mislabeled images.'

The finding persuaded LFW's creators to acknowledge the dataset's limitations. 

Operator:
Developer: University of Massachusetts, Amherst

Country: USA

Sector: Research/academia; Technology

Purpose: Train facial recognition systems

Technology: Dataset; Computer vision; Deep learning; Facial recognition; Facial detection; Facial analysis; Machine learning; Neural network; Pattern recognition
Issue: Bias/discrimination - race, ethnicity, gender; Ethics/values; Privacy; Transparency
