People in Photo Albums (PIPA) dataset

People in Photo Albums (PIPA) is a dataset of facial photographs intended for training systems to recognise people's identities in photo albums in an unconstrained setting.

Created by Facebook and UC Berkeley and published in 2015, the dataset comprises 60,000 facial images of approximately 2,000 people, of which 32,518 photographs were downloaded from Flickr.

Most of the photos are semi-public images of children, family dinners, weddings, and other personal events.

Operator: ETH Zurich; Max Planck Institute for Informatics; Toyota Motor Europe; SenseTime; National University of Singapore; National University of Defense Technology, China; Meta/Facebook
Developer: UC Berkeley; Meta/Facebook
Country: Germany; USA
Sector: Research/academia; Technology; Media/entertainment/sports/arts
Purpose: Train facial recognition systems
Technology: Dataset; Facial analysis; Facial recognition; Computer vision
Issue: Copyright; Privacy; Dual/multi-use
Transparency: Governance; Legal

Risks and harms 🛑

The PIPA research paper and its proposed methodology have proved popular and have been widely cited.

However, as Adam Harvey showed in his exposing.ai project, the uses of the data appear to have gone well beyond its stated purpose of processing personal photo albums.

For example, PIPA has been used by China's National University of Defense Technology and Tsinghua University, as well as by many commercial and industrial organisations.

Harvey also highlighted the personal nature of the PIPA dataset, alluding to the privacy implications for those whose images were used.

It has also been pointed out that PIPA's creators fail to mention the type of Creative Commons (CC) licence under which the photographs were used, despite some CC licences not permitting any type of re-use.

In January 2020, UC Berkeley stopped distributing the dataset, though it remains available via the Max Planck Institute for Informatics.

Research, advocacy 🧮

Page info
Type: Dataset
Published: February 2023