IARPA Janus Benchmark-C (IJB-C)

Page published: January 2023 | Last updated: October 2024


IARPA Janus Benchmark-C (IJB-C) is a database of YouTube video still-frames and Flickr and Wikimedia photos used to benchmark facial recognition systems.

IJB-C was compiled in 2017 by US government subcontractor Noblis and contains 21,294 images of 3,531 people 'with diverse occupations' and varying levels of fame.

The dataset averages six pictures and three videos per person, and is available on application to computer vision and facial recognition researchers.

Facial recognition system

A facial recognition system is a technology potentially capable of matching a human face from a digital image or a video frame against a database of faces.

Source: Wikipedia 🔗

Dataset 🤖

Website 🔗
Released: 2017
Developer: Noblis; IARPA
Purpose: Benchmark facial recognition systems
Type: Database/dataset
Technique: Computer vision; Facial recognition

Documents 📃

Brianna Maze, Jocelyn Adams, James A. Duncan, Nathan Kalka, Tim Miller, Charles Otto. IARPA Janus Benchmark - C: Face Dataset and Protocol
Challenge
Poster (pdf)

Transparency, accountability 🙈

The IARPA Janus Benchmark-C (IJB-C) dataset has been criticised for a number of transparency limitations:

Limited access to raw data. The original dataset is over 200GB in size, making it difficult for many researchers to access and analyse the full dataset.
Unclear selection criteria. The reasons for selecting specific individuals for inclusion in the dataset are not clearly explained. The only stated criterion is that source material must include "well-labeled, person-centric data".
Potential consent issues. Many individuals included in the dataset, such as activists and journalists, were likely unaware their images were being used for facial recognition research. For example, digital rights activist Jillian York's images were included without her knowledge or consent.
Violation of platform policies. The dataset includes thousands of faces from over 11,000 YouTube videos, which violates YouTube's terms of service regarding the use of data for facial recognition.
Lack of diversity representation. While the dataset aims to improve representation of the global population, it is unclear how effectively it captures diversity across different demographics.
Limited annotation transparency. Although the dataset includes expanded annotations for covariate analysis, the full extent and accuracy of these annotations are not clearly detailed.
Unclear data processing methods. The exact methods used for processing and curating the images and videos in the dataset are not fully transparent.

Risks, harms 🛑

The IARPA Janus Benchmark-C (IJB-C) dataset has been criticised for using images of political activists, civil rights advocates, and journalists without their consent, and for its potential misuse for military and security purposes.

Incidents, issues 🔥

September 2019. US government research dataset raises privacy, misuse concerns

Investigations, assessments, audits 👁️

Harvey, A., LaPlace, J. (2019). Exposing.ai
Murgia, M. (2019). Who’s using your face? The ugly truth about facial recognition. Financial Times
