BDD100K driving video dataset

BDD100K is a driving video dataset that comprises 100,000 videos collected across the US using vehicle-mounted cameras.

The footage covers a range of geographic, environmental, and weather conditions, including sunny, overcast, and rainy weather, at different times of day and night. The videos also come with GPS/IMU information recorded by mobile phones, which shows rough driving trajectories.

UC Berkeley, which created the dataset, describes it as 'the largest and most diverse open driving video dataset so far for computer vision research.' It is intended to help make self-driving cars safer.


Dataset databank

Developer: UC Berkeley
Country: USA
Sector: Automotive
Purpose: Train self-driving car systems
Technology: Database/dataset; Facial recognition; Object recognition
Issue: Accuracy/reliability; Bias/discrimination - race, gender

Racial and gender bias

A 2019 study by Georgia Institute of Technology researchers hired people to manually label the skin colour of pedestrians. The labellers used the Fitzpatrick scale, a scale commonly used to classify human skin colour. The study found that object detection models tested on BDD100K were, on average, 4.8 percent more accurate at correctly spotting light-skinned pedestrians. They were up to 12 percent worse at spotting people with darker skin.

The researchers noted that the bias 'is not specific to a particular model.' It is therefore likely to be present across a variety of object detection systems.


Page info
Type: Data
Published: January 2024