BDD100K
BDD100K (or 'Berkeley DeepDrive 100K') is an open video dataset intended to help make self-driving safer. It comprises 100,000 videos, each 40 seconds or longer, collected across the US using vehicle-mounted cameras.
The dataset contains about one million cars, more than 300,000 street signs and 130,000 pedestrians. The videos, which total about 1,100 hours, also include GPS locations (from mobile phones), IMU data, and timestamps.
The footage covers different geographic, environmental, and weather conditions, including sunny, overcast, and rainy, and different times of the day and night.
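As a rough consistency check on the figures above, the stated clip count and minimum clip length can be multiplied out. This is a sketch that assumes every clip is exactly 40 seconds long, which is a lower bound rather than a confirmed duration:

```python
# Rough consistency check of the stated dataset size.
# Assumption: each clip is exactly 40 seconds (the stated minimum).
NUM_VIDEOS = 100_000
SECONDS_PER_VIDEO = 40
SECONDS_PER_HOUR = 3600

total_hours = NUM_VIDEOS * SECONDS_PER_VIDEO / SECONDS_PER_HOUR
print(f"Approximate total footage: {total_hours:.0f} hours")
```

Under that assumption the total comes to a little over 1,100 hours, which is in line with the figure quoted for the dataset.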
Considered a milestone in autonomous and assisted driving research, the release of BDD100K gave researchers access to a large volume of annotated driving data with unparalleled variety in terms of location, weather and time of day, which is critical for creating robust perception algorithms for self-driving cars.
Computer vision
Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the forms of decisions.
Source: Wikipedia
Status: Active
Released: 2018
Sector: Automotive
Purpose: Train self-driving car systems
Type: Database/dataset
Technique: Computer vision; Object recognition
The BDD100K self-driving video dataset is seen to suffer from a number of transparency limitations:
Data collection process. Limited information is available about the exact methods and criteria used to collect and select the video data.
Annotation quality. The dataset relies on human annotations, which may contain inconsistencies or errors. Details on annotation protocols and quality control measures are not fully disclosed.
Demographic representation. There's limited transparency regarding the demographic diversity of drivers, pedestrians, and locations represented in the dataset.
Environmental conditions. While the dataset includes various weather and lighting conditions, the distribution and selection criteria for these conditions are not entirely clear.
Privacy considerations. Information about steps taken to protect the privacy of individuals captured in the videos (e.g., face blurring) is not fully detailed.
Potential biases. The dataset may contain unacknowledged biases in terms of geographic locations, driving behaviors, or road types represented.
Data versioning. Clear information about dataset versions and updates may be lacking, making it challenging to reproduce results across different studies.
Licensing and usage restrictions. Full details of licensing terms and any usage restrictions may not be readily available or clearly communicated.
Sensor specifications. Detailed information about the cameras and other sensors used to capture the data might be limited.
Ethical considerations. There may be limited transparency regarding the ethical review process or considerations taken into account during dataset creation.
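Several of the points above, notably environmental conditions and potential biases, concern how conditions and object classes are distributed across the dataset. The following is a minimal sketch of how a reader might tally such distributions from per-frame label records. The record layout (an `attributes` mapping with keys such as `weather` and `timeofday`, and a `labels` list whose items carry a `category`), the helper names, and the sample records are all illustrative assumptions, not a confirmed description of the BDD100K label format:

```python
from collections import Counter

# Hypothetical sample records; field names and values are assumptions
# made for illustration, not taken from the dataset itself.
sample_frames = [
    {"name": "a.jpg",
     "attributes": {"weather": "clear", "timeofday": "daytime"},
     "labels": [{"category": "car"}, {"category": "pedestrian"}]},
    {"name": "b.jpg",
     "attributes": {"weather": "rainy", "timeofday": "night"},
     "labels": [{"category": "car"}]},
    {"name": "c.jpg",
     "attributes": {"weather": "clear", "timeofday": "night"},
     "labels": [{"category": "car"}, {"category": "traffic sign"}]},
]


def attribute_shares(frames, key):
    """Return the fraction of frames having each value of one attribute."""
    counts = Counter(
        frame.get("attributes", {}).get(key, "undefined") for frame in frames
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


def category_counts(frames):
    """Count annotated objects per category across all frames."""
    counts = Counter()
    for frame in frames:
        for label in frame.get("labels") or []:
            counts[label.get("category", "unknown")] += 1
    return counts


weather = attribute_shares(sample_frames, "weather")
timeofday = attribute_shares(sample_frames, "timeofday")
categories = category_counts(sample_frames)

for name, shares in (("weather", weather), ("timeofday", timeofday)):
    for value, share in sorted(shares.items()):
        print(f"{name:10s} {value:10s} {share:.1%}")
print(dict(categories))
```

Run against real label files, a tally like this would show how skewed the data is toward particular conditions or object classes. It would not reveal why those conditions were selected, which is the transparency gap described above.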
The BDD100K driving video dataset has raised privacy concerns as it contains footage of individuals and vehicles without their explicit consent, potentially exposing them to risks like surveillance, identification, and misuse of their data.
Llorca D.F. et al. (2023). Attribute Annotation and Bias Evaluation in Visual Datasets for Autonomous Driving
Wilson B., Hoffman J., Morgenstern J. (2019). Predictive Inequity in Object Detection
Krizhevsky A. (2009). Learning Multiple Layers of Features from Tiny Images
Page info
Type: Data
Published: January 2024
Last updated: October 2024