NudeNet - dataset
Page published: December 2025
NudeNet is a dataset of over 700,000 images scraped from the internet, intended to enable the training of AI classifiers for the automatic detection of nude images.
Developed by Indian software engineer Bedapudi Praneeth and first released in 2019, the dataset later caused controversy when it was found to contain hundreds of images of child sexual abuse material (CSAM).
Released: 2019
Developer: Bedapudi Praneeth
Country: Global
Purpose: Detect nude images
Type: Database/dataset
Technique: Machine learning
NudeNet is seen to suffer from several important transparency and accountability failures:
Sourcing and collection. The dataset was created by indiscriminately scraping a vast number of images from public internet sources, including NSFW forums and social media. This collection method inherently lacks transparency regarding consent, as there was no mechanism to verify the consent of the individuals depicted in the images, fundamentally violating their autonomy, and regarding legality, as the scraping process failed to implement sufficient safeguards to filter out illegal content such as CSAM, which later led to hundreds of confirmed illegal images being distributed.
Lack of documentation. The dataset was distributed on platforms such as Academic Torrents and used in hundreds of research papers without the rigorous documentation needed to inform users of its risks, including its audit history (there was no clear record or disclosure indicating whether the dataset had ever been audited for illegal, non-consensual, or otherwise deeply harmful content) and its biases and limitations.
Failure of responsibility. The individuals and entities who assembled and distributed NudeNet failed to employ fundamental preventative measures, such as working with organisations that use perceptual hashing to detect known CSAM, to ensure the dataset was free of illegal content before its release (a minimal illustration of this kind of hash-based screening follows this list). This critical lack of due diligence is an accountability failure.
Absence of institutional oversight. Hundreds of academic and industry groups cited and used the dataset. This highlights a governance failure by academic research ethics boards (REBs) and university compliance offices, which did not adequately scrutinise the provenance and legality of the massive, sensitive training datasets used in the AI research projects they approved.
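The screening measure referred to above can be illustrated with a short, hypothetical sketch. In practice, hash lists of known CSAM are held by vetted organisations (such as NCMEC or the IWF) and matching is typically performed through their services; the Python example below assumes the third-party Pillow and imagehash packages, and the hash-list file, image directory, and distance threshold are illustrative placeholders only. It shows the general shape of perceptual-hash screening, not how NudeNet's creators did or should have implemented it.

# Minimal, hypothetical sketch of perceptual-hash screening for a scraped image set.
# Assumes the third-party Pillow and imagehash packages; the hash-list file name,
# image directory, and distance threshold are illustrative placeholders only.

from pathlib import Path

import imagehash
from PIL import Image

MAX_DISTANCE = 4  # Hamming distance at or below which two hashes are treated as a match


def load_known_hashes(path: str) -> list[imagehash.ImageHash]:
    """Read one hexadecimal perceptual hash per line from a (hypothetical) hash list."""
    return [
        imagehash.hex_to_hash(line.strip())
        for line in Path(path).read_text().splitlines()
        if line.strip()
    ]


def is_flagged(image_path: Path, known: list[imagehash.ImageHash]) -> bool:
    """Return True if the image's perceptual hash is close to any known-bad hash."""
    with Image.open(image_path) as img:
        candidate = imagehash.phash(img)
    return any(candidate - bad <= MAX_DISTANCE for bad in known)


if __name__ == "__main__":
    known = load_known_hashes("known_bad_hashes.txt")
    for image_path in sorted(Path("scraped_images").glob("*.jpg")):
        if is_flagged(image_path, known):
            print(f"Excluding {image_path}: matches a known-bad hash")

Unlike cryptographic hashes, perceptual hashes remain similar when an image is resized or re-compressed, which is why matching uses a small Hamming-distance threshold rather than exact equality.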
NudeNet is seen to pose serious and far-reaching ethical, legal, and personal harms due to its inclusion of child sexual abuse material and its foundation in non-consensual image collection.
Re-victimisation of children: The dataset was found to contain nearly 680 images known or suspected to be CSAM, including images of identified victims. The inclusion of this material means the victims' abuse is perpetually digitised, distributed, and viewed under the guise of scientific research, compounding the initial trauma.
Propagation of harmful models: Models trained on the NudeNet dataset are inherently tainted, risking the generation of further harmful content and embedding biases that unfairly target or incorrectly classify certain body types, demographics, or styles of imagery as "nude", leading to unjust censorship for users.
Damage to institutional credibility: The involvement of researchers and universities who used the dataset severely undermines public trust in the AI research establishment. It reinforces the perception that the pursuit of technological advancement often outweighs fundamental ethical and human rights considerations.
Legal liability: The possession or distribution of CSAM is a serious crime in many countries. The dataset's creators, the platforms that hosted it (like Academic Torrents), and any researcher who downloaded and retained it face significant legal and criminal liability for possessing illegal content, even if their intent was benign.
Non-Consensual Intimate Imagery (NCII): Beyond CSAM, the vast majority of the "nude" and "sexy" images were collected by scraping public internet sources, including pornography sites and social media, without the consent of the depicted adults. This perpetuates the widespread abuse of NCII by normalising its use in academic and commercial AI systems.
AIAAIC Repository ID: AIAAIC2160