Nude detection dataset contains child sexual abuse imagery
Occurred: December 2025
Page published: December 2025
NudeNet, a large dataset of over 700,000 images scraped from the internet for training AI nudity detection tools, was found to contain child sexual abuse material (CSAM), raising questions about the safety of the AI systems using it and the competence of the individual who created the resource.
An analysis by the Canadian Centre for Child Protection (C3P) identified nearly 680 images in the dataset that were confirmed or suspected to be CSAM or other harmful material involving minors.
These included images of known victims of CSAM, images depicting the genital/anal area of pre-pubescent and post-pubescent children, and images depicting sexual or abusive acts involving children and teenagers.
The dataset was intended to train AI classifiers to automatically detect nudity, but it inadvertently included illegal imagery, potentially enabling both the non-consensual distribution of victims' images and the generation or propagation of CSAM by AI models trained on it.
The discovery also raised concerns about contamination of academic research: NudeNet had been publicly available on Academic Torrents since June 2019 and had been cited in over 250 academic works that used it to train AI classifiers designed to automatically detect nudity.
Following the discovery, a removal notice was issued, and the images were taken down from the web service distributing the dataset.
Poor due diligence on NudeNet, combined with a lack of transparency in sourcing internet-scraped images from social media and pornographic websites, meant that CSAM entered the collection.
The scale and speed at which hundreds of thousands of images were collected, often without ethical review, prioritised quantity over safety.
Limited ethical guidelines for AI training data at the time failed to mandate thorough audits, mirroring issues in other datasets like LAION-5B, which contained over 1,000 verified CSAM instances.
Victims face "revictimisation" through repeated exposure in AI training pipelines and subsequent AI models
Researchers risk legal liabilities for using tainted data
For society, the inclusion of CSAM in NudeNet amplifies the risk of AI image generation tools such as Stable Diffusion being misused. It also highlights the need for stricter data provenance standards.
Developer: Bedapudi Praneeth
Country: Global
Sector: Multiple
Purpose: Detect nude images
Technology: Database/dataset
Issue: Privacy; Safety; Transparency
AIAAIC Repository ID: AIAAIC2159