AI Dungeon offensive speech filter upgrade generates child porn
Occurred: April 2021
Page published: December 2021 | Last updated: April 2023
AI Dungeon developer Latitude came under fire for developing a content moderation system intended to stop players of its open-ended adventure game from generating stories depicting sexual encounters with minors.
An upgrade to OpenAI's GPT-3 large language model had resulted in some players typing words that caused the game to generate inappropriate stories. The model also appears to have generated sexual content involving minors of its own accord.
However, it quickly became clear that Latitude's new filter was blocking a wider range of content than intended.
Gamers also complained that their private content was now being reviewed by moderators.
Meanwhile, a security researcher published a report estimating that around a third of stories on AI Dungeon were sexually explicit, and around half were assessed as NSFW.
Latitude content moderation system
Operator: Latitude
Developer: Latitude; OpenAI
Country: USA
Sector: Media/entertainment/sports/arts
Purpose: Minimise sexual content
Technology: Content moderation system; NLP/text analysis
Issue: Accuracy/reliability; Consent; Safety; Privacy/surveillance
AetherDevSecOps (2021). AI Dungeon Public Disclosure Vulnerability Report
Latitude (2021). Update to our Community
https://www.wired.com/story/ai-fueled-dungeon-game-got-much-darker/
https://www.theregister.com/2021/04/30/ai_dungeon_filter_vulnerabilities/
https://analyticsindiamag.com/when-ai-turns-rogue-the-dark-story-of-ai-dungeon/
https://analyticsindiamag.com/openai-proposes-method-to-dilute-toxicity-of-gpt-3/
AIAAIC Repository ID: AIAAIC0616