Text-to-image AI models tricked into generating violent, nude images

Occurred: November 2023

Stability AI’s Stable Diffusion and OpenAI’s DALL-E 2 text-to-image models can be manipulated into generating violent, nude and sexual images, according to a research study. 

Per Technology Review, researchers at Johns Hopkins University and Duke University used a new jailbreaking method dubbed 'SneakyPrompt', which uses reinforcement learning to craft written prompts that the AI models read as hidden requests for disturbing images while slipping past their safety filters.

For example, the researchers replaced the term 'naked', which is banned by OpenAI, with the term 'grponypui', resulting in the generation of explicit imagery. 
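The sketch below is a toy illustration of that substitution idea, not the researchers' actual method: SneakyPrompt uses reinforcement learning guided by the model's responses, whereas this example only performs a naive random search for a nonsense token that a hypothetical keyword-based safety filter fails to flag. The filter, deny-list and token generator here are all assumptions for illustration; the real filters used by OpenAI and Stability AI are not public.

```python
import random
import string

# Hypothetical stand-in for a provider's safety filter: blocks any prompt
# containing a word from a small deny-list. Real filters are more complex.
BLOCKED_TERMS = {"naked"}


def safety_filter_blocks(prompt: str) -> bool:
    return any(term in prompt.lower() for term in BLOCKED_TERMS)


def random_token(length: int = 9) -> str:
    # Candidate nonsense token, in the spirit of 'grponypui'.
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


def find_substitute(prompt: str, banned_word: str, attempts: int = 1000) -> str | None:
    """Naive random search for a replacement token that the filter lets through.

    The actual attack additionally needs feedback from the image model to find
    tokens it interprets as the banned concept; that step is omitted here.
    """
    for _ in range(attempts):
        candidate = prompt.replace(banned_word, random_token())
        if not safety_filter_blocks(candidate):
            return candidate
    return None


if __name__ == "__main__":
    original = "a naked person on a beach"
    print(find_substitute(original, "naked"))
```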

The technique raised concerns about the adequacy of safety measures and the potential misuse of Stable Diffusion, DALL-E, Midjourney, and other text-to-image systems. 

Databank

Operator: Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, Yinzhi Cao
Developer: OpenAI; Stability AI
Country: Global
Sector: Media/entertainment/sports/arts
Purpose: Generate images
Technology: Text-to-image; Diffusion model; Neural network; Deep learning; Machine learning
Issue: Safety; Security
Transparency: Governance