Image-generation AIs memorise training images

Occurred: February 2023

High-profile AI image generators such as DALL-E and Stable Diffusion memorise images from the data they are trained on, raising concerns about potential copyright and privacy violations.

Researchers at Google DeepMind, Princeton, and other US universities extracted over one thousand training images from DALL-E, Google's Imagen, and Stable Diffusion, including photographs, film stills, copyrighted press photos, and trademarked company logos, and found that many could be regenerated almost exactly.

The researchers got the models to reproduce over a hundred training images 'nearly identically', often with only barely perceptible differences such as added image noise. The findings raise concerns about the reproduction and distribution of copyrighted material, as well as privacy risks for people who do not want their images used to train AI systems.
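The idea of a 'nearly identical' match can be illustrated with a simple pixel-distance check: two images count as near-copies if their average per-pixel difference falls below a small threshold. This is a minimal sketch in Python, not the researchers' actual measure (the paper used a calibrated distance over image patches); the threshold value here is an illustrative assumption.

```python
import math
import random

def rms_distance(a, b):
    """Root-mean-square per-pixel distance between two equal-length
    flat pixel lists with values in [0, 1]."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def is_near_copy(a, b, threshold=0.1):
    """True if two images differ only at roughly noise level.

    The 0.1 threshold is an illustrative assumption, not the value
    used in the research.
    """
    return len(a) == len(b) and rms_distance(a, b) < threshold

random.seed(0)
# A 64x64 grayscale 'training image' as a flat pixel list.
original = [random.random() for _ in range(64 * 64)]
# The same image with faint noise added, like a near-identical generation.
noised = [min(1.0, max(0.0, p + random.gauss(0, 0.02))) for p in original]
# An unrelated image for comparison.
unrelated = [random.random() for _ in range(64 * 64)]

print(is_near_copy(original, noised))     # noise-level change: near-copy
print(is_near_copy(original, unrelated))  # unrelated image: not a copy
```

Under this kind of criterion, a generation that differs from a training image only by faint noise is flagged as a memorised copy, while genuinely novel images are not.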


Operator: Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace
Developer: Alphabet/Google; OpenAI; Stability AI
Country: Global
Sector: Multiple
Purpose: Generate images
Technology: Text-to-image; Diffusion model; Neural network; Deep learning; Machine learning
Issue: Copyright; Privacy
Transparency: Governance

