Artist's private medical images found in LAION training dataset
Occurred: September 2022
San Francisco-based digital artist 'Lapine' found that private medical photographs, taken by her doctor in 2013 while she was undergoing treatment for a rare genetic condition, had been included in the image-text dataset LAION-5B, which is used to train AI image generation models.
According to Lapine, the photographs were taken as part of her clinical documentation, and she signed documents restricting their use to her medical file. She discovered the images in LAION-5B through the Have I Been Trained tool, which lets artists check whether their work appears in datasets used to train AI image generation models.
The LAION-5B dataset is supposed to draw only on images that are publicly available on the web. Lapine said the surgeon who took the photographs died of cancer in 2018, and she suspects they somehow left his practice's custody after that.
Ars Technica said it discovered 'thousands of similar patient medical record photos in the data set, each of which may have a similar questionable ethical or legal status, many of which have likely been integrated into popular image synthesis models that companies like Midjourney and Stability AI offer as a commercial service'.
Sector: Technology; Research/academia
Purpose: Pair text and images
Technology: Database/dataset; Neural network; Deep learning; Machine learning
Issue: Privacy; Ethics
Transparency: Governance; Complaints/appeals
News, commentary, analysis
Published: August 2023