Speech2Face facial reconstructions

Speech2Face is an algorithm developed by a group of researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and Google AI that generates images of what someone might look like from snippets of audio recordings of their voice.

According to the associated research paper, the MIT researchers trained a neural network-based model on a dataset of millions of video clips from YouTube and elsewhere. From these videos, the model learns which vocal attributes are associated with which facial features.

System 🤖

Operator: Alphabet/Google

Developer: MIT; Alphabet/Google

Country: USA

Sector: Research/academia

Purpose: Reconstruct facial image

Technology: Neural network

Issue: Accuracy/reliability; Bias/discrimination - race, gender, LGBTQ; Privacy

Transparency: Privacy

Risks and harms 🛑

Concerns have been expressed about Speech2Face's accuracy and reliability, potential for discrimination, and violations of privacy.

Transparency 🙈

The MIT team urges caution on the project's GitHub page, acknowledging that the technology raises questions about discrimination and privacy. They note that the training data was a collection of educational videos from YouTube, which may not be representative of the world population.

'Although this is a purely academic investigation, we feel that it is important to explicitly discuss in the paper a set of ethical considerations due to the potential sensitivity of facial information,' they wrote, recommending that 'any further investigation or practical use of this technology will be carefully tested to ensure that the training data is representative of the intended user population.'

Page info
Type: System
Published: December 2022
Last updated: May 2024