Whisper
An advanced open source automatic speech recognition (ASR) model developed by OpenAI, Whisper is designed to convert spoken language into written text.
Trained on 680,000 hours of multilingual audio, Whisper transcribes speech in around 100 languages. It can provide real-time transcription for live events and be integrated into applications across different sectors, including healthcare.
Organisations known to have used and customised Whisper include Hint Health, Nabla, Microsoft, Speechly and Oracle.
OpenAI says it "hope[s] the technology will be used primarily for beneficial purposes."
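As a rough illustration of what integrating Whisper into an application can look like, the sketch below uses the open-source `whisper` Python package (installed as `openai-whisper`, with ffmpeg available) and its `load_model` and `transcribe` calls. The helper names `format_timestamp`, `format_segments` and `transcribe_file`, and the default model size, are assumptions made for this example rather than anything prescribed by OpenAI.

```python
from typing import Any


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments: list[dict[str, Any]]) -> list[str]:
    """Turn Whisper-style segments (start, end, text) into timestamped lines."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] "
        f"{s['text'].strip()}"
        for s in segments
    ]


def transcribe_file(path: str, model_name: str = "base") -> dict[str, Any]:
    """Transcribe one audio file and return Whisper's result dictionary."""
    # Imported lazily so the formatting helpers work without the package.
    # Assumes `pip install openai-whisper` and ffmpeg on the PATH.
    import whisper

    model = whisper.load_model(model_name)
    # The result includes "text", "language" and a list of "segments".
    return model.transcribe(path)
```

Calling `transcribe_file` on a local recording and passing the result's `"segments"` list to `format_segments` yields one timestamped line per segment. The timestamps make it easier for a human reviewer to check the transcript against the original audio.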
Speech recognition
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers.
Source: Wikipedia
Website: Whisper
Code: GitHub
Data: Hugging Face
Released: 2022
Developer: OpenAI
Purpose: Recognise speech; Transcribe speech
Type: Generative AI; Speech-to-text
Technique: Deep learning; Machine learning; Speech recognition
Whisper subreddit (unofficial)
Whisper has been associated with significant potential and actual harms, including:
Bias and discrimination. Whisper may perpetuate biases present in its training data.
Cultural insensitivity. Whisper's performance varies across different languages and dialects, resulting in outputs that can appear culturally insensitive and disrespectful.
Loss of active listening skills. Over-reliance on Whisper may reduce active listening skills.
Factual inaccuracies. Whisper can produce transcription errors, especially with accented speech, background noise, or technical terminology, leading to misinformation and disinformation, and poor decision-making. Such errors can have a serious impact in high-risk domains, including law, healthcare and medicine, and politics and government.
Dual/multiple use. Whisper can easily be misused for mass surveillance and other malicious purposes.
Copyright abuse. OpenAI has not disclosed Whisper's training sources, raising concerns that the system may violate third-party copyright.
Privacy loss. Voice data collected by Whisper could potentially be used to identify and profile individuals captured in its training data.
Employment. The extensive use of Whisper and other automated transcription tools risks replacing human jobs.
Data sources. Critics have pointed out the lack of transparency regarding the datasets used to train Whisper, particularly concerning indigenous languages.
User support. OpenAI does not provide ongoing support or integration assistance for users implementing Whisper, leaving them to manage issues independently - a lack of accountability that can lead to challenges in the ethical and effective use of the model.
Environment. OpenAI has not revealed details of the energy or water consumed in Whisper's training, or of the carbon emissions associated with its training and use.
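The transcription errors described under "Factual inaccuracies" are commonly quantified using word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the system's output into a reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that metric and is not drawn from OpenAI's own evaluation code. The function name and the simple lower-casing and whitespace tokenisation are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "take two tablets daily" with the output "take ten tablets daily" gives a WER of 0.25. A low score can still hide a serious error, which is why a single substituted word in a medical or legal transcript matters more than the figure suggests. Inserted words, such as hallucinated text, count as errors in the same way.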
Koenecke A. et al. Careless Whisper: Speech-to-Text Hallucination Harms
Keoni Mahelona, Gianna Leoni, Suzanne Duncan, Miles Thompson. OpenAI's Whisper is another case study in Colonisation
https://www.tomsguide.com/ai/meet-whisper-web-a-new-and-free-way-to-transcribe-audio
https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html
https://www.healthcare-brew.com/stories/2024/11/18/openai-transcription-tool-whisper-hallucinations
https://interestingengineering.com/culture/why-did-whisper-take-a-million-hours-of-youtube-videos
Page info
Type: System
Published: October 2024
Last updated: December 2024