Generative AI pollutes, terminates human language use project
Occurred: September 2024
Wordfreq, a project analysing language usage across various online sources, is shutting down due to contamination from generative AI.
Robyn Speer announced that an influx of unreliable AI-generated text has rendered it impossible to accurately track language trends post-2021. Generative AI has filled the internet with "slop," she said, making it challenging to distinguish genuine human communication from machine-generated content.
The web is now saturated with content generated by large language models, which skews word frequency measurements and compromises data integrity. Speer noted that while spam has always existed online, AI-generated text is virtually indistinguishable from authentic language.
The project relied heavily on scraping the open web for its data, but the current landscape has made this approach ineffective. The quality of available data has deteriorated so significantly that Speer concluded there is little reliable information about language usage since 2021.
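The measurement the project depended on is, at its core, relative word frequency over a large text corpus. The toy sketch below illustrates that idea only; it is not wordfreq's actual implementation, and the tokeniser and sample text are assumptions for demonstration.

```python
from collections import Counter
import re

def word_frequencies(text):
    """Return each word's share of the total tokens in `text`."""
    # Crude tokenisation: lowercase, keep runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    # Relative frequency: occurrences divided by corpus size.
    return {word: n / total for word, n in counts.items()}

freqs = word_frequencies("the cat sat on the mat")
# 'the' accounts for 2 of the 6 tokens in this tiny sample.
```

The problem Speer describes follows directly from this arithmetic: if machine-generated text floods the corpus, the counts reflect model output distributions rather than human usage, and the frequencies lose their meaning.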
The closure raises questions about the accuracy and reliability of many generative AI systems and about the ability of their developers and owners to reduce the amount of "slop" being produced, used and abused.
Information pollution
Information pollution (also referred to as info pollution) is the contamination of an information supply with irrelevant, redundant, unsolicited, hampering, and low-value information. Examples include misinformation, junk e-mail, and media violence.
Source: Wikipedia 🔗
Multiple
Operator:
Developer:
Country: Global
Sector: Research/academia
Purpose: Generate text
Technology: Generative AI; Machine learning
Issue: Accuracy/reliability; Information degradation
Page info
Type: Incident
Published: September 2024