Google hate detection AI mistakes bullying for civility
Occurred: February 2017
Google's AI-powered anti-bullying tool, Perspective, drew criticism for misclassifying certain kinds of online comments, raising concerns about its accuracy and effectiveness.
Developed by Jigsaw, Perspective uses machine learning to assess the "toxicity" of online comments, categorising them from "very toxic" to "very healthy."
However, commentators and researchers pointed out that the AI's training data skewed its understanding, leading it to overlook harmful phrases while flagging innocuous comments as toxic. For instance, phrases expressing overtly discriminatory views could be rated as only slightly toxic, while straightforward profanity received a much higher toxicity score.
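Researchers demonstrated this kind of discrepancy by sending paired comments to the Perspective API and comparing the toxicity scores returned. The sketch below shows how such a probe might look, assuming the publicly documented comments:analyze endpoint; the API key, the helper function name, and the sample comments are illustrative placeholders, not details from this incident.

```python
import os
import requests

# Minimal sketch: query Perspective's TOXICITY attribute for two comments
# and compare the scores. PERSPECTIVE_API_KEY is a placeholder the reader
# must supply; the probe comments below are hypothetical examples.
API_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
API_KEY = os.environ["PERSPECTIVE_API_KEY"]

def toxicity_score(text: str) -> float:
    """Return Perspective's summary TOXICITY score (0.0 to 1.0) for a comment."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(API_URL, params={"key": API_KEY}, json=payload)
    response.raise_for_status()
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

if __name__ == "__main__":
    # A civil-sounding but hostile comment versus plain profanity.
    for comment in ["You people don't belong here.", "What the hell is this?"]:
        print(f"{toxicity_score(comment):.2f}  {comment!r}")
```

A probe of this sort makes the pattern critics described easy to reproduce: politely phrased hostility can score lower than blunt but harmless swearing.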
This discrepancy was seen to highlight a significant flaw in the tool's design, reflecting the biases of its creators and a cultural push for civility that can inadvertently sanitise harmful discourse. Critics argued that this approach to moderation may perpetuate existing biases and fail to address genuinely harmful behaviour.