Mistral 7B generates ethnic cleansing, murder instructions
Occurred: September 2023
The large language model Mistral 7B was found to generate controversial and potentially harmful content, including discussions of ethnic cleansing, racial discrimination, suicide, and violence.
Analysis by AI safety researcher Paul Röttger and reporting by 404 Media found that Mistral 7B will 'readily discuss the benefits of ethnic cleansing, how to restore Jim Crow-style discrimination against Black people, instructions for suicide or killing your wife, and detailed instructions on what materials you’ll need to make crack and where to acquire them.'
Mistral later published a short statement on its website acknowledging the absence of moderation mechanisms in Mistral 7B, saying it was 'looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.'
The findings raised questions about the safety of the system and triggered debate on the relative merits of closed and open-source model development from safety and innovation perspectives.
Investigations, assessments, audits
News, commentary, analysis
Published: October 2023