Mistral 7B generates ethnic cleansing, murder instructions

Occurred: September 2023

Large language model Mistral 7B generates controversial and potentially harmful content, including discussions about ethnic cleansing, discrimination, suicide, and violence.

Analysis by AI safety researcher Paul Röttger and 404 Media discovered that Mistral will 'readily discuss the benefits of ethnic cleansing, how to restore Jim Crow-style discrimination against Black people, instructions for suicide or killing your wife, and detailed instructions on what materials you’ll need to make crack and where to acquire them.'

Mistral later published a short statement on its website acknowledging the absence of moderation mechanisms in Mistral 7B and saying it is 'looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.'

The findings raised questions about the safety of the system, and triggered debate on the relative merits of closed and open-source development from a safety and innovation perspective.

System 🤖

Operator: 404 Media; Perplexity AI
Developer: Mistral AI

Country: France

Sector: Media/entertainment/sports/arts

Purpose: Generate text

Technology: Chatbot; NLP/text analysis; Neural network; Deep learning; Machine learning
Issue: Ethics; Safety

Transparency: Governance

Investigations, assessments, audits 🧐