Chatbot guardrails bypassed using lengthy character suffixes

Occurred: July 2023

The safety rules of Bard, ChatGPT, and Claude can be bypassed in 'virtually unlimited ways', researchers have discovered.

Using jailbreaks developed for open-source systems, researchers from Carnegie Mellon University, the Center for AI Safety, and the Bosch Center for AI demonstrated that automated adversarial attacks, which append strings of characters to the end of user queries, could overcome the chatbots' safety rules and provoke them into producing harmful content, misinformation, or hate speech.

Furthermore, the researchers said they could develop a 'virtually unlimited' number of similar attacks given the automated nature of the jailbreaks.

Databank

Operator: Andy Zou, Zifan Wang, J. Zico Kolter, Matt Fredrikson
Developer: Anthropic; Alphabet/Google; Microsoft; OpenAI
Country: USA
Sector: Technology
Purpose: Generate text
Technology: Chatbot; NLP/text analysis; Neural network; Deep learning; Machine learning
Issue: Mis/disinformation; Safety; Security
Transparency: Governance