Chatbot guardrails bypassed using lengthy character suffixes
Occurred: July 2023
Bard, ChatGPT, and Claude safety rules can be bypassed in 'virtually unlimited ways', researchers have discovered.
Using jailbreaks developed against open-source models, researchers at Carnegie Mellon University, the Center for AI Safety, and the Bosch Center for AI demonstrated that automated adversarial attacks, which append specially crafted character sequences to the end of user queries, can overcome safety rules and provoke chatbots into producing harmful content, misinformation, or hate speech.
Furthermore, the researchers said they could develop a 'virtually unlimited' number of similar attacks given the automated nature of the jailbreaks.
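The mechanics of such an attack can be illustrated with a toy sketch. The real method (Zou et al., 2023) optimises a suffix using greedy coordinate gradient descent over an open-source model's token gradients; the version below is only a hypothetical stand-in that uses random search against an arbitrary scoring function, to show the overall shape of suffix optimisation: mutate one suffix position at a time and keep changes that lower the loss.

```python
import random

def toy_loss(prompt: str) -> float:
    # Hypothetical stand-in for the target model's loss on a forced
    # "compliant" reply; the actual attack computes this from the
    # gradients of an open-source LLM.
    return sum(ord(c) for c in prompt) % 97

def greedy_suffix_search(query: str, vocab: list, suffix_len: int = 8,
                         iters: int = 200, seed: int = 0) -> str:
    """Randomly mutate one suffix position at a time, keeping improvements."""
    rng = random.Random(seed)
    suffix = [rng.choice(vocab) for _ in range(suffix_len)]
    best = toy_loss(query + " " + "".join(suffix))
    for _ in range(iters):
        pos = rng.randrange(suffix_len)       # pick a position to mutate
        trial = suffix[:]
        trial[pos] = rng.choice(vocab)        # try a replacement character
        loss = toy_loss(query + " " + "".join(trial))
        if loss < best:                       # keep only improvements
            best, suffix = loss, trial
    return "".join(suffix)

vocab = list("!#$%&()*+,-./:;?@[]^_{}|~")
adversarial_suffix = greedy_suffix_search("example query", vocab)
full_prompt = "example query" + " " + adversarial_suffix
```

Because the search is fully automated, re-running it with different seeds or vocabularies yields ever more suffixes, which is what makes the supply of such attacks 'virtually unlimited'.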
Databank
Operator: Andy Zou, Zifan Wang, J. Zico Kolter, Matt Fredrikson
Developer: Anthropic; Alphabet/Google; Microsoft; OpenAI
Country: USA
Sector: Technology
Purpose: Generate text
Technology: Chatbot; NLP/text analysis; Neural network; Deep learning; Machine learning
Issue: Mis/disinformation; Safety; Security
Transparency: Governance
System
Research, advocacy
Zou, A., et al. (2023). Universal and Transferable Adversarial Attacks on Aligned Language Models
News, commentary, analysis
https://www.businessinsider.com/ai-researchers-jailbreak-bard-chatgpt-safety-rules-2023-7
https://www.nytimes.com/2023/07/27/business/ai-chatgpt-safety-research.html
https://www.zdnet.com/article/vulnerabilities-in-chatgpt-and-other-chatbots/
https://www.theregister.com/2023/07/27/llm_automated_attacks/
Page info
Type: Issue
Published: November 2023