Occurred: July 2023
Bard, ChatGPT, and Claude safety rules can be bypassed in 'virtually unlimited ways', researchers have discovered.
Using jailbreaks developed against open-source models, researchers from Carnegie Mellon University, the Center for AI Safety, and the Bosch Center for AI demonstrated that automated adversarial attacks, which append sequences of characters to the end of user queries, could overcome safety rules and provoke the chatbots into producing harmful content, misinformation, or hate speech.
Furthermore, the researchers said they could develop a 'virtually unlimited' number of similar attacks given the automated nature of the jailbreaks.
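The attack the researchers describe is automated: rather than hand-crafting a jailbreak, a search procedure optimizes a character suffix that is appended to the user's query. The toy sketch below illustrates that loop in spirit only; `score_fn` is a hypothetical stand-in for the model-based objective (the real method reportedly ranks candidate token substitutions using gradients from open-source models), and the suffix produced here has no adversarial effect on anything.

```python
import random

# Hypothetical illustration of an automated suffix search. Everything here is
# a stand-in: the real attack optimizes against a language model's outputs,
# not this arbitrary scoring function.
VOCAB = list("abcdefghijklmnopqrstuvwxyz!?*")

def score_fn(prompt: str) -> float:
    # Placeholder objective; the published attack instead measures how likely
    # the target model is to comply with the harmful query.
    return sum(ord(c) for c in prompt) % 97

def optimize_suffix(query: str, length: int = 8, steps: int = 50,
                    seed: int = 0) -> str:
    """Greedily mutate a suffix, keeping changes that raise the objective."""
    rng = random.Random(seed)
    suffix = ["!"] * length  # trivial initialization
    best = score_fn(query + " " + "".join(suffix))
    for _ in range(steps):
        pos = rng.randrange(length)            # pick a position to mutate
        cand = suffix.copy()
        cand[pos] = rng.choice(VOCAB)          # try a substitute character
        s = score_fn(query + " " + "".join(cand))
        if s > best:                           # keep improving substitutions
            best, suffix = s, cand
    return "".join(suffix)

adv = optimize_suffix("example query")
```

Because the loop is fully automated, rerunning it with different seeds or initializations yields new suffixes, which is what makes a "virtually unlimited" supply of such attacks plausible.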
Operator: Andy Zou, Zifan Wang, J. Zico Kolter, Matt Fredrikson
Developer: Anthropic; Alphabet/Google; Microsoft; OpenAI
Purpose: Generate text
Technology: Chatbot; NLP/text analysis; Neural network; Deep learning; Machine learning
Issue: Mis/disinformation; Safety; Security
News, commentary, analysis
Published: November 2023