Report: Hidden text able to manipulate ChatGPT
Report: Hidden text able to manipulate ChatGPT
Occurred: August 2024
Report incident 🔥 | Improve page 💁 | Access database 🔢
Chatbots such as ChatGPT can be easily manipulated through the use of invisible text on websites, a New York Times journalist has shown.
Kevin Roose sought to improve his reputation among AI systems after experiencing negative associations with his name due to an article he wrote about a conversation with Microsoft's Bing chatbot. He was advised to embed positive information about himself using white text that would be invisible to human readers but accessible to AI models.
Roose's experiment involved adding coded instructions to his personal website, which persuaded chatbots to start praising him and to overlook previous negative coverage about him.
He even inserted a false claim about winning a Nobel Peace Prize for building orphanages on the moon to test the AI's response. ChatGPT acknowledged this statement as humorous and untrue, but Roose noted that a less absurd claim could potentially deceive the model.
This manipulation technique, which Roose refers to as "Answer Engine Optimization," raises concerns about the vulnerability of AI systems to misinformation.
Aravind Srinivas, CEO of Perplexity AI, echoed these concerns, suggesting that combating such manipulations is akin to a cat-and-mouse game, similar to challenges faced by search engines against SEO tactics.
Roose concluded that if chatbots can be easily influenced by hidden text, their reliability for critical tasks is questionable.
Operator:
Developer: OpenAI
Country: USA
Sector: Media/entertainment/sports/arts
Purpose: Generate text
Technology: Chatbot; Generative AI; Machine learning
Issue: Mis/disinformation
Page info
Type: Issue
Published: September 2024