Study: Devin "AI software engineer" fails at most tasks
Occurred: January 2025
Touted as the "first AI software engineer", Devin has been found to fail at most of the tasks it was given in an independent test, raising major questions about its reliability and about the leadership and marketing of the company behind it.
Researchers from Answer.AI spent a month testing Devin and reported that out of 20 attempted tasks, 14 failed, 3 were inconclusive, and only 3 succeeded.
The AI tool struggled to create new projects, perform research tasks and analyse existing code. It often produced overly complex, unusable solutions and spent days pursuing impossible tasks. It also misinterpreted user instructions and kept running regardless, producing errors in its output.
Specifically, when asked to deploy multiple applications to deployment platform Railway, Devin failed to recognise this wasn't possible and spent over a day attempting nonviable approaches. The tool also became trapped in endless cycles of trying to parse HTML when given web scraping tasks.
Furthermore, the security reviews it carried out produced numerous false positives and hallucinated issues.
The large gap between Devin's promised and actual capabilities suggests inadequate testing before release.
It may also reflect pressure on Cognition AI, Devin's developer and owner, to deliver quick returns to its investors, which include Founders Fund and Khosla Ventures.
Answer.AI's study raises serious questions about Devin's reliability and effectiveness as a product.
It also reflects poorly on Cognition AI's leadership, which is seen to have overhyped and misrepresented the tool's capabilities, casting doubt on the company's integrity and transparency and damaging its credibility.
More broadly, the findings highlight the current limitations of fully autonomous AI software and suggest that AI tools may work better as assistants than as replacements.
Intelligent agent
In intelligence and artificial intelligence, an intelligent agent (IA) is an agent that perceives its environment, takes actions autonomously in order to achieve goals, and may improve its performance with learning or acquiring knowledge.
Source: Wikipedia
Devin
Operator:
Developer: Cognition AI
Country: Global
Sector: Technology
Purpose: Develop software
Technology: Bot/intelligent agent
Issue: Accuracy/reliability; Transparency
Answer.AI. Thoughts on a month with Devin
Page info
Type: Issue
Published: January 2025