Study: Generative AI systems overstate what they know
Occurred: October 2024
A recent study by OpenAI reveals that high-profile generative AI systems tend to overstate their knowledge, leading to user overconfidence and the creation and amplification of misinformation.
What happened
An OpenAI study proposing SimpleQA, a new benchmark for evaluating how generative AI models respond to questions, found that these systems often provide confident but inaccurate answers, despite lacking sufficient information or any understanding of what they are saying.
This tendency to "hallucinate" can foster overconfidence among users and mislead them about the accuracy and reliability of the information presented.
Experts noted that several of OpenAI's own generative AI systems were covered by the study, including GPT-4o, and that a variant of the company's GPT-3 language model performed worst.
Why it happened
The phenomenon of overstatement in generative AI systems is largely attributed to their design, which focuses on generating plausible-sounding text rather than accurately assessing the veracity of the information.
These models are trained on vast datasets and optimised to produce coherent, contextually relevant responses, which can lead them to express unwarranted confidence in what they appear to know.
What it means
Experts say OpenAI's study highlights widespread concerns about the use of generative AI to produce and disseminate false and misleading information.
They also advise users to approach these systems with caution, recognising that even seemingly authoritative responses may be misleading.
Hallucination (artificial intelligence)
In the field of artificial intelligence (AI), a hallucination or artificial hallucination (also called bullshitting, confabulation or delusion) is a response generated by AI that contains false or misleading information presented as fact.
Source: Wikipedia
System
Claude 3
GPT-4o
Operator:
Developer: OpenAI
Country: Global
Sector: Multiple
Purpose: Generate text
Technology: Generative AI; Large language model; Machine learning
Issue: Accuracy/reliability; Mis/disinformation
AIAAIC view
SimpleQA's test questions were chosen specifically because they cause problems for AI models. But it is clear that all major generative AI systems suffer from a marked and possibly intractable tendency to hallucinate.
OpenAI's willingness to admit the fallibility of its products is commendable in this instance - even if other aspects of its transparency are woefully limited, notably regarding its data sources and environmental impacts.
Sadly, Google, Microsoft and others appear only too willing for their generative AI systems to spill out nonsense and for their search engines to regurgitate it - if that's what it takes to buy user attention and advertising clicks.
It will be interesting to see how OpenAI's new SearchGPT engine copes with the delusions of its own language models.
GPT-4o transparency
Research, advocacy
OpenAI. Introducing SimpleQA
OpenAI. Measuring short-form factuality in large language models (pdf)
Page info
Type: Issue
Published: November 2024