Study: Larger language models are less likely to admit ignorance
Occurred: September 2024
The larger a language model is, the more reluctant it is to admit when it does not know the answer to a query, potentially leading to more misinformation, according to researchers.
Researchers from the Universitat Politècnica de València posed thousands of questions spanning mathematics, science, and geography to OpenAI's GPT, Meta's LLaMA, and BigScience's BLOOM model families, then categorised each response as correct, incorrect, or avoidant.
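The sketch below is an illustrative approximation of this kind of three-way scoring, not the study's actual pipeline; the avoidance phrases and the string-matching rule for correctness are assumptions made for the example.

```python
# Minimal sketch (not the authors' actual method): bucket a model's answer
# to a benchmark question as "correct", "incorrect", or "avoidant".
# The avoidance markers and matching rule below are illustrative assumptions.

AVOIDANCE_MARKERS = (
    "i don't know",
    "i do not know",
    "i cannot answer",
    "i'm not sure",
)

def classify_response(response: str, gold_answer: str) -> str:
    """Assign one of the study's three response categories to a single answer."""
    text = response.strip().lower()
    if any(marker in text for marker in AVOIDANCE_MARKERS):
        return "avoidant"
    if gold_answer.strip().lower() in text:
        return "correct"
    return "incorrect"

if __name__ == "__main__":
    print(classify_response("I'm not sure about that.", "Paris"))          # avoidant
    print(classify_response("The capital of France is Paris.", "Paris"))   # correct
    print(classify_response("The capital of France is Lyon.", "Paris"))    # incorrect
```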
The findings indicate that while newer models are more accurate when handling complex problems, they are less transparent regarding their limitations. Earlier versions of LLMs would often acknowledge their inability to answer a question or request additional information, whereas the latest iterations are more likely to provide incorrect answers instead of admitting ignorance.
For example, the study noted a significant decrease in "avoidant" responses from GPT-4 compared to its predecessor, GPT-3.5.
In addition, the researchers highlighted that despite improvements in handling challenging queries, these models still struggle with basic questions.
The phenomenon raises ethical concerns that users may overestimate the capabilities of generative AI systems, which can present misleading information with confidence, potentially fuelling the wider spread of incorrect information.
Large language model
A large language model (LLM) is a computational model capable of language generation or other natural language processing tasks. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.
Source: Wikipedia
Lexin Zhou, Wout Schellaert, Fernando Martínez-Plumed, Yael Moros-Daval, Cèsar Ferri, José Hernández-Orallo. Larger and more instructable language models become less reliable. Nature (2024).
Page info
Type: Issue
Published: October 2024