Galactica large language model

Galactica is a large language model developed by Meta (Facebook) that 'can store, combine and reason about scientific knowledge' in order to help scientists 'summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more'.

Released in November 2022, the system was trained on 106 billion tokens of open-access scientific text and data, including papers, textbooks, scientific websites, encyclopedias, reference material, and knowledge bases.

Operator: Meta/Facebook
Developer: Meta/Facebook
Country: USA; Global
Sector: Technology
Purpose: Assist scientists
Technology: Large language model (LLM); NLP/text analysis; Neural network; Deep learning
Issue: Accuracy/reliability; Bias/discrimination - race, ethnicity, gender, religion; Mis/disinformation; Safety
Transparency: Black box; Marketing

Risks and harms 🛑

Galactica has been criticised for generating authoritative-sounding but often subtly incorrect or biased information, reproducing problems of bias and toxicity seen in other language models, and potentially enabling the spread of misinformation, which is particularly dangerous given its authoritative tone.

Galactica was withdrawn after three days in the wake of criticism from prominent AI researchers and technology commentators. 

Criticism of Galactica focused on the ease with which it could be prompted to generate inaccurate, racist, antisemitic, homophobic, and otherwise offensive articles and research, as well as scientific misinformation.

According to AI researcher and entrepreneur Gary Marcus, Galactica produces 'pitch perfect and utterly bogus imitations of science and math, presented as the real thing.'

For University of Washington biologist Carl Bergstrom, the problem with Galactica is that it 'pretends to be a portal to knowledge. Actually it's just a random bullshit generator'.

Responding to the withdrawal, Meta's chief AI scientist Yann LeCun remarked: 'It's no longer possible to have some fun by casually misusing it. Happy?'

Transparency 🙈

In a nod to the actual and/or potential limitations of its system, Meta notes (pdf) that 'there are no guarantees for truthful or reliable output from language models, even large ones trained on high-quality data like Galactica,' adding that the generated text might appear 'very authentic and highly confident,' but could still be wrong.

For Technology Review's Will Douglas Heaven, suggesting 'that the human-like text such models generate will always contain trustworthy information, as Meta appeared to do in its promotion of Galactica, is reckless and irresponsible'.

The marketing of the system demonstrates 'the all-too-common tendency of AI researchers to exaggerate the abilities of the systems they build', according to AI commentator Alberto Romero.

Page info
Type: System
Published: November 2022
Last updated: May 2024