Flawed AI algorithms grade student essays in multiple US states
Occurred: August 2019
AI-powered systems used to score the essay portions of standardised tests in the US regularly suffer from bias and other issues, according to a media investigation.
Citing research studies, Motherboard reported that so-called 'automated essay scoring engines' used by 21 US states demonstrated bias against certain demographic groups.
These included e-rater, an engine developed and run by the nonprofit Educational Testing Service (ETS), which researchers found gave some students, notably those from mainland China, higher scores than expert human graders did. It also tended to give lower scores to African-American students and to speakers of Arabic, Spanish and Hindi.
Meanwhile, ACCUPLACER, a machine-scored test owned by the College Board, failed to reliably predict the eventual writing grades of female, Asian, Hispanic, and African-American students.
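The cited studies quantify such gaps by comparing engine and human scores for the same essays across demographic groups. Below is a minimal sketch of that style of analysis in Python, using entirely hypothetical data, group labels and a simple standardised mean difference; it is not the researchers' actual code or methodology.

```python
# Hypothetical comparison of machine vs human scores by demographic group.
# Data, group labels and the metric are illustrative assumptions only.
from statistics import mean, stdev

# (group, human_score, machine_score) for the same essay
records = [
    ("group_a", 4.0, 4.5), ("group_a", 3.5, 4.0), ("group_a", 4.5, 5.0),
    ("group_b", 4.0, 3.5), ("group_b", 3.5, 3.0), ("group_b", 4.5, 4.0),
]

def score_gap(group: str) -> float:
    """Mean (machine - human) gap for a group, in pooled-score SD units."""
    diffs = [m - h for g, h, m in records if g == group]
    pooled = [s for _, h, m in records for s in (h, m)]
    return mean(diffs) / stdev(pooled)

for g in ("group_a", "group_b"):
    # Positive: the engine over-scores the group relative to humans;
    # negative: it under-scores them.
    print(g, round(score_gap(g), 2))
```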
Furthermore, some systems can be fooled by nonsense essays stuffed with sophisticated vocabulary, whilst most struggle to judge more nuanced aspects of writing, such as creativity, experts say.
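To see why surface-level scoring can be gamed, consider a deliberately crude toy scorer that rewards long, varied vocabulary while ignoring meaning entirely. No real engine is this simple; the example only illustrates the failure mode.

```python
# Toy scorer: rewards word length and lexical variety, ignores meaning.
def toy_vocab_score(essay: str) -> float:
    words = essay.replace(".", " ").split()
    avg_len = sum(len(w) for w in words) / len(words)   # mean word length
    variety = len(set(words)) / len(words)              # type-token ratio
    return avg_len * variety

coherent = "The essay explains the cause clearly and gives two examples."
nonsense = ("Multitudinous perspicacity engenders quintessential obfuscation "
            "notwithstanding pulchritudinous verisimilitude.")

print(toy_vocab_score(coherent) < toy_vocab_score(nonsense))  # True
```

The nonsense sentence wins purely on word length and variety, the proxies the scorer was built to reward, which is exactly the weakness gamed essays exploit.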
Automated scoring is proving popular with schools as it is cheaper and faster than human grading.
But concerns about accuracy and fairness persist. These systems use natural language processing (NLP) to predict scores from patterns in previously graded essays, so any biases embedded in those human-assigned grades can be learned and reproduced.
The algorithms focus on measurable aspects like vocabulary and sentence length, which may disadvantage English language learners and others who write differently.
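As an illustration of this pipeline, here is a minimal sketch of a feature-based scorer: it extracts a few surface features and fits a linear model to human-assigned scores from a small training set. Everything below, including the features, training texts and scale, is a hypothetical assumption; real engines such as e-rater use far richer features and models.

```python
import re
import numpy as np

def surface_features(essay: str) -> np.ndarray:
    """Crude surface features: length, vocabulary diversity, sentence length."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    n_words = max(len(words), 1)
    return np.array([
        len(words),                                 # essay length
        len({w.lower() for w in words}) / n_words,  # type-token ratio
        len(words) / max(len(sentences), 1),        # mean sentence length
        sum(map(len, words)) / n_words,             # mean word length
    ])

# Hypothetical "previously graded" essays: (text, human score on a 1-6 scale)
training = [
    ("The cat sat on the mat. It was warm. The cat slept.", 2.0),
    ("Recycling reduces waste. Cities that recycle save money and space.", 4.0),
    ("Photosynthesis converts light into chemical energy, sustaining life.", 5.0),
    ("I like school. School is fun. Fun is good.", 1.5),
    ("Renewable energy adoption accelerates when policy and pricing align.", 5.5),
]

X = np.array([surface_features(text) for text, _ in training])
y = np.array([score for _, score in training])
# Least-squares fit with a bias term: score ~ w . features + b
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)

def predict_score(essay: str) -> float:
    return float(np.append(surface_features(essay), 1.0) @ w)

print(round(predict_score("Solar panels lower bills and cut emissions."), 2))
```

Because nothing in such features captures argument quality or factual accuracy, any correlation between writing style and demographic group flows straight into the predicted scores, which is the failure mode the investigation describes.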
Experts suggest that while machine scoring can be useful, it should always be accompanied by human grading to ensure quality control.
Automated essay scoring
Automated essay scoring (AES) is the use of specialized computer programs to assign grades to essays written in an educational setting. It is a form of educational assessment and an application of natural language processing.
Source: Wikipedia
Operator:
Developer: College Board (ACCUPLACER); American Institutes for Research (AIR); Educational Testing Service (ETS)
Country: USA
Sector: Education
Purpose: Assess and score student essays
Technology: NLP/text analysis; Machine learning
Issue: Accuracy/reliability; Bias/discrimination - race, ethnicity; Ethics/values
Transparency: Governance
Ramineni C., Williamson D. (2018). Understanding Mean Score Differences Between the e-rater® Automated Scoring Engine and Humans for Demographically Based Groups in the GRE® General Test
Amorim E., Cançado M., Veloso A. (2018). Automated Essay Scoring in the Presence of Biased Ratings
Elliot N., Deess P., Rudniy A., Joshi K. (2012). Placement of Students into First-Year Writing Courses
https://www.vice.com/en/article/pa7dj9/flawed-algorithms-are-grading-millions-of-students-essays
https://futurism.com/the-byte/states-using-ai-grade-essays-standardized-tests
https://www.mic.com/p/standardized-test-algorithms-used-for-grading-are-reinforcing-human-biases-18683017
https://medium.com/actnext-navigator/explaining-the-grade-auto-essay-scoring-and-crase-e7a3f6ddb6c6
https://www.vox.com/recode/2019/10/20/20921354/ai-algorithms-essay-writing
https://www.wbur.org/cognoscenti/2019/11/12/robo-grading-rich-barlow
https://www.npr.org/2018/06/30/624373367/more-states-opting-to-robo-grade-student-essays-by-computer
https://www.marketplace.org/shows/marketplace-tech/automated-test-grading-multiple-choice-scantron-bubble-sheets-artificial-intelligence-essays-writing-prep/
https://www.resetera.com/threads/flawed-algorithms-are-grading-millions-of-students%E2%80%99-essays.138115/
Page info
Type: Incident
Published: March 2024
Last updated: November 2024