Citation Suggestion

Please use the following Persistent Identifier (PID) to cite this document:
https://nbn-resolving.org/urn:nbn:de:0168-ssoar-102878-3

Scoring German Alternate Uses Items Applying Large Language Models

[journal article]

Saretzki, Janika
Knopf, Thomas
Forthmann, Boris
Goecke, Benjamin
Jaggy, Ann-Kathrin
Benedek, Mathias
Weiss, Selina

Abstract

The alternate uses task (AUT) is the most popular measure when it comes to the assessment of creative potential. Since their implementation, AUT responses have been rated by humans, which is a laborious task and requires considerable resources. Large language models (LLMs) have shown promising performance in automatically scoring AUT responses in English as well as in other languages, but it is not clear which method works best for German data. Therefore, we investigated the performance of different LLMs for the automated scoring of German AUT responses. We compiled German data across five research groups including ~50,000 responses for 15 different alternate uses objects from eight lab and online survey studies (including ~2300 participants) to examine generalizability across datasets and assessment conditions. Following a pre-registered analysis plan, we compared the performance of two fine-tuned, multilingual LLM-based approaches [Cross-Lingual Alternate Uses Scoring (CLAUS) and the Open Creativity Scoring with Artificial Intelligence (OCSAI)] with the Generative Pre-trained Transformer (GPT-4) in scoring (a) the original German AUT responses and (b) the responses translated to English. We found that the LLM-based scorings were substantially correlated with human ratings, with higher relationships for OCSAI followed by GPT-4 and CLAUS. Response translation, however, had no consistent positive effect. We discuss the generalizability of the results across different items and studies and derive recommendations and future directions.
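
The comparison described in the abstract ultimately comes down to how closely each automated originality score tracks the human ratings. The following is a minimal sketch of that correlation step, not the study's actual analysis code; the file name and column names (human_rating, ocsai, gpt4, claus) are illustrative placeholders.

# Sketch: correlate automated AUT originality scores with human ratings.
# All names below are hypothetical; the published study's data layout may differ.
import pandas as pd
from scipy.stats import pearsonr, spearmanr

responses = pd.read_csv("aut_scores.csv")  # one row per AUT response

for scorer in ["ocsai", "gpt4", "claus"]:
    valid = responses[["human_rating", scorer]].dropna()
    r, _ = pearsonr(valid["human_rating"], valid[scorer])
    rho, _ = spearmanr(valid["human_rating"], valid[scorer])
    print(f"{scorer}: Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")

Since the abstract emphasizes generalizability across items, datasets, and assessment conditions, a natural extension of this sketch would be to group the data (e.g., by item or study) before correlating, rather than pooling all responses.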

Classification
General Psychology

Free Keywords
GPT; alternate uses task; German; assessment; automated scoring; creativity; divergent thinking; large language models

Document language
English

Publication Year
2025

Journal
Journal of Intelligence, 13 (2025) 6

DOI
https://doi.org/10.3390/jintelligence13060064

ISSN
2079-3200

Status
Published Version; peer reviewed

Licence
Creative Commons - Attribution 4.0


© 2007 - 2025 Social Science Open Access Repository (SSOAR).
Based on DSpace, Copyright (c) 2002-2022, DuraSpace. All rights reserved.