[journal article]

dc.contributor.author: Saretzki, Janika [de]
dc.contributor.author: Knopf, Thomas [de]
dc.contributor.author: Forthmann, Boris [de]
dc.contributor.author: Goecke, Benjamin [de]
dc.contributor.author: Jaggy, Ann-Kathrin [de]
dc.contributor.author: Benedek, Mathias [de]
dc.contributor.author: Weiss, Selina [de]
dc.date.accessioned: 2025-06-11T08:56:49Z
dc.date.available: 2025-06-11T08:56:49Z
dc.date.issued: 2025 [de]
dc.identifier.issn: 2079-3200 [de]
dc.identifier.uri: https://www.ssoar.info/ssoar/handle/document/102878
dc.description.abstract: The alternate uses task (AUT) is the most popular measure when it comes to the assessment of creative potential. Since their implementation, AUT responses have been rated by humans, which is a laborious task and requires considerable resources. Large language models (LLMs) have shown promising performance in automatically scoring AUT responses in English as well as in other languages, but it is not clear which method works best for German data. Therefore, we investigated the performance of different LLMs for the automated scoring of German AUT responses. We compiled German data across five research groups including ~50,000 responses for 15 different alternate uses objects from eight lab and online survey studies (including ~2300 participants) to examine generalizability across datasets and assessment conditions. Following a pre-registered analysis plan, we compared the performance of two fine-tuned, multilingual LLM-based approaches [Cross-Lingual Alternate Uses Scoring (CLAUS) and the Open Creativity Scoring with Artificial Intelligence (OCSAI)] with the Generative Pre-trained Transformer (GPT-4) in scoring (a) the original German AUT responses and (b) the responses translated to English. We found that the LLM-based scorings were substantially correlated with human ratings, with higher relationships for OCSAI followed by GPT-4 and CLAUS. Response translation, however, had no consistent positive effect. We discuss the generalizability of the results across different items and studies and derive recommendations and future directions. [de]
dc.language: en [de]
dc.subject.ddc: Psychologie [de]
dc.subject.ddc: Psychology [en]
dc.subject.other: GPT; alternate uses task; German; assessment; automated scoring; creativity; divergent thinking; large language models [de]
dc.title: Scoring German Alternate Uses Items Applying Large Language Models [de]
dc.description.review: begutachtet (peer reviewed) [de]
dc.description.review: peer reviewed [en]
dc.identifier.url: localfile:/var/local/dda-files/prod/crawlerfiles/b0b231efb312446ca675c5805fdf1338/b0b231efb312446ca675c5805fdf1338.pdf [de]
dc.source.journal: Journal of Intelligence
dc.source.volume: 13 [de]
dc.publisher.country: CHE [de]
dc.source.issue: 6 [de]
dc.subject.classoz: Allgemeine Psychologie [de]
dc.subject.classoz: General Psychology [en]
dc.identifier.urn: urn:nbn:de:0168-ssoar-102878-3
dc.rights.licence: Creative Commons - Namensnennung 4.0 [de]
dc.rights.licence: Creative Commons - Attribution 4.0 [en]
ssoar.contributor.institution: GESIS [de]
internal.status: formal und inhaltlich fertig erschlossen [de]
dc.type.stock: article [de]
dc.type.document: Zeitschriftenartikel [de]
dc.type.document: journal article [en]
internal.identifier.classoz: 10703
internal.identifier.journal: 2473
internal.identifier.document: 32
internal.identifier.ddc: 150
dc.identifier.doi: https://doi.org/10.3390/jintelligence13060064 [de]
dc.description.pubstatus: Veröffentlichungsversion [de]
dc.description.pubstatus: Published Version [en]
internal.identifier.licence: 16
internal.identifier.pubstatus: 1
internal.identifier.review: 1
ssoar.wgl.collection: true [de]
internal.dda.reference: crawler-deepgreen-1281@@b0b231efb312446ca675c5805fdf1338

