[journal article]

dc.contributor.author: Heseltine, Michael [de]
dc.contributor.author: Clemm von Hohenberg, Bernhard [de]
dc.date.accessioned: 2024-04-03T07:04:00Z
dc.date.available: 2024-04-03T07:04:00Z
dc.date.issued: 2024 [de]
dc.identifier.issn: 2053-1680 [de]
dc.identifier.uri: https://www.ssoar.info/ssoar/handle/document/93576
dc.description.abstract: Large-scale text analysis has grown rapidly as a method in political science and beyond. To date, text-as-data methods rely on large volumes of human-annotated training examples, which place a premium on researcher resources. However, advances in large language models (LLMs) may make automated annotation increasingly viable. This paper tests the performance of GPT-4 across a range of scenarios relevant for analysis of political text. We compare GPT-4 coding with human expert coding of tweets and news articles across four variables (whether text is political, its negativity, its sentiment, and its ideology) and across four countries (the United States, Chile, Germany, and Italy). GPT-4 coding is highly accurate, especially for shorter texts such as tweets, correctly classifying texts up to 95% of the time. Performance drops for longer news articles, and very slightly for non-English text. We introduce a 'hybrid' coding approach, in which disagreements of multiple GPT-4 runs are adjudicated by a human expert, which boosts accuracy. Finally, we explore downstream effects, finding that transformer models trained on hand-coded or GPT-4-coded data yield almost identical outcomes. Our results suggest that LLM-assisted coding is a viable and cost-efficient approach, although consideration should be given to task complexity. [de]
dc.language: en [de]
dc.subject.ddc: Sozialwissenschaften, Soziologie [de]
dc.subject.ddc: Social sciences, sociology, anthropology [en]
dc.subject.other: GPT; Large language models; machine learning; text-as-data [de]
dc.title: Large language models as a substitute for human experts in annotating political text [de]
dc.description.review: begutachtet (peer reviewed) [de]
dc.description.review: peer reviewed [en]
dc.identifier.url: localfile:/var/local/dda-files/prod/crawlerfiles/0011d79c903a44ceb4cd51f695fe4307/0011d79c903a44ceb4cd51f695fe4307.pdf [de]
dc.source.journal: Research and Politics
dc.source.volume: 11 [de]
dc.publisher.country: GBR [de]
dc.source.issue: 1 [de]
dc.subject.classoz: Erhebungstechniken und Analysetechniken der Sozialwissenschaften [de]
dc.subject.classoz: Methods and Techniques of Data Collection and Data Analysis, Statistical Methods, Computer Methods [en]
dc.subject.thesoz: Textanalyse [de]
dc.subject.thesoz: text analysis [en]
dc.subject.thesoz: Automatisierung [de]
dc.subject.thesoz: automation [en]
dc.subject.thesoz: künstliche Intelligenz [de]
dc.subject.thesoz: artificial intelligence [en]
dc.subject.thesoz: Sprache [de]
dc.subject.thesoz: language [en]
dc.subject.thesoz: Modell [de]
dc.subject.thesoz: model [en]
dc.subject.thesoz: Codierung [de]
dc.subject.thesoz: coding [en]
dc.identifier.urn: urn:nbn:de:0168-ssoar-93576-2
dc.rights.licence: Creative Commons - Namensnennung, Nicht-kommerz. 4.0 [de]
dc.rights.licence: Creative Commons - Attribution-NonCommercial 4.0 [en]
internal.status: formal und inhaltlich fertig erschlossen [de]
internal.identifier.thesoz: 10035477
internal.identifier.thesoz: 10037519
internal.identifier.thesoz: 10043031
internal.identifier.thesoz: 10036028
internal.identifier.thesoz: 10036422
internal.identifier.thesoz: 10040334
dc.type.stock: article [de]
dc.type.document: Zeitschriftenartikel [de]
dc.type.document: journal article [en]
internal.identifier.classoz: 10105
internal.identifier.journal: 1137
internal.identifier.document: 32
internal.identifier.ddc: 300
dc.identifier.doi: https://doi.org/10.1177/20531680241236239 [de]
dc.description.pubstatus: Veröffentlichungsversion [de]
dc.description.pubstatus: Published Version [en]
internal.identifier.licence: 32
internal.identifier.pubstatus: 1
internal.identifier.review: 1
internal.dda.reference: crawler-deepgreen-203@@0011d79c903a44ceb4cd51f695fe4307

