
[conference paper]

dc.contributor.author: Schelb, Julian [de]
dc.contributor.author: Ulloa, Roberto [de]
dc.contributor.author: Spitz, Andrea [de]
dc.contributor.editor: Fu, Xiyan [de]
dc.contributor.editor: Fleisig, Eve [de]
dc.date.accessioned: 2024-12-13T08:44:44Z
dc.date.available: 2024-12-13T08:44:44Z
dc.date.issued: 2024 [de]
dc.identifier.isbn: 979-8-89176-097-4 [de]
dc.identifier.uri: https://www.ssoar.info/ssoar/handle/document/98482
dc.description.abstract: Researchers in the political and social sciences often rely on classification models to analyze trends in information consumption by examining browsing histories of millions of webpages. Automated scalable methods are necessary due to the impracticality of manual labeling. In this paper, we model the detection of topic-related content as a binary classification task and compare the accuracy of fine-tuned pre-trained encoder models against in-context learning strategies. Using only a few hundred annotated data points per topic, we detect content related to three German policies in a database of scraped webpages. We compare multilingual and monolingual models, as well as zero and few-shot approaches, and investigate the impact of negative sampling strategies and the combination of URL & content-based features. Our results show that a small sample of annotated data is sufficient to train an effective classifier. Finetuning encoder-based models yields better results than in-context learning. Classifiers using both URL & content-based features perform best, while using URLs alone provides adequate results when content is unavailable. [de]
dc.language: en [de]
dc.subject.ddc: Sozialwissenschaften, Soziologie [de]
dc.subject.ddc: Social sciences, sociology, anthropology [en]
dc.subject.other: Sampling; Multilinguale vs. monolinguale Modelle; Fine-tuning [de]
dc.title: Assessing In-context Learning and Fine-tuning for Topic Classification of German Web Data [de]
dc.description.review: unbekannt [de]
dc.description.review: unknown [en]
dc.identifier.url: https://aclanthology.org/2024.acl-srw.22.pdf [de]
dc.source.collection: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop) [de]
dc.publisher.country: MISC [de]
dc.publisher.city: Bangkok
dc.subject.classoz: Grundlagen, Geschichte, generelle Theorien und Methoden der Sozialwissenschaften [de]
dc.subject.classoz: Basic Research in the Social Sciences [en]
dc.subject.thesoz: Klassifikation [de]
dc.subject.thesoz: classification [en]
dc.subject.thesoz: Modell [de]
dc.subject.thesoz: model [en]
dc.subject.thesoz: Website [de]
dc.subject.thesoz: website [en]
dc.subject.thesoz: Daten [de]
dc.subject.thesoz: data [en]
dc.subject.thesoz: Datengewinnung [de]
dc.subject.thesoz: data capture [en]
dc.subject.thesoz: Politikwissenschaft [de]
dc.subject.thesoz: political science [en]
dc.subject.thesoz: Sozialwissenschaft [de]
dc.subject.thesoz: social science [en]
dc.subject.thesoz: Textanalyse [de]
dc.subject.thesoz: text analysis [en]
dc.identifier.urn: urn:nbn:de:0168-ssoar-98482-6
dc.rights.licence: Creative Commons - Namensnennung 4.0 [de]
dc.rights.licence: Creative Commons - Attribution 4.0 [en]
ssoar.contributor.institution: GESIS [de]
internal.status: formal und inhaltlich fertig erschlossen [de]
internal.identifier.thesoz: 10048972
internal.identifier.thesoz: 10036422
internal.identifier.thesoz: 10064822
internal.identifier.thesoz: 10034708
internal.identifier.thesoz: 10040547
internal.identifier.thesoz: 10054725
internal.identifier.thesoz: 10058540
internal.identifier.thesoz: 10035477
dc.type.stock: incollection [de]
dc.type.document: Konferenzbeitrag [de]
dc.type.document: conference paper [en]
dc.source.pageinfo: 144-158 [de]
internal.identifier.classoz: 10100
internal.identifier.document: 16
internal.identifier.ddc: 300
dc.date.conference: 2024 [de]
dc.description.pubstatus: Veröffentlichungsversion [de]
dc.description.pubstatus: Published Version [en]
internal.identifier.licence: 16
internal.identifier.pubstatus: 1
internal.identifier.review: 4
internal.pdf.valid: false
internal.pdf.wellformed: true
internal.pdf.encrypted: false

