SSOAR Logo
    • Deutsch
    • English
  • Deutsch 
    • Deutsch
    • English
  • Einloggen
SSOAR ▼
  • Home
  • Über SSOAR
  • Leitlinien
  • Veröffentlichen auf SSOAR
  • Kooperieren mit SSOAR
    • Kooperationsmodelle
    • Ablieferungswege und Formate
    • Projekte
  • Kooperationspartner
    • Informationen zu Kooperationspartnern
  • Informationen
    • Möglichkeiten für den Grünen Weg
    • Vergabe von Nutzungslizenzen
    • Informationsmaterial zum Download
  • Betriebskonzept
Browsen und suchen Dokument hinzufügen OAI-PMH-Schnittstelle
JavaScript is disabled for your browser. Some features of this site may not work without it.

Download PDF
Volltext herunterladen

(333.7 KB)

Zitationshinweis

Bitte beziehen Sie sich beim Zitieren dieses Dokumentes immer auf folgenden Persistent Identifier (PID):
https://nbn-resolving.org/urn:nbn:de:0168-ssoar-98482-6

Export für Ihre Literaturverwaltung

Bibtex-Export
Endnote-Export

Statistiken anzeigen
Weiterempfehlen
  • Share via E-Mail E-Mail
  • Share via Facebook Facebook
  • Share via Bluesky Bluesky
  • Share via Reddit reddit
  • Share via Linkedin LinkedIn
  • Share via XING XING

Assessing In-context Learning and Fine-tuning for Topic Classification of German Web Data

[Konferenzbeitrag]

Schelb, Julian
Ulloa, Roberto
Spitz, Andrea

Abstract

Researchers in the political and social sciences often rely on classification models to analyze trends in information consumption by examining browsing histories of millions of webpages. Automated scalable methods are necessary due to the impracticality of manual labeling. In this paper, we model th... mehr

Researchers in the political and social sciences often rely on classification models to analyze trends in information consumption by examining browsing histories of millions of webpages. Automated scalable methods are necessary due to the impracticality of manual labeling. In this paper, we model the detection of topic-related content as a binary classification task and compare the accuracy of fine-tuned pre-trained encoder models against in-context learning strategies. Using only a few hundred annotated data points per topic, we detect content related to three German policies in a database of scraped webpages. We compare multilingual and monolingual models, as well as zero and few-shot approaches, and investigate the impact of negative sampling strategies and the combination of URL & content-based features. Our results show that a small sample of annotated data is sufficient to train an effective classifier. Finetuning encoder-based models yields better results than in-context learning. Classifiers using both URL & content-based features perform best, while using URLs alone provides adequate results when content is unavailable.... weniger

Thesaurusschlagwörter
Klassifikation; Modell; Website; Daten; Datengewinnung; Politikwissenschaft; Sozialwissenschaft; Textanalyse

Klassifikation
Grundlagen, Geschichte, generelle Theorien und Methoden der Sozialwissenschaften

Freie Schlagwörter
Sampling; Multilinguale vs. monolinguale Modelle; Fine-tuning

Titel Sammelwerk, Herausgeber- oder Konferenzband
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Herausgeber
Fu, Xiyan; Fleisig, Eve

Sprache Dokument
Englisch

Publikationsjahr
2024

Erscheinungsort
Bangkok

Seitenangabe
S. 144-158

ISBN
979-8-89176-097-4

Status
Veröffentlichungsversion

Lizenz
Creative Commons - Namensnennung 4.0


GESIS LogoDFG LogoOpen Access Logo
Home  |  Impressum  |  Betriebskonzept  |  Datenschutzerklärung
© 2007 - 2025 Social Science Open Access Repository (SSOAR).
Based on DSpace, Copyright (c) 2002-2022, DuraSpace. All rights reserved.
 

 


GESIS LogoDFG LogoOpen Access Logo
Home  |  Impressum  |  Betriebskonzept  |  Datenschutzerklärung
© 2007 - 2025 Social Science Open Access Repository (SSOAR).
Based on DSpace, Copyright (c) 2002-2022, DuraSpace. All rights reserved.