Citation Suggestion

Please use the following Persistent Identifier (PID) to cite this document:
https://nbn-resolving.org/urn:nbn:de:0168-ssoar-91666-3

Optimized Dictionaries: A Semi-Automated Workflow of Concept Identification in Text-Data

Optimierte Wörterbücher: Ein teilautomatisierter Arbeitsablauf zur Identifizierung von Konzepten in Textdaten
[working paper]

Röth, Leonce
Kaftan, Lea
Saldivia Gonzatti, Daniel

Abstract

Identifying social science concepts and measuring their prevalence and framing in text data has long been a key task for social scientists. Whereas debates about text classification typically contrast different approaches with each other, we propose a workflow that generates optimized dictionaries based on the complementary use of expert dictionaries, machine learning, and topic modeling. We demonstrate the workflow by identifying the concept of "territorial politics" in leading newspapers vis-à-vis parliamentary speeches in Spain (1976-2018) and the UK (1900-2018). We show that our optimized dictionaries outperform each single text-identification technique, with F1-scores around 0.9 on unseen data, even when the unseen data comes from a different political domain (media vs. parliaments). Optimized dictionaries have increasing returns and should be developed as a common good for researchers, to overcome costly particularism.
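
For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of how a dictionary-based classifier can be scored with F1 against hand-labelled documents. It is not the authors' workflow: the term list, the example sentences, and the function names `dictionary_classify` and `f1_score` are hypothetical and chosen only for illustration.

```python
import re


def dictionary_classify(text, terms):
    """Flag a text as on-concept if any dictionary term occurs as a whole word or phrase."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
        for term in terms
    )


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy dictionary for "territorial politics" (an assumption, not from the paper).
TERMS = ["devolution", "regional autonomy", "independence referendum"]

# Hypothetical hand-labelled documents: (text, is_territorial_politics).
DOCS = [
    ("The bill on devolution passed its second reading.", True),
    ("Ministers debated regional autonomy for Catalonia.", True),
    ("The budget raises income tax thresholds.", False),
    ("Calls for self-government grew louder in Scotland.", True),
    ("The independence of the central bank was questioned.", False),
]

labels = [label for _, label in DOCS]
preds = [dictionary_classify(text, TERMS) for text, _ in DOCS]
score = f1_score(labels, preds)
print(f"F1 = {score:.2f}")
```

In this toy setting the dictionary misses the "self-government" sentence, which lowers recall. Adding terms surfaced by other methods is the kind of gap an optimized dictionary is meant to close.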

Keywords
text analysis; mass media; parliamentary debate; dictionary; attention; political agenda

Classification
Research Design

Free Keywords
text-as-data; agenda-setting; salience

Document language
English

Publication Year
2024

Page/Pages
24, 18 p.

Status
Preprint; not reviewed

Licence
Creative Commons - Attribution-NonCommercial 4.0


GESIS LogoDFG LogoOpen Access Logo
Home  |  Legal notices  |  Operational concept  |  Privacy policy
© 2007 - 2025 Social Science Open Access Repository (SSOAR).
Based on DSpace, Copyright (c) 2002-2022, DuraSpace. All rights reserved.
 

 

