Citation Suggestion
Please use the following Persistent Identifier (PID) to cite this document:
https://nbn-resolving.org/urn:nbn:de:0168-ssoar-91666-3
Optimized Dictionaries: A Semi-Automated Workflow of Concept Identification in Text-Data
Optimierte Wörterbücher: Ein teilautomatisierter Arbeitsablauf zur Identifizierung von Konzepten in Textdaten
[working paper]
Abstract
Identifying social science concepts and measuring their prevalence and framing in text data has long been a key task for social scientists. Whereas debates about text classification typically pit different approaches against each other, we propose a workflow that generates optimized dictionaries based on the complementary use of expert dictionaries, machine learning, and topic modeling. We demonstrate the approach by identifying the concept of "territorial politics" in leading newspapers vis-à-vis parliamentary speeches in Spain (1976-2018) and the UK (1900-2018). We show that our optimized dictionaries outperform any single text-identification technique, reaching F1 scores of around 0.9 on unseen data, even when that data comes from a different political domain (media vs. parliaments). Optimized dictionaries yield increasing returns and should be developed as a common good for researchers, overcoming costly particularism.
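The abstract contrasts single text-identification techniques with an optimized, multi-method dictionary and reports performance as F1 scores. The following is a minimal, illustrative sketch of only the baseline step: a plain keyword dictionary applied to documents and scored with F1 (the harmonic mean of precision and recall) against hand-coded labels. The term list, example documents, and labels are hypothetical and are not taken from the paper. The paper's workflow additionally draws on machine learning and topic modeling, which this sketch does not attempt.

```python
import re

# Hypothetical seed terms for "territorial politics"; NOT the authors' dictionary.
TERMS = ["devolution", "autonomy", "independence referendum", "regional government"]

def build_pattern(terms):
    # Compile one case-insensitive regex matching any term as a whole word/phrase.
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def classify(texts, pattern):
    # Label a document 1 (concept present) if any dictionary term occurs, else 0.
    return [1 if pattern.search(t) else 0 for t in texts]

def f1_score(y_true, y_pred):
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Toy, hypothetical documents and hand-coded labels.
    docs = [
        "Parliament debated Scottish devolution today.",
        "The central bank raised interest rates.",
        "Catalan autonomy was discussed in Madrid.",
        "Regional elections were held in Andalusia.",
    ]
    gold = [1, 0, 1, 1]
    pred = classify(docs, build_pattern(TERMS))
    print(pred, round(f1_score(gold, pred), 3))  # prints: [1, 0, 1, 0] 0.8
```

In this toy run the dictionary misses the fourth document because none of its terms appear, so recall drops to 2/3 and F1 to 0.8. Gaps of this kind in a hand-built term list are what a complementary, multi-method workflow is meant to address.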
Keywords
text analysis; mass media; parliamentary debate; dictionary; attention; political agenda
Classification
Research Design
Free Keywords
text-as-data; agenda-setting; salience
Document language
English
Publication Year
2024
Page/Pages
24, 18 p.
Status
Preprint; not reviewed