Coding Text Answers to Open-ended Questions: Human Coders and Statistical Learning Algorithms Make Similar Mistakes

He, Zhoushanyue; Schonlau, Matthias

Citation Suggestion

Please use the following Persistent Identifier (PID) to cite this document:
https://nbn-resolving.org/urn:nbn:de:0168-ssoar-71619-9

[journal article]

Abstract

Text answers to open-ended questions are often manually coded into one of several predefined categories or classes. More recently, researchers have begun to employ statistical models to automatically classify such text responses. It is unclear whether such automated coders and human coders find the same type of observations difficult to code or whether humans and models might be able to compensate for each other’s weaknesses. We analyze correlations between estimated error probabilities of human and automated coders and find: 1) Statistical models have higher error rates than human coders. 2) Automated coders (models) and human coders tend to make similar coding mistakes. Specifically, the correlation between the estimated coding error of a statistical model and that of a human is comparable to that of two humans. 3) Two very different statistical models give highly correlated estimated coding errors. Therefore, a) the choice of statistical model does not matter, and b) having a second automated coder would be redundant.
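As a rough illustration of the kind of comparison the abstract describes, the sketch below computes a Pearson correlation between two vectors of per-answer estimated error probabilities. It is not the authors' procedure: the abstract does not say how the error probabilities are estimated or which correlation measure is used, so the function name `pearson`, the variable names, and the numbers are all hypothetical and chosen only for demonstration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical estimated error probabilities for five text answers.
# These values are invented for illustration and are not from the paper.
human_error = [0.05, 0.10, 0.40, 0.02, 0.30]
model_error = [0.10, 0.20, 0.55, 0.05, 0.35]

r = pearson(human_error, model_error)
print(f"correlation of estimated coding errors: {r:.3f}")
```

A value of `r` close to 1 would indicate that the two coders find the same answers difficult, which is the sense in which the abstract calls a second automated coder redundant.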

Keywords
survey research; error; statistical method; analysis; data preparation; coding

Classification
Methods and Techniques of Data Collection and Data Analysis, Statistical Methods, Computer Methods

Free Keywords
open-ended question; manual coding; automatic coding; text classification; text answer

Document language
English

Publication Year
2021

Page/Pages
p. 103-120

Journal
Methods, data, analyses : a journal for quantitative methods and survey methodology (mda), 15 (2021) 1

Issue topic
The Use of Open-ended Questions in Surveys

DOI
https://doi.org/10.12758/mda.2020.10

ISSN
2190-4936

Status
Published Version; peer reviewed

Licence
Creative Commons - Attribution 4.0