EndNote export

%T Coding Text Answers to Open-ended Questions: Human Coders and Statistical Learning Algorithms Make Similar Mistakes
%A He, Zhoushanyue
%A Schonlau, Matthias
%J Methods, data, analyses: a journal for quantitative methods and survey methodology (mda)
%N 1
%P 103-120
%V 15
%D 2021
%K open-ended question; manual coding; automatic coding; text classification; text answer
%@ 2190-4936
%> https://nbn-resolving.org/urn:nbn:de:0168-ssoar-71619-9
%X Text answers to open-ended questions are often manually coded into one of several predefined categories or classes. More recently, researchers have begun to employ statistical models to automatically classify such text responses. It is unclear whether such automated coders and human coders find the same type of observations difficult to code, or whether humans and models might be able to compensate for each other’s weaknesses. We analyze correlations between estimated error probabilities of human and automated coders and find: 1) Statistical models have higher error rates than human coders. 2) Automated coders (models) and human coders tend to make similar coding mistakes. Specifically, the correlation between the estimated coding error of a statistical model and that of a human is comparable to that of two humans. 3) Two very different statistical models give highly correlated estimated coding errors. Therefore, a) the choice of statistical model does not matter, and b) having a second automated coder would be redundant.
%C DEU
%G en
%9 journal article
%W GESIS - http://www.gesis.org
%~ SSOAR - http://www.ssoar.info