Citation Suggestion
Please use the following Persistent Identifier (PID) to cite this document:
https://doi.org/10.1515/JOS-2017-0006
Three Methods for Occupation Coding Based on Statistical Learning
[journal article]
Abstract
Occupation coding, an important task in official statistics, refers to coding a respondent's text answer into one of many hundreds of occupation codes. To date, occupation coding is still at least partially conducted manually, at great expense. We propose three methods for automatic coding: combining separate models for the detailed occupation codes and for aggregate occupation codes, a hybrid method that combines a duplicate-based approach with a statistical learning algorithm, and a modified nearest neighbor approach. Using data from the German General Social Survey (ALLBUS), we show that the proposed methods improve on both the coding accuracy of the underlying statistical learning algorithm and the coding accuracy of duplicates where duplicates exist. Further, we find defining duplicates based on ngram variables (a concept from text mining) is preferable to one based on exact string matches.
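The hybrid idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not specify the details. The sketch assumes that two answers count as duplicates when they share the same set of word n-grams (here unigrams by default), that a duplicate receives the most frequent code among its matching training answers, and that any other answer is passed to a fallback function standing in for the statistical learning algorithm. The names `word_ngrams`, `HybridCoder`, and `fallback`, as well as the example codes, are hypothetical.

```python
import re
from collections import Counter, defaultdict


def word_ngrams(text, n=1):
    """Return the set of word n-grams of an answer (case-insensitive)."""
    tokens = re.findall(r"\w+", text.lower())
    return frozenset(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


class HybridCoder:
    """Duplicate lookup on n-gram sets, with a fallback for unseen answers.

    Assumption: `fallback` is any callable mapping an answer string to a code,
    e.g. the predict function of a separately trained classifier.
    """

    def __init__(self, fallback, n=1):
        self.fallback = fallback
        self.n = n
        self.votes = defaultdict(Counter)  # n-gram set -> code frequencies

    def fit(self, answers, codes):
        for answer, code in zip(answers, codes):
            self.votes[word_ngrams(answer, self.n)][code] += 1
        return self

    def predict(self, answer):
        key = word_ngrams(answer, self.n)
        if key in self.votes:
            # Duplicate found: use the most frequent code among its matches.
            return self.votes[key].most_common(1)[0][0]
        # No duplicate: defer to the statistical learning algorithm.
        return self.fallback(answer)


# Made-up training answers and placeholder codes, for illustration only.
coder = HybridCoder(fallback=lambda answer: "code_unknown").fit(
    ["Software developer", "developer (software)", "Nurse"],
    ["code_A", "code_A", "code_B"],
)
print(coder.predict("DEVELOPER, software"))  # matched via its n-gram set
print(coder.predict("Astronaut"))            # handled by the fallback
```

Under this assumed definition, "DEVELOPER, software" is a duplicate of "Software developer" even though the two strings differ in case, punctuation, and word order, which an exact string match would not detect.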
Keywords
official statistics; ALLBUS; occupation; algorithm; method; coding
Classification
Methods and Techniques of Data Collection and Data Analysis, Statistical Methods, Computer Methods
Free Keywords
Automated coding; Machine learning; ISCO-88
Document language
English
Publication Year
2017
Page/Pages
p. 101-122
Journal
Journal of Official Statistics, 33 (2017) 1
ISSN
2001-7367
Status
Published Version; peer reviewed
Licence
Creative Commons - Attribution-Noncommercial-No Derivative Works 4.0