[journal article]

dc.contributor.author: Gweon, Hyukjun [de]
dc.contributor.author: Schonlau, Matthias [de]
dc.contributor.author: Kaczmirek, Lars [de]
dc.contributor.author: Blohm, Michael [de]
dc.contributor.author: Steiner, Stefan [de]
dc.date.accessioned: 2019-02-28T09:53:50Z
dc.date.available: 2019-02-28T09:53:50Z
dc.date.issued: 2017 [de]
dc.identifier.issn: 2001-7367 [de]
dc.identifier.uri: https://www.ssoar.info/ssoar/handle/document/61576
dc.description.abstract: Occupation coding, an important task in official statistics, refers to coding a respondent's text answer into one of many hundreds of occupation codes. To date, occupation coding is still at least partially conducted manually, at great expense. We propose three methods for automatic coding: combining separate models for the detailed occupation codes and for aggregate occupation codes, a hybrid method that combines a duplicate-based approach with a statistical learning algorithm, and a modified nearest neighbor approach. Using data from the German General Social Survey (ALLBUS), we show that the proposed methods improve on both the coding accuracy of the underlying statistical learning algorithm and the coding accuracy of duplicates where duplicates exist. Further, we find defining duplicates based on ngram variables (a concept from text mining) is preferable to one based on exact string matches. [de]
dc.language: en [de]
dc.subject.ddc: Sozialwissenschaften, Soziologie [de]
dc.subject.ddc: Social sciences, sociology, anthropology [en]
dc.subject.other: Automated coding; Machine learning; ISCO-88 [de]
dc.title: Three Methods for Occupation Coding Based on Statistical Learning [de]
dc.description.review: begutachtet (peer reviewed) [de]
dc.description.review: peer reviewed [en]
dc.source.journal: Journal of Official Statistics
dc.source.volume: 33 [de]
dc.publisher.country: DEU
dc.source.issue: 1 [de]
dc.subject.classoz: Erhebungstechniken und Analysetechniken der Sozialwissenschaften [de]
dc.subject.classoz: Methods and Techniques of Data Collection and Data Analysis, Statistical Methods, Computer Methods [en]
dc.subject.thesoz: official statistics [en]
dc.subject.thesoz: Codierung [de]
dc.subject.thesoz: ALLBUS [en]
dc.subject.thesoz: Beruf [de]
dc.subject.thesoz: occupation [en]
dc.subject.thesoz: Algorithmus [de]
dc.subject.thesoz: ALLBUS [de]
dc.subject.thesoz: amtliche Statistik [de]
dc.subject.thesoz: algorithm [en]
dc.subject.thesoz: Methode [de]
dc.subject.thesoz: method [en]
dc.subject.thesoz: coding [en]
dc.rights.licence: Creative Commons - Attribution-Noncommercial-No Derivative Works 4.0 [en]
dc.rights.licence: Creative Commons - Namensnennung, Nicht kommerz., Keine Bearbeitung 4.0 [de]
ssoar.contributor.institution: GESIS [de]
internal.status: formal und inhaltlich fertig erschlossen [de]
internal.identifier.thesoz: 10035039
internal.identifier.thesoz: 10035431
internal.identifier.thesoz: 10036452
internal.identifier.thesoz: 10040334
internal.identifier.thesoz: 10060522
internal.identifier.thesoz: 10038285
dc.type.stock: article [de]
dc.type.document: journal article [en]
dc.type.document: Zeitschriftenartikel [de]
dc.source.pageinfo: 101-122 [de]
internal.identifier.classoz: 10105
internal.identifier.journal: 201
internal.identifier.document: 32
internal.identifier.ddc: 300
dc.identifier.doi: https://doi.org/10.1515/JOS-2017-0006 [de]
dc.description.pubstatus: Published Version [en]
dc.description.pubstatus: Veröffentlichungsversion [de]
internal.identifier.licence: 20
internal.identifier.pubstatus: 1
internal.identifier.review: 1
ssoar.wgl.collection: true [de]
internal.dda.reference: excel-database-6@@journal article%%78
ssoar.urn.registration: false [de]
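
The hybrid method named in the abstract (a duplicate-based approach combined with a statistical learning algorithm, with duplicates defined on ngram variables rather than exact strings) could be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. The names `ngram_key` and `HybridCoder`, the choice of word n-gram sets as the duplicate key, the majority vote among duplicates, and the `fallback` callable standing in for the statistical learning algorithm are all assumptions; the answers and the codes "A" and "B" are toy placeholders, not ALLBUS data or ISCO-88 codes.

```python
import re
from collections import Counter, defaultdict


def ngram_key(text, n=1):
    """Order-insensitive key: the set of word n-grams in a lowercased answer.

    Assumption: two answers count as duplicates when these sets are equal.
    """
    tokens = re.findall(r"\w+", text.lower())
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return frozenset(grams)


class HybridCoder:
    """Use the codes of n-gram duplicates when they exist, else a fallback model."""

    def __init__(self, fallback, n=1):
        self.fallback = fallback  # hypothetical callable: answer text -> code
        self.n = n
        self.codes_by_key = defaultdict(Counter)

    def fit(self, answers, codes):
        # Count how often each code was assigned to each n-gram key.
        for answer, code in zip(answers, codes):
            self.codes_by_key[ngram_key(answer, self.n)][code] += 1
        return self

    def predict(self, answer):
        key = ngram_key(answer, self.n)
        # A non-empty key seen in training is treated as a duplicate:
        # return the most frequent code among those training answers.
        if key and key in self.codes_by_key:
            return self.codes_by_key[key].most_common(1)[0][0]
        # No duplicate found: defer to the statistical learning model.
        return self.fallback(answer)


# Toy usage with made-up answers and placeholder codes.
coder = HybridCoder(fallback=lambda answer: "FALLBACK").fit(
    ["shop assistant", "Assistant, shop", "teacher"],
    ["A", "A", "B"],
)
print(coder.predict("SHOP assistant"))  # matched as an n-gram duplicate
print(coder.predict("pilot"))           # no duplicate, handled by the fallback
```

With an exact-string definition, "shop assistant" and "Assistant, shop" would not count as duplicates of each other; the set-based key treats them as the same answer, which is the kind of difference the abstract's comparison refers to.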

