[journal article]

dc.contributor.author: Gussenbauer, Johannes [de]
dc.contributor.author: Templ, Matthias [de]
dc.contributor.author: Fritzmann, Siro [de]
dc.contributor.author: Kowarik, Alexander [de]
dc.date.accessioned: 2025-07-08T14:43:25Z
dc.date.available: 2025-07-08T14:43:25Z
dc.date.issued: 2024 [de]
dc.identifier.issn: 1999-4893 [de]
dc.identifier.uri: https://www.ssoar.info/ssoar/handle/document/103428
dc.description.abstract: Synthetic data generation methods are used to transform the original data into privacy-compliant synthetic copies (twin data). With our proposed approach, synthetic data can be simulated in the same size as the input data or in any size, and in the case of finite populations, even the entire population can be simulated. The proposed XGBoost-based method is compared with known model-based approaches to generate synthetic data using a complex survey data set. The XGBoost method shows strong performance, especially with synthetic categorical variables, and outperforms other tested methods. Furthermore, the structure and relationship between variables are well preserved. The tuning of the parameters is performed automatically by a modified k-fold cross-validation. If exact population margins are known, e.g., cross-tabulated population counts on age class, gender and region, the synthetic data must be calibrated to those known population margins. For this purpose, we have implemented a simulated annealing algorithm that is able to use multiple population margins simultaneously to post-calibrate a synthetic population. The algorithm is, thus, able to calibrate simulated population data containing cluster and individual information, e.g., about persons in households, at both person and household level. Furthermore, the algorithm is efficiently implemented so that the adjustment of populations with many millions or more persons is possible. [de]
dc.language: en [de]
dc.subject.ddc: Sozialwissenschaften, Soziologie [de]
dc.subject.ddc: Social sciences, sociology, anthropology [en]
dc.subject.other: complex survey data; synthetic populations; XGBoost; calibration of populations; EU-SILC 2013 [de]
dc.title: Simulation of Calibrated Complex Synthetic Population Data with XGBoost [de]
dc.description.review: begutachtet (peer reviewed) [de]
dc.description.review: peer reviewed [en]
dc.source.journal: Algorithms
dc.source.volume: 17 [de]
dc.publisher.country: CHE [de]
dc.source.issue: 6 [de]
dc.subject.classoz: Erhebungstechniken und Analysetechniken der Sozialwissenschaften [de]
dc.subject.classoz: Methods and Techniques of Data Collection and Data Analysis, Statistical Methods, Computer Methods [en]
dc.subject.thesoz: Privatsphäre [de]
dc.subject.thesoz: privacy [en]
dc.subject.thesoz: Daten [de]
dc.subject.thesoz: data [en]
dc.subject.thesoz: Simulation [de]
dc.subject.thesoz: simulation [en]
dc.subject.thesoz: Datenaufbereitung [de]
dc.subject.thesoz: data preparation [en]
dc.subject.thesoz: statistische Methode [de]
dc.subject.thesoz: statistical method [en]
dc.subject.thesoz: Methodenforschung [de]
dc.subject.thesoz: methodological research [en]
dc.identifier.urn: urn:nbn:de:0168-ssoar-103428-4
dc.rights.licence: Creative Commons - Namensnennung 4.0 [de]
dc.rights.licence: Creative Commons - Attribution 4.0 [en]
ssoar.contributor.institution: FDB [de]
internal.status: formal und inhaltlich fertig erschlossen [de]
internal.identifier.thesoz: 10055257
internal.identifier.thesoz: 10034708
internal.identifier.thesoz: 10037865
internal.identifier.thesoz: 10040524
internal.identifier.thesoz: 10052184
internal.identifier.thesoz: 10052193
dc.type.stock: article [de]
dc.type.document: Zeitschriftenartikel [de]
dc.type.document: journal article [en]
dc.source.pageinfo: 1-28 [de]
internal.identifier.classoz: 10105
internal.identifier.journal: 3396
internal.identifier.document: 32
internal.identifier.ddc: 300
dc.identifier.doi: https://doi.org/10.3390/a17060249 [de]
dc.description.pubstatus: Veröffentlichungsversion [de]
dc.description.pubstatus: Published Version [en]
internal.identifier.licence: 16
internal.identifier.pubstatus: 1
internal.identifier.review: 1
internal.pdf.valid: false
internal.pdf.wellformed: true
internal.pdf.encrypted: false

