SSOAR Logo
    • Deutsch
    • English
  • English 
    • Deutsch
    • English
  • Login
SSOAR ▼
  • Home
  • About SSOAR
  • Guidelines
  • Publishing in SSOAR
  • Cooperating with SSOAR
    • Cooperation models
    • Delivery routes and formats
    • Projects
  • Cooperation partners
    • Information about cooperation partners
  • Information
    • Possibilities of taking the Green Road
    • Grant of Licences
    • Download additional information
  • Operational concept
Browse and search Add new document OAI-PMH interface
JavaScript is disabled for your browser. Some features of this site may not work without it.

Download PDF
Download full text

(external source)

Citation Suggestion

Please use the following Persistent Identifier (PID) to cite this document:
https://doi.org/10.18335/region.v6i3.276

Exports for your reference manager

Bibtex export
Endnote export

Display Statistics
Share
  • Share via E-Mail E-Mail
  • Share via Facebook Facebook
  • Share via Bluesky Bluesky
  • Share via Reddit reddit
  • Share via Linkedin LinkedIn
  • Share via XING XING

Demonstrating the utility of machine learning innovations in address matching to spatial socio-economic applications

[journal article]

Comber, Sam

Abstract

The last decade has heralded an unprecedented rise in the number, frequency and availability of data sources. Yet they are often incomplete, meaning data fusion is required to enhance their quality and scope. In the context of spatial analysis, address matching is critical to enhancing household soc... view more

The last decade has heralded an unprecedented rise in the number, frequency and availability of data sources. Yet they are often incomplete, meaning data fusion is required to enhance their quality and scope. In the context of spatial analysis, address matching is critical to enhancing household socio-economic and demographic characteristics. Matching administrative, commercial, or lifestyle data sources to items such as household surveys has the potential benefits of improving data quality, enabling spatial data visualisation, and the lowering of respondent burden in household surveys. Typically when a practitioner has high quality data, unique identifiers are used to facilitate a direct linkage between household addresses. However, real-world databases are often absent of unique identifiers to enable a one-to-one match. Moreover, irregularities between the text representations of potential matches mean extensive cleaning of the data is often required as a pre-processing step. For this reason, practitioners have traditionally relied on two linkage techniques for facilitating matches between the text representations of addresses that are broadly divided into deterministic or mathematical approaches. Deterministic matching consists of constructing hand-crafted rules that classify address matches and non-matches based on specialist domain knowledge, while mathematical approaches have increasingly adopted machine learning techniques for resolving pairs of addresses to a match. In this notebook we demonstrate methods of the latter by demonstrating the utility of machine learning approaches to the address matching work flow. To achieve this, we construct a predictive model that resolves matches between two small datasets of restaurant addresses in the US. While the problem case may seem trivial, the intention of the notebook is to demonstrate an approach that is reproducible and extensible to larger data challenges. Thus, in the present notebook, we document an end-to-end pipeline that is replicable and instructive towards assisting future address matching problem cases faced by the regional scientist.  ... view less

Classification
Natural Science and Engineering, Applied Sciences

Document language
English

Publication Year
2019

Page/Pages
p. 17-37

Journal
Region: the journal of ERSA, 6 (2019) 3

ISSN
2409-5370

Status
Published Version; peer reviewed

Licence
Creative Commons - Attribution-NonCommercial 4.0


GESIS LogoDFG LogoOpen Access Logo
Home  |  Legal notices  |  Operational concept  |  Privacy policy
© 2007 - 2025 Social Science Open Access Repository (SSOAR).
Based on DSpace, Copyright (c) 2002-2022, DuraSpace. All rights reserved.
 

 


GESIS LogoDFG LogoOpen Access Logo
Home  |  Legal notices  |  Operational concept  |  Privacy policy
© 2007 - 2025 Social Science Open Access Repository (SSOAR).
Based on DSpace, Copyright (c) 2002-2022, DuraSpace. All rights reserved.