Citation Suggestion

Please use the following Persistent Identifier (PID) to cite this document:
https://nbn-resolving.org/urn:nbn:de:0168-ssoar-88562-5

Explaining classification performance and bias via network structure and sampling technique

[conference paper]

Espín-Noboa, Lisette
Karimi, Fariba
Ribeiro, Bruno
Lerman, Kristina
Wagner, Claudia

Abstract

Social networks are very important carriers of information. For instance, the political leaning of our friends can serve as a proxy to identify our own political preferences. This explanatory power is leveraged in many scenarios ranging from business decision-making to scientific research to infer missing attributes using machine learning. However, factors affecting the performance and the direction of bias of these algorithms are not well understood. To this end, we systematically study how structural properties of the network and the training sample influence the results of collective classification. Our main findings show that (i) mean classification performance can empirically and analytically be predicted by structural properties such as homophily, class balance, edge density and sample size, (ii) small training samples are enough for heterophilic networks to achieve high and unbiased classification performance, even with imperfect model estimates, (iii) homophilic networks are more prone to bias issues and low performance when group size differences increase, (iv) when sampling budgets are small, partial crawls achieve the most accurate model estimates, and degree sampling achieves the highest overall performance. Our findings help practitioners to better understand and evaluate their results when sampling budgets are small or when no ground-truth is available.
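
The structural properties named in the abstract can be illustrated on a toy graph. The following is a minimal sketch, not the authors' code. It assumes that homophily can be approximated as the fraction of edges joining same-class nodes, that class balance can be summarised by the minority-class fraction, and that degree sampling means selecting the highest-degree nodes first. The paper's own definitions may differ, and all function names and the toy data are hypothetical.

```python
from collections import defaultdict


def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a class label (assumed definition)."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def minority_fraction(labels):
    """Share of nodes that belong to the smallest class."""
    counts = defaultdict(int)
    for cls in labels.values():
        counts[cls] += 1
    return min(counts.values()) / len(labels)


def edge_density(edges, n_nodes):
    """Observed edges divided by possible edges in an undirected simple graph."""
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


def degree_sample(edges, budget):
    """Return the `budget` highest-degree nodes (ties broken by node id)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return ranked[:budget]


# Hypothetical toy network: four nodes of class "a", two of class "b".
labels = {0: "a", 1: "a", 2: "a", 3: "a", 4: "b", 5: "b"}
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (3, 4), (4, 5)]

homophily = edge_homophily(edges, labels)
balance = minority_fraction(labels)
density = edge_density(edges, len(labels))
training_nodes = degree_sample(edges, budget=2)
```

In this toy graph five of the six edges join same-class nodes, so the network is strongly homophilic under the assumed definition, and the two seed nodes picked by degree both belong to the majority class.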

Keywords
social network; data capture; sample; data quality

Classification
Methods and Techniques of Data Collection and Data Analysis, Statistical Methods, Computer Methods

Free Keywords
Collective inference; Input bias; Network structure; Output bias; Relational classification; Research; Sampling bias

Document language
English

Publication Year
2021

Journal
Applied Network Science, 6 (2021)

DOI
https://doi.org/10.1007/s41109-021-00394-3

ISSN
2364-8228

Status
Published Version; peer reviewed

Licence
Creative Commons - Attribution 4.0


GESIS LogoDFG LogoOpen Access Logo
Home  |  Legal notices  |  Operational concept  |  Privacy policy
© 2007 - 2025 Social Science Open Access Repository (SSOAR).
Based on DSpace, Copyright (c) 2002-2022, DuraSpace. All rights reserved.
 

 

