journal.prometfpz.unizg.hr
Promet - Traffic&Transportation journal

Accelerating Discoveries in Traffic Science


PUBLISHED
31.05.2021
LICENSE
Copyright (c) 2024 Patiphan Kaewwichian

Multiclass Classification with Imbalanced Datasets for Car Ownership Demand Model – Cost-Sensitive Learning

Authors:

Patiphan Kaewwichian
Faculty of Engineering, Rajamangala University of Technology Isan, Khon Kaen, Thailand

Keywords: cost matrix, decision trees, k-nearest neighbors (kNN), cross-validation, tour-based model

Abstract

Travel demand predictions from a household car ownership model are often used to support transportation policy. If the machine learning model behind them is trained on imbalanced data, the training process is adversely affected. The household car ownership data obtained from the study project for expressway preparation in Khon Kaen Province (2015) form an imbalanced dataset: the minority class has fewer members than the other classes, which biases the classification. Consequently, this research proposes balancing the datasets with cost-sensitive learning applied to decision tree, k-nearest neighbors (kNN), and naive Bayes algorithms. Before the 3-class model was built, k-fold cross-validation was applied to the datasets to obtain the true positive rate (TPR) used to validate model performance. The results indicate that kNN gave the best minority-class prediction performance among the algorithms compared. For the rural and suburban area types, which have very different imbalance ratios, the TPR before balancing was 46.9% and 46.4%, respectively. After balancing the data (MCN1), the TPR values were 84.4% and 81.4%, respectively.
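The idea of cost-sensitive classification with a cost matrix, evaluated by per-class TPR, can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the toy probabilities, and the cost values are all hypothetical, and the actual MCN1 cost matrix is not given in this abstract. The sketch estimates class probabilities from kNN votes, then predicts the class with the lowest expected misclassification cost instead of the most probable class.

```python
import math
from collections import Counter


def knn_probabilities(train_X, train_y, x, k, classes):
    """Estimate P(class | x) as the vote share among the k nearest
    training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return {c: votes.get(c, 0) / k for c in classes}


def min_expected_cost_class(probs, cost):
    """Return the predicted class j that minimizes
    sum_i P(i | x) * cost[i][j], where cost[i][j] is the cost of
    predicting j when the true class is i."""
    classes = list(probs)
    return min(classes,
               key=lambda j: sum(probs[i] * cost[i][j] for i in classes))


def per_class_tpr(y_true, y_pred, classes):
    """True positive rate (recall) of each class."""
    tpr = {}
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        tpr[c] = hits / total if total else float("nan")
    return tpr


# Hypothetical 3-class example; class 2 plays the role of the minority class.
classes = [0, 1, 2]
probs = {0: 0.6, 1: 0.3, 2: 0.1}

# Uniform costs reproduce the ordinary "most probable class" decision.
uniform_cost = {i: {j: 0 if i == j else 1 for j in classes} for i in classes}

# Assumed cost matrix: missing a true class-2 case costs 10 instead of 1.
skewed_cost = {i: {j: 0 if i == j else (10 if i == 2 else 1) for j in classes}
               for i in classes}

plain_prediction = min_expected_cost_class(probs, uniform_cost)      # class 0
cost_sensitive_prediction = min_expected_cost_class(probs, skewed_cost)  # class 2
```

With the skewed matrix, the expected cost of predicting class 2 is 0.6 + 0.3 = 0.9, lower than 1.3 for class 0 and 1.6 for class 1, so the decision shifts toward the minority class. This shift is the mechanism by which a cost matrix can raise minority-class TPR; in a k-fold setting, `per_class_tpr` would be computed on each held-out fold.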

How to Cite
Kaewwichian, P. 2021. Multiclass Classification with Imbalanced Datasets for Car Ownership Demand Model – Cost-Sensitive Learning. Traffic&Transportation Journal. 33, 3 (May 2021), 361-371. DOI: https://doi.org/10.7307/ptt.v33i3.3728.


