
AUTHOR(S):

Danut Dragos Damian, Felicia Anisoara Michis, Luminita Moraru

 

TITLE

Assessing the Effect of Sensor Data Quality and Origin on Driver Behavior Modeling

ABSTRACT

Although several datasets for driver behaviour analysis are publicly available, inconsistencies in their formats hinder objective comparisons of machine learning and deep learning classification models. Such comparisons are essential for evaluating model performance under realistic driving conditions. To address this limitation, we present a benchmark study that investigates the influence of sensor data quality and source variability on the performance of driver behaviour classification models. Raw inertial measurement unit (IMU) data were analyzed from a publicly available driving dataset (Shardul, 2021; 14,250 samples) and from our smaller proprietary dataset, Drive2025 (6,375 samples), both collected under similar experimental conditions. A combined dataset was also built from the two. Classification was performed on sequences of statistical feature vectors describing the dynamic behaviour of the vehicle: mean, variance, standard deviation, skewness, and kurtosis. The Random Forest (RF) and Support Vector Machine (SVM) algorithms were implemented as representative machine learning models, and Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) architectures were used as their deep learning counterparts. The experiments showed that data quality and origin affect the performance of driver behaviour classification models. The CNN and LSTM models were the most robust and stable, achieving accuracies of 0.80/0.81 and F1-scores of 0.85/0.84 on the proprietary dataset, and accuracies of 0.83/0.84 with F1-scores of 0.87/0.88 on the public dataset. On the combined dataset, they reached 0.82/0.85 accuracy and 0.84/0.84 F1-score, indicating strong generalization ability. The RF and SVM models performed better on the public (Mendeley) dataset and showed a moderate drop on the proprietary dataset, attributed to natural noise and data variability. The CNN and LSTM models have considerable potential for improvement through appropriate filtering and preprocessing, which could significantly boost accuracy and prediction stability in real driving scenarios.
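
The statistical feature vectors named in the abstract (mean, variance, standard deviation, skewness, kurtosis) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `window_features` and `feature_sequence`, the window length, the step size, the use of population (biased) moments, and the choice of excess kurtosis are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from typing import List, Sequence


def window_features(window: Sequence[float]) -> List[float]:
    """Return [mean, variance, std, skewness, kurtosis] for one window.

    Assumption: population (biased) central moments and excess kurtosis;
    the abstract does not say which estimators were used.
    """
    n = len(window)
    if n < 2:
        raise ValueError("window needs at least two samples")
    mean = sum(window) / n
    # Second, third, and fourth central moments.
    m2 = sum((x - mean) ** 2 for x in window) / n
    m3 = sum((x - mean) ** 3 for x in window) / n
    m4 = sum((x - mean) ** 4 for x in window) / n
    std = math.sqrt(m2)
    if m2 == 0.0:
        # Constant window: shape statistics are undefined; use 0.0 by convention.
        skewness, kurtosis = 0.0, 0.0
    else:
        skewness = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis (0 for a Gaussian)
    return [mean, m2, std, skewness, kurtosis]


def feature_sequence(signal: Sequence[float], window: int, step: int) -> List[List[float]]:
    """Slide a fixed-length window over one IMU channel and collect features.

    `window` and `step` are hypothetical parameters, not values from the paper.
    """
    if window < 2 or step < 1:
        raise ValueError("window must be >= 2 and step must be >= 1")
    return [
        window_features(signal[start:start + window])
        for start in range(0, len(signal) - window + 1, step)
    ]
```

Applied to each accelerometer and gyroscope axis in turn, such a function would yield one sequence of five-element vectors per channel; how the channels are combined before classification is left open here because the abstract does not describe it.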

KEYWORDS

driver behaviour, IMU signals, Random Forest, CNN, LSTM

 

Cite this paper

Danut Dragos Damian, Felicia Anisoara Michis, Luminita Moraru. (2026) Assessing the Effect of Sensor Data Quality and Origin on Driver Behavior Modeling. International Journal of Mechanical Engineering, 11, 1-10

 

Copyright © 2026. The author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0.