Title: A modified hidden Markov model for outlier detection in multivariate datasets

Authors: G. Manoharan; K. Sivakumar

Addresses: Department of Mathematics, Sathyabama Institute of Science and Technology, Chennai, India; Department of Mathematics, Saveetha School of Engineering, SIMATS, Saveetha University, Chennai, India

Abstract: Data processing is an essential part of any field. More than 80% of the study effort is focused on collecting meaningful information from the vast amounts of data available. However, to minimise calculation time and improve accuracy, it is necessary to identify any unused, redundant, or irrelevant data in the dataset. Building a data warehouse to separate out homogeneous data is difficult, and it is inefficient in terms of deployment cost and performance. Meanwhile, heterogeneous data takes longer to process because of uneven data samples and missing data. Thus, identifying the data class and balancing the data are critical for improving the performance of classification models. Outlier detection is the process of detecting irrelevant, missing, or unequal data samples in a large database. The goal of this study is to employ a modified hidden Markov model to find such outliers in a large dataset. This method improves the performance of classification models by reducing computation time and increasing classification accuracy. The proposed model is experimentally verified and compared with established techniques such as random forest and decision tree models.

Keywords: outlier detection; hidden Markov model; HMM; classification; support vector machine; SVM; random forest; RF; decision tree; DT.

DOI: 10.1504/IJESMS.2024.138287

International Journal of Engineering Systems Modelling and Simulation, 2024 Vol.15 No.3, pp.121 - 128

Received: 10 Dec 2021
Accepted: 15 Feb 2022

Published online: 01 May 2024
