International Journal of Data Analysis Techniques and Strategies (IJDATS) Inderscience Publishers - linking academia, business and industry through research

Forthcoming and Online First Articles

International Journal of Data Analysis Techniques and Strategies

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Analysis Techniques and Strategies (21 papers in press)

Regular Issues

Tackling Data Sparsity: A Hybrid Filtering Paradigm for Robust Recommender Systems
by Umarani Srikanth, Lijetha C. Jaffrin, Sushmitha Srikanth, Shyam Ramesh
Abstract: This paper introduces a hybrid recommender system approach that aims to tackle the problems associated with data sparsity, also referred to as the cold start problem. Recommender systems use user preferences to filter information. To improve recommendation accuracy, our method combines user-based and content-based collaborative filtering techniques. More specifically, content-based filtering takes over when there is little data. When there is a high degree of user similarity, user-based collaborative filtering is used to maximise accuracy by suggesting diverse items. This strategy can be used in a variety of fields, including e-commerce, music, books, and film.
Keywords: hybrid filtering; recommender systems; collaborative filtering; Singular Value Decomposition (SVD); machine learning; k-nearest neighbors.
DOI: 10.1504/IJDATS.2025.10064959
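
The switching rule described in this abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' method: the function names, the cut-offs `min_ratings` and `min_similarity`, and the toy rating vectors are all hypothetical, and cosine similarity is assumed as the user-similarity measure.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def choose_strategy(
    target: Sequence[float],
    others: Sequence[Sequence[float]],
    min_ratings: int = 3,         # assumed sparsity cut-off (hypothetical)
    min_similarity: float = 0.5,  # assumed similarity cut-off (hypothetical)
) -> str:
    """Pick content-based filtering for sparse users, user-based CF otherwise."""
    n_rated = sum(1 for r in target if r > 0)
    best = max((cosine_similarity(target, o) for o in others), default=0.0)
    if n_rated < min_ratings or best < min_similarity:
        return "content_based"
    return "user_based_cf"


# Toy data: a rating of 0 means "not rated".
others = [[5, 4, 0, 1], [4, 5, 1, 0]]
new_user = [5, 0, 0, 0]       # one rating only: too sparse
active_user = [5, 4, 1, 1]    # several ratings, close to existing users

print(choose_strategy(new_user, others))
print(choose_strategy(active_user, others))
```

With these toy vectors the sparse user is routed to content-based filtering and the active user to user-based collaborative filtering.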

Using the BIRCH Algorithm and Affinity Propagation, an Advanced Descriptor for Video Processing
by Jayanta Mondal, Jitendra Pramanik, Satyajit Pattnaik, Bijay Paikaray
Abstract: Video summarisation is the most preferred approach to administer the augmentation of video content. In the area of video surveillance and object and intrusion detection, video summarisation has been the most popular as it provides concise and less redundant information. As video content continues to expand quickly, an automatic video summary would be helpful for anyone who wants to learn more quickly and with less effort. Most existing methods depend on various network architectures to train a single score predictor for shot rating and selection. This study addresses the issue of video summarisation, which involves selecting significant frames to succinctly and comprehensively express the material of the original film. The current paper presents a comparative study of the application of the advanced texture descriptors Local Phase Quantization (LPQ), Local Ternary Pattern (LTP), and Local Binary Pattern (LBP) in the process of video summarisation. Clusters of key frames have been extracted by two unsupervised learning algorithms, Affinity Propagation and BIRCH. The performance of the proposed video summarising method has shown good trial results.
Keywords: Local Ternary Pattern; Local Binary Pattern; Affinity Propagation; Local Phase Quantization; BIRCH; Key Feature.
DOI: 10.1504/IJDATS.2025.10065080

Jasminum Grandiflorum Flower Images Classification: Deep Learning and Transfer Learning Models with the Influence of Preprocessing via Contours and Convex Hull in Agritech 4.0
by A. Anushya, Savita Shiwani, Ayush Shrivastava
Abstract: This study specifically centres on classifying Jasminum Grandiflorum flowers through the utilisation of deep learning and transfer learning techniques. To achieve this, the research leverages advanced deep learning models such as CNNs, along with transfer learning using pre-trained architectures like VGG16, VGG19, ResNet18, and Vision Transformer (ViT). CNN stood out, excelling after extensive iterations. VGG16 and VGG19 showed solid performance with fewer iterations, indicating competence in shorter training times. ResNet18 achieved the highest accuracy with fewer iterations but took longer (about 8 minutes per epoch), balancing efficiency and accuracy. ViT impressed with high accuracy despite needing more iterations, showcasing prowess in intricate learning and pattern recognition in the Jasminum Grandiflorum flower image dataset. The intended outcome of this research is to contribute significantly to the advancement of Agritech 4.0 by establishing a robust methodology for accurate Jasminum Grandiflorum flower classification without human participation.
Keywords: Convolutional Neural Network; VGG16; VGG19; ResNet18; Vision Transformer; Jasminum Grandiflorum; AgriTech 4.0.
DOI: 10.1504/IJDATS.2025.10065343

Prediction Model for AQI through Indian Vedic Science: Knowledge Management Technique to Control Pollution and for Sustainable Society
by Rohit Rastogi, Saransh Chauhan, Yash Rastogi, Vaibhav Aggarwal, Utkarsh Agrawal, Richa Singh
Abstract: The paper outlines how Indian Vedic Sciences can be used for preventing and predicting the ill effects of pollution on the human body and nature through adopting simple methods of Yajna and Hawan in daily routine. Compared with other resources such as land and water, air is considered the most important resource. Evidence shows that Indian Vedic Sciences primarily focus on prana vayu, which means the air that we breathe. The authors' team and the Central Pollution Control Board (CPCB) have gathered data and readings from the last four months through sensors installed in an isolated as well as a non-isolated environment that was continuously under the effects of Yajna and Hawan.
Keywords: AQI; PM 2.5; PM 10; Climate Change; Yajna; Mantra; Human Health; Economic Growth; Knowledge Management; Knowledge Pyramid; Sustainable Society; Knowledge Levels and Extractions.
DOI: 10.1504/IJDATS.2025.10065356

Adaptive Parking Demand Prediction Using Discrete Time Based Dynamic Markov Chain
by Semeneh H. Bayih, Surafel Tilahun
Abstract: The demand for urban parking is increasing rapidly and has become a significant traffic issue in densely populated metropolitan regions. Prediction of parking demand is crucial for reducing traffic jams and decreasing greenhouse gas emissions. It is also essential to the development of parking facilities and price adjustments in urban parking planning. Most earlier studies developed models for parking demand prediction using historical data, without updating the demand data. Furthermore, those demand predictions do not consider the effect of parking pricing, although pricing affects the demand in a given parking platform. To address this issue, we consider three categories of parking demand based on price-based preference. A dynamic non-homogeneous Markov chain with discrete time and discrete states is used to predict the parking demand. An adaptive (learning) approach is proposed to make the Markov chain dynamic and to adapt to changes in the demand environment. A numerical example is presented that demonstrates the prediction from data collection and incorporates the adaptive strategy so that the system learns new changes.
Keywords: Prediction; Parking demand; Markov chain model; Adaptive Learning.
DOI: 10.1504/IJDATS.2025.10065504
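
To give a rough feel for what an adaptive discrete-time, discrete-state Markov chain can look like, here is a minimal Python sketch. It is not the model from the paper: the three state names, the class `AdaptiveMarkovChain`, and the forgetting factor `decay` are hypothetical choices made for illustration, and exponentially discounted transition counts are simply one assumed way of letting the transition probabilities drift over time.

```python
from typing import Dict, List

STATES = ["low", "medium", "high"]  # hypothetical demand categories


class AdaptiveMarkovChain:
    """Discrete-time, discrete-state chain whose transition estimates adapt over time."""

    def __init__(self, states: List[str], decay: float = 0.9) -> None:
        self.states = states
        self.decay = decay  # assumed forgetting factor: older evidence fades
        # Uniform pseudo-counts so every row starts as a valid distribution.
        self.counts: Dict[str, Dict[str, float]] = {
            s: {t: 1.0 for t in states} for s in states
        }

    def update(self, prev: str, curr: str) -> None:
        """Discount old evidence in row `prev`, then record the new transition."""
        row = self.counts[prev]
        for t in row:
            row[t] *= self.decay
        row[curr] += 1.0

    def transition_probs(self, state: str) -> Dict[str, float]:
        """Normalise one row of counts into probabilities."""
        row = self.counts[state]
        total = sum(row.values())
        return {t: c / total for t, c in row.items()}

    def predict(self, state: str) -> str:
        """Return the most likely next state."""
        probs = self.transition_probs(state)
        return max(probs, key=probs.get)


chain = AdaptiveMarkovChain(STATES)
observed = ["low", "medium", "high", "high", "medium", "high", "high"]
for prev, curr in zip(observed, observed[1:]):
    chain.update(prev, curr)

print(chain.transition_probs("medium"))
print(chain.predict("medium"))
```

Because each update discounts earlier counts, recent observations weigh more than old ones, which is the sense in which this toy chain adapts to a changing demand environment.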

Emoji Translation for Sentiment Analysis in Algerian Arabic Dialect
by Samira Hazmoune, Fateh Bougamouza
Abstract: Sentiment analysis (SA) is an important natural language processing (NLP) field that involves extracting sentiments and opinions from text data. Although SA has advanced significantly, its application to dialectal Arabic text presents challenges due to linguistic nuances and resource constraints. This research investigates the incorporation of emojis into SA for Algerian Arabic dialect (AAD), marking the first exploration of its kind in this area. Specifically, we focus on emoji translation, building upon prior studies highlighting emojis' potential in SA and their translation into meaningful words or sentences as a preprocessing approach. We evaluate the impact of this approach on enhancing sentiment classification in AAD text, specifically focusing on customer reviews of Algerian telephone operators. After preprocessing, including various emoji translation techniques, we employ transfer learning by fine-tuning the DziriBERT model on a compiled Algerian dialect dataset. Our results demonstrate promising outcomes and offer novel conclusions and perspectives in AAD sentiment analysis.
Keywords: Sentiment Analysis; Emoji Translation; DziriBERT; Algerian Arabic Dialect; Transfer Learning; Emoji Categorisation; Emoji Handling; Customer Reviews.
DOI: 10.1504/IJDATS.2025.10065720
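
Emoji translation as a preprocessing step can be sketched in a few lines of Python. This is an assumed, simplified illustration rather than the authors' pipeline: the lexicon `EMOJI_LEXICON` and the function `translate_emojis` are hypothetical, and English words are used here purely for readability, whereas the paper targets Algerian Arabic dialect text.

```python
from typing import Dict

# Hypothetical emoji-to-word lexicon; a real one would be much larger.
EMOJI_LEXICON: Dict[str, str] = {
    "\U0001F600": "happy",  # grinning face
    "\U0001F621": "angry",  # pouting face
    "\U0001F44D": "good",   # thumbs up
}


def translate_emojis(text: str, lexicon: Dict[str, str]) -> str:
    """Replace each known emoji with its word, then collapse extra whitespace."""
    for emoji, word in lexicon.items():
        text = text.replace(emoji, f" {word} ")
    return " ".join(text.split())


review = "service \U0001F44D\U0001F44D but slow \U0001F621"
print(translate_emojis(review, EMOJI_LEXICON))
# prints: service good good but slow angry
```

The translated text can then be passed to a tokenizer and classifier like any other review; emojis missing from the lexicon are left untouched in this sketch.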

Analysis of Online Transaction using Data Analytics Framework
by Md Nurul Islam, Iqbal Hasan, Shahla Tarannum, S.M.K. Quadri
Abstract: Nowadays, online transactions have become a necessity for everyone; they generate a vast amount of data, which requires a robust framework to ensure their security, efficiency, and reliability. This research paper explores the application of advanced data analytics techniques to ensure and enhance the confidentiality of the online transaction process. Using this analytics framework, we can analyse patterns, detect anomalies, and predict trends in online transaction data. An online survey was conducted to collect data from one lakh consumers of different geographical regions and diverse working groups. Descriptive analysis has been used in this study to ascertain the present state of online transactions. The study investigates the significance of feature selection, anomaly detection, and clustering methods in identifying patterns, trends, and potential fraud indicators within online transactions. The findings of this research contribute to the growing body of knowledge on leveraging data analytics frameworks to extract valuable insights from online transaction data.
Keywords: Online transactions; Data analytics; Online payment; Security; E-commerce; Analysis.
DOI: 10.1504/IJDATS.2025.10065866

Enhanced Pearl Millet Mildew Disease Detection using Ensemble Deep Learning Methods
by Aditya Kumar, Jainath Yadav
Abstract: Millet crops play a crucial role in global food security, providing sustenance to millions of people worldwide. Mildew disease poses a significant threat to pearl millet, a staple crop in many regions, impacting both its quality and yield. Detecting diseases in millet crops is crucial for maintaining both the quality and quantity of agricultural yields. However, limited labelled data and the expense of manual data labelling pose significant challenges in this domain. To address these issues, we suggest a deep learning ensemble framework that utilises the potential of multiple models for enhanced disease detection accuracy. Ensembles integrate the strengths of individual deep-learning models to improve overall performance and robustness. DenseNet121 and ResNet50, two deep learning models, were selected as the base models in our ensemble. Preliminary experimental results demonstrate the effectiveness of our ensemble approach, with an impressive accuracy of 96.6%.
Keywords: Millet crops; Leaf disease; Precision agriculture; Deep learning; Machine learning.
DOI: 10.1504/IJDATS.2025.10066101

A Comprehensive and Comparative Analysis of Deep Learning Models for Textual Sentiment Analysis
by Leyla Mammadova
Abstract: Analyzing public opinion can provide important insights. Sentiment analysis is a textual data analysis technique that identifies subjective information expressed by people or groups, including views and emotions. By advancing natural language processing and deep learning approaches, sentiment analysis advances our comprehension of human language. In this study, we provide a thorough evaluation and comparative analysis of various deep learning models, such as RNNs, LSTMs, and GRUs, and their bidirectional variants. We carry out the analysis on four publicly accessible datasets: imdb_reviews, the Twitter Sentiment Dataset, the Emotions dataset and ag_news_subset. We assess the accuracy of six well-known deep learning models. Our experimental results demonstrate that bidirectional architectures generally perform better than their unidirectional equivalents. The bidirectional models consistently achieved the highest accuracy across different datasets.
Keywords: Sentiment analysis; RNN; LSTM; GRU; Bidirectional RNN; Bidirectional LSTM; Bidirectional GRU.
DOI: 10.1504/IJDATS.2025.10066752

Volatility Modelling and Forecasting in Stock Markets: a Machine Learning Approach
by Soumen Ghosh, Kuntal Mukherjee, Biswajit Jana, Syed Saif Ahmed, Mohammad Aasif, Sayel Munsi
Abstract: This research explores the application of various models for stock price prediction, including ARIMA, LSTM, SARIMAX, and a hybrid SARIMAX-LSTM, highlighting their importance in the post-pandemic financial landscape. The study emphasises the limitations of traditional methods and the necessity of time-series analysis for understanding stock price patterns. It focuses on the impact of COVID-19 on financial markets and assesses the reliability of these models in unpredictable conditions. The methodology involves data selection, pre-processing, model parameter tuning, and performance evaluation. The research establishes a framework for the implementation of these models, underscoring the need for parameter optimisation to enhance accuracy. Ultimately, the study shows that LSTM performs better than the other models and offers valuable insights into using advanced forecasting techniques for improved investment strategies in the evolving stock market.
Keywords: LSTM; ARIMA; Moving average (MA); Autoregressive (AR); Mean Absolute Error (MAE); Mean Squared Error (MSE); Root Mean Squared Error (RMSE); R-squared (R²).
DOI: 10.1504/IJDATS.2025.10066979

Analysing Social Media Sentiment: Unravelling the Trichotomy of Positive, Negative, and Neutral Sentiments in User Comments
by Reddy Sowmya Vangumalla, Yoonsuk Choi
Abstract: This study explores sentiment analysis of Twitter comments, focusing on neutral, negative, and positive attitudes. By applying advanced techniques such as feature engineering, data pre-processing, and machine learning, we aim to derive actionable insights. Our approach involves setting project goals, selecting data sources, and establishing infrastructure for analysis. After pre-processing, we utilise support vector machines (SVMs) for classification and evaluate the model with metrics like accuracy, precision, recall, and F1-score. Visualisation tools, including ROC curves and confusion matrices, help interpret the results. We discuss the limitations and suggest future research to enhance performance and address data quality issues.
Keywords: Data Analysis; Decision-Making; Feature Engineering; Machine Learning; Sentiment Analysis; Social Media; Support Vector Machines; Twitter; Text Preprocessing.
DOI: 10.1504/IJDATS.2025.10067092

Climatic Data Analysis Using Machine Learning and Correlation with Human Health
by Rohit Rastogi, Prabhinav Mishra, Rayush Jain, Prateek Singh
Abstract: Climatic data analysis and effects on human health is a data science project that focuses on the analysis and interpretation of climatic data to gain valuable insights into past and present climate patterns. The project utilises advanced data analytics techniques like regression models to process and analyse large-scale climatic datasets, enabling the identification of trends and patterns that contribute to a deeper understanding of climate dynamics. The primary objectives of this project are to investigate climate change phenomena, assess the impact of climatic change on human health, and predict the variation of spread of diseases as per the different climatic conditions. By employing various statistical models, machine learning algorithms, and visualisation tools, the project aims to uncover hidden relationships within the data and provide evidence-based findings for policymakers, researchers, and stakeholders. To achieve these goals, the project leverages diverse sources of climatic data, including maximum and minimum temperature records, rainfall and humidity measurements, atmospheric pressure data etc.
Keywords: Jupyter NoteBook; Pandas; Linear Regression.
DOI: 10.1504/IJDATS.2025.10067196

Enhancing Healthcare Predictions with Deep Learning: Insights from Image Datasets
by W. A. W. A. Bakar, Muhammad Amierusyahmi Zuhairi, Mustafa Man, Nur Laila Najwa Josdi
Abstract: This study builds on prior research to improve healthcare predictions using deep learning with image datasets. Unlike numerical data, image processing in deep learning faces challenges such as large data volume, storage demands, computational resource needs, manual annotation, class imbalance, overfitting, and scalability issues. Effective solutions require robust pre-processing, efficient computation, thoughtful model design, and ethical considerations. This paper presents a 3-layer deep convolutional neural network (DCNN) to integrate image datasets, achieving 99% accuracy on benchmark datasets, including the brain tumour medical dataset (BTMD). The model employs dropout regularisation and incorporates numeric data insights, showcasing adaptability across different healthcare data types. These results highlight the significant potential of DCNNs for high-accuracy predictions in medical applications.
Keywords: Image dataset; Deep Convolutional Neural Network (DCNN); Brain Tumor Medical Dataset (BTMD); Prediction accuracy; Healthcare applications.
DOI: 10.1504/IJDATS.2026.10067673

Comparing Discrimination and Calibration Performance of Two Flexible Link Functions in Discrete Survival Models
by Susan Maposa, Alphonce Bere, Caston Sigauke, Charles Chimedza
Abstract: This study provides the first direct comparison between the Pareto and Logit-power link functions within discrete survival models, evaluated alongside three commonly used links. We assess their discrimination and calibration using simulated and real-life datasets with varying skewness. Simulations included 100 data sets with symmetric, right-skewed, and left-skewed distributions, and bootstrapping was applied for robust evaluation. The results show that cloglog excels in discrimination, while logit offers superior calibration. The Pareto family demonstrates robust performance, making it a reliable secondary option. However, Logit-power performs poorly in calibration and is unsuitable for discrete survival models. The study offers practical recommendations for implementing the Logit-power link, addressing its complex estimation process, and suggests a grid search approach using information criteria for parameter optimization. These findings highlight the importance of carefully selecting link functions in discrete survival modeling.
Keywords: Calibration; Discrimination; Discrete survival models; Families of link functions.
DOI: 10.1504/IJDATS.2025.10067711

A Data Analytics Approach to Improve the International Supply of Metal Inputs in the Metal-Mechanical Sector in Colombia
by Lina Mayerly Lozano Suarez, Fabian Alexander Torres Cardenas, Eduardo Rangel Diaz
Abstract: The metal-mechanical sector is vital to Colombia's industry, significantly contributing to economic development. To ensure its growth, this sector must enhance competitiveness, particularly in managing metal supplies, often imported. Analyzing imports is crucial, but data from DIAN is unprocessed and provided in extensive Excel microdata packages, requiring processing. This study proposes a data analytics approach combining descriptive and predictive analyses. Descriptive analysis using DIAN's 2023 data identifies key import factors: major supplier countries, main customs entries, locations of top importers, and common transport modes. Predictive analysis using regression, decision trees, and k-NN models predicts import quantities based on FOB value, with regression showing the highest accuracy. This approach helps companies understand factors affecting imports, such as transportation, customs management, cargo handling, and preparation, facilitating better decision-making and competitiveness.
Keywords: Data Analytics; supply chain; machine learning; international supply; import modeling; regression model; decision tree; k-NN; metal-mechanical sector; CRISP-DM; dashboard; indicators.
DOI: 10.1504/IJDATS.2026.10067935

An Empirical Examination of Classification Algorithms and Resampling Strategies for Dealing with Imbalanced Datasets: a Comparative Analysis
by Himani Deshpande, Leena Ragha
Abstract: Imbalanced datasets can lead to biased models and inaccurate predictions, making imbalance a crucial issue to address. This research comprehensively analyses issues, approaches and evaluation parameters for machine learning models based on imbalanced datasets. The literature suggests that data imbalance handling methods fall into three broad categories, namely pre-processing methods, cost-sensitive learning, and ensemble methods. Experiments are conducted to test popular classifiers in combination with three pre-processing methods, namely clustered SMOTE, random over sampling, and scaled values, on seven standard imbalanced datasets. The results of the study show that the Random Forest classifier with the Random Over Sampling pre-processing method performed best for most of the datasets, with precision values from 0.68 to 1, AUC values from 0.83 to 1, and prediction accuracy from 76.1% to 99.8%. This study highlights that the choice of the evaluation metric and the pre-processing method can have a significant impact on the performance of the classifier.
Keywords: Imbalanced data; Over sampling; Undersampling; Classification; Cost sensitive; Ensemble Learning; Feature weighing; Instance Weighing.
DOI: 10.1504/IJDATS.2025.10068244
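
Random over sampling, one of the pre-processing methods named in this abstract, can be illustrated with a short Python sketch. It uses only the standard library and a toy dataset; the function name `random_oversample`, the `seed` parameter, and the data are assumptions for illustration and do not reproduce the experimental setup of the paper.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def random_oversample(
    X: Sequence[Sequence[float]], y: Sequence[Hashable], seed: int = 0
) -> Tuple[List[Sequence[float]], List[Hashable]]:
    """Duplicate randomly chosen minority rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # All original rows belonging to this class.
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced data: four samples of class 0, one of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In practice the resampling would normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.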

Design a Modern Scheme for Machine Learning-Based Detection of Image Forgery
by Emir Kalik, Ayad Adhab
Abstract: The rapid growth and development of information technology have led to the emergence of numerous methods that are used for digital image forgery. Thus, manipulating digital images to achieve a negative or positive purpose has become easy. The use of advanced methods in forgery has increased the difficulty of detecting the nature of the images, whether they are original or forged, especially when using classical methods. Therefore, many researchers are interested in this field, making it a popular research direction for researchers. In this paper, we will introduce an intelligent approach to designing a method for digital image forgery detection by using machine learning. This proposal seeks to train an intelligent model to discern between altered and original images by examining the essential features of the images. The results demonstrated that it achieved superior performance and high accuracy when it came to detecting forgeries in digital images.
Keywords: Convolutional Neural Network; CNN; Deep Reinforcement Learning; DRL; Forgery; Image Detection; Manipulation.
DOI: 10.1504/IJDATS.2026.10068720

Development of G-Causality by Utilising Hybridisation of Bootstrap Method for Assessing Tourism Impacts in Malaysia
by Anton Abdulbasah Kamil, Muhamad Safiih Lola
Abstract: This study aims to develop and examine the causality direction of non-economic short and long-term factors in the Malaysian tourism industry using a new hybrid Bootstrap-Granger Model. The proposed method was validated with a non-economic factor dataset from the World Bank (tourist arrival, population, air transport, and carbon dioxide emission) in the tourism industry. The model's effectiveness was tested and analysed by comparing it against the actual Granger model using statistical tests such as unit root, Johansen cointegration, and Granger causality tests. The empirical results revealed that, compared to the Granger model, the proposed counterpart generated smaller mean square error and root mean square error values for non-economic factor datasets. Furthermore, the results also revealed that tourist arrival and other determinants were co-integrated. In other words, the proposed model enhanced Granger causality accuracy and produced more robust, precise, and accurate results, supporting the promotion of overall economic activities.
Keywords: Bootstrap method; Granger Causality; Hybridization; Tourism Impact and non-economy factors; Malaysia.
DOI: 10.1504/IJDATS.2026.10069162

A Cross-Sectional Analysis of Severe SARS Cases Evolution in a Brazilian Municipality using Data Mining Techniques
by Silvano Júnior, William Oliveira, Luis Neto, Hugo Souza, Yúri Sant’Anna
Abstract: The first severe acute respiratory syndrome (SARS) outbreak occurred in China in 2002, followed by other coronavirus variants like MERS (2012), 2019-nCOV (2019), and Omicron (2020). While data mining (DM) has been widely used for SARS classification and decision-making, most studies overlook socioeconomic factors such as income and education. This study applies the cross-industry standard process for data mining (CRISP-DM) framework and DM techniques to predict severe SARS case progression in Recife, Brazil. Using open datasets, it incorporates attributes related to symptoms, pre-existing conditions, and socioeconomic indicators. Three healthcare experts participated in the analysis. Results showed that the apriori algorithm performed best in rule induction, while the decision tree slightly outperformed logistic regression. Notably, correlations emerged between severe case progression and socioeconomic data, underscoring the importance of integrating social determinants in disease classification models. These findings provide insights for improving predictive models and public health strategies.
Keywords: SARS; data mining; machine learning; CRISP-DM.
DOI: 10.1504/IJDATS.2026.10069755

Improving Public Health Outcomes through Accurate UV Index Forecasting: ARIMA and ANN Approach in Songkhla Province
by Korakot Wichitsa-nguan Jetwanna, Orathai Yongseng, Supanan Kongmee, Tanongsak Sukyareak, Wasun Bunyod, Chidchanok Choksuchat, Nuntouchaporn Prateepausanont, Thanathip Limna
Abstract: This research forecasts the UV Index using five weather parameters: temperature, dew point, humidity, wind speed, and atmospheric pressure in Muang District, Songkhla Province, over a period of 1,000 days (from March 6, 2021, to November 30, 2023). It employs a combined ARIMA and ANN model for prediction. The ARIMA model outputs were further used to forecast the UV index with ANN, yielding high accuracy. The dataset was processed to handle missing data using median values. Results showed that the ARIMA model had a MAPE of 0.04% to 26.49%, an MAE of 0.3% to 4.3%, and an RMSE of 0.4% to 5.4%. Meanwhile, the ANN model demonstrated an accuracy of 94.2%.
Keywords: UV Index Prediction; ARIMA; Artificial Neural Networks; Weather Parameters; Public Health Outcomes.
DOI: 10.1504/IJDATS.2026.10070269

MUSEM: Combining Multi-UpSampling and Ensemble learning Methods for Effective Financial Fraud Detection
by Asieh Bagheri, Hossein Rahmani, Mohamad Mahdi Yadegar
Abstract: The rise of electronic payments, both online and in-person, has coincided with an increase in fraudulent and defaulted transactions, leading to significant financial losses. Researchers have explored various machine learning models for anomaly detection in credit card transactions, but challenges such as overlapping data classes and imbalanced distributions persist. To address these issues, we propose a dual-strategy approach called MUSEM, which integrates multi-up sampling with ensemble learning for enhanced fraud detection. MUSEM combines seven individual models into a unified framework, offering a more efficient method for identifying fraud. This study presents a comprehensive review and comparative analysis of various machine learning algorithms employed in financial fraud detection. Experimental results demonstrate a 3% improvement in recall over individual classifiers, affirming the effectiveness of the ensemble learning paradigm adopted in MUSEM. The findings highlight MUSEM's potential for real-world fraud detection applications, improving electronic payment security and reducing financial risks.
Keywords: UpSampling techniques; Ensemble learning; Financial Fraud; Machine learning; Majority voting; MUSEM.
DOI: 10.1504/IJDATS.2026.10070295
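
Majority voting, listed among the keywords of this entry, can be sketched as follows. This is a generic illustration and not the MUSEM implementation: the abstract mentions seven models, whereas the three rule-based "classifiers" below, their thresholds, and the feature layout `[amount, transactions_per_hour]` are hypothetical stand-ins for trained models.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A classifier here is any function mapping a feature vector to a label (1 = fraud).
Classifier = Callable[[Sequence[float]], int]


def majority_vote(models: List[Classifier], x: Sequence[float]) -> int:
    """Return the label predicted by most models; ties go to the label voted first."""
    votes = [m(x) for m in models]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained classifiers.
models: List[Classifier] = [
    lambda x: int(x[0] > 1000),  # flags large amounts
    lambda x: int(x[1] > 5),     # flags many transactions per hour
    lambda x: int(x[0] > 5000),  # stricter amount rule
]

label = majority_vote(models, [1200.0, 8.0])
print(label)  # prints 1: two of the three rules flag the transaction
```

Using an odd number of voters, as here and in the seven-model setting mentioned in the abstract, avoids ties in binary classification.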
