Forthcoming and Online First Articles

International Journal of Data Analysis Techniques and Strategies (IJDATS)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Analysis Techniques and Strategies (20 papers in press)

Regular Issues

  • Machine Learning Made Easy: A Beginner's Guide for Causal Inference and Discovery Methods using Python
    by Irfan Saleem, Ali Irfan 
    Abstract: Machine learning is widely recognised and extensively used for data modelling and prediction across fields such as business and healthcare to support informed decision-making. Numerous machine learning algorithms for causal inference and discovery have been devised and deployed across multiple programming languages over the preceding decades. This research briefly introduces causal inference and discovery methods, accompanied by Python code for beginners. First, the study gives a brief overview of machine learning. Second, it differentiates between causal discovery and causal inference. Third, it describes popular machine-learning methods. Finally, it demonstrates the practical use of causal inference and discovery packages in Python. The study also recommends future research and discusses implications for using machine learning methods.
    Keywords: Python; Machine Learning; Causal Discovery (CD); Causal Inference (CI); Linear Regression; Peter-Clark (PC) algorithm.
    DOI: 10.1504/IJDATS.2025.10064732
     
  • Brain Tumour Detection and Multi Classification Using GNB-Based Machine Learning Architecture
    by Satish N. Gujar, Ashish Gupta, Sanjaykumar P. Pingat, Rashmi Pandey, Atul Kumar, Deepak Gupta, Priya Pise 
    Abstract: Brain tumours are abnormal tissues with rapidly reproducing cells, posing significant challenges for identification and treatment. This study proposes a multimodal approach using machine learning and medical techniques for early diagnosis and segmentation of brain tumours. Noisy magnetic resonance imaging (MRI) images are processed with a geometric mean to simplify noise removal. Fuzzy c-means algorithms segment the images, aiding in the detection of specific areas of interest. The grey-level co-occurrence matrix (GLCM) algorithm is used for dimension reduction and feature extraction. Various machine learning techniques, including Convolutional Neural Networks (CNN), Artificial Neural Networks (ANN), Support Vector Machine (SVM), Gaussian Naive Bayes (NB), and Adaptive Boosting, classify the images. Among these methods, Gaussian NB is particularly effective for identifying and classifying brain tumours. This approach leverages advanced AI and neural network techniques to enhance early diagnosis and improve treatment outcomes.
    Keywords: Machine Learning; GLCM; Gaussian Naive Bayes; Adaptive boosting; MRI.
    DOI: 10.1504/IJDATS.2025.10064741
     
  • Application of Text Mining Analysis in Understanding GameFi Adoption
    by Yimiao Zhang, Jing Ren, Wenting Liu, Ding Ding 
    Abstract: The blockchain-based gaming industry has been expanding over the past two years, but the GameFi sector has yet to solve its biggest problem: the lack of mass gamer adoption. In this work, text mining was leveraged to study the adoption status of GameFi and explore the possible requirements and concerns of game players regarding blockchain games. Quora questions relating to GameFi were collected to examine the key topics discussed by GameFi users or potential users. Our findings show that GameFi is in the early stage of the innovation diffusion process and has not been widely adopted by the public. Individuals are concerned about the risk and return of play-to-earn (P2E) games, and some potential users are deterred by the high entry barriers of GameFi. By studying the opinions of players and potential players, this study sheds some light on possible strategies for improving blockchain game design in the near future.
    Keywords: GameFi; P2E; Mass Adoption; Text Analysis.
    DOI: 10.1504/IJDATS.2025.10064876
     
  • Tackling Data Sparsity: A Hybrid Filtering Paradigm for Robust Recommender Systems
    by Umarani Srikanth, Lijetha C. Jaffrin, Sushmitha Srikanth, Shyam Ramesh 
    Abstract: This paper introduces a hybrid recommender system approach that aims to tackle the problems associated with data sparsity, also referred to as the cold start problem. Recommender systems use user preferences to filter information. To improve recommendation accuracy, our method combines user-based collaborative filtering with content-based filtering. More specifically, content-based filtering takes over when there is little data. When there is a high degree of user similarity, user-based collaborative filtering is used to maximise accuracy by suggesting diverse items. This strategy can be used in a variety of fields, including e-commerce, music, books, and film.
    Keywords: hybrid filtering; recommender systems; collaborative filtering; Singular Value Decomposition (SVD); machine learning; k-nearest neighbors.
    DOI: 10.1504/IJDATS.2025.10064959
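
    The switching rule described in this abstract (content-based filtering when data is sparse, user-based collaborative filtering when user similarity is high) can be illustrated with a small sketch. It is not the authors' implementation: the thresholds `min_ratings` and `min_similarity`, the toy ratings, and the item features are all hypothetical, and plain cosine similarity stands in for whatever similarity measure the paper uses.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user, ratings, item_features, min_ratings=2, min_similarity=0.5):
    """Return (item_index, strategy) for one user.

    ratings: dict mapping user -> list of ratings, 0 meaning "not rated".
    item_features: one feature vector per item.
    min_ratings, min_similarity: hypothetical switching thresholds.
    """
    own = ratings[user]
    unrated = [i for i, r in enumerate(own) if r == 0]
    rated = [i for i, r in enumerate(own) if r > 0]
    if not unrated:
        return None, "none"

    # Find the most similar other user by rating vector.
    best_sim, best_peer = 0.0, None
    for other, vec in ratings.items():
        if other != user:
            sim = cosine(own, vec)
            if sim > best_sim:
                best_sim, best_peer = sim, other

    if len(rated) >= min_ratings and best_sim >= min_similarity:
        # Enough data and a close neighbour: user-based collaborative filtering.
        peer = ratings[best_peer]
        return max(unrated, key=lambda i: peer[i]), "collaborative"

    # Sparse data: fall back to content-based filtering with a
    # rating-weighted profile built from the items the user has rated.
    n_features = len(item_features[0])
    profile = [sum(own[i] * item_features[i][k] for i in rated)
               for k in range(n_features)]
    best = max(unrated, key=lambda i: cosine(profile, item_features[i]))
    return best, "content"

# Toy data (hypothetical): three users, four items, two item features.
RATINGS = {"ann": [5, 4, 0, 0], "bob": [5, 5, 1, 4], "new": [0, 3, 0, 0]}
FEATURES = [[1, 0], [1, 0], [0, 1], [1, 1]]
```

    Calling `recommend("ann", RATINGS, FEATURES)` takes the collaborative path, while `recommend("new", RATINGS, FEATURES)` falls back to content-based scoring because that user has rated only one item.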
     
  • Using the BIRCH Algorithm and Affinity Propagation, an Advanced Descriptor for Video Processing
    by Jayanta Mondal, Jitendra Pramanik, Satyajit Pattnaik, Bijay Paikaray 
    Abstract: Video summarisation is the most preferred approach to managing the growth of video content. In video surveillance and in object and intrusion detection, video summarisation has been the most popular approach because it provides concise and less redundant information. As video content continues to expand quickly, an automatic video summary would help anyone who wants to learn more quickly and with less effort. Most existing methods depend on various network architectures to train a single score predictor for shot rating and selection. This study addresses the problem of video summarisation, which involves selecting significant frames that succinctly and comprehensively express the content of the original video. The paper presents a comparative study of the advanced texture descriptors Local Phase Quantization (LPQ), Local Ternary Pattern (LTP), and Local Binary Pattern (LBP) in the video summarisation process. Clusters of key frames are extracted with the unsupervised learning algorithms Affinity Propagation and BIRCH. The proposed video summarisation method has shown good results in trials.
    Keywords: Local Ternary Pattern; Local Binary Pattern; Affinity Propagation; Local Phase Quantization; BIRCH; Key Feature.
    DOI: 10.1504/IJDATS.2025.10065080
     
  • Prediction of Success Factors for Mobile Application using Machine Learning Technique
    by Jyoti Deone, Nilima Dongre, Mohammad Atique Junaid 
    Abstract: The remarkable boom in the mobile market has attracted many developers to build mobile apps. However, the majority of developers struggle to generate earnings. For those developers, knowing the characteristics of successful apps can be vital. We propose an approach that examines app categories in two ways. First, the correlation between app features is measured; second, concepts are extracted from apps to understand the common themes present in them. For this, we selected 3,000 applications available in the Google Play Store. The observations indicate a strong correlation between user rating and the number of app downloads, but no correlation between price and downloads, nor between price and rating. Moreover, we find concepts unique to highly rated apps and to low-rated apps. The correlations, together with the concepts, help application developers understand market trends and customer demand more easily than earlier approaches.
    Keywords: Android; Latent Semantic Analysis; Correlation Analysis; Concept Extraction.
    DOI: 10.1504/IJDATS.2025.10065137
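
    The abstract does not state which correlation coefficient was used. As an illustration only, the sketch below computes the Pearson coefficient between two app features; the function name and the sample values are hypothetical and are not taken from the study's Google Play data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical feature columns for five apps.
ratings = [3.1, 3.8, 4.0, 4.4, 4.7]
downloads = [1_000, 5_000, 8_000, 20_000, 50_000]
rating_vs_downloads = pearson(ratings, downloads)
```

    A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates none, which is the sense in which the abstract contrasts rating versus downloads with price versus downloads.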
     
  • Nutritional Cluster Analysis of Leguminous Food Sources Across West Africa
    by Donald D. Atsa'am, Gabriel S. Iorundu, Moses T. Ukeyima 
    Abstract: The present form of the data on West African legumes reported in the West Africa Food Composition Table (WAFCT) does not reflect sub-groupings based on (dis)similarity in nutritive value. A possible consequence is that an uninformed user interested in leguminous food could randomly pick any from the data since all are summarily classified as one family in the WAFCT. To resolve this, the objective of this study was to apply the clustering technique to form sub-groups based on similarity in nutritional content. Three clusters were extracted, and unique properties have been established for food sources in each cluster at the granular level of nutrients. Going by the clustering, users who are interested or not interested in a particular content could look up the cluster with a lower, moderate, or higher content of the desired or non-desired element. The results are useful in the selection of raw materials, formulation of nutritional guidelines, and food labelling.
    Keywords: Legumes; nutritional analysis; legumes food sources; West Africa food composition table; k-means clustering.
    DOI: 10.1504/IJDATS.2025.10065146
     
  • Jasminum Grandiflorum Flower Images Classification: Deep Learning and Transfer Learning Models with the Influence of Preprocessing via Contours and Convex Hull in Agritech 4.0
    by A. Anushya, Savita Shiwani, Ayush Shrivastava 
    Abstract: This study specifically centres on classifying Jasminum Grandiflorum flowers through the utilisation of deep learning and transfer learning techniques. To achieve this, the research leverages advanced deep learning models such as CNNs, along with transfer learning using pre-trained architectures like VGG16, VGG19, ResNet18, and Vision Transformer (ViT). CNN stood out, excelling after extensive iterations. VGG16 and VGG19 showed solid performance with fewer iterations, indicating competence in shorter training times. ResNet18 achieved the highest accuracy with fewer iterations but took longer (about 8 minutes per epoch), balancing efficiency and accuracy. ViT impressed with high accuracy despite needing more iterations, showcasing prowess in intricate learning and pattern recognition in the Jasminum Grandiflorum flower image dataset. The intended outcome of this research is to contribute significantly to the advancement of Agritech 4.0 by establishing a robust methodology for accurate Jasminum Grandiflorum flower classification without human participation.
    Keywords: Convolutional Neural Network; VGG16; VGG19; ResNet18; Vision Transformer; Jasminum Grandiflorum; AgriTech 4.0.
    DOI: 10.1504/IJDATS.2025.10065343
     
  • Prediction Model for AQI through Indian Vedic Science: Knowledge Management Technique to Control Pollution and for Sustainable Society
    by Rohit Rastogi, Saransh Chauhan, Yash Rastogi, Vaibhav Aggarwal, Utkarsh Agrawal, Richa Singh 
    Abstract: The paper outlines how Indian Vedic Sciences can be used to predict and prevent the ill effects of pollution on the human body and nature by adopting the simple practices of Yajna and Hawan in daily routine. Compared with other resources such as land and water, air is considered the most important resource. Evidence shows that Indian Vedic Sciences primarily focus on prana vayu, which means the air that we breathe. The authors' team and the Central Pollution Control Board (CPCB) gathered data and readings over the last four months through sensors installed in isolated and non-isolated environments that were continuously under the effects of Yajna and Hawan.
    Keywords: AQI; PM 2.5; PM 10; Climate Change; Yajna; Mantra; Human Health; Economic Growth; Knowledge Management; Knowledge Pyramid; Sustainable Society; Knowledge Levels and Extractions.
    DOI: 10.1504/IJDATS.2025.10065356
     
  • Adaptive Parking Demand Prediction Using Discrete Time Based Dynamic Markov Chain
    by Semeneh H. Bayih, Surafel Tilahun 
    Abstract: The demand for urban parking is increasing rapidly and has become a significant traffic issue in densely populated metropolitan regions. Prediction of parking demand is crucial for reducing traffic jams and decreasing greenhouse gas emissions. It is also essential to the development of parking facilities and to price adjustments in urban parking planning. Most earlier studies developed models for parking demand prediction from historical data without updating the demand data. Furthermore, those demand predictions do not consider the effect of parking pricing, although pricing affects the demand on a given parking platform. To address this issue, we consider three categories of parking demand based on price-based preference. A dynamic non-homogeneous Markov chain with discrete time and discrete states is used to predict the parking demand. An adaptive (learning) approach is proposed to make the Markov chain dynamic and to adapt to changes in the demand environment. A numerical example is presented that demonstrates the prediction from data collection and incorporates the adaptive strategy so that the system learns new changes.
    Keywords: Prediction; Parking demand; Markov chain model; Adaptive Learning.
    DOI: 10.1504/IJDATS.2025.10065504
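
    To make the idea of an adaptive discrete-time, discrete-state Markov chain concrete, here is a minimal sketch. It is an assumption-laden illustration, not the paper's model: the class name, the uniform pseudo-count prior, and the count-based update rule are all hypothetical, and the three states simply stand in for the three demand categories mentioned in the abstract.

```python
class AdaptiveMarkovChain:
    """Discrete-time Markov chain whose transition matrix is re-estimated
    from transition counts as new observations arrive (hypothetical scheme)."""

    def __init__(self, n_states, prior=1.0):
        # Uniform pseudo-counts so every row is a valid distribution
        # before any data is seen (an assumption of this sketch).
        self.counts = [[prior] * n_states for _ in range(n_states)]

    def observe(self, prev_state, next_state):
        """Adapt the chain with one newly observed transition."""
        self.counts[prev_state][next_state] += 1.0

    def transition_matrix(self):
        """Row-normalise the counts into transition probabilities."""
        return [[c / sum(row) for c in row] for row in self.counts]

    def predict(self, distribution, steps=1):
        """Propagate a state distribution `steps` time steps ahead."""
        matrix = self.transition_matrix()
        n = len(matrix)
        for _ in range(steps):
            distribution = [
                sum(distribution[i] * matrix[i][j] for i in range(n))
                for j in range(n)
            ]
        return distribution

# Three hypothetical demand categories, indexed 0, 1 and 2.
chain = AdaptiveMarkovChain(n_states=3)
for _ in range(3):
    chain.observe(0, 1)          # three observed moves from category 0 to 1
forecast = chain.predict([1.0, 0.0, 0.0], steps=1)
```

    Because the matrix is rebuilt from the counts on every call, each new observation shifts later forecasts, which is the "learning" behaviour the abstract refers to in general terms.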
     
  • Emoji Translation for Sentiment Analysis in Algerian Arabic Dialect
    by Samira Hazmoune, Fateh Bougamouza 
    Abstract: Sentiment analysis (SA) is an important natural language processing (NLP) field that involves extracting sentiments and opinions from text data. Although SA has advanced significantly, its application to dialectal Arabic text presents challenges due to linguistic nuances and resource constraints. This research investigates the incorporation of emojis into SA for Algerian Arabic dialect (AAD), marking the first exploration of its kind in this area. Specifically, we focus on emoji translation, building upon prior studies highlighting emojis' potential in SA and their translation into meaningful words or sentences as a preprocessing approach. We evaluate the impact of this approach on enhancing sentiment classification in AAD text, specifically focusing on customer reviews of Algerian telephone operators. After preprocessing, including various emoji translation techniques, we employ transfer learning by fine-tuning the DziriBERT model on a compiled Algerian dialect dataset. Our results demonstrate promising outcomes and offer novel conclusions and perspectives in AAD sentiment analysis.
    Keywords: Sentiment Analysis; Emoji Translation; DziriBERT; Algerian Arabic Dialect; Transfer Learning; Emoji Categorisation; Emoji Handling; Customer Reviews.
    DOI: 10.1504/IJDATS.2025.10065720
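
    Emoji translation as a preprocessing step can be sketched as a dictionary lookup that replaces each emoji with a word before the text reaches the classifier. The mapping below is hypothetical and uses English words purely for readability; the paper's own translation techniques and target vocabulary are not given in the abstract.

```python
# Hypothetical emoji-to-word mapping (English used for illustration only).
EMOJI_TO_TEXT = {
    "\U0001F600": "happy",   # grinning face
    "\U0001F621": "angry",   # pouting face
    "\U0001F44D": "good",    # thumbs up
    "\U0001F44E": "bad",     # thumbs down
}

def translate_emojis(text, mapping=EMOJI_TO_TEXT):
    """Replace each mapped emoji with its word and normalise whitespace.

    Works one code point at a time, so multi-code-point emojis
    (skin-tone modifiers, ZWJ sequences) are not handled by this sketch.
    """
    pieces = []
    for char in text:
        if char in mapping:
            pieces.append(" " + mapping[char] + " ")
        else:
            pieces.append(char)
    return " ".join("".join(pieces).split())

review = "service \U0001F44D\U0001F44D"
cleaned = translate_emojis(review)
```

    The cleaned string can then be tokenised like any other text, so the sentiment carried by the emoji is visible to a model that was pretrained mostly on words.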
     
  • Analysis of Online Transaction using Data Analytics Framework
    by Md Nurul Islam, Iqbal Hasan, Shahla Tarannum, S.M.K. Quadri 
    Abstract: Online transactions have become a necessity for everyone and generate a vast amount of data, which requires a robust framework to ensure security, efficiency, and reliability. This research paper explores the application of advanced data analytics techniques to ensure and enhance the confidentiality of the online transaction process. Using this analytics framework, we can analyse patterns, detect anomalies, and predict trends in online transaction data. An online survey was conducted to collect data from one lakh (100,000) consumers across different geographical regions and diverse working groups. Descriptive analysis is used in this study to ascertain the present state of online transactions. The study investigates the significance of feature selection, anomaly detection, and clustering methods in identifying patterns, trends, and potential fraud indicators within online transactions. The findings of this research contribute to the growing body of knowledge on leveraging data analytics frameworks to extract valuable insights from online transaction data.
    Keywords: Online transactions; Data analytics; Online payment; Security; E-commerce; Analysis.
    DOI: 10.1504/IJDATS.2025.10065866
     
  • Enhanced Pearl Millet Mildew Disease Detection using Ensemble Deep Learning Methods
    by Aditya Kumar, Jainath Yadav 
    Abstract: Millet crops play a crucial role in global food security, providing sustenance to millions of people worldwide. Mildew disease poses a significant threat to pearl millet, a staple crop in many regions, impacting both its quality and yield. Detecting diseases in millet crops is crucial for maintaining both the quality and quantity of agricultural yields. However, limited labelled data and the expense of manual data labelling pose significant challenges in this domain. To address these issues, we suggest a deep learning ensemble framework that utilises the potential of multiple models for enhanced disease detection accuracy. Ensembles integrate the strengths of individual deep-learning models to improve overall performance and robustness. DenseNet121 and ResNet50, two deep learning models, were selected as the base models in our ensemble. Preliminary experimental results demonstrate the effectiveness of our ensemble approach, with an impressive accuracy of 96.6%.
    Keywords: Millet crops; Leaf disease; Precision agriculture; Deep learning; Machine learning.
    DOI: 10.1504/IJDATS.2025.10066101
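
    The abstract does not say how the outputs of DenseNet121 and ResNet50 are combined. One common option is soft voting, where the per-class probabilities of the base models are averaged; the sketch below shows that technique under this assumption, with made-up probabilities and a hypothetical function name.

```python
def soft_vote(prob_sets, weights=None):
    """Average per-class probabilities from several models.

    prob_sets: one probability vector per model, all of the same length.
    weights: optional per-model weights; equal weights by default.
    Returns (predicted_class_index, averaged_probabilities).
    """
    n_models = len(prob_sets)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_classes = len(prob_sets[0])
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, prob_sets))
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return predicted, averaged

# Made-up outputs for one leaf image: [healthy, mildew].
model_a = [0.6, 0.4]   # stands in for one base model
model_b = [0.2, 0.8]   # stands in for the other base model
label, probs = soft_vote([model_a, model_b])
```

    Here the two models disagree, and the averaged probabilities favour the second class, which shows how an ensemble can override a single model's weaker prediction.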
     
  • A Comprehensive and Comparative Analysis of Deep Learning Models for Textual Sentiment Analysis
    by Leyla Mammadova 
    Abstract: Analyzing public opinion can provide important insights. Sentiment analysis is a textual data analysis technique that identifies subjective information expressed by people or groups, including views and emotions. By advancing natural language processing and deep learning approaches, sentiment analysis advances our comprehension of human language. In this study, we provide a thorough evaluation and comparative analysis of various deep learning models, such as RNNs, LSTMs, and GRUs, and their bidirectional variants. We conduct the analysis on four publicly accessible datasets: imdb_reviews, the Twitter Sentiment Dataset, the Emotions dataset, and ag_news_subset. We assess the accuracy of six well-known deep learning models. Our experimental results demonstrate that bidirectional architectures generally perform better than their unidirectional equivalents. The bidirectional models consistently achieved the highest accuracy across different datasets.
    Keywords: Sentiment analysis; RNN; LSTM; GRU; Bidirectional RNN; Bidirectional LSTM; Bidirectional GRU.
    DOI: 10.1504/IJDATS.2025.10066752
     
  • Volatility Modelling and Forecasting in Stock Markets: a Machine Learning Approach
    by Soumen Ghosh, Kuntal Mukherjee, Biswajit Jana, Syed Saif Ahmed, Mohammad Aasif, Sayel Munsi 
    Abstract: This research explores the application of various models for stock price prediction, including ARIMA, LSTM, SARIMAX, and a hybrid SARIMAX-LSTM, highlighting their importance in the post-pandemic financial landscape. The study emphasises the limitations of traditional methods and the necessity of time-series analysis for understanding stock price patterns. It focuses on the impact of COVID-19 on financial markets and assesses the reliability of these models in unpredictable conditions. The methodology involves data selection, pre-processing, model parameter tuning, and performance evaluation. The research establishes a framework for the implementation of these models, underscoring the need for parameter optimisation to enhance accuracy. Ultimately, the study shows that LSTM performs better than the other models and offers valuable insights into using advanced forecasting techniques for improved investment strategies in the evolving stock market.
    Keywords: LSTM; ARIMA; Moving average (MA); Autoregressive (AR); Mean Absolute Error (MAE); Mean Squared Error (MSE); Root Mean Squared Error (RMSE); R-squared (R²).
    DOI: 10.1504/IJDATS.2025.10066979
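
    The keywords of this entry name four error measures used for performance evaluation. As a reminder of their standard definitions, here is a minimal sketch; the function name and the sample values are hypothetical and unrelated to the paper's stock data.

```python
from math import sqrt

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R-squared for two equal-length sequences.

    Assumes `actual` is not constant, otherwise R-squared is undefined.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                     # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)    # total sum of squares
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
    }

# Made-up closing prices and forecasts.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

    MAE and RMSE are in the same units as the series, while R-squared is unitless, so the measures are usually reported together when comparing forecasting models.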
     
  • Analysing Social Medial Sentiment: Unravelling the Trichotomy of Positive, Negative, and Neutral Sentiments in User Comments
    by Reddy Sowmya Vangumalla, Yoonsuk Choi 
    Abstract: This study explores sentiment analysis of Twitter comments, focusing on neutral, negative, and positive attitudes. By applying advanced techniques such as feature engineering, data pre-processing, and machine learning, we aim to derive actionable insights. Our approach involves setting project goals, selecting data sources, and establishing infrastructure for analysis. After pre-processing, we utilise support vector machines (SVMs) for classification and evaluate the model with metrics like accuracy, precision, recall, and F1-score. Visualisation tools, including ROC curves and confusion matrices, help interpret the results. We discuss the limitations and suggest future research to enhance performance and address data quality issues.
    Keywords: Data Analysis; Decision-Making; Feature Engineering; Machine Learning; Sentiment Analysis; Social Media; Support Vector Machines; Twitter; Text Preprocessing.
    DOI: 10.1504/IJDATS.2025.10067092
     
  • Climatic Data Analysis Using Machine Learning and Correlation with Human Health
    by Rohit Rastogi, Prabhinav Mishra, Rayush Jain, Prateek Singh 
    Abstract: Climatic data analysis and effects on human health is a data science project that focuses on the analysis and interpretation of climatic data to gain valuable insights into past and present climate patterns. The project utilises advanced data analytics techniques such as regression models to process and analyse large-scale climatic datasets, enabling the identification of trends and patterns that contribute to a deeper understanding of climate dynamics. The primary objectives of this project are to investigate climate change phenomena, assess the impact of climatic change on human health, and predict the variation in the spread of diseases under different climatic conditions. By employing various statistical models, machine learning algorithms, and visualisation tools, the project aims to uncover hidden relationships within the data and provide evidence-based findings for policymakers, researchers, and stakeholders. To achieve these goals, the project leverages diverse sources of climatic data, including maximum and minimum temperature records, rainfall and humidity measurements, and atmospheric pressure data.
    Keywords: Jupyter Notebook; Pandas; Linear Regression.
    DOI: 10.1504/IJDATS.2025.10067196
     
  • Enhancing Healthcare Predictions with Deep Learning: Insights from Image Datasets
    by W. A. W. A. Bakar, Muhammad Amierusyahmi Zuhairi, Mustafa Man, Nur Laila Najwa Josdi 
    Abstract: This study builds on prior research to improve healthcare predictions using deep learning with image datasets. Unlike numerical data, image processing in deep learning faces challenges such as large data volume, storage demands, computational resource needs, manual annotation, class imbalance, overfitting, and scalability issues. Effective solutions require robust pre-processing, efficient computation, thoughtful model design, and ethical considerations. This paper presents a 3-layer deep convolutional neural network (DCNN) to integrate image datasets, achieving 99% accuracy on benchmark datasets, including the brain tumour medical dataset (BTMD). The model employs dropout regularisation and incorporates numeric data insights, showcasing adaptability across different healthcare data types. These results highlight the significant potential of DCNNs for high-accuracy predictions in medical applications.
    Keywords: Image dataset; Deep Convolutional Neural Network (DCNN); Brain Tumor Medical Dataset (BTMD); Prediction accuracy; Healthcare applications.
    DOI: 10.1504/IJDATS.2026.10067673
     
  • Comparing Discrimination and Calibration Performance of Two Flexible Link Functions in Discrete Survival Models
    by Susan Maposa, Alphonce Bere, Caston Sigauke, Charles Chimedza 
    Abstract: This study provides the first direct comparison between the Pareto and Logit-power link functions within discrete survival models, evaluated alongside three commonly used links. We assess their discrimination and calibration using simulated and real-life datasets with varying skewness. Simulations included 100 data sets with symmetric, right-skewed, and left-skewed distributions, and bootstrapping was applied for robust evaluation. The results show that cloglog excels in discrimination, while logit offers superior calibration. The Pareto family demonstrates robust performance, making it a reliable secondary option. However, Logit-power performs poorly in calibration and is unsuitable for discrete survival models. The study offers practical recommendations for implementing the Logit-power link, addressing its complex estimation process, and suggests a grid search approach using information criteria for parameter optimization. These findings highlight the importance of carefully selecting link functions in discrete survival modeling.
    Keywords: Calibration; Discrimination; Discrete survival models; Families of link functions.
    DOI: 10.1504/IJDATS.2025.10067711
     
  • A Data Analytics Approach to Improve the International Supply of Metal Inputs in the Metal-Mechanical Sector in Colombia
    by Lina Mayerly Lozano Suarez, Fabian Alexander Torres Cardenas, Eduardo Rangel Diaz 
    Abstract: The metal-mechanical sector is vital to Colombia's industry, significantly contributing to economic development. To ensure its growth, this sector must enhance competitiveness, particularly in managing metal supplies, often imported. Analyzing imports is crucial, but data from DIAN is unprocessed and provided in extensive Excel microdata packages, requiring processing. This study proposes a data analytics approach combining descriptive and predictive analyses. Descriptive analysis using DIAN's 2023 data identifies key import factors: major supplier countries, main customs entries, locations of top importers, and common transport modes. Predictive analysis using regression, decision trees, and k-NN models predicts import quantities based on FOB value, with regression showing the highest accuracy. This approach helps companies understand factors affecting imports, such as transportation, customs management, cargo handling, and preparation, facilitating better decision-making and competitiveness.
    Keywords: Data Analytics; supply chain; machine learning; international supply; import modeling; regression model; decision tree; k-NN; metal-mechanical sector; CRISP-DM; dashboard; indicators.
    DOI: 10.1504/IJDATS.2026.10067935