International Journal of Data Mining, Modelling and Management (IJDMMM), Inderscience Publishers

Forthcoming Articles

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Mining, Modelling and Management (14 papers in press)

Regular Issues

Recognition of Critical Built-up Areas Located on High-hill Slope Regions using Decision Tree Technique
by B. G. Kodge
Abstract: In mountainous places, structures are being built for residential or commercial uses without the necessary safety precautions. Every year, landslides, torrential downpours, severe snowfall, earthquakes, volcanic eruptions, and floods cause buildings to collapse. Most of these buildings are in high-hill slope areas with loose soil types, close to river flows and other water sources, and these incidents have claimed thousands of lives. This paper deals with the automatic identification of critical buildings (residential/commercial) in mountainous areas that are on high-hill slopes, close to river flows, on loose soil, and subject to high variations in land elevation contours. The study uses primary data such as built-up/residential areas and water body areas, which are extracted from sample land use and land cover (LULC) data using image classification techniques, together with slope maps and land elevation contour maps generated from a digital elevation model (DEM). Supplementary data such as river maps, soil maps and other base maps are also collected. All the data are integrated and considered in identifying and extracting critical residential/built-up areas using a spatial data mining technique.
Keywords: critical residential area identification; LULC; DEM; image segmentation; decision tree; spatial data mining; SDM.
DOI: 10.1504/IJDMMM.2026.10068077

Profiling Cryptocurrency Influencers on Social Media: a Comparative Study using SetFit and DistilBERT
by Rebeh Imane Ammar Aouchiche, Fatima Boumahdi, Mohamed Abdelkarim Remmide, Amina Guendouz
Abstract: Nowadays, in a world dominated by social media, the content people share can have significant effects, particularly in the domain of cryptocurrency, where investors often turn to online advice. The instability of the cryptocurrency market is well known, and some social media individuals wield considerable influence over this market through their posts. Our study focuses on categorizing these cryptocurrency influencers based on their English tweets, with the challenge of limited data availability. Two transformer-based models, Sentence Transformer Fine-tuning (SetFit) and Distilled BERT (DistilBERT), were used to profile cryptocurrency influencers in three subtasks, classifying authors by their degree of influence, main interests, and message intent. These models were evaluated on a Twitter-based dataset from PAN2023. The results show that SetFit achieved the best performance with a 0.82 F1-score, followed closely by DistilBERT with a 0.80 F1-score.
Keywords: Social media; Author Profiling; Cryptocurrency influencers; DistilBert; SetFit; Few-shot-learning.
DOI: 10.1504/IJDMMM.2026.10068138

Satellite Image Classification using Deep Learning Model-ResNet
by Pranali Kosamkar, Vrushali Kulkarni, Abdulrahim Shaikh, Geetika Agarwal, Inderjeet Balotia
Abstract: Data mining frameworks and artificial intelligence (AI) play a key part in decision-making scenarios. Due to the significant expenses associated with creating training and testing datasets, we need to deal with a number of issues, including object recognition, classification, and semantic segmentation in images of low spatial resolution. In this paper we first reviewed the machine learning and deep learning based models for satellite health monitoring systems. We then built a deep learning model for satellite image classification. The dataset used is the Satellite Image Classification Dataset-RSI-CB256. Two variants, ResNet-12 and ResNet-18, were tested on the dataset. ResNet-18 showed over 0.94 accuracy after 5 epochs, and ResNet-12 showed 0.92 accuracy after training for 10 epochs. The results indicate that the ResNet CNN architecture is a better choice for satellite image classification than other available models such as FCNN and RCNN (with F-RCNN).
Keywords: Deep Learning; ResNet; Data Mining; Artificial Intelligence; Machine Learning; satellite Image; Remote Sensing.
DOI: 10.1504/IJDMMM.2026.10068596

Exploring Solar Activity Dynamics: Nonparametric Change Point Analysis of Sunspot and Umbra Areas
by Sushovon Jana, Chandranath Pal
Abstract: Solar observational studies are crucial for understanding the sun's behaviour, its impact on space weather, and its influence on Earth's climate. Central to this research is sunspot data analysis, a key indicator of solar activity and magnetic field variations. The study of solar differential rotation has been fundamental, with pioneering work revealing that faster equatorial rotation influences the sun's magnetic field and activity cycle. Sunspot areas, meticulously documented by observatories like the Royal Greenwich Observatory and KoSO, have been critical for analysing long-term solar activity trends. The integration of machine learning has significantly advanced sunspot data analysis, enhancing space weather forecasting and the understanding of solar phenomena. This paper employs change point analysis on KoSO sunspot and umbra area data to detect significant shifts over time, utilising nonparametric methods for their computational efficiency. Results show deviations from normality, positive trends, and significant autocorrelation in the data. The PELT algorithm reveals several significant shifts, dividing the period into distinct segments with varying statistical characteristics. These findings align with known solar cycles and highlight the importance of advanced statistical techniques in understanding solar activity.
Keywords: Sunspot; Summary statistics; Change point analysis; Nonparametric.
DOI: 10.1504/IJDMMM.2026.10068653

Machine Learning Pipeline with an Optimal Feature Set in the Stage-wise Diagnosis of Hepatitis C Virus
by Shirina Samreen
Abstract: The proposed research aims at timely and accurate diagnosis of Hepatitis C Virus using a novel dataset. For this purpose, numerous experiments are conducted using various machine learning models employing preprocessing techniques like feature engineering and data augmentation along with multiple heterogeneous classifiers. In addition to detecting the onset of the disease, the proposed method also detects the stage of the disease to comprehend the severity for an appropriate follow-up treatment to prevent further damage to the health of the patient. Each experiment comprises various combinations of feature engineering approaches along with multiple heterogeneous classifiers. It was found that the machine learning pipeline employing the feature engineering approach of recursive feature elimination with Support Vector Classifier as the estimator and a stacking ensemble classifier provides the best score for all performance metrics, with an F1-score of 0.95, accuracy of 95.2 and mean square error of 0.06.
Keywords: Machine Learning; Multi-class Classification; Feature Engineering; Imbalanced Dataset; Synthetic Minority Oversampling Technique; Recursive Feature Elimination; F1-Score; Mean Square Error.
DOI: 10.1504/IJDMMM.2026.10068989

Entity Resolution: a Novel Graph Embedding Approach Using RandomDeep
by Nour Mekki, Djamel Berrabah, Abdelhamid Malki
Abstract: The exponential growth of digital information necessitates robust methods for entity resolution to ensure data quality and integration across datasets. This paper presents three novel node embedding algorithms for entity resolution in graph databases: RandomDeep, Refined embedding, and Combined embedding. RandomDeep integrates Iterative Deepening Depth First Search with deep learning to capture structural and semantic characteristics. Refined embedding enhances initial Graph Convolutional (GCN) embeddings through random walk-based refinement. Combined embedding merges outputs from complementary algorithms to produce versatile representations adaptable to diverse graph structures. A two-stage graph summarization technique supports this approach: initially as a blocking method to reduce computational complexity, and later during merging to consolidate redundant nodes. Evaluation datasets (DBLP-Scholar, Amazon-Google, Cora, and Yellow-Yelp) demonstrate the methods' effectiveness, with Area Under Cover Precision and Recall values ranging from 0.50 to 0.97 and F-measure values between 0.67 and 0.94. These results showcase accurate, efficient entity resolution in graph databases.
Keywords: Entity Resolution; graph databases; node embedding; graph summarization; data quality.
DOI: 10.1504/IJDMMM.2026.10069148

Context-Specific Multi-Class Data Analytics for Improving Online Conversation through Deep Learning
by Dhanasekaran K, Nadana Ravishankar, Goyal S. B, Sardar M. N. Islam
Abstract: Social networks have emerged as a platform for disseminating information rapidly to friends, relatives, and the public. An effective text classification strategy can improve the effectiveness of online discussion. This has been a great motivation behind text analytics research. Several text classification approaches have been developed to enhance information extraction performance and address its challenges. However, traditional text data analytics are based on limited contextual and static resources and require effective intelligent techniques for automatically extracting features from the container. To address these issues, we proposed and developed a unique context-specific Multi-Class Data Analytics architecture based on Deep Learning. This approach improved the performance of data analytics and mainly focused on extracting various types of information that describe several attributes to improve the online conversation. The experimental results showed that the proposed multi-class data analytics provide promising results in terms of classification accuracy, validation accuracy, validation loss, precision, recall, and F1-measure in support of text classification for information extraction.
Keywords: Convolutional neural network; Data analytics; Information extraction; Clustering; Deep learning.
DOI: 10.1504/IJDMMM.2026.10069923

MoDA-TL - Monitoring Domestic Animals using Convolutional Neural Networks and Transfer Learning
by Alex A. Do Amaral, Raimundo V. Costa Filho, Mário W. De L. Moreira
Abstract: In recent years, computer vision has made significant advances, expanding its knowledge and applications in various fields. An important example is the use of this technology to improve the recognition of different types of animals. This paper proposes an intelligent surveillance system that can individually identify each animal in a specific location and clearly indicate dangerous or unsuitable areas during monitoring, ensuring the safety of both people and the animals being monitored. In this context, deep learning algorithms, such as convolutional neural networks (CNN), are used to produce machine learning models capable of detecting and identifying objects in digital images. The study utilises the You Only Look Once (YOLO) version 8 model and achieves 99.5% accuracy in animal recognition, demonstrating its effectiveness in monitoring. Additionally, a comparison between a model trained from random weight initialisation and another based on transfer learning reveals that the latter outperforms across various metrics, showing 99.5% accuracy, 99.3% recall, 99.5% mAP50, and 77.5% mAP50-95. These results highlight the advantage of transfer learning in optimising performance.
Keywords: Artificial Intelligence; Deep Learning; Neural Networks; Computer Vision; Image Recognition.
DOI: 10.1504/IJDMMM.2026.10070032

Hybrid Kernel Support Vector Penalised Regression Model for Forewarning Pest Incidence using Weather Variables
by Naranammal Narayanasamy, Krishna S. R. Priya
Abstract: Crop pest incidence and development are impacted by environmental factors. Therefore, a weather-based machine learning model will be an effective scientific measure for forewarning pests. But in many cases, the raw data is complex and has the problems of nonlinearity and multicollinearity. So, the development of a robust model is much needed to forecast complex data. The present study is an attempt to develop hybrid models such as kernel support vector ridge and kernel support vector elastic net regression (KSVENR) to forewarn crop pests of cotton. Weekly pest incidence data of sucking pests such as aphids, jassid, thrips and whitefly from 2015-16 to 2022-23 have been used for the study. The results reveal that the KSVENR model outperformed other penalised models by 43%, 42%, 40% and 33% for forewarning pest incidence of aphids, jassid, thrips and whitefly respectively. The proposed model would be a good tool for forecasting nonlinear data with multicollinearity.
Keywords: Time series; Modelling; Forecasting; Nonlinear; Multicollinearity; Data Analysis; Machine Learning; Hybrid Model.
DOI: 10.1504/IJDMMM.2026.10070953

ATESA: Audio Text Emotion & Sentiment Analyser- a Sentiment & Emotion Analysis Tool based on Deep Learning Methods
by Pallavi Shukla, Rakesh Kumar, Vijay Dwivedi, Ashutosh Singh
Abstract: Sentiment analysis (SA) identifies sentiments in text, reviews, tweets, audio, images, and videos. Sentiment integrates emotion and thinking, with emotions being temporary while sentiments last longer. Emotion recognition and sentiment polarity analysis are gaining popularity in natural language processing due to their ability to mine social media data. This study applies machine learning (ML) classifiers such as random forest, logistic regression, support vector machine, and decision tree to classify text and speech as positive, negative, or neutral. Additionally, it explores available sentiment analysis tools and introduces the audio text emotion and sentiment analyser (ATESA). ATESA leverages ensemble-oriented classification techniques using deep learning, specifically bidirectional long-short-term memory recurrent neural networks (Bi-LSTM-RNN). It processes text, Twitter data, and speech converted into text. Experimental results show that ATESA achieves 92% accuracy, outperforming other algorithms.
Keywords: Sentiment Analysis Tool; Bi-LSTM; RNN; TFIDF; Deep Learning.
DOI: 10.1504/IJDMMM.2026.10071047

Advancements in Mental Health Diagnosis: Leveraging Delta Feature Extraction Framework and PWSA Ensemble for Motion Data Analysis
by S. Annapoorani, Lakshmi M.
Abstract: Depression affects over 350 million people globally and can become a serious health issue, especially when prolonged and ranging from mild to severe. Physical activity data offers a cost-effective and accessible approach to aid in diagnosing mental illnesses. This study introduces the Delta feature extraction framework (D-FEF), which extracts delta series and relevant features from original time series data, subsequently selecting a significant feature set. A probabilistic weighted selection algorithm (PWSA) with SMOTE generates multiple hypotheses using training data based on modified distributions, creating an ensemble of classifiers to predict healthy controls, depressive disorder, and schizophrenia. The PWSA classifier, utilising the D-FEF feature selection process, achieved 92.94% accuracy, outperforming all other tested methods. The technique's performance was evaluated on mental health datasets, including Depresjon and Psykose, and compared against state-of-the-art approaches. The proposed D-FEF and PWSA methodology demonstrates promising results for the classification of mental health conditions using physical activity data.
Keywords: Actigraphy data; mental health; feature engineering; feature selection; ensemble machine learning algorithm.
DOI: 10.1504/IJDMMM.2026.10072023

D-HUP Tree: Distributed HUP Tree for Scalable High Utility Itemset Mining
by Chintan Rajput, Mathe John Kenny Kumar, Dipti Rana
Abstract: High utility itemset mining (HUIM) is used to extract useful information from datasets. As volume, velocity and variety increase, traditional methods struggle with computational efficiency with respect to runtime and memory utilisation. The proposed work introduces a new approach called distributed-high utility pattern tree (D-HUP Tree), which combines a HUP Tree data structure with the Hadoop distributed computing framework, thereby improving runtime and memory management and enabling parallel processing. Experimental results illustrate that the proposed methodology reduces computational complexity without compromising the quality of discovered high utility itemsets, providing a substantial contribution to the high utility itemset mining field.
Keywords: High Utility Itemset Mining; HUP Tree; Distributed Itemset Mining; Map Reduce.
DOI: 10.1504/IJDMMM.2026.10072676

Mining Maximal Empty Rectangles
by Dwipen Laskar, Irani Hazarika, Farha Naznin, Anjana Kakoti Mahanta
Abstract: Interval data with k dimensions can be represented as a hyperrectangle. All the domains of an interval dataset can be represented as a bounded hyperrectangle, which can be treated as the universe or bounding region. Empty hyperrectangles within this bounding hyperrectangle are regions having no intersections with any other hyperrectangle represented by any data in the dataset. A maximal empty hyperrectangle is an empty hyperrectangle that is not properly contained in any other empty hyperrectangle. In a 2D interval dataset, the problem of mining all maximal empty hyperrectangles can be reduced to mining all maximal empty rectangles within the bounding rectangle of the dataset. In this paper, a two-step dynamic algorithm called AMER-Miner is proposed for mining all maximal empty rectangles contained in the bounding rectangle of a 2D interval dataset. The proposed method has been tested on two real-life datasets and one synthetic dataset, and experimental results are reported.
Keywords: Interval data; Empty interval; Empty rectangle; Hyperrectangle.
DOI: 10.1504/IJDMMM.2026.10072741
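
The definitions in the abstract above (empty rectangle, maximal empty rectangle) can be illustrated with a minimal Python sketch. This is not the AMER-Miner algorithm, whose details are not given here; it is only a brute-force check of the definitions under stated assumptions: integer coordinates, rectangles written as (x1, y1, x2, y2), and "no intersection" read as "no shared interior area", so touching edges are allowed. All function names are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (x1, y1, x2, y2) with integer coordinates, x1 < x2, y1 < y2.
Rect = Tuple[int, int, int, int]


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the interiors of a and b share any area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def inside(r: Rect, bounds: Rect) -> bool:
    """True if r lies entirely within the bounding rectangle."""
    return bounds[0] <= r[0] and bounds[1] <= r[1] and r[2] <= bounds[2] and r[3] <= bounds[3]


def is_empty(r: Rect, data: List[Rect], bounds: Rect) -> bool:
    """An empty rectangle lies in the bounding region and overlaps no data rectangle."""
    return inside(r, bounds) and not any(overlaps(r, d) for d in data)


def is_maximal_empty(r: Rect, data: List[Rect], bounds: Rect) -> bool:
    """An empty rectangle is maximal if no side can be pushed outwards.

    With integer coordinates, growing one side by a single unit is enough to
    detect whether a strictly larger empty rectangle contains r.
    """
    if not is_empty(r, data, bounds):
        return False
    x1, y1, x2, y2 = r
    grown = [
        (x1 - 1, y1, x2, y2),  # extend left
        (x1, y1 - 1, x2, y2),  # extend down
        (x1, y1, x2 + 1, y2),  # extend right
        (x1, y1, x2, y2 + 1),  # extend up
    ]
    return not any(is_empty(g, data, bounds) for g in grown)
```

For example, with a bounding rectangle (0, 0, 10, 10) and a single data rectangle (4, 4, 6, 6), the strip (0, 0, 4, 10) is a maximal empty rectangle, while (0, 0, 3, 10) is empty but not maximal because it can still grow to the right.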

Analysis of the Debt Status of Households in Poor Areas based on Economic Capital using Two-Class Boosted Decision Trees
by Pita Jarupunphol, Wipawan Buathong, Suthasinee Kuptabut
Abstract: This study examines household debt determinants in Kut Bak district, Thailand, using a two-class boosted decision tree (TBDT) model to analyse 301 households across 30 financial, asset, and socio-economic variables. Compared with logistic regression, decision tree, random forest, and XGBoost, the model demonstrates superior performance, achieving an accuracy of 0.922, precision of 0.975, recall of 0.867, F1-score of 0.918, and AUC of 0.948. Key findings reveal that limited savings, minimal state assistance, and low ownership of productive assets significantly increase debt likelihood. Specific thresholds, such as savings below 4,500 units and cash reserves of 50 units or less, are strongly associated with indebtedness. The study highlights the model's effectiveness in predicting debt status and provides actionable insights for policymakers and organisations to enhance financial stability in rural communities. These results contribute to understanding socio-economic factors driving household debt in disadvantaged areas.
Keywords: data mining; household debt; machine learning; socio-economic factors; two-class boosted decision tree.
DOI: 10.1504/IJDMMM.2026.10072842
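
As a quick consistency check on the metrics reported in the abstract above, the F1-score is the harmonic mean of precision and recall. The short Python sketch below recomputes it from the reported precision (0.975) and recall (0.867); the function name f1_score is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract: precision 0.975, recall 0.867.
recomputed_f1 = f1_score(0.975, 0.867)
print(round(recomputed_f1, 3))
```

Rounded to three decimals, the recomputed value agrees with the reported F1-score of 0.918.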
