Forthcoming Articles

International Journal of Data Mining, Modelling and Management (IJDMMM)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Mining, Modelling and Management (19 papers in press)

Regular Issues

  • Sentiment Analysis of Danish Health Care Industries' Financial Text
    by Rudra Pratap Deb Nath, Emil Bækdahl, Magnus Brogaard Larsen, Jakob Skallebæk, Jesper Juul Severinsen 
    Abstract: Sentiment analysis enables organisations to gain insights into market trends and customer opinions expressed in textual format. It quantifies textual opinions by classifying them as positive, negative, or neutral. We present a system for performing sentiment analysis on Danish texts related to the Danish healthcare industry. The system is composed of two components: domain-specific sentiment lexicon (DSSL) generator and dependency tree-based sentence analyser (DTSA). To generate DSSL, we use company stock prices to automatically label the sentiments of financial news articles based on the point-wise mutual information method and achieve performance improvements compared to existing general sentiment lexicons. Our DTSA is based on a data structure called a dependency tree, which describes how words in a text are connected. Depending on the types of connections between the words, we apply different rules to compute a sentiment value. This approach, in conjunction with DSSL, performs best in three-class sentence classification compared to systems using different sentiment lexicons and/or sentiment analysis components. We achieve an accuracy of 53% and the best F1 scores.
    Keywords: Sentiment Analysis; Danish Text Mining; Business Intelligence; Knowledge Discovery; Natural Language Processing; ETL.
    DOI: 10.1504/IJDMMM.2025.10066891
     
  • Lung Disease Classification using Deep Learning 1-D Convolutional Neural Network
    by J. Viji Gripsy, Divya T 
    Abstract: Healthcare plays a crucial role in human life, particularly in the early diagnosis of diseases such as lung cancer, which affects people worldwide. Early detection of lung cancer can significantly improve treatment outcomes. This paper proposes a 1-D CNN deep learning architecture to classify patients into low, medium, and high-risk categories for lung cancer. The model achieves 97% training accuracy and 96.33% test accuracy, outperforming existing classification algorithms in accuracy, precision, recall, F1-score, and AUC. These results highlight the effectiveness of the proposed architecture in the early diagnosis of lung cancer.
    Keywords: lung disease; classification; 1-D convolutional neural network; 1-D CNN; prediction.
    DOI: 10.1504/IJDMMM.2025.10066898
     
  • Automated Big Data Quality Assessment using Knowledge Graph Embeddings
    by Hadi Fadlallah, Chamoun Rima Kilany, Mitri Haber, Ali Jaber 
    Abstract: This paper introduces a knowledge-based approach to automate data quality assessment, addressing the limitations of traditional methods that overlook contextual data characteristics. By using knowledge graph embeddings, it predicts missing connections between a dataset's context and relevant quality rules within a knowledge graph. This integration of diverse representations enables a context-specific data quality assessment plan tailored to each scenario. The approach enhances understanding of the dataset's context, surpassing traditional strict matching methods. Numerical edge attributes are applied to assign weights to predicted quality measurements, providing a comprehensive assessment. The solution is evaluated using AmpliGraph on a radiation sensors dataset from the Lebanese Atomic Energy Commission (LAEC-CNRS), demonstrating its effectiveness in generating a robust data quality assessment plan. The results obtained from this evaluation demonstrate the capability of our solution to generate a comprehensive data quality assessment plan for the given input dataset.
    Keywords: Data quality assessment; Data context; Big data; Machine learning; Knowledge graph embeddings; Automation.
    DOI: 10.1504/IJDMMM.2025.10067404
     
  • Knowledge Discovery for Anthropometric Measures using Data Mining Techniques
    by Ali Chegini, Alireza Dehghan, Roozbeh Ghousi 
    Abstract: The article presents a novel application of data mining techniques to anthropometric measures to uncover hidden relationships and associations between different body measurements. The anthropometric data consists of 111 samples with 15 features including basic demographics, body mass index (BMI), and ten anthropometric measures. The research utilises the CRISP-DM methodology to form an applicable data analytics framework for anthropometric measurements. Various data mining methods were applied, including regression analysis to predict stature, clustering algorithms (K-means and hierarchical clustering) to segment the data, classification techniques (SVM and decision trees) to categorise BMI status, and association rule mining to uncover patterns between body dimensions and BMI category. The results demonstrated strong correlations of anthropometric dimensions with stature and weight, and three distinct physical-trait profile clusters emerged from the K-means algorithm. The findings can facilitate ergonomic design and promote health assessments and personalised interventions.
    Keywords: ergonomics; human factors; anthropometry; CRISP-DM; machine learning.
    DOI: 10.1504/IJDMMM.2025.10067602
     
  • Recognition of Critical Built-up Areas Located on High-hill Slope Regions using Decision Tree Technique
    by B. G. Kodge  
    Abstract: In mountainous places, structures are being built for residential or commercial uses without the necessary safety precautions. Every year, landslides, torrential downpours, severe snowfall, earthquakes, volcanic eruptions, and floods cause buildings to collapse. Most of these buildings are found in high-hill slope areas with loose soil types, close to river flows and other water sources, and these incidents have claimed thousands of lives. This paper deals with the automatic identification of critical buildings (residential/commercial) located in mountainous areas that are on high-hill slopes, close to river flows, and have loose soil types and high variations in land elevation contours. The study uses primary data, such as built-up/residential areas and water body areas extracted from sample land use and land cover (LULC) data using image classification techniques, together with slope maps and land elevation contour maps generated from a digital elevation model (DEM). Supplementary data, such as river maps, soil maps and other base maps, are also collected. All the data are integrated and considered for the identification and extraction of critical residential/built-up areas using a spatial data mining technique.
    Keywords: critical residential area identification; LULC; DEM; image segmentation; decision tree; spatial data mining; SDM.
    DOI: 10.1504/IJDMMM.2026.10068077
     
  • Profiling Cryptocurrency Influencers on Social Media: a Comparative Study using SetFit and DistilBERT
    by Rebeh Imane Ammar Aouchiche, Fatima Boumahdi, Mohamed Abdelkarim Remmide, Amina Guendouz 
    Abstract: Nowadays, in a world dominated by social media, the content people share can have significant effects, particularly in the domain of cryptocurrency, where investors often turn to online advice. The instability of the cryptocurrency market is well known, and some social media individuals wield considerable influence over this market through their posts. Our study focuses on categorizing these cryptocurrency influencers based on their English tweets, with the challenge of limited data availability. Two transformer-based models, Sentence Transformer Fine-tuning (SetFit) and Distilled BERT (DistilBERT), were used to classify cryptocurrency influencers in three subtasks, profiling authors by their degree of influence, main interests, and message intent. These models were evaluated on a Twitter-based dataset from PAN2023. The results show that SetFit achieved the best performance with a 0.82 F1-score, followed closely by DistilBERT with a 0.80 F1-score.
    Keywords: Social media; Author Profiling; Cryptocurrency influencers; DistilBert; SetFit; Few-shot-learning.
    DOI: 10.1504/IJDMMM.2026.10068138
     
  • A Web-Based Plagiarism Detection Method for Student Reports using Intrinsic Analysis
    by Maryam Elamine, Lamia Hadrich Belguith 
    Abstract: With the advent of complex language models and the massive amount of data available on the Web, students have had an easier time committing plagiarism. This research describes a web-based system for identifying plagiarism in student reports using intrinsic analysis. To detect plagiarism, we use a combination of stylistic and semantic features as well as a similarity matching technique. We experimented with a dataset of scientific papers mostly published in French, the predominant language in our institutions. Our plagiarism detection method examines the writing style of suspect documents, locates relevant sources on the internet, and compares them to the suspicious documents using external text matching. The preliminary results are promising, with our intrinsic and extrinsic methods reaching an F-score of 40.3% and 89% accuracy, respectively.
    Keywords: Online plagiarism detection; intrinsic analysis; writing style analysis; semantic analysis; text-matching; plagiarism in education.
    DOI: 10.1504/IJDMMM.2025.10068457
     
  • Satellite Image Classification using Deep Learning Model-ResNet
    by Pranali Kosamkar, Vrushali Kulkarni, Abdulrahim Shaikh, Geetika Agarwal, Inderjeet Balotia 
    Abstract: Data mining frameworks and artificial intelligence (AI) play a key part in decision-making scenarios. Because of the significant expense of creating training and testing datasets, a number of issues must be addressed, including object recognition, classification, and semantic segmentation in images of low spatial resolution. In this paper, we first review machine learning and deep learning based models for satellite health monitoring systems. We then build a deep learning model for satellite image classification using the Satellite Image Classification Dataset-RSI-CB256. Two variants, ResNet-12 and ResNet-18, were tested on the dataset. ResNet-18 achieved an accuracy of over 0.94 after 5 epochs, and ResNet-12 achieved 0.92 accuracy after 10 epochs of training. The results show that the ResNet CNN architecture is a better choice for satellite image classification than other available models such as FCNN and RCNN (with F-RCNN).
    Keywords: Deep Learning; ResNet; Data Mining; Artificial Intelligence; Machine Learning; satellite Image; Remote Sensing.
    DOI: 10.1504/IJDMMM.2026.10068596
     
  • Exploring Solar Activity Dynamics: Nonparametric Change Point Analysis of Sunspot and Umbra Areas
    by Sushovon Jana, Chandranath Pal 
    Abstract: Solar observational studies are crucial for understanding the sun's behaviour, its impact on space weather, and its influence on Earth's climate. Central to this research is sunspot data analysis, a key indicator of solar activity and magnetic field variations. The study of solar differential rotation has been fundamental, with pioneering work revealing that faster equatorial rotation influences the sun's magnetic field and activity cycle. Sunspot areas, meticulously documented by observatories like the Royal Greenwich Observatory and KoSO, have been critical for analysing long-term solar activity trends. The integration of machine learning has significantly advanced sunspot data analysis, enhancing space weather forecasting and the understanding of solar phenomena. This paper employs change point analysis on KoSO sunspot and umbra area data to detect significant shifts over time, utilising nonparametric methods for their computational efficiency. Results show deviations from normality, positive trends, and significant autocorrelation in the data. The PELT algorithm reveals several significant shifts, dividing the period into distinct segments with varying statistical characteristics. These findings align with known solar cycles and highlight the importance of advanced statistical techniques in understanding solar activity.
    Keywords: Sunspot; Summary statistics; Change point analysis; Nonparametric.
    DOI: 10.1504/IJDMMM.2026.10068653
     
  • Machine Learning Pipeline with an Optimal Feature Set in the Stage-wise Diagnosis of Hepatitis C Virus
    by Shirina Samreen 
    Abstract: The proposed research aims at timely and accurate diagnosis of Hepatitis C Virus using a novel dataset. For this purpose, numerous experiments are conducted using various machine learning models employing preprocessing techniques like feature engineering and data augmentation along with multiple heterogeneous classifiers. In addition to detecting the onset of the disease, the proposed method also detects the stage of the disease to comprehend the severity for an appropriate follow-up treatment to prevent further damage to the health of the patient. Each experiment comprises various combinations of feature engineering approaches along with multiple heterogeneous classifiers. It was found that the machine learning pipeline employing the feature engineering approach of recursive feature elimination with Support Vector Classifier as the estimator and a stacking ensemble classifier provides the best score for all performance metrics with an F1-score of 0.95, accuracy of 95.2 and mean square error of 0.06.
    Keywords: Machine Learning; Multi-class Classification; Feature Engineering; Imbalanced Dataset; Synthetic Minority Oversampling Technique; Recursive Feature Elimination; F1-Score; Mean Square Error.
    DOI: 10.1504/IJDMMM.2026.10068989
     
  • Entity Resolution: a Novel Graph Embedding Approach Using RandomDeep
    by Nour Mekki, Djamel Berrabah, Abdelhamid Malki 
    Abstract: The exponential growth of digital information necessitates robust methods for entity resolution to ensure data quality and integration across datasets. This paper presents three novel node embedding algorithms for entity resolution in graph databases: RandomDeep, Refined embedding, and Combined embedding. RandomDeep integrates Iterative Deepening Depth First Search with deep learning to capture structural and semantic characteristics. Refined embedding enhances initial Graph Convolutional Network (GCN) embeddings through random walk-based refinement. Combined embedding merges outputs from complementary algorithms to produce versatile representations adaptable to diverse graph structures. A two-stage graph summarization technique supports this approach: initially as a blocking method to reduce computational complexity, and later during merging to consolidate redundant nodes. Evaluation datasets (DBLP-Scholar, Amazon-Google, Cora, and Yellow-Yelp) demonstrate the methods' effectiveness, with Area Under Cover Precision and Recall values ranging from 0.50 to 0.97 and F-measure values between 0.67 and 0.94. These results showcase accurate, efficient entity resolution in graph databases.
    Keywords: Entity Resolution; graph databases; node embedding; graph summarization; data quality.
    DOI: 10.1504/IJDMMM.2026.10069148
     
  • Context-Specific Multi-Class Data Analytics for Improving Online Conversation through Deep Learning
    by Dhanasekaran K, Nadana Ravishankar, Goyal S. B, Sardar M. N. Islam 
    Abstract: Social networks have emerged as a platform for disseminating information rapidly to friends, relatives, and the public. An effective text classification strategy can improve the effectiveness of online discussion. This has been a great motivation behind text analytics research. Several text classification approaches have been developed to enhance information extraction performance and address its challenges. However, traditional text data analytics are based on limited contextual and static resources and require effective intelligent techniques for automatically extracting features from the container. To address these issues, we proposed and developed a unique context-specific Multi-Class Data Analytics architecture based on Deep Learning. This approach improved the performance of data analytics and mainly focused on extracting various types of information that describe several attributes to improve online conversation. The experimental results showed that the proposed multi-class data analytics provides promising results in terms of classification accuracy, validation accuracy, validation loss, precision, recall, and F1-measure in support of text classification for information extraction.
    Keywords: Convolutional neural network; Data analytics; Information extraction; Clustering; Deep learning.
    DOI: 10.1504/IJDMMM.2026.10069923
     
  • MoDA-TL - Monitoring Domestic Animals using Convolutional Neural Networks and Transfer Learning
    by Alex A. Do Amaral, Raimundo V. Costa Filho, Mário W. De L. Moreira 
    Abstract: In recent years, computer vision has made significant advances, expanding its knowledge and applications in various fields. An important example is the use of this technology to improve the recognition of different types of animals. This paper proposes an intelligent surveillance system that can individually identify each animal in a specific location and clearly indicate dangerous or unsuitable areas during monitoring, ensuring the safety of both people and the animals being monitored. In this context, deep learning algorithms, such as convolutional neural networks (CNN), are used to produce machine learning models capable of detecting and identifying objects in digital images. The study utilises the You Only Look Once (YOLO) version 8 model and achieves 99.5% accuracy in animal recognition, demonstrating its effectiveness in monitoring. Additionally, a comparison between a model trained from random weight initialisation and another based on transfer learning reveals that the latter outperforms across various metrics, showing 99.5% accuracy, 99.3% recall, 99.5% mAP50, and 77.5% mAP50-95. These results highlight the advantage of transfer learning in optimising performance.
    Keywords: Artificial Intelligence; Deep Learning; Neural Networks; Computer Vision; Image Recognition.
    DOI: 10.1504/IJDMMM.2026.10070032
     
  • Hybrid Kernel Support Vector Penalised Regression Model for Forewarning Pest Incidence using Weather Variables
    by Naranammal Narayanasamy, Krishna S. R. Priya 
    Abstract: Crop pest incidence and development are impacted by environmental factors. Therefore, a weather-based machine learning model will be an effective scientific measure for forewarning pests. In many cases, however, the raw data is complex and has the problems of nonlinearity and multicollinearity, so the development of a robust model is much needed to forecast complex data. The present study is an attempt to develop hybrid models such as kernel support vector ridge and kernel support vector elastic net regression (KSVENR) to forewarn crop pests of cotton. Weekly pest incidence data of sucking pests such as aphids, jassid, thrips and whitefly from 2015-16 to 2022-23 have been used for the study. The results reveal that the KSVENR model outperformed other penalised models by 43%, 42%, 40% and 33% for forewarning pest incidence of aphids, jassid, thrips and whitefly respectively. The proposed model would be a good tool for forecasting nonlinear data with multicollinearity.
    Keywords: Time series; Modelling; Forecasting; Nonlinear; Multicollinearity; Data Analysis; Machine Learning; Hybrid Model.
    DOI: 10.1504/IJDMMM.2026.10070953
     
  • ATESA: Audio Text Emotion & Sentiment Analyser- a Sentiment & Emotion Analysis Tool based on Deep Learning Methods
    by Pallavi Shukla, Rakesh Kumar, Vijay Dwivedi, Ashutosh Singh 
    Abstract: Sentiment analysis (SA) identifies sentiments in text, reviews, tweets, audio, images, and videos. Sentiment integrates emotion and thinking, with emotions being temporary while sentiments last longer. Emotion recognition and sentiment polarity analysis are gaining popularity in natural language processing due to their ability to mine social media data. This study applies machine learning (ML) classifiers such as random forest, logistic regression, support vector machine, and decision tree to classify text and speech as positive, negative, or neutral. Additionally, it explores available sentiment analysis tools and introduces the audio text emotion and sentiment analyser (ATESA). ATESA leverages ensemble-oriented classification techniques using deep learning, specifically bidirectional long-short-term memory recurrent neural networks (Bi-LSTM-RNN). It processes text, Twitter data, and speech converted into text. Experimental results show that ATESA achieves 92% accuracy, outperforming other algorithms.
    Keywords: Sentiment Analysis Tool; Bi-LSTM; RNN; TFIDF; Deep Learning.
    DOI: 10.1504/IJDMMM.2026.10071047
     
  • Advancements in Mental Health Diagnosis: Leveraging Delta Feature Extraction Framework and PWSA Ensemble for Motion Data Analysis
    by S. Annapoorani, Lakshmi M. 
    Abstract: Depression affects over 350 million people globally and can become a serious health issue, especially when prolonged and ranging from mild to severe. Physical activity data offers a cost-effective and accessible approach to aid in diagnosing mental illnesses. This study introduces the Delta feature extraction framework (D-FEF), which extracts delta series and relevant features from original time series data, subsequently selecting a significant feature set. A probabilistic weighted selection algorithm (PWSA) with SMOTE generates multiple hypotheses using training data based on modified distributions, creating an ensemble of classifiers to predict healthy controls, depressive disorder, and schizophrenia. The PWSA classifier, utilising the D-FEF feature selection process, achieved 92.94% accuracy, outperforming all other tested methods. The technique's performance was evaluated on mental health datasets, including Depresjon and Psykose, and compared against state-of-the-art approaches. The proposed D-FEF and PWSA methodology demonstrates promising results for the classification of mental health conditions using physical activity data.
    Keywords: Actigraphy data; mental health; feature engineering; feature selection; ensemble machine learning algorithm.
    DOI: 10.1504/IJDMMM.2026.10072023
     
  • D-HUP Tree: Distributed HUP Tree for Scalable High Utility Itemset Mining
    by Chintan Rajput, Mathe John Kenny Kumar, Dipti Rana 
    Abstract: High utility itemset mining (HUIM) extracts useful information from datasets. As volume, velocity and variety increase, traditional methods struggle with computational efficiency in terms of runtime and memory utilisation. The proposed work introduces a new approach called distributed-high utility pattern tree (D-HUP Tree), which combines a HUP Tree data structure with the Hadoop distributed computing framework, thereby improving runtime and memory management and enabling parallel processing. Experimental results clearly illustrate that the proposed methodology reduces computational complexity without compromising the quality of discovered high utility itemsets, providing a substantial contribution to the high utility itemset mining field.
    Keywords: High Utility Itemset Mining; HUP Tree; Distributed Itemset Mining; Map Reduce.
    DOI: 10.1504/IJDMMM.2026.10072676
     
  • Mining Maximal Empty Rectangles
    by Dwipen Laskar, Irani Hazarika, Farha Naznin, Anjana Kakoti Mahanta 
    Abstract: Interval data with k dimensions can be represented as a hyperrectangle. All the domains of an interval dataset can be represented as a bounded hyperrectangle, which can be treated as the universe or bounding region. Empty hyperrectangles within this bounding hyperrectangle are regions having no intersections with any other hyperrectangle represented by any data in the dataset. A maximal empty hyperrectangle is an empty hyperrectangle that is not properly contained in any other empty hyperrectangle. In a 2D interval dataset, the problem of mining all maximal empty hyperrectangles can be reduced to mining all maximal empty rectangles within the bounding rectangle of the dataset. In this paper, a two-step dynamic algorithm called AMER-Miner is proposed for mining all maximal empty rectangles contained in the bounding rectangle of a 2D interval dataset. The proposed method has been tested on two real-life datasets and one synthetic dataset, and experimental results are reported.
    Keywords: Interval data; Empty interval; Empty rectangle; Hyperrectangle.
    DOI: 10.1504/IJDMMM.2026.10072741
     
  • Analysis of the Debt Status of Households in Poor Areas based on Economic Capital using Two-Class Boosted Decision Trees
    by Pita Jarupunphol, Wipawan Buathong, Suthasinee Kuptabut 
    Abstract: This study examines household debt determinants in Kut Bak district, Thailand, using a two-class boosted decision tree (TBDT) model to analyse 301 households across 30 financial, asset, and socio-economic variables. Compared with logistic regression, decision tree, random forest, and XGBoost, the model demonstrates superior performance, achieving an accuracy of 0.922, precision of 0.975, recall of 0.867, F1-score of 0.918, and AUC of 0.948. Key findings reveal that limited savings, minimal state assistance, and low ownership of productive assets significantly increase debt likelihood. Specific thresholds, such as savings below 4,500 units and cash reserves of 50 units or less, are strongly associated with indebtedness. The study highlights the model's effectiveness in predicting debt status and provides actionable insights for policymakers and organisations to enhance financial stability in rural communities. These results contribute to understanding socio-economic factors driving household debt in disadvantaged areas.
    Keywords: data mining; household debt; machine learning; socio-economic factors; two-class boosted decision tree.
    DOI: 10.1504/IJDMMM.2026.10072842
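
The definition of a maximal empty rectangle given in the abstract of "Mining Maximal Empty Rectangles" above can be illustrated with a small brute-force sketch. This is not the AMER-Miner algorithm, whose details the abstract does not give; it only applies the definition directly. The function names, the `(x1, y1, x2, y2)` tuple layout, and the convention that rectangles which merely touch along an edge do not intersect are all assumptions made for illustration. Data rectangles are assumed to lie inside the bounding rectangle.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical representation: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Rect = Tuple[float, float, float, float]


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the interiors of a and b intersect (shared edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def contains(outer: Rect, inner: Rect) -> bool:
    """True if inner lies within outer (the two may be equal)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def maximal_empty_rectangles(bound: Rect, data: List[Rect]) -> List[Rect]:
    """Brute-force listing of maximal empty rectangles inside `bound`.

    Every side of a maximal empty rectangle is blocked by the bounding
    rectangle or by a data rectangle, so it is enough to try corner
    coordinates taken from those edges.
    """
    xs = sorted({bound[0], bound[2]} | {x for r in data for x in (r[0], r[2])})
    ys = sorted({bound[1], bound[3]} | {y for r in data for y in (r[1], r[3])})

    # All candidate rectangles that intersect no data rectangle.
    empties = [
        (x1, y1, x2, y2)
        for x1, x2 in combinations(xs, 2)
        for y1, y2 in combinations(ys, 2)
        if not any(overlaps((x1, y1, x2, y2), r) for r in data)
    ]
    # Keep those not properly contained in another empty rectangle.
    return [e for e in empties
            if not any(o != e and contains(o, e) for o in empties)]


if __name__ == "__main__":
    for rect in maximal_empty_rectangles((0, 0, 4, 4), [(1, 1, 3, 3)]):
        print(rect)
```

The sketch enumerates on the order of n^4 candidates for n data rectangles, so it is only suitable for checking small examples against the definition.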