Forthcoming and Online First Articles

International Journal of Data Mining, Modelling and Management (IJDMMM)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

Register for our alerting service, which notifies you by email when new issues are published online.

International Journal of Data Mining, Modelling and Management (19 papers in press)

Regular Issues

  • Enhancing Link Prediction in Dynamic Social Networks: A Novel Algorithm Integrating Global and Local Topological Structures
    by Shambhu Kumar, Arti Jain, Dinesh Bisht
    Abstract: The link prediction problem has gained significant importance due to the emergence of many social networks. Existing link prediction algorithms in social networks often prioritise either local or global attributes, yielding satisfactory performance on specific network types but with limitations such as reduced accuracy or higher computational burden. This paper presents a novel link prediction approach that integrates global and local topological structures, assessing the similarity of a node pair through a similarity index based on three key features: the number of common neighbours between the nodes, with a penalty factor introduced for each common node; node influence; and the shortest path distance between unconnected nodes. Evaluation using AUC on seven datasets demonstrates significant improvement over baseline and state-of-the-art methods, enhancing accuracy by 30% and 6.75%, respectively. This highlights the efficacy of integrating global and local features for more accurate link prediction.
    Keywords: social network; link prediction; common neighbour; similarity measure; degree centrality; node distance.
    DOI: 10.1504/IJDMMM.2025.10064902
     
  • Comparative Analysis of Distance Measures in Stock Network Construction and Cluster Analysis
    by Serkan Alkan 
    Abstract: The mutual information (MI) metric and the Pearson correlation metric are both widely used in cluster analysis and stock network construction. This paper presents a detailed comparison between the MI metric and the Pearson correlation metric. To detect nonlinear relationships, polynomial and natural cubic spline regressions are proposed as alternatives to the MI metric. The methodology for computing model-fitting indices for determining network adjacencies is explained in detail, along with a comparison of the results with the MI methodology. This study employs two data sets derived from the log returns of the daily adjusted closing prices of 402 stocks in the S&P500 index to measure the impact of a financial crisis on nonlinearity: one covering the crisis period from January 2007 to December 2009, and the other covering the non-crisis period between January 2012 and December 2015. The local and global properties of hierarchical stock networks are compared using the minimum spanning tree for each distance measure. The graph-theoretic internal cluster validity indices and external indices are also used to investigate the relationship between the performance of the community detection algorithm and the selection of metrics.
    Keywords: financial networks; mutual information; Pearson correlation; regression models; community detection.
    DOI: 10.1504/IJDMMM.2025.10065097
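
    The pipeline outlined in this abstract (pairwise distances between return series, then a minimum spanning tree) can be illustrated with a minimal sketch. It assumes the correlation distance d = sqrt(2(1 - rho)) that is commonly used in stock network studies; the abstract does not state the exact formula. The tickers, return values, and function names are hypothetical, and the tree is built with Prim's algorithm.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(rho):
    """Assumed distance: sqrt(2 * (1 - rho)), between 0 and 2."""
    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))

def minimum_spanning_tree(names, dist):
    """Prim's algorithm; dist maps frozenset({a, b}) to a distance."""
    in_tree = [names[0]]
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge from the current tree to a node outside it.
        d, u, v = min(
            (dist[frozenset((u, v))], u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, d))
        in_tree.append(v)
    return edges

# Hypothetical log returns for three tickers (illustrative values only).
returns = {
    "AAA": [0.01, -0.02, 0.03, 0.00],
    "BBB": [0.02, -0.04, 0.06, 0.00],   # moves with AAA
    "CCC": [-0.01, 0.02, -0.03, 0.00],  # moves against AAA
}
names = sorted(returns)
dist = {
    frozenset((a, b)): correlation_distance(pearson(returns[a], returns[b]))
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
mst = minimum_spanning_tree(names, dist)
```

    Replacing `pearson` with a mutual information estimate or a regression-based fit index would change only how `dist` is filled; the tree construction stays the same.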
     
  • Analysis and Evaluation of Business Process Management (BPM) Tools and Techniques in the Industry 4.0
    by Hari Lal Bhaskar  
    Abstract: The purpose of this paper is to analyse and evaluate the different tools and techniques of business process management (BPM), as well as the selection and adoption factors for process mining tools in Industry 4.0. The paper also discusses how process mining tools and techniques can be used to drive microeconomic principles. It covers the core concepts of BPM and process mining tools in Industry 4.0 and evaluates different types of models. A tactical roadmap with extensive comparative analysis is provided for selecting a process mining tool or software when initiating a business process optimisation or BPR programme. The contribution of this work lies in showing how the modern, digitally enabled organisation, specifically in Industry 4.0, can benefit from and reorganise its legacy systems using data-driven business insights in order to achieve operational excellence.
    Keywords: Business Process Management (BPM); Digital Transformation; Digitalization; Process Mining; Industry 4.0; BPM tools; Industrial Internet of Things (IIoT).
    DOI: 10.1504/IJDMMM.2025.10065406
     
  • A Frequent Itemset Generation Approach in Data Mining using Transaction-Labelling Dynamic Itemset Counting (TL-DIC) Method
    by Ambily Balaram, Nedunchezhian Raju 
    Abstract: A significant amount of data is generated, gathered, stored, and evaluated in real-world applications as a result of technology breakthroughs. Data mining (DM) combines a number of disciplines to efficiently discover hidden patterns from vast archives of historical information. To significantly reduce complexities associated with data, the proposed method, transaction-labelling dynamic itemset counting (TL-DIC), utilises a labelling approach on the given transactional database to logically arrange and process the underlying transactions. This method generates frequent itemsets, thereby improving the performance of the conventional dynamic itemset counting (DIC) method. Based on experimental findings, the average scan count in DIC and M-Apriori is 4% and 3.66% higher, respectively, than in TL-DIC for different support counts. TL-DIC also executes 20% and 16% faster than DIC and M-Apriori, respectively. These results validate the proposed approach’s efficacy in creating frequent itemsets from large datasets.
    Keywords: data mining; association rule mining; ARM; dynamic itemset counting method; DIC; frequent itemset generation; transaction labelling; TL; labelling.
    DOI: 10.1504/IJDMMM.2025.10065414
     
  • Sorting Paired Points: A Dissimilarity Measure Based on Sorting of Series
    by Wallace Pinheiro, Ricardo Q. A. Fernandes, Ana Bárbara Sapienza Pinheiro 
    Abstract: We propose a new dissimilarity measure that sorts different time series and measures their absolute and relative degree of disorganisation. This work compares this strategy with state-of-the-art dissimilarity and similarity measures, such as dynamic time warping (DTW), maximal information coefficient (MIC) and complexity-invariant distance (CID). Two clustering algorithms, one non-deterministic (K-means) and one deterministic (hierarchical), allow us to analyse the results. To assess accuracy, we use two different indices, maximal HITS and the adjusted Rand index. The results of the experiments, over 128 different datasets, demonstrate that the proposed approach provides more accurate results for different domains using the proposed metrics.
    Keywords: clustering; similarity; time series; entropy; sorting.
    DOI: 10.1504/IJDMMM.2025.10065723
     
  • Ensemble of Large Self-Supervised Transformers for Improving Speech Emotion Recognition
    by Mrunal Gavali, Abhishek Verma 
    Abstract: Speech emotion recognition (SER) is a challenging and active field of collaborative, social robotics to improve human-robot interaction (HRI) and affective computing as a feedback mechanism. More recently self-supervised learning (SSL) approaches have become an important method for learning speech representations. We present results of experiments on the challenging large-scale speech emotion RAVDESS dataset. Six very large state-of-the-art self-supervised learning transformer models were trained on the speech emotion dataset. Wav2vec2.0-XLSR-53 was the most successful of the six level-0 models and achieved classification accuracy of 93%. We propose majority voting ensemble models that combined three and five level-0 models. The five-model and three-model majority voting ensemble models achieved 96.88% and 96.53% accuracy respectively and thereby significantly outperformed the best level-0 model and surpassed the state-of-the-art.
    Keywords: Speech Emotion Recognition; self-supervised learning; Emotion AI; transformers; speech processing; acoustic features.
    DOI: 10.1504/IJDMMM.2025.10065871
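
    Majority voting over several level-0 models, as mentioned in this abstract, can be sketched as follows. This is only an illustration of hard voting on predicted labels; the model outputs, the emotion labels, and the tie-breaking rule (the earliest model in the list wins) are assumptions and are not taken from the paper.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label chosen by most models for a single sample.

    Ties go to the label that appears first in `predictions`
    (an assumed rule; other tie-breaking schemes are possible).
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label

def ensemble_predict(model_outputs):
    """Combine per-model label lists (all the same length) sample by sample."""
    return [majority_vote(list(votes)) for votes in zip(*model_outputs)]

# Hypothetical predictions of three level-0 models on three utterances.
model_a = ["happy", "sad", "angry"]
model_b = ["happy", "angry", "angry"]
model_c = ["sad", "sad", "calm"]

combined = ensemble_predict([model_a, model_b, model_c])
```

    Using an odd number of models, such as the three and five in the abstract, reduces how often ties occur, although ties remain possible when there are more than two classes.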
     
  • Ensemble Learning Models for Predicting the Gaming Addiction Behaviours of Adolescents
    by Nongyao Nai-arun, Warachanan Choothong 
    Abstract: This paper aims: 1) to create a prediction model for the game addiction of adolescents using six data mining algorithms; 2) to optimise the models by adjusting the parameters; 3) to create an ensemble model. Bagging and boosting algorithms were investigated for improving the models. Data were collected from eight Northern Rajabhat Universities in Thailand. The results showed that bagging with a neural network had the highest performance, with an accuracy of 99.35%, followed by boosting with a neural network (99.02%). The best-optimised parameters of the neural network algorithm were achieved by adjusting the learning rate. The best model was used to develop a web application for predicting the gaming addiction behaviours of adolescents, which could contribute to solving the problem.
    Keywords: classification; ensemble learning; bagging; boosting; neural network; random forest; optimisation; gaming addiction behaviours.
    DOI: 10.1504/IJDMMM.2025.10065942
     
  • A Review on Breast Cancer Detection using Machine Learning Techniques
    by Sowjanya Yerramaneni, Sudheer Reddy K. 
    Abstract: One of the major diseases that has a high mortality rate in women is breast cancer. As the death rate among women has been increasing every year, it is necessary to reduce this number by detecting cancerous cells accurately using various methods. This paper presents a review of various works on the detection of breast cancer using various machine learning techniques such as decision tree, random forest, K-nearest neighbour, support vector machine, logistic regression and Na
    Keywords: breast cancer; classification models; machine learning; neural networks; deep learning.
    DOI: 10.1504/IJDMMM.2025.10065995
     
  • An Approach to Improve the Healthcare Purchase Decision: An Application in a Healthcare Center in Turkey
    by Sena Kumcu, Bahar Özyörük 
    Abstract: The right supplier selection and order quantity allocation decisions are crucial for the healthcare sector because it must deliver its products and services to its patients properly and on time. However, supplier selection and order allocation decisions still do not receive enough attention in this sector, leaving a significant research and application gap in the literature. In this study, first, in order to determine the annual purchasing needs of the medical equipment that is vital for a healthcare centre in Ankara, T
    Keywords: healthcare procurement practices; supplier selection; order allocation; goal programming; ABC-VED analysis.
    DOI: 10.1504/IJDMMM.2025.10066154
     
  • Analysing and Forecasting COVID-19 Vaccination - Evidence from a Native American Community in North Carolina, USA
    by Xin Zhang, Zhixin Kang, Guanlin Gao, Xinyan Shi 
    Abstract: This study examines the determining factors of vaccination decisions for adults and children in a historical tribal region and evaluates the predictive power of various machine learning models. COVID-19 vaccination data were investigated, though the proposed method may be used for evaluating other vaccination data. We administered a survey and collected cross-sectional data (e.g., socio-demographics, COVID-19 testing behaviours, vaccination status, and people's knowledge about, attitude toward, and belief in the vaccines), developed new features and built predictive models (e.g., random forest, neural network, and decision tree), and evaluated their performance against benchmark logistic regression models. The results show that people who tested more frequently, believed vaccination is a social responsibility, and were provided with paid leave by employers are more likely to be fully vaccinated and to vaccinate their children. Our results also show that not all machine learning models outperform the logistic regression model.
    Keywords: COVID-19 vaccination intention; feature design and evaluation; vaccination forecasting; machine learning; Bayesian-correlation; model evaluation.
    DOI: 10.1504/IJDMMM.2025.10066364
     
  • Multi-Document Text Summarisation using DL-BiLSTM Model with Hybrid Algorithms
    by Jyotirmayee Rautaray, Sangram Panigrahi, Ajit Kumar Nayak 
    Abstract: With the overwhelming amount of information available online, it becomes challenging for users to access relevant data. Automated techniques are essential to effectively filter and extract valuable information from vast datasets. Recently, text summarisation has emerged as a key method for distilling relevant content from lengthy documents. This work introduces a novel deep learning-based approach for multi-document text summarisation. The proposed system begins with pre-processing tasks such as stop word removal, sentence and paragraph chunking, stemming, and lemmatisation. Textual phrases are transformed into vector space models using TF-ISF and sentence scores are evaluated. A deep learning-based bidirectional long short-term memory model is employed for summarisation. Additionally, cat swarm optimisation and aquila optimisers refine DL model's parameters. The approach is validated using DUC 2002, DUC 2003, and DUC 2005 datasets, demonstrating superior performance across various metrics including Rouge scores, BLEU scores, cohesion, sensitivity, positive predictive value, and readability when compared to other summarisation methods.
    Keywords: multi-document text summarisation; MDTS; BiLSTM; term frequency-inverse sentence frequency; deep learning; Aquila optimiser; cat swarm optimisation; CSO; natural language processing; NLP.
    DOI: 10.1504/IJDMMM.2025.10066438
     
  • Training an Artificial Neural Network for an Effective PCB Defect Detection
    by Blanka Bartova, Vladislav Bina 
    Abstract: Printed Circuit Boards (PCBs) are crucial components of most electronic devices. In recent decades, the PCB manufacturing process has been significantly improved, mainly through the implementation of Surface Mounted Technology (SMT) and Automatic Optical Inspection (AOI). The data used for our analysis are real output from an AOI device, collected in a manufacturing company. The currently used AOI solution achieves an accuracy of 95.82%. The goal of our study was to train an Artificial Neural Network (ANN) to detect defective PCBs with the highest possible accuracy. Different approaches were used for ANN training, such as the experimental approach, regression, and the Taguchi method. The resulting PCA-ANN model combines the Principal Components Analysis (PCA) method for data dimensionality reduction with an ANN for detecting low-quality products. Our proposed model increases the AOI accuracy rate by 3.95%.
    Keywords: ANN; Taguchi; PCB; defect; detection; SMT; regression; data mining; networks training; quality management; Industry 4.0.
    DOI: 10.1504/IJDMMM.2025.10066541
     
  • Identifying Immoral Posts on Social Media Platforms: a Review
    by Bibi Saqia, Khairullah Khan, Atta Ur Rahman 
    Abstract: Social media has become an integral part of our lives, connecting people across different parts of the world. Recently, there has been an increasing concern over the proliferation of immoral content on social media platforms. The ease and speed of communication on social media have made it a popular platform for people to express their opinions, but it has also led to the spread of harmful and immoral content. Hate speech, cyberbullying, and other forms of immoral behaviour are common on social media platforms, which can have serious consequences for the individuals involved and the wider community. Current literature reviews have normally focused on a specific class of immoral posts, such as hate speech. According to this study, no review has been dedicated to the overall categories of immoral post identification. This paper describes a systematic literature review of computational approaches, resources, challenges, and research gaps concerning the overall categories of immoral post identification.
    Keywords: immoral posts; social media; cyberbullying; hate speech; challenges and issues.
    DOI: 10.1504/IJDMMM.2025.10066845
     
  • Sentiment Analysis of Danish Health Care Industries' Financial Text
    by Rudra Pratap Deb Nath, Emil Bækdahl, Magnus Brogaard Larsen, Jakob Skallebæk, Jesper Juul Severinsen 
    Abstract: Sentiment analysis enables organisations to gain insights into market trends and customer opinions expressed in textual format. It quantifies textual opinions by classifying them as positive, negative, or neutral. We present a system for performing sentiment analysis on Danish texts related to the Danish healthcare industry. The system is composed of two components: domain-specific sentiment lexicon (DSSL) generator and dependency tree-based sentence analyser (DTSA). To generate DSSL, we use company stock prices to automatically label the sentiments of financial news articles based on the point-wise mutual information method and achieve performance improvements compared to existing general sentiment lexicons. Our DTSA is based on a data structure called a dependency tree, which describes how words in a text are connected. Depending on the types of connections between the words, we apply different rules to compute a sentiment value. This approach, in conjunction with DSSL, performs best in three-class sentence classification compared to systems using different sentiment lexicons and/or sentiment analysis components. We achieve an accuracy of 53% and the best F1 scores.
    Keywords: Sentiment Analysis; Danish Text Mining; Business Intelligence; Knowledge Discovery; Natural Language Processing; ETL.
    DOI: 10.1504/IJDMMM.2025.10066891
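
    The point-wise mutual information step mentioned in this abstract can be illustrated with a minimal sketch. It assumes articles already carry a "pos" or "neg" label (in the paper the labels are derived from stock prices, which is not modelled here) and scores each word as PMI(word, pos) - PMI(word, neg). That difference simplifies to log2 of P(word | pos) / P(word | neg). The add-one smoothing, the English tokens, and all names are assumptions for illustration.

```python
import math
from collections import Counter

def sentiment_lexicon(labelled_docs):
    """Score words from (tokens, label) pairs, label in {"pos", "neg"}.

    score(w) = PMI(w, pos) - PMI(w, neg) = log2(P(w|pos) / P(w|neg)),
    estimated from document frequencies with add-one smoothing
    (the smoothing is an assumption that keeps the logarithm defined).
    """
    n_label = Counter(label for _, label in labelled_docs)
    doc_freq = Counter()
    for tokens, label in labelled_docs:
        for word in set(tokens):          # count each word once per document
            doc_freq[(word, label)] += 1
    vocabulary = {word for word, _ in doc_freq}
    scores = {}
    for word in vocabulary:
        p_pos = (doc_freq[(word, "pos")] + 1) / (n_label["pos"] + 2)
        p_neg = (doc_freq[(word, "neg")] + 1) / (n_label["neg"] + 2)
        scores[word] = math.log2(p_pos / p_neg)
    return scores

# Hypothetical, already-labelled news snippets.
docs = [
    (["profit", "growth"], "pos"),
    (["profit", "record"], "pos"),
    (["loss", "decline"], "neg"),
    (["loss", "profit"], "neg"),
]
lexicon = sentiment_lexicon(docs)
```

    Positive scores mark words that occur relatively more often in positively labelled articles, and negative scores mark the opposite.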
     
  • Lung Disease Classification using Deep Learning 1-D Convolutional Neural Network
    by J. Viji Gripsy, Divya T 
    Abstract: Healthcare plays a crucial role in human life, particularly in the early diagnosis of diseases such as lung cancer, which affects people worldwide. Early detection of lung cancer can significantly improve treatment outcomes. This paper proposes a 1-D CNN deep learning architecture to classify patients into low, medium, and high-risk categories for lung cancer. The model achieves 97% training accuracy and 96.33% test accuracy, outperforming existing classification algorithms in accuracy, precision, recall, F1-score, and AUC. These results highlight the effectiveness of the proposed architecture in the early diagnosis of lung cancer.
    Keywords: lung disease; classification; 1-D convolutional neural network; 1-D CNN; prediction.
    DOI: 10.1504/IJDMMM.2025.10066898
     
  • Sentiment Analysis on Customers' Review in Indonesian Marketplace using Natural Language Processing (a Case Study of Organic Face Mask)
    by Nur Izzaty, Adelia Shinta, Riski Arifin, Sri Rahmawati 
    Abstract: The rapid development of technology has transformed customers' purchasing behaviour from offline to online through marketplaces. One of the most popular marketplaces in Indonesia is Shopee, where the best-selling skincare product is the organic face mask. This study aims to analyse the sentiment of customer reviews using natural language processing (NLP) and term frequency-inverse document frequency (TF-IDF). The results revealed that, of the 882 reviews extracted, 89.7% were classified as positive (ratings 4 and 5) and the remaining 10.3% as negative (ratings 1 and 2). The sentiments were visualised using a word cloud. Terms in the positive reviews included 'very good', 'quickly absorbed', and 'convenient', while terms in the negative reviews included 'disappointed', 'delivery', and 'acne'. In summary, the performance metrics used to evaluate the classification model showed that its accuracy reached 95%.
    Keywords: customers review; natural language processing; NLP; sentiment analysis; term frequency-inverse document frequency; TF-IDF; skincare; organic face mask.
    DOI: 10.1504/IJDMMM.2025.10066900
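
    The TF-IDF weighting named in this abstract can be sketched in a few lines. This uses one common variant, tf = count / document length and idf = ln(N / df); the paper may use a different variant, and the tokenised reviews below are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document.

    tf  = term count / number of tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenised reviews (illustrative only).
reviews = [
    ["very", "good", "mask"],
    ["mask", "disappointed"],
    ["good", "mask", "absorbed"],
]
weights = tf_idf(reviews)
```

    A term that appears in every review, such as "mask" here, receives a weight of zero, while a term confined to a single review, such as "disappointed", is weighted most heavily within that review.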
     
  • Automated Big Data Quality Assessment using Knowledge Graph Embeddings
    by Hadi Fadlallah, Chamoun Rima Kilany, Mitri Haber, Ali Jaber 
    Abstract: This paper introduces a knowledge-based approach to automate data quality assessment, addressing the limitations of traditional methods that overlook contextual data characteristics. By using knowledge graph embeddings, it predicts missing connections between a dataset's context and relevant quality rules within a knowledge graph. This integration of diverse representations enables a context-specific data quality assessment plan tailored to each scenario. The approach enhances understanding of the dataset's context, surpassing traditional strict matching methods. Numerical edge attributes are applied to assign weights to predicted quality measurements, providing a comprehensive assessment. The solution is evaluated using AmpliGraph on a radiation sensors dataset from the Lebanese Atomic Energy Commission (LAEC-CNRS), demonstrating its effectiveness in generating a robust data quality assessment plan. The results obtained from this evaluation demonstrate the capability of our solution to generate a comprehensive data quality assessment plan for the given input dataset.
    Keywords: Data quality assessment; Data context; Big data; Machine learning; Knowledge graph embeddings; Automation.
    DOI: 10.1504/IJDMMM.2025.10067404
     
  • Knowledge Discovery for Anthropometric Measures using Data Mining Techniques
    by Ali Chegini, Alireza Dehghan, Roozbeh Ghousi 
    Abstract: The article presents a novel application of data mining techniques to anthropometric measures to uncover hidden relationships and associations between different body measurements. The anthropometric data consist of 111 samples with 15 features including basic demographics, body mass index (BMI), and ten anthropometric measures. The research utilises the CRISP-DM methodology to form an applicable data analytics framework for anthropometric measurements. Various data mining methods were applied, including regression analysis to predict stature, clustering algorithms (K-means and hierarchical clustering) to segment the data, classification techniques (SVM and decision trees) to categorise BMI status, and association rule mining to uncover patterns between body dimensions and BMI category. The results demonstrated strong correlations of anthropometric dimensions with stature and weight, and three distinct clusters of physical trait profiles emerged from the K-means algorithm. The findings can facilitate ergonomic design and promote health assessments and personalised interventions.
    Keywords: ergonomics; human factors; anthropometry; CRISP-DM; machine learning.
    DOI: 10.1504/IJDMMM.2025.10067602
     
  • Recognition of Critical Built-up Areas Located on High-hill Slope Regions using Decision Tree Technique
    by B. G. Kodge  
    Abstract: In mountainous places, structures are being built for residential or commercial uses without the necessary safety precautions. Every year, landslides, torrential downpours, severe snowfall, earthquakes, volcanic eruptions, and floods cause buildings to collapse. The bulk of these buildings are found in high-hill slope areas with loose soil types, close to river flows and other water sources, and such incidents have claimed thousands of lives. This paper deals with the automatic identification of critical buildings (residential/commercial) located in mountainous areas that are on high-hill slopes, close to river flows, and have loose soil types and high variations in land elevation contours. The study uses primary data such as built-up/residential areas and water body areas, which are extracted from sample land use and land cover (LULC) data using image classification techniques, as well as slope maps and land elevation contour maps, which are generated from a digital elevation model (DEM). In addition, supplementary data such as river maps, soil maps and other base maps are collected. All the data are integrated and taken into consideration for the identification and extraction of critical residential/built-up areas using a spatial data mining technique.
    Keywords: critical residential area identification; LULC; DEM; image segmentation; decision tree; spatial data mining; SDM.
    DOI: 10.1504/IJDMMM.2026.10068077