Forthcoming and Online First Articles

International Journal of Data Mining and Bioinformatics

International Journal of Data Mining and Bioinformatics (IJDMB)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Mining and Bioinformatics (18 papers in press)

Regular Issues

  • An adaptive multimodal biometric feature recognition method based on visual perception
    by Hua Deng, Jifu Zhang, Yun He, Jing Zhang, Dan Han 
    Abstract: To address the low recognition accuracy and low image information entropy of traditional biometric recognition methods, this paper proposes an adaptive multimodal biometric recognition method based on visual perception. Images are segmented using Weber scores, and the just noticeable difference (JND) is calculated to obtain human visual perceptual features. The images are then processed with an induction bilateral filtering method, and adaptive convolutional neural networks fuse the multimodal biological features. The support vector machine Dempster-Shafer (SVM-DS) feature recognition function yields the reliability allocation of the different bodies of evidence, and the decision module outputs the target type to achieve biometric recognition. In testing, the method yields an image information entropy greater than 0.9, indicating optimised image quality, and matches features accurately during recognition.
    Keywords: visual perception; adaptive; multimodal; biological characteristics; identification and classification; bilateral filter.
    DOI: 10.1504/IJDMB.2025.10066713
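    Illustrative sketch (not from the article): the abstract's SVM-DS step allocates reliability to different bodies of evidence, which suggests Dempster-Shafer evidence combination. The Python below is a minimal version of Dempster's rule for two mass functions. The identities, the two modalities and all mass values are hypothetical, and the authors' actual fusion function may differ.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to its mass;
    the masses in each input are assumed to sum to 1.
    """
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            fused[common] = fused.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on contradictory evidence
    if conflict >= 1.0:
        raise ValueError("the two bodies of evidence are totally conflicting")
    # Renormalise so the surviving masses sum to 1 again
    return {k: v / (1.0 - conflict) for k, v in fused.items()}


# Hypothetical evidence from two modalities about two candidate identities
ID_A, ID_B = frozenset({"id_a"}), frozenset({"id_b"})
UNSURE = ID_A | ID_B
modality_1 = {ID_A: 0.6, ID_B: 0.1, UNSURE: 0.3}
modality_2 = {ID_A: 0.7, ID_B: 0.2, UNSURE: 0.1}

fused = combine(modality_1, modality_2)
decision = max(fused, key=fused.get)  # hypothesis set with the largest fused mass
```

    In this toy case both modalities lean towards the same identity, so the fused mass on it is higher than either input mass. Picking the largest mass is a deliberately simple decision rule chosen for illustration.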
     
  • Evaluation of neutrophil gelatinase-associated lipocalin precision in periprosthetic joint infection diagnosis
    by Ting Fu, Huhu Wang, Qiaolong Hu, Shuai Ding, Jiaming He 
    Abstract: This study assesses neutrophil gelatinase-associated lipocalin (NGAL) as a diagnostic biomarker for periprosthetic joint infection (PJI). A comprehensive search in multiple databases (Cochrane Library, Scopus, OVID, PubMed, Web of Science, and Embase) identified relevant studies on NGAL for PJI diagnosis until September 2023. Pooled sensitivity and specificity for NGAL, alongside other biomarkers (CRP, ESR, SF-WBC, D-dimer, and PCT), were analysed. Across nine studies, NGAL showed high diagnostic accuracy with pooled sensitivity of 0.93 and specificity of 0.90. Other markers showed lower sensitivity (ESR: 0.81, CRP: 0.75, SF-WBC: 0.91, D-dimer: 0.43, PCT: 0.65) and specificity (ESR: 0.90, CRP: 0.93, SF-WBC: 0.86, D-dimer: 0.93, PCT: 0.93). The diagnostic odds ratio (DOR) for NGAL (132.89) surpassed other markers, supporting NGAL as a superior diagnostic tool for PJI.
    Keywords: periprosthetic joint infection; PJI; neutrophil gelatinase-associated lipocalin; NGAL; diagnosis.
    DOI: 10.1504/IJDMB.2025.10067532
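    Illustrative sketch (not from the article): for a single 2x2 table the diagnostic odds ratio follows from sensitivity and specificity as DOR = (sens × spec) / ((1 − sens) × (1 − spec)). The Python below applies that textbook formula to the pooled figures quoted in the abstract. A pooled DOR in a meta-analysis is normally estimated separately across studies, so the value computed this way (about 119.6 for NGAL) is not expected to reproduce the reported 132.89.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR from sensitivity and specificity (both strictly between 0 and 1)."""
    positive_odds = sensitivity / (1.0 - sensitivity)   # odds of a positive test in diseased
    negative_odds = (1.0 - specificity) / specificity   # odds of a positive test in healthy
    return positive_odds / negative_odds


# Pooled (sensitivity, specificity) pairs as quoted in the abstract
pooled = {
    "NGAL": (0.93, 0.90),
    "ESR": (0.81, 0.90),
    "CRP": (0.75, 0.93),
    "SF-WBC": (0.91, 0.86),
    "D-dimer": (0.43, 0.93),
    "PCT": (0.65, 0.93),
}

dor = {name: diagnostic_odds_ratio(se, sp) for name, (se, sp) in pooled.items()}
ranking = sorted(dor, key=dor.get, reverse=True)  # markers from highest to lowest DOR
```

    Even with this back-of-the-envelope calculation NGAL ranks first among the six markers, which agrees with the direction of the abstract's conclusion.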
     
  • CNN-enabled transfer learning and cosine rat swarm optimisation for classification of heart disease
    by K. Saravanan, B. Sasikumar
    Abstract: This research introduces a Cosine Rat Swarm Optimisation (CRSO)-based transfer learning (TL) model for classifying heart disease from medical data. Initially, the input medical data are acquired and then pre-processed using min-max normalisation. Feature fusion is then performed using the Matusita similarity measure with a Deep Maxout Network (DMN). Thereafter, the Borderline Synthetic Minority Over-sampling Technique (Borderline-SMOTE) is used to augment the data. Heart disease is then classified by a Convolutional Neural Network (CNN) with transfer learning, in which the CNN uses hyperparameters from trained models such as Deep Batch-normalised eLU AlexNet (DbneAlexnet). Here, DbneAlexnet is trained using the CRSO algorithm. The proposed method achieved an accuracy of 91.7%, with a True Negative Rate (TNR) of 91% and a True Positive Rate (TPR) of 91.8%.
    Keywords: hyperparameter tuning; transfer learning; min-max normalisation; deep maxout network; DMN; Matusita similarity.
    DOI: 10.1504/IJDMB.2025.10068039
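    Illustrative sketch (not from the article): min-max normalisation, the pre-processing step named in the abstract, rescales each feature linearly to a fixed range. The Python below shows the standard formula x' = (x − min) / (max − min). The feature name and values are hypothetical, and mapping a constant feature to the lower bound is an assumption made here to avoid division by zero.

```python
def min_max_normalise(values, lo=0.0, hi=1.0):
    """Rescale a list of numbers linearly so that min -> lo and max -> hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # Constant feature: no spread to rescale, so map everything to lo
        return [lo for _ in values]
    return [lo + (v - vmin) / (vmax - vmin) * (hi - lo) for v in values]


# Hypothetical serum cholesterol readings (mg/dL) for three patients
cholesterol = [180.0, 240.0, 300.0]
scaled = min_max_normalise(cholesterol)
```

    In practice the minimum and maximum would be taken from the training split only and reused for the test split, so that no information leaks from test data into pre-processing.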
     

Special Issue on: Empowering the Future Generation of Data Mining and Knowledge Discovery in Bioinformatics

  • A novel intelligent-based intrusion detection and prevention system in the cloud using deep learning with meta-heuristic strategy
    by Srilatha Doddi, Thillaiarasu N 
    Abstract: Cloud computing offers end-users diverse options to minimise costs, and its services are easily accessible through online platforms. While users access the services remotely, attackers launch cyber-attacks to disrupt them. Cloud security analysts treat cloud security as a potential area of research to minimise the impact of abnormal behaviour. One potential solution for detecting attacks is the development of a next-generation intrusion detection and prevention system (IDPS). Hence, this paper proposes an efficient IDPS using a hybridised model known as hybrid firebug-squirrel swarm algorithm-based ensemble classifiers (HF-SSA-EC). Initially, the NSL-KDD cup 1999 dataset is considered for experimental analysis. The efficient features are extracted via the restricted Boltzmann machine (RBM) layers of the deep belief network (DBN) model. The extracted features are submitted to the ensemble classifiers (ECs), which use naive Bayes (NB), support vector machines (SVM), deep neural networks (DNN), and recurrent neural networks (RNN) to identify intrusions. EC parameter optimisation using the hybridised HF-SSA meta-heuristic improves performance. Finally, the prevention model, which uses meta-heuristic clustering, eliminates the malicious nodes associated with detected intrusions. The experimental results reveal that the recommended IDPS outperforms existing models.
    Keywords: intrusion detection and prevention system; IDPS; cloud computing; restricted Boltzmann machines; RBM; deep feature extraction; firebug swarm optimisation; FSO; squirrel search algorithm.
    DOI: 10.1504/IJDMB.2025.10062482
     
  • Metaheuristic gene regulatory networks inference using discrete crow search algorithm and quantitative association rules
    by Makhlouf Ledmi, Mohammed El Habib Souidi, Aboubekeur Hamdi-Cherif, Abdeldjalil Ledmi, Hichem Haouassi, Chafia Kara-Mohamed 
    Abstract: Gene regulatory network (GRN) inference has emerged as a valuable tool for detecting irregularities in cell regulation. Association rule mining (ARM) encompasses specific data mining methods capable of inferring unknown associations between genes. In response to the scarcity of ARM-based GRN inference, a novel metaheuristic algorithm, DCSA-QAR, is presented. This algorithm infers quantitative association rules by discretising the crow search algorithm. A first series of experiments compared it with five metaheuristic algorithms on six datasets. The results showed that, for the Co-citation and YeastNet datasets, our algorithm ranked first in precision (100%), specificity (100%) and score (3.75). A second series of experiments compared it with nine information-theoretic algorithms on the DREAM3 and SOS networks. The average results on the DREAM3 datasets are offset by the results on the real SOS datasets, where the algorithm was the best in accuracy and true positives. As an overall appraisal, DCSA-QAR can be considered a good candidate for ARM-based metaheuristic GRN inference.
    Keywords: artificial intelligence; bioinformatics; gene regulatory networks; GRNs; data mining; soft computing; mining association rules.
    DOI: 10.1504/IJDMB.2025.10062651
     
  • Plasma proteins related to the state of depression: a case-control study based on proteomics data of pregnant women
    by Yuhao Feng, Jinman Zhang, Zengyue Zheng, Chenyu Xing, Min Li, Guanghong Yan, Ping Chen, Dingyun You, Ying Wu 
    Abstract: Prenatal and postpartum emotional changes in women in early pregnancy are of great significance to the physical and mental health of mothers and infants. To investigate the factors involved, we conducted this study to identify feature proteins associated with maternal depression. The Boruta algorithm (BA), recursive partition algorithm (RPA), regularised random forest (RRF) algorithm, least absolute shrinkage and selection operator (LASSO) algorithm, and genetic algorithm (GA) were used to select features. Extreme gradient boosting (XGBoost), back propagation neural network (BPNN), support vector machine (SVM), random forest (RF), and logistic regression (LR) were selected to construct the predictive models. All models predicted well, with the mean AUC (area under the receiver operating characteristic curve) exceeding 80%. The selected features may provide clues for preventing depression in pregnant women and improving the physical and mental health of mothers and babies.
    Keywords: pregnant women; depression; proteomics; biomarkers; feature selection.
    DOI: 10.1504/IJDMB.2025.10064226
     
  • Enhancing drug-drug interaction event prediction from knowledge graphs by multimodal deep neural networks
    by Xiaomin Shen, Jianliang Gao, Tengfei Lyu, Jiamin Chen, Jiarun Zhang, Jing He, Zhao Li, Wei Yu 
    Abstract: This study tackles the challenging issue of predicting drug-drug interactions (DDI) in pharmacology. Despite advancements in deep learning for DDI prediction, many techniques fail to fully leverage multimodal data correlations, limiting accuracy. To address this, we propose the knowledge graphs by multimodal deep neural network (KGMDNN) framework, enhancing DDI prediction by integrating features from drug knowledge graphs (DKG) and heterogeneous features (HF). KGMDNN uses a dual-path structure to obtain multimodal drug representations, effectively capturing drug relationships and connections within DKG to improve prediction accuracy. Our method excels in learning joint representations of structural information and multimodal data, as demonstrated through numerous real-world dataset experiments. Additionally, testing various drug knowledge graphs confirmed the model’s robustness. KGMDNN outperforms both classic and state-of-the-art models in prediction metrics and interpretability.
    Keywords: DDI event prediction; drug-drug interaction; graph neural network; GNN; knowledge graph; heterogeneous information; multi-modal data.
    DOI: 10.1504/IJDMB.2025.10066016
     

Special Issue on: The Development of Novel Integrative Bioinformatics Based Machine Learning Techniques and Multi Omics Data Integration Part 2

  • Machine learning algorithm for lung cancer classification using ADASYN with standard random forest
    by J. Viji Gripsy, T. Divya  
    Abstract: Lung cancer is a type of cancer that develops in the lungs. Early identification of lung cancer symptoms may lead to successful treatment. The dataset contains duplicate characteristics as well as an imbalanced class distribution, making lung cancer classification a challenging task. This study presents a novel approach that combines ADASYN with the standard random forest (ASRF) model to enhance the efficacy of lung cancer dataset identification. The ASRF offers interpretable outcomes by using feature significance, hence providing significant insights into the aspects that contribute to lung cancer classification decisions. The classification algorithm is used to ascertain the presence or absence of lung cancer in a given patient. Compared with the existing SVM, MLP, RF and GB methods, the proposed ASRF technique achieved 93.5% precision, 94.7% recall, 94.1% F-measure, and 94% accuracy.
    Keywords: lung cancer; LC; RF; ASRF; MLP; support vector machine; SVM; GB.
    DOI: 10.1504/IJDMB.2025.10065391
     
  • Modified VGG-16 model for COVID-19 chest X-ray images: optimal binary severity assessment
    by Manoranjan Dash  
    Abstract: A pandemic caused by a virus known as COVID-19 has swept across the globe. One potential weapon in the fight against COVID-19 could be early detection through the use of chest X-ray images. In this paper, I have used a modified VGG-16 deep learning model for binary classification of COVID-19 chest X-ray images. There are 16 weight layers in the standard VGG-16 model. In the suggested modified VGG model, the total number of weight layers has been reduced from 16 to 9 (eight convolutional layers and one fully connected layer). According to the results, the modified VGG-16 model performs better than the other three models (CNN, KNN and VGG-16) in terms of the quantitative measures of accuracy, sensitivity and specificity. The dataset used for the proposed work consists of 24,000 chest X-ray images of lungs collected from an online repository, comprising 12,000 images for each class (healthy and pneumonia).
    Keywords: deep learning; classification; COVID-19; SARS-CoV-2; modified VGG-16.
    DOI: 10.1504/IJDMB.2025.10065665
     
  • Revealing novel biomarkers for oesophageal squamous cell carcinoma through integrated single-cell RNA sequencing analysis
    by Bikash Baruah, Manash P. Dutta, Subhasish Banerjee, Dhruba K. Bhattacharyya 
    Abstract: This study employs single-cell RNA sequencing (scRNA-seq) to analyse oesophageal squamous cell carcinoma (ESCC), identifying 10 potential biomarkers (ALDH2, ANGPT2, APPL1, ARPC2, CAD, CALM1, CLDN7, CLTB, F2RL3, LPAR1) associated with radiation exposure. Methodology involves scRNA-seq for data partitioning, pre-processing, clustering, and differential expression analysis. Dysregulated genes are identified through comprehensive gene ontology (GO) annotations, and ESCC-related pathways are explored via the Kyoto encyclopedia of genes and genomes (KEGG) database. Analysis of 38 genes reveals distinct patterns under radiation exposure, enriching understanding of ESCC-related processes, components, and functions. This research provides a holistic view of ESCC’s molecular landscape, emphasising the clinical significance of identified biomarkers and contributing significantly to the understanding of this complex malignancy.
    Keywords: oesophageal squamous cell carcinoma; ESCC; single-cell RNA sequencing; scRNA-seq; differential expression analysis; gene ontology; GO; pathway analysis; potential biomarker.
    DOI: 10.1504/IJDMB.2025.10065927
     
  • A novel suppressed segmentation framework for hyper spectral image processing in earlier cancer detection
    by Kaushal Kishor, Manoj Singhal, Rajesh Kumar Maurya, Pramod Kumar Sagar, Rupak Sharma, Satya Prakash Yadav 
    Abstract: This paper presents a novel suppressed segmentation framework for hyperspectral image processing in early cancer detection. The framework integrates recent advances in deep learning models and image segmentation to support optimal decision-making in early cancer diagnosis. The proposed framework facilitates fast and accurate segmentation of tumour tissues and other aberrations within hyperspectral images. Its key modules are input pre-processing, noisy additive analysis, random area cropping, augmented context representation, hierarchical segmentation, and post-processing. Experiments on real-world datasets show that the proposed framework yields segmentation accuracy similar to other leading segmentation techniques while offering superior speed and robustness. The proposed model obtained 95.32% accuracy, 92.89% sensitivity, 91.50% specificity, 94.25% precision and a 92.51% F1-score. The proposed method offers an optimised workflow for fast and accurate segmentation of tumour tissues in early cancer diagnosis.
    Keywords: image processing; deep learning; suppressed segmentation framework; SSF; hyperspectral image processing; HSIP; earlier cancer detection; traditional imaging techniques; hyperspectral imaging; hierarchical segmentation.
    DOI: 10.1504/IJDMB.2025.10066121
     
  • Age invariant face recognition method based on enhanced convolutional neural network
    by Bin Fang 
    Abstract: Research on anti-age invariant face recognition can not only improve the robustness of facial recognition systems, but also provide guidance for the development and application of facial recognition technology. To address the low peak signal-to-noise ratio, low recognition accuracy and long recognition time of traditional anti-age invariant face recognition methods, an age invariant face recognition method based on an enhanced convolutional neural network is proposed. The captured images are enhanced using a bilateral filtering algorithm. The SURF algorithm is employed to extract facial features and remove age-related interference features, completing the selection of facial image features. The selected features are then input into the enhanced CNN to obtain the age invariant face recognition results. The experimental results demonstrate that the proposed method achieves a maximum image peak signal-to-noise ratio of 56.85 dB, a recognition accuracy ranging from 96.1% to 97.6%, and a maximum recognition time of 78.96 ms.
    Keywords: enhanced convolutional neural network; age invariant; face recognition; bilateral filtering algorithm; SURF algorithm.
    DOI: 10.1504/IJDMB.2025.10066150
     
  • Dual pipeline technique for detecting sepsis from photoplethysmography
    by Shadi Abudalfa, Sara Lombardi, Eleonora Barcali, Leonardo Bocchi 
    Abstract: The goal of this work is to improve the performance of sepsis detection in photoplethysmography (PPG) data. To achieve this goal, we present a hybrid technique for classifying sepsis in PPG data based on confident learning (CL) with noisy data. The technique employs CL to improve the accuracy and reliability of the machine learning models, as it takes into account the uncertainty associated with each prediction. Numerous experiments were carried out to assess the performance of the technique in detecting sepsis using PPG data. The results obtained with the best-performing XGBoost model were compared with those of a previous study in which a deep learning-based model was applied to the same sample of data. The technique demonstrated its effectiveness by achieving an F1 score of 80.62% on the test set, a 7% improvement over the performance of the previous study.
    Keywords: confident learning; rich features; noisy data; photoplethysmography; sepsis; synthetic data generation.
    DOI: 10.1504/IJDMB.2025.10066271
     
  • Deep mining of elderly health data based on improved association clustering
    by Bo Yang 
    Abstract: To deeply process the health data of the elderly, this paper designs a deep mining method for elderly health data based on an improved association clustering approach. Initially, health data samples from the elderly are collected. The Apriori algorithm is enhanced with interest constraints, connectivity operations are employed to generate candidate itemsets, and those that do not meet the requirements are eliminated. Associated feature quantities are then extracted from the health data. Subsequently, a fuzzy K-means algorithm with weight attributes is incorporated as the core method, and a balance coefficient is calculated using the principle of balanced contribution. Finally, the improved fuzzy K-means algorithm is utilised to complete data classification, detect abnormal data points, and achieve deep mining of the health data. The results indicate that the proposed method has a false alarm rate of less than 3.21% and a false negative rate of less than 1.81%, demonstrating a superior mining effect compared to the comparison method.
    Keywords: association rules; clustering algorithm; the elderly; health data; deep mining.
    DOI: 10.1504/IJDMB.2025.10066985
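    Illustrative sketch (not from the article): the abstract describes an Apriori variant that generates candidate itemsets through join ("connectivity") operations, discards those that fail the requirements, and applies an interest constraint. The Python below sketches plain level-wise Apriori and then filters rules by lift, which is assumed here as the interest measure. The article's actual constraint, its thresholds, and the health attributes used as data are all hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: returns {frozenset: support} for frequent itemsets."""
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support}
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: all k-subsets must be frequent and support must pass
        level = {c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k))
                 and support(c, transactions) >= min_support}
        k += 1
    return frequent


def interesting_rules(frequent, min_interest):
    """Keep rules lhs -> rhs whose lift (assumed interest measure) passes."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                lift = supp / (frequent[lhs] * frequent[rhs])
                if lift >= min_interest:
                    rules.append((lhs, rhs, lift))
    return rules


# Hypothetical health records, one set of observed attributes per person
records = [
    {"hypertension", "high_bmi", "low_activity"},
    {"hypertension", "high_bmi"},
    {"hypertension", "low_activity"},
    {"high_bmi", "low_activity"},
    {"hypertension", "high_bmi", "low_activity"},
    {"normal_bp"},
]
frequent = frequent_itemsets(records, min_support=0.4)
rules = interesting_rules(frequent, min_interest=1.1)
```

    A lift above 1 means the two sides co-occur more often than independence would predict, which is why it serves as a simple stand-in for an interest constraint in this sketch.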
     
  • In silico study discerns PIH1D1 and p53 to be promising prognostic markers for children's brain cancer
    by Dhiraj Kumar Singh, Prashant Ranjan, Sahar Qazi, Bimal Prasad Jit, Amit Kumar Verma, Riyaz Ahmad Mir 
    Abstract: Genetic alterations in normal brain cells lead to the development of brain tumours (BT). The incidence of newly diagnosed cases is rising over time. Understanding the molecular biology of paediatric brain tumours is crucial for advancing novel therapeutic approaches to prevent or effectively manage this disease. The R2TP complex, a co-chaperone conserved from yeast to mammals that comprises RUVBL1, RUVBL2, PIH1D1, and RPAP3 in humans, plays a crucial role in the assembly and maturation of various multi-subunit complexes. This study evaluates the expression of PIH1D1 and p53 in paediatric brain cancers using The Cancer Genome Atlas (TCGA) data through the UALCAN portal. Our analysis revealed elevated expression levels of PIH1D1 in paediatric brain tumours across all age groups compared to normal tissues, suggesting its potential as an early detection marker and a prognostic indicator. Additionally, p53 emerged as a promising target for brain tumour treatment, warranting exploration for age-specific applications.
    Keywords: R2TP; PIH1D1; paediatric brain tumour; TCGA; UALCAN; CBTTC.
    DOI: 10.1504/IJDMB.2025.10067136
     
  • ICEP and ILEP: two new approaches to identify community of complex biological network
    by Mamata Das, K. Selvakumar, P.J.A. Alphonse
    Abstract: Understanding the internal modular organization of protein-protein interactions is crucial for deciphering molecular-level biological processes. Recognition of network communities enhances our comprehension of the biological origins of disease pathogenesis. This research introduces two innovative community detection algorithms, Iterative Credit-Edge Pruning (ICEP) and Iterative Load-Based Edge Pruning (ILEP), designed to identify communities within complex biological networks. Our algorithms are evaluated using real-world data from the Omicron dataset, and their performance is compared with four established algorithms: Girvan-Newman, Louvain, Leiden, and Label Propagation. The community structures are validated through modularity. Among the techniques compared, our proposed method, ICEP, stands out with the highest modularity score of 0.885, outperforming all other approaches. The alternative method, ILEP, also achieves a notable modularity score of 0.698, surpassing the Girvan-Newman method. By implementing ICEP and ILEP, we gain profound insights into the structural organization and interconnections within the Omicron virus.
    Keywords: protein interaction network; omicron; community detection; modularity; graphlet; centrality.
    DOI: 10.1504/IJDMB.2025.10067341
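    Illustrative sketch (not from the article): the abstract validates communities by modularity. The Python below computes Newman's modularity for an undirected, unweighted graph as Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes. The toy network and node names are hypothetical. This evaluates a given partition only and does not implement ICEP or ILEP.

```python
def modularity(edges, communities):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    communities: list of sets that together cover every node exactly once.
    """
    m = len(edges)
    label = {node: idx for idx, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)  # L_c: edges with both ends in community c
    degree = [0] * len(communities)    # d_c: summed degree of nodes in community c
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2
               for c in range(len(communities)))


# Hypothetical interaction network: two triangles joined by a single edge
edges = [
    ("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
    ("p4", "p5"), ("p5", "p6"), ("p4", "p6"),
    ("p3", "p4"),
]
two_groups = [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}]
one_group = [{"p1", "p2", "p3", "p4", "p5", "p6"}]

q_split = modularity(edges, two_groups)
q_whole = modularity(edges, one_group)
```

    Splitting the toy network at the bridge edge scores higher than treating it as one community, which is the behaviour a modularity-based comparison of detection algorithms relies on.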
     
  • BMSD-CDE: a robust community detection ensemble method for biomarker identification
    by Bikash Baruah, Manash P. Dutta, Subhasish Banerjee, Dhruba K. Bhattacharyya 
    Abstract: Community detection algorithms (CDAs) are crucial for identifying cohesive groups within complex networks. However, individual CDAs often fall short of accurately uncovering all hidden communities due to their inherent biases and limitations. These algorithms are typically designed with specific objectives, which may inadvertently lead to the oversight of certain community types, resulting in partial or imprecise outcomes. To address these limitations, we propose BMSD-community detection ensemble (CDE), a novel ensemble method that integrates six prominent CDAs: FastGreedy, Infomap, LabelProp, LeadingEigen, Louvain, and Walktrap. By strategically combining the outputs of these diverse algorithms using p-value references and elite genes, BMSD-CDE enhances the accuracy and robustness of community detection. This ensemble approach provides a more reliable foundation for downstream analyses, particularly in identifying potential biomarkers. Applied to esophageal squamous cell carcinoma (ESCC), BMSD-CDE reveals a set of genes (F2RL3, ATP6V1C2, CGN, CAD, ANGPT2, ALDH2, CLDN7, and DTX2) as potential biomarkers. These findings are supported by extensive topological and biological analyses across normal and disease conditions using four distinct datasets.
    Keywords: potential biomarker; community detection algorithm; CDA; ensemble algorithm; topological experiment; ESCC; biological validation; community detection ensemble; CDE.
    DOI: 10.1504/IJDMB.2025.10067623
     
  • Multi-epitopes prediction for designing a candidate vaccine against Ebola virus: a reverse vaccinology and immunoinformatics approach
    by Swati Mohanty, Himanshu Singh 
    Abstract: Over a span of four decades, the Ebola virus disease (EVD) outbreak has wreaked havoc, starting from Central African countries and reaching different parts of the world, including Asian countries. Guinea was the first to witness the catastrophe, followed by many African and Asian countries including Liberia and Sierra Leone. In this study, an immunoinformatics approach covering both B cell and T cell epitopes has been used for candidate vaccine development against EVD. B cell and T cell epitopes were predicted by targeting the glycoprotein (GP) and VP40 proteins of Ebolavirus, and an antigenic multi-epitope vaccine construct was designed. The vaccine construct was then docked with human immunogenic Toll-like Receptor 4 (TLR 4), with a binding energy of 13,883.1, and in silico immune simulation was done to predict the immunogenic potential of the vaccine construct. With a CAI of 0.94 and a GC content of 54.35, the construct showed efficient expression in the Escherichia coli (E. coli) K12 strain, which produced the vaccine on a wide scale. The Ebola virus vaccine construct designed through the immunoinformatics approach in this study could be useful in combatting EVD.
    Keywords: Ebola virus; epitope-based vaccine; molecular docking; immunoinformatics; reverse vaccinology.
    DOI: 10.1504/IJDMB.2025.10068508