Forthcoming and Online First Articles

International Journal of Data Mining and Bioinformatics (IJDMB)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

Register for our alerting service, which notifies you by email when new issues are published online.

International Journal of Data Mining and Bioinformatics (30 papers in press)

Regular Issues

  • An adaptive multimodal biometric feature recognition method based on visual perception
    by Hua Deng, Jifu Zhang, Yun He, Jing Zhang, Dan Han 
    Abstract: To overcome the low recognition accuracy and low image information entropy of traditional biometric recognition methods, this paper proposes an adaptive multimodal biometric recognition method based on visual perception. Images are segmented using Weber scores, and the just noticeable difference (JND) is calculated to obtain human visual perceptual features. The images are then processed with an induction bilateral filtering method, and adaptive convolutional neural networks fuse the multimodal biometric features. Using the support vector machine and Dempster-Shafer (SVM-DS) feature recognition function, the reliability allocation for the recognition of different bodies of evidence is obtained, and the decision module outputs the target type to achieve biometric recognition. In tests, the method yields an image information entropy greater than 0.9 with optimised image quality, and it matches features accurately during recognition.
    Keywords: visual perception; adaptive; multimodal; biological characteristics; identification and classification; bilateral filter.
    DOI: 10.1504/IJDMB.2025.10066713
     
  • Evaluation of neutrophil gelatinase-associated lipocalin precision in periprosthetic joint infection diagnosis
    by Ting Fu, Huhu Wang, Qiaolong Hu, Shuai Ding, Jiaming He 
    Abstract: This study assesses neutrophil gelatinase-associated lipocalin (NGAL) as a diagnostic biomarker for periprosthetic joint infection (PJI). A comprehensive search of multiple databases (Cochrane Library, Scopus, OVID, PubMed, Web of Science, and Embase) identified relevant studies on NGAL for PJI diagnosis published up to September 2023. Pooled sensitivity and specificity for NGAL, alongside other biomarkers (CRP, ESR, SF-WBC, D-dimer, and PCT), were analysed. Across nine studies, NGAL showed high diagnostic accuracy, with a pooled sensitivity of 0.93 and specificity of 0.90. The other markers showed lower sensitivity (ESR: 0.81, CRP: 0.75, SF-WBC: 0.91, D-dimer: 0.43, PCT: 0.65), with specificities of 0.90 (ESR), 0.93 (CRP), 0.86 (SF-WBC), 0.93 (D-dimer) and 0.93 (PCT). The diagnostic odds ratio (DOR) for NGAL (132.89) surpassed those of the other markers, supporting NGAL as a superior diagnostic tool for PJI.
    Keywords: periprosthetic joint infection; PJI; neutrophil gelatinase-associated lipocalin; NGAL; diagnosis.
    DOI: 10.1504/IJDMB.2025.10067532
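As a rough illustration of how the metrics in the NGAL abstract relate to one another, the diagnostic odds ratio for a single sensitivity/specificity pair follows the textbook formula DOR = [sens / (1 - sens)] / [(1 - spec) / spec]. The sketch below applies that formula to the pooled values quoted above; the function name is illustrative and not taken from the article. The result is not expected to equal the reported pooled DOR of 132.89, because a meta-analysis typically pools DORs across studies rather than deriving them from pooled sensitivity and specificity.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Odds of a positive test in diseased subjects divided by the odds in non-diseased subjects."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    odds_in_diseased = sensitivity / (1.0 - sensitivity)
    odds_in_healthy = (1.0 - specificity) / specificity
    return odds_in_diseased / odds_in_healthy


# Pooled sensitivity/specificity values quoted in the abstract.
ngal = diagnostic_odds_ratio(0.93, 0.90)
crp = diagnostic_odds_ratio(0.75, 0.93)
print(f"NGAL: {ngal:.1f}, CRP: {crp:.1f}")
```

Under this single-pair calculation NGAL still ranks well above CRP, which is consistent with the ordering the abstract reports.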
     
  • CNN-enabled transfer learning and cosine rat swarm optimisation for classification of heart disease
    by K. Saravanan, B. Sasikumar
    Abstract: This research introduces a Cosine Rat Swarm Optimisation (CRSO)-based Transfer Learning (TL) model for classifying heart disease from medical data. Initially, the input medical data are collected and then pre-processed using min-max normalisation. Feature fusion is then performed using Matusita similarity measures with a Deep Maxout Network (DMN). Thereafter, the Borderline Synthetic Minority Over-sampling Technique (Borderline-SMOTE) is used to augment the data. Heart disease is then classified using a Convolutional Neural Network (CNN) with transfer learning, in which the CNN uses hyperparameters from trained models such as Deep Batch-normalised eLU AlexNet (DbneAlexnet). Here, DbneAlexnet is trained using the CRSO algorithm. The proposed method achieved an accuracy of 91.7%, with a True Negative Rate (TNR) of 91% and a True Positive Rate (TPR) of 91.8%.
    Keywords: hyperparameter tuning; transfer learning; min-max normalisation; deep maxout network; DMN; Matusita similarity.
    DOI: 10.1504/IJDMB.2025.10068039
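The min-max normalisation step named in the abstract above rescales each feature linearly to a fixed range, x' = (x - min) / (max - min). A minimal sketch follows, assuming a single numeric feature held in a list; the function name and the sample values are hypothetical and not taken from the article.

```python
def min_max_normalise(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map every entry to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Hypothetical resting blood pressure readings, used only for illustration.
resting_bp = [94, 120, 130, 145, 200]
scaled = min_max_normalise(resting_bp)
print([round(s, 3) for s in scaled])
```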
     
  • Cross-modal imputation and gated GCN for predicting miRNA-disease association (CIGGNET)
    by Yan Chen, Zhenjie Hou, Wenguang Zhang, Han Li, Haibin Yao 
    Abstract: MicroRNA (miRNA) is a short-chain non-coding RNA molecule encoded by endogenous genes. Many miRNAs related to complex diseases have now been found, which helps in further exploring the molecular mechanisms of disease pathogenesis. We propose an algorithm named CIGGNET for predicting miRNA-disease associations based on cross-modal data imputation and a gated graph convolution network. First, CIGGNET applies a cross-modal data imputation operation to the miRNA-disease association matrix to obtain the filled association matrix. Second, CIGGNET integrates miRNA-disease heterogeneous networks, extracts features of miRNAs and diseases using a random walk algorithm, and learns miRNA and disease embeddings using a graph convolutional network. Third, CIGGNET uses a gating operation to select the appropriate convolution layer. The control gate adaptively outputs suitable convolution layers based on the similarity of different convolution layers and scores unobserved associations. The mean AUC of CIGGNET is 0.9423 over 100 five-fold cross-validations.
    Keywords: miRNA; disease; MiRNA-disease association prediction; cross-modal data imputation; gated graph convolution network.
    DOI: 10.1504/IJDMB.2025.10064546
     
  • Intelligent decision-making framework for big data using enhanced honey badger-based adaptive hybrid deep learning network
    by D. Kavitha, A. Chinnasamy, P. Selvakumari 
    Abstract: Conventional models consume considerable processing time on big data. Hence, real-world big data applications crucially require a scalable and effective solution. For the experimentation, input data is gathered from different application-oriented datasets. Initially, the input data is collected and cleaned, and the cleaned data is then passed to optimal feature extraction, in which an enhanced map-reduce model extracts the optimal features. These optimal features are fed into an adaptive cascaded long short-term memory and auto-encoder-based long short-term memory (ACLALSTM) network, whose parameters are optimised using the enhanced honey badger algorithm (HBA) for effective decision-making in the proposed big data analysis. The experimental analysis shows that the proposed big data-based decision-making model tends to provide rapid decisions that help to analyse big data effectively.
    Keywords: decision making; big data; enhanced honey badger algorithm; adaptive cascaded long short-term memory; auto-encoder; MapReduce framework.
    DOI: 10.1504/IJDMB.2025.10066770
     
  • Integrating pathological images and genomics data to identify prognostic features related to recurrence of sarcoma
    by Zengxin Li, Shiling Song, Jin Deng 
    Abstract: This study investigates the prognostic prediction of sarcoma recurrence by combining pathological images and genomic data, and explores potential markers of sarcoma recurrence. Pathological images and genomic data were used for recurrence feature extraction, followed by screening for survival-related features. Finally, pathological images and gene expression data were integrated to analyse prognostic prediction and to identify factors affecting patients' survival using Kaplan-Meier survival curves and Lasso-Cox regression models. Combining pathological images and genomic data provided better prognostic prediction for patients, and six features were highly associated with sarcoma recurrence. These features have the potential to be key targets for studying sarcoma recurrence and provide valuable insights into personalised treatment of sarcoma recurrence.
    Keywords: sarcoma; recurrence; pathological images; bioinformatics; gene expression analysis.
    DOI: 10.1504/IJDMB.2025.10067137
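The Kaplan-Meier curves mentioned in the sarcoma abstract above are built with the standard product-limit estimator: at each observed event time t, the running survival probability is multiplied by (1 - d/n), where d is the number of events at t and n is the number of subjects still at risk. The sketch below is a generic, minimal version of that estimator; the function name and the follow-up data are hypothetical and are not drawn from the article.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event (e.g. recurrence) was observed, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)  # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Subjects with an event or censoring at t are removed after t is processed.
        at_risk -= exit_counts[t]
    return curve


# Hypothetical follow-up times (months) and event indicators, for illustration only.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Subjects censored at the same time as an event are counted as at risk for that event, which is the usual convention.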
     

Special Issue on: Empowering the Future Generation of Data Mining and Knowledge Discovery in Bioinformatics

  • A novel intelligent-based intrusion detection and prevention system in the cloud using deep learning with meta-heuristic strategy
    by Srilatha Doddi, Thillaiarasu N 
    Abstract: Cloud computing offers end-users diverse options for minimising costs, and services are easily accessible through online platforms. While users access the services remotely, attackers launch cyber-attacks to disrupt them. Cloud security analysts treat cloud security as a potential area of research to minimise the impact of abnormal behaviour. One potential solution for detecting attacks is the development of a next-generation intrusion detection and prevention system (IDPS). Hence, this paper proposes an efficient IDPS using a hybridised model known as hybrid firebug-squirrel swarm algorithm-based ensemble classifiers (HF-SSA-EC). Initially, the NSL-KDD cup 1999 dataset is considered for experimental analysis. The efficient features are extracted via the restricted Boltzmann machine (RBM) layers of the deep belief network (DBN) model. The extracted features are submitted to the ensemble classifiers (ECs), which use naive Bayes (NB), support vector machines (SVM), deep neural networks (DNN), and recurrent neural networks (RNN) to identify intrusions. EC parameter optimisation using the hybridised HF-SSA meta-heuristic improves performance. Finally, the prevention model, which uses meta-heuristic clustering, eliminates the malicious nodes associated with detected intrusions. The experimental results reveal that the recommended IDPS outperforms existing models.
    Keywords: intrusion detection and prevention system; IDPS; cloud computing; restricted Boltzmann machines; RBM; deep feature extraction; firebug swarm optimisation; FSO; squirrel search algorithm.
    DOI: 10.1504/IJDMB.2025.10062482
     
  • Metaheuristic gene regulatory networks inference using discrete crow search algorithm and quantitative association rules
    by Makhlouf Ledmi, Mohammed El Habib Souidi, Aboubekeur Hamdi-Cherif, Abdeldjalil Ledmi, Hichem Haouassi, Chafia Kara-Mohamed 
    Abstract: Gene regulatory network (GRN) inference has emerged as a valuable tool for detecting irregularities in cell regulation. Association rule mining (ARM) encompasses specific data mining methods capable of inferring unknown associations between genes. In response to the scarcity of ARM-based GRN inference, a novel metaheuristic algorithm, DCSA-QAR, is presented. This algorithm infers quantitative association rules by discretising the crow search algorithm. A first series of experiments compared it with five metaheuristic algorithms on six datasets. The results showed that, for the Co-citation and YeastNet datasets, our algorithm ranked first in precision (100%), specificity (100%) and score (3.75). A second series of experiments compared it with nine information-theoretic algorithms on the DREAM3 and SOS networks. The average results on the DREAM3 datasets are offset by the results on the real SOS datasets, where the algorithm was the best in accuracy and true positives. Overall, DCSA-QAR can be considered a good candidate for ARM-based metaheuristic GRN inference.
    Keywords: artificial intelligence; bioinformatics; gene regulatory networks; GRNs; data mining; soft computing; mining association rules.
    DOI: 10.1504/IJDMB.2025.10062651
     
  • Plasma proteins related to the state of depression: a case-control study based on proteomics data of pregnant women
    by Yuhao Feng, Jinman Zhang, Zengyue Zheng, Chenyu Xing, Min Li, Guanghong Yan, Ping Chen, Dingyun You, Ying Wu 
    Abstract: Prenatal and postpartum emotional changes in pregnant women in early pregnancy are of great significance to the physical and mental health of mothers and infants. To identify related factors, we conducted this study to find feature proteins associated with maternal depression. The Boruta algorithm (BA), recursive partition algorithm (RPA), regularised random forest (RRF) algorithm, least absolute shrinkage and selection operator (LASSO) algorithm, and genetic algorithm (GA) were used to select features. Extreme gradient boosting (XGBoost), back propagation neural network (BPNN), support vector machine (SVM), random forest (RF), and logistic regression (LR) were selected to construct the predictive models. All models performed well in prediction, with the mean AUC (the area under the receiver operating characteristic curve) exceeding 80%. The selected features may provide clues for preventing depression in pregnant women and improving the physical and mental health of mothers and babies.
    Keywords: pregnant women; depression; proteomics; biomarkers; feature selection.
    DOI: 10.1504/IJDMB.2025.10064226
     
  • Enhancing drug-drug interaction event prediction from knowledge graphs by multimodal deep neural networks
    by Xiaomin Shen, Jianliang Gao, Tengfei Lyu, Jiamin Chen, Jiarun Zhang, Jing He, Zhao Li, Wei Yu 
    Abstract: This study tackles the challenging issue of predicting drug-drug interactions (DDI) in pharmacology. Despite advancements in deep learning for DDI prediction, many techniques fail to fully leverage multimodal data correlations, limiting accuracy. To address this, we propose the knowledge graphs by multimodal deep neural network (KGMDNN) framework, enhancing DDI prediction by integrating features from drug knowledge graphs (DKG) and heterogeneous features (HF). KGMDNN uses a dual-path structure to obtain multimodal drug representations, effectively capturing drug relationships and connections within DKG to improve prediction accuracy. Our method excels in learning joint representations of structural information and multimodal data, as demonstrated through numerous real-world dataset experiments. Additionally, testing various drug knowledge graphs confirmed the model’s robustness. KGMDNN outperforms both classic and state-of-the-art models in prediction metrics and interpretability.
    Keywords: DDI event prediction; drug-drug interaction; graph neural network; GNN; knowledge graph; heterogeneous information; multi-modal data.
    DOI: 10.1504/IJDMB.2025.10066016
     

Special Issue on: New Applications of Computational Biology and Bioinformatics

  • Identification of potential biomarkers of esophageal squamous cell carcinoma using community detection algorithms
    by Bikash Baruah, Domum Karlo, Manash P. Dutta, Subhasish Banerjee, Dhruba K. Bhattacharyya 
    Abstract: This research uncovers potential biomarker genes by developing a unique methodology that applies six eminent community detection algorithms (CDAs) to four RNAseq esophageal squamous cell carcinoma (ESCC) datasets. The RNAseq datasets are preprocessed using the Galaxy server, followed by the identification of a subset of differentially expressed genes (DEGs). CDAs are applied separately to control and disease samples of the DEGs to extract the hidden communities of the datasets. To identify the significant communities, ESCC elite genes are extracted from GeneCards for subsequent downstream analysis towards the identification of potential biomarkers. Topological analysis is performed to support critical gene identification based on elite genes, followed by a biological investigation consisting of gene enrichment and pathway analysis. Finally, a group of genes (EPHB2, ABLIM3, ACER1, ABCD4, ARF6, ADRA1D, ATP6V1D, CLTB, ATP6V0A4, and AP1M1) is identified as possible ESCC biomarkers that carry both topological and biological significance.
    Keywords: community detection algorithm; CDA; potential biomarker; esophageal squamous cell carcinoma; ESCC; Elite gene; topological analysis; biological significance.
    DOI: 10.1504/IJDMB.2025.10061876
     
  • Research on bioinformatics data classification method based on support vector machine
    by Hui Yan, Yunxin Long, Chao Lv, Ping Yu, Duo Long 
    Abstract: To address the low classification accuracy and long classification time of traditional biological information data classification methods, a biological information data classification method based on support vector machine is proposed. Bio-information data are acquired through gene expression, and their characteristics are analysed. Based on the analysis results, outlier detection and data scaling are carried out on the acquired data. Mutual information is then used to measure correlation and redundancy, the data features are selected through the minimum redundancy and maximum correlation feature selection algorithm, and the selected features are taken as data samples. Using the support vector machine, the classification decision function is established for both linearly separable and non-separable data samples to obtain the classification results. The experimental results show that the proposed method has higher classification accuracy and shorter classification time.
    Keywords: support vector machine; bioinformation; data classification; minimum redundancy and maximum correlation; feature selection.
    DOI: 10.1504/IJDMB.2025.10061944
     
  • Spearman dependence function-based goodness-of-fit test for the gene's relation
    by Selim Orhun Susam, Burcu Hudaverdi 
    Abstract: A gene network represents the relationship between different groups of genes with various functions, aiming to depict how genes collaborate and influence each other's activities within a biological system. This relationship can be effectively explained using copulas. Therefore, it is crucial to determine which copula best fits the gene data and provides the most accurate explanation of the relationships between gene groups. In this study, our objective is to introduce a Spearman dependence function-based goodness-of-fit test using Bernstein polynomial approximation. We apply this test to identify a copula model that can effectively explain the relationships between gene groups. A Monte Carlo simulation study is conducted to assess the performance of the proposed test. Next, we analyse histone gene groups using data from yeast cell regulation, as provided by Eisen et al. (1998). Specifically, we investigate the dependence model structures of gene interactions for eight histone genes.
    Keywords: Spearman dependence; copula goodness-of-fit test; Bernstein copula; histone genes.
    DOI: 10.1504/IJDMB.2025.10061726
     
  • Research on facial dataset cleaning in mixed scenes based on spatiotemporal correlation
    by Siguang Dai 
    Abstract: Researching methods for cleaning mixed scene facial datasets can improve the performance and reliability of mixed scene facial recognition algorithms. Therefore, the paper proposes a facial dataset cleaning method in mixed scenes based on spatiotemporal correlation. The 2DPCA algorithm is used to reduce the dimensionality of the dataset, and composite multi-scale entropy is used to decompose, reconstruct and arrange the image sequence after dimensionality reduction. The autocorrelation coefficient and the number of interrelations between image sequences are determined, and anomaly detection on the dataset is realised by combining them with spatiotemporal correlation. Sparse representation is used to repair the abnormal images, and images with high similarity are deleted to clean the mixed scene facial dataset. The experimental results show that the minimum anomaly rate of the method is 0.5%, the success rate is between 94% and 96%, and the minimum time cost is 0.2 s.
    Keywords: spatiotemporal correlation; mixed scenes; facial dataset; dataset cleaning; 2DPCA algorithm; composite multi-scale entropy; sparse representation.
    DOI: 10.1504/IJDMB.2025.10061768
     
  • Prediction method of commercial customers' mental health based on data mining
    by Yanhua Shen, Bing Gao 
    Abstract: Mental health prediction is crucial for commercial customer management. Therefore, a data mining-based method for predicting the mental health of commercial customers is proposed. Firstly, the K-means algorithm is used to mine and process the psychological health test data of commercial customers. Secondly, a program for evaluating the psychological health of commercial customers is developed, a judgment matrix is constructed, and weight coefficients are calculated to obtain the evaluation results of the customers' psychological health level. Finally, with the evaluation results of the mental health level as input and the predicted mental health as output, a BP neural network is used to build a commercial customer mental health prediction model. The experimental data show that, with the proposed method, the mining results of commercial customers' mental health data are consistent with the actual results, and the minimum error of commercial customers' mental health prediction is 0.4%.
    Keywords: commercial customers; mental health; enterprise development; data mining technology; prediction model construction.
    DOI: 10.1504/IJDMB.2025.10062484
     
  • Longitudinal analysis for predicting amino acid changes in HIV-1 using association rule mining
    by Mounira Lakab, Abdelouahab Moussaoui 
    Abstract: The human immunodeficiency virus (HIV) remains a great challenge for humanity. HIV is characterised by a high mutation rate, resulting in pathogenic variants that promote escape from the immune response. In order to understand the correlations between amino acid mutations of the virus and to quantify evolution in HIV, we present a novel approach based on association rule mining (ARM) from protein sequence data taken at different time points. In this study, a longitudinal association rule mining (LARM) algorithm is proposed. We collected the entire genomes of 100 untreated HIV-1 infected patients over 3-5 years of infection, with 6-10 longitudinal samples per patient, using the Los Alamos intra-patient search interface. Our experiments show the effectiveness of the proposed method in discovering major amino acid changes in comparison with the temporal analysis.
    Keywords: association rule mining; longitudinal data; HIV-1; mutation; amino acid; data mining.
    DOI: 10.1504/IJDMB.2025.10062519
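Association rule mining, on which the HIV-1 abstract above builds, scores a rule A -> B by its support (the fraction of samples containing both A and B) and its confidence (support of A and B together divided by support of A). The sketch below shows only these two generic measures; it is not the LARM algorithm from the article, and the mutation labels m1, m2 and m3 are hypothetical placeholders.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / antecedent_support


# Each sample is the set of (hypothetical) amino acid changes observed in it.
samples = [
    frozenset({"m1", "m2"}),
    frozenset({"m1", "m2", "m3"}),
    frozenset({"m2"}),
    frozenset({"m1"}),
]
print(support(samples, {"m1", "m2"}))
print(round(confidence(samples, {"m1"}, {"m2"}), 4))
```

A longitudinal variant would additionally need to track at which time point each change appears, which this sketch leaves out.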
     
  • Classification and retrieval method of personal health data based on differential privacy
    by Guanpeng Xu, Liang Zhao 
    Abstract: Research on personal health data classification and retrieval methods can improve the accuracy and efficiency of medical decision-making, promoting the development of personalised medicine. To overcome the issues of low accuracy, long retrieval time, and low satisfaction in traditional methods, a classification and retrieval method of personal health data based on differential privacy is proposed. The method involves encrypting personal health data using a linear regression model and differential privacy, and constructing a classification objective function through integrated manifold learning that classifies the encrypted results of personal health data. Binary hash codes are used to retrieve the classification results, and the decrypted retrieval results are provided to users for personal health data classification and retrieval. The experimental results demonstrate that this method achieves a maximum accuracy of 96.8% in personal health data classification and retrieval, with a minimum retrieval time of 20 ms and an average satisfaction of 97.1% for the retrieval results.
    Keywords: differential privacy; personal health data; classification and retrieval; linear regression model; encrypted results; binary hash code.
    DOI: 10.1504/IJDMB.2025.10062018
     
  • Log anomaly detection and diagnosis method based on deep learning
    by Zhiwei Liu, Xiaoyu Li, Dejun Mu 
    Abstract: To improve the accuracy of log anomaly detection and the effectiveness of diagnosis, this paper proposes a deep learning-based log anomaly detection and diagnosis method. Firstly, the log data are analysed to obtain the correspondence between log keys and log parameters. Secondly, deep learning is used to capture association features, and a convolutional neural network bidirectional long short-term memory (CNN-BiLSTM) deep learning model is constructed. Finally, context sequence feature information is learned from both the forward and backward directions through bidirectional input, and log anomaly detection and diagnosis are implemented based on this information. The experimental results show that the method reaches a log anomaly detection accuracy of 98.6%, a detection time of 1.1 s, and a recall rate of 96.8%.
    Keywords: deep learning; one hot encoding; context sequence features; log exception.
    DOI: 10.1504/IJDMB.2025.10062017
     
  • An advanced approach for DNA sequencing and similarities analysis on the basis of groupings of nucleotide bases
    by Kshatrapal Singh, Laxman Singh, Vijay Shukla, Yogesh Kumar Sharma, Arun Kumar Rai 
    Abstract: DNA sequencing is a crucial tool for identifying the links between various DNA sequences on a broad scale, but there is still room for improvement in sequencing quality. The alignment-free technique is a popular method for determining sequence similarities. In this research, the four bases of DNA sequences (A, C, G, and T) are separated into three types of groupings according to their chemical characteristics, so that a primary DNA sequence is transformed into three symbolic sequences. To represent the sequence, the frequencies of group variations in the three symbolic sequences are aggregated into a 12-component vector. The nucleotide sequences of the beta globin gene in a dataset of several species are characterised and compared using the Euclidean distances between the resulting vectors. The evolutionary relationships between the organisms are represented visually using phylogenetic trees, whose branch structure shows how several species or other groups diverged from common ancestors. Our findings are in agreement with recent biological assessments. Additionally, we compared our approach with a few currently used sequence comparison techniques and found it to be more efficient and user-friendly. We also analysed the time and space complexities of the proposed approach.
    Keywords: alignment-free technique; similarity analysis; bases groupings; mutation; phylogenetic tree.
    DOI: 10.1504/IJDMB.2025.10063428
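The alignment-free pipeline outlined in the abstract above can be sketched as follows. The abstract does not name the three groupings or define "group variations", so this sketch assumes the three classical chemical classifications (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding) and counts the four possible adjacent-symbol pairs in each two-letter symbolic sequence, giving 3 x 4 = 12 components. All function and variable names are illustrative, and the authors' actual encoding may differ.

```python
from itertools import product
from math import sqrt

# Assumed groupings; the article's own choice is not stated in the abstract.
GROUPINGS = (
    {"A": "R", "G": "R", "C": "Y", "T": "Y"},  # purine / pyrimidine
    {"A": "M", "C": "M", "G": "K", "T": "K"},  # amino / keto
    {"A": "W", "T": "W", "C": "S", "G": "S"},  # weak / strong hydrogen bonding
)


def feature_vector(sequence):
    """Map a DNA string over A, C, G, T to a 12-component frequency vector."""
    sequence = sequence.upper()
    vector = []
    for mapping in GROUPINGS:
        symbols = [mapping[base] for base in sequence]  # KeyError on other letters
        pairs = list(zip(symbols, symbols[1:]))
        total = len(pairs) or 1  # avoid division by zero for very short input
        alphabet = sorted(set(mapping.values()))
        for first, second in product(alphabet, repeat=2):
            vector.append(pairs.count((first, second)) / total)
    return vector


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


a = feature_vector("ACGTACGTTG")
b = feature_vector("ACGTACGTAG")
print(len(a), round(euclidean(a, b), 4))
```

A pairwise distance matrix built this way could then be passed to any standard tree-building method to obtain a phylogenetic tree.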
     
  • In silico evaluation via the docking of selected antidiabetic phytochemicals on proteins in the insulin signalling pathway: PTP1B, IRS1 and PP2A
    by Hazim Alsharabaty, Niveen Alayasi, Safa Radi Jabarin, Siba Shanak, Hilal Zaid 
    Abstract: Type II diabetes mellitus (T2DM) is a worldwide disease caused by the resistance of tissues to insulin. In this study, eight potential antidiabetic phytochemicals from Gundelia tournefortii and Ocimum basilicum were tested in silico. To this end, we docked the phytochemicals on pivotal proteins in the insulin signalling pathway using the docking protocol of AutoDock. This work aimed at understanding the mechanism of action of these phytochemicals by finding the optimal binding site, calculating the best orientation, and studying the amino acids involved at the interaction interface between the phytochemicals and each protein target. Our results indicated that stigmasterol, beta-amyrin, beta-sitosterol, lupeol-trifluoroacetate and lupeol bind well to PTP1B, IRS1, and PP2A and are candidate drugs for the treatment of T2DM. The results of the study may serve as a focal point for drug discovery that may be further extended in in vitro, in vivo and clinical studies.
    Keywords: diabetes; phytochemicals; in silico; Gundelia tournefortii; Ocimum basilicum; docking; AutoDock.
    DOI: 10.1504/IJDMB.2025.10064690
     

Special Issue on: The Development of Novel Integrative Bioinformatics Based Machine Learning Techniques and Multi Omics Data Integration Part 2

  • Machine learning algorithm for lung cancer classification using ADASYN with standard random forest
    by J. Viji Gripsy, T. Divya  
    Abstract: Lung cancer is a cancer that develops in the lungs. Early identification of lung cancer symptoms may lead to successful treatment. The dataset contains duplicate characteristics as well as an imbalanced class distribution, making lung cancer classification a challenging task. This study presents a novel approach that combines ADASYN with the standard random forest (ASRF) model to improve lung cancer classification on the dataset. The ASRF offers interpretable outcomes by using feature significance, providing insight into the aspects that contribute to lung cancer classification decisions. The classification algorithm is used to ascertain the presence or absence of lung cancer in a given patient. Compared with the existing SVM, MLP, RF and GB methods, the proposed ASRF technique achieved 93.5% precision, 94.7% recall, 94.1% F-measure, and 94% accuracy.
    Keywords: lung cancer; LC; RF; ASRF; MLP; support vector machine; SVM; GB.
    DOI: 10.1504/IJDMB.2025.10065391
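As a quick consistency check on the figures in the ASRF abstract above, the F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R). The minimal sketch below uses the general F-beta form with beta = 1; the function name is illustrative. Applied to the reported 93.5% precision and 94.7% recall, it reproduces the reported 94.1% F-measure.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Precision and recall as reported in the abstract.
f1 = f_measure(0.935, 0.947)
print(f"{f1:.3f}")  # prints 0.941
```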
     
  • Modified VGG-16 model for COVID-19 chest X-ray images: optimal binary severity assessment
    by Manoranjan Dash  
    Abstract: A pandemic caused by a virus known as COVID-19 has swept across the globe. One potential weapon in the fight against COVID-19 could be early detection through the use of chest X-ray images. In this paper, I use a modified VGG-16 deep learning model for binary classification of COVID-19 chest X-ray images. There are 16 weight layers in the standard VGG-16 model. In the suggested modified VGG model, the total number of weight layers has been reduced from 16 to 9 (eight convolutional layers and one fully connected layer). According to the results, the modified VGG-16 model performs better than the other three models (CNN, KNN and VGG-16) in terms of quantitative measures of accuracy, sensitivity and specificity. The dataset used for the proposed work consists of 24,000 chest X-ray images of lungs collected from an online repository, with 12,000 images for each class (healthy and pneumonia).
    Keywords: deep learning; classification; COVID-19; SARS-CoV-2; modified VGG-16.
    DOI: 10.1504/IJDMB.2025.10065665
     
  • Revealing novel biomarkers for oesophageal squamous cell carcinoma through integrated single-cell RNA sequencing analysis
    by Bikash Baruah, Manash P. Dutta, Subhasish Banerjee, Dhruba K. Bhattacharyya 
    Abstract: This study employs single-cell RNA sequencing (scRNA-seq) to analyse oesophageal squamous cell carcinoma (ESCC), identifying 10 potential biomarkers (ALDH2, ANGPT2, APPL1, ARPC2, CAD, CALM1, CLDN7, CLTB, F2RL3, LPAR1) associated with radiation exposure. The methodology involves scRNA-seq for data partitioning, pre-processing, clustering, and differential expression analysis. Dysregulated genes are identified through comprehensive gene ontology (GO) annotations, and ESCC-related pathways are explored via the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Analysis of 38 genes reveals distinct patterns under radiation exposure, enriching understanding of ESCC-related processes, components, and functions. This research provides a holistic view of ESCC’s molecular landscape, emphasising the clinical significance of the identified biomarkers and contributing to the understanding of this complex malignancy.
    Keywords: oesophageal squamous cell carcinoma; ESCC; single-cell RNA sequencing; scRNA-seq; differential expression analysis; gene ontology; GO; pathway analysis; potential biomarker.
    DOI: 10.1504/IJDMB.2025.10065927
     
  • A novel suppressed segmentation framework for hyper spectral image processing in earlier cancer detection
    by Kaushal Kishor, Manoj Singhal, Rajesh Kumar Maurya, Pramod Kumar Sagar, Rupak Sharma, Satya Prakash Yadav 
    Abstract: This paper presents a novel suppressed segmentation framework for hyperspectral image processing in earlier cancer detection. The framework integrates recent advances in deep learning models and image segmentation to support decision-making in early cancer diagnosis. The proposed framework facilitates fast and accurate segmentation of tumour tissues and other aberrations within hyperspectral images. The key modules of the framework are input pre-processing, noisy additive analysis, random area cropping, augmented context representation, hierarchical segmentation, and post-processing. Experiments performed on real-world datasets show that the proposed framework yields segmentation accuracy similar to that of other leading segmentation techniques while offering superior speed and robustness. The proposed model obtained 95.32% accuracy, 92.89% sensitivity, 91.50% specificity, 94.25% precision and 92.51% F1-score. The proposed method offers an optimised workflow for fast and accurate segmentation of tumour tissues in early cancer diagnosis.
    Keywords: image processing; deep learning; suppressed segmentation framework; SSF; hyperspectral image processing; HSIP; earlier cancer detection; traditional imaging techniques; hyperspectral imaging; hierarchical segmentation.
    DOI: 10.1504/IJDMB.2025.10066121
     
  • Age invariant face recognition method based on enhanced convolutional neural network
    by Bin Fang 
    Abstract: Research on age invariant face recognition can not only improve the robustness of facial recognition systems, but also provide guidance for the development and application of facial recognition technology. Aiming at the problems of low peak signal-to-noise ratio, low recognition accuracy and long recognition time in traditional age invariant face recognition methods, an age invariant face recognition method based on an enhanced convolutional neural network is proposed. The captured images are enhanced using a bilateral filtering algorithm. The SURF algorithm is employed to extract facial features and remove age-related interference features, completing the selection of facial image features. These selected features are then input into the enhanced CNN to obtain the age invariant face recognition results. The experimental results demonstrate that the proposed method achieves a maximum image peak signal-to-noise ratio of 56.85 dB, a recognition accuracy ranging from 96.1% to 97.6%, and a maximum recognition time of 78.96 ms.
    Keywords: enhanced convolutional neural network; age invariant; face recognition; bilateral filtering algorithm; SURF algorithm.
    DOI: 10.1504/IJDMB.2025.10066150
     
  • Dual pipeline technique for detecting sepsis from photoplethysmography
    by Shadi Abudalfa, Sara Lombardi, Eleonora Barcali, Leonardo Bocchi 
    Abstract: The goal of this work is to improve the performance of sepsis detection in photoplethysmography (PPG) data. To achieve this goal, we present a hybrid technique for classifying sepsis in PPG data based on confident learning (CL) with noisy data. The technique presented in this study employs CL to improve the accuracy and reliability of the machine learning models, as it takes into account the uncertainty associated with each prediction. Numerous experiments were carried out to assess the performance of the presented technique in detecting sepsis using PPG data. The results obtained, using the best-performing XGBoost model, were compared with those of a previous study in which a deep learning-based model was applied to the same sample of data. The presented technique demonstrated its effectiveness by achieving an F1 score of 80.62% on the test set, a 7% improvement over the performance of the previous study.
    Keywords: confident learning; rich features; noisy data; photoplethysmography; sepsis; synthetic data generation.
    DOI: 10.1504/IJDMB.2025.10066271
     
  • Deep mining of elderly health data based on improved association clustering
    by Bo Yang 
    Abstract: To deeply process the health data of the elderly, this paper designs a deep mining method for elderly health data based on an improved association clustering approach. Initially, health data samples from the elderly are collected. The Apriori algorithm is enhanced with interest constraints, connectivity operations are employed to generate candidate itemsets, and those that do not meet the requirements are eliminated. Associated feature quantities are then extracted from the health data. Subsequently, a fuzzy K-means algorithm with weight attributes is incorporated as the core method, and a balance coefficient is calculated using the principle of balanced contribution. Finally, the improved fuzzy K-means algorithm is utilised to complete data classification, detect abnormal data points, and achieve deep mining of the health data. The results indicate that the proposed method has a false alarm rate of less than 3.21% and a false negative rate of less than 1.81%, demonstrating a better mining effect than the comparison method.
    Keywords: association rules; clustering algorithm; the elderly; health data; deep mining.
    DOI: 10.1504/IJDMB.2025.10066985
     
  • In silico study discerns PIH1D1 and p53 to be promising prognostic markers for children's brain cancer
    by Dhiraj Kumar Singh, Prashant Ranjan, Sahar Qazi, Bimal Prasad Jit, Amit Kumar Verma, Riyaz Ahmad Mir 
    Abstract: Genetic alterations in normal brain cells lead to the development of brain tumours (BT). The incidence of newly diagnosed cases is rising over time. Understanding the molecular biology of paediatric brain tumours is crucial for advancing novel therapeutic approaches to prevent or effectively manage this disease. The R2TP complex, a co-chaperone conserved from yeast to mammals and comprising RUVBL1, RUVBL2, PIH1D1, and RPAP3 in humans, plays a crucial role in the assembly and maturation of various multi-subunit complexes. This study evaluates the expression of PIH1D1 and p53 in paediatric brain cancers using The Cancer Genome Atlas (TCGA) data through the UALCAN portal. Our analysis revealed elevated expression levels of PIH1D1 in paediatric brain tumours across all age groups compared to normal tissues, suggesting its potential as an early detection marker and a prognostic indicator. Additionally, p53 emerged as a promising target for brain tumour treatment, warranting exploration for age-specific applications.
    Keywords: R2TP; PIH1D1; paediatric brain tumour; TCGA; UALCAN; CBTTC.
    DOI: 10.1504/IJDMB.2025.10067136
     
  • ICEP and ILEP: two new approaches to identify community of complex biological network
    by Mamata Das, K. Selvakumar, P.J.A. Alphonse 
    Abstract: Understanding the internal modular organization of protein-protein interactions is crucial for deciphering molecular-level biological processes. Recognition of network communities enhances our comprehension of the biological origins of disease pathogenesis. This research introduces two innovative community detection algorithms, Iterative Credit-Edge Pruning (ICEP) and Iterative Load-Based Edge Pruning (ILEP), designed to identify communities within complex biological networks. Our algorithms are evaluated using real-world data from the Omicron dataset, and their performance is compared with four established algorithms: Girvan-Newman, Louvain, Leiden, and Label Propagation. The community structures are validated using modularity. Among the techniques compared, our proposed method, ICEP, stands out with the highest modularity score of 0.885, outperforming all other approaches. The alternative method, ILEP, also achieves a notable modularity score of 0.698, surpassing the Girvan-Newman method. By implementing ICEP and ILEP, we gain insights into the structural organization and interconnections within the Omicron virus.
    Keywords: protein interaction network; omicron; community detection; modularity; graphlet; centrality.
    DOI: 10.1504/IJDMB.2025.10067341
     
  • BMSD-CDE: a robust community detection ensemble method for biomarker identification
    by Bikash Baruah, Manash P. Dutta, Subhasish Banerjee, Dhruba K. Bhattacharyya 
    Abstract: Community detection algorithms (CDAs) are crucial for identifying cohesive groups within complex networks. However, individual CDAs often fall short of accurately uncovering all hidden communities due to their inherent biases and limitations. These algorithms are typically designed with specific objectives, which may inadvertently lead to the oversight of certain community types, resulting in partial or imprecise outcomes. To address these limitations, we propose BMSD-community detection ensemble (CDE), a novel ensemble method that integrates six prominent CDAs: FastGreedy, Infomap, LabelProp, LeadingEigen, Louvain, and Walktrap. By strategically combining the outputs of these diverse algorithms using p-value references and elite genes, BMSD-CDE enhances the accuracy and robustness of community detection. This ensemble approach provides a more reliable foundation for downstream analyses, particularly in identifying potential biomarkers. Applied to esophageal squamous cell carcinoma (ESCC), BMSD-CDE reveals a set of genes (F2RL3, ATP6V1C2, CGN, CAD, ANGPT2, ALDH2, CLDN7, and DTX2) as potential biomarkers. These findings are supported by extensive topological and biological analyses across normal and disease conditions using four distinct datasets.
    Keywords: potential biomarker; community detection algorithm; CDA; ensemble algorithm; topological experiment; ESCC; biological validation; community detection ensemble; CDE.
    DOI: 10.1504/IJDMB.2025.10067623