International Journal of Data Science (IJDS) Inderscience Publishers - linking academia, business and industry through research

Forthcoming Articles

International Journal of Data Science

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Science (5 papers in press)

Regular Issues

Survey Logistic Regression Analysis of HIV/AIDS Knowledge, Attitudes, and Testing Among Sudanese Women
by Mohammed Omar Musa Mohammed, Ahmed Saied Rahama Abdallah
Abstract: This study explores HIV/AIDS-related knowledge, attitudes, and testing behaviours among Sudanese women of reproductive age, based on data from the 2014 Sudan Multiple Indicator Cluster Survey (MICS), which involved 13,017 women nationwide. Survey logistic regression analysis was used to examine how demographic factors relate to HIV/AIDS outcomes. Higher education (OR = 2.27, 95% CI: 1.82-2.83) and older age (45-49 years: OR = 1.37, 95% CI: 1.12-1.68) were significantly linked to better HIV/AIDS knowledge. Only 44% of women showed adequate knowledge, while negative attitudes were common (78%). Women with higher education were more likely to have positive attitudes (OR = 0.33, 95% CI: 0.26-0.43 for higher education vs. none). HIV testing rates were very low (5.5%). Interestingly, rural residence (OR = 1.59, 95% CI: 1.21-2.10) and lower wealth were associated with increased odds of HIV testing. Disparities at the state level were noted across all outcomes. The results emphasise the urgent need for targeted, inclusive HIV education, stigma-reduction initiatives, and expanded testing services, aligned with Sudanese national strategies and international guidelines. Programs should focus on rural, less educated, and economically vulnerable populations to improve HIV-related outcomes among Sudanese women.
Keywords: survey logistic; multiple indicator cluster survey; MICS; knowledge; attitude; HIV/AIDS.
DOI: 10.1504/IJDS.2025.10074904
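
As a reading aid for the odds ratios quoted in the abstract above, the sketch below shows the usual Wald-type conversion from a logistic regression coefficient and its standard error to an odds ratio with a 95% confidence interval. The coefficient and standard error are hypothetical values chosen for illustration, not figures from the paper, and the function name `odds_ratio_ci` is an assumption. In a survey logistic regression the standard error would come from a design-based variance estimator, which is not reproduced here.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a logit coefficient.

    beta : coefficient on the log-odds scale
    se   : its standard error (design-based in a survey setting)
    z    : normal quantile; 1.96 gives an approximate 95% interval
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical inputs, for illustration only (not taken from the paper).
or_, lower, upper = odds_ratio_ci(beta=0.82, se=0.11)
print(f"OR = {or_:.2f}, 95% CI: {lower:.2f}-{upper:.2f}")
```

The interval is symmetric on the log-odds scale, so the odds ratio is the geometric mean of the two bounds rather than their midpoint.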

Dynamic Decision-Making and Optimisation Based on Artificial Intelligence and Network Big Data Integration
by Xinjie Qian, Guixiang Hu
Abstract: With the growth of network big data, AI-driven decision analysis is crucial for optimising operations. Existing methods face limitations, such as poor handling of sparse user-item interactions, inadequate modelling of temporal patterns, and insufficient adaptability in dynamic decision-making. This study proposes an integrated model combining neural collaborative filtering (NCF), long short-term memory (LSTM), and reinforcement learning (Q-learning). The model leverages NCF to capture user-item interactions, LSTM to model temporal dependencies, and Q-learning to optimise strategies. Experimental results on benchmark datasets show the model outperforms baselines in root mean square error (RMSE) and mean absolute error (MAE). These results validate the model's accuracy in sparse data environments. This research offers a framework integrating predictive modelling with dynamic optimisation, demonstrating the potential of network big data to enhance decision-making.
Keywords: AI-driven decision analysis; network big data; e-commerce; neural collaborative filtering; reinforcement learning; customer satisfaction.
DOI: 10.1504/IJDS.2025.10074905
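
For readers unfamiliar with the Q-learning component named in the abstract above, the following is a minimal sketch of the textbook tabular update rule, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max Q(s', a') - Q(s, a)). The abstract does not describe the paper's states, actions, rewards, or hyperparameters, so the state labels, action names, reward, `alpha`, and `gamma` below are all assumptions made for illustration, and the paper's integration with NCF and LSTM is not reproduced.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical states and actions, for illustration only.
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["recommend_a", "recommend_b"]
new_value = q_update(q, "s0", "recommend_a", reward=1.0,
                     next_state="s1", actions=actions)
print(new_value)
```

Because every entry starts at zero, the first update moves Q("s0", "recommend_a") a fraction `alpha` of the way towards the immediate reward.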

High-Precision Anomaly Detection Method Based on Variational Autoencoder and Multi-Source Data Fusion
by Guo Li
Abstract: Anomaly detection in multi-source environments (e.g., network security, industrial monitoring) faces challenges in handling heterogeneous data and complex patterns. To address this, this paper proposes a high-precision method combining variational autoencoder (VAE) with multi-source data fusion. First, a weighted average strategy integrates diverse sources into a unified feature representation. The VAE learns a latent distribution via its encoder-decoder structure, and anomalies are identified from reconstruction errors. The model employs dynamic thresholding, setting thresholds based on a percentile (e.g., 95%) of normal-sample error distribution to enhance adaptability. KL divergence regularisation stabilises latent space learning. Evaluated on KDD Cup 1999 and SMAP & MSL datasets, the method achieves precision of 96.75%, recall of 96.45%, and AUC of 94.51%, outperforming traditional techniques and demonstrating strong robustness and generalisation for complex, multi-source scenarios.
Keywords: Multi-source Data Fusion; Variational Autoencoder; Anomaly Detection; Dynamic Threshold Setting; KL Divergence Regularization.
DOI: 10.1504/IJDS.2025.10075037
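
The dynamic thresholding step described in the abstract above (a threshold set at a percentile of the reconstruction errors of normal samples) can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the error values are made up, the helper names `percentile`, `fit_threshold`, and `flag_anomalies` are hypothetical, and the VAE that would produce the reconstruction errors is omitted.

```python
import math


def percentile(values, q):
    """Percentile (q in [0, 100]) of a non-empty sequence, linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def fit_threshold(normal_errors, q=95.0):
    """Set the threshold at the q-th percentile of normal-sample errors."""
    return percentile(normal_errors, q)


def flag_anomalies(errors, threshold):
    """Flag a sample as anomalous when its error exceeds the threshold."""
    return [e > threshold for e in errors]


# Made-up reconstruction errors of normal samples, for illustration only.
normal_errors = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10, 0.08, 0.14, 0.12, 0.11]
threshold = fit_threshold(normal_errors, q=95.0)
flags = flag_anomalies([0.10, 0.50, 0.13], threshold)
print(threshold, flags)
```

Refitting the threshold on recent normal-sample errors is one way such a rule could adapt over time. The abstract does not say how often, or on which samples, the paper recomputes it.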

Measuring Consumer Sentiment using Self-Evolving Data Collection through Analytics and Business Intelligence
by Timothy J. Haase, Adam Moyer, William A. Young II
Abstract: This study implements the Self-Evolving Data Collection Engine through Analytics and Business Intelligence (SEDCABI) to measure consumer sentiment. The SEDCABI engine leverages diverse data sources to collect unstructured, unsolicited input beyond traditional surveys. We apply the engine to collect and analyze lexicon-specific social media data. Our analysis demonstrates the engine's ability to predict macroeconomic trends, specifically real personal consumption expenditures. Traditionally, consumer sentiment is measured via surveys conducted through the University of Michigan. Its ability to predict future macroeconomic behavior has weakened over time. The director emeritus of the University of Michigan Surveys of Consumers has noted that the existing index may not be suitable for all predictive purposes. Our application of the SEDCABI engine allows us to construct simple measures of positive and negative sentiment. Increasing volume in negative activity precedes significant declines in durable consumption spending by one month.
Keywords: Sentiment; Consumer Spending; Social Media; Data Collection.
DOI: 10.1504/IJDS.2025.10075049

Research on Carbon Emission Prediction Method of Cement Industry based on Electricity Data
by Xuejun Li, Yi Zhang, Xingwei Liao, Jinlin Xie, Shu Zhang, Yuanlin Cheng, Hu Yu
Abstract: In response to China's "double carbon" goal, the country's national emissions trading system has proposed to include the cement industry. This paper proposes using machine learning techniques to predict cement energy consumption carbon emissions and production process carbon emissions respectively. The paper shows that these emissions can be predicted accurately based on the electricity purchase volume of the cement industry and the waste heat power generated. The electricity-carbon emission prediction model for the cement industry is established based on the least squares optimisation support vector machine (SVM), and Bayes linear regression, K-nearest neighbour (KNN), SVM, multiple linear regression and BP neural network are used for comparison. Through example simulation, the electrical input variables are reasonably selected, and the advantages of using machine learning to predict the carbon emissions of the cement industry through electricity data are analysed. The feasibility and reliability of the proposed algorithm are verified by taking the electricity data and carbon emission data of a cement factory in Hunan province as an example.
Keywords: Cement Carbon Emission; Machine Learning Algorithms; Electricity Data.
DOI: 10.1504/IJDS.2025.10075580