Forthcoming and Online First Articles

International Journal of Data Science (IJDS)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Science (15 papers in press)

Regular Issues

  • Prediction of Customer Churn Risk with Advanced Machine Learning Methods
    by Oguzhan Akan, Abhishek Verma, Sonika Sharma 
    Abstract: Customer churn risk prediction is an important area of research as it directly impacts the revenue stream of businesses. The ability to predict customer churn allows businesses to devise better strategies to retain existing customers. In this research we perform a comprehensive comparison of feature selection methods, upsampling methods, and machine learning methods on the customer churn risk dataset. (i) Our research compares likelihood-based, tree-based, and layer-based machine learning methods on the churn dataset. (ii) Models built on the churn dataset without upsampling performed better than those built with oversampling. However, SMOTE and ADASYN helped stabilize model performance. (iii) The models built on the ADASYN dataset were slightly better than their SMOTE counterparts. (iv) XGBoost and Deep Cascading Forest combined with XGBoost were consistently better across all metrics than the other methods. (v) Information Value analysis performed better than PCA. In particular, the IVR DCFX model has the best AUROC score, at 89.1%.
    Keywords: Customer Churn; Deep Neural Networks; Deep Cascading Forest; SMOTE; ADASYN.
    DOI: 10.1504/IJDS.2024.10064744
     
  • Self-Evolving Data Collection Through Analytics and Business Intelligence to Predict the Price of Cryptocurrency
    by Adam Moyer, William A. Young II, Timothy J. Haase 
    Abstract: This article presents the Self-Evolving Data Collection Engine through Analytics and Business Intelligence (SEDCABI) for predicting Bitcoin prices. Traditionally, models use either structured or unstructured data alone, which limits their effectiveness. This research pioneers the use of both data types. SEDCABI harnesses analytics and BI to extract insights from structured historical price and market data. It also incorporates unstructured social media sentiment and news to capture perceptions of Bitcoin. Experiments show that integrating both data types significantly improves prediction accuracy. SEDCABI continuously adapts to the dynamic crypto market, and its plug-in prediction module enables customization. Overall, SEDCABI offers robust Bitcoin price predictions by combining structured and unstructured data. This contributes to cryptocurrency prediction research with an innovative approach to informed decision-making.
    Keywords: SEDCABI; Prediction; Bitcoin; Cryptocurrency; Text Mining; Analytics; Business Intelligence; Unstructured Data; Sentiment; Price.
    DOI: 10.1504/IJDS.2024.10064877
     
  • A Study of MySQL Protocol-based Database Proxy Approval System for Fortress Machine
    by Xian Zhang, Xinhui Luo, Dong Yin, Taiguo Qu, Hao Li 
    Abstract: With the growth of enterprise informatization, database security and compliance operation management have become increasingly important. Therefore, it is essential to design an efficient database proxy approval system. In this paper, we develop a database proxy approval system based on the MySQL protocol for fortress machines, which provides a real-time customized configuration scheme for high-risk commands, designs a real-time approval process for six types of high-risk commands, and creates a simple and efficient matching algorithm for high-risk commands. We designed a large number of experiments to test the system's connection success rate, operation stability, response time, CPU resource consumption, matching algorithm performance, and other aspects. The experimental results show that this database proxy approval system has good configuration flexibility, high accuracy, and good time performance. The system is widely applicable in electric power, finance, petroleum, and other fields.
    Keywords: Fortress Machine; MySQL Protocol; Database Proxy; Approval System; Database Security.
    DOI: 10.1504/IJDS.2024.10066165
     
  • Mobile Target Defence Against IoT-DDoS Attacks
    by Liping Wu, Xuehua Zhu 
    Abstract: This study analyses the mobile target defence method and feature extraction process based on multi-source information fusion technology (MSIFT), and introduces a feature-level fusion (FLF) method for DDoS attacks that optimises a backpropagation neural network (BPNN) with a genetic algorithm. The models with 9 nodes and 11 nodes had the best learning performance, with learning rates of 0.37 and 0.15. When the intensity of DDoS attacks was low, the prediction accuracy of the proposed method was about 94%. The actual value was usually small, with the 10th group having the highest actual value, close to 800, and the 19th group having the lowest actual value, about 130. Introducing decision-level fusion of DDoS attacks based on D-S evidence fusion can further improve the accuracy of attack detection. This study has made significant progress in improving the efficiency and accuracy of mobile target defence against DDoS attacks in the Internet of Things.
    Keywords: Internet of Things; DDoS attacks; Target defense; Multi source information; Genetic algorithm.
    DOI: 10.1504/IJDS.2025.10066963
     
  • A Data Value-Driven Collaborative Data Collection Method in Complex Multi-Constraint Environments
    by LinLiang Zhang, LianShan Yan, ZhiSheng Liu, Shuo Li, RuiFang Du, ZhiGuo Hu
    Abstract: Data collection is a foundational task in mobile crowd sensing. However, existing data collection methods prioritise quantity and neglect heterogeneity, cooperation, energy efficiency, and collision avoidance, which causes low multi-agent efficiency in complex scenarios. To address this issue, this paper integrates multi-agent reinforcement learning and deep learning to propose the CS_MCE method. The CS_MCE method, applied to unmanned aerial vehicle (UAV) collaborative data collection scenarios, utilises deep neural networks to solve representation problems in vast state-action spaces and provides intelligent decision-making capabilities. CS_MCE was compared with the MADDPG and IL-DDPG algorithms in experimental environments with different data values, in terms of reward values, data quality, energy efficiency, and the number of collisions. The results showed that the data quality collected by CS_MCE increased by 56 times and energy efficiency improved by more than 60%, demonstrating the efficiency and stability of the CS_MCE method.
    Keywords: Mobile Crowd-sensing; Data Collection; Heterogeneous Data; Unmanned Vehicles; Deep Reinforcement Learning.
    DOI: 10.1504/IJDS.2025.10067169
     
  • A Commensurate Univariate Variable Ranking Method for Classification
    by Nuo Xu, Xuan Huang, Thanh Nguyen, Jake Yue Chen 
    Abstract: Applying a variable ranking method to feature selection in classification requires the notion of commensurateness, because a dataset may contain independent variables of different types. A commensurate ranking method is one that produces consistent and comparable ranking results among independent variables of different types, such as numeric vs categorical and discrete vs continuous. We introduce a ranking method named Condition Empirical Expectation (CEE) and demonstrate that it is the most commensurate among several representative ranking methods. Further, it has the highest statistical power as a test of independence when the categorical dependent variable is imbalanced. These properties make CEE uniquely suitable for fast feature selection on any dataset, especially those with high dimensionality and mixed variable types. Its usage is demonstrated with a case study in facilitating preprocessing for classification.
    Keywords: variable types; variable ranking; variable relevance; commensurate; statistical dependence.
    DOI: 10.1504/IJDS.2025.10067405
     
  • Application of weaving based on log files in database systems
    by Feng Chen, Bin Chen, Huan Xu, Qiuyong Yang, Xiaowen Zeng 
    Abstract: Aspect-oriented database (AODB) systems can effectively integrate and manage various data, improve data processing efficiency, and provide powerful data support for complex business scenarios. To improve the weaving efficiency of aspect-oriented programming (AOP), this paper focuses on the weaving of log files in AODB. It introduces AOP technology in AODB and compares it with object-oriented programming (OOP) technology. It also proposes a fast repair method for the normal operation and abnormal restart of the AODB system, and verifies the effectiveness of this fast repair mechanism through simulation experiments. The research results indicate that, compared with OOP technology, AOP technology can be better applied to the study of log weaving. When notification modifications and connection point changes occur, incremental weaving has shorter weaving time and higher weaving efficiency. The weaving method based on log files can effectively improve the weaving efficiency of AODB and has certain application value.
    Keywords: log weaving; AODB; aspect-oriented database; AOP; aspect-oriented programming; incremental weaving; weaving state recovery; intelligent decision-making technology.
    DOI: 10.1504/IJDS.2024.10065629
     
  • Intelligent factory perception ability using distributed knowledge graph
    by Wenjuan Wang, Donghui Shen, Anyin Bao, Jianming Shao, Shunkai Sun 
    Abstract: Traditional research often faces the problem of information segregation, resulting in a lack of access to comprehensive, cross-domain data during the decision-making process and limiting a comprehensive understanding of the entire smart factory ecosystem. In this paper, we introduce the proximal policy optimisation (PPO) algorithm, combined with the inference capability of a knowledge graph, to support complex decision-making problems in smart factories. We collected smart factory data from different departments and constructed a distributed knowledge graph, defined semantic labels for entities and relationships, and mapped data from different data sources into the semantic model of the knowledge graph. We then built a decision network using a multilayer perceptron and updated the parameters of the policy network through PPO. The experimental results show that the average fault prediction accuracy of PPO combined with the distributed knowledge graph reaches 96.1%, and the fluctuation of fault prediction accuracy within 12 months is only 0.1%.
    Keywords: intelligent factory; perception ability; distributed knowledge graph; fault prediction; PPO; proximal policy optimisation.
    DOI: 10.1504/IJDS.2024.10066267
     
  • Comparison and database performance optimisation strategies based on NSGA-II genetic algorithm: MySQL and OpenGauss
    by Ming Tang, Lincheng Qi, Sibo Bi, Xinyun Cheng, Shijie Zhang 
    Abstract: With the widespread application of databases in real-time environments, higher requirements are placed on their performance optimisation strategies. In response to the lack of dynamic adjustment and optimisation capabilities for real-time environmental changes in database performance optimisation strategies, as well as poor query throughput and response time performance, this paper adopts the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to study performance optimisation of MySQL and OpenGauss databases. Firstly, it defines three objective functions and the corresponding constraints for the response time of the database query, the performance of the query, and the utilisation of the query resource, and calculates the fitness of each individual and the distance between the layers. Then, the tournament rotation method is used to output parents with high fitness, and the crossover and mutation probabilities are set. Finally, the optimal parameter configuration of the database is output. The experiment was based on the TPC-DS dataset (Transaction Processing Performance Council Decision Support benchmark) and compared the performance of MySQL and OpenGauss databases under different parameter configurations. The experimental results show that after optimisation by the NSGA-II genetic algorithm, MySQL and OpenGauss databases have certain improvements in query throughput, query response time, and query resource utilisation. Moreover, the optimisation effect on the MySQL database was as high as 90.30%, which is more significant than that on the OpenGauss database.
    Keywords: database performance optimisation; MySQL and OpenGauss; NSGA-II; Non-dominated Sorting Genetic Algorithm II; query response time; dynamic adjustment capability; resource utilisation.
    DOI: 10.1504/IJDS.2024.10065423
     
  • Image analysis of a museum intelligent digital navigation system based on a virtual 3D deep neural network
    by Fanyu Meng 
    Abstract: The aim of this study is to develop an intelligent digital tour guide system that utilises virtual 3D deep neural network (DNN) technology to improve the visiting experience and cultural dissemination of museums, providing visitors with more information and interactive experiences. This study conducted a questionnaire survey of 20 tourists using an intelligent digital tour guide system based on virtual 3D DNN technology, and compared the performance of the system designed in this work with traditional systems 1 and 2. The research results indicate that the designed system outperforms traditional systems 1 and 2 in terms of information entropy, average gradient, signal-to-noise ratio (SNR), and equivalent coefficient. For example, in terms of information entropy, the system designed in this paper has a value of 6.974, compared to 5.127 and 5.368 for traditional systems 1 and 2, respectively.
    Keywords: virtual 3D technology; DNN; deep neural network; image analysis; smart museum; digital tour guide system.
    DOI: 10.1504/IJDS.2024.10067167
     
  • Privacy protection and anomaly detection in intelligent sorting based on convolutional neural networks in IoT environment
    by Han Zhou, Danping Chen, Gengxin Chen, Xiaoli Lin 
    Abstract: At present, the Internet of Things (IoT) has improved people's lives. IoT provides users with various intelligent sorting, networked devices, and applications across different fields. Therefore, detecting anomalies in IoT devices with intelligent sorting is crucial to minimise threats and improve safety. The convolutional neural network-assisted anomaly detection (CNN-AD) method has been developed to enhance security by detecting anomalies in the IoT environment with intelligent sorting. The anomaly detection method uses a focused event system to increase its efficiency in intelligent sorting with event grouping tasks and to improve detection accuracy. Event privacy is obtained by utilising feature selection, mapping, and normalisation to enhance security. The CNN automatically extracts characteristics from data and identifies and classifies the different types of events and attacks in intelligent sorting. The performance analysis and assessment of the CNN are based on its detection of different classes of attacks and on computation times, which are significantly shorter.
    Keywords: anomaly detection; CNN; convolutional neural network; classification; different attacks; privacy; security; intelligent sorting.
    DOI: 10.1504/IJDS.2024.10066751
     
  • Inter-provincial demand-side resource trade mechanism for enhancing new energy consumption
    by Zhifeng Liang, Weixi Ji, Jie Yu, Li Chang, Lili Li 
    Abstract: With the accelerating construction of the new power system, the proportion of new energy continues to increase, and the randomness and volatility of the system will be further aggravated. In certain periods, a region may generate more new energy than it can consume, which makes it necessary to carry out trans-regional or trans-provincial trading of surplus new energy to ensure the consumption of new energy and realise mutual aid between regions with resource surpluses and shortages. To this end, based on the existing trading mechanism, this paper proposes the framework, design principles, and market positioning of an inter-provincial demand-side resource mutual trading market, in which demand-side resources participate and electric energy is the traded product, and constructs the mechanism of the mutual trading market on this basis. Then, the trading strategies of both buyers and sellers are studied. Finally, the validity of the proposed market mechanism and model is verified by an example analysis.
    Keywords: demand-side resources; virtual power plant; new energy; inter-provincial transaction; auction theory; mutual aid transactions.
    DOI: 10.1504/IJDS.2024.10066940
     
  • Construction of stock price fluctuation prediction model based on ABC-SVR artificial bee colony algorithm
    by Bo Wang 
    Abstract: Traditional research on stock price volatility prediction faces problems such as complex models, difficulty in parameter optimisation, and insufficient model generalisation ability. In this paper, the artificial bee colony support vector regression (ABC-SVR) algorithm is applied to optimise the parameter combination of the SVR model. Firstly, the paper collects historical stock price data and related factor data, and extracts technical indicators such as closing price, trading volume, moving average, as well as company financial data features from them. Then, the ABC-SVR algorithm is applied to select the kernel function, adjust the penalty parameters, and construct a stock price volatility prediction model. Finally, the dataset is divided into training and testing sets through cross validation, and the MAE and RMSE of the models on the testing set are determined. Research shows that the model has high prediction accuracy and small errors on various test sets.
    Keywords: stock price prediction; ABC; artificial bee colony; SVR; support vector regression; model optimisation; generalisation ability; prediction accuracy.
    DOI: 10.1504/IJDS.2024.10066941
     
  • Construction of a credit risk measurement system for small and micro firms in the context of internet financing
    by Yi Wang, Jiaru Lao, Xiaowei Niu, Eun-Young Nam 
    Abstract: Small and micro firms (SMFs) are indispensable elements of the socioeconomy. However, they face significant and costly financing challenges due to their inherent characteristics and market factors. The advent of internet financing offers a potential solution to these financing difficulties for SMFs. Nevertheless, given the current lack of sophisticated regulations over internet financing in China, balancing the provision of financial support to SMFs while maintaining the safety and stability of the financial market and institutions has become a critical area of interest. This paper aims to construct a credit risk measurement indicator system for SMFs in the context of internet financing. The proposed system consists of a goal layer, a criterion layer, an index layer, and a secondary criteria layer, and the analytic hierarchy process (AHP) method is used to develop a corresponding weight system. This paper offers a reference model to promote the development of financial services such as credit financing for SMFs while ensuring financial security.
    Keywords: internet financing in China; SMFs; small and micro firms; credit risk measurement; AHP; analytic hierarchy process method.
    DOI: 10.1504/IJDS.2024.10067743
     
  • Machine learning model training method and device based on artificial intelligence
    by Danping Chen, Han Zhou, Xiaoli Lin, Yanpei Song 
    Abstract: As an important research direction in artificial intelligence (AI), machine learning (ML) has been widely used in many complex systems. This paper studies how to improve and train the graph-based semi-supervised learning algorithm (GBSSLA) based on ML. It uses decision trees (DT) and backpropagation neural networks (BPNN) as classifiers to train ML models. Experimental analysis shows that when the labelled data accounts for 20%, 50%, and 80% of the training set, the average error improvement rate of the improved graph-based semi-supervised learning algorithm (IGBSSLA) is always higher than that of the self-training algorithm (STA) and the cooperative training algorithm (CTA). The experimental results show that, under the same experimental conditions, with the same experimental data and the same classifier method, the final error and the error improvement rate of IGBSSLA were better than those of STA and CTA.
    Keywords: machine learning model; artificial intelligence; training method; GBSSLA; graph-based semi-supervised learning algorithm.
    DOI: 10.1504/IJDS.2024.10067603