Forthcoming and Online First Articles

International Journal of Data Science (IJDS)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all, without any restriction except those stated in their respective CC licenses.

International Journal of Data Science (7 papers in press)

Regular Issues

  • Prediction of Customer Churn Risk with Advanced Machine Learning Methods
    by Oguzhan Akan, Abhishek Verma, Sonika Sharma 
    Abstract: Customer churn risk prediction is an important area of research as it directly impacts the revenue stream of businesses. An ability to predict customer churn allows businesses to come up with better strategies to retain existing customers. In this research we perform a comprehensive comparison of feature selection methods, upsampling methods, and machine learning methods on the customer churn risk dataset. (i) Our research compares likelihood-based, tree-based, and layer-based machine learning methods on the churn dataset. (ii) Models built on the churn dataset without upsampling performed better than those built with oversampling methods. However, SMOTE and ADASYN helped stabilize model performance. (iii) The models built on the ADASYN dataset were slightly better than their SMOTE counterparts. (iv) XGBoost and Deep Cascading Forest combined with XGBoost were consistently better across all metrics than the other methods. (v) Information Value analysis performed better than PCA. In particular, the IVR DCFX model has the best AUROC score, at 89.1%.
    Keywords: Customer Churn; Deep Neural Networks; Deep Cascading Forest; SMOTE; ADASYN.
    DOI: 10.1504/IJDS.2024.10064744
     
  • Self-Evolving Data Collection Through Analytics and Business Intelligence to Predict the Price of Cryptocurrency
    by Adam Moyer, William A. Young II, Timothy J. Haase 
    Abstract: This article presents the Self-Evolving Data Collection Engine through Analytics and Business Intelligence (SEDCABI) for predicting Bitcoin prices. Traditionally, models use either structured or unstructured data alone, limiting effectiveness. This research pioneers using both data types. SEDCABI harnesses analytics and BI to extract insights from structured historical price and market data. It also incorporates unstructured social media sentiment and news to capture Bitcoin perceptions. Experiments show integrating both data types significantly improves prediction accuracy. SEDCABI continuously adapts to the dynamic crypto market. The plug-in prediction module enables customization. Overall, SEDCABI offers robust Bitcoin price predictions by combining structured and unstructured data. This contributes to cryptocurrency prediction research with an innovative approach to informed decision-making.
    Keywords: SEDCABI; Prediction; Bitcoin; Cryptocurrency; Text Mining; Analytics; Business Intelligence; Unstructured Data; Sentiment; Price.
    DOI: 10.1504/IJDS.2024.10064877
     
  • Tree-based methods for analytics of online shoppers' purchasing intentions
    by Lu Xiong, Xi Chen, Jingsai Liang, Xingtong Cao, Pengyu Zhu, Mingyuan Zhao 
    Abstract: The recent rapid growth of e-commerce and big data has produced vast amounts of data about online shopping behaviour. Analysing this data can help online retailers gain competitive advantages. We propose four tree-based methods for analytics of online shoppers' purchasing intentions. After exploring data through various visualisation techniques, we conduct feature engineering to improve the model's accuracy. AUC is the primary measurement used to evaluate models. To make the conclusion more statistically robust, k-fold cross-validation is applied to obtain the statistics of AUCs, such as the average and standard deviation. By analysing the global and local feature importance of each model, the most critical predictor, PageValues, is identified. Furthermore, we perform a sensitivity analysis of PageValues with respect to the target variable Revenue to examine the relationship. Our findings support decisions on how to improve sales. The interpretation of the models and the explanation of their business implications make this paper unique.
    Keywords: online shopping data analytics; feature engineering; decision tree; random forest; SGB; stochastic gradient boosting; XGBoost; feature importance; sensitivity analysis.
    DOI: 10.1504/IJDS.2024.10058603
     
  • Innovative research on encryption and protection of e-commerce with big data analysis
    by Yifu Shu, Wenda Wang 
    Abstract: In the e-commerce sector, protecting data privacy is crucial. This study introduces the symmetric balanced funnel P^5 model as a method for addressing data protection challenges. The P^5 model organises data protection into five levels, each tailored to the specific security needs of different types of e-commerce data. It employs encryption algorithms like DES, AES, and RSA, with increasing encryption strength across the levels to ensure adequate protection. This approach not only safeguards user confidentiality and commercial interests but also provides balanced protection for data exchanged between parties. By offering focused protection for various categories of sensitive data, the P^5 model enhances overall data security in e-commerce. This study offers a new and comprehensive strategy for ensuring data privacy in e-commerce environments.
    Keywords: big data security; e-commerce data; symmetric balanced funnel P^5; multilevel encryption; data privacy; cybersecurity.
    DOI: 10.1504/IJDS.2024.10064485
     
  • Research on the reconstruction algorithm of a finite element deformation model based on digital twins
    by Xu Jing, Li Wei, Wang Jun 
    Abstract: To ensure the integrity of digital twins in the virtual-reality symbiosis stage, the problem that finite element deformation and failure models can only be displayed in the software and cannot be exported needs to be solved. In this paper, a finite element reconstruction technique is developed through the biomimetic study of natural objects. First, the deformation data in the finite element model is obtained by the 'hexagonal method' and 'slice method', through the study of mathematical principles and development based on Visual Studio. Then the model is reconstructed, smoothed, and optimised in a format that can be recognised by 3D printing equipment. Finally, the modified finite element model is presented using rapid prototyping 3D printing technology. The innovation of the method and the development of the reconstruction algorithm solve the problem that digital twins cannot accurately perceive the transient deformation of virtual reality, which has strong application value and practical significance.
    Keywords: digital twins; finite element analysis; model reconstruction; full-life cycle; virtual-reality symbiosis.
    DOI: 10.1504/IJDS.2024.10061691
     
  • Comparing the impact of COVID-19 on three states: a data-driven approach
    by K. Shao, Q. Shao 
    Abstract: The states of Florida, Michigan, and Ohio implemented rather different public health emergency policies to flatten the curve and save lives after the COVID-19 outbreak. This study aims to provide insight into one of the most important and fundamental topics for making public health policy: how to effectively handle life-threatening infectious diseases while minimising overall disruption of society. To compare these three states objectively, three severity risk metrics are proposed, and their log odds data are analysed. Both linear and multivariate models are applied to the log odds of the three severity rates. Contrary to visual inspection of the count data, only one hypothesis test result is statistically significant under the linear model, and none are significant under the multivariate model, at the significance level of 0.05. For the significant result, the estimates of the model parameters favour Florida and Ohio.
    Keywords: COVID-19; population infection rate; case fatality rate; senior fatality rate; log odds; statistical models; statistical hypothesis testing; State of Florida; State of Michigan; State of Ohio.
    DOI: 10.1504/IJDS.2024.10059625
     
  • The evaluation of college students' entrepreneurship education performance using the t-test method
    by Naiqi Chen, Yumei Wu 
    Abstract: This paper briefly introduced evaluation methods for entrepreneurship education performance, conducted a questionnaire survey of college students at Zhejiang Normal University, and used the t-test method to compare the entrepreneurship education performance of students who participated in entrepreneurship education courses with that of students who did not. The results showed that students who participated in the entrepreneurship education course had higher overall entrepreneurship levels. However, in the dimension of "entrepreneurial behaviour", which involves entrepreneurial practice, the course did not play a significant role; it only improved performance in "determining the direction of entrepreneurship quickly and developing a plan". Entrepreneurship courses therefore need to focus more on the teaching of entrepreneurship practice.
    Keywords: entrepreneurship education; T-test method; analytical hierarchy process; college students; education performance.
    DOI: 10.1504/IJDS.2024.10059670