Title: Tree-based methods for analytics of online shoppers' purchasing intentions
Authors: Lu Xiong; Xi Chen; Jingsai Liang; Xingtong Cao; Pengyu Zhu; Mingyuan Zhao
Addresses: Department of Mathematical Sciences, Middle Tennessee State University, 1301 East Main Street, Murfreesboro, TN 37132, USA; Department of Computer Science, Utah Valley University, 800 W University Parkway, Orem, UT 84058, USA; Department of Computer Science, Westminster College, 1840 S 1300 E, Salt Lake City, UT 84105, USA; Department of Mathematical Sciences, Middle Tennessee State University, 1301 East Main Street, Murfreesboro, TN 37132, USA; Department of Mathematical Sciences, Middle Tennessee State University, 1301 East Main Street, Murfreesboro, TN 37132, USA; Department of Mathematical Sciences, Middle Tennessee State University, 1301 East Main Street, Murfreesboro, TN 37132, USA
Abstract: The rapid growth of e-commerce and big data has produced vast amounts of data about online shopping behaviour. Analysing this data can help online retailers gain a competitive advantage. We propose four tree-based methods for the analytics of online shoppers' purchasing intentions. After exploring the data through various visualisation techniques, we conduct feature engineering to improve model accuracy. The area under the ROC curve (AUC) is the primary metric used to evaluate the models. To make the conclusions more statistically robust, k-fold cross-validation is applied to obtain summary statistics of the AUCs, such as the mean and standard deviation. By analysing the global and local feature importance of each model, we find that PageValues is the most critical predictor. Furthermore, we conduct a sensitivity analysis of the target variable Revenue with respect to PageValues to examine their relationship. Our findings support decisions on how to improve sales. The interpretation of the models and the explanation of their business implications distinguish this paper.
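The evaluation procedure summarised in the abstract (k-fold cross-validation, with the mean and standard deviation of the per-fold AUCs) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the data are synthetic, the single feature only loosely stands in for PageValues, and a depth-1 tree (a "stump") replaces the four tree-based models of the paper so that the example needs only the Python standard library.

```python
import random
import statistics


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(group):
    """Gini impurity of a list of 0/1 labels."""
    p = sum(group) / len(group)
    return 2.0 * p * (1.0 - p)


def fit_stump(xs, ys):
    """Depth-1 tree on one feature: pick the threshold with the lowest weighted Gini."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if best is None or impurity < best[0]:
            # Each leaf stores the share of positives, used as the predicted score.
            best = (impurity, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # all feature values identical: no split possible
        rate = sum(ys) / len(ys)
        return (xs[0], rate, rate)
    return best[1:]


def predict_stump(model, xs):
    threshold, left_rate, right_rate = model
    return [left_rate if x <= threshold else right_rate for x in xs]


def kfold_auc(xs, ys, k=5, seed=0):
    """Return (mean, standard deviation) of the AUC over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    aucs = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        scores = predict_stump(model, [xs[i] for i in fold])
        aucs.append(auc(scores, [ys[i] for i in fold]))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Synthetic, hypothetical data: label 1 = purchase; the feature is shifted upward for buyers.
rng = random.Random(42)
ys = [1 if rng.random() < 0.3 else 0 for _ in range(300)]
xs = [rng.gauss(2.0 if y else 0.0, 1.0) for y in ys]

mean_auc, std_auc = kfold_auc(xs, ys, k=5)
print(f"AUC over 5 folds: mean={mean_auc:.3f}, sd={std_auc:.3f}")
```

Reporting the spread alongside the mean is what allows models to be compared on more than a single train/test split; in practice a library implementation of the tree models and of the fold splitting would normally replace the hand-written helpers used here.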
Keywords: online shopping data analytics; feature engineering; decision tree; random forest; SGB; stochastic gradient boosting; XGBoost; feature importance; sensitivity analysis.
International Journal of Data Science, 2024 Vol.9 No.2, pp.99 - 122
Received: 08 Nov 2022
Accepted: 30 Jun 2023
Published online: 05 Jul 2024