Title: Worldwide gross revenue prediction for Bollywood movies using a hybrid ensemble model
Authors: Alina Zaidi; Siddhaling Urolagin
Addresses: Department of Computer Science, Birla Institute of Science and Technology, Dubai, UAE; Department of Computer Science, Birla Institute of Science and Technology, Dubai, UAE
Abstract: Predicting a movie's revenue before its release can be very beneficial for stakeholders and investors in the movie industry. Even though Indian cinema is a booming industry, the literature on movie revenue prediction leans towards non-Indian movies. In this study, we build a novel hybrid prediction model to predict the worldwide gross of Bollywood movies. A dataset of 674 Bollywood movies is prepared by collecting movie-related features from IMDb and from YouTube movie trailers. K-means clustering is performed on the dataset and two major clusters are identified. Important features specific to each cluster are then selected. The proposed hybrid model assigns each movie to one of the two clusters and applies a separate prediction model to each cluster. The prediction models we tested included various basic machine learning models and ensemble models. The ensemble that combined predictions from support vector regression, a neural network and ridge regression gave the best result for both clusters, and we chose it as our final model. We obtained an overall MAE of 0.0272 and an R² of 0.80 after 10-fold cross-validation.
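The pipeline described in the abstract (cluster the movies with K-means, then fit one ensemble of support vector regression, a neural network and ridge regression per cluster) could be sketched roughly as follows with Scikit-Learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `ClusteredEnsemble`, the helper `make_ensemble`, all hyperparameters, the feature scaling, and the use of simple averaging via `VotingRegressor` are hypothetical choices. Per-cluster feature selection and 10-fold cross-validation are omitted for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_ensemble():
    """Average the predictions of SVR, a small neural network and ridge.

    Hyperparameters here are illustrative assumptions, not values from the paper.
    """
    return VotingRegressor([
        ("svr", make_pipeline(StandardScaler(), SVR())),
        ("nn", make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
        )),
        ("ridge", make_pipeline(StandardScaler(), Ridge(alpha=1.0))),
    ])


class ClusteredEnsemble:
    """Hypothetical wrapper: K-means segregation plus one ensemble per cluster."""

    def __init__(self, n_clusters=2, random_state=0):
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10,
                             random_state=random_state)
        self.models = {}

    def fit(self, X, y):
        # Assign every training movie to a cluster, then fit a separate
        # ensemble on the movies of each cluster.
        labels = self.kmeans.fit_predict(X)
        for c in np.unique(labels):
            mask = labels == c
            self.models[c] = make_ensemble().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        # Route each movie to the ensemble of its nearest cluster centre.
        labels = self.kmeans.predict(X)
        preds = np.empty(len(X))
        for c in np.unique(labels):
            mask = labels == c
            preds[mask] = self.models[c].predict(X[mask])
        return preds
```

In use, `X` would be a numeric feature matrix built from the IMDb and trailer features and `y` the (scaled) worldwide gross; `ClusteredEnsemble().fit(X, y).predict(X_new)` then returns one prediction per movie.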
Keywords: Bollywood; movie revenue prediction; box office; regression; ensemble; feature selection; machine learning; Scikit-Learn.
DOI: 10.1504/IJBIDM.2021.115952
International Journal of Business Intelligence and Data Mining, 2021 Vol.19 No.1, pp.52 - 69
Received: 06 Jun 2018
Accepted: 26 Dec 2018
Published online: 06 Jul 2021