Title: User-generated content data analysis using machine learning methods: a case study in Bangkok, Thailand
Authors: Naragain Phumchusri; Naina Chugh
Addresses: Department of Industrial Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand; Department of Industrial Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand
Abstract: The travel and tourism (T&T) sector is a backbone of the global economy, and as it becomes more saturated and competitive, insights into T&T are more vital than ever. The rise of social media has created the opportunity for systematic analysis of tourist preferences via user-generated content. The objective of this paper is to obtain insights into tourist preferences and tourism trends in Bangkok, Thailand, from user-generated content scraped from TripAdvisor's online reviews of tours and activities. To this end, several analyses are implemented: sentiment analysis to capture tourists' points of view, association rule mining to find patterns of preferences, and natural language processing with text frequency analysis to understand which features tourists discuss most frequently. This paper also proposes machine learning prediction models using logistic regression, support vector machine and random forest algorithms to predict 5-star ratings of reviews, with the goal of identifying factors that significantly affect positive sentiment towards tours and activities.
Keywords: user-generated content; TripAdvisor; sentiment analysis; machine learning; data analysis; Bangkok; Thailand.
DOI: 10.1504/IJBDA.2022.124054
International Journal of Business and Data Analytics, 2022 Vol.2 No.1, pp.72-109
Received: 28 Apr 2021
Accepted: 07 Jan 2022
Published online: 11 Jul 2022