Title: Identification of common parameters for landslides using text mining approach
Authors: Sonam Lhamu Bhutia; Samarjeet Borah
Addresses: Sikkim Manipal Institute of Technology, Sikkim Manipal University, East Sikkim, 737136, India; Sikkim Manipal Institute of Technology, Sikkim Manipal University, East Sikkim, 737136, India
Abstract: The study focuses on extracting the parameters needed to develop a suitable landslide susceptibility map (LSM) using text mining. Text mining concepts are applied to identify common landslide parameters from the existing literature. Feature extraction was performed using the MapReduce technique, and the runtimes of serial and parallel processing were compared. The results favoured parallel processing, which took 0.02 seconds compared with 0.80 seconds for serial processing. A total of 14 features were selected as landslide conditioning factors. Multi-label text classification was then applied, using a support vector classifier (SVC) and a stochastic gradient descent (SGD) classifier. The classifiers were evaluated using confusion matrices; the SVC (SVM) classifier performed better, with an overall F1 score of 0.90 versus 0.88 for SGD. The text mining approach successfully extracted vital parameters from the existing literature, contributing to the development of a suitable LSM.
Keywords: landslide; landslide parameters; machine learning; MapReduce; multi-label text classification; stochastic gradient descent classifier; SGD; support vector machine; SVM; text mining.
DOI: 10.1504/IJDATS.2023.136665
International Journal of Data Analysis Techniques and Strategies, 2023 Vol.15 No.4, pp.277 - 301
Received: 14 Sep 2022
Accepted: 07 Feb 2023
Published online: 15 Feb 2024
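The abstract describes counting candidate landslide parameters across the literature with a MapReduce-style pipeline and comparing serial and parallel runtimes. The following is a minimal sketch of that idea, not the authors' implementation: the term vocabulary (`CANDIDATE_TERMS`), the toy corpus, and the helper names `map_phase`, `reduce_phase` and `count_terms` are hypothetical. For simplicity the "parallel" mode uses a thread pool; in CPython a process pool or a real MapReduce framework such as Hadoop would be needed for true CPU parallelism.

```python
import re
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical vocabulary of candidate conditioning factors (illustrative only,
# not the 14 features selected in the study).
CANDIDATE_TERMS = {"slope", "rainfall", "lithology", "aspect", "elevation", "ndvi"}


def map_phase(document: str) -> Counter:
    """Map step: count occurrences of candidate terms in one document."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return Counter(tok for tok in tokens if tok in CANDIDATE_TERMS)


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial counts by summing per-term totals."""
    return left + right


def count_terms(documents, parallel: bool = False) -> Counter:
    """Run the map step serially or on a thread pool, then reduce."""
    if parallel:
        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(map_phase, documents))
    else:
        partials = [map_phase(doc) for doc in documents]
    return reduce(reduce_phase, partials, Counter())


# Toy corpus standing in for landslide literature (made up for illustration).
docs = [
    "Slope and rainfall are the main triggers; slope angle matters.",
    "Lithology, aspect and elevation were used with rainfall data.",
    "NDVI and slope were derived from remote sensing imagery.",
]

serial = count_terms(docs)
parallel = count_terms(docs, parallel=True)
print(serial.most_common(2))  # [('slope', 3), ('rainfall', 2)]

# Compare wall-clock runtimes of the two modes (numbers depend on the machine).
for mode in (False, True):
    start = time.perf_counter()
    count_terms(docs, parallel=mode)
    label = "parallel" if mode else "serial"
    print(f"{label}: {time.perf_counter() - start:.6f} s")
```

Both modes must produce identical counts; only the runtime differs. On a corpus this small the pool's start-up overhead can make the parallel run slower, so a runtime comparison is only meaningful on a realistically sized document collection.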
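The abstract reports an overall F1 score for multi-label classification derived from confusion-matrix counts, but does not state how per-label scores were averaged. The sketch below assumes micro-averaging, where true positives, false positives and false negatives are pooled across all documents and labels. The label sets and the helper name `micro_f1` are hypothetical and serve only to show the arithmetic.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label predictions given as sets of labels.

    True positives, false positives and false negatives are pooled across all
    documents and labels before computing precision and recall.
    """
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        tp += len(true_labels & pred_labels)  # labels correctly predicted
        fp += len(pred_labels - true_labels)  # labels predicted but absent
        fn += len(true_labels - pred_labels)  # labels present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted parameter labels for three documents.
y_true = [{"slope", "rainfall"}, {"lithology"}, {"slope", "ndvi"}]
y_pred = [{"slope"}, {"lithology", "aspect"}, {"slope", "ndvi"}]

print(round(micro_f1(y_true, y_pred), 2))  # 0.8
```

Here 4 labels are correct, 1 is spurious ("aspect") and 1 is missed ("rainfall"), so precision and recall are both 4/5 and F1 is 0.8. On binarised label matrices, scikit-learn's `f1_score` with `average="micro"` computes the same quantity; a macro average would instead take the mean of per-label F1 scores and can differ noticeably when labels are imbalanced.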