Title: Identification of common parameters for landslides using text mining approach

Authors: Sonam Lhamu Bhutia; Samarjeet Borah

Addresses: Sikkim Manipal Institute of Technology, Sikkim Manipal University, East Sikkim, 737136, India; Sikkim Manipal Institute of Technology, Sikkim Manipal University, East Sikkim, 737136, India

Abstract: The study focuses on extracting the parameters needed to develop a suitable landslide susceptibility map (LSM) using text mining. Text mining concepts are applied to identify common landslide parameters from the existing literature. Feature extraction was performed using the MapReduce technique, and the runtimes of serial and parallel processing were compared. Results favoured parallel processing, taking 0.80 seconds compared with 0.02 seconds for serial processing. A total of 14 features were selected as landslide conditioning factors. Multi-label text classification was then applied, using a support vector classifier (SVC) and a stochastic gradient descent (SGD) classifier. Classifier performance was evaluated using a confusion matrix, with the SVM-based classifier (SVC) performing better, achieving an overall F1 score of 0.90 versus 0.88 for SGD. The text mining approach successfully extracted vital parameters from the existing literature, contributing to the development of a suitable LSM.
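To make the MapReduce feature-extraction step more concrete, here is a minimal sketch of counting candidate landslide parameters across a set of documents, run once serially and once with a parallel map phase. It is an illustration under stated assumptions, not the authors' implementation: the vocabulary `CANDIDATE_TERMS`, the helper names, and the rule that a parameter is "common" if it appears in at least `min_docs` documents are all hypothetical. The map step uses in-mapper combining (one `Counter` per document instead of individual `(term, 1)` pairs), and a thread pool stands in for a cluster to show the parallel map; in CPython a process pool would be needed for true CPU parallelism.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical vocabulary of candidate conditioning factors (illustrative only;
# NOT the 14 features reported in the article).
CANDIDATE_TERMS = {"slope", "rainfall", "lithology", "aspect", "elevation"}


def map_phase(document: str) -> Counter:
    """Map: emit each candidate term once per document (document frequency)."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return Counter({t for t in tokens if t in CANDIDATE_TERMS})


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce: sum partial counts key by key."""
    return left + right


def count_serial(documents):
    """Run map and reduce sequentially in a single worker."""
    return reduce(reduce_phase, map(map_phase, documents), Counter())


def count_parallel(documents, workers=4):
    """Run the map phase on a pool of workers, then reduce the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_phase, documents))
    return reduce(reduce_phase, partials, Counter())


def select_common(doc_freq: Counter, min_docs: int):
    """Hypothetical selection rule: keep terms found in at least min_docs documents."""
    return sorted(term for term, n in doc_freq.items() if n >= min_docs)
```

Both paths should produce identical counts; only the runtime differs, which is what a serial-versus-parallel comparison measures. Wrapping each call in `time.perf_counter()` would give timings, though on a small corpus the overhead of starting workers can outweigh any speed-up.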

Keywords: landslide; landslide parameters; machine learning; MapReduce; multi-label text classification; stochastic gradient descent classifier; SGD; support vector machine; SVM; text mining.

DOI: 10.1504/IJDATS.2023.136665

International Journal of Data Analysis Techniques and Strategies, 2023 Vol.15 No.4, pp.277 - 301

Received: 14 Sep 2022
Accepted: 07 Feb 2023

Published online: 15 Feb 2024
