Title: Supervised learning models to predict mental illness and its severity from Reddit posts
Authors: Neha Arun Angadi; Navya Eedula; Kshitij Prit Gopali; R. Jayashree
Addresses: Department of Computer Science, PES University, Bangalore, Karnataka, India; Department of Computer Science, PES University, Bangalore, Karnataka, India; Department of Computer Science, PES University, Bangalore, Karnataka, India; Department of Computer Science, PES University, Bangalore, Karnataka, India
Abstract: Given the growing popularity of free dialogue on social media, this paper presents a methodology for identifying mental illnesses from Reddit posts in which users describe their experiences with conditions such as bipolar disorder, borderline personality disorder (BPD), depression, eating disorders, obsessive-compulsive disorder (OCD), panic disorder, post-traumatic stress disorder (PTSD), and schizophrenia. After the posts were cleaned and pre-processed with standard NLP techniques, several supervised classification models were evaluated with hyperparameter tuning; the LinearSVC model delivered the best results, with 78.25% accuracy. CalibratedClassifierCV was used to calibrate the model's output probabilities. When several mental illnesses received comparable probabilities, a second classification step was performed using a questionnaire describing the user's conditions, from which the model determined the mental illness. The final step assesses the severity of the illness, which helps in planning the next course of action for tackling the mental disorder.
Keywords: supervised machine learning; probability calibration; text pre-processing; mental illness; Reddit posts; severity detection.
DOI: 10.1504/IJCSE.2024.138417
International Journal of Computational Science and Engineering, 2024 Vol.27 No.3, pp.352 - 363
Received: 03 Oct 2022
Accepted: 11 Jun 2023
Published online: 03 May 2024
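The classification, calibration and "comparable probability" steps summarized in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the TF-IDF features, the toy posts and labels, the sigmoid calibration method, and the hypothetical `top_candidates` helper with its 0.1 margin are all assumptions, since the abstract specifies neither the feature representation nor how "comparable probability" is measured. Hyperparameter tuning is omitted for brevity.

```python
# A minimal sketch, not the authors' code: TF-IDF features, a LinearSVC
# wrapped in CalibratedClassifierCV for class probabilities, and a
# hypothetical rule that flags posts whose top class probabilities are
# close enough to warrant a follow-up questionnaire.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus; the paper's Reddit data set is not reproduced here.
posts = [
    "my mood swings from mania to deep lows every few weeks",
    "manic episodes keep me awake for days then I crash",
    "I feel empty and hopeless and can't get out of bed",
    "nothing brings me joy anymore and I sleep all day",
    "I keep checking the door lock over and over",
    "intrusive thoughts make me wash my hands repeatedly",
]
labels = ["bipolar", "bipolar", "depression", "depression", "ocd", "ocd"]

# CalibratedClassifierCV fits one LinearSVC per cross-validation fold,
# calibrates its decision scores on the held-out fold (sigmoid / Platt
# scaling), and averages the resulting probabilities across folds.
model = make_pipeline(
    TfidfVectorizer(),
    CalibratedClassifierCV(LinearSVC(C=1.0), method="sigmoid", cv=2),
)
model.fit(posts, labels)


def top_candidates(probs, classes, margin=0.1):
    """Return the classes whose probability is within `margin` of the best.

    `margin` is a hypothetical threshold; the abstract does not define
    what counts as a "comparable" probability.
    """
    best = max(probs)
    return [c for c, p in zip(classes, probs) if best - p <= margin]


probs = model.predict_proba(["I feel hopeless and my moods swing wildly"])[0]
candidates = top_candidates(probs, model.classes_)
if len(candidates) > 1:
    # Several illnesses are close: a questionnaire would disambiguate them.
    print("ambiguous, refer to questionnaire:", candidates)
else:
    print("predicted:", candidates[0])
```

A post whose candidate list holds more than one class would be routed to the questionnaire step; in practice the margin would have to be chosen on held-out data rather than fixed in advance.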