
Title: Supervised learning models to predict mental illness and its severity from Reddit posts

Authors: Neha Arun Angadi; Navya Eedula; Kshitij Prit Gopali; R. Jayashree

Addresses: Department of Computer Science, PES University, Bangalore, Karnataka, India (all four authors)

Abstract: Given the growing popularity of free dialogue on social media, this paper presents a methodology for identifying mental illnesses from Reddit posts in which users describe their experiences with conditions such as bipolar disorder, borderline personality disorder (BPD), depression, eating disorders, obsessive-compulsive disorder (OCD), panic disorder, post-traumatic stress disorder (PTSD), and schizophrenia. After the posts were cleaned and pre-processed with standard NLP techniques, several supervised classification models were evaluated with hyperparameter tuning; the LinearSVC model delivered the best results, with 78.25% accuracy. CalibratedClassifierCV was used to calibrate the model's probability estimates. If the results showed that multiple mental illnesses had comparable probabilities, a second classification step was performed using a questionnaire describing the user's condition, which the model used to determine the mental illness. The final step assesses the severity of the illness, which helps in deciding the next plan of action for addressing the disorder.
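
The abstract's hand-off from the calibrated classifier to the questionnaire step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `margin` threshold of 0.10, and the example probabilities are all hypothetical, since the abstract does not say how "comparable probability" was defined. The sketch assumes the per-class probabilities have already been produced by a calibrated model (for example, the output of `predict_proba` on a fitted `CalibratedClassifierCV`).

```python
from typing import Dict, List


def candidate_illnesses(probs: Dict[str, float], margin: float = 0.10) -> List[str]:
    """Return the labels whose probability lies within `margin` of the top label.

    `margin` is a hypothetical threshold chosen for illustration only.
    """
    if not probs:
        raise ValueError("probs must not be empty")
    top = max(probs.values())
    # Rank labels from most to least probable, then keep the near-ties.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return [label for label, p in ranked if top - p <= margin]


def needs_questionnaire(probs: Dict[str, float], margin: float = 0.10) -> bool:
    """True when more than one illness is a plausible candidate."""
    return len(candidate_illnesses(probs, margin)) > 1


# Made-up calibrated probabilities for a single post (illustrative only).
ambiguous = {"depression": 0.41, "bipolar disorder": 0.37, "OCD": 0.12, "PTSD": 0.10}
clear_cut = {"depression": 0.80, "OCD": 0.20}

print(candidate_illnesses(ambiguous), needs_questionnaire(ambiguous))
print(candidate_illnesses(clear_cut), needs_questionnaire(clear_cut))
```

In this sketch the first post would be routed to the questionnaire because two labels are near-tied, while the second would be classified directly. A relative rather than absolute margin would be an equally plausible design choice.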

Keywords: supervised machine learning; probability calibration; text pre-processing; mental illness; Reddit posts; severity detection.

DOI: 10.1504/IJCSE.2024.138417

International Journal of Computational Science and Engineering, 2024 Vol.27 No.3, pp.352 - 363

Received: 03 Oct 2022
Accepted: 11 Jun 2023
Published online: 03 May 2024


