COVID-19 Mental Health NLP Analysis — True Stats

COVID-19 Lockdown
Mental Health Impact
Analysis Using NLP

An MSc dissertation examining how machine and deep learning algorithms can predict public sentiment from Twitter data during the initial COVID-19 lockdown — with implications for mental healthcare resource management.

Full Dissertation: Mustafa Ahmad, Birmingham City University (2022) · MSc Big Data Analytics · CMP7200 · 44 pages
Download PDF →
[Figure: Algorithm F-1 Score Comparison for BERT, LSTM, Logistic Regression, Random Forest, and Naive Bayes]
86% BERT accuracy · 85% LSTM accuracy · 44K tweets
NLP · Mental Health · Deep Learning

COVID-19 Lockdown
Mental Health Impact
Analysis on Twitter

Social media platforms can be extremely valuable sources for gathering information related to mental health. This project applies machine and deep learning algorithms — including BERT and LSTM — to over 44,000 COVID-19-related tweets to predict sentiment, comparing performance across multiple models to identify optimal approaches for crisis-period mental health monitoring.

Research Overview

The Problem

Mental Health in Crisis

56% of young people in the UK have reported anxiety since the start of the COVID-19 pandemic. With a psychologist-to-patient ratio of 1:10,000, NLP offers a scalable, low-resource way to monitor and respond to public mental health.

The Dataset

44,000 Twitter Posts

Tweets were collected over 6 weeks from March 2020, the initial lockdown period. The original sentiment labels span five categories, which were consolidated to Positive, Negative, and Neutral for modelling. The dataset was sourced ethically from Kaggle.
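The consolidation from five sentiment categories to three can be sketched as a simple lookup. This is a minimal illustration, not the dissertation's code: only "Negative" and "Extremely Negative" are named on this page, so the remaining five-way label strings are assumed by symmetry, and `consolidate_label` is a hypothetical helper.

```python
# Assumed five-way label names. Only "Negative" and "Extremely Negative"
# appear in the text; the others are guessed by symmetry.
FIVE_TO_THREE = {
    "Extremely Negative": "Negative",
    "Negative": "Negative",
    "Neutral": "Neutral",
    "Positive": "Positive",
    "Extremely Positive": "Positive",
}


def consolidate_label(label: str) -> str:
    """Map a five-category sentiment label to Positive, Negative, or Neutral."""
    key = label.strip().title()  # tolerate stray whitespace and casing
    if key not in FIVE_TO_THREE:
        raise ValueError(f"Unknown sentiment label: {label!r}")
    return FIVE_TO_THREE[key]


# Made-up example labels, not taken from the dataset.
raw_labels = ["Extremely Negative", "Neutral", "positive", "Extremely Positive"]
consolidated = [consolidate_label(label) for label in raw_labels]
print(consolidated)
```

Raising on unknown labels, rather than silently defaulting to Neutral, keeps mislabelled rows from inflating the class that the models already find hardest.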

The Approach

5 Algorithms Compared

Traditional ML models (Random Forest, Multinomial Naive Bayes, Logistic Regression) are benchmarked against the deep learning models BERT and LSTM on F-1 score, precision, recall, and accuracy.

Algorithm Performance Results

Tested on 30% holdout — averaged across Positive, Negative & Neutral classes

Algorithm                        Type                              Accuracy  F-1 Score  Precision  Recall  Note
BERT                             Deep Learning · Transformer       86%       85%        86%        84%     Best Overall
LSTM                             Deep Learning · RNN-Based         85%       84%        87%        81%     Best Precision
Multinomial Logistic Regression  Machine Learning · Supervised     79%       76%        76%        76%
Random Forest                    Machine Learning · Ensemble       73%       71%        70%        72%
Multinomial Naive Bayes          Machine Learning · Probabilistic  68%       63%        64%        62%
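The accuracy, F-1, precision, and recall figures reported above are averaged across the three classes. The sketch below shows one common way to compute them, assuming an unweighted (macro) average; the page does not say which averaging scheme the dissertation used, and the function names and toy labels are illustrative only.

```python
from collections import Counter

CLASSES = ("Positive", "Negative", "Neutral")


def per_class_scores(y_true, y_pred, classes=CLASSES):
    """Return {class: (precision, recall, f1)} computed one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 across classes (assumed)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels for illustration only, not results from the study.
y_true = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]
y_pred = ["Positive", "Neutral", "Negative", "Negative", "Neutral", "Positive"]

scores = per_class_scores(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(scores)
print(f"accuracy={accuracy(y_true, y_pred):.2f} macro-F1={macro_f1:.2f}")
```

Keeping the per-class scores alongside the average makes it easy to see when one class, such as Neutral, drags the overall figure down.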

Key Findings

Tweet Behaviour

Mid-Week Mental Health Peaks

63% of COVID-related tweets were posted Monday–Thursday, peaking mid-week before declining sharply over weekends — suggesting heightened anxiety correlates with work-week pressures.
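A day-of-week breakdown like the one above can be computed from tweet dates alone. The following is a minimal sketch with made-up dates; `weekday_counts` and `monday_to_thursday_share` are hypothetical helpers, and how the original analysis parsed timestamps is not stated here.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_counts(dates):
    """Count tweets per weekday, returned in Monday-to-Sunday order."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in dates)
    return {day: counts.get(day, 0) for day in WEEKDAYS}


def monday_to_thursday_share(dates):
    """Fraction of tweets posted Monday through Thursday."""
    dates = list(dates)
    if not dates:
        return 0.0
    # date.weekday() returns 0 for Monday, so 0..3 covers Monday-Thursday.
    in_range = sum(1 for d in dates if d.weekday() <= 3)
    return in_range / len(dates)


# Made-up posting dates from one week in March 2020 (16 March was a Monday).
sample_dates = [
    date(2020, 3, 16), date(2020, 3, 17), date(2020, 3, 18), date(2020, 3, 18),
    date(2020, 3, 19), date(2020, 3, 20), date(2020, 3, 21), date(2020, 3, 22),
]

counts = weekday_counts(sample_dates)
share = monday_to_thursday_share(sample_dates)
print(counts)
print(f"Monday-Thursday share: {share:.1%}")
```

For context, tweets spread evenly across the week would put 4/7, roughly 57%, on Monday through Thursday, so that is the baseline to compare any observed share against.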

Sentiment Distribution

Negative Sentiment Dominated

Negative and Extremely Negative tweets together accounted for nearly 38% of all posts. Top hashtags including #panicbuying and #CoronaCrisis reflect acute supply anxiety and fear.
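Ranking hashtags, as in the finding above, needs only a regular expression and a counter. This sketch uses invented tweets and a hypothetical `top_hashtags` helper. Lowercasing the tags is an assumption about how variants such as #PanicBuying and #panicbuying should be merged.

```python
import re
from collections import Counter

# A '#' followed by word characters (letters, digits, underscore).
HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=10):
    """Return the n most frequent hashtags as (tag, count) pairs."""
    counts = Counter()
    for text in tweets:
        # Lowercase so differently cased variants count as one tag.
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts.most_common(n)


# Invented example tweets, not drawn from the dataset.
tweets = [
    "Shelves empty again #panicbuying #CoronaCrisis",
    "Stay home everyone #coronacrisis",
    "No pasta anywhere #PanicBuying",
    "#coronacrisis is exhausting",
]

ranking = top_hashtags(tweets, n=2)
print(ranking)
```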

Model Insight

Neutral Sentiment Hardest to Predict

All models struggled most with the Neutral class. Naive Bayes achieved only 47% F-1 on Neutral tweets, while BERT reached 80% — highlighting inherent linguistic ambiguity in neutral COVID discourse.

Deep Learning Advantage

Deep Learning Outperforms Classic ML

BERT and LSTM reached 86% and 85% accuracy respectively, against 79% for the best traditional model (Multinomial Logistic Regression). Both learn contextual representations automatically, a critical advantage for nuanced mental health language.

Conclusion

BERT emerged as the best overall performer at 86% accuracy, with LSTM a close second at 85%. Both deep learning models significantly outpaced traditional machine learning approaches, validating their use for real-world sentiment monitoring during public health crises. The research recommends deploying BERT on larger datasets, given its bidirectional reading capability — a key advantage in detecting nuanced mental health signals.

Future Directions

  • Extending sentiment analysis to languages beyond English to serve global populations
  • Integration with Electronic Health Records (EHR) for clinical decision support
  • Development of mental health chatbots using deep learning for early intervention
  • Real-time social media monitoring pipelines for crisis-period public health management
  • Cross-demographic depression classifiers for equitable global health applications

The full code and dataset are available on GitHub, including all Python code, Jupyter notebooks, and the cleaned dataset.

View on GitHub →