A dataset containing basic conversations, mental health FAQ, classical therapy conversations, and general advice provided to people suffering from anxiety and depression.
This dataset can be used to train a model for a chatbot that can behave like a therapist in order to provide emotional support to people with anxiety & depression.
The dataset contains intents. An “intent” is the intention behind a user's message. For instance, if I were to say “I am sad” to the chatbot, the intent would be “sad”. Each intent has a set of Patterns and Responses. Patterns are example user messages that align with the intent, while Responses are the replies the chatbot provides for that intent. Various intents are defined, and their patterns and responses serve as the model's training data for identifying a particular intent.
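The intent structure described above can be sketched as follows. This is a minimal illustration, not the dataset's actual schema: the field names `tag`, `patterns`, and `responses`, the sample entries, and the exact-match lookup (a naive stand-in for a trained intent classifier) are all assumptions.

```python
# Hypothetical intents; field names and sample texts are assumptions
# made for illustration, not taken from the dataset itself.
intents = [
    {
        "tag": "sad",
        "patterns": ["I am sad", "I feel down"],
        "responses": ["I'm sorry to hear that. Do you want to talk about it?"],
    },
    {
        "tag": "greeting",
        "patterns": ["Hello", "Hi there"],
        "responses": ["Hello! How are you feeling today?"],
    },
]


def match_intent(message):
    """Return the tag of the first intent with a pattern equal to the message.

    Comparison ignores case and surrounding whitespace. A real chatbot
    would train a classifier on the patterns instead of matching exactly.
    """
    normalized = message.strip().lower()
    for intent in intents:
        if normalized in (pattern.lower() for pattern in intent["patterns"]):
            return intent["tag"]
    return None


print(match_intent("I am sad"))
```

A trained model generalizes beyond the listed patterns, which is why each intent carries several example messages rather than one.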
sflagg/Kaggle-Mental-Health-Survey-Data dataset hosted on Hugging Face and contributed by the HF Datasets community
Data Source: Substance Abuse and Mental Health Services Administration (SAMHSA) 2020, U.S. Department of Health and Human Services (HHS).
This is the dataset used for my first project for mental health analysis with Ann Bertram and Tiffany McBride at Purdue Fort Wayne. It has been cleaned and divided into datasets based on the states. Each dataset will include demographic information such as age, education level, ethnicity, race, genders, mental illness flags, etc. For more information, please refer to the codebook.
By Stephen Myers
This dataset contains survey responses from individuals in the tech industry about their mental health, including questions about treatment, workplace resources, and attitudes towards discussing mental health in the workplace. Mental health is an issue that affects people of all ages, genders, and walks of life, and the tech industry, which places heavy demands on those who work in it, is no exception. By analyzing this dataset, we can better understand how prevalent mental health issues are among those who work in the tech sector, and what kinds of resources they rely upon to find help, so that more can be done to create a healthier working environment for all.
This dataset tracks key measures such as age, gender, and country to determine overall prevalence, along with responses on employee access to care options; whether employers take mental health as seriously as physical illness; whether anonymity is protected when seeking help; and how coworkers may perceive those struggling with mental health issues such as depression or anxiety. With technology advancing faster than ever, these statistics have never been more important to analyze if we hope to remain true promoters of a healthy world inside and outside our office walls.
In this dataset you will find data on the age, gender, country, and state of survey respondents, in addition to numerous questions that assess an individual's mental state, including: self-employment status; family history of mental illness; treatment status and access or lack thereof; how their mental health condition affects their work; number of employees at the company they work for; remote work status; tech company status; benefit information from employers, such as mental health benefits and wellness program availability; anonymity protection when seeking treatment resources for substance abuse or mental health issues; ease (or difficulty) of taking medical leave for a mental health condition; and whether discussing physical or medical matters with employers has negative consequences. You will also find comments from survey participants.
To use this dataset effectively:
- Clean the data by removing invalid responses, duplicates, and missing values. You can do this with basic Pandas methods such as .dropna(), .drop_duplicates(), and .replace().
- Use descriptive statistics such as the mean and median to draw general conclusions about response patterns. Pandas tools such as .groupby() and .describe() help here.
- Run analyses such as mean comparisons across variables (for example, age by gender) and correlations between features, using appropriate statistical methods: Statsmodels OLS models (via the formula API, commonly imported as smf), z-scores, hypothesis tests, and so on, depending on the analysis needed. Make sure you are aware of any underlying assumptions your analysis requires beforehand.
- Visualize your results with plotting libraries like Matplotlib or Seaborn to make the findings easier to interpret. Use boxplots, histograms, or heatmaps where appropriate for your question.
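The cleaning and descriptive-statistics steps above might look like the following sketch. The toy rows, the column names (`Age`, `Gender`, `treatment`), the gender label mapping, and the 18 to 100 age range are assumptions chosen for illustration; the real survey file may differ.

```python
import pandas as pd

# Hypothetical toy rows standing in for the survey file; column names
# and values are assumptions, not the dataset's confirmed schema.
df = pd.DataFrame({
    "Age": [29, 29, 41, -5, 35, None],
    "Gender": ["M", "M", "female", "F", "Male", "F"],
    "treatment": ["Yes", "Yes", "No", "No", "Yes", "No"],
})

clean = (
    df.drop_duplicates()            # remove exact duplicate rows
      .dropna(subset=["Age"])       # remove rows with a missing age
      .replace({"Gender": {"M": "Male", "F": "Female", "female": "Female"}})
)

# Drop implausible ages (the bounds are an assumed validity range).
clean = clean[clean["Age"].between(18, 100)]

# Descriptive statistics of age per gender.
summary = clean.groupby("Gender")["Age"].describe()
print(summary)
```

Free-text fields such as gender usually need a much larger mapping than the three labels shown here, so inspect the distinct values before deciding on one.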
- Using the results of this survey, you could develop targeted outreach campaigns directed at underrepresented groups that answer “No” to questions about their employers providing resources for mental health or discussing it as part of wellness programs.
- Analyzing the employee characteristics (e.g., age and gender) of those who reported negative consequences from discussing their mental health in the workplace could inform employer policies to support individuals with mental health conditions and reduce stigma and discrimination in the workplace.
- Correlating responses to questions about remote work, leave policies, and anonymity with whether or not individuals have sought treatment for a mental health condition may provide insight into which types of workplace resources are most beneficial for supporting employees dealing with these issues.
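As a rough sketch of the last idea, a cross-tabulation shows how the share of respondents who sought treatment varies with a workplace factor. The column names `remote_work` and `treatment` and the toy answers are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical answers; column names and values are assumptions.
df = pd.DataFrame({
    "remote_work": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes"],
    "treatment":   ["Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No"],
})

# Row-normalized table: within each remote_work group, the fraction of
# respondents answering "Yes" or "No" to having sought treatment.
rates = pd.crosstab(df["remote_work"], df["treatment"], normalize="index")
print(rates)
```

A difference between rows indicates an association, not a causal effect; a hypothesis test such as chi-squared would be the next step on real data.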
If you use this dataset in your research, please credit the original authors.
License: Dataset copyright by authors - You are free to: - Share - copy and redi...
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
We collected text data directly from the narratives of people who faced psychological problems and built this dataset from that text. The dataset has 6 columns: Age, Gender, Problem description, problem summary, problem category, and problem psychological category.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
We collected over 10,087 posts from cancer patients and their caregivers on platforms like Reddit, Daily Strength, and the Health Board. The posts were related to five types of cancer: brain, colon, liver, leukemia, and lung cancer. Two team members scored each post based on the emotions expressed, using a scale from -2 to 1. Negative scores (-1 or -2) were given to posts showing grief or suffering, a positive score (1) to posts showing happy emotions like relief or accomplishment, and a score of 0 to posts with no emotion, which were considered neutral. This analysis aims to understand the emotional aspects of cancer patients' posts for a mental health study.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
📌**Context**
The Healthcare Workforce Mental Health Dataset is designed to explore workplace mental health challenges in the healthcare industry, an environment known for high stress and burnout rates.
This dataset enables users to analyze key trends related to:
💠 Workplace Stressors: Examining the impact of heavy workloads, poor work environments, and emotional demands.
💠 Mental Health Outcomes: Understanding how stress and burnout influence job satisfaction, absenteeism, and turnover intention.
💠 Educational & Analytical Applications: A valuable resource for data analysts, students, and career changers looking to practice skills in data exploration and data visualization.
To help users gain deeper insights, this dataset is fully compatible with a Power BI Dashboard, available as part of a complete analytics bundle for enhanced visualization and reporting.
📌**Source**
This dataset was synthetically generated using the following methods:
💠 Python & Data Science Techniques: Probabilistic modeling to simulate realistic data distributions. Industry-informed variable relationships based on healthcare workforce studies.
💠 Guidance & Validation Using AI (ChatGPT): Assisted in refining dataset realism and logical mappings.
💠 Industry Research & Reports: Based on insights from WHO, CDC, OSHA, and academic studies on workplace stress and mental health in healthcare settings.
📌**Inspiration**
This dataset was inspired by ongoing discussions in healthcare regarding burnout, mental health, and staff retention. The goal is to bridge the gap between raw data and actionable insights by providing a structured, analyst-friendly dataset.
For those who want a ready-to-use reporting solution, a Power BI Dashboard Template is available, designed for interactive data exploration, workforce insights, and stress factor analysis.
📌**Important Note** This dataset is synthetic and intended for educational purposes only. It is not real-world employee data and should not be used for actual decision-making or policy implementation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Mental Health Patients 2021-2022’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/meetnagadia/district-wise-mental-health-patients-20212022 on 13 February 2022.
--- Dataset description provided by original source is as follows ---
District-wise number of mental health patients in 2021-2022 in Karnataka, India.
District-wise number of mental health patients by category: severe mental illness, common mental disorder, alcohol and substance abuse, cases referred to higher centers, and suicide attempt cases.
Source: Health and Family Welfare Department, Karnataka, via the Karnataka government data portal. Sector: Health and Family Welfare › Health.
--- Original source retains full ownership of the source dataset ---
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset comprises both human expert and Large Language Model responses to queries about mental health.
Please note: patient context and psychologist responses found within this dataset are all collected from Kaggle, from the NLP Mental Health Conversations repository.
The additional "LLM" column within this dataset has been generated by the MISTRAL-7B instruct v0.2 model, via the prompt:
You are a psychologist speaking to a patient. The patient will speak to you and you will then answer their query. [/INST] Okay. Go ahead, patient. I will answer you as a psychologist. [INST] Patient: QUERY_GOES_HERE Psychologist: [/INST]
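To illustrate how a patient query slots into the prompt above, here is a minimal sketch. The template string is copied as it appears in this description (including its bracket tags, which may be incomplete); the helper name `build_prompt` and the sample query are hypothetical, and no model call is shown.

```python
# Prompt text copied verbatim from the dataset description; it is not
# verified against the exact string the authors sent to the model.
PROMPT_TEMPLATE = (
    "You are a psychologist speaking to a patient. The patient will speak to you "
    "and you will then answer their query. [/INST] Okay. Go ahead, patient. "
    "I will answer you as a psychologist. [INST] Patient: QUERY_GOES_HERE "
    "Psychologist: [/INST]"
)


def build_prompt(query):
    """Substitute the patient's query for the QUERY_GOES_HERE placeholder."""
    return PROMPT_TEMPLATE.replace("QUERY_GOES_HERE", query.strip())


# Hypothetical query used only to show the substitution.
prompt = build_prompt("I have trouble sleeping before exams.")
print(prompt)
```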
This data was generated for, and analysed within the following study:
Bird, J.J., Wright, D., Sumich, A., and Lotfi, A., 2024, June. Generative AI in Psychological Therapy: Perspectives on Computational Linguistics and Large Language Models in Written Behaviour Monitoring. In Proceedings of the 17th International Conference on PErvasive Technologies Related to Assistive Environments.
A novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘OSMI Mental Health In Tech Survey 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/osmihelp/osmi-2020-mental-health-in-tech-survey-results on 30 September 2021.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🧠 Mental Health Posts Dataset
This dataset is curated for mental health emotion classification tasks. It originates from the Counsel Chat Dataset available on Kaggle and has been preprocessed and restructured to suit NLP-based classification models.
📄 Overview
The dataset is designed to support the training and evaluation of models that classify user-generated mental health posts into one of the following categories:
depression, anxiety, suicidal, addiction… See the full description on the dataset page: https://huggingface.co/datasets/Noobie314/mental-health-posts-dataset.
Mental Disorders Symptom Dataset
This dataset is adapted from Basel Bakeer's Kaggle dataset, prepared for use in agentic LLM pipelines and co-occurrence symptom analysis.
Tags: mental health, symptoms, medical, psychiatry, healthcare, mental-health, tabular, csv, co-occurrence, ai-health
Features
- Binary labels for 20+ mental health symptoms
- Demographic info (age)
- Diagnosed disorder column
- Cleaned and ready for ML/NLP…
See the full description on the dataset page: https://huggingface.co/datasets/faisalsns/mental_disorder_symptoms.
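Co-occurrence analysis on binary symptom labels can be sketched as counting how often each pair of symptoms is flagged in the same record. The symptom names and rows below are hypothetical and do not reflect the dataset's actual columns.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: 1 means the symptom is present, 0 means absent.
rows = [
    {"insomnia": 1, "fatigue": 1, "panic": 0},
    {"insomnia": 1, "fatigue": 1, "panic": 1},
    {"insomnia": 0, "fatigue": 1, "panic": 0},
]


def cooccurrence(records):
    """Count, for every symptom pair, the records where both are present."""
    counts = Counter()
    for record in records:
        present = sorted(name for name, flag in record.items() if flag == 1)
        # Sorting gives each pair one canonical (alphabetical) ordering.
        counts.update(combinations(present, 2))
    return counts


pairs = cooccurrence(rows)
print(pairs.most_common())
```

For a table with many rows, the same counts can be obtained as the off-diagonal entries of the product of the binary matrix's transpose with itself.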
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by mainamay
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data sets are survey results collected by OSMI and are made available under the CC-BY-SA 4.0 license. One survey was performed in 2014 and the other in 2016. Both surveys seek to understand how people who work in technology view mental health issues and what support they receive from their employers. OSMI has made the data sets for both the 2014 survey and the 2016 survey available on Kaggle.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises mental health data from 1977 Bangladeshi university students across 15 top universities, collected from November to December 2023 using Google Forms. It includes assessments of academic anxiety, stress, and depression using widely used psychometric scales. The structured questionnaire covers sociodemographic variables and their associations, facilitating comprehensive analysis. Statistical analysis yielded satisfactory internal consistency (Cronbach’s alpha: 0.79), with anonymized participant data valuable for policymakers.
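For readers unfamiliar with the internal-consistency figure quoted above, Cronbach's alpha for a scale with k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score). The sketch below applies that standard formula to made-up responses; it is not the authors' analysis code, and the sample data is hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per respondent).

    Uses sample variances: alpha = k/(k-1) * (1 - sum(item var) / total var).
    """
    k = len(responses[0])
    item_variances = [variance(item) for item in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from five respondents to a three-item scale.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Values around 0.7 or higher are commonly treated as acceptable, which is the sense in which the reported 0.79 is described as satisfactory.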
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Anxiety and Depression Psychological Therapies’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mpwolke/cusersmarildownloadsanxietycsv on 28 January 2022.
--- Dataset description provided by original source is as follows ---
National Clinical Audit of Anxiety and Depression Psychological Therapies Spotlight Audit. Data collected between October 2018 and January 2019 and aggregated by mental health services delivering psychological therapies in secondary care.
Freedom of Information (FOI) requests : Dr Alan Quirk Alan.Quirk@rcpsych.ac.uk https://www.rcpsych.ac.uk/improving-care/ccqi/national-clinical-audits/national-clinical-audit-of-anxiety-and-depression
The Implications of COVID-19 for Mental Health. The COVID-19 pandemic and resulting economic downturn have negatively affected many people's mental health and created new barriers for people already suffering from mental illness and substance use disorders. The pandemic therefore affects not only infected people but the whole world, with repercussions that can persist beyond 2020.
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
This dataset explores the relationship between digital behavior and mental well-being among 100,000 individuals. It records how much time people spend on screens, how they use social media (including TikTok), and how these habits may influence their sleep, stress, and mood levels.
It includes six numerical features, all clean and ready for analysis, making it ideal for machine learning tasks like regression or classification. The data enables researchers and analysts to investigate how modern digital lifestyles may impact mental health indicators in measurable ways.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The objective of this study was to assess the association between pet ownership and the quality of life, loneliness, anxiety, stress, and mental health of Canadians during the confinement measures following the second wave of COVID-19 in Canada. This dataset contains the questionnaires (English and French), the data, and the R code used between April and November 2021.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This work focuses on the efficiency of knowledge distillation in producing a lightweight yet powerful BERT-based model for natural language processing (NLP) applications. After creating the model, LastBERT, we applied it to a real-world task: classifying severity levels of Attention Deficit Hyperactivity Disorder (ADHD)-related concerns from social media text. LastBERT, a customized student BERT model, reduces the parameter count from the 110 million of BERT base to 29 million, making it approximately 73.64% smaller. On the General Language Understanding Evaluation (GLUE) benchmark, which comprises paraphrase identification, sentiment analysis, and text classification, the student model maintained strong performance across many tasks despite this reduction. On a real-world ADHD dataset, the model achieved an accuracy of 85%, F1 score of 85%, precision of 85%, and recall of 85%. Compared with DistilBERT (66 million parameters) and ClinicalBERT (110 million parameters), LastBERT demonstrated comparable performance: DistilBERT slightly outperformed it at 87%, and ClinicalBERT achieved 86% across the same metrics. These findings highlight LastBERT's capacity to classify degrees of ADHD severity, offering a useful tool for mental health professionals to assess and understand user-generated content on social networking platforms. The study emphasizes the potential of knowledge distillation to produce effective models suited to resource-limited conditions, thereby advancing NLP and mental health diagnosis. The considerable decrease in model size without appreciable performance loss also lowers the computational resources needed for training and deployment, which broadens applicability, especially with readily available computational tools like Google Colab and Kaggle Notebooks.
This study shows the accessibility and usefulness of advanced NLP methods in practical real-world applications.
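The size-reduction figure quoted in this description can be checked with a few lines of arithmetic. The parameter counts are the rounded values given in the text; the helper name `percent_reduction` is hypothetical, and the comparison against DistilBERT is an extra calculation not stated in the source.

```python
def percent_reduction(original, reduced):
    """Percentage by which `reduced` is smaller than `original`."""
    return (original - reduced) / original * 100


# Rounded parameter counts as quoted in the description.
BERT_BASE = 110_000_000
DISTILBERT = 66_000_000
LASTBERT = 29_000_000

vs_bert_base = percent_reduction(BERT_BASE, LASTBERT)
vs_distilbert = percent_reduction(DISTILBERT, LASTBERT)

print(f"LastBERT vs BERT base: {vs_bert_base:.2f}% smaller")
print(f"LastBERT vs DistilBERT: {vs_distilbert:.2f}% smaller")
```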