A dataset containing basic conversations, mental health FAQ, classical therapy conversations, and general advice provided to people suffering from anxiety and depression.
This dataset can be used to train a model for a chatbot that can behave like a therapist in order to provide emotional support to people with anxiety & depression.
The dataset contains intents. An “intent” is the intention behind a user's message. For instance, if I were to say “I am sad” to the chatbot, the intent would be “sad”. Each intent has its own set of Patterns and Responses. Patterns are example user messages that align with the intent, while Responses are the replies the chatbot gives for that intent. Various intents are defined, and their patterns and responses serve as the model’s training data for identifying a particular intent.
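As a rough illustration of the intent structure described above, the sketch below flattens intents into training pairs and picks a reply for a predicted intent. The key names `tag`, `patterns` and `responses`, and the example messages, are assumptions made for illustration; the actual file may use different names.

```python
import random

# Hypothetical intents in the shape the description implies; the keys
# "tag", "patterns" and "responses" are assumed names, not confirmed ones.
INTENTS = [
    {
        "tag": "sad",
        "patterns": ["I am sad", "I feel down", "I am feeling lonely"],
        "responses": ["I'm sorry to hear that. Do you want to talk about it?"],
    },
    {
        "tag": "greeting",
        "patterns": ["Hi", "Hello", "Good morning"],
        "responses": ["Hello! How are you feeling today?"],
    },
]


def training_pairs(intents):
    """Flatten intents into (pattern, tag) pairs for an intent classifier."""
    return [(p, intent["tag"]) for intent in intents for p in intent["patterns"]]


def respond(tag, intents, rng=random):
    """Pick one of the responses registered for a predicted intent tag."""
    for intent in intents:
        if intent["tag"] == tag:
            return rng.choice(intent["responses"])
    raise KeyError(f"unknown intent: {tag}")


pairs = training_pairs(INTENTS)
```

A classifier would be trained on `pairs` to predict the tag, and `respond` would then supply the reply for whichever tag it predicts.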
sflagg/Kaggle-Mental-Health-Survey-Data dataset hosted on Hugging Face and contributed by the HF Datasets community
By Stephen Myers [source]
This dataset contains survey responses from individuals in the tech industry about their mental health, including questions about treatment, workplace resources, and attitudes towards discussing mental health in the workplace. Mental health issues affect people of all ages, genders and walks of life, and the tech industry, which places heavy demands on those who work in it, is no exception. By analyzing this dataset, we can better understand how prevalent mental health issues are among those who work in the tech sector, and what kinds of resources they rely upon to find help, so that more can be done to create a healthier working environment for all.
This dataset tracks key measures such as age, gender and country to determine overall prevalence, along with responses on employee access to care options; whether employers take mental health as seriously as physical illness; whether anonymity is protected when seeking help; and how coworkers may perceive those struggling with mental health conditions such as depression or anxiety. With technology advancing faster than ever and the landscape changing with it, these statistics are increasingly important to analyze if we hope to promote a healthy world inside and outside our office walls.
In this dataset you will find data on the age, gender, country, and state of survey respondents, in addition to numerous questions that assess an individual's mental state and work situation, including: self-employment status; family history of mental illness; treatment status and access or lack thereof; how their mental health condition affects their work; number of employees at the company they work for; remote work status; tech company status; employer benefits such as mental health benefits and wellness program availability; anonymity protection when seeking treatment resources for substance abuse or mental health issues; ease (or difficulty) of taking medical leave for a mental health condition; and whether discussing physical or medical matters with employers has negative consequences. You will also find comments from survey participants.
To use this dataset effectively:
- Clean the data by removing invalid responses, duplicates and missing values. Basic Pandas methods such as .dropna(), .drop_duplicates() and .replace() cover this.
- Use descriptive statistics such as the mean and median to draw general conclusions about patterns of responses, for example with .groupby() and .describe().
- Run analyses suited to your question, such as mean comparisons across groups (for example, age by gender) or correlations between features. Options include OLS models from statsmodels (its formula API is conventionally imported as smf), z-scores and hypothesis tests. Make sure you are aware of any underlying assumptions your analysis requires beforehand.
- Visualize your results with plotting libraries like Matplotlib or Seaborn to make the findings easy to interpret. Use boxplots, histograms or heatmaps where appropriate for your question.
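The cleaning and summary steps listed above might look roughly like the following. This is a minimal sketch on a tiny made-up frame: the column names `Age`, `Gender` and `treatment`, the sample values, and the 18 to 100 age bounds are all assumptions for illustration, not facts about the real file.

```python
import pandas as pd

# Tiny stand-in for the survey; column names and values are hypothetical.
raw = pd.DataFrame(
    {
        "Age": [29, 29, 41, -5, 35, 52],
        "Gender": ["M", "M", "female", "Female", "male", None],
        "treatment": ["Yes", "Yes", "No", "Yes", "No", "Yes"],
    }
)

clean = (
    raw.drop_duplicates()                      # remove exact duplicate rows
    .dropna(subset=["Gender"])                 # drop rows with missing gender
    .replace({"Gender": {"M": "Male", "male": "Male", "female": "Female"}})
)
clean = clean[clean["Age"].between(18, 100)]   # discard implausible ages (assumed bounds)

# Descriptive statistics per group.
age_by_gender = clean.groupby("Gender")["Age"].agg(["mean", "median", "count"])
treated_share = clean["treatment"].eq("Yes").groupby(clean["Gender"]).mean()
```

Normalizing free-text categories before grouping matters here: without the `.replace()` step, "M", "male" and "Male" would be counted as three separate groups.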
- Using the results of this survey, you could develop targeted outreach campaigns directed at underrepresented groups that answer “No” to questions about their employers providing resources for mental health or discussing it as part of wellness programs.
- Analyzing the employee characteristics (e.g., age and gender) of those who reported negative consequences from discussing their mental health in the workplace could inform employer policies to support individuals with mental health conditions and reduce stigma and discrimination in the workplace.
- Correlating responses to questions about remote work, leave policies, and anonymity with whether or not individuals have sought treatment for a mental health condition may provide insight into which types of workplace resources are most beneficial for supporting employees dealing with these issues.
If you use this dataset in your research, please credit the original authors.
License: Dataset copyright by authors - You are free to: - Share - copy and redi...
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
We collected over 10,087 posts from cancer patients and their caregivers on platforms like Reddit, Daily Strength, and the Health Board. The posts were related to five types of cancer: brain, colon, liver, leukemia, and lung cancer. Two team members scored each post based on the emotions expressed, on a scale from -2 to 1. Negative scores (-1 or -2) were given to posts showing grief or suffering, a positive score (1) to posts expressing happy emotions such as relief or accomplishment, and a score of 0 to posts with no emotion, which were considered neutral. This analysis aims to understand the emotional aspects of cancer patients' posts for a mental health study.
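To make the scoring scheme concrete, here is a minimal sketch that maps a score on the -2 to 1 scale to a coarse label. The label strings and the function name are illustrative choices, not part of the dataset.

```python
def sentiment_label(score: int) -> str:
    """Map an annotator score in {-2, -1, 0, 1} to a coarse label."""
    if score not in (-2, -1, 0, 1):
        raise ValueError(f"score outside the -2..1 scale: {score}")
    if score < 0:
        return "negative"   # grief or suffering
    if score > 0:
        return "positive"   # relief, accomplishment and similar
    return "neutral"        # no emotion expressed


labels = [sentiment_label(s) for s in (-2, -1, 0, 1)]
```

Collapsing -2 and -1 into one class discards the intensity information, so a model that needs it should keep the raw scores instead.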
Data Source: Substance Abuse and Mental Health Services Administration (SAMHSA) 2020, U.S. Department of Health and Human Services (HHS).
This is the dataset used for my first project for mental health analysis with Ann Bertram and Tiffany McBride at Purdue Fort Wayne. It has been cleaned and split into separate datasets by state. Each dataset includes demographic information such as age, education level, ethnicity, race, gender, mental illness flags, etc. For more information, please refer to the codebook.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset comprises both human expert and Large Language Model responses to queries about mental health.
Please note: patient context and psychologist responses found within this dataset are all collected from Kaggle, from the NLP Mental Health Conversations repository.
The additional "LLM" column within this dataset has been generated by the MISTRAL-7B instruct v0.2 model, via the prompt:
You are a psychologist speaking to a patient. The patient will speak to you and you will then answer their query. [/INST] Okay. Go ahead, patient. I will answer you as a psychologist. [INST] Patient: QUERY_GOES_HERE Psychologist: [/INST]
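A minimal sketch of how the quoted template could be filled in for each query follows. The template string is copied as it appears above, including its placeholder `QUERY_GOES_HERE`; any leading tokens that are not shown in the text are deliberately not added. The helper name `build_prompt` is hypothetical.

```python
# Template copied from the text above; QUERY_GOES_HERE is its placeholder.
PROMPT_TEMPLATE = (
    "You are a psychologist speaking to a patient. The patient will speak to "
    "you and you will then answer their query. [/INST] Okay. Go ahead, patient. "
    "I will answer you as a psychologist. [INST] Patient: QUERY_GOES_HERE "
    "Psychologist: [/INST]"
)


def build_prompt(query: str) -> str:
    """Substitute one patient query into the template (hypothetical helper)."""
    return PROMPT_TEMPLATE.replace("QUERY_GOES_HERE", query.strip())


example = build_prompt("  I can't sleep before exams.  ")
```

The resulting string would then be passed to the model once per row to produce the "LLM" column.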
This data was generated for, and analysed within the following study:
Bird, J.J., Wright, D., Sumich, A., and Lotfi, A., 2024, June. Generative AI in Psychological Therapy: Perspectives on Computational Linguistics and Large Language Models in Written Behaviour Monitoring. In Proceedings of the 17th International Conference on PErvasive Technologies Related to Assistive Environments.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
We collected text directly from the narratives of people who faced psychological problems and built this dataset from that text. The dataset has 6 columns: Age, Gender, Problem description, problem summary, problem category and problem psychological category.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Mental Health Patients 2021-2022 ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/meetnagadia/district-wise-mental-health-patients-20212022 on 13 February 2022.
--- Dataset description provided by original source is as follows ---
District-wise number of mental health patients in the year 2021-2022 in the state of Karnataka, India.
District-wise counts of mental health patients by category: severe mental illness, common mental disorder, alcohol and substance abuse, cases referred to higher centers, and suicide attempt cases.
Karnataka, Health and Family Welfare Department, Karnataka
Department of Health and Family Welfare
--- Original source retains full ownership of the source dataset ---
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
📌**Context**
The Healthcare Workforce Mental Health Dataset is designed to explore workplace mental health challenges in the healthcare industry, an environment known for high stress and burnout rates.
This dataset enables users to analyze key trends related to:
💠 Workplace Stressors: Examining the impact of heavy workloads, poor work environments, and emotional demands.
💠 Mental Health Outcomes: Understanding how stress and burnout influence job satisfaction, absenteeism, and turnover intention.
💠 Educational & Analytical Applications: A valuable resource for data analysts, students, and career changers looking to practice skills in data exploration and data visualization.
To help users gain deeper insights, this dataset is fully compatible with a Power BI Dashboard, available as part of a complete analytics bundle for enhanced visualization and reporting.
📌**Source**
This dataset was synthetically generated using the following methods:
💠 Python & Data Science Techniques: Probabilistic modeling to simulate realistic data distributions. Industry-informed variable relationships based on healthcare workforce studies.
💠 Guidance & Validation Using AI (ChatGPT): Assisted in refining dataset realism and logical mappings.
💠 Industry Research & Reports: Based on insights from WHO, CDC, OSHA, and academic studies on workplace stress and mental health in healthcare settings.
📌**Inspiration**
This dataset was inspired by ongoing discussions in healthcare regarding burnout, mental health, and staff retention. The goal is to bridge the gap between raw data and actionable insights by providing a structured, analyst-friendly dataset.
For those who want a ready-to-use reporting solution, a Power BI Dashboard Template is available, designed for interactive data exploration, workforce insights, and stress factor analysis.
📌**Important Note** This dataset is synthetic and intended for educational purposes only. It is not real-world employee data and should not be used for actual decision-making or policy implementation.
A novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘OSMI Mental Health In Tech Survey 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/osmihelp/osmi-2020-mental-health-in-tech-survey-results on 30 September 2021.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
This dataset is a collection of texts primarily focused on individuals experiencing anxiety, depression, and other mental health challenges. Its purpose is to facilitate understanding of language and sentiment related to mental health issues. The corpus can be applied to diverse tasks such as sentiment analysis, toxic language detection, and general mental health language analysis. The dataset is notably balanced, meaning it contains an equitable distribution of comments considered "poisonous" and those not.
The dataset is typically structured for distribution in a CSV file format. It contains a total of 27,972 unique records. The distribution of labels shows 14,139 records are classified with a label of '0' (not poisonous), and 13,838 records are classified with a label of '1' (poisonous), indicating its balanced nature.
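A quick balance check on the figures quoted above can be sketched as follows. Note that the two label counts sum to 27,977 rather than the 27,972 records stated, so it is worth recomputing these numbers from the actual file. The dictionary keys are just the label values as strings.

```python
# Label counts as quoted in the description above.
quoted_counts = {"0": 14_139, "1": 13_838}

total = sum(quoted_counts.values())
shares = {label: n / total for label, n in quoted_counts.items()}

# Largest deviation of any class from a perfect 50/50 split.
imbalance = max(abs(share - 0.5) for share in shares.values())
```

With these counts each class stays within about one percentage point of an even split, which supports the "balanced" description even though the stated total does not match.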
This dataset is an ideal resource for developing and refining machine learning models for sentiment analysis, particularly within mental health contexts. It is also highly suitable for creating toxic language detection systems and for conducting linguistic research aimed at understanding patterns in mental health discourse.
The geographic scope of this dataset is global. It encompasses a wide range of text comments associated with mental health conditions such as anxiety and depression. The provided sources do not specify a particular time range for the data or specific demographic availability beyond the nature of the comments themselves.
CC-BY
The dataset is especially beneficial for researchers studying mental health language, mental health professionals seeking insights into online discourse, and developers creating AI models for content moderation, sentiment analysis tools, or support applications related to mental well-being.
Original Data Source: Mental Health Corpus
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🧠 Mental Health Posts Dataset
This dataset is curated for mental health emotion classification tasks. It originates from the Counsel Chat Dataset available on Kaggle and has been preprocessed and restructured to suit NLP-based classification models.
📄 Overview
The dataset is designed to support the training and evaluation of models that classify user-generated mental health posts into one of the following categories:
depression anxiety suicidal addiction… See the full description on the dataset page: https://huggingface.co/datasets/Noobie314/mental-health-posts-dataset.
Mental Disorders Symptom Dataset
This dataset is adapted from Basel Bakeer's Kaggle dataset, prepared for use in agentic LLM pipelines and co-occurrence symptom analysis. { "dataset_name": { "tags": ["mental health", "symptoms", "medical", "psychiatry"] } } tags:
healthcare mental-health tabular csv co-occurrence ai-health
Features
- Binary labels for 20+ mental health symptoms
- Demographic info (age)
- Diagnosed disorder column
- Cleaned and ready for ML/NLP…

See the full description on the dataset page: https://huggingface.co/datasets/faisalsns/mental_disorder_symptoms.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides uncleaned Twitter data, specifically filtered for English content, designed for mental health classification at the Tweet-level. It serves as a valuable resource for developing and evaluating models that identify mental health indicators from social media text. The dataset includes raw tweet text and associated user metrics. Additionally, it can be used to explore and apply data cleaning and feature extraction techniques, such as Topic Modelling Features using Latent Dirichlet Allocation (LDA) to summarise tweets into top k topics, and Emoji Sentiment Features to count positive, negative, and neutral expression emojis present in tweets.
The data files are typically provided in CSV format and are in an uncleaned state. While a specific total number of rows or records is not explicitly stated, the dataset contains approximately 19,102 unique post IDs and 19,488 unique user IDs. Further details on the distribution of specific metrics like followers, friends, favourites, statuses, and retweets are available within the dataset's meta-information, showing various ranges and their corresponding counts.
This dataset is ideal for:
* Developing and testing mental health classification models using social media data.
* Practising and demonstrating Natural Language Processing (NLP) techniques, including text analysis and feature engineering.
* Exploring and applying data cleaning methodologies on raw social media text.
* Implementing and evaluating Topic Modelling using algorithms like LDA.
* Conducting sentiment analysis based on emoji usage in tweets.
* Research in social media analytics, public health, and digital epidemiology.
The dataset's coverage is global, with tweets specifically filtered to contain English context only. There is no specific time range for the collection period of the tweets provided, but the dataset was listed on 05/06/2025.
CC0
This dataset is suitable for:
* Data scientists and machine learning engineers working on text classification and NLP projects.
* Researchers in mental health, social sciences, and computational linguistics.
* Students and academics learning about social media data analysis, feature engineering, and model development for health applications.
* Healthcare professionals interested in leveraging social media for insights into mental wellness trends.
Original Data Source: Depression: Twitter Dataset + Feature Extraction
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data is in uncleaned format and was collected using the Twitter API. The Tweets have been filtered to keep only English content. The dataset targets mental health classification of the user at Tweet-level. Also check out the notebooks I have provided, which demonstrate Data Cleaning and Feature Extraction techniques on the given dataset:
- Topic Modelling Features using LDA (Latent Dirichlet Allocation), i.e. summarizing a tweet into one of the top k topics
- Emoji Sentiment Features, i.e. counts of the Positive, Negative and Neutral expression emojis present in the tweet
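The emoji sentiment feature described above could be computed along these lines. This is only a sketch: the five-entry lexicon is a toy assumption, and it is not the mapping used in the author's notebooks, which is not shown here.

```python
# Toy emoji lexicon; the real notebooks may use a much larger mapping.
EMOJI_SENTIMENT = {
    "😊": "positive",
    "😍": "positive",
    "😢": "negative",
    "😡": "negative",
    "😐": "neutral",
}


def emoji_sentiment_counts(tweet: str) -> dict:
    """Count positive, negative and neutral emojis in one tweet."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for char in tweet:
        label = EMOJI_SENTIMENT.get(char)
        if label is not None:
            counts[label] += 1
    return counts


features = emoji_sentiment_counts("so tired 😢😢 but ok 😊")
```

Iterating character by character only works for single-code-point emojis like those above; emojis built from several code points would need a dedicated tokenizer.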
Original Data Source: Depression: Twitter Dataset + Feature Extraction
https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
This dataset contains a compilation of carefully crafted Q&A pairs designed to support AI-based, tailored help for mental health. These questions and answers offer an avenue for those looking for help to get the assistance they need. With these pre-processed conversations, Artificial Intelligence (AI) solutions can be developed and deployed to better understand individual needs and respond appropriately to user input. The dataset is crafted by experts in the mental health field and provides content that will further research in this growing area. These data points will be valuable for developing the next generation of personalized AI-based mental health chatbots capable of understanding what people need.
This dataset contains pre-processed Q&A pairs for AI-based tailored support for mental health. As such, it represents an excellent starting point in building a conversational model which can handle conversations about mental health issues. Here are some tips on how to use this dataset to its fullest potential:
Understand your data: Spend time getting to know the text of the conversation between the user and the chatbot and familiarize yourself with what type of questions and answers are included in this specific dataset. This will help you better formulate queries for your own conversational model or develop new ones you can add yourself.
Refine your language processing models: By studying the patterns in syntax, grammar, tone, voice, etc. within this conversational dataset, you can hone natural language processing capabilities such as keyword extraction or entity extraction before building them into a larger bot system.
Test assumptions: Have an idea of what may work best with a particular audience or context? Check whether these assumptions hold by applying different variations of text to this dataset before rolling out changes across other channels or programs that use AI/chatbot services.
Research and analyze results: After testing different scenarios on real-world users with the Q&A pairs in this dataset, record and analyze any results that help you understand how users respond to tailored text conversations about mental health topics, both passively and actively. The more information you collect, the closer you get to AI-powered conversations that produce the outcomes you want for your users.
- Developing a chatbot for personalized mental health advice and guidance tailored to individuals' unique needs, experiences, and struggles.
- Creating an AI-driven diagnostic system that can interpret mental health conversations and provide targeted recommendations for interventions or treatments based on clinical expertise.
- Designing an AI-powered recommendation engine to suggest relevant content such as articles, videos, or podcasts based on users’ questions or topics of discussion during their conversation with the chatbot.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv

| Column name | Description |
|:------------|:------------------------------------------------------------------------|
| text | The text of the conversation between the user and the chatbot. (String) |
If you use this dataset in your research, please credit Huggingface Hub.
This dataset provides insights into patient and caregiver experiences with various psychiatric medications. It features unstructured text reviews alongside categorical ratings and demographic information. The primary aim is to capture real-world feedback on drug effectiveness, side effects, and overall satisfaction. The dataset currently includes over 61,000 reviews for hundreds of drugs used to treat conditions such as depression, anxiety, bipolar disorder, and schizophrenia. Future updates are planned to include more recent reviews.
- drug_name: The name of the psychiatric medication.
- date: The date the review was submitted.
- age: The age range of the reviewer (e.g., "45-54", "35-44", "13-18").
- gender: The gender of the reviewer.
- time_on_drug: The duration the reviewer has been taking the medication (e.g., "1 to less than 2 years", "less than 1 month").
- reviewer_type: Indicates whether the reviewer is a "Patient" or a "Caregiver".
- condition: The medical condition for which the drug was prescribed (e.g., "Posttraumatic Stress Syndrome", "Depression", "Panic Disorder").
- rating_overall: The overall rating given by the reviewer, typically on a scale of 1 to 5.
- rating_effectiveness: The reviewer's rating of the drug's effectiveness.
- rating_ease_of_use: The reviewer's rating for how easy the drug was to use.
- rating_satisfaction: The reviewer's rating of their satisfaction with the drug.
- text: The detailed, unstructured text review provided by the patient or caregiver, describing their experiences.

The dataset is typically provided as a data file, often in CSV format. It contains over 61,000 individual reviews. The exact number of rows or records for a specific sample may vary, but the overall dataset is substantial. Each record is structured with distinct columns as detailed above, allowing for both quantitative and qualitative analysis.
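As a small illustration of the quantitative side, the sketch below averages `rating_overall` per drug for one condition. The column names come from the list above; the three sample rows, the drug names `drug_a` and `drug_b`, and the helper name are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Made-up rows using a subset of the documented columns.
SAMPLE = """drug_name,condition,reviewer_type,rating_overall,text
drug_a,Depression,Patient,4,Helped after a few weeks.
drug_a,Depression,Caregiver,2,Side effects were hard.
drug_b,Panic Disorder,Patient,5,Worked quickly.
"""


def mean_rating_by_drug(rows, condition):
    """Average rating_overall per drug, restricted to one condition."""
    totals = defaultdict(lambda: [0.0, 0])  # drug -> [sum of ratings, count]
    for row in rows:
        if row["condition"] == condition:
            entry = totals[row["drug_name"]]
            entry[0] += float(row["rating_overall"])
            entry[1] += 1
    return {drug: total / count for drug, (total, count) in totals.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
depression_means = mean_rating_by_drug(rows, "Depression")
```

On the real file, `io.StringIO(SAMPLE)` would be replaced by an opened CSV file, and rows with missing or non-numeric ratings would need handling first.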
This dataset is ideal for a variety of applications focusing on real-world drug outcomes and patient sentiment. It can be used for:
* Natural Language Processing (NLP): Training models for sentiment analysis, topic modelling, and entity recognition on healthcare-related text.
* Social Science Research: Studying patient perceptions, drug adherence, and the psychosocial impact of psychiatric medications.
* Healthcare Analytics: Identifying trends in drug effectiveness and side effects across different demographics and conditions.
* Pharmaceutical Research: Understanding patient feedback to inform drug development and post-market surveillance.
* Machine Learning: Developing predictive models for drug response or side effect occurrence based on review data.
The dataset's coverage is global, collecting reviews from diverse geographical regions. The time range for the reviews is ongoing, with updates expanding the dataset to include recent submissions to WebMD. Demographically, it includes feedback from patients and caregivers across various age groups and genders. The primary focus for conditions includes depression, anxiety (including anxiety with depression), bipolar disorder, and schizophrenia, with potential for expansion to other psychiatric disorders in future versions.
CC-BY-NC
This dataset is suitable for:
* Academic Researchers: For studies on pharmacovigilance, mental health, and patient-reported outcomes.
* Data Scientists and Analysts: To build and refine models for text analysis and predictive analytics in the healthcare domain.
* Healthcare Providers: To gain a broader understanding of patient experiences beyond clinical trials.
* AI and LLM Developers: For training and fine-tuning language models on domain-specific healthcare text.
* Pharmaceutical Companies: For market research, competitor analysis, and identifying unmet patient needs.
Original Data Source: WebMD Reviews for Psychiatric Drugs
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data sets are survey results collected by OSMI and are made available under the CC-BY-SA 4.0 license. One survey was performed in 2014 and the other in 2016. Both surveys seek to understand how people who work in technology view mental health issues and what support they receive from their employer. OSMI has made the data sets for both the 2014 survey and the 2016 survey available on Kaggle.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises mental health data from 1,977 Bangladeshi university students across 15 top universities, collected from November to December 2023 using Google Forms. It includes assessments of academic anxiety, stress, and depression using widely used psychometric scales. The structured questionnaire covers sociodemographic variables and their associations, facilitating comprehensive analysis. Statistical analysis yielded satisfactory internal consistency (Cronbach’s alpha: 0.79). The participant data is anonymized and may be valuable for policymakers.