Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset appears to contain a variety of features related to text analysis, sentiment analysis, and psychological indicators, likely derived from posts or other text data. The features include readability indices such as the Automated Readability Index (ARI), Coleman-Liau Index, and Flesch-Kincaid Grade Level, as well as sentiment scores (compound, negative, neutral, and positive). There are also features related to psychological factors such as economic stress, isolation, substance use, and domestic stress. The dataset seems to cover a wide range of linguistic, psychological, and behavioural attributes, and is potentially suitable for analyzing mental health-related topics in online communities or text data.
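As an illustration of how two of the readability features named above are usually computed, here is a minimal sketch using the standard published formulas for ARI and the Coleman-Liau Index. The function names and the regex-based tokenization are assumptions made for this example; the dataset's own extraction pipeline is not documented here and may count characters, words, and sentences differently.

```python
import re


def text_counts(text):
    """Return (characters, words, sentences) using simple regex heuristics.

    Assumption: a word is a run of ASCII letters or digits, and a sentence
    ends at one or more '.', '!' or '?' characters.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    chars = sum(len(w) for w in words)
    return chars, len(words), len(sentences)


def automated_readability_index(text):
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    chars, words, sentences = text_counts(text)
    if words == 0 or sentences == 0:
        raise ValueError("text must contain at least one word and one sentence")
    return 4.71 * chars / words + 0.5 * words / sentences - 21.43


def coleman_liau_index(text):
    """CLI = 0.0588 * L - 0.296 * S - 15.8.

    L is letters per 100 words and S is sentences per 100 words.
    Digits are counted as letters here, which is a simplification.
    """
    chars, words, sentences = text_counts(text)
    if words == 0 or sentences == 0:
        raise ValueError("text must contain at least one word and one sentence")
    letters_per_100 = 100.0 * chars / words
    sentences_per_100 = 100.0 * sentences / words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8
```

Very short texts give negative or unstable scores with these formulas, so values computed on single posts should be read with caution.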
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The PHQ-9 Student Depression Dataset contains responses from 500 students to the PHQ-9 questionnaire, a well-established tool for diagnosing depression. This dataset is designed to support the development of machine learning models aimed at automated depression detection by analyzing text responses to common depression-related questions.
The PHQ-9 questionnaire includes 9 questions that assess symptoms of depression over the past two weeks, covering areas like mood, energy levels, sleep, appetite, and thoughts of self-harm. Each response is scored on a scale from 0 (Not at all) to 3 (Nearly every day), so the total score ranges from 0 to 27. Based on this score, depression severity is classified into one of the following categories: Minimal (0-4), Mild (5-9), Moderate (10-14), Moderately Severe (15-19), Severe (20-27).
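The scoring rule described above can be sketched in a few lines of Python. The function name `phq9_severity` and the list-of-integers input format are hypothetical choices for this example; the dataset's actual column layout is not specified here.

```python
# Severity bands as (upper bound inclusive, label), taken from the description above.
PHQ9_BANDS = [
    (4, "Minimal"),
    (9, "Mild"),
    (14, "Moderate"),
    (19, "Moderately Severe"),
    (27, "Severe"),
]


def phq9_severity(item_scores):
    """Return (total_score, severity_label) for nine PHQ-9 item scores.

    Each item must be an integer from 0 (Not at all) to 3 (Nearly every day).
    """
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item scores")
    if any(not isinstance(s, int) or s < 0 or s > 3 for s in item_scores):
        raise ValueError("each item score must be an integer from 0 to 3")

    total = sum(item_scores)
    for upper_bound, label in PHQ9_BANDS:
        if total <= upper_bound:
            return total, label
    # Unreachable for valid input, since the maximum total is 27.
    raise AssertionError("total score out of range")
```

For example, `phq9_severity([2, 2, 2, 2, 2, 0, 0, 0, 0])` sums to 10 and falls in the Moderate band.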
This dataset is primarily designed for building models that can assist in automated depression detection. Some potential use cases include:
Sentiment Analysis: Analyzing emotional tones in text responses to assess depression.
Text Classification: Classifying responses into different depression severity levels.
Predictive Modeling: Predicting depression severity based on textual responses.
Feature Engineering: Extracting linguistic features (e.g., sentiment, keywords) to predict depression.
The dataset is diverse, with synthetic responses across different levels of depression, providing a versatile foundation for machine learning applications.
While the dataset does not contain personally identifiable information (PII), real-world applications should follow ethical guidelines regarding privacy, consent, and mental health resources. When working with real data or applying this dataset in clinical research, it is essential to adhere to ethical standards, including:
Data Privacy: Anonymizing personal information.
Informed Consent: Ensuring participants give consent before data collection.
Support Resources: Providing support for individuals who may exhibit serious mental health concerns.
Applications:
Clinical Research: This dataset is valuable for studying depression detection using natural language processing and machine learning techniques.
AI in Healthcare: It can be used in the development of tools for automated mental health assessment.
Education: Training students or professionals in recognizing depression symptoms and analyzing responses.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This work examines how effectively knowledge distillation can produce a lightweight yet capable BERT-based model for natural language processing (NLP) applications. After building the model, LastBERT, we applied it to a real-world task: classifying the severity of Attention Deficit Hyperactivity Disorder (ADHD)-related concerns from social media text. LastBERT, a customized student BERT model, reduces the parameter count from the 110 million of BERT base to 29 million, making it approximately 73.64% smaller. On the General Language Understanding Evaluation (GLUE) benchmark, which includes paraphrase identification, sentiment analysis, and text classification, the student model maintained strong performance across many tasks despite this reduction. On a real-world ADHD dataset, the model achieved an accuracy of 85%, F1 score of 85%, precision of 85%, and recall of 85%. Compared with DistilBERT (66 million parameters) and ClinicalBERT (110 million parameters), LastBERT performed comparably: DistilBERT slightly outperformed it at 87%, and ClinicalBERT achieved 86% across the same metrics. These findings indicate that LastBERT can classify degrees of ADHD severity accurately, making it a useful tool for mental health professionals who assess user-generated content on social networking platforms. The study highlights the potential of knowledge distillation to produce effective models suited to resource-limited settings, advancing both NLP and mental health diagnosis. The considerable reduction in model size without appreciable performance loss also lowers the computational resources needed for training and deployment, which broadens applicability, especially on readily available platforms such as Google Colab and Kaggle Notebooks.
Overall, the study shows that advanced NLP methods are accessible and useful in practical, real-world applications.
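The abstract does not spell out the distillation objective, so the following is only a minimal sketch of the commonly used soft-target formulation: a temperature-softened KL term between teacher and student outputs, blended with the ordinary cross-entropy on the true label. The `temperature` and `alpha` values and all function names are assumptions for illustration, not the settings used for LastBERT. The last helper reproduces the size-reduction arithmetic quoted above.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend of a soft-target KL term and a hard-label cross-entropy term.

    temperature and alpha are illustrative defaults, not values from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Standard cross-entropy of the student against the true label (T = 1).
    hard = -math.log(softmax(student_logits)[true_label])
    # The T^2 factor keeps the soft-term gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


def size_reduction_percent(teacher_params, student_params):
    """Percentage by which the student is smaller than the teacher."""
    return 100.0 * (teacher_params - student_params) / teacher_params
```

With the figures from the abstract, `size_reduction_percent(110e6, 29e6)` rounds to 73.64, matching the reported reduction.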