100+ datasets found

Student Health Dataset
kaggle.com
Updated May 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nikhil chahar (2023). Student Health Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/5633096
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/5633096
Dataset updated
May 8, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
nikhil chahar
Description
Student health data was collected from all over the university via survey, there are 18 classes in the dataset.

Data from: THE RELEVANCY OF MASSIVE HEALTH EDUCATION IN THE BRAZILIAN PRISON...

zenodo.org

csv, pdf

Updated Jul 16, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

Janaína L. R. da S. Valentim; Janaína L. R. da S. Valentim; Sara Dias-Trindade; Sara Dias-Trindade; Eloiza da S. G. Oliveira; Eloiza da S. G. Oliveira; José A. M. Moreira; José A. M. Moreira; Felipe Fernandes; Felipe Fernandes; Manoel Honorio Romão; Manoel Honorio Romão; Philippi S. G. de Morais; Philippi S. G. de Morais; Alexandre R. Caitano; Alexandre R. Caitano; Aline P. Dias; Aline P. Dias; Carlos A. P. Oliveira; Carlos A. P. Oliveira; Karilany D. Coutinho; Karilany D. Coutinho; Ricardo B. Ceccim; Ricardo B. Ceccim; Ricardo A. de M. Valentim; Ricardo A. de M. Valentim (2024). THE RELEVANCY OF MASSIVE HEALTH EDUCATION IN THE BRAZILIAN PRISON SYSTEM: THE COURSE "HEALTH CARE FOR PEOPLE DEPRIVED OF FREEDOM" AND ITS IMPACTS [Dataset]. http://doi.org/10.5281/zenodo.6499752

Explore at:

csv, pdfAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.6499752

Dataset updated

Jul 16, 2024

Dataset provided by

Zenodohttp://zenodo.org/

Authors

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Dataset name: asppl_dataset_v2.csv

Version: 2.0

Dataset period: 06/07/2018 - 01/14/2022

Dataset Characteristics: Multivalued

Number of Instances: 8118

Number of Attributes: 9

Missing Values: Yes

Area(s): Health and education

Sources:

Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);
Brazilian Occupational Classification (CBO) (Brasil, 2022b);
National Registry of Health Establishments (CNES) (Brasil, 2022c);
Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).

Description: The data contained in the asppl_dataset_v2.csv dataset (see Table 1) originates from participants of the technology-based educational course “Health Care for People Deprived of Freedom.” The course is available on the AVASUS (Brasil, 2022a). This dataset provides elementary data for analyzing the course’s impact and reach and the profile of its participants. In addition, it brings an update of the data presented in work by Valentim et al. (2021).

Table 1: Description of AVASUS dataset features.

Attributes	Description	datatype	Value
gender	Gender of the course participant.	Categorical.	Feminino / Masculino / Não Informado. (In English, Female, Male or Uninformed)
course_progress	Percentage of completion of the course.	Numerical.	Range from 0 to 100.
course_evaluation	A score given to the course by the participant.	Numerical.	0, 1, 2, 3, 4, 5 or NaN.
evaluation_commentary	Comment made by the participant about the course.	Categorical.	Free text or NaN.
region	Brazilian region in which the participant resides.	Categorical.	Brazilian region according to IBGE: Norte, Nordeste, Centro-Oeste, Sudeste or Sul (In English North, Northeast, Midwest, Southeast or South).
CNES	The CNES code refers to the health establishment where the participant works.	Numerical.	CNES Code or NaN.
health_care_level	Identification of the health care network level for which the course participant works.	Categorical.	“ATENCAO PRIMARIA”, “MEDIA COMPLEXIDADE”, “ALTA COMPLEXIDADE”, and their possible combinations. (In English "PRIMARY HEALTH CARE", "SECONDARY HEALTH CARE" AND "TERTIARY HEALTH CARE")
year_enrollment	Year in which the course participant registered.	Numerical.	Year (YYYY).
CBO	Participant occupation.	Categorical.	Text coded according to the Brazilian Classification of Occupations or “Indivíduo sem afiliação formal.” (In English “Individual without formal affiliation.”)

Dataset name: prison_syphilis_and_population_brazil.csv

Dataset period: 2017 - 2020

Dataset Characteristics: Multivalued

Number of Instances: 6

Number of Attributes: 13

Missing Values: No

Source:

National Penitentiary Department (DEPEN) (Brasil, 2022d);

Description: The data contained in the prison_syphilis_and_population_brazil.csv dataset (see Table 2) originate from the National Penitentiary Department Information System (SISDEPEN) (Brasil, 2022d). This dataset provides data on the population and prevalence of syphilis in the Brazilian prison system. In addition, it brings a rate that represents the normalized data for purposes of comparison between the populations of each region and Brazil.

Table 2: Description of DEPEN dataset Features.

Attributes	Description	datatype	Value
Region	Brazilian region in which the participant resides. In addition, the sum of the regions, which refers to Brazil.	Categorical.	Brazil and Brazilian region according to IBGE: North, Northeast, Midwest, Southeast or South.
syphilis_2017	Number of syphilis cases in the prison system in 2017.	Numerical.	Number of syphilis cases.
syphilis_rate_2017	Normalized rate of syphilis cases in 2017.	Numerical.	Syphilis case rate.
syphilis_2018	Number of syphilis cases in the prison system in 2018.	Numerical.	Number of syphilis cases.
syphilis_rate_2018	Normalized rate of syphilis cases in 2018.	Numerical.	Syphilis case rate.
syphilis_2019	Number of syphilis cases in the prison system in 2019.	Numerical.	Number of syphilis cases.
syphilis_rate_2019	Normalized rate of syphilis cases in 2019.	Numerical.	Syphilis case rate.
syphilis_2020	Number of syphilis cases in the prison system in 2020.	Numerical.	Number of syphilis cases.
syphilis_rate_2020	Normalized rate of syphilis cases in 2020.	Numerical.	Syphilis case rate.
pop_2017	Prison population in 2017.	Numerical.	Population number.
pop_2018	Prison population in 2018.	Numerical.	Population number.
pop_2019	Prison population in 2019.	Numerical.	Population number.
pop_2020	Prison population in 2020.	Numerical.	Population number.

Dataset name: students_cumulative_sum.csv

Dataset period: 2018 - 2020

Dataset Characteristics: Multivalued

Number of Instances: 6

Number of Attributes: 7

Missing Values: No

Source:

Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);
Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).

Description: The data contained in the students_cumulative_sum.csv dataset (see Table 3) originate mainly from AVASUS (Brasil, 2022a). This dataset provides data on the number of students by region and year. In addition, it brings a rate that represents the normalized data for purposes of comparison between the populations of each region and Brazil. We used population data estimated by the IBGE (Brasil, 2022e) to calculate the rate.

Table 3: Description of Students dataset Features.

f
Synthetic Dataset of Emergency Healthcare Services
figshare.com
csv
Updated Dec 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Marco Ferreira (2024). Synthetic Dataset of Emergency Healthcare Services [Dataset]. http://doi.org/10.6084/m9.figshare.28012784.v1
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.28012784.v1
Dataset updated
Dec 12, 2024
Dataset provided by
figshare
Authors
Marco Ferreira
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was generated using Simio simulation software. The simulations model patient flow in healthcare settings, capturing key metrics such as queue times, length of stay (LOS) for patients, and nurse utilization rates. Each CSV file contains time-series data, with measured variables including patient waiting times, resource utilization percentages, and service durations.## File Overview**CheckBloodPressure.csv** - (9 KB): Contains blood pressure Server records of patients.**CheckPatientType.csv** - (19 KB): Identifies the type of each patient (e.g., 1 or 3).**Fill_Information.csv** - (2 KB): Fill information records for new patients.**MedicalRecord1.csv** - (10 KB): Medical record dataset for patient type 1.**MedicalRecord2.csv** - (4 KB): Medical record dataset for patient type 2.**MedicalRecord3.csv** - (2 KB): Medical record dataset for patient type 3.**MedicalRecord4.csv** - (13 KB): Medical record dataset for patient type 4.**OutPatientDepartment.csv** - (18 KB): Data related to the satisfaction and length of stay of an given patient.**Triage.csv** - (13 KB): Data related to the triage process.**README.txt** - (4 KB): Documentation of the dataset, including structure, metadata, and usage.## Common Fields Across Files**Patient ID** (Integer): Unique identifier for each patient.**Patient Type** (Integer): Classification of patient (e.g., 1, 4).**Medical Records Arrival Time** (DateTime): Timestamp of the patient's first arrival in the medical record department.**Exiting Time** (DateTime): Timestamp when the patient exits a Server.**Waiting Time (min)** (Real): Total waiting time before being attended to.**Resource Used** (String): Resource (e.g., Operator) allocated to the patient.**Utilization %** (Real): Utilization rate of the resource as a percentage.**Queue Count Before Processing** (Integer): Number of patients in the queue before processing begins.**Queue Count After Processing** (Integer): Number of patients in the queue after processing ends.**Queue Difference** (Integer): Difference between the before and after queue counts.**Length of Stay (min)** (Real): Total time spent in the simulation by the patient.**LOS without Queues (min)** (Real): Length of stay excluding any queuing time.**Satisfaction %** (Real): Patient satisfaction rating based on their experience.**New Patient?** (String): Indicates if this is a new patient or a returning one.
c
Mental Health - Datasets - CTData.org
data.ctdata.org
Updated Jun 24, 2016
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2016). Mental Health - Datasets - CTData.org [Dataset]. http://data.ctdata.org/dataset/mental-health
Explore at:
Dataset updated
Jun 24, 2016
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Mental Health reports the prevalence of the mental illness in the past year by age range.
Medical_cost_dataset
kaggle.com
Updated Aug 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nandita Pore (2023). Medical_cost_dataset [Dataset]. https://www.kaggle.com/datasets/nanditapore/medical-cost-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 19, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Nandita Pore
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Description:

Explore the intricacies of medical costs and healthcare expenses with our meticulously curated Medical Cost Dataset. This dataset offers valuable insights into the factors influencing medical charges, enabling researchers, analysts, and healthcare professionals to gain a deeper understanding of the dynamics within the healthcare industry.

Columns: 1. ID: A unique identifier assigned to each individual record, facilitating efficient data management and analysis. 2. Age: The age of the patient, providing a crucial demographic factor that often correlates with medical expenses. 3. Sex: The gender of the patient, offering insights into potential cost variations based on biological differences. 4. BMI: The Body Mass Index (BMI) of the patient, indicating the relative weight status and its potential impact on healthcare costs. 5. Children: The number of children or dependents covered under the medical insurance, influencing family-related medical expenses. 6. Smoker: A binary indicator of whether the patient is a smoker or not, as smoking habits can significantly impact healthcare costs. 7. Region: The geographic region of the patient, helping to understand regional disparities in healthcare expenditure. 8. Charges: The medical charges incurred by the patient, serving as the target variable for analysis and predictions.

Whether you're aiming to uncover patterns in medical billing, predict future healthcare costs, or explore the relationships between different variables and charges, our Medical Cost Dataset provides a robust foundation for your research. Researchers can utilize this dataset to develop data-driven models that enhance the efficiency of healthcare resource allocation, insurers can refine pricing strategies, and policymakers can make informed decisions to improve the overall healthcare system.

Unlock the potential of healthcare data with our comprehensive Medical Cost Dataset. Gain insights, make informed decisions, and contribute to the advancement of healthcare economics and policy. Start your analysis today and pave the way for a healthier future.
p
AI-Driven Mental Health Literacy - An Interventional Study from India...
psycharchives.org
Updated Oct 2, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2023). AI-Driven Mental Health Literacy - An Interventional Study from India (Codebook for the data).csv [Dataset]. https://psycharchives.org/handle/20.500.12034/8771
Explore at:
Dataset updated
Oct 2, 2023
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Area covered
India
Description
The dataset is from an Indian study which made use of ChatGPT- a natural language processing model by OpenAI to design a mental health literacy intervention for college students. Prompt engineering tactics were used to formulate prompts that acted as anchors in the conversations with the AI agent regarding mental health. An intervention lasting for 20 days was designed with sessions of 15-20 minutes on alternative days. Fifty-one students completed pre-test and post-test measures of mental health literacy, mental help-seeking attitude, stigma, mental health self-efficacy, positive and negative experiences, and flourishing in the main study, which were then analyzed using paired t-tests. The results suggest that the intervention is effective among college students as statistically significant changes were noted in mental health literacy and mental health self-efficacy scores. The study affirms the practicality, acceptance, and initial indications of AI-driven methods in advancing mental health literacy and suggests the promising prospects of innovative platforms such as ChatGPT within the field of applied positive psychology.: Codebook for the Dataset provided
Z
A dataset of anonymised hospitalised COVID-19 patient data: outcomes,...
data.niaid.nih.gov
zenodo.org
Updated Jun 29, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stopard, Isaac J (2022). A dataset of anonymised hospitalised COVID-19 patient data: outcomes, demographics and biomarker measurements for two New York hospitals [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6771833
Explore at:
Dataset updated
Jun 29, 2022
Dataset provided by
Momeni-Boroujeni
Zuretti, Alejandro
Mendoza, Rachelle
Lambert, Ben
Stopard, Isaac J
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
New York
Description
These datasets are for a cohort of n=1540 anonymised hospitalised COVID-19 patients, and the data provide information on outcomes (i.e. patient death or discharge), demographics and biomarker measurements for two New York hospitals: State University of New York (SUNY) Downstate Health Sciences University and Maimonides Medical Center.

The file "demographics_both_hospitals.csv" contains the ultimate outcomes of hospitalisation (whether a patient was discharged or died), demographic information and known comorbidities for each of the patients.

The file "dynamics_clean_both_hospitals.csv" contains cleaned dynamic biomarker measurements for the n=1233 patients where this information was available and the data passed our various checks (see https://doi.org/10.1101/2021.11.12.21266248 for information of these checks and the cleaning process). Patients can be matched to demographic data via the "id" column.

Study approval and data collection

Study approval was obtained from the State University of New York (SUNY) Downstate Health Sciences University Institutional Review Board (IRB#1595271-1) and Maimonides Medical Center Institutional Review Board/Research Committee (IRB#2020-05-07). A retrospective query was performed among the patients who were admitted to SUNY Downstate Medical Center and Maimonides Medical Center with COVID-19-related symptoms, which was subsequently confirmed by RT PCR, from the beginning of February 2020 until the end of May 2020. Stratified randomization was used to select at least 500 patients who were discharged and 500 patients who died due to the complications of COVID-19. Patient outcome was recorded as a binary choice of “discharged” versus “COVID-19 related mortality”. Patients whose outcome was unknown were excluded. Demographic, clinical history and laboratory data was extracted from the hospital’s electronic health records.
A
‘Health-Response-Score Datasets’ analyzed by Analyst-2
analyst-2.ai
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Health-Response-Score Datasets’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-health-response-score-datasets-d431/8b47586f/?iid=004-686&v=presentation
Explore at:
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Health-Response-Score Datasets’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kartikgo/healthresponsescore-datasets on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Health - Care

Health is very first base of every creature. It doesn't matter if it is human, Animal, Trees or crops. But in todays hectic life we have forgotten about this basic thing. Here is the dataset of various employees and their response + score and question that we asked. This Dataset is regarding the Data gathered from various employees and there daily routine + their body checkups + other related information.

Dataset -

hra_qna_scores.csv : This csv file contains Question Id and related Score. hra_responses.csv : This csv file contains response of the user, when response is stored and how related question Id and user Id. users_data.csv : Here This csv file contains user data and when the user account is created. sponsor_data.csv : This csv file is contains the sponsors.

Acknowledgements

A Very Big Thank to all the users that took time to fill all the details.

Inspiration

EDA. Prediction of different disease for users. Creativity of the data scientist is welcomed. Al the best. Happy learning!

--- Original source retains full ownership of the source dataset ---
m
EHR Dataset for Patient Treatment Classification
data.mendeley.com
paperswithcode.com
Updated May 10, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mujiono Sadikin (2020). EHR Dataset for Patient Treatment Classification [Dataset]. http://doi.org/10.17632/7kv3rctx7m.1
Explore at:
Unique identifier
https://doi.org/10.17632/7kv3rctx7m.1
Dataset updated
May 10, 2020
Authors
Mujiono Sadikin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset is Electronic Health Record Predicting collected from a private Hospital in Indonesia. It contains the patients laboratory test results used to determine next patient treatment whether in care or out care patient. The task embedded to the dataset is classification prediction.
Synthetic Healthcare Database for Research (SyH-DR)
catalog.data.gov
healthdata.gov
+2more
Updated Sep 16, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agency for Healthcare Research and Quality (2023). Synthetic Healthcare Database for Research (SyH-DR) [Dataset]. https://catalog.data.gov/dataset/synthetic-healthcare-database-for-research-syh-dr
Explore at:
Dataset updated
Sep 16, 2023
Dataset provided by
Agency for Healthcare Research and Qualityhttp://www.ahrq.gov/
Description
The Agency for Healthcare Research and Quality (AHRQ) created SyH-DR from eligibility and claims files for Medicare, Medicaid, and commercial insurance plans in calendar year 2016. SyH-DR contains data from a nationally representative sample of insured individuals for the 2016 calendar year. SyH-DR uses synthetic data elements at the claim level to resemble the marginal distribution of the original data elements. SyH-DR person-level data elements are not synthetic, but identifying information is aggregated or masked.
G
Health Trends, Comprehensive download file for all geographies
open.canada.ca
ouvert.canada.ca
csv
Updated Mar 9, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statistics Canada (2022). Health Trends, Comprehensive download file for all geographies [Dataset]. https://open.canada.ca/data/en/dataset/3ef254aa-519b-47d6-96ec-f0ba2e72e1dd
Explore at:
csvAvailable download formats
Dataset updated
Mar 9, 2022
Dataset provided by
Statistics Canada
License
Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Description
This product presents comparable time-series data for a range of health indicators from a number of sources including the Canadian Community Health Survey, Vital Statistics, and Canadian Cancer Registry.
Mental Health Care in the Last 4 Weeks
catalog.data.gov
data.virginia.gov
+2more
Updated Apr 23, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Centers for Disease Control and Prevention (2025). Mental Health Care in the Last 4 Weeks [Dataset]. https://catalog.data.gov/dataset/mental-health-care-in-the-last-4-weeks
Explore at:
Dataset updated
Apr 23, 2025
Dataset provided by
Centers for Disease Control and Preventionhttp://www.cdc.gov/
Description
The U.S. Census Bureau, in collaboration with five federal agencies, launched the Household Pulse Survey to produce data on the social and economic impacts of Covid-19 on American households. The Household Pulse Survey was designed to gauge the impact of the pandemic on employment status, consumer spending, food security, housing, education disruptions, and dimensions of physical and mental wellness. The survey was designed to meet the goal of accurate and timely weekly estimates. It was conducted by an internet questionnaire, with invitations to participate sent by email and text message. The sample frame is the Census Bureau Master Address File Data. Housing units linked to one or more email addresses or cell phone numbers were randomly selected to participate, and one respondent from each housing unit was selected to respond for him or herself. Estimates are weighted to adjust for nonresponse and to match Census Bureau estimates of the population by age, gender, race and ethnicity, and educational attainment. All estimates shown meet the NCHS Data Presentation Standards for Proportions.
Dataset for MH Journal article.csv
figshare.com
txt
Updated Nov 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sandrine Ingabire (2023). Dataset for MH Journal article.csv [Dataset]. http://doi.org/10.6084/m9.figshare.24550777.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24550777.v1
Dataset updated
Nov 13, 2023
Dataset provided by
figshare
Authors
Sandrine Ingabire
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this paper, I have assessed the perceptions and attitudes of college students towards formal mental health support including campus-based mental healthcare services in Rwanda.
MHS Dashboard Adult Demographic Datasets
data.chhs.ca.gov
data.ca.gov
+3more
csv, zip
Updated Aug 28, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Department of Health Care Services (2024). MHS Dashboard Adult Demographic Datasets [Dataset]. https://data.chhs.ca.gov/dataset/adult-ab470-datasets
Explore at:
csv(512596), csv(42734), csv(54553345), csv(3515490), csv(11278), csv(451612), csv(1916691), csv(1135935), csv(45828311), csv(197911), csv(22469442), csv(1637878), csv(1699554), csv(142267), csv(499193), csv(297746), csv(402564), csv(41051287), csv(30400), csv(1254239), zipAvailable download formats
Dataset updated
Aug 28, 2024
Dataset provided by
California Department of Health Care Serviceshttp://www.dhcs.ca.gov/
Authors
Department of Health Care Services
Description
The following datasets are based on the adult (age 21 and over) beneficiary population and consist of aggregate MHS data derived from Medi-Cal claims, encounter, and eligibility systems. These datasets were developed in accordance with California Welfare and Institutions Code (WIC) § 14707.5 (added as part of Assembly Bill 470 on 10/7/17). Please contact BHData@dhcs.ca.gov for any questions or to request previous years’ versions of these datasets. Note: The Performance Dashboard AB 470 Report Application Excel tool development has been discontinued. Please see the Behavioral Health reporting data hub at https://behavioralhealth-data.dhcs.ca.gov/ for access to dashboards utilizing these datasets and other behavioral health data.
Health Care Analytics
kaggle.com
Updated Jan 10, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abishek Sudarshan (2022). Health Care Analytics [Dataset]. https://www.kaggle.com/datasets/abisheksudarshan/health-care-analytics
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 10, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Abishek Sudarshan
Description
Context

Part of Janatahack Hackathon in Analytics Vidhya

Content

The healthcare sector has long been an early adopter of and benefited greatly from technological advances. These days, machine learning plays a key role in many health-related realms, including the development of new medical procedures, the handling of patient data, health camps and records, and the treatment of chronic diseases.

MedCamp organizes health camps in several cities with low work life balance. They reach out to working people and ask them to register for these health camps. For those who attend, MedCamp provides them facility to undergo health checks or increase awareness by visiting various stalls (depending on the format of camp).

MedCamp has conducted 65 such events over a period of 4 years and they see a high drop off between “Registration” and number of people taking tests at the Camps. In last 4 years, they have stored data of ~110,000 registrations they have done.

One of the huge costs in arranging these camps is the amount of inventory you need to carry. If you carry more than required inventory, you incur unnecessarily high costs. On the other hand, if you carry less than required inventory for conducting these medical checks, people end up having bad experience.

The Process:

MedCamp employees / volunteers reach out to people and drive registrations. During the camp, People who “ShowUp” either undergo the medical tests or visit stalls depending on the format of health camp.

Other things to note:

Since this is a completely voluntary activity for the working professionals, MedCamp usually has little profile information about these people. For a few camps, there was hardware failure, so some information about date and time of registration is lost. MedCamp runs 3 formats of these camps. The first and second format provides people with an instantaneous health score. The third format provides information about several health issues through various awareness stalls.

Favorable outcome:

For the first 2 formats, a favourable outcome is defined as getting a health_score, while in the third format it is defined as visiting at least a stall. You need to predict the chances (probability) of having a favourable outcome.

Train / Test split:

Camps started on or before 31st March 2006 are considered in Train Test data is for all camps conducted on or after 1st April 2006.

Acknowledgements

Credits to AV

Inspiration

To share with the data science community to jump start their journey in Healthcare Analytics
Data from: Associations between environmental quality and adult asthma...
catalog.data.gov
s.cnmilf.com
Updated Nov 12, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Associations between environmental quality and adult asthma prevalence in medical claims data [Dataset]. https://catalog.data.gov/dataset/associations-between-environmental-quality-and-adult-asthma-prevalence-in-medical-claims-d
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
The MarketScan health claims database is a compilation of nearly 110 million patient records with information from more than 100 private insurance carriers and large self-insuring companies. Public forms of insurance (i.e., Medicare and Medicaid) are not included, nor are small (< 100 employees) or medium (1000 employees). We excluded the relatively few (n=6735) individuals over 65 years of age because Medicare is the primary insurance of U.S. adults over 65. The EQI was constructed for 2000-2005 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain while the overall EQI was constructed from a second PCA from these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015). This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Gray, C., D. Lobdell, K. Rappazzo, Y. Jian, J. Jagai, L. Messer, A. Patel, S. Deflorio-Barker, C. Lyttle, J. Solway, and A. Rzhetsky. Associations between environmental quality and adult asthma prevalence in medical claims data. ENVIRONMENTAL RESEARCH. Elsevier B.V., Amsterdam, NETHERLANDS, 166: 529-536, (2018).
R
Synthetic Dataset of Emergency Healthcare Services
datarepositorium.uminho.pt
zenodo.org
csv, txt
Updated Jan 17, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Repositório de Dados da Universidade do Minho (2025). Synthetic Dataset of Emergency Healthcare Services [Dataset]. http://doi.org/10.34622/datarepositorium/AKSZQG
Explore at:
csv(1259), txt(4064)Available download formats
Unique identifier
https://doi.org/10.34622/datarepositorium/AKSZQG
Dataset updated
Jan 17, 2025
Dataset provided by
Repositório de Dados da Universidade do Minho
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Synthetic dataset of emergency services comprised of several CSV files that we have generated using a simulation software. This dataset is open for public use; please cite our work if used in research or applications. File Overview CheckBloodPressure.csv** - (9 KB): Contains blood pressure Server records of patients. CheckPatientType.csv** - (19 KB): Identifies the type of each patient (e.g., 1 or 3). Fill_Information.csv - (2 KB): Fill information records for new patients. MedicalRecord1.csv - (10 KB): Medical record dataset for patient type 1. MedicalRecord2.csv - (4 KB): Medical record dataset for patient type 2. MedicalRecord3.csv - (2 KB): Medical record dataset for patient type 3. MedicalRecord4.csv - (13 KB): Medical record dataset for patient type 4. OutPatientDepartment.csv - (18 KB): Data related to the satisfaction and length of stay of an given patient. Triage.csv - (13 KB): Data related to the triage process. README.txt - (4 KB): Documentation of the dataset, including structure, metadata, and usage. Common Fields Across Files Patient ID (Integer): Unique identifier for each patient. Patient Type (Integer): Classification of patient (e.g., 1, 4). Medical Records Arrival Time (DateTime): Timestamp of the patient's first arrival in the medical record department. Exiting Time (DateTime): Timestamp when the patient exits a Server. Waiting Time (min) (Real): Total waiting time before being attended to. Resource Used (String): Resource (e.g., Operator) allocated to the patient. Utilization % (Real): Utilization rate of the resource as a percentage. Queue Count Before Processing (Integer): Number of patients in the queue before processing begins. Queue Count After Processing (Integer): Number of patients in the queue after processing ends. Queue Difference (Integer): Difference between the before and after queue counts. Length of Stay (min) (Real): Total time spent in the simulation by the patient. LOS without Queues (min) (Real): Length of stay excluding any queuing time. Satisfaction % (Real): Patient satisfaction rating based on their experience. New Patient? (String): Indicates if this is a new patient or a returning one.
AI medical chatbot
kaggle.com
Updated Aug 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yousef Saeedian (2024). AI medical chatbot [Dataset]. https://www.kaggle.com/datasets/yousefsaeedian/ai-medical-chatbot
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 15, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Yousef Saeedian
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Description:

This dataset comprises transcriptions of conversations between doctors and patients, providing valuable insights into the dynamics of medical consultations. It includes a wide range of interactions, covering various medical conditions, patient concerns, and treatment discussions. The data is structured to capture both the questions and concerns raised by patients, as well as the medical advice, diagnoses, and explanations provided by doctors.

Key Features:

Doctor and Patient Roles: Each conversation is annotated with the role of the speaker (doctor or patient), making it easy to analyze communication patterns.

Medical Context: The dataset includes diverse scenarios, from routine check-ups to more complex medical discussions, offering a broad spectrum of healthcare dialogues.

Natural Language: The conversations are presented in natural language, allowing for the development and testing of NLP models focused on healthcare communication.

Applications: This dataset can be used for various applications, such as building dialogue systems, analyzing communication efficacy, developing medical NLP models, and enhancing patient care through better understanding of doctor-patient interactions.

Potential Use Cases:

NLP Model Training: Train models to understand and generate medical dialogues.

Healthcare Communication Studies: Analyze communication strategies between doctors and patients to improve healthcare delivery.

Medical Chatbots: Develop intelligent medical chatbots that can simulate doctor-patient conversations.

Patient Experience Enhancement: Identify common patient concerns and doctor responses to enhance patient care strategies.

This dataset is a valuable resource for researchers, data scientists, and healthcare professionals interested in the intersection of technology and medicine, aiming to improve healthcare communication through data-driven approaches.
Z
Data from: OpenChart-SE: A corpus of artificial Swedish electronic health...
data.niaid.nih.gov
zenodo.org
Updated Jul 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Berg, Johanna (2024). OpenChart-SE: A corpus of artificial Swedish electronic health records for imagined emergency care patients written by physicians in a crowd-sourcing project [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7499830
Explore at:
Dataset updated
Jul 15, 2024
Dataset provided by
Appelgren Thorell, Björn
Aasa, Carl Ollvik
Berg, Johanna
Aits, Sonja
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Sweden
Description
Electronic health records (EHRs) are a rich source of information for medical research and public health monitoring. Information systems based on EHR data could also assist in patient care and hospital management. However, much of the data in EHRs is in the form of unstructured text, which is difficult to process for analysis. Natural language processing (NLP), a form of artificial intelligence, has the potential to enable automatic extraction of information from EHRs and several NLP tools adapted to the style of clinical writing have been developed for English and other major languages. In contrast, the development of NLP tools for less widely spoken languages such as Swedish has lagged behind. A major bottleneck in the development of NLP tools is the restricted access to EHRs due to legitimate patient privacy concerns. To overcome this issue we have generated a citizen science platform for collecting artificial Swedish EHRs with the help of Swedish physicians and medical students. These artificial EHRs describe imagined but plausible emergency care patients in a style that closely resembles EHRs used in emergency departments in Sweden. In the pilot phase, we collected a first batch of 50 artificial EHRs, which has passed review by an experienced Swedish emergency care physician. We make this dataset publicly available as OpenChart-SE corpus (version 1) under an open-source license for the NLP research community. The project is now open for general participation and Swedish physicians and medical students are invited to submit EHRs on the project website (https://github.com/Aitslab/openchart-se), where additional batches of quality-controlled EHRs will be released periodically.

Dataset content

OpenChart-SE, version 1 corpus (txt files and and dataset.csv)

The OpenChart-SE corpus, version 1, contains 50 artificial EHRs (note that the numbering starts with 5 as 1-4 were test cases that were not suitable for publication). The EHRs are available in two formats, structured as a .csv file and as separate textfiles for annotation. Note that flaws in the data were not cleaned up so that it simulates what could be encountered when working with data from different EHR systems. All charts have been checked for medical validity by a resident in Emergency Medicine at a Swedish hospital before publication.

Codebook.xlsx

The codebook contain information about each variable used. It is in XLSForm-format, which can be re-used in several different applications for data collection.

suppl_data_1_openchart-se_form.pdf

OpenChart-SE mock emergency care EHR form.

suppl_data_3_openchart-se_dataexploration.ipynb

This jupyter notebook contains the code and results from the analysis of the OpenChart-SE corpus.

More details about the project and information on the upcoming preprint accompanying the dataset can be found on the project website (https://github.com/Aitslab/openchart-se).
The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases,...
zenodo.org
bin, csv, zip
Updated Jan 5, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mauro Nievas Offidani; Mauro Nievas Offidani; Claudio Delrieux; Claudio Delrieux (2024). The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases, Labeled Images and Captions from Open Access PMC Articles [Dataset]. http://doi.org/10.5281/zenodo.10079370
Explore at:
zip, bin, csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10079370
Dataset updated
Jan 5, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Mauro Nievas Offidani; Mauro Nievas Offidani; Claudio Delrieux; Claudio Delrieux
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains multi-modal data from over 75,000 open access and de-identified case reports, including metadata, clinical cases, image captions and more than 130,000 images. Images and clinical cases belong to different medical specialties, such as oncology, cardiology, surgery and pathology. The structure of the dataset allows to easily map images with their corresponding article metadata, clinical case, captions and image labels. Details of the data structure can be found in the file data_dictionary.csv.

Almost 100,000 patients and almost 400,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.

Refer to the examples showcased in this GitHub repository to understand how to optimize the use of this dataset.

For a detailed insight about the contents of this dataset, please refer to this data article published in Data In Brief.

Facebook

Twitter

Click to copy link

Link copied

Cite

nikhil chahar (2023). Student Health Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/5633096

Student Health Dataset

Physical health data of college students.

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Unique identifier

https://doi.org/10.34740/kaggle/dsv/5633096

Dataset updated

May 8, 2023

Dataset provided by

Kagglehttp://kaggle.com/

Authors

nikhil chahar

Description

Student health data was collected from all over the university via survey, there are 18 classes in the dataset.

Clear search

Close search

Google apps

Main menu

Student Health Dataset

Data from: THE RELEVANCY OF MASSIVE HEALTH EDUCATION IN THE BRAZILIAN PRISON...

Synthetic Dataset of Emergency Healthcare Services

Mental Health - Datasets - CTData.org

Medical_cost_dataset

Description:

AI-Driven Mental Health Literacy - An Interventional Study from India...

A dataset of anonymised hospitalised COVID-19 patient data: outcomes,...

‘Health-Response-Score Datasets’ analyzed by Analyst-2

Health - Care

Dataset -

Acknowledgements

Inspiration

EHR Dataset for Patient Treatment Classification

Synthetic Healthcare Database for Research (SyH-DR)

Health Trends, Comprehensive download file for all geographies

Mental Health Care in the Last 4 Weeks

Dataset for MH Journal article.csv

MHS Dashboard Adult Demographic Datasets

Health Care Analytics

Context

Content

Acknowledgements

Inspiration

Data from: Associations between environmental quality and adult asthma...

Synthetic Dataset of Emergency Healthcare Services

AI medical chatbot

Data from: OpenChart-SE: A corpus of artificial Swedish electronic health...

The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases,...

Student Health Dataset

Physical health data of college students.