Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was generated using Simio simulation software. The simulations model patient flow in healthcare settings, capturing key metrics such as queue times, patient length of stay (LOS), and nurse utilization rates. Each CSV file contains time-series data, with measured variables including patient waiting times, resource utilization percentages, and service durations.

## File Overview
**CheckBloodPressure.csv** (9 KB): Contains blood pressure Server records of patients.
**CheckPatientType.csv** (19 KB): Identifies the type of each patient (e.g., 1 or 3).
**Fill_Information.csv** (2 KB): Fill information records for new patients.
**MedicalRecord1.csv** (10 KB): Medical record dataset for patient type 1.
**MedicalRecord2.csv** (4 KB): Medical record dataset for patient type 2.
**MedicalRecord3.csv** (2 KB): Medical record dataset for patient type 3.
**MedicalRecord4.csv** (13 KB): Medical record dataset for patient type 4.
**OutPatientDepartment.csv** (18 KB): Data related to the satisfaction and length of stay of a given patient.
**Triage.csv** (13 KB): Data related to the triage process.
**README.txt** (4 KB): Documentation of the dataset, including structure, metadata, and usage.

## Common Fields Across Files
**Patient ID** (Integer): Unique identifier for each patient.
**Patient Type** (Integer): Classification of the patient (e.g., 1, 4).
**Medical Records Arrival Time** (DateTime): Timestamp of the patient's first arrival in the medical records department.
**Exiting Time** (DateTime): Timestamp when the patient exits a Server.
**Waiting Time (min)** (Real): Total waiting time before being attended to.
**Resource Used** (String): Resource (e.g., Operator) allocated to the patient.
**Utilization %** (Real): Utilization rate of the resource as a percentage.
**Queue Count Before Processing** (Integer): Number of patients in the queue before processing begins.
**Queue Count After Processing** (Integer): Number of patients in the queue after processing ends.
**Queue Difference** (Integer): Difference between the before and after queue counts.
**Length of Stay (min)** (Real): Total time the patient spends in the simulation.
**LOS without Queues (min)** (Real): Length of stay excluding any queuing time.
**Satisfaction %** (Real): Patient satisfaction rating based on their experience.
**New Patient?** (String): Indicates whether this is a new or returning patient.
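The derived fields above follow from the raw counts and times. A minimal sketch of those relationships, using invented example values (how the simulation itself derives these fields is an assumption):

```python
# Sketch: reproducing the derived queue and LOS fields described above.
# The record below is an invented illustration, not a row from the actual CSVs.

def queue_difference(before: int, after: int) -> int:
    """Queue Difference = Queue Count Before Processing - Queue Count After Processing."""
    return before - after

def los_without_queues(length_of_stay_min: float, waiting_time_min: float) -> float:
    """LOS without Queues = Length of Stay minus time spent queuing."""
    return length_of_stay_min - waiting_time_min

record = {
    "Patient ID": 17,
    "Queue Count Before Processing": 5,
    "Queue Count After Processing": 3,
    "Length of Stay (min)": 42.0,
    "Waiting Time (min)": 12.5,
}

print(queue_difference(record["Queue Count Before Processing"],
                       record["Queue Count After Processing"]))   # 2
print(los_without_queues(record["Length of Stay (min)"],
                         record["Waiting Time (min)"]))           # 29.5
```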
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset will help you put your existing knowledge to good use. It has 132 parameters from which 42 different diseases can be predicted. The dataset consists of 2 CSV files: one for training and one for testing your model. Each CSV file has 133 columns: 132 of these columns are symptoms that a person experiences, and the last column is the prognosis. These symptoms are mapped to the 42 diseases you can classify from these sets of symptoms. You are required to train your model on the training data and test it on the testing data.
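The described 133-column layout (132 symptom columns plus a final prognosis column) can be split into features and labels with the standard library. The symptom names and rows below are invented for illustration, since only the column counts are given:

```python
import csv, io

# Sketch of the expected file layout: symptom columns followed by "prognosis".
# The header and rows here are invented; the real files have 132 symptom columns.
sample = io.StringIO(
    "itching,skin_rash,fatigue,prognosis\n"
    "1,1,0,Fungal infection\n"
    "0,0,1,Malaria\n"
)

reader = csv.reader(sample)
header = next(reader)
rows = list(reader)

# Split each row into a binary feature vector (all but the last column)
# and its disease label (the last column).
X = [[int(v) for v in row[:-1]] for row in rows]
y = [row[-1] for row in rows]

print(X)  # [[1, 1, 0], [0, 0, 1]]
print(y)  # ['Fungal infection', 'Malaria']
```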
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains multi-modal data from over 75,000 open-access and de-identified case reports, including metadata, clinical cases, image captions and more than 130,000 images. Images and clinical cases belong to different medical specialties, such as oncology, cardiology, surgery and pathology. The structure of the dataset makes it easy to map images to their corresponding article metadata, clinical case, captions and image labels. Details of the data structure can be found in the file data_dictionary.csv.
Almost 100,000 patients and almost 400,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.
Refer to the examples showcased in this GitHub repository to understand how to optimize the use of this dataset.
For a detailed insight about the contents of this dataset, please refer to this data article published in Data In Brief.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The heart attack datasets were collected at Zheen hospital in Erbil, Iraq, from January 2019 to May 2019. The attributes of this dataset are: age, gender, heart rate, systolic blood pressure, diastolic blood pressure, blood sugar, CK-MB and troponin, with a negative or positive output. According to the provided information, the dataset classifies each record as heart attack or no heart attack. The gender column is normalized: male is set to 1 and female to 0. The glucose column is set to 1 if blood sugar is > 120; otherwise, 0. As for the output, positive is set to 1 and negative to 0.
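The normalization rules above can be captured in a small encoding helper; the function name and its inputs are illustrative, not taken from the dataset files:

```python
# Sketch of the encoding described above: gender (male=1, female=0),
# glucose flag (1 if blood sugar > 120, else 0), output (positive=1, negative=0).

def encode_record(gender: str, blood_sugar: float, output: str) -> dict:
    return {
        "gender": 1 if gender.lower() == "male" else 0,
        "glucose_flag": 1 if blood_sugar > 120 else 0,
        "output": 1 if output.lower() == "positive" else 0,
    }

print(encode_record("Male", 150.0, "negative"))   # {'gender': 1, 'glucose_flag': 1, 'output': 0}
print(encode_record("Female", 98.0, "positive"))  # {'gender': 0, 'glucose_flag': 0, 'output': 1}
```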
Overview
This dataset of medical misinformation was collected and is published by Kempelen Institute of Intelligent Technologies (KInIT). It consists of approx. 317k news articles and blog posts on medical topics published between January 1, 1998 and February 1, 2022 from a total of 207 reliable and unreliable sources. The dataset contains full-texts of the articles, their original source URL and other extracted metadata. If a source has a credibility score available (e.g., from Media Bias/Fact Check), it is also included in the form of annotation. Besides the articles, the dataset contains around 3.5k fact-checks and extracted verified medical claims with their unified veracity ratings published by fact-checking organisations such as Snopes or FullFact. Lastly and most importantly, the dataset contains 573 manually and more than 51k automatically labelled mappings between previously verified claims and the articles; mappings consist of two values: claim presence (i.e., whether a claim is contained in the given article) and article stance (i.e., whether the given article supports or rejects the claim or provides both sides of the argument).
The dataset is primarily intended to be used as a training and evaluation set for machine learning methods for claim presence detection and article stance classification, but it enables a range of other misinformation related tasks, such as misinformation characterisation or analyses of misinformation spreading.
Its novelty and our main contributions lie in (1) the focus on medical news articles and blog posts as opposed to social media posts or political discussions; (2) providing multiple modalities (besides full-texts of the articles, there are also images and videos), thus enabling research of multimodal approaches; (3) the mapping of the articles to the fact-checked claims (with manual as well as predicted labels); (4) providing source credibility labels for 95% of all articles and other potential sources of weak labels that can be mined from the articles' content and metadata.
The dataset is associated with the research paper "Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims" accepted and presented at ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22).
The accompanying Github repository provides a small static sample of the dataset and the dataset's descriptive analysis in a form of Jupyter notebooks.
Options to access the dataset
There are two ways to access the dataset:
1. Static dump of the dataset available in the CSV format
2. Continuously updated dataset available via REST API
To obtain access to the dataset (either the full static dump or the REST API), please request access by following the instructions provided below.
References
If you use this dataset in any publication, project, tool or in any other form, please, cite the following papers:
@inproceedings{SrbaMonantPlatform,
author = {Srba, Ivan and Moro, Robert and Simko, Jakub and Sevcech, Jakub and Chuda, Daniela and Navrat, Pavol and Bielikova, Maria},
booktitle = {Proceedings of Workshop on Reducing Online Misinformation Exposure (ROME 2019)},
pages = {1--7},
title = {Monant: Universal and Extensible Platform for Monitoring, Detection and Mitigation of Antisocial Behavior},
year = {2019}
}
@inproceedings{SrbaMonantMedicalDataset,
author = {Srba, Ivan and Pecher, Branislav and Tomlein, Matus and Moro, Robert and Stefancova, Elena and Simko, Jakub and Bielikova, Maria},
booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22)},
numpages = {11},
title = {Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims},
year = {2022},
doi = {10.1145/3477495.3531726},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3477495.3531726},
}
Dataset creation process
In order to create this dataset (and to continuously obtain new data), we used our research platform Monant. The Monant platform provides so-called data providers to extract news articles/blogs from news/blog sites as well as fact-checking articles from fact-checking sites. General parsers (for RSS feeds, Wordpress sites, Google Fact Check Tool, etc.) as well as custom crawlers and parsers were implemented (e.g., for the fact-checking site Snopes.com). All data are stored in a unified format in a central data storage.
Ethical considerations
The dataset was collected and is published for research purposes only. We collected only publicly available content of news/blog articles. The dataset contains identities of authors of the articles if they were stated in the original source; we left this information, since the presence of an author's name can be a strong credibility indicator. However, we anonymised the identities of the authors of discussion posts included in the dataset.
The main identified ethical issue related to the presented dataset lies in the risk of mislabelling of an article as supporting a false fact-checked claim and, to a lesser extent, in mislabelling an article as not containing a false claim or not supporting it when it actually does. To minimise these risks, we developed a labelling methodology and require an agreement of at least two independent annotators to assign a claim presence or article stance label to an article. It is also worth noting that we do not label an article as a whole as false or true. Nevertheless, we provide partial article-claim pair veracities based on the combination of claim presence and article stance labels.
As to the veracity labels of the fact-checked claims and the credibility (reliability) labels of the articles' sources, we take these from the fact-checking sites and external listings such as Media Bias/Fact Check as they are and refer to their methodologies for more details on how they were established.
Lastly, the dataset also contains automatically predicted labels of claim presence and article stance using our baselines described in the next section. These methods have their limitations and work with certain accuracy as reported in this paper. This should be taken into account when interpreting them.
Reporting mistakes in the dataset
The way to report considerable mistakes in raw collected data or in manual annotations is by creating a new issue in the accompanying Github repository. Alternatively, general enquiries or requests can be sent to info [at] kinit.sk.
Dataset structure
Raw data
First, the dataset contains so-called raw data (i.e., data extracted by the Web monitoring module of the Monant platform and stored in exactly the same form in which they appear on the original websites). Raw data consist of articles from news sites and blogs (e.g., naturalnews.com), discussions attached to such articles, and fact-checking articles from fact-checking portals (e.g., snopes.com). In addition, the dataset contains feedback (number of likes, shares, comments) provided by users on the social network Facebook, which is regularly extracted for all news/blog articles.
Raw data are contained in these CSV files (and corresponding REST API endpoints):
Note: Personal information about discussion posts' authors (name, website, gravatar) is anonymised.
Annotations
Secondly, the dataset contains so-called annotations. Entity annotations describe individual raw-data entities (e.g., article, source). Relation annotations describe a relation between two such entities.
Each annotation is described by the following attributes:
At the same time, annotations are associated with a particular object identified by:
entity_type (in case of entity annotations), or source_entity_type and target_entity_type (in case of relation annotations). Possible values: sources, articles, fact-checking-articles.
entity_id (in case of entity annotations), or source_entity_id and target_entity_id (in case of relation annotations).
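Under the identification scheme above, the object(s) an annotation refers to could be resolved as follows. The field names and allowed entity types come from the description; the example annotation records themselves are invented:

```python
# Sketch: resolving which object(s) an annotation points to.

VALID_ENTITY_TYPES = {"sources", "articles", "fact-checking-articles"}

def annotated_objects(annotation: dict) -> list:
    """Return (entity_type, entity_id) pairs the annotation refers to."""
    if "entity_type" in annotation:  # entity annotation: one target object
        pairs = [(annotation["entity_type"], annotation["entity_id"])]
    else:  # relation annotation: a source object and a target object
        pairs = [
            (annotation["source_entity_type"], annotation["source_entity_id"]),
            (annotation["target_entity_type"], annotation["target_entity_id"]),
        ]
    assert all(t in VALID_ENTITY_TYPES for t, _ in pairs)
    return pairs

entity_ann = {"entity_type": "articles", "entity_id": 42}
relation_ann = {
    "source_entity_type": "articles", "source_entity_id": 42,
    "target_entity_type": "fact-checking-articles", "target_entity_id": 7,
}
print(annotated_objects(entity_ann))    # [('articles', 42)]
print(annotated_objects(relation_ann))  # [('articles', 42), ('fact-checking-articles', 7)]
```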
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset appears to contain a variety of features related to text analysis, sentiment analysis, and psychological indicators, likely derived from posts or text data. Some features include readability indices such as Automated Readability Index (ARI), Coleman Liau Index, and Flesch-Kincaid Grade Level, as well as sentiment analysis scores like sentiment compound, negative, neutral, and positive scores. Additionally, there are features related to psychological aspects such as economic stress, isolation, substance use, and domestic stress. The dataset seems to cover a wide range of linguistic, psychological, and behavioural attributes, potentially suitable for analyzing mental health-related topics in online communities or text data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mental Health reports the prevalence of mental illness in the past year by age range.
The Agency for Healthcare Research and Quality (AHRQ) created SyH-DR from eligibility and claims files for Medicare, Medicaid, and commercial insurance plans in calendar year 2016. SyH-DR contains data from a nationally representative sample of insured individuals for the 2016 calendar year. SyH-DR uses synthetic data elements at the claim level to resemble the marginal distribution of the original data elements. SyH-DR person-level data elements are not synthetic, but identifying information is aggregated or masked.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset name: asppl_dataset_v2.csv
Version: 2.0
Dataset period: 06/07/2018 - 01/14/2022
Dataset Characteristics: Multivalued
Number of Instances: 8118
Number of Attributes: 9
Missing Values: Yes
Area(s): Health and education
Sources:
Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);
Brazilian Occupational Classification (CBO) (Brasil, 2022b);
National Registry of Health Establishments (CNES) (Brasil, 2022c);
Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).
Description: The data contained in the asppl_dataset_v2.csv dataset (see Table 1) originate from participants of the technology-based educational course “Health Care for People Deprived of Freedom.” The course is available on AVASUS (Brasil, 2022a). This dataset provides elementary data for analyzing the course’s impact and reach and the profile of its participants. In addition, it updates the data presented in Valentim et al. (2021).
Table 1: Description of AVASUS dataset features.
Attribute | Description | Data type | Value
gender | Gender of the course participant. | Categorical | Feminino / Masculino / Não Informado (in English: Female, Male or Uninformed)
course_progress | Percentage of completion of the course. | Numerical | Range from 0 to 100
course_evaluation | A score given to the course by the participant. | Numerical | 0, 1, 2, 3, 4, 5 or NaN
evaluation_commentary | Comment made by the participant about the course. | Categorical | Free text or NaN
region | Brazilian region in which the participant resides. | Categorical | Brazilian region according to IBGE: Norte, Nordeste, Centro-Oeste, Sudeste or Sul (in English: North, Northeast, Midwest, Southeast or South)
CNES | CNES code of the health establishment where the participant works. | Numerical | CNES code or NaN
health_care_level | Health care network level at which the course participant works. | Categorical | “ATENCAO PRIMARIA”, “MEDIA COMPLEXIDADE”, “ALTA COMPLEXIDADE”, and their possible combinations
year_enrollment | Year in which the course participant registered. | Numerical | Year (YYYY)
CBO | Participant occupation. | Categorical | Text coded according to the Brazilian Classification of Occupations, or “Indivíduo sem afiliação formal” (in English: “Individual without formal affiliation”)
Dataset name: prison_syphilis_and_population_brazil.csv
Dataset period: 2017 - 2020
Dataset Characteristics: Multivalued
Number of Instances: 6
Number of Attributes: 13
Missing Values: No
Source:
National Penitentiary Department (DEPEN) (Brasil, 2022d);
Description: The data contained in the prison_syphilis_and_population_brazil.csv dataset (see Table 2) originate from the National Penitentiary Department Information System (SISDEPEN) (Brasil, 2022d). This dataset provides data on the population and prevalence of syphilis in the Brazilian prison system. In addition, it includes a rate that normalizes the data so that the populations of each region and of Brazil can be compared.
Table 2: Description of DEPEN dataset Features.
Attribute | Description | Data type | Value
Region | Brazilian region in which the participant resides; also the sum of the regions, which refers to Brazil. | Categorical | Brazil and Brazilian regions according to IBGE: North, Northeast, Midwest, Southeast or South
syphilis_2017 | Number of syphilis cases in the prison system in 2017. | Numerical | Number of syphilis cases
syphilis_rate_2017 | Normalized rate of syphilis cases in 2017. | Numerical | Syphilis case rate
syphilis_2018 | Number of syphilis cases in the prison system in 2018. | Numerical | Number of syphilis cases
syphilis_rate_2018 | Normalized rate of syphilis cases in 2018. | Numerical | Syphilis case rate
syphilis_2019 | Number of syphilis cases in the prison system in 2019. | Numerical | Number of syphilis cases
syphilis_rate_2019 | Normalized rate of syphilis cases in 2019. | Numerical | Syphilis case rate
syphilis_2020 | Number of syphilis cases in the prison system in 2020. | Numerical | Number of syphilis cases
syphilis_rate_2020 | Normalized rate of syphilis cases in 2020. | Numerical | Syphilis case rate
pop_2017 | Prison population in 2017. | Numerical | Population number
pop_2018 | Prison population in 2018. | Numerical | Population number
pop_2019 | Prison population in 2019. | Numerical | Population number
pop_2020 | Prison population in 2020. | Numerical | Population number
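The normalized rate columns presumably divide case counts by the corresponding prison population. A sketch with an assumed per-1,000 scaling factor (the dataset does not state the factor) and invented figures:

```python
# Sketch: a per-capita syphilis rate that makes regions with different
# prison-population sizes comparable. The per-1,000 scaling is an assumption.

def syphilis_rate(cases: int, population: int, per: int = 1_000) -> float:
    return cases / population * per

# Invented figures for illustration only.
print(round(syphilis_rate(360, 120_000), 2))  # 3.0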
Dataset name: students_cumulative_sum.csv
Dataset period: 2018 - 2020
Dataset Characteristics: Multivalued
Number of Instances: 6
Number of Attributes: 7
Missing Values: No
Source:
Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);
Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).
Description: The data contained in the students_cumulative_sum.csv dataset (see Table 3) originate mainly from AVASUS (Brasil, 2022a). This dataset provides data on the number of students by region and year. In addition, it includes a rate that normalizes the data so that the populations of each region and of Brazil can be compared. We used population data estimated by the IBGE (Brasil, 2022e) to calculate the rate.
Table 3: Description of Students dataset Features.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Synthetic dataset of emergency services comprising several CSV files that we generated using simulation software. This dataset is open for public use; please cite our work if used in research or applications.

File Overview
CheckBloodPressure.csv (9 KB): Contains blood pressure Server records of patients.
CheckPatientType.csv (19 KB): Identifies the type of each patient (e.g., 1 or 3).
Fill_Information.csv (2 KB): Fill information records for new patients.
MedicalRecord1.csv (10 KB): Medical record dataset for patient type 1.
MedicalRecord2.csv (4 KB): Medical record dataset for patient type 2.
MedicalRecord3.csv (2 KB): Medical record dataset for patient type 3.
MedicalRecord4.csv (13 KB): Medical record dataset for patient type 4.
OutPatientDepartment.csv (18 KB): Data related to the satisfaction and length of stay of a given patient.
Triage.csv (13 KB): Data related to the triage process.
README.txt (4 KB): Documentation of the dataset, including structure, metadata, and usage.

Common Fields Across Files
Patient ID (Integer): Unique identifier for each patient.
Patient Type (Integer): Classification of the patient (e.g., 1, 4).
Medical Records Arrival Time (DateTime): Timestamp of the patient's first arrival in the medical records department.
Exiting Time (DateTime): Timestamp when the patient exits a Server.
Waiting Time (min) (Real): Total waiting time before being attended to.
Resource Used (String): Resource (e.g., Operator) allocated to the patient.
Utilization % (Real): Utilization rate of the resource as a percentage.
Queue Count Before Processing (Integer): Number of patients in the queue before processing begins.
Queue Count After Processing (Integer): Number of patients in the queue after processing ends.
Queue Difference (Integer): Difference between the before and after queue counts.
Length of Stay (min) (Real): Total time the patient spends in the simulation.
LOS without Queues (min) (Real): Length of stay excluding any queuing time.
Satisfaction % (Real): Patient satisfaction rating based on their experience.
New Patient? (String): Indicates whether this is a new or returning patient.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains disease names along with the symptoms faced by the respective patient. There are a total of 773 unique diseases and 377 symptoms, with ~246,000 rows. The dataset was artificially generated, preserving symptom severity and disease occurrence possibility. Several distinct groups of symptoms might all be indicators of the same disease. There may even be one single symptom contributing to a disease in a row or sample, indicating a very high correlation between that symptom and that particular disease. A larger number of rows for a particular disease corresponds to a higher probability of its occurrence in the real world. Similarly, if a row's feature vector contains only a single symptom, that symptom carries more classification signal for the disease than any one symptom in a sample whose feature vector contains multiple symptoms.
The MarketScan health claims database is a compilation of nearly 110 million patient records with information from more than 100 private insurance carriers and large self-insuring companies. Public forms of insurance (i.e., Medicare and Medicaid) are not included, nor are small (< 100 employees) or medium (< 1000 employees) firms. We excluded the relatively few (n=6735) individuals over 65 years of age because Medicare is the primary insurance of U.S. adults over 65. The EQI was constructed for 2000-2005 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain, while the overall EQI was constructed from a second PCA over these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural-urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015). This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Gray, C., D. Lobdell, K. Rappazzo, Y. Jian, J. Jagai, L. Messer, A. Patel, S. Deflorio-Barker, C. Lyttle, J. Solway, and A. Rzhetsky. Associations between environmental quality and adult asthma prevalence in medical claims data. ENVIRONMENTAL RESEARCH. Elsevier B.V., Amsterdam, NETHERLANDS, 166: 529-536, (2018).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Electronic health records (EHRs) are a rich source of information for medical research and public health monitoring. Information systems based on EHR data could also assist in patient care and hospital management. However, much of the data in EHRs is in the form of unstructured text, which is difficult to process for analysis. Natural language processing (NLP), a form of artificial intelligence, has the potential to enable automatic extraction of information from EHRs and several NLP tools adapted to the style of clinical writing have been developed for English and other major languages. In contrast, the development of NLP tools for less widely spoken languages such as Swedish has lagged behind. A major bottleneck in the development of NLP tools is the restricted access to EHRs due to legitimate patient privacy concerns. To overcome this issue we have generated a citizen science platform for collecting artificial Swedish EHRs with the help of Swedish physicians and medical students. These artificial EHRs describe imagined but plausible emergency care patients in a style that closely resembles EHRs used in emergency departments in Sweden. In the pilot phase, we collected a first batch of 50 artificial EHRs, which has passed review by an experienced Swedish emergency care physician. We make this dataset publicly available as OpenChart-SE corpus (version 1) under an open-source license for the NLP research community. The project is now open for general participation and Swedish physicians and medical students are invited to submit EHRs on the project website (https://github.com/Aitslab/openchart-se), where additional batches of quality-controlled EHRs will be released periodically.
Dataset content
OpenChart-SE, version 1 corpus (txt files and dataset.csv)
The OpenChart-SE corpus, version 1, contains 50 artificial EHRs (note that the numbering starts at 5, as 1-4 were test cases not suitable for publication). The EHRs are available in two formats: structured as a .csv file and as separate text files for annotation. Note that flaws in the data were not cleaned up, so the corpus simulates what could be encountered when working with data from different EHR systems. All charts were checked for medical validity by a resident in Emergency Medicine at a Swedish hospital before publication.
Codebook.xlsx
The codebook contains information about each variable used. It is in XLSForm format, which can be re-used in several different applications for data collection.
suppl_data_1_openchart-se_form.pdf
OpenChart-SE mock emergency care EHR form.
suppl_data_3_openchart-se_dataexploration.ipynb
This jupyter notebook contains the code and results from the analysis of the OpenChart-SE corpus.
More details about the project and information on the upcoming preprint accompanying the dataset can be found on the project website (https://github.com/Aitslab/openchart-se).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets are for a cohort of n=1540 anonymised hospitalised COVID-19 patients, and the data provide information on outcomes (i.e. patient death or discharge), demographics and biomarker measurements for two New York hospitals: State University of New York (SUNY) Downstate Health Sciences University and Maimonides Medical Center.
The file "demographics_both_hospitals.csv" contains the ultimate outcomes of hospitalisation (whether a patient was discharged or died), demographic information and known comorbidities for each of the patients.
The file "dynamics_clean_both_hospitals.csv" contains cleaned dynamic biomarker measurements for the n=1233 patients where this information was available and the data passed our various checks (see https://doi.org/10.1101/2021.11.12.21266248 for information of these checks and the cleaning process). Patients can be matched to demographic data via the "id" column.
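Matching the two files via the "id" column can be sketched as a simple dictionary join; the rows and the biomarker name below are invented for illustration:

```python
# Sketch: attaching the hospitalisation outcome from the demographics file
# to each biomarker measurement in the dynamics file, joined on "id".

demographics = {row["id"]: row for row in [
    {"id": "p001", "outcome": "discharged"},
    {"id": "p002", "outcome": "died"},
]}

dynamics = [
    {"id": "p001", "biomarker": "CRP", "value": 12.3},
    {"id": "p002", "biomarker": "CRP", "value": 98.7},
]

joined = [{**m, "outcome": demographics[m["id"]]["outcome"]} for m in dynamics]
print(joined[0]["outcome"])  # discharged
print(joined[1]["outcome"])  # died
```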
Study approval and data collection
Study approval was obtained from the State University of New York (SUNY) Downstate Health Sciences University Institutional Review Board (IRB#1595271-1) and the Maimonides Medical Center Institutional Review Board/Research Committee (IRB#2020-05-07). A retrospective query was performed among the patients who were admitted to SUNY Downstate Medical Center and Maimonides Medical Center with COVID-19-related symptoms, subsequently confirmed by RT-PCR, from the beginning of February 2020 until the end of May 2020. Stratified randomization was used to select at least 500 patients who were discharged and 500 patients who died due to the complications of COVID-19. Patient outcome was recorded as a binary choice of “discharged” versus “COVID-19 related mortality”. Patients whose outcome was unknown were excluded. Demographic, clinical history and laboratory data were extracted from the hospitals’ electronic health records.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this paper, I have assessed the perceptions and attitudes of college students towards formal mental health support including campus-based mental healthcare services in Rwanda.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains synthetically generated real-time healthcare data intended for machine learning applications such as patient risk prediction, early diagnosis modeling, and health outcome analysis. The dataset simulates 1,000 patient records with 10 meaningful medical and demographic attributes including age, gender, heart rate, blood pressure, respiratory rate, oxygen saturation, glucose levels, smoking status, diabetes history, and a computed health risk score.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This electronic health record dataset was collected from a private hospital in Indonesia. It contains patients' laboratory test results, which are used to determine whether the next treatment should be in-care or out-care. The task embedded in the dataset is classification prediction.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Patients Table:
This table stores information about individual patients, including their names and contact details.
Doctors Table:
This table contains details about healthcare providers, including their names, specializations, and contact information.
Appointments Table:
This table records scheduled appointments, linking patients to doctors.
MedicalProcedure Table:
This table stores details about medical procedures associated with specific appointments.
Billing Table:
This table maintains records of billing transactions, associating them with specific patients.
demo Table:
This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.
This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
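The tables above can be sketched as SQL executed through Python's `sqlite3` module. The column names beyond those mentioned in the descriptions (IDs, foreign keys, amounts) are assumptions added to make the schema concrete; the `demo` table is omitted since it appears unrelated to the system.

```python
import sqlite3

# Minimal sketch of the described schema; column details are assumptions.
schema = """
CREATE TABLE Patients (
    patient_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    contact    TEXT
);
CREATE TABLE Doctors (
    doctor_id      INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    specialization TEXT,
    contact        TEXT
);
CREATE TABLE Appointments (
    appointment_id INTEGER PRIMARY KEY,
    patient_id     INTEGER NOT NULL REFERENCES Patients(patient_id),
    doctor_id      INTEGER NOT NULL REFERENCES Doctors(doctor_id),
    scheduled_at   TEXT NOT NULL
);
CREATE TABLE MedicalProcedure (
    procedure_id   INTEGER PRIMARY KEY,
    appointment_id INTEGER NOT NULL REFERENCES Appointments(appointment_id),
    description    TEXT
);
CREATE TABLE Billing (
    bill_id    INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES Patients(patient_id),
    amount     REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```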
The Medical Information Mart for Intensive Care III (MIMIC-III) dataset is a large, de-identified, and publicly available collection of medical records. Each record in the dataset includes ICD-9 codes, which identify the diagnoses and procedures performed. Each code is partitioned into sub-codes, which often include specific circumstantial details. The dataset consists of 112,000 clinical report records (average length 709.3 tokens) and 1,159 top-level ICD-9 codes. Each report is assigned 7.6 codes on average. Data include vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more.
The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.
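When working with the 1,159 top-level codes, full ICD-9-CM codes are commonly rolled up to their category code. A small sketch of that rollup, assuming the usual convention that E-codes keep four leading characters and all other codes keep three:

```python
def icd9_top_level(code: str) -> str:
    """Roll a full ICD-9-CM diagnosis code up to its top-level (category) code.

    Assumed convention: E-codes keep 4 leading characters, all others keep 3.
    """
    code = code.strip().upper().replace(".", "")
    if code.startswith("E"):
        return code[:4]
    return code[:3]
```

For example, `icd9_top_level("428.0")` yields `"428"` and `icd9_top_level("E935.2")` yields `"E935"`.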
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
How does Facebook always seem to know what the next funny video should be to hold your attention on the platform? Facebook has not asked you whether you like videos of cats doing something funny: it just seems to know. In fact, Facebook learns from your behavior on the platform (e.g., how long you have engaged with similar videos, which posts you have previously liked or commented on, etc.). As a result, Facebook is able to sustain its users' attention for a long time. Typical mHealth apps, on the other hand, suffer from rapidly collapsing user engagement levels. To sustain engagement, mHealth apps nowadays employ all sorts of intervention strategies. Of course, it would be powerful to know, as Facebook does, which strategy should be presented to which individual to sustain their engagement. A first step toward that could be to cluster similar users (and then derive intervention strategies from there). This dataset was collected through a single mHealth app over 8 different mHealth campaigns (i.e., scientific studies). Using this dataset, one could derive clusters from app user event data. One approach could be to differentiate between two phases: a process mining phase and a clustering phase. In the process mining phase, one may derive from the dataset the processes (i.e., sequences of app actions) that users undertake. In the clustering phase, based on the processes different users engaged in, one may cluster similar users (i.e., users that perform similar sequences of app actions).
List of files
0-list-of-variables.pdf
includes an overview of different variables within the dataset.
1-description-of-endpoints.pdf
includes a description of the unique endpoints that appear in the dataset.
2-requests.csv
includes the dataset with actual app user event data.
2-requests-by-session.csv
includes the dataset with actual app user event data with a session variable, to differentiate between user requests that were made in the same session.
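The two phases described above can be illustrated on toy event data. The column names (`user_id`, `endpoint`) and the toy events are assumptions for illustration; see 0-list-of-variables.pdf for the actual variable names in 2-requests.csv.

```python
from collections import defaultdict

# Toy stand-in for 2-requests.csv rows: (user_id, endpoint), time-ordered.
events = [
    ("u1", "open_app"), ("u1", "log_mood"), ("u1", "view_tips"),
    ("u2", "open_app"), ("u2", "log_mood"), ("u2", "view_tips"),
    ("u3", "open_app"), ("u3", "view_leaderboard"),
]

# Phase 1 (process mining, simplified): recover each user's action sequence.
traces = defaultdict(list)
for user, action in events:
    traces[user].append(action)

# Phase 2 (clustering, simplified): group users with identical traces.
# A real analysis would use a sequence distance (e.g., edit distance)
# rather than exact equality.
clusters = defaultdict(list)
for user, trace in traces.items():
    clusters[tuple(trace)].append(user)
```

Here users u1 and u2 fall into the same cluster because they performed the same sequence of app actions, while u3 forms its own cluster.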
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was generated using Simio simulation software. The simulations model patient flow in healthcare settings, capturing key metrics such as queue times, length of stay (LOS) for patients, and nurse utilization rates. Each CSV file contains time-series data, with measured variables including patient waiting times, resource utilization percentages, and service durations.

## File Overview

- **CheckBloodPressure.csv** (9 KB): Blood pressure Server records of patients.
- **CheckPatientType.csv** (19 KB): Identifies the type of each patient (e.g., 1 or 3).
- **Fill_Information.csv** (2 KB): Fill information records for new patients.
- **MedicalRecord1.csv** (10 KB): Medical record dataset for patient type 1.
- **MedicalRecord2.csv** (4 KB): Medical record dataset for patient type 2.
- **MedicalRecord3.csv** (2 KB): Medical record dataset for patient type 3.
- **MedicalRecord4.csv** (13 KB): Medical record dataset for patient type 4.
- **OutPatientDepartment.csv** (18 KB): Data on the satisfaction and length of stay of a given patient.
- **Triage.csv** (13 KB): Data related to the triage process.
- **README.txt** (4 KB): Documentation of the dataset, including structure, metadata, and usage.

## Common Fields Across Files

- **Patient ID** (Integer): Unique identifier for each patient.
- **Patient Type** (Integer): Classification of patient (e.g., 1, 4).
- **Medical Records Arrival Time** (DateTime): Timestamp of the patient's first arrival in the medical record department.
- **Exiting Time** (DateTime): Timestamp when the patient exits a Server.
- **Waiting Time (min)** (Real): Total waiting time before being attended to.
- **Resource Used** (String): Resource (e.g., Operator) allocated to the patient.
- **Utilization %** (Real): Utilization rate of the resource as a percentage.
- **Queue Count Before Processing** (Integer): Number of patients in the queue before processing begins.
- **Queue Count After Processing** (Integer): Number of patients in the queue after processing ends.
- **Queue Difference** (Integer): Difference between the before and after queue counts.
- **Length of Stay (min)** (Real): Total time the patient spent in the simulation.
- **LOS without Queues (min)** (Real): Length of stay excluding any queuing time.
- **Satisfaction %** (Real): Patient satisfaction rating based on their experience.
- **New Patient?** (String): Indicates whether this is a new patient or a returning one.
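The derived fields imply simple consistency relations that can be checked when loading the files. The sketch below uses a toy two-row sample in place of a real file such as Triage.csv, and assumes Queue Difference means before minus after and that LOS without Queues equals Length of Stay minus Waiting Time; both are readings of the field definitions, not documented formulas.

```python
import csv
import io

# Toy sample standing in for one of the dataset's CSV files.
sample = io.StringIO(
    "Patient ID,Queue Count Before Processing,Queue Count After Processing,"
    "Queue Difference,Waiting Time (min),Length of Stay (min),LOS without Queues (min)\n"
    "1,5,3,2,12.0,45.0,33.0\n"
    "2,2,2,0,0.0,20.0,20.0\n"
)

checked = []
for row in csv.DictReader(sample):
    # Assumed relation: Queue Difference = before - after.
    diff_ok = (int(row["Queue Count Before Processing"])
               - int(row["Queue Count After Processing"])
               == int(row["Queue Difference"]))
    # Assumed relation: LOS without Queues = LOS - Waiting Time.
    los_ok = abs(float(row["Length of Stay (min)"])
                 - float(row["Waiting Time (min)"])
                 - float(row["LOS without Queues (min)"])) < 1e-9
    checked.append(diff_ok and los_ok)
```

Running the same check against the real files is a quick way to confirm how the derived columns were computed in the simulation.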