100+ datasets found
  1. Synthetic Dataset of Emergency Healthcare Services

    • figshare.com
    csv
    Updated Dec 12, 2024
    Cite
    Marco Ferreira (2024). Synthetic Dataset of Emergency Healthcare Services [Dataset]. http://doi.org/10.6084/m9.figshare.28012784.v1
    Dataset provided by
    figshare
    Authors
    Marco Ferreira
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was generated using Simio simulation software. The simulations model patient flow in healthcare settings, capturing key metrics such as queue times, length of stay (LOS) for patients, and nurse utilization rates. Each CSV file contains time-series data, with measured variables including patient waiting times, resource utilization percentages, and service durations.

    File Overview

    • CheckBloodPressure.csv (9 KB): Contains blood pressure Server records of patients.
    • CheckPatientType.csv (19 KB): Identifies the type of each patient (e.g., 1 or 3).
    • Fill_Information.csv (2 KB): Fill information records for new patients.
    • MedicalRecord1.csv (10 KB): Medical record dataset for patient type 1.
    • MedicalRecord2.csv (4 KB): Medical record dataset for patient type 2.
    • MedicalRecord3.csv (2 KB): Medical record dataset for patient type 3.
    • MedicalRecord4.csv (13 KB): Medical record dataset for patient type 4.
    • OutPatientDepartment.csv (18 KB): Data related to the satisfaction and length of stay of a given patient.
    • Triage.csv (13 KB): Data related to the triage process.
    • README.txt (4 KB): Documentation of the dataset, including structure, metadata, and usage.

    Common Fields Across Files

    • Patient ID (Integer): Unique identifier for each patient.
    • Patient Type (Integer): Classification of patient (e.g., 1, 4).
    • Medical Records Arrival Time (DateTime): Timestamp of the patient's first arrival in the medical record department.
    • Exiting Time (DateTime): Timestamp when the patient exits a Server.
    • Waiting Time (min) (Real): Total waiting time before being attended to.
    • Resource Used (String): Resource (e.g., Operator) allocated to the patient.
    • Utilization % (Real): Utilization rate of the resource as a percentage.
    • Queue Count Before Processing (Integer): Number of patients in the queue before processing begins.
    • Queue Count After Processing (Integer): Number of patients in the queue after processing ends.
    • Queue Difference (Integer): Difference between the before and after queue counts.
    • Length of Stay (min) (Real): Total time spent in the simulation by the patient.
    • LOS without Queues (min) (Real): Length of stay excluding any queuing time.
    • Satisfaction % (Real): Patient satisfaction rating based on their experience.
    • New Patient? (String): Indicates if this is a new patient or a returning one.
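
    As a rough illustration of how these fields might be used, the sketch below averages waiting time per patient type. It assumes the CSV headers match the field names listed in the description exactly, which the listing does not confirm; the sample rows and the function name are made up for the example.

```python
import csv
import io

# Hypothetical sample rows. The column names follow the field list in the
# description, but the values are invented for illustration only.
SAMPLE = """Patient ID,Patient Type,Waiting Time (min)
1,1,4.0
2,3,10.0
3,1,0.0
"""


def mean_waiting_time_by_type(csv_text):
    """Return {patient_type: mean waiting time in minutes}."""
    totals = {}
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ptype = int(row["Patient Type"])
        totals[ptype] = totals.get(ptype, 0.0) + float(row["Waiting Time (min)"])
        counts[ptype] = counts.get(ptype, 0) + 1
    return {ptype: totals[ptype] / counts[ptype] for ptype in totals}


means = mean_waiting_time_by_type(SAMPLE)
print(means)
```

    To run this against a real file, the text of that file would be passed in place of SAMPLE; the actual headers should be checked against README.txt first.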

  2. Disease Prediction Using Machine Learning

    • dataandsons.com
    csv, zip
    Updated Oct 31, 2022
    Cite
    test test (2022). Disease Prediction Using Machine Learning [Dataset]. https://www.dataandsons.com/categories/machine-learning/disease-prediction-using-machine-learning
    Authors
    test test
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    About this Dataset

    This dataset lets you put your existing knowledge to use. It has 132 parameters from which 42 different types of diseases can be predicted. The dataset consists of 2 CSV files: one for training and one for testing your model. Each CSV file has 133 columns: 132 are symptoms that a person experiences, and the last column is the prognosis. The symptoms map to 42 diseases, so a set of symptoms can be classified as one of them. Train your model on the training data and test it on the testing data.
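
    The column layout described above (symptom columns first, prognosis last) can be sketched as follows. This is only a miniature: the three symptom names, the 0/1 encoding of symptom values, and the disease labels are assumptions made for the example, not taken from the dataset.

```python
import csv
import io

# Made-up miniature of the described layout: symptom columns first, the
# prognosis label last. The real files are described as having 132 symptom
# columns; the 0/1 symptom encoding here is an assumption.
SAMPLE = """itching,fever,cough,prognosis
1,0,0,Allergy
0,1,1,Flu
"""


def split_features_and_label(csv_text):
    """Split each row into a symptom vector and its prognosis label."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    features, labels = [], []
    for row in reader:
        features.append([int(value) for value in row[:-1]])  # all but last column
        labels.append(row[-1])  # last column is the prognosis
    return header[:-1], features, labels


symptoms, X, y = split_features_and_label(SAMPLE)
print(symptoms, X, y)
```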

    Category

    Machine Learning

    Keywords

    medicine,disease,Healthcare,ML,Machine Learning

    Row Count

    4962

    Price

    $109.00

  3. The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases,...

    • zenodo.org
    bin, csv, zip
    Updated Jan 5, 2024
    + more versions
    Cite
    Mauro Nievas Offidani; Claudio Delrieux (2024). The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases, Labeled Images and Captions from Open Access PMC Articles [Dataset]. http://doi.org/10.5281/zenodo.10079370
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mauro Nievas Offidani; Claudio Delrieux
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains multi-modal data from over 75,000 open access and de-identified case reports, including metadata, clinical cases, image captions and more than 130,000 images. Images and clinical cases belong to different medical specialties, such as oncology, cardiology, surgery and pathology. The structure of the dataset makes it easy to map images to their corresponding article metadata, clinical case, captions and image labels. Details of the data structure can be found in the file data_dictionary.csv.

    Almost 100,000 patients and almost 400,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.

    Refer to the examples showcased in this GitHub repository to understand how to optimize the use of this dataset.

    For detailed insight into the contents of this dataset, please refer to this data article published in Data in Brief.

  4. Heart Attack Dataset

    • data.mendeley.com
    • kaggle.com
    Updated Nov 23, 2022
    Cite
    Tarik A. Rashid (2022). Heart Attack Dataset [Dataset]. http://doi.org/10.17632/wmhctcrt5v.1
    Authors
    Tarik A. Rashid
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The heart attack datasets were collected at Zheen hospital in Erbil, Iraq, from January 2019 to May 2019. The attributes of this dataset are: age, gender, heart rate, systolic blood pressure, diastolic blood pressure, blood sugar, CK-MB and troponin, with a negative or positive output. According to the provided information, the dataset classifies each case as either heart attack or none. The gender column is normalized: male is set to 1 and female to 0. The glucose column is set to 1 if it is > 120; otherwise, 0. As for the output, positive is set to 1 and negative to 0.
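
    The binary encoding described above can be written out as a small sketch. The raw input values ("male"/"female", "positive"/"negative") and the function name are hypothetical; the description gives only the encoded 0/1 values and the > 120 threshold, without stating units.

```python
def encode_record(gender, blood_sugar, output):
    """Apply the 0/1 encoding described for the gender, glucose and output columns.

    The raw string values accepted here are assumptions for illustration.
    """
    return {
        "gender": 1 if gender == "male" else 0,        # male -> 1, female -> 0
        "glucose": 1 if blood_sugar > 120 else 0,       # 1 only when strictly above 120
        "output": 1 if output == "positive" else 0,     # positive -> 1, negative -> 0
    }


print(encode_record("male", 130, "positive"))
print(encode_record("female", 120, "negative"))
```

    Note that a value of exactly 120 encodes to 0, since the stated condition is a strict inequality.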

  5. Dataset for the paper: "Monant Medical Misinformation Dataset: Mapping...

    • zenodo.org
    • data.niaid.nih.gov
    Updated Apr 22, 2022
    + more versions
    Cite
    Ivan Srba; Branislav Pecher; Matus Tomlein; Robert Moro; Elena Stefancova; Jakub Simko; Maria Bielikova (2022). Dataset for the paper: "Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims" [Dataset]. http://doi.org/10.5281/zenodo.5996864
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ivan Srba; Branislav Pecher; Matus Tomlein; Robert Moro; Elena Stefancova; Jakub Simko; Maria Bielikova
    Description

    Overview

    This dataset of medical misinformation was collected and is published by Kempelen Institute of Intelligent Technologies (KInIT). It consists of approx. 317k news articles and blog posts on medical topics published between January 1, 1998 and February 1, 2022 from a total of 207 reliable and unreliable sources. The dataset contains full-texts of the articles, their original source URL and other extracted metadata. If a source has a credibility score available (e.g., from Media Bias/Fact Check), it is also included in the form of annotation. Besides the articles, the dataset contains around 3.5k fact-checks and extracted verified medical claims with their unified veracity ratings published by fact-checking organisations such as Snopes or FullFact. Lastly and most importantly, the dataset contains 573 manually and more than 51k automatically labelled mappings between previously verified claims and the articles; mappings consist of two values: claim presence (i.e., whether a claim is contained in the given article) and article stance (i.e., whether the given article supports or rejects the claim or provides both sides of the argument).

    The dataset is primarily intended to be used as a training and evaluation set for machine learning methods for claim presence detection and article stance classification, but it enables a range of other misinformation related tasks, such as misinformation characterisation or analyses of misinformation spreading.

    Its novelty and our main contributions lie in (1) a focus on medical news articles and blog posts as opposed to social media posts or political discussions; (2) providing multiple modalities (besides the full texts of the articles, there are also images and videos), thus enabling research of multimodal approaches; (3) mapping of the articles to the fact-checked claims (with manual as well as predicted labels); (4) providing source credibility labels for 95% of all articles and other potential sources of weak labels that can be mined from the articles' content and metadata.

    The dataset is associated with the research paper "Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims" accepted and presented at ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22).

    The accompanying GitHub repository provides a small static sample of the dataset and the dataset's descriptive analysis in the form of Jupyter notebooks.

    Options to access the dataset

    There are two ways to get access to the dataset:

    1. Static dump of the dataset available in the CSV format
    2. Continuously updated dataset available via REST API

    To obtain access to the dataset (either the full static dump or the REST API), please request access by following the instructions provided below.

    References

    If you use this dataset in any publication, project, tool or in any other form, please cite the following papers:

    @inproceedings{SrbaMonantPlatform,
      author = {Srba, Ivan and Moro, Robert and Simko, Jakub and Sevcech, Jakub and Chuda, Daniela and Navrat, Pavol and Bielikova, Maria},
      booktitle = {Proceedings of Workshop on Reducing Online Misinformation Exposure (ROME 2019)},
      pages = {1--7},
      title = {Monant: Universal and Extensible Platform for Monitoring, Detection and Mitigation of Antisocial Behavior},
      year = {2019}
    }
    @inproceedings{SrbaMonantMedicalDataset,
      author = {Srba, Ivan and Pecher, Branislav and Tomlein, Matus and Moro, Robert and Stefancova, Elena and Simko, Jakub and Bielikova, Maria},
      booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22)},
      numpages = {11},
      title = {Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims},
      year = {2022},
      doi = {10.1145/3477495.3531726},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      url = {https://doi.org/10.1145/3477495.3531726},
    }
    


    Dataset creation process

    To create this dataset (and to continuously obtain new data), we used our research platform Monant. The Monant platform provides so-called data providers to extract news articles/blogs from news/blog sites as well as fact-checking articles from fact-checking sites. General parsers (from RSS feeds, WordPress sites, Google Fact Check Tool, etc.) as well as custom crawlers and parsers were implemented (e.g., for the fact-checking site Snopes.com). All data is stored in a unified format in a central data storage.


    Ethical considerations

    The dataset was collected and is published for research purposes only. We collected only publicly available content of news/blog articles. The dataset contains identities of authors of the articles if they were stated in the original source; we left this information, since the presence of an author's name can be a strong credibility indicator. However, we anonymised the identities of the authors of discussion posts included in the dataset.

    The main identified ethical issue related to the presented dataset lies in the risk of mislabelling of an article as supporting a false fact-checked claim and, to a lesser extent, in mislabelling an article as not containing a false claim or not supporting it when it actually does. To minimise these risks, we developed a labelling methodology and require an agreement of at least two independent annotators to assign a claim presence or article stance label to an article. It is also worth noting that we do not label an article as a whole as false or true. Nevertheless, we provide partial article-claim pair veracities based on the combination of claim presence and article stance labels.

    As to the veracity labels of the fact-checked claims and the credibility (reliability) labels of the articles' sources, we take these from the fact-checking sites and external listings such as Media Bias/Fact Check as they are and refer to their methodologies for more details on how they were established.

    Lastly, the dataset also contains automatically predicted labels of claim presence and article stance using our baselines described in the next section. These methods have their limitations and work with certain accuracy as reported in this paper. This should be taken into account when interpreting them.


    Reporting mistakes in the dataset

    To report considerable mistakes in the raw collected data or in the manual annotations, create a new issue in the accompanying GitHub repository. Alternatively, general enquiries or requests can be sent to info [at] kinit.sk.


    Dataset structure

    Raw data

    First, the dataset contains so-called raw data (i.e., data extracted by the Web monitoring module of the Monant platform and stored in exactly the same form as they appear on the original websites). Raw data consist of articles from news sites and blogs (e.g. naturalnews.com), discussions attached to such articles, and fact-checking articles from fact-checking portals (e.g. snopes.com). In addition, the dataset contains feedback (number of likes, shares, comments) provided by users on the social network Facebook, which is regularly extracted for all news/blog articles.

    Raw data are contained in these CSV files (and corresponding REST API endpoints):

    • sources.csv
    • articles.csv
    • article_media.csv
    • article_authors.csv
    • discussion_posts.csv
    • discussion_post_authors.csv
    • fact_checking_articles.csv
    • fact_checking_article_media.csv
    • claims.csv
    • feedback_facebook.csv

    Note: Personal information about discussion posts' authors (name, website, gravatar) is anonymised.


    Annotations

    Secondly, the dataset contains so-called annotations. Entity annotations describe the individual raw data entities (e.g., article, source). Relation annotations describe the relation between two such entities.

    Each annotation is described by the following attributes:

    1. category of annotation (`annotation_category`). Possible values: label (annotation corresponds to ground truth, determined by human experts) and prediction (annotation was created by means of AI method).
    2. type of annotation (`annotation_type_id`). Example values: Source reliability (binary), Claim presence. The list of possible values can be obtained from enumeration in annotation_types.csv.
    3. method which created the annotation (`method_id`). Example values: Expert-based source reliability evaluation, Fact-checking article to claim transformation method. The list of possible values can be obtained from the enumeration in methods.csv.
    4. its value (`value`). The value is stored in JSON format and its structure differs according to particular annotation type.
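
    The four attributes above could be represented along these lines. This is a sketch under stated assumptions: the attribute names come from the description, but treating the two ids as integers, the shape of the JSON payload, and the class and function names are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Annotation:
    annotation_category: str  # "label" or "prediction", per the description
    annotation_type_id: int   # assumed integer key into annotation_types.csv
    method_id: int            # assumed integer key into methods.csv
    value: Any                # decoded JSON; structure depends on annotation type


def parse_annotation(row):
    """Build an Annotation from a dict of raw string fields."""
    category = row["annotation_category"]
    if category not in ("label", "prediction"):
        raise ValueError(f"unexpected annotation_category: {category!r}")
    return Annotation(
        annotation_category=category,
        annotation_type_id=int(row["annotation_type_id"]),
        method_id=int(row["method_id"]),
        value=json.loads(row["value"]),
    )


# Hypothetical row; the JSON payload shape is invented for illustration.
row = {
    "annotation_category": "label",
    "annotation_type_id": "1",
    "method_id": "2",
    "value": '{"value": "present"}',
}
annotation = parse_annotation(row)
print(annotation)
```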


    At the same time, annotations are associated with a particular object identified by:

    1. entity type (parameter entity_type in case of entity annotations, or source_entity_type and target_entity_type in case of relation annotations). Possible values: sources, articles, fact-checking-articles.
    2. entity id (parameter entity_id in case of entity annotations, or source_entity_id and target_entity_id in case of relation

  6. Mental Health Dataset

    • kaggle.com
    Updated Mar 18, 2024
    Cite
    Bhavik Jikadara (2024). Mental Health Dataset [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/mental-health-dataset
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bhavik Jikadara
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset appears to contain a variety of features related to text analysis, sentiment analysis, and psychological indicators, likely derived from posts or text data. Some features include readability indices such as Automated Readability Index (ARI), Coleman Liau Index, and Flesch-Kincaid Grade Level, as well as sentiment analysis scores like sentiment compound, negative, neutral, and positive scores. Additionally, there are features related to psychological aspects such as economic stress, isolation, substance use, and domestic stress. The dataset seems to cover a wide range of linguistic, psychological, and behavioural attributes, potentially suitable for analyzing mental health-related topics in online communities or text data.

    Benefits of using this dataset:

    • Insight into Mental Health: The dataset provides valuable insights into mental health by analyzing linguistic patterns, sentiment, and psychological indicators in text data. Researchers and data scientists can gain a better understanding of how mental health issues manifest in online communication.
    • Predictive Modeling: With a wide range of features, including sentiment analysis scores and psychological indicators, the dataset offers opportunities for developing predictive models to identify or predict mental health outcomes based on textual data. This can be useful for early intervention and support.
    • Community Engagement: Mental health is a topic of increasing importance, and this dataset can foster community engagement on platforms like Kaggle. Data enthusiasts, researchers, and mental health professionals can collaborate to analyze the data and develop solutions to address mental health challenges.
    • Data-driven Insights: By analyzing the dataset, users can uncover correlations and patterns between linguistic features, sentiment, and mental health indicators. These insights can inform interventions, policies, and support systems aimed at promoting mental well-being.
    • Educational Resource: The dataset can serve as a valuable educational resource for teaching and learning about mental health analytics, sentiment analysis, and text mining techniques. It provides a real-world dataset for students and practitioners to apply data science skills in a meaningful context.
  7. Mental Health - Datasets - CTData.org

    • data.ctdata.org
    Updated Jun 24, 2016
    + more versions
    Cite
    (2016). Mental Health - Datasets - CTData.org [Dataset]. http://data.ctdata.org/dataset/mental-health
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mental Health reports the prevalence of mental illness in the past year by age range.

  8. Synthetic Healthcare Database for Research (SyH-DR)

    • catalog.data.gov
    • healthdata.gov
    • +2 more
    Updated Sep 16, 2023
    + more versions
    Cite
    Agency for Healthcare Research and Quality (2023). Synthetic Healthcare Database for Research (SyH-DR) [Dataset]. https://catalog.data.gov/dataset/synthetic-healthcare-database-for-research-syh-dr
    Dataset provided by
    Agency for Healthcare Research and Quality (http://www.ahrq.gov/)
    Description

    The Agency for Healthcare Research and Quality (AHRQ) created SyH-DR from eligibility and claims files for Medicare, Medicaid, and commercial insurance plans in calendar year 2016. SyH-DR contains data from a nationally representative sample of insured individuals for the 2016 calendar year. SyH-DR uses synthetic data elements at the claim level to resemble the marginal distribution of the original data elements. SyH-DR person-level data elements are not synthetic, but identifying information is aggregated or masked.

  9. Data from: THE RELEVANCY OF MASSIVE HEALTH EDUCATION IN THE BRAZILIAN PRISON...

    • zenodo.org
    csv, pdf
    Updated Jul 16, 2024
    Cite
    Janaína L. R. da S. Valentim; Sara Dias-Trindade; Eloiza da S. G. Oliveira; José A. M. Moreira; Felipe Fernandes; Manoel Honorio Romão; Philippi S. G. de Morais; Alexandre R. Caitano; Aline P. Dias; Carlos A. P. Oliveira; Karilany D. Coutinho; Ricardo B. Ceccim; Ricardo A. de M. Valentim (2024). THE RELEVANCY OF MASSIVE HEALTH EDUCATION IN THE BRAZILIAN PRISON SYSTEM: THE COURSE "HEALTH CARE FOR PEOPLE DEPRIVED OF FREEDOM" AND ITS IMPACTS [Dataset]. http://doi.org/10.5281/zenodo.6499752
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Janaína L. R. da S. Valentim; Sara Dias-Trindade; Eloiza da S. G. Oliveira; José A. M. Moreira; Felipe Fernandes; Manoel Honorio Romão; Philippi S. G. de Morais; Alexandre R. Caitano; Aline P. Dias; Carlos A. P. Oliveira; Karilany D. Coutinho; Ricardo B. Ceccim; Ricardo A. de M. Valentim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset name: asppl_dataset_v2.csv

    Version: 2.0

    Dataset period: 06/07/2018 - 01/14/2022

    Dataset Characteristics: Multivalued

    Number of Instances: 8118

    Number of Attributes: 9

    Missing Values: Yes

    Area(s): Health and education

    Sources:

    • Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);

    • Brazilian Occupational Classification (CBO) (Brasil, 2022b);

    • National Registry of Health Establishments (CNES) (Brasil, 2022c);

    • Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).

    Description: The data contained in the asppl_dataset_v2.csv dataset (see Table 1) originates from participants of the technology-based educational course “Health Care for People Deprived of Freedom.” The course is available on the AVASUS (Brasil, 2022a). This dataset provides elementary data for analyzing the course’s impact and reach and the profile of its participants. In addition, it updates the data presented in the work by Valentim et al. (2021).

    Table 1: Description of AVASUS dataset features.

    Each entry gives: attribute (datatype): description. Value.

    • gender (Categorical): Gender of the course participant. Value: Feminino / Masculino / Não Informado (in English: Female, Male or Uninformed).
    • course_progress (Numerical): Percentage of completion of the course. Value: range from 0 to 100.
    • course_evaluation (Numerical): A score given to the course by the participant. Value: 0, 1, 2, 3, 4, 5 or NaN.
    • evaluation_commentary (Categorical): Comment made by the participant about the course. Value: free text or NaN.
    • region (Categorical): Brazilian region in which the participant resides. Value: Brazilian region according to IBGE: Norte, Nordeste, Centro-Oeste, Sudeste or Sul (in English: North, Northeast, Midwest, Southeast or South).
    • CNES (Numerical): The CNES code refers to the health establishment where the participant works. Value: CNES code or NaN.
    • health_care_level (Categorical): Identification of the health care network level for which the course participant works. Value: “ATENCAO PRIMARIA”, “MEDIA COMPLEXIDADE”, “ALTA COMPLEXIDADE”, and their possible combinations (in English: "PRIMARY HEALTH CARE", "SECONDARY HEALTH CARE" and "TERTIARY HEALTH CARE").
    • year_enrollment (Numerical): Year in which the course participant registered. Value: year (YYYY).
    • CBO (Categorical): Participant occupation. Value: text coded according to the Brazilian Classification of Occupations or “Indivíduo sem afiliação formal.” (in English: “Individual without formal affiliation.”)

    Dataset name: prison_syphilis_and_population_brazil.csv

    Dataset period: 2017 - 2020

    Dataset Characteristics: Multivalued

    Number of Instances: 6

    Number of Attributes: 13

    Missing Values: No

    Source:

    • National Penitentiary Department (DEPEN) (Brasil, 2022d);

    Description: The data contained in the prison_syphilis_and_population_brazil.csv dataset (see Table 2) originate from the National Penitentiary Department Information System (SISDEPEN) (Brasil, 2022d). This dataset provides data on the population and prevalence of syphilis in the Brazilian prison system. In addition, it includes a rate that represents the normalized data for purposes of comparison between the populations of each region and Brazil.

    Table 2: Description of DEPEN dataset Features.

    Each entry gives: attribute (datatype): description. Value.

    • Region (Categorical): Brazilian region in which the participant resides. In addition, the sum of the regions, which refers to Brazil. Value: Brazil and Brazilian region according to IBGE: North, Northeast, Midwest, Southeast or South.
    • syphilis_2017 (Numerical): Number of syphilis cases in the prison system in 2017. Value: number of syphilis cases.
    • syphilis_rate_2017 (Numerical): Normalized rate of syphilis cases in 2017. Value: syphilis case rate.
    • syphilis_2018 (Numerical): Number of syphilis cases in the prison system in 2018. Value: number of syphilis cases.
    • syphilis_rate_2018 (Numerical): Normalized rate of syphilis cases in 2018. Value: syphilis case rate.
    • syphilis_2019 (Numerical): Number of syphilis cases in the prison system in 2019. Value: number of syphilis cases.
    • syphilis_rate_2019 (Numerical): Normalized rate of syphilis cases in 2019. Value: syphilis case rate.
    • syphilis_2020 (Numerical): Number of syphilis cases in the prison system in 2020. Value: number of syphilis cases.
    • syphilis_rate_2020 (Numerical): Normalized rate of syphilis cases in 2020. Value: syphilis case rate.
    • pop_2017 (Numerical): Prison population in 2017. Value: population number.
    • pop_2018 (Numerical): Prison population in 2018. Value: population number.
    • pop_2019 (Numerical): Prison population in 2019. Value: population number.
    • pop_2020 (Numerical): Prison population in 2020. Value: population number.

    Dataset name: students_cumulative_sum.csv

    Dataset period: 2018 - 2020

    Dataset Characteristics: Multivalued

    Number of Instances: 6

    Number of Attributes: 7

    Missing Values: No

    Source:

    • Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);

    • Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).

    Description: The data contained in the students_cumulative_sum.csv dataset (see Table 3) originate mainly from AVASUS (Brasil, 2022a). This dataset provides data on the number of students by region and year. In addition, it includes a rate that represents the normalized data for purposes of comparison between the populations of each region and Brazil. We used population data estimated by the IBGE (Brasil, 2022e) to calculate the rate.

    Table 3: Description of Students dataset Features.

  10. Synthetic Dataset of Emergency Healthcare Services

    • datarepositorium.uminho.pt
    • zenodo.org
    csv, txt
    Updated Jan 17, 2025
    + more versions
    Cite
    Repositório de Dados da Universidade do Minho (2025). Synthetic Dataset of Emergency Healthcare Services [Dataset]. http://doi.org/10.34622/datarepositorium/AKSZQG
    Available download formats: csv (1259), txt (4064)
    Dataset provided by
    Repositório de Dados da Universidade do Minho
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Synthetic dataset of emergency services comprising several CSV files that we generated using simulation software. This dataset is open for public use; please cite our work if used in research or applications.

    File Overview

    • CheckBloodPressure.csv (9 KB): Contains blood pressure Server records of patients.
    • CheckPatientType.csv (19 KB): Identifies the type of each patient (e.g., 1 or 3).
    • Fill_Information.csv (2 KB): Fill information records for new patients.
    • MedicalRecord1.csv (10 KB): Medical record dataset for patient type 1.
    • MedicalRecord2.csv (4 KB): Medical record dataset for patient type 2.
    • MedicalRecord3.csv (2 KB): Medical record dataset for patient type 3.
    • MedicalRecord4.csv (13 KB): Medical record dataset for patient type 4.
    • OutPatientDepartment.csv (18 KB): Data related to the satisfaction and length of stay of a given patient.
    • Triage.csv (13 KB): Data related to the triage process.
    • README.txt (4 KB): Documentation of the dataset, including structure, metadata, and usage.

    Common Fields Across Files

    • Patient ID (Integer): Unique identifier for each patient.
    • Patient Type (Integer): Classification of patient (e.g., 1, 4).
    • Medical Records Arrival Time (DateTime): Timestamp of the patient's first arrival in the medical record department.
    • Exiting Time (DateTime): Timestamp when the patient exits a Server.
    • Waiting Time (min) (Real): Total waiting time before being attended to.
    • Resource Used (String): Resource (e.g., Operator) allocated to the patient.
    • Utilization % (Real): Utilization rate of the resource as a percentage.
    • Queue Count Before Processing (Integer): Number of patients in the queue before processing begins.
    • Queue Count After Processing (Integer): Number of patients in the queue after processing ends.
    • Queue Difference (Integer): Difference between the before and after queue counts.
    • Length of Stay (min) (Real): Total time spent in the simulation by the patient.
    • LOS without Queues (min) (Real): Length of stay excluding any queuing time.
    • Satisfaction % (Real): Patient satisfaction rating based on their experience.
    • New Patient? (String): Indicates if this is a new patient or a returning one.
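
    The two length-of-stay fields suggest a simple derived quantity: time spent queuing, taken as total length of stay minus length of stay without queues. The sketch below illustrates that arithmetic with Python's standard csv module. The sample rows are made up for illustration, and the assumption that one file carries these exact column headers together is ours, not the dataset's documentation.

```python
import csv
import io

# Made-up rows using column names from the field list above; the real
# files may name, combine, or order their columns differently.
SAMPLE_CSV = """Patient ID,Patient Type,Length of Stay (min),LOS without Queues (min)
1,1,42.5,30.0
2,3,18.0,18.0
3,1,65.0,40.0
"""

def queue_minutes(row):
    """Queuing time, assumed to be total LOS minus LOS without queues."""
    return float(row["Length of Stay (min)"]) - float(row["LOS without Queues (min)"])

def mean_queue_by_type(csv_text):
    """Average queuing time in minutes for each patient type."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ptype = int(row["Patient Type"])
        total, count = totals.get(ptype, (0.0, 0))
        totals[ptype] = (total + queue_minutes(row), count + 1)
    return {ptype: total / count for ptype, (total, count) in totals.items()}

result = mean_queue_by_type(SAMPLE_CSV)
print(result)
```

    To try this on a downloaded file, the same function could be fed the file's text instead of SAMPLE_CSV, after checking the actual headers against README.txt.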

  11. Disease and symptoms dataset 2023

    • data.mendeley.com
    Updated Mar 3, 2025
    Cite
    Bran Stark (2025). Disease and symptoms dataset 2023 [Dataset]. http://doi.org/10.17632/2cxccsxydc.1
    Dataset updated
    Mar 3, 2025
    Authors
    Bran Stark
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains disease names along with the symptoms experienced by the respective patient. There are a total of 773 unique diseases and 377 symptoms, with ~246,000 rows. The dataset was artificially generated, preserving Symptom Severity and Disease Occurrence Possibility. Several distinct groups of symptoms might all be indicators of the same disease. A row or sample may even contain a single symptom for a disease, which indicates a very high correlation between that symptom and that particular disease. A larger number of rows for a particular disease corresponds to its higher probability of occurrence in the real world. Similarly, if the feature vector in a row contains only a single symptom, that symptom is more strongly correlated with the disease than any one symptom in a multi-symptom feature vector from another sample.
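
    As a rough illustration of the two conventions described above, the following sketch estimates relative disease frequency from row counts and picks out single-symptom rows. The rows, disease names, and symptom names are invented for the example, and the representation of a row as a (disease, set of symptoms) pair is an assumption rather than the dataset's actual layout.

```python
from collections import Counter

# Hypothetical miniature of the table: one (disease, symptoms) pair per row.
ROWS = [
    ("flu", {"fever", "cough"}),
    ("flu", {"fever"}),
    ("flu", {"cough", "fatigue"}),
    ("migraine", {"headache"}),
]

def disease_frequencies(rows):
    """Share of rows per disease, read as a proxy for how common it is."""
    counts = Counter(disease for disease, _ in rows)
    total = sum(counts.values())
    return {disease: n / total for disease, n in counts.items()}

def single_symptom_rows(rows):
    """Rows with exactly one symptom, which the description treats as strong indicators."""
    return [(disease, next(iter(symptoms)))
            for disease, symptoms in rows if len(symptoms) == 1]

frequencies = disease_frequencies(ROWS)
strong_indicators = single_symptom_rows(ROWS)
print(frequencies, strong_indicators)
```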

  12. Data from: Associations between environmental quality and adult asthma...

    • catalog.data.gov
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Associations between environmental quality and adult asthma prevalence in medical claims data [Dataset]. https://catalog.data.gov/dataset/associations-between-environmental-quality-and-adult-asthma-prevalence-in-medical-claims-d
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency http://www.epa.gov/
    Description

    The MarketScan health claims database is a compilation of nearly 110 million patient records with information from more than 100 private insurance carriers and large self-insuring companies. Public forms of insurance (i.e., Medicare and Medicaid) are not included, nor are small (< 100 employees) or medium (1000 employees) companies. We excluded the relatively few (n=6735) individuals over 65 years of age because Medicare is the primary insurance of U.S. adults over 65.

    The EQI was constructed for 2000-2005 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain, while the overall EQI was constructed from a second PCA of these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files.

    This dataset is associated with the following publication: Gray, C., D. Lobdell, K. Rappazzo, Y. Jian, J. Jagai, L. Messer, A. Patel, S. Deflorio-Barker, C. Lyttle, J. Solway, and A. Rzhetsky. Associations between environmental quality and adult asthma prevalence in medical claims data. ENVIRONMENTAL RESEARCH. Elsevier B.V., Amsterdam, NETHERLANDS, 166: 529-536, (2018).

  13. Data from: OpenChart-SE: A corpus of artificial Swedish electronic health...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, pdf, txt
    Updated Jul 15, 2024
    Cite
    Johanna Berg; Carl Ollvik Aasa; Björn Appelgren Thorell; Sonja Aits (2024). OpenChart-SE: A corpus of artificial Swedish electronic health records for imagined emergency care patients written by physicians in a crowd-sourcing project [Dataset]. http://doi.org/10.5281/zenodo.7499831
    Available download formats: txt, csv, bin, pdf
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Johanna Berg; Carl Ollvik Aasa; Björn Appelgren Thorell; Sonja Aits
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Electronic health records (EHRs) are a rich source of information for medical research and public health monitoring. Information systems based on EHR data could also assist in patient care and hospital management. However, much of the data in EHRs is in the form of unstructured text, which is difficult to process for analysis. Natural language processing (NLP), a form of artificial intelligence, has the potential to enable automatic extraction of information from EHRs and several NLP tools adapted to the style of clinical writing have been developed for English and other major languages. In contrast, the development of NLP tools for less widely spoken languages such as Swedish has lagged behind. A major bottleneck in the development of NLP tools is the restricted access to EHRs due to legitimate patient privacy concerns. To overcome this issue we have generated a citizen science platform for collecting artificial Swedish EHRs with the help of Swedish physicians and medical students. These artificial EHRs describe imagined but plausible emergency care patients in a style that closely resembles EHRs used in emergency departments in Sweden. In the pilot phase, we collected a first batch of 50 artificial EHRs, which has passed review by an experienced Swedish emergency care physician. We make this dataset publicly available as OpenChart-SE corpus (version 1) under an open-source license for the NLP research community. The project is now open for general participation and Swedish physicians and medical students are invited to submit EHRs on the project website (https://github.com/Aitslab/openchart-se), where additional batches of quality-controlled EHRs will be released periodically.

    Dataset content

    OpenChart-SE, version 1 corpus (txt files and dataset.csv)

    The OpenChart-SE corpus, version 1, contains 50 artificial EHRs (note that the numbering starts with 5 as 1-4 were test cases that were not suitable for publication). The EHRs are available in two formats: structured as a .csv file, and as separate text files for annotation. Note that flaws in the data were not cleaned up, so that the corpus simulates what could be encountered when working with data from different EHR systems. All charts have been checked for medical validity by a resident in Emergency Medicine at a Swedish hospital before publication.

    Codebook.xlsx

    The codebook contains information about each variable used. It is in XLSForm format, which can be reused in several different applications for data collection.

    suppl_data_1_openchart-se_form.pdf

    OpenChart-SE mock emergency care EHR form.

    suppl_data_3_openchart-se_dataexploration.ipynb

    This Jupyter notebook contains the code and results from the analysis of the OpenChart-SE corpus.

    More details about the project and information on the upcoming preprint accompanying the dataset can be found on the project website (https://github.com/Aitslab/openchart-se).

  14. A dataset of anonymised hospitalised COVID-19 patient data: outcomes,...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 29, 2022
    Cite
    Stopard, Isaac J (2022). A dataset of anonymised hospitalised COVID-19 patient data: outcomes, demographics and biomarker measurements for two New York hospitals [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6771833
    Dataset updated
    Jun 29, 2022
    Dataset provided by
    Lambert, Ben
    Zuretti, Alejandro
    Momeni-Boroujeni
    Mendoza, Rachelle
    Stopard, Isaac J
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New York
    Description

    These datasets are for a cohort of n=1540 anonymised hospitalised COVID-19 patients, and the data provide information on outcomes (i.e. patient death or discharge), demographics and biomarker measurements for two New York hospitals: State University of New York (SUNY) Downstate Health Sciences University and Maimonides Medical Center.

    The file "demographics_both_hospitals.csv" contains the ultimate outcomes of hospitalisation (whether a patient was discharged or died), demographic information and known comorbidities for each of the patients.

    The file "dynamics_clean_both_hospitals.csv" contains cleaned dynamic biomarker measurements for the n=1233 patients where this information was available and the data passed our various checks (see https://doi.org/10.1101/2021.11.12.21266248 for information of these checks and the cleaning process). Patients can be matched to demographic data via the "id" column.
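
    Matching the two files on the "id" column could look like the sketch below, which uses only Python's standard csv module. The "id" column comes from the description above; every other column name and all of the rows are invented for illustration, so the real headers should be checked before reuse.

```python
import csv
import io

# Invented stand-ins for demographics_both_hospitals.csv and
# dynamics_clean_both_hospitals.csv; only the "id" column is documented.
DEMOGRAPHICS = """id,outcome
p1,discharged
p2,died
p3,discharged
"""

DYNAMICS = """id,biomarker,value
p1,crp,12.0
p1,crp,8.5
p3,crp,30.1
"""

def attach_outcomes(demographics_text, dynamics_text):
    """Add each patient's demographic fields to their biomarker measurements."""
    by_id = {row["id"]: row for row in csv.DictReader(io.StringIO(demographics_text))}
    merged = []
    for measurement in csv.DictReader(io.StringIO(dynamics_text)):
        patient = by_id.get(measurement["id"])
        if patient is not None:  # skip measurements with no demographic record
            merged.append({**patient, **measurement})
    return merged

merged = attach_outcomes(DEMOGRAPHICS, DYNAMICS)
for row in merged:
    print(row)
```

    Patients without biomarker rows (p2 in this toy data) simply do not appear in the result, which mirrors the description: dynamic measurements exist for only a subset of the cohort.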

    Study approval and data collection

    Study approval was obtained from the State University of New York (SUNY) Downstate Health Sciences University Institutional Review Board (IRB#1595271-1) and Maimonides Medical Center Institutional Review Board/Research Committee (IRB#2020-05-07). A retrospective query was performed among the patients who were admitted to SUNY Downstate Medical Center and Maimonides Medical Center with COVID-19-related symptoms, and whose infection was subsequently confirmed by RT-PCR, from the beginning of February 2020 until the end of May 2020. Stratified randomization was used to select at least 500 patients who were discharged and 500 patients who died due to the complications of COVID-19. Patient outcome was recorded as a binary choice of “discharged” versus “COVID-19 related mortality”. Patients whose outcome was unknown were excluded. Demographic, clinical history and laboratory data were extracted from the hospitals’ electronic health records.

  15. Dataset for MH Journal article.csv

    • figshare.com
    txt
    Updated Nov 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sandrine Ingabire (2023). Dataset for MH Journal article.csv [Dataset]. http://doi.org/10.6084/m9.figshare.24550777.v1
    Available download formats: txt
    Dataset updated
    Nov 13, 2023
    Dataset provided by
    figshare
    Authors
    Sandrine Ingabire
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this paper, I have assessed the perceptions and attitudes of college students towards formal mental health support including campus-based mental healthcare services in Rwanda.

  16. Healthcare Real-Time Risk Dataset

    • kaggle.com
    Updated Apr 25, 2025
    Cite
    Alekhya Abburi (2025). Healthcare Real-Time Risk Dataset [Dataset]. https://www.kaggle.com/datasets/alekhyaabburi/healthcare
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    Alekhya Abburi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains synthetically generated real-time healthcare data intended for machine learning applications such as patient risk prediction, early diagnosis modeling, and health outcome analysis. The dataset simulates 1,000 patient records with 10 meaningful medical and demographic attributes including age, gender, heart rate, blood pressure, respiratory rate, oxygen saturation, glucose levels, smoking status, diabetes history, and a computed health risk score.

  17. EHR Dataset for Patient Treatment Classification

    • data.mendeley.com
    • paperswithcode.com
    Updated May 10, 2020
    Cite
    Mujiono Sadikin (2020). EHR Dataset for Patient Treatment Classification [Dataset]. http://doi.org/10.17632/7kv3rctx7m.1
    Dataset updated
    May 10, 2020
    Authors
    Mujiono Sadikin
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of electronic health records collected from a private hospital in Indonesia. It contains patients' laboratory test results, which are used to determine whether a patient's next treatment should be in-care or out-care. The task associated with the dataset is classification.

  18. Healthcare Management System

    • kaggle.com
    Updated Dec 23, 2023
    Cite
    Anouska Abhisikta (2023). Healthcare Management System [Dataset]. https://www.kaggle.com/datasets/anouskaabhisikta/healthcare-management-system
    Dataset updated
    Dec 23, 2023
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    Anouska Abhisikta
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Patients Table:

    • PatientID: Unique identifier for each patient.
    • firstname: First name of the patient.
    • lastname: Last name of the patient.
    • email: Email address of the patient.

    This table stores information about individual patients, including their names and contact details.

    Doctors Table:

    • DoctorID: Unique identifier for each doctor.
    • DoctorName: Full name of the doctor.
    • Specialization: Area of medical specialization.
    • DoctorContact: Contact details of the doctor.

    This table contains details about healthcare providers, including their names, specializations, and contact information.

    Appointments Table:

    • AppointmentID: Unique identifier for each appointment.
    • Date: Date of the appointment.
    • Time: Time of the appointment.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the appointment.
    • DoctorID: Foreign key referencing the Doctors table, indicating the doctor for the appointment.

    This table records scheduled appointments, linking patients to doctors.

    MedicalProcedure Table:

    • ProcedureID: Unique identifier for each medical procedure.
    • ProcedureName: Name or description of the medical procedure.
    • AppointmentID: Foreign key referencing the Appointments table, indicating the appointment associated with the procedure.

    This table stores details about medical procedures associated with specific appointments.

    Billing Table:

    • InvoiceID: Unique identifier for each billing transaction.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the billing transaction.
    • Items: Description of items or services billed.
    • Amount: Amount charged for the billing transaction.

    This table maintains records of billing transactions, associating them with specific patients.

    demo Table:

    • ID: Primary key, serves as a unique identifier for each record.
    • Name: Name of the entity.
    • Hint: Additional information or hint about the entity.

    This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.

    This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
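
    The relationships described above can be sketched as an in-memory SQLite database using Python's standard sqlite3 module. Table and column names follow the description; the column types, the primary-key choices, the inserted rows, and the helper names build_demo_db and procedures_with_people are assumptions made for illustration. The demo table is left out because the description suggests it is unrelated.

```python
import sqlite3

# Column types are assumed; the description gives names and keys only.
SCHEMA = """
CREATE TABLE Patients (
    PatientID INTEGER PRIMARY KEY,
    firstname TEXT, lastname TEXT, email TEXT
);
CREATE TABLE Doctors (
    DoctorID INTEGER PRIMARY KEY,
    DoctorName TEXT, Specialization TEXT, DoctorContact TEXT
);
CREATE TABLE Appointments (
    AppointmentID INTEGER PRIMARY KEY,
    "Date" TEXT, "Time" TEXT,
    PatientID INTEGER REFERENCES Patients(PatientID),
    DoctorID INTEGER REFERENCES Doctors(DoctorID)
);
CREATE TABLE MedicalProcedure (
    ProcedureID INTEGER PRIMARY KEY,
    ProcedureName TEXT,
    AppointmentID INTEGER REFERENCES Appointments(AppointmentID)
);
CREATE TABLE Billing (
    InvoiceID INTEGER PRIMARY KEY,
    PatientID INTEGER REFERENCES Patients(PatientID),
    Items TEXT, Amount REAL
);
"""

def build_demo_db():
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys unless enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Patients VALUES (1, 'Ada', 'Example', 'ada@example.org')")
    conn.execute("INSERT INTO Doctors VALUES (1, 'Dr. Sample', 'Radiology', '555-0100')")
    conn.execute("INSERT INTO Appointments VALUES (1, '2024-01-01', '09:00', 1, 1)")
    conn.execute("INSERT INTO MedicalProcedure VALUES (1, 'X-ray', 1)")
    conn.execute("INSERT INTO Billing VALUES (1, 1, 'X-ray', 120.0)")
    return conn

def procedures_with_people(conn):
    """Follow the foreign keys from each procedure to its patient and doctor."""
    query = """
        SELECT p.firstname, d.DoctorName, m.ProcedureName
        FROM MedicalProcedure AS m
        JOIN Appointments AS a ON a.AppointmentID = m.AppointmentID
        JOIN Patients AS p ON p.PatientID = a.PatientID
        JOIN Doctors AS d ON d.DoctorID = a.DoctorID
        ORDER BY m.ProcedureID
    """
    return conn.execute(query).fetchall()

conn = build_demo_db()
print(procedures_with_people(conn))
```

    With foreign keys enabled, inserting an appointment for a PatientID that does not exist in Patients is rejected, which is the behaviour the foreign-key descriptions imply.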

  19. MIMIC-III Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Apr 20, 2022
    Cite
    Alistair E.W. Johnson; Tom J. Pollard; Lu Shen; Li-wei H. Lehman; Mengling Feng; Mohammad Ghassemi; Benjamin Moody; Peter Szolovits; Leo Anthony Celi; Roger G. Mark (2022). MIMIC-III Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iii
    Dataset updated
    Apr 20, 2022
    Authors
    Alistair E.W. Johnson; Tom J. Pollard; Lu Shen; Li-wei H. Lehman; Mengling Feng; Mohammad Ghassemi; Benjamin Moody; Peter Szolovits; Leo Anthony Celi; Roger G. Mark
    Description

    The Medical Information Mart for Intensive Care III (MIMIC-III) dataset is a large, de-identified and publicly available collection of medical records. Each record in the dataset includes ICD-9 codes, which identify diagnoses and procedures performed. Each code is partitioned into sub-codes, which often include specific circumstantial details. The dataset consists of 112,000 clinical report records (average length 709.3 tokens) and 1,159 top-level ICD-9 codes. Each report is assigned 7.6 codes on average. Data include vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more.

    The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.

  20. Dataset of mHealth event logs

    • figshare.com
    pdf
    Updated May 1, 2022
    Cite
    Raoul Nuijten; Pieter Van Gorp (2022). Dataset of mHealth event logs [Dataset]. http://doi.org/10.6084/m9.figshare.19688730.v2
    Available download formats: pdf
    Dataset updated
    May 1, 2022
    Dataset provided by
    figshare
    Authors
    Raoul Nuijten; Pieter Van Gorp
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    How does Facebook always seem to know what the next funny video should be to sustain your attention on the platform? Facebook has not asked you whether you like videos of cats doing something funny: it just seems to know. In fact, Facebook learns from your behavior on the platform (e.g., how long you have engaged with similar videos, what posts you have previously liked or commented on, etc.). As a result, Facebook is able to sustain the attention of its users for a long time. Typical mHealth apps, on the other hand, suffer from rapidly collapsing user engagement levels. To sustain engagement, mHealth apps nowadays employ all sorts of intervention strategies. Of course, it would be powerful to know, as Facebook does, which strategy should be presented to which individual to sustain their engagement. A first step towards that could be to cluster similar users (and then derive intervention strategies from there). This dataset was collected through a single mHealth app over 8 different mHealth campaigns (i.e., scientific studies). Using this dataset, one could derive clusters from app user event data. One approach could be to differentiate between two phases: a process mining phase and a clustering phase. In the process mining phase, one may derive from the dataset the processes (i.e., sequences of app actions) that users undertake. In the clustering phase, based on the processes different users engaged in, one may cluster similar users (i.e., users that perform similar sequences of app actions).
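
    A toy version of the two phases might look like the sketch below. It is not the authors' method: the event tuples, user ids, and action names are invented, the "process mining" step is reduced to ordering each user's actions by timestamp, and the "clustering" step is a crude stand-in that only groups users whose action sequences are identical.

```python
from collections import defaultdict

# Invented (user, timestamp, action) events; the real files use their own
# variables, listed in 0-list-of-variables.pdf.
EVENTS = [
    ("u1", 1, "login"), ("u1", 2, "view_task"), ("u1", 3, "submit"),
    ("u2", 1, "login"), ("u2", 5, "view_task"), ("u2", 9, "submit"),
    ("u3", 2, "login"), ("u3", 3, "logout"),
]

def action_sequences(events):
    """Phase 1 (simplified): the ordered sequence of actions for each user."""
    by_user = defaultdict(list)
    for user, _timestamp, action in sorted(events, key=lambda e: (e[0], e[1])):
        by_user[user].append(action)
    return {user: tuple(actions) for user, actions in by_user.items()}

def group_by_sequence(sequences):
    """Phase 2 (crude stand-in for clustering): users with identical sequences."""
    groups = defaultdict(list)
    for user, sequence in sequences.items():
        groups[sequence].append(user)
    return sorted(sorted(users) for users in groups.values())

clusters = group_by_sequence(action_sequences(EVENTS))
print(clusters)
```

    A more realistic clustering phase would compare sequences with a similarity measure (for example an edit distance) rather than exact equality, so that users with nearly identical behaviour land in the same group.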

    List of files

    • 0-list-of-variables.pdf includes an overview of different variables within the dataset.
    • 1-description-of-endpoints.pdf includes a description of the unique endpoints that appear in the dataset.
    • 2-requests.csv includes the dataset with actual app user event data.
    • 2-requests-by-session.csv includes the dataset with actual app user event data with a session variable, to differentiate between user requests that were made in the same session.
