40 datasets found
  1. data structure definition d024 2017 43 298 d021 contracts

    • dataverse.harvard.edu
    • datasetcatalog.nlm.nih.gov
    Updated Oct 25, 2017
    Cite
    Christopher Felker (2017). data structure definition d024 2017 43 298 d021 contracts [Dataset]. http://doi.org/10.7910/DVN/RPYWDG
    Dataset updated
    Oct 25, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Christopher Felker
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Set of structural metadata associated to a data set, which includes information about how concepts are associated with the measures, dimensions, and attributes of a data cube, along with information about the representation of data and related descriptive metadata. A DSD defines the structure of an organised collection of data (Data Set) by means of concepts with specific roles, and their representation.
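
    As a minimal sketch of the idea described above, a DSD can be modelled as a list of components, each tying a concept to a role (dimension, measure, or attribute) and a representation. The class names, the identifier `EXAMPLE_DSD`, and the concept names below are hypothetical illustrations, not contents of this dataset.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    concept: str          # the concept this component refers to
    role: str             # "dimension", "measure", or "attribute"
    representation: str   # how values are represented (code list, type, ...)


@dataclass
class DataStructureDefinition:
    id: str
    components: list[Component] = field(default_factory=list)

    def by_role(self, role: str) -> list[str]:
        """Return the concepts that play the given role, in declared order."""
        return [c.concept for c in self.components if c.role == role]

    def describes(self, observation: dict) -> bool:
        """True if the observation supplies every dimension and measure."""
        required = self.by_role("dimension") + self.by_role("measure")
        return all(key in observation for key in required)


# Hypothetical structure: two dimensions, one measure, one optional attribute.
dsd = DataStructureDefinition(
    id="EXAMPLE_DSD",
    components=[
        Component("REF_AREA", "dimension", "code list"),
        Component("TIME_PERIOD", "dimension", "ISO 8601 period"),
        Component("OBS_VALUE", "measure", "decimal"),
        Component("UNIT_MEASURE", "attribute", "code list"),
    ],
)

observation = {"REF_AREA": "US", "TIME_PERIOD": "2017", "OBS_VALUE": 43.0}
print(dsd.by_role("dimension"))
print(dsd.describes(observation))
```

    Treating attributes as optional in `describes` is a simplification made for this sketch; a real DSD can mark attributes as mandatory.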

  2. FHIR-Profiles-Resources

    • kaggle.com
    zip
    Updated Aug 1, 2023
    Cite
    fhirfly (2023). FHIR-Profiles-Resources [Dataset]. https://www.kaggle.com/datasets/fhirfly/fhirr4
    Available download formats: zip (3709939 bytes)
    Dataset updated
    Aug 1, 2023
    Authors
    fhirfly
    Description

    Kaggle Card: FHIR Profiles-Resources JSON File

    Overview
    Fast Healthcare Interoperability Resources (FHIR, pronounced "fire") is a standard developed by Health Level Seven International (HL7) for transferring electronic health records. The FHIR Profiles-Resources JSON file is an essential part of this standard. It provides a schema that defines the structure of FHIR resource types, including their properties and attributes.

    Dataset Structure
    This file is structured in the JSON format, known for its versatility and human-readable nature. Each JSON object corresponds to a unique FHIR resource type, outlining its structure and providing a blueprint for the properties and attributes each resource type should contain.

    Fields Description
    While the precise properties and attributes differ for each FHIR resource type, the typical elements you may encounter in this file include:

    • Id: The unique identifier for the resource type.
    • Url: A global identifier URI for the resource type.
    • Version: The business version of the resource.
    • Name: The human-readable name for the resource type.
    • Status: The publication status of the resource (draft, active, retired).
    • Experimental: A boolean value indicating whether this resource type is experimental.
    • Date: The date of the resource type's last change.
    • Publisher: The individual or organization that published the resource type.
    • Contact: Contact details for the publishers.
    • Description: A natural language description of the resource type.
    • UseContext: A list outlining the usability context for the resource type.
    • Jurisdiction: Identifies the region/country where the resource type is defined.
    • Purpose: An explanation of why the resource type is necessary.
    • Element: A list defining the structure of the properties for the resource type, including data types and relationships with other resource types.

    Potential Use Cases

    • Schema Validation: Use the schema to validate FHIR data and ensure it aligns with the defined structure and types for each resource.
    • Interoperability: Facilitate the exchange of healthcare information with other FHIR-compatible systems by providing a standardized structure.
    • Data Mapping: Utilize the schema to map data from other formats into the FHIR format, or vice versa.
    • System Design: Aid the design and development of healthcare systems by offering a template for data structure.
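
    The field list above suggests a simple sanity check that could be run on each definition before relying on it. The sketch below is illustrative only: the inline `sample` record is invented, the lowercase key spelling follows FHIR's JSON convention, and the set of allowed statuses is limited to the three values named in the description. The actual layout of the Kaggle file should be inspected before adapting this.

```python
import json

# Status values named in the dataset description above.
ALLOWED_STATUS = {"draft", "active", "retired"}


def check_definition(definition: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for key in ("id", "url", "name", "status"):
        if key not in definition:
            problems.append(f"missing field: {key}")
    status = definition.get("status")
    if status is not None and status not in ALLOWED_STATUS:
        problems.append(f"unexpected status: {status}")
    if "experimental" in definition and not isinstance(definition["experimental"], bool):
        problems.append("experimental must be a boolean")
    return problems


# Hypothetical, heavily trimmed record used only to exercise the check.
sample = json.loads("""
{
  "id": "Patient",
  "url": "http://hl7.org/fhir/StructureDefinition/Patient",
  "name": "Patient",
  "status": "active",
  "experimental": false
}
""")

print(check_definition(sample))
print(check_definition({"id": "X", "status": "bogus"}))
```

    This checks only the presence and basic type of a few descriptive fields; validating instance data against the element definitions is a much larger task and is not attempted here.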

  3. Data from: Public Health Departments

    • nc-onemap-2-nconemap.hub.arcgis.com
    • nconemap.gov
    • +3 more
    Updated Feb 19, 2010
    Cite
    NC OneMap / State of North Carolina (2010). Public Health Departments [Dataset]. https://nc-onemap-2-nconemap.hub.arcgis.com/datasets/public-health-departments
    Dataset updated
    Feb 19, 2010
    Dataset authored and provided by
    NC OneMap / State of North Carolina
    License

    https://www.nconemap.gov/pages/terms

    Description

    State and Local Public Health Departments

    Governmental public health departments are responsible for creating and maintaining conditions that keep people healthy. A local health department may be locally governed, part of a region or district, be an office or an administrative unit of the state health department, or a hybrid of these. Furthermore, each community has a unique "public health system" comprising individuals and public and private entities that are engaged in activities that affect the public's health. (Excerpted from the Operational Definition of a functional local health department, National Association of County and City Health Officials, November 2005) Please reference http://www.naccho.org/topics/infrastructure/accreditation/upload/OperationalDefinitionBrochure-2.pdf for more information.

    Facilities involved in direct patient care are intended to be excluded from this dataset; however, some of the entities represented in this dataset serve as both administrative and clinical locations. This dataset only includes the headquarters of Public Health Departments, not their satellite offices. Some health departments encompass multiple counties; therefore, not every county will be represented by an individual record. Also, some areas will appear to be overrepresented, depending on the structure of the health departments in that particular region. Visiting nurses are represented in this dataset if they are contracted through the local government to fulfill the duties and responsibilities of the local health organization. Effort was made by TechniGraphics to verify whether or not each health department tracks statistics on communicable diseases. Records with "-DOD" appended to the end of the [NAME] value are located on a military base, as defined by the Defense Installation Spatial Data Infrastructure (DISDI) military installations and military range boundaries.

    "#" and "*" characters were automatically removed from standard fields populated by TechniGraphics. Double spaces were replaced by single spaces in these same fields. Text fields in this dataset have been set to all upper case to facilitate consistent database engine search results. All diacritics (e.g., the German umlaut or the Spanish tilde) have been replaced with their closest equivalent English character to facilitate use with database systems that may not support diacritics. The currentness of this dataset is indicated by the [CONTDATE] field. Based on this field, the oldest record dates from 11/25/2009 and the newest record dates from 12/28/2009.

  4. Health Care Data Set ( 20+ Tables )

    • kaggle.com
    zip
    Updated Nov 1, 2025
    Cite
    Moid Ahmed (2025). Health Care Data Set ( 20+ Tables ) [Dataset]. https://www.kaggle.com/datasets/moid1234/health-care-data-set-20-tables
    Available download formats: zip (2540688774 bytes)
    Dataset updated
    Nov 1, 2025
    Authors
    Moid Ahmed
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    NOTE: Please Read Text File named "ERD Relationship Text" for Detailed Information.

    This dataset represents a complete healthcare management system modeled as a relational database containing over 20 interlinked tables. It captures the entire lifecycle of healthcare operations from patient registration to diagnosis, treatment, billing, inventory, and vendor management. The data structure is designed to simulate a real-world hospital information system (HIS), enabling advanced analytics, data modeling, and visualization. You can easily visualize and explore the schema using tools like dbdiagram.io by pasting the provided table definitions.

    The dataset covers multiple operational areas of a hospital including patient information, clinical operations, financial transactions, human resources, and logistics.

    Patient Information includes personal, contact, and emergency details, along with identification and insurance. Clinical Operations include visits, appointments, diagnoses, treatments, and medications. Financial Transactions cover bills, payments, and vendor settlements. Human Resources include staff details, departments, and medical teams. Logistics and Inventory include equipment, medicines, supplies, and vendor relationships.

    • Patients (STG_EHP_PATN) are linked to Appointments, Visits, Diagnoses, Treatments, Bills, and Insurance Policies.
    • Medical Teams (STG_EHP_MEDT) connect Staff with Visits and Treatments.
    • Allergies and Patient Allergies tables track patient-specific allergy information.
    • Financial tables (Bills, Payments, Vendor Payments) are interconnected through reference numbers for consistent transaction tracing.
    • Inventory tables record medicine and equipment stock movements, supply receipts, and vendor sourcing.

    This dataset can be used for data modeling and SQL practice for complex joins and normalization, healthcare analytics projects involving cost analysis, treatment efficiency, and patient demographics, visualization projects in Power BI, Tableau, or Domo for operational insights, building ETL pipelines and data warehouse models for healthcare systems, and machine learning applications such as predicting patient readmission, billing anomalies, or treatment outcomes.

    To explore the data relationships visually, go to dbdiagram.io, paste the entire provided schema code, and press 2 then 1 (or 2 and Enter) to auto-align the diagram. You’ll see an interactive Entity Relationship Diagram (ERD) representing the entire healthcare ecosystem.

    Total Tables: 20+
    Total Columns: 200+
    Primary Focus: Patient Management, Clinical Operations, Billing, and Supply Chain
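
    To illustrate the kind of join practice the description mentions, here is a small sketch using Python's built-in `sqlite3` module. Only the table name `STG_EHP_PATN` comes from the description; the `BILLS` table name, every column name, and all rows are assumptions made up for this example, so the real schema in the "ERD Relationship Text" file should be consulted before reuse.

```python
import sqlite3

# In-memory database with a hypothetical two-table slice of the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STG_EHP_PATN (
    patient_id INTEGER PRIMARY KEY,
    full_name  TEXT
);
CREATE TABLE BILLS (
    bill_id    INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES STG_EHP_PATN(patient_id),
    amount     REAL
);
INSERT INTO STG_EHP_PATN VALUES (1, 'PATIENT A'), (2, 'PATIENT B');
INSERT INTO BILLS VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 50.0);
""")

# Number of bills and total billed amount per patient.
rows = conn.execute("""
SELECT p.full_name, COUNT(b.bill_id), SUM(b.amount)
FROM STG_EHP_PATN AS p
LEFT JOIN BILLS AS b ON b.patient_id = p.patient_id
GROUP BY p.patient_id
ORDER BY p.patient_id
""").fetchall()

for name, bill_count, total in rows:
    print(name, bill_count, total)
```

    A `LEFT JOIN` is used so that patients without any bills would still appear in the result, with a count of zero.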

  5. Freetext Dataset

    • dtechtive.com
    • find.data.gov.scot
    Updated May 10, 2023
    Cite
    BARTS HEALTH (2023). Freetext Dataset [Dataset]. https://dtechtive.com/datasets/26414
    Dataset updated
    May 10, 2023
    Dataset provided by
    BARTS HEALTH
    Description

    Locally defined dataset which contains information from unstructured data held against a patient record. These include freetext notes in the patient record as well as radiology reports and discharge letters.

  6. Meta data and supporting documentation

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    We include a description of the data sets in the meta-data as well as sample code and results from a simulated data set. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The R code is available online here: https://github.com/warrenjl/SpGPCW.

    Format:

    Abstract
    The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability
    Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    Description
    Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

    File format: R workspace file.

    Metadata (including data dictionary)
    • y: Vector of binary responses (1: preterm birth, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate).

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
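
    The weekly standardization described above (subtract the week's median, divide by the week's IQR) can be sketched as follows. This is not the authors' R code: it is a Python illustration on invented numbers, and the quartile convention (`method="inclusive"`) is an assumption, since the description does not say which IQR estimator was used.

```python
from statistics import median, quantiles


def standardize_week(exposures):
    """Standardize one week's exposures as (value - median) / IQR."""
    q1, _, q3 = quantiles(exposures, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(exposures)
    return [(value - med) / iqr for value in exposures]


def standardize_by_week(z):
    """z has one row per individual and one column per exposure week."""
    columns = [standardize_week(list(column)) for column in zip(*z)]
    # Transpose back so rows are individuals again.
    return [list(row) for row in zip(*columns)]


# Invented raw exposures: 5 individuals (rows) by 2 weeks (columns).
raw = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
    [5.0, 50.0],
]
standardized = standardize_by_week(raw)
for row in standardized:
    print(row)
```

    After this transformation the two weeks are on the same scale even though their raw magnitudes differ by a factor of ten, which is the point of standardizing week by week.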

  7. Table_1_A scalable and transparent data pipeline for AI-enabled health data...

    • figshare.com
    docx
    Updated Jul 30, 2024
    Cite
    Tuncay Namli; Ali Anıl Sınacı; Suat Gönül; Cristina Ruiz Herguido; Patricia Garcia-Canadilla; Adriana Modrego Muñoz; Arnau Valls Esteve; Gökçe Banu Laleci Ertürkmen (2024). Table_1_A scalable and transparent data pipeline for AI-enabled health data ecosystems.docx [Dataset]. http://doi.org/10.3389/fmed.2024.1393123.s001
    Available download formats: docx
    Dataset updated
    Jul 30, 2024
    Dataset provided by
    Frontiers
    Authors
    Tuncay Namli; Ali Anıl Sınacı; Suat Gönül; Cristina Ruiz Herguido; Patricia Garcia-Canadilla; Adriana Modrego Muñoz; Arnau Valls Esteve; Gökçe Banu Laleci Ertürkmen
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Introduction
    Transparency and traceability are essential for establishing trustworthy artificial intelligence (AI). The lack of transparency in the data preparation process is a significant obstacle in developing reliable AI systems which can lead to issues related to reproducibility, debugging AI models, bias and fairness, and compliance and regulation. We introduce a formal data preparation pipeline specification to improve upon the manual and error-prone data extraction processes used in AI and data analytics applications, with a focus on traceability.

    Methods
    We propose a declarative language to define the extraction of AI-ready datasets from health data adhering to a common data model, particularly those conforming to HL7 Fast Healthcare Interoperability Resources (FHIR). We utilize the FHIR profiling to develop a common data model tailored to an AI use case to enable the explicit declaration of the needed information such as phenotype and AI feature definitions. In our pipeline model, we convert complex, high-dimensional electronic health records data represented with irregular time series sampling to a flat structure by defining a target population, feature groups and final datasets. Our design considers the requirements of various AI use cases from different projects which lead to implementation of many feature types exhibiting intricate temporal relations.

    Results
    We implement a scalable and high-performant feature repository to execute the data preparation pipeline definitions. This software not only ensures reliable, fault-tolerant distributed processing to produce AI-ready datasets and their metadata including many statistics alongside, but also serve as a pluggable component of a decision support application based on a trained AI model during online prediction to automatically prepare feature values of individual entities. We deployed and tested the proposed methodology and the implementation in three different research projects. We present the developed FHIR profiles as a common data model, feature group definitions and feature definitions within a data preparation pipeline while training an AI model for “predicting complications after cardiac surgeries”.

    Discussion
    Through the implementation across various pilot use cases, it has been demonstrated that our framework possesses the necessary breadth and flexibility to define a diverse array of features, each tailored to specific temporal and contextual criteria.

  8. Supplementary Dataset for “Revealing Principal Components, Patterns, and...

    • figshare.com
    xlsx
    Updated Aug 29, 2025
    Cite
    Adel A. Nasser; Mijahed Nasser Aljober; Abed Saif Ahmed Alghawli; Amani A. K. Elsayed (2025). Supplementary Dataset for “Revealing Principal Components, Patterns, and Structural Gaps in Health Security among High-Income Countries” [Dataset]. http://doi.org/10.6084/m9.figshare.29582498.v2
    Available download formats: xlsx
    Dataset updated
    Aug 29, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Adel A. Nasser; Mijahed Nasser Aljober; Abed Saif Ahmed Alghawli; Amani A. K. Elsayed
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This supplementary file provides comprehensive support for the findings and methodology presented in the study. It includes detailed outputs from the Principal Component Analysis (PCA), such as factor loadings, eigenvalues, and the percentage of variance explained, along with a full classification of the 37 Global Health Security Index (GHSI) indicators across the nine identified principal components. Additionally, it contains visualizations and datasets for all three clustering scenarios: one based on countries’ average scores across the nine extracted components, another using the 13 high-loading indicators from the first principal component, and a third based on aggregated scores from the six original GHSI categories. The file also presents the resulting cluster centroids, validation comparisons, and identified performance patterns. Together, these materials strengthen the credibility of the analytical approach and ensure transparency for replication, deeper analysis, and peer validation. All data are integrated into a single Excel-based tool that includes the underlying values used to generate the study’s tables and figures. This supplementary resource serves as a detailed and practical reference to replicate the study’s procedures and validate its results.
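
    For readers unfamiliar with how eigenvalues relate to the "percentage of variance explained" reported in PCA output, the arithmetic is sketched below. The eigenvalues are invented placeholders, not values from the supplementary file.

```python
from itertools import accumulate


def variance_explained(eigenvalues):
    """Percentage of total variance attributed to each component."""
    total = sum(eigenvalues)
    return [100.0 * value / total for value in eigenvalues]


def cumulative_variance(eigenvalues):
    """Running total of the percentages, in component order."""
    return list(accumulate(variance_explained(eigenvalues)))


# Hypothetical eigenvalues for four components, largest first.
example_eigenvalues = [4.0, 2.0, 1.0, 1.0]
print(variance_explained(example_eigenvalues))
print(cumulative_variance(example_eigenvalues))
```

    The cumulative series is what is typically consulted when deciding how many components to retain.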

  9. AI Training Dataset In Healthcare Market Analysis, Size, and Forecast...

    • technavio.com
    pdf
    Updated Oct 9, 2025
    Cite
    Technavio (2025). AI Training Dataset In Healthcare Market Analysis, Size, and Forecast 2025-2029 : North America (US, Canada, and Mexico), Europe (Germany, UK, France, Italy, The Netherlands, and Spain), APAC (China, Japan, India, South Korea, Australia, and Indonesia), South America (Brazil, Argentina, and Colombia), Middle East and Africa (UAE, South Africa, and Turkey), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/ai-training-dataset-in-healthcare-market-industry-analysis
    Available download formats: pdf
    Dataset updated
    Oct 9, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    Canada, United States
    Description

    AI Training Dataset In Healthcare Market Size 2025-2029

    The AI training dataset in healthcare market size is forecast to increase by USD 829.0 million, at a CAGR of 23.5% between 2024 and 2029.
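
    The relationship between a compound annual growth rate and an absolute increase can be made concrete with a short calculation. The sketch below assumes five annual compounding periods (2024 to 2029) and that the USD 829.0 million increase is measured from the 2024 base. Neither assumption is stated in the listing, so the implied base is an illustration of the formula rather than a reported figure.

```python
def cagr(start, end, years):
    """Compound annual growth rate between a start and an end value."""
    return (end / start) ** (1 / years) - 1


def implied_base(increase, rate, years):
    """Start value for which growing at `rate` for `years` adds `increase`."""
    return increase / ((1 + rate) ** years - 1)


YEARS = 5            # assumed: 2024 -> 2029
RATE = 0.235         # CAGR quoted in the listing
INCREASE = 829.0     # USD million, quoted in the listing

base_2024 = implied_base(INCREASE, RATE, YEARS)
size_2029 = base_2024 + INCREASE

print(f"implied 2024 base: {base_2024:.1f} (USD million)")
print(f"implied 2029 size: {size_2029:.1f} (USD million)")
print(f"round-trip CAGR:   {cagr(base_2024, size_2029, YEARS):.3f}")
```

    The round-trip line simply confirms that the two helper functions are inverses of one another under the stated assumptions.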

    The global AI training dataset in healthcare market is driven by the expanding integration of artificial intelligence and machine learning across the healthcare and pharmaceutical sectors. This technological shift necessitates high-quality, domain-specific data for applications ranging from AI in medical imaging to clinical operations. A key trend involves the adoption of synthetic data generation, which uses techniques like generative adversarial networks to create realistic, anonymized information. This approach addresses the persistent challenges of data scarcity and stringent patient privacy regulations. The development of applied AI in healthcare is dependent on such innovations to accelerate research timelines and foster more equitable model training.

    This advancement in AI training dataset creation helps circumvent complex legal frameworks and provides a method for data augmentation, especially for rare diseases. However, the market's progress is constrained by an intricate web of data privacy regulations and security mandates. Navigating compliance with laws like HIPAA and GDPR is a primary operational burden, as the process of de-identification is technically challenging and risks catastrophic compliance failures if re-identification occurs. This regulatory complexity, alongside the need for secure infrastructure for protected health information, acts as a bottleneck, impeding market growth and the broader adoption of AI in patient management and AI in precision medicine.

    What will be the Size of the AI Training Dataset In Healthcare Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019 - 2023 and forecasts 2025-2029 - in the full report.
    The market for AI training datasets in healthcare is defined by the continuous need for high-quality, structured information to power sophisticated machine learning algorithms. The development of AI in precision medicine and AI in cancer diagnostics depends on access to diverse and accurately labeled datasets, including digital pathology images and multi-omics data integration. The focus is shifting toward creating regulatory-grade datasets that can support clinical validation and commercialization of AI-driven diagnostic tools. This involves advanced data harmonization techniques and robust AI governance protocols to ensure reliability and safety in all applications.

    Progress in this sector is marked by the evolution from single-modality data to complex multimodal datasets. This shift supports a more holistic analysis required for applications like generative AI in clinical trials and treatment efficacy prediction. Innovations in synthetic data generation and federated learning platforms are addressing key challenges related to patient data privacy and data accessibility. These technologies enable the creation of large-scale, analysis-ready assets while adhering to strict compliance frameworks, supporting the ongoing advancement of applied AI in healthcare and fostering collaborative research environments.

    How is this AI Training Dataset In Healthcare Industry segmented?

    The AI training dataset in healthcare industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in "USD million" for the period 2025-2029, as well as historical data from 2019 - 2023 for the following segments.

    • Type: Image; Text; Others
    • Component: Software; Services
    • Application: Medical imaging; Electronic health records; Wearable devices; Telemedicine; Others
    • Geography: North America (US, Canada, Mexico); Europe (Germany, UK, France, Italy, The Netherlands, Spain); APAC (China, Japan, India, South Korea, Australia, Indonesia); South America (Brazil, Argentina, Colombia); Middle East and Africa (UAE, South Africa, Turkey); Rest of World (ROW)

    By Type Insights

    The image segment is estimated to witness significant growth during the forecast period.

    The image data segment is the most mature and largest component of the market, driven by the central role of imaging in modern diagnostics. This category includes modalities such as radiology images, digital pathology whole-slide images, and ophthalmology scans. The development of computer vision models and other AI models is a key factor, with these algorithms designed to improve the diagnostic capabilities of clinicians. Applications include identifying cancerous lesions, segmenting organs for pre-operative planning, and quantifying disease progression in neurological scans.

    The market for these datasets is sustained by significant technical and logistical hurdles, including the need for regulatory approval for AI-based medical devices, which elevates the demand for high-quality training datasets. The market'

  10. Dataset - Terminology of e-Oral Health: Consensus Report of the IADR's...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jul 7, 2024
    Cite
    Uribe, Sergio; Janneke, Scheerman; Mariño, Rodrigo J. (2024). Dataset - Terminology of e-Oral Health: Consensus Report of the IADR's e-Oral Health Network Terminology Task Force. [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10605382
    Dataset updated
    Jul 7, 2024
    Dataset provided by
    University of Melbourne
    Inholland University of Applied Sciences
    Riga Stradiņš University
    Authors
    Uribe, Sergio; Janneke, Scheerman; Mariño, Rodrigo J.
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    README
    ====================
    This repository contains the data and documentation for a research project. It includes the dataset, which is provided in CSV format and the original PDF with the survey answers.

    Research Information
    ====================
    Terminology of e-Oral Health: Consensus Report of the IADR’s e-Oral Health Network Terminology Task Force. Authors reported multiple definitions of e-oral health and related terms, and used several definitions interchangeably, like mhealth, teledentistry, teleoral medicine and telehealth. The International Association of Dental Research e-Oral Health Network (e-OHN) aimed to establish a consensus on terminology related to digital technologies used in oral healthcare.

    This dataset contains data from a survey about digital oral health. The survey asked participants to provide their definition of various terms related to digital oral health, as well as their agreement with the provided definitions. The dataset also includes three figures that the participants were asked to review.

    The purpose of this dataset is to collect data on the public's understanding of digital oral health terms and to identify areas where there may be confusion or misinterpretation. The data from this dataset could be used to develop educational materials or to improve the way that digital oral health information is communicated to the public.

    Additional notes
    ====================
    The data is not currently cleaned or preprocessed.

    Dataset
    ====================
    The dataset file, named "dataset.csv," is in this repository. It contains the raw anonymized data collected from the participants in a structured format. Each row represents a respondent, and the columns correspond to different variables.

    Codebook
    ====================
    The codebook file, named "codebook.pdf," is also included in this repository. It provides a comprehensive description of the variables present in the dataset. The codebook outlines each variable's meaning, type, and possible values, allowing users to understand and analyze the data effectively.

    Metadata
    ====================
    No metadata is provided

    Files
    ====================
    01_readme.txt this readme file
    02_codebook.pdf The codebook of the dataset
    03_dataset.csv The dataset in csv format
    04_e-OHN Delphi (2023-02-03).pdf The output from the survey

    Usage
    ====================
    To work with the dataset, you can download the "dataset.csv" file and import it into your preferred software or programming language for analysis. The codebook provides valuable information about the variables, allowing you to understand the data structure and make informed decisions during your analysis. Please note that while every effort has been made to ensure the accuracy and quality of the data, it is important to review the codebook and understand the context of the research before concluding the dataset.
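
    As one possible way to follow the usage note above, the sketch below reads a CSV with one row per respondent using Python's standard library. The column names and the four inline rows are hypothetical stand-ins; the real variable names are defined in the codebook and should be substituted.

```python
import csv
import io
from collections import Counter


def load_respondents(handle):
    """Read a CSV with a header row into one dict per respondent."""
    return list(csv.DictReader(handle))


def tally(rows, column):
    """Count how often each non-empty value occurs in one column."""
    return Counter(row[column] for row in rows if row[column])


# Hypothetical stand-in for the real file; column names are invented.
sample = io.StringIO(
    "respondent_id,agrees_with_definition\n"
    "1,yes\n"
    "2,no\n"
    "3,yes\n"
    "4,\n"
)

rows = load_respondents(sample)
counts = tally(rows, "agrees_with_definition")
print(len(rows), dict(counts))
```

    To run this against the downloaded file, the `io.StringIO` object would be replaced by `open("dataset.csv", newline="", encoding="utf-8")`; the encoding is an assumption. Because the README states that the data are not cleaned, skipping empty cells as done in `tally` is only one of several reasonable choices.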

    License
    ====================
    The data and documentation in this repository are provided under the CC BY-SA. This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. If you remix, adapt, or build upon the material, you must license the modified material under identical terms. CC BY-SA includes the following elements:

    BY: credit must be given to the creator.
    SA: Adaptations must be shared under the same terms.

    Please refer to the license file for further details on how the data can be used and shared.

    Contact Information
    ====================
    For any questions, clarifications, or inquiries related to the dataset or research project, please contact Assoc Prof Dr Sergio Uribe, sergio.uribe@rsu.lv

  11. Structure Definition

    • johnsnowlabs.com
    csv
    Updated Sep 20, 2018
    Cite
    John Snow Labs (2018). Structure Definition [Dataset]. https://www.johnsnowlabs.com/marketplace/structure-definition/
    Explore at:
    Available download formats: csv
    Dataset updated
    Sep 20, 2018
    Dataset authored and provided by
    John Snow Labs
    Area covered
    United States
    Description

    The Structure Definition resource describes a structure - a set of data element definitions, and their associated rules of usage. These structure definitions are used to describe both the content defined in the Fast Healthcare Interoperability Resources (FHIR) specification itself - resources, data types, the underlying infrastructural types, and also are used to describe how these structures are used in implementations.

  12. Data from: Immunosuppressive Condition and Medication Annotations for...

    • physionet.org
    Updated Aug 4, 2025
    Cite
    Vijeeth Guggilla; Melissa Bak; Mengjia Kang; Theresa Walunas; Catherine A Gao (2025). Immunosuppressive Condition and Medication Annotations for Admission Notes in the MIMIC-III Database [Dataset]. http://doi.org/10.13026/etd0-dq69
    Explore at:
    Dataset updated
    Aug 4, 2025
    Authors
    Vijeeth Guggilla; Melissa Bak; Mengjia Kang; Theresa Walunas; Catherine A Gao
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    Immunosuppression due to underlying conditions or immunosuppressive medication use increases the risk of morbidity and mortality in the context of infectious disease. Identifying patients with immunosuppression is important for better studying and understanding the impact of immunosuppression on critical care outcomes. While structured data (e.g., diagnosis codes, medication orders) from the electronic health record (EHR) can help identify patients with immunosuppression, the reliability of structured data is limited as it can miss more nuanced information that is only present in unstructured data, such as patient notes. We introduce a dataset for phenotyping immunosuppression, defined as identification of a patient’s immune status, based on admission notes. Patient admission notes were extracted from the Medical Information Mart for Intensive Care III (MIMIC-III) dataset, which contains health-related data and clinical notes associated with patients who stayed in critical care units at Beth Israel Deaconess Medical Center between 2001 and 2012. These notes were manually annotated for the presence of several immunosuppressive conditions and immunosuppressive medications. Each admission note was independently annotated by two human annotators, and discrepancies were reviewed by an attending critical care physician. Annotated conditions include solid organ transplant, stem cell transplant, HIV, acute leukemia, lymphoma, multiple myeloma, and immunoglobulin deficiency. Annotated medications include azathioprine, cyclosporine, cyclophosphamide, mycophenolate, rituximab, and tacrolimus. This dataset can be leveraged for medical and computer science research, especially as related to the application of natural language processing and large language models (LLMs) in medicine. It can also be used as a starting point for research related to immunosuppression in critically ill patients.

  13. Data from: mcPHASES: A Dataset of Physiological, Hormonal, and Self-reported...

    • physionet.org
    Updated Sep 9, 2025
    Cite
    Blue Lin; Jin Yi Li; Kaavya Kalani; Khai Truong; Alex Mariakakis (2025). mcPHASES: A Dataset of Physiological, Hormonal, and Self-reported Events and Symptoms for Menstrual Health Tracking with Wearables [Dataset]. http://doi.org/10.13026/zx6a-2c81
    Explore at:
    Dataset updated
    Sep 9, 2025
    Authors
    Blue Lin; Jin Yi Li; Kaavya Kalani; Khai Truong; Alex Mariakakis
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    Individuals who menstruate are frequently led to believe that there is a standard menstrual cycle, typically characterized as 28 days in length with predictable and uniform patterns. This framing often emphasizes cycle dates as the only relevant metric, overlooking the broader physiological and emotional fluctuations throughout the cycle driven by complex hormonal interactions. Consequently, when individuals encounter menstrual experiences that do not align with calendar-based metrics, they are often left without adequate frameworks for understanding their menstrual health, which can result in distress or delays in seeking care. Our work advocates for a new definition of menstrual health that encompasses a wider range of physiological signals in order to acknowledge its connection to overall wellbeing, establish realistic expectations for menstruators, and build better health management systems. However, historical stigmatization has led to a dearth of datasets suitable for pursuing these aims. mcPHASES (menstrual cycle Physiological, Hormonal, and Self-Reported Events and Symptoms) is a comprehensive dataset consisting of multimodal physiological, hormonal, and self-reported measures collected to support holistic menstrual health research. Data from 42 Canadian young adult menstruators was collected across two 3-month periods. Participants wore Fitbit Sense smartwatches and Dexcom G6 continuous glucose monitors to measure physiological signals, and they used Mira Plus Starter Kits to track their hormone levels. Additionally, participants self-reported daily experiences like cramps, sleep quality, and stress levels. The dataset contains 23 structured tables organized by signal category so that researchers can examine relationships between physiological signals and hormonal fluctuations, analyze the impacts of lifestyle factors on the menstrual cycle, and develop better algorithms for menstrual cycle prediction. 
    More broadly, mcPHASES supports research in women's health, digital health technologies, and personalized care by providing unprecedented multimodal data for building a more accurate understanding of menstrual health patterns.

  14. NUCC Health Provider Taxonomy Code Set

    • johnsnowlabs.com
    csv
    Updated Jan 15, 2024
    Cite
    John Snow Labs (2024). NUCC Health Provider Taxonomy Code Set [Dataset]. https://www.johnsnowlabs.com/marketplace/nucc-health-provider-taxonomy-code-set/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 15, 2024
    Dataset authored and provided by
    John Snow Labs
    Time period covered
    2023
    Area covered
    United States
    Description

    This dataset shows the Health Care Provider Taxonomy code set. The code set is a collection of unique alphanumeric codes, each ten characters in length. It is structured into three distinct "Levels": Provider Grouping, Classification, and Area of Specialization.
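    A minimal sketch of a format check based only on the description above (ten alphanumeric characters). It does not confirm that a code exists in the code set, and it does not decode the three levels, since their layout within a code is not described here. The function name is hypothetical and the sample value is for illustration only.

```python
import re

# Exactly ten ASCII letters or digits, per the description of the code set.
_TAXONOMY_CODE = re.compile(r"[0-9A-Za-z]{10}")


def looks_like_taxonomy_code(code):
    """Return True if `code` is exactly ten ASCII alphanumeric characters."""
    return _TAXONOMY_CODE.fullmatch(code) is not None


for candidate in ["207Q00000X", "207Q0000", "207Q-0000X"]:
    print(candidate, looks_like_taxonomy_code(candidate))
```

    A check like this is useful for catching truncated or mistyped values when loading the CSV, before any lookup against the code set itself.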

  15. Means, standard deviations (SD) of hospital costs (in CHF), and results of...

    • figshare.com
    xls
    Updated Jun 15, 2023
    Cite
    Michael M. Havranek; Josef Ondrej; Stella Bollmann; Philippe K. Widmer; Simon Spika; Stefan Boes (2023). Means, standard deviations (SD) of hospital costs (in CHF), and results of independent t-tests across different hospital types. [Dataset]. http://doi.org/10.1371/journal.pone.0264212.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Michael M. Havranek; Josef Ondrej; Stella Bollmann; Philippe K. Widmer; Simon Spika; Stefan Boes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Means, standard deviations (SD) of hospital costs (in CHF), and results of independent t-tests across different hospital types.

  16. Anonymized Blood Test Reports

    • defined.ai
    Updated Jul 4, 2025
    Cite
    Defined.ai (2025). Anonymized Blood Test Reports [Dataset]. https://defined.ai/datasets/blood-test-reports
    Explore at:
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Defined.ai
    Description

    Enhance medical LLMs and diagnostic AI with 100,000 anonymized, structured blood test reports. Ideal for reasoning, summarization and healthcare insights.

  17. Feature breakdown of characteristics for clusters identified by K-means.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Sep 15, 2023
    + more versions
    Cite
    R. Andrew Taylor; Aidan Gilson; Wade Schulz; Kevin Lopez; Patrick Young; Sameer Pandya; Andreas Coppi; David Chartash; David Fiellin; Gail D’Onofrio (2023). Feature breakdown of characteristics for clusters identified by K-means. [Dataset]. http://doi.org/10.1371/journal.pone.0291572.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Sep 15, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    R. Andrew Taylor; Aidan Gilson; Wade Schulz; Kevin Lopez; Patrick Young; Sameer Pandya; Andreas Coppi; David Chartash; David Fiellin; Gail D’Onofrio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Feature breakdown of characteristics for clusters identified by K-means.

  18. Survey questionnaire and data of health management strategies of resilient...

    • search.dataone.org
    • datadryad.org
    Updated Oct 9, 2025
    Cite
    Genesis Chong-Echavez; Jessica Webb; Boris Baer; Boris Maciejovsky (2025). Survey questionnaire and data of health management strategies of resilient honey bee stock throughout Southern California [Dataset]. http://doi.org/10.5061/dryad.0k6djhbbq
    Explore at:
    Dataset updated
    Oct 9, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Genesis Chong-Echavez; Jessica Webb; Boris Baer; Boris Maciejovsky
    Area covered
    California
    Description

    The dataset includes two primary components: (1) a survey questionnaire, which outlines the full set of questions designed to investigate beekeeping practices, colony health management strategies, and environmental conditions; and (2) the corresponding survey data, containing anonymized responses from 121 participants, primarily hobbyist beekeepers across Southern California. The data capture key variables such as colony origin, disease management practices, spending patterns, and perceptions of colony health. The dataset is structured to facilitate efficient analysis, with coded responses and clearly defined variables. It holds significant reuse potential for researchers examining honey bee health, sustainable apiculture practices, and the economic implications of different management strategies. Additionally, these data may benefit stakeholders in environmental policy, sustainable agriculture, and beekeeping education. Ethical considerations were carefully addressed, with data anonymi...

    The data were collected via a Qualtrics-based online survey distributed to participants during the 2023 University of California Riverside Center for Integrative Bee Research (CIBER) Honey Bee Health Conference. The survey featured multiple-choice, ranking, and open-ended questions to explore beekeeping practices and colony management strategies. Data were systematically coded and reviewed for accuracy before analysis. This dataset provides a valuable resource for future research exploring the resilience of stress-tolerant honey bee genotypes and the socioeconomic factors influencing beekeeping practices. Researchers are encouraged to reference the associated manuscript for detailed insights into the study's findings and analysis.

    Survey questionnaire and data of health management strategies of resilient honey bee stock throughout Southern California

    Dataset DOI: 10.5061/dryad.0k6djhbbq

    Description of the data and file structure

    This dataset was collected as part of a study investigating the health management strategies of stress-tolerant honey bee stocks in Southern California. The research aimed to compare the beekeeping practices, disease management strategies, and economic implications associated with three honey bee colony types: commercial, Californian, and Mixed stock (both commercial and Californian).

    Data were collected through a Qualtrics-based survey administered to beekeepers during the 2023 University of California Riverside Center for Integrative Bee Research (CIBER) Honey Bee Health Conference. The survey included 35 questions covering key topics such as colony origins, disease management practices, pest deterrents, and annual costs associated with maintaining be...

    This dataset contains anonymized survey data collected from beekeepers who participated in a study on honey bee health and management practices. All participants provided explicit informed consent for their anonymized responses to be shared in the public domain. Identifiable information such as names, contact details, and precise geographic locations has been removed to ensure participant confidentiality. The dataset complies with the University of California Institutional Review Board (IRB) protocols (HS 22-135) and Dryad's human subjects data standards.

  19. China Big Data Technology Investment Opportunities Market By Component...

    • verifiedmarketresearch.com
    Updated Nov 16, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). China Big Data Technology Investment Opportunities Market By Component (Software, Hardware, Services), By Application (Financial Services, Healthcare, Government, Manufacturing, Retail, Others), By Deployment (Cloud, On-Premise), & Region For 2024-2031 [Dataset]. https://www.verifiedmarketresearch.com/product/china-big-data-technology-investment-opportunities-market/
    Explore at:
    Dataset updated
    Nov 16, 2024
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2031
    Area covered
    China
    Description

    China Big Data Technology Investment Opportunities Market was valued at USD 45.2 Billion in 2023 and is projected to reach USD 95.6 Billion by 2031, growing at a CAGR of 9.8% from 2024 to 2031.
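    The growth figures above can be cross-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below is illustrative arithmetic only, not the publisher's method. The function name is hypothetical, and the compounding period (8 years from the 2023 valuation, or 7 years from 2024) is an assumption, since the description mentions both years.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Values in USD billion, taken from the description above.
start, end = 45.2, 95.6

rate_from_2023 = implied_cagr(start, end, 2031 - 2023)  # 8 compounding years
rate_from_2024 = implied_cagr(start, end, 2031 - 2024)  # 7 compounding years

print(f"Implied CAGR over 8 years: {rate_from_2023:.1%}")
print(f"Implied CAGR over 7 years: {rate_from_2024:.1%}")
```

    Compounding over the 8 years from 2023 to 2031 gives about 9.8%, which matches the stated rate. Compounding over 7 years gives about 11.3%, so the stated CAGR appears to be measured from the 2023 valuation.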

    China Big Data Technology Investment Opportunities Market: Definition/Overview

    Big data technology is defined as the complex ecosystem of tools, processes, and methodologies that are utilized to handle extremely large datasets. These technologies are designed to extract valuable insights from structured and unstructured data that is generated at unprecedented volumes. Furthermore, the applications of big data technology are seen across multiple sectors, where data is processed, analyzed, and transformed into actionable intelligence. Advanced analytics, artificial intelligence, and machine learning capabilities are integrated into these systems, through which deeper insights are enabled, and predictive capabilities are enhanced.

  20. Means, standard deviations (SD) and ranges of hospital costs (in CHF).

    • figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Michael M. Havranek; Josef Ondrej; Stella Bollmann; Philippe K. Widmer; Simon Spika; Stefan Boes (2023). Means, standard deviations (SD) and ranges of hospital costs (in CHF). [Dataset]. http://doi.org/10.1371/journal.pone.0264212.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Michael M. Havranek; Josef Ondrej; Stella Bollmann; Philippe K. Widmer; Simon Spika; Stefan Boes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Means, standard deviations (SD) and ranges of hospital costs (in CHF).

