100+ datasets found
  1. Data from: Clinical Dataset

    • kaggle.com
    zip
    Updated Oct 5, 2023
    Cite
    Mohamadreza Momeni (2023). Clinical Dataset [Dataset]. https://www.kaggle.com/datasets/imtkaggleteam/clinical-dataset
    Available download formats: zip (16220 bytes)
    Dataset updated
    Oct 5, 2023
    Authors
    Mohamadreza Momeni
    Description

    The purest type of electronic clinical data is obtained at the point of care at a medical facility, hospital, clinic, or practice. Often referred to as the electronic medical record (EMR), this data is generally not available to outside researchers. The data collected includes administrative and demographic information, diagnosis, treatment, prescription drugs, laboratory tests, physiologic monitoring data, hospitalization, patient insurance, etc.

    Individual organizations such as hospitals or health systems may provide access to internal staff. Larger collaborations, such as the NIH Collaboratory Distributed Research Network, provide mediated or collaborative access to clinical data repositories by eligible researchers. Additionally, the UW De-identified Clinical Data Repository (DCDR) and the Stanford Center for Clinical Informatics allow for initial cohort identification.

    About Dataset:

    333 scholarly articles cite this dataset.

    Unique identifier: DOI

    Dataset updated: 2023

    Authors: Haoyang Mi

    This dataset contains two files:

    1- Clinical Data_Discovery_Cohort. Columns: Patient ID, Specimen date, Dead or Alive, Date of Death, Date of last Follow, Sex, Race, Stage, Event, Time

    2- Clinical_Data_Validation_Cohort. Columns: Patient ID, Survival time (days), Event, Tumor size, Grade, Stage, Age, Sex, Cigarette Pack per year, Type, Adjuvant, Batch, EGFR, KRAS

    Feel free to share your thoughts and analysis in a notebook for this dataset. You can also build interesting and valuable ML projects from it. Thanks for your attention.

  2. mimic-iii-clinical-database-demo-1.4

    • kaggle.com
    zip
    Updated Apr 1, 2025
    Cite
    Montassar bellah (2025). mimic-iii-clinical-database-demo-1.4 [Dataset]. https://www.kaggle.com/datasets/montassarba/mimic-iii-clinical-database-demo-1-4
    Available download formats: zip (11100065 bytes)
    Dataset updated
    Apr 1, 2025
    Authors
    Montassar bellah
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Abstract MIMIC-III is a large, freely-available database comprising deidentified health-related data associated with over 40,000 patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012 [1]. The MIMIC-III Clinical Database is available on PhysioNet (doi: 10.13026/C2XW26). Though deidentified, MIMIC-III contains detailed information regarding the care of real patients, and as such requires credentialing before access. To allow researchers to ascertain whether the database is suitable for their work, we have manually curated a demo subset, which contains information for 100 patients also present in the MIMIC-III Clinical Database. Notably, the demo dataset does not include free-text notes.

    Background In recent years there has been a concerted move towards the adoption of digital health record systems in hospitals. Despite this advance, interoperability of digital systems remains an open issue, leading to challenges in data integration. As a result, the potential that hospital data offers in terms of understanding and improving care is yet to be fully realized.

    MIMIC-III integrates deidentified, comprehensive clinical data of patients admitted to the Beth Israel Deaconess Medical Center in Boston, Massachusetts, and makes it widely accessible to researchers internationally under a data use agreement. The open nature of the data allows clinical studies to be reproduced and improved in ways that would not otherwise be possible.

    The MIMIC-III database was populated with data that had been acquired during routine hospital care, so there was no associated burden on caregivers and no interference with their workflow. For more information on the collection of the data, see the MIMIC-III Clinical Database page.

    Methods The demo dataset contains all intensive care unit (ICU) stays for 100 patients. These patients were selected randomly from the subset of patients in the dataset who eventually die. Consequently, all patients will have a date of death (DOD). However, patients do not necessarily die during an individual hospital admission or ICU stay.

    This project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). Requirement for individual patient consent was waived because the project did not impact clinical care and all protected health information was deidentified.

    Data Description MIMIC-III is a relational database consisting of 26 tables. For a detailed description of the database structure, see the MIMIC-III Clinical Database page. The demo shares an identical schema, except all rows in the NOTEEVENTS table have been removed.

    The data files are distributed in comma separated value (CSV) format following the RFC 4180 standard. Notably, string fields which contain commas, newlines, and/or double quotes are encapsulated by double quotes ("). Actual double quotes in the data are escaped using an additional double quote. For example, the string she said "the patient was notified at 6pm" would be stored in the CSV as "she said ""the patient was notified at 6pm""". More detail is provided on the RFC 4180 description page: https://tools.ietf.org/html/rfc4180
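
    The quoting rule above can be checked with Python's standard csv module, whose default dialect wraps such fields in double quotes and doubles any embedded quote. This is a minimal sketch using the example string from the description; it is not part of the MIMIC-III distribution.

```python
import csv
import io

# The example string from the description above.
original = 'she said "the patient was notified at 6pm"'

# Write one single-field row; the default dialect quotes the field
# and doubles each embedded double quote.
buffer = io.StringIO()
csv.writer(buffer).writerow([original])
encoded = buffer.getvalue().strip()

# Read the encoded line back to confirm the round trip is lossless.
decoded = next(csv.reader(io.StringIO(encoded)))[0]

print(encoded)
print(decoded == original)
```

    Reading the demo files with csv.reader (or any RFC 4180 aware parser) rather than splitting on commas avoids breaking fields that contain commas or newlines.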

    Usage Notes The MIMIC-III demo provides researchers with an opportunity to review the structure and content of MIMIC-III before deciding whether or not to carry out an analysis on the full dataset.

    CSV files can be opened natively using any text editor or spreadsheet program. However, some tables are large, and it may be preferable to navigate the data stored in a relational database. One alternative is to create an SQLite database using the CSV files. SQLite is a lightweight database format which stores all constituent tables in a single file, and SQLite databases interoperate well with a number of software tools.
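
    One way to build such an SQLite database is with Python's standard library. The sketch below is an assumption-laden illustration, not the authors' procedure: the helper name load_csv_into_sqlite is hypothetical, every column is stored as TEXT, and the tiny PATIENTS.csv written here is made-up sample data standing in for a real demo file.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_csv_into_sqlite(csv_path, conn):
    """Load one CSV file into a table named after the file (all columns TEXT)."""
    table = Path(csv_path).stem.lower()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # first row holds the column names
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return table


# Demonstration with a made-up two-row file (illustrative columns only).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "PATIENTS.csv"
    sample.write_text("subject_id,gender\n1,F\n2,M\n", encoding="utf-8")
    conn = sqlite3.connect(":memory:")
    table_name = load_csv_into_sqlite(sample, conn)
    row_count = conn.execute(f'SELECT COUNT(*) FROM "{table_name}"').fetchone()[0]
```

    Passing a file path instead of ":memory:" to sqlite3.connect would write a single database file that tools such as DB Browser for SQLite can open.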

    DB Browser for SQLite is a high quality, visual, open source tool to create, design, and edit database files compatible with SQLite. We have found this tool to be useful for navigating SQLite files. Information regarding installation of the software and creation of the database can be found online: https://sqlitebrowser.org/

    Release Notes Release notes for the demo follow the release notes for the MIMIC-III database.

    Acknowledgements This research and development was supported by grants NIH-R01-EB017205, NIH-R01-EB001659, and NIH-R01-GM104987 from the National Institutes of Health. The authors would also like to thank Philips Healthcare and staff at the Beth Israel Deaconess Medical Center, Boston, for supporting database development, and Ken Pierce for providing ongoing support for the MIMIC research community.

    Conflicts of Interest The authors declare no competing financial interests.

    References Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Mo...

  3. MIMIC-IV Clinical Database Demo

    • physionet.org
    • registry.opendata.aws
    Updated Jan 31, 2023
    Cite
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Steven Horng; Leo Anthony Celi; Roger Mark (2023). MIMIC-IV Clinical Database Demo [Dataset]. http://doi.org/10.13026/dp1f-ex47
    Dataset updated
    Jan 31, 2023
    Authors
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Steven Horng; Leo Anthony Celi; Roger Mark
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    The Medical Information Mart for Intensive Care (MIMIC)-IV database is comprised of deidentified electronic health records for patients admitted to the Beth Israel Deaconess Medical Center. Access to MIMIC-IV is limited to credentialed users. Here, we have provided an openly-available demo of MIMIC-IV containing a subset of 100 patients. The dataset includes similar content to MIMIC-IV, but excludes free-text clinical notes. The demo may be useful for running workshops and for assessing whether the MIMIC-IV is appropriate for a study before making an access request.

  4. Sample characterization and clinical data.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    • +1 more
    Updated Jun 19, 2015
    Cite
    de Almeida Pinto-Sarmento, Tássia Cristina; Granville-Garcia, Ana Flávia; Paiva, Saul Martins; Martins, Carolina Castro; Gomes, Monalisa Cesarino; Clementino, Marayza Alves (2015). Sample characterization and clinical data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001921346
    Dataset updated
    Jun 19, 2015
    Authors
    de Almeida Pinto-Sarmento, Tássia Cristina; Granville-Garcia, Ana Flávia; Paiva, Saul Martins; Martins, Carolina Castro; Gomes, Monalisa Cesarino; Clementino, Marayza Alves
    Description

    Sample characterization and clinical data.

  5. Data cleaning using unstructured data

    • zenodo.org
    zip
    Updated Jul 30, 2024
    Cite
    Rihem Nasfi; Antoon Bronselaer (2024). Data cleaning using unstructured data [Dataset]. http://doi.org/10.5281/zenodo.13135983
    Available download formats: zip
    Dataset updated
    Jul 30, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rihem Nasfi; Antoon Bronselaer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this project, we work on repairing three datasets:

    • Trials design: This dataset was obtained from the European Union Drug Regulating Authorities Clinical Trials Database (EudraCT) register, and the ground truth was created from external registries. In the dataset, multiple countries, identified by the attribute country_protocol_code, conduct the same clinical trial, which is identified by eudract_number. Each clinical trial has a title that can help find informative details about the design of the trial.
    • Trials population: This dataset delineates the demographic origins of participants in clinical trials primarily conducted across European countries. It includes structured attributes indicating whether the trial pertains to a specific gender, age group, or healthy volunteers. Each of these categories is labeled '1' or '0', denoting whether it is included in the trial or not. It is important to note that the population category should remain consistent across all countries conducting the same clinical trial identified by an eudract_number. The ground truth samples in the dataset were established by aligning information about the trial populations provided by external registries, specifically the CT.gov database and the German Trials database. Additionally, the dataset comprises other unstructured attributes that categorize the inclusion criteria for trial participants such as inclusion.
    • Allergens: This dataset contains information about products and their allergens. The data was collected from the German version of 'Alnatura' (access date: 24 November 2020), a free database of food products from around the world, 'Open Food Facts', and the websites 'Migipedia', 'Piccantino', and 'Das Ist Drin'. There may be overlapping products across these websites. Each product in the dataset is identified by a unique code. Samples with the same code represent the same product but are extracted from a different source. An allergen is marked '2' if it is present, '1' if there are traces of it, and '0' if it is absent from a product. The dataset also includes information on ingredients in the products. Overall, the dataset comprises categorical structured data describing the presence, trace, or absence of specific allergens, and unstructured text describing ingredients.
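
    The consistency requirement described for the trials population data (the same population flags for every country running the same eudract_number) can be checked with a short script. This is a hedged sketch: eudract_number and country_protocol_code come from the description above, while the flag column healthy_volunteers, the helper name, and the rows are made up for illustration.

```python
from collections import defaultdict


def find_inconsistent_trials(rows, flag):
    """Return the eudract_number values whose rows disagree on the given flag."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["eudract_number"]].add(row[flag])
    return sorted(trial for trial, values in seen.items() if len(values) > 1)


# Made-up rows: trial T-001 is reported differently by two countries.
rows = [
    {"eudract_number": "T-001", "country_protocol_code": "BE-1", "healthy_volunteers": "0"},
    {"eudract_number": "T-001", "country_protocol_code": "DE-1", "healthy_volunteers": "1"},
    {"eudract_number": "T-002", "country_protocol_code": "FR-1", "healthy_volunteers": "0"},
    {"eudract_number": "T-002", "country_protocol_code": "NL-1", "healthy_volunteers": "0"},
]
conflicts = find_inconsistent_trials(rows, "healthy_volunteers")
print(conflicts)
```

    Trials flagged this way are candidates for repair; which value is correct still has to come from the ground truth files.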

    N.B.: Each '.zip' file contains a set of 5 '.csv' files which are part of the aforementioned datasets:

    • "{dataset_name}_train.csv": samples used for the ML-model training. (e.g. "allergens_train.csv")
    • "{dataset_name}_test.csv": samples used to test the ML-model performance. (e.g. "allergens_test.csv")
    • "{dataset_name}_golden_standard.csv": samples representing the ground truth of the test samples. (e.g. "allergens_golden_standard.csv")
    • "{dataset_name}_parker_train.csv": samples repaired using Parker Engine, used for the ML-model training. (e.g. "allergens_parker_train.csv")
    • "{dataset_name}_parker_test.csv": samples repaired using Parker Engine, used to test the ML-model performance. (e.g. "allergens_parker_test.csv")
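
    The naming scheme in the list above can be expanded programmatically. The sketch below assumes the fifth file follows the "_parker_test.csv" pattern shown in its example; the helper name expected_files is hypothetical.

```python
# Suffixes taken from the file list above; "parker_test" is inferred from
# the example file name "allergens_parker_test.csv".
SUFFIXES = ("train", "test", "golden_standard", "parker_train", "parker_test")


def expected_files(dataset_name):
    """Return the five CSV file names expected inside one dataset archive."""
    return [f"{dataset_name}_{suffix}.csv" for suffix in SUFFIXES]


allergen_files = expected_files("allergens")
print(allergen_files)
```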

  6. Data from: Generalizable EHR-R-REDCap pipeline for a national...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jan 9, 2022
    Cite
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller (2022). Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry [Dataset]. http://doi.org/10.5061/dryad.rjdfn2zcm
    Available download formats: zip
    Dataset updated
    Jan 9, 2022
    Dataset provided by
    Harvard Medical School
    Massachusetts General Hospital
    Authors
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.

    Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.

    Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.

    Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data is successfully transformed, and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.

    Methods eLAB Development and Source Code (R statistical software):

    eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).

    eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.

    Functions were written to remap EHR bulk lab data pulls/queries from several sources, including Clarity/Crystal reports or institutional EDW, including the Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R-markdown (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.

    The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).

    Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
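
    The key-value remapping idea can be illustrated in a few lines. eLAB itself is written in R; the sketch below uses Python purely for illustration and is not the eLAB implementation. The lab labels are the potassium subtypes listed above, while the target code "potassium", the helper name, and the sample values are assumptions.

```python
# Hypothetical lookup table: raw EHR lab label -> data dictionary code.
LAB_LOOKUP = {
    "Potassium": "potassium",
    "Potassium-External": "potassium",
    "Potassium(POC)": "potassium",
    "Potassium,whole-bld": "potassium",
    "Potassium-Level-External": "potassium",
    "Potassium,venous": "potassium",
    "Potassium-whole-bld/plasma": "potassium",
}


def remap_labs(records, lookup=LAB_LOOKUP):
    """Replace raw lab labels with dictionary codes and drop unmapped labs."""
    remapped = []
    for name, value in records:
        code = lookup.get(name.strip())
        if code is not None:  # labs absent from the lookup are filtered out
            remapped.append((code, value))
    return remapped


# Made-up results: one mapped subtype and one lab outside the dictionary.
sample = [("Potassium(POC)", 4.1), ("Unknown lab", 1.0)]
print(remap_labs(sample))
```

    Keeping the mapping in a plain table means new subtypes can be added without touching the transformation code, which matches the description of the lookup table as re-configurable by end users.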

    Data Dictionary (DD)

    EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.
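
    Because every site exports with the same data dictionary, combining site files reduces to concatenating rows under one shared header. The sketch below shows that idea in Python under stated assumptions: the function name, the column names (record_id, lab_code, value), and the two exports are all made up, and the header check is an added safeguard rather than something the source describes.

```python
import csv
import io


def combine_site_exports(exports):
    """Concatenate CSV texts that share one header row; reject mismatched headers."""
    header = None
    combined = []
    for text in exports:
        reader = csv.reader(io.StringIO(text))
        site_header = next(reader)
        if header is None:
            header = site_header
        elif site_header != header:
            raise ValueError("export header does not match the shared data dictionary")
        combined.extend(row for row in reader if row)  # skip blank lines
    return header, combined


# Made-up exports from two sites; the column names are hypothetical.
site_a = "record_id,lab_code,value\n1,potassium,4.1\n"
site_b = "record_id,lab_code,value\n2,potassium,3.9\n"
header, rows = combine_site_exports([site_a, site_b])
```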

    Study Cohort

    This study was approved by the MGB IRB. Search of the EHR was performed to identify patients diagnosed with MCC between 1975-2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016-2019 (N= 176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.

    Statistical Analysis

    OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
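
    The OS definition and censoring rule above translate into a (time, event) pair per patient, which is the usual input to a Cox model. This is a minimal Python sketch of that bookkeeping only; the function name and the dates are made up, and it does not reproduce the study's R analysis.

```python
from datetime import date


def overall_survival(diagnosis, death=None, last_follow_up=None):
    """Return (days, event): event is 1 for death, 0 for censoring at last follow-up."""
    if death is not None:
        return (death - diagnosis).days, 1
    if last_follow_up is None:
        raise ValueError("need either a death date or a last follow-up date")
    return (last_follow_up - diagnosis).days, 0


# Made-up dates: one patient with a death event, one censored at follow-up.
died = overall_survival(date(2017, 1, 1), death=date(2017, 1, 31))
censored = overall_survival(date(2018, 3, 1), last_follow_up=date(2018, 3, 11))
print(died, censored)
```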

  7. Open Data Training Workshop: Case Studies in Open Data for Qualitative and...

    • borealisdata.ca
    • search.dataone.org
    Updated Apr 18, 2023
    Cite
    Srinvivas Murthy; Maggie Woo Kinshella; Jessica Trawin; Teresa Johnson; Niranjan Kissoon; Matthew Wiens; Gina Ogilvie; Gurm Dhugga; J Mark Ansermino (2023). Open Data Training Workshop: Case Studies in Open Data for Qualitative and Quantitative Clinical Research [Dataset]. http://doi.org/10.5683/SP3/BNNAE7
    Dataset updated
    Apr 18, 2023
    Dataset provided by
    Borealis
    Authors
    Srinvivas Murthy; Maggie Woo Kinshella; Jessica Trawin; Teresa Johnson; Niranjan Kissoon; Matthew Wiens; Gina Ogilvie; Gurm Dhugga; J Mark Ansermino
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Dataset funded by
    Digital Research Alliance of Canada
    Description

    Objective(s): Momentum for open access to research is growing. Funding agencies and publishers are increasingly requiring researchers make their data and research outputs open and publicly available. However, clinical researchers struggle to find real-world examples of Open Data sharing. The aim of this 1 hr virtual workshop is to provide real-world examples of Open Data sharing for both qualitative and quantitative data. Specifically, participants will learn:

    1. Primary challenges and successes when sharing quantitative and qualitative clinical research data.
    2. Platforms available for open data sharing.
    3. Ways to troubleshoot data sharing and publish from open data.

    Workshop Agenda:

    1. “Data sharing during the COVID-19 pandemic” - Speaker: Srinivas Murthy, Clinical Associate Professor, Department of Pediatrics, Faculty of Medicine, University of British Columbia. Investigator, BC Children's Hospital
    2. “Our experience with Open Data for the 'Integrating a neonatal healthcare package for Malawi' project.” - Speaker: Maggie Woo Kinshella, Global Health Research Coordinator, Department of Obstetrics and Gynaecology, BC Children’s and Women’s Hospital and University of British Columbia

    This workshop draws on work supported by the Digital Research Alliance of Canada.

    Data Description: Presentation slides, Workshop Video, and Workshop Communication

    • Srinivas Murthy: Data sharing during the COVID-19 pandemic presentation and accompanying PowerPoint slides.
    • Maggie Woo Kinshella: Our experience with Open Data for the 'Integrating a neonatal healthcare package for Malawi' project presentation and accompanying PowerPoint slides.

    This workshop was developed as part of Dr. Ansermino's Data Champions Pilot Project supported by the Digital Research Alliance of Canada.

    NOTE for restricted files: If you are not yet a CoLab member, please complete our membership application survey to gain access to restricted files within 2 business days. Some files may remain restricted to CoLab members. These files are deemed more sensitive by the file owner and are meant to be shared on a case-by-case basis. Please contact the CoLab coordinator on this page under "collaborate with the pediatric sepsis colab."

  8. SAE sample data (CSV)

    • springernature.figshare.com
    txt
    Updated Jan 2, 2024
    Cite
    Jian Du; XUANYU SHI (2024). SAE sample data (CSV) [Dataset]. http://doi.org/10.6084/m9.figshare.24633675.v1
    Available download formats: txt
    Dataset updated
    Jan 2, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jian Du; XUANYU SHI
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    SAE sample data (CSV)

  9. MIMIC-III Clinical Database CareVue subset

    • physionet.org
    Updated Sep 21, 2022
    Cite
    Alistair Johnson; Tom Pollard; Roger Mark (2022). MIMIC-III Clinical Database CareVue subset [Dataset]. http://doi.org/10.13026/8a4q-w170
    Dataset updated
    Sep 21, 2022
    Authors
    Alistair Johnson; Tom Pollard; Roger Mark
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    MIMIC-III is a database of critically ill patients admitted to an intensive care unit (ICU) at the Beth Israel Deaconess Medical Center (BIDMC) in Boston, MA. MIMIC-III has seen broad use, and was updated with the release of MIMIC-IV. MIMIC-IV contains more contemporaneous stays, higher granularity data, and expanded domains of information. To maximize the sample size of MIMIC-IV, the database overlaps with MIMIC-III, and specifically both databases contain the same admissions which occurred between 2008 - 2012. This overlap complicates analyses of the two databases simultaneously. Here we provide a subset of MIMIC-III containing patients who are not in MIMIC-IV. The goal of this project is to simplify the combination of MIMIC-III with MIMIC-IV.

  10. Clinical Data of Matched Primary and Locally Recurrent Breast Cancer Samples...

    • figshare.scilifelab.se
    • researchdata.se
    txt
    Updated Jan 15, 2025
    Cite
    Tommaso de Marchi (2025). Clinical Data of Matched Primary and Locally Recurrent Breast Cancer Samples [Dataset]. http://doi.org/10.17044/scilifelab.21904590.v2
    Available download formats: txt
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Lund University
    Authors
    Tommaso de Marchi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Clinical metadata of all samples included in the study "Proteogenomics decodes the evolution of human ipsilateral breast cancer". De Marchi T, Pyl PT, Sjöström M, Reinsbach SE, DiLorenzo S, Nystedt B, Tran L, Pekar G, Wärnberg F, Fredriksson I, Malmström P, Fernö M, Malmström L, Malmström J, Nimèus E..

    File reports clinical data of 27 primary breast cancers and their associated ipsilateral breast tumor recurrences (samples marked with S). Additionally, a cohort of 21 primary breast tumors with no recurrence is reported (samples marked with V). Data includes age at diagnosis of primary tumor, time to recurrence (S samples) or follow-up (V samples), Estrogen receptor status (positive/negative), progesterone receptor status (positive/negative), ERBB2 status (normal/amplified), proliferation marker Ki-67 (low/high), tumor grade (1/2/3), and adjuvant therapies (yes/no).

    This dataset was used for Figure 1-6 in the following manuscript: "Proteogenomics decodes the evolution of human ipsilateral breast cancer". De Marchi T, Pyl PT, Sjöström M, Reinsbach SE, DiLorenzo S, Nystedt B, Tran L, Pekar G, Wärnberg F, Fredriksson I, Malmström P, Fernö M, Malmström L, Malmström J, Nimèus E. accepted for publication

  11. CP-NET: Hemi-NET Clinical Database Release

    • portal.conp.ca
    • portal-test.conp.ca
    Updated Jan 26, 2023
    Cite
    Ontario Brain Institute (2023). CP-NET: Hemi-NET Clinical Database Release [Dataset]. https://portal.conp.ca/dataset?id=projects/braincode_CP-NET
    Dataset updated
    Jan 26, 2023
    Dataset authored and provided by
    Ontario Brain Institute
    Description

    This controlled data release focuses on CP-NET's initial Clinical Database which solely focused on children and youth, aged 2-18, with a confirmed diagnosis of hemiplegic cerebral palsy (CP). The Hemi-NET Clinical Database has data on 320 children and youth from across Ontario. The released data is organized around the following platforms: (1) Clinical Risk Factor Platform: clinically relevant neonatal and obstetric risk factors from obstetrical and neonatal health charts, (2) Genomics Platform: saliva samples acquired from the index child and both biological parent(s), (3) Neuroimaging Platform: standardized coding of clinically acquired neuroimaging, (4) Neurodevelopmental Platform: standardized assessments of gross motor, fine motor, language, cognitive, behavioural function, and self-reported quality of life.

  12. Dummy ADaM datasets

    • dataverse.harvard.edu
    Updated Nov 28, 2024
    Cite
    Yen Phan (2024). Dummy ADaM datasets [Dataset]. http://doi.org/10.7910/DVN/L7RURL
    Dataset updated
    Nov 28, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Yen Phan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This sample study dataset contains dummy CDISC ADaM formatted data files created for demo purposes. It can be used by anyone interested in a CDISC ADaM formatted dataset. Contact me if you would like more dummy ADaM datasets to be published.

  13. US Clinical Trials Market Analysis - Size and Forecast 2025-2029

    • technavio.com
    pdf
    Updated Feb 5, 2025
    Cite
    Technavio (2025). US Clinical Trials Market Analysis - Size and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/us-clinical-trials-market-analysis
    Available download formats: pdf
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Description

    US Clinical Trials Market Size 2025-2029

    The US clinical trials market size is forecast to increase by USD 6.5 billion, at a CAGR of 5.3% from 2024 to 2029. A rise in the number of clinical drug trials will drive the US clinical trials market.

    Major Market Trends & Insights

    By Type - Phase III segment was valued at USD 9.50 billion in 2022
    By Service Type - Interventional studies segment accounted for the largest market revenue share in 2022
    

    Market Size & Forecast

    Market Opportunities: USD 61.02 billion
    Market Future Opportunities: USD 6.50 billion
    CAGR from 2024 to 2029: 5.3%
    

    Market Summary

    The clinical trials market in the US is a dynamic and evolving landscape shaped by advancements in core technologies and applications, service types, and regulatory frameworks. With the rise in the number of clinical trials for drugs, the market is witnessing significant growth. According to a recent report, the adoption rate of electronic data capture (EDC) systems in clinical trials has surged to over 70%, revolutionizing data management and analysis. However, the increasing cost of clinical trials poses a major challenge for market participants. In 2020, the average cost of a Phase III trial was estimated to be around USD 4.5 billion. Despite these challenges, opportunities abound, particularly in areas such as personalized medicine and remote patient monitoring. As technology and scientific research continue to advance, the clinical trials market in the US remains an exciting and innovative space.

    What will be the Size of the US Clinical Trials Market during the forecast period?


    How is the US clinical trials market segmented, and what are the key trends in market segmentation?

    The US clinical trials industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    • Type: Phase III, Phase I, Phase II, Phase IV
    • Service Type: Interventional studies, Observational studies, Expanded access studies
    • Indication: Oncology, CNS, Autoimmune/inflammation, Others
    • Geography: North America (US)

    By Type Insights

    The Phase III segment is estimated to witness significant growth during the forecast period.

    The clinical trials market in the US is a dynamic and evolving landscape, with ongoing activities and emerging patterns shaping the drug development process. Phase 3 trials, a crucial segment, assess the safety and efficacy of new drugs or treatments on larger patient populations. In April 2024, the FDA granted accelerated approval to Enhertu for adult patients with unresectable or metastatic HER2-positive solid tumors who have previously undergone systemic treatment. This approval underscores Enhertu's potential to address a significant unmet need, solidifying its role in the market. Throughout the clinical trial process, from protocol development and sample size calculation to patient recruitment, informed consent, and adverse event reporting, regulatory compliance is paramount. Technological advancements, such as electronic health records, remote patient monitoring, and eCRF systems, facilitate more efficient data collection and management. Study design, including blinded, placebo-controlled, and parallel group trials, ensures rigorous testing and unbiased results. Adaptive clinical trials allow for real-time data analysis and adjustments, enhancing trial efficiency. Key aspects, like clinical data management, biomarker identification, and statistical analysis plans, ensure data integrity and standardization. Investigator training, interim analysis, and trial monitoring maintain study quality and regulatory compliance. With a focus on data privacy and security, the clinical trials market continues to evolve, addressing the needs of patients and stakeholders alike.


    The Phase III segment was valued at USD 9.50 billion in 2019 and showed a gradual increase during the forecast period.


    Market Dynamics

    Our researchers analyzed the data with 2024 as the base year, along with the key drivers, trends, and challenges. A holistic analysis of drivers will help companies refine their marketing strategies to gain a competitive advantage.

    The clinical trials market in the US is witnessing significant advancements, driven by the adoption of innovative technologies and strategies to streamline trial processes and enhance patient engagement. One such technology, the clinical trial data management system, is gaining traction due to its ability to facilitate efficient data collection, processing, and reporting. This system integrates various tools such as remote patient monitoring technology, electronic case report forms (eCRFs), and clinical trial data visualization too

  14. Tissue samples and clinical data from patients and donors.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    • +1more
    Updated Feb 27, 2014
    Cite
    Dutertre, Charles-Antoine; Nicoletti, Antonino; Schäkel, Knut; Castier, Yves; Michel, Jean-Baptiste; Morvan, Marion; Alsac, Jean-Marc; Clement, Marc (2014). Tissue samples and clinical data from patients and donors. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001170620
    Explore at:
    Dataset updated
    Feb 27, 2014
    Authors
    Dutertre, Charles-Antoine; Nicoletti, Antonino; Schäkel, Knut; Castier, Yves; Michel, Jean-Baptiste; Morvan, Marion; Alsac, Jean-Marc; Clement, Marc
    Description

    Tissue samples and clinical data from patients and donors.

  15. Synthetic Healthcare Database for Research (SyH-DR)

    • catalog.data.gov
    • healthdata.gov
    • +2more
    Updated Sep 16, 2023
    + more versions
    Cite
    Agency for Healthcare Research and Quality (2023). Synthetic Healthcare Database for Research (SyH-DR) [Dataset]. https://catalog.data.gov/dataset/synthetic-healthcare-database-for-research-syh-dr
    Explore at:
    Dataset updated
    Sep 16, 2023
    Dataset provided by
    Agency for Healthcare Research and Quality (http://www.ahrq.gov/)
    Description

    The Agency for Healthcare Research and Quality (AHRQ) created SyH-DR from eligibility and claims files for Medicare, Medicaid, and commercial insurance plans in calendar year 2016. SyH-DR contains data from a nationally representative sample of insured individuals for the 2016 calendar year. SyH-DR uses synthetic data elements at the claim level to resemble the marginal distribution of the original data elements. SyH-DR person-level data elements are not synthetic, but identifying information is aggregated or masked.

  16. Labo data file showing examples of available lab test results

    • datarade.ai
    .csv, .xls, .txt
    Updated Nov 22, 2015
    Cite
    Medical Data Vision (2015). Labo data file showing examples of available lab test results [Dataset]. https://datarade.ai/data-products/labo-data-file-showing-examples-of-available-lab-test-results-medical-data-vision
    Explore at:
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Nov 22, 2015
    Dataset authored and provided by
    Medical Data Vision
    Area covered
    Japan
    Description

    Lab test results are already provided by about 20% of the hospitals that supply us with their medical data.

    This dataset is a valuable resource for healthcare professionals, researchers, and organizations looking to analyze and understand the prevalence and distribution of various medical conditions in Japan. It can be used for epidemiological studies, healthcare planning, and medical research. The inclusion of ICD-10 codes allows for standardized analysis and comparison of diseases, and the patient count provides essential data for assessing the burden and impact of these conditions on the healthcare system and population.

  17. Data from: Compliance with mandatory reporting of clinical trial results on...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Jan 4, 2012
    Cite
    Andrew P. Prayle; Matthew N. Hurley; Alan R. Smyth (2012). Compliance with mandatory reporting of clinical trial results on ClinicalTrials.gov: cross sectional study [Dataset]. http://doi.org/10.5061/dryad.j512f21p
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 4, 2012
    Dataset provided by
    Dryad
    Authors
    Andrew P. Prayle; Matthew N. Hurley; Alan R. Smyth
    Time period covered
    Dec 13, 2011
    Area covered
    United States
    Description

    • clinicaltrials.gov_search: This is the complete original dataset.
    • identify completed trials: This is the R script which, when run on "clinicaltrials.gov_search.txt", will produce a .csv file which lists all the completed trials.
    • FDA_table_with_sens: This is the final dataset after cross-referencing the trials. An explanation of the variables is included in the supplementary file "2011-10-31 Prayle Hurley Smyth Supplementary file 3 variables in the dataset".
    • analysis_after_FDA_categorization_and_sens: This R script reproduces the analysis from the paper, including the tables and statistical tests. The comments should make it self-explanatory.
    • 2011-11-02 prayle hurley smyth supplementary file 1 STROBE checklist: This is a STROBE checklist for the study.
    • 2011-10-31 Prayle Hurley Smyth Supplementary file 2 examples of categorization: This is a supplementary file which illustrates some of the decisions which had to be made when categorizing trials.
    • 2011-10-31 Prayle Hurley Smyth Supplementary file 3 variables in th...

  18. Clinical Data Warehouse Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 23, 2025
    Cite
    Growth Market Reports (2025). Clinical Data Warehouse Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/clinical-data-warehouse-market
    Explore at:
    Available download formats: csv, pptx, pdf
    Dataset updated
    Aug 23, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Clinical Data Warehouse Market Outlook

    According to our latest research, the global clinical data warehouse market size reached USD 2.84 billion in 2024, demonstrating robust demand across healthcare and life sciences sectors. The market is expected to expand at a CAGR of 11.2% from 2025 to 2033, reaching a forecasted value of USD 7.36 billion by 2033. This impressive growth trajectory is primarily fueled by the increasing adoption of data-driven healthcare, regulatory mandates for data integration, and the rising emphasis on evidence-based clinical decision-making worldwide.
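
    The growth figures quoted above can be sanity-checked with the standard compound annual growth rate formula. The sketch below is only an arithmetic check, not part of the report; it assumes nine compounding periods (2024 as the base year through 2033), and the function names are illustrative.

```python
# Figures quoted in the paragraph above, in USD billion.
start_value = 2.84    # market size in 2024
end_value = 7.36      # forecast value for 2033
years = 2033 - 2024   # nine compounding periods (an assumption about the base year)


def implied_cagr(start, end, periods):
    """Compound annual growth rate that takes `start` to `end` in `periods` years."""
    return (end / start) ** (1 / periods) - 1


def project(start, rate, periods):
    """Value reached by compounding `start` at `rate` for `periods` years."""
    return start * (1 + rate) ** periods


cagr = implied_cagr(start_value, end_value, years)
projected = project(start_value, 0.112, years)

print(f"implied CAGR: {cagr:.1%}")
print(f"2033 value at 11.2%: USD {projected:.2f} billion")
```

    Under that assumption the implied rate rounds to the quoted 11.2%, and compounding 2.84 at 11.2% lands close to the quoted 2033 forecast; the small gap is consistent with rounding in the published figures.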




    One of the most significant growth factors for the clinical data warehouse market is the exponential rise in healthcare data volumes generated by electronic health records (EHRs), medical imaging, genomics, and connected medical devices. Healthcare providers and research institutions are facing mounting pressure to harness this data for actionable insights, improved patient outcomes, and operational efficiency. Clinical data warehouses serve as the backbone for integrating disparate data sources, standardizing information, and enabling advanced analytics and artificial intelligence (AI) applications. As healthcare organizations increasingly prioritize digital transformation, the demand for robust, scalable, and secure clinical data warehousing solutions continues to surge, driving market expansion.




    Another key driver is the growing regulatory emphasis on data interoperability, patient privacy, and quality reporting. Governments and regulatory bodies across the globe are mandating the adoption of interoperable health IT systems and standardized data formats to ensure seamless data exchange and compliance with regulations such as HIPAA, GDPR, and the 21st Century Cures Act. Clinical data warehouses play a critical role in facilitating regulatory compliance, supporting quality reporting initiatives, and enabling value-based care models. Their ability to aggregate, cleanse, and harmonize clinical, operational, and financial data empowers healthcare organizations to demonstrate care quality, optimize reimbursements, and participate in population health management programs.




    The rapid advancement of artificial intelligence, machine learning, and predictive analytics is also transforming the clinical data warehouse landscape. These technologies require high-quality, well-structured data repositories for training algorithms, developing predictive models, and conducting real-world evidence studies. Clinical data warehouses are increasingly being integrated with advanced analytics platforms, enabling real-time insights for clinical research, patient stratification, risk prediction, and personalized medicine. As the healthcare industry moves toward precision health and data-driven innovation, the strategic value of clinical data warehouses is expected to grow, further accelerating market growth.




    From a regional perspective, North America currently dominates the global clinical data warehouse market, accounting for the largest revenue share in 2024. This leadership is attributed to the presence of advanced healthcare infrastructure, widespread adoption of EHRs, and strong regulatory frameworks supporting health data integration. Europe follows closely, driven by stringent data protection regulations and growing investments in digital health. Meanwhile, the Asia Pacific region is emerging as the fastest-growing market, propelled by healthcare modernization initiatives, increasing adoption of cloud-based solutions, and government efforts to digitize healthcare systems. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as healthcare providers in these regions increasingly recognize the value of data-driven decision-making.





    Component Analysis

    The clinical data warehouse market is segmented by component into software, hardware, and services, each playing a pivotal role in the ecosystem. Software represents the largest segment

  19. Identifying Diseases Treatments in Healthcare Data

    • kaggle.com
    zip
    Updated Mar 5, 2025
    Cite
    Sagar Maru (2025). Identifying Diseases Treatments in Healthcare Data [Dataset]. https://www.kaggle.com/datasets/marusagar/identifying-diseases-treatments-in-healthcare-data
    Explore at:
    Available download formats: zip (166655 bytes)
    Dataset updated
    Mar 5, 2025
    Authors
    Sagar Maru
    Description

    Identifying Entities (Diseases, Treatments) in Healthcare Data

    Finding diseases and treatments in medical text—because even AI needs a medical degree to understand doctor’s notes! 🩺🤖

    📊 Understanding the Dataset

    In the contemporary healthcare ecosystem, substantial amounts of unstructured text are generated every day through electronic health records (EHRs), doctors' notes, prescriptions, and medical literature. The ability to extract meaningful insights from this data is critical for improving patient care, advancing clinical research, and optimizing healthcare services. This dataset contains text-based medical data in which diseases and their corresponding treatments are embedded within unstructured sentences.

    The dataset consists of labeled text samples, which are divided into:

    • Train Sentences: These sentences contain clinical records, including patient diagnoses and the treatments administered.
    • Train Labels: The corresponding annotations for the train sentences, marking diseases and treatments as named entities.
    • Test Sentences: Similar to the train sentences but used to evaluate model performance.
    • Test Labels: The ground-truth labels for the test sentences.

    A sample from the dataset may look as follows:

    🔍 Example from Dataset:

    Train Sentences:

    "The patient was a 62-year-old man with squamous epithelium, who was previously treated with success with a combination of radiation therapy and chemotherapy."

    Train Labels:

    • Disease: 🦠 lung cancer
    • Treatment: 💉 Radiation therapy, chemotherapy

    This dataset requires the use of Named Entity Recognition (NER) to extract diseases and map them to their related treatments 💊, turning unstructured medical data into a structured form for analytical purposes.

    ⚙️ Dataset Properties

    1. Unstructured medical text: The dataset contains free-form medical notes, where diseases and treatments are mentioned in running text. Extracting this information without an explicit mapping is a challenge.
    2. Multiple entity types: The dataset contains different named entities such as diseases, treatments, symptoms, and possibly medications.
    3. Contextual dependence: Many treatments apply to many diseases, and the proper mapping depends on context. For example, "radiotherapy" is used for different cancers, which makes contextual understanding important.
    4. Imbalanced data distribution: Some diseases and treatments appear more often than others; balancing model performance requires techniques such as oversampling, undersampling, or transfer learning.
    5. Domain-specific language: The text is rich in medical terminology, which requires special preprocessing using domain-specific NLP techniques and medical ontologies such as UMLS or SNOMED CT.
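
    As one illustration of the imbalance point in the list above, random oversampling can be sketched in a few lines. This is a simplified, sentence-level sketch and not the dataset author's method: the toy data, the `oversample` helper, and the idea of assigning one dominant label per sentence are all assumptions made for the example.

```python
import random
from collections import Counter


def oversample(samples, label_of, seed=0):
    """Randomly duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for sample in samples:
        by_label.setdefault(label_of(sample), []).append(sample)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra copies (with replacement) to reach the target size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Hypothetical toy data: (sentence id, dominant entity type).
toy = [("s1", "disease"), ("s2", "disease"), ("s3", "disease"), ("s4", "treatment")]
balanced = oversample(toy, label_of=lambda pair: pair[1])
counts = Counter(label for _, label in balanced)
print(counts)
```

    Oversampling should be applied to the training split only, so that duplicated sentences never leak into the test set.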

    🚧 Challenges Working with Dataset

    • Complex medical vocabulary: Medical texts often use jargon, which requires specialized NLP models trained on clinical corpora.

    • Implicit Relationships: Unlike structured datasets, disease-treatment relationships are inferred from context rather than explicitly stated.

    • Synonyms and Abbreviations: Diseases and treatments can be referred to by different names (e.g., ‘myocardial infarction’ vs. ‘heart attack’). Handling such variations is vital.

    • Noise in Data: Unstructured records may also contain irrelevant information, typographical errors, and inconsistencies that affect extraction accuracy.

    🛠️ Approach to Extracting Insights from the Dataset

    To extract diseases and their respective treatments from this dataset, we follow a structured NLP pipeline:

    1. Data Preprocessing 🧹

    • Text Cleaning: Remove unnecessary characters, numbers, and stopwords while preserving clinical terms.
    • Tokenization: Split sentences into words for further processing.
    • Medical Term Standardization: Use domain-specific libraries like scispaCy to standardize synonyms and abbreviations.
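
    The cleaning and tokenization steps above might look roughly like the following. It is a minimal standard-library sketch: the stopword list is a tiny illustrative assumption, and the function deliberately keeps digits and hyphens so that terms such as "62-year-old" survive, which departs slightly from the "remove numbers" wording.

```python
import re

# Tiny illustrative stopword list (an assumption; real pipelines use a fuller list).
STOPWORDS = {"the", "a", "an", "was", "with", "of", "and", "who"}


def clean_and_tokenize(sentence):
    """Lowercase, repair spaced hyphens, drop punctuation, split on whitespace, drop stopwords."""
    text = sentence.lower()
    text = re.sub(r"\s*-\s*", "-", text)        # "62 -year -old" -> "62-year-old"
    text = re.sub(r"[^a-z0-9\-\s]", " ", text)  # remove punctuation, keep hyphens and digits
    return [token for token in text.split() if token not in STOPWORDS]


tokens = clean_and_tokenize("The patient was treated with radiation therapy and chemotherapy.")
print(tokens)
```

    Synonym and abbreviation standardization is not covered here, since it depends on an external vocabulary.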

    2. Named Entity Recognition (NER) Model Development 🤖

    • Annotation: Ensure accurate labeling of diseases and treatments in the dataset.
    • Model Selection: Train a deep-learning-based model like BioBERT or a rule-based model using spaCy.
    • Training: Use annotated data to train a custom NER model that classifies words as disease or treatment entities.
    • Evaluation: Measure precision, recall, and F1-score to evaluate model performance.
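
    For the evaluation step above, entity-level precision, recall, and F1 can be computed by comparing sets of predicted and gold entities. The sketch below assumes exact-match scoring and represents each entity as a hypothetical `(sentence_id, entity_type, text)` tuple; the example entities are made up for illustration.

```python
def prf1(gold, predicted):
    """Exact-match precision, recall and F1 over sets of (sentence_id, entity_type, text)."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold annotations and model predictions for one sentence.
gold = {
    (0, "disease", "lung cancer"),
    (0, "treatment", "radiation therapy"),
    (0, "treatment", "chemotherapy"),
}
pred = {
    (0, "disease", "lung cancer"),
    (0, "treatment", "chemotherapy"),
    (0, "treatment", "surgery"),
}
precision, recall, f1 = prf1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

    Exact matching is strict: a prediction of "radiation" against a gold span of "radiation therapy" counts as both a false positive and a false negative.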

    3. Mapping Diseases to Treatments 🔄

    • Contextual Relationship Extraction: Identify which treatment corresponds to which disease using dependency parsing and relation extraction.
    • Dictionary or Tabular Output: Store extracted mappings in a structured format.
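
    A dictionary-style output for the mapping step could be built as sketched here. Note that this swaps in a much simpler technique than the dependency parsing named above: it links every disease to every treatment found in the same sentence (plain co-occurrence). The entity tuples and the `map_by_cooccurrence` helper are assumptions for illustration.

```python
from collections import defaultdict


def map_by_cooccurrence(entities):
    """Link each disease to every treatment tagged in the same sentence."""
    per_sentence = defaultdict(lambda: defaultdict(list))
    for sentence_id, entity_type, text in entities:
        per_sentence[sentence_id][entity_type].append(text)

    mapping = defaultdict(set)
    for found in per_sentence.values():
        for disease in found["disease"]:
            mapping[disease].update(found["treatment"])
    # Sort treatments so the output is stable and easy to tabulate.
    return {disease: sorted(treatments) for disease, treatments in mapping.items()}


# Hypothetical NER output as (sentence_id, entity_type, text) tuples.
entities = [
    (0, "disease", "lung cancer"),
    (0, "treatment", "radiation therapy"),
    (0, "treatment", "chemotherapy"),
]
disease_to_treatments = map_by_cooccurrence(entities)
print(disease_to_treatments)
```

    Co-occurrence over-links when a sentence mentions several diseases, which is the case where contextual relation extraction is actually needed.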

    Example Output:

    | 🦠 Disease | 💉 Treatments | |----------|--------------------...

  20. MIMIC-III - Deep Reinforcement Learning

    • kaggle.com
    zip
    Updated Apr 7, 2022
    Cite
    Asjad K (2022). MIMIC-III - Deep Reinforcement Learning [Dataset]. https://www.kaggle.com/datasets/asjad99/mimiciii
    Explore at:
    Available download formats: zip (11100065 bytes)
    Dataset updated
    Apr 7, 2022
    Authors
    Asjad K
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Digitization of healthcare data, along with algorithmic breakthroughs in AI, will have a major impact on healthcare delivery in the coming years. It is interesting to see AI applied to assist clinicians during patient treatment in a privacy-preserving way. While scientific knowledge can help guide interventions, there remains a key need to quickly cut through the space of decision policies to find effective strategies to support patients during the care process.

    Offline reinforcement learning (also referred to as safe or batch reinforcement learning) is a promising sub-field of RL which provides a mechanism for solving real-world sequential decision-making problems where access to a simulator is not available. Here we learn a policy from a fixed dataset of trajectories without further interaction with the environment (the agent doesn't receive a reward or punishment signal from the environment). It has been shown that such an approach can leverage vast amounts of existing logged data (in the form of previous interactions with the environment) and can outperform supervised learning approaches or heuristic-based policies for solving real-world decision-making problems. Offline RL algorithms, when trained on sufficiently large and diverse offline datasets, can produce close-to-optimal policies (the ability to generalize beyond the training data).
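
    To make the offline setting concrete, the sketch below learns Q-values purely from a fixed log of transitions and never queries an environment. It uses tabular Q-learning rather than the deep networks used in this project, and the states, actions, rewards, and the `batch_q_learning` helper are hypothetical toy values chosen for illustration.

```python
from collections import defaultdict


def batch_q_learning(transitions, actions, gamma=0.9, alpha=0.5, sweeps=200):
    """Learn Q-values from a fixed log of (state, action, reward, next_state, done) tuples."""
    q = defaultdict(float)
    for _ in range(sweeps):
        # Repeatedly replay the same logged data; no new interaction happens.
        for state, action, reward, next_state, done in transitions:
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def greedy_policy(q, states, actions):
    """Pick the highest-valued action in each state."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


# Hypothetical logged trajectories with two states and two actions.
ACTIONS = ["wait", "treat"]
LOG = [
    ("sick", "wait", 0.0, "sick", False),
    ("sick", "treat", 0.0, "stable", False),
    ("stable", "wait", 1.0, "end", True),
    ("stable", "treat", 0.0, "end", True),
]
q = batch_q_learning(LOG, ACTIONS)
policy = greedy_policy(q, ["sick", "stable"], ACTIONS)
print(policy)
```

    The toy log covers every state-action pair; real clinical logs do not, and values for actions that were rarely or never taken are then unreliable, which is a central difficulty in offline RL.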

    As part of my PhD research, I investigated the problem of developing a Clinical Decision Support System for Sepsis Management using Offline Deep Reinforcement Learning.

    MIMIC-III ('Medical Information Mart for Intensive Care') is a large open-access anonymized single-center database which consists of comprehensive clinical data of 61,532 critical care admissions from 2001–2012 collected at a Boston teaching hospital. The dataset consists of 47 features (including demographics, vitals, and lab test results) on a cohort of sepsis patients who meet the Sepsis-3 definition criteria.

    We try to answer the following question:

    Given a particular patient’s characteristics and physiological information at each time step as input, can our deep RL approach learn an optimal treatment policy that prescribes the right intervention (e.g., use of a ventilator) to the patient at each stage of the treatment process, in order to improve the final outcome (e.g., patient mortality)?

    We can use popular state-of-the-art algorithms such as Deep Q-Learning (DQN), Double Deep Q-Learning (DDQN), DDQN combined with BNC, Mixed Monte Carlo (MMC), and Persistent Advantage Learning (PAL). Using these methods, we can train an RL policy to recommend the optimal treatment path for a given patient.
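
    The difference between the DQN and Double DQN learning targets can be shown without any neural network. In this sketch, plain lists stand in for the Q-value outputs of the online and target networks for one next state; the numbers and function names are hypothetical and not taken from the linked repository.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """DQN target: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for three actions in the next state.
next_q_online = [0.2, 1.5, 0.7]
next_q_target = [0.9, 0.4, 1.1]

y_dqn = dqn_target(0.0, next_q_target, gamma=0.5)
y_ddqn = double_dqn_target(0.0, next_q_online, next_q_target, gamma=0.5)
print(y_dqn, y_ddqn)
```

    Decoupling action selection from evaluation is what lets Double DQN reduce the overestimation that comes from taking a maximum over noisy value estimates.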

    Data acquisition, standard pre-processing, and modelling details can be found in the GitHub repo: https://github.com/asjad99/MIMIC_RL_COACH
