2 datasets found
  1. f

    Data_Sheet_1_ImputEHR: A Visualization Tool of Imputation for the Prediction...

    • figshare.com
    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yi-Hui Zhou; Ehsan Saghapour (2023). Data_Sheet_1_ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data.PDF [Dataset]. http://doi.org/10.3389/fgene.2021.691274.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Yi-Hui Zhou; Ehsan Saghapour
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, use of the most popular imputation methods mainly require scripting skills, and are implemented using various packages and syntax. Thus, the implementation of a full suite of methods is generally out of reach to all except experienced data scientists. Moreover, imputation is often considered as a separate exercise from exploratory data analysis, but should be considered as art of the data exploration process. We have created a new graphical tool, ImputEHR, that is based on a Python base and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis.

  2. Z

    Pre-processed daily ERA5 and MODIS AOD data (2003 - 2022) ready for use in...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 7, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nowak, Trish E. (2024). Pre-processed daily ERA5 and MODIS AOD data (2003 - 2022) ready for use in AI/ML forecasting [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10593151
    Explore at:
    Dataset updated
    Jul 7, 2024
    Dataset provided by
    Augousti, Andy T.
    Simmons, Benno I.
    Nowak, Trish E.
    Siegert, Stefan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Long-term, pre-processed, atmospheric datasets for use in Machine Learning/AI based forecasting. Initially intended to predict AOD, however can be adapted for prediction of other atmospheric particles.

    Pre-processed data and code

    Machine Learning ready NumPy* dataset constructed by pre-processing selected atmospheric variables at 5 pressure levels form ERA5 reanalysis (resulting in 35 features) and AOD data from MODIS on board of Aqua and Terra satellites. This is a long-term daily dataset which spans 20 years from 1st Jan 2003 to 31st Dec 2022 and is homogeneously structured into 1ºx1º grid cells. Missing days and AOD values from MODIS were imputed using Lattice Kriging method (Python code used for imputation included as Jupyter Notebook 'Combine_impute_AOD.ipynb'), but raw (unimputed) MODIS data are also available. All datasets were created for a purpose of training Convolutional Neural Network model designed to forecast Saharan dust (DustNet). These datasets can also be used to train other ML models, or indeed to forecast other variables.

    This dataset was used to train the DustNet model and predict 24-hr ahead AOD. Please see doi: 10.5281/zenodo.10722953 for further details on predicting AOD and the DustNet model code.

    *datasets are NumPy arrays (v1.23) created in Python v3.8.18.

  3. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Yi-Hui Zhou; Ehsan Saghapour (2023). Data_Sheet_1_ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data.PDF [Dataset]. http://doi.org/10.3389/fgene.2021.691274.s001

Data_Sheet_1_ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data.PDF

Related Article
Explore at:
pdfAvailable download formats
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Yi-Hui Zhou; Ehsan Saghapour
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, use of the most popular imputation methods mainly require scripting skills, and are implemented using various packages and syntax. Thus, the implementation of a full suite of methods is generally out of reach to all except experienced data scientists. Moreover, imputation is often considered as a separate exercise from exploratory data analysis, but should be considered as art of the data exploration process. We have created a new graphical tool, ImputEHR, that is based on a Python base and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis.

Search
Clear search
Close search
Google apps
Main menu