2 datasets found

Data_Sheet_1_ImputEHR: A Visualization Tool of Imputation for the Prediction...
figshare.com
frontiersin.figshare.com
pdf
Updated Jun 1, 2023
Cite
Yi-Hui Zhou; Ehsan Saghapour (2023). Data_Sheet_1_ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data.PDF [Dataset]. http://doi.org/10.3389/fgene.2021.691274.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fgene.2021.691274.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Yi-Hui Zhou; Ehsan Saghapour
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, the most popular imputation methods mainly require scripting skills and are implemented using various packages and syntax. Thus, the implementation of a full suite of methods is generally out of reach for all except experienced data scientists. Moreover, imputation is often considered a separate exercise from exploratory data analysis, but should be considered part of the data exploration process. We have created a new graphical tool, ImputEHR, that is built on Python and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis.
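As a rough illustration of what a "performance measure of imputation" can look like, the sketch below hides known entries of a synthetic table, fills them back in with column means, and reports the RMSE on the hidden entries. It is not ImputEHR code: mean imputation stands in for the tool's more sophisticated methods, and the function name `mean_impute`, the synthetic data, and the 20% missingness rate are all assumptions made for the example.

```python
import numpy as np

def mean_impute(x):
    """Replace NaNs in each column with that column's observed mean."""
    filled = x.copy()
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

rng = np.random.default_rng(0)
complete = rng.normal(size=(200, 4))       # synthetic stand-in for a complete table
mask = rng.random(complete.shape) < 0.2    # hide roughly 20% of entries (assumed rate)
observed = np.where(mask, np.nan, complete)

imputed = mean_impute(observed)

# Score only the entries that were hidden, since the true values are known there.
rmse = np.sqrt(np.mean((imputed[mask] - complete[mask]) ** 2))
print(f"hidden entries: {mask.sum()}, imputation RMSE: {rmse:.3f}")
```

The same masking procedure can be reused to compare any other imputation method on equal terms, since only the `mean_impute` call changes.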
Pre-processed daily ERA5 and MODIS AOD data (2003 - 2022) ready for use in...
data.niaid.nih.gov
zenodo.org
Updated Jul 7, 2024
Cite
Nowak, Trish E. (2024). Pre-processed daily ERA5 and MODIS AOD data (2003 - 2022) ready for use in AI/ML forecasting [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10593151
Dataset updated
Jul 7, 2024
Dataset provided by
Augousti, Andy T.
Simmons, Benno I.
Nowak, Trish E.
Siegert, Stefan
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Long-term, pre-processed atmospheric datasets for use in Machine Learning/AI-based forecasting. They were initially intended to predict aerosol optical depth (AOD), but can be adapted for prediction of other atmospheric particles.

Pre-processed data and code

Machine Learning ready NumPy* dataset constructed by pre-processing selected atmospheric variables at 5 pressure levels from ERA5 reanalysis (resulting in 35 features) and AOD data from MODIS on board the Aqua and Terra satellites. This is a long-term daily dataset that spans 20 years, from 1st Jan 2003 to 31st Dec 2022, and is homogeneously structured into 1°x1° grid cells. Missing days and AOD values from MODIS were imputed using the Lattice Kriging method (the Python code used for imputation is included as the Jupyter Notebook 'Combine_impute_AOD.ipynb'), but raw (unimputed) MODIS data are also available. All datasets were created for the purpose of training a Convolutional Neural Network model designed to forecast Saharan dust (DustNet). These datasets can also be used to train other ML models, or indeed to forecast other variables.

This dataset was used to train the DustNet model and predict 24-hr ahead AOD. Please see doi: 10.5281/zenodo.10722953 for further details on predicting AOD and the DustNet model code.

*datasets are NumPy arrays (v1.23) created in Python v3.8.18.
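The dimensions quoted in the description can be checked with a few lines of NumPy. This is only a sketch: the count of 7 variables assumes the same variables are present on every one of the 5 pressure levels, and the array layout (day, latitude, longitude, feature) and the small grid size are hypothetical, since the actual file layout is not described here.

```python
import numpy as np

# Daily time axis, inclusive of both end dates given in the description.
start = np.datetime64("2003-01-01")
end = np.datetime64("2022-12-31")
n_days = int((end - start) / np.timedelta64(1, "D")) + 1

n_levels = 5      # pressure levels, from the description
n_features = 35   # total features, from the description
n_vars = n_features // n_levels  # assumes every variable exists on every level

# Hypothetical layout and grid size, for illustration only.
n_lat, n_lon = 4, 6
sample = np.zeros((10, n_lat, n_lon, n_features), dtype=np.float32)

print(n_days, n_vars, sample.shape)  # prints: 7305 7 (10, 4, 6, 35)
```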