Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Disclaimer
This is the first release of the Global Ensemble Digital Terrain Model (GEDTM30). Use for testing purposes only. A publication describing the methods used has been submitted to PeerJ and is currently under review. This work was funded by the European Union. However, the views and opinions expressed are solely those of the author(s) and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them. The data is provided "as is." The Open-Earth-Monitor project consortium, along with its suppliers and licensors, hereby disclaims all warranties of any kind, express or implied, including, without limitation, warranties of merchantability, fitness for a particular purpose, and non-infringement. Neither the Open-Earth-Monitor project consortium nor its suppliers and licensors make any warranty that the website will be error-free or that access to it will be continuous or uninterrupted. You understand that you download or otherwise obtain content or services from the website at your own discretion and risk.

Description
GEDTM30 is a 1-arc-second (~30 m) global Digital Terrain Model (DTM) generated using machine-learning-based data fusion. It was trained using a global-to-local Random Forest model with ICESat-2 and GEDI data, incorporating almost 30 billion high-quality points. For documentation, please visit the GEDTM30 GitHub repository (https://github.com/openlandmap/GEDTM30). This dataset covers the entire world and can be used for applications such as topography, hydrology, and geomorphometry analysis.

Dataset Contents
This dataset includes:
- GEDTM30: the predicted terrain height.
- Uncertainty of the GEDTM30 prediction: an uncertainty map of the terrain prediction, derived from the standard deviation of individual tree predictions in the Random Forest model.

Due to Zenodo's storage limitations, the original GEDTM30 dataset and its standard deviation map are provided via external links:
- GEDTM30 30m
- Uncertainty of GEDTM30 prediction 30m

Related Identifiers
- Landform: Slope in Degree, Geomorphons
- Light and Shadow: Positive Openness, Negative Openness, Hillshade
- Curvature: Minimal Curvature, Maximal Curvature, Profile Curvature, Tangential Curvature, Ring Curvature, Shape Index
- Local Topographic Position: Difference from Mean Elevation, Spherical Standard Deviation of the Normals
- Hydrology: Specific Catchment Area, LS Factor, Topographic Wetness Index

Data Details
- Time period: static
- Type of data: Digital Terrain Model
- How the data was collected or derived: machine learning models
- Statistical methods used: Random Forest
- Limitations or exclusions in the data: the dataset does not include data for Antarctica
- Coordinate reference system: EPSG:4326
- Bounding box (Xmin, Ymin, Xmax, Ymax): (-180, -65, 180, 85)
- Spatial resolution: 120 m
- Image size: 360,000 pixels x 178,219 lines
- File format: Cloud Optimized GeoTIFF (COG)

Layer information:
Layer | Scale | Data Type | No Data
Ensemble Digital Terrain Model | 10 | Int32 | -2,147,483,647
Standard Deviation EDTM | 100 | UInt16 | 65,535

Code Availability
The primary development of GEDTM30 is documented in the GEDTM30 GitHub repository (https://github.com/openlandmap/GEDTM30). The current version (v1) of the code is compressed and uploaded as GEDTM30-main.zip. For up-to-date development, please visit our GitHub page.
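Both layers are stored as scaled integers (see the layer table above), so raw pixel values must be divided by the layer's scale factor to recover physical units. A minimal sketch using rasterio, assuming a locally downloaded COG; the file name is illustrative and follows the naming convention described below:

import numpy as np
import rasterio  # assumes rasterio is installed

# Illustrative file name following the naming convention described below
path = "edtm_rf_m_120m_s_20000101_20231231_go_epsg.4326_v20250130.tif"

with rasterio.open(path) as src:
    raw = src.read(1)      # Int32 values: scale 10, nodata -2,147,483,647
    nodata = src.nodata

# Mask nodata, then divide by the scale factor to get elevation in metres
elevation_m = np.where(raw == nodata, np.nan, raw / 10.0)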
Support
If you discover a bug, artifact, or inconsistency, or if you have a question, please raise a GitHub issue here.

Naming convention
To ensure consistency and ease of use across and within projects, we follow the standard Ai4SoilHealth and Open-Earth-Monitor file-naming convention. The convention uses 10 fields that describe important properties of the data, so users can search files and prepare data analyses without needing to open them. For example, for edtm_rf_m_120m_s_20000101_20231231_go_epsg.4326_v20250130.tif, the fields are:
- generic variable name: edtm = ensemble digital terrain model
- variable procedure combination: rf = random forest
- position in the probability distribution / variable type: m = mean | sd = standard deviation
- spatial support: 120m
- depth reference: s = surface
- time reference begin time: 20000101 = 2000-01-01
- time reference end time: 20231231 = 2023-12-31
- bounding box: go = global
- EPSG code: EPSG:4326
- version code: v20250130 = version from 2025-01-30
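Because the fields are underscore-delimited, a file name can be parsed mechanically. A minimal sketch (the field keys below are paraphrased from the list above):

# Parse a GEDTM30-style file name into its 10 naming-convention fields
name = "edtm_rf_m_120m_s_20000101_20231231_go_epsg.4326_v20250130.tif"
keys = ["variable", "procedure", "variable_type", "spatial_support",
        "depth_reference", "time_begin", "time_end", "bounding_box",
        "epsg_code", "version"]
meta = dict(zip(keys, name.removesuffix(".tif").split("_")))
print(meta["variable"], meta["epsg_code"], meta["version"])
# -> edtm epsg.4326 v20250130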
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT
Machine learning (ML) interpretability has become increasingly crucial for identifying accurate and relevant structural relationships between spatial events and the factors that explain them. Methodologically aspatial ML algorithms with apparently high predictive power ignore non-stationary domain relationships in spatio-temporal data (e.g., dependence, heterogeneity), leading to incorrect interpretations and poor management decisions. This study addresses this critical methodological issue of 'interpretability' in ML-based modeling of structural relationships, using the example of heterogeneous drivers of wildfires across the United States. Specifically, we present and evaluate a spatio-temporally interpretable random forest (iST-RF) that uses spatio-temporal sampling-based training and weighted prediction. Although the ultimate scientific objective is to derive interpretation in space-time, experiments show that iST-RF can improve predictive accuracy (76%) compared to the aspatial RF approach (70%) while enhancing interpretations of the trained model's spatio-temporal relevance for its ensemble prediction. This novel approach can help balance prediction and interpretation with fidelity in a spatial data science life cycle. However, challenges remain for predictive modeling when the dataset is very small, because in such cases a locally optimized sub-model's prediction performance can be suboptimal. With that caveat, our proposed approach is an ideal choice for identifying drivers of spatio-temporal events in country- or regional-scale studies.
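The abstract describes spatio-temporal sampling-based training with weighted ensemble prediction. As a rough illustration only (the paper's actual partitioning and weighting scheme is not reproduced here), one can train a Random Forest per spatio-temporal block and weight each sub-model's prediction by proximity to the query point; the toy data, blocking scheme, and weighting below are all hypothetical:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 600
# Toy data: columns are x coordinate, y coordinate, year, one covariate
X = np.column_stack([rng.uniform(0, 10, n), rng.uniform(0, 10, n),
                     rng.integers(2000, 2020, n), rng.normal(size=n)])
y = X[:, 3] * (1 + 0.1 * X[:, 0]) + rng.normal(scale=0.1, size=n)

# Train one sub-model per spatial block (2 x 2 tiles of the 10 x 10 domain)
models, centers = [], []
for i in range(2):
    for j in range(2):
        mask = (X[:, 0] // 5 == i) & (X[:, 1] // 5 == j)
        models.append(RandomForestRegressor(n_estimators=100, random_state=0).fit(X[mask], y[mask]))
        centers.append((i * 5 + 2.5, j * 5 + 2.5))
centers = np.array(centers)

def ist_predict(x_new):
    # Weight each block model's prediction by inverse distance from the
    # query point to the block centre (spatial only, for brevity)
    d = np.linalg.norm(centers - x_new[:2], axis=1) + 1e-9
    w = (1.0 / d) / (1.0 / d).sum()
    preds = np.array([m.predict(x_new[None, :])[0] for m in models])
    return float(w @ preds)

print(ist_predict(np.array([3.0, 7.0, 2015.0, 0.5])))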
Author contributions
A.M. conceived and designed the study, coded and performed the data processing, modeling, and interpretations, and wrote the manuscript. M.Y., P.M., D.P., and A.T. contributed to the refinement of the proposed methodology, experiments, and write-up. All authors reviewed the manuscript.
We find shark catch risk hotspots in all ocean basins, with notable high-risk areas off Southwest Africa and in the Eastern Tropical Pacific. These patterns are mostly driven by more common species such as blue sharks, though risk areas for less common, Endangered and Critically Endangered species are also identified. The clear spatial patterns of shark fishing risk identified here can be leveraged to develop spatial management strategies for threatened populations.

Sharks are susceptible to industrial longline fishing due to their slow life histories and association with targeted tuna stocks. Identifying fished areas with high shark interaction risk is vital to protect threatened species. We harmonize shark catch records from global tuna Regional Fisheries Management Organizations (tRFMOs) from 2012–2020 and use machine learning to identify where sharks are most threatened by longline fishing. Most spatial patterns are driven by more common species such as blue sharks, though risk areas for less common, Endangered and Critically Endangered species are also identified.

We built Random Forest (RF) machine learning models to estimate spatially explicit shark catch risk by longlines globally, using a suite of catch and effort data from tRFMOs, additional fishing-effort datasets (Global Fishing Watch), environmental datasets (sea surface temperature, sea surface height, chlorophyll-A), and economic datasets (ex-vessel price). More information on the exact datasets used can be found in the associated software works. For each tRFMO, we tested various spatial resolutions and shark catch units to determine the most appropriate dataset for future model runs, identified by the highest R2 for each tRFMO. Once a resolution and unit were selected for a tRFMO, the same resolution was used in subsequent model runs. We then conducted a second phase of parameter testing for combinations of the following variables: sea surface temperature (mean, or mean and coefficient of variation), chlorophyll-A (mean, or mean and coefficient of variation), sea surface height (m…).

Please refer to the associated software works for instructions on how to download the input dataset and set up your folder structure. The files saved here are the outputs of machine learning models run using publicly available tRFMO datasets. Please refer to the README files for metadata.
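As a hedged sketch of the selection step described above (not the authors' code: the function, data structure, and hyperparameters are illustrative), one can fit an RF per candidate (resolution, catch unit) combination for a tRFMO and keep the configuration with the highest cross-validated R2:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def select_best_config(datasets):
    """datasets: {(resolution, catch_unit): (X, y)} candidates for one tRFMO."""
    best, best_r2 = None, float("-inf")
    for (res, unit), (X, y) in datasets.items():
        rf = RandomForestRegressor(n_estimators=500, random_state=0)
        r2 = cross_val_score(rf, X, y, cv=5, scoring="r2").mean()
        if r2 > best_r2:
            best, best_r2 = (res, unit), r2
    return best, best_r2  # winning (resolution, unit) and its mean R^2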
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description: Welcome to the "Loan Applicant Data for Credit Risk Analysis" dataset on Kaggle! This dataset provides essential information about loan applicants and their characteristics. Your task is to develop predictive models to determine the likelihood of loan default based on these simplified features.
In today's financial landscape, assessing credit risk is crucial for lenders and financial institutions. This dataset offers a simplified view of the factors that contribute to credit risk, making it an excellent opportunity for data scientists to apply their skills in machine learning and predictive modeling.
Column Descriptions:
Explore this dataset, preprocess the data as needed, and develop machine learning models, especially using Random Forest, to predict loan default. Your insights and solutions could contribute to better credit risk assessment methods and potentially help lenders make more informed decisions.
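As a starting point, a minimal Random Forest baseline might look like the following; the file name, target column, and feature handling are hypothetical placeholders, since the column list is not reproduced above:

# Minimal Random Forest baseline for loan-default prediction
# (file name and column names are hypothetical placeholders)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("credit_risk.csv")               # hypothetical file name
y = df["default"]                                  # hypothetical target column
X = pd.get_dummies(df.drop(columns=["default"]))   # one-hot encode categoricals

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

clf = RandomForestClassifier(n_estimators=300, random_state=42)
clf.fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))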
Remember to respect data privacy and ethics guidelines while working with this data. Good luck, and happy analyzing!
Prediction of Phakic Intraocular Lens Vault Using Machine Learning of Anterior Segment Optical Coherence Tomography Metrics. Authors: Kazutaka Kamiya, MD, PhD; Ik Hee Ryu, MD, MS; Tae Keun Yoo, MD; Jung Sub Kim, MD; In Sik Lee, MD, PhD; Jin Kook Kim, MD; Wakako Ando, CO; Nobuyuki Shoji, MD, PhD; Tomofusa Yamauchi, MD, PhD; Hitoshi Tabuchi, MD, PhD.
We hypothesize that machine learning of preoperative biometric data obtained by AS-OCT (anterior segment optical coherence tomography) may be clinically beneficial for predicting the actual ICL vault. Therefore, we built a machine learning model using Random Forest to predict the ICL vault after surgery.
This multicenter study comprised 1,745 eyes of 1,745 consecutive patients (656 men and 1,089 women) who underwent EVO ICL implantation (V4c and V5 Visian ICL with KS-AquaPORT) for the correction of moderate to high myopia and myopic astigmatism, and who completed at least a 1-month follow-up, at Kitasato University Hospital (Kanagawa, Japan) or at B&VIIT Eye Center (Seoul, Korea).
This data file (RFR_model(feature=12).mat) is the final trained Random Forest model, saved for MATLAB 2020a.
Python version:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Authenticate and mount Google Drive (Colab-specific)
from google.colab import auth, drive
auth.authenticate_user()
drive.mount('/content/gdrive')

# Load the dataset and inspect the first rows
dataset = pd.read_csv('gdrive/My Drive/ICL/data_icl.csv')
dataset.head()

# Target: vault at 1 month after surgery; features: all remaining columns
y = dataset['Vault_1M']
X = dataset.drop(['Vault_1M'], axis=1)

# 80/20 train/test split
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)

# Random Forest hyperparameters
parameters = {'bootstrap': True,
              'min_samples_leaf': 3,
              'n_estimators': 500,
              'criterion': 'mae',
              'min_samples_split': 10,
              'max_features': 'sqrt',
              'max_depth': 6,
              'max_leaf_nodes': None}

# Train, predict on the held-out set, and extract feature importances
RF_model = RandomForestRegressor(**parameters)
RF_model.fit(train_X, train_y)
RF_predictions = RF_model.predict(test_X)
importance = RF_model.feature_importances_
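Note that scikit-learn 1.0 deprecated criterion='mae' for RandomForestRegressor in favor of criterion='absolute_error', so depending on your installed version that string may need updating.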
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data from a cluster-randomized controlled trial evaluating the effectiveness of a community-based intervention (the Konga model) for viral load suppression among children living with HIV in Simiyu, Tanzania. Children aged 2‒14 years with a viral load >1,000 copies/mL were randomly assigned to 15 treatment and 30 control clusters based on their area of residence. The intervention included adherence counseling, psychosocial support, and screening for comorbidities. Viral load was measured at baseline and 6 months later. We compared the mean viral loads of participants before and after the intervention. The 82 participants had a mean age of 9 years and a baseline median viral load of 13,150 copies/mL. After the study, the intervention group had significantly higher adherence (92%) than the control group (80%). After adjusting for baseline viral load, the intervention explained 4% of the viral load variation. This trial showed significant benefits of the Konga model. We recommend conducting similar trials elsewhere to confirm the generalizability of the intervention, so that it can be implemented more widely. Further, we believe that these data will be of interest to the readership of this repository because they increase our current understanding of the social dimensions of HIV in an African context and provide recommendations for improving HIV care, particularly for children.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The monitoring of surface-water quality followed by water-quality modeling and analysis is essential for generating effective strategies in water resource management. However, water-quality studies are limited by the lack of complete and reliable data sets on surface-water-quality variables. These deficiencies are particularly noticeable in developing countries.
This work focuses on surface-water-quality data from the Santa Lucía Chico river (Uruguay), a mixed lotic and lentic river system. Data collected at six monitoring stations are publicly available at https://www.dinama.gub.uy/oan/datos-abiertos/calidad-agua/. The high temporal and spatial variability that characterizes water-quality variables and the high rate of missing values (between 50% and 70%) raise significant challenges.
To deal with missing values, we applied several statistical and machine-learning imputation methods. The competing algorithms implemented belonged to both univariate and multivariate imputation methods (inverse distance weighting (IDW), Random Forest Regressor (RFR), Ridge (R), Bayesian Ridge (BR), AdaBoost (AB), Huber Regressor (HR), Support Vector Regressor (SVR), and K-nearest neighbors Regressor (KNNR)).
IDW outperformed the others, achieving very good performance (Nash–Sutcliffe efficiency, NSE, greater than 0.8) in most cases.
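As a rough illustration of the winning approach (not the authors' code), IDW imputation fills a missing value at one station with a distance-weighted average of the values observed at the other stations, and NSE compares imputed values against observations; the station coordinates, power parameter, and toy values below are hypothetical:

import numpy as np

def idw_impute(values, coords, target_idx, power=2):
    # Impute the missing value at station target_idx from stations with data;
    # values: observations per station (NaN = missing); coords: (n, 2) array
    known = ~np.isnan(values)
    d = np.linalg.norm(coords[known] - coords[target_idx], axis=1)
    w = 1.0 / d**power
    return np.sum(w * values[known]) / np.sum(w)

def nse(obs, sim):
    # Nash-Sutcliffe efficiency: 1 is perfect, 0 is no better than the mean
    obs, sim = np.asarray(obs), np.asarray(sim)
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

# Toy example: impute dissolved oxygen at station 2 from stations 0, 1, 3
coords = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 0.0]])  # hypothetical
do = np.array([8.1, 7.9, np.nan, 7.4])  # mg/L, NaN = missing
print(idw_impute(do, coords, target_idx=2))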
In this dataset, we include the original and imputed values for the following variables:
Water temperature (Tw)
Dissolved oxygen (DO)
Electrical conductivity (EC)
pH
Turbidity (Turb)
Nitrite (NO2-)
Nitrate (NO3-)
Total Nitrogen (TN)
Each variable is identified as [STATION] VARIABLE FULL NAME (VARIABLE SHORT NAME) [UNIT METRIC].
More details about the study area, the original datasets, and the methodology adopted can be found in our paper https://www.mdpi.com/2071-1050/13/11/6318.
If you use this dataset in your work, please cite our paper:
Rodríguez, R.; Pastorini, M.; Etcheverry, L.; Chreties, C.; Fossati, M.; Castro, A.; Gorgoglione, A. Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach. Sustainability 2021, 13, 6318. https://doi.org/10.3390/su13116318