Each file corresponds to a training or test set of one of the four experiments described in the article. The columns (variables/features) are the same as the ones in the full dataset.
🚀 Python package-style code for LightGBM and TabNet on this dataset. This is the training and inference code. Notebooks (ipynb) are the norm on Kaggle; I restructured the code as a Python package, which is better suited to training from a shell command.
I based this on the original code below; thanks to @chumajin.
[Notebook] Reference Notebook by chumajin
-- config : YAML file of LightGBM parameters
-- models : saved models
-- train.py
-- predict_test.py
-- feature_engineering.py
-- metric.py
-- preprocessing.py
-- seed.py
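To give a feel for how the package runs from the shell, here is a minimal sketch of what a train.py driven by the YAML config might look like. This is hypothetical: the config keys, file paths, and the "target" column are assumptions, not taken from the actual repository.

```python
# Minimal sketch of a package-style train.py (hypothetical paths/keys).
import yaml
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split

def main(config_path="config/lgbm_params.yaml"):
    # Load hyperparameters from the YAML config directory listed above
    with open(config_path) as f:
        params = yaml.safe_load(f)

    # Assumed feature table with a "target" column
    df = pd.read_csv("data/train.csv")
    X, y = df.drop(columns=["target"]), df["target"]
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

    model = lgb.train(
        params,
        lgb.Dataset(X_tr, label=y_tr),
        num_boost_round=params.pop("num_boost_round", 1000),
        valid_sets=[lgb.Dataset(X_val, label=y_val)],
        callbacks=[lgb.early_stopping(100), lgb.log_evaluation(100)],
    )
    model.save_model("models/lgbm.txt")  # mirrors the models/ directory above

if __name__ == "__main__":
    main()
```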
-- tabnet_preprocessing.py
-- config : tabnet_hyp.yaml / tabnet_config.py
-- models : saved models
-- predict_test.py
-- train.py
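And a corresponding hedged sketch of the TabNet training step, assuming the pytorch-tabnet package; the hyperparameter values, data shapes, and paths are illustrative only.

```python
# Hypothetical sketch of TabNet training with pytorch-tabnet.
import numpy as np
from pytorch_tabnet.tab_model import TabNetRegressor

# Placeholder data; TabNetRegressor expects 2-D float targets
X_train = np.random.rand(1000, 16).astype(np.float32)
y_train = np.random.rand(1000, 1).astype(np.float32)
X_valid = np.random.rand(200, 16).astype(np.float32)
y_valid = np.random.rand(200, 1).astype(np.float32)

model = TabNetRegressor(seed=42)
model.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    max_epochs=100,
    patience=20,      # early stopping on the eval set
    batch_size=1024,
)
model.save_model("models/tabnet")  # writes models/tabnet.zip
```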
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
TL;DR: 24 base regression and classification models (GBDT + NN) and their blend.
We trained all models (CatBoost and LGBM for regression; DenseLight and FT-Transformer for both regression and classification) with both the original target and a clipped target (all price values above 500k clipped in the training fold), using both the original dataset and a version augmented with the kagglex data (added only in the train fold; thanks to @lashfire).
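As a rough illustration of the clipping-and-blending idea, here is a short sketch; the helper names are hypothetical, and the 500k threshold is the one mentioned above.

```python
# Illustrative sketch of target clipping and prediction blending.
import numpy as np

def clip_target(y_train, cap=500_000):
    # Clip only the training-fold target; validation targets stay untouched
    # so CV scores remain comparable across clipped and unclipped models.
    return np.minimum(y_train, cap)

def blend(preds, weights):
    # Weighted average of base-model predictions (normalized weights)
    preds = np.column_stack(preds)   # (n_samples, n_models)
    w = np.asarray(weights, dtype=float)
    return preds @ (w / w.sum())
```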
Our final ensemble with 10-Fold CV scores is:
[Image: Final_solution2.png (final ensemble with 10-fold CV scores): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F19099%2Ff15b1940fee181b446a50537a55450ae%2Finbox_597945_ec16e57f8d54df381cbc2ca8fcecb9d1_Final_solution2.png?generation=1725464647329188&alt=media]
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance metrics for nine models in the training dataset.
In this repository you can find a variety of data and scripts to approximate the UTCI in southern South America and apply it to forecasts generated by data-driven models:
1) UTCI data from ERA5-HEAT and different meteorological variables from ERA5.
2) LightGBM models trained to estimate the UTCI from different predictors.
3) Two example scripts to train the LGBM models (a minimal sketch follows below).
4) Scripts for metric estimation on the test sample for different LightGBM-based models with different predictors.
5) Forecasts from the traditional GFS model and from data-driven models during a heat wave in central Argentina in March 2023.
6) Scripts to apply the UTCI approach to the forecasts mentioned in the previous item.
This material is related to the article "Forecasting Heat Stress in southern South America from data-driven model outputs"
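As orientation for item 3 above, here is a minimal sketch of training an LGBM model to approximate the UTCI from ERA5 predictors. The file name and ERA5 variable names (t2m, d2m, u10, v10, ssrd) are assumptions; the repository's own scripts define the actual predictor sets.

```python
# Sketch: LGBM regression of UTCI on assumed ERA5 predictors.
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split

df = pd.read_csv("era5_utci_samples.csv")           # hypothetical flattened table
predictors = ["t2m", "d2m", "u10", "v10", "ssrd"]   # assumed predictor subset
X_tr, X_te, y_tr, y_te = train_test_split(
    df[predictors], df["utci"], test_size=0.2, random_state=0
)

model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X_tr, y_tr)
rmse = ((model.predict(X_te) - y_te) ** 2).mean() ** 0.5
print(f"Test RMSE: {rmse:.2f} degC")
```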
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Samples were calculated with a ≈93 ms window and 50% overlap.
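For orientation, a ≈93 ms window with 50% overlap corresponds to the framing below. The 44.1 kHz sampling rate is an assumption (4096 samples / 44100 Hz ≈ 92.9 ms, which matches the ≈93 ms figure); the dataset may use a different rate.

```python
# Sketch of ≈93 ms framing with 50% overlap (sampling rate assumed).
import numpy as np

fs = 44_100            # assumed sampling rate (Hz)
win = 4096             # 4096 / 44100 ≈ 92.9 ms
hop = win // 2         # 50% overlap

def frame(signal, win=win, hop=hop):
    n = 1 + (len(signal) - win) // hop
    return np.stack([signal[i * hop : i * hop + win] for i in range(n)])

frames = frame(np.random.randn(fs))  # 1 s of noise -> 20 frames
print(frames.shape)
```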
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Solar energy generated from photovoltaic panels is an important energy source that brings many benefits to people and the environment. It is a growing trend globally and plays an increasingly important role in the future of the energy industry. However, its intermittent nature and its potential for use in distributed systems require accurate forecasting to balance supply and demand, optimize energy storage, and manage grid stability. In this study, five machine learning models were used: Gradient Boosting Regressor (GB), XGB Regressor (XGBoost), K-Neighbors Regressor (KNN), LGBM Regressor (LightGBM), and CatBoost Regressor (CatBoost). Leveraging a dataset of 21,045 samples, factors such as humidity, ambient temperature, wind speed, visibility, cloud ceiling, and pressure serve as inputs for constructing these machine learning models to forecast solar energy. Model accuracy is assessed and compared using metrics such as the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE). The results show that the CatBoost model emerges as the frontrunner in predicting solar energy, with training values of R2 = 0.608, RMSE = 4.478 W, and MAE = 3.367 W, and testing values of R2 = 0.46, RMSE = 4.748 W, and MAE = 3.583 W. SHAP analysis reveals that ambient temperature and humidity have the greatest influence on the solar energy generated from photovoltaic panels.
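A hedged sketch of this evaluation setup, using CatBoost and the three reported metrics: the feature names follow the description, but the data here is a random placeholder, not the actual 21,045-sample dataset.

```python
# Sketch: CatBoost regression with R2 / RMSE / MAE reporting.
import numpy as np
from catboost import CatBoostRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split

features = ["Humidity", "AmbientTemp", "WindSpeed", "Visibility", "CloudCeiling", "Pressure"]
X = np.random.rand(21045, len(features))   # placeholder for the 21,045-sample dataset
y = np.random.rand(21045) * 20             # placeholder target (W)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = CatBoostRegressor(verbose=0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"R2:   {r2_score(y_te, pred):.3f}")
print(f"RMSE: {mean_squared_error(y_te, pred) ** 0.5:.3f} W")
print(f"MAE:  {mean_absolute_error(y_te, pred):.3f} W")
```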
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: In this paper, we examine whether machine learning and deep learning can be used to predict difficult airway intubation in patients undergoing thyroid surgery.
Methods: We used 10 machine learning and deep learning algorithms to establish corresponding models on a training group, and then verified the results on a test group. We used R for the statistical analysis and constructed the machine learning prediction models in Python.
Results: The top five weighting factors for difficult airways identified by the average algorithm in machine learning were age, sex, weight, height, and BMI. In the training group, the Gradient Boosting AUC, accuracy, and precision were 0.932, 0.929, and 100%, respectively. For predicting difficult airways in the test group, the three algorithms with the highest AUC values among the 10 models were Gradient Boosting, CNN, and LGBM, with values of 0.848, 0.836, and 0.812, respectively. Gradient Boosting also had the highest accuracy (0.913) and the highest precision (100%).
Conclusion: According to our results, Gradient Boosting performed best overall, with an AUC > 0.8, an accuracy > 90%, and a precision of 100%. In addition, the top five weighting factors identified by the average algorithm for difficult airways were age, sex, weight, height, and BMI.
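To make the reported metrics concrete, here is an illustrative sketch (not the authors' code) of fitting a Gradient Boosting classifier on the five top-weighted predictors and computing AUC, accuracy, and precision; the data is a synthetic placeholder.

```python
# Sketch: Gradient Boosting classification with AUC/accuracy/precision.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, accuracy_score, precision_score
from sklearn.model_selection import train_test_split

# Placeholder data with the five top-weighted factors: age, sex, weight, height, BMI
rng = np.random.default_rng(0)
X = rng.random((500, 5))
y = rng.integers(0, 2, 500)  # 1 = difficult airway (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
prob = clf.predict_proba(X_te)[:, 1]
pred = clf.predict(X_te)
print(f"AUC:       {roc_auc_score(y_te, prob):.3f}")
print(f"Accuracy:  {accuracy_score(y_te, pred):.3f}")
print(f"Precision: {precision_score(y_te, pred):.3f}")
```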