8 datasets found
  1. Data to train RF and LGBM models for each of the four experiments within the article

    • search.dataone.org
    Updated Nov 8, 2023
    Cite
    Amani, Alireza (2023). Data to train RF and LGBM models for each of the four experiments within the article [Dataset]. http://doi.org/10.7910/DVN/WG7AWF
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Amani, Alireza
    Description

    Each file corresponds to a training or test set of one of the four experiments described in the article. The columns (variables/features) are the same as the ones in the full dataset.

  2. Py style code for volatility

    • kaggle.com
    zip
    Updated Aug 25, 2021
    Cite
    sushi (2021). Py style code for volatility [Dataset]. https://www.kaggle.com/madquer/volatility
    Available download formats: zip (24,231,567 bytes)
    Dataset updated
    Aug 25, 2021
    Authors
    sushi
    Description

    Context

    🚀 Python package-style code for this dataset's LightGBM and TabNet models. This is the code for model training and inference. Notebooks (.ipynb) are the norm on Kaggle; I restructured the code as a Python package, which is better suited to training from shell commands.

    Content

    1. Contents of the src directory

    • prepare data (with feature engineering)
    • lightgbm: train and predict
    • tabnet: train and predict
    • volatility_2021.ipynb: the local-version notebook for the last submission, run via shell commands

    2. Structure in detail

    • light_gbm

    -- config: YAML parameter file for LightGBM
    -- models: saved models
    -- train.py
    -- predict_test.py

    • prepare

    -- feature_engineering.py
    -- metric.py
    -- preprocessing.py
    -- seed.py
    -- tabnet_preprocessing.py

    • tabnet

    -- config: tabnet_hyp.yaml / tabnet_config.py
    -- models : saved model
    -- predict_test.py
    -- train.py

    • volatility_2021.ipynb

    Acknowledgements

    This builds on the original code referenced below; thanks to @chumajin.

    [Notebook] Reference Notebook by chumajin
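As a rough illustration of the package layout above, a light_gbm/train.py entry point might look like the following sketch. Everything here is hypothetical except the file and directory names taken from the listing: the config is JSON rather than the YAML the package actually uses, and the LightGBM call is replaced by a stand-in so the sketch stays self-contained.

```python
import json
from pathlib import Path

def load_params(config_path: str) -> dict:
    """Read hyperparameters from a config file (JSON here; the real package uses YAML)."""
    return json.loads(Path(config_path).read_text())

def train(config_path: str, model_dir: str) -> Path:
    """Train and save a model, mirroring light_gbm/train.py's config -> models flow."""
    params = load_params(config_path)
    # The real script would call lightgbm.train(params, dataset) at this point.
    model_path = Path(model_dir) / "lgbm_model.txt"
    model_path.parent.mkdir(parents=True, exist_ok=True)
    model_path.write_text(f"trained with {params}")  # stand-in for booster.save_model()
    return model_path

# Shell usage in the spirit described above (hypothetical module path):
#   python -m src.light_gbm.train --config config/params.yaml
```

The point of the restructuring is exactly this separation: parameters live in config/, artifacts land in models/, and each step is a script you can drive from the shell instead of a notebook cell.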

  3. AutoML-GP5-LightAutoML-OOFs-and-Test-preds

    • kaggle.com
    Updated Sep 4, 2024
    Cite
    Alexander Ryzhkov (2024). AutoML-GP5-LightAutoML-OOFs-and-Test-preds [Dataset]. https://www.kaggle.com/datasets/alexryzhkov/automl-gp5-lightautoml-oofs-and-test-preds
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Sep 4, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alexander Ryzhkov
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    [AutoML Grand Prix] 1st Place Solution Team LightAutoML testers - OOF and Test predictions

    🏆 Solution description

    Github repo with the training code

    TLDR: 24 base regression and classification models, GBDT + NN, with their blend.

    We trained all models (CatBoost and LGBM for regression; DenseLight and FT-Transformer for both regression and classification) with both the original and a clipped target (all price values above 500k clipped in the training fold), using the original dataset and a version augmented with kagglex data (added only in the train fold; thanks to @lashfire).

    Our final ensemble with 10-fold CV scores is shown here: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F19099%2Ff15b1940fee181b446a50537a55450ae%2Finbox_597945_ec16e57f8d54df381cbc2ca8fcecb9d1_Final_solution2.png?generation=1725464647329188&alt=media
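The clipped-target and blending steps described above can be sketched as follows. This shows only the assumed mechanics (the team's actual pipeline is in their linked repo); the helper names and the equal-weight default are illustrative.

```python
CLIP = 500_000  # the price ceiling mentioned in the solution description

def clip_train_target(y_train, clip=CLIP):
    """Clip extreme prices in the training fold only, leaving validation/test targets untouched."""
    return [min(y, clip) for y in y_train]

def blend(predictions, weights=None):
    """Weighted average of base-model predictions (the final ensemble step).

    predictions: list of per-model prediction lists, all the same length.
    """
    n_models = len(predictions)
    weights = weights or [1.0 / n_models] * n_models
    n_rows = len(predictions[0])
    return [sum(w * p[i] for w, p in zip(weights, predictions)) for i in range(n_rows)]
```

Clipping only inside the training fold keeps the CV targets honest: the model stops chasing extreme prices, but validation scores are still computed against the real values.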

  4. Performance metrics for nine models in the training dataset

    • figshare.com
    xls
    Updated Apr 1, 2025
    Cite
    Ligang Hao; Junjie Zhang; Yonghui Di; Zheng Qi; Peng Zhang (2025). Performance metrics for nine models in the training dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0320674.t003
    Available download formats: xls
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ligang Hao; Junjie Zhang; Yonghui Di; Zheng Qi; Peng Zhang
    License

    Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance metrics for nine models in the training dataset.

  5. Data from: Approximated UTCI

    • produccioncientifica.ucm.es
    Updated 2025
    Cite
    Collazo, Soledad (2025). Approximated UTCI [Dataset]. https://produccioncientifica.ucm.es/documentos/688b603d17bb6239d2d4a144
    Dataset updated
    2025
    Authors
    Collazo, Soledad
    Description

    In the repository you can find a variety of data and scripts to approximate the UTCI in southern South America and apply it to forecasts generated by data-driven models:

    1. UTCI data from ERA5-HEAT and different meteorological variables from ERA5.
    2. LightGBM models trained to estimate the UTCI from different predictors.
    3. Two example scripts to train the LGBM models.
    4. Scripts for metric estimation on the test sample of different LightGBM-based models with different predictors.
    5. Forecasts from the traditional GFS model and from data-driven models during a heat wave in central Argentina in March 2023.
    6. Scripts to apply the UTCI approximation to the forecasts mentioned in the previous item.

    This material is related to the article "Forecasting Heat Stress in southern South America from data-driven model outputs"

  6. Data splits for “in-lab” dataset: training, validation and testing

    • plos.figshare.com
    xls
    Updated Apr 2, 2024
    Cite
    Brandon Rufino; Ajmal Khan; Tilak Dutta; Elaine Biddiss (2024). Data splits for “in-lab” dataset: training, validation and testing. [Dataset]. http://doi.org/10.1371/journal.pone.0299888.t001
    Available download formats: xls
    Dataset updated
    Apr 2, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Brandon Rufino; Ajmal Khan; Tilak Dutta; Elaine Biddiss
    License

    Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Samples calculated with ≈93 ms window and 50% overlap.
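The windowing above can be sketched as follows. This is a minimal sketch under stated assumptions: "50% overlap" is taken to mean the hop is half the window, the window length is given in samples (the sample rate needed to turn 93 ms into a sample count is not stated in this listing), and the function name is illustrative.

```python
def sliding_windows(signal, window_len, overlap=0.5):
    """Yield fixed-length windows over a 1-D signal with fractional overlap (0 <= overlap < 1)."""
    hop = max(1, int(window_len * (1 - overlap)))  # 50% overlap -> hop of half a window
    for start in range(0, len(signal) - window_len + 1, hop):
        yield signal[start:start + window_len]
```

With a 4-sample window and 50% overlap, a 10-sample signal yields windows starting at samples 0, 2, 4, and 6; each sample (except at the edges) appears in exactly two windows.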

  7. S1 Data

    • plos.figshare.com
    • figshare.com
    csv
    Updated Jan 2, 2025
    Cite
    Huu Nam Nguyen; Quoc Thanh Tran; Canh Tung Ngo; Duc Dam Nguyen; Van Quan Tran (2025). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0315955.s001
    Available download formats: csv
    Dataset updated
    Jan 2, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Huu Nam Nguyen; Quoc Thanh Tran; Canh Tung Ngo; Duc Dam Nguyen; Van Quan Tran
    License

    Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Solar energy generated from photovoltaic panels is an important energy source that brings many benefits to people and the environment. It is a growing trend globally and plays an increasingly important role in the future of the energy industry. However, its intermittent nature and its potential for use in distributed systems require accurate forecasting to balance supply and demand, optimize energy storage, and manage grid stability. In this study, five machine learning models were used: Gradient Boosting Regressor (GB), XGB Regressor (XGBoost), K-Neighbors Regressor (KNN), LGBM Regressor (LightGBM), and CatBoost Regressor (CatBoost). Leveraging a dataset of 21,045 samples, factors such as humidity, ambient temperature, wind speed, visibility, cloud ceiling, and pressure serve as inputs for constructing these machine learning models to forecast solar energy. Model accuracy is assessed and compared using metrics such as the coefficient of determination (R2), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). The results show that the CatBoost model emerges as the frontrunner in predicting solar energy, with training values of R2 = 0.608, RMSE = 4.478 W, and MAE = 3.367 W, and testing values of R2 = 0.46, RMSE = 4.748 W, and MAE = 3.583 W. SHAP analysis reveals that ambient temperature and humidity have the greatest influence on the solar energy generated from photovoltaic panels.
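For reference, the three metrics used to compare the models above can be computed directly. These are hand-rolled versions shown for concreteness; in practice scikit-learn's metrics module provides equivalents.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot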

  8. Table_1_Predicting difficult airway intubation in thyroid surgery using multiple machine learning and deep learning algorithms

    • frontiersin.figshare.com
    doc
    Updated Jun 8, 2023
    + more versions
    Cite
    Cheng-Mao Zhou; Ying Wang; Qiong Xue; Jian-Jun Yang; Yu Zhu (2023). Table_1_Predicting difficult airway intubation in thyroid surgery using multiple machine learning and deep learning algorithms.DOC [Dataset]. http://doi.org/10.3389/fpubh.2022.937471.s007
    Available download formats: doc
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Frontiers
    Authors
    Cheng-Mao Zhou; Ying Wang; Qiong Xue; Jian-Jun Yang; Yu Zhu
    License

    Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: In this paper, we examine whether machine learning and deep learning can be used to predict difficult airway intubation in patients undergoing thyroid surgery.

    Methods: We used 10 machine learning and deep learning algorithms to establish a corresponding model through a training group, and then verified the results in a test group. We used R for the statistical analysis and constructed the machine learning prediction model in Python.

    Results: The top five weighting factors for difficult airways identified by the average algorithm in machine learning were age, sex, weight, height, and BMI. In the training group, Gradient Boosting's AUC, accuracy, and precision were 0.932, 0.929, and 100%, respectively. In the test group, among the models built with the 10 algorithms, the three with the highest AUC values were Gradient Boosting, CNN, and LGBM, with values of 0.848, 0.836, and 0.812, respectively. Gradient Boosting also had the highest accuracy (0.913) and the highest precision (100%).

    Conclusion: According to our results, Gradient Boosting performed best overall, with an AUC > 0.8, an accuracy > 90%, and a precision of 100%. In addition, the top five weighting factors identified by the average algorithm for difficult airways were age, sex, weight, height, and BMI.
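The AUC values reported above can be read as a rank statistic: the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. A minimal sketch of that computation (illustrative only, not the authors' code):

```python
def auc(y_true, scores):
    """Rank-based AUC: fraction of positive/negative pairs ranked correctly, ties counted as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise definition makes clear why AUC is threshold-free, unlike the accuracy and precision figures, which depend on a chosen classification cutoff.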
