100+ datasets found

Training and validation data from the AI for Critical Mineral Assessment...
data.usgs.gov
s.cnmilf.com
Updated Dec 27, 2023
Cite
Margaret Goldman; Joshua Rosera; Graham Lederer; Garth Graham; Asitang Mishra; Alice Yepremyan (2023). Training and validation data from the AI for Critical Mineral Assessment Competition [Dataset]. http://doi.org/10.5066/P9FXSPT1
Explore at:
Unique identifier
https://doi.org/10.5066/P9FXSPT1
Dataset updated
Dec 27, 2023
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Margaret Goldman; Joshua Rosera; Graham Lederer; Garth Graham; Asitang Mishra; Alice Yepremyan
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Time period covered
2022 - 2023
Description
Extracting useful and accurate information from scanned geologic and other earth science maps is a time-consuming and laborious process involving manual human effort. To address this limitation, the USGS partnered with the Defense Advanced Research Projects Agency (DARPA) to run the AI for Critical Mineral Assessment Competition, soliciting innovative solutions for automatically georeferencing and extracting features from maps. The competition opened for registration in August 2022 and concluded in December 2022. Training and validation data from the competition are provided here, as well as competition details and baseline solutions. The data are derived from published sources and are provided to the public to support continued development of automated georeferencing and feature extraction tools. References for all maps are included with the data.
Metatasks for AutoGluon - ROC AUC and Balanced Accuracy
figshare.com
bin
Updated Jul 1, 2023
Cite
Lennart Purucker (2023). Metatasks for AutoGluon - ROC AUC and Balanced Accuracy [Dataset]. http://doi.org/10.6084/m9.figshare.23609361.v1
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.23609361.v1
Dataset updated
Jul 1, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Lennart Purucker
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Prediction Data of Base Models from AutoGluon on 71 classification datasets from the AutoML Benchmark for Balanced Accuracy and ROC AUC.

The files of this figshare item include data that was collected for the paper: CMA-ES for Post Hoc Ensembling in AutoML: A Great Success and Salvageable Failure, Lennart Purucker, Joeran Beel, Second International Conference on Automated Machine Learning, 2023.

The data was stored and used with the assembled framework: https://github.com/ISG-Siegen/assembled.

In detail, the data contains the predictions of base models on validation and test data, as produced by running AutoGluon for 4 hours. Such prediction data is included for each model produced by AutoGluon on each fold of 10-fold cross-validation on the 71 classification datasets from the AutoML Benchmark. The data exists for two metrics (ROC AUC and Balanced Accuracy). More details can be found in the paper.

The data was collected by code created for the paper and is available in its reproducibility repository: https://doi.org/10.6084/m9.figshare.23609226.

Its usage is intended for but not limited to using assembled to evaluate post hoc ensembling methods for AutoML.

Details: The link above points to a hosted server that facilitates the download. We opted for a hosted server, as we found no other suitable solution to share these large files (due to file size or storage limits) for a reasonable price. If you want to obtain the data in another way or know of a more suitable alternative, please contact Lennart Purucker.

The link resolves to a directory containing the following:

example_metatasks: Contains an example metatask for test purposes before committing to downloading all files.
metatasks_roc_auc.zip: The Metatasks obtained by running AutoGluon for ROC AUC.
metatasks_bacc.zip: The Metatasks obtained by running AutoGluon for Balanced Accuracy.

The size after unzipping is:

metatasks_roc_auc.zip: ~85GB
metatasks_bacc.zip: ~100GB

The metatask .zip files contain 2 files per metatask: one .json file with metadata and one .hdf file containing the prediction data. Details on how these should be read and used as a Metatask can be found in the assembled framework and the reproducibility repository. To obtain the data without Metatasks, we advise either looking at the file content and metadata individually or parsing them with Metatasks first.
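As a rough illustration of the "2 files per metatask" layout, the sketch below pairs each .json metadata file with its .hdf prediction file after unzipping. It assumes that both files of a metatask share the same file stem; that naming, and the helper name `pair_metatask_files`, are hypothetical and should be checked against the actual archive. It does not read the file contents, which the description says should go through the assembled framework.

```python
from pathlib import Path


def pair_metatask_files(directory):
    """Map each metatask name to its (.json, .hdf) file pair.

    Assumption: the two files of a metatask share the same stem,
    e.g. "task.json" and "task.hdf". Unpaired .json files are skipped.
    """
    root = Path(directory)
    pairs = {}
    for json_path in sorted(root.rglob("*.json")):
        hdf_path = json_path.with_suffix(".hdf")
        if hdf_path.exists():
            pairs[json_path.stem] = (json_path, hdf_path)
    return pairs
```

Running this on the small example_metatasks directory first is a cheap way to confirm the naming assumption before downloading the full archives.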
Training and Validation Datasets for Neural Network to Fill in Missing Data...
catalog.data.gov
Updated Jul 9, 2025
Cite
National Institute of Standards and Technology (2025). Training and Validation Datasets for Neural Network to Fill in Missing Data in EBSD Maps [Dataset]. https://catalog.data.gov/dataset/training-and-validation-datasets-for-neural-network-to-fill-in-missing-data-in-ebsd-maps
Explore at:
Dataset updated
Jul 9, 2025
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
This dataset consists of the synthetic electron backscatter diffraction (EBSD) maps generated for the paper "Hybrid Algorithm for Filling in Missing Data in Electron Backscatter Diffraction Maps" by Emmanuel Atindama, Conor Miller-Lynch, Huston Wilhite, Cody Mattice, Günay Doğan, and Prashant Athavale. The EBSD maps were used to train, test, and validate a neural network algorithm to fill in missing data points in a given EBSD map. The dataset includes 8000 maps for training, 1000 maps for testing, and 2000 maps for validation. The dataset also includes noise-added versions of the maps, namely one noisy map for each clean map.
Data for training, validation and testing of methods in the thesis:...
data.niaid.nih.gov
zenodo.org
Updated May 1, 2021
Cite
Lucia Hajduková (2021). Data for training, validation and testing of methods in the thesis: Camera-based Accuracy Improvement of Indoor Localization [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4730337
Explore at:
Dataset updated
May 1, 2021
Dataset authored and provided by
Lucia Hajduková
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The package contains files for two modules designed to improve the accuracy of the indoor positioning system, namely the following:

door detection

videos_test - videos used to demonstrate the application of door detector

videos_res - videos from videos_test directory with detected doors marked

parts detection

frames_train_val - images generated from videos used for training and validation of VGG16 neural network model

frames_test - images generated from videos used for testing of the trained model

videos_test - videos used to demonstrate the application of parts detector

videos_res - videos from videos_test directory with detected parts marked
riiid cross validation files
kaggle.com
Updated Nov 4, 2020
Cite
tito (2020). riiid cross validation files [Dataset]. https://www.kaggle.com/its7171/riiid-cross-validation-files/activity
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 4, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
tito
Description
Context

This dataset stores separate files of training and validation data for Riiid!

These files were made with the following notebook: https://www.kaggle.com/its7171/cv-strategy

You can read these files like:
import pandas as pd

train1 = pd.read_pickle('../input/riiid-cross-validation-files/cv1_train.pickle')
valid1 = pd.read_pickle('../input/riiid-cross-validation-files/cv1_valid.pickle')

Usage example: https://www.kaggle.com/its7171/riiid-cross-validation-files
DataAndSettings
figshare.com
zip
Updated Sep 20, 2022
Cite
Wei Lin (2022). DataAndSettings [Dataset]. http://doi.org/10.6084/m9.figshare.21159217.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.21159217.v1
Dataset updated
Sep 20, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Wei Lin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We provide (1) the training and validation data and (2) the training settings. In particular, the IEEE 33-bus test system uses 50,000 training samples and 5,000 validation samples, and the IEEE 136-bus test system uses 70,000 training samples and 10,000 validation samples.
Challenge Round 0 (Dry Run) Test Dataset
catalog.data.gov
data.nist.gov
Updated Jul 29, 2022
Cite
National Institute of Standards and Technology (2022). Challenge Round 0 (Dry Run) Test Dataset [Dataset]. https://catalog.data.gov/dataset/challenge-round-0-dry-run-test-dataset-ff885
Explore at:
Dataset updated
Jul 29, 2022
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
This dataset was an initial test harness infrastructure test for the TrojAI program. It should not be used for research. Please use the more refined datasets generated for the other rounds. The data being generated and disseminated is training, validation, and test data used to construct trojan detection software solutions. This data, generated at NIST, consists of human level AIs trained to perform a variety of tasks (image classification, natural language processing, etc.). A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 200 trained, human level, image classification AI models using the following architectures (Inception-v3, DenseNet-121, and ResNet50). The models were trained on synthetically created image data of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the images when the trigger is present.
Train, validation, test data sets and confusion matrices underlying...
data.4tu.nl
zip
Updated Sep 7, 2023
Cite
Louis Kuijpers; Nynke Dekker; Belen Solano Hermosilla; Edo van Veen (2023). Train, validation, test data sets and confusion matrices underlying publication: "Automated cell counting for Trypan blue stained cell cultures using machine learning" [Dataset]. http://doi.org/10.4121/21695819.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.4121/21695819.v1
Dataset updated
Sep 7, 2023
Dataset provided by
4TU.ResearchData
Authors
Louis Kuijpers; Nynke Dekker; Belen Solano Hermosilla; Edo van Veen
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Annotated test and train data sets. Both images and annotations are provided separately.

Validation data set for Hi5, Sf9 and HEK cells.

Confusion matrices for the determination of performance parameters
alpaca-train-validation-test-split
huggingface.co
Updated Aug 12, 2023
Cite
Doula Isham Rashik Hasan (2023). alpaca-train-validation-test-split [Dataset]. https://huggingface.co/datasets/disham993/alpaca-train-validation-test-split
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 12, 2023
Authors
Doula Isham Rashik Hasan
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Dataset Card for Alpaca

I have only performed a train, test and validation split on the original dataset. A repository to reproduce this will be shared here soon. I am including the original dataset card as follows.

Dataset Summary

Alpaca is a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. This instruction data can be used to conduct instruction-tuning for language models and make the language model follow instructions better. … See the full description on the dataset page: https://huggingface.co/datasets/disham993/alpaca-train-validation-test-split.
Sample, test, and validation data for findmycells
zenodo.org
zip
Updated Feb 20, 2023
Cite
Dennis Segebarth (2023). Sample, test, and validation data for findmycells [Dataset]. http://doi.org/10.5281/zenodo.7655292
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7655292
Dataset updated
Feb 20, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Dennis Segebarth
License
Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
Description
findmycells is an open source python package, developed to foster the use of deep-learning based python tools for bioimage analysis, specifically for researchers with limited python coding experience. It is developed and maintained in the following GitHub repository: https://github.com/Defense-Circuits-Lab/findmycells

Disclaimer: All data (including the model ensemble) uploaded here serve solely as a test dataset for findmycells and are not intended for any other purposes.

For instance, the group, subgroup, or subject IDs don't refer to the actual experimental conditions. Likewise, the included ROI files were created only to allow the testing of findmycells and may not live up to scientific standards. Furthermore, the image data represents a subset of a dataset that is already published here:

Segebarth, Dennis et al. (2020), Data from: On the objectivity, reliability, and validity of deep learning enabled bioimage analyses, Dryad, Dataset, https://doi.org/10.5061/dryad.4b8gtht9d

The model ensemble (cfos_ensemble.zip) was trained using deepflash2 (v 0.1.7)

Griebel, M., Segebarth, D., Stein, N., Schukraft, N., Tovote, P., Blum, R., & Flath, C. M. (2021). Deep-learning in the bioimaging wild: Handling ambiguous data with deepflash2. arXiv preprint arXiv:2111.06693.

The training was performed on a subset of the "lab-wue1" training dataset, using only the 27 images with IDs 0000 - 0099 (cfos_training_images.zip) and the corresponding est. GT masks (cfos_training_masks.zip). The images used in "cfos_fmc_test_project.zip" for the actual testing of findmycells are the images with the IDs 0100, 0106, 0149, and 0152 of the aforementioned "lab-wue1" training dataset. They were randomly distributed to the made-up subject folders and renamed to "dentate_gyrus_01" or "dentate_gyrus_02".
Training, test data and model parameters.
figshare.com
xls
Updated Jun 3, 2023
Cite
Salvatore Cosentino; Mette Voldby Larsen; Frank Møller Aarestrup; Ole Lund (2023). Training, test data and model parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0077302.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0077302.t001
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Salvatore Cosentino; Mette Voldby Larsen; Frank Møller Aarestrup; Ole Lund
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Training, test data and model parameters. The last 3 columns show the MinORG, LT and HT parameters used to create the pathogenicity families and build the model for each of the 10 models. Zthr is a threshold value, calculated for each model at the cross validation phase, which is used, given the final prediction score, to decide if the input organisms will be predicted as pathogenic or non-pathogenic. The parameters for each model are chosen after 5-fold cross-validation tests.
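The role of Zthr can be illustrated with a very small sketch. The description only says that the threshold is applied to the final prediction score; the direction of the comparison (scores at or above the threshold count as pathogenic) and the function name below are assumptions made for illustration, not the published method.

```python
def predict_pathogenicity(score, z_thr):
    """Turn a final prediction score into a class label.

    Assumption: higher scores indicate pathogenicity and the comparison
    is inclusive. The source does not state either detail.
    """
    return "pathogenic" if score >= z_thr else "non-pathogenic"
```

Since Zthr is calculated separately for each of the 10 models, a caller would pass the threshold belonging to the model that produced the score.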
CARLA Simulation Datasets for Training, Validation, and Test Data of the...
data.niaid.nih.gov
zenodo.org
Updated Jan 15, 2024
Cite
Shaikh, Hamdaan Asif (2024). CARLA Simulation Datasets for Training, Validation, and Test Data of the project "Out-Of-Domain Data Detection using Uncertainty Quantification in End-to-End Driving Algorithms" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10511420
Explore at:
Dataset updated
Jan 15, 2024
Dataset authored and provided by
Shaikh, Hamdaan Asif
Description
These are CARLA Simulation Datasets of the project "Out-Of-Domain Data Detection using Uncertainty Quantification in End-to-End Driving Algorithms". The simulations are generated in CARLA Town 02 for different sun angles (in degrees). You will find image frames, command labels, and steering control values in the respective 'xxxx_files_data' folder. You will find videos of each simulation run in the 'xxxx_files_visualizations' folder.

The 8 simulation runs for Training Data use the sun angles 90, 80, 70, 60, 50, 40, 30, 20.

The 8 simulation runs for Training Data were seeded at 0000, 1000, 2000, 3000, 4000, 5000, 6000, 7000, respectively.

The 4 simulation runs for Validation Data use the sun angles 87, 67, 47, 23.

The 4 simulation runs for Validation Data were seeded at 0000, 2000, 4000, 7000, respectively.

The 29 simulation runs for Testing Data use the sun angles 85, 75, 65, 55, 45, 35, 25, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 09, 08, 07, 06, 05, 04, 03, 02, 01, 00, -1, -10.

The 29 simulation runs for Testing Data were all seeded at 5000.
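The run configuration listed above can be written down as a small lookup structure, which also makes it easy to check the stated run counts. This is only a sketch: the variable names are hypothetical, seeds such as 0000 are stored as plain integers, and angles are paired with seeds in the order they are listed.

```python
TRAIN_ANGLES = [90, 80, 70, 60, 50, 40, 30, 20]
TRAIN_SEEDS = [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000]
VAL_ANGLES = [87, 67, 47, 23]
VAL_SEEDS = [0, 2000, 4000, 7000]
TEST_ANGLES = [85, 75, 65, 55, 45, 35, 25,
               19, 18, 17, 16, 15, 14, 13, 12, 11, 10,
               9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -10]
TEST_SEED = 5000  # every testing run uses the same seed


def build_runs():
    """Return {split: [(sun_angle_degrees, seed), ...]} for all runs."""
    return {
        "train": list(zip(TRAIN_ANGLES, TRAIN_SEEDS)),
        "validation": list(zip(VAL_ANGLES, VAL_SEEDS)),
        "test": [(angle, TEST_SEED) for angle in TEST_ANGLES],
    }


RUNS = build_runs()
```

The split sizes in `RUNS` should come out as 8, 4, and 29, matching the counts given in the description.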
Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning...
datarade.ai
.json, .csv
Cite
Xverum, Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning (DL), NLP & LLM Training [Dataset]. https://datarade.ai/data-products/xverum-company-data-b2b-data-belgium-netherlands-denm-xverum
Explore at:
Available download formats: .json, .csv
Dataset provided by
Xverum LLC
Authors
Xverum
Area covered
Norway, United Kingdom, India, Western Sahara, Jordan, Dominican Republic, Sint Maarten (Dutch part), Cook Islands, Barbados, Oman
Description
Xverum’s AI & ML Training Data provides one of the most extensive datasets available for AI and machine learning applications, featuring 800M B2B profiles with 100+ attributes. This dataset is designed to enable AI developers, data scientists, and businesses to train robust and accurate ML models. From natural language processing (NLP) to predictive analytics, our data empowers a wide range of industries and use cases with unparalleled scale, depth, and quality.

What Makes Our Data Unique?

Scale and Coverage:
- A global dataset encompassing 800M B2B profiles from a wide array of industries and geographies.
- Includes coverage across the Americas, Europe, Asia, and other key markets, ensuring worldwide representation.

Rich Attributes for Training Models:
- Over 100 fields of detailed information, including company details, job roles, geographic data, industry categories, past experiences, and behavioral insights.
- Tailored for training models in NLP, recommendation systems, and predictive algorithms.

Compliance and Quality:
- Fully GDPR and CCPA compliant, providing secure and ethically sourced data.
- Extensive data cleaning and validation processes ensure reliability and accuracy.

Annotation-Ready:
- Pre-structured and formatted datasets that are easily ingestible into AI workflows.
- Ideal for supervised learning with tagging options such as entities, sentiment, or categories.

How Is the Data Sourced?
- Publicly available information gathered through advanced, GDPR-compliant web aggregation techniques.
- Proprietary enrichment pipelines that validate, clean, and structure raw data into high-quality datasets.

This approach ensures we deliver comprehensive, up-to-date, and actionable data for machine learning training.

Primary Use Cases and Verticals

Natural Language Processing (NLP): Train models for named entity recognition (NER), text classification, sentiment analysis, and conversational AI. Ideal for chatbots, language models, and content categorization.

Predictive Analytics and Recommendation Systems: Enable personalized marketing campaigns by predicting buyer behavior. Build smarter recommendation engines for ecommerce and content platforms.

B2B Lead Generation and Market Insights: Create models that identify high-value leads using enriched company and contact information. Develop AI systems that track trends and provide strategic insights for businesses.

HR and Talent Acquisition AI: Optimize talent-matching algorithms using structured job descriptions and candidate profiles. Build AI-powered platforms for recruitment analytics.

How This Product Fits Into Xverum’s Broader Data Offering

Xverum is a leading provider of structured, high-quality web datasets. While we specialize in B2B profiles and company data, we also offer complementary datasets tailored for specific verticals, including ecommerce product data, job listings, and customer reviews. The AI Training Data is a natural extension of our core capabilities, bridging the gap between structured data and machine learning workflows. By providing annotation-ready datasets, real-time API access, and customization options, we ensure our clients can seamlessly integrate our data into their AI development processes.

Why Choose Xverum?
- Experience and Expertise: A trusted name in structured web data with a proven track record.
- Flexibility: Datasets can be tailored for any AI/ML application.
- Scalability: With 800M profiles and more being added, you’ll always have access to fresh, up-to-date data.
- Compliance: We prioritize data ethics and security, ensuring all data adheres to GDPR and other legal frameworks.

Ready to supercharge your AI and ML projects? Explore Xverum’s AI Training Data to unlock the potential of 800M global B2B profiles. Whether you’re building a chatbot, predictive algorithm, or next-gen AI application, our data is here to help.

Contact us for sample datasets or to discuss your specific needs.
Training, validation and test datasets and model files for larger US Health...
ufs.figshare.com
txt
Updated Dec 12, 2023
Cite
Jan Marthinus Blomerus (2023). Training, validation and test datasets and model files for larger US Health Insurance dataset [Dataset]. http://doi.org/10.38140/ufs.24598881.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.38140/ufs.24598881.v2
Dataset updated
Dec 12, 2023
Dataset provided by
University of the Free State
Authors
Jan Marthinus Blomerus
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Formats1.xlsx contains the descriptions of the columns of the following datasets: the training, validation and test datasets, which in combination contain all the records. sens1.csv and meansdX.csv are required for testing.
FAIR Dataset for Disease Prediction in Healthcare Applications
test.researchdata.tuwien.ac.at
bin, csv, json, png
Updated Apr 14, 2025
Cite
Sufyan Yousaf (2025). FAIR Dataset for Disease Prediction in Healthcare Applications [Dataset]. http://doi.org/10.70124/5n77a-dnf02
Explore at:
Available download formats: csv, json, bin, png
Unique identifier
https://doi.org/10.70124/5n77a-dnf02
Dataset updated
Apr 14, 2025
Dataset provided by
TU Wien
Authors
Sufyan Yousaf
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Description

Context and Methodology

Research Domain/Project:
This dataset was created for a machine learning experiment aimed at developing a classification model to predict outcomes based on a set of features. The primary research domain is disease prediction in patients. The dataset was used in the context of training, validating, and testing.

Purpose of the Dataset:
The purpose of this dataset is to provide training, validation, and testing data for the development of machine learning models. It includes labeled examples that help train classifiers to recognize patterns in the data and make predictions.

Dataset Creation:
Data preprocessing steps involved cleaning, normalization, and splitting the data into training, validation, and test sets. The data was carefully curated to ensure its quality and relevance to the problem at hand. For any missing values or outliers, appropriate handling techniques were applied (e.g., imputation, removal, etc.).

Technical Details

Structure of the Dataset:
The dataset consists of several files organized into folders by data type:

Training Data: Contains the training dataset used to train the machine learning model.

Validation Data: Used for hyperparameter tuning and model selection.

Test Data: Reserved for final model evaluation.

Each folder contains files with consistent naming conventions for easy navigation, such as train_data.csv, validation_data.csv, and test_data.csv. Each file follows a tabular format with columns representing features and rows representing individual data points.
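A minimal sketch of loading the three splits by their file names is shown below. The file names come from the description; the folder layout is not specified, so the helper searches recursively, and it assumes each CSV starts with a header row of column names. The function name `load_splits` is hypothetical, and the standard-library `csv` module is used only to keep the sketch dependency-free (pandas would work equally well).

```python
import csv
from pathlib import Path

# File names as given in the dataset description.
SPLIT_FILES = {
    "train": "train_data.csv",
    "validation": "validation_data.csv",
    "test": "test_data.csv",
}


def load_splits(root):
    """Find each split file anywhere under `root` and read its rows as dicts.

    Assumption: every file has a header row naming the feature columns.
    """
    root = Path(root)
    splits = {}
    for split, filename in SPLIT_FILES.items():
        matches = sorted(root.rglob(filename))
        if not matches:
            raise FileNotFoundError(f"{filename} not found under {root}")
        with matches[0].open(newline="") as handle:
            splits[split] = list(csv.DictReader(handle))
    return splits
```

All values are read as strings here, so numeric features would still need to be converted before model training.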

Software Requirements:
To open and work with this dataset, you can use an environment such as VS Code or Jupyter, together with tools like:

Python (with libraries such as pandas, numpy, scikit-learn, matplotlib, etc.)

Further Details

Reusability:
Users of this dataset should be aware that it is designed for machine learning experiments involving classification tasks. The dataset is already split into training, validation, and test subsets. Any model trained with this dataset should be evaluated using the test set to ensure proper validation.

Limitations:
The dataset may not cover all edge cases, and it might have biases depending on the selection of data sources. It's important to consider these limitations when generalizing model results to real-world applications.
Baltic Sea Region Land Cover Plus - Training and Validation data
zenodo.org
bin, pdf
Updated Oct 15, 2024
Cite
Vu-Dong Pham (2024). Baltic Sea Region Land Cover Plus - Training and Validation data [Dataset]. http://doi.org/10.5281/zenodo.11073291
Explore at:
Available download formats: bin, pdf
Unique identifier
https://doi.org/10.5281/zenodo.11073291
Dataset updated
Oct 15, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Vu-Dong Pham
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Training and validation data used in creating Baltic Sea Region Land Cover Plus (BSRLC+) maps: Dataset link

landcover_training_data_2006_2018.gpkg: Points data of consistent land cover from 2006 to 2018

crop_training_data_{year}.gpkg: Point data of crop types derived from the EuroCrop dataset for a particular year (2019, 2021, 2023)

landcover_validation_{year}.gpkg: Point data for validation derived from LUCAS points for a particular year (2009, 2012, 2015, 2018)

Metadata.pdf: Information of land cover code in each dataset
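The file list above can be expanded into concrete file names with a short sketch. It assumes that `{year}` in each template is replaced by the four-digit year exactly as listed; the helper name `expected_files` is hypothetical, and the actual names should be checked against the download.

```python
CROP_YEARS = [2019, 2021, 2023]
VALIDATION_YEARS = [2009, 2012, 2015, 2018]


def expected_files():
    """List the file names implied by the templates in the description."""
    files = ["landcover_training_data_2006_2018.gpkg"]
    files += [f"crop_training_data_{year}.gpkg" for year in CROP_YEARS]
    files += [f"landcover_validation_{year}.gpkg" for year in VALIDATION_YEARS]
    files.append("Metadata.pdf")
    return files
```

Comparing this list with the contents of the downloaded directory is a quick way to spot missing years.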

Version notes:

Version 2: Correcting the validation data 2018 and Metadata file

Version 1: Original upload
Training and validation dataset of milling processes for time series...
service.tib.eu
Updated Nov 28, 2024
Cite
(2024). Training and validation dataset of milling processes for time series prediction - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1462
Explore at:
Dataset updated
Nov 28, 2024
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Abstract: The aim of the data set is the training as well as the validation of models for the prediction of time series for milling processes. For this purpose, processes with a sampling rate of 500 Hz were recorded on a DMG CMX 600 V by a Siemens Industrial Edge. A process for model training and a process for validation were recorded, which were used for both steel and aluminum machining. Several recordings were made with and without the workpiece (aircut) in order to cover as many cases as possible.

TechnicalRemarks:

Documents:
- Design of Experiments: Information on the paths such as the technological values of the experiments
- Recording information: Information about the recordings with comments
- Data: All recorded datasets. The first level contains the folders for training and validation both with and without the workpiece. In the next level, the individual test executions are located. The individual recordings are stored in the form of a JSON file. This consists of a header with all relevant information such as the signal sources. This is followed by the entries of the recorded time series.
- NC-Code: NC programs executed on the machine
- Workpiece: Pictures of the raw parts as well as the machined workpieces. The pictures show the unfinished part on the left, the training part in the middle and a part with two validation runs on the right.

Experimental data:
- Machine: DMG CMX 600 V
- Material: S235JR, 2007 T4
- Tools:
  - Solid carbide (VHM) milling cutter HPC, TiSi, ⌀ f8 DC: 5mm
  - Solid carbide (VHM) milling cutter HPC, TiSi, ⌀ f8 DC: 10mm
  - Solid carbide (VHM) milling cutter HPC, TiSi, ⌀ f8 DC: 20mm
  - End mill HSS-Co8, TiAlN, ⌀ k10 DC: 5mm
  - End mill HSS-Co8, TiAlN, ⌀ k10 DC: 10mm
  - End mill HSS-Co8, TiAlN, ⌀ k10 DC: 5mm
- Workpiece blank dimensions: 150x75x50mm

License: This work is licensed under a Creative Commons Attribution 4.0 International License. Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0).
Training and validation datasets for training probabilistic machine learning...
figshare.com
data.4tu.nl
txt
Updated Jun 11, 2023
Cite
Deepali Singh (2023). Training and validation datasets for training probabilistic machine learning models on NREL's 10-MW reference wind turbine [Dataset]. http://doi.org/10.4121/21939995.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.4121/21939995.v1
Dataset updated
Jun 11, 2023
Dataset provided by
4TU.ResearchData
Authors
Deepali Singh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository consists of two databases, CASE-ONSHORE and CASE-OFFSHORE, generated using OpenFAST v2.4 on NREL's 10-MW reference wind turbine for training data-driven probabilistic load surrogate models. The data are intended for mapping 10-minute average environmental conditions to the corresponding 10-minute load statistics, such as load average, fatigue and range, at various locations on the tower and blades.
Machine Learning Dataset
brightdata.com
.json, .csv, .xlsx
Updated Jun 19, 2024
Bright Data (2024). Machine Learning Dataset [Dataset]. https://brightdata.com/products/datasets/machine-learning
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
Jun 19, 2024
Dataset authored and provided by
Bright Data, https://brightdata.com/
License
https://brightdata.com/license
Area covered
Worldwide
Description
Utilize our machine learning datasets to develop and validate your models. Our datasets are designed to support a variety of machine learning applications, from image recognition to natural language processing and recommendation systems. You can access a comprehensive dataset or tailor a subset to fit your specific requirements, using data from a combination of various sources and websites, including custom ones. Popular use cases include model training and validation, where the dataset can be used to ensure robust performance across different applications. Additionally, the dataset helps in algorithm benchmarking by providing extensive data to test and compare various machine learning algorithms, identifying the most effective ones for tasks such as fraud detection, sentiment analysis, and predictive maintenance. Furthermore, it supports feature engineering by allowing you to uncover significant data attributes, enhancing the predictive accuracy of your machine learning models for applications like customer segmentation, personalized marketing, and financial forecasting.
Web Data Commons Training and Test Sets for Large-Scale Product Matching -...
b2find.eudat.eu
Updated Nov 27, 2020
+ more versions
(2020). Web Data Commons Training and Test Sets for Large-Scale Product Matching - Version 2.0 Product Matching Task derived from the WDC Product Data Corpus - Version 2.0 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/720b440c-eda0-5182-af9f-f868ed999bd7
Dataset updated
Nov 27, 2020
Description
Many e-shops have started to mark up product data within their HTML pages using the schema.org vocabulary. The Web Data Commons project regularly extracts such data from the Common Crawl, a large public web crawl. The Web Data Commons Training and Test Sets for Large-Scale Product Matching contain product offers from different e-shops in the form of binary product pairs (labeled "match" or "no match") for four product categories: computers, cameras, watches, and shoes. To support the evaluation of machine learning-based matching methods, the data is split into training, validation, and test sets. For each product category, training sets are provided in four different sizes (2,000 to 70,000 pairs). In addition, sets of ids are available for each training set for a possible validation split (stratified random draw). The test set for each product category consists of 1,100 product pairs. The labels of the test sets were manually checked, while those of the training sets were derived using shared product identifiers from the Web as weak supervision. The data stems from the WDC Product Data Corpus for Large-Scale Product Matching - Version 2.0, which consists of 26 million product offers originating from 79 thousand websites. For more information and download links for the corpus itself, please follow the links below.
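The description mentions validation ids obtained by a stratified random draw. A minimal sketch of what such a draw could look like on labeled pairs is shown below; the function name, the 20% fraction, the seed, and the toy data are assumptions for illustration, not the procedure actually used to produce the published id sets.

```python
import random
from collections import defaultdict


def stratified_validation_ids(pairs, fraction=0.2, seed=0):
    """Pick validation ids so each label keeps roughly the same share.

    pairs: list of (pair_id, label) tuples.
    fraction and seed are illustrative defaults, not values from the dataset.
    """
    by_label = defaultdict(list)
    for pair_id, label in pairs:
        by_label[label].append(pair_id)

    rng = random.Random(seed)
    chosen = set()
    for ids in by_label.values():
        # Draw the same fraction from every label group.
        k = round(len(ids) * fraction)
        chosen.update(rng.sample(ids, k))
    return chosen


# Toy data: 25 "match" pairs and 75 "no match" pairs.
pairs = [(i, "match" if i % 4 == 0 else "no match") for i in range(100)]
val_ids = stratified_validation_ids(pairs)
train_ids = {pair_id for pair_id, _ in pairs} - val_ids
```

Drawing per label rather than over the whole set keeps the match/no-match ratio of the validation split close to that of the training set, which matters when one label is much rarer than the other.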



Training and validation data from the AI for Critical Mineral Assessment...

Metatasks for AutoGluon - ROC AUC and Balanced Accuracy

Training and Validation Datasets for Neural Network to Fill in Missing Data...

Data for training, validation and testing of methods in the thesis:...

riiid cross validation files

Context

DataAndSettings

Challenge Round 0 (Dry Run) Test Dataset

Train, validation, test data sets and confusion matrices underlying...

alpaca-train-validation-test-split

Sample, test, and validation data for findmycells

Training, test data and model parameters.

CARLA Simulation Datasets for Training, Validation, and Test Data of the...

Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning...

Training, validation and test datasets and model files for larger US Health...

FAIR Dataset for Disease Prediction in Healthcare Applications

Dataset Description

Context and Methodology

Technical Details

Further Details

Baltic Sea Region Land Cover Plus - Training and Validation data

Training and validation dataset of milling processes for time series...

Training and validation datasets for training probabilistic machine learning...

Machine Learning Dataset

Web Data Commons Training and Test Sets for Large-Scale Product Matching -...
