100+ datasets found

BUTTER - Empirical Deep Learning Dataset
datasets.ai
data.openei.org
21, 28
Updated Sep 11, 2024
Cite
Department of Energy (2024). BUTTER - Empirical Deep Learning Dataset [Dataset]. https://datasets.ai/datasets/butter-empirical-deep-learning-dataset
Explore at:
Available download formats: 28, 21
Dataset updated
Sep 11, 2024
Dataset authored and provided by
Department of Energy
Description
The BUTTER Empirical Deep Learning Dataset represents an empirical study of deep learning phenomena in dense fully connected networks, scanning across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels of L1 and L2 regularization each. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were performed, and statistics including training and test loss (using an 80% / 20% shuffled train-test split) are recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions of each experiment), and a total of 13.3 billion training epochs (three thousand epochs were covered by most runs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.
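The headline totals in this description can be cross-checked with simple arithmetic. The sketch below uses only the rounded figures quoted above, so the results are approximate; the variable names are illustrative and not part of the dataset.

```python
# Rounded totals quoted in the BUTTER description.
experiments = 178_000        # distinct hyperparameter settings
runs = 3_550_000             # individual training runs
epochs = 13_300_000_000      # total training epochs

# Average repetitions per experiment and average epochs per run.
reps_per_experiment = runs / experiments
epochs_per_run = epochs / runs

print(f"repetitions per experiment: {reps_per_experiment:.1f}")
print(f"epochs per run: {epochs_per_run:.0f}")
```

The first ratio comes out close to the stated average of 20 repetitions per experiment. The second is somewhat above three thousand, which is compatible with most runs covering three thousand epochs if some runs ran longer.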
Machine Learning Dataset
brightdata.com
.json, .csv, .xlsx
Updated Jun 19, 2024
Cite
Bright Data (2024). Machine Learning Dataset [Dataset]. https://brightdata.com/products/datasets/machine-learning
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
Jun 19, 2024
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Utilize our machine learning datasets to develop and validate your models. Our datasets are designed to support a variety of machine learning applications, from image recognition to natural language processing and recommendation systems. You can access a comprehensive dataset or tailor a subset to fit your specific requirements, using data from a combination of various sources and websites, including custom ones. Popular use cases include model training and validation, where the dataset can be used to ensure robust performance across different applications. Additionally, the dataset helps in algorithm benchmarking by providing extensive data to test and compare various machine learning algorithms, identifying the most effective ones for tasks such as fraud detection, sentiment analysis, and predictive maintenance. Furthermore, it supports feature engineering by allowing you to uncover significant data attributes, enhancing the predictive accuracy of your machine learning models for applications like customer segmentation, personalized marketing, and financial forecasting.
US Deep Learning Market Analysis, Size, and Forecast 2025-2029
technavio.com
pdf
Updated Jul 8, 2025
Cite
Technavio (2025). US Deep Learning Market Analysis, Size, and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/us-deep-learning-market-industry-analysis
Explore at:
Available download formats: pdf
Dataset updated
Jul 8, 2025
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2025 - 2029
Description
US Deep Learning Market Size 2025-2029

The deep learning market size in the US is forecast to increase by USD 5.02 billion at a CAGR of 30.1% between 2024 and 2029.
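A forecast stated as an absolute increase plus a CAGR implies a starting and an ending market size. The sketch below works that out under two assumptions that the report excerpt does not confirm: growth compounds annually, and 2024 to 2029 counts as five compounding periods. The function name `implied_base` is illustrative, and the derived values are arithmetic consequences of those assumptions, not figures from the report.

```python
def implied_base(increment: float, cagr: float, years: int) -> float:
    """Starting value implied by an absolute increase at a given CAGR.

    Assumes annual compounding: end = base * (1 + cagr) ** years,
    so increment = base * ((1 + cagr) ** years - 1).
    """
    growth = (1.0 + cagr) ** years
    return increment / (growth - 1.0)


# Figures quoted in the forecast: USD 5.02 billion increase at 30.1% CAGR.
base_2024 = implied_base(increment=5.02, cagr=0.301, years=5)
end_2029 = base_2024 * (1.0 + 0.301) ** 5

print(f"implied 2024 base: USD {base_2024:.2f} billion")
print(f"implied 2029 size: USD {end_2029:.2f} billion")
```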

The deep learning market is experiencing robust growth, driven by the increasing adoption of artificial intelligence (AI) in various industries for advanced solutioning. This trend is fueled by the availability of vast amounts of data, which is a key requirement for deep learning algorithms to function effectively. Industry-specific solutions are gaining traction, as businesses seek to leverage deep learning for specific use cases such as image and speech recognition, fraud detection, and predictive maintenance. Alongside, intuitive data visualization tools are simplifying complex neural network outputs, helping stakeholders understand and validate insights. However, challenges remain, including the need for powerful computing resources, data privacy concerns, and the high cost of implementing and maintaining deep learning systems. Despite these hurdles, the market's potential for innovation and disruption is immense, making it an exciting space for businesses to explore further. Semi-supervised learning, data labeling, and data cleaning facilitate efficient training of deep learning models. Cloud analytics is another significant trend, as companies seek to leverage cloud computing for cost savings and scalability.

What will be the Size of the market During the Forecast Period?

Deep learning, a subset of machine learning, continues to shape industries by enabling advanced applications such as image and speech recognition, text generation, and pattern recognition. Reinforcement learning, a type of deep learning, gains traction, with deep reinforcement learning leading the charge. Anomaly detection, a crucial application of unsupervised learning, safeguards systems against security vulnerabilities. Ethical implications and fairness considerations are increasingly important in deep learning, with emphasis on explainable AI and model interpretability. Graph neural networks and attention mechanisms enhance data preprocessing for sequential data modeling and object detection. Time series forecasting and dataset creation further expand deep learning's reach, while privacy preservation and bias mitigation ensure responsible use.

In summary, deep learning's market dynamics reflect a constant pursuit of innovation, efficiency, and ethical considerations. The Deep Learning Market in the US is flourishing as organizations embrace intelligent systems powered by supervised learning and emerging self-supervised learning techniques. These methods refine predictive capabilities and reduce reliance on labeled data, boosting scalability. BFSI firms utilize AI image recognition for various applications, including personalizing customer communication, maintaining a competitive edge, and automating repetitive tasks to boost productivity. Sophisticated feature extraction algorithms now enable models to isolate patterns with high precision, particularly in applications such as image classification for healthcare, security, and retail.

How is this market segmented and which is the largest segment?

The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

Application

  Image recognition
  Voice recognition
  Video surveillance and diagnostics
  Data mining


Type

  Software
  Services
  Hardware


End-user

  Security
  Automotive
  Healthcare
  Retail and commerce
  Others


Geography

  North America

    US

By Application Insights

The Image recognition segment is estimated to witness significant growth during the forecast period. In the realm of artificial intelligence (AI) and machine learning, image recognition, a subset of computer vision, is gaining significant traction. This technology utilizes neural networks, deep learning models, and various machine learning algorithms to decipher visual data from images and videos. Image recognition is instrumental in numerous applications, including visual search, product recommendations, and inventory management. Consumers can take photographs of products to discover similar items, enhancing the online shopping experience. In the automotive sector, image recognition is indispensable for advanced driver assistance systems (ADAS) and autonomous vehicles, enabling the identification of pedestrians, other vehicles, road signs, and lane markings.

Furthermore, image recognition plays a pivotal role in augmented reality (AR) and virtual reality (VR) applications, where it tracks physical objects and overlays digital content onto real-world scenarios. The model training process involves the backpropagation algorithm, which calculates
Trained deep learning models
figshare.com
zip
Updated Apr 19, 2021
Cite
Anuradha Kar (2021). Trained deep learning models [Dataset]. http://doi.org/10.6084/m9.figshare.14433590.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.14433590.v1
Dataset updated
Apr 19, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Anuradha Kar
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Trained models for four deep learning based segmentation pipelines are included here. These include Unet models for the Plantseg and the Unet+Watershed and Cellpose pipelines and a MaskRCNN model for the MRCNN+Watershed pipeline.
Project Deep Learning Dataset
universe.roboflow.com
zip
Updated Feb 8, 2024
Cite
projectdeep (2024). Project Deep Learning Dataset [Dataset]. https://universe.roboflow.com/projectdeep/project-deep-learning-20bti
Explore at:
Available download formats: zip
Dataset updated
Feb 8, 2024
Dataset authored and provided by
projectdeep
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Chilli Carrot Tomato Bounding Boxes
Description
Project Deep Learning

## Overview

Project Deep Learning is a dataset for object detection tasks - it contains Chilli Carrot Tomato annotations for 1,144 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
LOCBEEF: Beef Quality Image dataset for Deep Learning Models
data.mendeley.com
Updated Nov 30, 2022
Cite
Tri Mulya Dharma (2022). LOCBEEF: Beef Quality Image dataset for Deep Learning Models [Dataset]. http://doi.org/10.17632/nhs6mjg6yy.1
Explore at:
Unique identifier
https://doi.org/10.17632/nhs6mjg6yy.1
Dataset updated
Nov 30, 2022
Authors
Tri Mulya Dharma
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The LOCBEEF dataset contains 3268 images of local Aceh beef collected between 07:00 and 22:00; more information about the collection times is shown in Fig. The dataset is organized into two directories, train and test, and each directory has two subdirectories, fresh and rotten. Example images can be seen in Figs. 2 and 3, and the directory structure is shown in Fig. 1. The train directory contains 2228 images (1114 in each subdirectory), and the test directory contains 980 images (490 in each subdirectory). The images have resolutions of 176 x 144, 320 x 240, 640 x 480, 720 x 480, 720 x 720, 1280 x 720, 1920 x 1080, 2560 x 1920, 3120 x 3120, 3264 x 2248, and 4160 x 3120 pixels.

The LOCBEEF dataset has been classified using a Convolutional Neural Network deep learning method with an image composition of 70% training data and 30% test data. Images with the dimensions listed above are included in the LOCBEEF dataset for use with ResNet50.
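The directory layout described above (train and test, each with fresh and rotten subdirectories) can be checked after download with a short script. This is a minimal sketch: the folder names come from the description, but the root path and the function name `count_images` are assumptions, and every regular file is counted as an image because the description does not state the file extensions.

```python
from collections import Counter
from pathlib import Path


def count_images(root: str) -> Counter:
    """Count files under <root>/<split>/<label>/ for each (split, label) pair."""
    counts = Counter()
    for split in ("train", "test"):
        for label in ("fresh", "rotten"):
            folder = Path(root) / split / label
            if folder.is_dir():
                # Every regular file is treated as one image (assumption).
                counts[(split, label)] = sum(1 for p in folder.iterdir() if p.is_file())
    return counts
```

Calling `count_images("LOCBEEF")` on a local copy (the path is hypothetical) should, if the copy matches the description, report 1114 files for each train class and 490 for each test class.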

Deep Learning Market Analysis North America, Europe, APAC, South America,...

technavio.com

Updated Nov 18, 2022

Cite

Technavio (2022). Deep Learning Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, China, UK, Canada, Germany - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/deep-learning-market-industry-analysis

Explore at:

Dataset updated

Nov 18, 2022

Dataset provided by

TechNavio

Authors

Technavio

Time period covered

2021 - 2025

Area covered

United States, Global

Description

Deep Learning Market Size 2024-2028

The deep learning market size is forecast to increase by USD 10.85 billion at a CAGR of 26.06% between 2023 and 2028.

Deep learning technology is revolutionizing various industries, including healthcare. In the healthcare sector, deep learning is being extensively used for the diagnosis and treatment of musculoskeletal and inflammatory disorders. The market for deep learning services is experiencing significant growth due to the increasing availability of high-resolution medical images, electronic health records, and big data. Medical professionals are leveraging deep learning technologies for disease indications such as failure-to-success ratio, image interpretation, and biomarker identification solutions. Moreover, with the proliferation of data from various sources such as social networks, smartphones, and IoT devices, there is a growing need for advanced analytics techniques to make sense of this data. Companies in the market are collaborating to offer comprehensive information services and digital analytical solutions. However, the lack of technical expertise among medical professionals poses a challenge to the widespread adoption of deep learning technologies. The market is witnessing an influx of startups, which is intensifying the competition. Deep learning services are being integrated with compatible devices for image processing and prognosis. Molecular data analysis is another area where deep learning technologies are making a significant impact.

What will be the Size of the Deep Learning Market During the Forecast Period?

Deep learning, a subset of machine learning and artificial intelligence (AI), is a computational method inspired by the structure and function of the human brain. This technology utilizes neural networks, a type of machine learning model, to recognize patterns and learn from data. In the US market, deep learning is gaining significant traction due to its ability to process large amounts of data and extract meaningful insights. The market in the US is driven by several factors. One of the primary factors is the increasing availability of big data.
Moreover, with the proliferation of data from various sources such as social networks, smartphones, and IoT devices, there is a growing need for advanced analytics techniques to make sense of this data. Deep learning algorithms, with their ability to learn from vast amounts of data, are well-positioned to address this need. Another factor fueling the growth of the market in the US is the increasing adoption of cloud-based technology. Cloud-based solutions offer several advantages, including scalability, flexibility, and cost savings. These solutions enable organizations to process large datasets and train complex models without the need for expensive hardware.

How is this Industry segmented and which is the largest segment?

The industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

Application

  Image recognition
  Voice recognition
  Video surveillance and diagnostics
  Data mining


Type

  Software
  Services
  Hardware


Geography

  North America

    Canada
    US


  Europe

    Germany
    UK


  APAC

    China


  South America



  Middle East and Africa

By Application Insights

The image recognition segment is estimated to witness significant growth during the forecast period.

In the realm of artificial intelligence (AI), image recognition holds significant value, particularly in sectors such as banking and finance (BFSI). This technology's ability to accurately identify and categorize images is invaluable, as extensive image repositories in these industries cannot be easily forged. BFSI firms utilize AI image recognition for various applications, including personalizing customer communication, maintaining a competitive edge, and automating repetitive tasks to boost productivity. For instance, social media platforms like Facebook employ this technology to correctly identify and assign images to the right user account with an impressive accuracy rate of approximately 98%. Moreover, AI image recognition plays a crucial role in eliminating fraudulent social media accounts.

The image recognition segment was valued at USD 1.05 billion in 2018 and showed a gradual increase during the forecast period.

Regional Analysis

North America is estimated to contribute 36% to the growth of the global market during the forecast period.

Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.

For more insights on the market share of various regions, Reques

TREC 2022 Deep Learning test collection
catalog.data.gov
s.cnmilf.com
Updated May 9, 2023
Cite
National Institute of Standards and Technology (2023). TREC 2022 Deep Learning test collection [Dataset]. https://catalog.data.gov/dataset/trec-2022-deep-learning-test-collection
Explore at:
Dataset updated
May 9, 2023
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
This is a test collection for passage and document retrieval, produced in the TREC 2023 Deep Learning track. The Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training based on click logs and training based on labels from shallow pools (such as the pooling in the TREC Million Query Track or the evaluation of search engines based on early precision).

Certain machine learning based methods, such as methods based on deep learning, are known to require very large datasets for training. Lack of such large scale datasets has been a limitation for developing such methods for common information retrieval tasks, such as document ranking. The Deep Learning Track organized in the previous years aimed at providing large scale datasets to TREC and creating a focused research effort with a rigorous blind evaluation of rankers for the passage ranking and document ranking tasks.

Similar to the previous years, one of the main goals of the track in 2022 is to study what methods work best when a large amount of training data is available. For example, do the same methods that work on small data also work on large data? How much do methods improve when given more training data? What external data and models can be brought to bear in this scenario, and how useful is it to combine full supervision with other forms of supervision?

The collection contains 12 million web pages, 138 million passages from those web pages, search queries, and relevance judgments for the queries.
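The description mentions evaluating search engines on early precision. One common early-precision measure is precision at k, sketched below for illustration; this is not the track's official evaluation code, and the document IDs and judgments are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that are judged relevant.

    Divides by k even when fewer than k items are ranked, which is the
    usual convention for this measure.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


ranking = ["d3", "d1", "d7", "d2", "d9"]   # hypothetical system output, best first
judged_relevant = {"d1", "d2", "d4"}        # hypothetical relevance judgments

print(precision_at_k(ranking, judged_relevant, k=3))
print(precision_at_k(ranking, judged_relevant, k=5))
```

Here only one of the top three results is relevant and two of the top five are, so a small k rewards systems that place relevant passages at the very top of the ranking.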
Data_Sheet_1_Deep Learning in Alzheimer's Disease: Diagnostic Classification...
frontiersin.figshare.com
pdf
Updated May 30, 2023
Cite
Taeho Jo; Kwangsik Nho; Andrew J. Saykin (2023). Data_Sheet_1_Deep Learning in Alzheimer's Disease: Diagnostic Classification and Prognostic Prediction Using Neuroimaging Data.pdf [Dataset]. http://doi.org/10.3389/fnagi.2019.00220.s001
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fnagi.2019.00220.s001
Dataset updated
May 30, 2023
Dataset provided by
Frontiers
Authors
Taeho Jo; Kwangsik Nho; Andrew J. Saykin
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Deep learning, a state-of-the-art machine learning approach, has shown outstanding performance over traditional machine learning in identifying intricate structures in complex high-dimensional data, especially in the domain of computer vision. The application of deep learning to early detection and automated classification of Alzheimer's disease (AD) has recently gained considerable attention, as rapid progress in neuroimaging techniques has generated large-scale multimodal neuroimaging data. A systematic review of publications using deep learning approaches and neuroimaging data for diagnostic classification of AD was performed. A PubMed and Google Scholar search was used to identify deep learning papers on AD published between January 2013 and July 2018. These papers were reviewed, evaluated, and classified by algorithm and neuroimaging type, and the findings were summarized. Of 16 studies meeting full inclusion criteria, 4 used a combination of deep learning and traditional machine learning approaches, and 12 used only deep learning approaches. The combination of traditional machine learning for classification and stacked auto-encoder (SAE) for feature selection produced accuracies of up to 98.8% for AD classification and 83.7% for prediction of conversion from mild cognitive impairment (MCI), a prodromal stage of AD, to AD. Deep learning approaches, such as convolutional neural network (CNN) or recurrent neural network (RNN), that use neuroimaging data without pre-processing for feature selection have yielded accuracies of up to 96.0% for AD classification and 84.2% for MCI conversion prediction. The best classification performance was obtained when multimodal neuroimaging and fluid biomarkers were combined. Deep learning approaches continue to improve in performance and appear to hold promise for diagnostic classification of AD using multimodal neuroimaging data. 
AD research that uses deep learning is still evolving, improving performance by incorporating additional hybrid data types, such as -omics data, and increasing transparency with explainable approaches that add knowledge of specific disease-related features and mechanisms.
Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning...
datarade.ai
.json, .csv
Cite
Xverum, Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning (DL), NLP & LLM Training [Dataset]. https://datarade.ai/data-products/xverum-company-data-b2b-data-belgium-netherlands-denm-xverum
Explore at:
Available download formats: .json, .csv
Dataset provided by
Xverum LLC
Authors
Xverum
Area covered
India, Dominican Republic, Norway, Western Sahara, Oman, United Kingdom, Barbados, Jordan, Sint Maarten (Dutch part), Cook Islands
Description
Xverum’s AI & ML Training Data provides one of the most extensive datasets available for AI and machine learning applications, featuring 800M B2B profiles with 100+ attributes. This dataset is designed to enable AI developers, data scientists, and businesses to train robust and accurate ML models. From natural language processing (NLP) to predictive analytics, our data empowers a wide range of industries and use cases with unparalleled scale, depth, and quality.

What Makes Our Data Unique?

Scale and Coverage:
- A global dataset encompassing 800M B2B profiles from a wide array of industries and geographies.
- Includes coverage across the Americas, Europe, Asia, and other key markets, ensuring worldwide representation.

Rich Attributes for Training Models:
- Over 100 fields of detailed information, including company details, job roles, geographic data, industry categories, past experiences, and behavioral insights.
- Tailored for training models in NLP, recommendation systems, and predictive algorithms.

Compliance and Quality:
- Fully GDPR and CCPA compliant, providing secure and ethically sourced data.
- Extensive data cleaning and validation processes ensure reliability and accuracy.

Annotation-Ready:
- Pre-structured and formatted datasets that are easily ingestible into AI workflows.
- Ideal for supervised learning with tagging options such as entities, sentiment, or categories.

How Is the Data Sourced?
- Publicly available information gathered through advanced, GDPR-compliant web aggregation techniques.
- Proprietary enrichment pipelines that validate, clean, and structure raw data into high-quality datasets.

This approach ensures we deliver comprehensive, up-to-date, and actionable data for machine learning training.

Primary Use Cases and Verticals

Natural Language Processing (NLP): Train models for named entity recognition (NER), text classification, sentiment analysis, and conversational AI. Ideal for chatbots, language models, and content categorization.

Predictive Analytics and Recommendation Systems: Enable personalized marketing campaigns by predicting buyer behavior. Build smarter recommendation engines for ecommerce and content platforms.

B2B Lead Generation and Market Insights: Create models that identify high-value leads using enriched company and contact information. Develop AI systems that track trends and provide strategic insights for businesses.

HR and Talent Acquisition AI: Optimize talent-matching algorithms using structured job descriptions and candidate profiles. Build AI-powered platforms for recruitment analytics.

How This Product Fits Into Xverum’s Broader Data Offering

Xverum is a leading provider of structured, high-quality web datasets. While we specialize in B2B profiles and company data, we also offer complementary datasets tailored for specific verticals, including ecommerce product data, job listings, and customer reviews. The AI Training Data is a natural extension of our core capabilities, bridging the gap between structured data and machine learning workflows. By providing annotation-ready datasets, real-time API access, and customization options, we ensure our clients can seamlessly integrate our data into their AI development processes.

Why Choose Xverum?
- Experience and Expertise: A trusted name in structured web data with a proven track record.
- Flexibility: Datasets can be tailored for any AI/ML application.
- Scalability: With 800M profiles and more being added, you’ll always have access to fresh, up-to-date data.
- Compliance: We prioritize data ethics and security, ensuring all data adheres to GDPR and other legal frameworks.

Ready to supercharge your AI and ML projects? Explore Xverum’s AI Training Data to unlock the potential of 800M global B2B profiles. Whether you’re building a chatbot, predictive algorithm, or next-gen AI application, our data is here to help.

Contact us for sample datasets or to discuss your specific needs.
Adversarial Machine Learning Dataset
kaggle.com
Updated Jul 8, 2022
Cite
Network security group CNR-IEIIT (2022). Adversarial Machine Learning Dataset [Dataset]. https://www.kaggle.com/datasets/cnrieiit/adversarial-machine-learning-dataset/code
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 8, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Network security group CNR-IEIIT
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Adversarial Machine Learning Dataset

This repository contains the datasets adopted in the following paper, published in IEEE Access. If you use this repository in your research work, please consider citing our paper.

I. Vaccari, A. Carlevaro, S. Narteni, E. Cambiaso and M. Mongelli, "eXplainable and Reliable Against Adversarial Machine Learning in Data Analytics," in IEEE Access, vol. 10, pp. 83949-83970, 2022, DOI: 10.1109/ACCESS.2022.3197299. URL: https://ieeexplore.ieee.org/document/9852204

@ARTICLE{9852204, author={Vaccari, Ivan and Carlevaro, Alberto and Narteni, Sara and Cambiaso, Enrico and Mongelli, Maurizio}, journal={IEEE Access}, title={eXplainable and Reliable Against Adversarial Machine Learning in Data Analytics}, year={2022}, volume={10}, number={}, pages={83949-83970}, doi={10.1109/ACCESS.2022.3197299}}

Description

We consider three different applications:
* DNS tunneling, referring to network data captured during a DNS tunneling attack
* Platooning, referring to simulated data from a vehicle platooning scenario
* Remaining useful life (RUL), related to predictive maintenance of aircraft engines

Each application scenario has been targeted through different Adversarial Machine Learning methods. Particularly, the following methods are considered:
* Carlini-Wagner (CW): Carlini, N., & Wagner, D. (2017). Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP) (pp. 39-57). IEEE.
* Fast Gradient Sign Method (FGSM): Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
* Jacobian Saliency Map (JSMA): Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., & Swami, A. (2016, March). The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P) (pp. 372-387). IEEE.

Data structure

We have a folder for each application scenario (DNS tunneling, platooning, RUL). For each application scenario, two folders are found:
* legitimate, including original data
* malicious, including data attacked by Adversarial Machine Learning methods; in this case, both training and test data are reported, together with a combined dataset

The target variable for the adversarial attacks detection is called attack in all cases, whereas the target variables in the original problems (legitimate data) are the following:
* g for the DNS tunneling application
* collision for the platooning application
* RUL_binary for the RUL application

DNS tunneling data structure

Concerning the DNS tunneling application, the following features are considered:
* mDt: mean inter-packet time interval
* mA: mean answer packet size
* mQ: mean query packet size
* vDt: variance of inter-packet time interval
* vA: variance of answer packet size
* vQ: variance of query packet size
* sDt: skewness of inter-packet time interval
* sA: skewness of answer packet size
* sQ: skewness of query packet size
* kDt: kurtosis of inter-packet time interval
* kA: kurtosis of answer packet size
* kQ: kurtosis of query packet size

A portion of the legitimate data is reported in the following.

mDt | mA | mQ | vDt | vA | vQ | sDt | sA | sQ | kDt | kA | kQ | g
--- | -- | -- | --- | -- | -- | --- | -- | -- | --- | -- | -- | -
0.46915 | 239.8778 | 86.6326 | 81.8955 | 26109.835267 | 71.490417 | 63.185077 | 2.380883 | 0.552061 | 4284.358131 | 7.187715 | 5623.187219 | 0
0.584831 | 254.0284 | 87.3976 | 11.010015 | 29161.123993 | 922.955914 | 6.777654 | 1.980254 | 42.142166 | 46.147515 | 4.539171 | -2.993006 | 0
0.633453 | 269.3278 | 88.255 | 11.800109 | 38263.725547 | 62.161175 | 6.453627 | 2.051226 | 0.469637 | 41.80766 | 4.600109 | -1.385322 | 0
2.649329 | 258.529 | 88.3704 | 25991.830887 | 35950.372759 | 1297.225604 | 70.664632 | 2.064281 | 37.215584 | 4992.657556 | 4.664802 | 2005555.720075 | 0
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
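Each DNS tunneling feature is a mean, variance, skewness, or kurtosis of one measured quantity. The sketch below shows one way to compute the four statistics for a list of inter-packet times. It assumes population (not sample) moments and non-excess kurtosis; the dataset description does not say which conventions the authors used, so values computed this way may not match the table. The function name and the sample values are hypothetical.

```python
from math import fsum


def moment_features(values):
    """Return (mean, variance, skewness, kurtosis) using population moments."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = fsum(values) / n
    m2 = fsum((v - mean) ** 2 for v in values) / n   # variance
    m3 = fsum((v - mean) ** 3 for v in values) / n
    m4 = fsum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        # Constant input: skewness and kurtosis are undefined, report 0.0.
        return mean, 0.0, 0.0, 0.0
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2   # non-excess kurtosis; subtract 3 for excess
    return mean, m2, skewness, kurtosis


inter_packet_times = [0.10, 0.12, 0.09, 0.50, 0.11]  # hypothetical values
mDt, vDt, sDt, kDt = moment_features(inter_packet_times)
print(mDt, vDt, sDt, kDt)
```

Applying the same function to answer packet sizes and query packet sizes would give the remaining A and Q features under the same assumptions.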

Platooning data structure

Concerning the platooning application, the following features are considered:
* N: number of vehicles in the platoon
* F0: braking force applied by the leader
* PER: packet error rate (probability of packet loss)
* d0: initial distance between vehicles
* v0: initial speed between vehicles

A portion of the legitimate data is reported in the following.

N | F0 | PER | d0 | v0 | collision --- | --...
2023 TREC Deep Learning Track Dataset
catalog.data.gov
data.nist.gov
Updated Jul 9, 2025
Cite
National Institute of Standards and Technology (2025). 2023 TREC Deep Learning Track Dataset [Dataset]. https://catalog.data.gov/dataset/2023-trec-deep-learning-track-dataset
Dataset updated
Jul 9, 2025
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
The Deep Learning track focuses on IR tasks where a large training set is available, allowing us to compare a variety of retrieval approaches including deep neural networks and strong non-neural approaches, to see what works best in a large-data regime.
Datasets
figshare.com
zip
Updated May 31, 2023
Cite
Bastian Eichenberger; YinXiu Zhan (2023). Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.12958037.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.12958037.v1
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Bastian Eichenberger; YinXiu Zhan
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The benchmarking datasets used for deepBlink. The npz files contain train/valid/test splits and can be used directly. The files belong to the following challenges / classes:

- ISBI Particle tracking challenge: microtubule, vesicle, receptor
- Custom synthetic (based on http://smal.ws): particle
- Custom fixed cell: smfish
- Custom live cell: suntag

The csv files identify which original image, SNR, and density each image in the test splits corresponds to.
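A minimal sketch for inspecting one of the npz archives before use. The array names stored inside the files are not given in this description, so the helper only reports whatever names and shapes it finds. The function name `summarize_npz` and the file name in the usage comment are hypothetical.

```python
import numpy as np

def summarize_npz(path, allow_pickle=False):
    """Return {array_name: shape} for every array stored in an .npz archive.

    allow_pickle is off by default. Some archives store object arrays
    (e.g. variable-length labels) and then need allow_pickle=True, which
    should only be enabled for files from a trusted source.
    """
    with np.load(path, allow_pickle=allow_pickle) as archive:
        return {name: archive[name].shape for name in archive.files}

# Usage (hypothetical file name):
# print(summarize_npz("particle.npz"))
```

Listing the names first avoids guessing at keys for the train/valid/test splits.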
Machine Learning model data
ecmwf.int
Updated Jan 1, 2023
Cite
European Centre for Medium-Range Weather Forecasts (2023). Machine Learning model data [Dataset]. https://www.ecmwf.int/en/forecasts/dataset/machine-learning-model-data
Dataset updated
Jan 1, 2023
Dataset authored and provided by
European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
three of these models are available:
A Dataset with Adversarial Attacks on Deep Learning in Wireless Modulation...
ieee-dataport.org
Updated Sep 23, 2023
Cite
Antonios Argyriou (2023). A Dataset with Adversarial Attacks on Deep Learning in Wireless Modulation Classification [Dataset]. https://ieee-dataport.org/documents/dataset-adversarial-attacks-deep-learning-wireless-modulation-classification
Dataset updated
Sep 23, 2023
Authors
Antonios Argyriou
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset contains adversarial attacks on Deep Learning (DL) when it is employed for the classification of wireless modulated communication signals. The attack is executed with an obfuscating waveform embedded in the transmitted signal, which prevents a wireless eavesdropper from extracting clean data for training. At the same time, it allows a legitimate receiver (LRx) to demodulate the data.
Process-guided deep learning water temperature predictions: 6 Model...
catalog.data.gov
data.usgs.gov
Updated Jun 15, 2024
Cite
Climate Adaptation Science Centers (2024). Process-guided deep learning water temperature predictions: 6 Model evaluation (test data and RMSE) [Dataset]. https://catalog.data.gov/dataset/process-guided-deep-learning-water-temperature-predictions-6-model-evaluation-test-data-an
Dataset updated
Jun 15, 2024
Dataset provided by
Climate Adaptation Science Centers
Description
This dataset includes evaluation data ("test" data) and performance metrics for water temperature predictions from multiple modeling frameworks. Process-Based (PB) models were configured and calibrated with training data to reduce root-mean-square error. Uncalibrated models used default configurations (PB0; see Winslow et al. 2016 for details) and no parameters were adjusted according to model fit with observations. Deep Learning (DL) models were Long Short-Term Memory artificial recurrent neural network models which used training data to adjust model structure and weights for temperature predictions (Jia et al. 2019). Process-Guided Deep Learning (PGDL) models were DL models with an added physical constraint for energy conservation as a loss term. These models were pre-trained with uncalibrated Process-Based model outputs (PB0) before training on actual temperature observations. Performance was measured as root-mean-square error relative to temperature observations during the test period. Test data include compiled water temperature data from a variety of sources, including the Water Quality Portal (Read et al. 2017), the North Temperate Lakes Long-Term Ecological Research Program (https://lter.limnology.wisc.edu/), the Minnesota Department of Natural Resources, and the Global Lake Ecological Observatory Network (gleon.org). This dataset is part of a larger data release of lake temperature model inputs and outputs for 68 lakes in the U.S. states of Minnesota and Wisconsin (http://dx.doi.org/10.5066/P9AQPIVD).
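The performance metric described above can be sketched as follows. This is an illustration of the standard root-mean-square error formula, not the data release's own evaluation code; the function name `rmse` and the choice to skip pairs with a missing value are assumptions.

```python
import numpy as np

def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed temperatures.

    Assumption: pairs where either value is NaN (no observation or no
    prediction for that date and depth) are dropped before averaging.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    valid = ~np.isnan(predicted) & ~np.isnan(observed)
    if not valid.any():
        raise ValueError("no overlapping predictions and observations")
    errors = predicted[valid] - observed[valid]
    return float(np.sqrt(np.mean(errors ** 2)))
```

A lower value means predictions sit closer to the observed temperatures over the test period, which is how the PB, DL, and PGDL models are compared here.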
Data from: Exploring deep learning techniques for wild animal behaviour...
data.niaid.nih.gov
search.dataone.org
zip
Updated Feb 22, 2024
Cite
Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa (2024). Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers [Dataset]. http://doi.org/10.5061/dryad.2ngf1vhwk
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.2ngf1vhwk
Dataset updated
Feb 22, 2024
Dataset provided by
Nagoya University
Osaka University
Authors
Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa
License
https://spdx.org/licenses/CC0-1.0.html
Description
Machine learning‐based behaviour classification using acceleration data is a powerful tool in bio‐logging research. Deep learning architectures such as convolutional neural networks (CNN), long short‐term memory (LSTM) and self‐attention mechanisms as well as related training techniques have been extensively studied in human activity recognition. However, they have rarely been used in wild animal studies. The main challenges of acceleration‐based wild animal behaviour classification include data shortages, class imbalance problems, various types of noise in data due to differences in individual behaviour and where the loggers were attached and complexity in data due to complex animal‐specific behaviours, which may have limited the application of deep learning techniques in this area. To overcome these challenges, we explored the effectiveness of techniques for efficient model training: data augmentation, manifold mixup and pre‐training of deep learning models with unlabelled data, using datasets from two species of wild seabirds and state‐of‐the‐art deep learning model architectures. Data augmentation improved the overall model performance when one of the various techniques (none, scaling, jittering, permutation, time‐warping and rotation) was randomly applied to each data during mini‐batch training. Manifold mixup also improved model performance, but not as much as random data augmentation. Pre‐training with unlabelled data did not improve model performance. The state‐of‐the‐art deep learning models, including a model consisting of four CNN layers, an LSTM layer and a multi‐head attention layer, as well as its modified version with shortcut connection, showed better performance among other comparative models. Using only raw acceleration data as inputs, these models outperformed classic machine learning approaches that used 119 handcrafted features. 
Our experiments showed that deep learning techniques are promising for acceleration‐based behaviour classification of wild animals and highlighted some challenges (e.g. effective use of unlabelled data). There is scope for greater exploration of deep learning techniques in wild animal studies (e.g. advanced data augmentation, multimodal sensor data use, transfer learning and self‐supervised learning). We hope that this study will stimulate the development of deep learning techniques for wild animal behaviour classification using time‐series sensor data.

This abstract is cited from the original article "Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers" in Methods in Ecology and Evolution (Otsuka et al., 2024). Please see README for the details of the datasets.
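The random data augmentation described in the abstract, where one technique is picked at random for each sample during mini-batch training, can be sketched as below. This is not the authors' implementation: only three of the listed techniques (scaling, jittering, permutation) plus "none" are shown, and the function names, the `(batch, time, channels)` layout, and the `sigma` and `n_segments` values are assumptions.

```python
import numpy as np

def scale(x, rng, sigma=0.1):
    """Multiply each channel by a random factor drawn around 1."""
    return x * rng.normal(1.0, sigma, size=(1, x.shape[1]))

def jitter(x, rng, sigma=0.05):
    """Add independent Gaussian noise to every reading."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def permute(x, rng, n_segments=4):
    """Cut the window into segments along time and shuffle their order."""
    segments = np.array_split(x, n_segments, axis=0)
    order = rng.permutation(len(segments))
    return np.concatenate([segments[i] for i in order], axis=0)

def identity(x, rng):
    """The 'none' option: leave the window unchanged."""
    return x

AUGMENTATIONS = [identity, scale, jitter, permute]

def augment_batch(batch, rng):
    """Apply one randomly chosen augmentation to each window in the batch.

    batch has shape (batch, time, channels), e.g. tri-axial acceleration.
    """
    batch = np.asarray(batch, dtype=float)
    out = np.empty_like(batch)
    for i, window in enumerate(batch):
        fn = AUGMENTATIONS[rng.integers(len(AUGMENTATIONS))]
        out[i] = fn(window, rng)
    return out
```

In a training loop, `augment_batch(batch, np.random.default_rng())` would be called on each mini-batch before the forward pass, so every epoch sees a differently perturbed copy of the data.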
Dataset for "Integrating Deep Learning Approaches for Identifying News...
ieee-dataport.org
Updated Jun 17, 2025
Cite
Fangfang Wang (2025). Dataset for "Integrating Deep Learning Approaches for Identifying News Reprint Relation" [Dataset]. https://ieee-dataport.org/documents/dataset-integrating-deep-learning-approaches-identifying-news-reprint-relation
Dataset updated
Jun 17, 2025
Authors
Fangfang Wang
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
# of original news: 30; # of candidate news: 25899; # of reprinted news (no source label): 4234 (537)
Transfer Learning In Deep Learning Dataset
universe.roboflow.com
zip
Updated Apr 18, 2023
Cite
Suffolk University (2023). Transfer Learning In Deep Learning Dataset [Dataset]. https://universe.roboflow.com/suffolk-university-lfv35/transfer-learning-in-deep-learning
Available download formats: zip
Dataset updated
Apr 18, 2023
Dataset authored and provided by
Suffolk University
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
Flowers
Description
Transfer Learning In Deep Learning

## Overview

Transfer Learning In Deep Learning is a dataset for classification tasks. It contains Flowers annotations for 6,921 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
A dataset for machine learning research in the field of stress analyses of...
data.mendeley.com
narcis.nl
Updated Jul 25, 2020
Cite
Jaroslav Matej (2020). A dataset for machine learning research in the field of stress analyses of mechanical structures [Dataset]. http://doi.org/10.17632/wzbzznk8z3.2
Unique identifier
https://doi.org/10.17632/wzbzznk8z3.2
Dataset updated
Jul 25, 2020
Authors
Jaroslav Matej
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The dataset is intended as a data source for developing a stress analysis method based on machine learning. It consists of finite element stress analyses of randomly generated mechanical structures. The dataset contains more than 270,794 pairs of stress analysis images (von Mises stress) of randomly generated 2D structures with predefined thickness and material properties. All the structures are fixed at their bottom edges and loaded with gravity force only. See the PREVIEW directory for some examples. The zip file contains all the files in the dataset.
