100+ datasets found
  1. d

    BUTTER - Empirical Deep Learning Dataset

    • datasets.ai
    • data.openei.org
    • +2more
    21, 28
    Updated Sep 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Department of Energy (2024). BUTTER - Empirical Deep Learning Dataset [Dataset]. https://datasets.ai/datasets/butter-empirical-deep-learning-dataset
    Explore at:
    28, 21Available download formats
    Dataset updated
    Sep 11, 2024
    Dataset authored and provided by
    Department of Energy
    Description

    The BUTTER Empirical Deep Learning Dataset represents an empirical study of the deep learning phenomena on dense fully connected networks, scanning across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels of L1 and L2 regularization each. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were preformed, and statistics including training and test loss (using a 80% / 20% shuffled train-test split) are recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions of each experiments), and a total of 13.3 billion training epochs (three thousand epochs were covered by most runs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.

  2. Machine Learning Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Jun 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bright Data (2024). Machine Learning Dataset [Dataset]. https://brightdata.com/products/datasets/machine-learning
    Explore at:
    .json, .csv, .xlsxAvailable download formats
    Dataset updated
    Jun 19, 2024
    Dataset authored and provided by
    Bright Datahttps://brightdata.com/
    License

    https://brightdata.com/licensehttps://brightdata.com/license

    Area covered
    Worldwide
    Description

    Utilize our machine learning datasets to develop and validate your models. Our datasets are designed to support a variety of machine learning applications, from image recognition to natural language processing and recommendation systems. You can access a comprehensive dataset or tailor a subset to fit your specific requirements, using data from a combination of various sources and websites, including custom ones. Popular use cases include model training and validation, where the dataset can be used to ensure robust performance across different applications. Additionally, the dataset helps in algorithm benchmarking by providing extensive data to test and compare various machine learning algorithms, identifying the most effective ones for tasks such as fraud detection, sentiment analysis, and predictive maintenance. Furthermore, it supports feature engineering by allowing you to uncover significant data attributes, enhancing the predictive accuracy of your machine learning models for applications like customer segmentation, personalized marketing, and financial forecasting.

  3. US Deep Learning Market Analysis, Size, and Forecast 2025-2029

    • technavio.com
    pdf
    Updated Jul 8, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Technavio (2025). US Deep Learning Market Analysis, Size, and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/us-deep-learning-market-industry-analysis
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jul 8, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2025 - 2029
    Description

    Snapshot img

    US Deep Learning Market Size 2025-2029

    The deep learning market size in US is forecast to increase by USD 5.02 billion at a CAGR of 30.1% between 2024 and 2029.

    The deep learning market is experiencing robust growth, driven by the increasing adoption of artificial intelligence (AI) in various industries for advanced solutioning. This trend is fueled by the availability of vast amounts of data, which is a key requirement for deep learning algorithms to function effectively. Industry-specific solutions are gaining traction, as businesses seek to leverage deep learning for specific use cases such as image and speech recognition, fraud detection, and predictive maintenance. Alongside, intuitive data visualization tools are simplifying complex neural network outputs, helping stakeholders understand and validate insights. 
    
    
    However, challenges remain, including the need for powerful computing resources, data privacy concerns, and the high cost of implementing and maintaining deep learning systems. Despite these hurdles, the market's potential for innovation and disruption is immense, making it an exciting space for businesses to explore further. Semi-supervised learning, data labeling, and data cleaning facilitate efficient training of deep learning models. Cloud analytics is another significant trend, as companies seek to leverage cloud computing for cost savings and scalability. 
    

    What will be the Size of the market During the Forecast Period?

    Request Free Sample

    Deep learning, a subset of machine learning, continues to shape industries by enabling advanced applications such as image and speech recognition, text generation, and pattern recognition. Reinforcement learning, a type of deep learning, gains traction, with deep reinforcement learning leading the charge. Anomaly detection, a crucial application of unsupervised learning, safeguards systems against security vulnerabilities. Ethical implications and fairness considerations are increasingly important in deep learning, with emphasis on explainable AI and model interpretability. Graph neural networks and attention mechanisms enhance data preprocessing for sequential data modeling and object detection. Time series forecasting and dataset creation further expand deep learning's reach, while privacy preservation and bias mitigation ensure responsible use.

    In summary, deep learning's market dynamics reflect a constant pursuit of innovation, efficiency, and ethical considerations. The Deep Learning Market in the US is flourishing as organizations embrace intelligent systems powered by supervised learning and emerging self-supervised learning techniques. These methods refine predictive capabilities and reduce reliance on labeled data, boosting scalability. BFSI firms utilize AI image recognition for various applications, including personalizing customer communication, maintaining a competitive edge, and automating repetitive tasks to boost productivity. Sophisticated feature extraction algorithms now enable models to isolate patterns with high precision, particularly in applications such as image classification for healthcare, security, and retail.

    How is this market segmented and which is the largest segment?

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Application
    
      Image recognition
      Voice recognition
      Video surveillance and diagnostics
      Data mining
    
    
    Type
    
      Software
      Services
      Hardware
    
    
    End-user
    
      Security
      Automotive
      Healthcare
      Retail and commerce
      Others
    
    
    Geography
    
      North America
    
        US
    

    By Application Insights

    The Image recognition segment is estimated to witness significant growth during the forecast period. In the realm of artificial intelligence (AI) and machine learning, image recognition, a subset of computer vision, is gaining significant traction. This technology utilizes neural networks, deep learning models, and various machine learning algorithms to decipher visual data from images and videos. Image recognition is instrumental in numerous applications, including visual search, product recommendations, and inventory management. Consumers can take photographs of products to discover similar items, enhancing the online shopping experience. In the automotive sector, image recognition is indispensable for advanced driver assistance systems (ADAS) and autonomous vehicles, enabling the identification of pedestrians, other vehicles, road signs, and lane markings.

    Furthermore, image recognition plays a pivotal role in augmented reality (AR) and virtual reality (VR) applications, where it tracks physical objects and overlays digital content onto real-world scenarios. The model training process involves the backpropagation algorithm, which calculates

  4. Trained deep learning models

    • figshare.com
    zip
    Updated Apr 19, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anuradha Kar (2021). Trained deep learning models [Dataset]. http://doi.org/10.6084/m9.figshare.14433590.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 19, 2021
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Anuradha Kar
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Trained models for four deep learning based segmentation pipelines are included here. These include Unet models for the Plantseg and the Unet+Watershed and Cellpose pipelines and a MaskRCNN model for the MRCNN+Watershed pipeline.

  5. R

    Project Deep Learning Dataset

    • universe.roboflow.com
    zip
    Updated Feb 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    projectdeep (2024). Project Deep Learning Dataset [Dataset]. https://universe.roboflow.com/projectdeep/project-deep-learning-20bti
    Explore at:
    zipAvailable download formats
    Dataset updated
    Feb 8, 2024
    Dataset authored and provided by
    projectdeep
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Chilli Carrot Tomato Bounding Boxes
    Description

    Project Deep Learning

    ## Overview
    
    Project Deep Learning is a dataset for object detection tasks - it contains Chilli Carrot Tomato annotations for 1,144 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
    
  6. m

    LOCBEEF: Beef Quality Image dataset for Deep Learning Models

    • data.mendeley.com
    Updated Nov 30, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tri Mulya Dharma (2022). LOCBEEF: Beef Quality Image dataset for Deep Learning Models [Dataset]. http://doi.org/10.17632/nhs6mjg6yy.1
    Explore at:
    Dataset updated
    Nov 30, 2022
    Authors
    Tri Mulya Dharma
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The LOCBEEF dataset contains 3268 images of local Aceh beef collected from 07:00 a.m - 22:00 p.m, more information about the clock is shown in Fig. The dataset contains two categories of directories, namely train, and test. Furthermore, each subdirectory consists of fresh and rotten. An example of the image can be seen in Figs. 2 and 3. The directory structure for the data is shown in Fig. 1. The image directory for train contains 2228 images each subdirectory contains 1114 images, and the test directory contains 980 images for each subdirectory containing 490 images. For images have a resolution of 176 x 144 pixel, 320 x 240 pixel, 640 x 480 pixel, 720 x 480 pixel, 720 x 720 pixel, 1280 x 720 pixel, 1920 x 1080 pixel, 2560 x 1920 pixel, 3120 x 3120 pixel, 3264 x 2248 pixel, and 4160 x 3120 pixel.

    The classification of LOCBEEF datasets has been carried out using the deep learning method of Convolutional Neural Networks with an image composition of 70% training data and 30% test data. Images with the mentioned dimensions are included in the LOCBEEF dataset to apply to the Resnet50.

  7. Deep Learning Market Analysis North America, Europe, APAC, South America,...

    • technavio.com
    Updated Nov 18, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Technavio (2022). Deep Learning Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, China, UK, Canada, Germany - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/deep-learning-market-industry-analysis
    Explore at:
    Dataset updated
    Nov 18, 2022
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    United States, Global
    Description

    Snapshot img

    Deep Learning Market Size 2024-2028

    The deep learning market size is forecast to increase by USD 10.85 billion at a CAGR of 26.06% between 2023 and 2028.

    Deep learning technology is revolutionizing various industries, including healthcare. In the healthcare sector, deep learning is being extensively used for the diagnosis and treatment of musculoskeletal and inflammatory disorders. The market for deep learning services is experiencing significant growth due to the increasing availability of high-resolution medical images, electronic health records, and big data. Medical professionals are leveraging deep learning technologies for disease indications such as failure-to-success ratio, image interpretation, and biomarker identification solutions. Moreover, with the proliferation of data from various sources such as social networks, smartphones, and IoT devices, there is a growing need for advanced analytics techniques to make sense of this data. Companies In the market are collaborating to offer comprehensive information services and digital analytical solutions. However, the lack of technical expertise among medical professionals poses a challenge to the widespread adoption of deep learning technologies. The market is witnessing an influx of startups, which is intensifying the competition. Deep learning services are being integrated with compatible devices for image processing and prognosis. Molecular data analysis is another area where deep learning technologies are making a significant impact.
    

    What will be the Size of the Deep Learning Market During the Forecast Period?

    Request Free Sample

    A subset of machine learning and artificial intelligence (AI), is a computational method inspired by the structure and function of the human brain. This technology utilizes neural networks, a type of machine learning model, to recognize patterns and learn from data. In the US market, deep learning is gaining significant traction due to its ability to process large amounts of data and extract meaningful insights. The market In the US is driven by several factors. One of the primary factors is the increasing availability of big data.
    Moreover, with the proliferation of data from various sources such as social networks, smartphones, and IoT devices, there is a growing need for advanced analytics techniques to make sense of this data. Deep learning algorithms, with their ability to learn from vast amounts of data, are well-positioned to address this need. Another factor fueling the growth of the market In the US is the increasing adoption of cloud-based technology. Cloud-based solutions offer several advantages, including scalability, flexibility, and cost savings. These solutions enable organizations to process large datasets and train complex models without the need for expensive hardware.
    

    How is this Industry segmented and which is the largest segment?

    The industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    Application
    
      Image recognition
      Voice recognition
      Video surveillance and diagnostics
      Data mining
    
    
    Type
    
      Software
      Services
      Hardware
    
    
    Geography
    
      North America
    
        Canada
        US
    
    
      Europe
    
        Germany
        UK
    
    
      APAC
    
        China
    
    
      South America
    
    
    
      Middle East and Africa
    

    By Application Insights

    The image recognition segment is estimated to witness significant growth during the forecast period.
    

    In the realm of artificial intelligence (AI), image recognition holds significant value, particularly in sectors such as banking and finance (BFSI). This technology's ability to accurately identify and categorize images is invaluable, as extensive image repositories In these industries cannot be easily forged. BFSI firms utilize AI image recognition for various applications, including personalizing customer communication, maintaining a competitive edge, and automating repetitive tasks to boost productivity. For instance, social media platforms like Facebook employ this technology to correctly identify and assign images to the right user account with an impressive accuracy rate of approximately 98%. Moreover, AI image recognition plays a crucial role in eliminating fraudulent social media accounts.

    Get a glance at the report of share of various segments Request Free Sample

    The image recognition segment was valued at USD 1.05 billion in 2018 and showed a gradual increase during the forecast period.

    Regional Analysis

    North America is estimated to contribute 36% to the growth of the global market during the forecast period.
    

    Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.

    For more insights on the market share of various regions, Reques

  8. TREC 2022 Deep Learning test collection

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated May 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    National Institute of Standards and Technology (2023). TREC 2022 Deep Learning test collection [Dataset]. https://catalog.data.gov/dataset/trec-2022-deep-learning-test-collection
    Explore at:
    Dataset updated
    May 9, 2023
    Dataset provided by
    National Institute of Standards and Technologyhttp://www.nist.gov/
    Description

    This is a test collection for passage and document retrieval, produced in the TREC 2023 Deep Learning track. The Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training based on click logs and training based on labels from shallow pools (such as the pooling in the TREC Million Query Track or the evaluation of search engines based on early precision).Certain machine learning based methods, such as methods based on deep learning are known to require very large datasets for training. Lack of such large scale datasets has been a limitation for developing such methods for common information retrieval tasks, such as document ranking. The Deep Learning Track organized in the previous years aimed at providing large scale datasets to TREC, and create a focused research effort with a rigorous blind evaluation of ranker for the passage ranking and document ranking tasks.Similar to the previous years, one of the main goals of the track in 2022 is to study what methods work best when a large amount of training data is available. For example, do the same methods that work on small data also work on large data? How much do methods improve when given more training data? What external data and models can be brought in to bear in this scenario, and how useful is it to combine full supervision with other forms of supervision?The collection contains 12 million web pages, 138 million passages from those web pages, search queries, and relevance judgments for the queries.

  9. f

    Data_Sheet_1_Deep Learning in Alzheimer's Disease: Diagnostic Classification...

    • frontiersin.figshare.com
    pdf
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Taeho Jo; Kwangsik Nho; Andrew J. Saykin (2023). Data_Sheet_1_Deep Learning in Alzheimer's Disease: Diagnostic Classification and Prognostic Prediction Using Neuroimaging Data.pdf [Dataset]. http://doi.org/10.3389/fnagi.2019.00220.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers
    Authors
    Taeho Jo; Kwangsik Nho; Andrew J. Saykin
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Deep learning, a state-of-the-art machine learning approach, has shown outstanding performance over traditional machine learning in identifying intricate structures in complex high-dimensional data, especially in the domain of computer vision. The application of deep learning to early detection and automated classification of Alzheimer's disease (AD) has recently gained considerable attention, as rapid progress in neuroimaging techniques has generated large-scale multimodal neuroimaging data. A systematic review of publications using deep learning approaches and neuroimaging data for diagnostic classification of AD was performed. A PubMed and Google Scholar search was used to identify deep learning papers on AD published between January 2013 and July 2018. These papers were reviewed, evaluated, and classified by algorithm and neuroimaging type, and the findings were summarized. Of 16 studies meeting full inclusion criteria, 4 used a combination of deep learning and traditional machine learning approaches, and 12 used only deep learning approaches. The combination of traditional machine learning for classification and stacked auto-encoder (SAE) for feature selection produced accuracies of up to 98.8% for AD classification and 83.7% for prediction of conversion from mild cognitive impairment (MCI), a prodromal stage of AD, to AD. Deep learning approaches, such as convolutional neural network (CNN) or recurrent neural network (RNN), that use neuroimaging data without pre-processing for feature selection have yielded accuracies of up to 96.0% for AD classification and 84.2% for MCI conversion prediction. The best classification performance was obtained when multimodal neuroimaging and fluid biomarkers were combined. Deep learning approaches continue to improve in performance and appear to hold promise for diagnostic classification of AD using multimodal neuroimaging data. AD research that uses deep learning is still evolving, improving performance by incorporating additional hybrid data types, such as—omics data, increasing transparency with explainable approaches that add knowledge of specific disease-related features and mechanisms.

  10. d

    Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning...

    • datarade.ai
    .json, .csv
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Xverum, Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning (DL), NLP & LLM Training [Dataset]. https://datarade.ai/data-products/xverum-company-data-b2b-data-belgium-netherlands-denm-xverum
    Explore at:
    .json, .csvAvailable download formats
    Dataset provided by
    Xverum LLC
    Authors
    Xverum
    Area covered
    India, Dominican Republic, Norway, Western Sahara, Oman, United Kingdom, Barbados, Jordan, Sint Maarten (Dutch part), Cook Islands
    Description

    Xverum’s AI & ML Training Data provides one of the most extensive datasets available for AI and machine learning applications, featuring 800M B2B profiles with 100+ attributes. This dataset is designed to enable AI developers, data scientists, and businesses to train robust and accurate ML models. From natural language processing (NLP) to predictive analytics, our data empowers a wide range of industries and use cases with unparalleled scale, depth, and quality.

    What Makes Our Data Unique?

    Scale and Coverage: - A global dataset encompassing 800M B2B profiles from a wide array of industries and geographies. - Includes coverage across the Americas, Europe, Asia, and other key markets, ensuring worldwide representation.

    Rich Attributes for Training Models: - Over 100 fields of detailed information, including company details, job roles, geographic data, industry categories, past experiences, and behavioral insights. - Tailored for training models in NLP, recommendation systems, and predictive algorithms.

    Compliance and Quality: - Fully GDPR and CCPA compliant, providing secure and ethically sourced data. - Extensive data cleaning and validation processes ensure reliability and accuracy.

    Annotation-Ready: - Pre-structured and formatted datasets that are easily ingestible into AI workflows. - Ideal for supervised learning with tagging options such as entities, sentiment, or categories.

    How Is the Data Sourced? - Publicly available information gathered through advanced, GDPR-compliant web aggregation techniques. - Proprietary enrichment pipelines that validate, clean, and structure raw data into high-quality datasets. This approach ensures we deliver comprehensive, up-to-date, and actionable data for machine learning training.

    Primary Use Cases and Verticals

    Natural Language Processing (NLP): Train models for named entity recognition (NER), text classification, sentiment analysis, and conversational AI. Ideal for chatbots, language models, and content categorization.

    Predictive Analytics and Recommendation Systems: Enable personalized marketing campaigns by predicting buyer behavior. Build smarter recommendation engines for ecommerce and content platforms.

    B2B Lead Generation and Market Insights: Create models that identify high-value leads using enriched company and contact information. Develop AI systems that track trends and provide strategic insights for businesses.

    HR and Talent Acquisition AI: Optimize talent-matching algorithms using structured job descriptions and candidate profiles. Build AI-powered platforms for recruitment analytics.

    How This Product Fits Into Xverum’s Broader Data Offering Xverum is a leading provider of structured, high-quality web datasets. While we specialize in B2B profiles and company data, we also offer complementary datasets tailored for specific verticals, including ecommerce product data, job listings, and customer reviews. The AI Training Data is a natural extension of our core capabilities, bridging the gap between structured data and machine learning workflows. By providing annotation-ready datasets, real-time API access, and customization options, we ensure our clients can seamlessly integrate our data into their AI development processes.

    Why Choose Xverum? - Experience and Expertise: A trusted name in structured web data with a proven track record. - Flexibility: Datasets can be tailored for any AI/ML application. - Scalability: With 800M profiles and more being added, you’ll always have access to fresh, up-to-date data. - Compliance: We prioritize data ethics and security, ensuring all data adheres to GDPR and other legal frameworks.

    Ready to supercharge your AI and ML projects? Explore Xverum’s AI Training Data to unlock the potential of 800M global B2B profiles. Whether you’re building a chatbot, predictive algorithm, or next-gen AI application, our data is here to help.

    Contact us for sample datasets or to discuss your specific needs.

  11. Adversarial Machine Learning Dataset

    • kaggle.com
    Updated Jul 8, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Network security group CNR-IEIIT (2022). Adversarial Machine Learning Dataset [Dataset]. https://www.kaggle.com/datasets/cnrieiit/adversarial-machine-learning-dataset/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 8, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Network security group CNR-IEIIT
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Adversarial Machine Learning Dataset

    This repository contains the datasets adopted in the following paper, published in IEEE Access. If you use this repository in your research work, please consider citing our paper.

    I. Vaccari, A. Carlevaro, S. Narteni, E. Cambiaso and M. Mongelli, "eXplainable and Reliable Against Adversarial Machine Learning in Data Analytics," in IEEE Access, vol. 10, pp. 83949-83970, 2022, DOI: 10.1109/ACCESS.2022.3197299. URL: https://ieeexplore.ieee.org/document/9852204

    @ARTICLE{9852204,
     author={Vaccari, Ivan and Carlevaro, Alberto and Narteni, Sara and Cambiaso, Enrico and Mongelli, Maurizio},
     journal={IEEE Access}, 
     title={eXplainable and Reliable Against Adversarial Machine Learning in Data Analytics}, 
     year={2022},
     volume={10},
     number={},
     pages={83949-83970},
     doi={10.1109/ACCESS.2022.3197299}}
    

    Description

    We consider three different applications: * DNS tunneling, referring to network data captured during a DNS tunneling attack * Platooning, referring to simulated data from a vehicle platooning scenario * Remaining useful life (RUL), related to predictive maintenance of aircraft engines

    Each application scenario has been targeted through different Adversarial Machine Learning methods. Particularly, the following methods are considered: * Carlini-Wagner (CW): Carlini, N., & Wagner, D. (2017). Towards evaluating the robustness of neural networks. In 2017 ieee symposium on security and privacy (sp) (pp. 39-57). IEEE. * Fast Gradient Sign Method (FGSM): Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. * Jacobian Saliency Map (JSMA): Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., & Swami, A. (2016, March). The limitations of deep learning in adversarial settings. In 2016 IEEE European symposium on security and privacy (EuroS&P) (pp. 372-387). IEEE.

    Data structure

    We have a folder for each application scenario (DNS tunneling, platooning, RUL). For each application scenario, two folders are found: * legitimate, including original data * malicious, including data attacked by Adversarial Machine Learning methods; in this case, both training and test data are reported, together with a combined data

    The target variable for the adversarial attacks detection is called attack in all cases, whereas the target variables in the original problems (legitimate data) are the following: * g for the DNS tunneling application * collision for the platooning application * RUL_binary for the RUL application

    DNS tunneling data structure

    Concerning the DNS tunneling application, the following features are considered: * mDt: mean inter-packet time interval * mA: mean answer packet size * mQ: mean query packet size * vDt: variance of inter-packet time interval * vA: variance of answer packet size * vQ: variance of query packet size * sDt: skewness of inter-packet time interval * sA: skewness of answer packet size * sQ: skewness of query packet size * kDt: kurtosis of inter-packet time interval * kA: kurtosis of answer packet size * kQ: kurtosis of query packet size

    A portion of the legitimate data is reported in the following.

    mDt | mA | mQ | vDt | vA | vQ | sDt | sA | sQ | kDt | kA | kQ | g --- | -- | -- | --- | -- | -- | --- | -- | -- | --- | -- | -- | - 0.46915 | 239.8778 | 86.6326 | 81.8955 | 26109.835267 | 71.490417 | 63.185077 | 2.380883 | 0.552061 | 4284.358131 | 7.187715 | 5623.187219 | 0 0.584831 | 254.0284 | 87.3976 | 11.010015 | 29161.123993 | 922.955914 | 6.777654 | 1.980254 | 42.142166 | 46.147515 | 4.539171 | -2.993006 | 0 0.633453 | 269.3278 | 88.255 | 11.800109 | 38263.725547 | 62.161175 | 6.453627 | 2.051226 | 0.469637 | 41.80766 | 4.600109 | -1.385322 | 0 2.649329 | 258.529 | 88.3704 | 25991.830887 | 35950.372759 | 1297.225604 | 70.664632 | 2.064281 | 37.215584 | 4992.657556 | 4.664802 | 2005555.720075 | 0 ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...

    Platooning data structure

    Concerning the platooning application, the following features are considered: * N: number of vehicles in the platoon * F0: braking force applied by the leader * PER: packet error rate (probability of packet loss) * d0: initial distance between vehicles * v0: initial speed between vehicles

    A portion of the legitimate data is reported in the following.

    N | F0 | PER | d0 | v0 | collision --- | --...

  12. 2023 TREC Deep Learning Track Dataset

    • catalog.data.gov
    • data.nist.gov
    • +1more
    Updated Jul 9, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    National Institute of Standards and Technology (2025). 2023 TREC Deep Learning Track Dataset [Dataset]. https://catalog.data.gov/dataset/2023-trec-deep-learning-track-dataset
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    National Institute of Standards and Technologyhttp://www.nist.gov/
    Description

    The Deep Learning track focuses on IR tasks where a large training set is available, allowing us to compare a variety of retrieval approaches including deep neural networks and strong non-neural approaches, to see what works best in a large-data regime.

  13. f

    Datasets

    • figshare.com
    zip
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bastian Eichenberger; YinXiu Zhan (2023). Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.12958037.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Bastian Eichenberger; YinXiu Zhan
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The benchmarking datasets used for deepBlink. The npz files contain train/valid/test splits inside and can be used directly. The files belong to the following challenges / classes:- ISBI Particle tracking challenge: microtubule, vesicle, receptor- Custom synthetic (based on http://smal.ws): particle- Custom fixed cell: smfish- Custom live cell: suntagThe csv files are to determine which image in the test splits correspond to which original image, SNR, and density.

  14. Machine Learning model data

    • ecmwf.int
    Updated Jan 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    European Centre for Medium-Range Weather Forecasts (2023). Machine Learning model data [Dataset]. https://www.ecmwf.int/en/forecasts/dataset/machine-learning-model-data
    Explore at:
    Dataset updated
    Jan 1, 2023
    Dataset authored and provided by
    European Centre for Medium-Range Weather Forecastshttp://ecmwf.int/
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    three of these models are available:

  15. i

    A Dataset with Adversarial Attacks on Deep Learning in Wireless Modulation...

    • ieee-dataport.org
    Updated Sep 23, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Antonios Argyriou (2023). A Dataset with Adversarial Attacks on Deep Learning in Wireless Modulation Classification [Dataset]. https://ieee-dataport.org/documents/dataset-adversarial-attacks-deep-learning-wireless-modulation-classification
    Explore at:
    Dataset updated
    Sep 23, 2023
    Authors
    Antonios Argyriou
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains adversarial attacks on Deep Learning (DL) when it is employed for the classification of wireless modulated communication signals. The attack is executed with an obfuscating waveform that is embedded in the transmitted signal in such a way that prevents the extraction of clean data for training from a wireless eavesdropper. At the same time it allows a legitimate receiver (LRx) to demodulate the data.

  16. d

    Process-guided deep learning water temperature predictions: 6 Model...

    • catalog.data.gov
    • data.usgs.gov
    • +5more
    Updated Jun 15, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Climate Adaptation Science Centers (2024). Process-guided deep learning water temperature predictions: 6 Model evaluation (test data and RMSE) [Dataset]. https://catalog.data.gov/dataset/process-guided-deep-learning-water-temperature-predictions-6-model-evaluation-test-data-an
    Explore at:
    Dataset updated
    Jun 15, 2024
    Dataset provided by
    Climate Adaptation Science Centers
    Description

    This dataset includes evaluation data ("test" data) and performance metrics for water temperature predictions from multiple modeling frameworks. Process-Based (PB) models were configured and calibrated with training data to reduce root-mean squared error. Uncalibrated models used default configurations (PB0; see Winslow et al. 2016 for details) and no parameters were adjusted according to model fit with observations. Deep Learning (DL) models were Long Short-Term Memory artificial recurrent neural network models which used training data to adjust model structure and weights for temperature predictions (Jia et al. 2019). Process-Guided Deep Learning (PGDL) models were DL models with an added physical constraint for energy conservation as a loss term. These models were pre-trained with uncalibrated Process-Based model outputs (PB0) before training on actual temperature observations. Performance was measured as root-mean squared errors relative to temperature observations during the test period. Test data include compiled water temperature data from a variety of sources, including the Water Quality Portal (Read et al. 2017), the North Temperate Lakes Long-TERM Ecological Research Program (https://lter.limnology.wisc.edu/), the Minnesota department of Natural Resources, and the Global Lake Ecological Observatory Network (gleon.org). This dataset is part of a larger data release of lake temperature model inputs and outputs for 68 lakes in the U.S. states of Minnesota and Wisconsin (http://dx.doi.org/10.5066/P9AQPIVD).

  17. n

    Data from: Exploring deep learning techniques for wild animal behaviour...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Feb 22, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa (2024). Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers [Dataset]. http://doi.org/10.5061/dryad.2ngf1vhwk
    Explore at:
    zipAvailable download formats
    Dataset updated
    Feb 22, 2024
    Dataset provided by
    Nagoya University
    Osaka University
    Authors
    Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa
    License

    https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

    Description

    Machine learning‐based behaviour classification using acceleration data is a powerful tool in bio‐logging research. Deep learning architectures such as convolutional neural networks (CNN), long short‐term memory (LSTM) and self‐attention mechanisms as well as related training techniques have been extensively studied in human activity recognition. However, they have rarely been used in wild animal studies. The main challenges of acceleration‐based wild animal behaviour classification include data shortages, class imbalance problems, various types of noise in data due to differences in individual behaviour and where the loggers were attached and complexity in data due to complex animal‐specific behaviours, which may have limited the application of deep learning techniques in this area. To overcome these challenges, we explored the effectiveness of techniques for efficient model training: data augmentation, manifold mixup and pre‐training of deep learning models with unlabelled data, using datasets from two species of wild seabirds and state‐of‐the‐art deep learning model architectures. Data augmentation improved the overall model performance when one of the various techniques (none, scaling, jittering, permutation, time‐warping and rotation) was randomly applied to each data during mini‐batch training. Manifold mixup also improved model performance, but not as much as random data augmentation. Pre‐training with unlabelled data did not improve model performance. The state‐of‐the‐art deep learning models, including a model consisting of four CNN layers, an LSTM layer and a multi‐head attention layer, as well as its modified version with shortcut connection, showed better performance among other comparative models. Using only raw acceleration data as inputs, these models outperformed classic machine learning approaches that used 119 handcrafted features. Our experiments showed that deep learning techniques are promising for acceleration‐based behaviour classification of wild animals and highlighted some challenges (e.g. effective use of unlabelled data). There is scope for greater exploration of deep learning techniques in wild animal studies (e.g. advanced data augmentation, multimodal sensor data use, transfer learning and self‐supervised learning). We hope that this study will stimulate the development of deep learning techniques for wild animal behaviour classification using time‐series sensor data.

    This abstract is cited from the original article "Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers" in Methods in Ecology and Evolution (Otsuka et al., 2024).Please see README for the details of the datasets.

  18. i

    Dataset for "Integrating Deep Learning Approaches for Identifying News...

    • ieee-dataport.org
    Updated Jun 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fangfang Wang (2025). Dataset for "Integrating Deep Learning Approaches for Identifying News Reprint Relation" [Dataset]. https://ieee-dataport.org/documents/dataset-integrating-deep-learning-approaches-identifying-news-reprint-relation
    Explore at:
    Dataset updated
    Jun 17, 2025
    Authors
    Fangfang Wang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    of original news:30;# of candidate news:25899;# of reprinted news (no source label):4234 (537)

  19. R

    Transfer Learning In Deep Learning Dataset

    • universe.roboflow.com
    zip
    Updated Apr 18, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Suffolk University (2023). Transfer Learning In Deep Learning Dataset [Dataset]. https://universe.roboflow.com/suffolk-university-lfv35/transfer-learning-in-deep-learning
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 18, 2023
    Dataset authored and provided by
    Suffolk University
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Flowers
    Description

    Transfer Learning In Deep Learning

    ## Overview
    
    Transfer Learning In Deep Learning is a dataset for classification tasks - it contains Flowers annotations for 6,921 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
    
  20. m

    A dataset for machine learning research in the field of stress analyses of...

    • data.mendeley.com
    • narcis.nl
    Updated Jul 25, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jaroslav Matej (2020). A dataset for machine learning research in the field of stress analyses of mechanical structures [Dataset]. http://doi.org/10.17632/wzbzznk8z3.2
    Explore at:
    Dataset updated
    Jul 25, 2020
    Authors
    Jaroslav Matej
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is prepared and intended as a data source for development of a stress analysis method based on machine learning. It consists of finite element stress analyses of randomly generated mechanical structures. The dataset contains more than 270,794 pairs of stress analyses images (von Mises stress) of randomly generated 2D structures with predefined thickness and material properties. All the structures are fixed at their bottom edges and loaded with gravity force only. See PREVIEW directory with some examples. The zip file contains all the files in the dataset.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Department of Energy (2024). BUTTER - Empirical Deep Learning Dataset [Dataset]. https://datasets.ai/datasets/butter-empirical-deep-learning-dataset

BUTTER - Empirical Deep Learning Dataset

Explore at:
6 scholarly articles cite this dataset (View in Google Scholar)
28, 21Available download formats
Dataset updated
Sep 11, 2024
Dataset authored and provided by
Department of Energy
Description

The BUTTER Empirical Deep Learning Dataset represents an empirical study of the deep learning phenomena on dense fully connected networks, scanning across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels of L1 and L2 regularization each. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were preformed, and statistics including training and test loss (using a 80% / 20% shuffled train-test split) are recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions of each experiments), and a total of 13.3 billion training epochs (three thousand epochs were covered by most runs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.

Search
Clear search
Close search
Google apps
Main menu