100+ datasets found
  1. Data from: Exploring deep learning techniques for wild animal behaviour...

    • data.niaid.nih.gov
    • search.dataone.org
    • +2 more
    zip
    Updated Feb 22, 2024
    Cite
    Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa (2024). Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers [Dataset]. http://doi.org/10.5061/dryad.2ngf1vhwk
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 22, 2024
    Dataset provided by
    Osaka University
    Nagoya University
    Authors
    Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Machine learning‐based behaviour classification using acceleration data is a powerful tool in bio‐logging research. Deep learning architectures such as convolutional neural networks (CNN), long short‐term memory (LSTM) and self‐attention mechanisms as well as related training techniques have been extensively studied in human activity recognition. However, they have rarely been used in wild animal studies. The main challenges of acceleration‐based wild animal behaviour classification include data shortages, class imbalance problems, various types of noise in data due to differences in individual behaviour and where the loggers were attached and complexity in data due to complex animal‐specific behaviours, which may have limited the application of deep learning techniques in this area. To overcome these challenges, we explored the effectiveness of techniques for efficient model training: data augmentation, manifold mixup and pre‐training of deep learning models with unlabelled data, using datasets from two species of wild seabirds and state‐of‐the‐art deep learning model architectures. Data augmentation improved the overall model performance when one of the various techniques (none, scaling, jittering, permutation, time‐warping and rotation) was randomly applied to each data during mini‐batch training. Manifold mixup also improved model performance, but not as much as random data augmentation. Pre‐training with unlabelled data did not improve model performance. The state‐of‐the‐art deep learning models, including a model consisting of four CNN layers, an LSTM layer and a multi‐head attention layer, as well as its modified version with shortcut connection, showed better performance among other comparative models. Using only raw acceleration data as inputs, these models outperformed classic machine learning approaches that used 119 handcrafted features. Our experiments showed that deep learning techniques are promising for acceleration‐based behaviour classification of wild animals and highlighted some challenges (e.g. effective use of unlabelled data). There is scope for greater exploration of deep learning techniques in wild animal studies (e.g. advanced data augmentation, multimodal sensor data use, transfer learning and self‐supervised learning). We hope that this study will stimulate the development of deep learning techniques for wild animal behaviour classification using time‐series sensor data.

    This abstract is cited from the original article "Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers" in Methods in Ecology and Evolution (Otsuka et al., 2024). Please see the README for details of the datasets.
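    As a rough illustration of the random per-window augmentation strategy described in the abstract (one of none, scaling, jittering, permutation, time-warping or rotation chosen for each window during mini-batch training), a minimal NumPy sketch follows; the function names, parameter ranges, and the omission of time-warping are assumptions for demonstration, not the authors' implementation.

    import random
    import numpy as np
    from scipy.spatial.transform import Rotation  # only needed for the rotation augmentation

    def scale(x, sigma=0.1):
        # Multiply each acceleration axis by a random factor centred on 1.0
        return x * np.random.normal(1.0, sigma, size=(1, x.shape[1]))

    def jitter(x, sigma=0.05):
        # Add Gaussian noise to every sample
        return x + np.random.normal(0.0, sigma, size=x.shape)

    def permute(x, n_segments=4):
        # Split the window into segments and shuffle their order
        segments = list(np.array_split(x, n_segments))
        random.shuffle(segments)
        return np.concatenate(segments)

    def rotate(x):
        # Apply a random 3-D rotation to the (x, y, z) acceleration axes
        return x @ Rotation.random().as_matrix().T

    # "None" keeps the window unchanged; time-warping is omitted for brevity
    AUGMENTATIONS = [None, scale, jitter, permute, rotate]

    def augment_batch(batch):
        # batch: array of shape (n_windows, window_length, 3) with raw acceleration windows
        out = []
        for window in batch:
            aug = random.choice(AUGMENTATIONS)
            out.append(window if aug is None else aug(window))
        return np.stack(out)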

  2. Data from: Data augmentation for disruption prediction via robust surrogate...

    • dataverse.harvard.edu
    • osti.gov
    Updated Aug 31, 2024
    Cite
    Katharina Rath, David Rügamer, Bernd Bischl, Udo von Toussaint, Cristina Rea, Andrew Maris, Robert Granetz, Christopher G. Albert (2024). Data augmentation for disruption prediction via robust surrogate models [Dataset]. http://doi.org/10.7910/DVN/FMJCAD
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 31, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Katharina Rath, David Rügamer, Bernd Bischl, Udo von Toussaint, Cristina Rea, Andrew Maris, Robert Granetz, Christopher G. Albert
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The goal of this work is to generate large statistically representative datasets to train machine learning models for disruption prediction, given data from only a few existing discharges. Such a comprehensive training database is important to achieve satisfying and reliable prediction results in artificial neural network classifiers. Here, we aim for a robust augmentation of the training database for multivariate time series data using Student-t process regression. We apply Student-t process regression in a state space formulation via Bayesian filtering to tackle challenges imposed by outliers and noise in the training data set and to reduce the computational complexity. Thus, the method can also be used if the time resolution is high. We use an uncorrelated model for each dimension and impose correlations afterwards via coloring transformations. We demonstrate the efficacy of our approach on plasma diagnostics data of three different disruption classes from the DIII-D tokamak. To evaluate if the distribution of the generated data is similar to the training data, we additionally perform statistical analyses using methods from time series analysis, descriptive statistics, and classic machine learning clustering algorithms.
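    The "coloring" step mentioned in the description (fit an uncorrelated model per signal dimension, then impose cross-channel correlations afterwards) can be sketched with NumPy as below; the target correlation matrix and the use of a Cholesky factor are generic illustrative choices, not the paper's exact procedure.

    import numpy as np

    def color_samples(independent_samples, corr):
        # independent_samples: (n_samples, n_timesteps, n_channels), channels uncorrelated
        # corr: (n_channels, n_channels) desired correlation matrix
        L = np.linalg.cholesky(corr)          # corr = L @ L.T
        # Mix channels at every time step: y_t = L @ x_t
        return independent_samples @ L.T

    # Toy usage: three uncorrelated synthetic diagnostics, correlated afterwards
    rng = np.random.default_rng(0)
    x = rng.standard_normal((100, 500, 3))    # stand-in for per-channel surrogate signals
    target_corr = np.array([[1.0, 0.6, 0.3],
                            [0.6, 1.0, 0.5],
                            [0.3, 0.5, 1.0]])
    y = color_samples(x, target_corr)
    print(np.corrcoef(y.reshape(-1, 3), rowvar=False).round(2))  # close to target_corr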

  3. Data from: Augmentation of Semantic Processes for Deep Learning Applications...

    • tandf.figshare.com
    txt
    Updated Jun 2, 2025
    Cite
    Maximilian Hoffmann; Lukas Malburg; Ralph Bergmann (2025). Augmentation of Semantic Processes for Deep Learning Applications [Dataset]. http://doi.org/10.6084/m9.figshare.29212617.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 2, 2025
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Maximilian Hoffmann; Lukas Malburg; Ralph Bergmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The popularity of Deep Learning (DL) methods used in business process management research and practice is constantly increasing. One important factor that hinders the adoption of DL in certain areas is the availability of sufficiently large training datasets, particularly affecting domains where process models are mainly defined manually with a high knowledge-acquisition effort. In this paper, we examine process model augmentation in combination with semi-supervised transfer learning to enlarge existing datasets and train DL models effectively. The use case of similarity learning between manufacturing process models is discussed. Based on a literature study of existing augmentation techniques, a concept is presented with different categories of augmentation from knowledge-light approaches to knowledge-intensive ones, e.g. based on automated planning. Specifically, the impacts of augmentation approaches on the syntactic and semantic correctness of the augmented process models are considered. The concept also proposes a semi-supervised transfer learning approach to integrate augmented and non-augmented process model datasets in a two-phased training procedure. The experimental evaluation investigates augmented process model datasets regarding their quality for model training in the context of similarity learning between manufacturing process models. The results indicate a large potential with a reduction of the prediction error of up to 53%.
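    One plausible reading of the two-phased training procedure mentioned above (first train on the augmented process-model dataset, then fine-tune on the original, non-augmented one) is sketched below in PyTorch; the model, data loaders, loss, epoch counts and learning rates are all placeholders, not the authors' setup.

    import torch

    def two_phase_training(model, augmented_loader, original_loader,
                           loss_fn, epochs_pretrain=20, epochs_finetune=10):
        # Phase 1: pre-train the similarity model on the (larger) augmented dataset
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(epochs_pretrain):
            for pair_a, pair_b, target_sim in augmented_loader:
                opt.zero_grad()
                loss = loss_fn(model(pair_a, pair_b), target_sim)
                loss.backward()
                opt.step()

        # Phase 2: fine-tune on the smaller, non-augmented dataset at a lower learning rate
        opt = torch.optim.Adam(model.parameters(), lr=1e-4)
        for _ in range(epochs_finetune):
            for pair_a, pair_b, target_sim in original_loader:
                opt.zero_grad()
                loss = loss_fn(model(pair_a, pair_b), target_sim)
                loss.backward()
                opt.step()
        return model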

  4. Train Augmentation Dataset

    • universe.roboflow.com
    zip
    Updated Sep 20, 2025
    + more versions
    Cite
    nourelnouda (2025). Train Augmentation Dataset [Dataset]. https://universe.roboflow.com/nourelnouda/train-augmentation-lotij
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 20, 2025
    Dataset authored and provided by
    nourelnouda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Train1 Bounding Boxes
    Description

    Train Augmentation

    ## Overview
    
    Train Augmentation is a dataset for object detection tasks - it contains Train1 annotations for 1,127 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
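    For reference, datasets hosted on Roboflow Universe can typically be pulled with the roboflow Python package, roughly as below; the API key, version number and export format here are placeholders.

    # pip install roboflow
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")                       # placeholder key
    project = rf.workspace("nourelnouda").project("train-augmentation-lotij")
    dataset = project.version(1).download("coco")               # version/format are assumptions
    print(dataset.location)                                     # local folder with images + annotations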
    
  5. Data Augmentation Tools Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 23, 2025
    Cite
    Growth Market Reports (2025). Data Augmentation Tools Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-augmentation-tools-market
    Explore at:
    Available download formats: csv, pdf, pptx
    Dataset updated
    Aug 23, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Augmentation Tools Market Outlook



    As per our latest research, the global Data Augmentation Tools market size reached USD 1.47 billion in 2024, reflecting the rapidly increasing adoption of artificial intelligence and machine learning across diverse sectors. The market is experiencing robust momentum, registering a CAGR of 25.3% from 2025 to 2033. By the end of 2033, the Data Augmentation Tools market is forecasted to reach a substantial value of USD 11.6 billion. This impressive growth is primarily driven by the escalating need for high-quality, diverse datasets to train advanced AI models, coupled with the proliferation of digital transformation initiatives across industries.




    The primary growth factor fueling the Data Augmentation Tools market is the exponential rise in AI and machine learning applications, which require vast amounts of labeled data for effective training. As organizations strive to develop more accurate and robust models, the demand for data augmentation solutions that can synthetically expand and diversify datasets has surged. This trend is particularly pronounced in sectors such as healthcare, automotive, and retail, where the quality and quantity of data directly impact the performance and reliability of AI systems. The market is further propelled by the increasing complexity of data types, including images, text, audio, and video, necessitating sophisticated augmentation tools capable of handling multimodal data.




    Another significant driver is the growing focus on reducing model bias and improving generalization capabilities. Data augmentation tools enable organizations to generate synthetic samples that account for various real-world scenarios, thereby minimizing overfitting and enhancing the robustness of AI models. This capability is critical in regulated industries like BFSI and healthcare, where the consequences of biased or inaccurate models can be severe. Furthermore, the rise of edge computing and IoT devices has expanded the scope of data augmentation, as organizations seek to deploy AI solutions in resource-constrained environments that require optimized and diverse training datasets.




    The proliferation of cloud-based solutions has also played a pivotal role in shaping the trajectory of the Data Augmentation Tools market. Cloud deployment offers scalability, flexibility, and cost-effectiveness, allowing organizations of all sizes to access advanced augmentation capabilities without significant infrastructure investments. Additionally, the integration of data augmentation tools with popular machine learning frameworks and platforms has streamlined adoption, enabling seamless workflow integration and accelerating time-to-market for AI-driven products and services. These factors collectively contribute to the sustained growth and dynamism of the global Data Augmentation Tools market.




    From a regional perspective, North America currently dominates the Data Augmentation Tools market, accounting for the largest revenue share in 2024, followed closely by Europe and Asia Pacific. The strong presence of leading technology companies, robust investment in AI research, and early adoption of digital transformation initiatives have established North America as a key hub for data augmentation innovation. Meanwhile, Asia Pacific is poised for the fastest growth over the forecast period, driven by the rapid expansion of the IT and telecommunications sector, burgeoning e-commerce industry, and increasing government initiatives to promote AI adoption. Europe also maintains a significant market presence, supported by stringent data privacy regulations and a strong focus on ethical AI development.





    Component Analysis



    The Component segment of the Data Augmentation Tools market is bifurcated into Software and Services, each playing a critical role in enabling organizations to leverage data augmentation for AI and machine learning initiatives. The software sub-segment comprises …

  6. Data Augmentation Tools Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Dataintelo (2025). Data Augmentation Tools Market Research Report 2033 [Dataset]. https://dataintelo.com/report/data-augmentation-tools-market
    Explore at:
    Available download formats: pdf, pptx, csv
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Augmentation Tools Market Outlook



    According to our latest research, the global Data Augmentation Tools market size reached USD 1.62 billion in 2024, with a robust year-on-year growth trajectory. The market is poised for accelerated expansion, projected to achieve a CAGR of 26.4% from 2025 to 2033. By the end of 2033, the market is forecasted to reach approximately USD 12.34 billion. This dynamic growth is primarily driven by the rising demand for artificial intelligence (AI) and machine learning (ML) applications across diverse industry verticals, which necessitate vast quantities of high-quality training data. The proliferation of data-centric AI models and the increasing complexity of real-world datasets are compelling enterprises to invest in advanced data augmentation tools to enhance data diversity and model robustness, as per the latest research insights.




    One of the principal growth factors fueling the Data Augmentation Tools market is the intensifying adoption of AI-driven solutions across industries such as healthcare, automotive, retail, and finance. Organizations are increasingly leveraging data augmentation to overcome the challenges posed by limited or imbalanced datasets, which are often a bottleneck in developing accurate and reliable AI models. By synthetically expanding training datasets through augmentation techniques, enterprises can significantly improve the generalization capabilities of their models, leading to enhanced performance and reduced risk of overfitting. Furthermore, the surge in computer vision, natural language processing, and speech recognition applications is creating a fertile environment for the adoption of specialized augmentation tools tailored to image, text, and audio data.




    Another significant factor contributing to market growth is the rapid evolution of augmentation technologies themselves. Innovations such as Generative Adversarial Networks (GANs), automated data labeling, and domain-specific augmentation pipelines are making it easier for organizations to deploy and scale data augmentation strategies. These advancements are not only reducing the manual effort and expertise required but also enabling the generation of highly realistic synthetic data that closely mimics real-world scenarios. As a result, businesses across sectors are able to accelerate their AI/ML development cycles, reduce costs associated with data collection and labeling, and maintain compliance with stringent data privacy regulations by minimizing the need to use sensitive real-world data.




    The growing integration of data augmentation tools within cloud-based AI development platforms is also acting as a major catalyst for market expansion. Cloud deployment offers unparalleled scalability, accessibility, and collaboration capabilities, allowing organizations of all sizes to harness the power of data augmentation without significant upfront infrastructure investments. This democratization of advanced data engineering tools is especially beneficial for small and medium enterprises (SMEs) and academic research institutes, which often face resource constraints. The proliferation of cloud-native augmentation solutions is further supported by strategic partnerships between technology vendors and cloud service providers, driving broader market penetration and innovation.




    From a regional perspective, North America continues to dominate the Data Augmentation Tools market, driven by the presence of leading AI technology companies, a mature digital infrastructure, and substantial investments in research and development. However, the Asia Pacific region is emerging as the fastest-growing market, fueled by rapid digital transformation initiatives, a burgeoning startup ecosystem, and increasing government support for AI innovation. Europe also holds a significant share, underpinned by strong regulatory frameworks and a focus on ethical AI development. Meanwhile, Latin America and the Middle East & Africa are witnessing steady adoption, particularly in sectors such as BFSI and healthcare, where data-driven insights are becoming increasingly critical.



    Component Analysis



    The Data Augmentation Tools market by component is bifurcated into Software and Services. The software segment currently accounts for the largest share of the market, owing to the widespread deployment of standalone and integrated augmentation solutions across enterprises and research institutions. These software platforms …

  7. Result of 10-Fold cross-validation on augmented dataset.

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Sidratul Montaha; Sami Azam; A. K. M. Rakibul Haque Rafid; Sayma Islam; Pronab Ghosh; Mirjam Jonkman (2023). Result of 10-Fold cross-validation on augmented dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0269826.t018
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Sidratul Montaha; Sami Azam; A. K. M. Rakibul Haque Rafid; Sayma Islam; Pronab Ghosh; Mirjam Jonkman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Result of 10-Fold cross-validation on augmented dataset.

  8. 400k Augmented MNIST: Extended Handwritten Digits

    • kaggle.com
    zip
    Updated Mar 26, 2025
    Cite
    Alexandre Le Mercier (2025). 400k Augmented MNIST: Extended Handwritten Digits [Dataset]. https://www.kaggle.com/datasets/alexandrelemercier/400k-augmented-mnist-extended-handwritten-digits
    Explore at:
    Available download formats: zip (359213486 bytes)
    Dataset updated
    Mar 26, 2025
    Authors
    Alexandre Le Mercier
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Overview

    The 400k Augmented MNIST dataset is an extended version of the classic MNIST handwritten digits dataset. By applying a variety of augmentation techniques, I have increased the number of training images to 400,000 - roughly 40,000 per digit label. This large and diverse training set is designed to significantly improve the robustness and generalization of models trained on it, making them less susceptible to overfitting and more resilient against adversarial perturbations.

    Dataset Structure

    The dataset is organized into two main directories:

    • Augmented MNIST Training Set (400k):
      This directory contains 10 subdirectories, one for each digit label ("Label 0" through "Label 9"). Each subdirectory holds the corresponding JPEG images generated via augmentation. These images have been produced using techniques such as random rotation, shear, translation, scaling, reflection, spatial padding, Ben Graham transformation, Gaussian noise, salt-and-pepper noise, and random text overlay.
    • MNIST Validation Set (4k):
      This directory also contains subdirectories "Label 0" to "Label 9". However, the validation set consists solely of the original MNIST images (approximately 400 per label) that were not used for augmentation. This allows you to evaluate model performance on natural, unaltered digit images, providing a clear benchmark for generalization.

    How to Use This Dataset

    1. Training:
      Use the augmented training set to train your deep learning models. The 400k images offer a wide variety of conditions, helping your model learn robust features that generalize well.
    2. Validation:
      Evaluate your models on the validation set, which contains only the original MNIST images. This will help you measure performance on “natural” digits, ensuring that improvements in robustness do not come at the expense of real-world accuracy.
    3. Flexibility:
      You can also experiment with mixed training (combining augmented and original images) to study how different training strategies affect model robustness and accuracy.

    Augmentation Techniques Applied

    The following augmentation functions were used to generate the extended dataset:

    • Random Rotation: Randomly rotates images within a specified angle range.
    • Random Shear: Applies slight shearing transformations.
    • Random Translation: Shifts images horizontally and vertically.
    • Random Scale: Zooms in or out on the images.
    • Ben Graham Transform: Enhances image contrast and clarity using a weighted Gaussian blur.
    • Random Gaussian Noise: Adds Gaussian noise to simulate sensor or environmental disturbances.
    • Random Salt-and-Pepper Noise: Introduces random pixel-level corruption.

    A random number of transformations (between 1 and 6, in a random order) is applied to each image, with the goal of creating a diverse and challenging training set.
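    A simplified torchvision sketch of this procedure (pick between 1 and 6 transforms from a pool and apply them in random order to a PIL image) is shown below; the parameter ranges are illustrative, and several listed techniques (Ben Graham transform, padding, noise, text overlay) are stubbed out or omitted.

    import random
    import torchvision.transforms as T

    # A pool of candidate transforms approximating the techniques listed above
    TRANSFORM_POOL = [
        T.RandomRotation(degrees=20),                         # random rotation
        T.RandomAffine(degrees=0, shear=10),                  # random shear
        T.RandomAffine(degrees=0, translate=(0.1, 0.1)),      # random translation
        T.RandomAffine(degrees=0, scale=(0.9, 1.1)),          # random scaling
        T.RandomHorizontalFlip(p=1.0),                        # reflection
        T.Lambda(lambda img: img),                            # stub for noise / Ben Graham steps
    ]

    def augment(img):
        # Apply 1-6 randomly chosen transforms in a random order to a PIL image
        k = random.randint(1, 6)
        chosen = random.sample(TRANSFORM_POOL, k)
        random.shuffle(chosen)
        return T.Compose(chosen)(img)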

    Citation

    If you use this dataset in your research, please cite it as follows:

    @misc{alexandre_le_mercier_2025,
      title={400k Augmented MNIST: Extended Handwritten Digits},
      url={https://www.kaggle.com/ds/6967763},
      DOI={10.34740/KAGGLE/DS/6967763},
      publisher={Kaggle},
      author={Alexandre Le Mercier},
      year={2025}
    }
    

    License

    This dataset is under the Apache 2.0 license.

    Contact

    For any questions or issues regarding this dataset, please send a message in the "Discussions" or "Suggestions" sections of the Kaggle dataset page.

    Good luck and happy coding! 🚀

  9. Packages Object Detection Dataset - augmented-v1

    • public.roboflow.com
    zip
    Updated Jan 14, 2021
    + more versions
    Cite
    Roboflow Community (2021). Packages Object Detection Dataset - augmented-v1 [Dataset]. https://public.roboflow.com/object-detection/packages-dataset/5
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 14, 2021
    Dataset provided by
    Roboflow (https://roboflow.com/)
    Authors
    Roboflow Community
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Bounding Boxes of packages
    Description

    About This Dataset

    The Roboflow Packages dataset is a collection of packages located at the doors of various apartments and homes. Packages are flat envelopes, small boxes, and large boxes. Some images contain multiple annotated packages.

    Usage

    This dataset may be used as a good starter dataset to track and identify when a package has been delivered to a home. Perhaps you want to know when a package arrives to claim it quickly or prevent package theft.

    If you plan to use this dataset and adapt it to your own front door, it is recommended that you capture and add images from the context of your specific camera position. You can easily add images to this dataset via the web UI or via the Roboflow Upload API.

    About Roboflow

    Roboflow enables teams to build better computer vision models faster. We provide tools for image collection, organization, labeling, preprocessing, augmentation, training and deployment. Developers reduce boilerplate code when using Roboflow's workflow, save training time, and increase model reproducibility.


  10. i

    Model Performance Results For Distribution-Driven Augmentation of Medical...

    • ieee-dataport.org
    Updated Jul 8, 2024
    Cite
    Stephen Price (2024). Model Performance Results For Distribution-Driven Augmentation of Medical Data [Dataset]. https://ieee-dataport.org/documents/model-performance-results-distribution-driven-augmentation-medical-data
    Explore at:
    Dataset updated
    Jul 8, 2024
    Authors
    Stephen Price
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data included herein are the model training results associated with the paper "Distribution-Driven Augmentation of Real-World Datasets for Improved Cancer Diagnostics With Machine Learning", which focuses on using kernel density estimators to curate datasets by balancing classes and filling missing (null) values with synthetically generated data. Additionally …
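    For context on the method these results refer to, a kernel density estimator can be used to oversample a minority class roughly as sketched below with scikit-learn; this is a generic illustration under assumed bandwidth and sample counts, not the paper's pipeline.

    import numpy as np
    from sklearn.neighbors import KernelDensity

    def kde_oversample(X_minority, n_new, bandwidth=0.5, seed=0):
        # Fit a Gaussian KDE to minority-class feature vectors and draw synthetic samples
        kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X_minority)
        return kde.sample(n_samples=n_new, random_state=seed)

    # Toy usage: balance a minority class of 40 samples up to 200
    rng = np.random.default_rng(0)
    X_min = rng.normal(loc=[2.0, -1.0], scale=0.3, size=(40, 2))
    X_synth = kde_oversample(X_min, n_new=160)
    X_balanced = np.vstack([X_min, X_synth])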

  11. Train-Augmentation-V2

    • huggingface.co
    Updated May 29, 2024
    + more versions
    Cite
    Ahmed Esmail (2024). Train-Augmentation-V2 [Dataset]. https://huggingface.co/datasets/ahmedesmail16/Train-Augmentation-V2
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 29, 2024
    Authors
    Ahmed Esmail
    Description

    The ahmedesmail16/Train-Augmentation-V2 dataset is hosted on Hugging Face and contributed by the HF Datasets community.
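    A Hugging Face dataset such as this one can usually be loaded with the datasets library as sketched below; the available splits and columns are not documented on the card above, so inspect the returned object.

    # pip install datasets
    from datasets import load_dataset

    ds = load_dataset("ahmedesmail16/Train-Augmentation-V2")   # downloads and caches the dataset
    print(ds)                                                  # inspect available splits and columns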

  12. Training dataset for "A deep learned nanowire segmentation model using...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jul 16, 2024
    Cite
    Lin, Binbin; Nima, Emami; David, A. Santos; Yuting, Luo; Sarbajit, Banerjee; Bai-Xiang, Xu (2024). Training dataset for "A deep learned nanowire segmentation model using synthetic data augmentation" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6469772
    Explore at:
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Department of Chemistry, Texas A&M University, College Station, TX 77843, USA
    Mechanics of Functional Materials Division, Institute of Materials Science, Technische Universität Darmstadt, 64287 Darmstadt, Germany
    Authors
    Lin, Binbin; Nima, Emami; David, A. Santos; Yuting, Luo; Sarbajit, Banerjee; Bai-Xiang, Xu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This image dataset contains synthetic structure images used for training the deep-learning based nanowire segmentation model presented in our work "A deep learned nanowire segmentation model using synthetic data augmentation", to be published in npj Computational Materials. Detailed information can be found in the corresponding article.

  13. Per-class recall values for results in Table 5.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Chinmayee Athalye; Rima Arnaout (2023). Per-class recall values for results in Table 5. [Dataset]. http://doi.org/10.1371/journal.pone.0282532.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Chinmayee Athalye; Rima Arnaout
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    While domain-specific data augmentation can be useful in training neural networks for medical imaging tasks, such techniques have not been widely used to date. Our objective was to test whether domain-specific data augmentation is useful for medical imaging using a well-benchmarked task: view classification on fetal ultrasound FETAL-125 and OB-125 datasets. We found that using a context-preserving cut-paste strategy, we could create valid training data as measured by performance of the resulting trained model on the benchmark test dataset. When used in an online fashion, models trained on this hybrid data performed similarly to those trained using traditional data augmentation (FETAL-125 F-score 85.33 ± 0.24 vs 86.89 ± 0.60, p-value 0.014; OB-125 F-score 74.60 ± 0.11 vs 72.43 ± 0.62, p-value 0.004). Furthermore, the ability to perform augmentations during training time, as well as the ability to apply chosen augmentations equally across data classes, are important considerations in designing a bespoke data augmentation. Finally, we provide open-source code to facilitate running bespoke data augmentations in an online fashion. Taken together, this work expands the ability to design and apply domain-guided data augmentations for medical imaging tasks.
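    As a rough, generic illustration of a cut-paste augmentation (not the authors' context-preserving variant, and with an arbitrary patch size), consider the following NumPy sketch:

    import numpy as np

    def cut_paste(src, dst, patch_h=40, patch_w=40, rng=None):
        # Copy a random patch from `src` and paste it at a random location in `dst`.
        # src, dst: 2-D grayscale images as NumPy arrays (e.g. ultrasound frames).
        rng = rng or np.random.default_rng()
        sy = rng.integers(0, src.shape[0] - patch_h)
        sx = rng.integers(0, src.shape[1] - patch_w)
        dy = rng.integers(0, dst.shape[0] - patch_h)
        dx = rng.integers(0, dst.shape[1] - patch_w)
        out = dst.copy()
        out[dy:dy + patch_h, dx:dx + patch_w] = src[sy:sy + patch_h, sx:sx + patch_w]
        return out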

  14. Image Processing & Multi-class Classification

    • kaggle.com
    zip
    Updated Oct 7, 2023
    Cite
    Md. Sagor Ahmed (2023). Image Processing & Multi-class Classification [Dataset]. https://www.kaggle.com/datasets/mdsagorahmed/image-processing-and-multi-class-classification
    Explore at:
    Available download formats: zip (2577462 bytes)
    Dataset updated
    Oct 7, 2023
    Authors
    Md. Sagor Ahmed
    Description

    This dataset is for image-processing practice and for training a simple multiclass classification model. There are 3 subfolders (classes). The images were collected manually from the internet.

  15. MNIST Data Augmentation Parameters - Grid Search

    • kaggle.com
    zip
    Updated Oct 16, 2024
    Cite
    David List (2024). MNIST Data Augmentation Parameters - Grid Search [Dataset]. https://www.kaggle.com/datasets/davidlist/data-augmentation-parameters-grid-search/code
    Explore at:
    Available download formats: zip (1989 bytes)
    Dataset updated
    Oct 16, 2024
    Authors
    David List
    Description

    This dataset is the result of performing a grid search on the parameters passed to Keras ImageDataGenerator to augment data for training a LeNet-5 inspired neural network. For each combination of parameters, the accuracy on the 20% validation set is reported.

    The following parameters and values were included in the grid search:

    • rotation_range: 0, 5, 10, 20, 30
    • zoom_range: 0, 0.1, 0.2
    • width_shift_range: 0, 0.1, 0.2
    • height_shift_range: 0, 0.1, 0.2
    • shear: 0, 5, 10
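    The grid itself can be reproduced roughly as below with TensorFlow's bundled Keras; the training and evaluation step is a hypothetical helper (not defined here), and note that ImageDataGenerator exposes the shear setting as shear_range.

    from itertools import product
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    grid = {
        "rotation_range":     [0, 5, 10, 20, 30],
        "zoom_range":         [0, 0.1, 0.2],
        "width_shift_range":  [0, 0.1, 0.2],
        "height_shift_range": [0, 0.1, 0.2],
        "shear_range":        [0, 5, 10],    # listed simply as "shear" above
    }

    results = []
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        datagen = ImageDataGenerator(**params)
        # Hypothetical helper: build a LeNet-5 style model, train on datagen.flow(x_train, y_train),
        # and return accuracy on the held-out 20% validation split.
        val_accuracy = train_and_evaluate(datagen)
        results.append({**params, "val_accuracy": val_accuracy})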

    For more information see the following notebook: https://www.kaggle.com/code/davidlist/mnist-augmented-data-parameter-selection

    This dataset is also related to the following dataset: https://www.kaggle.com/datasets/davidlist/data-augmentation-parameters-random-search

  16. Training No Augmentations Dataset

    • universe.roboflow.com
    zip
    Updated Dec 8, 2021
    Cite
    Drone Project (2021). Training No Augmentations Dataset [Dataset]. https://universe.roboflow.com/drone-project-d8pi1/training-no-augmentations
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 8, 2021
    Dataset authored and provided by
    Drone Project
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Human Bounding Boxes
    Description

    Training No Augmentations

    ## Overview
    
    Training No Augmentations is a dataset for object detection tasks - it contains Human annotations for 1,638 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  17. Car Highway Dataset

    • universe.roboflow.com
    zip
    Updated Sep 13, 2023
    Cite
    Sallar (2023). Car Highway Dataset [Dataset]. https://universe.roboflow.com/sallar/car-highway/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 13, 2023
    Dataset authored and provided by
    Sallar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Vehicles Bounding Boxes
    Description

    Car-Highway Data Annotation Project

    Introduction

    In this project, we aim to annotate car images captured on highways. The annotated data will be used to train machine learning models for various computer vision tasks, such as object detection and classification.

    Project Goals

    • Collect a diverse dataset of car images from highway scenes.
    • Annotate the dataset to identify and label cars within each image.
    • Organize and format the annotated data for machine learning model training.

    Tools and Technologies

    For this project, we will be using Roboflow, a powerful platform for data annotation and preprocessing. Roboflow simplifies the annotation process and provides tools for data augmentation and transformation.

    Annotation Process

    1. Upload the raw car images to the Roboflow platform.
    2. Use the annotation tools in Roboflow to draw bounding boxes around each car in the images.
    3. Label each bounding box with the corresponding class (e.g., car).
    4. Review and validate the annotations for accuracy.

    Data Augmentation

    Roboflow offers data augmentation capabilities, such as rotation, flipping, and resizing. These augmentations can help improve the model's robustness.

    Data Export

    Once the data is annotated and augmented, Roboflow allows us to export the dataset in various formats suitable for training machine learning models, such as YOLO, COCO, or TensorFlow Record.

    Milestones

    1. Data Collection and Preprocessing
    2. Annotation of Car Images
    3. Data Augmentation
    4. Data Export
    5. Model Training

    Conclusion

    By completing this project, we will have a well-annotated dataset ready for training machine learning models. This dataset can be used for a wide range of applications in computer vision, including car detection and tracking on highways.

  18. SVD-Generated Video Dataset

    • kaggle.com
    zip
    Updated May 11, 2025
    Cite
    Afnan Algharbi (2025). SVD-Generated Video Dataset [Dataset]. https://www.kaggle.com/datasets/afnanalgarby/svd-generated-video-dataset
    Explore at:
    Available download formats: zip (102546508 bytes)
    Dataset updated
    May 11, 2025
    Authors
    Afnan Algharbi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains synthetic video samples generated from a 10-class subset of Tiny ImageNet using Stable Video Diffusion (SVD). It is designed to evaluate the impact of generative temporal augmentation on image classification performance.

    Each training and validation video corresponds to a single image augmented into a sequence of frames.

    Videos are stored in .mp4 format and labeled via train.csv and val.csv.

    Sources:

    Tiny ImageNet: Stanford CS231n

    SVD model: Stable Video Diffusion

    License: Creative Commons Attribution 4.0 International (CC BY 4.0)

  19. ECG Augmented Dataset

    • kaggle.com
    zip
    Updated Oct 7, 2025
    Cite
    sidali Khelil cherfi (2025). ECG Augmented Dataset [Dataset]. https://www.kaggle.com/datasets/sidalikhelilcherfi/ecg-augmented
    Explore at:
    Available download formats: zip (5174909523 bytes)
    Dataset updated
    Oct 7, 2025
    Authors
    sidali Khelil cherfi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🩺 Dataset Description

    This dataset is an augmented version of an ECG image dataset created to balance and enrich the original classes for deep learning–based cardiovascular disease classification.

    The original dataset consisted of unbalanced image counts per class in the training set:

    • ABH: 233 images
    • MI: 239 images
    • HMI: 172 images
    • NORM: 284 images

    To improve class balance and model generalization, each class in the training set was expanded to 500 images using a combination of morphological, noise-based, and geometric data augmentation techniques. Additionally, the test set includes 112 images per class.

    ⚖️ Final Dataset Composition

    • Training set: 4 classes × 500 images each → 2,000 images total
    • Test set: 4 classes × 112 images each → 448 images total

    🔬 Data Augmentation Techniques

    1. Morphological Alterations
    • Erosion
    • Dilation
    • None (original preserved)

    2. Noise Introduction
    • augment_noise_black_rain — simulates black streaks
    • augment_noise_pixel_dropout_black — random black pixel dropout
    • augment_noise_white_rain — simulates white streaks
    • augment_noise_pixel_dropout_white — random white pixel dropout

    3. Geometric Transformations
    • Shift — small translations in all directions
    • Scale — random zoom-in/zoom-out between 0.9× and 1.1×
    • Rotate — small random rotation between -5° and +5°

    These transformations were applied with balanced proportions to ensure diversity and realism while preserving diagnostic features of ECG signals.
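    A compressed sketch of these three families of transformations, using OpenCV with the stated geometric ranges (small shifts, 0.9x-1.1x scale, rotations within ±5 degrees) and illustrative kernel and noise parameters, might look like this:

    import cv2
    import numpy as np

    rng = np.random.default_rng()

    def morphological(img):
        # Randomly erode, dilate, or leave the ECG image unchanged
        kernel = np.ones((2, 2), np.uint8)
        op = rng.choice(["erode", "dilate", "none"])
        if op == "erode":
            return cv2.erode(img, kernel, iterations=1)
        if op == "dilate":
            return cv2.dilate(img, kernel, iterations=1)
        return img

    def pixel_dropout(img, value=0, p=0.01):
        # Randomly overwrite a fraction of pixels with black (0) or white (255)
        mask = rng.random(img.shape[:2]) < p
        out = img.copy()
        out[mask] = value
        return out

    def geometric(img):
        # Small shift, 0.9x-1.1x scale, and rotation between -5 and +5 degrees
        h, w = img.shape[:2]
        angle = rng.uniform(-5, 5)
        scale = rng.uniform(0.9, 1.1)
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
        M[:, 2] += rng.uniform(-0.02, 0.02, size=2) * [w, h]   # small translation
        return cv2.warpAffine(img, M, (w, h), borderValue=255)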

    💡 Intended Use

    This dataset is designed for:

    • Training and evaluating deep learning models (CNNs, ViTs) for ECG image classification
    • Research in medical image augmentation, imbalanced data learning, and cardiovascular disease prediction

    📘 License

    This dataset is released under the CC0 1.0 License, allowing free use and distribution for research and educational purposes.

  20. Hard Hat Workers Object Detection Dataset - resize-416x416-reflectEdges

    • public.roboflow.com
    zip
    Updated Sep 30, 2022
    + more versions
    Cite
    Northeastern University - China (2022). Hard Hat Workers Object Detection Dataset - resize-416x416-reflectEdges [Dataset]. https://public.roboflow.com/object-detection/hard-hat-workers/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 30, 2022
    Dataset authored and provided by
    Northeastern University - China
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Bounding Boxes of Workers
    Description

    Overview

    The Hard Hat dataset is an object detection dataset of workers in workplace settings that require a hard hat. Annotations also include examples of just "person" and "head," for when an individual may be present without a hard hat.

    The original dataset has a 75/25 train-test split.

    Example image: https://i.imgur.com/7spoIJT.png

    Use Cases

    One could use this dataset to, for example, build a classifier of workers that are abiding safety code within a workplace versus those that may not be. It is also a good general dataset for practice.

    Using this Dataset

    Use the fork or Download this Dataset button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.

    Dataset Versions:

    Image Preprocessing | Image Augmentation | Modify Classes

    • v1 (resize-416x416-reflect): generated with the original 75/25 train-test split | No augmentations
    • v2 (raw_75-25_trainTestSplit): generated with the original 75/25 train-test split | These are the raw, original images
    • v3 (v3): generated with the original 75/25 train-test split | Modify Classes used to drop person class | Preprocessing and Augmentation applied
    • v5 (raw_HeadHelmetClasses): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop person class
    • v8 (raw_HelmetClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop head and person classes
    • v9 (raw_PersonClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop head and helmet classes
    • v10 (raw_AllClasses): generated with a 70/20/10 train/valid/test split | These are the raw, original images
    • v11 (augmented3x-AllClasses-FastModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied | 3x image generation | Trained with Roboflow's Fast Model
    • v12 (augmented3x-HeadHelmetClasses-FastModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied, Modify Classes used to drop person class | 3x image generation | Trained with Roboflow's Fast Model
    • v13 (augmented3x-HeadHelmetClasses-AccurateModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied, Modify Classes used to drop person class | 3x image generation | Trained with Roboflow's Accurate Model
    • v14 (raw_HeadClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop person class, and remap/relabel helmet class to head

    Choosing Between Computer Vision Model Sizes | Roboflow Train

    About Roboflow

    Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

    Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.

