100+ datasets found
  1. Data from: Exploring deep learning techniques for wild animal behaviour...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated Feb 22, 2024
    Cite
    Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa (2024). Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers [Dataset]. http://doi.org/10.5061/dryad.2ngf1vhwk
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 22, 2024
    Dataset provided by
    Nagoya University
    Osaka University
    Authors
    Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Machine learning‐based behaviour classification using acceleration data is a powerful tool in bio‐logging research. Deep learning architectures such as convolutional neural networks (CNN), long short‐term memory (LSTM) and self‐attention mechanisms as well as related training techniques have been extensively studied in human activity recognition. However, they have rarely been used in wild animal studies. The main challenges of acceleration‐based wild animal behaviour classification include data shortages, class imbalance problems, various types of noise in data due to differences in individual behaviour and where the loggers were attached and complexity in data due to complex animal‐specific behaviours, which may have limited the application of deep learning techniques in this area. To overcome these challenges, we explored the effectiveness of techniques for efficient model training: data augmentation, manifold mixup and pre‐training of deep learning models with unlabelled data, using datasets from two species of wild seabirds and state‐of‐the‐art deep learning model architectures. Data augmentation improved the overall model performance when one of the various techniques (none, scaling, jittering, permutation, time‐warping and rotation) was randomly applied to each data during mini‐batch training. Manifold mixup also improved model performance, but not as much as random data augmentation. Pre‐training with unlabelled data did not improve model performance. The state‐of‐the‐art deep learning models, including a model consisting of four CNN layers, an LSTM layer and a multi‐head attention layer, as well as its modified version with shortcut connection, showed better performance among other comparative models. Using only raw acceleration data as inputs, these models outperformed classic machine learning approaches that used 119 handcrafted features. 
Our experiments showed that deep learning techniques are promising for acceleration‐based behaviour classification of wild animals and highlighted some challenges (e.g. effective use of unlabelled data). There is scope for greater exploration of deep learning techniques in wild animal studies (e.g. advanced data augmentation, multimodal sensor data use, transfer learning and self‐supervised learning). We hope that this study will stimulate the development of deep learning techniques for wild animal behaviour classification using time‐series sensor data.
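
    The random per-sample augmentation scheme the abstract describes (pick one of none, scaling, jittering, permutation, or rotation for each window during mini-batch training) can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' code; time-warping is omitted for brevity, and all parameter values are assumptions.

```python
import random
import numpy as np

rng = np.random.default_rng(0)

def jitter(x, sigma=0.05):
    """Add Gaussian noise to every sample."""
    return x + rng.normal(0.0, sigma, x.shape)

def scaling(x, sigma=0.1):
    """Multiply each axis by a random factor."""
    return x * rng.normal(1.0, sigma, (1, x.shape[1]))

def permutation(x, n_segments=4):
    """Split the window along time and shuffle the segments."""
    segments = np.array_split(x, n_segments)
    random.shuffle(segments)
    return np.concatenate(segments)

def rotation(x):
    """Apply one random 3-D rotation to all (time, 3) acceleration vectors."""
    a, b, c = rng.uniform(0.0, 2.0 * np.pi, 3)
    rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
    return x @ (rz @ ry @ rx).T

AUGMENTATIONS = [None, jitter, scaling, permutation, rotation]

def random_augment(window):
    """Pick one augmentation (or none) at random for each training sample."""
    f = random.choice(AUGMENTATIONS)
    return window if f is None else f(window)

window = rng.standard_normal((100, 3))   # 100 timesteps of tri-axial acceleration
augmented = random_augment(window)
print(augmented.shape)                   # (100, 3)
```

    Every operation preserves the window shape, so augmented windows can be batched exactly like the originals.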

    This abstract is cited from the original article "Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers" in Methods in Ecology and Evolution (Otsuka et al., 2024). Please see the README for details of the datasets.

  2. Additional file 4 of Which data subset should be augmented for deep...

    • springernature.figshare.com
    xlsx
    Updated Jun 21, 2023
    + more versions
    Cite
    Yusra A. Ameen; Dalia M. Badary; Ahmad Elbadry I. Abonnoor; Khaled F. Hussain; Adel A. Sewisy (2023). Additional file 4 of Which data subset should be augmented for deep learning? a simulation study using urothelial cell carcinoma histopathology images [Dataset]. http://doi.org/10.6084/m9.figshare.22622732.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yusra A. Ameen; Dalia M. Badary; Ahmad Elbadry I. Abonnoor; Khaled F. Hussain; Adel A. Sewisy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 4. A Microsoft® Excel® workbook that details the raw data for the 8 experiments in which either the test set was augmented alone (after its allocation) or augmentation of the whole dataset was done before test-set allocation. All of the image-classification output probabilities are included.

  3. Variable Message Signal annotated images for object detection

    • zenodo.org
    zip
    Updated Oct 2, 2022
    Cite
    Gonzalo de las Heras de Matías; Gonzalo de las Heras de Matías; Javier Sánchez-Soriano; Javier Sánchez-Soriano; Enrique Puertas; Enrique Puertas (2022). Variable Message Signal annotated images for object detection [Dataset]. http://doi.org/10.5281/zenodo.5904211
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 2, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gonzalo de las Heras de Matías; Gonzalo de las Heras de Matías; Javier Sánchez-Soriano; Javier Sánchez-Soriano; Enrique Puertas; Enrique Puertas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    If you use this dataset, please cite this paper: Puertas, E.; De-Las-Heras, G.; Sánchez-Soriano, J.; Fernández-Andrés, J. Dataset: Variable Message Signal Annotated Images for Object Detection. Data 2022, 7, 41. https://doi.org/10.3390/data7040041

    This dataset consists of Spanish road images taken from inside a vehicle, together with annotations in XML files in PASCAL VOC format that indicate the location of Variable Message Signals (VMS) within them. A CSV file is also included with the geographic position, the folder where each image is located, and the sign text in Spanish. The dataset can be used to train supervised computer vision algorithms, such as convolutional neural networks. The accompanying work details the process followed to obtain the dataset (image acquisition and labeling) and its specifications. The dataset comprises 1216 instances (888 positive, 328 negative) in 1152 jpg images with a resolution of 1280x720 pixels, divided into 576 real images and 576 images created with data augmentation. The purpose of this dataset is to support road computer vision research, since no dataset specific to VMSs previously existed.

    The folder structure of the dataset is as follows:

    • vms_dataset/
      • data.csv
      • real_images/
        • imgs/
        • annotations/
      • data-augmentation/
        • imgs/
        • annotations/

    In which:

    • data.csv: Each row contains the following information separated by commas (,): image_name, x_min, y_min, x_max, y_max, class_name, lat, long, folder, text.
    • real_images: Images extracted directly from the videos.
    • data-augmentation: Images created using the data-augmentation technique.
    • imgs: Image files in .jpg format.
    • annotations: Annotation files in .xml format.
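
    The PASCAL VOC annotation files can be read with the Python standard library alone. A minimal sketch follows; the tag names are the standard PASCAL VOC ones, and the sample class name `vms` is an assumption for illustration:

```python
import xml.etree.ElementTree as ET

def parse_voc(xml_text):
    """Return (class_name, x_min, y_min, x_max, y_max) tuples from one annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(bb.findtext("xmin")), int(bb.findtext("ymin")),
            int(bb.findtext("xmax")), int(bb.findtext("ymax")),
        ))
    return boxes

sample = (
    "<annotation><object><name>vms</name>"
    "<bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>80</ymax></bndbox>"
    "</object></annotation>"
)
print(parse_voc(sample))  # [('vms', 10, 20, 110, 80)]
```
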
  4. Data from: Data augmentation for disruption prediction via robust surrogate...

    • dataverse.harvard.edu
    • osti.gov
    Updated Aug 31, 2024
    Cite
    Katharina Rath, David Rügamer, Bernd Bischl, Udo von Toussaint, Cristina Rea, Andrew Maris, Robert Granetz, Christopher G. Albert (2024). Data augmentation for disruption prediction via robust surrogate models [Dataset]. http://doi.org/10.7910/DVN/FMJCAD
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 31, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Katharina Rath, David Rügamer, Bernd Bischl, Udo von Toussaint, Cristina Rea, Andrew Maris, Robert Granetz, Christopher G. Albert
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The goal of this work is to generate large statistically representative datasets to train machine learning models for disruption prediction provided by data from few existing discharges. Such a comprehensive training database is important to achieve satisfying and reliable prediction results in artificial neural network classifiers. Here, we aim for a robust augmentation of the training database for multivariate time series data using Student-t process regression. We apply Student-t process regression in a state space formulation via Bayesian filtering to tackle challenges imposed by outliers and noise in the training data set and to reduce the computational complexity. Thus, the method can also be used if the time resolution is high. We use an uncorrelated model for each dimension and impose correlations afterwards via coloring transformations. We demonstrate the efficacy of our approach on plasma diagnostics data of three different disruption classes from the DIII-D tokamak. To evaluate if the distribution of the generated data is similar to the training data, we additionally perform statistical analyses using methods from time series analysis, descriptive statistics, and classic machine learning clustering algorithms.
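
    The "coloring transformation" step, which imposes cross-channel correlations on independently generated dimensions, can be sketched with a Cholesky factor. The correlation matrix below is a made-up example, not a value from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target cross-channel correlation matrix (made-up example).
C = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
L = np.linalg.cholesky(C)                 # C = L @ L.T

white = rng.standard_normal((10_000, 3))  # independent (uncorrelated) channels
colored = white @ L.T                     # coloring transformation

emp = np.corrcoef(colored, rowvar=False)  # empirical correlation, close to C
```

    The same trick works on samples drawn independently per dimension from a surrogate model: correlations are added after the fact rather than modeled jointly.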

  5. Detailed characterization of the dataset.

    • figshare.com
    xls
    Updated Sep 26, 2024
    + more versions
    Cite
    Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda (2024). Detailed characterization of the dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0310707.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Over the last ten years, social media has become a crucial data source for businesses and researchers, providing a space where people can express their opinions and emotions. To analyze this data and classify emotions and their polarity in texts, natural language processing (NLP) techniques such as emotion analysis (EA) and sentiment analysis (SA) are employed. However, the effectiveness of these tasks using machine learning (ML) and deep learning (DL) methods depends on large labeled datasets, which are scarce in languages like Spanish. To address this challenge, researchers use data augmentation (DA) techniques to artificially expand small datasets. This study aims to investigate whether DA techniques can improve classification results using ML and DL algorithms for sentiment and emotion analysis of Spanish texts. Various text manipulation techniques were applied, including transformations, paraphrasing (back-translation), and text generation using generative adversarial networks, to small datasets such as song lyrics, social media comments, headlines from national newspapers in Chile, and survey responses from higher education students. The findings show that the Convolutional Neural Network (CNN) classifier achieved the most significant improvement, with an 18% increase using the Generative Adversarial Networks for Sentiment Text (SentiGan) on the Aggressiveness (Seriousness) dataset. Additionally, the same classifier model showed an 11% improvement using the Easy Data Augmentation (EDA) on the Gender-Based Violence dataset. The performance of the Bidirectional Encoder Representations from Transformers (BETO) also improved by 10% on the back-translation augmented version of the October 18 dataset, and by 4% on the EDA augmented version of the Teaching survey dataset. These results suggest that data augmentation techniques enhance performance by transforming text and adapting it to the specific characteristics of the dataset. 
Through experimentation with various augmentation techniques, this research provides valuable insights into the analysis of subjectivity in Spanish texts and offers guidance for selecting algorithms and techniques based on dataset features.
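
    Two of the lightweight Easy Data Augmentation (EDA) operations mentioned above can be sketched with the standard library alone; full EDA additionally includes synonym replacement and insertion, which require a synonym lexicon. The sample sentence is illustrative:

```python
import random

random.seed(0)

def random_swap(tokens, n=1):
    """Swap two randomly chosen token positions, n times."""
    toks = list(tokens)
    for _ in range(n):
        i, j = random.sample(range(len(toks)), 2)
        toks[i], toks[j] = toks[j], toks[i]
    return toks

def random_deletion(tokens, p=0.1):
    """Drop each token with probability p, keeping at least one token."""
    kept = [t for t in tokens if random.random() > p]
    return kept or [random.choice(tokens)]

sentence = "las redes neuronales aprenden de los datos".split()
print(random_swap(sentence))
print(random_deletion(sentence, p=0.3))
```
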

  6. Additional file 3 of Which data subset should be augmented for deep...

    • figshare.com
    • springernature.figshare.com
    xlsx
    Updated Jun 21, 2023
    Cite
    Yusra A. Ameen; Dalia M. Badary; Ahmad Elbadry I. Abonnoor; Khaled F. Hussain; Adel A. Sewisy (2023). Additional file 3 of Which data subset should be augmented for deep learning? a simulation study using urothelial cell carcinoma histopathology images [Dataset]. http://doi.org/10.6084/m9.figshare.22622729.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yusra A. Ameen; Dalia M. Badary; Ahmad Elbadry I. Abonnoor; Khaled F. Hussain; Adel A. Sewisy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 3. A Microsoft® Excel® workbook that details the raw data for the 20 experiments in which no test-set augmentation was done, including all of the image-classification output probabilities.

  7. Training dataset for "A deep learned nanowire segmentation model using...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 16, 2024
    Cite
    David, A. Santos (2024). Training dataset for "A deep learned nanowire segmentation model using synthetic data augmentation" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6469772
    Explore at:
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Nima, Emami
    Lin, Binbin
    Sarbajit, Banerjee
    David, A. Santos
    Yuting, Luo
    Bai-Xiang, Xu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This image dataset contains synthetic structure images used for training the deep-learning-based nanowire segmentation model presented in our work "A deep learned nanowire segmentation model using synthetic data augmentation", to be published in npj Computational Materials. Detailed information can be found in the corresponding article.

  8. Data from: Augmentation of Semantic Processes for Deep Learning Applications...

    • tandf.figshare.com
    txt
    Updated Jun 2, 2025
    Cite
    Maximilian Hoffmann; Lukas Malburg; Ralph Bergmann (2025). Augmentation of Semantic Processes for Deep Learning Applications [Dataset]. http://doi.org/10.6084/m9.figshare.29212617.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 2, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Maximilian Hoffmann; Lukas Malburg; Ralph Bergmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The popularity of Deep Learning (DL) methods used in business process management research and practice is constantly increasing. One important factor that hinders the adoption of DL in certain areas is the availability of sufficiently large training datasets, particularly affecting domains where process models are mainly defined manually with a high knowledge-acquisition effort. In this paper, we examine process model augmentation in combination with semi-supervised transfer learning to enlarge existing datasets and train DL models effectively. The use case of similarity learning between manufacturing process models is discussed. Based on a literature study of existing augmentation techniques, a concept is presented with different categories of augmentation from knowledge-light approaches to knowledge-intensive ones, e.g. based on automated planning. Specifically, the impacts of augmentation approaches on the syntactic and semantic correctness of the augmented process models are considered. The concept also proposes a semi-supervised transfer learning approach to integrate augmented and non-augmented process model datasets in a two-phased training procedure. The experimental evaluation investigates augmented process model datasets regarding their quality for model training in the context of similarity learning between manufacturing process models. The results indicate a large potential with a reduction of the prediction error of up to 53%.

  9. Replication Package of Deep Learning and Data Augmentation for Detecting...

    • zenodo.org
    zip
    Updated Apr 24, 2024
    Cite
    Anonymous Anonymous; Anonymous Anonymous (2024). Replication Package of Deep Learning and Data Augmentation for Detecting Self-Admitted Technical Debt [Dataset]. http://doi.org/10.5281/zenodo.10521909
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 24, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous Anonymous; Anonymous Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 17, 2024
    Description

    Self-Admitted Technical Debt (SATD) refers to circumstances where developers use code comments, issues, pull requests, or other textual artifacts to explain why the existing implementation is not optimal. Past research in detecting SATD has focused on either identifying SATD (classifying SATD instances as SATD or not) or categorizing SATD (labeling instances as SATD that pertain to requirements, design, code, test, etc.). However, the performance of such approaches remains suboptimal, particularly when dealing with specific types of SATD, such as test and requirement debt. This is mostly because the used datasets are extremely imbalanced.

    In this study, we utilize a data augmentation strategy to address the problem of imbalanced data. We also employ a two-step approach to identify and categorize SATD on various datasets derived from different artifacts. Based on earlier research, a deep learning architecture called BiLSTM is utilized for the binary identification of SATD. The BERT architecture is then utilized to categorize different types of SATD. We provide the dataset of balanced classes as a contribution for future SATD researchers, and we also show that the performance of SATD identification and categorization using deep learning and our two-step approach is significantly better than baseline approaches.
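
    The two-step structure (first identify whether a text is SATD at all, then categorize its debt type) can be illustrated with simple keyword stand-ins. The real pipeline uses a trained BiLSTM for step one and BERT for step two; the keywords and labels below are purely illustrative placeholders:

```python
def identify_satd(text: str) -> bool:
    """Step 1: binary identification (stand-in for the BiLSTM model)."""
    return any(k in text.lower() for k in ("todo", "fixme", "hack", "workaround"))

def categorize_satd(text: str) -> str:
    """Step 2: debt-type categorization (stand-in for the BERT model)."""
    lowered = text.lower()
    for label, keys in {"TES": ("test",), "REQ": ("requirement",), "DOC": ("document",)}.items():
        if any(k in lowered for k in keys):
            return label
    return "C/D"

def classify(text: str) -> str:
    """Run step 2 only on instances that step 1 flags as SATD."""
    return categorize_satd(text) if identify_satd(text) else "Not-SATD"

print(classify("TODO: add a unit test for the parser"))  # TES
print(classify("compute the checksum"))                  # Not-SATD
```

    The point of the two-step design is that the cheap binary identifier filters the bulk of Not-SATD instances before the heavier multi-class categorizer runs.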

    Therefore, to showcase the effectiveness of our approach, we compared it against several existing approaches:

    1. Natural Language Processing (NLP) and Matches task Annotation Tags (MAT) [Github]
    2. eXtreme Gradient Boosting+Synthetic Minority Oversampling Technique (XGBoost+SMOTE) [Figshare]
    3. eXtreme Gradient Boosting+Easy Data Augmentation (XGBoost+EDA) [Github]
    4. MT-Text-CNN [Github]

    Structure of the Replication Package:

    In accordance with the original dataset, the dataset comprises four distinct CSV files delineated by the artifacts under consideration in this study. Each CSV file encompasses a text column and a class, which indicate classifications denoting specific types of SATD, namely code/design debt (C/D), documentation debt (DOC), test debt (TES), and requirement debt (REQ) or Not-SATD.

    ├── SATD Keywords
    │ ├── Keywords based on Source of Artifacts
    │ │ ├── Code comment.txt
    │ │ ├── Commit message.txt
    │ │ ├── Issue section.txt
    │ │ └── Pull section.txt
    │ ├── Keywords based on Types of SATD
    │ │ ├── code-design debt.txt
    │ │ ├── documentation debt.txt
    │ │ ├── requirement debt.txt
    │ │ └── test debt.txt
    ├── src
    │ ├── bert.py
    │ ├── bilstm.py
    │ └── preprocessing.py
    ├── data-augmentation-code_comments.csv
    ├── data-augmentation-commit_messages.csv
    ├── data-augmentation-issues.csv
    ├── data-augmentation-pull_requests.csv
    └── Supplementary Material.docx

    Requirements:

    nltk
    transformers
    torch
    tensorflow
    keras
    langdetect
    inflect
    inflection
    Project sources for each artifact are as follows:
    • Source code comments: ant, argouml, columba, emf, hibernate, jedit, jfreechart, jmeter, jruby, squirrel
    • Issue sections: camel, chromium, gerrit, hadoop, hbase, impala, thrift
    • Pull sections: accumulo, activemq, activemq-artemis, airflow, ambari, apisix, apisix-dashboard, arrow, attic-apex-core, attic-apex-malhar, attic-stratos, avro, beam, bigtop, bookkeeper, brooklyn-server, calcite, camel, camel-k, camel-quarkus, camel-website, carbondata, cassandra, cloudstack, commons-lang, couchdb, cxf, daffodil, drill, druid, dubbo, echarts, fineract, flink, fluo, geode, geode-native, gobblin, griffin, groovy, guacamole-client, hadoop, hawq, hbase, helix, hive, hudi, iceberg, ignite, incubator-brooklyn, incubator-dolphinscheduler, incubator-doris, incubator-heron, incubator-hop, incubator-mxnet, incubator-pagespeed-ngx, incubator-pinot, incubator-weex, infrastructure-puppet, jena, jmeter, kafka, karaf, kylin, lucene-solr, madlib, myfaces-tobago, netbeans, netbeans-website, nifi, nifi-minifi-cpp, nutch, openwhisk, openwhisk-wskdeploy, orc, ozone, parquet-mr, phoenix, pulsar, qpid-dispatch, reef, rocketmq, samza, servicecomb-java-chassis, shardingsphere, shardingsphere-elasticjob, skywalking, spark, storm, streams, superset, systemds, tajo, thrift, tinkerpop, tomee, trafficcontrol, trafficserver, trafodion, tvm, usergrid, zeppelin, zookeeper
    • Commit messages: the same 103 projects as the pull sections, listed above

    This dataset has undergone a data augmentation process using the AugGPT technique. Meanwhile, the original dataset can be downloaded via the following link: https://github.com/yikun-li/satd-different-sources-data

  10. Variable Misuse tool: Dataset for data augmentation (6)

    • zenodo.org
    zip
    Updated Mar 8, 2022
    + more versions
    Cite
    Cristian Robledo; Cristian Robledo; Francesca Sallicati; Javier Gutiérrez; Francesca Sallicati; Javier Gutiérrez (2022). Variable Misuse tool: Dataset for data augmentation (6) [Dataset]. http://doi.org/10.5281/zenodo.6090482
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 8, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Cristian Robledo; Cristian Robledo; Francesca Sallicati; Javier Gutiérrez; Francesca Sallicati; Javier Gutiérrez
    Description

    Dataset used for data augmentation in the training phase of the Variable Misuse tool. It contains some source code files extracted from third-party repositories.

  11. Data from: Equidistant and Uniform Data Augmentation for 3D Objects

    • ieee-dataport.org
    Updated Jan 6, 2022
    Cite
    Alexander Morozov (2022). Equidistant and Uniform Data Augmentation for 3D Objects [Dataset]. https://ieee-dataport.org/documents/equidistant-and-uniform-data-augmentation-3d-objects
    Explore at:
    Dataset updated
    Jan 6, 2022
    Authors
    Alexander Morozov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Many methods exist to augment a 3D object.

  12. Light Field Image Augmentation

    • ieee-dataport.org
    Updated Aug 15, 2020
    Cite
    Zhicheng Lu (2020). Light Field Image Augmentation [Dataset]. https://ieee-dataport.org/documents/light-field-image-augmentation
    Explore at:
    Dataset updated
    Aug 15, 2020
    Authors
    Zhicheng Lu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    "modified" is a light field image with exactly the same background and an object placed on it.

  13. Augmentation Of Vl Dataset

    • universe.roboflow.com
    zip
    Updated Jul 28, 2025
    Cite
    Deep learning lab (2025). Augmentation Of Vl Dataset [Dataset]. https://universe.roboflow.com/deep-learning-lab-8macl/augmentation-of-vl
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 28, 2025
    Dataset authored and provided by
    Deep learning lab
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Skin Cancer VL
    Description

    Augmentation Of VL

    ## Overview
    
    Augmentation Of VL is a dataset for classification tasks - it contains Skin Cancer VL annotations for 243 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  14. Database of scalable training of neural network potentials for complex...

    • archive.materialscloud.org
    bz2, text/markdown +1
    Updated Apr 2, 2025
    Cite
    In Won Yeu; Annika Stuke; Alexander Urban; Nongnuch Artrith; In Won Yeu; Annika Stuke; Alexander Urban; Nongnuch Artrith (2025). Database of scalable training of neural network potentials for complex interfaces through data augmentation [Dataset]. http://doi.org/10.24435/materialscloud:w6-9a
    Explore at:
    Available download formats: bz2, text/markdown, txt
    Dataset updated
    Apr 2, 2025
    Dataset provided by
    Materials Cloud
    Authors
    In Won Yeu; Annika Stuke; Alexander Urban; Nongnuch Artrith; In Won Yeu; Annika Stuke; Alexander Urban; Nongnuch Artrith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database contains the reference data used for direct force training of Artificial Neural Network (ANN) interatomic potentials using the atomic energy network (ænet) and ænet-PyTorch packages (https://github.com/atomisticnet/aenet-PyTorch). It also includes the GPR-augmented data used for indirect force training via Gaussian Process Regression (GPR) surrogate models using the ænet-GPR package (https://github.com/atomisticnet/aenet-gpr). Each data file contains atomic structures, energies, and atomic forces in XCrySDen Structure Format (XSF). The dataset includes all reference training/test data and corresponding GPR-augmented data used in the four benchmark examples presented in the reference paper, "Scalable Training of Neural Network Potentials for Complex Interfaces Through Data Augmentation". A hierarchy of the dataset is described in the README.txt file, and an overview of the dataset is also summarized in supplementary Table S1 of the reference paper.

  15. Data and code for: Assessing the Reliability of Point Mutation as Data...

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Jan 4, 2024
    Cite
    Utku Ozbulak; Utku Ozbulak; Joris Vankerschaver; Joris Vankerschaver (2024). Data and code for: Assessing the Reliability of Point Mutation as Data Augmentation for Deep Learning with Genomic Data [Dataset]. http://doi.org/10.5281/zenodo.10457988
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Utku Ozbulak; Utku Ozbulak; Joris Vankerschaver; Joris Vankerschaver
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and code for the paper "Assessing the Reliability of Point Mutation as Data Augmentation for Deep Learning with Genomic Data".
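
    Point-mutation augmentation itself is simple to state: with some probability, substitute each base with a different random base. A minimal sketch follows; the mutation rate and names are illustrative assumptions, not the paper's settings:

```python
import random

BASES = "ACGT"
random.seed(42)

def point_mutate(seq, rate=0.05):
    """With probability `rate`, replace a base with a different random base."""
    out = []
    for base in seq:
        if base in BASES and random.random() < rate:
            out.append(random.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

original = "ACGT" * 10
mutated = point_mutate(original, rate=0.1)
print(sum(a != b for a, b in zip(original, mutated)), "of", len(original), "positions changed")
```
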

  16. augmentation data for DAISM

    • data.mendeley.com
    • explore.openaire.eu
    • +1 more
    Updated Jun 22, 2022
    Cite
    Yating Lin (2022). augmentation data for DAISM [Dataset]. http://doi.org/10.17632/ysjwjvpnh3.1
    Explore at:
    Dataset updated
    Jun 22, 2022
    Authors
    Yating Lin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The purified dataset used for data augmentation in DAISM-DNNXMBD can be downloaded from this repository.

    The pbmc8k dataset downloaded from 10X Genomics was processed and used for data augmentation to create training datasets for DAISM-DNN models. pbmc8k.h5ad contains 5 cell types (B.cells, CD4.T.cells, CD8.T.cells, monocytic.lineage, NK.cells), and pbmc8k_fine.h5ad contains 11 fine-grained cell types (naive.B.cells, memory.B.cells, naive.CD4.T.cells, memory.CD4.T.cells, naive.CD8.T.cells, memory.CD8.T.cells, regulatory.T.cells, monocytes, macrophages, myeloid.dendritic.cells, NK.cells).

    The RNA-seq dataset contains 5 cell types (B.cells, CD4.T.cells, CD8.T.cells, monocytic.lineage, NK.cells). Raw FASTQ reads were downloaded from the NCBI website, and transcript- and gene-level expression quantification was performed using Salmon (version 0.11.3) with Gencode v29 after quality control of the FASTQ reads using fastp. All tools were run with default parameters.
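Conceptually, this kind of augmentation expands training data for deconvolution models by mixing purified or single-cell expression profiles in random, known proportions to form pseudo-bulk samples. The numpy sketch below illustrates the idea with synthetic profiles; the actual DAISM-DNN procedure has additional steps (e.g. anchoring to real calibration samples), and all names and parameters here are illustrative.

```python
import numpy as np

def make_pseudo_bulk(profiles, n_samples, cells_per_type=20, rng=None):
    """Build pseudo-bulk training samples with known cell-type fractions.

    profiles: dict mapping cell type -> (n_cells, n_genes) purified expression.
    Returns (X, F): X holds the mixed expression vectors, F the ground-truth
    fractions drawn from a flat Dirichlet. Sketch only, not the DAISM code.
    """
    rng = rng or np.random.default_rng(0)
    types = sorted(profiles)
    n_genes = next(iter(profiles.values())).shape[1]
    X = np.zeros((n_samples, n_genes))
    F = rng.dirichlet(np.ones(len(types)), size=n_samples)  # rows sum to 1
    for i in range(n_samples):
        for j, t in enumerate(types):
            cells = profiles[t]
            # average a random handful of cells to get one profile per type
            idx = rng.integers(0, cells.shape[0], size=cells_per_type)
            X[i] += F[i, j] * cells[idx].mean(axis=0)
    return X, F
```

The (X, F) pairs can then serve as supervised training data for a deconvolution network, with F as the regression target.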

  17. Augmentation Of Ak Dataset

    • universe.roboflow.com
    zip
    Updated Apr 3, 2025
    Cite
    Deep learning lab (2025). Augmentation Of Ak Dataset [Dataset]. https://universe.roboflow.com/deep-learning-lab-8macl/augmentation-of-ak
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 3, 2025
    Dataset authored and provided by
    Deep learning lab
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Skin Cancer AK
    Description

    Augmentation Of AK

    ## Overview
    
    Augmentation Of AK is a dataset for classification tasks - it contains Skin Cancer AK annotations for 790 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
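For programmatic access, the `roboflow` pip package is the usual route. The workspace and project slugs below are taken from the dataset URL above; the API key, version number, and export format are placeholders to adjust for your account and needs.

```python
# Sketch of a programmatic download via the `roboflow` package
# (pip install roboflow). The version number ("1") and export format
# ("folder") are assumptions; check the dataset page for actual values.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")  # personal key from your Roboflow account
project = rf.workspace("deep-learning-lab-8macl").project("augmentation-of-ak")
dataset = project.version(1).download("folder")
print(dataset.location)  # local directory containing images and annotations
```

The same download is also available through the web UI, which requires no API key.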
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  18. All Cell Stages For Excel Gen Augmentation Dataset

    • universe.roboflow.com
    zip
    Updated Jan 26, 2025
    + more versions
    Cite
    deep learning class (2025). All Cell Stages For Excel Gen Augmentation Dataset [Dataset]. https://universe.roboflow.com/deep-learning-class/all-cell-stages-for-excel-gen-augmentation/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 26, 2025
    Dataset authored and provided by
    deep learning class
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Cell PRid 6ycW Polygons
    Description

    All Cell Stages For Excel Gen Augmentation

    ## Overview
    
    All Cell Stages For Excel Gen Augmentation is a dataset for instance segmentation tasks - it contains Cell PRid 6ycW annotations for 394 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  19. Optimizing Object Detection in Challenging Environments with Deep...

    • data.mendeley.com
    Updated Oct 24, 2024
    Cite
    Asad Ali (2024). Optimizing Object Detection in Challenging Environments with Deep Convolutional Neural Networks [Dataset]. http://doi.org/10.17632/gfpg6hxrvz.1
    Explore at:
    Dataset updated
    Oct 24, 2024
    Authors
    Asad Ali
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Object detection in challenging environments, such as low-light, cluttered, or dynamic conditions, remains a critical issue in computer vision. Deep Convolutional Neural Networks (DCNNs) have emerged as powerful tools for addressing these challenges due to their ability to learn hierarchical feature representations. This paper explores the optimization of object detection in such environments by leveraging advanced DCNN architectures, data augmentation techniques, and domain-specific pre-training. We propose an enhanced detection framework that integrates multi-scale feature extraction, transfer learning, and regularization methods to improve robustness against noise, occlusion, and lighting variations. Experimental results demonstrate significant improvements in detection accuracy across various challenging datasets, outperforming traditional methods. This study highlights the potential of DCNNs in real-world applications, such as autonomous driving, surveillance, and robotics, where object detection in difficult conditions is crucial.
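As an illustration of the augmentation side of such a pipeline, photometric transforms can simulate low-light, underexposed, and noisy captures during training. The sketch below is generic and its parameter ranges are invented for illustration, not taken from the paper.

```python
import numpy as np

def lowlight_augment(img, rng=None):
    """Photometric augmentation for detectors that must cope with low light:
    random gamma (exposure), brightness scaling, and Gaussian sensor noise.

    img: HxWx3 float array with values in [0, 1].
    Parameter ranges are illustrative.
    """
    rng = rng or np.random.default_rng(0)
    gamma = rng.uniform(1.0, 3.0)   # gamma > 1 darkens, mimicking low light
    gain = rng.uniform(0.4, 1.0)    # global brightness reduction
    sigma = rng.uniform(0.0, 0.05)  # sensor-noise level
    out = gain * np.power(img, gamma)
    out += rng.normal(0.0, sigma, size=img.shape)
    return np.clip(out, 0.0, 1.0)
```

In a full detection pipeline this would be applied on the fly per training image (bounding boxes are unaffected by purely photometric transforms), alongside geometric augmentations that do require box updates.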

  20. Data from: Regularization for Unconditional Image Diffusion Models via...

    • ieee-dataport.org
    Updated Jun 22, 2025
    Cite
    Kensuke NAKAMURA (2025). Regularization for Unconditional Image Diffusion Models via Shifted Data Augmentation [Dataset]. https://ieee-dataport.org/documents/regularization-unconditional-image-diffusion-models-shifted-data-augmentation
    Explore at:
    Dataset updated
    Jun 22, 2025
    Authors
    Kensuke NAKAMURA
    Description

    it often causes leakage

Cite
Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa (2024). Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers [Dataset]. http://doi.org/10.5061/dryad.2ngf1vhwk

Data from: Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers

Related Article
Explore at:
Available download formats: zip
Dataset updated
Feb 22, 2024
Dataset provided by
Nagoya University
Osaka University
Authors
Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa
License

https://spdx.org/licenses/CC0-1.0.html

Description

Machine learning‐based behaviour classification using acceleration data is a powerful tool in bio‐logging research. Deep learning architectures such as convolutional neural networks (CNN), long short‐term memory (LSTM) and self‐attention mechanisms as well as related training techniques have been extensively studied in human activity recognition. However, they have rarely been used in wild animal studies. The main challenges of acceleration‐based wild animal behaviour classification include data shortages, class imbalance problems, various types of noise in data due to differences in individual behaviour and where the loggers were attached and complexity in data due to complex animal‐specific behaviours, which may have limited the application of deep learning techniques in this area. To overcome these challenges, we explored the effectiveness of techniques for efficient model training: data augmentation, manifold mixup and pre‐training of deep learning models with unlabelled data, using datasets from two species of wild seabirds and state‐of‐the‐art deep learning model architectures. Data augmentation improved the overall model performance when one of the various techniques (none, scaling, jittering, permutation, time‐warping and rotation) was randomly applied to each data during mini‐batch training. Manifold mixup also improved model performance, but not as much as random data augmentation. Pre‐training with unlabelled data did not improve model performance. The state‐of‐the‐art deep learning models, including a model consisting of four CNN layers, an LSTM layer and a multi‐head attention layer, as well as its modified version with shortcut connection, showed better performance among other comparative models. Using only raw acceleration data as inputs, these models outperformed classic machine learning approaches that used 119 handcrafted features. 
Our experiments showed that deep learning techniques are promising for acceleration‐based behaviour classification of wild animals and highlighted some challenges (e.g. effective use of unlabelled data). There is scope for greater exploration of deep learning techniques in wild animal studies (e.g. advanced data augmentation, multimodal sensor data use, transfer learning and self‐supervised learning). We hope that this study will stimulate the development of deep learning techniques for wild animal behaviour classification using time‐series sensor data.

This abstract is cited from the original article "Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers" in Methods in Ecology and Evolution (Otsuka et al., 2024). Please see README for the details of the datasets.
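The per-sample random augmentation scheme described in the abstract can be sketched as follows. This is a simplified illustration covering none/scaling/jittering/permutation only (time-warping and rotation are omitted for brevity), and the parameter ranges are illustrative rather than the paper's.

```python
import numpy as np

def random_augment(x, rng=None):
    """Apply one randomly chosen augmentation to a tri-axial acceleration
    window x of shape (T, 3), echoing the per-sample mini-batch scheme
    described above. Sketch only; ranges are illustrative."""
    rng = rng or np.random.default_rng(0)
    choice = rng.integers(0, 4)
    if choice == 0:                              # none: pass through unchanged
        return x
    if choice == 1:                              # scaling: random global factor
        return x * rng.uniform(0.8, 1.2)
    if choice == 2:                              # jittering: additive Gaussian noise
        return x + rng.normal(0.0, 0.05, size=x.shape)
    # permutation: slice the window into segments and shuffle their order
    segments = np.array_split(x, 4, axis=0)
    order = rng.permutation(len(segments))
    return np.concatenate([segments[i] for i in order], axis=0)
```

Applied independently to each window during mini-batch training, this mirrors the "randomly apply one of several techniques" strategy that the study found most effective.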
