100+ datasets found

Data from: Exploring deep learning techniques for wild animal behaviour...
data.niaid.nih.gov
search.dataone.org
+1 more
zip
Updated Feb 22, 2024
Cite
Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa (2024). Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers [Dataset]. http://doi.org/10.5061/dryad.2ngf1vhwk
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.2ngf1vhwk
Dataset updated
Feb 22, 2024
Dataset provided by
Nagoya University
Osaka University
Authors
Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa
License
https://spdx.org/licenses/CC0-1.0.html
Description
Machine learning‐based behaviour classification using acceleration data is a powerful tool in bio‐logging research. Deep learning architectures such as convolutional neural networks (CNN), long short‐term memory (LSTM) and self‐attention mechanisms as well as related training techniques have been extensively studied in human activity recognition. However, they have rarely been used in wild animal studies. The main challenges of acceleration‐based wild animal behaviour classification include data shortages, class imbalance problems, various types of noise in data due to differences in individual behaviour and where the loggers were attached and complexity in data due to complex animal‐specific behaviours, which may have limited the application of deep learning techniques in this area. To overcome these challenges, we explored the effectiveness of techniques for efficient model training: data augmentation, manifold mixup and pre‐training of deep learning models with unlabelled data, using datasets from two species of wild seabirds and state‐of‐the‐art deep learning model architectures. Data augmentation improved the overall model performance when one of the various techniques (none, scaling, jittering, permutation, time‐warping and rotation) was randomly applied to each data during mini‐batch training. Manifold mixup also improved model performance, but not as much as random data augmentation. Pre‐training with unlabelled data did not improve model performance. The state‐of‐the‐art deep learning models, including a model consisting of four CNN layers, an LSTM layer and a multi‐head attention layer, as well as its modified version with shortcut connection, showed better performance among other comparative models. Using only raw acceleration data as inputs, these models outperformed classic machine learning approaches that used 119 handcrafted features. 
Our experiments showed that deep learning techniques are promising for acceleration‐based behaviour classification of wild animals and highlighted some challenges (e.g. effective use of unlabelled data). There is scope for greater exploration of deep learning techniques in wild animal studies (e.g. advanced data augmentation, multimodal sensor data use, transfer learning and self‐supervised learning). We hope that this study will stimulate the development of deep learning techniques for wild animal behaviour classification using time‐series sensor data.

This abstract is cited from the original article "Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers" in Methods in Ecology and Evolution (Otsuka et al., 2024). Please see the README for details of the datasets.
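The random augmentation described in the abstract (one technique picked at random for each sample during mini-batch training) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the `sigma` values, and the window layout (a list of `(x, y, z)` acceleration tuples) are all assumptions, and time-warping is left out for brevity.

```python
import math
import random

def identity(window, rng):
    # "none": leave the window unchanged
    return list(window)

def scaling(window, rng, sigma=0.2):
    # Multiply each axis by one random factor close to 1 (sigma is an assumed value)
    fx, fy, fz = (rng.gauss(1.0, sigma) for _ in range(3))
    return [(x * fx, y * fy, z * fz) for x, y, z in window]

def jittering(window, rng, sigma=0.05):
    # Add independent Gaussian noise to every sample (sigma is an assumed value)
    return [tuple(v + rng.gauss(0.0, sigma) for v in s) for s in window]

def permutation(window, rng, n_segments=4):
    # Cut the window into segments and shuffle their order
    size = math.ceil(len(window) / n_segments)
    segments = [window[i:i + size] for i in range(0, len(window), size)]
    rng.shuffle(segments)
    return [s for seg in segments for s in seg]

def rotation(window, rng):
    # Rotate all samples about a random axis by a random angle (Rodrigues' formula)
    axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(a * a for a in axis))
    kx, ky, kz = (a / norm for a in axis)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for x, y, z in window:
        dot = kx * x + ky * y + kz * z
        cx, cy, cz = ky * z - kz * y, kz * x - kx * z, kx * y - ky * x  # k x v
        out.append((x * c + cx * s + kx * dot * (1 - c),
                    y * c + cy * s + ky * dot * (1 - c),
                    z * c + cz * s + kz * dot * (1 - c)))
    return out

AUGMENTATIONS = [identity, scaling, jittering, permutation, rotation]

def augment_batch(batch, rng):
    # Pick one augmentation independently for each window in the mini-batch
    return [rng.choice(AUGMENTATIONS)(window, rng) for window in batch]
```

Calling `augment_batch(batch, random.Random(0))` once per training step would give every window a freshly drawn transformation, which is the behaviour the abstract describes.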
Additional file 4 of Which data subset should be augmented for deep...
springernature.figshare.com
xlsx
Updated Jun 21, 2023
Cite
Yusra A. Ameen; Dalia M. Badary; Ahmad Elbadry I. Abonnoor; Khaled F. Hussain; Adel A. Sewisy (2023). Additional file 4 of Which data subset should be augmented for deep learning? a simulation study using urothelial cell carcinoma histopathology images [Dataset]. http://doi.org/10.6084/m9.figshare.22622732.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.22622732.v1
Dataset updated
Jun 21, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Yusra A. Ameen; Dalia M. Badary; Ahmad Elbadry I. Abonnoor; Khaled F. Hussain; Adel A. Sewisy
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 4. A Microsoft® Excel® workbook that details the raw data for the 8 experiments in which either the test set was augmented alone (after its allocation) or augmentation of the whole dataset was done before test-set allocation. All of the image-classification output probabilities are included.
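One plausible reason the order of augmentation and test-set allocation matters is that augmenting first can place variants of the same source image on both sides of the split. This is an assumption for illustration, not a claim from the dataset description. The toy sketch below uses hypothetical `(source_id, variant_index)` tuples in place of images to count such overlaps.

```python
import random

def augment(image_id, n_variants=3):
    # Stand-in for image augmentation: each variant remembers its source image
    return [(image_id, k) for k in range(n_variants)]

def split(items, test_fraction, rng):
    # Shuffle and cut into (train, test)
    items = list(items)
    rng.shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]

def leaked_sources(train, test):
    # Source images that contribute variants to both subsets
    return {src for src, _ in train} & {src for src, _ in test}

rng = random.Random(0)
image_ids = range(100)

# Order A: augment the whole dataset, then allocate the test set
all_variants = [v for i in image_ids for v in augment(i)]
train_a, test_a = split(all_variants, 0.2, rng)

# Order B: allocate the test set first, then augment each subset separately
train_ids, test_ids = split(image_ids, 0.2, rng)
train_b = [v for i in train_ids for v in augment(i)]
test_b = [v for i in test_ids for v in augment(i)]

print("shared sources, augment-then-split:", len(leaked_sources(train_a, test_a)))
print("shared sources, split-then-augment:", len(leaked_sources(train_b, test_b)))
```

In order B the two subsets share no source images by construction, whereas order A generally does share some.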
Variable Message Signal annotated images for object detection
zenodo.org
zip
Updated Oct 2, 2022
Cite
Gonzalo de las Heras de Matías; Javier Sánchez-Soriano; Enrique Puertas (2022). Variable Message Signal annotated images for object detection [Dataset]. http://doi.org/10.5281/zenodo.5904211
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5904211
Dataset updated
Oct 2, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Gonzalo de las Heras de Matías; Javier Sánchez-Soriano; Enrique Puertas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
If you use this dataset, please cite this paper: Puertas, E.; De-Las-Heras, G.; Sánchez-Soriano, J.; Fernández-Andrés, J. Dataset: Variable Message Signal Annotated Images for Object Detection. Data 2022, 7, 41. https://doi.org/10.3390/data7040041

This dataset consists of Spanish road images taken from inside a vehicle, as well as annotations in XML files in PASCAL VOC format that indicate the location of Variable Message Signals within them. Also, a CSV file is attached with information regarding the geographic position, the folder where the image is located, and the text in Spanish. This can be used to train supervised learning computer vision algorithms, such as convolutional neural networks. Throughout this work, the process followed to obtain the dataset, image acquisition, and labeling, and its specifications are detailed. The dataset is constituted of 1216 instances, 888 positives, and 328 negatives, in 1152 jpg images with a resolution of 1280x720 pixels. These are divided into 576 real images and 576 images created from the data-augmentation technique. The purpose of this dataset is to help in road computer vision research since there is not one specifically for VMSs.

The folder structure of the dataset is as follows:

vms_dataset/
    data.csv
    real_images/
        imgs/
        annotations/
    data-augmentation/
        imgs/
        annotations/

In which:

data.csv: Each row contains the following information separated by commas (,): image_name, x_min, y_min, x_max, y_max, class_name, lat, long, folder, text.

real_images: Images extracted directly from the videos.

data-augmentation: Images created using data augmentation.

imgs: Image files in .jpg format.

annotations: Annotation files in .xml format.
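A minimal sketch of reading one annotation file, assuming it follows the usual PASCAL VOC layout (`object` elements holding a `name` and a `bndbox` with `xmin`, `ymin`, `xmax`, `ymax`). The sample XML, the file name, and the class label `vms` are made up for illustration; the output keys mirror the column names listed for data.csv.

```python
import xml.etree.ElementTree as ET

def parse_voc(xml_text):
    """Return (image file name, list of bounding boxes) from a PASCAL VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        boxes.append({
            "class_name": obj.findtext("name"),
            # Coordinates are read via float() in case a tool wrote them as "10.0"
            "x_min": int(float(bndbox.findtext("xmin"))),
            "y_min": int(float(bndbox.findtext("ymin"))),
            "x_max": int(float(bndbox.findtext("xmax"))),
            "y_max": int(float(bndbox.findtext("ymax"))),
        })
    return root.findtext("filename"), boxes

# Hypothetical annotation, not taken from the dataset
SAMPLE_XML = """
<annotation>
  <filename>img_0001.jpg</filename>
  <size><width>1280</width><height>720</height><depth>3</depth></size>
  <object>
    <name>vms</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>80</ymax></bndbox>
  </object>
</annotation>
"""

file_name, boxes = parse_voc(SAMPLE_XML)
print(file_name, boxes)
```

Negative images (those without a Variable Message Signal) would presumably yield an empty box list, although the exact convention used in this dataset is not stated here.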
Data from: Data augmentation for disruption prediction via robust surrogate...
dataverse.harvard.edu
osti.gov
Updated Aug 31, 2024
Cite
Katharina Rath, David Rügamer, Bernd Bischl, Udo von Toussaint, Cristina Rea, Andrew Maris, Robert Granetz, Christopher G. Albert (2024). Data augmentation for disruption prediction via robust surrogate models [Dataset]. http://doi.org/10.7910/DVN/FMJCAD
Available download formats: Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/FMJCAD
Dataset updated
Aug 31, 2024
Dataset provided by
Harvard Dataverse
Authors
Katharina Rath, David Rügamer, Bernd Bischl, Udo von Toussaint, Cristina Rea, Andrew Maris, Robert Granetz, Christopher G. Albert
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The goal of this work is to generate large statistically representative datasets to train machine learning models for disruption prediction provided by data from few existing discharges. Such a comprehensive training database is important to achieve satisfying and reliable prediction results in artificial neural network classifiers. Here, we aim for a robust augmentation of the training database for multivariate time series data using Student-t process regression. We apply Student-t process regression in a state space formulation via Bayesian filtering to tackle challenges imposed by outliers and noise in the training data set and to reduce the computational complexity. Thus, the method can also be used if the time resolution is high. We use an uncorrelated model for each dimension and impose correlations afterwards via coloring transformations. We demonstrate the efficacy of our approach on plasma diagnostics data of three different disruption classes from the DIII-D tokamak. To evaluate if the distribution of the generated data is similar to the training data, we additionally perform statistical analyses using methods from time series analysis, descriptive statistics, and classic machine learning clustering algorithms.
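The step of modelling each dimension independently and imposing correlations afterwards can be illustrated with a coloring transformation built from a Cholesky factor. This is a sketch under stated assumptions: the description does not say which factorisation the authors use, and plain Gaussian draws stand in here for samples from the per-dimension Student-t process models.

```python
import math
import random

def cholesky(cov):
    # Lower-triangular L with L @ L^T == cov (cov must be symmetric positive definite)
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L

def color(samples, cov):
    # Map uncorrelated unit-variance samples z to x = L z, which has covariance cov
    L = cholesky(cov)
    n = len(cov)
    return [[sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
            for z in samples]

def sample_correlation(samples):
    # Pearson correlation between the first two dimensions
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples)
    sxx = sum((s[0] - mx) ** 2 for s in samples)
    syy = sum((s[1] - my) ** 2 for s in samples)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
target_cov = [[1.0, 0.8], [0.8, 1.0]]  # assumed target covariance, for illustration
white = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20000)]
colored = color(white, target_cov)
print(round(sample_correlation(white), 2), round(sample_correlation(colored), 2))
```

The uncorrelated input should show a correlation near zero, and the colored output a correlation near the target value of 0.8.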
Detailed characterization of the dataset.
figshare.com
xls
Updated Sep 26, 2024
Cite
Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda (2024). Detailed characterization of the dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0310707.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0310707.t006
Dataset updated
Sep 26, 2024
Dataset provided by
PLOS ONE
Authors
Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Over the last ten years, social media has become a crucial data source for businesses and researchers, providing a space where people can express their opinions and emotions. To analyze this data and classify emotions and their polarity in texts, natural language processing (NLP) techniques such as emotion analysis (EA) and sentiment analysis (SA) are employed. However, the effectiveness of these tasks using machine learning (ML) and deep learning (DL) methods depends on large labeled datasets, which are scarce in languages like Spanish. To address this challenge, researchers use data augmentation (DA) techniques to artificially expand small datasets. This study aims to investigate whether DA techniques can improve classification results using ML and DL algorithms for sentiment and emotion analysis of Spanish texts. Various text manipulation techniques were applied, including transformations, paraphrasing (back-translation), and text generation using generative adversarial networks, to small datasets such as song lyrics, social media comments, headlines from national newspapers in Chile, and survey responses from higher education students. The findings show that the Convolutional Neural Network (CNN) classifier achieved the most significant improvement, with an 18% increase using the Generative Adversarial Networks for Sentiment Text (SentiGan) on the Aggressiveness (Seriousness) dataset. Additionally, the same classifier model showed an 11% improvement using the Easy Data Augmentation (EDA) on the Gender-Based Violence dataset. The performance of the Bidirectional Encoder Representations from Transformers (BETO) also improved by 10% on the back-translation augmented version of the October 18 dataset, and by 4% on the EDA augmented version of the Teaching survey dataset. These results suggest that data augmentation techniques enhance performance by transforming text and adapting it to the specific characteristics of the dataset. 
Through experimentation with various augmentation techniques, this research provides valuable insights into the analysis of subjectivity in Spanish texts and offers guidance for selecting algorithms and techniques based on dataset features.
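As a rough illustration of the text-transformation family mentioned above, the sketch below implements two of the four Easy Data Augmentation operations, random swap and random deletion. The other two (synonym replacement and random insertion) need a thesaurus and are omitted. The function names, parameter values, and the Spanish example sentence are assumptions, not material from the study.

```python
import random

def random_swap(words, n_swaps, rng):
    # Swap the positions of two randomly chosen words, n_swaps times
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p, rng):
    # Drop each word with probability p, but never return an empty sentence
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]

rng = random.Random(0)
sentence = "la canción es muy triste y hermosa".split()  # made-up example
print(" ".join(random_swap(sentence, n_swaps=2, rng=rng)))
print(" ".join(random_deletion(sentence, p=0.2, rng=rng)))
```

Each call yields a slightly perturbed copy of the sentence that keeps its original label, which is how such transformations enlarge a small labeled dataset.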
Additional file 3 of Which data subset should be augmented for deep...
figshare.com
springernature.figshare.com
xlsx
Updated Jun 21, 2023
Cite
Yusra A. Ameen; Dalia M. Badary; Ahmad Elbadry I. Abonnoor; Khaled F. Hussain; Adel A. Sewisy (2023). Additional file 3 of Which data subset should be augmented for deep learning? a simulation study using urothelial cell carcinoma histopathology images [Dataset]. http://doi.org/10.6084/m9.figshare.22622729.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.22622729.v1
Dataset updated
Jun 21, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Yusra A. Ameen; Dalia M. Badary; Ahmad Elbadry I. Abonnoor; Khaled F. Hussain; Adel A. Sewisy
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 3. A Microsoft® Excel® workbook that details the raw data for the 20 experiments in which no test-set augmentation was done, including all of the image-classification output probabilities.
Training dataset for "A deep learned nanowire segmentation model using...
data.niaid.nih.gov
zenodo.org
Updated Jul 16, 2024
Cite
David, A. Santos (2024). Training dataset for "A deep learned nanowire segmentation model using synthetic data augmentation" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6469772
Dataset updated
Jul 16, 2024
Dataset provided by
Nima, Emami
Lin, Binbin
Sarbajit, Banerjee
David, A. Santos
Yuting, Luo
Bai-Xiang, Xu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This image dataset contains synthetic structure images used for training the deep-learning-based nanowire segmentation model presented in our work "A deep learned nanowire segmentation model using synthetic data augmentation", to be published in npj Computational Materials. Detailed information can be found in the corresponding article.
Data from: Augmentation of Semantic Processes for Deep Learning Applications...
tandf.figshare.com
txt
Updated Jun 2, 2025
Cite
Maximilian Hoffmann; Lukas Malburg; Ralph Bergmann (2025). Augmentation of Semantic Processes for Deep Learning Applications [Dataset]. http://doi.org/10.6084/m9.figshare.29212617.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.29212617.v1
Dataset updated
Jun 2, 2025
Dataset provided by
Taylor & Francis
Authors
Maximilian Hoffmann; Lukas Malburg; Ralph Bergmann
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The popularity of Deep Learning (DL) methods used in business process management research and practice is constantly increasing. One important factor that hinders the adoption of DL in certain areas is the availability of sufficiently large training datasets, particularly affecting domains where process models are mainly defined manually with a high knowledge-acquisition effort. In this paper, we examine process model augmentation in combination with semi-supervised transfer learning to enlarge existing datasets and train DL models effectively. The use case of similarity learning between manufacturing process models is discussed. Based on a literature study of existing augmentation techniques, a concept is presented with different categories of augmentation from knowledge-light approaches to knowledge-intensive ones, e.g. based on automated planning. Specifically, the impacts of augmentation approaches on the syntactic and semantic correctness of the augmented process models are considered. The concept also proposes a semi-supervised transfer learning approach to integrate augmented and non-augmented process model datasets in a two-phased training procedure. The experimental evaluation investigates augmented process model datasets regarding their quality for model training in the context of similarity learning between manufacturing process models. The results indicate a large potential with a reduction of the prediction error of up to 53%.

Replication Package of Deep Learning and Data Augmentation for Detecting...

zenodo.org

zip

Updated Apr 24, 2024

Cite

Anonymous Anonymous; Anonymous Anonymous (2024). Replication Package of Deep Learning and Data Augmentation for Detecting Self-Admitted Technical Debt [Dataset]. http://doi.org/10.5281/zenodo.10521909

Available download formats: zip

Unique identifier

https://doi.org/10.5281/zenodo.10521909

Dataset updated

Apr 24, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Anonymous Anonymous; Anonymous Anonymous

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered

Jan 17, 2024

Description

Self-Admitted Technical Debt (SATD) refers to circumstances where developers use code comments, issues, pull requests, or other textual artifacts to explain why the existing implementation is not optimal. Past research in detecting SATD has focused on either identifying SATD (classifying SATD instances as SATD or not) or categorizing SATD (labeling instances as SATD that pertain to requirements, design, code, test, etc.). However, the performance of such approaches remains suboptimal, particularly when dealing with specific types of SATD, such as test and requirement debt. This is mostly because the used datasets are extremely imbalanced.

In this study, we utilize a data augmentation strategy to address the problem of imbalanced data. We also employ a two-step approach to identify and categorize SATD on various datasets derived from different artifacts. Based on earlier research, a deep learning architecture called BiLSTM is utilized for the binary identification of SATD. The BERT architecture is then utilized to categorize different types of SATD. We provide the dataset of balanced classes as a contribution for future SATD researchers, and we also show that the performance of SATD identification and categorization using deep learning and our two-step approach is significantly better than baseline approaches.

Therefore, to showcase the effectiveness of our approach, we compared it against several existing approaches:

Natural Language Processing (NLP) and Matches task Annotation Tags (MAT) [Github]
eXtreme Gradient Boosting+Synthetic Minority Oversampling Technique (XGBoost+SMOTE) [Figshare]
eXtreme Gradient Boosting+Easy Data Augmentation (XGBoost+EDA) [Github]
MT-Text-CNN [Github]

Structure of the Replication Package:

In accordance with the original dataset, the dataset comprises four CSV files, one for each artifact considered in this study. Each CSV file contains a text column and a class column; the class gives the type of SATD, namely code/design debt (C/D), documentation debt (DOC), test debt (TES), or requirement debt (REQ), or marks the instance as Not-SATD.

├── SATD Keywords
│   ├── Keywords based on Source of Artifacts
│   │   ├── Code comment.txt
│   │   ├── Commit message.txt
│   │   ├── Issue section.txt
│   │   └── Pull section.txt
│   ├── Keywords based on Types of SATD
│   │   ├── code-design debt.txt
│   │   ├── documentation debt.txt
│   │   ├── requirement debt.txt
│   │   └── test debt.txt
├── src
│   ├── bert.py
│   ├── bilstm.py
│   └── preprocessing.py
├── data-augmentation-code_comments.csv
├── data-augmentation-commit_messages.csv
├── data-augmentation-issues.csv
├── data-augmentation-pull_requests.csv
└── Supplementary Material.docx
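
Since the package is offered as a balanced-class dataset, a quick way to inspect the class distribution of one of the CSV files might look like the sketch below. The header names `text` and `class` are assumptions based on the description above, and the sample rows are invented; a real file would be opened with `open(path, newline="", encoding="utf-8")` in place of the in-memory string.

```python
import csv
import io
from collections import Counter

def class_counts(csv_file, class_column="class"):
    # Count how many rows carry each class label
    return Counter(row[class_column] for row in csv.DictReader(csv_file))

def imbalance_ratio(counts):
    # Size of the largest class divided by the size of the smallest
    return max(counts.values()) / min(counts.values())

# Hypothetical rows, not taken from the replication package
SAMPLE_CSV = """text,class
"TODO: clean up this hack",C/D
"needs javadoc",DOC
"add unit tests later",TES
"feature not implemented yet",REQ
"fix typo in log message",Not-SATD
"temporary workaround",C/D
"""

counts = class_counts(io.StringIO(SAMPLE_CSV))
print(dict(counts), imbalance_ratio(counts))
```

A ratio close to 1 would indicate that the classes are roughly balanced.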

Requirements:

glove

nltk

transformers

torch

tensorflow

keras

langdetect

inflect

inflection

Project sources for each artifact are as follows:

Source code comment: ant argouml columba emf hibernate jedit jfreechart jmeter jruby squirrel

Issue section: camel chromium gerrit hadoop hbase impala thrift

Pull section: accumulo activemq activemq-artemis airflow ambari apisix apisix-dashboard arrow attic-apex-core attic-apex-malhar attic-stratos avro beam bigtop bookkeeper brooklyn-server calcite camel camel-k camel-quarkus camel-website carbondata cassandra cloudstack commons-lang couchdb cxf daffodil drill druid dubbo echarts fineract flink fluo geode geode-native gobblin griffin groovy guacamole-client hadoop hawq hbase helix hive hudi iceberg ignite incubator-brooklyn incubator-dolphinscheduler incubator-doris incubator-heron incubator-hop incubator-mxnet incubator-pagespeed-ngx incubator-pinot incubator-weex infrastructure-puppet jena jmeter kafka karaf kylin lucene-solr madlib myfaces-tobago netbeans netbeans-website nifi nifi-minifi-cpp nutch openwhisk openwhisk-wskdeploy orc ozone parquet-mr phoenix pulsar qpid-dispatch reef rocketmq samza servicecomb-java-chassis shardingsphere shardingsphere-elasticjob skywalking spark storm streams superset systemds tajo thrift tinkerpop tomee trafficcontrol trafficserver trafodion tvm usergrid zeppelin zookeeper

Commit message: accumulo activemq activemq-artemis airflow ambari apisix apisix-dashboard arrow attic-apex-core attic-apex-malhar attic-stratos avro beam bigtop bookkeeper brooklyn-server calcite camel camel-k camel-quarkus camel-website carbondata cassandra cloudstack commons-lang couchdb cxf daffodil drill druid dubbo echarts fineract flink fluo geode geode-native gobblin griffin groovy guacamole-client hadoop hawq hbase helix hive hudi iceberg ignite incubator-brooklyn incubator-dolphinscheduler incubator-doris incubator-heron incubator-hop incubator-mxnet incubator-pagespeed-ngx incubator-pinot incubator-weex infrastructure-puppet jena jmeter kafka karaf kylin lucene-solr madlib myfaces-tobago netbeans netbeans-website nifi nifi-minifi-cpp nutch openwhisk openwhisk-wskdeploy orc ozone parquet-mr phoenix pulsar qpid-dispatch reef rocketmq samza servicecomb-java-chassis shardingsphere shardingsphere-elasticjob skywalking spark storm streams superset systemds tajo thrift tinkerpop tomee trafficcontrol trafficserver trafodion tvm usergrid zeppelin zookeeper

This dataset has undergone a data augmentation process using the AugGPT technique. Meanwhile, the original dataset can be downloaded via the following link: https://github.com/yikun-li/satd-different-sources-data

Variable Misuse tool: Dataset for data augmentation (6)
zenodo.org
zip
Updated Mar 8, 2022
Cite
Cristian Robledo; Francesca Sallicati; Javier Gutiérrez (2022). Variable Misuse tool: Dataset for data augmentation (6) [Dataset]. http://doi.org/10.5281/zenodo.6090482
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6090482
Dataset updated
Mar 8, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Cristian Robledo; Francesca Sallicati; Javier Gutiérrez
Description
Dataset used for data augmentation in the training phase of the Variable Misuse tool. It contains some source code files extracted from third-party repositories.
Data from: Equidistant and Uniform Data Augmentation for 3D Objects
ieee-dataport.org
Updated Jan 6, 2022
Cite
Alexander Morozov (2022). Equidistant and Uniform Data Augmentation for 3D Objects [Dataset]. https://ieee-dataport.org/documents/equidistant-and-uniform-data-augmentation-3d-objects
Dataset updated
Jan 6, 2022
Authors
Alexander Morozov
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Many methods exist to augment a 3D object.
Light Field Image Augmentation
ieee-dataport.org
Updated Aug 15, 2020
Cite
Zhicheng Lu (2020). Light Field Image Augmentation [Dataset]. https://ieee-dataport.org/documents/light-field-image-augmentation
Dataset updated
Aug 15, 2020
Authors
Zhicheng Lu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
"modified" is a light field image with exactly the same background and an object on it.
Augmentation Of Vl Dataset
universe.roboflow.com
zip
Updated Jul 28, 2025
Cite
Deep learning lab (2025). Augmentation Of Vl Dataset [Dataset]. https://universe.roboflow.com/deep-learning-lab-8macl/augmentation-of-vl
Available download formats: zip
Dataset updated
Jul 28, 2025
Dataset authored and provided by
Deep learning lab
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Skin Cancer VL
Description
Augmentation Of VL

## Overview
Augmentation Of VL is a dataset for classification tasks - it contains Skin Cancer VL annotations for 243 images.

## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Database of scalable training of neural network potentials for complex...
archive.materialscloud.org
bz2, text/markdown +1
Updated Apr 2, 2025
Cite
In Won Yeu; Annika Stuke; Alexander Urban; Nongnuch Artrith (2025). Database of scalable training of neural network potentials for complex interfaces through data augmentation [Dataset]. http://doi.org/10.24435/materialscloud:w6-9a
Available download formats: bz2, text/markdown, txt
Unique identifier
https://doi.org/10.24435/materialscloud:w6-9a
Dataset updated
Apr 2, 2025
Dataset provided by
Materials Cloud
Authors
In Won Yeu; Annika Stuke; Alexander Urban; Nongnuch Artrith
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This database contains the reference data used for direct force training of Artificial Neural Network (ANN) interatomic potentials using the atomic energy network (ænet) and ænet-PyTorch packages (https://github.com/atomisticnet/aenet-PyTorch). It also includes the GPR-augmented data used for indirect force training via Gaussian Process Regression (GPR) surrogate models using the ænet-GPR package (https://github.com/atomisticnet/aenet-gpr). Each data file contains atomic structures, energies, and atomic forces in XCrySDen Structure Format (XSF). The dataset includes all reference training/test data and corresponding GPR-augmented data used in the four benchmark examples presented in the reference paper, "Scalable Training of Neural Network Potentials for Complex Interfaces Through Data Augmentation". A hierarchy of the dataset is described in the README.txt file, and an overview of the dataset is also summarized in supplementary Table S1 of the reference paper.
Data and code for: Assessing the Reliability of Point Mutation as Data...
zenodo.org
explore.openaire.eu
zip
Updated Jan 4, 2024
Cite
Utku Ozbulak; Joris Vankerschaver (2024). Data and code for: Assessing the Reliability of Point Mutation as Data Augmentation for Deep Learning with Genomic Data [Dataset]. http://doi.org/10.5281/zenodo.10457988
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10457988
Dataset updated
Jan 4, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Utku Ozbulak; Joris Vankerschaver
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data and code for the paper "Assessing the Reliability of Point Mutation as Data Augmentation for Deep Learning with Genomic Data".
augmentation data for DAISM
data.mendeley.com
explore.openaire.eu
+1 more
Updated Jun 22, 2022
Cite
Yating Lin (2022). augmentation data for DAISM [Dataset]. http://doi.org/10.17632/ysjwjvpnh3.1
Unique identifier
https://doi.org/10.17632/ysjwjvpnh3.1
Dataset updated
Jun 22, 2022
Authors
Yating Lin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The purified dataset for data augmentation for DAISM-DNNXMBD can be downloaded from this repository.

The pbmc8k dataset downloaded from 10X Genomics was processed and used for data augmentation to create training datasets for training DAISM-DNN models. pbmc8k.h5ad contains 5 cell types (B.cells, CD4.T.cells, CD8.T.cells, monocytic.lineage, NK.cells), and pbmc8k_fine.h5ad contains 7 cell types (naive.B.cells, memory.B.cells, naive.CD4.T.cells, memory.CD4.T.cells, naive.CD8.T.cells, memory.CD8.T.cells, regulatory.T.cells, monocytes, macrophages, myeloid.dendritic.cells, NK.cells).

The RNA-seq dataset contains 5 cell types (B.cells, CD4.T.cells, CD8.T.cells, monocytic.lineage, NK.cells). Raw FASTQ reads were downloaded from the NCBI website, and transcription and gene-level expression quantification were performed using Salmon (version 0.11.3) with Gencode v29 after quality control of FASTQ reads using fastp. All tools were used with default parameters.
Augmentation Of Ak Dataset
universe.roboflow.com
zip
Updated Apr 3, 2025
Cite
Deep learning lab (2025). Augmentation Of Ak Dataset [Dataset]. https://universe.roboflow.com/deep-learning-lab-8macl/augmentation-of-ak
Available download formats: zip
Dataset updated
Apr 3, 2025
Dataset authored and provided by
Deep learning lab
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Skin Cancer AK
Description
Augmentation Of AK

## Overview
Augmentation Of AK is a dataset for classification tasks - it contains Skin Cancer AK annotations for 790 images.

## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
All Cell Stages For Excel Gen Augmentation Dataset
universe.roboflow.com
zip
Updated Jan 26, 2025
Cite
deep learning class (2025). All Cell Stages For Excel Gen Augmentation Dataset [Dataset]. https://universe.roboflow.com/deep-learning-class/all-cell-stages-for-excel-gen-augmentation/dataset/1
Available download formats: zip
Dataset updated
Jan 26, 2025
Dataset authored and provided by
deep learning class
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Cell PRid 6ycW Polygons
Description
All Cell Stages For Excel Gen Augmentation

## Overview

All Cell Stages For Excel Gen Augmentation is a dataset for instance segmentation tasks - it contains Cell PRid 6ycW annotations for 394 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Optimizing Object Detection in Challenging Environments with Deep...
data.mendeley.com
Updated Oct 24, 2024
Cite
Asad Ali (2024). Optimizing Object Detection in Challenging Environments with Deep Convolutional Neural Networks [Dataset]. http://doi.org/10.17632/gfpg6hxrvz.1
Explore at:
Unique identifier
https://doi.org/10.17632/gfpg6hxrvz.1
Dataset updated
Oct 24, 2024
Authors
Asad Ali
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Object detection in challenging environments, such as low-light, cluttered, or dynamic conditions, remains a critical issue in computer vision. Deep Convolutional Neural Networks (DCNNs) have emerged as powerful tools for addressing these challenges due to their ability to learn hierarchical feature representations. This paper explores the optimization of object detection in such environments by leveraging advanced DCNN architectures, data augmentation techniques, and domain-specific pre-training. We propose an enhanced detection framework that integrates multi-scale feature extraction, transfer learning, and regularization methods to improve robustness against noise, occlusion, and lighting variations. Experimental results demonstrate significant improvements in detection accuracy across various challenging datasets, outperforming traditional methods. This study highlights the potential of DCNNs in real-world applications, such as autonomous driving, surveillance, and robotics, where object detection in difficult conditions is crucial.
Data from: Regularization for Unconditional Image Diffusion Models via...
ieee-dataport.org
Updated Jun 22, 2025
Cite
Kensuke NAKAMURA (2025). Regularization for Unconditional Image Diffusion Models via Shifted Data Augmentation [Dataset]. https://ieee-dataport.org/documents/regularization-unconditional-image-diffusion-models-shifted-data-augmentation
Explore at:
Dataset updated
Jun 22, 2025
Authors
Kensuke NAKAMURA
Description
it often causes leakage

Cite

Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa (2024). Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers [Dataset]. http://doi.org/10.5061/dryad.2ngf1vhwk

Data from: Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers

Explore at:

Available download formats: zip

Unique identifier

https://doi.org/10.5061/dryad.2ngf1vhwk

Dataset updated

Feb 22, 2024

Dataset provided by

Nagoya University
Osaka University

Authors

Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa

License

https://spdx.org/licenses/CC0-1.0.html

Description

Machine learning‐based behaviour classification using acceleration data is a powerful tool in bio‐logging research. Deep learning architectures such as convolutional neural networks (CNN), long short‐term memory (LSTM) and self‐attention mechanisms as well as related training techniques have been extensively studied in human activity recognition. However, they have rarely been used in wild animal studies. The main challenges of acceleration‐based wild animal behaviour classification include data shortages, class imbalance problems, various types of noise in data due to differences in individual behaviour and where the loggers were attached and complexity in data due to complex animal‐specific behaviours, which may have limited the application of deep learning techniques in this area. To overcome these challenges, we explored the effectiveness of techniques for efficient model training: data augmentation, manifold mixup and pre‐training of deep learning models with unlabelled data, using datasets from two species of wild seabirds and state‐of‐the‐art deep learning model architectures. Data augmentation improved the overall model performance when one of the various techniques (none, scaling, jittering, permutation, time‐warping and rotation) was randomly applied to each data during mini‐batch training. Manifold mixup also improved model performance, but not as much as random data augmentation. Pre‐training with unlabelled data did not improve model performance. The state‐of‐the‐art deep learning models, including a model consisting of four CNN layers, an LSTM layer and a multi‐head attention layer, as well as its modified version with shortcut connection, showed better performance among other comparative models. Using only raw acceleration data as inputs, these models outperformed classic machine learning approaches that used 119 handcrafted features. 
Our experiments showed that deep learning techniques are promising for acceleration‐based behaviour classification of wild animals and highlighted some challenges (e.g. effective use of unlabelled data). There is scope for greater exploration of deep learning techniques in wild animal studies (e.g. advanced data augmentation, multimodal sensor data use, transfer learning and self‐supervised learning). We hope that this study will stimulate the development of deep learning techniques for wild animal behaviour classification using time‐series sensor data.

This abstract is cited from the original article "Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers" in Methods in Ecology and Evolution (Otsuka et al., 2024). Please see the README for details of the datasets.
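
The abstract reports that randomly applying one technique (none, scaling, jittering, permutation, time-warping or rotation) to each sample during mini-batch training improved performance. The sketch below shows that selection step on tri-axial acceleration windows. It is not the authors' implementation: the noise levels, the segment count, and the restriction of rotation to the z-axis are assumptions, and time-warping is omitted to keep the example short.

```python
import math
import random


def identity(window, rng):
    """The 'none' option: return the window unchanged."""
    return list(window)


def scaling(window, rng, sigma=0.1):
    """Multiply every value by one random factor drawn around 1.0."""
    factor = rng.gauss(1.0, sigma)
    return [tuple(v * factor for v in sample) for sample in window]


def jittering(window, rng, sigma=0.05):
    """Add independent Gaussian noise to every value."""
    return [tuple(v + rng.gauss(0.0, sigma) for v in sample) for sample in window]


def permutation(window, rng, n_segments=4):
    """Split the window into segments and shuffle their order."""
    size = max(1, len(window) // n_segments)
    segments = [window[i:i + size] for i in range(0, len(window), size)]
    rng.shuffle(segments)
    return [sample for segment in segments for sample in segment]


def rotation(window, rng):
    """Rotate all samples about the z-axis by one random angle (a simplification)."""
    angle = rng.uniform(-math.pi, math.pi)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in window]


AUGMENTATIONS = [identity, scaling, jittering, permutation, rotation]


def augment_batch(batch, rng=None):
    """Apply one randomly chosen augmentation to each window in a mini-batch."""
    rng = rng or random.Random()
    return [rng.choice(AUGMENTATIONS)(window, rng) for window in batch]
```

Each window is a list of `(x, y, z)` tuples. Because the choice is made per sample and per batch, the model sees a differently transformed version of the same window across epochs.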

