The dataset for pytorch transfer learning tutorial. https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Samples in this benchmark were generated by RELAI using the following data source(s):
Data Source Name: pytorch
Data Source Link: https://pytorch.org/docs/stable/index.html
Data Source License: https://github.com/pytorch/pytorch/blob/main/LICENSE
Data Source Authors: PyTorch
AI Benchmarks by Data Agents. 2025 RELAI.AI. Licensed under CC BY 4.0. Source: https://relai.ai
https://spdx.org/licenses/CC0-1.0.html
The methods of geometric morphometrics are commonly used to quantify morphology in a broad range of biological sciences. The application of these methods to large datasets is constrained by manual landmark placement, which limits the number of landmarks and introduces observer bias. To move the field forward, we need to automate morphological phenotyping in ways that capture comprehensive representations of morphological variation with minimal observer bias. Here, we present Morphological Variation Quantifier (morphVQ), a shape analysis pipeline for quantifying, analyzing, and exploring shape variation in the functional domain. morphVQ uses descriptor learning to estimate the functional correspondence between whole triangular meshes in lieu of landmark configurations. With functional maps between pairs of specimens in a dataset, we can analyze and explore shape variation. morphVQ uses Consistent ZoomOut refinement to improve these functional maps and to produce a new representation of shape variation: area-based and conformal (angular) latent shape space differences (LSSDs). We compare this new representation of shape variation to shape variables obtained via manual digitization and auto3DGM, an existing approach to automated morphological phenotyping. We find that LSSDs compare favorably to modern 3DGM and auto3DGM while being more computationally efficient. By characterizing whole surfaces, our method incorporates more morphological detail in shape analysis. We can classify known biological groupings, such as genus affiliation, with comparable accuracy. The shape spaces produced by our method are similar to those produced by modern 3DGM and by auto3DGM, and distinctiveness functions derived from LSSDs show how shape variation differs between groups. morphVQ can capture shape in an automated fashion while avoiding the limitations of manually digitized landmarks and thus represents a novel and computationally efficient addition to the geometric morphometrics toolkit.
Methods
The main dataset consists of 102 triangular meshes from laser surface scans of hominoid cuboid bones. These cuboids were from wild-collected individuals housed in the American Museum of Natural History, the National Museum of Natural History, the Harvard Museum of Comparative Biology, and the Field Museum. Hylobates, Pongo, Gorilla, Pan, and Homo are all well represented. Each triangular mesh is denoised, remeshed, and cleaned using the Geomagic Studio Wrap software. The resulting meshes vary in vertex count/resolution from 2,000 to 390,000. Each mesh is then upsampled or decimated to an even 12,000 vertices using the recursive subdivision process and quadric decimation algorithm implemented in VTK's Python bindings. The first of the two smaller datasets comprises 26 hominoid medial cuneiform meshes isolated from laser surface scans obtained from the same museum collections listed above. The second dataset comprises 33 mouse humeri meshes from micro-CT data (34.5 μm resolution using a Skyscan 1172). These datasets were processed identically to the 102 hominoid cuboid meshes introduced above.
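A minimal sketch of the vertex-count normalisation step described above, assuming meshes stored as PLY files and a target of 12,000 vertices; this only illustrates VTK's quadric decimation in Python and is not the authors' pipeline code (meshes below the target would instead be upsampled by recursive subdivision):

import vtk

# Load one cleaned mesh (the file name is a placeholder).
reader = vtk.vtkPLYReader()
reader.SetFileName("cuboid.ply")
reader.Update()
mesh = reader.GetOutput()

# Decimate towards ~12,000 vertices with quadric decimation.
target_vertices = 12000
reduction = max(1.0 - target_vertices / mesh.GetNumberOfPoints(), 0.0)
decimate = vtk.vtkQuadricDecimation()
decimate.SetInputData(mesh)
decimate.SetTargetReduction(reduction)
decimate.Update()
print(decimate.GetOutput().GetNumberOfPoints())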
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
In recent years, neural networks have evolved from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature, and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches for (i) model analysis, (ii) discovering unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47'360 unique neural network models resulting in over 2'415'360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and provide benchmarks for multiple downstream tasks as mentioned before.
Dataset
This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from Fashion-MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.
This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "fmnist_"), and preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.
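As a rough, hedged illustration (the official loading code lives at www.modelzoos.cc), the index_dict.json files can be inspected with the standard library to see how the vectorized models are laid out; the exact keys inside the file are not specified here:

import json

with open("index_dict.json") as f:
    index_dict = json.load(f)

# Print the top-level keys describing how the vectorized models are to be read.
print(list(index_dict.keys()))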
For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Mark Peng
Released under CC0: Public Domain
A dataset containing a sample event inspired by ProtoDUNE-SP simulation.
Checkpoints of trained DUNEdn package models used for the Springer original article.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
SloNER is a model for Slovenian Named Entity Recognition. It is a PyTorch neural network model, intended for use with the HuggingFace transformers library (https://github.com/huggingface/transformers).
The model is based on the Slovenian RoBERTa contextual embeddings model SloBERTa 2.0 (http://hdl.handle.net/11356/1397). The model was trained on the SUK 1.0 training corpus (http://hdl.handle.net/11356/1747). The source code of the model is available in the GitHub repository https://github.com/clarinsi/SloNER.
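A minimal usage sketch with the HuggingFace transformers library, assuming the model files have been downloaded to a local directory; the path and the example sentence are placeholders, and the exact label set and scripts are documented in the SloNER repository:

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_dir = "path/to/SloNER"  # placeholder for the downloaded model directory
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForTokenClassification.from_pretrained(model_dir)

ner = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(ner("France Prešeren se je rodil v Vrbi."))  # example Slovenian sentence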
This dataset was created by tonyguo
https://creativecommons.org/publicdomain/zero/1.0/
We have created a 102 category dataset consisting of 102 flower categories. The flowers chosen are those commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. The details of the categories and the number of images for each class can be found on the category statistics page.
The images have large scale, pose and light variations. In addition, there are categories that have large variations within the category and several very similar categories. The dataset is visualized using isomap with shape and colour features.
> dataset
> train
> valid
> test
- cat_to_name.json
- README.md
- sample_submission.csv
We visualize the categories in the dataset using SIFT features as shape descriptors and HSV values as the colour descriptor. The images are randomly sampled from each category.
Example visualization: https://i.imgur.com/Tl6TKUC.png
Nilsback, M-E. and Zisserman, A.
Automated flower classification over a large number of classes
Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing (2008)
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This project is a PyTorch implementation of a Faster R-CNN for fruit detection, suitable for multi-modal images (up to 5 channels). It is based on the implementation jwyang/faster_rcnn.pytorch, developed with PyTorch and NumPy. This implementation has been used to train and test on the KFuji RGB-DS dataset, which contains images with 3 different modalities: colour (RGB), depth (D), and range-corrected intensity signal (S).
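The project's own network builds on jwyang/faster_rcnn.pytorch; purely as an illustration of handling 5-channel input, the hedged sketch below adapts a torchvision Faster R-CNN stem to accept 5 channels (the model choice, class count, and normalization statistics are assumptions, not the project's configuration):

import torch
import torch.nn as nn
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None, num_classes=2)  # background + fruit

# Replace the 3-channel stem convolution with a 5-channel one (e.g. RGB + D + S).
old_conv = model.backbone.body.conv1
model.backbone.body.conv1 = nn.Conv2d(
    5, old_conv.out_channels,
    kernel_size=old_conv.kernel_size, stride=old_conv.stride,
    padding=old_conv.padding, bias=False,
)

# Extend the per-channel normalization statistics used by the detection transform.
model.transform.image_mean = [0.485, 0.456, 0.406, 0.5, 0.5]
model.transform.image_std = [0.229, 0.224, 0.225, 0.25, 0.25]

images = [torch.rand(5, 480, 640)]
targets = [{"boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0]]), "labels": torch.tensor([1])}]
model.train()
print(model(images, targets))  # dict of training losses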
This dataset was created by Richard Luo
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Methods
Cotton plants were grown in a well-controlled greenhouse in the NC State Phytotron as described previously (Pierce et al., 2019). Flowers were tagged on the day of anthesis and harvested three days post anthesis (3 DPA). The distinct fiber shapes had already formed by 2 DPA (Stiff and Haigler, 2016; Graham and Haigler, 2021), and fibers were still relatively short at 3 DPA, which facilitated the visualization of multiple fiber tips in one image.
Cotton fiber sample preparation, digital image collection, and image analysis: Ovules with attached fiber were fixed in the greenhouse. The fixative previously used (Histochoice) (Stiff and Haigler, 2016; Pierce et al., 2019; Graham and Haigler, 2021) is obsolete, which led to testing and validation of another low-toxicity, formalin-free fixative (#A5472; Sigma-Aldrich, St. Louis, MO; Fig. S1). The boll wall was removed without damaging the ovules. (Using a razor blade, cut away the top 3 mm of the boll. Make about 1 mm deep longitudinal incisions between the locule walls, and finally cut around the base of the boll.) All of the ovules with attached fiber were lifted out of the locules and fixed (1 h, RT, 1:10 tissue:fixative ratio) prior to optional storage at 4°C. Immediately before imaging, ovules were examined under a stereo microscope (incident light, black background, 31X) to select three vigorous ovules from each boll while avoiding drying. Ovules were rinsed (3 x 5 min) in buffer [0.05 M PIPES, 12 mM EGTA, 5 mM EDTA and 0.1% (w/v) Tween 80, pH 6.8], which had lower osmolarity than a microtubule-stabilizing buffer used previously for aldehyde-fixed fibers (Seagull, 1990; Graham and Haigler, 2021). While steadying an ovule with forceps, one to three small pieces of its chalazal end with attached fibers were dissected away using a small knife (#10055-12; Fine Science Tools, Foster City, CA). Each ovule piece was placed in a single well of a 24-well slide (#63430-04; Electron Microscopy Sciences, Hatfield, PA) containing a single drop of buffer prior to applying and sealing a 24 x 60 mm coverslip with vaseline. Samples were imaged with brightfield optics and default settings for the 2.83 mega-pixel, color, CCD camera of the Keyence BZ-X810 imaging system (www.keyence.com; housed in the Cellular and Molecular Imaging Facility of NC State). The location of each sample in the 24-well slides was identified visually using a 2X objective and mapped using the navigation function of the integrated Keyence software. Using the 10X objective lens (plan-apochromatic; NA 0.45) and 60% closed condenser aperture setting, a region with many fiber apices was selected for imaging using the multi-point and z-stack capture functions. The precise location was recorded by the software prior to visual setting of the limits of the z-plane range (1.2 µm step size). Typically, three 24-sample slides (representing three accessions) were set up in parallel prior to automatic image capture. The captured z-stacks for each sample were processed into one two-dimensional image using the full-focus function of the software. (Occasional samples contained too much debris for computer vision to be effective, and these were reimaged.)
Resources in this dataset:
Resource Title: Deltapine 90 - Manually Annotated Training Set. File Name: GH3 DP90 Keyence 1_45 JPEG.zip. Resource Description: These images were manually annotated in Labelbox.
Resource Title: Deltapine 90 - AI-Assisted Annotated Training Set. File Name: GH3 DP90 Keyence 46_101 JPEG.zip. Resource Description: These images were AI-labeled in Roboflow and then manually reviewed in Roboflow.
Resource Title: Deltapine 90 - Manually Annotated Training-Validation Set. File Name: GH3 DP90 Keyence 102_125 JPEG.zip. Resource Description: These images were manually labeled in Labelbox, and then used for training-validation for the machine learning model.
Resource Title: Phytogen 800 - Evaluation Test Images. File Name: Gb cv Phytogen 800.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Pima 3-79 - Evaluation Test Images. File Name: Gb cv Pima 379.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Pima S-7 - Evaluation Test Images. File Name: Gb cv Pima S7.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Coker 312 - Evaluation Test Images. File Name: Gh cv Coker 312.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Deltapine 90 - Evaluation Test Images. File Name: Gh cv Deltapine 90.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Half and Half - Evaluation Test Images. File Name: Gh cv Half and Half.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Fiber Tip Annotations - Manual. File Name: manual_annotations.coco_.json. Resource Description: Annotations in COCO JSON format for fibers. Manually annotated in Labelbox.
Resource Title: Fiber Tip Annotations - AI-Assisted. File Name: ai_assisted_annotations.coco_.json. Resource Description: Annotations in COCO JSON format for fibers. AI annotated with human review in Roboflow.
Resource Title: Model Weights (iteration 600). File Name: model_weights.zip. Resource Description: The final model, provided as a zipped PyTorch .pth file. It was chosen at training iteration 600. The model weights can be imported for use of the fiber tip type detection neural network in Python.
Resource Software Recommended: Google Colab, url: https://research.google.com/colaboratory/
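The network architecture that consumes these weights is defined by the authors' training code; as a hedged illustration only, the released checkpoint can be inspected with plain PyTorch once model_weights.zip is extracted (the .pth file name inside the archive is assumed):

import torch

checkpoint = torch.load("model_weights.pth", map_location="cpu")  # file name inside the zip is assumed
state_dict = checkpoint.get("state_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint
for name, value in list(state_dict.items())[:10]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)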
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains data generated and used for classification in the publication: - Duthé, Gregory, Imad Abdallah, Sarah Barber, and Eleni Chatzi. 2021. “Modelling and Monitoring Erosion of the Leading Edge of Wind Turbine Blades.” engrXiv. September 1. doi:10.31224/osf.io/mcg75. (https://engrxiv.org/mcg75)
The data is generated via OpenFAST aeroelastic simulations coupled with a Non-Homogeneous Compound Poisson Process for degradation modelling and was used to train a Transformer deep learning model. Each sample is a multivariate time-series of length 60'000, with the following 4 channels extracted from the simulations for a section at the tip of the blade:
Inflow velocity
Angle of attack
Lift coefficient
Drag coefficient
Please see the publication above for more information, as well as the included readme for information about the data and an example of how to load it into PyTorch.
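Purely as a hedged sketch of the expected layout (one sample = a 60'000-step multivariate series with the four channels listed above), the data can be batched with standard PyTorch utilities; the tensors below are random placeholders, and the included readme shows the actual loading procedure:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder tensors: (num_samples, 60000 time steps, 4 channels) plus per-sample labels.
x = torch.randn(16, 60000, 4)
y = torch.randint(0, 2, (16,))

loader = DataLoader(TensorDataset(x, y), batch_size=4, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)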
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset for training the PyTorch Graph Network Simulator. https://github.com/geoelements/gns. The repository contains the datasets for the water drop sample.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains three files, listed below. The Kolmogorov flow is generated using a spectral solver, available at: https://github.com/google/jax-cfd. The Kelvin-Helmholtz instability is generated using an in-house code.
Case 1: Kolmogorov Flow
nu_0p0045_2500_8f_uv_128.pt -- a PyTorch tensor containing 2500 eight-frame videos of a 2D Re=222 forced turbulent flow (Kolmogorov flow), with only velocity vectors provided. The first 2000 samples are used as training data, the next 450 are used for validation and the final 50 are used to test the model, after training.
Case 2: Kelvin-Helmholtz Instability
Training and validation: kh_8f_72_208_r34568.pt -- a PyTorch tensor containing 1000 eight-frame videos of a Kelvin-Helmholtz instability flow from 5 realisations of the flow (i.e. initialised from different random seeds). Each two hundred videos are from one simulation; the last two hundred may be used as a validation set.
Testing: kh_8f_72_208_r9.pt -- a PyTorch tensor containing 200 eight-frame videos of a Kelvin-Helmholtz instability flow from a realisation of the flow different to the above. This is used as the test set for a model trained on kh_8f_72_208_r34568.pt.
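A minimal sketch of loading the Kolmogorov-flow file and applying the train/validation/test split described above (the tensor's exact shape beyond the 2500-sample count is an assumption):

import torch

videos = torch.load("nu_0p0045_2500_8f_uv_128.pt", map_location="cpu")  # 2500 eight-frame velocity videos
print(videos.shape)

train, val, test = videos[:2000], videos[2000:2450], videos[2450:]
print(len(train), len(val), len(test))  # 2000 450 50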
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This document contains the scripts and dataset used for the paper "Unsupervised learning for structure detection in plastically deformed crystals".
More precisely, it contains 4 folders:
DumpForFigures : subfolder containing the atomic positions in .dump format (see lammps documentation) used for the article figures.
DumpForTraining : subfolder containing the atomic positions in .dump format (see lammps documentation) used for training the autoencoder (a minimal parsing sketch is given after this list).
ScriptsToDetectStructuresFromDump : subfolder containing the scripts used to detect the substructures of the system by combining autoencoder and clustering methods. This folder contains a readme with the details of the contents.
ScriptToGenerateDump : subfolder containing the scripts used to generate the atomic data with molecular dynamics. These data are then used to train the autoencoder. This folder contains a readme with the details of the contents.
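For readers unfamiliar with the LAMMPS .dump text format, here is a hedged, standalone parsing sketch (not part of the repository's scripts); it assumes a single snapshot per file whose ITEM: ATOMS columns include x, y and z:

import numpy as np

def read_dump_positions(path):
    # Parse the "ITEM: ATOMS ..." section of a LAMMPS dump snapshot into an (N, 3) array.
    with open(path) as f:
        lines = f.read().splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith("ITEM: ATOMS"))
    cols = lines[start].split()[2:]                       # e.g. ['id', 'type', 'x', 'y', 'z']
    data = np.array([line.split() for line in lines[start + 1:]], dtype=float)
    return data[:, [cols.index(c) for c in ("x", "y", "z")]]

# positions = read_dump_positions("DumpForTraining/example.dump")  # illustrative path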
REQUIREMENTS:
Lammps
Python 3 with packages:
-numpy
-matplotlib
-pyscal
-scikit-learn
-pytorch
-glob
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset used for pre-training in "ReasonBERT: Pre-trained to Reason with Distant Supervision", EMNLP'21.
There are two files:
sentence_pairs_for_pretrain_no_tokenization.tar.gz -> contains only sentences as evidence (Text-only)
table_pairs_for_pretrain_no_tokenization.tar.gz -> at least one piece of evidence is a table (Hybrid)
The data is chunked into multiple tar files for easy loading. We use WebDataset, a PyTorch Dataset (IterableDataset) implementation providing efficient sequential/streaming data access.
For pre-training code, or if you have any questions, please check our GitHub repo https://github.com/sunlab-osu/ReasonBERT
Below is a sample code snippet to load the data
import webdataset as wds
url = './sentence_multi_pairs_for_pretrain_no_tokenization/{000000...000763}.tar'
dataset = (
    wds.Dataset(url)
    .shuffle(1000)      # cache 1000 samples and shuffle
    .decode()
    .to_tuple("json")
    .batched(20)        # group every 20 examples into a batch
)
Below we show how the data is organized with two examples.
Text-only
{
  's1_text': 'Sils is a municipality in the comarca of Selva, in Catalonia, Spain.', # query sentence
  's1_all_links': {
    'Sils,_Girona': [[0, 4]],
    'municipality': [[10, 22]],
    'Comarques_of_Catalonia': [[30, 37]],
    'Selva': [[41, 46]],
    'Catalonia': [[51, 60]]
  }, # list of entities and their mentions in the sentence (start, end location)
  'pairs': [ # other sentences that share a common entity pair with the query, grouped by shared entity pairs
    {
      'pair': ['Comarques_of_Catalonia', 'Selva'], # the common entity pair
      's1_pair_locs': [[[30, 37]], [[41, 46]]], # mentions of the entity pair in the query
      's2s': [ # list of other sentences that contain the common entity pair, i.e. the evidence
        {
          'md5': '2777e32bddd6ec414f0bc7a0b7fea331',
          'text': 'Selva is a coastal comarque (county) in Catalonia, Spain, located between the mountain range known as the Serralada Transversal or Puigsacalm and the Costa Brava (part of the Mediterranean coast). Unusually, it is divided between the provinces of Girona and Barcelona, with Fogars de la Selva being part of Barcelona province and all other municipalities falling inside Girona province. Also unusually, its capital, Santa Coloma de Farners, is no longer among its larger municipalities, with the coastal towns of Blanes and Lloret de Mar having far surpassed it in size.',
          's_loc': [0, 27], # in addition to the sentence containing the common entity pair, we also keep its surrounding context; 's_loc' is the start/end location of the actual evidence sentence
          'pair_locs': [ # mentions of the entity pair in the evidence
            [[19, 27]], # mentions of entity 1
            [[0, 5], [288, 293]] # mentions of entity 2
          ],
          'all_links': {
            'Selva': [[0, 5], [288, 293]],
            'Comarques_of_Catalonia': [[19, 27]],
            'Catalonia': [[40, 49]]
          }
        },
        ... # there are multiple evidence sentences
      ]
    },
    ... # there are multiple entity pairs in the query
  ]
}
Hybrid
{
  's1_text': 'The 2006 Major League Baseball All-Star Game was the 77th playing of the midseason exhibition baseball game between the all-stars of the American League (AL) and National League (NL), the two leagues comprising Major League Baseball.',
  's1_all_links': {...}, # same as text-only
  'sentence_pairs': [{'pair': ..., 's1_pair_locs': ..., 's2s': [...]}], # same as text-only
  'table_pairs': [
    {
      'tid': 'Major_League_Baseball-1',
      'text': [
        ['World Series Records', 'World Series Records', ...],
        ['Team', 'Number of Series won', ...],
        ['St. Louis Cardinals (NL)', '11', ...],
        ...
      ], # table content, list of rows
      'index': [
        [[0, 0], [0, 1], ...],
        [[1, 0], [1, 1], ...],
        ...
      ], # index of each cell [row_id, col_id]; we keep only a table snippet, but the index here is from the original table
      'value_ranks': [
        [0, 0, ...],
        [0, 0, ...],
        [0, 10, ...],
        ...
      ], # if the cell contains a numeric value/date, this is its rank ordered from small to large, following TAPAS
      'value_inv_ranks': [], # inverse rank
      'all_links': {
        'St._Louis_Cardinals': {
          '2': [ # list of mentions in the second row; the key is the row_id
            [[2, 0], [0, 19]] # [[row_id, col_id], [start, end]]
          ]
        },
        'CARDINAL:11': {'2': [[[2, 1], [0, 2]]], '8': [[[8, 3], [0, 2]]]}
      },
      'name': '', # table name, if it exists
      'pairs': {
        'pair': ['American_League', 'National_League'],
        's1_pair_locs': [[[137, 152]], [[162, 177]]], # mentions in the query
        'table_pair_locs': {
          '17': [ # mentions of the entity pair in row 17
            [
              [[17, 0], [3, 18]],
              [[17, 1], [3, 18]],
              [[17, 2], [3, 18]],
              [[17, 3], [3, 18]]
            ], # mentions of the first entity
            [
              [[17, 0], [21, 36]],
              [[17, 1], [21, 36]]
            ] # mentions of the second entity
          ]
        }
      }
    }
  ]
}
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Name: Cellpose model for Digital Phase Contrast images
Data type: Cellpose model, trained via transfer learning from ‘cyto’ model.
Training Dataset: Light microscopy (Digital Phase Contrast) and Manual annotations (10.5281/zenodo.5996883)
Training Procedure: The model was trained using Cellpose version 0.6.5 with GPU support (NVIDIA GeForce RTX 2080), using default settings as per the Cellpose documentation:
python -m cellpose --train --dir TRAINING/DATASET/PATH/train --test_dir TRAINING/DATASET/PATH/test --pretrained_model cyto --chan 0 --chan2 0
The model file (MODEL NAME) in this repository is the result of this training.
Prediction Procedure: Using this model, a label image can be obtained from new unseen images in a given folder with
python -m cellpose --dir NEW/DATASET/PATH --pretrained_model FULL_MODEL_PATH --chan 0 --chan2 0 --save_tif --no_npy
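Equivalently, the prediction step can be scripted through the Cellpose Python API (a hedged sketch; the image path and file type are placeholders, and FULL_MODEL_PATH refers to the model file from this repository as in the CLI call above):

from cellpose import io, models

image = io.imread("NEW/DATASET/PATH/example.tif")   # placeholder image from the new dataset
model = models.CellposeModel(gpu=False, pretrained_model="FULL_MODEL_PATH")
masks, flows, styles = model.eval(image, channels=[0, 0])
print(masks.shape, masks.max())                      # label image: one integer id per detected cell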
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
pytorch-image-models metrics
This dataset contains metrics about the huggingface/pytorch-image-models package.
Number of repositories in the dataset: 3615
Number of packages in the dataset: 89
Package dependents
This contains the data available in the used-by tab on GitHub.
Package & Repository star count
This section shows the package and repository star count, individually.
Package Repository
There are 18 packages that have more than 1000… See the full description on the dataset page: https://huggingface.co/datasets/open-source-metrics/pytorch-image-models-dependents.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Ariel
Released under CC0: Public Domain