94 datasets found
  1. hymenoptera_data

    • kaggle.com
    zip
    Updated Sep 7, 2019
    Cite
    Jiwei Liu (2019). hymenoptera_data [Dataset]. https://www.kaggle.com/datasets/jiweiliu/hymenoptera-data
    Explore at:
    Available download formats: zip (47284419 bytes)
    Dataset updated
    Sep 7, 2019
    Authors
    Jiwei Liu
    Description

    Context

    The dataset for the PyTorch transfer learning tutorial: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
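    As a rough illustration (not part of the dataset description), the archive can be consumed the way the linked tutorial does, assuming the usual train/val ImageFolder layout with one folder per class:

    import torch
    from torchvision import datasets, models, transforms

    # standard ImageNet preprocessing; values are the usual torchvision defaults
    tfm = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    train_ds = datasets.ImageFolder("hymenoptera_data/train", tfm)  # extraction path assumed
    loader = torch.utils.data.DataLoader(train_ds, batch_size=4, shuffle=True)

    # transfer learning: reuse an ImageNet backbone and replace only the classifier head
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = torch.nn.Linear(model.fc.in_features, len(train_ds.classes))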

  2. pytorch-reasoning

    • huggingface.co
    + more versions
    Cite
    RELAI, pytorch-reasoning [Dataset]. https://huggingface.co/datasets/relai-ai/pytorch-reasoning
    Explore at:
    Dataset authored and provided by
    RELAI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Samples in this benchmark were generated by RELAI using the following data source(s):
    Data Source Name: pytorch
    Data Source Link: https://pytorch.org/docs/stable/index.html
    Data Source License: https://github.com/pytorch/pytorch/blob/main/LICENSE
    Data Source Authors: PyTorch
    AI Benchmarks by Data Agents. 2025 RELAI.AI. Licensed under CC BY 4.0. Source: https://relai.ai
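    If the benchmark is consumed through the Hugging Face datasets library, loading it might look like the sketch below (the repository id is taken from the URL above; the available splits and columns are not documented here and would need to be inspected):

    from datasets import load_dataset

    ds = load_dataset("relai-ai/pytorch-reasoning")   # repository id from the dataset URL
    print(ds)                                         # shows the splits and their columns
    first_split = next(iter(ds.values()))
    print(first_split[0])                             # peek at one sample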

  3. PyTorch geometric datasets for morphVQ models

    • data.niaid.nih.gov
    • dataone.org
    • +1more
    zip
    Updated Sep 29, 2022
    Cite
    Oshane Thomas; Hongyu Shen; Ryan L. Rauum; William E. H. Harcourt-Smith; John D. Polk; Mark Hasegawa-Johnson (2022). PyTorch geometric datasets for morphVQ models [Dataset]. http://doi.org/10.5061/dryad.bvq83bkcr
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 29, 2022
    Dataset provided by
    City University of New York
    University of Illinois Urbana-Champaign
    American Museum of Natural History
    Authors
    Oshane Thomas; Hongyu Shen; Ryan L. Rauum; William E. H. Harcourt-Smith; John D. Polk; Mark Hasegawa-Johnson
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    The methods of geometric morphometrics are commonly used to quantify morphology in a broad range of biological sciences. The application of these methods to large datasets is constrained by manual landmark placement limiting the number of landmarks and introducing observer bias. To move the field forward, we need to automate morphological phenotyping in ways that capture comprehensive representations of morphological variation with minimal observer bias. Here, we present Morphological Variation Quantifier (morphVQ), a shape analysis pipeline for quantifying, analyzing, and exploring shape variation in the functional domain. morphVQ uses descriptor learning to estimate the functional correspondence between whole triangular meshes in lieu of landmark configurations. With functional maps between pairs of specimens in a dataset, we can analyze and explore shape variation. morphVQ uses Consistent ZoomOut refinement to improve these functional maps and produce a new representation of shape variation: area-based and conformal (angular) latent shape space differences (LSSDs). We compare this new representation of shape variation to shape variables obtained via manual digitization and auto3DGM, an existing approach to automated morphological phenotyping. We find that LSSDs compare favorably to modern 3DGM and auto3DGM while being more computationally efficient. By characterizing whole surfaces, our method incorporates more morphological detail in shape analysis. We can classify known biological groupings, such as genus affiliation, with comparable accuracy. The shape spaces produced by our method are similar to those produced by modern 3DGM and to auto3DGM, and distinctiveness functions derived from LSSDs show us how shape variation differs between groups. morphVQ can capture shape in an automated fashion while avoiding the limitations of manually digitized landmarks and thus represents a novel and computationally efficient addition to the geometric morphometrics toolkit.

    Methods: The main dataset consists of 102 triangular meshes from laser surface scans of hominoid cuboid bones. These cuboids were from wild-collected individuals housed in the American Museum of Natural History, the National Museum of Natural History, the Harvard Museum of Comparative Biology, and the Field Museum. Hylobates, Pongo, Gorilla, Pan, and Homo are all well represented. Each triangular mesh is denoised, remeshed, and cleaned using the Geomagic Studio Wrap software. The resulting meshes vary in vertex count/resolution from 2,000 to 390,000. Each mesh is then upsampled or decimated to an even 12,000 vertices using the recursive subdivision process and quadric decimation algorithm implemented in VTK python. The first of the two smaller datasets comprises 26 hominoid medial cuneiform meshes isolated from laser surface scans obtained from the same museum collections listed above. The second dataset comprises 33 mouse humeri meshes from micro-CT data (34.5 μm resolution using a Skyscan 1172). These datasets were processed identically to the 102 hominoid cuboid meshes introduced above.

  4. Model Zoo: A Dataset of Diverse Populations of Neural Network Models - Fashion-MNIST

    • data.niaid.nih.gov
    Updated Jun 13, 2022
    + more versions
    Cite
    Schürholt, Konstantin (2022). Model Zoo: A Dataset of Diverse Populations of Neural Network Models - Fashion-MNIST [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6632104
    Explore at:
    Dataset updated
    Jun 13, 2022
    Dataset provided by
    Taskiran, Diyar
    Knyazev, Boris
    Borth, Damian
    Giró-i-Nieto, Xavier
    Schürholt, Konstantin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    In the last years, neural networks have evolved from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a "model zoo") would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches for (i) model analysis, (ii) discovering unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research about populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47'360 unique neural network models resulting in over 2'415'360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and provide benchmarks for multiple downstream tasks as mentioned before.

    Dataset

    This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from Fashion-MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.

    This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "fmnist_"), as well as preprocessed model zoos wrapped in a custom pytorch dataset class (filenames beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.

    For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
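    A minimal sketch of inspecting a downloaded zoo is shown below. File names here are only illustrative, and the preprocessed "dataset" files are pickled custom PyTorch dataset objects, so the class definitions from the code at www.modelzoos.cc must be importable before torch.load can restore them:

    import json
    import torch

    # index_dict.json documents how the vectorized models are laid out
    with open("index_dict.json") as f:
        index = json.load(f)
    print(list(index.keys()))

    # filename assumed; requires the custom dataset class from www.modelzoos.cc on the path
    zoo = torch.load("dataset.pt", map_location="cpu")
    print(type(zoo))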

  5. Data from: pytorch-lightning

    • kaggle.com
    Updated Nov 7, 2020
    Cite
    Mark Peng (2020). pytorch-lightning [Dataset]. https://www.kaggle.com/datasets/markpeng/pytorch-lightning/versions/1
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 7, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mark Peng
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Mark Peng

    Released under CC0: Public Domain

    Contents

  6. DUNEdn supporting data

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jun 1, 2022
    Cite
    Marco Rossi (2022). DUNEdn supporting data [Dataset]. http://doi.org/10.5281/zenodo.6599305
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jun 1, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marco Rossi
    Description

    A dataset containing a sample event inspired by ProtoDUNE-SP simulation, together with checkpoints of the trained DUNEdn package models used for the original Springer article.

  7. Data from: PyTorch model for Slovenian Named Entity Recognition SloNER 1.0

    • live.european-language-grid.eu
    Updated Jan 26, 2023
    Cite
    (2023). PyTorch model for Slovenian Named Entity Recognition SloNER 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/20980
    Explore at:
    Dataset updated
    Jan 26, 2023
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    SloNER is a model for Slovenian Named Entity Recognition. It is a PyTorch neural network model, intended for use with the HuggingFace transformers library (https://github.com/huggingface/transformers).

    The model is based on the Slovenian RoBERTa contextual embeddings model SloBERTa 2.0 (http://hdl.handle.net/11356/1397). The model was trained on the SUK 1.0 training corpus (http://hdl.handle.net/11356/1747). The source code of the model is available in the GitHub repository https://github.com/clarinsi/SloNER.

  8. pretrained pytorch

    • kaggle.com
    zip
    Updated Nov 24, 2019
    + more versions
    Cite
    tonyguo (2019). pretrained pytorch [Dataset]. https://www.kaggle.com/tony92151/pretrained-pytorch
    Explore at:
    Available download formats: zip (1287723819 bytes)
    Dataset updated
    Nov 24, 2019
    Authors
    tonyguo
    Description

    Dataset

    This dataset was created by tonyguo

    Contents

  9. Oxford 102 Flower Dataset

    • kaggle.com
    Updated May 26, 2021
    + more versions
    Cite
    Lalu Erfandi Maula Yusnu (2021). Oxford 102 Flower Dataset [Dataset]. https://www.kaggle.com/nunenuh/pytorch-challange-flower-dataset/tasks
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 26, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Lalu Erfandi Maula Yusnu
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview

    We have created a 102-category dataset, consisting of 102 flower categories. The flowers were chosen to be flowers commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. The details of the categories and the number of images for each class can be found on this category statistics page.

    The images have large scale, pose and light variations. In addition, there are categories that have large variations within the category and several very similar categories. The dataset is visualized using isomap with shape and colour features.

    Directory Structure

    > dataset
      > train
      > valid
      > test
    - cat_to_name.json
    - README.md
    - sample_submission.csv
    
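    For orientation, a hedged sketch of wiring this layout up with torchvision (folder names follow the listing above; the class-folder naming and transform choices are assumptions):

    import json
    from torchvision import datasets, transforms

    with open("dataset/cat_to_name.json") as f:
        cat_to_name = json.load(f)   # assumed to map class folder name -> flower name

    tfm = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])
    train_ds = datasets.ImageFolder("dataset/train", tfm)
    valid_ds = datasets.ImageFolder("dataset/valid", tfm)
    print(len(train_ds), len(valid_ds), cat_to_name[train_ds.classes[0]])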

    Visualization of the dataset

    We visualize the categories in the dataset using SIFT features as shape descriptors and HSV as colour descriptor. The images are randomly sampled from the category.

    Visualization: https://i.imgur.com/Tl6TKUC.png

    Publications

    Nilsback, M-E. and Zisserman, A. Automated flower classification over a large number of classes
    Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing (2008)

    Source

  10. Data from: RGBD_fruit_detection_faster-rcnn.pytorch

    • dataverse.csuc.cat
    bin, c, cu, sh +5
    Updated Jun 6, 2025
    Cite
    Jordi Gené Mola; Verónica Vilaplana Besler; Joan Ramon Rosell Polo; Josep Ramon Morros Rubió; Javier Ruiz Hidalgo; Eduard Gregorio López (2025). RGBD_fruit_detection_faster-rcnn.pytorch [Dataset]. http://doi.org/10.34810/data2331
    Explore at:
    Available download formats: text/x-python, c, cu, sh, txt, bin, text/x-matlab, text/markdown, text/plain (many individual source and binary files)
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    CORA.Repositori de Dades de Recerca
    Authors
    Jordi Gené Mola; Verónica Vilaplana Besler; Joan Ramon Rosell Polo; Josep Ramon Morros Rubió; Javier Ruiz Hidalgo; Eduard Gregorio López
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This project is a PyTorch implementation of a Faster R-CNN for fruit detection, suitable for multi-modal images (up to 5 channels). It is based on the implementation of jwyang/faster_rcnn.pytorch, developed with PyTorch + NumPy. This implementation has been used to train and test the KFuji RGB-DS dataset, which contains images with 3 different modalities: colour (RGB), depth (D), and range-corrected intensity signal (S).

  11. RSNA pytorch 3d data 128*128 2

    • kaggle.com
    Updated Sep 16, 2023
    Cite
    Richard Luo (2023). RSNA pytorch 3d data 128*128 2 [Dataset]. https://www.kaggle.com/datasets/richardlzluo/rsna-pytorch-3d-data-128128-2/data
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 16, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Richard Luo
    Description

    Dataset

    This dataset was created by Richard Luo

    Contents

  12. Data from: Efficient imaging and computer vision detection of two cell shapes in young cotton fibers

    • agdatacommons.nal.usda.gov
    • datasets.ai
    • +1more
    zip
    Updated Feb 21, 2024
    + more versions
    Cite
    Benjamin P. Graham; Jeremy Park; Grant Billings; Amanda M. Hulse-Kemp; Candace H. Haigler; Edgar Lobaton (2024). Data from: Efficient imaging and computer vision detection of two cell shapes in young cotton fibers [Dataset]. http://doi.org/10.15482/USDA.ADC/1528324
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 21, 2024
    Dataset provided by
    Ag Data Commons
    Authors
    Benjamin P. Graham; Jeremy Park; Grant Billings; Amanda M. Hulse-Kemp; Candace H. Haigler; Edgar Lobaton
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Methods: Cotton plants were grown in a well-controlled greenhouse in the NC State Phytotron as described previously (Pierce et al., 2019). Flowers were tagged on the day of anthesis and harvested three days post anthesis (3 DPA). The distinct fiber shapes had already formed by 2 DPA (Stiff and Haigler, 2016; Graham and Haigler, 2021), and fibers were still relatively short at 3 DPA, which facilitated the visualization of multiple fiber tips in one image.

    Cotton fiber sample preparation, digital image collection, and image analysis: Ovules with attached fiber were fixed in the greenhouse. The fixative previously used (Histochoice) (Stiff and Haigler, 2016; Pierce et al., 2019; Graham and Haigler, 2021) is obsolete, which led to testing and validation of another low-toxicity, formalin-free fixative (#A5472; Sigma-Aldrich, St. Louis, MO; Fig. S1). The boll wall was removed without damaging the ovules. (Using a razor blade, cut away the top 3 mm of the boll. Make about 1 mm deep longitudinal incisions between the locule walls, and finally cut around the base of the boll.) All of the ovules with attached fiber were lifted out of the locules and fixed (1 h, RT, 1:10 tissue:fixative ratio) prior to optional storage at 4°C. Immediately before imaging, ovules were examined under a stereo microscope (incident light, black background, 31X) to select three vigorous ovules from each boll while avoiding drying. Ovules were rinsed (3 x 5 min) in buffer [0.05 M PIPES, 12 mM EGTA, 5 mM EDTA and 0.1% (w/v) Tween 80, pH 6.8], which had lower osmolarity than a microtubule-stabilizing buffer used previously for aldehyde-fixed fibers (Seagull, 1990; Graham and Haigler, 2021). While steadying an ovule with forceps, one to three small pieces of its chalazal end with attached fibers were dissected away using a small knife (#10055-12; Fine Science Tools, Foster City, CA). Each ovule piece was placed in a single well of a 24-well slide (#63430-04; Electron Microscopy Sciences, Hatfield, PA) containing a single drop of buffer prior to applying and sealing a 24 x 60 mm coverslip with vaseline. Samples were imaged with brightfield optics and default settings for the 2.83 mega-pixel, color, CCD camera of the Keyence BZ-X810 imaging system (www.keyence.com; housed in the Cellular and Molecular Imaging Facility of NC State). The location of each sample in the 24-well slides was identified visually using a 2X objective and mapped using the navigation function of the integrated Keyence software. Using the 10X objective lens (plan-apochromatic; NA 0.45) and 60% closed condenser aperture setting, a region with many fiber apices was selected for imaging using the multi-point and z-stack capture functions. The precise location was recorded by the software prior to visual setting of the limits of the z-plane range (1.2 µm step size). Typically, three 24-sample slides (representing three accessions) were set up in parallel prior to automatic image capture. The captured z-stacks for each sample were processed into one two-dimensional image using the full-focus function of the software. (Occasional samples contained too much debris for computer vision to be effective, and these were reimaged.)

    Resources in this dataset:

    Resource Title: Deltapine 90 - Manually Annotated Training Set. File Name: GH3 DP90 Keyence 1_45 JPEG.zip. Resource Description: These images were manually annotated in Labelbox.

    Resource Title: Deltapine 90 - AI-Assisted Annotated Training Set. File Name: GH3 DP90 Keyence 46_101 JPEG.zip. Resource Description: These images were AI-labeled in RoboFlow and then manually reviewed in RoboFlow.

    Resource Title: Deltapine 90 - Manually Annotated Training-Validation Set. File Name: GH3 DP90 Keyence 102_125 JPEG.zip. Resource Description: These images were manually labeled in LabelBox, and then used for training-validation for the machine learning model.

    Resource Title: Phytogen 800 - Evaluation Test Images. File Name: Gb cv Phytogen 800.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.

    Resource Title: Pima 3-79 - Evaluation Test Images. File Name: Gb cv Pima 379.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.

    Resource Title: Pima S-7 - Evaluation Test Images. File Name: Gb cv Pima S7.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.

    Resource Title: Coker 312 - Evaluation Test Images. File Name: Gh cv Coker 312.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.

    Resource Title: Deltapine 90 - Evaluation Test Images. File Name: Gh cv Deltapine 90.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.

    Resource Title: Half and Half - Evaluation Test Images. File Name: Gh cv Half and Half.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.

    Resource Title: Fiber Tip Annotations - Manual. File Name: manual_annotations.coco_.json. Resource Description: Annotations in COCO.json format for fibers. Manually annotated in Labelbox.

    Resource Title: Fiber Tip Annotations - AI-Assisted. File Name: ai_assisted_annotations.coco_.json. Resource Description: Annotations in COCO.json format for fibers. AI annotated with human review in Roboflow.

    Resource Title: Model Weights (iteration 600). File Name: model_weights.zip. Resource Description: The final model, provided as a zipped PyTorch .pth file. It was chosen at training iteration 600. The model weights can be imported for use of the fiber tip type detection neural network in Python. Resource Software Recommended: Google Colab, url: https://research.google.com/colaboratory/
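    Since the network architecture is defined in the authors' code rather than in this listing, a hedged sketch can only unzip the archive and inspect the checkpoint; the inner .pth file name is an assumption:

    import zipfile
    import torch

    with zipfile.ZipFile("model_weights.zip") as z:
        z.extractall("model_weights")

    state = torch.load("model_weights/model_weights.pth", map_location="cpu")  # inner file name assumed
    if isinstance(state, dict):
        print(list(state.keys())[:10])   # parameter names hint at the underlying architecture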

  13. Data from: Aeroelastic simulations of wind turbines affected by leading edge erosion: datasets for multivariate time-series classification

    • data.niaid.nih.gov
    Updated Dec 17, 2021
    + more versions
    Cite
    Gregory Duthé (2021). Aeroelastic simulations of wind turbines affected by leading edge erosion: datasets for multivariate time-series classification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5544042
    Explore at:
    Dataset updated
    Dec 17, 2021
    Dataset provided by
    Sarah Barber
    Eleni Chatzi
    Gregory Duthé
    Imad Abdallah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains data generated and used for classification in the publication: - Duthé, Gregory, Imad Abdallah, Sarah Barber, and Eleni Chatzi. 2021. “Modelling and Monitoring Erosion of the Leading Edge of Wind Turbine Blades.” engrXiv. September 1. doi:10.31224/osf.io/mcg75. (https://engrxiv.org/mcg75)

    The data is generated via OpenFAST aeroelastic simulations coupled with a Non-Homogeneous Compound Poisson Process for degradation modelling and was used to train a Transformer deep learning model. Each sample is a multivariate time-series of length 60'000, with the following 4 channels extracted from the simulations for a section at the tip of the blade:

    Inflow velocity

    Angle of attack

    Lift coefficient

    Drag coefficient

    Please see the publication above for more information, as well as the included readme for details about the data and an example of how to load it into PyTorch.
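    Purely as an illustration of the sample layout described above (length-60'000 series with 4 channels), and not the readme's documented loading procedure, such samples could be wrapped for a PyTorch classifier like this:

    import numpy as np
    import torch
    from torch.utils.data import Dataset

    class ErosionSeriesDataset(Dataset):
        """Wraps samples of shape (60000, 4) with integer class labels; how they are loaded is assumed."""
        def __init__(self, samples, labels):
            self.x = torch.as_tensor(np.asarray(samples), dtype=torch.float32)
            self.y = torch.as_tensor(np.asarray(labels), dtype=torch.long)

        def __len__(self):
            return len(self.y)

        def __getitem__(self, idx):
            return self.x[idx], self.y[idx]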

  14. Graph Network Simulator PyTorch training dataset for water drop sample

    • dataverse.tdl.org
    bin, json
    Updated Apr 1, 2022
    Cite
    Krishna Kumar (2022). Graph Network Simulator PyTorch training dataset for water drop sample [Dataset]. http://doi.org/10.18738/T8/HUBMDM
    Explore at:
    Available download formats: bin (5933885), bin (7596095), json (365), bin (7174932)
    Dataset updated
    Apr 1, 2022
    Dataset provided by
    Texas Data Repository
    Authors
    Krishna Kumar
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Dataset for training the PyTorch Graph Network Simulator (https://github.com/geoelements/gns). The repository contains the datasets for the water drop sample.

  15. Turbulent Flow data as PyTorch tensors for ML: Kolmogorov Flow at Re=222, and Kelvin-Helmholtz instability

    • figshare.manchester.ac.uk
    zip
    Updated Jun 17, 2025
    Cite
    Mohammed Sardar; Alex Skillen (2025). Turbulent Flow data as PyTorch tensors for ML: Kolmogorov Flow at Re=222, and Kelvin-Helmholtz instability [Dataset]. http://doi.org/10.48420/29329565.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    University of Manchester
    Authors
    Mohammed Sardar; Alex Skillen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains three files, listed below. The Kolmogorov flow is generated using a spectral solver, available at: https://github.com/google/jax-cfd. The Kelvin-Helmholtz instability is generated using an in-house code.

    Case 1: Kolmogorov Flow
    nu_0p0045_2500_8f_uv_128.pt -- a PyTorch tensor containing 2500 eight-frame videos of a 2D Re=222 forced turbulent flow (Kolmogorov flow), with only velocity vectors provided. The first 2000 samples are used as training data, the next 450 are used for validation, and the final 50 are used to test the model after training.

    Case 2: Kelvin-Helmholtz Instability
    Training and validation: kh_8f_72_208_r34568.pt -- a PyTorch tensor containing 1000 eight-frame videos of a Kelvin-Helmholtz instability flow from 5 realisations of the flow (i.e. initialised from different random seeds). Each two hundred videos are from one simulation; the last two hundred may be used as a validation set.
    Testing: kh_8f_72_208_r9.pt -- a PyTorch tensor containing 200 eight-frame videos of a Kelvin-Helmholtz instability flow from a realisation of the flow different to the above. This is used as the test set for a model trained on kh_8f_72_208_r34568.pt.
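    A hedged sketch of loading the Kolmogorov flow tensor and applying the split described above (2000 training / 450 validation / 50 test samples along the first dimension):

    import torch

    videos = torch.load("nu_0p0045_2500_8f_uv_128.pt", map_location="cpu")
    train, val, test = videos[:2000], videos[2000:2450], videos[2450:]
    print(videos.shape, train.shape, val.shape, test.shape)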

  16. Data and scripts from "Unsupervised learning for structure detection in plastically deformed crystals"

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 31, 2023
    Cite
    BARBOT Armand; GATTI Riccardo (2023). Data and scripts from "Unsupervised learning for structure detection in plastically deformed crystals" [Dataset]. http://doi.org/10.5281/zenodo.7582668
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 31, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    BARBOT Armand; GATTI Riccardo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This document contains the scripts and datasets used for the paper "Unsupervised learning for structure detection in plastically deformed crystals".

    More precisely, it contains 4 folders:


    DumpForFigures : subfolder containing the atomic positions in .dump format (see lammps documentation) used for the article figures.

    DumpForTraining : subfolder containing the atomic positions in .dump format (see lammps documentation) used for training the autoencoder.

    ScriptsToDetectStructuresFromDump : subfolder containing the scripts used to detect the substructures of the system by combining autoencoder and clustering methods. This folder contains a readme with the details of the contents.

    ScriptToGenerateDump : subfolder containing the scripts used to generate the atomic data with molecular dynamics. These data are then used to train the autoencoder. This folder contains a readme with the details of the contents.

    REQUIREMENTS :

    Lammps

    Python3 with packages :

    -numpy

    -matplotlib

    -pyscal

    -scikit-learn

    -pytorch

    -glob

  17. Sentence/Table Pair Data from Wikipedia for Pre-training with Distant-Supervision

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 29, 2021
    Cite
    Cong Yu (2021). Sentence/Table Pair Data from Wikipedia for Pre-training with Distant-Supervision [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5612315
    Explore at:
    Dataset updated
    Oct 29, 2021
    Dataset provided by
    Alyssa Lees
    Xiang Deng
    You Wu
    Cong Yu
    Yu Su
    Huan Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset used for pre-training in "ReasonBERT: Pre-trained to Reason with Distant Supervision", EMNLP'21.

    There are two files:

    sentence_pairs_for_pretrain_no_tokenization.tar.gz -> Contains only sentences as evidence (Text-only)

    table_pairs_for_pretrain_no_tokenization.tar.gz -> At least one piece of evidence is a table (Hybrid)

    The data is chunked into multiple tar files for easy loading. We use WebDataset, a PyTorch Dataset (IterableDataset) implementation providing efficient sequential/streaming data access.

    For pre-training code, or if you have any questions, please check our GitHub repo https://github.com/sunlab-osu/ReasonBERT

    Below is a sample code snippet to load the data

    import webdataset as wds

    # path to the uncompressed files, should be a directory with a set of tar files
    url = './sentence_multi_pairs_for_pretrain_no_tokenization/{000000...000763}.tar'
    dataset = (
        wds.Dataset(url)
        .shuffle(1000)      # cache 1000 samples and shuffle
        .decode()
        .to_tuple("json")
        .batched(20)        # group every 20 examples into a batch
    )

    Please see the WebDataset documentation for more details about how to use it as a dataloader for PyTorch.

    You can also iterate through all examples and dump them in your preferred data format.

    Below we show how the data is organized with two examples.

    Text-only

    {
        's1_text': 'Sils is a municipality in the comarca of Selva, in Catalonia, Spain.',  # query sentence
        's1_all_links': {
            'Sils,_Girona': [[0, 4]],
            'municipality': [[10, 22]],
            'Comarques_of_Catalonia': [[30, 37]],
            'Selva': [[41, 46]],
            'Catalonia': [[51, 60]]
        },  # list of entities and their mentions in the sentence (start, end location)
        'pairs': [  # other sentences that share common entity pair with the query, group by shared entity pairs
            {
                'pair': ['Comarques_of_Catalonia', 'Selva'],  # the common entity pair
                's1_pair_locs': [[[30, 37]], [[41, 46]]],  # mention of the entity pair in the query
                's2s': [  # list of other sentences that contain the common entity pair, or evidence
                    {
                        'md5': '2777e32bddd6ec414f0bc7a0b7fea331',
                        'text': 'Selva is a coastal comarque (county) in Catalonia, Spain, located between the mountain range known as the Serralada Transversal or Puigsacalm and the Costa Brava (part of the Mediterranean coast). Unusually, it is divided between the provinces of Girona and Barcelona, with Fogars de la Selva being part of Barcelona province and all other municipalities falling inside Girona province. Also unusually, its capital, Santa Coloma de Farners, is no longer among its larger municipalities, with the coastal towns of Blanes and Lloret de Mar having far surpassed it in size.',
                        's_loc': [0, 27],  # in addition to the sentence containing the common entity pair, we also keep its surrounding context; 's_loc' is the start/end location of the actual evidence sentence
                        'pair_locs': [  # mentions of the entity pair in the evidence
                            [[19, 27]],  # mentions of entity 1
                            [[0, 5], [288, 293]]  # mentions of entity 2
                        ],
                        'all_links': {
                            'Selva': [[0, 5], [288, 293]],
                            'Comarques_of_Catalonia': [[19, 27]],
                            'Catalonia': [[40, 49]]
                        }
                    },
                    ...  # there are multiple evidence sentences
                ]
            },
            ...  # there are multiple entity pairs in the query
        ]
    }

    Hybrid

    {
        's1_text': 'The 2006 Major League Baseball All-Star Game was the 77th playing of the midseason exhibition baseball game between the all-stars of the American League (AL) and National League (NL), the two leagues comprising Major League Baseball.',
        's1_all_links': {...},  # same as text-only
        'sentence_pairs': [{'pair': ..., 's1_pair_locs': ..., 's2s': [...]}],  # same as text-only
        'table_pairs': [
            'tid': 'Major_League_Baseball-1',
            'text': [
                ['World Series Records', 'World Series Records', ...],
                ['Team', 'Number of Series won', ...],
                ['St. Louis Cardinals (NL)', '11', ...],
                ...
            ],  # table content, list of rows
            'index': [
                [[0, 0], [0, 1], ...],
                [[1, 0], [1, 1], ...],
                ...
            ],  # index of each cell [row_id, col_id]; we keep only a table snippet, but the index here is from the original table
            'value_ranks': [
                [0, 0, ...],
                [0, 0, ...],
                [0, 10, ...],
                ...
            ],  # if the cell contains a numeric value/date, this is its rank ordered from small to large, following TAPAS
            'value_inv_ranks': [],  # inverse rank
            'all_links': {
                'St._Louis_Cardinals': {
                    '2': [
                        [[2, 0], [0, 19]],  # [[row_id, col_id], [start, end]]
                    ]  # list of mentions in the second row, the key is row_id
                },
                'CARDINAL:11': {'2': [[[2, 1], [0, 2]]], '8': [[[8, 3], [0, 2]]]},
            },
            'name': '',  # table name, if exists
            'pairs': {
                'pair': ['American_League', 'National_League'],
                's1_pair_locs': [[[137, 152]], [[162, 177]]],  # mention in the query
                'table_pair_locs': {
                    '17': [  # mention of entity pair in row 17
                        [
                            [[17, 0], [3, 18]],
                            [[17, 1], [3, 18]],
                            [[17, 2], [3, 18]],
                            [[17, 3], [3, 18]]
                        ],  # mention of the first entity
                        [
                            [[17, 0], [21, 36]],
                            [[17, 1], [21, 36]],
                        ]  # mention of the second entity
                    ]
                }
            }
        ]
    }

  18. Cellpose model for Digital Phase Contrast images

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Feb 21, 2022
    Cite
    Laura Capolupo; Olivier Burri; Romain Guiet (2022). Cellpose model for Digital Phase Contrast images [Dataset]. http://doi.org/10.5281/zenodo.6023317
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 21, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Laura Capolupo; Olivier Burri; Romain Guiet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Name: Cellpose model for Digital Phase Contrast images

    Data type: Cellpose model, trained via transfer learning from ‘cyto’ model.

    Training Dataset: Light microscopy (Digital Phase Contrast) and Manual annotations (10.5281/zenodo.5996883)

    Training Procedure: The model was trained using Cellpose version 0.6.5 with GPU support (NVIDIA GeForce RTX 2080) using default settings as per the Cellpose documentation:

    python -m cellpose --train --dir TRAINING/DATASET/PATH/train --test_dir TRAINING/DATASET/PATH/test --pretrained_model cyto --chan 0 --chan2 0

    The model file (MODEL NAME) in this repository is the result of this training.

    Prediction Procedure: Using this model, a label image can be obtained from new unseen images in a given folder with

    python -m cellpose --dir NEW/DATASET/PATH --pretrained_model FULL_MODEL_PATH --chan 0 --chan2 0 --save_tif --no_npy
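    For completeness, roughly the same prediction step through the cellpose Python API (argument names may differ between cellpose versions, and the model/image paths are placeholders):

    from cellpose import io, models

    model = models.CellposeModel(gpu=True, pretrained_model="FULL_MODEL_PATH")
    img = io.imread("NEW/DATASET/PATH/image.tif")
    masks, flows, styles = model.eval(img, channels=[0, 0])
    print(masks.shape, masks.max())   # label image: 0 = background, 1..N = object ids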

  19. pytorch-image-models-dependents

    • huggingface.co
    Updated Jun 16, 2023
    Cite
    Hugging Face OSS Metrics (2023). pytorch-image-models-dependents [Dataset]. https://huggingface.co/datasets/open-source-metrics/pytorch-image-models-dependents
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face OSS Metrics
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    pytorch-image-models metrics

    This dataset contains metrics about the huggingface/pytorch-image-models package.
    Number of repositories in the dataset: 3615
    Number of packages in the dataset: 89

      Package dependents
    

    This contains the data available in the used-by tab on GitHub.

      Package & Repository star count
    

    This section shows the package and repository star count, individually.

    Package Repository

    There are 18 packages that have more than 1000… See the full description on the dataset page: https://huggingface.co/datasets/open-source-metrics/pytorch-image-models-dependents.

  20. pytorch-bert-fold5

    • kaggle.com
    zip
    Updated Feb 7, 2020
    Cite
    Ariel (2020). pytorch-bert-fold5 [Dataset]. https://www.kaggle.com/arielfabiano/pytorch-bert-fold5
    Explore at:
    Available download formats: zip (4106200329 bytes)
    Dataset updated
    Feb 7, 2020
    Authors
    Ariel
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Ariel

    Released under CC0: Public Domain

    Contents
