The dataset for pytorch transfer learning tutorial. https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Samples in this benchmark were generated by RELAI using the following data source(s):
Data Source Name: pytorch
Data Source Link: https://pytorch.org/docs/stable/index.html
Data Source License: https://github.com/pytorch/pytorch/blob/main/LICENSE
Data Source Authors: PyTorch
AI Benchmarks by Data Agents. 2025 RELAI.AI. Licensed under CC BY 4.0. Source: https://relai.ai
https://spdx.org/licenses/CC0-1.0.html
The methods of geometric morphometrics are commonly used to quantify morphology in a broad range of biological sciences. The application of these methods to large datasets is constrained by manual landmark placement, which limits the number of landmarks and introduces observer bias. To move the field forward, we need to automate morphological phenotyping in ways that capture comprehensive representations of morphological variation with minimal observer bias. Here, we present Morphological Variation Quantifier (morphVQ), a shape analysis pipeline for quantifying, analyzing, and exploring shape variation in the functional domain. morphVQ uses descriptor learning to estimate the functional correspondence between whole triangular meshes in lieu of landmark configurations. With functional maps between pairs of specimens in a dataset, we can analyze and explore shape variation. morphVQ uses Consistent ZoomOut refinement to improve these functional maps and to produce a new representation of shape variation: area-based and conformal (angular) latent shape space differences (LSSDs). We compare this new representation of shape variation to shape variables obtained via manual digitization and auto3DGM, an existing approach to automated morphological phenotyping. We find that LSSDs compare favorably to modern 3DGM and auto3DGM while being more computationally efficient. By characterizing whole surfaces, our method incorporates more morphological detail in shape analysis. We can classify known biological groupings, such as genus affiliation, with comparable accuracy. The shape spaces produced by our method are similar to those produced by modern 3DGM and by auto3DGM, and distinctiveness functions derived from LSSDs show how shape variation differs between groups. morphVQ can capture shape in an automated fashion while avoiding the limitations of manually digitized landmarks and thus represents a novel and computationally efficient addition to the geometric morphometrics toolkit.
Methods
The main dataset consists of 102 triangular meshes from laser surface scans of hominoid cuboid bones. These cuboids were from wild-collected individuals housed in the American Museum of Natural History, the National Museum of Natural History, the Harvard Museum of Comparative Biology, and the Field Museum. Hylobates, Pongo, Gorilla, Pan, and Homo are all well represented. Each triangular mesh is denoised, remeshed, and cleaned using the Geomagic Studio Wrap software. The resulting meshes vary in vertex count/resolution from 2,000 to 390,000. Each mesh is then upsampled or decimated to an even 12,000 vertices using the recursive subdivision process and quadric decimation algorithm implemented in VTK's Python bindings. The first of the two smaller datasets comprises 26 hominoid medial cuneiform meshes isolated from laser surface scans obtained from the same museum collections listed above. The second dataset comprises 33 mouse humeri meshes from micro-CT data (34.5 μm resolution using a Skyscan 1172). These datasets were processed identically to the 102 hominoid cuboid meshes introduced above.
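A minimal sketch of the vertex-count normalisation step described above, assuming meshes stored as PLY files and a target of 12,000 vertices; this only illustrates VTK's quadric decimation in Python and is not the authors' pipeline code (meshes below the target would instead be upsampled by recursive subdivision):

import vtk

# Load one cleaned mesh (the file name is a placeholder).
reader = vtk.vtkPLYReader()
reader.SetFileName("cuboid.ply")
reader.Update()
mesh = reader.GetOutput()

# Decimate towards ~12,000 vertices with quadric decimation.
target_vertices = 12000
reduction = max(1.0 - target_vertices / mesh.GetNumberOfPoints(), 0.0)
decimate = vtk.vtkQuadricDecimation()
decimate.SetInputData(mesh)
decimate.SetTargetReduction(reduction)
decimate.Update()
print(decimate.GetOutput().GetNumberOfPoints())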
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
In recent years, neural networks have evolved from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature, and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches for (i) model analysis, (ii) discovering unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47'360 unique neural network models resulting in over 2'415'360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and provide benchmarks for multiple downstream tasks as mentioned before.
Dataset
This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from Fashion-MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.
This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "fmnist_"), and preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.
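As a rough, hedged illustration (the official loading code lives at www.modelzoos.cc), the index_dict.json files can be inspected with the standard library to see how the vectorized models are laid out; the exact keys inside the file are not specified here:

import json

with open("index_dict.json") as f:
    index_dict = json.load(f)

# Print the top-level keys describing how the vectorized models are to be read.
print(list(index_dict.keys()))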
For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Mark Peng
Released under CC0: Public Domain
A dataset containing a sample event inspired by ProtoDUNE-SP simulation.
Checkpoints of trained DUNEdn package models used for the Springer original article.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
SloNER is a model for Slovenian Named Entity Recognition. It is a PyTorch neural network model, intended for use with the HuggingFace transformers library (https://github.com/huggingface/transformers).
The model is based on the Slovenian RoBERTa contextual embeddings model SloBERTa 2.0 (http://hdl.handle.net/11356/1397). The model was trained on the SUK 1.0 training corpus (http://hdl.handle.net/11356/1747). The source code of the model is available in the GitHub repository https://github.com/clarinsi/SloNER.
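A minimal usage sketch with the HuggingFace transformers library, assuming the model files have been downloaded to a local directory; the path and the example sentence are placeholders, and the exact label set and scripts are documented in the SloNER repository:

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_dir = "path/to/SloNER"  # placeholder for the downloaded model directory
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForTokenClassification.from_pretrained(model_dir)

ner = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(ner("France Prešeren se je rodil v Vrbi."))  # example Slovenian sentence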
This dataset was created by tonyguo
https://creativecommons.org/publicdomain/zero/1.0/
We have created a 102 category dataset consisting of 102 flower categories. The flowers chosen are those commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. The details of the categories and the number of images for each class can be found on the category statistics page.
The images have large scale, pose and light variations. In addition, there are categories that have large variations within the category and several very similar categories. The dataset is visualized using isomap with shape and colour features.
> dataset
> train
> valid
> test
- cat_to_name.json
- README.md
- sample_submission.csv
We visualize the categories in the dataset using SIFT features as shape descriptors and HSV values as the colour descriptor. The images are randomly sampled from each category.
Example visualization: https://i.imgur.com/Tl6TKUC.png
Nilsback, M-E. and Zisserman, A.
Automated flower classification over a large number of classes
Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing (2008)
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This project is a PyTorch implementation of a Faster R-CNN for fruit detection, suitable for multi-modal images (up to 5 channels). It is based on the implementation jwyang/faster_rcnn.pytorch, developed with PyTorch and NumPy. This implementation has been used to train and test on the KFuji RGB-DS dataset, which contains images with 3 different modalities: colour (RGB), depth (D), and range-corrected intensity signal (S).
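The project's own network builds on jwyang/faster_rcnn.pytorch; purely as an illustration of handling 5-channel input, the hedged sketch below adapts a torchvision Faster R-CNN stem to accept 5 channels (the model choice, class count, and normalization statistics are assumptions, not the project's configuration):

import torch
import torch.nn as nn
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None, num_classes=2)  # background + fruit

# Replace the 3-channel stem convolution with a 5-channel one (e.g. RGB + D + S).
old_conv = model.backbone.body.conv1
model.backbone.body.conv1 = nn.Conv2d(
    5, old_conv.out_channels,
    kernel_size=old_conv.kernel_size, stride=old_conv.stride,
    padding=old_conv.padding, bias=False,
)

# Extend the per-channel normalization statistics used by the detection transform.
model.transform.image_mean = [0.485, 0.456, 0.406, 0.5, 0.5]
model.transform.image_std = [0.229, 0.224, 0.225, 0.25, 0.25]

images = [torch.rand(5, 480, 640)]
targets = [{"boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0]]), "labels": torch.tensor([1])}]
model.train()
print(model(images, targets))  # dict of training losses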
This dataset was created by Richard Luo
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Methods
Cotton plants were grown in a well-controlled greenhouse in the NC State Phytotron as described previously (Pierce et al., 2019). Flowers were tagged on the day of anthesis and harvested three days post anthesis (3 DPA). The distinct fiber shapes had already formed by 2 DPA (Stiff and Haigler, 2016; Graham and Haigler, 2021), and fibers were still relatively short at 3 DPA, which facilitated the visualization of multiple fiber tips in one image.
Cotton fiber sample preparation, digital image collection, and image analysis: Ovules with attached fiber were fixed in the greenhouse. The fixative previously used (Histochoice) (Stiff and Haigler, 2016; Pierce et al., 2019; Graham and Haigler, 2021) is obsolete, which led to testing and validation of another low-toxicity, formalin-free fixative (#A5472; Sigma-Aldrich, St. Louis, MO; Fig. S1). The boll wall was removed without damaging the ovules. (Using a razor blade, cut away the top 3 mm of the boll. Make about 1 mm deep longitudinal incisions between the locule walls, and finally cut around the base of the boll.) All of the ovules with attached fiber were lifted out of the locules and fixed (1 h, RT, 1:10 tissue:fixative ratio) prior to optional storage at 4°C. Immediately before imaging, ovules were examined under a stereo microscope (incident light, black background, 31X) to select three vigorous ovules from each boll while avoiding drying. Ovules were rinsed (3 x 5 min) in buffer [0.05 M PIPES, 12 mM EGTA, 5 mM EDTA and 0.1% (w/v) Tween 80, pH 6.8], which had lower osmolarity than a microtubule-stabilizing buffer used previously for aldehyde-fixed fibers (Seagull, 1990; Graham and Haigler, 2021). While steadying an ovule with forceps, one to three small pieces of its chalazal end with attached fibers were dissected away using a small knife (#10055-12; Fine Science Tools, Foster City, CA). Each ovule piece was placed in a single well of a 24-well slide (#63430-04; Electron Microscopy Sciences, Hatfield, PA) containing a single drop of buffer prior to applying and sealing a 24 x 60 mm coverslip with vaseline. Samples were imaged with brightfield optics and default settings for the 2.83 mega-pixel, color, CCD camera of the Keyence BZ-X810 imaging system (www.keyence.com; housed in the Cellular and Molecular Imaging Facility of NC State). The location of each sample in the 24-well slides was identified visually using a 2X objective and mapped using the navigation function of the integrated Keyence software. Using the 10X objective lens (plan-apochromatic; NA 0.45) and 60% closed condenser aperture setting, a region with many fiber apices was selected for imaging using the multi-point and z-stack capture functions. The precise location was recorded by the software prior to visual setting of the limits of the z-plane range (1.2 µm step size). Typically, three 24-sample slides (representing three accessions) were set up in parallel prior to automatic image capture. The captured z-stacks for each sample were processed into one two-dimensional image using the full-focus function of the software. (Occasional samples contained too much debris for computer vision to be effective, and these were reimaged.)
Resources in this dataset:
Resource Title: Deltapine 90 - Manually Annotated Training Set. File Name: GH3 DP90 Keyence 1_45 JPEG.zip. Resource Description: These images were manually annotated in Labelbox.
Resource Title: Deltapine 90 - AI-Assisted Annotated Training Set. File Name: GH3 DP90 Keyence 46_101 JPEG.zip. Resource Description: These images were AI-labeled in Roboflow and then manually reviewed in Roboflow.
Resource Title: Deltapine 90 - Manually Annotated Training-Validation Set. File Name: GH3 DP90 Keyence 102_125 JPEG.zip. Resource Description: These images were manually labeled in Labelbox, and then used for training-validation for the machine learning model.
Resource Title: Phytogen 800 - Evaluation Test Images. File Name: Gb cv Phytogen 800.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Pima 3-79 - Evaluation Test Images. File Name: Gb cv Pima 379.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Pima S-7 - Evaluation Test Images. File Name: Gb cv Pima S7.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Coker 312 - Evaluation Test Images. File Name: Gh cv Coker 312.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Deltapine 90 - Evaluation Test Images. File Name: Gh cv Deltapine 90.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Half and Half - Evaluation Test Images. File Name: Gh cv Half and Half.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Fiber Tip Annotations - Manual. File Name: manual_annotations.coco_.json. Resource Description: Annotations in COCO JSON format for fibers. Manually annotated in Labelbox.
Resource Title: Fiber Tip Annotations - AI-Assisted. File Name: ai_assisted_annotations.coco_.json. Resource Description: Annotations in COCO JSON format for fibers. AI annotated with human review in Roboflow.
Resource Title: Model Weights (iteration 600). File Name: model_weights.zip. Resource Description: The final model, provided as a zipped PyTorch .pth file. It was chosen at training iteration 600. The model weights can be imported for use of the fiber tip type detection neural network in Python.
Resource Software Recommended: Google Colab, url: https://research.google.com/colaboratory/
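The network architecture that consumes these weights is defined by the authors' training code; as a hedged illustration only, the released checkpoint can be inspected with plain PyTorch once model_weights.zip is extracted (the .pth file name inside the archive is assumed):

import torch

checkpoint = torch.load("model_weights.pth", map_location="cpu")  # file name inside the zip is assumed
state_dict = checkpoint.get("state_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint
for name, value in list(state_dict.items())[:10]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)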
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains data generated and used for classification in the publication: - Duthé, Gregory, Imad Abdallah, Sarah Barber, and Eleni Chatzi. 2021. “Modelling and Monitoring Erosion of the Leading Edge of Wind Turbine Blades.” engrXiv. September 1. doi:10.31224/osf.io/mcg75. (https://engrxiv.org/mcg75)
The data is generated via OpenFAST aeroelastic simulations coupled with a Non-Homogeneous Compound Poisson Process for degradation modelling and was used to train a Transformer deep learning model. Each sample is a multivariate time-series of length 60'000, with the following 4 channels extracted from the simulations for a section at the tip of the blade:
Inflow velocity
Angle of attack
Lift coefficient
Drag coefficient
Please see the publication above for more information, as well as the included readme for information about the data and an example of how to load it into PyTorch.
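Purely as a hedged sketch of the expected layout (one sample = a 60'000-step multivariate series with the four channels listed above), the data can be batched with standard PyTorch utilities; the tensors below are random placeholders, and the included readme shows the actual loading procedure:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder tensors: (num_samples, 60000 time steps, 4 channels) plus per-sample labels.
x = torch.randn(16, 60000, 4)
y = torch.randint(0, 2, (16,))

loader = DataLoader(TensorDataset(x, y), batch_size=4, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)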
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset for training the PyTorch Graph Network Simulator. https://github.com/geoelements/gns. The repository contains the datasets for the water drop sample.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains three files, listed below. The Kolmogorov flow is generated using a spectral solver, available at: https://github.com/google/jax-cfd. The Kelvin-Helmholtz instability is generated using an in-house code.
Case 1: Kolmogorov Flow
nu_0p0045_2500_8f_uv_128.pt -- a PyTorch tensor containing 2500 eight-frame videos of a 2D Re=222 forced turbulent flow (Kolmogorov flow), with only velocity vectors provided. The first 2000 samples are used as training data, the next 450 are used for validation and the final 50 are used to test the model, after training.
Case 2: Kelvin-Helmholtz Instability
Training and validation: kh_8f_72_208_r34568.pt -- a PyTorch tensor containing 1000 eight-frame videos of a Kelvin-Helmholtz instability flow from 5 realisations of the flow (i.e. initialised from different random seeds). Each two hundred videos are from one simulation; the last two hundred may be used as a validation set.
Testing: kh_8f_72_208_r9.pt -- a PyTorch tensor containing 200 eight-frame videos of a Kelvin-Helmholtz instability flow from a realisation of the flow different to the above. This is used as the test set for a model trained on kh_8f_72_208_r34568.pt.
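A minimal sketch of loading the Kolmogorov-flow file and applying the train/validation/test split described above (the tensor's exact shape beyond the 2500-sample count is an assumption):

import torch

videos = torch.load("nu_0p0045_2500_8f_uv_128.pt", map_location="cpu")  # 2500 eight-frame velocity videos
print(videos.shape)

train, val, test = videos[:2000], videos[2000:2450], videos[2450:]
print(len(train), len(val), len(test))  # 2000 450 50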
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This document contains the scripts and dataset used for the paper "Unsupervised learning for structure detection in plastically deformed crystals".
More precisely, it contains 4 folders:
DumpForFigures : subfolder containing the atomic positions in .dump format (see lammps documentation) used for the article figures.
DumpForTraining : subfolder containing the atomic positions in .dump format (see lammps documentation) used for training the autoencoder (a minimal parsing sketch is given after this list).
ScriptsToDetectStructuresFromDump : subfolder containing the scripts used to detect the substructures of the system by combining autoencoder and clustering methods. This folder contains a readme with the details of the contents.
ScriptToGenerateDump : subfolder containing the scripts used to generate the atomic data with molecular dynamics. These data are then used to train the autoencoder. This folder contains a readme with the details of the contents.
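For readers unfamiliar with the LAMMPS .dump text format, here is a hedged, standalone parsing sketch (not part of the repository's scripts); it assumes a single snapshot per file whose ITEM: ATOMS columns include x, y and z:

import numpy as np

def read_dump_positions(path):
    # Parse the "ITEM: ATOMS ..." section of a LAMMPS dump snapshot into an (N, 3) array.
    with open(path) as f:
        lines = f.read().splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith("ITEM: ATOMS"))
    cols = lines[start].split()[2:]                       # e.g. ['id', 'type', 'x', 'y', 'z']
    data = np.array([line.split() for line in lines[start + 1:]], dtype=float)
    return data[:, [cols.index(c) for c in ("x", "y", "z")]]

# positions = read_dump_positions("DumpForTraining/example.dump")  # illustrative path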
REQUIREMENTS:
Lammps
Python 3 with packages:
-numpy
-matplotlib
-pyscal
-scikit-learn
-pytorch
-glob
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset used for pre-training in "ReasonBERT: Pre-trained to Reason with Distant Supervision", EMNLP'21.
There are two files:
sentence_pairs_for_pretrain_no_tokenization.tar.gz -> contains only sentences as evidence (Text-only)
table_pairs_for_pretrain_no_tokenization.tar.gz -> at least one piece of evidence is a table (Hybrid)
The data is chunked into multiple tar files for easy loading. We use WebDataset, a PyTorch Dataset (IterableDataset) implementation providing efficient sequential/streaming data access.
For pre-training code, or if you have any questions, please check our GitHub repo https://github.com/sunlab-osu/ReasonBERT
Below is a sample code snippet to load the data
import webdataset as wds
url = './sentence_multi_pairs_for_pretrain_no_tokenization/{000000...000763}.tar'
dataset = (
    wds.Dataset(url)
    .shuffle(1000)      # cache 1000 samples and shuffle
    .decode()
    .to_tuple("json")
    .batched(20)        # group every 20 examples into a batch
)
Below we show how the data is organized with two examples.
Text-only
{
  's1_text': 'Sils is a municipality in the comarca of Selva, in Catalonia, Spain.', # query sentence
  's1_all_links': {
    'Sils,_Girona': [[0, 4]],
    'municipality': [[10, 22]],
    'Comarques_of_Catalonia': [[30, 37]],
    'Selva': [[41, 46]],
    'Catalonia': [[51, 60]]
  }, # list of entities and their mentions in the sentence (start, end location)
  'pairs': [ # other sentences that share a common entity pair with the query, grouped by shared entity pairs
    {
      'pair': ['Comarques_of_Catalonia', 'Selva'], # the common entity pair
      's1_pair_locs': [[[30, 37]], [[41, 46]]], # mentions of the entity pair in the query
      's2s': [ # list of other sentences that contain the common entity pair, i.e. the evidence
        {
          'md5': '2777e32bddd6ec414f0bc7a0b7fea331',
          'text': 'Selva is a coastal comarque (county) in Catalonia, Spain, located between the mountain range known as the Serralada Transversal or Puigsacalm and the Costa Brava (part of the Mediterranean coast). Unusually, it is divided between the provinces of Girona and Barcelona, with Fogars de la Selva being part of Barcelona province and all other municipalities falling inside Girona province. Also unusually, its capital, Santa Coloma de Farners, is no longer among its larger municipalities, with the coastal towns of Blanes and Lloret de Mar having far surpassed it in size.',
          's_loc': [0, 27], # in addition to the sentence containing the common entity pair, we also keep its surrounding context; 's_loc' is the start/end location of the actual evidence sentence
          'pair_locs': [ # mentions of the entity pair in the evidence
            [[19, 27]], # mentions of entity 1
            [[0, 5], [288, 293]] # mentions of entity 2
          ],
          'all_links': {
            'Selva': [[0, 5], [288, 293]],
            'Comarques_of_Catalonia': [[19, 27]],
            'Catalonia': [[40, 49]]
          }
        },
        ... # there are multiple evidence sentences
      ]
    },
    ... # there are multiple entity pairs in the query
  ]
}
Hybrid
{
  's1_text': 'The 2006 Major League Baseball All-Star Game was the 77th playing of the midseason exhibition baseball game between the all-stars of the American League (AL) and National League (NL), the two leagues comprising Major League Baseball.',
  's1_all_links': {...}, # same as text-only
  'sentence_pairs': [{'pair': ..., 's1_pair_locs': ..., 's2s': [...]}], # same as text-only
  'table_pairs': [
    {
      'tid': 'Major_League_Baseball-1',
      'text': [
        ['World Series Records', 'World Series Records', ...],
        ['Team', 'Number of Series won', ...],
        ['St. Louis Cardinals (NL)', '11', ...],
        ...
      ], # table content, list of rows
      'index': [
        [[0, 0], [0, 1], ...],
        [[1, 0], [1, 1], ...],
        ...
      ], # index of each cell [row_id, col_id]; we keep only a table snippet, but the index here is from the original table
      'value_ranks': [
        [0, 0, ...],
        [0, 0, ...],
        [0, 10, ...],
        ...
      ], # if the cell contains a numeric value/date, this is its rank ordered from small to large, following TAPAS
      'value_inv_ranks': [], # inverse rank
      'all_links': {
        'St._Louis_Cardinals': {
          '2': [ # list of mentions in the second row; the key is the row_id
            [[2, 0], [0, 19]] # [[row_id, col_id], [start, end]]
          ]
        },
        'CARDINAL:11': {'2': [[[2, 1], [0, 2]]], '8': [[[8, 3], [0, 2]]]}
      },
      'name': '', # table name, if it exists
      'pairs': {
        'pair': ['American_League', 'National_League'],
        's1_pair_locs': [[[137, 152]], [[162, 177]]], # mentions in the query
        'table_pair_locs': {
          '17': [ # mentions of the entity pair in row 17
            [
              [[17, 0], [3, 18]],
              [[17, 1], [3, 18]],
              [[17, 2], [3, 18]],
              [[17, 3], [3, 18]]
            ], # mentions of the first entity
            [
              [[17, 0], [21, 36]],
              [[17, 1], [21, 36]]
            ] # mentions of the second entity
          ]
        }
      }
    }
  ]
}
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Name: Cellpose model for Digital Phase Contrast images
Data type: Cellpose model, trained via transfer learning from ‘cyto’ model.
Training Dataset: Light microscopy (Digital Phase Contrast) and Manual annotations (10.5281/zenodo.5996883)
Training Procedure: The model was trained using Cellpose version 0.6.5 with GPU support (NVIDIA GeForce RTX 2080), using default settings as per the Cellpose documentation:
python -m cellpose --train --dir TRAINING/DATASET/PATH/train --test_dir TRAINING/DATASET/PATH/test --pretrained_model cyto --chan 0 --chan2 0
The model file (MODEL NAME) in this repository is the result of this training.
Prediction Procedure: Using this model, a label image can be obtained from new unseen images in a given folder with
python -m cellpose --dir NEW/DATASET/PATH --pretrained_model FULL_MODEL_PATH --chan 0 --chan2 0 --save_tif --no_npy
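Equivalently, the prediction step can be scripted through the Cellpose Python API (a hedged sketch; the image path and file type are placeholders, and FULL_MODEL_PATH refers to the model file from this repository as in the CLI call above):

from cellpose import io, models

image = io.imread("NEW/DATASET/PATH/example.tif")   # placeholder image from the new dataset
model = models.CellposeModel(gpu=False, pretrained_model="FULL_MODEL_PATH")
masks, flows, styles = model.eval(image, channels=[0, 0])
print(masks.shape, masks.max())                      # label image: one integer id per detected cell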
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
pytorch-image-models metrics
This dataset contains metrics about the huggingface/pytorch-image-models package.
Number of repositories in the dataset: 3615
Number of packages in the dataset: 89
Package dependents
This contains the data available in the used-by tab on GitHub.
Package & Repository star count
This section shows the package and repository star count, individually.
Package Repository
There are 18 packages that have more than 1000… See the full description on the dataset page: https://huggingface.co/datasets/open-source-metrics/pytorch-image-models-dependents.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Ariel
Released under CC0: Public Domain