This dataset was created by Luong Hoang Minh
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
SloNER is a model for Slovenian Named Entity Recognition. It is a PyTorch neural network model, intended for use with the HuggingFace transformers library (https://github.com/huggingface/transformers).
The model is based on the Slovenian RoBERTa contextual embeddings model SloBERTa 2.0 (http://hdl.handle.net/11356/1397). The model was trained on the SUK 1.0 training corpus (http://hdl.handle.net/11356/1747). The source code of the model is available in the GitHub repository https://github.com/clarinsi/SloNER.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Pre-processed dataset for working with the HydroNet dataset in PyTorch Geometric.
See:
This dataset was created by Sunghyun Jun
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Samples in this benchmark were generated by RELAI using the following data source(s):
Data Source Name: pytorch
Data Source Link: https://pytorch.org/docs/stable/index.html
Data Source License: https://github.com/pytorch/pytorch/blob/main/LICENSE
Data Source Authors: PyTorch
AI Benchmarks by Data Agents. 2025 RELAI.AI. Licensed under CC BY 4.0. Source: https://relai.ai
These datasets are customized Torch Geometric Datasets that contain raw .off polygon meshes as well as preprocessed .pt files needed for training morphVQ models. morphVQ can be found at https://github.com/oothomas/morphVQ.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Slovenian model for coreference resolution: a neural network based on a customized transformer architecture, usable with the code published on https://github.com/matejklemen/slovene-coreference-resolution. The model is based on the Slovenian CroSloEngual BERT 1.1 model (http://hdl.handle.net/11356/1330). It was trained on the SUK 1.0 training corpus (http://hdl.handle.net/11356/1747), specifically the SentiCoref subcorpus.
Using the evaluation setting where entity mentions are assumed to be correctly pre-detected, the model achieves the following metric values:
MUC: precision = 0.931, recall = 0.957, F1 = 0.943
BCubed: precision = 0.887, recall = 0.947, F1 = 0.914
CEAFe: precision = 0.945, recall = 0.893, F1 = 0.916
CoNLL-12: precision = 0.921, recall = 0.932, F1 = 0.924
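The CoNLL-12 row is consistent with the common convention of averaging the MUC, BCubed and CEAFe scores. The check below is only a sketch based on the figures quoted above; the names `scores` and `conll_average` are introduced here for illustration and do not come from the model's code base.

```python
# Reported scores copied from the description above: (precision, recall, F1).
scores = {
    "MUC": (0.931, 0.957, 0.943),
    "BCubed": (0.887, 0.947, 0.914),
    "CEAFe": (0.945, 0.893, 0.916),
}

def conll_average(scores):
    """Average precision, recall and F1 separately over the three metrics."""
    n = len(scores)
    columns = zip(*scores.values())  # precisions, recalls, F1 values
    return tuple(round(sum(column) / n, 3) for column in columns)

precision, recall, f1 = conll_average(scores)
print(precision, recall, f1)
```

The averaged values agree with the reported CoNLL-12 line to three decimals.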
Federated Learning Demonstrator MNIST Example (Version 1.0.1)
Bayesian inference has predominantly relied on the Markov chain Monte Carlo (MCMC) algorithm for many years. However, MCMC is computationally laborious, especially for complex phylogenetic models of time trees. This bottleneck has led to the search for alternatives, such as variational Bayes, which can scale better to large datasets. In this paper, we introduce torchtree, a framework written in Python that allows developers to easily implement rich phylogenetic models and algorithms using a fixed tree topology. One can either use automatic differentiation, or leverage torchtree's plug-in system to compute gradients analytically for model components for which automatic differentiation is slow. We demonstrate that the torchtree variational inference framework performs similarly to BEAST in terms of speed and approximation accuracy. Furthermore, we explore the use of the forward KL divergence as an optimizing criterion for variational inference, which can handle discontinuous and non-diffe...
torchtree: flexible phylogenetic model development and inference using PyTorch
Mathieu Fourment, Matthew Macaulay, Christiaan J Swanepoel, Xiang Ji, Marc A Suchard, Frederick A Matsen IV. torchtree: flexible phylogenetic model development and inference using PyTorch. arXiv:2406.18044 (2024)
The SI.pdf file contains supplementary methods and figures referenced in the main manuscript (found on Zenodo under Supplemental Information).
The data.zip contains input files and phylogenetic trees used for analyses in the associated manuscript. The data are organized by dataset (HCV and SC2) and by tool (beast and torchtree), and include sequence alignments (see next section for SC2 alignment) and configuration files (xml and json files). torchtree uses variational Bayes while BEAST uses MCMC.
data/
├── HCV/
│ ├── HCV.fasta # Sequence alignment for HCV
│ ├── HCV.tree # Newick ...,
A dataset containing a sample event inspired by ProtoDUNE-SP simulation.
Checkpoints of trained DUNEdn package models used for Springer original article.
Federated Learning Client Base Image (Version 1.0.1)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset used for pre-training in "ReasonBERT: Pre-trained to Reason with Distant Supervision", EMNLP'21.
There are two files:
sentence_pairs_for_pretrain_no_tokenization.tar.gz -> Contains only sentences as evidence (Text-only)
table_pairs_for_pretrain_no_tokenization.tar.gz -> At least one piece of evidence is a table (Hybrid)
The data is chunked into multiple tar files for easy loading. We use WebDataset, a PyTorch Dataset (IterableDataset) implementation providing efficient sequential/streaming data access.
For pre-training code, or if you have any questions, please check our GitHub repo https://github.com/sunlab-osu/ReasonBERT
Below is a sample code snippet to load the data:
import webdataset as wds
# Path to the uncompressed files; this should be a directory containing a set of tar files.
# Brace ranges are written with two dots, e.g. {000000..000763}.
url = './sentence_multi_pairs_for_pretrain_no_tokenization/{000000..000763}.tar'
dataset = (
    wds.WebDataset(url)  # older WebDataset releases exposed this class as wds.Dataset
    .shuffle(1000)  # keep a buffer of 1000 samples and shuffle within it
    .decode()
    .to_tuple("json")
    .batched(20)  # group every 20 examples into a batch
)
# Please see the WebDataset documentation for more details on how to use it as a dataloader for PyTorch.
# You can also iterate through all examples and dump them in your preferred data format.
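Because the shards are plain tar archives, the examples can also be dumped without WebDataset. The sketch below uses only the Python standard library and assumes that each sample stores its record in a member whose name ends in `.json` (inferred from the `"json"` key used above, not confirmed by the dataset authors); the function name `iter_json_records` is hypothetical.

```python
import json
import tarfile

def iter_json_records(shard_path):
    """Yield the decoded JSON record of every `*.json` member in one tar shard."""
    with tarfile.open(shard_path, "r") as shard:
        for member in shard:
            # Skip directories and any members that are not JSON payloads.
            if member.isfile() and member.name.endswith(".json"):
                with shard.extractfile(member) as handle:
                    yield json.load(handle)
```

Calling `iter_json_records` on one uncompressed shard file and writing each record with `json.dumps` would give a JSON Lines copy of that shard.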
Below we show how the data is organized with two examples.
Text-only
{'s1_text': 'Sils is a municipality in the comarca of Selva, in Catalonia, Spain.', # query sentence
's1_all_links': {
'Sils,_Girona': [[0, 4]],
'municipality': [[10, 22]],
'Comarques_of_Catalonia': [[30, 37]],
'Selva': [[41, 46]],
'Catalonia': [[51, 60]]
}, # list of entities and their mentions in the sentence (start, end location)
'pairs': [ # other sentences that share common entity pair with the query, group by shared entity pairs
{
'pair': ['Comarques_of_Catalonia', 'Selva'], # the common entity pair
's1_pair_locs': [[[30, 37]], [[41, 46]]], # mention of the entity pair in the query
's2s': [ # list of other sentences that contain the common entity pair, or evidence
{
'md5': '2777e32bddd6ec414f0bc7a0b7fea331',
'text': 'Selva is a coastal comarque (county) in Catalonia, Spain, located between the mountain range known as the Serralada Transversal or Puigsacalm and the Costa Brava (part of the Mediterranean coast). Unusually, it is divided between the provinces of Girona and Barcelona, with Fogars de la Selva being part of Barcelona province and all other municipalities falling inside Girona province. Also unusually, its capital, Santa Coloma de Farners, is no longer among its larger municipalities, with the coastal towns of Blanes and Lloret de Mar having far surpassed it in size.',
's_loc': [0, 27], # in addition to the sentence containing the common entity pair, we also keep its surrounding context. 's_loc' is the start/end location of the actual evidence sentence
'pair_locs': [ # mentions of the entity pair in the evidence
[[19, 27]], # mentions of entity 1
[[0, 5], [288, 293]] # mentions of entity 2
],
'all_links': {
'Selva': [[0, 5], [288, 293]],
'Comarques_of_Catalonia': [[19, 27]],
'Catalonia': [[40, 49]]
}
}
,...] # there are multiple evidence sentences
},
,...] # there are multiple entity pairs in the query
}
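The mention locations in the text-only example behave like Python slices, with an inclusive start and an exclusive end. The following sketch recovers the surface strings from the offsets shown above; the helper name `mention_texts` is introduced here for illustration only.

```python
def mention_texts(text, links):
    """Map each entity to the surface strings of its mentions.

    Offsets are treated as [start, end) character positions, i.e. Python slices.
    """
    return {
        entity: [text[start:end] for start, end in spans]
        for entity, spans in links.items()
    }

s1_text = "Sils is a municipality in the comarca of Selva, in Catalonia, Spain."
s1_all_links = {
    "Sils,_Girona": [[0, 4]],
    "municipality": [[10, 22]],
    "Comarques_of_Catalonia": [[30, 37]],
    "Selva": [[41, 46]],
    "Catalonia": [[51, 60]],
}

for entity, mentions in mention_texts(s1_text, s1_all_links).items():
    print(entity, "->", mentions)
```

Note that an entity name and its mention text can differ, as with `Comarques_of_Catalonia`, which is mentioned as "comarca".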
Hybrid
{'s1_text': 'The 2006 Major League Baseball All-Star Game was the 77th playing of the midseason exhibition baseball game between the all-stars of the American League (AL) and National League (NL), the two leagues comprising Major League Baseball.',
's1_all_links': {...}, # same as text-only
'sentence_pairs': [{'pair': ..., 's1_pair_locs': ..., 's2s': [...]}], # same as text-only
'table_pairs': [
'tid': 'Major_League_Baseball-1',
'text':[
['World Series Records', 'World Series Records', ...],
['Team', 'Number of Series won', ...],
['St. Louis Cardinals (NL)', '11', ...],
...] # table content, list of rows
'index':[
[[0, 0], [0, 1], ...],
[[1, 0], [1, 1], ...],
...] # index of each cell [row_id, col_id]. we keep only a table snippet, but the index here is from the original table.
'value_ranks':[
[0, 0, ...],
[0, 0, ...],
[0, 10, ...],
...] # if the cell contain numeric value/date, this is its rank ordered from small to large, follow TAPAS
'value_inv_ranks': [], # inverse rank
'all_links':{
'St._Louis_Cardinals': {
'2': [
[[2, 0], [0, 19]], # [[row_id, col_id], [start, end]]
] # list of mentions in the second row, the key is row_id
},
'CARDINAL:11': {'2': [[[2, 1], [0, 2]]], '8': [[[8, 3], [0, 2]]]},
}
'name': '', # table name, if exists
'pairs': {
'pair': ['American_League', 'National_League'],
's1_pair_locs': [[[137, 152]], [[162, 177]]], # mention in the query
'table_pair_locs': {
'17': [ # mention of entity pair in row 17
[
[[17, 0], [3, 18]],
[[17, 1], [3, 18]],
[[17, 2], [3, 18]],
[[17, 3], [3, 18]]
], # mention of the first entity
[
[[17, 0], [21, 36]],
[[17, 1], [21, 36]],
] # mention of the second entity
]
}
}
]
}
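Since the table is stored as a snippet while the cell indices refer to the original table, a mention has to be resolved through the 'index' field rather than by list position. The sketch below shows one way to do this on a hypothetical, fully spelled-out miniature of the table above (the elided cells and the row-8 mention are left out); the helper names are illustrative and not part of the released code.

```python
def cell_lookup(table):
    """Map original (row_id, col_id) pairs to the cell text kept in the snippet."""
    lookup = {}
    for text_row, index_row in zip(table["text"], table["index"]):
        for cell_text, (row_id, col_id) in zip(text_row, index_row):
            lookup[(row_id, col_id)] = cell_text
    return lookup

def table_mentions(table):
    """Return {entity: [mention strings]} for every link recorded in the table."""
    cells = cell_lookup(table)
    found = {}
    for entity, rows in table["all_links"].items():
        for mentions in rows.values():
            for (row_id, col_id), (start, end) in mentions:
                cell_text = cells[(row_id, col_id)]
                found.setdefault(entity, []).append(cell_text[start:end])
    return found

# Hypothetical miniature of the table example above.
table = {
    "text": [
        ["World Series Records", "World Series Records"],
        ["Team", "Number of Series won"],
        ["St. Louis Cardinals (NL)", "11"],
    ],
    "index": [
        [[0, 0], [0, 1]],
        [[1, 0], [1, 1]],
        [[2, 0], [2, 1]],
    ],
    "all_links": {
        "St._Louis_Cardinals": {"2": [[[2, 0], [0, 19]]]},
        "CARDINAL:11": {"2": [[[2, 1], [0, 2]]]},
    },
}
print(table_mentions(table))
```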
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Acute poisoning is a significant global health burden, and the causative agent is often unclear. The primary aim of this pilot study was to develop a deep learning algorithm that predicts the most probable agent a poisoned patient was exposed to from a pre-specified list of drugs. Data were queried from the National Poison Data System (NPDS) from 2014 through 2018 for eight single-agent poisonings (acetaminophen, diphenhydramine, aspirin, calcium channel blockers, sulfonylureas, benzodiazepines, bupropion, and lithium). Two deep neural networks (implemented in PyTorch and Keras) designed for multi-class classification tasks were applied. There were 201,031 single-agent poisonings included in the analysis. For distinguishing among the selected poisonings, the PyTorch model had a specificity of 97%, accuracy of 83%, precision of 83%, recall of 83%, and an F1-score of 82%. The Keras model had a specificity of 98%, accuracy of 83%, precision of 84%, recall of 83%, and an F1-score of 83%. The best performance was achieved in diagnosing single-agent poisoning by lithium, sulfonylureas, diphenhydramine, calcium channel blockers, and then acetaminophen, in PyTorch (F1-score = 99%, 94%, 85%, 83%, and 82%, respectively) and Keras (F1-score = 99%, 94%, 86%, 82%, and 82%, respectively). Deep neural networks can potentially help in distinguishing the causative agent of acute poisoning. This study used a small list of drugs, with polysubstance ingestions excluded. Reproducible source code and results can be obtained at https://github.com/ashiskb/npds-workspace.git.
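For readers unfamiliar with how such per-agent figures are derived in a multi-class setting, the sketch below computes one-vs-rest precision, recall, specificity and F1 from a confusion matrix. It is a generic illustration under stated assumptions, not the study's code: the counts are made up, and how the study averaged the per-class values is not stated here.

```python
def one_vs_rest_metrics(confusion, k):
    """Precision, recall, specificity and F1 for class k.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # class k predicted as something else
    fp = sum(row[k] for row in confusion) - tp   # other classes predicted as k
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, specificity, f1

# Made-up 3-class counts, purely for illustration (not NPDS data).
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]
for k in range(len(confusion)):
    print(k, one_vs_rest_metrics(confusion, k))
```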
This dataset was created by Ryazantsev Gleb
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains three files, listed below. The Kolmogorov flow is generated using a spectral solver, available at: https://github.com/google/jax-cfd. The Kelvin-Helmholtz Instability is generated using an in-house code.
Case 1: Kolmogorov Flow
nu_0p0045_2500_8f_uv_128.pt -- a PyTorch tensor containing 2500 eight-frame videos of a 2D Re=222 forced turbulent flow (Kolmogorov flow), with only velocity vectors provided. The first 2000 samples are used as training data, the next 450 are used for validation and the final 50 are used to test the model after training.
Case 2: Kelvin-Helmholtz Instability
Training and Validation: kh_8f_72_208_r34568.pt -- a PyTorch tensor containing 1000 eight-frame videos of a Kelvin-Helmholtz instability flow from 5 realisations of the flow (i.e. initialised from different random seeds). Each consecutive block of two hundred videos comes from one simulation; the last two hundred may be used as a validation set.
Testing: kh_8f_72_208_r9.pt -- a PyTorch tensor containing 200 eight-frame videos of a Kelvin-Helmholtz instability flow from a realisation of the flow different to the above. This is used as the test set for a model trained on kh_8f_72_208_r34568.pt.
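The splits described above amount to slicing along the sample axis. The sketch below assumes that the first axis of each tensor indexes the videos (not confirmed by the description) and uses a plain list as a stand-in so that it runs without the data files; in practice the tensors would presumably be loaded with `torch.load`. The function names are illustrative.

```python
def split_kolmogorov(samples):
    """Split the 2500 Kolmogorov-flow videos into train/validation/test."""
    if len(samples) != 2500:
        raise ValueError("expected 2500 samples")
    return samples[:2000], samples[2000:2450], samples[2450:]

def split_kelvin_helmholtz(samples, per_realisation=200):
    """Hold out the last realisation (last 200 videos) as the validation set."""
    return samples[:-per_realisation], samples[-per_realisation:]

# Stand-in for the tensors: any sequence sliced along its first axis behaves alike.
train, val, test = split_kolmogorov(list(range(2500)))
print(len(train), len(val), len(test))

kh_train, kh_val = split_kelvin_helmholtz(list(range(1000)))
print(len(kh_train), len(kh_val))
```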
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This document contains the scripts and dataset used for the paper "Unsupervised learning for structure detection in plastically deformed crystals".
More precisely, it contains four folders:
DumpForFigures: subfolder containing the atomic positions in .dump format (see the LAMMPS documentation) used for the article figures.
DumpForTraining: subfolder containing the atomic positions in .dump format (see the LAMMPS documentation) used for training the autoencoder.
ScriptsToDetectStructuresFromDump: subfolder containing the scripts used to detect the substructures of the system by combining autoencoder and clustering methods. This folder contains a readme with the details of the contents.
ScriptToGenerateDump: subfolder containing the scripts used to generate the atomic data with molecular dynamics. These data are then used to train the autoencoder. This folder contains a readme with the details of the contents.
REQUIREMENTS:
LAMMPS
Python 3 with packages:
-numpy
-matplotlib
-pyscal
-scikit-learn
-pytorch
-glob (part of the Python standard library)
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset for training the PyTorch Graph Network Simulator (https://github.com/geoelements/gns). The repository contains the datasets for the water drop sample.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Methods: Cotton plants were grown in a well-controlled greenhouse in the NC State Phytotron as described previously (Pierce et al., 2019). Flowers were tagged on the day of anthesis and harvested three days post anthesis (3 DPA). The distinct fiber shapes had already formed by 2 DPA (Stiff and Haigler, 2016; Graham and Haigler, 2021), and fibers were still relatively short at 3 DPA, which facilitated the visualization of multiple fiber tips in one image. Cotton fiber sample preparation, digital image collection, and image analysis: Ovules with attached fiber were fixed in the greenhouse. The fixative previously used (Histochoice) (Stiff and Haigler, 2016; Pierce et al., 2019; Graham and Haigler, 2021) is obsolete, which led to testing and validation of another low-toxicity, formalin-free fixative (#A5472; Sigma-Aldrich, St. Louis, MO; Fig. S1). The boll wall was removed without damaging the ovules. (Using a razor blade, cut away the top 3 mm of the boll. Make about 1 mm deep longitudinal incisions between the locule walls, and finally cut around the base of the boll.) All of the ovules with attached fiber were lifted out of the locules and fixed (1 h, RT, 1:10 tissue:fixative ratio) prior to optional storage at 4°C. Immediately before imaging, ovules were examined under a stereo microscope (incident light, black background, 31X) to select three vigorous ovules from each boll while avoiding drying. Ovules were rinsed (3 x 5 min) in buffer [0.05 M PIPES, 12 mM EGTA, 5 mM EDTA and 0.1% (w/v) Tween 80, pH 6.8], which had lower osmolarity than a microtubule-stabilizing buffer used previously for aldehyde-fixed fibers (Seagull, 1990; Graham and Haigler, 2021). While steadying an ovule with forceps, one to three small pieces of its chalazal end with attached fibers were dissected away using a small knife (#10055-12; Fine Science Tools, Foster City, CA).
Each ovule piece was placed in a single well of a 24-well slide (#63430-04; Electron Microscopy Sciences, Hatfield, PA) containing a single drop of buffer prior to applying and sealing a 24 x 60 mm coverslip with vaseline. Samples were imaged with brightfield optics and default settings for the 2.83 mega-pixel, color, CCD camera of the Keyence BZ-X810 imaging system (www.keyence.com; housed in the Cellular and Molecular Imaging Facility of NC State). The location of each sample in the 24-well slides was identified visually using a 2X objective and mapped using the navigation function of the integrated Keyence software. Using the 10X objective lens (plan-apochromatic; NA 0.45) and 60% closed condenser aperture setting, a region with many fiber apices was selected for imaging using the multi-point and z-stack capture functions. The precise location was recorded by the software prior to visual setting of the limits of the z-plane range (1.2 µm step size). Typically, three 24-sample slides (representing three accessions) were set up in parallel prior to automatic image capture. The captured z-stacks for each sample were processed into one two-dimensional image using the full-focus function of the software. (Occasional samples contained too much debris for computer vision to be effective, and these were reimaged.)
Resources in this dataset:
Resource Title: Deltapine 90 - Manually Annotated Training Set. File Name: GH3 DP90 Keyence 1_45 JPEG.zip. Resource Description: These images were manually annotated in Labelbox.
Resource Title: Deltapine 90 - AI-Assisted Annotated Training Set. File Name: GH3 DP90 Keyence 46_101 JPEG.zip. Resource Description: These images were AI-labeled in Roboflow and then manually reviewed in Roboflow.
Resource Title: Deltapine 90 - Manually Annotated Training-Validation Set. File Name: GH3 DP90 Keyence 102_125 JPEG.zip. Resource Description: These images were manually labeled in Labelbox, and then used for training-validation for the machine learning model.
Resource Title: Phytogen 800 - Evaluation Test Images. File Name: Gb cv Phytogen 800.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Pima 3-79 - Evaluation Test Images. File Name: Gb cv Pima 379.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Pima S-7 - Evaluation Test Images. File Name: Gb cv Pima S7.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Coker 312 - Evaluation Test Images. File Name: Gh cv Coker 312.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Deltapine 90 - Evaluation Test Images. File Name: Gh cv Deltapine 90.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Half and Half - Evaluation Test Images. File Name: Gh cv Half and Half.zip. Resource Description: These images were used to validate the machine learning model. They were manually annotated in ImageJ.
Resource Title: Fiber Tip Annotations - Manual. File Name: manual_annotations.coco_.json. Resource Description: Annotations in COCO.json format for fibers. Manually annotated in Labelbox.
Resource Title: Fiber Tip Annotations - AI-Assisted. File Name: ai_assisted_annotations.coco_.json. Resource Description: Annotations in COCO.json format for fibers. AI annotated with human review in Roboflow.
Resource Title: Model Weights (iteration 600). File Name: model_weights.zip. Resource Description: The final model, provided as a zipped PyTorch .pth file. It was chosen at training iteration 600. The model weights can be imported for use of the fiber tip type detection neural network in Python. Resource Software Recommended: Google Colab, url: https://research.google.com/colaboratory/
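The annotation files are described as COCO-format JSON, so a quick sanity check is to count annotations per category. The sketch below relies only on the standard COCO keys ("categories", "annotations", "category_id"); the tiny inline document and its category names are placeholders, not the dataset's actual labels.

```python
import json
from collections import Counter

def annotations_per_category(coco):
    """Count annotations per category name in a COCO-style dictionary."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(names[ann["category_id"]] for ann in coco["annotations"])

# Tiny hypothetical example; the category names are placeholders.
coco = json.loads("""
{
  "images": [{"id": 1, "file_name": "example.jpg", "width": 100, "height": 100}],
  "categories": [{"id": 1, "name": "tip_type_a"}, {"id": 2, "name": "tip_type_b"}],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 10, 20, 20]},
    {"id": 2, "image_id": 1, "category_id": 1, "bbox": [40, 40, 20, 20]},
    {"id": 3, "image_id": 1, "category_id": 2, "bbox": [70, 70, 20, 20]}
  ]
}
""")
print(annotations_per_category(coco))
```

For the real files, the dictionary would be read with `json.load` from the downloaded annotation file instead of the inline string.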
This database contains the reference data used for direct force training of Artificial Neural Network (ANN) interatomic potentials using the atomic energy network (ænet) and ænet-PyTorch packages (https://github.com/atomisticnet/aenet-PyTorch). It also includes the GPR-augmented data used for indirect force training via Gaussian Process Regression (GPR) surrogate models using the ænet-GPR package (https://github.com/atomisticnet/aenet-gpr). Each data file contains atomic structures, energies, and atomic forces in XCrySDen Structure Format (XSF). The dataset includes all reference training/test data and corresponding GPR-augmented data used in the four benchmark examples presented in the reference paper, “Scalable Training of Neural Network Potentials for Complex Interfaces Through Data Augmentation”. A hierarchy of the dataset is described in the README.txt file, and an overview of the dataset is also summarized in supplementary Table S1 of the reference paper.
Source code and data snapshot accompanying the training "Towards physics-based deep learning in OpenFOAM: Combining OpenFOAM with the PyTorch C++ API", given at the 17th OpenFOAM Workshop.
This dataset was created by Luong Hoang Minh