# torchtree: flexible phylogenetic model development and inference using PyTorch
Mathieu Fourment, Matthew Macaulay, Christiaan J Swanepoel, Xiang Ji, Marc A Suchard, Frederick A Matsen IV. torchtree: flexible phylogenetic model development and inference using PyTorch. arXiv:2406.18044 (2024)
Bayesian inference has predominantly relied on the Markov chain Monte Carlo (MCMC) algorithm for many years. However, MCMC is computationally laborious, especially for complex phylogenetic models of time trees. This bottleneck has led to the search for alternatives, such as variational Bayes, which can scale better to large datasets. In this paper, we introduce torchtree, a framework written in Python that allows developers to easily implement rich phylogenetic models and algorithms using a fixed tree topology. One can either use automatic differentiation, or leverage torchtree's plug-in system to compute gradients analytically for model components for which automatic differentiation is slow. We demonstrate that the torchtree variational inference framework performs similarly to BEAST in terms of speed and approximation accuracy. Furthermore, we explore the use of the forward KL divergence as an optimizing criterion for variational inference, which can handle discontinuous and non-differentiable ...
The SI.pdf file contains supplementary methods and figures referenced in the main manuscript (found on Zenodo under Supplemental Information).
The data.zip contains input files and phylogenetic trees used for analyses in the associated manuscript. The data are organized by dataset (HCV and SC2) and by tool (beast and torchtree), and include sequence alignments (see next section for the SC2 alignment) and configuration files (XML and JSON files). torchtree uses variational Bayes while BEAST uses MCMC.
data/
├── HCV/
│ ├── HCV.fasta # Sequence alignment for HCV
│ ├── HCV.tree # Newick ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Personal Protective Equipment Dataset (PPED)
This dataset serves as a benchmark for PPE detection in chemical plants. We provide both the dataset and the experimental results.
We produced the dataset based on actual needs and the relevant regulations in chemical plants. The standard GB 39800.1-2020, formulated by the Ministry of Emergency Management of the People’s Republic of China, defines the protective requirements for plants and chemical laboratories. The complete dataset is contained in the folder PPED/data.
1.1. Image collection
We took more than 3,300 pictures, varying several characteristics: environment, distance, lighting conditions, camera angle, and the number of people photographed.
Backgrounds: There are 4 backgrounds: office, near machines, factory, and regular outdoor scenes.
Scale: By taking pictures from different distances, the captured PPE items are classified into small, medium, and large scales.
Light: Both good and poor lighting conditions were covered.
Diversity: Some images contain a single person, and some contain multiple people.
Angle: The pictures can be divided into front and side views.
In total, more than 3,300 raw photos were taken under all of these conditions. All images are located in the folder PPED/data/JPEGImages.
1.2. Label
We use LabelImg as the labeling tool, with annotations in the PASCAL VOC format. YOLO uses the txt format, so trans_voc2yolo.py can be used to convert the XML files in PASCAL VOC format to txt files. Annotations are stored in the folder PPED/data/Annotations.
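For reference, this VOC-to-YOLO conversion amounts to normalizing each bounding box by the image size and writing one class x_center y_center width height line per object. The sketch below illustrates that logic only; the class names and paths are placeholders, and the bundled trans_voc2yolo.py may differ in its details.

```python
# Minimal sketch of a PASCAL VOC -> YOLO label conversion.
# Class names and paths are placeholders; the bundled trans_voc2yolo.py may differ.
import xml.etree.ElementTree as ET
from pathlib import Path

CLASSES = ["helmet", "goggles", "gloves"]  # hypothetical PPE class list

def voc_to_yolo(xml_path: str, out_dir: str) -> None:
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in CLASSES:
            continue
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO line: class_id x_center y_center width height, all normalized to [0, 1]
        xc, yc = (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h
        bw, bh = (xmax - xmin) / w, (ymax - ymin) / h
        lines.append(f"{CLASSES.index(name)} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    out_file = Path(out_dir) / (Path(xml_path).stem + ".txt")
    out_file.write_text("\n".join(lines))
```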
1.3. Dataset Features
The pictures were taken by us under the different conditions mentioned above. The file PPED/data/feature.csv is a CSV file that notes the features of all images: for each picture it records the lighting conditions, angle, background, number of people, and scale.
1.4. Dataset Division
The dataset is divided into training and test sets at a ratio of 9:1.
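A 9:1 split of this kind can be reproduced with a simple random partition of the image file names. The sketch below is purely illustrative; the image folder, file extension, and random seed are assumptions, and it does not reproduce the official split shipped with the dataset.

```python
# Illustrative 9:1 random split of image file names (not the official split).
import random
from pathlib import Path

random.seed(0)  # assumed seed, only to make this sketch reproducible
images = sorted(p.name for p in Path("PPED/data/JPEGImages").glob("*.jpg"))
random.shuffle(images)
cut = int(0.9 * len(images))
Path("train.txt").write_text("\n".join(images[:cut]))
Path("test.txt").write_text("\n".join(images[cut:]))
```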
2. Baseline Experiments
We provide baseline results with five models, namely Faster R-CNN (R), Faster R-CNN (M), SSD, YOLOv3-spp, and YOLOv5. All code and results are given in the folder PPED/experiment.
2.1. Environment and Configuration:
Intel Core i7-8700 CPU
NVIDIA GTX1060 GPU
16 GB of RAM
Python: 3.8.10
pytorch: 1.9.0
pycocotools: pycocotools-win
Windows 10
2.2. Applied Models
The source code and results of the applied models are given in the folder PPED/experiment, with sub-folders corresponding to the model names.
2.2.1. Faster R-CNN
backbone: resnet50+fpn
We downloaded the pre-training weights from https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth.
We modified the dataset path, training classes and training parameters including batch size.
We run train_res50_fpn.py to start training (a hedged sketch of this fine-tuning pattern is given at the end of this subsection).
The model weights are then trained on the training set.
Finally, we validate the results on the test set.
backbone: mobilenetv2
The same training method as for resnet50+fpn was applied; the results are not as good as those of resnet50+fpn.
The Faster R-CNN source code used in our experiment is given in the folder PPED/experiment/Faster R-CNN. The weights of the fully-trained Faster R-CNN (R) and Faster R-CNN (M) models are stored in the files PPED/experiment/trained_models/resNetFpn-model-19.pth and mobile-model.pth. The performance measurements of Faster R-CNN (R) and Faster R-CNN (M) are stored in the folders PPED/experiment/results/Faster RCNN(R) and Faster RCNN(M).
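The training procedure described above (download COCO-pretrained weights, adapt the number of classes, adjust the batch size, then train on the PPED training set) follows the standard torchvision fine-tuning pattern. The sketch below shows that pattern only; the class count, optimizer settings, and data loader are assumptions and do not reproduce train_res50_fpn.py.

```python
# Sketch of fine-tuning torchvision's Faster R-CNN (resnet50+fpn) on a custom class set.
# Class count, optimizer settings, and the data loader are illustrative only.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 1 + 4  # background + a hypothetical number of PPE classes

# pretrained=True downloads the COCO weights (fasterrcnn_resnet50_fpn_coco);
# newer torchvision releases use the weights=... argument instead.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4)
model.train()
# A training loop would iterate over a loader yielding (images, targets),
# where each target dict holds "boxes" and "labels":
# for images, targets in train_loader:
#     loss_dict = model(images, targets)   # classification + box regression losses
#     loss = sum(loss_dict.values())
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```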
2.2.2. SSD
backbone: resnet50
We downloaded pre-training weights from https://download.pytorch.org/models/resnet50-19c8e357.pth.
The same training method as Faster R-CNN is applied.
The SSD source code used in our experiment is given in folder PPED/experiment/ssd. The weights of the fully-trained SSD model are stored in file PPED/experiment/trained_models/SSD_19.pth. The performance measurements of SSD are stored in folder PPED/experiment/results/SSD.
2.2.3. YOLOv3-spp
backbone: DarkNet53
We modified the type information of the XML file to match our application.
We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.
The weights used are: yolov3-spp-ultralytics-608.pt.
The YOLOv3-spp source code used in our experiment is given in folder PPED/experiment/YOLOv3-spp. The weights of the fully-trained YOLOv3-spp model are stored in file PPED/experiment/trained_models/YOLOvspp-19.pt. The performance measurements of YOLOv3-spp are stored in folder PPED/experiment/results/YOLOv3-spp.
2.2.4. YOLOv5
backbone: CSP_DarkNet
We modified the type information of the XML file to match our application.
We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.
The weights used are: yolov5s.
The YOLOv5 source code used in our experiment is given in folder PPED/experiment/yolov5. The weights of the fully-trained YOLOv5 model are stored in file PPED/experiment/trained_models/YOLOv5.pt. The performance measurements of YOLOv5 are stored in folder PPED/experiment/results/YOLOv5.
2.3. Evaluation
The computed evaluation metrics as well as the code needed to compute them from our dataset are provided in the folder PPED/experiment/eval.
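With the pycocotools package listed in the environment, COCO-style detection metrics are typically computed via COCOeval. The sketch below assumes the ground truth and detections have been exported to COCO-format JSON files; the file names are placeholders, and this is not necessarily how the scripts in PPED/experiment/eval are organized.

```python
# COCO-style mAP evaluation with pycocotools; the JSON file names are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("ground_truth_coco.json")           # ground-truth annotations
coco_dt = coco_gt.loadRes("detections_coco.json")  # model detections

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP/AR, including the small/medium/large breakdown
```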
Faster R-CNN (R and M)
official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/faster_rcnn.py
SSD
official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssd.py
YOLOv3-spp
YOLOv5
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Chinese Chemical Safety Signs (CCSS)
This dataset is compiled as a benchmark for recognizing chemical safety signs from images. We provide both the dataset and the experimental results.
1. The Dataset
The complete dataset is contained in the folder ccss/data. The images include signs based on the Chinese standard "Safety Signs and their Application Guidelines" (GB 2894-2008) for safety signs in chemical environments. This standard, in turn, refers to the standards ISO 7010 (Graphical symbols – Safety Colours and Safety Signs – Safety signs used in workplaces and public areas), GB/T 10001 (Public Information Graphic Symbols for Signs), and GB 13495 (Fire Safety Signs).
1.1. Image Collection
We collect photos of commonly used chemical safety signs in chemical laboratories and chemistry teaching. For a discussion of the standards on which we base our collection, refer to the book "Talking about Hazardous Chemicals and Safety Signs" for common signs, and to the safety signs guidelines (GB 2894-2008).
Under all conditions, a total of 4,650 photos were taken as original data. These were expanded to 27,900 photos via data augmentation. All images are located in the folder ccss/data/JPEGImages. The file ccss/data/features/enhanced_data_to_original_data.csv provides a mapping between each enhanced image name and the corresponding original image.
1.2. Annotation and Labeling
We use LabelImg as the labeling tool, which, in turn, uses the PASCAL VOC annotation format. The annotations are stored in the folder ccss/data/Annotations.
Faster R-CNN and SSD are two algorithms that use this format directly. When training YOLOv3-spp or YOLOv5, you can run trans_voc2yolo.py to convert the XML files in PASCAL VOC format to txt files.
We provide further meta-information about the dataset in the form of a CSV file, features.csv, which notes, for each image, which other features it has (lighting conditions, scale, multiplicity, etc.). We apply the COCO standard for deciding whether a target is small, medium, or large in size.
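For reference, the COCO convention classifies a target by the pixel area of its bounding box: small below 32x32, medium between 32x32 and 96x96, and large above 96x96. A small illustrative helper:

```python
# COCO object-size rule: area < 32*32 is small, 32*32..96*96 is medium, larger is large.
def coco_size_class(box_width: float, box_height: float) -> str:
    area = box_width * box_height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"
```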
1.3. Dataset Features
As stated above, the images have been shot under different conditions. We provide all the feature information in the folder ccss/data/features. For each feature, there is a separate list of file names in that folder. The file ccss/data/features/features_on_original_data.csv is a CSV file which notes all the features of each original image.
1.4. Dataset Division
The dataset has a fixed 7:3 division into training and test sets. You can find the corresponding image names in the files ccss/data/training_data_file_names.txt and ccss/data/test_data_file_names.txt.
2. Baseline Experiments
We provide baseline results with five models, namely Faster R-CNN (R), Faster R-CNN (M), SSD, YOLOv3-spp, and YOLOv5. All code and results are given in the folder ccss/experiment.
2.2. Environment and Configuration:
2.3. Applied Models
The source code and results of the applied models are given in the folder ccss/experiment, with sub-folders corresponding to the model names.
2.3.1. Faster R-CNN
The Faster R-CNN (R) model is trained with train_res50_fpn.py; the source code used in our experiment is given in the folder ccss/experiment/sources/faster_rcnn (R). The weights of the fully-trained Faster R-CNN (R) model are stored in the file ccss/experiment/trained_models/faster_rcnn (R).pth. The performance measurements of Faster R-CNN (R) are stored in the folder ccss/experiment/performance_indicators/faster_rcnn (R).
The Faster R-CNN (M) model is trained with train_mobilenetv2.py; the source code is given in the folder ccss/experiment/sources/faster_rcnn (M). The weights of the fully-trained Faster R-CNN (M) model are stored in the file ccss/experiment/trained_models/faster_rcnn (M).pth. The performance measurements of Faster R-CNN (M) are stored in the folder ccss/experiment/performance_indicators/faster_rcnn (M).
2.3.2. SSD
The SSD source code used in our experiment is given in the folder ccss/experiment/sources/ssd. The weights of the fully-trained SSD model are stored in the file ccss/experiment/trained_models/ssd.pth. The performance measurements of SSD are stored in the folder ccss/experiment/performance_indicators/ssd.
2.3.3. YOLOv3-spp
We run trans_voc2yolo.py to convert the XML files in VOC format to txt files. The YOLOv3-spp source code used in our experiment is given in the folder ccss/experiment/sources/yolov3-spp. The weights of the fully-trained YOLOv3-spp model are stored in the file ccss/experiment/trained_models/yolov3-spp.pt. The performance measurements of YOLOv3-spp are stored in the folder ccss/experiment/performance_indicators/yolov3-spp.
2.3.4. YOLOv5
We run trans_voc2yolo.py to convert the XML files in VOC format to txt files. The YOLOv5 source code used in our experiment is given in the folder ccss/experiment/sources/yolov5. The weights of the fully-trained YOLOv5 model are stored in the file ccss/experiment/trained_models/yolov5.pt. The performance measurements of YOLOv5 are stored in the folder ccss/experiment/performance_indicators/yolov5.
2.4. Evaluation
The computed evaluation metrics, as well as the code needed to compute them from our dataset, are provided in the folder ccss/experiment/performance_indicators. They are provided over the complete test set as well as separately for each image feature (over the test set).
3. Code Sources
We are particularly thankful to the author of the GitHub repository WZMIAOMIAO/deep-learning-for-image-processing (with whom we are not affiliated). Their instructive videos and code were most helpful during our work.
This repository contains databases of protein domains for use with Foldclass and Merizo-search. We provide databases for all 365 million domains in TED, as well as all classified domains in CATH 4.3.
Foldclass and Merizo-search use two formats for databases. The default format uses a PyTorch tensor and a pickled list of Python tuples to store the data. This format is used for the CATH database, which is small enough to fit in memory. For larger-than-memory datasets, such as TED, we use a binary format that is searched using the Faiss library.
The CATH database requires approximately 1.4 GB of disk space, whereas the TED database requires about 885 GB. Please ensure you have enough free storage space before downloading. For best search performance with the TED database, the database should be stored on the fastest storage hardware available to you.
IMPORTANT: We recommend going into each folder and downloading the files individually; if you attempt to download each folder in one go, it will download a zip file which will need to be decompressed. This is particularly an issue when downloading the TED database, as you will need roughly twice the storage space compared to downloading the individual files. Our GitHub repository (see Related Materials below) contains a convenience script to download each database; we recommend using that.
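For the default in-memory format, loading the database amounts to reading the PyTorch tensor and unpickling the accompanying list of tuples. The sketch below is illustrative only; the file names are placeholders and this is not the Foldclass/Merizo-search API.

```python
# Sketch of reading the default database format: a PyTorch tensor of embeddings plus
# a pickled list of Python tuples. File names are assumptions, not the actual layout.
import pickle
import torch

embeddings = torch.load("cath43_db.pt", map_location="cpu")  # hypothetical tensor file
with open("cath43_db.pkl", "rb") as fh:
    metadata = pickle.load(fh)                               # hypothetical pickled tuples

print(tuple(embeddings.shape), len(metadata))
```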
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
For fast reproduction of our results, we provide PyTorch datasets of precomputed interaction graphs for the entire PDBbind database on Zenodo. To enable quick establishment of leakage-free evaluation setups with PDBbind, we also provide pairwise similarity matrices for the entire PDBbind dataset on Zenodo.
I made this data annotation for a conference paper. I am trying to make an application that is fast and light enough to deploy on any cutting-edge device while maintaining accuracy comparable to state-of-the-art models.
The following pre-processing was applied to each image:
* Auto-orientation of pixel data (with EXIF-orientation stripping)
* Resize to 416x416 (Stretch)
The following augmentation was applied to create 3 versions of each source image in the training set:
* 50% probability of horizontal flip
* 50% probability of vertical flip
* Equal probability of one of the following 90-degree rotations: none, clockwise, counter-clockwise, upside-down
* Random crop of between 0 and 7 percent of the image
* Random rotation of between -40 and +40 degrees
* Random shear of between -29° and +29° horizontally and -15° and +15° vertically
* Random exposure adjustment of between -34 and +34 percent
* Random Gaussian blur of between 0 and 1.5 pixels
* Salt and pepper noise applied to 4 percent of pixels
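If you want to approximate these export-time augmentations on the fly in PyTorch, a rough torchvision equivalent is sketched below. The parameters only approximate the settings listed above, the random crop step is omitted, and for detection training the geometric transforms would also have to be applied to the bounding boxes, which this sketch does not do.

```python
# Rough on-the-fly approximation of the export-time augmentations using torchvision.
# Parameters only approximate the listed settings; the released versions were pre-generated.
import random
import torch
from torchvision import transforms
import torchvision.transforms.functional as TF

def random_90(img):
    # Equal probability of: none, 90 clockwise, 90 counter-clockwise, upside-down.
    return TF.rotate(img, random.choice([0, 90, -90, 180]))

def salt_and_pepper(img, frac=0.04):
    # img is a (C, H, W) float tensor in [0, 1]; corrupt about 4% of pixels.
    noise = torch.rand(img.shape[-2:])
    img = img.clone()
    img[..., noise < frac / 2] = 0.0
    img[..., noise > 1.0 - frac / 2] = 1.0
    return img

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.Lambda(random_90),
    transforms.RandomRotation(40),
    transforms.RandomAffine(degrees=0, shear=(-29, 29, -15, 15)),
    transforms.ColorJitter(brightness=0.34),              # stand-in for exposure adjustment
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 1.5)),
    transforms.ToTensor(),
    transforms.Lambda(salt_and_pepper),
])
```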
A big shoutout to Massey University for making this dataset public. The original dataset link is: here. Please keep in mind that the original dataset may be updated from time to time; however, I do not intend to update this annotated version.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset contains training data of two types:
Each volume has the shape (128,128,128), and is of float32 precision. They are saved as .tiff files, which can be easily read via the tifffile Python library.
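Reading one of these volumes with the tifffile library mentioned above is a one-liner; the file name below is a placeholder.

```python
# Read one (128, 128, 128) float32 volume from a .tiff file; the path is a placeholder.
import tifffile

volume = tifffile.imread("example_volume.tiff")
print(volume.shape, volume.dtype)  # expected: (128, 128, 128) float32
```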
Additionally, the dataset contains:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Code and source data for the study "Knowledge-Guided Machine Learning can improve C cycle quantification in agroecosystems" (https://doi.org/10.1038/s41467-023-43860-5). All files belong to Licheng Liu and Zhenong Jin at the University of Minnesota. deposit_code_v2.zip contains packaged code and sample runs for KGML-ag-Carbon training, validation, and implementation. Source Data.zip contains the data for generating the figures in the study.
Note: We used PyTorch 1.6.0 (https://pytorch.org/get-started/previous-versions/, last access: 21 Oct 2023) and Python 3.7.11 (https://www.python.org/downloads/release/python-3711/, last access: 21 Oct 2023) as the programming environment for model development. Statistical analysis, such as linear regression, was conducted using Statsmodels 0.14.0 (https://github.com/statsmodels/statsmodels/, last access: 21 Oct 2023). In order to use a GPU to speed up the training process, we installed the CUDA Toolkit 10.1.243 (https://developer.nvidia.com/cuda-toolkit, last access: 21 Oct 2023).
To use the full kgml_lib functionality, please create a new environment with the same Python version and libraries as listed above.
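As a quick sanity check that a freshly created environment matches the versions listed above, something like the following can be run (purely illustrative):

```python
# Quick check that a new environment matches the versions listed above.
import sys
import torch
import statsmodels

print("python:", sys.version.split()[0])         # expected 3.7.11
print("pytorch:", torch.__version__)             # expected 1.6.0
print("statsmodels:", statsmodels.__version__)   # expected 0.14.0
print("cuda available:", torch.cuda.is_available())
```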
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
These are the models in http://hdl.handle.net/20.500.12537/125 trained with 40% layer drop. They are suitable for inference using every other layer, which speeds up inference at the cost of some translation performance. We refer to the prior submission for usage and to the documentation on LayerDrop at https://github.com/pytorch/fairseq/blob/fcca32258c8e8bcc9f9890bf4714fa2f96b6b3e1/examples/layerdrop/README.md.
These models were trained with 40% layer drop on the models in http://hdl.handle.net/20.500.12537/125. They are well suited for translation with every other layer of the network dropped, which speeds up translation at the cost of quality. Instructions for using the networks can be found with the original models and in the Fairseq documentation at https://github.com/pytorch/fairseq/blob/fcca32258c8e8bcc9f9890bf4714fa2f96b6b3e1/examples/layerdrop/README.md.