9 datasets found
  1. Data from: Torchtree: flexible phylogenetic model development and inference using PyTorch

    • search.dataone.org
    • datadryad.org
    Updated Jun 21, 2025
    Cite
    Mathieu Fourment; Matthew Macaulay; Christiaan Swanepoel; Xiang Ji; Marc Suchard; Frederick Matsen IV (2025). Torchtree: flexible phylogenetic model development and inference using PyTorch [Dataset]. http://doi.org/10.5061/dryad.zw3r228gv
    Dataset provided by
    Dryad Digital Repository
    Authors
    Mathieu Fourment; Matthew Macaulay; Christiaan Swanepoel; Xiang Ji; Marc Suchard; Frederick Matsen IV
    Description

    Bayesian inference has predominantly relied on the Markov chain Monte Carlo (MCMC) algorithm for many years. However, MCMC is computationally laborious, especially for complex phylogenetic models of time trees. This bottleneck has led to the search for alternatives, such as variational Bayes, which can scale better to large datasets. In this paper, we introduce torchtree, a framework written in Python that allows developers to easily implement rich phylogenetic models and algorithms using a fixed tree topology. One can either use automatic differentiation, or leverage torchtree's plug-in system to compute gradients analytically for model components for which automatic differentiation is slow. We demonstrate that the torchtree variational inference framework performs similarly to BEAST in terms of speed and approximation accuracy. Furthermore, we explore the use of the forward KL divergence as an optimizing criterion for variational inference, which can handle discontinuous and non-differentiable ...

    # torchtree: flexible phylogenetic model development and inference using PyTorch

    Mathieu Fourment, Matthew Macaulay, Christiaan J Swanepoel, Xiang Ji, Marc A Suchard, Frederick A Matsen IV. torchtree: flexible phylogenetic model development and inference using PyTorch. arXiv:2406.18044 (2024)

    Description of the data

    The SI.pdf file contains supplementary methods and figures referenced in the main manuscript (found on Zenodo under Supplemental Information).

    The data.zip contains input files and phylogenetic trees used for analyses in the associated manuscript. The data are organized by dataset (HCV and SC2) and by tool (beast and torchtree), and include sequence alignments (see next section for SC2 alignment) and configuration files (xml and json files). torchtree uses variational Bayes while BEAST uses MCMC.

    data/
    ├── HCV/
    │  ├── HCV.fasta          # Sequence alignment for HCV
    │  ├── HCV.tree           # Newick ...
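
    The data are plain FASTA alignments and Newick trees, so they can be loaded with standard tooling. As a minimal sketch (not part of the archive), assuming data.zip has been extracted so that data/HCV/ exists as shown above, Biopython can read both files:

    # Load the HCV alignment and tree with Biopython (illustrative only).
    from Bio import Phylo, SeqIO

    alignment = list(SeqIO.parse("data/HCV/HCV.fasta", "fasta"))   # sequence alignment
    tree = Phylo.read("data/HCV/HCV.tree", "newick")               # fixed tree topology
    print(f"{len(alignment)} sequences, {tree.count_terminals()} taxa")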
    
  2. Personal Protective Equipment Dataset (PPED)

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 17, 2022
    Cite
    Anonymous (2022). Personal Protective Equipment Dataset (PPED) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6551757
    Dataset authored and provided by
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Personal Protective Equipment Dataset (PPED)

    This dataset serves as a benchmark for detecting personal protective equipment (PPE) in chemical plants. We provide the dataset and experimental results.

    1. The dataset

    We produced a data set based on the actual needs and relevant regulations in chemical plants. The standard GB 39800.1-2020 formulated by the Ministry of Emergency Management of the People’s Republic of China defines the protective requirements for plants and chemical laboratories. The complete dataset is contained in the folder PPED/data.

    1.1. Image collection

    We took more than 3,300 pictures, varying the following characteristics: environment, distance, lighting conditions, angle, and the number of people photographed.

    Backgrounds: There are 4 backgrounds, including office, near machines, factory and regular outdoor scenes.

    Scale: By taking pictures from different distances, the captured PPEs are classified in small, medium and large scales.

    Light: Good lighting conditions and poor lighting conditions were studied.

    Diversity: Some images contain a single person, and some contain multiple people.

    Angle: The pictures we took can be divided into front and side.

    In total, more than 3,300 raw photos were taken under all of these conditions. All images are located in the folder “PPED/data/JPEGImages”.

    1.2. Label

    We used LabelImg as the labeling tool, with annotations in the PASCAL-VOC format. YOLO uses the txt format; trans_voc2yolo.py can be used to convert the PASCAL-VOC XML files to txt files. Annotations are stored in the folder PPED/data/Annotations.
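
    The dataset's trans_voc2yolo.py handles this conversion; purely as an illustration of what such a VOC-to-YOLO step does (the class list and paths below are hypothetical, not taken from the dataset), the core logic looks roughly like this:

    # Illustrative VOC-to-YOLO conversion; not the dataset's own trans_voc2yolo.py.
    import xml.etree.ElementTree as ET

    CLASSES = ["helmet", "goggles", "gloves"]  # hypothetical class names

    def voc_to_yolo(xml_path):
        root = ET.parse(xml_path).getroot()
        w = float(root.find("size/width").text)
        h = float(root.find("size/height").text)
        lines = []
        for obj in root.iter("object"):
            cls = CLASSES.index(obj.find("name").text)
            b = obj.find("bndbox")
            xmin, ymin = float(b.find("xmin").text), float(b.find("ymin").text)
            xmax, ymax = float(b.find("xmax").text), float(b.find("ymax").text)
            # YOLO format: class x_center y_center width height, all normalized to [0, 1]
            lines.append(f"{cls} {(xmin + xmax) / 2 / w:.6f} {(ymin + ymax) / 2 / h:.6f} "
                         f"{(xmax - xmin) / w:.6f} {(ymax - ymin) / h:.6f}")
        return "\n".join(lines)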

    1.3. Dataset Features

    The pictures were taken by us under the different conditions mentioned above. The file PPED/data/feature.csv is a CSV file which records the features of every image, including lighting conditions, angle, background, number of people, and scale.
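
    Assuming one row per image with a column per feature (the exact column names are not documented here and should be checked first), the file can be inspected with pandas:

    # Sketch only: the column names used in the filter are assumptions.
    import pandas as pd

    features = pd.read_csv("PPED/data/feature.csv")
    print(features.columns.tolist())                    # inspect the real column names
    poor_light = features[features["light"] == "poor"]  # hypothetical filter by lighting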

    1.4. Dataset Division

    The dataset is divided into a training set and a test set at a 9:1 ratio.

    2. Baseline Experiments

    We provide baseline results with five models, namely Faster R-CNN (R), Faster R-CNN (M), SSD, YOLOv3-spp, and YOLOv5. All code and results are given in the folder PPED/experiment.

    2.1. Environment and Configuration:

    Intel Core i7-8700 CPU

    NVIDIA GTX1060 GPU

    16 GB of RAM

    Python: 3.8.10

    pytorch: 1.9.0

    pycocotools: pycocotools-win

    Windows 10

    2.2. Applied Models

    The source code and results of the applied models are given in the folder PPED/experiment, with sub-folders corresponding to the model names.

    2.2.1. Faster R-CNN

    Faster R-CNN

    backbone: resnet50+fpn

    We downloaded the pre-training weights from https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth.

    We modified the dataset path, training classes and training parameters including batch size.

    We run train_res50_fpn.py to start training.

    Then, the weights are trained on the training set.

    Finally, we validate the results on the test set.
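
    In torchvision, this kind of fine-tuning (starting from COCO weights and replacing the box-predictor head with the dataset's classes) is commonly written as below; this is a generic sketch, not the repository's train_res50_fpn.py, and the class count is illustrative:

    # Generic torchvision fine-tuning sketch; num_classes is hypothetical (PPE classes + background).
    import torch
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)  # COCO weights
    num_classes = 5  # e.g. 4 PPE classes + background (adjust to the dataset)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    model.to("cuda" if torch.cuda.is_available() else "cpu")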

    backbone: mobilenetv2

    We used the same training method as for resnet50+fpn, but the results were not as good as resnet50+fpn, so this backbone was discarded.

    The Faster R-CNN source code used in our experiment is given in the folder PPED/experiment/Faster R-CNN. The weights of the fully-trained Faster R-CNN (R) and Faster R-CNN (M) models are stored in the files PPED/experiment/trained_models/resNetFpn-model-19.pth and mobile-model.pth. The performance measurements of Faster R-CNN (R) and Faster R-CNN (M) are stored in the folders PPED/experiment/results/Faster RCNN(R) and Faster RCNN(M).

    2.2.2. SSD

    backbone: resnet50

    We downloaded pre-training weights from https://download.pytorch.org/models/resnet50-19c8e357.pth.

    The same training method as Faster R-CNN is applied.

    The SSD source code used in our experiment is given in folder PPED/experiment/ssd. The weights of the fully-trained SSD model are stored in file PPED/experiment/trained_models/SSD_19.pth. The performance measurements of SSD are stored in folder PPED/experiment/results/SSD.

    2.2.3. YOLOv3-spp

    backbone: DarkNet53

    We modified the type information of the XML file to match our application.

    We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.

    The weights used are: yolov3-spp-ultralytics-608.pt.

    The YOLOv3-spp source code used in our experiment is given in folder PPED/experiment/YOLOv3-spp. The weights of the fully-trained YOLOv3-spp model are stored in file PPED/experiment/trained_models/YOLOvspp-19.pt. The performance measurements of YOLOv3-spp are stored in folder PPED/experiment/results/YOLOv3-spp.

    2.2.4. YOLOv5

    backbone: CSP_DarkNet

    We modified the type information of the XML file to match our application.

    We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.

    The weights used are: yolov5s.

    The YOLOv5 source code used in our experiment is given in folder PPED/experiment/yolov5. The weights of the fully-trained YOLOv5 model are stored in file PPED/experiment/trained_models/YOLOv5.pt. The performance measurements of YOLOv5 are stored in folder PPED/experiment/results/YOLOv5.
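
    To simply run the released YOLOv5 checkpoint rather than retrain it, the standard YOLOv5 hub interface can load a custom weights file; this is a general usage sketch, not part of the dataset, and the image path is a placeholder:

    # General YOLOv5 usage sketch: load the released checkpoint and run it on one image.
    import torch

    model = torch.hub.load("ultralytics/yolov5", "custom",
                           path="PPED/experiment/trained_models/YOLOv5.pt")
    results = model("PPED/data/JPEGImages/example.jpg")  # placeholder image name
    results.print()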

    2.3. Evaluation

    The computed evaluation metrics as well as the code needed to compute them from our dataset are provided in the folder PPED/experiment/eval.

    3. Code Sources

    Faster R-CNN (R and M)

    https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/faster_rcnn

    official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/faster_rcnn.py

    SSD

    https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/ssd

    official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssd.py

    YOLOv3-spp

    https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/yolov3-spp

    YOLOv5

    https://github.com/ultralytics/yolov5

  3. Chinese Chemical Safety Signs (CCSS)

    • zenodo.org
    bin, html, xz
    Updated Mar 21, 2023
    Cite
    Anonymous; Anonymous (2023). Chinese Chemical Safety Signs (CCSS) [Dataset]. http://doi.org/10.5281/zenodo.5938816
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Chinese Chemical Safety Signs (CCSS)

    This dataset is compiled as a benchmark for recognizing chemical safety signs from images. We provide both the dataset and the experimental results.

    1. The Dataset

    The complete dataset is contained in the folder ccss/data. The images include signs based on the Chinese standard "Safety Signs and their Application Guidelines" (GB 2894-2008) for safety signs in chemical environments. This standard, in turn, refers to the standards ISO 7010 (Graphical symbols – Safety Colours and Safety Signs – Safety signs used in workplaces and public areas), GB/T 10001 (Public Information Graphic Symbols for Signs), and GB 13495 (Fire Safety Signs).

    1.1. Image Collection

    We collected photos of commonly used chemical safety signs in chemical laboratories and chemistry teaching. For a discussion of the standards on which we base our collection, refer to the book "Talking about Hazardous Chemicals and Safety Signs" for common signs, and to the safety signs guidelines (GB 2894-2008).

    • The shooting was mainly carried out in 6 locations: on the road, in a parking lot, at construction walls, in a chemical laboratory, outdoors near large machines, and inside a factory and corridor.
    • Shooting scale: images in which the signs appear at small, medium, and large scales were taken at each location by shooting from different distances.
    • Shooting light: good and poor lighting conditions were investigated.
    • Some images contain multiple targets, while others contain only a single sign.

    Under all conditions, a total of 4,650 original photos were taken. These were expanded to 27,900 photos via data augmentation. All images are located in the folder ccss/data/JPEGImages.

    The file ccss/data/features/enhanced_data_to_original_data.csv provides a mapping between the enhanced image name and the corresponding original image.

    1.2. Annotation and Labeling

    We use LabelImg as the labeling tool, which, in turn, uses the PASCAL-VOC annotation format. The annotations are stored in the folder ccss/data/Annotations.

    Faster R-CNN and SSD are two algorithms that use this format. When training YOLOv5, you can run trans_voc2yolo.py to convert the XML file in PASCAL-VOC format to a txt file.

    We provide further meta-information about the dataset in the form of a CSV file, features.csv, which notes, for each image, which features it has (lighting conditions, scale, multiplicity, etc.). We apply the COCO standard for deciding whether a target is small, medium, or large in size.
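
    For reference, the COCO convention classifies a target by its bounding-box area in pixels: small below 32×32, medium between 32×32 and 96×96, and large above 96×96. A minimal helper expressing this rule:

    # COCO-style size classification by bounding-box area (in pixels).
    def coco_size(width, height):
        area = width * height
        if area < 32 ** 2:
            return "small"
        if area < 96 ** 2:
            return "medium"
        return "large"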

    1.3. Dataset Features

    As stated above, the images have been shot under different conditions. We provide all the feature information in folder ccss/data/features. For each feature, there is a separate list of file names in that folder. The file ccss/data/features/features_on_original_data.csv is a CSV file which notes all the features of each original image.

    1.4. Dataset Division

    The dataset has a fixed split into training and test sets at a 7:3 ratio. You can find the corresponding image names in the files ccss/data/training_data_file_names.txt and ccss/data/test_data_file_names.txt.

    2. Baseline Experiments

    We provide baseline results with five models, namely Faster R-CNN (R), Faster R-CNN (M), SSD, YOLOv3-spp, and YOLOv5. All code and results are given in the folder ccss/experiment.

    2.2. Environment and Configuration:

    • Single Intel Core i7-8700 CPU
    • NVIDIA GTX1060 GPU
    • 16 GB of RAM
    • Python: 3.8.10
    • pytorch: 1.9.0
    • pycocotools: pycocotools-win
    • Visual Studio 2017
    • Windows 10

    2.3. Applied Models

    The source code and results of the applied models are given in the folder ccss/experiment, with sub-folders corresponding to the model names.

    2.3.1. Faster R-CNN

    • Faster R-CNN (R) has the backbone resnet50+fpn. The Faster R-CNN (R) source code used in our experiment is given in folder ccss/experiment/sources/faster_rcnn (R). The weights of the fully-trained Faster R-CNN (R) model are stored in file ccss/experiment/trained_models/faster_rcnn (R).pth. The performance measurements of Faster R-CNN (R) are stored in folder ccss/experiment/performance_indicators/faster_rcnn (R).
    • Faster R-CNN (M) has the backbone mobilenetv2.
      • backbone: MobileNetV2.
      • we modify the type information of the JSON file to match our application.
      • we run train_mobilenetv2.py to start training.
      • finally, the weights are trained on the training set.
      The Faster R-CNN (M) source code used in our experiment is given in folder ccss/experiment/sources/faster_rcnn (M). The weights of the fully-trained Faster R-CNN (M) model are stored in file ccss/experiment/trained_models/faster_rcnn (M).pth. The performance measurements of Faster R-CNN (M) are stored in folder ccss/experiment/performance_indicators/faster_rcnn (M).

    2.3.2. SSD

    The SSD source code used in our experiment is given in folder ccss/experiment/sources/ssd. The weights of the fully-trained SSD model are stored in file ccss/experiment/trained_models/ssd.pth. The performance measurements of SSD are stored in folder ccss/experiment/performance_indicators/ssd.

    2.3.3. YOLOv3-spp

    • backbone: DarkNet53
    • we modified the type information of the XML file to match our application
    • run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.
    • the weights used are: yolov3-spp-ultralytics-608.pt.

    The YOLOv3-spp source code used in our experiment is given in folder ccss/experiment/sources/yolov3-spp. The weights of the fully-trained YOLOv3-spp model are stored in file ccss/experiment/trained_models/yolov3-spp.pt. The performance measurements of YOLOv3-spp are stored in folder ccss/experiment/performance_indicators/yolov3-spp.

    2.3.4. YOLOv5

    • backbone: CSP_DarkNet
    • we modified the type information of the XML file to match our application
    • run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.
    • the weights used are: yolov5s.

    The YOLOv5 source code used in our experiment is given in folder ccss/experiment/sources/yolov5. The weights of the fully-trained YOLOv5 model are stored in file ccss/experiment/trained_models/yolov5.pt. The performance measurements of YOLOv5 are stored in folder ccss/experiment/performance_indicators/yolov5.

    2.4. Evaluation

    The computed evaluation metrics, as well as the code needed to compute them from our dataset, are provided in the folder ccss/experiment/performance_indicators. They are provided over the complete test set as well as separately per image feature (over the test set).

    3. Code Sources

    1. Faster R-CNN (R and M)
    2. SSD
    3. YOLOv3-spp
    4. YOLOv5

    We are particularly thankful to the author of the GitHub repository WZMIAOMIAO/deep-learning-for-image-processing (with whom we are not affiliated). Their instructive videos and code were most helpful during our work.

  4. Foldclass databases for protein structural domains in CATH and TED - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Feb 1, 2025
    + more versions
    Cite
    (2025). Foldclass databases for protein structural domains in CATH and TED - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/89e96e17-6989-53e9-9768-45066418d923
    Description

    This repository contains databases of protein domains for use with Foldclass and Merizo-search. We provide databases for all 365 million domains in TED, as well as all classified domains in CATH 4.3.

    Foldclass and Merizo-search use two formats for databases. The default format uses a PyTorch tensor and a pickled list of Python tuples to store the data. This format is used for the CATH database, which is small enough to fit in memory. For larger-than-memory datasets, such as TED, we use a binary format that is searched using the Faiss library.

    The CATH database requires approximately 1.4 GB of disk space, whereas the TED database requires about 885 GB. Please ensure you have enough free storage space before downloading. For best search performance with the TED database, the database should be stored on the fastest storage hardware available to you.

    IMPORTANT: We recommend going into each folder and downloading the files; if you attempt to download each folder in one go, it will download a zip file which will need to be decompressed. This is particularly an issue if downloading the TED database, as you will need roughly twice the storage space compared to downloading the individual files. Our GitHub repository (see Related Materials below) contains a convenience script to download each database; we recommend using that.
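
    The default (in-memory) format described above is just a PyTorch tensor plus a pickled list of Python tuples, so it can in principle be inspected directly; the file names below are placeholders and, in normal use, Foldclass and Merizo-search load these databases themselves:

    # Sketch only: file names are assumptions; Foldclass/Merizo-search normally read these for you.
    import pickle
    import torch

    embeddings = torch.load("cath43_foldclass.pt", map_location="cpu")  # hypothetical name
    with open("cath43_foldclass.index", "rb") as fh:                    # hypothetical name
        metadata = pickle.load(fh)                                      # list of tuples, one per domain
    print(embeddings.shape, len(metadata))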

  5. GEMS: GNN Framework For Efficient Protein-Ligand Binding Affinity Prediction...

    • zenodo.org
    application/gzip, bin +1
    Updated Dec 8, 2024
    Cite
    Peter Stockinger; Peter Stockinger (2024). GEMS: GNN Framework For Efficient Protein-Ligand Binding Affinity Prediction Through Robust Data Filtering and Language Model Integration [Dataset]. http://doi.org/10.5281/zenodo.14260171
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Peter Stockinger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For fast reproduction of our results, we provide PyTorch datasets of precomputed interaction graphs for the entire PDBbind database on Zenodo. To enable quick establishment of leakage-free evaluation setups with PDBbind, we also provide pairwise similarity matrices for the entire PDBbind dataset on Zenodo.

  6. ASL Benchmark Dataset (YOLOv5 PyTorch Format)

    • kaggle.com
    Updated Sep 4, 2021
    Cite
    Dima (2021). ASL Benchmark Dataset (YOLOv5 PyTorch Format) [Dataset]. https://www.kaggle.com/tasnimdima/datazip/activity
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Dima
    Description

    Context

    I made this data annotation for a conference paper. I am trying to build an application that is fast and light enough to deploy on any cutting-edge device while maintaining accuracy comparable to state-of-the-art models.

    Data Details

    The following pre-processing was applied to each image:

    • Auto-orientation of pixel data (with EXIF-orientation stripping)
    • Resize to 416x416 (Stretch)

    The following augmentations were applied to create 3 versions of each source image in the training set:

    • 50% probability of horizontal flip
    • 50% probability of vertical flip
    • Equal probability of one of the following 90-degree rotations: none, clockwise, counter-clockwise, upside-down
    • Random crop of between 0 and 7 percent of the image
    • Random rotation of between -40 and +40 degrees
    • Random shear of between -29° and +29° horizontally and -15° and +15° vertically
    • Random exposure adjustment of between -34 and +34 percent
    • Random Gaussian blur of between 0 and 1.5 pixels
    • Salt-and-pepper noise applied to 4 percent of pixels
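
    These augmentations were applied offline when the dataset was exported, so no code is needed to reproduce them; for orientation only, a rough (image-only) torchvision approximation of some of the listed operations could look like the sketch below. Note that these transforms do not adjust bounding boxes, so a detection pipeline would use box-aware equivalents:

    # Rough, partial approximation of the listed augmentations (illustrative, not the original pipeline).
    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomVerticalFlip(p=0.5),
        transforms.RandomRotation(degrees=40),
        transforms.RandomAffine(degrees=0, shear=(-29, 29, -15, 15)),
        transforms.ColorJitter(brightness=0.34),                   # ~ +/-34% exposure adjustment
        transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 1.5)),
        transforms.Resize((416, 416)),
    ])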

    Acknowledgements

    A big shoutout to Massey University for making this dataset public. The original dataset link is: here. Please keep in mind that the original dataset may be updated from time to time; however, I don't intend to update this annotated version.

  7. Data from: Segment and Support: a dual-purpose Deep Learning solution for Limited Angle Holographic Tomography - dataset

    • zenodo.org
    bin, text/x-python +1
    Updated Mar 17, 2025
    Cite
    Michał Gontarz; Michał Gontarz; Vibekananda Dutta; Vibekananda Dutta; Wojciech Krauze; Wojciech Krauze; Małgorzata Kujawińska; Małgorzata Kujawińska (2025). Segment and Support: a dual-purpose Deep Learning solution for Limited Angle Holographic Tomography - dataset [Dataset]. http://doi.org/10.5281/zenodo.13591396
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michał Gontarz; Vibekananda Dutta; Wojciech Krauze; Małgorzata Kujawińska
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains training data of two types:

    1. Synthetic volumes:
      Containing randomly generated and placed spheres of various sizes. The dataset was augmented by adding noise, rotation, holes, flipping, blurring and shifting. Data generated this way is treated as the input data. An output pair for each datapoint has been generated by inducing missing cone artifacts, by masking the 3D Fourier spectrum of each volume (to mimic an experimental scenario).
    2. Experimental data:
      This part of the dataset contains erroneous Direct Inversion reconstructions of cells [1], organoids [2], phantoms [3-4]. The mask has been generated through a thresholding step of a GTVIC [5] algorithm.

    Each volume has the shape (128,128,128), and is of float32 precision. They are saved as .tiff files, which can be easily read via the tifffile Python library.
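
    For example, a single volume can be loaded as follows (the file name is a placeholder):

    # Read one training volume; expected shape (128, 128, 128) and dtype float32.
    import tifffile

    volume = tifffile.imread("example_volume.tiff")  # placeholder file name
    print(volume.shape, volume.dtype)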

    Additionally, the dataset contains:

    • SnSNet_best_loss_model.pth - PyTorch file containing the weights of the model, whose loss was the lowest during training.
    • SnSNet_best_metric_model.pth - PyTorch file containing the weights of the model, whose metrics were the highest during training.
    • SnSNet_config.yml - YAML file containing all configuration information about the model, its training and the dataset. It is necessary for inference and model recreation (a minimal loading sketch follows this list).
    • Inference.py - Short Python script for quick inference. The input should be a floating point, erroneous DI reconstruction. It produces a binary mask of the object with corrected geometry and saves it into the same directory as the input file, with the prefix PRED_.
    • reqs.txt - text file containing the library requirements necessary for running Inference.py. It is compiled such that the script performs inference on the GPU, using CUDA 11.8 on Windows 10. If the configuration does not allow running on the GPU, inference will run on the CPU, which will be indicated in the logs.
    • training.xlsx - Excel file, containing data about the training process.
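
    A minimal loading sketch, assuming only that the .yml file is standard YAML and the .pth files are ordinary PyTorch checkpoints (for actual predictions, use Inference.py as described above):

    # Inspect the released config and checkpoint; this does not rebuild the model itself.
    import torch
    import yaml

    with open("SnSNet_config.yml") as fh:
        config = yaml.safe_load(fh)
    print(sorted(config.keys()))                     # training / model / dataset settings

    checkpoint = torch.load("SnSNet_best_metric_model.pth", map_location="cpu")
    print(type(checkpoint))                          # typically a state_dict of name -> tensor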

    [1] M. Baczewska, W. Krauze, A. Kuś, P. Stępień, K. Tokarska, K. Zukowski, E. Malinowska, Z. Brzózka, and M. Kujawińska, “On-chip holographic tomography for quantifying refractive index changes of cells’ dynamics,” in Quantitative Phase Imaging VIII, vol. 11970 Y. Liu, G. Popescu, and Y. Park, eds., International Society for Optics and Photonics (SPIE, 2022), p. 1197008.
    [2] P. Stępień, M. Ziemczonok, M. Kujawińska, M. Baczewska, L. Valenti, A. Cherubini, E. Casirati, and W. Krauze, “Numerical refractive index correction for the stitching procedure in tomographic quantitative phase imaging,” Biomed. Opt. Express 13, 5709–5720 (2022).
    [3] M. Ziemczonok, A. Kuś, P. Wasylczyk, and M. Kujawińska, “3d-printed biological cell phantom for testing 3d quantitative phase imaging systems,” Sci. Reports 9, 1–9 (2019).
    [4] M. Ziemczonok, A. Kuś, and M. Kujawińska, “Optical diffraction tomography meets metrology — measurement accuracy on cellular and subcellular level,” Measurement 195, 111106 (2022).
    [5] W. Krauze, P. Makowski, M. Kujawińska, and A. Kuś, “Generalized total variation iterative constraint strategy in limited angle optical diffraction tomography,” Opt. Express 24, 4924–4936 (2016).

  8. Code and Source Data for "Knowledge-Guided Machine Learning can improve C cycle quantification in agroecosystems"

    • zenodo.org
    zip
    Updated Oct 19, 2024
    Cite
    LICHENG LIU; LICHENG LIU; Zhenong Jin; Zhenong Jin (2024). Code and Source Data for "Knowledge-Guided Machine Learning can improve C cycle quantification in agroecosystems" [Dataset]. http://doi.org/10.5281/zenodo.10155516
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Licheng Liu; Zhenong Jin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code and source data for the study "Knowledge-Guided Machine Learning can improve C cycle quantification in agroecosystems", https://doi.org/10.1038/s41467-023-43860-5. All files belong to Licheng Liu and Zhenong Jin at the University of Minnesota. deposit_code_v2.zip contains the packaged code and sample runs for KGML-ag-Carbon training, validation, and implementation. Source Data.zip contains the data for generating the figures in the study.

    Note: We used PyTorch 1.6.0 (https://pytorch.org/get-started/previous-versions/, last access: 21 Oct 2023) and Python 3.7.11 (https://www.python.org/downloads/release/python-3711/, last access: 21 Oct 2023) as the programming environment for model development. Statistical analysis, such as linear regression, was conducted using Statsmodels 0.14.0 (https://github.com/statsmodels/statsmodels/, last access: 21 Oct 2023). In order to use a GPU to speed up the training process, we installed the CUDA Toolkit 10.1.243 (https://developer.nvidia.com/cuda-toolkit, last access: 21 Oct 2023).

    To use the full kgml_lib functionality, please create a new environment with the same Python version and libraries listed above.
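
    A quick, illustrative sanity check that a new environment matches the versions listed above:

    # Confirm the environment (PyTorch 1.6.0, Python 3.7.11, Statsmodels 0.14.0, CUDA available).
    import sys

    import statsmodels
    import torch

    print("python     :", sys.version.split()[0])
    print("pytorch    :", torch.__version__)
    print("statsmodels:", statsmodels.__version__)
    print("cuda       :", torch.cuda.is_available())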

  9. Data from: GreynirTranslate - mBART25 NMT (with layer drop) models for Translations between Icelandic and English

    • repotest.clarin.is
    Updated Sep 27, 2021
    + more versions
    Cite
    Vésteinn Snæbjarnarson; Svanhvít Lilja Ingólfsdóttir; Haukur Barri Símonarson (2021). GreynirTranslate - mBART25 NMT (with layer drop) models for Translations between Icelandic and English (1.0) [Dataset]. https://repotest.clarin.is/repository/xmlui/handle/20.500.12537/128?locale-attribute=is
    Authors
    Vésteinn Snæbjarnarson; Svanhvít Lilja Ingólfsdóttir; Haukur Barri Símonarson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are the models in http://hdl.handle.net/20.500.12537/125 trained with 40% layer drop. They are suitable for inference using only every other layer, trading some translation quality for faster inference. We refer to the prior submission for usage and to the documentation on LayerDrop at https://github.com/pytorch/fairseq/blob/fcca32258c8e8bcc9f9890bf4714fa2f96b6b3e1/examples/layerdrop/README.md.

    These models were trained with 40% layer drop on the models in http://hdl.handle.net/20.500.12537/125. They are well suited to translation where every other layer of the network has been dropped, which speeds up translation at the cost of quality. Instructions for using the networks can be found with the original models and in the Fairseq usage documentation at https://github.com/pytorch/fairseq/blob/fcca32258c8e8bcc9f9890bf4714fa2f96b6b3e1/examples/layerdrop/README.md.

