84 datasets found

Sawfish Bioinformatics App Note SV VCFs and assessment results
explore.openaire.eu
Updated Feb 20, 2025
Cite
Christopher T Saunders (2025). Sawfish Bioinformatics App Note SV VCFs and assessment results [Dataset]. http://doi.org/10.5281/zenodo.14898461
Unique identifier
https://doi.org/10.5281/zenodo.14898461
Dataset updated
Feb 20, 2025
Authors
Christopher T Saunders
Description
This tarball "sawfish_publication_sv_vcfs_and_assessments.tar.gz" contains structural variant VCFs from analyses in the sawfish Bioinformatics App Note, as well as the corresponding VCF assessment results and assessment scripts. The corresponding Bioinformatics article can be found here: https://doi.org/10.1093/bioinformatics/btaf136

The top 3 levels of the tarball file tree are as follows:

sawfish_publication_sv_vcfs_and_assessments
├── CEPH1463_pedigree_analysis
│   ├── assessment_scripts
│   │   ├── ceph.GRCh38.viterbi.oa.csv.gz
│   │   ├── get_gqcut.bash
│   │   ├── get_pass_ge50.bash
│   │   └── run_concordance.bash
│   ├── README.md
│   └── sv_calls
│       ├── pbsv
│       ├── sawfish
│       └── sniffles
├── HG002_depth_titration
│   ├── assessment_scripts
│   │   ├── giab_cmrg
│   │   ├── giab_t2t_20241113
│   │   └── truvari_utils
│   ├── benchmark_data
│   │   ├── cmrg_1.0
│   │   └── T2T_V0.019-20241113_T2T-HG002-Q100v1.1
│   ├── README.md
│   ├── reference_fasta
│   │   ├── human_GRCh38_no_alt_analysis_set.fasta
│   │   ├── human_GRCh38_no_alt_analysis_set.fasta.fai
│   │   └── README.md
│   ├── singuarlity_images
│   │   ├── README.md
│   │   └── truvari-v4.2.2.sif
│   └── sv_calls
│       ├── pbsv
│       ├── sawfish
│       └── sniffles
└── README.md
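A listing like the one above can be reproduced after download without unpacking the whole archive. The following is a minimal Python sketch, not part of the dataset: the helper names truncate_paths and list_tarball are hypothetical, and it assumes the archive stores POSIX-style member paths.

```python
import tarfile
from pathlib import PurePosixPath


def truncate_paths(names, max_depth):
    """Return the sorted set of member paths cut off at max_depth components."""
    seen = set()
    for name in names:
        parts = PurePosixPath(name).parts[:max_depth]
        # Record every ancestor so that directories appear even when the
        # archive stores only file entries.
        for i in range(1, len(parts) + 1):
            seen.add("/".join(parts[:i]))
    return sorted(seen)


def list_tarball(path, max_depth=4):
    """List a tarball's tree down to max_depth path components.

    The root folder counts as one component, so max_depth=4 corresponds
    to the root plus three levels below it.
    """
    with tarfile.open(path, "r:*") as tar:
        return truncate_paths(tar.getnames(), max_depth)
```

Calling list_tarball("sawfish_publication_sv_vcfs_and_assessments.tar.gz") on the downloaded file should then print one path per tree entry; reading the member names of a gzip-compressed archive still requires a pass over the whole file.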
Summary of applications in bioinformatics.
plos.figshare.com
xls
Updated Jun 2, 2023
Cite
Michiaki Hamada; Hisanori Kiryu; Wataru Iwasaki; Kiyoshi Asai (2023). Summary of applications in bioinformatics. [Dataset]. http://doi.org/10.1371/journal.pone.0016450.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0016450.t001
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Michiaki Hamada; Hisanori Kiryu; Wataru Iwasaki; Kiyoshi Asai
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The top row includes problems about RNA secondary structure predictions and the middle row includes problems about alignment of biological sequences. Note that the estimators in the same column correspond to each other.
[Dataset] Data for the course "Population Genomics" at Aarhus University
zenodo.org
application/gzip, bin
Updated Jan 8, 2025
+ more versions
Cite
Samuele Soraggi; Kasper Munch (2025). [Dataset] Data for the course "Population Genomics" at Aarhus University [Dataset]. http://doi.org/10.5281/zenodo.7670839
Explore at:
Available download formats: application/gzip, bin
Unique identifier
https://doi.org/10.5281/zenodo.7670839
Dataset updated
Jan 8, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Samuele Soraggi; Kasper Munch
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets, conda environments and software for the course "Population Genomics" of Prof Kasper Munch. This course material is maintained by the health data science sandbox. This webpage shows the latest version of the course material.

Data.tar.gz contains the datasets and executable files for some of the software.
You can unpack it with
tar -zxf Data.tar.gz -C ./
This will create a folder called Data with the uncompressed material inside.

Course_Env.packed.tar.gz contains the conda environment used for the course. It needs to be unpacked so that all the prefixes are adjusted (note that this environment was created on Ubuntu 22.10). You do this on the command line by:

Creating the folder Course_Env: mkdir Course_Env

Untarring the file: tar -zxf Course_Env.packed.tar.gz -C Course_Env

Activating the environment: conda activate ./Course_Env

Running the unpacking script (this can take quite some time): conda-unpack

Course_Env.unpacked.tar.gz contains the same environment as above, but it works only if untarred into the folder /usr/Material, so use the packed version above if you are working in another folder. This file is mostly for running the course in our own cloud environment.

environment_with_args.yml is the file needed to generate the conda environment. Create and activate the environment with the following commands:

conda env create -f environment_with_args.yml -p ./Course_Env

conda activate ./Course_Env

The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.

Description

The participants will after the course have detailed knowledge of the methods and applications required to perform a typical population genomic study.

The participants must at the end of the course be able to:

Identify an experimental platform relevant to a population genomic analysis.

Apply commonly used population genomic methods.

Explain the theory behind common population genomic methods.

Reflect on strengths and limitations of population genomic methods.

Interpret and analyze results of population genomic inference.

Formulate population genetics hypotheses based on data

The course introduces key concepts in population genomics, from the generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on the generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background. Topics here include analysis of demography, population structure, recombination and selection. The last part of the course focuses on applications of population genetic data sets for association studies in relation to human health.

Curriculum

The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.

Course plan

Course intro and overview:

Coop chapters 1, 2, 3, Paper: Genome Diversity Project

Drift and the coalescent:

Coop chapter 4; Paper: Platypus

Exercise: Read mapping and base calling

Recombination:

Lecture: Review: Recombination in eukaryotes, Review: Recombination rate estimation

Exercise: Phasing and recombination rate

Population structure and incomplete lineage sorting:

Lecture: Coop chapter 6, Review: Incomplete lineage sorting

Exercise: Working with VCF files

Hidden Markov models:

Lecture: Durbin chapter 3, Paper: population structure

Exercise: Inference of population structure and admixture

Ancestral recombination graphs:

Lecture: Paper: Approximating the ARG, Paper: Tree inference

Exercise: ARG dashboard exercises + Inference of trees along sequence

Past population demography:

Lecture: Coop chapter 4, Paper: PSMC, revisit Paper: Tree inference

Exercise: Inferring historical populations

Direct and linked selection:

Lecture: Coop chapters 12, 13, revisit Paper: Tree inference

Admixture:

Lecture: Review: Admixture, Paper: Admixture inference

Exercise: Detecting archaic ancestry in modern humans

Genome-wide association study (GWAS):

Lecture: Coop lecture notes 99-120

Exercise: GWAS quality control

Heritability:

Lecture: Coop Lecture notes Sec. 2.2 (p23-36) + Chap. 7 (p119-142)

Exercise: Association testing

Evolution and disease:

Lecture: Coop Lecture notes Sec. 11.0.1 (p217-221)

Exercise: Estimating heritability
Data from: Accurate, scalable, and fully automated inference of species...
search.dataone.org
datadryad.org
Updated Jan 10, 2025
Cite
Anshu Gupta; Siavash Mirarab; Yatish Turakhia (2025). Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES [Dataset]. http://doi.org/10.5061/dryad.tht76hf73
Unique identifier
https://doi.org/10.5061/dryad.tht76hf73
Dataset updated
Jan 10, 2025
Dataset provided by
Dryad Digital Repository
Authors
Anshu Gupta; Siavash Mirarab; Yatish Turakhia
Description
Current genome sequencing initiatives across a wide range of life forms offer significant potential to enhance our understanding of evolutionary relationships and support transformative biological and medical applications. Species trees play a central role in many of these applications; however, despite the widespread availability of genome assemblies, accurate inference of species trees remains challenging for many scientists due to the limited automation, significant domain expertise, and substantial computational resources required by conventional methods. To address this limitation, we present ROADIES, a fully-automated pipeline to infer species trees starting from raw genome assemblies (those lacking prior annotations). In contrast to the prominent approach, ROADIES randomly selects segments of the input genomes to generate gene trees. This eliminates the need to choose any single reference species or perform the cumbersome steps of gene annotations and whole genome alignments. ROA...

Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES

Usage Notes

https://doi.org/10.5061/dryad.tht76hf73

ROADIES is a novel pipeline designed for phylogenetic tree inference of the species directly from their raw genomic assemblies.

For further details on how to run the tool ROADIES, please refer to our Wiki: https://turakhia.ucsd.edu/ROADIES/

This repository contains the output files generated by ROADIES (v0.1.0) (https://github.com/TurakhiaLab/ROADIES/releases/tag/v0.1.0) for estimating the species tree for the following datasets (in the accurate mode of operation):

240 mammalian species from the infraclass Placentalia (alternatively referred to as “placental mammals”)

100 fly species belonging to the subfamilies Drosophilinae and Steganinae

363 bird species from...
Protocol data (R version)
figshare.com
application/gzip
Updated Oct 16, 2020
+ more versions
Cite
Jesse Gillis (2020). Protocol data (R version) [Dataset]. http://doi.org/10.6084/m9.figshare.13020569.v2
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.13020569.v2
Dataset updated
Oct 16, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Jesse Gillis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We published 3 protocols illustrating how MetaNeighbor can be used to quantify cell type replicability across single cell transcriptomic datasets. The data files included here are needed to run the R version of the protocols available on Github (https://github.com/gillislab/MetaNeighbor-Protocol) in RMarkdown (.Rmd) and Jupyter (.ipynb) notebook format. To run the protocols, download the protocols on Github, download the data on Figshare, place the data and protocol files in the same directory, then run the notebooks in Rstudio or Jupyter. The scripts used to generate the data are included in the Github directory. Briefly:
- full_biccn_hvg.rds contains a single cell transcriptomic dataset published by the Brain Initiative Cell Census Network (in SingleCellExperiment format). It combines data from 7 datasets obtained in the mouse primary motor cortex (https://www.biorxiv.org/content/10.1101/2020.02.29.970558v2). Note that this dataset only contains highly variable genes.
- biccn_hvgs.txt: highly variable genes from the BICCN dataset described above (computed with the MetaNeighbor library).
- biccn_gaba.rds: same dataset as full_biccn_hvg.rds, but restricted to GABAergic neurons. The dataset contains all genes common to the 7 BICCN datasets (not just highly variable genes).
- go_mouse.rds: gene ontology annotations, stored as a list of gene symbols (one element per gene set).
- functional_aurocs.txt: results of the MetaNeighbor functional analysis in protocol 3.
zol: prepTG Databases for ESKAPE Pathogens
zenodo.org
nde-dev.biothings.io
+1 more
application/gzip
Updated Oct 26, 2023
Cite
Rauf Salamzade; Lindsay Kalan (2023). zol: prepTG Databases for ESKAPE Pathogens [Dataset]. http://doi.org/10.5281/zenodo.10042148
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.10042148
Dataset updated
Oct 26, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rauf Salamzade; Lindsay Kalan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Each of the tar.gz compressed directories corresponds to prepTG databases (for the zol suite) featuring distinct, representative genomes for one of the six genera containing ESKAPE pathogens. Representative genomes for each genus/taxon were selected using skDER v1.0.7 in greedy mode with 99% ANI and 90% AF cutoffs.
The compressed folders also contain an extra file, corresponding to a species tree of the representative genomes constructed using GToTree with Universal markers (ribosomal proteins) from Hug et al. 2016 and in best-hits mode. Note, GToTree was modified to always use -super5 mode for SCG alignments for computational efficiency. Also, note, because genomes can be dropped by GToTree prior to phylogeny inference (e.g. if they lack enough SCGs), not all genomes in the database might be represented in the phylogenies.
Research data for "Subjective data models in bioinformatics: Do wet-lab and...
figshare.manchester.ac.uk
explore.openaire.eu
txt
Updated Jun 1, 2023
Cite
Yochannah Yehudi; Carole Goble; Caroline Jay; Lukas Hughes-Noehrer (2023). Research data for "Subjective data models in bioinformatics: Do wet-lab and computational biologists comprehend data differently?" [Dataset]. http://doi.org/10.48420/20641017.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.48420/20641017.v2
Dataset updated
Jun 1, 2023
Dataset provided by
University of Manchester
Authors
Yochannah Yehudi; Carole Goble; Caroline Jay; Lukas Hughes-Noehrer
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Subjective data models dataset

This dataset is comprised of data collected from study participants, for a study into how people working with biological data perceive data, and whether or not this perception of data aligns with a person's experiential and educational background. We call the concept of what data looks like to an individual a "subjective data model".

Todo: link paper/preprint once published.

Computational python analysis code: https://doi.org/10.5281/zenodo.7022789 and https://github.com/yochannah/subjective-data-models-analysis

Files

Transcripts of the recorded sessions are attached and have been verified by a second researcher. These files are all in plain text .txt format. Note that participant 3 did not agree to sharing the transcript of their interview.

Interview paper files: This folder has digital and photographed versions of the files shown to the participants for the file mapping task. Note that the original files are from the NCBI and from FlyBase.

Videos and stills from the recordings have been deleted in line with the Data Management Plan and Ethical Review.

anonymous_participant_list.csv shows which files have transcripts associated (not all participants agreed to share transcripts), what the order of Tasks A and B were, the date of interview, and what entities participants added to the set provided (if any). See the paper methods for more info about why entities were added to the set.

cards.txt is a full list of the cards presented in the tasks.

background survey and background manual annotations are the select survey data about participant background and manual additions to this where necessary, e.g. to interpret free text.

codes.csv shows the qualitative codes used within the transcripts.

entry_point.csv is a record of participants' identified entry points into the data.

file_mapping_responses shows a record of responses to the file mapping task.
Supplementary Data for: "Association of QPRT gene polymorphisms with...
scidb.cn
Updated Dec 24, 2025
Cite
ZHAO Shanshan; DUAN Kaiming (2025). Supplementary Data for: "Association of QPRT gene polymorphisms with postpartum depression in Chinese cesarean parturients: A candidate gene association study" [Dataset]. http://doi.org/10.57760/sciencedb.xbyxb.00144
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57760/sciencedb.xbyxb.00144
Dataset updated
Dec 24, 2025
Dataset provided by
Science Data Bank
Authors
ZHAO Shanshan; DUAN Kaiming
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset comprises the key supplementary materials supporting the findings of the research article titled “Association of QPRT Gene Polymorphisms with Postpartum Depression in Chinese Cesarean Parturients: A Candidate Gene Association Study.” The data were generated through bioinformatics database mining and in vitro molecular cloning design.

Data Generation and Processing Methods:

Bioinformatics Data: These data were retrieved from online public databases in 2024. The eQTL (expression quantitative trait loci) summary plot for the QPRT gene rs9933310 locus across human tissues (with a focus on the cerebral cortex and hippocampus) was queried and exported from the GTEx Portal (version 8). Visualizations of chromatin states, histone modification profiles, and cis-regulatory element predictions for the rs9933310 locus were obtained by querying the ENCODE, GeneCards (integrating GeneHancer), and 3DSNP v2.0 databases via the UCSC Genome Browser (assembly GRCh38/hg38), using the specific genomic coordinates (chr16:29679583). No secondary statistical calculations were performed on the raw database outputs during this process.

Experimental Sequence Data: Based on NCBI reference sequences, DNA fragments encompassing the rs9933310 locus and its flanking regions were designed using sequence design software (e.g., Primer Premier) and commercially synthesized. The wild-type sequence (QPRT-W) contains the ‘A’ allele, while the mutant sequence (QPRT-M) features a single nucleotide substitution to ‘G’ to model the SNP. These sequences were subsequently cloned into the pGL3 reporter vector for functional validation and were verified by Sanger sequencing.

Dataset Content and Spatiotemporal Information:

The data itself does not pertain to specific geographical spatial information or continuous time series. Its temporal context is defined by the date of query/generation (2024) and the specific versions of the underlying public databases (e.g., GTEx v8, hg38).

The dataset consists of four core files:

Supplementary Figure 1: An eQTL analysis plot illustrating the association between rs9933310 genotypes and QPRT expression. Data points represent expression levels from individuals of different genotypes. The Y-axis displays the normalized QPRT expression level (typically in units like TPM), and the X-axis shows genotype groups. This figure visually demonstrates the expression trend: AA > AG > GG.

Supplementary Figure 2: A genome browser screenshot from the ENCODE database, displaying enrichment signals of various histone modifications (e.g., H3K4me3, H3K27ac) in the region surrounding the rs9933310 locus. These modifications are hallmarks of promoter/enhancer activity.

Supplementary Figure 3 and 4 (or as separate files): Functional prediction screenshots from the GeneCards/GeneHancer and 3DSNP databases, respectively. They present graphical evidence and corresponding confidence scores predicting the locus's potential promoter and/or enhancer activity.

Supplementary Table 1: A table listing the complete DNA sequences for QPRT-W and QPRT-M. The table contains 2 rows (representing the two constructs) and 1 column (“DNA Sequence”). Sequences are provided as 5'->3' nucleotide strings (unit: nucleotide), with no other measurement units involved. This dataset is complete with no missing values; all sequences are fully provided and have been verified.

Data Quality and Usage Notes:

The bioinformatics images in this dataset are static outputs exported from authoritative public databases. Any inherent “error” or uncertainty is already encapsulated within the original databases' statistical models and confidence intervals and is not separately annotated in the figures. The experimental sequence data are accurate and have been validated by sequencing. No data points are missing due to human error or processing in any of the files.

All files are in widely compatible formats: PNG (images) and DOCX (document). They can be opened and viewed using any standard image viewer (e.g., Windows Photo Viewer, Preview) and office suite software (e.g., Microsoft Word, WPS Office, Google Docs) or text editors. No specialized or niche software is required.

This dataset is intended to provide transparent and traceable primary evidence for the proposed functional mechanism of the rs9933310 locus discussed in the associated manuscript. It is available for peer researchers to review, reference, or use as an educational example in related bioinformatics and molecular biology contexts.
Supplementary data: Comparison of target enrichment strategies for ancient...
datasetcatalog.nlm.nih.gov
tandf.figshare.com
Updated Nov 3, 2020
Cite
Knauf, Sascha; Furtwängler, Anja; Schuenemann, Verena J.; Cole, Stewart T.; Calvignac-Spencer, Sébastien; Reiter, Ella; Singh, Pushpendra; Arora, Natasha; Böhme, Lisa; Vollstedt, Melanie; Krause-Kyora, Ben; Neukamm, Judith; Krause, Johannes; Herbig, Alexander (2020). Supplementary data: Comparison of target enrichment strategies for ancient pathogen DNA [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000454150
Dataset updated
Nov 3, 2020
Authors
Knauf, Sascha; Furtwängler, Anja; Schuenemann, Verena J.; Cole, Stewart T.; Calvignac-Spencer, Sébastien; Reiter, Ella; Singh, Pushpendra; Arora, Natasha; Böhme, Lisa; Vollstedt, Melanie; Krause-Kyora, Ben; Neukamm, Judith; Krause, Johannes; Herbig, Alexander
Description
Supplementary Note 1: Laboratory workflow

Supplementary Note 2: Bioinformatics and Statistical Analysis

Supplementary Note 3: Results of the Bioinformatics and Statistical Analysis

Supplementary Figure 1: Comparison of (A) mean coverage, (B) standard deviation of the mean coverage, (C) enrichment factor, (D) and the percentage of the genome covered 5 fold, (E) distribution of the fragment length and (F) frequency of the aDNA damage for the ancient and modern strains of M. leprae. Three independent replicates were performed for each method. Labels of the ancient samples are in black and for the modern samples in red. Boxplots of the array are blue, of the DNA bait capture red and the RNA baits capture is green and grey for the first and second round, respectively.

Supplementary Figure 2: Comparison of (A) mean coverage, (B) standard deviation of the mean coverage, (C) enrichment factor, (D) and the percentage of the genome covered 5 fold, (E) distribution of the fragment length and (F) frequency of the aDNA damage for the ancient and modern strains of T. pallidum. Three independent replicates were performed for each method. Labels of the ancient samples are in black and for the modern samples in red. Boxplots of the array are blue, of the DNA bait capture red and the RNA baits capture is green and grey for the first and second round, respectively.

Supplementary Figure 3: Number of unique reads for the three replicate batches of the three tested methods. The number of unique reads in the second round of hybridization with the RNA baits does not strongly increase compared to the first round.

Supplementary Table 1: List of all samples used in this study, grouped according to organism and age, together with the original publications.

Supplementary Table 4: Comparison of the specific reads of the three tested protocols.

Supplementary Table 6: Comparison of the variance within each method tested.

Supplementary Table 7: Comparison of the costs per reaction.
Bioinformatic databases survey
zenodo.org
csv
Updated Aug 17, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alise Ponsero; Bonnie Hurwitz; Kiran Smelser; Karen Valencia; Lucas Jimenez Miranda; Abby McDermott (2024). Bioinformatic databases survey [Dataset]. http://doi.org/10.5281/zenodo.12790448
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.12790448
Dataset updated
Aug 17, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alise Ponsero; Bonnie Hurwitz; Kiran Smelser; Karen Valencia; Lucas Jimenez Miranda; Abby McDermott
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Bioinformatic databases survey

The dataset surveys bioinformatic databases published in the NAR database issue from 1995 to 2022. It evaluates the current number of citations and the availability of each resource.

Data content

The dataset is composed of two tables :

A. Databases table : Contains the information of each database published in the NAR database issue.

db_id : Database ID in the dataset

resource_name : Name(s) of the database

current_access : Latest known web address of the database

is_a_pun : The database name is a play on words

available_2022 : The database was accessible online during the 2022 survey

last_accessible_year : If not accessible, latest point in time where the database was found online (using the Internet web archive snapshots)

unavailable_message : If not accessible, the message/error when trying to access the resource

year_first_publication : Year of first publication of the database

year_last_publication : Year of latest publication of the database (including database update publications)

total_citations_2022 : Cumulative number of citations for all articles of the database

nb_authors_max : Maximum number of authors associated with any article published for that database

nb_articles_2022 : Number of articles published for that database in 2022

B. Articles table : Contains the information collected for the NAR articles

collector : Person who added this database to the dataset

article_global_id : DOI of the article surveyed

db_id : Database ID of the resource described in the article

article_id : Article unique ID

article_year : Article publication year

Authors : list of authors of the article. Separated by ";"

Author.ID : list of ORCID of the authors of the article. Separated by ";"

Title : Title of the article

Source.title : Journal name

Volume : Volume number

Issue : Issue number

Funding.Details : Funding information of the article

Funding.Text : Funding text provided by the authors

PubMed.ID : Pubmed ID of the article

citations_2016 : Number of citations of the article in 2016 (if published)

citations_2022 : Number of citations of the article in 2022

nb_authors : Number of authors in the article

Index.Keywords : Keywords associated to the publication
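Because both tables share the db_id column, they can be joined to summarize articles per database. The sketch below is illustrative only: the rows are made up, the function name citations_per_database is hypothetical, and the actual CSV file names and value encodings in the Zenodo record are not stated in this description. Only the column names (db_id, resource_name, Authors, citations_2022) and the ";" separator for Authors come from the listing above.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up rows standing in for the two tables; only the column names
# come from the dataset description.
DATABASES_CSV = """db_id,resource_name
1,ExampleDB
2,OtherDB
"""
ARTICLES_CSV = """article_id,db_id,article_year,Authors,citations_2022
a1,1,2001,"Doe J.; Roe R.",10
a2,1,2010,"Doe J.",5
a3,2,2015,"Poe P.; Doe J.",7
"""


def citations_per_database(databases_file, articles_file):
    """Map each resource name to (summed citations_2022, distinct author count)."""
    names = {row["db_id"]: row["resource_name"]
             for row in csv.DictReader(databases_file)}
    totals = defaultdict(int)
    authors = defaultdict(set)
    for row in csv.DictReader(articles_file):
        totals[row["db_id"]] += int(row["citations_2022"])
        # The Authors field holds several names separated by ";".
        authors[row["db_id"]].update(
            a.strip() for a in row["Authors"].split(";") if a.strip()
        )
    return {names[db_id]: (totals[db_id], len(authors[db_id])) for db_id in names}


summary = citations_per_database(io.StringIO(DATABASES_CSV),
                                 io.StringIO(ARTICLES_CSV))
```

With the real files, the two io.StringIO objects would be replaced by open file handles. The per-database sum computed here is presumably comparable to the total_citations_2022 column, although the description does not say exactly how that column was derived.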

Data sources

Note that the presented dataset leverages and expands on the dataset gathered and published in Imker, H.J., 2020. Who Bears the Burden of Long-Lived Molecular Biology Databases?. Data Science Journal, 19(1), p.8. The original dataset collected by Dr. Imker is available at: https://doi.org/10.13012/B2IDB-4311325_V1

The dataset was collected and is maintained by undergraduate students of a CURE class (Course-based Undergraduate Research Experience) held at the University of Arizona. All students of the class have participated in the collection, update and curation of the dataset, which is available as a database and a web portal at https://hurwitzlab.shinyapps.io/DS_Heroes/. Students could elect whether or not to be added as authors to this Zenodo repository.

The CURE class BAT102 "Data Science Heroes: An undergraduate research experience in Open Data Science Practices" gives the students an opportunity to learn about open science and investigate open data practices in bioinformatics through a survey of the databases published in the NAR database issue.
Data_Sheet_1_Application note: TDbasedUFE and TDbasedUFEadv: bioconductor...
datasetcatalog.nlm.nih.gov
Updated Sep 1, 2023
Share
Cite
Turki, Turki; Taguchi, Y-h. (2023). Data_Sheet_1_Application note: TDbasedUFE and TDbasedUFEadv: bioconductor packages to perform tensor decomposition based unsupervised feature extraction.ZIP [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001118073
Dataset updated
Sep 1, 2023
Authors
Turki, Turki; Taguchi, Y-h.
Description
Motivation: Tensor decomposition (TD)-based unsupervised feature extraction (FE) has proven effective for a wide range of bioinformatics applications ranging from biomarker identification to the identification of disease-causing genes and drug repositioning. However, TD-based unsupervised FE failed to gain widespread acceptance due to the lack of user-friendly tools for non-experts.

Results: We developed two Bioconductor packages, TDbasedUFE and TDbasedUFEadv, that enable researchers unfamiliar with TD to utilize TD-based unsupervised FE. The packages facilitate the identification of differentially expressed genes and multiomics analysis. TDbasedUFE was found to outperform two state-of-the-art methods, DESeq2 and DIABLO.

Availability and implementation: TDbasedUFE and TDbasedUFEadv are freely available as R/Bioconductor packages, which can be accessed at https://bioconductor.org/packages/TDbasedUFE and https://bioconductor.org/packages/TDbasedUFEadv, respectively.

GTDB r220 Mash Database (UNOFFICIAL MIRROR)

zenodo.org

bin

Updated Jun 5, 2024

+ more versions

Cite

Josh L. Espinoza (2024). GTDB r220 Mash Database (UNOFFICIAL MIRROR) [Dataset]. http://doi.org/10.5281/zenodo.11494307

Explore at:

Available download formats: bin

Unique identifier

https://doi.org/10.5281/zenodo.11494307

Dataset updated

Jun 5, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Josh L. Espinoza

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This is an UNOFFICIAL host for the GTDB mash sketch based on GTDB r220

Intended use of this file is to include in the VEBA database for quicker GTDB-Tk analysis.

Created by running the following command using GTDB-Tk v2.4.0 on the S1 sample from Zenodo:7946802:

gtdbtk classify_wf --genome_dir veba_output/binning/prokaryotic/S1/output/genomes/ --out_dir test_output -x fa --cpus 1 --mash_db ./gtdb_r220.msh

Source Files:

gtdbtk_r220_data.tar.gz

RELEASE_NOTES.txt

Release 220.0:
--------------

GTDB release R09-RS220 comprises 596,859 genomes organised into 113,104 species clusters. 
Additional statistics for this release are available on the GTDB Statistics page.

Release notes:
--------------

 - Average nucleotide identity (ANI) between genomes is now calculated using skani (Shaw et al., Nat Methods, 2023) instead of FastANI (Jain et al., Nat Commun, 2018).
  skani provides a substantial reduction in computational requirements while producing similar ANI values and more accurate alignment fraction (AF) values.
 - CheckM v2 information is included on the website and in the metadata files, noting at this stage that these data were not used for the QC step in release 220. 
 - Post-curation cycle, we identified updated spelling for 15 taxon names: 
  p_Calescibacterota (updated name: Calescibacteriota)
  c_Brachyspirae (updated name: Brachyspiria)
  c_Leptospirae (updated name: Leptospiria)
  o_Ammonifexales (updated name: Ammonificales)
  o_Exiguobacterales (updated name: Exiguobacteriales)
  o_Hydrogenedentiales (updated name: Hydrogenedentales)
  o_Phormidesmiales (updated name: Phormidesmidales)
  f_Arcanobacteraceae (updated name: Arcanibacteraceae)
  f_Acetonemaceae (updated name: Acetonemataceae)
  f_Ethanoligenenaceae (updated name: Ethanoligenentaceae)
  f_Exiguobacteraceae (updated name: Exiguobacteriaceae)
  f_Geitlerinemaceae (updated name: Geitlerinemataceae)
  f_Koribacteraceae (updated name: Korobacteraceae)
  f_Phormidesmiaceae (updated name: Phormidesmidaceae)
  f_Porisulfidaceae (updated name: Poriferisulfidaceae)
  Note that the LPSN linkouts point to the correct updated names. We encourage users to use the updated names as these will appear in the next release.
 - Post-curation cycle, we discovered that two provisionally named families, Nitrincolaceae and Denitrovibrionaceae have been validly named under the ICNP as Balneatricaceae and Geovibrionaceae, respectively. 
  We encourage users to use the validly published names as these will appear in the next release.
 - We thank Jan Mares for his assistance in curating the class Cyanobacteriia and Brian Kemish for providing IT support to the project.
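
Users who want to adopt the updated names ahead of the next release could apply the renames listed above to their own taxonomy strings. The sketch below is illustrative only: the mapping is copied from the release notes, while the function name `update_taxonomy` and the assumption that lineages are semicolon-separated strings with rank prefixes such as `p_` or `p__` are not taken from the dataset.

```python
import re

# Outdated name -> updated name, copied from the release notes above
# (15 spelling updates plus the two validly published family names).
UPDATED_NAMES = {
    "Calescibacterota": "Calescibacteriota",
    "Brachyspirae": "Brachyspiria",
    "Leptospirae": "Leptospiria",
    "Ammonifexales": "Ammonificales",
    "Exiguobacterales": "Exiguobacteriales",
    "Hydrogenedentiales": "Hydrogenedentales",
    "Phormidesmiales": "Phormidesmidales",
    "Arcanobacteraceae": "Arcanibacteraceae",
    "Acetonemaceae": "Acetonemataceae",
    "Ethanoligenenaceae": "Ethanoligenentaceae",
    "Exiguobacteraceae": "Exiguobacteriaceae",
    "Geitlerinemaceae": "Geitlerinemataceae",
    "Koribacteraceae": "Korobacteraceae",
    "Phormidesmiaceae": "Phormidesmidaceae",
    "Porisulfidaceae": "Poriferisulfidaceae",
    "Nitrincolaceae": "Balneatricaceae",
    "Denitrovibrionaceae": "Geovibrionaceae",
}

# Assumed rank prefix: one lowercase letter followed by one or more underscores.
_RANK_PREFIX = re.compile(r"^([a-z]_+)(.+)$")


def update_taxonomy(lineage):
    """Replace outdated taxon names in a semicolon-separated lineage string."""
    updated = []
    for taxon in lineage.split(";"):
        match = _RANK_PREFIX.match(taxon)
        prefix, name = match.groups() if match else ("", taxon)
        updated.append(prefix + UPDATED_NAMES.get(name, name))
    return ";".join(updated)
```

Only exact name matches are replaced, so suffixed placeholder names (for example a name followed by `_A`) are left untouched by this sketch.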

If you have found this useful, please cite the original publications:

Chaumeil PA, et al. 2022. GTDB-Tk v2: memory friendly classification with the Genome Taxonomy Database. Bioinformatics, btac672.
Parks, D.H., et al. (2021). GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, 50: D785–D794.

AbEMap results: Epitope Maps with AF2 folded (with NO templates) Ab inputs
figshare.com
txt
Updated May 31, 2023
Cite
Israel Desta (2023). AbEMap results: Epitope Maps with AF2 folded (with NO templates) Ab inputs [Dataset]. http://doi.org/10.6084/m9.figshare.19652508.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.19652508.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Israel Desta
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The third application of AbEMap is to use computationally modelled antibody structures to map epitopes on given antigen structures. This dataset contains epitope mapping results for 40 BM5 antigens obtained with AlphaFold-folded antibody structures (note: no templates were used in structure prediction). The resulting AbEMap scores are saved as B-factors in the PDB files included in this dataset.
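Because the scores are stored in the B-factor field, they can be read back with a few lines of fixed-width parsing. The following Python sketch assumes standard PDB ATOM records (chain ID in column 22, residue number in columns 23-26, B-factor in columns 61-66); the function name `read_bfactor_scores` and the file name in the usage note are hypothetical, and whether the scores are assigned per atom or per residue is not stated in the description.

```python
def read_bfactor_scores(pdb_lines):
    """Collect (chain, residue number, residue name, score) from ATOM records.

    The score is whatever value sits in the B-factor columns (61-66) of
    each ATOM line; HETATM and all other record types are skipped.
    """
    records = []
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        chain = line[21]                 # column 22
        resseq = int(line[22:26])        # columns 23-26
        resname = line[17:20].strip()    # columns 18-20
        score = float(line[60:66])       # columns 61-66 (B-factor field)
        records.append((chain, resseq, resname, score))
    return records
```

A typical call would be `with open("antigen.pdb") as fh: scores = read_bfactor_scores(fh)`, where `antigen.pdb` stands in for one of the files in the dataset.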
Data from: Sex-biased gene expression, sexual antagonism and levels of...
search.dataone.org
data.niaid.nih.gov
+1 more
Updated Jun 18, 2025
Cite
Ludovic Dutoit; Carina Mugal; Paulina Bolivar; Mi Wang; Krystyna Nadachowska-Brzyska; Linnea Smeds; Homa Papoli; Lars Gustavsson; Hans Ellegren (2025). Sex-biased gene expression, sexual antagonism and levels of genetic diversity in the collared flycatcher (Ficedula albicollis) genome [Dataset]. http://doi.org/10.5061/dryad.qc5ft8n
Unique identifier
https://doi.org/10.5061/dryad.qc5ft8n
Dataset updated
Jun 18, 2025
Dataset provided by
Dryad Digital Repository
Authors
Ludovic Dutoit; Carina Mugal; Paulina Bolivar; Mi Wang; Krystyna Nadachowska-Brzyska; Linnea Smeds; Homa Papoli; Lars Gustavsson; Hans Ellegren
Time period covered
Sep 26, 2019
Description
Theoretical work suggests that sexual conflict should promote the maintenance of genetic diversity by the opposing directions of selection on males and females. If such conflict is pervasive, it could potentially lead to genomic heterogeneity in levels of genetic diversity, an idea that so far has not been empirically tested on a genome-wide scale. We used large-scale population genomic and transcriptomic data from the collared flycatcher (Ficedula albicollis) to analyse how sexual conflict, for which we use sex-biased gene expression as a proxy, relates to genetic variability. Here, we demonstrate that the extent of sex-biased gene expression of both male-biased and female-biased genes is significantly correlated with levels of nucleotide diversity in gene sequences and that this correlation extends to diversity levels also in intergenic DNA and introns. We find signatures of balancing selection in sex-biased genes but also note that relaxed purifying selection could potential...
Supplementary Data S1: Next generation sequencing from Hepatozoon canis...
data.mendeley.com
Updated Feb 11, 2019
Cite
Alexandre Léveillé (2019). Supplementary Data S1: Next generation sequencing from Hepatozoon canis (Apicomplexa: Coccidia: Adeleorina): Complete apicoplast genome and multiple mitochondrion-associated sequences [Dataset]. http://doi.org/10.17632/bs2z92449s.1
Unique identifier
https://doi.org/10.17632/bs2z92449s.1
Dataset updated
Feb 11, 2019
Authors
Alexandre Léveillé
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These files comprise all of the NGS sequence assemblies referred to in the article: "Next generation sequencing from Hepatozoon canis (Apicomplexa: Coccidia: Adeleorina): Complete apicoplast genome and multiple mitochondrion-associated sequences."

All assemblies were generated from Illumina HiSeq 2500 sequencing data (126 bp paired-end reads, insert length ~500 bp). In the case of mitochondrion-associated sequences 1, 2, 3 and 4: PCR and Sanger sequencing data were utilized to provide additional assembly coverage of CDS regions.

Files included are:

BAM assembly files: .bam, .bai and .fasta (these files are needed together to generate a BAM assembly flat file, which is supported by many software platforms).

Geneious assembly files: Complete annotated assemblies (with NGS read pairings) can be viewed with Geneious software (version 6.1 or newer). These files provide the greatest detail of the assembly data.

JPEG images of Geneious assemblies: These files are provided for ease of viewing and rapid analysis. Note: images were not generated for the complete ribosomal DNA unit and 18S rDNA variant assemblies, as these assemblies were too large to be viewed as images.
manuscript_analysis_benchmarking_output
figshare.com
txt
Updated Aug 20, 2021
Cite
Braden Tierney (2021). manuscript_analysis_benchmarking_output [Dataset]. http://doi.org/10.6084/m9.figshare.15833451.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.15833451.v1
Dataset updated
Aug 20, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Braden Tierney
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the raw data of the benchmarking analysis used to generate Figure 5 and Supplementary Figure 2 in the manuscript. Note that the first column in this CSV contains information on the number of vibrations (the last number before the .rds extension) as well as the number of dependent variables (e.g. ldl-3_DR1TFIBE_quantvoe_output_1_100.rds contained 3 dependent variables).
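The two numbers encoded in the first column can be pulled out with a regular expression. The sketch below is an assumption-laden illustration: the description confirms only that the vibration count is the last number before `.rds`; reading the dependent-variable count from the number after the hyphen in the first token is inferred from the single example given, and the function name `parse_benchmark_name` is hypothetical.

```python
import re

# First token "<label>-<n_dependent>", then anything, then "_<n_vibrations>.rds".
# The position of the dependent-variable count is inferred from one example.
_NAME_PATTERN = re.compile(r"^[^_]*-(\d+)_.*_(\d+)\.rds$")


def parse_benchmark_name(name):
    """Return (n_dependent_variables, n_vibrations) parsed from a file name."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised benchmark file name: {name!r}")
    return int(match.group(1)), int(match.group(2))
```

Applied to the example from the description, `parse_benchmark_name("ldl-3_DR1TFIBE_quantvoe_output_1_100.rds")` yields 3 dependent variables and 100 vibrations; names that do not follow this layout raise a `ValueError` rather than being guessed at.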
GTDB r214.1 Mash Database (UNOFFICIAL MIRROR)
data.niaid.nih.gov
data-staging.niaid.nih.gov
+1 more
Updated Jun 17, 2023
Cite
Josh L. Espinoza (2023). GTDB r214.1 Mash Database (UNOFFICIAL MIRROR) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8048186
Dataset updated
Jun 17, 2023
Dataset provided by
J. Craig Venter Institute
Authors
Josh L. Espinoza
License
https://www.gnu.org/licenses/agpl.txt
Description
This is an UNOFFICIAL host for the GTDB mash sketch based on GTDB r214.1.

This file is intended for inclusion in the VEBA database to speed up GTDB-Tk analysis.

Created by running the following command using GTDB-Tk v2.3.0 on the S1 sample from Zenodo:7946802:

gtdbtk classify_wf --genome_dir veba_output/binning/prokaryotic/S1/output/genomes/ --out_dir test_output -x fa --cpus 1 --mash_db ./gtdb_r214.msh

Source Files:

gtdbtk_r214_data.tar.gz

RELEASE_NOTES.txt

Release Notes:

Release 214.1:

Correction regarding the classification of the genome "GB_GCA_902406375.1" in 214.1 release. We have identified an error in the taxonomy assignment for this particular genome.

The genome GB_GCA_902406375.1 was previously classified as Collinsella sp905215505 in some files. We have reevaluated the taxonomy and determined that the correct classification should be Collinsella sp002232035. We have rectified this error and made the necessary updates to the following files within the package:
 - bac120_taxonomy_r214.tsv
 - sp_clusters_r214.tsv
 - ssu_all_r214.tar.gz

Notes:

We thank Jan Mareš for his help in curating the Cyanobacteria.

Phylum names have been updated following the valid publication of 42 names in IJSEM (https://pubmed.ncbi.nlm.nih.gov/34694987/), including Bacillota and Pseudomonadota.

Fixed issue with SSU files where sequences started 2 bp after correct start and stopped 1 bp after correct end of sequence. Thanks to CX for bringing this issue to our attention: https://forum.gtdb.ecogenomic.org/t/16s-23s-and-ssu-all-r207/307/2

SSU files now provide sequences in their 5' to 3' orientation.

Changed QC criterion for number of contigs from 1000 to 2000 in order to better align the GTDB criteria with RefSeq (https://www.ncbi.nlm.nih.gov/assembly/help/anomnotrefseq/).

Changed QC criterion to use ar53 instead of ar122 marker set. The impact of this change was evaluated on the 353,569 genomes (~6,100 archaeal) considered for GTDB R207:
 - only 1 additional genome passed QC
 - only 21 additional genomes failed QC, which included the following species representatives:
   - s_Methanoregula sp002497485
   - s_Methanobrevibacter_A sp017634055
   - s_Methanosphaera sp003266165
   - s_MGIIa-L1 sp002688825
   - s_MGIIb-N2 sp002503665
   - s_MGIIa-L2 sp002692685
   - s_MGIIb-O3 sp002730445
   - s_DTDI01 sp011334935
   - s_Methanosphaera sp017652595
   - s_Nitrosopelagicus sp902606945
   - s_Methanolinea sp002501965

If you have found this useful, please cite the original publications:

Chaumeil PA, et al. 2022. GTDB-Tk v2: memory friendly classification with the Genome Taxonomy Database. Bioinformatics, btac672.

Parks, D.H., et al. (2021). GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, 50: D785–D794.
Symphony pre-built single-cell reference atlases
explore.openaire.eu
zenodo.org
Updated Jul 9, 2021
Cite
Joyce Kang (2021). Symphony pre-built single-cell reference atlases [Dataset]. http://doi.org/10.5281/zenodo.4602301
Unique identifier
https://doi.org/10.5281/zenodo.4602301
Dataset updated
Jul 9, 2021
Authors
Joyce Kang
Description
Pre-built Symphony reference objects that can be downloaded and used to map new query datasets. The Symphony algorithm is used to perform reference mapping to these atlases.
Preprint: https://www.biorxiv.org/content/10.1101/2020.11.18.389189v2
Usage: https://github.com/immunogenomics/symphony
References available for download:
 - 10x PBMCs Atlas (pbmcs_10x_reference.rds)
 - Pancreatic Islet Cells Atlas (pancreas_plate-based_reference.rds)
 - Fetal Liver Hematopoiesis Atlas (fetal_liver_reference_3p.rds)
 - Healthy Fetal Kidney Atlas (kidney_healthy_fetal_reference.rds)
 - T cell CITE-seq atlas (tbru_ref.rds)
 - Cross-tissue Fibroblast Atlas (see here)
 - Cross-tissue Inflammatory Immune Atlas (here)
 - Tabula Muris Senis (FACS) Atlas (TMS_facs_reference.rds)
To read a reference into R, one may simply execute: reference = readRDS('path/to/reference_name.rds')
Note: To be able to map query datasets into the reference UMAP coordinates, you must also download the corresponding 'uwot_model' file and set the reference$save_uwot_path.
References Database from Behavioural change models for infectious disease...
search.datacite.org
datasetcatalog.nlm.nih.gov
Updated Dec 5, 2016
Cite
Frederik Verelst; Lander Willem; Philippe Beutels (2016). References Database from Behavioural change models for infectious disease transmission: a systematic review (2010–2015) [Dataset]. http://doi.org/10.6084/m9.figshare.4285238.v1
Unique identifier
https://doi.org/10.6084/m9.figshare.4285238.v1
Dataset updated
Dec 5, 2016
Dataset provided by
DataCite
The Royal Society
Authors
Frederik Verelst; Lander Willem; Philippe Beutels
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We review behavioural change models (BCM) for infectious disease transmission in humans. Following the Cochrane collaboration guidelines and the PRISMA statement, our systematic search and selection yielded 178 papers covering the period 2010–2015. We observe an increasing trend in published BCMs, frequently coupled to (re)emergence events, and propose a categorization by distinguishing how information translates into preventive actions. Behaviour is usually captured by introducing information as a dynamic parameter (76/178) or by introducing an economic objective function, either with (26/178) or without (37/178) imitation. Approaches using information thresholds (29/178) and exogenous behaviour formation (16/178) are also popular. We further classify according to disease, prevention measure, transmission model (with 81/178 population, 6/178 metapopulation and 91/178 individual-level models) and the way prevention impacts transmission. We highlight the minority (15%) of studies that use any real-life data for parametrization or validation and note that BCMs increasingly use social media data and generally incorporate multiple sources of information (16/178), multiple types of information (17/178) or both (9/178). We conclude that individual-level models are increasingly used and useful to model behaviour changes. Despite recent advancements, we remain concerned that most models are purely theoretical and lack representative data and a validation process.
Classifying protein kinase conformations with machine learning: data
resodate.org
zenodo.org
Updated Jul 23, 2023
Cite
Ivan REVEGUK (2023). Classifying protein kinase conformations with machine learning: data [Dataset]. https://resodate.org/resources/aHR0cHM6Ly96ZW5vZG8ub3JnL3JlY29yZHMvODE3NTM3MA==
Dataset updated
Jul 23, 2023
Dataset provided by
Zenodo
Authors
Ivan REVEGUK
Description
This data collection accompanies the manuscript Classifying protein kinase conformations with machine learning.

It was created using the kinactive v0.1 tool, written in pure Python v3.10. Note that the data are provided for reference and reproducibility purposes and will not be compatible with later versions of kinactive built upon lXtractor 0.1.1. Refer to the kinactive documentation for instructions on how to obtain an up-to-date version of the structural kinome collection.

File descriptions:

db_v3.tar.gz -- a structural kinome collection archive. One can unpack it and inspect the contents or load it into the Python interpreter using the `kinactive` or `lXtractor` tools.
db_af2.tar.gz -- an AlphaFold2 kinome collection for Swiss-Prot sequences.
default_*_vs.tsv -- structure/sequence variables calculated with lXtractor and used in an interpretable ML pipeline.
*_features.tsv -- lists of ranked features selected by the eBoruta tool for each classifier.
Supplement_labels.tsv -- ML model predictions for each PK domain structure found in db_v3.
predictions_af2.csv -- Active/Inactive and DFG labels predicted for domains in db_af2.
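
Before unpacking one of the archives, it can be useful to look at what it contains. The following is a minimal sketch using only Python's standard `tarfile` module; it does not use the `kinactive` or `lXtractor` loaders, whose APIs are not described here, and the function name `top_level_entries` is hypothetical.

```python
import tarfile


def top_level_entries(archive_path):
    """List the distinct top-level names inside a .tar.gz archive.

    Only the archive index is read; nothing is extracted to disk.
    """
    tops = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for name in tar.getnames():
            # Drop empty and "." components so "./db/x" and "db/x" agree.
            parts = [p for p in name.split("/") if p not in ("", ".")]
            if parts:
                tops.add(parts[0])
    return sorted(tops)
```

Calling `top_level_entries("db_v3.tar.gz")` on a downloaded copy would show the top-level layout, which helps when deciding where to unpack the collection.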

Sawfish Bioinformatics App Note SV VCFs and assessment results

26 scholarly articles cite this dataset (View in Google Scholar)
