This tarball "sawfish_publication_sv_vcfs_and_assessments.tar.gz" contains structural variant VCFs from analyses in the sawfish Bioinformatics App Note, as well as the corresponding VCF assessment results and assessment scripts. The corresponding Bioinformatics article can be found here: https://doi.org/10.1093/bioinformatics/btaf136
The top 3 levels of the tarball file tree are as follows:
sawfish_publication_sv_vcfs_and_assessments
├── CEPH1463_pedigree_analysis
│   ├── assessment_scripts
│   │   ├── ceph.GRCh38.viterbi.oa.csv.gz
│   │   ├── get_gqcut.bash
│   │   ├── get_pass_ge50.bash
│   │   └── run_concordance.bash
│   ├── README.md
│   └── sv_calls
│       ├── pbsv
│       ├── sawfish
│       └── sniffles
├── HG002_depth_titration
│   ├── assessment_scripts
│   │   ├── giab_cmrg
│   │   ├── giab_t2t_20241113
│   │   └── truvari_utils
│   ├── benchmark_data
│   │   ├── cmrg_1.0
│   │   └── T2T_V0.019-20241113_T2T-HG002-Q100v1.1
│   ├── README.md
│   ├── reference_fasta
│   │   ├── human_GRCh38_no_alt_analysis_set.fasta
│   │   ├── human_GRCh38_no_alt_analysis_set.fasta.fai
│   │   └── README.md
│   ├── singuarlity_images
│   │   ├── README.md
│   │   └── truvari-v4.2.2.sif
│   └── sv_calls
│       ├── pbsv
│       ├── sawfish
│       └── sniffles
└── README.md
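To check a downloaded copy against the tree listed above, the archive can be inspected without unpacking it. The following is a minimal sketch using Python's standard `tarfile` module; the function name `top_levels` and the `depth` parameter are illustrative and are not part of the published assessment scripts.

```python
import tarfile
from pathlib import PurePosixPath


def top_levels(tar_path, depth=3):
    """Return the sorted archive paths truncated to at most `depth` components.

    `depth` counts the root directory as the first component.
    """
    seen = set()
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            parts = PurePosixPath(member.name).parts
            # Record every prefix of the path up to the requested depth.
            for i in range(1, min(depth, len(parts)) + 1):
                seen.add("/".join(parts[:i]))
    return sorted(seen)
```

Because the listing above shows the root directory plus three levels beneath it, calling `top_levels("sawfish_publication_sv_vcfs_and_assessments.tar.gz", depth=4)` should reproduce the same set of paths, assuming the archive stores them under that single root directory.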
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The top row includes problems about RNA secondary structure predictions and the middle row includes problems about alignment of biological sequences. Note that the estimators in the same column correspond to each other.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets, conda environments and software for the course "Population Genomics" of Prof. Kasper Munch. This course material is maintained by the Health Data Science Sandbox. This webpage shows the latest version of the course material.
The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.
Description
After the course, participants will have detailed knowledge of the methods and applications required to perform a typical population genomic study.
At the end of the course, participants must be able to:
The course introduces key concepts in population genomics, from the generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on the generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background. Topics here include analysis of demography, population structure, recombination and selection. The last part of the course focuses on applications of population genetic data sets for association studies in relation to human health.
Curriculum
The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.
Course plan
Current genome sequencing initiatives across a wide range of life forms offer significant potential to enhance our understanding of evolutionary relationships and support transformative biological and medical applications. Species trees play a central role in many of these applications; however, despite the widespread availability of genome assemblies, accurate inference of species trees remains challenging for many scientists due to the limited automation, significant domain expertise, and substantial computational resources required by conventional methods. To address this limitation, we present ROADIES, a fully-automated pipeline to infer species trees starting from raw genome assemblies (those lacking prior annotations). In contrast to the prominent approach, ROADIES randomly selects segments of the input genomes to generate gene trees. This eliminates the need to choose any single reference species or perform the cumbersome steps of gene annotations and whole genome alignments. ROA...
# Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES
https://doi.org/10.5061/dryad.tht76hf73
ROADIES is a novel pipeline designed for phylogenetic tree inference of the species directly from their raw genomic assemblies.
For details on how to run ROADIES, please refer to our Wiki: https://turakhia.ucsd.edu/ROADIES/
This repository contains the output files generated by ROADIES (v0.1.0) (https://github.com/TurakhiaLab/ROADIES/releases/tag/v0.1.0) for estimating the species tree for the following datasets (in the accurate mode of operation):
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We published 3 protocols illustrating how MetaNeighbor can be used to quantify cell type replicability across single cell transcriptomic datasets. The data files included here are needed to run the R version of the protocols available on Github (https://github.com/gillislab/MetaNeighbor-Protocol) in RMarkdown (.Rmd) and Jupyter (.ipynb) notebook format. To run the protocols, download the protocols from Github, download the data from Figshare, place the data and protocol files in the same directory, then run the notebooks in RStudio or Jupyter. The scripts used to generate the data are included in the Github directory. Briefly:
- full_biccn_hvg.rds: a single cell transcriptomic dataset published by the Brain Initiative Cell Census Network (in SingleCellExperiment format). It combines data from 7 datasets obtained in the mouse primary motor cortex (https://www.biorxiv.org/content/10.1101/2020.02.29.970558v2). Note that this dataset only contains highly variable genes.
- biccn_hvgs.txt: highly variable genes from the BICCN dataset described above (computed with the MetaNeighbor library).
- biccn_gaba.rds: same dataset as full_biccn_hvg.rds, but restricted to GABAergic neurons. The dataset contains all genes common to the 7 BICCN datasets (not just highly variable genes).
- go_mouse.rds: gene ontology annotations, stored as a list of gene symbols (one element per gene set).
- functional_aurocs.txt: results of the MetaNeighbor functional analysis in protocol 3.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Each of the tar.gz compressed directories corresponds to prepTG databases (for the zol suite) featuring distinct, representative genomes for one of the six genera containing ESKAPE pathogens. Representative genomes for each genus/taxon were selected using skDER v1.0.7 in greedy mode with 99% ANI and 90% AF cutoffs.
The compressed folders also contain an extra file, corresponding to a species tree of the representative genomes constructed using GToTree with Universal markers (ribosomal proteins) from Hug et al. 2016 and in best-hits mode. Note, GToTree was modified to always use -super5 mode for SCG alignments for computational efficiency. Also, note, because genomes can be dropped by GToTree prior to phylogeny inference (e.g. if they lack enough SCGs), not all genomes in the database might be represented in the phylogenies.
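The representative selection described above can be illustrated with a simplified greedy dereplication sketch. This is not skDER's implementation: the ranking of genomes, the direction in which alignment fraction (AF) is evaluated, and the `similarity` lookup table are all assumptions made for illustration. Only the two cutoffs (99% ANI and 90% AF) come from the description.

```python
def greedy_representatives(ranked_genomes, similarity,
                           ani_cutoff=99.0, af_cutoff=90.0):
    """Greedily pick representative genomes.

    ranked_genomes: genome IDs ordered from most to least preferred
        (the ranking criterion itself is a hypothetical input here).
    similarity: dict mapping (genome, representative) to a tuple
        (ANI percent, AF percent); missing pairs count as dissimilar.
    """
    representatives = []
    for genome in ranked_genomes:
        # A genome is redundant if it meets both cutoffs against any
        # representative that has already been selected.
        redundant = any(
            similarity.get((genome, rep), (0.0, 0.0))[0] >= ani_cutoff
            and similarity.get((genome, rep), (0.0, 0.0))[1] >= af_cutoff
            for rep in representatives
        )
        if not redundant:
            representatives.append(genome)
    return representatives
```

In this sketch a genome is dropped only when both cutoffs are met, so a close relative with a low alignment fraction is still kept as its own representative.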
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Subjective data models dataset
This dataset is comprised of data collected from study participants, for a study into how people working with biological data perceive data, and whether or not this perception of data aligns with a person's experiential and educational background. We call the concept of what data looks like to an individual a "subjective data model".
Todo: link paper/preprint once published.
Computational python analysis code: https://doi.org/10.5281/zenodo.7022789 and https://github.com/yochannah/subjective-data-models-analysis
Files
Transcripts of the recorded sessions are attached and have been verified by a second researcher. These files are all in plain text .txt format. Note that participant 3 did not agree to sharing the transcript of their interview.
Interview paper files: this folder has digital and photographed versions of the files shown to the participants for the file mapping task. Note that the original files are from the NCBI and from FlyBase.
Videos and stills from the recordings have been deleted in line with the Data Management Plan and Ethical Review.
anonymous_participant_list.csv shows which files have transcripts associated (not all participants agreed to share transcripts), what the order of Tasks A and B were, the date of interview, and what entities participants added to the set provided (if any). See the paper methods for more info about why entities were added to the set.
cards.txt is a full list of the cards presented in the tasks.
background survey and background manual annotations are the select survey data about participant background and manual additions to this where necessary, e.g. to interpret free text.
codes.csv shows the qualitative codes used within the transcripts.
entry_point.csv is a record of participants' identified entry points into the data.
file_mapping_responses shows a record of responses to the file mapping task.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset comprises the key supplementary materials supporting the findings of the research article titled “Association of QPRT Gene Polymorphisms with Postpartum Depression in Chinese Cesarean Parturients: A Candidate Gene Association Study.” The data were generated through bioinformatics database mining and in vitro molecular cloning design.
Data Generation and Processing Methods:
Bioinformatics Data: These data were retrieved from online public databases in 2024. The eQTL (expression quantitative trait loci) summary plot for the QPRT gene rs9933310 locus across human tissues (with a focus on the cerebral cortex and hippocampus) was queried and exported from the GTEx Portal (version 8). Visualizations of chromatin states, histone modification profiles, and cis-regulatory element predictions for the rs9933310 locus were obtained by querying the ENCODE, GeneCards (integrating GeneHancer), and 3DSNP v2.0 databases via the UCSC Genome Browser (assembly GRCh38/hg38), using the specific genomic coordinates (chr16:29679583). No secondary statistical calculations were performed on the raw database outputs during this process.
Experimental Sequence Data: Based on NCBI reference sequences, DNA fragments encompassing the rs9933310 locus and its flanking regions were designed using sequence design software (e.g., Primer Premier) and commercially synthesized. The wild-type sequence (QPRT-W) contains the ‘A’ allele, while the mutant sequence (QPRT-M) features a single nucleotide substitution to ‘G’ to model the SNP. These sequences were subsequently cloned into the pGL3 reporter vector for functional validation and were verified by Sanger sequencing.
Dataset Content and Spatiotemporal Information:
The data itself does not pertain to specific geographical spatial information or continuous time series. Its temporal context is defined by the date of query/generation (2024) and the specific versions of the underlying public databases (e.g., GTEx v8, hg38).
The dataset consists of four core files:
Supplementary Figure 1: An eQTL analysis plot illustrating the association between rs9933310 genotypes and QPRT expression. Data points represent expression levels from individuals of different genotypes. The Y-axis displays the normalized QPRT expression level (typically in units like TPM), and the X-axis shows genotype groups. This figure visually demonstrates the expression trend: AA > AG > GG.
Supplementary Figure 2: A genome browser screenshot from the ENCODE database, displaying enrichment signals of various histone modifications (e.g., H3K4me3, H3K27ac) in the region surrounding the rs9933310 locus. These modifications are hallmarks of promoter/enhancer activity.
Supplementary Figure 3 and 4 (or as separate files): Functional prediction screenshots from the GeneCards/GeneHancer and 3DSNP databases, respectively. They present graphical evidence and corresponding confidence scores predicting the locus's potential promoter and/or enhancer activity.
Supplementary Table 1: A table listing the complete DNA sequences for QPRT-W and QPRT-M. The table contains 2 rows (representing the two constructs) and 1 column (“DNA Sequence”). Sequences are provided as 5'->3' nucleotide strings (unit: nucleotide), with no other measurement units involved. This dataset is complete with no missing values; all sequences are fully provided and have been verified.
Data Quality and Usage Notes:
The bioinformatics images in this dataset are static outputs exported from authoritative public databases. Any inherent “error” or uncertainty is already encapsulated within the original databases' statistical models and confidence intervals and is not separately annotated in the figures. The experimental sequence data are accurate and have been validated by sequencing. No data points are missing due to human error or processing in any of the files.
All files are in widely compatible formats: PNG (images) and DOCX (document). They can be opened and viewed using any standard image viewer (e.g., Windows Photo Viewer, Preview) and office suite software (e.g., Microsoft Word, WPS Office, Google Docs) or text editors. No specialized or niche software is required.
This dataset is intended to provide transparent and traceable primary evidence for the proposed functional mechanism of the rs9933310 locus discussed in the associated manuscript. It is available for peer researchers to review, reference, or use as an educational example in related bioinformatics and molecular biology contexts.
Supplementary Note 1: Laboratory workflow
Supplementary Note 2: Bioinformatics and Statistical Analysis
Supplementary Note 3: Results of the Bioinformatics and Statistical Analysis
Supplementary Figure 1: Comparison of (A) mean coverage, (B) standard deviation of the mean coverage, (C) enrichment factor, (D) the percentage of the genome covered 5 fold, (E) distribution of the fragment length and (F) frequency of the aDNA damage for the ancient and modern strains of M. leprae. Three independent replicates were performed for each method. Labels of the ancient samples are in black and those of the modern samples in red. Boxplots of the array are blue, those of the DNA bait capture are red, and those of the RNA bait capture are green and grey for the first and second round, respectively.
Supplementary Figure 2: Comparison of (A) mean coverage, (B) standard deviation of the mean coverage, (C) enrichment factor, (D) the percentage of the genome covered 5 fold, (E) distribution of the fragment length and (F) frequency of the aDNA damage for the ancient and modern strains of T. pallidum. Three independent replicates were performed for each method. Labels of the ancient samples are in black and those of the modern samples in red. Boxplots of the array are blue, those of the DNA bait capture are red, and those of the RNA bait capture are green and grey for the first and second round, respectively.
Supplementary Figure 3: Number of unique reads for the three replicate batches of the three tested methods. The number of unique reads in the second round of hybridization with the RNA baits does not strongly increase compared to the first round.
Supplementary Table 1: List of all samples used in this study, grouped according to organism and age, together with the original publications.
Supplementary Table 4: Comparison of the specific reads of the three tested protocols.
Supplementary Table 6: Comparison of the variance within each method tested.
Supplementary Table 7: Comparison of the costs per reaction.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset surveys bioinformatic databases published in the NAR database issue from 1995 to 2022. It evaluates the current number of citations and the availability of each resource.
The dataset is composed of two tables :
A. Databases table : Contains the information of each database published in the NAR database issue.
B. Articles table : Contains the information collected for the NAR articles
Note that the presented dataset leverages and expands on the dataset gathered and published in Imker, H.J., 2020. Who Bears the Burden of Long-Lived Molecular Biology Databases?. Data Science Journal, 19(1), p.8. The original dataset collected by Dr. Imker is available at: https://doi.org/10.13012/B2IDB-4311325_V1
The dataset was collected and is maintained by undergraduate students of a CURE class (Course-based Undergraduate Research Experience) held at the University of Arizona. All students of the class have participated in the collection, updating and curation of the dataset, which is available as a database and a web portal at https://hurwitzlab.shinyapps.io/DS_Heroes/. Students could elect whether or not to be added as authors to this Zenodo repository.
The CURE class BAT102 "Data Science Heroes: An undergraduate research experience in Open Data Science Practices" gives the students an opportunity to learn about open science and investigate open data practices in bioinformatics through a survey of the databases published in the NAR database issue.
Motivation: Tensor decomposition (TD)-based unsupervised feature extraction (FE) has proven effective for a wide range of bioinformatics applications ranging from biomarker identification to the identification of disease-causing genes and drug repositioning. However, TD-based unsupervised FE failed to gain widespread acceptance due to the lack of user-friendly tools for non-experts.
Results: We developed two Bioconductor packages, TDbasedUFE and TDbasedUFEadv, that enable researchers unfamiliar with TD to utilize TD-based unsupervised FE. The packages facilitate the identification of differentially expressed genes and multiomics analysis. TDbasedUFE was found to outperform two state-of-the-art methods, DESeq2 and DIABLO.
Availability and implementation: TDbasedUFE and TDbasedUFEadv are freely available as R/Bioconductor packages, which can be accessed at https://bioconductor.org/packages/TDbasedUFE and https://bioconductor.org/packages/TDbasedUFEadv, respectively.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an UNOFFICIAL host for the GTDB mash sketch based on GTDB r220
Intended use of this file is to include in the VEBA database for quicker GTDB-Tk analysis.
Created by running the following command using GTDB-Tk v2.4.0 on the S1 sample from Zenodo:7946802:
gtdbtk classify_wf --genome_dir veba_output/binning/prokaryotic/S1/output/genomes/ --out_dir test_output -x fa --cpus 1 --mash_db ./gtdb_r220.msh
Source Files:
RELEASE_NOTES.txt
Release 220.0:
--------------
GTDB release R09-RS220 comprises 596,859 genomes organised into 113,104 species clusters. Additional statistics for this release are available on the GTDB Statistics page.
Release notes:
--------------
- Average nucleotide identity (ANI) between genomes is now calculated using skani (Shaw et al., Nat Methods, 2023) instead of FastANI (Jain et al, Nat Commun, 2018). skani provides a substantial reduction in computational requirements while producing similar ANI values and more accurate alignment fraction (AF) values.
- CheckM v2 information is included on the website and in the metadata files, noting at this stage that these data were not used for the QC step in release 220.
- Post-curation cycle, we identified updated spelling for 15 taxon names:
  p_Calescibacterota (updated name: Calescibacteriota)
  c_Brachyspirae (updated name: Brachyspiria)
  c_Leptospirae (updated name: Leptospiria)
  o_Ammonifexales (updated name: Ammonificales)
  o_Exiguobacterales (updated name: Exiguobacteriales)
  o_Hydrogenedentiales (updated name: Hydrogenedentales)
  o_Phormidesmiales (updated name: Phormidesmidales)
  f_Arcanobacteraceae (updated name: Arcanibacteraceae)
  f_Acetonemaceae (updated name: Acetonemataceae)
  f_Ethanoligenenaceae (updated name: Ethanoligenentaceae)
  f_Exiguobacteraceae (updated name: Exiguobacteriaceae)
  f_Geitlerinemaceae (updated name: Geitlerinemataceae)
  f_Koribacteraceae (updated name: Korobacteraceae)
  f_Phormidesmiaceae (updated name: Phormidesmidaceae)
  f_Porisulfidaceae (updated name: Poriferisulfidaceae)
  Note that the LPSN linkouts point to the correct updated names. We encourage users to use the updated names as these will appear in the next release.
- Post-curation cycle, we discovered that two provisionally named families, Nitrincolaceae and Denitrovibrionaceae, have been validly named under the ICNP as Balneatricaceae and Geovibrionaceae, respectively. We encourage users to use the validly published names as these will appear in the next release.
- We thank Jan Mares for his assistance in curating the class Cyanobacteriia and Brian Kemish for providing IT support to the project.
If you have found this useful, please cite the original publications:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The third application of AbEMap is to use computationally modelled antibody structures to map epitopes on given antigen structures. This dataset contains epitope mapping results for 40 BM5 antigens using AlphaFold-folded antibody structures (note: no templates were used in structure prediction). The resulting AbEMap scores are saved as B-factors in the PDB files included in this dataset.
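Since the scores are stored in the B-factor field, they can be read back with a few lines of fixed-column parsing. The sketch below relies only on the standard PDB ATOM record layout (residue name in columns 18-20, chain in column 22, residue number in columns 23-26, B-factor in columns 61-66). It assumes, without confirmation from the dataset, that every atom of a residue carries the same score, so it keeps the value of the first atom seen; the function name `residue_scores` is hypothetical.

```python
def residue_scores(pdb_lines):
    """Map (chain, residue number, residue name) to the B-factor value.

    Only ATOM records are read. The first atom of each residue supplies
    the value (assumption: the score is constant within a residue).
    """
    scores = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        res_name = line[17:20].strip()   # columns 18-20
        chain = line[21]                 # column 22
        res_num = int(line[22:26])       # columns 23-26
        b_factor = float(line[60:66])    # columns 61-66
        scores.setdefault((chain, res_num, res_name), b_factor)
    return scores
```

With an open file handle this becomes `residue_scores(open("antigen.pdb"))`, where the file name is a placeholder; sorting the resulting dictionary by value then lists the residues with the highest scores first.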
Theoretical work suggests that sexual conflict should promote the maintenance of genetic diversity by the opposing directions of selection on males and females. If such conflict is pervasive, it could potentially lead to genomic heterogeneity in levels of genetic diversity, an idea that so far has not been empirically tested on a genome-wide scale. We used large-scale population genomic and transcriptomic data from the collared flycatcher (Ficedula albicollis) to analyse how sexual conflict, for which we use sex-biased gene expression as a proxy, relates to genetic variability. Here, we demonstrate that the extent of sex-biased gene expression of both male-biased and female-biased genes is significantly correlated with levels of nucleotide diversity in gene sequences and that this correlation extends to diversity levels also in intergenic DNA and introns. We find signatures of balancing selection in sex-biased genes but also note that relaxed purifying selection could potential...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files comprise all of the NGS sequence assemblies referred to in the article: "Next generation sequencing from Hepatozoon canis (Apicomplexa: Coccidia: Adeleorina): Complete apicoplast genome and multiple mitochondrion-associated sequences."
All assemblies were generated from Illumina HiSeq 2500 sequencing data (126 bp paired-end reads, insert length ~500 bp). In the case of mitochondrion-associated sequences 1, 2, 3 and 4: PCR and Sanger sequencing data were utilized to provide additional assembly coverage of CDS regions.
Files included are: BAM assembly files: .bam, .bai and .fasta (these files are needed together to generate a BAM assembly flat file - supported by many software platforms).
Geneious assembly files: Complete annotated assemblies (with NGS read pairings) can be viewed with Geneious software (versions 6.1 or newer). These files will provide the greatest details of the assembly data.
JPEG images of Geneious assemblies: These files were provided for ease of viewing and rapid analysis. Note: images were not generated for the complete ribosomal DNA unit and 18S rDNA variant assemblies as these assemblies were too large to be viewed as images.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the raw data of the benchmarking analysis used to generate Figure 5 and Supplementary Figure 2 in the manuscript. Note that the first column in this csv contains information on the number of vibrations (last number before the .rds extension) as well as the number of dependent variables (e.g. ldl-3_DR1TFIBE_quantvoe_output_1_100.rds contained 3 dependent variables).
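The two numbers encoded in the first column can be pulled out with a regular expression. The sketch below is based on the single example given above: it takes the last integer before `.rds` as the number of vibrations and, as an assumption inferred from that one example, the integer following the first hyphen as the number of dependent variables. The function name `parse_output_name` is hypothetical.

```python
import re

# <prefix>-<n_dependent>_<anything>_<n_vibrations>.rds
_NAME_PATTERN = re.compile(
    r"^(?P<prefix>[^-_]+)-(?P<n_dependent>\d+)_.*_(?P<n_vibrations>\d+)\.rds$"
)


def parse_output_name(name):
    """Extract the dependent-variable and vibration counts from a file name."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name!r}")
    return {
        "n_dependent": int(match["n_dependent"]),
        "n_vibrations": int(match["n_vibrations"]),
    }
```

File names that do not follow this pattern raise a `ValueError` rather than being silently misread, which makes it easy to spot rows where the naming assumption does not hold.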
https://www.gnu.org/licenses/agpl.txt
This is an UNOFFICIAL host for the GTDB mash sketch based on GTDB r214.1
Intended use of this file is to include in the VEBA database for quicker GTDB-Tk analysis.
Created by running the following command using GTDB-Tk v2.3.0 on the S1 sample from Zenodo:7946802:
gtdbtk classify_wf --genome_dir veba_output/binning/prokaryotic/S1/output/genomes/ --out_dir test_output -x fa --cpus 1 --mash_db ./gtdb_r214.msh
Source Files:
gtdbtk_r214_data.tar.gz
RELEASE_NOTES.txt
Release Notes:
Correction regarding the classification of the genome "GB_GCA_902406375.1" in 214.1 release. We have identified an error in the taxonomy assignment for this particular genome.
The genome GB_GCA_902406375.1 was previously classified as Collinsella sp905215505 in some files. We have reevaluated the taxonomy and determined that the correct classification should be Collinsella sp002232035. We have rectified this error and made the necessary updates to the following files within the package:
- bac120_taxonomy_r214.tsv
- sp_clusters_r214.tsv
- ssu_all_r214.tar.gz
We thank Jan Mareš for his help in curating the Cyanobacteria
Phylum names have been updated following the valid publication of 42 names in IJSEM (https://pubmed.ncbi.nlm.nih.gov/34694987/), including Bacillota and Pseudomonadota
Fixed issue with SSU files where sequences started 2 bp after correct start and stopped 1 bp after correct end of sequence. Thanks to CX for bringing this issue to our attention: https://forum.gtdb.ecogenomic.org/t/16s-23s-and-ssu-all-r207/307/2
SSU files now provide sequences in their 5' to 3' orientation
Changed QC criterion for number of contigs from 1000 to 2000 in order to better align the GTDB criteria with RefSeq (https://www.ncbi.nlm.nih.gov/assembly/help/anomnotrefseq/)
Changed QC criterion to use ar53 instead of ar122 marker set. The impact of this change was evaluated on the 353,569 genomes (~6,100 archaeal) considered for GTDB R207:
-- only 1 additional genome passed QC
-- only 21 additional genomes failed QC which included the following species representatives:
   -- s_Methanoregula sp002497485
   -- s_Methanobrevibacter_A sp017634055
   -- s_Methanosphaera sp003266165
   -- s_MGIIa-L1 sp002688825
   -- s_MGIIb-N2 sp002503665
   -- s_MGIIa-L2 sp002692685
   -- s_MGIIb-O3 sp002730445
   -- s_DTDI01 sp011334935
   -- s_Methanosphaera sp017652595
   -- s_Nitrosopelagicus sp902606945
   -- s_Methanolinea sp002501965
If you have found this useful, please cite the original publications:
Chaumeil PA, et al. 2022. GTDB-Tk v2: memory friendly classification with the Genome Taxonomy Database. Bioinformatics, btac672.
Parks, D.H., et al. (2021). GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, 50: D785–D794.
Pre-built Symphony reference objects that can be downloaded and used to map new query datasets. The Symphony algorithm is used to perform reference mapping to these atlases.
Preprint: https://www.biorxiv.org/content/10.1101/2020.11.18.389189v2
Usage: https://github.com/immunogenomics/symphony
References available for download:
- 10x PBMCs Atlas (pbmcs_10x_reference.rds)
- Pancreatic Islet Cells Atlas (pancreas_plate-based_reference.rds)
- Fetal Liver Hematopoiesis Atlas (fetal_liver_reference_3p.rds)
- Healthy Fetal Kidney Atlas (kidney_healthy_fetal_reference.rds)
- T cell CITE-seq atlas (tbru_ref.rds)
- Cross-tissue Fibroblast Atlas (see here)
- Cross-tissue Inflammatory Immune Atlas (here)
- Tabula Muris Senis (FACS) Atlas (TMS_facs_reference.rds)
To read a reference into R, one may simply execute: reference = readRDS('path/to/reference_name.rds')
Note: To be able to map query datasets into the reference UMAP coordinates, you must also download the corresponding 'uwot_model' file and set the reference$save_uwot_path.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We review behavioural change models (BCM) for infectious disease transmission in humans. Following the Cochrane collaboration guidelines and the PRISMA statement, our systematic search and selection yielded 178 papers covering the period 2010–2015. We observe an increasing trend in published BCMs, frequently coupled to (re)emergence events, and propose a categorization by distinguishing how information translates into preventive actions. Behaviour is usually captured by introducing information as a dynamic parameter (76/178) or by introducing an economic objective function, either with (26/178) or without (37/178) imitation. Approaches using information thresholds (29/178) and exogenous behaviour formation (16/178) are also popular. We further classify according to disease, prevention measure, transmission model (with 81/178 population, 6/178 metapopulation and 91/178 individual-level models) and the way prevention impacts transmission. We highlight the minority (15%) of studies that use any real-life data for parametrization or validation and note that BCMs increasingly use social media data and generally incorporate multiple sources of information (16/178), multiple types of information (17/178) or both (9/178). We conclude that individual-level models are increasingly used and useful to model behaviour changes. Despite recent advancements, we remain concerned that most models are purely theoretical and lack representative data and a validation process.
This data collection accompanies the manuscript "Classifying protein kinase conformations with machine learning".
It was created using the kinactive v0.1 tool written in pure Python v3.10. Note that the data are provided for reference and reproducibility purposes and will not be compatible with later versions of kinactive built upon lXtractor 0.1.1. Refer to the kinactive documentation for instructions on how to obtain an up-to-date version of the structural kinome collection.
File descriptions:
db_v3.tar.gz -- a structural kinome collection archive. One can unpack it and inspect the contents or load it into the Python interpreter using `kinactive` or `lXtractor` tools.
db_af2.tar.gz -- an AlphaFold2 kinome collection for Swiss-Prot sequences.
default_*_vs.tsv -- structure/sequence variables calculated with lXtractor and used in an interpretable ML pipeline.
*_features.tsv -- lists of ranked features selected by the eBoruta tool for each classifier.
Supplement_labels.tsv -- ML model predictions for each PK domain structure found in db_v3.
predictions_af2.csv -- Active/Inactive and DFG labels predicted for domains in db_af2.