Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
I initially hosted my blog on a lab server, and have since migrated to WordPress.com. As part of this migration, I am using figshare (as an alternative to my lab server) for hosting supplements to my blog posts, such as graphics and data files.
https://whoisdatacenter.com/index.php/terms-of-use/
Explore the historical Whois records related to blog-bioinformatics.science (Domain). Get insights into ownership history and changes over time.
A database of information on pox viruses. Goals of this project are to acquire and annotate data on poxviruses, and to develop and utilize new tools to facilitate the study of this group of organisms. This basic research is being undertaken with an eye toward the development of novel antiviral therapies, vaccines against human orthopoxvirus infections, new approaches for the environmental detection of virions, and methods to accomplish more rapid diagnosis of disease.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data availability for the blog post on binning giant viruses and their close relatives with anvi'o.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
While multiple bioinformatics tools are already available to generate and/or visualize pangenomes, their interfaces do not necessarily offer flexible analysis, limiting users' ability to interact with their data. We recently introduced a software platform, anvi'o, to bridge some of the gaps in our common bioinformatics toolkit. We are happy to announce that anvi'o can now process, visualize and manipulate pangenomic data in a user-friendly environment. Some modules are still under construction for a fully automated workflow. Nevertheless, the current anvi'o interface already offers novel opportunities to combine pangenomes with a variety of contextual metadata and exports high-quality figures for publications. This blog describes original pangenomic investigations of publicly available genomic collections. It is set to introduce the anvi'o pangenomic workflow to our enthusiastic user community.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Molecular Biology Information Service (MBIS) of the Health Sciences Library System (HSLS) at the University of Pittsburgh conducted a 33-question online survey to evaluate the effectiveness of services provided by the MBIS. The survey was administered via Qualtrics. Questions were organized into 6 categories: Demographics, Software, Instruction, Website, Service, and Outreach. Questions were a mix of multiple choice, ranking, and free text. Participants were recruited during a six-week period in early 2018. The survey was advertised via numerous methods: MBIS blog post, HSLS website post, MBIS listserv notifications, direct email invitations, and during MBIS workshops. The survey did not require oversight by the University of Pittsburgh IRB. The CSV file contains de-identified survey responses; identifying information for Q6.7 was redacted. Also included are a PDF of the survey questions and a PDF of the Qualtrics survey response report.
THIS RESOURCE IS NO LONGER IN SERVICE, documented on 8/12/13. An expanded version of the Alternative Splicing Annotation Project (ASAP) database with a new interface and integration of comparative features using UCSC BLASTZ multiple alignments. It supports 9 vertebrate species, 4 insects, and nematodes, and provides extensive alternative splicing analysis and splice variants. For human alternative splicing data, newly added EST libraries were classified and included in the previous tissue and cancer classification, and lists of tissue-specific and cancer-specific (vs. normal) alternatively spliced genes were recalculated and updated. They have created novel orthologous exon and intron databases, with their splice variants, based on multiple alignments among several species. These orthologous exon and intron databases can give more comprehensive homologous gene information than protein-similarity-based methods. Furthermore, splice junction and exon identity among species can be valuable resources to elucidate species-specific genes. The ASAP II database can be easily integrated with pygr (unpublished, the Python Graph Database Framework for Bioinformatics) and its powerful features such as graph query and multi-genome alignment query. ASAP II can be searched by several different criteria such as gene symbol, gene name and ID (UniGene, GenBank etc.). The web interface provides 7 different kinds of views: (I) user query, UniGene annotation, orthologous genes and genome browsers; (II) genome alignment; (III) exons and orthologous exons; (IV) introns and orthologous introns; (V) alternative splicing; (VI) isoform and protein sequences; (VII) tissue and cancer vs. normal specificity. ASAP II shows genome alignments of isoforms, exons, and introns in a UCSC-like genome browser. All alternative splicing relationships with supporting evidence information, types of alternative splicing patterns, and inclusion rate for skipped exons are listed in separate tables. Users can also search human data for tissue- and cancer-specific splice forms at the bottom of the gene summary page. The p-values for tissue specificity are reported as log-odds (LOD) scores, and results with LOD >= 3 and at least 3 EST sequences are highlighted.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Final results from the preliminary survey found here: https://figshare.com/articles/TGAC_-_Repositive_Preliminary_Survey_Results/3503873 After that preliminary survey we added some additional questions to gain further insights and then opened the survey up to a wider audience. 50 people responded and in the blog post I will discuss our findings from this survey and our final conclusions.
THIS RESOURCE IS NO LONGER IN SERVICE, documented August 29, 2016. Database containing structural annotations for the proteomes of just under 100 organisms. Using data derived from public databases of translated genomic sequences, representatives from the major branches of Life are included: Prokaryota, Eukaryota and Archaea. The annotations stored in the database may be accessed in a number of ways. The help page provides information on how to access the database. 3D-GENOMICS is now part of a larger project, called e-Protein. The project brings together similar databases at three sites: Imperial College London, University College London and the European Bioinformatics Institute. e-Protein's mission statement is "To provide a fully automated distributed pipeline for large-scale structural and functional annotation of all major proteomes via the use of cutting-edge computer GRID technologies." The following databases are incorporated: NRprot, SCOP, ASTRAL, PFAM, Prosite, taxonomy, COG. The following eukaryotic genomes are incorporated: Anopheles gambiae, protein sequences from the mosquito genome; Arabidopsis thaliana, protein sequences from the Arabidopsis genome; Caenorhabditis briggsae, protein sequences from the C.briggsae genome; Caenorhabditis elegans, protein sequences from the worm genome; Ciona intestinalis, protein sequences from the sea squirt genome; Danio rerio, protein sequences from the zebrafish genome; Drosophila melanogaster, protein sequences from the fruitfly genome; Encephalitozoon cuniculi, protein sequences from the E.cuniculi genome; Fugu rubripes, protein sequences from the pufferfish genome; Guillardia theta, protein sequences from the G.theta genome; Homo sapiens, protein sequences from the human genome; Mus musculus, protein sequences from the mouse genome; Neurospora crassa, protein sequences from the N.crassa genome; Oryza sativa, protein sequences from the rice genome; Plasmodium falciparum, protein sequences from the P.falciparum genome; Rattus norvegicus, protein sequences from the rat genome; Saccharomyces cerevisiae, protein sequences from the yeast genome; Schizosaccharomyces pombe, protein sequences from the yeast genome.
Microbiome collection date model is a Named Entity Recognition (NER) model that identifies and annotates the collection date of microbiome samples in texts. This is the final model version used to annotate metagenomics publications in Europe PMC and enrich metagenomics studies in MGnify with collection date metadata from literature. For more information, please refer to the following blogs: http://blog.europepmc.org/2020/11/europe-pmc-publications-metagenomics-annotations.html https://www.ebi.ac.uk/about/news/service-news/enriched-metadata-fields-mgnify-based-text-mining-associated-publications
Data set linked to the paper, "Combining genome-wide studies of breast, prostate, ovarian and endometrial cancers maps cross-cancer susceptibility loci and identifies new genetic associations". Pre-print of the paper is here: https://doi.org/10.1101/2020.06.16.146803. cross_cancer_sum_stats.txt.gz contains summary genome-wide association statistics for susceptibility to single cancers (breast (BR), prostate (PR), ovarian (OV), endometrial (EN), estrogen receptor (ER)-positive breast (POS), ER-negative breast (NEG), and high-grade serous ovarian (HGS) cancers) and from the cross-cancer meta-analysis (main [main] and subtype-focused [sub]). EA in the header refers to the effect allele, OA is the other allele, EAF is the effect allele frequency in the largest of the single cancer data sets (BR), IMPR2 is the imputation quality in the largest of the single cancer data sets (BR), SE is the standard error, PVAL is the P-value, RE2Cs1 is the RE2C statistic mean effect part, RE2Cs2 is the RE2C statistic heterogeneity part, RE2Cp* is the RE2C* P-value. More on RE2Cp* can be found here: http://software.buhmhan.com/RE2C/index.php?mid=contact&act=dispBoardWrite and in https://academic.oup.com/bioinformatics/article/33/14/i379/3953957 SNP names in cross_cancer_sum_stats.txt.gz include the chromosome and build 37 position. main_tetrachoric_corr_matrix.txt and subtype_tetrachoric_corr_matrix.txt provide the tetrachoric correlation matrices used in the main and subtype-focused meta-analyses. These were also used to specify the cryptic.cor argument of the exh.abf function of MetABF. More on MetABF can be found here: https://github.com/trochet/metabf and in https://onlinelibrary.wiley.com/doi/abs/10.1002/gepi.22202 prior_sigmas_for_metabf.txt contains the values used to specify the prior.sigma argument of the exh.abf function in MetABF. 
The breast cancer data used are described in PMID 29059683 and can be downloaded from http://bcac.ccge.medschl.cam.ac.uk/bcacdata/oncoarray/oncoarray-and-combined-summary-result/gwas-summary-results-breast-cancer-risk-2017/ (this link also includes acknowledgements). The prostate cancer data are described in PMID 29892016 and can be downloaded from http://practical.icr.ac.uk/blog/?page_id=8164 (this link also includes acknowledgements). The ovarian cancer data used are described in PMID 28346442 and can be downloaded from https://www.ebi.ac.uk/gwas/studies/GCST004415. The endometrial cancer data are described in PMID 30093612 and can be downloaded from https://www.ebi.ac.uk/gwas/studies/GCST006464. These links point to the same data that form the basis of the cross_cancer_sum_stats.txt.gz file. The sample size and precision of the data presented should preclude identification of any individual study participant. However, in downloading these data, you undertake not to attempt to identify individual study participants and not to re-post these data to a third-party website. Please cite the PMIDs highlighted above along with the appropriate acknowledgements if you use the cross_cancer_sum_stats.txt.gz file. If you have any questions about this repository, please email Siddhartha Kar at siddhartha dot kar at bristol dot ac dot uk
Microbiome site model is a Named Entity Recognition (NER) model that identifies and annotates the site of microbiome samples in texts. This is the final model version used to annotate metagenomics publications in Europe PMC and enrich metagenomics studies in MGnify with site metadata from literature. For more information, please refer to the following blogs: http://blog.europepmc.org/2020/11/europe-pmc-publications-metagenomics-annotations.html https://www.ebi.ac.uk/about/news/service-news/enriched-metadata-fields-mgnify-based-text-mining-associated-publications
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This datapack is associated with the blog post at https://merenlab.org/2021/10/20/targeted-binning-nif-mag/ It contains the data necessary to run the commands described in the blog post.
Body-site model is a Named Entity Recognition (NER) model that identifies and annotates the body-site of microbiome samples in texts. This is the final model version used to annotate metagenomics publications in Europe PMC and enrich metagenomics studies in MGnify with body-site metadata from literature. For more information, please refer to the following blogs: http://blog.europepmc.org/2020/11/europe-pmc-publications-metagenomics-annotations.html https://www.ebi.ac.uk/about/news/service-news/enriched-metadata-fields-mgnify-based-text-mining-associated-publications
AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. AAindex consists of three sections now: AAindex1 for the amino acid index of 20 numerical values, AAindex2 for the amino acid mutation matrix and AAindex3 for the statistical protein contact potentials. All data are derived from published literature. An amino acid index is a set of 20 numerical values representing any of the different physicochemical and biological properties of amino acids. The AAindex1 section of the Amino Acid Index Database is a collection of published indices together with the result of cluster analysis using the correlation coefficient as the distance between two indices. This section currently contains 544 indices. Another important feature of amino acids that can be represented numerically is the similarity between amino acids. Thus, a similarity matrix, also called a mutation matrix, is a set of 210 numerical values, 20 diagonal and 20x19/2 off-diagonal elements, used for sequence alignments and similarity searches. The AAindex2 section of the Amino Acid Index Database is a collection of published amino acid mutation matrices together with the result of cluster analysis. This section currently contains 94 matrices. In the release 9.0, we added a collection of published protein pairwise contact potentials to AAindex as AAindex3. This section currently contains 47 contact potential matrices. Sponsors: This work was supported by grants and resources from the Ministry of Education, Culture, Sports, Science and Technology, and the Japan Science and Technology Agency, and the Bioinformatics Center, Institute for Chemical Research, Kyoto University and the Super Computer System, Human Genome Center, Institute of Medical Science, University of Tokyo.
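The count of 210 follows from storing only one triangle of a symmetric 20 x 20 matrix: 20 diagonal entries plus 20 x 19 / 2 = 190 off-diagonal entries. The sketch below shows one way such a matrix could be packed and queried; the residue ordering and the function names are illustrative assumptions, not the AAindex file format.

```python
# Assumed ordering of the 20 standard residues (illustrative, not taken from AAindex).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def triangle_size(n):
    """Number of stored values for a symmetric n x n matrix, diagonal included."""
    return n + n * (n - 1) // 2


def triangle_index(i, j):
    """Position of element (i, j) in a row-wise lower-triangular packing."""
    if i < j:
        i, j = j, i  # symmetry: (i, j) and (j, i) share one stored value
    return i * (i + 1) // 2 + j


def score(packed, a, b):
    """Look up the value for residues a and b in a packed list of 210 numbers."""
    return packed[triangle_index(AMINO_ACIDS.index(a), AMINO_ACIDS.index(b))]
```

With this packing, `score(packed, "A", "R")` and `score(packed, "R", "A")` read the same stored element, which is why only 210 of the 400 cells need to be kept.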
Microbiome state model is a Named Entity Recognition (NER) model that identifies and annotates the state of microbiome environment or host in texts. This is the final model version used to annotate metagenomics publications in Europe PMC and enrich metagenomics studies in MGnify with state metadata from literature. For more information, please refer to the following blogs: http://blog.europepmc.org/2020/11/europe-pmc-publications-metagenomics-annotations.html https://www.ebi.ac.uk/about/news/service-news/enriched-metadata-fields-mgnify-based-text-mining-associated-publications
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
It would have been really useful for Inspector Javert, in his search for Jean Valjean, to have cross-references between the identities of all of the people he met in Montreuil-sur-Mer and the prisoner database (all he needed was 24601!). Since this dataset has a long name, you can feel free to abbreviate it as IJXD.
We have the same problem in bioinformatics, so this is a database of cross-references extracted from OBO Foundry and other sources by PyOBO. It is a gzipped five-column TSV file that has source namespace, source identifier, target namespace, target identifier, and provenance. Each has been normalized so cross-references from different sources can be integrated and traversed.
It was generated with the following code in the shell:
pip install pyobo
pyobo obo xrefs
More information is available in this blog post: https://cthoyt.com/2020/04/19/inspector-javerts-xref-database.html.
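As a rough illustration of how the five-column file described above could be consumed, the sketch below reads a gzipped TSV and builds a lookup from each source entity to its cross-referenced targets. The presence of a header row and the names `load_xrefs` and `targets_in` are assumptions made for illustration; they are not part of PyOBO or the dataset itself.

```python
import csv
import gzip
from collections import defaultdict


def load_xrefs(path, has_header=True):
    """Index a gzipped five-column TSV by (source namespace, source identifier)."""
    index = defaultdict(list)
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if has_header:
            next(reader, None)  # assumption: the first row holds column names
        for row in reader:
            if len(row) != 5:
                continue  # skip malformed lines rather than guessing their layout
            source_ns, source_id, target_ns, target_id, provenance = row
            index[(source_ns, source_id)].append((target_ns, target_id, provenance))
    return index


def targets_in(index, source_ns, source_id, target_ns):
    """Return identifiers in target_ns that the given source entity maps to."""
    hits = index.get((source_ns, source_id), [])
    return [tid for ns, tid, _ in hits if ns == target_ns]
```

Keying the index on the (namespace, identifier) pair keeps identifiers from different sources apart, which matters when the same local identifier appears in more than one namespace.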
Trends in biotechnology Publication fee - ResearchHelpDesk - Trends in Biotechnology publishes reviews and perspectives on the applied biological sciences: useful science applied to, derived from, or inspired by living systems. The major themes that TIBTECH is interested in include: Bioprocessing (biochemical engineering, applied enzymology, industrial biotechnology, biofuels, metabolic engineering); Omics (genome editing, single-cell technologies, bioinformatics, synthetic biology); Materials and devices (bionanotechnology, biomaterials, diagnostics/imaging/detection, soft robotics, biosensors/bioelectronics); Therapeutics (biofabrication, stem cells, tissue engineering and regenerative medicine, antibodies and other protein drugs, drug delivery); Agroenvironment (environmental engineering, bioremediation, genetically modified crops, sustainable development). We particularly seek articles that are relevant to more than one of these themes. Additionally, we welcome articles on law and intellectual property, policy and regulation, bioethics, scientific communication, and the economics of biotechnology. Reviews of mechanistic or phenomenological biology are generally not within TIBTECH's scope, although we do consider reviews of technologies developed from basic biology as long as there's an application in mind. TIBTECH has a diverse audience that reflects its intentionally broad scope. Our readers include not only biologists but also engineers, chemists, pharmacologists, computer scientists, and physicians, and they work in academic, clinical, industrial, NGO, and governmental settings. Therefore, we emphasize accessible articles that are easy to read, and we encourage authors to keep in mind that many readers may not be familiar with their field's specific terminology.
For more of TIBTECH editor Matt Pavlovich's take on the journal's aims and scope, read his posts at CrossTalk, the Cell Press blog: "What I talk about when I talk about biotechnology" and "A data-driven map of biotechnology."
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
USRCAT search with one of the OSDD Malaria compounds, OSM-S-35. The SMILES strings were obtained by using chemicalize.org on the relevant OSDD blog post (http://malaria.ourexperiment.org/biological_data/6734/Biological_Activities_of_OSMS106_through_116.html). Two isomers were found for OSM-S-35 and a conformer generated for each with OpenEye's OMEGA toolkit.