License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
Supplemental Data 1-S1. Timeline of important events shaping contemporary bioinformatics and comparative genomics. The timeline is not intended to be fully comprehensive of each of the fields covered or of their respective histories. See the footnotes for key review publications and for sources in addition to those listed in the Reference column. Contributions are color-coded by field: purple = computer science/engineering, blue = legislation/government action, green = biology, orange = economics/markets, pink = academic institutions.
Terms of use: https://whoisdatacenter.com/terms-of-use/
Explore the historical Whois records related to blog-bioinformatics.science (Domain). Get insights into ownership history and changes over time.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
Overview of the NHM Informatics Initiative, organized around the data life cycle.
License: GNU General Public License v3.0, https://www.gnu.org/licenses/gpl-3.0.html
This item contains a test dataset based on Sumatran rhinoceros (Dicerorhinus sumatrensis) whole-genome re-sequencing data, which we publish together with the GenErode pipeline (https://github.com/NBISweden/GenErode; Kutschera et al. 2022). We reduced the dataset in size so that users can familiarize themselves with the pipeline before analyzing their own genome-wide datasets. We extracted scaffold ‘Sc9M7eS_2_HRSCAF_41’ (40,842,778 bp) from the Sumatran rhinoceros genome assembly (Dicerorhinus sumatrensis harrissoni; GenBank accession number GCA_014189135.1) to be used as the reference genome in GenErode. Because some GenErode steps require the reference genome of a closely related species, we additionally provide three scaffolds from the White rhinoceros genome assembly (Ceratotherium simum simum; GenBank accession number GCF_000283155.1), with a combined length of 41,195,616 bp, that are putatively orthologous to Sumatran rhinoceros scaffold ‘Sc9M7eS_2_HRSCAF_41’, along with gene predictions in GTF format. The repository also contains a Sumatran rhinoceros mitochondrial genome (GenBank accession number NC_012684.1) to be used as the reference for the optional mitochondrial mapping step in GenErode. The test dataset contains whole-genome re-sequencing data from three historical and three modern Sumatran rhinoceros samples from the now-extinct Malay Peninsula population (von Seth et al. 2021). These data were subsampled to paired-end reads that mapped to Sumatran rhinoceros scaffold ‘Sc9M7eS_2_HRSCAF_41’, plus a small proportion of randomly selected reads that mapped to the Sumatran rhinoceros mitochondrial genome or elsewhere in the genome. For GERP analyses, we provide scaffolds from the genome assemblies of 30 mammalian outgroup species that had reciprocal BLAST hits to gene predictions from Sumatran rhinoceros scaffold ‘Sc9M7eS_2_HRSCAF_41’.
Further, a phylogeny of the White rhinoceros and the 30 outgroup species including divergence time estimates (in billions of years) from timetree.org is available. Finally, the item contains configuration and metadata files that were used for three separate runs of GenErode to generate the results presented in Kutschera et al. (2022). Bash scripts and a workflow description for the test dataset generation are available in the GenErode GitHub repository (https://github.com/NBISweden/GenErode/docs/extras/test_dataset_generation).
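The scaffold extraction described above can be sketched in plain Python. This is a minimal illustration, assuming an uncompressed multi-FASTA assembly whose record IDs (the first word after `>`) match scaffold names such as ‘Sc9M7eS_2_HRSCAF_41’. The function names `read_fasta`, `extract_scaffold` and `to_fasta` are hypothetical; the actual test-dataset generation is documented in the Bash scripts in the GenErode repository, which may rely on standard tools instead.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs; the ID is the first word of the header."""
    record_id = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            record_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def extract_scaffold(handle: TextIO, scaffold_id: str) -> str:
    """Return the sequence of one named scaffold, or raise KeyError if it is absent."""
    for record_id, sequence in read_fasta(handle):
        if record_id == scaffold_id:
            return sequence
    raise KeyError(scaffold_id)


def to_fasta(record_id: str, sequence: str, width: int = 60) -> str:
    """Format a single record as FASTA text with fixed-width sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + record_id + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real assembly file.
    assembly = io.StringIO(">Sc9M7eS_2_HRSCAF_41 example\nACGTAC\nGTNN\n>other\nTTTT\n")
    seq = extract_scaffold(assembly, "Sc9M7eS_2_HRSCAF_41")
    print(to_fasta("Sc9M7eS_2_HRSCAF_41", seq, width=4), end="")
```

Because records are read one at a time, memory use stays around the size of the largest single record rather than the whole assembly. Note that the header description after the ID is dropped when the record is rewritten.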
References:
Kutschera VE, Kierczak M, van der Valk T, von Seth J, Dussex N, Lord E, et al. GenErode: a bioinformatics pipeline to investigate genome erosion in endangered and extinct species. BMC Bioinformatics. 2022;23:228. https://doi.org/10.1186/s12859-022-04757-0
von Seth J, Dussex N, Díez-Del-Molino D, van der Valk T, Kutschera VE, Kierczak M, et al. Genomic insights into the conservation status of the world’s last remaining Sumatran rhinoceros populations. Nature Communications. 2021;12:2393.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
This item contains the transcripts of interviews collected by Sabina Leonelli as part of the ERC project "The Epistemology of Data-Intensive Science", together with the information sheet provided to interviewees, which gives the context for the project. Further information and related publications are available at www.datastudies.eu. One paper that makes specific use of these interviews was published by Sabina Leonelli in the journal Philosophy of Science in 2018 under the title "Data in Time: Time-Scales of Data Use in the Life Sciences." The transcripts document yeast researchers' attitudes toward data curation and the use of databases in their field. The researchers whose transcripts are included consented to having them made available as Open Data. Other interviewees did not give consent, so their transcripts are held securely by the research team in Exeter.
Terms of use: https://whoisdatacenter.com/terms-of-use/
Uncover ownership history and changes over time by performing a reverse Whois lookup for the company Swiss-Institute-of-Bioinformatics.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
Variation data of the pan-genome of 1,913 allotetraploid cotton accessions
The variome datasets (SNPs, InDels, SVs, CNVs) of 1,913 cotton accessions, together with non-reference genome sequences and annotated genes of the G. hirsutum and G. barbadense pan-genomes:
1. SNP and InDel calls in HapMap format for 1,913 cotton accessions.
2. SVs and CNVs in VCF format for 742 cotton accessions.
3. Non-reference genome sequences and gene annotations of G. hirsutum and G. barbadense accessions.
4. Gene numbers and presence frequencies in the G. hirsutum and G. barbadense pan-genomes.
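As an example of working with the HapMap-format SNP calls, the sketch below computes a per-site minor allele frequency. The description does not document the files' column layout or genotype encoding, so this assumes the common TASSEL-style HapMap layout (11 fixed columns, then one genotype column per accession) with diploid calls written either as two letters ("AG", "NN" for missing) or as single IUPAC letters ("A", "R", "N"). Check the files' header row (which starts with `rs#` and should be skipped) before relying on it; the function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional

# Assumed layout: rs#, alleles, chrom, pos, strand, assembly#, center,
# protLSID, assayLSID, panelLSID, QCcode, then one column per accession.
HAPMAP_FIXED_COLUMNS = 11

# IUPAC ambiguity codes for heterozygous single-letter calls.
IUPAC_HET = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def genotype_alleles(call: str) -> List[str]:
    """Return the two alleles of a diploid call, or [] if missing or unrecognized."""
    call = call.strip().upper()
    if len(call) == 2 and all(base in "ACGT" for base in call):
        return list(call)                 # two-letter encoding, e.g. "AG"
    if len(call) == 1 and call in "ACGT":
        return [call, call]               # one-letter homozygous, e.g. "A"
    if call in IUPAC_HET:
        return list(IUPAC_HET[call])      # one-letter heterozygous, e.g. "R"
    return []                             # "N", "NN", "-", indel codes: treated as missing


def non_major_allele_frequency(line: str) -> Optional[float]:
    """Frequency of all non-major alleles at one site (the MAF for biallelic SNPs).

    Returns None when every accession is missing.
    """
    fields = line.rstrip("\n").split("\t")
    counts = Counter()
    for call in fields[HAPMAP_FIXED_COLUMNS:]:
        counts.update(genotype_alleles(call))
    total = sum(counts.values())
    if total == 0:
        return None
    return (total - max(counts.values())) / total
```

Missing calls are excluded from the denominator, so a site genotyped in only a few of the 1,913 accessions can still get a frequency; in practice you would probably also filter on call rate.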
License: ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/ (license information derived automatically)
This FASTA file is the NCBI Nt (Nucleotide) database (public domain) used for holistic metagenomic screening of ancient DNA data at the Department of Archaeogenetics at the Max Planck Institute for the Science of Human History. We provide here the FASTA file used to construct MALT databases (https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/algorithms-in-bioinformatics/software/malt/), because the MALT databases themselves are generally too large to upload. Please see the individual publications that use the database for the MALT database construction commands.
NCBI does not retain older versions of this database which is why this has been uploaded here. It was downloaded on 2017-10-26 12:39 from: ftp://ftp-trace.ncbi.nih.gov/blast/db/FASTA/nt.gz. The NCBI Nt database is released into the public domain as per https://www.ncbi.nlm.nih.gov/home/about/policies/.
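Because the file is a single large gzip-compressed FASTA, a quick sanity check after downloading (counting records and sequence characters) can be done by streaming it rather than unpacking it to disk. This is a minimal sketch; the function names are hypothetical and the path is whatever you saved the download as.

```python
import gzip
from typing import Iterable, Tuple


def fasta_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Count FASTA records and total sequence characters in an iterable of lines."""
    n_records = 0
    n_bases = 0
    for line in lines:
        if line.startswith(">"):
            n_records += 1
        else:
            n_bases += len(line.strip())
    return n_records, n_bases


def gz_fasta_stats(path: str) -> Tuple[int, int]:
    """Stream a gzip-compressed FASTA (e.g. a local copy of nt.gz) line by line."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return fasta_stats(handle)
```

Usage: `gz_fasta_stats("nt.gz")`. Separating the counting from the file handling keeps `fasta_stats` usable on plain files or in-memory lines as well.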
To introduce students to the concept of molecular diversity, we developed a short, engaging online lesson using basic bioinformatics techniques. Students were introduced to basic bioinformatics while learning about local on-campus species diversity by 1) identifying species from a given sequence (performing a Basic Local Alignment Search Tool [BLAST] analysis) and 2) researching and documenting the natural history of each identified species in a concise write-up. To assess students' perceptions of the lesson, we surveyed them using a Likert scale and asked them to elaborate on the activity in a written reflection. Combined, student responses indicated that 94% of students agreed that the lesson helped them understand DNA barcoding and how it is used to identify species. The majority of students (89.5%) reported that they enjoyed the lesson and provided mainly positive feedback, including “It really opened my eyes to different species on campus by looking at DNA sequences”, “I loved searching information and discovering all this new information from a DNA sequence”, and “the database was fun to navigate and identifying species felt like a cool puzzle.” Our results indicate that this lesson both engaged students and informed them about the use of DNA barcoding as a tool to identify local species biodiversity.
Primary Image: DNA Barcoded Specimens. Crane fly, dragonfly, ant, and spider identified using DNA barcoding.
Privacy policy: https://www.zionmarketresearch.com/privacy-policy
The global Bioinformatics Services market was valued at USD 3.12 billion in 2023 and is expected to grow to around USD 10.87 billion by 2032, a CAGR of roughly 14.86%.
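The compound annual growth rate (CAGR) ties the two market sizes together: CAGR = (V_end / V_start)^(1/n) − 1, where n is the number of compounding years. A minimal sketch, assuming n = 9 for 2023 to 2032; the function names are our own.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate implied by two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding `start_value` at `rate` per year for `years` years."""
    return start_value * (1.0 + rate) ** years


if __name__ == "__main__":
    # Figures quoted above, assuming 9 compounding years (2023 -> 2032).
    print(f"implied CAGR: {cagr(3.12, 10.87, 9):.2%}")
    print(f"projected 2032 size: USD {project(3.12, 0.1486, 9):.2f} billion")
```

Recomputing from the rounded start and end values gives about 14.9% per year, consistent with the quoted rate; the same two functions apply to any market figure reported as start value, end value and CAGR.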
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/ (license information derived automatically)
Spliceosomal introns are gene segments removed ("spliced") from RNA transcripts by large ribonucleoprotein machineries called spliceosomes. In some eukaryotes, a second spliceosome (the minor, or U12-type, spliceosome) processes a tiny minority of introns. Despite its seemingly modest role, minor splicing has persisted for roughly 1.5 billion years of eukaryotic evolution. By identifying and cataloging minor introns in more than 3,000 eukaryotic genomes, we report diverse evolutionary histories, including surprisingly high numbers of minor introns in some fungi and green algae, repeated massive loss, and several general biases in the positional and genic distributions of minor introns. We estimate that ancestral minor intron densities were comparable to those of the most minor intron-rich species, suggesting a trend of long-term stasis. Finally, three findings suggest a major role for neutral processes in minor intron evolution. First, we find highly similar patterns of minor and major intron evolution, in contrast to the predictions of both functionalist and deleterious models. Second, we find that the observed functional biases among minor intron-containing genes are largely explained by these genes' greater ages. Third, we find no association between intron splicing and cell proliferation in a minor intron-rich fungus, suggesting that regulatory roles are lineage-specific and thus cannot offer a general explanation for the persistence of minor splicing. These data constitute the most comprehensive view to date of modern minor introns, their evolutionary history, and the forces shaping minor splicing, and they provide a foundation for future studies of these remarkable genomic elements.
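As a deliberately simplified illustration of what sets minor introns apart, U12-type introns typically begin with a highly conserved 5' splice site, consensus (A/G)TATCCTT, whereas major (U2-type) introns usually begin with a GTRAGT-like motif. The sketch below screens only the intron's 5' end against that consensus with an arbitrary mismatch allowance. It is not the identification method used in the study; large-scale surveys generally score the 5' splice site and branch point with position weight matrices.

```python
# Assumed U12-type 5' splice site consensus at the start of the intron; R = A or G.
U12_5SS_CONSENSUS = "RTATCCTT"
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG"}


def consensus_mismatches(seq: str, consensus: str) -> int:
    """Count positions where `seq` does not match the IUPAC consensus."""
    return sum(1 for base, code in zip(seq, consensus) if base not in IUPAC[code])


def looks_u12_like(intron: str, max_mismatches: int = 1) -> bool:
    """Crude screen: does the intron's 5' end resemble the U12-type consensus?"""
    intron = intron.upper()
    if len(intron) < len(U12_5SS_CONSENSUS):
        return False
    head = intron[: len(U12_5SS_CONSENSUS)]
    return consensus_mismatches(head, U12_5SS_CONSENSUS) <= max_mismatches
```

A typical U2-type start such as GTAAGTAT differs from the U12 consensus at four of eight positions, which is why even this crude screen separates the two classes for canonical examples, though it would misclassify borderline cases.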
Privacy policy: https://www.zionmarketresearch.com/privacy-policy
The Bioinformatics in IVD Testing market was valued at USD 97.51 billion in 2023 and is projected to reach USD 171.91 billion by 2032, a CAGR of 6.44% from 2023 to 2032.
A protein superfamily contains distantly related proteins that have acquired diverse biological functions over a long evolutionary history. Phylogenetic analysis of the early evolution of protein superfamilies is a key challenge because existing phylogenetic methods perform poorly when protein sequences are too divergent to construct an informative multiple sequence alignment. Here, we propose the Graph Splitting (GS) method, which rapidly reconstructs superfamily-scale protein phylogenetic trees using a graph-based approach. Evolutionary simulations showed that the GS method accurately reconstructs phylogenetic trees and is robust to major problems in phylogenetic estimation, such as biased taxon sampling, heterogeneous evolutionary rates, and long-branch attraction when sequences are substantially diverged. Its application to an empirical dataset of the triosephosphate isomerase (TIM)-barrel superfamily suggests rapid evolution of protein-mediated pyrimidine biosynthesis, ...
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
The application of high-throughput, short-read sequencing to degraded DNA has greatly increased the feasibility of generating genomic data from historical museum specimens. While many published studies report successful sequencing of historical specimens, in reality the success and quality of sequence data can be highly variable. To examine predictors of sequencing quality, and methodological approaches to improving data accuracy, we generated and analyzed genomic sequence data from 115 historically collected museum specimens up to 180 years old. The data span both population genomic and phylogenomic scales, including 34 historically collected specimens of four species of Australian rock-wallabies (genus Petrogale) and 92 samples from 79 specimens of Australo-Papuan murine rodents (subfamily Murinae). For historical rodent specimens, where the focus was sampling for phylogenomics, we found that, regardless of specimen age, DNA sequence libraries prepared from toe pad or bone subsamples performed significantly better than those taken from skin (in terms of the proportion of reads on target, the number of loci captured, and data accuracy). In total, 93% of DNA libraries from toe pad or bone subsamples yielded reliable data for phylogenetic inference, compared with 63% of skin subsamples. For skin subsamples, the proportion of reads on target was weakly correlated with collection year. Then, using population genomic data from rock-wallaby skins as a test case, we found a substantial improvement in final data quality when mapping to a high-quality “closest sister” de novo assembly from fresh tissues, compared with mapping to a sample-specific historical de novo assembly. The choice of mapping approach also affected final estimates of the number of segregating sites and Watterson's θ, both important parameters for population genomic inference.
The incorporation of accurate and reliable sequence data from historical specimens has important outcomes for evolutionary studies at both population and phylogenomic scales. By assessing the outcomes of different approaches to specimen subsampling, library preparation and bioinformatic processing, our results provide a framework for increasing sequencing success for irreplaceable historical specimens.
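Watterson's θ is a simple function of the number of segregating sites S and the number of sampled sequences n: θ_W = S / a_n, with a_n = Σ_{i=1}^{n-1} 1/i. Because θ_W is directly proportional to S, any shift in S caused by the choice of mapping reference carries over proportionally into θ_W. A minimal sketch; n counts sequences (haplotypes), so for diploid individuals it would typically be twice the number of individuals, and the function names are our own.

```python
def harmonic_a(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def wattersons_theta(segregating_sites: int, n_sequences: int, sites_surveyed: int = 0) -> float:
    """Watterson's estimator S / a_n; divided by sites_surveyed when given (per-site value)."""
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    theta = segregating_sites / harmonic_a(n_sequences)
    return theta / sites_surveyed if sites_surveyed > 0 else theta
```

For example, with n = 4 sequences a_4 = 1 + 1/2 + 1/3 = 11/6, so S = 11 segregating sites gives θ_W = 6, or 0.006 per site over 1,000 surveyed sites.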
An open-source, web-based system and database that provides access to historical records and trends in the Gene Ontology (GO) and GO annotations (GOA). It is used to monitor changes in the Gene Ontology and their impact on genomic data analysis.
Predicting the functions of proteins and alternatively spliced isoforms encoded in a genome is one of the important applications of bioinformatics in the post-genome era. Because it is impractical to characterize every protein encoded in a genome experimentally through biochemical studies, bioinformatics methods provide powerful tools for function annotation and prediction. These methods also help narrow the growing sequence-to-function gap. Phylogenetic profiling is a bioinformatics approach that identifies the distribution of a trait across species and can be used to infer the evolutionary history of proteins encoded in genomes. Here we propose an improved phylogenetic profile-based method that considers the co-evolution of the reference genome to derive the basic similarity measure, and the background phylogeny of target genomes for profile generation and for assigning weights to target genomes. The ordering of genomes and the runs of consecutive matches between the proteins were used to...
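The abstract is truncated before the full method is described, so the sketch below is only a generic illustration of two ingredients it names: comparing binary presence/absence profiles across target genomes with optional per-genome weights, and measuring the longest run of consecutive matches under a fixed genome ordering. It is not the authors' similarity measure; the weights and the ordering are inputs you would have to supply (for example, derived from a background phylogeny), and the function names are hypothetical.

```python
from typing import Optional, Sequence


def profile_agreement(a: Sequence[int], b: Sequence[int],
                      weights: Optional[Sequence[float]] = None) -> float:
    """Weighted fraction of target genomes in which two 0/1 profiles agree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    if weights is None:
        weights = [1.0] * len(a)
    total = sum(weights)
    agree = sum(w for x, y, w in zip(a, b, weights) if x == y)
    return agree / total


def longest_match_run(a: Sequence[int], b: Sequence[int]) -> int:
    """Longest run of consecutive agreeing positions, given a fixed genome ordering."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best
```

Here "agreement" counts shared absences as matches; some profiling schemes count only co-presence, which would be a one-line change in the comprehension.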
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/ (license information derived automatically)
Bioinformatic pipeline
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
Cushion-forming plant species are found in alpine and polar environments around the world. They modify the microclimate, thereby facilitating other plant species. Just as shrubs are an effective means of studying facilitation in arid and semi-arid environments, we explore the potential of cushion plant species to expand the generality of research on this contemporary ecological interaction. A systematic review was conducted to determine the number of publications and the citation frequency on relevant ecological topics, using the shrub literature as a baseline to assess the relative importance of cushions as a focal point for future ecological research. Although there are forty times more shrub articles, the mean number of citations per paper is comparable between the cushion and shrub literature. Furthermore, the scope of ecological research topics studied using cushions is broad, including facilitation, competition, environmental gradients, life history, genetics, reproduction, community, ecosystem and evolution. The preliminary ecological evidence to date also strongly suggests that cushion plants can be keystone species in their ecosystems. Hence, net interactions, including facilitation, and patterns of diversity can be successfully examined using cushion plants, and this is particularly timely given the expected effects of climate change in these regions.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
Overview: This dataset contains comprehensive metadata from single-cell gene expression studies, providing researchers with structured information about cellular phenotypes, experimental conditions, and sample characteristics. The data is particularly valuable for bioinformatics research, machine learning applications in genomics, and comparative studies across different cell types and conditions.
Dataset Description: The dataset comprises metadata associated with single-cell RNA sequencing (scRNA-seq) experiments, including:
- Cell Type Information: classification of different cell types and subtypes
- Experimental Metadata: details about experimental conditions, protocols, and methodologies
- Sample Characteristics: information about biological samples, including tissue origin, developmental stages, and treatment conditions
- Quality Metrics: data quality indicators and filtering parameters
- Annotation Details: standardized cell type annotations and biological classifications
Data Source and Licensing: This dataset is derived from publicly available single-cell gene expression data, potentially sourced from:
- CELLxGENE Data Portal (https://cellxgene.cziscience.com/)
- Gene Expression Omnibus (GEO)
- European Bioinformatics Institute (EBI)
- Other public genomics repositories
License: Creative Commons CC BY 4.0
- ✅ Commercial use allowed
- ✅ Modification allowed
- ✅ Distribution allowed
- ✅ Private use allowed
- ❗ Attribution required
Research Applications:
- Cell Type Discovery: identify novel cell types and subtypes
- Comparative Genomics: study cellular differences across conditions, tissues, or species
- Disease Research: investigate cellular changes in disease states
- Developmental Biology: analyze cellular differentiation and development patterns
Machine Learning Applications:
- Classification Tasks: predict cell types from gene expression data
- Clustering Analysis: discover cellular subpopulations and states
- Dimensionality Reduction: apply PCA, t-SNE, or UMAP for visualization
- Biomarker Discovery: identify genes characteristic of specific cell types
Educational Use:
- Teaching bioinformatics and computational biology concepts
- Demonstrating single-cell analysis workflows
- Training in data preprocessing and quality control
Data Quality and Preprocessing:
- Quality Control: metadata have been curated and standardized
- Missing Values: represented as 'NA' or empty cells
- Standardization: cell type annotations follow established ontologies (e.g., Cell Ontology)
- Validation: data have been cross-referenced with original publications
Usage Guidelines (Getting Started):
1. Load the metadata files using pandas or your preferred data analysis tool.
2. Explore the cell type distributions and experimental conditions.
3. Filter data based on quality metrics as needed.
4. Join with the corresponding gene expression data for comprehensive analysis.
Best Practices:
- Always cite original data sources and publications.
- Consider batch effects when combining data from different experiments.
- Validate findings with independent datasets when possible.
- Follow established bioinformatics workflows for single-cell analysis.
Citation and Acknowledgments: If you use this dataset in your research, please cite it as: Kazi Aishikuzzaman (2024). Cell Gene Expression Metadata. Kaggle. https://www.kaggle.com/datasets/kaziaishikuzzaman/cell-gene-expression-metadata
File Structure:
dataset/
├── metadata_summary.csv          # Main metadata file
├── cell_type_annotations.csv     # Detailed cell type information
├── experimental_conditions.csv   # Experiment-specific metadata
├── quality_metrics.csv           # Data quality indicators
└── README.txt                    # Detailed file descriptions
Technical Specifications:
- File Encoding: UTF-8
- Separator: comma-separated values (CSV)
- Missing Values: represented as 'NA' or empty cells
- Data Types: mixed (categorical, numerical, text)
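Following the usage guidelines and technical specifications above, the metadata tables can be loaded with 'NA' and empty cells mapped to a missing value and then joined on a shared identifier. This is a minimal standard-library sketch; the join column (e.g. a hypothetical `cell_id`) is an assumption, since the description does not name the key shared by the files, and the function names are our own.

```python
import csv
from typing import Dict, Iterable, List, Optional

MISSING_TOKENS = {"", "NA"}  # per the Technical Specifications above

Row = Dict[str, Optional[str]]


def _clean(value):
    """Map 'NA' and empty strings to None; leave other values untouched."""
    if isinstance(value, str) and value.strip() in MISSING_TOKENS:
        return None
    return value


def read_metadata(lines: Iterable[str]) -> List[Row]:
    """Parse CSV text (an open file or a list of lines) into dicts, with None for missing values."""
    return [{key: _clean(value) for key, value in row.items()} for row in csv.DictReader(lines)]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep rows whose `key` appears in both tables, merging their columns."""
    index = {row[key]: row for row in right if row.get(key) is not None}
    return [{**row, **index[row[key]]} for row in left if row.get(key) in index]
```

In practice you would pass an open handle, e.g. `open("metadata_summary.csv", newline="", encoding="utf-8")`. If the right-hand table has duplicate keys, the last row wins. With pandas, `read_csv` already treats 'NA' and empty fields as missing by default, and `DataFrame.merge(other, on=key, how="inner")` performs the same join.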
Contact and Support: For questions about this dataset:
- Kaggle Profile: @kaziaishikuzzaman
- Dataset Issues: use Kaggle's discussion section
- Collaboration: open to research collaborations and improvements
Version History:
- v1.0: initial release with comprehensive metadata collection
- Future versions: updates and additional annotations as available
Related Datasets: Consider exploring these complementary datasets:
- Single-cell gene expression data (companion to this metadata)
- Cell atlas datasets from major consortia
- Disease-specific single-cell studies
- Multi-omics datasets with matching cell types
Keywords: single-cell, RNA-seq, genomics, cell types, metadata, bioinformatics, machine learning, computational biology
Category: Biology > Genomics
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
OHEJP Project: BeOne
Salmonella enterica serovar Dublin (S. Dublin) is a host-adapted serovar that causes enteritis and/or systemic disease in cattle. Because the serovar is not host-restricted, it can also infect other species, including humans, causing severe disease and a higher mortality rate than other non-typhoidal serovars. Given that human illnesses are primarily caused by contaminated milk, milk products, and beef, the genetic relationship between S. Dublin strains from livestock and from food should be analyzed. Whole-genome sequencing (WGS) was performed on 144 S. Dublin strains from cattle and 30 strains from food. Multilocus sequence typing (MLST) showed that the majority of livestock and food isolates belonged to sequence type ST-10. Core-genome single-nucleotide polymorphism (cgSNP) typing and core-genome MLST (cgMLST) revealed that 14 of the 30 food-derived strains were clonally related to at least one strain from cattle. The remaining 16 food-borne strains fit, without outliers, into the genomic structure of S. Dublin in Germany. WGS proved to be an effective method not only for studying the epidemiology of Salmonella strains but also for detecting clonal relationships between organisms isolated at different stages of production. This study revealed a strong genetic link between S. Dublin strains from cattle and food, and thus their potential to cause human infections. S. Dublin strains from both origins carry a highly similar set of virulence factors, underscoring their ability to cause severe clinical symptoms in animals as well as humans and emphasizing the importance of effective S. Dublin management within a farm-to-fork strategy.
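The clonality result above (14 of 30 food strains related to at least one cattle strain) amounts to comparing pairwise allele distances against a cut-off. A minimal sketch under stated assumptions: each cgMLST profile is a list of allele numbers over the same loci, loci missing in either profile are skipped pairwise, and the threshold is a parameter. The cut-off actually applied in the study is not given in this description, so the value used in any example here is arbitrary, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Profile = Sequence[Optional[int]]  # allele number per core-genome locus; None = not called


def allele_distance(a: Profile, b: Profile) -> int:
    """Number of loci with different allele calls, skipping loci missing in either profile."""
    if len(a) != len(b):
        raise ValueError("profiles must use the same cgMLST scheme")
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def linked_strains(query: Profile, references: Dict[str, Profile], threshold: int) -> List[str]:
    """Names of reference strains within `threshold` allele differences of the query."""
    return [name for name, profile in references.items()
            if allele_distance(query, profile) <= threshold]


def count_clonally_related(food: Dict[str, Profile], cattle: Dict[str, Profile],
                           threshold: int) -> int:
    """How many food strains are linked to at least one cattle strain."""
    return sum(1 for profile in food.values() if linked_strains(profile, cattle, threshold))
```

Skipping missing loci pairwise avoids inflating distances for incompletely assembled genomes, at the cost of comparing different numbers of loci for different pairs. A cgSNP comparison would follow the same pattern with SNP differences instead of allele differences.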