RNA expression analysis was performed on corpus luteum tissue at five time points after prostaglandin F2 alpha treatment of midcycle cows using an Affymetrix Bovine Gene v1 Array. The normalized linear microarray data were uploaded to the NCBI GEO repository (GSE94069). Subsequent statistical analysis identified differentially expressed transcripts as those with at least a ±1.5-fold change from saline control and P ≤ 0.05. Gene ontology of differentially expressed transcripts was annotated by DAVID and Panther. Physiological characteristics of the study animals are presented in a figure. Bioinformatic analysis by Ingenuity Pathway Analysis was curated, compiled, and presented in tables. A dataset comparison with similar microarray analyses was performed, and bioinformatics analyses by Ingenuity Pathway Analysis, DAVID, Panther, and STRING of the differentially expressed genes from each dataset, as well as of the differentially expressed genes common to all three datasets, were curated, compiled, and presented in tables. Finally, a table comparing four bioinformatics tools' predictions of functions associated with genes common to all three datasets is presented. These data have been further analyzed and interpreted in the companion article "Early transcriptome responses of the bovine mid-cycle corpus luteum to prostaglandin F2 alpha includes cytokine signaling". Resources in this dataset: Resource Title: Supporting information as Excel spreadsheets and tables. File Name: Web Page, url: http://www.sciencedirect.com/science/article/pii/S2352340917304031?via=ihub#s0070
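The selection rule described in this entry (a change of at least 1.5-fold in either direction relative to saline control, with P ≤ 0.05) can be sketched as a small filter. This is a minimal illustration and not the authors' pipeline: the record layout, the field names, and the choice to compute fold change as a ratio of group means on the linear scale are all assumptions made for the example.

```python
from typing import Iterable, List, NamedTuple


class Transcript(NamedTuple):
    # Hypothetical record layout; the published tables may use other columns.
    transcript_id: str
    treated_mean: float  # linear-scale mean signal, treated group (assumed)
    control_mean: float  # linear-scale mean signal, saline control (assumed)
    p_value: float


def signed_fold_change(treated: float, control: float) -> float:
    """Signed fold change: +2.0 means 2-fold up, -2.0 means 2-fold down."""
    ratio = treated / control
    return ratio if ratio >= 1.0 else -1.0 / ratio


def differentially_expressed(
    rows: Iterable[Transcript], min_fold: float = 1.5, max_p: float = 0.05
) -> List[Transcript]:
    """Keep transcripts with |fold change| >= min_fold and P <= max_p."""
    return [
        row
        for row in rows
        if abs(signed_fold_change(row.treated_mean, row.control_mean)) >= min_fold
        and row.p_value <= max_p
    ]


if __name__ == "__main__":
    # Made-up values, for illustration only.
    example = [
        Transcript("up_example", 150.0, 100.0, 0.01),
        Transcript("down_example", 50.0, 100.0, 0.04),
        Transcript("small_change", 120.0, 100.0, 0.001),
        Transcript("not_significant", 300.0, 100.0, 0.20),
    ]
    for row in differentially_expressed(example):
        print(row.transcript_id)
```

Using a signed fold change keeps up- and down-regulated transcripts symmetric around the control, so a single absolute-value threshold covers both directions.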
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Framing the investigation of diverse cancers as a machine learning problem has recently shown significant potential in multi-omics analysis and cancer research. These successful machine learning models are powered by high-quality training datasets with sufficient data volume and adequate preprocessing. However, while several public data portals exist, including The Cancer Genome Atlas (TCGA) multi-omics initiative and open databases such as LinkedOmics, these databases cannot be used off the shelf with existing machine learning models. We propose MLOmics, an open cancer multi-omics database that aims to better serve the development and evaluation of bioinformatics and machine learning models. MLOmics contains 8,314 patient samples covering all 32 cancer types with four omics types, stratified features, and extensive baselines. Complementary support for downstream analysis and bio-knowledge linking is also included to support interdisciplinary analysis.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of similar but different presentations I've made aimed at introducing bioinformatics to bench biologists.
Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a prerequisite for publication in the journal. Specific areas of interest include, but are not limited to: structure and function of proteins, nucleic acids and other macromolecules; structure and function of multi-component complexes; protein folding, processing and degradation; enzymology; computational and structural studies of plant systems; microbial informatics; genomics; proteomics; metabolomics; algorithms and hypothesis in bioinformatics; mathematical and theoretical biology; computational chemistry and drug discovery; microscopy and molecular imaging; nanotechnology; systems and synthetic biology. The journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence, and enables the rapid publication of papers under the following categories: research articles, review articles, mini reviews, highlights, communications, software/web server articles, methods articles, database articles, book reviews, and meeting reviews.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Modern research is increasingly data-driven and reliant on bioinformatics software. Publication is a common way of introducing new software, but not all bioinformatics tools get published. Given that there are competing tools, it is important not merely to find the appropriate software, but also to have a metric for judging its usefulness. A journal's impact factor has been shown to be a poor predictor of software popularity; consequently, focusing on publications in high-impact journals limits users' choices in finding useful bioinformatics tools. Free and open source software repositories on popular code sharing platforms such as GitHub provide another venue for following the latest bioinformatics trends. The open source component of GitHub allows users to bookmark and copy the repositories that are most useful to them. This Perspective aims to demonstrate the utility of GitHub "stars," "watchers," and "forks" (GitHub statistics) as a measure of software impact. We compiled lists of impactful bioinformatics software and analyzed commonly used impact metrics and GitHub statistics of 50 genomics-oriented bioinformatics tools. We present examples of community-selected best bioinformatics resources and show that GitHub statistics are distinct from the journal impact factor (JIF), citation counts, and alternative metrics (Altmetrics, CiteScore) in capturing the level of community attention. We suggest the use of GitHub statistics as an unbiased measure of the usability of bioinformatics software, complementing the traditional impact metrics.
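As a rough illustration of how the three GitHub statistics named in this entry could be collected, the sketch below reads them from the public GitHub REST API repository endpoint. It is not the authors' method. The helper names are hypothetical, and the mapping of "watchers" to the `subscribers_count` field is an assumption based on how the API reports users watching a repository (the `watchers_count` field mirrors the star count).

```python
import json
import urllib.request
from typing import Dict

# Public GitHub REST API endpoint for repository metadata.
GITHUB_REPO_URL = "https://api.github.com/repos/{owner}/{repo}"


def extract_github_stats(repo_json: dict) -> Dict[str, int]:
    """Pick stars, watchers, and forks out of a repository JSON document."""
    return {
        "stars": repo_json["stargazers_count"],
        # Assumption: "watchers" means users subscribed to notifications.
        "watchers": repo_json["subscribers_count"],
        "forks": repo_json["forks_count"],
    }


def fetch_github_stats(owner: str, repo: str) -> Dict[str, int]:
    """Fetch statistics for one repository (unauthenticated, rate-limited)."""
    url = GITHUB_REPO_URL.format(owner=owner, repo=repo)
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return extract_github_stats(json.load(response))


if __name__ == "__main__":
    # Offline demo with made-up numbers; no network access is needed.
    sample = {
        "stargazers_count": 120,
        "subscribers_count": 15,
        "forks_count": 40,
        "watchers_count": 120,
    }
    print(extract_github_stats(sample))
```

Separating the parsing step from the network call makes it possible to check the field mapping without hitting the API.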
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains all of the source code used in the analysis described in the paper "Molecular Basis of Human Complex Diseases." The dataset contains code for the three main results mentioned in the article, packaged in three separate files and numbered in the same order as the article describes them. The first section of the code summarizes the disease-related regulatory analysis process. The second section contains code for identifying all cohort- and family-related variants. The third section of the code describes the entire process of analyzing single-cell data.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This collection contains an example MINUTE-ChIP dataset on which to run the minute pipeline, provided as supporting material to help users understand the results of a MINUTE-ChIP experiment, from raw data to a primary analysis that yields the relevant files for downstream analysis along with summarized QC indicators. The example primary non-demultiplexed FASTQ files provided here were used with the minute pipeline to generate GSM5493452-GSM5493463 (H3K27me3) and GSM5823907-GSM5823918 (Input), deposited together on GEO under series GSE181241. For more information about MINUTE-ChIP, see the publication relevant to this dataset: Kumar, Banushree, et al. "Polycomb repressive complex 2 shields naïve human pluripotent cells from trophectoderm differentiation." Nature Cell Biology 24.6 (2022): 845-857. For more information about the minute pipeline, there are a public bioRxiv preprint, a GitHub repository, and official documentation.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets, conda environments and software for the course "Population Genomics" of Prof Kasper Munch. This course material is maintained by the health data science sandbox. This webpage shows the latest version of the course material.
The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.
Description
After the course, the participants will have detailed knowledge of the methods and applications required to perform a typical population genomic study.
The participants must at the end of the course be able to:
The course introduces key concepts in population genomics, from the generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on the generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background. Topics here include analysis of demography, population structure, recombination and selection. The last part of the course focuses on applications of population genetic data sets for association studies in relation to human health.
Curriculum
The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.
Course plan
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This record contains the data (references, reads, assemblies) used in the analyses for the Trycycler paper.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Index file corresponding to BAM file http://figshare.com/articles/Example_BAM_file/1460736
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Allen Human Brain Atlas provides an anatomically detailed view of gene expression in the brain. This full transcriptomic dataset contains over 200 million expression values. To facilitate use of this resource, we distilled the data into a matrix of 20,737 genes by 68 cortical regions that represent the automatically segmented FreeSurfer regions (Desikan-Killiany parcellation). In this data report, we describe the Allen-FreeSurfer mapping and sampling information, and provide regional values of gene expression. The resulting cortical transcriptome facilitates interpretation of human neuroimaging findings by providing a molecular context. Read the full data report at: http://journal.frontiersin.org/article/10.3389/fnins.2015.00323/full
AllenHBA_to_DKRegion_Map.xls - Excel file containing the mapping from Allen Brain Atlas samples to the FreeSurfer cortical regions. Each sample is referenced with a unique identifier formed by combining the donor ID with the x, y and z MNI152 coordinates ("10021_5.1_27.1_28.6" for example).
DKRegionStatistics.tsv - Tab separated file characterizing the FreeSurfer cortical regions. This file lists how many donors contribute to each region, Allen Brain Atlas samples per region and alternative identifiers.
AllenHBA_DK_ExpessionMatrix.tsv - Tab separated file containing correlation to the median values across the donors (left hemisphere only, column named "Average donor correlation to median") and gene expression values across the 68 FreeSurfer cortical regions (columns). This file can be opened with read.table in R or as a tab separated file in a spreadsheet program.
ConsistentGOGroups.tsv - Gene Ontology enrichment analysis results for the consistency measure.
InconsistentGOGroups.tsv - Gene Ontology enrichment analysis results for the consistency measure (with high ranked groups showing more inconsistency).
CreateExpressionLUT.r - This R script provides the method to load and convert expression values for a specific gene into a color lookup table that can be used to visualize the averaged expression values in FreeSurfer.
CreateColorBar.r - This R script will create a color bar to complement the TKSurfer generated images.
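For readers not working in R, the tab separated expression matrix described in this entry could also be read with a short script. The sketch below is a hedged illustration only. It assumes that the first column holds the gene identifier, that the header row labels every column, and that all other cells are numeric. None of this is confirmed by the description apart from the "Average donor correlation to median" column name and the tab delimiter. The function name and sample values are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO

# Column name quoted in the dataset description.
CORRELATION_COLUMN = "Average donor correlation to median"


def load_expression_matrix(stream: TextIO) -> Dict[str, Dict[str, float]]:
    """Read a gene-by-region matrix from a tab separated text stream.

    Assumes the first column is the gene identifier and the header row
    names every column; the correlation column is skipped.
    """
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)
    column_names = header[1:]
    matrix: Dict[str, Dict[str, float]] = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        gene = row[0]
        matrix[gene] = {
            name: float(cell)
            for name, cell in zip(column_names, row[1:])
            if name != CORRELATION_COLUMN
        }
    return matrix


if __name__ == "__main__":
    # Tiny made-up sample with two hypothetical region columns.
    sample = io.StringIO(
        "gene\tAverage donor correlation to median\tregion_a\tregion_b\n"
        "GENE1\t0.80\t1.5\t-0.5\n"
        "GENE2\t0.10\t0.0\t2.0\n"
    )
    print(load_expression_matrix(sample))
```

To read the real file, the function could be called as `load_expression_matrix(open(path, newline=""))` with `path` pointing at the downloaded matrix, after checking that the header matches the assumptions above.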
http://rightsstatements.org/vocab/InC/1.0/
This work presents a new consensus clustering method for gene expression microarray data based on a genetic algorithm. Using two datasets - DA and DB - as input, the genetic algorithm examines putative partitions for the samples in DA, selecting biomarkers that support such partitions. The biomarkers are then used to build a classifier, which is applied to DB to determine the classes of its samples. The genetic algorithm is guided by an objective function that takes into account the accuracy of classification in both datasets, the number of biomarkers that support the partition, and the distribution of the samples across the classes for each dataset. To illustrate the method, two whole-genome breast cancer instances from different sources were used. In this application, the results indicate that the method could be used to find unknown subtypes of diseases supported by biomarkers presenting similar gene expression profiles across platforms. Moreover, even though this initial study was restricted to two datasets and two classes, the method can be easily extended to consider both more datasets and more classes. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The BAltic Gene Set gene catalogue v1.1 encompasses 66,530,673 genes. The 66 million genes are based on metagenomic data from Alneberg et al. (2020) from 124 seawater samples that span the salinity and oxygen gradients of the Baltic Sea and capture seasonal dynamics at two locations. To obtain the gene catalogue, we used a mix-assembly approach described in Delgado et al. (2022). The gene catalogue has been functionally and taxonomically annotated using the Mix-assembly Gene Catalog pipeline (https://github.com/EnvGen/mix_assembly_pipeline). The taxonomy annotation was performed using Mmseqs21 and CAT3. Here you find representative mix-assembly gene and protein sequences, and different types of annotations for the proteins. Also included are contigs for the co-assembly (see Delgado et al. 2022), gene and protein sequences from each individual assembly and the co-assembly, and a table containing the genes in each of the clusters. See README for details. When using the BAGSv1.1 gene catalogue, please cite: 1. Delgado LF, Andersson AF. Evaluating metagenomic assembly approaches for biome-specific gene catalogues. Microbiome 10, 72 (2022). 2. Alneberg J, Bennke C, Beier S, Bunse C, Quince C, Ininbergs K, Riemann L, Ekman M, Jürgens K, Labrenz M, Pinhassi J, Andersson AF. Ecosystem-wide metagenomic binning enables prediction of ecological niches from genomes. Commun Biol 3, 119 (2020).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Health sciences research is increasingly focusing on big data applications, such as genomic technologies and precision medicine, to address key issues in human health. These approaches rely on biological data repositories and bioinformatic analyses, both of which are growing rapidly in size and scope. Libraries play a key role in supporting researchers in navigating these and other information resources.
Methods: With the goal of supporting bioinformatics research in the health sciences, the University of Arizona Health Sciences Library established a Bioinformation program. To shape the support provided by the library, I developed and administered a needs assessment survey to the University of Arizona Health Sciences campus in Tucson, Arizona. The survey was designed to identify the training topics of interest to health sciences researchers and the preferred modes of training.
Results: Survey respondents expressed an interest in a broad array of potential training topics, including "traditional" information seeking as well as interest in analytical training. Of particular interest were training in transcriptomic tools and the use of databases linking genotypes and phenotypes. Staff were most interested in bioinformatics training topics, while faculty were the least interested. Hands-on workshops were significantly preferred over any other mode of training. The University of Arizona Health Sciences Library is meeting those needs through internal programming and external partnerships.
Conclusion: The results of the survey demonstrate a keen interest in a variety of bioinformatic resources; the challenge to the library is how to address those training needs. The mode of support depends largely on library staff expertise in the numerous subject-specific databases and tools. Librarian-led bioinformatic training sessions provide opportunities for engagement with researchers at multiple points of the research life cycle. When training needs exceed library capacity, partnering with intramural and extramural units will be crucial in library support of health sciences bioinformatic research.
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Salmonella is a pathogenic microorganism linked to foodborne outbreaks associated with eggs and egg products. This microorganism can resist sanitation of egg processing equipment and form biofilms. The main challenge is detecting Salmonella cells in the early stages of biofilm formation so that effective interventions can be used to control and remove the Salmonella biofilms. This work aimed to study the biofilm formation of S. Typhimurium in liquid whole egg (LWE) on three common food-contact surfaces (stainless steel, silicone, and nylon) during the first five hours of incubation at 37°C, and to compare traditional microbiological methods with innovative, fast detection techniques. The results showed that, using general plate counts, Salmonella cells were detected after three hours of incubation with less than 1 log of growth; silicone was the material with the most cells attached, followed by stainless steel. Long-read whole genome sequencing detected Salmonella on stainless steel, silicone, and nylon after only one hour of incubation. The results of this study suggest that long-read sequencing could be very useful for detecting Salmonella at low concentrations in the processing environment.
This research used the resources provided by SCINet project and the AI Center of Excellence of the USDA Agricultural Research Service, ARS project number 0500-00093-001-00-D.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A summary of the bacterial genomes used in the prophage analysis. This file contains the columns GENOMEID, Number of Contigs, Total Length (bp), Shortest Contig (bp), and Longest Contig (bp), separated by tabs.
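A tab separated file with the columns listed in this entry could be parsed along the following lines. This is only a sketch. It assumes the file starts with a header row that uses exactly the column names given in the description and that the four size columns hold integers. The function name and the sample row are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Union

GenomeRecord = Dict[str, Union[str, int]]


def read_genome_summary(stream: TextIO) -> List[GenomeRecord]:
    """Parse the genome summary table from a tab separated text stream.

    Assumes a header row with the column names from the description.
    """
    reader = csv.DictReader(stream, delimiter="\t")
    records: List[GenomeRecord] = []
    for row in reader:
        records.append(
            {
                "genome_id": row["GENOMEID"],
                "contigs": int(row["Number of Contigs"]),
                "total_length_bp": int(row["Total Length (bp)"]),
                "shortest_contig_bp": int(row["Shortest Contig (bp)"]),
                "longest_contig_bp": int(row["Longest Contig (bp)"]),
            }
        )
    return records


if __name__ == "__main__":
    # Made-up row for illustration; not taken from the real file.
    sample = io.StringIO(
        "GENOMEID\tNumber of Contigs\tTotal Length (bp)\t"
        "Shortest Contig (bp)\tLongest Contig (bp)\n"
        "example_genome\t3\t6000\t1000\t3000\n"
    )
    for record in read_genome_summary(sample):
        print(record)
```

Converting the size columns to integers up front makes later filtering, for example by total length or contig count, a matter of simple comparisons.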
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files contain the ONT reads as produced by the Albacore basecaller (before Porechop trimming and demultiplexing). The set of adapter trimmed reads (after Porechop, no subsampling) is available in SRA: SRR5665597, SRR5665596, SRR5665591, SRR5665590, SRR5665593, SRR5665592, SRR5665595, SRR5665594, SRR5665601, SRR5665600, SRR5665599, SRR5665598
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the basecalled (FASTQ format) reads for the WGS test set used in the Deepbinner manuscript.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files contain the ONT reads after subsampling based on length and quality. The full set of basecalled, adapter trimmed and demultiplexed reads (no subsampling) is available in SRA: SRR5665597, SRR5665596, SRR5665591, SRR5665590, SRR5665593, SRR5665592, SRR5665595, SRR5665594, SRR5665601, SRR5665600, SRR5665599, SRR5665598
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains the supplementary tables of the HADEG database article.