https://www.bccresearch.com/aboutus/terms-conditions
Explore BCC Research's comprehensive report on the Bioinformatics Technologies Market. The report studies current and historical market revenues, with estimates broken down by services & platforms, solutions, and application type.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
"Synthetic protein dataset with sequences, physical properties, and functional classification for machine learning tasks."
This synthetic dataset was created to explore and develop machine learning models in bioinformatics. It contains 20,000 synthetic proteins, each with an amino acid sequence, calculated physicochemical properties, and a functional classification.
While this is a simulated dataset, it was inspired by patterns observed in real protein data and tools, such as:
- UniProt: A comprehensive database of protein sequences and annotations.
- Kyte-Doolittle scale: Calculations of hydrophobicity.
- Biopython: A tool for analyzing biological sequences.
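Hydrophobicity on the Kyte-Doolittle scale is commonly summarized as the grand average of hydropathy (GRAVY), which is the mean Kyte-Doolittle value over a sequence's residues. The sketch below shows that calculation. It is only an assumption that the dataset's hydrophobicity property is computed this way, and the `gravy` helper is hypothetical. Biopython's `ProteinAnalysis.gravy()` (in `Bio.SeqUtils.ProtParam`) offers a ready-made GRAVY calculation.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value of the standard residues."""
    # Non-standard letters (e.g. X, B, Z) are skipped in this sketch.
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)
```

A design note: skipping unknown residues keeps the function total, but it silently shortens the averaging window. Raising an error on them instead would be stricter.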
This dataset is ideal for:
- Training classification models for proteins.
- Exploratory analysis of physicochemical properties of proteins.
- Building machine learning pipelines in bioinformatics.
The dataset is divided into two subsets:
- Training: 16,000 samples (proteinas_train.csv).
- Testing: 4,000 samples (proteinas_test.csv).
This dataset was inspired by real bioinformatics challenges and designed to help researchers and developers explore machine learning applications in protein analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction to Bioinformatics is a book written by Arthur M. Lesk and published by Oxford University Press in 2002.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objectives
In this research, we aim to explore the bioinformatic mechanism of infertile endometriosis in order to identify new treatment targets and molecular mechanisms.
Methods
The Gene Expression Omnibus (GEO) database was used to download mRNA sequencing data from infertile endometriosis patients. The “limma” package in R was used to find differentially expressed genes (DEGs). Weighted gene co-expression network analysis (WGCNA) was used to classify genes into modules and to obtain the correlation coefficient between each module and infertile endometriosis. The genes shared by the most disease-related module and the DEGs are called gene set 1. To clarify the molecular mechanisms and potential therapeutic targets for infertile endometriosis, we applied Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment, Protein-Protein Interaction (PPI) networks, and Gene Set Enrichment Analysis (GSEA) to these intersecting genes. We identified lncRNAs and miRNAs linked with infertility and constructed competing endogenous RNA (ceRNA) regulatory networks using the Human MicroRNA Disease Database (HMDD), the miRTarBase database, and the LncRNADisease database.
Results
First, WGCNA was applied to the infertile endometriosis dataset GSE120103, and we found that the MEorangered1 module was the most significantly related to infertile endometriosis. According to KEGG enrichment analysis, the intersection genes were mostly enriched in the metabolism of various amino acids, the cGMP-PKG signaling pathway, and the cAMP signaling pathway. The MEorangered1 module genes and DEGs were then subjected to bioinformatic analysis. KEGG enrichment analysis of the hub genes in the PPI network gave results consistent with the intersection gene analysis. Finally, we used the databases to identify 13 miRNAs and two lncRNAs linked to infertility in order to construct the ceRNA regulatory network of infertile endometriosis.
Conclusion
In this study, we used a bioinformatics approach for the first time to identify amino acid metabolism as a possible major cause of infertility in patients with endometriosis and to provide potential targets for the diagnosis and treatment of these patients.
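WGCNA summarizes each module by its module eigengene, which is the first principal component of the module's standardized expression matrix. It then relates modules to a trait by correlating each eigengene with that trait. The NumPy sketch below illustrates the idea. It is not the WGCNA R implementation, which handles missing data and other details, and the function names are hypothetical. The trait is assumed to be disease status coded as 0/1.

```python
import numpy as np


def module_eigengene(expr: np.ndarray) -> np.ndarray:
    """expr: samples x genes matrix for one module; returns one eigengene value per sample."""
    # Standardize each gene (column); genes with zero variance would produce NaNs here.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    # First principal component of the samples via SVD of the standardized matrix.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0] * s[0]
    # SVD sign is arbitrary: orient the eigengene to agree with average module expression.
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me


def module_trait_correlation(expr: np.ndarray, trait) -> float:
    """Pearson correlation between a module eigengene and a per-sample trait vector."""
    return float(np.corrcoef(module_eigengene(expr), np.asarray(trait, float))[0, 1])
```

Applied to every module, the module with the largest absolute correlation would be the "most disease-related" one in the sense used above.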
The Bioinformatics and Systems Biology (BISB) Core aims to help investigators overcome the technical challenges of using bioinformatics and systems biology techniques. The core will collaborate with principal investigators to incorporate systems biology approaches synergistically into their laboratory studies in order to accelerate their research and develop transformative and translational results.
Leveraging prior viral genome sequencing data to make predictions on whether an unknown, emergent virus harbors a ‘phenotype-of-concern’ has been a long-sought goal of genomic epidemiology. A predictive phenotype model built from nucleotide-level information alone is challenging with respect to RNA viruses due to the ultra-high intra-sequence variance of their genomes, even within closely related clades. We developed a degenerate k-mer method to accommodate this high intra-sequence variation of RNA virus genomes for modeling frameworks. By leveraging a taxonomy-guided ‘group-shuffle-split’ cross validation paradigm on complete coronavirus assemblies from prior to October 2018, we trained multiple regularized logistic regression classifiers at the nucleotide k-mer level. We demonstrate the feasibility of this method by finding models accurately predicting withheld SARS-CoV-2 genome sequences as human pathogens and accurately predicting withheld Swine Acute Diarrhea Syndrome coronavirus (...
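The abstract does not specify how the k-mers are made "degenerate". The sketch below therefore uses one simple, hypothetical choice: it collapses nucleotides into the IUPAC classes R (purines A/G) and Y (pyrimidines C/T/U), so that transition mutations no longer change the k-mer. It also shows a group-aware split in the spirit of "group-shuffle-split", where all genomes from one taxon land on the same side. This is a hand-rolled version; scikit-learn's `GroupShuffleSplit` is a library equivalent. The regularized logistic regression step is not shown, and none of this is claimed to be the authors' code.

```python
import random
from collections import Counter

# Hypothetical degenerate alphabet: purines (A, G) -> R, pyrimidines (C, T, U) -> Y.
# Other symbols (e.g. N) are left unchanged.
PURINE_PYRIMIDINE = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def degenerate_kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers after collapsing the sequence to R/Y classes."""
    reduced = seq.upper().translate(PURINE_PYRIMIDINE)
    return Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))


def group_shuffle_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so each group (e.g. a taxon) is entirely in train or in test."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx
```

Keeping whole taxa out of training is what lets the held-out evaluation mimic an unseen, emergent virus rather than a close relative of a training genome.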
Unlike significance analysis of gene expression, which looks for all genes that are differentially regulated, feature selection in prognostic gene expression analysis aims to find a subset of informative marker genes that are discriminative for prediction. Unfortunately, feature selection in the microarray literature is dominated by the simple heuristic of univariate gene filtering, which selects differentially expressed genes according to their statistical significance. Because the univariate approach does not take into account the correlated or interactive structure among genes, classifiers built on genes selected this way can be less accurate, so more advanced approaches based on multivariate models need to be considered. Here, we introduce a feature ranking method based on forward orthogonal search to assist prognostic gene selection. Applying it to published gene lists selected by univariate models shows that the feature space can be greatly reduced while achieving improved test performance. Our results indicate that "significant" features selected using gene-wise approaches can contain irrelevant genes that only complicate model building. Multivariate feature ranking can help to reduce feature redundancy and to select highly informative prognostic marker genes. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology; Chetty, Madhu; Ahmad, Shandar; Ngom, Alioune; Teng, Shyh Wei; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia). Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
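One common way to realize the forward orthogonal search described in the PRIB abstract is forward orthogonal least squares. At each step, every remaining feature is orthogonalized (Gram-Schmidt) against the features already chosen. The feature with the largest error reduction ratio, meaning the fraction of the target's energy it explains, is then added. The sketch below assumes that criterion; the paper's exact ranking score may differ. In practice, X and y would also be centered beforehand (assumed, not shown).

```python
import numpy as np


def forward_orthogonal_ranking(X, y, n_select):
    """Rank columns of X by forward orthogonal least squares (error reduction ratio).

    Returns (selected column indices in order, their error reduction ratios).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    yy = y @ y
    selected, basis, ratios = [], [], []
    for _ in range(min(n_select, X.shape[1])):
        best_j, best_err, best_w = None, -1.0, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # Gram-Schmidt: remove the parts of feature j already spanned by chosen features.
            w = X[:, j].copy()
            for q in basis:
                w -= (q @ X[:, j]) / (q @ q) * q
            ww = w @ w
            if ww < 1e-12:
                continue  # (near-)collinear with features already selected
            err = (w @ y) ** 2 / (ww * yy)
            if err > best_err:
                best_j, best_err, best_w = j, err, w
        if best_j is None:
            break
        selected.append(best_j)
        basis.append(best_w)
        ratios.append(best_err)
    return selected, ratios
```

Because each candidate is scored only on what it adds beyond the chosen genes, a gene that is "significant" on its own but redundant with an earlier pick receives a low ratio. That is the redundancy reduction the abstract refers to.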
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This workflow adapts the approach and parameter settings of Trans-Omics for Precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. The workflow has five steps:
1. Read alignment using STAR, which produces aligned BAM files: the Genome BAM and the Transcriptome BAM.
2. The Genome BAM file is processed with Picard MarkDuplicates, producing an updated BAM file in which duplicate reads are flagged (such reads can bias interpretation).
3. SAMtools index is then used to generate an index for the BAM file, in preparation for the next step.
4. The indexed BAM file is processed further with RNA-SeQC, which takes the BAM file, the human genome reference sequence and a Gene Transfer Format (GTF) file as inputs to generate transcriptome-level expression quantifications and standard quality control metrics.
5. In parallel with transcript quantification, isoform expression levels are quantified by RSEM. This step depends only on the output of the STAR tool and additional RSEM reference sequences.
For testing and analysis, the workflow author provided example data created by down-sampling the read files of a TOPMed public-access dataset. Chromosome 12 was extracted from the Homo sapiens assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and a detailed set of instructions for the steps performed to down-sample the data is also provided for transparency. The availability of example input data, the use of containerization for the underlying software, and the detailed documentation were important factors in choosing this specific CWL workflow for CWLProv evaluation. This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl
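The dependency structure of the five steps can be sketched with Python's standard-library `graphlib`. In particular, RSEM needs only the STAR output, so it can run alongside the MarkDuplicates, index and RNA-SeQC chain. The step names are hypothetical labels. This models only the ordering, not the actual CWL workflow or its tool invocations.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each maps to the set of steps whose outputs it needs.
WORKFLOW = {
    "star_align": set(),
    "picard_markduplicates": {"star_align"},      # reads the Genome BAM
    "samtools_index": {"picard_markduplicates"},  # indexes the duplicate-marked BAM
    "rnaseqc": {"samtools_index"},                # indexed BAM plus reference and GTF
    "rsem_quantify": {"star_align"},              # reads the Transcriptome BAM only
}


def execution_waves(graph):
    """Group steps into waves; all steps within one wave may run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(WORKFLOW))
# [['star_align'], ['picard_markduplicates', 'rsem_quantify'], ['samtools_index'], ['rnaseqc']]
```

Waves are a conservative simplification. A workflow engine would start each step as soon as its own inputs exist, so RSEM could also overlap with the indexing and RNA-SeQC steps.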
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data can be used to test my tool delfies on real data, to get a concrete sense of its inputs/outputs, and to test that it is properly installed.
I downloaded the genome of Oscheius onirici, accession: GCA_932521025.
I subsampled the genome to the last 2 kbp of chromosome I, which contains an elimination breakpoint, using `seqkit` v2.8.2, giving the FASTA file in this release.
I then downloaded the following sequencing data for *O. onirici*, from the European Nucleotide Archive:
I then aligned them to the above genome with `minimap2` version 2.26-r1175, using the following presets: "map-ont" for the Nanopore data, "map-hifi" for the PacBio data, and "sr" for the Illumina data.
After sorting with `samtools`, this gives the BAM files in this release.
I then ran `delfies` version 0.6.0 on each BAM and genome, as:
```sh
delfies --threads 16 \
    --telo_forward_seq TTAGGC \
    --breakpoint_type all \
    --min_mapq 20 \
    --min_supporting_reads 6 \
    "${genome}" "${bam}" "${odirname}"
```
The three resulting output directories are in this release, prefixed with `delfies_`.
A single, identical breakpoint is found using all three BAMs (see files '*breakpoint_locations.bed').
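To check this on your own run, a minimal Python sketch can read the first three BED columns (chrom, start, end) from each output and compare them. The glob pattern assumes the `delfies_*` directory layout and `*breakpoint_locations.bed` naming described above; adjust it if your file names differ. The helper names are hypothetical.

```python
from pathlib import Path


def parse_bed(text):
    """Return the set of (chrom, start, end) intervals in BED text, skipping header lines."""
    intervals = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.add((chrom, int(start), int(end)))
    return intervals


def all_identical(interval_sets):
    """True if there is at least one set and every set equals the first."""
    return bool(interval_sets) and all(s == interval_sets[0] for s in interval_sets)


def load_release_breakpoints(release_dir):
    # Assumed layout: <release_dir>/delfies_*/<prefix>breakpoint_locations.bed
    paths = sorted(Path(release_dir).glob("delfies_*/*breakpoint_locations.bed"))
    return [parse_bed(p.read_text()) for p in paths]
```

For example, `all_identical(load_release_breakpoints("."))` run from the release directory should return `True` if the three outputs agree.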
The above raw data were produced and released by the Wellcome Sanger Institute as part of projects PRJEB51305 and PRJEB59023.
Over the past year, biology educators and staff at the Department of Energy Systems Biology Knowledgebase (KBase) initiated a collaborative effort to develop a curriculum for bioinformatics education. KBase is a free, easily accessible data science platform that integrates many bioinformatics resources into a graphical user interface built on reproducible analysis notebooks. KBase held conversations with college and high school instructors to understand how KBase could support their educational goals. These conversations evolved into a working group of biology and data science instructors that adapted the KBase platform to their curriculum needs, specifically around concepts in Genomics, Metagenomics, Pangenomics, and Phylogenetics. The KBase Educators Working Group developed modular, adaptable, and customizable instructional units. Each instructional module contains teaching resources, publicly available data, analysis tools, and markdown capability to tailor instructions and learning goals for each class. The online user interface enables students to conduct hands-on data science research and analyses without requiring programming skills or their own computational resources (these are provided by KBase). Alongside these resources, KBase continues to work with instructors, supporting the development of additional curriculum modules. For anyone new to the platform, KBase and the growing KBase Educators Organization provide a community network, accompanied by community-sourced guidelines, instructional templates, and peer support for using KBase in a classroom, whether virtual or in person.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Many types of data from genomic analyses can be represented as genomic tracks, i.e. features linked to the genomic coordinates of a reference genome. Examples of such data are epigenetic DNA methylation data, ChIP-seq peaks, germline or somatic DNA variants, or RNA-seq expression levels. Researchers often face difficulties in locating, accessing and combining relevant tracks from external sources, as well as locating the raw data, reducing the value of the generated information.
FAIRtracks software ecosystem
We have, as an output of the ELIXIR Implementation Study "FAIRification of Genomic Tracks", developed a basic set of recommendations for genomic track metadata together with an implementation called FAIRtracks in the form of a JSON Schema. We propose FAIRtracks as a draft standard for genomic track metadata in order to advance the application of FAIR data principles (Findable, Accessible, Interoperable, and Reusable). We have demonstrated practical usage of this approach by designing a software ecosystem around the FAIRtracks draft standard, integrating globally identifiable metadata from various track hubs in the Track Hub Registry and other relevant repositories into a novel track search service, called TrackFind. The software ecosystem also includes the FAIRtracks augmentation service, which assists metadata producers by automatically augmenting minimal machine-readable metadata with their human-readable counterparts, as well as the FAIRtracks validation service, which extends basic JSON Schema validation to include FAIR-related features (global identifiers, ontology terms, and object references). Finally, we have implemented track metadata search and import functionality into relevant analytical tools: EPICO and the GSuite HyperBrowser. For an overview of the FAIRtracks software ecosystem, please visit: http://fairtracks.github.io/
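The augmentation idea, adding human-readable labels next to machine-readable identifiers, can be illustrated with a small recursive walk over a JSON-like document. This is a hypothetical sketch, not the FAIRtracks augmentation service. The `*term_id`/`*term_label` key convention and the one-entry lookup table are stand-ins for the real schema fields and ontology lookups.

```python
import copy

# Hypothetical local stand-in for an ontology lookup service.
ONTOLOGY_LABELS = {
    "http://purl.obolibrary.org/obo/UBERON_0002107": "liver",
}


def augment(doc, labels=ONTOLOGY_LABELS):
    """Return a copy where every '<x>term_id' key with a known value gets a '<x>term_label' sibling."""
    out = copy.deepcopy(doc)  # never modify the caller's document

    def walk(node):
        if isinstance(node, dict):
            for key in list(node):  # snapshot keys, since labels are added while iterating
                value = node[key]
                if key.endswith("term_id") and isinstance(value, str) and value in labels:
                    node[key[:-len("id")] + "label"] = labels[value]
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(out)
    return out
```

Keeping the identifier as the source of truth and deriving the label from it means producers only need to supply the minimal machine-readable field.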
Example FAIRtracks JSON document - augmented
The "Example FAIRtracks JSON document - augmented" is generated as part of the build process of the FAIRtracks draft standard JSON Schema (source code: https://github.com/fairtracks/fairtracks_standard/). The example FAIRtracks document contains a small selection of tracks and objects from the ENCODE project metadata (https://www.encodeproject.org/), adapted to align with the FAIRtracks draft standard. In addition to being available in the above-mentioned GitHub repository, the "Example FAIRtracks JSON document - augmented" is also published here on Zenodo in order for the document to be globally uniquely identifiable by a Digital Object Identifier (DOI).
Output files from the No. 4 Taxonomic Workflow page of the SWELTR high-temp study. In this workflow we used the microeco package for taxonomic assessment. We first converted each phyloseq object into a microtable object using the file2meco package.
taxa_wf.rdata: contains all variables and phyloseq objects from the 16S rRNA and ITS ASV taxonomic assessment. To see the objects, run `load("taxa_wf.rdata", verbose = TRUE)` in R.
Additional files:
For convenience, we also include individual phyloseq and microtable objects (collected in zip files).
**ITS (its_taxa_objects.zip):**
its18_ps_work_me.rds: microtable object for the FULL (unfiltered) ITS data.
its18_ps_filt_me.rds: microtable object for the arbitrarily filtered ITS data.
its18_ps_perfect_me.rds: microtable object for the PERFect ITS data.
its18_ps_pime_me.rds: microtable object for the PIME ITS data.
**16S rRNA (ssu_taxa_objects.zip):**
ssu18_ps_work_me.rds: microtable object for the FULL (unfiltered) 16S rRNA data.
ssu18_ps_filt_me.rds: microtable object for the arbitrarily filtered 16S rRNA data.
ssu18_ps_perfect_me.rds: microtable object for the PERFect 16S rRNA data.
ssu18_ps_pime_me.rds: microtable object for the PIME 16S rRNA data.
For one of the 16S rRNA analyses, we looked at family-level diversity of major bacterial phyla. For this analysis, we replaced NA ranks with the next highest named rank. For example, ASV13884 was unclassified at the family level, so the NA was replaced with the next highest named rank (in this case, order). The family-level classification for this ASV was therefore changed to `o_Polyangiales`. Doing this allowed us to include unclassified abundance in our analyses. We include the following phyloseq objects containing the modified taxonomies.
ssu18_ps_work_clean.rds: modified phyloseq object for the FULL (unfiltered) 16S rRNA data.
ssu18_ps_filt_clean.rds: modified phyloseq object for the arbitrarily filtered 16S rRNA data.
ssu18_ps_perfect_clean.rds: modified phyloseq object for the PERFect-filtered 16S rRNA data.
ssu18_ps_pime_clean.rds: modified phyloseq object for the PIME-filtered 16S rRNA data.
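The NA-renaming rule described above can be expressed as a short function. The workflow itself is written in R (see the taxa.Rmd source linked below), so this Python version is only an illustrative sketch. The rank list and the `<rank letter>_<name>` format are assumptions inferred from the `o_Polyangiales` example.

```python
# Assumed rank order; the workflow's actual rank names may differ.
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"]


def fill_na_ranks(taxonomy):
    """Replace missing ranks with '<letter>_<name>' of the closest named higher rank."""
    filled = dict(taxonomy)
    last_named = None  # (rank, name) of the most recent named rank seen so far
    for rank in RANKS:
        value = taxonomy.get(rank)
        if value is None or value == "NA":
            if last_named is not None:
                named_rank, name = last_named
                filled[rank] = f"{named_rank[0].lower()}_{name}"
        else:
            last_named = (rank, value)
    return filled
```

For example, `fill_na_ranks({"Order": "Polyangiales", "Family": "NA"})["Family"]` gives `"o_Polyangiales"`. Consecutive missing ranks all inherit the same label.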
Source code for the workflow can be found here:
https://github.com/sweltr/high-temp/blob/master/taxa.Rmd
A database of information on poxviruses. The goals of this project are to acquire and annotate data on poxviruses and to develop and use new tools that facilitate the study of this group of organisms. This basic research is being undertaken with a view to developing novel antiviral therapies, vaccines against human orthopoxvirus infections, new approaches for the environmental detection of virions, and methods for more rapid diagnosis of disease.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data collection accompanies the manuscript "Classifying protein kinase conformations with machine learning".
It was created using the kinactive v0.1 tool, written in pure Python (>=3.10). Note that the data are provided for reference and reproducibility purposes and will not be compatible with later versions of kinactive built upon lXtractor > 0.1.1. Refer to the kinactive documentation for instructions on how to obtain an up-to-date version of the structural kinome collection.
File descriptions:
db_v3.tar.gz -- a structural kinome collection archive. One can unpack it and inspect the contents, or load it into the Python interpreter using the kinactive or lXtractor tools.
default_*_vs.tsv -- structure/sequence variables calculated with lXtractor and used in an interpretable ML pipeline.
*_features.tsv -- lists of ranked features selected by the eBoruta tool for each classifier.
Supplement_labels.tsv -- ML model predictions for each PK domain structure found in db_v3.
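To peek at these files before installing kinactive or lXtractor, the Python standard library is enough to list the archive members and read the TSV tables. This is a generic sketch. It does not use the kinactive/lXtractor loading APIs, and it makes no assumption about the TSV column names.

```python
import csv
import tarfile


def list_archive_members(path, limit=None):
    """Return member names of a .tar.gz archive without extracting it."""
    with tarfile.open(path, "r:gz") as tar:
        names = tar.getnames()
    return names if limit is None else names[:limit]


def read_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts (one per row)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `list_archive_members("db_v3.tar.gz", limit=10)` shows the first entries of the archive. `read_tsv` works for any of the `*_features.tsv` or `default_*_vs.tsv` files.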
A database that curates new experimental and bioinformatic information about the genes and gene products of the model bacterium Escherichia coli K-12 strain MG1655. It was created to integrate information from post-genomic experiments into a single resource, with the aim of providing functional predictions for the roughly 1,500 gene products whose physiological function is unknown. While EchoBASE provides a basic annotation of the genome, taken from other databases, its novelty lies in the curation of post-genomic experiments and their linkage to genes of unknown function. Experiments published on E. coli are curated to one of two levels: papers that determine the function of a single gene are briefly described, while larger datasets are included in the database and can be searched and manipulated. This includes data from proteomics studies, protein-protein interaction studies, microarray experiments, functional genomic approaches (screening multiple deletion strains for novel phenotypes), and a wide range of predictions from in silico bioinformatic approaches. The aim of the database is to provide hypotheses about the functions of uncharacterized gene products that the E. coli research community can use to further knowledge of this model bacterium.
This record includes training materials associated with the Australian BioCommons workshop ‘Introduction to Machine Learning in R - from data to knowledge’. This workshop took place in a single 4-hour session on 09 December 2024.
Event description
With the rise of high-throughput sequencing technologies, the volume of omics data has grown exponentially. A major challenge is to mine useful knowledge from these heterogeneous collections of data. The analysis of complex, high-volume data is not trivial, and classical tools cannot exploit their full potential. Machine Learning (ML), a discipline in which computers learn automatically without being explicitly programmed and help humans make sense of large and complex datasets, can therefore be very useful for mining large omics datasets to uncover new insights that advance the field of bioinformatics.
This hands-on workshop will introduce participants to the ML taxonomy and the application of common ML algorithms to health data. The workshop will cover the foundational concepts and common methods used to analyse omics datasets, providing practical context through basic but widely used R libraries. Participants will acquire an understanding of the standard ML processes, as well as the practical skills to apply them to familiar problems and publicly available real-world datasets.
Materials are shared under a Creative Commons Attribution 4.0 International agreement unless otherwise specified and were current at the time of the event.
Lead trainers: Dr Fotis Psomopoulos, Senior Researcher, Institute of Applied Biosciences (INAB), Center for Research and Technology Hellas (CERTH)
Facilitators: Dr Giorgia Mori, Australian BioCommons; Dr Eden Zhang, Sydney Informatics Hub; Dr Erin Graham, Queensland Cyber Infrastructure Foundation (QCIF)
Infrastructure provision: Uwe Winter, Australian BioCommons
Host: Dr Giorgia Mori, Australian BioCommons
Training materials
Files and materials included in this record:
Event metadata (PDF): Information about the event, including description, event URL, learning objectives, prerequisites, technical requirements etc.
Training materials webpage
Data and documentation
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the post-genomic era, every biologist faces the task of analyzing, interpreting and visualizing large, complex datasets. An increasing number of scientists have begun writing small programs in scripting languages such as Python. This course is designed to train students and scientists without previous programming experience who want, or need, to write their own bioinformatics software tools. The aim of this training course is to introduce the Python programming language by solving everyday bioinformatics tasks. Each folder contains a short introduction video and a PDF file about the topic, assignments covering the topic (provided as Jupyter notebooks), and the solutions (provided as Python files).
Please first read the content of the file 'readme.pdf' and follow the course plan according to the content of the file 'course_plan.pdf'.
Please cite the authors for all course material if you use them in your work.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the accompanying dataset that was generated by the GitHub project: https://github.com/tonyreina/tdc-tcr-epitope-antibody-binding. In that repository I show how to create machine learning models for predicting whether a T-cell receptor (TCR) and a protein epitope will bind to each other.
A model that can predict how well a TCR binds to an epitope can lead to more effective immunotherapy treatments. For example, in anti-cancer therapies it is important for the T-cell receptor to bind to the protein marker in the cancer cell so that the T-cell (actually the T-cell's friends in the immune system) can kill the cancer cell.
```python
import pandas as pd

# Load the pre-split datasets (pickled pandas DataFrames).
train_data = pd.read_pickle("train_data.pkl")
validation_data = pd.read_pickle("validation_data.pkl")
test_data = pd.read_pickle("test_data.pkl")
```
The epitope_aa and the tcr_full columns are the protein (peptide) sequences for the epitope and the T-cell receptor, respectively. The letters correspond to the standard amino acid codes.
The epitope_smi column is the SMILES notation for the chemical structure of the epitope. We won't use this information. Instead, the ESM-1b embedder should be sufficient for the input to our binary classification model.
The tcr column is the CDR3 hypervariable loop, the part of the TCR that actually binds (assuming it binds) to the epitope.
The label column is whether the two proteins bind. 0 = No. 1 = Yes.
The tcr_vector and epitope_vector columns are the bio-embeddings of the TCR and epitope sequences generated by the Facebook ESM-1b model. These two vectors can be used to create a machine learning model that predicts whether the combination will produce a successful protein binding.
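One simple baseline is to concatenate each pair's `tcr_vector` and `epitope_vector` and fit a logistic regression on the `label` column. The NumPy sketch below assumes each vector cell holds a 1-D array of fixed length. It is a hypothetical baseline, not the model from the linked repository.

```python
import numpy as np


def pair_features(tcr_vectors, epitope_vectors):
    """Concatenate TCR and epitope embeddings side by side, one row per pair."""
    return np.hstack([np.stack(tcr_vectors), np.stack(epitope_vectors)])


def train_logistic(X, y, lr=0.1, epochs=500):
    """Plain batch gradient descent on the logistic (cross-entropy) loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted binding probability
        grad = p - y                            # gradient of the loss w.r.t. the logits
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ w + b)))
```

With the DataFrames loaded above, the features would be built as `pair_features(list(train_data["tcr_vector"]), list(train_data["epitope_vector"]))`. Hyperparameters would be chosen on `validation_data` before scoring on `test_data`.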
From the TDC website:
T-cells are an integral part of the adaptive immune system, whose survival, proliferation, activation and function are all governed by the interaction of their T-cell receptor (TCR) with immunogenic peptides (epitopes). A large repertoire of T-cell receptors with different specificity is needed to provide protection against a wide range of pathogens. This new task aims to predict the binding affinity given a pair of TCR sequence and epitope sequence.
Weber et al.
Dataset Description: The dataset is from Weber et al., who assembled a large and diverse dataset from VDJdb and the ImmuneCODE project. It uses human TCR beta-chain sequences. Since this dataset is highly imbalanced, the authors excluded epitopes with fewer than 15 associated TCR sequences and downsampled to a limit of 400 TCRs per epitope. The dataset contains amino acid sequences either for the entire TCR or only for the hypervariable CDR3 loop. Epitopes are available as amino acid sequences. Since Weber et al. proposed representing the peptides as SMILES strings (which reformulates the problem as protein-ligand binding prediction), the SMILES strings of the epitopes are also included. Negative samples, making up 50% of the data, were generated by shuffling the pairs, i.e. associating TCR sequences with epitopes they have not been shown to bind.
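The preprocessing rules above can be sketched as follows: drop epitopes with fewer than 15 TCRs, cap each epitope at 400 TCRs, then add one shuffled negative per positive pair. This illustrates the stated rules under assumed inputs (lists of `(tcr, epitope)` string pairs). It is not Weber et al.'s or TDC's code, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def filter_and_downsample(pairs, min_tcrs=15, max_tcrs=400, seed=0):
    """Drop epitopes with < min_tcrs TCRs and randomly cap each epitope at max_tcrs TCRs."""
    by_epitope = defaultdict(list)
    for tcr, epitope in pairs:
        by_epitope[epitope].append(tcr)
    rng = random.Random(seed)
    kept = []
    for epitope in sorted(by_epitope):
        tcrs = by_epitope[epitope]
        if len(tcrs) < min_tcrs:
            continue  # too few associated TCRs
        if len(tcrs) > max_tcrs:
            tcrs = rng.sample(tcrs, max_tcrs)  # random subset without replacement
        kept.extend((tcr, epitope) for tcr in tcrs)
    return kept


def shuffled_negatives(positive_pairs, seed=0, max_tries=100):
    """For each positive pair, pair its TCR with a random epitope it is not known to bind."""
    rng = random.Random(seed)
    positives = set(positive_pairs)
    epitopes = sorted({e for _, e in positive_pairs})
    negatives = []
    for tcr, _ in positive_pairs:
        for _ in range(max_tries):
            candidate = rng.choice(epitopes)
            if (tcr, candidate) not in positives:
                negatives.append((tcr, candidate))
                break
    return negatives
```

Generating exactly one negative per positive yields the 50% negative share described above. "Not shown to bind" is not the same as "does not bind", so these negatives carry some label noise.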
Task Description: Binary classification. Given the epitope (a peptide, either represented as amino acid sequence or as SMILES) and a T-cell receptor (amino acid sequence, either of the full protein complex or only of the hypervariable CDR3 loop), predict whether the epitope binds to the TCR.
Dataset Statistics: 47,182 TCR-Epitope pairs between 192 epitopes and 23,139 TCRs.
References:
Dataset License: CC BY 4.0.
Contributed by: Anna Weber and Jannis Born.
| Checkpoint name | Number of layers | Number of parameters |
|---|---|---|
| esm2_t48_15B_UR50D | 48 | 15B |
| esm2_t36_3B_UR50D | 36 | 3B |
| esm2_t33_650M_UR50D | 33 | 650M |
| esm2_t30_150M_UR50D | 30 | 150M |
| esm2_t12_35M_UR50D | 12 | 35M |
| esm2_t6_8M_UR50D | 6 | 8M |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Traumatic brain injury (TBI) is a serious condition that can increase the risk of epilepsy. The purpose of this article is to explore the molecular mechanisms shared by TBI and epilepsy, with the aim of providing a theoretical basis for the prevention and treatment of post-traumatic epilepsy (PTE). Two datasets, one for TBI and one for epilepsy, were downloaded from the Gene Expression Omnibus (GEO) database. Functional enrichment analysis, protein–protein interaction (PPI) network construction, and hub gene identification were performed on the cross-talk genes of these two diseases. Another dataset was used to validate these hub genes. Moreover, the abundance of infiltrating immune cells was evaluated with the Immune Cell Abundance Identifier (ImmuCellAI). The miRNAs common to TBI and epilepsy were obtained from the Human microRNA Disease Database (HMDD). Genes shared between the cross-talk genes and the target genes predicted by TargetScan were used to construct the common miRNA–mRNA network. A total of 106 cross-talk genes were screened out, including 37 upregulated and 69 downregulated genes. The enrichment analyses showed that terms related to cytokines and immunity were repeatedly enriched, particularly the interferon gamma signaling pathway. Four critical hub genes were screened out for co-expression analysis. The miRNA–mRNA network revealed that three miRNAs may affect the shared interferon-induced genes, which might play essential roles in PTE. Our study showed the potential role of the interferon gamma signaling pathway in the pathogenesis of PTE, which may provide a promising target for future therapeutic interventions.
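The network construction step keeps only predicted targets that are also cross-talk genes, which amounts to a set intersection per miRNA. Below is a minimal sketch with hypothetical inputs: a set of common miRNAs, a miRNA-to-targets mapping (for example exported from TargetScan), and the cross-talk gene list.

```python
def mirna_mrna_network(common_mirnas, predicted_targets, crosstalk_genes):
    """Return sorted (miRNA, gene) edges where the predicted target is also a cross-talk gene."""
    crosstalk = set(crosstalk_genes)
    edges = []
    for mirna in sorted(common_mirnas):
        for gene in sorted(set(predicted_targets.get(mirna, ())) & crosstalk):
            edges.append((mirna, gene))
    return edges
```

The resulting edge list can be loaded directly into a graph tool such as Cytoscape for visualization.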
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sex steroids play a key role in triggering sex differentiation in fish, and exogenous hormone treatment can lead to partial or complete sex reversal. This phenomenon has attracted attention since the discovery that even low environmental doses of exogenous steroids can adversely affect gonad morphology (ovotestis development) and induce reproductive failure. Modern genomic technologies have enhanced opportunities to uncover mechanisms of action (MOA) and identify biomarkers related to the toxic action of a compound. However, interpretation of high-throughput data relies on statistical analysis, species genomic resources, and bioinformatics tools. The goals of this study are to improve knowledge of feminisation in fish by analysing molecular responses in the gonads of rainbow trout fry after chronic exposure to several doses (0.01, 0.1, 1 and 10 μg/L) of ethynylestradiol (EE2), and to offer target genes as potential biomarkers of ovotestis development. We successfully adapted a bioinformatics microarray analysis workflow developed on human data to a toxicogenomic study of rainbow trout, a fish species lacking accurate functional annotation and genomic resources. The workflow allowed us to obtain lists of genes expected to be enriched in true positive differentially expressed genes (DEGs), which were subjected to over-representation analysis (ORA) methods. Several pathways and ontologies, mostly related to cell division and metabolism, sexual reproduction and steroid production, were found to be significantly enriched. Moreover, two sets of potential ovotestis biomarkers were selected using several criteria. The first group contained specific potential biomarkers belonging to pathways/ontologies highlighted in the experiment; among them, the early ovarian differentiation gene foxl2a was overexpressed. The second group, which was highly sensitive but not specific, included the DEGs with the highest fold change and lowest p-value in the statistical workflow output. The methodology can be generalized to other (non-model) species and various types of microarray platforms.
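Over-representation analysis usually tests each pathway with a one-sided hypergeometric test. Suppose the gene universe has N genes, K of them belong to the pathway, and there are n DEGs. The p-value is then the probability of seeing at least k pathway genes among the DEGs by chance. Below is a standard-library sketch of that test, plus a Benjamini-Hochberg correction across pathways. The study's exact ORA tools and correction method are not stated here, so both choices are assumptions, and the function names are hypothetical.

```python
from math import comb


def ora_pvalue(n_universe, n_pathway, n_degs, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_universe, n_pathway, n_degs)."""
    upper = min(n_pathway, n_degs)
    # Sum the exact integer counts first, then divide once to limit rounding error.
    numer = sum(comb(n_pathway, i) * comb(n_universe - n_pathway, n_degs - i)
                for i in range(n_overlap, upper + 1))
    return numer / comb(n_universe, n_degs)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For a non-model species like rainbow trout, the choice of gene universe matters: it should be restricted to genes that are both on the array and carry a usable annotation.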