Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Skin cutaneous melanoma (SKCM) is a common malignant skin cancer with high mortality and recurrence rates. Although mRNA vaccines are a promising strategy for cancer treatment, their application against SKCM remains unclear. In this study, we employed computational bioinformatics analysis to explore SKCM-associated antigens for an mRNA vaccine and suitable populations for vaccination.
Methods: Gene expression and clinical data were retrieved from GEO and TCGA. The differential expression levels and prognostic index of selected antigens were computed via GEPIA2, while genetic alterations were analyzed using cBioPortal. TIMER was utilized to assess the correlation between antigen-presenting cell infiltration and each antigen. Consensus clustering identified immune subtypes, and immune characteristics were evaluated across subtypes. Weighted gene co-expression network analysis was performed to identify modules of immune-related genes.
Results: We discovered five tumor antigens (P2RY6, PLA2G2D, RBM47, SEL1L3, and SPIB) that are significantly overexpressed and mutated, and that correlate with patient survival and with the infiltration of antigen-presenting cells. Our analysis revealed two distinct immune subtypes among the SKCM samples. Immune subtype 1 was associated with poorer clinical outcomes and exhibited low levels of immune activity, characterized by fewer mutations and lower immune cell infiltration. In contrast, immune subtype 2 showed higher immune activity and better patient outcomes. The immune landscape of SKCM exhibited heterogeneity among patients, and a key gene module enriched in immune-related pathways was identified.
Conclusions: Our findings suggest that the identified tumor antigens could serve as valuable targets for developing mRNA vaccines against SKCM, particularly for patients in immune subtype 1.
This research provides valuable insights into personalized immunotherapy approaches for this challenging cancer and highlights the advantages of bioinformatics in identifying immune targets and optimizing treatment approaches.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Teaching material and blank questionnaires used for the event with Kilgraston School. Names and contact details have been redacted. (ZIP 432 kb)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The two tables contain data on software packages published in Molecular Ecology Resources (MER) from 2001 to 2015, and in BMC Bioinformatics (2004). These data sets are part of an analysis under the title stated above.
https://spdx.org/licenses/CC0-1.0.html
Community composition data are essential for conservation management, facilitating identification of rare native and invasive species, along with abundant ones. However, traditional capture-based morphological surveys require considerable taxonomic expertise, are time consuming and expensive, can kill rare taxa and damage habitats, and often are prone to false negatives. Alternatively, metabarcode assays can be used to assess the genetic identity and compositions of entire communities from environmental samples, comprising a more sensitive, less damaging, and relatively time- and cost-efficient approach. However, there is a trade-off between the stringency of bioinformatic filtering needed to remove false positives and the potential for false negatives. The present investigation thus evaluated use of four mitochondrial (mt) DNA metabarcode assays and a customized bioinformatic pipeline to increase confidence in species identifications by removing false positives, while achieving high detection probability. Positive controls were used to calculate sequencing error, and results that fell below those cutoff values were removed, unless found with multiple assays. The performance of this approach was tested to discern and identify North American freshwater fishes using lab experiments (mock communities and aquarium experiments) and processing of a bulk ichthyoplankton sample. The method then was applied to field environmental (e)DNA water samples taken concomitant with electrofishing surveys and morphological identifications. This protocol detected 100% of species present in concomitant electrofishing surveys in the Wabash River and an additional 21 that were absent from traditional sampling. 
Using single 1 L water samples collected from just four locations, the metabarcoding assays discerned 73% of the total fish species identified by four months of an extensive electrofishing river survey in the Maumee River, along with an additional nine species. In both rivers, total fish species diversity was best resolved when all four metabarcode assays were used together, which identified 35 additional species missed by electrofishing. Ecological distinction and diversity levels among the fish communities also were better resolved with the metabarcode assays than with morphological sampling and identifications, especially with the combined assays. At the population level, metabarcode analyses targeting the invasive round goby Neogobius melanostomus and the silver carp Hypophthalmichthys molitrix identified all population haplotype variants found using Sanger sequencing of morphologically sampled fish, along with additional intra-specific diversity, meriting further investigation. Overall findings demonstrated that the use of multiple metabarcode assays and custom bioinformatics that filter potential error from true positive detections improves confidence in evaluating biodiversity.
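The filtering rule described above (discard detections that fall below positive-control error cutoffs unless they are found with multiple assays) might be sketched as follows. All function names, species, read counts, and thresholds here are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch of multi-assay consensus filtering: a detection is
# retained if its read count exceeds the assay's positive-control error
# cutoff, or if it is rescued by appearing in multiple independent assays.

def filter_detections(detections, cutoffs, min_assays=2):
    """detections: {assay: {species: read_count}}
       cutoffs: {assay: minimum read count derived from positive controls}"""
    seen = {}        # species -> set of assays with any reads at all
    passed = set()   # species exceeding the error cutoff in >= 1 assay
    for assay, counts in detections.items():
        for sp, n in counts.items():
            seen.setdefault(sp, set()).add(assay)
            if n >= cutoffs[assay]:
                passed.add(sp)
    # below-cutoff detections are kept only when found by multiple assays
    rescued = {sp for sp, assays in seen.items() if len(assays) >= min_assays}
    return passed | rescued

dets = {"COI": {"bass": 500, "darter": 3},
        "16S": {"bass": 200, "darter": 4},
        "12S": {"bass": 150}}
cuts = {"COI": 10, "16S": 10, "12S": 10}
kept = filter_detections(dets, cuts)  # darter rescued by two assays
```

The trade-off the abstract describes is visible in `min_assays`: raising it removes more false positives but risks dropping genuine low-abundance taxa.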
Methods: These scripts were written and databases curated by Matthew Snyder during his PhD dissertation research in Dr. Carol Stepien's Genetics and Genomics Group at the Pacific Marine Environmental Laboratory, National Oceanic and Atmospheric Administration, Seattle, WA.
http://rightsstatements.org/vocab/InC/1.0/
The estimation of three-dimensional neural active sources from magnetoencephalography (MEG) recordings is a critical issue for both clinical neurology and brain function research. Nowadays, the multiple signal classification (MUSIC) algorithm and the recursive MUSIC algorithm are widely used to locate dipolar sources from MEG data. The drawback of these algorithms is that they require excessive computation and are quite time-consuming when scanning a three-dimensional space. To solve this problem, we propose a MEG source localization scheme based on an improved Particle Swarm Optimization (PSO). This scheme uses the global search ability of PSO to estimate a rough source location; accurate dipolar source localization is then performed by combining this with a grid search over a small area. In addition, we compare the results of our method with those based on a Genetic Algorithm (GA). Computer simulation results show that our PSO strategy is an effective and precise approach to dipole localization, greatly improving speed while localizing sources accurately. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
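The two-stage scheme described above (global PSO search, then local grid refinement) can be sketched as follows. The quadratic cost function is a toy stand-in for a real MEG forward-model misfit, and all parameter values are illustrative assumptions, not those of the paper.

```python
# Sketch of PSO-then-grid dipole localization with a toy misfit function.
import random

random.seed(0)  # for reproducibility of this illustration

def pso(cost, bounds, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Plain global-best PSO minimizing `cost` over a box."""
    dim = len(bounds)
    pos = [[random.uniform(*bounds[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = cost(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest

def grid_refine(cost, center, half_width=0.1, steps=11):
    """Exhaustive search on a small grid around the rough PSO estimate."""
    best, best_f = list(center), cost(center)
    axes = [[c - half_width + 2 * half_width * k / (steps - 1) for k in range(steps)]
            for c in center]
    for x in axes[0]:
        for y in axes[1]:
            for z in axes[2]:
                f = cost([x, y, z])
                if f < best_f:
                    best, best_f = [x, y, z], f
    return best

# Toy misfit with its minimum at a "true" dipole location.
true = (1.0, -2.0, 0.5)
cost = lambda p: sum((p[d] - true[d]) ** 2 for d in range(3))
rough = pso(cost, bounds=[(-5, 5)] * 3)
fine = grid_refine(cost, rough)
```

The design point the abstract makes is that the expensive exhaustive scan is confined to a small neighborhood of the PSO estimate instead of the whole head volume.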
Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data from questionnaires (LibreOffice Calc spreadsheet). For details of contents, see the "Notes" worksheet. (ODS 82 kb)
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The public sharing of primary research datasets potentially benefits the research community but is not yet common practice. In this pilot study, we analyzed whether data sharing frequency was associated with funder and publisher requirements, journal impact factor, or investigator experience and impact. Across 397 recent biomedical microarray studies, we found investigators were more likely to publicly share their raw dataset when their study was published in a high-impact journal and when the first or last authors had high levels of career experience and impact. We estimate the USA's National Institutes of Health (NIH) data sharing policy applied to 19% of the studies in our cohort; being subject to the NIH data sharing plan requirement was not found to correlate with increased data sharing behavior in multivariate logistic regression analysis. Studies published in journals that required a database submission accession number as a condition of publication were more likely to share their data, but this trend was not statistically significant. These early results will inform our ongoing larger analysis, and hopefully contribute to the development of more effective data sharing initiatives. Earlier version presented at ASIS&T and ISSI Pre-Conference: Symposium on Informetrics and Scientometrics 2009
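The multivariate logistic regression behind these estimates can be illustrated with a minimal sketch. The covariates, effect sizes, and data below are synthetic placeholders invented for illustration; they are not the study's actual model or cohort.

```python
# Toy multivariate logistic regression fit by stochastic gradient descent.
# Covariates: scaled journal impact factor and a binary NIH-policy flag.
import math
import random

def fit_logistic(X, y, lr=0.1, iters=2000):
    """Return [intercept, w_impact, w_nih] maximizing the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(iters):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = yi - p
            w[0] += lr * err
            for j, xj in enumerate(xi):
                w[j + 1] += lr * err * xj
    return w

random.seed(1)
# Synthetic studies: [impact factor (standardized), subject to NIH policy].
X = [[random.gauss(0, 1), random.randint(0, 1)] for _ in range(200)]
# Sharing probability rises with impact factor only, loosely mirroring the
# finding that the NIH mandate showed no independent effect.
y = [1 if random.random() < 1 / (1 + math.exp(-1.5 * x[0])) else 0 for x in X]
w = fit_logistic(X, y)
```

On this synthetic data the fitted impact-factor coefficient comes out clearly positive, which is the shape of association the pilot study reports for journal impact.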
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Objectives:
Using agile software development practices, develop and evaluate a software architecture and implementation for reliable management of bioinformatic data that is stored in the cloud.
Materials and Methods:
CORE (Comprehensive Oncology Research Environment) Browser is a new open-source web application for cancer researchers to manage sequencing data organized in a flexible format in Amazon Simple Storage Service (S3) buckets. It has a microservices- and hypermedia-based architecture, which we integrated with Test-Driven Development (TDD), the iterative writing of computable specifications for how software should work prior to development. Optimal testing completeness is a tradeoff between code coverage and software development costs. We hypothesized this architecture would permit developing tests that can be executed repeatedly for all microservices, maximizing code coverage while minimizing effort.
Results:
After one-and-a-half years of development, the CORE Browser backend had 121 tests designed for repeated execution and 875 custom tests that were executed 3,031 times, providing 78% code coverage.
Discussion:
Links, the repeating pattern of hypermedia architecture, permit CORE Browser to implement tests that can be executed repeatedly by every microservice to achieve high code coverage. Code coverage correlates with software reliability. Other benefits of this architecture include permitting access to bucket data from outside the application and separating the management of bioinformatic data from analysis.
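The idea of one repeatable test driven by the uniform link pattern can be sketched as follows. This is an illustrative mock, not CORE Browser's actual test suite: the in-memory resource dictionary stands in for real HTTP microservices, and all names are invented.

```python
# Generic hypermedia test: starting from a service root, follow every
# advertised link and assert that each linked resource resolves. Because
# all resources expose the same "links" shape, the same test can be run
# against any microservice's root.

def crawl_links(fetch, root):
    """Walk hypermedia links; return the set of reachable resources."""
    seen, queue = set(), [root]
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        doc = fetch(url)  # would be an HTTP GET in a real test
        assert doc is not None, f"dangling link: {url}"
        queue.extend(link["href"] for link in doc.get("links", []))
    return seen

# Mocked resource graph for a hypothetical sequencing-data service.
resources = {
    "/datasets":   {"links": [{"href": "/datasets/1"}]},
    "/datasets/1": {"links": [{"href": "/files/a"}]},
    "/files/a":    {"links": []},
}
reached = crawl_links(resources.get, "/datasets")
```

Because the crawler knows nothing service-specific, the same assertion exercises every endpoint each service exposes, which is how a single repeatable test can drive coverage across many microservices.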
Conclusion:
Architectural choices are important enablers of modern software development practices, such as TDD. Updating software architecture may be a critical next step in agile transformation after an engineering team implements the structural changes on which most such transformations focus.
Keywords:
High-throughput nucleotide sequencing, software, data management, cloud computing.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Recent work has shown that evolvability plays a key role in determining the long-term population dynamics of asexual clones. However, simple considerations suggest that the evolvability of a focal lineage of bacteria should also be influenced by the evolvability of its competitors. First, evolvable competitors should accelerate evolution by impeding the fixation of the focal lineage through a clonal interference–like mechanism. Second, evolvable competitors should increase the strength of selection by rapidly degrading the environment, increasing selection for adaptive mutations. Here we tested these ideas by allowing a high-fitness clone of the bacterium Pseudomonas aeruginosa to invade populations of two low-fitness resident clones that differ in their evolvability. Both competition from mutations in the resident lineage and environmental degradation lead to faster adaptation in the invader through fixing single mutations with a greater fitness advantage. The results suggest that competition from mutations in both the successful invader and the unsuccessful resident shapes the adaptive trajectory of the invader through both direct competition and indirect environmental effects. Therefore, to predict evolutionary outcomes, it will be necessary to consider the evolvability of all members of the community and the effects of adaptation on the quality of the environment. This is particularly relevant to mixed microbial communities where lineages differ in their adaptive potential, a common feature of chronic infections.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Recombination is a key evolutionary driver in shaping novel viral populations and lineages. When unaccounted for, recombination can impact evolutionary estimations or complicate their interpretation. Therefore, identifying signals of recombination in sequencing data is a key prerequisite to further analyses. A repertoire of recombination detection methods has been developed over the past two decades; however, the prevalence of pandemic-scale viral sequencing data poses a computational challenge for existing methods. Here, we assessed five recombination detection methods (PhiPack (Profile), 3SEQ, GENECONV, VSEARCH (UCHIME), and gmos) to determine whether any are suitable for the analysis of bulk sequencing data. To test the performance and scalability of these methods, we analysed simulated viral sequencing data across a range of sequence diversities, recombination frequencies, and sample sizes. Further, we provide a practical example for the analysis and validation of empirical data. We find that recombination detection methods need to be scalable, use an analytical approach and resolution suitable for the intended research application, and be accurate for the properties of a given dataset (e.g. sequence diversity and estimated recombination frequency). Analysis of simulated and empirical data revealed that the assessed methods exhibited considerable trade-offs between these criteria. Overall, we provide general guidelines for the validation of recombination detection results, the benefits and shortcomings of each assessed method, and future considerations for recombination detection methods for the assessment of large-scale viral sequencing data.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Effects of intraguild predation (IGP) on omnivores and detritivores are relatively understudied when compared to work on predator guilds. Functional genetic work in IGP is even more limited, but its application can help answer a range of questions related to ultimate and proximate causes of this behavior. Here we integrate behavioral assays and transcriptomic analysis of facultative predation in a blow fly (Diptera: Calliphoridae) to evaluate the prevalence, effect, and correlated gene expression of facultative predation by the invasive species Chrysomya rufifacies. Field work observing donated human cadavers indicated facultative predation by C. rufifacies on the native blow fly Cochliomyia macellaria was rare under undisturbed conditions, owing in part to spatial segregation between species. Laboratory assays under conditions of starvation showed predation had a direct fitness benefit (i.e., survival) to the predator. As a genome is not available for C. rufifacies, a de novo transcriptome was developed and annotated using sequence similarity to Drosophila melanogaster. Under a variety of assembly parameters, several genes were identified as being differentially expressed between predators and non-predators of this species, including genes involved in cell-to-cell signaling, osmotic regulation, starvation responses, and dopamine regulation. Results of this work were integrated to develop a model of the processes and genetic regulation controlling facultative predation.
Attribution-NonCommercial-ShareAlike 2.5 (CC BY-NC-SA 2.5): https://creativecommons.org/licenses/by-nc-sa/2.5/
License information was derived automatically
Background: Structural variants (SVs) are large DNA rearrangements, typically >50 bp, that may be directly involved in genome evolution and human disease. SVs can be classified as copy number variants (CNVs), characterized by gain (tandem duplications and insertions) or loss (deletions) of genetic material; non-CNVs, large rearrangements not involving gain or loss of DNA (perfect inversions, reciprocal translocations); and complex SVs, which combine characteristics of both. The human genome (HG) presents vast complexity, characterized by a high number of long and, particularly, short interspersed repeated elements (LINEs and SINEs, respectively) as well as low copy number repeats (LCRs) (typically 2-4 copies per haploid genome). Homologous pairing and recombination between non-allelic copies of these repeats may fuel the occurrence of all types of SVs. SVs mediated by unequal homologous recombination are delimited by these repeats; consequently, the specific breakpoint cannot be determined and remains undefined, confined to the range of the repeated tract involved in the reciprocal exchange. Massively parallel sequencing (MPS) has greatly improved modern human genetics, so-called genomic medicine, by enabling comprehensive and accurate genotyping of small variants, including single nucleotide substitutions and indels (SNVs). However, SV genotyping using 2nd-generation MPS (e.g., the Illumina short-read sequencing platform) and the associated bioinformatics algorithms still falls far short of its potential. 3rd-generation MPS has come to address the challenges posed by SV genotyping by producing much longer reads. It is expected that long-read MPS will alleviate numerous computational challenges surrounding genome assembly, transcript reconstruction, and metagenomics, among other important areas of modern biology and medicine.
Among 2nd-generation MPS, paired-end read technologies are highly accurate for SNV genotyping, but not as efficient for SV calling in the complex HG, in part due to its abundance of repeated sequences such as SINEs, LINEs, and LCRs. Certainly, long-read sequencing technologies, or 3rd-generation MPS, have clear advantages for SV genotyping, but their adoption in genomic medicine worldwide is still limited. Consequently, some recurrent SVs mediated by large LCRs (e.g., >1 kb) still need characterization by massive sequencing. Aim: The objective of this work is to present a proof-of-concept of a novel approach to SV genotyping that applies 2nd-generation whole genome sequencing (WGS) to circularized restriction-fragment DNA, together with a specifically designed bioinformatic protocol.
The Daphnia Genomics Consortium (DGC) is an international network of investigators committed to establishing the freshwater crustacean Daphnia as a model system for ecology, evolution, and the environmental sciences. Along with research activities, the DGC is: (1) coordinating efforts towards developing the Daphnia genomic toolbox, which will then be available for use by the general community; (2) facilitating collaborative cross-disciplinary investigations; (3) developing bioinformatic strategies for organizing the rapidly growing genome database; and (4) exploring emerging technologies to improve high-throughput analyses of molecular and ecological samples. If we are to succeed in creating a new model system for modern life-sciences research, it will need to be a community-wide effort. Research activities of the DGC are primarily focused on creating genomic tools and information. When completed, the current projects will offer a first view of the Daphnia genome's topography, including regions of high and low recombination, the distribution of transposable, repetitive, and regulatory elements, and the size and structure of genes and of their neighborhoods. This information is crucial in formulating testable hypotheses relating genetics and demographics to the evolutionary potential or constraints of natural populations. Projects aiming to compile identifiable genes with their functions are also underway, together with robust methods to verify these findings. Finally, these tools are being tested by exploring their uses in key ecological and toxicological investigations. Each project benefits from the leadership and expertise of many individuals. For further details, begin by contacting the project directors. The DGC consists of biologists from a broad spectrum of subdisciplines, including limnology, ecotoxicology, quantitative and population genetics, systematics, molecular biology and evolution, developmental biology, genomics, and bioinformatics.
In many regards, the rapid early success of the consortium results from its grass-roots origin, which promotes an international composition, under a cooperative model, with significant scientific breadth. We hold to this approach in building this network and encourage more people to participate. All the while, the DGC is structured to effectively reach specific goals. The consortium includes an advisory board (composed of experts from the various subdisciplines), whose responsibility is to act as the research community's agent in guiding the development of Daphnia genomic resources. The advisors communicate directly with DGC members, who are either contributing genomic tools or actively seeking funds for this function. The consortium's main body (given the widespread interest in applying genomic tools in environmental studies) is the affiliates, who make use of these tools for their research and who are soliciting support.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Emerging in the realm of bioinformatics, plant bioinformatics integrates computational and statistical methods to study plant genomes, transcriptomes, and proteomes. With the introduction of high-throughput sequencing technologies and other omics data, the demand for automated methods to analyze and interpret these data has increased. In this study, we propose a novel explainable gradient-based convolutional neural network (EG-CNN) model that uses both omics data and hyperspectral images to predict the type of attack on plants. We gathered gene expression, metabolite, and hyperspectral image data from plants afflicted with four prevalent diseases: powdery mildew, rust, leaf spot, and blight. Our proposed EG-CNN model employs a combination of these omics data to learn crucial plant disease detection characteristics. We trained our model with multiple hyperparameters, such as the learning rate, number of hidden layers, and dropout rate, and attained a test set accuracy of 95.5%. We also conducted a sensitivity analysis to determine the model’s resistance to hyperparameter variations. Our analysis revealed that our model exhibited a notable degree of resilience in the face of these variations, resulting in only marginal changes in performance. Furthermore, we conducted a comparative examination of the time efficiency of our EG-CNN model in relation to baseline models, including SVM, Random Forest, and Logistic Regression. Although our model necessitates additional time for training and validation due to its intricate architecture, it demonstrates a faster testing time per sample, offering potential advantages in real-world scenarios where speed is paramount. To gain insights into the internal representations of our EG-CNN model, we employed saliency maps for a qualitative analysis.
This visualization approach allowed us to ascertain that our model effectively captures crucial aspects of plant disease, encompassing alterations in gene expression, metabolite levels, and spectral discrepancies within plant tissues. Leveraging omics data and hyperspectral images, this study underscores the potential of deep learning methods in the realm of plant disease detection. The proposed EG-CNN model exhibited impressive accuracy and displayed a remarkable degree of insensitivity to hyperparameter variations, which holds promise for future plant bioinformatics applications.
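The gradient-based saliency idea described above can be sketched numerically: attribute the model's output to whichever inputs it is most sensitive to. The toy linear model and central-difference gradients below are illustrative stand-ins for the EG-CNN and its backpropagated gradients; all names and values are assumptions.

```python
# Saliency as the absolute gradient of the output w.r.t. each input
# feature, estimated by central finite differences on a toy model.

def saliency(model, x, eps=1e-5):
    """Return |d model(x) / d x_i| for each input feature i."""
    grads = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grads.append((model(hi) - model(lo)) / (2 * eps))
    return [abs(g) for g in grads]

# Toy model: the output depends strongly on feature 0 (say, one gene's
# expression), not at all on feature 1, and weakly on feature 2.
model = lambda x: 5.0 * x[0] + 0.1 * x[2]
sal = saliency(model, [1.0, 1.0, 1.0])
```

For image inputs the same per-feature gradients, reshaped to the image grid, form the saliency map highlighting the spectral regions the model attends to.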
https://www.archivemarketresearch.com/privacy-policy
The global CGH Microarray Software market is experiencing robust growth, projected to reach $256.2 million in 2025. While the provided CAGR is missing, considering the rapid advancements in genomic technologies and increasing demand for precise genetic analysis in healthcare and research, a conservative estimate would place the CAGR between 8% and 12% for the forecast period (2025-2033). This growth is fueled by several key drivers. The rising prevalence of genetic disorders necessitates advanced diagnostic tools like CGH microarrays, driving software demand for analysis and interpretation. Furthermore, the increasing adoption of personalized medicine, coupled with the falling cost of sequencing and microarray technologies, makes CGH microarray analysis more accessible and cost-effective. Research organizations are increasingly utilizing these tools for genetic research, contributing significantly to market expansion. The market is segmented by software type (web-based and cloud-based) and application (hospitals and health systems, research organizations, and others). Web-based solutions are gaining traction due to their accessibility and ease of use, while cloud-based solutions offer scalability and data storage advantages. Hospitals and health systems represent a significant market segment, driven by the need for accurate diagnosis and treatment planning. The competitive landscape includes established players like Agilent Technologies and QIAGEN Digital Insights, alongside emerging companies such as Fabric Genomics and Congenica, fostering innovation and competition. The market's geographical distribution shows strong presence across North America, Europe, and Asia Pacific. North America currently holds a significant market share due to the advanced healthcare infrastructure and high adoption rates of genomic technologies. 
However, the Asia Pacific region is expected to witness the fastest growth, propelled by rising healthcare expenditure and increasing awareness of genetic diseases. Factors like stringent regulatory approvals and the high cost of software licensing could pose challenges, potentially restraining market growth in certain regions. Nevertheless, the overall market outlook remains positive, with significant opportunities for growth driven by technological advancements and increasing demand for precise genetic analysis across diverse applications. The continuous development of user-friendly interfaces and integration with other genomic analysis tools will further propel the market's expansion in the coming years.
Author:
Source: Unknown - Date unknown
Please cite:
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality metrics (e.g. accuracy, precision, area under ROC curve, etc.) for classification, feature selection or clustering algorithms.
This repository was inspired by an increasing need in the machine learning / bioinformatics communities for a collection of microarray classification problems that could be used by different researchers. This way, many different classification or feature selection techniques can finally be compared to each other on the same set of problems.
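The quality metrics mentioned above (accuracy, precision, area under the ROC curve) can be computed from scratch for a single binary task. The labels and scores below are synthetic placeholders, not GEMLeR data.

```python
# Benchmark metrics for one binary classification task.

def accuracy(y, pred):
    return sum(a == b for a, b in zip(y, pred)) / len(y)

def precision(y, pred):
    tp = sum(p == 1 and t == 1 for t, p in zip(y, pred))
    fp = sum(p == 1 and t == 0 for t, p in zip(y, pred))
    return tp / (tp + fp) if tp + fp else 0.0

def auc(y, scores):
    """Probability that a random positive scores above a random negative
    (ties count half): the Mann-Whitney formulation of ROC AUC."""
    pos = [s for t, s in zip(y, scores) if t == 1]
    neg = [s for t, s in zip(y, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y      = [1, 1, 0, 0, 1, 0]            # true tumor-type labels (toy)
scores = [0.9, 0.7, 0.6, 0.2, 0.8, 0.4]  # classifier scores (toy)
pred   = [int(s >= 0.5) for s in scores]
```

Computing several such metrics per dataset, rather than accuracy alone, is what allows the repository's stated goal of fair comparison between classification and feature selection techniques.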
Origin of data
Each gene expression sample in the GEMLeR repository comes from the large, publicly available expO (Expression Project For Oncology) repository by the International Genomics Consortium.
The goal of expO and its consortium supporters is to procure tissue samples under standard conditions and perform gene expression analyses on a clinically annotated set of deidentified tumor samples. The tumor data is updated with clinical outcomes and is released into the public domain without intellectual property restriction. The availability of this information translates into direct benefits for patients, researchers and pharma alike.
Source: expO website
Although there are various other sources of gene expression data available, the decision to use data from the expO repository was made because of:
- consistency of the tissue sample processing procedure
- the same microarray platform used for all samples
- availability of additional information for combined genotype-phenotype studies
- availability of a large number of samples for different tumor types
If you publish material based on GEMLeR datasets, please acknowledge the assistance you received by using this repository. This will help others obtain the same datasets and replicate your experiments. Please cite as follows when referring to this repository:
Stiglic, G., & Kokol, P. (2010). Stability of Ranked Gene Lists in Large Microarray Analysis Studies. Journal of Biomedicine and Biotechnology, 2010, 616358.
You are also welcome to acknowledge the contribution of expO (Expression Project For Oncology) and International Genomics Consortium for providing their gene expression samples to the public.
PLINK 1.9, R, vegan package in R
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
1. Despite widespread recognition of its great promise to aid decision-making in environmental management, the applied use of metabarcoding requires improvements to reduce the multiple errors that arise during PCR amplification, sequencing, and library generation. We present a co-designed wet-lab and bioinformatic workflow for metabarcoding bulk samples that removes both false-positive (tag jumps, chimeras, erroneous sequences) and false-negative ('dropout') errors. However, we find that it is not possible to recover relative-abundance information from amplicon data, due to persistent species-specific biases.
2. To present and validate our workflow, we created eight mock arthropod soups, all containing the same 248 arthropod morphospecies but differing in absolute and relative DNA concentrations, and we ran them under five different PCR conditions. Our pipeline includes qPCR-optimized PCR annealing temperature and cycle number, twin-tagging, multiple independent PCR replicates per sample, and negative and positive controls. In the bioinformatic portion, we introduce Begum, which is a new version of DAMe (Zepeda-Mendoza et al. 2016. BMC Res. Notes 9:255) that ignores heterogeneity spacers, allows primer mismatches when demultiplexing samples, and is more efficient. Like DAMe, Begum removes tag-jumped reads and removes sequence errors by keeping only sequences that appear in more than one PCR above a minimum copy number per PCR. The filtering thresholds are user-configurable.
3. We report that OTU dropout frequency and taxonomic amplification bias are both reduced by using a PCR annealing temperature and cycle number on the low ends of the ranges currently used for the Leray-FolDegenRev primers. We also report that tag jumps and erroneous sequences can be nearly eliminated with Begum filtering, at the cost of only a small rise in dropouts. We replicate published findings that uneven size distribution of input biomasses leads to greater dropout frequency and that OTU size is a poor predictor of species input biomass. Finally, we find no evidence for 'tag-biased' PCR amplification.
4. To aid learning, reproducibility, and the design and testing of alternative metabarcoding pipelines, we provide our Illumina and input-species sequence datasets, scripts, a spreadsheet for designing primer tags, and a tutorial.
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality metrics (e.g. accuracy, precision, area under ROC curve, etc.) for classification, feature selection or clustering algorithms.
This repository was inspired by a growing need in the machine learning and bioinformatics communities for a collection of microarray classification problems that could be shared by different researchers. With such a collection, many different classification or feature selection techniques can finally be compared to each other on the same set of problems.
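As a sketch of the kind of benchmarking GEMLeR supports, the two simplest quality metrics mentioned above can be computed directly. The labels and predictions below are invented for illustration; a real comparison would use classifier output on the actual expO-derived samples:

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive="tumor"):
    """Of the samples predicted positive, the fraction that truly are."""
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    return tp / predicted_pos if predicted_pos else 0.0

# Hypothetical predictions for six tissue samples.
y_true = ["tumor", "tumor", "normal", "tumor", "normal", "normal"]
y_pred = ["tumor", "normal", "normal", "tumor", "tumor", "normal"]
print(round(accuracy(y_true, y_pred), 3),
      round(precision(y_true, y_pred), 3))  # 0.667 0.667
```

Area under the ROC curve and clustering indices follow the same pattern: a fixed benchmark set of problems, one score per algorithm per dataset.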
Origin of data
Each gene expression sample in GEMLeR repository comes from a large publicly available expO (Expression Project For Oncology) repository by International Genomics Consortium.
The goal of expO and its consortium supporters is to procure tissue samples under standard conditions and perform gene expression analyses on a clinically annotated set of deidentified tumor samples. The tumor data is updated with clinical outcomes and is released into the public domain without intellectual property restriction. The availability of this information translates into direct benefits for patients, researchers and pharma alike.
Source: expO website
Although various other sources of gene expression data are available, data from the expO repository was chosen because of:
- consistency of the tissue-sample processing procedure
- the same microarray platform used for all samples
- availability of additional information for combined genotype-phenotype studies
- availability of a large number of samples for different tumor types
If you publish material based on GEMLeR datasets, please acknowledge the assistance you received by using this repository; this will help others obtain the same datasets and replicate your experiments. Please cite as follows when referring to this repository:
Stiglic, G., & Kokol, P. (2010). Stability of Ranked Gene Lists in Large Microarray Analysis Studies. Journal of Biomedicine and Biotechnology, 2010, 616358.
You are also welcome to acknowledge the contribution of expO (Expression Project For Oncology) and International Genomics Consortium for providing their gene expression samples to the public.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The identification of peptide sequences and their post-translational modifications (PTMs) is a crucial step in the analysis of bottom-up proteomics data. The recent development of open modification search (OMS) engines allows virtually all PTMs to be searched for. This not only increases the number of spectra that can be matched to peptides but also greatly advances the understanding of the biological roles of PTMs through the identification, and thereby facilitated quantification, of peptidoforms (peptide sequences and their potential PTMs). While the benefits of combining results from multiple protein database search engines have been established previously, similar approaches for OMS results have so far been missing. Here, we compare and combine results from three different OMS engines, demonstrating an increase in peptide spectrum matches of 8-18%. The unification of search results furthermore allows for the combined downstream processing of search results, including the mapping to potential PTMs. Finally, we test the ability of OMS engines to identify glycosylated peptides. The implementation of these engines in the Python framework Ursgal facilitates the straightforward application of OMS with unified parameters and results files, thereby enabling previously unmatched high-throughput, large-scale data analysis.
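The gain from unifying engine results can be illustrated with a toy union of peptide spectrum matches (PSMs). The engine names, spectrum IDs, and peptidoforms below are invented; a real pipeline such as Ursgal additionally recalibrates scores and controls the false discovery rate of the combined set:

```python
# PSMs as (spectrum_id, peptidoform) pairs reported by each engine.
engine_a = {("s1", "PEPTIDE"), ("s2", "PEPT[+80]IDE")}
engine_b = {("s1", "PEPTIDE"), ("s3", "SEQ[+16]UENCE")}
engine_c = {("s2", "PEPT[+80]IDE"), ("s4", "PROTEIN")}

# Union of all engines vs. the best single engine.
combined = engine_a | engine_b | engine_c
best_single = max(engine_a, engine_b, engine_c, key=len)
gain = (len(combined) - len(best_single)) / len(best_single)
print(len(combined), f"{gain:.0%}")  # 4 100%
```

On real data the overlap between engines is much larger, which is why the observed gain is 8-18% rather than the inflated figure in this toy example.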
This dataset includes all relevant results files, databases, and scripts that correspond to the accompanying journal article. Specifically, the following files are deposited:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Skin cutaneous melanoma (SKCM) is a common malignant skin cancer with high mortality and recurrence rates. Although mRNA vaccines are a promising strategy for cancer treatment, their application against SKCM remains unclear. In this study, we employed computational bioinformatics analysis to identify SKCM-associated antigens for an mRNA vaccine and suitable populations for vaccination.
Methods: Gene expression and clinical data were retrieved from GEO and TCGA. The differential expression levels and prognostic indices of the selected antigens were computed via GEPIA2, while genetic alterations were analyzed using cBioPortal. TIMER was used to assess the correlation between antigen expression and antigen-presenting cell infiltration. Consensus clustering identified immune subtypes, and immune characteristics were evaluated across subtypes. Weighted gene co-expression network analysis was performed to identify modules of immune-related genes.
Results: We identified five tumor antigens (P2RY6, PLA2G2D, RBM47, SEL1L3, and SPIB) that are significantly overexpressed and frequently mutated, and whose expression correlates with patient survival and with the infiltration of antigen-presenting cells. Our analysis revealed two distinct immune subtypes among the SKCM samples. Immune subtype 1 was associated with poorer clinical outcomes and low immune activity, characterized by fewer mutations and lower immune cell infiltration; in contrast, immune subtype 2 showed higher immune activity and better patient outcomes. The immune landscape of SKCM further revealed substantial heterogeneity among patients, and a key gene module enriched in immune-related pathways was identified.
Conclusions: Our findings suggest that the identified tumor antigens could serve as valuable targets for developing mRNA vaccines against SKCM, particularly for patients in immune subtype 1.
This research provides valuable insights into personalized immunotherapy approaches for this challenging cancer and highlights the utility of bioinformatics in identifying immune targets and optimizing treatment strategies.
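The consensus-clustering step that yields stable immune subtypes can be sketched as follows. The toy one-dimensional "immune score" and threshold clusterer below stand in for real expression profiles and k-means; everything here is illustrative, not the study's actual pipeline:

```python
import random

def consensus_matrix(samples, cluster_fn, n_iter=200, frac=0.8):
    """Fraction of subsampled clustering runs in which each pair of
    samples lands in the same cluster; values near 1 or 0 for most
    pairs indicate stable subtypes."""
    n = len(samples)
    together = [[0] * n for _ in range(n)]
    counted = [[0] * n for _ in range(n)]
    for _ in range(n_iter):
        idx = random.sample(range(n), int(frac * n))  # subsample
        labels = cluster_fn([samples[i] for i in idx])
        for a_pos, a in enumerate(idx):
            for b_pos, b in enumerate(idx):
                counted[a][b] += 1
                if labels[a_pos] == labels[b_pos]:
                    together[a][b] += 1
    return [[together[a][b] / counted[a][b] if counted[a][b] else 0.0
             for b in range(n)] for a in range(n)]

random.seed(0)  # reproducible subsampling
scores = [0.1, 0.2, 0.15, 0.9, 0.85, 0.95]          # toy immune scores
cm = consensus_matrix(scores, lambda xs: [int(x > 0.5) for x in xs])
print(cm[0][1], cm[0][3])  # 1.0 0.0
```

Samples 0-2 always co-cluster and never join samples 3-5, mirroring how two robust immune subtypes emerge from repeated subsampled clusterings of SKCM expression data.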