80 datasets found
  1. d

    Data from: Permutation-validated principal components analysis of microarray...

    • catalog.data.gov
    • healthdata.gov
    • +1more
    Updated Sep 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    National Institutes of Health (2025). Permutation-validated principal components analysis of microarray data [Dataset]. https://catalog.data.gov/dataset/permutation-validated-principal-components-analysis-of-microarray-data
    Explore at:
    Dataset updated
    Sep 7, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background In microarray data analysis, the comparison of gene-expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large datasets. Less work has been published concerning the assessment of the reliability of gene-selection procedures. Here we describe a method to assess reliability in multivariate microarray data analysis using permutation-validated principal components analysis (PCA). The approach is designed for microarray data with a group structure. Results We used PCA to detect the major sources of variance underlying the hybridization conditions followed by gene selection based on PCA-derived and permutation-based test statistics. We validated our method by applying it to well characterized yeast cell-cycle data and to two datasets from our laboratory. We could describe the major sources of variance, select informative genes and visualize the relationship of genes and arrays. We observed differences in the level of the explained variance and the interpretability of the selected genes. Conclusions Combining data visualization and permutation-based gene selection, permutation-validated PCA enables one to illustrate gene-expression variance between several conditions and to select genes by taking into account the relationship of between-group to within-group variance of genes. The method can be used to extract the leading sources of variance from microarray data, to visualize relationships between genes and hybridizations and to select informative genes in a statistically reliable manner. This selection accounts for the level of reproducibility of replicates or group structure as well as gene-specific scatter. Visualization of the data can support a straightforward biological interpretation.

  2. Representative sample of the data file required to input user-specific data...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Laura D. Hughes; Scott A. Lewis; Michael E. Hughes (2023). Representative sample of the data file required to input user-specific data into ExpressionDB. [Dataset]. http://doi.org/10.1371/journal.pone.0187457.t001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Laura D. Hughes; Scott A. Lewis; Michael E. Hughes
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This example includes two tissues with three replicates apiece downloaded from GTEx. Complete.csv file here: https://github.com/5c077/ExpressionDB/tree/master/data.

  3. Versions of all software packages used in developing ExpressionDB.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Laura D. Hughes; Scott A. Lewis; Michael E. Hughes (2023). Versions of all software packages used in developing ExpressionDB. [Dataset]. http://doi.org/10.1371/journal.pone.0187457.t003
    Explore at:
    xlsAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Laura D. Hughes; Scott A. Lewis; Michael E. Hughes
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Package versions can also be found online: https://github.com/5c077/ExpressionDB/tree/master/data.

  4. n

    Onto-Design

    • neuinfo.org
    • scicrunch.org
    • +3more
    Updated Aug 19, 2011
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2011). Onto-Design [Dataset]. http://identifiers.org/RRID:SCR_000601
    Explore at:
    Dataset updated
    Aug 19, 2011
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 6,2023. Many Laboratories chose to design and print their own microarrays. At present, the choice of the genes to include on a certain microarray is a very laborious process requiring a high level of expertise. Onto-Design database is able to assist the designers of custom microarrays by providing the means to select genes based on their experiment. Design custom microarrays based on GO terms of interest. User account required. Platform: Online tool

  5. Representative sample of the annotation file required to input user-specific...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Laura D. Hughes; Scott A. Lewis; Michael E. Hughes (2023). Representative sample of the annotation file required to input user-specific data into ExpressionDB. [Dataset]. http://doi.org/10.1371/journal.pone.0187457.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Laura D. Hughes; Scott A. Lewis; Michael E. Hughes
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This example comprises human annotations downloaded from Entrez Gene. Complete.csv files in appropriate format for many common organisms studied can be downloaded here:https://github.com/5c077/ExpressionDB/tree/master/data.

  6. d

    Microarray DB

    • dknet.org
    • rrid.site
    • +2more
    Updated Aug 31, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Microarray DB [Dataset]. http://identifiers.org/RRID:SCR_008525
    Explore at:
    Dataset updated
    Aug 31, 2025
    Description

    A tool for mapping transcriptome data and for creating a database with an overview of the entire pathway, a web-based resource consisting of a web-application for the visualization of complex omics data onto KEGG pathways to overview all entities in the context of cellular pathways, and databases created with the software to visualize a series of microarray data. The web-application accepts transcriptome, proteome, metabolome, or the combination of these data as input, and because of this scalability it is advantageous for the visualization of cell simulation results. Several databases of transcriptome data obtained at Mori Laboratory, Nara Institute of Science and Technology, Japan, are also presented.

  7. Data Preprocessing EDA Microarray GE Data GSE5583

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dr. Nagendra (2025). Data Preprocessing EDA Microarray GE Data GSE5583 [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/data-preprocessing-eda-microarray-ge-data-gse5583
    Explore at:
    zip(3144708 bytes)Available download formats
    Dataset updated
    Nov 29, 2025
    Authors
    Dr. Nagendra
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset is based on GEO series GSE5583. OmicsDI

    The experiment compares gene expression profiles between wild‑type mouse embryonic stem cells (ES cells) and ES cells in which Histone deacetylase 1 (HDAC1) has been knocked out. OmicsDI

    The organism used is mouse (Mus musculus). OmicsDI

    Microarray technology was employed to measure transcript abundance across the genome, aiming to identify putative HDAC1 target genes. OmicsDI +1

    The dataset includes processed expression data (after normalization and log2 transformation), allowing for downstream exploratory data analysis (EDA) and differential gene expression (DGE) analysis.

    As part of EDA, sample‑wise distribution plots (e.g. boxplots) are provided to assess normalization across all arrays.

    The dataset also includes downstream visualizations and analysis results, such as boxplots, which help in evaluating the consistency and quality of the processed data.

    Researchers can use this dataset to perform differential expression analysis between HDAC1 knockout vs wild‑type ES cells, investigate epigenetic regulation, or explore downstream effects of histone deacetylation loss.

    Additionally, the dataset can serve as a reference example for microarray data preprocessing, normalization, transformation (e.g. log2), and exploratory visualization workflows.

    The dataset is publicly available and sourced from a trusted repository (GEO), ensuring transparency and reproducibility of the experiment.

  8. DGE GO Enrichment Analysis Microarray Data GDS2778

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dr. Nagendra (2025). DGE GO Enrichment Analysis Microarray Data GDS2778 [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/dge-go-enrichment-analysis-microarray-data-gds2778
    Explore at:
    zip(6820264 bytes)Available download formats
    Dataset updated
    Nov 29, 2025
    Authors
    Dr. Nagendra
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    his dataset is based on National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) DataSet accession GDS2778. girke.bioinformatics.ucr.edu +1

    The dataset originates from a microarray experiment measuring global gene expression under specific experimental conditions. girke.bioinformatics.ucr.edu +1

    Raw and processed expression data (for all probes/genes) are included, enabling downstream analysis such as normalization, differential expression, and clustering.

    The dataset has been used to perform differential gene expression (DGE) analysis to identify genes that are up- or down-regulated under the experimental condition compared to control.

    Data processing steps typically include normalization (e.g., log-transformation), quality control, probe-to-gene mapping, and statistical testing for significance (e.g., using packages such as limma or other DGE tools). mahsa-ehsanifard.github.io +1

    Resulting differentially expressed genes (DEGs) include statistics such as log fold change (logFC), adjusted p‑values (adj.P.Val), and possibly other metrics (e.g., B-statistic), allowing assessment of both magnitude and significance of changes.

    The dataset also includes a visualization file (heatmap image) that displays expression patterns of DEGs (or top variable genes) across samples — enabling clustering and pattern recognition across samples and genes.

    The heatmap helps illustrate sample-wise and gene-wise expression variation: clustering groups together samples (e.g. control vs treatment) and genes with similar expression dynamics. NCBI +1

    This dataset is suitable for further bioinformatics analysis: e.g. functional enrichment (GO/Pathway), co‑expression analysis, gene signature identification, or integration with other datasets.

    Users who download this dataset can reproduce or extend analyses, such as re-normalization, alternative clustering, custom DEG thresholds, or downstream biological interpretation (pathway, network analysis).

  9. r

    Onto-Compare

    • rrid.site
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Onto-Compare [Dataset]. http://identifiers.org/RRID:SCR_005669
    Explore at:
    Description

    Microarrays are at the center of a revolution in biotechnology, allowing researchers to screen tens of thousands of genes simultaneously. Typically, they have been used in exploratory research to help formulate hypotheses. In most cases, this phase is followed by a more focused, hypothesis driven stage in which certain specific biological processes and pathways are thought to be involved. Since a single biological process can still involve hundreds of genes, microarrays are still the preferred approach as proven by the availability of focused arrays from several manufacturers. Since focused arrays from different manufacturers use different sets of genes, each array will represent any given regulatory pathway to a different extent. We argue that a functional analysis of the arrays available should be the most important criterion used in the array selection. We developed Onto-Compare as a database that can provide this functionality, based on the GO nomenclature. Compare commercially available microarrays based on GO. User account required. Platform: Online tool

  10. n

    Tissue Microarray Database

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Jan 29, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2022). Tissue Microarray Database [Dataset]. http://identifiers.org/RRID:SCR_005527
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE. Documented on May 2nd,2023. TMAD stores raw and processed data from Tissue Microarray experiments along with their corresponding stained tissue images. In addition, TMAD provides methods for data retrieval, grouping of data, analysis and visualization as well as export to standard formats. Researchers at the Stanford University School of Medicine and their collaborators worldwide have constructed many tissue microarrays for use in basic research.

  11. r

    UNC Microarray Database

    • rrid.site
    Updated Jan 29, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2022). UNC Microarray Database [Dataset]. http://identifiers.org/RRID:SCR_010979
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Database for microarray data storage, retrieval, analysis, and visualization.

  12. File S1 - Visualization-Aided Classification Ensembles Discriminate Lung...

    • plos.figshare.com
    docx
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ao Zhang; Chi Wang; Shiji Wang; Liang Li; Zhongmin Liu; Suyan Tian (2023). File S1 - Visualization-Aided Classification Ensembles Discriminate Lung Adenocarcinoma and Squamous Cell Carcinoma Samples Using Their Gene Expression Profiles [Dataset]. http://doi.org/10.1371/journal.pone.0110052.s001
    Explore at:
    docxAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Ao Zhang; Chi Wang; Shiji Wang; Liang Li; Zhongmin Liu; Suyan Tian
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary Materials. (DOCX)

  13. e

    Microarray data for rejuvenation experiments through reprogramming

    • ebi.ac.uk
    Updated Dec 13, 2011
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2011). Microarray data for rejuvenation experiments through reprogramming [Dataset]. https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-27924
    Explore at:
    Dataset updated
    Dec 13, 2011
    Description

    Proliferative and replicative senescent fibroblasts from aged human donors were reprogrammed towards pluripotency and re-differentiated in fibroblasts and then further analyzed for rejuvenation assessment. Comparison of microarrays were performed by non hierarchical clustering visualized in with Treeview software

  14. r

    Bio Resource for Array Genes Database

    • rrid.site
    • scicrunch.org
    • +2more
    Updated Oct 26, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Bio Resource for Array Genes Database [Dataset]. http://identifiers.org/RRID:SCR_000748
    Explore at:
    Dataset updated
    Oct 26, 2025
    Description

    Bio Resource for array genes is a free online resource for easy access to collective and integrated information from various public biological resources for human, mouse, rat, fly and c. elegans genes. The resource includes information about the genes that are represented in Unigene clusters. This resource provides interactive tools to selectively view, analyze and interpret gene expression patterns against the background of gene and protein functional information. Different query options are provided to mine the biological relationships represented in the underlying database. Search button will take you to the list of query tools available. This Bio resource is a platform designed as an online resource to assist researchers in analyzing results of microarray experiments and developing a biological interpretation of the results. This site is mainly to interpret the unique gene expression patterns found as biological changes that can lead to new diagnostic procedures and drug targets. This interactive site allows users to selectively view a variety of information about gene functions that is stored in an underlying database. Although there are other online resources that provide a comprehensive annotation and summary of genes, this resource differs from these by further enabling researchers to mine biological relationships amongst the genes captured in the database using new query tools. Thus providing a unique way of interpreting the microarray data results based on the knowledge provided for the cellular roles of genes and proteins. A total of six different query tools are provided and each offer different search features, analysis options and different forms of display and visualization of data. The data is collected in relational database from public resources: Unigene, Locus link, OMIM, NCBI dbEST, protein domains from NCBI CDD, Gene Ontology, Pathways (Kegg, Genmapp and Biocarta) and BIND (Protein interactions). Data is dynamically collected and compiled twice a week from public databases. Search options offer capability to organize and cluster genes based on their Interactions in biological pathways, their association with Gene Ontology terms, Tissue/organ specific expression or any other user-chosen functional grouping of genes. A color coding scheme is used to highlight differential gene expression patterns against a background of gene functional information. Concept hierarchies (Anatomy and Diseases) of MESH (Medical Subject Heading) terms are used to organize and display the data related to Tissue specific expression and Diseases. Sponsors: BioRag database is maintained by the Bioinformatics group at Arizona Cancer Center. The material presented here is compiled from different public databases. BioRag is hosted by the Biotechnology Computing Facility of the University of Arizona. 2002,2003 University of Arizona.

  15. Gene Expression in Brain Cancer

    • kaggle.com
    zip
    Updated Aug 21, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Khushi Mittal (2022). Gene Expression in Brain Cancer [Dataset]. https://www.kaggle.com/datasets/khushim10/gene-expression-in-brain-cancer/suggestions
    Explore at:
    zip(17519115 bytes)Available download formats
    Dataset updated
    Aug 21, 2022
    Authors
    Khushi Mittal
    Description

    The datasets presented here are the description of the dataset Brain cancer gene expression - CuMiD. They include the total count, mean, standard deviation, minimum, maximum, percentiles(25%,50%,70%), and maximum, for Normal samples and cancer types: Ependymoma, Glioblastoma, Medulloblastoma, and Pilocytic Astrocytoma. These datasets could be helpful with the visualization of the large dataset by CuMiD

  16. d

    Public Expression Profiling Resource

    • dknet.org
    • scicrunch.org
    • +2more
    Updated Jan 29, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2022). Public Expression Profiling Resource [Dataset]. http://identifiers.org/RRID:SCR_007274
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    An experiment in web-database access to large multi-dimensional data sets using a standardized experimental platform to determine if the larger scientific community can be given simple, intuitive, and user-friendly web-based access to large microarray data sets. All data in PEPR is also available via NCBI GEO. The structure and goals of PEPR differ from other mRNA expression profiling databases in a number of important ways. * The experimental platform in PEPR is standardized, and is an Affymetrix - only database. All microarrays available in the PEPR web database should ascribe to quality control and standard operating procedures. A recent publication has described the QC/SOP criteria utilized in PEPR profiles ( The Tumor Analysis Best Practices Working Group 2004 ). * PEPR permits gene-based queries of large Affymetrix array data sets without any specialized software. For example, a number of large time series projects are available within PEPR, containing 40-60 microarrays, yet these can be simply queried via a dynamic web interface with no prior knowledge of microarray data analysis. * Projects in PEPR originate from scientists world-wide, but all data has been generated by the Research Center for Genetic Medicine, Children''''s National Medical Center, Washington DC. Future developments of PEPR will allow remote entry of Affymetrix data ascribing to the same QC/SOP protocols. They have previously described an initial implementation of PEPR, and a dynamic web-queried time series graphical interface ( Chen et al. 2004 ). A publication showing the utility of PEPR for pharmacodynamic data has recently been published ( Almon et al. 2003 ).

  17. Galaxy Training Data for "End-to-End Tissue Microarray Image Analysis with...

    • zenodo.org
    csv, tiff
    Updated Jul 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Allison L Creason; Allison L Creason; Cameron Watson; Cameron Watson (2024). Galaxy Training Data for "End-to-End Tissue Microarray Image Analysis with Galaxy-ME" [Dataset]. http://doi.org/10.5281/zenodo.7622545
    Explore at:
    tiff, csvAvailable download formats
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Allison L Creason; Allison L Creason; Cameron Watson; Cameron Watson
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides the inputs used in the Galaxy Training Network (GTN) training 'End-to-End Tissue Microarray Image Analysis with Galaxy-ME'. The tutorial demonstrates how to use the Galaxy-ME tool suite for primary image processing, data analysis, and interactive visualization of multiple tissue imaging datasets. Original data was published by Schapiro et al.

  18. s

    Genevestigator

    • scicrunch.org
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Genevestigator [Dataset]. http://identifiers.org/RRID:SCR_002358
    Explore at:
    Description

    A high performance search engine for gene expression that integrates thousands of manually curated public microarray and RNAseq experiments and nicely visualizes gene expression across different biological contexts (diseases, drugs, tissues, cancers, genotypes, etc.). There are two basic analysis approaches: # for a gene of interest, identify which conditions affect its expression. # for condition(s) of interest, identify which genes are specifically expressed in this/these conditions. Genevestigator builds on the deep integration of data, both at the level of data normalization and on the level of sample annotations. This deep integration allows scientists to ask new types of questions that cannot be addressed using conventional tools.

  19. GEO ExpressionMatrixHandlingNormalization GSE32138

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dr. Nagendra (2025). GEO ExpressionMatrixHandlingNormalization GSE32138 [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/geo-expressionmatrixhandlingnormalization-gse32138
    Explore at:
    zip(8536153 bytes)Available download formats
    Dataset updated
    Nov 29, 2025
    Authors
    Dr. Nagendra
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    • This dataset contains expression matrix handling and normalization results derived from GEO dataset GSE32138. • It includes raw gene expression values processed using standardized bioinformatics workflows. • The dataset demonstrates quantile normalization applied to microarray-based expression data. • It provides visualization outputs used to assess data distribution before and after normalization. • The goal of this dataset is to support reproducible analysis of GSE32138 preprocessing and quality control. • Researchers can use the files for practice in normalization, exploratory data analysis, and visualization. • This dataset is useful for learning microarray preprocessing techniques in R or Python.

  20. b

    biological Process Inference from eXperimental Interaction...

    • bioregistry.io
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). biological Process Inference from eXperimental Interaction Evidence/Microarray Experiment Functional Integration Technology [Dataset]. https://bioregistry.io/registry/biopixie
    Explore at:
    Dataset updated
    May 31, 2023
    Description

    bioPIXIE is a novel system for biological data integration and visualization. It allows you to discover interaction networks and pathways in which your gene(s) (e.g. BNI1, YFL039C) of interest participate.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
National Institutes of Health (2025). Permutation-validated principal components analysis of microarray data [Dataset]. https://catalog.data.gov/dataset/permutation-validated-principal-components-analysis-of-microarray-data

Data from: Permutation-validated principal components analysis of microarray data

Related Article
Explore at:
Dataset updated
Sep 7, 2025
Dataset provided by
National Institutes of Health
Description

Background In microarray data analysis, the comparison of gene-expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large datasets. Less work has been published concerning the assessment of the reliability of gene-selection procedures. Here we describe a method to assess reliability in multivariate microarray data analysis using permutation-validated principal components analysis (PCA). The approach is designed for microarray data with a group structure. Results We used PCA to detect the major sources of variance underlying the hybridization conditions followed by gene selection based on PCA-derived and permutation-based test statistics. We validated our method by applying it to well characterized yeast cell-cycle data and to two datasets from our laboratory. We could describe the major sources of variance, select informative genes and visualize the relationship of genes and arrays. We observed differences in the level of the explained variance and the interpretability of the selected genes. Conclusions Combining data visualization and permutation-based gene selection, permutation-validated PCA enables one to illustrate gene-expression variance between several conditions and to select genes by taking into account the relationship of between-group to within-group variance of genes. The method can be used to extract the leading sources of variance from microarray data, to visualize relationships between genes and hybridizations and to select informative genes in a statistically reliable manner. This selection accounts for the level of reproducibility of replicates or group structure as well as gene-specific scatter. Visualization of the data can support a straightforward biological interpretation.

Search
Clear search
Close search
Google apps
Main menu