MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Human osteoarthritis (OA) microarray gene expression dataset (Affymetrix platform GPL570).
Contains expression data for 7 healthy control (normal) tissue samples and 7 osteoarthritis patient tissue samples from synovial/joint tissue.
Pre-processed with background correction, log transformation, and normalization to reduce technical variation.
Suitable for downstream analyses: differential gene expression (normal vs OA), subtype- or phenotype-based classification, and machine learning.
Can serve as a validation dataset, or be combined with other GEO datasets to increase sample size or test reproducibility.
Useful for biomarker discovery, pathway enrichment analysis (e.g., GO, KEGG), immune infiltration analysis, and subtype analysis in osteoarthritis research.
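As a rough illustration of the normal-vs-OA comparison described above, the sketch below computes a log fold change and a Welch t statistic per gene from log2-scale values. The gene names and expression values are invented for the example and are not taken from the dataset; a real analysis would more likely rely on a dedicated differential expression package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(normal, oa):
    """Return (logFC, Welch t statistic) for one gene.

    Both inputs are lists of log2 expression values, so the difference
    of group means is the log2 fold change (OA minus normal).
    """
    log_fc = mean(oa) - mean(normal)
    # Welch standard error: group variances are not assumed to be equal.
    se = sqrt(variance(normal) / len(normal) + variance(oa) / len(oa))
    return log_fc, log_fc / se


# Hypothetical log2 expression values: 7 normal and 7 OA samples per gene.
expression = {
    "GENE_A": ([7.1, 7.0, 7.2, 6.9, 7.1, 7.0, 7.2],
               [9.0, 9.2, 8.9, 9.1, 9.3, 9.0, 9.1]),
    "GENE_B": ([5.0, 5.2, 4.9, 5.1, 5.0, 5.1, 4.9],
               [5.1, 5.0, 5.0, 4.9, 5.2, 5.0, 5.0]),
}

results = {gene: welch_t(normal, oa) for gene, (normal, oa) in expression.items()}

for gene, (log_fc, t_stat) in results.items():
    print(f"{gene}: logFC={log_fc:.3f}, t={t_stat:.2f}")
```

With only 7 samples per group, the t statistic alone is a weak filter; moderated statistics and multiple-testing correction are usually applied on top of it.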
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is based on National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) DataSet accession GDS2778.
The dataset originates from a microarray experiment measuring global gene expression under specific experimental conditions.
Raw and processed expression data (for all probes/genes) are included, enabling downstream analysis such as normalization, differential expression, and clustering.
The dataset has been used to perform differential gene expression (DGE) analysis to identify genes that are up- or down-regulated under the experimental condition compared to control.
Data processing steps typically include normalization (e.g., log transformation), quality control, probe-to-gene mapping, and statistical testing for significance (e.g., using limma or other DGE tools).
Resulting differentially expressed genes (DEGs) include statistics such as log fold change (logFC), adjusted p‑values (adj.P.Val), and possibly other metrics (e.g., B-statistic), allowing assessment of both magnitude and significance of changes.
The dataset also includes a visualization file (heatmap image) that displays expression patterns of DEGs (or top variable genes) across samples, enabling clustering and pattern recognition across samples and genes.
The heatmap illustrates sample-wise and gene-wise expression variation: clustering groups together samples (e.g. control vs treatment) and genes with similar expression dynamics.
This dataset is suitable for further bioinformatics analysis: e.g. functional enrichment (GO/Pathway), co‑expression analysis, gene signature identification, or integration with other datasets.
Users who download this dataset can reproduce or extend analyses, such as re-normalization, alternative clustering, custom DEG thresholds, or downstream biological interpretation (pathway, network analysis).
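To make the idea of custom DEG thresholds concrete, here is a minimal sketch that applies a Benjamini-Hochberg adjustment to raw p-values and then filters on |logFC| and adjusted p-value. The gene names, values, and cutoffs are hypothetical, and the adjustment method actually used to produce adj.P.Val in this dataset is an assumption here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical DEG table rows: (gene, logFC, raw p-value).
rows = [
    ("GENE_A", 2.1, 0.001),
    ("GENE_B", -1.4, 0.02),
    ("GENE_C", 0.3, 0.03),
    ("GENE_D", 1.8, 0.5),
]

adj = benjamini_hochberg([p for _, _, p in rows])

# Illustrative cutoffs; choose thresholds appropriate to the study.
LOGFC_CUTOFF = 1.0
ADJ_P_CUTOFF = 0.05

selected = [
    gene
    for (gene, log_fc, _), adj_p in zip(rows, adj)
    if abs(log_fc) >= LOGFC_CUTOFF and adj_p < ADJ_P_CUTOFF
]
print(selected)
```

Filtering on both magnitude and significance keeps genes such as the hypothetical GENE_C (significant but small change) and GENE_D (large but non-significant change) out of the final list.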
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Quality control, global biases, normalization, and analysis methods for RNA-Seq data are quite different from those for microarray-based studies. The assumption of normality is reasonable for microarray-based gene expression data; however, RNA-Seq data tend to follow an over-dispersed Poisson or negative binomial distribution. Little research has been done to assess how data transformations affect Gaussian model-based clustering with respect to clustering performance and accuracy in estimating the correct number of clusters in RNA-Seq data. In this article, we investigate both by applying four data transformations (i.e., naïve, logarithmic, Blom, and variance stabilizing transformation) to simulated RNA-Seq data. To do so, an extensive simulation study was carried out in which the scenarios varied in how genes were selected for inclusion in the clustering analyses, the size of the clusters, and the number of clusters. After each transformation was applied to the simulated data, Gaussian model-based clustering was carried out. Clustering performance for each transformation was assessed using the adjusted Rand index, clustering error rate, and concordance index. As expected, our results showed that clustering performance improved in scenarios where data transformations were applied to make the data appear “more” Gaussian in distribution.
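Two of the transformations named above can be sketched briefly. The code below applies a logarithmic transformation and a Blom transformation to a handful of made-up read counts. The Blom formula used, inverse normal CDF of (rank - 3/8) / (n + 1/4), is the standard definition; the pseudocount of 1 in the log transform and the average-rank handling of ties are assumptions, not details taken from the article.

```python
from math import log2
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def blom_transform(values):
    """Map values to normal scores via inv_cdf((rank - 3/8) / (n + 1/4))."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in average_ranks(values)]


# Hypothetical read counts for one gene across five samples.
counts = [0, 3, 3, 120, 15]

log_counts = [log2(c + 1) for c in counts]  # pseudocount of 1 is an assumption
blom_scores = blom_transform(counts)

print([round(x, 3) for x in log_counts])
print([round(x, 3) for x in blom_scores])
```

The Blom scores depend only on ranks, so the extreme count of 120 is pulled in toward the rest of the data, whereas the log transform preserves more of the original spacing.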
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is based on GEO series GSE5583.
The experiment compares gene expression profiles between wild-type mouse embryonic stem cells (ES cells) and ES cells in which Histone deacetylase 1 (HDAC1) has been knocked out.
The organism used is mouse (Mus musculus).
Microarray technology was employed to measure transcript abundance across the genome, aiming to identify putative HDAC1 target genes.
The dataset includes processed expression data (after normalization and log2 transformation), allowing for downstream exploratory data analysis (EDA) and differential gene expression (DGE) analysis.
As part of EDA, sample-wise distribution plots (e.g. boxplots) are provided to assess normalization across all arrays and to evaluate the consistency and quality of the processed data.
Other downstream visualizations and analysis results are also included.
Researchers can use this dataset to perform differential expression analysis between HDAC1 knockout vs wild‑type ES cells, investigate epigenetic regulation, or explore downstream effects of histone deacetylation loss.
Additionally, the dataset can serve as a reference example for microarray data preprocessing, normalization, transformation (e.g. log2), and exploratory visualization workflows.
The dataset is publicly available and sourced from a trusted repository (GEO), ensuring transparency and reproducibility of the experiment.
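The log2 transformation and boxplot-based check described above can be sketched as follows. The code computes the five-number summary that a boxplot would display for each array; the sample names and intensities are hypothetical and not taken from GSE5583, and the quartile method is one of several reasonable choices.

```python
from math import log2
from statistics import quantiles


def boxplot_summary(intensities):
    """Return (min, Q1, median, Q3, max) of log2-transformed intensities.

    Intensities must be strictly positive for log2 to be defined.
    """
    values = sorted(log2(x) for x in intensities)
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (values[0], q1, median, q3, values[-1])


# Hypothetical raw intensities for one wild-type and one knockout array.
arrays = {
    "WT_1": [16, 32, 64, 128, 256],
    "KO_1": [8, 32, 64, 128, 512],
}

summaries = {name: boxplot_summary(values) for name, values in arrays.items()}

for name, summary in summaries.items():
    print(name, summary)
```

If normalization worked well, the medians and interquartile ranges should line up closely across arrays, as they do for these two made-up samples.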
TT3 and TN are cell cycle time courses and should be centered or transformed before clustering or computing Fourier scores. The preferred method is SVD centering preceded by KNNImpute missing-value estimation, although mean-centering is a viable alternative.
Hela Heat 42 degrees time course has 2 zeroes (the second is labeled "Cold 25 degrees 0h"). Normalization is by centering or zero transformation (averaging the 2 zeroes).
WI38 Crowding (1-3 days) was by plating cells at ~50%? confluence and allowing them to age. The zeroes are the same as for WI38 Heat Shock (there are 2).
Other time courses should be self-explanatory. They have 1-4 zeroes each and can be treated either by centering or zero-transforming. WI38
Set of arrays organized by shared biological context, such as organism, tumor types, processes, etc.
Keywords: Logical Set Computed
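The two simpler normalization options mentioned above, mean-centering and zero transformation, can be sketched as below. The profile values and the positions of the zero time points are invented for illustration; SVD centering and KNNImpute are not shown.

```python
from statistics import mean


def mean_center(profile):
    """Subtract the gene's mean across the time course from every value."""
    m = mean(profile)
    return [x - m for x in profile]


def zero_transform(profile, zero_indices):
    """Subtract the average of the zero time point measurements."""
    baseline = mean([profile[i] for i in zero_indices])
    return [x - baseline for x in profile]


# Hypothetical log-ratio profile for one gene; the first two arrays
# are assumed to be the two zero time points.
profile = [0.2, 0.4, 1.3, 2.1, 0.5]
zero_indices = [0, 1]

centered = mean_center(profile)
zeroed = zero_transform(profile, zero_indices)

print([round(x, 2) for x in centered])
print([round(x, 2) for x in zeroed])
```

Zero transformation expresses every time point relative to the starting state, while mean-centering expresses it relative to the gene's average over the whole course.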
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Genetic association estimates with circulating plasma ACE2 levels were obtained in a subcohort of 4,998 blood donors enrolled in the INTERVAL study. Plasma ACE2 levels were measured using a multiplex proximity extension immunoassay (Cardiovascular 2 panel, Olink Bioscience, Uppsala, Sweden). A total of 4,947 samples passed quality control. The data were pre-processed using standard Olink workflows including applying median centring normalization across plates, where the median is centred to the overall median for all plates, followed by log2 transformation to provide normalized protein levels (NPX). NPX values were regressed on age, sex, plate, time from blood draw to processing (in days), and season. The residuals were then rank-inverse normalized. Genotype data was processed as described previously. Genome-wide pQTL analysis was performed by linear regression of the rank-inverse normalized residuals on genotype in SNPTEST, with the first three components of multi-dimensional scaling as covariates to adjust for ancestry.
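The plate-wise median centring step can be illustrated with a small sketch. It shifts each plate so that its median equals the overall median. Taking the overall median over all samples pooled is an assumption about what "overall median for all plates" means, and the plate names and values are hypothetical; this is not the Olink workflow itself.

```python
from statistics import median


def median_centre_plates(plates):
    """Shift each plate so its median matches the pooled overall median.

    `plates` maps a plate name to a list of protein measurements.
    """
    pooled = [value for values in plates.values() for value in values]
    overall = median(pooled)
    centred = {}
    for name, values in plates.items():
        shift = overall - median(values)
        centred[name] = [value + shift for value in values]
    return centred


# Hypothetical measurements on two plates with a clear plate offset.
plates = {
    "plate1": [1.0, 2.0, 3.0],
    "plate2": [4.0, 5.0, 6.0, 7.0],
}

centred = median_centre_plates(plates)

for name, values in centred.items():
    print(name, values, "median:", median(values))
```

After centring, both plates share the same median, so remaining differences between samples are less likely to reflect which plate they were run on.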