6 datasets found
  1. GSE206848 Data Normalization and Subtype Analysis

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Dr. Nagendra (2025). GSE206848 Data Normalization and Subtype Analysis [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/gse206848-data-normalization-and-subtype-analysis
    Available download formats: zip (2631363 bytes)
    Dataset updated: Nov 29, 2025
    Authors: Dr. Nagendra
    License: MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Human osteoarthritis (OA) microarray gene expression dataset (Affymetrix platform GPL570).

    Contains expression data for 7 healthy control (normal) tissue samples and 7 osteoarthritis patient tissue samples from synovial / joint tissue.

    Pre-processed (background correction, log-transformation, normalization) to reduce technical variation.

    Suitable for downstream analyses: differential gene expression (normal vs OA), subtype- or phenotype-based classification, machine learning.

    Can act as a validation dataset when combined with other GEO datasets to increase sample size or test reproducibility.

    Useful for biomarker discovery, pathway enrichment analysis (e.g., GO, KEGG), immune infiltration analysis, and subtype analysis in osteoarthritis research.
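
The pre-processing named in this description (log-transformation followed by normalization) can be sketched as below. The description does not say which normalization method was used, so quantile normalization here is an assumption chosen because it is common for Affymetrix arrays. The function names and the `offset` parameter are hypothetical, and background correction is not shown.

```python
import math

def log2_transform(matrix, offset=1.0):
    """Apply log2(x + offset) to every intensity; rows are probes, columns are samples."""
    return [[math.log2(x + offset) for x in row] for row in matrix]

def quantile_normalize(matrix):
    """Give every sample (column) the same empirical distribution.

    Assumption: this is one common normalization choice, not necessarily
    the one applied to the published files. Ties keep their input order.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]
    # For each sample, the row indices ordered by ascending value.
    orders = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [
        sum(columns[c][orders[c][k]] for c in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for k, r in enumerate(orders[c]):
            normalized[r][c] = rank_means[k]
    return normalized
```

After `quantile_normalize`, the sorted values of every sample column are identical, which is why per-sample boxplots line up once this step has been applied.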

  2. DGE GO Enrichment Analysis Microarray Data GDS2778

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Dr. Nagendra (2025). DGE GO Enrichment Analysis Microarray Data GDS2778 [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/dge-go-enrichment-analysis-microarray-data-gds2778
    Available download formats: zip (6820264 bytes)
    Dataset updated: Nov 29, 2025
    Authors: Dr. Nagendra
    License: MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset is based on National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) DataSet accession GDS2778.

    The dataset originates from a microarray experiment measuring global gene expression under specific experimental conditions.

    Raw and processed expression data (for all probes/genes) are included, enabling downstream analysis such as normalization, differential expression, and clustering.

    The dataset has been used to perform differential gene expression (DGE) analysis to identify genes that are up- or down-regulated under the experimental condition compared to control.

    Data processing steps typically include normalization (e.g., log-transformation), quality control, probe-to-gene mapping, and statistical testing for significance (e.g., using packages such as limma or other DGE tools).

    Resulting differentially expressed genes (DEGs) include statistics such as log fold change (logFC), adjusted p-values (adj.P.Val), and possibly other metrics (e.g., B-statistic), allowing assessment of both magnitude and significance of changes.

    The dataset also includes a visualization file (heatmap image) that displays expression patterns of DEGs (or top variable genes) across samples, enabling clustering and pattern recognition across samples and genes.

    The heatmap helps illustrate sample-wise and gene-wise expression variation: clustering groups together samples (e.g. control vs treatment) and genes with similar expression dynamics.

    This dataset is suitable for further bioinformatics analysis: e.g. functional enrichment (GO/Pathway), co‑expression analysis, gene signature identification, or integration with other datasets.

    Users who download this dataset can reproduce or extend analyses, such as re-normalization, alternative clustering, custom DEG thresholds, or downstream biological interpretation (pathway, network analysis).
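
The "custom DEG thresholds" mentioned above can be illustrated with a minimal sketch that adjusts p-values and filters on logFC. Benjamini-Hochberg is the default adjustment behind limma's adj.P.Val column, but whether this dataset used it is an assumption. The function names and the cutoffs (`lfc_cutoff`, `alpha`) are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=p_values.__getitem__)
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_degs(log_fc, p_values, lfc_cutoff=1.0, alpha=0.05):
    """Indices of genes passing both the |logFC| and adjusted p-value cutoffs."""
    adj = benjamini_hochberg(p_values)
    return [
        i for i, (lfc, q) in enumerate(zip(log_fc, adj))
        if abs(lfc) >= lfc_cutoff and q <= alpha
    ]
```

Tightening `alpha` or raising `lfc_cutoff` shrinks the DEG list, so it is worth reporting both values alongside any result derived from it.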

  3. Assessment of data transformations for model-based clustering of RNA-Seq...

    • plos.figshare.com
    xlsx
    Updated May 31, 2023
    Janelle R. Noel-MacDonnell; Joseph Usset; Ellen L. Goode; Brooke L. Fridley (2023). Assessment of data transformations for model-based clustering of RNA-Seq data [Dataset]. http://doi.org/10.1371/journal.pone.0191758
    Available download formats: xlsx
    Dataset updated: May 31, 2023
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Janelle R. Noel-MacDonnell; Joseph Usset; Ellen L. Goode; Brooke L. Fridley
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Quality control, global biases, normalization, and analysis methods for RNA-Seq data are quite different than those for microarray-based studies. The assumption of normality is reasonable for microarray based gene expression data; however, RNA-Seq data tend to follow an over-dispersed Poisson or negative binomial distribution. Little research has been done to assess how data transformations impact Gaussian model-based clustering with respect to clustering performance and accuracy in estimating the correct number of clusters in RNA-Seq data. In this article, we investigate Gaussian model-based clustering performance and accuracy in estimating the correct number of clusters by applying four data transformations (i.e., naïve, logarithmic, Blom, and variance stabilizing transformation) to simulated RNA-Seq data. To do so, an extensive simulation study was carried out in which the scenarios varied in terms of: how genes were selected to be included in the clustering analyses, size of the clusters, and number of clusters. Following the application of the different transformations to the simulated data, Gaussian model-based clustering was carried out. To assess clustering performance for each of the data transformations, the adjusted rand index, clustering error rate, and concordance index were utilized. As expected, our results showed that clustering performance was gained in scenarios where data transformations were applied to make the data appear “more” Gaussian in distribution.
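
Two of the four transformations named in this abstract, the logarithmic and Blom transformations, can be sketched as follows. This is an illustration rather than the authors' code: the `offset` in the log transform and the average-rank handling of ties are assumptions, while the Blom formula itself uses the standard constants 3/8 and 1/4.

```python
import math
from statistics import NormalDist

def log_transform(counts, offset=1.0):
    """log2(count + offset); the offset avoids log(0) for zero counts."""
    return [math.log2(c + offset) for c in counts]

def blom_transform(values):
    """Rank-based inverse normal transform: z = Phi^-1((r - 3/8) / (n + 1/4))."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]
```

Because the Blom transform depends only on ranks, it yields the same scores for any monotone rescaling of the counts, which is what makes heavily skewed data look Gaussian before model-based clustering.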

  4. Data Preprocessing EDA Microarray GE Data GSE5583

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Dr. Nagendra (2025). Data Preprocessing EDA Microarray GE Data GSE5583 [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/data-preprocessing-eda-microarray-ge-data-gse5583
    Available download formats: zip (3144708 bytes)
    Dataset updated: Nov 29, 2025
    Authors: Dr. Nagendra
    License: MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset is based on GEO series GSE5583.

    The experiment compares gene expression profiles between wild‑type mouse embryonic stem cells (ES cells) and ES cells in which Histone deacetylase 1 (HDAC1) has been knocked out.

    The organism used is mouse (Mus musculus).

    Microarray technology was employed to measure transcript abundance across the genome, aiming to identify putative HDAC1 target genes.

    The dataset includes processed expression data (after normalization and log2 transformation), allowing for downstream exploratory data analysis (EDA) and differential gene expression (DGE) analysis.

    As part of EDA, sample‑wise distribution plots (e.g. boxplots) are provided to assess normalization across all arrays.

    The dataset also includes downstream visualizations and analysis results, such as boxplots, which help in evaluating the consistency and quality of the processed data.

    Researchers can use this dataset to perform differential expression analysis between HDAC1 knockout vs wild‑type ES cells, investigate epigenetic regulation, or explore downstream effects of histone deacetylation loss.

    Additionally, the dataset can serve as a reference example for microarray data preprocessing, normalization, transformation (e.g. log2), and exploratory visualization workflows.

    The dataset is publicly available and sourced from a trusted repository (GEO), ensuring transparency and reproducibility of the experiment.
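
The boxplot check described above amounts to comparing per-sample five-number summaries. The sketch below computes them for an expression matrix that is assumed to be already log2-transformed. The function names and the sample labels in the usage note are hypothetical, and the inclusive quartile method is one of several conventions.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for one sample."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))

def summarize_samples(matrix, sample_names):
    """Rows of matrix are probes, columns are samples."""
    return {
        name: five_number_summary([row[c] for row in matrix])
        for c, name in enumerate(sample_names)
    }

def median_spread(summaries):
    """Largest gap between sample medians; near zero after good normalization."""
    medians = [s[2] for s in summaries.values()]
    return max(medians) - min(medians)
```

Calling `summarize_samples(matrix, ["WT_1", "KO_1"])` and then `median_spread` on the result gives a single number to track; a large spread suggests the arrays are not yet on a comparable scale.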

  5. Data from: Diverse and specific gene expression responses to stresses in...

    • data.niaid.nih.gov
    Updated Jul 29, 2013
    Whitfield M; Murray J (2013). Diverse and specific gene expression responses to stresses in cultured human cells. [Dataset]. https://data.niaid.nih.gov/resources?id=gse4301
    Dataset updated: Jul 29, 2013
    Dataset provided by: Stanford Microarray Database (SMD)
    Authors: Whitfield M; Murray J
    Description

    TT3 and TN are cell cycle time courses and should be centered or transformed before clustering or computing Fourier scores. The preferred method is SVD centering preceded by KNNImpute missing-value estimation, although mean-centering is a viable alternative.

    HeLa Heat 42 degrees time course has 2 zeroes (the second is labeled "Cold 25 degrees 0h"). Normalization is by centering or zero transformation (averaging the 2 zeroes).

    WI38 Crowding (1-3 days) was by plating cells at ~50%? confluence and allowing them to age. The zeroes are the same as for WI38 Heat Shock (there are 2).

    Other time courses should be self-explanatory. They have 1-4 zeroes each and can be treated either by centering or zero-transforming. WI38

    Set of arrays organized by shared biological context, such as organism, tumor types, processes, etc.

    Keywords: Logical Set Computed
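
The two simpler treatments named in this description, zero transformation and mean-centering, can be sketched for a single gene's time course as below. The sketch assumes the values are log ratios with no missing entries; the function names are hypothetical, and SVD centering and KNNImpute are not shown.

```python
def zero_transform(series, zero_indices):
    """Subtract the average of the zero-time-point arrays from every value.

    series: log ratios for one gene across a time course.
    zero_indices: positions of the 0h arrays (1-4 per time course).
    """
    baseline = sum(series[i] for i in zero_indices) / len(zero_indices)
    return [x - baseline for x in series]

def mean_center(series):
    """Subtract the gene's mean across the whole time course."""
    mean = sum(series) / len(series)
    return [x - mean for x in series]
```

Zero transformation expresses each time point relative to the unstressed baseline, while mean-centering removes the gene's average level and is the form better suited to the cell cycle series, which have no natural baseline.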

  6. pQTL GWAS association data for ACE2 in plasma from INTERVAL study

    • figshare.com
    application/gzip
    Updated Apr 12, 2020
    Stephen Burgess (2020). pQTL GWAS association data for ACE2 in plasma from INTERVAL study [Dataset]. http://doi.org/10.6084/m9.figshare.12102777.v1
    Available download formats: application/gzip
    Dataset updated: Apr 12, 2020
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Stephen Burgess
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Genetic association estimates with circulating plasma ACE2 levels were obtained in a subcohort of 4,998 blood donors enrolled in the INTERVAL study. Plasma ACE2 levels were measured using a multiplex proximity extension immunoassay (Cardiovascular 2 panel, Olink Bioscience, Uppsala, Sweden). A total of 4,947 samples passed quality control. The data were pre-processed using standard Olink workflows including applying median centring normalization across plates, where the median is centred to the overall median for all plates, followed by log2 transformation to provide normalized protein levels (NPX). NPX values were regressed on age, sex, plate, time from blood draw to processing (in days), and season. The residuals were then rank-inverse normalized. Genotype data was processed as described previously. Genome-wide pQTL analysis was performed by linear regression of the rank-inverse normalized residuals on genotype in SNPTEST, with the first three components of multi-dimensional scaling as covariates to adjust for ancestry.
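
The plate median centring step described above can be sketched as follows. It is a simplified reading of the description, not the Olink workflow itself: it assumes the adjustment is an additive shift and that the "overall median" is the median of all measurements. The function name and plate labels are hypothetical, and the later log2, regression, and rank-inverse normalization steps are not shown.

```python
from statistics import median

def center_plates(values, plates):
    """Shift each plate so its median equals the overall median.

    values: one measurement per sample.
    plates: plate label for each sample, same length as values.
    """
    overall = median(values)
    plate_medians = {
        p: median(v for v, q in zip(values, plates) if q == p)
        for p in set(plates)
    }
    return [v - plate_medians[p] + overall for v, p in zip(values, plates)]
```

After centring, every plate shares the same median, so remaining differences between samples are less likely to reflect which plate they were run on.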


GSE206848 Data Normalization and Subtype Analysis

“Normalized Microarray Expression Data for Osteoarthritis vs Healthy Controls”

