100+ datasets found

Clust_100_GE_datasets
zenodo.org
pdf, zip
Updated Aug 2, 2024
Cite
Basel Abu-Jamous; Steven Kelly (2024). Clust_100_GE_datasets [Dataset]. http://doi.org/10.5281/zenodo.1169191
Explore at:
Available download formats: zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.1169191
Dataset updated
Aug 2, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Basel Abu-Jamous; Steven Kelly
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
100 microarray and RNA-seq gene expression datasets from five model species (human, mouse, fruit fly, arabidopsis plants, and baker's yeast). These datasets represent the benchmark set that was used to test our clust clustering method and to compare it with five widely used clustering methods (MCL, k-means, hierarchical clustering, WGCNA, and self-organising maps). This data resource includes raw data files, pre-processed data files, clustering results, clustering results evaluation, and scripts.

The files are split into three zipped parts, 100Datasets_part_1.zip, 100Datasets_part_2.zip, and 100Datasets_part_3.zip. The contents of the three zipped files should be extracted to a single folder (e.g. 100Datasets).

Below is a thorough description of the files and folders in this data resource.

Scripts

The scripts used to apply each one of the clustering methods to each one of the 100 datasets and to evaluate their results are all included in the folder (scripts/).

Datasets and clustering results (folders starting with D)

The datasets are labelled as D001 to D100. Each dataset has two folders: D###/ and D###_Res/, where ### is the number of the dataset. The first folder only includes the raw dataset while the second folder includes the results of applying the clustering methods to that dataset. The files ending with _B.tsv include clustering results in the form of a partition matrix. The files ending with _E include metrics evaluating the clustering results. The files ending with _go and _go_E respectively include the enriched GO terms in the clustering results and evaluation metrics of these GO terms.
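
As a rough illustration of how a partition matrix such as those in the _B.tsv files might be turned into per-cluster gene lists, consider the sketch below. The layout it assumes (a header row, one row per gene, one column per cluster, 1/0 membership flags) and the helper name `clusters_from_partition` are assumptions made for illustration only; the actual files may be laid out differently and should be inspected first.

```python
import csv
import io

def clusters_from_partition(tsv_text):
    """Map each cluster column to the genes flagged as members (assumed layout)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)              # first cell names the gene-ID column
    cluster_names = header[1:]
    clusters = {name: [] for name in cluster_names}
    for row in reader:
        if not row:
            continue                   # skip blank lines
        gene, flags = row[0], row[1:]
        for name, flag in zip(cluster_names, flags):
            # Treat "1"/"True" as membership; anything else as non-membership.
            if flag.strip() in ("1", "True", "true"):
                clusters[name].append(gene)
    return clusters

# Hypothetical toy matrix, not taken from the data resource.
example = "Gene\tC0\tC1\nG1\t1\t0\nG2\t0\t1\nG3\t1\t0\nG4\t0\t0\n"
result = clusters_from_partition(example)
print(result)
```

In this toy input, G4 belongs to no cluster, which a partition matrix can express naturally by an all-zero row.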

Simultaneous analysis of multiple datasets (folders starting with MD)

As our clust method is designed to extract clusters from multiple datasets simultaneously, we also tested it over multiple datasets. All folders starting with MD_ are related to "multiple datasets (MD)" results. Each MD experiment simultaneously analyses d randomly selected datasets either out of a set of 10 arabidopsis datasets or out of a set of 10 yeast datasets. For each one of the two species, all d values from 2 to 10 were tested, and at each one of these d values, 10 different runs were conducted, where at each run a different subset of d datasets is selected randomly.
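
The experimental design described above can be sketched as follows for one species. This is only an illustration: the function name `md_design`, the dataset indices 1 to 10, and the fixed seed are assumptions, and the sketch does not try to reproduce the authors' actual random selections or to force the subsets at a given d to differ from one another.

```python
import random

def md_design(n_datasets=10, d_values=range(2, 11), runs_per_d=10, seed=0):
    """Enumerate (d, run, subset) triples for one species (illustrative only)."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    design = []
    for d in d_values:
        for run in range(1, runs_per_d + 1):
            # Draw d distinct dataset indices out of the n available ones.
            subset = sorted(rng.sample(range(1, n_datasets + 1), d))
            design.append((d, run, subset))
    return design

design = md_design()
print(len(design))   # 9 values of d x 10 runs = 90 experiments per species
```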

The folders MD_10A and MD_10Y include the full sets of 10 arabidopsis or 10 yeast datasets, respectively. Each folder with the format MD_10#_d#_Res## includes the results of applying the six clustering methods at one of the 10 random runs of one of the selected d values. For example, the "MD_10A_d4_Res03/" folder includes the clustering results of the 3rd random selection of 4 arabidopsis datasets (the letter A in the folder's name refers to arabidopsis).
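
When processing these folders in bulk, the naming convention can be decoded programmatically. The sketch below is one possible way to do so; the regular expression and the helper name `parse_md_folder` are illustrative assumptions based only on the folder format described above.

```python
import re

# Pattern for names such as "MD_10A_d4_Res03/" (trailing slash optional).
MD_PATTERN = re.compile(r"^MD_10(?P<species>[AY])_d(?P<d>\d+)_Res(?P<run>\d+)/?$")
SPECIES = {"A": "arabidopsis", "Y": "yeast"}

def parse_md_folder(name):
    """Return species, d, and run number for an MD results folder, or None."""
    match = MD_PATTERN.match(name)
    if match is None:
        return None                    # e.g. "MD_10A" holds data, not results
    return {
        "species": SPECIES[match.group("species")],
        "d": int(match.group("d")),
        "run": int(match.group("run")),
    }

info = parse_md_folder("MD_10A_d4_Res03/")
print(info)   # {'species': 'arabidopsis', 'd': 4, 'run': 3}
```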

Our clust method is applied directly over multiple datasets where each dataset is in a separate data file. Each "MD_10#_d#_Res##" folder includes these individual files in a sub-folder named "Processed_Data/". However, the other clustering methods only accept a single input data file. Therefore, the datasets are merged first before being submitted to these methods. Each "MD_10#_d#_Res##" folder includes a file "X_merged.tsv" for the merged data.
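
The description does not say how the merged file is built. One plausible approach, shown here purely as an assumption, is to keep only the genes present in every dataset and concatenate their sample columns. The function name `merge_datasets` and the toy values are hypothetical.

```python
def merge_datasets(datasets):
    """Merge datasets given as dicts of gene ID -> list of expression values.

    Assumed strategy (not confirmed by the source): keep genes shared by all
    datasets and concatenate their values in dataset order.
    """
    shared = set(datasets[0])
    for ds in datasets[1:]:
        shared &= set(ds)              # intersect gene sets
    merged = {}
    for gene in sorted(shared):
        row = []
        for ds in datasets:
            row.extend(ds[gene])       # append this dataset's samples
        merged[gene] = row
    return merged

# Hypothetical toy data.
ds1 = {"G1": [1.0, 2.0], "G2": [3.0, 4.0], "G3": [5.0, 6.0]}
ds2 = {"G1": [0.5], "G3": [0.7], "G4": [0.9]}
merged = merge_datasets([ds1, ds2])
print(merged)   # {'G1': [1.0, 2.0, 0.5], 'G3': [5.0, 6.0, 0.7]}
```

Other merging choices (for example keeping the union of genes with missing values) are equally conceivable; the X_merged.tsv files themselves are the authority on what was actually done.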

Evaluation metrics (folders starting with Metrics)

Each clustering results folder (D###_Res or MD_10#_d#_Res##) includes some clustering evaluation files ending with _E. This information is combined into tables for all datasets, and these tables appear in the folders starting with "Metrics_".

Other files and folders

The GO folder includes the reference GO term annotations for arabidopsis and yeast. The Datasets file includes a tab-delimited table describing the 100 datasets. The SearchCriterion file includes the objective methodology of searching the NCBI database to select these 100 datasets. The Specials file includes some special considerations for a couple of datasets that differ slightly from what is described in the SearchCriterion file. The Norm### files and the files in the Reps/ folder describe normalisation codes and replicate structures for the datasets and were fed to the clust method as inputs. The Plots/ folder includes plots of the gene expression profiles of the individual genes in the clusters generated by each one of the 6 methods over each one of the 100 datasets. Only up to 14 clusters per method are plotted.
Description of six real microarray data sets.
figshare.com
xls
Updated Jun 1, 2023
Cite
Guifang Shao; Dongyao Li; Junfa Zhang; Jianbo Yang; Yali Shangguan (2023). Description of six real microarray data sets. [Dataset]. http://doi.org/10.1371/journal.pone.0210075.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0210075.t002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Guifang Shao; Dongyao Li; Junfa Zhang; Jianbo Yang; Yali Shangguan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description of six real microarray data sets.
Bio Resource for Array Genes Database
dknet.org
scicrunch.org
Updated Jan 29, 2022
Cite
(2022). Bio Resource for Array Genes Database [Dataset]. http://identifiers.org/RRID:SCR_000748
Unique identifier
https://identifiers.org/RRID:SCR_000748
Dataset updated
Jan 29, 2022
Description
Bio Resource for Array Genes (BioRag) is a free online resource for easy access to collective and integrated information from various public biological resources for human, mouse, rat, fly and C. elegans genes. The resource includes information about the genes that are represented in UniGene clusters. It provides interactive tools to selectively view, analyze and interpret gene expression patterns against the background of gene and protein functional information. Different query options are provided to mine the biological relationships represented in the underlying database; the Search button leads to the list of available query tools. The resource is designed to assist researchers in analyzing the results of microarray experiments and developing a biological interpretation of those results, mainly by interpreting the unique gene expression patterns found as biological changes that can lead to new diagnostic procedures and drug targets. Users can selectively view a variety of information about gene functions stored in the underlying database. Although other online resources provide comprehensive annotation and summaries of genes, this resource differs by further enabling researchers to mine biological relationships amongst the genes captured in the database using new query tools, thus providing a unique way of interpreting microarray results based on the knowledge available for the cellular roles of genes and proteins. A total of six query tools are provided, each offering different search features, analysis options, and forms of display and visualization of data. The data is collected in a relational database from public resources: UniGene, LocusLink, OMIM, NCBI dbEST, protein domains from NCBI CDD, Gene Ontology, pathways (KEGG, GenMAPP and BioCarta) and BIND (protein interactions).
Data is dynamically collected and compiled twice a week from public databases. Search options offer the capability to organize and cluster genes based on their interactions in biological pathways, their association with Gene Ontology terms, tissue- or organ-specific expression, or any other user-chosen functional grouping of genes. A color-coding scheme is used to highlight differential gene expression patterns against a background of gene functional information. Concept hierarchies (Anatomy and Diseases) of MeSH (Medical Subject Headings) terms are used to organize and display the data related to tissue-specific expression and diseases. Sponsors: The BioRag database is maintained by the Bioinformatics group at Arizona Cancer Center. The material presented here is compiled from different public databases. BioRag is hosted by the Biotechnology Computing Facility of the University of Arizona. 2002, 2003 University of Arizona.
MOESM2 of AutoSOME: a clustering method for identifying gene expression...
springernature.figshare.com
xls
Updated May 31, 2023
Cite
Aaron Newman; James Cooper (2023). MOESM2 of AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number [Dataset]. http://doi.org/10.6084/m9.figshare.8136281.v1
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.6084/m9.figshare.8136281.v1
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Aaron Newman; James Cooper
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 2: Table S2. F-measure and NMI for each benchmarking dataset and clustering method. (XLS 34 KB)
MOESM4 of AutoSOME: a clustering method for identifying gene expression...
springernature.figshare.com
xls
Updated Jun 4, 2023
Cite
Aaron Newman; James Cooper (2023). MOESM4 of AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number [Dataset]. http://doi.org/10.6084/m9.figshare.8136293.v1
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.6084/m9.figshare.8136293.v1
Dataset updated
Jun 4, 2023
Dataset provided by
figshare
Authors
Aaron Newman; James Cooper
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 4: PluriUp and PluriPlus gene list and raw interaction network. Table S7, Updated HUGO gene symbols for PluriUp and PluriPlus; Table S8, Edges of PluriPlus interaction network; Table S9, Nodes and annotation of PluriPlus interaction network. (XLS 672 KB)
Microarray Analysis of chemosensitivity in Laryngeal Squamous Cell Carcinoma...
datamed.org
Updated Aug 19, 2016
Cite
(2016). Microarray Analysis of chemosensitivity in Laryngeal Squamous Cell Carcinoma [Dataset]. https://datamed.org/display-item.php?repository=0044&idName=ID&id=5841d7705152c649505ed12e
Dataset updated
Aug 19, 2016
Description
OBJECTIVE: To investigate the differentially expressed genes related to the chemosensitivity of laryngeal squamous cell carcinoma (LSCC) using microarrays. METHODS: 1. A total of 11 patients who underwent induction chemotherapy for primary hypopharyngeal squamous cell carcinoma (7 patients were sensitive to chemotherapy and the others were not) were recruited for microarray and miRNA array gene expression analysis. 2. Bioinformatics analysis of the differentially expressed genes screened by microarrays: differential gene cluster analysis was applied to biological processes, cellular components and molecular functions using the GO database; differential gene enrichment analysis was applied to signaling pathways using the KEGG database; and the differentially expressed and biologically meaningful core genes would be screened. RESULTS: 1. Microarray analysis identified 1554 genes significantly related to sensitivity to chemotherapy; among these 1554 genes, 777 showed higher expression in tissue from patients who were sensitive to chemotherapy, while 785 presented the contrasting pattern. CONCLUSIONS: The research revealed a gene expression signature of chemosensitivity in laryngeal squamous cell carcinoma using microarrays. The result will contribute to the understanding of the molecular basis of laryngeal squamous cell carcinoma and help to improve diagnosis and treatment.
NIA Array Analysis
neuinfo.org
Updated Oct 16, 2019
Cite
(2019). NIA Array Analysis [Dataset]. http://identifiers.org/RRID:SCR_010948
Unique identifier
https://identifiers.org/RRID:SCR_010948 https://identifiers.org/RRID:SCR_010948/resolver/mentions?q=&i=rrid
Dataset updated
Oct 16, 2019
Description
Data analysis server / software designed to test statistical significance of gene microarray data, visualize the results, and provide links to clone information and gene index. Several public datasets are also available.
Coexpression Analysis of Human Genes Across Many Microarray Data Sets
borealisdata.ca
open.library.ubc.ca
Updated Mar 12, 2019
Cite
Homin K Lee; Amy K Hsu; Jon Sajdak; Jie Qin; Paul Pavlidis (2019). Coexpression Analysis of Human Genes Across Many Microarray Data Sets [Dataset]. http://doi.org/10.5683/SP2/JOJYOP
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.5683/SP2/JOJYOP
Dataset updated
Mar 12, 2019
Dataset provided by
Borealis
Authors
Homin K Lee; Amy K Hsu; Jon Sajdak; Jie Qin; Paul Pavlidis
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset funded by
ABCF
Description
We present a large-scale analysis of mRNA coexpression based on 60 large human data sets containing a total of 3924 microarrays. We sought pairs of genes that were reliably coexpressed (based on the correlation of their expression profiles) in multiple data sets, establishing a high-confidence network of 8805 genes connected by 220,649 “coexpression links” that are observed in at least three data sets. Confirmed positive correlations between genes were much more common than confirmed negative correlations. We show that confirmation of coexpression in multiple data sets is correlated with functional relatedness, and show how cluster analysis of the network can reveal functionally coherent groups of genes. Our findings demonstrate how the large body of accumulated microarray data can be exploited to increase the reliability of inferences about gene function.
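
The idea of a coexpression link that is confirmed in several data sets can be illustrated with a small sketch. The correlation threshold of 0.9, the restriction to positive correlations, the function names, and the toy profiles below are all assumptions for illustration; the study's own criteria for calling a pair coexpressed are not given in this description.

```python
from collections import Counter
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def confirmed_links(datasets, threshold=0.9, min_datasets=3):
    """Gene pairs whose correlation reaches `threshold` in >= `min_datasets` sets."""
    counts = Counter()
    for profiles in datasets:
        for g1, g2 in combinations(sorted(profiles), 2):
            if pearson(profiles[g1], profiles[g2]) >= threshold:
                counts[(g1, g2)] += 1
    return sorted(pair for pair, c in counts.items() if c >= min_datasets)

# Hypothetical toy profiles: A and B track each other in all three data sets,
# while C correlates with them only in the third.
datasets = [
    {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2]},
    {"A": [1, 3, 2, 5], "B": [2, 6, 4, 10], "C": [5, 1, 4, 2]},
    {"A": [2, 1, 4, 3], "B": [4, 2, 8, 6], "C": [2, 1, 4, 3.5]},
]
links = confirmed_links(datasets)
print(links)   # [('A', 'B')]
```

Requiring confirmation in several data sets discards the A-C and B-C pairs, which pass the threshold only once.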
DNA microarrays (time course) in the iPS process treated with microRNA...
omicsdi.org
xml
Cite
Duanqing Pei, DNA microarrays (time course) in the iPS process treated with microRNA clusters [Dataset]. https://www.omicsdi.org/dataset/geo/GSE23104
Explore at:
Available download formats: xml
Authors
Duanqing Pei
Variables measured
Other
Description
Treatment with microRNA clusters B and C increases iPS efficiency. We used microarrays to identify changes induced by microRNA clusters in the iPS process. Overall design: One group were MEFs infected with SKO factors plus microRNA cluster B, C or control blank virus. The other group were MEFs infected with microRNA cluster B, C or control blank virus only. TRIZOL cell lysates were prepared at D4 and D8.
Stemformatics
dknet.org
Updated Apr 11, 2025
Cite
(2025). Stemformatics [Dataset]. http://identifiers.org/RRID:SCR_017002/resolver/mentions?q=&i=rrid
Unique identifier
https://identifiers.org/RRID:SCR_017002 https://identifiers.org/RRID:SCR_017002/resolver/mentions?q=&i=rrid
Dataset updated
Apr 11, 2025
Description
Gene expression data portal developed for stem cell community, containing public gene expression datasets derived from microarray, RNA sequencing and single cell profiling technologies. Portal to visualize and download curated stem cell data. Provides easy to use and intuitive tools for biologists to visually explore data, including interactive gene expression profiles, principal component analysis plots and hierarchical clusters, among others.
Special gene expression comparison of four methods on 6 data sets.
plos.figshare.com
xls
Updated Jun 2, 2023
Cite
Guifang Shao; Dongyao Li; Junfa Zhang; Jianbo Yang; Yali Shangguan (2023). Special gene expression comparison of four methods on 6 data sets. [Dataset]. http://doi.org/10.1371/journal.pone.0210075.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0210075.t003
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Guifang Shao; Dongyao Li; Junfa Zhang; Jianbo Yang; Yali Shangguan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Special gene expression comparison of four methods on 6 data sets.
Data from: Consensus clustering of gene expression microarray data using...
researchdata.edu.au
bridges.monash.edu
Updated May 5, 2022
Cite
Alexandre Mendes (2022). Consensus clustering of gene expression microarray data using genetic algorithms [Dataset]. http://doi.org/10.4225/03/5a13728358b1d
Unique identifier
https://doi.org/10.4225/03/5a13728358b1d
Dataset updated
May 5, 2022
Dataset provided by
Monash University
Authors
Alexandre Mendes
Description
This work presents a new consensus clustering method for gene expression microarray data based on a genetic algorithm. Using two datasets - DA and DB - as input, the genetic algorithm examines putative partitions for the samples in DA, selecting biomarkers that support such partitions. The biomarkers are then used to build a classifier which is used in DB to determine the classes of its samples. The genetic algorithm is guided by an objective function that takes into account the accuracy of classification in both datasets, the number of biomarkers that support the partition, and the distribution of the samples across the classes for each dataset. To illustrate the method, two whole-genome breast cancer instances from different sources were used. In this application, the results indicate that the method could be used to find unknown subtypes of diseases supported by biomarkers presenting similar gene expression profiles across platforms. Moreover, even though this initial study was restricted to two datasets and two classes, the method can be easily extended to consider both more datasets and more classes. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
Identifying Subspace Gene Clusters from Microarray Data Using Low-Rank...
plos.figshare.com
doc
Updated Jun 3, 2023
Cite
Yan Cui; Chun-Hou Zheng; Jian Yang (2023). Identifying Subspace Gene Clusters from Microarray Data Using Low-Rank Representation [Dataset]. http://doi.org/10.1371/journal.pone.0059377
Explore at:
Available download formats: doc
Unique identifier
https://doi.org/10.1371/journal.pone.0059377
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Yan Cui; Chun-Hou Zheng; Jian Yang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Identifying subspace gene clusters from the gene expression data is useful for discovering novel functional gene interactions. In this paper, we propose to use low-rank representation (LRR) to identify the subspace gene clusters from microarray data. LRR seeks the lowest-rank representation among all the candidates that can represent the genes as linear combinations of the bases in the dataset. The clusters can be extracted based on the block diagonal representation matrix obtained using LRR, and they can well capture the intrinsic patterns of genes with similar functions. Meanwhile, the parameter of LRR can balance the effect of noise so that the method is capable of extracting useful information from the data with high level of background noise. Compared with traditional methods, our approach can identify genes with similar functions yet without similar expression profiles. Also, it could assign one gene into different clusters. Moreover, our method is robust to the noise and can identify more biologically relevant gene clusters. When applied to three public datasets, the results show that the LRR based method is superior to existing methods for identifying subspace gene clusters.
Microarray Biochips Market Analysis North America, Europe, Asia, Rest of...
technavio.com
Updated Feb 10, 2022
Cite
Technavio (2022). Microarray Biochips Market Analysis North America, Europe, Asia, Rest of World (ROW) - US, UK, Germany, China, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/microarray-biochips-market-industry-analysis
Dataset updated
Feb 10, 2022
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2021 - 2025
Area covered
United States, Global
Description
Microarray Biochips Market Size 2024-2028

The microarray biochips market size is forecast to increase by USD 17.28 billion, at a CAGR of 22.2% between 2023 and 2028.

The market is characterized by a growing number of collaborations among key players, which is expanding market presence and driving innovation. This strategic approach is essential in the capital-intensive market, where significant investments are required for research and development. A notable trend in the market is the emergence of Label-One-Component (LOC) technology, offering advantages such as improved sensitivity and specificity. However, the high cost of microarray biochips remains a significant challenge for market growth. Companies seeking to capitalize on opportunities must navigate this obstacle by focusing on cost reduction through economies of scale and process optimization. Additionally, collaborations and partnerships can help share research and development costs and accelerate time-to-market for innovative products. The strategic landscape of the market is dynamic, with ongoing advancements in technology and a growing demand for personalized medicine, creating opportunities for companies to differentiate themselves and gain a competitive edge.

What will be the Size of the Microarray Biochips Market during the forecast period?

Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.
The market continues to evolve, driven by advancements in technology and expanding applications across various sectors. Protein microarray technology, a crucial component, enables high-throughput analysis of protein-protein interactions and antibody discovery. Reproducibility metrics and spot morphology analysis ensure consistency and accuracy in data generation. Label incorporation methods, such as biotinylated target cDNA and reverse transcription PCR, facilitate efficient probe attachment. Gene ontology enrichment and pathway analysis tools provide insights into biological functions and molecular interactions. Data mining algorithms, including clustering algorithms and fold change calculations, facilitate pattern recognition and discovery. Microarray data normalization techniques, such as cDNA microarray platforms and genomic DNA extraction, ensure data consistency. Microarray experimental design, hybridization kinetics, and high-throughput screening are essential for optimizing data generation and analysis. Single nucleotide polymorphism (SNP) detection and comparative genomic hybridization offer valuable insights into genetic variations. Data quality assessment, signal-to-noise ratios, and background correction methods ensure data accuracy and reliability. In situ hybridization and fluorescence detection methods facilitate visualization and analysis of gene expression at the cellular level. Differential gene expression analysis provides insights into disease mechanisms and therapeutic targets. Microarray scanner systems and image analysis software facilitate efficient and accurate data analysis. DNA microarray technology continues to evolve, offering exciting possibilities for research and diagnostic applications.

How is this Microarray Biochips Industry segmented?

The microarray biochips industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

Application
    Drug discovery and development
    Diagnostics and treatments
    Research and consumables
    Forensic medicines
    Others
Geography
    North America
        US
    Europe
        Germany
        UK
    APAC
        China
        Japan
    Rest of World (ROW)

By Application Insights

The drug discovery and development segment is estimated to witness significant growth during the forecast period. The market is witnessing significant growth due to its increasing application in drug discovery, driven by the rising preference for personalized medicines. With the global population aging, the demand for better healthcare solutions is escalating, leading manufacturers to continually innovate and improve microarray technology. In genomics and proteomics, microarray biochips are increasingly utilized, further fueling market growth. Advancements in protein microarray technology ensure greater reproducibility and accuracy, while spot morphology analysis and label incorporation enhance data reliability. Gene ontology enrichment and pathway analysis tools enable deeper insights into biological processes, and clustering algorithms facilitate the identification of complex relationships between genes. Genomic DNA extraction and microarray data normalization are crucial steps in ensuring data quality, while high-throughput screening and single nucleotide polymorphism analysis accelerate research. Image analysis software, biotinylated target cDNA, rever
SP500.xvmz
figshare.com
application/gzip
Updated Oct 30, 2022
Cite
James Li (2022). SP500.xvmz [Dataset]. http://doi.org/10.6084/m9.figshare.21433071.v1
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.21433071.v1
Dataset updated
Oct 30, 2022
Dataset provided by
figshare
Authors
James Li
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Test dataset
Spatial statistical tools for genome-wide mutation cluster detection under a...
plos.figshare.com
tiff
Updated Jun 1, 2023
Cite
Bin Luo; Alanna K. Edge; Cornelia Tolg; Eva A. Turley; C. B. Dean; Kathleen A. Hill; R. J. Kulperger (2023). Spatial statistical tools for genome-wide mutation cluster detection under a microarray probe sampling system [Dataset]. http://doi.org/10.1371/journal.pone.0204156
Explore at:
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pone.0204156
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Bin Luo; Alanna K. Edge; Cornelia Tolg; Eva A. Turley; C. B. Dean; Kathleen A. Hill; R. J. Kulperger
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Mutation cluster analysis is critical for understanding certain mutational mechanisms relevant to genetic disease, diversity, and evolution. Yet, whole genome sequencing for detection of mutation clusters is prohibitive with high cost for most organisms and population surveys. Single nucleotide polymorphism (SNP) genotyping arrays, like the Mouse Diversity Genotyping Array, offer an alternative low-cost, screening for mutations at hundreds of thousands of loci across the genome using experimental designs that permit capture of de novo mutations in any tissue. Formal statistical tools for genome-wide detection of mutation clusters under a microarray probe sampling system are yet to be established. A challenge in the development of statistical methods is that microarray detection of mutation clusters is constrained to select SNP loci captured by probes on the array. This paper develops a Monte Carlo framework for cluster testing and assesses test statistics for capturing potential deviations from spatial randomness which are motivated by, and incorporate, the array design. While null distributions of the test statistics are established under spatial randomness via the homogeneous Poisson process, power performance of the test statistics is evaluated under postulated types of Neyman-Scott clustering processes through Monte Carlo simulation. A new statistic is developed and recommended as a screening tool for mutation cluster detection. The statistic is demonstrated to be excellent in terms of its robustness and power performance, and useful for cluster analysis in settings of missing data. The test statistic can also be generalized to any one dimensional system where every site is observed, such as DNA sequencing data. The paper illustrates how the informal graphical tools for detecting clusters may be misleading. 
The statistic is used for finding clusters of putative SNP differences in a mixture of different mouse genetic backgrounds and clusters of de novo SNP differences arising between tissues with development and carcinogenesis.
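
A Monte Carlo cluster test constrained to probe loci can be sketched as below. This is not the statistic developed in the paper: the test statistic used here (the number of neighbouring mutated loci lying within a fixed window), the null model (drawing the same number of loci uniformly from the probe positions), and all names and values are assumptions chosen only to show the general shape of such a test.

```python
import random

def close_pairs(positions, window):
    """Count consecutive mutated loci that lie within `window` of each other."""
    pos = sorted(positions)
    return sum(1 for a, b in zip(pos, pos[1:]) if b - a <= window)

def monte_carlo_p(probes, mutated, window, n_sim=999, seed=0):
    """One-sided Monte Carlo p-value for clustering among probe loci."""
    rng = random.Random(seed)
    observed = close_pairs(mutated, window)
    k = len(mutated)
    hits = 0
    for _ in range(n_sim):
        # Null model: k mutated loci drawn at random from the array's probes.
        simulated = rng.sample(probes, k)
        if close_pairs(simulated, window) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_sim + 1)

# Hypothetical array: 1000 probe loci spaced 10 units apart, with six mutated
# loci of which four sit next to each other.
probes = list(range(0, 10000, 10))
mutated = [100, 110, 120, 130, 5000, 9000]
observed, p_value = monte_carlo_p(probes, mutated, window=10)
print(observed, p_value)
```

Sampling from the probe positions rather than from the whole genome is what ties the null distribution to the array design.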
Peripheral Blood Mononuclear Cell Gene Expression Profiles May Predict Poor...
omicsdi.org
xml
Cite
Jose Herazo Maya,JOSE DAVID HERAZO MAYA, Peripheral Blood Mononuclear Cell Gene Expression Profiles May Predict Poor Outcome in Idiopathic Pulmonary Fibrosis [Agilent] [Dataset]. https://www.omicsdi.org/dataset/arrayexpress-repository/E-GEOD-28042
Explore at:
Available download formats: xml
Authors
Jose Herazo Maya,JOSE DAVID HERAZO MAYA
Variables measured
Transcriptomics,Multiomics
Description
Background: In this study we aimed to identify peripheral blood mononuclear cell (PBMC) gene expression profiles predictive of poor outcomes in idiopathic pulmonary fibrosis (IPF) Methods: Microarray analyses of PBMC were performed in 120 patients from discovery (n=45) and replication cohorts (n=75). Genes and pathways associated with transplant-free survival (TFS) were identified and confirmed by qRT-PCR. Findings: 52 genes were predictive of TFS in a discovery cohort (FDR<5%, Cox score above 2.5 or below -2.5). Clustering the replication cohort samples using these genes distinguished two patient groups with significantly different TFS (hazard ratio 1.96, 95%CI 1.01-3.8, P=0.018). Decreased expression of “The co-stimulatory signaling during T cell activation” Biocarta pathway and in particular CD28, ICOS, LCK and ITK was associated with shorter TFS times in each cohort (FDR<5%). qRT-PCR expression of CD28, ICOS, LCK and ITK correlated with the microarray results in the discovery cohort (P<0.05) and their decreased expression was predictive of shorter TFS in the replication cohort (P<0.05). A genomic and clinical model demonstrated an area under the ROC curve of 78.5% at 2.4 months for death and lung transplant prediction. Interpretation: Our results suggest that CD28, ICOS, LCK and ITK are outcome biomarkers in IPF. PBMC from 75 patients with the diagnosis of IPF were obtained within 30 minutes from blood draw. Total RNA was extracted, labeled and hybridized to Agilent Whole Human Genome Oligo Microarray, 4 x 44K. Patients were followed from blood draw until death, transplant or last follow up. Hierarchical clustering and gene-set analysis with censored outcome data were used to study the association of gene expression and outcome in this cohort (replication cohort)
Four subgroups by gene expression profile correlate with biological and...
omicsdi.org
xml
masahiro inoue, Four subgroups by gene expression profile correlate with biological and clinical features in colorectal cancer. [Dataset]. https://www.omicsdi.org/dataset/arrayexpress-repository/E-GEOD-33193
Available download formats: xml
Authors
masahiro inoue
Variables measured
Transcriptomics
Description
(Purpose) Biological classification of colorectal cancer (CRC) can help to understand its heterogeneous background. The purpose of this study is to classify CRC based on gene expression profiles using formalin-fixed paraffin-embedded (FFPE) samples and to correlate subgroups of CRC with biological features and clinical outcomes.
(Results) CRC was clustered into four subgroups by an unsupervised hierarchical clustering method. These subgroups show different biological and clinical features.
(Conclusion) Gene expression profiles of CRC from FFPE samples distinguish four subgroups that had different biological features and clinical outcomes. These subgroups may explain the heterogeneity of CRC and serve as useful clinical biomarkers.
Patients and Methods: One hundred patients with unresectable and advanced or recurrent CRC who underwent surgical resection from 1998 to 2010 were enrolled in this study. RNA extracted from FFPE samples was subjected to gene expression microarray analysis. After comprehensive gene expression analysis, CRC was classified by unsupervised hierarchical clustering and principal component analysis (PCA). Mutation analysis of the KRAS, BRAF, PIK3CA and TP53 genes was performed by direct DNA sequencing. Correlations among the biological information, clinicopathological factors and clinical outcomes were analyzed.
Suppression of breast tumor growth and metastasis by an engineered...
datamed.org
Updated May 2, 2014
(2014). Suppression of breast tumor growth and metastasis by an engineered transcription factor [Dataset]. https://datamed.org/display-item.php?repository=0006&id=5913b7035152c62a9fc22964&query=MIR363
Dataset updated
May 2, 2014
Description
Abstract: Maspin is a tumor and metastasis suppressor playing an essential role as gatekeeper of tumor progression. It is highly expressed in epithelial cells but is silenced at the onset of metastatic disease by epigenetic mechanisms. Reprogramming of Maspin epigenetic silencing offers a therapeutic potential to lock metastatic progression. Herein we have investigated the ability of the Artificial Transcription Factor 126 (ATF-126), designed to upregulate the Maspin promoter, to inhibit tumor progression in pre-established breast tumors in immunodeficient mice. ATF-126 was transduced in the aggressive, mesenchymal-like and triple negative breast cancer line, MDA-MB-231. Induction of ATF expression in vivo by Doxycycline resulted in 50% reduction in tumor growth and totally abolished tumor cell colonization. Genome-wide transcriptional profiles of ATF-induced cells revealed a gene signature that was found over-represented in the estrogen receptor positive (ER+) “Normal-like” intrinsic subtype of breast cancer and in poorly aggressive, ER+ luminal A breast cancer cell lines. The comparison of the transcriptional profiles of ATF-126 and Maspin cDNA defined an overlapping 19-gene signature, comprising novel targets downstream of the Maspin signaling cascade. Our data suggest that Maspin up-regulates downstream tumor and metastasis suppressor genes that are silenced in breast cancers, and are normally expressed in the neural system, including CARNS1, SLC8A2 and DACT3. In addition, ATF-126 and Maspin cDNA induction led to the re-activation of tumor suppressive miRNAs also expressed in neural cells, such as miR-1 and miR-34, and to the down-regulation of potential oncogenic miRNAs, such as miR-10b, miR-124, and miR-363. As expected from its over-representation in ER+ tumors, the ATF-126-gene signature predicted favorable prognosis for breast cancer patients. Our results describe for the first time an ATF able to reduce tumor growth and metastatic colonization by epigenetic reactivation of a dormant, normal-like, and more differentiated gene program.
A total of six cell lines were used for gene expression analyses: CONTROL –DOX, CONTROL +DOX, ATF-126 –DOX, ATF-126 +DOX (all with 3 technical replicates), p-RetoX-Tight-Maspin –DOX, and p-RetoX-Tight-Maspin +DOX (with 2 technical replicates). For each cell line, total RNA was purified, amplified, labeled, and hybridized [46] using Agilent 4X44K oligo microarrays (Agilent Technologies, United States). The probes/genes were filtered by requiring the lowest normalized intensity values in both –DOX and +DOX samples to be >10. The normalized log2 ratios (Cy5 sample/Cy3 control) of probes mapping to the same gene were averaged to generate independent expression estimates. We also used available microarrays from the breast cancer cell lines [21], the UNC337-patient dataset [20], the MERGE 550-patient dataset [47] and the NKI dataset (295 patients [48,49]). All microarray cluster analyses were displayed using Java Treeview version 1.1.3. Average-linkage hierarchical clustering was performed using Cluster v2.12 [50]. ANOVA tests for gene expression data were performed using R.
Children's Oncology Group Study 9906 for High-Risk Pediatric ALL
data.niaid.nih.gov
Updated Nov 8, 2019
Willman CL; Ar K; Atlas SR; Bedrick EJ; Bhojwani D; Borowitz MJ; Bowman WP; Camitta B; Carroll AJ; Carroll WL; Chen I; Davidson GS; Devidas M; Harvey RC; Hunger SP; Kang H; Murphy M; Pullen J; Reaman GH; Wang X; Wilson CS (2019). Children's Oncology Group Study 9906 for High-Risk Pediatric ALL [Dataset]. https://data.niaid.nih.gov/resources?id=gse11877
Dataset updated
Nov 8, 2019
Dataset provided by
UNM Health Sciences Center
Authors
Willman CL; Ar K; Atlas SR; Bedrick EJ; Bhojwani D; Borowitz MJ; Bowman WP; Camitta B; Carroll AJ; Carroll WL; Chen I; Davidson GS; Devidas M; Harvey RC; Hunger SP; Kang H; Murphy M; Pullen J; Reaman GH; Wang X; Wilson CS
Description
PAPER 1: "Identification of novel subgroups of high-risk pediatric precursor B acute lymphoblastic leukemia (B-ALL) by unsupervised microarray analysis: clinical correlates and therapeutic implications. A Children's Oncology Group (COG) study."
ABSTRACT: We examined gene expression profiles of pre-treatment specimens from 207 patients from the COG P9906 study to identify signatures of children with high risk B-precursor acute lymphoblastic leukemia (ALL) and to determine whether the resulting clusters are associated with either specific clinical features or treatment response characteristics. Four unsupervised clustering methods were utilized to classify patients into similar groups. The different clustering algorithms showed significant overlap in cluster membership. Two clusters contained all cases with either t(1;19)(q23;p13) translocations or MLL rearrangements. The other six clusters were novel and had no recurring chromosomal abnormalities or distinctive clinical features. Members of two of these novel clusters had significant survival differences when compared to the overall 4-year relapse-free survival (RFS) of 61%. These included clusters of patients with either significantly better (94.7%) or worse (21.0%) RFS at 4 years. Children of Hispanic/Latino ethnicity were disproportionately present in the poor outcome cluster. The poor outcome cluster represents a novel biologically distinctive subset of B-precursor ALL that may occur at least as frequently as BCR/ABL. Further molecular characterization of this cluster may lead to the discovery of genomic abnormalities that can be targeted to improve the currently dismal outcome for children with this gene signature.
The sample data have also been used in another study:
PAPER 2: "Gene expression classifiers for minimal residual disease and relapse free survival improve outcome prediction and risk classification in children with high risk acute lymphoblastic leukemia. A Children's Oncology Group study".
ABSTRACT:
Background. Nearly 25% of children with B-precursor ALL present with "high-risk" disease (HR-ALL) that is resistant to current therapies. Gene expression profiling may yield molecular classifiers for outcome prediction that can be used to improve risk classification and therapeutic targeting.
Methods. Expression profiles were obtained in pre-treatment leukemic samples from 207 uniformly treated children with HR-ALL. Relapse free survival (RFS) was 61% at 4 years and flow cytometric measures of minimal residual disease (MRD) at the end of induction (day 29) were predictive of outcome (P<0.001). Molecular classifiers predictive of RFS and MRD were developed using extensive cross-validation procedures.
Results. A 38 gene molecular risk classifier predictive of RFS (MRC-RFS) distinguished two groups in HR-ALL with different relapse risks: low (4 yr RFS: 81%, n=109) vs. high (4 yr RFS: 50%, n=98) (P<0.0001). In multivariate analysis, the best predictor combined MRC-RFS and day 29 flow MRD data, classifying children into low (87% RFS), intermediate (62% RFS), or high risk (29% RFS) groups (P<0.0001). A 21 gene molecular classifier predictive of MRD could effectively substitute for day 29 flow MRD, yielding a combined classifier that similarly distinguished three risk groups at pre-treatment (low: 82% RFS; intermediate: 63% RFS; and high risk: 45% RFS) (P<0.0001). This combined molecular classifier was further validated on an independent cohort of 84 children with HR-ALL (P = 0.006).
Conclusions. Molecular classifiers predictive of RFS and MRD can be used to distinguish distinct prognostic groups within HR-ALL, significantly improving risk classification schemes and the ability to prospectively identify children at diagnosis who will respond to or fail current treatment regimens.
NOTE: Due to Children's Oncology Group (COG) restrictions, outcome and MRD data cannot be provided as part of the covariate data for this dataset at the present time. If you would like to arrange individual access to this data, please contact COG or the PI of this study, Dr. Cheryl Willman, at the University of New Mexico Cancer Center (cwillman@unm.edu) to arrange a collaboration.
Unsupervised clustering and supervised risk classification analyses of 207 diagnostic samples and associated clinical covariate data. See the Summary for greater details. The data were analyzed using Microarray Suite version 5.0 (MAS 5.0) in the Affymetrix Gene Chip Operating Software Version 1.4. Probe masking was used (see 9906_TT207_Affymetrix_probe_mask.msk, linked below as a supplementary file). Otherwise all Affymetrix default parameter settings were used. Global scaling as the normalization method, with the default target intensity of 500, was used.

Basel Abu-Jamous; Steven Kelly (2024). Clust_100_GE_datasets [Dataset]. http://doi.org/10.5281/zenodo.1169191

Clust_100_GE_datasets

Available download formats: zip, pdf

Unique identifier

https://doi.org/10.5281/zenodo.1169191

Dataset updated

Aug 2, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Basel Abu-Jamous; Steven Kelly

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

100 microarray and RNA-seq gene expression datasets from five model species (human, mouse, fruit fly, arabidopsis plants, and baker's yeast). These datasets represent the benchmark set that was used to test our clust clustering method and to compare it with five widely used clustering methods (MCL, k-means, hierarchical clustering, WGCNA, and self-organising maps). This data resource includes raw data files, pre-processed data files, clustering results, clustering results evaluation, and scripts.

The files are split into three zipped parts, 100Datasets_part_1.zip, 100Datasets_part_2.zip, and 100Datasets_part_3.zip. The contents of the three zipped files should be extracted to a single folder (e.g. 100Datasets).
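The extraction step above can be sketched in Python as follows. This is only a minimal illustration using the standard library; the function name extract_parts is hypothetical, and whether the archives already contain a top-level folder is not stated here, so the destination layout is an assumption.

```python
import zipfile
from pathlib import Path


def extract_parts(part_paths, dest="100Datasets"):
    """Extract every archive in part_paths into the same destination folder."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for part in part_paths:
        # Each part is unpacked into the shared folder, so the contents
        # of the three archives end up side by side.
        with zipfile.ZipFile(part) as archive:
            archive.extractall(dest)
    return dest
```

With the three downloaded files in the working directory, a call such as extract_parts(["100Datasets_part_1.zip", "100Datasets_part_2.zip", "100Datasets_part_3.zip"]) would place everything under 100Datasets/.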

Below is a thorough description of the files and folders in this data resource.

Scripts

The scripts used to apply each one of the clustering methods to each one of the 100 datasets and to evaluate their results are all included in the folder (scripts/).

Datasets and clustering results (folders starting with D)

The datasets are labelled as D001 to D100. Each dataset has two folders: D###/ and D###_Res/, where ### is the number of the dataset. The first folder only includes the raw dataset while the second folder includes the results of applying the clustering methods to that dataset. The files ending with _B.tsv include clustering results in the form of a partition matrix. The files ending with _E include metrics evaluating the clustering results. The files ending with _go and _go_E respectively include the enriched GO terms in the clustering results and evaluation metrics of these GO terms.
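The naming scheme described above can be illustrated with a short Python sketch. The helper names are hypothetical, and the example assumes that the suffixes (_B, _E, _go, _go_E) sit at the end of the file name before any extension; the actual file names in the archive may differ.

```python
from pathlib import Path


def dataset_folders(number):
    """Return the (raw, results) folder names for dataset `number` (1 to 100)."""
    if not 1 <= number <= 100:
        raise ValueError("dataset number must be between 1 and 100")
    tag = f"D{number:03d}"
    return tag, f"{tag}_Res"


def classify_result_file(filename):
    """Guess the role of a results file from its suffix (assumed convention)."""
    stem = Path(filename).stem  # file name without its last extension
    # "_go_E" is tested before "_E" because it also ends with "_E".
    if stem.endswith("_go_E"):
        return "go_evaluation"
    if stem.endswith("_go"):
        return "go_terms"
    if stem.endswith("_E"):
        return "evaluation"
    if stem.endswith("_B"):
        return "partition_matrix"
    return "other"
```

For instance, dataset_folders(7) gives ("D007", "D007_Res"). The order of the suffix checks matters, since a GO evaluation file would otherwise be mistaken for a plain evaluation file.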

Simultaneous analysis of multiple datasets (folders starting with MD)

As our clust method is designed to extract clusters from multiple datasets simultaneously, we also tested it on multiple datasets. All folders starting with MD_ are related to "multiple datasets (MD)" results. Each MD experiment simultaneously analyses d randomly selected datasets either out of a set of 10 arabidopsis datasets or out of a set of 10 yeast datasets. For each one of the two species, all d values from 2 to 10 were tested, and at each one of these d values, 10 different runs were conducted, each using a different randomly selected subset of d datasets.

The folders MD_10A and MD_10Y include the full sets of 10 arabidopsis or 10 yeast datasets, respectively. Each folder with the format MD_10#_d#_Res## includes the results of applying the six clustering methods at one of the 10 random runs of one of the selected d values. For example, the "MD_10A_d4_Res03/" folder includes the clustering results of the 3rd random selection of 4 arabidopsis datasets (the letter A in the folder's name refers to arabidopsis).
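As a rough sketch, the MD folder naming pattern can be parsed and enumerated in Python. The function names are hypothetical, and the pattern assumes that d is written without zero-padding (as in "d4") while the run number uses two digits (as in "Res03"); only the single example above is given, so the padding for d = 10 and run 10 is a guess.

```python
import re

# Assumed pattern, based on the example "MD_10A_d4_Res03/".
MD_PATTERN = re.compile(r"^MD_10([AY])_d(\d{1,2})_Res(\d{2})/?$")
SPECIES = {"A": "arabidopsis", "Y": "yeast"}


def parse_md_folder(name):
    """Split an MD results folder name into species, d, and run number."""
    match = MD_PATTERN.match(name)
    if match is None:
        return None  # e.g. MD_10A itself, which holds the full dataset set
    letter, d, run = match.groups()
    return {"species": SPECIES[letter], "d": int(d), "run": int(run)}


def expected_md_folders():
    """List the folder names implied by 2 species, d = 2..10, and 10 runs each."""
    return [
        f"MD_10{letter}_d{d}_Res{run:02d}"
        for letter in "AY"
        for d in range(2, 11)
        for run in range(1, 11)
    ]
```

Under these assumptions the design implies 2 x 9 x 10 = 180 results folders, which can serve as a quick completeness check after extraction.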

Our clust method is applied directly over multiple datasets where each dataset is in a separate data file. Each "MD_10#_d#_Res##" folder includes these individual files in a sub-folder named "Processed_Data/". However, the other clustering methods only accept a single input data file. Therefore, the datasets are merged first before being submitted to these methods. Each "MD_10#_d#_Res##" folder includes a file "X_merged.tsv" for the merged data.
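How the merged file was produced is not described here. One plausible reading, sketched below in Python, is to keep the genes present in every dataset and concatenate their sample columns. The function names, the assumed layout (a header row, gene identifiers in the first column, tab-separated values), and the choice of an inner join are all assumptions for illustration, not the authors' procedure.

```python
import csv


def read_expression_tsv(path):
    """Read a TSV with a header row and gene identifiers in the first column."""
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    samples = rows[0][1:]
    values = {row[0]: row[1:] for row in rows[1:]}
    return samples, values


def merge_datasets(paths):
    """Merge several expression files on the genes they all share."""
    if not paths:
        raise ValueError("at least one input file is required")
    merged_samples, tables = [], []
    for path in paths:
        samples, values = read_expression_tsv(path)
        merged_samples.extend(samples)
        tables.append(values)
    # Keep genes found in every table, in the order of the first file.
    common = [gene for gene in tables[0] if all(gene in t for t in tables[1:])]
    merged = {gene: [v for t in tables for v in t[gene]] for gene in common}
    return merged_samples, merged
```

An inner join keeps the merged matrix free of missing values, at the cost of dropping genes absent from any one dataset; a different merging rule would change which genes reach the single-input methods.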

Evaluation metrics (folders starting with Metrics)

Each clustering results folder (D###_Res or MD_10#_d#_Res##) includes some clustering evaluation files ending with _E. This information is combined into tables for all datasets, and these tables appear in the folders starting with "Metrics_".

Other files and folders

The GO folder includes the reference GO term annotations for arabidopsis and yeast. The Datasets file includes a tab-delimited table describing the 100 datasets. The SearchCriterion file describes the objective methodology used to search the NCBI database and select these 100 datasets. The Specials file lists special considerations for a couple of datasets that differ slightly from what is described in the SearchCriterion file. The Norm### files and the files in the Reps/ folder describe normalisation codes and replicate structures for the datasets and were fed to the clust method as inputs. The Plots/ folder includes plots of the gene expression profiles of the individual genes in the clusters generated by each one of the 6 methods over each one of the 100 datasets. Only up to 14 clusters per method are plotted.

