100+ datasets found
  1. Clust_100_GE_datasets

    • zenodo.org
    pdf, zip
    Updated Aug 2, 2024
    + more versions
    Cite
    Basel Abu-Jamous; Steven Kelly (2024). Clust_100_GE_datasets [Dataset]. http://doi.org/10.5281/zenodo.1169191
    Explore at:
    zip, pdf (available download formats)
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Basel Abu-Jamous; Steven Kelly
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    100 microarray and RNA-seq gene expression datasets from five model species (human, mouse, fruit fly, arabidopsis plants, and baker's yeast). These datasets represent the benchmark set that was used to test our clust clustering method and to compare it with five widely used clustering methods (MCL, k-means, hierarchical clustering, WGCNA, and self-organising maps). This data resource includes raw data files, pre-processed data files, clustering results, clustering results evaluation, and scripts.

    The files are split into three zipped parts, 100Datasets_part_1.zip, 100Datasets_part_2.zip, and 100Datasets_part_3.zip. The contents of the three zipped files should be extracted to a single folder (e.g. 100Datasets).

    Below is a thorough description of the files and folders in this data resource.

    Scripts

    The scripts used to apply each one of the clustering methods to each one of the 100 datasets and to evaluate their results are all included in the folder (scripts/).

    Datasets and clustering results (folders starting with D)

    The datasets are labelled as D001 to D100. Each dataset has two folders: D###/ and D###_Res/, where ### is the number of the dataset. The first folder only includes the raw dataset while the second folder includes the results of applying the clustering methods to that dataset. The files ending with _B.tsv include clustering results in the form of a partition matrix. The files ending with _E include metrics evaluating the clustering results. The files ending with _go and _go_E respectively include the enriched GO terms in the clustering results and evaluation metrics of these GO terms.
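
    A partition matrix of the kind stored in the _B.tsv files can be turned into per-cluster gene lists with a few lines of Python. The sketch below is only illustrative: it assumes a layout with a header row of cluster names, gene identifiers in the first column, and 1/0 (or True/False) membership cells. The actual layout of the _B.tsv files is not stated in the description and should be checked first. The function name is hypothetical.

```python
import csv
import io

# Cell values treated as "gene belongs to this cluster" (an assumption).
TRUTHY = {"1", "1.0", "true"}


def clusters_from_partition_matrix(tsv_text):
    """Map each cluster column to the list of genes flagged as members.

    Assumed layout (not confirmed by the dataset description):
    first row = header, first column = gene ID, other columns = clusters.
    """
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    clusters = {name: [] for name in header[1:]}
    for row in body:
        gene = row[0]
        for name, cell in zip(header[1:], row[1:]):
            if cell.strip().lower() in TRUTHY:
                clusters[name].append(gene)
    return clusters


example = "Gene\tC0\tC1\ng1\t1\t0\ng2\t0\t1\ng3\t1\t0\n"
members = clusters_from_partition_matrix(example)
```

    Under this assumed layout, a gene whose row contains no membership flag simply appears in no cluster.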

    Simultaneous analysis of multiple datasets (folders starting with MD)

    As our clust method is designed to be able to extract clusters from multiple datasets simultaneously, we also tested it over multiple datasets. All folders starting with MD_ are related to "multiple datasets (MD)" results. Each MD experiment simultaneously analyses d randomly selected datasets either out of a set of 10 arabidopsis datasets or out of a set of 10 yeast datasets. For each one of the two species, all d values from 2 to 10 were tested, and at each one of these d values, 10 different runs were conducted, where at each run a different subset of d datasets is selected randomly.
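
    The sampling design described above (d from 2 to 10, 10 runs per d, each run drawing d of the 10 datasets) could be reproduced roughly as follows. This is a minimal sketch, not the authors' script: the dataset labels, the seed, and the function name are all hypothetical, and the sketch does not force the subsets of different runs to differ (at d = 10 only one subset exists anyway).

```python
import random


def md_sampling_plan(dataset_ids, d_values=range(2, 11), runs=10, seed=0):
    """Return {(d, run): sorted subset of d dataset IDs} for every d and run."""
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    plan = {}
    for d in d_values:
        for run in range(1, runs + 1):
            # Draw d distinct datasets uniformly at random.
            plan[(d, run)] = sorted(rng.sample(dataset_ids, d))
    return plan


# Hypothetical labels for the 10 arabidopsis datasets.
arabidopsis_ids = [f"A{i:02d}" for i in range(1, 11)]
plan = md_sampling_plan(arabidopsis_ids)
```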

    The folders MD_10A and MD_10Y include the full sets of 10 arabidopsis or 10 yeast datasets, respectively. Each folder with the format MD_10#_d#_Res## includes the results of applying the six clustering methods at one of the 10 random runs of one of the selected d values. For example, the "MD_10A_d4_Res03/" folder includes the clustering results of the 3rd random selection of 4 arabidopsis datasets (the letter A in the folder's name refers to arabidopsis).
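
    The folder naming scheme lends itself to automatic parsing when iterating over the results. The following sketch decodes a name such as "MD_10A_d4_Res03/" into species, d, and run number. The regular expression is inferred from the single example given above and the function name is hypothetical, so it may need adjusting against the real folder names.

```python
import re

# Pattern inferred from the example "MD_10A_d4_Res03/" (an assumption).
MD_FOLDER = re.compile(r"^MD_10([AY])_d(\d+)_Res(\d+)/?$")
SPECIES = {"A": "arabidopsis", "Y": "yeast"}


def parse_md_folder(name):
    """Decode an MD results folder name, or return None if it does not match."""
    match = MD_FOLDER.match(name)
    if match is None:
        return None
    letter, d, run = match.groups()
    return {"species": SPECIES[letter], "d": int(d), "run": int(run)}


info = parse_md_folder("MD_10A_d4_Res03/")
```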

    Our clust method is applied directly over multiple datasets where each dataset is in a separate data file. Each "MD_10#_d#_Res##" folder includes these individual files in a sub-folder named "Processed_Data/". However, the other clustering methods only accept a single input data file. Therefore, the datasets are merged first before being submitted to these methods. Each "MD_10#_d#_Res##" folder includes a file "X_merged.tsv" for the merged data.
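
    The description does not say how the datasets are merged into "X_merged.tsv". One plausible approach, shown here purely as an assumption, is to keep the genes present in every dataset and concatenate their expression values column-wise. The in-memory dictionaries and the function name are hypothetical stand-ins for the real files.

```python
def merge_datasets(datasets):
    """Concatenate expression values of genes shared by all datasets.

    datasets: list of dicts mapping gene ID -> list of expression values.
    Returns a dict gene ID -> values from every dataset, in input order.
    Keeping only shared genes is an assumption, not the documented method.
    """
    if not datasets:
        return {}
    common = set(datasets[0])
    for data in datasets[1:]:
        common &= set(data)
    return {
        gene: [value for data in datasets for value in data[gene]]
        for gene in sorted(common)
    }


first = {"g1": [1.0, 2.0], "g2": [3.0, 4.0]}
second = {"g2": [5.0], "g3": [6.0]}
merged = merge_datasets([first, second])
```

    A different choice, such as keeping the union of genes and filling gaps with missing values, would change the input seen by the single-file methods, so the scripts/ folder remains the authoritative reference.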

    Evaluation metrics (folders starting with Metrics)

    Each clustering results folder (D###_Res or MD_10#_d#_Res##) includes some clustering evaluation files ending with _E. This information is combined into tables for all datasets, and these tables appear in the folders starting with "Metrics_".

    Other files and folders

    The GO folder includes the reference GO term annotations for arabidopsis and yeast. The Datasets file includes a TAB delimited table describing the 100 datasets. The SearchCriterion file includes the objective methodology of searching the NCBI database to select these 100 datasets. The Specials file includes some special considerations for a couple of datasets that differ a bit from what is described in the SearchCriterion file. The Norm### files and the files in the Reps/ folder describe normalisation codes and replicate structures for the datasets and were fed to the clust method as inputs. The Plots/ folder includes plots of the gene expression profiles of the individual genes in the clusters generated by each one of the 6 methods over each one of the 100 datasets. Only up to 14 clusters per method are plotted.

  2. Description of six real microarray data sets.

    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Guifang Shao; Dongyao Li; Junfa Zhang; Jianbo Yang; Yali Shangguan (2023). Description of six real microarray data sets. [Dataset]. http://doi.org/10.1371/journal.pone.0210075.t002
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Guifang Shao; Dongyao Li; Junfa Zhang; Jianbo Yang; Yali Shangguan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description of six real microarray data sets.

  3. Bio Resource for Array Genes Database

    • dknet.org
    • scicrunch.org
    • +1 more
    Updated Jan 29, 2022
    Cite
    (2022). Bio Resource for Array Genes Database [Dataset]. http://identifiers.org/RRID:SCR_000748
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Bio Resource for array genes is a free online resource for easy access to collective and integrated information from various public biological resources for human, mouse, rat, fly and C. elegans genes. The resource includes information about the genes that are represented in UniGene clusters. It provides interactive tools to selectively view, analyze and interpret gene expression patterns against the background of gene and protein functional information. Different query options are provided to mine the biological relationships represented in the underlying database. The Search button leads to the list of available query tools. The platform is designed as an online resource to assist researchers in analyzing the results of microarray experiments and developing a biological interpretation of those results. The site is mainly intended to help interpret unique gene expression patterns as biological changes that can lead to new diagnostic procedures and drug targets. It allows users to selectively view a variety of information about gene functions that is stored in an underlying database. Although other online resources provide comprehensive annotation and summaries of genes, this resource differs from them by further enabling researchers to mine biological relationships amongst the genes captured in the database using new query tools. This provides a unique way of interpreting microarray results based on the knowledge available about the cellular roles of genes and proteins. Six query tools are provided, each offering different search features, analysis options, and forms of data display and visualization. The data are collected in a relational database from public resources: UniGene, LocusLink, OMIM, NCBI dbEST, protein domains from NCBI CDD, Gene Ontology, pathways (KEGG, GenMAPP and BioCarta) and BIND (protein interactions). Data is dynamically collected and compiled twice a week from public databases. Search options offer the capability to organize and cluster genes based on their interactions in biological pathways, their association with Gene Ontology terms, tissue/organ-specific expression, or any other user-chosen functional grouping of genes. A color coding scheme is used to highlight differential gene expression patterns against a background of gene functional information. Concept hierarchies (Anatomy and Diseases) of MeSH (Medical Subject Headings) terms are used to organize and display the data related to tissue-specific expression and diseases. Sponsors: BioRag database is maintained by the Bioinformatics group at Arizona Cancer Center. The material presented here is compiled from different public databases. BioRag is hosted by the Biotechnology Computing Facility of the University of Arizona. 2002, 2003 University of Arizona.

  4. MOESM2 of AutoSOME: a clustering method for identifying gene expression...

    • springernature.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Aaron Newman; James Cooper (2023). MOESM2 of AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number [Dataset]. http://doi.org/10.6084/m9.figshare.8136281.v1
    Explore at:
    xls (available download formats)
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Aaron Newman; James Cooper
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 2: Table S2. F-measure and NMI for each benchmarking dataset and clustering method. (XLS 34 KB)

  5. MOESM4 of AutoSOME: a clustering method for identifying gene expression...

    • springernature.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Aaron Newman; James Cooper (2023). MOESM4 of AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number [Dataset]. http://doi.org/10.6084/m9.figshare.8136293.v1
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    figshare
    Authors
    Aaron Newman; James Cooper
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 4: PluriUp and PluriPlus gene list and raw interaction network. Table S7, Updated HUGO gene symbols for PluriUp and PluriPlus; Table S8, Edges of PluriPlus interaction network; Table S9, Nodes and annotation of PluriPlus interaction network. (XLS 672 KB)

  6. Microarray Analysis of chemosensitivity in Laryngeal Squamous Cell Carcinoma...

    • datamed.org
    Updated Aug 19, 2016
    Cite
    (2016). Microarray Analysis of chemosensitivity in Laryngeal Squamous Cell Carcinoma [Dataset]. https://datamed.org/display-item.php?repository=0044&idName=ID&id=5841d7705152c649505ed12e
    Explore at:
    Dataset updated
    Aug 19, 2016
    Description

    OBJECTIVE: To investigate the differentially expressed genes related to the chemosensitivity of laryngeal squamous cell carcinoma (LSCC) by microarrays. METHODS: 1. A total of 11 patients who underwent induction chemotherapy for primary hypopharyngeal squamous cell carcinoma (7 patients are sensitive to chemotherapy, and the others are not) were recruited for microarray and miRNA array gene expression analysis. 2. Bioinformatics analysis of differentially expressed genes screened by microarrays: differential gene cluster analysis was applied to biological processes, cellular components and molecular functions using the GO database; differential gene enrichment analysis was applied to signaling pathways using the KEGG database, and the differentially expressed and biologically meaningful core genes would be screened. RESULTS: 1. Analyzed by microarrays, there were 1554 genes significantly related to the sensitivity to chemotherapy; among these 1554 genes, 777 showed a higher expression in the tissue from patients who are sensitive to chemotherapy, while 785 presented the contrasting pattern. CONCLUSIONS: The research revealed a gene expression signature of chemosensitivity in laryngeal squamous cell carcinoma by microarrays. The result will contribute to the understanding of the molecular basis of laryngeal squamous cell carcinoma and help to improve diagnosis and treatment.

  7. NIA Array Analysis

    • neuinfo.org
    Updated Oct 16, 2019
    Cite
    (2019). NIA Array Analysis [Dataset]. http://identifiers.org/RRID:SCR_010948
    Explore at:
    Dataset updated
    Oct 16, 2019
    Description

    Data analysis server / software designed to test statistical significance of gene microarray data, visualize the results, and provide links to clone information and gene index. Several public datasets are also available.

  8. Coexpression Analysis of Human Genes Across Many Microarray Data Sets

    • borealisdata.ca
    • open.library.ubc.ca
    Updated Mar 12, 2019
    Cite
    Homin K Lee; Amy K Hsu; Jon Sajdak; Jie Qin; Paul Pavlidis (2019). Coexpression Analysis of Human Genes Across Many Microarray Data Sets [Dataset]. http://doi.org/10.5683/SP2/JOJYOP
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 12, 2019
    Dataset provided by
    Borealis
    Authors
    Homin K Lee; Amy K Hsu; Jon Sajdak; Jie Qin; Paul Pavlidis
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    ABCF
    Description

    We present a large-scale analysis of mRNA coexpression based on 60 large human data sets containing a total of 3924 microarrays. We sought pairs of genes that were reliably coexpressed (based on the correlation of their expression profiles) in multiple data sets, establishing a high-confidence network of 8805 genes connected by 220,649 “coexpression links” that are observed in at least three data sets. Confirmed positive correlations between genes were much more common than confirmed negative correlations. We show that confirmation of coexpression in multiple data sets is correlated with functional relatedness, and show how cluster analysis of the network can reveal functionally coherent groups of genes. Our findings demonstrate how the large body of accumulated microarray data can be exploited to increase the reliability of inferences about gene function.
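
    The idea of keeping only coexpression links confirmed in at least three data sets can be illustrated with a short counting sketch. It assumes each data set has already been reduced to a list of coexpressed gene pairs (the correlation thresholding itself is not shown), and the function name and toy gene labels are hypothetical rather than taken from the study.

```python
from collections import Counter


def confirmed_links(pairs_per_dataset, min_datasets=3):
    """Return gene pairs reported as coexpressed in at least min_datasets data sets."""
    counts = Counter()
    for pairs in pairs_per_dataset:
        # Treat (A, B) and (B, A) as the same link; count each link once per data set.
        unique = {tuple(sorted(pair)) for pair in pairs}
        counts.update(unique)
    return {pair for pair, n in counts.items() if n >= min_datasets}


toy = [
    [("A", "B"), ("B", "C")],
    [("B", "A")],
    [("A", "B"), ("C", "B")],
    [("C", "D")],
]
links = confirmed_links(toy)
```

    In the toy input, the A-B link appears in three data sets and is kept, while B-C appears in only two and is dropped.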

  9. DNA microarrays (time course) in the iPS process treated with microRNA...

    • omicsdi.org
    xml
    Cite
    Duanqing Pei, DNA microarrays (time course) in the iPS process treated with microRNA clusters [Dataset]. https://www.omicsdi.org/dataset/geo/GSE23104
    Explore at:
    xml (available download formats)
    Authors
    Duanqing Pei
    Variables measured
    Other
    Description

    Treatment with microRNA clusters B and C increases iPS efficiency. We used microarrays to identify changes induced by microRNA clusters in the iPS process. Overall design: One group were MEFs infected with SKO factors, plus microRNA cluster B, C or control blank virus. The other group were MEFs infected with microRNA cluster B, C or control blank virus only. TRIZOL cell lysates were prepared at D4 and D8.

  10. Stemformatics

    • dknet.org
    Updated Apr 11, 2025
    Cite
    (2025). Stemformatics [Dataset]. http://identifiers.org/RRID:SCR_017002/resolver/mentions?q=&i=rrid
    Explore at:
    Dataset updated
    Apr 11, 2025
    Description

    Gene expression data portal developed for the stem cell community, containing public gene expression datasets derived from microarray, RNA sequencing and single cell profiling technologies. Portal to visualize and download curated stem cell data. Provides easy-to-use and intuitive tools for biologists to visually explore data, including interactive gene expression profiles, principal component analysis plots and hierarchical clusters, among others.

  11. Special gene expression comparison of four methods on 6 data sets.

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Guifang Shao; Dongyao Li; Junfa Zhang; Jianbo Yang; Yali Shangguan (2023). Special gene expression comparison of four methods on 6 data sets. [Dataset]. http://doi.org/10.1371/journal.pone.0210075.t003
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Guifang Shao; Dongyao Li; Junfa Zhang; Jianbo Yang; Yali Shangguan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Special gene expression comparison of four methods on 6 data sets.

  12. Data from: Consensus clustering of gene expression microarray data using...

    • researchdata.edu.au
    • bridges.monash.edu
    Updated May 5, 2022
    Cite
    Alexandre Mendes (2022). Consensus clustering of gene expression microarray data using genetic algorithms [Dataset]. http://doi.org/10.4225/03/5a13728358b1d
    Explore at:
    Dataset updated
    May 5, 2022
    Dataset provided by
    Monash University
    Authors
    Alexandre Mendes
    Description

    This work presents a new consensus clustering method for gene expression microarray data based on a genetic algorithm. Using two datasets - DA and DB - as input, the genetic algorithm examines putative partitions for the samples in DA, selecting biomarkers that support such partitions. The biomarkers are then used to build a classifier which is used in DB to determine its sample classes. The genetic algorithm is guided by an objective function that takes into account the accuracy of classification in both datasets, the number of biomarkers that support the partition, and the distribution of the samples across the classes for each dataset. To illustrate the method, two whole-genome breast cancer instances from different sources were used. In this application, the results indicate that the method could be used to find unknown subtypes of diseases supported by biomarkers presenting similar gene expression profiles across platforms. Moreover, even though this initial study was restricted to two datasets and two classes, the method can be easily extended to consider both more datasets and classes. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

    Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.

  13. Identifying Subspace Gene Clusters from Microarray Data Using Low-Rank...

    • plos.figshare.com
    doc
    Updated Jun 3, 2023
    Cite
    Yan Cui; Chun-Hou Zheng; Jian Yang (2023). Identifying Subspace Gene Clusters from Microarray Data Using Low-Rank Representation [Dataset]. http://doi.org/10.1371/journal.pone.0059377
    Explore at:
    doc (available download formats)
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Yan Cui; Chun-Hou Zheng; Jian Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identifying subspace gene clusters from the gene expression data is useful for discovering novel functional gene interactions. In this paper, we propose to use low-rank representation (LRR) to identify the subspace gene clusters from microarray data. LRR seeks the lowest-rank representation among all the candidates that can represent the genes as linear combinations of the bases in the dataset. The clusters can be extracted based on the block diagonal representation matrix obtained using LRR, and they can well capture the intrinsic patterns of genes with similar functions. Meanwhile, the parameter of LRR can balance the effect of noise so that the method is capable of extracting useful information from the data with high level of background noise. Compared with traditional methods, our approach can identify genes with similar functions yet without similar expression profiles. Also, it could assign one gene into different clusters. Moreover, our method is robust to the noise and can identify more biologically relevant gene clusters. When applied to three public datasets, the results show that the LRR based method is superior to existing methods for identifying subspace gene clusters.

  14. Microarray Biochips Market Analysis North America, Europe, Asia, Rest of...

    • technavio.com
    Updated Feb 10, 2022
    Cite
    Technavio (2022). Microarray Biochips Market Analysis North America, Europe, Asia, Rest of World (ROW) - US, UK, Germany, China, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/microarray-biochips-market-industry-analysis
    Explore at:
    Dataset updated
    Feb 10, 2022
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    United States, Global
    Description

    Microarray Biochips Market Size 2024-2028

    The microarray biochips market size is forecast to increase by USD 17.28 billion, at a CAGR of 22.2% between 2023 and 2028.
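
    As a rough consistency check on these figures, the base-year market size implied by the stated increase and growth rate can be back-calculated. The sketch assumes annual compounding over the five years from 2023 to 2028 and that the USD 17.28 billion is the total increase over that period. Both are assumptions about how the report defines its numbers, and the function name is hypothetical.

```python
def implied_base_size(increase, cagr, years):
    """Base value B such that B * ((1 + cagr) ** years - 1) equals the increase."""
    return increase / ((1 + cagr) ** years - 1)


# Increase in USD billion, CAGR as a fraction, 2023 -> 2028 taken as 5 years.
base_2023 = implied_base_size(17.28, 0.222, 5)
size_2028 = base_2023 + 17.28
```

    Under these assumptions the implied 2023 market size comes out at roughly USD 10 billion. The figure is derived here and is not stated in the report summary.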

    The market is characterized by a growing number of collaborations among key players, which is expanding market presence and driving innovation. This strategic approach is essential in the capital-intensive market, where significant investments are required for research and development. A notable trend in the market is the emergence of Label-One-Component (LOC) technology, offering advantages such as improved sensitivity and specificity. However, the high cost of microarray biochips remains a significant challenge for market growth. Companies seeking to capitalize on opportunities must navigate this obstacle by focusing on cost reduction through economies of scale and process optimization. Additionally, collaborations and partnerships can help share research and development costs and accelerate time-to-market for innovative products. The strategic landscape of the market is dynamic, with ongoing advancements in technology and a growing demand for personalized medicine, creating opportunities for companies to differentiate themselves and gain a competitive edge.

    What will be the Size of the Microarray Biochips Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.
    The market continues to evolve, driven by advancements in technology and expanding applications across various sectors. Protein microarray technology, a crucial component, enables high-throughput analysis of protein-protein interactions and antibody discovery. Reproducibility metrics and spot morphology analysis ensure consistency and accuracy in data generation. Label incorporation methods, such as biotinylated target cDNA and reverse transcription PCR, facilitate efficient probe attachment. Gene ontology enrichment and pathway analysis tools provide insights into biological functions and molecular interactions. Data mining algorithms, including clustering algorithms and fold change calculations, facilitate pattern recognition and discovery. Microarray data normalization techniques, such as cDNA microarray platforms and genomic DNA extraction, ensure data consistency. Microarray experimental design, hybridization kinetics, and high-throughput screening are essential for optimizing data generation and analysis. Single nucleotide polymorphism (SNP) detection and comparative genomic hybridization offer valuable insights into genetic variations. Data quality assessment, signal-to-noise ratios, and background correction methods ensure data accuracy and reliability. In situ hybridization and fluorescence detection methods facilitate visualization and analysis of gene expression at the cellular level. Differential gene expression analysis provides insights into disease mechanisms and therapeutic targets. Microarray scanner systems and image analysis software facilitate efficient and accurate data analysis. DNA microarray technology continues to evolve, offering exciting possibilities for research and diagnostic applications.

    How is this Microarray Biochips Industry segmented?

    The microarray biochips industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    Application: Drug discovery and development; Diagnostics and treatments; Research and consumables; Forensic medicines; Others
    Geography: North America (US); Europe (Germany, UK); APAC (China, Japan); Rest of World (ROW)

    By Application Insights

    The drug discovery and development segment is estimated to witness significant growth during the forecast period. The market is witnessing significant growth due to its increasing application in drug discovery, driven by the rising preference for personalized medicines. With the global population aging, the demand for better healthcare solutions is escalating, leading manufacturers to continually innovate and improve microarray technology. In genomics and proteomics, microarray biochips are increasingly utilized, further fueling market growth. Advancements in protein microarray technology ensure greater reproducibility and accuracy, while spot morphology analysis and label incorporation enhance data reliability. Gene ontology enrichment and pathway analysis tools enable deeper insights into biological processes, and clustering algorithms facilitate the identification of complex relationships between genes. Genomic DNA extraction and microarray data normalization are crucial steps in ensuring data quality, while high-throughput screening and single nucleotide polymorphism analysis accelerate research. Image analysis software, biotinylated target cDNA, rever

  15. SP500.xvmz

    • figshare.com
    application/gzip
    Updated Oct 30, 2022
    Cite
    James Li (2022). SP500.xvmz [Dataset]. http://doi.org/10.6084/m9.figshare.21433071.v1
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    Oct 30, 2022
    Dataset provided by
    figshare
    Authors
    James Li
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Test dataset

  16. Spatial statistical tools for genome-wide mutation cluster detection under a...

    • plos.figshare.com
    tiff
    Updated Jun 1, 2023
    Cite
    Bin Luo; Alanna K. Edge; Cornelia Tolg; Eva A. Turley; C. B. Dean; Kathleen A. Hill; R. J. Kulperger (2023). Spatial statistical tools for genome-wide mutation cluster detection under a microarray probe sampling system [Dataset]. http://doi.org/10.1371/journal.pone.0204156
    Explore at:
    tiff (available download formats)
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Bin Luo; Alanna K. Edge; Cornelia Tolg; Eva A. Turley; C. B. Dean; Kathleen A. Hill; R. J. Kulperger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mutation cluster analysis is critical for understanding certain mutational mechanisms relevant to genetic disease, diversity, and evolution. Yet, whole genome sequencing for detection of mutation clusters is prohibitively expensive for most organisms and population surveys. Single nucleotide polymorphism (SNP) genotyping arrays, like the Mouse Diversity Genotyping Array, offer a low-cost alternative, screening for mutations at hundreds of thousands of loci across the genome using experimental designs that permit capture of de novo mutations in any tissue. Formal statistical tools for genome-wide detection of mutation clusters under a microarray probe sampling system are yet to be established. A challenge in the development of statistical methods is that microarray detection of mutation clusters is constrained to select SNP loci captured by probes on the array. This paper develops a Monte Carlo framework for cluster testing and assesses test statistics for capturing potential deviations from spatial randomness which are motivated by, and incorporate, the array design. While null distributions of the test statistics are established under spatial randomness via the homogeneous Poisson process, power performance of the test statistics is evaluated under postulated types of Neyman-Scott clustering processes through Monte Carlo simulation. A new statistic is developed and recommended as a screening tool for mutation cluster detection. The statistic is demonstrated to be excellent in terms of its robustness and power performance, and useful for cluster analysis in settings of missing data. The test statistic can also be generalized to any one dimensional system where every site is observed, such as DNA sequencing data. The paper illustrates how the informal graphical tools for detecting clusters may be misleading. The statistic is used for finding clusters of putative SNP differences in a mixture of different mouse genetic backgrounds and clusters of de novo SNP differences arising between tissues with development and carcinogenesis.

  17. Peripheral Blood Mononuclear Cell Gene Expression Profiles May Predict Poor...

    • omicsdi.org
    xml
    + more versions
    Cite
    Jose Herazo Maya, JOSE DAVID HERAZO MAYA, Peripheral Blood Mononuclear Cell Gene Expression Profiles May Predict Poor Outcome in Idiopathic Pulmonary Fibrosis [Agilent] [Dataset]. https://www.omicsdi.org/dataset/arrayexpress-repository/E-GEOD-28042
    Explore at:
    Available download formats: xml
    Authors
    Jose Herazo Maya, JOSE DAVID HERAZO MAYA
    Variables measured
    Transcriptomics, Multiomics
    Description

    Background: In this study we aimed to identify peripheral blood mononuclear cell (PBMC) gene expression profiles predictive of poor outcomes in idiopathic pulmonary fibrosis (IPF). Methods: Microarray analyses of PBMC were performed in 120 patients from discovery (n=45) and replication cohorts (n=75). Genes and pathways associated with transplant-free survival (TFS) were identified and confirmed by qRT-PCR. Findings: 52 genes were predictive of TFS in a discovery cohort (FDR<5%, Cox score above 2.5 or below -2.5). Clustering the replication cohort samples using these genes distinguished two patient groups with significantly different TFS (hazard ratio 1.96, 95%CI 1.01-3.8, P=0.018). Decreased expression of “The co-stimulatory signaling during T cell activation” Biocarta pathway and in particular CD28, ICOS, LCK and ITK was associated with shorter TFS times in each cohort (FDR<5%). qRT-PCR expression of CD28, ICOS, LCK and ITK correlated with the microarray results in the discovery cohort (P<0.05) and their decreased expression was predictive of shorter TFS in the replication cohort (P<0.05). A genomic and clinical model demonstrated an area under the ROC curve of 78.5% at 2.4 months for death and lung transplant prediction. Interpretation: Our results suggest that CD28, ICOS, LCK and ITK are outcome biomarkers in IPF.

    PBMC from 75 patients with the diagnosis of IPF were obtained within 30 minutes from blood draw. Total RNA was extracted, labeled and hybridized to Agilent Whole Human Genome Oligo Microarray, 4 x 44K. Patients were followed from blood draw until death, transplant or last follow up. Hierarchical clustering and gene-set analysis with censored outcome data were used to study the association of gene expression and outcome in this cohort (replication cohort).

  18. Four subgroups by gene expression profile correlate with biological and...

    • omicsdi.org
    xml
    Cite
    Masahiro Inoue, Four subgroups by gene expression profile correlate with biological and clinical features in colorectal cancer. [Dataset]. https://www.omicsdi.org/dataset/arrayexpress-repository/E-GEOD-33193
    Explore at:
    Available download formats: xml
    Authors
    Masahiro Inoue
    Variables measured
    Transcriptomics
    Description

    (Purpose) Biological classification of colorectal cancer (CRC) can help to understand its heterogeneous background. The purpose of this study is to classify CRC based on gene expression profiles using formalin-fixed paraffin-embedded (FFPE) samples and to correlate subgroups of CRC with biological features and clinical outcomes. (Results) CRC was clustered into four subgroups by an unsupervised hierarchical clustering method. These subgroups show different biological and clinical features. (Conclusion) Gene expression profiles of CRC using FFPE samples distinguish four subgroups that had different biological features and clinical outcomes. These subgroups may explain the heterogeneity of CRC and be useful clinical biomarkers. Patients and Methods: One hundred patients with unresectable and advanced or recurrent CRC who underwent surgical resection from 1998 to 2010 were enrolled in this study. RNA extracted from FFPE samples was subjected to gene expression microarray. After comprehensive gene expression analysis, CRC were classified by unsupervised hierarchical clustering and principal component analysis (PCA). Mutation analysis of the KRAS, BRAF, PIK3CA and TP53 genes was performed by direct DNA sequencing. Correlations between the biological information, clinicopathological factors and clinical outcomes were analyzed.

  19. Suppression of breast tumor growth and metastasis by an engineered...

    • datamed.org
    Updated May 2, 2014
    + more versions
    Cite
    (2014). Suppression of breast tumor growth and metastasis by an engineered transcription factor [Dataset]. https://datamed.org/display-item.php?repository=0006&id=5913b7035152c62a9fc22964&query=MIR363
    Explore at:
    Dataset updated
    May 2, 2014
    Description

    Abstract: Maspin is a tumor and metastasis suppressor playing an essential role as gatekeeper of tumor progression. It is highly expressed in epithelial cells but is silenced in the onset of metastatic disease by epigenetic mechanisms. Reprogramming of Maspin epigenetic silencing offers a therapeutic potential to lock metastatic progression. Herein we have investigated the ability of the Artificial Transcription Factor 126 (ATF-126) designed to upregulate the Maspin promoter to inhibit tumor progression in pre-established breast tumors in immunodeficient mice. ATF-126 was transduced in the aggressive, mesenchymal-like and triple negative breast cancer line, MDA-MB-231. Induction of ATF expression in vivo by Doxycycline resulted in 50% reduction in tumor growth and totally abolished tumor cell colonization. Genome-wide transcriptional profiles of ATF-induced cells revealed a gene signature that was found over-represented in estrogen receptor positive (ER+) “Normal-like” intrinsic subtype of breast cancer and in poorly aggressive, ER+ luminal A breast cancer cell lines. Comparison of the transcriptional profiles of ATF-126 and Maspin cDNA defined an overlapping 19-gene signature, comprising novel targets downstream of the Maspin signaling cascade. Our data suggest that Maspin up-regulates downstream tumor and metastasis suppressor genes that are silenced in breast cancers, and are normally expressed in the neural system, including CARNS1, SLC8A2 and DACT3. In addition, ATF-126 and Maspin cDNA induction led to the re-activation of tumor suppressive miRNAs also expressed in neural cells, such as miR-1 and miR-34, and to the down-regulation of potential oncogenic miRNAs, such as miR-10b, miR-124, and miR-363. As expected from its over-representation in ER+ tumors, the ATF-126-gene signature predicted favorable prognosis for breast cancer patients. Our results describe for the first time an ATF able to reduce tumor growth and metastatic colonization by epigenetic reactivation of a dormant, normal-like, and more differentiated gene program.

    A total of six cell lines were used for gene expression analyses: CONTROL –DOX, CONTROL +DOX, ATF-126 –DOX, ATF-126 +DOX (all with 3 technical replicates), p-RetoX-Tight-Maspin –DOX, and p-RetoX-Tight-Maspin +DOX (with 2 technical replicates). For each cell line, total RNA was purified, amplified, labeled, and hybridized [46] using Agilent 4X44K oligo microarrays (Agilent Technologies, United States). The probes/genes were filtered by requiring the lowest normalized intensity values in both –DOX and +DOX samples to be >10. The normalized log2 ratios (Cy5 sample/Cy3 control) of probes mapping to the same gene were averaged to generate independent expression estimates. We also used available microarrays from the breast cancer cell lines [21], the UNC337-patient [20], the MERGE 550-patient dataset [47] and the NKI (295 patients [48,49]). All microarray cluster analyses were displayed using Java Treeview version 1.1.3. Average-linkage hierarchical clustering was performed using Cluster v2.12 [50]. ANOVA tests for gene expression data were performed using R.

  20. Children's Oncology Group Study 9906 for High-Risk Pediatric ALL

    • data.niaid.nih.gov
    Updated Nov 8, 2019
    + more versions
    Cite
    Willman CL; Ar K; Atlas SR; Bedrick EJ; Bhojwani D; Borowitz MJ; Bowman WP; Camitta B; Carroll AJ; Carroll WL; Chen I; Davidson GS; Devidas M; Harvey RC; Hunger SP; Kang H; Murphy M; Pullen J; Reaman GH; Wang X; Wilson CS (2019). Children's Oncology Group Study 9906 for High-Risk Pediatric ALL [Dataset]. https://data.niaid.nih.gov/resources?id=gse11877
    Explore at:
    Dataset updated
    Nov 8, 2019
    Dataset provided by
    UNM Health Sciences Center
    Authors
    Willman CL; Ar K; Atlas SR; Bedrick EJ; Bhojwani D; Borowitz MJ; Bowman WP; Camitta B; Carroll AJ; Carroll WL; Chen I; Davidson GS; Devidas M; Harvey RC; Hunger SP; Kang H; Murphy M; Pullen J; Reaman GH; Wang X; Wilson CS
    Description

    PAPER 1: "Identification of novel subgroups of high-risk pediatric precursor B acute lymphoblastic leukemia (B-ALL) by unsupervised microarray analysis: clinical correlates and therapeutic implications. A Children's Oncology Group (COG) study."

    ABSTRACT: We examined gene expression profiles of pre-treatment specimens from 207 patients from the COG P9906 study to identify signatures of children with high risk B-precursor acute lymphoblastic leukemia (ALL) and to determine whether the resulting clusters are associated with either specific clinical features or treatment response characteristics. Four unsupervised clustering methods were utilized to classify patients into similar groups. The different clustering algorithms showed significant overlap in cluster membership. Two clusters contained all cases with either t(1;19)(q23;p13) translocations or MLL rearrangements. The other six clusters were novel and had no recurring chromosomal abnormalities or distinctive clinical features. Members of two of these novel clusters had significant survival differences when compared to the overall 4-year relapse-free survival (RFS) of 61%. These included clusters of patients with either significantly better (94.7%) or worse (21.0%) RFS at 4 years. Children of Hispanic/Latino ethnicity were disproportionately present in the poor outcome cluster. The poor outcome cluster represents a novel biologically distinctive subset of B-precursor ALL that may occur at least as frequently as BCR/ABL. Further molecular characterization of this cluster may lead to the discovery of genomic abnormalities that can be targeted to improve the currently dismal outcome for children with this gene signature.

    The sample data have also been used in another study:

    PAPER 2: "Gene expression classifiers for minimal residual disease and relapse free survival improve outcome prediction and risk classification in children with high risk acute lymphoblastic leukemia. A Children's Oncology Group study".

    ABSTRACT

    Background. Nearly 25% of children with B-precursor ALL present with "high-risk" disease (HR-ALL) that is resistant to current therapies. Gene expression profiling may yield molecular classifiers for outcome prediction that can be used to improve risk classification and therapeutic targeting.

    Methods. Expression profiles were obtained in pre-treatment leukemic samples from 207 uniformly treated children with HR-ALL. Relapse free survival (RFS) was 61% at 4 years and flow cytometric measures of minimal residual disease (MRD) at the end of induction (day 29) were predictive of outcome (P<0.001). Molecular classifiers predictive of RFS and MRD were developed using extensive cross-validation procedures.

    Results. A 38 gene molecular risk classifier predictive of RFS (MRC-RFS) distinguished two groups in HR-ALL with different relapse risks: low (4 yr RFS: 81%, n=109) vs. high (4 yr RFS: 50%, n=98) (P<0.0001). In multivariate analysis, the best predictor combined MRC-RFS and day 29 flow MRD data, classifying children into low (87% RFS), intermediate (62% RFS), or high risk (29% RFS) groups (P<0.0001). A 21 gene molecular classifier predictive of MRD could effectively substitute for day 29 flow MRD, yielding a combined classifier that similarly distinguished three risk groups at pre-treatment (low: 82% RFS; intermediate: 63% RFS; and high risk: 45% RFS) (P<0.0001). This combined molecular classifier was further validated on an independent cohort of 84 children with HR-ALL (P = 0.006).

    Conclusions. Molecular classifiers predictive of RFS and MRD can be used to distinguish distinct prognostic groups within HR-ALL, significantly improving risk classification schemes and the ability to prospectively identify children at diagnosis who will respond to or fail current treatment regimens.

    NOTE: Due to Children's Oncology Group (COG) restrictions, outcome and MRD data cannot be provided as part of the covariate data for this dataset at the present time. If you would like to arrange individual access to this data, please contact COG or the PI of this study, Dr. Cheryl Willman, at the University of New Mexico Cancer Center (cwillman@unm.edu) to arrange a collaboration.

    Unsupervised clustering and supervised risk classification analyses of 207 diagnostic samples and associated clinical covariate data. See the Summary for greater details. The data were analyzed using Microarray Suite version 5.0 (MAS 5.0) in the Affymetrix Gene Chip Operating Software Version 1.4. Probe masking was used (see 9906_TT207_Affymetrix_probe_mask.msk, linked below as a supplementary file). Otherwise all Affymetrix default parameter settings were used. Global scaling as the normalization method, with the default target intensity of 500, was used.

Cite
Basel Abu-Jamous; Steven Kelly (2024). Clust_100_GE_datasets [Dataset]. http://doi.org/10.5281/zenodo.1169191

Clust_100_GE_datasets

Explore at:
Available download formats: zip, pdf
Dataset updated
Aug 2, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Basel Abu-Jamous; Steven Kelly
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

100 microarray and RNA-seq gene expression datasets from five model species (human, mouse, fruit fly, arabidopsis plants, and baker's yeast). These datasets represent the benchmark set that was used to test our clust clustering method and to compare it with five widely used clustering methods (MCL, k-means, hierarchical clustering, WGCNA, and self-organising maps). This data resource includes raw data files, pre-processed data files, clustering results, clustering results evaluation, and scripts.

The files are split into three zipped parts, 100Datasets_part_1.zip, 100Datasets_part_2.zip, and 100Datasets_part_3.zip. The contents of the three zipped files should be extracted to a single folder (e.g. 100Datasets).
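
The extraction step described above can be sketched in Python as follows. This is a minimal illustration, assuming the three archives have already been downloaded to the working directory; the helper name `extract_parts` is hypothetical and is not part of the data resource.

```python
import zipfile
from pathlib import Path


def extract_parts(part_paths, target_dir):
    """Extract every archive in part_paths into one shared target folder.

    `extract_parts` is a hypothetical helper written for this sketch;
    it is not one of the scripts shipped with the resource.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for part in part_paths:
        with zipfile.ZipFile(part) as archive:
            # All parts are unpacked side by side into the same folder.
            archive.extractall(target)
    return target
```

A call such as `extract_parts(["100Datasets_part_1.zip", "100Datasets_part_2.zip", "100Datasets_part_3.zip"], "100Datasets")` would then place the contents of all three parts in a single `100Datasets` folder.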

Below is a thorough description of the files and folders in this data resource.

Scripts

The scripts used to apply each one of the clustering methods to each one of the 100 datasets and to evaluate their results are all included in the folder (scripts/).

Datasets and clustering results (folders starting with D)

The datasets are labelled as D001 to D100. Each dataset has two folders: D###/ and D###_Res/, where ### is the number of the dataset. The first folder only includes the raw dataset while the second folder includes the results of applying the clustering methods to that dataset. The files ending with _B.tsv include clustering results in the form of a partition matrix. The files ending with _E include metrics evaluating the clustering results. The files ending with _go and _go_E respectively include the enriched GO terms in the clustering results and evaluation metrics of these GO terms.
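
As a small illustration of the naming scheme above, the sketch below generates the expected pair of folder names for each dataset. The function name `dataset_folders` is hypothetical; only the `D###` and `D###_Res` patterns come from the description.

```python
def dataset_folders(count=100):
    """Return (raw_folder, results_folder) name pairs for D001..D<count>.

    Hypothetical helper: it only builds names following the D###/ and
    D###_Res/ convention and does not touch the file system.
    """
    return [(f"D{i:03d}", f"D{i:03d}_Res") for i in range(1, count + 1)]
```

For example, the first pair is `("D001", "D001_Res")` and the last is `("D100", "D100_Res")`.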

Simultaneous analysis of multiple datasets (folders starting with MD)

As our clust method is designed to be able to extract clusters from multiple datasets simultaneously, we also tested it over multiple datasets. All folders starting with MD_ are related to "multiple datasets (MD)" results. Each MD experiment simultaneously analyses d randomly selected datasets either out of a set of 10 arabidopsis datasets or out of a set of 10 yeast datasets. For each one of the two species, all d values from 2 to 10 were tested, and at each one of these d values, 10 different runs were conducted, where at each run a different subset of d datasets is selected randomly.
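
The experimental design above (d from 2 to 10, with 10 random runs per d value for each species) can be sketched as below. This is an illustration only, not the authors' script: the function name `plan_md_runs`, the seed, and the dataset identifiers are assumptions, and the sketch does not try to force the subsets drawn at different runs to differ.

```python
import random


def plan_md_runs(dataset_ids, d_values=range(2, 11), runs_per_d=10, seed=0):
    """Draw runs_per_d random subsets of size d for every d in d_values.

    Hypothetical sketch of the MD design; the seed and the identifiers
    passed in are placeholders, not values taken from the resource.
    """
    rng = random.Random(seed)
    ids = list(dataset_ids)
    plan = {}
    for d in d_values:
        # Each run samples d datasets without replacement from the pool.
        plan[d] = [sorted(rng.sample(ids, d)) for _ in range(runs_per_d)]
    return plan
```

With a pool of 10 datasets this gives 9 values of d and 10 runs each, that is, 90 runs per species.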

The folders MD_10A and MD_10Y include the full sets of 10 arabidopsis or 10 yeast datasets, respectively. Each folder with the format MD_10#_d#_Res## includes the results of applying the six clustering methods at one of the 10 random runs of one of the selected d values. For example, the "MD_10A_d4_Res03/" folder includes the clustering results of the 3rd random selection of 4 arabidopsis datasets (the letter A in the folder's name refers to arabidopsis).
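
The folder naming convention can be decoded with a short regular expression, sketched below. The pattern and the helper `parse_md_folder` are hypothetical; they assume that the species letter is A or Y and that the run number is written with two digits, as in the example above.

```python
import re

# Assumed pattern for result folders such as "MD_10A_d4_Res03/".
MD_RESULT_PATTERN = re.compile(
    r"^MD_10(?P<species>[AY])_d(?P<d>\d+)_Res(?P<run>\d{2})/?$"
)
SPECIES = {"A": "arabidopsis", "Y": "yeast"}


def parse_md_folder(name):
    """Return species, d and run number for an MD results folder name.

    Returns None for names that do not follow the assumed pattern,
    for example the full-set folders MD_10A and MD_10Y.
    """
    match = MD_RESULT_PATTERN.match(name)
    if match is None:
        return None
    return {
        "species": SPECIES[match["species"]],
        "d": int(match["d"]),
        "run": int(match["run"]),
    }
```

Here `parse_md_folder("MD_10A_d4_Res03/")` describes the 3rd random selection of 4 arabidopsis datasets, matching the example in the text.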

Our clust method is applied directly over multiple datasets where each dataset is in a separate data file. Each "MD_10#_d#_Res##" folder includes these individual files in a sub-folder named "Processed_Data/". However, the other clustering methods only accept a single input data file. Therefore, the datasets are merged first before being submitted to these methods. Each "MD_10#_d#_Res##" folder includes a file "X_merged.tsv" for the merged data.

Evaluation metrics (folders starting with Metrics)

Each clustering results folder (D###_Res or MD_10#_d#_Res##) includes some clustering evaluation files ending with _E. This information is combined into tables for all datasets, and these tables appear in the folders starting with "Metrics_".

Other files and folders

The GO folder includes the reference GO term annotations for arabidopsis and yeast. The Datasets file includes a tab-delimited table describing the 100 datasets. The SearchCriterion file includes the objective methodology of searching the NCBI database to select these 100 datasets. The Specials file includes some special considerations for a couple of datasets that differ a bit from what is described in the SearchCriterion file. The Norm### files and the files in the Reps/ folder describe normalisation codes and replicate structures for the datasets and were fed to the clust method as inputs. The Plots/ folder includes plots of the gene expression profiles of the individual genes in the clusters generated by each one of the 6 methods over each one of the 100 datasets. Only up to 14 clusters per method are plotted.
