https://ega-archive.org/dacs/EGAC00001002224
This dataset contains ATAC-seq data from the MM.1S cell line under ethanol (EtOH, control) or dexamethasone (treatment) conditions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is a fundamental epigenomics approach and has been widely used in profiling the chromatin accessibility dynamics in multiple species. A comprehensive reference of ATAC-seq datasets for mammalian tissues is important for the understanding of regulatory specificity and developmental abnormality caused by genetic or environmental alterations. Here, we report a mouse ATAC-seq atlas by producing a total of 66 ATAC-seq profiles from 20 primary tissues of both male and female mice. The ATAC-seq read enrichment, fragment size distribution, and reproducibility between replicates demonstrated the high quality of the full dataset.
https://ega-archive.org/dacs/EGAC00001000721
Dataset consisting of:
(1) N=234 genome-wide chromatin accessibility (ATAC-seq) profiles for distinct N=21 healthy old and N=28 healthy young subjects. ATAC-seq biological samples provided for the following tissues: PBMC (N=24), CD14+ monocytes (N=18), CD8+ memory T cells (N=7), CD8+ naive T cells (N=7), CD4+ memory T cells (N=7), CD4+ naive T cells (N=7), and naive B cells (N=7).
(2) N=39 genome-wide transcription (RNA-seq) data for distinct N=15 healthy old and N=24 healthy young subjects' PBMCs.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gene transcription is largely regulated by cis-regulatory elements. Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is an emerging technology that can accurately map cis-regulatory elements in animals and plants. However, the presence of cell walls and chloroplasts in plants hinders the extraction of high-quality nuclei, thereby affecting the quality of ATAC-seq data. Meanwhile, it is tricky to perform ATAC-seq on different tissue types, especially those of limited size and amount. Moreover, despite the rapid growth of ATAC-seq datasets from plants, powerful and easy-to-use data analysis pipelines for ATAC-seq, especially for wheat, are lacking. Here, we provide an all-in-one solution for mapping open chromatin in wheat, including both experimental and data analysis procedures. We efficiently obtained nuclei with less cell debris from various wheat tissues. High-quality ATAC-seq data were generated from young spike and ovary, which are hard to harvest. We determined that the saturation sequencing depth of wheat ATAC-seq is about 16 Gb. In particular, we developed a powerful and easy-to-use online pipeline to analyze wheat ATAC-seq data, and this pipeline can be easily extended to other plant species. The method developed here will facilitate the study of plant regulatory genomes, not only for wheat but also for other plant species.
Aire is a transcriptional regulator that induces promiscuous expression of thousands of tissue-restricted antigen (TRA) genes in medullary thymic epithelial cells (mTECs). While the target genes of Aire are well characterized, the transcriptional programs regulating its own expression remain elusive. We used Affymetrix microarrays to analyze the gene expression patterns of Aire-expressing cells (mature mTECs and thymic B cells) and compared them to control counterparts, namely immature mTECs, cortical thymic epithelial cells (cTECs) and splenic B cells. We used the assay for transposase-accessible chromatin using sequencing (ATAC-seq) on the different thymic epithelial cell populations to assess chromatin accessibility around the Aire locus in these cells. Moreover, we used the indexing-first chromatin immunoprecipitation (iChIP) technique to assess the occupancy of the Irf8 transcription factor at the Aire locus. Overall design: Mature EpCAM+MHC-II high mTECs, immature EpCAM+MHC-II low mTECs, and EpCAM+Ly51+ cTECs were flow-sorted from thymi of 6-week-old C57BL/6 mice. These cells were then subjected to ATAC-seq. Typically, 10,000 cells were used per replicate.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains single-cell ATAC sequencing data from nineteen cases of childhood BCP-ALL and four samples of mononuclear cells from normal bone marrow from healthy donors. The dataset is available as raw sequencing reads (fastq; restricted access) or as an annotated ATAC dataset (h5ad). The libraries were prepared according to the manufacturer's instructions (10x Genomics CG000169: Nuclei Isolation for Single Cell ATAC Sequencing; 10x Genomics CG000209: Chromium Single Cell ATAC Reagent Kits v1.1) and sequenced on a NovaSeq 6000.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The assay for transposase-accessible chromatin using sequencing (ATAC-seq) is an efficient and precise method for revealing chromatin accessibility across the genome. Most current ATAC-seq tools follow chromatin immunoprecipitation sequencing (ChIP-seq) strategies that do not consider ATAC-seq-specific properties. To incorporate ATAC-seq-specific quality control and the underlying biology of chromatin accessibility, we developed a bioinformatics tool named ATACgraph for analyzing and visualizing ATAC-seq data. ATACgraph profiles accessible chromatin regions and provides ATAC-seq-specific information, including definitions of nucleosome-free regions (NFRs) and nucleosome-occupied regions. ATACgraph also allows identification of differentially accessible regions between two ATAC-seq datasets. ATACgraph integrates its Docker image with the Galaxy platform to provide an intuitive user experience via a graphical interface. Without tedious installation processes on a local machine or cloud, users can analyze data through activated websites using pre-designed workflows or customized pipelines composed of ATACgraph modules. Overall, ATACgraph is an effective ATAC-seq tool that lets biologists with minimal bioinformatics knowledge analyze chromatin accessibility. ATACgraph can be run on any ATAC-seq data and is not limited to specific genomes. As validation, we demonstrated ATACgraph on the human genome to showcase its functions for ATAC-seq interpretation. This software is publicly accessible and can be downloaded at https://github.com/RitataLU/ATACgraph.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The emergence of eusociality is one of the major events in evolution. Although several previous studies have investigated the mechanisms underlying caste differentiation and social behavior of eusocial insects, including ants and honeybees, the molecular circuits governing the sociality of these insects remain obscure. In this study, we profiled the brain transcriptome and chromatin accessibility of all categories of adult castes (queens, males, gynes and workers) in Monomorium pharaonis, a typical caste-dependent eusocial insect. We created a comprehensive dataset including 16 RNA-seq and 16 ATAC-seq profiles from 4 biological replicates. We also demonstrated strong reproducibility of the datasets and identified specific genes and open chromatin regions in the genome that may be associated with caste differentiation. Overall, our data will be a valuable resource for further study of the mechanisms underlying eusocial insect behavior, particularly the role of the brain in the control of eusociality.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
Transcription factors read the genome, fundamentally connecting DNA sequence to gene expression across diverse cell types. Determining how, where, and when TFs bind chromatin will advance our understanding of gene regulatory networks and cellular behavior. The 2017 ENCODE-DREAM in vivo Transcription-Factor Binding Site (TFBS) Prediction Challenge highlighted the value of chromatin accessibility data to TFBS prediction, establishing state-of-the-art methods for TFBS prediction from DNase-seq. However, the more recent Assay-for-Transposase-Accessible-Chromatin (ATAC)-seq has surpassed DNase-seq as the most widely-used chromatin accessibility profiling method. Furthermore, ATAC-seq is the only such technique available at single-cell resolution from standard commercial platforms. While ATAC-seq datasets grow exponentially, suboptimal motif scanning is unfortunately the most common method for TFBS prediction from ATAC-seq. To enable community access to state-of-the-art TFBS prediction from ATAC-seq, we (1) curated an extensive benchmark dataset (127 TFs) for ATAC-seq model training and (2) built “maxATAC”, a suite of user-friendly, deep neural network models for genome-wide TFBS prediction from ATAC-seq in any cell type. With models available for 127 human TFs, maxATAC is the first collection of high-performance TFBS prediction models for ATAC-seq.
Repository Overview
This repository contains all of the processed training data used by maxATAC for model training and benchmarking. All directories are provided as .tar.gz archives.
In this repository you will find the directories:
ATAC_Peaks: ATAC-seq peak files called with MACS2. These files are generated for the hg38 reference genome. The files have the extension .bed.gz.
ATAC_Signal_File: ATAC-seq signal files. These files have been read-depth normalized and min-max normalized between 0 and 1 using the 99th percentile as the max value. They are presented as bigwig files with a .bw extension.
ChIP_Binding_File: ChIP-seq signal tracks. These files are the binary signal tracks in bigwig format that are found in the ChIP_Peaks directory.
ChIP_Peaks: ChIP-seq peak files. This directory contains the ENCODE IDR peak sets and peak sets created in the maxATAC publication. These files have the extension .bed.gz.
Full_Models: Current set of 127 maxATAC TF models. This directory includes the information for thresholding and the .h5 model files.
hg38: This directory includes the hg38 reference genome information that was used in this publication.
Prediction_and_Benchmarking: This directory contains all of the predictions for chr1 used for benchmarking in a round-robin training approach.
Tn5_CutSites: This directory contains the Tn5 cut sites that have been shifted +4 on the (+) strand and -5 on the (-) strand. The cut sites were then slopped 20 bp using bedtools slop. These files are presented as bed files that have been bzipped. Each file represents an individual biological replicate.
scATAC: This directory includes data used for scATAC-seq based predictions.
For additional details please see the maxATAC GitHub Repository and bioRxiv pre-print.
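The Tn5 cut-site shift and the signal normalization described for the Tn5_CutSites and ATAC_Signal_File directories can be illustrated with a minimal sketch. This is not the maxATAC implementation: the function names, the 0-based half-open (BED-style) coordinates, the clipping at chromosome ends, and the nearest-rank percentile are all assumptions made for illustration.

```python
import math
from typing import List, NamedTuple, Sequence


class CutSite(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED-style; an assumption of this sketch)
    end: int    # exclusive


def shifted_cut_site(chrom: str, five_prime: int, strand: str,
                     chrom_size: int, slop: int = 20) -> CutSite:
    """Shift a read's 5' end by +4 (+ strand) or -5 (- strand), then
    extend the 1 bp site by `slop` bp on each side, clipped to the chromosome."""
    if strand == "+":
        pos = five_prime + 4
    elif strand == "-":
        pos = five_prime - 5
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    # Keep the shifted site on the chromosome (hypothetical edge-case policy).
    pos = min(max(pos, 0), chrom_size - 1)
    return CutSite(chrom, max(0, pos - slop), min(chrom_size, pos + 1 + slop))


def minmax_normalize(values: Sequence[float], percentile: float = 99.0) -> List[float]:
    """Scale values to [0, 1], using the given percentile as the maximum.
    Values above that percentile are capped at 1.0."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))  # nearest-rank
    low, cap = ordered[0], ordered[rank - 1]
    if cap == low:
        return [0.0 for _ in values]
    return [min(1.0, (v - low) / (cap - low)) for v in values]
```

With the default 20 bp slop, each cut site becomes a 41 bp interval unless it is clipped at a chromosome boundary.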
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Clinical subtypes of ATL samples and “closest cell type” computed by our algorithm.
https://ega-archive.org/dacs/EGAC50000000023
ATAC-Seq data for C32, CACO2, CL11, HT29, SW403, SW480, SW948 MSS CRC cell lines, and HCEC-1CT normal colon cell line
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data provided here are part of the Galaxy Training Network tutorial that analyses single-cell ATAC-seq data from the 10x Genomics platform. The original data are from 1k Peripheral Blood Mononuclear Cells (PBMCs) from a Healthy Donor.
Due to time constraints during training, the datasets were subsampled to reads that map to chromosome 21 only.
The 10x Genomics Datasets follow the Creative Commons Attribution license.
There is an additional count matrix in AnnData format created from the full datasets.
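As an illustration of restricting data to chromosome 21, the sketch below filters tab-separated records whose first column is the chromosome name. It is not the tutorial's actual subsampling procedure: the function name, the `chr21` label, and the assumed column layout (chromosome, start, end, barcode, count) are hypothetical choices for this example.

```python
from typing import Iterable, Iterator


def keep_chromosome(lines: Iterable[str], chrom: str = "chr21") -> Iterator[str]:
    """Yield comment lines unchanged and only those records whose first
    tab-separated field equals `chrom` exactly."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines as they are
            continue
        first_field = line.split("\t", 1)[0]
        if first_field == chrom:
            yield line


records = [
    "# example header\n",
    "chr1\t10\t50\tAAAC-1\t2\n",
    "chr21\t100\t180\tAAAG-1\t1\n",
    "chr2\t30\t90\tAAAT-1\t1\n",
]
kept = list(keep_chromosome(records))
```

Matching the whole first field, rather than a string prefix, avoids confusing `chr2` with `chr21`.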
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used for tutorial.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Overview
This item contains references and test datasets for the Cactus pipeline. Cactus (Chromatin ACcessibility and Transcriptomics Unification Software) is an mRNA-Seq and ATAC-Seq analysis pipeline that aims to provide advanced molecular insights on the conditions under study.
Test datasets
The test datasets contain all data needed to run Cactus in each of the 4 supported organisms. This includes ATAC-Seq and mRNA-Seq data (.fastq.gz), parameter files (.yml) and design files (.tsv). They were created for each species by downloading publicly available datasets with fetchngs (Ewels et al., 2020) and subsampling reads to the minimum required to have enough DAS (Differential Analysis Subsets) for enrichment analysis. Datasets downloaded:
- Worm and Humans: GSE98758
- Fly: GSE149339
- Mouse: GSE193393
References
One of the goals of Cactus is to make the analysis as simple and fast as possible for the user while providing detailed insights on molecular mechanisms. This is achieved by parsing all needed references for the 4 ENCODE (Dunham et al., 2012; Stamatoyannopoulos et al., 2012; Luo et al., 2020) and modENCODE (The modENCODE Consortium et al., 2010; Gerstein et al., 2010) organisms (human, M. musculus, D. melanogaster and C. elegans). This parsing step was done with a Nextflow pipeline, with most tools encapsulated within containers for improved efficiency and reproducibility and to allow the creation of customized references. Genomic sequences and annotations were downloaded from Ensembl (Cunningham et al., 2022). The ENCODE API (Luo et al., 2020) was used to download the ChIP-Seq profiles of 2,714 Transcription Factors (TFs) (Landt et al., 2012; Boyle et al., 2014) and chromatin states in the form of 899 ChromHMM profiles (Boix et al., 2021; van der Velde et al., 2021) and 6 HiHMM profiles (Ho et al., 2014). Slim annotations (cell, organ, development, and system) were parsed and used to create groups of ChIP-Seq profiles that share the same annotations, allowing users to analyze only ChIP-Seq profiles relevant to their study. 2,779 TF motifs were obtained from the Cis-BP database (Lambert et al., 2019). GO terms and KEGG pathways were obtained via the R packages AnnotationHub (Morgan and Shepherd, 2021) and clusterProfiler (Yu et al., 2012; Wu et al., 2021), respectively.
Documentation
More information on how to use Cactus and how references and test datasets were created is available on the documentation website: https://github.com/jsalignon/cactus.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Primary pediatric AML patient ATAC-seq data were obtained from Yokohama City University (YCU) and published in:
Yamato G, Kawai T, Shiba N, Ikeda J, Hara Y, Ohki K, Tsujimoto S. I., Kaburagi T, Yoshida K, Shiraishi Y, Miyano S, Kiyokawa N, Tomizawa D, Shimada A, Sotomatsu M, Arakawa H, Adachi S, Taga T, Horibe K, Ogawa S, Hata K, Hayashi Y. Genome-wide DNA methylation analysis in pediatric acute myeloid leukemia. Blood Adv. 2022 Jun 14;6(11):3207-3219. doi: 10.1182/bloodadvances.2021005381. PMID: 35008106; PMCID: PMC9198913.
A modified version of the ATAC-seq Data Processing Pipeline (Reichl, S. et al. Ultimate ATAC-seq Data Processing & Quantification Pipeline. (2024)), available at https://github.com/epigen/atacseq_pipeline, was applied to the raw BAM files.
The pipeline utilized fastp (Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018)) for adapter removal and Bowtie2 (Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012)) for read alignment to the GRCh38 (hg38) human reference genome.
Duplicate marking was performed with samblaster (Faust, G. G. & Hall, I. M. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics 30, 2503–2505 (2014)). The aligned BAM files were sorted, indexed, and filtered for ENCODE blacklisted regions using samtools (Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009)).
Counts over exons were obtained using featureCounts (Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014)).
The table contains the following columns:
Column Name | Description |
---|---|
NCBI_id | RefSeq (NCBI Reference Sequence) accession number for a specific mRNA transcript |
Gene_symbol | Official gene symbol |
ENTREZ_id | Entrez Gene ID |
YCU_NUP98-NSD1+PRDM16high-AM | Read counts per gene for this sample |
YCU_NUP98-NSD1+PRDM16high-HR | Read counts per gene for this sample |
YCU_RUNX1-RUNX1T1-SR | Read counts per gene for this sample |
YCU_t11-19MLL_KA | Read counts per gene for this sample |
YCU_t11-19MLL-NR | Read counts per gene for this sample |
Assay for Transposase-Accessible Chromatin (ATAC) provides a genome-wide, high-resolution view of regions of open chromatin, which are often associated with regulatory activity. The ATAC-seq technology uses a Tn5 transposase loaded with next-generation sequencing adapters to simultaneously fragment regions of open chromatin and attach those adapters.
Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
Tables S1-S5. ATAC-seq and DNA methylation data for vtRNA promoters in primary tumors and normal adjacent tissue samples. CSV spreadsheets: Table_S1_ATAC-seq_data_500bp: All ATAC-seq data of vtRNAs promoter (500bp) data for primary tumor samples; Table_S2_DNA_methylation_500bp: All DNA methylation data and ATAC-seq data of vtRNAs promoter (500bp) data for total primary tumor and normal adjacent samples; Table_S3_DNA_methylation_NORMAL: All DNA methylation data of vtRNAs promoter (500bp) data for normal adjacent samples; Table_S4_DNA_methylation_TUMOR: All DNA methylation data of vtRNAs promoter (500bp) data for primary tumor samples; Table_S5_Normal_&_Tumor_matched: All DNA methylation data of vtRNAs promoter (500bp) data for primary tumor and normal adjacent samples.
Table S6. vtRNA transcription factor binding and KEGG enriched terms. CSV spreadsheets: Table_S6_Binding_Factors: Transcription factors identified in the K562 cell line as ChIP-seq peaks by the ENCODE 3 project; KEEG_terms: enriched KEGG pathway terms (FDR < 0.05).
Tables S7-S8. DNA methylation, ATAC-seq data and associated survival data for primary tumors. CSV spreadsheets: Table_S7_DNA-methylation_Survival_data: All DNA methylation data of vtRNAs promoter (500bp) and survival data for primary tumor samples; Table_S8_ATAC-seq_Survival_data: ATAC-seq data of vtRNAs promoter (500bp) and survival data for primary tumor samples.
Tables S9-S10. Correlation of ATAC-seq values between vtRNA and all genome promoters in primary tumor samples. CSV spreadsheets: Table_S9_ATAC-seq_gene_promoter_spearman_correlation: Spearman correlation values of all promoter genes and vtRNAs in primary tumors samples; Table_S10_vtRNAs_pathway_enrichment_and_cluster_chromosome_localization_analysis: vtRNAs vtRNA1-1, vtRNA1-2, vtRNA1-3 and vtRNA2-1 pathway enrichment and cluster chromosome localization data.
Tables S11-S14. ATAC-seq and DNA methylation data for vtRNA promoters in primary tumors and the associated Immune Subtypes data. CSV spreadsheets: Table_S11_Immune_Subtypes_DNA_methylation_data: All DNA methylation data of vtRNAs promoter (500bp) and Immune Subtypes data for primary tumor samples; Table_S12_Spearman_corr_vtRNAs_Immune_Subtypes_DNA_methylation_data: Spearman correlation values of DNA methylation data of vtRNAs promoter (500bp) and Immune Subtypes data for primary tumor samples; Table_S13_Immune_Subtypes_ATAC-seq_data: All ATAC-seq data of vtRNAs promoter (500bp) and Immune Subtypes data for primary tumor samples; Table_S14_Spearman_corr_vtRNAs_Immune_Subtypes_ATAC-seq_data: Spearman correlation values of ATAC-seq data of vtRNAs promoter (500bp) and Immune Subtypes data for primary tumor samples.
https://www.scilifelab.se/data/restricted-access/
Human breast cancer OMICs data generated for the publication "Solid phase capture and profiling of open chromatin by spatial ATAC"
Abstract from the publication: Current methods for epigenomic profiling are limited in the ability to obtain genome wide information with spatial resolution. Here we introduce spatial ATAC, a method that integrates transposase-accessible chromatin profiling in tissue sections with barcoded solid-phase capture to perform spatially resolved epigenomics. We show that spatial ATAC enables the discovery of the regulatory programs underlying spatial gene expression during mouse organogenesis, lineage differentiation and in human pathology.
Dataset description The dataset includes spatially-resolved chromatin accessibility profiling performed on three fresh-frozen tissue sections of HER2+ breast cancer. We provide raw data in the form of fastq files, along with processed feature barcode matrices, metadata, and photomicrographs of the tissue slices. Additionally the dataset contains spatially-resolved gene expression profiling of tissue sections from the same specimen. For this too, we provide raw and processed data, along with the metadata information.
Spatial transcriptomics data were generated using 10X Genomics' Visium platform, while spatial ATAC data were created using a method introduced in our publication, which relies on an analogous workflow. Samples were sequenced on Illumina Nextseq 550 or 2000 and raw data were processed with CellRanger Gene Expression or ATAC-seq pipelines.
To apply for conditional access to the dataset, please contact datacentre@scilifelab.se.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Assay for Transposase Accessible Chromatin with high-throughput sequencing (ATAC-seq) is a powerful genomic technology that is used for the global mapping and analysis of open chromatin regions. However, to process and analyze such data, users either have to use a number of complicated bioinformatic tools or attempt to use the currently available ATAC-seq analysis software, which is not very user friendly and lacks visualization of the ATAC-seq results. Because of these issues, biologists with minimal bioinformatics background who wish to process and analyze their own ATAC-seq data will find these tasks difficult and will ultimately need to seek help from bioinformatics experts. Moreover, none of the available tools provides a complete solution for ATAC-seq data analysis. Therefore, to enable non-programming researchers to analyze ATAC-seq data on their own, we developed a tool called Graphical User interface for the Analysis and Visualization of ATAC-seq data (GUAVA). GUAVA is standalone software that provides users with a seamless solution from beginning to end, including adapter trimming, read mapping, the identification and differential analysis of ATAC-seq peaks, functional annotation, and the visualization of ATAC-seq results. We believe GUAVA will be a highly useful and time-saving tool for analyzing ATAC-seq data for biologists with minimal or no bioinformatics background. Since GUAVA can also operate through the command line, it can easily be integrated into existing pipelines, thus providing flexibility to users with computational experience.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Because chromatin accessibility provides rich information on the transcription factor (TF) binding process, for a given TF-based raw regulon we first test whether the TF motif is enriched in that regulon. To do this efficiently, we built our own database in BED format, which contains all available TF motifs and their occurrences across the potential binding regions (TSS ± 10 kb) of all human genes.
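A minimal sketch of how a TSS ± 10 kb potential binding region could be derived for one gene, and how a motif occurrence could be checked against it. The function names and the 0-based half-open (BED-style) coordinate convention are assumptions for illustration; the source does not describe how its database was built or which enrichment test is used.

```python
from typing import Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def tss_window(chrom: str, gene_start: int, gene_end: int,
               strand: str, flank: int = 10_000) -> Interval:
    """Return the interval covering the TSS plus `flank` bp on each side.
    The TSS is the first base of the gene on '+' and the last base on '-'."""
    if strand not in ("+", "-"):
        raise ValueError(f"unknown strand: {strand!r}")
    tss = gene_start if strand == "+" else gene_end - 1
    # Clip at the chromosome start; the chromosome end is not clipped here.
    return (chrom, max(0, tss - flank), tss + flank + 1)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


window = tss_window("chr1", 50_000, 60_000, "+")
motif_hit = ("chr1", 41_200, 41_212)  # hypothetical motif occurrence
in_window = overlaps(window, motif_hit)
```

Counting such overlaps per regulon gene would give the inputs for an enrichment test of the analyst's choosing.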