66 datasets found
  1. Additional file 5 of ATAC-seq normalization method can significantly affect...

    • springernature.figshare.com
    txt
    Updated May 30, 2023
    Cite
    Jake J. Reske; Mike R. Wilson; Ronald L. Chandler (2023). Additional file 5 of ATAC-seq normalization method can significantly affect differential accessibility analysis and interpretation [Dataset]. http://doi.org/10.6084/m9.figshare.12177726.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Jake J. Reske; Mike R. Wilson; Ronald L. Chandler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 2. ATACseq_workflow.txt: example machine-readable Fig. 4 workflow including stepwise Unix and R commands for ATAC-seq data processing.

  2. Analysis of multiome sequencing data from neuroblastoma cell lines

    • data.mendeley.com
    Updated Jun 30, 2025
    Cite
    Richard Guyer (2025). Analysis of multiome sequencing data from neuroblastoma cell lines [Dataset]. http://doi.org/10.17632/s2fcfb8phh.2
    Explore at:
    Dataset updated
    Jun 30, 2025
    Authors
    Richard Guyer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R scripts used to process and analyze multiome (joint GEX and ATAC) single-nucleus sequencing data from human neuroblastoma cell lines.

  3. Processed read counts from macrophage RNA-seq and ATAC-seq experiments

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    + more versions
    Cite
    Alasoo, Kaur (2020). Processed read counts from macrophage RNA-seq and ATAC-seq experiments [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_748695
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Gaffney, Daniel
    Alasoo, Kaur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RNA-seq files:

    RNA_count_matrix.txt.gz - raw read counts

    RNA_cqn_matrix.txt.gz - read counts quantile normalised with the cqn R package

    RNA_gene_metadata.txt.gz - information about the genes

    RNA_sample_metadata.txt.gz - information about the samples

    ATAC-seq files:

    ATAC_count_matrix.txt.gz - raw read counts

    ATAC_cqn_matrix.txt.gz - read counts quantile normalised with the cqn R package

    ATAC_peak_metadata.txt.gz - peak coordinates and other metadata

    ATAC_sample_metadata.txt.gz - sample metadata
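
The count matrices listed above are gzipped text files. A minimal sketch of loading one and summing counts per sample follows. It assumes, without confirmation from the dataset description, that the files are tab-delimited with one header row of sample IDs and the feature ID in the first column; the function names are illustrative only.

```python
import csv
import gzip


def read_count_matrix(path):
    """Read a gzipped, tab-delimited count matrix.

    Assumed layout (not confirmed by the dataset description):
    a header row of sample IDs, then one row per feature with the
    feature ID in the first column. Returns (samples, counts), where
    counts maps feature ID -> list of per-sample values.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        rows = [row for row in reader if row]

    # R-style exports sometimes omit the label for the first column,
    # so take the sample IDs from the right-hand end of the header.
    n_values = len(rows[0]) - 1 if rows else 0
    samples = header[-n_values:] if n_values else []
    counts = {row[0]: [float(v) for v in row[1:]] for row in rows}
    return samples, counts


def library_sizes(samples, counts):
    """Sum counts per sample (column totals)."""
    totals = [0.0] * len(samples)
    for values in counts.values():
        for i, value in enumerate(values):
            totals[i] += value
    return dict(zip(samples, totals))
```

Column totals are a quick sanity check before comparing the raw matrix with the cqn-normalised one.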

  4. Additional file 3 of intePareto: an R package for integrative analyses of...

    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Yingying Cao; Simo Kitanovski; Daniel Hoffmann (2023). Additional file 3 of intePareto: an R package for integrative analyses of RNA-Seq and ChIP-Seq data [Dataset]. http://doi.org/10.6084/m9.figshare.13502196.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yingying Cao; Simo Kitanovski; Daniel Hoffmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 3 Results_of_RNASeq_data_analysis. Full list of the results of differential gene analysis with RNA-Seq data.

  5. DoubleChEC program to identify transcription factor binding sites from...

    • dataone.org
    • data.niaid.nih.gov
    • +2 more
    Updated Dec 23, 2023
    Cite
    Jason Brickner (2023). DoubleChEC program to identify transcription factor binding sites from mapped ChEC-seq data [Dataset]. http://doi.org/10.5061/dryad.c866t1gd5
    Explore at:
    Dataset updated
    Dec 23, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Jason Brickner
    Time period covered
    Jan 1, 2023
    Description

    ChIP-seq (chromatin immunoprecipitation followed by sequencing) is commonly used to identify genome-wide protein-DNA interactions. However, ChIP-seq often gives a low yield, which is not ideal for quantitative outcomes. An alternative method to ChIP-seq is ChEC-seq (Chromatin endogenous cleavage with high-throughput sequencing). In this method, the endogenous TF (transcription factor) of interest is fused with MNase (micrococcal nuclease) that non-specifically cleaves DNA near binding sites. Compared to the original ChEC-seq method, the modified version requires far less amplification. Since MACS3 failed to identify peaks in data generated from the modified ChEC-seq method, a new peak finder has been developed specifically for it. There are three functions in the peak_finder/. callpeaks() is used to identify peaks from BAM files. goanalysis() is used to make GO (Gene Ontology) term plots from peaks. bedtomeme() is a wrapper function to perform MEME analysis in R after MEME Suite is inst...

    Excerpted from bioRxiv preprint; see preprint or published paper for references and details.

    Yeast strains: All yeast strains were derived from BY4741. A C-terminal micrococcal nuclease fusion was introduced to the protein of interest through transformation and homologous recombination of PCR-amplified DNA. Primers were designed with 50-bp of homology to the 3’ end of the coding sequence of interest. The 3xFLAG-MNase with a KanR marker was amplified from pGZ108 (Zentner et al., 2015) and transformed into BY4741 as previously described. Successful transformation was confirmed by immunoblotting and PCR, followed by sequencing.

    Lyophilized DNA oligonucleotides were resuspended in molecular-grade water to a concentration of 100 µM. For ligation, the following pair of oligonucleotides were annealed to produce the Y-adapter: Tn5ME-A (5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3’) and Y-Adapt-i5 R (5’-CTGTCTCTTATACACATCTTCATAGTAATCATC-3’).

    For Tn5 Tagmentation, the following i7 oligonucle...

    DoubleChEC TF binding site finder

    Introduction

    ChIP-seq (chromatin immunoprecipitation followed by sequencing) is commonly used to identify genome-wide protein-DNA interactions. However, ChIP-seq often gives a low yield, which is not ideal for quantitative outcomes. An alternative method to ChIP-seq is ChEC-seq (Chromatin endogenous cleavage with high-throughput sequencing). In this method, an endogenous TF (transcription factor) fused to MNase (micrococcal nuclease) cleaves DNA near binding sites. This package is designed to identify high-confidence binding sites from cleavage patterns from ChEC-seq2, a variant form of ChEC-seq.

    There are three functions in the peak_finder/. callpeaks() is used to identify peaks from single-end mapped reads input as BAM files. goanalysis() is used to make GO (Gene Ontology) term plots from peaks. bedtomeme() is a wrapper function to perform MEME analysis in R **after [MEME Suite](https://meme-...

  6. DataSheet1_scATACpipe: A nextflow pipeline for comprehensive and...

    • frontiersin.figshare.com
    zip
    Updated Jun 13, 2023
    Cite
    Kai Hu; Haibo Liu; Nathan D. Lawson; Lihua Julie Zhu (2023). DataSheet1_scATACpipe: A nextflow pipeline for comprehensive and reproducible analyses of single cell ATAC-seq data.ZIP [Dataset]. http://doi.org/10.3389/fcell.2022.981859.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Frontiers
    Authors
    Kai Hu; Haibo Liu; Nathan D. Lawson; Lihua Julie Zhu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single cell ATAC-seq (scATAC-seq) has become the most widely used method for profiling open chromatin landscape of heterogeneous cell populations at a single-cell resolution. Although numerous software tools and pipelines have been developed, an easy-to-use, scalable, reproducible, and comprehensive pipeline for scATAC-seq data analyses is still lacking. To fill this gap, we developed scATACpipe, a Nextflow pipeline, for performing comprehensive analyses of scATAC-seq data including extensive quality assessment, preprocessing, dimension reduction, clustering, peak calling, differential accessibility inference, integration with scRNA-seq data, transcription factor activity and footprinting analysis, co-accessibility inference, and cell trajectory prediction. scATACpipe enables users to perform the end-to-end analysis of scATAC-seq data with three sub-workflow options for preprocessing that leverage 10x Genomics Cell Ranger ATAC software, the ultra-fast Chromap procedures, and a set of custom scripts implementing current best practices for scATAC-seq data preprocessing. The pipeline extends the R package ArchR for downstream analysis with added support to any eukaryotic species with an annotated reference genome. Importantly, scATACpipe generates an all-in-one HTML report for the entire analysis and outputs cluster-specific BAM, BED, and BigWig files for visualization in a genome browser. scATACpipe eliminates the need for users to chain different tools together and facilitates reproducible and comprehensive analyses of scATAC-seq data from raw reads to various biological insights with minimal changes of configuration settings for different computing environments or species. By applying it to public datasets, we illustrated the utility, flexibility, versatility, and reliability of our pipeline, and demonstrated that our scATACpipe outperforms other workflows.

  7. Data from: ETV4 mediates dosage-dependent prostate tumor initiation and...

    • data.niaid.nih.gov
    • datacatalog.mskcc.org
    • +1 more
    zip
    Updated Mar 24, 2023
    Cite
    Dan Li; Yu Chen; Ping Chi; Yu Zhan; Naitao Wang; Fanying Tang; Cindy Lee; Gabriella Bayshtok; Amanda Moore; Elissa Wong; Mohini Pachai; Yuanyuan Xie; Jessica Sher; Jimmy Zhao; Anuradha Gopalan; Joseph Chan; Ekta Khurana; Peter Shepherd; Nora Navone; Makhzuna Khudoynazarova (2023). ETV4 mediates dosage-dependent prostate tumor initiation and cooperates with p53 loss to generate prostate cancer [Dataset]. http://doi.org/10.5061/dryad.v41ns1s0s
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 24, 2023
    Dataset provided by
    Memorial Sloan Kettering Cancer Center
    The University of Texas MD Anderson Cancer Center
    Weill Cornell Medicine
    Authors
    Dan Li; Yu Chen; Ping Chi; Yu Zhan; Naitao Wang; Fanying Tang; Cindy Lee; Gabriella Bayshtok; Amanda Moore; Elissa Wong; Mohini Pachai; Yuanyuan Xie; Jessica Sher; Jimmy Zhao; Anuradha Gopalan; Joseph Chan; Ekta Khurana; Peter Shepherd; Nora Navone; Makhzuna Khudoynazarova
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    The mechanisms underlying ETS-driven prostate cancer initiation and progression remain poorly understood due to a lack of model systems that recapitulate this phenotype. We generated a genetically engineered mouse with prostate-specific expression of the ETS factor, ETV4, at lower and higher protein dosages through mutation of its degron. Lower-level expression of ETV4 caused mild luminal cell expansion without histologic abnormalities and higher-level expression of stabilized ETV4 caused prostatic intraepithelial neoplasia (mPIN) with 100% penetrance within 1 week. Tumor progression was limited by p53-mediated senescence and Trp53 deletion cooperated with stabilized ETV4. The neoplastic cells expressed differentiation markers such as Nkx3.1 recapitulating luminal gene expression features of untreated human prostate cancer. Single-cell and bulk RNA-sequencing showed stabilized ETV4 induced a novel luminal-derived expression cluster with signatures of the cell cycle, senescence, and epithelial to mesenchymal transition. These data suggest that ETS overexpression alone, at sufficient dosage, can initiate prostate neoplasia.

    Methods

    Mouse prostate digestion: Intraperitoneal injection of tamoxifen was administered in 8-week-old mice. 2 weeks after tamoxifen treatment, the mouse prostate was digested for 1 hour with Collagenase/Hyaluronidase (STEMCELL, #07912), and then for 30 minutes with TrypLE™ Express Enzyme (Thermo Fisher, #12605028) at 37°C to isolate single prostate cells. The prostate cells were stained with PE/Cy7 conjugated anti-mouse CD326 (EpCAM) antibody (BioLegend, 118216), and CD326 and EYFP double-positive cells were then sorted by flow cytometry; these are luminal cells mainly from the anterior prostate and dorsal prostate. The mRNA or genomic DNA was extracted from these double-positive cells and then used for ATAC-sequencing and RNA-sequencing analysis.

    ATAC-seq and primary data processing: ATAC-seq was performed as previously described.
    Primary data processing and peak calling were performed using the ENCODE ATAC-seq pipeline (https://github.com/kundajelab/atac_dnase_pipelines). Briefly, paired-end reads were trimmed, filtered, and aligned against mm9 using Bowtie2. PCR duplicates and reads mapped to the mitochondrial chromosome or repeated regions were removed. Mapped reads were shifted +4/-5 to correct for the Tn5 transposase insertion. Peak calling was performed using MACS2, with p-value < 0.01 as the cutoff. Reproducible peaks from two biological replicates were defined as peaks that overlapped by more than 50%. On average, 25 million uniquely mapped read pairs remained after filtering. The distribution of inserted fragment length shows a typical nucleosome banding pattern, and the TSS enrichment score (reads that are enriched around TSS against background) ranges between 28 and 33, suggesting the libraries have high quality and were able to capture the majority of regions of interest.

    Differential peak accessibility: Reads aligned to peak regions were counted using R package GenomicAlignments_v1.12.2. Read count normalization and differential accessibility calling were performed with DESeq2_v1.16.1 in R 3.4.1. Differential peaks were defined as peaks with adjusted p-value < 0.01 and |log2(FC)| > 2. For visualization, coverage bigwig files were generated using the bamCoverage command from deepTools2, normalizing using the size factor generated by DESeq2. The differential ATAC-seq peak density plot was generated with deepTools2, using regions that were significantly more or less accessible in ETV4AAA samples relative to EYFP samples.

    Motif analysis: Motif enrichment analysis was performed using MEME-ChIP 5.0.0 with differentially accessible regions in ETV4AAA relative to EYFP. ATAC-seq footprinting was performed using TOBIAS. First, ATACorrect was run to correct Tn5 bias, followed by ScoreBigwig to calculate footprint scores, and finally BINDetect to generate differential footprints across regions.
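
Two steps from the ATAC-seq processing described above are easy to state in code: the +4/-5 Tn5 offset and the differential-peak cutoffs (adjusted p-value < 0.01 and |log2(FC)| > 2). The following is an illustrative sketch only, not the ENCODE pipeline or the authors' scripts. It assumes fragment coordinates where the left end belongs to the plus-strand read and the right end to the minus-strand read, and the function names are hypothetical.

```python
def shift_fragment_for_tn5(start, end):
    """Apply the +4/-5 Tn5 correction to a paired-end fragment.

    Assumption: `start` is the plus-strand read end and `end` is the
    minus-strand read end, so the left side moves +4 bp and the right
    side moves -5 bp.
    """
    shifted_start = start + 4
    shifted_end = end - 5
    if shifted_end <= shifted_start:
        raise ValueError("fragment is too short to shift")
    return shifted_start, shifted_end


def is_differential_peak(padj, log2_fc, padj_cutoff=0.01, lfc_cutoff=2.0):
    """Return True if a peak passes the thresholds quoted in the text.

    `padj` may be None to represent an NA adjusted p-value; such peaks
    are treated as not differential in this sketch.
    """
    if padj is None:
        return False
    return padj < padj_cutoff and abs(log2_fc) > lfc_cutoff
```

Both comparisons are strict, matching the "<" and ">" in the description, so a peak with |log2(FC)| exactly equal to 2 is not called differential.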
    RNA-seq analysis: The extracted RNA was processed for RNA-sequencing by the Integrated Genomics Core Facility at MSKCC. The libraries were sequenced on an Illumina HiSeq-2500 platform with 51 bp paired-end reads to obtain a minimum yield of 40 million reads per sample. The sequenced data were aligned using STAR v2.3 with GRCm38.p6 as annotation. DESeq2_v1.16.1 was subsequently applied on read counts for normalization and the identification of differentially expressed genes between ETV4AAA and EYFP groups, with an adjusted p-value < 0.05 as the threshold. Genes were ranked by sign(log2(FC)) * (-log(p-value)) as input for GSEA analysis using ‘Run GSEA Pre-ranked’ with 1000 permutations (48). The custom gene sets used in GSEA analysis are shown in Table S2.

    Unsupervised hierarchical clustering: To get an overall sample clustering as part of QC, hierarchical clustering was performed using the pheatmap_v1.0.10 package in R on normalized ATAC-seq or RNA-seq data. It was done using all peaks or all genes, with Spearman or Pearson correlation as the distance metric. To have an overview of the differential gene expression from the RNA-seq data, unsupervised clustering was also performed on a matrix with all samples as columns and scaled normalized read counts of differentially expressed genes between ETV4AAA and EYFP as rows.

    Integrative analysis of ATAC-seq, RNA-seq, and ChIP-seq data: ERG ChIP-seq peaks were called using MACS 2.1, with an FDR cutoff of q < 10^-3 and the removal of peaks mapped to blacklist regions. Reproducible peaks between two biological replicates were identified as ETV4AAA ATAC-seq peaks. ERG ChIP-seq peaks and ETV4AAA ATAC-seq peaks were considered overlapping if peak summits were within 250 bp.
    To determine whether the overlap was significant, enrichment analysis was done using regioneR_v1.8.1 in R, which counted the number of overlapped peaks between a set of randomly selected regions in the genome (excluding blacklist regions) and the ERG ChIP-seq peaks or ETV4AAA ATAC-seq peaks. A null distribution was formed using 1000 permutation tests to compute the p-value and z-score of the original evaluation. To assign ATAC-seq peaks to genes, ChIPseeker_v1.12.1 in R was used. Each peak was unambiguously assigned to one gene with a TSS or 3’ end closest to that peak. Differential gene expression between ETV4AAA and EYFP was evaluated using log2(FC) calculated by DESeq2. p-values were estimated with Wilcoxon rank t-test and Student t-test.

    scRNA-sequencing: Tmprss2-CreERT2, EYFP; Tmprss2-CreERT2, ETV4WT; Tmprss2-CreERT2, ETV4AAA; and Tmprss2-CreERT2, ETV4AAA; Trp53L/L mice were euthanized 2 weeks or 4 months after tamoxifen treatment (n=3 mice for each genotype and time point). After euthanasia, the prostates were dissected out and minced with a scalpel, and then processed for 1 h digestion with collagenase/hyaluronidase (#07912, STEMCELL Technologies) and 30 min digestion with TrypLE (#12605010, Gibco). Live single prostate cells were sorted by flow cytometry as DAPI-negative. For each mouse, 5,000 cells were directly processed with the 10x Genomics Chromium Single Cell 3’ GEM, Library & Gel Bead Kit v3 according to the manufacturer’s specifications. For each sample, 200 million reads were acquired on a NovaSeq platform S4 flow cell. Reads obtained from the 10x Genomics scRNAseq platform were mapped to the mouse genome (mm9) using the Cell Ranger package (10x Genomics). True cells were distinguished from empty droplets using the scCB2 package. The levels of mitochondrial reads and numbers of unique molecular identifiers (UMIs) were similar among the samples, which indicates that there were no systematic biases in the libraries from mice with different genotypes.
    Cells were removed if they expressed fewer than 600 unique genes, fewer than 1,500 total counts, more than 50,000 total counts, or greater than 20% mitochondrial reads. Genes detected in fewer than 10 cells and all mitochondrial genes were removed for subsequent analyses. Putative doublets were removed using the Doublet Detection package. The average gene detection in each cell type was similar among the samples. Combining samples in the entire cohort yielded a filtered count matrix of 48,926 cells by 19,854 genes, with a median of 6,944 counts and a median of 1,973 genes per cell, and a median of 2,039 cells per sample. The count matrix was then normalized to CPM (counts per million) and log2(X+1) transformed for analysis of the combined dataset. The top 1,000 highly variable genes were found using SCANPY (version 1.6.1) (77). Principal Component Analysis (PCA) was performed on the 1,000 most variable genes, with the top 50 principal components (PCs) retained with 29% variance explained. To visualize single cells of the global atlas, we used UMAP projections (https://arxiv.org/abs/1802.03426). We then performed Leiden clustering. Marker genes for each cluster were found with scanpy.tl.rank_genes_groups. Cell types were determined using the SCSA package, an automatic tool based on a score annotation model combining differentially expressed genes (DEGs) and confidence levels of cell markers from both known and user-defined information. Heatmaps were generated for single cells based on log-normalized and scaled expression values of marker genes curated from literature or identified as highly differentially expressed. Differentially expressed genes between different clusters were found using the MAST package and are shown in the heatmap. The logFC of MAST output was used for the ranked gene list in GSEA analysis (48). The custom gene sets used in GSEA analysis are shown in Table S2.
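
This description mentions two ways of building a pre-ranked gene list for GSEA: sign(log2(FC)) * (-log(p-value)) for the bulk RNA-seq, and the MAST logFC for the single-cell data. A small sketch of the first metric follows. The description does not state the logarithm base, so base 10 is assumed here; the p-value floor and the function names are also illustrative choices, not part of the published method.

```python
import math


def preranked_score(log2_fc, p_value, floor=1e-300):
    """Compute sign(log2FC) * -log10(p).

    Assumptions: base-10 logarithm, and p-values of exactly 0 are
    clamped to `floor` to avoid taking log of zero.
    """
    p = max(p_value, floor)
    sign = (log2_fc > 0) - (log2_fc < 0)
    return sign * -math.log10(p)


def rank_genes(stats):
    """Sort genes from most up-regulated to most down-regulated.

    `stats` maps gene name -> (log2 fold change, p-value).
    """
    scored = [(gene, preranked_score(lfc, p)) for gene, (lfc, p) in stats.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, `rank_genes({"A": (2.0, 1e-5), "B": (-1.0, 1e-3), "C": (0.5, 0.1)})` orders the genes A, C, B.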
    Gene imputation was performed using the MAGIC (Markov affinity-based graph imputation of cells) package, and imputed gene expression was used in the heatmap.

    Analysis of public human gene expression datasets: To analyze TP53 RNA expression in human prostate cancer samples, we obtained normalized RNA-seq data from prostate cancer TCGA (www.firebrowse.org) (3). To assess the role of TP53 loss on
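
The cell-level QC thresholds and the CPM plus log2(X+1) normalization from the scRNA-sequencing methods above can be sketched in a few lines. This is a plain-Python illustration under the stated thresholds, not the authors' SCANPY workflow; the function names are hypothetical, and the mitochondrial fraction is assumed to be supplied as a value between 0 and 1.

```python
import math


def passes_cell_qc(n_genes, total_counts, mito_fraction):
    """Keep a cell unless it hits one of the removal criteria in the text:
    fewer than 600 genes, fewer than 1,500 or more than 50,000 total
    counts, or more than 20% mitochondrial reads.
    """
    return (
        n_genes >= 600
        and 1500 <= total_counts <= 50000
        and mito_fraction <= 0.20
    )


def log_cpm(counts):
    """Normalize one cell's counts to CPM, then apply log2(x + 1)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log2(c / total * 1e6 + 1.0) for c in counts]
```

A cell sitting exactly on a threshold (for example 600 genes or 20% mitochondrial reads) is kept, because the description only removes cells strictly beyond each limit.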

  8. Data from: Precise modulation of transcription factor levels identifies...

    • zenodo.org
    application/gzip, bin +1
    Updated Mar 2, 2023
    + more versions
    Cite
    Sahin; Sahin (2023). Precise modulation of transcription factor levels identifies features underlying dosage sensitivity [Dataset]. http://doi.org/10.5281/zenodo.7689948
    Explore at:
    Available download formats: application/gzip, txt, bin
    Dataset updated
    Mar 2, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sahin; Sahin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Processed data and code for "Precise modulation of transcription factor levels reveals drivers of dosage sensitivity," Naqvi et al 2022.

    Count/expression data

    • all.sub.150bpclust.greater2.500bp.merge.ATAC.counts.fulldep.3h.24h.txt.gz - ATAC-seq counts from all samples (SOX9 titration and depletion) over all reproducible ATAC-seq peak regions
    • all.sub.150bpclust.greater2.500bp.merge.k27ac.txt.gz - H3K27ac ChIP-seq counts from SOX9 depletion samples over all reproducible peak regions
    • all.sub.150bpclust.greater2.500bp.merge.SOX9titr.V5.counts.txt.gz - V5 (SOX9) ChIP-seq counts from partial SOX9 titration (100%, 60%, 30%, 0%) over all reproducible peak regions
    • all.sub.150bpclust.greater2.500bp.merge.SOX9titr.TWIST1.in.counts.tab.txt.gz - TWIST1 and input ChIP-seq counts from partial SOX9 titration (100%, 60%, 30%, 0%) over all reproducible peak regions
    • rna.salmon.7rep.txi.counts.txt.gz - RNA-seq counts from SOX9 titration samples
    • rna.salmon.7rep.txi.abundance.txt.gz - RNA-seq TPM values from SOX9 titration samples
    • slam.tcreadcount.txt.gz - SLAM-seq T-C conversion-containing read counts (representing newly transcribed mRNAs) from SOX9 depletion samples
    • slam.readcount.txt.gz - SLAM-seq read counts (representing all mRNAs) from SOX9 depletion samples

    Metadata

    • all.protcod.gene.features.txt.gz - Features of interest for all protein-coding genes
    • all.sub.150bpclust.greater2.500bp.merge.features.txt.gz - Features of interest for all reproducible peak regions
    • atac_depletion_3h_24h_design.txt - design matrix for ATAC-seq SOX9 depletion samples
    • atac_titration_48h_design.txt - design matrix for ATAC-seq SOX9 titration samples
    • Homo_sapiens.GRCh38.cdna.all.txid2gene.id.symbol.type.txt.gz - Ensembl transcript types (for filtering to protein-coding genes in various analyses)
    • k27_depletion_3h_24h_design.txt - design matrix for H3K27ac ChIP-seq SOX9 depletion samples
    • v5_sox9titration_design.txt - design matrix for V5 (SOX9) ChIP-seq SOX9 partial titration samples
    • twist1_sox9titration_design.txt - design matrix for TWIST1 ChIP-seq SOX9 partial titration samples
    • rna_titration_48h_design.txt - design matrix for RNA-seq SOX9 titration samples
    • slam_depletion_3h_24h_design.txt - design matrix for SLAM-seq SOX9 depletion samples
    • facialgwas_snpia_ld0.5.hg38.bed - SNPs in LD (r2 > 0.5) with any of the facial GWAS lead SNPs in Supplementary Table 2 of Naqvi, Hoskens, et al, Annu Rev. Hum Genet. Genom. 2022.
    • facialgwas_prsendo_7e5_either_snpia_ld0.5.hg38.bed - SNPs in LD (r2 > 0.5) with the subset of the same facial GWAS SNPs that show significant (p-value < 7e-05, approximately corresponding to a Bonferroni-corrected p-value of 0.01) association with the PRS endophenotype GWAS in either the US or UK cohort.

    Scripts

    • atac_deseq_fitmodels_bs_parallel.R - R code for fitting bootstrapped Hill equations to all SOX9-dependent REs (computationally intensive, so has been coded for parallelization over multiple cores)
      • Input: all.sub.150bpclust.greater2.500bp.merge.ATAC.counts.fulldep.3h.24h.txt.gz, atac_titration_48h_design.txt
      • Output: enh_linear_sig_aic_bsmat.txt, enh_linear_sig_aic_bsmat_enhind.txt
    • atac_deseq_fitmodels.R - R code for fitting Hill equations (no bootstrap) to all SOX9-dependent REs
      • Input: all.sub.150bpclust.greater2.500bp.merge.ATAC.counts.fulldep.3h.24h.txt.gz, atac_titration_48h_design.txt
      • Output: enh_linear_sig_aic.rds
    • atac_k27_depletion_deseq.R - R code for DESeq2 analysis of ATAC and H3K27ac ChIP SOX9 depletion (3h and 24h)
      • Input: all.sub.150bpclust.greater2.500bp.merge.ATAC.counts.fulldep.3h.24h.txt.gz, all.sub.150bpclust.greater2.500bp.merge.k27ac.txt.gz, atac_depletion_3h_24h_design.txt
      • Output: atac_depletion_3h_24h_deseq.txt, k27_depletion_3h_24h_deseq.txt
    • v5_twist1_sox9titration_deseq.R - R code for DESeq2 analysis of V5 (SOX9) and TWIST1 ChIP in partial SOX9 titration (100%, 60%, 30%, 0%)
      • Input: all.sub.150bpclust.greater2.500bp.merge.SOX9titr.V5.counts.txt.gz, all.sub.150bpclust.greater2.500bp.merge.SOX9titr.TWIST1.in.counts.tab.txt.gz, v5_sox9titration_design.txt, twist1_sox9titration_design.txt
      • Output: v5_sox9titration_deseq.txt, twist1_sox9titration_deseq.txt
    • drm.R - Modified version of code from the drc package; install drc with this version to avoid errors
    • group_comparisons.Rmd - R code to compare computed parameters (i.e. ED50, Hill) between sets of REs/genes utilizing bootstrap information
      • Input: enh_linear_sig_aic_bsmat_enhind.txt, enh_linear_sig_aic_bsmat.txt.gz, enh_linear_sig_aic.rds, gene_linear_sig_aic_bsmat_enhind.txt, gene_linear_sig_aic_bsmat.txt.gz, gene_linear_sig_aic.rds, all.sub.150bpclust.greater2.500bp.merge.features.txt.gz, all.protcod.gene.features.txt.gz
      • Uses: summarize_bs_helper.R
    • plot_re_gene_fits.Rmd - R code for plotting individual RE/gene counts and Hill/linear fits
      • Input: all.sub.150bpclust.greater2.500bp.merge.ATAC.counts.fulldep.3h.24h.txt.gz, atac_titration_48h_design.txt, rna.salmon.7rep.txi.counts.txt.gz, rna.salmon.7rep.txi.abundance.txt.gz, rna_titration_48h_design.txt
    • rna_deseq_fitmodels_bs_parallel.R - R code for fitting bootstrapped Hill equations to all SOX9-dependent genes (computationally intensive, so has been coded for parallelization over multiple cores)
      • Input: rna.salmon.7rep.txi.counts.txt.gz, rna.salmon.7rep.txi.abundance.txt.gz, rna_titration_48h_design.txt
      • Output: gene_linear_sig_aic_bsmat.txt, gene_linear_sig_aic_bsmat_enhind.txt
    • rna_deseq_fitmodels.R - R code for fitting Hill equations (no bootstrap) to all SOX9-dependent genes
      • Input: rna.salmon.7rep.txi.counts.txt.gz, rna.salmon.7rep.txi.abundance.txt.gz, rna_titration_48h_design.txt
      • Output: gene_linear_sig_aic.rds
    • slam_depletion_deseq.R - R code for DESeq2/sva analysis of SLAM-seq SOX9 depletion (3h and 24h)
      • Input: slam.tcreadcount.txt.gz, slam_depletion_3h_24h_design.txt
      • Output: slam_depletion_3h_24h_deseq.txt
    • summarize_bs_helper.R - Helper functions for group_comparisons.Rmd
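
Several of the scripts above fit Hill equations and report parameters such as ED50 and the Hill coefficient. For orientation, here is a minimal sketch of a four-parameter Hill curve and a coarse grid-search fit. It is not the authors' R code, which appears to rely on the drc package; the parameterization, the fixed `bottom` and `top` values, and the grid-search approach are simplifying assumptions for illustration.

```python
def hill(x, bottom, top, ed50, h):
    """Four-parameter Hill curve: response rises from `bottom` to `top`,
    reaching the midpoint at x == ed50, with steepness `h`."""
    if x <= 0:
        return bottom
    return bottom + (top - bottom) * x**h / (ed50**h + x**h)


def fit_hill_grid(xs, ys, bottom, top, ed50_grid, hill_grid):
    """Pick the (ED50, Hill coefficient) pair with the lowest squared error.

    `bottom` and `top` are held fixed here for simplicity; a real fit
    would usually estimate all parameters jointly with an optimizer.
    """
    best = None
    for ed50 in ed50_grid:
        for h in hill_grid:
            sse = sum((y - hill(x, bottom, top, ed50, h)) ** 2 for x, y in zip(xs, ys))
            if best is None or sse < best[0]:
                best = (sse, ed50, h)
    return {"ed50": best[1], "hill": best[2], "sse": best[0]}


# Illustrative dosage levels (percent of full dosage) and noise-free responses.
xs = [0, 8, 25, 52, 78, 100]
ys = [hill(x, 0.0, 1.0, 30, 2.0) for x in xs]
fit = fit_hill_grid(xs, ys, 0.0, 1.0, range(5, 100, 5), [0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
```

On this noise-free toy input the grid search recovers the generating values (ED50 of 30, Hill coefficient of 2.0). Bootstrapping, as in the `*_bs_parallel.R` scripts, would repeat a fit like this on resampled replicates to obtain parameter uncertainty.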

    Intermediate/output files (some files are gzipped to save space; the R scripts that produce them do not gzip their output, but they expect gzipped input where indicated)

    • atac_depletion_3h_24h_deseq.txt.gz - DESeq2 output of ATAC SOX9 depletion (3h and 24h)
    • enh_linear_sig_aic_bsmat_enhind.txt - index to name file for SOX9-dependent RE bootstrap output
    • enh_linear_sig_aic_bsmat.txt.gz - SOX9-dependent RE bootstrap output
    • enh_linear_sig_aic.rds - Parameters from Hill equation fit on all SOX9-dependent REs (no bootstrap) (RDS file)
    • gene_linear_sig_aic_bsmat_enhind.txt - index to name file for SOX9-dependent gene bootstrap output
    • gene_linear_sig_aic_bsmat.txt.gz - SOX9-dependent gene bootstrap output
    • gene_linear_sig_aic.rds - Parameters from Hill equation fit on all SOX9-dependent genes (no bootstrap) (RDS file)
    • k27_depletion_3h_24h_deseq.txt.gz - DESeq2 output of H3K27ac ChIP-seq SOX9 depletion (3h and 24h)
    • slam_depletion_3h_24h_deseq.txt.gz - DESeq2 output of SLAM-seq SOX9 depletion (3h and 24h)
    • v5_sox9titration_deseq.txt.gz - DESeq2 output of V5 (SOX9) ChIP-seq in partial SOX9 titration (100%, 60%, 30%, 0%)
    • twist1_sox9titration_deseq.txt.gz - DESeq2 output of TWIST1 ChIP-seq in partial SOX9 titration (100%, 60%, 30%, 0%)

    chromatin_predictions.tar.gz (self-contained folder for chromatin-based predictions of gene expression change) contains:

    • ABC_6conc.sh - Bash script to calculate predicted gene expression change based on ATAC-seq fold-change at each of five SOX9 concentrations (warning: creates a number of very large additional intermediate output files). Requires as input all files in this folder except for all.sub.150bpclust.greater2.500bp.merge.ABC.5Mb.power-0.7.norm.6conc.all.total.txt
    • all.sub.150bpclust.greater2.500bp.merge.ABC.5Mb.power-0.7.norm.6conc.all.total.txt - ATAC-based predicted fold-change of all genes at each of five SOX9 concentrations (78, 52, 25, 8, 0, in that order), relative to 100% SOX9
    • all.sub.150bpclust.greater2.500bp.merge.ATAC.DMSO.counts.txt - ATAC-seq counts over all reproducible peak regions in undepleted samples
    • all.sub.150bpclust.greater2.500bp.merge.bed - bed file of all reproducible peak regions
    • all.sub.150bpclust.greater2.500bp.merge.deseq.allconc.lfc.txt - DESeq2 output from ATAC SOX9 titration, comparing each lowered SOX9 concentration to 100% SOX9
    • all.sub.150bpclust.greater2.500bp.merge.k27ac.dmso.counts.txt - H3K27ac ChIP-seq counts over all reproducible peak regions in undepleted samples
    • hg38_refGene_TSS_collapsed.bed - collapsed TSSs for all genes
    • hg38.genome - genome file

  9. ATAC-seq analysis for intravesical BCG in bladder cancer

    • data.niaid.nih.gov
    Updated Apr 14, 2023
    Cite
    Diem, Gabriel (2023). ATAC-seq analysis for intravesical BCG in bladder cancer [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7825967
    Explore at:
    Dataset updated
    Apr 14, 2023
    Dataset provided by
    Pichler, Renate
    Thurnher, Martin
    Diem, Gabriel
    Posch, Wilfried
    Hackl, Hubert
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BCG vaccination can boost innate immune responses via trained immunity (TI), resulting in an increased resistance to respiratory viral infections. Assay for transposase-accessible chromatin (ATAC), including tagmentation, library preparation and sequencing, was performed by Genewiz (Azenta Life Sciences, MA, USA) on PBMCs from two BCG-treated NMIBC patients at baseline and during BCG (mid), as well as from three healthy donors. This dataset includes the pipeline for preprocessing the raw data, including mapping to the hg38 reference genome using bowtie2 and peak calling by MACS2. Differentially accessible regions in proximity to annotated genes between the two time points (during BCG versus baseline) were identified using the R packages csaw and edgeR, and the resulting data files are provided.

  10. Additional file 2 of intePareto: an R package for integrative analyses of...

    • figshare.com
    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Yingying Cao; Simo Kitanovski; Daniel Hoffmann (2023). Additional file 2 of intePareto: an R package for integrative analyses of RNA-Seq and ChIP-Seq data [Dataset]. http://doi.org/10.6084/m9.figshare.13502193.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    figshare
    Figshare (http://figshare.com/)
    Authors
    Yingying Cao; Simo Kitanovski; Daniel Hoffmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 2 Results_of_intePareto. Full list of the results of integrative analysis using intePareto.

  11. Additional file 6 of ATAC-seq normalization method can significantly affect...

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Jake J. Reske; Mike R. Wilson; Ronald L. Chandler (2023). Additional file 6 of ATAC-seq normalization method can significantly affect differential accessibility analysis and interpretation [Dataset]. http://doi.org/10.6084/m9.figshare.12177750.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Jake J. Reske; Mike R. Wilson; Ronald L. Chandler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 6. csaw_workflow.R—Example R workflow for differential accessibility analysis with csaw as graphically displayed in Fig. 6. Describes process for both TMM and loess normalizations and either supplying MACS2 peak sets as query regions or identifying de novo locally enriched windows.

  12. Data from: PING 2.0: An R/Bioconductor package for nucleosome positioning...

    • omicsdi.org
    • data.niaid.nih.gov
    xml
    Cite
    Yongjun Zhao,Sangsoon Woo,Tina Wong,François Robert, PING 2.0: An R/Bioconductor package for nucleosome positioning using next-generation sequencing data [Dataset]. https://www.omicsdi.org/dataset/arrayexpress-repository/E-GEOD-47073
    Explore at:
    Available download formats: xml
    Authors
    Yongjun Zhao,Sangsoon Woo,Tina Wong,François Robert
    Variables measured
    Genomics
    Description

    MNase-Seq and ChIP-Seq have evolved as popular techniques to study chromatin and histone modification. Although many tools have been developed to identify enriched regions, software tools for nucleosome positioning are still limited. We introduce a flexible and powerful open-source R package, PING 2.0, for nucleosome positioning using MNase-Seq data or MNase- or sonicated-ChIP-Seq data combined with either single-end or paired-end sequencing. PING uses a model-based approach, which enables nucleosome predictions even in the presence of low read counts. We illustrate PING using two paired-end datasets from Saccharomyces cerevisiae and compare its performance to nucleR and ChIPseqR. Identification of nucleosomes from two different mononucleosome datasets. A yeast strain (W303 background) with the HTZ1 gene expressed as a fusion with a myc epitope was used to map total and Htz1-containing nucleosomes by MNase-ChIP-Seq. Cells were grown to mid-log phase and mononucleosomes were generated by MNase treatment of isolated nuclei. For the sample SC0017_61YDGAAXX_8_TCATTC specifically, the Htz1-containing nucleosomes were enriched by immunoprecipitation using an anti-Myc antibody (3E10). DNA from both total nucleosomes and Htz1-enriched nucleosomes was purified and sequenced on an Illumina GA IIx using the paired-end protocol.

  13. Additional file 4 of ATAC-seq normalization method can significantly affect...

    • springernature.figshare.com
    text/x-shellscript
    Updated Jun 5, 2023
    Cite
    Jake J. Reske; Mike R. Wilson; Ronald L. Chandler (2023). Additional file 4 of ATAC-seq normalization method can significantly affect differential accessibility analysis and interpretation [Dataset]. http://doi.org/10.6084/m9.figshare.12177744.v1
    Explore at:
    Available download formats: text/x-shellscript
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    figshare
    Authors
    Jake J. Reske; Mike R. Wilson; Ronald L. Chandler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 5. naiveOverlapBroad.sh - Bash script for calculating a naïve overlap broad peak set from 2 individual replicate peak sets and a pooled replicate peak set. Can be modified to accept more replicates as desired. See Fig. 4 for usage.
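
The naive-overlap idea described above can be sketched in Python. This is an illustration only, not the contents of naiveOverlapBroad.sh: the function names, the in-memory `(chrom, start, end)` tuples, and the 0.5 fractional-overlap threshold are all assumptions chosen for the example. A pooled peak is kept when at least one peak in every replicate overlaps it by the chosen fraction of either peak's length.

```python
def overlap_fraction(a, b):
    """Largest fractional overlap between two (chrom, start, end) intervals,
    measured relative to either interval's length."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return max(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def naive_overlap(pooled, replicates, min_fraction=0.5):
    """Keep pooled peaks supported by at least one peak in every replicate.

    `min_fraction` is a hypothetical default, not a value taken from the script.
    """
    def supported(peak, replicate):
        return any(overlap_fraction(peak, other) >= min_fraction
                   for other in replicate)

    return [peak for peak in pooled
            if all(supported(peak, rep) for rep in replicates)]
```

Because `replicates` is a list of peak lists, the same function accepts two or more replicates without modification. The quadratic scan is fine for a small example; real peak sets would normally go through an interval tool such as bedtools.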

  14. Data from: Transfer learning reveals sequence determinants of the...

    • zenodo.org
    application/gzip, zip
    Updated May 29, 2024
    Cite
    Sahin Naqvi (2024). Transfer learning reveals sequence determinants of the quantitative response to transcription factor dosage [Dataset]. http://doi.org/10.5281/zenodo.11224809
    Explore at:
    Available download formats: application/gzip, zip
    Dataset updated
    May 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sahin Naqvi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Processed data and code for "Transfer learning reveals sequence determinants of the quantitative response to transcription factor dosage," Naqvi et al 2024.

    Directory is organized into 4 subfolders, each tar'ed and gzipped:

    data_analysis.tar.gz - Processed data for modulation of TWIST1 levels and calculation of RE responsiveness to TWIST1 dosage

    • atac_design.txt - design matrix for ATAC-seq TWIST1 titration samples
    • all.sub.150bpclust.greater2.500bp.merge.TWIST1.titr.ATAC.counts.txt - ATAC-seq counts from all samples over all reproducible ATAC-seq peak regions, as defined in Naqvi et al 2023
    • atac_deseq_fitmodels_moded50.R - R code for calculating new version of ED50 and response to full depletion from TWIST1 titration data (note, uses drm.R function from 10.5281/zenodo.7689948, install drc() with this version to avoid errors)

    baseline_models.tar.gz - Code and data for training baseline models to predict RE responsiveness to SOX9/TWIST1 dosage

    • {sox9|twist1}.{0v100|ed50}.{train|valid|test}.txt - Training/testing/validation data (ED50 or full TF depletion effect for SOX9 or TWIST1), split into train/test/validation folds
    • HOCOMOCOv11_core_HUMAN_mono_jaspar_format.all.sub.150bpclust.greater2.500bp.merge.minus300bp.p01.maxscore.mat.cpg.gc.basemean.txt.gz - matrix of predictors for all REs. Quantitative encoding of PWM match for all HOCOMOCO motifs + CpG + GC content, plus unperturbed ATAC-seq signal
    • train_baseline.R - R code to train baseline (LASSO regression or random forest) models using predictor matrix and the provided training data.
      • Note: training the random forest to predict full TF depletion is computationally intensive because it is trained across all REs; if doing this, run on CPU for ~6 hrs.

    chrombpnet_models.tar.gz - Remainder of code, data, and models for fine-tuning and interpreting ChromBPNet models to predict RE responsiveness to SOX9/TWIST1 dosage

    • Fine-tuning code, data, models
      • {all|sox9.direct|twist1.bound.down}.{train|valid|test}.{ed50|0v100.log2fc}.txt - Training/testing/validation data (ED50 or full TF depletion effect for SOX9 or TWIST1), split into train/test/validation folds
      • pretrained.unperturbed.chrombpnet.h5 - Pretrained model of unperturbed ATAC-seq signal in CNCCs, obtained by running ChromBPNet (https://github.com/kundajelab/chrombpnet) on DMSO-treated SOX9/TWIST1-tagged ATAC-seq data
      • finetune_chrombpnet.py - code for fine-tuning the pretrained model for any of the relevant prediction tasks (ED50/ effect of full TF depletion for SOX9/TWIST1)
      • best.model.chrombpnet.{0v100|ed50}.{sox9|twist1}.h5 - output of finetune_chrombpnet.py, best model after 10 training epochs for the indicated task
      • chrombpnet.{0v100|ed50}.{sox9|twist1}.contrib.{h5|bw} - contribution scores for the indicated predictive model, obtained by running chrombpnet contribs_bw on the corresponding model h5 file.
      • chrombpnet.{0v100|ed50}.{sox9|twist1}.contrib.modisco.{h5|bw} - TF-MoDIsCo output from the corresponding contribution score file
    • Interpretation code, data, models
      • contrib_h5_to_projshap_npy.py - code to convert contrib .h5 files into .npy files containing projected SHAP scores (required because the CWM matching code takes this format of contribution scores)
      • sox9.direct.10col.bed, twist1.bound.down.10col.uniq.bed - regions over which CWMs will be matched (likely direct targets of each TF)
      • match_cwms.py - Python code to match individual CWM instances. Takes as input: modisco .h5 file, SHAP .npy file, bed file of regions to be matched. Output is a bed file of all CWM matches (not pruned, contains many redundant matches).
      • chrombpnet.ed50.{sox9|twist1}.contrib.perc05.matchperc10.allmatch.bed - output of match_cwms.py
      • take_max_overlap.py - code to merge output of match_cwms.py into clusters, and then take the maximum (length-normalized) match score in each cluster as the representative CWM match of that cluster. Requires upstream bedtools commands to be piped in, see example usage in file.
      • chrombpnet.ed50.{sox9|twist1}.contrib.perc05.matchperc10.allmatch.maxoverlap.bed - output of take_max_overlap.py. These CWM instances are the ones used throughout the paper.
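
The cluster-then-take-maximum step described for take_max_overlap.py can be illustrated with a small standalone sketch. It is not the script itself (which expects upstream bedtools commands to be piped in): the tuple layout `(chrom, start, end, name, score)`, the function names, and the rule that overlapping intervals form one cluster are assumptions made for this example.

```python
def length_normalized_score(match):
    """Score divided by match length, for a (chrom, start, end, name, score) tuple."""
    chrom, start, end, name, score = match
    return score / (end - start)


def take_max_per_cluster(matches):
    """Group overlapping matches into clusters and keep, for each cluster,
    the match with the highest length-normalized score."""
    representatives = []
    cluster = []
    cluster_chrom = None
    cluster_end = None
    for match in sorted(matches, key=lambda m: (m[0], m[1], m[2])):
        chrom, start, end = match[0], match[1], match[2]
        if cluster and chrom == cluster_chrom and start < cluster_end:
            # Overlaps the running cluster: extend it.
            cluster.append(match)
            cluster_end = max(cluster_end, end)
        else:
            # Close the previous cluster (if any) and start a new one.
            if cluster:
                representatives.append(max(cluster, key=length_normalized_score))
            cluster = [match]
            cluster_chrom = chrom
            cluster_end = end
    if cluster:
        representatives.append(max(cluster, key=length_normalized_score))
    return representatives
```

Normalizing by length keeps a long, weakly matching CWM instance from outranking a short, strongly matching one within the same cluster.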

    modisco_reports.zip - TF-MoDIsCo reports from running on the fine-tuned ChromBPNet models

    • modisco_report_{sox9|twist1}_{0v100|ed50}: folders containing images of discovered CWMs and HTMLs/PDFs of summarized reports from running TF-MoDisCo on the indicated fine-tuned ChromBPNet model

    mirny_model.tar.gz - Code and data for analyzing and fitting Mirny model of TF-nucleosome competition to observed RE dosage response curves

    • twist1.strong.multi.only.ed50.cutoff.true.hill.txt - ED50 and signed hill coefficients for all TWIST1-dependent REs with only buffering Coordinators (mostly one or two) and no other TFs' binding sites. "ed50_new" is the ED50 calculation used in this paper.
    • twist1.strong.weak{1|2|3}.ed50.cutoff.true.hill.txt - ED50 and signed hill coefficients for all TWIST1-dependent REs with only buffering Coordinators (mostly one or two) and the indicated number of sensitizing (weak) Coordinators and no other TFs' binding sites. "ed50_new" is the ED50 calculation used in this paper.
    • MirnyModelAnalysis.py - Python code for analysis of Mirny model of TF-nucleosome competition. Contains implementations of analytic solutions, as well as code to fit model to observed ED50 and hill coefficients in the provided data files.

  15. An integrated functional and clinical genomics approach reveal genes driving...

    • data.niaid.nih.gov
    Updated Aug 11, 2021
    Cite
    Shrestha R; Das R; Feng FY; Gilbert LA (2021). An integrated functional and clinical genomics approach reveal genes driving aggressive metastatic prostate cancer [ATAC-Seq] [Dataset]. https://data.niaid.nih.gov/resources?id=gse178329
    Explore at:
    Dataset updated
    Aug 11, 2021
    Dataset provided by
    University of California, San Francisco
    Authors
    Shrestha R; Das R; Feng FY; Gilbert LA
    Description

    Genomic sequencing of many thousands of tumors has revealed many genes associated with specific types of cancer. Similarly, large scale CRISPR functional genomics efforts have mapped genes required for proliferation or survival in hundreds of cancer cell lines. Despite this, for specific disease subtypes, such as metastatic prostate cancer, it is likely that there exist many undiscovered tumor specific genetic dependencies, such as prostate cancer specific drivers, that represent drug targets. To identify such genetic dependencies, we performed genome-scale CRISPRi screens in metastatic prostate cancer models. We then created a pipeline in which we integrated publicly available pan-cancer functional genomics data with our metastatic prostate cancer functional and clinical genomics data to identify genes that can drive aggressive prostate cancer phenotypes. Our integrative analysis of these data revealed two known prostate cancer specific driver genes, AR and HOXB13, as the top two hits and also nominated a number of unexpected genes. In this study we highlight the strength of an integrated clinical and functional genomics pipeline and focus on two hit genes, KIF4A and WDR62. We demonstrate that both KIF4A and WDR62 drive aggressive prostate cancer phenotypes in vitro and in vivo in multiple models, irrespective of AR-status, and are also associated with poor patient outcome. ATAC-seq was performed in KIF4A knockdown in LNCaP and C42B prostate cancer cells

  16. Analysis Products: Transcription factor stoichiometry, motif affinity and...

    • zenodo.org
    tsv, zip
    Updated Nov 11, 2023
    Cite
    Surag Nair; Mohamed Ameen; Kevin Wang; Anshul Kundaje (2023). Analysis Products: Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency [Dataset]. http://doi.org/10.5281/zenodo.8313962
    Explore at:
    Available download formats: zip, tsv
    Dataset updated
    Nov 11, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Surag Nair; Mohamed Ameen; Kevin Wang; Anshul Kundaje
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record contains analysis products for the paper "Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency" by Nair, Ameen et al. Please refer to the READMEs in the directories, which are summarized below.

    The record contains the following files:

    `clusters.tsv`: contains the cluster id, name and colour of clusters in the paper

    scATAC.zip

    Analysis products for the single-cell ATAC-seq data. Contains:

    - `cells.tsv`: list of barcodes that pass QC. Columns include:
      - `barcode`
      - `sample`: time point
      - `umap1`
      - `umap2`
      - `cluster`
      - `dpt_pseudotime_fibr_root`: pseudotime values treating a fibroblast cell as root
      - `dpt_pseudotime_xOSK_root`: pseudotime values treating xOSK cell as root
    - `peaks.bed`: list of peaks of 500bp across all cell states. 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
    - `features.tsv`: 50 dimensional representation of each cell
    - `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`
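
As a rough illustration of the loading instructions above, the following Python sketch reads the sparse matrix with `scipy.io.mmread` and pairs it with row and column labels. Several details are assumptions rather than facts from the record: that `cells.tsv` has a header row naming the `sample` and `barcode` columns, that `peaks.bed` has no header, and that "combine sample + barcode" means joining with an underscore. The function name `load_labelled_matrix` is hypothetical.

```python
import csv
import gzip

from scipy.io import mmread


def load_labelled_matrix(mtx_gz_path, cells_tsv_path, peaks_bed_path):
    """Load a peaks-by-cells sparse count matrix together with its labels."""
    # mmread accepts an open binary stream, so the gzip file is opened explicitly.
    with gzip.open(mtx_gz_path, "rb") as handle:
        matrix = mmread(handle).tocsr()

    # Assumption: header row present; the "_" separator is illustrative only.
    with open(cells_tsv_path, newline="") as handle:
        cells = [f"{row['sample']}_{row['barcode']}"
                 for row in csv.DictReader(handle, delimiter="\t")]

    # Rows of the matrix correspond to peaks; keep chrom, start, end as strings.
    with open(peaks_bed_path) as handle:
        peaks = [tuple(line.rstrip("\n").split("\t")[:3])
                 for line in handle if line.strip()]

    if matrix.shape != (len(peaks), len(cells)):
        raise ValueError(
            f"matrix shape {matrix.shape} does not match "
            f"{len(peaks)} peaks x {len(cells)} cells"
        )
    return matrix, peaks, cells
```

The shape check is the useful part: it catches a transposed matrix or a mismatched `cells.tsv` before any downstream analysis.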

    scATAC_clusters.zip

    Analysis products corresponding to cluster pseudo-bulks of the single-cell ATAC-seq data.

    - `clusters.tsv`: contains the cluster id, name and colour used in the paper
    - `peaks`: contains `overlap_reproducibilty/overlap.optimal_peak` peaks called using ENCODE bulk ATAC-seq pipeline in the narrowPeak format.
    - `fragments`: contains per cluster fragment files

    scATAC_scRNA_integration.zip

    Analysis products from the integration of scATAC with scRNA. Contains:

    - `peak_gene_links_fdr1e-4.tsv`: file with peak gene links passing FDR 1e-4. For analyses in the paper, we filter to peaks with absolute correlation >0.45.
    - `harmony.cca.30.feat.tsv`: 30 dimensional co-embedding for scATAC and scRNA cells obtained by CCA followed by applying Harmony over assay type.
    - `harmony.cca.metadata.tsv`: UMAP coordinates for scATAC and scRNA cells derived from the Harmony CCA embedding. First column contains barcode.
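
The correlation filter mentioned for `peak_gene_links_fdr1e-4.tsv` can be sketched as below. The column name `correlation` is a placeholder, since the record does not list the file's columns; check the header of the actual file and pass the real name. The function name is likewise hypothetical.

```python
import csv


def filter_links(tsv_path, corr_column="correlation", min_abs_corr=0.45):
    """Return rows whose absolute correlation exceeds the cutoff.

    `corr_column` is an assumed column name, not one documented in the record.
    """
    kept = []
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if abs(float(row[corr_column])) > min_abs_corr:
                kept.append(row)
    return kept
```

Using the absolute value keeps strongly negative links as well as strongly positive ones, matching the "absolute correlation >0.45" wording.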

    scRNA.zip

    Analysis products for the single-cell RNA-seq data. Contains:

    - `seurat.rds`: seurat object that contains expression data (raw counts, normalized, and scaled), reductions (umap, pca), knn graphs, and all associated metadata. Note that the barcode suffix (1-9) corresponds to samples D0, D2, ..., D14, iPSC.
    - `genes.txt`: list of all genes
    - `cells.tsv`: list of barcodes that pass QC across samples. Contains:
      - `barcode_sample`: barcode with index of sample (1-9 corresponding to D0, D2, ..., D14, iPSC)
      - `sample`: sample name (D0, D2, .., D14, iPSC)
      - `umap1`
      - `umap2`
      - `nCount_RNA`
      - `nFeature_RNA`
      - `cluster`
      - `percent.mt`: percent of mitochondrial transcripts in cell
      - `percent.oskm`: percent of OSKM transcripts in cell
    - `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (barcode suffix contains sample information). Rows correspond to genes in `genes.txt`
    - `pca.tsv`: first 50 PC of each cell
    - `oskm_endo_sendai.tsv`: estimated raw counts (cts, may not be integers) and log(1+ tp10k) normalized expression (norm) for endogenous and exogenous (Sendai derived) counts of POU5F1 (OCT4), SOX2, KLF4 and MYC genes. Rows are consistent with `seurat.rds` and `cells.tsv`

    multiome.zip

    multiome/snATAC:

    These files are derived from the integration of nuclei from multiome (D1M and D2M), with cells from day 2 of scATAC-seq (labeled D2).

    - `cells.tsv`: This is the list of nuclei barcodes that pass QC from multiome AND also cell barcodes from D2 of scATAC-seq. Includes:
      - `barcode`
      - `umap1`: These are the coordinates used for the figures involving multiome in the paper.
      - `umap2`: see `umap1`
      - `sample`: D1M and D2M correspond to multiome, D2 corresponds to day 2 of scATAC-seq
      - `cluster`: For multiome barcodes, these are labels transferred from scATAC-seq. For D2 scATAC-seq, these are the original cluster labels.
    - `peaks.bed`: This is the same file as scATAC/peaks.bed. List of peaks of 500bp. 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
    - `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`.
    - `features.no.harmony.50d.tsv`: 50 dimensional representation of each cell prior to running Harmony (to correct for batch effect between D2 scATAC and D1M,D2M snMultiome). Rows correspond to cells from `cells.tsv`.
    - `features.harmony.10d.tsv`: 10 dimensional representation of each cell after running Harmony. Rows correspond to cells from `cells.tsv`.

    multiome/snRNA:

    - `seurat.rds`: seurat object that contains expression data (raw counts, normalized, and scaled), reductions (umap, pca), and associated metadata. Note that the barcode suffix (1, 2) corresponds to samples D1M, D2M. Please use the UMAP/features from snATAC/ for consistency.
    - `genes.txt`: list of all genes (this is different from the list in scRNA analysis)
    - `cells.tsv`: list of barcodes that pass QC across samples. Contains:
      - `barcode_sample`: barcode with index of sample (1,2 corresponding to D1M, D2M respectively)
      - `sample`: sample name (D1M, D2M)
      - `nCount_RNA`
      - `nFeature_RNA`
      - `percent.oskm`: percent of OSKM genes in cell
    - `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (barcode suffix contains sample information). Rows correspond to genes in `genes.txt`

  17. Data from: Cicero Predicts cis-Regulatory DNA Interactions from Single-Cell...

    • data.niaid.nih.gov
    Updated Mar 26, 2019
    Cite
    Pliner HA; Packer J; McFaline-Figueroa JL; Cusanovich DA; Daza R; Aghamirzaie D; Srivatsan S; Qiu X; Jackson D; Minkina A; Adey A; Steemers FJ; Shendure J; Trapnell C (2019). Cicero Predicts cis-Regulatory DNA Interactions from Single-Cell Chromatin Accessibility Data [Dataset]. https://data.niaid.nih.gov/resources?id=gse109828
    Explore at:
    Dataset updated
    Mar 26, 2019
    Dataset provided by
    University of Washington
    Authors
    Pliner HA; Packer J; McFaline-Figueroa JL; Cusanovich DA; Daza R; Aghamirzaie D; Srivatsan S; Qiu X; Jackson D; Minkina A; Adey A; Steemers FJ; Shendure J; Trapnell C
    Description

    Linking regulatory DNA elements to their target genes, which may be located hundreds of kilobases away, remains challenging. Here, we introduce Cicero, an algorithm that identifies co-accessible pairs of DNA elements using single-cell chromatin accessibility data and so connects regulatory elements to their putative target genes. We apply Cicero to investigate how dynamically accessible elements orchestrate gene regulation in differentiating myoblasts. Groups of Cicero-linked regulatory elements meet criteria of “chromatin hubs”—they are enriched for physical proximity, interact with a common set of transcription factors, and undergo coordinated changes in histone marks that are predictive of changes in gene expression. Pseudotemporal analysis revealed that most DNA elements remain in chromatin hubs throughout differentiation. A subset of elements bound by MYOD1 in myoblasts exhibit early opening in a PBX1- and MEIS1-dependent manner. Our strategy can be applied to dissect the architecture, sequence determinants, and mechanisms of cis-regulation on a genome-wide scale. sci-ATAC-seq data was collected on human skeletal muscle myoblasts (HSMM) in culture at four timepoints after serum switch to induce differentiation into myotubes, 0 hours, 24 hours, 48 hours and 72 hours. Libraries pooled for sequencing (Experiment 1). An additional experiment was collected using the same system, at 0 hours and 72 hours after serum switch (Experiment 2). Bulk ATAC-seq data was also collected for each of the four timepoints. In addition, sci-ATAC-seq data was collected on an artificial mixture of GM12878 and HL60 cells. Lastly, bulk ATAC-seq data was collected at day 0 and day 7 after serum switch in 54-1 immortalized myoblasts that were transduced with lentivirus carrying small guide RNAs targeting Pbx1, Meis1 or non-targeting controls using lentiCRISPRv2-blast. Cells were allowed time for editing post transduction before differentiation. See publication for details.

  18. Data from: Decoding myofibroblast origins in human kidney fibrosis

    • zenodo.org
    application/gzip, bin +1
    Updated Oct 9, 2020
    Cite
    Christoph Kuppe; Mahmoud M Ibrahim; Jennifer Kranz; Xiaoting Zhang; Susanne Ziegler; Jitske Jansen; Katharina C Reimer; Javier Perales-Paton; James R Smith; Ross Dobie; John R Wilson-Kanamari; Maurice Halder; Yaoxian Xu; Nazanin Kabgani; Nadine Kaesler; Martin Klaus; Lukas Gemhold; Victor G Puelles; Tobias B Huber; Peter Boor; Sylvia Menzel; Remco M Hoogenboezem; Eric M J Bindels; Joachim Steffens; Floege Jürgen; Rebekka K Schneider; Julio Saez-Rodriguez; Neil C Henderson; Rafael Kramann (2020). Decoding myofibroblast origins in human kidney fibrosis [Dataset]. http://doi.org/10.5281/zenodo.4059315
    Explore at:
    Available download formats: application/gzip, csv, bin
    Dataset updated
    Oct 9, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christoph Kuppe; Mahmoud M Ibrahim; Jennifer Kranz; Xiaoting Zhang; Susanne Ziegler; Jitske Jansen; Katharina C Reimer; Javier Perales-Paton; James R Smith; Ross Dobie; John R Wilson-Kanamari; Maurice Halder; Yaoxian Xu; Nazanin Kabgani; Nadine Kaesler; Martin Klaus; Lukas Gemhold; Victor G Puelles; Tobias B Huber; Peter Boor; Sylvia Menzel; Remco M Hoogenboezem; Eric M J Bindels; Joachim Steffens; Floege Jürgen; Rebekka K Schneider; Julio Saez-Rodriguez; Neil C Henderson; Rafael Kramann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data repository for the manuscript: Kuppe, Ibrahim et al. "Decoding myofibroblast origins in human kidney fibrosis", 2020. Please also consult the supplemental data in the paper, and the data availability statement in the manuscript for raw FASTQ files for mouse data.

    For further data requests and questions, please contact Dr. Rafael Kramann (rkramann@ukaachen.de)

    File Details:

    - Human in vitro PDGFRb+ RNA-seq (bulk RNA-seq data for various NKD2 knock-out and knock-in clones)
    * invitro_bulk_rnaseq.tar.gz: Salmon output for all samples. Please see the manuscript for further information.

    - UUO Mouse FACS sorted PDGFRa+/b+ ATAC-Seq
    * mouse_uuo_pdgfrab_atacseq.bw: BigWig Signal file for ATAC-Seq data, PDGFRa+/b+ FACS sorted cells from day 10 UUO mouse kidneys (average of two biological replicates)
    * mouse_uuo_pdgfrab_motifs.meme: Motifs identified based on the ATAC-Seq data and further analyzed in the paper

    - UUO and Sham Mouse FACS sorted PDGFRa+/b+ scRNA-seq (10x Genomics)
    * Mouse_PDGFRab.tar.gz: contains the count data derived by Alevin/Salmon for the cells analyzed in the paper in matrix market format (.mtx). column data include cell cluster annotations.

    - UUO and Sham Mouse FACS sorted PDGFRb+ scRNA-seq (SmartSeq2)
    * Mouse_PDGFRa.tar.gz: contains the expression data for the cells analyzed in the paper in matrix market format (.mtx). column data include cell cluster annotations.

    - Human FACS sorted CD10+ scRNA-seq (10x Genomics)
    * Human_CD10plus.tar.gz: contains the count data derived by Alevin/Salmon for the cells analyzed in the paper in matrix market format (.mtx). column data include cell cluster annotations.

    - Human FACS sorted CD10- scRNA-seq (10x Genomics)
    * Human_CD10minus.tar.gz: contains the count data derived by Alevin/Salmon for the cells analyzed in the paper in matrix market format (.mtx). column data include cell cluster annotations.

    - Human FACS sorted PDGFRb+ scRNA-seq (10x Genomics)
    * Human_PDGFRb.tar.gz: contains the count data derived by Alevin/Salmon for the cells analyzed in the paper in matrix market format (.mtx). column data include cell cluster annotations.
    * HumanPDGFRBpositive_Nkd2_grnboost2.csv: Gene Regulatory Network obtained by GRNboost2 on genes correlated with NKD2 in Fibroblast (Mesenchymal) cells. See manuscript for details.
    * Human_PDGFRBplus_TFanalysis.tar.gz: TF analysis based on single cell RNA-seq for promoter and distal regions. See manuscript for details.

    - github_files.tar.gz: RData Objects associated with the paper code repository (https://github.com/mahmoudibrahim/KidneyMap)

  19. Data from: CENH3 information from: Einkorn genomics sheds light on history...

    • search.dataone.org
    • datadryad.org
    Updated Nov 29, 2023
    Cite
    Hanin Ibrahim Ahmed (2023). CENH3 information from: Einkorn genomics sheds light on history of the oldest domesticated wheat [Dataset]. http://doi.org/10.5061/dryad.0p2ngf24b
    Explore at:
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Hanin Ibrahim Ahmed
    Time period covered
    Jan 1, 2022
    Description

    Einkorn (Triticum monococcum) is the first domesticated wheat species, being central to the birth of agriculture and the Neolithic Revolution in the Fertile Crescent ~10,000 years ago. Here, we generate and analyze 5.2-gigabase genome assemblies for wild and domesticated einkorn, including completely assembled centromeres. Einkorn centromeres are highly dynamic, showing evidence of ancient and recent centromere shifts caused by structural rearrangements. Whole-genome sequencing of a diversity panel uncovered the population structure and evolutionary history of einkorn, revealing complex patterns of hybridizations and introgressions following the dispersal of domesticated einkorn from the Fertile Crescent. We also discovered that around 1% of the modern bread wheat (Triticum aestivum) A subgenome originates from einkorn. These resources and findings highlight the history of einkorn evolution and provide a basis to accelerate the genomics-assisted improvement of einkorn and bread wheat.

    Chromatin immunoprecipitation (ChIP) and sequencing (ChIP-seq): Chromatin immunoprecipitation (ChIP) was performed according to the method given by Nagaki et al., standardized with wheat CENH3 antibody. Nuclei were isolated from 2-week-old seedlings and digested with micrococcal nuclease (Sigma, MO) to liberate nucleosomes. The digested mixture was incubated overnight with 3 mg of wheat CENH3 antibody at 4°C. The chromatin-antibody complexes were captured using Dynabeads Protein G (Invitrogen, CA). Elution of the chromatin was done using 100 ml of preheated elution buffer (1% sodium dodecyl sulfate and 0.1 M NaHCO3) for 30 min at 65°C. DNA from the ChIP was isolated using ChIP DNA Clean and Concentrator Kit (Zymo Research, CA). ChIP-seq libraries were then constructed using the TruSeq ChIP Library Preparation Kit (Illumina, CA) according to the manufacturer’s instructions, and libraries were sequenced using NovaSeq S4 with a 150-bp paired-end sequencing run.

    CENH3 ChIP-seq data analysis: R...

    The link contains the BED files and the BAM (mapped) files of CENH3 reads against the respective genome assembly.

  20. f

    Additional file 3 of ATAC-seq normalization method can significantly affect...

    • figshare.com
    text/x-shellscript
    Updated Jun 4, 2023
    Cite
    Jake J. Reske; Mike R. Wilson; Ronald L. Chandler (2023). Additional file 3 of ATAC-seq normalization method can significantly affect differential accessibility analysis and interpretation [Dataset]. http://doi.org/10.6084/m9.figshare.12177738.v1
    Explore at:
    Available download formats: text/x-shellscript
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    figshare
    Authors
    Jake J. Reske; Mike R. Wilson; Ronald L. Chandler
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 4.bedpeMinimalConvert.sh—Bash script for converting standard 10-column format BEDPE to the “minimal” format defined by MACS2. See Fig. 4 for usage.
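
    The conversion this entry describes can be illustrated with a minimal Python sketch. It is not the authors' Bash script, whose contents are not shown in this listing. It assumes the input is the tab-separated 10-column BEDPE layout written by bedtools (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2) and that the "minimal" format is the three-column fragment record (chromosome, fragment start, fragment end) that MACS2 reads with -f BEDPE. The function name bedpe_to_minimal and the choice to skip pairs whose mates lie on different chromosomes are assumptions made for illustration.

```python
def bedpe_to_minimal(lines):
    """Yield (chrom, start, end) fragments from 10-column BEDPE lines.

    Assumption: each read pair is collapsed to one fragment spanning the
    outermost coordinates of its two mates.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 columns, got {len(fields)}")
        chrom1, start1, end1, chrom2, start2, end2 = fields[:6]
        # Assumption: pairs with an unmapped mate (".") or with mates on
        # different chromosomes cannot form a single fragment; drop them.
        if chrom1 == "." or chrom1 != chrom2:
            continue
        coords = [int(start1), int(end1), int(start2), int(end2)]
        yield chrom1, min(coords), max(coords)


if __name__ == "__main__":
    example = ["chr1\t100\t150\tchr1\t300\t350\tread1\t60\t+\t-"]
    for chrom, start, end in bedpe_to_minimal(example):
        print(f"{chrom}\t{start}\t{end}")
```

    Taking the minimum and maximum of all four coordinates makes the sketch independent of which mate is listed first; a real pipeline may also want to sort the output by chromosome and start position before peak calling.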
