9 datasets found
  1. Repository for the single cell RNA sequencing data analysis for the human...

    • explore.openaire.eu
    Updated Aug 26, 2023
    Cite
    Jonathan; Andrew; Pierre; Allart; Adrian (2023). Repository for the single cell RNA sequencing data analysis for the human manuscript. [Dataset]. http://doi.org/10.5281/zenodo.8286134
    Dataset updated
    Aug 26, 2023
    Authors
    Jonathan; Andrew; Pierre; Allart; Adrian
    Description

    This is the GitHub repository for the single cell RNA sequencing data analysis for the human manuscript. The following essential libraries are required for script execution: Seurat, scRepertoire, ggplot2, dplyr, ggridges, ggrepel, ComplexHeatmap.

    Linked Files
    --------------------------------------
    This repository contains code for the analysis of a single cell RNA-seq dataset. The dataset comprises raw FASTQ files as well as the aligned files deposited in GEO. The linked datasets are described below:

    1. Gene Expression Omnibus (GEO) ID: GSE229626
       - Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
       - Description: This submission contains the matrix.mtx, barcodes.tsv, and genes.tsv files for each replicate and condition, corresponding to the aligned files for the single cell sequencing data.
       - Submission type: Private. To gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
    2. Sequence Read Archive (SRA) repository
       - Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
       - Description: This submission contains the raw sequencing reads as .fastq.gz files (gzip-compressed FASTQ text files).
       - Submission type: Private. To gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Please note that since the GSE submission is private, the raw data deposited at SRA may not be accessible until the embargo on GSE229626 has been lifted.

    Installation and Instructions
    --------------------------------------
    The code included in this submission requires the essential packages listed above. Please follow these instructions for installation:
    > Ensure you have R version 4.1.2 or higher for compatibility.
    > Although it is not essential, you can use RStudio (version 2022.12.0+353) to access and execute the code.
    The following code can be used to set the working directory in R, where directory is the path to the folder containing the scripts:
    > setwd(directory)

    Steps:
    1. Download the "Human_code_April2023.R" and "Install_Packages.R" R scripts, and the processed data from GSE229626.
    2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
    3. Set your working directory to the folder containing the following files:
       - Human_code_April2023.R
       - Install_Packages.R
    4. Open Install_Packages.R and execute it in the R IDE. This script will attempt to install all the necessary packages and their dependencies.
    5. Open the Human_code_April2023.R script and execute commands as necessary.
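    The GEO submission provides, for each replicate and condition, a matrix.mtx / barcodes.tsv / genes.tsv triplet; the repository's own analysis is written in R around Seurat. To illustrate that file layout outside R, here is a minimal Python sketch (standard library only) that reads such a triplet. It assumes the usual 10x Genomics convention: a MatrixMarket coordinate file with genes as rows, cell barcodes as columns, and 1-based indices. The function names are hypothetical and are not part of the repository.

```python
import gzip
from pathlib import Path


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path, "r")


def parse_mtx_lines(lines):
    """Parse MatrixMarket coordinate lines into (n_rows, n_cols, counts).

    counts maps 0-based (row, col) pairs to values; 10x-style files store
    genes as rows and cell barcodes as columns with 1-based indices.
    """
    it = iter(lines)
    header = next(it)
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a MatrixMarket coordinate header")
    size_line = next(it)
    while size_line.startswith("%"):  # skip comment lines
        size_line = next(it)
    n_rows, n_cols, n_entries = (int(x) for x in size_line.split())
    counts, seen = {}, 0
    for line in it:
        if not line.strip():
            continue
        row, col, value = line.split()[:3]
        counts[(int(row) - 1, int(col) - 1)] = float(value)
        seen += 1
    if seen != n_entries:
        raise ValueError(f"header declares {n_entries} entries, found {seen}")
    return n_rows, n_cols, counts


def read_mtx_triplet(matrix_path, genes_path, barcodes_path):
    """Return (genes, barcodes, counts) for one replicate/condition."""
    with _open_text(genes_path) as fh:
        genes = [line.split("\t")[0].strip() for line in fh if line.strip()]
    with _open_text(barcodes_path) as fh:
        barcodes = [line.strip() for line in fh if line.strip()]
    with _open_text(matrix_path) as fh:
        n_rows, n_cols, counts = parse_mtx_lines(fh)
    if (n_rows, n_cols) != (len(genes), len(barcodes)):
        raise ValueError("matrix dimensions do not match the label files")
    return genes, barcodes, counts
```

    A Python dict is memory-hungry for full-size matrices; in practice scipy.io.mmread, which returns a SciPy sparse matrix, is the more usual choice.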

  2. Multiple Single Cell RNA Expressions ARCHS4

    • kaggle.com
    zip
    Updated Jun 26, 2021
    Cite
    Alexander (2021). Multiple Single Cell RNA Expressions ARCHS4 [Dataset]. https://www.kaggle.com/alexandervc/multiple-single-cell-rna-expressions-archs4
    Available download formats: zip (23088130184 bytes)
    Dataset updated
    Jun 26, 2021
    Authors
    Alexander
    Description

    Context

    The dataset was downloaded from https://amp.pharm.mssm.edu/archs4/download.html. The methods are described in a Nature Communications paper: https://www.nature.com/articles/s41467-018-03751-6

    ARCHS4 provides user-friendly access to gene expression data from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). Whereas most data in GEO are stored in raw formats, ARCHS4 provides processed count matrices. And whereas GEO stores data separately for each research paper, ARCHS4 collects all the information in a single matrix. Consult the main site for further information.

    The main data files are in HDF5 format (Hierarchical Data Format, .h5 files; https://en.wikipedia.org/wiki/Hierarchical_Data_Format). They contain expression data as well as annotation data and further metadata. There are several other auxiliary files, such as a 3D t-SNE projection (in CSV format) and gene correlation matrices for human and mouse (in feather format).

    Content

    The main file for human, human_matrix.h5, contains a data matrix of 238,522 samples by 35,238 genes, along with various metadata: gene names, sample information (tissue, etc.), and references to GEO database IDs, where all the details can be found.

    Similar data are provided for mouse, along with CSV files containing t-SNE projections and gene correlation matrices.
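    The matrix dimensions above imply a large in-memory footprint if the whole matrix is loaded at once. A back-of-the-envelope calculation, assuming a dense layout with 4 bytes per value (32-bit integers) and no compression, is sketched below; the actual on-disk size of human_matrix.h5 depends on its HDF5 chunking and compression, so this is a rough upper bound on memory rather than a file size.

```python
def dense_matrix_bytes(n_samples, n_genes, bytes_per_value=4):
    """Size of an uncompressed dense matrix; 4 bytes assumes 32-bit integers."""
    return n_samples * n_genes * bytes_per_value


n_samples, n_genes = 238522, 35238  # human_matrix.h5 dimensions from the description
n_values = n_samples * n_genes
size_bytes = dense_matrix_bytes(n_samples, n_genes)
print(f"{n_values:,} values, about {size_bytes / 2**30:.1f} GiB uncompressed")
# prints: 8,405,038,236 values, about 31.3 GiB uncompressed
```

    Because of this, it is usually preferable to read only the samples or genes you need (HDF5 supports slicing a dataset without loading it fully) rather than materializing the whole matrix.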

    Acknowledgements

    The ARCHS4 project is by:

    Alexander Lachmann (alexander.lachmann@mssm.edu); update: 2020-02-06

  3. Data from: TempO-seq and RNA-seq gene expression levels are highly...

    • catalog.data.gov
    Updated Jun 8, 2025
    Cite
    U.S. EPA Office of Research and Development (ORD) (2025). TempO-seq and RNA-seq gene expression levels are highly correlated for most genes: A comparison using 39 human cell lines [Dataset]. https://catalog.data.gov/dataset/tempo-seq-and-rna-seq-gene-expression-levels-are-highly-correlated-for-most-genes-a-compar
    Dataset updated
    Jun 8, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Journal article published in PLOS ONE, Vol. 20, Issue 5, e0320862, 2025; DOI: https://doi.org/10.1371/journal.pone.0320862; PMC12064016. The datasets generated and analyzed during the current study are provided in Supplemental S1 File. The RNA-seq data are from version 23 of the Human Protein Atlas (https://www.proteinatlas.org/about/download, “RNA HPA cell line gene data”, released 2023.06.19). All FASTQ files and aligned counts for the U.S. EPA TempO-seq data have been deposited in the NCBI Gene Expression Omnibus under accession number GSE288929 and are publicly available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE288929. The R code is available through FigShare at: https://doi.org/10.23645/epacomptox.27341970.v1. This dataset is associated with the following publication: Word, L., C. Willis, R. Judson, L. Everett, S. Davidson-Fritz, D. Haggard, B. Chambers, J. Rogers, J. Bundy, I. Shah, N. Sipes, and J. Harrill. TempO-seq and RNA-seq Gene Expression Levels are Highly Correlated for Most Genes: A Comparison Using 39 Human Cell Lines. PLOS ONE. Public Library of Science, San Francisco, CA, USA, 20(5): e0320862, (2025).
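    The title's claim concerns per-gene agreement between the two platforms across 39 cell lines. One common way to quantify such agreement is a per-gene Pearson correlation of log-transformed expression; the sketch below illustrates that idea only and is not the authors' R code (which is on FigShare). The log2 transform, the pseudocount of 1, and the dict-based input layout are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def per_gene_correlation(platform_a, platform_b, pseudocount=1.0):
    """Correlate log2(expression + pseudocount) per gene across cell lines.

    platform_a / platform_b: dict gene -> list of expression values, with the
    cell lines in the same order in both dicts (hypothetical input layout).
    Only genes measured on both platforms are compared.
    """
    result = {}
    for gene in sorted(platform_a.keys() & platform_b.keys()):
        xa = [math.log2(v + pseudocount) for v in platform_a[gene]]
        xb = [math.log2(v + pseudocount) for v in platform_b[gene]]
        result[gene] = pearson(xa, xb)
    return result
```

    Genes with constant expression on either platform return NaN, which flags them for separate handling rather than silently reporting a correlation.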

  4. Single Cell RNAseq of Mouse Testis

    • zenodo.org
    bin, zip
    Updated Jan 7, 2023
    Cite
    Wells; Wells; Myers; Conrad; Jung; Myers; Conrad; Jung (2023). Single Cell RNAseq of Mouse Testis [Dataset]. http://doi.org/10.5281/zenodo.3233870
    Available download formats: zip, bin
    Dataset updated
    Jan 7, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wells; Wells; Myers; Conrad; Jung; Myers; Conrad; Jung
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the processed data from the publication: "Unified single-cell analysis of testis gene regulation and pathology in 5 mouse strains" (https://doi.org/10.1101/393769)

    The raw data are available at GEO: GSE113293

    Associated software is at https://zenodo.org/badge/latestdoi/140632831

    SDA_objects.zip contains key tables required by many functions; download it to use the Shiny app. Contents:

    • cell_data: data.table containing the cell metadata. 80 columns, including cell ID, SDA component cell scores, t-SNE coordinates, pseudotime, experimental group, etc.
    • data: a sparse matrix of normalised read counts (20322 cells by 19262 genes)
    • gene_annotations: data.table containing gene locations, enrichment vs other tissues from bulk data, infertility gene status
    • GO_enrich: data.table containing gene ontology enrichments for each component, the genes enriched, p.values and enrichment values
    • principal_curves: list of princurve() outputs containing the pseudotime trajectories
    • SDAresults: output of SDA run loaded using SDAtools::load_results()

    Other R objects include:

    • QC_count_matrix: Sparse matrix of raw count values for QC cells and genes
    • cell_imputation_AUCs: data.table of PRAC AUC values for unimputed (train), mean cell, SDA (predict), ICA, PCA, NNMF, and MAGIC
    • HocomocoV11_motifProbs_matrix: Matrix of regulation probabilities from MotifFinder using fixed motifs from HocomocoV11 database
    • motifFinder_denovo: list of MotifFinder results for de novo motifs from the promoter regions of genes in each component
    • motifFinder_denovo_fixed: list of MotifFinder results using the 125 de novo motifs as fixed motifs on all genes
    • tomtom_matched_motifs: data.table of TOMTOM matches of the de novo motifs, plus metadata
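    The data object above holds normalised read counts, while QC_count_matrix holds raw counts. The description does not say which normalisation was used; purely as an illustration, the sketch below applies a common library-size normalisation (scale each cell to a fixed total, then log1p) to a sparse dict-of-dicts count table. The scale factor of 10,000 and the data layout are assumptions and are not taken from the SDA pipeline.

```python
import math


def library_size_normalise(counts_by_cell, scale=10_000):
    """Scale each cell's counts to `scale` total reads, then apply log1p.

    counts_by_cell: dict cell_id -> dict gene -> raw count (zeros omitted,
    as in a sparse matrix). Cells with no counts are dropped.
    """
    normalised = {}
    for cell, gene_counts in counts_by_cell.items():
        total = sum(gene_counts.values())
        if total == 0:
            continue
        normalised[cell] = {
            gene: math.log1p(count / total * scale)
            for gene, count in gene_counts.items()
        }
    return normalised
```

    Zeros stay implicit throughout, since log1p(0) = 0, so the result remains as sparse as the input.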

  5. DataSheet_1_An integrative analysis of single-cell and bulk transcriptome...

    • frontiersin.figshare.com
    docx
    Updated Dec 21, 2023
    + more versions
    Cite
    Hong-Kai Cui; Chao-Jie Tang; Yu Gao; Zi-Ang Li; Jian Zhang; Yong-Dong Li (2023). DataSheet_1_An integrative analysis of single-cell and bulk transcriptome and bidirectional mendelian randomization analysis identified C1Q as a novel stimulated risk gene for Atherosclerosis.docx [Dataset]. http://doi.org/10.3389/fimmu.2023.1289223.s001
    Available download formats: docx
    Dataset updated
    Dec 21, 2023
    Dataset provided by
    Frontiers
    Authors
    Hong-Kai Cui; Chao-Jie Tang; Yu Gao; Zi-Ang Li; Jian Zhang; Yong-Dong Li
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The role of complement component 1q (C1Q)-related genes in human atherosclerotic plaques (HAP) is poorly understood. Our aim was to identify C1Q-associated hub genes using single-cell RNA sequencing (scRNA-seq) and bulk RNA analysis in order to diagnose and predict HAP more effectively, and to investigate the association between C1Q and HAP (ischemic stroke) using bidirectional Mendelian randomization (MR) analysis.

    Methods: HAP scRNA-seq and bulk RNA data were downloaded from the Gene Expression Omnibus (GEO) database. The C1Q-related hub genes were screened using the GBM, LASSO, and XGBoost algorithms. We built machine learning models to diagnose and distinguish between types of atherosclerosis using generalized linear models and receiver operating characteristic (ROC) analyses. Further, we scored the HALLMARK_COMPLEMENT signaling pathway using ssGSEA and confirmed hub gene expression by qRT-PCR in RAW264.7 macrophages and apoE-/- mice. Finally, the risk association between C1Q and HAP was assessed through bidirectional MR analysis, with C1Q as the exposure and ischemic stroke (IS, large artery atherosclerosis) as the outcome. Inverse variance weighting (IVW) was used as the main method.

    Results: Using the scRNA-seq dataset GSE159677, we identified 24 cell clusters and 12 cell types, and found seven C1Q-associated DEGs in both the scRNA-seq and GEO datasets. GBM, LASSO, and XGBoost then selected C1QA and C1QC from these seven DEGs. Both the training and validation cohorts showed satisfactory diagnostic accuracy for identifying patients with HAP. We also identified SPI1 as a potential transcription factor (TF) regulating the two hub genes in HAP. The HALLMARK_COMPLEMENT signaling pathway was activated and correlated with C1QA and C1QC expression. qPCR confirmed high expression of C1QA, C1QC, and SPI1 in ox-LDL-treated RAW264.7 macrophages and apoE-/- mice. MR indicated a positive association between the genetic risk of C1Q and IS, with an odds ratio (OR) of 1.118 (95% CI: 1.013–1.234, P = 0.027).

    Conclusion: The authors developed and validated a novel two-gene diagnostic signature for HAP, and MR analysis provided evidence supporting a positive association of C1Q with IS.
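    The MR analysis uses inverse variance weighting (IVW) as its main estimator and reports OR 1.118 (95% CI 1.013–1.234, P = 0.027). The sketch below shows the textbook IVW estimate from per-SNP summary statistics, beta_IVW = sum(bx*by/se_y^2) / sum(bx^2/se_y^2), with its first-order (fixed-effect) standard error, plus helpers for moving between a log-odds estimate, an OR with 95% CI, and a P value. It illustrates the standard formulas and is not the authors' code; MR software commonly also offers a multiplicative random-effects variant that inflates the standard error when SNP estimates are heterogeneous.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted MR estimate (fixed-effect, first-order weights).

    Each SNP j contributes the Wald ratio beta_outcome[j] / beta_exposure[j]
    with weight beta_exposure[j]**2 / se_outcome[j]**2.
    Returns (beta_ivw, se_ivw), on the log-odds scale for a binary outcome.
    """
    if not beta_exposure or not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("need equal-length, non-empty summary statistics")
    numerator = sum(bx * by / sy**2 for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx**2 / sy**2 for bx, sy in zip(beta_exposure, se_outcome))
    return numerator / denominator, 1.0 / math.sqrt(denominator)


def odds_ratio_with_ci(beta, se, z=Z_95):
    """Convert a log-odds estimate into (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_or_ci(lower, upper, z=Z_95):
    """Recover the log-scale standard error from a reported 95% CI for an OR."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def two_sided_p(beta, se):
    """Two-sided normal P value for the statistic beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2))
```

    As a consistency check, recovering the standard error from the reported interval with se_from_or_ci(1.013, 1.234) and passing it to two_sided_p(math.log(1.118), se) gives a P value of about 0.027, in line with the reported result.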

  6. Data and code from: "A transcriptional rheostat couples past activity to...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 11, 2023
    Cite
    Brann, David H (2023). Data and code from: "A transcriptional rheostat couples past activity to future sensory responses" (Tsukahara, Brann, et al. 2021 Cell) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5550453
    Dataset updated
    Mar 11, 2023
    Dataset provided by
    Brann, David H
    Datta, Sandeep Robert
    Tsukahara, Tatsuya
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A transcriptional rheostat couples past activity to future sensory responses

    Code and data to replicate analyses in Tsukahara, Brann et al. 2021 Cell https://doi.org/10.1016/j.cell.2021.11.022

    Summary

    Animals traversing different environments encounter both stable background stimuli and novel cues, which are thought to be detected by primary sensory neurons and then distinguished by downstream brain circuits. Here we show that each of the ~1000 olfactory sensory neuron (OSN) subtypes in the mouse harbors a distinct transcriptome whose content is precisely determined by interactions between its odorant receptor and the environment. This transcriptional variation is systematically organized to support sensory adaptation: expression levels of more than 70 genes relevant to transforming odors into spikes continuously vary across OSN subtypes, dynamically adjust to new environments over hours, and accurately predict acute OSN-specific odor responses. The sensory periphery therefore separates salient signals from predictable background via a transcriptional rheostat whose moment-to-moment state reflects the past and constrains the future; these findings suggest a general model in which structured transcriptional variation within a cell type reflects individual experience.

    Manuscript

    For more details, please see our Open Access manuscript: https://www.cell.com/cell/fulltext/S0092-8674(21)01337-4

    Code

    1. The code here is a copy of that on GitHub: https://github.com/dattalab/Tsukahara_Brann_OSN. Instructions for how to download and install it can be found in the README.md file.

    2. Data is available on the NCBI GEO (accession GSE173947) and raw fastq files are available from the SRA (accession SRP318630).

    3. Supplementary data (imaging traces and example preprocessed AnnData object for the home-cage dataset) can be found in the data folders of the attached Tsukahara_Brann_OSN-zenodo.zip file.

  7. COVID-19 Case Surveillance Public Use Data

    • catalog.data.gov
    • opendatalab.com
    • +5 more
    Updated Mar 3, 2022
    + more versions
    Cite
    Centers for Disease Control and Prevention (2022). COVID-19 Case Surveillance Public Use Data [Dataset]. https://catalog.data.gov/dataset/covid-19-case-surveillance-public-use-data
    Dataset updated
    Mar 3, 2022
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Description

    Beginning March 1, 2022, the "COVID-19 Case Surveillance Public Use Data" will be updated on a monthly basis. This case surveillance public use dataset has 12 data elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, and the presence of any underlying medical conditions and risk behaviors; it contains no geographic data.

    CDC has three COVID-19 case surveillance datasets:
    • COVID-19 Case Surveillance Public Use Data with Geography: public use, patient-level dataset with clinical data (including symptoms), demographics, and county and state of residence (19 data elements).
    • COVID-19 Case Surveillance Public Use Data: public use, patient-level dataset with clinical and symptom data and demographics, with no geographic data (12 data elements).
    • COVID-19 Case Surveillance Restricted Access Detailed Data: restricted access, patient-level dataset with clinical and symptom data, demographics, and state and county of residence. Access requires a registration process and a data use agreement (32 data elements).

    The following apply to all three datasets:
    • Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
    • Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers.
    • Some data cells are suppressed to protect individual privacy.
    • The datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the previously updated datasets. This 14-day lag allows case reporting to stabilize and ensures that time-dependent outcome data are accurately captured.
    • Datasets are updated monthly.
    • Datasets are created using CDC’s operational Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy.

    For more information about data collection and reporting, please see https://wwwn.cdc.gov/nndss/data-collection.html. For more information about the COVID-19 case surveillance data, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html.

    Overview

    The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020 to clarify the interpretation of antigen detection tests and serologic test results within the case classification. The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported volun
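    The 14-day inclusion rule amounts to comparing each record's earliest available date against a reference date. A minimal sketch is shown below; the record keys 'cdc_received', 'onset' and 'specimen' are hypothetical stand-ins (not the dataset's actual column names) for "date received by CDC" and the illness/specimen dates, and because the description does not pin down the exact reference date, it is passed in as a parameter.

```python
from datetime import date, timedelta


def earliest_date(record):
    """Earliest of the available dates in a case record (None if there are none).

    The keys used here are hypothetical stand-ins for the CDC-received,
    illness-onset and specimen-collection dates.
    """
    dates = [record.get(key) for key in ("cdc_received", "onset", "specimen")]
    dates = [d for d in dates if d is not None]
    return min(dates) if dates else None


def passes_lag(record, reference_date, lag_days=14):
    """True if the earliest date is at least `lag_days` before reference_date."""
    first = earliest_date(record)
    return first is not None and first <= reference_date - timedelta(days=lag_days)
```

    For example, with a reference date of 2022-03-01 a record whose earliest date is 2022-02-15 is included, while one whose earliest date is 2022-02-20 waits for a later monthly release.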

  8. Codes of "Single -cell transcriptomes of zebrafish germline reveal...

    • figshare.com
    bin
    Updated Jul 1, 2025
    Cite
    Hao Ho; Chenwei Hsu; Ching-Hsin Yang; Yan-wei Wang; Ker-Chau Li; Bon-chu Chung (2025). Codes of "Single -cell transcriptomes of zebrafish germline reveal progenitor types and feminization by Foxl2l" [Dataset]. http://doi.org/10.6084/m9.figshare.26314126.v1
    Available download formats: bin
    Dataset updated
    Jul 1, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Hao Ho; Chenwei Hsu; Ching-Hsin Yang; Yan-wei Wang; Ker-Chau Li; Bon-chu Chung
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The code was used in the study "Single-cell transcriptomes of zebrafish germline reveal progenitor types and feminization by Foxl2l". Snapshots of all notebooks for quality control (QC), preprocessing, wild-type (WT) data analysis, integration, public data analysis, mutant (MUT) data analysis, and figure generation are provided in the Notebooks folder. Custom utility functions are located in the Source_codes folder, and the required file lists are available in the File_lists folder.

    To reproduce the results:
    1. Download the raw count matrix from GEO (GSE173718) and perform data preprocessing to obtain the filtered_feature_bc_matrix.
    2. Update the path to the filtered_feature_bc_matrix in the scripts accordingly.
    3. For the public dataset from Liu et al., download the zx1_40gc_final_orig.robj file and move it to the Data folder.
    4. Execute the following notebooks sequentially, in this order:
       • 0_QC_Preprocessing.ipynb
       • 1_WT_data_analysis.ipynb
       • 2_Integration.ipynb
       • S_Public_data_analysis.ipynb
       • S2_MUT_data_analysis.ipynb
       • 3_Figures.ipynb
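    Because the notebooks must run in a fixed order, the last step can be scripted. The sketch below builds one `jupyter nbconvert --to notebook --execute --inplace` command per notebook in the listed order and, unless it is a dry run, stops at the first failure. It assumes Jupyter with nbconvert is installed, the notebooks sit in the Notebooks folder, and the data paths from the earlier steps have already been updated; the helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# Execution order as listed in the description.
NOTEBOOK_ORDER = [
    "0_QC_Preprocessing.ipynb",
    "1_WT_data_analysis.ipynb",
    "2_Integration.ipynb",
    "S_Public_data_analysis.ipynb",
    "S2_MUT_data_analysis.ipynb",
    "3_Figures.ipynb",
]


def build_commands(notebook_dir="Notebooks"):
    """One `jupyter nbconvert` command per notebook, in the required order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(Path(notebook_dir) / name)]
        for name in NOTEBOOK_ORDER
    ]


def run_all(notebook_dir="Notebooks", dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in build_commands(notebook_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

    Note that `--inplace` overwrites the notebook snapshots with freshly executed versions, so run it on a copy if you want to keep the originals. Long-running cells may also need a larger execution timeout (`--ExecutePreprocessor.timeout=...`), depending on your nbconvert version.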

  9. A Coherent Histone and DNA Hypomethylation Reprogramming Drives Small Cell...

    • figshare.com
    application/x-gzip
    Updated Feb 23, 2022
    Cite
    Xiangyu Pan (2022). A Coherent Histone and DNA Hypomethylation Reprogramming Drives Small Cell Lung Cancer Metastasis [Dataset]. http://doi.org/10.6084/m9.figshare.13206338.v1
    Available download formats: application/x-gzip
    Dataset updated
    Feb 23, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Xiangyu Pan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In our paper, we used bulk omics data, including bulk RNA-seq, bulk ATAC-seq, and bulk WGBS, to investigate the principles underlying SCLC metastasis. To improve reproducibility, we share all processed files, which can be downloaded from GEO and figshare. Detailed information on each processed file is displayed as a tree map. The analysis code for the bulk omics data is also provided, to help you understand what the processed files contain and how they can be used in your own study.
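    The title refers to DNA hypomethylation, which in bulk WGBS data is typically quantified as the fraction of methylated reads at each CpG. The sketch below computes that per-CpG level and a coverage-weighted level over a region, so that a drop between, for example, primary and metastatic samples would indicate hypomethylation. It illustrates the standard calculation only and is not the authors' pipeline; the minimum coverage of 5 reads is an assumption.

```python
def methylation_level(methylated, unmethylated, min_coverage=5):
    """Fraction of methylated reads at one CpG, or None if coverage is too low."""
    coverage = methylated + unmethylated
    if coverage < min_coverage:
        return None
    return methylated / coverage


def region_methylation(cpg_counts, min_coverage=5):
    """Coverage-weighted methylation level over a region.

    cpg_counts: list of (methylated, unmethylated) read counts per CpG.
    CpGs below min_coverage are skipped; returns None if none remain.
    """
    kept = [(m, u) for m, u in cpg_counts if m + u >= min_coverage]
    total = sum(m + u for m, u in kept)
    if total == 0:
        return None
    return sum(m for m, _ in kept) / total
```

    Pooling reads across CpGs weights well-covered sites more heavily than averaging per-CpG levels would, which keeps a few low-coverage CpGs from dominating the regional estimate.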


Detail view for result 1 (Repository for the single cell RNA sequencing data analysis for the human manuscript, http://doi.org/10.5281/zenodo.8286134): 22 scholarly articles cite this dataset (according to Google Scholar).
