79 datasets found

Data from: Gene Expression Omnibus (GEO)
catalog.data.gov
data.virginia.gov
Updated Jul 26, 2023
National Institutes of Health (NIH) (2023). Gene Expression Omnibus (GEO) [Dataset]. https://catalog.data.gov/dataset/gene-expression-omnibus-geo
Dataset updated
Jul 26, 2023
Dataset provided by
National Institutes of Health (NIH)
Description
Gene Expression Omnibus is a public functional genomics data repository supporting MIAME-compliant submissions of array- and sequence-based data. Tools are provided to help users query and download experiments and curated gene expression profiles.
Data from: Gene Expression Omnibus (GEO)
rrid.site
scicrunch.org
Updated Jan 29, 2022
(2022). Gene Expression Omnibus (GEO) [Dataset]. http://identifiers.org/RRID:SCR_005012
Unique identifier
https://identifiers.org/RRID:SCR_005012
Dataset updated
Jan 29, 2022
Description
Functional genomics data repository supporting MIAME-compliant data submissions. Includes microarray-based experiments measuring the abundance of mRNA, genomic DNA, and protein molecules, as well as non-array-based technologies such as serial analysis of gene expression (SAGE) and mass spectrometry proteomic technology. Array- and sequence-based data are accepted. Collection of curated gene expression DataSets, as well as original Series and Platform records. The database can be searched using keywords, organism, DataSet type and authors. DataSet records contain additional resources including cluster tools and differential expression queries.
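The keyword and organism searches described above can also be issued programmatically. A minimal sketch, assuming NCBI's E-utilities `esearch` endpoint with `db=gds` (the Entrez database behind GEO DataSets searches); the helper name `geo_search_url` is hypothetical, and the sketch only builds the request URL rather than sending it.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(keywords, organism=None, retmax=20):
    """Build an E-utilities esearch URL for GEO (Entrez db=gds).

    Hypothetical helper: combines free-text keywords with an optional
    [Organism] field restriction and requests a JSON response.
    """
    term = keywords
    if organism:
        term = f"{keywords} AND {organism}[Organism]"
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode escapes spaces as '+' and brackets as %5B / %5D
    return f"{ESEARCH}?{urlencode(params)}"


print(geo_search_url("flu vaccine", organism="Homo sapiens"))
```

Fetching the URL (for example with `urllib.request.urlopen`) returns a JSON object whose `esearchresult.idlist` field holds the matching record UIDs. NCBI limits request rates, so batch queries should be throttled.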
GEO (Gene Expression Omnibus)
catalog.data.gov
data.virginia.gov
Updated Jun 19, 2025
National Library of Medicine (2025). GEO (Gene Expression Omnibus) [Dataset]. https://catalog.data.gov/dataset/gene-expression-omnibus-geo-e0e2a
Dataset updated
Jun 19, 2025
Dataset provided by
National Library of Medicine
Description
GEO (Gene Expression Omnibus) is a public functional genomics data repository supporting MIAME-compliant data submissions. There are also tools provided to help users query and download experiments and curated gene expression profiles.
Gene Expression Omnibus (GEO) - ypwa-g5v3 - Archive Repository
healthdata.gov
Updated Jun 28, 2025
(2025). Gene Expression Omnibus (GEO) - ypwa-g5v3 - Archive Repository [Dataset]. https://healthdata.gov/dataset/Gene-Expression-Omnibus-GEO-ypwa-g5v3-Archive-Repo/e6ie-fuc4
Available download formats: json, xml, tsv, csv, application/rssxml, application/rdfxml
Dataset updated
Jun 28, 2025
Description
This dataset tracks the updates made on the dataset "Gene Expression Omnibus (GEO)" as a repository for previous versions of the data and metadata.
Gene Expression Omnibus (GEO) - c73c-g6pf - Archive Repository
healthdata.gov
Updated Feb 25, 2021
(2021). Gene Expression Omnibus (GEO) - c73c-g6pf - Archive Repository [Dataset]. https://healthdata.gov/dataset/Gene-Expression-Omnibus-GEO-c73c-g6pf-Archive-Repo/hn98-zsct
Available download formats: csv, json, application/rdfxml, xml, tsv, application/rssxml
Dataset updated
Feb 25, 2021
Description
This dataset tracks the updates made on the dataset "Gene Expression Omnibus (GEO)" as a repository for previous versions of the data and metadata.
Field-wide assessment of differential HT-seq from NCBI GEO database
zenodo.org
Updated Jan 13, 2023
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli (2023). Field-wide assessment of differential HT-seq from NCBI GEO database [Dataset]. http://doi.org/10.5281/zenodo.5070518
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.5070518
Dataset updated
Jan 13, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We analyzed the field of expression profiling by high-throughput sequencing, or HT-seq, in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository. Our work places an upper bound of 62% on field-wide reproducibility, based on the types of files submitted to GEO.

The archived dataset contains the following files:

- output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated number of true null hypotheses (pi0).

- output/document_summaries.csv, document summaries of NCBI GEO series

- output/publications.csv, publication info of NCBI GEO series

- output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series

- output/single-cell.csv, single cell experiments

- spots.csv, NCBI SRA sequencing run metadata

- suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions. One filename per row.

- suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO. One filename per row.
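The archive reports a pi0 value for each p-value histogram, but this listing does not say which estimator produced it. As an illustration only, the sketch below implements Storey's single-λ estimator of the proportion of true null hypotheses, pi0 = #{p > λ} / (m(1 − λ)), with λ = 0.5 chosen as an assumption; the function name is hypothetical and this is not presented as the authors' method.

```python
def storey_pi0(pvalues, lam=0.5):
    """Storey's estimator of the proportion of true null hypotheses.

    Under the null, p-values are uniform on [0, 1], so the fraction of
    p-values above `lam` is roughly pi0 * (1 - lam). The estimate is
    capped at 1.
    """
    if not 0 <= lam < 1:
        raise ValueError("lam must be in [0, 1)")
    m = len(pvalues)
    if m == 0:
        raise ValueError("need at least one p-value")
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1 - lam)))


# Toy input: 8 p-values, 2 of them above 0.5 -> 2 / (8 * 0.5) = 0.5
print(storey_pi0([0.001, 0.01, 0.02, 0.03, 0.04, 0.2, 0.7, 0.9]))  # 0.5
```

A single λ keeps the sketch short; smoother variants fit pi0 over a grid of λ values, and the result is sensitive to that choice.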
Field-wide assessment of differential HT-seq from NCBI GEO database
zenodo.org
Updated Jan 13, 2023
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli (2023). Field-wide assessment of differential HT-seq from NCBI GEO database [Dataset]. http://doi.org/10.5281/zenodo.5356064
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.5356064
Dataset updated
Jan 13, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We analysed the field of expression profiling by high-throughput sequencing, or HT-seq, in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository.

- This release includes GEO series up to Dec 31, 2020;

- Fixed a missing optional dependency on xlrd, which affected the import of some .xls files; previously, only openpyxl was used (thanks to an anonymous reviewer);

- All files inside supplementary _RAW.tar archives were checked for p-values; previously, alas, _RAW.tar files were omitted entirely (thanks to an anonymous reviewer).

The archived dataset contains the following files:

- output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated number of true null hypotheses (pi0).

- output/document_summaries.csv, document summaries of NCBI GEO series

- output/publications.csv, publication info of NCBI GEO series

- output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series

- output/single-cell.csv, single cell experiments

- spots.csv, NCBI SRA sequencing run metadata

- suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions. One filename per row.

- suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO. One filename per row.
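The filtered list holds one supplementary file name per row and was used to download files from NCBI GEO. A minimal sketch of turning such a name into a download URL, assuming (1) that the file name starts with its GEO series accession, as in the illustrative name `GSE116256_RAW.tar`, and (2) NCBI's current FTP layout, where series are grouped into directories named by replacing the last three digits of the accession with `nnn`. The helper names are hypothetical; this is not the authors' download code.

```python
import re

GEO_FTP = "https://ftp.ncbi.nlm.nih.gov/geo/series"
ACCESSION = re.compile(r"^(GSE\d+)")


def series_stub(accession):
    """Grouping directory for a series, e.g. GSE116256 -> GSE116nnn, GSE123 -> GSEnnn."""
    digits = accession[3:]
    return "GSE" + digits[:-3] + "nnn"


def supp_url(filename):
    """Build a GEO supplementary-file URL from a file name prefixed by its accession."""
    match = ACCESSION.match(filename)
    if match is None:
        raise ValueError(f"no GSE accession prefix in {filename!r}")
    acc = match.group(1)
    return f"{GEO_FTP}/{series_stub(acc)}/{acc}/suppl/{filename}"


print(supp_url("GSE116256_RAW.tar"))
# https://ftp.ncbi.nlm.nih.gov/geo/series/GSE116nnn/GSE116256/suppl/GSE116256_RAW.tar
```

Reading `suppfilenames_filtered.txt` line by line and passing each stripped name to `supp_url` would yield a download list, provided every name carries its accession prefix; names that do not will raise `ValueError` rather than produce a wrong URL.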
Evaluation of pre-processing on the meta-analysis of DNA methylation data...
plos.figshare.com
Updated Jun 1, 2023
Claudia Sala; Pietro Di Lena; Danielle Fernandes Durso; Andrea Prodi; Gastone Castellani; Christine Nardini (2023). Evaluation of pre-processing on the meta-analysis of DNA methylation data from the Illumina HumanMethylation450 BeadChip platform [Dataset]. http://doi.org/10.1371/journal.pone.0229763
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0229763
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Claudia Sala; Pietro Di Lena; Danielle Fernandes Durso; Andrea Prodi; Gastone Castellani; Christine Nardini
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: Meta-analysis is a powerful means of leveraging the hundreds of experiments run worldwide into statistically more powerful analyses. This is also true for the analysis of omic data, including genome-wide DNA methylation. In particular, thousands of DNA methylation profiles generated using the Illumina 450k BeadChip are stored in the publicly accessible Gene Expression Omnibus (GEO) repository. Often, however, the intensity values produced by the BeadChip (raw data) are not deposited, so only pre-processed values, obtained after computational manipulation, are available. Pre-processing may differ among studies and may therefore affect meta-analysis by introducing non-biological sources of variability.

Material and methods: To systematically investigate the effect of pre-processing on meta-analysis, we analysed four different collections of DNA methylation samples (datasets), each composed of two subsets, for which raw data from controls (i.e. healthy subjects) and cases (i.e. patients) are available. We pre-processed the data from each dataset with nine of the most common pipelines found in the literature. Moreover, we evaluated the performance of regRCPqn, a modification of the RCP algorithm that aims to improve data consistency. For each combination of pre-processing (9 × 9), we first evaluated the between-sample variability among control subjects and then identified genomic positions that are differentially methylated between cases and controls (differential analysis).

Results and conclusion: The pre-processing of DNA methylation data affects both the between-sample variability and the loci identified as differentially methylated, and the effects of pre-processing are strongly dataset-dependent. By contrast, application of our renormalization algorithm regRCPqn (i) reduces variability and (ii) increases agreement between meta-analysed datasets, both critical components of data harmonization.
Data from: Metadata record for the manuscript: FOXA1 and adaptive response...
springernature.figshare.com
Updated Feb 14, 2024
Steven P. Angus; Timothy J. Stuhlmiller; Gaurav Mehta; Samantha M. Bevill; Daniel R. Goulet; J. Felix Olivares-Quintero; Michael P. East; Maki Tanioka; Jon S. Zawistowski; Darshan Singh; Noah Sciaky; Xin Chen; Xiaping He; Naim U. Rashid; Lynn Chollet-Hinton; Cheng Fan; Matthew G. Soloway; Patricia A. Spears; Stuart Jefferys; Joel S. Parker; Kristalyn K. Gallagher; Andres Forero-Torres; Ian E. Krop; Alastair M. Thompson; Rashmi Murthy; Michael L. Gatza; Charles M. Perou; H. Shelton Earp; Lisa A. Carey; Gary L. Johnson (2024). Metadata record for the manuscript: FOXA1 and adaptive response determinants to HER2 targeted therapy in TBCRC 036 [Dataset]. http://doi.org/10.6084/m9.figshare.14376746.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.14376746.v1
Dataset updated
Feb 14, 2024
Dataset provided by
figshare
Authors
Steven P. Angus; Timothy J. Stuhlmiller; Gaurav Mehta; Samantha M. Bevill; Daniel R. Goulet; J. Felix Olivares-Quintero; Michael P. East; Maki Tanioka; Jon S. Zawistowski; Darshan Singh; Noah Sciaky; Xin Chen; Xiaping He; Naim U. Rashid; Lynn Chollet-Hinton; Cheng Fan; Matthew G. Soloway; Patricia A. Spears; Stuart Jefferys; Joel S. Parker; Kristalyn K. Gallagher; Andres Forero-Torres; Ian E. Krop; Alastair M. Thompson; Rashmi Murthy; Michael L. Gatza; Charles M. Perou; H. Shelton Earp; Lisa A. Carey; Gary L. Johnson
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Summary

This metadata record provides details of the data supporting the claims of the related manuscript: “FOXA1 and adaptive response determinants to HER2 targeted therapy in TBCRC 036”.

The related study aimed to determine the global alterations in gene enhancers and transcriptional changes to identify factors involved in the adaptive response to HER2 inhibition. In parallel, it analysed the in vivo human adaptive molecular responses to HER2 targeting in a window-of-opportunity clinical trial using both RNAseq and a chemical proteomics method (MIB/MS) to assess the functional kinome.

Type of data: mass spectrometry proteomics data; normalised patient RNA sequencing data; cell line RNA sequencing data; cell line ChIPseq data

Subject of data: Homo sapiens; Eukaryotic cell lines

Recruitment: Eligible women included those with newly diagnosed Stage I-IV HER2+ breast cancer scheduled to undergo definitive surgery (either lumpectomy or mastectomy). Stage I-IIIc patients could not be candidates for a therapeutic neoadjuvant treatment. Study subjects provided informed written consent that included details of the nontherapeutic nature of the trial.

Trial registration number: https://clinicaltrials.gov/ct2/show/NCT01875666

Data access

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifier https://identifiers.org/pride.project:PXD021865.

Normalized patient RNAseq data (https://identifiers.org/geo:GSE161743), cell line RNAseq (https://identifiers.org/geo:GSE160001 and https://identifiers.org/geo:GSE160001), and cell line ChIPseq (https://identifiers.org/geo:GSE160667) are all part of the SuperSeries https://identifiers.org/geo:GSE160670 available through the Gene Expression Omnibus.

Processed and normalized data are provided as supplemental materials associated with the article on the journal website, and also attached to this data record in the Excel spreadsheets called Supplementary Data 1-10 and the PDF called Supplementary material file.PDF. Accompanying Supplementary Information and Supplementary Data files contain relevant data used to produce the included figures and are available with this article. A detailed list of which data files underlie which figures and tables in the related article is included in the file ‘Angus_et_al_2021_underlying_data_files_list.xlsx’, which is shared with this data record.

The data supporting Figure 3c is in the GraphPad Prism file called ‘siGrowth’, which is not shared publicly as it is in a non-open format, but it can be made available upon reasonable request to the corresponding author.

Corresponding author(s) for this study

Gary L. Johnson, PhD, Department of Pharmacology, 4079 Genetic Medicine Building, University of North Carolina School of Medicine, Chapel Hill, NC 27599. Email: glj@med.unc.edu. Phone: 919-843-3106.

Study approval

Approved by the UNC Office of Human Research Ethics and conducted in accordance with the Declaration of Helsinki. IRB# 13-1826
Flu vaccinated blood samples
kaggle.com
Updated Jan 9, 2020
Janis (2020). Flu vaccinated blood samples [Dataset]. https://www.kaggle.com/janiscorona/flu-vaccinated-blood-samples/notebooks
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 9, 2020
Dataset provided by
Kaggle
Authors
Janis
Description
Context

No matter how much you wash your hands, you are still susceptible to airborne flu or cold viruses when you are in close proximity to others who have a cold or the flu. The flu vaccine is a treatment many folks get in the hope of not getting sick during cold and flu season. The flu vaccine is something like a cheat sheet that prepares your body for a math final: you don't need to know every formula by heart, only the ones on the exam. If a crooked teacher or TA made the cheat sheet a poor representation of what is actually on the final, that is roughly how your body fares with a flu vaccine that doesn't contain the strain(s) of flu it is likely to encounter that season. I found this dataset while munging the NCBI GEO database sets and searching for 'flu vaccines'. I wanted microarray gene expression datasets whose values I could compare with other blood microarray samples from separate studies: females taking EGCG for obesity, and males who do or don't have heart disease. This data can be blended with the other datasets here or in my GitHub repositories at janjanjan2018.

Content

Blood gene expression data from microarray samples.

Acknowledgements

NCBI and GEO, the grant-funded repositories of gene expression data.

Inspiration

Sick people.
Single-Cell Gene Expression Profiles for Classification Problems
explore.openaire.eu
Updated Mar 15, 2021
Stefano Gualandi; Andrea Codegoni; Eleonora Vercesi (2021). Single-Cell Gene Expression Profiles for Classification Problems [Dataset]. http://doi.org/10.5281/zenodo.4604569
Unique identifier
https://doi.org/10.5281/zenodo.4604569
Dataset updated
Mar 15, 2021
Authors
Stefano Gualandi; Andrea Codegoni; Eleonora Vercesi
Description
This repository contains a collection of three datasets, described below, that we use to introduce the Gene Mover's Distance in [1]. The three datasets are exported in a basic text-based format (.csv files), like other public datasets widely used in the Machine Learning community. They are extracted from the Gene Expression Omnibus (GEO) database [2], where they appear under accession numbers GSE116256 (blood leukemia, [3]), GSE84133 (human pancreas, [4]), and GSE67835 (human brain, [5]), respectively. In GEO, the datasets are split into several files, which contain much more detail than this version. However, the proposed format should make it easier for other researchers to use these data.

The Gene Mover's Distance (GMD) is a measure of similarity between a pair of cells based on their gene expression profiles obtained via single-cell RNA sequencing. The underlying idea of GMD is to interpret the gene expression array of a single cell as a discrete probability measure. The distance between two cells is then computed by solving an Optimal Transport problem between the two corresponding discrete measures. The Gene Mover's Distance can be used, for instance, to solve two classification problems: classifying cells according to their condition and according to their type. The repository also contains a Python script to check basic statistics of the data.

[1] Bellazzi, R., Codegoni, A., Gualandi, S., Nicora, G., Vercesi, E. The Gene Mover's Distance: Single-cell similarity via Optimal Transport. https://arxiv.org/abs/2102.01218
[2] Gene Expression Omnibus (GEO) database, http://www.ncbi.nlm.nih.gov/geo
[3] van Galen, P., Hovestadt, V., Wadsworth II, M.H., Hughes, T.K., Griffin, G.K., Battaglia, S., Verga, J.A., Stephansky, J., Pastika, T.J., Story, J.L. and Pinkus, G.S., 2019. Single-cell RNA-seq reveals AML hierarchies relevant to disease progression and immunity. Cell, 176(6), pp.1265-1281.
[4] Baron, M., Veres, A., Wolock, S.L., Faust, A.L., Gaujoux, R., Vetere, A., Ryu, J.H., Wagner, B.K., Shen-Orr, S.S., Klein, A.M. and Melton, D.A., 2016. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Systems, 3(4), pp.346-360.
[5] Darmanis, S., Sloan, S.A., Zhang, Y., Enge, M., Caneda, C., Shuer, L.M., Gephart, M.G.H., Barres, B.A. and Quake, S.R., 2015. A survey of human brain transcriptome diversity at the single cell level. Proceedings of the National Academy of Sciences, 112(23), pp.7285-7290.
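GMD treats each cell's expression vector as a discrete probability measure and compares two cells by solving an Optimal Transport problem. The gene-to-gene ground cost that GMD actually uses is defined in [1] and is not reproduced here. As a simplified illustration only, the sketch assumes genes are placed on a line with cost |i − j| between positions i and j; in that special case the optimal transport cost has a closed form, the sum of absolute differences between the two cumulative distributions. Function names are hypothetical.

```python
from itertools import accumulate


def to_measure(expression):
    """Normalize a non-negative expression vector into a probability measure."""
    total = sum(expression)
    if total <= 0 or any(x < 0 for x in expression):
        raise ValueError("expression must be non-negative with a positive sum")
    return [x / total for x in expression]


def transport_distance_1d(cell_a, cell_b):
    """Optimal transport cost between two cells when gene i sits at position i
    and moving one unit of mass from gene i to gene j costs |i - j|.

    For this line-shaped ground cost, the optimum equals the sum over
    positions k of |F_a(k) - F_b(k)|, where F is the cumulative distribution
    (the last term is zero because both CDFs reach 1).
    """
    if len(cell_a) != len(cell_b):
        raise ValueError("cells must report the same genes")
    cdf_a = accumulate(to_measure(cell_a))
    cdf_b = accumulate(to_measure(cell_b))
    return sum(abs(x - y) for x, y in zip(cdf_a, cdf_b))


# All mass moves from gene 0 to gene 2: cost 1 * |0 - 2| = 2
print(transport_distance_1d([5, 0, 0], [0, 0, 3]))  # 2.0
```

A general ground cost between genes has no such closed form and requires a linear-programming or network-flow solver, which is what the full GMD formulation calls for.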
Hydro-Geo database management system
repository.soilwise-he.eu
Updated Mar 7, 2022
(2022). Hydro-Geo database management system [Dataset]. https://repository.soilwise-he.eu/cat/collections/metadata:main/items/6ec9eb26-8dc1-4d28-a6f4-9ea3144de4a8
Dataset updated
Mar 7, 2022
Description
The Hydro-Geo database management system is planned for the efficient management of hydro-geo data. The system helps the corporation's users store and use integrated, relevant information on hydrology, geospatial data, and other related data sources. In addition, the system makes it easy for users to access standards, manuals, guidelines, operational procedures, and project reports, which will be made available to a wide pool of users of the Hydro-Geo information system platform.

Research domain: Other

Research question: How does an integrated database system enhance data management efficiency?
A field-wide assessment of differential RNAseq reveals ubiquitous bias
zenodo.org
Updated Jan 13, 2023
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli (2023). A field-wide assessment of differential RNAseq reveals ubiquitous bias [Dataset]. http://doi.org/10.5281/zenodo.3752549
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.3752549
Dataset updated
Jan 13, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Taavi Päll; Hannes Luidalepp; Tanel Tenson; Ülo Maiväli
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We analyzed the field of expression profiling by high-throughput sequencing, or RNA-seq, in terms of replicability and reproducibility, using data from the GEO (Gene Expression Omnibus) repository. Our work places an upper bound of 56% on field-wide reproducibility, based on the types of files submitted to GEO.

The archived dataset contains four files in CSV format:

- output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated number of true null hypotheses (pi0)

- output/document_summaries.csv, document summaries of GEO series

- output/publications.csv, publication info of GEO series

- output/scopus_citedbycount.csv, Scopus citation info of GEO series
Data from: Global Geo-processed Data of Aquifer Properties by 0.5° Grid,...
osti.gov
Updated Feb 18, 2024
USDOE Office of Science (SC), Biological and Environmental Research (BER) (2024). Global Geo-processed Data of Aquifer Properties by 0.5° Grid, Country and Water Basins [Dataset]. http://doi.org/10.57931/2307831
Unique identifier
https://doi.org/10.57931/2307831
Dataset updated
Feb 18, 2024
Dataset provided by
USDOE Office of Science (SC), Biological and Environmental Research (BER)
MultiSector Dynamics - Living, Intuitive, Value-adding, Environment
Description
This repository of global hydrogeologic datasets contains aquifer properties at 0.5° resolution, including depth to groundwater (Fan et al., 2013), aquifer thickness (de Graaf et al., 2015), WHYMap aquifer classes (Richts et al., 2011), and porosity and permeability (Gleeson et al., 2014), digitized and geo-processed from their respective sources. The globally gridded aquifer properties can be used independently to estimate global groundwater availability, or as critical inputs to the superwell model to simulate groundwater extraction and estimate pumped volumes and unit costs under user-specified scenarios.

Key resources related to these data:
- Niazi, H., Ferencz, S., Graham, N., Yoon, J., Wild, T., Hejazi, M., Watson, D., & Vernon, C. (2024; in prep). Long-term Hydro-economic Assessment Tool for Evaluating Global Groundwater Cost and Supply: Superwell v1. Geoscientific Model Development.
- The superwell model repository, which uses these data to simulate groundwater extraction and estimates the global extractable volumes and unit costs ($/km3) of accessible groundwater production under user-specified extraction scenarios.

Repository Overview
- Main output: aquifer_properties.csv contains all processed outputs, including aquifer properties such as porosity, permeability, aquifer thickness, and depth to groundwater.
- shapefiles.zip: contains all digitized GIS databases and shapefiles for all aquifer properties.
- prep_inputs.R: R script that processes the shapefiles to produce the aquifer_properties file.
- plot_inputs.R: R script for plotting the maps and conducting preliminary analysis of the available groundwater volume.
- basin_to_country_mapping.csv, basin_country_region_mapping.csv and continent_county_mapping.csv provide the mapping between continents, 32 energy-economic macro regions, countries, and water basins for post-processing aquifer_properties.csv.

Maps: each map visualizes the spatial distribution of one aquifer property across the globe.
- map_in_Porosity.png
- map_in_Permeability.png
- map_in_Aquifer_thickness.png
- map_in_Depth_to_water.png
- map_in_Grid_area_km.png
- map_in_WHYClass.png

Sample inputs
- sample_inputs.py: samples inputs from the aquifer_properties dataset, ensuring that the sampled and original inputs maintain the same distributions.
- sampled_data_100.csv contains 100 sampled data points, and sampled_data_100.png compares their distributions.

Dataset Overview
The main outputs are consolidated in a comprehensive aquifer_properties.csv file with the following fields:
- GridCellID: Unique identifier for each (roughly 0.5°) grid cell
- Continent: Continent name
- Country: Country name
- GCAM_basin_ID: Identifier for the GCAM hydrologic basin
- Basin_long_name: Full name of the basin
- WHYClass: Hydrogeologic classification based on WHYMap aquifer classes (Richts et al., 2011)
- Porosity: Soil porosity (%) (Gleeson et al., 2014)
- Permeability: Soil permeability (in square meters; Gleeson et al., 2014)
- Aquifer_thickness: Thickness of the aquifer (in meters; de Graaf et al., 2015)
- Depth_to_water: Depth to groundwater (in meters; Fan et al., 2013)
- Grid_area: Area of the grid cell (in square meters)

Key References
The datasets are digitized versions of global hydrogeologic properties from the following key literature sources:
- Depth to Groundwater: Fan, Y., Li, H., & Miguez-Macho, G. (2013). Global Patterns of Groundwater Table Depth. Science, 339(6122), 940-943. https://doi.org/10.1126/science.1229881
- Aquifer Thickness: de Graaf, I. E. M., Sutanudjaja, E. H., van Beek, L. P. H., & Bierkens, M. F. P. (2015). A high-resolution global-scale groundwater model. Hydrol. Earth Syst. Sci., 19(2), 823-837. https://doi.org/10.5194/hess-19-823-2015
- Porosity and Permeability: Gleeson, T., Moosdorf, N., Hartmann, J., & van Beek, L. P. H. (2014). A glimpse beneath earth's surface: GLobal HYdrogeology MaPS (GLHYMPS) of permeability and porosity. Geophysical Research Letters, 41(11), 3891-3898. https://doi.org/10.1002/2014GL059856
- Aquifer classes: Richts, A., Struckmeier, W. F., & Zaepke, M. (2011). WHYMAP and the Groundwater Resources Map of the World 1:25,000,000. In J. A. A. Jones (Ed.), Sustaining Groundwater Resources: A Critical Element in the Global Water Crisis (pp. 159-173). Springer Netherlands. https://doi.org/10.1007/978-90-481-3426-7_10

Cite as
Niazi, H., Watson, D., Hejazi, M., Yonkofski, C., Ferencz, S., Vernon, C., Graham, N., Wild, T., & Yoon, J. (2024). Global Geo-processed Data of Aquifer Properties by 0.5° Grid, Country and Water Basins. MSD-LIVE Data Repository. https://doi.org/10.57931/2307831

Contact
Reach out to Hassan Niazi or open an issue in the superwell repository for questions or suggestions.
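The fields listed above are enough for a back-of-envelope check of stored groundwater. The sketch below is an assumption-laden illustration, not the superwell method: it treats the pore volume of each grid cell as Grid_area × Aquifer_thickness × Porosity, converting porosity from percent and cubic meters to km³ (1 km³ = 10⁹ m³). The column names come from the field list; the function names and toy rows are hypothetical.

```python
import csv
import io

M3_PER_KM3 = 1e9  # 1 km^3 = 10^9 m^3


def cell_pore_volume_km3(grid_area_m2, thickness_m, porosity_pct):
    """Rough pore volume of one grid cell: area x thickness x porosity.

    Assumes Grid_area is in square meters, Aquifer_thickness in meters and
    Porosity in percent, as stated in the field list.
    """
    return grid_area_m2 * thickness_m * (porosity_pct / 100.0) / M3_PER_KM3


def total_pore_volume_km3(rows):
    """Sum the rough pore volume over rows read from aquifer_properties.csv."""
    return sum(
        cell_pore_volume_km3(
            float(row["Grid_area"]),
            float(row["Aquifer_thickness"]),
            float(row["Porosity"]),
        )
        for row in rows
    )


# Toy rows with made-up values (not taken from the dataset).
toy_csv = (
    "GridCellID,Grid_area,Aquifer_thickness,Porosity\n"
    "1,3.0e9,100,10\n"
    "2,2.0e9,50,20\n"
)
rows = list(csv.DictReader(io.StringIO(toy_csv)))
print(round(total_pore_volume_km3(rows), 6))  # 30 km^3 + 20 km^3
```

With the real file, `csv.DictReader(open("aquifer_properties.csv", newline=""))` could replace the toy rows, but cells with missing or non-numeric entries would need filtering first, which this sketch does not do. The estimate also ignores Depth_to_water and the fact that only part of the pore space drains freely, so it overstates extractable volume.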
Assessing Concordance of Drug-Induced Transcriptional Response in Rodent...
figshare.com
Updated May 30, 2023
Jeffrey J. Sutherland; Robert A. Jolly; Keith M. Goldstein; James L. Stevens (2023). Assessing Concordance of Drug-Induced Transcriptional Response in Rodent Liver and Cultured Hepatocytes [Dataset]. http://doi.org/10.1371/journal.pcbi.1004847
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pcbi.1004847
Dataset updated
May 30, 2023
Dataset provided by
PLOS Computational Biology
Authors
Jeffrey J. Sutherland; Robert A. Jolly; Keith M. Goldstein; James L. Stevens
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The effects of drugs, disease, and other perturbations on mRNA levels are studied using gene expression microarrays or RNA-seq, with the goal of understanding the molecular effects arising from the perturbation. Previous comparisons of reproducibility across laboratories have been limited in scale and focused on a single model. The use of model systems, such as cultured primary cells or cancer cell lines, assumes that mechanistic insights derived from the models would also have been observed in vivo. We examined the concordance of compound-induced transcriptional changes using data from several sources: rat liver and rat primary hepatocytes (RPH) from DrugMatrix (DM) and Open TG-GATEs (TG), human primary hepatocytes (HPH) from TG, and mouse liver and HepG2 results from the Gene Expression Omnibus (GEO) repository. Gene expression changes for treatments were normalized to controls and analyzed with three methods: 1) gene level, for 9071 highly expressed genes in rat liver; 2) gene set analysis (GSA), using canonical pathways and gene ontology sets; and 3) weighted gene co-expression network analysis (WGCNA). Co-expression networks performed better than genes or GSA when comparing treatment effects within rat liver and between rat and mouse liver. Genes and modules performed similarly in Connectivity Map-style analyses, where the goal is to identify similar treatments among a collection of reference profiles. Comparisons between rat liver and RPH, and among RPH, HPH and HepG2 cells, reveal lower concordance for all methods. We observe that the baseline state of untreated cultured cells relative to untreated rat liver shows striking similarity to toxicant-exposed cells in vivo, indicating that gross systems-level perturbation of the underlying networks in culture may contribute to the low concordance.
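The gene-level comparison rests on treatment effects normalized to controls. As a simplified, hypothetical illustration (the study's exact normalization and similarity measures are not given here), the sketch computes per-gene log2 ratios of treated versus control expression in two systems and scores their concordance with a Pearson correlation over the genes they share. Function names, the pseudocount, and the toy values are assumptions.

```python
import math


def log2_ratios(treated, control, pseudocount=1.0):
    """Per-gene log2 ratio of treated vs. control expression for shared genes.

    `treated` and `control` map gene IDs to non-negative expression levels;
    the pseudocount avoids division by zero for unexpressed genes.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in treated
        if gene in control
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def concordance(ratios_a, ratios_b):
    """Correlate two treatment-effect profiles over the genes they share."""
    genes = sorted(set(ratios_a) & set(ratios_b))
    if len(genes) < 3:
        raise ValueError("need at least three shared genes")
    return pearson([ratios_a[g] for g in genes], [ratios_b[g] for g in genes])


# Toy profiles (made-up numbers): both systems give log2 ratios 2, 0, -1
liver = log2_ratios({"g1": 15, "g2": 3, "g3": 7}, {"g1": 3, "g2": 3, "g3": 15})
hepatocytes = log2_ratios({"g1": 7, "g2": 1, "g3": 3}, {"g1": 1, "g2": 1, "g3": 7})
print(round(concordance(liver, hepatocytes), 3))  # 1.0
```

A correlation over thousands of genes is dominated by the many small, noisy ratios; gene-set and co-expression-module summaries, as used in the study, are one way to reduce that noise.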
StemBase
scicrunch.org
StemBase [Dataset]. http://identifiers.org/RRID:SCR_006252
Unique identifier
https://identifiers.org/RRID:SCR_006252
Description
A publicly accessible database containing data from Affymetrix DNA microarray experiments and Serial Analysis of Gene Expression (SAGE), mostly on human and mouse stem cell samples and their derivatives, to facilitate the discovery of gene functions relevant to stem cell control and differentiation. It has grown in both size and scope into a system with analysis tools that examine either the whole database at once or slices of data based on tissue type, cell type, or gene of interest. There are currently more than 210 stem cell samples in 60 different experiments, with more being added regularly. The samples were provided by researchers of the Stem Cell Network and processed at the Core Facility of Stemcore Laboratories under the management of Ms. Pearl Campbell, as part of the Stem Cell Genomics Project. New expression data are periodically submitted to the Gene Expression Omnibus (GEO) repository at the National Center for Biotechnology Information, so that researchers can compare the data deposited in StemBase with a large number of gene expression datasets. StemBase differs from GEO in both focus and scope. StemBase is concerned exclusively with stem cell data; we focus on stem cell research and have made a significant effort to ensure the quality and consistency of the included data. This allows us to offer more specialized analysis tools for stem cell data. GEO, by contrast, is intended as a large-scale public archive. Deposition in a public repository such as GEO is required by most major scientific journals, and it helps disseminate the data further, since GEO is more broadly used than StemBase.
Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset
zenodo.org
data.niaid.nih.gov
bin, txt
Updated Nov 20, 2023
Jonathan Hsu; Allart Stoop (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. http://doi.org/10.5281/zenodo.10011622
bin, txt (available download formats)
Unique identifier
https://doi.org/10.5281/zenodo.10011622
Dataset updated
Nov 20, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jonathan Hsu; Allart Stoop
Description
Table of Contents
Main Description
File Descriptions
Linked Files
Installation and Instructions

Main Description
---------------------------
This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity". The code in the file `marengo_code_for_paper_jan_2023.R` was used to generate the figures from the single-cell RNA sequencing data.
The following libraries are required for script execution:
Seurat
scRepertoire
ggplot2
stringr
dplyr
ggridges
ggrepel
ComplexHeatmap

File Descriptions
---------------------------
The code can be downloaded and opened in RStudio.
The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper.
The "Marengo_newID_March242023.rds" file is available from Zenodo (DOI: 10.5281/zenodo.7566113; https://zenodo.org/record/7566113).
The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results of the differential gene expression (DGE) analysis, which were also used to create the DGE heatmap and the volcano plots.
The "genes_for_heatmap_fig5F.xlsx" contains the genes included in the heatmap in figure 5F.

Linked Files
---------------------

This repository contains code for the analysis of a single-cell RNA-seq dataset. The raw FASTQ files were deposited in SRA and the aligned files in GEO; the "Rdata" or "Rds" file was deposited in Zenodo. Descriptions of the linked datasets are provided below:

Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the **raw sequencing** or `.fastq.gz` files, which are gzip-compressed FASTQ text files.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)
Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.
Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This is a necessary file to use the code.
Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.
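The GEO submission is described as providing "matrix.mtx", "barcodes.tsv" and "genes.tsv" for each replicate and condition. The repository's own code works in R with Seurat; the Python sketch below only illustrates how the three files fit together, assuming the common Cell Ranger (v2-style) layout: a Matrix Market coordinate file with genes as rows and barcodes as columns (1-based indices), a "genes.tsv" whose second column holds the gene symbol, and one barcode per line in "barcodes.tsv". The function names and the folder argument are hypothetical, and the files are assumed to be uncompressed.

```python
import os
from collections import defaultdict


def parse_mtx(lines):
    """Parse a Matrix Market coordinate file into (n_rows, n_cols, {(row, col): value}).

    Indices in the file are 1-based; the returned keys are 0-based.
    """
    it = iter(lines)
    if not next(it).startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a Matrix Market coordinate file")
    size_line = next(it)
    while size_line.startswith("%"):  # skip comment lines before the size line
        size_line = next(it)
    n_rows, n_cols, n_entries = (int(v) for v in size_line.split())
    entries = {}
    for line in it:
        if line.strip():
            row, col, value = line.split()
            entries[(int(row) - 1, int(col) - 1)] = float(value)
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, entries


def parse_tsv_column(lines, column=0):
    """Return one tab-separated column from every non-empty line."""
    return [line.rstrip("\n").split("\t")[column] for line in lines if line.strip()]


def to_cell_counts(n_rows, n_cols, entries, genes, barcodes):
    """Group nonzero entries as {barcode: {gene: count}} (genes = rows, barcodes = columns)."""
    if len(genes) != n_rows or len(barcodes) != n_cols:
        raise ValueError("matrix dimensions do not match genes.tsv / barcodes.tsv")
    cells = defaultdict(dict)
    for (row, col), value in entries.items():
        cells[barcodes[col]][genes[row]] = value
    return dict(cells)


def load_replicate(directory):
    """Hypothetical loader for one replicate/condition folder holding the three files."""
    with open(os.path.join(directory, "matrix.mtx")) as fh:
        n_rows, n_cols, entries = parse_mtx(fh)
    with open(os.path.join(directory, "genes.tsv")) as fh:
        genes = parse_tsv_column(fh, column=1)  # assumed: column 2 = gene symbol
    with open(os.path.join(directory, "barcodes.tsv")) as fh:
        barcodes = parse_tsv_column(fh)
    return to_cell_counts(n_rows, n_cols, entries, genes, barcodes)
```

Gene symbols are not guaranteed to be unique, so keying by symbol can merge rows; reading column 1 (the gene ID) of "genes.tsv" instead avoids that.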

Installation and Instructions
--------------------------------------
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

> Ensure you have R version 4.1.2 or higher for compatibility.

> Although it is not essential, you can use RStudio (version 2022.12.0+353) to open and run the code.

1. Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).
2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
3. Set your working directory to where the following files are located:
marengo_code_for_paper_jan_2023.R
Install_Packages.R
Marengo_newID_March242023.rds
genes_for_heatmap_fig5F.xlsx
all_res_deg_for_heat_updated_march2023.txt

You can use the following code to set the working directory in R:
> setwd("path/to/directory")  # replace with the folder that contains the files listed above

4. Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies, setting up an environment in which the code in "marengo_code_for_paper_jan_2023.R" can be executed.
5. Once "Install_Packages.R" has run successfully, restart RStudio or your IDE of choice.
6. Open "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.
7. Execute the commands in "marengo_code_for_paper_jan_2023.R" to generate the plots.
NOAA/WDS Paleoclimatology - GEO Prisms
cloud.csiss.gmu.edu
Updated Mar 7, 2021
United States (2021). NOAA/WDS Paleoclimatology - GEO Prisms [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/noaa-wds-paleoclimatology-geo-prisms
Dataset updated
Mar 7, 2021
Dataset provided by
United States
Description
This archived Paleoclimatology Study is available from the NOAA National Centers for Environmental Information (NCEI), under the World Data Service (WDS) for Paleoclimatology. The associated NCEI study type is Repository. The data include repository parameters; no geographic location is given. The begin and end dates of the time period coverage (in calendar years before present, BP) are listed as unavailable. See metadata information for parameter and study location details. Please cite this study when using the data.
Data Normalization Method for Geo-Spatial Analysis on Ports
data.mendeley.com
Updated Jun 11, 2020
Nazmus Sakib (2020). Data Normalization Method for Geo-Spatial Analysis on Ports [Dataset]. http://doi.org/10.17632/skn24jntn3.2
Unique identifier
https://doi.org/10.17632/skn24jntn3.2
Dataset updated
Jun 11, 2020
Authors
Nazmus Sakib
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Based on open access data, 79 Mediterranean passenger ports are analyzed to compare their infrastructure, hinterland accessibility and offered multi-modality categories. A comparative geo-spatial analysis is also carried out using the data normalization method in order to visualize the ports' performance on maps. These data-driven analytical results can add value to sustainable development policy and planning initiatives in the Mediterranean Region. The analyzed elements can also contribute to the development of passenger port performance indicators. The empirical research methods used for the Mediterranean passenger ports can be replicated for transport nodes in any region of the world to determine their relative performance on selected criteria for improvement and planning.

The Mediterranean passenger ports were initially categorized into cruise and ferry ports. The cruise ports were identified from the member list of the Association for the Mediterranean Cruise Ports (MedCruise), representing more than 80% of the cruise tourism activities per country. The identified cruise ports were mapped by selecting the corresponding geo-referenced ports from the map layer developed by the European Marine Observation and Data Network (EMODnet). The United Nations (UN) Code for Trade and Transport Locations (LOCODE) of each cruise port served as the common criterion for the selection. Cruise ports not listed by EMODnet were added to the geo-database using the editing function of the licensed ArcMap (version 10.1) geographic information system software. The ferry ports were identified from open access data provided by Ferrylines, an industry initiative, and were mapped in the same way as the cruise ports (Figure 1).

Based on the available data from the identified cruise ports, a database (see Tables A1–A3) was created for a Mediterranean-scale analysis. The ferry ports were excluded due to the unavailability of relevant information on the selected criteria (Table 2). However, the cruise ports that also serve as ferry passenger ports were identified in order to maximize the scope of the analysis. Port infrastructure and hinterland accessibility data were collected from the statistical reports published by MedCruise, which compile data provided by its individual member port authorities and cruise terminal operators. Other supplementary sources were the European Sea Ports Organization (ESPO) and Global Ports Holding, a cruise terminal operator with an established presence in the Mediterranean. Additionally, open access data sources (e.g. Google Maps and Trip Advisor) were consulted to identify the multi-modal transport options and to bridge the data gaps on hinterland accessibility by measuring approximate distances.
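The description says the ports were compared on maps using a data normalization method but does not give the formula. A common choice, assumed here, is min-max scaling, which maps each criterion onto [0, 1] so that criteria with different units (for example a count and a distance in kilometres) can share one colour scale. The function name, criterion names and values below are hypothetical.

```python
def min_max_normalize(values, higher_is_better=True):
    """Rescale {port: raw_value} to [0, 1] for one criterion.

    If higher_is_better is False (e.g. a distance to a transport hub),
    the scale is flipped so that 1.0 still marks the best-performing port.
    If every port has the same value, all scores are 0.0.
    """
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {port: 0.0 for port in values}
    scores = {port: (v - lo) / (hi - lo) for port, v in values.items()}
    if not higher_is_better:
        scores = {port: 1.0 - s for port, s in scores.items()}
    return scores


# Hypothetical criteria for three ports A, B and C.
berth_count = {"A": 2, "B": 4, "C": 6}
hub_distance_km = {"A": 10.0, "B": 30.0, "C": 50.0}

berth_score = min_max_normalize(berth_count)
access_score = min_max_normalize(hub_distance_km, higher_is_better=False)
```

Min-max scaling is sensitive to outliers: one very large port compresses all others toward 0, so rank- or z-score-based scaling may suit some criteria better. The source does not say which variant was used.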
Raw Data for Analysis of gene expression in CD8+ T cells in Nivolumab trials...
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Sukla, Sanskrita (2023). Raw Data for Analysis of gene expression in CD8+ T cells in Nivolumab trials [Dataset]. http://doi.org/10.7910/DVN/OEERV4
Unique identifier
https://doi.org/10.7910/DVN/OEERV4
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Sukla, Sanskrita
Description
Raw sequencing data downloaded from the GEO repository (accession ID: GSE111414) and demultiplexed. Contains expression data from CD8+ T cells of patients enrolled in the Nivolumab trial.
