Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This spreadsheet presents the structured mapping of repository capabilities and characteristics conducted in Task 5.2 of the FIDELIS project. It includes metadata and annotations for over 80 resources (standards, best practices, and landscape analyses) aligned with the 30 Activities and Functions defined in the FIDELIS Transparent Trustworthy Repository Attributes Matrix (TTRAM). The mapping covers both domain-agnostic and domain-specific resources across five scientific communities and serves as a foundational dataset for the FIDELIS landscape analysis.
This is a subset of the Zenodo-ML Dinosaur Dataset [Github] that has been converted to small png files and organized in folders by language, so you can jump right into using machine learning methods that assume image input.
Included are .tar.gz files, each named after a file extension; when extracted, each produces a folder of the same name.
$ tree -L 1
.
├── c
├── cc
├── cpp
├── cs
├── css
├── csv
├── cxx
├── data
├── f90
├── go
├── html
├── java
├── js
├── json
├── m
├── map
├── md
├── txt
└── xml
We can peek inside one of the (somewhat smaller) folders of the set to see that the subfolders are Zenodo identifiers. A Zenodo identifier corresponds to a single GitHub repository, so the png files it contains are chunks of code of that extension type from a particular repository.
$ tree map -L 1
map
├── 1001104
├── 1001659
├── 1001793
├── 1008839
├── 1009700
├── 1033697
├── 1034342
...
├── 836482
├── 838329
├── 838961
├── 840877
├── 840881
├── 844050
├── 845960
├── 848163
├── 888395
├── 891478
└── 893858
154 directories, 0 files
Within each folder (a Zenodo id), the files are prefixed with the Zenodo id, followed by the index into the original image set array that is provided with the full dinosaur dataset archive.
$ tree m/891531/ -L 1
m/891531/
├── 891531_0.png
├── 891531_10.png
├── 891531_11.png
├── 891531_12.png
├── 891531_13.png
├── 891531_14.png
├── 891531_15.png
├── 891531_16.png
├── 891531_17.png
├── 891531_18.png
├── 891531_19.png
├── 891531_1.png
├── 891531_20.png
├── 891531_21.png
├── 891531_22.png
├── 891531_23.png
├── 891531_24.png
├── 891531_25.png
├── 891531_26.png
├── 891531_27.png
├── 891531_28.png
├── 891531_29.png
├── 891531_2.png
├── 891531_30.png
├── 891531_3.png
├── 891531_4.png
├── 891531_5.png
├── 891531_6.png
├── 891531_7.png
├── 891531_8.png
└── 891531_9.png
0 directories, 31 files
So what's the difference?
The difference is that these files are organized by extension type and provided as actual png images. The original data is provided as numpy arrays and is organized by Zenodo ID. Both are useful for different things; this particular version is cool because we can actually see what a code image looks like.
How many images total?
We can count the number of total images:
find "." -type f -name *.png | wc -l
3,026,993
The script to create the dataset is provided here. Essentially, we start with the top extensions as identified by this work (excluding actual image files) and then write each 80x80 image to an actual png image, organized by extension and then Zenodo id (as shown above).
I tested a few methods to write the single-channel 80x80 arrays as png images, and wound up liking cv2's imwrite function because it would save and then load back the exact same content.
import cv2
# image is an 80x80 single-channel uint8 array; imwrite round-trips it exactly.
cv2.imwrite(image_path, image)
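Putting those pieces together, here is a minimal sketch of the conversion loop (the helper name and arguments are hypothetical; the actual script linked above is authoritative):

import os
import cv2

# Hypothetical helper: writes each 80x80 single-channel uint8 array as
# <root>/<extension>/<zenodo_id>/<zenodo_id>_<index>.png, matching the
# layout shown above.
def write_repo_images(images, zenodo_id, extension, root="."):
    out_dir = os.path.join(root, extension, str(zenodo_id))
    os.makedirs(out_dir, exist_ok=True)
    for idx, image in enumerate(images):
        cv2.imwrite(os.path.join(out_dir, f"{zenodo_id}_{idx}.png"), image)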
Given the above, it's pretty easy to load an image! Here is an example using imageio (the current approach), followed by the older scipy route, which now emits a deprecation message.
image_path = '/tmp/data1/data/csv/1009185/1009185_0.png'
from imageio import imread
image = imread(image_path)
array([[116, 105, 109, ..., 32, 32, 32],
[ 48, 44, 48, ..., 32, 32, 32],
[ 48, 46, 49, ..., 32, 32, 32],
...,
[ 32, 32, 32, ..., 32, 32, 32],
[ 32, 32, 32, ..., 32, 32, 32],
[ 32, 32, 32, ..., 32, 32, 32]], dtype=uint8)
image.shape
(80,80)
# Deprecated
from scipy import misc
misc.imread(image_path)
Image([[116, 105, 109, ..., 32, 32, 32],
[ 48, 44, 48, ..., 32, 32, 32],
[ 48, 46, 49, ..., 32, 32, 32],
...,
[ 32, 32, 32, ..., 32, 32, 32],
[ 32, 32, 32, ..., 32, 32, 32],
[ 32, 32, 32, ..., 32, 32, 32]], dtype=uint8)
Remember that the values in the data are characters that have been converted to ordinal. Can you guess what 32 is?
ord(' ')
32
# And thus if you wanted to convert it back...
chr(32)
' '
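Since every pixel is just an ordinal character code, an image can be decoded back into (up to 80) lines of source text. A minimal sketch, reusing the example path from above:

from imageio import imread

# Map each pixel's ordinal value back to its character, row by row.
image = imread('/tmp/data1/data/csv/1009185/1009185_0.png')
lines = [''.join(chr(v) for v in row).rstrip() for row in image]
print('\n'.join(lines))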
Creative Commons Zero 1.0 (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
Zenodo.org is a popular data repository hosted by CERN. There are tens of thousands of datasets in the repository, but not all of them are used to the same extent.
This dataset includes names and links to the top 500 most downloaded datasets on Zenodo.
This dataset can be used to find datasets deposited on Zenodo that would benefit from additional exposure to the DS/ML community by uploading them to Kaggle.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This file collection is part of the ORD Landscape and Cost Analysis Project (DOI: 10.5281/zenodo.2643460), a study jointly commissioned by the SNSF and swissuniversities in 2018.
Please cite this data collection as:
von der Heyde, M. (2019). Data from the International Open Data Repository Survey. Retrieved from https://doi.org/10.5281/zenodo.2643493
Further information is given in the corresponding data paper:
von der Heyde, M. (2019). International Open Data Repository Survey: Description of collection, collected data, and analysis methods [Data paper]. Retrieved from https://doi.org/10.5281/zenodo.2643450
Contact
Swiss National Science Foundation (SNSF)
Open Research Data Group
E-mail: ord@snf.ch
swissuniversities
Program "Scientific Information"
Gabi Schneider
E-mail: isci@swissuniversities.ch
Table of Contents
Main Description
File Descriptions
Linked Files
Installation and Instructions
This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity." The code included in the file titled marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data.
The following libraries are required for script execution:
Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap
The code can be downloaded and opened in RStudio. The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the DGE analysis, also used to create the heatmap with DGE and volcano plots. The "genes_for_heatmap_fig5F.xlsx" file contains the genes included in the heatmap in figure 5F.
This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset contains raw FASTQ files, as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Provided below are descriptions of the linked datasets:
Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the raw sequencing or .fastq.gz files, which are tab delimited text files.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)
Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity. Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This file is necessary to use the code. Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:
Ensure you have R version 4.1.2 or higher for compatibility.
Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.
marengo_code_for_paper_jan_2023.R
Install_Packages.R
Marengo_newID_March242023.rds
genes_for_heatmap_fig5F.xlsx
all_res_deg_for_heat_updated_march2023.txt
You can use the following code to set the working directory in R:
setwd("path/to/downloaded/files")  # replace with the folder containing the downloaded files
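Before running the script, the required packages need to be installed. Install_Packages.R is the authoritative installer; the sketch below is only a minimal alternative, and the CRAN/Bioconductor split is an assumption:

# Minimal sketch; see Install_Packages.R for the actual installation code.
install.packages(c("Seurat", "ggplot2", "stringr", "dplyr", "ggridges", "ggrepel"))
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("scRepertoire", "ComplexHeatmap"))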
Overview
Precision Liming Soil Datasets (LimeSoDa) is a collection of 31 datasets from a field- and farm-scale soil mapping context. These datasets are 'ready-to-use' for modeling purposes, as they include target soil properties and features in a tidy tabular format. Three target soil properties are present in every dataset: (1) soil organic matter (SOM) or soil organic carbon (SOC), (2) pH, and (3) clay content, while the features for modeling are dataset-specific. The primary goal of LimeSoDa is to enable more reliable benchmarking of machine learning methods in digital soil mapping and pedometrics. All the associated materials and data from LimeSoDa can be downloaded in this data repository. However, for a more in-depth analysis, we refer to the published paper 'LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping' by Schmidinger et al. (2025). You may also use our R and Python packages, likewise called LimeSoDa.
Citation
Upon usage of datasets from LimeSoDa, please cite our associated paper: Schmidinger, J., Vogel, S., Barkov, V., Pham, A.-D., Gebbers, R., Tavakoli, H., Correa, J., Tavares, T.R., Filippi, P., Jones, E. J., Lukas, V., Boenecke, E., Ruehlmann, J., Schroeter, I., Kramer, E., Paetzold, S., Kodaira, M., Wadoux, A.M.J.-C., Bragazza, L., Metzger, K., Huang, J., Valente, D.S.M., Safanelli, J.L., Bottega, E.L., Dalmolin, R.S.D., Farkas, C., Steiger, A., Horst, T. Z., Ramirez-Lopez, L., Scholten, T., Stumpf, F., Rosso, P., Costa, M.M., Zandonadi, R.S., Wetterlind, J. & Atzmueller, M. (2025). LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Files and datasets in Parquet format related to molecular dynamics, retrieved from the Zenodo, Figshare, and OSF data repositories. The file 'data_model_parquet.md' is a codebook that contains the data models for the Parquet files.
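As a minimal sketch (the file name is hypothetical, and pandas needs pyarrow or fastparquet installed), a Parquet file from the collection can be loaded with:

import pandas as pd

# Hypothetical file name; 'data_model_parquet.md' documents the actual columns.
df = pd.read_parquet("mdverse_datasets.parquet")
print(df.head())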
Delete (repository outdated).
New repository: https://zenodo.org/record/3713179#.XoxaDWDgq71
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Ab-initio Data Repository for Physics-Informed Data-Driven Model
This repository stores precise Density Functional Theory (DFT) calculations and Vienna Ab initio Simulation Package (VASP) codes to provide a comprehensive dataset for physics-informed models. It specifically considers the steelmaking process by focusing on different types of non-metallic inclusions (NMIs) within the steel melt.
Data Sets Included:
The datasets include the following types of NMIs, with detailed characteristics, in the size range of 1-10 µm:
Purpose and Application:
This repository is designed to support advanced physics-informed modeling approaches, such as those using machine learning algorithms to predict clogging and inclusion behaviors in steelmaking processes.
Zenodo is an open repository that allows researchers to deposit research papers, data sets, research software, reports, and any other research-related digital artefacts.
Other license: https://choosealicense.com/licenses/other/
Datasets utilized to train NMIRacle
This dataset repository contains derived data used for the development and evaluation of the NMIRacle framework. The data is not original; it is constructed from the following publicly available Zenodo datasets:
Multimodal spectroscopic dataset (License: CDLA-Sharing 1.0): https://zenodo.org/records/14770232
NMR2Struct training data (License: CC-BY-4.0): https://zenodo.org/records/13892026
Please refer to the original Zenodo repositories for the… See the full description on the dataset page: https://huggingface.co/datasets/fedeotto/nmiracle-datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This repository contains the dataset referenced in the Scientific Data journal article titled "Aerial Imagery-Derived Dataset of Manufactured Housing Communities in the North Central United States" by Armin Yeganeh, Maria Marshall, and Noah Durst. The associated code scripts are available at https://github.com/arminyeganeh/mhc
MIT License: https://opensource.org/licenses/MIT
WiFi measurements database for UJI's library and supporting material.
The measurements were collected by one person using one Android smartphone over 15 months on two floors of the library building of Universitat Jaume I, in Spain. The database contains 63,504 WiFi fingerprints, which are organized into datasets. Each dataset is the result of a collection campaign.
The supporting material includes Matlab® scripts to load and filter the desired data, and provides examples of possible studies that the database may enable. The supporting material also includes the bookshelves' local coordinates.
Citation request:
G.M. Mendoza-Silva, P. Richter, J. Torres-Sospedra, E.S. Lohan, J. Huerta, "Long-Term Wi-Fi fingerprinting dataset and supporting material", Zenodo repository, DOI 10.5281/zenodo.1066041.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This data continues the development of the NPEGC Trinity de novo metatranscriptome assemblies from the protein data repository of The North Pacific Eukaryotic Gene Catalog. The nucleotide sequences corresponding to the NPEGC cluster representatives are collected together in these repository files:
NPac.G1PA.bf100.id99.nt.fasta.gz
NPac.G2PA.bf100.id99.nt.fasta.gz
NPac.G3PA.bf100.id99.nt.fasta.gz
NPac.G3PA_diel.bf100.id99.nt.fasta.gz
NPac.D1PA.bf100.id99.nt.fasta.gz
A full description of this data is published in Scientific Data, available here: The North Pacific Eukaryotic Gene Catalog of metatranscriptome assemblies and annotations. Please cite this publication if your research uses this data:
Groussman, R. D., Coesel, S. N., Durham, B. P., Schatz, M. J., & Armbrust, E. V. (2024). The North Pacific Eukaryotic Gene Catalog of metatranscriptome assemblies and annotations. Scientific Data, 11(1), 1161.
These nucleotide sequences have been sourced from the Zenodo repository for raw assemblies: The North Pacific Eukaryotic Gene Catalog: Raw assemblies from Gradients 1, 2 and 3
Key processing steps are sampled below with links to the detailed code on the main github code repository: https://github.com/armbrustlab/NPac_euk_gene_catalog
The code used to build the kallisto indices and map the short reads against them with kallisto is online in the code repository here: NPEGC.nt_kallisto_counts.sh
There are two main steps:
1. Generate the kallisto index on the sets of clustered nucleotide metatranscripts
2. Map the short reads from environmental samples back to the assembly index
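A minimal sketch of those two steps (the index and sample file names are hypothetical; NPEGC.nt_kallisto_counts.sh has the actual commands):

$ kallisto index -i NPac.G1PA.idx NPac.G1PA.bf100.id99.nt.fasta.gz
$ kallisto quant -i NPac.G1PA.idx -o kallisto_out/sample1 sample1_R1.fastq.gz sample1_R2.fastq.gz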
As generated above, kallisto produces separate results files for each of the sample files. Even after compression, the total size of the tarballed kallisto output directories is prohibitively large (>50 GB). We use the code in a template R script to join together the 'est_count' estimated count values for the tens of millions of protein sequences in each project metatranscriptome, along with sequence length.
The code in this template script was used for each project: aggregate_kallisto_counts.R
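For illustration only, here is a rough Python equivalent of what that R template does, under the assumption that each kallisto output directory holds a standard abundance.tsv (paths are hypothetical; aggregate_kallisto_counts.R is authoritative):

import glob
import os
import pandas as pd

# Each kallisto output directory contains an abundance.tsv with
# 'target_id', 'length', and 'est_counts' columns; join the per-sample
# estimated counts into one wide table keyed by sequence id and length.
counts = None
for path in sorted(glob.glob("kallisto_out/*/abundance.tsv")):
    sample = os.path.basename(os.path.dirname(path))
    df = pd.read_csv(path, sep="\t", usecols=["target_id", "length", "est_counts"])
    df = df.rename(columns={"est_counts": sample})
    counts = df if counts is None else counts.merge(df, on=["target_id", "length"])
counts.to_csv("raw.est_counts.csv", index=False)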
The output count files for each project are Gzip-compressed and uploaded to the NPEGC nucleotide data repository here:
G1PA.raw.est_counts.csv.gz
G2PA.raw.est_counts.csv.gz
G3PA.raw.est_counts.csv.gz
G3PA_diel.raw.est_counts.csv.gz
D1PA.raw.est_counts.csv.gz
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
The format is JSON: a list of lists. Each inner list is a group of very similar repositories (weighted Jaccard similarity threshold 0.8~0.9).
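For reference, a minimal sketch of weighted Jaccard similarity and of loading the groups (the file name is hypothetical; the grouping pipeline itself is not part of this dataset):

import json

# Weighted Jaccard similarity of two nonnegative vectors: the sum of
# element-wise minima divided by the sum of element-wise maxima.
def weighted_jaccard(x, y):
    return sum(min(a, b) for a, b in zip(x, y)) / sum(max(a, b) for a, b in zip(x, y))

# Each inner list is one group of near-duplicate repositories.
with open("similar_repo_groups.json") as f:
    groups = json.load(f)
print(groups[0])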
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Within the ESA-funded WorldCereal project we have built an open, harmonized reference data repository at global extent for model training or product validation in support of land cover and crop type mapping. Data from 2017 onwards were collected from many different sources and then harmonized, annotated, and evaluated. These steps are explained in the harmonization protocol (10.5281/zenodo.7584463). This protocol also clarifies the naming convention of the shape files and the WorldCereal attributes (LC, CT, IRR, valtime and sampleID) that were added to the original data sets.
This publication includes those harmonized data sets of which the original data set was published under the CC-BY-SA license or a license similar to CC-BY-SA. See document "_In-situ-data-World-Cereal - license - CC-BY-SA.pdf" for an overview of the original data sets.
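As a minimal sketch (the shapefile name is hypothetical), the harmonized WorldCereal attributes can be inspected with geopandas:

import geopandas as gpd

# Hypothetical file name; each harmonized shapefile carries the attributes
# described in the harmonization protocol.
gdf = gpd.read_file("worldcereal_refdata.shp")
print(gdf[["LC", "CT", "IRR", "valtime", "sampleID"]].head())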
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
This dataset contains the parameterization of a no-policy baseline scenario of the global 11-regional MESSAGEix-GLOBIOM integrated assessment model. Regions, time periods, commodities, technologies and relations included in this model are described in a separate repository. The dataset relies on the MESSAGEix modeling framework (Huppmann et al. 2019) and can be imported into MESSAGEix via the read_excel() functionality, for which a tutorial is available, or via snapshot.load() as described here. After the import the scenario can be solved and modified to create new scenarios. Note that the published scenario as included in the ENGAGE global scenarios dataset has been run with a release candidate of version 3.4.0 of MESSAGEix.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This Zenodo repository contains the data and code for the article entitled "A natural disaster exacerbates and redistributes disease risk across free-ranging macaques by altering social structure".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
In this work, based on the GitHub Archive project and repository-mining tools, we process all available data into a concise, structured format to generate a dataset of GitHub developer behavior and repository evolution. Together with the self-configurable interactive analysis tool we provide, it gives a macroscopic view of the evolution of the open-source ecosystem.