100+ datasets found
  1. Coccolithophore Abundance, Size, Carbon And Distribution Estimates (CASCADE)...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 22, 2024
    Cite
    Joost de Vries; Alex J. Poulton; Jeremy R. Young; Fanny M. Monteiro; Rosie M. Sheward; Roberta Johnson; Kyoko Hagino; Patrizia Ziveri; Levi J. Wolf (2024). Coccolithophore Abundance, Size, Carbon And Distribution Estimates (CASCADE) [Dataset]. http://doi.org/10.5281/zenodo.12794780
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Joost de Vries; Alex J. Poulton; Jeremy R. Young; Fanny M. Monteiro; Rosie M. Sheward; Roberta Johnson; Kyoko Hagino; Patrizia Ziveri; Levi J. Wolf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    CASCADE is a global dataset for 139 extant coccolithophore taxonomic units. CASCADE includes a trait database (size and cellular organic and inorganic carbon contents) and taxonomic-specific global spatiotemporal distributions (Lat/Lon/Depth/Month/Year) of coccolithophore abundance and organic and inorganic carbon stocks. CASCADE covers all ocean basins over the upper 275 meters, spans the years 1964-2019 and includes 33,119 taxonomic-specific abundance observations. Within CASCADE, we characterise the underlying uncertainties due to measurement errors by propagating error estimates between the different studies.
    Full details of the data set are provided in the associated Scientific Data manuscript. The repository contains five main folders: 1) "Classification", which contains YAML files with synonyms, family-level classifications, and life cycle phase associations and definitions; 2) "Concatenated literature", which contains the merged datasets of size, PIC and POC and which were corrected for taxonomic unit synonyms; 3) "Resampled cellular datasets", which contains the resampled datasets of size, PIC and POC in long format as well as a summary table; 4) "Gridded data sets", which contains gridded datasets of abundance, PIC and POC; 5) "Species lists", which contains spreadsheets of the "common" (>20 obs) and "rare" (<20 obs) species and their number of observations.
    The CASCADE data set can be easily reproduced using the scripts and data provided in the associated GitHub repository: https://github.com/nanophyto/CASCADE/ (zenodo.12797197)
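    For a quick look at the classification files, the sketch below (an illustrative Python snippet, not part of the CASCADE repository) loads whatever YAML files are found in an extracted "Classification" folder:

    import yaml  # requires PyYAML
    from pathlib import Path

    # Peek at the classification YAML files (synonyms, family-level classifications,
    # life cycle phase associations); assumes the "Classification" folder was extracted locally.
    for path in Path("Classification").glob("*.y*ml"):
        with open(path) as fh:
            content = yaml.safe_load(fh)
        print(path.name, "->", type(content).__name__)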

    Correspondence to: Joost de Vries, joost.devries@bristol.ac.uk

  2. A single-cell and spatially resolved atlas of human breast cancers | spatial...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, pdf
    Updated Jul 19, 2024
    Cite
    Sunny Z Wu; Alexander Swarbrick (2024). A single-cell and spatially resolved atlas of human breast cancers | spatial transcriptomics data [Dataset]. http://doi.org/10.5281/zenodo.4739739
    Explore at:
    Available download formats: application/gzip, pdf
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sunny Z Wu; Alexander Swarbrick
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains spatial transcriptomics data related to the Wu et al. 2021 study "A single-cell and spatially resolved atlas of human breast cancers". It includes processed count matrices, brightfield H&E images (plain and annotated) and metadata (containing clinical information and spot-level pathological details) for six primary breast cancers profiled using the Visium assay (10X Genomics). If you use this dataset in your research, please consider citing the above study.

    The contents of the files are:
    raw_count_matrices.tar.gz - spaceranger processed raw count matrices.

    spatial.tar.gz - spaceranger processed spatial files (images, scalefactors, aligned fiducials, position lists)

    filtered_count_matrices.tar.gz - filtered count matrices.

    metadata.tar.gz - metadata for tissues and spots of filtered count matrices, including clinical subtype and pathological annotation of each spot.

    images.pdf - pdf detailing the H&E and annotation images.
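    As a starting point for working with these archives, the sketch below (illustrative Python, assuming only the archive names listed above) unpacks them and lists a few of their entries:

    import tarfile

    archives = ["raw_count_matrices.tar.gz", "spatial.tar.gz",
                "filtered_count_matrices.tar.gz", "metadata.tar.gz"]
    for name in archives:
        with tarfile.open(name, "r:gz") as tar:
            print(name)
            for member in tar.getmembers()[:5]:  # peek at the first few entries
                print("   ", member.name)
            tar.extractall("wu2021_visium")      # unpack for downstream analysis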

  3. The North Pacific Eukaryotic Gene Catalog: metatranscriptome assemblies with...

    • zenodo.org
    application/gzip
    Updated Jan 22, 2025
    Cite
    Mora Groussman; Stephen Blaskowski; Sacha Coesel; E. Virginia Armbrust (2025). The North Pacific Eukaryotic Gene Catalog: metatranscriptome assemblies with taxonomy, function and abundance annotations [Dataset]. http://doi.org/10.5281/zenodo.12630398
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 22, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mora Groussman; Stephen Blaskowski; Sacha Coesel; E. Virginia Armbrust
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset continues the development of the unprocessed NPEGC Trinity de novo metatranscriptome assemblies, which were uploaded to a separate Zenodo repository for raw assemblies: The North Pacific Eukaryotic Gene Catalog: Raw assemblies from Gradients 1, 2 and 3

    A full description of this data is published in Scientific Data, available here: The North Pacific Eukaryotic Gene Catalog of metatranscriptome assemblies and annotations. Please cite this publication if your research uses this data:

    Groussman, R. D., Coesel, S. N., Durham, B. P., Schatz, M. J., & Armbrust, E. V. (2024). The North Pacific Eukaryotic Gene Catalog of metatranscriptome assemblies and annotations. Scientific Data, 11(1), 1161.


    Excerpts of key processing steps are sampled below, with links to the detailed code on the main GitHub code repository: https://github.com/armbrustlab/NPac_euk_gene_catalog


    Processing and annotation of protein-level NPEGC metatranscripts is done in 6 primary steps:
    1. Six-frame translation into protein sequences
    2. Frame-selection of protein-coding translation frames
    3. Clustering of protein sequences at 99% sequence identity
    4. Taxonomic annotation against MarFERReT v1.1 + MARMICRODB v1.0 multi-kingdom marine reference protein sequence library with DIAMOND
    5. Functional annotation against Pfam 35.0 protein family HMM profiles using HMMER3
    6. Functional annotation against KOfam HMM profiles (KEGG release 104.0) using KofamScan v1.3.0

    # Define local NPEGC base directory here:
    NPEGC_DIR="/mnt/nfs/projects/armbrust-metat"

    # Raw assemblies are located in the /assemblies/raw/ directory
    # for each of the metatranscriptome projects
    PROJECT_LIST="D1PA G1PA G2PA G3PA G3PA_diel"

    # raw Trinity assemblies:
    RAW_ASSEMBLY_DIR="${NPEGC_DIR}/${PROJECT}/assemblies/raw"

    Translation
    We began processing the raw metatranscriptome assemblies by six-frame translation from nucleotide transcripts into three forward and three reverse reading frame translations, using the transeq function in the EMBOSS package. We add a cruise and sample prefix to the sequence IDs to ensure unique identification downstream (e.g., `>TRINITY_DN2064353_c0_g1_i1_1` becomes `>G1PA_S09C1_3um_TRINITY_DN2064353_c0_g1_i1_1` for the S09C1_3um sample in the G1PA assemblies). See NPEGC.6tr_frame_selection_clustering.sh for full code description.

    Example of six-frame translation using transeq
    transeq -auto -sformat pearson -frame 6 -sequence 6tr/${PREFIX}.Trinity.fasta -outseq 6tr/${PREFIX}.Trinity.6tr.fasta

    Frame selection
    We use a custom frame-selection python script keep_longest_frame.py to determine the longest coding length in each open reading frame and retain this sequence (or multiple sequences if there is a tie) for downstream analyses. See NPEGC.6tr_frame_selection_clustering.sh for full code description.

    Clustering by sequence identity
    To reduce sequence redundancy and near-identical sequences, we cluster protein sequences at the 99% sequence identity level and retain the sequence cluster representative in a reduced-size FASTA output file. See NPEGC.6tr_frame_selection_clustering.sh for full code description of linclust/mmseqs clustering.

    Sample of linclust clustering script: core mmseqs function
    function NPEGC_linclust {
        # make an index of the fasta file:
        $MMSEQS_DIR/mmseqs createdb $FASTA_PATH/$FASTA_FILE NPac.$STUDY.bf100.db
        # cluster sequences at $MIN_SEQ_ID
        $MMSEQS_DIR/mmseqs linclust NPac.${STUDY}.bf100.db NPac.${STUDY}.clusters.db NPac_tmp --min-seq-id ${MIN_SEQ_ID}
        # retrieve cluster representatives:
        $MMSEQS_DIR/mmseqs result2repseq NPac.${STUDY}.bf100.db NPac.${STUDY}.clusters.db NPac.${STUDY}.clusters.rep
        # generate flat FASTA output with cluster reps
        $MMSEQS_DIR/mmseqs result2flat NPac.${STUDY}.bf100.db NPac.${STUDY}.bf100.db NPac.${STUDY}.clusters.rep NPac.${STUDY}.bf100.id99.fasta --use-fasta-header
    }

    Corresponding files uploaded to this repository: Gzip-compressed FASTA files after translation, frame-selection, and clustering at 99% sequence identity (.bf100.id99.aa.fasta.gz)
    NPac.G1PA.bf100.id99.aa.fasta.gz
    NPac.G2PA.bf100.id99.aa.fasta.gz
    NPac.G3PA.bf100.id99.aa.fasta.gz
    NPac.G3PA_diel.bf100.id99.aa.fasta.gz
    NPac.D1PA.bf100.id99.aa.fasta.gz

    MarFERReT + MARMICRODB taxonomic annotation with DIAMOND

    Taxonomy was inferred for the NPEGC metatranscripts with the DIAMOND fast read alignment software against the MarFERReT v1.1 + MARMICRODB v1.0 multi-kingdom marine reference protein sequence library (v1.1), a combined database of the MarFERReT v1.1 marine microbial eukaryote sequence library and MARMICRODB v1.0 prokaryote-focused marine genome database. See NPEGC.diamond_taxonomy.log.sh for full description of DIAMOND annotation.

    Excerpt of core DIAMOND function:
    function NPEGC_diamond {
        # FASTA filename for $STUDY
        FASTER_FASTA="NPac.${STUDY}.bf100.id99.aa.fasta"
        # Output filename for LCA results in lca.tab file:
        LCA_TAB="NPac.${STUDY}.MarFERReT_v1.1_MMDB.lca.tab"
        echo "Beginning ${STUDY}"
        singularity exec --no-home --bind ${DATA_DIR} \
            "${CONTAINER_DIR}/diamond.sif" diamond blastp \
            -c 4 --threads $N_THREADS \
            --db $MFT_MMDB_DMND_DB -e $EVALUE --top 10 -f 102 \
            --memory-limit 110 \
            --query ${FASTER_FASTA} -o ${LCA_TAB} >> "${STUDY}.MarFERReT_v1.1_MMDB.log" 2>&1
    }

    Corresponding files uploaded to this repository: Gzip-compressed DIAMOND lowest common ancestor predictions with NCBI Taxonomy against a combined MarFERReT + MARMICRODB taxonomic library (*.MarFERReT_v1.1_MMDB.lca.tab.gz)
    NPac.G1PA.MarFERReT_v1.1_MMDB.lca.tab.gz
    NPac.G2PA.MarFERReT_v1.1_MMDB.lca.tab.gz
    NPac.G3PA.MarFERReT_v1.1_MMDB.lca.tab.gz
    NPac.G3PA_diel.MarFERReT_v1.1_MMDB.lca.tab.gz
    NPac.D1PA.MarFERReT_v1.1_MMDB.lca.tab.gz
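    For downstream use, each lca.tab file can be read as a three-column table; this is an illustrative Python sketch assuming DIAMOND's -f 102 layout (query ID, NCBI taxID, e-value), which should be verified against the files themselves:

    import pandas as pd

    # Load one LCA table (gzip compression is inferred from the file name)
    lca = pd.read_csv(
        "NPac.G1PA.MarFERReT_v1.1_MMDB.lca.tab.gz",
        sep="\t",
        names=["sequence_id", "tax_id", "evalue"],
    )
    print(lca["tax_id"].value_counts().head(10))  # most frequent LCA assignments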

    Pfam 35.0 functional annotation using HMMER3
    Clustered protein sequences were annotated against the Pfam 35.0 collection of 19,179 protein family Hidden Markov Models (HMMs) using HMMER 3.3 with the Pfam 35.0 protein family database. Pfam annotation code is documented here: NPEGC.hmmer_function.sh

    Excerpt of core hmmsearch function:

    function NPEGC_hmmer {
        # Define input FASTA
        INPUT_FASTA="NPac.${STUDY}.bf100.id99.aa.fasta"
        # hmmsearch call:
        hmmsearch --cut_tc --cpu $NCORES --domtblout $ANNOTATION_DIR/${STUDY}.Pfam35.domtblout.tab $HMM_PROFILE ${INPUT_FASTA}
        # compress output file:
        gzip $ANNOTATION_DIR/${STUDY}.Pfam35.domtblout.tab
    }

    Corresponding files uploaded to this repository: Gzip-compressed hmmsearch domain table files for Pfam35 queries (*.Pfam35.domtblout.tab.gz)
    G1PA.Pfam35.domtblout.tab.gz
    G2PA.Pfam35.domtblout.tab.gz
    G3PA.Pfam35.domtblout.tab.gz
    G3PA_diel.Pfam35.domtblout.tab.gz
    D1PA.Pfam35.domtblout.tab.gz
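    The domain tables are whitespace-delimited hmmsearch --domtblout files with '#' comment lines; the sketch below (illustrative Python, with column positions taken from the standard domtblout layout and worth double-checking) extracts a few fields per hit:

    import gzip
    import pandas as pd

    rows = []
    with gzip.open("G1PA.Pfam35.domtblout.tab.gz", "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            f = line.split()
            # target (protein) name, Pfam family name, Pfam accession, full-sequence e-value
            rows.append((f[0], f[3], f[4], float(f[6])))

    hits = pd.DataFrame(rows, columns=["protein", "pfam_name", "pfam_acc", "evalue"])
    print(hits.head())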

    KEGG functional annotation using KofamScan v1.3.0

    Clustered protein sequences were annotated against the KEGG collection (release 104.0) of 20,819 protein family Hidden Markov Models (HMMs) using KofamScan and KofamKOALA. Kofam annotation code is documented here: NPEGC.kofamscan_function.sh

    Excerpt of core NPEGC_kofam function:

    # Core function to perform KofamScan annotation
    function NPEGC_kofam {
        # Define input FASTA
        local INPUT_FASTA="NPac.${STUDY}.bf100.id99.aa.fasta"

        # KofamScan call
        ${KOFAM_DIR}/kofam_scan-1.3.0/exec_annotation -f detail-tsv -E ${EVALUE} -o ${ANNOTATION_DIR}/NPac.${STUDY}.bf100.id99.aa.tsv ${FASTA_DIR}/${INPUT_FASTA}

        # Keep best hit

  4. QUASR: the QUAsisymmetric Stellarator Repository

    • zenodo.org
    application/gzip
    Updated Mar 26, 2024
    Cite
    Andrew Giuliani (2024). QUASR: the QUAsisymmetric Stellarator Repository [Dataset]. http://doi.org/10.5281/zenodo.10581415
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Mar 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrew Giuliani
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This is a dataset of approximately 200,000 vacuum field stellarators along with the electromagnetic coils that generate them. The devices in the database are available in a couple of formats (SIMSOPT, VMEC) useful to the stellarator community.

    Typo in uploads v1, v2: the 'total_coil_length' and 'coil_length_per_hp' keys in the dataframe should read 'total_coil_length_threshold' and 'coil_length_threshold_per_hp', i.e., the maximum allowable coil length and maximum allowable coil length per half period, respectively.
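    If you work with the v1/v2 dataframes, the renaming described above can be applied directly; the snippet below is an illustrative Python sketch (how the dataframe is loaded, and the file name used here, are assumptions):

    import pandas as pd

    # hypothetical file name; load the v1/v2 dataframe however you obtained it
    df = pd.read_pickle("quasr_devices.pkl")
    df = df.rename(columns={
        "total_coil_length": "total_coil_length_threshold",
        "coil_length_per_hp": "coil_length_threshold_per_hp",
    })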

    v2 (January 29, 2024): additional QA devices added.

    v1 (October 29, 2023): initial upload.

  5. Corrected IODP Gamma Ray Attenuation (GRA) densities and calculated...

    • zenodo.org
    • data.niaid.nih.gov
    txt, zip
    Updated May 31, 2024
    Cite
    Gary Acton; Laurel Childress; Vincent Percuoco; Margaret Hastedt (2024). Corrected IODP Gamma Ray Attenuation (GRA) densities and calculated porosities derived from the LILY Database [Dataset]. http://doi.org/10.5281/zenodo.10001855
    Explore at:
    Available download formats: zip, txt
    Dataset updated
    May 31, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gary Acton; Laurel Childress; Vincent Percuoco; Margaret Hastedt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 26, 2023
    Description
    The dataset GRA_Densities_Corrected_and_Porosities_2023-12-26.csv is derived from an analysis of data from the LILY Database (https://doi.org/10.5281/zenodo.8408296) as described in Childress et al. (2024, https://doi.org/10.1029/2023GC011287). The file contains over 3.7 million corrected gamma ray attenuation (GRA) bulk density measurements derived from the LILY database file GRA_DataLITH.csv. It also contains over 3.7 million porosity estimates that are computed from the corrected GRA bulk density using grain densities computed for each lithology from Moisture and Density (MAD) grain densities (derived from LILY file MAD_DataLITH.csv).
    Citation: Please cite Childress et al. (2024) when using these data:
    Childress, L.B., Acton, G.D., Percuoco, V.P., Hastedt, M., 2024. The LILY Database: Linking Lithology to IODP Physical, Chemical, and Magnetic Properties Data, Geochemistry, Geophysics, Geosystems, 25, https://doi.org/10.1029/2023GC011287.
    The uncompressed size of GRA_Densities_Corrected_and_Porosities_2023-12-26.csv is 950 MB.
    Data File format:
    • Exp: expedition number
    • Site: site number
    • Hole: hole number
    • Core: core number
    • Type: Type indicates the coring tool used to recover the core (typical types are F, H, R, X; see Table S3 in Childress et al., 2024, https://doi.org/10.1029/2023GC011287).
    • Sect: section number
    • Offset (cm): position of the observation, measured relative to the top of a section.
    • Depth CSF-A (m): location of the observation expressed relative to the top of a hole.
    • Bulk density (GRA): bulk GRA density measured on whole core sections in g/cm^3.
    • Timestamp (UTC): date and time the observation was made.
    • Instrument: abbreviation or mnemonic for the GRA sensing device used to make this observation (GRA1 or GRA2).
    • Instrument group: abbreviation or mnemonic for the data collection device (logger) used to acquire this observation (WRMSL).
    • Text ID: automatically generated unique database identifier for a sample, visible on printed labels.
    • Prefix: Prefix of the lithology
    • Principal: Principal lithology
    • Suffix: Suffix of the lithology
    • Full Lithology: full lithologic name = Prefix + Principal + Suffix
    • Simplified Lithology: categorization of lithologies (see Supporting Information in Childress et al., 2024, https://doi.org/10.1029/2023GC011287)
    • Lithology Type: Sedimentary, Igneous, or Metamorphic
    • Degree of Consolidation: consolidation state of the lithology.
    • Lithology Subtype: categorization of lithologies (see Supporting Information in Childress et al., 2024, https://doi.org/10.1029/2023GC011287).
    • Expanded Core Type: the actual coring type used, because some coring types were incorrectly grouped in the "Type" column (see Childress et al., 2024 for an explanation)
    • Latitude (DD): Latitude in decimal degrees
    • Longitude (DD): Longitude in decimal degrees
    • Water Depth (mbsl): water depth in meters below sea level
    • Grain Density: grain density associated with the Principal lithology, computed from MAD data
    • Mean MAD Bulk Density: mean MAD bulk density associated with the Principal lithology.
    • Std MAD Bulk Density: standard deviation in the MAD bulk densities for each Principal lithology.
    • Correction Basis: the GRA bulk densities are corrected based on coring tool used. If the RCB was used, then the lithology cored by the RCB is used in determining the size of the correction.
    • Median Difference: The correction that will be applied based on the median difference between the raw GRA bulk density and the colocated MAD bulk density for a specific Correction Basis.
    • GRA Bulk Density Corrected: The corrected GRA bulk density in g/cm^3.
    • Porosity: porosity computed from the corrected GRA bulk densities and grain density.
    • Deviation: difference between "GRA Bulk Density Corrected" and "Mean MAD Bulk Density", which is the deviation the corrected density has from that expected for its Principal lithology.
    • N Deviations: The number of standard deviations by which the observation differs from the expected value (= Deviation/(Std MAD Bulk Density)), which is useful for identifying outliers.
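    The two derived columns defined above can be reproduced directly from the CSV; the sketch below is illustrative Python and assumes the column headers in the file match the names listed here:

    import pandas as pd

    gra = pd.read_csv("GRA_Densities_Corrected_and_Porosities_2023-12-26.csv")

    deviation = gra["GRA Bulk Density Corrected"] - gra["Mean MAD Bulk Density"]
    n_deviations = deviation / gra["Std MAD Bulk Density"]

    # e.g., flag observations more than 3 standard deviations from the lithology mean
    print((n_deviations.abs() > 3).sum(), "potential outliers")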

    GitHub Repository:

  6. Data from: Long-Term Wi-Fi fingerprinting dataset and supporting material

    • zenodo.org
    • producciocientifica.uv.es
    zip
    Updated Apr 11, 2020
    + more versions
    Cite
    Germán Martín Mendoza-Silva; Philipp Richter; Joaquín Torres-Sospedra; Elena Simona Lohan; Joaquín Huerta (2020). Long-Term Wi-Fi fingerprinting dataset and supporting material [Dataset]. http://doi.org/10.5281/zenodo.1066041
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 11, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Germán Martín Mendoza-Silva; Philipp Richter; Joaquín Torres-Sospedra; Elena Simona Lohan; Joaquín Huerta
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    WiFi measurements database for UJI's library and supporting material.

    The measurements were collected by one person using one Android smartphone over 15 months on two floors of the library building of Universitat Jaume I, Spain. The database contains 63,504 Wi-Fi fingerprints, which are organized into datasets. Each dataset is the result of a collection campaign.

    The supporting material includes Matlab® scripts to load and filter the desired data, and provides examples of possible studies that the database may enable. The supporting material also includes the local coordinates of the bookshelves.

    Citation request:

    G.M. Mendoza-Silva, P. Richter, J. Torres-Sospedra, E.S. Lohan, J. Huerta, "Long-Term
    Wi-Fi fingerprinting dataset and supporting material", Zenodo repository, DOI 10.5281/zenodo.1066041.

  7. # Replication code and data for: Tracking green space along streets of world...

    • zenodo.org
    • explore.openaire.eu
    • +1more
    bin
    Updated May 20, 2025
    Cite
    Giacomo Falchetta; T. Ahmed Hammad (2025). # Replication code and data for: Tracking green space along streets of world cities [Dataset]. http://doi.org/10.5281/zenodo.13886667
    Explore at:
    Available download formats: bin
    Dataset updated
    May 20, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giacomo Falchetta; T. Ahmed Hammad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 3, 2024
    Description

    # Replication code and data for: Tracking green space along streets of world cities
    Falchetta, G., & Hammad, A. T. (2025). Tracking green space along streets of world cities. Environmental Research: Infrastructure and Sustainability. https://doi.org/10.1088/2634-4505/add9c4

    To replicate the analysis, the results, and the figures of the paper:

    • Download input data from this Zenodo repository and code from Github https://github.com/giacfalk/urban_green_space_mapping_and_tracking
    • *Optional data extraction steps* (processed output data are already available in the Zenodo repository):
      • Adjust your working directory
      • Run [lines 4-11] of workflow/sourcer.R
      • Run the Javascript scripts written by the string_generator_training.R and string_generator_prediction.R files in Google Earth Engine (https://code.earthengine.google.com) and complete the export to Drive tasks to generate the output .csv files
    • Run workflow/sourcer.R [lines 15-46] to train the ML model and make predictions (including figures and tables replication)

  8. Dataset for Particulate Studies and Obesity

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 21, 2020
    Cite
    Erin J. Stephenson; Alyse Ragauskas; Sridhar Jaligama; JeAnna R. Redd; Jyothi Parvathareddy; Matthew J. Peloquin; Jordy Saravia; Joan C Han; Stephania A. Cormier; Dave Bridges (2020). Dataset for Particulate Studies and Obesity [Dataset]. http://doi.org/10.5281/zenodo.50802
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 21, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Erin J. Stephenson; Alyse Ragauskas; Sridhar Jaligama; JeAnna R. Redd; Jyothi Parvathareddy; Matthew J. Peloquin; Jordy Saravia; Joan C Han; Stephania A. Cormier; Dave Bridges
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Code and Raw Data for Obesity Particulate Treatment study

    This repository contains raw data for studies done by the Bridges Lab and our collaborators on the metabolic effects of in utero exposure to particulates containing environmentally persistent free radicals on obese adult mice. This repository contains the data for the manuscripts detailed below. The tag column indicates the state of the dataset at the indicated time:

    Publication: E. J. Stephenson, A. Ragauskas, S. Jaligama, J. R. Redd, J. Parvathareddy, M. J. Peloquin, J. Saravia, J. Han, S. A. Cormier, D. Bridges, Exposure to environmentally persistent free radicals during gestation lowers energy expenditure and impairs skeletal muscle mitochondrial function in adult mice. (2016). American Journal of Physiology - Endocrinology and Metabolism. doi:10.1152/ajpendo.00521.2015.
    Dataset Tag: ObesityParticulateTreatment-v1.0.0

    Licence

    This ObesityParticulateTreatment data is made available under the Open Data Commons Attribution License: http://opendatacommons.org/licenses/by/1.0.

    Data Files

    Data files are located in the data directory. The raw data for this analysis is located in data/raw and comprises the following files:

    Script Files

    Script files are saved in the scripts folder and were analysed in this order:

    Manuscript

    The manuscript files, including the manuscript, the figures, tables and supplementary data are in the manuscript directory.

  9. Data from: Spatial deconvolution of HER2-positive Breast cancer delineates...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Sep 16, 2021
    + more versions
    Cite
    Alma Andersson (2021). Spatial deconvolution of HER2-positive Breast cancer delineates tumor-associated cell type interactions [Dataset]. http://doi.org/10.5281/zenodo.4751624
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 16, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alma Andersson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Processed count matrices, brightfield HE-images (plain and annotated), spot selection files and meta-data associated with the manuscript "Spatial Deconvolution of HER2-positive Breast Tumors Reveals Novel Intercellular Relationships".

    The contents of the files are:

    count-matrices.zip - processed count matrices formatted as [n_spots]x[n_genes] and named as [PATIENT][SECTION].tsv.gz.

    images.zip - contains two folders HE and annotation, the former holds the HE-images for respective section named as [PATIENT][SECTION].jpg, the latter holds the annotated (by the pathologist) images named by patient (only one section from each patient was annotated).

    spot-selection.zip - contains .tsv files to map array coordinates to pixel coordinates, allowing the spots and their associated expression values to be visualized jointly. Files are named as [PATIENT][SECTION]_selection.tsv.gz

    meta.zip - for all annotated sections, these files are similar to the spot-selection files, but also includes the label of each spot (e.g., breast glands, connective tissue, etc.).

    All files are password protected (encrypted); use the password zNLXkYk3Q9znUseS to decrypt the data.

    code.zip - a clone of the github repository created (2021-05-12).
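    The archives can also be decrypted programmatically; the sketch below uses Python's standard zipfile module with the password given above, and assumes the archives use classic ZipCrypto encryption (if they are AES-encrypted, use a tool such as 7-Zip instead):

    import zipfile

    PASSWORD = b"zNLXkYk3Q9znUseS"  # password stated in the description above

    for archive in ["count-matrices.zip", "images.zip", "spot-selection.zip", "meta.zip", "code.zip"]:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(path="her2_data", pwd=PASSWORD)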

  10. The BORDERSCAPE Project WebGIS Repository

    • zenodo.org
    zip
    Updated May 1, 2024
    Cite
    Oren Siegel; Julian Bogdani; Alberto Urcia; Serena Nicolini; Maria Carmela Gatto (2024). The BORDERSCAPE Project WebGIS Repository [Dataset]. http://doi.org/10.5281/zenodo.11099773
    Explore at:
    Available download formats: zip
    Dataset updated
    May 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Oren Siegel; Julian Bogdani; Alberto Urcia; Serena Nicolini; Maria Carmela Gatto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    # The BORDERSCAPE Project WebGIS Repository: Description of Contents


    Data are stored in an archive named borderscape_webgis_data_v6.0.zip.

    Individual files are:
    - README.md: a formatted text document (Markdown syntax) describing the contents of this repository.
    - sites.geojson: a GeoJSON file with information on each archaeological site included in the webGIS.
    - borderscape_sites.csv: the list of archaeological sites and their attributes from which the sites.geojson file was built for the webGIS, in the open CSV (comma separated values) format.
    - borderscape_archaeological_sites.xlsx: the list of archaeological sites and their attributes. It contains the same information as borderscape_sites.csv as an Excel Workbook (Office Open XML)
    - flooding_nile.geojson: a GeoJSON polygon file with information on Nile flood levels at 86m and 94.5m ASL.
    - borderscape_bibliography.bib: A bibliography with all of the sources abbreviated in the sites.csv file.
    - merged_coronas_freegr.tif: a GeoTIFF of the georeferenced CORONA imagery showing the Lower Nubian landscape prior to the construction of the Aswan High Dam.

    Finally, an archive named borderscape_data.zip contains the following ZIP archives with the spatial data (shapefiles):
    - borderscape_archaeological_sites.zip: a ZIP archive of a shapefile showing all of the archaeological sites and their attributes used in the webGIS.
    - sites_phase1.zip: a ZIP archive of a shapefile showing archaeological sites used in the webGIS from Phase 1.
    - sites_phase2.zip: a ZIP archive of a shapefile showing archaeological sites used in the webGIS from Phase 2.
    - sites_phase3.zip: a ZIP archive of a shapefile showing archaeological sites used in the webGIS from Phase 3.
    - sites_phase4.zip: a ZIP archive of a shapefile showing archaeological sites used in the webGIS from Phase 4.
    - sites_phase5.zip: a ZIP archive of a shapefile showing archaeological sites used in the webGIS from Phase 5.
    - sites_phase6.zip: a ZIP archive of a shapefile showing archaeological sites used in the webGIS from Phase 6.
    - 86m_flooding_contour.zip: a ZIP archive of a shapefile showing flooded areas at 86m ASL.
    - 94.5m_flooding_contour.zip: a ZIP archive of a shapefile showing flooded areas at 94.5m ASL.
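    The GeoJSON layers can be loaded with GeoPandas; this is an illustrative sketch (geopandas is an external dependency, and the files are assumed to have been extracted into the working directory):

    import geopandas as gpd

    sites = gpd.read_file("sites.geojson")
    flooding = gpd.read_file("flooding_nile.geojson")
    print(len(sites), "sites;", len(flooding), "flood polygons; CRS:", sites.crs)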





  11. Data from: Tracking and classifying Amazon fire events in near-real time

    • zenodo.org
    zip
    Updated Jun 14, 2022
    Cite
    Niels Andela (2022). Tracking and classifying Amazon fire events in near-real time [Dataset]. http://doi.org/10.5281/zenodo.6641625
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 14, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Niels Andela
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and code supporting the manuscript "Tracking and classifying Amazon fire events in near-real time" accepted in Science Advances.

  12. Zenodo Code Images

    • kaggle.com
    zip
    Updated Jun 18, 2018
    Cite
    Stanford Research Computing Center (2018). Zenodo Code Images [Dataset]. https://www.kaggle.com/datasets/stanfordcompute/code-images
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Jun 18, 2018
    Dataset authored and provided by
    Stanford Research Computing Center
    Description

    Code Images

    DOI

    Context

    This is a subset of the Zenodo-ML Dinosaur Dataset [Github] that has been converted to small png files and organized in folders by the language so you can jump right in to using machine learning methods that assume image input.

    Content

    Included are .tar.gz files, each named after a file extension; when extracted, each produces a folder of the same name.

     tree -L 1
    .
    ├── c
    ├── cc
    ├── cpp
    ├── cs
    ├── css
    ├── csv
    ├── cxx
    ├── data
    ├── f90
    ├── go
    ├── html
    ├── java
    ├── js
    ├── json
    ├── m
    ├── map
    ├── md
    ├── txt
    └── xml
    

    And we can peek inside one of the (somewhat smaller) members of the set to see that the subfolders are Zenodo identifiers. A Zenodo identifier corresponds to a single GitHub repository, so the png files produced are chunks of code of the extension type from a particular repository.

    $ tree map -L 1
    map
    ├── 1001104
    ├── 1001659
    ├── 1001793
    ├── 1008839
    ├── 1009700
    ├── 1033697
    ├── 1034342
    ...
    ├── 836482
    ├── 838329
    ├── 838961
    ├── 840877
    ├── 840881
    ├── 844050
    ├── 845960
    ├── 848163
    ├── 888395
    ├── 891478
    └── 893858
    
    154 directories, 0 files
    

    Within each folder (zenodo id) the files are prefixed by the zenodo id, followed by the index into the original image set array that is provided with the full dinosaur dataset archive.

    $ tree m/891531/ -L 1
    m/891531/
    ├── 891531_0.png
    ├── 891531_10.png
    ├── 891531_11.png
    ├── 891531_12.png
    ├── 891531_13.png
    ├── 891531_14.png
    ├── 891531_15.png
    ├── 891531_16.png
    ├── 891531_17.png
    ├── 891531_18.png
    ├── 891531_19.png
    ├── 891531_1.png
    ├── 891531_20.png
    ├── 891531_21.png
    ├── 891531_22.png
    ├── 891531_23.png
    ├── 891531_24.png
    ├── 891531_25.png
    ├── 891531_26.png
    ├── 891531_27.png
    ├── 891531_28.png
    ├── 891531_29.png
    ├── 891531_2.png
    ├── 891531_30.png
    ├── 891531_3.png
    ├── 891531_4.png
    ├── 891531_5.png
    ├── 891531_6.png
    ├── 891531_7.png
    ├── 891531_8.png
    └── 891531_9.png
    
    0 directories, 31 files
    

    So what's the difference?

    The difference is that these files are organized by extension type, and provided as actual png images. The original data is provided as numpy data frames, and is organized by zenodo ID. Both are useful for different things - this particular version is cool because we can actually see what a code image looks like.

    How many images total?

    We can count the number of total images:

    find "." -type f -name *.png | wc -l
    3,026,993
    

    Dataset Curation

    The script to create the dataset is provided here. Essentially, we start with the top extensions as identified by this work (excluding actual images files) and then write each 80x80 image to an actual png image, organizing by extension then zenodo id (as shown above).

    Saving the Image

    I tested a few methods to write the single channel 80x80 data frames as png images, and wound up liking cv2's imwrite function because it would save and then load the exact same content.

    import cv2
    cv2.imwrite(image_path, image)
    

    Loading the Image

    Given the above, it's pretty easy to load an image! Here is an example using imageio, followed by the older scipy approach (which is deprecated in newer Python).

    image_path = '/tmp/data1/data/csv/1009185/1009185_0.png'
    from imageio import imread
    
    image = imread(image_path)
    array([[116, 105, 109, ..., 32, 32, 32],
        [ 48, 44, 48, ..., 32, 32, 32],
        [ 48, 46, 49, ..., 32, 32, 32],
        ..., 
        [ 32, 32, 32, ..., 32, 32, 32],
        [ 32, 32, 32, ..., 32, 32, 32],
        [ 32, 32, 32, ..., 32, 32, 32]], dtype=uint8)
    
    
    image.shape
    (80,80)
    
    
    # Deprecated
    from scipy import misc
    misc.imread(image_path)
    
    Image([[116, 105, 109, ..., 32, 32, 32],
        [ 48, 44, 48, ..., 32, 32, 32],
        [ 48, 46, 49, ..., 32, 32, 32],
        ..., 
        [ 32, 32, 32, ..., 32, 32, 32],
        [ 32, 32, 32, ..., 32, 32, 32],
        [ 32, 32, 32, ..., 32, 32, 32]], dtype=uint8)
    

    Remember that the values in the data are characters that have been converted to ordinal. Can you guess what 32 is?

    ord(' ')
    32
    
    # And thus if you wanted to convert it back...
    chr(32)
    

    So how t...

  13. Data from: Data for climate mitigation scenarios with persistent COVID-19...

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jun 15, 2023
    Cite
    Jarmo Kikstra; Adriano Vinca; Francesco Lovat; Benigna Boza-Kiss; Bas van Ruijven; Charlie Wilson; Joeri Rogelj; Behnam Zakeri; Oliver Fricko; Keywan Riahi (2023). Data for climate mitigation scenarios with persistent COVID-19 related energy demand changes [Dataset]. http://doi.org/10.5281/zenodo.5211169
    Explore at:
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jarmo Kikstra; Adriano Vinca; Francesco Lovat; Benigna Boza-Kiss; Bas van Ruijven; Charlie Wilson; Joeri Rogelj; Behnam Zakeri; Oliver Fricko; Keywan Riahi
    Description

    This repository contains data for the main text figures plus some supplementary figures in the article:
    Kikstra et al 2021 Nat. Energy. DOI: 10.1038/s41560-021-00904-8

    This dataset should be cited as: Kikstra et al. (2021). Data for climate mitigation scenarios with persistent COVID-19 related energy demand changes. DOI: 10.5281/zenodo.5211169

    In order to reproduce the figures, one needs to use the script that is available on GitHub at:
    https://github.com/iiasa/covid-energy-demand-scenarios

    The most accessible way of exploring the scenario data behind this article is to go to https://data.ece.iiasa.ac.at/engage/#/workspaces/60.
    This links to a web tool hosted by the International Institute for Applied Systems Analysis (IIASA) which provides access to a database of these and more variables of interest, defined for each scenario at the level of MESSAGE regions, with a few example workspaces available within the ENGAGE Scenario Explorer.
    The Scenario Explorer is a versatile open access tool to browse, visualize and download data and results. Users can freely create a private workspace where customized plots can be saved and shared.
    For tutorials on how to use the Scenario Explorer, please visit https://software.ece.iiasa.ac.at/ixmp-server/tutorials.html.

    The scenarios that were used for the IPCC Special Report on 1.5C warming (SR1.5) have been made available at https://data.ece.iiasa.ac.at/iamc-1.5c-explorer/.

    The data is available for download at the ENGAGE Scenario Explorer. The license permits use of the scenario ensemble for scientific research and science communication, but restricts redistribution of substantial parts of the data. Please refer to the FAQ and legal code for more information.

  14. Data from: Global prediction of extreme floods in ungauged watersheds

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jan 22, 2025
    Cite
    Grey Nearing (2025). Global prediction of extreme floods in ungauged watersheds [Dataset]. http://doi.org/10.5281/zenodo.10397664
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 22, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Grey Nearing
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset accompanies the following article:

    Nearing, Grey, et al. "Global prediction of extreme floods in ungauged watersheds." Nature (2024).

    The code repository associated with this data is available here: https://github.com/google-research-datasets/global_streamflow_model_paper/. It is highly recommended to use the associated code repository to process this data.

    The `model_data.tgz` archive includes reforecasts from the Google model and reanalyses from the GloFAS model. Google model outputs are in units of [mm/day] and GloFAS outputs are in units of [m3/s]. Model outputs are daily and timestamps are right-labeled, meaning that model outputs labeled, e.g., 01/01/2020 correspond to streamflow predictions for the day of 12/31/2019.
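    Because the timestamps are right-labeled, analyses that expect prediction-day (left-labeled) timestamps need to shift the index back by one day; the sketch below is illustrative Python, and the file name and column layout are assumptions rather than the actual contents of model_data.tgz:

    import pandas as pd

    # hypothetical daily reforecast table with a datetime index
    flows = pd.read_csv("google_model_reforecast.csv", index_col=0, parse_dates=True)

    # re-label each row with the day the prediction is actually for
    flows_left_labeled = flows.copy()
    flows_left_labeled.index = flows_left_labeled.index - pd.Timedelta(days=1)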

  15. ESA WorldCereal 10 m 2021 v100

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Aug 5, 2024
    Cite
    Kristof Van Tricht; Jeroen Degerickx; Sven Gilliams; Daniele Zanaga; Mickaël Savinaud; Marjorie Battude; Romain Buguet de Chargère; Guillaume Dubreule; Alex Grosu; Joost Brombacher; Henk Pelgrum; Myroslava Lesiv; Juan Carlos Laso Bayas; Santosh Karanam; Steffen Fritz; Inbal Becker-Reshef; Belén Franch; Bertran Mollà Bononad; Juanma Cintas; Hendrik Boogaard; Arun Kumar Pratihast; Lubos Kucera; Zoltan Szantoi (2024). ESA WorldCereal 10 m 2021 v100 [Dataset]. http://doi.org/10.5281/zenodo.7875105
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Aug 5, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kristof Van Tricht; Jeroen Degerickx; Sven Gilliams; Daniele Zanaga; Mickaël Savinaud; Marjorie Battude; Romain Buguet de Chargère; Guillaume Dubreule; Alex Grosu; Joost Brombacher; Henk Pelgrum; Myroslava Lesiv; Juan Carlos Laso Bayas; Santosh Karanam; Steffen Fritz; Inbal Becker-Reshef; Belén Franch; Bertran Mollà Bononad; Juanma Cintas; Hendrik Boogaard; Arun Kumar Pratihast; Lubos Kucera; Zoltan Szantoi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ESA WorldCereal 2021 products v100

    The European Space Agency (ESA) WorldCereal 10m 2021 product suite consists of global-scale annual and seasonal crop maps and (where applicable) their related confidence. Every file in this repository contains up to 106 agro-ecological zone (AEZ) products, which were all processed with respect to their own regional seasonality and should be considered as independent products.

    Naming convention of the ZIP files is as follows:

    WorldCereal_{year}_{season}_{product}_{classification|confidence}.zip

    The actual AEZ-based GeoTIFF files inside each ZIP are named according to the following convention:

    {AEZ_id}_{season}_{product}_{startdate}_{enddate}_{classification|confidence}.tif
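    For scripted access, the GeoTIFF naming convention can be parsed into its parts; the regex below is an illustrative Python sketch (field formats such as the date layout are assumptions, and the example file name is made up):

    import re

    pattern = re.compile(
        r"(?P<AEZ_id>[^_]+)_(?P<season>[^_]+)_(?P<product>[^_]+)_"
        r"(?P<startdate>[^_]+)_(?P<enddate>[^_]+)_"
        r"(?P<layer>classification|confidence)\.tif$"
    )

    example = "46172_tc-wintercereals_wintercereals_2021-01-01_2021-12-31_classification.tif"
    print(pattern.match(example).groupdict())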

    The seasons are defined in Table 1. Note that cereals as described by WorldCereal include wheat, barley and rye, which belong to the Triticeae tribe. Next to the actual WorldCereal products, this repository contains the files "WorldCereal_AEZ.geojson" that contains the AEZ description and outline, as well as "QGIS_stylefiles.zip" which contains QGIS style files (.qml) for product visualization purposes.

    Table 1. Season definitions:
    - tc-annual: A one-year cycle being defined in a region by the end of the last considered growing season
    - tc-wintercereals: The main cereals season defined in a region
    - tc-springcereals: Optional spring cereals season, only defined in certain AEZ
    - tc-maize-main: The main maize season defined in a region
    - tc-maize-second: Optional second maize season, only defined in certain AEZ

    Note: AEZs for which no irrigation product is available were not processed because of the unavailability of thermal Landsat data.

    A scientific paper describing the WorldCereal products and the methodology behind them is available through the link below:

    Van Tricht, K., Degerickx, J., Gilliams, S., Zanaga, D., Battude, M., Grosu, A., Brombacher, J., Lesiv, M., Bayas, J. C. L., Karanam, S., Fritz, S., Becker-Reshef, I., Franch, B., Mollà-Bononad, B., Boogaard, H., Pratihast, A. K., Koetz, B., and Szantoi, Z.: WorldCereal: a dynamic open-source system for global-scale, seasonal, and reproducible crop and irrigation mapping, Earth Syst. Sci. Data, 15, 5491–5515, https://doi.org/10.5194/essd-15-5491-2023, 2023.

    This work was supported by the European Space Agency under contract N°4000130569/20/I-NB.

  16. Simulated exome-sequencing data for a family study of lymphoid cancer

    • zenodo.org
    • data.niaid.nih.gov
    bin, txt, zip
    Updated Jul 10, 2024
    Cite
    Jinko Graham; Nirodha Epasinghege Dona (2024). Simulated exome-sequencing data for a family study of lymphoid cancer [Dataset]. http://doi.org/10.5281/zenodo.12696267
    Explore at:
    Available download formats: bin, zip, txt
    Dataset updated
    Jul 10, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jinko Graham; Nirodha Epasinghege Dona
    License

    https://www.gnu.org/licenses/agpl.txt

    Time period covered
    Jul 8, 2024
    Description

    This repository contains all the data files for a simulated exome-sequencing study of 150 families, ascertained to contain at least four members affected with lymphoid cancer. Please note that previous versions of this repository omitted a key file linking the genotypes of individuals to their family and individual IDs; this file, geno_key.txt, is now included. All other files remain the same as in previous versions.

    The simulated data can be found in the files section below. The files are:

    1. SLiM_output.txt - contains the SLiM-simulated, exome-wide, SNV data generated under an American-admixture demographic model, for the American-admixed sub-population only.
    2. SLiM_output_chr8&9.txt - contains the SLiM-simulated data above for all source populations as well as the American-admixed sub-population, but only for chromosomes 8 and 9.
    3. sample_info.txt - contains pedigree information of all the disease-affected individuals and individuals connecting them along a line of descent, for all 150 ascertained pedigrees.
    4. Genotypes.zip - a zipfile that contains 22 text files of genotypes for each chromosome. The genotypes are for simulated single-nucleotide variants on the exome and are in gene-dosage format.
    5. geno_key.txt – a plain-text file that links the genotyped individuals to their family and individual IDs.
    6. SNVmaps.zip - a zipfile that contains 22 text files giving the single-nucleotide variant information for each chromosome.
    7. familial_cRV.txt - contains the familial causal rare variants for all 150 ascertained pedigrees.
    8. study_peds.txt - contains the 150 pedigrees ascertained to contain four or more relatives affected with lymphoid cancer.
    9. PLINKfiles.zip - a zipfile that contains PLINK .fam, .bim and .bed files for all 22 of the chromosomes.

    All the scripts used to generate these data can be found in the GitHub repository archived at https://zenodo.org/records/12694914

    We have also uploaded one intermediate .Rdata file, Chromwide.Rdata, to save the user substantial time when running the associated RMarkdown script for the simulation. We recommend loading Chromwide.Rdata into your R work-space rather than generating it from scratch.
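    To work with the genotypes in Python, the per-chromosome files can be joined with geno_key.txt; the sketch below is illustrative only, and the file names, delimiter and table layout are assumptions to be checked against the actual upload:

    import pandas as pd

    geno = pd.read_csv("Genotypes/chr22.txt", sep=r"\s+")   # hypothetical per-chromosome file name
    key = pd.read_csv("geno_key.txt", sep=r"\s+")           # links genotypes to family and individual IDs
    print(geno.shape)
    print(key.head())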

  17. Processed data for MethylBoostER: an XGBoost model to classify kidney cancer...

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Apr 16, 2022
    Cite
    Sabrina H Rossi; Charles E Massie; Izzy Newsham; Shamith A Samarajiwa (2022). Processed data for MethylBoostER: an XGBoost model to classify kidney cancer subtypes [Dataset]. http://doi.org/10.5281/zenodo.6463893
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 16, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sabrina H Rossi; Charles E Massie; Izzy Newsham; Shamith A Samarajiwa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a repository containing processed data for MethylBoostER, an XGBoost model that classifies kidney cancer subtypes. The open-source code can be found here: https://github.com/ss-lab-cancerunit/MethylBoostER.

  18. mroeck/carbenmats-buildings: Pre-release

    • zenodo.org
    zip
    Updated Sep 26, 2023
    Cite
    Martin RÖCK (2023). mroeck/carbenmats-buildings: Pre-release [Dataset]. http://doi.org/10.5281/zenodo.8363895
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 26, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Martin RÖCK
    Description

    A Global Database on Whole Life Carbon, Energy and Material Intensity of Buildings (CarbEnMats-Buildings)

    Abstract

    Globally, interest in understanding the life cycle related greenhouse gas (GHG) emissions of buildings is increasing. Robust data is required for benchmarking and analysis of parameters driving resource use and whole life carbon (WLC) emissions. However, open datasets combining information on energy and material use as well as whole life carbon emissions remain largely unavailable – until now.

    We present a global database on whole life carbon, energy use, and material intensity of buildings. It contains data on more than 1,200 building case studies and includes over 300 attributes addressing context and site, building design, assessment methods, energy and material use, as well as WLC emissions across different life cycle stages. The data was collected through various meta-studies, using a dedicated data collection template (DCT) and processing scripts (Python Jupyter Notebooks), all of which are shared alongside this data descriptor.

    This dataset is valuable for industrial ecology and sustainable construction research and will help inform decision-making in the building industry as well as the climate policy context.

    Background & Summary

    The need for reducing greenhouse gas (GHG) emissions across Europe requires defining and implementing a performance system for both operational and embodied carbon at the building level that provides relevant guidance for policymakers and the building industry. So-called whole life carbon (WLC) of buildings is gaining increasing attention among decision-makers concerned with climate and industrial policy, as well as building procurement, design, and operation. However, most open building datasets published thus far have focused on buildings' operational energy consumption and related parameters 1,2,2–4. Recent years furthermore brought large-scale datasets on building geometry (footprint, height) 5,6 as well as the publication of some datasets on building construction systems and material intensity 7,8. Heeren and Fishman's database seed on material intensity (MI) of buildings 7, an essential reference to this work, was a first step towards an open data repository on material-related environmental impacts of buildings. In their 2019 descriptor, the authors present data on the material coefficients of more than 300 building cases intended for use in studies applying material flow analysis (MFA), input-output (IO) or life cycle assessment (LCA) methods. Guven et al. 8 elaborated on this effort by publishing a construction classification system database for understanding resource use in building construction. However, thus far, there is a lack of publicly available data that combines material composition, energy use and also considers life cycle-related environmental impacts, such as life cycle-related GHG emissions, also referred to as a building's whole life carbon.

    The Global Database on Whole Life Carbon, Energy Use, and Material Intensity of Buildings (CarbEnMats-Buildings) published alongside this descriptor provides information on more than 1,200 buildings worldwide. The dataset includes attributes on geographical context and site, main building design characteristics, LCA-based assessment methods, as well as information on energy and material use, and related life cycle greenhouse gas (GHG) emissions, commonly referred to as whole life carbon (WLC), with a focus on embodied carbon (EC) emissions. The dataset compiles data obtained through a systematic review of the scientific literature as well as systematic data collection from both literature sources and industry partners. By applying a uniform data collection template (DCT) and related automated procedures for systematic data collection and compilation, we facilitate the processing, analysis and visualization along predefined categories and attributes, and support the consistency of data types and units. The descriptor includes specifications related to the DCT spreadsheet form used for obtaining these data as well as explanations of the data processing and feature engineering steps undertaken to clean and harmonise the data records. The validation focuses on describing the composition of the dataset and values observed for attributes related to whole life carbon, energy and material intensity.

    The data published with this descriptor constitute the largest open compilation of whole life carbon emissions, energy use and material intensity of buildings to date. This open dataset is expected to be valuable for research applications in the context of MFA, IO and LCA modelling. It also offers a unique data source for benchmarking the whole life carbon, energy use and material intensity of buildings to inform policy and decision-making in the context of the decarbonization of building construction and operation as well as commercial real estate in Europe and beyond.

    Files

    All files related to this descriptor are available on a public GitHub repository and related release via Zenodo (https://doi.org/10.5281/zenodo.8363895). The repository contains the following files:

    • README.md is a text file with instructions on how to use the files and documents.
    • CarbEnMats_attributes.XLSX is a table with the complete attribute description.
    • CarbEnMats_materials.XLSX is the table of material options and mappings.
    • CarbEnMats_dataset.XLSX is the building dataset in MS Excel format.
    • CarbEnMats_dataset.txt is the building dataset in tab-delimited TXT format (a minimal loading sketch is shown below).
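    For orientation, the following minimal sketch loads the tab-delimited export with pandas. It assumes only the file names listed above; the expected dimensions are taken from this descriptor (more than 1,200 building cases, over 300 attributes).

    # Minimal loading sketch (Python/pandas); assumes only the file names listed above.
    import pandas as pd

    # The TXT export is tab-delimited; the XLSX export contains the same records.
    df = pd.read_csv("CarbEnMats_dataset.txt", sep="\t", low_memory=False)

    print(df.shape)          # expected: more than 1,200 rows (building cases), 300+ columns
    print(df.columns[:10])   # inspect the first few attribute names
    print(df.head())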

    Further information

    Please consult the related data descriptor article (linked at the top) for further information, e.g.:

    • Methods (Data collection; data processing)
    • Data records (Files; Sources; Attributes)
    • Technical validation (Data overview; Data consistency)
    • Usage Notes (Attribute priority; Scope summary, Missing information)

    Code availability (LICENSE)

    The dataset, the data collection template, and the code used for processing, harmonization and visualization are published under the GNU General Public License v3.0. The GNU General Public License is a free, copyleft license for software and other kinds of works. We encourage you to review, reuse, and refine the data and scripts, and to share alike.

    Contributing

    The CarbEnMats-Buildings database is the result of a highly collaborative effort and needs your active contributions to further improve and grow the open building data landscape. Reach out to the lead author (email, linkedin) if you are interested in contributing your data or time.

    Cite as

    When referring to this work, please cite both the descriptor and the dataset:

    • Descriptor: RÖCK, Martin, SORENSEN, Andreas, BALOUKTSI, Maria, RUSCHI MENDES SAADE, Marcella, RASMUSSEN, Freja Nygaard, BIRGISDOTTIR, Harpa, FRISCHKNECHT, Rolf, LÜTZKENDORF, Thomas, HOXHA, Endrit, HABERT, Guillaume, SATOLA, Daniel, TRUGER, Barbara, TOZAN, Buket, KUITTINEN, Matti, ALAUX, Nicolas, ALLACKER, Karen, & PASSER, Alexander. (2023). A Global Database on Whole Life Carbon, Energy and Material Intensity of Buildings (CarbEnMats-Buildings) [Preprint]. Zenodo. https://doi.org/10.5281/zenodo.8378939
    • Dataset: Martin Röck. (2023). mroeck/carbenmats-buildings: Pre-release (0.1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8363895
  19. In vitro genotoxicity testing using γH2AX biomarker, Microscopy Dataset

    • zenodo.org
    • data.niaid.nih.gov
    txt, zip
    Updated Feb 14, 2024
    Cite
    Bára Křížkovská; Bára Křížkovská; Eva Jablonská; Eva Jablonská; Martin Schätz; Martin Schätz (2024). In vitro genotoxicity testing using γH2AX biomarker, Microscopy Dataset [Dataset]. http://doi.org/10.5281/zenodo.7673199
    Explore at:
    Available download formats: zip, txt
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bára Křížkovská; Bára Křížkovská; Eva Jablonská; Eva Jablonská; Martin Schätz; Martin Schätz
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset was used in and is supplementary to the paper "In vitro genotoxicity testing using γH2AX biomarker, microscopy and automatic image analysis in ImageJ - a pilot study with valinomycin". It contains both raw single-channel images and the numerical results obtained with the bioimage analysis and evaluation workflow available through the GitHub repository: https://github.com/martinschatz-cz/genotoxicity-bia.

    Naming Convention for images
    ChannelName_YYYYMMD_Well_PossitionInWell_AcqRun.tiff

    Naming Convention for results
    Well_AllResults_YYYY-MM-DD_Results.csv

    The CSV files are ','-separated and are named automatically by the analysis script (a sketch for parsing the image naming convention is shown below).
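    For illustration, the sketch below parses filenames that follow the image naming convention above. The field names mirror the convention string, the date field is matched leniently (the convention is written as 'YYYYMMD'), and the example filename is hypothetical rather than an actual file from the dataset.

    # Sketch only: parse image filenames of the form
    # ChannelName_YYYYMMD_Well_PossitionInWell_AcqRun.tiff
    import re

    PATTERN = re.compile(
        r"^(?P<channel>[^_]+)_"
        r"(?P<date>\d{7,8})_"          # lenient: 'YYYYMMD' in the convention, possibly YYYYMMDD
        r"(?P<well>[^_]+)_"
        r"(?P<position_in_well>[^_]+)_"
        r"(?P<acq_run>[^_.]+)\.tiff?$"
    )

    def parse_image_name(filename: str) -> dict:
        """Return the metadata fields encoded in an image filename."""
        match = PATTERN.match(filename)
        if match is None:
            raise ValueError(f"Unexpected filename: {filename}")
        return match.groupdict()

    # Hypothetical example, not an actual file from the dataset:
    print(parse_image_name("DAPI_20230214_B2_P003_A01.tiff"))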


    Folder Structure

    • Images (1069 files, as Images.zip)

      • 4H - all images for all wells

      • 24H - all images for all wells

    • Results (36 files)

      • Results_4H (as Results_4h.zip)

      • Results_24H (as Results_24h.zip)

    Measurement Settings

    • Manufacturer and model of microscope: Olympus IX83 P2ZF

    • Objective lens magnification, NA: 10x Olympus IX3 Nosepiece, LensNA=0.3

    • Excitation filters (mounted in the light source)

      • Violet: 395/25nm LED module 1, DAPI

      • Green: 555/28nm LED module 5, Cy3

    • Quad band filter set for DAPI/FITC/Cy3/Cy5

    • Quad band polychroic mirror (mounted in the filter turret):

      • BP 411-454nm

      • BP 495-536nm

      • BP 577-617nm

      • BP 655-810nm

    • Emission filters (mounted in the fast emission filter wheel, in front of the camera):

      • DAPI: BP 421-445nm

      • Cy3: BP 581-619nm

    • Illumination light source: Lumencor Spectra X Lamp

    • Pixel size: 650nm x 650nm

    • Camera manufacturer and model: Hamamatsu ORCA-Flash4.0

    • Software program(s) and version: OLYMPUS cellSens Dimension 3.2 (Build 23706)

    • Image acquisition settings

      • exposure: 500 ms

      • gain: 0

      • binning: 4 x 4

    • Experiment manager: ZDC + autofocus, two channels: DAPI and Cy3

    Image Processing and Analysis
    The data analysis workflow consists of several stages, each of which was executed by a specific script. Firstly, the raw data were manually cleaned and automatically sorted and organized using sort_wells.ijm in FIJI. Secondly, image analysis was performed using Process_WFolder_macro_v1.ijm in FIJI, which processed the image data and extracted the relevant features. Finally, the results were further processed using SF_dataVis_and_statistics_mean_XYh.ipynb in Python (Jupyter Notebook), which generated the final output in the form of a CSV file.
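    The published scripts above are the authoritative workflow. As a rough downstream illustration only, the sketch below concatenates the per-well Well_AllResults_*.csv tables and computes per-well means of the numeric columns; the folder name follows the structure listed earlier, and no assumptions are made about the specific measurement columns.

    # Illustrative sketch, not part of the published workflow: combine the
    # comma-separated Well_AllResults_YYYY-MM-DD_Results.csv files per condition
    # and summarise each well by the mean of its numeric measurement columns.
    from pathlib import Path
    import pandas as pd

    results_dir = Path("Results/Results_4H")   # or "Results/Results_24H"

    frames = []
    for csv_path in sorted(results_dir.glob("*_Results.csv")):
        frame = pd.read_csv(csv_path)                  # files are ','-separated
        frame["well"] = csv_path.name.split("_")[0]    # well ID is the first filename field
        frames.append(frame)

    all_results = pd.concat(frames, ignore_index=True)
    summary = all_results.groupby("well").mean(numeric_only=True)
    print(summary)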

    In this repository, you can access the resulting CSV file, which contains the final results of our analysis. Additionally, we have provided the scripts used to process the data, which are available on our GitHub repository (https://github.com/martinschatz-cz/genotoxicity-bia), where you will also find instructions on how to set up a local JupyterHub for the Python scripts. These scripts are accompanied by a short manual that provides an overview of the data analysis workflow and helps users navigate through the code. By making our scripts available, we hope to facilitate transparency and reproducibility of our research. If you encounter any issues, please report them through the GitHub repository: https://github.com/martinschatz-cz/genotoxicity-bia.

    We believe that our work can be useful for other researchers and analysts who are interested in studying similar datasets. We invite you to explore the contents of this repository and use the data and scripts provided here to further your research.

    Cell lines and culture conditions
    Human cervical adenocarcinoma (HeLa) and Chinese hamster ovary (CHO-K1) cell lines were obtained from the American Type Culture Collection (ATCC). The HeLa cells were grown in MEM supplemented with 10 % FBS and NEAA. The CHO-K1 cells were cultivated in DMEM supplemented with L-proline (final concentration 35 mg/l). The cells were incubated in a humidified atmosphere of 5% CO2 at 37 °C.

    Direct measurement of DNA DSBs
    The cells were seeded at a concentration of 0.5 × 10⁵ cells/ml into a 96-well plate (VWR, 10062-900). After 24 h of incubation the cells were rinsed with phosphate-buffered saline (PBS), and medium with reduced FBS content (5 %) was added. Valinomycin was dissolved in DMSO and added to the cells at two final concentrations (30 and 15 𝞵M). After 4 h/24 h of incubation, visualization was performed following the protocol of the HCS DNA Damage Kit. The cells were fixed with 4% paraformaldehyde solution for 15 min at room temperature. The cells were rinsed once with PBS and permeabilization was performed using Triton® X-100 solution by incubation for 15 min at room temperature. The wells were rinsed once with PBS and the plate was blocked with 1% BSA blocking solution. After 1 h of incubation at room temperature the blocking solution was removed and 100 𝞵l of pH2AX mouse monoclonal antibody solution (1:1000 in BSA) was pipetted into each well and incubated for 1 h at room temperature. After rinsing three times with PBS, 100 𝞵l of Alexa Fluor® 555 goat anti-mouse IgG (H+L; 1:2000) and Hoechst 33342 (1:6000) solution was added and incubated for 1 h at room temperature protected from light. After the incubation the wells were rinsed three times with PBS. The plate was stored with 100 𝞵l per well in the refrigerator (4 °C) until the image analysis was performed.

  20. MD simulations of phosphorylated peptides (GGXXGG)

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jun 12, 2024
    + more versions
    Cite
    David Bickel; David Bickel; Wim Vranken; Wim Vranken (2024). MD simulations of phosphorylated peptides (GGXXGG) [Dataset]. http://doi.org/10.5281/zenodo.10518873
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    David Bickel; David Bickel; Wim Vranken; Wim Vranken
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains MD simulations and associated analyses for a series of short peptides containing phosphorylated residues. It is one of five repositories associated with the following research article:

    Bickel,D., and Vranken,W. (2024) Effects of Phosphorylation on Protein Backbone Dynamics and Conformational Preferences. J. Chem. Theory Comput. https://doi.org/10.1021/acs.jctc.4c00206.

    The full list of the related repositories is given here:

    1. Pentapeptide simulations: 10.5281/zenodo.10517328
    2. Hexapeptides simulations: 10.5281/zenodo.10518872
    3. Heptapeptides simulations: 10.5281/zenodo.10518971
    4. Octapeptides simulations: 10.5281/zenodo.10518993
    5. Nonapeptides simulations: 10.5281/zenodo.10519033