https://choosealicense.com/licenses/unknown/
varunr14/summarize dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This archive contains the summarization corpus generated as a result of the filtering stages (trials-final.csv), the ROUGE scores for the generated summaries (rouge-results-parsed.csv), the data and results of the human evaluation (evaluation/ subfolder), and the code used to generate the corpus (extract.r, filter.r, and determine_similarity_threshold.r). The summaries were generated using the summarize_all.py script.
https://spdx.org/licenses/CC0-1.0.html
Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing which can require extensive optimizations to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence PCR amplicons derived from cDNA templates tagged with universal molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR and the use of UMI allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing to produce a highly accurate consensus sequence from each template. Handling of the large datasets produced from SMRT-UMI sequencing was facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline), that automatically filters and parses reads by sample, identifies and discards reads with UMIs likely created from PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination or early cycle PCR errors, resulting in highly accurate sequence datasets. The optimized SMRT-UMI sequencing method presented here represents a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus (HIV) quasispecies.
Methods
This serves as an overview of the analysis performed on the PacBio sequence data that is summarized in Analysis Flowchart.pdf and was used as primary data for the paper by Westfall et al., "Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies".
Five different PacBio sequencing datasets were used for this analysis: M027, M2199, M1567, M004, and M005
For the datasets which were indexed (M027, M2199), CCS reads from PacBio sequencing files and the chunked_demux_config files were used as input for the chunked_demux pipeline. Each config file lists the different Index primers added during PCR to each sample. The pipeline produces one fastq file for each Index primer combination in the config. For example, in dataset M027 there were 3–4 samples using each Index combination. The fastq files from each demultiplexed read set were moved to the sUMI_dUMI_comparison pipeline fastq folder for further demultiplexing by sample and consensus generation with that pipeline. More information about the chunked_demux pipeline can be found in the README.md file on GitHub.
The demultiplexed read collections from the chunked_demux pipeline, or the CCS read files from the datasets which were not indexed (M1567, M004, M005), were each used as input for the sUMI_dUMI_comparison pipeline along with each dataset's config file. Each config file contains the primer sequences for each sample (including the sample ID block in the cDNA primer); the pipeline further demultiplexes the reads by sample and prepares data tables summarizing all of the UMI sequences and counts for each family (tagged.tar.gz), as well as consensus sequences from each sUMI and rank 1 dUMI family (consensus.tar.gz). More information about the sUMI_dUMI_comparison pipeline can be found in the paper and the README.md file on GitHub.
The consensus.tar.gz and tagged.tar.gz files were moved from the sUMI_dUMI_comparison pipeline directory on the server to the Pipeline_Outputs folder in this analysis directory for each dataset and appended with the dataset name (e.g. consensus_M027.tar.gz). Also in this analysis directory is a Sample_Info_Table.csv containing information about how each of the samples was prepared, such as purification methods and number of PCRs. There are also three other folders: Sequence_Analysis, Identifying_Recombinant_Reads, and Figures. Each contains an .Rmd file of the same name, which is used to collect, summarize, and analyze the data. All of these collections of code were written and executed in RStudio to track notes and summarize results.
Sequence_Analysis.Rmd has instructions to decompress all of the consensus.tar.gz files, combine them, and create two fasta files: one with all sUMI sequences and one with all dUMI sequences. Using these as input, two data tables were created that summarize all sequences and read counts for each sample that pass various criteria. These are used to help create Table 2 and serve as input for Identifying_Recombinant_Reads.Rmd and Figures.Rmd. Next, two fasta files containing all of the rank 1 dUMI sequences and the matching sUMI sequences were created. These were used as input for the python script compare_seqs.py, which identifies any matched sequences that differ between the sUMI and dUMI read collections. This information was also used to help create Table 2. Finally, to populate the table with the number of sequences and bases in each sequence subset of interest, different sequence collections were saved and viewed in the Geneious program.
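For orientation, the decompress-and-combine step can be sketched in R roughly as follows. This is a minimal illustration, not the authors' exact Rmd code; the file patterns, directory names, and use of the Bioconductor package Biostrings are assumptions.

```r
# Minimal sketch of the decompress-and-combine step; paths and file
# patterns are illustrative assumptions, not the authors' actual layout.
library(Biostrings)

# Unpack every consensus archive into a working directory
for (tarball in list.files("Pipeline_Outputs", pattern = "^consensus_.*\\.tar\\.gz$",
                           full.names = TRUE)) {
  untar(tarball, exdir = "consensus_unpacked")
}

# Combine all sUMI fasta files into a single fasta (repeat analogously for dUMI)
sumi_files <- list.files("consensus_unpacked", pattern = "sUMI.*\\.fasta$",
                         recursive = TRUE, full.names = TRUE)
all_sumi <- do.call(c, lapply(sumi_files, readDNAStringSet))
writeXStringSet(all_sumi, "all_sUMI.fasta")
```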
To investigate the cause of sequences where the sUMI and dUMI sequences do not match, tagged.tar.gz was decompressed, and for each family with discordant sUMI and dUMI sequences the reads from the UMI1_keeping directory were aligned using Geneious. Reads from dUMI families failing the 0.7 filter were also aligned in Geneious. The uncompressed tagged folder was then removed to save space. These read collections contain all of the reads in a UMI1 family and still include the UMI2 sequence. By examining the alignment, and specifically the UMI2 sequences, the site of the discordance and its cause were identified for each family as described in the paper. These alignments were saved as "Sequence Alignments.geneious". The counts of how many families were the result of PCR recombination were used in the body of the paper.
Using Identifying_Recombinant_Reads.Rmd, the dUMI_ranked.csv file from each sample was extracted from all of the tagged.tar.gz files, then combined and used as input to create a single dataset containing all UMI information from all samples. This file, dUMI_df.csv, was used as input for Figures.Rmd.
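A minimal sketch of this combining step in R, with paths and the sample-labeling convention as assumptions rather than the authors' actual code:

```r
# Gather every per-sample dUMI_ranked.csv (after the tagged archives have
# been unpacked) and stack them into one table; the directory-name-based
# sample label is an assumed convention.
ranked_files <- list.files("tagged_unpacked", pattern = "dUMI_ranked\\.csv$",
                           recursive = TRUE, full.names = TRUE)

dUMI_df <- do.call(rbind, lapply(ranked_files, function(f) {
  d <- read.csv(f)
  d$sample <- basename(dirname(f))  # record which sample each row came from
  d
}))
write.csv(dUMI_df, "dUMI_df.csv", row.names = FALSE)
```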
Figures.Rmd used dUMI_df.csv, sequence_counts.csv, and read_counts.csv as input to create draft figures and then individual datasets for each figure. These were copied into Prism software to create the final figures for the paper.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
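As a concrete flavor of the Iris module, the exercises described above can be done entirely in base R with the built-in iris dataset; a short sketch:

```r
# Summary statistics, a correlation, a histogram, and a scatter plot with
# the built-in iris dataset -- the kinds of exercises the Iris module covers.
data(iris)

summary(iris$Sepal.Length)                 # summary statistics for one trait
cor(iris$Sepal.Length, iris$Petal.Length)  # correlation between two traits

hist(iris$Sepal.Length, main = "Sepal length", xlab = "Length (cm)")
plot(iris$Petal.Length, iris$Petal.Width,
     col = iris$Species, pch = 19,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)")
```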
SOIREE Summary of Areal (mixed layer integrated) data
All areal (integrated to the base of the mixed layer) estimates for chla, phaeopigments, cell numbers, algal carbon, and 14C carbon uptake. Includes data on mean chl and phaeopigments, carbon/chl ratios, chl/cell, and growth rates.
This field activity is part of the effort to map geologic substrates of the Stellwagen Bank National Marine Sanctuary region off Boston, Massachusetts. The overall goal is to develop high-resolution (1:25,000) interpretive maps, based on multibeam sonar data and seabed sampling, showing surficial geology and seabed sediment dynamics. This cruise was conducted in collaboration with the Stellwagen Bank National Marine Sanctuary, and the data collected will aid research on the ecology of fish and invertebrate species that inhabit the region. The Sanctuary's research vessel, R/V Auk, visited 33 locations on Stellwagen Bank at which a customized Van Veen grab sampler (SEABOSS) equipped with a video camera and a CTD was deployed in drift mode to collect sediment for grain-size analysis, video imagery of the seabed, and measurements of water column properties.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains simulated datasets, empirical data, and R scripts described in the paper: "Li, Q. and Kou, X. (2021) WiBB: An integrated method for quantifying the relative importance of predictive variables. Ecography (DOI: 10.1111/ecog.05651)".
A fundamental goal of scientific research is to identify the underlying variables that govern crucial processes of a system. Here we proposed a new index, WiBB, which integrates the merits of several existing methods: a model-weighting method from information theory (Wi), a standardized regression coefficient method measured by β* (B), and the bootstrap resampling technique (B). We applied WiBB to simulated datasets with known correlation structures, for both linear models (LM) and generalized linear models (GLM), to evaluate its performance. We also applied two other methods, the relative sum of weights (SWi) and the standardized beta (β*), to evaluate their performance in comparison with the WiBB method in ranking predictor importance under various scenarios. We also applied it to an empirical dataset in the plant genus Mimulus to select bioclimatic predictors of species' presence across the landscape. Results on the simulated datasets showed that the WiBB method outperformed the β* and SWi methods in scenarios with small and large sample sizes, respectively, and that the bootstrap resampling technique significantly improved the discriminant ability. When testing WiBB on the empirical dataset with GLM, it sensibly identified four important predictors with high credibility out of six candidates in modeling the geographical distributions of 71 Mimulus species. This integrated index has great advantages in evaluating predictor importance, and hence in reducing the dimensionality of data, without losing interpretive power. The simplicity of calculating the new metric, compared with more sophisticated statistical procedures, makes it a handy addition to the statistical toolbox.
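To make the ingredients concrete, the sketch below illustrates, on simulated data, the three components the index combines: information-theoretic model weights (Wi), standardized coefficients (β*), and bootstrap resampling (B). This is not the authors' WiBB implementation; the toy data and all names are illustrative.

```r
# Illustrative only -- not the WiBB implementation from the paper.
set.seed(1)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1.5 * d$x1 + 0.3 * d$x2 + rnorm(n)

# beta*: standardized coefficients, here averaged over bootstrap resamples (B)
boot_beta <- replicate(500, {
  i <- sample(n, replace = TRUE)
  coef(lm(scale(y) ~ scale(x1) + scale(x2), data = d[i, ]))[-1]
})
rowMeans(abs(boot_beta))  # bootstrap-averaged |beta*| per predictor

# Wi: Akaike weights across a small candidate model set
aics <- c(x1_only = AIC(lm(y ~ x1, data = d)),
          x2_only = AIC(lm(y ~ x2, data = d)),
          both    = AIC(lm(y ~ x1 + x2, data = d)))
delta <- aics - min(aics)
exp(-0.5 * delta) / sum(exp(-0.5 * delta))  # model weights summing to 1
```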
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The R code used for summarization of the dataset contents and production of the figures presented in the Data Descriptor.
The National Energy Efficiency Data-Framework (NEED) was set up to provide a better understanding of energy use and energy efficiency in domestic and non-domestic buildings in Great Britain. The data framework matches data about a property together - including energy consumption and energy efficiency measures installed - at household level.
To maximise the usefulness of future publications, please provide feedback by completing this 1 minute survey:
https://www.surveymonkey.co.uk/r/TJTGZJT
Attribution 2.5 (CC BY 2.5): https://creativecommons.org/licenses/by/2.5/
License information was derived automatically
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

There are 4 csv files here:

BAWAP_P_annual_BA_SYB_GLO.csv
Desc: Time series mean annual BAWAP rainfall from 1900 - 2012.
Source data: annual BILO rainfall on \\wron\Project\BA\BA_N_Sydney\Working\li036_Lingtao_LI\Grids\BILO_Rain_Ann\

P_PET_monthly_BA_SYB_GLO.csv
Desc: Long-term average BAWAP rainfall and Penman PET from 198101 - 201212 for each month.

Climatology_Trend_BA_SYB_GLO.csv
Desc: Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (annual, 4 seasons, and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.

Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv
Desc: Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). All data used in this analysis came directly from James Risbey, CMAR, Hobart. As described in Risbey et al. (2009), the rainfall was from the 0.05 degree gridded data described in Jeffrey et al. (2001; known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).

The dataset was created from various BILO source data, including monthly BILO rainfall, Tmax, Tmin, VPD, etc., and other source data including monthly Penman PET (calculated by Randall Donohue) and correlation coefficient data from James Risbey.

Bioregional Assessment Programme (XXXX) SYD ALL climate data statistics summary. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/b0a6ccf1-395d-430e-adf1-5068f8371dea.

* Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
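For the Climatology_Trend file, statistics (a) through (g) can be reproduced for any one variable with a few lines of R. A hedged sketch, where the column names are assumptions about the csv layout and the trend is taken as the slope of a linear fit against time:

```r
# Hedged sketch of statistics (a)-(g) for one meteorological variable;
# column names below are assumptions, not the documented csv schema.
stats_a_to_g <- function(x, year) {
  fit <- lm(x ~ year)  # (g) trend as the slope of a linear fit
  c(average        = mean(x),
    maximum        = max(x),
    minimum        = min(x),
    avg_plus_sd    = mean(x) + sd(x),
    avg_minus_sd   = mean(x) - sd(x),
    stddev         = sd(x),
    trend_per_year = unname(coef(fit)["year"]))
}

# clim <- read.csv("Climatology_Trend_BA_SYB_GLO.csv")  # assumed layout
# stats_a_to_g(clim$BAWAP_P, clim$year)
```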
Automatic summarization of reader comments in on-line news is an extremely challenging task and a capability for which there is a clear need. Work to date has focussed on producing extractive summaries using well-known techniques imported from other areas of language processing. But are extractive summaries of comments what users really want? Do they support users in performing the sorts of tasks they are likely to want to perform with reader comments? In this paper we address these questions by doing three things. First, we offer a specification of one possible summary type for reader comment, based on an analysis of reader comment in terms of issues and viewpoints. Second, we define a task-based evaluation framework for reader comment summarization that allows summarization systems to be assessed in terms of how well they support users in a time-limited task of identifying issues and characterising opinion on issues in comments. Third, we describe a pilot evaluation in which we used the task-based evaluation framework to evaluate a prototype reader comment clustering and summarization system, demonstrating the viability of the evaluation framework and illustrating the sorts of insight such an evaluation affords.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Surface agronomic P budgets for 61 cropping systems using field-scale P flux data across 24 research sites in the United States and Canada. Data are representative of P inputs and outputs associated with the production of each crop in a respective rotation year, ranging from 1 to 10 rotation years. This dataset provides a comparison of field-scale soil surface P fluxes and phosphorus budgets across sites and cropping systems.

Resources in this dataset:

Resource Title: LTAR Phosphorus Budget Summary Data Sources and References. File Name: DataSourcesAndReferences.csv
Resource Description: This file includes data sources and references relevant to the calculated P budgets. Affiliated numerical data can be found in the LTAR Phosphorus Budget Summary Data file.

Resource Title: LTAR Phosphorus Budget Summary Data. File Name: PBudgetData.csv
Resource Description: Agronomic annual and system data for calculated P budgets for cropping systems throughout the United States and Canada.
This dataset includes extracted summary data from 12000+ Conductivity, Temperature, Depth (CTD) casts conducted in the northern Bering, Chukchi, and southern Beaufort Seas from 1970-present. One record per cast. Each record has station information, date, time, location, water depth, cast depth, surface temperature/salinity, bottom temperature/salinity, temperature maximum, stratification parameters (max Brunt-Vaisala frequency, mixed layer depth), fresh water content, and heat content. This dataset is part of the Pacific Marine Arctic Regional Synthesis (PacMARS) Project. Note: These data were updated on June 4, 2014.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 5/5/2024
This R project contains the primary code and data (following pre-processing in python) used for data production, manipulation, visualization, and analysis, as well as figure production, for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably in this project.
Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/annual/multi_year_average/average_annual_turnover.nc" contains a global array summarizing five year (2016-2020) averages of annual transit, storage, canopy transpiration, and number of months of data. This is the core dataset for the analysis; however, each folder has much more data, including a dataset for each year of the analysis. Data are also available is separate .csv files for each land cover type. Oterh data can be found for the minimum, monthly, and seasonal transit time found in their respective folders. These data were produced using the python code found in the "supporting_code" folder given the ease of working with .nc and EASE grid in the xarray python module. R was used primarily for data visualization purposes. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here.
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a particular function:
01_start.R: This script loads the R packages used in the analysis, sets the directory, and imports custom functions for the project. You can also load the main transit time (turnover) datasets here using the source() function.
02_functions.R: This script contains the custom functions for this analysis, primarily for importing the seasonal transit data. Load this using the source() function in the 01_start.R script.
03_generate_data.R: This script is not necessary to run and is primarily for documentation. The main role of this code was to import and wrangle the data needed to calculate ground-based estimates of aboveground water storage.
04_annual_turnover_storage_import.R: This script imports the annual turnover and storage data for each landcover type. You load in these data from the 01_start.R script using the source() function.
05_minimum_turnover_storage_import.R: This script imports the minimum turnover and storage data for each landcover type. Minimum is defined as the lowest monthly estimate. You load in these data from the 01_start.R script using the source() function.
06_figures_tables.R: This is the main workhorse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study, which are then saved in the manuscript_figures folder. Note that all maps were produced using Python code found in the "supporting_code" folder.
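Taken together, the loading pattern the scripts describe looks roughly like this from inside 01_start.R; a minimal sketch using the file names from the list above:

```r
# Run from 01_start.R after packages and the working directory are set.
source("02_functions.R")                       # custom import helpers
source("04_annual_turnover_storage_import.R")  # annual turnover/storage data
source("05_minimum_turnover_storage_import.R") # minimum (lowest-month) data
```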
Summary of environmental data collected by the MOCNESS systems' (1 and 10 m²) electronics packages.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for Figure SPM.1 from the Summary for Policymakers (SPM) of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6).
Figure SPM.1 shows global temperature history and causes of recent warming.
How to cite this dataset
When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates:
IPCC, 2021: Summary for Policymakers. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S. L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M. I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T. K. Maycock, T. Waterfield, O. Yelekçi, R. Yu and B. Zhou (eds.)]. Cambridge University Press. In Press.
Figure subpanels
The figure has two panels, with data provided for all panels in subdirectories named panel_a and panel_b.
List of data provided
Panel a
The dataset contains:
Panel b:
The dataset contains global surface temperature change time series relative to 1850-1900 for 1850-2020 from simulations from the sixth phase of the Coupled Model Intercomparison Project (CMIP6) and observations:
Panel a:
Panel b:
Sources of additional information
The following weblinks are provided in the Related Documents section of this catalogue record:
- Link to the figure on the IPCC AR6 website
- Link to the report webpage, which includes the report component containing the figure (Summary for Policymakers), the Technical Summary (Cross-Section Box TS.1, Figure 1a) and the Supplementary Material for Chapters 2 and 3, which contains details on the input data used in Tables 2.SM.1 (Figure 2.11a) and 3.SM.1 (Figure 3.2c; FAQ 3.1, Figure 1)
- Link to related publication for input data
- Link to the webpage of the WGI report
Summary of species occurrence data from 1900 to 2020 for Australian marine species organised by IMCRA region and EPBC status. Counts are provided by species and IMCRA region for:

1. The total number of occurrence records within the region, for a given EPBC status and time period
2. The number of distinct species recorded within the region, for a given EPBC status and time period

Occurrence records were aggregated and organised by the Atlas of Living Australia (ALA, https://ala.org.au/) and include survey and monitoring data collected and managed by the Integrated Marine Observing System (IMOS, https://imos.org.au/) and the Terrestrial Ecosystem Research Network (TERN, https://tern.org.au/).

To find out more about this dataset, visit: https://ecoassets.org.au/data/summary-data-threatened-species-occurrences-by-marine-ecoregion/

DOI: https://doi.org/10.26197/ala.2f745938-a1f9-408c-942f-7d246860d313
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for Box TS.2, Figure 1 from the Technical Summary of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6).
Box TS.2, Figure 1 shows paleoclimate and recent reference periods, with selected key indicators.
How to cite this dataset
When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates: Arias, P.A., N. Bellouin, E. Coppola, R.G. Jones, G. Krinner, J. Marotzke, V. Naik, M.D. Palmer, G.-K. Plattner, J. Rogelj, M. Rojas, J. Sillmann, T. Storelvmo, P.W. Thorne, B. Trewin, K. Achuta Rao, B. Adhikary, R.P. Allan, K. Armour, G. Bala, R. Barimalala, S. Berger, J.G. Canadell, C. Cassou, A. Cherchi, W. Collins, W.D. Collins, S.L. Connors, S. Corti, F. Cruz, F.J. Dentener, C. Dereczynski, A. Di Luca, A. Diongue Niang, F.J. Doblas-Reyes, A. Dosio, H. Douville, F. Engelbrecht, V. Eyring, E. Fischer, P. Forster, B. Fox-Kemper, J.S. Fuglestvedt, J.C. Fyfe, N.P. Gillett, L. Goldfarb, I. Gorodetskaya, J.M. Gutierrez, R. Hamdi, E. Hawkins, H.T. Hewitt, P. Hope, A.S. Islam, C. Jones, D.S. Kaufman, R.E. Kopp, Y. Kosaka, J. Kossin, S. Krakovska, J.-Y. Lee, J. Li, T. Mauritsen, T.K. Maycock, M. Meinshausen, S.-K. Min, P.M.S. Monteiro, T. Ngo-Duc, F. Otto, I. Pinto, A. Pirani, K. Raghavan, R. Ranasinghe, A.C. Ruane, L. Ruiz, J.-B. Sallée, B.H. Samset, S. Sathyendranath, S.I. Seneviratne, A.A. Sörensson, S. Szopa, I. Takayabu, A.-M. Tréguier, B. van den Hurk, R. Vautard, K. von Schuckmann, S. Zaehle, X. Zhang, and K. Zickfeld, 2021: Technical Summary. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 33−144, doi:10.1017/9781009157896.002.
Figure subpanels
The figure has two panels, with data provided for all panels in a single file.
List of data provided
This dataset contains three selected global climate indicators that covary across multiple paleoclimate reference periods:
- CO2
- Temperature
- Sea level
Data provided in relation to figure
GMST stands for global mean surface temperature.
Notes on reproducing the figure from the provided data
Table 2.1 lists underlying CO2 concentrations for reference periods.
Cross-Chapter Box 2.1 lists and describes the paleoclimate climate reference periods.
This dataset covers a paleoclimate timespan from the Cenozoic to recent past, including multiple paleoclimate reference periods.
Sources of additional information
The following weblinks are provided in the Related Documents section of this catalogue record:
- Link to the figure on the IPCC AR6 website
- Link to the report component containing the figure (Technical Summary)
- Link to the report component for Chapter 2 containing relevant figures
- Link to the Supplementary Material for Chapter 2, which contains details on the input data used in Table 2.SM.1
Tabular data output from a series of modeling simulations for the seven main Hawaiian Islands. We used the LUCAS model to project changes in ecosystem carbon balance resulting from land use, land use change, climate change, and wildfire. The model was run at a 250-m spatial resolution on an annual timestep from the years 2010 to 2100. We simulated four unique scenarios, consisting of all combinations of two land-use scenarios and two radiative forcing scenarios. For each scenario, we ran 30 Monte Carlo realizations of the model. Results presented here have been aggregated from the individual cell level and summarized by island or vegetation class. Model input data and the R code used to generate it, as well as R code used to summarize and analyze model output data, can be found in the HI_Model GitHub repository (https://github.com/selmants/HI_Model).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CTD - Bottle Summary
CTD Bottle Data - avg, stdev, min, and max values at bottle firings for various parameters