100+ datasets found
  1. dataset-pinkball-first-merge

    • huggingface.co
    Updated Dec 1, 2025
    Cite
    Thomas R (2025). dataset-pinkball-first-merge [Dataset]. https://huggingface.co/datasets/treitz/dataset-pinkball-first-merge
    Dataset updated
    Dec 1, 2025
    Authors
    Thomas R
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset was created using LeRobot.

      Dataset Structure
    

    meta/info.json: { "codebase_version": "v3.0", "robot_type": "so101_follower", "total_episodes": 40, "total_frames": 10385, "total_tasks": 1, "chunks_size": 1000, "data_files_size_in_mb": 100, "video_files_size_in_mb": 200, "fps": 30, "splits": { "train": "0:40" }, "data_path": "data/chunk-{chunk_index:03d}/file-{file_index:03d}.parquet", "video_path":… See the full description on the dataset page: https://huggingface.co/datasets/treitz/dataset-pinkball-first-merge.
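The `meta/info.json` fields above can be read with any JSON parser. A minimal sketch, using only values copied from the excerpt (the real file contains additional fields elided by the "…" above):

```python
import json

# Field values below are copied from the info.json excerpt; the full file has more keys.
info_text = '''{
  "codebase_version": "v3.0",
  "robot_type": "so101_follower",
  "total_episodes": 40,
  "total_frames": 10385,
  "fps": 30,
  "splits": {"train": "0:40"},
  "data_path": "data/chunk-{chunk_index:03d}/file-{file_index:03d}.parquet"
}'''

info = json.loads(info_text)

# Average episode length in frames and in seconds.
frames_per_episode = info["total_frames"] / info["total_episodes"]
seconds_per_episode = frames_per_episode / info["fps"]

# Resolve the parquet path template for the first chunk/file.
first_file = info["data_path"].format(chunk_index=0, file_index=0)
print(frames_per_episode, round(seconds_per_episode, 1), first_file)
# 259.625 8.7 data/chunk-000/file-000.parquet
```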

  2. Reddit's /r/Gamestop

    • kaggle.com
    zip
    Updated Nov 28, 2022
    Cite
    The Devastator (2022). Reddit's /r/Gamestop [Dataset]. https://www.kaggle.com/datasets/thedevastator/gamestop-inc-stock-prices-and-social-media-senti
    Available download formats: zip (186,464,492 bytes)
    Dataset updated
    Nov 28, 2022
    Authors
    The Devastator
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Reddit's /r/Gamestop

    Merge this dataset with GameStop price data to study how the chatter impacted the stock.

    By SocialGrep [source]

    About this dataset

    The stonks movement spawned by this is a very interesting one. It's rare to see an Internet meme have such an effect on the real-world economy - yet here we are.

    This dataset contains a collection of posts and comments mentioning GME in their title and body text respectively. The data is procured using SocialGrep. The posts and the comments are labelled with their score.

    It'll be interesting to see how this affected stock market prices in the aftermath, and this new dataset makes that possible to study.

    More Datasets

    For more datasets, click here.


    How to use the dataset

    The files contain Reddit posts and comments mentioning GME along with their scores. These can be used to analyze how sentiment around GME related to its stock price in the aftermath.

    Research Ideas

    • To study how social media affects stock prices
    • To study how Reddit affects stock prices
    • To study how the sentiment of a subreddit affects stock prices

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: six-months-of-gme-on-reddit-comments.csv

    | Column name    | Description                                            |
    |:---------------|:-------------------------------------------------------|
    | type           | The type of post or comment. (String)                  |
    | subreddit.name | The name of the subreddit. (String)                    |
    | subreddit.nsfw | Whether the subreddit is NSFW. (Boolean)               |
    | created_utc    | The time the post or comment was created. (Timestamp)  |
    | permalink      | The permalink of the post or comment. (String)         |
    | body           | The body of the post or comment. (String)              |
    | sentiment      | The sentiment of the post or comment. (String)         |
    | score          | The score of the post or comment. (Integer)            |

    File: six-months-of-gme-on-reddit-posts.csv

    | Column name    | Description                                            |
    |:---------------|:-------------------------------------------------------|
    | type           | The type of post or comment. (String)                  |
    | subreddit.name | The name of the subreddit. (String)                    |
    | subreddit.nsfw | Whether the subreddit is NSFW. (Boolean)               |
    | created_utc    | The time the post or comment was created. (Timestamp)  |
    | permalink      | The permalink of the post or comment. (String)         |
    | score          | The score of the post or comment. (Integer)            |
    | domain         | The domain of the post or comment. (String)            |
    | url            | The URL of the post or comment. (String)               |
    | selftext       | The selftext of the post or comment. (String)          |
    | title          | The title of the post or comment. (String)             |
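The suggested merge with price data can be sketched in plain Python. The CSV rows and the `gme_close` price table below are made-up stand-ins, not values from the dataset; only the column names `type`, `created_utc`, `score`, and `title` come from the table above:

```python
import csv, io
from datetime import datetime, timezone

# Tiny stand-in for six-months-of-gme-on-reddit-posts.csv (invented rows).
posts_csv = """type,created_utc,score,title
post,1611878400,512,GME to the moon
post,1611964800,87,Holding my shares
"""

# Hypothetical daily close prices keyed by date; not part of this dataset.
gme_close = {"2021-01-29": 325.00, "2021-01-30": 325.00}

merged = []
for row in csv.DictReader(io.StringIO(posts_csv)):
    # created_utc is a Unix timestamp; bucket each post by UTC calendar day.
    day = datetime.fromtimestamp(int(row["created_utc"]), tz=timezone.utc).strftime("%Y-%m-%d")
    if day in gme_close:
        merged.append({"title": row["title"], "score": int(row["score"]), "close": gme_close[day]})

print(merged)
```

From here, aggregating scores or sentiment per day gives a daily series that lines up with the price series for correlation or event-study analysis.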

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and SocialGrep.

  3. Additional file 4 of mtDNAcombine: tools to combine sequences from multiple...

    • springernature.figshare.com
    txt
    Updated Jun 3, 2023
    Cite
    Eleanor F. Miller; Andrea Manica (2023). Additional file 4 of mtDNAcombine: tools to combine sequences from multiple studies [Dataset]. http://doi.org/10.6084/m9.figshare.14189969.v1
    Available download formats: txt
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Eleanor F. Miller; Andrea Manica
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 4. Code to create the plots in this paper, presented as an R Markdown file.

  4. KORUS-AQ Aircraft Merge Data Files - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). KORUS-AQ Aircraft Merge Data Files - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/korus-aq-aircraft-merge-data-files-9bba5
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    KORUSAQ_Merge_Data are pre-generated merge data files combining various products collected during the KORUS-AQ field campaign. This collection features pre-generated merge files for the DC-8 aircraft. Data collection for this product is complete.

    The KORUS-AQ field study was conducted in South Korea during May-June 2016. The study was jointly sponsored by NASA and Korea’s National Institute of Environmental Research (NIER). The primary objectives were to investigate the factors controlling air quality in Korea (e.g., local emissions, chemical processes, and transboundary transport) and to assess future air quality observing strategies incorporating geostationary satellite observations. To achieve these science objectives, KORUS-AQ adopted a highly coordinated sampling strategy involving surface and airborne measurements, including both in-situ and remote sensing instruments.

    Surface observations provided details on ground-level air quality conditions, while airborne sampling provided an assessment of conditions aloft relevant to satellite observations and necessary to understand the role of emissions, chemistry, and dynamics in determining air quality outcomes. The sampling region covers the South Korean peninsula and surrounding waters, with a primary focus on the Seoul Metropolitan Area. Airborne sampling was primarily conducted from near surface to about 8 km, with extensive profiling to characterize the vertical distribution of pollutants and their precursors. The airborne observational data were collected from three aircraft platforms: the NASA DC-8, NASA B-200, and Hanseo King Air. Surface measurements were conducted from 16 ground sites and 2 ships: R/V Onnuri and R/V Jang Mok.

    The major data products collected from both the ground and air include in-situ measurements of trace gases (e.g., ozone, reactive nitrogen species, carbon monoxide and dioxide, methane, non-methane and oxygenated hydrocarbon species), aerosols (e.g., microphysical and optical properties and chemical composition), active remote sensing of ozone and aerosols, and passive remote sensing of NO2, CH2O, and O3 column densities. These data products support research focused on examining the impact of photochemistry and transport on ozone and aerosols, evaluating emissions inventories, and assessing the potential use of satellite observations in air quality studies.

  5. Dataset: Environmental conditions and male quality traits simultaneously...

    • data.europa.eu
    • data.niaid.nih.gov
    unknown
    Updated Jun 21, 2022
    Cite
    Zenodo (2022). Dataset: Environmental conditions and male quality traits simultaneously explain variation of multiple colour signals in male lizards [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-6683661?locale=de
    Available download formats: unknown (3,063,441 bytes)
    Dataset updated
    Jun 21, 2022
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset and R code associated with the following publication: Badiane et al. (2022), Environmental conditions and male quality traits simultaneously explain variation of multiple colour signals in male lizards. Journal of Animal Ecology, in press.

    This dataset includes the following files:
    • An Excel file containing the reflectance spectra of all individuals from all the study populations
    • An Excel file containing the variables collected at the individual and population levels
    • Two R scripts corresponding to the analyses performed in the publication

  6. Benchmark Datasets for Entity Linking from Tabular Data

    • zenodo.org
    zip
    Updated Sep 19, 2025
    Cite
    Roberto Avogadro (2025). Benchmark Datasets for Entity Linking from Tabular Data [Dataset]. http://doi.org/10.5281/zenodo.17160156
    Available download formats: zip
    Dataset updated
    Sep 19, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Roberto Avogadro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    📖 Benchmark Datasets for Entity Linking from Tabular Data (Version 2)


    This archive provides a benchmark suite for evaluating entity linking algorithms on structured tabular data.
    It is organised into two parts:
    • Challenge datasets (HTR1, HTR2): From the SemTab Table-to-KG Challenge, widely used in academic evaluations of table-to-KG alignment systems. Each is a dataset (a collection of many tables) provided with ground truth and candidate mappings.
    👉 Please also cite the SemTab Challenge when using these resources.
    • Real-world tables (Company, Movie, SN):
    • Company — one table constructed via SPARQL queries on Wikidata, with both Wikidata and Crunchbase ground truths.
    • Movie — one table constructed via SPARQL queries on Wikidata.
    • SN (Spend Network) — one procurement table from the enRichMyData (EMD) project, manually annotated and including NIL cases for mentions with no known Wikidata match.


    A shared top-level folder (mention_to_qid/) provides JSON files mapping surface mentions to candidate QIDs for these real-world tables.



    📂 Contents


    Each dataset or table includes:
    • One or more input CSV tables
    • Ground truth files mapping mentions/cells to Wikidata QIDs (or NIL)
    • Candidate mappings (mention_to_qid/*.json), sometimes multiple variants
    • Optional files such as column_classifications.json or cell_to_qid.json
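A sketch of how the candidate files might be consumed for a quick quality check. The JSON layout (surface mention mapped to a list of candidate QIDs) and the toy mentions are assumptions for illustration, not the actual file schema:

```python
import json

# Assumed layout of a mention_to_qid/*.json file: mention -> candidate QIDs.
mention_to_qid = json.loads('''{
  "Acme Corp": ["Q123", "Q456"],
  "Unknown Supplier Ltd": []
}''')

# Assumed ground-truth mapping; "NIL" marks mentions with no Wikidata match.
ground_truth = {"Acme Corp": "Q123", "Unknown Supplier Ltd": "NIL"}

# Candidate recall: how often the gold QID appears among the candidates.
# A NIL mention counts as covered when its candidate list is empty.
hits = 0
for mention, gold in ground_truth.items():
    candidates = mention_to_qid.get(mention, [])
    if (gold == "NIL" and not candidates) or gold in candidates:
        hits += 1
recall = hits / len(ground_truth)
print(recall)  # 1.0 for this toy example
```

Candidate recall bounds the accuracy of any linker that only re-ranks the provided candidates, so it is a useful first statistic to compute per table.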



    📝 Licensing
    • HTR1 & HTR2: CC BY 4.0
    • Company & Movie: Derived from Wikidata (public domain; CC0 1.0)
    • SN: CC BY 4.0 (from the enRichMyData project)



    📌 Citation


    If you use these datasets, please cite:
    • This Zenodo record (Version 2):
    Avogadro, R., & Rauniyar, A. (2025). Benchmark Datasets for Entity Linking from Tabular Data (Version 2). Zenodo. https://doi.org/10.5281/zenodo.15888942
    • The SemTab Challenge (for HTR1/HTR2):
    SemTab: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (Table-to-KG). (Cite the relevant SemTab overview paper for the year you reference.)
    • Wikidata: Data retrieved from Wikidata (public domain; CC0 1.0).
    • enRichMyData (for SN / Spend Network): Project resources from enRichMyData, licensed under CC BY 4.0.

  7. Missing data in the analysis of multilevel and dependent data (Examples)

    • data.niaid.nih.gov
    Updated Jul 20, 2023
    + more versions
    Cite
    Simon Grund; Oliver Lüdtke; Alexander Robitzsch (2023). Missing data in the analysis of multilevel and dependent data (Examples) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7773613
    Dataset updated
    Jul 20, 2023
    Dataset provided by
    University of Hamburg
    IPN - Leibniz Institute for Science and Mathematics Education
    Authors
    Simon Grund; Oliver Lüdtke; Alexander Robitzsch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example data sets and computer code for the book chapter titled "Missing Data in the Analysis of Multilevel and Dependent Data" submitted for publication in the second edition of "Dependent Data in Social Science Research" (Stemmler et al., 2015). This repository includes the computer code (".R") and the data sets from both example analyses (Examples 1 and 2). The data sets are available in two file formats (binary ".rda" for use in R; plain-text ".dat").

    The data sets contain simulated data from 23,376 (Example 1) and 23,072 (Example 2) individuals from 2,000 groups on four variables:

    ID = group identifier (1-2000)
    x = numeric (Level 1)
    y = numeric (Level 1)
    w = binary (Level 2)

    In all data sets, missing values are coded as "NA".
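As a quick illustration of the "NA" coding, a sketch that reads a toy plain-text table in the same four-variable layout (the rows below are invented, not taken from the data sets):

```python
# Toy stand-in for the plain-text ".dat" files (ID, x, y, w; "NA" = missing).
raw = """ID x y w
1 0.52 NA 1
1 NA 1.30 1
2 0.11 0.87 0
"""

rows = []
header, *lines = raw.strip().splitlines()
cols = header.split()
for line in lines:
    # Convert "NA" to None, everything else to float.
    rec = dict(zip(cols, line.split()))
    rows.append({k: (None if v == "NA" else float(v)) for k, v in rec.items()})

# Count missing values per column, as a first step before any imputation.
missing = {c: sum(r[c] is None for r in rows) for c in cols}
print(missing)  # {'ID': 0, 'x': 1, 'y': 1, 'w': 0}
```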

  8. Additional file 3 of mtDNAcombine: tools to combine sequences from multiple...

    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Eleanor F. Miller; Andrea Manica (2023). Additional file 3 of mtDNAcombine: tools to combine sequences from multiple studies [Dataset]. http://doi.org/10.6084/m9.figshare.14189963.v1
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Eleanor F. Miller; Andrea Manica
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 3. Input files needed to recreate the plots in this paper: raw sequence data for alignment.

  9. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r. the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
    this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R
    • download the fixed-width file containing household, family, and person records
    • import by separating this file into three tables, then merge 'em together at the person-level
    • download the fixed-width file containing the person-level replicate weights
    • merge the rectangular person-level file with the replicate weights, then store it in a sql database
    • create a new variable - one - in the data table

    2012 asec - analysis examples.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • perform a boatload of analysis examples

    replicate census estimates - 2011.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • match the sas output shown in the png file below

    2011 asec replicate weight sas output.png
    • statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

    click here to view these three scripts

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
    • the census bureau's current population survey page
    • the bureau of labor statistics' current population survey page
    • the current population survey's wikipedia article

    notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
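the fixed-width import step can be sketched like so; the layout tuple below is hypothetical (the real column positions come from nber's sas importation script, which parse.SAScii reads for you in r):

```python
# Hypothetical fixed-width layout; each field is (name, start, end) in
# 0-based character positions. Real CPS layouts come from NBER's SAS code.
layout = [("record_type", 0, 1), ("age", 1, 3), ("income", 3, 10)]

def parse_line(line):
    # Slice each field out of the fixed-width record and strip padding.
    return {name: line[start:end].strip() for name, start, end in layout}

sample = "342  45000"
rec = parse_line(sample)
print(rec)  # {'record_type': '3', 'age': '42', 'income': '45000'}
```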

  10. Data from: Optimized SMRT-UMI protocol produces highly accurate sequence...

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    zip
    Updated Dec 7, 2023
    Cite
    Dylan Westfall; Mullins James (2023). Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies [Dataset]. http://doi.org/10.5061/dryad.w3r2280w0
    Available download formats: zip
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    National Institute of Allergy and Infectious Diseases (http://www.niaid.nih.gov/)
    HIV Vaccine Trials Network (http://www.hvtn.org/)
    HIV Prevention Trials Network (http://www.hptn.org/)
    PEPFAR
    Authors
    Dylan Westfall; Mullins James
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing, which can require extensive optimizations to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence PCR amplicons derived from cDNA templates tagged with universal molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR, and the use of UMIs allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing to produce a highly accurate consensus sequence from each template. Handling of the large datasets produced from SMRT-UMI sequencing was facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline), that automatically filters and parses reads by sample, identifies and discards reads with UMIs likely created from PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination or early cycle PCR errors, resulting in highly accurate sequence datasets. The optimized SMRT-UMI sequencing method presented here represents a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus (HIV) quasispecies.

    Methods

    This serves as an overview of the analysis performed on PacBio sequence data that is summarized in Analysis Flowchart.pdf and was used as primary data for the paper by Westfall et al.
"Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies" Five different PacBio sequencing datasets were used for this analysis: M027, M2199, M1567, M004, and M005 For the datasets which were indexed (M027, M2199), CCS reads from PacBio sequencing files and the chunked_demux_config files were used as input for the chunked_demux pipeline. Each config file lists the different Index primers added during PCR to each sample. The pipeline produces one fastq file for each Index primer combination in the config. For example, in dataset M027 there were 3–4 samples using each Index combination. The fastq files from each demultiplexed read set were moved to the sUMI_dUMI_comparison pipeline fastq folder for further demultiplexing by sample and consensus generation with that pipeline. More information about the chunked_demux pipeline can be found in the README.md file on GitHub. The demultiplexed read collections from the chunked_demux pipeline or CCS read files from datasets which were not indexed (M1567, M004, M005) were each used as input for the sUMI_dUMI_comparison pipeline along with each dataset's config file. Each config file contains the primer sequences for each sample (including the sample ID block in the cDNA primer) and further demultiplexes the reads to prepare data tables summarizing all of the UMI sequences and counts for each family (tagged.tar.gz) as well as consensus sequences from each sUMI and rank 1 dUMI family (consensus.tar.gz). More information about the sUMI_dUMI_comparison pipeline can be found in the paper and the README.md file on GitHub. The consensus.tar.gz and tagged.tar.gz files were moved from sUMI_dUMI_comparison pipeline directory on the server to the Pipeline_Outputs folder in this analysis directory for each dataset and appended with the dataset name (e.g. consensus_M027.tar.gz). 
    Also in this analysis directory is a Sample_Info_Table.csv containing information about how each of the samples was prepared, such as purification methods and number of PCRs. There are also three other folders: Sequence_Analysis, Indentifying_Recombinant_Reads, and Figures. Each has an .Rmd file with the same name inside which is used to collect, summarize, and analyze the data. All of these collections of code were written and executed in RStudio to track notes and summarize results. Sequence_Analysis.Rmd has instructions to decompress all of the consensus.tar.gz files, combine them, and create two fasta files, one with all sUMI and one with all dUMI sequences. Using these as input, two data tables were created that summarize all sequences and read counts for each sample that pass various criteria. These are used to help create Table 2 and as input for Indentifying_Recombinant_Reads.Rmd and Figures.Rmd. Next, 2 fasta files containing all of the rank 1 dUMI sequences and the matching sUMI sequences were created. These were used as input for the python script compare_seqs.py which identifies any matched sequences that are different between sUMI and dUMI read collections. This information was also used to help create Table 2. Finally, to populate the table with the number of sequences and bases in each sequence subset of interest, different sequence collections were saved and viewed in the Geneious program. To investigate the cause of sequences where the sUMI and dUMI sequences do not match, tagged.tar.gz was decompressed and for each family with discordant sUMI and dUMI sequences the reads from the UMI1_keeping directory were aligned using Geneious. Reads from dUMI families failing the 0.7 filter were also aligned in Geneious. The uncompressed tagged folder was then removed to save space. These read collections contain all of the reads in a UMI1 family and still include the UMI2 sequence.
    By examining the alignment and specifically the UMI2 sequences, the site of the discordance and its cause were identified for each family as described in the paper. These alignments were saved as "Sequence Alignments.geneious". The counts of how many families were the result of PCR recombination were used in the body of the paper. Using Identifying_Recombinant_Reads.Rmd, the dUMI_ranked.csv file from each sample was extracted from all of the tagged.tar.gz files, combined, and used as input to create a single dataset containing all UMI information from all samples. This file dUMI_df.csv was used as input for Figures.Rmd. Figures.Rmd used dUMI_df.csv, sequence_counts.csv, and read_counts.csv as input to create draft figures and then individual datasets for each figure. These were copied into Prism software to create the final figures for the paper.
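As a toy illustration of the UMI-family consensus idea described above (this is not the PORPIDpipeline implementation; the reads and UMIs are invented, the reads are assumed pre-aligned and equal-length, and the error-UMI filtering step is omitted):

```python
from collections import Counter, defaultdict

# Invented reads tagged with a UMI: (umi, sequence).
reads = [("AAGT", "ACGTAC"), ("AAGT", "ACGTAC"), ("AAGT", "ACGAAC"), ("CCTA", "TTGCAA")]

# Group reads into UMI families.
families = defaultdict(list)
for umi, seq in reads:
    families[umi].append(seq)

def consensus(seqs):
    # Majority base at each position (assumes equal-length, pre-aligned reads).
    return "".join(Counter(bases).most_common(1)[0][0] for bases in zip(*seqs))

consensuses = {umi: consensus(seqs) for umi, seqs in families.items()}
print(consensuses)  # {'AAGT': 'ACGTAC', 'CCTA': 'TTGCAA'}
```

Majority voting within a UMI family is what removes the point mutations introduced during PCR and sequencing: a random error appears in a minority of a family's reads and is outvoted at its position.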

  11. Growth and Yield Data for the Bushland, Texas, Soybean Datasets

    • agdatacommons.nal.usda.gov
    • catalog.data.gov
    xlsx
    Updated Nov 21, 2025
    Cite
    Steven R. Evett; Gary W. Marek; Karen S. Copeland; Terry A. Sr. Howell; Paul D. Colaizzi; David K. Brauer; Brice B. Ruthardt (2025). Growth and Yield Data for the Bushland, Texas, Soybean Datasets [Dataset]. http://doi.org/10.15482/USDA.ADC/1528670
    Available download formats: xlsx
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Steven R. Evett; Gary W. Marek; Karen S. Copeland; Terry A. Sr. Howell; Paul D. Colaizzi; David K. Brauer; Brice B. Ruthardt
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Bushland, Texas
    Description

    This dataset consists of growth and yield data for each season when soybean [Glycine max (L.) Merr.] was grown for seed at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU) research weather station, Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). In the 1994, 2003, 2004, and 2010 seasons, soybean was grown on two large, precision weighing lysimeters, each in the center of a 4.44 ha square field. In 2019, soybean was grown on four large, precision weighing lysimeters and their surrounding 4.4 ha fields. The square fields are themselves arranged in a larger square with four fields in four adjacent quadrants of the larger square. Fields and lysimeters within each field are thus designated northeast (NE), southeast (SE), northwest (NW), and southwest (SW). Soybean was grown on different combinations of fields in different years. Irrigation was by linear move sprinkler system in 1995, 2003, 2004, and 2010 although in 2010 only one irrigation was applied to establish the crop after which it was grown as a dryland crop. Irrigation protocols described as full were managed to replenish soil water used by the crop on a weekly or more frequent basis as determined by soil profile water content readings made with a neutron probe to 2.4-m depth in the field. Irrigation protocols described as deficit typically involved irrigations to establish the crop early in the season, followed by reduced or absent irrigations later in the season (typically in the later winter and spring). The growth and yield data include plant population density, height, plant row width, leaf area index, growth stage, total above-ground biomass, leaf and stem biomass, head mass (when present), kernel or seed number, and final yield. Data are from replicate samples in the field and non-destructive (except for final harvest) measurements on the weighing lysimeters. 
    In most cases yield data are available from both manual sampling on replicate plots in each field and from machine harvest. Machine harvest yields are commonly smaller than hand harvest yields due to combine losses. These datasets originate from research aimed at determining crop water use (ET), crop coefficients for use in ET-based irrigation scheduling based on a reference ET, crop growth, yield, harvest index, and crop water productivity as affected by irrigation method, timing, amount (full or some degree of deficit), agronomic practices, cultivar, and weather. Prior publications have focused on soybean ET, crop coefficients, and crop water productivity. Crop coefficients have been used by ET networks. The data have utility for testing simulation models of crop ET, growth, and yield and have been used for testing and calibrating models of ET that use satellite and/or weather data. See the README for descriptions of each data file.

    Resources in this dataset:
    • Resource Title: 1995 Bushland, TX, west soybean growth and yield data. File Name: 1995 West Soybean_Growth_and_Yield-V2.xlsx
    • Resource Title: 2003 Bushland, TX, east soybean growth and yield data. File Name: 2003 East Soybean_Growth_and_Yield-V2.xlsx
    • Resource Title: 2004 Bushland, TX, east soybean growth and yield data. File Name: 2004 East Soybean_Growth-and_Yield-V2.xlsx
    • Resource Title: 2019 Bushland, TX, east soybean growth and yield data. File Name: 2019 East Soybean_Growth_and_Yield-V2.xlsx
    • Resource Title: 2019 Bushland, TX, west soybean growth and yield data. File Name: 2019 West Soybean_Growth_and_Yield-V2.xlsx
    • Resource Title: 2010 Bushland, TX, west soybean growth and yield data. File Name: 2010 West_Soybean_Growth_and_Yield-V2.xlsx
    • Resource Title: README. File Name: README_Soybean_Growth_and_Yield.txt

  12. R-PRM

    • huggingface.co
    Updated Mar 31, 2025
    Cite
    Shuaijie She (2025). R-PRM [Dataset]. https://huggingface.co/datasets/kevinpro/R-PRM
    Explore at:
    Dataset updated
    Mar 31, 2025
    Authors
    Shuaijie She
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    📘 R-PRM Dataset (SFT + DPO)

    This dataset is developed for training Reasoning-Driven Process Reward Models (R-PRM), proposed in our ACL 2025 paper. It consists of two stages:

    SFT (Supervised Fine-Tuning): collected from strong LLMs prompted with a limited number of annotated examples, enabling reasoning-style evaluation.
    DPO (Direct Preference Optimization): constructed by sampling multiple reasoning trajectories and forming preference pairs without additional labels.

    These datasets are used… See the full description on the dataset page: https://huggingface.co/datasets/kevinpro/R-PRM.
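The DPO-stage construction above (preference pairs formed from sampled reasoning trajectories, with no extra labels) can be sketched roughly as follows. The pairing-by-majority-verdict rule and all names here are illustrative assumptions, not the paper's exact recipe:

```python
from itertools import product

def build_preference_pairs(trajectories):
    """Form (chosen, rejected) pairs from sampled reasoning trajectories.

    `trajectories` is a list of (text, judgment) tuples, where `judgment`
    is the trajectory's own verdict on the candidate reasoning step.
    Trajectories agreeing with the majority verdict are treated as
    preferred; the rest as rejected. No external labels are needed.
    """
    votes = [j for _, j in trajectories]
    majority = votes.count(True) >= votes.count(False)
    chosen = [t for t, j in trajectories if j == majority]
    rejected = [t for t, j in trajectories if j != majority]
    return list(product(chosen, rejected))

pairs = build_preference_pairs([
    ("critique A: step is valid", True),
    ("critique B: step is valid", True),
    ("critique C: step is flawed", False),
])
# two (chosen, rejected) pairs: (A, C) and (B, C)
```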

  13. Growth and Yield Data for the Bushland, Texas, Sorghum Datasets

    • agdatacommons.nal.usda.gov
    • catalog.data.gov
    xlsx
    Updated Nov 21, 2025
    + more versions
    Cite
    Steven R. Evett; Gary W. Marek; Karen S. Copeland; Terry A. Howell Sr.; Paul D. Colaizzi; David K. Brauer; Brice B. Ruthardt (2025). Growth and Yield Data for the Bushland, Texas, Sorghum Datasets [Dataset]. http://doi.org/10.15482/USDA.ADC/1529411
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Steven R. Evett; Gary W. Marek; Karen S. Copeland; Terry A. Howell Sr.; Paul D. Colaizzi; David K. Brauer; Brice B. Ruthardt
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Bushland, Texas
    Description

    This dataset consists of growth and yield data for each season when sorghum [Sorghum bicolor (L.)] was grown at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU) research weather station, Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). In the 1988, 1991, 1993, 1997, 1998, 1999, 2003 through 2007, 2014, and 2015 seasons (13 years), sorghum was grown on one to four large, precision weighing lysimeters, each in the center of a 4.44 ha square field also planted to sorghum. The square fields were themselves arranged in a larger square, with four fields in the four adjacent quadrants of the larger square. Fields, and the lysimeters within them, were thus designated northeast (NE), southeast (SE), northwest (NW), and southwest (SW). Sorghum was grown on different combinations of fields in different years. When irrigated, irrigation was by linear move sprinkler system in years before 2014, and by both sprinkler and subsurface drip irrigation in 2014 and 2015. Irrigation protocols described as full were managed to replenish soil water used by the crop on a weekly or more frequent basis, as determined by soil profile water content readings made with a neutron probe to 2.4-m depth in the field. Irrigation protocols described as deficit typically involved irrigation at rates established as percentages of full irrigation, ranging from 33% to 75% depending on the year.

    The growth and yield data include plant population density, height, plant row width, leaf area index, growth stage, total above-ground biomass, leaf and stem biomass, head mass (when present), seed mass, and final yield. Data are from replicate samples in the field and from non-destructive (except at final harvest) measurements on the weighing lysimeters. In most cases yield data are available both from manual sampling on replicate plots in each field and from machine harvest. Machine harvest yields are commonly smaller than hand harvest yields due to combine losses. These datasets originate from research aimed at determining crop water use (ET), crop coefficients for ET-based irrigation scheduling relative to a reference ET, crop growth, yield, harvest index, and crop water productivity as affected by irrigation method, timing, amount (full or some degree of deficit), agronomic practices, cultivar, and weather. Prior publications have focused on sorghum ET, crop coefficients, crop water productivity, and simulation modeling of crop water use, growth, and yield. Crop coefficients have been used by ET networks. The data have utility for testing simulation models of crop ET, growth, and yield, and have been used for testing and calibrating models of ET that use satellite and/or weather data. See the README for descriptions of each data file.

  14. phenotools: an R package for visualizing and analyzing phenomic datasets

    • search.dataone.org
    • datadryad.org
    Updated Jun 17, 2025
    Cite
    Chad M. Eliason; Scott V. Edwards; Julia A. Clarke (2025). phenotools: an R package for visualizing and analyzing phenomic datasets [Dataset]. http://doi.org/10.5061/dryad.05qm36k
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Chad M. Eliason; Scott V. Edwards; Julia A. Clarke
    Time period covered
    Jan 1, 2019
    Description

    1. Phenotypic data is crucial for understanding genotype–phenotype relationships, assessing the tree of life, and revealing trends in trait diversity over time. Large-scale description of whole organisms for quantitative analyses (phenomics) presents several challenges, and technological advances in the collection of genomic data outpace those for phenomic data. Reasons for this disparity include the time-consuming and expensive nature of collecting discrete phenotypic data and mining previously-published data on a given species (both often requiring anatomical expertise across taxa), and computational challenges involved with analyzing high-dimensional datasets.

    2. One approach to building approximations of organismal phenomes is to combine published datasets of discrete characters assembled for phylogenetic analyses into a phenomic dataset. Despite a wealth of legacy datasets in the literature for many groups, relatively few methods exist for automating the assembly, analysis, and vi...

  15. R codes and dataset for Visualisation of Diachronic Constructional Change...

    • bridges.monash.edu
    • researchdata.edu.au
    zip
    Updated May 30, 2023
    Cite
    Gede Primahadi Wijaya Rajeg (2023). R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart [Dataset]. http://doi.org/10.26180/5c844c7a81768
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    Publication: Primahadi Wijaya R., Gede. 2014. Visualisation of diachronic constructional change using Motion Chart. In Zane Goebel, J. Herudjati Purwoko, Suharno, M. Suryadi & Yusuf Al Aried (eds.), Proceedings: International Seminar on Language Maintenance and Shift IV (LAMAS IV), 267-270. Semarang: Universitas Diponegoro. doi: https://doi.org/10.4225/03/58f5c23dd8387

    Description of R codes and data files in the repository: This repository is imported from its GitHub repo. Versioning of this figshare repository is associated with the GitHub repo's Releases, so check the Releases page for updates (the next version is to include the unified version of the codes in the first release with the tidyverse).

    The raw input data consist of two files (will_INF.txt and go_INF.txt). They represent the co-occurrence frequency of the top-200 infinitival collocates for will and be going to, respectively, across the twenty decades of the Corpus of Historical American English (from the 1810s to the 2000s). These two input files are used in the R code file 1-script-create-input-data-raw.r. The code preprocesses and combines the two files into a long-format data frame with the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (frequency of the collocates with be going to), and (iv) will (frequency of the collocates with will); it is available in input_data_raw.txt. The script 2-script-create-motion-chart-input-data.R then processes input_data_raw.txt to normalise the co-occurrence frequency of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output from the second script is input_data_futurate.txt.

    Next, input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R). The repository adopts the project-oriented workflow in RStudio; double-click the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.
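The per-million-words normalisation performed by the second script can be sketched as below; the corpus sizes shown are hypothetical stand-ins for the actual values stored in coha_size.txt:

```python
def per_million(freq, corpus_size):
    """Normalise a raw co-occurrence frequency to a rate per million words."""
    return freq * 1_000_000 / corpus_size

# Hypothetical decade sub-corpus sizes, standing in for coha_size.txt.
coha_size = {"1810s": 1_181_022, "2000s": 29_479_044}

# 59 raw hits of a collocate with `will` in the 1810s sub-corpus:
rate = per_million(59, coha_size["1810s"])
```

Normalising this way makes collocate frequencies comparable across decades of very different sizes, which is what the motion chart plots.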

  16. SYD ALL climate data statistics summary

    • researchdata.edu.au
    Updated Mar 13, 2019
    Cite
    Bioregional Assessment Program (2019). SYD ALL climate data statistics summary [Dataset]. https://researchdata.edu.au/syd-all-climate-statistics-summary/2989432
    Explore at:
    Dataset updated
    Mar 13, 2019
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Bioregional Assessment Program
    License

    Attribution 2.5 (CC BY 2.5) (https://creativecommons.org/licenses/by/2.5/)
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    There are 4 csv files here:

    BAWAP_P_annual_BA_SYB_GLO.csv
    Desc: Time series of mean annual BAWAP rainfall from 1900 to 2012.
    Source data: annual BILO rainfall on \\wron\Project\BA\BA_N_Sydney\Working\li036_Lingtao_LI\Grids\BILO_Rain_Ann\

    P_PET_monthly_BA_SYB_GLO.csv
    Desc: Long-term average BAWAP rainfall and Penman PET for each month from 198101 to 201212.

    Climatology_Trend_BA_SYB_GLO.csv
    Desc: Values calculated over the years 1981-2012 (inclusive) for 17 time periods (i.e., annual, the 4 seasons, and the 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.

    Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv
    Desc: Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers for the four seasons over 1957-2006. The data and methodology are described in Risbey et al. (2009). All data used in this analysis came directly from James Risbey, CMAR, Hobart. As described in Risbey et al. (2009), the rainfall was from the 0.05 degree gridded data described in Jeffrey et al. (2001), known as the SILO datasets; sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK = Blocking; DMI = Dipole Mode Index; SAM = Southern Annular Mode; SOI = Southern Oscillation Index; DJF = December, January, February; MAM = March, April, May; JJA = June, July, August; SON = September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
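The seven statistics listed for Climatology_Trend_BA_SYB_GLO.csv can be reproduced for one variable and one time period with a short sketch. The sample rainfall values are hypothetical, and treating the trend as an ordinary least-squares slope per time step is an assumption about how the trend column was computed:

```python
import statistics

def climatology_summary(series):
    """Compute the seven per-period statistics described above:
    average, maximum, minimum, average + stddev, average - stddev,
    stddev, and a linear trend (least-squares slope per time step)."""
    avg = statistics.fmean(series)
    sd = statistics.stdev(series)
    n = len(series)
    xbar = (n - 1) / 2  # mean of the time index 0..n-1
    slope = (sum((i - xbar) * (y - avg) for i, y in enumerate(series))
             / sum((i - xbar) ** 2 for i in range(n)))
    return {"average": avg, "maximum": max(series), "minimum": min(series),
            "avg_plus_sd": avg + sd, "avg_minus_sd": avg - sd,
            "stddev": sd, "trend": slope}

# Hypothetical annual rainfall totals (mm) for five consecutive years.
summary = climatology_summary([612.0, 498.5, 701.2, 655.0, 580.3])
```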

    Dataset History

    Dataset was created from various BILO source data, including monthly BILO rainfall, Tmax, Tmin, and VPD, and other source data including monthly Penman PET (calculated by Randall Donohue) and correlation coefficient data from James Risbey.

    Dataset Citation

    Bioregional Assessment Programme (XXXX) SYD ALL climate data statistics summary. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/b0a6ccf1-395d-430e-adf1-5068f8371dea.

    Dataset Ancestors

    * Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012

  17. Harmonized global datasets of soil carbon and heterotrophic respiration from...

    • zenodo.org
    bin, nc, txt
    Updated Oct 7, 2025
    Cite
    Shoji Hashimoto; Shoji Hashimoto; Akihiko Ito; Akihiko Ito; Kazuya Nishina; Kazuya Nishina (2025). Harmonized global datasets of soil carbon and heterotrophic respiration from data-driven estimates, with derived turnover time and Q10 [Dataset]. http://doi.org/10.5281/zenodo.17282577
    Explore at:
    nc, txt, binAvailable download formats
    Dataset updated
    Oct 7, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shoji Hashimoto; Shoji Hashimoto; Akihiko Ito; Akihiko Ito; Kazuya Nishina; Kazuya Nishina
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We collected all available global soil carbon (C) and heterotrophic respiration (RH) maps derived from data-driven estimates, sourcing them from public repositories and supplementary materials of previous studies (Table 1). All spatial datasets were converted to NetCDF format for consistency and ease of use.

    Because the maps had varying spatial resolutions (ranging from 0.0083° to 0.5°), we harmonized all datasets to a common resolution of 0.5° (approximately 50 km at the equator). We then merged the processed maps by computing the mean, maximum, and minimum values at each grid cell, resulting in harmonized global maps of soil C (for the top 0–30 cm and 0–100 cm depths) and RH at 0.5° resolution.

    Grid cells with fewer than three soil C estimates or fewer than four RH estimates were assigned NA values. Land and water grid cells were automatically distinguished by combining multiple datasets containing soil C and RH information over land.
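The merge rule just described (per-cell mean, maximum, and minimum across datasets, with NA below a minimum number of estimates) can be illustrated for a single grid cell. This plain-Python sketch is illustrative, not the authors' processing code:

```python
import math

def merge_cell(estimates, min_count=3):
    """Merge per-dataset estimates for one 0.5-degree grid cell.

    Cells with fewer than `min_count` valid estimates get NaN, matching
    the rule above (3 for soil C, 4 for RH).
    """
    valid = [v for v in estimates if v is not None and not math.isnan(v)]
    if len(valid) < min_count:
        nan = float("nan")
        return {"mean": nan, "max": nan, "min": nan}
    return {"mean": sum(valid) / len(valid),
            "max": max(valid),
            "min": min(valid)}

# Soil C estimates (kg C m^-2) for one cell from four source maps,
# one of which has no data there:
cell = merge_cell([11.2, float("nan"), 9.8, 14.1])
```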

    Soil carbon turnover time (years), denoted as τ, was calculated under the assumption of a quasi-equilibrium state using the formula:

    τ = CS / RH

    where CS is soil carbon stock and RH is the heterotrophic respiration rate. The uncertainty range of τ was estimated for each grid cell using:

    τmax = CS⁺ / RH⁻,  τmin = CS⁻ / RH⁺

    where CS⁺ and CS⁻ are the maximum and minimum soil C values, and RH⁺ and RH⁻ are the maximum and minimum RH values, respectively.
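Numerically, the turnover-time formulas work out as in the sketch below. The sample values are hypothetical, and note that soil C in kg C m⁻² must be converted to g C m⁻² to match RH in g C m⁻² yr⁻¹:

```python
def turnover_time(cs, rh):
    """tau = CS / RH, with CS and RH in consistent units
    (here g C m^-2 and g C m^-2 yr^-1), giving tau in years."""
    return cs / rh

# Hypothetical cell: CS = 10 kg C m^-2 = 10,000 g C m^-2; RH = 400 g C m^-2 yr^-1.
cs_mean, cs_max, cs_min = 10_000, 12_000, 8_000   # g C m^-2
rh_mean, rh_max, rh_min = 400, 500, 320           # g C m^-2 yr^-1

tau = turnover_time(cs_mean, rh_mean)     # 25.0 years
tau_max = turnover_time(cs_max, rh_min)   # max CS over min RH: 37.5 years
tau_min = turnover_time(cs_min, rh_max)   # min CS over max RH: 16.0 years
```

The bounds pair the extremes in opposite directions, so [tau_min, tau_max] brackets the central estimate.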

    To calculate the temperature sensitivity of decomposition (Q10)—the factor by which decomposition rates increase with a 10 °C rise in temperature—we followed the method described in Koven et al. (2017). The uncertainty of Q10 (maximum and minimum values) was derived using τmax and τmin, respectively.

    All files are provided in NetCDF format. The SOC file includes the following variables:
    · longitude, latitude
    · soc: mean soil C stock (kg C m⁻²)
    · soc_median: median soil C (kg C m⁻²)
    · soc_n: number of estimates per grid cell
    · soc_max, soc_min: maximum and minimum soil C (kg C m⁻²)
    · soc_max_id, soc_min_id: study IDs corresponding to the maximum and minimum values
    · soc_range: range of soil C values
    · soc_sd: standard deviation of soil C (kg C m⁻²)
    · soc_cv: coefficient of variation (%)
    The RH file includes:
    · longitude, latitude
    · rh: mean RH (g C m⁻² yr⁻¹)
    · rh_median, rh_n, rh_max, rh_min: as above
    · rh_max_id, rh_min_id: study IDs for max/min
    · rh_range, rh_sd, rh_cv: analogous variables for RH
    The mean, maximum, and minimum values of soil C turnover time are provided as separate files. The Q10 files contain estimates derived from the mean values of soil C and RH, along with associated uncertainty values.

    The harmonized dataset files available in the repository are as follows:

    · harmonized-RH-hdg.nc: global soil heterotrophic respiration map

    · harmonized-SOC100-hdg.nc: global soil C map for 0–100 cm

    · harmonized-SOC30-hdg.nc: global soil C map for 0–30 cm

    · Q10.nc: global Q10 map

    · Turnover-time_max.nc: global soil C turnover time estimated using maximum soil C and minimum RH

    · Turnover-time_min.nc: global soil C turnover time estimated using minimum soil C and maximum RH

    · Turnover-time_mean.nc: global soil C turnover time estimated using mean soil C and RH

    · Turnover-time30_mean.nc: global soil C turnover time estimated using the soil C map for 0-30 cm

    Version history
    Version 1.1: Median values were added. Bug fix for SOC30 (n>2 was inactive in the former version)


    More details are provided in: Hashimoto, S., Ito, A. & Nishina, K. (in revision) Harmonized global soil carbon and respiration datasets with derived turnover time and temperature sensitivity. Scientific Data.

    Reference

    Koven, C. D., Hugelius, G., Lawrence, D. M. & Wieder, W. R. Higher climatological temperature sensitivity of soil carbon in cold than warm climates. Nat. Clim. Change 7, 817–822 (2017).

    Table 1: List of soil carbon and heterotrophic respiration datasets used in this study.

    Dataset             Repository/References (Dataset name)                       Depth (cm)     ID in NetCDF file***
    Global soil C       Global soil data task 2000 (IGBP-DIS) [1]                  0–100          3, -
                        Shangguan et al. 2014 (GSDE) [2,3]                         0–100, 0–30*   1, 1
                        Batjes 2016 (WISE30sec) [4,5]                              0–100, 0–30    6, 7
                        Sanderman et al. 2017 (Soil-Carbon-Debt) [6,7]             0–100, 0–30    5, 5
                        Soilgrids team and Hengl et al. 2017 (SoilGrids) [8,9]     0–30**         -, 6
                        Hengl and Wheeler 2018 (LandGIS) [10]                      0–100, 0–30    4, 4
                        FAO 2022 (GSOC) [11]                                       0–30           -, 2
                        FAO 2023 (HWSD2) [12]                                      0–100, 0–30    2, 3
    Circumpolar soil C  Hugelius et al. 2013 (NCSCD) [13–15]                       0–100, 0–30    7, 8
    Global RH           Hashimoto et al. 2015 [16,17]                              -              1
                        Warner et al. 2019 (Bond-Lamberty equation based) [18,19]  -              2
                        Warner et al. 2019 (Subke equation based) [18,19]          -              3
                        Tang et al. 2020 [20,21]                                   -              4
                        Lu et al. 2021 [22,23]                                     -              5
                        Stell et al. 2021 [24,25]                                  -

  18. Data from: A Machine Learning Model to Estimate Toxicokinetic Half-Lives of...

    • catalog.data.gov
    • datasets.ai
    Updated Apr 30, 2023
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2023). A Machine Learning Model to Estimate Toxicokinetic Half-Lives of Per- and Polyfluoro-Alkyl Substances (PFAS) in Multiple Species [Dataset]. https://catalog.data.gov/dataset/a-machine-learning-model-to-estimate-toxicokinetic-half-lives-of-per-and-polyfluoro-alkyl-
    Explore at:
    Dataset updated
    Apr 30, 2023
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Data and code for "Dawson, D.E.; Lau, C.; Pradeep, P.; Sayre, R.R.; Judson, R.S.; Tornero-Velez, R.; Wambaugh, J.F. A Machine Learning Model to Estimate Toxicokinetic Half-Lives of Per- and Polyfluoro-Alkyl Substances (PFAS) in Multiple Species. Toxics 2023, 11, 98. https://doi.org/10.3390/toxics11020098" Includes a link to R-markdown file allowing the application of the model to novel chemicals. This dataset is associated with the following publication: Dawson, D., C. Lau, P. Pradeep, R. Sayre, R. Judson, R. Tornero-Velez, and J. Wambaugh. A Machine Learning Model to Estimate Toxicokinetic Half-Lives of Per- and Polyfluoro-Alkyl Substances (PFAS) in Multiple Species. Toxics. MDPI, Basel, SWITZERLAND, 11(2): 98, (2023).

  19. NASA DC-8 1 Minute Data Merge

    • data.ucar.edu
    ascii
    Updated Oct 7, 2025
    + more versions
    Cite
    Gao Chen; Jennifer R. Olson; Michael Shook (2025). NASA DC-8 1 Minute Data Merge [Dataset]. http://doi.org/10.26023/VM9C-1C16-H003
    Explore at:
    asciiAvailable download formats
    Dataset updated
    Oct 7, 2025
    Authors
    Gao Chen; Jennifer R. Olson; Michael Shook
    Time period covered
    May 1, 2012 - Jun 30, 2012
    Area covered
    Description

    This dataset contains NASA DC-8 1 Minute Data Merge data collected during the Deep Convective Clouds and Chemistry Experiment (DC3) from 18 May 2012 through 22 June 2012. This dataset contains updated data provided by NASA. In most cases, variable names have been kept identical to those submitted in the raw data files. However, in some cases, names have been changed (e.g., to eliminate duplication). Units have been standardized throughout the merge. In addition, a "grand merge" has been provided. This includes data from all the individual merged flights throughout the mission. This grand merge will follow the following naming convention: "dc3-mrg60-dc8_merge_YYYYMMdd_R5_thruYYYYMMdd.ict" (with the comment "_thruYYYYMMdd" indicating the last flight date included). This dataset is in ICARTT format. Please see the header portion of the data files for details on instruments, parameters, quality assurance, quality control, contact information, and dataset comments. For more information on updates to this dataset, please see the readme file.
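The grand-merge naming convention can be generated programmatically. The helper below is a hypothetical illustration of the stated pattern, assuming the first date stamp is the first flight date and with the R5 revision tag as a parameter:

```python
from datetime import date

def grand_merge_name(first, last, revision="R5"):
    """Build a DC3 grand-merge file name following the convention
    dc3-mrg60-dc8_merge_YYYYMMdd_R5_thruYYYYMMdd.ict, where the
    _thruYYYYMMdd suffix gives the last flight date included."""
    return (f"dc3-mrg60-dc8_merge_{first:%Y%m%d}_{revision}"
            f"_thru{last:%Y%m%d}.ict")

# Mission span given above: 18 May 2012 through 22 June 2012.
name = grand_merge_name(date(2012, 5, 18), date(2012, 6, 22))
# → "dc3-mrg60-dc8_merge_20120518_R5_thru20120622.ict"
```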

  20. Datasets and R Markdown files for the manuscript: "A rapid survey method to...

    • data-staging.niaid.nih.gov
    Updated Feb 12, 2025
    Cite
    Serrano, Xaymara; Chacin, Dinorah; Mack, Kevin; Gregg, Kurtis; Parsons, Mel; Ladd, Mark; Lehmann, Wade; Wolfe, Jordan; Karazsia, Jocelyn (2025). Datasets and R Markdown files for the manuscript: "A rapid survey method to assess reef habitat sediment deposition and prevalence of sediment stress on multiple benthic taxa that may be impacted by dredging" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14827708
    Explore at:
    Dataset updated
    Feb 12, 2025
    Dataset provided by
    NOAA National Marine Fisheries Service
    United States Environmental Protection Agency
    NOAA National Marine Fisheries Service Southeast Regional Office
    Authors
    Serrano, Xaymara; Chacin, Dinorah; Mack, Kevin; Gregg, Kurtis; Parsons, Mel; Ladd, Mark; Lehmann, Wade; Wolfe, Jordan; Karazsia, Jocelyn
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Datasets and R Markdown files for the manuscript "A rapid survey method to assess reef habitat sediment deposition and prevalence of sediment stress on multiple benthic taxa that may be impacted by dredging". Figures in the paper were generated using the code in MPB_Publication Figures.Rmd. Analyses were run and interpreted using the code in MPB_Publication Analysis.Rmd. Data files are contained in Data.zip.
