This data release supports an analysis of changes in dissolved organic carbon (DOC) and nitrate concentrations in the Buck Creek watershed near Inlet, New York, from 2001 to 2021. The Buck Creek watershed is a 310-hectare forested watershed that is recovering from acidic deposition within the Adirondack region. The data release includes pre-processed model inputs and model outputs for the Weighted Regressions on Time, Discharge, and Season (WRTDS) model (Hirsch and others, 2010) to estimate daily flow-normalized concentrations of DOC and nitrate during a 20-year period of analysis. WRTDS uses daily discharge and concentration observations, implemented through the Exploration and Graphics for River Trends (EGRET) R package, to predict solute concentration using decimal time and discharge as explanatory variables (Hirsch and De Cicco, 2015; Hirsch and others, 2010). Discharge and concentration data are available from the U.S. Geological Survey National Water Information System (NWIS) database (U.S. Geological Survey, 2016). The time series data were analyzed for the entire period, water years 2001 (WY2001) to WY2021, where WY2001 is the period from October 1, 2000 to September 30, 2001. This data release contains five comma-separated values (CSV) files, one R script, and one XML metadata file. There are four input files (“Daily.csv”, “INFO.csv”, “Sample_doc.csv”, and “Sample_nitrate.csv”) that contain site information, daily mean discharge, and mean daily DOC or nitrate concentrations. The R script (“Buck Creek WRTDS R script.R”) uses the four input datasets and functions from the EGRET R package to generate estimates of flow-normalized concentrations. The output file (“WRTDS_results.csv”) contains model output at daily time steps for each sub-watershed and for each solute. Files are automatically associated with the R script when opened in RStudio using the provided R project file ("Files.Rproj"). All input, output, and R files are in the "Files.zip" folder.
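For orientation, a minimal sketch of the WRTDS/EGRET workflow that the provided script implements is shown below. The provided "Buck Creek WRTDS R script.R" is the authoritative version; the sketch assumes the unzipped "Files.zip" contents sit in the working directory.

```r
# Minimal sketch of the EGRET workflow (the released script is authoritative).
library(EGRET)

INFO   <- readUserInfo(filePath = ".", fileName = "INFO.csv")           # site information
Daily  <- readUserDaily(filePath = ".", fileName = "Daily.csv")         # daily mean discharge
Sample <- readUserSample(filePath = ".", fileName = "Sample_doc.csv")   # or "Sample_nitrate.csv"

eList <- mergeReport(INFO, Daily, Sample)  # bundle inputs into an EGRET eList object
eList <- modelEstimation(eList)            # fit WRTDS; adds flow-normalized estimates

head(eList$Daily$FNConc)                   # daily flow-normalized concentrations
```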
This archive contains code and data for reproducing the analysis for “Replication Data for Revisiting ‘The Rise and Decline’ in a Population of Peer Production Projects”. Depending on what you hope to do with the data, you probably do not want to download all of the files, and depending on your computational resources you may not be able to run all stages of the analysis. The code for all stages of the analysis, including typesetting the manuscript and running the analysis, is in code.tar. If you only want to run the final analysis or to play with the datasets used in the analysis of the paper, you want intermediate_data.7z or the uncompressed tab and csv files.

The data files are created in a four-stage process. The first stage uses the program “wikiq” to parse MediaWiki XML dumps and create tsv files that have edit data for each wiki. The second stage generates the all.edits.RDS file, which combines these tsvs into a dataset of edits from all the wikis. This file is expensive to generate and, at 1.5GB, is pretty big. The third stage builds smaller intermediate files that contain the analytical variables from these tsv files. The fourth stage uses the intermediate files to generate smaller RDS files that contain the results. Finally, knitr and LaTeX typeset the manuscript. A stage will only run if the outputs from the previous stages do not exist, so if the intermediate files exist they will not be regenerated and only the final analysis will run. The exception is that stage 4, fitting models and generating plots, always runs. If you only want to replicate from the second stage onward, you want wikiq_tsvs.7z. If you want to replicate everything, you want wikia_mediawiki_xml_dumps.7z.001, wikia_mediawiki_xml_dumps.7z.002, and wikia_mediawiki_xml_dumps.7z.003. These instructions work backwards: from building the manuscript using knitr, to loading the datasets and running the analysis, to building the intermediate datasets.

Building the manuscript using knitr
This requires working LaTeX, latexmk, and knitr installations. Depending on your operating system you might install these packages in different ways. On Debian Linux you can run apt install r-cran-knitr latexmk texlive-latex-extra. Alternatively, you can upload the necessary files to a project on Overleaf.com. Download code.tar; this has everything you need to typeset the manuscript. Unpack the tar archive (on a unix system, tar xf code.tar) and navigate to code/paper_source. Install the R dependencies: in R, run install.packages(c("data.table","scales","ggplot2","lubridate","texreg")). On a unix system you should be able to run make to build the manuscript generalizable_wiki.pdf. Otherwise you should try uploading all of the files (including the tables, figure, and knitr folders) to a new project on Overleaf.com.

Loading intermediate datasets
The intermediate datasets are found in the intermediate_data.7z archive. They can be extracted on a unix system using the command 7z x intermediate_data.7z. The files are 95MB uncompressed. These are RDS (R data set) files and can be loaded in R using readRDS, for example newcomer.ds <- readRDS("newcomers.RDS"). If you wish to work with these datasets using a tool other than R, you might prefer to work with the .tab files.

Running the analysis
Fitting the models may not work on machines with less than 32GB of RAM. If you have trouble, you may find the functions in lib-01-sample-datasets.R useful to create stratified samples of data for fitting models. See line 89 of 02_model_newcomer_survival.R for an example. Download code.tar and intermediate_data.7z to your working folder and extract both archives (on a unix system, tar xf code.tar && 7z x intermediate_data.7z). Install the R dependencies: install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). On a unix system you can simply run regen.all.sh to fit the models, build the plots, and create the RDS files.

Generating datasets

Building the intermediate files
The intermediate files are generated from all.edits.RDS. This process requires about 20GB of memory. Download all.edits.RDS, userroles_data.7z, selected.wikis.csv, and code.tar. Unpack code.tar and userroles_data.7z (on a unix system, tar xf code.tar && 7z x userroles_data.7z). Install the R dependencies: in R, run install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). Run 01_build_datasets.R.

Building all.edits.RDS
The intermediate RDS files used in the analysis are created from all.edits.RDS. To replicate building all.edits.RDS, you only need to run 01_build_datasets.R when the int... Visit https://dataone.org/datasets/sha256%3Acfa4980c107154267d8eb6dc0753ed0fde655a73a062c0c2f5af33f237da3437 for complete metadata about this dataset.
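If you just want to explore the intermediate datasets, a minimal hedged sketch follows. It is not the project's lib-01-sample-datasets.R code, and the wiki.name column used for stratification is an assumption about the dataset's structure.

```r
# Hedged sketch: load one intermediate dataset and draw a simple stratified sample
# with data.table to reduce memory pressure before model fitting. The column name
# 'wiki.name' is an assumption, not taken from the replication code.
library(data.table)

newcomers <- as.data.table(readRDS("newcomers.RDS"))
sampled   <- newcomers[, .SD[sample(.N, min(.N, 10000))], by = wiki.name]
```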
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre (1), Aurore Coince (2), Sophien Kamoun (1)
(1) The Sainsbury Laboratory, Norwich, UK; (2) Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates; in the example, the month and year during which the replicate was performed is indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains the continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.
• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console, then execute it (a minimal sketch of such a script is shown after the protocol). In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
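For readers without access to the PowerPoint slide, a minimal sketch of such a script is given below. This is an assumed reconstruction, not the original: the original script is provided on the slide, and the line numbering referenced in Note 2 refers to that original.

```r
# Minimal sketch (assumed structure; the protocol's PowerPoint slide holds the original script).
library(ggplot2)                                   # see Note 1

data <- read.csv(file.choose())                    # dialog box to pick the .csv from Step 1
data$Replicate <- as.factor(data$Replicate)        # treat replicates as categories
data$Condition <- as.factor(data$Condition)        # treat conditions as categories
graph <- ggplot(data, aes(x = Condition, y = Value))

# plotting command (the line that Note 2's log-scale variant replaces):
graph + geom_boxplot(outlier.colour = 'black', colour = 'black') +
  geom_jitter(aes(col = Replicate)) + theme_bw()
```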
Notes
• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.
The U.S. Geological Survey, in cooperation with the U.S. Environmental Protection Agency's Long Island Sound Study (https://longislandsoundstudy.net), characterized nitrogen export from forested watersheds and assessed whether nitrogen loading has been increasing or decreasing, to help inform Long Island Sound management strategies. The Weighted Regressions on Time, Discharge, and Season (WRTDS; Hirsch and others, 2010) method was used to estimate annual concentrations and fluxes of nitrogen species using long-term records (14 to 37 years in length) of stream total nitrogen, dissolved organic nitrogen, nitrate, and ammonium concentrations and daily discharge data from 17 watersheds located in the Long Island Sound basin or in nearby areas of Massachusetts, New Hampshire, or New York. This data release contains the input water-quality and discharge data, annual outputs (including concentrations, fluxes, yields, and confidence intervals about these estimates), statistical tests for trends between the periods of water years 1999-2000 and 2016-2018, and model diagnostic statistics. These datasets are organized into one zip file (WRTDSeLists.zip) and six comma-separated values (csv) data files (StationInformation.csv, AnnualResults.csv, TrendResults.csv, ModelStatistics.csv, InputWaterQuality.csv, and InputStreamflow.csv). The StationInformation.csv file contains information about the stations and input datasets. Finally, a short R script (SampleScript.R) is included to facilitate viewing the input and output data and to re-run the model. Reference: Hirsch, R.M., Moyer, D.L., and Archfield, S.A., 2010, Weighted Regressions on Time, Discharge, and Season (WRTDS), with an application to Chesapeake Bay River inputs: Journal of the American Water Resources Association, v. 46, no. 5, p. 857–880.
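The included SampleScript.R is the authoritative example for working with the archived model objects. As a hedged sketch of the same idea (the .rds file name inside WRTDSeLists.zip below is a placeholder):

```r
# Hedged sketch: load a saved WRTDS model object (eList) and view its annual results.
# The file name inside WRTDSeLists.zip is a placeholder; see SampleScript.R.
library(EGRET)

eList <- readRDS("WRTDSeLists/example_station_TN.rds")  # placeholder file name
plotConcHist(eList)    # annual mean and flow-normalized concentration history
tableResults(eList)    # annual concentrations, fluxes, and flow-normalized estimates
```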
This reference contains the R code for the analysis and summary of detections of Bachman's sparrow, bobwhite quail, and brown-headed nuthatch through 2020. Specifically, it generates the probability of detection and occupancy of the species based on call counts and calls elicited with playback. The code loads raw point count data (CSV files) and fire history data (CSV) and cleans and transforms them into a tidy format for occupancy analysis. It then creates the necessary data structure for occupancy analysis, performs the analysis for the three focal species, and provides functionality for generating tables and figures summarizing the key findings of the occupancy analysis. The raw data, point count locations, and other spatial data (shapefiles) are contained in the dataset.
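The description above maps onto a standard single-season occupancy workflow in R. The sketch below is purely illustrative: it uses the unmarked package with simulated detection histories, and the covariate and object names are assumptions rather than the dataset's actual code.

```r
# Hedged illustration only: the dataset's own R code may structure the occupancy
# analysis differently. Simulated data stand in for the point-count and fire-history CSVs.
library(unmarked)
set.seed(42)

n_sites <- 60; n_visits <- 3
y <- matrix(rbinom(n_sites * n_visits, 1, 0.4), n_sites, n_visits)   # toy detection/non-detection matrix
site_covs <- data.frame(years_since_fire = runif(n_sites, 0, 5))     # stand-in for a fire-history covariate
obs_covs  <- list(playback = matrix(rbinom(n_sites * n_visits, 1, 0.5),
                                    n_sites, n_visits))              # whether a call was elicited with playback

umf <- unmarkedFrameOccu(y = y, siteCovs = site_covs, obsCovs = obs_covs)
fit <- occu(~ playback ~ years_since_fire, data = umf)   # detection formula ~ occupancy formula
summary(fit)                                             # detection probability and occupancy estimates
head(predict(fit, type = "state"))                       # site-level occupancy predictions
```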
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This repository contains the raw data for the study:
Characterizing Intraspecific Resource Utilization in an Aquatic Consumer Using High-Throughput Phenotyping
Data are provided separately for the first experiment (numerical response experiment with 16 rotifer clones across six food concentrations) and the second experiment (growth rate measurements with 98 rotifer clones across two food concentrations).
Contents of first_experiment.zip
input/ This folder contains raw count data (output of the Wellcounter software):
popgrowth_
output/ output files produced by the R-script 'first_experiment_analysis.Rmd'
wellcounter/ contains the Wellcounter software (programs and configuration files) that were used for running the raw analysis of this dataset on a High Performance Computing cluster
first_experiment_analysis.Rmd R-Markdown file with data processing and statistical analysis of the first experiment
numerical_response_2par.R A function required by 'first_experiment_analysis.Rmd'
Contents of second_experiment.zip
input/ This folder contains raw count and behavioral data (output of the Wellcounter software):
popgrowth_
output/ output files produced by the R-script 'second_experiment_analysis.Rmd'
wellcounter/ contains the Wellcounter software (programs and configuration files) that were used for running the raw analysis (image and motion analysis) of this dataset on a High Performance Computing cluster
second_experiment_prep_run1.Rmd R-Markdown file for preprocessing the data from run1
second_experiment_prep_run2.Rmd R-Markdown file for preprocessing the data from run2
second_experiment_analysis.Rmd R-Markdown file with data processing and statistical analysis of the second experiment
extract_fixed_effects_table.R A function required by 'second_experiment_analysis.Rmd'
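The numerical-response fitting referenced for the first experiment (numerical_response_2par.R) can be illustrated with a generic sketch. The two-parameter Monod-type form and the data below are assumptions made purely for illustration; the actual function and data processing live in the R Markdown files above.

```r
# Illustrative only: fit a two-parameter Monod-type numerical response
# r = r_max * F / (K + F) to toy data with nls(); form and data are assumptions,
# not taken from numerical_response_2par.R.
food <- c(0.05, 0.1, 0.25, 0.5, 1, 2)            # food concentrations (toy values)
rate <- c(0.10, 0.20, 0.40, 0.60, 0.75, 0.85)    # population growth rates (toy values)

fit <- nls(rate ~ r_max * food / (K + food), start = list(r_max = 1, K = 0.5))
summary(fit)                                      # estimates of r_max and K

curve(coef(fit)["r_max"] * x / (coef(fit)["K"] + x), from = 0, to = 2,
      xlab = "Food concentration", ylab = "Growth rate")
points(food, rate)
```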
This dataset contains (a) a script “R_met_integrated_for_modeling.R”, and (b) associated input CSV files: 3 CSV files per location to create a 5-variable integrated meteorological dataset file (air temperature, precipitation, wind speed, relative humidity, and solar radiation) for 19 meteorological stations and 1 location within Trail Creek from the modeling team within the East River Community Observatory as part of the Watershed Function Scientific Focus Area (SFA). As meteorological forcings varied across the watershed, a high-frequency database is needed to ensure consistency in data analysis and modeling. We evaluated several data sources, including gridded meteorological products and field data from meteorological stations, and determined that our modeling efforts required multiple data sources to meet all their needs. As output, this dataset contains (c) a single CSV data file (*_1981-2022.csv) for each location (20 CSV output files total) containing hourly time series data for 1981 to 2022 and (d) five PNG files of time series and density plots for each variable per location (100 PNG files). Detailed location metadata is contained within the Integrated_Met_Database_Locations.csv file for each point location included within this dataset, obtained from Varadharajan et al., 2023, doi:10.15485/1660962. This dataset also includes (e) a file-level metadata (flmd.csv) file that lists each file contained in the dataset with associated metadata and (f) a data dictionary (dd.csv) file that contains column/row headers used throughout the files along with a definition, units, and data type. Review the (g) ReadMe_Integrated_Met_Database.pdf file for additional details on the script, methods, and structure of the dataset. The script integrates Northwest Alliance for Computational Science and Engineering’s PRISM gridded data product, National Oceanic and Atmospheric Administration’s NCEP-NCAR Reanalysis 1 gridded data product (through the RNCEP R package; Kemp et al., doi:10.32614/CRAN.package.RNCEP), and analytical-based calculations. Further, this script downscales the input data to hourly frequency, which is necessary for the modeling efforts.
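As a hedged illustration of the reanalysis retrieval step handled by R_met_integrated_for_modeling.R (the released script is the authoritative version), the RNCEP package can pull NCEP-NCAR Reanalysis 1 fields for a bounding box. The coordinates, variable, and years below are examples only.

```r
# Hedged example: retrieve 6-hourly surface air temperature from NCEP-NCAR Reanalysis 1
# with the RNCEP package. The bounding box approximates the East River area; the
# released script integrates this with PRISM and downscales to hourly.
library(RNCEP)

air_temp <- NCEP.gather(variable = "air.sig995", level = "surface",
                        months.minmax = c(1, 12), years.minmax = c(1981, 1982),
                        lat.southnorth = c(38.8, 39.1),
                        lon.westeast = c(-107.1, -106.8),
                        reanalysis2 = FALSE, return.units = TRUE)

dim(air_temp)   # lat x lon x time array, to be interpolated/downscaled to hourly
```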
This data set contains QA/QC-ed (Quality Assurance and Quality Control) water level data for the PLM1 and PLM6 wells. PLM1 and PLM6 are location identifiers used by the Watershed Function SFA project for two groundwater monitoring wells along an elevation gradient in the lower montane life zone of a hillslope near the Pumphouse location at the East River Watershed, Colorado, USA. These wells are used to monitor subsurface water and carbon inventories and fluxes, and to determine the seasonally dependent flow of groundwater under the PLM hillslope. The downslope flow of groundwater, in combination with data on groundwater chemistry (see related references), can be used to estimate rates of solute export from the hillslope to the floodplain and river. QA/QC analysis of measured groundwater levels in monitoring wells PLM-1 and PLM-6 included identification and flagging of duplicated timestamps, gap filling of missing timestamps and water levels, and removal of abnormal or erroneous values and outliers in the measured water levels. The QA/QC analysis also tested the application of different QA/QC methods and the development of regular (5-minute, 1-hour, and 1-day) time series datasets, which can serve as a benchmark for testing other QA/QC techniques and will be applicable for ecohydrological modeling. The package includes a Readme file, one R code file used to perform QA/QC, a series of 8 data csv files (six QA/QC-ed regular time series datasets of varying intervals (5-min, 1-hr, 1-day) and two files with QA/QC flagging of original data), and three files for the reporting format adoption of this dataset (InstallationMethods, file-level metadata (flmd), and data dictionary (dd) files). QA/QC-ed data herein were derived from the original/raw data publication available at Williams et al., 2020 (DOI: 10.15485/1818367). For more information about running the R code file (10.15485_1866836_QAQC_PLM1_PLM6.R) to reproduce the QA/QC output files, see the README (QAQC_PLM_readme.docx). This dataset replaces the previously published raw data time series and is the final groundwater data product for the PLM wells in the East River. Complete metadata information on the PLM1 and PLM6 wells is available in a related dataset on ESS-DIVE: Varadharajan C, et al. (2022), https://doi.org/10.15485/1660962. These data products are part of the Watershed Function Scientific Focus Area collection effort to further scientific understanding of biogeochemical dynamics from genome to watershed scales.
2022/09/09 Update: Converted data files using ESS-DIVE’s Hydrological Monitoring Reporting Format. With the adoption of this reporting format, three new files (v1_20220909_flmd.csv, v1_20220909_dd.csv, and InstallationMethods.csv) were added. The file-level metadata file (v1_20220909_flmd.csv) contains information specific to the files contained within the dataset. The data dictionary file (v1_20220909_dd.csv) contains definitions of column headers and other terms across the dataset. The installation methods file (InstallationMethods.csv) contains a description of methods associated with installation and deployment at the PLM1 and PLM6 wells. Additionally, eight data files were re-formatted to follow the reporting format guidance (er_plm1_waterlevel_2016-2020.csv, er_plm1_waterlevel_1-hour_2016-2020.csv, er_plm1_waterlevel_daily_2016-2020.csv, QA_PLM1_Flagging.csv, er_plm6_waterlevel_2016-2020.csv, er_plm6_waterlevel_1-hour_2016-2020.csv, er_plm6_waterlevel_daily_2016-2020.csv, QA_PLM6_Flagging.csv). The major changes to the data files include the addition of header rows above the data containing metadata about the particular well, units, and sensor description.
2023/01/18 Update: Dataset updated to include additional QA/QC-ed water level data up until 2022-10-12 for ER-PLM1 and 2022-10-13 for ER-PLM6. Reporting format specific files (v2_20230118_flmd.csv, v2_20230118_dd.csv, v2_20230118_InstallationMethods.csv) were updated to reflect the additional data. R code file (QAQC_PLM1_PLM6.R) was added to replace the previously uploaded HTML files to enable execution of the associated code. R code file (QAQC_PLM1_PLM6.R) and ReadMe file (QAQC_PLM_readme.docx) were revised to clarify where the original data were retrieved from and to remove local file paths.
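A minimal sketch of the kinds of checks described above is given below. It is not the published QAQC_PLM1_PLM6.R code, and the column names (DateTime, WaterLevel) are assumptions; the released R code and readme are authoritative.

```r
# Hedged sketch: flag duplicated timestamps, fill the record onto a regular 5-minute
# grid, and flag gross outliers. Column names are assumptions.
library(dplyr)

wl <- read.csv("er_plm1_waterlevel_2016-2020.csv")   # header/metadata rows may need to be skipped
wl$DateTime <- as.POSIXct(wl$DateTime, tz = "UTC")
wl <- wl %>% mutate(flag_duplicate = duplicated(DateTime))

grid    <- data.frame(DateTime = seq(min(wl$DateTime), max(wl$DateTime), by = "5 min"))
wl_5min <- left_join(grid, distinct(wl, DateTime, .keep_all = TRUE), by = "DateTime")  # gaps become NA

wl_5min <- wl_5min %>%
  mutate(flag_outlier = abs(WaterLevel - median(WaterLevel, na.rm = TRUE)) >
           5 * mad(WaterLevel, na.rm = TRUE))
```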
A routine was developed in R ('bathy_plots.R') to plot bathymetry data over time during individual CEAMARC events, so that benthic data can be analysed in relation to habitat, i.e., did we trawl over a slope or was the sea floor relatively flat. Note that the depth range in the plots is autoscaled to the data, so a small range in depths appears as a scattering of points; as long as you check the depth scale, interpretation will be fine.
The R script needs a bathymetry data file, '200708V3_one_minute.csv', which contains a data export from the underway PostgreSQL ship database, and 'events.csv', which is a stripped-down version of the events export from the shipboard events database. If you wish to run the code again you may need to change the pathnames in the R script to relevant locations. If you have opened the csv files in Excel at any stage and the R script gets an error, you may need to format the date/time columns as yyyy-mm-dd hh:mm:ss, save and close the file as csv without opening it again, and then run the R script.
However, all output files are provided here for every CEAMARC event. Filenames contain a reference to the CEAMARC event id. Files are in eps format and can be viewed using Ghostview, which is available as a free download on the internet.
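For orientation, a hedged sketch of the routine's logic follows ('bathy_plots.R' itself is authoritative); the column names for date/time, depth, and event times are assumptions.

```r
# Hedged sketch: read the one-minute underway bathymetry and the events table, then
# plot depth against time for a single CEAMARC event. Column names are assumptions.
bathy  <- read.csv("200708V3_one_minute.csv", stringsAsFactors = FALSE)
events <- read.csv("events.csv", stringsAsFactors = FALSE)

bathy$date_time <- as.POSIXct(bathy$date_time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
events$start    <- as.POSIXct(events$start,    format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
events$end      <- as.POSIXct(events$end,      format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

ev  <- events[1, ]                                   # first event as an example
sel <- bathy[bathy$date_time >= ev$start & bathy$date_time <= ev$end, ]

plot(sel$date_time, -abs(sel$depth), type = "l",
     xlab = "Time", ylab = "Depth (m)",
     main = paste("CEAMARC event", ev$event_id))
```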
A machine learning streamflow (MLFLOW) model was developed in R (model is in the Rscripts folder) for modeling monthly streamflow from 2012 to 2017 in three watersheds on the Wyoming Range in the upper Green River basin. Geospatial information for 125 site features (vector data are in the Sites.shp file) and discrete streamflow observation data and environmental predictor data were used in fitting the MLFLOW model and predicting with the fitted model. Tabular calibration and validation data are in the Model_Fitting_Site_Data.csv file, totaling 971 discrete observations and predictions of monthly streamflow. Geospatial information for 17,518 stream grid cells (raster data are in the Streams.tif file) and environmental predictor data were used for continuous streamflow predictions with the MLFLOW model. Tabular prediction data for all the study area (17,518 stream grid cells) and study period (72 months; 2012–17) are in the Model_Prediction_Stream_Data.csv file, totaling 1,261,296 predictions of spatially and temporally continuous monthly streamflow. Additional information about the datasets is in the metadata included in the four zipped dataset files and about the MLFLOW model is in the readme included in the zipped model archive folder.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
UCS-JSpOC-soy-panel-22.csv: This dataset combines and cleans data from the Union of Concerned Scientists and Space-Track.org to create a panel of satellites, operators, and years. This dataset is used in the paper "Oligopoly competition between satellite constellations will reduce economic welfare from orbit use". The final dataset can also be downloaded from the replication files for that paper: https://doi.org/10.57968/Middlebury.23816994.v1. A "living" version of this repository can be found at: https://github.com/akhilrao/orbital-ownership-data

# Repository structure

* /UCS data contains Excel and CSV data files from the Union of Concerned Scientists, as well as output files generated from data cleaning. You can find the UCS Satellite Database here: https://www.ucsusa.org/resources/satellite-database. Historical data was obtained from Dr. Teri Grimwood.
* /Space-Track data contains JSON data from Space-Track.org, files to help identify operator names for harmonization in UCS_text_cleaner.R, and output generated from cleaning and merging data.
  * API queries to generate the JSON files can be found in json_cleaned_script.R. They are restated below for convenience. These queries were run on January 1, 2023 to produce the data used in "Oligopoly competition between satellite constellations will reduce economic welfare from orbit use".
  * 33999/OBJECT_TYPE/PAYLOAD/orderby/INTLDES asc/emptyresult/show
* /Current R scripts contains R scripts to process the data.
  * combined_scripts.R loads and cleans UCS data. It takes the raw CSV files from /UCS data as input and produces UCS_Combined_Data.csv as output.
  * UCS_text_cleaner.R harmonizes various text fields in the UCS data, including operator and owner names. Best efforts were made to ensure correctness and completeness, but some gaps may remain.
  * json_cleaned_script.R loads and cleans Space-Track data, and merges it with the cleaned and combined UCS data.
  * panel_builder.R uses the cleaned and merged files to construct the satellite-operator-year panel dataset with annual satellite histories and operator information. The logic behind the dataset construction approach is described in this blog post: https://akhilrao.github.io/blog//data/2020/08/20/build_stencil_cut/
* /Output_figures contains figures produced by the scripts. Some are diagnostic, some are just interesting.
* /Output_data contains the final data outputs.
* /data-cleaning-notes contains Excel and CSV files used to assist in harmonizing text fields in UCS_text_cleaner.R. They are included here for completeness.

# Creating the dataset

To reproduce the UCS-JSpOC-soy-panel-22.csv dataset:

1. Ensure R is installed along with the required packages.
2. Run the scripts in /Current R scripts in the following order:
   * combined_scripts.R (this will call UCS_text_cleaner.R)
   * json_cleaned_script.R
   * panel_builder.R
3. The output file UCS-JSpOC-soy-panel-22.csv, along with several intermediate files used to create it, will be generated in /Output data.
This dataset contains all the data and code needed to reproduce the analyses in the manuscript: Penn, H. J., & Read, Q. D. (2023). Stem borer herbivory dependent on interactions of sugarcane variety, associated traits, and presence of prior borer damage. Pest Management Science. https://doi.org/10.1002/ps.7843
Included are two .Rmd notebooks containing all code required to reproduce the analyses in the manuscript, two .html files of rendered notebook output, three .csv data files that are loaded and analyzed, and a .zip file of intermediate R objects that are generated during the model fitting and variable selection process.
Notebook files
01_boring_analysis.Rmd: This RMarkdown notebook contains R code to read and process the raw data, create exploratory data visualizations and tables, fit a Bayesian generalized linear mixed model, extract output from the statistical model, and create graphs and tables summarizing the model output, including marginal means for different varieties and contrasts between crop years.
02_trait_covariate_analysis.Rmd: This RMarkdown notebook contains R code to read raw variety-level trait data, perform feature selection based on correlations between traits, fit another generalized linear mixed model using traits as predictors, and create graphs and tables from that model output, including marginal means by categorical trait and marginal trends by continuous trait.
HTML files
These HTML files contain the rendered output of the two RMarkdown notebooks. They were generated by Quentin Read on 2023-08-30 and 2023-08-15.
01_boring_analysis.html
02_trait_covariate_analysis.html
CSV data files
These files contain the raw data. To recreate the notebook output, the CSV files should be at the file path project/data/ relative to where the notebook is run. Columns are described below.
BoredInternodes_26April2022_no format.csv: primary data file with sugarcane borer (SCB) damage.
Columns A-C are the year, date, and location. All location values are the same.
Column D identifies which experiment the data point was collected from.
Column E, Stubble, indicates the crop year (plant cane or first stubble).
Column F indicates the variety.
Column G indicates the plot (integer ID).
Column H indicates the stalk within each plot (integer ID).
Column I, # Internodes, indicates how many internodes were on the stalk.
Columns J-AM are numbered 1-30 and indicate whether SCB damage was observed on that internode (0 if no, 1 if yes, blank cell if that internode was not present on the stalk).
Column AN indicates the experimental treatment for those rows that are part of a manipulative experiment.
Column AO contains notes.
variety_lookup.csv: summary information for the 16 varieties analyzed in this study.
Column A is the variety name.
Column B is the total number of stalks assessed for SCB damage for that variety across all years.
Column C is the number of years that variety is present in the data.
Column D, Stubble, indicates which crop years were sampled for that variety ("PC" if only plant cane, "PC, 1S" if there are data for both plant cane and first stubble crop years).
Column E, SCB resistance, is a categorical designation with four values: susceptible, moderately susceptible, moderately resistant, resistant.
Column F is the literature reference for the SCB resistance value.
Select_variety_traits_12Dec2022.csv: variety-level traits for the 16 varieties analyzed in this study.
Column A is the variety name.
Column B is the SCB resistance designation as an integer.
Column C is the categorical SCB resistance designation (see above).
Columns D-I are continuous traits from year 1 (plant cane), including sugar (Mg/ha), biomass or aboveground cane production (Mg/ha), TRS or theoretically recoverable sugar (g/kg), stalk weight of individual stalks (kg), stalk population density (stalks/ha), and fiber content of stalk (percent).
Columns J-O are the same continuous traits from year 2 (first stubble).
Columns P-V are categorical traits (in some cases continuous traits binned into categories): maturity timing, amount of stalk wax, amount of leaf sheath wax, amount of leaf sheath hair, tightness of leaf sheath, whether leaf sheath becomes necrotic with age, and amount of collar hair.
ZIP file of intermediate R objects
To recreate the notebook output without having to run computationally intensive steps, unzip the archive. The fitted model objects should be at the file path project/ relative to where the notebook is run.
intermediate_R_objects.zip: This file contains intermediate R objects that are generated during the model fitting and variable selection process. You may use the R objects in the .zip file if you would like to reproduce final output including figures and tables without having to refit the computationally intensive statistical models.
binom_fit_intxns_updated_only5yrs.rds: fitted brms model object for the main statistical model.
binom_fit_reduced.rds: fitted brms model object for the trait covariate analysis.
marginal_trends.RData: calculated values of the estimated marginal trends with respect to year and previous damage.
marginal_trend_trs.rds: calculated values of the estimated marginal trend with respect to TRS.
marginal_trend_fib.rds: calculated values of the estimated marginal trend with respect to fiber content.
Resources in this dataset:
Resource Title: Sugarcane borer damage data by internode, 1993-2021. File Name: BoredInternodes_26April2022_no format.csv
Resource Title: Summary information for the 16 sugarcane varieties analyzed. File Name: variety_lookup.csv
Resource Title: Variety-level traits for the 16 sugarcane varieties analyzed. File Name: Select_variety_traits_12Dec2022.csv
Resource Title: RMarkdown notebook 2: trait covariate analysis. File Name: 02_trait_covariate_analysis.Rmd
Resource Title: Rendered HTML output of notebook 2. File Name: 02_trait_covariate_analysis.html
Resource Title: RMarkdown notebook 1: main analysis. File Name: 01_boring_analysis.Rmd
Resource Title: Rendered HTML output of notebook 1. File Name: 01_boring_analysis.html
Resource Title: Intermediate R objects. File Name: intermediate_R_objects.zip
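To inspect the archived model fits without refitting, something like the following works. This is a hedged sketch: the path assumes intermediate_R_objects.zip has been unzipped so the .rds files sit at project/ as described above.

```r
# Hedged sketch: load a fitted brms model object from the intermediate R objects and
# summarize it; refitting is not required. The file path is an assumption about where
# the zip was extracted.
library(brms)

fit_main <- readRDS("project/binom_fit_intxns_updated_only5yrs.rds")
summary(fit_main)   # posterior summaries for the main SCB damage model
```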
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The Electoral Commission of Queensland is responsible for the Electronic Disclosure System (EDS), which provides real-time reporting of political donations. It aims to streamline the disclosure process while increasing transparency surrounding gifts.
All entities conducting or supporting political activity in Queensland are required to submit a disclosure return to the Electoral Commission of Queensland. These include reporting of gifts and loans, as well as periodic reporting of other dealings such as advertising and expenditure. EDS makes these returns readily available to the public, providing faster and easier access to political financial disclosure information.
The EDS is an outcome of the Electoral Commission of Queensland's ongoing commitment to the people of Queensland, to drive improvements to election services and meet changing community needs.
To export the data from the EDS as a CSV file, consult this page: https://helpcentre.disclosures.ecq.qld.gov.au/hc/en-us/articles/115003351428-Can-I-export-the-data-I-can-see-in-the-map-
For a detailed glossary of terms used by the EDS, please consult this page: https://helpcentre.disclosures.ecq.qld.gov.au/hc/en-us/articles/115002784587-Glossary-of-Terms-in-EDS
For other information about how to use the EDS, please consult the FAQ page here: https://helpcentre.disclosures.ecq.qld.gov.au/hc/en-us/categories/115000599068-FAQs
State Harvest Data (csv)
Commercial snapping turtle harvest data (in individuals) for eleven states from 1998 - 2013. States reporting are Arkansas, Delaware, Iowa, Maryland, Massachusetts, Michigan, Minnesota, New Jersey, North Carolina, Pennsylvania, and Virginia.
StateHarvestData.csv
Input and execution code for Colteaux_Johnson_2016
Attached R file includes the code described in the listed publication. The companion JAGS (just another Gibbs sampler) code is also stored in this repository under separate cover.
ColteauxJohnsonNatureConservation.R
JAGS model code for Colteaux_Johnson_2016
Attached R file includes the JAGS (just another Gibbs sampler) code described in the listed publication. The companion input and execution code is also stored in this repository under separate cover.
ColteauxJohnsonNatureConservationJAGS.R
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
analyze the consumer expenditure survey (ce) with r
the consumer expenditure survey (ce) is the primo data source to understand how americans spend money. participating households keep a running diary about every little purchase over the year. those diaries are then summed up into precise expenditure categories. how else are you gonna know that the average american household spent $34 (±2) on bacon, $826 (±17) on cellular phones, and $13 (±2) on digital e-readers in 2011? an integral component of the market basket calculation in the consumer price index, this survey recently became available as public-use microdata and they're slowly releasing historical files back to 1996. hooray! for a taste of what's possible with ce data, look at the quick tables listed on their main page - these tables contain approximately a bazillion different expenditure categories broken down by demographic groups. guess what? i just learned that americans living in households with $5,000 to $9,999 of annual income spent an average of $283 (±90) on pets, toys, hobbies, and playground equipment (pdf page 3). you can often get close to your statistic of interest from these web tables. but say you wanted to look at domestic pet expenditure among only households with children between 12 and 17 years old. another one of the thirteen web tables - the consumer unit composition table - shows a few different breakouts of households with kids, but none matching that exact population of interest. the bureau of labor statistics (bls) (the survey's designers) and the census bureau (the survey's administrators) have provided plenty of the major statistics and breakouts for you, but they're not psychic. if you want to comb through this data for specific expenditure categories broken out by a you-defined segment of the united states' population, then let a little r into your life. fun starts now. fair warning: only analyze the consumer expenditure survey if you are nerd to the core. the microdata ship with two different survey types (interview and diary), each containing five or six quarterly table formats that need to be stacked, merged, and manipulated prior to a methodologically-correct analysis. the scripts in this repository contain examples to prepare 'em all, just be advised that magnificent data like this will never be no-assembly-required. the folks at bls have posted an excellent summary of what's available - read it before anything else. after that, read the getting started guide. don't skim. a few of the descriptions below refer to sas programs provided by the bureau of labor statistics. you'll find these in the C:\My Directory\CES\2011\docs directory after you run the download program.
this new github repository contains three scripts:
2010-2011 - download all microdata.R
loop through every year and download every file hosted on the bls's ce ftp site. import each of the comma-separated value files into r with read.csv. depending on user-settings, save each table as an r data file (.rda) or stata-readable file (.dta).
2011 fmly intrvw - analysis examples.R
load the r data files (.rda) necessary to create the 'fmly' table shown in the ce macros program documentation.doc file. construct that 'fmly' table, using five quarters of interviews (q1 2011 thru q1 2012). initiate a replicate-weighted survey design object. perform some lovely li'l analysis examples. replicate the %mean_variance() macro found in "ce macros.sas" and provide some examples of calculating descriptive statistics using unimputed variables. replicate the %compare_groups() macro found in "ce macros.sas" and provide some examples of performing t-tests using unimputed variables. create an rsqlite database (to minimize ram usage) containing the five imputed variable files, after identifying which variables were imputed based on pdf page 3 of the user's guide to income imputation. initiate a replicate-weighted, database-backed, multiply-imputed survey design object. perform a few additional analyses that highlight the modified syntax required for multiply-imputed survey designs. replicate the %mean_variance() macro found in "ce macros.sas" and provide some examples of calculating descriptive statistics using imputed variables. replicate the %compare_groups() macro found in "ce macros.sas" and provide some examples of performing t-tests using imputed variables. replicate the %proc_reg() and %proc_logistic() macros found in "ce macros.sas" and provide some examples of regressions and logistic regressions using both unimputed and imputed variables.
replicate integrated mean and se.R
match each step in the bls-provided sas program "integrated mean and se.sas" but with r instead of sas. create an rsqlite database when the expenditure table gets too large for older computers to handle in ram. export a table "2011 integrated mean and se.csv" that exactly matches the contents of the sas-produced "2011 integrated mean and se.lst" text file.
click here to view these three scripts for...
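for the replicate-weighted design step mentioned above, here's a hedged sketch in r using the survey package. the variable names (finlwt21, wtrep01-wtrep44, totexppq) follow the ce public-use documentation but are not taken from this repository's scripts, and the toy data frame is fabricated purely so the snippet runs.

```r
# hedged sketch, not the repository's code: build a BRR replicate-weighted design
# for a stacked ce interview table. variable names follow the ce documentation and
# the toy data below is fake, included only so the example runs end-to-end.
library(survey)
set.seed(1)

fmly <- data.frame(finlwt21 = runif(100, 5000, 30000),
                   totexppq = rlnorm(100, 8, 0.5))
fmly[paste0("wtrep", sprintf("%02d", 1:44))] <-
  fmly$finlwt21 * matrix(runif(100 * 44, 0.5, 1.5), nrow = 100)

ce.design <- svrepdesign(weights = ~finlwt21,
                         repweights = "wtrep[0-9]+",   # matches the 44 replicate-weight columns
                         data = fmly,
                         type = "BRR",
                         combined.weights = TRUE)

svymean(~totexppq, ce.design)   # estimated mean with replicate-based standard error
```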
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
A collection of datasets and python scripts for extraction and analysis of isograms (and some palindromes and tautonyms) from corpus-based word-lists, specifically Google Ngram and the British National Corpus (BNC). Below follows a brief description, first, of the included datasets and, second, of the included scripts.

1. Datasets
The data from English Google Ngrams and the BNC is available in two formats: as a plain text CSV file and as a SQLite3 database.

1.1 CSV format
The CSV files for each dataset actually come in two parts: one labelled ".csv" and one ".totals". The ".csv" contains the actual extracted data, and the ".totals" file contains some basic summary statistics about the ".csv" dataset with the same name. The CSV files contain one row per data point, with the columns separated by a single tab stop. There are no labels at the top of the files. Each line has the following columns, in this order (the labels below are what I use in the database, which has an identical structure, see section below):
Label Data type Description
isogramy int The order of isogramy, e.g. "2" is a second order isogram
length int The length of the word in letters
word text The actual word/isogram in ASCII
source_pos text The Part of Speech tag from the original corpus
count int Token count (total number of occurences)
vol_count int Volume count (number of different sources which contain the word)
count_per_million int Token count per million words
vol_count_as_percent int Volume count as percentage of the total number of volumes
is_palindrome bool Whether the word is a palindrome (1) or not (0)
is_tautonym bool Whether the word is a tautonym (1) or not (0)
The ".totals" files have a slightly different format, with one row per data point, where the first column is the label and the second column is the associated value. The ".totals" files contain the following data:
Label
Data type
Description
!total_1grams
int
The total number of words in the corpus
!total_volumes
int
The total number of volumes (individual sources) in the corpus
!total_isograms
int
The total number of isograms found in the corpus (before compacting)
!total_palindromes
int
How many of the isograms found are palindromes
!total_tautonyms
int
How many of the isograms found are tautonyms
The CSV files are mainly useful for further automated data processing. For working with the data set directly (e.g. to do statistics or cross-check entries), I would recommend using the database format described below.

1.2 SQLite database format
On the other hand, the SQLite database combines the data from all four of the plain text files, and adds various useful combinations of the two datasets, namely:
• Compacted versions of each dataset, where identical headwords are combined into a single entry.
• A combined compacted dataset, combining and compacting the data from both Ngrams and the BNC.
• An intersected dataset, which contains only those words which are found in both the Ngrams and the BNC dataset.
The intersected dataset is by far the least noisy, but is missing some real isograms, too. The columns/layout of each of the tables in the database is identical to that described for the CSV/.totals files above. To get an idea of the various ways the database can be queried for various bits of data, see the R script described below, which computes statistics based on the SQLite database.

2. Scripts
There are three scripts: one for tidying Ngram and BNC word lists and extracting isograms, one to create a neat SQLite database from the output, and one to compute some basic statistics from the data. The first script can be run using Python 3, the second script can be run using SQLite 3 from the command line, and the third script can be run in R/RStudio (R version 3).

2.1 Source data
The scripts were written to work with word lists from Google Ngram and the BNC, which can be obtained from http://storage.googleapis.com/books/ngrams/books/datasetsv2.html and https://www.kilgarriff.co.uk/bnc-readme.html (download all.al.gz). For Ngram the script expects the path to the directory containing the various files, for BNC the direct path to the *.gz file.

2.2 Data preparation
Before processing proper, the word lists need to be tidied to exclude superfluous material and some of the most obvious noise. This will also bring them into a uniform format. Tidying and reformatting can be done by running one of the following commands:
python isograms.py --ngrams --indir=INDIR --outfile=OUTFILE
python isograms.py --bnc --indir=INFILE --outfile=OUTFILE
Replace INDIR/INFILE with the input directory or filename and OUTFILE with the filename for the tidied and reformatted output.

2.3 Isogram Extraction
After preparing the data as above, isograms can be extracted by running the following command on the reformatted and tidied files:
python isograms.py --batch --infile=INFILE --outfile=OUTFILE
Here INFILE should refer to the output from the previous data cleaning process. Please note that the script will actually write two output files, one named OUTFILE with a word list of all the isograms and their associated frequency data, and one named "OUTFILE.totals" with very basic summary statistics.

2.4 Creating a SQLite3 database
The output data from the above step can be easily collated into a SQLite3 database which allows for easy querying of the data directly for specific properties. The database can be created by following these steps:
1. Make sure the files with the Ngrams and BNC data are named "ngrams-isograms.csv" and "bnc-isograms.csv" respectively. (The script assumes you have both of them; if you only want to load one, just create an empty file for the other one.)
2. Copy the "create-database.sql" script into the same directory as the two data files.
3. On the command line, go to the directory where the files and the SQL script are.
4. Type: sqlite3 isograms.db < create-database.sql
5. This will create a database called "isograms.db".
See section 1 for a basic description of the output data and how to work with the database.

2.5 Statistical processing
The repository includes an R script (R version 3) named "statistics.r" that computes a number of statistics about the distribution of isograms by length, frequency, contextual diversity, etc. This can be used as a starting point for running your own stats. It uses RSQLite to access the SQLite database version of the data described above.
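As a hedged sketch of that kind of querying from R (the column names match section 1, but the table name 'ngrams_isograms' is an assumption; check the actual table names first):

```r
# Hedged sketch: query the SQLite database from R, in the spirit of statistics.r.
# The table name 'ngrams_isograms' is an assumption; use dbListTables() to confirm.
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "isograms.db")
dbListTables(con)                       # confirm the actual table names

res <- dbGetQuery(con, "
  SELECT word, length, count_per_million
  FROM ngrams_isograms
  WHERE is_palindrome = 1
  ORDER BY count_per_million DESC
  LIMIT 10")
res

dbDisconnect(con)
```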
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
S1 File. SI_C01_SPD_KDE_models. R script for analysing radiocarbon dates. The code performs the computation of over-regional and regional SPD and KDE models, as well as their export to CSV files (Rmd).
S2 File. SI_C02_aoristic_dating. R script for exporting aoristic time series derived from typochronologically dated archaeological material as CSV files (Rmd).
S3 File. SI_C03_vegetation_openness_score_example. R script performing the computation of a vegetation openness score from pollen records and the export of the generated time series as a CSV file (Rmd).
S4 File. SI_C04_data_preparation. Jupyter Notebook performing the import and transformation of the relevant data to visualize the plots exhibited in the paper (ipynb).
S5 File. SI_C05_figures_extra. Jupyter Notebook visualizing the plots exhibited in the paper (ipynb).
S1 Data. SI_D01_reg_data_no_dups. Spreadsheet holding radiocarbon dates, with the information of laboratory identification, site name, geographical coordinates, site type, material, source, and regional affiliation (csv).
S2 Data. SI_D02_reg_axe_dagger_graves. Spreadsheet holding entries of axes and daggers, with the information of context, site, parish, artefact identification, type, subtype, absolute dating, typochronological dating, references, geographical coordinates, and regional affiliations (csv).
S3 Data. SI_D03_pollen_example. Spreadsheet holding sample entries of the pollen records from Krageholm (Neotoma Site ID 3204) and Bjäresjöholmsjön (Neotoma Site ID 3017) for an example run of S3 File. Records can be accessed via the Neotoma Explorer (https://apps.neotomadb.org/explorer/) with their given IDs. Each entry holds the information of the record's type, regional affiliation, absolute BP and BCE dating, as well as the counts of given plant taxa (csv).
S4 Data. SI_D04_PAP_303600_TOC_LOI. Table holding sample entries of TOC content, LOI, and SST reconstruction of sediment core PAP_303600 for correlations of population development with Baltic Sea surface temperature. Available via 10.1594/PANGAEA.883292 (tab).
S5 Data. SI_D05_vos_[…]. Spreadsheets holding the vegetation openness score time series of lake Belau, Vinge, Northern Jutland, and Zealand (csv). (ZIP)
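As a hedged illustration of the SPD step, the rcarbon package is one common R toolchain for building summed probability distributions from a radiocarbon table like SI_D01_reg_data_no_dups; the supplementary Rmd scripts may use a different implementation, and the column names 'c14age' and 'c14error' are assumptions.

```r
# Hedged example (swapped-in tool): SPD from a radiocarbon date table using rcarbon.
# Column names and the time range are assumptions, not taken from the SI scripts.
library(rcarbon)

dates <- read.csv("SI_D01_reg_data_no_dups.csv")
cal   <- calibrate(x = dates$c14age, errors = dates$c14error, calCurves = "intcal20")

spd_all <- spd(cal, timeRange = c(6000, 3000))   # summed probability distribution, cal BP
plot(spd_all)
```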
This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org. Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use. Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Wordpad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Supplementary files for Gernon et al. (2023) Rift-induced disruption of cratonic keels drives kimberlite volcanism
This directory contains datasets, GIS files, Uninet model files and R scripts used to generate the figures in Gernon, T.M., Jones, S.M., Brune, S., Hincks, T.K., Palmer, M.R., Schumacher, J.C., Primiceri, R.M., Field, M., Griffin, W.L., O’Reilly, S.Y., Keir, D., Spencer, C.J., Merdith, A. & Glerum, A. Rift-induced disruption of cratonic keels drives kimberlite volcanism. Nature, doi: https://doi.org/10.1038/s41586-023-06193-3 (2023).
Contents:
GIS
COB_NAmerica.shp, COB_SAmerica.shp, COB_Africa.shp
Shapefiles for rifts in North America, South America and Africa with associated rift and breakup ages;
Combined_database_Kimberlites_Tappe_Brazil.shp Kimberlite locations (lon/lat) with ages
input data files
1_Ga_area_perimeter_1000_0_1myr.csv
Plate area and perimeter estimates used to generate the Fragmentation index; see
Merdith A.S. et al., 2021, https://doi.org/10.1016/j.earscirev.2020.103477
and Merdith A.S. et al., 2017 https://doi.org/10.1016/j.gr.2017.04.001
GoodwinPreservationCurve.csv
Surface preservation estimate (as a function of time) from Goodwin, A.M. 1996
https://doi.org/10.1016/B978-0-12-289770-2.X5000-6
Kimberlites_ages.csv
Kimberlite database from Tappe et al. 2018, https://doi.org/10.1016/j.epsl.2017.12.013
LIP_input.csv
Ages and extent of Large Igneous Provinces from Ernst, 2014, https://doi.org/10.1017/CBO9781139025300
output data files
UNINET_KimberliteCount_LIPstart_dFrag_5MyrRes.csv and UNINET_Past500Myr_KimberliteCount_LIPstart_dFrag_5MyrRes.csv
5 Myr aggregated time series for Uninet lag analysis
Symmetric_dFrag9Myr_alldataNA.csv
Time series (1 Myr resolution) for:
Fragmentation (Frag);
Rate of change of fragmentation (dFrag_9Myr), calculated over a moving interval t+/-4 Myr as a measure of breakup;
Kimberlite count (Kimberlites);
Kimberlite count corrected for preservation (see GoodwinPreservationCurve.csv above) (CorrKimberlites);
Initiation of Large Igneous Provinces (LIPstart)
CLUSTER_sample_rates_Fig1e_ExFig4a.zip (large csv file)
Output from Migration_rate_estimates_with_uncertainty_MINLAG_and_MINDIST.r containing simulated estimates of migration rate, accounting for time, distance and model (rift association uncertainty) for kimberlite clusters in N America, S America and Africa. This output was used to generate Fig 1e and Extended Data Fig 4a in Gernon et al. (2023)
Uninet files
UNINET_500Myr_5MryRes_dFrag9Myr_Kim
Emp_correl.txt, BN_correl.txt, and Cond_correl.txt: Empirical, BN, and Conditional rank correlations for Kimberlites and lagged dFrag (rate of change of fragmentation) using a 5 Myr resolution time series for the past 500 Myr.
CORRELATIONS_500Myr_5MryRes_dFrag9Myr_Kim.pdf and COND_CORR_500Myr_5MryRes_dFrag9Myr_Kim.pdf: plots showing correlation vs. lag
TEST_500Myr_5MryRes_dFrag9Myr_Kim.uninet Uninet model file used to estimate correlations
UNINET_1000Myr_5MryRes_LIPstartSum5Myr_dFrag9Myr
Emp_correl.txt, BN_correl.txt, and Cond_correl.txt: Empirical, BN, and Conditional rank correlations for dFrag (rate of change of fragmentation) and lagged values of LIPstart (initiation of Large Igneous Provinces) using a 5 Myr resolution time series for the past 1000 Myr.
COND_CORR_1000Myr_5MryRes_LIPstartSum5Myr_dFrag9Myr.pdf plot showing conditional correlation vs. lag
TEST_1000Myr_5MryRes_LIPstartSum5Myr_dFrag9Myr.uninet Uninet model file used to estimate correlations
Generate_Fig1b_ExtFig2_ExtFig6
R_fig1b_extended_fig2bcd.r
R script to generate Fig 1b and Extended Data Figs 2 and 6 for Gernon et al. (2023)
[ also includes generated figures and output data files ]
Generate_migration_rate_estimates
Migration_rate_estimates_with_uncertainty_MINLAG_and_MINDIST.r
R script to generate Fig 1e, and Extended Data Figs 4 and 5 for Gernon et al. (2023)
[ also includes generated figures and output data files ]
Note: UNINET_KimberliteCount_LIPstart_dFrag_5MyrRes.csv and UNINET_Past500Myr_KimberliteCount_LIPstart_dFrag_5MyrRes.csv are inputs to the Uninet models described above.
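As a worked illustration of the dFrag_9Myr variable described for Symmetric_dFrag9Myr_alldataNA.csv, the sketch below recomputes a rate of change of the fragmentation index over a centred t ± 4 Myr window. The exact normalization used in the published column may differ, so treat this as one reading of the description rather than the original code.

```r
# Hedged sketch: recompute a 9-Myr-window rate of change of fragmentation from the
# 1 Myr resolution series and compare it against the published dFrag_9Myr column.
# The per-Myr normalization is an assumption.
ts <- read.csv("Symmetric_dFrag9Myr_alldataNA.csv")

n <- nrow(ts)
dFrag <- rep(NA_real_, n)
for (i in 5:(n - 4)) {
  dFrag[i] <- (ts$Frag[i + 4] - ts$Frag[i - 4]) / 8   # change in Frag per Myr over t +/- 4 Myr
}

plot(ts$dFrag_9Myr, dFrag,
     xlab = "published dFrag_9Myr", ylab = "recomputed rate of change")
```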