100+ datasets found
  1. Modeling data and data for figures and text

    • datasets.ai
    • catalog.data.gov
    10, 57
    Updated Aug 27, 2024
    + more versions
    Cite
    U.S. Environmental Protection Agency (2024). Modeling data and data for figures and text [Dataset]. https://datasets.ai/datasets/modeling-data-and-data-for-figures-and-text
    Explore at:
    Available download formats: 10, 57
    Dataset updated
    Aug 27, 2024
    Dataset authored and provided by
    U.S. Environmental Protection Agency
    Description

    The data in this archive are in a zipped R binary data format (https://cran.r-project.org/doc/manuals/r-release/R-data.html). These data can be read using the open-source, free-to-use statistical software package R (https://www.r-project.org/). The data are organized following the figure numbering in the manuscript (e.g., Figure 1a is fig1a) and carry the same labeling as the figures, including units and variable names. For a full explanation of each figure, please see the captions in the manuscript.

    To open this data file, use the following commands in R.

    load('JKelly_NH4NO3_JGR_2018.rdata')

    To list the contents of the file, use the following command in R:

    ls()

    The data for each figure are contained in a data object with the figure's name. To view the data, simply type one of the object names returned by the ls() command.

    The original model output and emissions used for this study are located on the ASM archived storage at /asm/ROMO/finescale/sjv2013. These data are in NetCDF format with self-contained metadata, including descriptive headers with variable names, units, and simulation times.

    This dataset is associated with the following publication: Kelly, J., C. Parworth, Q. Zhang, D. Miller, K. Sun, M. Zondlo , K. Baker, A. Wisthaler, J. Nowak , S. Pusede , R. Cohen , A. Weinheimer , A. Beyersdorf , G. Tonnesen, J. Bash, L. Valin, J. Crawford, A. Fried , and J. Walega. Modeling NH4NO3 Over the San Joaquin Valley During the 2013 DISCOVER‐AQ Campaign. JOURNAL OF GEOPHYSICAL RESEARCH-ATMOSPHERES. American Geophysical Union, Washington, DC, USA, 123(9): 4727-4745, (2018).

  2. Data from: Optimized SMRT-UMI protocol produces highly accurate sequence...

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    zip
    Updated Dec 7, 2023
    + more versions
    Cite
    Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies [Dataset]. https://data.niaid.nih.gov/resources?id=dryad_w3r2280w0
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    HIV Prevention Trials Network (http://www.hptn.org/)
    HIV Vaccine Trials Network (http://www.hvtn.org/)
    National Institute of Allergy and Infectious Diseases (http://www.niaid.nih.gov/)
    PEPFAR
    Authors
    Dylan Westfall; James Mullins
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing, which can require extensive optimizations to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence PCR amplicons derived from cDNA templates tagged with universal molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR, and the use of UMIs allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing to produce a highly accurate consensus sequence from each template. Handling of the large datasets produced from SMRT-UMI sequencing was facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline), that automatically filters and parses reads by sample, identifies and discards reads with UMIs likely created from PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination or early cycle PCR errors, resulting in highly accurate sequence datasets. The optimized SMRT-UMI sequencing method presented here represents a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus (HIV) quasispecies.

    Methods

    This serves as an overview of the analysis performed on PacBio sequence data that is summarized in Analysis Flowchart.pdf and was used as primary data for the paper by Westfall et al., "Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies". Five different PacBio sequencing datasets were used for this analysis: M027, M2199, M1567, M004, and M005.

    For the datasets which were indexed (M027, M2199), CCS reads from PacBio sequencing files and the chunked_demux_config files were used as input for the chunked_demux pipeline. Each config file lists the different Index primers added during PCR to each sample. The pipeline produces one fastq file for each Index primer combination in the config. For example, in dataset M027 there were 3–4 samples using each Index combination. The fastq files from each demultiplexed read set were moved to the sUMI_dUMI_comparison pipeline fastq folder for further demultiplexing by sample and consensus generation with that pipeline. More information about the chunked_demux pipeline can be found in the README.md file on GitHub.

    The demultiplexed read collections from the chunked_demux pipeline or CCS read files from datasets which were not indexed (M1567, M004, M005) were each used as input for the sUMI_dUMI_comparison pipeline along with each dataset's config file. Each config file contains the primer sequences for each sample (including the sample ID block in the cDNA primer) and further demultiplexes the reads to prepare data tables summarizing all of the UMI sequences and counts for each family (tagged.tar.gz) as well as consensus sequences from each sUMI and rank 1 dUMI family (consensus.tar.gz). More information about the sUMI_dUMI_comparison pipeline can be found in the paper and the README.md file on GitHub.

    The consensus.tar.gz and tagged.tar.gz files were moved from the sUMI_dUMI_comparison pipeline directory on the server to the Pipeline_Outputs folder in this analysis directory for each dataset and appended with the dataset name (e.g. consensus_M027.tar.gz). Also in this analysis directory is a Sample_Info_Table.csv containing information about how each of the samples was prepared, such as purification methods and number of PCRs. There are also three other folders: Sequence_Analysis, Indentifying_Recombinant_Reads, and Figures. Each has an .Rmd file with the same name inside, which is used to collect, summarize, and analyze the data. All of these collections of code were written and executed in RStudio to track notes and summarize results.

    Sequence_Analysis.Rmd has instructions to decompress all of the consensus.tar.gz files, combine them, and create two fasta files, one with all sUMI and one with all dUMI sequences. Using these as input, two data tables were created that summarize all sequences and read counts for each sample that pass various criteria. These are used to help create Table 2 and as input for Indentifying_Recombinant_Reads.Rmd and Figures.Rmd. Next, 2 fasta files containing all of the rank 1 dUMI sequences and the matching sUMI sequences were created. These were used as input for the python script compare_seqs.py, which identifies any matched sequences that are different between the sUMI and dUMI read collections. This information was also used to help create Table 2. Finally, to populate the table with the number of sequences and bases in each sequence subset of interest, different sequence collections were saved and viewed in the Geneious program.

    To investigate the cause of sequences where the sUMI and dUMI sequences do not match, tagged.tar.gz was decompressed, and for each family with discordant sUMI and dUMI sequences the reads from the UMI1_keeping directory were aligned using Geneious. Reads from dUMI families failing the 0.7 filter were also aligned in Geneious. The uncompressed tagged folder was then removed to save space. These read collections contain all of the reads in a UMI1 family and still include the UMI2 sequence. By examining the alignment and specifically the UMI2 sequences, the site of the discordance and its cause were identified for each family as described in the paper. These alignments were saved as "Sequence Alignments.geneious". The counts of how many families were the result of PCR recombination were used in the body of the paper.

    Using Identifying_Recombinant_Reads.Rmd, the dUMI_ranked.csv file from each sample was extracted from all of the tagged.tar.gz files, combined, and used as input to create a single dataset containing all UMI information from all samples. This file, dUMI_df.csv, was used as input for Figures.Rmd. Figures.Rmd used dUMI_df.csv, sequence_counts.csv, and read_counts.csv as input to create draft figures and then individual datasets for each figure. These were copied into Prism software to create the final figures for the paper.

  3. Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

    • zenodo.org
    application/gzip, bin +2
    Updated Aug 2, 2024
    + more versions
    Cite
    Marat Valiev; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788
    Explore at:
    Available download formats: bin, application/gzip, zip, text/x-python
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marat Valiev; Bogdan Vasilescu; James Herbsleb
    License

    GNU GPL 2.0 (https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html)

    Description
    Replication pack, FSE2018 submission #164:
    ------------------------------------------
    
    **Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
    A Case Study of the PyPI Ecosystem
    
    **Note:** link to data artifacts is already included in the paper. 
    Link to the code will be included in the Camera Ready version as well.
    
    
    Content description
    ===================
    
    - **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
     described below
    - **settings.py** - settings template for the code archive.
    - **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
     This dataset only includes stats aggregated by the ecosystem (PyPI)
    - **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
     statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages
     themselves, which take around 2TB.
    - **build_model.r, helpers.r** - R files to process the survival data 
      (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
      `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
      **dataset_full_Jan_2018.tgz**)
    - **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
    - LICENSE - text of GPL v3, under which this dataset is published
    - INSTALL.md - replication guide (~2 pages)
    Replication guide
    =================
    
    Step 0 - prerequisites
    ----------------------
    
    - Unix-compatible OS (Linux or OS X)
    - Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
    - R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)
    
    Depending on the level of detail (see Step 2 for more details):
    - up to 2 TB of disk space (see Step 2 detail levels)
    - at least 16 GB of RAM (64 GB preferable)
    - a few hours to a few months of processing time
    
    Step 1 - software
    ----------------
    
    - unpack **ghd-0.1.0.zip**, or clone from gitlab:
    
       git clone https://gitlab.com/user2589/ghd.git
       git checkout 0.1.0
     
     `cd` into the extracted folder. 
     All commands below assume it as a current directory.
      
    - copy `settings.py` into the extracted folder. Edit the file:
      * set `DATASET_PATH` to some newly created folder path
      * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
    - install docker. For Ubuntu Linux, the command is 
      `sudo apt-get install docker-compose`
    - install libarchive and headers: `sudo apt-get install libarchive-dev`
    - (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
     Without this dependency, you might get an error on the next step, 
     but it's safe to ignore.
    - install Python libraries: `pip install --user -r requirements.txt` . 
    - disable all APIs except GitHub (Bitbucket and Gitlab support were
     not yet implemented when this study was in progress): edit
     `scraper/__init__.py`, comment out everything except GitHub support
     in `PROVIDERS`.
    
    Step 2 - obtaining the dataset
    -----------------------------
    
    The ultimate goal of this step is to get output of the Python function 
    `common.utils.survival_data()` and save it into a CSV file:
    
      # copy and paste into a Python console
      from common import utils
      survival_data = utils.survival_data('pypi', '2008', smoothing=6)
      survival_data.to_csv('survival_data.csv')
    
    Since full replication will take several months, here are some ways to speedup
    the process:
    
    #### Option 2.a, difficulty level: easiest
    
    Just use the precomputed data. Step 1 is not necessary under this scenario.
    
    - extract **dataset_minimal_Jan_2018.zip**
    - get `survival_data.csv`, go to the next step
    
    #### Option 2.b, difficulty level: easy
    
    Use precomputed longitudinal feature values to build the final table.
    The whole process will take 15..30 minutes.
    
    - create a folder `
  4. NYC STEW-MAP Staten Island organizations' website hyperlink webscrape

    • catalog.data.gov
    • gimi9.com
    Updated Nov 21, 2022
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). NYC STEW-MAP Staten Island organizations' website hyperlink webscrape [Dataset]. https://catalog.data.gov/dataset/nyc-stew-map-staten-island-organizations-website-hyperlink-webscrape
    Explore at:
    Dataset updated
    Nov 21, 2022
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Area covered
    Staten Island, New York
    Description

    The data represent web-scraping of hyperlinks from a selection of environmental stewardship organizations that were identified in the 2017 NYC Stewardship Mapping and Assessment Project (STEW-MAP) (USDA 2017). There are two data sets: 1) the original scrape containing all hyperlinks within the websites and associated attribute values (see "README" file); 2) a cleaned and reduced dataset formatted for network analysis.

    For dataset 1: Organizations were selected from the 2017 NYC Stewardship Mapping and Assessment Project (STEW-MAP) (USDA 2017), a publicly available, spatial data set about environmental stewardship organizations working in New York City, USA (N = 719). To create a smaller and more manageable sample to analyze, all organizations that intersected (i.e., worked entirely within or overlapped) the NYC borough of Staten Island were selected for a geographically bounded sample. Only organizations with working websites and that the web scraper could access were retained for the study (n = 78). The websites were scraped between 09 and 17 June 2020 to a maximum search depth of ten using the snaWeb package (version 1.0.1, Stockton 2020) in the R computational language environment (R Core Team 2020).

    For dataset 2: The complete scrape results were cleaned, reduced, and formatted as a standard edge-array (node1, node2, edge attribute) for network analysis. See the "README" file for further details.

    References: R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/. Version 4.0.3. Stockton, T. (2020). snaWeb Package: An R package for finding and building social networks for a website, version 1.0.1. USDA Forest Service. (2017). Stewardship Mapping and Assessment Project (STEW-MAP). New York City Data Set. Available online at https://www.nrs.fs.fed.us/STEW-MAP/data/.

    This dataset is associated with the following publication: Sayles, J., R. Furey, and M. Ten Brink. How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations. Applied Network Science. Springer Nature, New York, NY, 7: 36, (2022).

  5. Data from: A dataset to model Levantine landcover and land-use change...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Dec 16, 2023
    Cite
    Michael Kempf; Michael Kempf (2023). A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.10396148
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael Kempf; Michael Kempf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 16, 2023
    Area covered
    Levant
    Description

    Overview

    This dataset is the repository for the following paper submitted to Data in Brief:

    Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).

    The Data in Brief article contains the supplement information and is the related data paper to:

    Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).

    Description/abstract

    The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which strained neighbouring countries like Jordan due to the influx of Syrian refugees and increases population vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.

    Folder structure

    The main folder after download contains all data; the following subfolders are stored as zipped files:

    “code” stores the 9 code chunks described below, used to read, extract, process, analyse, and visualize the data.

    “MODIS_merged” contains the 16-days, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.

    “mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).

    “yield_productivity” contains .csv files of yield information for all countries listed above.

    “population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).

    “GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.

    “built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders, which contain the raw data and the already processed data. “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5-year intervals, e.g., “Levant_built_up_1975.tif”.

    Code structure

    1_MODIS_NDVI_hdf_file_extraction.R


    This is the first code chunk and refers to the extraction of MODIS data from the .hdf file format. The following packages must be installed and the raw data must be downloaded using a simple mass downloader, e.g., from Google Chrome. Packages: terra. Download MODIS data after registration from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 9 October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif file with the indication “NDVI”. Because the study area is quite large, we have to load three different (spatial) time series and merge them later. Note that the time series are temporally consistent.


    2_MERGE_MODIS_tiles.R


    In this code, we load and merge the three different stacks to produce a large and consistent time series of NDVI imagery across the study area. We further use the package gtools to load the files in natural order (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks, from which we merge the first two (stack 1, stack 2) and store them. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").


    3_CROP_MODIS_merged_tiles.R


    Now we want to crop the derived MODIS tiles to our study area. We are using a mask, which is provided as a .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. We have now produced a single cropped NDVI time series from MODIS.
    The repository provides the already clipped and merged NDVI datasets.


    4_TREND_analysis_NDVI.R


    Now, we want to perform trend analysis on the derived data. The data we load are tricky, as they contain 16-day return periods across a year over a period of 22 years. Growing season sums cover MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and flag all values significant at the 0.05 confidence level. Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS, value 0.3).
    To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.


    5_BUILT_UP_change_raster.R


    Let us look at the landcover changes now. We are working with the terra package and get raster data from: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 3 March 2023, 100 m resolution, global coverage). One can download the temporal coverage that is aimed for and reclassify it using the code after cropping to the individual study area. Here, I summed up different rasters to characterize the built-up change in continuous values between 1975 and 2022.


    6_POPULATION_numbers_plot.R


    For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.


    7_YIELD_plot.R


    In this section, we are using the country productivity data from the supplement in the repository folder “yield_productivity” (e.g., "Jordan_yield.csv"). Each of the single-country yield datasets is plotted with ggplot and the plots are combined using the patchwork package in R.


    8_GLDAS_read_extract_trend


    The last code chunk provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9 October 2023). The raw data come in .nc file format, and the various variables can be extracted using the [“^variable name”] subsetting command on the SpatRaster collection. Each time you run the code, this variable name must be adjusted to the variable of interest (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 9 October 2023; or the respective code chunk when reading a .nc file with the ncdf4 package in R), or run print(nc) from the code, or use names() on the SpatRaster collection.
    Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.
    From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. Regarding, e.g., rainfall, which is measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for growing seasons, e.g., March-May (MAM), June-August (JJA), September-November (SON), and DJF (December-February, including Jan/Feb of the consecutive year).
    From the data, mean values of 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and values at the 95% confidence level are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and different spatial extents across the globe, given the availability of the GLDAS variables.

  6. Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...

    • datasearch.gesis.org
    • openicpsr.org
    Updated Feb 19, 2020
    Cite
    Kaplan, Jacob (2020). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Property Stolen and Recovered (Supplement to Return A) 1960-2018 [Dataset]. http://doi.org/10.3886/E105403
    Explore at:
    Dataset updated
    Feb 19, 2020
    Dataset provided by
    da|ra (Registration agency for social science and economic data)
    Authors
    Kaplan, Jacob
    Description

    For any questions about this data please email me at jacob@crimedatatool.com. If you use this data, please cite it.

    Version 4 release notes: Adds data for 2018.
    Version 3 release notes: Adds data in the following formats: Excel. Changes project name to avoid confusing this data for the ones done by NACJD.
    Version 2 release notes: Adds data for 2017. Adds a "number_of_months_reported" variable which says how many months of the year the agency reported data.

    Property Stolen and Recovered is a Uniform Crime Reporting (UCR) Program data set with information on the number of offenses (crimes included are murder, rape, robbery, burglary, theft/larceny, and motor vehicle theft), the value of the offense, and subcategories of the offense (e.g., for robbery it is broken down into subcategories including highway robbery, bank robbery, gas station robbery). The majority of the data relates to theft. Theft is divided into subcategories such as shoplifting, theft of bicycle, theft from building, and purse snatching. For a number of items stolen (e.g., money, jewelry and precious metals, guns), the value of property stolen and the value of property recovered are provided. This data set is also referred to as the Supplement to Return A (Offenses Known and Reported).

    All the data was received directly from the FBI as text or .DTA files. I created a setup file based on the documentation provided by the FBI and read the data into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here: https://github.com/jacobkap/crime_data. The Word document file available for download is the guidebook the FBI provided with the raw data, which I used to create the setup file to read in the data.

    There may be inaccuracies in the data, particularly in the group of columns starting with "auto." To reduce (but certainly not eliminate) data errors, I replaced the following values with NA for the group of columns beginning with "offenses" or "auto" as they are common data entry error values (e.g., are larger than the agency's population, are much larger than other crimes or months in the same agency): 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99942. This cleaning was NOT done on the columns starting with "value." For every numeric column I replaced negative indicator values (e.g., "j" for -1) with the negative number they are supposed to be. These negative number indicators are not included in the FBI's codebook for this data but are present in the data. I used the values in the FBI's codebook for the Offenses Known and Clearances by Arrest data.

    To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. If an agency has used a different FIPS code in the past, check to make sure the FIPS code is the same as in this data.

  7. Data from: Data and code from: Environmental influences on drying rate of...

    • agdatacommons.nal.usda.gov
    • s.cnmilf.com
    • +2 more
    txt
    Updated May 14, 2024
    + more versions
    Cite
    Warren Copes; Quentin Read; Barbara J. Smith (2024). Data and code from: Environmental influences on drying rate of spray applied disinfestants from horticultural production services [Dataset]. http://doi.org/10.15482/USDA.ADC/25673073.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 14, 2024
    Dataset provided by
    Ag Data Commons
    Authors
    Warren Copes; Quentin Read; Barbara J. Smith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes all the data and R code needed to reproduce the analyses in a forthcoming manuscript: Copes, W. E., Q. D. Read, and B. J. Smith. Environmental influences on drying rate of spray applied disinfestants from horticultural production services. PhytoFrontiers, DOI pending.

    Study description: Instructions for disinfestants typically specify a dose and a contact time to kill plant pathogens on production surfaces. A problem occurs when disinfestants are applied to large production areas where the evaporation rate is affected by weather conditions. The common contact time recommendation of 10 min may not be achieved under hot, sunny conditions that promote fast drying. This study is an investigation into how the evaporation rates of six commercial disinfestants vary when applied to six types of substrate materials under cool to hot and cloudy to sunny weather conditions. Initially, disinfestants with low surface tension spread out to provide 100% coverage and disinfestants with high surface tension beaded up to provide about 60% coverage when applied to hard smooth surfaces. Disinfestants applied to porous materials were quickly absorbed into the body of the material, such as wood and concrete. Even though disinfestants evaporated faster under hot sunny conditions than under cool cloudy conditions, coverage was reduced considerably in the first 2.5 min under most weather conditions and reduced to less than or equal to 50% coverage by 5 min.

    Dataset contents: This dataset includes R code to import the data and fit Bayesian statistical models using the model fitting software CmdStan, interfaced with R using the packages brms and cmdstanr. The models (one for 2022 and one for 2023) compare how quickly different spray-applied disinfestants dry, depending on what chemical was sprayed, what surface material it was sprayed onto, and what the weather conditions were at the time. Next, the statistical models are used to generate predictions and compare mean drying rates between the disinfestants, surface materials, and weather conditions. Finally, tables and figures are created. These files are included:

    Drying2022.csv: drying rate data for the 2022 experimental run
    Weather2022.csv: weather data for the 2022 experimental run
    Drying2023.csv: drying rate data for the 2023 experimental run
    Weather2023.csv: weather data for the 2023 experimental run
    disinfestant_drying_analysis.Rmd: RMarkdown notebook with all data processing, analysis, and table creation code
    disinfestant_drying_analysis.html: rendered output of notebook
    MS_figures.R: additional R code to create figures formatted for journal requirements
    fit2022_discretetime_weather_solar.rds: fitted brms model object for 2022. This will allow users to reproduce the model prediction results without having to refit the model, which was originally fit on a high-performance computing cluster
    fit2023_discretetime_weather_solar.rds: fitted brms model object for 2023
    data_dictionary.xlsx: descriptions of each column in the CSV data files

  8. Spire live and historical data

    • earth.esa.int
    + more versions
    Cite
    European Space Agency, Spire live and historical data [Dataset]. https://earth.esa.int/eogateway/catalog/spire-live-and-historical-data
    Explore at:
    Dataset authored and provided by
    European Space Agency (http://www.esa.int/)
    License

    https://earth.esa.int/eogateway/documents/20142/1560778/ESA-Third-Party-Missions-Terms-and-Conditions.pdf

    Description

    The data collected by Spire from its 100 satellites launched into Low Earth Orbit (LEO) has a diverse range of applications, from analysis of global trade patterns and commodity flows to aircraft routing to weather forecasting. The data also provides interesting research opportunities on topics as varied as ocean currents and GNSS-based planetary boundary layer height. The following products can be requested:

    GNSS Polarimetric Radio Occultation (STRATOS): Novel Polarimetric Radio Occultation (PRO) measurements collected by three Spire satellites are available over 15 May 2023 to 30 November 2023. PRO differ from regular RO (described below) in that the H and V polarizations of the signal are available, as opposed to only Right-Handed Circularly Polarized (RHCP) signals in regular RO. The differential phase shift between H and V correlates with the presence of hydrometeors (ice crystals, rain, snow, etc.). When combined, the H and V information provides the same information on atmospheric thermodynamic properties as RO: temperature, humidity, and pressure, based on the signal’s bending angle. Various levels of the products are provided.

    GNSS Reflectometry (STRATOS): GNSS Reflectometry (GNSS-R) is a technique to measure Earth’s surface properties using reflections of GNSS signals in the form of a bistatic radar. Spire collects two types of GNSS-R data: Near-Nadir incidence LHCP reflections collected by the Spire GNSS-R satellites, and Grazing-Angle GNSS-R (i.e., low elevation angle) RHCP reflections collected by the Spire GNSS-RO satellites. The Near-Nadir GNSS-R collects DDM (Delay Doppler Map) reflectivity measurements. These are used to compute ocean wind / wave conditions and soil moisture over land. The Grazing-Angle GNSS-R collects 50 Hz reflectivity and additionally carrier phase observations. These are used for altimetry and characterization of smooth surfaces (such as ice and inland water). Derived Level 1 and Level 2 products are available, as well as some special Level 0 raw intermediate frequency (IF) data. Historical grazing angle GNSS-R data are available from May 2019 to the present, while near-nadir GNSS-R data are available from December 2020 to the present.

    Polarimetric Radio Occultation (PRO) measurements
    Temporal coverage: 15 May 2023 to 30 November 2023
    Spatial coverage: Global
    Description: PRO measurements observe the properties of GNSS signals as they pass through Earth's atmosphere, similar to regular RO measurements. The polarization state of the signals is recorded separately for H and V polarizations to provide information on the anisotropy of hydrometeors along the propagation path.
    Data format and content:
    - leoOrb.sp3: estimated position, velocity and receiver clock error of a given Spire satellite after processing of the POD observation file.
    - proObs (Level 0): raw open loop carrier phase measurements at 50 Hz sampling for both linear polarization components (horizontal and vertical) of the occulted GNSS signal.
    - h(v)(c)atmPhs (Level 1B): atmospheric excess phase delay computed for each individual linear polarization component (hatmPhs, vatmPhs) and for the combined (“H” + “V”) signal (catmPhs). Also contains values for signal-to-noise ratio, transmitter and receiver positions and open loop model information.
    - polPhs (Level 1C): combines the information from the hatmPhs and vatmPhs files while removing phase discontinuities due to phase wrapping and navigation bit modulation.
    - patmPrf (Level 2): bending angle, dry refractivity, and dry temperature as a function of mean sea level altitude and impact parameter, derived from the “combined” excess phase delay (catmPhs).
    Application: PRO measurements add a sensitivity to ice and precipitation content alongside the traditional RO measurements of atmospheric temperature, pressure, and water vapor.

    Near-Nadir GNSS Reflectometry (NN GNSS-R) measurements
    Temporal coverage: 25 January 2024 to 24 July 2024
    Spatial coverage: Global
    Description: Tracks of surface reflections as observed by the near-nadir pointing GNSS-R antennas, based on Delay Doppler Maps (DDMs).
    Data format and content:
    - gbrRCS.nc (Level 1B): along-track calibrated bistatic radar cross-sections measured by Spire conventional GNSS-R satellites.
    - gbrNRCS.nc (Level 1B): along-track calibrated bistatic and normalized radar cross-sections measured by Spire conventional GNSS-R satellites.
    - gbrSSM.nc (Level 2): along-track SNR, reflectivity, and retrievals of soil moisture (and associated uncertainties) and probability of frozen ground.
    - gbrOcn.nc (Level 2): along-track retrievals of mean square slope (MSS) of the sea surface, wind speed, sigma0, and associated uncertainties.
    Application: NN GNSS-R measurements are used to measure ocean surface winds and characterize land surfaces for applications such as soil moisture, freeze/thaw monitoring, flooding detection, inland water body delineation, sea ice classification, etc.

    Grazing angle GNSS Reflectometry (GA GNSS-R) measurements
    Temporal coverage: 25 January 2024 to 24 July 2024
    Spatial coverage: Global
    Description: Tracks of surface reflections as observed by the limb-facing RO antennas, based on open-loop tracking outputs: 50 Hz collections of accumulated I/Q observations.
    Data format and content:
    - grzRfl.nc (Level 1B): along-track SNR, reflectivity, phase delay (with respect to an open loop model) and low-level observables and bistatic radar geometries such as receiver, specular reflection, and transmitter locations.
    - grzIce.nc (Level 2): along-track water vs sea ice classification, along with sea ice type classification.
    - grzAlt.nc (Level 2): along-track phase-delay, ionosphere-corrected altimetry, tropospheric delay, and ancillary models (mean sea surface, tides).
    Application: GA GNSS-R measurements are used to 1) characterize land surfaces for applications such as sea ice classification, freeze/thaw monitoring, inland water body detection and delineation, etc., and 2) measure relative altimetry with dm-level precision for inland water bodies, river slopes, sea ice freeboard, etc., as well as water vapor characterization based on tropospheric delays.

    Additionally, the following products (detailed further in the ToA) can be requested, but acceptance is not guaranteed and shall be evaluated on a case-by-case basis:
    - Other STRATOS measurements: profiles of the Earth’s atmosphere and ionosphere, from December 2018
    - ADS-B Data Stream: monthly subscription to global ADS-B satellite data, available from December 2018
    - AIS messages: AIS messages observed from Spire satellites (S-AIS) and terrestrial from partner sensor stations (T-AIS), monthly subscription available from June 2016

    The products are available as part of the Spire provision with worldwide coverage. All details about the data provision, data access conditions and quota assignment procedure are described in the Terms of Applicability.

  9. Data from: Data and code from: Stem borer herbivory dependent on...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1 more
    Updated Aug 2, 2024
    + more versions
    Cite
    Agricultural Research Service (2024). Data and code from: Stem borer herbivory dependent on interactions of sugarcane variety, associated traits, and presence of prior borer damage [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-stem-borer-herbivory-dependent-on-interactions-of-sugarcane-variety-ass-1e076
    Explore at:
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Agricultural Research Service
    Description

    This dataset contains all the data and code needed to reproduce the analyses in the manuscript: Penn, H. J., & Read, Q. D. (2023). Stem borer herbivory dependent on interactions of sugarcane variety, associated traits, and presence of prior borer damage. Pest Management Science. https://doi.org/10.1002/ps.7843

    Included are two .Rmd notebooks containing all code required to reproduce the analyses in the manuscript, two .html files of rendered notebook output, three .csv data files that are loaded and analyzed, and a .zip file of intermediate R objects that are generated during the model fitting and variable selection process.

    Notebook files
    - 01_boring_analysis.Rmd: This RMarkdown notebook contains R code to read and process the raw data, create exploratory data visualizations and tables, fit a Bayesian generalized linear mixed model, extract output from the statistical model, and create graphs and tables summarizing the model output, including marginal means for different varieties and contrasts between crop years.
    - 02_trait_covariate_analysis.Rmd: This RMarkdown notebook contains R code to read raw variety-level trait data, perform feature selection based on correlations between traits, fit another generalized linear mixed model using traits as predictors, and create graphs and tables from that model output, including marginal means by categorical trait and marginal trends by continuous trait.

    HTML files
    These HTML files contain the rendered output of the two RMarkdown notebooks. They were generated by Quentin Read on 2023-08-30 and 2023-08-15.
    - 01_boring_analysis.html
    - 02_trait_covariate_analysis.html

    CSV data files
    These files contain the raw data. To recreate the notebook output, the CSV files should be at the file path project/data/ relative to where the notebook is run. Columns are described below.
    - BoredInternodes_26April2022_no format.csv: primary data file with sugarcane borer (SCB) damage. Columns A-C are the year, date, and location (all location values are the same). Column D identifies which experiment the data point was collected from. Column E, Stubble, indicates the crop year (plant cane or first stubble). Column F indicates the variety. Column G indicates the plot (integer ID). Column H indicates the stalk within each plot (integer ID). Column I, # Internodes, indicates how many internodes were on the stalk. Columns J-AM are numbered 1-30 and indicate whether SCB damage was observed on that internode (0 if no, 1 if yes, blank cell if that internode was not present on the stalk). Column AN indicates the experimental treatment for those rows that are part of a manipulative experiment. Column AO contains notes.
    - variety_lookup.csv: summary information for the 16 varieties analyzed in this study. Column A is the variety name. Column B is the total number of stalks assessed for SCB damage for that variety across all years. Column C is the number of years that variety is present in the data. Column D, Stubble, indicates which crop years were sampled for that variety ("PC" if only plant cane, "PC, 1S" if there are data for both plant cane and first stubble crop years). Column E, SCB resistance, is a categorical designation with four values: susceptible, moderately susceptible, moderately resistant, resistant. Column F is the literature reference for the SCB resistance value.
    - Select_variety_traits_12Dec2022.csv: variety-level traits for the 16 varieties analyzed in this study. Column A is the variety name. Column B is the SCB resistance designation as an integer. Column C is the categorical SCB resistance designation (see above). Columns D-I are continuous traits from year 1 (plant cane), including sugar (Mg/ha), biomass or aboveground cane production (Mg/ha), TRS or theoretically recoverable sugar (g/kg), stalk weight of individual stalks (kg), stalk population density (stalks/ha), and fiber content of stalk (percent). Columns J-O are the same continuous traits from year 2 (first stubble). Columns P-V are categorical traits (in some cases continuous traits binned into categories): maturity timing, amount of stalk wax, amount of leaf sheath wax, amount of leaf sheath hair, tightness of leaf sheath, whether leaf sheath becomes necrotic with age, and amount of collar hair.

    ZIP file of intermediate R objects
    To recreate the notebook output without having to run computationally intensive steps, unzip the archive. The fitted model objects should be at the file path project/ relative to where the notebook is run.
    - intermediate_R_objects.zip: This file contains intermediate R objects that are generated during the model fitting and variable selection process. You may use the R objects in the .zip file if you would like to reproduce final output including figures and tables without having to refit the computationally intensive statistical models. Contents: binom_fit_intxns_updated_only5yrs.rds (fitted brms model object for the main statistical model), binom_fit_reduced.rds (fitted brms model object for the trait covariate analysis), marginal_trends.RData (calculated values of the estimated marginal trends with respect to year and previous damage), marginal_trend_trs.rds (calculated values of the estimated marginal trend with respect to TRS), and marginal_trend_fib.rds (calculated values of the estimated marginal trend with respect to fiber content).

    Resources in this dataset:
    - Resource Title: Sugarcane borer damage data by internode, 1993-2021. File Name: BoredInternodes_26April2022_no format.csv
    - Resource Title: Summary information for the 16 sugarcane varieties analyzed. File Name: variety_lookup.csv
    - Resource Title: Variety-level traits for the 16 sugarcane varieties analyzed. File Name: Select_variety_traits_12Dec2022.csv
    - Resource Title: RMarkdown notebook 2: trait covariate analysis. File Name: 02_trait_covariate_analysis.Rmd
    - Resource Title: Rendered HTML output of notebook 2. File Name: 02_trait_covariate_analysis.html
    - Resource Title: RMarkdown notebook 1: main analysis. File Name: 01_boring_analysis.Rmd
    - Resource Title: Rendered HTML output of notebook 1. File Name: 01_boring_analysis.html
    - Resource Title: Intermediate R objects. File Name: intermediate_R_objects.zip

  10. Data for: A systematic review showed no performance benefit of machine...

    • data.mendeley.com
    • search.datacite.org
    Updated Mar 14, 2019
    Cite
    Ben Van Calster (2019). Data for: A systematic review showed no performance benefit of machine learning over logistic regression for clinical prediction models [Dataset]. http://doi.org/10.17632/sypyt6c2mc.1
    Explore at:
    Dataset updated
    Mar 14, 2019
    Authors
    Ben Van Calster
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The uploaded files are:

    1) Excel file containing 6 sheets, in the following order: "Data Extraction" (summarized final data extractions from the three reviewers involved), "Comparison Data" (data related to the comparisons investigated), "Paper level data" (summaries at paper level), "Outcome Event Data" (information on the number of events for every outcome investigated within a paper), and "Tuning Classification" (data related to the manner of hyperparameter tuning of the Machine Learning algorithms).

    2) R script used for the analysis. To read the data, save the "Comparison Data", "Paper level data", and "Outcome Event Data" Excel sheets as txt files. In the R script, srpap refers to the "Paper level data" sheet, srevents refers to the "Outcome Event Data" sheet, and srcompx refers to the "Comparison Data" sheet (a minimal loading sketch is shown after this list).

    3) Supplementary material, including the search string, tables of data, and figures.

    4) PRISMA checklist items

  11. R Program - Claims-Based Frailty Index

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Sep 25, 2024
    Cite
    Bedell, Douglas (2024). R Program - Claims-Based Frailty Index [Dataset]. http://doi.org/10.7910/DVN/4Y3Y23
    Explore at:
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Bedell, Douglas
    Description

    This R program calculates CFI for each patient from analytic data files containing information on patient identifiers, ICD-9-CM diagnosis codes (version 32), ICD-10-CM Diagnosis Codes (version 2020), CPT codes, and HCPCS codes. NOTE: When downloading, store "CFI_ICD9CM_V32.tab" and "CFI_ICD10CM_V2020.tab" as csv files (these files are originally stored as csv files, but Dataverse automatically converts them to tab files). Please read "Frailty-Index-R-code-Guide" before proceeding. Interpretation, validation data, and annotated references are provided in "Research Background - Claims-Based Frailty Index".

  12. Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...

    • openicpsr.org
    Updated May 18, 2018
    + more versions
    Cite
    Jacob Kaplan (2018). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1991-2019 [Dataset]. http://doi.org/10.3886/E103500V7
    Explore at:
    Dataset updated
    May 18, 2018
    Dataset provided by
    University of Pennsylvania
    Authors
    Jacob Kaplan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1991 - 2019
    Area covered
    United States
    Description

!!!WARNING~~~ This dataset has a large number of flaws and is unable to properly answer many questions that people generally use it to answer, such as whether national hate crimes are changing (or at least they use the data so improperly that they get the wrong answer). A large number of people using this data (academics, advocates, reporters, US Congress) do so inappropriately and get the wrong answer to their questions as a result. Indeed, many published papers using this data should be retracted. Before using this data I highly recommend that you thoroughly read my book on UCR data, particularly the chapter on hate crimes (https://ucrbook.com/hate-crimes.html), as well as the FBI's own manual on this data. The questions you could potentially answer well are relatively narrow and generally exclude any causal relationships. ~~~WARNING!!!

    Version 8 release notes: Adds 2019 data.

    Version 7 release notes: Changes release notes description, does not change data.

    Version 6 release notes: Adds 2018 data.

    Version 5 release notes: Adds data in the following formats: SPSS, SAS, and Excel. Changes project name to avoid confusing this data for the ones done by NACJD. Adds data for 1991. Fixes bug where bias motivation "anti-lesbian, gay, bisexual, or transgender, mixed group (lgbt)" was labeled "anti-homosexual (gay and lesbian)" prior to 2013, causing there to be two columns and zero values for years with the wrong label. All data is now directly from the FBI, not NACJD. The data initially comes as ASCII+SPSS Setup files and is read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R.

    Version 4 release notes: Adds data for 2017. Adds rows that submitted a zero-report (i.e. that agency reported no hate crimes in the year) for all years 1992-2017. Makes changes to categorical variables (e.g. bias motivation columns) to make categories consistent over time; different years had slightly different names (e.g. 'anti-am indian' and 'anti-american indian') which are now consistent. Adds the 'population' column, which is the total population in that agency.

    Version 3 release notes: Adds data for 2016. Orders rows by year (descending) and ORI.

    Version 2 release notes: Fixes bug where Philadelphia Police Department had an incorrect FIPS county code.

    The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. Please note that the files are quite large and may take some time to open. Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9-character Originating Identifier code), and incident number columns. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating whether the victim of each offense was a certain type of victim or not (e.g. individual victim, business victim, religious victim, etc.).

    The only changes I made to the data are the following: minor changes to column names to make all column names 32 characters or fewer (so the data can be saved in a Stata format), making all character values lower case, and reordering columns. I also generated incident month, weekday, and month-day variables from the incident date variable included in the original data.
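    For orientation, here is a minimal R sketch of how the unique incident ID described above could be rebuilt from its components. The file name and the exact column names (year, ori9, incident_number) are assumptions for illustration only; the released files already contain the unique_id column.

    library(data.table)

    hate <- fread('hate_crimes_1991_2019.csv')   # hypothetical file name; the data is also released in R, SPSS, SAS, Stata and Excel formats
    hate[, unique_id_check := paste(year, ori9, incident_number, sep = '_')]   # separator is an assumption

    ## One row per hate crime incident per agency-year, as described above
    hate[, .N, by = .(ori9, year)][order(-year)]   # incidents reported per agency and year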

  13. Food and Agriculture Biomass Input–Output (FABIO) database

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 8, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kuschnig, Nikolas (2022). Food and Agriculture Biomass Input–Output (FABIO) database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2577066
    Explore at:
    Dataset updated
    Jun 8, 2022
    Dataset provided by
    Bruckner, Martin
    Kuschnig, Nikolas
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This data repository provides the Food and Agriculture Biomass Input Output (FABIO) database, a global set of multi-regional physical supply-use and input-output tables covering global agriculture and forestry.

    The work is based on mostly freely available data from FAOSTAT, IEA, EIA, and UN Comtrade/BACI. FABIO currently covers 191 countries + RoW, 118 processes and 125 commodities (raw and processed agricultural and food products) for 1986-2013. All R code and auxiliary data are available on GitHub. For more information please refer to https://fabio.fineprint.global.

    The database consists of the following main components, in compressed .rds format:

    Z: the inter-commodity input-output matrix, displaying the relationships of intermediate use of each commodity in the production of each commodity, in physical units (tons). The matrix has 24000 rows and columns (125 commodities x 192 regions), and is available in two versions, based on the method to allocate inputs to outputs in production processes: Z_mass (mass allocation) and Z_value (value allocation). Note that the row sums of the Z matrix (= total intermediate use by commodity) are identical in both versions.

    Y: the final demand matrix, denoting the consumption of all 24000 commodities by destination country and final use category. There are six final use categories (yielding 192 x 6 = 1152 columns): 1) food use, 2) other use (non-food), 3) losses, 4) stock addition, 5) balancing, and 6) unspecified.

    X: the total output vector of all 24000 commodities. Total output is equal to the sum of intermediate and final use by commodity.

    L: the Leontief inverse, computed as (I – A)^-1, where A is the matrix of input coefficients derived from Z and X. Again, there are two versions, depending on the underlying version of Z (L_mass and L_value).

    E: environmental extensions for each of the 24000 commodities, including four resource categories: 1) primary biomass extraction (in tons), 2) land use (in hectares), 3) blue water use (in m3), and 4) green water use (in m3).

    mr_sup_mass/mr_sup_value: For each allocation method (mass/value), the supply table gives the physical supply quantity of each commodity by producing process, with processes in the rows (118 processes x 192 regions = 22656 rows) and commodities in columns (24000 columns).

    mr_use: the use table captures the quantities of each commodity (rows) used as an input in each process (columns).

    A description of the included countries and commodities (i.e. the rows and columns of the Z matrix) can be found in the auxiliary file io_codes.csv. Separate lists of the country sample (including ISO3 codes and continental grouping) and commodities (including moisture content) are given in the files regions.csv and items.csv, respectively. For information on the individual processes, see auxiliary file su_codes.csv. RDS files can be opened in R. Information on how to read these files can be obtained here: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/readRDS

    Except for X.rds, which contains a matrix, all variables are organized as lists, where each element contains a sparse matrix. Please note that values are always given in physical units, i.e. tonnes or head, as specified in items.csv. The suffixes value and mass only indicate the form of allocation chosen for the construction of the symmetric IO tables (for more details see Bruckner et al. 2019). Product, process and country classifications can be found in the file fabio_classifications.xlsx.

    Footprint results are not contained in the database but can be calculated, e.g. by using this script: https://github.com/martinbruckner/fabio_comparison/blob/master/R/fabio_footprints.R
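    As a rough illustration of how the components described above fit together, the following R sketch computes a simple land-use footprint. It is a minimal sketch, not the reference script linked above: the file names (E.rds, L_mass.rds, Y.rds, X.rds), the use of year-named elements, and the extension column name ("landuse") are all assumptions to be checked against the actual download.

    library(Matrix)   # the FABIO tables are stored as sparse matrices

    year <- "2013"                          # hypothetical element name; check names(readRDS("Y.rds"))

    X <- readRDS("X.rds")[, year]           # total output per commodity (length 24000)
    L <- readRDS("L_mass.rds")[[year]]      # Leontief inverse, mass allocation (24000 x 24000)
    Y <- readRDS("Y.rds")[[year]]           # final demand (24000 x 1152)
    E <- readRDS("E.rds")[[year]]           # extensions per commodity (e.g. land use)

    e <- ifelse(X > 0, E[, "landuse"] / X, 0)   # direct land-use intensity per unit of output

    fp <- t(e %*% L %*% Y)                  # land-use footprint per final-demand column
    head(fp)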

    How to cite:

    To cite FABIO work please refer to this paper:

    Bruckner, M., Wood, R., Moran, D., Kuschnig, N., Wieland, H., Maus, V., Börner, J. 2019. FABIO – The Construction of the Food and Agriculture Input–Output Model. Environmental Science & Technology 53(19), 11302–11312. DOI: 10.1021/acs.est.9b03554

    License:

    This data repository is distributed under the CC BY-NC-SA 4.0 License. You are free to share and adapt the material for non-commercial purposes using proper citation. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. In case you are interested in a collaboration, I am happy to receive enquiries at martin.bruckner@wu.ac.at.

    Known issues:

    The underlying FAO data have been manipulated to the minimum extent necessary. Data filling and supply-use balancing, however, required some adaptations. These are documented in the code and are also reflected in the balancing item in the final demand matrices. For a proper use of the database, I recommend distributing the balancing item over all other uses proportionally and doing analyses with and without balancing to illustrate uncertainties.

  14. WoSIS snapshot - December 2023

    • search.dataone.org
    Updated Feb 5, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ISRIC – World Soil Information (2025). WoSIS snapshot - December 2023 [Dataset]. https://search.dataone.org/view/sha256%3Aae94fefb74f928a3d482eee20abf33cf04d988555ef2beef2977eba7d5504bd7
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    International Soil Reference and Information Centre
    Time period covered
    Jan 1, 1918 - Dec 1, 2022
    Area covered
    Pacific Ocean, North Pacific Ocean
    Description

    ABSTRACT: The World Soil Information Service (WoSIS) provides quality-assessed and standardized soil profile data to support digital soil mapping and environmental applications at broad scale levels. Since the release of the 'WoSIS snapshot 2019' many new soil data were shared with us, registered in the ISRIC data repository, and subsequently standardized in accordance with the licenses specified by the data providers. The source data were contributed by a wide range of data providers, therefore special attention was paid to the standardization of soil property definitions, soil analytical procedures and soil property values (and units of measurement). We presently consider the following soil chemical properties (organic carbon, total carbon, total carbonate equivalent, total Nitrogen, Phosphorus (extractable-P, total-P, and P-retention), soil pH, cation exchange capacity, and electrical conductivity) and physical properties (soil texture (sand, silt, and clay), bulk density, coarse fragments, and water retention), grouped according to analytical procedures (aggregates) that are operationally comparable. For each profile we provide the original soil classification (FAO, WRB, USDA, and version) and horizon designations as far as these have been specified in the source databases. Three measures for 'fitness-for-intended-use' are provided: positional uncertainty (for site locations), time of sampling/description, and a first approximation for the uncertainty associated with the operationally defined analytical methods. These measures should be considered during digital soil mapping and subsequent earth system modelling that use the present set of soil data.

    DATA SET DESCRIPTION: The 'WoSIS 2023 snapshot' comprises data for 228k profiles from 217k geo-referenced sites that originate from 174 countries. The profiles represent over 900k soil layers (or horizons) and over 6 million records. The actual number of measurements for each property varies (greatly) between profiles and with depth, generally depending on the objectives of the initial soil sampling programmes. The data are provided in TSV (tab separated values) format and as GeoPackage. The zip-file (446 Mb) contains the following files:

    - Readme_WoSIS_202312_v2.pdf: Provides a short description of the dataset, file structure, column names, units and category values (this file is also available directly under 'online resources'). The pdf includes links to tutorials for downloading the TSV files into R respectively Excel. See also 'HOW TO READ TSV FILES INTO R AND PYTHON' below.

    - wosis_202312_observations.tsv: Lists the four to six letter codes for each observation, whether the observation is for a site/profile or layer (horizon), the unit of measurement, and the number of profiles respectively layers represented in the snapshot. It also provides an estimate of the inferred accuracy for the laboratory measurements.

    - wosis_202312_sites.tsv: Characterizes the site location where profiles were sampled.

    - wosis_202312_profiles.tsv: Presents the unique profile ID (i.e. primary key), site_id, source of the data, country ISO code and name, positional uncertainty, latitude and longitude (WGS 1984), maximum depth of soil described and sampled, as well as information on the soil classification system and edition. Depending on the soil classification system used, the number of fields will vary.
    - wosis_202312_layers.tsv: Characterises the layers (or horizons) per profile, and lists their upper and lower depths (cm).

    - wosis_202312_xxxx.tsv: This type of file presents results for each observation (e.g. "xxxx" = "BDFIOD"), as defined under "code" in the file wosis_202312_observations.tsv (e.g. wosis_202312_bdfiod.tsv).

    - wosis_202312.gpkg: Contains the above datafiles in GeoPackage format (which stores the files within an SQLite database).

    HOW TO READ TSV FILES INTO R AND PYTHON:

    A) To read the data in R, please uncompress the ZIP file and set the working directory to the uncompressed folder.

    setwd("/YourFolder/WoSIS_2023_December/")   ## For example: setwd('D:/WoSIS_2023_December/')

    Then use read_tsv to read the TSV files, specifying the data types for each column (c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time).

    observations = readr::read_tsv('wosis_202312_observations.tsv', col_types='cccciid')
    observations   ## show columns and first 10 rows
    sites = readr::read_tsv('wosis_202312_sites.tsv', col_types='iddcccc')
    sites
    profiles = readr::read_tsv('wosis_202312_profiles.tsv', col_types='icciccddcccccciccccicccci')
    profiles
    layers = readr::read_tsv('wosis_202312_layers.tsv', col_types='iiciciiilcc')
    layers

    ## Do this for each observation 'XXXX', e.g. file 'wosis_202312_orgc.tsv':
    orgc = readr::read_tsv('wosis_202312_orgc.tsv', col_types='...

    Visit https://dataone.org/datasets/sha256%3Aae94fefb74f928a3d482eee20abf33cf04d988555ef2beef2977eba7d5504bd7 for complete metadata about this dataset.
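    Building on the snippet above, the following R sketch joins the layer depths onto one property file and computes a per-profile summary. It is only a sketch: the join keys (profile_id, layer_id) and the value column name are assumptions based on the description and should be checked against the column names in the actual TSV files.

    library(readr)
    library(dplyr)

    layers <- read_tsv('wosis_202312_layers.tsv', col_types = 'iiciciiilcc')
    orgc   <- read_tsv('wosis_202312_orgc.tsv')          # organic carbon records

    ## Attach upper/lower depths to each organic-carbon measurement
    ## (join keys profile_id and layer_id are assumed column names)
    orgc_layers <- left_join(orgc, layers, by = c('profile_id', 'layer_id'))

    ## Example summary: mean organic carbon per profile (value column name assumed)
    orgc_layers |>
      group_by(profile_id) |>
      summarise(mean_orgc = mean(value, na.rm = TRUE))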

  15. Data from: Mars Analysis Correction Data Assimilation (MACDA): MGS/TES v1.0...

    • catalogue.ceda.ac.uk
    • data-search.nerc.ac.uk
    Updated Jan 19, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Luca Montabone; Stephen R. Lewis; Peter L. Read (2022). Mars Analysis Correction Data Assimilation (MACDA): MGS/TES v1.0 Reference Run Data [Dataset]. https://catalogue.ceda.ac.uk/uuid/acdfa050673c46d49d6a35bfa482762b
    Explore at:
    Dataset updated
    Jan 19, 2022
    Dataset provided by
    Centre for Environmental Data Analysishttp://www.ceda.ac.uk/
    Authors
    Luca Montabone; Stephen R. Lewis; Peter L. Read
    License

    Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Time period covered
    Apr 30, 1999 - Aug 30, 2004
    Area covered
    Earth
    Variables measured
    time, latitude, longitude, Martian year, eastward_wind, northward_wind, Solar longitude, air_temperature, solar_longitude, Surface pressure, and 12 more
    Description

    This dataset contains basic gridded atmospheric and surface variables for the planet Mars over three martian years (a martian year is 1.88 terrestrial years), produced as a reference run in association with the Mars Analysis Correction Data Assimilation (MACDA) v1.0 re-analysis. Each file in the dataset spans 30 martian mean solar days (sols) during the science mapping phase of the National Aeronautics and Space Administration's (NASA) Mars Global Surveyor (MGS) spacecraft, between May 1999 and August 2004.

    This dataset is a reference run produced by re-analysis of Thermal Emission Spectrometer (TES) retrievals of only total dust opacities, using the MACDA scheme in a Mars global circulation model (MGCM). This reference dataset, therefore, should be used in association with the full re-analysis of TES retrievals of nadir thermal profiles and total dust opacities - see linked dataset.

    The MGCM used is the UK spectral version of the model developed by the Laboratoire de Météorologie Dynamique in Paris, France.

    MACDA is a collaboration between the University of Oxford and The Open University in the UK.
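    For readers who want to inspect the files in R, the sketch below opens one of the NetCDF files with the ncdf4 package and reads two of the variables listed under 'Variables measured' above. The file name is hypothetical; use the actual file names from the download.

    library(ncdf4)

    nc <- nc_open('macda_mgs_tes_v1.0_example.nc')   # hypothetical file name
    print(nc)                                        # lists dimensions, variables and attributes
    temp     <- ncvar_get(nc, 'air_temperature')     # gridded air temperature
    sol_long <- ncvar_get(nc, 'solar_longitude')     # solar longitude (seasonal marker)
    nc_close(nc)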

  16. Ultra high-density 255-channel EEG-AAD dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 13, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bertrand, Alexander (2024). Ultra high-density 255-channel EEG-AAD dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4518753
    Explore at:
    Dataset updated
    Jun 13, 2024
    Dataset provided by
    Bertrand, Alexander
    Zink, Rob
    Mundanad Narayanan, Abhijith
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    If using this dataset, please cite the following paper and the current Zenodo repository: A. Mundanad Narayanan, R. Zink, and A. Bertrand, "EEG miniaturization limits for stimulus decoding with EEG sensor networks", Journal of Neural Engineering, vol. 18, 2021, doi: 10.1088/1741-2552/ac2629

    Experiment

    This dataset contains 255-channel electroencephalography (EEG) data collected during an auditory attention decoding (AAD) experiment. The EEG was recorded using a SynAmps RT device (Compumedics, Australia) at a sampling rate of 1 kHz, using active Ag/Cl electrodes placed on the head according to the international 10-5 (5%) system. Thirty normal-hearing male subjects between 22 and 35 years old participated in the experiment. All of them signed an informed consent form approved by the KU Leuven ethical committee.

    Two Dutch stories, each narrated by a different male speaker and divided into two parts of 6 minutes each, were used as the stimuli in the experiment [1]. A single trial of the experiment involved the presentation of two such parts (one from each story) to the subject through insert phones (Etymotic ER3A) at 60 dBA. The speech stimuli were filtered using a head-related transfer function (HRTF) so that the stories appeared to arrive from two distinct spatial locations, namely left and right of the subject, separated by 180 degrees. In each trial, the subjects were asked to attend to only one ear while ignoring the other. Four trials of 6 minutes each were carried out, in which each story part was used twice. The order of presentations was randomized and balanced over different subjects. Thus approximately 24 minutes of EEG data was recorded per subject.

    File organization and details

    The EEG data of each of the 30 subjects are uploaded as a ZIP file named Sx.tar.gzip, where x = 0, 1, 2, ..., 29. When an archive is extracted, the EEG data are in their original raw format as recorded by the CURRY software [2]. The data files of each recording consist of four files with the same name but different extensions, namely .dat, .dap, .rs3 and .ceo. The name of each file follows the convention Sx_AAD_P, with P taking one of the following values: 1L, 1R, 2L, or 2R.

    The letter 'L' or 'R' in P indicates the attended direction of the subject in that recording: left or right, respectively. A MATLAB function to read these files is provided in the directory called scripts. A Python function to read the files is available in the GitHub repository [3]. The original version of the stimuli presented to subjects, i.e. without the HRTF filtering, can be found after extracting the stimuli.zip file, in WAV format. There are 4 WAV files corresponding to the two parts of each of the two stories. These files have been sampled at 44.1 kHz. The order of presentation of these WAV files, together with the attended direction, is given in the table below.

    Trial (P)   Stimuli: Left-ear    Stimuli: Right-ear   Attention
    1L          part1_track1_dry     part1_track2_dry     Left
    1R          part1_track1_dry     part1_track2_dry     Right
    2L          part2_track2_dry     part2_track1_dry     Left
    2R          part2_track2_dry     part2_track1_dry     Right

    Additional files (after extracting scripts.zip and misc.zip):

    scripts/sample_script.m: Demonstrates reading an EEG-AAD recording and extracting the start and end of the experiment.

    misc/channel-layout.jpeg: The 255-channel EEG cap layout

    misc/eeg255ch_locs.csv: The channel names, numbers and their spherical (theta and phi) scalp coordinates.
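    As a small example of working with the auxiliary files, the R sketch below reads misc/eeg255ch_locs.csv and projects the spherical theta/phi coordinates onto a flat 2-D layout for a quick plot. The column names and the projection convention are assumptions; consult the channel-layout image and the CSV header before relying on them.

    locs <- read.csv('misc/eeg255ch_locs.csv')

    ## Simple polar projection: radius from theta (angle from the vertex, in degrees),
    ## azimuth from phi (column names 'theta' and 'phi' are assumed)
    r <- locs$theta / 90
    x <- r * cos(locs$phi * pi / 180)
    y <- r * sin(locs$phi * pi / 180)

    plot(x, y, pch = 20, cex = 0.5, asp = 1, xlab = '', ylab = '',
         main = '255-channel layout (approximate projection)')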

    [1] Radioboeken voor kinderen, http://radioboeken.eu/kinderradioboeken.php?lang=NL, 2007 (Accessed: 8 Feb 2021)

    [2] CURRY 8 X – Data Acquisition and Online Processing, https://compumedicsneuroscan.com/product/curry-data-acquisition-online-processing-x/ (Accessed: 8, Feb, 2021)

    [3] Abhijith Mundanad Narayanan, "EEG analysis in python", 2021. https://github.com/mabhijithn/eeg-analyse , (Accessed: 8 Feb, 2021)

  17. GAL Predictions of receptor impact variables v01

    • researchdata.edu.au
    • data.gov.au
    • +1more
    Updated Dec 9, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bioregional Assessment Program (2018). GAL Predictions of receptor impact variables v01 [Dataset]. https://researchdata.edu.au/gal-predictions-receptor-variables-v01/2989405
    Explore at:
    Dataset updated
    Dec 9, 2018
    Dataset provided by
    data.gov.au
    Authors
    Bioregional Assessment Program
    License

    Attribution 2.5 (CC BY 2.5)https://creativecommons.org/licenses/by/2.5/
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    Receptor impact models (RIMs) are developed for specific landscape classes. The prediction of receptor impact variables is a multi-stage process. It relies on the runs from surface water and groundwater models at nodes within the analysis extent. These outputs derive directly from the hydrological model. For a given node, there is a value for each combination of hydrological response variable, future, and replicate or run number. Not all variables may be available or appropriate at every node. This differs from the quantile summary information that is otherwise used to summarise the HRV output and is also registered.

    Dataset History

    There is a key look-up table (Excel file) that lists the assessment units (AUIDs) by landscape class (or landscape group if appropriate) and notes the groundwater modelling node and runs, and the surface water modelling node and runs, that should be used for that AUID. In some cases the AUID is only mapped to one set of hydrological modelling output. This look-up table represents the AUIDs that require RIV predictions. For NAM and GAL there is a single look-up table. For GLO and HUN, surface water and groundwater are provided separately.

    Receptor impact models (RIMs) are developed for specific landscape classes. The hydrological response variables that a RIM within a landscape class requires are organised by the R script RIM_Prediction_CreateArray.R into an array. The formatted data are available in the R data file format RDS and can be read directly into R.

    The R script IMIA_XXX_RIM_predictions.R applies the receptor model functions (RDS object as part of Data set 1: Ecological expert elicitation and receptor impact models for the XXX subregion) to the HRV array for each landscape class (or landscape group) to make predictions of receptor impact variables (RIVs). Predictions of a receptor impact from a RIM for a landscape class are summarised at relevant AUIDs by the 5th through to the 95th percentiles (in 5% increments) for baseline and CRDP futures. These are available in the XXX_RIV_quantiles_IMIA.csv data set. RIV predictions are further summarised and compared as boxplots (using the R script boxplotsbyfutureperiod.R) and as (aggregated) spatial risk maps using GIS.
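    As an illustration of the percentile summary described above, the R sketch below computes the 5th to 95th percentiles (in 5% increments) of RIV predictions for each assessment unit and future. The object and column names (riv_pred, auid, future, riv) are hypothetical stand-ins, not the names used in the IMIA scripts.

    probs <- seq(0.05, 0.95, by = 0.05)

    ## riv_pred is assumed to hold one RIV prediction per replicate, AUID and future
    riv_quantiles <- do.call(rbind, lapply(
      split(riv_pred, list(riv_pred$auid, riv_pred$future), drop = TRUE),
      function(d) data.frame(auid = d$auid[1], future = d$future[1],
                             t(quantile(d$riv, probs = probs, na.rm = TRUE)),
                             check.names = FALSE)))

    write.csv(riv_quantiles, 'RIV_quantiles_example.csv', row.names = FALSE)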

    Dataset Citation

    Bioregional Assessment Programme (2018) GAL Predictions of receptor impact variables v01. Bioregional Assessment Derived Dataset. Viewed 10 December 2018, http://data.bioregionalassessments.gov.au/dataset/67e0aec1-be25-46f5-badc-b4d895a934aa.

    Dataset Ancestors

    * Derived From Queensland wetland data version 3 - wetland areas
    * Derived From Geofabric Surface Cartography - V2.1
    * Derived From Landscape classification of the Galilee preliminary assessment extent
    * Derived From Geofabric Surface Cartography - V2.1.1
    * Derived From GAL Landscape Class Reclassification for impact and risk analysis 20170601
    * Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
    * Derived From Queensland groundwater dependent ecosystems
    * Derived From GEODATA TOPO 250K Series 3
    * Derived From Multi-resolution Valley Bottom Flatness MrVBF at three second resolution CSIRO 20000211
    * Derived From Landscape classification of the Galilee preliminary assessment extent
    * Derived From Biodiversity status of pre-clearing and remnant regional ecosystems - South East Qld

  18. ESG rating of general stock indices

    • narcis.nl
    • data.mendeley.com
    Updated Oct 22, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Erhart, S (via Mendeley Data) (2021). ESG rating of general stock indices [Dataset]. http://doi.org/10.17632/58mwkj5pf8.1
    Explore at:
    Dataset updated
    Oct 22, 2021
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Erhart, S (via Mendeley Data)
    Description
    ##################################################################################################
    # THE FILES HAVE BEEN CREATED BY SZILÁRD ERHART FOR THE RESEARCH: ERHART (2021): ESG RATINGS OF GENERAL
    # STOCK EXCHANGE INDICES, INTERNATIONAL REVIEW OF FINANCIAL ANALYSIS
    # USERS OF THE FILES AGREE TO QUOTE THE ABOVE PAPER
    # THE PYTHON SCRIPT (PYTHONESG_ERHART.TXT) HELPS USERS TO GET TICKERS BY STOCK EXCHANGES AND EXTRACT ESG SCORES FOR THE UNDERLYING STOCKS FROM YAHOO FINANCE.
    # THE R SCRIPT (ESG_UA.TXT) HELPS TO REPLICATE THE MONTE CARLO EXPERIMENT DETAILED IN THE STUDY.
    # THE EXPORT_ALL CSV CONTAINS THE DOWNLOADED ESG DATA (SCORES, CONTROVERSIES, ETC.) ORGANIZED BY STOCKS AND EXCHANGES.
    ##################################################################################################
    # DISCLAIMER
    # The author takes no responsibility for the timeliness, accuracy, completeness or quality of the information provided.
    # The author is in no event liable for damages of any kind incurred or suffered as a result of the use or non-use of the information presented or the use of defective or incomplete information.
    # The contents are subject to confirmation and not binding.
    # The author expressly reserves the right to alter, amend, whole and in part, without prior notice, or to discontinue publication for a period of time or even completely.
    ##################################################################################################
    # READ ME
    # BEFORE USING THE MONTE CARLO SIMULATIONS SCRIPT:
    # (1) COPY THE goalscores.csv AND goalscores_alt.csv FILES ONTO YOUR OWN COMPUTER DRIVE. THE TWO FILES ARE IDENTICAL.
    # (2) SET THE EXACT FILE LOCATION INFORMATION IN THE 'Read in data' SECTION OF THE MONTE CARLO SCRIPT AND FOR THE OUTPUT FILES AT THE END OF THE SCRIPT.
    # (3) LOAD MISC TOOLS AND MATRIXSTATS IN YOUR R APPLICATION.
    # (4) RUN THE CODE.
    ##################################################################################################
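    For readers without access to the original scripts, the R sketch below illustrates the general shape of a Monte Carlo weighting experiment on composite ESG scores. It is not the author's ESG_UA.TXT script: the input layout, the pillar column names, and the uniform re-weighting scheme are assumptions made for illustration only.

    scores  <- read.csv('goalscores.csv')                       # assumed layout: one row per index
    pillars <- c('environment', 'social', 'governance')         # hypothetical column names

    set.seed(1)
    n_sim <- 10000
    w <- matrix(runif(n_sim * length(pillars)), ncol = length(pillars))
    w <- w / rowSums(w)                                         # random weights summing to 1

    ## Composite score of the first index under each simulated weight vector
    composite <- as.vector(w %*% t(as.matrix(scores[1, pillars])))
    quantile(composite, c(0.05, 0.50, 0.95))                    # uncertainty band of the composite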
  19. Data visualisation using R, for researchers who don't use R

    • osf.io
    Updated May 22, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Emily Nordmann; Phil McAleer; Wilhelmiina Toivo; Helena Paterson; Lisa DeBruine (2023). Data visualisation using R, for researchers who don't use R [Dataset]. https://osf.io/bj83f
    Explore at:
    Dataset updated
    May 22, 2023
    Dataset provided by
    Center for Open Sciencehttps://cos.io/
    Authors
    Emily Nordmann; Phil McAleer; Wilhelmiina Toivo; Helena Paterson; Lisa DeBruine
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Materials and additional resources for the data visualisation tutorial. Please read the wiki for more information.

  20. Date Read

    • opencontext.org
    Updated Nov 28, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    David K. Pettegrew; William R. Caraher; R. Scott Moore (2021). Date Read [Dataset]. https://opencontext.org/predicates/076a9d4f-f2c4-40ed-d60d-b2551f1d0bb9
    Explore at:
    Dataset updated
    Nov 28, 2021
    Dataset provided by
    Open Context
    Authors
    David K. Pettegrew; William R. Caraher; R. Scott Moore
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An Open Context "predicates" dataset item. Open Context publishes structured data as granular, URL identified Web resources. This "Variables" record is part of the "Pyla-Koutsopetria Archaeological Project I: Pedestrian Survey" data publication.
