100+ datasets found
  1. R code used to estimate public supply consumptive water use

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Aug 29, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). R code used to estimate public supply consumptive water use [Dataset]. https://catalog.data.gov/dataset/r-code-used-to-estimate-public-supply-consumptive-water-use
    Explore at:
    Dataset updated
    Aug 29, 2024
    Dataset provided by
    U.S. Geological Survey
    Description

    This child item describes R code used to determine public supply consumptive use estimates. Consumptive use was estimated by scaling an assumed fraction of deliveries used for outdoor irrigation by spatially explicit estimates of evaporative demand, using estimated domestic and commercial, industrial, and institutional deliveries from the public supply delivery machine learning model child item. This method scales public supply water service area outdoor water use by the relationship between service area gross reference evapotranspiration provided by GridMET and annual continental U.S. (CONUS) growing season maximum evapotranspiration. This relationship to climate at the CONUS scale could result in over- or under-estimation of consumptive use at public supply service areas where local variations differ from national variations in climate. This method also assumes that 50% of total domestic and commercial, industrial, and institutional deliveries is used for outdoor purposes. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000 to 2020. This page includes the following file: PS_ConsumptiveUse.zip - a zip file containing input datasets, scripts, and output datasets
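    A minimal sketch of the scaling logic described above (this is not the USGS code in PS_ConsumptiveUse.zip; column names and values are hypothetical):

    library(dplyr)

    deliveries <- tibble::tibble(
      wsa_id                   = c("WSA-001", "WSA-002"),
      dom_cii_deliveries_mgd   = c(12.4, 3.1),   # estimated domestic + CII deliveries
      service_area_ref_et_mm   = c(1100, 860),   # gross reference ET for the service area (GridMET)
      conus_grow_season_max_mm = 1400            # CONUS growing-season maximum ET
    )

    outdoor_fraction <- 0.5  # assumed share of deliveries used outdoors

    consumptive_use <- deliveries %>%
      mutate(
        outdoor_mgd     = outdoor_fraction * dom_cii_deliveries_mgd,
        et_ratio        = service_area_ref_et_mm / conus_grow_season_max_mm,
        consumptive_mgd = outdoor_mgd * et_ratio
      )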

  2. R codes and dataset for Visualisation of Diachronic Constructional Change...

    • bridges.monash.edu
    • researchdata.edu.au
    zip
    Updated May 30, 2023
    Cite
    Gede Primahadi Wijaya Rajeg (2023). R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart [Dataset]. http://doi.org/10.26180/5c844c7a81768
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Publication: Primahadi Wijaya R., Gede. 2014. Visualisation of diachronic constructional change using Motion Chart. In Zane Goebel, J. Herudjati Purwoko, Suharno, M. Suryadi & Yusuf Al Aried (eds.). Proceedings: International Seminar on Language Maintenance and Shift IV (LAMAS IV), 267-270. Semarang: Universitas Diponegoro. doi: https://doi.org/10.4225/03/58f5c23dd8387

    Description of R codes and data files in the repository: This repository is imported from its GitHub repo. Versioning of this figshare repository is associated with the GitHub repo's Releases, so check the Releases page for updates (the next version is to include the unified version of the codes in the first release with the tidyverse).

    The raw input data consist of two files (i.e. will_INF.txt and go_INF.txt). They represent the co-occurrence frequency of the top-200 infinitival collocates for will and be going to respectively across the twenty decades of the Corpus of Historical American English (from the 1810s to the 2000s).

    These two input files are used in the R code file 1-script-create-input-data-raw.r. The code preprocesses and combines the two files into a long-format data frame with the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (frequency of the collocates with be going to) and (iv) will (frequency of the collocates with will); it is available in input_data_raw.txt. The script 2-script-create-motion-chart-input-data.R then processes input_data_raw.txt to normalise the co-occurrence frequency of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output from the second script is input_data_futurate.txt.

    Next, input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).

    The repository adopts the project-oriented workflow in RStudio; double-click the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.
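    A minimal sketch of the per-million-words normalisation performed by the second script (this is not the repository's own code; the corpus sizes shown are illustrative only, the real values live in coha_size.txt):

    library(dplyr)

    raw <- tibble::tibble(
      decade      = c("1810s", "1810s", "2000s"),
      coll        = c("know", "say", "know"),
      will        = c(120, 45, 300),   # raw co-occurrence frequency with will
      be_going_to = c(3, 10, 150)      # raw co-occurrence frequency with be going to
    )

    coha_size <- tibble::tibble(
      decade = c("1810s", "2000s"),
      words  = c(1200000, 29500000)    # illustrative decade sizes
    )

    normalised <- raw %>%
      left_join(coha_size, by = "decade") %>%
      mutate(
        will_pmw        = will / words * 1e6,
        be_going_to_pmw = be_going_to / words * 1e6
      )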

  3. R code that determines groundwater and surface water source fractions for...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Aug 29, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). R code that determines groundwater and surface water source fractions for public-supply water service areas, counties, and 12-digit hydrologic units [Dataset]. https://catalog.data.gov/dataset/r-code-that-determines-groundwater-and-surface-water-source-fractions-for-public-supply-wa
    Explore at:
    Dataset updated
    Aug 29, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This child item describes R code used to determine water source fractions (groundwater (GW), surface water (SW), or spring (SP)) for public-supply water service areas, counties, and 12-digit hydrologic unit codes (HUC12) using information from a proprietary dataset from the U.S. Environmental Protection Agency. Water-use volumes per source were not available from public-supply systems, so water source fractions were calculated from the number of withdrawal sources of each type. For example, for a public supply system with three SW intakes and one GW well, the fractions would be 0.75 SW and 0.25 GW. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000 to 2020. Output from this code was used to calculate groundwater and surface water volumes by HUC12 for public supply. This page includes the following files:

    * FCL_Data_Water_Sources_Flagged_wHUC_DR.R - an R script used to determine water source fractions by public-supply water service areas, counties, and HUC12s
    * WaterSource_readme.txt - a README text file describing the script
    * County_SourceFrac.csv - a csv file with estimated water source fractions by county
    * HUC12_SourceFrac.csv - a csv file with estimated water source fractions by HUC12
    * WSA_AGIDF_SourceFrac.csv - a csv file with estimated water source fractions by public-supply water service area
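    A minimal sketch of the source-counting approach described above (not the USGS script; identifiers and column names are hypothetical):

    library(dplyr)

    sources <- tibble::tibble(
      pws_id      = c("PWS-A", "PWS-A", "PWS-A", "PWS-A", "PWS-B"),
      source_type = c("SW", "SW", "SW", "GW", "SP")
    )

    source_fractions <- sources %>%
      count(pws_id, source_type, name = "n_sources") %>%
      group_by(pws_id) %>%
      mutate(fraction = n_sources / sum(n_sources)) %>%  # e.g. 3 SW intakes + 1 GW well -> 0.75 SW, 0.25 GW
      ungroup()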

  4. Datasets, Codes, and R Outputs for Analyzing Cross-Provincial...

    • scidb.cn
    Updated Mar 19, 2023
    Cite
    Jing Yiming (2023). Datasets, Codes, and R Outputs for Analyzing Cross-Provincial Cultural-Psychological Differences across Mainland China [Dataset]. http://doi.org/10.57760/sciencedb.07672
    Explore at:
    Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Mar 19, 2023
    Dataset provided by
    Science Data Bank
    Authors
    Jing Yiming
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The files include SPSS datasets and R Markdown Notebooks (codes and outputs) for the paper titled "Understanding Regional Cultural-Psychological Variations and Their Sources: A Large-Scale Examination in China". Specifically, "Data_Main.zip" contains compressed SPSS datasets (scale-level scores) for the main analyses as reported in the main text. "Data_All.zip" contains compressed SPSS datasets (item-level scores) for all respondents who passed the quality check. And "Codes_Outputs.zip" contains compressed R Markdown documents for the main analyses.
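    A minimal sketch for opening one of the compressed SPSS files in R (the haven package and the extraction paths are assumptions; the analyses themselves are in the R Markdown documents in Codes_Outputs.zip):

    library(haven)

    unzip("Data_Main.zip", exdir = "data_main")
    sav_files <- list.files("data_main", pattern = "\\.sav$", full.names = TRUE)

    main_data <- read_sav(sav_files[1])  # read the first SPSS dataset
    str(main_data)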

  5. Film Circulation dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, png
    Updated Jul 12, 2024
    Cite
    Skadi Loist; Evgenia (Zhenya) Samoilova (2024). Film Circulation dataset [Dataset]. http://doi.org/10.5281/zenodo.7887672
    Explore at:
    Available download formats: csv, png, bin
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Skadi Loist; Evgenia (Zhenya) Samoilova
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”

    A peer-reviewed data paper for this dataset is under review for publication in NECSUS_European Journal of Media Studies, an open access journal that aims to enhance data transparency and reusability; it will be available from https://necsus-ejms.org/ and https://mediarep.org.

    Please cite this when using the dataset.


    Detailed description of the dataset:

    1 Film Dataset: Festival Programs

    The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

    The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.

    The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.

    The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
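    As an illustration of the relationship between the two files, a minimal sketch of collapsing the long table to one row per unique film, keeping the first sample festival (the file extension and the column names film_id and year are assumptions, not the dataset's documented variable names):

    library(dplyr)

    film_long <- readr::read_csv("1_film-dataset_festival-program_long.csv")

    film_wide <- film_long %>%
      arrange(film_id, year) %>%   # assumed ID and festival-edition-year columns
      group_by(film_id) %>%
      slice(1) %>%                 # keep only the first sample festival per film
      ungroup()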


    2 Survey Dataset

    The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

    The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.

    The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.

    The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.


    3 IMDb & Scripts

    The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.

    The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.

    The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.

    The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.

    The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.

    The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to each crew member. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.

    The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.

    The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.

    The csv file “3_imdb-dataset_release-info_long” contains data about non-festival releases (e.g., theatrical, digital, tv, dvd/blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.

    The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.

    The dataset includes 8 text files containing the scripts for web scraping. They were written using R version 3.6.3 for Windows.

    The R script “r_1_unite_data” demonstrates the structure of the dataset that we use in the following steps to identify, scrape, and match the film data.

    The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in the “r_1_unite_data” and uses various R packages to create a search URL for each film from the core dataset on the IMDb website. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then, if no matches are found in the advanced search, potentially using an alternative title and a basic search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records on the IMDb website. The script then defines a loop that matches (including matching scores) each film in the core dataset with suggested films on the IMDb search page. Matching was done on directors, production year (+/- one year), and title, using a fuzzy matching approach with two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA algorithm is used to match titles that may have typos or minor variations.
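    A minimal illustration of fuzzy title matching with the two methods named above (this is not the authors' r_2_scrape_matches script; the stringdist package and the selection logic are assumptions):

    library(stringdist)

    core_title <- "The Grand Budapest Hotel"
    candidates <- c("The Grand Budapest Hotel", "Grand Budapest Hotl", "Moonrise Kingdom")

    cosine_sim <- stringsim(core_title, candidates, method = "cosine", q = 2)
    osa_sim    <- stringsim(core_title, candidates, method = "osa")

    # keep the candidate with the best combined similarity score
    best_match <- candidates[which.max(pmax(cosine_sim, osa_sim))]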

    The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of the following five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible doubles in the dataset and identifies them for a manual check.

    The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.

    The script “r_5a_extracting_info_sample” uses the functions defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. This script does that for the first 100 films only, to check that everything works. Scraping the entire dataset took a few hours, so a test with a subsample of 100 films is advisable.

    The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.

    The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time, to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.

    The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It gives information on the amount of missing values and errors.


    4 Festival Library Dataset

    The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.

    The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories,

  6. Development of a Tool to Determine the Variability of Consensus Mass Spectra...

    • data.nist.gov
    • catalog.data.gov
    Updated Mar 23, 2021
    + more versions
    Cite
    Benjamin Place (2021). Development of a Tool to Determine the Variability of Consensus Mass Spectra Supporting Data [Dataset]. http://doi.org/10.18434/mds2-2299
    Explore at:
    Dataset updated
    Mar 23, 2021
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Authors
    Benjamin Place
    License

    https://www.nist.gov/open/license

    Description

    Supporting datasets and algorithms (R-based) for the manuscript entitled "Development of a Tool to Determine the Variability of Consensus Mass Spectra", including an R Markdown script to reproduce the manuscript's figures.
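    A minimal sketch for reproducing the figures from the included R Markdown script (the file name here is a placeholder for the .Rmd shipped with the dataset):

    library(rmarkdown)
    render("consensus_spectra_figures.Rmd")  # knits the manuscript figures to the default output format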

  7. Explore data formats and ingestion methods

    • kaggle.com
    Updated Feb 12, 2021
    Cite
    Gabriel Preda (2021). Explore data formats and ingestion methods [Dataset]. https://www.kaggle.com/datasets/gpreda/iris-dataset/discussion?sort=undefined
    Explore at:
    Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Feb 12, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gabriel Preda
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Why this Dataset

    This dataset brings you the Iris Dataset in several data formats (see more details in the next sections).

    You can use it to test the ingestion of data in all these formats using Python or R libraries. We also prepared a Python Jupyter Notebook and an R Markdown report that read all these formats.

    Iris Dataset

    Iris Dataset was created by R. A. Fisher and donated by Michael Marshall.

    Repository on UCI site: https://archive.ics.uci.edu/ml/datasets/iris

    Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/

    The file downloaded is iris.data and is formatted as a comma delimited file.

    This small data collection was created to help you test your skills with ingesting various data formats.

    Content

    This file was processed to convert the data into the following formats (a minimal R ingestion sketch follows after the list):

    * csv - comma separated values format
    * tsv - tab separated values format
    * parquet - parquet format
    * feather - feather format
    * parquet.gzip - compressed parquet format
    * h5 - hdf5 format
    * pickle - Python binary object file - pickle format
    * xlsx - Excel format
    * npy - Numpy (Python library) binary format
    * npz - Numpy (Python library) binary compressed format
    * rds - Rds (R specific data format) binary format
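    A minimal ingestion sketch in R for a few of the listed formats (file names assume the dataset's naming; the arrow package is an assumption for parquet and feather):

    library(arrow)

    iris_csv     <- read.csv("iris.csv")
    iris_tsv     <- read.delim("iris.tsv")
    iris_parquet <- read_parquet("iris.parquet")
    iris_feather <- read_feather("iris.feather")
    iris_rds     <- readRDS("iris.rds")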

    Acknowledgements

    I would like to acknowledge the work of the creator of the dataset - R. A. Fisher and of the donor - Michael Marshall.

    Inspiration

    Use these data formats to test your skills in ingesting data in various formats.

  8. autotrain-data-test

    • huggingface.co
    Updated Apr 21, 2024
    + more versions
    Cite
    Eddie (2024). autotrain-data-test [Dataset]. https://huggingface.co/datasets/j23349/autotrain-data-test
    Explore at:
    Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Apr 21, 2024
    Authors
    Eddie
    Description

    AutoTrain Dataset for project: test

      Dataset Description
    

    This dataset has been automatically processed by AutoTrain for project test.

      Languages
    

    The BCP-47 code for the dataset's language is en.

      Dataset Structure

      Data Instances
    

    A sample from this dataset looks as follows: [ { "feat_id": "13829542", "text": "Kasia: When are u coming back?\r Matt: Back where?\r Kasia: Oh come on\r Kasia: you know what i… See the full description on the dataset page: https://huggingface.co/datasets/j23349/autotrain-data-test.

  9. [Superseded] Intellectual Property Government Open Data 2019

    • researchdata.edu.au
    • data.gov.au
    Updated Jun 6, 2019
    + more versions
    Cite
    IP Australia (2019). [Superseded] Intellectual Property Government Open Data 2019 [Dataset]. https://researchdata.edu.au/superseded-intellectual-property-data-2019/2994670
    Explore at:
    Dataset updated
    Jun 6, 2019
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    IP Australia
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What is IPGOD?

    The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

    How do I use IPGOD?

    IPGOD is large, with millions of data points across up to 40 tables, making most tables too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.

    IP Data Platform

    IP Australia is also providing free trials to a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software.

    References

    The following pages can help you gain an understanding of intellectual property administration and processes in Australia to support your analysis of the dataset.

    * Patents
    * Trade Marks
    * Designs
    * Plant Breeder’s Rights

    Updates

    Tables and columns

    Due to the changes in our systems, some tables have been affected.

    * We have added IPGOD 225 and IPGOD 325 to the dataset!
    * The IPGOD 206 table is not available this year.
    * Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

    Data quality improvements

    Data quality has been improved across all tables. A minimal example of reading one table with these conventions is sketched after the list below.

    * Null values are simply empty rather than '31/12/9999'.
    * All date columns are now in ISO format 'yyyy-mm-dd'.
    * All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.
    * All tables are encoded in UTF-8.
    * All tables use the backslash \ as the escape character.
    * The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match with those in previous releases of IPGOD.
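    A minimal sketch for reading one IPGOD table with these conventions (data.table is an assumption; the file name and the filing_date column are placeholders, not actual IPGOD table contents):

    library(data.table)

    ipgod_table <- fread(
      "IPGOD_table.csv",
      encoding   = "UTF-8",
      na.strings = ""              # null values are empty strings
    )

    # ISO 'yyyy-mm-dd' date columns parse directly
    ipgod_table[, filing_date := as.IDate(filing_date)]  # hypothetical column name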

  10. Check actions of CTIA, their focus and issued fines

    • data.europa.eu
    • data.gov.cz
    rdf trig, unknown
    Cite
    Univerzita Karlova, Check actions of CTIA, their focus and issued fines [Dataset]. https://data.europa.eu/data/datasets/https-lkod-mff-cuni-cz-zdroj-datove-sady-stirdata-c-oi-kontroly-zame-r-eni-sankce
    Explore at:
    Available download formats: rdf trig, unknown
    Dataset authored and provided by
    Univerzita Karlova
    License

    https://data.gov.cz/zdroj/datové-sady/00216208/88a8c777bba57747498f6671020dedd2/distribuce/b7e73a5c2f2635b405e5be40dd17d362/podmínky-užití

    https://data.gov.cz/zdroj/datové-sady/00216208/88a8c777bba57747498f6671020dedd2/distribuce/974dc2a3d80ee61893c30d710738b059/podmínky-užití

    Description

    Inspection actions of the Czech Trade Inspection Authority (CTIA), their focus and the fines issued. Transformed to RDF from the datasets https://data.gov.cz/zdroj/datové-sady/00020869/98803, https://data.gov.cz/zdroj/datové-sady/00020869/99266 and https://data.gov.cz/zdroj/datové-sady/00020869/98719

  11. R code that determines buying and selling of water by public-supply water...

    • s.cnmilf.com
    • data.usgs.gov
    • +1more
    Updated Aug 29, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). R code that determines buying and selling of water by public-supply water service areas [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/r-code-that-determines-buying-and-selling-of-water-by-public-supply-water-service-areas
    Explore at:
    Dataset updated
    Aug 29, 2024
    Dataset provided by
    U.S. Geological Survey
    Description

    This child item describes R code used to determine whether public-supply water systems buy water, sell water, both buy and sell water, or are neutral (meaning the system has only local water supplies) using water source information from a proprietary dataset from the U.S. Environmental Protection Agency. This information was needed to better understand public-supply water use and where water buying and selling were likely to occur. Buying or selling of water may result in per capita rates that are not representative of the population within the water service area. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000-2020. Output from this code was used as an input feature variable in the public supply water use machine learning model. This page includes the following files:

    * ID_WSA_04062022_Buyers_Sellers_DR.R - an R script used to determine whether a public-supply water service area buys water, sells water, or is neutral
    * BuySell_readme.txt - a README text file describing the script
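    A minimal sketch of the buyer/seller classification described above (not the USGS script; column names and flags are hypothetical):

    library(dplyr)

    systems <- tibble::tibble(
      pws_id = c("PWS-A", "PWS-B", "PWS-C"),
      buys   = c(TRUE, FALSE, TRUE),    # system purchases water from another system
      sells  = c(FALSE, FALSE, TRUE)    # system sells water to another system
    )

    systems <- systems %>%
      mutate(role = case_when(
        buys & sells ~ "buys and sells",
        buys         ~ "buyer",
        sells        ~ "seller",
        TRUE         ~ "neutral"        # only local water supplies
      ))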

  12. Replication dataset and calculations for PIIE Briefing 16-3, Reality Check...

    • piie.com
    Updated Mar 13, 2016
    Cite
    Julien Acalin; Olivier Blanchard; Monica de Bolle; José De Gregorio; Caroline Freund; Joseph E. Gagnon; Nicholas R. Lardy; Adam S. Posen; David J. Stockton; Nicolas Véron (2016). Replication dataset and calculations for PIIE Briefing 16-3, Reality Check for the Global Economy, by Julien Acalin, Olivier Blanchard, Monica de Bolle, José De Gregorio, Caroline Freund, Joseph E. Gagnon, Nicholas R. Lardy, Adam S. Posen, David J. Stockt [Dataset]. https://www.piie.com/publications/piie-briefings/reality-check-global-economy
    Explore at:
    Dataset updated
    Mar 13, 2016
    Dataset provided by
    Peterson Institute for International Economics (http://www.piie.com/)
    Authors
    Julien Acalin; Olivier Blanchard; Monica de Bolle; José De Gregorio; Caroline Freund; Joseph E. Gagnon; Nicholas R. Lardy; Adam S. Posen; David J. Stockton; Nicolas Véron
    Description

    This data package includes the underlying data and files to replicate the calculations, charts, and tables presented in Reality Check for the Global Economy, PIIE Briefing 16-3.

    If you use the data, please cite as: Acalin, Julien, Olivier Blanchard, Monica de Bolle, José De Gregorio, Caroline Freund, Joseph E. Gagnon, Nicholas R. Lardy, Adam S. Posen, David J. Stockton, and Nicolas Véron. (2016). Reality Check for the Global Economy. PIIE Briefing 16-3. Peterson Institute for International Economics.

  13. Data from: XBT and CTD pairs dataset Version 1

    • researchdata.edu.au
    • data.csiro.au
    • +1more
    datadownload
    Updated Oct 16, 2014
    + more versions
    Cite
    Mark Rosenberg; Steve Rintoul; Esmee Van Wijk; Gustavo Goni; Kimio Hanawa; Shoichi Kizu; Tim Boyer; Rebecca Cowley (2014). XBT and CTD pairs dataset Version 1 [Dataset]. http://doi.org/10.4225/08/52AE99A4663B1
    Explore at:
    Available download formats: datadownload
    Dataset updated
    Oct 16, 2014
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Mark Rosenberg; Steve Rintoul; Esmee Van Wijk; Gustavo Goni; Kimio Hanawa; Shoichi Kizu; Tim Boyer; Rebecca Cowley
    License

    Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1967 - Dec 31, 2011
    Area covered
    Description

    The XBT/CTD pairs dataset (Version 1) is the dataset used to calculate the historical XBT fall rate and temperature corrections presented in Cowley, R., Wijffels, S., Cheng, L., Boyer, T., and Kizu, S. (2013). Biases in Expendable Bathythermograph Data: A New View Based on Historical Side-by-Side Comparisons. Journal of Atmospheric and Oceanic Technology, 30, 1195–1225, doi:10.1175/JTECH-D-12-00127.1. http://journals.ametsoc.org/doi/abs/10.1175/JTECH-D-12-00127.1

    4,115 pairs from 114 datasets were used to derive the fall rate and temperature corrections. Each dataset contains the scientifically quality controlled version and (where available) the originator's data. The XBT/CTD pairs are identified in the document 'XBT_CTDpairs_metadata_V1.csv'. Note that future versions of the XBT/CTD pairs database may supersede this version. Please check more recent versions for updates to individual datasets. Lineage: Data is sourced from the World Ocean Database, NOAA, CSIRO Marine and Atmospheric Research, Bundesamt für Seeschifffahrt und Hydrographie (BSH), Hamburg, Germany, Australian Antarctic Division. Original and raw data files are included where available. Quality controlled datasets follow the procedure of Bailey, R., Gronell, A., Phillips, H., Tanner, E., and Meyers, G. (1994). Quality control cookbook for XBT data, Version 1.1. CSIRO Marine Laboratories Reports, 221. Quality controlled data is in the 'MQNC' format used at CSIRO Marine and Atmospheric Research. The MQNC format is described in the document 'XBT_CTDpairs_descriptionV1.pdf'. Note that future versions of the XBT/CTD pairs database may supersede this version. Please check more recent versions for updates to individual datasets.

  14. ASIC - Company Dataset

    • researchdata.edu.au
    • cloud.csiss.gmu.edu
    • +2more
    Updated Sep 3, 2014
    Cite
    Australian Securities and Investments Commission (ASIC) (2014). ASIC - Company Dataset [Dataset]. https://researchdata.edu.au/asic-company-dataset/2975914
    Explore at:
    Dataset updated
    Sep 3, 2014
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Australian Securities and Investments Commission (ASIC)
    License

    Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Update March 2025

    From 11 March 2025, the dataset will be updated to include one new field, Date of Deregistration (see the help file for details).

    Update August 2018 - frequency change to Company dataset

    From 7 August 2018, the Company dataset will be updated weekly every Tuesday. As a result, the information might not be accurate at the time you check the Company dataset. ASIC-Connect updates information in real time; therefore, please consider accessing information on that platform if you need up-to-date information.

    Dataset summary

    ASIC is Australia’s corporate, markets and financial services regulator. ASIC contributes to Australia’s economic reputation and wellbeing by ensuring that Australia’s financial markets are fair and transparent, supported by confident and informed investors and consumers.

    Australian companies are required to keep their details up to date on ASIC's Company Register. Information contained in the register is made available to the public to search via ASIC's website.

    Select data from ASIC's Company Register will be uploaded each week to www.data.gov.au. The data made available will be a snapshot of the register at a point in time. Legislation prescribes the type of information ASIC is allowed to disclose to the public.

    The information included in the downloadable dataset is:

    * Company Name
    * Australian Company Number (ACN)
    * Type
    * Class
    * Sub Class
    * Status
    * Date of Registration
    * Date of Deregistration (available from 11 March 2025)
    * Previous State of Registration (where applicable)
    * State Registration Number (where applicable)
    * Modified since last report – flag to indicate if data has been modified since last report
    * Current Name Indicator
    * Australian Business Number (ABN)
    * Current Name
    * Current Name Start Date

    Additional information about companies can be found via ASIC's website. Accessing some information may attract a fee.

    More information about searching ASIC's registers.

  15. HUN AWRA-R simulation nodes v01

    • demo.dev.magda.io
    • researchdata.edu.au
    • +2more
    zip
    Updated Dec 4, 2022
    + more versions
    Cite
    Bioregional Assessment Program (2022). HUN AWRA-R simulation nodes v01 [Dataset]. https://demo.dev.magda.io/dataset/ds-dga-d9a4fd10-e099-48cb-b7ee-07d4000bb829
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 4, 2022
    Dataset provided by
    Bioregional Assessment Program
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple datasets. The source dataset is identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement. The dataset consists of an excel spreadsheet and shapefile representing the locations of simulation nodes used in the AWRA-R model. Some of the nodes correspond to gauging station locations or dam locations whereas other locations represent river confluences or catchment outlets which have no gauging. These are marked as "Dummy".

    Purpose

    Locations are used as pour points in order to define reach areas for river system modelling.

    Dataset History

    Subset of data for the Hunter that was extracted from the Bureau of Meteorology's hydstra system and includes all gauges where data has been received from the lead water agency of each jurisdiction. Simulation nodes were added in locations in which the model will provide simulated streamflow. There are 3 files that have been extracted from the Hydstra database to aid in identifying sites in each bioregion and the type of data collected from each one. These data were used to determine the simulation node locations where model outputs were generated. The 3 files contained within the source dataset used for this determination are:

    * Site - lists all sites available in Hydstra from data providers. The data provider is listed in the #Station as _xxx. For example, sites in NSW are _77, QLD are _66. Some sites do not have locational information and will not be able to be plotted.
    * Period - the period table lists all the variables that are recorded at each site and the period of record.
    * Variable - the variable table shows variable codes and names which can be linked to the period table.

    Relevant location information and other data were extracted to construct the spreadsheet and shapefile within this dataset.

    Dataset Citation

    Bioregional Assessment Programme (XXXX) HUN AWRA-R simulation nodes v01. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/fda20928-d486-49d2-b362-e860c1918b06.

    Dataset Ancestors

    * Derived From National Surface Water sites Hydstra

  16. Data from: Datasets for lot sizing and scheduling problems in the...

    • data.mendeley.com
    • narcis.nl
    Updated Jan 19, 2021
    Cite
    Juan Piñeros (2021). Datasets for lot sizing and scheduling problems in the fruit-based beverage production process [Dataset]. http://doi.org/10.17632/j2x3gbskfw.1
    Explore at:
    Dataset updated
    Jan 19, 2021
    Authors
    Juan Piñeros
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets presented here were partially used in “Formulation and MIP-heuristics for the lot sizing and scheduling problem with temporal cleanings” (Toscano, A., Ferreira, D., Morabito, R., Computers & Chemical Engineering) [1], in “A decomposition heuristic to solve the two-stage lot sizing and scheduling problem with temporal cleaning” (Toscano, A., Ferreira, D., Morabito, R., Flexible Services and Manufacturing Journal) [2], and in “A heuristic approach to optimize the production scheduling of fruit-based beverages” (Toscano et al., Gestão & Produção, 2020) [3]. In fruit-based production processes, there are two production stages: preparation tanks and production lines. This production process has some process-specific characteristics, such as temporal cleanings and synchrony between the two production stages, which make optimized production planning and scheduling even more difficult. In this sense, some papers in the literature have proposed different methods to solve this problem. To the best of our knowledge, there are no standard datasets used by researchers in the literature to verify the accuracy and performance of proposed methods or to serve as a benchmark for other researchers considering this problem. The authors have been using small data sets that do not satisfactorily represent different scenarios of production. Since the demand in the beverage sector is seasonal, a wide range of scenarios enables us to evaluate the effectiveness of the methods proposed in the scientific literature in solving real scenarios of the problem. The datasets presented here include data based on real data collected from five beverage companies. We present four datasets that are specifically constructed assuming a scenario of restricted capacity and balanced costs. These datasets are supplementary data for the paper submitted to Data in Brief [4].

    [1] Toscano, A., Ferreira, D., Morabito, R., Formulation and MIP-heuristics for the lot sizing and scheduling problem with temporal cleanings, Computers & Chemical Engineering. 142 (2020) 107038. Doi: 10.1016/j.compchemeng.2020.107038.
    [2] Toscano, A., Ferreira, D., Morabito, R., A decomposition heuristic to solve the two-stage lot sizing and scheduling problem with temporal cleaning, Flexible Services and Manufacturing Journal. 31 (2019) 142-173. Doi: 10.1007/s10696-017-9303-9.
    [3] Toscano, A., Ferreira, D., Morabito, R., Trassi, M. V. C., A heuristic approach to optimize the production scheduling of fruit-based beverages. Gestão & Produção, 27(4), e4869, 2020. https://doi.org/10.1590/0104-530X4869-20.
    [4] Piñeros, J., Toscano, A., Ferreira, D., Morabito, R., Datasets for lot sizing and scheduling problems in the fruit-based beverage production process. Data in Brief (2021).

  17. GLM model data sets used to evaluate changes in the hydrodynamics of Anvil...

    • s.cnmilf.com
    • data.usgs.gov
    • +4more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). GLM model data sets used to evaluate changes in the hydrodynamics of Anvil Lake, Wisconsin: U.S. Geological Survey Data Release [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/glm-model-data-sets-used-to-evaluate-changes-in-the-hydrodynamics-of-anvil-lake-wisconsin-
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Wisconsin, Anvil Lake
    Description

    Interannual differences in the water quality of Anvil Lake, WI, were examined to determine how water level and climate affect the hydrodynamics and trophic state of shallow lakes, and their importance compared to anthropogenic changes in the watershed. To determine how changes in water level may affect these processes, the General Lake Model (GLM) was used in R to simulate how the lake’s thermal structure should change in response to changes in water level. This dataset includes the data inputs to the GLM model and the direct outputs from the model:

    * Model Calibration (GLM_CalibrationZ)
    * Simulation with Deep Lake and Cold Weather (GLM_Deep_Cold_SimulationZ)
    * Simulation with Deep Lake and Hot Weather (GLM_Deep_Hot_SimulationZ)
    * Simulation with Shallow Lake and Hot Weather (GLM_Shallow_Hot_SimulationZ)
    * Simulation with Shallow Lake and Cold Weather (GLM_Shallow_Cold_SimulationZ)
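    A minimal sketch for inspecting one of the GLM simulation outputs in R (the NetCDF file name and the variable name are assumptions about the GLM output layout, not documented contents of this release):

    library(ncdf4)

    nc <- nc_open("GLM_CalibrationZ/output.nc")
    print(names(nc$var))           # list the output variables available
    temp <- ncvar_get(nc, "temp")  # e.g. simulated water temperature, if present
    nc_close(nc)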

  18. Case Study: Cyclist

    • kaggle.com
    Updated Jul 27, 2021
    Cite
    PatrickRCampbell (2021). Case Study: Cyclist [Dataset]. https://www.kaggle.com/patrickrcampbell/case-study-cyclist/discussion
    Explore at:
    Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    PatrickRCampbell
    Description

    Phase 1: ASK

    Key Objectives:

    1. Business Task * Cyclist is looking to increase their earnings, and wants to know if creating a social media campaign can influence "Casual" users to become "Annual" members.

    2. Key Stakeholders: * The main stakeholder from Cyclist is Lily Moreno, who is the Director of Marketing and responsible for the development of campaigns and initiatives to promote their bike-share program. The other teams involved with this project will be Marketing & Analytics, and the Executive Team.

    3. Business Task: * Comparing the two kinds of users and defining how they use the platform, what variables they have in common, what variables are different, and how they can get Casual users to become Annual members

    Phase 2: PREPARE:

    Key Objectives:

    1. Determine Data Credibility * Cyclist provided data from years 2013-2021 (through March 2021), all of which is first-hand data collected by the company.

    2. Sort & Filter Data: * The stakeholders want to know how the current users are using their service, so I am focusing on using the data from 2020-2021 since this is the most relevant period of time to answer the business task.

    #Installing packages
    install.packages("tidyverse", repos = "http://cran.us.r-project.org")
    install.packages("readr", repos = "http://cran.us.r-project.org")
    install.packages("janitor", repos = "http://cran.us.r-project.org")
    install.packages("geosphere", repos = "http://cran.us.r-project.org")
    install.packages("gridExtra", repos = "http://cran.us.r-project.org")
    
    library(tidyverse)
    library(lubridate) #provides wday() used later (not loaded by tidyverse 1.x)
    library(readr)
    library(janitor)
    library(geosphere)
    library(gridExtra)
    
    #Importing data & verifying the information within the dataset
    all_tripdata_clean <- read.csv("/Data Projects/cyclist/cyclist_data_cleaned.csv")
    
    glimpse(all_tripdata_clean)
    
    summary(all_tripdata_clean)
    
    

    Phase 3: PROCESS

    Key Objectives:

    1. Cleaning Data & Preparing for Analysis: * Once the data has been placed into one dataset, and checked for errors, we began cleaning the data. * Eliminating data that correlates to the company servicing the bikes, and any ride with a traveled distance of zero. * New columns will be added to assist in the analysis, and to provide accurate assessments of who is using the bikes.

    #Eliminating any data that represents the company performing maintenance, and trips without any measurable distance
    #(note: ride_length is created in the ride-length section below; compute it before running this filter)
    all_tripdata_clean <- all_tripdata_clean[!(all_tripdata_clean$start_station_name == "HQ QR" | all_tripdata_clean$ride_length<0),] 
    
    #Creating columns for the individual date components (day_of_week is derived from date, so it is created last)
    all_tripdata_clean$date <- as.Date(all_tripdata_clean$started_at)
    all_tripdata_clean$day <- format(as.Date(all_tripdata_clean$date), "%d")
    all_tripdata_clean$month <- format(as.Date(all_tripdata_clean$date), "%m")
    all_tripdata_clean$year <- format(as.Date(all_tripdata_clean$date), "%Y")
    all_tripdata_clean$day_of_week <- format(as.Date(all_tripdata_clean$date), "%A")
    
    

    **Now I will begin calculating the length of rides being taken, distance traveled, and the mean amount of time & distance.**

    #Calculating the ride length in miles & minutes
    all_tripdata_clean$ride_length <- difftime(all_tripdata_clean$ended_at,all_tripdata_clean$started_at,units = "mins")
    
    all_tripdata_clean$ride_distance <- distGeo(matrix(c(all_tripdata_clean$start_lng, all_tripdata_clean$start_lat), ncol = 2), matrix(c(all_tripdata_clean$end_lng, all_tripdata_clean$end_lat), ncol = 2))
    all_tripdata_clean$ride_distance = all_tripdata_clean$ride_distance/1609.34 #converting to miles
    
    #Calculating the mean time and distance based on the user groups
    userType_means <- all_tripdata_clean %>% group_by(member_casual) %>% summarise(mean_time = mean(ride_length))
    
    
    userType_means <- all_tripdata_clean %>% 
     group_by(member_casual) %>% 
     summarise(mean_time = mean(ride_length),mean_distance = mean(ride_distance))
    

    Adding in calculations that will differentiate between bike types and which type of user is using each specific bike type.

    #Calculations
    
    with_bike_type <- all_tripdata_clean %>% filter(rideable_type=="classic_bike" | rideable_type=="electric_bike")
    
    #Ride totals per user type, bike type and weekday
    with_bike_type %>%
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual,rideable_type,weekday) %>%
     summarise(totals=n(), .groups="drop")
     
    #Ride totals per user type and bike type
    with_bike_type %>%
     group_by(member_casual,rideable_type) %>%
     summarise(totals=n(), .groups="drop")
    
     #Calculating the ride differential
     
     all_tripdata_clean %>% 
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, weekday) %>% 
     summarise(number_of_rides = n()
          ,average_duration = mean(ride_length),.groups = 'drop') %>% 
     arrange(member_casual, weekday)
    
  19. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r. the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

    the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

    this new github repository contains three scripts:

    * 2005-2012 asec - download all microdata.R - download the fixed-width file containing household, family, and person records; import by separating this file into three tables, then merge 'em together at the person-level; download the fixed-width file containing the person-level replicate weights; merge the rectangular person-level file with the replicate weights, then store it in a sql database; create a new variable - one - in the data table
    * 2012 asec - analysis examples.R - connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; perform a boatload of analysis examples
    * replicate census estimates - 2011.R - connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; match the sas output shown in the png file 2011 asec replicate weight sas output.png (statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document)

    click here to view these three scripts

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit: the census bureau's current population survey page, the bureau of labor statistics' current population survey page, and the current population survey's wikipedia article.

    notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

    confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
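    a minimal sketch of the import pattern described above (this is not the github scripts themselves; read.SAScii is the one-step companion to the parse.SAScii function mentioned above, and the file names are placeholders):

    library(SAScii)
    library(DBI)

    # apply the nber sas input statement to the fixed-width asec file
    asec <- read.SAScii("asec2012_pubuse.dat", "cpsmar2012.sas")

    # store the rectangular person-level table in a sqlite database
    con <- dbConnect(RSQLite::SQLite(), "cps_asec.sqlite")
    dbWriteTable(con, "asec_2012", asec)
    dbDisconnect(con)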

  20. Namoi AWRA-R model implementation (post groundwater input)

    • data.gov.au
    • cloud.csiss.gmu.edu
    • +1more
    Updated Aug 9, 2023
    Cite
    Bioregional Assessment Program (2023). Namoi AWRA-R model implementation (post groundwater input) [Dataset]. https://data.gov.au/data/dataset/8681bd56-1806-40a8-892e-4da13cda86b8
    Explore at:
    Dataset updated
    Aug 9, 2023
    Dataset authored and provided by
    Bioregional Assessment Program
    Area covered
    Namoi River
    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    AWRA-R model implementation.

    The metadata within the dataset contains the workflow, processes, input and output data and instructions to implement the Namoi AWRA-R model for model calibration or simulation. In the Namoi, the AWRA-R simulation was done twice: first without the baseflow input from the groundwater modelling, and then a second set of runs was carried out with the input from the groundwater modelling.

    Each sub-folder in the associated data has a readme file indicating folder contents and providing general instructions about the workflow performed.

    Detailed documentation of the AWRA-R model is provided in: https://publications.csiro.au/rpr/download?pid=csiro:EP154523&dsid=DS2

    Documentation about the implementation of AWRA-R in the Namoi bioregion is provided in BA NAM 2.6.1.3 and 2.6.1.4 products.

    '..\AWRAR_Metadata_WGW\Namoi_Model_Sequence.pptx' shows the AWRA-L/R modelling sequence.

    Purpose

    BA Surface water modelling for Namoi bioregion

    Dataset History

    The directories within contain the input data and outputs of the Namoi AWRA-R model for model calibration and simulation. The calibration folders are used as an example; simulation uses mirror files of these data, albeit with longer time series depending on the simulation period.

    Detailed documentation of the AWRA-R model is provided in: https://publications.csiro.au/rpr/download?pid=csiro:EP154523&dsid=DS2

    Documentation about the implementation of AWRA-R in the Hunter bioregion is provided in BA NAM 2.6.1.3 and 2.6.1.4 products.

    Additional data needed to generate some of the inputs needed to implement AWRA-R are detailed in the corresponding metadata statement as stated below.

    Here is the parent folder:

    '..\AWRAR_Metadata_WGW..'

    Input data needed:

    1. Gauge/node topological information in '...\model calibration\NAM5.3.1_low_calib\gis\sites\AWRARv5.00_reaches.csv'.

    2. Look up table for soil thickness in '...\model calibration\NAM5.3.1_low_calib\gis\ASRIS_soil_properties\NAM_AWRAR_ASRIS_soil_thickness_v5.00.csv'. (check metadata statement)

    3. Look up tables of AWRA-LG groundwater parameters in '...\model calibration\NAM5.3.1_low_calib\gis\AWRA-LG_gw_parameters\'.

    4. Look up table of AWRA-LG catchment grid cell contribution in '...model calibration\NAM5.3.1_low_calib\gis\catchment-boundary\AWRA-R_catchment_x_AWRA-L_weight.csv'. (check metadata statement)

    5. Look up tables of link lengths for main river, tributaries and distributaries within a reach in \model calibration\NAM5.3.1_low_calib\gis\rivers\'. (check metadata statement)

    6. Time series data of AWRA-LG outputs: evaporation, rainfall, runoff and depth to groundwater.

    7. Gridded data of AWRA-LG groundwater parameters, refer to explanation in '...'\model calibration\NAM5.3.1_low_calib\rawdata\AWRA_LG_output\gw_parameters\README.txt'.

    8. Time series of observed or simulated reservoir level, volume and surface area for reservoirs used in the simulation: Keepit Dam, Split Rock and Chaffey Creek Dam.

      located in '...\model calibration\NAM5.3.1_low_calib\rawdata\reservoirs\'.

    9. Gauge station cross sections in '...\model calibration\NAM5.3.1_low_calib\rawdata\Site_Station_Sections\'. (check metadata statement)

    10. Daily Streamflow and level time-series in'...\model calibration\NAM5.3.1_low_calib\rawdata\streamflow_and_level_all_processed\'.

    11. Irrigation input, configuration and parameter files in '...\model calibration\NAM5.3.1_low_calib\inputs\NAM\irrigation\'.

    These come from the separate calibration of the AWRA-R irrigation module in:

    '...\irrigation calibration simulation\', refer to explanation in readme.txt file therein.
    
    1. For Dam simulation script, read the following readme.txt files

    ' ..\AWRAR_Metadata_WGW\dam model calibration simulation\Chaffey\readme.txt'

    '... \AWRAR_Metadata_WGW\dam model calibration simulation\Split_Rock_and_Keepit\readme.txt'

    Relevant outputs include:

    1. AWRA-R time series of stores and fluxes in river reaches ('...\AWRAR_Metadata_WGW\model calibration\NAM5.3.1_low_calib\outputs\jointcalibration\v00\NAM\simulations\')

      including simulated streamflow in files denoted XXXXXX_full_period_states_nonrouting.csv where XXXXXX denotes gauge or node ID.

    2. AWRA-R time series of stores and fluxes for irrigation/mining in the same directory as above in files XXXXXX_irrigation_states.csv

    3. AWRA-R calibration validation goodness of fit metrics ('...\AWRAR_Metadata_WGW\model calibration\NAM5.3.1_low_calib\outputs\jointcalibration\v00\NAM\postprocessing\')

      in files calval_results_XXXXXX_v5.00.csv

    Dataset Citation

    Bioregional Assessment Programme (2017) Namoi AWRA-R model implementation (post groundwater input). Bioregional Assessment Derived Dataset. Viewed 12 March 2019, http://data.bioregionalassessments.gov.au/dataset/8681bd56-1806-40a8-892e-4da13cda86b8.

    Dataset Ancestors
