3 datasets found
  1. Code and Data to "Quantile regression for temporal streamflow modeling"

    • zenodo.org
    bin, pdf
    Updated Nov 26, 2024
    Cite
    Johannes Laimighofer; Johannes Laimighofer (2024). Code and Data to "Quantile regression for temporal streamflow modeling" [Dataset]. http://doi.org/10.5281/zenodo.14066026
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Johannes Laimighofer; Johannes Laimighofer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the accompanying code for "Quantile regression for temporal streamflow modeling", part of the manuscript "The Role of Process Heterogeneity in Statistical Modeling", which was submitted to the Austrian Journal of Statistics.

    The data used in this publication is fully accessible through the LamaH-CE dataset. The two scripts "functions_create_data.R" and "create_data.R" will create the final dataset used for modelling.

    "functions_modelling.R" provides the functions for tuning the XGBoost model and computing the SHAP values. An example script is also attached (calc_predictions_shap.R). "analyzing_results.R" and "error_metrics.R" will produce the final output used in the manuscript. Finally, two plots produced in the script are included as PDF files.

    All data analysis was performed in R, and we want to acknowledge the following packages: dplyr, tidyr, lubridate, purrr, glmnet, xgboost, shapr, Metrics, gridExtra, zoo and wesanderson.
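
As a rough illustration of what a quantile regression model optimizes, the base-R sketch below computes the pinball (quantile) loss for a chosen quantile. It is not taken from the repository's scripts; the function name `pinball_loss` and the toy values are assumptions made for this example, and the actual tuning code in "functions_modelling.R" may differ.

```r
# Pinball (quantile) loss for a target quantile tau in (0, 1).
# Hypothetical helper for illustration; not part of the repository's code.
pinball_loss <- function(observed, predicted, tau) {
  stopifnot(length(observed) == length(predicted), tau > 0, tau < 1)
  residual <- observed - predicted
  # Under-predictions are weighted by tau, over-predictions by (1 - tau).
  mean(pmax(tau * residual, (tau - 1) * residual))
}

# Toy streamflow-like values (made up for this example).
obs  <- c(1.0, 2.0, 3.0, 4.0)
pred <- c(1.5, 1.5, 3.0, 5.0)

loss_q10 <- pinball_loss(obs, pred, tau = 0.1)
loss_q50 <- pinball_loss(obs, pred, tau = 0.5)
loss_q90 <- pinball_loss(obs, pred, tau = 0.9)
```

For tau = 0.5 the loss reduces to half the mean absolute error, which is why the median is the usual reference case.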

  2. Data and code for "Plastic bag bans and fees reduce harmful bag litter on...

    • openicpsr.org
    delimited
    Updated Apr 14, 2024
    Cite
    Anna Papp; Kimberly Oremus (2024). Data and code for "Plastic bag bans and fees reduce harmful bag litter on shorelines" [Dataset]. http://doi.org/10.3886/E200661V3
    Dataset provided by
    Columbia University
    University of Delaware
    Authors
    Anna Papp; Kimberly Oremus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code and data for "Plastic bag bans and fees reduce harmful bag litter on shorelines" by Anna Papp and Kimberly Oremus.

    Please see included README file for details: This folder includes code and data to fully replicate Figures 1-5. In addition, the folder also includes instructions to rerun data cleaning steps.

    Last modified: March 6, 2025

    For any questions, please reach out to ap3907@columbia.edu.

    Code (replication/code):

    To replicate main figures, run each file for each main figure:
    - 1_figure1.R
    - 1_figure2.R
    - 1_figure3.R
    - 1_figure4.R
    - 1_figure5.R

    Update the home directory to match where the directory is saved ("replication" folder) in this file before running it. The code will require you to install packages (see note on versions below).

    To replicate entire data cleaning pipeline:
    - First download all required data (explained in Data section below).
    - Run code in code/0_setup folder (refer to separate README file).

    R-Version and Package Versions

    The project was developed and executed using:
    - R version: 4.0.0 (2024-04-24)
    - Platform: macOS 13.5

    Code was developed and main figures were created using the following versions:
    - data.table: 1.14.2
    - dplyr: 1.1.4
    - readr: 2.1.2
    - tidyr: 1.2.0
    - broom: 0.7.12
    - stringr: 1.5.1
    - lubridate: 1.7.9
    - raster: 3.5.15
    - sf: 1.0.7
    - readxl: 1.4.0
    - cobalt: 4.4.1.9002
    - spdep: 1.2.3
    - ggplot2: 3.4.4
    - PNWColors: 0.1.0
    - grid: 4.0.0
    - gridExtra: 2.3
    - ggpubr: 0.4.0
    - knitr: 1.48
    - zoo: 1.8.12
    - fixest: 0.11.2
    - lfe: 2.8.7.1
    - did: 2.1.2
    - didimputation: 0.3.0
    - DIDmultiplegt: 0.1.0
    - DIDmultiplegtDYN: 1.0.15
    - scales: 1.2.1
    - usmap: 0.6.1
    - tigris: 2.0.1
    - dotwhisker: 0.7.4

    Data

    Processed data files are provided to replicate main figures. To replicate from raw data, follow the instructions below.

    Policies (needs to be recreated or email for version): Compiled from bagtheban.com/in-your-state/, rila.org/retail-compliance-center/consumer-bag-legislation, baglaws.com, nicholasinstitute.duke.edu/plastics-policy-inventory, and wikipedia.org/wiki/Plastic_bag_bans_in_the_United_States; and massgreen.org/plastic-bag-legislation.html and cawrecycles.org/list-of-local-bag-bans to confirm legislation in Massachusetts and California.

    TIDES (needs to be downloaded for full replication): Download cleanup data for the United States from Ocean Conservancy (coastalcleanupdata.org/reports). Download files for 2000-2009, 2010-2014, and then each separate year from 2015 until 2023. Save files in the data/tides directory, as year.csv (and 2000-2009.csv, 2010-2014.csv). Also download entanglement data for each year (2016-2023) separately in a file called data/tides/entanglement (each file should be called 'entangled-animals-united-states_YEAR.csv').

    Shapefiles (needs to be downloaded for full replication): Download shapefiles for processing cleanups and policies. Download county shapefiles from the US Census Bureau; save files in the data/shapefiles directory, county shapefile should be in folder called county (files called cb_2018_us_county_500k.shp). Download TIGER Zip Code tabulation areas from the US Census Bureau (through data.gov); save files in the data/shapefiles directory, zip codes shapefile folder and files should be called tl_2019_us_zcta510.

    Other: Helper files with US county and state fips codes, lists of US counties and zip codes in data/other directory, provided in the directory except as follows. Download zip code list and 2020 IRS population data from United States zip codes and save as uszipcodes.csv in data/other directory. Download demographic characteristics of zip codes from Social Explorer and save as raw_zip_characteristics.csv in data/other directory.

    Refer to the .txt files in each data folder to ensure all necessary files are downloaded.
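
Before rerunning the cleaning pipeline, it may help to check that the TIDES downloads are named as the README describes. The base-R sketch below is one possible way to do that and is not part of the replication package. It assumes that "year.csv" means one file per year (for example 2015.csv) and that data/tides/entanglement is a folder; the function names `expected_tides_files` and `missing_tides_files` are hypothetical.

```r
# Build the file names that the README's TIDES instructions describe.
# Assumptions (not confirmed by the README): yearly files are named
# "<YEAR>.csv" and "entanglement" is a subfolder of the tides directory.
expected_tides_files <- function(tides_dir = file.path("data", "tides")) {
  cleanup <- c("2000-2009.csv", "2010-2014.csv", paste0(2015:2023, ".csv"))
  entangled <- paste0("entangled-animals-united-states_", 2016:2023, ".csv")
  c(file.path(tides_dir, cleanup),
    file.path(tides_dir, "entanglement", entangled))
}

# Return the expected files that are not present on disk.
missing_tides_files <- function(tides_dir = file.path("data", "tides")) {
  expected <- expected_tides_files(tides_dir)
  expected[!file.exists(expected)]
}

expected <- expected_tides_files()
```

Calling `missing_tides_files()` from the "replication" folder would then list any downloads that still need to be saved or renamed.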

  3. Time-Series Matrix (TSMx): A visualization tool for plotting multiscale...

    • dataverse.harvard.edu
    Updated Jul 8, 2024
    Cite
    Georgios Boumis; Brad Peter (2024). Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends [Dataset]. http://doi.org/10.7910/DVN/ZZDYM9
    Dataset provided by
    Harvard Dataverse
    Authors
    Georgios Boumis; Brad Peter
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends

    TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test.

    The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds with the temporal range 2010–2019 and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart—that cell represents the slope for the temporal range 2004–2014.

    This publication entry also includes an Excel template that produces the same visualizations without a need to interact with any code, though minor modifications will need to be made to accommodate year ranges other than what is provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel.

    Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

    TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).

    TSMx R script:

    # import packages
    library(dplyr)
    library(readr)
    library(ggplot2)
    library(tibble)
    library(tidyr)
    library(forcats)
    library(Kendall)

    options(warn = -1) # disable warnings

    # read data (.csv file with "Year" and "Value" columns)
    data <- read_csv("EVI.csv")

    # prepare row/column names for output matrices
    years <- data %>% pull("Year")
    r.names <- years[-length(years)]
    c.names <- years[-1]
    years <- years[-length(years)]

    # initialize output matrices
    sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
    pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
    slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

    # function to return remaining years given a start year
    getRemain <- function(start.year) {
      years <- data %>% pull("Year")
      start.ind <- which(data[["Year"]] == start.year) + 1
      remain <- years[start.ind:length(years)]
      return(remain)
    }

    # function to subset data for a start/end year combination
    splitData <- function(end.year, start.year) {
      keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year)
      batch <- data[keep,]
      return(batch)
    }

    # function to fit linear regression and return slope direction
    fitReg <- function(batch) {
      trend <- lm(Value ~ Year, data = batch)
      slope <- coefficients(trend)[[2]]
      return(sign(slope))
    }

    # function to fit linear regression and return slope magnitude
    fitRegv2 <- function(batch) {
      trend <- lm(Value ~ Year, data = batch)
      slope <- coefficients(trend)[[2]]
      return(slope)
    }

    # function to implement Mann-Kendall (MK) trend test and return significance
    # the test is implemented only for n>=8
    getMann <- function(batch) {
      if (nrow(batch) >= 8) {
        mk <- MannKendall(batch[['Value']])
        pval <- mk[['sl']]
      } else {
        pval <- NA
      }
      return(pval)
    }

    # function to return slope direction for all combinations given a start year
    getSign <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      signs <- lapply(combs, fitReg)
      return(signs)
    }

    # function to return MK significance for all combinations given a start year
    getPval <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      pvals <- lapply(combs, getMann)
      return(pvals)
    }

    # function to return slope magnitude for all combinations given a start year
    getMagn <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      magns <- lapply(combs, fitRegv2)
      return(magns)
    }

    # retrieve slope direction, MK significance, and slope magnitude
    signs <- lapply(years, getSign)
    pvals <- lapply(years, getPval)
    magns <- lapply(years, getMagn)

    # fill-in output matrices
    dimension <- nrow(sign.matrix)
    for (i in 1:dimension) {
      sign.matrix[i, i:dimension] <- unlist(signs[i])
      pval.matrix[i, i:dimension] <- unlist(pvals[i])
      slope.matrix[i, i:dimension] <- unlist(magns[i])
    }

    sign.matrix <-...
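
The core idea of the script listing (one regression slope per start/end year pair, with start years on the rows and end years on the columns) can be sketched compactly in base R. The following is a simplified illustration on synthetic data, not the authors' script: the function name `slope_matrix` and the example series are assumptions, and the Mann-Kendall test and plotting steps are omitted.

```r
# Compact base-R sketch of the slope-matrix idea (not the authors' script).
# Rows are start years, columns are end years; cells where the end year
# would not follow the start year stay NA.
slope_matrix <- function(years, values) {
  n <- length(years)
  stopifnot(n >= 2, length(values) == n)
  out <- matrix(NA_real_, nrow = n - 1, ncol = n - 1,
                dimnames = list(years[-n], years[-1]))
  for (i in seq_len(n - 1)) {
    for (j in i:(n - 1)) {
      idx <- i:(j + 1)  # observations from start year i through end year j + 1
      fit <- lm(values[idx] ~ years[idx])
      out[i, j] <- coef(fit)[[2]]  # slope of the fitted regression line
    }
  }
  out
}

# Synthetic, perfectly linear series: every fitted slope should be
# numerically equal to 2.
years  <- 2001:2006
values <- 2 * years + 1
slopes <- slope_matrix(years, values)
```

Applying `sign(slopes)` gives the slope-direction matrix that the description refers to; the nested loop trades the original's `lapply` helpers for brevity.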
