75 datasets found
  1. Comparing spatial regression to random forests for large environmental data...

    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Eric W. Fox; Jay M. Ver Hoef; Anthony R. Olsen (2023). Comparing spatial regression to random forests for large environmental data sets [Dataset]. http://doi.org/10.1371/journal.pone.0229509
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Eric W. Fox; Jay M. Ver Hoef; Anthony R. Olsen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates with nonlinear relationships, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records that are spatially autocorrelated. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. A primary application is mapping MMI predictions and prediction errors at 1.1 million perennial stream reaches across the conterminous United States. For the spatial regression model, we develop a novel transformation procedure that estimates Box-Cox transformations to linearize covariate relationships and handles possibly zero-inflated covariates. We find that the spatial regression model with transformations, and a subsequent selection of significant covariates, has cross-validation performance comparable to random forests. We also find that prediction interval coverage is close to nominal for each method, but that spatial regression prediction intervals tend to be narrower and have less variability than quantile regression forest prediction intervals. A simulation study is used to generalize results and clarify advantages of each modeling approach.
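
    The description above mentions estimating Box-Cox transformations to linearize covariate relationships. As a rough illustration only, the sketch below shows the standard one-parameter Box-Cox transform and a grid search over its profile log-likelihood. The function names, the grid, and the sample values are assumptions made for this example; it does not reproduce the authors' procedure and does not handle zero-inflated covariates.

```python
import math


def box_cox(x, lam):
    """Standard Box-Cox transform of a strictly positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def box_cox_log_likelihood(values, lam):
    """Profile log-likelihood of lam, assuming the transformed values are normal."""
    n = len(values)
    transformed = [box_cox(v, lam) for v in values]
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(v) for v in values)


def estimate_lambda(values, grid=None):
    """Pick the grid value of lam with the highest profile log-likelihood."""
    if grid is None:
        # Hypothetical grid: -2.0 to 2.0 in steps of 0.1.
        grid = [i / 10.0 for i in range(-20, 21)]
    return max(grid, key=lambda lam: box_cox_log_likelihood(values, lam))


# Hypothetical covariate values that are symmetric on the log scale.
sample = [math.exp(z) for z in (-2, -1, 0, 1, 2)]
best_lambda = estimate_lambda(sample)
```

    For data that are symmetric on the log scale, as in the hypothetical sample, the selected parameter should sit at or near zero, which corresponds to a log transform.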

  2. Transformations in PubChem - Full Dataset

    • data.niaid.nih.gov
    Updated Mar 14, 2025
    + more versions
    Cite
    Zhang, Jian (Jeff) (2025). Transformations in PubChem - Full Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5644560
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    Blanke, Gerd
    Cheng, Tiejun
    Schymanski, Emma
    Thiessen, Paul
    Bolton, Evan
    Zhang, Jian (Jeff)
    Helmus, Rick
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is an archive of the data contained in the "Transformations" section in PubChem for integration into patRoon and other workflows.

    For further details see the ECI GitLab site: README and main "tps" folder.

    Credits:

    Concepts: E Schymanski, E Bolton, J Zhang, T Cheng;

    Code (in R): E Schymanski, R Helmus, P Thiessen

    Transformations: E Schymanski, J Zhang, T Cheng and many contributors to various lists!

    PubChem infrastructure: PubChem team

    Reaction InChI (RInChI) calculations (v1.0): Gerd Blanke (previous versions of these files)

    Acknowledgements: ECI team who contributed to related efforts, especially: J. Krier, A. Lai, M. Narayanan, T. Kondic, P. Chirsir, E. Palm. All contributors to the NORMAN-SLE transformations!

    March 2025 released as v0.2.0 since the dataset grew by >3000 entries! The stats are:

    14 March 2025

    Unique Transformation Entries: 10904
    Unique Reactions by CID: 9152
    Unique Reactions by IK: 9139
    Unique Reactions by IKFB: 8574
    Unique NORMAN-SLE Compounds by CID: 8207
    Unique ChEMBL Compounds by CID: 1419
    Unique Compounds (all) by CID: 9267
    Unique Predecessors (all) by CID: 3724
    Unique Successors (all) by CID: 7331
    Range of XlogP Differences: -9.9,10
    Range of Mass Differences: -957.97490813,820.227106427

  3. Supplement 1. R code demonstrating how to fit a logistic regression model,...

    • wiley.figshare.com
    • figshare.com
    html
    Updated Jun 1, 2023
    Cite
    David I. Warton; Francis K. C. Hui (2023). Supplement 1. R code demonstrating how to fit a logistic regression model, with a random intercept term, and how to use resampling-based hypothesis testing for inference. [Dataset]. http://doi.org/10.6084/m9.figshare.3550407.v1
    Available download formats: html
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Wiley (https://www.wiley.com/)
    Authors
    David I. Warton; Francis K. C. Hui
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    File List

    glmmeg.R: R code demonstrating how to fit a logistic regression model, with a random intercept term, to randomly generated overdispersed binomial data.
    boot.glmm.R: R code for estimating P-values by applying the bootstrap to a GLMM likelihood ratio statistic.

    Description

    glmm.R is example R code that shows how to fit a logistic regression model (with or without a random effects term) and use diagnostic plots to check the fit. The code is run on randomly generated data, which are generated in such a way that overdispersion is evident. This code can be applied directly to your own analyses if you read into R a data.frame called “dataset”, which has columns labelled “success” and “failure” (for the number of binomial successes and failures) and “species” (a label for the different rows in the dataset), and where you want to test for the effect of a predictor variable called “location”. In other cases, just change the labels and formula as appropriate.

    boot.glmm.R extends glmm.R by using bootstrapping to calculate P-values in a way that provides better control of Type I error in small samples. It accepts data in the same form as that generated in glmm.R.

  4. Economic Transformation Database of Transition Economies

    • dataverse.nl
    Updated Apr 29, 2025
    Cite
    Calumn Hamilton; Gaaitzen De Vries (2025). Economic Transformation Database of Transition Economies [Dataset]. http://doi.org/10.34894/E7MVOX
    Available download formats: xlsx(185368), application/x-stata-14(168935)
    Dataset updated
    Apr 29, 2025
    Dataset provided by
    DataverseNL
    Authors
    Calumn Hamilton; Gaaitzen De Vries
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Economic Transformation Database of Transition Economies [ETD-TE] provides a balanced panel of internationally comparable sectoral data on output and employment in fourteen former Soviet republics. The ETD-TE is designed to combine easily with the GGDC/UN-WIDER Economic Transformation Database [ETD]. It enables a comparative analysis of growth and structural transformation between the former Soviet Union countries and other advanced and developing countries. More information about this data set can be found on the associated page on the website of the Groningen Growth and Development Centre.

    When using these data (for whatever purpose), please make the following reference: Hamilton, C. and G. J. de Vries (2025). The Structural Transformation of Transition Economies. World Development, 191, Article 106977.

    User information

    The ETD-TE includes the following data:

    Countries: Armenia, Azerbaijan, Belarus, Estonia, Georgia, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Moldova, Russian Federation, Tajikistan, Ukraine, Uzbekistan

    Variables:
    • Persons Employed (thousands)
    • Constant price value added in local currency (millions)
    • Nominal price value added in local currency (millions)

    *Where countries changed or revalued currency during the sample period, all VA data is provided in units of the most recent/current currency.

    Sectors (with ISIC Rev. 4 code):
    • Agriculture: A
    • Mining: B
    • Manufacturing: C
    • Utilities: D+E
    • Construction: F
    • Trade Services: G+I
    • Transport: H
    • Business Services: J+M+N
    • Financial Services: K
    • Real Estate: L
    • Government Services: O+P+Q
    • Other Services: R+S+T+U

    Time period: Persons employed and constant price value added are provided annually for the period 1990-2019, and nominal price value added for the period 1995-2019.

  5. Reproduction package for the dissertation on building transformation...

    • service.tib.eu
    Updated Aug 4, 2023
    Cite
    (2023). Reproduction package for the dissertation on building transformation networks for consistent evolution of interrelated models - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1281
    Dataset updated
    Aug 4, 2023
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: This repository provides different artifacts developed in and used for the evaluation of the dissertation "Building Transformation Networks for Consistent Evolution of Interrelated Models". It serves as a reproduction package for the contributions and evaluations of that thesis. The artifacts comprise an approach to evaluate compatibility of QVT-R transformations, evaluations of interoperability issues in transformation networks and approaches to avoid them, a language to define consistency between multiple models, and an evaluation of this language. The package contains a prepared execution environment for the different artifacts. In addition, it provides a script to run the environment for some of the artifacts and automatically resolve all dependencies based on Docker.

    Technical remarks: Instructions on how to use the data can be found within the repository.

  6. Long Covid Risk

    • figshare.com
    txt
    Updated Apr 13, 2024
    Cite
    Ahmed Shaheen (2024). Long Covid Risk [Dataset]. http://doi.org/10.6084/m9.figshare.25599591.v1
    Available download formats: txt
    Dataset updated
    Apr 13, 2024
    Dataset provided by
    figshare
    Authors
    Ahmed Shaheen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Feature preparation

    Preprocessing was applied to the data, such as creating dummy variables and performing transformations (centering, scaling, Yeo-Johnson) using the preProcess() function from the “caret” package in R. The correlation among the variables was examined and no serious multicollinearity problems were found. A stepwise variable selection was performed using a logistic regression model. The final set of variables included:

    • Demographic: age, body mass index, sex, ethnicity, smoking
    • History of disease: heart disease, migraine, insomnia, gastrointestinal disease
    • COVID-19 history: covid vaccination, rashes, conjunctivitis, shortness of breath, chest pain, cough, runny nose, dysgeusia, muscle and joint pain, fatigue, fever, COVID-19 reinfection, and ICU admission

    These variables were used to train and test various machine-learning models.

    Model selection and training

    The data was randomly split into 80% training and 20% testing subsets. The “h2o” package in R version 4.3.1 was employed to implement different algorithms. AutoML was first used, which automatically explored a range of models with different configurations. Gradient Boosting Machines (GBM), Random Forest (RF), and Regularized Generalized Linear Model (GLM) were identified as the best-performing models on our data and their parameters were fine-tuned. An ensemble method that stacked different models together was also used, as it could sometimes improve the accuracy. The models were evaluated using the area under the curve (AUC) and C-statistics as diagnostic measures. The model with the highest AUC was selected for further analysis using the confusion matrix, accuracy, sensitivity, specificity, and F1 and F2 scores. The optimal prediction threshold was determined by plotting the sensitivity, specificity, and accuracy and choosing the point of intersection, as it balanced the trade-off between the three metrics.

    The model’s predictions were also plotted, and the quantile ranges were used to classify the model’s prediction as follows: > 1st quantile, > 2nd quantile, > 3rd quartile and < 3rd quartile (very low, low, moderate, high) respectively.

    Metric               Formula
    C-statistics         (TPR + TNR - 1) / 2
    Sensitivity/Recall   TP / (TP + FN)
    Specificity          TN / (TN + FP)
    Accuracy             (TP + TN) / (TP + TN + FP + FN)
    F1 score             2 * (precision * recall) / (precision + recall)

    Model interpretation

    We used the variable importance plot, which is a measure of how much each variable contributes to the prediction power of a machine learning model. In the H2O package, variable importance for GBM and RF is calculated by measuring the decrease in the model's error when a variable is split on. The more a variable's split decreases the error, the more important that variable is considered to be. The error is calculated using the formula SE = MSE * N = VAR * N, and then it is scaled between 0 and 1 and plotted.

    We also used the SHAP summary plot, which is a graphical tool to visualize the impact of input features on the prediction of a machine learning model. SHAP stands for SHapley Additive exPlanations, a method to calculate the contribution of each feature to the prediction by averaging over all possible subsets of features [28]. The SHAP summary plot shows the distribution of the SHAP values for each feature across the data instances. We use the h2o.shap_summary_plot() function in R to generate the SHAP summary plot for our GBM model. We pass the model object and the test data as arguments, and optionally specify the columns (features) we want to include in the plot. The plot shows the SHAP values for each feature on the x-axis, and the features on the y-axis. The color indicates whether the feature value is low (blue) or high (red). The plot also shows the distribution of the feature values as a density plot on the right.
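
    The confusion-matrix formulas listed in the description above (sensitivity, specificity, accuracy, F1) can be sketched as follows. This is an illustrative Python example, not the authors' R/h2o code: the function names and the counts are hypothetical, and the F-beta helper uses the standard textbook definition, which the description mentions (F2) but does not spell out. The C-statistics formula is left out because the description gives it without further context.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the listed metrics from confusion-matrix counts.

    Assumes every denominator is non-zero; a production version
    would need to guard against empty classes.
    """
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * (precision * sensitivity) / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def f_beta(precision, recall, beta):
    """Standard F-beta score; beta=2 weights recall more than precision."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
f2 = f_beta(metrics["precision"], metrics["sensitivity"], beta=2.0)
```

    Keeping the metrics in one function that takes raw counts makes it easy to recompute them at several candidate thresholds, which is what the threshold-selection step in the description relies on.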

  7. Data and R-scripts for "Land-use trajectories for sustainable land system...

    • data.niaid.nih.gov
    Updated Oct 14, 2021
    Cite
    Dominic A. Martin (2021). Data and R-scripts for "Land-use trajectories for sustainable land system transformations: identifying leverage points in a global biodiversity hotspot" (V2) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4601599
    Dataset updated
    Oct 14, 2021
    Dataset authored and provided by
    Dominic A. Martin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sustainable land system transformations are necessary to avert biodiversity and climate collapse. However, it remains unclear where entry points for transformations exist in complex land systems. Here, we conceptualize land systems along land-use trajectories, which allows us to identify and evaluate leverage points; i.e., entry points on the trajectory where targeted interventions have particular leverage to influence land-use decisions. We apply this framework in the biodiversity hotspot Madagascar. In the Northeast, smallholder agriculture results in a land-use trajectory originating in old-growth forests, spanning forest fragments, and reaching shifting hill rice cultivation and vanilla agroforests. Integrating interdisciplinary empirical data on seven taxa, five ecosystem services, and three measures of agricultural productivity, we assess trade-offs and co-benefits of land-use decisions at three leverage points along the trajectory. These trade-offs and co-benefits differ between leverage points: two leverage points are situated at the conversion of old-growth forests and forest fragments to shifting cultivation and agroforestry, resulting in considerable trade-offs, especially between endemic biodiversity and agricultural productivity. Here, interventions enabling smallholders to conserve forests are necessary. This is urgent since ongoing forest loss threatens to eliminate these leverage points due to path-dependency. The third leverage point allows for the restoration of land under shifting cultivation through vanilla agroforests and offers co-benefits between restoration goals and agricultural productivity. The co-occurring leverage points highlight that conservation and restoration are simultaneously necessary. Methodologically, the framework shows how leverage points can be identified, evaluated, and harnessed for land system transformations under the consideration of path-dependency along trajectories.

  8. Anonymized Dataset for "Do Programmers Prefer Predictable Code"

    • zenodo.org
    zip
    Updated Mar 20, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anonymous; Anonymous (2020). Anonymized Dataset for "Do Programmers Prefer Predictable Code" [Dataset]. http://doi.org/10.5281/zenodo.3659203
    Available download formats: zip
    Dataset updated
    Mar 20, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This package contains the anonymized dataset, R notebook results, and R code for processing the meaning-preserving transformations and human subject study. Note that the title has been changed from the earlier version on arXiv, which was "Do People Prefer 'Natural' Code?".

    See the README file for more details.

  9. The Response Scale Transformation Project

    • ssh.datastations.nl
    • datacatalogue.cessda.eu
    ods, odt +3
    Updated Dec 9, 2020
    Cite
    de . de Jonge; R. Veenhoven (2020). The Response Scale Transformation Project [Dataset]. http://doi.org/10.17026/DANS-ZX5-P7PE
    Available download formats: odt(29645), text/x-fixed-field(22452), ods(11942), ods(11139), ods(12335), ods(13098), text/x-fixed-field(59602), ods(11639), text/x-fixed-field(132938), ods(17048), ods(11734), ods(28694), text/x-fixed-field(1966), ods(14745), ods(11407), ods(14469), ods(12777), ods(12209), text/x-fixed-field(24328), ods(11706), text/x-fixed-field(50572), text/x-fixed-field(18563), odt(9924), text/x-fixed-field(10883), text/x-fixed-field(202843), text/x-fixed-field(2386), ods(97328), text/x-fixed-field(11385), ods(102829), tsv(72547), ods(12341), ods(10873), ods(14458), text/x-fixed-field(10660), ods(38470), text/x-fixed-field(20572), ods(11725), text/x-fixed-field(8864), text/x-fixed-field(165989), ods(14478), text/x-fixed-field(152269), ods(28686), text/x-fixed-field(24767), text/x-fixed-field(2484), text/x-fixed-field(2201), zip(75713), ods(16139), text/x-fixed-field(4196), text/x-fixed-field(14402), odt(923518), ods(11649), text/x-fixed-field(141193), ods(12861), ods(11589), ods(12903), ods(11757), text/x-fixed-field(49805), text/x-fixed-field(12389), text/x-fixed-field(16732), text/x-fixed-field(195127), text/x-fixed-field(122450), ods(11357), text/x-fixed-field(2288), ods(110889), text/x-fixed-field(4853), odt(14649), text/x-fixed-field(1928), ods(12818), ods(12681), ods(11897), text/x-fixed-field(20730), text/x-fixed-field(82219), ods(12707), ods(12159), ods(12189), text/x-fixed-field(12852), odt(866474), ods(12251), text/x-fixed-field(110342), ods(12822), ods(11213), text/x-fixed-field(56990), ods(11821), ods(11480), ods(103685), ods(11803), odt(128432), text/x-fixed-field(19990), ods(12672), ods(12570), text/x-fixed-field(15210), ods(12086), text/x-fixed-field(27258), odt(48839), text/x-fixed-field(3925), ods(12771), tsv(8015), tsv(1382), tsv(1156), tsv(55966), tsv(15059), tsv(22016), tsv(108268), tsv(39979), tsv(117623), tsv(176555), tsv(96630), tsv(13894), tsv(8531), tsv(10586), tsv(2795), tsv(1843), tsv(145500), tsv(879), tsv(9895), tsv(3784), tsv(123192), tsv(136784), tsv(12070), tsv(11112), tsv(17100), tsv(15037), tsv(15159), tsv(1230), tsv(938), tsv(10114), tsv(17282), tsv(3351), tsv(18397), tsv(24102), tsv(43030), tsv(20907), tsv(47877), tsv(173744)
    Dataset updated
    Dec 9, 2020
    Dataset provided by
    DANS Data Station Social Sciences and Humanities
    Authors
    de . de Jonge; R. Veenhoven
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    In this project we have reviewed existing methods used to homogenize data and developed several new methods for dealing with this diversity in survey questions on the same subject. The project is a spin-off from the World Database of Happiness, the main aim of which is to collate and make available research findings on the subjective enjoyment of life and to prepare these data for research synthesis. The first methods we discuss were proposed in the book ‘Happiness in Nations’ and were used at the inception of the World Database of Happiness. Some 10 years later a new method was introduced: the International Happiness Scale Interval Study (HSIS). Taking the HSIS as a basis, the Continuum Approach was developed. Then, building on this approach, we developed the Reference Distribution Method.

  10. R-scripts for uncertainty analysis v01

    • data.gov.au
    • researchdata.edu.au
    • +2 more
    zip
    Updated Apr 13, 2022
    Cite
    Bioregional Assessment Program (2022). R-scripts for uncertainty analysis v01 [Dataset]. https://data.gov.au/data/dataset/322c38ef-272f-4e77-964c-a14259abe9cf
    Available download formats: zip(9161)
    Dataset updated
    Apr 13, 2022
    Dataset authored and provided by
    Bioregional Assessment Program
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Abstract

    This dataset was created within the Bioregional Assessment Programme. Data has not been derived from any source datasets. Metadata has been compiled by the Bioregional Assessment Programme.

    This dataset contains a set of generic R scripts that are used in the propagation of uncertainty through numerical models.

    Dataset History

    The dataset contains a set of R scripts that are loaded as a library. The R scripts are used to carry out the propagation of uncertainty through numerical models. The scripts contain the functions to create the statistical emulators and do the necessary data transformations and backtransformations. The scripts are self-documenting and created by Dan Pagendam (CSIRO) and Warren Jin (CSIRO).

    Dataset Citation

    Bioregional Assessment Programme (2016) R-scripts for uncertainty analysis v01. Bioregional Assessment Source Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/322c38ef-272f-4e77-964c-a14259abe9cf.

  11. Statistical analysis for: Mode I fracture of beech-adhesive bondline at...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, html, txt
    Updated Oct 4, 2022
    Cite
    Michael Burnard; Jaka Gašper Pečnik (2022). Statistical analysis for: Mode I fracture of beech-adhesive bondline at three different temperatures [Dataset]. http://doi.org/10.5281/zenodo.6839197
    Available download formats: csv, html, bin, txt
    Dataset updated
    Oct 4, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael Burnard; Jaka Gašper Pečnik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset collects a raw dataset and a processed dataset derived from the raw dataset. There is a document containing the analytical code for statistical analysis of the processed dataset in .Rmd format and .html format.

    The study examined some aspects of mechanical performance of solid wood composites. We were interested in certain properties of solid wood composites made using different adhesives with different grain orientations at the bondline, then treated at different temperatures prior to testing.

    Performance was tested by assessing fracture energy and critical fracture energy, lap shear strength, and compression strength of the composites. This document concerns only the fracture properties, which are the focus of the related paper.

    Notes:

    * the raw data is provided in this upload, but the processing is not addressed here.
    * the authors of this document are a subset of the authors of the related paper.
    * this document and the related data files were uploaded at the time of submission for review. An update providing the doi of the related paper will be provided when it is available.

  12. Solar self-sufficient households as a driving factor for sustainability...

    • service.tib.eu
    Updated Nov 14, 2024
    Cite
    (2024). Solar self-sufficient households as a driving factor for sustainability transformation - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/luh-solar-self-sufficient-households-as-a-driving-factor-for-sustainability-transformation
    Dataset updated
    Nov 14, 2024
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    To get the consumption model from Section 3.1, one needs to execute the file consumption_data.R: load the data for the 3 phases (./data/CONSUMPTION/PL1.csv, PL2.csv, PL3.csv), transform the data, and build the model (starting at line 225). The final consumption data can be found in one file for each year in ./data/CONSUMPTION/MEGA_CONS_list.Rdata.

    To get the results for the optimization problem, one needs to execute the file analyze_data.R. It provides the functions to compare production and consumption data, and to optimize for the different values (PV, MBC,).

    To reproduce the figures, one needs to execute the file visualize_results.R. It provides the functions to reproduce the figures.

    To calculate the solar radiation that is needed in the Section Production Data, follow the file calculate_total_radiation.R.

    To reproduce the radiation data from ERA5, which can be found in data.zip, do the following steps:

    1. ERA5: download the reanalysis datasets as a GRIB file. For FDIR select "Total sky direct solar radiation at surface", for GHI select "Surface solar radiation downwards", and for ALBEDO select "Forecast albedo".
    2. Convert GRIB to csv with the file era5toGRID.sh.
    3. Convert the csv file to the data that is used in this paper with the file convert_year_to_grid.R.

  13. ScienceBase Item Summary Page

    • datadiscoverystudio.org
    Updated Jun 27, 2018
    Cite
    (2018). ScienceBase Item Summary Page [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/66670ff3130144f3b0e96f0a97460d4c/html
    Dataset updated
    Jun 27, 2018
    Description

    Link to the ScienceBase Item Summary page for the item described by this metadata record. Service Protocol: Link to the ScienceBase Item Summary page for the item described by this metadata record. Application Profile: Web Browser. Link Function: information

  14. (R) MARKET TRANSFORMATION PROGRAMME ON ENERGY EFFICIENCY IN GHG-INTENSIVE...

    • unido.org
    Updated Jul 4, 2025
    Cite
    UNIDO (2025). (R) MARKET TRANSFORMATION PROGRAMME ON ENERGY EFFICIENCY IN GHG-INTENSIVE INDUSTRIES IN RUSSIA. PROJECT PREPARATION SUPPORT FOR THE DEVELOPMENT OF INDUSTRIAL ENERGY EFFICIENCY MARKETS IN RUSSIA. FINAL REPORT (23761.en) [Dataset]. https://www.unido.org/publications/ot/9656887
    Dataset updated
    Jul 4, 2025
    Dataset authored and provided by
    UNIDO
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2010
    Area covered
    Europe, Central Asia, Russia
    Description

    (R) MARKET TRANSFORMATION PROGRAMME ON ENERGY EFFICIENCY IN GHG-INTENSIVE INDUSTRIES IN RUSSIA. PROJECT PREPARATION SUPPORT FOR THE DEVELOPMENT OF INDUSTRIAL ENERGY EFFICIENCY MARKETS IN RUSSIA. FINAL REPORT (23761.en). With geographic focus on Europe and Central Asia, Russian Federation.

  15. (R) THAILANDE. UNITE INDUSTRIELLE DE TRANSFORMATION DE FEVES DE CACAO. ETUDE...

    • unido.org
    Updated Jul 9, 2025
    Cite
    UNIDO (2025). (R) THAILANDE. UNITE INDUSTRIELLE DE TRANSFORMATION DE FEVES DE CACAO. ETUDE TECHNICO-ECONOMIQUE (17081f.fr) [Dataset]. https://www.unido.org/publications/ot/9645570
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    UNIDO
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1988
    Area covered
    Thailand, Asia and the Pacific
    Description

    UNIDO pub. Final report on cocoa beans processing in Thailand - covers (1) cocoa as a raw material (2) chocolate and cocoa powder domestic production; cocoa beans quality (3) semimanufactured products and finished products; consumption (4) location of industry; production capacity (5) production costs; electric power (6) capital investment; financing. Diagrams, statistics. Restricted.

  16. (R) STUDY ON THE TRANSFORMATION OF THE RUSSIAN PHARMACEUTICAL INDUSTRY TO A...

    • unido.org
    Updated Jul 4, 2025
    Cite
    UNIDO (2025). (R) STUDY ON THE TRANSFORMATION OF THE RUSSIAN PHARMACEUTICAL INDUSTRY TO A MARKET-ORIENTED SYSTEM. GENERAL ECONOMIC CONTEXT OF THE PROJECT. A SUMMARY (20338.en) [Dataset]. https://www.unido.org/publications/ot/9657871
    Explore at:
    Dataset updated
    Jul 4, 2025
    Dataset authored and provided by
    UNIDO
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1993
    Area covered
    Russia, Central Asia
    Description

    UNIDO pub on transformation of the Russian pharmaceutical industry to a market-oriented system - covers (1) the Russian national economy in 1993, gross domestic product, employment by main sectors, inflation, balance of trade (2) the Russian health care market, prospects, need of pharmaceuticals. Restricted.

  17. Anonymized Dataset for "Do People Prefer 'Natural' Code?"

    • zenodo.org
    zip
    Updated Mar 20, 2020
    Cite
    Anonymous; Anonymous (2020). Anonymized Dataset for "Do People Prefer 'Natural' Code?" [Dataset]. http://doi.org/10.5281/zenodo.3375005
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 20, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This package contains the anonymized dataset, R notebook results, and R code for processing the meaning-preserving transformations and human subject study.

    See the README file for more details.

  18. Data sets for the Journal of Non-Crystalline Solids X: Article entitled...

    • researchdata.bath.ac.uk
    • search.datacite.org
    c
    Updated May 10, 2019
    Cite
    Philip Salmon; Anita Zeidler (2019). Data sets for the Journal of Non-Crystalline Solids X: Article entitled "Pressure induced structural transformations in amorphous MgSiO_3 and CaSiO_3" [Dataset]. http://doi.org/10.15125/BATH-00601
    Explore at:
    Available download formats: c
    Dataset updated
    May 10, 2019
    Dataset provided by
    University of Bath
    Authors
    Philip Salmon; Anita Zeidler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    University of Bath
    Institut Laue-Langevin
    Royal Society
    United States Department of Energy
    Engineering and Physical Sciences Research Council
    Japan Society for the Promotion of Science
    Atomic Weapons Establishment
    Description

    Data sets used to prepare Figures 1-14 in the Journal of Non-Crystalline Solids X article entitled "Pressure induced structural transformations in amorphous MgSiO_3 and CaSiO_3." The files are labelled according to the figure numbers. The data sets were created using the methodology described in the manuscript. Each of the plots was drawn using QtGrace (https://sourceforge.net/projects/qtgrace/). The data set corresponding to a plotted curve within a QtGrace file can be identified by clicking on that curve. The units for each axis are identified on the plots.

    Figure 1 shows the pressure-volume EOS at room temperature for amorphous and crystalline (a) MgSiO_3 and (b) CaSiO_3.

    Figure 2 shows the pressure dependence of the neutron total structure factor S_{N}(k) for amorphous (a) MgSiO_3 and (b) CaSiO_3.

    Figure 3 shows the pressure dependence of the neutron total pair-distribution function G_{N}(r) for amorphous (a) MgSiO_3 and (b) CaSiO_3.

    Figure 4 shows the pressure dependence of several D′_{N}(r) functions for amorphous MgSiO_3 measured using the D4c diffractometer.

    Figure 5 shows the pressure dependence of the Si-O coordination number in amorphous (a) MgSiO_3 and (b) CaSiO_3, the Si-O bond length in amorphous (c) MgSiO_3 and (d) CaSiO_3, and (e) the fraction of n-fold (n = 4, 5, or 6) coordinated Si atoms in these materials.

    Figure 6 shows the pressure dependence of the M-O (a) coordination number and (b) bond length for amorphous MgSiO_3 and CaSiO_3.

    Figure 7 shows the S_{N}(k) or S_{X}(k) functions for (a) MgSiO_3 and (b) CaSiO_3 after recovery from a pressure of 8.2 or 17.5 GPa.

    Figure 8 shows the G_{N}(r) or G_{X}(r) functions for (a) MgSiO_3 and (b) CaSiO_3 after recovery from a pressure of 8.2 or 17.5 GPa.

    Figure 9 shows the pressure dependence of the Q^n speciation for fourfold coordinated Si atoms in amorphous (a) MgSiO_3 and (b) CaSiO_3.

    Figure 10 shows the pressure dependence in amorphous MgSiO_3 and CaSiO_3 of (a) the overall M-O coordination number and its contributions from M-BO and M-NBO connections, (b) the fractions of M-BO and M-NBO bonds, and (c) the associated M-BO and M-NBO bond distances.

    Figure 11 shows the pressure dependence of the fraction of n-fold (n = 4, 5, 6, 7, 8, or 9) coordinated M atoms in amorphous (a) MgSiO_3 and (b) CaSiO_3.

    Figure 12 shows the pressure dependence of the O-Si-O, Si-O-Si, Si-O-M, O-M-O and M-O-M bond angle distributions (M = Mg or Ca) for amorphous MgSiO_3 (left hand column) and CaSiO_3 (right hand column).

    Figure 13 shows the pressure dependence of the q-parameter distributions for n-fold (n = 4, 5, or 6) coordinated Si atoms in amorphous (a) MgSiO_3 and (b) CaSiO_3.

    Figure 14 shows the pressure dependence of the q-parameter distributions for the M atoms in amorphous MgSiO_3 (left hand column) and CaSiO_3 (right hand column).

  19. Microbial transformations of arsenic in the environment : from soda lakes to...

    • data.gmob.ca
    Updated May 12, 2025
    Cite
    Academic Institutions (2025). Microbial transformations of arsenic in the environment : from soda lakes to aquifers / Jonathan R. Lloyd and Ronald S. Oremland. [Dataset]. https://data.gmob.ca/dataset/microbial-transformations-of-arsenic-in-the-environment-from-soda-lakes-to-aquifers-jonathan-r-llo
    Explore at:
    Dataset updated
    May 12, 2025
    Dataset provided by
    Academic Institutions
    Description

    Elements vol.2, 85-90. Print copy included in a collection of articles from the April 2006 issue of Elements.

  20. C2Metadata test files

    • openicpsr.org
    spss, zip
    Updated Aug 16, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    George Alter (2020). C2Metadata test files [Dataset]. http://doi.org/10.3886/E120642V1
    Explore at:
    Available download formats: spss, zip
    Dataset updated
    Aug 16, 2020
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    George Alter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The C2Metadata (“Continuous Capture of Metadata”) Project automates one of the most burdensome aspects of documenting the provenance of research data: describing data transformations performed by statistical software. Researchers in many fields use statistical software (SPSS, Stata, SAS, R, Python) for data transformation and data management as well as analysis. Scripts used with statistical software are translated into an independent Structured Data Transformation Language (SDTL), which serves as an intermediate language for describing data transformations. SDTL can be used to add variable-level provenance to data catalogs and codebooks and to create “variable lineages” for auditing software operations. This repository provides examples of scripts and metadata for use in testing C2Metadata tools.

Eric W. Fox; Jay M. Ver Hoef; Anthony R. Olsen (2023). Comparing spatial regression to random forests for large environmental data sets [Dataset]. http://doi.org/10.1371/journal.pone.0229509

Comparing spatial regression to random forests for large environmental data sets

Explore at:
69 scholarly articles cite this dataset (View in Google Scholar)
