3 datasets found

Data from: Fast robust SUR with economical and actuarial applications
wiley.figshare.com
search.datacite.org
txt
Updated Jul 14, 2016
Cite
Mia Hubert; Tim Verdonck; Ozlem Yorulmaz (2016). Data from: Fast robust SUR with economical and actuarial applications [Dataset]. http://doi.org/10.6084/m9.figshare.3408073.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.3408073.v1
Dataset updated
Jul 14, 2016
Dataset provided by
Wiley
Authors
Mia Hubert; Tim Verdonck; Ozlem Yorulmaz
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The seemingly unrelated regression (SUR) model is a generalization of a linear regression model consisting of more than one equation, where the error terms of these equations are contemporaneously correlated. The standard Feasible Generalized Least Squares (FGLS) estimator is efficient because it takes into account the covariance structure of the errors, but it is also very sensitive to outliers. The robust SUR estimator of Bilodeau and Duchesne (Canadian Journal of Statistics, 28:277-288, 2000) can accommodate outliers, but it is hard to compute. First we propose a fast algorithm, FastSUR, for its computation and show its good performance in a simulation study. We then provide diagnostics for outlier detection and illustrate them on a real data set from economics. Next we apply our FastSUR algorithm in the framework of stochastic loss reserving for general insurance. We focus on the General Multivariate Chain Ladder (GMCL) model, which employs SUR to estimate its parameters. Consequently, this multivariate stochastic reserving method takes into account the contemporaneous correlations among run-off triangles and allows structural connections between these triangles. We plug our FastSUR algorithm into the GMCL model to obtain a robust version.
Extended 1.0 Dataset of "Concentration and Geospatial Modelling of Health...
zenodo.org
bin, csv, pdf
Updated Sep 23, 2024
Cite
Peter Domjan; Viola Angyal; Istvan Vingender (2024). Extended 1.0 Dataset of "Concentration and Geospatial Modelling of Health Development Offices' Accessibility for the Total and Elderly Populations in Hungary" [Dataset]. http://doi.org/10.5281/zenodo.13826993
Explore at:
Available download formats: bin, pdf, csv
Unique identifier
https://doi.org/10.5281/zenodo.13826993
Dataset updated
Sep 23, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Peter Domjan; Viola Angyal; Istvan Vingender
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Sep 23, 2024
Area covered
Hungary
Description
Introduction

We are enclosing the database used in our research titled "Concentration and Geospatial Modelling of Health Development Offices' Accessibility for the Total and Elderly Populations in Hungary", along with our statistical calculations. For the sake of reproducibility, further information can be found in the files Short_Description_of_Data_Analysis.pdf and Statistical_formulas.pdf.

Sharing the data is part of our aim to strengthen the base of our scientific research. As of March 7, 2024, the detailed analysis of our research findings and their submission to a scientific journal had not yet been completed.

The dataset was expanded on 23rd September 2024 to include SPSS statistical analysis data, a heatmap, and buffer zone analysis around the Health Development Offices (HDOs) created in QGIS software.

Short Description of Data Analysis and Attached Files (datasets):

Our research utilised data from 2022, serving as the basis for statistical standardisation. The 2022 Hungarian census provided an objective basis for our analysis, with age group data available at the county level from the Hungarian Central Statistical Office (KSH) website. The 2022 demographic data provided an accurate picture compared to the data available from the 2023 microcensus. The calculation used is based on our standardisation of the 2022 data. For xlsx files, we used MS Excel 2019 (version: 1808, build: 10406.20006) with the SOLVER add-in.

The Hungarian Central Statistical Office served as the data source for population by age group, county, and region: https://www.ksh.hu/stadat_files/nep/hu/nep0035.html (accessed 04 Jan. 2024), with the data recorded in MS Excel in the Data_of_demography.xlsx file.

In 2022, 108 Health Development Offices (HDOs) were operational, and it's noteworthy that no developments have occurred in this area since 2022. The availability of these offices and the demographic data from the Central Statistical Office in Hungary are considered public interest data, freely usable for research purposes without requiring permission.

The contact details for the Health Development Offices were sourced from the following page (Hungarian National Population Centre (NNK)): https://www.nnk.gov.hu/index.php/efi (n=107). The Semmelweis University Health Development Centre was not listed by NNK, hence it was separately recorded as the 108th HDO. More information about the office can be found here: https://semmelweis.hu/egeszsegfejlesztes/en/ (n=1). (accessed 05 Dec. 2023.)

Geocoordinates were determined using Google Maps (N=108): https://www.google.com/maps (accessed 02 Jan. 2024). The geocoordinates (latitude and longitude according to the WGS 84 standard), address data (postal code, town name, street, and house number), and name of each HDO were recorded in the Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file.

QGIS 3.34, the open-source software used for geospatial modelling and display, can be downloaded from https://qgis.org/en/site/forusers/download.html (accessed 04 Jan. 2024).

The HDOs_GeoCoordinates.gpkg QGIS project file contains Hungary's administrative map and the recorded addresses of the HDOs, imported from the Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file.

The OpenStreetMap tileset is directly accessible from www.openstreetmap.org in QGIS. (accessed 04 Jan. 2024.)

The Hungarian county administrative boundaries were downloaded from the following website: https://data2.openstreetmap.hu/hatarok/index.php?admin=6 (accessed 04 Jan. 2024.)

HDO_Buffers.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding buffer zones with a radius of 7.5 km.
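
As a rough illustration of what a 7.5 km buffer means for WGS 84 coordinates, the sketch below tests whether a location lies within 7.5 km of an office using the haversine great-circle distance. This is an approximation and not the QGIS buffer procedure used in the dataset; the function names, the Earth radius constant, and the sample coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_buffer(point, office, radius_km=7.5):
    """True if `point` (lat, lon) lies inside the circular buffer around `office`."""
    return haversine_km(point[0], point[1], office[0], office[1]) <= radius_km


# Hypothetical coordinates, not taken from the dataset.
office = (47.4979, 19.0402)
nearby = (47.5300, 19.0800)
far = (47.1000, 19.0402)

print(within_buffer(nearby, office))
print(within_buffer(far, office))
```

A buffer drawn in QGIS is usually computed in a projected coordinate system, so results near the 7.5 km boundary may differ slightly from this spherical approximation.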

Heatmap.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding heatmap (Kernel Density Estimation).

A brief description of the statistical formulas applied is included in the Statistical_formulas.pdf.

Recording of our base data for statistical concentration and diversification measurement was done using MS Excel 2019 (version: 1808, build: 10406.20006) in .xlsx format.

Aggregated number of HDOs by county: Number_of_HDOs.xlsx

Standardised data (Number of HDOs per 100,000 residents): Standardized_data.xlsx

Calculation of the Lorenz curve: Lorenz_curve.xlsx

Calculation of the Gini index: Gini_Index.xlsx

Calculation of the LQ index: LQ_Index.xlsx

Calculation of the Herfindahl-Hirschman Index: Herfindahl_Hirschman_Index.xlsx

Calculation of the Entropy index: Entropy_Index.xlsx

Regression and correlation analysis calculation: Regression_correlation.xlsx
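
The measures listed above can be sketched in a few lines of Python. These are common textbook definitions and may differ in detail from the formulas documented in Statistical_formulas.pdf; all function names and the sample counts are hypothetical and not taken from the dataset.

```python
import math


def per_100k(count, population):
    """Standardised rate: number of offices per 100,000 residents."""
    return count / population * 100_000


def gini(values):
    """Gini index of non-negative values (0 means a perfectly even distribution)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def hhi(values):
    """Herfindahl-Hirschman Index: sum of squared shares (between 1/n and 1)."""
    total = sum(values)
    return sum((v / total) ** 2 for v in values)


def shannon_entropy(values):
    """Entropy of the shares (natural log); zero counts contribute nothing."""
    total = sum(values)
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


def location_quotient(local_count, local_pop, total_count, total_pop):
    """LQ: local share of offices divided by local share of population."""
    return (local_count / total_count) / (local_pop / total_pop)


# Hypothetical office counts for four regions.
counts = [2, 4, 6, 8]
print(gini(counts), hhi(counts), shannon_entropy(counts))
```

An LQ above 1 in this formulation indicates that a region holds a larger share of offices than of population.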

Using the SPSS 29.0.1.0 program, we performed the following statistical calculations with the databases Data_HDOs_population_without_outliers.sav and Data_HDOs_population.sav:

Regression curve estimation with elderly population and number of HDOs, excluding outlier values (Types of analyzed equations: Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential, Logistic, with summary and ANOVA analysis table): Curve_estimation_elderly_without_outlier.spv

Pearson correlation table between the total population, elderly population, and number of HDOs per county, excluding outlier values such as Budapest and Pest County: Pearson_Correlation_populations_HDOs_number_without_outliers.spv.

Dot diagram including total population and number of HDOs per county, excluding outlier values such as Budapest and Pest Counties: Dot_HDO_total_population_without_outliers.spv.

Dot diagram including elderly (64<) population and number of HDOs per county, excluding outlier values such as Budapest and Pest Counties: Dot_HDO_elderly_population_without_outliers.spv

Regression curve estimation with total population and number of HDOs, excluding outlier values (Types of analyzed equations: Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential, Logistic, with summary and ANOVA analysis table): Curve_estimation_without_outlier.spv

Dot diagram including elderly (64<) population and number of HDOs per county: Dot_HDO_elderly_population.spv

Dot diagram including total population and number of HDOs per county: Dot_HDO_total_population.spv

Pearson correlation table between the total population, elderly population, and number of HDOs per county: Pearson_Correlation_populations_HDOs_number.spv

Regression curve estimation with total population and number of HDOs (Types of analyzed equations: Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential, Logistic, with summary and ANOVA analysis table): Curve_estimation_total_population.spv
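
For readers without SPSS, the Pearson correlation reported in the tables above can be reproduced in principle with a short Python function. This is a minimal sketch of the standard sample formula, not the SPSS procedure itself; the function name and the sample figures are hypothetical and not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical county populations (thousands) and office counts.
population = [210, 305, 390, 520, 640]
offices = [3, 4, 6, 5, 8]
print(pearson_r(population, offices))
```

Removing outliers, as in the analyses that exclude Budapest and Pest County, amounts to dropping those entries from both sequences before calling the function.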

For easier readability, the files have been provided in both SPV and PDF formats.

The translation of these supplementary files into English was completed on 23rd Sept. 2024.

If you have any further questions regarding the dataset, please contact the corresponding author: domjan.peter@phd.semmelweis.hu
Data from: Benchmarking Basis Sets for Density Functional Theory...
acs.figshare.com
xlsx
Updated Nov 20, 2023
Cite
Samuel J. Pitman; Alicia K. Evans; Robbie T. Ireland; Felix Lempriere; Laura K. McKemmish (2023). Benchmarking Basis Sets for Density Functional Theory Thermochemistry Calculations: Why Unpolarized Basis Sets and the Polarized 6‑311G Family Should Be Avoided [Dataset]. http://doi.org/10.1021/acs.jpca.3c05573.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acs.jpca.3c05573.s001
Dataset updated
Nov 20, 2023
Dataset provided by
ACS Publications
Authors
Samuel J. Pitman; Alicia K. Evans; Robbie T. Ireland; Felix Lempriere; Laura K. McKemmish
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Basis sets are a crucial but often largely overlooked choice in setting up quantum chemistry calculations. The choice of the basis set can be critical in determining the accuracy and calculation time of your quantum chemistry calculations. Clear recommendations based on thorough benchmarking are essential but not readily available currently. This study investigates the relative quality of basis sets for general properties by benchmarking basis set performance for a diverse set of 139 reactions (from the diet-150-GMTKN55 data set). In our analysis, we find the distributions of errors are often significantly non-Gaussian, meaning that the joint consideration of median errors, mean absolute errors, and outlier statistics is helpful to provide a holistic understanding of basis set performance. Our direct comparison of performance between most modern basis sets provides quantitative evidence for basis set recommendations that broadly align with the established understanding of basis set experts and is evident in the design of modern basis sets. For example, while zeta is a good measure of quality, it is not the only determining factor for an accurate calculation with unpolarized double- and triple-ζ basis sets (like 6-31G and 6-311G) having very poor performance. Appropriate use of polarization functions (e.g., 6-31G*) is essential to obtain the accuracy offered by double- or triple-ζ basis sets. In our study, the best performances for double- and triple-ζ basis sets are 6-31++G** and pcseg-2, respectively. However, the performances of singly polarized double-ζ and doubly polarized triple-ζ basis sets are quite similar with one key exception: the polarized 6-311G basis set family has poor parametrization, which means its performance is more like a double-ζ than a triple-ζ basis set. All versions of the 6-311G basis set family should be avoided entirely for valence chemistry calculations moving forward.