My Grandpa asked if the programs I was using could calculate his Golf League’s handicaps, so I decided to play around with SQL and Google Sheets to see if I could functionally recreate what they were doing.
The goal is to calculate a player’s handicap: the average of their lowest scores from the last six months, minus 29. How many scores go into that average depends on how many games the player has actually played in those six months. For example, Clem played over 20 games, so his handicap is calculated from the maximum number of scores, his lowest 8. Schomo only played six games, so his lowest 4 are used. The handicap is always calculated from the lowest available scores.
This league uses Excel, so upon receiving the data I converted it into a CSV and uploaded it to BigQuery.
The first thing I did was rename the columns to describe what they actually hold, which simplifies the code. It is much easier to remember ‘schomo_scores’ than ‘int64_field_4’, and it also seemed to confuse SQL less, since int64 has a meaning of its own. (The table itself ended up named ‘should only need the one’, and because that name contains spaces it has to be backtick-quoted in BigQuery.)
ALTER TABLE `grandpa-golf.grandpas_golf_35.should only need the one`
RENAME COLUMN int64_field_4 TO schomo_scores;
To find the average of Clem’s scores:
SELECT AVG(clem_scores)
FROM `grandpa-golf.grandpas_golf_35.should only need the one`
LIMIT 8;
RESULT: 43.1
Remembering that the handicap is the average minus 29, the final computation looks like:
SELECT AVG(clem_scores) - 29
FROM `grandpa-golf.grandpas_golf_35.should only need the one`
LIMIT 8;
RESULT: 14.1
And Schomo’s handicap:
SELECT AVG(schomo_scores) - 29
FROM `grandpa-golf.grandpas_golf_35.should only need the one`
LIMIT 6;
RESULT: 10.5
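One caveat about these queries: LIMIT applies after aggregation, so AVG() here runs over every score in the column rather than only the lowest 8 (or 6). To average just the lowest N scores, the limiting has to happen in a subquery before the aggregate. A sketch of what that could look like for Clem’s lowest 8 (same table and column as above, untested against the league’s data):

SELECT AVG(score) - 29 AS handicap
FROM (
  SELECT clem_scores AS score
  FROM `grandpa-golf.grandpas_golf_35.should only need the one`
  WHERE clem_scores IS NOT NULL  -- skip empty rows
  ORDER BY clem_scores           -- lowest scores first
  LIMIT 8                        -- keep only the lowest 8
) AS lowest_scores;

Here the ORDER BY and LIMIT run inside the subquery, so AVG() only ever sees the eight lowest scores.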
This data was already set up to calculate a handicap automatically in the league’s Excel spreadsheet, so I asked for more data to see if I could recreate those functions.
Grandpa provided the past three years of league data, with the names all replaced by generic labels (“Golfer 001”, “Golfer 002”, etc.). I had planned on converting this Excel sheet into a CSV and manipulating it in SQL as with the smaller sample, but this did not work.
Immediately, there were problems. I had initially tried to just convert the file into a CSV and drop it into SQL, but the functions did not transfer properly from what was functionally the PDF I had been emailed. So instead of working with SQL, I decided to pull this into Google Sheets and recreate the functions for this spreadsheet. We only need the most recent six months of scores to calculate the handicap, so once I made a working copy I deleted the data from before this period. Once that was cleaned up, I started working on a function that would pull the working average from these values, where the number of scores averaged is still determined by how many scores there are in total: for 20 or more scores, average the lowest 8; for 15 to 19 scores, the lowest 6; for 6 to 14 scores, the lowest 4; and for 1 to 5 scores, the lowest 2. We also need a row with no scores to return 0 so the handicap calculator works. My formula ended up being:
=IF(COUNT(E2:AT2)>=20, AVERAGE(SMALL(E2:AT2, ROW(INDIRECT("1:"&8)))), IF(COUNT(E2:AT2)>=15, AVERAGE(SMALL(E2:AT2, ROW(INDIRECT("1:"&6)))), IF(COUNT(E2:AT2)>=6, AVERAGE(SMALL(E2:AT2, ROW(INDIRECT("1:"&4)))), IF(COUNT(E2:AT2)>=1, AVERAGE(SMALL(E2:AT2, ROW(INDIRECT("1:"&2)))), IF(COUNT(E2:AT2)=0, 0, "")))))
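The same tiering can be written a bit more compactly in Google Sheets using SEQUENCE to generate the 1 through N ranks. This is just a sketch, not the league’s formula, and like the original it assumes each row holds at least as many scores as its tier requires:

=ARRAYFORMULA(IF(COUNT(E2:AT2)=0, 0, AVERAGE(SMALL(E2:AT2, SEQUENCE(IF(COUNT(E2:AT2)>=20, 8, IF(COUNT(E2:AT2)>=15, 6, IF(COUNT(E2:AT2)>=6, 4, 2))))))))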
The handicap is just this value minus 29, so for the handicap column the formula is relatively simple: =IF(D2=0,0,IF(D2>47,18,D2-29)) This pulls the working average from the right place, caps the handicap at 18 for any average above 47, and sets it to zero if there are no scores present.
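One thing this formula does not do is floor negative results: a nonzero average below 29 would still yield a negative handicap. If that ever mattered, a variant such as =IF(D2=0, 0, MIN(18, MAX(0, D2-29))) would floor at zero; the MIN(18, …) part is equivalent to the D2>47 cap, since D2-29 exceeds 18 exactly when D2 exceeds 47.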
Now that we have the spreadsheet back in working order with our new formulas, we are functionally done: we have recreated what my Grandpa’s league uses to generate handicaps.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
These data were mainly obtained from the sliceOmatic software, used to measure angles, lever arms, and volumes of reconstructions. The ratios were calculated in Excel.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Nigeria adopted dolutegravir (DTG) as part of first line (1L) antiretroviral therapy (ART) in 2017. However, there is limited documented experience using DTG in sub-Saharan Africa. Our study assessed DTG acceptability from the patient’s perspective as well as treatment outcomes at 3 high-volume facilities in Nigeria. This is a mixed method prospective cohort study with 12 months of follow-up between July 2017 and January 2019. Patients who had intolerance or contraindications to non-nucleoside reverse-transcriptase inhibitors were included. Patient acceptability was assessed through one-on-one interviews at 2, 6, and 12 months following DTG initiation. ART-experienced participants were asked about side effects and regimen preference compared to their previous regimen. Viral load (VL) and CD4+ cell count tests were assessed according to the national schedule. Data were analysed in MS Excel and SAS 9.4. A total of 271 participants were enrolled in the study; the median age of participants was 45 years and 62% were female. Of those enrolled, 229 (206 ART-experienced, 23 ART-naive) were interviewed at 12 months. 99.5% of ART-experienced study participants preferred DTG to their previous regimen. 32% of participants reported at least one side effect. “Increase in appetite” was most frequently reported (15%), followed by insomnia (10%) and bad dreams (10%). Average adherence as measured by drug pick-up was 99%, and 3% reported a missed dose in the 3 days preceding their interview. Among participants with VL results (n = 199), 99% were virally suppressed.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Excel file containing source data for Figs 1–4 and 6, and for Figs B and C in S1 Text.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
A Microsoft Excel file containing the data that form the figures in the main text of the publication. The file contains the ultrafast transient absorption and photoluminescence data for TXO-TPA and 4CzIPN, presented in wavelength (nm) and time (ps). Also included are the steady-state Raman and impulsive vibrational spectra of TXO-TPA and 4CzIPN at 0.5, 3, and 10 ps (in wavenumbers). The full datasets for the quantum-chemical molecular dynamics simulations of TXO-TPA are also presented. See the main manuscript for more details.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This package contains the files required for replicating the results reported in the paper “The Flexible Reverse Approach for Decomposing Economic Inefficiency: With an Application to Taiwanese Banks”, coauthored with Jesús T. Pastor, Juan Aparicio, and Javier Alcaraz and accepted for publication in June 2024 in Economic Modelling.
The package contains:
A Word™ file describing the content of the accompanying Excel file, where the results of the example reported in Table 1 are replicated. The file includes basic instructions for running Excel’s Solver and the Visual Basic for Applications (VBA) macros that automate the optimization processes for all firms.
An Excel™ file consisting of four tabs. The first tab presents the data, while each successive tab includes the models and results for the weighted additive technical inefficiency (Model_1), profit inefficiency (Model_4), and the closest benchmarks maximizing profit (Model_5).
The replication files correspond to the example used to illustrate the flexible reverse approach for measuring and decomposing profit inefficiency. The data on Taiwanese banks, collected and studied by Juo et al. (2015), were kindly provided by Prof. Tsu-Tan Fu. Since these data are not publicly available, readers interested in replicating the empirical application should contact the above authors. The spreadsheets can be easily modified to measure and decompose the profit inefficiency of any dataset of choice.
Reference: Juo, J. C., Fu, T. T., Yu, M. M., & Lin, Y. H. (2015). Profit-oriented productivity change. Omega, 57, 176-187.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Reinforcement learning (RL) has demonstrated significant potential in social robot autonomous navigation, yet existing research lacks in-depth discussion of the feasibility of navigation strategies. Therefore, this paper proposes an Integrated Decision-Control Framework for Social Robot Autonomous Navigation (IDC-SRAN), which accounts for the nonlinearity of the social robot model and ensures the feasibility of the decision-control strategy. Initially, inverse reinforcement learning (IRL) is employed to tackle the challenge of designing pedestrian walking rewards. Subsequently, a Four-Mecanum-Wheel Robot dynamic model is constructed to develop IDC-SRAN, resolving the issue of dynamics mismatch in the RL system. The actions of IDC-SRAN are defined as additional torques, with actual torques and lateral/longitudinal velocities integrated into the state space. The feasibility of the decision-control strategy is ensured by constraining the range of actions. Furthermore, a critical challenge arises from the state delay caused by model transient characteristics, which complicates the articulation of nonlinear relationships between states and actions through IRL-based rewards. To mitigate this, a driving-force-guided reward is proposed. This reward guides the robot to explore more appropriate decision-control strategies via the expected direction of the driving force, thereby reducing non-optimal behaviors during transient phases. Experimental results demonstrate that IDC-SRAN achieves peak accelerations of approximately 8.3% of those of baseline methods, significantly enhancing the feasibility of decision-control strategies. Simultaneously, the framework enables goal-oriented autonomous navigation through active torque modulation, attaining a task completion rate exceeding 90%. These outcomes further validate the intelligence and robustness of the proposed IDC-SRAN.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Literature search methods: A systematic review was done in November 2014 using the databases Web of Science and Google Scholar to collect and analyse the primary literature to date written about tool use and tool making behaviour in non-human animals. The search for publications on Google Scholar was done using the search terms “tool+using+making+animals”, including only articles, written in a non-limited period of time, sorted by relevance. Since Google Scholar provided a large number of articles in descending order of relevance, we detected relevant articles with a first manual scan of titles and abstracts for as long as relevance remained consistent, producing a total of 23 possible publications.

The search using the database Web of Science was done with the search terms “Tool*” (Topic) AND “Use* OR Utilization*” (Topic) AND “Mak*” (Topic) AND “Animal*” (Topic). This produced 316 possible publications. We then refined the results using the search categories “Behavioral Sciences”, “Ecology”, and “Zoology”, and selected only articles, leaving 9. These underwent a title and abstract scan for relevance to the specific topic. The full text of the remaining articles was processed, and articles that did not provide specific information about the occurrence of tool use and/or tool behaviour in animals were excluded. We also excluded secondary literature, since it reviews primary literature without providing its own data. Articles whose content was not focused on the specified topic, and articles that did not provide sufficient data, were also excluded. Of the 339 initial publications, 32 were screened: 2 were removed for not being primary research articles, 24 were directly related to the topic, and 6 were excluded for the reasons listed above. The remaining 24 studies included in the analysis comprised experiments from 1973 to 2014. Of the 24 articles, 4 were written in 2005, 2 in 1982, 2 in 1990, 2 in 1994, 2 in 2003, and 2 in 2014. All articles included in this review were published in English, across a total of 17 journals; Journal of Comparative Psychology published 4 of the 24 articles and Primates published 3.

Analysis of the literature: Studies were coded by geographical location (country name), duration (total length of the research, measured in months), type of experiment performed (observational, experimental), the common name of the animal observed or used as experimental subject, the activity that was the scope of the tool use behaviour, and the kind of tool being used.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
File List Supplement1.xls (md5: 4202b5bccb5ee828f646f50530394c47)
Please be advised that the ESA cannot guarantee the forward migration of proprietary file formats such as Excel (.xls) documents.
Description
SupplementA.xls is an Excel spreadsheet containing 5 sheets with example calculations. The first 4 sheets (labeled Model 1 - Model 4) contain calculations for models considered in APPLICATION TO YELLOWSTONE BISON:
Model 1: Makes no assumptions about equality of survival rates for different age classes.
Model 2: Assumes survival rates are equal for ages 0–1, 2–3, 4–5, 6–7, 8–9, 10–11, 12–13.
Model 3: Assumes survival rates are equal for ages 0–1, 2–3, 4–5, 6–11, 12–13.
Model 4: Assumes survival rates are equal for ages 0–13.
The last sheet (labeled 3 Years) contains calculations for a hypothetical example with 3 age classes and 3 years of data, and no assumptions about equality of survival rates.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
These datasets contain the results from our analyses of the Citizen Science community on Twitter. These analyses were done to better understand the discussion about SDGs, eLearning and eHealth.
The purpose of sharing these datasets is to provide the basis to reproduce the results reported in the associated deliverable. These files are not raw data, since due to privacy concerns we cannot share personal information from Twitter.
dominant_topics_anonym.xlsx: Excel datasheet. This dataset contains the distribution of the most discussed topics inside the SDGs discussion.
Edges_Hashtag_connected.csv: CSV file. This dataset contains the edges to build the network of connected hashtags. These edges can be used to build a network and explore the connections, or to statistically analyse the results.
hashtags.csv: CSV file. This dataset contains the results of the most used hashtags in the analysis about eLearning.
hashtags_treemap_health.xlsx: Excel datasheet. This dataset contains the results of the most frequent hashtags in the eHealth analysis.
ldavis_prepared_ieee17.html: HTML file. This file contains the Intertopic distance map and most salient terms from the topic modelling analysis done in the SDGs conversation study.
Most_retweeted_accounts.xlsx: Excel datasheet. This dataset contains the top 20 users that received the most retweets in the conversation around eHealth. The column called Indegree refers to the topological value calculated from the network of retweets; this indegree is equivalent to the number of retweets received. Outdegree is the opposite: the number of retweets given to others.
Most_retweeting_account.xlsx: Excel datasheet. This dataset presents the counterpart of the previous one: the accounts that retweet the most in the eHealth analysis. The columns contain the same indicators, Indegree and Outdegree.
sdgs_count_publish.csv: CSV file. This dataset contains the number of tweets assigned to the different SDGs from the analysis done on the conversation about these Goals.
sdgs_tweets_sdgsaccess.xlsx: Excel datasheet. The same data as the previous file, in another format to ease handling in Excel.
top_hash_health.xlsx: Excel datasheet. The most used hashtags inside the conversation about eHealth.
topics_tweets_sdgsaccess.xlsx: Excel datasheet. Tweets by topic extracted using Machine Learning in the SDGs analysis.
This repository will receive updates in the future in order to present all the data available and publishable from the different analyses described.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Excel sheet containing numerical data used to generate Figs 2–8 and S1.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
T1 values for intraobserver reproducibility assessment; Excel data with manual ROI placement by observer 1.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Excel spreadsheet containing, in separate sheets, the underlying numerical data presented in the manuscript.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Original data set for the Fukushima et al. study on mitochondrial ROS generation under conditions simulating early reperfusion injury. Microsoft Excel spreadsheet, with one tab/sheet for each figure. See the “Read Me” tab for more details. Note: Tukey’s post-hoc tests used the “Real Statistics” Excel plug-in. Underlying calculations are not included, since they will only be visible to those with the plug-in installed in Excel.
This zip file contains the RT-qPCR results from the final report ST-2023-20026-01: Investigation of environmental RNA (eRNA) as a detection method for dreissenid mussels and other invasive species.
RT-qPCR (reverse transcription quantitative polymerase chain reaction) analysis was conducted on eRNA (environmental ribonucleic acid) isolated from water samples collected at Canyon Reservoir, AZ. The goal of the project was to test three different RNA preservation methods and three different RNA extraction methods. RT-qPCR was used to detect the presence of eRNA in the samples. The analysis was conducted using CFX Maestro. Included in the zip file is the CFX Maestro software information. The Cq (quantification cycle) value was obtained using RT-qPCR for each sample, analysed, and used to create the figures in the final report.
Following each RT-qPCR assay, all the files associated with the experiment were downloaded and saved. There are 14 folders, and each contains a series of Excel spreadsheets with information on the RT-qPCR experiment. These Excel spreadsheets include the following data: ANOVA results, end point results, gene expression results, melt curve results, quantification amplification results, Cq results, plate view results, standard curve, and run information for each RT-qPCR analysis. Some of the folders also contain images of the amplification curve, melt curve, melt peak, and standard curve for the experiment.
The Cq values used in the report were taken from the quantification amplification file for each of the experiments. These Cq values were placed into the eRNA Data and Figures Excel spreadsheet. In this spreadsheet the Cq values were analyzed, and graphs were made with the data.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The file, in Excel format, includes the initial dataset of precipitation and streamflow for 32,000 events across 304 catchments over Iran. This dataset is used for the analysis in the manuscript entitled “Enhancing Estimation of Catchment Direct Runoff Using Inverse Modelling of SCS-CN: Case Studies across Iran”, submitted to the Water Resources Research journal. The locations of the coded catchments are shown in the main text of the manuscript and in the Supporting Information file attached to the manuscript.
Introduction to the dataset: Customer churn prediction means knowing which customers are likely to leave your service. This matters because acquiring new customers often costs more than retaining existing ones. Once you’ve identified customers at risk of churning, you need to know exactly what marketing efforts to make with each customer to maximize their likelihood of staying. Customers have different behaviors, preferences, and reasons for discontinuing their accounts, so it is important to actively communicate with each of them to keep them on your customer list. You need to know which marketing activities are most effective for individual customers and when they are most effective.
Impact of customer churn on businesses: A company with a high churn rate loses many accounts, resulting in lower growth rates and a greater impact on sales and profits. Banks with low churn rates keep those accounts and the revenue they bring.
Benefits of analyzing customer churn prediction:
1. Increase profits. Banks sell products and services to make money, so the ultimate goal of churn analysis is to reduce churn and increase profits. As more customers stay longer, revenue should increase, and profits should follow.
2. Improve the customer experience. One of the worst ways to lose a customer is through an easy-to-avoid mistake such as serving the wrong service. By understanding why customers churn, you can better understand their priorities, identify your weaknesses, and improve the overall customer experience. Customer experience, also known as “CX”, is the customer’s perception or opinion of their interactions with your business. The perception of your bank is shaped throughout the customer journey, from the first interaction to after-sales support, and has a lasting impact on your business, including your bottom line.
3. Optimize your products and services. If customers are leaving because of specific issues with your product or service, you have an opportunity to improve. Implementing these insights reduces customer churn and improves the overall product or service for future growth.
4. Customer retention. The opposite of customer churn is customer retention. A bank can retain customers and continue to generate revenue from them. High customer loyalty enables banks to increase the profitability of their existing customers and maximize their lifetime value (LTV).
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset provides comprehensive information on road intersection crashes involving motorcycles (motor tricycle, motorcycle under 125cc, motorcycle above 125cc, quadru-cycle) that resulted in injuries recognised as “high-high” clusters within the City of Cape Town. It includes detailed records of all intersection crashes and their corresponding crash attribute combinations, which were prevalent in 33% of the total “high-high” cluster motorcycle road intersection crashes resulting in injuries for the years 2017, 2018 and 2019. The dataset is organised by confidence metric values in descending order.
Data Specifics
Data Type: Geospatial-temporal categorical data
File Format: Excel document (.xlsx)
Size: 29.8 KB
Number of Files: The dataset contains a total of 576 association rules
Date Created: 23 May 2024
Methodology
Data Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network Information
Software: ArcGIS Pro, Python
Processing Steps: Following the spatio-temporal analyses and the derivation of “high-high” cluster fishnet grid cells from a cluster and outlier analysis, all the road intersection crashes involving a motorcycle resulting in injuries that occurred within the “high-high” cluster fishnet grid cells were extracted to be processed by association analysis. The association analysis of these crashes was processed using Python and a 0.30 support metric value. Consequently, commonly occurring crash attributes among at least 33% of the “high-high” cluster road intersection motorcycle crashes resulting in injuries were extracted for inclusion in this dataset.
Geospatial Information
Spatial Coverage: West Bounding Coordinate: 18°20'E; East Bounding Coordinate: 19°05'E; North Bounding Coordinate: 33°25'S; South Bounding Coordinate: 34°25'S
Coordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projection
Temporal Information
Temporal Coverage: Start Date: 01/01/2017; End Date: 31/12/2019
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Excel sheet containing annotations of the publicly available P. multocida genomes evaluated in Fig 1.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Introduction
We are enclosing the database used in our research titled “Concentration and Geospatial Modelling of Health Development Offices' Accessibility for the Total and Elderly Populations in Hungary”, along with our statistical calculations. For the sake of reproducibility, further information can be found in the files Short_Description_of_Data_Analysis.pdf and Statistical_formulas.pdf.
Sharing the data is part of our aim to strengthen the basis of our scientific research. As of March 7, 2024, submission of our research findings to a scientific journal and their detailed analysis have not yet been completed.
The dataset was expanded on 23rd September 2024 to include SPSS statistical analysis data, a heatmap, and buffer zone analysis around the Health Development Offices (HDOs) created in QGIS software.
Short Description of Data Analysis and Attached Files (datasets):
Our research utilised data from 2022, serving as the basis for statistical standardisation. The 2022 Hungarian census provided an objective basis for our analysis, with age group data available at the county level from the Hungarian Central Statistical Office (KSH) website. The 2022 demographic data provided a more accurate picture than the data available from the 2023 microcensus. Our calculations are based on our standardisation of the 2022 data. For the .xlsx files, we used MS Excel 2019 (version 1808, build 10406.20006) with the Solver add-in.
The Hungarian Central Statistical Office served as the data source for population by age group, county, and region: https://www.ksh.hu/stadat_files/nep/hu/nep0035.html (accessed 04 Jan. 2024), with data recorded in MS Excel in the Data_of_demography.xlsx file.
In 2022, 108 Health Development Offices (HDOs) were operational, and it's noteworthy that no developments have occurred in this area since 2022. The availability of these offices and the demographic data from the Central Statistical Office in Hungary are considered public interest data, freely usable for research purposes without requiring permission.
The contact details for the Health Development Offices were sourced from the following page (Hungarian National Population Centre (NNK)): https://www.nnk.gov.hu/index.php/efi (n=107). The Semmelweis University Health Development Centre was not listed by NNK, hence it was separately recorded as the 108th HDO. More information about the office can be found here: https://semmelweis.hu/egeszsegfejlesztes/en/ (n=1). (accessed 05 Dec. 2023.)
Geocoordinates were determined using Google Maps (N=108): https://www.google.com/maps. (accessed 02 Jan. 2024.) Recording of geocoordinates (latitude and longitude according to WGS 84 standard), address data (postal code, town name, street, and house number), and the name of each HDO was carried out in the: Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file.
The foundational software for geospatial modelling and display (QGIS 3.34), an open-source software, can be downloaded from:
https://qgis.org/en/site/forusers/download.html. (accessed 04 Jan. 2024.)
The HDOs_GeoCoordinates.gpkg QGIS project file contains Hungary's administrative map and the recorded addresses of the HDOs, imported from the Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file.
The OpenStreetMap tileset is directly accessible from www.openstreetmap.org in QGIS. (accessed 04 Jan. 2024.)
The Hungarian county administrative boundaries were downloaded from the following website: https://data2.openstreetmap.hu/hatarok/index.php?admin=6 (accessed 04 Jan. 2024.)
HDO_Buffers.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding buffer zones with a radius of 7.5 km.
Heatmap.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding heatmap (Kernel Density Estimation).
A brief description of the statistical formulas applied is included in the Statistical_formulas.pdf.
Recording of our base data for statistical concentration and diversification measurement was done using MS Excel 2019 (version: 1808, build: 10406.20006) in .xlsx format.
Using the SPSS 29.0.1.0 program, we performed our statistical calculations with the databases Data_HDOs_population_without_outliers.sav and Data_HDOs_population.sav.
For easier readability, the files have been provided in both SPV and PDF formats.
The translation of these supplementary files into English was completed on 23rd Sept. 2024.
If you have any further questions regarding the dataset, please contact the corresponding author: domjan.peter@phd.semmelweis.hu