This dataset contains the valuation template a researcher can use to retrieve real-time stock prices in Excel and Google Sheets. The dataset is provided by Finsheet, the leading financial data provider for spreadsheet users. To get more financial data, visit the website and explore their functions. For instance, if a researcher would like to get the last 30 years of income statements for Meta Platforms Inc, the syntax would be =FS_EquityFullFinancials("FB", "ic", "FY", 30). Similarly, the following syntax will return the latest stock price for Caterpillar Inc right in your spreadsheet: =FS_Latest("CAT"). If you need assistance with any of the functions, feel free to reach out to their customer support team. To get started, install their Excel and Google Sheets add-on.
My Grandpa asked if the programs I was using could calculate his Golf League’s handicaps, so I decided to play around with SQL and Google Sheets to see if I could functionally recreate what they were doing.
The goal is to calculate a player’s handicap, which is the average of the last six months of their scores minus 29. The average is calculated based on how many games they have actually played in the last six months, and the number of scores averaged correlates to total games. For example, Clem played over 20 games so his handicap will be calculated with the maximum possible scores accounted for, that being 8. Schomo only played six games, so the lowest 4 will be used for their average. Handicap is always calculated with the lowest available scores.
This league uses Excel, so upon receiving the data I converted it into a CSV and uploaded it into BigQuery.
First thing I did was change column names to best represent what they were and simplify things in the code. It is much easier to remember ‘someone_scores’ than ‘int64_field_number’. It also seemed to confuse SQL less, as int64 can mean something independently.
ALTER TABLE `grandpa-golf.grandpas_golf_35.should only need the one`
RENAME COLUMN int64_field_4 TO schomo_scores;
To find the average of Clem’s scores:
SELECT AVG(clem_scores)
FROM `grandpa-golf.grandpas_golf_35.should only need the one`
LIMIT 8;
RESULT: 43.1
Remembering that handicap is the average minus 29, the final computation looks like:
SELECT AVG(clem_scores) - 29
FROM `grandpa-golf.grandpas_golf_35.should only need the one`
LIMIT 8;
RESULT: 14.1
Find the average of Schomo’s scores:
SELECT AVG(schomo_scores) - 29
FROM `grandpa-golf.grandpas_golf_35.should only need the one`
LIMIT 6;
RESULT: 10.5
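One caveat on these queries: in BigQuery, LIMIT caps the number of rows returned after the aggregation (AVG already collapses the column to a single row), so it does not restrict the average to a player's lowest 8 or 6 scores; that would need an ordered subquery. The league rule itself, take the lowest N scores and subtract 29 from their average, is easy to sanity-check outside SQL. Here is a minimal Python sketch with made-up scores; the numbers and the fixed cap of 8 are illustrative only, not taken from the league data:

def handicap(scores, max_scores=8):
    # League rule: average the lowest `max_scores` scores, then subtract 29
    if not scores:
        return 0
    lowest = sorted(scores)[:max_scores]
    return round(sum(lowest) / len(lowest) - 29, 1)

# Hypothetical six months of scores for one player
print(handicap([42, 45, 41, 44, 43, 46, 40, 47, 48, 44]))  # averages the lowest 8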
This data was already automated to calculate a handicap in the league’s Excel spreadsheet, so I asked for more data to see if I could recreate those functions.
Grandpa provided the past three years of league data. The names were all replaced with generic “Golfer 001, Golfer 002, etc”. I had planned on converting this Excel sheet into a CSV and manipulating it in SQL like with the smaller sample, but this did not work.
Immediately, there were problems. I had initially tried to just convert the file into a CSV and drop it into SQL, but there were functions that did not transfer properly from what was functionally the PDF I had been emailed. So instead of working with SQL, I decided to pull this into Google Sheets and recreate the functions for this spreadsheet. We only need the most recent six months of scores to calculate our handicap, so once I made a working copy I deleted the data from before this time period. Once that was cleaned up, I started working on a function that would pull the working average from these values, which is still determined by how many total values there were. This correlates as follows: for 20 or more scores average the lowest 8, for 15 to 19 scores average the lowest 6, for 6 to 14 scores average the lowest 4, and for fewer than 6 scores average the lowest 2. We also need to ensure that an average value of 0 returns a value of 0 so our handicap calculator works. My formula ended up being:
=IF(COUNT(E2:AT2)>=20, AVERAGE(SMALL(E2:AT2, ROW(INDIRECT("1:"&8)))), IF(COUNT(E2:AT2)>=15, AVERAGE(SMALL(E2:AT2, ROW(INDIRECT("1:"&6)))), IF(COUNT(E2:AT2)>=6, AVERAGE(SMALL(E2:AT2, ROW(INDIRECT("1:"&4)))), IF(COUNT(E2:AT2)>=1, AVERAGE(SMALL(E2:AT2, ROW(INDIRECT("1:"&2)))), IF(COUNT(E2:AT2)=0, 0, "")))))
The handicap is just this value minus 29, so for the handicap column the formula is relatively simple: =IF(D2=0,0,IF(D2>47,18,D2-29)) This caps the handicap at 18 for high averages, pulls the basic average from the right place, and sets the handicap to zero if there are no scores present.
Now that we have our spreadsheet back in working order with our new scripts, we are functionally done. We have recreated what my Grandpa’s league uses to generate handicaps.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spreadsheets targeted at the analysis of GHS safety fingerprints.
Abstract: Over a 20-year period, the UN developed the Globally Harmonized System (GHS) to address international variation in chemical safety information standards. By 2014, the GHS had become widely accepted internationally and has become the cornerstone of OSHA’s Hazard Communication Standard. Despite this progress, today we observe inconsistent results when different sources apply the GHS to specific chemicals, in terms of the GHS pictograms, hazard statements, precautionary statements, and signal words assigned to those chemicals. In order to assess the magnitude of this problem, this research extends the “chemical fingerprints” used in 2D chemical structure similarity analysis to GHS classifications. By generating a chemical safety fingerprint, the consistency of the GHS information for specific chemicals can be assessed. The problem is that sources of GHS information can differ. For example, the SDS for sodium hydroxide pellets found on Fisher Scientific’s website displays two pictograms, while the GHS information for sodium hydroxide pellets on Sigma Aldrich’s website has only one pictogram. A chemical information tool which identifies such discrepancies within a specific chemical inventory can assist in maintaining the quality of the safety information needed to support safe work in the laboratory. The tools for this analysis will be scaled to the size of a moderately large research lab or a small chemistry department as a whole (between 1,000 and 3,000 chemical entities) so that labelling expectations within these universes can be established as consistently as possible.
Most chemists are familiar with spreadsheet programs such as Excel and Google Sheets, which many chemists use daily. Through a monadal programming approach with these tools, the analysis of GHS information can be made possible for non-programmers. This monadal approach employs single spreadsheet functions to analyze the data collected, rather than long programs, which can be difficult to debug and maintain. Another advantage of this approach is that the single monadal functions can be mixed and matched to meet new goals as information needs about the chemical inventory evolve over time. These monadal functions are used to convert GHS information into binary strings of data called “bitstrings”, an approach also used when comparing chemical structures. The binary approach makes data analysis more manageable, as GHS information comes in a variety of formats, such as pictures or alphanumeric strings, which are difficult to compare on their face. Bitstrings generated from the GHS information can be compared using an operator such as the Tanimoto coefficient to yield values from 0, for strings that have no similarity, to 1, for strings that are identical. Once a particular set of information is analyzed, the hope is that the same techniques can be extended to more information. For example, if GHS hazard statements are analyzed through a spreadsheet approach, the same techniques, with minor modifications, could be used to tackle other GHS information such as pictograms.
Intellectual Merit: This research indicates that the cheminformatic technique of structural fingerprints can be used to create safety fingerprints. Structural fingerprints are binary bit strings obtained from the non-numeric entity of 2D structure. This structural fingerprint allows comparison of 2D structures through the use of the Tanimoto coefficient. The same idea can be extended to safety fingerprints, which are created by converting a non-numeric entity such as GHS information into a binary bit string and compared through the use of the Tanimoto coefficient.
Broader Impact: Extensions of this research can be applied to many aspects of GHS information. This research focused on comparing GHS hazard statements, but it could be further applied to other pieces of GHS information such as pictograms and GHS precautionary statements. Another facet of this research is allowing the chemist who uses the data to compare large datasets using spreadsheet programs such as Excel without needing a large programming background. Development of this technique will also benefit the Chemical Health and Safety and Chemical Information communities by better defining the quality of GHS information available and providing a scalable and transferable tool to manipulate this information to meet a variety of other organizational needs.
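To make the bitstring comparison concrete, here is a minimal Python sketch of the Tanimoto coefficient on two 0/1 strings; the example hazard-statement fingerprints are made up, and the dataset's own analysis uses spreadsheet functions rather than this code:

def tanimoto(a: str, b: str) -> float:
    # Tanimoto coefficient for two equal-length 0/1 bitstrings
    on_a = sum(ch == "1" for ch in a)
    on_b = sum(ch == "1" for ch in b)
    common = sum(x == "1" and y == "1" for x, y in zip(a, b))
    if on_a + on_b - common == 0:
        return 1.0  # both strings are all zeros; treat as identical
    return common / (on_a + on_b - common)

# Hypothetical GHS hazard-statement fingerprints from two suppliers
print(tanimoto("1101000010", "1001000110"))  # 0.6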
https://creativecommons.org/publicdomain/zero/1.0/
PROJECT OBJECTIVE
We are part of XYZ Co Pvt Ltd, a company in the business of organizing sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility of systematizing the membership roster and generating different reports as per business requirements.
Questions (KPIs)
TASK 1: STANDARDIZING THE DATASET
TASK 2: DATA FORMATTING
TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1) • Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details:
TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)
• Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:
TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)
• Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:
Process
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The data file entitled “Emergy analysis of maize production in Ghana” is based on an empirical study to assess the resource and energy use efficiency of maize production systems using the Emergy-Data Envelopment Analysis approach, which was developed within the context of the BiomassWeb Project. The study area was the Bolgatanga and Bongo Districts, Ghana, sub-Saharan Africa. The approach was developed by coupling the Emergy Analysis and Data Envelopment Analysis methods into a framework, and integrating the concept of eco-efficiency into the framework to assess the resource and energy use efficiency and sustainability of agroecosystems as a whole. In this data file, the Emergy Analysis method is applied to achieve environmental and economic accounting of maize production systems in Ghana. The Agricultural Production Systems sIMulator (APSIM) was used to model five maize-based production scenarios as follows: 1. Extensive rainfed maize system if the external input is 0 kg/ha/yr urea, with/without manure (Extensive0). 2. Extensive rainfed maize system if the external input is 12 kg/ha/yr NPK, with/without manure (Extensive12). 3. Rainfed maize-legume (cowpea - Vigna unguiculata, soybean - Glycine max, or groundnut - Arachis hypogaea) intercropping system if the external input is 20 kg/ha/yr urea, with/without manure (Intercrop20). 4. Intensive maize system if the external input is 50 kg/ha/yr urea, including supplemental irrigation (Intensive50). 5. Intensive maize system if the external input is 100 kg/ha/yr urea, including supplemental irrigation (Intensive100). The five scenarios were compared on the basis of the evaluation that was achieved using the Emergy Analysis to account for resource and energy use efficiency and sustainability. The data were processed using mathematical functions in Microsoft Excel. The data file is organized in seven linked sheet tabs. Comments have been added to make the content self-explanatory. Where secondary data have been used, the sources have been cited. This data file was authored by Mwambo, Francis Molua.
With this add-in it is possible to create map templates from GIS files in KML format, and to create choropleths with them. Providing you have access to KML format map boundary files, it is possible to create your own quick and easy choropleth maps in Excel. The KML format files can be converted from 'shape' files. Many shape files are available to download for free from the web, including from Ordnance Survey and the London Datastore. Standard mapping packages such as QGIS (free to download) and ArcGIS can convert the files to KML format. A sample of a KML file (London wards) can be downloaded from this page, so that users can easily test the tool out. Macros must be enabled for the tool to function. When creating the map using the Excel tool, the 'unique ID' should normally be the area code, the 'Name' should be the area name, and then, if required and there is additional data in the KML file, further 'data' fields can be added. These columns will appear below and to the right of the map. If not, data can be added later on next to the codes and names. In the add-in version of the tool the final control, 'Scale (% window)', should not normally be changed. With the default value 0.5, the height of the map is set to be half the total size of the user's Excel window. To run a choropleth, select the menu option 'Run Choropleth' to get this form. To specify the colour ramp for the choropleth, the user needs to enter the number of boxes into which the range is to be divided, and the colours for the high and low ends of the range, which is done by selecting coloured option boxes as appropriate. If wished, hit the 'Swap' button to change which colours are for the different ends of the range. Then hit the 'Choropleth' button. The default options for the colours of the ends of the choropleth colour range are saved in the add-in, but different values can be selected by setting up a column range of up to twelve cells, anywhere in Excel, filled with the option colours wanted. Then use the 'Colour range' control to select this range, and hit apply, having selected high or low values as wished. The button 'Copy' sets up a sheet 'ColourRamp' in the active workbook with the default colours, which can be extended or deleted with just a few cells, saving the user time. The add-in was developed entirely within the Excel VBA IDE by Tim Lund. He is kindly distributing the tool for free on the Datastore, but suggests that users who find the tool useful make a donation to the Shelter charity. It is not intended that the tool will be actively maintained, but if any users or developers would like to add more features, email the author. Acknowledgments: Calculation of Excel freeform shapes from latitudes and longitudes is done using calculations from the Ordnance Survey.
This notebook serves to showcase my problem solving ability, knowledge of the data analysis process, proficiency with Excel and its various tools and functions, as well as my strategic mindset and statistical prowess. This project consists of an auditing prompt provided by Hive Data, a raw Excel data set, a cleaned and audited version of the raw Excel data set, and my description of my thought process and knowledge used during completion of the project. The prompt can be found below:
The raw data that accompanies the prompt can be found below:
Hive Annotation Job Results - Raw Data
^ These are the tools I was given to complete my task. The rest of the work is entirely my own.
To summarize broadly, my task was to audit the dataset and summarize my process and results. Specifically, I was to create a method for identifying which "jobs" - explained in the prompt above - needed to be rerun based on a set of "background facts," or criteria. The description of my extensive thought process and results can be found below in the Content section.
Brendan Kelley April 23, 2021
Hive Data Audit Prompt Results
This paper explains the auditing process of the “Hive Annotation Job Results” data. It includes the preparation, analysis, visualization, and summary of the data. It is accompanied by the results of the audit in the Excel file “Hive Annotation Job Results – Audited”.
Observation
The “Hive Annotation Job Results” data comes in the form of a single Excel sheet. It contains 7 columns and 5,001 rows, including column headers. The data includes “file”, “object id”, and the pseudonyms for the five questions that each client was instructed to answer about their respective table: “tabular”, “semantic”, “definition list”, “header row”, and “header column”. The “file” column includes non-unique numbers (that is, there are multiple instances of the same value in the column) separated by a dash. The “object id” column includes non-unique numbers ranging from 5 to 487539. The columns containing the answers to the five questions include Boolean values (TRUE or FALSE) which depend upon the yes/no worker judgement.
Use of the COUNTIF() function reveals that there are no values other than TRUE or FALSE in any of the five question columns. The VLOOKUP() function reveals that the data does not include any missing values in any of the cells.
Assumptions
Based on the clean state of the data and the guidelines of the Hive Data Audit Prompt, the assumption is that duplicate values in the “file” column are acceptable and should not be removed. Similarly, duplicated values in the “object id” column are acceptable and should not be removed. The data is therefore clean and is ready for analysis/auditing.
Preparation
The purpose of the audit is to analyze the accuracy of the yes/no worker judgement of each question according to the guidelines of the background facts. The background facts are as follows:
• A table that is a definition list should automatically be tabular and also semantic
• Semantic tables should automatically be tabular
• If a table is NOT tabular, then it is definitely not semantic nor a definition list
• A tabular table that has a header row OR header column should definitely be semantic
These background facts serve as instructions for how the answers to the five questions should interact with one another. These facts can be re-written to establish criteria for each question:
For tabular column:
- If the table is a definition list, it is also tabular
- If the table is semantic, it is also tabular
For semantic column:
- If the table is a definition list, it is also semantic
- If the table is not tabular, it is not semantic
- If the table is tabular and has either a header row or a header column...
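The audited workbook itself is not reproduced here, but to illustrate how the background facts translate into row-level checks, here is a minimal Python sketch. The column names follow the question pseudonyms above; the file name and the exact rule encoding are my own reading of the background facts, not the formulas used in the actual audit:

import csv

def violates_background_facts(row):
    # True if a worker's answers contradict the background facts
    tabular = row["tabular"] == "TRUE"
    semantic = row["semantic"] == "TRUE"
    definition_list = row["definition list"] == "TRUE"
    has_header = row["header row"] == "TRUE" or row["header column"] == "TRUE"
    if definition_list and not (tabular and semantic):
        return True  # a definition list must be tabular and semantic
    if semantic and not tabular:
        return True  # semantic tables must be tabular
    if tabular and has_header and not semantic:
        return True  # tabular + header row/column must be semantic
    return False

with open("hive_annotation_job_results.csv", newline="") as f:  # hypothetical file name
    flagged = [row for row in csv.DictReader(f) if violates_background_facts(row)]
print(len(flagged), "rows flagged for rerun")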
Small area estimation modelling methods have been applied to the 2011 Skills for Life survey data in order to generate local level area estimates of the number and proportion of adults (aged 16-64 years old) in England living in households with defined skill levels in:
The number and proportion of adults in households who do not speak English as a first language are also included.
Two sets of small area estimates are provided for 7 geographies; middle layer super output areas (MSOAs), standard table wards, 2005 statistical wards, 2011 council wards, 2011 parliamentary constituencies, local authorities, and local enterprise partnership areas.
Regional estimates have also been provided, however, unlike the other geographies, these estimates are based on direct survey estimates and not modelled estimates.
The files are available as both Excel and csv files – the user guide explains the estimates and modelling approach in more detail.
To find the estimate for the proportion of adults with entry level 1 or below literacy in the Manchester Central parliamentary constituency, you need to:
It is estimated that 8.1% of adults aged 16-64 in Manchester Central have entry level or below literacy. The Credible Intervals for this estimate are 7.0 and 9.3% at the 95 per cent level. This means that while the estimate is 8.1%, there is a 95% likelihood that the actual value lies between 7.0 and 9.3%.
MS Excel Spreadsheet, 14.5 MB
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data consists of an Excel file, and R code used to manipulate and visualize the data, as well as conduct statistical analyses. The Excel file contains four sheets. One is raw data collected during the trials and contains a row per individual per trial. Columns are information on time within 2m, number of attempts, and successes at specific doors per trial. The second sheet is data formatted for survival analysis, and so there is a row per individual per door, for first and last solves. Columns in the survival analysis sheet are explained below:
Flock - Identifier of the social group from which each subject came
Treatment - C refers to subjects in captivity, F refers to subjects in the wild
ID - The unique identifier for the subject, based on color band combinations
Door - The label describing which type of door on the puzzle box
Solve - Whether it was the first (1) or last (3) solve, or if the subject did not solve that door (0)
Trial - The trial in which the solve occurred
Time (min) - Minute within the trial that the solve occurred
Adjusted - Amount of time until the solve, accounting for multiple trials
The third sheet is raw data from a novel object boldness assay described in detail in McCune et al. 2018. Evidence for personality conformity, not social niche specialization in social jays. Behavioral Ecology, April, https://doi.org/10.1093/beheco/ary055.
The fourth sheet is the dominance rank of subjects based on number of displacements, adjusted to give a relative rank compared to group mates.
Not all subjects had dominance and boldness attribute data.
The R code is in the R markdown format, with different sections for data manipulation, visualization, and analysis. Code is annotated throughout to describe function and output.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
These ward level well-being scores present a combined measure of well-being indicators of the resident population, based on 12 different indicators. Where possible each indicator score is compared with the England and Wales average, which is zero. Scores over 0 indicate a higher probability that the population on average will experience better well-being according to these measures. Users can adjust the weight of each indicator depending on what they consider to be more or less important, thus generating bespoke scores. This is done by entering a number between 0 and 10. The scores throughout the spreadsheet will update automatically. The tool combines data across a range of themes for the last five years of available data (2009-2013). Either view the results in the online interactive tool here, or download the interactive spreadsheet here. The well-being scores are then presented in a ranked bar chart for each borough, and a ward map of London. The spreadsheet also highlights wards in the top and bottom 25 per cent in London. Wards that have shown significant improvement or reduction in their scores relative to the average over the five year period are also highlighted. Borough figures are provided to assist with comparisons. Rankings and summary tables are included. The source data that the tool is based on is included in the spreadsheet. The Excel file is 8.1MB. IMPORTANT NOTE: users must enable macros when prompted upon opening the Excel spreadsheet (or reset security to medium/low) for the map to function. The rest of the tool will function without macros. If you cannot download the Excel file directly, try this zip file (2.6MB). If you experience any difficulties with downloading this spreadsheet, please contact the London Datastore in the Intelligence Unit. Detailed information about definitions and sources is contained within the spreadsheet.
The 12 measures included are:
Health - Life Expectancy; Childhood Obesity; Incapacity Benefits claimant rate
Economic security - Unemployment rate
Safety - Crime rate; Deliberate Fires
Education - GCSE point scores
Children - Unauthorised Pupil Absence
Families - Children in out-of-work households
Transport - Public Transport Accessibility Scores (PTALs)
Environment - Access to public open space & nature
Happiness - Composite Subjective Well-being Score (Life Satisfaction, Worthwhileness, Anxiety, and Happiness) (new data only available since 2011/12)
With some measures a high figure indicates better well-being, and with other measures a low figure indicates better well-being. Therefore scores for Life Expectancy, GCSE scores, PTALs, and Access to Public Open Space/Nature have been reversed so that in all measures low scores indicate probable lower well-being. The data has been turned into scores where each indicator in each year has a standard deviation of 10. This means that each indicator will have an equal effect on the final score when the weightings are set to equal.
Why should measuring well-being be important to policy makers? Following research by the Cabinet Office and Office for National Statistics, the government is aiming to develop policy that is more focused on ‘all those things that make life worthwhile’ (David Cameron, November 2010). They are interested in developing new and better ways to understand how policy and public services affect well-being.
Why measure well-being for local areas? It is important for London policy makers to consider well-being at a local level (smaller than borough level) because of the often huge differences within boroughs. Local authorities rely on small area data in order to target resources, and with local authorities currently gaining more responsibilities from government, this is of increasing importance. But small area data is also of interest to academics, independent analysts and members of the public with an interest in the subject of well-being.
How can well-being be measured within small areas? The Office for National Statistics has been developing new measures of national well-being, and as part of this, at a national and regional level, the ONS has published some subjective data to measure happiness. The ONS has not measured well-being for small areas, so this tool has been designed to fill this gap. However, DCLG has published a tool that models life satisfaction data for LSOAs based on a combination of national level happiness data and 'ACORN' data. Happiness data is not available for small areas because there are no surveys large enough for this level of detail, and so at this geography the focus is on objective indicators. Data availability for small areas is far more limited than for districts, and this means the indicators that the scores are based on are not all perfect measures of well-being, though they are the best available. However, using a relatively high number of measures across a number of years increases the reliability of the well-being scores.
How can this tool be used to help policy makers? Each neighbourhood will have its own priorities, but the data in this tool could help provide a solid evidence base for informed local policy-making and the distribution of regeneration funds. In addition, it could assist users in identifying the causes behind an improvement in well-being in certain wards, where examples of good practice could be applied elsewhere.
Differences to the previous report: This is the 2013 edition of this publication, and there is one change from 2012. The indicator of election turnout has been replaced with a composite score of subjective well-being indicators. Past versions are still available for 2011 and 2012. The rationale/methodology paper from 2011 is here. The scores from the 2012 spreadsheet are also available in PDF format. The scores in Intelligence Update 21-2012 are based on equal weightings across each measure. This tool was created by the GLA Intelligence Unit. Please contact datastore@london.gov.uk for more information.
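As an illustration of the scoring mechanism described above (not the spreadsheet's actual formulas), here is a small Python sketch, assuming a table with one row per ward and one column per indicator, already oriented so that higher values mean better well-being:

import pandas as pd

def wellbeing_scores(indicators: pd.DataFrame, weights: dict) -> pd.Series:
    # Scale each indicator to mean 0 and standard deviation 10,
    # then combine the indicators using the user-chosen weights (0-10).
    scaled = indicators.apply(lambda col: (col - col.mean()) / col.std() * 10)
    weighted = sum(scaled[name] * weight for name, weight in weights.items())
    return weighted / sum(weights.values())

# Hypothetical ward-level indicators
wards = pd.DataFrame(
    {"life_expectancy": [79.2, 81.5, 83.0], "employment_rate": [62.0, 70.5, 74.0]},
    index=["Ward A", "Ward B", "Ward C"],
)
print(wellbeing_scores(wards, {"life_expectancy": 10, "employment_rate": 5}))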
The thermal properties of bimetallic Pt nanowires have been investigated using classical interatomic potentials. Edge-decorated Pt nanowires may improve the inter-facet exchange of reaction intermediates, resulting in improved oxygen reduction reaction activities at fuel cell cathode electrodes. In this work we report on the melting behaviour of Pt-based nanowires where either an edge atomic row or an atomic shell of Pt nanowires is substituted by Au, Ag or Pd. Our overall intention is to find out if edge-decorated Pt nanowires can be attained by thermally annealing mixed Pt-Au, Pt-Ag or Pt-Pd nanowires; we employ a reverse approach where edge-decorated nanowires are thermally treated to study their departure from a well segregated state. The data described here are Microsoft Excel sheets containing the data for bond order parameter as a function of temperature for all studied Pt nanowire systems; radial pair distribution data from molecular dynamics simulation; total energy as a function of the number of successful atomic swaps obtained from energy minimization calculations; and the radial pair distribution function data for energy-minimized nanowires. Our molecular dynamics simulations were performed using the Sutton-Chen interatomic potential as implemented in the DL-POLY Classic software.
https://creativecommons.org/publicdomain/zero/1.0/
When I first came across the competition a few weeks ago, I knew I just had to participate. I quickly started going through the data provided, and man oh man, was it large; most of us haven't dealt with this type of information before, at least if you are not already working in the field or a student like me.
I went over the EDA provided by Paul_Mooney and noticed a way of slicing the data-frame and fetching values from it, but I wanted a simpler solution that would be easily understood by many.
I reviewed all previous datasets from 2018 to 2021 and found there are common questions, plus a few added over the years. We will call these questions the "Look up Questions".
I manually made an Excel sheet, aka the "Look up Table", listing these questions row-by-row for all 5 years. Most importantly, I started adding their question tag (Q1, Q3, Q26_A, Q33_B, etc.) for every year.
Now what we have is:
A. Look up Table
B. Unique questions listed row-by-row
C. For every question, its column name for every year
https://imgur.com/fddPb94.jpg
Note: a blank space / empty field means that particular question was not asked in that specific year.
https://imgur.com/3BQLZUS.jpg
https://imgur.com/aQrumcx.jpg
A. Quick referencing: spend more time analyzing and less on fiddling
B. With a few custom functions (added below), a single line of code will get you any sort of data, filtered & categorized based on ANY other column
C. Works with previous years as well as future Kaggle survey analytics (given that the question format doesn't change; it didn't change for the past 5 years)
Here's a demo notebook -> https://www.kaggle.com/code/pranav941/kaggle-analytics-helper-functions-2017-2022
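The custom functions themselves live in the linked notebook; as a rough sketch of the idea only (the file name, sheet layout, and column names below are hypothetical, not the notebook's actual code), a look-up-table helper might look like this:

import pandas as pd

# Hypothetical layout: one row per unique question, one column per survey year
# holding that year's question tag (e.g. Q1, Q26_A), blank if not asked that year.
lookup = pd.read_excel("look_up_table.xlsx")  # columns: "Question" plus one column per year

def get_answers(responses: pd.DataFrame, question: str, year: int) -> pd.Series:
    # Return the answer column for a question in a given year's survey, if it was asked
    row = lookup.loc[lookup["Question"] == question]
    tag = row[year].iloc[0] if not row.empty else None
    if tag is None or pd.isna(tag):
        raise KeyError(f"{question!r} was not asked in {year}")
    return responses[tag]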
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary: Fuel demand is shown to be influenced by fuel prices, people's income and motorization rates. We explore the effects of electric vehicles' motorization rates on gasoline demand using this panel dataset.
Files: dataset.csv - Panel dimensions are the Brazilian state ( i ) and year ( t ). The other columns are: gasoline sales per capita (ln_Sg_pc), prices of gasoline (ln_Pg) and ethanol (ln_Pe) and their lags, motorization rates of combustion vehicles (ln_Mi_c) and electric vehicles (ln_Mi_e), and GDP per capita (ln_gdp_pc). All variables are under the natural log function, since we use this to calculate demand elasticities in a regression model.
adjacency.csv - The adjacency matrix used in interaction with electric vehicles' motorization rates to calculate spatial effects. At first, it follows a binary adjacency formula: for each pair of states i and j, the cell (i, j) is 0 if the states are not adjacent and 1 if they are. Then, each row is normalized to have sum equal to one.
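For readers reproducing the spatial term, the row normalization described above works as in this minimal Python sketch (the 3-state adjacency matrix is made up for illustration; it is not the actual Brazilian adjacency file):

import numpy as np

# Hypothetical binary adjacency for three states: 1 if adjacent, 0 otherwise
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)

# Row-normalize so each row sums to one (row-standardized spatial weights)
W = A / A.sum(axis=1, keepdims=True)
print(W)
# The spatial interaction term is then W @ ln_Mi_e: for each state, the
# adjacency-weighted average of its neighbours' EV motorization rates.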
regression.do - Series of Stata commands used to estimate the regression models of our study. dataset.csv must be imported to work, see comment section.
dataset_predictions.xlsx - Based on the estimations from Stata, we use this Excel file to make average predictions by year and by state. By including years beyond the last panel sample, we also forecast the model into the future and evaluate the effects of different policies that influence gasoline prices (taxation) and EV motorization rates (electrification). This file is primarily used to create images, but can be used to further understand how the forecasting scenarios are set up.
Sources: Fuel prices and sales: ANP (https://www.gov.br/anp/en/access-information/what-is-anp/what-is-anp) State population, GDP and vehicle fleet: IBGE (https://www.ibge.gov.br/en/home-eng.html?lang=en-GB) State EV fleet: Anfavea (https://anfavea.com.br/en/site/anuarios/)
This dataset was derived from multiple datasets. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived.
This dataset represents Surface Water Economic Assets in the Sydney PAE.
Created to spatially represent surface water economic assets in the Sydney PAE.
The database comprises 3 output spatial layers:
1) SW Entitlements point locations (SW_LicenceEntitlements_VolSydney) Key field is "VolPerWorks" (Volume per works)
2) Surface Water Sharing Plan volumes by Water Management zone or Water Source classified as Water Access Rights based on use (SW_WSP_WMZ_WS_WaterAccessRight)
3) Surface Water Sharing Plan volumes by Water Management zone or Water Source classified as Basic Water Rights based on use (SW_WSP_WMZ_WS_BasicWaterRight)
Steps to create SW Entitlements point locations (SW_LicenceEntitlements_VolSydney)
1) The ArcGIS Create XY event theme was used to spatially enable the NSW Office of Water Surface Water Offtakes Excel spreadsheet
2) This was then clipped by the PAE in ArcGIS
3) An additional SW offtake point and licence was added from a query performed by the NSW Office of Water when it was discovered that the new PAE slightly extended beyond the old PAE area. (see dataset in lineage)
Steps to create Surface Water Sharing plan volumes by Water Management zone or Water Source
1) In ArcGIS, an Intersect was performed on the NSW Office of Water combined geodatabase of regulated rivers and water sharing plan regions and the Sydney PAE to select only the relevant polygons
2) The NSW Office of Water provided a spreadsheet with volumes for water sharing plans which could be grouped by Water Management Zone or Water Source (if only one polygon).
3) Volumes were joined by WMZ field or Water Source if they did not have a value in WMZ field
Key volume field for Water Access Right = Share_WAR
Key volume field for Basic Water Right = Share_BWR
4) A table has been included in the database to show the asset classification (SWCountPurposeClassVolume)
Bioregional Assessment Programme (2015) SW Economic Elements Sydney Basin 20150730. Bioregional Assessment Derived Dataset. Viewed 18 June 2018, http://data.bioregionalassessments.gov.au/dataset/4240553d-702c-4fae-b64b-074d611f2a34.
Derived From NSW Office of Water SW Offtakes Processed - North & South Sydney, v2 07032014
Derived From NSW Office of Water Surface Water Entitlements Locations v1_Oct2013
Derived From NSW Office of Water SW Offtakes Processed - North & South Sydney, v3 12032014
Derived From Surface Water Entitlements in Sydney sliver between different PAEs NSW Office of Water 20150717
Derived From NSW Office of Water Surface Water Offtakes - North & South Sydney v1 24102013
Derived From NSW Office of Water combined geodatabase of regulated rivers and water sharing plan regions
Derived From NSW SW Share Components NSW Office of Water 20150717
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The 2014-15 Budget is officially available at budget.gov.au as the authoritative source of Budget Papers (BPs) and Portfolio Budget Statement (PBS) documents. This dataset is a collection of data sources from the 2014-15 Budget, including:
Data from the 2014-15 Budget are provided to assist those who wish to analyse, visualise and programmatically access the 2014-15 Budget. It is the first time this has been done as per our announcement blog post. We intend to move further down the digital by default route to make the 2015-16 Budget more accessible and reusable in data form. We welcome your feedback and comments below. Data users should refer to footnotes and memoranda in the original files as these are not usually captured in machine readable CSVs.
This dataset was prepared by the Department of Finance and the Department of the Treasury.
The PBS Excel files published should include the following financial tables with headings and footnotes. Only the line item data (table 2.2) is available in CSV at this stage as we thought this would be the most useful PBS data to extract. Much of the other data is also available in the Budget Papers 1 and 4 in aggregate form:
Please note, the total expenses reported in the CSV file ‘2014-15 PBS line items dataset’ were prepared from individual agency programme expense tables. Totalling these figures does not produce the total expense figure in ‘Table 1: estimates of general government expenses’ (Statement 6, Budget Paper 1). Differences relate to:
Intra agency charging for services which are eliminated for the reporting of general government financial statements;
Agency expenses that involve revaluation of assets and liabilities are reported as other economic flows in general government financial statements; and
Additional agencies’ expenses are included in general government sector expenses (e.g. Australian Strategic Policy Institute Limited and other entities) noting that only agencies that are directly government funded are required to prepare a PBS.
At this stage, the following Portfolios have contributed their PBS Excel files and are included in the line item CSV: 1.1 Agriculture Portfolio; 1.2 Attorney-General’s Portfolio; 1.3 Communications Portfolio; 1.4A Defence Portfolio; 1.4B Defence Portfolio (Department of Veterans’ Affairs); 1.5 Education Portfolio; 1.6 Employment Portfolio; 1.7 Environment Portfolio; 1.8 Finance Portfolio; 1.9 Foreign Affairs and Trade Portfolio; 1.10 Health Portfolio; 1.11 Immigration and Border Protection Portfolio; 1.12 Industry Portfolio; 1.13 Infrastructure and Regional Development Portfolio; 1.14 Prime Minister and Cabinet Portfolio; 1.15A Social Services Portfolio; 1.15B Social Services Portfolio (Department of Human Services); 1.16 Treasury Portfolio; 1.17A Department of the House of Representatives; 1.17B Department of the Senate; 1.17C Department of Parliamentary Services; and 1.17D Department of the Parliamentary Budget Office.
The original PBS Excel files and published documents include sub-totals and totals by agency and appropriation type which are not included in the line item CSV, as these can be calculated programmatically. Where modifications are identified they will be updated as required. If a corrigendum to an agency’s PBS is issued after budget night, tables will be updated as necessary.
Below is the CSV structure of the line item CSV. The data transformation is expected to be complete by midday 14 May, so we have put up the incomplete CSV which will be updated as additional PBSs are transformed into data form. Please keep refreshing for now.
Portfolio, Department/Agency, Outcome, Program, Expense type, Appropriation type, Description, 2012-13, 2013-14, 2014-15, 2015-16, 2016-17, Source document, Source table, URL
We have made a number of data tables from Budget Papers 1 and 4 available in their original format as Excel or XML files. We have transformed a number of these into machine readable format (as prioritised by several users of budget data) which will be published here as they are ready. Below is the list of the tables published and whether we’ve translated them into CSV form this year:
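Since the agency and appropriation-type totals are omitted from the line item CSV, they can be recomputed programmatically; here is a minimal pandas sketch (the file name is hypothetical, and the column names follow the CSV structure listed above):

import pandas as pd

# Hypothetical file name for the 2014-15 PBS line items dataset
items = pd.read_csv("2014-15-pbs-line-items.csv")

# Recompute sub-totals by portfolio, agency and appropriation type for the budget year
totals = (items.groupby(["Portfolio", "Department/Agency", "Appropriation type"])["2014-15"]
               .sum()
               .reset_index())
print(totals.head())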
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The 2015-16 Budget is officially available at budget.gov.au as the authoritative source of Budget Papers and Portfolio Budget Statement (PBS) documents. This dataset is a collection of data sources from the 2015-16 Budget, including:
Data from the 2015-16 Budget are provided to assist those who wish to analyse, visualise and programmatically access the 2015-16 Budget.
Data users should refer to footnotes and memoranda in the original files as these are not usually captured in machine readable CSVs.
We welcome your feedback and comments below.
This dataset was prepared by the Department of Finance and the Department of the Treasury.
The PBS Excel files published should include the following financial tables with headings and footnotes. Only the line item data (table 2.2) is available in CSV at this stage. Much of the other data is also available in the Budget Papers 1 and 4 in aggregate form:
Please note, the total expenses reported in the CSV file ‘2015-16 PBS line items dataset’ were prepared from individual entity programme expense tables. Totalling these figures does not produce the total expense figure in ‘Table 1: Estimates of General Government Expenses’ (Statement 6, Budget Paper 1).
Differences relate to:
The original PBS Excel files and published documents include sub-totals and totals by entity and appropriation type which are not included in the line item CSV. These can be calculated programmatically. Where modifications are identified they will be updated as required.
If a corrigendum to an entity’s PBS is issued after budget night, tables will be updated as necessary.
The structure of the line item CSV is:
The data transformation is expected to be complete by midday 13 May. We may put up an incomplete CSV which will continue to be updated as additional PBSs are transformed into data form.
The following Portfolios are included in the line item CSV:
We have made a number of data tables from the Budget Papers available in Excel and CSV formats.
Below is the list of the tables published and whether we’ve translated them into CSV form this year: