U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This is a dataset to be used to explain pivot tables, as part of a School of Data course.
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
Using the Europe bikes dataset, extract insights into sales in each country and in each state within those countries using Excel.
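For readers working outside Excel, here is a minimal pandas sketch of the same pivot; the file name and column names ("Country", "State", "Revenue") are assumptions about this dataset's layout, so check the actual headers first:

```python
# Minimal pandas sketch of the Excel pivot described above.
# "europe_bikes.csv" and the column names are assumptions, not
# confirmed details of this dataset.
import pandas as pd

bikes = pd.read_csv("europe_bikes.csv")

sales_by_region = pd.pivot_table(
    bikes,
    index=["Country", "State"],  # rows: country, then state within it
    values="Revenue",            # measure to aggregate
    aggfunc="sum",               # total sales per country/state
)
print(sales_by_region)
```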
The complete data set of annual utilization data reported by hospitals contains basic licensing information, including bed classifications; patient demographics, including occupancy rates, the number of discharges and patient days by bed classification, and the number of live births; and information on the types of services provided, including the number of surgical operating rooms, the number of surgeries performed (both inpatient and outpatient), the number of cardiovascular procedures performed, and licensed emergency medical services provided.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Single-Family Portfolio Snapshot consists of a monthly data table and a report generator (an Excel pivot table) that can be used to quickly create new reports of interest from the data records. The data records themselves are loan-level records using all of the categorical variables highlighted on the report generator table. Users may download and save the Excel file that contains the data records and the pivot table.

The report generator sheet consists of an Excel pivot table that gives individual users some ability to analyze monthly trends along dimensions of interest to them. There are six choice dimensions: property state, property county, loan purpose, loan type, property product type, and downpayment source. Each report generator selection variable has an associated drop-down menu that is accessed by clicking once on the associated arrows. Only single selections can be made from each menu; for example, users must choose one state or all states, one county or all counties. If a county is chosen that does not correspond with the selected state, the result will be null values.

The data records include each report generator choice variable plus the property zip code, originating mortgagee (lender) number, sponsor-lender name, sponsor number, nonprofit gift provider tax identification number, interest rate, and FHA insurance endorsement year and month. The report generator only provides output for the dollar amount of loans. Users who wish to analyze other data available on the data table, for example interest rates or sponsor number, must first download the Excel file. See the data definitions (PDF in top folder) for details on each data element. Files switch from .zip to Excel format in August 2017.
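As a rough illustration of what one report-generator selection does, here is a hedged pandas sketch; the file, sheet, and column names are assumptions, not the workbook's confirmed layout:

```python
# Hedged sketch of a single report-generator selection: one state,
# one county, total dollar amount of loans. All names are hypothetical.
import pandas as pd

loans = pd.read_excel("fha_snapshot.xlsx", sheet_name="Data")

subset = loans[
    (loans["property_state"] == "TX")
    & (loans["property_county"] == "TRAVIS")
]
# The pivot's single output measure is the dollar amount of loans;
# a county that does not belong to the chosen state yields an empty
# subset, matching the null values the description mentions.
print(subset["loan_amount"].sum())
```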
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This spreadsheet contains a number of sheets. Three sheets contain the main datasets from each of the first three RAAAP surveys. In addition, there is a combined sheet containing data from all three surveys (where the semantics are the same), with an additional field indicating which survey each record is from; this sheet has fewer columns, as it only includes the shared variables. There is also a sheet for each survey listing its variables, one showing the mappings between surveys, and one showing the common variables. Finally, there is an example pivot table to show how the data can be easily visualised. This spreadsheet was developed for the RAAAP workshop delivered at the 2023 INORMS Conference in May 2023 in Durban, South Africa. It contains all of the common data from the first three RAAAP surveys, presented on separate sheets.
Open Government Licence: http://reference.data.gov.uk/id/open-government-licence
Information on accidents across Leeds. Data includes location, number of people and vehicles involved, road surface, weather conditions and severity of any casualties.
Due to the format of the report, a number of figures in the columns are repeated. These are:
Reference Number | Grid Ref: Easting | Grid Ref: Northing | Number of vehicles | Accident Date | Time (24hr) |
---|---|---|---|---|---|
21G0539 | 427798 | 426248 | 5 | 16/01/2015 | 1205 |
21G0539 | 427798 | 426248 | 5 | 16/01/2015 | 1205 |
21G1108 | 431142 | 430087 | 1 | 16/01/2015 | 1732 |
21H0565 | 434602 | 436699 | 1 | 17/01/2015 | 930 |
21H0638 | 434254 | 434318 | 2 | 17/01/2015 | 1315 |
21H0638 | 434254 | 434318 | 2 | 17/01/2015 | 1315 |
Therefore the number of vehicles involved in accident 21G0539 was 5, and in accident 21H0638 it was 2. Overall, in the example above, a total of 9 vehicles were involved in accidents.
A useful tool for analysing the data is an Excel pivot table; pivot tables help summarise large amounts of data in an easy-to-view table.
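To reproduce the arithmetic above outside Excel, a minimal pandas sketch (assuming the CSV headers match the table above; the file name is hypothetical) de-duplicates on the reference number before summing:

```python
# Vehicle counts repeat once per casualty row, so keep one row per
# accident before summing. File name is hypothetical.
import pandas as pd

accidents = pd.read_csv("leeds_accidents.csv")

total_vehicles = (
    accidents
    .drop_duplicates(subset="Reference Number")  # one row per accident
    ["Number of vehicles"]
    .sum()
)
print(total_vehicles)  # 9 for the six example rows above
```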
A random sample of households was invited to participate in this survey. In the dataset, you will find the respondent level data in each row with the questions in each column. The numbers represent a scale option from the survey, such as 1=Excellent, 2=Good, 3=Fair, 4=Poor. The question stem, response option, and scale information for each field can be found in the "variable labels" and "value labels" sheets. VERY IMPORTANT NOTE: The scientific survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more "weight" and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to the City of Bloomington. The easiest way to replicate these results is likely to create pivot tables and use the sum of the "wt" field rather than a count of responses.
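Here is a minimal pandas sketch of that weighting advice; the question column name ("q1") and file name are hypothetical, only the "wt" field is named by the source:

```python
# Weighted tabulation: sum the supplied "wt" field rather than
# counting rows. "q1" and the file name are hypothetical.
import pandas as pd

survey = pd.read_excel("bloomington_survey.xlsx")

weighted = pd.pivot_table(
    survey,
    index="q1",     # one question's response codes (1=Excellent, ...)
    values="wt",    # the survey weight supplied with each record
    aggfunc="sum",  # weighted frequency, not a raw count
)
weighted["pct"] = 100 * weighted["wt"] / weighted["wt"].sum()
print(weighted)
```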
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Information on accident casualties across Calderdale. Data includes location, number of people and vehicles involved, road surface, weather conditions and severity of any casualties.
Due to the format of the report, a number of figures in the columns are repeated. These are:
Reference Number | Grid Ref: Easting | Grid Ref: Northing | Number of vehicles | Accident Date | Time (24hr) |
---|---|---|---|---|---|
21G0539 | 427798 | 426248 | 5 | 16/01/2015 | 1205 |
21G0539 | 427798 | 426248 | 5 | 16/01/2015 | 1205 |
21G1108 | 431142 | 430087 | 1 | 16/01/2015 | 1732 |
21H0565 | 434602 | 436699 | 1 | 17/01/2015 | 930 |
21H0638 | 434254 | 434318 | 2 | 17/01/2015 | 1315 |
21H0638 | 434254 | 434318 | 2 | 17/01/2015 | 1315 |
Therefore the number of vehicles involved in accident 21G0539 was 5, and in accident 21H0638 it was 2. Overall, in the example above, a total of 9 vehicles were involved in accidents.
A useful tool for analysing the data is an Excel pivot table; pivot tables help summarise large amounts of data in an easy-to-view table.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
A high-quality, clean dataset simulating global cosmetics and skincare product sales between January and August 2022. This dataset mirrors real-world transactional data, making it perfect for data analysis, Excel training, visualization projects, and machine learning prototypes.
Column Name | Description |
---|---|
Sales Person | Name of the salesperson responsible for the sale |
Country | Country or region where the sale occurred |
Product | Cosmetic or skincare product sold |
Date | Date of the transaction (format: YYYY-MM-DD) |
Amount ($) | Total revenue generated from the sale (USD) |
Boxes Shipped | Number of product boxes shipped in the order |
The dataset is also well suited to practising Excel formulas (VLOOKUP, IF, AVERAGEIFS, INDEX-MATCH, etc.).

Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:
1- Data Import and Transformation:
2- Data Quality Assessment:
3- Calculating COGS:
4- Discount Analysis:
5- Sales Metrics:
6- Visualization:
7- Report Generation:
Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
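As a rough companion to steps 3 to 5 above, here is a hedged pandas sketch; the column names follow the common Superstore sample layout (Sales, Profit, Discount, Category) but are assumptions about this particular workbook:

```python
# Hedged sketch of steps 3-5 (COGS, discount value, sales metrics).
# File, sheet, and column names are assumptions.
import pandas as pd

orders = pd.read_excel("superstore.xlsx", sheet_name="Orders")

# Step 3: with revenue and profit available, COGS can be approximated
# as Sales minus Profit.
orders["COGS"] = orders["Sales"] - orders["Profit"]

# Step 4: if "Discount" is a rate applied to list price, the dollar
# value of the discount is list price times the rate.
list_price = orders["Sales"] / (1 - orders["Discount"])
orders["Discount Value"] = list_price * orders["Discount"]

# Step 5: headline metrics by category.
print(orders.groupby("Category")[["Sales", "COGS", "Profit"]].sum())
```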
The Tate Collection
Here we present the metadata for around 70,000 artworks that Tate owns or jointly owns with the National Galleries of Scotland as part of ARTIST ROOMS. Metadata for around 3,500 associated artists is also included.
The metadata here is released under the Creative Commons Public Domain CC0 licence. Please see the enclosed LICENCE file for more detail.
Images are not included and are not part of the dataset. Use of Tate images is covered on the Copyright and permissions page. You may also license images for commercial use.
Please review the full usage guidelines.
Repository Contents
We offer two data formats:
A richer dataset is provided in the JSON format, which is organised by the directory structure of the Git repository. JSON supports more hierarchical or nested information such as subjects.
We also provide CSVs of flattened data, which is less comprehensive but perhaps easier to grok. The CSVs provide a good introduction to overall contents of the Tate metadata and create opportunities for artistic pivot tables.
JSON
Artists
Each artist has his or her own JSON file. They are found in the artists folder, then filed away by first letter of the artist’s surname.
Artworks
Artworks are found in the artworks folder. They are filed away by accession number. This is the unique identifier given to artworks when they come into the Tate collection. In many cases, the format has significance. For example, the ar accession number prefix indicates that the artwork is part of the ARTIST ROOMS collection. The n prefix indicates works that were once part of the National Gallery collection.
CSV
There is one CSV file for artists (artist_data.csv) and one (very large) for artworks (artwork_data.csv), which we may one day break up into more manageable chunks. The CSV headings should be helpful. Let us know if not. Entrepreneurial hackers could use the CSVs as an index to the JSON collections if they wanted richer data.
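As one example of those "artistic pivot tables", here is a hedged pandas sketch against artwork_data.csv; the column names ("artist", "acquisitionYear", "id") are assumptions, so check the CSV headings first:

```python
# Hedged sketch: acquisitions per artist per decade from the flattened
# CSV. Column names are assumptions about artwork_data.csv.
import pandas as pd

artworks = pd.read_csv("artwork_data.csv", low_memory=False)

artworks["decade"] = (artworks["acquisitionYear"] // 10) * 10
per_decade = pd.pivot_table(
    artworks,
    index="artist",
    columns="decade",
    values="id",       # assumed unique identifier column
    aggfunc="count",
    fill_value=0,
)
print(per_decade.sum(axis=1).nlargest(10))  # ten most-collected artists
```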
Usage guidelines for open data
These usage guidelines are based on goodwill. They are not a legal contract but Tate requests that you follow these guidelines if you use Metadata from our Collection dataset.
The Metadata published by Tate is available free of restrictions under the Creative Commons Zero Public Domain Dedication.
This means that you can use it for any purpose without having to give attribution. However, Tate requests that you actively acknowledge and give attribution to Tate wherever possible. Attribution supports future efforts to release other data. It also reduces the amount of ‘orphaned data’, helping retain links to authoritative sources.
Give attribution to Tate
Make sure that others are aware of the rights status of Tate and are aware of these guidelines by keeping intact links to the Creative Commons Zero Public Domain Dedication.
If for technical or other reasons you cannot include all the links to all sources of the Metadata and rights information directly with the Metadata, you should consider including them separately, for example in a separate document that is distributed with the Metadata or dataset.
If for technical or other reasons you cannot include all the links to all sources of the Metadata and rights information, you may consider linking only to the Metadata source on Tate’s website, where all available sources and rights information can be found, including in machine readable formats.
Metadata is dynamic
When working with Metadata obtained from Tate, please be aware that this Metadata is not static. It sometimes changes daily. Tate continuously updates its Metadata in order to correct mistakes and include new and additional information. Museum collections are under constant study and research, and new information is frequently added to objects in the collection.
Mention your modifications of the Metadata and contribute your modified Metadata back
Whenever you transform, translate or otherwise modify the Metadata, make it clear that the resulting Metadata has been modified by you. If you enrich or otherwise modify Metadata, consider publishing the derived Metadata without reuse restrictions, preferably via the Creative Commons Zero Public Domain Dedication.
Be responsible
Ensure that you do not use the Metadata in a way that suggests any official status or that Tate endorses you or your use of the Metadata, unless you have prior permission to do so.
Ensure that you do not mislead others or misrepresent the Metadata or its sources.
Ensure that your use of the Metadata does not breach any national legislation based thereon, notably concerning (but not limited to) data protection, defamation or copyright. Please note that you use the Metadata at your own risk. Tate offers the Metadata as-is and makes no representations or warranties of any kind concerning any Metadata published by Tate.
The writers of these guidelines are deeply indebted to the Smithsonian Cooper-Hewitt, National Design Museum; and Europeana.
The City of Bloomington contracted with National Research Center, Inc. to conduct the 2019 Bloomington Community Survey. This was the second time a scientific citywide survey had been completed covering resident opinions on service delivery satisfaction by the City of Bloomington and quality of life issues. The first was in 2017. The survey captured the responses of 610 households from a representative sample of 3,000 residents of Bloomington who were randomly selected to complete the survey. VERY IMPORTANT NOTE: The scientific survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more "weight" and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to the City of Bloomington. The easiest way to replicate these results is likely to create pivot tables, and use the sum of the "wt" field rather than a count of responses.
NHS Digital terms and conditions: https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Warning: Large file size (over 1GB). Each monthly data set is large (over 4 million rows), but can be viewed in standard software such as Microsoft WordPad (save by right-clicking on the file name and selecting 'Save Target As', or equivalent on Mac OS X). It is then possible to select the required rows of data and copy and paste the information into another software application, such as a spreadsheet. Alternatively, add-ons to existing software that handle larger data sets, such as the Microsoft PowerPivot add-on for Excel, can be used. The Microsoft PowerPivot add-on for Excel is available from Microsoft: http://office.microsoft.com/en-gb/excel/download-power-pivot-HA101959985.aspx

Once PowerPivot has been installed, follow the instructions below to load the large files. Note that it may take at least 20 to 30 minutes to load one monthly file.

1. Start Excel as normal
2. Click on the PowerPivot tab
3. Click on the PowerPivot Window icon (top left)
4. In the PowerPivot Window, click on the "From Other Sources" icon
5. In the Table Import Wizard, scroll to the bottom and select Text File
6. Browse to the file you want to open and choose the file extension you require, e.g. CSV

Once the data has been imported you can view it in a spreadsheet.

What does the data cover? General practice prescribing data is a list of all medicines, dressings and appliances that are prescribed and dispensed each month. A record will only be produced when this has occurred; there is no record for a zero total. For each practice in England, the following information is presented at presentation level for each medicine, dressing and appliance (by presentation name):

- the total number of items prescribed and dispensed
- the total net ingredient cost
- the total actual cost
- the total quantity

The data covers NHS prescriptions written in England and dispensed in the community in the UK. Prescriptions written in England but dispensed outside England are included. The data includes prescriptions written by GPs and other non-medical prescribers (such as nurses and pharmacists) who are attached to GP practices. GP practices are identified only by their national code, so an additional data file, linked to the first by the practice code, provides further detail about the practice. Presentations are identified only by their BNF code, so an additional data file, linked to the first by the BNF code, provides the chemical name for that presentation.
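If PowerPivot is not an option, a chunked read in pandas is one alternative for the multi-million-row monthly files; the file name and column names ("PRACTICE", "ITEMS") are assumptions, so check the data definitions first:

```python
# Hedged sketch: aggregate a >4-million-row monthly file in chunks
# so it never has to fit in memory at once. Names are assumptions.
import pandas as pd

totals = None
for chunk in pd.read_csv("prescribing_month.csv", chunksize=500_000):
    part = chunk.groupby("PRACTICE")["ITEMS"].sum()
    totals = part if totals is None else totals.add(part, fill_value=0)

print(totals.sort_values(ascending=False).head(10))  # busiest practices
```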
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A sample of DataCite records that included "Project" or "project" at the end of their resourceTypes was selected as "DataCite Projects". The metadata was read, and values for a number of FAIR elements and for related identifiers were extracted. This spreadsheet has those data in two tabs: data_20240618 has the FAIR concepts and relatedIdentifier_20240716 has the related identifiers. There are also two pivot tables included in the sheet.
These data were the subject of a blog post published during July 2024 at metadatagamechangers.com/blog.
Responses from the 2021 open participation (non-probability) survey.
In the dataset, you will find the respondent level data in each row with the questions in each column. The numbers represent a scale option from the survey, such as 1=Excellent, 2=Good, 3=Fair, 4=Poor. The question stem, response option, and scale information for each field can be found in the "variable labels" and "value labels" sheets.
VERY IMPORTANT NOTE: The open participation survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more "weight" and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to the City of Bloomington. The easiest way to replicate these results is likely to create pivot tables and use the sum of the "wt" field rather than a count of responses.
Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:
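The example itself is missing from the source text; the sketch below only illustrates the kind of call described, and the endpoint URL and table name are assumptions rather than confirmed API details:

```python
# Hypothetical sketch of SQL-over-HTTP against Splitgraph. The endpoint
# and the "namespace/repo"."table" name are assumptions.
import requests

resp = requests.post(
    "https://data.splitgraph.com/sql/query/ddn",  # assumed endpoint
    json={"sql": 'SELECT * FROM "namespace/repo"."table" LIMIT 10'},
    timeout=30,
)
print(resp.json())
```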
See the Splitgraph documentation for more information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BACKGROUND: The Health Insurance Institute of Slovenia (ZZZS) began publishing service-related data in May 2023, following a directive from the Ministry of Health (MoH). The ZZZS website provides easily accessible information about the services provided by individual doctors, including their names. The user is provided relevant information about the doctor's employer, including whether it is a public or private institution. The data provided is useful for studying the public system's operations and identifying any errors or anomalies.
METHODS: The data for services provided in May 2023 were downloaded and analysed. The published data were cross-referenced, using the provider's RIZDDZ number, with the daily updated data on ambulatory workload published by ZZZS on June 9, 2023. Those data were found to be inaccurate and were corrected using alerts from the zdravniki.sledilnik.org portal, so they now provide an accurate representation of the situation. The total number of services provided by each provider in a given month was determined by adding up the individual services and assigning them to the corresponding provider.
RESULTS: A pivot table was created to identify 307 unique operators, with 15 operators not appearing in both lists. There are 66 public providers, which make up about 72% of the contractual programme in the public system. There are 241 private providers, accounting for about 28% of the contractual programme. In May 2023, public providers accounted for 69% (n=646,236) of services in the family medicine system, while private providers contributed 31% (n=291,660). The total number of services provided by public and private providers was 937,896. Three linear correlations were analysed. The initial analysis of the entire sample yielded a high R-squared value of .998 (adjusted R-squared value of .996) and a significant level below 0.001. The second analysis of the data from private providers showed a high R Squared value of .904 (Adjusted R Squared = .886), indicating a strong correlation between the variables. Furthermore, the significance level was < 0.001, providing additional support for the statistical significance of the results. The third analysis used data from public providers and showed a strong level of explanatory power, with a R Squared value of 1.000 (Adjusted R Squared = 1.000). Furthermore, the statistical significance of the findings was established with a p-value < 0.001.
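For readers who want to reproduce this kind of check, a minimal sketch follows; the input values are labelled placeholders, not the study's data:

```python
# Sketch of the correlation check described above: regress services
# rendered on contract size and read off R-squared. The values below
# are made-up placeholders, NOT the study's data.
from scipy.stats import linregress

contract_share = [1.2, 0.8, 2.0, 1.5, 0.5]  # placeholder programme sizes
services = [6100, 4000, 10100, 7600, 2500]  # placeholder service counts

fit = linregress(contract_share, services)
print(fit.rvalue ** 2)  # R-squared, as quoted in the results
```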
CONCLUSION: Our analysis shows a strong linear correlation between the contract size of the program signed and the number of services rendered by family medicine providers. A stronger linear correlation is observed among providers in the public system compared to those in the private system. Our study found that private providers generally offer more services than public providers. However, it is important to acknowledge that the evaluation framework for assessing services may have inherent flaws when examining the data: prescribing a prescription and resuscitating a patient are both assigned a rating of one service. It is crucial to closely monitor trends and identify comparable databases for pairing at the secondary and tertiary levels.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The National Pollutant Release Inventory (NPRI) is Canada's public inventory of pollutant releases (to air, water and land), disposals and transfers for recycling. Each file contains data from 1993 to the latest reporting year. These CSV format datasets are in normalized or 'list' format and are optimized for pivot table analyses. Here is a description of each file:

- The RELEASES file contains all substance release quantities.
- The DISPOSALS file contains all on-site and off-site disposal quantities, including tailings and waste rock (TWR).
- The TRANSFERS file contains all quantities transferred for recycling or treatment prior to disposal.
- The COMMENTS file contains all the comments provided by facilities about substances included in their report.
- The GEO LOCATIONS file contains complete geographic information for all facilities that have reported to the NPRI.

Please consult the following resources to enhance your analysis:

- Guide on Using and Interpreting NPRI Data: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/using-interpreting-data.html
- Access additional data from the NPRI, including datasets and mapping products: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/tools-resources-data/exploredata.html

Supplemental Information: More NPRI datasets and mapping products are available here: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/tools-resources-data/access.html

Supporting Projects: National Pollutant Release Inventory (NPRI)
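Since the files are normalized for pivot analyses, a hedged pandas sketch of one such pivot follows; the file name and column names are assumptions (NPRI headers are bilingual), so check the actual CSV first:

```python
# Hedged sketch: total releases by substance and year from the
# RELEASES file. All names below are assumptions about the CSV.
import pandas as pd

releases = pd.read_csv("NPRI-Releases.csv", encoding="latin-1")

by_year = pd.pivot_table(
    releases,
    index="Substance Name (English)",  # assumed column name
    columns="Reporting Year",          # assumed column name
    values="Quantity",                 # assumed column name
    aggfunc="sum",
)
print(by_year.head())
```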
Stream Temperature: Site: Gwynns Falls at Gwynnbrook (GFGB):
In the Baltimore urban long-term ecological research (LTER) project (Baltimore Ecosystem Study, BES), we use the watershed approach to evaluate integrated ecosystem function. The LTER research is centered on the Gwynns Falls watershed, a 17,150 ha catchment that traverses a gradient from the urban core of Baltimore, through older urban residential (1900-1950) and suburban (1950-1980) zones, to rapidly suburbanizing areas and a rural/suburban fringe.
Stream temperature is continuously measured throughout the Gwynns Falls watershed along with supplemental sites around Baltimore County/City. A total of 22 sites contain sensors (HOBO Pro v2 Water Temperature Data Logger - U22-001) that take an instantaneous temperature reading every 2 minutes. These data are downloaded on a monthly basis.
This dataset is for the site at Gwynnbrook/Delight. This site samples drainage from approximately 1,000 ha of old and new suburban and suburbanizing land use.
A detailed description of this site is posted at: http://md.water.usgs.gov/BES/01589197/.
Streamflow data for this site are posted at: http://waterdata.usgs.gov/md/nwis/nwisman?site_no=01589197
Purpose: Long-term monitoring of stream temperature in a suburban catchment.
Theme keywords: stream, watershed, temperature, suburban, Baltimore Ecosystem Study
Coordinates: Lat/Long
39.4430 (39 26 35), -76.7834 (-76 47 00)
Review process for BES stream temperature data:
Raw data were recorded and logged every 2 minutes using the HOBO Pro v2 Water Temperature Data Logger (U22-001).
Data are exported into Microsoft Excel documents, then organized by site and by month.
Each month's data were entered into a pivot table in Microsoft Excel, and daily means and counts of daily data points were calculated (see the sketch after this list).
Sites in close geographic proximity were plotted on the same graph to illustrate possible outlier data.
Missing and odd data were flagged, and notes taken from the field visits are provided where applicable.
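A minimal pandas sketch of the daily-mean step above, assuming hypothetical column names ("datetime", "temp_C") and file name in the logger export:

```python
# Hedged sketch of the pivot-table step: daily means plus counts of
# 2-minute readings per day. File and column names are assumptions.
import pandas as pd

temps = pd.read_csv("GFGB_2015_01.csv", parse_dates=["datetime"])

daily = temps.set_index("datetime")["temp_C"].resample("D").agg(["mean", "count"])
# A complete day of 2-minute readings has 720 points; lower counts
# flag gaps for the review described above.
print(daily.head())
```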
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 2: Tabular data. Pivot table of sample metadata. Number of S. Napoli isolates per source of isolation, isolation year, country of isolation and ST, respectively.
Stream Temperature: Site: Gwynns Falls at Villa Nova (GFVN):
In the Baltimore urban long-term ecological research (LTER) project (Baltimore Ecosystem Study, BES), we use the watershed approach to evaluate integrated ecosystem function. The LTER research is centered on the Gwynns Falls watershed, a 17,150 ha catchment that traverses a gradient from the urban core of Baltimore, through older urban residential (1900-1950) and suburban (1950-1980) zones, to rapidly suburbanizing areas and a rural/suburban fringe.
Stream temperature is continuously measured throughout the Gwynns Falls watershed along with supplemental sites around Baltimore County/City. A total of 22 sites contain sensors (HOBO Pro v2 Water Temperature Data Logger - U22-001) that take an instantaneous temperature reading every 2 minutes. These data are downloaded on a monthly basis.
This dataset is for the Gwynns Falls at Villa Nova. This site samples drainage from approximately 7,400 ha of old and new suburban and suburbanizing land use. Streamflow at this station has been monitored continuously by the USGS since 1957 (with a hiatus from 1988 to 1995). This station is the boundary between the urban and suburban portions of the Gwynns Falls.
A detailed description of this site is posted at: http://md.water.usgs.gov/BES/01589300/.
Streamflow data for this site are posted at: http://waterdata.usgs.gov/md/nwis/nwisman?site_no=01589300
Purpose: Long-term monitoring of stream temperature in a watershed.
Theme keywords: stream, watershed, temperature, Baltimore Ecosystem Study
Coordinates: Lat/Long
39.3459 (39 20 45), -76.7333 (-76 43 60)
Review process for BES stream temperature data:
Raw data were recorded and logged every 2 minutes using the HOBO Pro v2 Water Temperature Data Logger (U22-001).
Data are exported into Microsoft Excel documents, then organized by site and by month.
Each month's data were entered into a pivot table in Microsoft Excel and daily means and counts of daily data points were calculated.
Sites in close geographic proximity were plotted on the same graph to illustrate possible outlier data.
Missing and odd data were flagged, and notes taken from the field visits are provided where applicable.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This is a dataset to be used to explain pivot tables, as part of a School of Data course.