CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
Ahoy, data enthusiasts! Join us for a hands-on workshop where you will hoist your sails and navigate through the Statistics Canada website, uncovering hidden treasures in the form of data tables. With the wind at your back, you’ll master the art of downloading these invaluable Stats Can datasets while braving the occasional squall of data cleaning challenges using Excel with your trusty captains Vivek and Lucia at the helm.
https://creativecommons.org/publicdomain/zero/1.0/
PROJECT OBJECTIVE
We are part of XYZ Co Pvt Ltd, a company that organizes sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility of systematizing the membership roster and generating different reports as per business requirements.
Questions (KPIs)
TASK 1: STANDARDIZING THE DATASET
TASK 2: DATA FORMATTING
TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)
• Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details:
TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)
• Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:
TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)
• Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:
Process
The main objectives of the survey were:
- To obtain weights for the revision of the Consumer Price Index (CPI) for Funafuti;
- To provide information on the nature and distribution of household income, expenditure and food consumption patterns;
- To provide data on the household sector's contribution to the National Accounts;
- To provide information on the economic activity of men and women to study gender issues;
- To undertake some poverty analysis.
National, including Funafuti and Outer islands
All private households are included in the sampling frame. In each selected household, the current residents are surveyed, as are usual residents who are currently away (for work, health or holiday reasons, or as boarding students, for example). If the household had been residing in Tuvalu for less than one year:
- but intends to reside for more than 12 months => the household is included
- and does not intend to reside for more than 12 months => out of scope
Sample survey data [ssd]
It was decided that a 33% (one third) sample was sufficient to achieve suitable levels of accuracy for key estimates in the survey. The sample selection was therefore spread proportionally across all the islands except Niulakita, which was considered too small. For selection purposes, each island was treated as a separate stratum and independent samples were selected from each. The strategy used was to list each dwelling on the island by its geographical position and run a systematic skip through the list to achieve the 33% sample. This approach ensured that the sample would be spread out across each island as much as possible and would thus be more representative.
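The systematic skip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the random start, and the dwelling labels are hypothetical and are not taken from the survey report, which does not say how the starting point was chosen.

```python
import random


def systematic_sample(dwellings, fraction=1 / 3, seed=None):
    """Take every k-th dwelling from an ordered list, with k = round(1 / fraction).

    `dwellings` is assumed to be sorted by geographical position already,
    so that the selected units are spread across the island.
    """
    if not dwellings:
        return []
    skip = max(1, round(1 / fraction))
    rng = random.Random(seed)
    start = rng.randrange(skip)  # assumed: random start inside the first interval
    return dwellings[start::skip]


# Hypothetical stratum: one island with 30 listed dwellings.
island = [f"dwelling-{i:03d}" for i in range(1, 31)]
sample = systematic_sample(island, fraction=1 / 3, seed=0)
print(len(sample), "of", len(island), "dwellings selected")
```

Running the function once per island, with each island's own ordered list, mirrors the treatment of every island as a separate stratum.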
For details please refer to Table 1.1 of the Report.
Only the island of Niulakita was excluded from the sampling frame, as it was considered too small.
Face-to-face [f2f]
Three main survey forms were used to collect data for the survey. Each question was written in English and translated into Tuvaluan on the same version of the questionnaire. The questionnaires were designed based on the 2004 survey questionnaire.
HOUSEHOLD FORM
- composition of the household and demographic profile of each member
- dwelling information
- dwelling expenditure
- transport expenditure
- education expenditure
- health expenditure
- land and property expenditure
- household furnishing
- home appliances
- cultural and social payments
- holidays/travel costs
- loans and saving
- clothing
- other major expenditure items
INDIVIDUAL FORM
- health and education
- labor force (individuals aged 15 and above)
- employment activity and income (individuals aged 15 and above): wages and salaries, working own business, agriculture and livestock, fishing, income from handicraft, income from gambling, small-scale activities, jobs in the last 12 months, other income, children's income, tobacco and alcohol use, other activities, and seafarer
DIARY (one diary per week over a 2-week period; 2 diaries per household were required)
- All kinds of expenses
- Home production
- Food and drink (eaten by the household, given away, sold)
- Goods taken from own business (consumed, given away)
- Monetary gifts (given away, received, winnings from gambling)
- Non-monetary gifts (given away, received, winnings from gambling)
Questionnaire Design Flaws Questionnaire design flaws address any problems with the way questions were worded which will result in an incorrect answer provided by the respondent. Despite every effort to minimize this problem during the design of the respective survey questionnaires and the diaries, problems were still identified during the analysis of the data. Some examples are provided below:
Gifts, Remittances & Donations Collecting information on the following is a very difficult task in a HIES: - the receipt and provision of gifts - the receipt and provision of remittances - the provision of donations to the church, other communities and family occasions. The extent of these activities in Tuvalu is very high, so every effort should be made to address these activities as best as possible. A key problem lies in identifying the best form (questionnaire or diary) for covering such activities. A general rule of thumb for a HIES is that if the activity occurs on a regular basis, and involves the exchange of small monetary amounts or in-kind gifts, the diary is more appropriate. On the other hand, if the activity is less frequent, and involves larger sums of money, the questionnaire with a recall approach is preferred. It is not always easy to distinguish between the two for the different activities, and as such, both the diary and questionnaire were used to collect this information. Unfortunately, it probably wasn't made clear enough what types of transactions were being collected from the different sources, and as such some transactions might have been missed, and others counted twice. The effects of these problems are hopefully minimal overall.
Defining Remittances Because people have different interpretations of what constitutes remittances, the questionnaire needs to be very clear as to how this concept is defined in the survey. Unfortunately, this wasn't explained clearly enough, so it was difficult to distinguish between a remittance, which should be of a more regular nature, and a one-off monetary gift transferred between two households.
Business Expenses Still Recorded The aim of the survey is to measure "household" expenditure, and as such, any expenditure made by a household for an item or service which was primarily used for a business activity should be excluded. It was not always clear in the questionnaire that this was the case, and as such some business expenses were included. Efforts were made during data cleaning to remove any such business expenses which would impact significantly on survey results.
Purchased goods given away as a gift When a household makes a gift donation of an item it has purchased, this is recorded in section 5 of the diary. Unfortunately, it was difficult to know how to treat these items, as it was not clear whether the item had already been recorded in section 1 of the diary, which covers purchases. The decision was made to exclude all information on gifts given which were considered to be purchases, as these items were assumed to have already been recorded in section 1. Ideally these items should be treated as a purchased gift given away, which in turn is not household consumption expenditure, but this was not possible.
Some key items missed in the Questionnaire Although not a big issue, some key expenditure items were omitted from the questionnaire when it would have been best to collect them via this schedule. A key example being electric fans which many households in Tuvalu own.
Consistency of the data:
- each questionnaire was checked by the supervisor during and after the collection
- before data entry, all the questionnaires were coded
- the CSPro data entry system included inconsistency checks, which allowed the NSO staff to spot some errors and correct them by imputation based on their own knowledge (there was no time for double entry; 4 data entry operators were used)
- after data entry, outliers were identified in order to check their consistency.
All data entry, including editing, edit checks and queries, was done using CSPro (Census Survey Processing System) with additional data editing and cleaning taking place in Excel.
The staff from the CSD was responsible for undertaking the coding and data entry, with assistance from an additional four temporary staff to help produce results in a more timely manner.
Although enumeration wasn't completed until mid-June, the coding and data entry commenced as soon as forms were available from Funafuti, which was towards the end of March. The coding and data entry was then completed around the middle of July.
A visit from an SPC consultant then took place to undertake initial cleaning of the data, primarily addressing missing data items and missing schedules. Once the initial data cleaning was undertaken in CSPro, data was transferred to Excel where it was closely scrutinized to check that all responses were sensible. In the cases where unusual values were identified, original forms were consulted for these households and modifications made to the data if required.
Despite the best efforts being made to clean the data file in preparation for the analysis, no doubt errors will still exist in the data, due to its size and complexity. Having said this, they are not expected to have significant impacts on the survey results.
Under-Reporting and Incorrect Reporting as a result of Poor Field Work Procedures The most crucial stage of any survey activity, whether it be a population census or a survey such as a HIES, is the fieldwork. It is crucial for intense checking to take place in the field before survey forms are returned to the office for data processing. Unfortunately, it became evident during the cleaning of the data that fieldwork wasn't checked as thoroughly as required, and as such some unexpected values appeared in the questionnaires, as well as unusual results appearing in the diaries. Efforts were made to identify the main issues which would have the greatest impact on final results, and this information was modified using local knowledge, to a more reasonable answer, when required.
Data Entry Errors Data entry errors are always expected, but can be kept to a minimum with
https://creativecommons.org/publicdomain/zero/1.0/
Netflix is a popular streaming service that offers a vast catalog of movies, TV shows, and original content. This dataset is a cleaned version of the original, which can be found here. The data consists of content added to Netflix from 2008 to 2021. The oldest title dates from 1925 and the newest from 2021. The dataset is cleaned with PostgreSQL and visualized with Tableau. Its purpose is to test my data cleaning and visualization skills. The cleaned data can be found below and the Tableau dashboard can be found here.
We are going to:
1. Treat the nulls
2. Treat the duplicates
3. Populate missing rows
4. Drop unneeded columns
5. Split columns
Extra steps and further explanation of the process are given in the code comments.
--View dataset
SELECT *
FROM netflix;
--The show_id column is the unique id for the dataset, therefore we are going to check for duplicates
SELECT show_id, COUNT(*)
FROM netflix
GROUP BY show_id
HAVING COUNT(*) > 1
ORDER BY show_id DESC;
--No rows returned, so no duplicates
--Check null values across columns
SELECT COUNT(*) FILTER (WHERE show_id IS NULL) AS showid_nulls,
COUNT(*) FILTER (WHERE type IS NULL) AS type_nulls,
COUNT(*) FILTER (WHERE title IS NULL) AS title_nulls,
COUNT(*) FILTER (WHERE director IS NULL) AS director_nulls,
COUNT(*) FILTER (WHERE movie_cast IS NULL) AS movie_cast_nulls,
COUNT(*) FILTER (WHERE country IS NULL) AS country_nulls,
COUNT(*) FILTER (WHERE date_added IS NULL) AS date_added_nulls,
COUNT(*) FILTER (WHERE release_year IS NULL) AS release_year_nulls,
COUNT(*) FILTER (WHERE rating IS NULL) AS rating_nulls,
COUNT(*) FILTER (WHERE duration IS NULL) AS duration_nulls,
COUNT(*) FILTER (WHERE listed_in IS NULL) AS listed_in_nulls,
COUNT(*) FILTER (WHERE description IS NULL) AS description_nulls
FROM netflix;
We can see that there are NULLS.
director_nulls = 2634
movie_cast_nulls = 825
country_nulls = 831
date_added_nulls = 10
rating_nulls = 4
duration_nulls = 3
Nulls make up about 30% of the director column, so I will not delete those rows. Instead, I will populate them from another column. To do that, we first check whether there is a relationship between the movie_cast column and the director column.
-- Below, we find out if some directors are likely to work with particular cast
WITH cte AS
(
SELECT title, CONCAT(director, '---', movie_cast) AS director_cast
FROM netflix
)
SELECT director_cast, COUNT(*) AS count
FROM cte
GROUP BY director_cast
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
With this, we can now populate NULL rows in director using their matching movie_cast records
UPDATE netflix
SET director = 'Alastair Fothergill'
WHERE movie_cast = 'David Attenborough'
AND director IS NULL ;
--Repeat this step to populate the rest of the director nulls
--Populate the rest of the NULL in director as "Not Given"
UPDATE netflix
SET director = 'Not Given'
WHERE director IS NULL;
--When I was doing this, I found a less complex and faster way to populate a column which I will use next
As with the director column, I will not delete the nulls in country. Since the country column is related to the director of a title, we are going to populate missing country values from other rows that share the same director.
--Populate the country using the director column
SELECT COALESCE(nt.country,nt2.country)
FROM netflix AS nt
JOIN netflix AS nt2
ON nt.director = nt2.director
AND nt.show_id <> nt2.show_id
WHERE nt.country IS NULL;
UPDATE netflix
SET country = nt2.country
FROM netflix AS nt2
WHERE netflix.director = nt2.director and netflix.show_id <> nt2.show_id
AND netflix.country IS NULL;
--Check whether any rows still have a NULL country after the update
SELECT director, country, date_added
FROM netflix
WHERE country IS NULL;
--Populate the rest of the NULL in country as "Not Given"
UPDATE netflix
SET country = 'Not Given'
WHERE country IS NULL;
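The country back-fill above (a self-join on director, then a "Not Given" default) can be sketched in Python for readers who want to trace the logic on a few rows. This is an illustrative re-expression, not the author's SQL: the function name and sample rows are hypothetical. One deliberate difference is flagged in the comments: the sketch ignores the "Not Given" director placeholder when matching, because otherwise unrelated titles that all carry that placeholder would lend each other a country.

```python
def fill_country_from_director(rows, placeholder="Not Given"):
    """Fill missing country values from another row with the same director."""
    # First pass: remember one known country per real director.
    known = {}
    for row in rows:
        director, country = row["director"], row["country"]
        # Assumption of this sketch: the placeholder is not a real director,
        # so it must not be used as a join key.
        if director and director != placeholder and country:
            known.setdefault(director, country)
    # Second pass: fill the gaps, defaulting to the placeholder.
    for row in rows:
        if row["country"] is None:
            row["country"] = known.get(row["director"], placeholder)
    return rows


# Hypothetical rows, for illustration only.
rows = [
    {"show_id": "s1", "director": "Director A", "country": "Canada"},
    {"show_id": "s2", "director": "Director A", "country": None},
    {"show_id": "s3", "director": "Not Given", "country": "India"},
    {"show_id": "s4", "director": "Not Given", "country": None},
]
for row in fill_country_from_director(rows):
    print(row["show_id"], row["country"])
```

When a director has worked in several countries, both this sketch and a self-join pick one of them arbitrarily, so the filled values are best treated as approximate.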
The date_added column has only 10 nulls out of over 8000 rows, so deleting them will not noticeably affect our analysis or visualization.
--Show date_added nulls
SELECT show_id, date_added
FROM netflix_clean
WHERE date_added IS NULL;
--DELETE nulls
DELETE F...
Survey data from the Australian Marine Debris Initiative and the result of spatial analysis from multiple Creative Commons datasets. Data consists of:
• Spatial Data Queensland Coastline – Event summaries within an Excel data table and shapefile
• All years
• Number of Items removed, Weight volunteers, Volume, Distance, Latitude and Longitude.
• Contributing organisation files table/ sites
• Environmental, physical and biological variables associated with the closest catchment to each debris survey.
TBF has made all reasonable efforts to ensure that the information in the Custom Dataset is accurate. TBF will not be held responsible:
• for the way these data are used by the Entity for their Reports;
• for any errors that may be contained in the Custom Dataset; or
• any direct or indirect damage the use of the Custom Dataset may cause.
Data collected by TBF comes from citizen science initiatives and is taken at face value from contributors with each entry being vetted and periodic checks being made to maintain the integrity of the overall dataset. Some clean-up data has been extrapolated by data collectors. Some weight and distance details have not been provided by contributors.
The data was collected by various organisations and individuals in clean-up events at their chosen locations where man-made items greater than 5mm were removed from the beach, and sorted, counted and recorded on data sheets, using CyberTracker software devices or the AMDI mobile application. Items were identified according to the method laid out in the TBF Marine Debris Identification Manual in which items are grouped according to their material categories (the manual is available on the TBF website). The length of beach cleaned is at the discretion of the clean-up group and the total weight of items removed is either weighed with handheld scales or estimated.
Supply Chain Business Scenarios with Excel Analysis
Presented by Discover Talent
In today’s dynamic and data-driven business environment, professionals in supply chain roles must be equipped with both domain knowledge and analytical skills. This document is designed to bridge that gap.
Whether you're a student, early-career professional, or someone seeking to enhance your practical understanding of supply chain operations — this resource will guide you through realistic business scenarios that are commonly encountered in the industry.
Each scenario is paired with:
A clear business problem
Simple, structured raw data
A step-by-step Excel-based solution
By working through these examples, learners will develop confidence in applying Excel to:
Make inventory decisions
Assess supplier performance
Analyze warehouse operations
Identify inefficiencies in cost, stock, or delivery
Why Read This Document?
✔ Gain practical exposure to supply chain analytics
✔ Learn Excel tools used in the field: IF statements, PivotTables, Conditional Formatting, and more
✔ Strengthen your job readiness for supply chain, logistics, and operations roles
This learning pack has been thoughtfully prepared by Discover Talent, a platform committed to delivering industry-relevant education through hands-on learning. www.discover-talent-presents.com
Version 5 release notes:
Removes support for SPSS and Excel data.
Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
Adds in agencies that report 0 months of the year.
Adds a column that indicates the number of months reported. This is generated by summing the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime.
Removes data on runaways.
Version 4 release notes:
Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
Version 3 release notes:
Add data for 2016.
Order rows by year (descending) and ORI.
Version 2 release notes:
Fix bug where Philadelphia Police Department had incorrect FIPS county code.
The Arrests by Age, Sex, and Race data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2015 into a single file. These files are quite large and may take some time to load.
All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possibly incorrect) assumption that these values represent zero crimes reported. The original data does not have a value when the agency reports zero arrests other than "None/not reported." In other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests, which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.
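The two recoding rules above can be expressed compactly. The author's cleaning was done in R; the following Python sketch only restates the stated rules, with `None` standing in for NA, and the function and constant names are hypothetical.

```python
# Values the description says are reported incorrectly and recoded to NA.
IMPLAUSIBLE = set(range(10000, 100001, 10000)) | {99999, 99998}


def clean_arrest_value(raw):
    """Apply the two stated rules to one arrest cell.

    "None/not reported" becomes 0 (this cannot distinguish a true zero
    from a missing value); implausible sentinel counts become None (NA).
    """
    if isinstance(raw, str) and raw.strip().lower() == "none/not reported":
        return 0
    value = int(raw)
    return None if value in IMPLAUSIBLE else value


for cell in ["None/not reported", "17", "20000", "99998"]:
    print(repr(cell), "->", clean_arrest_value(cell))
```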
To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units such that every row indicates an agency in a given year. Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrested for murder. The ASR data breaks down arrests by age and gender (e.g. Male aged 15, Male aged 18). They also provide the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g. referred to adult court, referred to welfare agency), I do not include these columns.
To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros and if you open it in Excel it will automatically delete those leading zeros.
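If a file has already lost its leading zeros, they can be restored by padding each code back to its standard width (2 digits for state, 3 for county, 5 for place). The sketch below reads codes as text with Python's `csv` module, which never converts values to numbers. The column names and the sample row are hypothetical and may not match the actual files.

```python
import csv
import io

# Standard FIPS widths; the column names here are assumed for illustration.
FIPS_WIDTHS = {"fips_state_code": 2, "fips_county_code": 3, "fips_place_code": 5}


def restore_fips(row):
    """Left-pad numeric FIPS codes with zeros to their standard width."""
    fixed = dict(row)
    for column, width in FIPS_WIDTHS.items():
        value = fixed.get(column) or ""
        if value.isdigit():
            fixed[column] = value.zfill(width)
    return fixed


# Hypothetical row as it might look after a spreadsheet stripped the zeros.
damaged = io.StringIO(
    "ori,fips_state_code,fips_county_code,fips_place_code\n"
    "XX0000000,1,73,7000\n"
)
rows = [restore_fips(r) for r in csv.DictReader(damaged)]
print(rows[0])
```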
I created 9 arrest categories myself. The categories are:
Total Male Juvenile
Total Female Juvenile
Total Male Adult
Total Female Adult
Total Male
Total Female
Total Juvenile
Total Adult
Total Arrests
All of these categories are based on the sums of the sex-age categories (e.g. Male under 10, Female aged 22) rather than using the provided age-race categories (e.g. adult Black, juvenile Asian). As not all agencies report the race data, my method is more accurate. These categories also make up the data in the "simple" version of the data. The "simple" file only includes the above 9 columns as the arrest data (all other columns in the data are just agency identifier columns). Because this "simple" data set needs fewer columns, I include all offenses.
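As a rough illustration of how nine totals can be derived from sex-age counts, the Python sketch below sums hypothetical (sex, age, count) records for one agency-year and one crime. The input layout, the key names, and the adult threshold of 18 are assumptions made for this example, not the author's actual column scheme.

```python
def summarize_arrests(counts, adult_age=18):
    """Build the nine summary totals from (sex, age, count) records."""
    keys = [
        "total_male_juvenile", "total_female_juvenile",
        "total_male_adult", "total_female_adult",
        "total_male", "total_female",
        "total_juvenile", "total_adult", "total_arrests",
    ]
    totals = dict.fromkeys(keys, 0)
    for sex, age, n in counts:
        group = "adult" if age >= adult_age else "juvenile"  # assumed cut-off
        totals[f"total_{sex}_{group}"] += n
        totals[f"total_{sex}"] += n
        totals[f"total_{group}"] += n
        totals["total_arrests"] += n
    return totals


# Hypothetical sex-age counts.
example = [("male", 15, 2), ("male", 18, 3), ("female", 22, 4)]
totals = summarize_arrests(example)
print(totals)
```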
As the arrest data is very granular, and each category of arrest is its own column, there are dozens of columns per crime. To keep the data somewhat manageable, there are nine different files, eight which contain different crimes and the "simple" file. Each file contains the data for all years. The eight categories each have crimes belonging to a major crime category and do not overlap in crimes other than with the index offenses. Please note that the crime names provided below are not the same as the column names in the data. Due to Stata limiting column names to 32 characters maximum, I have abbreviated the crime names in the data. The files and their included crimes are:
Index Crimes
Murder, Rape, Robbery, Aggravated Assault, Burglary, Theft, Motor Vehicle Theft, Arson
Alcohol Crimes
DUI, Drunkenness, Liquor
Drug Crimes
Total Drug, Total Drug Sales, Total Drug Possession, Cannabis Possession, Cannabis Sales, Heroin or Cocaine Possession, Heroin or Cocaine Sales, Other Drug Possession, Other Drug Sales, Synthetic Narcotic Possession, Synthetic Narcotic Sales
Grey Collar and Property Crimes
Forgery, Fraud, Stolen Property
Financial Crimes
Embezzlement, Total Gambling, Other Gambling, Bookmaking, Numbers Lottery
Sex or Family Crimes
Offenses Against the Family and Children, Other Sex Offenses, Prostitution, Rape
Violent Crimes
Aggravated Assault, Murder, Negligent Manslaughter, Robbery, Weapon Offenses
Other Crimes
Curfew, Disorderly Conduct, Other Non-traffic, Suspicion, Vandalism, Vagrancy
Simple
This data set has every crime and only the arrest categories that I created (see above).
These data include the individual responses for the City of Tempe Annual Business Survey conducted by ETC Institute. These data help determine priorities for the community as part of the City's ongoing strategic planning process. Averaged Business Survey results are used as indicators for city performance measures. The performance measures with indicators from the Business Survey include the following (as of 2023):
1. Financial Stability and Vitality
5.01 Quality of Business Services
Additional Information
Source: Business Survey
Contact (author): Adam Samuels
Contact E-Mail (author): Adam_Samuels@tempe.gov
Contact (maintainer):
Contact E-Mail (maintainer):
Data Source Type: Excel table
Preparation Method: Data received from vendor after report is completed
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary
Methods:
The survey is mailed to a random sample of businesses in the City of Tempe. Follow-up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used.
To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city.
Processing and Limitations:
The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. Together, these two adjustments ensure that points are not tied to a specific address but are still close enough to allow insights about service delivery in different areas of the city.
The data are used by the ETC Institute in the final published PDF report.
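The random dispersal of overlapping points can be pictured with the Python sketch below. The source does not describe how the dispersal was actually performed, so everything here is an assumption for illustration: the function name, the maximum offset in degrees, the choice to move every member of an overlapping group, and the sample coordinates.

```python
import math
import random
from collections import Counter


def disperse_overlaps(points, max_offset_deg=0.0005, seed=None):
    """Nudge points that share identical coordinates by a small random offset.

    Points that are already unique are returned unchanged.
    """
    rng = random.Random(seed)
    counts = Counter(points)
    dispersed = []
    for lat, lon in points:
        if counts[(lat, lon)] > 1:
            angle = rng.uniform(0.0, 2.0 * math.pi)
            radius = rng.uniform(0.0, max_offset_deg)
            lat += radius * math.sin(angle)
            lon += radius * math.cos(angle)
        dispersed.append((lat, lon))
    return dispersed


# Hypothetical block-level points: two responses share one location.
points = [(33.4255, -111.9400), (33.4255, -111.9400), (33.4300, -111.9300)]
moved = disperse_overlaps(points, seed=1)
for before, after in zip(points, moved):
    print(before, "->", after)
```

A small offset keeps each point near its block while making every response visible on a map; the trade-off is that displayed positions are no longer exact even at block level.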
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The files contain the raw data of the following Master Thesis:
Förster, Wenzel
Application of green solvents to remove ionomer-containing binder for PEM water electrolyzer recycling
Master Thesis
TU Bergakademie Freiberg
Date of submission: 2024-12-10
The data contains two Excel files and six ZIP files.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This dataset contains information on all Government of Canada award notices published according to the Financial Administration Act. It includes data for all Schedule I, Schedule II and Schedule III departments, agencies, Crown corporations, and other entities (unless specifically exempt) who must comply with the Government of Canada trade agreement obligations. CanadaBuys is the authoritative source of this information. Visit the How procurement works page on CanadaBuys to learn more.
All data files in this collection share a common column structure, and the procurement category field (labelled as “procurementCategory-categorieApprovisionnement”) can be used to filter by the following four major categories of awards:
- Awards for construction, which will have a value of “CNST”
- Awards for goods, which will have a value of “GD”
- Awards for services, which will have a value of “SRV”
- Awards for services related to goods, which will have a value of “SRVTGD”
Some award notices may be associated with one or more of the above procurement categories.
Note: Some records contain long award description values that may cause issues when viewed in certain spreadsheet programs, such as Microsoft Excel. When the information doesn’t fit within the cell’s character limit, the program will insert extra rows that don’t conform to the expected column formatting. (All other records will still be displayed properly, in their own rows.) To quickly remove the “spill-over data” caused by this display error in Excel, select the publication date field (labelled as “publicationDate-datePublication”), then click the Filter button on the Data menu ribbon. You can then use the filter pull-down list to remove any blank or non-date values from this field, which will hide the rows that only contain “spill-over” description information.
The following list describes the resources associated with this CanadaBuys award notices dataset.
Additional information on Government of Canada award notices can be found on the Award notices tab of the CanadaBuys Tender opportunities page.
Note: While the CanadaBuys online portal includes award notices from across multiple levels of government, the data files in this related dataset only include notices from federal government organizations.
(1) CanadaBuys data dictionary: This XML file offers descriptions of each data field in the award notices files linked below, as well as other procurement-related datasets CanadaBuys produces. Use this as a guide for understanding the data elements in these files. This dictionary is updated as needed to reflect changes to the data elements.
(2) All CanadaBuys award notices, 2022-08-08 onward: This file contains up-to-date information on all award notices published on CanadaBuys. This includes any award notices that were published on or after August 8, 2022, when CanadaBuys became the system of record for all tender and award notices for the Government of Canada. This file includes any amendments made to these award notices during their lifecycles. It is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500) to include any updates or amendments, as needed. Award notices in this file can have any publication date on or after August 8, 2022 (displayed in the field labelled “publicationDate-datePublication”), and can have a status of active, cancelled or expired (displayed in the field labelled “awardStatus-attributionStatut-eng”).
(3) Legacy award notices, 2012 to 2022-08 (prior to CanadaBuys): This file contains details of the award notices published prior to the implementation of CanadaBuys, which became the system of record for all tender and award notices for the Government of Canada on August 8, 2022. This datafile is refreshed monthly.
The over 100,000 awards in this file have publication dates from August 6, 2022 and prior (displayed in the field labelled “*publicationDate-datePublication*”), and have a status of active, cancelled or expired (displayed included in the field labelled “*awardStatus-attributionStatut-eng*”). >Note: Procurement data was structured differently in the legacy applications previously used to administer Government of Canada contracts. Efforts have been made to manipulate these historical records into the structure used by the CanadaBuys data files, to make them easier to analyse and compare with new records. This process is not perfect since simple one-to-one mappings can’t be made in many cases. You can access these historical records in their original format as part of the archived copy of the original tender notices dataset, which contained awards-related data files. You can also refer to the supporting documentation for understanding the new CanadaBuys tender and award notices datasets. (4) Award notices, YYYY-YYYY: These files contain information on all contracts awarded in the specified fiscal year. The current fiscal year's file is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500) to include any updates or amendments, as needed. The files associated with past fiscal years are updated monthly. Awards in these files can have any publication date between April 1 of a given year and March 31 of the subsequent year (displayed in the field labelled “*publicationDate-datePublication*”) and can have an award status of active, cancelled or expired (displayed in the field labelled “*awardStatus-attributionStatut-eng*”). >Note: New award notice data files will be added on April 1 for each fiscal year.
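The category filter and the spill-over clean-up described in this entry can also be done outside a spreadsheet. The following is a minimal Python sketch, not an official CanadaBuys tool. The two field names come from the description above; the `title` column, the sample rows, and the `YYYY-MM-DD` date format are assumptions made for illustration. How multiple categories are delimited within one cell is not stated in the source, so the sketch simply extracts whole upper-case codes.

```python
import csv
import io
import re
from datetime import datetime

CATEGORY_FIELD = "procurementCategory-categorieApprovisionnement"
DATE_FIELD = "publicationDate-datePublication"


def is_valid_date(value, fmt="%Y-%m-%d"):
    """Return True if value parses as a date in the assumed format."""
    try:
        datetime.strptime((value or "").strip(), fmt)
        return True
    except ValueError:
        return False


def filter_notices(csv_text, category):
    """Keep well-formed rows whose category field contains the given code."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Mirror the Excel filter: skip rows with a blank or non-date
        # publication date (the "spill-over" rows).
        if not is_valid_date(row.get(DATE_FIELD)):
            continue
        # Compare whole codes so that "SRV" does not match "SRVTGD".
        codes = re.findall(r"[A-Z]+", row.get(CATEGORY_FIELD) or "")
        if category in codes:
            rows.append(row)
    return rows


# Hypothetical sample data; real files have many more columns.
SAMPLE = """title,publicationDate-datePublication,procurementCategory-categorieApprovisionnement
Bridge repair,2023-05-01,CNST
Printer maintenance,2023-05-02,SRVTGD
overflow text from a long description,,
IT consulting,2023-05-03,SRV
"""

services = filter_notices(SAMPLE, "SRV")
print([row["title"] for row in services])
```

Matching on whole codes rather than substrings matters here because “SRV” and “GD” both occur inside “SRVTGD”.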

Tool: Microsoft Excel
Dataset: Coffee Sales
Process:
1. Data Cleaning:
• Remove duplicates and blanks.
• Standardize date and currency formats.
2. Data Manipulation:
• Use sorting and filtering functions to work with subsets of data of interest.
• Use XLOOKUP, INDEX-MATCH and IF formulas for efficient data manipulation, such as retrieving, matching and organising information in spreadsheets.
3. Data Analysis:
• Create Pivot Tables and Pivot Charts, with formatting, to visualize trends.
4. Dashboard Development:
• Insert Slicers, with formatting, for easy filtering and dynamic updates.
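For readers who want to reproduce the cleaning and lookup steps outside Excel, here is a rough Python sketch of the same ideas. It is not the project's workbook logic: the column names (`order_id`, `date`, `product_id`), the sample rows, the roast table, and the two accepted date formats are all hypothetical.

```python
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats


def to_iso(text):
    """Parse a date in one of the assumed formats and return YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def clean(rows):
    """Drop rows with blank cells, standardise dates, then drop duplicates."""
    seen, result = set(), []
    for row in rows:
        if any(not str(value).strip() for value in row.values()):
            continue  # row has a blank cell
        fixed = dict(row, date=to_iso(row["date"]))
        key = tuple(sorted(fixed.items()))
        if key not in seen:
            seen.add(key)
            result.append(fixed)
    return result


def lookup(key, table, default="Not found"):
    """Exact-match lookup with a fallback, similar in spirit to XLOOKUP."""
    return table.get(key, default)


# Hypothetical sample data.
orders = [
    {"order_id": "A1", "date": "05/09/2019", "product_id": "R-M"},
    {"order_id": "A1", "date": "2019-09-05", "product_id": "R-M"},  # duplicate once dates agree
    {"order_id": "A2", "date": "", "product_id": "E-L"},            # blank cell
    {"order_id": "A3", "date": "2020-01-17", "product_id": "E-L"},
]
roast_by_product = {"R-M": "Medium", "E-L": "Light"}

cleaned = clean(orders)
for order in cleaned:
    print(order["order_id"], order["date"], lookup(order["product_id"], roast_by_product))
```

Dates are standardised before duplicates are removed, so two rows that differ only in date format are treated as the same record.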
Highlights: This project aims to understand coffee sales trends by country, roast type, and year, which could help identify marketing opportunities and customer segments.

The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.
----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:
Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization (CSO), household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). As part of the cooperation between the CSO and the World Bank (WB), the CSO and the Kurdistan Region Statistics Office (KRSO) launched IHSES fieldwork on 1/1/2012. The survey was carried out over a full year, covering all governorates, including those in the Kurdistan Region.
The survey has six main objectives. These objectives are:
The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum to create a version comparable with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See "Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf", provided in the external resources, for further information on the mapping of the original variables onto the harmonized ones, as well as notes on the variables' availability in both survey years and relevant comments.
National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.
1- Household/family. 2- Individual/person.
The survey was carried out over a full year covering all governorates including those in Kurdistan Region.
Sample survey data [ssd]
----> Design:
The sample size was 25488 households for the whole of Iraq: 216 households in each of 118 districts, drawn as 2832 clusters of 9 households each, distributed across districts and governorates for rural and urban areas.
----> Sample frame:
The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all governorates, including the Kurdistan Region, as the frame for selecting households. The sample was selected in two stages. Stage 1: Primary sampling units (blocks) within each stratum (district), for urban and rural areas, were systematically selected with probability proportional to size to reach 2832 units (clusters). Stage 2: 9 households from each primary sampling unit were selected to create a cluster, so the total sample across all clusters was 25488 households distributed across the governorates, 216 households in each district.
----> Sampling Stages:
In each district, the sample was selected in two stages. Stage 1: Based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit breakdown by urban and rural area and the geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal-probability sampling. Sampling frames for each stage can be developed from the 2010 building listing and numbering without updating household lists. In some small districts, random selection of primary sampling units may yield fewer than 24 distinct units, in which case a sampling unit is selected more than once; two or more clusters may then be drawn from the same enumeration unit when necessary.
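The design figures quoted above are mutually consistent, and the first-stage selection can be illustrated with a short Python sketch. The constants come from the text; the `systematic_pps` function is a generic textbook version of systematic sampling with probability proportional to size, not the survey's actual procedure, and the block sizes in the example are made up.

```python
import random
from itertools import accumulate

DISTRICTS = 118
CLUSTERS_PER_DISTRICT = 24
HOUSEHOLDS_PER_CLUSTER = 9

clusters = DISTRICTS * CLUSTERS_PER_DISTRICT                              # 2832
households_per_district = CLUSTERS_PER_DISTRICT * HOUSEHOLDS_PER_CLUSTER  # 216
total_households = clusters * HOUSEHOLDS_PER_CLUSTER                      # 25488


def systematic_pps(sizes, n, rng):
    """Select n units systematically with probability proportional to size.

    Returns a list of unit indices. A unit larger than the sampling
    interval can be selected more than once.
    """
    interval = sum(sizes) / n
    start = rng.random() * interval
    cumulative = list(accumulate(sizes))
    selected, unit = [], 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulative size range contains the point.
        while unit < len(sizes) - 1 and cumulative[unit] <= point:
            unit += 1
        selected.append(unit)
    return selected


# Hypothetical block sizes for one small district with a dominant block.
block_sizes = [10, 10, 200, 10]
picks = systematic_pps(block_sizes, 4, random.Random(0))
print(clusters, households_per_district, total_households, picks)
```

With these made-up sizes the third block spans more than two sampling intervals, so it is always picked at least twice, which is the situation the text describes for small districts.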
Face-to-face [f2f]
----> Preparation:
The questionnaire of the 2006 survey was adopted as the basis for the 2012 questionnaire, to which many revisions were made. Two rounds of pre-testing were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before a final version was implemented in a pilot survey in September 2011. After the pilot survey, additional revisions were made based on the challenges and feedback that emerged during its implementation, producing the final version used in the actual survey.
----> Questionnaire Parts:
The questionnaire consists of four parts, each with several sections:

Part 1: Socio-Economic Data:
- Section 1: Household Roster
- Section 2: Emigration
- Section 3: Food Rations
- Section 4: Housing
- Section 5: Education
- Section 6: Health
- Section 7: Physical measurements
- Section 8: Job seeking and previous job

Part 2: Monthly, Quarterly and Annual Expenditures:
- Section 9: Expenditures on Non-Food Commodities and Services (past 30 days)
- Section 10: Expenditures on Non-Food Commodities and Services (past 90 days)
- Section 11: Expenditures on Non-Food Commodities and Services (past 12 months)
- Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days)
- Section 12, Table 1: Meals Had Within the Residential Unit
- Section 12, Table 2: Number of Persons Participating in the Meals within Household Expenditure Other Than its Members

Part 3: Income and Other Data:
- Section 13: Job
- Section 14: Paid jobs
- Section 15: Agriculture, forestry and fishing
- Section 16: Household non-agricultural projects
- Section 17: Income from ownership and transfers
- Section 18: Durable goods
- Section 19: Loans, advances and subsidies
- Section 20: Shocks and strategy of dealing in the households
- Section 21: Time use
- Section 22: Justice
- Section 23: Satisfaction in life
- Section 24: Food consumption during past 7 days

Part 4: Diary of Daily Expenditures: The expenditure diary is an essential component of this survey. It is left with the household to record all daily purchases, such as expenditures on food and frequent non-food items (gasoline, newspapers, etc.), over 7 days. Two pages were allocated for recording each day's expenditures, so the roster consists of 14 pages.
----> Raw Data:
Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct.
2. Local Supervisor: Checks that the questions have been correctly completed.
3. Statistical analysis: After exporting data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values, in addition to auditing some variables.
4. World Bank consultants in coordination with the CSO data management team: The World Bank technical consultants use additional programs in SPSS and STAT to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.
----> Harmonized Data:
The Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. 305 households refused to respond; the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest rate was in Sulaimaniya (92%).

Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This dataset contains information on Government of Canada tender information published according to the Financial Administration Act. It includes data for all Schedule I, Schedule II and Schedule III departments, agencies, Crown corporations, and other entities (unless specifically exempt) who must comply with the Government of Canada trade agreement obligations. CanadaBuys is the authoritative source of this information. Visit the How procurement works page on the CanadaBuys website to learn more.

All data files in this collection share a common column structure, and the procurement category field (labelled as “procurementCategory-categorieApprovisionnement”) can be used to filter by the following four major categories of tenders:
- Tenders for construction, which will have a value of “CNST”
- Tenders for goods, which will have a value of “GD”
- Tenders for services, which will have a value of “SRV”
- Tenders for services related to goods, which will have a value of “SRVTGD”

A tender may be associated with one or more of the above procurement categories.

Note: Some records contain long tender description values that may cause issues when viewed in certain spreadsheet programs, such as Microsoft Excel. When the information doesn’t fit within the cell’s character limit, the program will insert extra rows that don’t conform to the expected column formatting. (All other records will still be displayed properly, in their own rows.) To quickly remove the “spill-over data” caused by this display error in Excel, select the publication date field (labelled as “publicationDate-datePublication”), then click the Filter button on the Data menu ribbon. You can then use the filter pull-down list to remove any blank or non-date values from this field, which will hide the rows that only contain “spill-over” description information.

The following list describes the resources associated with this CanadaBuys tender notices dataset. Additional information on Government of Canada tenders can also be found on the Tender notices tab of the CanadaBuys tender opportunities page.

Note: While the CanadaBuys online portal includes tender opportunities from across multiple levels of government, the data files in this related dataset only include notices from federal government organizations.

(1) CanadaBuys data dictionary: This XML file offers descriptions of each data field in the tender notices files linked below, as well as other procurement-related datasets CanadaBuys produces. Use this as a guide for understanding the data elements in these files. This dictionary is updated as needed to reflect changes to the data elements.

(2) New tender notices: This file contains up-to-date information on all new tender notices that are published to CanadaBuys throughout a given day. The file is updated every two hours, from 6:15 am until 10:15 pm (UTC-0500), to include new tenders as they are published. All tenders in this file will have a publication date matching the current day (displayed in the field labelled “publicationDate-datePublication”), or the day prior for systems that feed into this file on a nightly basis.

(3) Open tender notices: This file contains up-to-date information on all tender notices that are open for bidding on CanadaBuys, including any amendments made to these tender notices during their lifecycles. The file is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500), to include newly published open tenders. All tenders in this file will have a status of open (displayed in the field labelled “tenderStatus-tenderStatut-eng”).

(4) All CanadaBuys tender notices, 2022-08-08 onwards: This file contains up-to-date information on all tender notices published through CanadaBuys. This includes any tender notices that were open for bids on or after August 8, 2022, when CanadaBuys launched as the system of record for all tender notices for the Government of Canada. This file includes any amendments made to these tender notices during their lifecycles. It is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500), to include any updates or amendments, as needed. Tender notices in this file can have any publication date on or after August 8, 2022 (displayed in the field labelled “publicationDate-datePublication”), and can have a status of open, cancelled or expired (displayed in the field labelled “tenderStatus-tenderStatut-eng”).

(5) Legacy tender notices, 2009 to 2022-08 (prior to CanadaBuys): This file contains details of the tender notices that were launched prior to the implementation of CanadaBuys, which became the system of record for all tender notices for the Government of Canada on August 8, 2022. This data file is refreshed monthly. The over 70,000 tenders in this file have publication dates of August 5, 2022 or earlier (displayed in the field labelled “publicationDate-datePublication”) and have a status of cancelled or expired (displayed in the field labelled “tenderStatus-tenderStatut-eng”).

Note: Procurement data was structured differently in the legacy applications previously used to administer Government of Canada tender notices. Efforts have been made to manipulate these historical records into the structure used by the CanadaBuys data files, to make them easier to analyse and compare with new records. This process is not perfect since simple one-to-one mappings can’t be made in many cases. You can access these historical records in their original format as part of the archived copy of the original tender notices dataset. You can also refer to the supporting documentation for understanding the new CanadaBuys tender and award notices datasets.

(6) Tender notices, YYYY-YYYY: These files contain information on all tender notices published in the specified fiscal year that are no longer open to bidding. The current fiscal year's file is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500), to include any updates or amendments, as needed. The files associated with past fiscal years are refreshed monthly. Tender notices in these files can have any publication date between April 1 of a given year and March 31 of the subsequent year (displayed in the field labelled “publicationDate-datePublication”) and can have a status of cancelled or expired (displayed in the field labelled “tenderStatus-tenderStatut-eng”). New records are added to these files once related tenders reach their close date, or are cancelled.

Note: New tender notice data files will be added on April 1 for each fiscal year.
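Since the yearly files are split by fiscal year (April 1 to March 31), a publication date can be mapped to the file it belongs to with a small helper. This is an illustrative Python sketch; the function name is hypothetical, and the `YYYY-YYYY` label simply follows the file naming shown above.

```python
from datetime import date


def fiscal_year_label(published):
    """Return the April-to-March fiscal year containing a date, as YYYY-YYYY."""
    # Dates from January to March belong to the fiscal year that began
    # on April 1 of the previous calendar year.
    start = published.year if published.month >= 4 else published.year - 1
    return f"{start}-{start + 1}"


print(fiscal_year_label(date(2023, 3, 31)))
print(fiscal_year_label(date(2023, 4, 1)))
```

The two calls show the boundary: March 31, 2023 falls in fiscal year 2022-2023, while April 1, 2023 opens fiscal year 2023-2024.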

CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Canadian contribution and data set prepared as part of the Global Media and Internet Concentration (GMIC) project offers an independent academic, empirical and data-driven analysis of a deceptively simple yet profoundly important question: have telecom, media and internet markets become more concentrated over time, or less? Media Ownership and Concentration is presented from more than a dozen sectors of the telecom-media-internet industries, including film, music and book industries. Note (22/01/2024): Small editorial changes were made throughout the report to clean up and improve the text. Small revisions to the estimates of the internet advertising revenue for some Canadian firms were also made to reflect newly available data. Those revisions were small and have no consequences for the analysis. Figures 1, 23, 25, 37, 40 and 41 were revised to reflect these changes.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This collection contains the 17 anonymised datasets from the RAAAP-2 international survey of research management and administration professionals undertaken in 2019. To preserve anonymity, the data are presented in 17 datasets linked only by AnalysisRegionofEmployment, as many of the textual responses, even though redacted to remove institutional affiliation, could be used to identify some individuals if linked to the other data. Each dataset is presented in the original SPSS format, suitable for further analyses, as well as an Excel equivalent for ease of viewing. There are additional files in this collection showing the questionnaire and the mappings to the datasets, together with the SPSS scripts used to produce the datasets. These data follow on from, but are not directly linked to, the first RAAAP survey undertaken in 2016, data from which can also be found in FigShare.
Errata (16/5/23): An error in v13 of the main Data Cleansing syntax file (now updated to v14) meant that two variables were missing their value labels (the underlying codes were correct). The Main Dataset (SPSS & Excel) has been updated with a new version.

https://cdla.io/sharing-1-0/
The Superstore Sales Data dataset, available in Excel format as "Superstore.xlsx," is a comprehensive collection of sales and customer-related information from a retail superstore. This dataset comprises three distinct tables, each providing specific insights into the store's operations and customer interactions.

This dataset was created by Shiva Vashishtha

Eurovision Song Contest
Based on the Sanremo Music Festival held in Italy since 1951, Eurovision has been held annually since 1956 (apart from 2020), making it the longest-running annual international televised music competition and one of the world's longest-running television programmes.
Business Needs
1) Top 5 artists
2) Average points per artist in group and solo performances
3) Total points by region
4) Total songs played in the contest
5) Year-wise total points and total place
Following these business needs, the data were cleaned in Excel and then visualized in a Power BI dashboard.