CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
Ahoy, data enthusiasts! Join us for a hands-on workshop where you will hoist your sails and navigate through the Statistics Canada website, uncovering hidden treasures in the form of data tables. With the wind at your back, you’ll master the art of downloading these invaluable Stats Can datasets while braving the occasional squall of data cleaning challenges using Excel with your trusty captains Vivek and Lucia at the helm.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This collection contains the 17 anonymised datasets from the RAAAP-2 international survey of research management and administration professionals undertaken in 2019. To preserve anonymity, the data are presented in 17 datasets linked only by AnalysisRegionofEmployment, because many of the textual responses, even though redacted to remove institutional affiliation, could be used to identify some individuals if linked to the other data. Each dataset is presented in the original SPSS format, suitable for further analyses, as well as an Excel equivalent for ease of viewing. Additional files in this collection show the questionnaire and the mappings to the datasets, together with the SPSS scripts used to produce the datasets. These data follow on from, but are not directly linked to, the first RAAAP survey undertaken in 2016, data from which can also be found in FigShare.
Errata (16/5/23): an error in v13 of the main Data Cleansing syntax file (now updated to v14) meant that two variables were missing their value labels (the underlying codes were correct). The Main Dataset (SPSS & Excel) has been updated with a new version.
About this course
Do you have messy data from multiple inconsistent sources, or open responses to questionnaires? Do you want to improve the quality of your data by refining it and using the power of the internet? Open Refine is the perfect partner to Excel. It is a powerful, free tool for exploring, normalising and cleaning datasets, and extending data by accessing the internet through APIs. In this course we’ll work through the various features of Refine, including importing data, faceting, clustering, and calling remote APIs, by working on a fictional but plausible humanities research project.
Learning Outcomes
- Download, install and run Open Refine
- Import data from csv, text or online sources and create projects
- Navigate data using the Open Refine interface
- Explore data by using facets
- Clean data using clustering
- Parse data using GREL syntax
- Extend data using Application Programming Interfaces (APIs)
- Export project for use in other applications
Prerequisites
The course has no prerequisites.
Licence
Copyright © 2021 Intersect Australia Ltd. All rights reserved.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
By Department of Energy [source]
The Building Energy Data Book (2011) is a resource for gaining insight into the state of energy consumption in the buildings sector. This dataset provides data on residential, commercial and industrial building energy consumption, construction techniques, building technologies and characteristics. With this resource, you can get an in-depth understanding of how energy is used in various types of buildings, from single-family homes to large office complexes, as well as its impact on the environment. The Building Technologies Office (BTO) within the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy developed this dataset for researchers, policy makers, engineers and everyday observers who are interested in learning more about the built environment and its energy usage patterns.
This dataset provides information on energy consumption in the buildings sector of the United States. It contains a number of key variables that can be used to analyze and explore the relationships between energy consumption and building characteristics, technologies, and construction. The data is provided in both CSV and tabular formats, which is helpful for those who prefer to use programs like Excel or other statistical modeling software.
In order to get started with this dataset we've developed a guide outlining how to effectively use it for your research or project needs.
Understand what's included: Before you start analyzing the data, you should read through the provided documentation so that you fully understand what is included in the datasets. You'll want to be aware of any potential limitations or requirements associated with each type of data point so that your results are valid and reliable when drawing conclusions from them.
Clean up any outliers: Take time up front to investigate suspicious outliers before using the dataset in further analyses; otherwise they can skew results later on. Outliers can also make statistical modeling more difficult, because extreme values inflate estimates in proportion to their magnitude (for example, a single outlier can affect an entire model's estimates). Missing values should be accounted for as well: they are not always obvious at first glance in a table or chart, but they still affect the accuracy of any statistics computed from the data.
Exploratory data analysis: After cleaning the dataset, do some basic exploration by visualizing summaries such as boxplots, histograms and scatter plots. This gives an initial view of the trends that may exist across demographic or geographic groupings and variables, which can then inform later predictive models. This step can also highlight any clear discontinuities over time (if applicable), and helps check that predictors contribute meaningful signal rather than noise to the accuracy of predictions.
Analyze key metrics & observations: Once exploratory analyses have been carried out on the raw samples, post-processing steps come next, such as analyzing correlations among explanatory variables, performing significance tests on regression models, imputing missing or outlier values, and more, depending on the needs of the project. Additionally, interpretation efforts based
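The outlier and missing-value checks described in this guide can be sketched in a few lines of Python. This is only an illustration: the 1.5 × IQR rule is one common convention rather than something the guide prescribes, the function name `summarize_column` is hypothetical, and the sample numbers are made up rather than taken from the Building Energy Data Book.

```python
import statistics

def summarize_column(values):
    """Count missing entries and flag outliers with the 1.5 * IQR rule."""
    # Treat None as "missing"; real files may use blanks or sentinel codes instead.
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles of the observed values (needs at least two observations).
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "outliers": outliers}

# Made-up readings with one gap and one suspicious value.
sample = [10.0, 12.0, 11.0, None, 13.0, 12.0, 95.0, 11.0]
report = summarize_column(sample)
print(report)
```

Flagged values are candidates for a closer look against the documentation, not automatic deletions.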
- Creating an energy efficiency rating system for buildings - Using the dataset, an organization can develop a metric to rate the energy efficiency of commercial and residential buildings in a standardized way.
- Developing targeted campaigns to raise awareness about energy conservation - Analyzing data from this dataset can help organizations identify areas of high energy consumption and create targeted campaigns and incentives to encourage people to conserve energy in those areas.
- Estimating costs associated with upgrading building technologies - By evaluating various trends in building technologies and their associated costs, decision-makers can determine the most cost-effective option when it comes time to upgrade their structures' energy efficiency...
Survey data from the Australian Marine Debris Initiative and the result of spatial analysis from multiple Creative Commons datasets. Data consists of:
- Spatial Data Queensland Coastline: event summaries within an Excel data table and shapefile
- All years
- Number of Items removed, Weight, Volunteers, Volume, Distance, Latitude and Longitude
- Contributing organisation files table/sites
- Environmental, physical and biological variables associated with the closest catchment to each debris survey
TBF has made all reasonable efforts to ensure that the information in the Custom Dataset is accurate. TBF will not be held responsible:
- for the way these data are used by the Entity for their Reports;
- for any errors that may be contained in the Custom Dataset; or
- for any direct or indirect damage the use of the Custom Dataset may cause.
Data collected by TBF comes from citizen science initiatives and is taken at face value from contributors, with each entry being vetted and periodic checks being made to maintain the integrity of the overall dataset. Some clean-up data has been extrapolated by data collectors. Some weight and distance details have not been provided by contributors. The data was collected by various organisations and individuals in clean-up events at their chosen locations, where man-made items greater than 5mm were removed from the beach, then sorted, counted and recorded on data sheets, using CyberTracker software devices or the AMDI mobile application. Items were identified according to the method laid out in the TBF Marine Debris Identification Manual, in which items are grouped according to their material categories (the manual is available on the TBF website). The length of beach cleaned is at the discretion of the clean-up group, and the total weight of items removed is either weighed with handheld scales or estimated.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data include the individual responses for the City of Tempe Annual Business Survey conducted by ETC Institute. These data help determine priorities for the community as part of the City's on-going strategic planning process. Averaged Business Survey results are used as indicators for city performance measures. The performance measures with indicators from the Business Survey include the following (as of 2023):
1. Financial Stability and Vitality
5.01 Quality of Business Services
The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. To better visualize the data, overlapping points were randomly dispersed to remove overlap. Together, these two adjustments ensure that the points are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city.
Additional Information
Source: Business Survey
Contact (author): Adam Samuels
Contact E-Mail (author): Adam_Samuels@tempe.gov
Contact (maintainer):
Contact E-Mail (maintainer):
Data Source Type: Excel table
Preparation Method: Data received from vendor after report is completed
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary
Methods:
The survey is mailed to a random sample of businesses in the City of Tempe. Follow-up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city.
Processing and Limitations:
The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. To better visualize the data, overlapping points were randomly dispersed to remove overlap. Together, these two adjustments ensure that the points are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city. The data are used by the ETC Institute in the final published PDF report.
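To make the dispersal step concrete, here is a minimal sketch of one way overlapping points could be nudged apart. It is not the procedure the City or ETC Institute used, which is not described in detail: the function name `disperse_overlaps`, the offset size, the choice to leave the first point at each location in place, and the coordinates are all assumptions for illustration.

```python
import random
from collections import Counter

def disperse_overlaps(points, max_offset=0.0005, seed=0):
    """Return a copy of `points` where repeated (lat, lon) pairs are nudged apart."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    seen = Counter()
    result = []
    for lat, lon in points:
        seen[(lat, lon)] += 1
        if seen[(lat, lon)] == 1:
            # First point at this location stays where it is.
            result.append((lat, lon))
        else:
            # Later points at the same location get a small random offset.
            result.append((lat + rng.uniform(-max_offset, max_offset),
                           lon + rng.uniform(-max_offset, max_offset)))
    return result

# Made-up block-level coordinates: three responses share one point.
points = [(33.42, -111.93), (33.42, -111.93), (33.42, -111.93), (33.40, -111.90)]
dispersed = disperse_overlaps(points)
for original, moved in zip(points, dispersed):
    print(original, "->", moved)
```

Because the offsets are random, dispersed coordinates should be read as approximate positions within a block, not as measured locations.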
The attached data set consists of one Excel file containing the data from the article “Start-up of a microalgae-based treatment system within the biorefinery concept: from wastewater to bioproducts”, published in Water Science and Technology (vol. 78(1-2), August 2018, 114-124). The data are limited, as it is an introductory article on the plant design and objectives.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The files contain the raw data of the following Master Thesis:
Förster, Wenzel
Application of green solvents to remove ionomer-containing binder for PEM water electrolyzer recycling
Master Thesis
TU Bergakademie Freiberg
Date of submission: 2024-12-10
The data comprise two Excel files and six ZIP files.
The main objectives of the survey were:
- To obtain weights for the revision of the Consumer Price Index (CPI) for Funafuti;
- To provide information on the nature and distribution of household income, expenditure and food consumption patterns;
- To provide data on the household sector's contribution to the National Accounts;
- To provide information on the economic activity of men and women to study gender issues;
- To undertake some poverty analysis.
National, including Funafuti and Outer islands
All private households are included in the sampling frame. In each selected household, the current residents are surveyed, as are people who are usual residents but are currently away (for work, health or holiday reasons, or as boarding students, for example). If the household had been residing in Tuvalu for less than one year:
- but intends to reside for more than 12 months: the household is included
- and does not intend to reside for more than 12 months: the household is out of scope
Sample survey data [ssd]
It was decided that a 33% (one third) sample was sufficient to achieve suitable levels of accuracy for key estimates in the survey. The sample selection was therefore spread proportionally across all the islands except Niulakita, which was considered too small. For selection purposes, each island was treated as a separate stratum and independent samples were selected from each. The strategy used was to list the dwellings on each island by their geographical position and run a systematic skip through the list to achieve the 33% sample. This approach ensured that the sample would be spread out across each island as much as possible and thus be more representative.
For details please refer to Table 1.1 of the Report.
Only the island of Niulakita was excluded from the sampling frame, as it was considered too small.
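The selection strategy described above (each island as its own stratum, with a systematic skip through a geographically ordered dwelling list) can be sketched as follows. This is an illustrative reconstruction, not the survey office's actual procedure: the random start, the function names, the dwelling identifiers and the list sizes are assumptions, and only "Funafuti" is a name taken from the text.

```python
import random

def systematic_sample(ordered_units, skip=3, rng=None):
    """Take every `skip`-th unit from an ordered list, starting at a random offset."""
    rng = rng or random.Random()
    start = rng.randrange(skip)       # random start within the first interval
    return ordered_units[start::skip]

def sample_by_stratum(units_by_stratum, skip=3, seed=None):
    """Draw an independent systematic sample within each stratum (island)."""
    rng = random.Random(seed)
    return {name: systematic_sample(units, skip, rng)
            for name, units in units_by_stratum.items()}

# Hypothetical dwelling listings, assumed to be ordered by geographical position.
listing = {
    "Funafuti": [f"F{i:03d}" for i in range(30)],
    "Outer island A": [f"A{i:03d}" for i in range(12)],
}
sample = sample_by_stratum(listing, skip=3, seed=1)
for island, dwellings in sample.items():
    print(island, len(dwellings), dwellings[:3])
```

A skip of 3 yields the one-third sampling fraction, and because the list is ordered by position, the selected dwellings are spread across each island rather than clustered.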
Face-to-face [f2f]
There were three main survey forms used to collect data for the survey. Each question is written in English and translated into Tuvaluan on the same version of the questionnaire. The questionnaires were designed based on the 2004 survey questionnaire.
HOUSEHOLD FORM
- composition of the household and demographic profile of each member
- dwelling information
- dwelling expenditure
- transport expenditure
- education expenditure
- health expenditure
- land and property expenditure
- household furnishing
- home appliances
- cultural and social payments
- holidays/travel costs
- loans and saving
- clothing
- other major expenditure items
INDIVIDUAL FORM
- health and education
- labor force (individuals aged 15 and above)
- employment activity and income (individuals aged 15 and above): wages and salaries, working own business, agriculture and livestock, fishing, income from handicraft, income from gambling, small scale activities, jobs in the last 12 months, other income, children's income, tobacco and alcohol use, other activities, and seafarer
DIARY (one diary per week over a 2-week period; 2 diaries per household were required)
- All kinds of expenses
- Home production
- Food and drink (eaten by the household, given away, sold)
- Goods taken from own business (consumed, given away)
- Monetary gift (given away, received, winning from gambling)
- Non-monetary gift (given away, received, winning from gambling)
Questionnaire Design Flaws
Questionnaire design flaws are problems with the way questions were worded that result in an incorrect answer being provided by the respondent. Despite every effort to minimize this problem during the design of the respective survey questionnaires and the diaries, problems were still identified during the analysis of the data. Some examples are provided below:
Gifts, Remittances & Donations
Collecting information on the following:
- the receipt and provision of gifts
- the receipt and provision of remittances
- the provision of donations to the church, other communities and family occasions
is a very difficult task in a HIES. The extent of these activities in Tuvalu is very high, so every effort should be made to address these activities as well as possible. A key problem lies in identifying the best form (questionnaire or diary) for covering such activities. A general rule of thumb for a HIES is that if the activity occurs on a regular basis, and involves the exchange of small monetary amounts or in-kind gifts, the diary is more appropriate. On the other hand, if the activity is less frequent, and involves larger sums of money, the questionnaire with a recall approach is preferred. It is not always easy to distinguish between the two for the different activities, and as such, both the diary and questionnaire were used to collect this information. Unfortunately it probably wasn't made clear enough what types of transactions were being collected from the different sources, and as such some transactions might have been missed, and others counted twice. The effects of these problems are hopefully minimal overall.
Defining Remittances
Because people have different interpretations of what constitutes remittances, the questionnaire needs to be very clear as to how this concept is defined in the survey. Unfortunately this wasn't explained clearly enough, so it was difficult to distinguish between a remittance, which should be of a more regular nature, and a one-off monetary gift which was transferred between two households.
Business Expenses Still Recorded
The aim of the survey is to measure "household" expenditure, and as such, any expenditure made by a household for an item or service which was primarily used for a business activity should be excluded. It was not always clear in the questionnaire that this was the case, and as such some business expenses were included. Efforts were made during data cleaning to remove any such business expenses which would impact significantly on survey results.
Purchased goods given away as a gift
When a household makes a gift donation of an item it has purchased, this is recorded in section 5 of the diary. Unfortunately it was difficult to know how to treat these items, as it was not clear whether the item had already been recorded in section 1 of the diary, which covers purchases. The decision was made to exclude all information on gifts given which were considered to be purchases, as these items were assumed to have already been recorded in section 1. Ideally these items should be treated as a purchased gift given away, which in turn is not household consumption expenditure, but this was not possible.
Some key items missed in the Questionnaire
Although not a big issue, some key expenditure items were omitted from the questionnaire when it would have been best to collect them via this schedule. A key example is electric fans, which many households in Tuvalu own.
Consistency of the data:
- each questionnaire was checked by the supervisor during and after the collection
- before data entry, all the questionnaires were coded
- the CSPro data entry system included inconsistency checks, which allowed the NSO staff to identify some errors and correct them by imputation based on their own knowledge (there was no time for double entry; there were 4 data entry operators)
- after data entry, outliers were identified in order to check their consistency
All data entry, including editing, edit checks and queries, was done using CSPro (Census Survey Processing System) with additional data editing and cleaning taking place in Excel.
The staff from the CSD was responsible for undertaking the coding and data entry, with assistance from an additional four temporary staff to help produce results in a more timely manner.
Although enumeration wasn't completed until mid-June, the coding and data entry commenced as soon as forms were available from Funafuti, which was towards the end of March. The coding and data entry were then completed around the middle of July.
A visit from an SPC consultant then took place to undertake initial cleaning of the data, primarily addressing missing data items and missing schedules. Once the initial data cleaning was undertaken in CSPro, data was transferred to Excel where it was closely scrutinized to check that all responses were sensible. In the cases where unusual values were identified, original forms were consulted for these households and modifications made to the data if required.
Despite the best efforts being made to clean the data file in preparation for the analysis, no doubt errors will still exist in the data, due to its size and complexity. Having said this, they are not expected to have significant impacts on the survey results.
Under-Reporting and Incorrect Reporting as a result of Poor Field Work Procedures
The most crucial stage of any survey activity, whether it be a population census or a survey such as a HIES, is the fieldwork. It is crucial for intense checking to take place in the field before survey forms are returned to the office for data processing. Unfortunately, it became evident during the cleaning of the data that fieldwork wasn't checked as thoroughly as required, and as such some unexpected values appeared in the questionnaires, as well as unusual results appearing in the diaries. Efforts were made to identify the main issues which would have the greatest impact on final results, and this information was modified, using local knowledge, to a more reasonable answer when required.
Data Entry Errors
Data entry errors are always expected, but can be kept to a minimum with
Subtitle I of the Resource Conservation and Recovery Act, as amended by the Hazardous Waste Disposal Act of 1984, brought underground storage tanks (USTs) under federal regulation. EPA implements the underground storage tank (UST) program in Indian country, providing support to tribal governments to prevent and clean up petroleum releases from USTs. The UST program in Indian country includes marketers and nonretail facilities that have USTs. Marketers include retail facilities such as gas stations and convenience stores that sell petroleum products. Non-retail facilities include those that do not sell petroleum products, but may rely on their own supply of gasoline or diesel for taxis, buses, limousines, trucks, vans, boats, heavy equipment, or a wide range of other vehicles. Of the more than 560 federally recognized tribes about 200 have federally-regulated underground storage tanks on their lands. Of those 200 tribes, over half have 10 or fewer active underground storage tanks. About 20 tribes have 30 or more underground storage tanks. Data on sites managed by this program is assembled by the EPA Regional Offices and varies from region to region in scope and content. Not all regions include Indian Nations. Publicly available data is limited to Excel spreadsheets, but regional contacts are also available to answer questions about the data. Data is updated in May and November of each year.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This dataset contains Government of Canada tender information published according to the Financial Administration Act. It includes data for all Schedule I, Schedule II and Schedule III departments, agencies, Crown corporations, and other entities (unless specifically exempt) who must comply with the Government of Canada trade agreement obligations. CanadaBuys is the authoritative source of this information. Visit the How procurement works page on the CanadaBuys website to learn more.
All data files in this collection share a common column structure, and the procurement category field (labelled as “procurementCategory-categorieApprovisionnement”) can be used to filter by the following four major categories of tenders:
- Tenders for construction, which will have a value of “CNST”
- Tenders for goods, which will have a value of “GD”
- Tenders for services, which will have a value of “SRV”
- Tenders for services related to goods, which will have a value of “SRVTGD”
A tender may be associated with one or more of the above procurement categories.
Note: Some records contain long tender description values that may cause issues when viewed in certain spreadsheet programs, such as Microsoft Excel. When the information doesn’t fit within the cell’s character limit, the program will insert extra rows that don’t conform to the expected column formatting. (All other records will still be displayed properly, in their own rows.) To quickly remove the “spill-over data” caused by this display error in Excel, select the publication date field (labelled as “publicationDate-datePublication”), then click the Filter button on the Data menu ribbon. You can then use the filter pull-down list to remove any blank or non-date values from this field, which will hide the rows that only contain “spill-over” description information.
The following list describes the resources associated with this CanadaBuys tender notices dataset.
Additional information on Government of Canada tenders can also be found on the Tender notices tab of the CanadaBuys tender opportunities page. NOTE: While the CanadaBuys online portal includes tender opportunities from across multiple levels of government, the data files in this related dataset only include notices from federal government organizations.
(1) CanadaBuys data dictionary: This XML file offers descriptions of each data field in the tender notices files linked below, as well as other procurement-related datasets CanadaBuys produces. Use this as a guide for understanding the data elements in these files. This dictionary is updated as needed to reflect changes to the data elements.
(2) New tender notices: This file contains up-to-date information on all new tender notices that are published to CanadaBuys throughout a given day. The file is updated every two hours, from 6:15 am until 10:15 pm (UTC-0500), to include new tenders as they are published. All tenders in this file will have a publication date matching the current day (displayed in the field labelled “publicationDate-datePublication”), or the day prior for systems that feed into this file on a nightly basis.
(3) Open tender notices: This file contains up-to-date information on all tender notices that are open for bidding on CanadaBuys, including any amendments made to these tender notices during their lifecycles. The file is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500), to include newly published open tenders. All tenders in this file will have a status of open (displayed in the field labelled “tenderStatus-tenderStatut-eng”).
(4) All CanadaBuys tender notices, 2022-08-08 onwards: This file contains up-to-date information on all tender notices published through CanadaBuys. This includes any tender notices that were open for bids on or after August 8, 2022, when CanadaBuys launched as the system of record for all Tender Notices for the Government of Canada. This file includes any amendments made to these tender notices during their lifecycles. It is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500), to include any updates or amendments, as needed. Tender notices in this file can have any publication date on or after August 8, 2022 (displayed in the field labelled “publicationDate-datePublication”), and can have a status of open, cancelled or expired (displayed in the field labelled “tenderStatus-tenderStatut-eng”).
(5) Legacy tender notices, 2009 to 2022-08 (prior to CanadaBuys): This file contains details of the tender notices that were launched prior to the implementation of CanadaBuys, which became the system of record for all tender notices for the Government of Canada on August 8, 2022. This data file is refreshed monthly. The more than 70,000 tenders in this file have publication dates of August 5, 2022 or earlier (displayed in the field labelled “publicationDate-datePublication”) and have a status of cancelled or expired (displayed in the field labelled “tenderStatus-tenderStatut-eng”). Note: Procurement data was structured differently in the legacy applications previously used to administer Government of Canada tender notices. Efforts have been made to manipulate these historical records into the structure used by the CanadaBuys data files, to make them easier to analyse and compare with new records. This process is not perfect, since simple one-to-one mappings can’t be made in many cases. You can access these historical records in their original format as part of the archived copy of the original tender notices dataset. You can also refer to the supporting documentation for understanding the new CanadaBuys tender and award notices datasets.
(6) Tender notices, YYYY-YYYY: These files contain information on all tender notices published in the specified fiscal year that are no longer open to bidding. The current fiscal year's file is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500), to include any updates or amendments, as needed. The files associated with past fiscal years are refreshed monthly. Tender notices in these files can have any publication date between April 1 of a given year and March 31 of the subsequent year (displayed in the field labelled “publicationDate-datePublication”) and can have a status of cancelled or expired (displayed in the field labelled “tenderStatus-tenderStatut-eng”). New records are added to these files once related tenders reach their close date, or are cancelled. Note: New tender notice data files will be added on April 1 for each fiscal year.
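For readers working with these files in code rather than Excel, the two filtering steps described for this dataset (hiding rows without a valid publication date, and selecting a procurement category) might look like the sketch below. The two field labels come from the dataset description. Everything else is an assumption made for illustration: the ISO `YYYY-MM-DD` date format, the way multiple categories are separated within a cell, the `description` column, the function names and the sample rows.

```python
import csv
import io
import re
from datetime import date

CATEGORY_FIELD = "procurementCategory-categorieApprovisionnement"
DATE_FIELD = "publicationDate-datePublication"

def has_valid_date(row):
    """True when the publication date parses as an ISO date (assumed format)."""
    value = (row.get(DATE_FIELD) or "").strip()
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True

def categories(row):
    """Split the category cell into codes, whatever separator the file uses."""
    return set(re.findall(r"[A-Z]+", row.get(CATEGORY_FIELD) or ""))

def load_tenders(text, category=None):
    """Drop rows without a valid date, then optionally keep one category."""
    rows = [r for r in csv.DictReader(io.StringIO(text)) if has_valid_date(r)]
    if category is not None:
        rows = [r for r in rows if category in categories(r)]
    return rows

# Made-up rows; the third imitates a "spill-over" row with no date.
sample_csv = (
    "description,publicationDate-datePublication,"
    "procurementCategory-categorieApprovisionnement\n"
    "Bridge repair,2023-05-01,CNST\n"
    "Printer lease,2023-05-02,GD;SRVTGD\n"
    "rest of a long description,,\n"
    "Translation,2023-05-03,SRV\n"
)
all_rows = load_tenders(sample_csv)
services = load_tenders(sample_csv, category="SRV")
print(len(all_rows), [r["description"] for r in services])
```

Matching on whole codes rather than substrings matters here because “SRV” is contained in “SRVTGD”; a plain substring test would return services-related-to-goods tenders when only services were requested. The date check mirrors the Excel filter step and is mainly relevant for files that have already passed through a spreadsheet program.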
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This dataset contains information on all Government of Canada award notices published according to the Financial Administration Act. It includes data for all Schedule I, Schedule II and Schedule III departments, agencies, Crown corporations, and other entities (unless specifically exempt) that must comply with the Government of Canada trade agreement obligations. CanadaBuys is the authoritative source of this information. Visit the How procurement works page on CanadaBuys to learn more. All data files in this collection share a common column structure, and the procurement category field (labelled as “procurementCategory-categorieApprovisionnement”) can be used to filter by the following four major categories of awards: awards for construction, which will have a value of “CNST”; awards for goods, which will have a value of “GD”; awards for services, which will have a value of “SRV”; and awards for services related to goods, which will have a value of “SRVTGD”. Some award notices may be associated with more than one of the above procurement categories. Note: Some records contain long award description values that may cause issues when viewed in certain spreadsheet programs, such as Microsoft Excel. When the information doesn’t fit within the cell’s character limit, the program will insert extra rows that don’t conform to the expected column formatting. (All other records will still be displayed properly, in their own rows.) To quickly remove the “spill-over data” caused by this display error in Excel, select the publication date field (labelled as “publicationDate-datePublication”), then click the Filter button on the Data menu ribbon. You can then use the filter pull-down list to remove any blank or non-date values from this field, which will hide the rows that only contain “spill-over” description information. The following list describes the resources associated with this CanadaBuys award notices dataset.
Additional information on Government of Canada award notices can be found on the Award notices tab of the CanadaBuys Tender opportunities page. NOTE: While the CanadaBuys online portal includes award notices from across multiple levels of government, the data files in this related dataset only include notices from federal government organizations. (1) CanadaBuys data dictionary: This XML file offers descriptions of each data field in the award notices files linked below, as well as other procurement-related datasets CanadaBuys produces. Use this as a guide for understanding the data elements in these files. This dictionary is updated as needed to reflect changes to the data elements. (2) All CanadaBuys award notices, 2022-08-08 onward: This file contains up-to-date information on all award notices published on CanadaBuys. This includes any award notices that were published on or after August 8, 2022, when CanadaBuys became the system of record for all tender and award notices for the Government of Canada. This file includes any amendments made to these award notices during their lifecycles. It is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500) to include any updates or amendments, as needed. Award notices in this file can have any publication date on or after August 8, 2022 (displayed in the field labelled “publicationDate-datePublication”), and can have a status of active, cancelled or expired (displayed in the field labelled “awardStatus-attributionStatut-eng”). (3) Legacy award notices, 2012 to 2022-08 (prior to CanadaBuys): This file contains details of the award notices published prior to the implementation of CanadaBuys, which became the system of record for all tender and award notices for the Government of Canada on August 8, 2022. This data file is refreshed monthly.
The more than 100,000 awards in this file have publication dates on or before August 6, 2022 (displayed in the field labelled “publicationDate-datePublication”), and have a status of active, cancelled or expired (displayed in the field labelled “awardStatus-attributionStatut-eng”). Note: Procurement data was structured differently in the legacy applications previously used to administer Government of Canada contracts. Efforts have been made to convert these historical records into the structure used by the CanadaBuys data files, to make them easier to analyse and compare with new records. This process is not perfect since simple one-to-one mappings can’t be made in many cases. You can access these historical records in their original format as part of the archived copy of the original tender notices dataset, which contained awards-related data files. You can also refer to the supporting documentation for understanding the new CanadaBuys tender and award notices datasets. (4) Award notices, YYYY-YYYY: These files contain information on all contracts awarded in the specified fiscal year. The current fiscal year's file is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500) to include any updates or amendments, as needed. The files associated with past fiscal years are updated monthly. Awards in these files can have any publication date between April 1 of a given year and March 31 of the subsequent year (displayed in the field labelled “publicationDate-datePublication”) and can have an award status of active, cancelled or expired (displayed in the field labelled “awardStatus-attributionStatut-eng”). Note: New award notice data files will be added on April 1 for each fiscal year.
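For readers working outside Excel, the category filter and the “spill-over” clean-up described above could be approximated in Python along the following lines. This is a minimal sketch under several assumptions: the sample rows are invented, the publication date is assumed to be in YYYY-MM-DD form, and the way multiple category codes are separated within one cell is not stated in the description, so the code simply extracts every run of capital letters. Only the two field labels come from the dataset description; the helper names are hypothetical.

```python
import csv
import io
import re
from datetime import datetime

# Field labels quoted in the dataset description.
CATEGORY_FIELD = "procurementCategory-categorieApprovisionnement"
DATE_FIELD = "publicationDate-datePublication"

# Invented sample data. The third data line imitates a "spill-over" row
# that holds description text where a publication date is expected.
SAMPLE = """publicationDate-datePublication,procurementCategory-categorieApprovisionnement
2023-05-01,SRV
2023-05-02,"CNST, SRVTGD"
continued description text,
2023-05-03,GD
"""


def is_date(value, fmt="%Y-%m-%d"):
    """Return True if value parses as a date in the assumed format."""
    try:
        datetime.strptime(value.strip(), fmt)
        return True
    except ValueError:
        return False


def has_category(row, code):
    """Return True if the row lists the given category code.

    Whole codes are compared, so "SRV" does not match "SRVTGD".
    """
    codes = re.findall(r"[A-Z]+", row.get(CATEGORY_FIELD) or "")
    return code in codes


def load_valid_rows(text):
    """Keep only rows whose publication date field holds a date."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if is_date(row.get(DATE_FIELD) or "")]


rows = load_valid_rows(SAMPLE)
services = [row for row in rows if has_category(row, "SRV")]
print(len(rows), len(services))  # 3 1
```

Dropping rows with a blank or non-date publication date mirrors the Excel filter step; a CSV parser that honours quoted fields may not produce such rows in the first place, so the check mainly matters for files that have already been re-saved from a spreadsheet.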
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.