18 datasets found
  1. Data Cleaning Sample

    • borealisdata.ca
    Updated Jul 13, 2023
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  2. Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop

    • search.dataone.org
    • borealisdata.ca
    Updated Jul 31, 2024
    Cite
    Costanzo, Lucia; Jadon, Vivek (2024). Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop [Dataset]. http://doi.org/10.5683/SP3/FF6AI9
    Explore at:
    Dataset updated
    Jul 31, 2024
    Dataset provided by
    Borealis
    Authors
    Costanzo, Lucia; Jadon, Vivek
    Description

    Ahoy, data enthusiasts! Join us for a hands-on workshop where you will hoist your sails and navigate through the Statistics Canada website, uncovering hidden treasures in the form of data tables. With the wind at your back, you’ll master the art of downloading these invaluable Stats Can datasets while braving the occasional squall of data cleaning challenges using Excel with your trusty captains Vivek and Lucia at the helm.

  3. Household Income and Expenditure 2010 - Tuvalu

    • catalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    Central Statistics Division (2019). Household Income and Expenditure 2010 - Tuvalu [Dataset]. http://catalog.ihsn.org/catalog/3203
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Central Statistics Division
    Time period covered
    2010
    Area covered
    Tuvalu
    Description

    Abstract

    The main objectives of the survey were:
    • To obtain weights for the revision of the Consumer Price Index (CPI) for Funafuti;
    • To provide information on the nature and distribution of household income, expenditure and food consumption patterns;
    • To provide data on the household sector's contribution to the National Accounts;
    • To provide information on the economic activity of men and women to study gender issues;
    • To undertake some poverty analysis.

    Geographic coverage

    National, including Funafuti and Outer islands

    Analysis unit

    • Household
    • individual

    Universe

    All private households are included in the sampling frame. In each selected household, the current residents are surveyed, along with usual residents who are currently away (for work, health or holiday reasons, or as boarding students, for example). If the household had been residing in Tuvalu for less than one year:
    • but intends to reside for more than 12 months => the household is included
    • and does not intend to reside for more than 12 months => out of scope

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    It was decided that a 33% (one third) sample was sufficient to achieve suitable levels of accuracy for key estimates in the survey. The sample selection was therefore spread proportionally across all the islands except Niulakita, which was considered too small. For selection purposes, each island was treated as a separate stratum and independent samples were selected from each. The strategy used was to list each dwelling on the island by its geographical position and run a systematic skip through the list to achieve the 33% sample. This approach ensured that the sample would be spread out across each island as much as possible and thus be more representative.

    For details please refer to Table 1.1 of the Report.
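    The selection procedure described above can be sketched as follows. This is a minimal illustration, not the survey team's actual procedure: the dwelling identifiers, the "Outer island A" stratum and the function names are hypothetical, and a fixed starting position is assumed where a real survey would normally draw a random start.

```python
def systematic_sample(dwellings, fraction=1 / 3, start=0):
    """Select every k-th dwelling from an ordered list, with k = round(1 / fraction)."""
    skip = round(1 / fraction)  # a one-third sample gives a skip of 3
    return dwellings[start::skip]


def stratified_systematic_sample(islands, fraction=1 / 3):
    """Treat each island as its own stratum and sample each one independently."""
    return {name: systematic_sample(listing, fraction) for name, listing in islands.items()}


# Hypothetical dwelling listings, assumed to be ordered by geographical position.
islands = {
    "Funafuti": [f"F{i:03d}" for i in range(12)],
    "Outer island A": [f"A{i:03d}" for i in range(7)],
}

sample = stratified_systematic_sample(islands)
for island, selected in sample.items():
    print(island, selected)
```

    Because the skip walks through a geographically ordered list, the selected dwellings are spread across the whole island rather than clustered in one area.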

    Sampling deviation

    Only the island of Niulakita was not included in the sampling frame, considered too small.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    There were three main survey forms used to collect data for the survey. Each question is written in English and translated into Tuvaluan on the same version of the questionnaire. The questionnaires were designed based on the 2004 survey questionnaire.

    HOUSEHOLD FORM
    • composition of the household and demographic profile of each member
    • dwelling information
    • dwelling expenditure
    • transport expenditure
    • education expenditure
    • health expenditure
    • land and property expenditure
    • household furnishing
    • home appliances
    • cultural and social payments
    • holidays/travel costs
    • loans and savings
    • clothing
    • other major expenditure items

    INDIVIDUAL FORM
    • health and education
    • labor force (individuals aged 15 and above)
    • employment activity and income (individuals aged 15 and above): wages and salaries, working own business, agriculture and livestock, fishing, income from handicraft, income from gambling, small scale activities, jobs in the last 12 months, other income, children's income, tobacco and alcohol use, other activities, and seafarer

    DIARY (one diary per week over a 2-week period; 2 diaries per household were required)
    • All kinds of expenses
    • Home production: food and drink (eaten by the household, given away, sold)
    • Goods taken from own business (consumed, given away)
    • Monetary gifts (given away, received, winnings from gambling)
    • Non-monetary gifts (given away, received, winnings from gambling)

    Questionnaire Design Flaws

    Questionnaire design flaws are problems with the way questions were worded that lead the respondent to provide an incorrect answer. Despite every effort to minimize this problem during the design of the respective survey questionnaires and the diaries, problems were still identified during the analysis of the data. Some examples are provided below:

    Gifts, Remittances & Donations

    Collecting information on the following:
    • the receipt and provision of gifts
    • the receipt and provision of remittances
    • the provision of donations to the church, other communities and family occasions
    is a very difficult task in a HIES. The extent of these activities in Tuvalu is very high, so every effort should be made to address these activities as best as possible. A key problem lies in identifying the best form (questionnaire or diary) for covering such activities. A general rule of thumb for a HIES is that if the activity occurs on a regular basis, and involves the exchange of small monetary amounts or in-kind gifts, the diary is more appropriate. On the other hand, if the activity is less frequent, and involves larger sums of money, the questionnaire with a recall approach is preferred. It is not always easy to distinguish between the two for the different activities, and as such, both the diary and questionnaire were used to collect this information. Unfortunately it probably wasn't made clear enough what types of transactions were being collected from the different sources, and as such some transactions might have been missed, and others counted twice. The effects of these problems are hopefully minimal overall.

    Defining Remittances

    Because people have different interpretations of what constitutes remittances, the questionnaire needs to be very clear as to how this concept is defined in the survey. Unfortunately this wasn't explained clearly enough, so it was difficult to distinguish between a remittance, which should be of a more regular nature, and a one-off monetary gift which was transferred between two households.

    Business Expenses Still Recorded

    The aim of the survey is to measure "household" expenditure, and as such, any expenditure made by a household for an item or service which was primarily used for a business activity should be excluded. It was not always clear in the questionnaire that this was the case, and as such some business expenses were included. Efforts were made during data cleaning to remove any such business expenses which would impact significantly on survey results.

    Purchased goods given away as a gift

    When a household makes a gift donation of an item it has purchased, this is recorded in section 5 of the diary. Unfortunately it was difficult to know how to treat these items, as it was not clear whether the item had already been recorded in section 1 of the diary, which covers purchases. The decision was made to exclude all information on gifts given which were considered to be purchases, as these items were assumed to have already been recorded in section 1. Ideally these items should be treated as a purchased gift given away, which in turn is not household consumption expenditure, but this was not possible.

    Some key items missed in the Questionnaire

    Although not a big issue, some key expenditure items were omitted from the questionnaire when it would have been best to collect them via this schedule. A key example is electric fans, which many households in Tuvalu own.

    Cleaning operations

    Consistency of the data:
    • each questionnaire was checked by the supervisor during and after the collection
    • before data entry, all the questionnaires were coded
    • the CSPro data entry system included inconsistency checks which allowed the NSO staff to spot some errors and to correct them with imputed estimates based on their own knowledge (there was no time for double entry; 4 data entry operators were used)
    • after data entry, outliers were identified in order to check their consistency.

    All data entry, including editing, edit checks and queries, was done using CSPro (Census Survey Processing System) with additional data editing and cleaning taking place in Excel.

    The staff from the CSD was responsible for undertaking the coding and data entry, with assistance from an additional four temporary staff to help produce results in a more timely manner.

    Although enumeration wasn't completed until mid June, the coding and data entry commenced as soon as forms were available from Funafuti, which was towards the end of March. The coding and data entry was then completed around the middle of July.

    A visit from an SPC consultant then took place to undertake initial cleaning of the data, primarily addressing missing data items and missing schedules. Once the initial data cleaning was undertaken in CSPro, data was transferred to Excel where it was closely scrutinized to check that all responses were sensible. In the cases where unusual values were identified, original forms were consulted for these households and modifications made to the data if required.

    Despite the best efforts being made to clean the data file in preparation for the analysis, no doubt errors will still exist in the data, due to its size and complexity. Having said this, they are not expected to have significant impacts on the survey results.

    Under-Reporting and Incorrect Reporting as a result of Poor Field Work Procedures

    The most crucial stage of any survey activity, whether it be a population census or a survey such as a HIES, is the fieldwork. It is crucial for intense checking to take place in the field before survey forms are returned to the office for data processing. Unfortunately, it became evident during the cleaning of the data that fieldwork wasn't checked as thoroughly as required, and as such some unexpected values appeared in the questionnaires, as well as unusual results appearing in the diaries. Efforts were made to identify the main issues which would have the greatest impact on final results, and this information was modified, using local knowledge, to a more reasonable answer when required.

    Data Entry Errors Data entry errors are always expected, but can be kept to a minimum with

  4. Coffee Sales Excel Project

    • kaggle.com
    Updated Nov 13, 2024
    Cite
    Nuha Zahidi (2024). Coffee Sales Excel Project [Dataset]. https://www.kaggle.com/datasets/nuhazahidi/coffee-sales-excel-project
    Dataset updated
    Nov 13, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nuha Zahidi
    Description

    Tool: Microsoft Excel

    Dataset: Coffee Sales

    Process:

    1. Data Cleaning:
      • Remove duplicates and blanks.
      • Standardize date and currency formats.

    2. Data Manipulation:
      • Use sorting and filtering to work with subsets of interest.
      • Use XLOOKUP, INDEX-MATCH and IF formulas for efficient data manipulation, such as retrieving, matching and organising information in spreadsheets.

    3. Data Analysis:
      • Create formatted Pivot Tables and Pivot Charts to visualize trends.

    4. Dashboard Development:
      • Insert formatted Slicers for easy filtering and dynamic updates.
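    The project itself was done in Microsoft Excel. For readers who prefer a script, the data-cleaning step (removing duplicates and blanks, standardizing date and currency formats) might look like the sketch below. The column layout, sample rows, accepted date formats and function names are all assumptions made for illustration; they are not taken from the Coffee Sales dataset.

```python
from datetime import datetime

# Assumed input date formats; a real workbook may use others.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d")


def parse_date(text):
    """Return the date as an ISO string (YYYY-MM-DD), trying each known format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def parse_currency(text):
    """Strip currency symbols and thousands separators, returning a float."""
    return round(float(text.replace("$", "").replace(",", "").strip()), 2)


def clean_rows(rows):
    """Drop rows with blank fields, standardize formats, then drop exact duplicates."""
    seen, cleaned = set(), []
    for order_id, date_text, amount_text in rows:
        if not order_id or not date_text or not amount_text:
            continue  # blank cell: skip the row
        record = (order_id.strip(), parse_date(date_text), parse_currency(amount_text))
        if record in seen:
            continue  # duplicate once formats are standardized
        seen.add(record)
        cleaned.append(record)
    return cleaned


# Hypothetical rows: (order id, order date, sale amount)
rows = [
    ("A-001", "2019-09-05", "$19.90"),
    ("A-001", "05/09/2019", "19.9"),      # same sale, different formatting
    ("", "2019-09-06", "$5.00"),          # blank order id
    ("A-002", "2019.09.06", "1,250.00"),
]
cleaned = clean_rows(rows)
print(cleaned)
```

    Standardizing before de-duplicating matters here: the first two rows only compare equal once their dates and amounts share one format.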

    Highlights: This project aims to understand coffee sales trends by country, roast type, and year, which could help identify marketing opportunities and customer segments.

  5. RAAAP-2 Datasets (17 linked datasets)

    • figshare.com
    bin
    Updated May 30, 2023
    Cite
    Simon Kerridge; Patrice Ajai-Ajagbe; Cindy Kiel; Jennifer Shambrook; BRYONY WAKEFIELD (2023). RAAAP-2 Datasets (17 linked datasets) [Dataset]. http://doi.org/10.6084/m9.figshare.18972935.v2
    Explore at:
    Available download formats: bin
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Simon Kerridge; Patrice Ajai-Ajagbe; Cindy Kiel; Jennifer Shambrook; BRYONY WAKEFIELD
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This collection contains the 17 anonymised datasets from the RAAAP-2 international survey of research management and administration professionals undertaken in 2019. To preserve anonymity the data are presented in 17 datasets linked only by AnalysisRegionofEmployment, as many of the textual responses, even though redacted to remove institutional affiliation, could be used to identify some individuals if linked to the other data. Each dataset is presented in the original SPSS format, suitable for further analyses, as well as an Excel equivalent for ease of viewing. There are additional files in this collection showing the questionnaire and the mappings to the datasets, together with the SPSS scripts used to produce the datasets. These data follow on from, but are not directly linked to, the first RAAAP survey undertaken in 2016, data from which can also be found in FigShare.

    Errata (16/5/23): an error in v13 of the main Data Cleansing syntax file (now updated to v14) meant that two variables were missing their value labels (the underlying codes were correct). The Main Dataset (SPSS & Excel) has been updated with a new version.

  6. Data from: Survey data from the Australian Marine Debris Initiative

    • research.usc.edu.au
    • researchdata.edu.au
    csv
    + more versions
    Cite
    Heidi Tait; Jodi Jones; Caitlin Smith; Kathy Townsend, Survey data from the Australian Marine Debris Initiative [Dataset]. https://research.usc.edu.au/esploro/outputs/dataset/Survey-data-from-the-Australian-Marine/991016398702621
    Explore at:
    Available download formats: csv (7054018 bytes)
    Dataset provided by
    University of the Sunshine Coast
    Authors
    Heidi Tait; Jodi Jones; Caitlin Smith; Kathy Townsend
    Time period covered
    2024
    Area covered
    Australia
    Description

    Survey data from the Australian Marine Debris Initiative and the result of spatial analysis from multiple creative commons datasets. Data consists of:
    • Spatial Data Queensland Coastline – Event summaries within an Excel data table and shapefile
    • All years
    • Number of Items removed, Weight volunteers, Volume, Distance, Latitude and Longitude.
    • Contributing organisation files table/ sites
    • Environmental, physical and biological variables associated with the closest catchment to each debris survey.

    TBF has made all reasonable efforts to ensure that the information in the Custom Dataset is accurate. TBF will not be held responsible:
    • for the way these data are used by the Entity for their Reports;
    • for any errors that may be contained in the Custom Dataset; or
    • any direct or indirect damage the use of the Custom Dataset may cause.

    Data collected by TBF comes from citizen science initiatives and is taken at face value from contributors with each entry being vetted and periodic checks being made to maintain the integrity of the overall dataset. Some clean-up data has been extrapolated by data collectors. Some weight and distance details have not been provided by contributors.

    The data was collected by various organisations and individuals in clean-up events at their chosen locations where man-made items greater than 5mm were removed from the beach, and sorted, counted and recorded on data sheets, using CyberTracker software devices or the AMDI mobile application. Items were identified according to the method laid out in the TBF Marine Debris Identification Manual in which items are grouped according to their material categories (the manual is available on the TBF website). The length of beach cleaned is at the discretion of the clean-up group and the total weight of items removed is either weighed with handheld scales or estimated.

  7. Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race,...

    • openicpsr.org
    • doi.org
    • +1more
    Updated Aug 16, 2018
    Cite
    Jacob Kaplan (2018). Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1980-2016 [Dataset]. http://doi.org/10.3886/E102263V5
    Explore at:
    Dataset updated
    Aug 16, 2018
    Dataset provided by
    University of Pennsylvania
    Authors
    Jacob Kaplan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    1980 - 2016
    Area covered
    United States
    Description
    Version 5 release notes:
    • Removes support for SPSS and Excel data.
    • Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
    • Adds in agencies that report 0 months of the year.
    • Adds a column that indicates the number of months reported. This is generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime.
    • Removes data on runaways.
    Version 4 release notes:
    • Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
    Version 3 release notes:
    • Add data for 2016.
    • Order rows by year (descending) and ORI.
    Version 2 release notes:
    • Fix bug where Philadelphia Police Department had incorrect FIPS county code.

    The Arrests by Age, Sex, and Race data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2015 into a single file. These files are quite large and may take some time to load.

    All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.

    I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possibly incorrect) assumption that these values represent zero crimes reported. The original data does not have a value when the agency reports zero arrests other than "None/not reported." In other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.
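    The two recoding rules in this paragraph can be illustrated with a short sketch. The author's cleaning was done in R (see the linked repository); the Python below is only an assumed equivalent, with a hypothetical function name and with None standing in for NA.

```python
# Arrest counts the description treats as reporting errors.
IMPLAUSIBLE_COUNTS = {
    10000, 20000, 30000, 40000, 50000, 60000,
    70000, 80000, 90000, 100000, 99999, 99998,
}


def clean_arrest_value(raw):
    """Recode one raw arrest cell according to the two rules in the description."""
    if raw == "None/not reported":
        return 0  # assumed to mean zero arrests, which may be wrong
    value = int(raw)
    if value in IMPLAUSIBLE_COUNTS:
        return None  # None stands in for R's NA
    return value


raw_column = ["12", "None/not reported", "20000", "0", "99998"]
cleaned_column = [clean_arrest_value(cell) for cell in raw_column]
print(cleaned_column)
```

    Note that after this recoding a zero can no longer be told apart from a value that was originally "None/not reported", which is the limitation the description warns about.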

    To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units such that every row indicates an agency in a given year. Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrested for murder. The ASR data breaks down arrests by age and gender (e.g. Male aged 15, Male aged 18). They also provide the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g. referred to adult court, referred to welfare agency), I do not include these columns.

    To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros and if you open it in Excel it will automatically delete those leading zeros.
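    One way to avoid losing the leading zeros is to read the FIPS columns as text rather than numbers. The sketch below uses Python's csv module, which returns every field as a string; the column names and sample rows are hypothetical, and restore_fips is an illustrative helper for codes that have already been truncated.

```python
import csv
import io

# Hypothetical extract; real files have many more columns.
raw = (
    "ori,fips_state_code,fips_county_code\n"
    "AGENCY001,01,001\n"
    "AGENCY002,42,101\n"
)

# csv.DictReader keeps every field as text, so "01" is not turned into 1.
rows = list(csv.DictReader(io.StringIO(raw)))


def restore_fips(code, width):
    """Left-pad a FIPS code with zeros (2 digits for state, 3 for county)."""
    return str(code).zfill(width)


for row in rows:
    print(row["ori"], row["fips_state_code"], row["fips_county_code"])
```

    If the codes have already been opened and saved as numbers, restore_fips(1, 2) gives "01" and restore_fips(1, 3) gives "001".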

    I created 9 arrest categories myself. The categories are:
    • Total Male Juvenile
    • Total Female Juvenile
    • Total Male Adult
    • Total Female Adult
    • Total Ma

  8. Market Basket Analysis

    • kaggle.com
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Dataset updated
    Dec 9, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions on the itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's data; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve customer experience and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association rule mining is most often used to find associations between different objects in a set. It works by finding frequent patterns in a transaction database. It can tell you which items customers frequently buy together, which allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a Computer Mouse, 9 bought a Mat for Mouse and 8 bought both.
    • Rule: bought Computer Mouse => bought Mat for Mouse
    • support = P(Mouse & Mat) = 8/100 = 0.08
    • confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
    • lift = confidence/P(Mat for Mouse) = 0.80/0.09 = 8.9
    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
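    The same numbers can be checked with a few lines of code. The sketch below uses the standard definitions for a rule A => B, where confidence divides the joint support by the probability of the antecedent (the mouse) and lift divides confidence by the probability of the consequent (the mat). The function name is illustrative and not part of any library.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent => consequent."""
    support = n_both / n_total                    # P(A and B)
    confidence = n_both / n_antecedent            # P(B | A) = support / P(A)
    lift = confidence / (n_consequent / n_total)  # confidence / P(B)
    return support, confidence, lift


# 100 customers: 10 bought a mouse, 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(round(support, 2), round(confidence, 2), round(lift, 2))  # 0.08 0.8 8.89
```

    A lift well above 1 means the two items are bought together far more often than they would be if the purchases were independent.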

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.


    Libraries in R

    First, we need to load the required libraries. Each library is briefly described below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.


    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.


    Next, we clean the data frame by removing missing values.


    To apply association rule mining, we need to convert the data frame into transaction data so that all items that are bought together in one invoice will be in ...
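    The original write-up does this conversion in R with the arules package. As a rough Python analogue, the sketch below groups item names by BillNo so that each invoice becomes one transaction (basket). The sample rows and the function name are hypothetical, and rows with a missing bill number or item name are dropped, in line with the missing-value removal described earlier.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group (BillNo, Itemname) rows into one sorted item list per bill."""
    baskets = defaultdict(set)
    for bill_no, item in rows:
        if bill_no is None or not item:
            continue  # skip rows with a missing bill number or item name
        baskets[bill_no].add(item.strip())
    return {bill: sorted(items) for bill, items in baskets.items()}


# Hypothetical (BillNo, Itemname) pairs.
rows = [
    (536365, "MOUSE"),
    (536365, "MOUSE MAT"),
    (536366, "KEYBOARD"),
    (536365, "MOUSE"),   # repeated line on the same invoice
    (536367, None),      # missing item name
]
transactions = to_transactions(rows)
print(transactions)
```

    Using a set per bill means an item repeated on one invoice is counted once, which is what basket-style association rule mining expects.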

  9. CompanyData.com (BoldData) — Saudi Arabia Largest B2B Company Database —...

    • datarade.ai
    Updated Apr 21, 2021
    + more versions
    Cite
    CompanyData.com (BoldData) (2021). CompanyData.com (BoldData) — Saudi Arabia Largest B2B Company Database — 781+ Thousands Verified Companies [Dataset]. https://datarade.ai/data-products/list-of-170k-companies-in-saudi-arabia-bolddata
    Explore at:
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Apr 21, 2021
    Dataset authored and provided by
    CompanyData.com (BoldData)
    Area covered
    Saudi Arabia
    Description

    At CompanyData.com (BoldData), we deliver verified, high-quality business data sourced directly from official trade registers around the world. For Saudi Arabia, we provide access to a robust database of over 781,431 registered companies, offering valuable insights into one of the Middle East’s fastest-growing and most influential economies.

    Our Saudi Arabia company database includes detailed firmographic information such as company name, registration number, legal entity type, NACE codes, estimated revenue, employee size, and ownership structures. Where available, we also provide contact data, including names of key decision-makers, job titles, email addresses, and mobile numbers to support your outreach and engagement efforts.

    Whether you are focused on regulatory compliance, KYC and AML checks, B2B lead generation, CRM enrichment, market analysis, or AI training, our Saudi company data offers the accuracy, structure and reliability your organization needs to make informed decisions and drive growth.

    Choose how you access the data:
    • Tailored company lists based on your ideal criteria
    • Full national databases for strategic and analytical projects
    • Real time updates via our API
    • Easy-to-use formats including Excel and CSV
    • Enrichment services to clean, match and enhance your existing records

    With access to 781,431 verified companies in over 200 countries, CompanyData.com (BoldData) combines global scale with local detail. Whether you’re entering the Saudi market or expanding across borders, we help you move forward with trusted data that supports smarter business decisions.

    Rely on CompanyData.com to connect you with accurate, up-to-date company data in Saudi Arabia and beyond — enabling compliance, growth and success at every step.

  10. CompanyData.com (BoldData) — Sweden Largest B2B Company Database — 1.83+...

    • datarade.ai
    Updated Apr 21, 2021
    + more versions
    Cite
    CompanyData.com (BoldData) (2021). CompanyData.com (BoldData) — Sweden Largest B2B Company Database — 1.83+ Millions Verified Companies [Dataset]. https://datarade.ai/data-products/list-of-1-6m-companies-in-sweden-bolddata
    Explore at:
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Apr 21, 2021
    Dataset authored and provided by
    CompanyData.com (BoldData)
    Area covered
    Sweden
    Description

    At CompanyData.com (BoldData), we provide accurate and verified business data sourced directly from official trade registers. For Sweden, we offer access to a comprehensive database of over 1,825,197 registered companies, giving you the insight and confidence needed to navigate one of Europe's most digitally advanced economies.

    Our Swedish company database includes detailed firmographic information, such as company name, registration number (organisationsnummer), legal form, industry classification (SNI/NACE), size, revenue estimates and company hierarchies. Where available, we also deliver contact details including names of key executives, job titles, email addresses and mobile numbers.

    Whether you are focused on KYC and AML compliance, CRM data enrichment, B2B lead generation, sales and marketing, AI training or market research, our Sweden company data is structured, up to date and ready to meet your business needs.

    We offer multiple flexible delivery options:
    • Tailored company lists based on specific filters such as industry, size or region
    • Full national datasets for deeper analysis and segmentation
    • Real-time integration via our API
    • File formats including Excel and CSV for fast deployment
    • Enrichment services to clean and enhance your existing data

    As part of our global network of 1,825,197 verified company records across 200+ countries, we combine local expertise with global reach. Whether you’re entering the Swedish market or scaling internationally, CompanyData.com (BoldData) gives you the data tools to make informed decisions and unlock new growth.

    Choose CompanyData.com for reliable, verified company information in Sweden and beyond. We help you reduce risk, improve targeting and grow with confidence.

  11. e

    Application of green solvents to remove ionomer-containing binder for PEM...

    • b2find.eudat.eu
    Updated Dec 10, 2024
    Cite
    (2024). Application of green solvents to remove ionomer-containing binder for PEM water electrolyzer recycling (RAW data of the Master Thesis) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/7fa2dc49-f3bf-5af7-9f83-46952d932029
    Explore at:
    Dataset updated
    Dec 10, 2024
    Description

    The files contain the raw data of the following Master Thesis:

    Förster, Wenzel. Application of green solvents to remove ionomer-containing binder for PEM water electrolyzer recycling. Master Thesis, TU Bergakademie Freiberg. Date of submission: 2024-12-10.

    The data contains two Excel files and six zip files.

  12. d

    City of Tempe 2023 Business Survey Data

    • catalog.data.gov
    • gimi9.com
    • +3more
    Updated Sep 20, 2024
    Cite
    City of Tempe (2024). City of Tempe 2023 Business Survey Data [Dataset]. https://catalog.data.gov/dataset/city-of-tempe-2023-business-survey-data
    Explore at:
    Dataset updated
    Sep 20, 2024
    Dataset provided by
    City of Tempe
    Area covered
    Tempe
    Description

    These data include the individual responses for the City of Tempe Annual Business Survey conducted by ETC Institute. These data help determine priorities for the community as part of the City's ongoing strategic planning process. Averaged Business Survey results are used as indicators for city performance measures. The performance measures with indicators from the Business Survey include the following (as of 2023):

    1. Financial Stability and Vitality
    5.01 Quality of Business Services

    The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. To better visualize the data, overlapping points were randomly dispersed to remove overlap. These two adjustments ensure that the points are not tied to a specific address, but are still close enough to allow insights about service delivery in different areas of the city.

    Additional Information
    Source: Business Survey
    Contact (author): Adam Samuels
    Contact E-Mail (author): Adam_Samuels@tempe.gov
    Contact (maintainer):
    Contact E-Mail (maintainer):
    Data Source Type: Excel table
    Preparation Method: Data received from vendor after report is completed
    Publish Frequency: Annual
    Publish Method: Manual
    Data Dictionary

    Methods:
    The survey is mailed to a random sample of businesses in the City of Tempe. Follow-up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city.

    Processing and Limitations:
    The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. To better visualize the data, overlapping points were randomly dispersed to remove overlap. These two adjustments ensure that the points are not tied to a specific address, but are still close enough to allow insights about service delivery in different areas of the city. The data are used by the ETC Institute in the final published PDF report.

  13. f

    Data file.

    • figshare.com
    xlsx
    Updated Jul 15, 2025
    Cite
    Marie Gorreti Zalwango; Caroline Kyozira; Mariam Nambuya; Martin Bulamu; Allan Muruta (2025). Data file. [Dataset]. http://doi.org/10.1371/journal.pgph.0003722.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 15, 2025
    Dataset provided by
    PLOS Global Public Health
    Authors
    Marie Gorreti Zalwango; Caroline Kyozira; Mariam Nambuya; Martin Bulamu; Allan Muruta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mortality surveillance aids in identifying and addressing causes of death, allowing health systems to adapt and respond effectively. An assessment of mortality surveillance in Uganda was conducted from November 2023 to June 2024 through data reviews and plenary discussions engaging various stakeholders in Uganda. Eight (8) workshops/meetings were conducted over a period of eight months to generate information on mortality data sources, processes of data generation and challenges affecting the system. Responses from the meetings and workshops were recorded and transcribed. Data were thematically analysed and presented as descriptive narratives. Quantitative data from the District Health Information System version 2 (DHIS2) were analysed using Excel and presented using charts and tables. The rapid assessment of mortality surveillance in Uganda highlighted opportunities for improved mortality surveillance through the existence of various sources of data. It was highlighted that 66.9% of deaths occur in communities; however, there is a major data completeness gap, with suboptimal data from the community fed into the national health statistics database (DHIS2) to enable stakeholder analysis and utilization. Furthermore, a number of data quality issues were identified in the health facility generated data, where 33% of the deaths occur. These include: data completeness, where the national referral specialized health institutes do not feed their data into the national database; late reporting; and the lack of coordination and standardisation of reporting among the various partners. The existence of structures to conduct mortality surveillance in Uganda presents an opportunity for improved mortality surveillance despite the highlighted gaps and challenges. Adoption of strategies aimed at enabling the successful implementation of an efficient mortality surveillance program, such as strengthening governance and operations of death reporting activities, establishing a clear definition of institutional roles and responsibilities, raising awareness and advocacy at all levels, building technical capacities, improving allocation of resources, and leveraging the shared interests of both implementing and development partners, could improve mortality surveillance and the health of the population through utilisation of the generated data.

  14. Crimes - One year prior to present

    • chicago.gov
    • data.cityofchicago.org
    • +2more
    csv, xlsx, xml
    Updated Oct 2, 2025
    Cite
    Chicago Police Department (2025). Crimes - One year prior to present [Dataset]. https://www.chicago.gov/city/en/dataset/crime.html
    Explore at:
    Available download formats: xlsx, xml, csv
    Dataset updated
    Oct 2, 2025
    Dataset authored and provided by
    Chicago Police Department (http://www.chicagopolice.org/)
    Description

    This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that have occurred in the City of Chicago over the past year, minus the most recent seven days of data. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org. Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited.

    The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use. Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Wordpad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://bit.ly/rk5Tpc.

  15. u

    CanadaBuys award notices - Catalogue - Canadian Urban Data Catalogue (CUDC)

    • data.urbandatacentre.ca
    Updated Oct 22, 2024
    Cite
    (2024). CanadaBuys award notices - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-a1acb126-9ce8-40a9-b889-5da2b1dd20cb
    Explore at:
    Dataset updated
    Oct 22, 2024
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    This dataset contains information on all Government of Canada award notices published according to the Financial Administration Act. It includes data for all Schedule I, Schedule II and Schedule III departments, agencies, Crown corporations, and other entities (unless specifically exempt) who must comply with the Government of Canada trade agreement obligations. CanadaBuys is the authoritative source of this information. Visit the How procurement works page on CanadaBuys to learn more.

    All data files in this collection share a common column structure, and the procurement category field (labelled as “procurementCategory-categorieApprovisionnement”) can be used to filter by the following four major categories of awards:
    • Awards for construction, which will have a value of “CNST”
    • Awards for goods, which will have a value of “GD”
    • Awards for services, which will have a value of “SRV”
    • Awards for services related to goods, which will have a value of “SRVTGD”
    Some award notices may be associated with one or more of the above procurement categories.

    Note: Some records contain long award description values that may cause issues when viewed in certain spreadsheet programs, such as Microsoft Excel. When the information doesn’t fit within the cell’s character limit, the program will insert extra rows that don’t conform to the expected column formatting. (Though, all other records will still be displayed properly, in their own rows.) To quickly remove the “spill-over data” caused by this display error in Excel, select the publication date field (labelled as “publicationDate-datePublication”), then click the Filter button on the Data menu ribbon. You can then use the filter pull-down list to remove any blank or non-date values from this field, which will hide the rows that only contain “spill-over” description information.

    The following list describes the resources associated with this CanadaBuys award notices dataset. Additional information on Government of Canada award notices can be found on the Award notices tab of the CanadaBuys Tender opportunities page.

    NOTE: While the CanadaBuys online portal includes awards notices from across multiple levels of government, the data files in this related dataset only include notices from federal government organizations.

    (1) CanadaBuys data dictionary: This XML file offers descriptions of each data field in the award notices files linked below, as well as other procurement-related datasets CanadaBuys produces. Use this as a guide for understanding the data elements in these files. This dictionary is updated as needed to reflect changes to the data elements.

    (2) All CanadaBuys award notices, 2022-08-08 onward: This file contains up to date information on all award notices published on CanadaBuys. This includes any award notices that were published on or after August 8, 2022, when CanadaBuys became the system of record for all tender and award notices for the Government of Canada. This file includes any amendments made to these award notices during their lifecycles. It is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500) to include any updates or amendments, as needed. Award notices in this file can have any publication date on or after August 8, 2022 (displayed in the field labelled “publicationDate-datePublication”), and can have a status of active, cancelled or expired (displayed in the field labelled “awardStatus-attributionStatut-eng”).

    (3) Legacy award notices, 2012 to 2022-08 (prior to CanadaBuys): This file contains details of the award notices published prior to the implementation of CanadaBuys, which became the system of record for all tender and award notices for the Government of Canada on August 8, 2022. This datafile is refreshed monthly. The over 100,000 awards in this file have publication dates from August 6, 2022 and prior (displayed in the field labelled “publicationDate-datePublication”), and have a status of active, cancelled or expired (displayed in the field labelled “awardStatus-attributionStatut-eng”).
    Note: Procurement data was structured differently in the legacy applications previously used to administer Government of Canada contracts. Efforts have been made to manipulate these historical records into the structure used by the CanadaBuys data files, to make them easier to analyse and compare with new records. This process is not perfect since simple one-to-one mappings can’t be made in many cases. You can access these historical records in their original format as part of the archived copy of the original tender notices dataset, which contained awards-related data files. You can also refer to the supporting documentation for understanding the new CanadaBuys tender and award notices datasets.

    (4) Award notices, YYYY-YYYY: These files contain information on all contracts awarded in the specified fiscal year. The current fiscal year's file is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500) to include any updates or amendments, as needed. The files associated with past fiscal years are updated monthly. Awards in these files can have any publication date between April 1 of a given year and March 31 of the subsequent year (displayed in the field labelled “publicationDate-datePublication”) and can have an award status of active, cancelled or expired (displayed in the field labelled “awardStatus-attributionStatut-eng”).
    Note: New award notice data files will be added on April 1 for each fiscal year.
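The category filter and the spill-over clean-up described above can also be done outside Excel. The following is a minimal Python sketch, not the publisher's method. The two column names come from the description; the `title` column, the sample rows, the ISO `YYYY-MM-DD` date format, and the way multiple category codes are separated within one cell are all assumptions made for illustration.

```python
import csv
import io
import re
from datetime import date

# Column labels quoted in the dataset description.
CATEGORY_FIELD = "procurementCategory-categorieApprovisionnement"
DATE_FIELD = "publicationDate-datePublication"


def is_iso_date(value):
    """Return True if value parses as YYYY-MM-DD (assumed date format)."""
    try:
        date.fromisoformat(value.strip())
        return True
    except ValueError:
        return False


def filter_awards(rows, category):
    """Yield well-formed rows whose category field contains the given code."""
    for row in rows:
        # Spill-over rows have a blank or non-date publication date: skip them.
        if not is_iso_date(row.get(DATE_FIELD) or ""):
            continue
        # Split the cell into whole codes so "SRV" does not match "SRVTGD".
        # The separator between multiple codes is an assumption.
        codes = re.findall(r"[A-Z]+", row.get(CATEGORY_FIELD) or "")
        if category in codes:
            yield row


# Hypothetical sample data standing in for a downloaded CSV file.
sample = io.StringIO(
    "title,publicationDate-datePublication,"
    "procurementCategory-categorieApprovisionnement\n"
    "Bridge repair,2023-05-01,CNST\n"
    "spill-over text from a long description,,\n"
    "Printer maintenance,2023-06-15,SRVTGD\n"
    "Office chairs and assembly,2023-07-02,GD SRV\n"
)

rows = list(csv.DictReader(sample))
services = [row["title"] for row in filter_awards(rows, "SRV")]
print(services)  # ['Office chairs and assembly']
```

Matching on whole codes rather than substrings matters here because “SRV” is a prefix of “SRVTGD”; a plain substring test would return both kinds of award.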

  16. f

    Excel spreadsheet containing, in separate sheets, the underlying numerical...

    • plos.figshare.com
    xlsx
    Updated Jan 9, 2024
    Cite
    Roddy M. Grieves (2024). Excel spreadsheet containing, in separate sheets, the underlying numerical data and statistical analysis for Fig panels 9A-9F, 10A-10D, 11A–11B, S5B, S6B, S7C–S7D, S8C–S8D, S10 and S11. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011763.s012
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jan 9, 2024
    Dataset provided by
    PLOS Computational Biology
    Authors
    Roddy M. Grieves
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel spreadsheet containing, in separate sheets, the underlying numerical data and statistical analysis for Fig panels 9A-9F, 10A-10D, 11A–11B, S5B, S6B, S7C–S7D, S8C–S8D, S10 and S11.

  17. CanadaBuys award notices

    • ouvert.canada.ca
    • open.canada.ca
    csv, html, xml
    Updated Sep 8, 2025
    Cite
    Public Services and Procurement Canada (2025). CanadaBuys award notices [Dataset]. https://ouvert.canada.ca/data/dataset/a1acb126-9ce8-40a9-b889-5da2b1dd20cb
    Explore at:
    Available download formats: html, csv, xml
    Dataset updated
    Sep 8, 2025
    Dataset provided by
    Public Services and Procurement Canada (http://www.pwgsc.gc.ca/)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    This dataset contains information on all Government of Canada award notices published according to the Financial Administration Act. It includes data for all Schedule I, Schedule II and Schedule III departments, agencies, Crown corporations, and other entities (unless specifically exempt) who must comply with the Government of Canada trade agreement obligations. CanadaBuys is the authoritative source of this information. Visit the How procurement works page on CanadaBuys to learn more.

    All data files in this collection share a common column structure, and the procurement category field (labelled as “procurementCategory-categorieApprovisionnement”) can be used to filter by the following four major categories of awards:
    • Awards for construction, which will have a value of “CNST”
    • Awards for goods, which will have a value of “GD”
    • Awards for services, which will have a value of “SRV”
    • Awards for services related to goods, which will have a value of “SRVTGD”
    Some award notices may be associated with one or more of the above procurement categories.

    Note: Some records contain long award description values that may cause issues when viewed in certain spreadsheet programs, such as Microsoft Excel. When the information doesn’t fit within the cell’s character limit, the program will insert extra rows that don’t conform to the expected column formatting. (Though, all other records will still be displayed properly, in their own rows.) To quickly remove the “spill-over data” caused by this display error in Excel, select the publication date field (labelled as “publicationDate-datePublication”), then click the Filter button on the Data menu ribbon. You can then use the filter pull-down list to remove any blank or non-date values from this field, which will hide the rows that only contain “spill-over” description information.

    The following list describes the resources associated with this CanadaBuys award notices dataset. Additional information on Government of Canada award notices can be found on the Award notices tab of the CanadaBuys Tender opportunities page.

    NOTE: While the CanadaBuys online portal includes awards notices from across multiple levels of government, the data files in this related dataset only include notices from federal government organizations.

    (1) CanadaBuys data dictionary: This XML file offers descriptions of each data field in the award notices files linked below, as well as other procurement-related datasets CanadaBuys produces. Use this as a guide for understanding the data elements in these files. This dictionary is updated as needed to reflect changes to the data elements.

    (2) All CanadaBuys award notices, 2022-08-08 onward: This file contains up to date information on all award notices published on CanadaBuys. This includes any award notices that were published on or after August 8, 2022, when CanadaBuys became the system of record for all tender and award notices for the Government of Canada. This file includes any amendments made to these award notices during their lifecycles. It is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500) to include any updates or amendments, as needed. Award notices in this file can have any publication date on or after August 8, 2022 (displayed in the field labelled “publicationDate-datePublication”), and can have a status of active, cancelled or expired (displayed in the field labelled “awardStatus-attributionStatut-eng”).

    (3) Legacy award notices, 2012 to 2022-08 (prior to CanadaBuys): This file contains details of the award notices published prior to the implementation of CanadaBuys, which became the system of record for all tender and award notices for the Government of Canada on August 8, 2022. This datafile is refreshed monthly. The over 100,000 awards in this file have publication dates from August 6, 2022 and prior (displayed in the field labelled “publicationDate-datePublication”), and have a status of active, cancelled or expired (displayed in the field labelled “awardStatus-attributionStatut-eng”).
    Note: Procurement data was structured differently in the legacy applications previously used to administer Government of Canada contracts. Efforts have been made to manipulate these historical records into the structure used by the CanadaBuys data files, to make them easier to analyse and compare with new records. This process is not perfect since simple one-to-one mappings can’t be made in many cases. You can access these historical records in their original format as part of the archived copy of the original tender notices dataset, which contained awards-related data files. You can also refer to the supporting documentation for understanding the new CanadaBuys tender and award notices datasets.

    (4) Award notices, YYYY-YYYY: These files contain information on all contracts awarded in the specified fiscal year. The current fiscal year's file is refreshed each morning, between 7:00 am and 8:30 am (UTC-0500) to include any updates or amendments, as needed. The files associated with past fiscal years are updated monthly. Awards in these files can have any publication date between April 1 of a given year and March 31 of the subsequent year (displayed in the field labelled “publicationDate-datePublication”) and can have an award status of active, cancelled or expired (displayed in the field labelled “awardStatus-attributionStatut-eng”).
    Note: New award notice data files will be added on April 1 for each fiscal year.
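The yearly files cover publication dates from April 1 of one year through March 31 of the next. As a rough illustration of that window, the helper below maps a publication date to a fiscal-year label. The function name is hypothetical, and the assumption that the “YYYY-YYYY” in the file names means start year followed by end year is not confirmed by the description.

```python
from datetime import date


def fiscal_year_label(published: date) -> str:
    """Return an assumed 'YYYY-YYYY' fiscal-year label for a publication date.

    The fiscal year is taken to run from April 1 through March 31,
    as stated in the dataset description.
    """
    # Dates in January to March belong to the fiscal year that began
    # the previous April.
    start = published.year if published.month >= 4 else published.year - 1
    return f"{start}-{start + 1}"


print(fiscal_year_label(date(2023, 3, 31)))  # 2022-2023
print(fiscal_year_label(date(2023, 4, 1)))   # 2023-2024
```

A label computed this way could be used to pick which yearly file a given award notice is expected to appear in.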

  18. f

    Mortality statistics by health facility level in Uganda, FY 2022-2023.

    • figshare.com
    xls
    Updated Jul 15, 2025
    Cite
    Marie Gorreti Zalwango; Caroline Kyozira; Mariam Nambuya; Martin Bulamu; Allan Muruta (2025). Mortality statistics by health facility level in Uganda, FY 2022-2023. [Dataset]. http://doi.org/10.1371/journal.pgph.0003722.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jul 15, 2025
    Dataset provided by
    PLOS Global Public Health
    Authors
    Marie Gorreti Zalwango; Caroline Kyozira; Mariam Nambuya; Martin Bulamu; Allan Muruta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Uganda
    Description

    Mortality statistics by health facility level in Uganda, FY 2022-2023.


Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177

Data Cleaning Sample

Explore at:
162 scholarly articles cite this dataset (View in Google Scholar)
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

Sample data for exercises in Further Adventures in Data Cleaning.
