4 datasets found
  1. Restaurant Sales-Dirty Data for Cleaning Training

    • kaggle.com
    Updated Jan 25, 2025
    Cite
    Ahmed Mohamed (2025). Restaurant Sales-Dirty Data for Cleaning Training [Dataset]. https://www.kaggle.com/datasets/ahmedmohamed2003/restaurant-sales-dirty-data-for-cleaning-training
    Dataset updated
    Jan 25, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ahmed Mohamed
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Restaurant Sales Dataset with Dirt Documentation

    Overview

    The Restaurant Sales Dataset with Dirt contains data for 17,534 transactions. The data introduces realistic inconsistencies ("dirt") to simulate real-world scenarios where data may have missing or incomplete information. The dataset includes sales details across multiple categories, such as starters, main dishes, desserts, drinks, and side dishes.

    Dataset Use Cases

    This dataset is suitable for:

      • Practicing data cleaning tasks, such as handling missing values and deducing missing information.
      • Conducting exploratory data analysis (EDA) to study restaurant sales patterns.
      • Feature engineering to create new variables for machine learning tasks.

    Columns Description

    Column Name | Description | Example Values
    Order ID | A unique identifier for each order. | ORD_123456
    Customer ID | A unique identifier for each customer. | CUST_001
    Category | The category of the purchased item. | Main Dishes, Drinks
    Item | The name of the purchased item. May contain missing values due to data dirt. | Grilled Chicken, None
    Price | The static price of the item. May contain missing values. | 15.0, None
    Quantity | The quantity of the purchased item. May contain missing values. | 1, None
    Order Total | The total price for the order (Price * Quantity). May contain missing values. | 45.0, None
    Order Date | The date when the order was placed. Always present. | 2022-01-15
    Payment Method | The payment method used for the transaction. May contain missing values due to data dirt. | Cash, None

    Key Characteristics

    1. Data Dirtiness:

      • Missing values in key columns (Item, Price, Quantity, Order Total, Payment Method) simulate real-world challenges.
      • At least one of the following conditions is ensured for each record to identify an item:
        • Item is present.
        • Price is present.
        • Both Quantity and Order Total are present.
      • If Price or Quantity is missing, it can be deduced from Order Total and the other value (e.g., Price = Order Total / Quantity).
    2. Menu Categories and Items:

      • Items are divided into five categories:
        • Starters: E.g., Chicken Melt, French Fries.
        • Main Dishes: E.g., Grilled Chicken, Steak.
        • Desserts: E.g., Chocolate Cake, Ice Cream.
        • Drinks: E.g., Coca Cola, Water.
        • Side Dishes: E.g., Mashed Potatoes, Garlic Bread.

    3. Time Range:

      • Orders span from January 1, 2022, to December 31, 2023.

    Cleaning Suggestions

    1. Handle Missing Values:

      • Fill missing Order Total or Quantity using the formula: Order Total = Price * Quantity.
      • Deduce missing Price from Order Total / Quantity if both are available.
    2. Validate Data Consistency:

      • Ensure that calculated values (Order Total = Price * Quantity) match.
    3. Analyze Missing Patterns:

      • Study the distribution of missing values across categories and payment methods.
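The cleaning suggestions above can be sketched in plain Python. This is a minimal illustration, assuming each record is a dict keyed by the column names from the Columns Description table and that missing values are represented as `None`. The function names (`fill_missing`, `is_consistent`, `missing_by_category`) are hypothetical and not part of the dataset.

```python
from collections import Counter
from math import isclose


def fill_missing(record):
    """Return a copy of the record with one missing numeric field deduced.

    Uses Order Total = Price * Quantity. If two or more of the three
    fields are missing, nothing can be deduced and the copy is unchanged.
    """
    r = dict(record)
    price, qty, total = r.get("Price"), r.get("Quantity"), r.get("Order Total")
    if total is None and price is not None and qty is not None:
        r["Order Total"] = price * qty
    elif qty is None and price and total is not None:
        # Division yields a float (e.g. 3.0); cast to int if desired.
        r["Quantity"] = total / price
    elif price is None and qty and total is not None:
        r["Price"] = total / qty
    return r


def is_consistent(record, tol=1e-9):
    """True/False if Order Total matches Price * Quantity, None if unknown."""
    price, qty, total = (record.get(k) for k in ("Price", "Quantity", "Order Total"))
    if price is None or qty is None or total is None:
        return None
    return isclose(price * qty, total, abs_tol=tol)


def missing_by_category(records, column):
    """Count records per Category where the given column is missing."""
    return Counter(r["Category"] for r in records if r.get(column) is None)
```

For example, `fill_missing({"Price": 15.0, "Quantity": 3, "Order Total": None})` fills `Order Total` with `45.0`, matching the example value in the column table.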

    Menu Map with Prices and Categories

    Category | Item | Price
    Starters | Chicken Melt | 8.0
    Starters | French Fries | 4.0
    Starters | Cheese Fries | 5.0
    Starters | Sweet Potato Fries | 5.0
    Starters | Beef Chili | 7.0
    Starters | Nachos Grande | 10.0
    Main Dishes | Grilled Chicken | 15.0
    Main Dishes | Steak | 20.0
    Main Dishes | Pasta Alfredo | 12.0
    Main Dishes | Salmon | 18.0
    Main Dishes | Vegetarian Platter | 14.0
    Desserts | Chocolate Cake | 6.0
    Desserts | Ice Cream | 5.0
    Desserts | Fruit Salad | 4.0
    Desserts | Cheesecake | 7.0
    Desserts | Brownie | 6.0
    Drinks | Coca Cola | 2.5
    Drinks | Orange Juice | 3.0
    Drinks ...
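The menu map can serve as a lookup table when a record is missing its Item or Price. The sketch below is illustrative only: it contains just the rows visible in the listing above (the listing is truncated after Orange Juice, so remaining Drinks and all Side Dishes are absent), and the names `MENU`, `candidate_items` and `price_of` are hypothetical. Note that a price does not always identify a single item within a category, so the lookup returns a list of candidates rather than one name.

```python
# Rows copied from the visible part of the menu map; the source listing
# is truncated, so this table is incomplete by construction.
MENU = [
    ("Starters", "Chicken Melt", 8.0),
    ("Starters", "French Fries", 4.0),
    ("Starters", "Cheese Fries", 5.0),
    ("Starters", "Sweet Potato Fries", 5.0),
    ("Starters", "Beef Chili", 7.0),
    ("Starters", "Nachos Grande", 10.0),
    ("Main Dishes", "Grilled Chicken", 15.0),
    ("Main Dishes", "Steak", 20.0),
    ("Main Dishes", "Pasta Alfredo", 12.0),
    ("Main Dishes", "Salmon", 18.0),
    ("Main Dishes", "Vegetarian Platter", 14.0),
    ("Desserts", "Chocolate Cake", 6.0),
    ("Desserts", "Ice Cream", 5.0),
    ("Desserts", "Fruit Salad", 4.0),
    ("Desserts", "Cheesecake", 7.0),
    ("Desserts", "Brownie", 6.0),
    ("Drinks", "Coca Cola", 2.5),
    ("Drinks", "Orange Juice", 3.0),
]


def candidate_items(category, price):
    """Return all known items in the category that have the given price."""
    return [item for cat, item, p in MENU if cat == category and p == price]


def price_of(item):
    """Return the listed price of an item, or None if it is not in MENU."""
    for _, name, p in MENU:
        if name == item:
            return p
    return None
```

With the rows above, `candidate_items("Main Dishes", 20.0)` yields a single candidate, while `candidate_items("Starters", 5.0)` yields two (Cheese Fries and Sweet Potato Fries), so a missing Item cannot always be recovered from Price alone.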
  2. ‘Store Transaction data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Store Transaction data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-store-transaction-data-2e60/3a5df53c/?iid=007-635&v=presentation
    Dataset updated
    Feb 14, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Store Transaction data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/iamprateek/store-transaction-data on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Nielsen receives transaction level scanning data (POS Data) from its partner stores on a regular basis. Stores sharing POS data include bigger format store types such as supermarkets, hypermarkets as well as smaller traditional trade grocery stores (Kirana stores), medical stores etc. using a POS machine.

    While in a bigger format store, all items for all transactions are scanned using a POS machine, smaller and more localized shops do not have a 100% compliance rate in terms of scanning and inputting information into the POS machine for all transactions.

    A transaction involving a single packet of chips or a single piece of candy may not be scanned and recorded, either to spare the customer the inconvenience or because the store is crowded during rush hours.

    Thus, the data received from such stores is often incomplete and lacks complete information of all transactions completed within a day.

    Additionally, apart from incomplete transaction data within a day, it is observed that certain stores do not share data for all active days. Stores share data for anywhere from 2 to 28 days in a month. While it is possible to impute/extrapolate data for 2 days of a month using 28 days of actual historical data, the reverse is not recommended.

    Nielsen encourages you to create a model which can help impute/extrapolate data to fill in the missing data gaps in the store level POS data currently received.

    Content

    You are provided with the dataset that contains store level data by brands and categories for select stores-

    Hackathon_ Ideal_Data - The file contains brand level data for 10 stores for the last 3 months. This can be referred to as the ideal data.

    Hackathon_Working_Data - This contains data for selected stores which are missing and/or incomplete.

    Hackathon_Mapping_File - This file is provided to help understand the column names in the data set.

    Hackathon_Validation_Data - This file contains the data stores and product groups for which you have to predict the Total_VALUE.

    Sample Submission - This file represents what needs to be uploaded as output by candidate in the same format. The sample data is provided in the file to help understand the columns and values required.

    Acknowledgements

    Nielsen Holdings plc (NYSE: NLSN) is a global measurement and data analytics company that provides the most complete and trusted view available of consumers and markets worldwide. Nielsen is divided into two business units. Nielsen Global Media, the arbiter of truth for media markets, provides media and advertising industries with unbiased and reliable metrics that create a shared understanding of the industry required for markets to function. Nielsen Global Connect provides consumer packaged goods manufacturers and retailers with accurate, actionable information and insights and a complete picture of the complex and changing marketplace that companies need to innovate and grow. Our approach marries proprietary Nielsen data with other data sources to help clients around the world understand what’s happening now, what’s happening next, and how to best act on this knowledge. An S&P 500 company, Nielsen has operations in over 100 countries, covering more than 90% of the world’s population.

    Know more: https://www.nielsen.com/us/en/

    Inspiration

    Build an imputation and/or extrapolation model to fill the missing data gaps for select stores by analyzing the data and determine which factors/variables/features can help best predict the store sales.
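As a point of comparison for any model built on this data, a naive baseline can be sketched as follows. It scales the mean of the observed days up to a full month and refuses to extrapolate when too few days were shared, echoing the caution above about extrapolating from only 2 days. This is an assumption-laden illustration, not Nielsen's method; the function name `extrapolate_monthly_total` and the `min_days` threshold are hypothetical.

```python
def extrapolate_monthly_total(daily_values, days_in_month, min_days=1):
    """Naive baseline: observed daily mean multiplied by days in the month.

    daily_values: per-day sales values, with None for days not shared.
    min_days: minimum number of observed days required to extrapolate;
              returns None when fewer days are available.
    """
    observed = [v for v in daily_values if v is not None]
    if len(observed) < min_days:
        return None
    return sum(observed) / len(observed) * days_in_month
```

A real solution would likely also use brand, category and store-level features from the ideal data; this baseline ignores them entirely.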

    --- Original source retains full ownership of the source dataset ---

  3. Young Persons' Behaviour and Attitudes Survey, 2022 - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Aug 31, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The citation is currently not available for this dataset.
    Dataset updated
    Aug 31, 2024
    Description

    Abstract copyright UK Data Service and data collection copyright owner.

    The Young Persons' Behaviour and Attitudes Survey (YPBAS) is a school-based survey carried out among 11-16 year olds and covers a wide range of topics relevant to the lives of young people today. The main aim of the YPBAS is to gain an insight into, and increase understanding of, the behaviours and lifestyles of adolescents. It also aims to influence various government policies and practices relating to young people and to facilitate access to research findings and expertise. YPBAS was introduced in 2000 as an omnibus survey of post-primary school children which replaced a number of previous surveys. It is a triennial study, conducted once every three years. Repeating this survey on a regular basis will allow government to continue to identify and monitor any significant changes, and if necessary, new policies and strategies will be developed and implemented as a result. Therefore, to ensure comparability, the same methodology has been applied over all rounds to date and the questionnaires were of a similar format.

    Further information is available on the Northern Ireland Statistics and Research Agency (NISRA) Young Persons' Behaviour and Attitudes Survey webpages.

    Main Topics: The main topics covered in most years of the YPBAS include:

      • demographics
      • school
      • travelling to school
      • nutrition and sports
      • smoking
      • alcohol, solvents and drugs
      • personal safety
      • sexual experience and relationships
      • health
      • education

    To accommodate the demand for topics on the 2022 survey, two versions of the questionnaire were used. Schools were randomly assigned one version of the questionnaire.

    In 2022 several new topics were added to the questionnaire: Road Safety, Future Intentions, Equality and Gambling.

    Note that for the topic of Young Carers, variables HealthWellbeing_16a through to HealthWellbeing_16y have been removed from the dataset at the client’s request.
Multi-stage stratified random sample Self-completion 2022 2023 ACCESS TO HEALTH SE... AGE ALCOHOL USE ALCOHOLIC DRINKS AMPHETAMINES ANABOLIC STEROIDS ANTISOCIAL BEHAVIOUR ANXIETY ARTISTIC ACTIVITIES ARTS ASSAULT ATTITUDES BIRTH CONTROL BULLYING CANNABIS CAREER DEVELOPMENT CAREERS GUIDANCE CATHOLICISM CEREAL PRODUCTS CHILDREN S RIGHTS CITIZENSHIP CLUBS COCAINE COMMUTING CONFECTIONERY CULTURAL STUDIES DEBILITATIVE ILLNESS DIGITAL GAMES DISEASES DOMESTIC VIOLENCE DRINKING BEHAVIOUR DRIVING LESSONS DRUG ABUSE DRUG USE ECSTASY DRUG EDUCATIONAL CHOICE EDUCATIONAL GRANTS EDUCATIONAL INSTITU... EMOTIONAL STATES ENERGY EFFICIENCY ENVIRONMENTAL AWARE... ENVIRONMENTAL DEGRA... ENVIRONMENTAL ISSUES ENVIRONMENTAL MANAG... EQUAL OPPORTUNITY ETHNIC GROUPS EXERCISE PHYSICAL A... FAMILIES FAMILY MEMBERS FATHER S ECONOMIC A... FATHER S PLACE OF B... FEAR OF CRIME FIELDS OF STUDY FIRST AID FISH AS FOOD FOOD FOOD AND NUTRITION FREE SCHOOL MEALS FRIENDS FRIENDSHIP FRUIT FURTHER EDUCATION GAMBLING GENDER GENDER EQUALITY GLOBAL WARMING HEALTH HEALTH FOODS HEROIN HIGHER EDUCATION IN... HISTORIC BUILDINGS HOMEWORK ILL HEALTH INFORMATION SOURCES IRISH GAELIC LANGUAGE LEGUMES LEISURE TIME LEISURE TIME ACTIVI... LIBRARIES LIBRARY FACILITIES LIBRARY USERS LOCAL COMMUNITY FAC... LSD DRUG MAGIC MUSHROOMS MEALS MEAT MILK MONUMENTS MOTHER S PLACE OF B... MUSEUMS NATIONAL LANGUAGE E... NON VERBAL LANGUAGE ORGANIZATIONS PARENT CHILD RELATI... PARENT PARTICIPATION PARTNERSHIPS PERSONAL PERSONAL EFFICACY PHYSICAL ACTIVITIES PLACE OF BIRTH POTATOES PROTESTANTISM PUBLIC TRANSPORT RELIGIOUS INSTRUCTION ROAD SAFETY SAVOURY SNACKS SCHOOL CLASSES SCHOOL LEAVING SCHOOL LEAVING GUID... SCHOOL MEALS SCHOOL PUNISHMENTS SCHOOLCHILDREN SCHOOLS SECONDARY SCHOOLS SELF ESTEEM SEX EDUCATION SEXUAL BEHAVIOUR SEXUALLY TRANSMITTE... 
SLIMMING DIETS SMOKING SMOKING CESSATION SOCIAL ATTITUDES SOCIAL MEDIA SOFT DRINKS SOLVENT ABUSE SPORT SPORT SPECTATORSHIP SPORTS CLUBS STUDENT ATTITUDE STUDENT TRANSPORTATION SUBSTANCE USE SUN PROTECTION SUNBURN SUNTANNING Social attitudes an... TATTOOING TEACHER STUDENT REL... TELEVISION VIEWING TIME TOBACCO TRANQUILLIZERS TRUANCY TUTORING UNDERAGE DRINKING UNDERAGE SEX VEGETABLES VISITS TO RECREATIO... VOLUNTARY WORK WALKING WATER RESOURCES YOUNG ADULTS YOUTH Youth

  4. IvyDB Signed Volume - Daily Options Trading Volume Data

    • optionmetrics.com
    Updated Nov 15, 2023
    Cite
    OptionMetrics (2023). IvyDB Signed Volume - Daily Options Trading Volume Data [Dataset]. https://optionmetrics.com/
    Dataset updated
    Nov 15, 2023
    Dataset authored and provided by
    OptionMetrics
    License

    https://optionmetrics.com/contact/

    Time period covered
    Jan 1, 2016 - Present
    Description

    The IvyDB Signed Volume dataset, available as an add-on product for IvyDB US, contains daily data on detailed option trading volume. Trades in the IvyDB US dataset are assigned as either buyer-initiated or seller-initiated based on the trade price and the bid-ask quote at the time of the trade. The total assigned daily volume is aggregated and updated nightly.
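The description does not state the exact assignment rule, so the following is only a sketch of one common approach, the midpoint "quote rule": a trade above the bid-ask midpoint is treated as buyer-initiated, below it as seller-initiated, and exactly at the midpoint as unassigned. This is an assumption for illustration and may differ from what OptionMetrics actually does; the names `classify_trade` and `signed_volume` are hypothetical.

```python
def classify_trade(price, bid, ask):
    """Classify a trade with the midpoint quote rule (an assumed rule)."""
    mid = (bid + ask) / 2
    if price > mid:
        return "buy"         # closer to the ask: buyer-initiated
    if price < mid:
        return "sell"        # closer to the bid: seller-initiated
    return "unassigned"      # exactly at the midpoint


def signed_volume(trades):
    """Aggregate volume per side for one day.

    trades: iterable of (price, bid, ask, size) tuples, where bid and ask
    are the quote prevailing at the time of the trade.
    """
    totals = {"buy": 0, "sell": 0, "unassigned": 0}
    for price, bid, ask, size in trades:
        totals[classify_trade(price, bid, ask)] += size
    return totals
```

Calling `signed_volume` on one day's trades gives the kind of daily buy and sell totals the description refers to, under the stated assumption about the rule.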

