Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Restaurant Sales Dataset with Dirt contains data for 17,534 transactions. It deliberately introduces realistic inconsistencies ("dirt") to simulate real-world scenarios in which data is missing or incomplete. The dataset includes sales details across multiple categories, such as starters, main dishes, desserts, drinks, and side dishes.
This dataset is suitable for:
- Practicing data cleaning tasks, such as handling missing values and deducing missing information.
- Conducting exploratory data analysis (EDA) to study restaurant sales patterns.
- Feature engineering to create new variables for machine learning tasks.
Column Name | Description | Example Values |
---|---|---|
Order ID | A unique identifier for each order. | ORD_123456 |
Customer ID | A unique identifier for each customer. | CUST_001 |
Category | The category of the purchased item. | Main Dishes, Drinks |
Item | The name of the purchased item. May contain missing values due to data dirt. | Grilled Chicken, None |
Price | The static price of the item. May contain missing values. | 15.0, None |
Quantity | The quantity of the purchased item. May contain missing values. | 1, None |
Order Total | The total price for the order (Price * Quantity). May contain missing values. | 45.0, None |
Order Date | The date when the order was placed. Always present. | 2022-01-15 |
Payment Method | The payment method used for the transaction. May contain missing values due to data dirt. | Cash, None |
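A minimal loading sketch, assuming the data ships as a CSV file whose header matches the column names above and whose missing cells are either empty or the literal string `None` (both assumptions; the description does not state the file format). It maps missing cells to Python `None` and counts them per column, a first step toward analyzing missing patterns. The two sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Column names from the schema table above.
COLUMNS = ["Order ID", "Customer ID", "Category", "Item", "Price",
           "Quantity", "Order Total", "Order Date", "Payment Method"]

# Assumption: a missing cell is stored as an empty string or the text "None".
MISSING_TOKENS = {"", "None"}


def load_rows(text):
    """Parse CSV text into dicts, mapping missing tokens to None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for col in COLUMNS:
            value = (raw.get(col) or "").strip()
            row[col] = None if value in MISSING_TOKENS else value
        rows.append(row)
    return rows


def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for col, value in row.items():
            if value is None:
                counts[col] += 1
    return counts


# Made-up rows for illustration only.
SAMPLE = """Order ID,Customer ID,Category,Item,Price,Quantity,Order Total,Order Date,Payment Method
ORD_000001,CUST_001,Main Dishes,Grilled Chicken,15.0,3,45.0,2022-01-15,Cash
ORD_000002,CUST_002,Drinks,None,2.5,2,,2022-01-16,
"""

rows = load_rows(SAMPLE)
print(dict(missing_counts(rows)))  # {'Item': 1, 'Order Total': 1, 'Payment Method': 1}
```

Grouping the same counts by `Category` or `Payment Method` is a natural next step for spotting whether the dirt is concentrated in particular parts of the data.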
Data Dirtiness:
- Missing values in five columns (Item, Price, Quantity, Order Total, Payment Method) simulate real-world challenges.
- … Item is present.
- … Price is present.
- … Quantity and Order Total are present.
- If Price or Quantity is missing, the other is used to deduce the missing value (e.g., Order Total / Quantity).
Menu Categories and Items:
- Starters: e.g., Chicken Melt, French Fries.
- Main Dishes: e.g., Grilled Chicken, Steak.
- Desserts: e.g., Chocolate Cake, Ice Cream.
- Drinks: e.g., Coca Cola, Water.
- Side Dishes: e.g., Mashed Potatoes, Garlic Bread.
Time Range: Orders span from January 1, 2022, to December 31, 2023.
Handle Missing Values:
- Deduce a missing Order Total or Quantity using the formula: Order Total = Price * Quantity.
- Deduce a missing Price from Order Total / Quantity if both are available.
Validate Data Consistency:
- Check that recorded and calculated values (Order Total = Price * Quantity) match.
Analyze Missing Patterns:
Category | Item | Price |
---|---|---|
Starters | Chicken Melt | 8.0 |
Starters | French Fries | 4.0 |
Starters | Cheese Fries | 5.0 |
Starters | Sweet Potato Fries | 5.0 |
Starters | Beef Chili | 7.0 |
Starters | Nachos Grande | 10.0 |
Main Dishes | Grilled Chicken | 15.0 |
Main Dishes | Steak | 20.0 |
Main Dishes | Pasta Alfredo | 12.0 |
Main Dishes | Salmon | 18.0 |
Main Dishes | Vegetarian Platter | 14.0 |
Desserts | Chocolate Cake | 6.0 |
Desserts | Ice Cream | 5.0 |
Desserts | Fruit Salad | 4.0 |
Desserts | Cheesecake | 7.0 |
Desserts | Brownie | 6.0 |
Drinks | Coca Cola | 2.5 |
Drinks | Orange Juice | 3.0 |
Drinks ... |
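The missing-value handling and consistency check described above can be sketched by combining this price table with the formula Order Total = Price * Quantity. The sketch assumes each row is a dict whose numeric fields are already `float` or `None`, and it only knows the menu items visible here, since the Drinks rows are truncated. Filling Price from Item relies on Price being the item's static price, as the schema table states; the reverse (recovering Item from Price) is not attempted because several items share a price (for example, French Fries and Fruit Salad both cost 4.0).

```python
import math

# Menu prices from the table above (partial: the Drinks rows are truncated
# in the source, so only these two drinks are covered).
MENU_PRICES = {
    "Chicken Melt": 8.0, "French Fries": 4.0, "Cheese Fries": 5.0,
    "Sweet Potato Fries": 5.0, "Beef Chili": 7.0, "Nachos Grande": 10.0,
    "Grilled Chicken": 15.0, "Steak": 20.0, "Pasta Alfredo": 12.0,
    "Salmon": 18.0, "Vegetarian Platter": 14.0,
    "Chocolate Cake": 6.0, "Ice Cream": 5.0, "Fruit Salad": 4.0,
    "Cheesecake": 7.0, "Brownie": 6.0,
    "Coca Cola": 2.5, "Orange Juice": 3.0,
}


def fill_missing(row):
    """Fill Price, Quantity or Order Total in place where they can be deduced.

    row: dict with keys "Item", "Price", "Quantity", "Order Total";
    numeric values are floats or None. Returns the same dict.
    """
    # Price is the item's static price, so a known Item fixes it.
    if row["Price"] is None and row["Item"] in MENU_PRICES:
        row["Price"] = MENU_PRICES[row["Item"]]

    price, qty, total = row["Price"], row["Quantity"], row["Order Total"]
    # Order Total = Price * Quantity, solved for whichever term is missing.
    if total is None and price is not None and qty is not None:
        row["Order Total"] = price * qty
    elif qty is None and price and total is not None:
        row["Quantity"] = total / price  # float; round if integers are needed
    elif price is None and qty and total is not None:
        row["Price"] = total / qty
    return row


def is_consistent(row, tol=1e-9):
    """Check Order Total == Price * Quantity when all three are present."""
    price, qty, total = row["Price"], row["Quantity"], row["Order Total"]
    if None in (price, qty, total):
        return True  # incomplete rows have nothing to contradict
    return math.isclose(total, price * qty, rel_tol=tol, abs_tol=tol)


row = {"Item": "Steak", "Price": None, "Quantity": 2.0, "Order Total": None}
fill_missing(row)
print(row)  # {'Item': 'Steak', 'Price': 20.0, 'Quantity': 2.0, 'Order Total': 40.0}
```

Running `fill_missing` before `is_consistent` means the check only flags rows whose recorded values disagree, not rows that were merely incomplete.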
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Store Transaction data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/iamprateek/store-transaction-data on 14 February 2022.
--- Dataset description provided by original source is as follows ---
Nielsen receives transaction-level scanning data (POS data) from its partner stores on a regular basis. Stores sharing POS data include larger-format stores such as supermarkets and hypermarkets, as well as smaller traditional-trade grocery stores (Kirana stores), medical stores, and other shops that use a POS machine.
In larger-format stores, every item in every transaction is scanned with a POS machine, but smaller, more localized shops do not achieve 100% compliance in scanning and entering every transaction into the POS machine.
A transaction involving a single packet of chips or a single piece of candy may go unscanned and unrecorded, either to spare the customer the inconvenience or because the store is crowded during rush hours.
As a result, the data received from such stores is often incomplete and does not cover all transactions completed within a day.
Beyond incomplete daily transaction data, some stores do not share data for every active day: stores share between 2 and 28 days of data per month. Imputing or extrapolating 2 missing days from 28 days of actual historical data is feasible, but the reverse (estimating 28 days from 2) is not recommended.
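To make that asymmetry concrete, here is a naive sketch that fills a store's unreported days with the mean of its reported days and refuses to extrapolate when too few days are available. The function name and the `min_observed_days=20` threshold are hypothetical choices, not part of the Nielsen brief; a real model would likely also account for weekday patterns or borrow structure from the ideal-data stores.

```python
import calendar
from statistics import mean


def extrapolate_month(daily_sales, year, month, min_observed_days=20):
    """Estimate a month's total sales from partially reported days.

    daily_sales: dict mapping day-of-month (int) -> reported sales that day.
    Unreported days are filled with the mean of the reported days (naive
    mean imputation). Raises ValueError when too few days were reported,
    mirroring the advice not to extrapolate a month from a handful of days.
    min_observed_days is a hypothetical threshold.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    observed = {d: v for d, v in daily_sales.items() if 1 <= d <= days_in_month}
    if len(observed) < min_observed_days:
        raise ValueError(
            f"only {len(observed)} of {days_in_month} days reported; "
            "too few to extrapolate reliably")
    fill = mean(observed.values())
    missing_days = days_in_month - len(observed)
    return sum(observed.values()) + fill * missing_days


# February 2022 has 28 days; 26 are reported here, 2 are imputed.
print(extrapolate_month({d: 100.0 for d in range(1, 27)}, 2022, 2))  # 2800.0
```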
Nielsen encourages you to build a model that imputes or extrapolates data to fill the gaps in the store-level POS data it currently receives.
You are provided with a dataset containing store-level data by brand and category for selected stores:
- Hackathon_Ideal_Data: Contains brand-level data for 10 stores for the last 3 months. This can be referred to as the ideal data.
- Hackathon_Working_Data: Contains data for selected stores, with missing and/or incomplete records.
- Hackathon_Mapping_File: Explains the column names in the dataset.
- Hackathon_Validation_Data: Lists the stores and product groups for which you must predict Total_VALUE.
- Sample Submission: Shows the format in which candidates must upload their output. It includes sample data to illustrate the required columns and values.
Nielsen Holdings plc (NYSE: NLSN) is a global measurement and data analytics company that provides the most complete and trusted view available of consumers and markets worldwide. Nielsen is divided into two business units. Nielsen Global Media, the arbiter of truth for media markets, provides media and advertising industries with unbiased and reliable metrics that create a shared understanding of the industry required for markets to function. Nielsen Global Connect provides consumer packaged goods manufacturers and retailers with accurate, actionable information and insights and a complete picture of the complex and changing marketplace that companies need to innovate and grow. Our approach marries proprietary Nielsen data with other data sources to help clients around the world understand what’s happening now, what’s happening next, and how to best act on this knowledge. An S&P 500 company, Nielsen has operations in over 100 countries, covering more than 90% of the world’s population.
Learn more: https://www.nielsen.com/us/en/
Build an imputation and/or extrapolation model to fill the missing data gaps for selected stores, analyzing the data to determine which factors/variables/features best predict store sales.
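One simple way to start on the feature question is to screen candidate features by their Pearson correlation with store sales. This is a swapped-in screening heuristic, not the model the brief asks for: it only detects linear association, one feature at a time, and says nothing about interactions. The feature names in the example are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / (sx * sy)


def rank_features(features, target):
    """Return (name, r) pairs sorted by |r|, strongest first.

    features: dict mapping feature name -> list of values aligned with target.
    """
    scores = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical features and made-up sales figures, for illustration only.
candidates = {
    "days_reported": [1, 2, 3, 4],
    "brand_count": [1, 3, 2, 4],
    "store_format_code": [5, 5, 5, 5],
}
for name, r in rank_features(candidates, [10, 20, 30, 40]):
    print(f"{name}: r = {r:.2f}")
```

Features that survive this screen would still need to be tested inside an actual imputation model against held-out days.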
--- Original source retains full ownership of the source dataset ---
Abstract copyright UK Data Service and data collection copyright owner. The Young Persons' Behaviour and Attitudes Survey (YPBAS) is a school-based survey carried out among 11- to 16-year-olds and covers a wide range of topics relevant to the lives of young people today. The main aim of the YPBAS is to gain insight into, and increase understanding of, the behaviours and lifestyles of adolescents. It also aims to influence government policies and practices relating to young people and to facilitate access to research findings and expertise. YPBAS was introduced in 2000 as an omnibus survey of post-primary school children, replacing a number of earlier surveys. It is a triennial study, conducted once every three years. Repeating the survey regularly allows government to continue identifying and monitoring significant changes and, where necessary, to develop and implement new policies and strategies in response. To ensure comparability, the same methodology has been applied across all rounds to date, and the questionnaires have followed a similar format. Further information is available on the Northern Ireland Statistics and Research Agency (NISRA) Young Persons' Behaviour and Attitudes Survey webpages.
Main Topics: The main topics covered in most years of the YPBAS include:
- demographics
- school
- travelling to school
- nutrition and sports
- smoking
- alcohol, solvents and drugs
- personal safety
- sexual experience and relationships
- health
- education
To accommodate the demand for topics on the 2022 survey, two versions of the questionnaire were used, and each school was randomly assigned one version. In 2022 several new topics were added to the questionnaire: Road Safety, Future Intentions, Equality and Gambling. Note that for the topic of Young Carers, variables HealthWellbeing_16a through HealthWellbeing_16y have been removed from the dataset at the client’s request.
Multi-stage stratified random sample; Self-completion; 2022, 2023
ACCESS TO HEALTH SE... AGE ALCOHOL USE ALCOHOLIC DRINKS AMPHETAMINES ANABOLIC STEROIDS ANTISOCIAL BEHAVIOUR ANXIETY ARTISTIC ACTIVITIES ARTS ASSAULT ATTITUDES BIRTH CONTROL BULLYING CANNABIS CAREER DEVELOPMENT CAREERS GUIDANCE CATHOLICISM CEREAL PRODUCTS CHILDREN S RIGHTS CITIZENSHIP CLUBS COCAINE COMMUTING CONFECTIONERY CULTURAL STUDIES DEBILITATIVE ILLNESS DIGITAL GAMES DISEASES DOMESTIC VIOLENCE DRINKING BEHAVIOUR DRIVING LESSONS DRUG ABUSE DRUG USE ECSTASY DRUG EDUCATIONAL CHOICE EDUCATIONAL GRANTS EDUCATIONAL INSTITU... EMOTIONAL STATES ENERGY EFFICIENCY ENVIRONMENTAL AWARE... ENVIRONMENTAL DEGRA... ENVIRONMENTAL ISSUES ENVIRONMENTAL MANAG... EQUAL OPPORTUNITY ETHNIC GROUPS EXERCISE PHYSICAL A... FAMILIES FAMILY MEMBERS FATHER S ECONOMIC A... FATHER S PLACE OF B... FEAR OF CRIME FIELDS OF STUDY FIRST AID FISH AS FOOD FOOD FOOD AND NUTRITION FREE SCHOOL MEALS FRIENDS FRIENDSHIP FRUIT FURTHER EDUCATION GAMBLING GENDER GENDER EQUALITY GLOBAL WARMING HEALTH HEALTH FOODS HEROIN HIGHER EDUCATION IN... HISTORIC BUILDINGS HOMEWORK ILL HEALTH INFORMATION SOURCES IRISH GAELIC LANGUAGE LEGUMES LEISURE TIME LEISURE TIME ACTIVI... LIBRARIES LIBRARY FACILITIES LIBRARY USERS LOCAL COMMUNITY FAC... LSD DRUG MAGIC MUSHROOMS MEALS MEAT MILK MONUMENTS MOTHER S PLACE OF B... MUSEUMS NATIONAL LANGUAGE E... NON VERBAL LANGUAGE ORGANIZATIONS PARENT CHILD RELATI... PARENT PARTICIPATION PARTNERSHIPS PERSONAL PERSONAL EFFICACY PHYSICAL ACTIVITIES PLACE OF BIRTH POTATOES PROTESTANTISM PUBLIC TRANSPORT RELIGIOUS INSTRUCTION ROAD SAFETY SAVOURY SNACKS SCHOOL CLASSES SCHOOL LEAVING SCHOOL LEAVING GUID... SCHOOL MEALS SCHOOL PUNISHMENTS SCHOOLCHILDREN SCHOOLS SECONDARY SCHOOLS SELF ESTEEM SEX EDUCATION SEXUAL BEHAVIOUR SEXUALLY TRANSMITTE...
SLIMMING DIETS SMOKING SMOKING CESSATION SOCIAL ATTITUDES SOCIAL MEDIA SOFT DRINKS SOLVENT ABUSE SPORT SPORT SPECTATORSHIP SPORTS CLUBS STUDENT ATTITUDE STUDENT TRANSPORTATION SUBSTANCE USE SUN PROTECTION SUNBURN SUNTANNING Social attitudes an... TATTOOING TEACHER STUDENT REL... TELEVISION VIEWING TIME TOBACCO TRANQUILLIZERS TRUANCY TUTORING UNDERAGE DRINKING UNDERAGE SEX VEGETABLES VISITS TO RECREATIO... VOLUNTARY WORK WALKING WATER RESOURCES YOUNG ADULTS YOUTH Youth
https://optionmetrics.com/contact/
The IvyDB Signed Volume dataset, available as an add-on product for IvyDB US, contains detailed daily option trading volume. Trades in the IvyDB US dataset are classified as either buyer-initiated or seller-initiated based on the trade price and the bid-ask quote at the time of the trade. The total assigned daily volume is aggregated and updated nightly.
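The description does not spell out IvyDB's exact assignment rule, so the sketch below uses a plain midpoint quote rule as an assumption: a trade priced above the bid-ask midpoint counts as buyer-initiated, one below it as seller-initiated, and one exactly at the midpoint is left unassigned. The tuple layout and the (date, option) aggregation key are hypothetical and do not reflect the IvyDB schema.

```python
from collections import defaultdict

LABELS = {1: "buy", -1: "sell", 0: "unassigned"}


def classify_trade(price, bid, ask):
    """Midpoint quote rule: +1 buyer-initiated, -1 seller-initiated, 0 unassigned."""
    mid = (bid + ask) / 2
    if price > mid:
        return 1
    if price < mid:
        return -1
    return 0  # trades at the midpoint are ambiguous under this rule


def daily_signed_volume(trades):
    """Aggregate buy, sell and unassigned volume per (date, option_id).

    trades: iterable of (date, option_id, price, bid, ask, size) tuples,
    where bid and ask are the quote in force at the time of the trade.
    """
    totals = defaultdict(lambda: {"buy": 0, "sell": 0, "unassigned": 0})
    for date, option_id, price, bid, ask, size in trades:
        side = LABELS[classify_trade(price, bid, ask)]
        totals[(date, option_id)][side] += size
    return dict(totals)


# Hypothetical trades in one option on one day.
trades = [
    ("2024-01-02", "OPT1", 1.50, 1.00, 1.50, 10),  # at the ask -> buy
    ("2024-01-02", "OPT1", 1.00, 1.00, 1.50, 4),   # at the bid -> sell
    ("2024-01-02", "OPT1", 1.25, 1.00, 1.50, 3),   # at the midpoint
]
print(daily_signed_volume(trades))
```

Rules such as Lee-Ready break midpoint ties with a tick test instead of leaving them unassigned; which variant IvyDB applies is not stated here.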