Data you can expect:
- Metadata (country, region, city, coordinates, address, categories, description, operating hours, and more)
- Contacts (phone contacts, email, website)
- Social profiles (LinkedIn, Twitter, Instagram, Facebook)
- Reviews (reviews and ratings on different sites)
- Menus (categories, items, prices, descriptions and photos)
- Other info (awards, Michelin stars, executive chef, popular dishes, average meal price, and more)
- Photos (ambience, food, menu photos)
Let us know if you have a specific request, and we'll try to fulfil it.
How we deliver data:
- We transform it to fit your system's data schema, easing the pain and cost of involving data engineers on your side.
- We are completely flexible on the delivery format and method.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Restaurant Sales Dataset with Dirt contains data for 17,534 transactions. The data introduces realistic inconsistencies ("dirt") to simulate real-world scenarios where data may have missing or incomplete information. The dataset includes sales details across multiple categories, such as starters, main dishes, desserts, drinks, and side dishes.
This dataset is suitable for:
- Practicing data cleaning tasks, such as handling missing values and deducing missing information.
- Conducting exploratory data analysis (EDA) to study restaurant sales patterns.
- Feature engineering to create new variables for machine learning tasks.
Column Name | Description | Example Values |
---|---|---|
Order ID | A unique identifier for each order. | ORD_123456 |
Customer ID | A unique identifier for each customer. | CUST_001 |
Category | The category of the purchased item. | Main Dishes , Drinks |
Item | The name of the purchased item. May contain missing values due to data dirt. | Grilled Chicken , None |
Price | The static price of the item. May contain missing values. | 15.0 , None |
Quantity | The quantity of the purchased item. May contain missing values. | 1 , None |
Order Total | The total price for the order (Price * Quantity ). May contain missing values. | 45.0 , None |
Order Date | The date when the order was placed. Always present. | 2022-01-15 |
Payment Method | The payment method used for the transaction. May contain missing values due to data dirt. | Cash , None |
Data Dirtiness:
- Missing values in several columns (Item, Price, Quantity, Order Total, Payment Method) simulate real-world challenges.
- Item is present.
- Price is present.
- Quantity and Order Total are present.
- If Price or Quantity is missing, the other is used to deduce the missing value (e.g., Order Total / Quantity).
Menu Categories and Items:
- Starters: Chicken Melt, French Fries.
- Main Dishes: Grilled Chicken, Steak.
- Desserts: Chocolate Cake, Ice Cream.
- Drinks: Coca Cola, Water.
- Side Dishes: Mashed Potatoes, Garlic Bread.
Time Range:
- Orders span from January 1, 2022, to December 31, 2023.
Handle Missing Values:
- Fill in a missing Order Total or Quantity using the formula: Order Total = Price * Quantity.
- Deduce a missing Price from Order Total / Quantity if both are available.
Validate Data Consistency:
- Check that the calculated values (Order Total = Price * Quantity) match.
Analyze Missing Patterns:
Category | Item | Price |
---|---|---|
Starters | Chicken Melt | 8.0 |
Starters | French Fries | 4.0 |
Starters | Cheese Fries | 5.0 |
Starters | Sweet Potato Fries | 5.0 |
Starters | Beef Chili | 7.0 |
Starters | Nachos Grande | 10.0 |
Main Dishes | Grilled Chicken | 15.0 |
Main Dishes | Steak | 20.0 |
Main Dishes | Pasta Alfredo | 12.0 |
Main Dishes | Salmon | 18.0 |
Main Dishes | Vegetarian Platter | 14.0 |
Desserts | Chocolate Cake | 6.0 |
Desserts | Ice Cream | 5.0 |
Desserts | Fruit Salad | 4.0 |
Desserts | Cheesecake | 7.0 |
Desserts | Brownie | 6.0 |
Drinks | Coca Cola | 2.5 |
Drinks | Orange Juice | 3.0 |
Drinks ... |
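The cleaning steps described for the Restaurant Sales dataset can be sketched in plain Python. This is a minimal illustration, not the dataset author's method: the helper names `fill_missing` and `is_consistent` are hypothetical, `MENU_PRICES` holds only a few rows copied from the price table, and missing values are assumed to be represented as `None`.

```python
from typing import Any, Dict

Row = Dict[str, Any]

# A few static prices copied from the menu table (partial on purpose).
MENU_PRICES = {"Chicken Melt": 8.0, "Grilled Chicken": 15.0, "Steak": 20.0}


def fill_missing(row: Row) -> Row:
    """Return a copy of the row with Price, Quantity or Order Total deduced where possible."""
    filled = dict(row)
    # Prices are static per item, so a known Item gives the Price directly.
    if filled.get("Price") is None and filled.get("Item") in MENU_PRICES:
        filled["Price"] = MENU_PRICES[filled["Item"]]
    price = filled.get("Price")
    qty = filled.get("Quantity")
    total = filled.get("Order Total")
    # Use Order Total = Price * Quantity to recover a single missing value.
    if price is None and qty and total is not None:
        filled["Price"] = total / qty
    elif qty is None and price and total is not None:
        filled["Quantity"] = total / price
    elif total is None and price is not None and qty is not None:
        filled["Order Total"] = price * qty
    return filled


def is_consistent(row: Row, tol: float = 1e-9) -> bool:
    """Check Order Total = Price * Quantity; rows with gaps cannot be checked and pass."""
    price, qty, total = row.get("Price"), row.get("Quantity"), row.get("Order Total")
    if price is None or qty is None or total is None:
        return True
    return abs(price * qty - total) <= tol


rows = [
    {"Item": "Steak", "Price": None, "Quantity": 2, "Order Total": None},
    {"Item": None, "Price": None, "Quantity": 3, "Order Total": 45.0},
    {"Item": "Chicken Melt", "Price": 8.0, "Quantity": None, "Order Total": 16.0},
]
cleaned = [fill_missing(r) for r in rows]
```

Rows where two of the three numeric fields are missing and the item is unknown are left untouched, since nothing can be deduced from them.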
In the fast-paced world of hospitality, data is essential for success. Our Global Bar & Restaurant POI database offers in-depth information on the locations of the world's top bars and restaurants, providing businesses with a powerful tool for strategic decision-making. Whether you're a restaurant chain, a marketing agency, or a hospitality researcher, our Global Bar & Restaurant database is a valuable resource for making informed decisions.
What You'll Find in the Database:
- Visitation Metrics: GDPR-compliant, non-PII foot traffic insights to help you identify the best locations for your next opening.
- Establishment Information: Official name, unique identifier, and type of establishment (bar, restaurant, café, fast-food chain, etc.).
- Operational Status: Whether the establishment is currently open or closed.
- Date Established: Historical context for trend analysis.
- Data Confidence Level: A rating indicating the accuracy of the information.
How You Can Use This Database:
- Market Analysis: Assess the distribution and density of bars and restaurants globally.
- Site Selection: Identify promising locations for new establishments based on demographics, competition, and visitation metrics of nearby establishments.
- Targeted Marketing: Reach customers near specific establishments with personalized offers.
- Competitive Intelligence: Understand the landscape and identify rivals' strategies.
- Supply Chain Optimization: Streamline logistics based on the distribution of your target establishments.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset offers a detailed overview of restaurant information, including their location, cuisines, average cost, and user ratings. It is designed to facilitate the analysis of various factors influencing restaurant popularity, such as cuisine type, pricing, and the availability of booking and delivery services. The dataset can be instrumental in developing personalised restaurant recommendation systems and gaining insights into the broader food service industry.
The dataset is typically provided in CSV format and comprises approximately 9,531 records.
* Average Cost for two: Costs predominantly fall within the 0.00 - 80,000.00 range.
* Currency: Indian Rupees (Rs.) accounts for 91% of the entries, while Dollar ($) accounts for 5%.
* City: New Delhi represents 57% of the restaurants, Gurgaon 12%, and other cities account for 31%. There are 8,918 unique city values.
* Locality: Connaught Place and Rajouri Garden each represent 1% of localities, with 98% falling into other categories. There are 9,330 unique locality values.
* Longitude: Values range from -158 to 175, with a significant concentration between 75.00 and 108.28 (8,064 entries).
* Latitude: Values range from -41.3 to 56, with a large number of entries between 26.78 and 36.52 (7,911 entries).
* Cuisines: North Indian cuisine accounts for 10%, North Indian, Chinese for 5%, and other cuisine combinations for 85%.
This dataset is ideal for:
* Developing restaurant recommendation systems to suggest personalised dining options based on user preferences, location, and restaurant attributes.
* Analysing factors affecting restaurant popularity, such as cuisine type, pricing, table booking availability, and online delivery services.
* Gaining insights into the food delivery industry dynamics.
* Solving problem statements related to the influence of location on cost, the relationship between cuisine type and ratings, the correlation between cost and ratings, and the impact of booking/delivery options on ratings.
The dataset's geographic scope is global, with a strong focus on cities like New Delhi (57%) and Gurgaon (12%) in India, and other cities making up the remaining 31%. The time range and specific demographic scope of the data are not specified in the available information.
CC0
Original Data Source: Global Zomato Dataset
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Motivation: Entity Matching is the task of determining which records from different data sources describe the same real-world entity. It is an important task for data integration and has been the focus of many research works. A large number of entity matching/record linkage tasks have been made available for evaluating entity matching methods. However, the lack of fixed development and test splits, as well as of correspondence sets that include both matching and non-matching record pairs, hinders the reproducibility and comparability of benchmark experiments. In an effort to enhance the reproducibility and comparability of the experiments, we complement existing entity matching benchmark tasks with fixed sets of non-matching pairs as well as fixed development and test splits.
Dataset Description: An augmented version of the fodors-zagats restaurants dataset for benchmarking entity matching/record linkage methods, found at: https://hpi.de/en/naumann/projects/data-integration-data-quality-and-data-cleansing/dude.html#c11471 The augmented version adds a fixed set of non-matching pairs to the original dataset. In addition, fixed splits for training, validation and testing, as well as their corresponding feature vectors, are provided. The feature vectors are built using data type specific similarity metrics. The dataset contains 533 records describing restaurants from fodors.com, which are matched against 331 restaurant records from zagat.com. The gold standards have manual annotations for 112 matching and 488 non-matching pairs. Five attributes are used to describe the records, and the attribute density is 100%. The augmented dataset enhances the reproducibility of matching methods and the comparability of matching results. The dataset is part of the CompERBench repository, which provides 21 complete benchmark tasks for entity matching for public download: http://data.dws.informatik.uni-mannheim.de/benchmarkmatchingtasks/index.html
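To make the idea of a similarity-based feature vector for a record pair concrete, here is a rough sketch. It is not the benchmark's actual feature construction: the two string metrics, the attribute names (`name`, `addr`, `city`) and the sample records are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard_tokens(a: str, b: str) -> float:
    """Jaccard overlap of the whitespace-separated, lowercased token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def feature_vector(left: Dict[str, str], right: Dict[str, str],
                   attributes: List[str]) -> List[float]:
    """Two similarity scores per attribute, in attribute order."""
    features: List[float] = []
    for attr in attributes:
        features.append(string_similarity(left[attr], right[attr]))
        features.append(jaccard_tokens(left[attr], right[attr]))
    return features


# Hypothetical records; the attribute names are illustrative only.
ATTRS = ["name", "addr", "city"]
fodors = {"name": "Example Grill", "addr": "12 Main Street", "city": "Los Angeles"}
zagat = {"name": "The Example Grill", "addr": "12 Main St.", "city": "Los Angeles"}

vec = feature_vector(fodors, zagat, ATTRS)
```

A classifier trained on such vectors, with the gold-standard matching and non-matching pairs as labels, would then decide whether two records refer to the same restaurant.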
MealMe offers in-depth restaurant menu data, including prices, from the top 100,000 restaurants across the USA and Canada. Our proprietary technology collects accurate, real-time menu and pricing information, enabling businesses to make data-driven decisions in competitive intelligence, pricing optimization, and market research. With comprehensive coverage that spans major restaurant platforms and chains, MealMe ensures your business has access to the most reliable data to excel in a rapidly evolving industry.
Platforms and Restaurants Covered: MealMe's database includes data from leading restaurant platforms such as UberEats, Postmates, ToastTakeout, SkipTheDishes, Square, Appfront, Olo, TouchBistro, and Clover, as well as direct menu data from major restaurant chains including Raising Cane’s, Panda Express, Popeyes, Burger King, and Subway. This extensive coverage ensures a detailed view of the market, helping businesses monitor trends, pricing, and availability across a broad spectrum of restaurant types and sizes.
Key Features:
- Comprehensive Menu Data: Access detailed menu information, including item descriptions, categories, sizes, and customizations.
- Real-Time Pricing: Monitor up-to-date menu prices for accurate competitive analysis.
- Restaurant-Specific Insights: Analyze individual restaurant chains such as Raising Cane’s and Panda Express, or platforms like UberEats, for market trends and pricing strategies.
- Cross-Platform Analysis: Compare menu items and pricing across platforms like ToastTakeout, Olo, and SkipTheDishes for a holistic industry view.
- Regional Data: Understand geographic variations in menu offerings and pricing across the USA and Canada.
Use Cases:
- Competitive Intelligence: Track menu offerings, pricing strategies, and seasonal trends across platforms like UberEats and Postmates or chains like Popeyes and Subway.
- Market Research: Identify gaps in the market by analyzing menus and pricing from top restaurants.
- Pricing Optimization: Use real-time pricing data to inform dynamic pricing strategies and promotions.
- Trend Monitoring: Stay ahead by tracking popular menu items, regional preferences, and emerging food trends.
- Platform Analysis: Assess how restaurants perform across delivery platforms such as SkipTheDishes, Olo, and Square.
Industries Benefiting from Our Data:
- Restaurant Chains: Optimize menu offerings and pricing strategies with detailed competitor data.
- Food Delivery Platforms: Benchmark menu pricing and availability across competitive platforms.
- Market Research Firms: Conduct detailed analyses to identify opportunities and market trends.
- AI & Analytics Companies: Power recommendation engines and predictive models with robust menu data.
- Consumer Apps: Enhance app experiences with accurate menu and pricing data.
Data Delivery and Integration
MealMe offers flexible integration options to ensure seamless access to our comprehensive menu data. Whether you need bulk exports for in-depth research or real-time updates via API, our solutions are designed to scale with your business needs.
Why Choose MealMe?
- Extensive Coverage: Menu data from 100,000+ restaurants, including major chains like Burger King and Raising Cane’s.
- Real-Time Accuracy: Up-to-date pricing and menu details for actionable insights.
- Customizable Solutions: Tailored datasets to meet your specific business objectives.
- Proven Expertise: Trusted by top companies for delivering reliable, actionable data.
MealMe empowers businesses with the data needed to thrive in a competitive restaurant and food delivery market. For more information or to request a demo, contact us today!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘🍕 Pizza restaurants and Pizzas on their Menus’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/pizza-restaurants-and-pizzas-on-their-menuse on 13 February 2022.
--- Dataset description provided by original source is as follows ---
About this Data
This is a list of over 3,500 pizzas from multiple restaurants provided by Datafiniti's Business Database. The dataset includes the category, name, address, city, state, menu information, price range, and more for each pizza restaurant.
Note that this is a sample of a large dataset. The full dataset is available through Datafiniti.
What You Can Do with this Data
You can use this data to discover how much you can expect to pay for pizza across the country. E.g.:
- What are the least and most expensive cities for pizza?
- What is the number of restaurants serving pizza per capita (100,000 residents) across the U.S.?
- What is the median price of a large plain pizza across the U.S.?
- Which cities have the most restaurants serving pizza per capita (100,000 residents)?
Data Schema
A full schema for the data is available in our support documentation.
About Datafiniti
Datafiniti provides instant access to web data. We compile data from thousands of websites to create standardized databases of business, product, and property information. Learn more.
Interested in the Full Dataset?
Get this data and more by creating a free Datafiniti account or requesting a demo.
This dataset was created by Datafiniti and contains around 10000 samples along with Longitude, Price Range Max, technical information and other features such as: - Date Updated - Categories - and more.
- Analyze Date Added in relation to Province
- Study the influence of Price Range Min on Address
If you use this dataset in your research, please credit Datafiniti
--- Original source retains full ownership of the source dataset ---
This dataset contains lists of Restaurants and their menus in the USA that are partnered with Uber Eats. Data was collected via web scraping using python libraries.
*This dataset is dedicated to the awesome delivery drivers of Uber Eats, hence the cover image
kaggle API Command
!kaggle datasets download -d ahmedshahriarsakib/uber-eats-usa-restaurants-menus
The dataset has two CSV files:
- restaurants.csv (40k+ entries, 11 columns)
  - Price range ($ = Inexpensive, $$ = Moderately expensive, $$$ = Expensive, $$$$ = Very Expensive) - Source - stackoverflow
- restaurant-menus.csv (3.71M entries, 5 columns)
Data was scraped from https://www.ubereats.com, an online food ordering and delivery platform launched by Uber in 2014. Users can read menus, reviews, and ratings, then order and pay for food from participating restaurants using an application on the iOS or Android platforms, or through a web browser. Users are also able to tip for delivery. Payment is charged to a card on file with Uber. Meals are delivered by couriers using cars, scooters, bikes, or on foot. It is operational in over 6,000 cities across 45 countries.
The data and information in this data set are intended for educational purposes only. I do not own any of the data, and all rights are reserved to the respective owners.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains 12.7k entries of restaurants and cafes from all over Bangladesh.
This dataset was collected from Google Maps using the Google Places API. To learn about the scraping process, kindly visit my GitHub repository. The dataset has 8 columns containing information about restaurants:
- place id: A unique identifier of a place on Google Maps
- name: Name of the restaurant
- latitude
- longitude
- rating: Rating of the restaurant (0 - 5.0)
- number of reviews: Total number of reviews given
- affluence: Price level of the restaurant (1.0 -> Cheap, 2.0 -> Moderate, 3.0 -> Expensive, 4.0 -> Very Expensive)
- address: Address of the restaurant
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Allegheny County Health Department has generated this list of fast food restaurants by exporting all chain restaurants without an alcohol permit from the County’s Fee and Permit System. A chain restaurant, as defined by the County, is any restaurant that has more than one location in the County. Chain restaurants capture both local and national chains (including locally owned national chains) so long as there is one or more establishments in operation within the County.
Support for Health Equity datasets and tools provided by Amazon Web Services (AWS) through their Health Equity Initiative.
Hi 👋, the food industry has grown rapidly. It produces a lot of restaurants, each with its own value, whether in the type of food, the price, or the location, and it has become a target market for new businesses. So why don't we collect data about these restaurants? TripAdvisor is the place to find all the information we need.
This dataset was scraped from Tripadvisor, the world's largest travel platform. It contains information that helps travelers around the world find the best accommodations, restaurants, experiences, airlines, and cruises, covering everything a traveler needs to know, from the name to the reviews of previous customers. Here we focused only on restaurants in Saudi Arabia, since improving tourism has recently been a hot topic.
This data contains information about restaurants in 3 main cities in Saudi Arabia: Jeddah, Riyadh, and Dammam. There are 4 CSV files: one for each city, and a larger one that combines all 3. The fields are:
- name: the name of the restaurant
- type: the type of food it serves
- location: the full location of the restaurant
- review score: how many points it received
- review number: how many people gave feedback
- city: where it is located
- opening hours: when it opens and when it closes
- price range: from - until
- out_of: its rank among the other restaurants serving the same type of food
- address_line1: extracted from location
- address_line2: extracted from location
- type 2: extracted from type
This data is taken from the TripAdvisor website, and this project was required in order to graduate from the GA Data Science Immersive course.
Many things inspired me to do this. One is that restaurants and cafes are really important destinations when it comes to entertainment, and from a business perspective a restaurant is almost always a successful business if it is well planned. So I thought about classifying this data to find the best location for a specific type of food, in order to help any user or new business choose the perfect location. You can also combine these data to do prediction or even recommendations. After all, due to the current circumstances, I really missed going out 😢 Maybe that was the main reason 🙈.
There are many contexts where dyadic data are present. In social networks, users are linked to a variety of items, defining interactions. In the social platform of TripAdvisor, users are linked to restaurants by means of reviews posted by them. Using the information of these interactions, we can get valuable insights for forecasting, proposing tasks related to recommender systems, sentiment analysis, text-based personalisation or text summarisation, among others. Furthermore, in the context of TripAdvisor there is a scarcity of public datasets and lack of well-known benchmarks for model assessment. We present six new TripAdvisor datasets from the restaurants of six different cities: London, New York, New Delhi, Paris, Barcelona and Madrid. If you use this data, please cite the following paper under submission process (preprint - arXiv).
We exclusively collected the reviews written in English from the restaurants of each city. The tabular data is comprised of a set of six different CSV files, containing numerical, categorical and text features:
- parse_count: numerical (integer), corresponding number of extracted review by the web scraper (auto-incremental)
- author_id: categorical (string), univocal, incremental and anonymous identifier of the user (UID_XXXXXXXXXX)
- restaurant_name: categorical (string), name of the restaurant matching the review
- rating_review: numerical (integer), review score in the range 1-5
- sample: categorical (string), indicating “positive” sample for scores 4-5 and “negative” for scores 1-3
- review_id: categorical (string), univocal and internal identifier of the review (review_XXXXXXXXX)
- title_review: text, review title
- review_preview: text, preview of the review, truncated in the website when the text is very long
- review_full: text, complete review
- date: timestamp, publication date of the review in the format (day, month, year)
- city: categorical (string), city of the restaurant which the review was written for
- url_restaurant: text, restaurant url
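The rule behind the sample field (scores 4-5 are "positive", scores 1-3 are "negative") is simple enough to reproduce when checking or rebuilding the labels. A minimal sketch, where the function name `sample_label` and the example rows are made up for illustration:

```python
def sample_label(rating_review: int) -> str:
    """Map a 1-5 review score to the binary label described for the sample field."""
    if not 1 <= rating_review <= 5:
        raise ValueError(f"rating_review out of range: {rating_review}")
    return "positive" if rating_review >= 4 else "negative"


# Hypothetical rows using the documented column names.
reviews = [
    {"review_id": "review_000000001", "rating_review": 5},
    {"review_id": "review_000000002", "rating_review": 3},
    {"review_id": "review_000000003", "rating_review": 1},
]
labels = [sample_label(r["rating_review"]) for r in reviews]
```

Note that a score of 3 falls on the negative side under this rule, which matters if the labels are later used to train a sentiment classifier.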
The Wake County health department inspects food service facilities throughout Wake County. The department permits and inspects these facilities, and responds to citizen complaints. In the event of disease outbreak, the department investigates to determine the source of the infection, and prevent further illness.
This dataset captures the restaurants that are inspected. The data set is geocoded based on address with approximately 85% of the locations having a valid geo-location.
You can find out additional information about our restaurant inspections on our website: Food Safety and Sanitation
This table captures all Wake County sanitation inspections from September 20, 2012 to Present.
This table is part of a set of data that combined will give you a picture of all restaurant inspections. Those three tables are:
1. Restaurants: This table captures all active facilities where Wake County performs sanitation inspections. Facilities that are closed are removed from all three files in this dataset. Per NC State regulations, facilities that have a change in ownership are considered closed and the restaurant re-opens under a new permit, even if there is not a change in the name of the restaurant.
2. Food Inspections: This table captures all sanitation inspections that Wake County has performed at active restaurants since September 20, 2012.
3. Food Inspection Violations: This table captures all violations identified during specific Wake County sanitation inspections at active restaurants since September 20, 2012. It reports the results in code violations and according to CDC Risk Factors. You can find additional information about the CDC Risk Factors on the FDA website: Retail Risk Factor Study (http://www.fda.gov/Food/GuidanceRegulation/RetailFoodProtection/FoodborneIllnessRiskFactorReduction/ucm224321.htm)
The tables can be connected through the HSISID field and the Permit ID field.
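As a rough illustration of connecting the tables through HSISID, the sketch below groups violation rows under the restaurant they belong to. The HSISID values, the `Name` column and the sample rows are invented for the example; only `HSISID`, `ShortDesc` and `PointValue` come from the field list of the violations table.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical rows; real data would be loaded from the published tables.
restaurants = [
    {"HSISID": "A001", "Name": "Example Diner"},
    {"HSISID": "A002", "Name": "Sample Cafe"},
]
violations = [
    {"HSISID": "A001", "ShortDesc": "Food contact surfaces", "PointValue": 1.5},
    {"HSISID": "A001", "ShortDesc": "Handwashing", "PointValue": 2.0},
    {"HSISID": "A999", "ShortDesc": "Closed facility", "PointValue": 1.0},
]


def violations_by_restaurant(
    restaurants: List[Dict[str, Any]], violations: List[Dict[str, Any]]
) -> Dict[str, List[Dict[str, Any]]]:
    """Group violation rows by restaurant name, joining on HSISID."""
    names = {r["HSISID"]: r["Name"] for r in restaurants}
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for v in violations:
        # Skip violations whose facility is not in the active restaurant table.
        if v["HSISID"] in names:
            grouped[names[v["HSISID"]]].append(v)
    return dict(grouped)


joined = violations_by_restaurant(restaurants, violations)
total_points = {name: sum(v["PointValue"] for v in rows) for name, rows in joined.items()}
```

The same lookup pattern works with the Permit ID field if that is the key shared by the tables being joined.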
The frequency of facility inspections falls under the following rules:
Inspected once per year:
Risk Category I applies to food service establishments that prepare only non-potentially hazardous foods.
Inspected twice per year:
Risk Category II applies to food service establishments that cook and cool no more than two potentially hazardous foods. Potentially hazardous raw ingredients shall be received in a ready-to-cook form.
Inspected three times per year:
Risk Category III applies to food service establishments that cook and cool no more than three potentially hazardous foods.
Inspected four times per year:
Risk Category IV applies to food service establishments that cook and cool an unlimited number of potentially hazardous foods. This category also includes those facilities using specialized processes or serving a highly susceptible population.
Field | Description |
---|---|
HSISID | State code identifying the restaurant (also the primary key to identify the restaurant) |
InspectDate | Date of Inspection |
Category | NC Risk Factor |
StateCode | NC Administrative Code |
Critical | Point or procedure in a specific food system where loss of control may result in an unacceptable health risk |
QuestionNo | Inspection question number |
ViolationCode | NC Food Code Violation codes |
Severity | Core: General sanitation, SOP's, facilities, structure |
ShortDesc | Short description of violation |
InspectedBy | Name of Inspector |
Comments | Comments from Inspector |
PointValue | Number of points assigned to this particular violation |
ObservationType | Compliance status: IN, OUT, NA, NO |
ViolationType | R: Repeat, VR: Verification Required, CDI: Corrected During Inspection |
CDCRiskFactor | Risk factor established by the CDC and FDA |
CDCDataItem | Item within Risk Factor |
PermitID | The permit issued for this facility |
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains 1,000 text reviews gathered from various restaurants, with each review clearly marked as either positive or negative. It has been created with beginners in mind, particularly for those delving into the fields of sentiment analysis and natural language processing (NLP). The dataset serves as an excellent starting point for understanding how to process and classify textual data.
The dataset is provided as a CSV (Comma-Separated Values) file, named Beginner_Reviews_dataset.csv. It has a file size of approximately 66.84 kB. The dataset consists of 1,000 records or rows, with each row representing a single restaurant review and its corresponding sentiment label.
This dataset is designed to be user-friendly for those new to data science. It can be utilised to train and evaluate sentiment analysis models, making it ideal for binary classification tasks. It is well-suited for educational purposes, assisting learners in developing skills in text preprocessing, feature extraction, and various classification algorithms within the NLP domain.
The reviews included in this dataset originate from various restaurants, implying a global scope rather than a specific geographic region. There is no specific time range for the reviews themselves detailed in the provided information, nor any particular demographic focus beyond being restaurant reviews.
CC0
This dataset is primarily intended for beginners in sentiment analysis and natural language processing. It is suitable for: * Students learning text analytics and machine learning. * New practitioners looking for simple datasets to practise building classification models. * Anyone interested in educational projects involving text data and sentiment classification.
Original Data Source: ❤️ vs 😡: Sentiment Analysis 📝
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
MIT Restaurant Corpus - CRFs (Conditional Random Fields) Dataset
A Funny Dive into Restaurant Reviews 🥳🍽️
Welcome to the MIT Restaurant Corpus - CRF Dataset! If you are someone who loves food, restaurants and all the jargon that comes with them, then you are in for a treat! (Pun intended! 😉) Let's break it down in the most delicious way!
This dataset, obtained from the MIT Restaurant Corpus (https://sls.csail.mit.edu/downloads/restaurant/), provides valuable restaurant review data for NER (Named Entity Recognition) tasks. With entities such as ratings, locations and cuisines, it is perfect for building CRF models. 🏷️🍴 Let's dive into this rich resource and find out what it can do! 📊📍
The MIT Restaurant Corpus is designed to help you understand the intricacies of restaurant reviews and how data about restaurants can be parsed and classified. It has a set of files structured to give you all the ingredients required to build CRF (Conditional Random Field) models for NER (Named Entity Recognition). Here is what is served:
1. **‘sent_train’** 📝: This file contains a collection of sentences. But not just any sentences. These are sentences taken from real-world restaurant reviews! Each sentence is on its own line. It is like a dish of text, one sentence at a time.
2. **‘sent_test’** 🍽️: Just like the ‘sent_train’ file, this one contains sentences, but they’re for testing purposes. Think of it as the "taste test" phase of your restaurant review trip. The sentences here help you assess how well your model has learned the art of NER.
3. **‘label_train’** 🏷️: Now here’s where the magic happens. This file holds the NER labels or tags corresponding to each token in the ‘sent_train’ file. So, for every word in a sentence, there is a related label. It helps the model know what is what, whether it’s a restaurant name, location, or dish. This file is like a guide to identifying the stars of the show!
4. **‘label_test’** 📋: This file is just like ‘label_train’, but for testing. It allows you to verify whether your model's predictions match the reality of the restaurant world. Will your model guess that "Burrito Palace" is the name of a restaurant? You will find out here!
So, in short, there is a beautiful one-to-one mapping between the ‘sent_train’/‘sent_test’ files and the ‘label_train’/‘label_test’ files. Each sentence is paired with its NER tags, which makes the dataset an ideal recipe for training and testing your model.
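The one-to-one mapping between sentence files and label files can be checked with a few lines of Python. This is a sketch under assumptions: tokens and labels are taken to be whitespace-separated on each line, the BIO-style tags in the example are illustrative rather than quoted from the files, and `pair_tokens_with_labels` is a hypothetical helper name.

```python
from typing import List, Tuple


def pair_tokens_with_labels(
    sentence_lines: List[str], label_lines: List[str]
) -> List[List[Tuple[str, str]]]:
    """Zip each sentence with its label line, failing loudly on any misalignment."""
    if len(sentence_lines) != len(label_lines):
        raise ValueError("sentence and label files have different numbers of lines")
    paired = []
    for line_no, (sentence, labels) in enumerate(zip(sentence_lines, label_lines), start=1):
        tokens, tags = sentence.split(), labels.split()
        if len(tokens) != len(tags):
            raise ValueError(f"line {line_no}: {len(tokens)} tokens but {len(tags)} labels")
        paired.append(list(zip(tokens, tags)))
    return paired


# In practice the lines would come from files such as sent_train and label_train.
sentences = ["great sushi at sushi central"]
labels = ["O B-Dish O B-Restaurant_Name I-Restaurant_Name"]
pairs = pair_tokens_with_labels(sentences, labels)
```

Running this check before training catches truncated or shifted files early, which is cheaper than debugging a model trained on misaligned labels.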
The real star of this dataset is the NER tags. If you’re thinking, "Okay, but what are we actually trying to identify in these restaurant reviews?", well, here is the menu of NER labels you will be working with:
These NER tags help create an understanding of all the data you encounter in a restaurant review. You will be able to easily pull out names, prices, ratings, dishes, and more. Talk about a full-course data meal!
Now, once you get your hands on this delicious dataset, what do you do with it? It's **CRF model** cooking time! 🍳
CRF (Conditional Random Field) is a great way to label sequences of data, such as sentences. Since NER is about tagging each token (word) in a sentence, CRF models are ideal. They use the context around each word to make predictions. So, when a sentence like "wonderful for Sushi in Sushi Central!" is passed in, the model can work out that "Sushi Central" is a Restaurant_Name and "sushi" is a Dish.
Next, we dive into defining features for the CRF model. Features are like the secret ingredients that make your model work. You will learn how to define them in Python, so your model can recognize patterns and make accurate predictions.
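One common way to define CRF features in Python is a function that turns each token into a dictionary describing the word and its neighbours. The sketch below is an assumption about what such features might look like, not the feature set used by the dataset's author; the feature names are arbitrary. A list of such dictionaries per sentence is the input format accepted by CRF libraries such as sklearn-crfsuite.

```python
from typing import Any, Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, Any]:
    """Describe token i by its own shape and by the words immediately around it."""
    word = tokens[i]
    features: Dict[str, Any] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.isdigit": word.isdigit(),
        "word.istitle": word.istitle(),
        "suffix3": word[-3:],
    }
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(tokens: List[str]) -> List[Dict[str, Any]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


tokens = ["great", "sushi", "at", "Sushi", "Central"]
feats = sentence_features(tokens)
```

The neighbouring-word features are what give the model its context: "Central" preceded by "sushi" looks much more like part of a restaurant name than "Central" on its own.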
...
Xverum’s Store Location Data offers unmatched global coverage of retail, restaurant, and business locations - spanning 230M+ verified POIs across 5000+ commercial categories in over 249 countries.
Whether you're launching a new retail concept, mapping competitor presence, or enriching your analytics platform with real-world business locations - our bulk dataset helps you unlock rich geospatial context.
What’s Included: ➡️ Store Locations & Addresses: Geocoded with latitude/longitude, city, postal code, country. ➡️ Business Metadata: Brand names, categories & subcategories (e.g., Restaurants, Grocery, Clothing). ➡️ Store Details (if available): Website, phone number, operating hours. ➡️ Structured Delivery: Available in .json via S3 bucket or other cloud storage.
🚫 No Foot Traffic or Mobility Data: Clean, static POI data for precise business intelligence use cases.
Use Cases: ✔️ Retail Site Selection & Market Expansion ✔️ Restaurant Chain Mapping & Competitive Benchmarking ✔️ POI Enrichment for Mapping Platforms & Apps ✔️ Real Estate & Urban Planning Analytics ✔️ Location-Based Targeting & Geospatial Analysis
Why Choose Xverum: ✅ 230M+ Store & Business POIs updated regularly ✅ Global coverage across 249+ countries ✅ 5000+ categories from retail and F&B to professional services ✅ Delivered in bulk only - ideal for enterprise data teams ✅ Privacy-compliant (GDPR/CCPA) & ethically sourced
Request your free sample today and discover how Xverum’s store location data can elevate your retail insights, POI mapping, or expansion planning.
This data includes the name and location of active food service establishments and the violations that were found at the time of the inspection. Active food service establishments include only establishments that are currently operating. This dataset excludes inspections conducted in New York City (https://data.cityofnewyork.us/Health/Restaurant-Inspection-Results/4vkw-7nck), Suffolk County (http://apps.suffolkcountyny.gov/health/Restaurant/intro.html) and Erie County (http://www.healthspace.com/erieny). Inspections are a “snapshot” in time and are not always reflective of the day-to-day operations and overall condition of an establishment. Occasionally, remediation may not appear until the following month due to the timing of the updates. Update frequencies and availability of historical inspection data may vary from county to county. Some counties provide this information on their own websites and information found there may be updated more frequently. This dataset is refreshed on a monthly basis. The inspection data contained in this dataset was not collected in a manner intended for use as a restaurant grading system, and should not be construed or interpreted as such. Any use of this data to develop a restaurant grading system is not supported or endorsed by the New York State Department of Health. For more information, visit http://www.health.ny.gov/regulations/nycrr/title_10/part_14/subpart_14-1.htm or go to the “About” tab.
POI data has become essential to fuel spatial data models thanks to the granularity of the information provided.
At Echo, it became our mission to build the most accurate POI datasets worldwide, so companies could get ultra-detailed information on stores, restaurants, hotels, and any area of interest to their business.
To do so, we combined public & private data sources with our proprietary algorithm to ensure every insight we provide is always reliable. All insights included are regularly updated on a monthly or quarterly basis for anyone looking to have the most recent information.
With Echo's Point-of-Interest dataset you obtain: - More than 28 attributes (see below) per POI - 55M+ POIs to choose from worldwide - Possibility to match any POI with Mobility data, to analyse the movement around specific locations - Support from our data experts every step of the way
Using Point-of-Interest data will help with: - Analysing the constant change of the physical world (identify business' openings or closings, analyse urban expansion, etc.) - Discovering uncharted growth opportunities before competitors - Making safer investment decisions based on an accurate understanding of new trends - Analysing the locations around your Points of Interest and adapting your strategy accordingly
Data you can expect: - Metadata (country, region, city, coordinates, address, categories, description, operating hours, and more) - Contacts (phone contacts, email, website) - Social profiles (LinkedIn, Twitter, Instagram, Facebook) - Reviews (reviews and ratings on different sites) - Menus (categories, items, prices, descriptions and photos) - Other info (awards, Michelin stars, executive chef, popular dishes, average meal price, and more) - Photos (ambience, food, menu photos)
Let us know if you have a specific request, and we'll try to fulfil it.
How we deliver data: - We transform it to fit your system's data schema (easing the pain and cost of involving data engineers on your side) - We are completely flexible on the delivery format and method.