The Yelp Dataset is a valuable resource for academic research, teaching, and learning. It provides a rich collection of real-world data related to businesses, reviews, and user interactions. Here are the key details about the Yelp Dataset:
Reviews: 6,990,280 reviews from users.
Businesses: Information on 150,346 businesses.
Pictures: A collection of 200,100 pictures.
Metropolitan Areas: Data from 11 metropolitan areas.
Tips: 908,915 tips provided by 1,987,897 users.
Business Attributes: More than 1.2 million business attributes, such as hours, parking availability, and ambience.
Aggregated Check-ins: Historical check-in data for each of 131,930 businesses.
https://brightdata.com/license
Use our Yelp dataset to discover and review local businesses, such as restaurants, bars, cafes, hotels, and more. The Yelp dataset is a complementary dataset to the Yelp businesses overview and includes full information on each review posted for a business. Datapoints include: timestamp, business_id, review_author, rating, date, content, review_image, reactions, replies, and more.
https://choosealicense.com/licenses/other/
Dataset Card for YelpReviewFull
Dataset Summary
The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data.
Supported Tasks and Leaderboards
text-classification, sentiment-classification: The dataset is mainly used for text classification: given the text, predict the sentiment.
Languages
The reviews were mainly written in English.
Dataset Structure
Data Instances
A… See the full description on the dataset page: https://huggingface.co/datasets/Yelp/yelp_review_full.
This dataset provides comprehensive business information and reviews from Yelp. It includes detailed business data, customer reviews, ratings, and search capabilities for local businesses and restaurants. It suits applications that require local business intelligence and customer feedback analysis. The dataset is delivered in JSON format via a REST API.
This dataset is a subset of Yelp's businesses, reviews, and user data. It was originally put together for the Yelp Dataset Challenge which is a chance for students to conduct research or analysis on Yelp's data and share their discoveries. In the most recent dataset you'll find information about businesses across 8 metropolitan areas in the USA and Canada.
This dataset contains five JSON files and the user agreement. More information about those files can be found here.
In Python, you can read the JSON files like this (using the json and pandas libraries):
import json
import pandas as pd

# Each line of the file is a separate JSON object.
with open("yelp_academic_dataset_checkin.json", encoding="utf-8") as data_file:
    data = [json.loads(line) for line in data_file]
checkin_df = pd.DataFrame(data)
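Some of the JSON files are large, so it can help to look at a few records before loading a whole file into memory. The following is a minimal sketch that uses only the standard library and assumes the same one-JSON-object-per-line layout as above; the helper names iter_json_lines and head are hypothetical and not part of the dataset's tooling.

```python
import json
from itertools import islice
from typing import Iterator


def iter_json_lines(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    # Assumption: the file is UTF-8 encoded, with one JSON object per line.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def head(path: str, n: int = 5) -> list:
    """Return the first n records without reading the rest of the file."""
    return list(islice(iter_json_lines(path), n))
```

For example, head("yelp_academic_dataset_checkin.json", 3) would return the first three records as dictionaries, which is usually enough to see the field names before building a DataFrame.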
Listing of geocoded businesses, inspections for those businesses, and health violations for those businesses, used as a feed to Yelp. All files are CSV files. Data Dictionary Type Contact: Gerald Kaforski (gerald.kaforski@louisvilleky.gov)
Large Yelp Review Dataset. This is a dataset for binary sentiment classification. We provide a set of 560,000 highly polar Yelp reviews for training, and 38,000 for testing.
ORIGIN
The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data. For more information, please refer to http://www.yelp.com/dataset
The Yelp reviews polarity dataset was constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the above dataset. It was first used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
DESCRIPTION
The Yelp reviews polarity dataset is constructed by considering stars 1 and 2 negative, and 3 and 4 positive. For each polarity, 280,000 training samples and 19,000 testing samples are taken randomly. In total there are 560,000 training samples and 38,000 testing samples. Negative polarity is class 1, and positive polarity is class 2.
The files train.csv and test.csv contain the training and testing samples, respectively, as comma-separated values. There are 2 columns in them, corresponding to class index (1 and 2) and review text. The review texts are escaped using double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed by an "n" character, that is "\n".
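The escaping rules above can be handled with Python's standard csv module. This is a minimal sketch rather than an official loader: the function name read_polarity_csv is hypothetical, and the UTF-8 encoding is an assumption that the description does not state.

```python
import csv
from typing import Iterator, Tuple


def read_polarity_csv(path: str) -> Iterator[Tuple[int, str]]:
    """Yield (class_index, review_text) pairs from train.csv or test.csv."""
    # newline="" lets the csv module handle line endings itself.
    # Assumption: the files are UTF-8 encoded.
    with open(path, newline="", encoding="utf-8") as handle:
        # The default dialect already reads "" inside a quoted field as one ".
        for class_index, text in csv.reader(handle):
            # Turn the two-character sequence backslash + n back into a newline.
            yield int(class_index), text.replace("\\n", "\n")
```

Each yielded pair holds the class index (1 for negative, 2 for positive) and the review text with quotes and line breaks restored.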
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('yelp_polarity_reviews', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset was created by PrivacyMatters
Released under Database: Open Database, Contents: © Original Authors
This dataset was created by Fateme Soleimani
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Yelp reported $0 in debt for its fiscal quarter ending in March 2025. Data for Yelp (YELP) debt, including historical values, tables, and charts, was last updated by Trading Economics in July 2025.
https://academictorrents.com/nolicensespecified
At Yelp, there are lots of photos and lots of users uploading photos. These photos provide rich local business information across categories. Teaching a computer to understand the context of these photos is not an easy task. Yelp engineers work on deep learning image classification projects in-house, and you can read about them here. In this competition, you are given photos that belong to a business and asked to predict the business attributes. There are 9 different attributes in this problem:
0: good_for_lunch
1: good_for_dinner
2: takes_reservations
3: outdoor_seating
4: restaurant_is_expensive
5: has_alcohol
6: has_table_service
7: ambience_is_classy
8: good_for_kids
These labels are annotated by the Yelp community. Your task is to predict these labels purely from the business photos uploaded by users. Since Yelp is a community-driven website, there are duplicated images in the dataset. They are mainly due to: users accidentally upload the same photo to the same business more than
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains TripAdvisor and Yelp review data, and tweets related to points of interest in Florida and New York.
Keywords: twitter, yelp, Florida, New York, data mining
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Yelp reported $50.96M in EBITDA for its fiscal quarter ending in March 2025. Data for Yelp (YELP) EBITDA, including historical values, tables, and charts, was last updated by Trading Economics in July 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the Yelp Collaborative Knowledge Graph (YCKG), a transformation of the Yelp Open Dataset into RDF format using Y2KG.
Paper Abstract
The Yelp Open Dataset (YOD) contains data about businesses, reviews, and users from the Yelp website and is available for research purposes. This dataset has been widely used to develop and test Recommender Systems (RS), especially those using Knowledge Graphs (KGs), e.g., integrating taxonomies, product categories, business locations, and social network information. Unfortunately, researchers applied naive or wrong mappings while converting YOD in KGs, consequently obtaining unrealistic results. Among the various issues, the conversion processes usually do not follow state-of-the-art methodologies, fail to properly link to other KGs and reuse existing vocabularies. In this work, we overcome these issues by introducing Y2KG, a utility to convert the Yelp dataset into a KG. Y2KG consists of two components. The first is a dataset including (1) a vocabulary that extends Schema.org with properties to describe the concepts in YOD and (2) mappings between the Yelp entities and Wikidata. The second component is a set of scripts to transform YOD in RDF and obtain the Yelp Collaborative Knowledge Graph (YCKG). The design of Y2KG was driven by 16 core competency questions. YCKG includes 150k businesses and 16.9M reviews from 1.9M distinct real users, resulting in over 244 million triples (with 144 distinct predicates) for about 72 million resources, with an average in-degree and out-degree of 3.3 and 12.2, respectively.
Links
Latest GitHub release: https://github.com/MadsCorfixen/The-Yelp-Collaborative-Knowledge-Graph/releases/latest
PURL domain: https://purl.archive.org/domain/yckg
Files
yelp_schema_mappings.nt.gz: contains the mappings from Yelp categories to Schema things.
schema_hierarchy.nt.gz: contains the full hierarchy of the mapped Schema things.
yelp_wiki_mappings.nt.gz: contains the mappings from Yelp categories to Wikidata entities.
wikidata_location_mappings.nt.gz: contains the mappings from Yelp locations to Wikidata entities.
yelp_categories.ttl: contains metadata for all Yelp categories.
yelp_entities.ttl: contains metadata regarding the dataset.
yelp_vocabulary.ttl: contains metadata on the created Yelp vocabulary and properties.
yelp_category_schema_mappings.csv: contains the 310 mappings from Yelp categories to Schema types. These mappings have been manually verified to be correct.
yelp_predicate_schema_mappings.csv: contains the 14 mappings from Yelp attributes to Schema properties. These mappings were found manually.
ground_truth_yelp_category_schema_mappings.csv: contains the ground truth, based on 200 manually verified mappings from Yelp categories to Schema things. The ground truth mappings were used to calculate precision and recall for the semantic mappings.
manually_split_categories.csv: contains all Yelp categories containing either a & or /, and their manually split versions. The split versions have been used in the semantic mappings to Schema things.
https://brightdata.com/license
Use the Yelp Reviews dataset to explore ratings and reviews for local businesses, including restaurants, bars, cafes, and hotels. Popular use cases include analyzing customer sentiment, benchmarking business performance, and gaining insights into local market trends. Datapoints include: business ID, review author, rating, date, content, image, and more.
Discover and evaluate local business & user reviews with OpenWeb Ninja's Yelp consumer review data and business listings data real-time API. Our API covers a variety of business categories including restaurants, bars, cafes, hotels, and more.
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
This dataset is a publicly available fake review dataset from the Yelp website, including the YelpChi, YelpNYC, and YelpZIP datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We are using the Yelp Review Dataset as the streaming data source for the DataCI example. We have processed the Yelp review dataset into a daily-based dataset by its `date`. In this dataset, we will only use the data from 2020-09-01 to 2020-11-30 to simulate the streaming data scenario. We are downloading two versions of the training and validation datasets:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the data sets of food-related point features that we used for the different tasks in our user study. The data was extracted from the Yelp Open Dataset.