100+ datasets found

Yelp Dataset
paperswithcode.com
Cite
(2024). Yelp Dataset [Dataset]. https://paperswithcode.com/dataset/yelp
Explore at:
Description
The Yelp Dataset is a valuable resource for academic research, teaching, and learning. It provides a rich collection of real-world data related to businesses, reviews, and user interactions. Here are the key details about the Yelp Dataset: Reviews: 6,990,280 reviews from users. Businesses: Information on 150,346 businesses. Pictures: A collection of 200,100 pictures. Metropolitan Areas: Data from 11 metropolitan areas. Tips: 908,915 tips provided by 1,987,897 users. Business Attributes: More than 1.2 million business attributes, such as hours, parking availability, and ambience. Aggregated Check-ins: Historical check-in data for each of 131,930 businesses.
yelp_review_full
huggingface.co
Updated Mar 6, 2012
Cite
Yelp (2012). yelp_review_full [Dataset]. https://huggingface.co/datasets/Yelp/yelp_review_full
Explore at:
Dataset updated
Mar 6, 2012
Dataset authored and provided by
Yelp (http://yelp.com/)
License
https://choosealicense.com/licenses/other/
Description
Dataset Card for YelpReviewFull

Dataset Summary

The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data.

Supported Tasks and Leaderboards

text-classification, sentiment-classification: The dataset is mainly used for text classification: given the text, predict the sentiment.

Languages

The reviews were mainly written in English.

Dataset Structure Data Instances

A… See the full description on the dataset page: https://huggingface.co/datasets/Yelp/yelp_review_full.

Yelp dataset 2024

kaggle.com

Updated Oct 29, 2024

Cite

snax07 (2024). Yelp dataset 2024 [Dataset]. https://www.kaggle.com/datasets/snax07/yelp-dataset-2024

Explore at:

Dataset updated

Oct 29, 2024

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

snax07

License

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

Description

Yelp Dataset JSON Each file is composed of a single object type, one JSON-object per-line.

Take a look at some examples to get you started: https://github.com/Yelp/dataset-examples.

Note: the following examples contain inline comments, which are technically not valid JSON. They are included here to simplify the documentation and explain the structure; the JSON files you download will not contain any comments and will be fully valid JSON.

business.json Contains business data including location data, attributes, and categories.

{
// string, 22 character unique string business id
"business_id": "tnhfDv5Il8EaGSXZGiuQGg",

// string, the business's name
"name": "Garaje",

// string, the full address of the business
"address": "475 3rd St",

// string, the city
"city": "San Francisco",

// string, 2 character state code, if applicable
"state": "CA",

// string, the postal code
"postal_code": "94107",

// float, latitude
"latitude": 37.7817529521,

// float, longitude
"longitude": -122.39612197,

// float, star rating, rounded to half-stars
"stars": 4.5,

// integer, number of reviews
"review_count": 1198,

// integer, 0 or 1 for closed or open, respectively
"is_open": 1,

// object, business attributes to values. note: some attribute values might be objects
"attributes": {
  "RestaurantsTakeOut": true,
  "BusinessParking": {
    "garage": false,
    "street": true,
    "validated": false,
    "lot": false,
    "valet": false
  },
},

// an array of strings of business categories
"categories": [
  "Mexican",
  "Burgers",
  "Gastropubs"
],

// an object of key day to value hours, hours are using a 24hr clock
"hours": {
  "Monday": "10:00-21:00",
  "Tuesday": "10:00-21:00",
  "Friday": "10:00-21:00",
  "Wednesday": "10:00-21:00",
  "Thursday": "10:00-21:00",
  "Sunday": "11:00-18:00",
  "Saturday": "10:00-21:00"
}

}

review.json Contains full review text data including the user_id that wrote the review and the business_id the review is written for.

{
// string, 22 character unique review id
"review_id": "zdSx_SD6obEhz9VrW9uAWA",

// string, 22 character unique user id, maps to the user in user.json
"user_id": "Ha3iJu77CxlrFm-vQRs_8g",

// string, 22 character business id, maps to business in business.json
"business_id": "tnhfDv5Il8EaGSXZGiuQGg",

// integer, star rating
"stars": 4,

// string, date formatted YYYY-MM-DD
"date": "2016-03-09",

// string, the review itself
"text": "Great place to hang out after work: the prices are decent, and the ambience is fun. It's a bit loud, but very lively. The staff is friendly, and the food is good. They have a good selection of drinks.",

// integer, number of useful votes received
"useful": 0,

// integer, number of funny votes received
"funny": 0,

// integer, number of cool votes received
"cool": 0

}

user.json User data including the user's friend mapping and all the metadata associated with the user.

{
// string, 22 character unique user id, maps to the user in user.json
"user_id": "Ha3iJu77CxlrFm-vQRs_8g",

// string, the user's first name
"name": "Sebastien",

// integer, the number of reviews they've written
"review_count": 56,

// string, when the user joined Yelp, formatted like YYYY-MM-DD
"yelping_since": "2011-01-01",

// array of strings, an array of the user's friend as user_ids
"friends": [
  "wqoXYLWmpkEH0YvTmHBsJQ",
  "KUXLLiJGrjtSsapmxmpvTA",
  "6e9rJKQC3n0RSKyHLViL-Q"
],

// integer, number of useful votes sent by the user
"useful": 21,

// integer, number of funny votes sent by the user
"funny": 88,

// integer, number of cool votes sent by the user
"cool": 15,

// integer, number of fans the user has
"fans": 1032,

// array of integers, the years the user was elite
"elite": [
  2012,
  2013
],

// float, average rating of all reviews
"average_stars": 4.31,

// integer, number of hot compliments received by the user
"compliment_hot": 339,

// integer, number of more compliments received by the user
"compliment_more": 668,

// integer, number of profile compliments received by the user
"compliment_profile": 42,

// integer, number of cute compliments received by the user
"compliment_cute": 62,

// integer, number of list compliments received by the user
"compliment_list": 37,

// integer, number of note compliments received by the user
"compliment_note": 356,

// integer, number of plain compliments received by the user
"compliment_plain": 68,

// integer, number of coo...
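
The schema above links each review.json record to a business.json record through business_id. As a minimal sketch of that join, the following Python reads newline-delimited JSON and averages review stars per business. The inline records are made-up samples that reuse the documented field names, and the helper names (parse_json_lines, mean_review_stars) are hypothetical, not part of any Yelp tooling.

```python
import json
from collections import defaultdict

# Inline stand-ins for business.json and review.json (one JSON object per line).
# These records are invented samples that reuse the documented field names.
BUSINESS_LINES = """\
{"business_id": "b1", "name": "Garaje", "stars": 4.5, "is_open": 1}
{"business_id": "b2", "name": "Hypothetical Cafe", "stars": 3.0, "is_open": 0}
"""
REVIEW_LINES = """\
{"review_id": "r1", "user_id": "u1", "business_id": "b1", "stars": 4}
{"review_id": "r2", "user_id": "u2", "business_id": "b1", "stars": 5}
{"review_id": "r3", "user_id": "u1", "business_id": "b2", "stars": 2}
"""


def parse_json_lines(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def mean_review_stars(businesses, reviews):
    """Return {business name: mean review stars}, joined on business_id."""
    names = {b["business_id"]: b["name"] for b in businesses}
    stars = defaultdict(list)
    for review in reviews:
        stars[review["business_id"]].append(review["stars"])
    # Reviews whose business_id has no matching business record are ignored.
    return {
        names[bid]: sum(vals) / len(vals)
        for bid, vals in stars.items()
        if bid in names
    }


businesses = parse_json_lines(BUSINESS_LINES)
reviews = parse_json_lines(REVIEW_LINES)
averages = mean_review_stars(businesses, reviews)
print(averages)
```

For the full-size files, the same join would likely need to stream the review file rather than hold every record in a list.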

yelp-dataset
huggingface.co
Updated Apr 17, 2024
+ more versions
Cite
Noah Nkalubo Nsimbe (2024). yelp-dataset [Dataset]. https://huggingface.co/datasets/noahnsimbe/yelp-dataset
Explore at:
Dataset updated
Apr 17, 2024
Authors
Noah Nkalubo Nsimbe
License
https://choosealicense.com/licenses/other/
Description
Dataset Card for Dataset Name

This dataset card aims to be a base template for new datasets. It has been generated using this raw template.

Dataset Details Dataset Description

Curated by: [More Information Needed] Funded by [optional]: [More Information Needed] Shared by [optional]: [More Information Needed] Language(s) (NLP): [More Information Needed] License: [More Information Needed]

Dataset Sources [optional]

Repository: [More… See the full description on the dataset page: https://huggingface.co/datasets/noahnsimbe/yelp-dataset.
Yelp Datasets
brightdata.com
.json, .csv, .xlsx
Cite
Bright Data, Yelp Datasets [Dataset]. https://brightdata.com/products/datasets/yelp
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Use our Yelp dataset to discover and review local businesses, such as restaurants, bars, cafes, hotels, and more. The Yelp dataset is a complementary dataset to the Yelp businesses overview and includes full information on each review posted for a business. Data points include: timestamp, business_id, review_author, rating, date, content, review_image, reactions, replies, and more.
Yelp-Fraud Dataset
paperswithcode.com
Updated Apr 21, 2025
Cite
Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu (2025). Yelp-Fraud Dataset [Dataset]. https://paperswithcode.com/dataset/yelpchi
Explore at:
Dataset updated
Apr 21, 2025
Authors
Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu
Description
Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

Dataset Statistics

# Nodes     % Fraud Nodes (Class=1)
45,954      14.5

Relation    # Edges
R-U-R
R-T-R
R-S-R       3,402,743
All

Graph Construction

The Yelp spam review dataset includes hotel and restaurant reviews filtered (spam) and recommended (legitimate) by Yelp. We conduct a spam review detection task on the Yelp-Fraud dataset which is a binary classification task. We take 32 handcrafted features from SpEagle paper as the raw node features for Yelp-Fraud. Based on previous studies which show that opinion fraudsters have connections in user, product, review text, and time, we take reviews as nodes in the graph and design three relations: 1) R-U-R: it connects reviews posted by the same user; 2) R-S-R: it connects reviews under the same product with the same star rating (1-5 stars); 3) R-T-R: it connects two reviews under the same product posted in the same month.
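
The three relations described above can be sketched as a grouping step followed by pairwise linking. The Python below is an illustration under stated assumptions, not the authors' construction code: the toy reviews, the field names (user, product, stars, month), and the helper relation_edges are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy reviews; every field name and value here is invented for illustration.
reviews = [
    {"id": 0, "user": "u1", "product": "p1", "stars": 5, "month": "2012-03"},
    {"id": 1, "user": "u1", "product": "p2", "stars": 1, "month": "2012-04"},
    {"id": 2, "user": "u2", "product": "p1", "stars": 5, "month": "2012-03"},
    {"id": 3, "user": "u3", "product": "p1", "stars": 2, "month": "2012-03"},
]


def relation_edges(reviews, key):
    """Connect every pair of reviews that share the same key(review) value."""
    groups = defaultdict(list)
    for review in reviews:
        groups[key(review)].append(review["id"])
    edges = set()
    for ids in groups.values():
        # Undirected edges stored as (smaller id, larger id).
        edges.update(combinations(sorted(ids), 2))
    return edges


# R-U-R: reviews posted by the same user.
r_u_r = relation_edges(reviews, lambda r: r["user"])
# R-S-R: reviews under the same product with the same star rating.
r_s_r = relation_edges(reviews, lambda r: (r["product"], r["stars"]))
# R-T-R: reviews under the same product posted in the same month.
r_t_r = relation_edges(reviews, lambda r: (r["product"], r["month"]))

print(sorted(r_u_r), sorted(r_s_r), sorted(r_t_r))
```

Pairwise linking grows quadratically with group size, which is one plausible reason a relation such as R-S-R can reach millions of edges on roughly 46 thousand nodes.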

To download the dataset, please visit this Github repo. For any other questions, please email ytongdou(AT)gmail.com for inquiry.
Yelp Reviews Dataset
brightdata.com
.json, .csv, .xlsx
Updated Nov 25, 2024
Cite
Bright Data (2024). Yelp Reviews Dataset [Dataset]. https://brightdata.com/products/datasets/yelp/reviews
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
Nov 25, 2024
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Use the Yelp Reviews dataset to explore ratings and reviews for local businesses, including restaurants, bars, cafes, and hotels. Popular use cases include analyzing customer sentiment, benchmarking business performance, and gaining insights into local market trends. Data points include: business ID, review author, rating, date, content, image, and more.
yelp_polarity_reviews
tensorflow.org
Updated Dec 6, 2022
+ more versions
Cite
(2022). yelp_polarity_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/yelp_polarity_reviews
Explore at:
Dataset updated
Dec 6, 2022
Description
Large Yelp Review Dataset. This is a dataset for binary sentiment classification. We provide a set of 560,000 highly polar yelp reviews for training, and 38,000 for testing. ORIGIN The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data. For more information, please refer to http://www.yelp.com/dataset

The Yelp reviews polarity dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the above dataset. It is first used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

DESCRIPTION

The Yelp reviews polarity dataset is constructed by considering stars 1 and 2 negative, and 3 and 4 positive. For each polarity, 280,000 training samples and 19,000 testing samples are taken randomly. In total there are 560,000 training samples and 38,000 testing samples. Negative polarity is class 1, and positive is class 2.

The files train.csv and test.csv contain all the training and testing samples, respectively, as comma-separated values. Each file has 2 columns, corresponding to class index (1 and 2) and review text. The review texts are escaped using double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed by an "n" character, that is "\n".
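
As a minimal sketch of the escaping rules just described, the Python below parses two made-up rows with the standard csv module, which handles the doubled quotes by default, and then converts the escaped newlines back into real ones. The sample rows, the helper name read_polarity_rows, and the remapping of classes 1/2 to labels 0/1 are assumptions for illustration, not part of the dataset.

```python
import csv
import io

# Two made-up rows in the documented layout: class index, then review text.
# "\\n" below is a literal backslash + n inside the file; "\n" ends each row.
SAMPLE_CSV = (
    '"1","The waiter said ""no"" twice.\\nNever again."\n'
    '"2","Great tacos."\n'
)


def read_polarity_rows(handle):
    """Yield (label, text); label 0 = negative (class 1), 1 = positive (class 2)."""
    for class_index, text in csv.reader(handle):
        # csv.reader has already turned "" back into "; undo the \n escape here.
        yield int(class_index) - 1, text.replace("\\n", "\n")


rows = list(read_polarity_rows(io.StringIO(SAMPLE_CSV)))
for label, text in rows:
    print(label, repr(text))
```

For the real files, the same function could be given a handle from open("train.csv", newline="", encoding="utf-8"), assuming the files are UTF-8 encoded.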

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('yelp_polarity_reviews', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
yelp-csv
kaggle.com
Updated Jan 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Flyer Steve (2023). yelp-csv [Dataset]. https://www.kaggle.com/datasets/flyersteve/yelp-csv/suggestions
Explore at:
Dataset updated
Jan 30, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Flyer Steve
Description
Dataset

This dataset was created by Flyer Steve

Contents
Louisville Metro KY - YELP Data businesses
datasets.ai
catalog.data.gov
15, 21, 3, 8
Updated Sep 11, 2024
+ more versions
Cite
Louisville Metro Government (2024). Louisville Metro KY - YELP Data businesses [Dataset]. https://datasets.ai/datasets/louisville-metro-ky-yelp-data-businesses
Explore at:
Available download formats: 15, 21, 8, 3
Dataset updated
Sep 11, 2024
Dataset authored and provided by
Louisville Metro Government
Area covered
Louisville, Kentucky
Description
Listing of geocoded businesses, inspections for those businesses, and health violations for those businesses, used as a feed to Yelp. All files are csv files.
Data Dictionary Type
Contact:
Gerald Kaforski
gerald.kaforski@louisvilleky.gov
Yelp reviews - Full
academictorrents.com
bittorrent
Updated Oct 16, 2018
Cite
Xiang Zhang et al., 2015 (2018). Yelp reviews - Full [Dataset]. https://academictorrents.com/details/66ab083bda0c508de6c641baabb1ec17f72dc480
Explore at:
Available download formats: bittorrent (196146755)
Dataset updated
Oct 16, 2018
Dataset authored and provided by
Xiang Zhang et al., 2015
License
https://academictorrents.com/nolicensespecified
Description
1,569,264 samples from the Yelp Dataset Challenge 2015. This full dataset has 130,000 training samples and 10,000 testing samples in each star.
Yelp Dataset
kaggle.com
zip
Updated Mar 17, 2022
+ more versions
Cite
Yelp, Inc. (2022). Yelp Dataset [Dataset]. https://www.kaggle.com/yelp-dataset/yelp-dataset
Explore at:
Available download formats: zip (4374983563 bytes)
Dataset updated
Mar 17, 2022
Dataset provided by
Yelp (http://yelp.com/)
Authors
Yelp, Inc.
Description
Context

This dataset is a subset of Yelp's businesses, reviews, and user data. It was originally put together for the Yelp Dataset Challenge which is a chance for students to conduct research or analysis on Yelp's data and share their discoveries. In the most recent dataset you'll find information about businesses across 8 metropolitan areas in the USA and Canada.

Content

This dataset contains five JSON files and the user agreement. More information about those files can be found here.

Code snippet to read the files

In Python, you can read the JSON files like this (using the json and pandas libraries):

import json
import pandas as pd

# Each line of the file holds one JSON object.
data = []
with open("yelp_academic_dataset_checkin.json") as data_file:
    for line in data_file:
        data.append(json.loads(line))
checkin_df = pd.DataFrame(data)
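
Because the archive is several gigabytes, loading every record into a list may not fit in memory on smaller machines. As a hedged alternative, the sketch below streams a newline-delimited JSON file one record at a time and keeps only a running count per business. The helper name iter_json_lines and the tiny temporary sample file are hypothetical; only the one-object-per-line layout comes from the description.

```python
import json
import os
import tempfile
from collections import Counter


def iter_json_lines(path):
    """Yield one parsed object per non-empty line without loading the whole file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


# Small stand-in file; the records and business_id values are invented.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"business_id": "b1"}\n{"business_id": "b2"}\n{"business_id": "b1"}\n')
    sample_path = tmp.name

# Only the counter stays in memory, not the records themselves.
counts = Counter(record["business_id"] for record in iter_json_lines(sample_path))
os.remove(sample_path)
print(counts.most_common(1))
```

Pointing iter_json_lines at one of the downloaded files would work the same way, assuming the file is UTF-8 encoded.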
yelp
huggingface.co
Cite
MultifacetedNLPDatasets, yelp [Dataset]. https://huggingface.co/datasets/recmeapp/yelp
Explore at:
Authors
MultifacetedNLPDatasets
Description
A quick usage example of Yelp dataset.

# install datasets library
%pip install datasets

# import load_dataset
from datasets import load_dataset

# Reading the Dataset
ds = load_dataset("recmeapp/yelp", "main_data")

# Reading the App MetaData
app_metadata = load_dataset("recmeapp/yelp", "app_meta")

# How many dialogs are there in different splits?
train_data = ds['train']
valid_data = ds['val']
test_data = ds['test']

print(f'There are… See the full description on the dataset page: https://huggingface.co/datasets/recmeapp/yelp.
Yelp Open Dataset
live.european-language-grid.eu
json
Updated Dec 30, 2015
Cite
Yelp (2015). Yelp Open Dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/5179
Explore at:
Available download formats: json
Dataset updated
Dec 30, 2015
Dataset authored and provided by
Yelp (http://yelp.com/)
License
https://s3-media0.fl.yelpcdn.com/assets/srv0/engineering_pages/bea5c1e92bf3/assets/vendor/yelp-dataset-agreement.pdf
Description
Dataset containing millions of reviews on Yelp. In addition it contains business data including location data, attributes, and categories.
Yelp 2015
figshare.com
txt
Updated May 21, 2018
+ more versions
Cite
Zeping Yu (2018). Yelp 2015 [Dataset]. http://doi.org/10.6084/m9.figshare.6292334.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.6292334.v1
Dataset updated
May 21, 2018
Dataset provided by
Figshare (http://figshare.com/)
Authors
Zeping Yu
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset is a subset of the Yelp Challenge data; it contains all the reviews from 2015.
Yelp reviews - Polarity
academictorrents.com
bittorrent
Updated Oct 16, 2018
Cite
Xiang Zhang et al., 2015 (2018). Yelp reviews - Polarity [Dataset]. https://academictorrents.com/details/271777225ff3c6dec8055e231c70731a1da2518f
Explore at:
Available download formats: bittorrent (166373201)
Dataset updated
Oct 16, 2018
Dataset authored and provided by
Xiang Zhang et al., 2015
License
https://academictorrents.com/nolicensespecified
Description
1,569,264 samples from the Yelp Dataset Challenge 2015. This subset has 280,000 training samples and 19,000 test samples in each polarity.
The Yelp Collaborative Knowledge Graph
data.niaid.nih.gov
Updated Jun 17, 2023
Cite
Olesen, Magnus (2023). The Yelp Collaborative Knowledge Graph [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7878446
Explore at:
Dataset updated
Jun 17, 2023
Dataset provided by
Olesen, Magnus
Corfixen, Mads
Heede, Thomas
Nielsen, Christian Filip Pinderup
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This is The Yelp Collaborative Knowledge Graph (YCKG), a transformation of the Yelp Open Dataset into RDF format using Y2KG.

Paper Abstract

The Yelp Open Dataset (YOD) contains data about businesses, reviews, and users from the Yelp website and is available for research purposes. This dataset has been widely used to develop and test Recommender Systems (RS), especially those using Knowledge Graphs (KGs), e.g., integrating taxonomies, product categories, business locations, and social network information. Unfortunately, researchers applied naive or wrong mappings while converting YOD in KGs, consequently obtaining unrealistic results. Among the various issues, the conversion processes usually do not follow state-of-the-art methodologies, fail to properly link to other KGs and reuse existing vocabularies. In this work, we overcome these issues by introducing Y2KG, a utility to convert the Yelp dataset into a KG. Y2KG consists of two components. The first is a dataset including (1) a vocabulary that extends Schema.org with properties to describe the concepts in YOD and (2) mappings between the Yelp entities and Wikidata. The second component is a set of scripts to transform YOD in RDF and obtain the Yelp Collaborative Knowledge Graph (YCKG). The design of Y2KG was driven by 16 core competency questions. YCKG includes 150k businesses and 16.9M reviews from 1.9M distinct real users, resulting in over 244 million triples (with 144 distinct predicates) for about 72 million resources, with an average in-degree and out-degree of 3.3 and 12.2, respectively.

Links

Latest GitHub release: https://github.com/MadsCorfixen/The-Yelp-Collaborative-Knowledge-Graph/releases/latest

PURL domain: https://purl.archive.org/domain/yckg

Files

Graph Data Triple Files

One sample file for each of the Yelp domains (Businesses, Users, Reviews, Tips and Checkins), each containing 20 entities.

yelp_schema_mappings.nt.gz containing the mappings from Yelp categories to Schema things.

schema_hierarchy.nt.gz containing the full hierarchy of the mapped Schema things.

yelp_wiki_mappings.nt.gz containing the mappings from Yelp categories to Wikidata entities.

wikidata_location_mappings.nt.gz containing the mappings from Yelp locations to Wikidata entities.

Graph Metadata Triple Files

yelp_categories.ttl contains metadata for all Yelp categories.

yelp_entities.ttl contains metadata regarding the dataset

yelp_vocabulary.ttl contains metadata on the created Yelp vocabulary and properties.

Utility Files

yelp_category_schema_mappings.csv. This file contains the 310 mappings from Yelp categories to Schema types. These mappings have been manually verified to be correct.

yelp_predicate_schema_mappings.csv. This file contains the 14 mappings from Yelp attributes to Schema properties. These mappings are manually found.

ground_truth_yelp_category_schema_mappings.csv. This file contains the ground truth, based on 200 manually verified mappings from Yelp categories to Schema things. The ground truth mappings were used to calculate precision and recall for the semantic mappings.

manually_split_categories.csv. This file contains all Yelp categories containing either a & or /, and their manually split versions. The split versions have been used in the semantic mappings to Schema things.
Replication Data for: "A Topic-based Segmentation Model for Identifying...
search.dataone.org
dataverse.harvard.edu
Updated Sep 25, 2024
Cite
Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert (2024). Replication Data for: "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews" [Dataset]. http://doi.org/10.7910/DVN/EE3DE2
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/EE3DE2
Dataset updated
Sep 25, 2024
Dataset provided by
Harvard Dataverse
Authors
Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert
Description
We provide instructions, code, and datasets for replicating the article by Kim, Lee and McCulloch (2024), "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews." This repository provides a user-friendly R package that lets researchers and practitioners apply A Topic-based Segmentation Model with Unstructured Texts (latent class regression with group variable selection) to their own datasets.

First, we provide R code to replicate the illustrative simulation study: see file 1.
Second, we provide the user-friendly R package with a very simple example code to help apply the model to real-world datasets: see file 2, Package_MixtureRegression_GroupVariableSelection.R and Dendrogram.R.
Third, we provide a set of codes and instructions to replicate the empirical studies of customer-level segmentation and restaurant-level segmentation with Yelp reviews data: see files 3-a, 3-b, 4-a, 4-b. Note: due to Yelp's dataset terms of use and the restriction on data size, we provide the link to download the same Yelp datasets (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/versions/6).
Fourth, we provide a set of codes and datasets to replicate the empirical study with professor ratings reviews data: see file 5.
Please see more details in the description text and comments of each file.

[A guide on how to use the code to reproduce each study in the paper]

1. Full codes for replicating Illustrative simulation study.txt -- [see Table 2 and Figure 2 in main text]: This is R source code to replicate the illustrative simulation study. Please run from the beginning to the end in R. In addition to estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships, you will get dendrograms of selected groups of variables in Figure 2. Computing time is approximately 20 to 30 minutes.

3-a. Preprocessing raw Yelp Reviews for Customer-level Segmentation.txt: Code for preprocessing the downloaded unstructured Yelp review data and preparing the DV and IVs matrix for the customer-level segmentation study.

3-b. Instruction for replicating Customer-level Segmentation analysis.txt -- [see Table 10 in main text; Tables F-1, F-2, and F-3 and Figure F-1 in Web Appendix]: Code for replicating the customer-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 3 to 4 hours.

4-a. Preprocessing raw Yelp reviews_Restaruant Segmentation (1).txt: R code for preprocessing the downloaded unstructured Yelp data and preparing the DV and IVs matrix for the restaurant-level segmentation study.

4-b. Instructions for replicating restaurant-level segmentation analysis.txt -- [see Tables 5, 6 and 7 in main text; Tables E-4 and E-5 and Figure H-1 in Web Appendix]: Code for replicating the restaurant-level segmentation study with Yelp. You will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 10 to 12 hours.

[Guidelines for running Benchmark models in Table 6]

Unsupervised Topic model: 'topicmodels' package in R -- after determining the number of topics (e.g., with the 'ldatuning' R package), run the 'LDA' function in the 'topicmodels' package. Then compute topic probabilities per restaurant (with the 'posterior' function in the package), which can be used as predictors. Then conduct prediction with regression.
Hierarchical topic model (HDP): 'gensimr' R package -- 'model_hdp' function for identifying topics in the package (see https://radimrehurek.com/gensim/models/hdpmodel.html or https://gensimr.news-r.org/).
Supervised topic model: 'lda' R package -- 'slda.em' function for training and 'slda.predict' for prediction.
Aggregate regression: 'lm' default function in R.
Latent class regression without variable selection: 'flexmix' function in the 'flexmix' R package. Run flexmix with a certain number of segments (e.g., 3 segments in this study). Then, with the estimated coefficients and memberships, conduct prediction of the dependent variable for each segment.
Latent class regression with variable selection: 'Unconstraind_Bayes_Mixture' function in Kim, Fong and DeSarbo (2012)'s package. Run the Kim et al. (2012) model with a certain number of segments (e.g., 3 segments in this study). Then, with the estimated coefficients and memberships, conduct prediction of the dependent variable for each segment. The same R package ('KimFongDeSarbo2012.zip') can be downloaded at: https://sites.google.com/scarletmail.rutgers.edu/r-code-packages/home

5. Instructions for replicating Professor ratings review study.txt -- [see Tables G-1, G-2, G-4 and G-5, and Figures G-1 and H-2 in Web Appendix]: Code to replicate the Professor ratings reviews study. Computing time is approximately 10 hours.

[A list of the versions of R, packages, and computer...
Yelp Reviews in Boston, MA
dataverse.harvard.edu
Updated Oct 12, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Qiliang Chen; Riley Tucker; Babak Heydari; Daniel T. O'Brien (2020). Yelp Reviews in Boston, MA [Dataset]. http://doi.org/10.7910/DVN/DMWCBT
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/DMWCBT
Dataset updated
Oct 12, 2020
Dataset provided by
Harvard Dataverse
Authors
Qiliang Chen; Riley Tucker; Babak Heydari; Daniel T. O'Brien
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Massachusetts, Boston
Description
These datasets include information about Yelp restaurant reviews for the city of Boston processed from data scraped by BARI. We have generated a list of Boston restaurants by searching all of Boston's zipcodes on Yelp and then verifying that each identified restaurant has an address that falls within Boston's boundaries. YELP.Reviews is a review-level file that contains information about reviews posted on Yelp. YELP.Restaurants is a restaurant-level file that contains information about the restaurants on Yelp. Restaurant data has been aggregated across census tracts to generate YELP.CT, which includes ecometrics that describe neighborhoods in terms of frequency of reviews.
Same Sentiment Classification Train/Dev/Test Pair IDs
explore.openaire.eu
data.niaid.nih.gov
+1 more
Updated Sep 9, 2021
Cite
Erik Körner; Ahmad Dawar Hakimi; Gerhard Heyer; Martin Potthast (2021). Same Sentiment Classification Train/Dev/Test Pair IDs [Dataset]. http://doi.org/10.5281/zenodo.5495793
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.5495793
Dataset updated
Sep 9, 2021
Authors
Erik Körner; Ahmad Dawar Hakimi; Gerhard Heyer; Martin Potthast
Description
This "dataset" only includes the compiled pairings of the Yelp Business Review Dataset. To get access to the actual review texts, please follow the instructions on the Yelp Dataset webpage. The data format is JSON Lines.

Python Load Example:

import pandas as pd

traindev_df = pd.read_json("df_traindev.jsonl", lines=True)
test_df = pd.read_json("df_test.jsonl", lines=True)

# example access to single business/review id
s1_bid = test_df.iloc[0]["sent1_business_id"]
s1_rid = test_df.iloc[0]["sent1_review_id"]
s2_bid = test_df.iloc[0]["sent2_business_id"]
s2_rid = test_df.iloc[0]["sent2_review_id"]
label = test_df.iloc[0]["is_same_side"]

See documentation at:
Yelp Dataset Schemata (only business.json and review.json were used)
Yelp Business Category Hierarchy (download the json file as all_category_list.json)

For details on how the data was compiled and used in our experiments, please refer to our code repository. Other derived data splits can be reproduced deterministically by using the same random seed as in our experiments.
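
Since the pairing files carry only IDs, the review texts have to be looked up in a separately obtained review.json. The following is a minimal sketch of that lookup using only the standard library. The column names sent1_review_id, sent2_review_id, and is_same_side come from the load example, while the inline sample records and the helper names (load_lines, resolve_pairs) are invented for illustration.

```python
import json

# Invented stand-ins: one pairing row and two review.json records.
PAIR_LINES = (
    '{"sent1_review_id": "r1", "sent2_review_id": "r2", "is_same_side": true}\n'
)
REVIEW_LINES = (
    '{"review_id": "r1", "text": "Loved it."}\n'
    '{"review_id": "r2", "text": "Would come back."}\n'
)


def load_lines(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def resolve_pairs(pairs, reviews):
    """Attach review texts to each pair; pairs with a missing review are dropped."""
    text_by_id = {r["review_id"]: r["text"] for r in reviews}
    resolved = []
    for pair in pairs:
        first = text_by_id.get(pair["sent1_review_id"])
        second = text_by_id.get(pair["sent2_review_id"])
        if first is not None and second is not None:
            resolved.append((first, second, pair["is_same_side"]))
    return resolved


resolved = resolve_pairs(load_lines(PAIR_LINES), load_lines(REVIEW_LINES))
print(resolved)
```

Dropping pairs with a missing review is a design choice made here for simplicity; raising an error instead would surface a mismatch between the pairing files and the downloaded Yelp release.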

6 scholarly articles cite this dataset (View in Google Scholar)
