100+ datasets found
  1. yelp_review_full

    • huggingface.co
    Updated Mar 6, 2012
    Cite
    Yelp (2012). yelp_review_full [Dataset]. https://huggingface.co/datasets/Yelp/yelp_review_full
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 6, 2012
    Dataset authored and provided by
    Yelp (http://yelp.com/)
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Card for YelpReviewFull

      Dataset Summary
    

    The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data.

      Supported Tasks and Leaderboards
    

    text-classification, sentiment-classification: The dataset is mainly used for text classification: given the text, predict the sentiment.

      Languages
    

    The reviews were mainly written in English.

      Dataset Structure
    
    
    
    
    
      Data Instances
    

    A… See the full description on the dataset page: https://huggingface.co/datasets/Yelp/yelp_review_full.

  2. Processed YELP Dataset

    • kaggle.com
    zip
    Updated Dec 16, 2024
    Cite
    Nguyen Hieu Xt (2024). Processed YELP Dataset [Dataset]. https://www.kaggle.com/datasets/nguyenhieuxt/processed-yelp-dataset
    Explore at:
    Available download formats: zip (3655803662 bytes)
    Dataset updated
    Dec 16, 2024
    Authors
    Nguyen Hieu Xt
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset is derived from the YELP Dataset, with the following preprocessing steps applied:

    • All JSON files in the raw dataset have been converted to CSV format.
    • The JSON file containing review data has been split into two subsets: the training set (containing data from 2014 to 2019) and the testing set (containing data from 2020 to 2021).
    • Columns in the user data that begin with the prefix "compliment_" have been removed. A new column, "num_compliments" has been created, representing the sum of all values from the removed columns.
    • The JSON files containing tip and check-in information have been removed, as they are deemed unnecessary for the review-based recommendation system.
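    The preprocessing steps above can be approximated in a few lines. This is a minimal sketch, not the dataset author's actual script: it assumes user rows are dictionaries whose "compliment_" columns hold integer-like values, and that each review carries a `date` field starting with a four-digit year. Both helper names are hypothetical.

```python
def collapse_compliments(user_row):
    """Replace all compliment_* columns with a single num_compliments sum."""
    kept = {k: v for k, v in user_row.items() if not k.startswith("compliment_")}
    kept["num_compliments"] = sum(
        int(v) for k, v in user_row.items() if k.startswith("compliment_")
    )
    return kept


def split_reviews_by_year(reviews):
    """Split reviews into train (2014-2019) and test (2020-2021) by review year."""
    train, test = [], []
    for review in reviews:
        year = int(review["date"][:4])  # assumes dates look like "2018-07-04 ..."
        if 2014 <= year <= 2019:
            train.append(review)
        elif 2020 <= year <= 2021:
            test.append(review)
        # Reviews outside both ranges are dropped in this sketch.
    return train, test
```

    Splitting by date rather than at random keeps the test set strictly later than the training set, which suits the review-based recommendation use the description mentions.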
  3. Grepsr| Yelp Resturants Address and Reviews Data | Global Coverage with...

    • datarade.ai
    Cite
    Grepsr, Grepsr| Yelp Resturants Address and Reviews Data | Global Coverage with Custom and On-demand Datasets [Dataset]. https://datarade.ai/data-products/grepsr-yelp-resturants-address-and-reviews-data-global-cov-grepsr
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset authored and provided by
    Grepsr
    Area covered
    Venezuela (Bolivarian Republic of), Anguilla, Ethiopia, Turkey, Sudan, Iran (Islamic Republic of), Latvia, Saint Lucia, Gambia, United Arab Emirates
    Description

    Use cases that can be supported with Yelp Reviews

    A. Market Research and Analysis: Leverage Yelp data to conduct comprehensive market research and analysis in the restaurant industry. Identify emerging culinary trends, popular cuisines, and customer preferences. Gain a competitive edge by understanding your target audience's needs and expectations.

    B. Competitor Analysis: Compare and contrast your restaurant with competitors on Yelp. Analyze their ratings, customer reviews, and performance metrics to identify strengths and weaknesses. Use these insights to enhance your offerings and stand out in the market.

    C. Reputation Management: Monitor and manage your restaurant's online reputation effectively. Track and analyze customer reviews and ratings on Yelp to identify improvement areas and promptly address negative feedback. Positive reviews can be leveraged for marketing and branding purposes.

    D. Pricing and Revenue Optimization: Leverage the Yelp dataset to analyze pricing strategies and revenue trends in the restaurant sector. Understand seasonal demand fluctuations, pricing patterns, and revenue optimization opportunities to maximize your restaurant's profitability.

    E. Customer Sentiment Analysis: Conduct sentiment analysis on Yelp reviews to gauge customer satisfaction and sentiment towards your restaurant. Use this information to improve dining experiences, address pain points, and enhance overall customer satisfaction.

    F. Content Marketing and SEO: Create compelling content for your restaurant's website based on popular keywords, cuisines, and dining preferences identified in the Yelp dataset. Optimize your content to improve search engine rankings and attract more potential diners.

    G. Personalized Marketing Campaigns: Use Yelp data to segment your target audience based on dining preferences, food habits, and demographics. Develop personalized marketing campaigns that resonate with different customer segments, resulting in higher engagement and repeat business.

    H. Investment and Expansion Decisions: Access historical and real-time data on restaurant performance and market dynamics from Yelp. Utilize this information to make data-driven investment decisions, identify potential areas for expansion, and assess the feasibility of new culinary ventures.

    I. Predictive Analytics: Utilize the Yelp dataset to build predictive models that forecast future trends in the restaurant industry. Anticipate shifts in culinary preferences, understand customer behavior, and make proactive decisions to stay ahead of the competition.

    J. Business Intelligence Dashboards: Create interactive and insightful dashboards that visualize key performance metrics from the Yelp dataset. These dashboards can help restaurant executives and stakeholders get a quick overview of the restaurant's performance and make data-driven decisions.

    Incorporating the Yelp dataset into your business processes will enhance your understanding of the restaurant market, facilitate data-driven decision-making, and provide valuable insights to drive success in the competitive culinary industry.

  4. yelp open data philly restaurants

    • kaggle.com
    zip
    Updated Apr 14, 2024
    Cite
    Cade Apple (2024). yelp open data philly restaurants [Dataset]. https://www.kaggle.com/datasets/capple7/yelp-open-data-philly-restaurants
    Explore at:
    Available download formats: zip (5268181410 bytes)
    Dataset updated
    Apr 14, 2024
    Authors
    Cade Apple
    Description

    YELP DATASET TERMS OF USE

    Last Updated: February 16, 2021

    This document (“Data Agreement”) governs the terms under which you may access and use the data that Yelp makes available for download through this website (or made available by other means) solely for academic or non-commercial purposes (the “Data”).

    Yelp Terms of Service: By accessing or using the Data, you agree to be bound by the Data Agreement and represent that the contact information you provide to Yelp is correct. If you access or use the Data on behalf of a university, school, or other entity, you represent that you have authority to bind such entity and its affiliates to the Data Agreement and that it is fully binding upon them. In such a case, the term “you” and “your” will refer to such an entity and its affiliates. If you do not have authority, or if you do not agree with the terms of the Data Agreement, you may not access or use the Data. You should read and keep a copy of each component of the Data Agreement for your records. In the event of a conflict among them, the terms of this document will control.

    1. Purpose

    The Data is made available by Yelp Inc. (“Yelp”) to enable you to access valuable local information to develop an academic project as part of an ongoing course of study or for non-commercial purposes. With this in mind, Yelp reserves the right to continually review and evaluate all uses of the Data provided under the Data Agreement. Under certain circumstances, Yelp may authorize limited commercial use under certain circumstances, for example, access and use by journalists to explore our data to generate ideas prior to formal data access requests from Yelp’s PR department.

    2. Changes

    Yelp reserves the right to modify or revise the Data Agreement at any time. If the change is deemed to be material and it is foreseeable that such change could be adverse to your interests, Yelp will provide you notice of the change to this Data Agreement by sending you an email to the email you provided to Yelp. Your continued use of the Data after the notice of material change will constitute your acceptance of and agreement to such changes. IF YOU DO NOT WISH TO BE BOUND TO ANY NEW TERMS, YOU MUST TERMINATE THE DATA AGREEMENT BY IMMEDIATELY CEASING USE OF THE DATA AND DELETING IT FROM ANY SYSTEMS OR MEDIA.

    3. License

    Subject to the terms set forth in the Data Agreement (specifically the restrictions set forth in Section 4 below), Yelp grants you a royalty-free, non-exclusive, revocable, non-sublicensable, non-transferable, fully paid-up right and license during the Term to use, access, and create derivative works of the Data in electronic form for solely for non-commercial use. Non-commercial use means use of the Data by registered nonprofits, government, educational institutions, and think tanks which (a) is not undertaken for profit, or (b) is not intended to produce works, services, or data for commercial use. You may not use the Data for any other purpose without Yelp’s prior written consent. You acknowledge and agree that Yelp may request information about, review, audit, and/or monitor your use of the Data at any time in order to confirm compliance with the Data Agreement. Nothing herein shall be construed as a license to use Yelp’s registered trademarks or service marks, or any other Yelp branding. Prior to any public presentation or publication of the academic results or conclusions that involve the Data and/or the Yelp brand name, you must submit your findings to Yelp for review and approval, and Yelp will approve of the public release within five (5) business days of its submission to Yelp.

    4. Restrictions

    You agree that you will not, and will not encourage, assist, or enable others to:

    A. display, perform, or distribute any of the Data, or use the Data to update or create your own business listing information for commercial purposes (i.e. you may not publicly display any of the Data to any third party, especially reviews and other user generated content, as this is a private data set challenge and not a license to compete with or disparage with Yelp);

    B. use the Data in connection with any commercial purpose;

    C. use the Data in any manner or for any purpose that may violate any law or regulation, or any right of any person including, but not limited to, intellectual property rights, rights of privacy and/or rights of personality, or which otherwise may be harmful (in Yelp's sole discretion) to Yelp, its providers, its suppliers, end users of this website, or your end users;

    D. use the Data on behalf of any third party without Yelp’s consent;

    E. create, redistribute or disclose any summary of, or metrics related to, the Data (e.g., the number of reviewed business included in the Data and other statistical analysis) to any third party or on any website or other electronic media not expressly covered by this Agreement or without Yelp’s ...

  5. Yelp Dataset

    • kaggle.com
    zip
    Updated Mar 17, 2022
    Cite
    Yelp, Inc. (2022). Yelp Dataset [Dataset]. https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/code
    Explore at:
    Available download formats: zip (4374983563 bytes)
    Dataset updated
    Mar 17, 2022
    Dataset provided by
    Yelp (http://yelp.com/)
    Authors
    Yelp, Inc.
    Description

    Context

    This dataset is a subset of Yelp's businesses, reviews, and user data. It was originally put together for the Yelp Dataset Challenge which is a chance for students to conduct research or analysis on Yelp's data and share their discoveries. In the most recent dataset you'll find information about businesses across 8 metropolitan areas in the USA and Canada.

    Content

    This dataset contains five JSON files and the user agreement. More information about those files can be found here.

    Code snippet to read the files

    In Python, you can read the JSON files like this (using the json and pandas libraries):

    import json
    import pandas as pd

    # Each line of the file is a separate JSON object, so parse line by line.
    data = []
    with open("yelp_academic_dataset_checkin.json", encoding="utf-8") as data_file:
        for line in data_file:
            data.append(json.loads(line))

    # Build one DataFrame row per parsed record.
    checkin_df = pd.DataFrame(data)
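
    Because the archive is listed at several gigabytes, it may be preferable not to hold every record in memory at once. The following is a minimal sketch of a line-by-line reader using only the standard library; `iter_json_lines` is a hypothetical helper name, and it assumes each non-empty line holds one JSON object, as the snippet above implies.

```python
import json
from itertools import islice


def iter_json_lines(path, limit=None):
    """Yield one parsed record per non-empty line, optionally stopping after `limit`."""
    with open(path, encoding="utf-8") as handle:
        records = (json.loads(line) for line in handle if line.strip())
        # islice with limit=None simply passes every record through.
        yield from islice(records, limit)
```

    For instance, `pd.DataFrame(list(iter_json_lines("yelp_academic_dataset_checkin.json", limit=1000)))` would build a frame from only the first 1,000 records.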
    
    
  6. YELP

    • resodate.org
    • service.tib.eu
    Updated Dec 3, 2024
    Cite
    Yueen Ma; Dafeng Chi; Jingjing Li; Kai Song; Yuzheng Zhuang; Irwin King (2024). YELP [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQveWVscA==
    Explore at:
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Yueen Ma; Dafeng Chi; Jingjing Li; Kai Song; Yuzheng Zhuang; Irwin King
    Description

    The YELP dataset is used for language modeling.

  7. Yelp Reviews

    • resodate.org
    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    Jiacheng Xu; Danlu Chen; Xipeng Qiu; Xuangjing Huang (2024). Yelp Reviews [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQveWVscC1yZXZpZXdz
    Explore at:
    Dataset updated
    Dec 2, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Jiacheng Xu; Danlu Chen; Xipeng Qiu; Xuangjing Huang
    Description

    Yelp Reviews is a large dataset of customer reviews.

  8. Yelp Inc. Alternative Data Analytics

    • meyka.com
    Updated Sep 24, 2025
    Cite
    Meyka (2025). Yelp Inc. Alternative Data Analytics [Dataset]. https://meyka.com/stock/YELP/alt-data/
    Explore at:
    Dataset updated
    Sep 24, 2025
    Dataset provided by
    Description

    Non-traditional data signals from social media and employment platforms for YELP stock analysis

  9. yelp.com Traffic Analytics Data

    • analytics.explodingtopics.com
    Updated Sep 1, 2025
    Cite
    (2025). yelp.com Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/yelp.com
    Explore at:
    Dataset updated
    Sep 1, 2025
    Variables measured
    Global Rank, Monthly Visits, Authority Score, US Country Rank, Online Services Category Rank
    Description

    Traffic analytics, rankings, and competitive metrics for yelp.com as of September 2025

  10. yelp-04-2024

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    Adam Amer (2024). yelp-04-2024 [Dataset]. https://huggingface.co/datasets/adamamer20/yelp-04-2024
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 12, 2024
    Authors
    Adam Amer
    License

    https://choosealicense.com/licenses/other/

    Description

    Yelp Complete Open Dataset 04.2024

      Dataset Description
    

    This dataset contains the complete Yelp Open Dataset, a rich collection of user reviews, business information, and user data. It is a valuable resource for tasks such as sentiment analysis, recommendation systems, and other natural language processing (NLP) projects.

      Source
    

    The dataset is provided by Yelp and is publicly available under the Yelp Dataset Terms of Use.

      Dataset Structure
    

    The dataset… See the full description on the dataset page: https://huggingface.co/datasets/adamamer20/yelp-04-2024.

  11. 🏪Yelp Reviews for Senti-Analysis Binary -N/P+

    • kaggle.com
    zip
    Updated May 9, 2022
    Cite
    Yassir Acharki (2022). 🏪Yelp Reviews for Senti-Analysis Binary -N/P+ [Dataset]. https://www.kaggle.com/datasets/yacharki/yelp-reviews-for-sentianalysis-binary-np-csv
    Explore at:
    Available download formats: zip (169583717 bytes)
    Dataset updated
    May 9, 2022
    Authors
    Yassir Acharki
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
    License information was derived automatically

    Description

    The Yelp reviews polarity dataset is constructed by considering stars 1 and 2 negative, and 3 and 4 positive. For each polarity, 280,000 training samples and 19,000 testing samples are taken randomly. In total there are 560,000 training samples and 38,000 testing samples. Negative polarity is class 1, and positive polarity is class 2.

    The files train.csv and test.csv contain the training and testing samples, respectively, as comma-separated values. There are 2 columns in them, corresponding to class index (1 and 2) and review text. The review texts are escaped using double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed by an "n" character, that is "\n".
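
    The file format described above maps directly onto Python's standard `csv` module. The sketch below is illustrative rather than an official loader: the `LABELS` mapping and the `read_polarity_rows` helper are hypothetical names, and the two sample rows are invented for the example.

```python
import csv
import io

# Class index -> polarity, per the description (1 = negative, 2 = positive).
LABELS = {1: "negative", 2: "positive"}


def read_polarity_rows(handle):
    """Yield (label, text) pairs from a train.csv/test.csv-style stream."""
    # csv.reader already undoes the doubled-quote escaping inside quoted fields.
    for class_index, text in csv.reader(handle):
        # Turn the literal backslash + "n" sequence back into a real newline.
        yield LABELS[int(class_index)], text.replace("\\n", "\n")


# Two made-up rows in the documented format.
sample = io.StringIO('"1","Cold food.\\nNever again."\n"2","She said ""wow"" twice."\n')
rows = list(read_polarity_rows(sample))
```

    When reading the real files, opening them with `open(path, newline="", encoding="utf-8")` follows the `csv` module's own recommendation for file handles.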

  12. Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment &...

    • datarade.ai
    .json, .csv
    Updated Feb 3, 2025
    Cite
    Dataplex (2025). Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment & Location-Based Insights [Dataset]. https://datarade.ai/data-products/dataplex-google-reviews-ratings-dataset-track-consumer-s-dataplex
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Feb 3, 2025
    Dataset authored and provided by
    Dataplex
    Area covered
    United States
    Description

    The Google Reviews & Ratings Dataset provides businesses with structured insights into customer sentiment, satisfaction, and trends based on reviews from Google. Unlike broad review datasets, this product is location-specific—businesses provide the locations they want to track, and we retrieve as much historical data as possible, with daily updates moving forward.

    This dataset enables businesses to monitor brand reputation, analyze consumer feedback, and enhance decision-making with real-world insights. For deeper analysis, optional AI-driven sentiment analysis and review summaries are available on a weekly, monthly, or yearly basis.

    Dataset Highlights

    • Location-Specific Reviews – Reviews and ratings for the locations you provide.
    • Daily Updates – New reviews and rating changes updated automatically.
    • Historical Data Access – Retrieve past reviews where available.
    • AI Sentiment Analysis (Optional) – Summarized insights by week, month, or year.
    • Competitive Benchmarking – Compare performance across selected locations.

    Use Cases

    • Franchise & Retail Chains – Monitor brand reputation and performance across locations.
    • Hospitality & Restaurants – Track guest sentiment and service trends.
    • Healthcare & Medical Facilities – Understand patient feedback for specific locations.
    • Real Estate & Property Management – Analyze tenant and customer experiences through reviews.
    • Market Research & Consumer Insights – Identify trends and analyze feedback patterns across industries.

    Data Updates & Delivery

    • Update Frequency: Daily
    • Data Format: CSV for easy integration
    • Delivery: Secure file transfer (SFTP or cloud storage)

    Data Fields Include:

    • Business Name
    • Location Details
    • Star Ratings
    • Review Text
    • Timestamps
    • Reviewer Metadata

    Optional Add-Ons:

    • AI Sentiment Analysis – Aggregate trends by week, month, or year.
    • Custom Location Tracking – Tailor the dataset to fit your specific business needs.

    Ideal for

    • Marketing Teams – Leverage real-world consumer feedback to optimize brand strategy.
    • Business Analysts – Use structured review data to track customer sentiment over time.
    • Operations & Customer Experience Teams – Identify service issues and opportunities for improvement.
    • Competitive Intelligence – Compare locations and benchmark against industry competitors.

    Why Choose This Dataset?

    • Accurate & Up-to-Date – Daily updates ensure fresh, reliable data.
    • Scalable & Customizable – Track only the locations that matter to you.
    • Actionable Insights – AI-driven summaries for quick decision-making.
    • Easy Integration – Delivered in a structured format for seamless analysis.

    By leveraging Google Reviews & Ratings Data, businesses can gain valuable insights into customer sentiment, enhance reputation management, and stay ahead of the competition.

  13. Walmart reviews data in JSON format

    • crawlfeeds.com
    json, zip
    Updated Aug 26, 2024
    Cite
    Crawl Feeds (2024). Walmart reviews data in JSON format [Dataset]. https://crawlfeeds.com/datasets/walmart-reviews-data-in-json-format
    Explore at:
    Available download formats: zip, json
    Dataset updated
    Aug 26, 2024
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Walmart Product Reviews Dataset provides an extensive collection of customer feedback that can be pivotal for businesses aiming to understand consumer preferences and behaviors. This dataset includes detailed information such as ratings, reviews, and timestamps, making it an invaluable resource for data analysts and market researchers. By analyzing the data, companies can identify trends, detect potential issues with products, and gauge overall customer satisfaction. Whether you're looking to optimize product offerings or enhance customer service, this dataset is a goldmine of actionable insights.

    Leveraging the Walmart Ratings and Reviews Dataset for Competitive Advantage

    Utilizing the Walmart Ratings and Reviews Dataset allows businesses to stay ahead of the competition by tapping into authentic customer experiences. This dataset is particularly useful for sentiment analysis, enabling companies to discern the emotional tone behind customer reviews. By doing so, businesses can refine their marketing strategies, address customer concerns proactively, and improve product development processes. Moreover, integrating this dataset with other data sources can provide a comprehensive view of market dynamics, helping companies make informed, data-driven decisions.

    Walmart ratings and reviews dataset. Last extracted on 16 Aug 2022.

  14. Unlocking User Sentiment: The App Store Reviews Dataset

    • crawlfeeds.com
    json, zip
    Updated Jun 20, 2025
    Cite
    Crawl Feeds (2025). Unlocking User Sentiment: The App Store Reviews Dataset [Dataset]. https://crawlfeeds.com/datasets/app-store-reviews-dataset
    Explore at:
    Available download formats: json, zip
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    This dataset offers a focused and invaluable window into user perceptions and experiences with applications listed on the Apple App Store. It is a vital resource for app developers, product managers, market analysts, and anyone seeking to understand the direct voice of the customer in the dynamic mobile app ecosystem.

    Dataset Specifications:

    • Investment: $45.0
    • Status: Published and immediately available.
    • Category: Ratings and Reviews Data
    • Format: Compressed ZIP archive containing JSON files, ensuring easy integration into your analytical tools and platforms.
    • Volume: Comprises 10,000 unique app reviews, providing a robust sample for qualitative and quantitative analysis of user feedback.
    • Timeliness: Last crawled: not specified in the listing.

    Richness of Detail (11 Comprehensive Fields):

    Each record in this dataset provides a detailed breakdown of a single App Store review, enabling multi-dimensional analysis:

    1. Review Content:

      • review: The full text of the user's written feedback, crucial for Natural Language Processing (NLP) to extract themes, sentiment, and common keywords.
      • title: The title given to the review by the user, often summarizing their main point.
      • isEdited: A boolean flag indicating whether the review has been edited by the user since its initial submission. This can be important for tracking evolving sentiment or understanding user behavior.
    2. Reviewer & Rating Information:

      • username: The public username of the reviewer, allowing for analysis of engagement patterns from specific users (though not personally identifiable).
      • rating: The star rating (typically 1-5) given by the user, providing a quantifiable measure of satisfaction.
    3. App & Origin Context:

      • app_name: The name of the application being reviewed.
      • app_id: A unique identifier for the application within the App Store, enabling direct linking to app details or other datasets.
      • country: The country of the App Store storefront where the review was left, allowing for geographic segmentation of feedback.
    4. Metadata & Timestamps:

      • _id: A unique identifier for the specific review record in the dataset.
      • crawled_at: The timestamp indicating when this particular review record was collected by the data provider (Crawl Feeds).
      • date: The original date the review was posted by the user on the App Store.
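
    The eleven fields listed above can be captured in a small record type. This is a sketch under assumptions: the field names come from the listing, but the Python types and the `parse_review` helper are illustrative guesses, since the listing does not show an actual record.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class AppStoreReview:
    # Field names follow the listing; the types are assumptions.
    review: str
    title: str
    isEdited: bool
    username: str
    rating: int
    app_name: str
    app_id: str
    country: str
    _id: str
    crawled_at: str
    date: str


def parse_review(raw_json):
    """Build an AppStoreReview, failing loudly if any listed field is missing."""
    record = json.loads(raw_json)
    names = [f.name for f in fields(AppStoreReview)]
    missing = [name for name in names if name not in record]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    # Extra keys in the record are ignored in this sketch.
    return AppStoreReview(**{name: record[name] for name in names})
```

    Checking for missing fields up front makes schema drift between crawls visible immediately instead of surfacing later as a confusing error.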

    Expanded Use Cases & Analytical Applications:

    This dataset is a goldmine for understanding what users truly think and feel about mobile applications. Here's how it can be leveraged:

    • Product Development & Improvement:

      • Bug Detection & Prioritization: Analyze negative review text to identify recurring technical issues, crashes, or bugs, allowing developers to prioritize fixes based on user impact.
      • Feature Requests & Roadmap Prioritization: Extract feature suggestions from positive and neutral review text to inform future product roadmap decisions and develop features users actively desire.
      • User Experience (UX) Enhancement: Understand pain points related to app design, navigation, and overall usability by analyzing common complaints in the review field.
      • Version Impact Analysis: If integrated with app version data, track changes in rating and sentiment after new app updates to assess the effectiveness of bug fixes or new features.
    • Market Research & Competitive Intelligence:

      • Competitor Benchmarking: Analyze reviews of competitor apps (if included or combined with similar datasets) to identify their strengths, weaknesses, and user expectations within a specific app category.
      • Market Gap Identification: Discover unmet user needs or features that users desire but are not adequately provided by existing apps.
      • Niche Opportunities: Identify specific use cases or user segments that are underserved based on recurring feedback.
    • Marketing & App Store Optimization (ASO):

      • Sentiment Analysis: Perform sentiment analysis on the review and title fields to gauge overall user satisfaction, pinpoint specific positive and negative aspects, and track sentiment shifts over time.
      • Keyword Optimization: Identify frequently used keywords and phrases in reviews to optimize app store listings, improving discoverability and search ranking.
      • Messaging Refinement: Understand how users describe and use the app in their own words, which can inform marketing copy and advertising campaigns.
      • Reputation Management: Monitor rating trends and identify critical reviews quickly to facilitate timely responses and proactive customer engagement.
    • Academic & Data Science Research:

      • Natural Language Processing (NLP): The review and title fields are excellent for training and testing NLP models for sentiment analysis, topic modeling, named entity recognition, and text summarization.
      • User Behavior Analysis: Study patterns in rating distribution, isEdited status, and date to understand user engagement and feedback cycles.
      • Cross-Country Comparisons: Analyze country-specific reviews to understand regional differences in app perception, feature preferences, or cultural nuances in feedback.
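
    As one concrete illustration of the rating-distribution and cross-country analyses mentioned above, the sketch below tallies star ratings per storefront country. It assumes each record is a dictionary with the `country` and `rating` fields described earlier; both function names are hypothetical.

```python
from collections import Counter, defaultdict


def rating_distribution_by_country(reviews):
    """Map each country to a Counter of star rating -> number of reviews."""
    distribution = defaultdict(Counter)
    for review in reviews:
        distribution[review["country"]][int(review["rating"])] += 1
    return dict(distribution)


def mean_rating(counter):
    """Average star rating from a Counter of rating -> count."""
    total = sum(counter.values())
    return sum(stars * count for stars, count in counter.items()) / total
```

    Keeping the full distribution rather than only the mean lets polarized feedback (many 1-star and 5-star reviews) be told apart from uniformly average feedback.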

    This App Store Reviews dataset provides a direct, unfiltered conduit to understanding user needs and ultimately driving better app performance and greater user satisfaction. Its structured format and granular detail make it an indispensable asset for data-driven decision-making in the mobile app industry.

  15. Product and Price Data, Product Reviews Data from Google Shopping |...

    • datarade.ai
    .json, .csv
    Cite
    OpenWeb Ninja, Product and Price Data, Product Reviews Data from Google Shopping | Ecommerce Data | Real-Time API [Dataset]. https://datarade.ai/data-products/openweb-ninja-product-data-product-reviews-data-more-fro-openweb-ninja
    Explore at:
    Available download formats: .json, .csv
    Dataset authored and provided by
    OpenWeb Ninja
    Area covered
    Yemen, Namibia, Martinique, Libya, Réunion, Mexico, Kosovo, Taiwan, Nigeria, Guinea
    Description

    OpenWeb Ninja's Product Data API provides Product Data, Product Reviews Data, Product Offers, sourced in real-time from Google Shopping - the largest product listings aggregate on the web, listing products from all publicly available e-commerce sites (Amazon, eBay, Walmart + many others).

    The API covers more than 35 billion Product Data Listings, including Product Reviews and Product Offers across the web. The API provides over 40 product data points including prices, rating and reviews insights, product details and specs, typical price ranges, and more.

    OpenWeb Ninja's Product Data common use cases:

    • Price Optimization & Price Comparison
    • Market Research & Competitive Analysis
    • Product Research & Trend Analysis
    • Customer Reviews Analysis

    OpenWeb Ninja's Product Data Stats & Capabilities:

    • 35B+ Product Listings
    • 40+ data points per product listing
    • Global aggregate
    • Search by keyword or GTIN/EAN

  16. yelp+amazon+imdb REVIEWS

    • kaggle.com
    zip
    Updated May 9, 2022
    Cite
    Akash Kumar (2022). yelp+amazon+imdb REVIEWS [Dataset]. https://www.kaggle.com/datasets/akashkumar01/yelpamazonimdb
    Explore at:
    Available download formats: zip (82239 bytes)
    Dataset updated
    May 9, 2022
    Authors
    Akash Kumar
    Description

    Review dataset from Amazon, Yelp and IMDb. This dataset can be used for NLP sentiment analysis. Level: beginner.

    Labels: '1' is a positive sentiment; '0' is a negative sentiment.

  17. Replication Data for: "Authentic and amazing": authenticity as an evaluative...

    • dataverse.harvard.edu
    Updated Feb 12, 2024
    Cite
    Dominick Boyle (2024). Replication Data for: "Authentic and amazing": authenticity as an evaluative category in online consumer restaurant reviews. [Dataset]. http://doi.org/10.7910/DVN/9JVSMI
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 12, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Dominick Boyle
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset corresponds to the paper "'Authentic and amazing': authenticity as an evaluative category in online consumer restaurant reviews" appearing in Cultural Analytics. It provides the R scripts used for preparing, analyzing, and importing the data into Sketch Engine; the ID lists of the reviews in Corpus 1, 2 and 3; and the authenticity lexicons used, which were derived from O'Connor et al. (2017) under a CC BY 4.0 license. The IDs correspond to those in the Yelp Dataset at the time of data collection (2019).

  18. Replication Data for: "A Topic-based Segmentation Model for Identifying...

    • search.dataone.org
    Updated Sep 25, 2024
    Cite
    Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert (2024). Replication Data for: "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews" [Dataset]. http://doi.org/10.7910/DVN/EE3DE2
    Explore at:
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert
    Description

    We provide instructions, code, and datasets for replicating the article by Kim, Lee and McCulloch (2024), "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews." This repository provides a user-friendly R package that lets researchers and practitioners apply A Topic-based Segmentation Model with Unstructured Texts (latent class regression with group variable selection) to their own datasets.

    • First, we provide R code to replicate the illustrative simulation study: see file 1.
    • Second, we provide the user-friendly R package with a very simple example code to help apply the model to real-world datasets: see file 2, Package_MixtureRegression_GroupVariableSelection.R and Dendrogram.R.
    • Third, we provide code and instructions to replicate the empirical studies of customer-level segmentation and restaurant-level segmentation with Yelp reviews data: see files 3-a, 3-b, 4-a, 4-b. Note: due to Yelp's dataset terms of use and the restriction on data size, we provide a link to download the same Yelp datasets (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/versions/6).
    • Fourth, we provide code and datasets to replicate the empirical study with professor ratings reviews data: see file 5.

    Please see more details in the description text and comments of each file.

    [A guide on how to use the code to reproduce each study in the paper]

    1. Full codes for replicating Illustrative simulation study.txt -- [see Table 2 and Figure 2 in main text]: R source code to replicate the illustrative simulation study. Please run it from beginning to end in R. In addition to estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships, you will get dendrograms of selected groups of variables in Figure 2. Computing time is approximately 20 to 30 minutes.

    3-a. Preprocessing raw Yelp Reviews for Customer-level Segmentation.txt: Code for preprocessing the downloaded unstructured Yelp review data and preparing the DV and IVs matrix for the customer-level segmentation study.

    3-b. Instruction for replicating Customer-level Segmentation analysis.txt -- [see Table 10 in main text; Tables F-1, F-2, and F-3 and Figure F-1 in Web Appendix]: Code for replicating the customer-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 3 to 4 hours.

    4-a. Preprocessing raw Yelp reviews_Restaruant Segmentation (1).txt: R code for preprocessing the downloaded unstructured Yelp data and preparing the DV and IVs matrix for the restaurant-level segmentation study.

    4-b. Instructions for replicating restaurant-level segmentation analysis.txt -- [see Tables 5, 6 and 7 in main text; Tables E-4 and E-5 and Figure H-1 in Web Appendix]: Code for replicating the restaurant-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 10 to 12 hours.

    [Guidelines for running Benchmark models in Table 6]

    • Unsupervised topic model: 'topicmodels' package in R -- after determining the number of topics (e.g., with the 'ldatuning' R package), run the 'LDA' function in the 'topicmodels' package. Then compute topic probabilities per restaurant (with the 'posterior' function in the package), which can be used as predictors. Then conduct prediction with regression.
    • Hierarchical topic model (HDP): 'gensimr' R package -- 'model_hdp' function for identifying topics in the package (see https://radimrehurek.com/gensim/models/hdpmodel.html or https://gensimr.news-r.org/).
    • Supervised topic model: 'lda' R package -- 'slda.em' function for training and 'slda.predict' for prediction.
    • Aggregate regression: 'lm' default function in R.
    • Latent class regression without variable selection: 'flexmix' function in the 'flexmix' R package. Run flexmix with a certain number of segments (e.g., 3 segments in this study). Then, with estimated coefficients and memberships, conduct prediction of the dependent variable for each segment.
    • Latent class regression with variable selection: 'Unconstraind_Bayes_Mixture' function in Kim, Fong and DeSarbo (2012)'s package. Run the Kim et al. (2012) model with a certain number of segments (e.g., 3 segments in this study). Then, with estimated coefficients and memberships, conduct prediction of the dependent variable for each segment. The same R package ('KimFongDeSarbo2012.zip') can be downloaded at: https://sites.google.com/scarletmail.rutgers.edu/r-code-packages/home

    5. Instructions for replicating Professor ratings review study.txt -- [see Tables G-1, G-2, G-4 and G-5, and Figures G-1 and H-2 in Web Appendix]: Code to replicate the Professor ratings reviews study. Computing time is approximately 10 hours.

    [A list of the versions of R, packages, and computer...

  19. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Cite
    Julie R. Campos Arias (repository creator) (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    Available download formats: csv
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julie R. Campos Arias (repository creator)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. It stores the datasets used in some of the studies that served as research material for the thesis, together with the datasets used in the experimental part of the work.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset contains the ratings and reviews of 1K+ Amazon products, with their details as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv
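
    Because the rows are ordered by class, a shuffle is needed before any train/test split. The following is a minimal sketch using only the Python standard library. It assumes that data_rt.csv has a header row followed by the two columns described above; the function names `shuffle_rows` and `load_shuffled` and the fixed seed are illustrative choices, not part of the dataset.

```python
import csv
import random


def shuffle_rows(rows, seed=0):
    """Return a shuffled copy of rows; the seed makes the order reproducible."""
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled


def load_shuffled(path, seed=0):
    """Read a CSV file and return (header, shuffled_rows).

    Assumption: the first line of the file holds the column names.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    return header, shuffle_rows(rows, seed)
```

    For example, `header, rows = load_shuffled("data_rt.csv")` would return the rows in a mixed order, so that a simple slice no longer contains a single class.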

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
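
    The mapping from a polarity score to a categorical label can be sketched as follows. TextBlob polarity scores lie in the range [-1, 1], but the cut-offs and label names used to build the division column are not stated here, so the threshold at zero and the three label strings below are assumptions for illustration only.

```python
def polarity_to_division(polarity, neutral_band=0.0):
    """Map a polarity score in [-1, 1] to a categorical label.

    Assumption: scores above +neutral_band are "positive", scores below
    -neutral_band are "negative", and everything in between is "neutral".
    The dataset's actual thresholds and label names may differ.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"
```

    Widening `neutral_band` (for example to 0.1) treats weakly polar reviews as neutral, which may be preferable when the scores are noisy.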

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review, raw) and division (manually added - categorical label generated using overall score).
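
    The unixReviewTime column can be turned into a readable timestamp with the Python standard library. This is a minimal sketch that assumes the values are whole seconds since 1970-01-01 UTC (the usual meaning of "unix time"); the helper name `unix_to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def unix_to_utc(timestamp):
    """Convert a unix timestamp (int or numeric string) to an aware UTC datetime.

    Assumption: the value counts whole seconds since 1970-01-01 00:00:00 UTC.
    """
    return datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
```

    Accepting strings as well as integers is convenient because values read from a CSV file arrive as text.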

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  20. IMDB and Yelp datasets - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). IMDB and Yelp datasets - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/imdb-and-yelp-datasets
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    IMDB and Yelp are datasets used for sentiment analysis and author identification.
