The OYO Review Dataset for sentiment analysis was created to explore the sentiment and opinions expressed in hotel reviews on the OYO Hotels platform. Analyzing the sentiment of customer reviews can provide insight into overall guest satisfaction, identify areas for improvement, and support data-driven decisions to enhance the hotel experience. By collecting and curating this dataset, Deep Patel, Nikki Patel, and Nimil Lathiya aimed to contribute to sentiment analysis in the context of the hospitality industry. Sentiment analysis classifies the sentiment expressed in textual data, such as reviews, into positive, negative, or neutral categories. This analysis can help hotel management and stakeholders understand customer sentiment, identify common patterns, and address concerns that may affect the reputation and customer satisfaction of OYO Hotels. The dataset is a resource for training and evaluating sentiment analysis models tailored to the hospitality domain. Researchers, data scientists, and practitioners can use it to develop and test machine learning and natural language processing techniques for sentiment analysis, such as classification algorithms, sentiment lexicons, or deep learning models. Overall, the goal is to facilitate research on customer sentiment and opinions in the hotel industry.
Deep Patel: https://www.linkedin.com/in/deep-patel-55ab48199/
Nikki Patel: https://www.linkedin.com/in/nikipatel9/
Nimil Lathiya: https://www.linkedin.com/in/nimil-lathiya-059a281b1/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The TripAdvisor Vietnam Hotel Reviews Dataset is a comprehensive collection of user-generated reviews from the popular online travel platform TripAdvisor. This dataset offers valuable insights into the experiences, opinions, and ratings provided by individuals who have stayed at various hotels across Vietnam.
The dataset encompasses many hotels in different cities and regions of Vietnam, including popular tourist destinations such as Hanoi, Ho Chi Minh City, Da Nang, Nha Trang, and more. The reviews cover a diverse spectrum of accommodation types, ranging from budget guesthouses to luxurious resorts, providing a comprehensive representation of the Vietnamese hospitality industry.
Each review entry in the dataset includes a rich set of information, offering researchers, developers, and data analysts an in-depth understanding of hotel performance and customer satisfaction. Key attributes of the dataset include:
Review Text: The actual text of the review left by the user, which contains detailed descriptions, opinions, and feedback about their hotel experience.
Rating: The overall rating provided by the reviewer, typically ranging from 1 to 5 stars, which reflects their satisfaction level with the hotel.
Date: The date the review was posted, enabling temporal analysis and tracking of changes over time.
Location: The hotel's geographic location, which allows researchers to analyze regional variations in hotel performance and customer preferences.
The TripAdvisor Vietnam Hotel Reviews Dataset is valuable for various applications, including sentiment analysis, opinion mining, natural language processing, customer behavior analysis, recommender systems, and more. Researchers can leverage this dataset to gain deep insights into customer experiences, identify patterns, trends, and sentiments, and develop data-driven strategies for the Vietnamese hotel industry.
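As a rough illustration of how the four attributes listed above might be handled in code, here is a minimal Python sketch. The class name, field names, and sample records are assumptions made for this example; the dataset's actual column names and file format are not stated in the description.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class HotelReview:
    """One review entry; field names are illustrative, not the dataset's own."""
    review_text: str
    rating: int       # overall rating, assumed to be 1-5 stars
    posted_on: date   # date the review was posted
    location: str     # hotel's geographic location


def mean_rating_by_location(reviews):
    """Return {location: mean rating} for a list of HotelReview objects."""
    ratings = defaultdict(list)
    for review in reviews:
        ratings[review.location].append(review.rating)
    return {location: mean(values) for location, values in ratings.items()}


# Made-up sample records, for illustration only.
sample_reviews = [
    HotelReview("Great rooftop pool.", 5, date(2023, 1, 14), "Hanoi"),
    HotelReview("Noisy street, small room.", 2, date(2023, 2, 3), "Hanoi"),
    HotelReview("Lovely beach access.", 4, date(2023, 3, 21), "Da Nang"),
]
averages = mean_rating_by_location(sample_reviews)
```

Grouping by location is only one of the analyses the description mentions; the same record type could be grouped by posting date for temporal analysis.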
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets of TripAdvisor reviews by UK residents of UK hotels and restaurants, together with the user's rating of the hotel. Datasets are split by:
• Hotel star level (2, 3, 4 or all [mixed]) or Restaurant;
• Reviewer gender (M = male-authored reviews; F = female-authored reviews; MF = equal numbers of male and female authored reviews for each rating level);
• Number of texts (1k, 2k, 4k, 8k, 16k, or all available).
Each dataset contains equal numbers of reviews at each rating level. The reviews were selected at random from TripAdvisor. This data is from this paper: Thelwall, M. (2018). Gender bias in machine learning for sentiment analysis. Online Information Review, 42(3), 343-354. doi: 10.1108/OIR-05-2017-0152
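The description says each dataset holds equal numbers of reviews at each rating level, selected at random. The sketch below shows one way such a balanced draw could be made in Python. It is not the procedure used in the paper; the function name, the (rating, text) pair layout, and the sample pool are all assumptions for illustration.

```python
import random
from collections import defaultdict


def balanced_sample(reviews, per_rating, seed=0):
    """Draw `per_rating` reviews at random from each rating level.

    `reviews` is a list of (rating, text) pairs. Raises ValueError if any
    rating level has fewer than `per_rating` reviews.
    """
    by_rating = defaultdict(list)
    for rating, text in reviews:
        by_rating[rating].append((rating, text))

    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    sample = []
    for rating in sorted(by_rating):
        pool = by_rating[rating]
        if len(pool) < per_rating:
            raise ValueError(f"only {len(pool)} reviews at rating {rating}")
        sample.extend(rng.sample(pool, per_rating))
    return sample


# Made-up, deliberately unbalanced pool: 1 star x4, 3 stars x2, 5 stars x6.
pool = ([(1, f"bad {i}") for i in range(4)]
        + [(3, f"ok {i}") for i in range(2)]
        + [(5, f"great {i}") for i in range(6)])
balanced = balanced_sample(pool, per_rating=2)
```

Balancing this way caps every rating level at the size of the smallest one, which would explain why the largest split is described as "all available".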
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An English corpus for studying local sentiment flows and aspect-based sentiment analysis. It contains 2100 hotel reviews balanced with respect to the reviews’ sentiment scores. All reviews are segmented into subsentence-level statements that have then been manually classified as a fact, a positive opinion, or a negative opinion. Also, all hotel aspects mentioned in the reviews have been annotated as such:
• arguana-tripadvisor-annotated-plus-software-v1.zip
• arguana-tripadvisor-annotated-v2.zip
In addition, we provide nearly 200k further hotel reviews without manual annotations:
• v1 upon request
• arguana-tripadvisor-unannotated-v2.zip
The corpus is free to use for scientific purposes, not for commercial applications. In version 2, the annotated XMI files have been changed according to a new underlying type system that is more easily extendable. Note that some adaptations of the version 1 software are necessary to make it work with version 2. If you publish any results related to the ArguAna TripAdvisor corpus, please cite our CICLing 2014 paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains TripAdvisor guest reviews for major hotels in Salalah, Oman, collected through web scraping. It provides insights into guest satisfaction, sentiment, and ratings, making it a valuable resource for marketing, hospitality and tourism research, sentiment analysis, and tourism marketing studies.
𝐇𝐨𝐭𝐞𝐥𝐬 𝐈𝐧𝐜𝐥𝐮𝐝𝐞𝐝 𝐢𝐧 𝐭𝐡𝐞 𝐃𝐚𝐭𝐚𝐬𝐞𝐭 The dataset features guest reviews from the following hotels in Salalah:
• Al Baleed Resort Salalah by Anantara • Belad Bont Resort • Crowne Plaza Resort Salalah • Fanar Hotel and Residences • Hilton Salalah Resort • Juweira Boutique Hotel • Millennium Resort Salalah • Salalah Gardens Hotel • Salalah Rotana Resort
𝐓𝐢𝐦𝐞 𝐂𝐨𝐯𝐞𝐫𝐚𝐠𝐞 The dataset captures all available guest reviews from the beginning of each hotel's presence on TripAdvisor up until February 2025.
𝐑𝐞𝐥𝐞𝐯𝐚𝐧𝐜𝐞 𝐭𝐨 𝐊𝐡𝐚𝐫𝐞𝐞𝐟 𝐓𝐨𝐮𝐫𝐢𝐬𝐦 𝐎𝐦𝐚𝐧 𝐕𝐢𝐬𝐢𝐨𝐧 2040 This dataset is particularly beneficial for the following government agencies: • Ministry of Heritage and Tourism - Oman • Oman Chamber of Commerce & Industry (OCCI) • Dhofar Municipality and Dhofar Tourism Department • National Centre for Statistics and Information (NCSI) • Oman Vision 2040 Implementation Follow-up Unit • Ministry of Commerce, Industry, and Investment Promotion • Oman Tourism Development Company (OMRAN) • Ministry of Transport, Communications, and Information Technology (MTCIT) • Dhofar Governorate Office • Ministry of Environment and Climate Affairs
It also serves as a valuable resource for researchers, policymakers, and marketing, hospitality & tourism professionals to enhance Salalah’s tourism sector, improve guest satisfaction, and support Oman’s long-term vision for a thriving and sustainable tourism industry.
Salalah experiences a surge in visitors during the Khareef season (monsoon season), a critical period for the hospitality industry. This dataset can help analyze guest experiences, identify service gaps, and optimize offerings during this peak tourism period.
Oman Vision 2040 Goals
The dataset aligns with Oman’s Vision 2040, which prioritizes tourism sector growth, economic diversification, and enhanced customer experiences. By leveraging sentiment analysis and guest insights, policymakers and hotel managers can develop data-driven strategies to improve hospitality services, attract more visitors, and enhance Salalah’s reputation as a premier travel destination.
Potential Use Cases
• Sentiment Analysis: Understanding guest satisfaction trends over time
• Tourism & Hospitality Research: Evaluating service quality and hotel performance across different years
• Marketing Insights: Identifying key drivers of positive and negative reviews for strategic decision-making
• Machine Learning & NLP: Training models for text classification, sentiment prediction, and recommendation systems
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This work studies emotion analysis of online hotel reviews using the neural network model BERT, and argues that the method can help hotel platforms understand customer needs, help customers find suitable hotels according to their needs and budget, and make hotel recommendations more intelligent. Using the pre-trained BERT model, a number of emotion analysis experiments were carried out through fine-tuning, and a model with high classification accuracy was obtained by repeatedly adjusting the parameters during the experiments. The BERT layer was used as a word vector layer: the input text sequence was passed to the BERT layer for vector transformation. The output vectors of BERT were passed through the corresponding neural network and then classified by the softmax activation function. ERNIE is an enhancement of the BERT layer. Both models can lead to good classification results, but ERNIE performs better. ERNIE exhibits stronger classification performance and stability than BERT, which provides a promising research direction for the field of tourism and hotels.
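To make the final classification step concrete, the following is a minimal Python sketch of a softmax over class scores. It stands in for the last layer only and does not reproduce BERT or ERNIE; the two class labels and the example logits are hypothetical, since the description does not state the label set.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels):
    """Return the label with the highest softmax probability, plus that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits for one review over two illustrative classes.
label, confidence = predict_label([-1.2, 2.3], ["negative", "positive"])
```

In a fine-tuned model the logits would come from a dense layer applied to the encoder's output vectors; here they are hard-coded so the example runs without any model weights.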
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Webis Tripad 2013 Sentiment Corpus is an English text corpus of 2100 hotel reviews for the development and evaluation of approaches to sentiment flow analysis. Each document in this corpus is assigned an overall rating score, some metadata, and two kinds of annotations. First, each statement of a review's text has been classified with respect to its sentiment polarity (positive, negative, objective) by Amazon Mechanical Turk (AMT) workers. Second, hotel aspects mentioned in the texts were tagged by in-house domain experts.
To give an example, the sentence "The service was perfect and the rooms were clean." consists of two statements, "The service was perfect" and "the rooms were clean", both classified as positive. The aspect in the first statement is "service", and the aspect in the second is "rooms".
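The two annotation layers in this example can be pictured as a small data structure. The Python sketch below is illustrative only: the class and field names are assumptions and do not reflect the corpus's actual file format.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """A statement with one polarity and the hotel aspects it mentions."""
    text: str
    polarity: str  # "positive", "negative", or "objective"
    aspects: list = field(default_factory=list)


# The example sentence from the description, split into its two statements.
sentence = [
    Statement("The service was perfect", "positive", ["service"]),
    Statement("the rooms were clean", "positive", ["rooms"]),
]

# One simple reading of a "sentiment flow": the sequence of statement polarities.
flow = [s.polarity for s in sentence]
aspects = [a for s in sentence for a in s.aspects]
```

Keeping polarity at the statement level rather than the sentence level is what lets a single sentence contribute more than one step to the flow.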
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With an increasingly large number of Chinese tourists in Japan, the hotel industry is in need of an affordable market research tool that does not rely on expensive and time-consuming surveys or interviews. Because this problem is real and relevant to the hotel industry in Japan, and otherwise completely unexplored in other studies, we extracted a list of potential keywords from Chinese reviews of Japanese hotels on the hotel portal site Ctrip using a mathematical model, and then used them in a sentiment analysis with a machine learning classifier. While most studies that use information collected from the internet rely on pre-existing data analysis tools, in our study we designed the mathematical model to achieve the highest possible classification performance, while also exploring the potential business implications of the results.
A. Market Research and Analysis: Utilize the Tripadvisor dataset to conduct in-depth market research and analysis in the travel and hospitality industry. Identify emerging trends, popular destinations, and customer preferences. Gain a competitive edge by understanding your target audience's needs and expectations.
B. Competitor Analysis: Compare and contrast your hotel or travel services with competitors on Tripadvisor. Analyze their ratings, customer reviews, and performance metrics to identify strengths and weaknesses. Use these insights to enhance your offerings and stand out in the market.
C. Reputation Management: Monitor and manage your hotel's online reputation effectively. Track and analyze customer reviews and ratings on Tripadvisor to identify improvement areas and promptly address negative feedback. Positive reviews can be leveraged for marketing and branding purposes.
D. Pricing and Revenue Optimization: Leverage the Tripadvisor dataset to analyze pricing strategies and revenue trends in the hospitality sector. Understand seasonal demand fluctuations, pricing patterns, and revenue optimization opportunities to maximize your hotel's profitability.
E. Customer Sentiment Analysis: Conduct sentiment analysis on Tripadvisor reviews to gauge customer satisfaction and sentiment towards your hotel or travel service. Use this information to improve guest experiences, address pain points, and enhance overall customer satisfaction.
F. Content Marketing and SEO: Create compelling content for your hotel or travel website based on the popular keywords, topics, and interests identified in the Tripadvisor dataset. Optimize your content to improve search engine rankings and attract more potential guests.
G. Personalized Marketing Campaigns: Use the data to segment your target audience based on preferences, travel habits, and demographics. Develop personalized marketing campaigns that resonate with different customer segments, resulting in higher engagement and conversions.
H. Investment and Expansion Decisions: Access historical and real-time data on hotel performance and market dynamics from Tripadvisor. Utilize this information to make data-driven investment decisions, identify potential areas for expansion, and assess the feasibility of new ventures.
I. Predictive Analytics: Utilize the dataset to build predictive models that forecast future trends in the travel industry. Anticipate demand fluctuations, understand customer behavior, and make proactive decisions to stay ahead of the competition.
J. Business Intelligence Dashboards: Create interactive and insightful dashboards that visualize key performance metrics from the Tripadvisor dataset. These dashboards can help executives and stakeholders get a quick overview of the hotel's performance and make data-driven decisions.
Incorporating the Tripadvisor dataset into your business processes will enhance your understanding of the travel market, facilitate data-driven decision-making, and provide valuable insights to drive success in the competitive hospitality industry.
https://brightdata.com/license
The Booking Hotel Listings Dataset provides a structured and in-depth view of accommodations worldwide, offering essential data for travel industry professionals, market analysts, and businesses. This dataset includes key details such as hotel names, locations, star ratings, pricing, availability, room configurations, amenities, guest reviews, sustainability features, and cancellation policies.
With this dataset, users can:
Analyze market trends to understand booking behaviors, pricing dynamics, and seasonal demand.
Enhance travel recommendations by identifying top-rated hotels based on reviews, location, and amenities.
Optimize pricing and revenue strategies by benchmarking property performance and availability patterns.
Assess guest satisfaction through sentiment analysis of ratings and reviews.
Evaluate sustainability efforts by examining eco-friendly features and certifications.
Designed for hospitality businesses, travel platforms, AI-powered recommendation engines, and pricing strategists, this dataset enables data-driven decision-making to improve customer experience and business performance.
Use Cases
Booking Hotel Listings in Greece
Gain insights into Greece’s diverse hospitality landscape, from luxury resorts in Santorini to boutique hotels in Athens. Analyze review scores, availability trends, and traveler preferences to refine booking strategies.
Booking Hotel Listings in Croatia
Explore hotel data across Croatia’s coastal and inland destinations, ideal for travel planners targeting visitors to Dubrovnik, Split, and Plitvice Lakes. This dataset includes review scores, pricing, and sustainability features.
Booking Hotel Listings with Review Scores Greater Than 9
A curated selection of high-rated hotels worldwide, ideal for luxury travel planners and market researchers focused on premium accommodations that consistently exceed guest expectations.
Booking Hotel Listings in France with More Than 1000 Reviews
Analyze well-established and highly reviewed hotels across France, ensuring reliable guest feedback for market insights and customer satisfaction benchmarking.
This dataset serves as an indispensable resource for travel analysts, hospitality businesses, and data-driven decision-makers, providing the intelligence needed to stay competitive in the ever-evolving travel industry.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Machine learning techniques that rely on textual features or sentiment lexicons can lead to erroneous sentiment analysis. These techniques are especially vulnerable to domain-related difficulties, particularly when dealing with big data. In addition, labeling is time-consuming, and supervised machine learning algorithms often lack labeled data. Transfer learning can help save time and obtain high performance with fewer datasets in this field. To cope with this, we used a transfer learning-based Multi-Domain Sentiment Classification (MDSC) technique. We are able to identify the sentiment polarity of text in an unlabeled target domain by looking at reviews in a labeled source domain. This research aims to evaluate the impact of domain adaptation and measure the extent to which transfer learning enhances sentiment analysis outcomes. We employed the transfer learning models BERT, RoBERTa, ELECTRA, and ULMFiT to improve the performance of sentiment analysis. We analyzed sentiment through various transformer models and compared the performance of LSTM and CNN. The experiments are carried out on five publicly available sentiment analysis datasets, namely Hotel Reviews (HR), Movie Reviews (MR), Sentiment140 Tweets (ST), Citation Sentiment Corpus (CSC), and Bioinformatics Citation Corpus (BCC), to adapt to multiple target domains. The performance of numerous models employing transfer learning from diverse datasets demonstrates how various factors influence the outputs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TripAdvisor reviews and comparable data sources play an important role in many tasks in Natural Language Processing (NLP), providing a data basis for the identification and classification of subjective judgments, such as hotel or restaurant reviews, into positive or negative polarities. This study explores three important factors influencing variation in crowdsourced polarity judgments, focusing on TripAdvisor reviews in Spanish. Three hypotheses are tested: the role of Part Of Speech (POS), the impact of sentiment words such as “tasty”, and the influence of neutral words like “ok” on judgment variation. The study’s methodology employs one-word titles, demonstrating their efficacy in studying the polarity variation of words. Statistical tests of mean equality are performed on the word groups of interest. The results reveal that adjectives in one-word titles tend to result in lower judgment variation than other word types or POS. Sentiment words contribute to lower judgment variation as well, emphasizing the significance of sentiment words in research on polarity judgments, and neutral words are associated with higher judgment variation, as expected. However, these effects cannot always be reproduced in longer titles, which suggests that longer titles are not the best data source for testing the ambiguity of single words, because other words in the title, such as negations, influence word polarity. This empirical investigation contributes insights into the factors influencing the polarity variation of words, providing a foundation for NLP practitioners who aim to capture and predict polarity judgments in Spanish and for researchers who aim to understand the factors influencing judgment variation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hyperparameters employed for BERT, RoBERTa, and ELECTRA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison graph of TL and DL results based on F1-score.