59 datasets found
  1. Datasets used in the study: TripAdvisor and Yelp review data, tweets related...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated May 4, 2023
    Cite
    INNOCENSIA OWUOR (2023). Datasets used in the study: TripAdvisor and Yelp review data, tweets related to points of interest in Florida and New York. [Dataset]. http://doi.org/10.6084/m9.figshare.22766654.v1
    Available download formats: zip
    Dataset updated
    May 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    INNOCENSIA OWUOR
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Florida, New York
    Description

    Contains TripAdvisor and Yelp review data, and tweets related to points of interest in Florida and New York.

    Keywords: twitter, yelp, Florida, New York, data mining

  2. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery -...

    • exaly.com
    csv, json
    Updated Nov 1, 2025
    + more versions
    Cite
    (2025). Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery - impact-factor [Dataset]. https://exaly.com/journal/29473/wiley-interdisciplinary-reviews-data-mining-and-knowledge-discovery
    Available download formats: json, csv
    Dataset updated
    Nov 1, 2025
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The graph shows the changes in the impact factor of Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery and its corresponding percentile, for comparison with the entire literature. Impact Factor is the most common scientometric index. It is defined as the number of citations received by papers published in the two preceding years, divided by the number of papers published in those years.
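
The definition above reduces to a single ratio. The following is a minimal sketch of that arithmetic; the function name and the counts are hypothetical and are not taken from the exaly.com dataset.

```python
def impact_factor(citations_to_prev_two_years, papers_in_prev_two_years):
    """Citations to papers from the two preceding years, divided by
    the number of papers published in those two years."""
    if papers_in_prev_two_years <= 0:
        raise ValueError("at least one published paper is required")
    return citations_to_prev_two_years / papers_in_prev_two_years


# Hypothetical counts, chosen only to illustrate the ratio.
example = impact_factor(citations_to_prev_two_years=600, papers_in_prev_two_years=80)
print(example)  # 7.5
```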

  3. Yelp Fake Review Dataset

    • kaggle.com
    zip
    Updated Nov 6, 2025
    Cite
    Virenraina (2025). Yelp Fake Review Dataset [Dataset]. https://www.kaggle.com/datasets/virenraina/yelp-fake-review-dataset
    Available download formats: zip (3356313 bytes)
    Dataset updated
    Nov 6, 2025
    Authors
    Virenraina
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains Yelp reviews labeled for fake review detection using opinion mining techniques. It is designed for training machine learning models to classify reviews as genuine or fake/spam.

    Dataset Contents:
    • Review text content from Yelp platform
    • Labels: Genuine (0) and Fake (1) classifications
    • Metadata: Reviewer information, ratings, timestamps
    • Product/business information
    • Sentiment indicators

    Use Cases:
    • Train supervised machine learning models for fake review detection
    • Perform sentiment analysis and opinion mining
    • Text classification and NLP research
    • Spam detection systems
    • E-commerce fraud prevention

    Recommended Algorithms:
    • Naive Bayes (Bernoulli, Multinomial)
    • Support Vector Machines (SVM/LinearSVC)
    • Logistic Regression
    • Random Forest
    • LSTM/RNN for deep learning approaches

    Preprocessing Required:
    • Text cleaning (remove stop words, punctuation)
    • Tokenization
    • TF-IDF or word embedding feature extraction
    • Train-test split (recommended 70-30)
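
The preprocessing steps listed above could be sketched roughly as follows, using only the Python standard library. The stop-word list, the sample reviews, and the helper names are assumptions made for illustration; they are not part of the dataset, and a real pipeline would more likely rely on an established NLP library.

```python
import math
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the dataset's).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "to", "of", "it", "i", "this"}


def tokenize(text):
    """Lowercase, replace non-letters with spaces, split, drop stop words."""
    words = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return [w for w in words if w not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and cut it 70-30 by default."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


# Made-up reviews; labels follow the listing: 0 = genuine, 1 = fake.
reviews = [
    ("The pasta was great and the staff was friendly", 0),
    ("Best place ever best food ever best service ever", 1),
    ("Waited forty minutes for a cold burger", 0),
    ("Amazing amazing amazing, everyone must visit now", 1),
    ("Decent coffee, slow wifi", 0),
    ("Five stars, perfect in every way, trust me", 1),
    ("The soup was too salty for my taste", 0),
    ("Nice patio and fair prices", 0),
    ("Unbelievable deal, visit today", 1),
    ("Parking is hard to find on weekends", 0),
]

tokens = [tokenize(text) for text, _ in reviews]
vectors = tfidf(tokens)
rows = list(zip(vectors, [label for _, label in reviews]))
train, test = train_test_split(rows)
print(len(train), len(test))
```

The resulting sparse vectors could then be passed to any of the classifiers named in the listing; that step is left out here.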

  4. Data from: Evaluation of classification techniques for identifying fake...

    • scielo.figshare.com
    jpeg
    Updated May 30, 2023
    Cite
    Andrey Schmidt dos Santos; Luis Felipe Riehs Camargo; Daniel Pacheco Lacerda (2023). Evaluation of classification techniques for identifying fake reviews about products and services on the internet [Dataset]. http://doi.org/10.6084/m9.figshare.14283143.v1
    Available download formats: jpeg
    Dataset updated
    May 30, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Andrey Schmidt dos Santos; Luis Felipe Riehs Camargo; Daniel Pacheco Lacerda
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: With the growth of e-commerce, more people are buying products over the internet. To increase customer satisfaction, merchants provide spaces for product and service reviews. Products with positive reviews attract customers, while products with negative reviews lose customers. Following this idea, some individuals and corporations write fake reviews to promote their products and services or defame their competitors. The difficulty in finding these reviews lies in the large amount of information available. One solution is to use data mining techniques and tools, such as the classification function. Exploring this situation, the present work evaluates classification techniques to identify fake reviews about products and services on the Internet. The research also presents a systematic literature review on fake reviews. The research used 8 classification algorithms. The algorithms were trained and tested with a hotels database. The CONCENSO algorithm presented the best result, with 88% in the precision indicator. After the first test, the algorithms classified reviews on another hotels database. To compare the results of this new classification, the Review Skeptic algorithm was used. The SVM and GLMNET algorithms presented the highest convergence with the Review Skeptic algorithm, classifying 83% of reviews with the same result. The research contributes by demonstrating the algorithms' ability to understand consumers’ real reviews of products and services on the Internet. Another contribution is being the pioneer in the investigation of fake reviews in Brazil and in production engineering.

  5. List of Top Disciplines of Wiley Interdisciplinary Reviews: Data Mining and...

    • exaly.com
    csv, json
    Updated Nov 1, 2025
    Cite
    (2025). List of Top Disciplines of Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery sorted by citations [Dataset]. https://exaly.com/journal/29473/wiley-interdisciplinary-reviews-data-mining-and-/discipline-ranks
    Available download formats: csv, json
    Dataset updated
    Nov 1, 2025
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    List of Top Disciplines of Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery sorted by citations.

  6. Multi-aspect Reviews

    • kaggle.com
    zip
    Updated Oct 30, 2023
    Cite
    Ahmad (2023). Multi-aspect Reviews [Dataset]. https://www.kaggle.com/datasets/pypiahmad/multi-aspect-reviews
    Available download formats: zip (875907419 bytes)
    Dataset updated
    Oct 30, 2023
    Authors
    Ahmad
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The Multi-aspect Reviews dataset primarily encompasses beer review data from RateBeer and BeerAdvocate, with a focus on multiple rated dimensions providing a comprehensive insight into sensory aspects such as taste, look, feel, and smell. This dataset facilitates the analysis of different facets of reviews, thus aiding in a deeper understanding of user preferences and product characteristics.

    Basic Statistics:

    • RateBeer
      • Number of users: 40,213
      • Number of items: 110,419
      • Number of ratings/reviews: 2,855,232
      • Timespan: Apr 2000 - Nov 2011
    • BeerAdvocate
      • Number of users: 33,387
      • Number of items: 66,051
      • Number of ratings/reviews: 1,586,259
      • Timespan: Jan 1998 - Nov 2011

    Metadata:
    • Reviews: Textual reviews provided by users.
    • Aspect-specific ratings: Ratings on taste, look, feel, smell, and overall impression.
    • Product Category: Categories of beer products.
    • ABV (Alcohol By Volume): Indicates the alcohol content in the beer.

    Examples:
    • RateBeer Example (JSON)

      {
        "beer/name": "John Harvards Simcoe IPA",
        "beer/beerId": "63836",
        "beer/brewerId": "8481",
        "beer/ABV": "5.4",
        "beer/style": "India Pale Ale (IPA)",
        "review/appearance": "4/5",
        "review/aroma": "6/10",
        "review/palate": "3/5",
        "review/taste": "6/10",
        "review/overall": "13/20",
        "review/time": "1157587200",
        "review/profileName": "hopdog",
        "review/text": "On tap at the Springfield, PA location. Poured a deep and cloudy orange (almost a copper) color with a small sized off white head. Aromas or oranges and all around citric. Tastes of oranges, light caramel and a very light grapefruit finish. I too would not believe the 80+ IBUs - I found this one to have a very light bitterness with a medium sweetness to it. Light lacing left on the glass."
      }
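
The aspect ratings in the RateBeer example are stored as strings on different scales ("4/5", "6/10", "13/20"). One possible way to make them comparable is to normalize each to the range 0 to 1, as in the sketch below. The helper name is hypothetical, and the record is an abbreviated copy of the example above.

```python
import json

# Abbreviated copy of the RateBeer example record shown in the listing.
record_json = """
{
  "beer/name": "John Harvards Simcoe IPA",
  "beer/ABV": "5.4",
  "review/appearance": "4/5",
  "review/aroma": "6/10",
  "review/palate": "3/5",
  "review/taste": "6/10",
  "review/overall": "13/20",
  "review/time": "1157587200",
  "review/profileName": "hopdog"
}
"""


def normalized_ratings(record):
    """Map each 'review/<aspect>' value of the form 'x/y' to x / y."""
    prefix = "review/"
    scores = {}
    for key, value in record.items():
        if not key.startswith(prefix):
            continue
        parts = str(value).split("/")
        # Only accept values that look like 'numerator/denominator'.
        if len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit() and int(parts[1]) > 0:
            scores[key[len(prefix):]] = int(parts[0]) / int(parts[1])
    return scores


scores = normalized_ratings(json.loads(record_json))
print(scores)
```

Fields such as `review/time` and `review/profileName` are skipped because they do not match the `x/y` pattern.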

    Download Links:
    • BeerAdvocate Data
    • RateBeer Data
    • Sentences with aspect labels (annotator 1)
    • Sentences with aspect labels (annotator 2)

    Citations:
    • Learning attitudes and attributes from multi-aspect reviews, Julian McAuley, Jure Leskovec, Dan Jurafsky, International Conference on Data Mining (ICDM), 2012.
    • From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews, Julian McAuley, Jure Leskovec, WWW, 2013.

    Use Cases:
    1. Aspect-Based Sentiment Analysis (ABSA): Analyzing sentiments on different aspects of beers like taste, look, feel, and smell to gain deeper insights into user preferences and opinions.
    2. Recommendation Systems: Developing personalized recommendation systems that consider multiple aspects of user preferences.
    3. Product Development: Utilizing the feedback on various aspects to improve the product.
    4. Consumer Behavior Analysis: Studying how different aspects influence consumer choice and satisfaction.
    5. Competitor Analysis: Comparing ratings on different aspects with competitors to identify strengths and weaknesses.
    6. Trend Analysis: Identifying trends in consumer preferences over time across different aspects.
    7. Marketing Strategies: Formulating marketing strategies based on insights drawn from aspect-based reviews.
    8. Natural Language Processing (NLP): Developing and enhancing NLP models to understand and categorize multi-aspect reviews.
    9. Learning User Expertise Evolution: Studying how user expertise evolves through reviews and ratings over time.
    10. Training Machine Learning Models: Training supervised learning models to predict aspect-based ratings from review text.

    This dataset is extremely valuable for researchers, marketers, product developers, and machine learning practitioners looking to delve into multi-dimensional review analysis and understand user-product interaction on a granular level.

  7. Curated Email-Based Code Reviews Datasets

    • figshare.com
    bin
    Updated Feb 7, 2024
    Cite
    Mingzhao Liang; Ping Charoenwet; Patanamon Thongtanunam (2024). Curated Email-Based Code Reviews Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.24679656.v1
    Available download formats: bin
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mingzhao Liang; Ping Charoenwet; Patanamon Thongtanunam
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code review is an important practice that improves the overall quality of a proposed patch (i.e. code changes). While much research focused on tool-based code reviews (e.g. a Gerrit code review tool, GitHub), many traditional open-source software (OSS) projects still conduct code reviews through emails. However, due to the nature of unstructured email-based data, it can be challenging to mine email-based code reviews, hindering researchers from delving into the code review practice of such long-standing OSS projects. Therefore, this paper presents large-scale datasets of email-based code reviews of 167 projects across three OSS communities (i.e. Linux Kernel, OzLabs, and FFmpeg). We mined the data from Patchwork, a web-based patch-tracking system for email-based code review, and curated the data by grouping a submitted patch and its revised versions and grouping email aliases. Our datasets include a total of 4.2M patches with 2.1M patch groups and 169K email addresses belonging to 141K individuals. Our published artefacts include the datasets as well as a tool suite to crawl, curate, and store Patchwork data. With our datasets, future work can directly delve into an email-based code review practice of large OSS projects without additional effort in data collection and curation.

  8. Bias-Free Dataset of Food Delivery App Reviews with Data Poisoning Attacks

    • data.mendeley.com
    Updated Oct 13, 2023
    + more versions
    Cite
    Hyunggu Jung (2023). Bias-Free Dataset of Food Delivery App Reviews with Data Poisoning Attacks [Dataset]. http://doi.org/10.17632/rnyrpzyw3h.1
    Dataset updated
    Oct 13, 2023
    Authors
    Hyunggu Jung
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of reviews collected from restaurants on a Korean delivery app platform that were running a review event. A total of 128,668 reviews were collected from 136 restaurants by crawling reviews using the Selenium library in Python. The 136 chosen restaurants run review events that require customers to write reviews with 5 stars and photos. The data was therefore annotated by considering 1) whether the review gives a five-star rating, and 2) whether the review contains photo(s).
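
The two annotation criteria described above can be expressed as two boolean flags per review. The sketch below is only an illustration: the `Review` fields and the `annotate` helper are hypothetical, and the listing does not say how the dataset's authors combine the two flags into a final label.

```python
from dataclasses import dataclass


@dataclass
class Review:
    stars: int        # rating given by the customer (hypothetical field name)
    photo_count: int  # number of photos attached (hypothetical field name)


def annotate(review):
    """Return the two flags named in the description."""
    return {
        "five_star": review.stars == 5,
        "has_photo": review.photo_count > 0,
    }


print(annotate(Review(stars=5, photo_count=2)))
print(annotate(Review(stars=4, photo_count=0)))
```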

  9. Motor Carrier Compliance Reviews and Safety Audits - Data Mining Tool

    • catalog.data.gov
    • data.transportation.gov
    Updated Jul 11, 2024
    + more versions
    Cite
    Federal Motor Carrier Safety Administration (2024). Motor Carrier Compliance Reviews and Safety Audits - Data Mining Tool [Dataset]. https://catalog.data.gov/dataset/motor-carrier-compliance-reviews-and-safety-audits-data-mining-tool-3f27b
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Federal Motor Carrier Safety Administration (https://www.fmcsa.dot.gov/)
    Description

    Contains data on compliance reviews and new entrant safety audits performed by FMCSA and State grantees.

  10. List of Top Authors of Wiley Interdisciplinary Reviews: Data Mining and...

    • exaly.com
    csv, json
    Updated Nov 1, 2025
    + more versions
    Cite
    (2025). List of Top Authors of Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery sorted by article citations [Dataset]. https://exaly.com/journal/29473/wiley-interdisciplinary-reviews-data-mining-and-knowledge-discovery/top-authors/most-cited
    Available download formats: json, csv
    Dataset updated
    Nov 1, 2025
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    List of Top Authors of Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery sorted by article citations.

  11. Food Reviews - Text Mining & Sentiment Analysis

    • kaggle.com
    zip
    Updated Aug 4, 2023
    Cite
    vikram amin (2023). Food Reviews - Text Mining & Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/vikramamin/food-reviews-text-mining-and-sentiment-analysis
    Available download formats: zip (1075643 bytes)
    Dataset updated
    Aug 4, 2023
    Authors
    vikram amin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Brief Description:
    • The Chief Marketing Officer (CMO) of Healthy Foods Inc. wants to understand customer sentiments about the specialty foods that the company offers. This information has been collected through customer reviews on their website. The dataset consists of about 5000 reviews. They want answers to the following questions:
      1. What are the most frequently used words in the customer reviews?
      2. How can the data be prepared for text analysis?
      3. What are the overall sentiments towards the products?
    • We will be using text mining and sentiment analysis (R programming) to offer insights to the CMO regarding the food reviews.

    Steps:
    • Set the working directory and read the data.
    • Data cleaning. Check for missing values and data types of variables.
    • Load the required libraries ("tm", "SnowballC", "dplyr", "sentimentr", "wordcloud2", "RColorBrewer").
    • TEXT ACQUISITION and AGGREGATION. Create the corpus.
    • TEXT PRE-PROCESSING. Clean the text.
    • Replace special characters with " ". We use the tm_map function for this purpose.
    • Convert all letters to lower case.
    • Remove punctuation.
    • Remove whitespace.
    • Remove stopwords.
    • Remove numbers.
    • Stem the document.
    • Create the term-document matrix.
    • Convert it into a matrix and find the frequency of words.
    • Convert it into a data frame.
    • TEXT EXPLORATION. Find the words that appear most frequently and least frequently.
    • Create a word cloud.
    • TEXT MODELLING
    • Word association between words that tend to appear together frequently. Here we try to find the associations for the top three occurring words "like", "tast", "flavor" by setting a correlation limit of 0.2.
    • "like" has an association with "realli" (they appear about 25% of the time together), "dont" (24%), and "one" (21%).
    • "tast" does not have an association with any word at the set correlation limit.
    • "flavor" has an association with the word "chip" (they appear about 27% of the time together).
    • Sentiment analysis
    • element_id refers to the review number, sentence_id refers to the sentence number within the review, and word_count refers to the number of words in that sentence. Sentiment would be either positive or negative.
    • Let us find out the overall sentiment score of all the reviews.
    • This indicates that the entire food review document has a marginally positive score.
    • Let us find out the sentiment score for each of the 5000 reviews.
    • (-1) indicates the most extreme negative sentiment and (+1) indicates the most extreme positive sentiment.
    • Let us create a separate data frame for all the negative sentiments. In total there are 726 negative sentiments out of the total 5000 reviews (approx 15%).
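
The listing describes this workflow in R with the "tm" package. As a rough Python analogue of the cleaning and word-frequency steps, the sketch below lowercases the text, strips punctuation, numbers and special characters, removes stop words, and counts terms. The sample reviews and the stop-word list are made up, stemming is skipped, and nothing here reproduces the author's actual R code. The last lines simply check the arithmetic behind the "approx 15%" figure.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption).
STOP_WORDS = {"i", "the", "of", "is", "it", "at", "all", "did", "not", "these", "even"}


def clean(text):
    """Lowercase, replace non-letters with spaces, split, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [word for word in text.split() if word not in STOP_WORDS]


# Made-up reviews standing in for the roughly 5000 reviews in the dataset.
reviews = [
    "I really like the flavor of these chips!",
    "Taste is great, flavor is even better.",
    "Did not like it at all.",
]

# Term frequencies across all reviews (the role of the term-document matrix).
freq = Counter(word for review in reviews for word in clean(review))
print(freq.most_common(3))

# Share of negative sentiments reported in the listing: 726 out of 5000.
negative_share = 726 / 5000
print(f"{negative_share:.1%}")  # 14.5%
```
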
  12. Data from: The Valuation of User-Generated Content: A Structural, Stylistic...

    • researchdata.smu.edu.sg
    mdb
    Updated May 31, 2023
    Cite
    Sian KOH Noi (2023). Data from: The Valuation of User-Generated Content: A Structural, Stylistic and Semantic Analysis of Online Reviews [Dataset]. http://doi.org/10.25440/smu.12062805.v1
    Available download formats: mdb
    Dataset updated
    May 31, 2023
    Dataset provided by
    SMU Research Data Repository (RDR)
    Authors
    Sian KOH Noi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record contains the underlying research data for the publication "The Valuation of User-Generated Content: A Structural, Stylistic and Semantic Analysis of Online Reviews"; the full text is available from: https://ink.library.smu.edu.sg/etd_coll/78

    The ability and ease for users to create and publish content has produced a vast amount of online product reviews. However, the amount of data is overwhelmingly large and unstructured, making information difficult to quantify. This creates a challenge in understanding how online reviews affect consumers’ purchase decisions. In my dissertation, I explore the structural, stylistic and semantic content of online reviews. Firstly, I present a measurement that quantifies sentiments with respect to a multi-point scale and conduct a systematic study on the impact of online reviews on product sales. Using the sentiment metrics generated, I estimate the weight that customers place on each segment of the review and examine how these segments affect the sales for a given product. The results empirically verify that sentiments influence sales in ways that ratings alone do not capture. Secondly, I propose a method to detect online review manipulation using writing style analysis and assess how consumers respond to such manipulation. Finally, I find that societal norms influence posting behavior and that significant differences exist across cultures. Users should therefore exercise care in interpreting the information from online reviews. This dissertation advances our understanding of the consumer decision-making process and sheds light on the relevance of online review ratings and sentiments over a sequential decision-making process. Having tapped into the abundant supply of online review data, the results in this work are based on large-scale datasets which extend beyond the scale of traditional word-of-mouth research.

  13. ShoppingAppReviews Dataset

    • data.mendeley.com
    • kaggle.com
    Updated Sep 16, 2024
    + more versions
    Cite
    Noor Mairukh Khan Arnob (2024). ShoppingAppReviews Dataset [Dataset]. http://doi.org/10.17632/chr5b94c6y.2
    Dataset updated
    Sep 16, 2024
    Authors
    Noor Mairukh Khan Arnob
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A dataset consisting of 751,500 English app reviews of 12 online shopping apps. The dataset was scraped from the internet using a Python script. This ShoppingAppReviews dataset contains app reviews of the 12 most popular online shopping Android apps: Alibaba, Aliexpress, Amazon, Daraz, eBay, Flipkart, Lazada, Meesho, Myntra, Shein, Snapdeal and Walmart. Each review entry contains metadata such as review score, thumbsupcount, review posting time, and reply content. The dataset is organized in a zip file containing 12 json files and 12 csv files, one of each per app. This dataset can be used to obtain valuable information about customers' feedback regarding their user experience of these financially important apps.

  14. Datasheet1_Retrospective content analysis of consumer product reviews...

    • frontiersin.figshare.com
    txt
    Updated Jun 4, 2023
    + more versions
    Cite
    Jungwei W. Fan; Wanjing Wang; Ming Huang; Hongfang Liu; W. Michael Hooten (2023). Datasheet1_Retrospective content analysis of consumer product reviews related to chronic pain.csv [Dataset]. http://doi.org/10.3389/fdgth.2023.958338.s001
    Available download formats: txt
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Jungwei W. Fan; Wanjing Wang; Ming Huang; Hongfang Liu; W. Michael Hooten
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Chronic pain (CP) lasts for more than 3 months, causing prolonged physical and mental burdens to patients. According to the US Centers for Disease Control and Prevention, CP contributes to more than 500 billion US dollars yearly in direct medical cost plus the associated productivity loss. CP is complex in etiology and can occur anywhere in the body, making it difficult to treat and manage. There is a pressing need for research to better summarize the common health issues faced by consumers living with CP and their experience in accessing over-the-counter analgesics or therapeutic devices. Modern online shopping platforms offer a broad array of opportunities for the secondary use of consumer-generated data in CP research. In this study, we performed an exploratory data mining study that analyzed CP-related Amazon product reviews. Our descriptive analyses characterized the review language, the reviewed products, the representative topics, and the network of comorbidities mentioned in the reviews. The results indicated that most of the reviews were concise yet rich in terms of representing the various health issues faced by people with CP. Despite the noise in the online reviews, we see potential in leveraging the data to capture certain consumer-reported outcomes or to identify shortcomings of the available products.

  15. List of Top Schools of Wiley Interdisciplinary Reviews: Data Mining and...

    • exaly.com
    csv, json
    Updated Nov 1, 2025
    Cite
    (2025). List of Top Schools of Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery sorted by citations [Dataset]. https://exaly.com/journal/29473/wiley-interdisciplinary-reviews-data-mining-and-/top-schools
    Available download formats: csv, json
    Dataset updated
    Nov 1, 2025
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    List of Top Schools of Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery sorted by citations.

  16. Yelp Aspect Based Opinion Mining

    • kaggle.com
    zip
    Updated Dec 7, 2019
    Cite
    Rafay (2019). Yelp Aspect Based Opinion Mining [Dataset]. https://www.kaggle.com/rafay12/yelp-aspect-based-opinion-mining
    Available download formats: zip (175861 bytes)
    Dataset updated
    Dec 7, 2019
    Authors
    Rafay
    Description

    Dataset

    This dataset was created by Rafay


  17. Data and Model Checkpoints for "Weakly Supervised Concept Map Generation...

    • figshare.com
    application/x-gzip
    Updated May 31, 2023
    Cite
    Jiaying Lu (2023). Data and Model Checkpoints for "Weakly Supervised Concept Map Generation through Task-Guided Graph Translation" [Dataset]. http://doi.org/10.6084/m9.figshare.16415802.v2
    Available download formats: application/x-gzip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jiaying Lu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and model checkpoints for paper "Weakly Supervised Concept Map Generation through Task-Guided Graph Translation" by Jiaying Lu, Xiangjue Dong, and Carl Yang. The paper has been accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE).

    GT-D2G-*.tar.gz are model checkpoints for GT-D2G variants. These models were trained with seed=27. nyt/dblp/yelp.*.win5.pickle.gz are initial graphs generated by NLP pipelines. glove.840B.restaurant.400d.vec.gz is the pre-trained embedding for the Yelp dataset.

    For more instructions, please refer to our GitHub repo.

  18. Booking.com USA Hotel Reviews Dataset

    • crawlfeeds.com
    csv, zip
    Updated Oct 6, 2025
    Cite
    Crawl Feeds (2025). Booking.com USA Hotel Reviews Dataset [Dataset]. https://crawlfeeds.com/datasets/booking-com-usa-hotel-reviews-dataset
    Available download formats: zip, csv
    Dataset updated
    Oct 6, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Area covered
    USA
    Description

    This comprehensive dataset offers a rich collection of over 5 million customer reviews for hotels and accommodations listed on Booking.com, specifically sourced from the United States. It provides invaluable insights into guest experiences, preferences, and sentiment across various properties and locations within the USA. This dataset is ideal for market research, sentiment analysis, hospitality trend identification, and building advanced recommendation systems.

    Key Features:

    • Geographic Focus: Exclusively reviews from properties located in the USA.
    • Comprehensive Coverage: Includes a wide range of hotel types and sizes across different states and cities in the US, covering reviews from January 2020 to June 2025.
    • Rich Detail: Each record provides detailed review information, allowing for in-depth analysis.
    • Structured Format: Clean, organized, and ready for immediate use in various analytical tools and platforms.

    Dive into a sample of 1,000+ records to experience the dataset's quality. For full access to this comprehensive data, submit your request at Booking reviews data.

    Use Cases:

    • Market Research: Gain insights into customer preferences and satisfaction in the US hospitality sector.
    • Sentiment Analysis: Analyze the emotional tone of reviews to gauge customer sentiment towards hotels and services.
    • Competitor Analysis: Benchmark hotel performance and identify areas for improvement against competitors.
    • Trend Identification: Discover emerging trends in hotel amenities, service expectations, and guest behavior in the US.
    • Recommendation Systems: Develop and train models to recommend hotels based on user preferences and review data.
    • Natural Language Processing (NLP): Create and refine NLP models for text summarization, topic modeling, and opinion mining.
    • Academic Research: Support studies on tourism, consumer behavior, and data science applications in hospitality.

  19. Data from: Implicit aspect-based opinion mining and analysis of airline...

    • live.european-language-grid.eu
    csv
    Updated May 29, 2024
    Cite
    (2024). Implicit aspect-based opinion mining and analysis of airline industry based on user generated reviews [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7665
    Available download formats: csv
    Dataset updated
    May 29, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mining opinions from reviews is an area of growing research, spanning the document, sentence, and aspect levels of a review. While explicitly mentioned aspects have been widely researched, very little work has addressed opinions on aspects that are implied rather than stated. For example, “the flight was spacious and there was plenty of legroom” gives an opinion on the cabin and seat of an airline: words like “spacious” and phrases like “plenty of legroom” help identify these implied entities and the opinions attached to them. Little research has gathered such implicit aspects and opinions for airline reviews. The present dataset is a manually annotated, domain-specific, aspect-based corpus that supports extracting and analyzing opinions about such implied aspects and entities of airlines.

  20. Data from: Argument Mining Driven Analysis of Peer-Reviews Dataset

    • data.niaid.nih.gov
    Updated Dec 10, 2020
    Cite
    Fromm; Berrendorf; Faerman; Seidl (2020). Argument Mining Driven Analysis of Peer-Reviews Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4314389
    Dataset updated
    Dec 10, 2020
    Dataset provided by
    LMU Munich
    Authors
    Fromm; Berrendorf; Faerman; Seidl
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Argument Mining in Scientific Reviews (AMSR)

    We release a new dataset of peer-reviews from different computer science conferences with annotated arguments, called AMSR (Argument Mining in Scientific Reviews).

    The dataset has been crawled from the OpenReview platform (https://openreview.net/) using the OpenReviewCrawler (https://openreview-py.readthedocs.io/en/latest/getting_data.html).

    From 12,135 collected papers and reviews, we sample 77 for the annotation. We use a simple argumentation scheme, which distinguishes between non-arguments, supporting arguments, and attacking arguments, which we denote as NON/PRO/CON accordingly.
