Between ************** and ***********, Google recorded the highest increase in the number of online restaurant reviews in Italy, relative to the reviews published on the platform during the previous 12 months. According to the data, the number of online restaurant reviews published on Google increased by ** percent. Conversely, the number of online restaurant reviews published on Tripadvisor decreased by ** percent compared to the previous 12 months.
This statistic presents the information considered most helpful in product reviews according to internet users in the United States as of September 2018. According to the findings, 60 percent of respondents stated that information about product performance was the most helpful when reading reviews, while 55 percent reported that purchaser satisfaction was the most useful to them.
This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also-viewed/also-bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: - We have added transaction metadata for each review shown on the review page.
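The reviews themselves ship as one JSON object per line. A minimal parsing sketch is below; the field names (overall, reviewerID, asin, reviewText) are assumed from the 2014 release and should be checked against the downloaded files, and the two records are invented for illustration.

```python
import json

# Two invented records in the dataset's one-JSON-object-per-line format.
# Field names (overall, reviewerID, asin, reviewText) are assumed from the
# 2014 release -- verify them against the files you actually download.
raw_lines = [
    '{"overall": 5.0, "reviewerID": "A1", "asin": "B000001", "reviewText": "Great product."}',
    '{"overall": 2.0, "reviewerID": "A2", "asin": "B000001", "reviewText": "Broke quickly."}',
]

reviews = [json.loads(line) for line in raw_lines]

# Mean star rating per product (asin).
by_asin = {}
for r in reviews:
    by_asin.setdefault(r["asin"], []).append(r["overall"])
avg = {asin: sum(v) / len(v) for asin, v in by_asin.items()}
print(avg)  # → {'B000001': 3.5}
```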
If you publish articles based on this dataset, please cite the following paper:
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset provides detailed information on links, papers, and peer reviews from the International Conference on Learning Representations (ICLR) for the years 2018 through 2023. The dataset can be used to replicate experiments or conduct new analyses on scientific reviews and decisions from OpenReview.
Content overview:
- iclr_{year}_links.csv: Contains the IDs and links to the articles on OpenReview.
- iclr_{year}_papers.csv: Includes the article IDs, titles, and forum identifiers (Forum) on OpenReview.
- iclr_{year}_reviews.csv: Provides review data, including:
  - Forum: The article's unique identifier.
  - Type: The type of review content (e.g., title, comment, decision, rating).
  - Content: The text associated with each type.
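Since iclr_{year}_reviews.csv stores one (Forum, Type, Content) triple per row, per-paper records can be rebuilt by grouping on Forum. A sketch using the csv module; only the column names come from the description above, and the sample rows are invented:

```python
import csv
import io
from collections import defaultdict

# Invented stand-in for a few rows of iclr_{year}_reviews.csv.
sample = """Forum,Type,Content
abc123,title,Deep Nets Revisited
abc123,rating,8: accept
abc123,decision,Accept (poster)
"""

# Group Type/Content pairs under each Forum id.
# Note: if a Type repeats (e.g. several ratings), the last value wins here;
# a real analysis would collect repeated types into a list instead.
by_forum = defaultdict(dict)
for row in csv.DictReader(io.StringIO(sample)):
    by_forum[row["Forum"]][row["Type"]] = row["Content"]

print(by_forum["abc123"]["decision"])  # → Accept (poster)
```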
According to 2018 and 2019 industry data, review volume is a deciding factor in business conversion rates; the only notable exceptions are the food & beverage and retail industries. On average, businesses in the food and beverage segment had *** reviews but required only ** to achieve their maximum growth rate. In contrast, service & B2B companies had an average review count of ** and needed ** reviews to achieve their maximum growth rate.
A survey conducted between October 2018 and December 2019 found that 37.3 percent of Canadians felt that online reviews had a major influence on their shopping decisions. However, approximately 31 percent of respondents disagreed that online reviews played a major role in their shopping decisions.
Austin Code Department's 2018 Annual Report
I use daily prices collected from online retailers in five countries to study the impact of measurement bias on three common price stickiness statistics. Relative to previous results, I find that online prices have longer durations, with fewer price changes close to zero, and hazard functions that initially increase over time. I show that time-averaging and imputed prices in scanner and CPI data can fully explain the differences with the literature. I then report summary statistics for the duration and size of price changes using scraped data collected from 181 retailers in 31 countries.
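The duration and frequency statistics described here come from splitting each product's price series into spells of constant prices. A toy sketch of that calculation (the daily price series is invented):

```python
# Invented daily price series for one product.
prices = [9.99, 9.99, 9.99, 10.49, 10.49, 9.99, 9.99, 9.99, 9.99]

# Split the series into constant-price spells and record each spell's length.
durations = []
current = 1
for prev, cur in zip(prices, prices[1:]):
    if cur == prev:
        current += 1
    else:
        durations.append(current)
        current = 1
durations.append(current)  # close the final spell

# Daily frequency of price changes and the mean spell duration.
freq = (len(durations) - 1) / (len(prices) - 1)
mean_duration = sum(durations) / len(durations)
print(durations, round(freq, 3), mean_duration)  # → [3, 2, 4] 0.25 3.0
```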
This dataset is a subset of the Amazon Review 2018 dataset. The data used in this project includes reviews for the Electronics category, reduced to the 5-core: each remaining user and item has at least 5 reviews. Only part of the original data was retained.
Includes reviews and corresponding ratings. The columns are the following:
Source: https://nijianmo.github.io/amazon/index.html
Description: This dataset is an updated version of the Amazon review dataset released in 2014.
As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes),
product metadata (descriptions, category information, price, brand, and image features), and
links (also viewed/also bought graphs).
Justifying recommendations using distantly-labeled reviews and fine-grained aspects
Jianmo Ni, Jiacheng Li, Julian McAuley
Empirical Methods in Natural Language Processing (EMNLP), 2019
The dataset, with reviews and corresponding ratings from 1 to 5, can be used for sentiment analysis and other NLP tasks.
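The 5-core reduction mentioned above must be applied iteratively: dropping a user with too few reviews can push an item below the threshold, and vice versa. A sketch of the procedure on invented toy data (with k=2 instead of 5):

```python
def k_core(pairs, k):
    """Iteratively drop (user, item) interactions until every remaining
    user and item appears at least k times."""
    while True:
        users, items = {}, {}
        for u, i in pairs:
            users[u] = users.get(u, 0) + 1
            items[i] = items.get(i, 0) + 1
        kept = [(u, i) for u, i in pairs if users[u] >= k and items[i] >= k]
        if len(kept) == len(pairs):  # fixed point reached
            return kept
        pairs = kept

# Invented interactions; the dataset itself uses k=5.
pairs = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2"), ("u3", "i3")]
print(k_core(pairs, 2))  # u3 and i3 drop out; the u1/u2 x i1/i2 block survives
```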
A list of schools receiving Quality Reviews during the 2018-19 school year
This statistic displays the proportion of people reading online reviews or blogs in Australia from 2011 to 2018. In 2018, ** percent of respondents stated they read online reviews and blogs.
This dataset was collected from the open-source Amazon reviews data made available by Jianmo Ni.
The data was originally in JSON, divided into metadata and reviews. I converted the data into a data frame and joined the metadata with the reviews before converting the result to a CSV file. No further processing was done afterwards.
This dataset contains full reviews from Amazon in 2018, consisting of 500,000+ reviews from 100,000+ users. The columns are largely self-explanatory: userName, itemName, rating, reviewText, etc.
This dataset can be used to build a recommender system, since it has user-item-rating information. It can also be used for NLP tasks, using the reviewText column.
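With user-item-rating triples, an item-mean baseline is about the simplest recommender to start from. A sketch assuming the userName/itemName/rating columns described above (the sample rows are invented):

```python
import csv
import io

# Invented stand-in for a few rows of the CSV described above.
sample = """userName,itemName,rating
alice,kettle,5
bob,kettle,3
alice,toaster,4
"""
rows = list(csv.DictReader(io.StringIO(sample)))

# Item-mean baseline: score every item by its average rating.
totals = {}
for r in rows:
    s, n = totals.get(r["itemName"], (0.0, 0))
    totals[r["itemName"]] = (s + float(r["rating"]), n + 1)
item_mean = {item: s / n for item, (s, n) in totals.items()}

# Recommend items a given user has not rated yet, best-rated first.
user = "bob"
seen = {r["itemName"] for r in rows if r["userName"] == user}
recs = sorted((i for i in item_mean if i not in seen),
              key=item_mean.get, reverse=True)
print(recs)  # → ['toaster']
```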
This statistic displays the proportion of people reading online reviews or blogs in Australia in 2018, by age. That year, ** percent of respondents aged ** to ** stated they read online reviews and blogs.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Goodreads Book Reviews dataset encapsulates a wealth of reviews and various attributes concerning the books listed on the Goodreads platform. A distinguishing feature of this dataset is its capture of multiple tiers of user interaction, ranging from adding a book to a "shelf", to rating and reading it. This dataset is a treasure trove for those interested in understanding user behavior, book recommendations, sentiment analysis, and the interplay between various attributes of books and user interactions.
Basic Statistics:
- Items: 1,561,465
- Users: 808,749
- Interactions: 225,394,930

Metadata:
- Reviews: The text of the reviews provided by users.
- Add-to-shelf, Read, Review Actions: Various interactions users have with the books.
- Book Attributes: Attributes describing the books, including title and ISBN.
- Graph of Similar Books: A graph depicting similarity relations between books.
Example (interaction data):

```json
{
  "user_id": "8842281e1d1347389f2ab93d60773d4d",
  "book_id": "130580",
  "review_id": "330f9c153c8d3347eb914c06b89c94da",
  "isRead": true,
  "rating": 4,
  "date_added": "Mon Aug 01 13:41:57 -0700 2011",
  "date_updated": "Mon Aug 01 13:42:41 -0700 2011",
  "read_at": "Fri Jan 01 00:00:00 -0800 1988",
  "started_at": ""
}
```
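Interaction records in this format can be streamed line by line to compute behavior-chain statistics, e.g. how often a shelved book is actually read. A sketch (the three records are invented but follow the schema above):

```python
import json

# Invented interaction records following the schema shown above.
raw = [
    '{"user_id": "u1", "book_id": "1", "isRead": true, "rating": 4}',
    '{"user_id": "u1", "book_id": "2", "isRead": false, "rating": 0}',
    '{"user_id": "u2", "book_id": "1", "isRead": true, "rating": 5}',
]
interactions = [json.loads(line) for line in raw]

# Share of shelf additions that became an actual read,
# and the mean rating among read books.
read = [x for x in interactions if x["isRead"]]
read_rate = len(read) / len(interactions)
mean_rating = sum(x["rating"] for x in read) / len(read)
print(round(read_rate, 2), mean_rating)  # → 0.67 4.5
```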
Use Cases:
- Book Recommendations: Creating personalized book recommendations based on user interactions and preferences.
- Sentiment Analysis: Analyzing sentiment in reviews and understanding how different book attributes influence sentiment.
- User Behavior Analysis: Understanding user interaction patterns with books and deriving insights to enhance user engagement.
- Natural Language Processing: Training models to process and analyze user-generated text in reviews.
- Similarity Analysis: Analyzing the graph of similar books to understand book similarities and clustering.
Citation:
Please cite the following if you use the data:
Item recommendation on monotonic behavior chains
Mengting Wan, Julian McAuley
RecSys, 2018
[PDF](https://cseweb.ucsd.edu/~jmcauley/pdfs/recsys18e.pdf)
Code Samples: A curated set of code samples is provided in the dataset's GitHub repository, aiding in seamless interaction with the datasets. These include:
- Downloading datasets without GUI: Facilitating dataset download in a non-GUI environment.
- Displaying Sample Records: Showcasing sample records to get a glimpse of the dataset structure.
- Calculating Basic Statistics: Computing basic statistics to understand the dataset's distribution and characteristics.
- Exploring the Interaction Data: Delving into interaction data to grasp user-book interaction patterns.
- Exploring the Review Data: Analyzing review data to extract valuable insights from user reviews.
Additional Dataset:
- Complete book reviews: a comprehensive, multilingual collection of ~15 million reviews covering ~2 million books from 465,000 users.
Datasets:
About the Dataset

This dataset contains a list of School Food Authorities (SFAs) that have recently undergone Administrative Reviews with TDA, including:
- Types of school nutrition program operated
- Special provision programs utilized
- Whether or not there were Findings

This report can be found on SquareMeals under Compliance for NSLP for the current program year, and will be posted to ODP within three months after the end of the program year.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This historical dataset for Saca Online is provided by PublicSchoolReview and contains statistics on the following metrics:
- Total Students Trends Over Years (2017-2023)
- Distribution of Students By Grade Trends
- American Indian Student Percentage Comparison Over Years (2017-2018)
- Hispanic Student Percentage Comparison Over Years (2017-2023)
- Black Student Percentage Comparison Over Years (2021-2023)
- White Student Percentage Comparison Over Years (2017-2023)
- Two or More Races Student Percentage Comparison Over Years (2019-2020)
- Diversity Score Comparison Over Years (2017-2023)
- Graduation Rate Comparison Over Years (2018-2021)
GNU General Public License, v2.0: http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
(Disclaimer: Description update and experiments are still in progress).
The dataset was collected for research related to the Stanford NLU course, specifically to compare the performance of deep-learning transfer models such as BERT, GPT-2, and RoBERTa against the more classical models previously used for sentiment analysis. The paper can be found here: Leveraging BERT for Multi-Dimensional Sentiment Analysis of Employee Reviews
Most publicly available datasets for social-science-related tasks contain mainly statistical data (numeric values or Boolean-type gradations of a particular characteristic). Formal employee reviews usually contain sensitive data that can only be shared with a direct manager or senior members of the company. Such data cannot leave the premises of the company and thus had to be generated or collected explicitly.
Data was collected with the help of Amazon MTurk workers. A custom task was created to ensure a good level of variability and quality of data. The task instructions read as follows:

> In this task you're asked to generate a free-form review for your imaginary colleague. The review should assess the employee's performance for the last quarter using one of the "9-box" categories below. Note: the review should be in English and not shorter than 4 sentences. Avoid using word combinations from the given category as is (i.e. phrases containing the words "performance" and "potential").
The categories can be visualized with the 9-box Performance and Potential model: https://performanceculture.com/wp-content/uploads/2018/11/9-box-https.png
One of the goals of the experiment is to validate how well DL models perform on fairly raw data (without significant cleansing and review), so the current version of the train/validation dataset is provided with about 70% of records still awaiting review for consistency (i.e. whether the feedback actually matches the provided class).
There are 2 main datasets (which do not overlap):
1. Core dataset used for training and evaluation (partially reviewed, unbalanced distribution of classes)
2. Test dataset (225 records, all reviewed, 25 per class)
The (train, validation) and test files are enriched datasets used in the experiments, built from the 2 main datasets respectively. Train/validation were obtained via a stratified split. Code can be found here: https://github.com/fryzhykau/BERT-employee-reviews-analysis
Main columns:
- id - unique identifier of the record
- person_name - imaginary employee name for which the feedback was given
- nine_box_category - human-readable 9-box category
- feedback - the actual review of the employee
- updated or adjusted - whether the original category provided by the MTurk worker was updated to properly match the feedback (to sustain a high degree of consistency)
- reviewed - flag indicating whether this record was thoroughly reviewed by another pair of eyes

Additional columns:
- label - 0-based nine_box_category id
- feedback_len - length of the feedback
- num_of_sent - number of sentences in the feedback
- performance_class - 0-based performance class id
- potential_class - 0-based potential class id
- feedback_clean - pre-processed feedback value
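The stratified train/validation split mentioned above can be sketched with the standard library alone: shuffle within each nine_box_category and cut off a fixed fraction per class. The toy records and class names below are invented; the actual code lives in the linked repository.

```python
import random
from collections import defaultdict

def stratified_split(records, key, val_frac=0.2, seed=13):
    """Split records into (train, val) while preserving class proportions."""
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[key]].append(rec)
    rng = random.Random(seed)
    train, val = [], []
    for recs in by_class.values():
        rng.shuffle(recs)
        cut = int(len(recs) * val_frac)
        val.extend(recs[:cut])
        train.extend(recs[cut:])
    return train, val

# Toy records mimicking the schema above: 10 per invented class.
data = [{"id": f"{c}-{i}", "nine_box_category": c, "feedback": "..."}
        for c in ("star", "core", "risk") for i in range(10)]
train, val = stratified_split(data, "nine_box_category")
print(len(train), len(val))  # → 24 6
```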
Big acknowledgement to my wife Hanna for persistently reviewing the data with me to validate the judgement and achieve the highest level of consistency and quality.
I hope this data will help inspire additional ideas in the area of social science and encourage more "personal-like" data to become available for realistic research.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is a list of 1,000 hotels and their reviews provided by Datafiniti's Business Database. The dataset includes hotel location, name, rating, review data, title, username, and more.
Note that this is a sample of a large dataset. The full dataset is available through Datafiniti.
You can use this data to compare hotel reviews on a state-by-state basis or to experiment with sentiment scoring and other natural language processing techniques. The review data lets you correlate keywords in the review text with ratings. E.g.:
A full schema for the data is available in our support documentation.
Datafiniti provides instant access to web data. We compile data from thousands of websites to create standardized databases of business, product, and property information. Learn more.
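The keyword-to-rating correlation suggested above can be prototyped in a few lines. The text/rating field names and the sample rows below are invented for illustration; map them to the actual review schema from the support documentation.

```python
# Invented review rows; substitute the dataset's actual text/rating fields.
reviews = [
    {"text": "Clean room and friendly staff", "rating": 5},
    {"text": "Dirty bathroom, rude staff", "rating": 1},
    {"text": "Very clean, great breakfast", "rating": 4},
]

def keyword_lift(reviews, keyword):
    """Mean rating of reviews containing `keyword` vs. those without it."""
    hit = [r["rating"] for r in reviews if keyword in r["text"].lower()]
    miss = [r["rating"] for r in reviews if keyword not in r["text"].lower()]
    return sum(hit) / len(hit), sum(miss) / len(miss)

print(keyword_lift(reviews, "clean"))  # → (4.5, 1.0)
```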
You can access the full dataset by running the following query with Datafiniti’s Business API.
```json
{
  "query": "dateUpdated:[2018-01-01 TO *] AND categories:(Hotel OR Hotels) AND country:US* AND name:* AND reviews:* AND sourceURLs:*",
  "format": "csv",
  "download": true
}
```
*The total number of results may vary.*
Get this data and more by creating a free Datafiniti account or requesting a demo.
At the end of 2021, a total of 244 million reviews had been submitted to the local business review and recommendation site Yelp, representing a nine percent year-on-year increase from the 224 million reviews at the end of the previous year.