16 datasets found

Women's E-Commerce Clothing Reviews
kaggle.com
zip
Updated Feb 3, 2018
Cite
nicapotato (2018). Women's E-Commerce Clothing Reviews [Dataset]. https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews
Explore at:
zip (2924120 bytes)
Dataset updated
Feb 3, 2018
Authors
nicapotato
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Welcome. This is a Women’s Clothing E-Commerce dataset revolving around the reviews written by customers. Its nine supporting features offer a great environment to parse out the text through its multiple dimensions. Because this is real commercial data, it has been anonymized, and references to the company in the review text and body have been replaced with “retailer”.

Content

This dataset includes 23486 rows and 10 feature variables. Each row corresponds to a customer review, and includes the variables:

Clothing ID: Integer Categorical variable that refers to the specific piece being reviewed.

Age: Positive Integer variable of the reviewer's age.

Title: String variable for the title of the review.

Review Text: String variable for the review body.

Rating: Positive Ordinal Integer variable for the product score granted by the customer, from 1 (Worst) to 5 (Best).

Recommended IND: Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.

Positive Feedback Count: Positive Integer documenting the number of other customers who found this review positive.

Division Name: Categorical name of the product's high-level division.

Department Name: Categorical name of the product department.

Class Name: Categorical name of the product class.
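
As a rough sketch of how the ten variables listed above might be read and range-checked, the following uses only the Python standard library. The sample rows are invented for illustration and are not taken from the dataset; the function names and the assumption that the CSV header matches the variable names exactly are hypothetical.

```python
import csv
import io

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "Clothing ID", "Age", "Title", "Review Text", "Rating",
    "Recommended IND", "Positive Feedback Count",
    "Division Name", "Department Name", "Class Name",
]

# Invented rows for illustration only; they are NOT real dataset records.
SAMPLE_CSV = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,30,Example title,Example review body,5,1,2,Division A,Department A,Class A\n"
    "2,45,,Another example body,2,0,0,Division B,Department B,Class B\n"
)


def parse_review(row):
    """Convert one CSV row (a dict of strings) to typed values and check ranges."""
    rating = int(row["Rating"])
    recommended = int(row["Recommended IND"])
    if not 1 <= rating <= 5:
        raise ValueError(f"Rating out of range: {rating}")
    if recommended not in (0, 1):
        raise ValueError(f"Recommended IND must be 0 or 1, got {recommended}")
    return {
        "clothing_id": int(row["Clothing ID"]),
        "age": int(row["Age"]),
        "title": row["Title"] or None,            # empty string -> missing
        "review_text": row["Review Text"] or None,
        "rating": rating,
        "recommended": bool(recommended),
        "positive_feedback_count": int(row["Positive Feedback Count"]),
        "division": row["Division Name"],
        "department": row["Department Name"],
        "class_name": row["Class Name"],
    }


def load_reviews(text):
    """Parse CSV text, verifying that every expected column is present."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    return [parse_review(row) for row in reader]


reviews = load_reviews(SAMPLE_CSV)
```

Treating an empty Title or Review Text as missing is a design choice of this sketch; the listing does not say how missing values are encoded in the actual file.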

Acknowledgements

Anonymous but real source

Inspiration

I look forward to some quality NLP! There are also some great opportunities for feature engineering and multivariate analysis.

Publications

Statistical Analysis on E-Commerce Reviews, with Sentiment Classification using Bidirectional Recurrent Neural Network
by Abien Fred Agarap - GitHub
MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of...
live.european-language-grid.eu
binary format
Updated Feb 28, 2021
Cite
(2021). MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of Consumer Reviews [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/8680
Explore at:
binary format
Dataset updated
Feb 28, 2021
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
MultiEmo is a new benchmark dataset for multilingual sentiment analysis covering 11 languages. The collection contains consumer reviews from four domains: medicine, hotels, products and university. The original reviews in Polish contained 8,216 documents consisting of 57,466 sentences. The reviews were manually annotated with sentiment at the level of the whole document and at the level of a sentence (3 annotators per element). We achieved a high Positive Specific Agreement value of 0.91 for texts and 0.88 for sentences. The collection was then translated automatically into English, Chinese, Italian, Japanese, Russian, German, Spanish, French, Dutch and Portuguese. MultiEmo is publicly available under a Creative Commons Attribution 4.0 International Licence.

More information: https://github.com/CLARIN-PL/multiemo
Amazon Reviews for Sentiment Analysis
kaggle.com
zip
Updated Nov 18, 2019
Cite
Adam Bittlingmayer (2019). Amazon Reviews for Sentiment Analysis [Dataset]. https://www.kaggle.com/bittlingmayer/amazonreviews
Explore at:
zip (517080965 bytes)
Dataset updated
Nov 18, 2019
Authors
Adam Bittlingmayer
Description
This dataset consists of a few million Amazon customer reviews (input text) and star ratings (output labels) for learning how to train fastText for sentiment analysis.

The idea is a dataset that is more than a toy (real business data at a reasonable scale) but can still be trained on in minutes on a modest laptop.

Content

The fastText supervised learning tutorial requires data in the following format:

_label_
Dataset for sentiment analysis in Spanish
data.niaid.nih.gov
zenodo.org
Updated Apr 9, 2022
Cite
Fran Ramírez (2022). Dataset for sentiment analysis in Spanish [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6425686
Dataset updated
Apr 9, 2022
Dataset authored and provided by
Fran Ramírez
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is automatically generated by web scraping review sites such as Tripadvisor or Google Maps. On these sites, users post comments with ratings, which provides labeled data. The code that generated this dataset can be found at the following URL:

https://github.com/fjramirezv/sentiment-webscraping
Dataset - How do you propose your code changes? Empirical Analysis of Affect...
zenodo.org
data.niaid.nih.gov
txt, zip
Updated May 13, 2020
Cite
Marco Ortu; Giuseppe Destefanis; Daniel Graziotin; Michele Marchesi; Marco Tonelli (2020). Dataset - How do you propose your code changes? Empirical Analysis of Affect Metrics of Pull Requests on GitHub [Dataset]. http://doi.org/10.5281/zenodo.3825044
Explore at:
txt, zip
Unique identifier
https://doi.org/10.5281/zenodo.3825044
Dataset updated
May 13, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Marco Ortu; Giuseppe Destefanis; Daniel Graziotin; Michele Marchesi; Marco Tonelli
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This package contains the raw open data for the study

Marco Ortu, Giuseppe Destefanis, Daniel Graziotin, Michele Marchesi, Roberto Tonelli. 2020. How do you propose your code changes? Empirical Analysis of Affect Metrics of Pull Requests on GitHub. Under Review.

The dataset is based on GHTorrent dataset:

Georgios Gousios. 2013. The GHTorrent dataset and tool suite. In Proceedings of the 10th Working Conference on Mining Software Repositories (MSR ’13). IEEE Press, 233–236.

And released with the same license (CC BY-SA 4.0).
Metacritic's Best Games and Reviews - 2025
kaggle.com
Updated Mar 16, 2025
Cite
Davut Bayık (2025). Metacritic's Best Games and Reviews - 2025 [Dataset]. https://www.kaggle.com/datasets/davutb/metacritic-games
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Mar 16, 2025
Dataset provided by
Kaggle
Authors
Davut Bayık
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
*Also find Metacritic Movies and Metacritic TV Shows datasets.*

Metacritic Games and Reviews Dataset

This dataset contains a collection of video games and their corresponding reviews from Metacritic, a popular aggregate review site. The data provides insights into various video games across different platforms, including PC, PlayStation, Xbox, and others. Each game entry includes critical reviews, user reviews, ratings, and other relevant information that can be used for analysis, natural language processing, machine learning, and predictive modeling.

Important Note: *The games in this collection are selected from Metacritic's Best Games of All Time list, which only includes titles that have received at least 7 reviews, ensuring a minimum level of critical and user input.*

Up-to-dateness: *This dataset is accurate as of March 14, 2025, and includes the most current rankings and game details available at that time.*

Content

The dataset contains general information and scores of 13K+ games and their corresponding 1.6M+ user/critic reviews collected by sending automated requests to Metacritic's public backend API using Python's requests and pandas libraries.

Potential Uses

Sentiment analysis and natural language processing on reviews.

Predictive modeling for predicting user ratings based on features like genre, publisher, and developer.

Data analysis on trends in game quality, genres, or platform performance.

Comparing critical reviews and user reviews to understand the divergence in ratings.

This dataset is perfect for researchers, game enthusiasts, and data scientists who are interested in exploring the gaming industry through data analysis.

Acknowledgements

The datasets are scraped from metacritic.com

The code for scraping the dataset can be found on my GitHub repository
ro_sent
huggingface.co
Updated Mar 28, 2025
Cite
Dumitrescu Stefan (2025). ro_sent [Dataset]. https://huggingface.co/datasets/dumitrescustefan/ro_sent
Dataset updated
Mar 28, 2025
Authors
Dumitrescu Stefan
License
https://choosealicense.com/licenses/unknown/
Description
This is a Romanian sentiment analysis dataset. It is provided in a processed form, as used by the authors of Romanian Transformers in their examples, and is based on the original data at https://github.com/katakonst/sentiment-analysis-tensorflow. The original dataset was collected from product and movie reviews in Romanian.
‘Animal Crossing Reviews’ analyzed by Analyst-2
analyst-2.ai
Updated May 4, 2020
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Animal Crossing Reviews’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-animal-crossing-reviews-f013/latest
Dataset updated
May 4, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Animal Crossing Reviews’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jessemostipak/animal-crossing on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context and Content

The data this week comes from the VillagerDB and Metacritic. VillagerDB brings info about villagers, items, crafting, accessories, including links to their images. Metacritic brings user and critic reviews of the game (scores and raw text).

Per Wikipedia:

Animal Crossing: New Horizons is a 2020 life simulation video game developed and published by Nintendo for the Nintendo Switch. It is the fifth main series title in the Animal Crossing series. New Horizons was released in all regions on March 20, 2020.

New Horizons sees the player assuming the role of a customizable character who moves to a deserted island after purchasing a package from Tom Nook, a tanuki character who has appeared in every entry in the Animal Crossing series. Taking place in real-time, the player can explore the island in a nonlinear fashion, gathering and crafting items, catching insects and fish, and developing the island into a community of anthropomorphic animals.

Animal Crossing as explained by a Polygon opinion piece.

With just a few design twists, the work behind collecting hundreds or even thousands of items over weeks and months becomes an exercise of mindfulness, predictability, and agency that many players find soothing instead of annoying.

Games that feature gentle progression give us a sense of progress and achievability, teaching us that putting in a little work consistently while taking things one step at a time can give us some fantastic results. It’s a good life lesson, as well as a way to calm yourself and others, and it’s all achieved through game design.

Some potential context for user_reviews.tsv from 538 and a point of potential strife via Animal Crossing World, and lastly a spoiler article analyzing the reviews in R by Boon Tan.

PS there is an easter egg somewhere in the readme - something to do with... turnips.

Acknowledgements

The data was downloaded and cleaned by Thomas Mock for #TidyTuesday during the week of May 4th, 2020. You can see the code used to clean the data in the #TidyTuesday GitHub repository.

Inspiration

Potential Analyses:

Reviews: Sentiment analysis, text analysis, scores, date effect

Villagers/Items: Gender, species, sayings, personality, price, recipe, what about a star sign based off the birthday column?

--- Original source retains full ownership of the source dataset ---
Replication Data for: A Review of Best Practice Recommendations for...
datasearch.gesis.org
dataverse-staging.rdmc.unc.edu
Updated Jan 22, 2020
Cite
Wesslen, Ryan (2020). Replication Data for: A Review of Best Practice Recommendations for Text-Analysis in R (and a User Friendly App) [Dataset]. http://doi.org/10.15139/S3/R4W7ZS
Unique identifier
https://doi.org/10.15139/S3/R4W7ZS
Dataset updated
Jan 22, 2020
Dataset provided by
Odum Institute Dataverse Network
Authors
Wesslen, Ryan
Description
Replication materials for "A Review of Best Practice Recommendations for Text-Analysis in R (and a User Friendly App)". The materials are also available in the GitHub repo (https://github.com/wesslen/text-analysis-org-science), and the Shiny app is in its own GitHub repo (https://github.com/wesslen/topicApp).
Arabic 100k Reviews
kaggle.com
Updated Mar 7, 2020
Cite
Abed Khooli (2020). Arabic 100k Reviews [Dataset]. https://www.kaggle.com/abedkhooli/arabic-100k-reviews/kernels
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Mar 7, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Abed Khooli
Description
Context

Few Arabic datasets are available for classification comparison and other NLP tasks. This dataset is mainly a compilation of several available datasets and a sampling of 100k rows (99999 to be exact).

Content

The dataset combines reviews from hotels, books, movies, products and a few airlines. It has three classes (Mixed, Negative and Positive). Most were mapped from reviewers' ratings with 3 being mixed, above 3 positive and below 3 negative. Each row has a label and text separated by a tab (tsv). Text (reviews) were cleaned by removing Arabic diacritics and non-Arabic characters. The dataset has no duplicate reviews.
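
The rating-to-class mapping and the tab-separated row layout described above could be sketched as follows. The function names are hypothetical, and the exact label spellings used in the file are an assumption based on the class names given in the description.

```python
def rating_to_label(rating):
    """Map a reviewer rating to a class: 3 is Mixed, above 3 Positive, below 3 Negative."""
    if rating > 3:
        return "Positive"
    if rating < 3:
        return "Negative"
    return "Mixed"


def parse_tsv_line(line):
    """Split one row into (label, text); the label comes first, then a tab, then the review."""
    # maxsplit=1 keeps any further tab characters inside the review text.
    label, text = line.rstrip("\n").split("\t", 1)
    return label, text


# Invented example row, not taken from the dataset.
example = parse_tsv_line("Positive\texample review text\n")
```

The description says only that most labels were mapped from ratings this way, so some rows in the file may have been labeled by other means.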

Acknowledgements

The hotels and book reviews are a subset of HARD and BRAD. The rest were selected from hadyelsahar with a little over 100 airlines reviews collected manually.

Inspiration

Let's jump in and use your best tools to beat the SOTA! Don't forget to show and share your work.
NoReC: The Norwegian Review Corpus - Dataset - B2FIND
b2find.eudat.eu
Updated Aug 11, 2023
Cite
(2023). NoReC: The Norwegian Review Corpus - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/7cadff3f-fa71-595f-a700-191f22351103
Dataset updated
Aug 11, 2023
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
While the NoReC dataset was primarily created for training and evaluating models for document-level sentiment analysis, many other use cases are of course possible. The corpus comprises more than 35,000 full-text reviews extracted from eight different major Norwegian news sources: Dagbladet, VG, Aftenposten, Bergens Tidende, Fædrelandsvennen, Stavanger Aftenblad, DinSide.no and P3.no. The reviews cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each review is labeled with a manually assigned score of 1–6, as provided by the rating of the original author. The texts have been pre-processed using UDPipe and are distributed in the CoNLL-U format. However, we also provide HTML files with the raw texts. Documentation and an accompanying Python package are provided through the following git repository: https://github.com/ltgoslo/norec
Dark Side of the Moon Reviews Dataset
kaggle.com
Updated Oct 21, 2021
Cite
Michael Bryant (2021). Dark Side of the Moon Reviews Dataset [Dataset]. https://www.kaggle.com/michaelbryantds/reviews-of-pink-floyds-the-dark-side-of-the-moon/code
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Oct 21, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Michael Bryant
Description
Context

This dataset contains the reviews and ratings of Pink Floyd's The Dark Side of the Moon from users of rateyourmusic.com.

Content

The dataset was acquired by scraping on 15 October 2021. It contains 1544 reviews and ratings (if the user rated the album).

The scraper can be found at this GitHub Repo.

Acknowledgements

The reviews can be found here.

Inspiration

This dataset can be used to practice data cleaning, performing exploratory data analyses, and using sentiment analysis.
Dataset: Towards Trustworthy Sentiment Analysis in Software Engineering:...
figshare.com
xlsx
Updated Jul 2, 2025
Cite
Martin Obaidi; Marc Herrmann; Jil Klünder; Kurt Schneider (2025). Dataset: Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection [Dataset]. http://doi.org/10.6084/m9.figshare.29250935.v1
Explore at:
xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.29250935.v1
Dataset updated
Jul 2, 2025
Dataset provided by
figshare
Authors
Martin Obaidi; Marc Herrmann; Jil Klünder; Kurt Schneider
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset: Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection

Authors

Martin Obaidi, Marc Herrmann, Jil Klünder, Kurt Schneider

Description

This dataset accompanies the publication: Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection

The dataset contains all coded data and annotation results from a comprehensive analysis of sentiment and linguistic characteristics in software engineering communication. The study benchmarks 14 sentiment analysis tools across 10 datasets from five major SE platforms and investigates how dataset characteristics impact tool performance and selection. The coded data underpins the development of a practical questionnaire-based recommendation approach for trustworthy and context-sensitive sentiment analysis in SE.

Contents

The dataset includes the following file:

All_Sample_Sets_Coded-v04.xlsx

Contains manually coded sample sets from five platforms (App Reviews, Code Reviews, GitHub, Jira, Stack Overflow). Each worksheet corresponds to one platform and provides:

The raw text of the communication sample (“Text”).

Gold-standard sentiment labels (“oracle”): -1 = Negative, 0 = Neutral, 1 = Positive.

Annotations for 13 linguistic characteristics: for each characteristic, x = present, n = not present, and an empty cell = not applicable for this item (e.g., if a characteristic is only relevant for positive statements).

Enables detailed cross-platform analysis of both sentiment polarity and linguistic features in developer communication.

Column details:

Text: Communication/document text.

oracle: Gold-standard sentiment label.

Characteristic 1 – 13: See accompanying paper for definitions. Annotation can be x, n, or empty (not applicable).

If you use this dataset, please cite:

Obaidi, M., Herrmann, M., Klünder, J., Schneider, K. (2025). Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection. In: 2025 IEEE 33rd International Requirements Engineering Conference Workshops (REW).

License

This dataset is provided under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

Contact

For questions regarding the dataset, please contact the corresponding author as listed in the publication.
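
The coding scheme described for this dataset ("oracle" values of -1, 0, 1 and characteristic cells of x, n, or empty) could be decoded with a small helper like the one below. This is a minimal sketch: the function names are hypothetical, and it assumes cell values have already been read from the worksheet as plain Python values.

```python
# Sentiment codes as stated in the dataset description.
ORACLE_LABELS = {-1: "Negative", 0: "Neutral", 1: "Positive"}


def decode_oracle(value):
    """Translate a gold-standard sentiment code into its label."""
    return ORACLE_LABELS[int(value)]


def decode_characteristic(cell):
    """Return True for 'x' (present), False for 'n' (not present), None for empty (not applicable)."""
    value = (cell or "").strip().lower()
    if value == "x":
        return True
    if value == "n":
        return False
    if value == "":
        return None
    raise ValueError(f"Unexpected annotation: {cell!r}")
```

Keeping "not applicable" as None rather than False preserves the three-way distinction the description draws between absent and inapplicable characteristics.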
MovieReviewSentimentClassification
huggingface.co
Updated May 6, 2025
Cite
Massive Text Embedding Benchmark (2025). MovieReviewSentimentClassification [Dataset]. https://huggingface.co/datasets/mteb/MovieReviewSentimentClassification
Dataset updated
May 6, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
MovieReviewSentimentClassification An MTEB dataset Massive Text Embedding Benchmark

The Allociné dataset is a French-language dataset for sentiment analysis that contains movie reviews produced by the online community of the Allociné.fr website.

Task category t2c

Domains Reviews, Written

Reference https://github.com/TheophileBlard/french-sentiment-analysis-with-bert

How to evaluate on this task

You can evaluate an embedding model on this dataset using… See the full description on the dataset page: https://huggingface.co/datasets/mteb/MovieReviewSentimentClassification.
IMDB Reviews on Barbie
kaggle.com
Updated Aug 4, 2023
Cite
Ibrahim (2023). IMDB Reviews on Barbie [Dataset]. https://www.kaggle.com/datasets/ibrahimonmars/imdb-reviews-on-barbie/suggestions?status=pending&yourSuggestions=true
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Aug 4, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ibrahim
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset was scraped from IMDb using a Python module called Scrapset. Documentation: https://github.com/ibrahim-string/Scrapset

There are two CSV files in this dataset: one is cleaned and the other is uncleaned. You can clean the raw data "YOUR WAY" and then run sentiment analysis, or you can use the cleaned CSV file directly.
ASC (TIL, 19 tasks) (Task Incremental Aspect Sentiment Classification)
opendatalab.com
zip
Updated Jul 1, 2024
Cite
University of Illinois at Chicago (2024). ASC (TIL, 19 tasks) (Task Incremental Aspect Sentiment Classification) [Dataset]. https://opendatalab.com/OpenDataLab/ASC
Explore at:
zip
Dataset updated
Jul 1, 2024
Dataset provided by
University of Illinois at Chicago
Facebook AI Research
Description
A set of 19 ASC datasets (reviews of 19 products) producing a sequence of 19 tasks. Each dataset represents a task. The datasets are from 4 sources: (1) HL5Domains (Hu and Liu, 2004) with reviews of 5 products; (2) Liu3Domains (Liu et al., 2015) with reviews of 3 products; (3) Ding9Domains (Ding et al., 2008) with reviews of 9 products; and (4) SemEval14 with reviews of 2 products (SemEval 2014 Task 4, for laptop and restaurant). For (1), (2) and (3), we split off about 10% of the original data as the validation data and another 10% or so as the testing data. For (4), we use 150 examples from the training set for validation. To be consistent with existing research (Tang et al., 2016), examples belonging to the conflicting polarity (both positive and negative sentiments are expressed about an aspect term) are not used. Statistics and details of the 19 datasets are given at https://github.com/ZixuanKe/PyContinual.
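
For illustration, a split of roughly 10% validation and 10% test data, as described for sources (1) to (3), might look like the sketch below. The description does not say how the authors drew their splits, so the seeded shuffle, the rounding, and the function name are all assumptions.

```python
import random


def split_dataset(examples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle a copy of the examples and carve out validation and test subsets."""
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible (an assumption of this sketch).
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)
    validation = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, validation, test


# Stand-in data: 100 integer IDs in place of real review examples.
train, validation, test = split_dataset(range(100))
```

The three subsets are disjoint by construction, since each is a separate slice of the same shuffled list.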