16 datasets found
  1. Women's E-Commerce Clothing Reviews

    • kaggle.com
    zip
    Updated Feb 3, 2018
    Cite
    nicapotato (2018). Women's E-Commerce Clothing Reviews [Dataset]. https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews
    Available download formats: zip (2924120 bytes)
    Dataset updated
    Feb 3, 2018
    Authors
    nicapotato
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Welcome. This is a Women’s Clothing E-Commerce dataset revolving around the reviews written by customers. Its nine supportive features offer a great environment to parse out the text through its multiple dimensions. Because this is real commercial data, it has been anonymized, and references to the company in the review text and body have been replaced with “retailer”.

    Content

    This dataset includes 23486 rows and 10 feature variables. Each row corresponds to a customer review, and includes the variables:

    • Clothing ID: Integer Categorical variable that refers to the specific piece being reviewed.
    • Age: Positive Integer variable of the reviewer's age.
    • Title: String variable for the title of the review.
    • Review Text: String variable for the review body.
    • Rating: Positive Ordinal Integer variable for the product score granted by the customer from 1 Worst, to 5 Best.
    • Recommended IND: Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.
    • Positive Feedback Count: Positive Integer documenting the number of other customers who found this review positive.
    • Division Name: Categorical name of the product high level division.
    • Department Name: Categorical name of the product department.
    • Class Name: Categorical name of the product class.
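    A minimal sketch of reading rows with the variables listed above, using only Python's standard csv module. It assumes a CSV file whose header uses exactly these variable names; the real file name and layout (for example, whether it has an extra index column) are not stated here and may differ. load_reviews and parse_int are hypothetical helpers, and the sample row is invented.

```python
import csv
import io
from typing import Optional, TextIO

# Integer-valued variables, named as in the list above (assumed to match the CSV header).
INT_COLUMNS = ("Clothing ID", "Age", "Rating", "Recommended IND", "Positive Feedback Count")
TEXT_COLUMNS = ("Title", "Review Text")


def parse_int(value: Optional[str]) -> Optional[int]:
    """Convert a CSV cell to int; an empty or absent cell becomes None."""
    value = (value or "").strip()
    return int(value) if value else None


def load_reviews(handle: TextIO) -> list:
    """Hypothetical helper: read review rows and convert the typed columns."""
    rows = []
    for raw in csv.DictReader(handle):
        row = dict(raw)
        for column in INT_COLUMNS:
            if column in row:
                row[column] = parse_int(row[column])
        for column in TEXT_COLUMNS:
            # Titles and review bodies can be blank; store them as None.
            if column in row and not (row[column] or "").strip():
                row[column] = None
        rows.append(row)
    return rows


# Invented example row, for illustration only.
sample = io.StringIO(
    "Clothing ID,Age,Title,Review Text,Rating,Recommended IND,"
    "Positive Feedback Count,Division Name,Department Name,Class Name\n"
    '1001,30,,"Fits well, feels soft.",5,1,2,General,Tops,Knits\n'
)
reviews = load_reviews(sample)
print(reviews[0]["Rating"], reviews[0]["Title"])  # 5 None
```

    Converting empty cells to None keeps missing titles and ages distinguishable from real values before any feature engineering.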

    Acknowledgements

    Anonymous but real source

    Inspiration

    I look forward to some quality NLP! There are also some great opportunities for feature engineering and multivariate analysis.

    Publications

    Statistical Analysis on E-Commerce Reviews, with Sentiment Classification using Bidirectional Recurrent Neural Network
    by Abien Fred Agarap - Github

  2. MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of...

    • live.european-language-grid.eu
    binary format
    Updated Feb 28, 2021
    Cite
    (2021). MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of Consumer Reviews [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/8680
    Available download formats: binary format
    Dataset updated
    Feb 28, 2021
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    MultiEmo is a new benchmark dataset for the multilingual sentiment analysis task, covering 11 languages. The collection contains consumer reviews from four domains: medicine, hotels, products and university. The original Polish reviews comprised 8,216 documents consisting of 57,466 sentences. The reviews were manually annotated with sentiment at the level of the whole document and at the level of a sentence (3 annotators per element). We achieved a high Positive Specific Agreement value of 0.91 for texts and 0.88 for sentences. The collection was then translated automatically into English, Chinese, Italian, Japanese, Russian, German, Spanish, French, Dutch and Portuguese. MultiEmo is publicly available under a Creative Commons Attribution 4.0 International Licence.
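    Positive Specific Agreement (PSA) for one label between two annotators is 2a / (2a + b + c), where a counts items both annotators gave that label, and b and c count items only one of them gave it. The sketch below computes it one-vs-rest per label; averaging over annotator pairs for three annotators is an assumption for illustration, not a description of how the MultiEmo figures were obtained. The labels are invented.

```python
from itertools import combinations
from typing import Sequence


def positive_specific_agreement(r1: Sequence[str], r2: Sequence[str], label: str) -> float:
    """PSA = 2a / (2a + b + c) for one label, treated as 'label vs. not label'."""
    if len(r1) != len(r2):
        raise ValueError("annotators must label the same items")
    a = sum(1 for x, y in zip(r1, r2) if x == label and y == label)  # both chose the label
    b = sum(1 for x, y in zip(r1, r2) if x == label and y != label)  # only annotator 1
    c = sum(1 for x, y in zip(r1, r2) if x != label and y == label)  # only annotator 2
    denominator = 2 * a + b + c
    return 2 * a / denominator if denominator else float("nan")


def mean_pairwise_psa(annotations: Sequence[Sequence[str]], label: str) -> float:
    """Assumption: average PSA over all annotator pairs (3 annotators -> 3 pairs)."""
    scores = [positive_specific_agreement(p, q, label) for p, q in combinations(annotations, 2)]
    if not scores:
        raise ValueError("need at least two annotators")
    return sum(scores) / len(scores)


# Invented labels for four items, for illustration only.
ann1 = ["pos", "pos", "neg", "neu"]
ann2 = ["pos", "neg", "neg", "neu"]
print(round(positive_specific_agreement(ann1, ann2, "pos"), 3))  # 0.667
```

    Unlike raw percent agreement, PSA ignores items where neither annotator chose the label, so it is not inflated by frequent "other" decisions.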

    More information: https://github.com/CLARIN-PL/multiemo

  3. Amazon Reviews for Sentiment Analysis

    • kaggle.com
    zip
    Updated Nov 18, 2019
    Cite
    Adam Bittlingmayer (2019). Amazon Reviews for Sentiment Analysis [Dataset]. https://www.kaggle.com/bittlingmayer/amazonreviews
    Available download formats: zip (517080965 bytes)
    Dataset updated
    Nov 18, 2019
    Authors
    Adam Bittlingmayer
    Description

    This dataset consists of a few million Amazon customer reviews (input text) and star ratings (output labels) for learning how to train fastText for sentiment analysis.

    The idea here is that the dataset is more than a toy: it is real business data on a reasonable scale, yet a model can be trained on it in minutes on a modest laptop.

    Content

    The fastText supervised learning tutorial requires data in the following format:

    __label__<X> __label__<Y> ... <Text>
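    In fastText's supervised format each example sits on one line: one or more labels, each carrying the __label__ prefix, followed by the text. A minimal sketch that splits such a line and builds one back; the example lines are invented, and the concrete label values used in this dataset are not given here.

```python
from typing import List, Tuple

LABEL_PREFIX = "__label__"


def parse_fasttext_line(line: str) -> Tuple[List[str], str]:
    """Split a fastText-formatted line into its labels and the remaining text."""
    tokens = line.strip().split(" ")
    labels = []
    # Labels are the leading tokens that carry the __label__ prefix.
    while tokens and tokens[0].startswith(LABEL_PREFIX):
        labels.append(tokens.pop(0)[len(LABEL_PREFIX):])
    return labels, " ".join(tokens)


def to_fasttext_line(labels: List[str], text: str) -> str:
    """Inverse operation: prefix each label and append the text on a single line."""
    flat_text = " ".join(text.split())  # newlines would split one example into two
    return " ".join(LABEL_PREFIX + label for label in labels) + " " + flat_text


line = "__label__2 Great sound, would buy again."
print(parse_fasttext_line(line))  # (['2'], 'Great sound, would buy again.')
print(to_fasttext_line(["1"], "Broke after\na week."))  # __label__1 Broke after a week.
```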
    
  4. Dataset for sentiment analysis in Spanish

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 9, 2022
    Cite
    Fran Ramírez (2022). Dataset for sentiment analysis in Spanish [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6425686
    Dataset updated
    Apr 9, 2022
    Dataset authored and provided by
    Fran Ramírez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was automatically generated by web scraping sites such as Tripadvisor and Google Maps reviews. On these sites, users post comments together with ratings, which provides labeled data. The code that generated this dataset can be found at the following URL:

    https://github.com/fjramirezv/sentiment-webscraping

  5. Dataset - How do you propose your code changes? Empirical Analysis of Affect...

    • zenodo.org
    • data.niaid.nih.gov
    txt, zip
    Updated May 13, 2020
    Cite
    Marco Ortu; Giuseppe Destefanis; Daniel Graziotin; Michele Marchesi; Marco Tonelli (2020). Dataset - How do you propose your code changes? Empirical Analysis of Affect Metrics of Pull Requests on GitHub [Dataset]. http://doi.org/10.5281/zenodo.3825044
    Available download formats: txt, zip
    Dataset updated
    May 13, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marco Ortu; Giuseppe Destefanis; Daniel Graziotin; Michele Marchesi; Marco Tonelli
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This package contains the raw open data for the study:

    Marco Ortu, Giuseppe Destefanis, Daniel Graziotin, Michele Marchesi, Roberto Tonelli. 2020. How do you propose your code changes? Empirical Analysis of Affect Metrics of Pull Requests on GitHub. Under Review.

    The dataset is based on GHTorrent dataset:

    Georgios Gousios. 2013. The GHTorrent dataset and tool suite. In Proceedings of the 10th Working Conference on Mining Software Repositories (MSR ’13). IEEE Press, 233–236.

    And released with the same license (CC BY-SA 4.0).

  6. Metacritic's Best Games and Reviews - 2025

    • kaggle.com
    Updated Mar 16, 2025
    Cite
    Davut Bayık (2025). Metacritic's Best Games and Reviews - 2025 [Dataset]. https://www.kaggle.com/datasets/davutb/metacritic-games
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 16, 2025
    Dataset provided by
    Kaggle
    Authors
    Davut Bayık
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    *Also find Metacritic Movies and Metacritic TV Shows datasets.*

    Metacritic Games and Reviews Dataset

    This dataset contains a collection of video games and their corresponding reviews from Metacritic, a popular aggregate review site. The data provides insights into various video games across different platforms, including PC, PlayStation, Xbox, and others. Each game entry includes critical reviews, user reviews, ratings, and other relevant information that can be used for analysis, natural language processing, machine learning, and predictive modeling.

    Important Note: *The games in this collection are selected from Metacritic's Best Games of All Time list, which only includes titles that have received at least 7 reviews, ensuring a minimum level of critical and user input.*

    Up-to-dateness: *This dataset is accurate as of March 14, 2025, and includes the most current rankings and game details available at that time.*

    Content

    The dataset contains general information and scores of 13K+ games and their corresponding 1.6M+ user/critic reviews collected by sending automated requests to Metacritic's public backend API using Python's requests and pandas libraries.

    Potential Uses

    • Sentiment analysis and natural language processing on reviews.
    • Predictive modeling for predicting user ratings based on features like genre, publisher, and developer.
    • Data analysis on trends in game quality, genres, or platform performance.
    • Comparing critical reviews and user reviews to understand the divergence in ratings.

    This dataset is perfect for researchers, game enthusiasts, and data scientists who are interested in exploring the gaming industry through data analysis.
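    For the critic-versus-user comparison listed above, note that Metacritic shows critic Metascores on a 0–100 scale and user scores on a 0–10 scale, so user scores need rescaling before they are compared. A minimal sketch; the field names metascore and user_score and the records are hypothetical, since this description does not list the dataset's columns.

```python
from statistics import mean
from typing import Dict, List, Optional


def score_gap(metascore: Optional[float], user_score: Optional[float]) -> Optional[float]:
    """Critic minus user score, with the 0-10 user score rescaled to 0-100."""
    if metascore is None or user_score is None:
        return None  # games lacking either score are skipped
    return metascore - user_score * 10


def average_gap(games: List[Dict]) -> Optional[float]:
    """Mean critic/user gap over games that have both scores (hypothetical field names)."""
    gaps = [score_gap(g.get("metascore"), g.get("user_score")) for g in games]
    gaps = [g for g in gaps if g is not None]
    return mean(gaps) if gaps else None


# Invented records, for illustration only.
games = [
    {"title": "Game A", "metascore": 90, "user_score": 7.5},
    {"title": "Game B", "metascore": 80, "user_score": 8.5},
    {"title": "Game C", "metascore": 85, "user_score": None},
]
print(average_gap(games))  # 5.0
```

    A positive average gap means critics rated the games higher than users did.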

    Acknowledgements

  7. ro_sent

    • huggingface.co
    Updated Mar 28, 2025
    Cite
    Dumitrescu Stefan (2025). ro_sent [Dataset]. https://huggingface.co/datasets/dumitrescustefan/ro_sent
    Dataset updated
    Mar 28, 2025
    Authors
    Dumitrescu Stefan
    License

    Unknown: https://choosealicense.com/licenses/unknown/

    Description

    This dataset is a Romanian Sentiment Analysis dataset. It is provided in a processed form, as used by the authors of Romanian Transformers in their examples, and is based on the original data in https://github.com/katakonst/sentiment-analysis-tensorflow. The original dataset was collected from product and movie reviews in Romanian.

  8. ‘Animal Crossing Reviews’ analyzed by Analyst-2

    • analyst-2.ai
    Updated May 4, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Animal Crossing Reviews’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-animal-crossing-reviews-f013/latest
    Dataset updated
    May 4, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Animal Crossing Reviews’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jessemostipak/animal-crossing on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context and Content

    The data this week comes from the VillagerDB and Metacritic. VillagerDB brings info about villagers, items, crafting, accessories, including links to their images. Metacritic brings user and critic reviews of the game (scores and raw text).

    Per Wikipedia:

    Animal Crossing: New Horizons is a 2020 life simulation video game developed and published by Nintendo for the Nintendo Switch. It is the fifth main series title in the Animal Crossing series. New Horizons was released in all regions on March 20, 2020.

    New Horizons sees the player assuming the role of a customizable character who moves to a deserted island after purchasing a package from Tom Nook, a tanuki character who has appeared in every entry in the Animal Crossing series. Taking place in real-time, the player can explore the island in a nonlinear fashion, gathering and crafting items, catching insects and fish, and developing the island into a community of anthropomorphic animals.

    Animal Crossing as explained by a Polygon opinion piece.

    With just a few design twists, the work behind collecting hundreds or even thousands of items over weeks and months becomes an exercise of mindfulness, predictability, and agency that many players find soothing instead of annoying.

    Games that feature gentle progression give us a sense of progress and achievability, teaching us that putting in a little work consistently while taking things one step at a time can give us some fantastic results. It’s a good life lesson, as well as a way to calm yourself and others, and it’s all achieved through game design.

    Some potential context for user_reviews.tsv from 538 and a point of potential strife via Animal Crossing World, and lastly a spoiler article analyzing the reviews in R by Boon Tan.

    PS there is an easter egg somewhere in the readme - something to do with... turnips.

    Acknowledgements

    The data was downloaded and cleaned by Thomas Mock for #TidyTuesday during the week of May 4th, 2020. You can see the code used to clean the data in the #TidyTuesday GitHub repository.

    Inspiration

    Potential Analyses:

    • Reviews: Sentiment analysis, text analysis, scores, date effect
    • Villagers/Items: Gender, species, sayings, personality, price, recipe, what about a star sign based off the birthday column?

    --- Original source retains full ownership of the source dataset ---

  9. Replication Data for: A Review of Best Practice Recommendations for...

    • datasearch.gesis.org
    • dataverse-staging.rdmc.unc.edu
    Updated Jan 22, 2020
    Cite
    Wesslen, Ryan (2020). Replication Data for: A Review of Best Practice Recommendations for Text-Analysis in R (and a User Friendly App) [Dataset]. http://doi.org/10.15139/S3/R4W7ZS
    Dataset updated
    Jan 22, 2020
    Dataset provided by
    Odum Institute Dataverse Network
    Authors
    Wesslen, Ryan
    Description

    Replication materials for "A Review of Best Practice Recommendations for Text-Analysis in R (and a User Friendly App)". These materials are also available in a GitHub repo (https://github.com/wesslen/text-analysis-org-science), and the Shiny app is in a separate GitHub repo (https://github.com/wesslen/topicApp).

  10. Arabic 100k Reviews

    • kaggle.com
    Updated Mar 7, 2020
    Cite
    Abed Khooli (2020). Arabic 100k Reviews [Dataset]. https://www.kaggle.com/abedkhooli/arabic-100k-reviews/kernels
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 7, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abed Khooli
    Description

    Context

    Few Arabic datasets are available for classification comparison and other NLP tasks. This dataset is mainly a compilation of several available datasets and a sampling of 100k rows (99999 to be exact).

    Content

    The dataset combines reviews of hotels, books, movies, products and a few airlines. It has three classes (Mixed, Negative and Positive). Most labels were mapped from reviewers' ratings, with 3 being mixed, above 3 positive and below 3 negative. Each row has a label and text separated by a tab (TSV). The review texts were cleaned by removing Arabic diacritics and non-Arabic characters. The dataset has no duplicate reviews.
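    The rating-to-class mapping and the cleaning steps described above can be sketched as follows. The helpers are hypothetical, not the author's code, and two points are assumptions: that "Arabic diacritics" means the tashkeel marks U+064B to U+0652 (tanwin, harakat, shadda, sukun), and that "non-Arabic characters" means anything outside the Arabic Unicode block U+0600 to U+06FF other than whitespace.

```python
import re

# Assumption: diacritics = tashkeel marks U+064B..U+0652.
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Assumption: anything outside the Arabic block (other than whitespace) is non-Arabic.
NON_ARABIC = re.compile(r"[^\u0600-\u06FF\s]")


def rating_to_label(rating: int) -> str:
    """Map a reviewer rating to a class: 3 -> Mixed, above 3 -> Positive, below 3 -> Negative."""
    if rating == 3:
        return "Mixed"
    return "Positive" if rating > 3 else "Negative"


def clean_review(text: str) -> str:
    """Strip diacritics, replace non-Arabic characters with spaces, collapse whitespace."""
    text = DIACRITICS.sub("", text)
    text = NON_ARABIC.sub(" ", text)
    return " ".join(text.split())


def to_tsv_row(rating: int, text: str) -> str:
    """One row in the described layout: label, a tab, then the cleaned text."""
    return f"{rating_to_label(rating)}\t{clean_review(text)}"


print([rating_to_label(r) for r in (1, 3, 5)])  # ['Negative', 'Mixed', 'Positive']
```

    Diacritics are removed before the non-Arabic filter because they fall inside the Arabic block and would otherwise survive it.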

    Acknowledgements

    The hotel and book reviews are a subset of HARD and BRAD. The rest were selected from hadyelsahar, with a little over 100 airline reviews collected manually.

    Inspiration

    Let's jump in and use your best tools to beat the SOTA! Don't forget to show and share your work.

  11. NoReC: The Norwegian Review Corpus - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Aug 11, 2023
    Cite
    (2023). NoReC: The Norwegian Review Corpus - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/7cadff3f-fa71-595f-a700-191f22351103
    Dataset updated
    Aug 11, 2023
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    While the NoReC dataset was primarily created for training and evaluating models for document-level sentiment analysis, many other use cases are of course possible. The corpus comprises more than 35,000 full-text reviews extracted from eight different major Norwegian news sources: Dagbladet, VG, Aftenposten, Bergens Tidende, Fædrelandsvennen, Stavanger Aftenblad, DinSide.no and P3.no. The reviews cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each review is labeled with a manually assigned score of 1–6, as provided by the rating of the original author. The texts have been pre-processed using UDPipe and are distributed in the CoNLL-U format. However, we also provide HTML files with the raw texts. Documentation and an accompanying Python package are provided through the following git repository: https://github.com/ltgoslo/norec
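    CoNLL-U stores one token per line with ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), "#" comment lines before each sentence, and a blank line after each sentence. A minimal standard-library reader is sketched below; read_conllu is a hypothetical helper, the accompanying Python package mentioned above may be the better choice in practice, and the Norwegian example sentence is invented.

```python
from typing import Dict, Iterable, Iterator, List

FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield sentences as {'comments': [...], 'tokens': [...]} from CoNLL-U lines."""
    comments: List[str] = []
    tokens: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if tokens:
                yield {"comments": comments, "tokens": tokens}
            comments, tokens = [], []
        elif line.startswith("#"):
            comments.append(line[1:].strip())
        else:
            values = line.split("\t")
            if len(values) != len(FIELDS):
                raise ValueError(f"expected 10 fields, got {len(values)}: {line!r}")
            token = dict(zip(FIELDS, values))
            # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
            if "-" in token["id"] or "." in token["id"]:
                continue
            tokens.append(token)
    if tokens:  # the file may not end with a blank line
        yield {"comments": comments, "tokens": tokens}


# Invented example sentence ("The film is good."), for illustration only.
sample = (
    "# sent_id = 1\n"
    "# text = Filmen er god.\n"
    "1\tFilmen\tfilm\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\ter\tvære\tAUX\t_\t_\t3\tcop\t_\t_\n"
    "3\tgod\tgod\tADJ\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)
sentences = list(read_conllu(sample.splitlines(keepends=True)))
print([t["form"] for t in sentences[0]["tokens"]])  # ['Filmen', 'er', 'god', '.']
```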

  12. Dark Side of the Moon Reviews Dataset

    • kaggle.com
    Updated Oct 21, 2021
    Cite
    Michael Bryant (2021). Dark Side of the Moon Reviews Dataset [Dataset]. https://www.kaggle.com/michaelbryantds/reviews-of-pink-floyds-the-dark-side-of-the-moon/code
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 21, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Michael Bryant
    Description

    Context

    This dataset contains the reviews and ratings of Pink Floyd's The Dark Side of the Moon from users of rateyourmusic.com.

    Content

    The dataset was acquired by scraping on 15 October 2021. It contains 1544 reviews and ratings (if the user rated the album).

    The scraper can be found at this GitHub Repo.

    Acknowledgements

    The reviews can be found here.

    Inspiration

    This dataset can be used to practice data cleaning, performing exploratory data analyses, and using sentiment analysis.

  13. Dataset: Towards Trustworthy Sentiment Analysis in Software Engineering:...

    • figshare.com
    xlsx
    Updated Jul 2, 2025
    Cite
    Martin Obaidi; Marc Herrmann; Jil Klünder; Kurt Schneider (2025). Dataset: Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection [Dataset]. http://doi.org/10.6084/m9.figshare.29250935.v1
    Available download formats: xlsx
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    figshare
    Authors
    Martin Obaidi; Marc Herrmann; Jil Klünder; Kurt Schneider
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset: Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection

    Authors: Martin Obaidi, Marc Herrmann, Jil Klünder, Kurt Schneider

    Description

    This dataset accompanies the publication "Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection". The dataset contains all coded data and annotation results from a comprehensive analysis of sentiment and linguistic characteristics in software engineering (SE) communication. The study benchmarks 14 sentiment analysis tools across 10 datasets from five major SE platforms and investigates how dataset characteristics impact tool performance and selection. The coded data underpins the development of a practical questionnaire-based recommendation approach for trustworthy and context-sensitive sentiment analysis in SE.

    Contents

    The dataset includes the following file:

    • All_Sample_Sets_Coded-v04.xlsx: manually coded sample sets from five platforms (App Reviews, Code Reviews, GitHub, Jira, Stack Overflow). Each worksheet corresponds to one platform and provides:
      • the raw text of the communication sample (“Text”);
      • gold-standard sentiment labels (“oracle”): -1 = Negative, 0 = Neutral, 1 = Positive;
      • annotations for 13 linguistic characteristics. For each characteristic, x = present, n = not present, and an empty cell = not applicable for this item (e.g., if a characteristic is only relevant for positive statements).
    This enables detailed cross-platform analysis of both sentiment polarity and linguistic features in developer communication.

    Column details:

    • Text: communication/document text.
    • oracle: gold-standard sentiment label.
    • Characteristic 1–13: see the accompanying paper for definitions. Annotation can be x, n, or empty (not applicable).

    If you use this dataset, please cite:

    Obaidi, M., Herrmann, M., Klünder, J., Schneider, K. (2025). Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection. In: 2025 IEEE 33rd International Requirements Engineering Conference Workshops (REW).

    License

    This dataset is provided under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

    Contact

    For questions regarding the dataset, please contact the corresponding author as listed in the publication.
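    The coding scheme described above (oracle labels -1, 0, 1; x, n or an empty cell for each characteristic) can be decoded with a small helper. This sketch works on one worksheet row represented as a dictionary; the header spellings "Text", "oracle" and "Characteristic 1" to "Characteristic 13" are assumed from the column details, decode_row is hypothetical, reading the .xlsx file itself is left out, and the example row is invented.

```python
from typing import Any, Dict, Optional

ORACLE_LABELS = {-1: "Negative", 0: "Neutral", 1: "Positive"}
# x = present, n = not present, empty cell = not applicable (None).
ANNOTATION_CODES = {"x": True, "n": False, "": None}


def decode_annotation(cell: Optional[str]) -> Optional[bool]:
    """Map an annotation cell to True / False / None (not applicable)."""
    key = (cell or "").strip().lower()
    if key not in ANNOTATION_CODES:
        raise ValueError(f"unexpected annotation value: {cell!r}")
    return ANNOTATION_CODES[key]


def decode_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Decode one coded sample: text, sentiment label and the 13 characteristics."""
    characteristics = {
        i: decode_annotation(row.get(f"Characteristic {i}")) for i in range(1, 14)
    }
    return {
        "text": row["Text"],
        "sentiment": ORACLE_LABELS[int(row["oracle"])],
        "characteristics": characteristics,
    }


# Invented example row, for illustration only.
decoded = decode_row({"Text": "Thanks, this fix works!", "oracle": 1,
                      "Characteristic 1": "x", "Characteristic 2": "n"})
print(decoded["sentiment"], decoded["characteristics"][1], decoded["characteristics"][3])  # Positive True None
```

    Keeping "not applicable" as None rather than False preserves the distinction the annotators made between absent and irrelevant characteristics.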

  14. MovieReviewSentimentClassification

    • huggingface.co
    Updated May 6, 2025
    Cite
    Massive Text Embedding Benchmark (2025). MovieReviewSentimentClassification [Dataset]. https://huggingface.co/datasets/mteb/MovieReviewSentimentClassification
    Dataset updated
    May 6, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    MovieReviewSentimentClassification: an MTEB (Massive Text Embedding Benchmark) dataset

    The Allociné dataset is a French-language dataset for sentiment analysis that contains movie reviews produced by the online community of the Allociné.fr website.

    Task category: t2c

    Domains: Reviews, Written

    Reference: https://github.com/TheophileBlard/french-sentiment-analysis-with-bert

    How to evaluate on this task

    You can evaluate an embedding model on this dataset using… See the full description on the dataset page: https://huggingface.co/datasets/mteb/MovieReviewSentimentClassification.

  15. IMDB Reviews on Barbie

    • kaggle.com
    Updated Aug 4, 2023
    Cite
    Ibrahim (2023). IMDB Reviews on Barbie [Dataset]. https://www.kaggle.com/datasets/ibrahimonmars/imdb-reviews-on-barbie/suggestions?status=pending&yourSuggestions=true
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 4, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ibrahim
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset was scraped from IMDb using a Python module called Scrapset. Here is the link to the documentation: https://github.com/ibrahim-string/Scrapset

    There are two CSV files in this dataset: one is raw and the other is cleaned. You can explore the raw data and do some kind of sentiment analysis after cleaning it "YOUR WAY", or you can use the cleaned CSV file.

  16. ASC (TIL, 19 tasks) (Task Incremental Aspect Sentiment Classification)

    • opendatalab.com
    zip
    Updated Jul 1, 2024
    Cite
    University of Illinois at Chicago (2024). ASC (TIL, 19 tasks) (Task Incremental Aspect Sentiment Classification) [Dataset]. https://opendatalab.com/OpenDataLab/ASC
    Available download formats: zip
    Dataset updated
    Jul 1, 2024
    Dataset provided by
    University of Illinois at Chicago
    Facebook AI Research
    Description

    A set of 19 ASC datasets (reviews of 19 products) producing a sequence of 19 tasks. Each dataset represents a task. The datasets are from 4 sources: (1) HL5Domains (Hu and Liu, 2004) with reviews of 5 products; (2) Liu3Domains (Liu et al., 2015) with reviews of 3 products; (3) Ding9Domains (Ding et al., 2008) with reviews of 9 products; and (4) SemEval14 with reviews of 2 products (SemEval 2014 Task 4, laptop and restaurant). For (1), (2) and (3), we split off about 10% of the original data as validation data and another 10% as test data. For (4), we use 150 examples from the training set for validation. To be consistent with existing research (Tang et al., 2016), examples with conflicting polarity (both positive and negative sentiments are expressed about an aspect term) are not used. Statistics and details of the 19 datasets are given at https://github.com/ZixuanKe/PyContinual.
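    The conflict filtering and the roughly 80/10/10 split described for sources (1) to (3) can be sketched as follows. This is an illustration, not the authors' procedure: it assumes each example is a record with a hypothetical "polarity" field whose value "conflict" marks the excluded cases, and it draws the held-out sets with a seeded random shuffle, since how the original splits were drawn is not specified here. The examples are invented.

```python
import random
from typing import Dict, List, Tuple

Example = Dict[str, str]


def drop_conflicts(examples: List[Example]) -> List[Example]:
    """Remove aspects with conflicting polarity (hypothetical 'polarity' field)."""
    return [ex for ex in examples if ex["polarity"] != "conflict"]


def split_80_10_10(
    examples: List[Example], seed: int = 0
) -> Tuple[List[Example], List[Example], List[Example]]:
    """Shuffle with a fixed seed, then hold out about 10% for validation and 10% for test."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_hold = round(0.1 * len(shuffled))
    validation = shuffled[:n_hold]
    test = shuffled[n_hold:2 * n_hold]
    train = shuffled[2 * n_hold:]
    return train, validation, test


# Invented examples: every tenth one has conflicting polarity.
examples = [
    {"aspect": f"aspect{i}", "polarity": "conflict" if i % 10 == 0 else "positive"}
    for i in range(100)
]
train, validation, test = split_80_10_10(drop_conflicts(examples))
print(len(train), len(validation), len(test))  # 72 9 9
```

    Filtering before splitting keeps the held-out fractions relative to the examples that are actually used.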
