10 datasets found
  1. Webis Cross-Lingual Sentiment Dataset 2010 (Webis-CLS-10)

    • zenodo.org
    gz, txt
    Updated Jul 16, 2010
  2. Webis-CLS-10

    • webis.de
    3251672
    Updated 2010
  3. ITOP Dataset

    • zenodo.org
    gz, jpg
    Updated Oct 8, 2016
  4. Sentiment Analysis of movie review

    • www.kaggle.com
    zip
    Updated Nov 8, 2020
  5. Women's E-Commerce Clothing Reviews

    • www.kaggle.com
    zip
    Updated Feb 3, 2018
  6. COVID-19 Open Research Dataset Challenge (CORD-19)

    • www.kaggle.com
    zip
    Updated Apr 25, 2022
  7. Website Phishing Dataset

    • www.kaggle.com
    • data.world
    zip
    Updated May 4, 2019
  8. Predict Click through rate (CTR) for a website

    • www.kaggle.com
    zip
    Updated Jun 20, 2019
  9. Rotten Tomatoes movies and critic reviews dataset

    • www.kaggle.com
    zip
    Updated Nov 4, 2020
  10. URL Classification Dataset [DMOZ]

    • www.kaggle.com
    zip
    Updated Aug 19, 2018
Cite
Prettenhofer, Peter; Stein, Benno (2010). Webis Cross-Lingual Sentiment Dataset 2010 (Webis-CLS-10) [Dataset]. http://doi.org/10.5281/zenodo.3251672

Webis Cross-Lingual Sentiment Dataset 2010 (Webis-CLS-10)

4 scholarly articles cite this dataset (View in Google Scholar)
Available download formats
txt, gz
Dataset updated
Jul 16, 2010
Dataset provided by
Bauhaus-Universität Weimar (http://www.uni-weimar.de/)
Authors
Prettenhofer, Peter; Stein, Benno
License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The Cross-Lingual Sentiment (CLS) dataset comprises about 800,000 Amazon product reviews in four languages: English, German, French, and Japanese.

For more information on the construction of the dataset see (Prettenhofer and Stein, 2010) or the enclosed readme files. If you have a question after reading the paper and the readme files, please contact Peter Prettenhofer.

We provide the dataset in two formats: 1) a processed format, which matches the preprocessing (tokenization, etc.) used in (Prettenhofer and Stein, 2010); 2) an unprocessed format, which contains the full text of the reviews (e.g., for machine translation or feature engineering).

The dataset was first used by (Prettenhofer and Stein, 2010). It consists of Amazon product reviews for three product categories (books, DVDs, and music) written in four languages: English, German, French, and Japanese. The German, French, and Japanese reviews were crawled from Amazon in November 2009. The English reviews were sampled from the Multi-Domain Sentiment Dataset (Blitzer et al., 2007). For each language-category pair there are three sets: training documents, test documents, and unlabeled documents. The training and test sets comprise 2,000 documents each, whereas the number of unlabeled documents varies from 9,000 to 170,000.
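
The layout described above can be sketched in a few lines of Python, mainly as a check on the size of the labeled portion. This is a minimal sketch and not part of the dataset's own tooling: the constant and function names are hypothetical, and the language and category strings are plain labels taken from the description, not the directory or file names used in the download (see the enclosed readme files for those).

```python
from itertools import product

# Labels taken from the description; they are NOT assumed to match
# the directory or file names inside the downloaded archives.
LANGUAGES = ["English", "German", "French", "Japanese"]
CATEGORIES = ["books", "DVDs", "music"]
SPLITS = ["training", "test", "unlabeled"]

# The description states 2,000 documents for each training and test set.
DOCS_PER_LABELED_SPLIT = 2000


def language_category_pairs():
    """Return every (language, category) combination in the dataset."""
    return list(product(LANGUAGES, CATEGORIES))


def labeled_document_count():
    """Total training plus test documents across all pairs."""
    labeled_splits = 2  # training and test; the unlabeled set varies in size
    return len(language_category_pairs()) * labeled_splits * DOCS_PER_LABELED_SPLIT


if __name__ == "__main__":
    pairs = language_category_pairs()
    print(f"{len(pairs)} language-category pairs")
    print(f"{labeled_document_count()} labeled documents (training + test)")
```

Under these figures there are 12 language-category pairs and 48,000 labeled documents, so most of the roughly 800,000 reviews belong to the unlabeled sets.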
