11 datasets found
  1. sst2

    • huggingface.co
    Updated Mar 26, 2024
    Cite
    Stanford NLP (2024). sst2 [Dataset]. https://huggingface.co/datasets/stanfordnlp/sst2
    Dataset updated
    Mar 26, 2024
    Dataset authored and provided by
    Stanford NLP
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for [Dataset Name]

      Dataset Summary
    

    The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/sst2.

  2. Binary Stanford Sentiment Treebank 2 (SST-2)

    • zenodo.org
    • explore.openaire.eu
    bin, txt, zip
    Updated Jan 21, 2023
    Cite
    N/a; N/a (2023). Binary Stanford Sentiment Treebank 2 (SST-2) [Dataset]. http://doi.org/10.5281/zenodo.7555310
    Available download formats: txt, bin, zip
    Dataset updated
    Jan 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    N/a; N/a
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Binary Stanford Sentiment Treebank (SST2) is a binary version of the SST and Movie Review dataset: the neutral class was removed, so the data is classified only into positive and negative classes.

    The files:
    texts.txt: Document set (text). One per line.
    score.txt: Document class whose index is associated with texts.txt
    split_
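
A minimal sketch of reading the two line-aligned files described above. It assumes score.txt holds one class value per line, in the same order as texts.txt; the listing does not show how the classes are encoded, so they are kept as raw strings. The function name load_pairs is hypothetical.

```python
from pathlib import Path


def load_pairs(texts_path, scores_path):
    """Pair each document with its class value by line index.

    Assumption: both files have one entry per line and the same line count.
    """
    texts = Path(texts_path).read_text(encoding="utf-8").splitlines()
    scores = Path(scores_path).read_text(encoding="utf-8").splitlines()
    if len(texts) != len(scores):
        raise ValueError(
            f"line count mismatch: {len(texts)} texts vs {len(scores)} scores"
        )
    # Keep class values as strings because their encoding is not documented here.
    return list(zip(texts, scores))
```

Raising on a line-count mismatch is a deliberate choice: a silent truncation would shift every later label onto the wrong document.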

  3. sst

    • huggingface.co
    Updated May 23, 2024
    Cite
    Stanford NLP (2024). sst [Dataset]. https://huggingface.co/datasets/stanfordnlp/sst
    Dataset updated
    May 23, 2024
    Dataset authored and provided by
    Stanford NLP
    License

    https://choosealicense.com/licenses/unknown/

    Description

    The Stanford Sentiment Treebank is the first corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language.

  4. SST-2 Dataset

    • library.toponeai.link
    Updated May 16, 2021
    Cite
    Richard Socher; Alex Perelygin; Jean Wu; Jason Chuang; Christopher D. Manning; Andrew Ng; Christopher Potts (2021). SST-2 Dataset [Dataset]. https://library.toponeai.link/dataset/sst-2
    Dataset updated
    May 16, 2021
    Authors
    Richard Socher; Alex Perelygin; Jean Wu; Jason Chuang; Christopher D. Manning; Andrew Ng; Christopher Potts
    Description

    The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.

    Binary classification experiments on full sentences (negative or somewhat negative vs somewhat positive or positive with neutral sentences discarded) refer to the dataset as SST-2 or SST binary.
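
The binary setup described above can be sketched as a label mapping. This is an illustration only: it assumes the five sentiment classes are encoded as integers 0 to 4 (0 = negative, 1 = somewhat negative, 2 = neutral, 3 = somewhat positive, 4 = positive), which the listing does not state, and the function names are hypothetical.

```python
def to_binary(label):
    """Map an assumed 0-4 fine-grained label to 0 (negative) or 1 (positive).

    Returns None for the neutral class (2), which the binary task discards.
    """
    if label not in range(5):
        raise ValueError(f"expected a label in 0..4, got {label!r}")
    if label == 2:
        return None
    return 0 if label < 2 else 1


def binarize(examples):
    """Convert (sentence, fine_label) pairs to binary pairs, dropping neutral ones."""
    result = []
    for sentence, label in examples:
        binary = to_binary(label)
        if binary is not None:
            result.append((sentence, binary))
    return result
```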

  5. sst2-es-mt

    • huggingface.co
    Updated Mar 28, 2025
    Cite
    Manuel Romero (2025). sst2-es-mt [Dataset]. https://huggingface.co/datasets/mrm8488/sst2-es-mt
    Dataset updated
    Mar 28, 2025
    Authors
    Manuel Romero
    License

    https://choosealicense.com/licenses/unknown/

    Description

    SST-2 Spanish

    A Spanish translation (using EasyNMT) of the SST-2 dataset.

    For more information, check the official model card.
    
  6. Rendered SST-2

    • opendatalab.com
    zip
    Updated Apr 17, 2023
    + more versions
    Cite
    Stanford University (2023). Rendered SST-2 [Dataset]. https://opendatalab.com/OpenDataLab/Rendered_SST-2
    Available download formats: zip
    Dataset updated
    Apr 17, 2023
    Dataset provided by
    Stanford University
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The Rendered SST2 dataset, released by OpenAI, measures the optical character recognition capability of visual representations. It takes sentences from the Stanford Sentiment Treebank dataset and renders them as images, with black text on a white background, at 448×448 resolution.

  7. sst2_llama1b_modified

    • huggingface.co
    Updated Mar 7, 2025
    Cite
    emirhan boge (2025). sst2_llama1b_modified [Dataset]. https://huggingface.co/datasets/emirhanboge/sst2_llama1b_modified
    Dataset updated
    Mar 7, 2025
    Authors
    emirhan boge
    Description

    SST-2 (Modified for LLaMA 1B)

    This dataset is a modified version of the Stanford Sentiment Treebank 2 (SST-2), a binary classification dataset for sentiment analysis.

      Modifications:
    

    Labels were originally "negative" and "positive", now converted to integers (0 for negative, 1 for positive).
    Each example includes a task prefix: Task: SST2 | Sentence: ...
    The dataset has been tokenized using the LLaMA-1B tokenizer.
    Maximum sequence length is 128 tokens.
    … See the full description on the dataset page: https://huggingface.co/datasets/emirhanboge/sst2_llama1b_modified.
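
A rough sketch of the first two modifications listed above: the string-to-integer label conversion and the task prefix. It is not the dataset author's code. The output keys "text" and "label" and the function name format_example are assumptions, and the tokenization and 128-token limit are left out because they depend on the LLaMA-1B tokenizer.

```python
# Label mapping as stated in the dataset description.
LABEL_TO_ID = {"negative": 0, "positive": 1}


def format_example(sentence, label):
    """Add the task prefix and convert a string label to its integer id.

    The "text" and "label" keys are hypothetical field names for illustration.
    """
    if label not in LABEL_TO_ID:
        raise ValueError(f"unknown label: {label!r}")
    return {
        "text": f"Task: SST2 | Sentence: {sentence}",
        "label": LABEL_TO_ID[label],
    }
```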

  8. sst2-textbugger

    • huggingface.co
    Updated Oct 12, 2023
    + more versions
    Cite
    Yang Wang (2023). sst2-textbugger [Dataset]. https://huggingface.co/datasets/yangwang825/sst2-textbugger
    Dataset updated
    Oct 12, 2023
    Authors
    Yang Wang
    Description

    Stanford Sentiment Treebank - Binary

  9. sst2-norwegian-bokmaal

    • huggingface.co
    Updated Jan 15, 2024
    Cite
    Kushtrim Visoka (2024). sst2-norwegian-bokmaal [Dataset]. https://huggingface.co/datasets/Kushtrim/sst2-norwegian-bokmaal
    Dataset updated
    Jan 15, 2024
    Authors
    Kushtrim Visoka
    Description

    Norwegian Translated SST-2 Dataset

      Dataset

      Overview
    

    The dataset is a Norwegian machine-translation of the Stanford Sentiment Treebank (SST-2). The original dataset comprises sentences extracted from movie reviews, accompanied by human annotations indicating their sentiment.

      Dataset Structure
    

    The dataset has the following structure: { "idx": int, "sentence": str, "label": int, "sentence_nob": str }

      Data Fields
    

    idx:… See the full description on the dataset page: https://huggingface.co/datasets/Kushtrim/sst2-norwegian-bokmaal.
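
The record structure given above can be written down as a typed dictionary with a simple shape check. This is a sketch under the stated structure only: the class name, the EXPECTED_TYPES table, and is_valid_record are hypothetical, and the meaning of each field is not described here because the listing truncates it.

```python
from typing import TypedDict


class NorwegianSst2Record(TypedDict):
    """Record shape as listed in the dataset description."""
    idx: int
    sentence: str
    label: int
    sentence_nob: str


# Field names and types copied from the listed structure.
EXPECTED_TYPES = {"idx": int, "sentence": str, "label": int, "sentence_nob": str}


def is_valid_record(record):
    """Return True if the record has exactly the listed fields with the listed types."""
    return set(record) == set(EXPECTED_TYPES) and all(
        isinstance(record[name], expected)
        for name, expected in EXPECTED_TYPES.items()
    )
```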

  10. sst2-albanian

    • huggingface.co
    Updated Jan 13, 2024
    Cite
    Kushtrim Visoka (2024). sst2-albanian [Dataset]. https://huggingface.co/datasets/Kushtrim/sst2-albanian
    Dataset updated
    Jan 13, 2024
    Authors
    Kushtrim Visoka
    Description

    Albanian Translated SST-2 Dataset

      Dataset

      Overview
    

    The dataset is an Albanian machine-translation of the Stanford Sentiment Treebank (SST-2). The original dataset comprises sentences extracted from movie reviews, accompanied by human annotations indicating their sentiment.

      Dataset Structure
    

    The dataset has the following structure: { "idx": int, "sentence": str, "label": int, "sentence_alb": str }

      Data Fields
    

    idx:… See the full description on the dataset page: https://huggingface.co/datasets/Kushtrim/sst2-albanian.

  11. cobie_sst2

    • huggingface.co
    Updated Dec 12, 2024
    Cite
    Language Technologies Unit @ Barcelona Supercomputing Center (2024). cobie_sst2 [Dataset]. https://huggingface.co/datasets/BSC-LT/cobie_sst2
    Dataset updated
    Dec 12, 2024
    Dataset authored and provided by
    Language Technologies Unit @ Barcelona Supercomputing Center
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for cobie_sst2

    This dataset is a modification of the original SST-2 dataset for LLM cognitive bias evaluation.

      Language(s)
    

    English (en)

      Dataset Summary
    

    The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie… See the full description on the dataset page: https://huggingface.co/datasets/BSC-LT/cobie_sst2.


Cite
Stanford NLP (2024). sst2 [Dataset]. https://huggingface.co/datasets/stanfordnlp/sst2

sst2

stanfordnlp/sst2

Stanford Sentiment Treebank v2

25 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Mar 26, 2024
Dataset authored and provided by
Stanford NLP
License

https://choosealicense.com/licenses/unknown/

Description

Dataset Card for [Dataset Name]

  Dataset Summary

The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/sst2.
