11 datasets found

sst2
huggingface.co
Updated Mar 26, 2024
Stanford NLP (2024). sst2 [Dataset]. https://huggingface.co/datasets/stanfordnlp/sst2
Dataset authored and provided by: Stanford NLP
License: unknown, https://choosealicense.com/licenses/unknown/
Description
Dataset Card for [Dataset Name]

Dataset Summary

The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/sst2.
Binary Stanford Sentiment Treebank 2 (SST-2)
zenodo.org
explore.openaire.eu
bin, txt, zip
Updated Jan 21, 2023
N/a; N/a (2023). Binary Stanford Sentiment Treebank 2 (SST-2) [Dataset]. http://doi.org/10.5281/zenodo.7555310
Unique identifier: https://doi.org/10.5281/zenodo.7555310
Dataset provided by: Zenodo (http://zenodo.org/)
Authors: N/a; N/a
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
Description
Binary Stanford Sentiment Treebank (SST2) is a binary version of the SST and Movie Review datasets with the neutral class removed, so the data is classified only into positive and negative classes.

The files:
texts.txt: document set (text), one document per line.
score.txt: document classes; line i holds the class of line i in texts.txt.
split_
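Because texts.txt and score.txt are line-aligned, they can be read together by pairing line i of one with line i of the other. The Python below is a minimal sketch under stated assumptions: both files are UTF-8 plain text with one entry per line, and each class in score.txt is written as an integer (the listing does not state the label encoding). `load_sst2_pairs` is a hypothetical helper name, and the truncated split_ file is not handled.

```python
import tempfile
from pathlib import Path


def load_sst2_pairs(texts_path, score_path):
    """Hypothetical helper: pair each document with the class on the same line.

    Assumes UTF-8 files, one entry per line, and integer class labels
    (an assumption; the dataset listing does not state the label encoding).
    """
    texts = Path(texts_path).read_text(encoding="utf-8").splitlines()
    scores = Path(score_path).read_text(encoding="utf-8").splitlines()
    if len(texts) != len(scores):
        raise ValueError(
            f"line counts differ: {len(texts)} texts vs {len(scores)} scores"
        )
    # Line i of score.txt holds the class of line i of texts.txt.
    return [(text, int(score)) for text, score in zip(texts, scores)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        texts_file = Path(tmp) / "texts.txt"
        score_file = Path(tmp) / "score.txt"
        # Hypothetical two-line sample, not taken from the real files.
        texts_file.write_text("a gripping film\na tedious mess\n", encoding="utf-8")
        score_file.write_text("1\n0\n", encoding="utf-8")
        print(load_sst2_pairs(texts_file, score_file))
        # [('a gripping film', 1), ('a tedious mess', 0)]
```

Checking the line counts up front surfaces a misaligned download immediately instead of silently truncating to the shorter file, which is what a bare `zip` would do.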
sst
huggingface.co
Updated May 23, 2024
Stanford NLP (2024). sst [Dataset]. https://huggingface.co/datasets/stanfordnlp/sst
Dataset authored and provided by: Stanford NLP
License: unknown, https://choosealicense.com/licenses/unknown/
Description
The Stanford Sentiment Treebank is the first corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language.
SST-2 Dataset
library.toponeai.link
Updated May 16, 2021
Richard Socher; Alex Perelygin; Jean Wu; Jason Chuang; Christopher D. Manning; Andrew Ng; Christopher Potts (2021). SST-2 Dataset [Dataset]. https://library.toponeai.link/dataset/sst-2
Authors: Richard Socher; Alex Perelygin; Jean Wu; Jason Chuang; Christopher D. Manning; Andrew Ng; Christopher Potts
Description
The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.

Binary classification experiments on full sentences (negative or somewhat negative vs. somewhat positive or positive, with neutral sentences discarded) refer to the dataset as SST-2 or SST binary.
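The reduction from the five sentiment classes to SST-2 can be sketched as a label mapping plus a filter. This is an illustration only: it assumes the fine-grained labels are encoded as integers 0 to 4 in the order negative, somewhat negative, neutral, somewhat positive, positive, an encoding not stated in this listing. `to_binary` and `binarize` are hypothetical helper names.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed integer encoding of the five fine-grained classes (illustrative only).
NEGATIVE, SOMEWHAT_NEGATIVE, NEUTRAL, SOMEWHAT_POSITIVE, POSITIVE = range(5)


def to_binary(fine_label: int) -> Optional[int]:
    """Map a five-class label to 0 (negative) or 1 (positive); None for neutral."""
    if fine_label in (NEGATIVE, SOMEWHAT_NEGATIVE):
        return 0
    if fine_label in (SOMEWHAT_POSITIVE, POSITIVE):
        return 1
    if fine_label == NEUTRAL:
        return None  # neutral sentences are discarded
    raise ValueError(f"unexpected label: {fine_label}")


def binarize(examples: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Drop neutral sentences and replace the remaining labels with binary ones."""
    result = []
    for sentence, fine_label in examples:
        binary = to_binary(fine_label)
        if binary is not None:
            result.append((sentence, binary))
    return result


# Hypothetical sample sentences, not taken from the treebank.
sample = [("dull", 0), ("fine, i guess", 2), ("quite good", 3), ("superb", 4)]
print(binarize(sample))  # [('dull', 0), ('quite good', 1), ('superb', 1)]
```

Raising on an unknown label, rather than silently dropping it, keeps an encoding mismatch from quietly shrinking the dataset.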
sst2-es-mt
huggingface.co
Updated Mar 28, 2025
Manuel Romero (2025). sst2-es-mt [Dataset]. https://huggingface.co/datasets/mrm8488/sst2-es-mt
Authors: Manuel Romero
License: unknown, https://choosealicense.com/licenses/unknown/
Description
SST-2 Spanish

A Spanish translation (using EasyNMT) of the SST-2 dataset. For more information, check the official Model Card.
Rendered SST-2
opendatalab.com
zip
Updated Apr 17, 2023
Stanford University (2023). Rendered SST-2 [Dataset]. https://opendatalab.com/OpenDataLab/Rendered_SST-2
Dataset provided by: Stanford University
License: MIT License, https://opensource.org/licenses/MIT (license information was derived automatically)
Description
The Rendered SST2 dataset, released by OpenAI, measures the optical character recognition capability of visual representations. It takes sentences from the Stanford Sentiment Treebank dataset and renders them as 448×448 images of black text on a white background.
sst2_llama1b_modified
huggingface.co
Updated Mar 7, 2025
emirhan boge (2025). sst2_llama1b_modified [Dataset]. https://huggingface.co/datasets/emirhanboge/sst2_llama1b_modified
Authors: emirhan boge
Description
SST-2 (Modified for LLaMA 1B)

This dataset is a modified version of the Stanford Sentiment Treebank 2 (SST-2), a binary classification dataset for sentiment analysis.

Modifications:

Labels were originally "negative" and "positive", now converted to integers (0 for negative, 1 for positive).
Each example includes a task prefix: Task: SST2 | Sentence: ...
The dataset has been tokenized using the LLaMA-1B tokenizer.
Maximum sequence length is 128 tokens.… See the full description on the dataset page: https://huggingface.co/datasets/emirhanboge/sst2_llama1b_modified.
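The label conversion and task prefix described in the modifications can be sketched as a per-example formatting step. This is a sketch under assumptions: the prefix is taken to be the literal string `Task: SST2 | Sentence: ` followed by the sentence, the output field name `text` and the helper `format_example` are hypothetical, and the LLaMA-1B tokenization with its 128-token limit is deliberately left out because the tokenizer setup is not described here.

```python
from typing import Dict, Union

# Mapping stated on the card: "negative" -> 0, "positive" -> 1.
LABEL_TO_ID: Dict[str, int] = {"negative": 0, "positive": 1}


def format_example(sentence: str, label: str) -> Dict[str, Union[str, int]]:
    """Hypothetical helper: add the assumed task prefix and convert the label.

    The "text" field name is an assumption. Tokenization with the
    LLaMA-1B tokenizer (max length 128) is not shown.
    """
    return {
        "text": f"Task: SST2 | Sentence: {sentence}",
        "label": LABEL_TO_ID[label],
    }


print(format_example("a charming and funny film", "positive"))
# {'text': 'Task: SST2 | Sentence: a charming and funny film', 'label': 1}
```

Keeping the prefix in a single formatting function means the same string is applied at training and evaluation time, which matters when one model is trained on several prefixed tasks.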
sst2-textbugger
huggingface.co
Updated Oct 12, 2023
Yang Wang (2023). sst2-textbugger [Dataset]. https://huggingface.co/datasets/yangwang825/sst2-textbugger
Authors: Yang Wang
Description
Stanford Sentiment Treebank - Binary
sst2-norwegian-bokmaal
huggingface.co
Updated Jan 15, 2024
Kushtrim Visoka (2024). sst2-norwegian-bokmaal [Dataset]. https://huggingface.co/datasets/Kushtrim/sst2-norwegian-bokmaal
Authors: Kushtrim Visoka
Description
Norwegian Translated SST-2 Dataset

Dataset Overview

The dataset is a Norwegian machine translation of the Stanford Sentiment Treebank (SST-2). The original dataset comprises sentences extracted from movie reviews, accompanied by human annotations indicating their sentiment.

Dataset Structure

The dataset has the following structure: { "idx": int, "sentence": str, "label": int, "sentence_nob": str }
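The record structure above maps naturally onto a Python `TypedDict`, together with a small check that a row matches it. This is a sketch: `SST2NorwegianRow` and `matches_structure` are hypothetical names, the check covers only the four fields and types listed in the structure line, and the sample row (including its Norwegian text) is invented for illustration. The Albanian dataset listed below has the same layout with `sentence_alb` in place of `sentence_nob`.

```python
from typing import Any, Dict, Mapping, TypedDict, get_type_hints


class SST2NorwegianRow(TypedDict):
    # Hypothetical type mirroring the structure given on the dataset card.
    idx: int
    sentence: str  # original English sentence
    label: int
    sentence_nob: str  # Norwegian Bokmål machine translation


def matches_structure(row: Mapping[str, Any]) -> bool:
    """Return True if row has exactly the SST2NorwegianRow fields with matching types."""
    expected: Dict[str, type] = get_type_hints(SST2NorwegianRow)
    if set(row) != set(expected):
        return False
    # bool is a subclass of int, so reject booleans explicitly.
    return all(
        isinstance(row[name], kind) and not isinstance(row[name], bool)
        for name, kind in expected.items()
    )


# Invented example row, not taken from the dataset.
example = {"idx": 0, "sentence": "a moving film", "label": 1, "sentence_nob": "en rørende film"}
print(matches_structure(example))  # True
```

Deriving the expected fields from the `TypedDict` with `get_type_hints` keeps the type declaration and the runtime check from drifting apart.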

Data Fields

idx:… See the full description on the dataset page: https://huggingface.co/datasets/Kushtrim/sst2-norwegian-bokmaal.
sst2-albanian
huggingface.co
Updated Jan 13, 2024
Kushtrim Visoka (2024). sst2-albanian [Dataset]. https://huggingface.co/datasets/Kushtrim/sst2-albanian
Authors: Kushtrim Visoka
Description
Albanian Translated SST-2 Dataset

Dataset Overview

The dataset is an Albanian machine translation of the Stanford Sentiment Treebank (SST-2). The original dataset comprises sentences extracted from movie reviews, accompanied by human annotations indicating their sentiment.

Dataset Structure

The dataset has the following structure: { "idx": int, "sentence": str, "label": int, "sentence_alb": str }

Data Fields

idx:… See the full description on the dataset page: https://huggingface.co/datasets/Kushtrim/sst2-albanian.
cobie_sst2
huggingface.co
Updated Dec 12, 2024
Language Technologies Unit @ Barcelona Supercomputing Center (2024). cobie_sst2 [Dataset]. https://huggingface.co/datasets/BSC-LT/cobie_sst2
Dataset authored and provided by: Language Technologies Unit @ Barcelona Supercomputing Center
License: MIT License, https://opensource.org/licenses/MIT (license information was derived automatically)
Description
Dataset Card for cobie_sst2

This dataset is a modification of the original SST-2 dataset for LLM cognitive bias evaluation.

Language(s)

English (en)

Dataset Summary

The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie… See the full description on the dataset page: https://huggingface.co/datasets/BSC-LT/cobie_sst2.
sst2 (stanfordnlp/sst2): Stanford Sentiment Treebank v2
25 scholarly articles cite this dataset.
