https://choosealicense.com/licenses/unknown/
Dataset Card for [Dataset Name]
Dataset Summary
The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/sst2.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Binary Stanford Sentiment Treebank (SST2) is a binary version of the SST and Movie Review dataset with the neutral class removed; that is, the data is classified only into positive and negative classes.
The files:
texts.txt: Document set (text). One per line.
score.txt: Document class whose index is associated with texts.txt
split_
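The line-by-line pairing of texts.txt and score.txt described above could be read along these lines. This is only a sketch: the function name `load_paired_files` is hypothetical, and it assumes each line of score.txt holds a single integer class, which the listing does not confirm.

```python
from pathlib import Path


def load_paired_files(texts_path, scores_path):
    """Pair each document with the class found on the same line number.

    Assumption: score.txt stores one integer class per line.
    """
    texts = Path(texts_path).read_text(encoding="utf-8").splitlines()
    scores = Path(scores_path).read_text(encoding="utf-8").splitlines()
    if len(texts) != len(scores):
        # The two files are index-aligned, so a length mismatch means
        # the pairing cannot be trusted.
        raise ValueError(
            f"{len(texts)} documents but {len(scores)} classes"
        )
    return [(text, int(score)) for text, score in zip(texts, scores)]
```

Raising on a length mismatch is a deliberate choice here: silently truncating with `zip` would hide a misaligned download.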
https://choosealicense.com/licenses/unknown/
The Stanford Sentiment Treebank is the first corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language.
The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.
Binary classification experiments on full sentences (negative or somewhat negative vs somewhat positive or positive with neutral sentences discarded) refer to the dataset as SST-2 or SST binary.
https://choosealicense.com/licenses/unknown/
SST-2 Spanish
A Spanish translation (using EasyNMT) of the SST-2 Dataset
For more information, check the official Model Card.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The Rendered SST2 dataset, released by OpenAI, measures the optical character recognition capability of visual representations. It takes sentences from the Stanford Sentiment Treebank dataset and renders them as images, with black text on a white background, at a resolution of 448×448.
SST-2 (Modified for LLaMA 1B)
This dataset is a modified version of the Stanford Sentiment Treebank 2 (SST-2), a binary classification dataset for sentiment analysis.
Modifications:
Labels were originally "negative" and "positive", now converted to integers (0 for negative, 1 for positive).
Each example includes a task prefix: Task: SST2 | Sentence: ...
The dataset has been tokenized using the LLaMA-1B tokenizer.
Maximum sequence length is 128 tokens.
… See the full description on the dataset page: https://huggingface.co/datasets/emirhanboge/sst2_llama1b_modified.
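The label conversion and task prefix listed above could look roughly like this. It is a sketch, not the dataset author's code: `format_example`, `LABEL_TO_ID`, and the output keys `text` and `label` are hypothetical names, and the tokenization and 128-token truncation steps are left out because they depend on a tokenizer not shown here.

```python
# Label mapping as stated in the description: 0 for negative, 1 for positive.
LABEL_TO_ID = {"negative": 0, "positive": 1}


def format_example(sentence, label):
    """Add the task prefix and convert the string label to an integer.

    The output keys "text" and "label" are illustrative assumptions.
    """
    return {
        "text": f"Task: SST2 | Sentence: {sentence}",
        "label": LABEL_TO_ID[label],
    }
```

For instance, `format_example("a charming film", "positive")` yields a record whose text starts with the `Task: SST2 | Sentence:` prefix and whose label is 1.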
Stanford Sentiment Treebank - Binary
Norwegian Translated SST-2 Dataset
Dataset
Overview
The dataset is a Norwegian machine translation of the Stanford Sentiment Treebank (SST-2). The original dataset comprises sentences extracted from movie reviews, accompanied by human annotations indicating their sentiment.
Dataset Structure
The dataset has the following structure: { "idx": int, "sentence": str, "label": int, "sentence_nob": str }
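The record structure above can be written down as a typed record with a small shape check. This is a sketch under assumptions: the class name `NorwegianSST2Record` and the helper `is_valid_record` are hypothetical, and only the four field names and types come from the structure shown.

```python
from typing import TypedDict


class NorwegianSST2Record(TypedDict):
    idx: int
    sentence: str
    label: int
    sentence_nob: str


def is_valid_record(record):
    """Check that a dict has exactly the four fields with the listed types."""
    expected = {"idx": int, "sentence": str, "label": int, "sentence_nob": str}
    return set(record) == set(expected) and all(
        isinstance(record[key], kind) for key, kind in expected.items()
    )
```

A check like this is useful mainly when mixing the translated variants, since they share `idx`, `sentence`, and `label` but differ in the name of the translated field.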
Data Fields
idx:… See the full description on the dataset page: https://huggingface.co/datasets/Kushtrim/sst2-norwegian-bokmaal.
Albanian Translated SST-2 Dataset
Dataset
Overview
The dataset is an Albanian machine translation of the Stanford Sentiment Treebank (SST-2). The original dataset comprises sentences extracted from movie reviews, accompanied by human annotations indicating their sentiment.
Dataset Structure
The dataset has the following structure: { "idx": int, "sentence": str, "label": int, "sentence_alb": str }
Data Fields
idx:… See the full description on the dataset page: https://huggingface.co/datasets/Kushtrim/sst2-albanian.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for cobie_sst2
This dataset is a modification of the original SST-2 dataset for LLM cognitive bias evaluation.
Language(s)
English (en)
Dataset Summary
The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie… See the full description on the dataset page: https://huggingface.co/datasets/BSC-LT/cobie_sst2.