8 datasets found

Yahoo! Answers Topic Classification
kaggle.com
zip
Updated Jun 30, 2022
Cite
Bhavik Ardeshna (2022). Yahoo! Answers Topic Classification [Dataset]. https://www.kaggle.com/datasets/bhavikardeshna/yahoo-email-classification
Available download formats: zip (324007831 bytes)
Dataset updated
Jun 30, 2022
Authors
Bhavik Ardeshna
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The Yahoo! Answers topic classification dataset is constructed using the 10 largest main categories. Each class contains 140,000 training samples and 6,000 testing samples. Therefore, the total number of training samples is 1,400,000, and testing samples are 60,000 in this dataset. From all the answers and other meta-information, we only used the best answer content and the main category information.

Society & Culture

Science & Mathematics

Health

Education & Reference

Computers & Internet

Sports

Business & Finance

Entertainment & Music

Family & Relationships

Politics & Government

The Yahoo! Answers topic classification dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the above dataset. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015)
📊 Yahoo Answers 10 categories for NLP CSV
kaggle.com
zip
Updated Apr 7, 2023
Cite
Yassir Acharki (2023). 📊 Yahoo Answers 10 categories for NLP CSV [Dataset]. https://www.kaggle.com/datasets/yacharki/yahoo-answers-10-categories-for-nlp-csv
Available download formats: zip (324009471 bytes)
Dataset updated
Apr 7, 2023
Authors
Yassir Acharki
License
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Description
The Yahoo! Answers topic classification dataset is constructed using the 10 largest main categories. Each class contains 140,000 training samples and 6,000 testing samples. Therefore, the dataset contains 1,400,000 training samples and 60,000 testing samples in total. From all the answers and other meta-information, we only used the best answer content and the main category information.

The file classes.txt contains a list of classes corresponding to each label.

The files train.csv and test.csv contain the training and testing samples, respectively, as comma-separated values. Each file has 4 columns: class index (1 to 10), question title, question content and best answer. The text fields are enclosed in double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed by an "n" character, that is "\n".
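The column layout and escaping rules described above can be illustrated with a minimal Python sketch. It assumes the files have no header row and that Python's standard `csv` module (which treats doubled double quotes as an escaped quote by default) is an acceptable parser. The names `read_rows`, `unescape_newlines` and `SAMPLE` are hypothetical, and the sample rows are made up for illustration, not taken from the dataset.

```python
import csv
import io

# Made-up rows in the described layout: class index, title, content, best answer.
# "\\n" in this Python source is a literal backslash followed by "n", as in the files.
SAMPLE = (
    '5,"Why is my ""new"" laptop slow?","It takes minutes to boot.\\nAny ideas?","Check startup programs."\n'
    '2,"What is pi?","","Roughly 3.14159."\n'
)


def unescape_newlines(text):
    """Turn the two-character sequence backslash + n back into a real newline."""
    # Naive replacement: an escaped backslash followed by "n" would also be converted.
    return text.replace("\\n", "\n")


def read_rows(handle):
    """Yield one dict per CSV row, assuming exactly 4 columns and no header."""
    for label, title, content, answer in csv.reader(handle):
        yield {
            "label": int(label),  # class index, 1 to 10
            "title": unescape_newlines(title),
            "content": unescape_newlines(content),
            "answer": unescape_newlines(answer),
        }


for row in read_rows(io.StringIO(SAMPLE)):
    print(row["label"], repr(row["title"]))
```

To read the real file, one would pass an open file handle instead of the in-memory sample, for example `open("train.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption.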
Data from: Yahoo! Answers
zenodo.org
bin, txt
Updated Aug 26, 2021
Cite
N/A; N/A (2021). Yahoo! Answers [Dataset]. http://doi.org/10.5281/zenodo.5259952
Available download formats: bin, txt
Unique identifier
https://doi.org/10.5281/zenodo.5259952
Dataset updated
Aug 26, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
N/A; N/A
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Yahoo! Answers (Yahoo! Answers Comprehensive Questions and Answers version 1.0). Zhang et al. (2016) collected this set of 4,483,032 questions and used their answers across the 10 largest main categories to build the classification dataset. The fields used are the question title, question content and best answer.

The files:
texts.txt: Document set (text). One per line.
score.txt: Document class whose index is associated with texts.txt
split_
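As a rough sketch of how the line-aligned files might be combined, the Python snippet below pairs line i of texts.txt with line i of score.txt. It assumes score.txt holds exactly one class label per line in the same order as texts.txt and that both files are UTF-8; neither detail is confirmed by the description. The function name `load_pairs` is hypothetical, labels are kept as strings because their format is not specified, and the demo uses made-up stand-in files rather than the real data.

```python
import tempfile
from pathlib import Path


def read_lines(path):
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_pairs(texts_path, labels_path):
    """Pair each document with the label found on the same line number."""
    texts = read_lines(texts_path)
    labels = read_lines(labels_path)
    if len(texts) != len(labels):
        raise ValueError(f"{len(texts)} documents but {len(labels)} labels")
    return list(zip(texts, labels))


# Demo on made-up stand-in files, not the real Zenodo data.
with tempfile.TemporaryDirectory() as tmp:
    texts_file = Path(tmp) / "texts.txt"
    labels_file = Path(tmp) / "score.txt"
    texts_file.write_text("how do magnets work\nwho won the cup\n", encoding="utf-8")
    labels_file.write_text("2\n6\n", encoding="utf-8")
    pairs = load_pairs(texts_file, labels_file)

print(pairs)
```

The length check is there so that a misaligned pair of files fails loudly instead of silently attaching labels to the wrong documents.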
Yahoo Answers topics
resodate.org
service.tib.eu
Updated Jan 3, 2025
Cite
Weinan Zhang; Wanxiang Che; Hongyuan Dong (2025). Yahoo Answers topics [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQveWFob28tYW5zd2Vycy10b3BpY3M=
Dataset updated
Jan 3, 2025
Dataset provided by
Leibniz Data Manager
Authors
Weinan Zhang; Wanxiang Che; Hongyuan Dong
Description
The dataset used in this paper for the few-shot text classification task.
Data from: Yahoo Answers Dataset
kaggle.com
zip
Updated May 14, 2021
Cite
Rakesh Jarupula (2021). Yahoo Answers Dataset [Dataset]. https://www.kaggle.com/jarupula/yahoo-answers-dataset
Available download formats: zip (324009088 bytes)
Dataset updated
May 14, 2021
Authors
Rakesh Jarupula
Description
The 10 largest main categories from the Yahoo! Answers Comprehensive Questions and Answers version 1.0 dataset. Each class contains 140,000 training samples and 6,000 testing samples.
Data from: Yahoo Answers dataset
resodate.org
service.tib.eu
Updated Nov 25, 2024
Cite
Xiang Zhang; Junbo Zhao; Yann LeCun (2024). Yahoo Answers dataset [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQveWFob28tYW5zd2Vycy1kYXRhc2V0
Dataset updated
Nov 25, 2024
Dataset provided by
Leibniz Data Manager
Authors
Xiang Zhang; Junbo Zhao; Yann LeCun
Description
The Yahoo Answers dataset contains labeled examples for topic classification used to perform error analysis on a BERT-based model.
Yahoo Answer and Yelp15 review
resodate.org
service.tib.eu
Updated Dec 3, 2024
Cite
Zichao Yang; Zhiting Hu; Ruslan Salakhutdinov; Taylor Berg-Kirkpatrick (2024). Yahoo Answer and Yelp15 review [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQveWFob28tYW5zd2VyLWFuZC15ZWxwMTUtcmV2aWV3
Dataset updated
Dec 3, 2024
Dataset provided by
Leibniz Data Manager
Authors
Zichao Yang; Zhiting Hu; Ruslan Salakhutdinov; Taylor Berg-Kirkpatrick
Description
Two large scale document classification datasets: Yahoo Answer and Yelp15 review, representing topic classification and sentiment classification data sets respectively.
YahooAnswersTopicsClassification
huggingface.co
Cite
Massive Text Embedding Benchmark, YahooAnswersTopicsClassification [Dataset]. https://huggingface.co/datasets/mteb/YahooAnswersTopicsClassification
Dataset authored and provided by
Massive Text Embedding Benchmark
License
https://choosealicense.com/licenses/unknown/
Description
YahooAnswersTopicsClassification: an MTEB (Massive Text Embedding Benchmark) dataset.

Dataset composed of questions and answers from Yahoo Answers, categorized into topics.

Task category: t2c

Domains: Web, Written

Reference: https://huggingface.co/datasets/yahoo_answers_topics

Source datasets:

community-datasets/yahoo_answers_topics

How to evaluate on this task

You can evaluate an embedding model on this dataset using the following code: import mteb

task =… See the full description on the dataset page: https://huggingface.co/datasets/mteb/YahooAnswersTopicsClassification.
50 scholarly articles cite the Yahoo! Answers Topic Classification dataset (Bhavik Ardeshna, 2022).