8 datasets found
  1. Yahoo! Answers Topic Classification

    • kaggle.com
    zip
    Updated Jun 30, 2022
    Cite
    Bhavik Ardeshna (2022). Yahoo! Answers Topic Classification [Dataset]. https://www.kaggle.com/datasets/bhavikardeshna/yahoo-email-classification
    Available download formats: zip (324007831 bytes)
    Dataset updated
    Jun 30, 2022
    Authors
    Bhavik Ardeshna
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Yahoo! Answers topic classification dataset is constructed using the 10 largest main categories. Each class contains 140,000 training samples and 6,000 testing samples, so the dataset has 1,400,000 training samples and 60,000 testing samples in total. From all the answers and other meta-information, we only used the best answer content and the main category information.

    • Society & Culture
    • Science & Mathematics
    • Health
    • Education & Reference
    • Computers & Internet
    • Sports
    • Business & Finance
    • Entertainment & Music
    • Family & Relationships
    • Politics & Government

    The Yahoo! Answers topic classification dataset was constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the above dataset. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
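
As a quick check of the arithmetic in the description above, the following sketch recomputes the split totals from the per-class counts. The constant and function names are illustrative, and treating the bullet order as the class-index order is an assumption rather than something the description states.

```python
# Category names as listed in the description. Using this order as the
# class-index order is an assumption made for illustration only.
CATEGORIES = [
    "Society & Culture",
    "Science & Mathematics",
    "Health",
    "Education & Reference",
    "Computers & Internet",
    "Sports",
    "Business & Finance",
    "Entertainment & Music",
    "Family & Relationships",
    "Politics & Government",
]

TRAIN_PER_CLASS = 140_000  # from the description
TEST_PER_CLASS = 6_000     # from the description


def split_sizes(num_classes, train_per_class, test_per_class):
    """Return (total training samples, total testing samples)."""
    return num_classes * train_per_class, num_classes * test_per_class


train_total, test_total = split_sizes(
    len(CATEGORIES), TRAIN_PER_CLASS, TEST_PER_CLASS
)
print(train_total, test_total)
```

With 10 classes this gives the 1,400,000 training and 60,000 testing samples quoted in the description.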

  2. 📊 Yahoo Answers 10 categories for NLP CSV

    • kaggle.com
    zip
    Updated Apr 7, 2023
    Cite
    Yassir Acharki (2023). 📊 Yahoo Answers 10 categories for NLP CSV [Dataset]. https://www.kaggle.com/datasets/yacharki/yahoo-answers-10-categories-for-nlp-csv
    Available download formats: zip (324009471 bytes)
    Dataset updated
    Apr 7, 2023
    Authors
    Yassir Acharki
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    The Yahoo! Answers topic classification dataset is constructed using the 10 largest main categories. Each class contains 140,000 training samples and 6,000 testing samples, so the dataset has 1,400,000 training samples and 60,000 testing samples in total. From all the answers and other meta-information, we only used the best answer content and the main category information.

    The file classes.txt contains a list of classes corresponding to each label.

    The files train.csv and test.csv contain the training and testing samples, respectively, as comma-separated values. Each file has 4 columns: class index (1 to 10), question title, question content, and best answer. The text fields are enclosed in double quotes ("), and any internal double quote is escaped by two double quotes (""). New lines are escaped by a backslash followed by an "n" character, that is "\n".
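
The file format described above can be read with Python's standard csv module. The sketch below is a minimal illustration, assuming the files have no header row and are UTF-8 encoded (neither is stated in the description). The function name read_rows and the sample row are hypothetical.

```python
import csv
import io


def read_rows(handle):
    """Yield (class_index, title, content, best_answer) tuples.

    Doubled double quotes are handled by the csv module; the literal
    two-character sequence backslash + "n" is turned back into a newline.
    """
    reader = csv.reader(handle, quotechar='"', doublequote=True)
    for label, title, content, answer in reader:
        fields = (title, content, answer)
        yield (int(label),) + tuple(f.replace("\\n", "\n") for f in fields)


# Hypothetical sample row in the documented format (not taken from the dataset).
sample = '"5","why doesn\'t ""this"" work?","line one\\nline two","Try restarting."\n'
rows = list(read_rows(io.StringIO(sample)))
print(rows[0])
```

For the real files, the same function could be called on a handle opened with open("train.csv", newline="", encoding="utf-8"), assuming that encoding is correct for the download.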

  3. Data from: Yahoo! Answers

    • zenodo.org
    bin, txt
    Updated Aug 26, 2021
    Cite
    N/A; N/A (2021). Yahoo! Answers [Dataset]. http://doi.org/10.5281/zenodo.5259952
    Available download formats: bin, txt
    Dataset updated
    Aug 26, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    N/A; N/A
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Yahoo! Answers (Yahoo! Answers Comprehensive Questions and Answers version 1.0). Zhang et al. (2016) collected this set of 4,483,032 questions and used their answers across the 10 largest main categories to build the classification dataset. The fields used are the question title, question content, and best answer.

    The files:
    texts.txt: Document set (text). One per line.
    score.txt: Document class whose index is associated with texts.txt
    split_
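
The file list above suggests that texts.txt and score.txt are aligned line by line. The sketch below pairs them under that assumption: line i of score.txt is taken to be the class of line i of texts.txt, and labels are kept as strings because their exact format is not described. The function name and the demo contents are hypothetical.

```python
import tempfile
from pathlib import Path


def load_labeled_documents(texts_path, labels_path):
    """Pair each document with the label on the same line number."""
    with open(texts_path, encoding="utf-8") as f:
        texts = [line.rstrip("\n") for line in f]
    with open(labels_path, encoding="utf-8") as f:
        labels = [line.rstrip("\n") for line in f]
    if len(texts) != len(labels):
        raise ValueError(
            f"line count mismatch: {len(texts)} texts vs {len(labels)} labels"
        )
    return list(zip(texts, labels))


# Demo with two made-up documents written to temporary files.
with tempfile.TemporaryDirectory() as tmp:
    texts_file = Path(tmp) / "texts.txt"
    score_file = Path(tmp) / "score.txt"
    texts_file.write_text("how do magnets work\nbest budget laptop\n", encoding="utf-8")
    score_file.write_text("1\n4\n", encoding="utf-8")
    pairs = load_labeled_documents(texts_file, score_file)

print(pairs)
```

The explicit length check is there so that a misaligned download fails loudly instead of silently assigning labels to the wrong documents.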

  4. Yahoo Answers topics

    • resodate.org
    • service.tib.eu
    Updated Jan 3, 2025
    Cite
    Weinan Zhang; Wanxiang Che; Hongyuan Dong (2025). Yahoo Answers topics [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQveWFob28tYW5zd2Vycy10b3BpY3M=
    Dataset updated
    Jan 3, 2025
    Dataset provided by
    Leibniz Data Manager
    Authors
    Weinan Zhang; Wanxiang Che; Hongyuan Dong
    Description

    The dataset used in this paper for the few-shot text classification task.

  5. Data from: Yahoo Answers Dataset

    • kaggle.com
    zip
    Updated May 14, 2021
    Cite
    Rakesh Jarupula (2021). Yahoo Answers Dataset [Dataset]. https://www.kaggle.com/jarupula/yahoo-answers-dataset
    Available download formats: zip (324009088 bytes)
    Dataset updated
    May 14, 2021
    Authors
    Rakesh Jarupula
    Description

    The 10 largest main categories from the Yahoo! Answers Comprehensive Questions and Answers version 1.0 dataset. Each class contains 140,000 training samples and 6,000 testing samples.

  6. Data from: Yahoo Answers dataset

    • resodate.org
    • service.tib.eu
    Updated Nov 25, 2024
    Cite
    Xiang Zhang; Junbo Zhao; Yann LeCun (2024). Yahoo Answers dataset [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQveWFob28tYW5zd2Vycy1kYXRhc2V0
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Xiang Zhang; Junbo Zhao; Yann LeCun
    Description

    The Yahoo Answers dataset contains labeled examples for topic classification used to perform error analysis on a BERT-based model.

  7. Yahoo Answer and Yelp15 review

    • resodate.org
    • service.tib.eu
    Updated Dec 3, 2024
    Cite
    Zichao Yang; Zhiting Hu; Ruslan Salakhutdinov; Taylor Berg-Kirkpatrick (2024). Yahoo Answer and Yelp15 review [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQveWFob28tYW5zd2VyLWFuZC15ZWxwMTUtcmV2aWV3
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Zichao Yang; Zhiting Hu; Ruslan Salakhutdinov; Taylor Berg-Kirkpatrick
    Description

    Two large scale document classification datasets: Yahoo Answer and Yelp15 review, representing topic classification and sentiment classification data sets respectively.

  8. YahooAnswersTopicsClassification

    • huggingface.co
    Cite
    Massive Text Embedding Benchmark, YahooAnswersTopicsClassification [Dataset]. https://huggingface.co/datasets/mteb/YahooAnswersTopicsClassification
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    https://choosealicense.com/licenses/unknown/

    Description

    YahooAnswersTopicsClassification: an MTEB (Massive Text Embedding Benchmark) dataset

    Dataset composed of questions and answers from Yahoo Answers, categorized into topics.

    Task category: t2c
    Domains: Web, Written
    Reference: https://huggingface.co/datasets/yahoo_answers_topics
    Source datasets: community-datasets/yahoo_answers_topics

    How to evaluate on this task

    You can evaluate an embedding model on this dataset using the following code:

    import mteb
    task =…

    See the full description on the dataset page: https://huggingface.co/datasets/mteb/YahooAnswersTopicsClassification.

Yahoo! Answers Topic Classification

The Yahoo! Answers dataset is constructed using the 10 largest main categories.

50 scholarly articles cite this dataset (View in Google Scholar)
