100+ datasets found
  1. ag_news

    • huggingface.co
    Cite
    Seonghyeon Lee, ag_news [Dataset]. https://huggingface.co/datasets/sh0416/ag_news
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Seonghyeon Lee
    Description

    AG's News Topic Classification Dataset, Version 3, updated 09/09/2015. Origin: AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc), information retrieval (ranking, search… See the full description on the dataset page: https://huggingface.co/datasets/sh0416/ag_news.

  2. ag_news

    • huggingface.co
    Cite
    wangrongsheng, ag_news [Dataset]. https://huggingface.co/datasets/wangrongsheng/ag_news
    Explore at:
    Authors
    wangrongsheng
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for "ag_news"

      Dataset Summary
    

    AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc), information retrieval (ranking, search, etc), xml… See the full description on the dataset page: https://huggingface.co/datasets/wangrongsheng/ag_news.

  3. ag_news_subset

    • tensorflow.org
    Updated Dec 6, 2022
    Cite
    (2022). ag_news_subset [Dataset]. http://identifiers.org/arxiv:1509.01626
    Explore at:
    Dataset updated
    Dec 6, 2022
    Description

    AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc), information retrieval (ranking, search, etc), xml, data compression, data streaming, and any other non-commercial activity. For more information, please refer to http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html.

    The AG's news topic classification dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

    The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples. The total number of training samples is 120,000 and the total number of testing samples is 7,600.
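
    The totals above follow directly from the per-class counts. A minimal sanity check, using only the figures quoted in the description (the constant names are illustrative, not part of any library):

```python
# Per-class counts as stated in the dataset description.
NUM_CLASSES = 4
TRAIN_PER_CLASS = 30_000
TEST_PER_CLASS = 1_900

# Totals across all classes.
train_total = NUM_CLASSES * TRAIN_PER_CLASS
test_total = NUM_CLASSES * TEST_PER_CLASS

print(train_total, test_total)
```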

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split of the AG News subset.
    ds = tfds.load('ag_news_subset', split='train')
    # Print the first four examples.
    for ex in ds.take(4):
        print(ex)
    

    See the tensorflow_datasets guide for more information.

  4. AG News

    • academictorrents.com
    bittorrent
    Updated Oct 16, 2018
    Cite
    Xiang Zhang et al., 2015 (2018). AG News [Dataset]. https://academictorrents.com/details/758bf646e3ffd39d20f9a3d9efbdb0e1eade5022
    Explore at:
    Available download formats: bittorrent (11784419)
    Dataset updated
    Oct 16, 2018
    Dataset authored and provided by
    Xiang Zhang et al., 2015
    License

    https://academictorrents.com/nolicensespecified

    Description

    496,835 categorized news articles from >2000 news sources from the 4 largest classes from AG’s corpus of news articles, using only the title and description fields. The number of training samples for each class is 30,000 and the number of testing samples is 1,900.

  5. ag_news

    • huggingface.co
    Cite
    SetFit, ag_news [Dataset]. https://huggingface.co/datasets/SetFit/ag_news
    Explore at:
    Dataset authored and provided by
    SetFit
    Description

    SetFit/ag_news dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. AG-news

    • huggingface.co
    Cite
    Kseniia Zolina, AG-news [Dataset]. https://huggingface.co/datasets/nixiieee/AG-news
    Explore at:
    Authors
    Kseniia Zolina
    Description

    nixiieee/AG-news dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. sts_bert_microsoft-mpnet-base AG News Results

    • data.dtu.dk
    txt
    Updated Jul 26, 2024
    Cite
    Beatrix Miranda Ginn Nielsen (2024). sts_bert_microsoft-mpnet-base AG News Results [Dataset]. http://doi.org/10.11583/DTU.21268422.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jul 26, 2024
    Dataset provided by
    Technical University of Denmark
    Authors
    Beatrix Miranda Ginn Nielsen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw result files used for tables and figures in Hubness Reduction Improves Sentence-BERT Semantic Spaces (DOI: coming)

    For more info see: https://github.com/bemigini/hubness-reduction-sentence-bert

  8. AG News, SogouNews and DBpedia

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). AG News, SogouNews and DBpedia [Dataset]. https://service.tib.eu/ldmservice/dataset/ag-news--sogounews-and-dbpedia
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The AG News, SogouNews and DBpedia datasets are used for text classification experiments.

  9. AG’s Corpus (AG's corpus of news articlesNews)

    • opendatalab.com
    zip
    Updated Aug 23, 2022
    Cite
    University of Pisa (2022). AG’s Corpus (AG's corpus of news articlesNews) [Dataset]. https://opendatalab.com/OpenDataLab/AG_s_Corpus
    Explore at:
    Available download formats: zip (619245611 bytes)
    Dataset updated
    Aug 23, 2022
    Dataset provided by
    Google
    University of Pisa
    License

    http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html

    Description

    Antonio Gulli’s corpus of news articles is a collection of more than 1 million news articles. The articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc), information retrieval (ranking, search, etc), xml, data compression, data streaming, and any other non-commercial activity. A subset of this corpus, AG News, consisting of the 4 largest classes, is a popular topic classification dataset.

  10. sts_bert_distilroberta-base AG News results

    • data.dtu.dk
    json
    Updated Jul 26, 2024
    Cite
    Beatrix Miranda Ginn Nielsen (2024). sts_bert_distilroberta-base AG News results [Dataset]. http://doi.org/10.11583/DTU.21387282.v1
    Explore at:
    Available download formats: json
    Dataset updated
    Jul 26, 2024
    Dataset provided by
    Technical University of Denmark
    Authors
    Beatrix Miranda Ginn Nielsen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw result files used for tables and figures in Hubness Reduction Improves Sentence-BERT Semantic Spaces (DOI: coming)

    For more info see: https://github.com/bemigini/hubness-reduction-sentence-bert

  11. Farm Service Agency News Releases

    • catalog.data.gov
    • datadiscoverystudio.org
    • +3more
    Updated Apr 21, 2025
    Cite
    Farm Service Agency, Department of Agriculture (2025). Farm Service Agency News Releases [Dataset]. https://catalog.data.gov/dataset/farm-service-agency-news-releases
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    United States Department of Agriculture: http://usda.gov/
    Farm Service Agency: https://www.fsa.usda.gov/
    Description

    Feed of news releases from the US Department of Agriculture, Farm Service Agency.

  12. Pretrained sentence BERT models AG News Results

    • figshare.com
    • data.dtu.dk
    txt
    Updated Jul 26, 2024
    Cite
    Beatrix Miranda Ginn Nielsen (2024). Pretrained sentence BERT models AG News Results [Dataset]. http://doi.org/10.11583/DTU.21276648.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jul 26, 2024
    Dataset provided by
    Technical University of Denmark
    Authors
    Beatrix Miranda Ginn Nielsen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw result files used for tables and figures in Hubness Reduction Improves Sentence-BERT Semantic Spaces (DOI: coming)

    For more info see: https://github.com/bemigini/hubness-reduction-sentence-bert

  13. Yizhe Zhang, Dinghan Shen (2024). Dataset: AG’s News....

    • service.tib.eu
    Updated Nov 25, 2024
    Cite
    (2024). Yizhe Zhang, Dinghan Shen (2024). Dataset: AG’s News. https://doi.org/10.57702/w24z4xar [Dataset]. https://service.tib.eu/ldmservice/dataset/ag-s-news
    Explore at:
    Dataset updated
    Nov 25, 2024
    Description

    The AG’s News dataset is a topic classification dataset containing news articles categorized into different topics.

  14. Farm Service Agency News and Events Widget

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1more
    Updated Apr 21, 2025
    Cite
    Farm Service Agency, Department of Agriculture (2025). Farm Service Agency News and Events Widget [Dataset]. https://catalog.data.gov/dataset/farm-service-agency-news-and-events-widget
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    United States Department of Agriculture: http://usda.gov/
    Farm Service Agency: https://www.fsa.usda.gov/
    Description

    This Widget provides access to all FSA National News releases. The widget may be embedded into your website or blog with code provided using either Flash or Javascript.

  15. AG-news-softlabels-averaged

    • huggingface.co
    Cite
    Kseniia Zolina, AG-news-softlabels-averaged [Dataset]. https://huggingface.co/datasets/nixiieee/AG-news-softlabels-averaged
    Explore at:
    Authors
    Kseniia Zolina
    Description

    Same as nixiieee/AG-news-softlabels, but prompt was run 10 times and then predictions were averaged for each sample. This led to better quality when training model on this data compared to non-averaged predictions.
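
    The averaging step described above can be sketched as follows. This is only an illustration under assumptions: the dataset card does not state how predictions are stored, so each run is taken here to be a list of per-class probabilities for one sample, and `average_soft_labels` is a hypothetical helper, not the author's code.

```python
def average_soft_labels(runs):
    """Average per-class probabilities for one sample across repeated runs.

    `runs` is assumed to be a list of equal-length probability lists,
    one list per prompt run.
    """
    n_runs = len(runs)
    n_classes = len(runs[0])
    return [sum(run[c] for run in runs) / n_runs for c in range(n_classes)]


# Two hypothetical runs over the four AG News classes.
runs = [
    [0.5, 0.25, 0.25, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
averaged = average_soft_labels(runs)
print(averaged)
```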

  16. Ten Thousand German News Articles Dataset

    • kaggle.com
    • tblock.github.io
    zip
    Updated Jan 20, 2022
    Cite
    Timo Block (2022). Ten Thousand German News Articles Dataset [Dataset]. https://www.kaggle.com/tblock/10kgnad
    Explore at:
    Available download formats: zip (21144764 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Timo Block
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    (see https://tblock.github.io/10kGNAD/ for the original dataset page)

    This page introduces the 10k German News Articles Dataset (10kGNAD), a German topic classification dataset. The 10kGNAD is based on the One Million Posts Corpus and is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You can download the dataset here.

    Why a German dataset?

    English text classification datasets are common. Examples include the big AG News, the class-rich 20 Newsgroups, and the large-scale DBpedia ontology datasets for topic classification, as well as the commonly used IMDb and Yelp datasets for sentiment analysis. Non-English datasets, especially German datasets, are less common. There is a collection of sentiment analysis datasets assembled by the Interest Group on German Sentiment Analysis. However, to my knowledge, no German topic classification dataset is available to the public.

    Due to grammatical differences between English and German, a classifier might be effective on an English dataset but less effective on a German one. German is more highly inflected, and long compound words are much more common than in English. One would need to evaluate a classifier on multiple German datasets to get a sense of its effectiveness.

    The dataset

    The 10kGNAD dataset is intended to solve part of this problem as the first German topic classification dataset. It consists of 10,273 German-language news articles from an Austrian online newspaper, categorized into nine topics. These articles are a previously unused part of the One Million Posts Corpus.

    In the One Million Posts Corpus each article has a topic path, for example Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise. The 10kGNAD uses the second part of the topic path, here Wirtschaft, as the class label. As a result, the dataset can be used for multi-class classification.
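
    The labeling rule above amounts to taking the second component of a slash-separated path. A minimal sketch, assuming the path is available as a plain string (`label_from_topic_path` is a hypothetical helper name, not taken from the project's scripts):

```python
def label_from_topic_path(topic_path: str) -> str:
    """Return the second component of a '/'-separated topic path."""
    parts = topic_path.split("/")
    if len(parts) < 2:
        raise ValueError(f"topic path has no second component: {topic_path!r}")
    return parts[1]


example_path = "Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise"
label = label_from_topic_path(example_path)
print(label)
```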

    I created and used this dataset in my thesis to train and evaluate four text classifiers on the German language. By publishing the dataset I hope to support the advancement of tools and models for the German language. Additionally, this dataset can be used as a benchmark dataset for German topic classification.

    Numbers and statistics

    As in most real-world datasets, the class distribution of the 10kGNAD is not balanced. The biggest class, Web, consists of 1,678 articles, while the smallest class, Kultur, contains only 539. However, articles from the Web class have on average the fewest words, while articles from the Kultur (culture) class have the second most words.

    Splitting into train and test

    I propose a stratified split of 10% for testing and the remaining articles for training. To use the dataset as a benchmark dataset, please use the train.csv and test.csv files located in the project root.
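
    For readers unfamiliar with stratified splitting, the idea is to hold out the same fraction of every class so the test set keeps the class proportions. The sketch below uses only the Python standard library and toy labels; it is not the project's own split script, and the function name, seed, and rounding rule are assumptions. For benchmark comparisons, the provided train.csv and test.csv remain the reference split.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, seed=42):
    """Return (train_indices, test_indices), holding out the same
    fraction of every class for testing."""
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Number of held-out samples for this class (rounded).
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy, imbalanced label list standing in for the real class column.
labels = ["Web"] * 30 + ["Kultur"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```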

    Code

    Python scripts to extract the articles and split them into a training set and a test set are available in the code directory of this project. Make sure to install the requirements. The original corpus.sqlite3 is required to extract the articles (download here (compressed) or here (uncompressed)).

    License

    Creative Commons License

    This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please consider citing the authors of the One Million Posts Corpus if you use the dataset.

  17. Farm Service Agency Market News Widget

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2more
    Updated Apr 21, 2025
    Cite
    Farm Service Agency, Department of Agriculture (2025). Farm Service Agency Market News Widget [Dataset]. https://catalog.data.gov/dataset/farm-service-agency-market-news-widget
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    United States Department of Agriculture: http://usda.gov/
    Farm Service Agency: https://www.fsa.usda.gov/
    Description

    This Widget provides access to all FSA Daily Terminal Market Prices information releases. The widget may be embedded into your website or blog with code provided using either Flash or Javascript.

  18. Latest news from the Ministry of Agriculture

    • data.gov.tw
    csv, json
    Updated Apr 12, 2024
    Cite
    Ministry of Agriculture (2024). Latest news from the Ministry of Agriculture [Dataset]. https://data.gov.tw/en/datasets/95056
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Apr 12, 2024
    Dataset authored and provided by
    Ministry of Agriculture
    License

    https://data.gov.tw/license

    Description

    Provides the latest news RSS feed of the Department of Agriculture.

  19. Dataset of news about lavoie.ag

    • workwithdata.com
    Updated May 16, 2025
    Cite
    Work With Data (2025). Dataset of news about lavoie.ag [Dataset]. https://www.workwithdata.com/datasets/news?f=1&fcol0=page_name&fop0=%3D&fval0=lavoie.ag
    Explore at:
    Dataset updated
    May 16, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about news. It has 2 rows and is filtered to rows where the keywords include lavoie.ag. It features 10 columns including source, publication date, section, and news link.

  20. CIFAR-100 and AGNews

    • service.tib.eu
    Updated Dec 17, 2024
    Cite
    (2024). CIFAR-100 and AGNews [Dataset]. https://service.tib.eu/ldmservice/dataset/cifar-100-and-agnews
    Explore at:
    Dataset updated
    Dec 17, 2024
    Description

    Two datasets used for multi-task learning, CIFAR-100 and AGNews.

sh0416/ag_news

343 scholarly articles cite this dataset (per Google Scholar).