100+ datasets found

ag_news
huggingface.co
Cite
Seonghyeon Lee, ag_news [Dataset]. https://huggingface.co/datasets/sh0416/ag_news
Authors
Seonghyeon Lee
Description
AG's News Topic Classification Dataset, Version 3, updated 09/09/2015. Origin: AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc), information retrieval (ranking, search… See the full description on the dataset page: https://huggingface.co/datasets/sh0416/ag_news.
ag_news
huggingface.co
Cite
wangrongsheng, ag_news [Dataset]. https://huggingface.co/datasets/wangrongsheng/ag_news
Authors
wangrongsheng
License
https://choosealicense.com/licenses/unknown/
Description
Dataset Card for "ag_news"

Dataset Summary

AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc), information retrieval (ranking, search, etc), xml… See the full description on the dataset page: https://huggingface.co/datasets/wangrongsheng/ag_news.
ag_news_subset
tensorflow.org
Updated Dec 6, 2022
Cite
(2022). ag_news_subset [Dataset]. http://identifiers.org/arxiv:1509.01626
Unique identifier
https://identifiers.org/arxiv:1509.01626
Dataset updated
Dec 6, 2022
Description
AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), xml, data compression, data streaming, and any other non-commercial activity. For more information, please refer to the link http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html.

The AG's news topic classification dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples, for a total of 120,000 training samples and 7,600 testing samples.
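
The totals quoted above follow directly from the per-class counts. A minimal arithmetic check, using only the figures stated in the description (the variable names are illustrative):

```python
# Figures quoted in the dataset description above.
NUM_CLASSES = 4
TRAIN_PER_CLASS = 30_000
TEST_PER_CLASS = 1_900

# Totals are the per-class counts multiplied by the number of classes.
train_total = NUM_CLASSES * TRAIN_PER_CLASS
test_total = NUM_CLASSES * TEST_PER_CLASS

print(train_total)  # 120000
print(test_total)   # 7600
```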

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('ag_news_subset', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
AG News
academictorrents.com
bittorrent
Updated Oct 16, 2018
Cite
Xiang Zhang et al., 2015 (2018). AG News [Dataset]. https://academictorrents.com/details/758bf646e3ffd39d20f9a3d9efbdb0e1eade5022
Available download formats: bittorrent (11784419)
Dataset updated
Oct 16, 2018
Dataset authored and provided by
Xiang Zhang et al., 2015
License
https://academictorrents.com/nolicensespecified
Description
496,835 categorized news articles from more than 2000 news sources, drawn from the 4 largest classes of AG’s corpus of news articles and using only the title and description fields. Each class has 30,000 training samples and 1,900 testing samples.
ag_news
huggingface.co
Cite
SetFit, ag_news [Dataset]. https://huggingface.co/datasets/SetFit/ag_news
Dataset authored and provided by
SetFit
Description
SetFit/ag_news dataset hosted on Hugging Face and contributed by the HF Datasets community
AG-news
huggingface.co
Cite
Kseniia Zolina, AG-news [Dataset]. https://huggingface.co/datasets/nixiieee/AG-news
Authors
Kseniia Zolina
Description
nixiieee/AG-news dataset hosted on Hugging Face and contributed by the HF Datasets community
sts_bert_microsoft-mpnet-base AG News Results
data.dtu.dk
txt
Updated Jul 26, 2024
Cite
Beatrix Miranda Ginn Nielsen (2024). sts_bert_microsoft-mpnet-base AG News Results [Dataset]. http://doi.org/10.11583/DTU.21268422.v1
Available download formats: txt
Unique identifier
https://doi.org/10.11583/DTU.21268422.v1
Dataset updated
Jul 26, 2024
Dataset provided by
Technical University of Denmark
Authors
Beatrix Miranda Ginn Nielsen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Raw result files used for tables and figures in Hubness Reduction Improves Sentence-BERT Semantic Spaces (DOI: coming)

For more info see: https://github.com/bemigini/hubness-reduction-sentence-bert
AG News, SogouNews and DBpedia
service.tib.eu
Updated Dec 16, 2024
Cite
(2024). AG News, SogouNews and DBpedia [Dataset]. https://service.tib.eu/ldmservice/dataset/ag-news--sogounews-and-dbpedia
Dataset updated
Dec 16, 2024
Description
The AG News, SogouNews and DBpedia datasets are used for text classification experiments.
AG’s Corpus (AG's corpus of news articlesNews)
opendatalab.com
zip
Updated Aug 23, 2022
Cite
University of Pisa (2022). AG’s Corpus (AG's corpus of news articlesNews) [Dataset]. https://opendatalab.com/OpenDataLab/AG_s_Corpus
Available download formats: zip (619245611 bytes)
Dataset updated
Aug 23, 2022
Dataset provided by
Google
University of Pisa
License
http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html
Description
Antonio Gulli’s corpus of news articles is a collection of more than 1 million news articles. The articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), xml, data compression, data streaming, and any other non-commercial activity. A subset of this corpus, AG News, consisting of the 4 largest classes, is a popular topic classification dataset.
sts_bert_distilroberta-base AG News results
data.dtu.dk
json
Updated Jul 26, 2024
Cite
Beatrix Miranda Ginn Nielsen (2024). sts_bert_distilroberta-base AG News results [Dataset]. http://doi.org/10.11583/DTU.21387282.v1
Available download formats: json
Unique identifier
https://doi.org/10.11583/DTU.21387282.v1
Dataset updated
Jul 26, 2024
Dataset provided by
Technical University of Denmark
Authors
Beatrix Miranda Ginn Nielsen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Raw result files used for tables and figures in Hubness Reduction Improves Sentence-BERT Semantic Spaces (DOI: coming)

For more info see: https://github.com/bemigini/hubness-reduction-sentence-bert
Farm Service Agency News Releases
catalog.data.gov
datadiscoverystudio.org
Updated Apr 21, 2025
Cite
Farm Service Agency, Department of Agriculture (2025). Farm Service Agency News Releases [Dataset]. https://catalog.data.gov/dataset/farm-service-agency-news-releases
Dataset updated
Apr 21, 2025
Dataset provided by
United States Department of Agriculture http://usda.gov/
Farm Service Agency https://www.fsa.usda.gov/
Description
Feed of news releases from the US Department of Agriculture, Farm Service Agency.
Pretrained sentence BERT models AG News Results
figshare.com
data.dtu.dk
txt
Updated Jul 26, 2024
Cite
Beatrix Miranda Ginn Nielsen (2024). Pretrained sentence BERT models AG News Results [Dataset]. http://doi.org/10.11583/DTU.21276648.v1
Available download formats: txt
Unique identifier
https://doi.org/10.11583/DTU.21276648.v1
Dataset updated
Jul 26, 2024
Dataset provided by
Technical University of Denmark
Authors
Beatrix Miranda Ginn Nielsen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Raw result files used for tables and figures in Hubness Reduction Improves Sentence-BERT Semantic Spaces (DOI: coming)

For more info see: https://github.com/bemigini/hubness-reduction-sentence-bert
Yizhe Zhang, Dinghan Shen (2024). Dataset: AG’s News....
service.tib.eu
Updated Nov 25, 2024
Cite
(2024). Yizhe Zhang, Dinghan Shen (2024). Dataset: AG’s News. https://doi.org/10.57702/w24z4xar [Dataset]. https://service.tib.eu/ldmservice/dataset/ag-s-news
Dataset updated
Nov 25, 2024
Description
The AG’s News dataset is a topic classification dataset containing news articles categorized into different topics.
Farm Service Agency News and Events Widget
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Cite
Farm Service Agency, Department of Agriculture (2025). Farm Service Agency News and Events Widget [Dataset]. https://catalog.data.gov/dataset/farm-service-agency-news-and-events-widget
Dataset updated
Apr 21, 2025
Dataset provided by
United States Department of Agriculture http://usda.gov/
Farm Service Agency https://www.fsa.usda.gov/
Description
This Widget provides access to all FSA National News releases. The widget may be embedded into your website or blog with code provided using either Flash or Javascript.
AG-news-softlabels-averaged
huggingface.co
Cite
Kseniia Zolina, AG-news-softlabels-averaged [Dataset]. https://huggingface.co/datasets/nixiieee/AG-news-softlabels-averaged
Authors
Kseniia Zolina
Description
Same as nixiieee/AG-news-softlabels, but the prompt was run 10 times and the predictions were then averaged for each sample. This led to better quality when training a model on this data compared to non-averaged predictions.
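
The averaging step described above can be illustrated with a short sketch. This is not the dataset author's code: the function name, the nested-list layout, and the probability values are all assumptions made up for illustration, and only two runs are shown instead of 10 to keep the example small.

```python
from statistics import fmean


def average_soft_labels(runs):
    """Average per-class probabilities across repeated runs, sample by sample.

    `runs` is a list of runs; each run is a list of samples; each sample is a
    list of class probabilities. This layout is an assumption for the sketch.
    """
    n_samples = len(runs[0])
    averaged = []
    for i in range(n_samples):
        # Group the i-th sample's probabilities by class across all runs.
        per_class = zip(*(run[i] for run in runs))
        averaged.append([fmean(values) for values in per_class])
    return averaged


# Two hypothetical runs over two samples and four classes (made-up numbers).
runs = [
    [[0.7, 0.1, 0.1, 0.1], [0.2, 0.6, 0.1, 0.1]],
    [[0.5, 0.3, 0.1, 0.1], [0.4, 0.4, 0.1, 0.1]],
]
soft_labels = average_soft_labels(runs)
print(soft_labels)
```

Averaging keeps each sample's probabilities summing to 1 as long as every individual run does.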
Ten Thousand German News Articles Dataset
kaggle.com
tblock.github.io
zip
Updated Jan 20, 2022
Cite
Timo Block (2022). Ten Thousand German News Articles Dataset [Dataset]. https://www.kaggle.com/tblock/10kgnad
Available download formats: zip (21144764 bytes)
Dataset updated
Jan 20, 2022
Authors
Timo Block
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
(see https://tblock.github.io/10kGNAD/ for the original dataset page)

This page introduces the 10k German News Articles Dataset (10kGNAD), a German topic classification dataset. The 10kGNAD is based on the One Million Posts Corpus and is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You can download the dataset here.

Why a German dataset?

English text classification datasets are common. Examples are the big AG News, the class-rich 20 Newsgroups, and the large-scale DBpedia ontology datasets for topic classification, and the commonly used IMDb and Yelp datasets for sentiment analysis. Non-English datasets, especially German datasets, are less common. There is a collection of sentiment analysis datasets assembled by the Interest Group on German Sentiment Analysis. However, to my knowledge, no German topic classification dataset is available to the public.

Due to grammatical differences between English and German, a classifier might be effective on an English dataset but less effective on a German one. German is more heavily inflected, and long compound words are much more common than in English. One would need to evaluate a classifier on multiple German datasets to get a sense of its effectiveness.

The dataset

The 10kGNAD dataset is intended to solve part of this problem as the first German topic classification dataset. It consists of 10273 German-language news articles from an Austrian online newspaper, categorized into nine topics. These articles are a previously unused part of the One Million Posts Corpus.

In the One Million Posts Corpus, each article has a topic path, for example Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise. The 10kGNAD uses the second part of the topic path, here Wirtschaft, as the class label. As a result, the dataset can be used for multi-class classification.
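
Taking the second part of a topic path as the class label could be sketched as follows. The function name is hypothetical and this is not the project's extraction script; only the example path comes from the text above.

```python
def label_from_topic_path(path: str) -> str:
    """Return the second component of a slash-separated topic path."""
    parts = path.split("/")
    if len(parts) < 2:
        raise ValueError(f"topic path has no second component: {path!r}")
    return parts[1]


# Example path quoted in the dataset description.
example = "Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise"
label = label_from_topic_path(example)
print(label)  # Wirtschaft
```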

I created and used this dataset in my thesis to train and evaluate four text classifiers on the German language. By publishing the dataset I hope to support the advancement of tools and models for the German language. Additionally, this dataset can be used as a benchmark dataset for German topic classification.

Numbers and statistics

As in most real-world datasets, the class distribution of the 10kGNAD is not balanced. The biggest class, Web, consists of 1678 articles, while the smallest class, Kultur, contains only 539 articles. However, articles from the Web class have on average the fewest words, while articles from the Kultur (culture) class have the second most words.

Splitting into train and test

I propose a stratified split of 10% for testing and the remaining articles for training. To use the dataset as a benchmark dataset, please use the train.csv and test.csv files located in the project root.
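
For readers unfamiliar with stratified splitting, here is a minimal standard-library sketch of the idea: each class contributes roughly the same fraction of its articles to the test set. It is not the project's own split script and will not reproduce the published train.csv and test.csv; the function name, the seed, and the toy label counts are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Return (train_indices, test_indices), sampling test_fraction per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels: the class names come from the text, the counts are made up.
labels = ["Web"] * 30 + ["Kultur"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 36 4
```

For benchmark comparisons, the provided train.csv and test.csv files remain the reference split.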

Code

Python scripts to extract the articles and split them into a train and a test set are available in the code directory of this project. Make sure to install the requirements. The original corpus.sqlite3 is required to extract the articles (download here (compressed) or here (uncompressed)).

License

This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please consider citing the authors of the One Million Posts Corpus if you use the dataset.
Farm Service Agency Market News Widget
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Cite
Farm Service Agency, Department of Agriculture (2025). Farm Service Agency Market News Widget [Dataset]. https://catalog.data.gov/dataset/farm-service-agency-market-news-widget
Dataset updated
Apr 21, 2025
Dataset provided by
United States Department of Agriculture http://usda.gov/
Farm Service Agency https://www.fsa.usda.gov/
Description
This Widget provides access to all FSA Daily Terminal Market Prices information releases. The widget may be embedded into your website or blog with code provided using either Flash or Javascript.
Latest news from the Ministry of Agriculture
data.gov.tw
csv, json
Updated Apr 12, 2024
Cite
Ministry of Agriculture (2024). Latest news from the Ministry of Agriculture [Dataset]. https://data.gov.tw/en/datasets/95056
Available download formats: csv, json
Dataset updated
Apr 12, 2024
Dataset authored and provided by
Ministry of Agriculture
License
https://data.gov.tw/license
Description
Provides the latest news RSS feed of the Department of Agriculture.
Dataset of news about lavoie.ag
workwithdata.com
Updated May 16, 2025
Cite
Work With Data (2025). Dataset of news about lavoie.ag [Dataset]. https://www.workwithdata.com/datasets/news?f=1&fcol0=page_name&fop0=%3D&fval0=lavoie.ag
Dataset updated
May 16, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about news. It has 2 rows and is filtered to rows whose keywords include lavoie.ag. It features 10 columns, including source, publication date, section, and news link.
CIFAR-100 and AGNews
service.tib.eu
Updated Dec 17, 2024
Cite
(2024). CIFAR-100 and AGNews [Dataset]. https://service.tib.eu/ldmservice/dataset/cifar-100-and-agnews
Dataset updated
Dec 17, 2024
Description
Two datasets used for multi-task learning, CIFAR-100 and AGNews.


Seonghyeon Lee, ag_news [Dataset]. https://huggingface.co/datasets/sh0416/ag_news
343 scholarly articles cite this dataset.
