BBC News Topic Dataset
Dataset on BBC News Topic Classification consisting of 2,225 articles published on the BBC News website during 2004-2005. Each article is labeled with one of 5 categories: business, entertainment, politics, sport or tech. Original source for this dataset:
Derek Greene, Pádraig Cunningham, “Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering,” in Proc. 23rd International Conference on Machine Learning (ICML’06)… See the full description on the dataset page: https://huggingface.co/datasets/SetFit/bbc-news.
https://choosealicense.com/licenses/cc0-1.0/
About Dataset
Context
Text summarization condenses a large amount of information into a concise form by selecting the important information and discarding what is unimportant or redundant. With the amount of textual information available on the World Wide Web, text summarization is becoming very important. Extractive summarization uses exact sentences from the document as the summary. The extractive… See the full description on the dataset page: https://huggingface.co/datasets/gopalkalpande/bbc-news-summary.
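To make the idea of extractive summarization concrete, here is a minimal sketch that scores each sentence by the average document-wide frequency of its words and returns the top sentences verbatim, in their original order. The scoring rule, the naive sentence splitter, and the function names are illustrative assumptions, not the method used to build this dataset.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extractive_summary(text: str, n_sentences: int = 2) -> list[str]:
    """Return the n highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document (lowercased).
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    chosen = sorted(ranked[:n_sentences])  # restore document order
    return [sentences[i] for i in chosen]
```

Because the summary is built only from sentences that already appear in the input, nothing is paraphrased; a practical system would typically add stop-word removal and a better sentence tokenizer.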
RealTimeData Monthly Collection - BBC News
This dataset contains all news articles from BBC News, grouped by the month in which they were created, from 2017 to the present. To access articles from a specific month, simply run the following: ds = datasets.load_dataset('RealTimeData/bbc_news_alltime', '2020-02')
This will give you all BBC news articles that were created in 2020-02.
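If several months are needed, the per-month call can be wrapped in a loop. The sketch below is one possible way to do that; the helper names month_configs and load_months are hypothetical, and it assumes every month in the requested range exists as a configuration named 'YYYY-MM', as in the example above.

```python
def month_configs(start: str, end: str) -> list[str]:
    """List 'YYYY-MM' config names from start to end, inclusive."""
    y, m = (int(p) for p in start.split("-"))
    end_y, end_m = (int(p) for p in end.split("-"))
    names = []
    while (y, m) <= (end_y, end_m):
        names.append(f"{y:04d}-{m:02d}")
        # Roll over to January of the next year after December.
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return names


def load_months(start: str, end: str) -> dict:
    """Load one month per config (needs network access and `datasets`)."""
    import datasets  # imported lazily so month_configs works without it

    return {
        name: datasets.load_dataset("RealTimeData/bbc_news_alltime", name)
        for name in month_configs(start, end)
    }
```

For example, load_months('2019-11', '2020-02') would issue four load_dataset calls, one per month, and return them keyed by month.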
Want to crawl the data on your own?
Please head to LatestEval for the crawler scripts.
Credit… See the full description on the dataset page: https://huggingface.co/datasets/RealTimeData/bbc_news_alltime.
Latest BBC News
You can always access the latest BBC News articles via this dataset. We update the dataset weekly, every Sunday, so it always provides the latest BBC News articles from the last week. The current dataset on the main branch contains the latest BBC News articles submitted from 2024-09-02 to 2024-09-09. The data collection was conducted on 2024-09-09. Use the dataset via: ds = datasets.load_dataset('RealTimeData/bbc_latest')
Previous versions
You… See the full description on the dataset page: https://huggingface.co/datasets/RealTimeData/bbc_latest.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
davidadamczyk/bbc-news-100 dataset hosted on Hugging Face and contributed by the HF Datasets community
DefenceLab/bbc-news dataset hosted on Hugging Face and contributed by the HF Datasets community
jeosol/fineweb-bbc-news-embeddings dataset hosted on Hugging Face and contributed by the HF Datasets community
ood-research/bbc-data-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community
whybe-choi/bbc-news-embeddings dataset hosted on Hugging Face and contributed by the HF Datasets community
Dzeniks/BBC-IDC-article dataset hosted on Hugging Face and contributed by the HF Datasets community
0x-YuAN/bbc-news-fkl dataset hosted on Hugging Face and contributed by the HF Datasets community
TanThanhNg/bbc-test dataset hosted on Hugging Face and contributed by the HF Datasets community
https://choosealicense.com/licenses/odc-by/
Dataset Card for BBC News from C4
This dataset provides a filtered subset of BBC News articles from the realnewslike subset of the C4 dataset, containing approximately 77k articles from BBC News domains.
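A domain filter of this kind can be sketched as follows. This is an illustration only: the dataset card does not list the exact domains or the filtering code, so the host list and the function names here are assumptions, as is the reliance on a 'url' field in each record.

```python
from urllib.parse import urlparse

# Assumption: these hosts stand in for "BBC News domains"; the dataset card
# does not list the exact domains that were used.
BBC_HOSTS = ("bbc.co.uk", "bbc.com")


def is_bbc_url(url: str) -> bool:
    """True if the URL's host is a BBC host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in BBC_HOSTS)


def filter_bbc(records):
    """Yield records (dicts) whose 'url' field points at a BBC host."""
    for rec in records:
        if is_bbc_url(rec.get("url", "")):
            yield rec
```

Matching on the parsed hostname rather than a substring avoids false positives such as a different site whose name merely ends in "bbc.com".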
Dataset Details
Dataset Sources
Repository: https://huggingface.co/datasets/permutans/c4-bbc-news Source Dataset: allenai/c4 (realnewslike subset) Paper: https://arxiv.org/abs/1910.10683 (C4 paper)
Uses
Direct Use
Suitable for text… See the full description on the dataset page: https://huggingface.co/datasets/permutans/c4-bbc-news.
pranjaljaiswal/arrowhead-bbc dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
markhneedham/bbc-media-show dataset hosted on Hugging Face and contributed by the HF Datasets community
ood-research/bbc-finetune-data dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
BBC News Summary Dataset (Llama-4-Maverick-17B-128E-Instruct-FP8)
Dataset Description
This dataset contains high-quality summaries for BBC news articles from the CC-MAIN-2013-20 web crawl, generated using the Llama-4-Maverick-17B-128E-Instruct-FP8 model. Each summary provides a concise, accurate overview of BBC news stories while preserving journalistic integrity and essential information.
Dataset Features
High-quality summaries: Generated using… See the full description on the dataset page: https://huggingface.co/datasets/PursuitOfDataScience/bbc-news-llama4-maverick-summary.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
We present XL-Sum, a comprehensive and diverse dataset comprising 1.35 million professionally annotated article-summary pairs from BBC, extracted using a set of carefully designed heuristics. The dataset covers 45 languages ranging from low to high-resource, for many of which no public dataset is currently available. XL-Sum is highly abstractive, concise, and of high quality, as indicated by human and intrinsic evaluation.
Hamza-Ziyard/BBC-Sinhala dataset hosted on Hugging Face and contributed by the HF Datasets community
minsea/chinese-BBC-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community