BBC News Topic Dataset
Dataset on BBC News Topic Classification consisting of 2,225 articles published on the BBC News website during 2004-2005. Each article is labeled with one of 5 categories: business, entertainment, politics, sport or tech. Original source for this dataset:
Derek Greene, Pádraig Cunningham, “Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering,” in Proc. 23rd International Conference on Machine learning (ICML’06)… See the full description on the dataset page: https://huggingface.co/datasets/SetFit/bbc-news.
https://brightdata.com/license
Unlock the full potential of BBC broadcast data with our comprehensive dataset featuring transcripts, program schedules, headlines, topics, and multimedia resources. This all-in-one dataset is designed to empower media analysts, researchers, journalists, and advocacy groups with actionable insights for media analysis, transparency studies, and editorial assessments.
Dataset Features
Transcripts: Access detailed broadcast transcripts, including headlines, content, author details, and publication dates. Perfect for analyzing media framing, topic frequency, and news narratives across various programs.
Program Schedules: Explore program schedules with accurate timing, show names, and related metadata to track news coverage patterns and identify trends.
Topics and Keywords: Analyze categorized topics and keywords to understand content diversity, editorial focus, and recurring themes in news broadcasts.
Multimedia Content: Gain access to videos, images, and related articles linked to each broadcast for a holistic understanding of the news presentation.
Metadata: Includes critical data points like publication dates, last updates, content URLs, and unique IDs for easier referencing and cross-analysis.
Customizable Subsets for Specific Needs
Our BBC dataset is fully customizable to match your research or analytical goals. Focus on transcripts for in-depth media framing analysis, extract multimedia for content visualization studies, or dive into program schedules for broadcast trend analysis. Tailor the dataset to ensure it aligns with your objectives for maximum efficiency and relevance.
Popular Use Cases
Media Analysis: Evaluate news framing, content diversity, and topic coverage to assess editorial direction and media focus.
Transparency Studies: Analyze journalistic standards, corrections, and retractions to assess media integrity and accountability.
Audience Engagement: Identify recurring topics and trends in news content to understand audience preferences and behavior.
Market Analysis: Track media coverage of key industries, companies, and topics to analyze public sentiment and industry relevance.
Journalistic Integrity: Use transcripts and metadata to evaluate adherence to reporting practices, fairness, and transparency in news coverage.
Research and Scholarly Studies: Leverage transcripts and multimedia to support academic studies in journalism, media criticism, and political discourse analysis.
Whether you are evaluating transparency, conducting media criticism, or tracking broadcast trends, our BBC dataset provides you with the tools and insights needed for in-depth research and strategic analysis. Customize your access to focus on the most relevant data points for your unique needs.
This dataset was derived from a document-categorization dataset that consists of 2,225 documents from the BBC News website, corresponding to stories in five topical areas from 2004-2005. The source dataset was used in D. Greene and P. Cunningham, "Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering", Proc. ICML 2006. All rights, including copyright, in the content of the original articles are owned by the BBC. More at http://mlg.ucd.ie/datasets/bbc.html
https://choosealicense.com/licenses/cc0-1.0/
About Dataset
Context
Text summarization condenses a large amount of information into a concise form by selecting the important information and discarding what is unimportant or redundant. Given the amount of textual information on the World Wide Web, text summarization is becoming very important. Extractive summarization is the approach in which exact sentences from the document are used as the summary. The extractive… See the full description on the dataset page: https://huggingface.co/datasets/gopalkalpande/bbc-news-summary.
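To make the extractive idea concrete, here is a minimal sketch in Python. It is not the method used to build this dataset: the frequency-based scoring, the naive sentence splitter, and the function names `split_sentences` and `extractive_summary` are all assumptions chosen for illustration. The only property it demonstrates is that the summary is made of sentences copied verbatim from the document.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text: str, k: int = 2) -> list[str]:
    sentences = split_sentences(text)
    # Word frequencies over the whole document (lowercased).
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Pick the k best-scoring sentences, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = (
        "The bank raised rates. Markets fell after the bank raised rates. "
        "Fans cheered. The bank will meet again."
    )
    for line in extractive_summary(doc, k=2):
        print(line)
```

Every returned sentence is an exact sentence of the input, which is what distinguishes extractive from abstractive summarization.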
This dataset was created by Shine K George
Released under Data files © Original Authors
https://crawlfeeds.com/privacy_policy
This dataset contains more than 1 million news articles, with all the data points present on each article page extracted. The BBC News articles were first collected in 2021 and cover all the categories present on the BBC site.
This news dataset is ideal for text classification, finding popular categories, NLP and other research purposes.
Dataset is available in JSON format.
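As an illustration of the "finding popular categories" use case, the following sketch counts articles per category from JSON. The record layout is an assumption: the field names `title` and `category`, and the use of a single JSON array, are hypothetical and are not documented in this listing.

```python
import json
from collections import Counter

# Hypothetical records; the real field names and file layout may differ.
SAMPLE_JSON = """
[
  {"title": "Example headline A", "category": "politics"},
  {"title": "Example headline B", "category": "tech"},
  {"title": "Example headline C", "category": "politics"}
]
"""


def category_counts(raw_json: str, field: str = "category") -> Counter:
    records = json.loads(raw_json)
    # Skip records that lack the field instead of failing.
    return Counter(r[field] for r in records if field in r)


counts = category_counts(SAMPLE_JSON)
print(counts.most_common(1))
```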
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Bbc is a dataset for object detection tasks - it contains Po annotations for 335 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
https://crawlfeeds.com/privacy_policy
Get access to a comprehensive and structured dataset of BBC News articles, freshly crawled and compiled in February 2023. This collection includes 1 million records from one of the world’s most trusted news organizations — perfect for training NLP models, sentiment analysis, and trend detection across global topics.
💾 Format: CSV (available in ZIP archive)
📢 Status: Published and available for immediate access
Train language models to summarize or categorize news
Detect media bias and compare narrative framing
Conduct research in journalism, politics, and public sentiment
Enrich news aggregation platforms with clean metadata
Analyze content distribution across categories (e.g. health, politics, tech)
This dataset ensures reliable and high-quality information sourced from a globally respected outlet. The format is optimized for quick ingestion into your pipelines — with clean text, timestamps, image links, and more.
Need a filtered dataset or want this refreshed for a later date? We offer on-demand news scraping as well.
👉 Request access or sample now
Latest BBC News
You can always access the latest BBC News articles via this dataset. We update the dataset weekly, every Sunday, so it always provides the BBC News articles from the last week. The current dataset on the main branch contains the latest BBC News articles submitted from 2024-09-02 to 2024-09-09. The data collection was conducted on 2024-09-09. Use the dataset via: ds = datasets.load_dataset('RealTimeData/bbc_latest')
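The snippet below wraps the `load_dataset` call quoted above and adds a small helper that reproduces the stated collection window. The helper `weekly_window` and the wrapper `load_latest` are hypothetical names, and the assumption that each release covers the seven days ending on the collection day is inferred only from the dates given here.

```python
from datetime import date, timedelta


def weekly_window(collection_day: date) -> tuple[date, date]:
    # Assumption: each release covers the 7 days ending on the collection day.
    return collection_day - timedelta(days=7), collection_day


def load_latest():
    # Imported lazily so the date helper works without the library installed.
    import datasets  # Hugging Face `datasets` package

    return datasets.load_dataset("RealTimeData/bbc_latest")


start, end = weekly_window(date(2024, 9, 9))
print(start.isoformat(), end.isoformat())
```

Calling `load_latest()` requires network access and the `datasets` package; the date helper does not.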
Previous versions
You… See the full description on the dataset page: https://huggingface.co/datasets/RealTimeData/bbc_latest.
RealTimeData Monthly Collection - BBC News Images
This dataset contains the head images of all BBC News articles, collected monthly from 2017 to the present. To access the articles from a specific month, simply run the following: ds = datasets.load_dataset('RealTimeData/bbc_images_alltime', '2020-02')
This will give you all BBC news head images that were created in 2020-02.
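To iterate over several months, one option is to generate the month strings first and load each one in turn. This is a sketch under the assumption that every configuration name follows the 'YYYY-MM' pattern of the example above; `month_configs` and `load_month` are hypothetical helper names, and whether a given month actually exists in the repository is not checked.

```python
def month_configs(start: str, end: str) -> list[str]:
    """Return 'YYYY-MM' strings from start to end, inclusive."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    out = []
    while (year, month) <= (end_year, end_month):
        out.append(f"{year:04d}-{month:02d}")
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return out


def load_month(config: str):
    # Imported lazily so the helper above works without the library installed.
    import datasets  # Hugging Face `datasets` package

    return datasets.load_dataset("RealTimeData/bbc_images_alltime", config)


configs = month_configs("2019-11", "2020-02")
print(configs)
```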
Want to crawl the data on your own?
Please head to LatestEval for the crawler scripts.… See the full description on the dataset page: https://huggingface.co/datasets/RealTimeData/bbc_images_alltime.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BBC Dataset for video shot detection. Paper: https://dl.acm.org/doi/10.1145/2733373.2806316
Videos are BBC educational TV series Planet Earth.
This dataset was created by Chalika Mihiran
New annotated datasets linking tweets and articles, including Tweets – PAP News Dataset, Tweets – BBC News Dataset, Cascades – PAP News Dataset, and Cascades – BBC News Dataset.
https://creativecommons.org/publicdomain/zero/1.0/
This is the BBC news dataset (cleaned version) which I have uploaded after my previous dataset post. The original dataset downloaded from the UCI Machine Learning Repository was unclean. The dataset was cleaned by extracting the keywords from the description column into the noisy 'keys' column data.
The BBC news dataset consists of the following data:
1. # - News ID.
2. descr - description/detail of the news provided.
3. tags - the tags/keywords related to the corresponding news in the 'descr' label.
This dataset was created by Jagjit Singh
This dataset is designed for analysing news trends, performing sentiment analysis, and studying the impact of specific events over time. It offers valuable insights for those interested in media coverage, news propagation, and shifts in public interest across various topics. The dataset is particularly useful for tasks involving natural language processing (NLP), multiclass classification, and text pre-processing.
The dataset, named bbc_news.csv, contains 35,860 rows and 5 columns. It is typically provided in a CSV file format. The dataset includes 33,889 unique descriptions, 32,335 unique links, 33,124 unique titles, and 33,081 unique GUIDs.
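Row and unique-value counts like these can be reproduced with the standard `csv` module. The sketch below runs on a tiny inline sample instead of the real file, and the column names (`title`, `pubDate`, `guid`, `link`, `description`) are assumptions: the listing says there are 5 columns but does not name them.

```python
import csv
import io

# Tiny stand-in for bbc_news.csv; the column names are hypothetical.
SAMPLE_CSV = """title,pubDate,guid,link,description
Headline A,2022-03-07,g1,https://example.org/a,First story
Headline B,2022-03-08,g2,https://example.org/b,Second story
Headline A,2022-03-09,g3,https://example.org/a,First story
"""


def unique_counts(csv_text: str) -> tuple[int, dict[str, int]]:
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    # Number of distinct values in each column.
    return len(rows), {c: len({r[c] for r in rows}) for c in columns}


n_rows, uniques = unique_counts(SAMPLE_CSV)
print(n_rows, uniques)
```

For the real file, the same function could be applied to the text read from `bbc_news.csv`.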
This dataset is ideally suited for:
* Analysing patterns and shifts in news reporting.
* Conducting sentiment analysis on news article content.
* Investigating the influence of particular events over time.
* Developing and testing models for multiclass classification.
* Tasks requiring text pre-processing for machine learning applications.
* Research into media coverage and public engagement with news.
The data primarily spans from 07 March 2022 to 03 July 2024. However, the full collection includes a wider range of publication dates, with some articles dating back to 2013. The distribution of articles by date range is as follows:
* 08/30/2013 - 03/16/2014: 1 article
* 06/16/2017 - 12/31/2017: 1 article
* 12/31/2017 - 07/17/2018: 1 article
* 08/17/2019 - 03/02/2020: 1 article
* 09/16/2020 - 04/02/2021: 2 articles
* 10/17/2021 - 05/03/2022: 2,477 articles
* 05/03/2022 - 11/17/2022: 8,049 articles
* 11/17/2022 - 06/03/2023: 7,334 articles
* 06/03/2023 - 12/18/2023: 8,933 articles
* 12/18/2023 - 07/04/2024: 9,061 articles

The dataset covers news articles on a global scale.
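As a quick consistency check, the per-range counts listed above can be summed and compared with the stated 35,860 rows. The numbers are copied from this description; the variable names are illustrative only.

```python
# Article counts per date range, copied from the distribution above.
ARTICLES_PER_RANGE = {
    "08/30/2013 - 03/16/2014": 1,
    "06/16/2017 - 12/31/2017": 1,
    "12/31/2017 - 07/17/2018": 1,
    "08/17/2019 - 03/02/2020": 1,
    "09/16/2020 - 04/02/2021": 2,
    "10/17/2021 - 05/03/2022": 2_477,
    "05/03/2022 - 11/17/2022": 8_049,
    "11/17/2022 - 06/03/2023": 7_334,
    "06/03/2023 - 12/18/2023": 8_933,
    "12/18/2023 - 07/04/2024": 9_061,
}

STATED_ROWS = 35_860  # row count given for bbc_news.csv

total = sum(ARTICLES_PER_RANGE.values())
print(total, total == STATED_ROWS)
```

The bins add up to the stated row count, so every row falls into exactly one of the listed ranges.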
CC-BY
This dataset is particularly beneficial for:
* Researchers: For academic studies on media, public opinion, and linguistic analysis.
* Data Scientists: For developing predictive models, text analytics, and machine learning applications.
* Journalists: For investigative reporting, trend analysis, and understanding news propagation.
* Individuals interested in natural language processing (NLP) and text-based data projects.
Original Data Source: BBC News Articles
This dataset was created by Sweta raj sinha
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
BBC news sample dataset has 60 records with 12 columns. Crawl Feeds team used in-house tools to extract data from BBC.
Politics
news dataset,bbc news dataset,news data
66
Free
Foreign, Commonwealth & Development Office (FCDO) ODA data for the BBC World Service for financial years between 2016 to 2017 and 2023 to 2024 (up to December 2023).
To be consistent with the data we have provided to the International Aid Transparency Initiative, the complete data set includes data from previous financial years.
Find out about all ODA spend data for the FCDO.
The whole of government ODA data is on:
Short BBC Pose contains five one-hour-long videos with sign language signers each with different sleeve length (in contrast to the BBC pose and Extended BBC Pose, which only contain signers with moderately long sleeves). Each of the five videos has 200 test frames (which have been manually annotated with joint locations), amounting to 1,000 test frames in total. Test frames were selected by the authors to contain a diverse range of poses.