100+ datasets found
  1. bbc-news

    • huggingface.co
    • opendatalab.com
    Updated Jun 28, 2022
    Cite
    SetFit (2022). bbc-news [Dataset]. https://huggingface.co/datasets/SetFit/bbc-news
    Dataset updated
    Jun 28, 2022
    Dataset authored and provided by
    SetFit
    Description

    BBC News Topic Dataset

    A BBC News topic classification dataset consisting of 2,225 articles published on the BBC News website during 2004-2005. Each article is labeled with one of 5 categories: business, entertainment, politics, sport or tech. Original source for this dataset:

    Derek Greene, Pádraig Cunningham, “Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering,” in Proc. 23rd International Conference on Machine learning (ICML’06)… See the full description on the dataset page: https://huggingface.co/datasets/SetFit/bbc-news.

  2. BBC Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Jul 11, 2025
    Cite
    Bright Data, BBC Datasets [Dataset]. https://brightdata.com/products/datasets/bbc
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Unlock the full potential of BBC broadcast data with our comprehensive dataset featuring transcripts, program schedules, headlines, topics, and multimedia resources. This all-in-one dataset is designed to empower media analysts, researchers, journalists, and advocacy groups with actionable insights for media analysis, transparency studies, and editorial assessments.

    Dataset Features

    • Transcripts: Access detailed broadcast transcripts, including headlines, content, author details, and publication dates. Perfect for analyzing media framing, topic frequency, and news narratives across various programs.
    • Program Schedules: Explore program schedules with accurate timing, show names, and related metadata to track news coverage patterns and identify trends.
    • Topics and Keywords: Analyze categorized topics and keywords to understand content diversity, editorial focus, and recurring themes in news broadcasts.
    • Multimedia Content: Gain access to videos, images, and related articles linked to each broadcast for a holistic understanding of the news presentation.
    • Metadata: Includes critical data points like publication dates, last updates, content URLs, and unique IDs for easier referencing and cross-analysis.

    Customizable Subsets for Specific Needs

    Our BBC dataset is fully customizable to match your research or analytical goals. Focus on transcripts for in-depth media framing analysis, extract multimedia for content visualization studies, or dive into program schedules for broadcast trend analysis. Tailor the dataset to ensure it aligns with your objectives for maximum efficiency and relevance.

    Popular Use Cases

    • Media Analysis: Evaluate news framing, content diversity, and topic coverage to assess editorial direction and media focus.
    • Transparency Studies: Analyze journalistic standards, corrections, and retractions to assess media integrity and accountability.
    • Audience Engagement: Identify recurring topics and trends in news content to understand audience preferences and behavior.
    • Market Analysis: Track media coverage of key industries, companies, and topics to analyze public sentiment and industry relevance.
    • Journalistic Integrity: Use transcripts and metadata to evaluate adherence to reporting practices, fairness, and transparency in news coverage.
    • Research and Scholarly Studies: Leverage transcripts and multimedia to support academic studies in journalism, media criticism, and political discourse analysis.

    Whether you are evaluating transparency, conducting media criticism, or tracking broadcast trends, our BBC dataset provides you with the tools and insights needed for in-depth research and strategic analysis. Customize your access to focus on the most relevant data points for your unique needs.

  3. BBC News Summary Dataset

    • paperswithcode.com
    Cite
    Anushka Gupta; Diksha Chugh; Anjum; Rahul Katarya, BBC News Summary Dataset [Dataset]. https://paperswithcode.com/dataset/bbc-news-summary
    Explore at:
    Authors
    Anushka Gupta; Diksha Chugh; Anjum; Rahul Katarya
    Description

    This dataset was created from a categorization dataset that consists of 2,225 documents from the BBC News website, corresponding to stories in five topical areas from 2004-2005, used in the paper by D. Greene and P. Cunningham, "Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering", Proc. ICML 2006. All rights, including copyright, in the content of the original articles are owned by the BBC. More at http://mlg.ucd.ie/datasets/bbc.html

  4. bbc-news-summary

    • huggingface.co
    Updated Mar 1, 2023
    Cite
    Gopal Kalpande (2023). bbc-news-summary [Dataset]. https://huggingface.co/datasets/gopalkalpande/bbc-news-summary
    Dataset updated
    Mar 1, 2023
    Authors
    Gopal Kalpande
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    About Dataset

      Context
    

    Text summarization condenses a large amount of information into a concise form by selecting important information and discarding unimportant and redundant information. With the amount of textual information available on the World Wide Web, text summarization is becoming very important. Extractive summarization uses the exact sentences present in the document as the summary. The extractive… See the full description on the dataset page: https://huggingface.co/datasets/gopalkalpande/bbc-news-summary.

  5. BBC news dataset

    • kaggle.com
    Updated Nov 24, 2018
    Cite
    Shine K George (2018). BBC news dataset [Dataset]. https://www.kaggle.com/datasets/shineucc/bbc-news-dataset
    Dataset updated
    Nov 24, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shine K George
    Description

    Dataset

    This dataset was created by Shine K George

    Released under Data files © Original Authors

    Contents

  6. BBC Latest News Dataset 2021

    • crawlfeeds.com
    json, zip
    Updated Apr 6, 2024
    Cite
    Crawl Feeds (2024). BBC Latest News Dataset 2021 [Dataset]. https://crawlfeeds.com/datasets/bbc-latest-news-dataset-2021
    Available download formats: zip, json
    Dataset updated
    Apr 6, 2024
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    This dataset contains more than 1 million news articles, with all the data points present on each article page extracted. The BBC news articles were first collected in 2021 and cover all the categories present on the BBC site.

    This news dataset is ideal for text classification, finding popular categories, NLP and other research purposes.

    The dataset is available in JSON format.

  7. Bbc Dataset

    • universe.roboflow.com
    zip
    Updated Jul 25, 2022
    Cite
    vicky (2022). Bbc Dataset [Dataset]. https://universe.roboflow.com/vicky-8rg4e/bbc-fqrxn
    Available download formats: zip
    Dataset updated
    Jul 25, 2022
    Dataset authored and provided by
    vicky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Po Bounding Boxes
    Description

    Bbc

    ## Overview
    
    Bbc is a dataset for object detection tasks - it contains Po annotations for 335 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  8. BBC News Dataset – February 2023 Edition

    • crawlfeeds.com
    csv, zip
    Updated Jun 14, 2025
    Cite
    Crawl Feeds (2025). BBC News Dataset – February 2023 Edition [Dataset]. https://crawlfeeds.com/datasets/bbc-news-dataset-feb-2023
    Available download formats: zip, csv
    Dataset updated
    Jun 14, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Get access to a comprehensive and structured dataset of BBC News articles, freshly crawled and compiled in February 2023. This collection includes 1 million records from one of the world’s most trusted news organizations — perfect for training NLP models, sentiment analysis, and trend detection across global topics.

    💾 Format: CSV (available in ZIP archive)

    📢 Status: Published and available for immediate access

    Use Cases

    • Train language models to summarize or categorize news

    • Detect media bias and compare narrative framing

    • Conduct research in journalism, politics, and public sentiment

    • Enrich news aggregation platforms with clean metadata

    • Analyze content distribution across categories (e.g. health, politics, tech)

    This dataset ensures reliable and high-quality information sourced from a globally respected outlet. The format is optimized for quick ingestion into your pipelines — with clean text, timestamps, image links, and more.

    Need a filtered dataset or want this refreshed for a later date? We offer on-demand news scraping as well.

    👉 Request access or sample now

  9. bbc_latest

    • huggingface.co
    Updated Sep 8, 2024
    Cite
    RealTimeData (2024). bbc_latest [Dataset]. https://huggingface.co/datasets/RealTimeData/bbc_latest
    Dataset updated
    Sep 8, 2024
    Dataset authored and provided by
    RealTimeData
    Description

    Latest BBC News

    You can always access the latest BBC News articles via this dataset. We update the dataset weekly, every Sunday, so it always provides the BBC News articles from the last week. The current dataset on the main branch contains the BBC News articles submitted from 2024-09-02 to 2024-09-09. The data collection was conducted on 2024-09-09. Use the dataset via: ds = datasets.load_dataset('RealTimeData/bbc_latest')

      Previous versions
    

    You… See the full description on the dataset page: https://huggingface.co/datasets/RealTimeData/bbc_latest.

  10. bbc_images_alltime

    • huggingface.co
    Updated Mar 7, 2024
    + more versions
    Cite
    RealTimeData (2024). bbc_images_alltime [Dataset]. https://huggingface.co/datasets/RealTimeData/bbc_images_alltime
    Explore at:
    Dataset updated
    Mar 7, 2024
    Dataset authored and provided by
    RealTimeData
    Description

    RealTimeData Monthly Collection - BBC News Images

    This dataset contains all news article head images from BBC News that were created every month from 2017 to the present. To access a specific month, simply run the following: ds = datasets.load_dataset('RealTimeData/bbc_images_alltime', '2020-02')

    This will give you all BBC news head images that were created in 2020-02.
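
    As a minimal sketch of the monthly access pattern described above: the configuration name appears to be the month written as YYYY-MM. The helper names `month_config` and `load_bbc_images` are hypothetical and not part of the dataset or the `datasets` library. The actual download requires the `datasets` package and network access, so it is kept in a separate function.

```python
from datetime import date


def month_config(year: int, month: int) -> str:
    """Format a year and month as a 'YYYY-MM' configuration name."""
    date(year, month, 1)  # raises ValueError for an invalid month
    return f"{year:04d}-{month:02d}"


def load_bbc_images(year: int, month: int):
    """Load one month of head images (needs `datasets` and network access)."""
    import datasets  # imported lazily so month_config works without it

    return datasets.load_dataset(
        "RealTimeData/bbc_images_alltime", month_config(year, month)
    )


if __name__ == "__main__":
    print(month_config(2020, 2))
```

    Calling `load_bbc_images(2020, 2)` would then be equivalent to the one-line call quoted above, assuming that month exists as a configuration.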

      Want to crawl the data on your own?
    

    Please head to LatestEval for the crawler scripts.… See the full description on the dataset page: https://huggingface.co/datasets/RealTimeData/bbc_images_alltime.

  11. BBC Dataset

    • zenodo.org
    mp4, txt
    Updated Feb 13, 2025
    Cite
    Lorenzo Baraldi; Costantino Grana; Rita Cucchiara (2025). BBC Dataset [Dataset]. http://doi.org/10.1145/2733373.2806316
    Available download formats: mp4, txt
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lorenzo Baraldi; Costantino Grana; Rita Cucchiara
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 13, 2015
    Description

    BBC Dataset for video shot detection. Paper: https://dl.acm.org/doi/10.1145/2733373.2806316

    The videos are from the BBC educational TV series Planet Earth.

  12. BBC News Classification

    • kaggle.com
    Updated Dec 6, 2021
    Cite
    Chalika Mihiran (2021). BBC News Classification [Dataset]. https://www.kaggle.com/datasets/chalikamihiran/bbc-news-classification
    Dataset updated
    Dec 6, 2021
    Dataset provided by
    Kaggle
    Authors
    Chalika Mihiran
    Description

    Dataset

    This dataset was created by Chalika Mihiran

    Contents

  13. Tweets – BBC News Dataset

    • service.tib.eu
    Updated Dec 3, 2024
    + more versions
    Cite
    (2024). Tweets – BBC News Dataset [Dataset]. https://service.tib.eu/ldmservice/dataset/tweets---bbc-news-dataset
    Explore at:
    Dataset updated
    Dec 3, 2024
    Description

    New annotated datasets linking tweets and articles, including Tweets – PAP News Dataset, Tweets – BBC News Dataset, Cascades – PAP News Dataset, and Cascades – BBC News Dataset.

  14. BBC-News Dataset

    • kaggle.com
    Updated Aug 11, 2020
    Cite
    Sahil Kirpekar (2020). BBC-News Dataset [Dataset]. https://www.kaggle.com/sahilkirpekar/bbcnews-dataset/tasks
    Dataset updated
    Aug 11, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sahil Kirpekar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Hello data people ! 😄

    This is the BBC news dataset (cleaned version) which I have uploaded after my previous dataset post. The original dataset downloaded from the UCI Machine Learning Repository was unclean. The dataset was cleaned by extracting the keywords from the description column into the noisy 'keys' column data.

    About the Dataset 🔢

    The BBC news dataset consists of the following data:

    1. # - News ID.
    2. descr - description/detail of the news provided.
    3. tags - the tags/keywords related to the corresponding news in the 'descr' label.

  15. bbc dataset

    • kaggle.com
    Updated Jul 2, 2019
    Cite
    Jagjit Singh (2019). bbc dataset [Dataset]. https://www.kaggle.com/sainijagjit/bbc-dataset/discussion
    Dataset updated
    Jul 2, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jagjit Singh
    Description

    Dataset

    This dataset was created by Jagjit Singh

    Contents

  16. Daily BBC News Text Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Daily BBC News Text Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/7545737c-bc3b-48b7-a332-247a2a33cc17
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Entertainment & Media Consumption
    Description

    This dataset is designed for analysing news trends, performing sentiment analysis, and studying the impact of specific events over time. It offers valuable insights for those interested in media coverage, news propagation, and shifts in public interest across various topics. The dataset is particularly useful for tasks involving natural language processing (NLP), multiclass classification, and text pre-processing.

    Columns

    • title: The headline or title of the news article.
    • pubDate: The date and time when the news article was published.
    • guid: A globally unique identifier for the news article, typically presented as a URL.
    • link: The direct URL link to access the full news article online.
    • description: A concise summary or brief overview of the news article content.

    Distribution

    The dataset, named bbc_news.csv, contains 35,860 rows and 5 columns. It is typically provided in a CSV file format. The dataset includes 33,889 unique descriptions, 32,335 unique links, 33,124 unique titles, and 33,081 unique GUIDs.
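
    The row and unique-value counts quoted above could be reproduced along these lines. This is a sketch only: it uses the five column names listed under Columns, but the sample rows and the `pubDate` format are made up for illustration, and `column_stats` is a hypothetical helper name. For the real file, the in-memory sample would be replaced by `open("bbc_news.csv", newline="", encoding="utf-8")`.

```python
import csv
import io

# Made-up sample rows using the documented column names.
SAMPLE = """title,pubDate,guid,link,description
Headline A,"Mon, 07 Mar 2022 08:01:56 GMT",https://example.org/a,https://example.org/a,Summary A
Headline B,"Mon, 07 Mar 2022 09:10:00 GMT",https://example.org/b,https://example.org/b,Summary B
Headline A,"Tue, 08 Mar 2022 10:00:00 GMT",https://example.org/a,https://example.org/a,Summary A
"""


def column_stats(handle):
    """Return (row_count, {column: unique_value_count}) for a CSV file object."""
    reader = csv.DictReader(handle)
    seen = {name: set() for name in reader.fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name in seen:
            seen[name].add(row[name])
    return rows, {name: len(values) for name, values in seen.items()}


rows, unique = column_stats(io.StringIO(SAMPLE))
print(rows, unique)
```

    A row count larger than the unique-value counts, as reported for this dataset, indicates repeated titles, links, or GUIDs that may need deduplication before modelling.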

    Usage

    This dataset is ideally suited for:

    • Analysing patterns and shifts in news reporting.
    • Conducting sentiment analysis on news article content.
    • Investigating the influence of particular events over time.
    • Developing and testing models for multiclass classification.
    • Tasks requiring text pre-processing for machine learning applications.
    • Research into media coverage and public engagement with news.

    Coverage

    The data primarily spans from 07 March 2022 to 03 July 2024. However, the full collection includes a wider range of publication dates, with some articles dating back to 2013. The distribution of articles by date range is as follows:

    • 08/30/2013 - 03/16/2014: 1 article
    • 06/16/2017 - 12/31/2017: 1 article
    • 12/31/2017 - 07/17/2018: 1 article
    • 08/17/2019 - 03/02/2020: 1 article
    • 09/16/2020 - 04/02/2021: 2 articles
    • 10/17/2021 - 05/03/2022: 2,477 articles
    • 05/03/2022 - 11/17/2022: 8,049 articles
    • 11/17/2022 - 06/03/2023: 7,334 articles
    • 06/03/2023 - 12/18/2023: 8,933 articles
    • 12/18/2023 - 07/04/2024: 9,061 articles

    The dataset covers news articles on a global scale.
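
    As a quick consistency check, the per-range article counts listed above can be summed and compared with the 35,860 rows stated under Distribution. The variable names are illustrative only; the numbers are copied from the description.

```python
# Article counts per date range, copied from the Coverage description.
bins = {
    "08/30/2013 - 03/16/2014": 1,
    "06/16/2017 - 12/31/2017": 1,
    "12/31/2017 - 07/17/2018": 1,
    "08/17/2019 - 03/02/2020": 1,
    "09/16/2020 - 04/02/2021": 2,
    "10/17/2021 - 05/03/2022": 2477,
    "05/03/2022 - 11/17/2022": 8049,
    "11/17/2022 - 06/03/2023": 7334,
    "06/03/2023 - 12/18/2023": 8933,
    "12/18/2023 - 07/04/2024": 9061,
}

STATED_ROWS = 35860  # row count given under Distribution

total = sum(bins.values())
print(total, total == STATED_ROWS)
```

    The bins add up to the stated row count, so the distribution accounts for every row.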

    License

    CC-BY

    Who Can Use It

    This dataset is particularly beneficial for:

    • Researchers: For academic studies on media, public opinion, and linguistic analysis.
    • Data Scientists: For developing predictive models, text analytics, and machine learning applications.
    • Journalists: For investigative reporting, trend analysis, and understanding news propagation.
    • Individuals interested in natural language processing (NLP) and text-based data projects.

    Dataset Name Suggestions

    • BBC News Articles Collection
    • BBC News Headlines & Summaries Dataset
    • Daily BBC News Text Data
    • BBC News Article Archive

    Attributes

    Original Data Source: BBC News Articles

  17. BBC DATASET

    • kaggle.com
    Updated Jun 23, 2022
    Cite
    Sweta raj sinha (2022). BBC DATASET [Dataset]. https://www.kaggle.com/datasets/swetarajsinha/bbc-dataset
    Dataset updated
    Jun 23, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sweta raj sinha
    Description

    Dataset

    This dataset was created by Sweta raj sinha

    Contents

  18. BBC news sample dataset

    • dataandsons.com
    csv, zip
    Updated Feb 24, 2023
    Cite
    crawl feeds (2023). BBC news sample dataset [Dataset]. https://www.dataandsons.com/categories/politics/bbc-news-sample-dataset
    Available download formats: csv, zip
    Dataset updated
    Feb 24, 2023
    Dataset provided by
    Authors
    crawl feeds
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    Feb 16, 2023 - Feb 24, 2023
    Description

    About this Dataset

    BBC news sample dataset has 60 records with 12 columns. Crawl Feeds team used in-house tools to extract data from BBC.

    Category

    Politics

    Keywords

    news dataset,bbc news dataset,news data

    Row Count

    66

    Price

    Free

  19. Official Development Assistance (ODA): BBC World Service

    • gov.uk
    • s3.amazonaws.com
    Updated Feb 7, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Foreign, Commonwealth & Development Office (2024). Official Development Assistance (ODA): BBC World Service [Dataset]. https://www.gov.uk/government/publications/official-development-assistance-oda-bbc-world-service
    Explore at:
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Foreign, Commonwealth & Development Office
    Description

    Foreign, Commonwealth & Development Office (FCDO) ODA data for the BBC World Service for financial years between 2016 to 2017 and 2023 to 2024 (up to December 2023).

    To be consistent with the data we have provided to the International Aid Transparency Initiative, the complete data set includes data from previous financial years.

    Find out about all ODA spend data for the FCDO.

    The whole of government ODA data is on:

  20. Short BBC Pose Dataset

    • paperswithcode.com
    Updated Mar 24, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2021). Short BBC Pose Dataset [Dataset]. https://paperswithcode.com/dataset/short-bbc-pose
    Explore at:
    Dataset updated
    Mar 24, 2021
    Description

    Short BBC Pose contains five one-hour-long videos with sign language signers each with different sleeve length (in contrast to the BBC pose and Extended BBC Pose, which only contain signers with moderately long sleeves). Each of the five videos has 200 test frames (which have been manually annotated with joint locations), amounting to 1,000 test frames in total. Test frames were selected by the authors to contain a diverse range of poses.

Cite
SetFit (2022). bbc-news [Dataset]. https://huggingface.co/datasets/SetFit/bbc-news

bbc-news

SetFit/bbc-news

BBC News Topic Dataset

Explore at:
4 scholarly articles cite this dataset (View in Google Scholar)
