39 datasets found
  1. bbc-news

    • huggingface.co
    • opendatalab.com
    Updated Jun 28, 2022
    Cite
    SetFit (2022). bbc-news [Dataset]. https://huggingface.co/datasets/SetFit/bbc-news
    Explore at:
    Croissant (Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.)
    Dataset updated
    Jun 28, 2022
    Dataset authored and provided by
    SetFit
    Description

    BBC News Topic Dataset

    Dataset on BBC News Topic Classification consisting of 2,225 articles published on the BBC News website during 2004-2005. Each article is labeled with one of 5 categories: business, entertainment, politics, sport or tech. Original source for this dataset:

    Derek Greene, Pádraig Cunningham, “Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering,” in Proc. 23rd International Conference on Machine Learning (ICML’06)… See the full description on the dataset page: https://huggingface.co/datasets/SetFit/bbc-news.

  2. bbc-news-summary

    • huggingface.co
    Updated Mar 1, 2023
    Cite
    Gopal Kalpande (2023). bbc-news-summary [Dataset]. https://huggingface.co/datasets/gopalkalpande/bbc-news-summary
    Explore at:
    Croissant
    Dataset updated
    Mar 1, 2023
    Authors
    Gopal Kalpande
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    About Dataset

      Context
    

    Text summarization condenses a large amount of information into a concise form by selecting the important information and discarding what is unimportant or redundant. Given the amount of textual information on the World Wide Web, text summarization is becoming very important. In extractive summarization, the exact sentences present in the document are used as the summary. The extractive… See the full description on the dataset page: https://huggingface.co/datasets/gopalkalpande/bbc-news-summary.
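
The idea of extractive summarization described above can be illustrated with a minimal sketch. It is a generic word-frequency heuristic, not the method used to build this dataset; the function names `split_sentences` and `extractive_summary` are hypothetical, and the sentence splitter is deliberately naive.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text: str, max_sentences: int = 2) -> str:
    """Return up to max_sentences original sentences, kept in document order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document, lowercased.
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the best, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Because the summary is assembled only from sentences copied out of the input, every sentence in the output appears verbatim in the source text, which is the defining property of the extractive approach.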

  3. bbc_news_alltime

    • huggingface.co
    Updated Mar 7, 2024
    + more versions
    Cite
    RealTimeData (2024). bbc_news_alltime [Dataset]. https://huggingface.co/datasets/RealTimeData/bbc_news_alltime
    Dataset updated
    Mar 7, 2024
    Dataset authored and provided by
    RealTimeData
    Description

    RealTimeData Monthly Collection - BBC News

    This dataset contains all BBC News articles, grouped by the month in which they were created, from 2017 to the present. To access the articles for a specific month, simply run the following: ds = datasets.load_dataset('RealTimeData/bbc_news_alltime', '2020-02')

    This will give you all BBC news articles that were created in 2020-02.
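
To load several months at once, the 'YYYY-MM' config names can be generated rather than typed by hand. The sketch below assumes that every month in the requested range exists as a config of the dataset; `month_configs` and `load_months` are hypothetical helper names, and `load_months` needs the `datasets` package and network access.

```python
def month_configs(start: str, end: str) -> list[str]:
    """Return 'YYYY-MM' config names from start to end, inclusive."""
    start_year, start_month = (int(part) for part in start.split("-"))
    end_year, end_month = (int(part) for part in end.split("-"))
    configs = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        configs.append(f"{year:04d}-{month:02d}")
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return configs


def load_months(start: str, end: str) -> dict:
    """Load one config per month (assumes each month is published as a config)."""
    import datasets  # imported lazily so month_configs works without the package

    return {
        name: datasets.load_dataset("RealTimeData/bbc_news_alltime", name)
        for name in month_configs(start, end)
    }


print(month_configs("2019-11", "2020-02"))
# prints ['2019-11', '2019-12', '2020-01', '2020-02']
```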

      Want to crawl the data on your own?
    

    Please head to LatestEval for the crawler scripts.

      Credit… See the full description on the dataset page: https://huggingface.co/datasets/RealTimeData/bbc_news_alltime.
    
  4. bbc_latest

    • huggingface.co
    Updated Sep 8, 2024
    Cite
    RealTimeData (2024). bbc_latest [Dataset]. https://huggingface.co/datasets/RealTimeData/bbc_latest
    Explore at:
    Croissant
    Dataset updated
    Sep 8, 2024
    Dataset authored and provided by
    RealTimeData
    Description

    Latest BBC News

    You can always access the latest BBC News articles via this dataset. We update the dataset weekly, every Sunday, so it always provides the latest BBC News articles from the last week. The current dataset on the main branch contains the latest BBC News articles submitted from 2024-09-02 to 2024-09-09. The data collection was conducted on 2024-09-09. Use the dataset via: ds = datasets.load_dataset('RealTimeData/bbc_latest')

      Previous versions
    

    You… See the full description on the dataset page: https://huggingface.co/datasets/RealTimeData/bbc_latest.

  5. bbc-news-100

    • huggingface.co
    Updated Aug 23, 2024
    Cite
    David Adamczyk (2024). bbc-news-100 [Dataset]. https://huggingface.co/datasets/davidadamczyk/bbc-news-100
    Explore at:
    Croissant
    Dataset updated
    Aug 23, 2024
    Authors
    David Adamczyk
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    davidadamczyk/bbc-news-100 dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. bbc-news

    • huggingface.co
    Updated Sep 19, 2024
    Cite
    DefenceLab (2024). bbc-news [Dataset]. https://huggingface.co/datasets/DefenceLab/bbc-news
    Explore at:
    Croissant
    Dataset updated
    Sep 19, 2024
    Dataset authored and provided by
    DefenceLab
    Description

    DefenceLab/bbc-news dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. fineweb-bbc-news-embeddings

    • huggingface.co
    Updated Apr 27, 2012
    Cite
    jeosol (2012). fineweb-bbc-news-embeddings [Dataset]. https://huggingface.co/datasets/jeosol/fineweb-bbc-news-embeddings
    Dataset updated
    Apr 27, 2012
    Authors
    jeosol
    Description

    jeosol/fineweb-bbc-news-embeddings dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. bbc-data-v2

    • huggingface.co
    Updated Feb 6, 2025
    + more versions
    Cite
    OOD Research (2025). bbc-data-v2 [Dataset]. https://huggingface.co/datasets/ood-research/bbc-data-v2
    Explore at:
    Croissant
    Dataset updated
    Feb 6, 2025
    Dataset authored and provided by
    OOD Research
    Description

    ood-research/bbc-data-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. bbc-news-embeddings

    • huggingface.co
    Updated May 28, 2025
    Cite
    Yongbin Choi (2025). bbc-news-embeddings [Dataset]. https://huggingface.co/datasets/whybe-choi/bbc-news-embeddings
    Dataset updated
    May 28, 2025
    Authors
    Yongbin Choi
    Description

    whybe-choi/bbc-news-embeddings dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. BBC-IDC-article

    • huggingface.co
    + more versions
    Cite
    Jan Soukeník, BBC-IDC-article [Dataset]. https://huggingface.co/datasets/Dzeniks/BBC-IDC-article
    Explore at:
    Croissant
    Authors
    Jan Soukeník
    Description

    Dzeniks/BBC-IDC-article dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. bbc-news-fkl

    • huggingface.co
    Updated Aug 16, 2024
    + more versions
    Cite
    YuanLu (2024). bbc-news-fkl [Dataset]. https://huggingface.co/datasets/0x-YuAN/bbc-news-fkl
    Explore at:
    Croissant
    Dataset updated
    Aug 16, 2024
    Authors
    YuanLu
    Description

    0x-YuAN/bbc-news-fkl dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. bbc-test

    • huggingface.co
    Updated Apr 4, 2024
    Cite
    Tan Thanh Nguyen (2024). bbc-test [Dataset]. https://huggingface.co/datasets/TanThanhNg/bbc-test
    Explore at:
    Croissant
    Dataset updated
    Apr 4, 2024
    Authors
    Tan Thanh Nguyen
    Description

    TanThanhNg/bbc-test dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. c4-bbc-news

    • huggingface.co
    Updated Jan 6, 2025
    Cite
    Louis Maddox (2025). c4-bbc-news [Dataset]. https://huggingface.co/datasets/permutans/c4-bbc-news
    Explore at:
    Croissant
    Dataset updated
    Jan 6, 2025
    Authors
    Louis Maddox
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    Dataset Card for BBC News from C4

    This dataset provides a filtered subset of BBC News articles from the realnewslike subset of the C4 dataset, containing approximately 77k articles from BBC News domains.
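
A domain-based filter of this kind might look like the sketch below. The dataset card does not spell out its filtering rule here, so the host list in `BBC_DOMAINS`, the helper names `is_bbc_url` and `filter_bbc`, and the assumption that each record carries a `url` field are all illustrative.

```python
from urllib.parse import urlparse

# Assumed list of BBC hosts; the actual filter used for the dataset may differ.
BBC_DOMAINS = ("bbc.co.uk", "bbc.com")


def is_bbc_url(url: str) -> bool:
    """True if the URL's host is a BBC domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BBC_DOMAINS)


def filter_bbc(records):
    """Yield records whose 'url' field (assumed to exist) points at a BBC host."""
    for record in records:
        if is_bbc_url(record.get("url", "")):
            yield record
```

Matching on the parsed hostname rather than on a substring of the URL avoids false positives such as `notbbc.co.uk` or a BBC address appearing only in a query string.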

      Dataset Details

      Dataset Sources

    Repository: https://huggingface.co/datasets/permutans/c4-bbc-news
    Source Dataset: allenai/c4 (realnewslike subset)
    Paper: https://arxiv.org/abs/1910.10683 (C4 paper)

      Uses

      Direct Use

    Suitable for text… See the full description on the dataset page: https://huggingface.co/datasets/permutans/c4-bbc-news.

  14. arrowhead-bbc

    • huggingface.co
    Updated Sep 1, 2024
    + more versions
    Cite
    Pranjal Jaiswal (2024). arrowhead-bbc [Dataset]. https://huggingface.co/datasets/pranjaljaiswal/arrowhead-bbc
    Dataset updated
    Sep 1, 2024
    Authors
    Pranjal Jaiswal
    Description

    pranjaljaiswal/arrowhead-bbc dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. bbc-media-show

    • huggingface.co
    Updated Aug 2, 2024
    Cite
    Mark Needham (2024). bbc-media-show [Dataset]. https://huggingface.co/datasets/markhneedham/bbc-media-show
    Explore at:
    Croissant
    Dataset updated
    Aug 2, 2024
    Authors
    Mark Needham
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    markhneedham/bbc-media-show dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. bbc-finetune-data

    • huggingface.co
    Updated Jun 9, 2020
    Cite
    OOD Research (2020). bbc-finetune-data [Dataset]. https://huggingface.co/datasets/ood-research/bbc-finetune-data
    Explore at:
    Croissant
    Dataset updated
    Jun 9, 2020
    Dataset authored and provided by
    OOD Research
    Description

    ood-research/bbc-finetune-data dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. bbc-news-llama4-maverick-summary

    • huggingface.co
    Cite
    Youzhi Yu, bbc-news-llama4-maverick-summary [Dataset]. https://huggingface.co/datasets/PursuitOfDataScience/bbc-news-llama4-maverick-summary
    Authors
    Youzhi Yu
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    BBC News Summary Dataset (Llama-4-Maverick-17B-128E-Instruct-FP8)

      Dataset Description
    

    This dataset contains high-quality summaries for BBC news articles from the CC-MAIN-2013-20 web crawl, generated using the Llama-4-Maverick-17B-128E-Instruct-FP8 model. Each summary provides a concise, accurate overview of BBC news stories while preserving journalistic integrity and essential information.

      Dataset Features
    

    High-quality summaries: Generated using… See the full description on the dataset page: https://huggingface.co/datasets/PursuitOfDataScience/bbc-news-llama4-maverick-summary.

  18. xlsum

    • huggingface.co
    • opendatalab.com
    Updated Dec 18, 2021
    Cite
    GEM benchmark (2021). xlsum [Dataset]. https://huggingface.co/datasets/GEM/xlsum
    Dataset updated
    Dec 18, 2021
    Dataset authored and provided by
    GEM benchmark
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    We present XL-Sum, a comprehensive and diverse dataset comprising 1.35 million professionally annotated article-summary pairs from BBC, extracted using a set of carefully designed heuristics. The dataset covers 45 languages ranging from low to high-resource, for many of which no public dataset is currently available. XL-Sum is highly abstractive, concise, and of high quality, as indicated by human and intrinsic evaluation.

  19. BBC-Sinhala

    • huggingface.co
    Updated Jun 17, 2023
    Cite
    Hamza Ziyard (2023). BBC-Sinhala [Dataset]. https://huggingface.co/datasets/Hamza-Ziyard/BBC-Sinhala
    Explore at:
    Croissant
    Dataset updated
    Jun 17, 2023
    Authors
    Hamza Ziyard
    Description

    Hamza-Ziyard/BBC-Sinhala dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. chinese-BBC-dataset

    • huggingface.co
    Updated Oct 18, 2024
    Cite
    min (2024). chinese-BBC-dataset [Dataset]. https://huggingface.co/datasets/minsea/chinese-BBC-dataset
    Explore at:
    Croissant
    Dataset updated
    Oct 18, 2024
    Authors
    min
    Description

    minsea/chinese-BBC-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

Cite
SetFit (2022). bbc-news [Dataset]. https://huggingface.co/datasets/SetFit/bbc-news

bbc-news

SetFit/bbc-news

BBC News Topic Dataset

Explore at:
4 scholarly articles cite this dataset (View in Google Scholar)
Croissant
Dataset updated
Jun 28, 2022
Dataset authored and provided by
SetFit
