100+ datasets found
  1. Fox News dataset is for analyzing media trends and narratives

    • crawlfeeds.com
    csv, zip
    Updated May 19, 2025
    Cite
    Crawl Feeds (2025). Fox News dataset is for analyzing media trends and narratives [Dataset]. https://crawlfeeds.com/datasets/fox-news-dataset
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    The Fox News Dataset is a comprehensive collection of over 1 million news articles, offering an unparalleled resource for analyzing media narratives, public discourse, and political trends. Covering articles up to the year 2023, this dataset is a treasure trove for researchers, analysts, and businesses interested in gaining deeper insights into the topics and trends covered by Fox News.

    Key Features of the Fox News Dataset

    • Extensive Coverage: Contains more than 1 million articles spanning various topics and events up to 2023.
    • Research-Ready: Perfect for text classification, natural language processing (NLP), and other research purposes.
    • Format: Provided in CSV format for seamless integration into analytical and research tools.

    Why Use This Dataset?

    This large dataset is ideal for:

    • Text Classification: Develop machine learning models to classify and categorize news content.
    • Natural Language Processing (NLP): Conduct sentiment analysis, keyword extraction, or topic modeling.
    • Media and Political Research: Analyze media narratives, public opinion, and political trends reflected in Fox News articles.
    • Trend Analysis: Identify shifts in public discourse and media focus over time.

    Explore More News Datasets

    Discover additional resources for your research needs by visiting our news dataset collection. These datasets are tailored to support diverse analytical applications, including sentiment analysis and trend modeling.

    The Fox News Dataset is a must-have for anyone interested in exploring large-scale media data and leveraging it for advanced analysis. Ready to dive into this wealth of information? Download the dataset now in CSV format and start uncovering the stories behind the headlines.

  2. News Dataset Vietnameses

    • kaggle.com
    zip
    Updated May 29, 2022
    Cite
    The Pham Duc (2022). News Dataset Vietnameses [Dataset]. https://www.kaggle.com/datasets/phamtheds/news-dataset-vietnameses
    Explore at:
    Available download formats: zip (323381125 bytes)
    Dataset updated
    May 29, 2022
    Authors
    The Pham Duc
    Area covered
    Vietnam
    Description

    The dataset was crawled from Báo Lao Động (the Lao Động newspaper) on 19/05/2022. It has not gone through any data processing step. It can serve many natural language processing (NLP) tasks, such as:

    1. Text summarization
    2. Classifying the category of an article
    3. Predicting the tags of an article

    Attribute information:

    1 - URL: URL of the article.
    2 - Title: Name (headline) of the article.
    3 - Summary: Summary of the article's content.
    4 - Contents: Detailed content of the article.
    5 - Data: Date the article was written (published).
    6 - Author(s): Author of the article.
    7 - Category: Category of the article.
    8 - Tags: Tags related to the article.
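    As a rough illustration of the attribute list above, the sketch below models one record as a Python dataclass and parses a CSV row into it. The file layout (CSV with a header row), the comma-separated tag format, and the placeholder row are assumptions made for illustration; the dataset page only names the eight attributes.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    """One record, using the eight attribute names listed on the dataset page."""
    url: str
    title: str
    summary: str
    contents: str
    date: str  # the source names this attribute "Data"
    authors: str
    category: str
    tags: List[str]


def parse_row(row: dict) -> Article:
    # Assumption: tags are stored as one comma-separated string.
    tags = [t.strip() for t in row["Tags"].split(",") if t.strip()]
    return Article(
        url=row["URL"],
        title=row["Title"],
        summary=row["Summary"],
        contents=row["Contents"],
        date=row["Data"],
        authors=row["Author(s)"],
        category=row["Category"],
        tags=tags,
    )


# Placeholder row, invented for illustration only (not taken from the dataset).
SAMPLE_CSV = (
    "URL,Title,Summary,Contents,Data,Author(s),Category,Tags\n"
    "https://example.com/a,Example title,Short summary,Full text,19/05/2022,"
    'Example Author,Example category,"tag one, tag two"\n'
)

articles = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(articles[0].tags)
```

    Since the page says the data is unprocessed, a real loader would likely need extra handling for missing or malformed values.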

  3. all-the-news-2-1-Component-one

    • huggingface.co
    Updated Jul 2, 2019
    Cite
    Rafael Arias Calles (2019). all-the-news-2-1-Component-one [Dataset]. https://huggingface.co/datasets/rjac/all-the-news-2-1-Component-one
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 2, 2019
    Authors
    Rafael Arias Calles
    Description

    2.7 million news articles and essays

    Dataset Description

    2.7 million news articles and essays from 27 American publications. Includes date, title, publication, article text, publication name, year, month, and URL (for some). Articles mostly span from 2016 to early 2020.

    Type: CSV
    Size: 3.4 GB compressed, 8.8 GB uncompressed
    Created by: Andrew Thompson
    Date added: 4/3/2020
    Date modified: 4/3/2020
    Source: Component one Datasets 2.7 Millions
    Date of Download and processed:…

    See the full description on the dataset page: https://huggingface.co/datasets/rjac/all-the-news-2-1-Component-one.
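    Because the uncompressed CSV is reported at 8.8 GB, loading it whole may not be practical. A minimal sketch of streaming the file row by row and counting articles per publication follows. The column name `publication` is an assumption based on the field list in the description, and the in-memory sample is a made-up stand-in for the real file.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_publication(handle: TextIO, column: str = "publication") -> Counter:
    """Stream a CSV file and count rows per publication without loading it all."""
    counts: Counter = Counter()
    for row in csv.DictReader(handle):
        counts[row[column]] += 1
    return counts


# Made-up stand-in for the real file; with an actual download, pass
# open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "date,title,publication\n"
    "2016-01-01,Title A,Outlet X\n"
    "2017-05-02,Title B,Outlet Y\n"
    "2018-09-03,Title C,Outlet X\n"
)
counts = count_by_publication(sample)
print(counts.most_common())
```

    Very long article bodies can exceed the `csv` module's default field size, in which case `csv.field_size_limit` may need to be raised before reading.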

  4. MIT AI news dataset

    • kaggle.com
    zip
    Updated Aug 21, 2025
    Cite
    Yousef Fawzi (2025). MIT AI news dataset [Dataset]. https://www.kaggle.com/datasets/losif01/mit-ai-news-dataset
    Explore at:
    Available download formats: zip (808350 bytes)
    Dataset updated
    Aug 21, 2025
    Authors
    Yousef Fawzi
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    📄 Dataset Description

    This dataset contains articles scraped from the Massachusetts Institute of Technology (MIT) News website, specifically focusing on topics related to Artificial Intelligence, Machine Learning, Robotics, and Emerging Technologies.

    The data was collected from the MIT News topic page:
    👉 https://news.mit.edu/topic/artificial-intelligence2

    Each entry includes:

    • Title of the article
    • Author(s)
    • Publication date
    • Summary (dek)
    • Full article body text
    • URL to the original article
    • Link to related research paper (e.g., Nature, Science) when available

    The dataset spans multiple research domains, including:

    • AI for drug discovery & healthcare
    • Protein language models
    • Sustainable AI and eco-driving
    • Robotics and embodied intelligence
    • Chemistry and materials science
    • Climate and clean energy

    This dataset is ideal for:

    • Natural Language Processing (NLP) tasks (summarization, topic modeling, sentiment analysis)
    • Trend analysis in AI and scientific research
    • Text classification and information retrieval
    • Educational projects and AI literacy
    • Knowledge graph construction of AI research

    ⚠️ Important Notes

    • All content is copyright of MIT News and is shared under non-commercial, educational use only.
    • This dataset was collected respectfully, with delays between requests, in accordance with MIT’s robots.txt and ethical web scraping practices.
    • The full text of articles is included to enable research, but users are encouraged to cite original sources and visit the MIT News website for the latest updates.

    📁 Columns

    Column              Description
    title               Article headline
    author              Author(s) of the article
    publication_date    Human-readable publication date
    datetime            ISO-formatted publication timestamp
    summary             Article summary (lead paragraph)
    body                Full article text
    paper_link          URL to the related research paper (e.g., Nature)
    url                 Direct link to the MIT News article
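    To show how the `datetime` and `paper_link` columns might be used together, here is a small sketch that parses the ISO timestamp and keeps only rows that link to a research paper. The sample rows are invented placeholders, and the exact timestamp format in the real file is an assumption.

```python
import csv
import io
from datetime import datetime

# Invented placeholder rows using the column names from the table above.
SAMPLE = (
    "title,author,publication_date,datetime,summary,body,paper_link,url\n"
    "Example A,Author One,\"January 15, 2025\",2025-01-15T09:30:00-05:00,"
    "Summary A,Body A,https://example.org/paper,https://example.org/a\n"
    "Example B,Author Two,\"March 2, 2025\",2025-03-02T14:00:00-05:00,"
    "Summary B,Body B,,https://example.org/b\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Parse the ISO-formatted timestamp into a timezone-aware datetime object.
for row in rows:
    row["parsed"] = datetime.fromisoformat(row["datetime"])

# Keep only articles that reference a research paper, newest first.
with_paper = sorted(
    (r for r in rows if r["paper_link"]),
    key=lambda r: r["parsed"],
    reverse=True,
)
print([r["title"] for r in with_paper])
```

    Filtering on an empty `paper_link` assumes missing links are stored as empty strings; a real file might use another marker.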

    🔗 Source

    🙌 Inspiration

    Use this dataset to:

    • Track how AI is being applied across scientific disciplines
    • Build a news aggregator for AI research
    • Train a model to predict research trends
    • Create a search engine for MIT’s AI breakthroughs

    License

    This dataset is shared under Kaggle’s Terms of Service for non-commercial, educational, and research purposes.
    The original content remains the property of MIT News and should be properly attributed.

  5. fake-news-detection-dataset-English

    • huggingface.co
    Cite
    Erfan Moosavi Monazzah, fake-news-detection-dataset-English [Dataset]. https://huggingface.co/datasets/ErfanMoosaviMonazzah/fake-news-detection-dataset-English
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Erfan Moosavi Monazzah
    License

    https://choosealicense.com/licenses/openrail/

    Description

    This is a cleaned and split version of this dataset: https://www.kaggle.com/datasets/sadikaljarif/fake-news-detection-dataset-english

    Labels:

    • Fake News: 0
    • Real News: 1

    You can find the cleaning script at: https://github.com/ErfanMoosaviMonazzah/Fake-News-Detection
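    The label encoding above can be captured in a small lookup, which is also handy for checking class balance before training. This is only a sketch: the helper name `label_distribution` and the sample labels are hypothetical and not part of the dataset or its cleaning script.

```python
from collections import Counter
from typing import Dict, Iterable

# Label encoding stated on the dataset page: 0 = fake news, 1 = real news.
LABEL_NAMES: Dict[int, str] = {0: "fake", 1: "real"}


def label_distribution(labels: Iterable[int]) -> Dict[str, float]:
    """Return the share of each class among the given integer labels."""
    counts = Counter(LABEL_NAMES[label] for label in labels)
    total = sum(counts.values())
    return {name: counts[name] / total for name in LABEL_NAMES.values()}


# Made-up labels for illustration only.
distribution = label_distribution([0, 1, 1, 0, 1])
print(distribution)
```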

  6. news dataset

    • kaggle.com
    zip
    Updated Apr 9, 2024
    Cite
    parsa abdolmaleki (2024). news dataset [Dataset]. https://www.kaggle.com/datasets/parsaabdolmaleki/news-dataset
    Explore at:
    Available download formats: zip (38970756 bytes)
    Dataset updated
    Apr 9, 2024
    Authors
    parsa abdolmaleki
    Description

    This dataset combines 150,000 news headlines and descriptions drawn from 4 datasets. Preliminary preprocessing includes removing stop words and reducing words to their root form (lemmatization), among other steps.

  7. Spanish Fake News Dataset

    • produccioncientifica.ucm.es
    • zenodo.org
    Updated 2025
    Cite
    Tretiakov, Arsenii; D'Antonio Maceiras, Sergio; Martín, Alejandro (2025). Spanish Fake News Dataset [Dataset]. https://produccioncientifica.ucm.es/documentos/685699246364e456d3a66786
    Explore at:
    Dataset updated
    2025
    Authors
    Tretiakov, Arsenii; D'Antonio Maceiras, Sergio; Martín, Alejandro
    Description

    Spanish Fake News Dataset

    This dataset contains a structured and annotated collection of false news items in Spanish (Castilian), gathered and processed for academic research on misinformation.

    Dataset Scope

    The dataset represents most of the recorded false news messages and their variations up to 01.02.2021.

    Content Description

    The dataset includes samples of false information in various formats:

    • News articles and headlines
    • Tweets and Facebook/Instagram/Telegram posts
    • YouTube video captions
    • WhatsApp text and voice message transcripts
    • Transcribed video/audio fragments with false claims
    • Fake government documents
    • Captions from photos and memes
    • Text extracted from images using OCR

    Only Spanish (Castilian) texts were used, excluding regional variants (e.g., Catalan, Basque, Galician) for consistency.

    Sources

    The data was collected from the following verified fact-checking initiatives:

    • Maldito Bulo
    • Newtral
    • AFP Factual

    Fact-checkers from these organizations provide detailed articles identifying and explaining falsehoods, often including:

    • General context of the event
    • Quotes or links to false claims
    • Analysis and explanation of why the claims are false
    • Verified information or corrections

    Collection Method

    The dataset was built using both manual extraction (e.g., identifying and quoting false statements) and automated parsing:

    • MyNews service: an archive of Spanish mass media
    • Custom scripts: for parsing and extracting structured data
    • OCR tools: for extracting text from images (e.g., memes and screenshots)

    Fields Description

    Column Name       Description
    Topic             The thematic category of the news item (e.g., Politics, Health, COVID-19, Crime). Normalized and translated to English.
    Link source       URL to the original news piece, fact-check report, or source of the claim. Invalid links were removed.
    Media             The platform or outlet where the false claim appeared (e.g., Facebook, YouTube, WhatsApp). Normalized for consistent spelling and language.
    Date              Publication or verification date of the news item, in YYYY-MM-DD format.
    Author            (Optional) Author of the news or platform source, if available. May be empty.
    Headlines         Title or summary of the news item or article containing the false information.
    Fake statement    Quoted false claim or misinformation as cited in the verification article.
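    As one possible use of the `Date` and `Media` fields described above, the sketch below parses the YYYY-MM-DD dates and counts items per platform up to a cutoff date. The records are invented placeholders and the function name `media_counts_until` is hypothetical; only the field names and the date format come from the description.

```python
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable


def parse_date(value: str) -> date:
    """Parse the YYYY-MM-DD format stated for the Date column."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def media_counts_until(records: Iterable[Dict[str, str]], cutoff: date) -> Counter:
    """Count items per Media value, keeping only those dated on or before cutoff."""
    return Counter(
        record["Media"]
        for record in records
        if parse_date(record["Date"]) <= cutoff
    )


# Invented placeholder records for illustration only.
records = [
    {"Media": "Facebook", "Date": "2020-03-14"},
    {"Media": "WhatsApp", "Date": "2020-11-02"},
    {"Media": "Facebook", "Date": "2021-01-20"},
]
media_counts = media_counts_until(records, date(2020, 12, 31))
print(media_counts)
```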

    ⚠️ Notes

    • The dataset was preprocessed to remove duplicates, invalid links, and non-textual clutter.
    • Field values were normalized to support multilingual and cross-platform analysis.
    • Only Castilian Spanish was retained for consistency and clarity.

    📚 License & Use

    This dataset is intended for non-commercial academic and research purposes. Please cite the original fact-checking organizations and this dataset if used in publications or analysis.

  8. News category dataset from huffpost

    • crawlfeeds.com
    csv, zip
    Updated Nov 11, 2024
    Cite
    Crawl Feeds (2024). News category dataset from huffpost [Dataset]. https://crawlfeeds.com/datasets/news-category-dataset-from-huffpost
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Nov 11, 2024
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    The HuffPost news dataset contains more than 500K news articles from 2015 to 2022.

  9. CT-FAN: A Multilingual dataset for Fake News Detection

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 23, 2022
    Cite
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel (2022). CT-FAN: A Multilingual dataset for Fake News Detection [Dataset]. http://doi.org/10.5281/zenodo.6555293
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel
    Description

    By downloading the data, you agree with the terms & conditions mentioned below:

    Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.

    Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.

    We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.

    Citation

    Please cite our work as

    @InProceedings{clef-checkthat:2022:task3,
    author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
    title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
    year = {2022},
    booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
    series = {CLEF~'2022},
    address = {Bologna, Italy},}
    
    @article{shahi2021overview,
     title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
     author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
     journal={Working Notes of CLEF},
     year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.

    Task 3: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem: given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1,264 English-language articles with their respective labels. Our definitions for the categories are as follows:

    • False - The main claim made in an article is untrue.

    • Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    • True - This rating indicates that the primary elements of the main claim are demonstrably true.

    • Other- An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.

    Cross-Lingual Task (German)

    Along with the multi-class task for English, we have introduced a task for a low-resource language. We will provide the test data in German. The idea of the task is to use the English data and the concept of transfer learning to build a classification model for German.

    Input Data

    The data will be provided with the columns ID, title, text, rating, and domain. The columns are described as follows:

    • ID - Unique identifier of the news article
    • Title - Title of the news article
    • text - Text mentioned inside the news article
    • our rating - Class of the news article: false, partially false, true, or other

    Output data format

    • public_id - Unique identifier of the news article
    • predicted_rating - Predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true
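    A submission file in the shape of the sample above could be produced along these lines. This is a hedged sketch rather than the organizers' tooling: the function name `write_predictions` is hypothetical, the label strings are taken from the "our rating" description, and the writer emits no space after the comma, which may or may not matter to the official evaluation.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

# The four classes named in the task description.
VALID_RATINGS = {"false", "partially false", "true", "other"}


def write_predictions(rows: Iterable[Tuple[int, str]], out: TextIO) -> None:
    """Write (public_id, predicted_rating) pairs, rejecting unknown labels."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["public_id", "predicted_rating"])
    for public_id, rating in rows:
        if rating not in VALID_RATINGS:
            raise ValueError(f"unknown rating: {rating!r}")
        writer.writerow([public_id, rating])


buffer = io.StringIO()
write_predictions([(1, "false"), (2, "true")], buffer)
print(buffer.getvalue())
```

    Validating labels before writing catches typos such as "partly false" early, before a file is submitted.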

    IMPORTANT!

    1. We have used data from 2010 to 2022, and the fake news content covers a mix of topics such as elections and COVID-19.

    Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498

    Related Work

    • Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf
    • G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
    • Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
    • Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.
    • Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.
    • Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.

  10. Global News Index and Extracted Features Repository (v.1.2.0)

    • databank.illinois.edu
    Updated Mar 5, 2025
    + more versions
    Cite
    (2025). Global News Index and Extracted Features Repository (v.1.2.0) [Dataset]. http://doi.org/10.13012/B2IDB-5649852_V5
    Explore at:
    Dataset updated
    Mar 5, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence.

    Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically-savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful for users as they refine their queries.

    Additional Resources:

    • Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, please fill out the Archer Access Request Form so we can determine if you are eligible for access or not.
    • Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the Archer User Feedback Form.
    • The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this form to subscribe to it.

    Citation Guidelines:

    1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer) please use the following citation: Cline Center for Advanced Social Research. 2023. Global News Index and Extracted Features Repository [codebook], v1.2.0. Champaign, IL: University of Illinois. June. XX. doi:10.13012/B2IDB-5649852_V5
    2) To cite data from the Global News Index (accessed via Archer or otherwise) please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2023. Global News Index and Extracted Features Repository [database], v1.2.0. Champaign, IL: University of Illinois. Jun. XX. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V5

    *NOTE: V4 is suppressed and V5 is replacing V4 with updated ‘Archer’ documents.

  11. Covid-19 and vaccine news dataset

    • ieee-dataport.org
    Updated Oct 27, 2021
    + more versions
    Cite
    Rajat Thakur (2021). Covid-19 and vaccine news dataset [Dataset]. https://ieee-dataport.org/documents/covid-19-and-vaccine-news-dataset
    Explore at:
    Dataset updated
    Oct 27, 2021
    Authors
    Rajat Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains world news related to Covid-19 and vaccines, along with each article's available metadata.

  12. gdelt-news-headlines

    • huggingface.co
    Cite
    Online Language Modelling, gdelt-news-headlines [Dataset]. https://huggingface.co/datasets/olm/gdelt-news-headlines
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset authored and provided by
    Online Language Modelling
    Description

    olm/gdelt-news-headlines dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. Fake News detection

    • kaggle.com
    zip
    Updated Dec 7, 2017
    Cite
    jruvika (2017). Fake News detection [Dataset]. https://www.kaggle.com/datasets/jruvika/fake-news-detection
    Explore at:
    Available download formats: zip (5123662 bytes)
    Dataset updated
    Dec 7, 2017
    Authors
    jruvika
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by jruvika

    Released under Database: Open Database, Contents: © Original Authors

    Contents

  14. news-dataset

    • huggingface.co
    Updated Jan 1, 2025
    Cite
    Robert Albert Allen McNarland (2025). news-dataset [Dataset]. https://huggingface.co/datasets/R3troR0b/news-dataset
    Explore at:
    Dataset updated
    Jan 1, 2025
    Authors
    Robert Albert Allen McNarland
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for World_News

    A collection of news articles from around the world. The script ensures no duplicate articles are added.

    Dataset Details

    Dataset Description

    The articles are drawn from these sources:

    • Reuters News Agency
    • BBC World News
    • Al Jazeera
    • Le Monde
    • South China Morning Post
    • The Hindu
    • Deutsche Welle
    • The Guardian
    • NPR
    • TASS News Agency, Russia
    • The Sydney Morning Herald

    Curated by: McNarland Software Consultants Inc.

    Funded by… See the full description on the dataset page: https://huggingface.co/datasets/R3troR0b/news-dataset.

  15. Covid-19 latest news dataset

    • data.mendeley.com
    Updated Oct 27, 2021
    + more versions
    Cite
    Rajat Thakur (2021). Covid-19 latest news dataset [Dataset]. http://doi.org/10.17632/8rbm7d874k.1
    Explore at:
    Dataset updated
    Oct 27, 2021
    Authors
    Rajat Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Coronavirus disease 2019 (COVID19) time series that lists confirmed cases, reported deaths, and reported recoveries. Data is broken down by country (and sometimes by sub-region).

    Coronavirus disease (COVID19) is caused by severe acute respiratory syndrome Coronavirus 2 (SARSCoV2) and has had an effect worldwide. On March 11, 2020, the World Health Organization (WHO) declared it a pandemic, currently indicating more than 118,000 cases of coronavirus disease in more than 110 countries and territories around the world.

    This dataset contains the latest news related to Covid-19 and it was fetched with the help of Newsdata.io news API.

  16. CNBC Economy Dataset - 17K Economy Articles CSV

    • crawlfeeds.com
    csv, zip
    Updated Nov 24, 2025
    Cite
    Crawl Feeds (2025). CNBC Economy Dataset - 17K Economy Articles CSV [Dataset]. https://crawlfeeds.com/datasets/cnbc-economy-articles-dataset
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Nov 24, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    CNBC Economy Articles Dataset is an invaluable collection of data extracted from CNBC’s economy section, offering deep insights into global and U.S. economic trends, market dynamics, financial policies, and industry developments.

    This dataset encompasses a diverse array of economic articles on critical topics like GDP growth, inflation rates, employment statistics, central bank policies, and major global events influencing the market. Designed for researchers, analysts, and businesses, it serves as an essential resource for understanding economic patterns, conducting sentiment analysis, and developing financial forecasting models.

    Dataset Highlights

    Each record in the dataset is meticulously structured and includes:

    • Article Titles
    • Publication Dates
    • Author Names
    • Content Summaries
    • URLs to Original Articles

    This rich combination of fields ensures seamless integration into data science projects, research papers, and market analyses.

    Key Features

    • Number of Articles: Hundreds of articles sourced directly from CNBC.
    • Data Fields: Includes title, publication date, author, article content, summary, URL, and relevant keywords.
    • Topics Covered: U.S. and global economy, GDP trends, inflation, employment, financial markets, and monetary policies.
    • Format: Delivered in CSV format for easy integration with research tools and analytical platforms.
    • Source: Extracted directly from CNBC’s economy news section, ensuring accuracy and relevance.

    Use Cases

    • Economic Research: Gain insights into U.S. and global economic policies, market trends, and industry developments.
    • Sentiment Analysis: Assess the sentiment of economic articles to gauge market perspectives and investor confidence.
    • Financial Modeling: Build forecasting models leveraging key economic indicators discussed in the dataset.
    • Content Creation: Develop research-backed reports, articles, and presentations on economic topics.

    Who Benefits?

    • Researchers & Academics studying macro-economics or financial policy.
    • Data Scientists building AI models, trend analyzers, or economic forecasting tools.
    • Economists & Analysts who need real-world news data for policy analysis.
    • Content Strategists who write data-backed articles about economic trends.

    Why Choose This Dataset?

    • No need to manually scrape CNBC — data is pre-extracted and clean.
    • High-quality economy news metadata enables detailed filtering (by date, author, topic).
    • Ready for machine learning, sentiment analysis, or building news-based economic models.
    • Well-suited for trend tracking, policy analysis, and economic forecasting.

    Explore More News Datasets

    Interested in additional structured news datasets for your research or analytics needs? Check out our news dataset collection to find datasets tailored for diverse analytical applications.

  17. Multilingual Fake News Detection Dataset: Gujarati, Hindi, Marathi, and...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 1, 2024
    Cite
    Patil, Kailas; Parshv, Gandhi; Abhishek, Chauhan; Vaibhav, Patil; Ameya, Pawar (2024). Multilingual Fake News Detection Dataset: Gujarati, Hindi, Marathi, and Telugu [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11408512
    Explore at:
    Dataset updated
    Jun 1, 2024
    Dataset provided by
    Vishwakarma University
    Authors
    Patil, Kailas; Parshv, Gandhi; Abhishek, Chauhan; Vaibhav, Patil; Ameya, Pawar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is designed to support research in fake news detection across four major Indian languages: Gujarati, Hindi, Marathi, and Telugu. The dataset includes a diverse set of news articles collected from various sources, each labeled as either 'fake' or 'real'. The primary goal is to provide a resource that helps in the development and evaluation of natural language processing (NLP) models capable of detecting fake news in these regional languages.

  18. R

    News Dataset

    • universe.roboflow.com
    zip
    Updated Dec 22, 2023
    + more versions
    Cite
    AI (2023). News Dataset [Dataset]. https://universe.roboflow.com/ai-ilcyn/news-ilp3q/model/3
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 22, 2023
    Dataset authored and provided by
    AI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Vietnamese Text Bounding Boxes
    Description

    News

    ## Overview
    
    News is a dataset for object detection tasks - it contains Vietnamese Text annotations for 6,770 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  19. h

    Data from: polish-news

    • huggingface.co
    Updated Oct 4, 2018
    Cite
    Wiktor Sobański (2018). polish-news [Dataset]. https://huggingface.co/datasets/WiktorS/polish-news
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 4, 2018
    Authors
    Wiktor Sobański
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Poland
    Description

    This dataset contains more than 250k articles obtained from the Polish news site tvp.info.pl. The main purpose of collecting the data was to create a transformer-based model for text summarization. Columns:

    • link - link to the article
    • title - original title of the article
    • headline - lead/headline of the article - first paragraph of the article visible directly from the page
    • content - full textual contents of the article
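Given the columns described above and the stated summarization goal, one plausible way to prepare training pairs is sketched below. Treating `content` as the input and `headline` as the target is an assumption made for illustration; the description does not say which column the author used as the summary, and the helper name `to_summarization_pairs` and the sample records are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Column names as listed in the dataset description.
COLUMNS = ("link", "title", "headline", "content")


def to_summarization_pairs(
    records: Iterable[Dict[str, str]], min_chars: int = 1
) -> List[Tuple[str, str]]:
    """Build (source, target) pairs: article body -> lead paragraph.

    Using `headline` as the summary target is an assumption made for this
    sketch; the description does not say which column served as the target.
    """
    pairs = []
    for rec in records:
        missing = [c for c in COLUMNS if c not in rec]
        if missing:
            raise KeyError(f"record is missing columns: {missing}")
        # Collapse runs of whitespace left over from scraping.
        source = " ".join(rec["content"].split())
        target = " ".join(rec["headline"].split())
        # Skip records where either side is too short to be useful.
        if len(source) >= min_chars and len(target) >= min_chars:
            pairs.append((source, target))
    return pairs


# Hypothetical placeholder records, not real rows from the dataset.
records = [
    {"link": "https://example.com/1", "title": "Title one",
     "headline": "Short  lead.", "content": "Full text\nof the first article."},
    {"link": "https://example.com/2", "title": "Title two",
     "headline": "", "content": "Body without a lead."},
]
pairs = to_summarization_pairs(records)
print(pairs)
```

Records with an empty lead are dropped rather than padded, since an empty target gives a summarization model nothing to learn from.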

    Link to original repo: https://github.com/WiktorSob/scraper-tvp Download the data:… See the full description on the dataset page: https://huggingface.co/datasets/WiktorS/polish-news.

  20. COVID-19 fake news Dataset

    • kaggle.com
    zip
    Updated Jun 11, 2024
    Cite
    Inv.Alireza babazadeh zarei (2024). COVID-19 fake news Dataset [Dataset]. https://www.kaggle.com/datasets/invalizare/covid-19-fake-news-dataset/code
    Explore at:
    Available download formats: zip (851720 bytes)
    Dataset updated
    Jun 11, 2024
    Authors
    Inv.Alireza babazadeh zarei
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Inv.Alireza babazadeh zarei

    Released under MIT

    Contents

Cite
Crawl Feeds (2025). Fox News dataset is for analyzing media trends and narratives [Dataset]. https://crawlfeeds.com/datasets/fox-news-dataset

Fox News dataset is for analyzing media trends and narratives

Fox News dataset is for analyzing media trends and narratives from foxnews.com

Explore at:
Available download formats: zip, csv
Dataset updated
May 19, 2025
Dataset authored and provided by
Crawl Feeds
License

https://crawlfeeds.com/privacy_policy

Description

The Fox News Dataset is a comprehensive collection of over 1 million news articles, offering an unparalleled resource for analyzing media narratives, public discourse, and political trends. Covering articles up to the year 2023, this dataset is a treasure trove for researchers, analysts, and businesses interested in gaining deeper insights into the topics and trends covered by Fox News.

Key Features of the Fox News Dataset

  • Extensive Coverage: Contains more than 1 million articles spanning various topics and events up to 2023.
  • Research-Ready: Perfect for text classification, natural language processing (NLP), and other research purposes.
  • Format: Provided in CSV format for seamless integration into analytical and research tools.

Why Use This Dataset?

This large dataset is ideal for:

  • Text Classification: Develop machine learning models to classify and categorize news content.
  • Natural Language Processing (NLP): Conduct sentiment analysis, keyword extraction, or topic modeling.
  • Media and Political Research: Analyze media narratives, public opinion, and political trends reflected in Fox News articles.
  • Trend Analysis: Identify shifts in public discourse and media focus over time.

Explore More News Datasets

Discover additional resources for your research needs by visiting our news dataset collection. These datasets are tailored to support diverse analytical applications, including sentiment analysis and trend modeling.

The Fox News Dataset is a must-have for anyone interested in exploring large-scale media data and leveraging it for advanced analysis. Ready to dive into this wealth of information? Download the dataset now in CSV format and start uncovering the stories behind the headlines.
