100+ datasets found
  1. News Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Cite
    Bright Data, News Datasets [Dataset]. https://brightdata.com/products/datasets/news
    Available download formats: .json, .csv, .xlsx
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Stay ahead with our comprehensive News Dataset, designed for businesses, analysts, and researchers to track global events, monitor media trends, and extract valuable insights from news sources worldwide.

    Dataset Features

    • News Articles: Access structured news data, including headlines, summaries, full articles, publication dates, and source details. Ideal for media monitoring and sentiment analysis.
    • Publisher & Source Information: Extract details about news publishers, including domain, region, and credibility indicators.
    • Sentiment & Topic Classification: Analyze news sentiment, categorize articles by topic, and track emerging trends in real time.
    • Historical & Real-Time Data: Retrieve historical archives or access continuously updated news feeds for up-to-date insights.

    Customizable Subsets for Specific Needs

    Our News Dataset is fully customizable, allowing you to filter data based on publication date, region, topic, sentiment, or specific news sources. Whether you need broad coverage for trend analysis or focused data for competitive intelligence, we tailor the dataset to your needs.

    Popular Use Cases

    • Media Monitoring & Reputation Management: Track brand mentions, analyze media coverage, and assess public sentiment.
    • Market & Competitive Intelligence: Monitor industry trends, competitor activity, and emerging market opportunities.
    • AI & Machine Learning Training: Use structured news data to train AI models for sentiment analysis, topic classification, and predictive analytics.
    • Financial & Investment Research: Analyze news impact on stock markets, commodities, and economic indicators.
    • Policy & Risk Analysis: Track regulatory changes, geopolitical events, and crisis developments in real time.

    Whether you're analyzing market trends, monitoring brand reputation, or training AI models, our News Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.

  2. Social media as a news outlet worldwide 2025

    • statista.com
    Updated Nov 19, 2025
    Cite
    Statista (2025). Social media as a news outlet worldwide 2025 [Dataset]. https://www.statista.com/statistics/718019/social-media-news-source/
    Dataset updated
    Nov 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2025 - Feb 2025
    Area covered
    Worldwide
    Description

    During a 2025 survey, ** percent of respondents from Nigeria stated that they used social media as a source of news. In comparison, just ** percent of Japanese respondents said the same. Large portions of social media users around the world admit that they do not trust social platforms either as media sources or as a way to get news, and yet they continue to access such networks on a daily basis.

    Social media: trust and consumption

    Despite the majority of adults surveyed in each country reporting that they used social networks to keep up to date with news and current affairs, a 2018 study showed that social media is the least trusted news source in the world. Less than ** percent of adults in Europe considered social networks to be trustworthy in this respect, yet more than ** percent of adults in Portugal, Poland, Romania, Hungary, Bulgaria, Slovakia and Croatia said that they got their news on social media.

    What is clear is that we live in an era where social media is such an enormous part of daily life that consumers will still use it in spite of their doubts or reservations. Concerns about fake news and propaganda on social media have not stopped billions of users accessing their favorite networks on a daily basis. Most Millennials in the United States use social media for news every day, and younger consumers in European countries are much more likely to use social networks for national political news than their older peers. Like it or not, reading news on social is fast becoming the norm for younger generations, and this form of news consumption will likely increase further regardless of whether consumers fully trust their chosen network or not.

  3. UK news headlines

    • kaggle.com
    zip
    Updated Jul 5, 2023
    Cite
    DeXmaSa (2023). UK news headlines [Dataset]. https://www.kaggle.com/datasets/dexmasa/uk-news-headlines
    Available download formats: zip (1318144 bytes)
    Dataset updated
    Jul 5, 2023
    Authors
    DeXmaSa
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    This dataset contains the headlines generated by the top 15 UK news websites over a time span of roughly 20 days. The headlines were scraped from the sites' respective RSS feeds.

    • Time frame: 2023-02-13 to 2023-03-05

    • Headlines were scraped in 12 hour intervals

    • The dataset consists of two files:

      • The scraped data consisting of the headlines
      • Auxiliary data containing additional information for each of the news outlets
    • Identification of the top 15 news websites in the UK: statista.com

    • Data dictionary for scraped data:

      • website: BBC, Sun, Mirror, Daily Mail, Independent, Telegraph, Guardian, Manchester Evening News, Sky News, Metro, Daily Express, Times, Liverpool Echo, Birmingham Live, Evening Standard.
      • timestamp scraped: Date and time when a particular headline was scraped.
      • headline: Headline of news article.
    • Data dictionary for compiled auxiliary data:

      • website: BBC, Sun, Mirror, Daily Mail, Independent, Telegraph, Guardian, Manchester Evening News, Sky News, Metro, Daily Express, Times, Liverpool Echo, Birmingham Live, Evening Standard.
      • RSS URL: URL to RSS feed for each of the above websites.
      • visitors unique monthly: In millions; taken from statista.com.
      • ownership: Entity owning a particular news outlet and thus the associated website. Source: General internet search.
      • political bias: left-center, center, right-center, right. Source: General internet search.
      • party support GE 2019: None, Conservative, Labour, Unknown. Source: General internet search. GE = general election.
      • journalism style: quality, tabloid. Source: General internet search.
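    The two files described by the data dictionaries share the website column, so they can be joined on it. The following is a minimal sketch of that join, not the dataset author's code: the CSV header spellings are assumed to match the dictionary entries, and the rows (including the outlet names "Outlet A" and "Outlet B") are made-up placeholders rather than records from the dataset.

```python
import csv
import io

# Placeholder rows standing in for the two files. Header spellings are an
# assumption taken from the data dictionary; the values are invented.
HEADLINES_CSV = """website,timestamp scraped,headline
Outlet A,2023-02-13 08:00:00,Example headline 1
Outlet B,2023-02-13 08:00:00,Example headline 2
Outlet A,2023-02-13 20:00:00,Example headline 3
"""

OUTLETS_CSV = """website,political bias,journalism style
Outlet A,center,quality
Outlet B,right,tabloid
"""


def attach_outlet_info(headlines_file, outlets_file):
    """Join each headline row with its outlet's auxiliary attributes."""
    # Index the auxiliary rows by outlet name for constant-time lookup.
    outlets = {row["website"]: row for row in csv.DictReader(outlets_file)}
    joined = []
    for row in csv.DictReader(headlines_file):
        info = outlets.get(row["website"], {})
        joined.append({
            **row,
            "political bias": info.get("political bias", "unknown"),
            "journalism style": info.get("journalism style", "unknown"),
        })
    return joined


rows = attach_outlet_info(io.StringIO(HEADLINES_CSV), io.StringIO(OUTLETS_CSV))
for r in rows:
    print(r["website"], r["journalism style"], r["headline"])
```

    With the real files, the two io.StringIO objects would be replaced by open file handles; the file names are not stated in the description, so they are left out here.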

    Image credit: https://unsplash.com/@siora18

  4. BBC News Dataset – February 2023 Edition

    • crawlfeeds.com
    csv, zip
    Updated Jun 14, 2025
    Cite
    Crawl Feeds (2025). BBC News Dataset – February 2023 Edition [Dataset]. https://crawlfeeds.com/datasets/bbc-news-dataset-feb-2023
    Available download formats: zip, csv
    Dataset updated
    Jun 14, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Get access to a comprehensive and structured dataset of BBC News articles, freshly crawled and compiled in February 2023. This collection includes 1 million records from one of the world’s most trusted news organizations — perfect for training NLP models, sentiment analysis, and trend detection across global topics.

    💾 Format: CSV (available in ZIP archive)

    📢 Status: Published and available for immediate access

    Use Cases

    • Train language models to summarize or categorize news

    • Detect media bias and compare narrative framing

    • Conduct research in journalism, politics, and public sentiment

    • Enrich news aggregation platforms with clean metadata

    • Analyze content distribution across categories (e.g. health, politics, tech)

    This dataset ensures reliable and high-quality information sourced from a globally respected outlet. The format is optimized for quick ingestion into your pipelines — with clean text, timestamps, image links, and more.

    Need a filtered dataset or want this refreshed for a later date? We offer on-demand news scraping as well.


  5. news-bias-full-data

    • huggingface.co
    Updated Oct 25, 2023
    Cite
    News Media Biases (2023). news-bias-full-data [Dataset]. https://huggingface.co/datasets/newsmediabias/news-bias-full-data
    Dataset updated
    Oct 25, 2023
    Dataset authored and provided by
    News Media Biases
    Description

    Please access the latest version of the data here: https://huggingface.co/datasets/shainar/BEAD

    Email shaina.raza@torontomu.ca regarding usage of the data.

    Please cite us if you use it:

    @article{raza2024beads,
     title={BEADs: Bias Evaluation Across Domains},
     author={Raza, Shaina and Rahman, Mizanur and Zhang, Michael R},
     journal={arXiv preprint arXiv:2406.04220},
     year={2024}
    }

    license: cc-by-nc-4.0
    language: en
    pretty_name: Navigating News…

    See the full description on the dataset page: https://huggingface.co/datasets/newsmediabias/news-bias-full-data.

  6. Leading social networks used for news in the U.S. 2019-2025

    • statista.com
    Updated Jul 4, 2025
    Cite
    Statista (2025). Leading social networks used for news in the U.S. 2019-2025 [Dataset]. https://www.statista.com/statistics/444708/social-networks-used-for-news-usa/
    Dataset updated
    Jul 4, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    In 2025, Facebook remained the most-used social platform for news in the United States, with ** percent of respondents reporting they accessed news on it. YouTube followed closely at ** percent, recording a slight increase from the previous year. X (formerly Twitter) saw the most notable growth, rising by ***** percent to ** percent.

  7. Bloomberg Quint news dataset

    • crawlfeeds.com
    json, zip
    Updated Sep 27, 2024
    Cite
    Crawl Feeds (2024). Bloomberg Quint news dataset [Dataset]. https://crawlfeeds.com/datasets/bloomberg-quint-news-dataset
    Available download formats: json, zip
    Dataset updated
    Sep 27, 2024
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Explore the "Bloomberg Quint News Dataset," a comprehensive collection of news articles from Bloomberg Quint, a leading source of financial, business, and economic news in India and around the world.

    This dataset includes thousands of articles covering a wide range of topics, such as financial markets, economic policies, corporate news, technology, politics, and more. Each article in the dataset comes with detailed information, including headlines, publication dates, authors, article content, and categories, offering valuable insights for researchers, data analysts, and media professionals.

    Key Features:

    • Extensive Coverage: Thousands of news articles from Bloomberg Quint, covering diverse topics including business, finance, economics, technology, and global news.
    • Detailed Metadata: Each article includes key details such as headline, publication date, author, content, and category, making it ideal for in-depth research and analysis.
    • Ideal for Analysis: Perfect for researchers, data scientists, and content strategists looking to analyze trends in news reporting, study media coverage, or develop content strategies.
    • Rich Source of Information: Provides up-to-date information on financial markets, economic policies, and global events, helping professionals stay informed and make data-driven decisions.

    Whether you're researching financial trends, analyzing media coverage, or developing new content, the "Bloomberg Quint News Dataset" is an invaluable resource that offers detailed insights and extensive coverage of the latest news.

  8. News Events Data in Asia ( Techsalerator)

    • datarade.ai
    Updated Jul 9, 2024
    Cite
    Techsalerator (2024). News Events Data in Asia ( Techsalerator) [Dataset]. https://datarade.ai/data-products/news-events-data-in-asia-techsalerator-techsalerator
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Jul 9, 2024
    Dataset provided by
    Techsalerator LLC
    Authors
    Techsalerator
    Area covered
    United Arab Emirates, Kyrgyzstan, Brunei Darussalam, Kazakhstan, Maldives, China, Timor-Leste, Uzbekistan, Hong Kong, Iran (Islamic Republic of)
    Description

    Techsalerator’s News Event Data in Asia offers a detailed and expansive dataset designed to provide businesses, analysts, journalists, and researchers with comprehensive insights into significant news events across the Asian continent. This dataset captures and categorizes major events reported from a diverse range of news sources, including press releases, industry news sites, blogs, and PR platforms, offering valuable perspectives on regional developments, economic shifts, political changes, and cultural occurrences.

    Key Features of the Dataset:

    • Extensive Coverage: The dataset aggregates news events from a wide range of sources such as company press releases, industry-specific news outlets, blogs, PR sites, and traditional media. This broad coverage ensures a diverse array of information from multiple reporting channels.
    • Categorization of Events: News events are categorized into various types including business and economic updates, political developments, technological advancements, legal and regulatory changes, and cultural events. This categorization helps users quickly find and analyze information relevant to their interests or sectors.
    • Real-Time Updates: The dataset is updated regularly to include the most current events, ensuring users have access to the latest news and can stay informed about recent developments as they happen.
    • Geographic Segmentation: Events are tagged with their respective countries and regions within Asia. This geographic segmentation allows users to filter and analyze news events based on specific locations, facilitating targeted research and analysis.
    • Event Details: Each event entry includes comprehensive details such as the date of occurrence, source of the news, a description of the event, and relevant keywords. This thorough detailing helps users understand the context and significance of each event.
    • Historical Data: The dataset includes historical news event data, enabling users to track trends and perform comparative analysis over time. This feature supports longitudinal studies and provides insights into the evolution of news events.
    • Advanced Search and Filter Options: Users can search and filter news events based on criteria such as date range, event type, location, and keywords. This functionality allows for precise and efficient retrieval of relevant information.

    Asian Countries and Territories Covered:

    • Central Asia: Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, Uzbekistan
    • East Asia: China, Hong Kong (Special Administrative Region of China), Japan, Mongolia, North Korea, South Korea, Taiwan
    • South Asia: Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, Sri Lanka
    • Southeast Asia: Brunei, Cambodia, East Timor (Timor-Leste), Indonesia, Laos, Malaysia, Myanmar (Burma), Philippines, Singapore, Thailand, Vietnam
    • Western Asia (Middle East): Armenia, Azerbaijan, Bahrain, Cyprus, Georgia, Iraq, Israel, Jordan, Kuwait, Lebanon, Oman, Palestine, Qatar, Saudi Arabia, Syria, Turkey (partly in Europe, but often included in Asia contextually), United Arab Emirates, Yemen

    Benefits of the Dataset:

    • Strategic Insights: Businesses and analysts can use the dataset to gain insights into significant regional developments, economic conditions, and political changes, aiding in strategic decision-making and market analysis.
    • Market and Industry Trends: The dataset provides valuable information on industry-specific trends and events, helping users understand market dynamics and identify emerging opportunities.
    • Media and PR Monitoring: Journalists and PR professionals can track relevant news across Asia, enabling them to monitor media coverage, identify emerging stories, and manage public relations efforts effectively.
    • Academic and Research Use: Researchers can utilize the dataset for longitudinal studies, trend analysis, and academic research on various topics related to Asian news and events.

    Techsalerator’s News Event Data in Asia is a crucial resource for accessing and analyzing significant news events across the continent. By offering detailed, categorized, and up-to-date information, it supports effective decision-making, research, and media monitoring across diverse sectors.

  9. ARABIC NEWS DATASET - RESULTS FROM WEB SCRAPING

    • kaggle.com
    zip
    Updated Apr 15, 2024
    Cite
    Elaaatif (2024). ARABIC NEWS DATASET - RESULTS FROM WEB SCRAPING [Dataset]. https://www.kaggle.com/datasets/latif8/arabic-news-dataset-results-from-web-scraping
    Available download formats: zip (10472746 bytes)
    Dataset updated
    Apr 15, 2024
    Authors
    Elaaatif
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset, obtained by web scraping, contains news articles from five sources: Al Jazeera, BBC News Arabic, Fatabyyano, Verify-Sy and matsda2sh. The articles cover topics ranging from global politics and current affairs to health, culture, and technology, giving a snapshot of contemporary news coverage from different perspectives. Each record includes the article title, category, publication date, and content, which researchers and analysts can use to study Arabic media trends, public discourse, and societal issues.

  10. CT-FAN: A Multilingual dataset for Fake News Detection

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 23, 2022
    Cite
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel (2022). CT-FAN: A Multilingual dataset for Fake News Detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4714516
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    University of Klagenfurt
    University of Hildesheim
    Darmstadt University of Applied Sciences
    University of Duisburg-Essen
    University of Applied Sciences Potsdam
    Authors
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel
    Description

    By downloading the data, you agree with the terms & conditions mentioned below:

    Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.

    Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.

    We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.

    Citation

    Please cite our work as

    @InProceedings{clef-checkthat:2022:task3,
     author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
     title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
     year = {2022},
     booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
     series = {CLEF~'2022},
     address = {Bologna, Italy},
    }

    @article{shahi2021overview,
     title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
     author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
     journal={Working Notes of CLEF},
     year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.

    Task 3: Multi-class fake news detection of news articles (English)

    Sub-task A frames fake news detection as a four-class classification problem: given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1264 labelled articles in English. Our definitions for the categories are as follows:

    False - The main claim made in an article is untrue.

    Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    True - This rating indicates that the primary elements of the main claim are demonstrably true.

    Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
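    The category definitions above amount to a mapping from the ratings used by individual fact-checking services onto four classes. A minimal sketch of such a mapping follows; it is an illustration, not the organisers' code. The function name normalise_rating is hypothetical, the raw labels are only the ones named in the definitions, and raising an error for any unlisted label is a design choice made here.

```python
# Raw ratings named in the category definitions, grouped by target class.
PARTIALLY_FALSE = {
    "partially false",
    "partially true",
    "mostly true",
    "miscaptioned",
    "misleading",
}
OTHER = {"other", "in dispute", "unproven"}


def normalise_rating(raw: str) -> str:
    """Map a fact-checker rating onto one of the four task classes."""
    label = raw.strip().lower()
    if label in ("false", "true"):
        return label
    if label in PARTIALLY_FALSE:
        return "partially false"
    if label in OTHER:
        return "other"
    # Unlisted ratings are rejected rather than guessed at.
    raise ValueError(f"no mapping defined for rating: {raw!r}")


for raw in ["False", " Mostly True ", "unproven"]:
    print(raw.strip(), "->", normalise_rating(raw))
```

    Real fact-checking sites use many more rating names than the handful listed here, so the two sets would need extending before use on actual data.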

    Cross-Lingual Task (German)

    Along with the multi-class task for English, we have introduced a task for a low-resource language. We will provide test data in German. The idea of the task is to use the English training data and transfer learning to build a classification model for German.

    Input Data

    The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:

    ID- Unique identifier of the news article

    Title- Title of the news article

    text- Text mentioned inside the news article

    our rating - class of the news article as false, partially false, true, other

    Output data format

    public_id- Unique identifier of the news article

    predicted_rating- predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true
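    A prediction file in the output format above can be produced with Python's standard csv module. This is a sketch under assumptions: the function name write_predictions is hypothetical, the set of allowed ratings is taken from the four classes defined for the task, and the writer emits no space after the comma, which may or may not matter to the organisers' evaluation script.

```python
import csv
import io

# The four classes defined for the task.
VALID_RATINGS = {"false", "partially false", "true", "other"}


def write_predictions(predictions, out_file):
    """Write (public_id, predicted_rating) pairs with the expected header."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["public_id", "predicted_rating"])
    for public_id, rating in predictions:
        if rating not in VALID_RATINGS:
            raise ValueError(f"unknown rating: {rating!r}")
        writer.writerow([public_id, rating])


# Reproduce the two rows of the sample file in an in-memory buffer.
buffer = io.StringIO()
write_predictions([(1, "false"), (2, "true")], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

    Passing an open file handle instead of the io.StringIO buffer writes the same content to disk.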

    IMPORTANT!

    We have used the data from 2010 to 2022, and the content of fake news is mixed up with several topics like elections, COVID-19 etc.

    Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498

    Related Work

    Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf

    G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14

    Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104

    Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.

    Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.

    Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.

  11. CNBC Economy Dataset - 17K Economy Articles CSV

    • crawlfeeds.com
    csv, zip
    Updated Nov 24, 2025
    Cite
    Crawl Feeds (2025). CNBC Economy Dataset - 17K Economy Articles CSV [Dataset]. https://crawlfeeds.com/datasets/cnbc-economy-articles-dataset
    Available download formats: zip, csv
    Dataset updated
    Nov 24, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    CNBC Economy Articles Dataset is an invaluable collection of data extracted from CNBC’s economy section, offering deep insights into global and U.S. economic trends, market dynamics, financial policies, and industry developments.

    This dataset encompasses a diverse array of economic articles on critical topics like GDP growth, inflation rates, employment statistics, central bank policies, and major global events influencing the market. Designed for researchers, analysts, and businesses, it serves as an essential resource for understanding economic patterns, conducting sentiment analysis, and developing financial forecasting models.

    Dataset Highlights

    Each record in the dataset is meticulously structured and includes:

    • Article Titles
    • Publication Dates
    • Author Names
    • Content Summaries
    • URLs to Original Articles

    This rich combination of fields ensures seamless integration into data science projects, research papers, and market analyses.

    Key Features

    • Number of Articles: Hundreds of articles sourced directly from CNBC.
    • Data Fields: Includes title, publication date, author, article content, summary, URL, and relevant keywords.
    • Topics Covered: U.S. and global economy, GDP trends, inflation, employment, financial markets, and monetary policies.
    • Format: Delivered in CSV format for easy integration with research tools and analytical platforms.
    • Source: Extracted directly from CNBC’s economy news section, ensuring accuracy and relevance.

    Use Cases

    • Economic Research: Gain insights into U.S. and global economic policies, market trends, and industry developments.
    • Sentiment Analysis: Assess the sentiment of economic articles to gauge market perspectives and investor confidence.
    • Financial Modeling: Build forecasting models leveraging key economic indicators discussed in the dataset.
    • Content Creation: Develop research-backed reports, articles, and presentations on economic topics.

    Who Benefits?

    • Researchers & Academics studying macro-economics or financial policy.
    • Data Scientists building AI models, trend analyzers, or economic forecasting tools.
    • Economists & Analysts who need real-world news data for policy analysis.
    • Content Strategists who write data-backed articles about economic trends.

    Why Choose This Dataset?

    • No need to manually scrape CNBC — data is pre-extracted and clean.
    • High-quality economy news metadata enables detailed filtering (by date, author, topic).
    • Ready for machine learning, sentiment analysis, or building news-based economic models.
    • Well-suited for trend tracking, policy analysis, and economic forecasting.

    Explore More News Datasets

    Interested in additional structured news datasets for your research or analytics needs? Check out our news dataset collection to find datasets tailored for diverse analytical applications.

  12. CT-FAN-21 corpus: A dataset for Fake News Detection

    • zenodo.org
    Updated Oct 23, 2022
    Cite
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl (2022). CT-FAN-21 corpus: A dataset for Fake News Detection [Dataset]. http://doi.org/10.5281/zenodo.4714517
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl
    Description

    Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes. Due to these restrictions, the collection is not open data. Please download the Data Sharing Agreement and send the signed form to fakenewstask@gmail.com.

    Citation

    Please cite our work as

    @article{shahi2021overview,
     title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
     author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
     journal={Working Notes of CLEF},
     year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English.

    Subtask 3A: Multi-class fake news detection of news articles (English)

    Subtask 3A frames fake news detection as a four-class classification problem: given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 900 labelled articles. Our definitions for the categories are as follows:

    • False - The main claim made in an article is untrue.

    • Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    • True - This rating indicates that the primary elements of the main claim are demonstrably true.

    • Other - An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes articles in dispute and unproven articles.

    Subtask 3B: Topical Domain Classification of News Articles (English). Fact-checkers require background expertise to identify the truthfulness of an article. The categorisation will help to automate the sampling process from a stream of data. Given the text of a news article, determine the topical domain of the article. This is a classification problem: the task is to categorise fake news articles into six topical categories such as health, election, crime, climate, and education. This task will be offered for a subset of the data of Subtask 3A.

    Input Data

    The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:

    Task 3a

    • ID- Unique identifier of the news article
    • Title- Title of the news article
    • text- Text mentioned inside the news article
    • our rating - class of the news article as false, partially false, true, other

    Task 3b

    • public_id- Unique identifier of the news article
    • Title- Title of the news article
    • text- Text mentioned inside the news article
    • domain - domain of the given news article(applicable only for task B)

    Output data format

    Task 3a

    • public_id- Unique identifier of the news article
    • predicted_rating- predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true

    Task 3b

    • public_id- Unique identifier of the news article
    • predicted_domain- predicted domain

    Sample file

    public_id, predicted_domain
    1, health
    2, crime
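    The output format above can be sketched as a small writer. This is a minimal illustration, not the organisers' tooling: the function name `write_submission` and the label spellings in `ALLOWED_RATINGS` are assumptions, and the sample file shows a space after each comma, which this sketch does not reproduce.

```python
import csv
import io

# Assumed label spellings, taken from the "our rating" column description.
ALLOWED_RATINGS = {"false", "partially false", "true", "other"}


def write_submission(predictions, handle, label_column="predicted_rating",
                     allowed=ALLOWED_RATINGS):
    """Write (public_id, label) pairs as a CSV with a header row.

    Pass label_column="predicted_domain" and allowed=None for Task 3b,
    since the full list of domain labels is not given here.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["public_id", label_column])
    for public_id, label in predictions:
        if allowed is not None and label not in allowed:
            raise ValueError(f"unexpected label {label!r} for id {public_id}")
        writer.writerow([public_id, label])


# Hypothetical predictions mirroring the Task 3a sample file.
buffer = io.StringIO()
write_submission([(1, "false"), (2, "true")], buffer)
print(buffer.getvalue())
```

    Validating labels before writing catches typos early, which matters when the number of allowed runs is small.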

    Additional data for Training

    To train their models, participants can use additional data in a similar format; some datasets are available on the web. We do not provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some possible sources:

    IMPORTANT!

    1. The fake news articles used for Task 3b are a subset of those used for Task 3a.
    2. We have used the data from 2010 to 2021, and the content of fake news is mixed up with several topics like election, COVID-19 etc.

    Evaluation Metrics

    This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.
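    For local validation, macro-averaged F1 can be computed as in the sketch below. It assumes the average is taken over every label that appears in either the gold or the predicted list; the official scorer is not described here and may treat absent classes differently. The function name `macro_f1` and the example labels are illustrative.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 scores."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal in length")
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no hits scores 0.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for four articles.
gold = ["true", "false", "false", "other"]
predicted = ["true", "false", "true", "other"]
score = macro_f1(gold, predicted)
print(round(score, 4))
```

    Because every class counts equally, a model that ignores a rare class is penalised more than under accuracy or micro-averaged F1.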

    Submission Link: https://competitions.codalab.org/competitions/31238

    Related Work

    • Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf
    • G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
    • Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
  13. Fake News Detection Dataset

    • kaggle.com
    zip
    Updated Apr 27, 2025
    Cite
    Mahdi Mashayekhi (2025). Fake News Detection Dataset [Dataset]. https://www.kaggle.com/datasets/mahdimashayekhi/fake-news-detection-dataset
    Explore at:
    zip(11735585 bytes)Available download formats
    Dataset updated
    Apr 27, 2025
    Authors
    Mahdi Mashayekhi
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    📚 Fake News Detection Dataset

    Overview

    This dataset is designed for practicing fake news detection using machine learning and natural language processing (NLP) techniques. It includes a rich collection of 20,000 news articles, carefully generated to simulate real-world data scenarios. Each record contains metadata about the article and a label indicating whether the news is real or fake.

    The dataset also intentionally includes around 5% missing values in some fields to simulate the challenges of handling incomplete data in real-life projects.

    Columns Description

    • title: A short headline summarizing the article (around 6 words).
    • text: The body of the news article (200–300 words on average).
    • date: The publication date of the article, randomly selected over the past 3 years.
    • source: The media source that published the article (e.g., BBC, CNN, Al Jazeera). May contain missing values (~5%).
    • author: The author's full name. Some entries are missing (~5%) to simulate real-world incomplete data.
    • category: The general category of the article (e.g., Politics, Health, Sports, Technology).
    • label: The target label: real or fake news.

    Why Use This Dataset?

    Fake News Detection Practice: Perfect for binary classification tasks.

    NLP Preprocessing: Allows users to practice text cleaning, tokenization, vectorization, etc.

    Handling Missing Data: Some fields are incomplete to simulate real-world data challenges.

    Feature Engineering: Encourages creating new features from text and metadata.

    Balanced Labels: Realistic distribution of real and fake news for fair model training.

    Potential Use Cases

    Building and evaluating text classification models (e.g., Logistic Regression, Random Forests, XGBoost).

    Practicing NLP techniques like TF-IDF, Word2Vec, BERT embeddings.

    Performing exploratory data analysis (EDA) on news data.

    Developing pipelines for dealing with missing values and feature extraction.

    A Note on Data Quality

    This dataset has been synthetically generated to closely resemble real news articles. The diversity in titles, text, sources, and categories ensures that models trained on this dataset can generalize well to unseen, real-world data. However, since it is synthetic, it should not be used for production models or decision-making without careful validation.

    File Info

    Filename: fake_news_dataset.csv

    Size: 20,000 rows × 7 columns

    Missing Data: ~5% missing values in the source and author columns.
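    Handling the missing `source` and `author` values can be sketched with the standard library alone. The rows in `SAMPLE` are hypothetical stand-ins for fake_news_dataset.csv, and two details are assumptions: that a missing value appears as an empty cell, and that the labels are spelled `real` and `fake`.

```python
import csv
import io

# Hypothetical rows using the seven documented column names.
SAMPLE = """title,text,date,source,author,category,label
Example headline one,Body text A,2024-01-05,BBC,Jane Doe,Politics,real
Example headline two,Body text B,2023-11-20,,John Roe,Health,fake
Example headline three,Body text C,2024-03-02,CNN,,Sports,real
"""


def load_rows(handle, fill_columns=("source", "author"), placeholder="unknown"):
    """Read rows, replace empty cells in fill_columns, and count the gaps."""
    rows = []
    missing = {column: 0 for column in fill_columns}
    for row in csv.DictReader(handle):
        for column in fill_columns:
            if not (row.get(column) or "").strip():
                missing[column] += 1
                row[column] = placeholder
        rows.append(row)
    return rows, missing


rows, missing = load_rows(io.StringIO(SAMPLE))
print(missing)
```

    Filling with an explicit placeholder keeps missingness visible as its own category; dropping the rows or imputing the most frequent value are alternatives worth comparing.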

  14. Online News Popularity Data Set

    • academictorrents.com
    • kaggle.com
    bittorrent
    Updated Feb 11, 2016
    Cite
    Kelwin Fernandes and Pedro Vinagre and Paulo Cortez and Pedro Sernadela (2016). Online News Popularity Data Set [Dataset]. https://academictorrents.com/details/95d3b03397a0bafd74a662fe13ba3550c13b7ce1
    Explore at:
    bittorrent(7476401)Available download formats
    Dataset updated
    Feb 11, 2016
    Dataset authored and provided by
    Kelwin Fernandes and Pedro Vinagre and Paulo Cortez and Pedro Sernadela
    License

    https://academictorrents.com/nolicensespecified

    Description

    Data Set Information:

    • The articles were published by Mashable (www.mashable.com), which holds the rights to reproduce their content. Hence, this dataset does not share the original content but some statistics associated with it. The original content can be publicly accessed and retrieved using the provided URLs.
    • Acquisition date: January 8, 2015
    • The estimated relative performance values were estimated by the authors using a Random Forest classifier and a rolling window as the assessment method. See their article for more details on how the relative performance values were set.

    Attribute Information:

    Number of Attributes: 61 (58 predictive attributes, 2 non-predictive, 1 goal field)

    0. url: URL of the article (non-predictive)
    1. timedelta: Days between the article publication and the dataset acquisition (non-predictive)
    2. n_tokens_title: Number of words in the title
    3. n_tokens_content: Number of words in the content
    4. n_unique_tokens: Rate of unique words in the conte

  15. News Data, Global News, Topic News, and More from Google News

    • openwebninja.com
    json
    Cite
    OpenWeb Ninja, News Data, Global News, Topic News, and More from Google News [Dataset]. https://www.openwebninja.com/api/real-time-news-data
    Explore at:
    jsonAvailable download formats
    Dataset authored and provided by
    OpenWeb Ninja
    Area covered
    Global News Coverage
    Description

    This dataset provides comprehensive access to news articles and headlines from Google News in real-time. Get top news globally or by specific topics, with support for geographic targeting and custom search queries. Perfect for applications requiring news monitoring, media analysis, and content aggregation. The dataset is delivered in a JSON format via REST API.

  16. News Recommendations Dataset: Headlines & Categories.

    • data.mendeley.com
    Updated Dec 12, 2024
    + more versions
    Cite
    Ankur Ray Chayan (2024). News Recommendations Dataset: Headlines & Categories. [Dataset]. http://doi.org/10.17632/pk5vs5wjxm.1
    Explore at:
    Dataset updated
    Dec 12, 2024
    Authors
    Ankur Ray Chayan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset, titled News Recommendations: Headlines & Categories, contains 1,999 records of news articles sourced from various newspapers. It is a versatile resource for machine learning tasks such as text classification, recommendation systems, and natural language processing (NLP). Each entry includes a headline summarizing the news article, the name of the newspaper that published it, a brief description of the article, and its associated categories. There are 209 unique categories in total, ranging from single labels like "Business" and "Education" to multi-label combinations like "Environment, Health" and "Sports, Economy." Additionally, each record includes a link to the full article, offering further context for analysis. This dataset can be utilized for a variety of applications, including building personalized news recommendation systems, performing sentiment analysis, and experimenting with multi-label learning models. Its rich and diverse content makes it ideal for researchers and practitioners exploring real-world data scenarios. To enhance usability, cleaning the category labels for consistency may be a helpful first step. Overall, this dataset provides an excellent opportunity to work with complex textual data in a practical and impactful way.
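    The suggested cleaning of category labels might look like the sketch below. It assumes categories are stored as comma-separated strings, as the examples "Environment, Health" and "Sports, Economy" suggest; the function name `split_categories` and the title-casing rule are illustrative choices, not part of the dataset.

```python
def split_categories(raw):
    """Split a comma-separated category string into a de-duplicated label list."""
    labels = []
    for part in raw.split(","):
        # Collapse internal whitespace and normalise case.
        label = " ".join(part.split()).title()
        if label and label not in labels:
            labels.append(label)
    return labels


print(split_categories("Environment, Health"))
print(split_categories(" sports ,ECONOMY, Sports"))
```

    Title-casing is a blunt rule: it would turn an acronym such as "AI" into "Ai", so a hand-written mapping for known exceptions may be needed.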

  17. Data from: Space News Dataset

    • kaggle.com
    zip
    Updated May 26, 2024
    Cite
    astro_pat (2024). Space News Dataset [Dataset]. https://www.kaggle.com/datasets/patrickfleith/space-news-dataset
    Explore at:
    zip(30407544 bytes)Available download formats
    Dataset updated
    May 26, 2024
    Authors
    astro_pat
    Description

    Latest Dataset Update: May 2024

    This dataset contains the title, URL, text content, author, and publish date of more than 20,000 English news articles related to the space industry, a total of more than 14 million tokens (words), which makes it well suited to training language models specific to the space industry ecosystem. It covers agency news, commercial, civil, launches, military, and also opinion articles.

    Credit goes to the original authors of each article. For any citation, use https://spacenews.com/

  18. Time Magazine Latest News dataset

    • crawlfeeds.com
    json, zip
    Updated Jan 14, 2025
    Cite
    Crawl Feeds (2025). Time Magazine Latest News dataset [Dataset]. https://crawlfeeds.com/datasets/time-magazine-latest-news-dataset
    Explore at:
    json, zipAvailable download formats
    Dataset updated
    Jan 14, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Unlock real-time insights from Time Magazine's Latest News Dataset through our platform in just a few simple steps. Whether you're a researcher, marketer, or business analyst, this dataset offers comprehensive coverage of global news from one of the world’s most trusted sources. Here’s how you can get started:

    1. Sign Up

    Begin by signing up for an account on our platform. This gives you access to all of our data services, including the Time Magazine Latest News Dataset.

    2. Add to Cart

    Browse our offerings and select the Time Magazine Latest News Dataset plan that fits your needs. Once you’ve made your choice, add it to your cart and proceed to the checkout page.

    3. Make the Payment

    Complete your purchase by paying through our secure payment options. We accept multiple payment methods to ensure a smooth and easy transaction process.

    4. Invoice & Access

    Once your payment is processed, you will receive an invoice for your purchase. Our team will then provide you with immediate access to the dataset, along with the relevant download instructions and login details.

    5. Start Using the Dataset

    After gaining access, you’ll be able to download the Time Magazine Latest News Dataset, which includes news articles extracted as of March 2021. While this dataset is not a live feed, it offers historical articles and insights that can be used for trend analysis, research, and content aggregation.

    Key Features of the Dataset

    • Comprehensive Coverage: Includes articles from a variety of categories, including politics, business, technology, health, and entertainment.
    • Structured Data: The dataset is well-organized, making it easy to filter, analyze, and integrate into your projects.
    • Reliable Source: Sourced directly from Time Magazine, a globally recognized and trusted news outlet.

    Why Choose Our Platform?

    • Ease of Use: Once you gain access to the dataset, it’s easy to integrate into your systems or conduct analysis with minimal effort.
    • Data You Can Trust: With Time Magazine being one of the most credible sources for global news, you’re assured of high-quality, reliable content.
    • Flexible Use Cases: Ideal for news aggregation, sentiment analysis, academic research, and more.

  19. CNBC News Headlines Dataset

    • kaggle.com
    zip
    Updated Sep 7, 2024
    Cite
    Crawl Feeds (2024). CNBC News Headlines Dataset [Dataset]. https://www.kaggle.com/datasets/crawlfeeds/cnbc-news-headlines-dataset
    Explore at:
    zip(1171757 bytes)Available download formats
    Dataset updated
    Sep 7, 2024
    Authors
    Crawl Feeds
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    This dataset contains a collection of 500 news headlines from CNBC, covering a range of topics including business, politics, finance, technology, and more. The dataset is organized in CSV format, providing users with an easy-to-analyze structure. The headlines span a variety of dates and capture some of the most important events in recent news cycles.

    Features:

    • Headline: The title of the news article.
    • Date Published: The date on which the news was published.
    • Category: The section or topic (e.g., business, technology, finance) under which the article was published.
    • description: A brief summary of the news article.

    This dataset can be useful for:

    • Text mining and sentiment analysis of news headlines.
    • Trend analysis of CNBC's coverage on key topics over time.
    • Natural Language Processing (NLP) tasks such as headline generation, categorization, or summarization.
    • Market research to study the influence of certain news topics on market movements.

    The dataset is structured to be friendly for a variety of analysis tasks, making it suitable for beginners, researchers, and data enthusiasts.

    Columns:

    • headline: Title of the news article.
    • date: Publication date in YYYY-MM-DD format.
    • category: The topic/category of the news (e.g., business, finance, politics, etc.).
    • description: Short description of the news article.
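    As a starting point for trend analysis, the columns above can be read and grouped by month and category. The rows in `SAMPLE` are hypothetical, the function name `monthly_category_counts` is illustrative, and the lowercase header names are assumed to match the CSV file exactly.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical rows using the four documented column names.
SAMPLE = """headline,date,category,description
Hypothetical headline A,2024-08-01,business,Summary A
Hypothetical headline B,2024-08-15,technology,Summary B
Hypothetical headline C,2024-09-02,business,Summary C
"""


def monthly_category_counts(handle):
    """Count headlines per (YYYY-MM, category) pair."""
    counts = Counter()
    for row in csv.DictReader(handle):
        # strptime raises ValueError if a date is not in YYYY-MM-DD format.
        published = datetime.strptime(row["date"], "%Y-%m-%d")
        key = (published.strftime("%Y-%m"), row["category"].strip().lower())
        counts[key] += 1
    return counts


counts = monthly_category_counts(io.StringIO(SAMPLE))
for key, value in sorted(counts.items()):
    print(key, value)
```

    Parsing the date strictly, rather than slicing the string, surfaces malformed rows instead of silently miscounting them.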

    Data Source:

    The data is sourced from CNBC's publicly available news articles. It was extracted by the CrawlFeeds team using an in-house tool. For more information, visit CrawlFeeds. This dataset is provided for educational and research purposes.

  20. Leading social networks used for news in the UK 2025

    • statista.com
    Updated Nov 27, 2025
    Cite
    Statista (2025). Leading social networks used for news in the UK 2025 [Dataset]. https://www.statista.com/statistics/297866/digital-news-gateways-used-online-by-age-uk/
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2025 - Feb 2025
    Area covered
    United Kingdom
    Description

    According to data from February 2025, Facebook was the most popular social network for news access in the United Kingdom, with ** percent of respondents using the service. YouTube and X (formerly Twitter) ranked second and third, with ** and ** percent of users respectively using the networks for news content.

