100+ datasets found
  1. cc_news

    • huggingface.co
    Updated Jul 3, 2018
    Cite
    Vladimir Blagojevic (2018). cc_news [Dataset]. https://huggingface.co/datasets/vblagoje/cc_news
    Dataset updated
    Jul 3, 2018
    Authors
    Vladimir Blagojevic
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for CC-News

      Dataset Summary
    

    The CC-News dataset contains news articles from news sites all over the world. The data is available on AWS S3 in the Common Crawl bucket at /crawl-data/CC-NEWS/. This version of the dataset was prepared using news-please, an integrated web crawler and information extractor for news. It contains 708241 English language news articles published between Jan 2017 and December 2019. It represents a small portion of the English… See the full description on the dataset page: https://huggingface.co/datasets/vblagoje/cc_news.

  2. Real & Fake News

    • kaggle.com
    Updated Apr 28, 2025
    Cite
    Ali Raza (2025). Real & Fake News [Dataset]. https://www.kaggle.com/datasets/razanaqvi14/real-and-fake-news
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    Kaggle
    Authors
    Ali Raza
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    📰 Fake News Detection Dataset

    In the digital age, misinformation spreads faster than ever. To combat this challenge, we present a robust dataset crafted for the development and evaluation of machine learning models that can distinguish between real and fake news.

    This dataset is divided into two parts:

    • True.csv – Contains 21,417 verified news articles with four key attributes:

      • title: The headline of the article
      • text: The full body of the news article
      • subject: The category or theme (e.g., politics, world news, etc.)
      • date: The date of publication
    • Fake.csv – Includes 23,481 fabricated news articles with the same structure and attributes as the True dataset.

    🧠 Use Cases:

    • Training NLP models for binary classification (fake vs real)
    • Sentiment and subject analysis of misinformation
    • Exploring linguistic patterns between authentic and deceptive news

    📊 Ideal For:

    • Data science and machine learning learners
    • Researchers focusing on information integrity
    • Developers building news verification tools
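Because the real and fake articles ship in two separate files with the same columns, a first step is usually to merge them into one labeled collection. The following is a minimal sketch using only the Python standard library; the function name `load_labeled` and the `label` key with its values "real" and "fake" are illustrative assumptions, not part of the dataset.

```python
import csv


def load_labeled(true_path, fake_path):
    """Read both CSV files and return one list of row dicts.

    Each row keeps the columns found in the file (title, text, subject,
    date according to the description) and gains a 'label' key.
    The 'label' key and its values are assumptions made for this sketch.
    """
    rows = []
    for path, label in ((true_path, "real"), (fake_path, "fake")):
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["label"] = label
                rows.append(row)
    return rows


# Example call, assuming both files sit in the working directory:
# articles = load_labeled("True.csv", "Fake.csv")
```

Keeping the label as a separate key leaves the original columns untouched, so the merged rows can still be filtered by subject or date.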

  3. Fox News dataset is for analyzing media trends and narratives

    • crawlfeeds.com
    csv, zip
    Updated May 19, 2025
    Cite
    Crawl Feeds (2025). Fox News dataset is for analyzing media trends and narratives [Dataset]. https://crawlfeeds.com/datasets/fox-news-dataset
    Available download formats: zip, csv
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    The Fox News Dataset is a collection of over 1 million news articles for analyzing media narratives, public discourse, and political trends. It covers articles up to the year 2023 and is aimed at researchers, analysts, and businesses studying the topics and trends covered by Fox News.

    Key Features of the Fox News Dataset

    • Extensive Coverage: Contains more than 1 million articles spanning various topics and events up to 2023.
    • Research-Ready: Perfect for text classification, natural language processing (NLP), and other research purposes.
    • Format: Provided in CSV format for seamless integration into analytical and research tools.

    Why Use This Dataset?

    This large dataset is ideal for:

    • Text Classification: Develop machine learning models to classify and categorize news content.
    • Natural Language Processing (NLP): Conduct sentiment analysis, keyword extraction, or topic modeling.
    • Media and Political Research: Analyze media narratives, public opinion, and political trends reflected in Fox News articles.
    • Trend Analysis: Identify shifts in public discourse and media focus over time.


  4. all-the-news-2-1-Component-one

    • huggingface.co
    Updated Jul 2, 2019
    Cite
    Rafael Arias Calles (2019). all-the-news-2-1-Component-one [Dataset]. https://huggingface.co/datasets/rjac/all-the-news-2-1-Component-one
    Dataset updated
    Jul 2, 2019
    Authors
    Rafael Arias Calles
    Description

    2.7 million news articles and essays

      Dataset Description
    

    2.7 million news articles and essays from 27 American publications. Includes date, title, publication, article text, publication name, year, month, and URL (for some). Articles mostly span from 2016 to early 2020.

    Type: CSV
    Size: 3.4 GB compressed, 8.8 GB uncompressed
    Created by: Andrew Thompson
    Date added: 4/3/2020
    Date modified: 4/3/2020
    Source: Component one Datasets 2.7 Millions
    Date of Download and processed:… See the full description on the dataset page: https://huggingface.co/datasets/rjac/all-the-news-2-1-Component-one.

  5. RealNews Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jan 30, 2023
    Cite
    Rowan Zellers; Ari Holtzman; Hannah Rashkin; Yonatan Bisk; Ali Farhadi; Franziska Roesner; Yejin Choi (2023). RealNews Dataset [Dataset]. https://paperswithcode.com/dataset/realnews
    Dataset updated
    Jan 30, 2023
    Authors
    Rowan Zellers; Ari Holtzman; Hannah Rashkin; Yonatan Bisk; Ali Farhadi; Franziska Roesner; Yejin Choi
    Description

    RealNews is a large corpus of news articles from Common Crawl. Data is scraped from Common Crawl, limited to the 5000 news domains indexed by Google News. The authors used the Newspaper Python library to extract the body and metadata from each article. News from Common Crawl dumps from December 2016 through March 2019 were used as training data; articles published in April 2019 from the April 2019 dump were used for evaluation. After deduplication, RealNews is 120 gigabytes without compression.

  6. mirage-news

    • huggingface.co
    Updated Apr 5, 2025
    Cite
    Runsheng Huang (2025). mirage-news [Dataset]. https://huggingface.co/datasets/anson-huang/mirage-news
    Dataset updated
    Apr 5, 2025
    Authors
    Runsheng Huang
    Description

    MiRAGeNews: Multimodal Realistic AI-Generated News Detection

    [Paper] [Github]

    This dataset contains a total of 15,000 pieces of real or AI-generated multimodal news (image-caption pairs): a training set of 10,000 pairs, a validation set of 2,500 pairs, and five test sets of 500 pairs each. Four of the test sets are out-of-domain data from unseen news publishers and image generators, used to evaluate a detector's generalization ability. === Data Source (News Publisher + Image Generator)… See the full description on the dataset page: https://huggingface.co/datasets/anson-huang/mirage-news.

  7. Science and tech news dataset

    • ieee-dataport.org
    Updated Oct 27, 2021
    Cite
    Rajat Thakur (2021). Science and tech news dataset [Dataset]. https://ieee-dataport.org/documents/science-and-tech-news-dataset
    Dataset updated
    Oct 27, 2021
    Authors
    Rajat Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains world news related to science and technology, along with each article's available metadata.

  8. TR-News Dataset

    • paperswithcode.com
    • opendatalab.com
    Cite
    Ehsan Kamalloo; Davood Rafiei, TR-News Dataset [Dataset]. https://paperswithcode.com/dataset/tr-news
    Authors
    Ehsan Kamalloo; Davood Rafiei
    Description

    This dataset is collected from various global and local news sources. Toponyms are manually annotated in the articles with the corresponding entries from GeoNames. In total, the dataset consists of 118 articles.

  9. CT-FAN: A Multilingual dataset for Fake News Detection

    • data.niaid.nih.gov
    Updated Oct 23, 2022
    Cite
    Melanie Siegel (2022). CT-FAN: A Multilingual dataset for Fake News Detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4714516
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    Julia Maria Struß
    Gautam Kishore Shahi
    Juliane Köhler
    Thomas Mandl
    Melanie Siegel
    Michael Wiegand
    Description

    By downloading the data, you agree with the terms & conditions mentioned below:

    Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.

    Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.

    We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.

    Citation

    Please cite our work as

    @InProceedings{clef-checkthat:2022:task3,
      author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
      title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
      year = {2022},
      booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
      series = {CLEF~'2022},
      address = {Bologna, Italy},
    }

    @article{shahi2021overview,
      title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
      author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
      journal={Working Notes of CLEF},
      year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.

    Task 3: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem: given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1264 English-language articles with their respective labels. Our definitions for the categories are as follows:

    False - The main claim made in an article is untrue.

    Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    True - This rating indicates that the primary elements of the main claim are demonstrably true.

    Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
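The category definitions above imply a mapping from the varied ratings used by fact-checking services onto four classes. A minimal sketch of such a mapping follows; the function name `normalize_rating` is hypothetical, the set of "partially false" synonyms is taken only from the examples listed above, and sending every unrecognised rating to "other" is an assumption rather than the organisers' documented rule.

```python
# Ratings that the description groups under "Partially False".
PARTIALLY_FALSE_RATINGS = {
    "partially false",
    "partially true",
    "mostly true",
    "miscaptioned",
    "misleading",
}


def normalize_rating(raw_rating):
    """Map a fact-checker rating string onto one of the four task classes."""
    key = raw_rating.strip().lower()
    if key in PARTIALLY_FALSE_RATINGS:
        return "partially false"
    if key in ("true", "false"):
        return key
    # Assumption: anything else (disputed, unproven, unknown) becomes "other".
    return "other"
```

For example, `normalize_rating("Misleading")` yields "partially false", while a rating such as "unproven" falls through to "other".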

    Cross-Lingual Task (German)

    Along with the multi-class task for the English language, we have introduced a task for a low-resource language. We will provide the test data in German. The idea of the task is to use the English data and transfer learning to build a classification model for the German language.

    Input Data

    The data will be provided in the format of ID, title, text, rating, and domain; the columns are described below:

    ID - Unique identifier of the news article

    Title - Title of the news article

    text - Text mentioned inside the news article

    our rating - Class of the news article: false, partially false, true, or other

    Output data format

    public_id - Unique identifier of the news article

    predicted_rating - Predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true
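A prediction file in the output format above can be produced with the standard `csv` module. This is a sketch only: the function name `write_submission` is hypothetical, and the writer emits plain comma-separated values without the space after each comma shown in the sample, on the assumption that the evaluation accepts standard CSV.

```python
import csv

# The four classes named in the task description.
ALLOWED_RATINGS = {"true", "partially false", "false", "other"}


def write_submission(handle, predictions):
    """Write (public_id, predicted_rating) pairs as CSV to an open text handle.

    Raises ValueError if a rating is not one of the four task classes.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["public_id", "predicted_rating"])
    for public_id, rating in predictions:
        if rating not in ALLOWED_RATINGS:
            raise ValueError(f"unexpected rating: {rating!r}")
        writer.writerow([public_id, rating])


# Example call, writing to a file named submission.csv (name is illustrative):
# with open("submission.csv", "w", newline="", encoding="utf-8") as fh:
#     write_submission(fh, [(1, "false"), (2, "true")])
```

Validating labels before writing catches typos such as "partly false" early, before a file is submitted.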

    IMPORTANT!

    The data spans 2010 to 2022, and the fake news content covers a mix of topics such as elections and COVID-19.

    Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498


  10. fake-news-detection-dataset-English

    • huggingface.co
    Cite
    Erfan Moosavi Monazzah, fake-news-detection-dataset-English [Dataset]. https://huggingface.co/datasets/ErfanMoosaviMonazzah/fake-news-detection-dataset-English
    Authors
    Erfan Moosavi Monazzah
    License

    https://choosealicense.com/licenses/openrail/

    Description

    This is a cleaned and split version of this dataset (https://www.kaggle.com/datasets/sadikaljarif/fake-news-detection-dataset-english).

    Labels:

    • Fake News: 0
    • Real News: 1

    You can find the cleaning script at: https://github.com/ErfanMoosaviMonazzah/Fake-News-Detection

  11. Multilingual Fake News Detection Dataset: Gujarati, Hindi, Marathi, and...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 1, 2024
    Cite
    Patil, Kailas (2024). Multilingual Fake News Detection Dataset: Gujarati, Hindi, Marathi, and Telugu [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11408512
    Dataset updated
    Jun 1, 2024
    Dataset provided by
    Vaibhav, Patil
    Ameya, Pawar
    Parshv, Gandhi
    Abhishek, Chauhan
    Patil, Kailas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is designed to support research in fake news detection across four major Indian languages: Gujarati, Hindi, Marathi, and Telugu. The dataset includes a diverse set of news articles collected from various sources, each labeled as either 'fake' or 'real'. The primary goal is to provide a resource that helps in the development and evaluation of natural language processing (NLP) models capable of detecting fake news in these regional languages.

  12. Reuters-21578 Dataset

    • paperswithcode.com
    Updated Feb 2, 2021
    Cite
    Lewis (2021). Reuters-21578 Dataset [Dataset]. https://paperswithcode.com/dataset/reuters-21578
    Dataset updated
    Feb 2, 2021
    Authors
    Lewis
    Description

    The Reuters-21578 dataset is a collection of documents with news articles. The original corpus has 10,369 documents and a vocabulary of 29,930 words.

  13. Cable TV News Dataset

    • paperswithcode.com
    • opendatalab.com
    Cite
    Cable TV News Dataset [Dataset]. https://paperswithcode.com/dataset/cable-tv-news
    Description

    Cable TV news is a data set of nearly 24/7 video, audio, and text captions from three U.S. cable TV networks (CNN, FOX, and MSNBC) from January 2010 to July 2019. Using machine learning tools, the authors detect faces in 244,038 hours of video, label each face's presented gender, identify prominent public figures, and align text captions to audio.

  14. TamperedNews & News400 (IJMIR'21 Update)

    • data.uni-hannover.de
    partaa, partab +5
    Updated May 17, 2022
    Cite
    TIB (2022). TamperedNews & News400 (IJMIR'21 Update) [Dataset]. https://data.uni-hannover.de/dataset/tamperednews-news400-ijmir21
    Available download formats: tar.gz(36324241), tar.gz(43558405), tar.gz(304532), partac(500000000), partab(500000000), partad(500000000), partaa(500000000), tar(43282945), partad(370561409), tar(10547367), partae(445522586)
    Dataset updated
    May 17, 2022
    Dataset authored and provided by
    TIB
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency

    This repository contains the TamperedNews and News400 datasets introduced in the paper:

    Eric Müller-Budack, Jonas Theiner, Sebastian Diering, Maximilian Idahl, Sherzod Hakimov, and Ralph Ewerth. "Multimodal news analytics using measures of cross-modal entity and context consistency". In: International Journal of Multimedia Information Retrieval 10.2 (2021), Springer, pp. 111–125. DOI: https://doi.org/10.1007/s13735-021-00207-4

    Content

    For both datasets TamperedNews and News400, we provide the:

    • *dataset*.tar.gz containing the *dataset*.jsonl with
      • Web links to the news texts
      • Web links to the news image
      • Outputs of the named entity recognition and disambiguation (NERD) approach
      • Untampered and tampered entities
    • *dataset*_features.tar.gz with visual features for events, locations, and persons
    • news400_wordembeddings.tar.gz: Word embeddings of all nouns in the news texts of the News400 dataset

    Please note that the word embeddings of the TamperedNews dataset (tamperednews_wordembeddings.tar.gz) have been already provided in the first version (Link).

    For all entities detected in both datasets, we provide:

    • entities.tar.gz containing an *entity_type*.jsonl for all entity types (events, locations, and persons) with:
      • Wikidata ID
      • Wikidata label
      • Meta information used for tampering
      • Web links to all reference images crawled from Google, Bing, and Wikidata
    • entities_features.tar.gz containing the visual features of the reference images for all entities
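Since the metadata is distributed as .jsonl files inside .tar.gz archives, the records can be streamed without unpacking the archive to disk. The sketch below uses only the standard library; the function name `iter_jsonl_from_targz` is hypothetical, and no field names are assumed because the record schema is not given in this description.

```python
import io
import json
import tarfile


def iter_jsonl_from_targz(archive_path, member_suffix=".jsonl"):
    """Yield one parsed JSON object per non-empty line of each .jsonl member."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile() or not member.name.endswith(member_suffix):
                continue
            raw = archive.extractfile(member)
            # Wrap the binary stream so lines are decoded as UTF-8 text.
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                line = line.strip()
                if line:
                    yield json.loads(line)


# Example call (archive name is illustrative):
# for record in iter_jsonl_from_targz("news400.tar.gz"):
#     print(sorted(record.keys()))
```

Streaming line by line keeps memory use low, which matters for the larger archives in the download list.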

    Source Code

    The source code to reproduce our results as well as download scripts to crawl news texts and images can be found on our GitHub page: https://github.com/TIBHannover/cross-modal_entity_consistency

  15. Forex News Annotated Dataset for Sentiment Analysis

    • zenodo.org
    • paperswithcode.com
    • +1more
    csv
    Updated Nov 11, 2023
    Cite
    Georgios Fatouros; Kalliopi Kouroumali (2023). Forex News Annotated Dataset for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.7976208
    Available download formats: csv
    Dataset updated
    Nov 11, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Georgios Fatouros; Kalliopi Kouroumali
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains news headlines relevant to key forex pairs: AUDUSD, EURCHF, EURUSD, GBPUSD, and USDJPY. The data was extracted from reputable platforms Forex Live and FXstreet over a period of 86 days, from January to May 2023. The dataset comprises 2,291 unique news headlines. Each headline includes an associated forex pair, timestamp, source, author, URL, and the corresponding article text. Data was collected using web scraping techniques executed via a custom service on a virtual machine. This service periodically retrieves the latest news for a specified forex pair (ticker) from each platform, parsing all available information. The collected data is then processed to extract details such as the article's timestamp, author, and URL. The URL is further used to retrieve the full text of each article. This data acquisition process repeats approximately every 15 minutes.

    To ensure the reliability of the dataset, we manually annotated each headline for sentiment. Instead of solely focusing on the textual content, we ascertained sentiment based on the potential short-term impact of the headline on its corresponding forex pair. This method recognizes the currency market's acute sensitivity to economic news, which significantly influences many trading strategies. As such, this dataset could serve as an invaluable resource for fine-tuning sentiment analysis models in the financial realm.

    We used three categories for annotation: 'positive', 'negative', and 'neutral', which correspond to bullish, bearish, and hold sentiments, respectively, for the forex pair linked to each headline. The following Table provides examples of annotated headlines along with brief explanations of the assigned sentiment.

    Examples of Annotated Headlines

    Forex Pair | Headline | Sentiment | Explanation
    GBPUSD | Diminishing bets for a move to 12400 | Neutral | Lack of strong sentiment in either direction
    GBPUSD | No reasons to dislike Cable in the very near term as long as the Dollar momentum remains soft | Positive | Positive sentiment towards GBPUSD (Cable) in the near term
    GBPUSD | When are the UK jobs and how could they affect GBPUSD | Neutral | Poses a question and does not express a clear sentiment
    JPYUSD | Appropriate to continue monetary easing to achieve 2% inflation target with wage growth | Positive | Monetary easing from Bank of Japan (BoJ) could lead to a weaker JPY in the short term due to increased money supply
    USDJPY | Dollar rebounds despite US data. Yen gains amid lower yields | Neutral | Since both the USD and JPY are gaining, the effects on the USDJPY forex pair might offset each other
    USDJPY | USDJPY to reach 124 by Q4 as the likelihood of a BoJ policy shift should accelerate Yen gains | Negative | USDJPY is expected to reach a lower value, with the USD losing value against the JPY
    AUDUSD | RBA Governor Lowe’s Testimony High inflation is damaging and corrosive | Positive | Reserve Bank of Australia (RBA) expresses concerns about inflation. Typically, central banks combat high inflation with higher interest rates, which could strengthen AUD.

    Moreover, the dataset includes two columns with the predicted sentiment class and score as predicted by the FinBERT model. Specifically, the FinBERT model outputs a set of probabilities for each sentiment class (positive, negative, and neutral), representing the model's confidence in associating the input headline with each sentiment category. These probabilities are used to determine the predicted class and a sentiment score for each headline. The sentiment score is computed by subtracting the negative class probability from the positive one.
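The score described above is a simple difference of two class probabilities. A minimal sketch follows; the function name `summarize_sentiment` and the dictionary layout are illustrative, and picking the class with the highest probability as the predicted class is an assumption, since the description only says the probabilities "are used to determine" it.

```python
def summarize_sentiment(probabilities):
    """Return (predicted_class, sentiment_score) from class probabilities.

    `probabilities` maps 'positive', 'negative', and 'neutral' to floats.
    The score is positive minus negative, as stated in the description;
    the arg-max rule for the class is an assumption of this sketch.
    """
    predicted_class = max(probabilities, key=probabilities.get)
    sentiment_score = probabilities["positive"] - probabilities["negative"]
    return predicted_class, sentiment_score


example = {"positive": 0.5, "negative": 0.25, "neutral": 0.25}
print(summarize_sentiment(example))
```

Under this definition the score ranges from -1 (certainly negative) to 1 (certainly positive), with a confidently neutral headline landing near 0.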

  16. fake-news

    • huggingface.co
    Updated Dec 25, 2021
    + more versions
    Cite
    Gagan Bhatia (2021). fake-news [Dataset]. https://huggingface.co/datasets/gagan3012/fake-news
    Dataset updated
    Dec 25, 2021
    Authors
    Gagan Bhatia
    Description

    gagan3012/fake-news dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. Global News Index and Extracted Features Repository (v.1.2.0)

    • databank.illinois.edu
    Updated Mar 5, 2025
    + more versions
    Cite
    (2025). Global News Index and Extracted Features Repository (v.1.2.0) [Dataset]. http://doi.org/10.13012/B2IDB-5649852_V5
    Dataset updated
    Mar 5, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence.

    Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically-savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful for users as they refine their queries.

    Additional Resources:

    • Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, please fill out the Archer Access Request Form so we can determine if you are eligible for access or not.
    • Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the Archer User Feedback Form.
    • The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this form to subscribe to it.

    Citation Guidelines:

    1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer) please use the following citation: Cline Center for Advanced Social Research. 2023. Global News Index and Extracted Features Repository [codebook], v1.2.0. Champaign, IL: University of Illinois. June. XX. doi:10.13012/B2IDB-5649852_V5
    2) To cite data from the Global News Index (accessed via Archer or otherwise) please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2023. Global News Index and Extracted Features Repository [database], v1.2.0. Champaign, IL: University of Illinois. Jun. XX. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V5

    *NOTE: V4 is suppressed and V5 is replacing V4 with updated ‘Archer’ documents.

  18. Popular News articles

    • apitube.io
    Updated Oct 2, 2024
    Cite
    APITube.io (2024). Popular News articles [Dataset]. https://apitube.io/free-datasets/popular-news-articles
    Dataset updated
    Oct 2, 2024
    Dataset authored and provided by
    APITube
    License

    https://www.apache.org/licenses/LICENSE-2.0

    Time period covered
    Jan 1, 2020 - Present
    Area covered
    Global
    Variables measured
    Category, Language, Sentiment, News Content, News Sources, News Headlines, Publication Date, Geographic Location
    Description

    A dataset of popular news articles from various sources. Crawled date: Oct, 2024. Documents count: 12,000.

  19. Fake and Real News Dataset

    • kaggle.com
    Updated Dec 3, 2024
    Gilchrist (2024). Fake and Real News Dataset [Dataset]. https://www.kaggle.com/datasets/gilchr/fake-and-real-news-dataset
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gilchrist
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Title: Fake vs Real News Dataset

    Description:

    This dataset contains news articles classified into two categories: real and fake. It is designed to help researchers, data scientists, and students build and test machine learning models capable of detecting fake news.

    Dataset Structure:

    • Columns:
      • title: The title of the news article.
      • content: The full content of the news article (raw text).
      • target: A binary label indicating the authenticity of the news:
        • 0: Real news.
        • 1: Fake news.
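
    The column layout above can be sketched as a small loader. This is a minimal illustration, not the author's code: the CSV format, the `load_rows` helper, and the `LABELS` mapping are assumptions made here, and the two rows are the ones shown in the card's sample data.

```python
import csv
import io
from collections import Counter

# Two rows taken from the card's sample data. The CSV layout is an
# assumption; the card does not state the file format.
SAMPLE_CSV = """title,content,target
NASA announces new Mars rover mission,NASA revealed plans for a new mission to Mars starting in 2025.,0
Vaccines implant 5G chips,Conspiracy theorists claim vaccines are used to implant 5G tracking.,1
"""

# Label meaning as described in the card: 0 is real news, 1 is fake news.
LABELS = {0: "real", 1: "fake"}


def load_rows(handle):
    """Parse rows with the columns title, content, target; reject unknown labels."""
    rows = []
    for record in csv.DictReader(handle):
        target = int(record["target"])
        if target not in LABELS:
            raise ValueError(f"unexpected target {target!r}")
        rows.append(
            {"title": record["title"], "content": record["content"], "target": target}
        )
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
# Count how many articles fall under each label to check class balance.
label_counts = Counter(LABELS[row["target"]] for row in rows)
print(label_counts)
```

    Checking the label counts first is a cheap way to spot class imbalance before training a binary classifier.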

    Objective:

    The primary goals of this dataset are to:

    • Provide a resource for training and evaluating binary classification models.
    • Enable experiments on Natural Language Processing (NLP), such as text vectorization, sentiment analysis, and more.
    • Encourage exploration of approaches to identify biases in data related to fake news detection.

    Data Sources:

    Sample Data:

    title | content | target
    NASA announces new Mars rover mission | NASA revealed plans for a new mission to Mars starting in 2025. | 0
    Vaccines implant 5G chips | Conspiracy theorists claim vaccines are used to implant 5G tracking. | 1

    Potential Use Cases:

    1. Train classification models to predict the authenticity of news articles.
    2. Test NLP pipelines, such as those based on CountVectorizer, TF-IDF, or advanced models like BERT.
    3. Study trends in fake news: topics, keywords, and linguistic patterns.
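
    As a rough illustration of the TF-IDF vectorization mentioned in use case 2, the sketch below computes weights by hand for the two sample articles. It assumes one common textbook variant (term frequency times log(N / document frequency)); the `tokenize` and `tfidf` helpers are hypothetical names introduced here, and library implementations typically apply smoothing and normalization that this sketch omits.

```python
import math
import re
from collections import Counter

# Contents of the two sample articles from the card.
docs = [
    "NASA revealed plans for a new mission to Mars starting in 2025.",
    "Conspiracy theorists claim vaccines are used to implant 5G tracking.",
]


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


weights = tfidf(docs)
# Terms that occur in every document get weight 0; rarer terms score higher.
for term, weight in sorted(weights[0].items(), key=lambda item: -item[1])[:3]:
    print(term, round(weight, 4))
```

    With only two documents, the word "to" appears in both and receives a weight of zero, which is the behaviour that makes TF-IDF down-weight uninformative terms.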

    Caution:

    • This dataset is provided for educational and research purposes only.
    • Model results should be interpreted carefully and not used for critical applications without thorough validation.
  20. Iran news dataset

    • kaggle.com
    Updated Oct 27, 2023
    Mohamad Shirzad (2023). Iran news dataset [Dataset]. https://www.kaggle.com/datasets/mohamadshirzad/iran-news-dataset
    Dataset updated
    Oct 27, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohamad Shirzad
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Area covered
    Iran
    Description

    Data on housing-related news in Iran. The data was collected from https://khabaronline.ir

Vladimir Blagojevic (2018). cc_news [Dataset]. https://huggingface.co/datasets/vblagoje/cc_news

164 scholarly articles cite this dataset (View in Google Scholar)
