100+ datasets found

cc_news
huggingface.co
Updated Jul 3, 2018
Vladimir Blagojevic (2018). cc_news [Dataset]. https://huggingface.co/datasets/vblagoje/cc_news
Dataset updated
Jul 3, 2018
Authors
Vladimir Blagojevic
License
Unknown: https://choosealicense.com/licenses/unknown/
Description
Dataset Card for CC-News

Dataset Summary

The CC-News dataset contains news articles from news sites all over the world. The data is available on AWS S3 in the Common Crawl bucket at /crawl-data/CC-NEWS/. This version of the dataset was prepared using news-please, an integrated web crawler and information extractor for news. It contains 708,241 English-language news articles published between January 2017 and December 2019. It represents a small portion of the English… See the full description on the dataset page: https://huggingface.co/datasets/vblagoje/cc_news.
Real & Fake News
kaggle.com
Updated Apr 28, 2025
Ali Raza (2025). Real & Fake News [Dataset]. https://www.kaggle.com/datasets/razanaqvi14/real-and-fake-news
Dataset updated
Apr 28, 2025
Dataset provided by
Kaggle
Authors
Ali Raza
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
📰 Fake News Detection Dataset

In the digital age, misinformation spreads faster than ever. To combat this challenge, we present a robust dataset crafted for the development and evaluation of machine learning models that can distinguish between real and fake news.

This dataset is divided into two parts:

True.csv – Contains 21,417 verified news articles with four key attributes:

title: The headline of the article

text: The full body of the news article

subject: The category or theme (e.g., politics, world news, etc.)

date: The date of publication

Fake.csv – Includes 23,481 fabricated news articles with the same structure and attributes as the True dataset.
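Because True.csv and Fake.csv share the same four columns, a common first step is to stack them into one labeled table. The sketch below does this with Python's standard `csv` module. The 0 = real / 1 = fake convention, the added `label` column, the function name, and the in-memory sample rows are all assumptions for illustration, not part of the dataset.

```python
import csv
import io

# Hypothetical convention for this sketch: 0 = real (True.csv), 1 = fake (Fake.csv).
LABEL_FOR_FILE = {"True.csv": 0, "Fake.csv": 1}


def merge_labeled(streams):
    """Read each per-class CSV and return all rows with an added integer "label" field.

    `streams` maps a file name ("True.csv" or "Fake.csv") to an open text stream
    whose header is title,text,subject,date.
    """
    rows = []
    for name, stream in streams.items():
        for row in csv.DictReader(stream):
            row["label"] = LABEL_FOR_FILE[name]
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Made-up one-row stand-ins; for the real files use open(path, newline="", encoding="utf-8").
    true_csv = io.StringIO("title,text,subject,date\nHeadline A,Body A,politics,2017-12-31\n")
    fake_csv = io.StringIO("title,text,subject,date\nHeadline B,Body B,News,2017-12-30\n")
    merged = merge_labeled({"True.csv": true_csv, "Fake.csv": fake_csv})
    print([(row["title"], row["label"]) for row in merged])
```

The merged rows arrive grouped by class, so shuffle them before splitting into training and test sets.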

🧠 Use Cases:
- Training NLP models for binary classification (fake vs. real)
- Sentiment and subject analysis of misinformation
- Exploring linguistic patterns between authentic and deceptive news

📊 Ideal For:
- Data science and machine learning learners
- Researchers focusing on information integrity
- Developers building news verification tools
Fox News dataset is for analyzing media trends and narratives
crawlfeeds.com
csv, zip
Updated May 19, 2025
Crawl Feeds (2025). Fox News dataset is for analyzing media trends and narratives [Dataset]. https://crawlfeeds.com/datasets/fox-news-dataset
Available download formats: zip, csv
Dataset updated
May 19, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
The Fox News Dataset is a comprehensive collection of over 1 million news articles, offering an unparalleled resource for analyzing media narratives, public discourse, and political trends. Covering articles up to the year 2023, this dataset is a treasure trove for researchers, analysts, and businesses interested in gaining deeper insights into the topics and trends covered by Fox News.

Key Features of the Fox News Dataset

Extensive Coverage: Contains more than 1 million articles spanning various topics and events up to 2023.

Research-Ready: Perfect for text classification, natural language processing (NLP), and other research purposes.

Format: Provided in CSV format for seamless integration into analytical and research tools.

Why Use This Dataset?

This large dataset is ideal for:

Text Classification: Develop machine learning models to classify and categorize news content.

Natural Language Processing (NLP): Conduct sentiment analysis, keyword extraction, or topic modeling.

Media and Political Research: Analyze media narratives, public opinion, and political trends reflected in Fox News articles.

Trend Analysis: Identify shifts in public discourse and media focus over time.

Explore More News Datasets

Discover additional resources for your research needs by visiting our news dataset collection. These datasets are tailored to support diverse analytical applications, including sentiment analysis and trend modeling.

The Fox News Dataset is a must-have for anyone interested in exploring large-scale media data and leveraging it for advanced analysis. Ready to dive into this wealth of information? Download the dataset now in CSV format and start uncovering the stories behind the headlines.
all-the-news-2-1-Component-one
huggingface.co
Updated Jul 2, 2019
Rafael Arias Calles (2019). all-the-news-2-1-Component-one [Dataset]. https://huggingface.co/datasets/rjac/all-the-news-2-1-Component-one
Dataset updated
Jul 2, 2019
Authors
Rafael Arias Calles
Description
2.7 million news articles and essays

Dataset Description

2.7 million news articles and essays from 27 American publications. Includes date, title, publication, article text, publication name, year, month, and URL (for some). Articles mostly span from 2016 to early 2020.

Type: CSV
Size: 3.4 GB compressed, 8.8 GB uncompressed
Created by: Andrew Thompson
Date added: 4/3/2020
Date modified: 4/3/2020
Source: Component one Datasets 2.7 Millions
Date of Download and processed:… See the full description on the dataset page: https://huggingface.co/datasets/rjac/all-the-news-2-1-Component-one.
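At 8.8 GB uncompressed, the CSV is too large to load comfortably into memory on many machines. A minimal sketch, assuming a `publication` column as suggested by the field list above (check the real header first), streams the file row by row with the standard `csv` module and counts articles per outlet. The function name and the sample rows are hypothetical. The raised field-size limit is a precaution: long article bodies can exceed the module's default per-field limit of 131,072 characters.

```python
import csv
import io
from collections import Counter

# Article bodies can exceed csv's default per-field limit (131,072 characters).
# 2**31 - 1 fits in a C long on all common platforms, unlike sys.maxsize on Windows.
csv.field_size_limit(2**31 - 1)


def count_by_publication(stream, column="publication"):
    """Count rows per value of `column`, reading one row at a time.

    csv.DictReader is lazy, so memory use does not grow with file size.
    Empty or absent values are counted under "<missing>".
    """
    counts = Counter()
    for row in csv.DictReader(stream):
        counts[row.get(column) or "<missing>"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample rows standing in for the real file. For the actual data, pass e.g.
    # open("all-the-news-2-1.csv", newline="", encoding="utf-8")  (hypothetical file name).
    sample = io.StringIO(
        "date,title,publication\n"
        "2016-01-01,A,Reuters\n"
        "2016-01-02,B,Reuters\n"
        "2016-01-03,C,Vox\n"
    )
    print(count_by_publication(sample).most_common())
```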
RealNews Dataset
paperswithcode.com
opendatalab.com
Updated Jan 30, 2023
Rowan Zellers; Ari Holtzman; Hannah Rashkin; Yonatan Bisk; Ali Farhadi; Franziska Roesner; Yejin Choi (2023). RealNews Dataset [Dataset]. https://paperswithcode.com/dataset/realnews
Dataset updated
Jan 30, 2023
Authors
Rowan Zellers; Ari Holtzman; Hannah Rashkin; Yonatan Bisk; Ali Farhadi; Franziska Roesner; Yejin Choi
Description
RealNews is a large corpus of news articles from Common Crawl. Data was scraped from Common Crawl, limited to the 5,000 news domains indexed by Google News. The authors used the Newspaper Python library to extract the body and metadata from each article. Articles from Common Crawl dumps from December 2016 through March 2019 were used as training data; articles published in April 2019 from the April 2019 dump were used for evaluation. After deduplication, RealNews is 120 gigabytes without compression.
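The train/evaluation split described above is temporal: training uses whole Common Crawl dumps from December 2016 through March 2019, while evaluation uses only April 2019 articles from the April 2019 dump. A minimal sketch of that rule, assuming each article record carries its dump month as a (year, month) pair and its publication date; the function and constant names are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple

# Windows taken from the description; encoding a dump as (year, month) is an assumption.
TRAIN_DUMPS = ((2016, 12), (2019, 3))  # inclusive range of dumps used for training
EVAL_MONTH = (2019, 4)                 # April 2019 dump, April 2019 publications


def realnews_split(dump: Tuple[int, int], published: date) -> Optional[str]:
    """Return "train", "eval", or None for an article, given its dump month and publication date."""
    # Tuples compare element by element, so (year, month) pairs order chronologically.
    if TRAIN_DUMPS[0] <= dump <= TRAIN_DUMPS[1]:
        return "train"
    if dump == EVAL_MONTH and (published.year, published.month) == EVAL_MONTH:
        return "eval"
    return None


if __name__ == "__main__":
    print(realnews_split((2018, 7), date(2018, 6, 30)))  # falls in the training dumps
    print(realnews_split((2019, 4), date(2019, 4, 12)))  # April 2019 dump and publication date
    print(realnews_split((2019, 4), date(2019, 3, 30)))  # April dump, but published in March
```

Keeping evaluation articles strictly later than the training dumps prevents a model from being scored on news it could have seen during training.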
mirage-news
huggingface.co
Updated Apr 5, 2025
Runsheng Huang (2025). mirage-news [Dataset]. https://huggingface.co/datasets/anson-huang/mirage-news
Dataset updated
Apr 5, 2025
Authors
Runsheng Huang
Description
MiRAGeNews: Multimodal Realistic AI-Generated News Detection

This dataset contains a total of 15,000 pieces of real or AI-generated multimodal news (image-caption pairs): a training set of 10,000 pairs, a validation set of 2,500 pairs, and five test sets of 500 pairs each. Four of the test sets are out-of-domain data from unseen news publishers and image generators, used to evaluate detectors' generalization ability. Data Source (News Publisher + Image Generator)… See the full description on the dataset page: https://huggingface.co/datasets/anson-huang/mirage-news.
Science and tech news dataset
ieee-dataport.org
Updated Oct 27, 2021
Rajat Thakur (2021). Science and tech news dataset [Dataset]. https://ieee-dataport.org/documents/science-and-tech-news-dataset
Dataset updated
Oct 27, 2021
Authors
Rajat Thakur
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains world news articles related to science and technology, along with each article's available metadata.
TR-News Dataset
paperswithcode.com
opendatalab.com
Ehsan Kamalloo; Davood Rafiei, TR-News Dataset [Dataset]. https://paperswithcode.com/dataset/tr-news
Authors
Ehsan Kamalloo; Davood Rafiei
Description
This dataset was collected from various global and local news sources. Toponyms are manually annotated in the articles with the corresponding entries from GeoNames. In total, the dataset consists of 118 articles.
CT-FAN: A Multilingual dataset for Fake News Detection
data.niaid.nih.gov
Updated Oct 23, 2022
Melanie Siegel (2022). CT-FAN: A Multilingual dataset for Fake News Detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4714516
Dataset updated
Oct 23, 2022
Dataset provided by
Julia Maria Struß
Gautam Kishore Shahi
Juliane Köhler
Thomas Mandl
Melanie Siegel
Michael Wiegand
Description
By downloading the data, you agree with the terms & conditions mentioned below:

Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.

Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.

We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.

Citation

Please cite our work as

@InProceedings{clef-checkthat:2022:task3, author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas}, title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection", year = {2022}, booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum", series = {CLEF~'2022}, address = {Bologna, Italy},}

@article{shahi2021overview, title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection}, author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas}, journal={Working Notes of CLEF}, year={2021} }

Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.

Task 3: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem: given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches, totaling roughly 1,264 labeled articles in English. Our definitions for the categories are as follows:

False - The main claim made in an article is untrue.

Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

True - This rating indicates that the primary elements of the main claim are demonstrably true.

Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
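Because these categories merge many fact-checker verdicts into four classes, raw verdicts need to be normalized before training. A minimal sketch of such a mapping, using only the verdicts named in the definitions above; the lookup table and function name are hypothetical, and any unlisted verdict is returned as None for manual review rather than guessed.

```python
from typing import Optional

# Hypothetical lookup built only from the verdicts named in the category definitions.
RATING_TO_CLASS = {
    "false": "false",
    "true": "true",
    "partially false": "partially false",
    "partially true": "partially false",
    "mostly true": "partially false",
    "miscaptioned": "partially false",
    "misleading": "partially false",
    "in dispute": "other",
    "unproven": "other",
}


def normalize_rating(raw: str) -> Optional[str]:
    """Map a raw fact-checker verdict to one of the four classes, or None if it is not listed."""
    return RATING_TO_CLASS.get(raw.strip().lower())


if __name__ == "__main__":
    for verdict in ["Mostly True", "Miscaptioned", "Unproven", "Pants on Fire"]:
        print(f"{verdict!r} -> {normalize_rating(verdict)}")
```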

Cross-Lingual Task (German)

Along with the multi-class task for the English language, we have introduced a task for low-resourced language. We will provide the data for the test in the German language. The idea of the task is to use the English data and the concept of transfer to build a classification model for the German language.

Input Data

The data will be provided in the format ID, title, text, rating, domain; the columns are described as follows:

ID - Unique identifier of the news article

Title - Title of the news article

text - Full text of the news article

our rating - Class of the news article: false, partially false, true, or other

Output data format

public_id - Unique identifier of the news article

predicted_rating - Predicted class

Sample File

public_id, predicted_rating
1, false
2, true
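A minimal sketch of producing a file in this output format with Python's `csv` module, rejecting any label outside the four classes before anything is written. The function name is hypothetical. The sample file shows a space after each comma; this sketch writes standard CSV without it, and whether the official scorer requires either form is not stated here.

```python
import csv
import io

CLASSES = {"false", "partially false", "true", "other"}


def write_submission(predictions, stream):
    """Write a header plus one (public_id, predicted_rating) row per prediction.

    All labels are checked first, so an invalid label raises before any row is written.
    """
    rows = list(predictions)
    invalid = [(pid, rating) for pid, rating in rows if rating not in CLASSES]
    if invalid:
        raise ValueError(f"labels outside the four classes: {invalid}")
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["public_id", "predicted_rating"])
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_submission([(1, "false"), (2, "true")], buffer)
    print(buffer.getvalue(), end="")
```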

IMPORTANT!

The data covers 2010 to 2022, and the fake news content spans several topics, such as elections and COVID-19.

Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498

Related Work

Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf

G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14

Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104

Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.

Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.

Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.
fake-news-detection-dataset-English
huggingface.co
Erfan Moosavi Monazzah, fake-news-detection-dataset-English [Dataset]. https://huggingface.co/datasets/ErfanMoosaviMonazzah/fake-news-detection-dataset-English
Authors
Erfan Moosavi Monazzah
License
OpenRAIL: https://choosealicense.com/licenses/openrail/
Description
This is a cleaned and split version of this dataset: https://www.kaggle.com/datasets/sadikaljarif/fake-news-detection-dataset-english

Labels:
- Fake News: 0
- Real News: 1

You can find the cleansing script at: https://github.com/ErfanMoosaviMonazzah/Fake-News-Detection
Multilingual Fake News Detection Dataset: Gujarati, Hindi, Marathi, and...
data.niaid.nih.gov
zenodo.org
Updated Jun 1, 2024
Patil, Kailas (2024). Multilingual Fake News Detection Dataset: Gujarati, Hindi, Marathi, and Telugu [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11408512
Dataset updated
Jun 1, 2024
Dataset provided by
Vaibhav, Patil
Ameya, Pawar
Parshv, Gandhi
Abhishek, Chauhan
Patil, Kailas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is designed to support research in fake news detection across four major Indian languages: Gujarati, Hindi, Marathi, and Telugu. The dataset includes a diverse set of news articles collected from various sources, each labeled as either 'fake' or 'real'. The primary goal is to provide a resource that helps in the development and evaluation of natural language processing (NLP) models capable of detecting fake news in these regional languages.
Reuters-21578 Dataset
paperswithcode.com
Updated Feb 2, 2021
Lewis (2021). Reuters-21578 Dataset [Dataset]. https://paperswithcode.com/dataset/reuters-21578
Dataset updated
Feb 2, 2021
Authors
Lewis
Description
The Reuters-21578 dataset is a collection of documents with news articles. The original corpus has 10,369 documents and a vocabulary of 29,930 words.
Cable TV News Dataset
paperswithcode.com
opendatalab.com
Cable TV News Dataset [Dataset]. https://paperswithcode.com/dataset/cable-tv-news
Description
Cable TV news is a data set of nearly 24/7 video, audio, and text captions from three U.S. cable TV networks (CNN, FOX, and MSNBC) from January 2010 to July 2019. Using machine learning tools, the authors detect faces in 244,038 hours of video, label each face's presented gender, identify prominent public figures, and align text captions to audio.
TamperedNews & News400 (IJMIR'21 Update)
data.uni-hannover.de
Updated May 17, 2022
TIB (2022). TamperedNews & News400 (IJMIR'21 Update) [Dataset]. https://data.uni-hannover.de/dataset/tamperednews-news400-ijmir21
Available download formats: tar.gz (36324241), tar.gz (43558405), tar.gz (304532), partac (500000000), partab (500000000), partad (500000000), partaa (500000000), tar (43282945), partad (370561409), tar (10547367), partae (445522586)
Dataset updated
May 17, 2022
Dataset authored and provided by
TIB
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency

This repository contains the TamperedNews and News400 datasets introduced in the paper:

Eric Müller-Budack, Jonas Theiner, Sebastian Diering, Maximilian Idahl, Sherzod Hakimov and Ralph Ewerth. "Multimodal news analytics using measures of cross-modal entity and context consistency". In: International Journal of Multimedia Information Retrieval 10.2 (2021), Springer, pp. 111–125. DOI: https://doi.org/10.1007/s13735-021-00207-4

Content

For both datasets TamperedNews and News400, we provide the:

*dataset*.tar.gz containing the *dataset*.jsonl with:

Web links to the news texts

Web links to the news image

Outputs of the named entity recognition and disambiguation (NERD) approach

Untampered and tampered entities

*dataset*_features.tar.gz with visual features for events, locations, and persons

news400_wordembeddings.tar.gz: Word embeddings of all nouns in the news texts of the News400 dataset

Please note that the word embeddings of the TamperedNews dataset (tamperednews_wordembeddings.tar.gz) were already provided in the first version of this dataset.

For all entities detected in both datasets, we provide:

entities.tar.gz containing an *entity_type*.jsonl for all entity types (events, locations, and persons) with:

Wikidata ID

Wikidata label

Meta information used for tampering

Web links to all reference images crawled from Google, Bing, and Wikidata

entities_features.tar.gz containing the visual features of the reference images for all entities
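The entity files above are JSON Lines documents packed in gzip-compressed tar archives. A minimal sketch for reading them with only the standard library (`tarfile`, `json`), assuming the archive layout described above; since the field names inside each record are not documented here, the function yields raw parsed records. The function name is hypothetical.

```python
import io
import json
import tarfile
from typing import Any, Iterator, Tuple


def iter_jsonl_members(archive_path: str, suffix: str = ".jsonl") -> Iterator[Tuple[str, Any]]:
    """Yield (member name, parsed record) for each line of each .jsonl file in a .tar.gz archive.

    Lines are decoded and parsed one at a time, so a large member is never held in memory whole.
    """
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(suffix)):
                continue
            raw = tar.extractfile(member)
            if raw is None:  # extractfile returns None for non-regular members
                continue
            with io.TextIOWrapper(raw, encoding="utf-8") as text:
                for line in text:
                    if line.strip():  # skip blank lines
                        yield member.name, json.loads(line)
```

For example, iterating over `iter_jsonl_members("entities.tar.gz")` would visit the records of every *entity_type*.jsonl file in that archive, tagged with the file they came from.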

Source Code

The source code to reproduce our results as well as download scripts to crawl news texts and images can be found on our GitHub page: https://github.com/TIBHannover/cross-modal_entity_consistency
Forex News Annotated Dataset for Sentiment Analysis
zenodo.org
paperswithcode.com
csv
Updated Nov 11, 2023
Georgios Fatouros; Kalliopi Kouroumali (2023). Forex News Annotated Dataset for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.7976208
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.7976208
Dataset updated
Nov 11, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Georgios Fatouros; Kalliopi Kouroumali
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains news headlines relevant to key forex pairs: AUDUSD, EURCHF, EURUSD, GBPUSD, and USDJPY. The data was extracted from reputable platforms Forex Live and FXstreet over a period of 86 days, from January to May 2023. The dataset comprises 2,291 unique news headlines. Each headline includes an associated forex pair, timestamp, source, author, URL, and the corresponding article text. Data was collected using web scraping techniques executed via a custom service on a virtual machine. This service periodically retrieves the latest news for a specified forex pair (ticker) from each platform, parsing all available information. The collected data is then processed to extract details such as the article's timestamp, author, and URL. The URL is further used to retrieve the full text of each article. This data acquisition process repeats approximately every 15 minutes.

To ensure the reliability of the dataset, we manually annotated each headline for sentiment. Instead of solely focusing on the textual content, we ascertained sentiment based on the potential short-term impact of the headline on its corresponding forex pair. This method recognizes the currency market's acute sensitivity to economic news, which significantly influences many trading strategies. As such, this dataset could serve as an invaluable resource for fine-tuning sentiment analysis models in the financial realm.

We used three categories for annotation: 'positive', 'negative', and 'neutral', which correspond to bullish, bearish, and hold sentiments, respectively, for the forex pair linked to each headline. The following table provides examples of annotated headlines along with brief explanations of the assigned sentiment.

Examples of Annotated Headlines

Forex Pair	Headline	Sentiment	Explanation
GBPUSD	Diminishing bets for a move to 12400	Neutral	Lack of strong sentiment in either direction
GBPUSD	No reasons to dislike Cable in the very near term as long as the Dollar momentum remains soft	Positive	Positive sentiment towards GBPUSD (Cable) in the near term
GBPUSD	When are the UK jobs and how could they affect GBPUSD	Neutral	Poses a question and does not express a clear sentiment
JPYUSD	Appropriate to continue monetary easing to achieve 2% inflation target with wage growth	Positive	Monetary easing from Bank of Japan (BoJ) could lead to a weaker JPY in the short term due to increased money supply
USDJPY	Dollar rebounds despite US data. Yen gains amid lower yields	Neutral	Since both the USD and JPY are gaining, the effects on the USDJPY forex pair might offset each other
USDJPY	USDJPY to reach 124 by Q4 as the likelihood of a BoJ policy shift should accelerate Yen gains	Negative	USDJPY is expected to reach a lower value, with the USD losing value against the JPY
AUDUSD	RBA Governor Lowe’s Testimony High inflation is damaging and corrosive	Positive	Reserve Bank of Australia (RBA) expresses concerns about inflation. Typically, central banks combat high inflation with higher interest rates, which could strengthen AUD.

Moreover, the dataset includes two columns with the predicted sentiment class and score as predicted by the FinBERT model. Specifically, the FinBERT model outputs a set of probabilities for each sentiment class (positive, negative, and neutral), representing the model's confidence in associating the input headline with each sentiment category. These probabilities are used to determine the predicted class and a sentiment score for each headline. The sentiment score is computed by subtracting the negative class probability from the positive one.
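The score described above is a simple difference of two probabilities, so it ranges from -1 (entirely negative) to 1 (entirely positive). A minimal sketch of deriving both columns from FinBERT-style class probabilities, assuming the predicted class is the most probable one (the description says only that the probabilities are used to determine it). Running FinBERT itself is out of scope here, and the function name is hypothetical.

```python
from typing import Dict, Tuple

CLASSES = ("positive", "negative", "neutral")


def class_and_score(probs: Dict[str, float]) -> Tuple[str, float]:
    """Return (predicted class, sentiment score) from per-class probabilities.

    predicted class: the most probable class (an assumption; ties go to the first in CLASSES)
    sentiment score: P(positive) - P(negative), as in the description, so it lies in [-1, 1]
    """
    missing = set(CLASSES) - set(probs)
    if missing:
        raise ValueError(f"missing probabilities for: {sorted(missing)}")
    predicted = max(CLASSES, key=lambda c: probs[c])
    score = probs["positive"] - probs["negative"]
    return predicted, score


if __name__ == "__main__":
    # Made-up probabilities for one headline (they sum to 1, as softmax outputs would).
    print(class_and_score({"positive": 0.70, "negative": 0.10, "neutral": 0.20}))
```

Note that a headline can be classified "neutral" while still having a non-zero score, since the score ignores the neutral probability.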
fake-news
huggingface.co
Updated Dec 25, 2021
Gagan Bhatia (2021). fake-news [Dataset]. https://huggingface.co/datasets/gagan3012/fake-news
Dataset updated
Dec 25, 2021
Authors
Gagan Bhatia
Description
gagan3012/fake-news dataset hosted on Hugging Face and contributed by the HF Datasets community
Global News Index and Extracted Features Repository (v.1.2.0)
databank.illinois.edu
Updated Mar 5, 2025
(2025). Global News Index and Extracted Features Repository (v.1.2.0) [Dataset]. http://doi.org/10.13012/B2IDB-5649852_V5
Unique identifier
https://doi.org/10.13012/B2IDB-5649852_V5
Dataset updated
Mar 5, 2025
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence.

Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can help users refine their queries.

Additional Resources:
- Access to Archer and the Global News Index is limited to account holders. If you are interested in signing up for an account, please fill out the Archer Access Request Form so we can determine whether you are eligible for access.
- Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the Archer User Feedback Form.
- The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this form to subscribe to it.

Citation Guidelines:
1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer), please use the following citation: Cline Center for Advanced Social Research. 2023. Global News Index and Extracted Features Repository [codebook], v1.2.0. Champaign, IL: University of Illinois. June. XX. doi:10.13012/B2IDB-5649852_V5
2) To cite data from the Global News Index (accessed via Archer or otherwise), please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2023. Global News Index and Extracted Features Repository [database], v1.2.0. Champaign, IL: University of Illinois. Jun. XX. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V5

*NOTE: V4 is suppressed and V5 is replacing V4 with updated ‘Archer’ documents.
Popular News articles
apitube.io
Updated Oct 2, 2024
APITube.io (2024). Popular News articles [Dataset]. https://apitube.io/free-datasets/popular-news-articles
Dataset updated
Oct 2, 2024
Dataset authored and provided by
APITube
License
Apache License 2.0: https://www.apache.org/licenses/LICENSE-2.0
Time period covered
Jan 1, 2020 - Present
Area covered
Global
Variables measured
Category, Language, Sentiment, News Content, News Sources, News Headlines, Publication Date, Geographic Location
Description
A dataset of popular news articles from various sources. Crawled: October 2024. Document count: 12,000.

Fake and Real News Dataset

kaggle.com

Updated Dec 3, 2024

Gilchrist (2024). Fake and Real News Dataset [Dataset]. https://www.kaggle.com/datasets/gilchr/fake-and-real-news-dataset

Dataset updated

Dec 3, 2024

Dataset provided by

Kaggle: http://kaggle.com/

Authors

Gilchrist

License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

Title: Fake vs Real News Dataset

Description:

This dataset contains news articles classified into two categories: real and fake. It is designed to help researchers, data scientists, and students build and test machine learning models capable of detecting fake news.

Dataset Structure:

Columns:
- title: The title of the news article.
- content: The full content of the news article (raw text).
- target: A binary label indicating the authenticity of the news:
  - 0: Real news.
  - 1: Fake news.

Objective:

The primary goals of this dataset are to:
- Provide a resource for training and evaluating binary classification models.
- Enable Natural Language Processing (NLP) experiments, such as text vectorization, sentiment analysis, and more.
- Encourage exploration of approaches to identify biases in data related to fake news detection.

Data Sources:

This dataset was created by merging two existing CSV files containing fake and real news articles, respectively: https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset?select=Fake.csv

Sample Data:

title	content	target
NASA announces new Mars rover mission	NASA revealed plans for a new mission to Mars starting in 2025.	0
Vaccines implant 5G chips	Conspiracy theorists claim vaccines are used to implant 5G tracking.	1

Potential Use Cases:

Train classification models to predict the authenticity of news articles.
Test NLP pipelines, such as those based on CountVectorizer, TF-IDF, or advanced models like BERT.
Study trends in fake news: topics, keywords, and linguistic patterns.
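As a dependency-free illustration of the TF-IDF features mentioned above, the sketch below weights each term by its count in a document times ln(N / df) + 1, where N is the number of documents and df the number of documents containing the term. This is one common TF-IDF variant chosen for simplicity; libraries differ in smoothing and normalization. The example runs on the two sample rows shown above (target 0 = real, 1 = fake); the function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of ASCII letters and digits as tokens."""
    return TOKEN_RE.findall(text.lower())


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per document, weight = count * (ln(N / df) + 1)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {term: c * (math.log(n_docs / df[term]) + 1.0) for term, c in counts.items()}
        )
    return vectors


if __name__ == "__main__":
    rows = [
        ("NASA announces new Mars rover mission",
         "NASA revealed plans for a new mission to Mars starting in 2025.", 0),
        ("Vaccines implant 5G chips",
         "Conspiracy theorists claim vaccines are used to implant 5G tracking.", 1),
    ]
    vectors = tfidf([f"{title} {content}" for title, content, _ in rows])
    for (title, _, target), vec in zip(rows, vectors):
        top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:3]
        print(target, title, top)
```

These sparse vectors can be fed to any binary classifier; with only two sample rows, the output is illustrative rather than a meaningful model.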

Caution:

This dataset is provided for educational and research purposes only.
Model results should be interpreted carefully and not used for critical applications without thorough validation.

Iran news dataset
kaggle.com
Updated Oct 27, 2023
Mohamad Shirzad (2023). Iran news dataset [Dataset]. https://www.kaggle.com/datasets/mohamadshirzad/iran-news-dataset
Dataset updated
Oct 27, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Mohamad Shirzad
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
Iran
Description
News articles from Iran covering the housing sector. The data was collected from https://khabaronline.ir

cc_news (https://huggingface.co/datasets/vblagoje/cc_news): 164 scholarly articles cite this dataset (per Google Scholar).