In 2025, Facebook remained the most-used social platform for news in the United States, with 32 percent of respondents reporting they accessed news on it. YouTube followed closely at 30 percent, recording a slight increase from the previous year. X (formerly Twitter) saw the most notable growth, rising by eight percent to 23 percent.
https://coolest-gadgets.com/privacy-policy
Fake News Statistics: Fake news has become a major problem in the digital age. It spreads quickly through social media and other online platforms, often misleading people. Fake news spreads faster than real news, creating confusion and mistrust worldwide. Statistics and trends from 2024 reveal that many people have encountered fake news online, and many have shared it unknowingly.
Fake news affects public opinion, political decisions, and even relationships. This article helps us understand how widespread it is and how to address it more effectively. Raising awareness and encouraging critical thinking can reduce its impact, and reliable statistics and research are essential for uncovering the truth and stopping the spread of false information. Everyone plays a role in combating fake news.
Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use it only for research purposes. Due to these restrictions, the collection is not open data. Please download the Data Sharing Agreement and send the signed form to fakenewstask@gmail.com.
Citation
Please cite our work as
@article{shahi2021overview, title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection}, author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas}, journal={Working Notes of CLEF}, year={2021} }
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English.
Subtask 3A: Multi-class fake news detection of news articles (English)
Subtask 3A frames fake news detection as a four-class classification problem. The training data will be released in batches and comprises roughly 900 articles with their respective labels. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes articles in dispute and unproven articles.
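The category definitions above can be sketched as a small normalisation step that folds fact-checker verdicts into the four task classes. This is only an illustration: the function name `normalise_rating` and the two lookup sets are hypothetical, and they contain only the verdict names mentioned in the definitions, not the organisers' actual mapping.

```python
# Hypothetical lookup sets built only from the verdict names listed in the
# category definitions; the organisers' real mapping may contain more entries.
PARTIALLY_FALSE_VERDICTS = {
    "partially false",
    "partially true",
    "mostly true",
    "miscaptioned",
    "misleading",
}
OTHER_VERDICTS = {"other", "in dispute", "unproven"}


def normalise_rating(raw_verdict: str) -> str:
    """Map a fact-checker verdict to one of the four task classes."""
    verdict = raw_verdict.strip().lower()
    if verdict in ("true", "false"):
        return verdict
    if verdict in PARTIALLY_FALSE_VERDICTS:
        return "partially false"
    if verdict in OTHER_VERDICTS:
        return "other"
    # Assumption: unknown verdicts are surfaced rather than silently guessed.
    raise ValueError(f"unmapped verdict: {raw_verdict!r}")
```

Raising on unknown verdicts is a design choice made here so that new verdict names from a fact-checking service are reviewed by hand instead of being placed in a class by default.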
Subtask 3B: Topical Domain Classification of News Articles (English)
Fact-checkers require background expertise to identify the truthfulness of an article. The categorisation will help to automate the sampling process from a stream of data. Given the text of a news article, determine the topical domain of the article (English). This is a classification problem. The task is to categorise fake news articles into six topical categories: health, election, crime, climate, economy, and education. This task will be offered for a subset of the data of Subtask 3A.
Input Data
The data will be provided with the columns Id, title, text, rating, and domain; the columns are described as follows:
Task 3a
Task 3b
Output data format
Task 3a
Sample File
public_id, predicted_rating
1, false
2, true
Task 3b
Sample file
public_id, predicted_domain
1, health
2, crime
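A submission file in the format shown above could be written with Python's standard `csv` module, as in the sketch below. The helper name `write_submission` is hypothetical. Note that `csv.writer` emits `1,false` without the space after the comma that appears in the sample; whether the scorer requires that space is not stated in the task description, so treat this as an assumption to check.

```python
import csv
import io


def write_submission(rows, label_column, handle):
    """Write (public_id, label) pairs as a CSV submission with a header row.

    label_column is "predicted_rating" for Task 3a or "predicted_domain"
    for Task 3b, matching the sample files.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["public_id", label_column])
    for public_id, label in rows:
        writer.writerow([public_id, label])


# Usage: write the Task 3a sample to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_submission([(1, "false"), (2, "true")], "predicted_rating", buffer)
submission_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` as `handle`.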
Additional data for Training
To train their models, participants can use additional data in a similar format; some datasets are available on the web. We do not provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some possible sources:
IMPORTANT!
Evaluation Metrics
This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.
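For readers unfamiliar with the measure, macro-averaged F1 computes an F1 score per class and takes the unweighted mean, so rare classes count as much as frequent ones. The sketch below is a plain-Python illustration, not the organisers' scoring script; the function name `macro_f1` and the convention of scoring a class as 0.0 when it has no true positives, false positives, or false negatives are assumptions.

```python
def macro_f1(gold, predicted, labels):
    """Unweighted mean of per-class F1 scores."""
    per_class = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); assumed 0.0 when the class never occurs.
        per_class.append(2 * tp / denominator if denominator else 0.0)
    return sum(per_class) / len(per_class)


# Illustrative labels only, not taken from the task data.
LABELS = ["false", "partially false", "true", "other"]
gold = ["false", "false", "true", "other", "partially false"]
predicted = ["false", "true", "true", "other", "false"]
score = macro_f1(gold, predicted, LABELS)
```

In this toy example the per-class scores are 0.5 (false), 0.0 (partially false), 2/3 (true), and 1.0 (other), so a class that is never predicted correctly pulls the mean down regardless of how few articles it contains.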
Submission Link: https://competitions.codalab.org/competitions/31238
Related Work
By downloading the data, you agree with the terms & conditions mentioned below:
Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.
Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.
We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.
Citation
Please cite our work as
@InProceedings{clef-checkthat:2022:task3, author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas}, title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection", year = {2022}, booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum", series = {CLEF~'2022}, address = {Bologna, Italy},}
@article{shahi2021overview, title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection}, author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas}, journal={Working Notes of CLEF}, year={2021} }
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.
Task 3: Multi-class fake news detection of news articles (English)
The task frames fake news detection as a four-class classification problem. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1,264 English-language articles with their respective labels. Our definitions for the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
Cross-Lingual Task (German)
Along with the multi-class task for English, we have introduced a task for a low-resource language. We will provide the test data in German. The idea of the task is to use the English data and the concept of transfer learning to build a classification model for German.
Input Data
The data will be provided with the columns Id, title, text, rating, and domain; the columns are described as follows:
ID - Unique identifier of the news article
Title - Title of the news article
text - Text mentioned inside the news article
our rating - Class of the news article: false, partially false, true, or other
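Assuming the data arrives as a CSV file whose header uses exactly the column names listed above (`ID`, `Title`, `text`, `our rating`), which the task page does not confirm, it could be read and validated as in this sketch. The function name `load_articles` and the placeholder row are hypothetical and are not taken from the dataset.

```python
import csv
import io

VALID_RATINGS = {"false", "partially false", "true", "other"}


def load_articles(handle):
    """Read articles from a CSV handle and check every rating is a known class."""
    articles = []
    for row in csv.DictReader(handle):
        rating = row["our rating"].strip().lower()
        if rating not in VALID_RATINGS:
            raise ValueError(f"unexpected rating {rating!r} for ID {row['ID']}")
        articles.append(
            {"id": row["ID"], "title": row["Title"], "text": row["text"], "rating": rating}
        )
    return articles


# Placeholder row for illustration only; real files come from the organisers.
sample = io.StringIO(
    "ID,Title,text,our rating\n"
    "1,Placeholder title,Placeholder body text,Partially False\n"
)
articles = load_articles(sample)
```

Lower-casing the rating before validation is a choice made here so that capitalisation differences between batches do not create spurious extra classes.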
Output data format
public_id - Unique identifier of the news article
predicted_rating - Predicted class
Sample File
public_id, predicted_rating
1, false
2, true
IMPORTANT!
We have used the data from 2010 to 2022, and the content of fake news is mixed up with several topics like elections, COVID-19 etc.
Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498
Related Work
Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf
G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.
Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeno, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.
Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.
According to a survey held in April 2023, the share of people aged 18 years and above in the United States who were very confident in their ability to distinguish real news from false information amounted to 23 percent. A further 52 percent were somewhat confident that they were able to identify misinformation, whereas just five percent had little faith in themselves to determine facts from fake content.
Perhaps unsurprisingly, the main traffic source for false information online is social media, which generates 42 percent of fake news traffic. The nature of social networks, most notably the ease of sharing content, allows fake news to spread at a rapid rate – an issue further exacerbated by the fact that many U.S. adults sometimes believe fake news to be real.
Fake news: an ongoing problem
The presence of fake news would be less of an issue if users were more aware of how to identify it and were aware of the risks of sharing such content. Many U.S. news consumers have shared fake news online, and worryingly, ten percent did so deliberately. Adults who are part of that ten percent are just a small portion of people in the United States, and elsewhere in the world, who are responsible for spreading false information. More than 30 percent of U.S. children and teenagers have shared a fake news story online, and over 50 percent of adults in selected countries worldwide have wrongly believed a fake news story.
The result of adults and young consumers alike not only believing fake news, but actively sharing it, is that small, illegitimate websites producing such content are able to grow more successful. Such websites have the potential to tarnish or seriously damage the reputation of any persons mentioned within a fake news article, promote events or policies which do not exist, and mislead readers about important topics they are trying to keep up with. A 2019 survey revealed that most adults believe that fake news and misinformation will get worse in the next five years, and the sad truth is that this will likely be the case unless news consumers grow more discerning about what they post and share online.
The statistic presents the share of adults who have witnessed fake news in print media worldwide as of January 2019, broken down by country. The findings reveal that the majority of responding adults in Turkey said that they had witnessed fake news in print media, with 72 percent having encountered false information in a print publication compared to 18 percent who said they had not. Conversely, just 27 percent of respondents in Pakistan witnessed fake news in print media at some point.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset combines two parts: fake news and true news. The fake news was collected from Kaggle, and some of the true news was collected from IEEE DataPort. More true news was needed to balance the fake news, so I collected additional true news from various trusted online sites. Finally, I concatenated the fake and true news into a single dataset to help researchers who want to work on this topic.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Akash_Sandhu4x4
Released under MIT
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset contains a list of COVID fake news and claims shared across the internet.
Content
Inspiration
On many research portals, a common question was whether a combined fake news dataset was available. This led to the publication of this dataset.
According to a global study conducted in 2019, 62 percent of respondents felt that there was a fair extent or great deal of fake news on online websites and platforms. By comparison, 10 percent less said the same about TV, radio, newspapers, and magazines. Traditional media in general is still considered more trustworthy than online formats, despite social networks being the preferred choice for many.
Meanwhile, as some consumers around the world now turn to influencers for news instead of journalists, the risk of them being exposed to inaccurate, incorrect, or deliberately false information continues to grow, and journalists face pressure to battle fake content whilst finding new ways to keep audiences engaged.
Fake news and journalism
More than 50 percent of journalists responding to a global survey believed that the public had lost trust in the media over the past year. Whilst the reasons for this are many, the role of fake news cannot be underestimated, particularly given the speed with which false content can spread and reach vulnerable or misinformed audiences. Either unintentionally or deliberately, fake news is often shared by those who encounter it, which only serves to worsen the problem. Indeed, journalists consider regular citizens to be the main source of disinformation, followed by political leaders and internet trolls.
Despite the threats fake news poses, journalists themselves feel that concerns about disinformation could positively impact the quality of journalism. There are also growing expectations from the public and journalists alike for governments and companies to do more to help boost quality journalism and curb the dissemination and influence of fake news. News industry leaders rated Google as being the best platform for supporting journalism, but the likes of Amazon and Snapchat have a long way to go before organizations consider them reliable in this respect.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains news articles classified into two categories: real and fake. It is designed to help researchers, data scientists, and students build and test machine learning models capable of detecting fake news.
title: The title of the news article.
content: The full content of the news article (raw text).
target: A binary label indicating the authenticity of the news.
The primary goals of this dataset are to:
- Provide a resource for training and evaluating binary classification models.
- Enable experiments on Natural Language Processing (NLP), such as text vectorization, sentiment analysis, and more.
- Encourage exploration of approaches to identify biases in data related to fake news detection.
title | content | target |
---|---|---|
NASA announces new Mars rover mission | NASA revealed plans for a new mission to Mars starting in 2025. | 0 |
Vaccines implant 5G chips | Conspiracy theorists claim vaccines are used to implant 5G tracking. | 1 |
CountVectorizer, TF-IDF, or advanced models like BERT.
https://brightdata.com/license
Stay ahead with our comprehensive News Dataset, designed for businesses, analysts, and researchers to track global events, monitor media trends, and extract valuable insights from news sources worldwide.
Dataset Features
News Articles: Access structured news data, including headlines, summaries, full articles, publication dates, and source details. Ideal for media monitoring and sentiment analysis.
Publisher & Source Information: Extract details about news publishers, including domain, region, and credibility indicators.
Sentiment & Topic Classification: Analyze news sentiment, categorize articles by topic, and track emerging trends in real time.
Historical & Real-Time Data: Retrieve historical archives or access continuously updated news feeds for up-to-date insights.
Customizable Subsets for Specific Needs
Our News Dataset is fully customizable, allowing you to filter data based on publication date, region, topic, sentiment, or specific news sources. Whether you need broad coverage for trend analysis or focused data for competitive intelligence, we tailor the dataset to your needs.
Popular Use Cases
Media Monitoring & Reputation Management: Track brand mentions, analyze media coverage, and assess public sentiment.
Market & Competitive Intelligence: Monitor industry trends, competitor activity, and emerging market opportunities.
AI & Machine Learning Training: Use structured news data to train AI models for sentiment analysis, topic classification, and predictive analytics.
Financial & Investment Research: Analyze news impact on stock markets, commodities, and economic indicators.
Policy & Risk Analysis: Track regulatory changes, geopolitical events, and crisis developments in real time.
Whether you're analyzing market trends, monitoring brand reputation, or training AI models, our News Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
According to a digital news consumption survey conducted in India in March 2023, more than 60 percent of the respondents claimed that they sometimes encountered potentially fake news online. In contrast, three percent of the surveyed consumers stated that they never encountered potentially fake news online. In recent years, the number of fake news-related incidents in India has been on the rise.
https://creativecommons.org/publicdomain/zero/1.0/
news_dataset.csv is a fake news classification dataset. It contains two columns, label and text.
Columns
text: news text
label: FAKE/REAL
Use 20% of the data as the test set and the remaining 80% for training.
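The 80/20 split described above can be sketched in plain Python as follows. The function name `train_test_split`, the fixed seed, and the placeholder rows are assumptions made for illustration; the dataset description does not say whether the split should be random or stratified by label.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and hold out test_fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder (text, label) rows standing in for news_dataset.csv.
rows = [(f"article {i}", "FAKE" if i % 2 else "REAL") for i in range(10)]
train_rows, test_rows = train_test_split(rows)
```

Fixing the seed keeps the held-out articles the same across runs, which makes results comparable when the model or features change.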
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Curated database of fact-checked claims (fake and real news), with close to 70,000 URLs, classified by topic.
https://crawlfeeds.com/privacy_policy
The Fox News Dataset is a comprehensive collection of over 1 million news articles, offering an unparalleled resource for analyzing media narratives, public discourse, and political trends. Covering articles up to the year 2023, this dataset is a treasure trove for researchers, analysts, and businesses interested in gaining deeper insights into the topics and trends covered by Fox News.
This large dataset is ideal for:
Discover additional resources for your research needs by visiting our news dataset collection. These datasets are tailored to support diverse analytical applications, including sentiment analysis and trend modeling.
The Fox News Dataset is a must-have for anyone interested in exploring large-scale media data and leveraging it for advanced analysis. Ready to dive into this wealth of information? Download the dataset now in CSV format and start uncovering the stories behind the headlines.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The chart shows that Americans over 65 were more likely to share fake news with their Facebook friends, regardless of their education, ideology, and partisanship. The oldest age group shared nearly seven times as many articles from fake news domains on Facebook as the youngest age group, and about 2.3 times as many as the next-oldest age group. The data for the 18-29 and 30-44 age groups are not displayed in the source, so the values for those groups in this chart are approximate, determined by pixel count.
MM-COVID is a dataset for fake news detection related to COVID-19. It provides multilingual fake news and the relevant social context. It contains 3,981 pieces of fake news content and 7,192 pieces of trustworthy information in six languages: English, Spanish, Portuguese, Hindi, French, and Italian.
https://crawlfeeds.com/privacy_policy
Get access to a comprehensive and structured dataset of BBC News articles, freshly crawled and compiled in February 2023. This collection includes 1 million records from one of the world’s most trusted news organizations — perfect for training NLP models, sentiment analysis, and trend detection across global topics.
💾 Format: CSV (available in ZIP archive)
📢 Status: Published and available for immediate access
Train language models to summarize or categorize news
Detect media bias and compare narrative framing
Conduct research in journalism, politics, and public sentiment
Enrich news aggregation platforms with clean metadata
Analyze content distribution across categories (e.g. health, politics, tech)
This dataset ensures reliable and high-quality information sourced from a globally respected outlet. The format is optimized for quick ingestion into your pipelines — with clean text, timestamps, image links, and more.
Need a filtered dataset or want this refreshed for a later date? We offer on-demand news scraping as well.
👉 Request access or sample now