In a survey conducted in May 2025, journalism was rated the most positively by U.S. adults, with 54 percent describing it as very or somewhat favorable. Social media followed with 49 percent favorable, though a notable share of respondents also held negative views. The news media and the press were rated less positively, at 47 and 46 percent, respectively. Overall, the findings suggest stronger confidence in journalism compared to other media institutions.
The statistic presents the share of adults who have witnessed fake news on television worldwide as of January 2019, broken down by country. The findings reveal that the majority of responding adults in Turkey said that they had witnessed fake news on television, with 76 percent having encountered false information via that medium. Germany had the lowest share of respondents who said they had seen fake news on TV; in Japan, Great Britain and Pakistan, too, fewer than 40 percent of adults had witnessed fake news via TV.
According to a survey conducted in May 2025, 56 percent of adults in the United States said they actively seek out news, while 35 percent reported that news usually comes to them. A smaller share were unsure about their news consumption habits.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
WWFND: World Wide Fake News Dataset
1. Introduction
The World Wide Fake News Dataset (WWFND) has been developed with the objective of facilitating research in the domain of fake news detection. This dataset has been created using Python’s web scraping library, BeautifulSoup, and comprises news articles collected from multiple globally recognised fact-checking and media organisations. The data has been carefully compiled from reputable news and fact-verification platforms identified by the Pew Research Center, including but not limited to:
BBC News
CNN
Al Jazeera
Times of India
The Hindu
PolitiFact
NBC News
CBS News
ABC News
NDTV
The Wire
These sources have been selected for their credibility and global or national reach. News articles were collected only after ensuring that they had been clearly classified as either true or fake by these organisations.
2. Dataset Summary
The dataset comprises a total of 30,616 records, which include:
15,027 records identified as true news articles
15,589 records identified as fake news articles
To further enhance the robustness and applicability of the dataset, it has been combined with another dataset titled COVID19_FNIR, available through the IEEE Dataport at the following link: https://ieee-dataport.org/open-access/covid-19-fake-news-infodemic-research-dataset-covid19-fnir-dataset
This integration was undertaken to provide a more comprehensive dataset, especially for training machine learning models in detecting misinformation during global crises such as the COVID-19 pandemic.
3. Contents of the Dataset
The WWFND dataset includes the following files:
This file contains the cleaned and preprocessed version of the dataset, combining both fake and true news articles.
This file contains the raw, unprocessed fake news articles collected from the sources mentioned above.
This file contains the raw, unprocessed true news articles obtained from the verified sources.
4. Applications
This dataset is suitable for various applications, including:
Training and testing models for fake news detection
Text classification and content analysis using Natural Language Processing (NLP) techniques
Research in media literacy, misinformation tracking, and credibility assessment
Academic projects and data science competitions focused on information verification
5. Acknowledgements
The dataset creators acknowledge the use of publicly available content solely for academic and research purposes. The COVID19_FNIR dataset has been used with reference to its source on IEEE Dataport.
6. Licensing and Usage
This dataset is intended for educational and research use only. Users are advised to cite the original sources and the IEEE dataset if the WWFND dataset is used in any publication or project.
A report investigating media literacy and news consumption revealed that consumers in Brazil found telling the difference between misinformation and facts most difficult, with 34 percent saying that they found it very or somewhat difficult to differentiate between false and real content. By contrast, Indian and Nigerian audiences were the least likely to have problems in this regard and reported finding it relatively easy to identify misinformation.
Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use it only for research purposes. Due to these restrictions, the collection is not open data. Please download the Agreement at Data Sharing Agreement and send the signed form to fakenewstask@gmail.com.
Citation
Please cite our work as
@article{shahi2021overview,
title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
journal={Working Notes of CLEF},
year={2021}
}
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English.
Subtask 3A: Multi-class fake news detection of news articles (English)
Subtask 3A frames fake news detection as a four-class classification problem. The training data will be released in batches and comprises roughly 900 articles with their respective labels. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes articles in dispute and unproven articles.
Subtask 3B: Topical Domain Classification of News Articles (English)
Fact-checkers require background expertise to identify the truthfulness of an article. The categorisation will help to automate the sampling process from a stream of data. Given the text of a news article, determine the topical domain of the article (English). This is a classification problem. The task is to categorise fake news articles into six topical categories such as health, election, crime, climate, and education. This task will be offered for a subset of the data of Subtask 3A.
Input Data
The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:
Task 3a
Task 3b
Output data format
Task 3a
Sample File
public_id, predicted_rating
1, false
2, true
Task 3b
Sample file
public_id, predicted_domain
1, health
2, crime
Additional data for Training
To train their models, participants can use additional data in a similar format; some datasets are available on the web. We don't provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some possible sources:
IMPORTANT!
Evaluation Metrics
This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.
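As a rough illustration of the ranking measure, the sketch below computes F1-macro as the unweighted mean of the per-class F1 scores. The label spellings in `RATINGS`, the function name `macro_f1`, and the choice to score a class with no gold or predicted items as 0 are assumptions made for this example; the official scorer may handle these details differently.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: a class that never occurs and is never predicted scores 0.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Assumed label spellings for the four Subtask 3A classes.
RATINGS = ["true", "partially false", "false", "other"]

# Made-up toy predictions, only to exercise the function.
gold = ["false", "true", "false", "other"]
predicted = ["false", "true", "true", "other"]
score = macro_f1(gold, predicted, RATINGS)
print(round(score, 4))
```

Because every class contributes equally to the mean, a rare class that is never predicted correctly pulls the score down as much as a frequent one would.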
Submission Link: https://competitions.codalab.org/competitions/31238
Related Work
A 2025 survey found that around one in four adults in the United States actively avoided news related to sports, followed by entertainment (18 percent) and lifestyle (17 percent). In contrast, health was the least avoided news topic, with just four percent of respondents saying they ignored it.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The widespread dissemination of misinformation on social media is a serious threat to global health. To a large extent, it is still unclear who actually shares health-related misinformation deliberately and accidentally. We conducted a large-scale online survey among 5,307 Facebook users in six sub-Saharan African countries, in which we collected information on sharing of fake news and truth discernment. We estimate the magnitude and determinants of deliberate and accidental sharing of misinformation related to three vaccines (HPV, polio, and COVID-19). In an OLS framework we relate the actual sharing of fake news to several socioeconomic characteristics (age, gender, employment status, education), social media consumption, personality factors and vaccine-related characteristics while controlling for country and vaccine-specific effects. We first show that actual sharing rates of fake news articles are substantially higher than those reported from developed countries and that most of the sharing occurs accidentally. Second, we reveal that the determinants of deliberate vs. accidental sharing differ. While deliberate sharing is related to being older and risk-loving, accidental sharing is associated with being older, male, and high levels of trust in institutions. Lastly, we demonstrate that the determinants of sharing differ by the adopted measure (intentions vs. actual sharing) which underscores the limitations of commonly used intention-based measures to derive insights about actual fake news sharing behaviour.
The statistic presents the share of adults who have witnessed fake news in print media worldwide as of January 2019, broken down by country. The findings reveal that the majority of responding adults in Turkey said that they had witnessed fake news in print media, with 72 percent having encountered false information in a print publication compared to 18 percent who said they had not. Conversely, just 27 percent of respondents in Pakistan witnessed fake news in print media at some point.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
As we all know, fake news has become a centre of attention worldwide because of its hazardous impact on our society. One recent example is the spread of fake news about Covid-19 cures, precautions, and symptoms, and you will understand by now how dangerous this bogus information can be. Nor is it any secret that distorted information is propagated at election time to advance political agendas.
Fake news is quickly becoming an epidemic, and it alarms and angers me how often and how rapidly totally fabricated stories circulate. Why? In the first place, the deceptive effect: the fact that if a lie is repeated enough times, you’ll begin to believe it’s true.
You understand by now that fake news and other types of false information can take on various appearances. They can likewise have significant effects, because information shapes our world view: we make important decisions based on information. We form an idea about people or a situation by obtaining information. So if the information we saw on the Web is invented, false, exaggerated or distorted, we won’t make good decisions.
Hence, there is a dire need to do something about it. It is a Big Data problem, and data scientists can contribute from their end to the fight against fake news.
Although fighting fake news is a Big Data problem, I have created this small dataset of approx. 10,000 news items and their metadata, scraped from approx. 600 pages of the PolitiFact website, so that it can be analysed with data science skills to gain some insight into how the spread of misinformation can be stopped more broadly and which approach gives better accuracy in doing so.
This dataset has 6 attributes, of which News_Headline is the most important for classifying news as FALSE or TRUE. As you will notice, the Label attribute has 6 classes. So it is entirely up to you whether to use the dataset for multi-class classification or to convert these class labels into FALSE or TRUE and then perform binary classification. For your convenience, I will write a notebook on how to convert this dataset from multi-class to binary-class. To deal with the text data, you need good hands-on practice with NLP and data-mining concepts.
News_Headline - contains the piece of information that has to be analysed.
Link_Of_News - contains the URL of the news headline given in the first column.
Source - contains the names of the authors who posted the information on Facebook, Instagram, Twitter or any other social-media platform.
Stated_On - contains the date when the information was posted by the authors on the different social-media platforms.
Date - contains the date when the piece of information was analysed by the PolitiFact team of fact-checkers in order to label it as FAKE or REAL.
Label - contains 6 class labels: True, Mostly-True, Half-True, Barely-True, False, Pants on Fire.
So, you can either perform multi-class classification on it, or convert Mostly-True, Half-True and Barely-True to True, drop Pants on Fire, and perform binary classification.
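The conversion suggested above can be sketched in a few lines of plain Python. This is only an illustration of the stated mapping, not the author's notebook: the exact label spellings, the helper name `to_binary`, and the sample rows are assumptions made for the example.

```python
# Mapping taken from the suggestion above; label spellings are assumed
# to match the Label column exactly.
BINARY_LABEL = {
    "True": "TRUE",
    "Mostly-True": "TRUE",
    "Half-True": "TRUE",
    "Barely-True": "TRUE",
    "False": "FALSE",
}
DROPPED_LABELS = {"Pants on Fire"}


def to_binary(rows):
    """Return copies of the rows with a binary Label, skipping dropped classes."""
    converted = []
    for row in rows:
        label = row["Label"].strip()
        if label in DROPPED_LABELS:
            continue
        if label not in BINARY_LABEL:
            raise ValueError(f"unexpected label: {label!r}")
        converted.append({**row, "Label": BINARY_LABEL[label]})
    return converted


# Hypothetical placeholder rows, not taken from the dataset.
sample_rows = [
    {"News_Headline": "placeholder headline A", "Label": "Mostly-True"},
    {"News_Headline": "placeholder headline B", "Label": "False"},
    {"News_Headline": "placeholder headline C", "Label": "Pants on Fire"},
    {"News_Headline": "placeholder headline D", "Label": "Barely-True"},
]
binary_rows = to_binary(sample_rows)
print([row["Label"] for row in binary_rows])
```

Raising on an unknown label, rather than silently skipping it, makes spelling differences between this sketch and the actual file visible early.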
Many thanks to the fact-checking team of Politifact.com, who provide correct labels through hard manual work, so that we data science people can train better models on such labels. These are some research papers that will help you get started with the project and clarify the fundamentals:
Big Data and quality data for fake news and misinformation detection, by Fatemeh Torabi Asr and Maite Taboada: https://journals.sagepub.com/doi/full/10.1177/2053951719843310
Automatic deception detection: Methods for finding fake news, by Nadia K. Conroy, Victoria L. Rubin and Yimin Chen: https://asistdl.onlinelibrary.wiley.com/doi/full/10.1002/pra2.2015.145052010082
A study held in early 2023 found that Indonesian adults were the most concerned about the spread of false information on social media, with over 80 percent saying that they were very or somewhat worried about the matter. Whilst Swedish and Danish respondents were less concerned about misinformation on social media, the global average among all countries was 68 percent, highlighting the growing awareness and worry about false information worldwide.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This file describes the contents of the replication archive used to conduct the analyses in the main text and appendix for Detecting Misinformation: Identifying False News Spread by Political Leaders in the Global South.
In May 2025, a survey asked U.S. adults how they feel while consuming news. The results indicate that a majority feel informed, with 53 percent saying that news generally makes them feel this way. At the same time, 43 percent reported feeling angry, and 32 percent said they feel depressed when consuming news. In contrast, only 16 percent described feeling hopeful. These findings highlight that while staying informed is a major benefit of news consumption, negative emotional reactions, such as anger and depression, are also very common among Americans.
WSDM (pronounced "wisdom") is one of the premier conferences on web-inspired research involving search and data mining. The 12th ACM International WSDM Conference will take place in Melbourne, Australia during Feb. 11-15, 2019.
This task is organized by ByteDance, the Platinum Level Sponsor of the conference. ByteDance is a global Internet technology company that started in China. Our goal is to build a global content platform that enables people to enjoy various content in various forms. We inform, entertain, and inspire people across language, culture and geography.
One of the challenges which we are facing is to combat different types of fake news. Fake news here refers to all forms of false, inaccurate or misleading information, which now poses a big threat to human civilization.
At ByteDance, we have created a large-scale database to store existing fake news articles. Any new article must go through a test on the truthfulness of its content before being published. We conduct matching between the new article and the articles in the database. Articles identified as containing fake news will be withdrawn after human verification. The accuracy and efficiency of the process therefore become crucial for us to make the platform safe, reliable, and healthy.
This dataset is released as the competition dataset of Task: Fake News Classification with the following task:
Given the title of a fake news article A and the title of a coming news article B, participants are asked to classify B into one of the three categories.
The English titles are machine translated from the related Chinese titles. This may help participants from all backgrounds to get a better understanding of the datasets. Participants are strongly recommended to use the Chinese-version titles to complete the task.
We use Weighted Categorization Accuracy to evaluate your performance. Weighted categorization accuracy can be generally defined as:
\[ WeightedAccuracy(y, \hat{y}, \omega) = \frac{1}{n} \displaystyle{\sum_{i=1}^{n}} \frac{\omega_i(y_i=\hat{y}_i)}{\sum \omega_i} \]
where \(y\) are ground truths, \(\hat{y}\) are the predicted results, and \(\omega_i\) is the weight associated with the \(i\)th item in the dataset.
In our test set, we assign each testing item a weight according to its category. The weights of the three categories, agreed, disagreed and unrelated, are \(\frac{1}{15}\), \(\frac{1}{5}\), \(\frac{1}{16}\), respectively. We set the weights in consideration of the imbalance of the data distribution, to minimize the bias in your performance caused by the majority class (unrelated pairs account for approximately 70% of the dataset).
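As a rough illustration of the metric, the sketch below divides the total weight of the correctly classified items by the total weight of all items, using the three category weights stated above. It assumes the labels are spelled agreed, disagreed and unrelated, and it leaves out the additional 1/n factor that appears in the formula as written, which is an assumption about the intended normalisation; the function name `weighted_accuracy` is illustrative and not part of the competition materials.

```python
from fractions import Fraction

# Category weights as stated in the task description.
CATEGORY_WEIGHT = {
    "agreed": Fraction(1, 15),
    "disagreed": Fraction(1, 5),
    "unrelated": Fraction(1, 16),
}


def weighted_accuracy(y_true, y_pred, weights=CATEGORY_WEIGHT):
    """Weight of correctly classified items divided by the total weight."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one item is required")
    # Each item is weighted by the category of its ground-truth label.
    total = sum(weights[t] for t in y_true)
    correct = sum(weights[t] for t, p in zip(y_true, y_pred) if t == p)
    return float(correct / total)


# Made-up toy labels, only to exercise the function.
gold = ["unrelated", "unrelated", "agreed", "disagreed"]
predicted = ["unrelated", "agreed", "agreed", "unrelated"]
score = weighted_accuracy(gold, predicted)
print(round(score, 4))
```

In this toy example half of the items are classified correctly, but the score is lower than 0.5 because the missed disagreed pair carries the largest weight.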
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract:
Analyzing the spread of information related to a specific event in the news has many potential applications. Consequently, various systems have been developed to facilitate the analysis of information spreading, such as the detection of disease propagation and the identification of the spreading of fake news through social media. There are several open challenges in the process of discerning information propagation, among them the lack of resources for training and evaluation. This paper describes the process of compiling a corpus from the EventRegistry global media monitoring system. We focus on information spreading in three domains: sports (i.e. the FIFA World Cup), natural disasters (i.e. earthquakes), and climate change (i.e. global warming). This corpus is a valuable addition to the currently available datasets to examine the spreading of information about various kinds of events.
Introduction:
Domain-specific gaps in information spreading are ubiquitous and may exist due to economic conditions, political factors, or linguistic, geographical, time-zone, cultural, and other barriers. These factors potentially contribute to obstructing the flow of local as well as international news. We believe that there is a lack of research studies that examine, identify, and uncover the reasons for barriers in information spreading. Additionally, there is limited availability of datasets containing news text and metadata including time, place, source, and other relevant information. When a piece of information starts spreading, it implicitly raises questions such as:
How far does the information in the form of news reach out to the public?
Does the content of news remain the same or change to a certain extent?
Do cultural values impact the information, especially when the same news gets translated into other languages?
Statistics about datasets:
-----------------------------------------------------------------------------------------------------
#  Domain            Event Type      Articles Per Language                   Total Articles
-----------------------------------------------------------------------------------------------------
1  Sports            FIFA World Cup  983-en, 762-sp, 711-de, 10-sl, 216-pt   2679
2  Natural Disaster  Earthquake      941-en, 999-sp, 937-de, 19-sl, 251-pt   3194
3  Climate Changes   Global Warming  996-en, 298-sp, 545-de, 8-sl, 97-pt     1945
-----------------------------------------------------------------------------------------------------
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Citizens these days feel inundated with news online and are worried about its veracity. This study examines if these concerns in the digital news environment led to greater news avoidance and news authentication behaviors. The relationships were tested across 16 countries by combining individual-level survey data from the Reuters Institute Digital News Report (N = 34,201) with country-level data based on comparative media systems research. Analysis from multilevel modeling showed that concern with fake news was related to news authentication and news fatigue was related to news avoidance. High news fatigue also accentuated the influence of concern with fake news on news avoidance while low fatigue attenuated the relationship. Additional cross-level interactions further contextualized the findings according to media system, showing how the relationships can vary under different conditions of press market, political parallelism, journalistic professionalism, and public service broadcasting. This study demonstrates the utility and importance of considering the contextual role of media system to understand individuals’ perceptions of news they receive online and subsequent news engagement, especially in the context of fake news research because its prevalence and deleterious impact varies across countries.
https://creativecommons.org/publicdomain/zero/1.0/
The FakeCovid dataset is a compilation of 7623 fact-checked news articles related to COVID-19. Obtained from 92 fact-checking websites located in 105 countries, the collection covers a wide range of sources and languages, including locations across Africa, Europe, Asia, the Americas and Oceania. The data was gathered from references on Poynter and Snopes and is a resource for researching the accuracy of global news related to the pandemic. Its columns cover the countries involved; categories such as coronavirus health updates or political interference during coronavirus; URLs of the referenced articles; the verifiers employed by the websites; article classes ranging from true to false, including mixed evaluations; publication dates; article sources with credibility verification; and article text and language standardization. The dataset can help in understanding the global flow of information about COVID-19 while offering transparency into whose interests guide it.
The FakeCovid dataset is a multilingual cross-domain collection of 7623 fact-checked news articles related to COVID-19. It is collected from 92 fact-checking websites and covers a wide range of sources and countries, including locations in Africa, Asia, Europe, The Americas, and Oceania. This dataset can be used for research related to understanding the truth and accuracy of news sources related to COVID-19 in different countries and languages.
To use this dataset effectively, you will need basic knowledge of data science principles, such as data manipulation with pandas or other Python libraries such as NumPy or scikit-learn. The data is in CSV (comma-separated values) format, which can be read by most spreadsheet applications or by a text editor such as Notepad++. Here are some steps to get started:
1) Access the FakeCovid Fact Checked News Dataset from Kaggle: https://www.kaggle.com/c/fakecovidfactcheckednewsdataset/data
2) Download the provided CSV file containing all fact-checked news articles and place it in your desired folder location
3) Load the CSV file into your preferred software application, such as Jupyter Notebook or RStudio
4) Explore your dataset using built-in functions within data science libraries such as pandas and matplotlib: find meaningful information through statistical analysis and/or create visualizations
5) Modify parameters within the CSV file if required and save
6) Share your creative projects through Gitter chatroom #fakecovidauthors
7) Publish any interesting discoveries you find within open source repositories like GitHub
8) Engage with our Hangouts group #FakeCoviDFactCheckersClub
9) Show off fun graphics via Twitter hashtag #FakeCovidiauthors
10) Reach out if you have further questions via email contactfakecovidadatateam
11) Stay connected by joining our mailing list #FakeCoviDAuthorsGroup
We hope this guide helps you better understand how to use our FakeCoviD Fact Checked News Dataset for generating meaningful insights relating to COVID-19 news articles worldwide!
- Developing an automated algorithm to detect fake news related to COVID-19 by leveraging the fact-checking flags and other results included in this dataset for machine learning and natural language processing tasks.
- Training a sentiment analysis model on the data to categorize articles according to their sentiment, which can support further investigation into why certain news topics or countries show certain outcomes, motivations, or behaviors related to content or author bias (if any).
- Using unsupervised clustering techniques, this dataset could be used as a tool for identifying discrepancies between news circulated among different populations in different countries (languages and regions), so that publicists can focus more on providing factual information rather than spreading false rumors or misinformation about the pandemic.
If you use this dataset in your research, please credit the original authors. Data Source
**License: [CC0 1.0 Universal (CC0 1.0) - Public Do...
According to a global study conducted in 2019, ** percent of respondents felt that there was a fair extent or great deal of fake news on online websites and platforms. By comparison, ** percent less said the same about TV, radio, newspapers, and magazines. Traditional media in general is still considered more trustworthy than online formats, despite social networks being the preferred choice for many.
Meanwhile, as some consumers around the world now turn to influencers for news instead of journalists, the risk of them being exposed to inaccurate, incorrect, or deliberately false information continues to grow, and journalists face pressure to battle fake content whilst finding new ways to keep audiences engaged.
Fake news and journalism
More than ** percent of journalists responding to a global survey believed that the public had lost trust in the media over the past year. Whilst the reasons for this are many, the role of fake news cannot be underestimated, particularly given the speed with which false content can spread and reach vulnerable or misinformed audiences. Either unintentionally or deliberately, fake news is often shared by those who encounter it, which only serves to worsen the problem. Indeed, journalists consider regular citizens to be the main source of disinformation, followed by political leaders and internet trolls.
Despite the threats fake news poses, journalists themselves feel that concerns about disinformation could positively impact the quality of journalism. There are also growing expectations from the public and journalists alike for governments and companies to do more to help boost quality journalism and curb the dissemination and influence of fake news. News industry leaders rated Google as being the best platform for supporting journalism, but the likes of Amazon and Snapchat have a long way to go before organizations consider them reliable in this respect.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Rebel News and Global Research Facebook and Twitter data.
Online platforms and other Internet services have provided new ways for people to connect, to debate and to gather information. However, the spread of news that intentionally misleads readers has become an increasing problem for the functioning of our democracies, affecting people’s understanding of reality. In June 2017, the European Parliament adopted a Resolution calling on the European Commission to analyse in depth the current situation and legal framework with regard to fake news, and to verify the possibility of legislative intervention to limit the dissemination and spreading of fake content. This Flash Eurobarometer is designed to explore EU citizens’ awareness of and attitudes towards the existence of fake news and disinformation online. It covers the following issues:
- Levels of trust in news and information accessed through different channels;
- People’s perceptions of how often they encounter news or information that is misleading or false;
- Public confidence in identifying news or information that is misleading or false;
- People’s views on the extent of the problem, both in their own country and for democracy in general;
- Views on which institutions and media actors should act to stop the spread of fake news.
The results by volumes are distributed as follows:
* Volume A: Countries
* Volume AA: Groups of countries
* Volume A' (AP): Trends
* Volume AA' (AAP): Trends of groups of countries
* Volume B: EU/socio-demographics
* Volume C: Country/socio-demographics
Researchers may also contact GESIS - Leibniz Institute for the Social Sciences: http://www.gesis.org/en/home/