61 datasets found
  1. Opinions on selected media and news institutions in the U.S. 2025

    • statista.com
    Updated Sep 12, 2025
    Cite
    Amy Watson (2025). Opinions on selected media and news institutions in the U.S. 2025 [Dataset]. https://www.statista.com/topics/3251/fake-news/
    Dataset updated
    Sep 12, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Amy Watson
    Area covered
    United States
    Description

    In a survey conducted in May 2025, journalism was rated the most positively by U.S. adults, with 54 percent describing it as very or somewhat favorable. Social media followed with 49 percent favorable, though a notable share of respondents also held negative views. The news media and the press were rated less positively, at 47 and 46 percent, respectively. Overall, the findings suggest that journalism is viewed more favorably than the other media institutions included in the survey.

  2. Encountering fake news on TV worldwide 2019, by country

    • statista.com
    Updated Jun 11, 2019
    Cite
    Statista (2019). Encountering fake news on TV worldwide 2019, by country [Dataset]. https://www.statista.com/statistics/1017760/fake-news-television-worldwide/
    Dataset updated
    Jun 11, 2019
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 21, 2018 - Jan 4, 2019
    Area covered
    Worldwide
    Description

    The statistic presents the share of adults who had witnessed fake news on television worldwide as of January 2019, broken down by country. In Turkey, a majority of responding adults (76 percent) said they had encountered false information via that medium. Germany had the lowest share of respondents who said they had seen fake news on TV; in Germany, Japan, Great Britain and Pakistan, fewer than 40 percent of adults reported witnessing fake news via TV.

  3. Preferred ways of getting news among adults in the U.S. 2025

    • statista.com
    Updated Sep 12, 2025
    Cite
    Amy Watson (2025). Preferred ways of getting news among adults in the U.S. 2025 [Dataset]. https://www.statista.com/topics/3251/fake-news/
    Dataset updated
    Sep 12, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Amy Watson
    Area covered
    United States
    Description

    According to a survey conducted in May 2025, 56 percent of adults in the United States said they actively seek out news, while 35 percent reported that news usually comes to them. A smaller share were unsure about their news consumption habits.

  4. WWFND (World Wide Fake News Dataset)

    • kaggle.com
    zip
    Updated May 10, 2025
    Cite
    Rameez Raja (2025). WWFND (World Wide Fake News Dataset) [Dataset]. https://www.kaggle.com/datasets/rameezraja11/wwfnd-world-wide-fake-news-dataset
    Available download formats: zip (3,771,151 bytes)
    Dataset updated
    May 10, 2025
    Authors
    Rameez Raja
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    World
    Description

    WWFND: World Wide Fake News Dataset

    1. Introduction

    The World Wide Fake News Dataset (WWFND) has been developed with the objective of facilitating research in the domain of fake news detection. This dataset has been created using Python's web scraping library – BeautifulSoup, and comprises news articles collected from multiple globally recognised fact-checking and media organisations. The data has been carefully compiled from reputable news and fact-verification platforms identified by the Pew Research Center, including but not limited to:

    BBC News

    CNN

    Al Jazeera

    Times of India

    The Hindu

    PolitiFact

    NBC News

    CBS News

    ABC News

    NDTV

    The Wire

    These sources have been selected for their credibility and global or national reach. News articles were collected only after ensuring that they had been clearly classified as either true or fake by these organisations.

    2. Dataset Summary

    The dataset comprises a total of 30,616 records, which include:

    15,027 records identified as true news articles

    15,589 records identified as fake news articles

    To further enhance the robustness and applicability of the dataset, it has been combined with another dataset titled COVID19_FNIR, available through the IEEE Dataport at the following link: https://ieee-dataport.org/open-access/covid-19-fake-news-infodemic-research-dataset-covid19-fnir-dataset

    This integration was undertaken to provide a more comprehensive dataset, especially for training machine learning models in detecting misinformation during global crises such as the COVID-19 pandemic.

    3. Contents of the Dataset

    The WWFND dataset includes the following files:

    • Final_Preprocessed_WWFND_BothFakeAndTrueNews.csv

    This file contains the cleaned and preprocessed version of the dataset, combining both fake and true news articles.

    • WWFND_FakeNews_RawData.csv

    This file contains the raw, unprocessed fake news articles collected from the sources mentioned above.

    • WWFND_TrueNews_RawData.csv

    This file contains the raw, unprocessed true news articles obtained from the verified sources.
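    A minimal sketch, in Python, of merging the two raw files into one labelled list for a binary fake/true classifier. The column layout of the CSV files is not documented here, so this assumes only that each file begins with a header row; the `label` key it adds (1 = fake, 0 = true) and the function name `load_labelled` are hypothetical, not part of WWFND.

```python
import csv
from typing import Dict, Iterable, List


def load_labelled(fake_lines: Iterable[str], true_lines: Iterable[str]) -> List[Dict]:
    """Combine the raw fake and true WWFND files into one labelled list.

    Each argument is any iterable of CSV lines (e.g. an open file handle).
    Assumes both files start with a header row. The 'label' key
    (1 = fake, 0 = true) is a hypothetical column added by this sketch,
    not a documented WWFND field.
    """
    rows: List[Dict] = []
    for lines, label in ((fake_lines, 1), (true_lines, 0)):
        for record in csv.DictReader(lines):
            record["label"] = label  # tag each article with its file's class
            rows.append(record)
    return rows
```

    To use it, open both files with `open(path, newline="", encoding="utf-8")` (UTF-8 is an assumption) and pass the two handles to `load_labelled`. Reading through `csv.DictReader` rather than splitting on commas keeps quoted article text that contains commas intact.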

    4. Applications

    This dataset is suitable for various applications, including:

    Training and testing models for fake news detection

    Text classification and content analysis using Natural Language Processing (NLP) techniques

    Research in media literacy, misinformation tracking, and credibility assessment

    Academic projects and data science competitions focused on information verification

    5. Acknowledgements

    The dataset creators acknowledge the use of publicly available content solely for academic and research purposes. The COVID19_FNIR dataset has been used with reference to its source on IEEE Dataport.

    6. Licensing and Usage

    This dataset is intended for educational and research use only. Users are advised to cite the original sources and the IEEE dataset if the WWFND dataset is used in any publication or project.

  5. Identifying fake news vs. facts online in selected countries worldwide 2020

    • statista.com
    Cite
    Statista, Identifying fake news. vs facts online in selected countries worldwide 2020 [Dataset]. https://www.statista.com/statistics/1227193/identifying-misinformation-difficulty-worldwide/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    A report investigating media literacy and news consumption revealed that consumers in Brazil found telling the difference between misinformation and facts most difficult, with 34 percent saying that they found it very or somewhat difficult to differentiate between false and real content. By contrast, Indian and Nigerian audiences were the least likely to have problems in this regard and reported finding it relatively easy to identify misinformation.

  6. CT-FAN-21 corpus: A dataset for Fake News Detection

    • zenodo.org
    Updated Oct 23, 2022
    Cite
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl (2022). CT-FAN-21 corpus: A dataset for Fake News Detection [Dataset]. http://doi.org/10.5281/zenodo.4714517
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl
    Description

    Data Access: The data in this research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must take care to use it only for research. Because of these restrictions, the collection is not open data. Please download the Data Sharing Agreement and send the signed form to fakenewstask@gmail.com.

    Citation

    Please cite our work as

    @article{shahi2021overview,
     title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
     author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
     journal={Working Notes of CLEF},
     year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially false, false, or other (e.g., claims in dispute), and detect the topical domain of the article. This task will run in English.

    Subtask 3A: Multi-class fake news detection of news articles (English)

    Subtask 3A frames fake news detection as a four-class classification problem. The training data, roughly 900 articles with their respective labels, will be released in batches. Given the text of a news article, determine whether the main claim made in the article is true, partially false, false, or other. Our definitions for the categories are as follows:

    • False - The main claim made in an article is untrue.

    • Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    • True - This rating indicates that the primary elements of the main claim are demonstrably true.

    • Other - An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes disputed and unproven articles.

    Subtask 3B: Topical Domain Classification of News Articles (English)

    Fact-checkers require background expertise to assess the truthfulness of an article. Categorising articles by domain will help to automate the sampling process from a stream of data. Given the text of a news article, determine its topical domain. This is a classification problem: the task is to categorise fake news articles into six topical categories, such as health, election, crime, climate and education. This task will be offered for a subset of the data of Subtask 3A.

    Input Data

    The data will be provided with the columns ID, title, text, rating and domain, described as follows:

    Task 3a

    • ID - Unique identifier of the news article
    • Title - Title of the news article
    • text - Full text of the news article
    • our rating - Class of the news article: false, partially false, true or other

    Task 3b

    • public_id - Unique identifier of the news article
    • Title - Title of the news article
    • text - Full text of the news article
    • domain - Topical domain of the news article (applicable only to Task 3b)

    Output data format

    Task 3a

    • public_id - Unique identifier of the news article
    • predicted_rating - Predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true

    Task 3b

    • public_id - Unique identifier of the news article
    • predicted_domain - Predicted domain

    Sample file

    public_id, predicted_domain
    1, health
    2, crime
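    The two sample files can be produced with a small helper like the one below, a sketch in Python using only the standard `csv` module; the helper name `submission_text` is hypothetical. The samples show a space after each comma, but this sketch writes plain comma-separated values, assuming the scorer reads standard CSV. It also assumes the Task 3a labels are written in lower case exactly as `false`, `partially false`, `true` and `other`.

```python
import csv
import io
from typing import Iterable, Optional, Sequence, Tuple

# Task 3a class labels, written in lower case as in the sample file (assumed).
RATINGS = ("false", "partially false", "true", "other")


def submission_text(
    predictions: Iterable[Tuple[object, str]],
    column: str = "predicted_rating",
    allowed: Optional[Sequence[str]] = RATINGS,
) -> str:
    """Render (public_id, label) pairs in the two-column sample-file layout.

    Use column="predicted_rating" for Task 3a and "predicted_domain" for
    Task 3b (pass allowed=None there, since the full domain label set is
    not listed). Labels outside `allowed` raise ValueError.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["public_id", column])
    for public_id, label in predictions:
        if allowed is not None and label not in allowed:
            raise ValueError(f"unexpected label {label!r} for public_id {public_id}")
        writer.writerow([public_id, label])
    return buf.getvalue()
```

    Validating labels before writing catches typos such as `mostly true` locally, which matters when only 5 runs may be submitted in total.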

    Additional data for Training

    To train your model, participants can use additional data in a similar format; some such datasets are available on the web. We do not provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some possible sources:

    IMPORTANT!

    1. The fake news articles used for task 3b are a subset of those used for task 3a.
    2. The data covers the period from 2010 to 2021, and the fake news content spans a mix of topics such as elections and COVID-19.

    Evaluation Metrics

    This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.
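    As a sketch of what the ranking score computes (not the organisers' scorer), the Python function below averages per-class F1 with equal weight per class, so a rare class such as `other` counts as much as a frequent one. It takes the class set to be the union of labels in the gold and predicted lists, which matches scikit-learn's default for `f1_score(..., average="macro")`; the official scorer may treat classes absent from the predictions differently. The name `macro_f1` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores.

    Classes are the union of labels seen in y_true and y_pred, and
    F1 for a class is 2*TP / (2*TP + FP + FN).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    if not labels:
        raise ValueError("no labels to score")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

    For gold labels true, false, false, other and predictions true, false, true, other, the per-class F1 values are 2/3, 2/3 and 1, so the macro-F1 is 7/9.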

    Submission Link: https://competitions.codalab.org/competitions/31238

    Related Work

    • Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf
    • G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
    • Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104

  7. Most avoided news topics among adults in the U.S. 2025

    • statista.com
    Updated Sep 12, 2025
    Cite
    Amy Watson (2025). Most avoided news topics among adults in the U.S. 2025 [Dataset]. https://www.statista.com/topics/3251/fake-news/
    Dataset updated
    Sep 12, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Amy Watson
    Area covered
    United States
    Description

    A 2025 survey found that around one in four adults in the United States actively avoided news related to sports, followed by entertainment (18 percent) and lifestyle (17 percent). In contrast, health was the least avoided news topic, with just four percent of respondents saying they ignored it.

  8. S1 Data -

    • plos.figshare.com
    zip
    Updated Apr 9, 2024
    Cite
    Kerstin Unfried; Jan Priebe (2024). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0301818.s009
    Available download formats: zip
    Dataset updated
    Apr 9, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Kerstin Unfried; Jan Priebe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The widespread dissemination of misinformation on social media is a serious threat to global health. To a large extent, it is still unclear who actually shares health-related misinformation, whether deliberately or accidentally. We conducted a large-scale online survey among 5,307 Facebook users in six sub-Saharan African countries, in which we collected information on the sharing of fake news and on truth discernment. We estimate the magnitude and determinants of deliberate and accidental sharing of misinformation related to three vaccines (HPV, polio, and COVID-19). In an OLS framework, we relate the actual sharing of fake news to several socioeconomic characteristics (age, gender, employment status, education), social media consumption, personality factors and vaccine-related characteristics, while controlling for country- and vaccine-specific effects. We first show that actual sharing rates of fake news articles are substantially higher than those reported from developed countries and that most of the sharing occurs accidentally. Second, we reveal that the determinants of deliberate and accidental sharing differ. While deliberate sharing is related to being older and risk-loving, accidental sharing is associated with being older, being male, and having high levels of trust in institutions. Lastly, we demonstrate that the determinants of sharing differ by the adopted measure (intentions vs. actual sharing), which underscores the limitations of commonly used intention-based measures for deriving insights about actual fake news sharing behaviour.

  9. Encountering fake news in print media worldwide 2019, by country

    • statista.com
    Cite
    Statista, Encountering fake news in print media worldwide 2019, by country [Dataset]. https://www.statista.com/statistics/1016534/fake-news-print-media-worldwide/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 21, 2018 - Jan 4, 2019
    Area covered
    Worldwide
    Description

    The statistic presents the share of adults who have witnessed fake news in print media worldwide as of January 2019, broken down by country. The findings reveal that the majority of responding adults in Turkey said that they had witnessed fake news in print media, with 72 percent having encountered false information in a print publication compared to 18 percent who said they had not. Conversely, just 27 percent of respondents in Pakistan witnessed fake news in print media at some point.

  10. Fake-Real News

    • kaggle.com
    zip
    Updated Jun 23, 2020
    Cite
    Kajal Yadav (2020). Fake-Real News [Dataset]. https://www.kaggle.com/techykajal/fakereal-news
    Available download formats: zip (864,364 bytes)
    Dataset updated
    Jun 23, 2020
    Authors
    Kajal Yadav
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    As we all know, fake news has become the centre of attention worldwide because of its hazardous impact on our society. One recent example is the spread of fake news about Covid-19 cures, precautions and symptoms, and by now you must understand how dangerous this bogus information can be. Nor is it any secret that distorted information is spread at election time to advance political agendas.

    Fake news is quickly becoming an epidemic, and it alarms and angers me how often and how rapidly totally fabricated stories circulate. Why? First, because of the deceptive effect of repetition: if a lie is repeated enough times, you'll begin to believe it's true.

    You understand by now that fake news and other types of false information can take on various appearances. They can also have significant effects, because information shapes our world view: we make important decisions based on information, and we form an idea about people or a situation by obtaining information. So if the information we see on the Web is invented, false, exaggerated or distorted, we won't make good decisions.

    Hence, there is a dire need to do something about it. It is a big data problem, and data scientists can contribute from their end to the fight against fake news.

    Content

    Although fighting fake news is a big data problem, I have created this small dataset of approximately 10,000 news items and their metadata, scraped from approximately 600 web pages of the PolitiFact website, to analyse with data science skills and gain insights into how we can stop the spread of misinformation more broadly and which approach achieves better accuracy.

    This dataset has 6 attributes, of which News_Headline is the most important for classifying news as FALSE or TRUE. As the Label attribute shows, there are 6 classes. So it's entirely up to you whether you use the dataset for multi-class classification or convert the class labels into FALSE or TRUE and then perform binary classification. For your convenience, I will write a notebook on how to convert this dataset from multi-class to binary-class. To deal with the text data, you need good hands-on practice with NLP and data-mining concepts.

    • News_Headline - contains the piece of information to be analysed.
    • Link_Of_News - contains the URL of the news headline given in the first column.
    • Source - contains the names of the authors who posted the information on Facebook, Instagram, Twitter or another social media platform.
    • Stated_On - contains the date when the authors posted the information on social media.
    • Date - contains the date when PolitiFact's fact-checkers analysed the information in order to label it as FAKE or REAL.
    • Label - contains 6 class labels: True, Mostly-True, Half-True, Barely-True, False, Pants on Fire.

    So, you can either perform multi-class classification, or convert Mostly-True, Half-True and Barely-True to True, drop Pants on Fire, and perform binary classification.
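    A minimal Python sketch of that binary conversion, following the mapping stated in the text: Mostly-True, Half-True and Barely-True become TRUE, False stays FALSE, and Pants on Fire rows are dropped. The exact spelling and capitalisation of the labels in the CSV are assumptions, so labels are normalised before lookup, and any unrecognised label raises an error rather than being silently mislabelled. `binarize` and `TO_BINARY` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional

# Mapping described in the dataset notes. Label spellings in the CSV are
# assumed; labels are stripped and lower-cased before lookup.
TO_BINARY: Dict[str, Optional[str]] = {
    "true": "TRUE",
    "mostly-true": "TRUE",
    "half-true": "TRUE",
    "barely-true": "TRUE",
    "false": "FALSE",
    "pants on fire": None,  # None means: drop the row
}


def binarize(rows: Iterable[Dict[str, str]], label_key: str = "Label") -> List[Dict[str, str]]:
    """Return copies of `rows` with the label mapped to TRUE/FALSE."""
    out = []
    for row in rows:
        key = row[label_key].strip().lower()
        if key not in TO_BINARY:
            raise ValueError(f"unknown label: {row[label_key]!r}")
        binary = TO_BINARY[key]
        if binary is None:
            continue  # Pants on Fire rows are excluded
        out.append({**row, label_key: binary})  # copy, leave input untouched
    return out
```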

    Acknowledgements

    Inspiration

    • I want to see which approach can combat fake news with the greatest accuracy.

  11. Consumers worried about false information on social media worldwide 2023

    • statista.com
    Updated Apr 22, 2024
    Cite
    Statista (2024). Consumers worried about false information on social media worldwide 2023 [Dataset]. https://www.statista.com/statistics/1461636/false-information-concern-worldwide/
    Dataset updated
    Apr 22, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2023
    Area covered
    Worldwide
    Description

    A study held in early 2023 found that Indonesian adults were the most concerned about the spread of false information on social media, with over 80 percent saying that they were very or somewhat worried about the matter. Whilst Swedish and Danish respondents were less concerned about misinformation on social media, the average across all surveyed countries was 68 percent, highlighting widespread worry about false information worldwide.

  12. Replication Data for Detecting Misinformation: Identifying False News Spread...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jan 18, 2024
    Cite
    Valerie Wirtschafter; Frederico Batista Pereira; Natala Bueno; Nara Pavão; João Oliveira dos Santos; Felipe Nunes (2024). Replication Data for Detecting Misinformation: Identifying False News Spread by Political Leaders in the Global South [Dataset]. http://doi.org/10.7910/DVN/EQL5E4
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Valerie Wirtschafter; Frederico Batista Pereira; Natala Bueno; Nara Pavão; João Oliveira dos Santos; Felipe Nunes
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This file describes the contents of the replication archive used to conduct the analyses in the main text and appendix for Detecting Misinformation: Identifying False News Spread by Political Leaders in the Global South.

  13. How news consumption affects adults in the U.S. 2025

    • statista.com
    Updated Sep 12, 2025
    Cite
    Amy Watson (2025). How news consumption affects adults in the U.S. 2025 [Dataset]. https://www.statista.com/topics/3251/fake-news/
    Dataset updated
    Sep 12, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Amy Watson
    Area covered
    United States
    Description

    In May 2025, a survey asked U.S. adults how they feel while consuming news. The results indicate that a majority feel informed, with 53 percent saying that news generally makes them feel this way. At the same time, 43 percent reported feeling angry, and 32 percent said they feel depressed when consuming news. In contrast, only 16 percent described feeling hopeful. These findings highlight that while staying informed is a major benefit of news consumption, negative emotional reactions—such as anger and depression—are also very common among Americans.

  14. WSDM - Fake News Classification

    • kaggle.com
    zip
    Updated Apr 2, 2019
    Cite
    Bytedance WSDM Cup 2019 (2019). WSDM - Fake News Classification [Dataset]. https://www.kaggle.com/wsdmcup/wsdm-fake-news-classification
    Available download formats: zip (36,334,250 bytes)
    Dataset updated
    Apr 2, 2019
    Authors
    Bytedance WSDM Cup 2019
    Description

    Background

    WSDM (pronounced "wisdom") is one of the premier conferences on web-inspired research involving search and data mining. The 12th ACM International WSDM Conference will take place in Melbourne, Australia during Feb. 11-15, 2019.

    This task is organized by ByteDance, the Platinum Level Sponsor of the conference. ByteDance is a global Internet technology company founded in China. Our goal is to build a global content platform that enables people to enjoy various content in various forms. We inform, entertain, and inspire people across language, culture and geography.

    One of the challenges which we are facing is to combat different types of fake news. Fake news here refers to all forms of false, inaccurate or misleading information, which now poses a big threat to human civilization.

    At ByteDance, we have created a large-scale database to store existing fake news articles. Any new article must go through a test on the truthfulness of its content before being published. We match the new article against the articles in the database. Articles identified as containing fake news are withdrawn after human verification. The accuracy and efficiency of this process therefore become crucial to making the platform safe, reliable, and healthy.

    About This Dataset

    This dataset was released as the competition dataset for the Fake News Classification task, defined as follows:

    Given the title of a fake news article A and the title of an incoming news article B, participants are asked to classify B into one of three categories:

    • agreed: B talks about the same fake news as A
    • disagreed: B refutes the fake news in A
    • unrelated: B is unrelated to A

    Files

    • train.csv - training data contains 320,767 news pairs in both Chinese and English. This file provides the only data you can use to finish the task. Using external data is not allowed.
    • test.csv - testing data contains 80,126 news pairs in both Chinese and English. Approximately 25% of the testing data is public and is used to calculate the accuracy shown on the leaderboard. The remaining 75% (private data) is used to calculate your final result in the competition.
    • sample_submission.csv - sample answer to the testing data.

    Data fields

    • id - the id of each news pair.
    • tid1 - the id of fake news title 1.
    • tid2 - the id of news title 2.
    • title1_zh - the fake news title 1 in Chinese.
    • title2_zh - the news title 2 in Chinese.
    • title1_en - the fake news title 1 in English.
    • title2_en - the news title 2 in English.
    • label - indicates the relation between the news pair: agreed/disagreed/unrelated.

    The English titles are machine-translated from the corresponding Chinese titles, to help participants from all backgrounds better understand the dataset. Participants are strongly encouraged to use the Chinese titles to complete the task.
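    Following that recommendation, a Python sketch like the one below reads title pairs from a file in the train.csv/test.csv layout and selects either the Chinese or the English columns. It assumes a comma-separated file with a header row containing the field names listed above; `iter_pairs` is a hypothetical helper, and it returns the label as `None` when a file has no `label` column.

```python
import csv
from typing import Iterable, Iterator, Optional, Tuple


def iter_pairs(
    lines: Iterable[str], use_chinese: bool = True
) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (title1, title2, label) from rows in the competition layout.

    `lines` is any iterable of CSV lines, e.g. a file opened with
    open(path, newline="", encoding="utf-8"). The label is None when
    the file has no 'label' column.
    """
    suffix = "zh" if use_chinese else "en"
    for row in csv.DictReader(lines):
        yield row[f"title1_{suffix}"], row[f"title2_{suffix}"], row.get("label")
```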

    Evaluation Metrics

    We use Weighted Categorization Accuracy to evaluate your performance. Weighted categorization accuracy can be generally defined as:

    \[ \mathrm{WeightedAccuracy}(y, \hat{y}, \omega) = \frac{\sum_{i=1}^{n} \omega_i \, \mathbb{1}(y_i = \hat{y}_i)}{\sum_{i=1}^{n} \omega_i} \]

    where \(y\) are the ground truths, \(\hat{y}\) are the predicted results, \(\omega_i\) is the weight associated with the \(i\)th item in the dataset, and \(\mathbb{1}(\cdot)\) is the indicator function, equal to 1 when its argument holds and 0 otherwise.

    In our test set, we assign each testing item a weight according to its category. The weights of the three categories agreed, disagreed and unrelated are \(\frac{1}{15}\), \(\frac{1}{5}\) and \(\frac{1}{16}\), respectively. We set the weights in consideration of the imbalanced data distribution, to reduce the bias in your score caused by the majority class (unrelated pairs account for approximately 70% of the dataset).
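    A Python sketch of this metric in its normalised form: the summed weight of correctly classified items divided by the summed weight of all items. It assumes each item's weight is taken from its ground-truth category, using the three weights stated above; `weighted_accuracy` is a hypothetical name, and `Fraction` keeps the arithmetic exact until the final conversion to `float`.

```python
from fractions import Fraction
from typing import Dict, Sequence

# Per-category weights as stated in the competition description.
WEIGHTS: Dict[str, Fraction] = {
    "agreed": Fraction(1, 15),
    "disagreed": Fraction(1, 5),
    "unrelated": Fraction(1, 16),
}


def weighted_accuracy(
    y_true: Sequence[str], y_pred: Sequence[str], weights: Dict[str, Fraction] = WEIGHTS
) -> float:
    """Weight of correctly classified items divided by the total weight.

    Each item's weight is looked up from its ground-truth category.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = sum(weights[t] for t in y_true)
    correct = sum(weights[t] for t, p in zip(y_true, y_pred) if t == p)
    return float(correct / total)
```

    For example, with gold labels agreed, disagreed, unrelated and predictions agreed, unrelated, unrelated, the correct weight is 1/15 + 1/16 = 31/240 out of a total of 79/240, giving 31/79 ≈ 0.392. Missing the single disagreed pair costs far more than it would under plain accuracy.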

  15. Data from: A Data set for Information Spreading over the News

    • live.european-language-grid.eu
    txt
    Updated Nov 28, 2021
    Cite
    (2021). A Data set for Information Spreading over the News [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7719
    Available download formats: txt
    Dataset updated
    Nov 28, 2021
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    Analyzing the spread of information related to a specific event in the news has many potential applications. Consequently, various systems have been developed to facilitate the analysis of information spreading, such as detecting disease propagation and identifying the spread of fake news through social media. There are several open challenges in the process of discerning information propagation, among them the lack of resources for training and evaluation. This paper describes the process of compiling a corpus from the EventRegistry global media monitoring system. We focus on information spreading in three domains: sports (i.e. the FIFA World Cup), natural disasters (i.e. earthquakes), and climate change (i.e. global warming). This corpus is a valuable addition to the currently available datasets for examining the spread of information about various kinds of events.

    Introduction:

    Domain-specific gaps in information spreading are ubiquitous and may exist due to economic conditions, political factors, or linguistic, geographical, time-zone, cultural, and other barriers. These factors potentially contribute to obstructing the flow of local as well as international news. We believe that there is a lack of research studies that examine, identify, and uncover the reasons for barriers in information spreading. Additionally, there is limited availability of datasets containing news text and metadata including time, place, source, and other relevant information. When a piece of information starts spreading, it implicitly raises questions such as: How far does the information, in the form of news, reach the public? Does the content of the news remain the same, or does it change to a certain extent? Do cultural values affect the information, especially when the same news is translated into other languages?

    Statistics about datasets:

    ----------------------------------------------------------------------------------------------------
    #  Domain             Event Type       Articles per Language                       Total Articles
    1  Sports             FIFA World Cup   983-en, 762-sp, 711-de, 10-sl, 216-pt       2679
    2  Natural Disaster   Earthquake       941-en, 999-sp, 937-de, 19-sl, 251-pt       3194
    3  Climate Change     Global Warming   996-en, 298-sp, 545-de, 8-sl, 97-pt         1945
    ----------------------------------------------------------------------------------------------------
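    The per-language counts in the table do not add up exactly to the Total Articles column (for example, the FIFA World Cup languages sum to 2,682, while the table reports 2,679). The short Python check below simply re-enters the table's numbers and makes the comparison explicit; it is a sketch for sanity-checking the statistics, and the source does not explain where the differences come from.

```python
# Per-language article counts, copied from the statistics table
# (language codes kept exactly as the table gives them).
COUNTS = {
    "FIFA World Cup": {"en": 983, "sp": 762, "de": 711, "sl": 10, "pt": 216},
    "Earthquake": {"en": 941, "sp": 999, "de": 937, "sl": 19, "pt": 251},
    "Global Warming": {"en": 996, "sp": 298, "de": 545, "sl": 8, "pt": 97},
}

# "Total Articles" column as reported in the same table.
REPORTED_TOTALS = {"FIFA World Cup": 2679, "Earthquake": 3194, "Global Warming": 1945}


def check_totals(counts, reported):
    """Return {event: (sum_of_languages, reported_total, reported_minus_sum)}."""
    result = {}
    for event, per_lang in counts.items():
        computed = sum(per_lang.values())
        result[event] = (computed, reported[event], reported[event] - computed)
    return result


if __name__ == "__main__":
    for event, (computed, reported, diff) in check_totals(COUNTS, REPORTED_TOTALS).items():
        print(f"{event}: languages sum to {computed}, table reports {reported} (difference {diff:+d})")
```

    Running it shows differences of -3 (FIFA World Cup), +47 (Earthquake), and +1 (Global Warming) between the reported totals and the per-language sums.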

  16. Data from: Avoid or Authenticate? A Multilevel Cross-Country Analysis of the...

    • tandf.figshare.com
    odt
    Updated Apr 24, 2024
    Michael Chan; Francis L. F. Lee; Hsuan-Ting Chen (2024). Avoid or Authenticate? A Multilevel Cross-Country Analysis of the Roles of Fake News Concern and News Fatigue on News Avoidance and Authentication [Dataset]. http://doi.org/10.6084/m9.figshare.19369373.v1
    Available download formats: odt
    Dataset updated
    Apr 24, 2024
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Michael Chan; Francis L. F. Lee; Hsuan-Ting Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Citizens these days feel inundated with news online and are worried about its veracity. This study examines whether these concerns in the digital news environment lead to greater news avoidance and news authentication behaviors. The relationships were tested across 16 countries by combining individual-level survey data from the Reuters Institute Digital News Report (N = 34,201) with country-level data based on comparative media systems research. Multilevel modeling showed that concern with fake news was related to news authentication, and news fatigue was related to news avoidance. High news fatigue also accentuated the influence of concern with fake news on news avoidance, while low fatigue attenuated the relationship. Additional cross-level interactions further contextualized the findings according to media system, showing how the relationships can vary under different conditions of press market, political parallelism, journalistic professionalism, and public service broadcasting. This study demonstrates the importance of considering the contextual role of the media system in understanding individuals’ perceptions of the news they receive online and their subsequent news engagement, especially in fake news research, because the prevalence and deleterious impact of fake news vary across countries.
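    The multilevel design described above can be illustrated with a generic two-level specification. This is a sketch, not the authors' model: the abstract does not give the exact equations, link function, or covariates, so every symbol below is an illustrative assumption. Here $Y_{ij}$ is news avoidance for respondent $i$ in country $j$, $C_{ij}$ is concern with fake news, $F_{ij}$ is news fatigue, and $Z_j$ is a country-level media-system variable (for example, the strength of public service broadcasting). For simplicity, avoidance is treated as a continuous outcome.

```latex
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} C_{ij} + \gamma_{20} F_{ij} + \gamma_{30}\,(C_{ij} F_{ij}) + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2) \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} Z_j + u_{0j}, \qquad
  \beta_{1j} = \gamma_{10} + \gamma_{11} Z_j + u_{1j} \\
\text{Combined:}\quad & Y_{ij} = \gamma_{00} + \gamma_{01} Z_j + \gamma_{10} C_{ij} + \gamma_{11} Z_j C_{ij}
  + \gamma_{20} F_{ij} + \gamma_{30} C_{ij} F_{ij} + u_{0j} + u_{1j} C_{ij} + \varepsilon_{ij} \\
\text{Slope of } C:\quad & \frac{\partial Y_{ij}}{\partial C_{ij}} = \gamma_{10} + \gamma_{11} Z_j + \gamma_{30} F_{ij} + u_{1j}
\end{aligned}
```

    Under this sketch, a positive $\gamma_{30}$ corresponds to the reported pattern in which high fatigue strengthens the link between fake news concern and avoidance, and $\gamma_{11}$ is a cross-level interaction with the media system. The country-level random effects $(u_{0j}, u_{1j})$ are assumed jointly normal with mean zero.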

  17. FakeCovid Fact-Checked News Dataset

    • kaggle.com
    zip
    Updated Feb 1, 2023
    The Devastator (2023). FakeCovid Fact-Checked News Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/fakecovid-fact-checked-news-dataset
    Available download formats: zip (19,911,252 bytes)
    Dataset updated
    Feb 1, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    FakeCovid Fact-Checked News Dataset

    International Coverage of COVID-19 in 40 Languages from 105 Countries

    By [source]

    About this dataset

    The FakeCovid dataset is a compilation of 7,623 fact-checked news articles related to COVID-19, obtained from 92 fact-checking websites in 105 countries. The collection covers a wide range of sources and languages, with locations across Africa, Europe, Asia, the Americas, and Oceania. Because the data were gathered from references on Poynter and Snopes, the dataset is a useful resource for researching the accuracy of pandemic-related news worldwide. Its columns cover the countries involved; categories such as coronavirus health updates or political interference during the pandemic; URLs of the referenced articles; the verifiers employed by each website; article classes ranging from true to false, including mixed verdicts; publication dates; article sources with credibility verification; and the article text and language. The dataset can help researchers understand the global flow of COVID-19 information and offers some transparency into whose interests shape it.


    How to use the dataset

    The FakeCovid dataset is a multilingual cross-domain collection of 7623 fact-checked news articles related to COVID-19. It is collected from 92 fact-checking websites and covers a wide range of sources and countries, including locations in Africa, Asia, Europe, The Americas, and Oceania. This dataset can be used for research related to understanding the truth and accuracy of news sources related to COVID-19 in different countries and languages.

    To use this dataset effectively, you will need basic knowledge of data science, such as data manipulation with Python libraries like pandas, NumPy, or scikit-learn. The data is in CSV (comma-separated values) format, which can be read by most spreadsheet applications or by a text editor such as Notepad++. Here are some steps to get started:

    1. Access the FakeCovid Fact-Checked News Dataset on Kaggle: https://www.kaggle.com/c/fakecovidfactcheckednewsdataset/data
    2. Download the provided CSV file containing all fact-checked news articles and place it in your chosen folder.
    3. Load the CSV file into your preferred environment, such as Jupyter Notebook or RStudio.
    4. Explore the dataset using functions from data science libraries such as pandas and matplotlib: find meaningful information through statistical analysis and/or create visualizations.
    5. Modify parameters within the CSV file if required, and save it.
    6. Share your projects through the Gitter chatroom #fakecovidauthors.
    7. Publish any interesting discoveries in open-source repositories such as GitHub.
    8. Engage with the Hangouts group #FakeCoviDFactCheckersClub.
    9. Show off graphics via the Twitter hashtag #FakeCovidiauthors.
    10. Reach out with further questions via email: contactfakecovidadatateam
    11. Stay connected by joining the mailing list #FakeCoviDAuthorsGroup.

    We hope this guide helps you understand how to use the FakeCovid Fact-Checked News Dataset to generate meaningful insights about COVID-19 news articles worldwide!
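    The loading and exploration steps described above can be sketched with Python's standard library alone (pandas.read_csv would work equally well). This is a minimal sketch under stated assumptions: the column names `class` and `lang` are hypothetical, inferred from the description's mention of article classes and language, so check the actual headers of the downloaded file. The sample rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical column names: adjust them to the real CSV headers.
CLASS_COLUMN = "class"
LANG_COLUMN = "lang"


def summarize(csv_file, class_col=CLASS_COLUMN, lang_col=LANG_COLUMN):
    """Count rows per fact-check class and per language in an open CSV file."""
    reader = csv.DictReader(csv_file)
    classes, languages = Counter(), Counter()
    for row in reader:
        # Normalize whitespace and case so "False" and " false " are counted together.
        classes[row[class_col].strip().lower()] += 1
        languages[row[lang_col].strip().lower()] += 1
    return classes, languages


if __name__ == "__main__":
    # Invented sample rows standing in for the downloaded file.
    sample = io.StringIO(
        "class,lang,title\n"
        "False,en,Example claim A\n"
        "false,es,Example claim B\n"
        "Mixed,en,Example claim C\n"
    )
    classes, languages = summarize(sample)
    print(classes.most_common())    # [('false', 2), ('mixed', 1)]
    print(languages.most_common())  # [('en', 2), ('es', 1)]
```

    For the real file, open it with `open(path, newline="", encoding="utf-8")` (the path is whatever you chose when downloading) and pass the file object to `summarize`.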

    Research Ideas

    • Developing an automated algorithm to detect fake news related to COVID-19 by leveraging the fact-checking flags and other results included in this dataset for machine learning and natural language processing tasks.
    • Training a sentiment analysis model on the data to categorize articles by sentiment, which could support further investigation into why certain news topics or countries show particular outcomes, motivations, or behaviors, whether due to their content or to author bias (if any).
    • Using unsupervised clustering techniques to identify discrepancies between the news circulated among populations in different countries (languages and regions), so that publicists can focus on providing factual information rather than spreading false rumors or misinformation about the pandemic.
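    The first research idea, an automated fake news detector trained on the fact-check labels, usually starts from a simple baseline. The sketch below uses multinomial Naive Bayes with add-one smoothing, a swapped-in technique that is not provided by the dataset or its authors. The label strings and the training sentences are toy examples invented for illustration; in practice the texts and labels would come from the dataset's article text and class columns.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per label
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each known token.
            score = math.log(doc_count / n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    train_texts = [
        "miracle cure stops virus overnight",
        "drinking bleach kills the virus",
        "health ministry reports new vaccination figures",
        "who publishes updated case counts",
    ]
    train_labels = ["false", "false", "true", "true"]
    model = NaiveBayes().fit(train_texts, train_labels)
    print(model.predict("miracle cure kills virus"))  # false
```

    Naive Bayes is chosen here only because it is short and transparent; on real multilingual data, a per-language tokenizer and a held-out evaluation split would matter far more than the classifier itself.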

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    **License: [CC0 1.0 Universal (CC0 1.0) - Public Do...

  18. Perceived prevalence of fake news in media sources worldwide 2019

    • statista.com
    Updated Nov 27, 2025
    Statista (2025). Perceived prevalence of fake news in media sources worldwide 2019 [Dataset]. https://www.statista.com/statistics/1112026/fake-news-prevalence-attitudes-worldwide/
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 25, 2019 - Feb 8, 2019
    Area covered
    Worldwide
    Description

    According to a global study conducted in 2019, ** percent of respondents felt that there was a fair extent or great deal of fake news on online websites and platforms. By comparison, ** percent less said the same about TV, radio, newspapers, and magazines. Traditional media in general is still considered more trustworthy than online formats, despite social networks being the preferred choice for many.

    Meanwhile, as some consumers around the world now turn to influencers for news instead of journalists, the risk of them being exposed to inaccurate, incorrect, or deliberately false information continues to grow, and journalists face pressure to battle fake content whilst finding new ways to keep audiences engaged.

    Fake news and journalism

    More than ** percent of journalists responding to a global survey believed that the public had lost trust in the media over the past year. Whilst the reasons for this are many, the role of fake news cannot be underestimated, particularly given the speed with which false content can spread and reach vulnerable or misinformed audiences. Whether unintentionally or deliberately, fake news is often shared by those who encounter it, which only worsens the problem. Indeed, journalists consider regular citizens to be the main source of disinformation, followed by political leaders and internet trolls.

    Despite the threats fake news poses, journalists themselves feel that concerns about disinformation could positively impact the quality of journalism. There are also growing expectations from the public and journalists alike for governments and companies to do more to help boost quality journalism and curb the dissemination and influence of fake news. News industry leaders rated Google as the best platform for supporting journalism, whereas the likes of Amazon and Snapchat have a long way to go before organizations consider them reliable in this respect.

  19. Rebel News / Global Research Social Media Data

    • figshare.com
    xlsx
    Updated Dec 29, 2020
    Abde Amr (2020). Rebel News / Global Research Social Media Data [Dataset]. http://doi.org/10.6084/m9.figshare.13499295.v1
    Available download formats: xlsx
    Dataset updated
    Dec 29, 2020
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Abde Amr
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Rebel News and Global Research Facebook and Twitter data.

  20. Flash Eurobarometer 464: Fake News and Disinformation Online

    • data.wu.ac.at
    zip
    Updated Sep 4, 2018
    + more versions
    European Union Open Data Portal (2018). Flash Eurobarometer 464: Fake News and Disinformation Online [Dataset]. https://data.wu.ac.at/schema/www_europeandataportal_eu/YWZhMzNkMWUtZDFlYy00MzU0LThkNDctYzI3NDZlMDljMjA0
    Available download formats: zip
    Dataset updated
    Sep 4, 2018
    Dataset provided by
    EU Open Data Portal (http://data.europa.eu/)
    European Union
    Description

    Online platforms and other Internet services have provided new ways for people to connect, debate, and gather information. However, the spread of news that intentionally misleads readers has become an increasing problem for the functioning of our democracies, affecting people’s understanding of reality. In June 2017, the European Parliament adopted a Resolution calling on the European Commission to analyse in depth the current situation and legal framework with regard to fake news, and to verify the possibility of legislative intervention to limit the dissemination and spreading of fake content. This Flash Eurobarometer is designed to explore EU citizens’ awareness of and attitudes towards the existence of fake news and disinformation online. It covers the following issues:

    - Levels of trust in news and information accessed through different channels;
    - People’s perceptions of how often they encounter news or information that is misleading or false;
    - Public confidence in identifying news or information that is misleading or false;
    - People’s views on the extent of the problem, both in their own country and for democracy in general;
    - Views on which institutions and media actors should act to stop the spread of fake news.

    The results are distributed across volumes as follows:

    - Volume A: Countries
    - Volume AA: Groups of countries
    - Volume A' (AP): Trends
    - Volume AA' (AAP): Trends of groups of countries
    - Volume B: EU/socio-demographics
    - Volume C: Country/socio-demographics

    Researchers may also contact GESIS - Leibniz Institute for the Social Sciences: http://www.gesis.org/en/home/

Opinions on selected media and news institutions in the U.S. 2025

30 scholarly articles cite this dataset.
