100+ datasets found
  1. Fake News Statistics By Impacts, AI, Country, Misinformation, Frequency,...

    • coolest-gadgets.com
    Updated Jan 9, 2025
    Cite
    Coolest Gadgets (2025). Fake News Statistics By Impacts, AI, Country, Misinformation, Frequency, Media Outlets And Economic Losses [Dataset]. https://coolest-gadgets.com/fake-news-statistics/
    Dataset updated
    Jan 9, 2025
    Dataset authored and provided by
    Coolest Gadgets
    License

    https://coolest-gadgets.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    Fake News Statistics: Fake news has become a major problem in today's digital age. It spreads quickly through social media and other online platforms, often misleading people, and it spreads faster than real news, creating confusion and mistrust worldwide. Statistics and trends from 2024 reveal that many people have encountered fake news online, and many have shared it unknowingly.

    Fake news affects public opinion, political decisions, and even relationships. This article shows how widespread the problem is and helps address it more effectively. Raising awareness and encouraging critical thinking can reduce its impact, and reliable statistics and research are essential for uncovering the truth and stopping the spread of false information. Everyone plays a role in combating fake news.

  2. Sharing of made-up news on social networks in the U.S. 2020

    • statista.com
    Updated Mar 21, 2023
    Cite
    Statista (2023). Sharing of made-up news on social networks in the U.S. 2020 [Dataset]. https://www.statista.com/statistics/657111/fake-news-sharing-online/
    Dataset updated
    Mar 21, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 8, 2020
    Area covered
    United States
    Description

    A survey conducted in December 2020 assessing if news consumers in the United States had ever unknowingly shared fake news or information on social media found that 38.2 percent had done so. A similar share had not, whereas seven percent were unsure if they had accidentally disseminated misinformation on social networks.

    Fake news in the U.S.

    Fake news, or news that contains misinformation, has become a prevalent issue within the American media landscape. Fake news can circulate online as news stories with deliberately misleading headlines, or clickbait, but the rise of misinformation cannot be attributed solely to social media. Forms of fake news are also found in print media, with 47 percent of Americans witnessing fake news in newspapers and magazines as of January 2019.

    News consumers in the United States are aware of the spread of misinformation, with many Americans believing that online news websites regularly report fake news stories. With such a high volume of online news websites publishing false information, it can be difficult to assess the credibility of a story. This has damaging effects on society: the public struggles to stay informed, creating a great deal of confusion about even basic facts and contributing to incivility.

  3. Frequency of online news sources reporting fake news U.S. 2018

    • statista.com
    Updated Feb 13, 2024
    Cite
    Statista (2024). Frequency of online news sources reporting fake news U.S. 2018 [Dataset]. https://www.statista.com/statistics/649234/fake-news-exposure-usa/
    Dataset updated
    Feb 13, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2, 2018 - Mar 5, 2018
    Area covered
    United States
    Description

    As of March 2018, around 52 percent of Americans felt that online news websites regularly report fake news stories in the United States. Another 34 percent of respondents believed that online news websites occasionally report fake news stories. Just nine percent of adults said that they did not believe fake news stories were being reported online.

    Fake news

    Popularized by Donald Trump, the term ‘fake news’ is used to describe news stories or even entire networks believed to be spreading false information. Increasingly used by members of government and citizens on both sides of the political spectrum, the term is now a staple in debates regarding freedom of the press, corruption, and media bias. People of all ages now believe that over 60 percent of the news they see on social media is fake, and they express similar concern over the accuracy of traditional news sources. While a skeptical perspective on news and reporting may be positive in terms of holding guilty outlets accountable and ensuring responsible reporting, the fake news phenomenon has extended much farther than pure skepticism. As of 2018, around 35 percent of Republicans and 18 percent of Independents perceived the media to be an enemy of the American people.

  4. Data from: Processing political misinformation: comprehending the Trump...

    • data.bris.ac.uk
    Updated Apr 22, 2017
    + more versions
    Cite
    (2017). Data from: Processing political misinformation: comprehending the Trump phenomenon - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/8001384ef9ab38dd90710ba227c8f7e3
    Dataset updated
    Apr 22, 2017
    Description

    This study investigated the cognitive processing of true and false political information. Specifically, it examined the impact of source credibility on the assessment of veracity when information comes from a polarizing source (Experiment 1), and the effectiveness of explanations when they come from one's own political party or an opposition party (Experiment 2). These experiments were conducted prior to the 2016 Presidential election. Participants rated their belief in factual and incorrect statements that President Trump made on the campaign trail; facts were subsequently affirmed and misinformation retracted. Participants then re-rated their belief immediately or after a delay. Experiment 1 found that (i) if information was attributed to Trump, Republican supporters of Trump believed it more than if it was presented without attribution, whereas the opposite was true for Democrats, and (ii) although Trump supporters reduced their belief in misinformation items following a correction, they did not change their voting preferences. Experiment 2 revealed that the explanation's source had relatively little impact, and belief updating was more influenced by the perceived credibility of the individual who initially put forward the information. These findings suggest that people use political figures as a heuristic to guide evaluation of what is true or false, yet do not necessarily insist on veracity as a prerequisite for supporting political candidates.

  5. Children reading fake news online United Kingdom (UK) 2024

    • statista.com
    Updated Sep 18, 2024
    Cite
    Statista (2024). Children reading fake news online United Kingdom (UK) 2024 [Dataset]. https://www.statista.com/statistics/1268671/children-reading-fake-news-online-united-kingdom-uk/
    Dataset updated
    Sep 18, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2023 - Mar 2024
    Area covered
    United Kingdom
    Description

    A 2024 study on news consumption among children in the United Kingdom found that 37 percent of respondents aged 12 to 15 had come across deliberately untrue or misleading news online or on social media in the year before the survey, while 35 percent said they had not seen any false news.

  6. Instagram recommended misinformation 2020, by content

    • statista.com
    Updated Mar 7, 2022
    Cite
    Statista (2022). Instagram recommended misinformation 2020, by content [Dataset]. https://www.statista.com/statistics/1293258/instagram-recommended-misinformation-by-content/
    Dataset updated
    Mar 7, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Sep 14, 2020 - Nov 16, 2020
    Area covered
    Worldwide
    Description

    From September to November 2020, 57.7 percent of the misinformation recommended by Instagram concerned the coronavirus. A further 21.2 percent of misinformation posts concerned vaccines, and 12.5 percent concerned elections. Overall, over 37 percent of the misinformation came from Instagram's suggested-posts feature.

  7. Forms of misinformation or disinformation business or organization has been...

    • www150.statcan.gc.ca
    • open.canada.ca
    • +1more
    Updated Aug 28, 2023
    Cite
    Government of Canada, Statistics Canada (2023). Forms of misinformation or disinformation business or organization has been a victim of over the last 12 months, third quarter of 2023 [Dataset]. http://doi.org/10.25318/3310070901-eng
    Dataset updated
    Aug 28, 2023
    Dataset provided by
    Government of Canada (http://www.gg.ca/)
    Statistics Canada (https://statcan.gc.ca/en)
    Area covered
    Canada
    Description

    Forms of misinformation or disinformation business or organization has been a victim of over the last 12 months, by North American Industry Classification System (NAICS), business employment size, type of business, business activity and majority ownership, third quarter of 2023.

  8. CT-FAN-21 corpus: A dataset for Fake News Detection

    • zenodo.org
    Updated Oct 23, 2022
    + more versions
    Cite
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl (2022). CT-FAN-21 corpus: A dataset for Fake News Detection [Dataset]. http://doi.org/10.5281/zenodo.4714517
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl
    Description

    Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes. Due to these restrictions, the collection is not open data. Please download the Agreement at Data Sharing Agreement and send the signed form to fakenewstask@gmail.com.

    Citation

    Please cite our work as

    @article{shahi2021overview,
     title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
     author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
     journal={Working Notes of CLEF},
     year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English.

    Subtask 3A: Multi-class fake news detection of news articles (English). Subtask 3A frames fake news detection as a four-class classification problem. The training data will be released in batches of roughly 900 articles with their respective labels. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:

    • False - The main claim made in an article is untrue.

    • Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    • True - This rating indicates that the primary elements of the main claim are demonstrably true.

    • Other - An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes articles in dispute and unproven articles.

    Subtask 3B: Topical Domain Classification of News Articles (English). Fact-checkers require background expertise to identify the truthfulness of an article, and categorisation helps automate the sampling process from a stream of data. Given the text of a news article, determine its topical domain (English). This is a classification problem: the task is to categorise fake news articles into six topical categories such as health, election, crime, climate, and education. This task will be offered for a subset of the data of Subtask 3A.

    Input Data

    The data will be provided in the format of ID, title, text, rating, and domain; the description of the columns is as follows:

    Task 3a

    • ID- Unique identifier of the news article
    • Title- Title of the news article
    • text- Text mentioned inside the news article
    • our rating - class of the news article as false, partially false, true, other

    Task 3b

    • public_id- Unique identifier of the news article
    • Title- Title of the news article
    • text- Text mentioned inside the news article
    • domain - domain of the given news article(applicable only for task B)
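The column layouts above can be read with a few lines of Python. This is a minimal sketch under the assumption that the batch files are plain CSV; the file path and the helper name are placeholders, not something prescribed by the task.

```python
import csv
from collections import Counter

def load_task3a(path):
    """Load a Task 3a batch file (columns: ID, Title, text, our rating)
    and return the rows plus the distribution of class labels."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # Normalise the label column so "False " and "false" count together.
    labels = Counter(row["our rating"].strip().lower() for row in rows)
    return rows, labels
```

The same reader works for Task 3b files, substituting the public_id, Title, text, and domain columns.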

    Output data format

    Task 3a

    • public_id- Unique identifier of the news article
    • predicted_rating- predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true

    Task 3b

    • public_id- Unique identifier of the news article
    • predicted_domain- predicted domain

    Sample file

    public_id, predicted_domain
    1, health
    2, crime
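Submission files in the format of the samples above can be produced with a short helper; a sketch (the function name and output path are placeholders, not part of the task specification):

```python
import csv

def write_submission(predictions, path, value_column):
    """Write a submission file in the format shown above:
    a header row, then one public_id/prediction pair per line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["public_id", value_column])
        for public_id, label in predictions:
            writer.writerow([public_id, label])

# Task 3a uses predicted_rating; Task 3b uses predicted_domain, e.g.:
# write_submission([(1, "false"), (2, "true")], "task3a_run1.csv", "predicted_rating")
```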

    Additional data for Training

    To train your model, participants can use additional data in a similar format; some datasets are available on the web. We do not provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some of the possible sources:

    IMPORTANT!

    1. The fake news articles used for Task 3b are a subset of those in Task 3a.
    2. We have used data from 2010 to 2021, and the fake news content spans several topics, such as elections and COVID-19.

    Evaluation Metrics

    This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs in total (not per day), and only one person from a team is allowed to submit runs.
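F1-macro gives each of the four classes equal weight regardless of how many articles it contains. In practice one would typically use scikit-learn's `f1_score(..., average="macro")`, but the metric is simple enough to sketch in plain Python:

```python
def macro_f1(gold, pred):
    """Macro-averaged F1: compute per-class F1 from true positives,
    false positives, and false negatives, then average the per-class
    scores with equal weight, so rare classes count as much as
    frequent ones."""
    classes = sorted(set(gold) | set(pred))
    f1s = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```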

    Submission Link: https://competitions.codalab.org/competitions/31238

    Related Work

    • Shahi, G. K. (2020). AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. https://arxiv.org/pdf/2010.00502.pdf
    • Shahi, G. K., & Nandini, D. (2020). FakeCovid – a multilingual cross-domain fact check news dataset for covid-19. In Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
    • Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on Twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
  9. 2021 HEALTH MISINFORMATION DATASET

    • catalog.data.gov
    • s.cnmilf.com
    Updated Mar 14, 2025
    + more versions
    Cite
    National Institute of Standards and Technology (2025). 2021 HEALTH MISINFORMATION DATASET [Dataset]. https://catalog.data.gov/dataset/2021-health-misinformation-dataset
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The Health Misinformation track aims to (1) provide a venue for research on retrieval methods that promote better decision making with search engines, and (2) develop new online and offline evaluation methods to predict the decision making quality induced by search results. Consumer health information is used as the domain of interest in the track.

  10. Fake News Statistics By Social Media, Region And Facts (2025)

    • sci-tech-today.com
    Updated Apr 10, 2025
    Cite
    Sci-Tech Today (2025). Fake News Statistics By Social Media, Region And Facts (2025) [Dataset]. https://www.sci-tech-today.com/stats/fake-news-statistics/
    Dataset updated
    Apr 10, 2025
    Dataset authored and provided by
    Sci-Tech Today
    License

    https://www.sci-tech-today.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    Fake News Statistics: Fake news refers to information that is untrue and circulated deliberately with the intention of deceiving the reader. The dissemination of fake news has increased tremendously over the past few years with the growth of social media and other online platforms.

    As of 2024, it has become a serious concern in various countries, affecting citizens' trust, politics, and social conduct. Both authorities and technology companies are making concerted efforts to contain the spread of false information. The fake news statistics and facts below show how prevalent this modern issue is today.

  11. Data from: Neutralizing misinformation through inoculation: exposing...

    • data.bris.ac.uk
    Updated May 27, 2017
    + more versions
    Cite
    (2017). Data from: Neutralizing misinformation through inoculation: exposing misleading argumentation techniques reduces their influence - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/cef22931299aa6b4d39ba43ea6e21e5a
    Dataset updated
    May 27, 2017
    Description

    Data for Experiments 1 & 2 for Cook, Lewandowsky & Ecker (2017). Neutralizing Misinformation Through Inoculation: Exposing Misleading Argumentation Techniques Reduces Their Influence. PLOS ONE.

  12. Global Disinformation Detection Tools Market Growth Opportunities 2025-2032

    • statsndata.org
    excel, pdf
    Updated May 2025
    Cite
    Stats N Data (2025). Global Disinformation Detection Tools Market Growth Opportunities 2025-2032 [Dataset]. https://www.statsndata.org/report/disinformation-detection-tools-market-321151
    Available download formats: pdf, excel
    Dataset updated
    May 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Disinformation Detection Tools market is rapidly evolving, driven by the increasing prevalence of misinformation across various platforms. As organizations, governments, and individuals grapple with the challenges posed by fake news, deepfakes, and other forms of deceptive content, the demand for effective disinformation detection tools continues to grow.

  13. FakeNewsNet

    • dataverse.harvard.edu
    • kaggle.com
    Updated Jan 16, 2020
    Cite
    Kai Shu (2020). FakeNewsNet [Dataset]. http://doi.org/10.7910/DVN/UEMMHS
    Available in the Croissant machine-learning dataset format (see mlcommons.org/croissant)
    Dataset updated
    Jan 16, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Kai Shu
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.7910/DVN/UEMMHS

    Description

    FakeNewsNet is a multi-dimensional data repository that currently contains two datasets with news content, social context, and spatiotemporal information. The dataset is constructed using an end-to-end system, FakeNewsTracker. The constructed FakeNewsNet repository has the potential to boost the study of various open research problems related to fake news. Because of Twitter's data sharing policy, we only share the news articles and tweet IDs as part of this dataset, and we provide code in the accompanying repo to download complete tweet details, social engagements, and social networks. We describe and compare FakeNewsNet with other existing datasets in FakeNewsNet: A Data Repository with News Content, Social Context and Spatialtemporal Information for Studying Fake News on Social Media (https://arxiv.org/abs/1809.01286). A more readable version of the dataset is available at https://github.com/KaiDMML/FakeNewsNet

  14. Ways that consumers identify online misinformation India 2023

    • statista.com
    Updated Jun 26, 2024
    Cite
    Statista (2024). Ways that consumers identify online misinformation India 2023 [Dataset]. https://www.statista.com/statistics/1406290/india-fake-news-indicators/
    Dataset updated
    Jun 26, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2023
    Area covered
    India
    Description

    In a digital news consumption survey conducted in India in March 2023, 43 percent of respondents stated that observing how news spreads and its absence from other digital platforms was a common method they used to spot online misinformation. In comparison, 30 percent of the surveyed consumers selected poorly designed graphics or one-sided news as common indicators of online misinformation.

  15. Supporting data for "Training Critical Thinking in Fake News Discernment"

    • datahub.hku.hk
    docx
    Updated Nov 1, 2023
    Cite
    Yiwen Zhong; Xiaoqing Hu (2023). Supporting data for "Training Critical Thinking in Fake News Discernment" [Dataset]. http://doi.org/10.25442/hku.21365841.v1
    Available download formats: docx
    Dataset updated
    Nov 1, 2023
    Dataset provided by
    HKU Data Repository
    Authors
    Yiwen Zhong; Xiaoqing Hu
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The datasets are for my master's thesis. We conducted two experiments to examine the effect of our self-designed reflective questions, which induce critical thinking, on participants' fake news discernment ability. The dataset contains participants' demographic information, critical thinking ability, reflective question scoring, and fake news discernment and sharing ability.

    Materials, scripts, and data are available at: https://osf.io/na9qt/ and https://osf.io/96p7g/

  16. Supporting Data for "The Role of Memory in Correcting Misinformation"

    • datahub.hku.hk
    Updated May 22, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shu Yi Sean Guo (2025). Supporting Data for "The Role of Memory in Correcting Misinformation" [Dataset]. http://doi.org/10.25442/hku.28936775.v1
    Dataset updated
    May 22, 2025
    Dataset provided by
    HKU Data Repository
    Authors
    Shu Yi Sean Guo
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset is supporting data for the thesis "The Role of Memory in Correcting Misinformation". Chapter 2 examined how initial beliefs in AI-generated visual misinformation are influenced by properties of images, and how these properties relate to correction effectiveness; raw survey data and analysis scripts are provided. Chapter 3 examined how re-exposure to AI-generated images changed correction effectiveness for AI-generated visual misinformation; raw survey data and analysis scripts are provided. Chapter 4 examined neural activity during encoding of causal misinformation and corrections and investigated how delays affected retrieval accuracy; behavioral and ERP data are provided. Chapter 5 examined neural activity during retrieval of causal misinformation and investigated how an alternative explanation to misinformation improved retrieval accuracy; behavioral and ERP data are provided. Names of participants have been anonymized. Data files can be opened with Excel, and analysis scripts can be opened with R.

  17. A dataset of Covid-related misinformation videos and their spread on social...

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Feb 24, 2021
    Cite
    Knuutila, Aleksi (2021). A dataset of Covid-related misinformation videos and their spread on social media [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4557827
    Dataset updated
    Feb 24, 2021
    Dataset authored and provided by
    Knuutila, Aleksi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains metadata about all Covid-related YouTube videos which circulated on public social media but which YouTube eventually removed because they contained false information. It describes 8,122 videos that were shared between November 2019 and June 2020. The dataset contains unique identifiers for the videos and the social media accounts that shared them, statistics on social media engagement, and metadata such as video titles and view counts where they were recoverable. We publish the data alongside the code used to produce it on GitHub. The dataset has reuse potential for research studying narratives related to the coronavirus, the impact of social media on knowledge about health, and the politics of social media platforms.

  18. Dataset for the paper: "Monant Medical Misinformation Dataset: Mapping...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 22, 2022
    + more versions
    Cite
    Elena Stefancova (2022). Dataset for the paper: "Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5996863
    Dataset updated
    Apr 22, 2022
    Dataset provided by
    Jakub Simko
    Maria Bielikova
    Branislav Pecher
    Robert Moro
    Matus Tomlein
    Elena Stefancova
    Ivan Srba
    Description

    Overview

    This dataset of medical misinformation was collected and is published by Kempelen Institute of Intelligent Technologies (KInIT). It consists of approx. 317k news articles and blog posts on medical topics published between January 1, 1998 and February 1, 2022 from a total of 207 reliable and unreliable sources. The dataset contains full-texts of the articles, their original source URL and other extracted metadata. If a source has a credibility score available (e.g., from Media Bias/Fact Check), it is also included in the form of annotation. Besides the articles, the dataset contains around 3.5k fact-checks and extracted verified medical claims with their unified veracity ratings published by fact-checking organisations such as Snopes or FullFact. Lastly and most importantly, the dataset contains 573 manually and more than 51k automatically labelled mappings between previously verified claims and the articles; mappings consist of two values: claim presence (i.e., whether a claim is contained in the given article) and article stance (i.e., whether the given article supports or rejects the claim or provides both sides of the argument).

    The dataset is primarily intended to be used as a training and evaluation set for machine learning methods for claim presence detection and article stance classification, but it enables a range of other misinformation related tasks, such as misinformation characterisation or analyses of misinformation spreading.

    Its novelty and our main contributions lie in (1) the focus on medical news articles and blog posts as opposed to social media posts or political discussions; (2) providing multiple modalities (besides full-texts of the articles, there are also images and videos), thus enabling research of multimodal approaches; (3) the mapping of the articles to the fact-checked claims (with manual as well as predicted labels); (4) providing source credibility labels for 95% of all articles and other potential sources of weak labels that can be mined from the articles' content and metadata.

    The dataset is associated with the research paper "Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims" accepted and presented at ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22).

    The accompanying Github repository provides a small static sample of the dataset and the dataset's descriptive analysis in a form of Jupyter notebooks.

    Options to access the dataset

    There are two ways to access the dataset:

    1. Static dump of the dataset available in the CSV format
    2. Continuously updated dataset available via REST API

    To obtain access to the dataset (either the full static dump or the REST API), please request access by following the instructions provided below.

    References

    If you use this dataset in any publication, project, tool, or in any other form, please cite the following papers:

    @inproceedings{SrbaMonantPlatform,
     author = {Srba, Ivan and Moro, Robert and Simko, Jakub and Sevcech, Jakub and Chuda, Daniela and Navrat, Pavol and Bielikova, Maria},
     booktitle = {Proceedings of Workshop on Reducing Online Misinformation Exposure (ROME 2019)},
     pages = {1--7},
     title = {Monant: Universal and Extensible Platform for Monitoring, Detection and Mitigation of Antisocial Behavior},
     year = {2019}
    }

    @inproceedings{SrbaMonantMedicalDataset,
     author = {Srba, Ivan and Pecher, Branislav and Tomlein, Matus and Moro, Robert and Stefancova, Elena and Simko, Jakub and Bielikova, Maria},
     booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22)},
     numpages = {11},
     title = {Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims},
     year = {2022},
     doi = {10.1145/3477495.3531726},
     publisher = {Association for Computing Machinery},
     address = {New York, NY, USA},
     url = {https://doi.org/10.1145/3477495.3531726},
    }

    Dataset creation process

    To create this dataset (and to continuously obtain new data), we used our research platform Monant. The Monant platform provides so-called data providers to extract news articles/blogs from news/blog sites, as well as fact-checking articles from fact-checking sites. General parsers (for RSS feeds, WordPress sites, the Google Fact Check Tool, etc.) as well as custom crawlers and parsers were implemented (e.g., for the fact-checking site Snopes.com). All data is stored in a unified format in a central data storage.
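    As an illustration of the kind of work a general data provider performs, a minimal RSS item extractor can be sketched as follows. This is a simplified stand-in for illustration only, not the Monant platform's actual implementation.

```python
import xml.etree.ElementTree as ET

def parse_rss_items(feed_xml):
    """Extract title/link pairs from an RSS feed string."""
    root = ET.fromstring(feed_xml)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]

# A tiny feed standing in for a real news site's RSS output.
feed = (
    "<rss><channel>"
    "<item><title>Example post</title><link>https://example.com/post</link></item>"
    "</channel></rss>"
)
items = parse_rss_items(feed)
```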

    Ethical considerations

    The dataset was collected and is published for research purposes only. We collected only the publicly available content of news/blog articles. The dataset contains the identities of the articles' authors if they were stated in the original source; we kept this information, since the presence of an author's name can be a strong credibility indicator. However, we anonymised the identities of the authors of the discussion posts included in the dataset.

    The main identified ethical issue related to the presented dataset lies in the risk of mislabelling an article as supporting a false fact-checked claim and, to a lesser extent, of mislabelling an article as not containing a false claim or not supporting it when it actually does. To minimise these risks, we developed a labelling methodology and require the agreement of at least two independent annotators to assign a claim presence or article stance label to an article. It is also worth noting that we do not label an article as a whole as false or true. Nevertheless, we provide partial article-claim pair veracities based on the combination of claim presence and article stance labels.

    As to the veracity labels of the fact-checked claims and the credibility (reliability) labels of the articles' sources, we take these from the fact-checking sites and external listings such as Media Bias/Fact Check as they are and refer to their methodologies for more details on how they were established.

    Lastly, the dataset also contains automatically predicted labels of claim presence and article stance, produced by the baselines described in the next section. These methods have their limitations and work with the accuracy reported in the paper; this should be taken into account when interpreting them.

    Reporting mistakes in the dataset

    The way to report considerable mistakes in the raw collected data or in the manual annotations is to create a new issue in the accompanying GitHub repository. Alternatively, general enquiries or requests can be sent to info [at] kinit.sk.

    Dataset structure

    Raw data

    First, the dataset contains so-called raw data (i.e., data extracted by the Web monitoring module of the Monant platform and stored in exactly the same form as it appears on the original websites). Raw data consist of articles from news sites and blogs (e.g., naturalnews.com), discussions attached to such articles, and fact-checking articles from fact-checking portals (e.g., snopes.com). In addition, the dataset contains feedback (numbers of likes, shares, and comments) provided by users on the social network Facebook, which is regularly extracted for all news/blog articles.

    Raw data are contained in these CSV files (and corresponding REST API endpoints):

    sources.csv

    articles.csv

    article_media.csv

    article_authors.csv

    discussion_posts.csv

    discussion_post_authors.csv

    fact_checking_articles.csv

    fact_checking_article_media.csv

    claims.csv

    feedback_facebook.csv

    Note: Personal information about discussion posts' authors (name, website, gravatar) is anonymised.
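    Assuming the foreign-key layout suggested by the file names (an id column in sources.csv and a source_id column in articles.csv; the actual column names may differ), the raw CSV files can be joined along these lines:

```python
import csv
import io

def join_articles_with_sources(articles_csv, sources_csv):
    """Attach each article's source record via an assumed source_id key."""
    sources = {row["id"]: row for row in csv.DictReader(sources_csv)}
    return [
        {**article, "source": sources.get(article["source_id"])}
        for article in csv.DictReader(articles_csv)
    ]

# Tiny inline samples standing in for sources.csv and articles.csv.
sources = io.StringIO("id,url\n1,https://naturalnews.com\n")
articles = io.StringIO("id,title,source_id\n10,Example article,1\n")
joined = join_articles_with_sources(articles, sources)
```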

    Annotations

    Second, the dataset contains so-called annotations. Entity annotations describe individual raw-data entities (e.g., an article or a source). Relation annotations describe a relation between two such entities.

    Each annotation is described by the following attributes:

    category of annotation (annotation_category). Possible values: label (the annotation corresponds to ground truth determined by human experts) and prediction (the annotation was created by an AI method).

    type of annotation (annotation_type_id). Example values: Source reliability (binary), Claim presence. The list of possible values can be obtained from the enumeration in annotation_types.csv.

    method which created the annotation (method_id). Example values: Expert-based source reliability evaluation, Fact-checking article to claim transformation method. The list of possible values can be obtained from the enumeration in methods.csv.

    its value (value). The value is stored in JSON format and its structure differs according to the particular annotation type.

    At the same time, annotations are associated with a particular object identified by:

    entity type (the parameter entity_type in the case of entity annotations, or source_entity_type and target_entity_type in the case of relation annotations). Possible values: sources, articles, fact-checking-articles.

    entity id (the parameter entity_id in the case of entity annotations, or source_entity_id and target_entity_id in the case of relation annotations).
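    A sketch of filtering entity annotations by these attributes and decoding the JSON value field might look as follows. The example rows and the {"value": ...} structure are illustrative assumptions; the actual value structure differs per annotation type.

```python
import json

def parse_annotations(rows, category=None, entity_type=None):
    """Filter annotation rows and decode their JSON value field."""
    selected = []
    for row in rows:
        if category and row["annotation_category"] != category:
            continue
        if entity_type and row.get("entity_type") != entity_type:
            continue
        selected.append({**row, "value": json.loads(row["value"])})
    return selected

# Illustrative rows of the kind entity_annotations.csv might contain.
rows = [
    {"annotation_category": "label", "entity_type": "sources",
     "entity_id": "1", "value": '{"value": "reliable"}'},
    {"annotation_category": "prediction", "entity_type": "articles",
     "entity_id": "10", "value": '{"value": 0.87}'},
]
labels = parse_annotations(rows, category="label", entity_type="sources")
```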

    The dataset provides specifically these entity annotations:

    Source reliability (binary). Determines the validity of a source (website) on a binary scale with two options: reliable source and unreliable source.

    Article veracity. Aggregated veracity information derived from article-claim pairs.

    The dataset provides specifically these relation annotations:

    Fact-checking article to claim mapping. Determines the mapping between a fact-checking article and a claim.

    Claim presence. Determines the presence of a claim in an article.

    Claim stance. Determines the stance of an article towards a claim.
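    One plausible way to combine these relation annotations with a claim's veracity into the partial article-claim pair veracities mentioned in the ethical considerations can be sketched as follows. The decision rule and the label names are illustrative assumptions, not the paper's exact methodology.

```python
def article_claim_veracity(claim_present, stance, claim_veracity):
    """Derive a veracity for an article-claim pair (illustrative rule)."""
    if not claim_present or stance == "neutral":
        return None  # nothing can be concluded about this pair
    if stance == "supporting":
        return claim_veracity  # the article endorses the claim as-is
    if stance == "contradicting":
        # Contradicting a false claim suggests the article is truthful here.
        return {"true": "false", "false": "true"}.get(claim_veracity)
    return None
```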

    Annotations are contained in these CSV files (and corresponding REST API endpoints):

    entity_annotations.csv

    relation_annotations.csv

    Note: The identification of human annotators (the email provided in the annotation app) is anonymised.

  19.

    BuzzFeed-Webis Fake News Corpus 16

    • webis.de
    • paperswithcode.com
    • +2more
    Updated 2018
    Cite
    Martin Potthast; Johannes Kiesel; Kevin Reinartz; Janek Bevendorff; Benno Stein (2018). BuzzFeed-Webis Fake News Corpus 16 [Dataset]. http://doi.org/10.5281/zenodo.1181813
    Explore at:
    Available download formats
    Dataset updated
    2018
    Dataset provided by
    University of Kassel, hessian.AI, and ScaDS.AI
    The Web Technology & Information Systems Network
    GESIS - Leibniz Institute for the Social Sciences
    Bauhaus-Universität Weimar and Leipzig University
    Bauhaus-Universität Weimar
    Authors
    Martin Potthast; Johannes Kiesel; Kevin Reinartz; Janek Bevendorff; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The BuzzFeed-Webis Fake News Corpus 16 comprises the output of nine publishers during a week close to the 2016 US elections. Among the selected publishers are six prolific hyperpartisan ones (three left-wing and three right-wing) and three mainstream publishers (see Table 1). All publishers earned Facebook's blue checkmark, indicating authenticity and an elevated status within the network. For seven weekdays (September 19 to 23 and September 26 and 27), every post and linked news article of the nine publishers was fact-checked by professional journalists at BuzzFeed. In total, 1,627 articles were checked: 826 mainstream, 256 left-wing, and 545 right-wing. The imbalance between categories results from differing publication frequencies.

  20.

    Replication Data for "Real Solutions for Fake News? Measuring the...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Mar 21, 2019
    Cite
    Harvard Dataverse (2019). Replication Data for "Real Solutions for Fake News? Measuring the Effectiveness of General Warnings and Fact-Check Tags in Reducing Belief in False Stories on Social Media" [Dataset]. http://doi.org/10.7910/DVN/YDC4XD
    Explore at:
    Available download formats: application/x-stata-syntax (30045), tsv (22612847)
    Dataset updated
    Mar 21, 2019
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Replication data and code for "Real Solutions for Fake News? Measuring the Effectiveness of General Warnings and Fact-Check Tags in Reducing Belief in False Stories on Social Media" by Katherine Clayton, Spencer Blair, Jonathan A. Busam, Samuel Forstner, John Glance, Guy Green, Anna Kawata, Akhila Kovvuri, Jonathan Martin, Evan Morgan, Morgan Sandhu, Rachel Sang, Rachel Scholz-Bright, Austin T. Welch, Andrew G. Wolff, Amanda Zhou, and Brendan Nyhan.
