100+ datasets found
  1. Sharing of made-up news on social networks in the U.S. 2020

    • statista.com
    Cite
    Statista, Sharing of made-up news on social networks in the U.S. 2020 [Dataset]. https://www.statista.com/statistics/657111/fake-news-sharing-online/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 8, 2020
    Area covered
    United States
    Description

    A survey conducted in December 2020 assessing if news consumers in the United States had ever unknowingly shared fake news or information on social media found that 38.2 percent had done so. A similar share had not, whereas seven percent were unsure if they had accidentally disseminated misinformation on social networks.

    Fake news in the U.S.

    Fake news, or news that contains misinformation, has become a prevalent issue within the American media landscape. Fake news can be circulated online as news stories with deliberately misleading headlines, or clickbait, but the rise of misinformation cannot be solely attributed to online social media. Forms of fake news are also found in print media, with 47 percent of Americans witnessing fake news in newspapers and magazines as of January 2019.

    News consumers in the United States are aware of the spread of misinformation, with many Americans believing online news websites regularly report fake news stories. With such a high volume of online news websites publishing false information, it can be difficult to assess the credibility of a story. This can have damaging effects on society, as the public struggles to stay informed, creating a great deal of confusion about even basic facts and contributing to incivility.

  2. Children reading fake news online United Kingdom (UK) 2024

    • statista.com
    Updated Nov 27, 2025
    Cite
    Statista (2025). Children reading fake news online United Kingdom (UK) 2024 [Dataset]. https://www.statista.com/statistics/1268671/children-reading-fake-news-online-united-kingdom-uk/
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United Kingdom
    Description

    A 2024 study on news consumption among children in the United Kingdom found that ** percent of respondents aged 12 to 15 years old had come across deliberately untrue or misleading news online or on social media in the year before the survey was conducted. ** percent said they had not seen any false news.

  3. Social Media Misinformation Statistics 2025: How Social Platforms Amplify...

    • sqmagazine.co.uk
    Updated Oct 3, 2025
    Cite
    SQ Magazine (2025). Social Media Misinformation Statistics 2025: How Social Platforms Amplify False Content (with Data) [Dataset]. https://sqmagazine.co.uk/social-media-misinformation-statistics/
    Explore at:
    Dataset updated
    Oct 3, 2025
    Dataset authored and provided by
    SQ Magazine
    License

    https://sqmagazine.co.uk/privacy-policy/

    Time period covered
    Jan 1, 2024 - Dec 31, 2025
    Area covered
    Global
    Description

    In the spring of 2020, a simple tweet claimed that sipping hot water every 15 minutes could kill the coronavirus. No medical source backed it, yet the post quickly amassed over 150,000 shares. Fast forward to 2025, and we’ve learned that misinformation online is not a bug; it’s a system...

  4. Data from: Anatomy of an online misinformation network

    • data.niaid.nih.gov
    Updated Aug 3, 2021
    Cite
    Chengcheng Shao; Pik-Mai Hui; Lei Wang; Xinwen Jiang; Alessandro Flammini; Filippo Menczer; Giovanni Luca Ciampaglia (2021). Anatomy of an online misinformation network [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1154839
    Explore at:
    Dataset updated
    Aug 3, 2021
    Dataset provided by
    The MOE Key Laboratory of Intelligent Computing and Information Processing, Xiangtan University, China
    School of Informatics, Computing, and Engineering, Indiana University, Bloomington, USA
    Indiana University Network Science Institute, Bloomington, USA
    College of Computer, National University of Defense Technology, China
    Authors
    Chengcheng Shao; Pik-Mai Hui; Lei Wang; Xinwen Jiang; Alessandro Flammini; Filippo Menczer; Giovanni Luca Ciampaglia
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset is provided to facilitate reproducibility of results presented in the following paper:

    Chengcheng Shao, Pik-Mai Hui, Lei Wang, Xinwen Jiang, Alessandro Flammini, Filippo Menczer and Giovanni Luca Ciampaglia (2018): Anatomy of an online misinformation network. Preprint arXiv:1801.06122, arxiv.org/abs/1801.06122

    Please read carefully both the paper and the README file attached to understand what is contained in this dataset before proceeding. These data are provided for non-commercial purposes only. If you use this dataset for research, please be sure to cite the above preprint, or preferably the final published version that will be shown on the arXiv.

  5. AMMeBa: Annotated Misinformation, Media-Based

    • kaggle.com
    zip
    Updated Apr 24, 2024
    Cite
    Google AI (2024). AMMeBa: Annotated Misinformation, Media-Based [Dataset]. https://www.kaggle.com/datasets/googleai/in-the-wild-misinformation-media
    Explore at:
    zip (48436539 bytes)
    Dataset updated
    Apr 24, 2024
    Dataset authored and provided by
    Google AI (http://ai.google/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is discussed in far more detail in the corresponding paper, AMMeBa: A Large-Scale Survey and Dataset of Media-Based Misinformation In-The-Wild.

    Background

    The rise of convincing, photorealistic AI-generated images and video has heightened already intense concern over online misinformation and its associated harms. However, despite huge coverage in the press and interest from the general public, it's not clear whether AI is widely used in misinformation. In fact, there is little systematic data available whatsoever about the forms misinformation takes online, the use of images and video in misinformation contexts, and what types of manipulations are taking place.


    The AMMeBa (Annotated Misinformation, Media-Based) dataset seeks to provide a survey of online misinformation, allowing first-of-its-kind quantification of manipulations like deepfakes and photoshopped media as well as trends in how those populations are changing over time.

    Recognizing the enormous value and work of fact checkers, AMMeBa uses publicly available fact checks to identify misinformation claims, which were annotated by highly trained human annotators to provide a detailed characterization of each claim. Media-based misinformation, which uses images, video and audio to bolster a claim, is a particular focus, especially images.

    Annotations took place over two years. The resulting dataset comprises millions of individual hand-applied labels, applied to over a hundred thousand English-language fact checks published between 1995 and today. More than fifty thousand misinfo-associated images were identified and annotated.

    Findings

    • Online misinformation is popularly conceptualized as false claims and rumors rendered in text. Our data indicates that the majority of misinformation (recently, about 80%) involves media of some kind: images, video, or audio.
    • Images are historically the most common type of media associated with misinformation. However, in the past two years, video-based misinformation has become increasingly common and is now the most common type of media associated with misinformation.
    • Among images, screenshots are common, peaking at about 1/5th of misinformation-associated images. The majority of these are screenshots of social media posts; nearly 20% are screenshots of fake social media posts.
    • While image-based misinformation is commonly thought of as consisting of photoshop-like manipulations, or, more recently, AI-generated content, our data show that the most common type historically is context manipulation without any pixel manipulation, i.e., the original, unedited image is shown alongside a false claim about what that image shows.
      • The prevalence of technologically simple context manipulations underscores the fact that misinformation does not need to be sophisticated or elaborate to be effective.
    • While widespread concern around the use of deepfakes in misinformation began in 2018, our data show that AI-generated content was a negligible proportion of overall image-based misinfo until early 2023, when it exploded in popularity. By the time data annotation ended, it accounted for nearly 30% of all fact checked content manipulations.

    Dataset Notes

    Image URLs

    Image URLs were obtained in a best-effort manner. We provide them as a possible pointer to the correct image. However, URLs are absent for several reasons:

    1. Attrition: The image has been removed from that location; see "Data Attrition" in the paper. We are working to identify other versions of the images, if available, and will make them available in dataset updates.
    2. URL Dynamism: The images were obtained by following a fact check link to the original page or an archived version of it. Some pages, particularly archival services, dynamically generate image URLs on load or update the URLs periodically. This instability in the URL means collected URLs are soon useless for these images.

    In the majority of cases, though, the URL under misinfo_source in all provided CSVs will point to the page where the image occurred, and in general the images are still present (this is checked explicitly by raters when a fact check / source is passed to a subsequent stage, like Stage 1M → Stage 2M). If the entry is not "disqualified," then the image was present on the page at the time of subsequent annotation, and may still be fetchable by matching against the provided hashes.

    Image Hashes

    To allow users to fetch the images themselves, we provide three hashes of the image data. These hashes use the open-source "imagehash" Image Hashing Library from Github ([README, with explanat...
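
    To make the hash matching concrete, here is a minimal sketch using the open-source "imagehash" library named above (pip install ImageHash). The CSV path, the hash column name ("phash") and the distance threshold are assumptions for illustration; the dataset README documents the actual hash types and field names. The misinfo_source column is the one described above.

    import imagehash
    import pandas as pd
    from PIL import Image

    rows = pd.read_csv("ammeba_images.csv")  # hypothetical local export of the provided CSVs
    candidate = imagehash.phash(Image.open("downloaded_image.jpg"))

    # ImageHash objects support subtraction, which returns the Hamming distance;
    # a small distance (e.g. <= 4) usually means the same underlying image.
    rows["phash_distance"] = rows["phash"].map(lambda s: candidate - imagehash.hex_to_hash(s))
    matches = rows[rows["phash_distance"] <= 4]
    print(matches[["misinfo_source", "phash_distance"]])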

  6. UK: digitally-altered and AI generated content and online misinformation...

    • statista.com
    Cite
    Statista, UK: digitally-altered and AI generated content and online misinformation 2024 [Dataset]. https://www.statista.com/statistics/1489655/uk-digitally-altered-ai-generated-content-online-misinformation/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 1, 2024 - May 2, 2024
    Area covered
    United Kingdom
    Description

    According to a survey conducted in the United Kingdom in May 2024, 75 percent of adults thought that digitally-altered content contributed to the spread of online misinformation. Additionally, 67 percent felt that AI-generated content contributed to the spread of misinformation on online platforms.

  7. Ways that consumers identify online misinformation India 2023

    • statista.com
    Updated May 15, 2023
    Cite
    Statista (2023). Ways that consumers identify online misinformation India 2023 [Dataset]. https://www.statista.com/statistics/1406290/india-fake-news-indicators/
    Explore at:
    Dataset updated
    May 15, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2023
    Area covered
    India
    Description

    In a digital news consumption survey conducted in India in March 2023, ** percent of respondents stated that observing how news spreads and its absence from other digital platforms was a common method they used to spot online misinformation. In comparison, ** percent of the surveyed consumers selected poorly designed graphics or one-sided news as common indicators of online misinformation.

  8. FakeNewsNet

    • kaggle.com
    • dataverse.harvard.edu
    zip
    Updated Nov 2, 2018
    Cite
    Deepak Mahudeswaran (2018). FakeNewsNet [Dataset]. https://www.kaggle.com/mdepak/fakenewsnet
    Explore at:
    zip (17409594 bytes)
    Dataset updated
    Nov 2, 2018
    Authors
    Deepak Mahudeswaran
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    FakeNewsNet

    This is a repository for an ongoing data collection project for fake news research at ASU. We describe and compare FakeNewsNet with other existing datasets in Fake News Detection on Social Media: A Data Mining Perspective. We also perform a detailed analysis of the FakeNewsNet dataset and build a fake news detection model on it in Exploiting Tri-Relationship for Fake News Detection.

    A JSON version of this dataset is available on GitHub here. The new version of this dataset, described in FakeNewsNet, will be published soon; you can also email the authors for more information.

    News Content

    It includes all the fake news articles, with the news content attributes as follows:

    1. source: the author or publisher of the news article.
    2. headline: short text that aims to catch the attention of readers and relates to the major topic of the news story.
    3. body_text: the details of the news story; usually there is a major claim that shapes the publisher's angle and is specifically highlighted and elaborated upon.
    4. image_video: visual content within the body of the news article, which provides visual cues to frame the story.

    Social Context

    It includes the social engagements of fake news articles from Twitter. We extract profiles, posts and social network information for all relevant users; a brief loading sketch follows the list below.

    1. user_profile: a set of profile fields that describe each user's basic information.
    2. user_content: the users' recent posts on Twitter.
    3. user_followers: the follower list of the relevant users.
    4. user_followees: the list of users that are followed by the relevant users.
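
    A minimal loading sketch for the structure described above, assuming records shaped like the JSON version of the dataset; the file name and exact nesting are illustrative assumptions, so consult the repository README for the authoritative schema.

    import json

    with open("fakenewsnet_sample.json") as fh:  # hypothetical local export
        articles = json.load(fh)

    for article in articles:
        # News content attributes listed above
        source = article.get("source")
        headline = article.get("headline") or ""
        body_text = article.get("body_text") or ""

        # Social context: engagements collected from Twitter
        for user in article.get("social_context", {}).get("user_profile", []):
            print(source, headline[:40], user.get("id"))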

    References

    If you use this dataset, please cite the following papers:

    @article{shu2017fake, title={Fake News Detection on Social Media: A Data Mining Perspective}, author={Shu, Kai and Sliva, Amy and Wang, Suhang and Tang, Jiliang and Liu, Huan}, journal={ACM SIGKDD Explorations Newsletter}, volume={19}, number={1}, pages={22--36}, year={2017}, publisher={ACM} }

    @article{shu2017exploiting, title={Exploiting Tri-Relationship for Fake News Detection}, author={Shu, Kai and Wang, Suhang and Liu, Huan}, journal={arXiv preprint arXiv:1712.07709}, year={2017} }

    @article{shu2018fakenewsnet, title={FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media}, author={Shu, Kai and Mahudeswaran, Deepak and Wang, Suhang and Lee, Dongwon and Liu, Huan}, journal={arXiv preprint arXiv:1809.01286}, year={2018} }

  9. Misinformation & Fake News text dataset 79k

    • kaggle.com
    zip
    Updated May 9, 2022
    Cite
    steven (2022). Misinformation & Fake News text dataset 79k [Dataset]. https://www.kaggle.com/datasets/stevenpeutz/misinformation-fake-news-text-dataset-79k
    Explore at:
    zip (88691612 bytes)
    Dataset updated
    May 9, 2022
    Authors
    steven
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Misinformation, fake news & propaganda data set

    A dataset containing 79k articles of misinformation, fake news and propaganda:

    • 34,975 'true' articles → MisinfoSuperset_TRUE.csv
    • 43,642 articles of misinfo, fake news or propaganda → MisinfoSuperset_FAKE.csv

    The 'true' articles come from a variety of sources, such as Reuters, the New York Times, the Washington Post and more.

    The 'fake' articles are sourced from:

    1. American right-wing extremist websites (such as Redflag Newsdesk, Breitbart, Truth Broadcast Network)
    2. A previously published public dataset described in the following article: Ahmed H., Traore I., Saad S. (2017) "Detection of Online Fake News Using N-Gram Analysis and Machine Learning Techniques." In: Traore I., Woungang I., Awad A. (eds) Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments. ISDDC 2017. Lecture Notes in Computer Science, vol 10618. Springer, Cham (pp. 127-138).
    3. Disinformation and propaganda cases collected by the EUvsDisinfo project, started in 2015, which identifies and fact-checks disinformation cases originating from pro-Kremlin media and spread across the EU.

    All information except the actual text has been removed from the articles, which are split into one set containing the fake news / misinformation and one containing all the true articles.

    // For those only interested in Russian propaganda (and not so much misinformation in general), I have added the Russian propaganda in a separate csv called 'EXTRA_RussianPropagandaSubset.csv'.

    --

    Note: While this might immediately seem like a classification task, I would suggest also considering clustering / topic modelling. Why clustering? Because clustering lets us build a model that matches a newly written article to a previously debunked lie or misinformation narrative. That way a new article can be debunked immediately (or at least linked to an actual fact-checked statement) without using "an algorithm said so" as the argument, and without the time delay of waiting for confirmation from a fact-checking organisation.
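
    A minimal sketch of that matching idea, assuming a column named "text" in MisinfoSuperset_FAKE.csv (adjust to the actual column name): embed the debunked articles with TF-IDF and retrieve the nearest previously seen narratives for a newly written article.

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import NearestNeighbors

    fake = pd.read_csv("MisinfoSuperset_FAKE.csv")
    vectorizer = TfidfVectorizer(max_features=50_000, stop_words="english")
    fake_vectors = vectorizer.fit_transform(fake["text"])

    # Cosine nearest-neighbour index over the debunked articles
    index = NearestNeighbors(n_neighbors=3, metric="cosine").fit(fake_vectors)

    new_article = ["Text of a newly written article goes here ..."]
    distances, neighbors = index.kneighbors(vectorizer.transform(new_article))
    print(fake.iloc[neighbors[0]]["text"])  # candidate matches to debunked narratives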

    An example disinformation project using this dataset can be found on https://stevenpeutz.com/disinformation/

    Enjoy! You have chosen an incredibly important topic for your project!

  10. News Detection (Fake or Real) Dataset

    • kaggle.com
    zip
    Updated Apr 17, 2024
    Cite
    Nitish Jolly (2024). News Detection (Fake or Real) Dataset [Dataset]. https://www.kaggle.com/datasets/nitishjolly/news-detection-fake-or-real-dataset
    Explore at:
    zip (9823999 bytes)
    Dataset updated
    Apr 17, 2024
    Authors
    Nitish Jolly
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The Fake News Detection Dataset is created to assist researchers, data scientists, and machine learning enthusiasts in tackling the challenge of distinguishing between genuine and false information in today's digital landscape inundated with social media and online channels. With thousands of news items labeled as either "Fake" or "Real," this dataset provides a robust foundation for training and testing machine learning models aimed at automatically detecting deceptive content.

    Each entry in the dataset contains the full text of a news article alongside its corresponding label, facilitating the development of supervised learning projects. The inclusion of various types of content within the news articles, ranging from factual reporting to potentially misleading information or falsehoods, offers a comprehensive resource for algorithmic training.

    The dataset's structure, with a clear binary classification of news articles as either "Fake" or "Real," enables the exploration of diverse machine learning approaches, from traditional methods to cutting-edge deep learning techniques.
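
    As a starting point, a minimal baseline for that binary split might look like the sketch below, assuming columns named "text" and "label" and an illustrative file name; check the actual schema on Kaggle before running.

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    df = pd.read_csv("fake_or_real_news.csv")  # hypothetical file name
    X_train, X_test, y_train, y_test = train_test_split(
        df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=0
    )

    model = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))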

    By offering an accessible and practical dataset, the Fake News Detection Dataset aims to stimulate innovation in the ongoing battle against online misinformation. It serves as a catalyst for research and development within the realms of text analysis, natural language processing, and machine learning communities. Whether it's refining feature engineering, experimenting with state-of-the-art transformer models, or creating educational tools to enhance understanding of fake news, this dataset serves as an invaluable starting point for a wide range of impactful projects.

  11. CT-FAN-21 corpus: A dataset for Fake News Detection

    • zenodo.org
    Updated Oct 23, 2022
    + more versions
    Cite
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl (2022). CT-FAN-21 corpus: A dataset for Fake News Detection [Dataset]. http://doi.org/10.5281/zenodo.4714517
    Explore at:
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl
    Description

    Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use it only for research purposes. Due to these restrictions, the collection is not open data. Please download the Agreement at Data Sharing Agreement and send the signed form to fakenewstask@gmail.com.

    Citation

    Please cite our work as

    @article{shahi2021overview,
     title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
     author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
     journal={Working Notes of CLEF},
     year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English.

    Subtask 3A: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem. The training data will be released in batches and comprises roughly 900 articles with their respective labels. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:

    • False - The main claim made in an article is untrue.

    • Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    • True - This rating indicates that the primary elements of the main claim are demonstrably true.

    • Other- An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes articles in dispute and unproven articles.

    Subtask 3B: Topical Domain Classification of News Articles (English). Fact-checkers require background expertise to identify the truthfulness of an article, and this categorisation will help to automate the sampling process from a stream of data. Given the text of a news article, determine its topical domain. This is a classification problem: the task is to categorise fake news articles into six topical categories (e.g., health, election, crime, climate, education). This task will be offered for a subset of the data of Subtask 3A.

    Input Data

    The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:

    Task 3a

    • ID- Unique identifier of the news article
    • Title- Title of the news article
    • text- Text mentioned inside the news article
    • our rating - class of the news article as false, partially false, true, other

    Task 3b

    • public_id- Unique identifier of the news article
    • Title- Title of the news article
    • text- Text mentioned inside the news article
    • domain - domain of the given news article (applicable only for task B)

    Output data format

    Task 3a

    • public_id- Unique identifier of the news article
    • predicted_rating- predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true

    Task 3b

    • public_id- Unique identifier of the news article
    • predicted_domain- predicted domain

    Sample file

    public_id, predicted_domain
    1, health
    2, crime

    Additional data for Training

    To train your model, participants can use additional data with a similar format; some datasets are available over the web. We don't provide the background truth for those datasets. For testing, we will not use any articles from other datasets. Some possible sources:

    IMPORTANT!

    1. The fake news articles used for task 3b are a subset of those used for task 3a.
    2. We have used the data from 2010 to 2021, and the content of fake news is mixed up with several topics like election, COVID-19 etc.

    Evaluation Metrics

    This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.
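
    For orientation, here is a minimal sketch of the ranking measure (macro-averaged F1) and of writing a run file in the sample layout shown above; the gold and predicted labels below are placeholders, not task data.

    import pandas as pd
    from sklearn.metrics import f1_score

    gold = ["false", "true", "partially false", "other"]
    pred = ["false", "true", "false", "other"]
    print("F1-macro:", f1_score(gold, pred, average="macro"))

    # Run file in the "public_id, predicted_rating" layout from the sample above
    pd.DataFrame({"public_id": [1, 2, 3, 4], "predicted_rating": pred}).to_csv(
        "subtask3a_run.csv", index=False
    )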

    Submission Link: https://competitions.codalab.org/competitions/31238

    Related Work

    • Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf
    • G. K. Shahi and D. Nandini, "FakeCovid – a multilingual cross-domain fact check news dataset for covid-19," in Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
    • Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104

  12. Fake News Detection Data

    • kaggle.com
    zip
    Updated Apr 27, 2024
    Cite
    Tasnim Niger (2024). Fake News Detection Data [Dataset]. https://www.kaggle.com/datasets/tasnimniger/fake-news-detection-data
    Explore at:
    zip (55829 bytes)
    Dataset updated
    Apr 27, 2024
    Authors
    Tasnim Niger
    Description

    The internet and social media have led to a major problem—fake news. Fake news is false information presented as real news, often with the goal of tricking or influencing people. It's difficult to identify fake news because it can look very similar to real news. The Fake News detection dataset deals with the problem indirectly by using tabular summary statistics about each news article to attempt to predict whether the article is real or fake. This dataset is in a tabular format and contains features such as word count, sentence length, unique words, average word length, and a label indicating whether the article is fake or real.
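
    For readers who want to apply a model trained on this dataset to new articles, a minimal sketch of deriving comparable summary statistics from raw text follows; the exact feature definitions used in the dataset may differ, so treat these as illustrative.

    def summary_features(text: str) -> dict:
        """Compute simple tabular statistics of the kind described above."""
        words = text.split()
        sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
        return {
            "word_count": len(words),
            "unique_words": len(set(w.lower() for w in words)),
            "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
            "avg_sentence_length": len(words) / max(len(sentences), 1),
        }

    print(summary_features("Fake news is false information presented as real news."))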

  13. Experience of being misled by misinformation online India 2022

    • statista.com
    Updated Jun 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2023). Experience of being misled by misinformation online India 2022 [Dataset]. https://www.statista.com/statistics/1388664/india-frequency-of-being-misled-by-fake-news-online/
    Explore at:
    Dataset updated
    Jun 12, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2022
    Area covered
    India
    Description

    In response to a survey conducted in ************, ** percent of social media users in India reported having been misled by fake news circulated online about once or twice, a share slightly higher than among active internet users. Meanwhile, ** percent of all internet users had experienced this a few times. Notably, more than half of the respondents claimed never to have been misled by fake news online.

  14. COVID-19 rumor dataset

    • figshare.com
    html
    Updated Jun 10, 2023
    Cite
    cheng (2023). COVID-19 rumor dataset [Dataset]. http://doi.org/10.6084/m9.figshare.14456385.v2
    Explore at:
    html
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    cheng
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A COVID-19 misinformation / fake news / rumor / disinformation dataset collected from online social media and news websites.

    Usage notes:

    • Misinformation detection, classification, tracking, prediction.
    • Misinformation sentiment analysis.
    • Rumor veracity classification, comment stance classification.
    • Rumor tracking, social network analysis.

    Data pre-processing and data analysis codes available at https://github.com/MickeysClubhouse/COVID-19-rumor-dataset. Please see full info in our GitHub link.

    Cite us: Cheng, Mingxi, et al. "A COVID-19 Rumor Dataset." Frontiers in Psychology 12 (2021): 1566.

    @article{cheng2021covid, title={A COVID-19 Rumor Dataset}, author={Cheng, Mingxi and Wang, Songli and Yan, Xiaofeng and Yang, Tianqi and Wang, Wenshuo and Huang, Zehao and Xiao, Xiongye and Nazarian, Shahin and Bogdan, Paul}, journal={Frontiers in Psychology}, volume={12}, pages={1566}, year={2021}, publisher={Frontiers} }

  15. Fake News data set

    • kaggle.com
    zip
    Updated Dec 17, 2021
    Cite
    Bjørn-Jostein (2021). Fake News data set [Dataset]. https://www.kaggle.com/datasets/bjoernjostein/fake-news-data-set
    Explore at:
    zip (56446259 bytes)
    Dataset updated
    Dec 17, 2021
    Authors
    Bjørn-Jostein
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Today we are producing more information than ever before, but not all of it is true; some of it is actually malicious and harmful, which makes it harder for us to trust any piece of information we come across. On top of that, bad actors can now use language-modelling tools like OpenAI's GPT-2 to generate fake news. Ever since its initial release, there have been concerns about how it could be misused to generate misleading news articles, automate the production of abusive or fake content for social media, and automate the creation of spam and phishing content.

    How do we figure out what is true and what is fake? Can we do something about it?

    Content

    The dataset consists of around 387,000 pieces of text, which have been sourced from various news articles on the web as well as texts generated by OpenAI's GPT-2 language model.

    The dataset is split into train, validation and test such that each of the sets has an equal split of the two classes.

    Acknowledgements

    This dataset was published on AIcrowd as part of the KIIT AI (mini)Blitz⚡ Challenge. AI Blitz⚡ is a series of educational challenges by AIcrowd that aims to make it easy for anyone to get started with the world of AI. This particular AI Blitz⚡ challenge was exclusive to the students and faculty of the Kalinga Institute of Industrial Technology.

  16. Replication Data for: Trends in the Diffusion of Misinformation on Social...

    • dataverse.harvard.edu
    Updated Jun 12, 2023
    Cite
    Hunt Allcott; Matthew Gentzkow; Chuan Yu (2023). Replication Data for: Trends in the Diffusion of Misinformation on Social Media [Dataset]. http://doi.org/10.7910/DVN/YAR9FU
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Hunt Allcott; Matthew Gentzkow; Chuan Yu
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains all replication files to perform the analysis in the manuscript and the online appendix.

  17. Replication Data for: Seeing Misinformation and Trust, Political Ideology...

    • borealisdata.ca
    • search.dataone.org
    Updated Apr 17, 2023
    Cite
    Trish Anderson (2023). Replication Data for: Seeing Misinformation and Trust, Political Ideology and Facebook Use [Dataset]. http://doi.org/10.5683/SP3/MHNHBV
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 17, 2023
    Dataset provided by
    Borealis
    Authors
    Trish Anderson
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Survey data collected in 2019 in Canada (n=1539). Seeing misinformation online, trust in federal government, political ideology and Facebook use.

  18. Gen AI Misinformation Detection Data (2024–2025)

    • kaggle.com
    zip
    Updated Sep 23, 2025
    Cite
    Atharva Soundankar (2025). Gen AI Misinformation Detection Data (2024–2025) [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/gen-ai-misinformation-detection-datase-20242025
    Explore at:
    zip (32023 bytes)
    Dataset updated
    Sep 23, 2025
    Authors
    Atharva Soundankar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset captures realistic simulations of news articles and social media posts circulating between 2024–2025, labeled for potential AI-generated misinformation.

    It includes 500 rows × 31 columns, combining:
    - Temporal features → date, time, month, day of week
    - Text-based metadata → platform, region, language, topic
    - Quantitative engagement metrics → likes, shares, comments, CTR, views
    - Content quality indicators → sentiment polarity, toxicity score, readability index
    - Fact-checking signals → credibility source score, manual check flag, claim verification status
    - Target variable → is_misinformation (0 = authentic, 1 = misinformation)

    This dataset is designed for machine learning, deep learning, NLP, data visualization, and predictive analysis research.
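
    A minimal classification sketch along those lines, assuming an illustrative CSV file name and, for simplicity, using only the numeric columns; is_misinformation is the target column named above.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("gen_ai_misinformation_2024_2025.csv")  # hypothetical file name
    y = df["is_misinformation"]
    X = df.drop(columns=["is_misinformation"]).select_dtypes("number")

    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
    clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
    print("ROC-AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))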

    🎯 Use Cases

    This dataset can be applied to multiple domains:
    - 🧠 Machine Learning / Deep Learning: Binary classification of misinformation
    - 📊 Data Visualization: Engagement trends, regional misinformation heatmaps
    - 🔍 NLP Research: Fake news detection, text classification, sentiment-based filtering
    - 🌐 PhD & Academic Research: AI misinformation studies, disinformation propagation models
    - 📈 Model Evaluation: Feature engineering, ROC-AUC, precision-recall tradeoff

  19. Disinformation for Hire

    • datarepository.eur.nl
    pdf
    Updated Dec 13, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jan Stoop; Alain Cohn (2024). Disinformation for Hire [Dataset]. http://doi.org/10.25397/eur.27868341.v3
    Explore at:
    pdf
    Dataset updated
    Dec 13, 2024
    Dataset provided by
    Erasmus University Rotterdam (EUR)
    Authors
    Jan Stoop; Alain Cohn
    License

    http://rightsstatements.org/vocab/InC/1.0/

    Description

    The replication material includes the do files and datasets to replicate the results in tables, figures and text of the main manuscript and the appendix. The code constructs the results from the field data and additional experiments we ran on Prolific and MTurk. The material contains 4 code files, all ending with ".do". The code was last run using Stata (version 18.0) on MacOS. The replicator should expect the code to run under 5 minutes on a standard (2024) desktop machine.

    Background: The spread of misinformation has been linked to increased social divisions and adverse health outcomes, but less is known about the production of disinformation, which is misinformation intended to mislead.

    Method: The main data used in this paper has been collected by the authors using the MTurk interface (Field Experiment) or Qualtrics (Manipulation Check, Downstream Consequences, and Platform Interventions). It is available in the replication package. Our survey design and selection eligibility are included in the Supplementary Document in this depository.

    Results: In a field experiment on MTurk (N=1,197), we found that while 70% of workers accepted a control job, 61% accepted a disinformation job requiring them to manipulate COVID-19 data. To quantify the trade-off between ethical and financial considerations in job acceptance, we introduced a lower-pay condition offering half the wage of the control job; 51% of workers accepted this job, suggesting that the ethical compromise in the disinformation task reduced the acceptance rate by about the same amount as a 25% wage reduction. A survey experiment with a nationally representative sample shows that viewing a disinformation graph from the field experiment negatively affected people's beliefs and behavioral intentions related to the COVID-19 pandemic, including increased vaccine hesitancy.

    Conclusion: Using a "wisdom-of-crowds" approach, we highlight how online labor markets can introduce features, such as increased worker accountability, to reduce the likelihood of workers engaging in the production of disinformation. Our findings emphasize the importance of addressing the supply side of disinformation in online labor markets to mitigate its harmful societal effects.

  20. CT-FAN: A Multilingual dataset for Fake News Detection

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 23, 2022
    Cite
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel (2022). CT-FAN: A Multilingual dataset for Fake News Detection [Dataset]. http://doi.org/10.5281/zenodo.6555293
    Explore at:
    zip
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel
    Description

    By downloading the data, you agree with the terms & conditions mentioned below:

    Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.

    Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.

    We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.

    Citation

    Please cite our work as

    @InProceedings{clef-checkthat:2022:task3,
    author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
    title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
    year = {2022},
    booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
    series = {CLEF~'2022},
    address = {Bologna, Italy},}
    
    @article{shahi2021overview,
     title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
     author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
     journal={Working Notes of CLEF},
     year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.

    Task 3: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1,264 English-language articles with their respective labels. Our definitions for the categories are as follows:

    • False - The main claim made in an article is untrue.

    • Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    • True - This rating indicates that the primary elements of the main claim are demonstrably true.

    • Other- An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.

    Cross-Lingual Task (German)

    Along with the multi-class task for the English language, we have introduced a task for a low-resourced language. We will provide the test data in German. The idea of the task is to use the English data and the concept of transfer learning to build a classification model for the German language.
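
    One possible sketch of that transfer setup (not the official baseline): encode articles with a multilingual sentence-embedding model, fit a classifier on the English training data, and predict on German test articles. It assumes the sentence-transformers package; the texts and labels below are placeholders.

    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    english_texts = ["First English training article ...", "Second English training article ..."]
    english_labels = ["false", "true"]  # labels drawn from: false / partially false / true / other
    german_texts = ["Deutscher Testartikel ..."]

    clf = LogisticRegression(max_iter=1000).fit(encoder.encode(english_texts), english_labels)
    print(clf.predict(encoder.encode(german_texts)))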

    Input Data

    The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:

    • ID- Unique identifier of the news article
    • Title- Title of the news article
    • text- Text mentioned inside the news article
    • our rating - class of the news article as false, partially false, true, other

    Output data format

    • public_id- Unique identifier of the news article
    • predicted_rating- predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true

    IMPORTANT!

    1. We have used the data from 2010 to 2022, and the content of fake news is mixed up with several topics like elections, COVID-19 etc.

    Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498

    Related Work

    • Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf
    • G. K. Shahi and D. Nandini, "FakeCovid – a multilingual cross-domain fact check news dataset for covid-19," in Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
    • Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
    • Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.
    • Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeno, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.
    • Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.