100+ datasets found
  1. Repository of fake news detection datasets

    • figshare.com
    • data.4tu.nl
    • +1more
    txt
    Updated Mar 18, 2021
    Cite
    Arianna D'Ulizia; Maria Chiara Caschera; Fernando Ferri; Patrizia Grifoni (2021). Repository of fake news detection datasets [Dataset]. http://doi.org/10.4121/14151755.v1
    Available download formats: txt
    Dataset updated
    Mar 18, 2021
    Dataset provided by
    4TU.ResearchData
    Authors
    Arianna D'Ulizia; Maria Chiara Caschera; Fernando Ferri; Patrizia Grifoni
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The dataset contains a list of twenty-seven freely available evaluation datasets for fake news detection, analysed according to eleven main characteristics (i.e., news domain, application purpose, type of disinformation, language, size, news content, rating scale, spontaneity, media platform, availability, and extraction time).

  2. COVID Fake News Detector

    • live.european-language-grid.eu
    Updated May 23, 2022
    Cite
    (2022). COVID Fake News Detector [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/18690
    Dataset updated
    May 23, 2022
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This service detects fake news in German text about COVID-19. It uses a German BERT model as a binary text classifier. The result is given as a probability between 0 and 1: how likely is the information in the text to be reliable, that is, free of fake news?

    The model was trained on the FANG-COVID dataset. The dataset contains 41,242 documents labeled as either real (68%) or fake (32%). The ground truth was derived from automatic annotation based on the publication platform of a text (newspapers, websites, etc.). The publication platforms were associated with global labels (real or fake) as introduced by independent organizations such as Correctiv or NewsGuard.

  3. WELFake dataset for fake news detection in text data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 9, 2021
    Cite
    Radu Prodan (2021). WELFake dataset for fake news detection in text data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4561252
    Dataset updated
    Apr 9, 2021
    Dataset provided by
    Pawan Kumar Verma
    Prateek Agrawal
    Radu Prodan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We designed a larger and more generic Word Embedding over Linguistic Features for Fake News Detection (WELFake) dataset of 72,134 news articles with 35,028 real and 37,106 fake news. For this, we merged four popular news datasets (i.e. Kaggle, McIntire, Reuters, BuzzFeed Political) to prevent over-fitting of classifiers and to provide more text data for better ML training.

    Dataset contains four columns: Serial number (starting from 0); Title (about the text news heading); Text (about the news content); and Label (0 = fake and 1 = real).

    The CSV file contains 78,098 data entries, of which only 72,134 are accessed as per the data frame.
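
    As a rough illustration of the column layout described above, the following sketch counts fake and real rows in a WELFake-style CSV using only the Python standard library. The header names (`title`, `text`, `label`) and the sample rows are assumptions made for this example; the description only names the columns informally, so check the actual file header before relying on them.

```python
import csv
import io
from collections import Counter

# Label convention taken from the description above: 0 = fake, 1 = real.
LABEL_NAMES = {"0": "fake", "1": "real"}


def count_labels(csv_file):
    """Count fake/real rows in a WELFake-style CSV opened as a text stream."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Assumed header name: "label". Rows with a missing or unexpected
        # value are skipped rather than guessed at.
        name = LABEL_NAMES.get((row.get("label") or "").strip())
        if name is not None:
            counts[name] += 1
    return counts


# Tiny in-memory stand-in for the real file (hypothetical rows, not actual data).
sample = io.StringIO(
    "serial,title,text,label\n"
    "0,Example heading A,Example body A,1\n"
    "1,Example heading B,Example body B,0\n"
    "2,Example heading C,Example body C,0\n"
)
label_counts = count_labels(sample)
print(label_counts)
```

    On the real file, the counts should be compared against the figures quoted in the description (35,028 real and 37,106 fake, 72,134 in total).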

    This dataset is a part of our ongoing research on "Fake News Prediction on Social Media Website" as a doctoral degree program of Mr. Pawan Kumar Verma and is partially supported by the ARTICONF project funded by the European Union’s Horizon 2020 research and innovation program.

  4. Fake News Content Detection

    • kaggle.com
    zip
    Updated Sep 15, 2020
    Cite
    Ganesh (2020). Fake News Content Detection [Dataset]. https://www.kaggle.com/datasets/ganeshmg/fake-news-content-detection
    Available download formats: zip (573472 bytes)
    Dataset updated
    Sep 15, 2020
    Authors
    Ganesh
    Description

    Dataset

    This dataset was created by Ganesh

    Contents

  5. CT-FAN-22 corpus: A Multilingual dataset for Fake News Detection

    • data.niaid.nih.gov
    Updated Jan 6, 2022
    Cite
    Thomas Mandl (2022). CT-FAN-22 corpus: A Multilingual dataset for Fake News Detection [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_5775507
    Dataset updated
    Jan 6, 2022
    Dataset provided by
    Shahi Gautam Kishore
    Struß Julia Maria
    Thomas Mandl
    Description

    Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use it only for research purposes. Due to these restrictions, the collection is not open data. Please download the Data Sharing Agreement and send the signed form to fakenewstask@gmail.com.

    Citation

    Please cite our work as

    @article{shahi2021overview,
     title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
     author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
     journal={Working Notes of CLEF},
     year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.

    Subtask 3: Multi-class fake news detection of news articles (English). Sub-task A treats fake news detection as a four-class classification problem. The training data will be released in batches and will comprise roughly 900 articles with their respective labels. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:

    • False - The main claim made in an article is untrue.

    • Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    • True - This rating indicates that the primary elements of the main claim are demonstrably true.

    • Other - An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes articles in dispute and unproven articles.

    Input Data

    The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:

    Task 3

    • ID - Unique identifier of the news article
    • Title - Title of the news article
    • text - Text mentioned inside the news article
    • our rating - Class of the news article: false, partially false, true, or other

    Output data format

    Task 3

    • public_id - Unique identifier of the news article
    • predicted_rating - Predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true

    Sample file

    public_id, predicted_domain
    1, health
    2, crime
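
    The output format above can be sketched as a small writer for the rating file. This is only an illustration: the function name `write_submission` is hypothetical, and the exact delimiter spacing and line endings the organisers expect are not stated in the description, so a plain comma-separated layout is assumed.

```python
import csv
import io

# The four classes listed in the task description above.
ALLOWED_RATINGS = {"false", "partially false", "true", "other"}


def write_submission(predictions, out_file):
    """Write (public_id, predicted_rating) pairs as a two-column CSV."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["public_id", "predicted_rating"])
    for public_id, rating in predictions:
        normalized = rating.strip().lower()
        if normalized not in ALLOWED_RATINGS:
            raise ValueError(f"unknown rating {rating!r} for id {public_id}")
        writer.writerow([public_id, normalized])


# Write to an in-memory buffer; pass an open file object to write to disk.
buffer = io.StringIO()
write_submission([(1, "false"), (2, "True")], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

    Validating the rating before writing catches label typos early, which matters given the limit of five runs per team.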

    Additional data for Training

    To train their models, participants can use additional data in a similar format; some datasets are available on the web. We do not provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some of the possible sources:

    IMPORTANT!

    1. We have used data from 2010 to 2021, and the fake news content covers a mix of topics such as elections, COVID-19, etc.

    Evaluation Metrics

    This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.
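
    For reference, macro-averaged F1 is the unweighted mean of the per-class F1 scores. The sketch below computes it in plain Python over the four rating classes. It is not the organisers' scoring script; in particular, scoring a class with no true or predicted instances as 0 is an assumption made here, and the example labels are made up.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores over the given labels."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true and no predicted instances scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


LABELS = ["false", "partially false", "true", "other"]
gold = ["false", "false", "true", "other"]       # hypothetical gold ratings
predicted = ["false", "true", "true", "other"]   # hypothetical system output
score = macro_f1(gold, predicted, LABELS)
print(round(score, 4))
```

    Because every class counts equally, a system that ignores a rare class such as "other" is penalised more heavily than it would be under accuracy.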

    Submission Link: Coming soon

    Related Work

    • Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.
    • Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.
    • Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.
    • Shahi, G. K. (2020). AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. https://arxiv.org/pdf/2010.00502.pdf
    • Shahi, G. K., & Nandini, D. (2020). FakeCovid – A multilingual cross-domain fact check news dataset for COVID-19. In Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
    • Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104

  6. MM-COVID Dataset

    • paperswithcode.com
    Updated Nov 4, 2021
    + more versions
    Cite
    Yichuan Li; Bohan Jiang; Kai Shu; Huan Liu (2021). MM-COVID Dataset [Dataset]. https://paperswithcode.com/dataset/mm-covid
    Dataset updated
    Nov 4, 2021
    Authors
    Yichuan Li; Bohan Jiang; Kai Shu; Huan Liu
    Description

    MM-COVID is a dataset for fake news detection related to COVID-19. This dataset provides multilingual fake news and the relevant social context. It contains 3,981 pieces of fake news content and 7,192 pieces of trustworthy information in six languages: English, Spanish, Portuguese, Hindi, French and Italian.

  7. UPFD Dataset

    • paperswithcode.com
    Updated Apr 24, 2021
    Cite
    Yingtong Dou; Kai Shu; Congying Xia; Philip S. Yu; Lichao Sun (2021). UPFD Dataset [Dataset]. https://paperswithcode.com/dataset/upfd
    Dataset updated
    Apr 24, 2021
    Authors
    Yingtong Dou; Kai Shu; Congying Xia; Philip S. Yu; Lichao Sun
    Description

    For benchmarking, please refer to its variants UPFD-POL and UPFD-GOS.

    The dataset has been integrated with PyTorch Geometric (PyG) and Deep Graph Library (DGL). You can load the dataset after installing the latest versions of PyG or DGL.

    The UPFD dataset includes two sets of tree-structured graphs curated for evaluating binary graph classification, graph anomaly detection, and fake/real news detection tasks. The dataset is stored as a PyTorch Geometric dataset object. You can easily load the data and run various GNN models using PyG.
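
    A minimal loading sketch with PyG might look as follows. It wraps `torch_geometric.datasets.UPFD` in a hypothetical helper, `load_upfd`, that checks its arguments first; the accepted names, features and splits listed here are assumptions based on the PyG dataset class and should be checked against the installed version. The PyG import is deferred so the checks run even without PyG installed.

```python
VALID_NAMES = ("politifact", "gossipcop")
VALID_FEATURES = ("profile", "spacy", "bert", "content")
VALID_SPLITS = ("train", "val", "test")


def load_upfd(root, name="politifact", feature="bert", split="train"):
    """Return one split of UPFD as a PyG dataset (downloads on first use)."""
    if name not in VALID_NAMES:
        raise ValueError(f"name must be one of {VALID_NAMES}, got {name!r}")
    if feature not in VALID_FEATURES:
        raise ValueError(f"feature must be one of {VALID_FEATURES}, got {feature!r}")
    if split not in VALID_SPLITS:
        raise ValueError(f"split must be one of {VALID_SPLITS}, got {split!r}")
    # Imported lazily so the argument checks above work without PyG installed.
    from torch_geometric.datasets import UPFD
    return UPFD(root=root, name=name, feature=feature, split=split)


# Example call (commented out because it triggers a download):
# train_set = load_upfd("data/upfd", name="gossipcop", feature="content")
# print(len(train_set), train_set[0])
```

    Each item in the returned dataset is one propagation graph, so standard PyG data loaders for graph classification should apply.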

    The dataset includes fake and real news propagation (retweet) networks on Twitter built according to fact-check information from Politifact and Gossipcop. The news retweet graphs were originally extracted by FakeNewsNet. Each graph is a hierarchical tree-structured graph where the root node represents the news; the leaf nodes are Twitter users who retweeted the root news. A user node has an edge to the news node if he/she retweeted the news tweet. Two user nodes have an edge if one user retweeted the news tweet from the other user.

    We crawled nearly 20 million historical tweets from users who participated in fake news propagation in FakeNewsNet to generate node features in the dataset. We incorporate four node feature types in the dataset. The 768-dimensional bert and 300-dimensional spacy features are encoded using pretrained BERT and spaCy word2vec, respectively. The 10-dimensional profile feature is obtained from a Twitter account's profile. You can refer to profile_feature.py for profile feature extraction. The 310-dimensional content feature is composed of a 300-dimensional user comment word2vec (spaCy) embedding plus a 10-dimensional profile feature.

    The dataset statistics are shown below:

    Data        #Graphs  #Fake News  #Total Nodes  #Total Edges  #Avg. Nodes per Graph
    Politifact  314      157         41,054        40,740        131
    Gossipcop   5464     2732        314,262       308,798       58
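
    The statistics above can be cross-checked against the tree structure described earlier: a tree with n nodes has n - 1 edges, so the total edge count should equal total nodes minus the number of graphs. The short sketch below runs that check on the reported figures; the dictionary layout and the helper name `check_row` are choices made for this illustration only.

```python
# Figures copied from the statistics table above.
STATS = {
    "Politifact": {"graphs": 314, "fake": 157, "nodes": 41_054,
                   "edges": 40_740, "avg_nodes": 131},
    "Gossipcop": {"graphs": 5464, "fake": 2732, "nodes": 314_262,
                  "edges": 308_798, "avg_nodes": 58},
}


def check_row(row):
    """Return (edges match tree structure, average matches, fake share)."""
    # Each graph is a tree, so it contributes (nodes - 1) edges.
    expected_edges = row["nodes"] - row["graphs"]
    expected_avg = round(row["nodes"] / row["graphs"])
    fake_share = row["fake"] / row["graphs"]
    return (expected_edges == row["edges"],
            expected_avg == row["avg_nodes"],
            fake_share)


results = {name: check_row(row) for name, row in STATS.items()}
for name, (edges_ok, avg_ok, fake_share) in results.items():
    print(name, edges_ok, avg_ok, fake_share)
```

    Both rows pass the edge check, and both subsets turn out to be exactly balanced between fake and real news.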

    Please refer to the paper for more details about the UPFD dataset.

    Due to the Twitter policy, we could not release the crawled users' historical tweets publicly. To get the corresponding Twitter user information, you can refer to the news lists under \data in our GitHub repo and map the news id to FakeNewsNet. Then, you can crawl the user information by following the instructions on FakeNewsNet. In the UPFD project, we use Tweepy and the Twitter Developer API to get the user information.

  8. Fake news Detection

    • kaggle.com
    zip
    Updated Jul 24, 2023
    Cite
    Marvel Samuel (2023). Fake news Detection [Dataset]. https://www.kaggle.com/datasets/marvelsamuel/fake-news-detection
    Available download formats: zip (12075965 bytes)
    Dataset updated
    Jul 24, 2023
    Authors
    Marvel Samuel
    Description

    Dataset

    This dataset was created by Marvel Samuel

    Contents

  9. Global Fake Image Detection Market Size By Component (Software, Services),...

    • verifiedmarketresearch.com
    Updated Apr 8, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Fake Image Detection Market Size By Component (Software, Services), By Application (Incident Reporting, Cyber Defense), By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/fake-image-detection-market/
    Dataset updated
    Apr 8, 2024
    Dataset provided by
    Verified Market Research https://www.verifiedmarketresearch.com/
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    Fake Image Detection Market size was valued at USD 964.45 Million in 2023 and is projected to reach USD 4,107.03 Million by 2031, growing at a CAGR of 23.00% from 2024 to 2031.
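
    The growth figures above can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / periods) - 1. The sketch below assumes the 2023 valuation is the base and compares seven and eight compounding periods, since the description does not say which convention was used.

```python
def implied_cagr(start_value, end_value, periods):
    """Compound annual growth rate implied by two values and a period count."""
    return (end_value / start_value) ** (1 / periods) - 1


START_2023 = 964.45   # USD million, from the description above
END_2031 = 4107.03    # USD million, from the description above

cagr_7 = implied_cagr(START_2023, END_2031, 7)  # assumption: 7 compounding periods
cagr_8 = implied_cagr(START_2023, END_2031, 8)  # assumption: 8 compounding periods
print(f"7 periods: {cagr_7:.2%}, 8 periods: {cagr_8:.2%}")
```

    Under these assumptions, only the seven-period reading reproduces the stated 23.00% rate; eight periods would imply a rate a little under 20%.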

    Global Fake Image Detection Market Overview

    The widespread availability of image editing software and social media platforms has led to a surge in fake images, including digitally altered photos and manipulated visual content. This trend has fueled the demand for advanced detection solutions capable of identifying and flagging fake images in real-time. With the proliferation of fake news and misinformation online, there is an increasing awareness among consumers, businesses, and governments about the importance of combating digital fraud and preserving the authenticity of visual content. This heightened concern is driving investments in fake image detection technologies to mitigate the risks associated with misinformation.

    However, despite advancements in AI and ML, detecting fake images remains a complex and challenging task, especially when dealing with sophisticated techniques such as deepfakes and generative adversarial networks (GANs). Developing robust detection algorithms capable of identifying increasingly sophisticated forms of image manipulation poses a significant challenge for researchers and developers. The deployment of fake image detection technologies raises concerns about privacy and data ethics, particularly regarding the collection and analysis of visual content shared online. Balancing the need for effective detection with respect for user privacy and ethical considerations remains a key challenge for stakeholders in the Fake Image Detection Market.

  10. fake news detection

    • kaggle.com
    zip
    Updated Apr 11, 2023
    Cite
    kalaivani (2023). fake news detection [Dataset]. https://www.kaggle.com/datasets/kalainisha/fake-news-detection
    Available download formats: zip (24178590 bytes)
    Dataset updated
    Apr 11, 2023
    Authors
    kalaivani
    Description

    Dataset

    This dataset was created by kalaivani

    Contents

  11. UPFD-GOS (User Preference-aware Fake News Detection)

    • opendatalab.com
    zip
    Updated Apr 18, 2023
    + more versions
    Cite
    Lehigh University (2023). UPFD-GOS (User Preference-aware Fake News Detection) [Dataset]. https://opendatalab.com/OpenDataLab/UPFD-GOS
    Available download formats: zip (1601216611 bytes)
    Dataset updated
    Apr 18, 2023
    Dataset provided by
    Illinois Institute of Technology
    Lehigh University
    University of Illinois at Chicago
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    For benchmarking, please refer to its variants UPFD-POL and UPFD-GOS.

    The dataset has been integrated with PyTorch Geometric (PyG) and Deep Graph Library (DGL). You can load the dataset after installing the latest versions of PyG or DGL.

    The UPFD dataset includes two sets of tree-structured graphs curated for evaluating binary graph classification, graph anomaly detection, and fake/real news detection tasks. The dataset is stored as a PyTorch Geometric dataset object. You can easily load the data and run various GNN models using PyG.

    The dataset includes fake and real news propagation (retweet) networks on Twitter built according to fact-check information from Politifact and Gossipcop. The news retweet graphs were originally extracted by FakeNewsNet. Each graph is a hierarchical tree-structured graph where the root node represents the news; the leaf nodes are Twitter users who retweeted the root news. A user node has an edge to the news node if he/she retweeted the news tweet. Two user nodes have an edge if one user retweeted the news tweet from the other user.

    We crawled nearly 20 million historical tweets from users who participated in fake news propagation in FakeNewsNet to generate node features in the dataset. We incorporate four node feature types in the dataset. The 768-dimensional bert and 300-dimensional spacy features are encoded using pretrained BERT and spaCy word2vec, respectively. The 10-dimensional profile feature is obtained from a Twitter account's profile. You can refer to profile_feature.py for profile feature extraction. The 310-dimensional content feature is composed of a 300-dimensional user comment word2vec (spaCy) embedding plus a 10-dimensional profile feature.

    The dataset statistics are shown below:

    Data        Graphs  Fake News  Total Nodes  Total Edges  Avg. Nodes per Graph
    Politifact  314     157        41,054       40,740       131
    Gossipcop   5464    2732       314,262      308,798      58

    Please refer to the paper for more details about the UPFD dataset.

    Due to the Twitter policy, we could not release the crawled users' historical tweets publicly. To get the corresponding Twitter user information, you can refer to the news lists under \data in our GitHub repo and map the news id to FakeNewsNet. Then, you can crawl the user information by following the instructions on FakeNewsNet. In the UPFD project, we use Tweepy and the Twitter Developer API to get the user information.

  12. BuzzFeed-Webis Fake News Corpus 2016 Dataset

    • paperswithcode.com
    • webis.de
    • +1more
    Updated May 8, 2018
    + more versions
    Cite
    Martin Potthast; Johannes Kiesel; Kevin Reinartz; Janek Bevendorff; Benno Stein (2018). BuzzFeed-Webis Fake News Corpus 2016 Dataset [Dataset]. https://paperswithcode.com/dataset/buzzfeed-webis-fake-news-corpus-2016
    Dataset updated
    May 8, 2018
    Authors
    Martin Potthast; Johannes Kiesel; Kevin Reinartz; Janek Bevendorff; Benno Stein
    Description

    The BuzzFeed-Webis Fake News Corpus 16 comprises the output of 9 publishers in a week close to the US elections. Among the selected publishers are 6 prolific hyperpartisan ones (three left-wing and three right-wing), and three mainstream publishers (see Table 1). All publishers earned Facebook’s blue checkmark, indicating authenticity and an elevated status within the network. For seven weekdays (September 19 to 23 and September 26 and 27), every post and linked news article of the 9 publishers was fact-checked by professional journalists at BuzzFeed. In total, 1,627 articles were checked, 826 mainstream, 256 left-wing and 545 right-wing. The imbalance between categories results from differing publication frequencies.

  13. Multi-Fake-DetectiVE - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Jan 31, 2024
    + more versions
    Cite
    (2024). Multi-Fake-DetectiVE - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/957d18c1-c5b3-5b47-9524-95d9fea1d021
    Dataset updated
    Jan 31, 2024
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset includes social media posts and news articles, each containing both a textual and a visual component, concerning the Ukrainian-Russian war that started in February 2022. The dataset was collected to perform two distinct sub-tasks: Multimodal Fake News Detection, and Cross-modal Relation Classification in fake and real news. Given a piece of content (e.g., a social media post or a news article) that includes both a visual and a textual component, the first sub-task aims to detect whether the content is real or fake news. The second sub-task aims to understand how the visual and textual components of news can influence each other. Given a text and an accompanying image, the sub-task intends to determine whether or not the combination of the two aims to mislead the reader's interpretation of one or the other. The data to be used for the two sub-tasks are stored in two separate sub-folders. Each sub-folder includes: (i) a training set, which contains data collected from February 2022 to September 2022, (ii) a contemporary test set, which includes data collected in the same time window as the training set, and (iii) a future test set, which contains data collected in a subsequent time window, specifically from October 2022 to December 2022.

  14. Global AI Content Detector Market Size By Application, By End-Use Industry,...

    • verifiedmarketresearch.com
    Updated Jun 10, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global AI Content Detector Market Size By Application, By End-Use Industry, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/ai-content-detector-market/
    Dataset updated
    Jun 10, 2024
    Dataset authored and provided by
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    The AI Content Detector Market is growing at a moderate pace, with substantial growth rates over the last few years, and the market is estimated to grow significantly over the forecast period, i.e., 2024 to 2031.

    Global AI Content Detector Market Drivers

    Rising Concerns Over Misinformation: The proliferation of fake news, misinformation, and inappropriate content on digital platforms has led to increased demand for AI content detectors. These systems can identify and flag misleading or harmful content, helping to combat the spread of misinformation online.

    Regulatory Compliance Requirements: Stringent regulations and legal obligations regarding content moderation, data privacy, and online safety drive the adoption of AI content detectors. Organizations need to comply with regulations such as the General Data Protection Regulation (GDPR) and the Digital Millennium Copyright Act (DMCA), spurring investment in AI-powered content moderation solutions.

    Growing Volume of User-Generated Content: The exponential growth of user-generated content on social media platforms, forums, and websites has overwhelmed traditional moderation methods. AI content detectors offer scalable and efficient solutions for analyzing vast amounts of content in real-time, enabling platforms to maintain a safe and healthy online environment for users.

    Advancements in AI and Machine Learning Technologies: Continuous advancements in artificial intelligence and machine learning algorithms have enhanced the capabilities of content detection systems. AI models trained on large datasets can accurately identify various types of content, including text, images, videos, and audio, with high precision and speed.

    Brand Protection and Reputation Management: Businesses prioritize brand protection and reputation management in the digital age, as negative content or misinformation can severely impact brand image and consumer trust. AI content detectors help organizations identify and address potentially damaging content proactively, safeguarding their reputation and brand integrity.

    Demand for Personalized User Experiences: Consumers increasingly expect personalized online experiences tailored to their preferences and interests. AI content detectors analyze user behavior and content interactions to deliver relevant and engaging content, driving user engagement and satisfaction.

    Adoption of AI-Powered Moderation Tools by Social Media Platforms: Major social media platforms and online communities are investing in AI-powered moderation tools to enforce community guidelines, prevent abuse and harassment, and maintain a positive user experience. The need to address content moderation challenges at scale drives the adoption of AI content detectors.

    Mitigation of Online Risks and Threats: Online platforms face various risks and threats, including cyberbullying, hate speech, terrorist propaganda, and child exploitation content. AI content detectors help mitigate these risks by identifying and removing harmful content, thereby creating a safer online environment for users.

    Cost and Resource Efficiency: Traditional content moderation methods, such as manual review by human moderators, are time-consuming, labor-intensive, and costly. AI content detectors automate the moderation process, reducing the need for human intervention and minimizing operational expenses for organizations.

  15. Fake and Real News Detection Use Machine Learning

    • kaggle.com
    zip
    Updated Nov 21, 2023
    Cite
    Sabriar Bishal (2023). Fake and Real News Detection Use Machine Learning [Dataset]. https://www.kaggle.com/datasets/sabriarbishal/fake-and-real-news-detection-use-machine-learning
    Available download formats: zip (42975911 bytes)
    Dataset updated
    Nov 21, 2023
    Authors
    Sabriar Bishal
    Description

    Dataset

    This dataset was created by Sabriar Bishal

    Contents

  16. Machine Hack: Fake News Content Detection

    • kaggle.com
    zip
    Updated Sep 11, 2020
    Cite
    Sumit Saha (2020). Machine Hack: Fake News Content Detection [Dataset]. https://www.kaggle.com/ssismasterchief/machine-hack-fake-news-content-detection
    Available download formats: zip (573738 bytes)
    Dataset updated
    Sep 11, 2020
    Authors
    Sumit Saha
    Description

    Dataset

    This dataset was created by Sumit Saha

    Contents

  17. Political Bias Classifier

    • live.european-language-grid.eu
    Updated Nov 28, 2021
    Cite
    (2021). Political Bias Classifier [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/9278
    Dataset updated
    Nov 28, 2021
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The model classifies the political bias of a German text into 5 classes: far-left, center-left, center, center-right, far-right. It uses a TF-IDF vectorizer to preprocess documents. Then, a Random Forest classifier is applied on the resulting vectors to determine the final class.
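
    The two-stage design described above (TF-IDF vectorisation followed by a Random Forest) can be sketched with scikit-learn as shown below. This is not the service's actual code: the helper name `build_bias_classifier`, the default hyperparameters and the training call in the comment are assumptions for illustration. The scikit-learn imports are deferred so the module loads even where scikit-learn is not installed.

```python
# The five classes named in the description above.
BIAS_CLASSES = ("far-left", "center-left", "center", "center-right", "far-right")


def build_bias_classifier(n_estimators=100, random_state=0):
    """Return an untrained TF-IDF + Random Forest text classification pipeline."""
    # Imported lazily; requires scikit-learn to be installed when called.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline

    return make_pipeline(
        TfidfVectorizer(),  # turns raw documents into TF-IDF vectors
        RandomForestClassifier(n_estimators=n_estimators,
                               random_state=random_state),
    )


# Example usage (commented out; `texts` and `labels` are hypothetical data):
# model = build_bias_classifier()
# model.fit(texts, labels)            # labels drawn from BIAS_CLASSES
# print(model.predict(["Ein Beispieltext"]))
```

    Bundling both stages in one pipeline keeps the vocabulary learned during training tied to the classifier, so new texts are vectorised consistently at prediction time.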

  18. Data_Sheet_1_True, justified, belief? Partisanship weakens the positive...

    • figshare.com
    docx
    Updated Sep 26, 2023
    Cite
    Daniel Jeffrey Sude; Gil Sharon; Shira Dvir-Gvirsman (2023). Data_Sheet_1_True, justified, belief? Partisanship weakens the positive effect of news media literacy on fake news detection.docx [Dataset]. http://doi.org/10.3389/fpsyg.2023.1242865.s001
    Available download formats: docx
    Dataset updated
    Sep 26, 2023
    Dataset provided by
    Frontiers
    Authors
    Daniel Jeffrey Sude; Gil Sharon; Shira Dvir-Gvirsman
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To investigate how people assess whether politically consistent news is real or fake, two studies (N = 1,008; N = 1,397) with adult American participants conducted in 2020 and 2022 utilized a within-subjects experimental design to investigate perceptions of news accuracy. When a mock Facebook post with either fake (Study 1) or real (Study 2) news content was attributed to an alternative (vs. a mainstream) news outlet, it was, on average, perceived to be less accurate. Those with beliefs reflecting News Media Literacy demonstrated greater sensitivity to the outlet’s status. This relationship was itself contingent on the strength of the participant’s partisan identity. Strong partisans high in News Media Literacy defended the accuracy of politically consistent content, even while recognizing that an outlet was unfamiliar. These results highlight the fundamental importance of looking at the interaction between user-traits and features of social media news posts when examining learning from political news on social media.

  19. Fake news detection dataset

    • kaggle.com
    zip
    Updated Apr 26, 2024
    Ahmed Khursheed (2024). Fake news detection dataset [Dataset]. https://www.kaggle.com/datasets/ahmedkhursheed007/fake-news-detection-dataset
    Available download formats: zip (38841253 bytes)
    Dataset updated
    Apr 26, 2024
    Authors
    Ahmed Khursheed
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset was created by Ahmed Khursheed and released under Apache 2.0.

  20. Data from: A Data set for Information Spreading over the News

    • live.european-language-grid.eu
    txt
    Updated Nov 28, 2021
    (2021). A Data set for Information Spreading over the News [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7719
    Available download formats: txt
    Dataset updated
    Nov 28, 2021
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    Analyzing the spread of information related to a specific event in the news has many potential applications. Consequently, various systems have been developed to facilitate the analysis of information spreading, such as detection of disease propagation and identification of the spreading of fake news through social media. There are several open challenges in the process of discerning information propagation, among them the lack of resources for training and evaluation. This paper describes the process of compiling a corpus from the EventRegistry global media monitoring system. We focus on information spreading in three domains: sports (i.e. the FIFA WorldCup), natural disasters (i.e. earthquakes), and climate change (i.e. global warming). This corpus is a valuable addition to the currently available datasets to examine the spreading of information about various kinds of events.

    Introduction:

    Domain-specific gaps in information spreading are ubiquitous and may exist due to economic conditions, political factors, or linguistic, geographical, time-zone, cultural, and other barriers. These factors potentially contribute to obstructing the flow of local as well as international news. We believe that there is a lack of research studies that examine, identify, and uncover the reasons for barriers in information spreading. Additionally, there is limited availability of datasets containing news text and metadata including time, place, source, and other relevant information. When a piece of information starts spreading, it implicitly raises questions such as:

    • How far does the information in the form of news reach out to the public?
    • Does the content of news remain the same or change to a certain extent?
    • Do the cultural values impact the information, especially when the same news gets translated into other languages?

    Statistics about datasets:

    ---------------------------------------------------------------------------------------------------
    #   Domain             Event Type       Articles Per Language                    Total Articles
    ---------------------------------------------------------------------------------------------------
    1   Sports             FIFA World Cup   983-en, 762-sp, 711-de, 10-sl, 216-pt    2679
    2   Natural Disaster   Earthquake       941-en, 999-sp, 937-de, 19-sl, 251-pt    3194
    3   Climate Changes    Global Warming   996-en, 298-sp, 545-de, 8-sl, 97-pt      1945
    ---------------------------------------------------------------------------------------------------
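
The "Articles Per Language" cells in the table above pack a count and a language code into each entry (for example, 983-en). A minimal Python sketch for splitting them into per-language counts is given below. The function name and the row tuples are hypothetical helpers introduced here, not part of the dataset's tooling. The sketch prints each row's per-language sum next to the reported total so the two can be compared; the listing does not say how the totals were derived, so they need not agree.

```python
def parse_language_counts(cell):
    """Parse a cell such as '983-en, 762-sp' into {'en': 983, 'sp': 762}."""
    counts = {}
    for part in cell.split(","):
        # Each entry has the form '<count>-<language code>'.
        number, language = part.strip().split("-", 1)
        counts[language] = int(number)
    return counts


# Rows copied from the table above: (domain, per-language cell, reported total).
rows = [
    ("Sports", "983-en, 762-sp, 711-de, 10-sl, 216-pt", 2679),
    ("Natural Disaster", "941-en, 999-sp, 937-de, 19-sl, 251-pt", 3194),
    ("Climate Changes", "996-en, 298-sp, 545-de, 8-sl, 97-pt", 1945),
]

for domain, cell, reported_total in rows:
    counts = parse_language_counts(cell)
    print(f"{domain}: {counts} sum={sum(counts.values())} reported={reported_total}")
```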

Arianna D'Ulizia; Maria Chiara Caschera; Fernando ferri; Patrizia Grifoni (2021). Repository of fake news detection datasets [Dataset]. http://doi.org/10.4121/14151755.v1

Repository of fake news detection datasets

5 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: txt