The ISOT Fake News dataset is a compilation of several thousand fake news and truthful articles, obtained from different legitimate news sites and from sites flagged as unreliable by Politifact.com.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Fake audio detection is a growing concern, and several relevant datasets have been designed for research. However, there is no standard public Chinese dataset under additive noise conditions. In this paper, we aim to fill this gap and design a Chinese fake audio detection dataset (FAD) for studying more generalized detection methods. Twelve mainstream speech generation techniques are used to generate the fake audio. To simulate real-life scenarios, three noise datasets are selected for noise addition at five different signal-to-noise ratios (SNRs). The FAD dataset can be used not only for fake audio detection but also for recognizing which algorithm generated a fake utterance, which is useful for audio forensics. Baseline results are presented with analysis. The results show that building fake audio detection methods that generalize well remains challenging.
The FAD dataset is publicly available. The source code of the baselines is available on GitHub: https://github.com/ADDchallenge/FAD
The FAD dataset is designed to evaluate methods for fake audio detection, fake-algorithm recognition, and other related studies. To better study the robustness of these methods under the noisy conditions encountered in real life, we also construct a corresponding noisy dataset. The full FAD dataset therefore comes in two versions: a clean version and a noisy version. Both versions are divided into disjoint training, development, and test sets in the same way, with no speaker overlap across the three subsets. Each test set is further divided into a seen and an unseen test set; the unseen test set evaluates how well methods generalize to unknown types. Notably, both the real audio and the fake audio in the unseen test set are unknown to the model.
For the noisy part, we select three noise databases for simulation. Additive noise is added to each audio in the clean dataset at five different SNRs. The additive noise for the unseen test set comes from a different noise database than the noise used for the remaining subsets. In each version of the FAD dataset, there are 138,400 utterances in the training set, 14,400 in the development set, 42,000 in the seen test set, and 21,000 in the unseen test set. More detailed statistics are given in Table 2.
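As a rough illustration of how the two versions and four splits might be enumerated, here is a minimal Python sketch. The root path and directory names are assumptions made purely for illustration; the released archive's exact layout may differ.

```python
import os

# Hypothetical layout: <root>/<version>/<split>/*.wav -- adjust to the real release.
FAD_ROOT = "/path/to/FAD"      # placeholder root directory
VERSIONS = ["clean", "noisy"]  # the two released versions
SPLITS = {                     # split name -> expected utterance count (per version)
    "train": 138400,
    "dev": 14400,
    "test_seen": 42000,
    "test_unseen": 21000,
}

def list_utterances(version, split):
    """Yield paths of .wav files under the assumed <root>/<version>/<split> layout."""
    split_dir = os.path.join(FAD_ROOT, version, split)
    for name in sorted(os.listdir(split_dir)):
        if name.endswith(".wav"):
            yield os.path.join(split_dir, name)

if __name__ == "__main__":
    for version in VERSIONS:
        for split, expected in SPLITS.items():
            found = sum(1 for _ in list_utterances(version, split))
            print(f"{version}/{split}: {found} files (expected {expected})")
```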
Clean Real Audios Collection
To eliminate the interference of irrelevant factors, we collect clean real audio from two kinds of sources: five open corpora from the OpenSLR platform (http://www.openslr.org/12/) and one self-recorded dataset.
Clean Fake Audios Generation
We select 11 representative speech synthesis methods to generate fully fake audio, plus one method that produces partially fake audio.
Noisy Audios Simulation
Noisy audio is used to quantify the robustness of methods under noisy conditions. To simulate real-life scenarios, we sample noise signals and add them to the clean audio at five different SNRs: 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB. The additive noise is drawn from three noise databases: PNL 100 Nonspeech Sounds, NOISEX-92, and TAU Urban Acoustic Scenes.
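The Python sketch below shows one common way to mix additive noise into a clean signal at a target SNR. It is a generic illustration under an assumed convention (power-based SNR computed over the whole utterance), not the exact script used to build the FAD noisy version.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Mix a noise signal into clean speech at a target SNR (in dB)."""
    # Tile or truncate the noise so it matches the speech length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    # Scale the noise so that 10 * log10(P_speech / P_noise) == snr_db.
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Example: mix synthetic signals at each SNR used in FAD.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone at 16 kHz
noise = rng.normal(size=8000)
for snr in (0, 5, 10, 15, 20):
    noisy = add_noise_at_snr(clean, noise, snr)
```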
This data set is licensed with a CC BY-NC-ND 4.0 license.
You can cite the data using the following BibTeX entry:
@inproceedings{ma2022fad,
  title={FAD: A Chinese Dataset for Fake Audio Detection},
  author={Haoxin Ma and Jiangyan Yi and Chenglong Wang and Xunrui Yan and Jianhua Tao and Tao Wang and Shiming Wang and Le Xu and Ruibo Fu},
  booktitle={Submitted to the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
  year={2022},
}
Around 23 percent of Americans stated that they strongly agreed that CNN regularly reports made-up or fake news about Donald Trump and his administration. An additional 16 percent strongly disagreed with this statement, and 15 percent had no opinion, despite the divisive subject matter.
CNN
CNN ranks as one of the most popular news networks in the United States and boasts successful affiliates that can be accessed by people in over 200 countries around the world. Over 45 percent of Americans report that they watch the network, and it is generally seen as a credible source of news and information. Over half of Americans find the network to be at least somewhat credible, but 21 percent strongly disagree, implying highly polarized opinions along lines of political affiliation. Democrats are much more likely to watch CNN than their Republican and Independent counterparts, suggesting that the network is at least somewhat left-leaning in its coverage.
Fake news
Popularized by Donald Trump during the 2016 election cycle, the term 'fake news' is often used by the president and his supporters to describe news stories and networks that they believe to be spreading false information. Over 50 percent of Americans believe that online news websites regularly report fake news stories, while only nine percent think otherwise. Fake news is often difficult to identify, and many news consumers in countries across the globe struggle to separate fact from fiction.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is designed to support research in fake news detection across four major Indian languages: Gujarati, Hindi, Marathi, and Telugu. The dataset includes a diverse set of news articles collected from various sources, each labeled as either 'fake' or 'real'. The primary goal is to provide a resource that helps in the development and evaluation of natural language processing (NLP) models capable of detecting fake news in these regional languages.
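As an illustration of how such a corpus could be used, here is a minimal baseline sketch with scikit-learn. The file name and column names (`text`, `label`) are hypothetical placeholders, and character n-grams are chosen only because they tend to transfer reasonably across the four scripts involved.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical file and column names; adjust to the released files.
df = pd.read_csv("indic_fake_news.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# Character n-gram TF-IDF + logistic regression as a simple language-agnostic baseline.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```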
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Fake Face Vs Real Face is a dataset for object detection tasks - it contains Fake Face annotations for 494 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
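For example, a download via the `roboflow` Python package (`pip install roboflow`) might look like the sketch below. The API key, workspace slug, project slug, and version number are placeholders; copy the exact values from the dataset's Roboflow page.

```python
from roboflow import Roboflow

# Placeholder credentials and slugs -- replace with the values shown on Roboflow.
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("fake-face-vs-real-face")

# Download one dataset version in a chosen annotation format (e.g. COCO).
dataset = project.version(1).download("coco")
print(dataset.location)  # local folder containing images and annotations
```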
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset was created by hyeojuKim
This dataset was created by Yashvardhan Thakker
This dataset was created by Ekatra
Released under Other (specified in description)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset of fake Spanish ID documents for training fake-ID detectors. Base material is taken from the MIDV-2020 dataset.
In 2022, mobile app install fraud across all examined categories on Android devices consisted for the most part of bot operations. Bots were used in around 75 percent of fraudulent installs of finance apps, as well as in over 75 percent of fake installs of social apps. Midcore gaming apps were more likely than other app categories to be targeted by click flooding, while hypercasual gaming apps saw approximately 24 percent of their fraudulent installs come from fake publisher activity.