CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset contains a list of twenty-seven freely available evaluation datasets for fake news detection, analysed according to eleven main characteristics: news domain, application purpose, type of disinformation, language, size, news content, rating scale, spontaneity, media platform, availability, and extraction time.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This service detects fake news in German texts about COVID-19. It uses a German BERT model as a binary text classifier. The result is given as a probability between 0 and 1: how likely is the information in the text to be reliable, i.e., free of fake news?
The model was trained on the FANG-COVID dataset. The dataset contains 41,242 documents labeled as either real (68%) or fake (32%). The ground truth was derived from automatic annotation based on the publication platform of each text (newspapers, websites, etc.). The publication platforms were assigned global labels (real or fake) as introduced by independent organizations such as Correctiv or NewsGuard.
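A minimal sketch of how such a binary classifier's raw score can be turned into the reliability probability described above. The sigmoid squashing is the standard choice for binary heads; the logit values below are invented for illustration:

```python
import math

def reliability_probability(logit: float) -> float:
    """Map a raw binary-classifier logit to a probability in (0, 1).

    Higher values mean the text is more likely to be reliable,
    i.e., free of fake news.
    """
    return 1.0 / (1.0 + math.exp(-logit))

# A positive logit maps above 0.5 (leans "reliable"),
# a negative one below 0.5 (leans "fake").
print(reliability_probability(2.0))   # ~0.88
print(reliability_probability(-2.0))  # ~0.12
```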
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We designed a larger and more generic Word Embedding over Linguistic Features for Fake News Detection (WELFake) dataset of 72,134 news articles, with 35,028 real and 37,106 fake items. For this, we merged four popular news datasets (Kaggle, McIntire, Reuters, and BuzzFeed Political) to prevent overfitting of classifiers and to provide more text data for better ML training.
The dataset contains four columns: serial number (starting from 0), title (the news headline), text (the news content), and label (0 = fake, 1 = real).
The CSV file contains 78,098 entries, of which only 72,134 are usable once loaded into a data frame.
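The gap between raw rows and usable entries is typical of merged CSVs: rows with missing fields get dropped on load. A minimal sketch using the four-column layout described above (the sample rows are invented):

```python
import csv
import io

# Invented sample rows in the WELFake column layout:
# serial number, title, text, label (0 = fake, 1 = real).
raw = io.StringIO(
    "0,Headline A,Full article text A,1\n"
    "1,Headline B,,0\n"  # missing text -> dropped on load
    "2,Headline C,Full article text C,0\n"
)

rows = list(csv.reader(raw))
usable = [r for r in rows if all(field.strip() for field in r)]

print(len(rows), len(usable))  # 3 2
```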
This dataset is part of our ongoing research on "Fake News Prediction on Social Media Website", conducted as part of the doctoral degree program of Mr. Pawan Kumar Verma, and is partially supported by the ARTICONF project funded by the European Union's Horizon 2020 research and innovation program.
This dataset was created by Ganesh
Data Access: The data in this research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so they must be used for research purposes only. Due to these restrictions, the collection is not open data. Please download the Agreement at Data Sharing Agreement and send the signed form to fakenewstask@gmail.com .
Citation
Please cite our work as
@article{shahi2021overview, title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection}, author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas}, journal={Working Notes of CLEF}, year={2021} }
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially false, false, or other (e.g., claims in dispute), and detect the topical domain of the article. This task will run in English and German.
Subtask 3: Multi-class fake news detection of news articles (English). This subtask frames fake news detection as a four-class classification problem. The training data will be released in batches of roughly 900 articles with their respective labels. Given the text of a news article, determine whether the main claim made in the article is true, partially false, false, or other. Our definitions of the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes disputed and unproven articles.
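Fact-checking services use many fine-grained verdicts, so a normalisation step is usually needed before training. A sketch mapping verdict strings onto the four classes above (the verdict names follow the examples given in the "Partially False" definition; the mapping itself is illustrative, not official):

```python
# Map fine-grained fact-checker verdicts onto the task's four classes.
# Per the category definitions, verdicts like "mostly true" or
# "miscaptioned" all fall under "partially false".
VERDICT_TO_CLASS = {
    "true": "true",
    "partially false": "partially false",
    "partially true": "partially false",
    "mostly true": "partially false",
    "miscaptioned": "partially false",
    "misleading": "partially false",
    "false": "false",
    "in dispute": "other",
    "unproven": "other",
}

def normalise(verdict: str) -> str:
    # Unknown verdicts fall back to "other" (lack of evidence).
    return VERDICT_TO_CLASS.get(verdict.strip().lower(), "other")

print(normalise("Mostly True"))  # partially false
print(normalise("some unseen verdict"))  # other
```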
Input Data
The data will be provided in the format id, title, text, rating, domain; the columns are described as follows:
Task 3
Output data format
Task 3
Sample File
public_id, predicted_rating
1, false
2, true
Sample File
public_id, predicted_domain
1, health
2, crime
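Both sample files above share the same two-column shape; writing a submission with the csv module is straightforward (the predictions here are placeholders):

```python
import csv
import io

# Placeholder predictions keyed by public_id.
ratings = {1: "false", 2: "true"}

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["public_id", "predicted_rating"])
for public_id, rating in sorted(ratings.items()):
    writer.writerow([public_id, rating])

print(buf.getvalue())
```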
Additional data for Training
To train your model, participants can use additional data in a similar format; some datasets are available on the web. We do not provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some possible sources:
IMPORTANT!
Evaluation Metrics
This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs in total (not per day), and only one person per team is allowed to submit runs.
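Macro-averaged F1 weights all four classes equally regardless of their frequency. A self-contained version for checking local runs (the example labels are invented):

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)

y_true = ["false", "true", "partially false", "false", "other"]
y_pred = ["false", "true", "false", "false", "other"]
print(f1_macro(y_true, y_pred))  # 0.7
```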
Submission Link: Coming soon
Related Work
MM-COVID is a dataset for fake news detection related to COVID-19. It provides multilingual fake news together with the relevant social context. It contains 3,981 pieces of fake news content and 7,192 pieces of trustworthy information in six languages: English, Spanish, Portuguese, Hindi, French, and Italian.
For benchmarking, please refer to its variants UPFD-POL and UPFD-GOS.
The dataset has been integrated with PyTorch Geometric (PyG) and the Deep Graph Library (DGL). You can load the dataset after installing the latest version of either PyG or DGL.
The UPFD dataset includes two sets of tree-structured graphs curated for evaluating binary graph classification, graph anomaly detection, and fake/real news detection tasks. The dataset is provided as a PyTorch Geometric dataset object, so you can easily load the data and run various GNN models using PyG.
The dataset includes fake and real news propagation (retweet) networks on Twitter, built according to fact-check information from Politifact and Gossipcop. The news retweet graphs were originally extracted by FakeNewsNet. Each graph is a hierarchical tree-structured graph where the root node represents the news and the leaf nodes are Twitter users who retweeted it. A user node has an edge to the news node if they retweeted the news tweet. Two user nodes have an edge if one user retweeted the news tweet from the other.
We crawled nearly 20 million historical tweets from users who participated in fake news propagation in FakeNewsNet to generate node features for the dataset. We incorporate four node feature types: the 768-dimensional BERT and 300-dimensional spaCy features are encoded using pretrained BERT and spaCy word2vec, respectively; the 10-dimensional profile feature is obtained from a Twitter account's profile (see profile_feature.py for the extraction); and the 310-dimensional content feature is composed of a 300-dimensional user-comment word2vec (spaCy) embedding concatenated with the 10-dimensional profile feature.
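The 310-dimensional content feature is a plain concatenation of the two pieces described above; schematically (random placeholder vectors stand in for the real embeddings):

```python
import random

random.seed(0)

# Placeholder stand-ins for the real embeddings described above.
comment_word2vec = [random.random() for _ in range(300)]  # spaCy word2vec
profile_feature = [random.random() for _ in range(10)]    # Twitter profile

# The 310-dimensional content feature is their concatenation.
content_feature = comment_word2vec + profile_feature
print(len(content_feature))  # 310
```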
The dataset statistics are shown below:
| Data | #Graphs | #Fake News | #Total Nodes | #Total Edges | #Avg. Nodes per Graph |
|---|---|---|---|---|---|
| Politifact | 314 | 157 | 41,054 | 40,740 | 131 |
| Gossipcop | 5,464 | 2,732 | 314,262 | 308,798 | 58 |
Please refer to the paper for more details about the UPFD dataset.
Due to Twitter policy, we could not release the crawled users' historical tweets publicly. To get the corresponding Twitter user information, refer to the news lists under \data in our GitHub repo and map the news ids to FakeNewsNet. Then you can crawl the user information by following the instructions in FakeNewsNet. In the UPFD project, we use Tweepy and the Twitter Developer API to get the user information.
This dataset was created by Marvel Samuel
https://www.verifiedmarketresearch.com/privacy-policy/
Fake Image Detection Market size was valued at USD 964.45 Million in 2023 and is projected to reach USD 4,107.03 Million by 2031, growing at a CAGR of 23.00% from 2024 to 2031.
Global Fake Image Detection Market Overview
The widespread availability of image editing software and social media platforms has led to a surge in fake images, including digitally altered photos and manipulated visual content. This trend has fueled the demand for advanced detection solutions capable of identifying and flagging fake images in real-time. With the proliferation of fake news and misinformation online, there is an increasing awareness among consumers, businesses, and governments about the importance of combating digital fraud and preserving the authenticity of visual content. This heightened concern is driving investments in fake image detection technologies to mitigate the risks associated with misinformation.
However, despite advancements in AI and ML, detecting fake images remains a complex and challenging task, especially when dealing with sophisticated techniques such as deepfakes and generative adversarial networks (GANs). Developing robust detection algorithms capable of identifying increasingly sophisticated forms of image manipulation poses a significant challenge for researchers and developers. The deployment of fake image detection technologies raises concerns about privacy and data ethics, particularly regarding the collection and analysis of visual content shared online. Balancing the need for effective detection with respect for user privacy and ethical considerations remains a key challenge for stakeholders in the Fake Image Detection Market.
This dataset was created by kalaivani
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The BuzzFeed-Webis Fake News Corpus 16 comprises the output of nine publishers during a week close to the 2016 US elections. Among the selected publishers are six prolific hyperpartisan ones (three left-wing and three right-wing) and three mainstream publishers (see Table 1). All publishers earned Facebook's blue checkmark, indicating authenticity and an elevated status within the network. For seven weekdays (September 19 to 23 and September 26 and 27), every post and linked news article of the nine publishers was fact-checked by professional journalists at BuzzFeed. In total, 1,627 articles were checked: 826 mainstream, 256 left-wing, and 545 right-wing. The imbalance between categories results from differing publication frequencies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset includes social media posts and news articles, each containing both a textual and a visual component, concerning the Ukrainian-Russian war that started in February 2022. The dataset was collected to perform two distinct sub-tasks: Multimodal Fake News Detection, and Cross-modal Relation Classification in fake and real news. Given a piece of content (e.g., a social media post or a news article) that includes both a visual and a textual component, the first sub-task aims to detect whether the content is real or fake news. The second sub-task aims to understand how the visual and textual components of news can influence each other. Given a text and an accompanying image, this sub-task intends to determine whether the combination of the two aims to mislead the reader's interpretation of one or the other, or not. The data for the two sub-tasks are stored in two separate sub-folders. Each sub-folder includes: (i) a training set, which contains data collected from February 2022 to September 2022; (ii) a contemporary test set, which includes data collected in the same time window as the training set; and (iii) a future test set, which contains data collected in a subsequent time window, specifically from October 2022 to December 2022.
https://www.verifiedmarketresearch.com/privacy-policy/
The AI Content Detector Market is growing at a moderate pace, with substantial growth rates over the last few years, and it is estimated that the market will grow significantly in the forecast period, i.e., 2024 to 2031.
Global AI Content Detector Market Drivers
Rising Concerns Over Misinformation: The proliferation of fake news, misinformation, and inappropriate content on digital platforms has led to increased demand for AI content detectors. These systems can identify and flag misleading or harmful content, helping to combat the spread of misinformation online.
Regulatory Compliance Requirements: Stringent regulations and legal obligations regarding content moderation, data privacy, and online safety drive the adoption of AI content detectors. Organizations need to comply with regulations such as the General Data Protection Regulation (GDPR) and the Digital Millennium Copyright Act (DMCA), spurring investment in AI-powered content moderation solutions.
Growing Volume of User-Generated Content: The exponential growth of user-generated content on social media platforms, forums, and websites has overwhelmed traditional moderation methods. AI content detectors offer scalable and efficient solutions for analyzing vast amounts of content in real-time, enabling platforms to maintain a safe and healthy online environment for users.
Advancements in AI and Machine Learning Technologies: Continuous advancements in artificial intelligence and machine learning algorithms have enhanced the capabilities of content detection systems. AI models trained on large datasets can accurately identify various types of content, including text, images, videos, and audio, with high precision and speed.
Brand Protection and Reputation Management: Businesses prioritize brand protection and reputation management in the digital age, as negative content or misinformation can severely impact brand image and consumer trust. AI content detectors help organizations identify and address potentially damaging content proactively, safeguarding their reputation and brand integrity.
Demand for Personalized User Experiences: Consumers increasingly expect personalized online experiences tailored to their preferences and interests. AI content detectors analyze user behavior and content interactions to deliver relevant and engaging content, driving user engagement and satisfaction.
Adoption of AI-Powered Moderation Tools by Social Media Platforms: Major social media platforms and online communities are investing in AI-powered moderation tools to enforce community guidelines, prevent abuse and harassment, and maintain a positive user experience. The need to address content moderation challenges at scale drives the adoption of AI content detectors.
Mitigation of Online Risks and Threats: Online platforms face various risks and threats, including cyberbullying, hate speech, terrorist propaganda, and child exploitation content. AI content detectors help mitigate these risks by identifying and removing harmful content, thereby creating a safer online environment for users.
Cost and Resource Efficiency: Traditional content moderation methods, such as manual review by human moderators, are time-consuming, labor-intensive, and costly. AI content detectors automate the moderation process, reducing the need for human intervention and minimizing operational expenses for organizations.
This dataset was created by Sabriar Bishal
This dataset was created by Sumit Saha
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The model classifies the political bias of a German text into 5 classes: far-left, center-left, center, center-right, far-right. It uses a TF-IDF vectorizer to preprocess documents. Then, a Random Forest classifier is applied to the resulting vectors to determine the final class.
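That pipeline shape can be sketched with scikit-learn. The toy documents and labels below are invented English stand-ins (the real model is trained on German text), and the hyperparameters are defaults, not the model's actual configuration:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

CLASSES = ["far-left", "center-left", "center", "center-right", "far-right"]

# Tiny invented corpus, one document per class; real training
# data would be a large labeled set of German news texts.
docs = [
    "radical collective revolution now",
    "expand social programs gradually",
    "balanced budget moderate reform",
    "lower taxes traditional values",
    "nationalist hardline closed borders",
]

pipeline = make_pipeline(
    TfidfVectorizer(),                       # documents -> TF-IDF vectors
    RandomForestClassifier(random_state=0),  # vectors -> one of 5 classes
)
pipeline.fit(docs, CLASSES)

print(pipeline.predict(["balanced budget moderate reform"])[0])
```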
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To investigate how people assess whether politically consistent news is real or fake, two studies (N = 1,008; N = 1,397) with adult American participants conducted in 2020 and 2022 utilized a within-subjects experimental design to investigate perceptions of news accuracy. When a mock Facebook post with either fake (Study 1) or real (Study 2) news content was attributed to an alternative (vs. a mainstream) news outlet, it was, on average, perceived to be less accurate. Those with beliefs reflecting News Media Literacy demonstrated greater sensitivity to the outlet’s status. This relationship was itself contingent on the strength of the participant’s partisan identity. Strong partisans high in News Media Literacy defended the accuracy of politically consistent content, even while recognizing that an outlet was unfamiliar. These results highlight the fundamental importance of looking at the interaction between user-traits and features of social media news posts when examining learning from political news on social media.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Ahmed Khursheed
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract:
Analyzing the spread of information related to a specific event in the news has many potential applications. Consequently, various systems have been developed to facilitate the analysis of information spreading, such as the detection of disease propagation and the identification of fake news spreading through social media. There are several open challenges in the process of discerning information propagation, among them the lack of resources for training and evaluation. This paper describes the process of compiling a corpus from the EventRegistry global media monitoring system. We focus on information spreading in three domains: sports (the FIFA World Cup), natural disasters (earthquakes), and climate change (global warming). This corpus is a valuable addition to the currently available datasets for examining the spreading of information about various kinds of events.
Introduction:
Domain-specific gaps in information spreading are ubiquitous and may exist due to economic conditions, political factors, or linguistic, geographical, time-zone, cultural, and other barriers. These factors potentially contribute to obstructing the flow of local as well as international news. We believe there is a lack of research studies that examine, identify, and uncover the reasons for barriers in information spreading. Additionally, there is limited availability of datasets containing news text and metadata including time, place, source, and other relevant information. When a piece of information starts spreading, it implicitly raises questions such as: How far does the information, in the form of news, reach out to the public? Does the content of the news remain the same or change to a certain extent? Do cultural values impact the information, especially when the same news gets translated into other languages?
Statistics about datasets:
| # | Domain | Event Type | Articles Per Language | Total Articles |
|---|---|---|---|---|
| 1 | Sports | FIFA World Cup | 983-en, 762-sp, 711-de, 10-sl, 216-pt | 2679 |
| 2 | Natural Disaster | Earthquake | 941-en, 999-sp, 937-de, 19-sl, 251-pt | 3194 |
| 3 | Climate Changes | Global Warming | 996-en, 298-sp, 545-de, 8-sl, 97-pt | 1945 |