2 datasets found

FakeNewsNet
kaggle.com
dataverse.harvard.edu
zip
Updated Nov 2, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Deepak Mahudeswaran (2018). FakeNewsNet [Dataset]. https://www.kaggle.com/mdepak/fakenewsnet
Explore at:
zip(17409594 bytes)Available download formats
Dataset updated
Nov 2, 2018
Authors
Deepak Mahudeswaran
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
FakeNewsNet

This is a repository for an ongoing data collection project for fake news research at ASU. We describe and compare FakeNewsNet with other existing datasets in Fake News Detection on Social Media: A Data Mining Perspective. We also perform a detail analysis of FakeNewsNet dataset, and build a fake news detection model on this dataset in Exploiting Tri-Relationship for Fake News Detection

JSON version of this dataset is available in github here. The new version of this dataset described in FakeNewNet will be published soon or you can email authors for more info.

News Content

It includes all the fake news articles, with the news content attributes as follows:

source: It indicates the author or publisher of the news article

headline: It refers to the short text that aims to catch the attention of readers and relates well to the major of the news topic.

_body_text_: It elaborates the details of news story. Usually there is a major claim which shaped the angle of the publisher and is specifically highlighted and elaborated upon.

_image_video_: It is an important part of body content of news article, which provides visual cues to frame the story.

Social Context

It includes the social engagements of fake news articles from Twitter. We extract profiles, posts and social network information for all relevant users.

_user_profile_: It includes a set of profile fields that describe the users' basic information

_user_content_: It collects the users' recent posts on Twitter

_user_followers_: It includes the follower list of the relevant users

_user_followees_: It includes list of users that are followed by relevant users

References

If you use this dataset, please cite the following papers:

@article{shu2017fake, title={Fake News Detection on Social Media: A Data Mining Perspective}, author={Shu, Kai and Sliva, Amy and Wang, Suhang and Tang, Jiliang and Liu, Huan}, journal={ACM SIGKDD Explorations Newsletter}, volume={19}, number={1}, pages={22--36}, year={2017}, publisher={ACM} }

@article{shu2017exploiting, title={Exploiting Tri-Relationship for Fake News Detection}, author={Shu, Kai and Wang, Suhang and Liu, Huan}, journal={arXiv preprint arXiv:1712.07709}, year={2017} }

@article{shu2018fakenewsnet, title={FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media}, author={Shu, Kai and Mahudeswaran, Deepak and Wang, Suhang and Lee, Dongwon and Liu, Huan}, journal={arXiv preprint arXiv:1809.01286}, year={2018} }
CT-FAN-21 corpus: A dataset for Fake News Detection
zenodo.org
Updated Oct 23, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl (2022). CT-FAN-21 corpus: A dataset for Fake News Detection [Dataset]. http://doi.org/10.5281/zenodo.4714517
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.4714517
Dataset updated
Oct 23, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl
Description
Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use it only for research purposes. Due to these restrictions, the collection is not open data. Please download the Agreement at Data Sharing Agreement and send the signed form to fakenewstask@gmail.com .

Citation

Please cite our work as

@article{shahi2021overview, title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection}, author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas}, journal={Working Notes of CLEF}, year={2021} }

Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English.

Subtask 3A: Multi-class fake news detection of news articles (English) Sub-task A would detect fake news designed as a four-class classification problem. The training data will be released in batches and roughly about 900 articles with the respective label. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:

False - The main claim made in an article is untrue.

Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

True - This rating indicates that the primary elements of the main claim are demonstrably true.

Other- An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes articles in dispute and unproven articles.

Subtask 3B: Topical Domain Classification of News Articles (English) Fact-checkers require background expertise to identify the truthfulness of an article. The categorisation will help to automate the sampling process from a stream of data. Given the text of a news article, determine the topical domain of the article (English). This is a classification problem. The task is to categorise fake news articles into six topical categories like health, election, crime, climate, election, education. This task will be offered for a subset of the data of Subtask 3A.

Input Data

The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:

Task 3a

ID- Unique identifier of the news article

Title- Title of the news article

text- Text mentioned inside the news article

our rating - class of the news article as false, partially false, true, other

Task 3b

public_id- Unique identifier of the news article

Title- Title of the news article

text- Text mentioned inside the news article

domain - domain of the given news article(applicable only for task B)

Output data format

Task 3a

public_id- Unique identifier of the news article

predicted_rating- predicted class

Sample File

public_id, predicted_rating 1, false 2, true

Task 3b

public_id- Unique identifier of the news article

predicted_domain- predicted domain

Sample file

public_id, predicted_domain 1, health 2, crime

Additional data for Training

To train your model, the participant can use additional data with a similar format; some datasets are available over the web. We don't provide the background truth for those datasets. For testing, we will not use any articles from other datasets. Some of the possible source:

Fakenews Classification Datasets

Fake News Detection Challenge KDD 2020

FakeNewsNet

IMPORTANT!

Fake news article used for task 3b is a subset of task 3a.

We have used the data from 2010 to 2021, and the content of fake news is mixed up with several topics like election, COVID-19 etc.

Evaluation Metrics

This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.

Submission Link: https://competitions.codalab.org/competitions/31238

Related Work

Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1.https://arxiv.org/pdf/2010.00502.pdf

G. K. Shahi and D. Nandini, “FakeCovid – a multilingualcross-domain fact check news dataset for covid-19,” inWorkshop Proceedings of the 14th International AAAIConference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14

Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Deepak Mahudeswaran (2018). FakeNewsNet [Dataset]. https://www.kaggle.com/mdepak/fakenewsnet

FakeNewsNet

Fake News, MisInformation, Data Mining

Explore at:

zip(17409594 bytes)Available download formats

Dataset updated

Nov 2, 2018

Authors

Deepak Mahudeswaran

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

FakeNewsNet

This is a repository for an ongoing data collection project for fake news research at ASU. We describe and compare FakeNewsNet with other existing datasets in Fake News Detection on Social Media: A Data Mining Perspective. We also perform a detail analysis of FakeNewsNet dataset, and build a fake news detection model on this dataset in Exploiting Tri-Relationship for Fake News Detection

JSON version of this dataset is available in github here. The new version of this dataset described in FakeNewNet will be published soon or you can email authors for more info.

News Content

It includes all the fake news articles, with the news content attributes as follows:

source: It indicates the author or publisher of the news article
headline: It refers to the short text that aims to catch the attention of readers and relates well to the major of the news topic.
_body_text_: It elaborates the details of news story. Usually there is a major claim which shaped the angle of the publisher and is specifically highlighted and elaborated upon.
_image_video_: It is an important part of body content of news article, which provides visual cues to frame the story.

Social Context

It includes the social engagements of fake news articles from Twitter. We extract profiles, posts and social network information for all relevant users.

_user_profile_: It includes a set of profile fields that describe the users' basic information
_user_content_: It collects the users' recent posts on Twitter
_user_followers_: It includes the follower list of the relevant users
_user_followees_: It includes list of users that are followed by relevant users

References

If you use this dataset, please cite the following papers:

@article{shu2017fake, title={Fake News Detection on Social Media: A Data Mining Perspective}, author={Shu, Kai and Sliva, Amy and Wang, Suhang and Tang, Jiliang and Liu, Huan}, journal={ACM SIGKDD Explorations Newsletter}, volume={19}, number={1}, pages={22--36}, year={2017}, publisher={ACM} }

@article{shu2017exploiting, title={Exploiting Tri-Relationship for Fake News Detection}, author={Shu, Kai and Wang, Suhang and Liu, Huan}, journal={arXiv preprint arXiv:1712.07709}, year={2017} }

@article{shu2018fakenewsnet, title={FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media}, author={Shu, Kai and Mahudeswaran, Deepak and Wang, Suhang and Lee, Dongwon and Liu, Huan}, journal={arXiv preprint arXiv:1809.01286}, year={2018} }

Clear search

Close search

Google apps

Main menu

FakeNewsNet

FakeNewsNet

News Content

Social Context

References

CT-FAN-21 corpus: A dataset for Fake News Detection

FakeNewsNet

Fake News, MisInformation, Data Mining

FakeNewsNet

News Content

Social Context

References