By downloading the data, you agree with the terms & conditions mentioned below:
Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.
Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.
We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.
Citation
Please cite our work as
@InProceedings{clef-checkthat:2022:task3, author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas}, title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection", year = {2022}, booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum", series = {CLEF~'2022}, address = {Bologna, Italy},}
@article{shahi2021overview, title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection}, author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas}, journal={Working Notes of CLEF}, year={2021} }
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.
Task 3: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1,264 English-language articles, each with its label. Our definitions for the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
Cross-Lingual Task (German)
Along with the multi-class task for English, we have introduced a task for a low-resource language. The test data will be provided in German. The idea of the task is to use the English data and the concept of transfer learning to build a classification model for German.
Input Data
The data will be provided with the columns ID, title, text, rating, and domain; the columns are described as follows:
ID - Unique identifier of the news article
Title - Title of the news article
text - Text mentioned inside the news article
our rating - Class of the news article: false, partially false, true, or other
Output data format
public_id - Unique identifier of the news article
predicted_rating - Predicted class
Sample File
public_id, predicted_rating
1, false
2, true
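The output format above can be produced with a few lines of standard-library Python. This is a minimal sketch, not the organisers' tooling: the function name `write_submission` is hypothetical, the lowercase label spellings are taken from the "our rating" column description, and the file is written with plain commas because the description does not say whether a space after the comma is required.

```python
import csv
import io

# The four class labels named in the task description (spelling is an assumption).
VALID_RATINGS = {"false", "partially false", "true", "other"}


def write_submission(predictions, out):
    """Write (public_id, predicted_rating) pairs in the sample-file layout."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["public_id", "predicted_rating"])
    for public_id, rating in predictions:
        # Reject labels outside the four classes before they reach the file.
        if rating not in VALID_RATINGS:
            raise ValueError(f"unknown rating: {rating!r}")
        writer.writerow([public_id, rating])


# Usage: write two predictions to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_submission([(1, "false"), (2, "true")], buffer)
submission_text = buffer.getvalue()
```

Passing an open file handle instead of the `io.StringIO` buffer writes the same rows to disk.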
IMPORTANT!
The data covers the years 2010 to 2022, and the fake news content spans several topics, such as elections and COVID-19.
Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498
Related Work
Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf
G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.
Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.
Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A web framework designed for researchers to perform comparative analysis of various machine learning algorithms in the context of fake news detection. The folder also includes several datasets for experimentation, alongside the source code.
The rise of social media has transformed the landscape of news dissemination, presenting new challenges in combating the spread of fake news. This study addresses the automated detection of misinformation within written content, a task that has prompted extensive research efforts across various methodologies. We evaluate existing benchmarks, introduce a novel hybrid word embedding model, and implement a web framework for text classification. Our approach integrates traditional term frequency–inverse document frequency (TF–IDF) methods with sophisticated feature extraction techniques, considering linguistic, psychological, morphological, and grammatical aspects of the text. Through a series of experiments on diverse datasets, applying transfer and incremental learning techniques, we demonstrate the effectiveness of our hybrid model in surpassing benchmarks and outperforming alternative experimental setups. Furthermore, our findings emphasize the importance of dataset alignment and balance in transfer learning, as well as the utility of incremental learning in maintaining high detection performance while reducing runtime. This research offers promising avenues for further advancements in fake news detection methodologies, with implications for future research and development in this critical domain.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset contains a list of twenty-seven freely available evaluation datasets for fake news detection analysed according to eleven main characteristics (i.e., news domain, application purpose, type of disinformation, language, size, news content, rating scale, spontaneity, media platform, availability, and extraction time)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains multimodal content (images and text) from two sources:
Fakeddit Subset: A collection of social media posts (primarily from Reddit) that often include misleading or questionable content.
Snopes Crawled Data (Medical Fake News Only): Fact-checking information focused solely on medical misinformation, as curated and verified by Snopes.
Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes. Due to these restrictions, the collection is not open data. Please fill out the form and upload the Data Sharing Agreement at Google Form.
Citation
Please cite our work as
@article{shahi2021overview, title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection}, author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas}, journal={Working Notes of CLEF}, year={2021} }
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.
Subtask 3: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem. The training data will be released in batches and comprises roughly 900 articles, each with its label. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
Input Data
The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:
Output data format
Sample File
public_id, predicted_rating
1, false
2, true
Sample file
public_id, predicted_domain
1, health
2, crime
Additional data for Training
Participants may train their models on additional data in a similar format; some such datasets are available on the web. We do not provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some of the possible sources:
IMPORTANT!
Evaluation Metrics
This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.
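For reference, macro-averaged F1 computes an F1 score per class and takes the unweighted mean, so rare classes count as much as frequent ones. The sketch below is a plain-Python illustration and not the official scorer; the function name `macro_f1` and the choice to score a class with no gold and no predicted instances as 0.0 are assumptions.

```python
def macro_f1(gold, predicted, labels):
    """Unweighted mean of per-class F1 scores over the given labels."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); assumed 0.0 when the class never occurs.
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Usage on a toy example with the four task classes.
LABELS = ["false", "partially false", "true", "other"]
gold = ["false", "false", "true", "other", "partially false"]
predicted = ["false", "true", "true", "other", "false"]
score = macro_f1(gold, predicted, LABELS)
```

In the toy example the per-class scores are 1/2, 0, 2/3 and 1, so the macro average is 13/24; the missed "partially false" article pulls the score down as much as an error on a frequent class would.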
Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498
Submission Link: Coming soon
Related Work
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study proposes an innovative approach for multimodal fake news detection that utilizes a stick-breaking smoothed Dirichlet distribution. This approach enables the model to capture intricate, subtle interactions between modalities more effectively, thereby improving detection performance and enhancing the system's adaptability to various forms of fake news content
This dataset was created by Shivam Chaurasia
This is a multimodal dataset used in the paper "On the Role of Images for Analyzing Claims in Social Media", accepted at CLEOPATRA-2021 (2nd International Workshop on Cross-lingual Event-centric Open Analytics), co-located with The Web Conference 2021.
The four datasets are curated for two different tasks that broadly come under fake news detection. Originally, the datasets were released as part of challenges or papers for text-based NLP tasks and are further extended here with corresponding images.
1. clef_en and clef_ar are English and Arabic Twitter datasets for claim check-worthiness detection, released in CLEF CheckThat! 2020 by Barrón-Cedeño et al. [1].
2. lesa is an English Twitter dataset for claim detection released by Gupta et al.[2]
3. mediaeval is an English Twitter dataset for conspiracy detection released in MediaEval 2020 Workshop by Pogorelov et al.[3]
The dataset details like data curation and annotation process can be found in the cited papers.
Datasets released here with corresponding images are relatively smaller than the original text-based tweets. The data statistics are as follows:
1. clef_en: 281
2. clef_ar: 2571
3. lesa: 1395
4. mediaeval: 1724
Each folder has two sub-folders and a JSON file, data.json, that contains the crawled tweets. The two sub-folders are:
1. images: This contains crawled images with the same name as the tweet-id in data.json.
2. splits: This contains the 5-fold splits used for training and evaluation in our paper. Each file in this folder is a csv with two columns
Code for the paper: https://github.com/cleopatra-itn/image_text_claim_detection
If you find the dataset and the paper useful, please cite our paper and the corresponding dataset papers[1,2,3]
Cheema, Gullal S., et al. "On the Role of Images for Analyzing Claims in Social Media" 2nd International Workshop on Cross-lingual Event-centric Open Analytics (CLEOPATRA) co-located with The Web Conf 2021.
[1] Barrón-Cedeño, Alberto, et al. "Overview of CheckThat! 2020: Automatic identification and verification of claims in social media." International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, Cham, 2020.
[2] Gupta, Shreya, et al. "LESA: Linguistic Encapsulation and Semantic Amalgamation Based Generalised Claim Detection from Online Content." arXiv preprint arXiv:2101.11891 (2021).
[3] Pogorelov, Konstantin, et al. "FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020." MediaEval 2020 Workshop. 2020.
This dataset was created by Sumit Saha
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We designed a larger and more generic Word Embedding over Linguistic Features for Fake News Detection (WELFake) dataset of 72,134 news articles with 35,028 real and 37,106 fake news. For this, we merged four popular news datasets (i.e. Kaggle, McIntire, Reuters, BuzzFeed Political) to prevent over-fitting of classifiers and to provide more text data for better ML training.
The dataset contains four columns: serial number (starting from 0); title (the news headline); text (the news content); and label (0 = fake, 1 = real).
The CSV file holds 78,098 data entries, of which only 72,134 are accessed when it is loaded into a data frame.
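The label convention above (0 = fake, 1 = real) is easy to invert by accident, so a small tally is a useful first check after loading. The sketch below uses only the standard library and an inline three-row sample; the header names `title`, `text` and `label`, the unnamed serial-number column, and the function name `count_labels` are assumptions, since the description names the columns only informally.

```python
import csv
import io
from collections import Counter

# Label encoding as stated in the dataset description.
LABEL_NAMES = {"0": "fake", "1": "real"}


def count_labels(csv_file):
    """Count fake and real rows, skipping rows whose label is missing or malformed."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        name = LABEL_NAMES.get((row.get("label") or "").strip())
        if name is not None:
            counts[name] += 1
    return counts


# Usage on a tiny in-memory sample (header names are hypothetical).
sample = io.StringIO(
    ",title,text,label\n"
    "0,Headline A,Body of article A,1\n"
    "1,Headline B,Body of article B,0\n"
    "2,Headline C,Body of article C,0\n"
)
label_counts = count_labels(sample)
```

On the full file, the two counts would be expected to match the stated 35,028 real and 37,106 fake articles, which sum to the 72,134 usable entries.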
This dataset is part of our ongoing research on "Fake News Prediction on Social Media Website", carried out within the doctoral degree program of Mr. Pawan Kumar Verma, and is partially supported by the ARTICONF project funded by the European Union’s Horizon 2020 research and innovation program.
MM-COVID is a dataset for fake news detection related to COVID-19. It provides multilingual fake news together with the relevant social context. It contains 3,981 pieces of fake news content and 7,192 pieces of trustworthy information in six languages: English, Spanish, Portuguese, Hindi, French and Italian.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Truth Seeker Dataset is designed to support research in the detection and classification of misinformation on social media platforms, particularly focusing on Twitter. This dataset is part of a broader initiative to enhance the understanding of how machine learning (ML) and natural language processing (NLP) can be leveraged to identify fake news and misleading content in real-time.
Dataset Composition
The Truth Seeker Dataset comprises a substantial collection of social media posts that have been meticulously labeled as either real or fake. It was constructed using advanced ML algorithms and NLP techniques to analyze the language patterns in social media communications. The dataset includes:
Raw Social Media Posts: A diverse range of tweets that reflect various topics and sentiments.
Labeling: Each post is annotated with binary labels indicating its authenticity (real or fake).
Feature Sets: Two distinct subsets of the dataset have been created using different NLP vectorization methods: Word2Vec and TF-IDF. This allows researchers to explore how different feature representations impact model performance.
Research Applications
The primary aim of the Truth Seeker Dataset is to facilitate the development and validation of models that can accurately classify social media content. Key applications include:
Fake News Detection: Utilizing various ML algorithms, including Random Forest and AdaBoost, which have demonstrated high F1 scores in preliminary evaluations.
Model Comparison: Researchers can compare the effectiveness of different ML approaches on the same dataset, enabling a clearer understanding of which methods yield the best results in detecting misinformation.
Algorithm Development: The dataset serves as a benchmark for developing new algorithms aimed at improving accuracy in fake news detection.
https://creativecommons.org/publicdomain/zero/1.0/
The spread of fake news in our society is increasing drastically, leading people to believe in incidents that never happened. It is therefore essential to distinguish real news from fake news and to present the difference to society.
There are three CSV files:
1. train.csv - 25117 rows and 5 columns named id, title, author, text, and label.
2. test.csv - 5881 rows and 4 columns named id, title, author, and text.
3. submit.csv - A sample file showing how the output file should look.
Everyone deserves to know what is actually happening in the world. The goal is to develop a model that can distinguish fake news from real news. Use the train data to build your model and the test data to evaluate it.
https://creativecommons.org/publicdomain/zero/1.0/
This work is accepted at LREC 2020. Paper is available at https://arxiv.org/pdf/2004.08789.pdf
Observing the damage that can be done by the rapid propagation of fake news in sectors like politics and finance, automatic identification of fake news using linguistic analysis has drawn the attention of the research community. However, such methods are largely being developed for English, while low-resource languages remain out of focus. But the risks spawned by fake and manipulative news are not confined by languages. In this work, we propose an annotated dataset of ~50K news articles that can be used for building automated fake news detection systems for a low-resource language like Bangla. Additionally, we provide an analysis of the dataset and develop a benchmark system with state-of-the-art NLP techniques to identify Bangla fake news. To create this system, we explore traditional linguistic features and neural network based methods. We expect this dataset will be a valuable resource for building technologies to prevent the spreading of fake news and will contribute to research on low-resource languages.
File Format
Authentic-48K.csv and Fake-1K.csv
Column Title | Description |
---|---|
articleID | ID of the news |
domain | News publisher's site name |
date | Published Date |
category | Category of the news |
headline | Headline of the news |
content | Article or body of the news |
label | 1 or 0 . '1' for authentic '0' for fake |
LabeledAuthentic-7K.csv, LabeledFake-1K.csv
Column Title | Description |
---|---|
articleID | ID of the news |
domain | News publisher's site name |
date | Published Date |
category | Category of the news |
source | Source of the news. (One who can verify the claim of the news) |
relation | Related or Unrelated. Related if headline matches with content's claim otherwise it is labeled as Unrelated |
headline | Headline of the news |
content | Article or body of the news |
label | 1 or 0 . '1' for authentic '0' for fake |
F-type | Type of fake news (Clickbait, Satire, Fake(Misleading or False Context)) |
F-type is only present in LabeledFake-1K.csv
@InProceedings{Hossain20.1084,
author = {Md Zobaer Hossain and Md Ashraful Rahman and Md Saiful Islam and Sudipta Kar},
title = "{BanFakeNews: A Dataset for Detecting Fake News in Bangla}",
booktitle = {Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020)},
year = {2020},
publisher = {European Language Resources Association (ELRA)},
language = {english}
}
This dataset was created by Raj Jain
This dataset was created by Mohit
https://www.marketreportanalytics.com/privacy-policy
The AI content detection market is experiencing rapid growth, driven by the increasing prevalence of AI-generated content and the rising need for authenticity verification across various sectors. The market, estimated at $2 billion in 2025, is projected to exhibit a robust Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, reaching an estimated $10 billion by 2033. This expansion is fueled by several key factors. The educational sector is a significant driver, with institutions increasingly employing AI detection tools to combat plagiarism and ensure academic integrity. Furthermore, the news and media industries are adopting these technologies to identify and mitigate the spread of misinformation generated by AI. The development of sophisticated algorithms capable of detecting subtle nuances in AI-generated text, images, and audio is another significant contributing factor. Different types of detectors – text, image/video, and audio – cater to diverse needs, driving market segmentation. However, challenges remain, including the ongoing arms race between AI content generators and detectors, the potential for false positives, and concerns surrounding data privacy and ethical implications. The market's geographical distribution reflects the higher adoption in technologically advanced regions like North America and Europe, but rapid growth is anticipated in Asia Pacific, driven by rising internet penetration and increasing awareness of AI-generated content issues. The competitive landscape is dynamic, with both established players and emerging startups vying for market share. Companies like Turnitin and Copyleaks are well-positioned with their existing platforms, while newcomers are innovating with specialized detectors and AI-powered solutions. The market is characterized by both subscription-based models and one-time purchases, providing various options for users. 
Future growth will depend on ongoing technological advancements, the ability to adapt to evolving AI writing techniques, and the expansion into new applications and industries. The increasing integration of AI detection tools into existing platforms and workflows will further accelerate market adoption, making it a significant investment opportunity in the rapidly evolving technological landscape.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Amid growing concerns about the proliferation and belief in false or misleading information, the study addresses the need for automated detection in the public domain. It revisits and replicates scattered findings using a comprehensive, content-oriented, and feature-based approach. This method reliably identifies deceptive news-like content and highlights the importance of individual features in guiding the prediction algorithm. Employing explainable machine learning, the study explores content patterns for disinformation detection. Results from a tree-based approach on real-world data indicate that content-related characteristics can—when used in combination—facilitate the early detection of deceptive news-like articles. The study concludes by discussing the practical implications of computationally detecting the malicious language of disinformation.
This dataset was created by Sarthak malik
https://www.verifiedmarketresearch.com/privacy-policy/
Fake Image Detection Market size was valued at USD 276.65 Million in 2024 and is projected to reach USD 1417.59 Million by 2031, growing at a CAGR of 22.66% from 2024 to 2031.
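The relationship between the two valuations and the growth rate follows the standard compound annual growth rate formula, rate = (end / start)^(1 / periods) - 1. The sketch below is only an arithmetic check: the function names are hypothetical, and the period count of eight is an assumption chosen because it is the value under which the quoted figures agree with one another; the source itself gives only the 2024 to 2031 range.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate implied by two values and a period count."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0


def project(start_value, rate, periods):
    """Value reached by compounding start_value at a fixed rate."""
    return start_value * (1.0 + rate) ** periods


# Figures quoted in the text, in USD million; eight periods is an assumption.
START_VALUE = 276.65
END_VALUE = 1417.59
ASSUMED_PERIODS = 8

implied_rate = cagr(START_VALUE, END_VALUE, ASSUMED_PERIODS)
projected_value = project(START_VALUE, 0.2266, ASSUMED_PERIODS)
```

With these inputs `implied_rate` comes out close to 0.2266 and `projected_value` close to 1417.59, so the three quoted numbers are mutually consistent under the eight-period assumption.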
Global Fake Image Detection Market Overview
The widespread availability of image editing software and social media platforms has led to a surge in fake images, including digitally altered photos and manipulated visual content. This trend has fueled the demand for advanced detection solutions capable of identifying and flagging fake images in real-time. With the proliferation of fake news and misinformation online, there is an increasing awareness among consumers, businesses, and governments about the importance of combating digital fraud and preserving the authenticity of visual content. This heightened concern is driving investments in fake image detection technologies to mitigate the risks associated with misinformation.
However, despite advancements in AI and ML, detecting fake images remains a complex and challenging task, especially when dealing with sophisticated techniques such as deepfakes and generative adversarial networks (GANs). Developing robust detection algorithms capable of identifying increasingly sophisticated forms of image manipulation poses a significant challenge for researchers and developers. The deployment of fake image detection technologies raises concerns about privacy and data ethics, particularly regarding the collection and analysis of visual content shared online. Balancing the need for effective detection with respect for user privacy and ethical considerations remains a key challenge for stakeholders in the Fake Image Detection Market.