https://choosealicense.com/licenses/openrail/
This is a cleaned and split version of this dataset (https://www.kaggle.com/datasets/sadikaljarif/fake-news-detection-dataset-english). Labels:
- Fake News: 0
- Real News: 1
You can find the cleaning script at: https://github.com/ErfanMoosaviMonazzah/Fake-News-Detection
There is one directory named Fake_True. It contains two CSV files:
- Fake.csv: contains unreliable news
- True.csv: contains reliable news
Each CSV file has 4 columns:
1. title
2. text
3. subject
4. date
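As a minimal sketch of how the two files could be combined into one labeled collection, the snippet below reads both CSVs with the standard library and attaches a numeric label to each row. The label convention (0 = fake, 1 = real), the `label` field, the helper names, and the in-memory sample rows are assumptions made for illustration; they are not part of the dataset itself.

```python
import csv
import io

# Tiny in-memory stand-ins for Fake.csv and True.csv (hypothetical rows).
FAKE_CSV = '''title,text,subject,date
Shock claim,Some text,News,"December 31, 2017"
'''
TRUE_CSV = '''title,text,subject,date
Budget vote,Other text,politicsNews,"December 31, 2017"
'''


def load_labeled(handle, label):
    """Read one CSV file object and attach a numeric label to every row."""
    return [dict(row, label=label) for row in csv.DictReader(handle)]


def combine(fake_handle, true_handle):
    """Merge both files; assumed convention: 0 = fake, 1 = real."""
    return load_labeled(fake_handle, 0) + load_labeled(true_handle, 1)


rows = combine(io.StringIO(FAKE_CSV), io.StringIO(TRUE_CSV))
for row in rows:
    print(row["label"], row["title"])
```

With the real files, the `io.StringIO` objects would be replaced by handles from `open("Fake_True/Fake.csv", newline="", encoding="utf-8")` and the corresponding call for True.csv, assuming the files are UTF-8 encoded.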
Collected from MEJBAH AHAMMAD
Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes. Due to these restrictions, the collection is not open data. Please fill out the form and upload the Data Sharing Agreement via the Google Form.
Citation
Please cite our work as
@article{shahi2021overview,
  title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
  author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
  journal={Working Notes of CLEF},
  year={2021}
}
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.
Subtask 3: Multi-class fake news detection of news articles (English). Sub-task A is fake news detection framed as a four-class classification problem. The training data will be released in batches and comprises roughly 900 articles with their respective labels. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes articles in dispute and unproven articles.
Input Data
The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:
Output data format
Sample File
public_id, predicted_rating
1, false
2, true
Sample file
public_id, predicted_domain
1, health
2, crime
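A minimal sketch of producing a file in the sample layout above, assuming a plain comma-separated file with a header row is what is expected. The helper name `write_predictions` and the example predictions are hypothetical, and the writer emits no space after the comma, unlike the samples.

```python
import csv
import io


def write_predictions(handle, predictions, column):
    """Write (public_id, value) pairs under the given prediction column."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["public_id", column])
    for public_id, value in predictions:
        writer.writerow([public_id, value])


# Build the rating file in memory; a real run would pass an open file handle.
buffer = io.StringIO()
write_predictions(buffer, [(1, "false"), (2, "true")], "predicted_rating")
submission = buffer.getvalue()
print(submission)
```

The same helper could be called with `"predicted_domain"` as the column name to produce the second sample file.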
Additional data for Training
To train their models, participants can use additional data with a similar format; some datasets are available over the web. We don't provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some of the possible sources:
IMPORTANT!
Evaluation Metrics
This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.
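As a rough illustration of the F1-macro measure mentioned above, the sketch below computes F1 per class and averages the scores with equal weight per class. It is not the organizers' scoring script: the function name `macro_f1` and the sample labels are made up for illustration, and taking the class set as the union of gold and predicted labels is an assumption.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = ["true", "false", "false", "other"]
pred = ["true", "false", "true", "other"]
score = macro_f1(gold, pred)
print(round(score, 4))
```

Because every class contributes equally to the average, a rare class such as "other" affects the score as much as a frequent one.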
Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498
Submission Link: Coming soon
Related Work
This dataset was created by Muna
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We designed a larger and more generic Word Embedding over Linguistic Features for Fake News Detection (WELFake) dataset of 72,134 news articles with 35,028 real and 37,106 fake news. For this, we merged four popular news datasets (i.e. Kaggle, McIntire, Reuters, BuzzFeed Political) to prevent over-fitting of classifiers and to provide more text data for better ML training.
The dataset contains four columns: Serial number (starting from 0); Title (the news heading); Text (the news content); and Label (0 = fake and 1 = real).
The CSV file contains 78,098 data entries, of which only 72,134 are accessed as per the data frame.
This dataset is part of our ongoing research on "Fake News Prediction on Social Media Website", conducted as part of the doctoral degree program of Mr. Pawan Kumar Verma, and is partially supported by the ARTICONF project funded by the European Union’s Horizon 2020 research and innovation program.
This dataset was created by Mohit
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Adapted from: [https://www.kaggle.com/datasets/csmalarkodi/covid-fake-news-dataset].
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
It is trained on data of around 45,000 news articles with a mix of real and fake news articles. The dataset is provided by the University of Victoria.
[Figure: distribution of topics]
The dataset contains two types of articles: fake and real news. It was collected from real-world sources; the truthful articles were obtained by crawling articles from Reuters.com (a news website). The fake news articles were collected from different sources, namely unreliable websites that were flagged by Politifact (a fact-checking organization in the USA) and Wikipedia. The dataset covers different types of articles on different topics; however, the majority of articles focus on political and world news topics.
The dataset consists of two CSV files. The first file, named “True.csv”, contains more than 12,600 articles from Reuters.com. The second file, named “Fake.csv”, contains more than 12,600 articles from different fake news outlets. Each article contains the following information: article title, text, type, and the date the article was published on. To match the fake news data collected for kaggle.com, we focused mostly on collecting articles from 2016 to 2017. The data collected were cleaned and processed; however, the punctuation and mistakes that existed in the fake news were kept in the text.
The following table gives a breakdown of the categories and number of articles per category.
| News | Size (number of articles) | Subject type | Articles per subject |
|---|---|---|---|
| Real-News | 21417 | World-News | 10145 |
| | | Politics-News | 11272 |
| Fake-News | 23481 | Government-News | 1570 |
| | | Middle-east | 778 |
| | | US News | 783 |
| | | Left-news | 4459 |
| | | Politics | 6841 |
| | | News | 9050 |
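The per-subject counts in the table can be checked against the stated totals with a few lines of arithmetic. This is only a consistency check on the numbers quoted above; the dictionary layout and variable names are chosen for illustration.

```python
# Totals and per-subject counts copied from the table.
totals = {"Real-News": 21417, "Fake-News": 23481}
subjects = {
    "Real-News": {"World-News": 10145, "Politics-News": 11272},
    "Fake-News": {
        "Government-News": 1570,
        "Middle-east": 778,
        "US News": 783,
        "Left-news": 4459,
        "Politics": 6841,
        "News": 9050,
    },
}

# True for a category when its subject counts add up to its stated size.
consistent = {
    name: sum(subjects[name].values()) == totals[name] for name in totals
}
grand_total = sum(totals.values())
print(consistent, grand_total)
```

Both categories add up, and the grand total of 44,898 articles matches the "around 45,000" figure quoted earlier.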
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Kaggle Person Detection is a dataset for object detection tasks - it contains Person annotations for 1,111 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset was created by KUMAR RANJAN KAMILA
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Face Mask Detection Kaggle is a dataset for object detection tasks - it contains Masks annotations for 848 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Kaggle Fish Detection is a dataset for instance segmentation tasks - it contains Fish annotations for 2,208 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Obj Detect Kaggle is a dataset for object detection tasks - it contains Face Masks annotations for 848 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The BDG2 open data set consists of 3,053 energy meters from 1,636 non-residential buildings with a range of two full years (2016 and 2017) at an hourly frequency (17,544 measurements per meter resulting in approximately 53.6 million measurements). These meters are collected from 19 sites across North America and Europe, and they measure electrical, heating and cooling water, steam, and solar energy as well as water and irrigation meters. Part of these data was used in the Great Energy Predictor III (GEPIII) competition hosted by the ASHRAE organization in October-December 2019. This subset includes data from 2,380 meters from 1,448 buildings that were used in the GEPIII, a machine learning competition for long-term prediction with an application to measurement and verification. This paper describes the process of data collection, cleaning, and convergence of time-series meter data, the meta-data about the buildings, and complementary weather data. This data set can be used for further prediction benchmarking and prototyping as well as anomaly detection, energy analysis, and building type classification.
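The measurement counts quoted above follow from simple arithmetic, sketched below. The code only reproduces the figures stated in the description (two full years, hourly frequency, 3,053 meters); the variable names are illustrative.

```python
from datetime import datetime

# Two full years at hourly frequency: 2016 (a leap year) and 2017.
start = datetime(2016, 1, 1)
end = datetime(2018, 1, 1)
hours_per_meter = int((end - start).total_seconds() // 3600)

meters = 3053
total_measurements = hours_per_meter * meters

print(hours_per_meter)     # 17544
print(total_measurements)  # 53561832, i.e. approximately 53.6 million
```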
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Humans From Https Www.kaggle.com Datasets Constantinwerner Human Detection Dataset is a dataset for object detection tasks - it contains Human annotations for 548 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset was created by Raj Jain
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Traffic Kaggle Detection is a dataset for object detection tasks - it contains Traffic Signs annotations for 875 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
From the site: Masks play a crucial role in protecting the health of individuals against respiratory diseases, as they are one of the few precautions available for COVID-19 in the absence of immunization. With this dataset, it is possible to create a model to detect people wearing masks, not wearing them, or wearing masks improperly. This dataset contains 853 images belonging to the 3 classes, as well as their bounding boxes in the PASCAL VOC format. The classes are:
- With mask
- Without mask
- Mask worn incorrectly
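Since the bounding boxes are provided in the PASCAL VOC format, a short sketch of reading one annotation may help. It relies on the usual VOC XML layout (`object`, `name`, `bndbox` with `xmin`/`ymin`/`xmax`/`ymax`); the sample annotation, the class strings `with_mask` and `without_mask`, and the helper name `parse_voc` are assumptions for illustration and may not match the files exactly.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the usual PASCAL VOC layout.
SAMPLE_XML = """<annotation>
  <filename>example.png</filename>
  <object>
    <name>with_mask</name>
    <bndbox><xmin>79</xmin><ymin>105</ymin><xmax>109</xmax><ymax>142</ymax></bndbox>
  </object>
  <object>
    <name>without_mask</name>
    <bndbox><xmin>185</xmin><ymin>100</ymin><xmax>226</xmax><ymax>144</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = tuple(
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"), coords))
    return boxes


for name, coords in parse_voc(SAMPLE_XML):
    print(name, coords)
```

For annotation files on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`.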
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Kaggle FPT Mask Detection is a dataset for object detection tasks - it contains Mask NoMask IncorrectMask annotations for 1,824 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Gun Kaggle is a dataset for object detection tasks - it contains Gun Danger annotations for 2,988 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).