5 datasets found
  1. Dataset for the paper: "Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims"

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 22, 2022
    Cite
    Jakub Simko (2022). Dataset for the paper: "Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5996863
    Dataset updated
    Apr 22, 2022
    Dataset provided by
    Jakub Simko
    Matus Tomlein
    Ivan Srba
    Robert Moro
    Branislav Pecher
    Elena Stefancova
    Maria Bielikova
    Description

    Overview

    This dataset of medical misinformation was collected and is published by the Kempelen Institute of Intelligent Technologies (KInIT). It consists of approx. 317k news articles and blog posts on medical topics published between January 1, 1998 and February 1, 2022 from a total of 207 reliable and unreliable sources. The dataset contains the full texts of the articles, their original source URLs, and other extracted metadata. If a source has a credibility score available (e.g., from Media Bias/Fact Check), it is also included in the form of an annotation. Besides the articles, the dataset contains around 3.5k fact-checks and extracted verified medical claims with their unified veracity ratings published by fact-checking organisations such as Snopes or FullFact. Lastly and most importantly, the dataset contains 573 manually and more than 51k automatically labelled mappings between previously verified claims and the articles; each mapping consists of two values: claim presence (i.e., whether a claim is contained in the given article) and article stance (i.e., whether the given article supports or rejects the claim or provides both sides of the argument).

    The dataset is primarily intended to be used as a training and evaluation set for machine learning methods for claim presence detection and article stance classification, but it also enables a range of other misinformation-related tasks, such as misinformation characterisation or analyses of misinformation spreading.

    Its novelty and our main contributions lie in (1) the focus on medical news articles and blog posts as opposed to social media posts or political discussions; (2) providing multiple modalities (besides the full texts of the articles, there are also images and videos), thus enabling research on multimodal approaches; (3) the mapping of the articles to the fact-checked claims (with manual as well as predicted labels); and (4) source credibility labels for 95% of all articles and other potential sources of weak labels that can be mined from the articles' content and metadata.

    The dataset is associated with the research paper "Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims" accepted and presented at the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22).

    The accompanying GitHub repository provides a small static sample of the dataset and a descriptive analysis of the dataset in the form of Jupyter notebooks.

    Options to access the dataset

    There are two ways to access the dataset:

    1. Static dump of the dataset, available in CSV format
    2. Continuously updated dataset, available via REST API

    To obtain access to the dataset (either the full static dump or the REST API), please request access by following the instructions provided below.
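
    For orientation only, a minimal sketch of how the continuously updated REST API might be queried once access is granted; the base URL, endpoint path, and authentication scheme below are placeholders, not the documented Monant API:

        import requests

        # Placeholders only: the real base URL, endpoint names and auth scheme
        # are provided with the access instructions, not invented here.
        BASE_URL = "https://example.org/monant-api"
        TOKEN = "YOUR_ACCESS_TOKEN"

        response = requests.get(
            f"{BASE_URL}/articles",
            headers={"Authorization": f"Bearer {TOKEN}"},
            params={"page": 1},
            timeout=30,
        )
        response.raise_for_status()
        print(response.json())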

    References

    If you use this dataset in any publication, project, tool or in any other form, please cite the following papers:

    @inproceedings{SrbaMonantPlatform,
      author    = {Srba, Ivan and Moro, Robert and Simko, Jakub and Sevcech, Jakub and Chuda, Daniela and Navrat, Pavol and Bielikova, Maria},
      booktitle = {Proceedings of Workshop on Reducing Online Misinformation Exposure (ROME 2019)},
      pages     = {1--7},
      title     = {Monant: Universal and Extensible Platform for Monitoring, Detection and Mitigation of Antisocial Behavior},
      year      = {2019}
    }

    @inproceedings{SrbaMonantMedicalDataset,
      author    = {Srba, Ivan and Pecher, Branislav and Tomlein, Matus and Moro, Robert and Stefancova, Elena and Simko, Jakub and Bielikova, Maria},
      booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22)},
      numpages  = {11},
      title     = {Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims},
      year      = {2022},
      doi       = {10.1145/3477495.3531726},
      publisher = {Association for Computing Machinery},
      address   = {New York, NY, USA},
      url       = {https://doi.org/10.1145/3477495.3531726}
    }

    Dataset creation process

    To create this dataset (and to continuously obtain new data), we used our research platform Monant. The Monant platform provides so-called data providers to extract news articles/blogs from news/blog sites as well as fact-checking articles from fact-checking sites. General parsers (for RSS feeds, WordPress sites, the Google Fact Check Tool, etc.) as well as custom crawlers and parsers were implemented (e.g., for the fact-checking site Snopes.com). All data are stored in a unified format in a central data storage.
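
    As an illustration of the kind of general RSS-based extraction mentioned above (not the Monant platform's actual implementation), a minimal sketch using the feedparser library; the feed URL is a placeholder:

        import feedparser

        # Placeholder feed URL; any news/blog RSS feed would do.
        feed = feedparser.parse("https://example.com/feed.xml")

        for entry in feed.entries:
            article = {
                "title": entry.get("title"),
                "url": entry.get("link"),
                "published": entry.get("published"),
                "summary": entry.get("summary"),
            }
            print(article)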

    Ethical considerations

    The dataset was collected and is published for research purposes only. We collected only publicly available content of news/blog articles. The dataset contains the identities of the articles' authors if they were stated in the original source; we retained this information, since the presence of an author's name can be a strong credibility indicator. However, we anonymised the identities of the authors of discussion posts included in the dataset.

    The main identified ethical issue related to the presented dataset lies in the risk of mislabelling an article as supporting a false fact-checked claim and, to a lesser extent, in mislabelling an article as not containing a false claim or not supporting it when it actually does. To minimise these risks, we developed a labelling methodology and require the agreement of at least two independent annotators to assign a claim presence or article stance label to an article. It is also worth noting that we do not label an article as a whole as false or true. Nevertheless, we provide partial article-claim pair veracities based on the combination of claim presence and article stance labels.

    As for the veracity labels of the fact-checked claims and the credibility (reliability) labels of the articles' sources, we take these from the fact-checking sites and external listings such as Media Bias/Fact Check as they are and refer to their methodologies for more details on how they were established.

    Lastly, the dataset also contains automatically predicted labels of claim presence and article stance produced by our baselines described in the next section. These methods have their limitations and achieve only a certain accuracy, as reported in the paper. This should be taken into account when interpreting the predicted labels.

    Reporting mistakes in the dataset

    The way to report considerable mistakes in the raw collected data or in the manual annotations is to create a new issue in the accompanying GitHub repository. Alternatively, general enquiries or requests can be sent to info [at] kinit.sk.

    Dataset structure

    Raw data

    First, the dataset contains so-called raw data (i.e., data extracted by the web monitoring module of the Monant platform and stored in exactly the same form as it appears on the original websites). Raw data consist of articles from news sites and blogs (e.g., naturalnews.com), discussions attached to such articles, and fact-checking articles from fact-checking portals (e.g., snopes.com). In addition, the dataset contains feedback (numbers of likes, shares, and comments) provided by users on the social network Facebook, which is regularly extracted for all news/blog articles.

    Raw data are contained in these CSV files (and corresponding REST API endpoints):

    sources.csv

    articles.csv

    article_media.csv

    article_authors.csv

    discussion_posts.csv

    discussion_post_authors.csv

    fact_checking_articles.csv

    fact_checking_article_media.csv

    claims.csv

    feedback_facebook.csv

    Note: Personal information about discussion posts' authors (name, website, gravatar) is anonymised.
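
    As an illustration, a minimal sketch of loading the raw CSV files with pandas and joining articles to their sources; the join keys (source_id, id) are assumptions and should be checked against the actual dump:

        import pandas as pd

        sources = pd.read_csv("sources.csv")
        articles = pd.read_csv("articles.csv")

        # Assumed foreign key: articles.source_id -> sources.id
        articles_with_sources = articles.merge(
            sources,
            left_on="source_id",
            right_on="id",
            suffixes=("_article", "_source"),
        )
        print(articles_with_sources.head())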

    Annotations

    Secondly, the dataset contains so-called annotations. Entity annotations describe individual raw data entities (e.g., an article or a source). Relation annotations describe a relation between two such entities.

    Each annotation is described by the following attributes:

    category of annotation (annotation_category). Possible values: label (the annotation corresponds to ground truth determined by human experts) and prediction (the annotation was created by means of an AI method).

    type of annotation (annotation_type_id). Example values: Source reliability (binary), Claim presence. The list of possible values can be obtained from the enumeration in annotation_types.csv.

    method which created the annotation (method_id). Example values: Expert-based source reliability evaluation, Fact-checking article to claim transformation method. The list of possible values can be obtained from the enumeration in methods.csv.

    its value (value). The value is stored in JSON format and its structure differs according to the particular annotation type.

    At the same time, annotations are associated with a particular object identified by:

    entity type (the parameter entity_type in the case of entity annotations, or source_entity_type and target_entity_type in the case of relation annotations). Possible values: sources, articles, fact-checking-articles.

    entity id (the parameter entity_id in the case of entity annotations, or source_entity_id and target_entity_id in the case of relation annotations).

    The dataset specifically provides these entity annotations:

    Source reliability (binary). Determines the reliability of a source (website) on a binary scale with two options: reliable source and unreliable source.

    Article veracity. Aggregated information about article veracity derived from article-claim pairs.

    The dataset specifically provides these relation annotations:

    Fact-checking article to claim mapping. Determines the mapping between a fact-checking article and a claim.

    Claim presence. Determines the presence of a claim in an article.

    Claim stance. Determines the stance of an article towards a claim.

    Annotations are contained in these CSV files (and corresponding REST API endpoints):

    entity_annotations.csv

    relation_annotations.csv

    Note: The identification of human annotators (the email provided in the annotation app) is anonymised.
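
    Putting the annotation structure described above together, a minimal sketch of filtering human-labelled claim-presence annotations and parsing their JSON values; the exact string used for the claim-presence type id is an assumption:

        import json

        import pandas as pd

        ann = pd.read_csv("relation_annotations.csv")

        # Keep only ground-truth labels (as opposed to predictions) of the
        # claim-presence annotation type (the exact type id is an assumption).
        claim_presence = ann[
            (ann["annotation_category"] == "label")
            & (ann["annotation_type_id"] == "claim-presence")
        ].copy()

        # The value column is a JSON document whose structure depends on the
        # annotation type.
        claim_presence["value_parsed"] = claim_presence["value"].apply(json.loads)

        # Each relation annotation links a source entity (e.g., an article)
        # to a target entity (e.g., a claim).
        print(claim_presence[[
            "source_entity_type", "source_entity_id",
            "target_entity_type", "target_entity_id",
            "value_parsed",
        ]].head())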

  2. Data Package: Archival Gossip - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Oct 21, 2023
    Cite
    (2023). Data Package: Archival Gossip - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/6c215a9d-2ed2-52fb-ad27-b94602c493b6
    Dataset updated
    Oct 21, 2023
    Description

    This zip file contains all relevant research data from ArchivalGossip.com. This digital project is made up of a WordPress site (archivalgossip.com) and an Omeka collection (archivalgossip.com/collection). Their shared theme is the investigation of the role of gossip in nineteenth-century life-writing and print culture. This data collection includes PDFs of all relevant posts and .csv files for all items in the two collections: 1) Cushmania (relating to the career of actress Charlotte Cushman), and 2) Gossip Columns and Columnists (concerning the rise of gossip in nineteenth-century journalism). Both collections consist largely of digitized versions of archival material, such as letters and newspaper articles, their descriptive metadata, and transcripts of their contents.

  3. 2013 Baluchistan, Pakistan Post-Earthquake Stereogrammetric DEMs - Dataset - CKAN

    • nationaldataplatform.org
    Updated Feb 28, 2024
    Cite
    (2024). 2013 Baluchistan, Pakistan Post-Earthquake Stereogrammetric DEMs - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/2013-baluchistan-pakistan-post-earthquake-stereogrammetric-dems
    Dataset updated
    Feb 28, 2024
    Area covered
    Balochistan
    Description

    This dataset includes a suite of post-seismic 2 m resolution DEMs post-dating the 2013 Mw 7.7 Baluchistan earthquake. The DEMs were constructed using the open-source software package SETSM (https://mjremotesensing.wordpress.com/setsm/) from DigitalGlobe base imagery (©DigitalGlobe 2018). DEMs were mosaicked and vertically registered using the Ames Stereo Pipeline (https://ti.arc.nasa.gov/tech/asr/groups/intelligent-robotics/ngt/stereo/). The base imagery included 0.5 m and 0.3 m resolution panchromatic imagery from QuickBird, GeoEye, WorldView-1, WorldView-2, and WorldView-3 (©DigitalGlobe 2018). The dataset includes DEMs generated from in-track stereo imagery, as well as DEMs constructed from mixed pairs of non-in-track stereo images. The post-event DEMs are not vertically registered to a pre-existing DEM in order to avoid removal of relative co-seismic offsets between the pre- and post-event pairs. The generation of this dataset was funded by NASA in cooperation with the U.S. Geological Survey.

    A complete description of the generation of this dataset and the images that were used to construct the DEMs can be found in the associated manuscript: Barnhart WD, Gold RD, Shea HN, Peterson KE, Briggs RW, Harbor DJ (2019) Vertical coseismic offsets derived from high-resolution stereogrammetric DSM differencing: The 2013 Baluchistan, Pakistan earthquake, JGR-Solid Earth. DOI:10.1029/2018JB017107. The naming convention of individual DEMs is detailed in the metadata.

    Note: The source data for this project are the individual 2 meter DEMs that were constructed with the SETSM open-source software (described above). However, in order to utilize the OpenTopography webmap interface, these DEMs were mosaicked into a single seamless mosaic of post-earthquake topography. Details on how this single mosaic was created are in the metadata. Users are cautioned that files created using the webmap interface will use the averaged, mosaic data. For certain applications, users may wish to utilize the source datasets by downloading the original DEMs via the "Source" directory under the "Bulk Download" section of the OpenTopography website.
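
    For orientation, a minimal sketch of reading one of the 2 m DEM tiles with rasterio and summarising its elevations; the file name is a placeholder and the nodata handling is an assumption:

        import numpy as np
        import rasterio

        # Placeholder file name; real tiles follow the naming convention
        # described in the dataset metadata.
        with rasterio.open("post_event_dem_tile.tif") as src:
            dem = src.read(1).astype("float64")
            nodata = src.nodata

        # Mask nodata cells before computing summary statistics.
        if nodata is not None:
            dem[dem == nodata] = np.nan

        print("elevation (m): min=%.1f median=%.1f max=%.1f"
              % (np.nanmin(dem), np.nanmedian(dem), np.nanmax(dem)))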

  4. IBAMar DATABASE: 4 decades of oceanographic sampling on the Western Mediterranean Sea - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Oct 29, 2023
    Cite
    (2023). IBAMar DATABASE: 4 decades of oceanographic sampling on the Western Mediterranean Sea - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/fbbec80b-ef1a-5a9f-869d-3b311c581679
    Dataset updated
    Oct 29, 2023
    Area covered
    Mediterranean Sea
    Description

    IBAMar (http://www.ba.ieo.es/ibamar) is a regional database that puts together all physical and biochemical data obtained by multiparametric probes (CTDs equipped with different sensors) during the cruises managed by the Balearic Center of the Spanish Institute of Oceanography (COB-IEO). It has recently been extended to include data obtained with classical hydro casts using oceanographic Niskin or Nansen bottles. The result is a database that includes a main core of hydrographic data: temperature (T), salinity (S), dissolved oxygen (DO), fluorescence and turbidity; complemented by biochemical data: dissolved inorganic nutrients (phosphate, nitrate, nitrite and silicate) and chlorophyll-a.

    In the IBAMar Database, different technologies and methodologies were used by different teams over the four decades of data sampling in the COB-IEO. Despite this, the data have been reprocessed using the same protocols, and a standard QC has been applied to each variable. It therefore provides a regional database of homogeneous, good quality data.

    Data acquisition and quality control (QC): 94% of the data come from SBE911 and SBE25 CTDs. S and DO were calibrated on board using water samples whenever a rosette was available (70% of the cases). All data from Sea-Bird CTDs were reviewed and post-processed with the software provided by Sea-Bird Electronics. Data were averaged to get 1 dbar vertical resolution. The general sampling methodology and pre-processing are described at https://ibamardatabase.wordpress.com/home/. Manual QC includes visual checks of metadata, duplicate data and outliers. Automatic QC includes range checks of variables by area (north of the Balearic Islands, south of the BI, and the Alboran Sea) and depth (27 standard levels), checks for spikes and checks for density inversions. Nutrient QC includes a preliminary control and a range check on the observed level of the data to detect outliers around objectively analyzed data fields. A quality flag is assigned as an integer number, depending on the result of the QC check.

    IBAMar metadata and mean values at 27 standard levels will be freely accessible, but the origin of the data must always be explicitly acknowledged by citing this dataset and the associated work by A. Aparicio-Gonzalez et al. in Earth System Science Data. Full resolution data are provided under a collaboration agreement between the requesting institution and the IEO; it has to fulfil the data policy of the IBAMar database and also the data policies of the specific programs that funded the requested data. Any user who accepts the IBAMar data release guidelines (https://ibamardatabase.wordpress.com/ibamar-data-policy/) may ask Jose Luis López-Jurado (ibamar@ba.ieo.es) to obtain an account to download these datasets.
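
    As an illustration of the kind of range-check QC described above, a minimal sketch that assigns quality flags to out-of-range temperatures; the column names, regional bounds, and flag values are assumptions, not the official IBAMar conventions:

        import pandas as pd

        # Hypothetical profile data (temperature in degrees C at a given pressure).
        profiles = pd.DataFrame({
            "region": ["alboran", "north_bi", "alboran"],
            "depth_dbar": [10, 500, 100],
            "temperature": [19.5, 13.1, 35.0],  # the last value is clearly suspect
        })

        # Illustrative acceptable ranges per region (the real QC also depends on
        # the 27 standard depth levels).
        ranges = {"alboran": (12.0, 28.0), "north_bi": (12.0, 27.0)}

        def range_check_flag(row):
            low, high = ranges[row["region"]]
            # Flag 1 = passed the range check, 4 = failed (illustrative values).
            return 1 if low <= row["temperature"] <= high else 4

        profiles["temperature_qc_flag"] = profiles.apply(range_check_flag, axis=1)
        print(profiles)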

  5. Application Domain of 5,000 GitHub Repositories

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Hudson Silva Borges; Marco Tulio Valente (2020). Application Domain of 5,000 GitHub Repositories [Dataset]. http://doi.org/10.5281/zenodo.804474
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hudson Silva Borges; Marco Tulio Valente
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We provide a manual classification of the application domains of 5,000 GitHub repositories (the most popular ones, by number of stars, in January 2017).

    We classified each system into one of the following application domains (a loading sketch follows the list):

    • Application software: systems that provide functionalities to end-users, like browsers and text editors (e.g., WordPress/WordPress and adobe/brackets).
    • System software: systems that provide services and infrastructure to other systems, like operating systems, middleware, and databases (e.g., torvalds/linux and mongodb/mongo).
    • Web libraries and frameworks (e.g., twbs/bootstrap and angular/angular.js).
    • Non-web libraries and frameworks (e.g., google/guava and facebook/fresco).
    • Software tools: systems that support development tasks, like IDEs, package managers, and compilers (e.g., Homebrew/homebrew and git/git).
    • Documentation: repositories with documentation, tutorials, source code examples, etc. (e.g., iluwatar/java-design-patterns).
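
    To work with the classification, a minimal loading sketch; the file and column names ("repository", "domain") are assumptions about the published package:

        import pandas as pd

        # Assumed file and column names; adjust to the actual Zenodo package.
        repos = pd.read_csv("github_repositories_domains.csv")

        # Number of repositories per application domain.
        print(repos["domain"].value_counts())

        # Example: the first few repositories classified as software tools.
        print(repos.loc[repos["domain"] == "Software tools", "repository"].head())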

    To cite the dataset, please use the following paper (which proposes and uses a first dataset version):

    Hudson Borges, Andre Hora, Marco Tulio Valente. Understanding the Factors that Impact the Popularity of GitHub Repositories. In 32nd IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 334-344, 2016.
