100+ datasets found
  1. give us the data validation test set

    • kaggle.com
    zip
    Updated Apr 23, 2021
    Cite
    Anna (2021). give us the data validation test set [Dataset]. https://www.kaggle.com/annatmp/give-us-the-data-validation-test-set
    Explore at:
    Available download formats: zip (439562080 bytes)
    Dataset updated
    Apr 23, 2021
    Authors
    Anna
    Description

    Dataset

    This dataset was created by Anna

    Contents

  2. Data from: Id Check Dataset

    • universe.roboflow.com
    zip
    Updated Jul 15, 2024
    + more versions
    Cite
    SOIL (2024). Id Check Dataset [Dataset]. https://universe.roboflow.com/soil-empbg/id-check-r7v5n
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 15, 2024
    Dataset authored and provided by
    SOIL
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Names Bounding Boxes
    Description

    ID Check

    ## Overview
    
    ID Check is a dataset for object detection tasks - it contains Names annotations for 390 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
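
    As a rough illustration (not part of the dataset listing itself), the sketch below shows the typical Roboflow Python workflow for pulling this dataset into a local project. The API key, version number, and export format are placeholders you would supply from your own Roboflow account; the workspace and project slugs are taken from the dataset URL above.

    ```python
    # pip install roboflow
    from roboflow import Roboflow

    # Placeholder API key; use the key from your own Roboflow account.
    rf = Roboflow(api_key="YOUR_API_KEY")

    # Workspace and project slugs from the dataset URL:
    # https://universe.roboflow.com/soil-empbg/id-check-r7v5n
    project = rf.workspace("soil-empbg").project("id-check-r7v5n")

    # Version number and export format are assumptions; use the ones shown on the dataset page.
    dataset = project.version(1).download("coco")
    print("Downloaded to:", dataset.location)
    ```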
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. Data from: Checking how Fact-checkers Check

    • dataverse.harvard.edu
    Updated May 29, 2018
    Cite
    Chloe Lim (2018). Checking how Fact-checkers Check [Dataset]. http://doi.org/10.7910/DVN/CUPPCQ
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 29, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Chloe Lim
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Replication Data for "Checking how Fact-checkers Check" (Forthcoming in Research and Politics 2018)

  4. MIPS Data Validation Criteria

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). MIPS Data Validation Criteria [Dataset]. https://www.johnsnowlabs.com/marketplace/mips-data-validation-criteria/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Time period covered
    2017 - 2020
    Area covered
    United States
    Description

    This dataset includes the MIPS Data Validation Criteria. The Medicare Access and CHIP Reauthorization Act of 2015 (MACRA) streamlines a patchwork collection of programs into a single system where providers can be rewarded for better care. Providers will be able to practice as they always have, but they may receive higher Medicare payments based on their performance.

  5. Address & ZIP Validation Dataset | Mobility Data | Geospatial Checks +...

    • datarade.ai
    .csv
    Updated May 17, 2024
    Cite
    GeoPostcodes (2024). Address & ZIP Validation Dataset | Mobility Data | Geospatial Checks + Coverage Flags (Global) [Dataset]. https://datarade.ai/data-products/geopostcodes-geospatial-data-zip-code-data-address-vali-geopostcodes
    Explore at:
    Available download formats: .csv
    Dataset updated
    May 17, 2024
    Dataset authored and provided by
    GeoPostcodes
    Area covered
    Bolivia (Plurinational State of), Cabo Verde, Mongolia, Ireland, Kazakhstan, South Africa, Sint Maarten (Dutch part), Colombia, Korea (Republic of), French Guiana
    Description

    Our location data powers the most advanced address validation solutions for enterprise backend and frontend systems.

    A global, standardized, self-hosted location dataset containing all administrative divisions, cities, and zip codes for 247 countries.

    All geospatial data for address data validation is updated weekly to maintain the highest data quality, including challenging countries such as China, Brazil, Russia, and the United Kingdom.

    Use cases for the Address Validation at Zip Code Level Database (Geospatial data)

    • Address capture and address validation

    • Address autocomplete

    • Address verification

    • Reporting and Business Intelligence (BI)

    • Master Data Management

    • Logistics and Supply Chain Management

    • Sales and Marketing

    Product Features

    • Dedicated features to deliver best-in-class user experience

    • Multi-language support including address names in local and foreign languages

    • Comprehensive city definitions across countries

    Data export methodology

    Our location data packages are offered in a variety of formats, including .csv. All geospatial data for address validation are optimized for seamless integration with popular systems like Esri ArcGIS, Snowflake, QGIS, and more.

    Why do companies choose our location databases

    • Enterprise-grade service

    • Full control over security, speed, and latency

    • Reduce integration time and cost by 30%

    • Weekly updates for the highest quality

    • Seamlessly integrated into your software

    Note: Custom address validation packages are available. Please submit a request via the above contact button for more details.

  6. Replication Data for: “Fact-checking” fact-checkers: A data-driven approach

    • search.dataone.org
    • dataverse.harvard.edu
    • +1more
    Updated Jan 26, 2024
    Cite
    Lee, Sian (2024). Replication Data for: “Fact-checking” fact-checkers: A data-driven approach [Dataset]. http://doi.org/10.7910/DVN/FXYZDT
    Explore at:
    Dataset updated
    Jan 26, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Lee, Sian
    Description

    The codes and data for: “Fact-checking” fact-checkers: A data-driven approach

  7. Validation

    • catalog.data.gov
    • datahub.va.gov
    • +3more
    Updated Nov 10, 2020
    Cite
    Department of Veterans Affairs (2020). Validation [Dataset]. https://catalog.data.gov/dataset/validation
    Explore at:
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    United States Department of Veterans Affairs (http://va.gov/)
    Description

    Validation to ensure data and identity integrity. DAS will also ensure security-compliant standards are met.

  8. Check Data Detection V1 Dataset

    • universe.roboflow.com
    zip
    Updated Jun 30, 2024
    + more versions
    Cite
    Altimetrik (2024). Check Data Detection V1 Dataset [Dataset]. https://universe.roboflow.com/altimetrik-epwql/check-data-detection-v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 30, 2024
    Dataset authored and provided by
    Altimetrik
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Check Data Detection V1 Bounding Boxes
    Description

    Check Data Detection V1

    ## Overview
    
    Check Data Detection V1 is a dataset for object detection tasks - it contains Check Data Detection V1 annotations for 279 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC0 1.0 Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
    
  9. facebook fact checking dataset

    • figshare.com
    csv
    Updated Nov 11, 2024
    Cite
    mehdi khalil (2024). facebook fact checking dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27645690.v2
    Explore at:
    Available download formats: csv
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    mehdi khalil
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview: The BuzzFeed dataset, officially known as the BuzzFeed-Webis Fake News Corpus 2016, comprises content from 9 news publishers over a 7-day period close to the 2016 US election. It was created to analyze the spread of misinformation and hyperpartisan content on social media platforms, particularly Facebook.

    Dataset Composition

    • News Articles: 1,627 articles from various sources: 826 from mainstream publishers, 256 from left-wing publishers, and 545 from right-wing publishers.

    • Facebook Posts: Each article is associated with Facebook post data, including metrics like share counts, reaction counts, and comment counts.

    • Comments: Nearly 1.7 million Facebook comments discussing the news content.

    • Fact-Check Ratings: Each article was fact-checked by professional journalists at BuzzFeed, providing veracity assessments.

    Key Features

    • Publisher Information: The dataset covers 9 publishers, including 6 hyperpartisan (3 left-wing and 3 right-wing) and 3 mainstream outlets.

    • Temporal Aspect: The data was collected over seven weekdays (September 19-23 and September 26-27, 2016).

    • Verification Status: All publishers included in the dataset had earned Facebook's blue checkmark, indicating authenticity and elevated status.

    • Metadata: Includes various metrics such as publication dates, post types, and engagement statistics.

    Potential Applications

    The BuzzFeed dataset is valuable for various research and analytical purposes:

    • News Veracity Assessment: Researchers can use machine learning techniques to classify articles based on their factual accuracy.

    • Social Media Analysis: The dataset allows for studying how news spreads on platforms like Facebook, including engagement patterns.

    • Hyperpartisan Content Study: It enables analysis of differences between mainstream and hyperpartisan news sources.

    • Content Strategy Optimization: Media companies can use insights from the dataset to refine their content strategies.

    • Audience Analysis: The data can be used for demographic analysis and audience segmentation.

    This dataset provides a comprehensive snapshot of news dissemination and engagement on social media during a crucial period, making it a valuable resource for researchers, data scientists, and media analysts studying online information ecosystems.

  10. ECCOE 2022 Surface Reflectance Validation Dataset

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 20, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). ECCOE 2022 Surface Reflectance Validation Dataset [Dataset]. https://catalog.data.gov/dataset/eccoe-2022-surface-reflectance-validation-dataset
    Explore at:
    Dataset updated
    Nov 20, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    Scientists and engineers from the U.S. Geological Survey (USGS) Earth Resources Observation and Science Center (EROS) Cal/Val Center of Excellence (ECCOE) collected in situ measurements using field spectrometers to support the validation of surface reflectance products derived from Earth observing remote sensing imagery. Data provided in this data release were collected during select Earth observing satellite overpasses and tests during the months of May through October 2022. Data was collected at three field sites: the ground viewing radiometer (GVR) site on the USGS EROS facility in Minnehaha County, South Dakota, a private land holding near the City of Arlington in Brookings County, South Dakota, and a private land holding in Sanborn County, South Dakota. Each field collection file includes the calculated surface reflectance of each wavelength collected using a dual field spectrometer methodology. The dual field spectrometer methodology allows for the calculated surface reflectance of each wavelength to be computed using one or both of the spectrometers. The use of the dual field spectrometers system reduces uncertainty in the field measurements by accounting for changes in solar irradiance. Both single and dual spectrometer calculated surface reflectance are included with this dataset. The differing methodologies of the calculated surface reflectance data are denoted as "Single Spectrometer" and "Dual Spectrometer". Field spectrometer data are provided as Comma Separated Values (CSV) files and GeoPackage files. The 09 May 2022 and the 16 June 2022 collection data are calculated using single spectrometer only, due to a technical issue with a field spectrometer.

  11. Gen AI Misinformation Detection Data (2024–2025)

    • kaggle.com
    zip
    Updated Sep 23, 2025
    Cite
    Atharva Soundankar (2025). Gen AI Misinformation Detection Data (2024–2025) [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/gen-ai-misinformation-detection-datase-20242025
    Explore at:
    Available download formats: zip (32023 bytes)
    Dataset updated
    Sep 23, 2025
    Authors
    Atharva Soundankar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset captures realistic simulations of news articles and social media posts circulating between 2024–2025, labeled for potential AI-generated misinformation.

    It includes 500 rows × 31 columns, combining:
    - Temporal features → date, time, month, day of week
    - Text-based metadata → platform, region, language, topic
    - Quantitative engagement metrics → likes, shares, comments, CTR, views
    - Content quality indicators → sentiment polarity, toxicity score, readability index
    - Fact-checking signals → credibility source score, manual check flag, claim verification status
    - Target variable → is_misinformation (0 = authentic, 1 = misinformation)

    This dataset is designed for machine learning, deep learning, NLP, data visualization, and predictive analysis research.
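
    As a rough sketch of the binary-classification use case (see also the use cases listed below), the example trains a quick baseline on the numeric columns. The CSV file name is a placeholder; only the is_misinformation target column is taken from the description above, and the feature handling is an assumption about the file layout.

    ```python
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Placeholder file name; substitute the CSV downloaded from the Kaggle page.
    df = pd.read_csv("gen_ai_misinformation_2024_2025.csv")

    # Target column named in the dataset description; 0 = authentic, 1 = misinformation.
    target = "is_misinformation"

    # Quick baseline: use only numeric columns (engagement metrics, scores, etc.).
    X = df.drop(columns=[target]).select_dtypes(include="number")
    y = df[target]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    print("ROC-AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
    ```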

    🎯 Use Cases

    This dataset can be applied to multiple domains:
    - 🧠 Machine Learning / Deep Learning: Binary classification of misinformation
    - 📊 Data Visualization: Engagement trends, regional misinformation heatmaps
    - 🔍 NLP Research: Fake news detection, text classification, sentiment-based filtering
    - 🌐 PhD & Academic Research: AI misinformation studies, disinformation propagation models
    - 📈 Model Evaluation: Feature engineering, ROC-AUC, precision-recall tradeoff

  12. Lending Equity - Savings and Checking Accounts

    • catalog.data.gov
    • data.cityofchicago.org
    Updated Dec 13, 2024
    + more versions
    Cite
    data.cityofchicago.org (2024). Lending Equity - Savings and Checking Accounts [Dataset]. https://catalog.data.gov/dataset/lending-equity-savings-and-checking-accounts
    Explore at:
    Dataset updated
    Dec 13, 2024
    Dataset provided by
    data.cityofchicago.org
    Description

    Pursuant to the City of Chicago Municipal Code, certain banks are required to report, and the City of Chicago Comptroller is required to make public, information related to lending equity. The datasets in this series and additional information on the Department of Finance portion of the City Web site, make up that public sharing of the data. This dataset shows bank accounts at responding banks, aggregated by either ZIP Code or Census Tract. For further information applicable to all datasets in this series, please see the dataset description for Lending Equity - Residential Lending.

  13. Data from: Click or skip: the role of experience in easy-click checking...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Aug 23, 2017
    Cite
    Yefim Roth; Michaela Wänke; Ido Erev (2017). Click or skip: the role of experience in easy-click checking decisions [Dataset]. http://doi.org/10.5061/dryad.g3pj8
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 23, 2017
    Authors
    Yefim Roth; Michaela Wänke; Ido Erev
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    New websites and smartphone applications provide easy-click checking opportunities that can help consumers in many domains. However, this technology is not always used effectively. For example, many consumers skip checking “Terms and Conditions” links even when a quick evaluation of the terms can save money, but check their smartphone while driving even though this behavior is illegal and dangerous. Four laboratory experiments clarify the significance of one contributor to such contradictory deviations from effective checking. Studies 1, 2, and 3 show that, like basic decisions from experience, checking decisions reflect underweighting of rare events, which in turn is a sufficient condition for the coexistence of insufficient and too much checking. Insufficient checking emerges when most checking efforts impair performance even if checking is effective on average. Too much checking emerges when most checking clicks are rewarding even if checking is counterproductive on average. This pattern can be captured with a model that assumes reliance on small samples of past checking decision experiences. Study 4 shows that when the goal is to increase checking, interventions which increase the probability that checking leads to the best possible outcome can be far more effective than efforts to reduce the cost of checking.

  14. EX46 Credibility Check data - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Nov 5, 2013
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2013). EX46 Credibility Check data - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/ex46-credibility-check-data_1
    Explore at:
    Dataset updated
    Nov 5, 2013
    Dataset provided by
    CKAN (https://ckan.org/)
    Description

    Data from EX46 returns on establishing alcohol and tobacco traders activities and revenue liabilities. Updated: ad hoc. Data coverage: 2001/02, 2002/03, 2003/04, 2004/05, 2005/06, 2006/07, 2007/08, 1998/99, 1999/00, 2000/01, 1996/97, 1997/98

  15. Turbulence Models: Data from Other Experiments: CFD Validation of Synthetic...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Turbulence Models: Data from Other Experiments: CFD Validation of Synthetic Jets and Turbulent Separation Control - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/turbulence-models-data-from-other-experiments-cfd-validation-of-synthetic-jets-and-turbule
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    CFD Validation of Synthetic Jets and Turbulent Separation Control. This web page provides data from experiments that may be useful for the validation of turbulence models. This resource is expected to grow gradually over time. All data herein are publicly available.

  16. Data from: Data for the paper: An empirical validation protocol for...

    • pub.uni-bielefeld.de
    • pub-dev.ub.uni-bielefeld.de
    Updated May 28, 2024
    Cite
    Sander van der Hoog; Sylvain Barde (2024). Data for the paper: An empirical validation protocol for large-scale agent-based models [Dataset]. https://pub.uni-bielefeld.de/record/2908396
    Explore at:
    Dataset updated
    May 28, 2024
    Authors
    Sander van der Hoog; Sylvain Barde
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This is some research data or software published by PUB.

  17. Data-For-spell-check

    • kaggle.com
    zip
    Updated Jul 14, 2024
    Cite
    Jawad Khan (2024). Data-For-spell-check [Dataset]. https://www.kaggle.com/datasets/jawadkhan65/data-for-spell-check
    Explore at:
    Available download formats: zip (137182 bytes)
    Dataset updated
    Jul 14, 2024
    Authors
    Jawad Khan
    Description

    Dataset

    This dataset was created by Jawad Khan

    Contents

    Incorrect and Correct Spellings Dataset.

  18. Data from: Validation of Urban Freeway Models [supporting datasets]

    • catalog.data.gov
    • data.bts.gov
    • +3more
    Updated Dec 7, 2023
    Cite
    Federal Highway Administration (2023). Validation of Urban Freeway Models [supporting datasets] [Dataset]. https://catalog.data.gov/dataset/validation-of-urban-freeway-models-supporting-datasets
    Explore at:
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    Federal Highway Administration (https://highways.dot.gov/)
    Description

    The goal of the SHRP 2 Project L33 Validation of Urban Freeway Models was to assess and enhance the predictive travel time reliability models developed in the SHRP 2 Project L03, Analytic Procedures for Determining the Impacts of Reliability Mitigation Strategies. SHRP 2 Project L03, which concluded in 2010, developed two categories of reliability models to be used for the estimation or prediction of travel time reliability within planning, programming, and systems management contexts: data-rich and data-poor models. The objectives of Project L33 were the following:

    • The first was to validate the most important models, the “Data Poor” and “Data Rich” models, with new datasets.

    • The second objective was to assess the validation outcomes to recommend potential enhancements.

    • The third was to explore enhancements and develop a final set of predictive equations.

    • The fourth was to validate the enhanced models.

    • The last was to develop a clear set of application guidelines for practitioner use of the project outputs.

    The datasets in these 5 zip files are in support of SHRP 2 Report S2-L33-RW-1, Validation of Urban Freeway Models, https://rosap.ntl.bts.gov/view/dot/3604. The 5 zip files contain a total of 60 comma separated value (.csv) files. The compressed zip files total 3.8 GB in size. The files have been uploaded as-is; no further documentation was supplied. These files can be unzipped using any zip compression/decompression software. The files can be read in any simple text editor. [software requirements] Note: Data files larger than 1GB each.

    Direct data download links:

    • L03-01: https://doi.org/10.21949/1500858

    • L03-02: https://doi.org/10.21949/1500868

    • L03-03: https://doi.org/10.21949/1500869

    • L03-04: https://doi.org/10.21949/1500870

    • L03-05: https://doi.org/10.21949/1500871
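
    Because the supporting files are undocumented zip archives of CSV files, a minimal sketch like the following (with a placeholder archive name) can be used to list the members and peek at their headers without extracting to disk.

    ```python
    import csv
    import io
    import zipfile

    # Placeholder name; substitute one of the zip files downloaded from the DOI links above.
    ARCHIVE = "L03-01.zip"

    with zipfile.ZipFile(ARCHIVE) as zf:
        # List every CSV member so you can see what the (undocumented) archive contains.
        members = [name for name in zf.namelist() if name.lower().endswith(".csv")]
        print(f"{len(members)} CSV files in {ARCHIVE}")

        # Peek at the header row and first data row of each member in-memory.
        for name in members:
            with zf.open(name) as fh:
                reader = csv.reader(io.TextIOWrapper(fh, encoding="utf-8", errors="replace"))
                header = next(reader, [])
                first_row = next(reader, [])
                print(name, "| columns:", header[:8], "| first row:", first_row[:8])
    ```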

  19. CT-FAN: A Multilingual dataset for Fake News Detection

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 23, 2022
    Cite
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel; Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel (2022). CT-FAN: A Multilingual dataset for Fake News Detection [Dataset]. http://doi.org/10.5281/zenodo.6555293
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel; Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel
    Description

    By downloading the data, you agree with the terms & conditions mentioned below:

    Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.

    Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.

    We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.

    Citation

    Please cite our work as

    @InProceedings{clef-checkthat:2022:task3,
    author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
    title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
    year = {2022},
    booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
    series = {CLEF~'2022},
    address = {Bologna, Italy},}
    
    @article{shahi2021overview,
     title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
     author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
     journal={Working Notes of CLEF},
     year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.

    Task 3: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem: given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1,264 English-language articles with their respective labels. Our definitions for the categories are as follows:

    • False - The main claim made in an article is untrue.

    • Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    • True - This rating indicates that the primary elements of the main claim are demonstrably true.

    • Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.

    Cross-Lingual Task (German)

    Along with the multi-class task for the English language, we have introduced a task for a low-resource language. We will provide the test data in German. The idea of the task is to use the English data and the concept of transfer learning to build a classification model for German.

    Input Data

    The data will be provided with the fields ID, title, text, rating, and domain; the description of the columns is as follows:

    • ID- Unique identifier of the news article
    • Title- Title of the news article
    • text- Text mentioned inside the news article
    • our rating - class of the news article as false, partially false, true, other

    Output data format

    • public_id- Unique identifier of the news article
    • predicted_rating- predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true
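
    A minimal sketch for writing a submission in the sample format above; the predicted labels here are dummies standing in for the output of whatever classifier you train.

    ```python
    import csv

    # Dummy predictions; in practice these come from your classifier, one row per test article.
    predictions = [
        (1, "false"),
        (2, "true"),
        (3, "partially false"),
    ]

    # Write the submission in the expected two-column format: public_id, predicted_rating.
    with open("predictions.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["public_id", "predicted_rating"])
        writer.writerows(predictions)
    ```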

    IMPORTANT!

    1. We have used data from 2010 to 2022, and the fake news content spans several topics, such as elections and COVID-19.

    Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498

    Related Work

    • Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf
    • G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
    • Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
    • Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.
    • Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeno, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.
    • Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.
  20. Data from: Tackling Verification and Validation for Prognostics

    • data.nasa.gov
    • gimi9.com
    • +2more
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Tackling Verification and Validation for Prognostics [Dataset]. https://data.nasa.gov/dataset/tackling-verification-and-validation-for-prognostics
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Verification and validation (V&V) has been identified as a critical phase in fielding systems with Integrated Systems Health Management (ISHM) solutions to ensure that the results produced are robust, reliable, and can confidently inform about vehicle and system health status and to support operational and maintenance decisions. Prognostics is a key constituent within ISHM. It faces unique challenges for V&V since it informs about the future behavior of a component or subsystem. In this paper, we present a detailed review of identified barriers and solutions to prognostics V&V, and a novel methodological way for the organization and application of this knowledge. We discuss these issues within the context of a prognostics application for the ground support equipment of space vehicle propellant loading, and identify the significant barriers and adopted solution for this application.
