This dataset was created by Anna
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
ID Check is a dataset for object detection tasks; it contains Names annotations for 390 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
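For reference, a minimal download sketch using the Roboflow Python package; the API key, workspace and project slugs, version number, and export format below are placeholders, not values from this listing:

```python
# Minimal sketch: download the dataset with the Roboflow Python package.
# API key, workspace/project slugs, version, and export format are placeholders.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("id-check")  # hypothetical slugs
dataset = project.version(1).download("coco")                 # assumed export format

print(dataset.location)  # local folder containing the images and annotations
```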
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Replication Data for "Checking how Fact-checkers Check" (Forthcoming in Research and Politics 2018)
This dataset includes the MIPS Data Validation Criteria. The Medicare Access and CHIP Reauthorization Act of 2015 (MACRA) streamlines a patchwork collection of programs into a single system in which providers can be rewarded for better care. Providers will be able to practice as they always have, but they may receive higher Medicare payments based on their performance.
Our location data powers the most advanced address validation solutions for enterprise backend and frontend systems.
A global, standardized, self-hosted location dataset containing all administrative divisions, cities, and zip codes for 247 countries.
All geospatial data for address data validation is updated weekly to maintain the highest data quality, including challenging countries such as China, Brazil, Russia, and the United Kingdom.
Use cases for the Address Validation at Zip Code Level Database (Geospatial data)
Address capture and address validation
Address autocomplete
Address verification
Reporting and Business Intelligence (BI)
Master Data Management
Logistics and Supply Chain Management
Sales and Marketing
Product Features
Dedicated features to deliver best-in-class user experience
Multi-language support including address names in local and foreign languages
Comprehensive city definitions across countries
Data export methodology
Our location data packages are offered in a variety of formats, including .csv. All geospatial data for address validation is optimized for seamless integration with popular systems such as Esri ArcGIS, Snowflake, QGIS, and more.
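As an illustration, a minimal sketch of loading a .csv export and validating a zip code; the file name and the column names (country_code, zip_code, city, admin_division) are assumptions rather than the actual schema:

```python
# Minimal sketch: zip-code-level address validation from a .csv export.
# The file name and column names are assumptions, not the vendor's actual schema.
import pandas as pd

locations = pd.read_csv("address_validation_zip_level.csv", dtype=str)

def validate_zip(country_code: str, zip_code: str) -> pd.DataFrame:
    """Return matching rows; an empty frame means the zip code is unknown."""
    mask = (locations["country_code"] == country_code) & (locations["zip_code"] == zip_code)
    return locations.loc[mask, ["country_code", "zip_code", "city", "admin_division"]]

print(validate_zip("US", "57101"))
```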
Why do companies choose our location databases?
Enterprise-grade service
Full control over security, speed, and latency
Reduce integration time and cost by 30%
Weekly updates for the highest quality
Seamlessly integrated into your software
Note: Custom address validation packages are available. Please submit a request via the above contact button for more details.
The codes and data for: “Fact-checking” fact-checkers: A data-driven approach
Validation to ensure data and identity integrity. DAS will also ensure that security compliance standards are met.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
## Overview
Check Data Detection V1 is a dataset for object detection tasks; it contains Check Data Detection V1 annotations for 279 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
The BuzzFeed dataset, officially known as the BuzzFeed-Webis Fake News Corpus 2016, comprises content from 9 news publishers over a 7-day period close to the 2016 US election. It was created to analyze the spread of misinformation and hyperpartisan content on social media platforms, particularly Facebook.
## Dataset Composition
- News Articles: The dataset includes 1,627 articles from various sources: 826 from mainstream publishers, 256 from left-wing publishers, and 545 from right-wing publishers.
- Facebook Posts: Each article is associated with Facebook post data, including metrics such as share counts, reaction counts, and comment counts.
- Comments: The dataset includes nearly 1.7 million Facebook comments discussing the news content.
- Fact-Check Ratings: Each article was fact-checked by professional journalists at BuzzFeed, providing veracity assessments.
## Key Features
- Publisher Information: The dataset covers 9 publishers, including 6 hyperpartisan (3 left-wing and 3 right-wing) and 3 mainstream outlets.
- Temporal Aspect: The data was collected over seven weekdays (September 19-23 and September 26-27, 2016).
- Verification Status: All publishers included in the dataset had earned Facebook's blue checkmark, indicating authenticity and elevated status.
- Metadata: Includes various metrics such as publication dates, post types, and engagement statistics.
## Potential Applications
The BuzzFeed dataset is valuable for various research and analytical purposes:
- News Veracity Assessment: Researchers can use machine learning techniques to classify articles based on their factual accuracy.
- Social Media Analysis: The dataset allows for studying how news spreads on platforms like Facebook, including engagement patterns.
- Hyperpartisan Content Study: It enables analysis of differences between mainstream and hyperpartisan news sources.
- Content Strategy Optimization: Media companies can use insights from the dataset to refine their content strategies.
- Audience Analysis: The data can be used for demographic analysis and audience segmentation.
This dataset provides a comprehensive snapshot of news dissemination and engagement on social media during a crucial period, making it a valuable resource for researchers, data scientists, and media analysts studying online information ecosystems.
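A minimal analysis sketch for the engagement and veracity questions above, assuming the articles have been flattened into a CSV with hypothetical columns orientation, veracity, and share_count (the corpus's native distribution format may differ):

```python
# Minimal sketch: compare engagement across publisher orientations and fact-check ratings.
# The file name and the columns orientation, veracity, share_count are assumptions.
import pandas as pd

articles = pd.read_csv("buzzfeed_webis_2016_articles.csv")

# Median Facebook shares per publisher orientation (mainstream / left-wing / right-wing)
print(articles.groupby("orientation")["share_count"].median())

# Cross-tabulate orientation against the BuzzFeed veracity rating
print(pd.crosstab(articles["orientation"], articles["veracity"]))
```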
Scientists and engineers from the U.S. Geological Survey (USGS) Earth Resources Observation and Science Center (EROS) Cal/Val Center of Excellence (ECCOE) collected in situ measurements using field spectrometers to support the validation of surface reflectance products derived from Earth observing remote sensing imagery. Data provided in this data release were collected during select Earth observing satellite overpasses and tests during the months of May through October 2022. Data were collected at three field sites: the ground viewing radiometer (GVR) site on the USGS EROS facility in Minnehaha County, South Dakota; a private land holding near the City of Arlington in Brookings County, South Dakota; and a private land holding in Sanborn County, South Dakota. Each field collection file includes the calculated surface reflectance of each wavelength collected using a dual field spectrometer methodology. The dual field spectrometer methodology allows the calculated surface reflectance of each wavelength to be computed using one or both of the spectrometers. The use of the dual field spectrometer system reduces uncertainty in the field measurements by accounting for changes in solar irradiance. Both single and dual spectrometer calculated surface reflectance values are included with this dataset. The differing methodologies are denoted as "Single Spectrometer" and "Dual Spectrometer". Field spectrometer data are provided as Comma Separated Values (CSV) files and GeoPackage files. The 09 May 2022 and 16 June 2022 collection data are calculated using a single spectrometer only, due to a technical issue with a field spectrometer.
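A minimal loading sketch for the two distributed formats; the file names, the methodology column, and the wavelength/reflectance column names are placeholders rather than the release's actual field names:

```python
# Minimal sketch: read one field collection in both distributed formats.
# File names and column names are placeholders, not the release's actual schema.
import pandas as pd
import geopandas as gpd

# Comma Separated Values (CSV) file
reflectance = pd.read_csv("eros_surface_reflectance_2022.csv")
dual = reflectance[reflectance["methodology"] == "Dual Spectrometer"]  # hypothetical column
print(dual[["wavelength_nm", "surface_reflectance"]].head())           # hypothetical columns

# GeoPackage file (geopandas returns geometries plus attribute columns)
points = gpd.read_file("eros_surface_reflectance_2022.gpkg")
print(points.crs, len(points))
```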
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset captures realistic simulations of news articles and social media posts circulating during 2024–2025, labeled for potential AI-generated misinformation.
It includes 500 rows × 31 columns, combining:
- Temporal features → date, time, month, day of week
- Text-based metadata → platform, region, language, topic
- Quantitative engagement metrics → likes, shares, comments, CTR, views
- Content quality indicators → sentiment polarity, toxicity score, readability index
- Fact-checking signals → credibility source score, manual check flag, claim verification status
- Target variable → is_misinformation (0 = authentic, 1 = misinformation)
This dataset is designed for machine learning, deep learning, NLP, data visualization, and predictive analysis research.
This dataset can be applied to multiple domains:
- 🧠 Machine Learning / Deep Learning: Binary classification of misinformation (see the sketch after this list)
- 📊 Data Visualization: Engagement trends, regional misinformation heatmaps
- 🔍 NLP Research: Fake news detection, text classification, sentiment-based filtering
- 🌐 PhD & Academic Research: AI misinformation studies, disinformation propagation models
- 📈 Model Evaluation: Feature engineering, ROC-AUC, precision-recall tradeoff
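A minimal sketch of the binary classification use case referenced in the list above, assuming the dataset ships as a single CSV; the file name and the snake_case feature column names are assumptions, while is_misinformation is the stated target:

```python
# Minimal sketch: binary misinformation classification on engagement and quality features.
# File name and feature column names are assumptions; is_misinformation is the stated target.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("ai_misinformation_2024_2025.csv")

features = ["likes", "shares", "comments", "ctr", "views",
            "sentiment_polarity", "toxicity_score", "readability_index",
            "credibility_source_score"]          # hypothetical column names
X = df[features]
y = df["is_misinformation"]                      # 0 = authentic, 1 = misinformation

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("ROC-AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```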
Pursuant to the City of Chicago Municipal Code, certain banks are required to report, and the City of Chicago Comptroller is required to make public, information related to lending equity. The datasets in this series, along with additional information on the Department of Finance portion of the City Web site, make up that public sharing of the data. This dataset shows bank accounts at responding banks, aggregated by either ZIP Code or Census Tract. For further information applicable to all datasets in this series, please see the dataset description for Lending Equity - Residential Lending.
https://spdx.org/licenses/CC0-1.0.html
New websites and smartphone applications provide easy-click checking opportunities that can help consumers in many domains. However, this technology is not always used effectively. For example, many consumers skip checking “Terms and Conditions” links even when a quick evaluation of the terms can save money, but check their smartphone while driving even though this behavior is illegal and dangerous. Four laboratory experiments clarify the significance of one contributor to such contradictory deviations from effective checking. Studies 1, 2, and 3 show that, like basic decisions from experience, checking decisions reflect underweighting of rare events, which in turn is a sufficient condition for the coexistence of insufficient and too much checking. Insufficient checking emerges when most checking efforts impair performance even if checking is effective on average. Too much checking emerges when most checking clicks are rewarding even if checking is counterproductive on average. This pattern can be captured with a model that assumes reliance on small samples of past checking decision experiences. Study 4 shows that when the goal is to increase checking, interventions which increase the probability that checking leads to the best possible outcome can be far more effective than efforts to reduce the cost of checking.
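A minimal simulation sketch of the reliance-on-small-samples account described above; the payoff values, sample sizes, and the full-feedback simplification are illustrative assumptions, not the studies' actual design. Checking here is better on average (expected value +1.1 versus 0 for skipping), yet an agent that recalls only a few past outcomes usually misses the rare large benefit and checks too little:

```python
# Minimal sketch: decisions from experience with reliance on small samples of past outcomes.
# Payoffs, sample sizes, and the full-feedback simplification are illustrative assumptions.
# Skipping always pays 0. Checking usually costs a little (-1) but rarely yields a large
# benefit (+20), so checking is better on average: 0.9*(-1) + 0.1*20 = +1.1.
import random

random.seed(0)

def check_payoff() -> float:
    return 20.0 if random.random() < 0.1 else -1.0   # rare large benefit, common small cost

def checking_rate(trials: int = 10000, sample_size: int = 3) -> float:
    history = [check_payoff()]                       # remembered checking outcomes
    checks = 0
    for _ in range(trials):
        # Recall a small sample of past checking outcomes; skipping pays a sure 0.
        recalled = random.choices(history, k=sample_size)
        if sum(recalled) / sample_size > 0:
            checks += 1
        history.append(check_payoff())               # outcome observed either way (assumption)
    return checks / trials

print("Checking rate, small samples :", checking_rate(sample_size=3))    # far below 1.0
print("Checking rate, large samples :", checking_rate(sample_size=100))  # close to 1.0
```

Reversing the payoff structure (frequent small gains from checking, a rare large loss) produces the opposite bias, giving the coexistence of insufficient and too much checking noted above.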
Data from EX46 returns on establishing alcohol and tobacco traders' activities and revenue liabilities. Updated: ad hoc. Data coverage: 1996/97, 1997/98, 1998/99, 1999/00, 2000/01, 2001/02, 2002/03, 2003/04, 2004/05, 2005/06, 2006/07, 2007/08
CFD Validation of Synthetic Jets and Turbulent Separation Control. This web page provides data from experiments that may be useful for the validation of turbulence models. This resource is expected to grow gradually over time. All data herein are publicly available.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This is some research data or software published by PUB.
This dataset was created by Jawad Khan
Incorrect and Correct Spellings Dataset.
The goal of the SHRP 2 Project L33, Validation of Urban Freeway Models, was to assess and enhance the predictive travel time reliability models developed in the SHRP 2 Project L03, Analytic Procedures for Determining the Impacts of Reliability Mitigation Strategies. SHRP 2 Project L03, which concluded in 2010, developed two categories of reliability models to be used for the estimation or prediction of travel time reliability within planning, programming, and systems management contexts: data-rich and data-poor models. The objectives of Project L33 were the following:
- The first was to validate the most important models, the “Data Poor” and “Data Rich” models, with new datasets.
- The second was to assess the validation outcomes and recommend potential enhancements.
- The third was to explore enhancements and develop a final set of predictive equations.
- The fourth was to validate the enhanced models.
- The last was to develop a clear set of application guidelines for practitioner use of the project outputs.
The datasets in these 5 zip files are in support of SHRP 2 Report S2-L33-RW-1, Validation of Urban Freeway Models, https://rosap.ntl.bts.gov/view/dot/3604
The 5 zip files contain a total of 60 comma separated value (.csv) files. The compressed zip files total 3.8 GB in size. The files have been uploaded as-is; no further documentation was supplied. These files can be unzipped using any zip compression/decompression software. The files can be read in any simple text editor. [software requirements] Note: Data files are larger than 1 GB each.
Direct data download links:
- L03-01: https://doi.org/10.21949/1500858
- L03-02: https://doi.org/10.21949/1500868
- L03-03: https://doi.org/10.21949/1500869
- L03-04: https://doi.org/10.21949/1500870
- L03-05: https://doi.org/10.21949/1500871
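A minimal sketch of working with the distribution as described; the archive name and the CSV member name below are placeholders for whichever of the five zip files you download:

```python
# Minimal sketch: extract one downloaded zip archive and read a CSV from it.
# The archive name and member file name are placeholders.
import zipfile
import pandas as pd

with zipfile.ZipFile("L33_dataset_part1.zip") as archive:   # placeholder archive name
    print(archive.namelist()[:5])                           # peek at the first few members
    archive.extractall("L33_data")

# Files can exceed 1 GB; read in chunks if memory is tight.
df = pd.read_csv("L33_data/example_travel_times.csv")       # placeholder member name
print(df.shape)
print(df.head())
```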
By downloading the data, you agree with the terms & conditions mentioned below:
Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.
Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try to identify the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.
We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.
Citation
Please cite our work as
@InProceedings{clef-checkthat:2022:task3,
author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
year = {2022},
booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
series = {CLEF~'2022},
address = {Bologna, Italy},
}
@article{shahi2021overview,
title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
journal={Working Notes of CLEF},
year={2021}
}
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.
Task 3: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem: given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1,264 English-language articles with their respective labels. Our definitions for the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
Cross-Lingual Task (German)
Along with the multi-class task for English, we have introduced a task for a low-resource language. We will provide test data in German. The idea of the task is to use the English data and the concept of transfer learning to build a classification model for German.
Input Data
The data will be provided in the format Id, title, text, rating, domain; the description of the columns is as follows:
Output data format
Sample File
public_id, predicted_rating
1, false
2, true
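A minimal end-to-end sketch that trains a simple text classifier on the training file and writes predictions in the required two-column format; the file names and the assumption that the id column is named public_id in the input are placeholders, while title, text, and rating follow the column list above:

```python
# Minimal sketch: baseline text classifier plus a submission file in the required format.
# File names and the public_id column in the input files are assumptions; title, text,
# and rating follow the task's column description.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train = pd.read_csv("Task3_train.csv")
test = pd.read_csv("Task3_test.csv")

def to_text(frame: pd.DataFrame) -> pd.Series:
    return frame["title"].fillna("") + " " + frame["text"].fillna("")

model = make_pipeline(TfidfVectorizer(max_features=50000),
                      LogisticRegression(max_iter=1000))
model.fit(to_text(train), train["rating"])          # true / partially false / false / other

test["predicted_rating"] = model.predict(to_text(test))
test[["public_id", "predicted_rating"]].to_csv("submission.csv", index=False)
```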
IMPORTANT!
Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498
Related Work
Verification and validation (V&V) has been identified as a critical phase in fielding systems with Integrated Systems Health Management (ISHM) solutions to ensure that the results produced are robust, reliable, and can confidently inform about vehicle and system health status and to support operational and maintenance decisions. Prognostics is a key constituent within ISHM. It faces unique challenges for V&V since it informs about the future behavior of a component or subsystem. In this paper, we present a detailed review of identified barriers and solutions to prognostics V&V, and a novel methodological way for the organization and application of this knowledge. We discuss these issues within the context of a prognostics application for the ground support equipment of space vehicle propellant loading, and identify the significant barriers and adopted solution for this application.