Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
CommonForms: A Large, Diverse Dataset for Form Field Detection
This repository hosts the CommonForms dataset, a web-scale dataset for form field detection, introduced in the paper CommonForms: A Large, Diverse Dataset for Form Field Detection. CommonForms casts the problem of form field detection as object detection: given an image of a page, predict the location and type (Text Input, Choice Button, Signature) of form fields. Key Features:
Scale: Roughly 55,000 documents comprising… See the full description on the dataset page: https://huggingface.co/datasets/jbarrow/CommonForms.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
The submitted data relate to sections 2.3 and 2.4 of: H. Moisl (2022) Dynamical systems implementation of intrinsic sentence meaning, Minds and Machines 32 (2022), which describe the processing architecture of the model of intrinsic sentence meaning proposed there. Six separate programs are used to generate the results presented in the article, whose interrelationships are described in the above-cited sections. The paper with which the data are associated proposes a model for implementation of intrinsic natural language sentence meaning in a physical language understanding system, where 'intrinsic' is understood as 'independent of meaning ascription by system-external observers'. The proposal is that intrinsic meaning can be implemented as a point attractor in the state space of a nonlinear dynamical system with feedback which is generated by temporally sequenced inputs. It is motivated by John Searle's well known (1980) critique of the then-standard and currently still influential Computational Theory of Mind (CTM), the essence of which was that CTM representations lack intrinsic meaning because that meaning is dependent on ascription by an observer. The proposed dynamical model comprises a collection of interacting artificial neural networks, and constitutes a radical simplification of the principle of compositional phrase structure which is at the heart of the current standard view of sentence semantics because it is computationally interpretable as a finite state machine.
Success.ai offers a comprehensive, enterprise-ready B2B leads data solution, ideal for businesses seeking access to over 150 million verified employee profiles and 170 million work emails. Our data empowers organizations across industries to target key decision-makers, optimize recruitment, and fuel B2B marketing efforts. Whether you're looking for UK B2B data, B2B marketing data, or global B2B contact data, Success.ai provides the insights you need with pinpoint accuracy.
Tailored for B2B Sales, Marketing, Recruitment and more: Our B2B contact data and B2B email data solutions are designed to enhance your lead generation, sales, and recruitment efforts. Build hyper-targeted lists based on job title, industry, seniority, and geographic location. Whether you’re reaching mid-level professionals or C-suite executives, Success.ai delivers the data you need to connect with the right people.
Key Categories Served:
- B2B sales leads – Identify decision-makers in key industries
- B2B marketing data – Target professionals for your marketing campaigns
- Recruitment data – Source top talent efficiently and reduce hiring times
- CRM enrichment – Update and enhance your CRM with verified, updated data
- Global reach – Coverage across 195 countries, including the United States, United Kingdom, Germany, India, Singapore, and more
Global Coverage with Real-Time Accuracy: Success.ai’s dataset spans a wide range of industries such as technology, finance, healthcare, and manufacturing. With continuous real-time updates, your team can rely on the most accurate data available: 150M+ Employee Profiles: Access professional profiles worldwide with insights including full name, job title, seniority, and industry. 170M Verified Work Emails: Reach decision-makers directly with verified work emails, available across industries and geographies, including Singapore and UK B2B data. GDPR-Compliant: Our data is fully compliant with GDPR and other global privacy regulations, ensuring safe and legal use of B2B marketing data.
Key Data Points for Every Employee Profile: Every profile in Success.ai’s database includes over 20 critical data points, providing the information needed to power B2B sales and marketing campaigns: Full Name, Job Title, Company, Work Email, Location, Phone Number, LinkedIn Profile, Experience, Education, Technographic Data, Languages, Certifications, Industry, Publications & Awards.
Use Cases Across Industries: Success.ai’s B2B data solution is incredibly versatile and can support various enterprise use cases, including: B2B Marketing Campaigns: Reach high-value professionals in industries such as technology, finance, and healthcare. Enterprise Sales Outreach: Build targeted B2B contact lists to improve sales efforts and increase conversions. Talent Acquisition: Accelerate hiring by sourcing top talent with accurate and updated employee data, filtered by job title, industry, and location. Market Research: Gain insights into employment trends and company profiles to enrich market research. CRM Data Enrichment: Ensure your CRM stays accurate by integrating updated B2B contact data. Event Targeting: Create lists for webinars, conferences, and product launches by targeting professionals in key industries.
Use Cases for Success.ai's Contact Data:
- Targeted B2B Marketing: Create precise campaigns by targeting key professionals in industries like tech and finance.
- Sales Outreach: Build focused sales lists of decision-makers and C-suite executives for faster deal cycles.
- Recruiting Top Talent: Easily find and hire qualified professionals with updated employee profiles.
- CRM Enrichment: Keep your CRM current with verified, accurate employee data.
- Event Targeting: Create attendee lists for events by targeting relevant professionals in key sectors.
- Market Research: Gain insights into employment trends and company profiles for better business decisions.
- Executive Search: Source senior executives and leaders for headhunting and recruitment.
- Partnership Building: Find the right companies and key people to develop strategic partnerships.
Why Choose Success.ai’s Employee Data? Success.ai is the top choice for enterprises looking for comprehensive and affordable B2B data solutions. Here’s why: Unmatched Accuracy: Our AI-powered validation process ensures 99% accuracy across all data points, resulting in higher engagement and fewer bounces. Global Scale: With 150M+ employee profiles and 170M verified work emails, Success.ai provides extensive coverage for UK B2B data, B2B marketing data, and global contacts. Competitive Pricing: We offer the most competitive rates on the market, undercutting major competitors like Lusha, Cognism, and ZoomInfo. Tailored Solutions: Our white-glove service ensures we deliver exactly what you need, in the format that suits your workflow (CSV, Excel, etc.). Real-Time Updates: Our data is continuously updated, so you always have the latest information, unlike static da...
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/22950
The data contained in these files (one in Excel, the other in JSON format) consist of full-text download numbers through the ADS during the year 2010. Every row is a journal, indicated by the journal name and the ADS abbreviation ("bibstem", see: http://adsabs.harvard.edu/abs_doc/journals2.html). For each journal, we present the download numbers split up by publication year (with the first data column being the range "pre 1998"). Full-text downloads within the ADS service are defined as 'clicks' on any of the links within an ADS record that provide access to full text in one form or another. Specifically, these are the 'E', 'F', 'L', 'G' or 'X' links (see the definitions at http://doc.adsabs.harvard.edu/abs_doc/help_pages/results.html#List_of_Links). The data contained in these files have been released under the CC-BY License (see: http://creativecommons.org/licenses/by/3.0/us/). Please acknowledge the ADS in any publication that makes use of these data with the phrase: "This research has made use of NASA's Astrophysics Data System."
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
A fully synthetic dataset simulating real-world medical billing scenarios, including claim status, denials, team allocation, and AR follow-up logic.
This dataset represents a synthetic Account Receivable (AR) data model for medical billing, created using realistic healthcare revenue cycle management (RCM) workflows. It is designed for data analysis, machine learning modeling, automation testing, and process simulation in the healthcare billing domain.
The dataset includes realistic business logic, mimicking the actual process of claim submission, denial management, follow-ups, and payment tracking. This is especially useful for: ✔ Medical billing training ✔ Predictive modeling (claim outcomes, denial prediction, payment forecasting) ✔ RCM process automation and AI research ✔ Data visualization and dashboard creation
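✅ Patient & Claim Information:
- Visit ID#: unique alphanumeric ID in the format XXXXXZXXXXXX
- Aging Days: Today - DOS
- Aging Bucket: 0-30, 31-60, 61-90, 91-120, 120+
✅ Claim Status & Denial Logic:
- Status Code: actual reason (e.g., Dx inconsistent with CPT)
- Action Code: next step (e.g., Need Coding Assistance)
- Team Allocation: based on denial type (Coding Team, Billing Team, Payment Team)
✅ Realistic Denial Scenarios Covered:
✅ Other Important Columns: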
| Column Name | Description |
|---|---|
| Client | Name of the client/provider |
| State | US State where service provided |
| Visit ID# | Unique alphanumeric ID (XXXXXZXXXXXX) |
| Patient Name | Patient’s full name |
| DOS | Date of Service (MM/DD/YYYY) |
| Aging Days | Days from DOS to today |
| Aging Bucket | Aging category |
| Claim Amount | Original claim billed |
| Paid Amount | Amount paid so far |
| Balance | Remaining balance |
| Status | Initial claim status (No Response, Paid, etc.) |
| Status Code | Actual reason (e.g., Dx inconsistent with CPT) |
| Action Code | Next step (e.g., Need Coding Assistance) |
| Team Allocation | Responsible team (Coding, Billing, Payment) |
| Notes | Follow-up notes |
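The Aging Days and Aging Bucket columns described in the table above could be derived roughly as follows. This is only a minimal sketch: the function names are hypothetical, the bucket labels are the ones that appear in the listing (0-30, 31-60, 61-90, 91-120, 120+), and placing exactly 120 days in the 91-120 bucket is an assumption, since the listing does not say where that boundary falls.

```python
from datetime import date, datetime


def aging_days(dos: str, today: date) -> int:
    """Days elapsed from the Date of Service (MM/DD/YYYY) to `today`."""
    service_date = datetime.strptime(dos, "%m/%d/%Y").date()
    return (today - service_date).days


def aging_bucket(days: int) -> str:
    """Map an aging-day count to one of the bucket labels from the listing."""
    if days < 0:
        raise ValueError("Date of Service lies in the future")
    # Assumption: upper bounds are inclusive, so 120 falls in "91-120".
    for upper, label in ((30, "0-30"), (60, "31-60"), (90, "61-90"), (120, "91-120")):
        if days <= upper:
            return label
    return "120+"


# Hypothetical claim with a Date of Service of 01/01/2024, evaluated on 2024-03-15.
days = aging_days("01/01/2024", date(2024, 3, 15))
print(days, aging_bucket(days))
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the calculation reproducible when the dataset is re-analysed later.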
Visit ID#: XXXXXZXXXXXX format
Denial Workflow:
Payments: Realistic logic where payment may be partial, full, or none
Insurance Flow: Balance moves from primary → secondary → tertiary → patient responsibility
CC BY 4.0 – Free to use, modify, and share with attribution.
Texas Parks and Wildlife Department boat registration data access form
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
DFID publish the following information on a monthly basis: Details on international development projects including budgets and financial transactions. Information on sectors, geographical location (including sub-national geolocations). All core project documentation, including business cases, annual reviews, completion reports, and evaluations. In addition we will publish country plans and formal agreements with partners. Project summaries are published and translated into major local languages of the relevant countries. Further data is being gathered on sub-national locations and results, and will be released as it becomes available. The data is published in an open, accessible format, using the International Aid Transparency Initiative (IATI) open standard, in XML format. Data is also presented in a more visual form on the Development Tracker. A small number of exclusions will apply to sensitive information, based on the key principles of the UK’s Freedom of Information Act.
Access B2B Contact Data for North American Small Business Owners with Success.ai, your go-to provider for verified, high-quality business datasets. This dataset is tailored for businesses, agencies, and professionals seeking direct access to decision-makers within the small business ecosystem across North America. With over 170 million professional profiles, it’s an unparalleled resource for powering your marketing, sales, and lead generation efforts.
Key Features of the Dataset:
Verified Contact Details
Includes accurate and up-to-date email addresses and phone numbers to ensure you reach your targets reliably.
AI-validated for 99% accuracy, eliminating errors and reducing wasted efforts.
Detailed Professional Insights
Comprehensive data points include job titles, skills, work experience, and education to enable precise segmentation and targeting.
Enriched with insights into decision-making roles, helping you connect directly with small business owners, CEOs, and other key stakeholders.
Business-Specific Information
Covers essential details such as industry, company size, location, and more, enabling you to tailor your campaigns effectively. Ideal for profiling and understanding the unique needs of small businesses.
Continuously Updated Data
Our dataset is maintained and updated regularly to ensure relevance and accuracy in fast-changing market conditions. New business contacts are added frequently, helping you stay ahead of the competition.
Why Choose Success.ai?
At Success.ai, we understand the critical importance of high-quality data for your business success. Here’s why our dataset stands out:
Tailored for Small Business Engagement Focused specifically on North American small business owners, this dataset is an invaluable resource for building relationships with SMEs (Small and Medium Enterprises). Whether you’re targeting startups, local businesses, or established small enterprises, our dataset has you covered.
Comprehensive Coverage Across North America Spanning the United States, Canada, and Mexico, our dataset ensures wide-reaching access to verified small business contacts in the region.
Categories Tailored to Your Needs Includes highly relevant categories such as Small Business Contact Data, CEO Contact Data, B2B Contact Data, and Email Address Data to match your marketing and sales strategies.
Customizable and Flexible Choose from a wide range of filtering options to create datasets that meet your exact specifications, including filtering by industry, company size, geographic location, and more.
Best Price Guaranteed We pride ourselves on offering the most competitive rates without compromising on quality. When you partner with Success.ai, you receive superior data at the best value.
Seamless Integration Delivered in formats that integrate effortlessly with your CRM, marketing automation, or sales platforms, so you can start acting on the data immediately.
Use Cases: This dataset empowers you to:
Drive Sales Growth: Build and refine your sales pipeline by connecting directly with decision-makers in small businesses. Optimize Marketing Campaigns: Launch highly targeted email and phone outreach campaigns with verified contact data. Expand Your Network: Leverage the dataset to build relationships with small business owners and other key figures within the B2B landscape. Improve Data Accuracy: Enhance your existing databases with verified, enriched contact information, reducing bounce rates and increasing ROI. Industries Served: Whether you're in B2B SaaS, digital marketing, consulting, or any field requiring accurate and targeted contact data, this dataset serves industries of all kinds. It is especially useful for professionals focused on:
Lead Generation Business Development Market Research Sales Outreach Customer Acquisition What’s Included in the Dataset: Each profile provides:
Full Name Verified Email Address Phone Number (where available) Job Title Company Name Industry Company Size Location Skills and Professional Experience Education Background With over 170 million profiles, you can tap into a wealth of opportunities to expand your reach and grow your business.
Why High-Quality Contact Data Matters: Accurate, verified contact data is the foundation of any successful B2B strategy. Reaching small business owners and decision-makers directly ensures your message lands where it matters most, reducing costs and improving the effectiveness of your campaigns. By choosing Success.ai, you ensure that every contact in your pipeline is a genuine opportunity.
Partner with Success.ai for Better Data, Better Results: Success.ai is committed to delivering premium-quality B2B data solutions at scale. With our small business owner dataset, you can unlock the potential of North America's dynamic small business market.
Get Started Today Request a sample or customize your dataset to fit your unique...
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
## Overview
Form Table is a dataset for object detection tasks - it contains Form Table annotations for 2,106 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The product is a new design of the State Map at a scale of 1:5,000 (SM 5) in vector form, whose advantages are that it is up to date and rendered in colour. The map contains planimetry based on the cadastral map, altimetry adopted from the altimetry component of ZABAGED, and map lettering based on the Geonames database of geographic names, together with feature-type abbreviations derived from the attributes of selected ZABAGED features. This new design of the SM 5 is regenerated once a year for the part of the Czech territory where the cadastral map is available in vector form. As a result, some export units (SM 5 map sheets) do not have full coverage; the price of such an export unit is reduced proportionally.
You've been given a classified data set from a company! They've hidden the feature column names but have given you the data and the target classes.
We'll try to use ML to create a model that directly predicts a class for a new data point based off of the features.
Let's grab it and use it!
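The listing does not say which model is used. As one possible illustration of predicting a class for a new point directly from its features, here is a minimal k-nearest-neighbours sketch in plain Python. The choice of k-NN, the function name, and the toy data are all assumptions made for illustration; none of them come from the dataset itself.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, point, k=3):
    """Predict the class of `point` by majority vote among its k nearest neighbours."""
    # Pair every training row with its label and sort by Euclidean distance to the query.
    nearest = sorted(zip(train_X, train_y), key=lambda row: dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy stand-in for the anonymised feature columns and target classes (hypothetical values).
train_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.1, 0.1)))
print(knn_predict(train_X, train_y, (5.0, 5.0)))
```

Because the feature names are hidden and their scales unknown, a distance-based method like this would usually need the features standardised first; that step is omitted here for brevity.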
The Canadian Atmospheric Environment Service (AES) provided BOREAS with hourly and daily surface meteorological data from 23 of the AES meteorological stations located across Canada and upper air data from 1 station at The Pas, Manitoba. Due to copyright restrictions on the full resolution surface meteorological data, this data set contains 5-day average values for the surface parameters. The upper air data are provided in their full resolution form. The 5-day averaging was performed in order to create a data set that could be publicly distributed at no cost. Temporally, the surface meteorological data cover the period of January 1975 to December 1996 and the upper air data cover the period of January 1961 to November 1996.
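To make the 5-day averaging concrete, the sketch below reduces a daily series to block means. It assumes consecutive, non-overlapping 5-day blocks and averages any shorter trailing block over the days available; the description does not state how block boundaries or missing values were actually handled, and the function name and sample values are hypothetical.

```python
from statistics import fmean


def block_average(values, block=5):
    """Average consecutive, non-overlapping blocks of `block` values.

    A trailing block with fewer than `block` values is averaged over
    whatever is available (an assumption, not documented behaviour).
    """
    return [fmean(values[i:i + block]) for i in range(0, len(values), block)]


# Ten hypothetical daily mean temperatures (degrees C).
daily_temps = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(block_average(daily_temps))  # [3.0, 8.0]
```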
By Gary Hoover
This data set provides a detailed look into the US economy. It includes information on establishments and nonemployer businesses, as well as sales revenue, payrolls, and the number of employees. Drawn from the Economic Census, which is conducted every five years, this data is a valuable resource for anyone curious about where the nation stood economically at the time. With columns including geographic area name, North American Industry Classification System (NAICS) codes and their descriptions, type of operation or tax status, and annual payroll, this dataset can help you track economic trends over time. Whether you’re a researcher studying industry patterns or an entrepreneur looking for market insight, this dataset has what you’re looking for!
This dataset provides detailed US industry data by state, including the number of establishments, value of sales, payroll, and number of employees. All the data is based on the North American Industry Classification System (NAICS) code for each specific industry. This will allow you to easily analyze and compare industries across different states or regions.
- Analyzing the economic impact of a new business or industry trends in different states: Comparing the change in the number of establishments, payroll, and employees over time can give insight into how a state is affected by a new industry trend or introduction of a new service or product.
- Estimating customer sales potential for businesses: This dataset can be used to estimate the potential customer base for businesses in different geographic areas. Analyzing the total business done by nonemployers in an area, along with its estimated population, can help estimate how much overall sales potential exists for a given region.
- Tracking competitor performance: By looking at shipments, receipts, and value of business done across industries in different regions or even cities, companies can track their competitors’ performance and compare it to their own to better assess their strategies going forward.
If you use this dataset in your research, please credit the original authors.
License: Dataset copyright by authors
- You are free to:
  - Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
  - ShareAlike - You must distribute your contributions under the same license as the original.
  - Keep intact - all notices that refer to this license, including copyright notices.
File: 2012 Industry Data by Industry and State.csv

| Column name | Description |
|---|---|
| Geographic area name | The name of the geographic area the data is for. (String) |
| NAICS code | The North American Industry Classification System (NAICS) code for the industry. (String) |
| Meaning of NAICS code | The description of the NAICS code. (String) |
| Meaning of Type of operation or tax status code | The description of the type of operation or tax status code. (String) |

...
Rail equipment accidents/incidents, collisions, derailments, fires, explosions, acts of God, or other events involving the operation of railroad on-track equipment (standing or moving) and causing reportable damages greater than the reporting threshold for the year in which the accident/incident occurred, must be reported by railroads to the FRA on Form FRA 6180.54 - Rail Equipment Accident/Incident.
This dataset is the source dataset and contains raw data values. It replaced the legacy data download (https://safetydata.fra.dot.gov/OfficeofSafety/publicsite/on_the_fly_download.aspx). To download the data in a user-friendly, human-readable format, please see https://data.transportation.gov/Railroads/Rail-Equipment-Accident-Incident-Data/85tf-25kj.
The data dictionary can be found here: https://datahub.transportation.gov/api/views/aqxq-n5hy/files/ea00a728-94b0-43e7-8c11-8481f13170a7?download=true&filename=accfile_EFFECTIVE_060111%20(8).pdf.
For information on how to filter and export data, please visit: https://data.transportation.gov/stories/s/Download-Export-and-Print-User-Guide/s8hj-vns8/.
To view the data release schedule, please visit: https://data.transportation.gov/stories/s/Data-Release-Schedule/qfc9-tapk/.
Datasets of the AuTexTification shared task at IberLEF 2023. This task aims to boost research on the detection of text generated automatically by text generation models. Participants must develop models that exploit clues about linguistic form and meaning to distinguish automatically generated text from human text.
This dataset includes the training and test splits with labels for all the subtasks and languages. Additionally, each file includes the domain, the model and the prompt used to generate each sample. The model label mapping for subtask 2 is: {"A": "bloom-1b7", "B": "bloom-3b", "C": "bloom-7b1", "D": "babbage", "E": "curie", "F": "text-davinci-003"}
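As a small usage sketch, the subtask 2 label mapping quoted above can be kept as a dictionary and used to translate letter labels into model names and back. The dictionary contents are copied from the description; the variable and function names are hypothetical.

```python
# Label mapping for subtask 2, as given in the dataset description.
MODEL_LABELS = {
    "A": "bloom-1b7",
    "B": "bloom-3b",
    "C": "bloom-7b1",
    "D": "babbage",
    "E": "curie",
    "F": "text-davinci-003",
}

# Reverse lookup: model name -> letter label.
NAME_TO_LABEL = {name: label for label, name in MODEL_LABELS.items()}


def decode_labels(labels):
    """Translate a sequence of letter labels into generator model names."""
    return [MODEL_LABELS[label] for label in labels]


print(decode_labels(["A", "F", "D"]))
# ['bloom-1b7', 'text-davinci-003', 'babbage']
```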
This is the landing page for Form 6180.71 US DOT Crossing Inventory data.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Techsalerator’s Business Technographic Data for Iran: Unlocking Insights into Iran's Technology Landscape
Techsalerator’s Business Technographic Data for Iran offers a comprehensive and detailed dataset crucial for businesses, market analysts, and technology vendors aiming to understand and engage with companies operating in Iran. This dataset provides in-depth insights into the technological environment, capturing and organizing information related to technology stacks, digital tools, and IT infrastructure used by businesses across the country.
Please reach out to us at info@techsalerator.com or visit Techsalerator Contact.
Company Name: This field lists the names of companies in Iran, allowing technology vendors to identify potential clients and enabling analysts to assess technology adoption trends within specific businesses.
Technology Stack: This field details the technologies and software solutions utilized by a company, such as ERP systems, CRM software, and cloud services. Understanding a company's technology stack is crucial for evaluating its digital maturity and operational requirements.
Deployment Status: This field indicates whether the technology is currently in use, planned for future implementation, or under evaluation. Vendors can use this information to gauge the level of technology adoption and interest among companies in Iran.
Industry Sector: This field specifies the industry in which the company operates, such as oil and gas, manufacturing, or finance. Knowledge of the industry helps vendors tailor their products to sector-specific needs and emerging trends in Iran.
Geographic Location: This field identifies the company's headquarters or primary operations within Iran. Geographic information supports regional analysis and helps understand localized technology adoption patterns across the country.
Oil and Gas Technology: Given Iran's significant role in the global oil and gas industry, there is a strong focus on advanced technologies such as exploration and production tools, seismic analysis software, and energy management systems.
Fintech Innovations: The financial technology sector is experiencing rapid growth, with businesses adopting digital payment solutions, mobile banking apps, and blockchain technologies to enhance financial transactions and services.
E-commerce Growth: The e-commerce sector in Iran is expanding, with companies increasingly leveraging online marketplaces, digital payment gateways, and logistics technology to improve customer reach and operational efficiency.
Cybersecurity: With the rise in digital transactions and online activities, there is a heightened emphasis on cybersecurity. Companies in Iran are investing in data protection solutions, encryption technologies, and secure communication systems to protect against cyber threats.
Smart Manufacturing: The push towards Industry 4.0 is evident in Iran, with companies adopting smart manufacturing technologies such as IoT-enabled machinery, automated production systems, and advanced data analytics to enhance operational efficiency.
National Iranian Oil Company (NIOC): As a major player in the oil and gas sector, NIOC utilizes advanced exploration and production technologies, digital asset management, and energy management solutions.
Bank Melli Iran: A leading financial institution, Bank Melli Iran is implementing digital banking services, mobile apps, and fintech solutions to enhance customer experience and streamline operations.
Digikala: Iran's largest e-commerce platform, Digikala, leverages sophisticated online shopping technologies, digital payment systems, and logistics solutions to serve a growing customer base.
Iran Telecommunications Company (TCI): TCI plays a critical role in providing telecommunication services, focusing on expanding its network infrastructure, improving connectivity, and investing in next-generation technologies.
Khorasan Industrial Group: A significant player in the manufacturing sector, Khorasan Industrial Group is adopting smart manufacturing technologies, automation, and data analytics to optimize production processes and improve product quality.
For those interested in accessing Techsalerator’s Business Technographic Data for Iran, please contact info@techsalerator.com with your specific requirements. Techsalerator offers customized quotes based on the number of data fields and records needed, with datasets available for delivery within 24 hours. Ongoing access options can also be arranged upon request.
We tested the hypothesis that phonosemantic iconicity (i.e., a motivated resonance of sound and meaning) might be found not only at the level of individual words or entire texts, but also in word combinations, such that the meaning of a target word is iconically expressed, or highlighted, in the phonetic properties of its immediate verbal context. To this end, we extracted single lines from German poems that all include a word designating high or low dominance, such as large or small, strong or weak, etc. Based on insights from previous studies, we expected to find more vowels with a relatively short distance between the first two formants (low formant dispersion) in the immediate context of words expressing high physical or social dominance than in the context of words expressing low dominance. Our findings support this hypothesis, suggesting that neighboring words can form iconic dyads in which the meaning of one word is sound-iconically reflected in the phonetic properties of adjacent words. The construct of a contiguity-based phonosemantic iconicity opens many avenues for future research well beyond lines extracted from poems.
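The abstract describes formant dispersion as the distance between the first two formants. A minimal sketch of that quantity, taken here simply as F2 minus F1 in Hz, is shown below. The exact operationalization used in the study is not given in this description, so the formula, the function names, and the formant values (rough placeholders, not measurements from the study) are all assumptions.

```python
from statistics import fmean


def formant_dispersion(f1_hz: float, f2_hz: float) -> float:
    """Distance between the first two formants, in Hz (assumed to be F2 - F1)."""
    return f2_hz - f1_hz


def mean_dispersion(vowels):
    """Average dispersion over a sequence of (F1, F2) pairs."""
    return fmean(formant_dispersion(f1, f2) for f1, f2 in vowels)


# Placeholder (F1, F2) pairs for two vowels; illustrative only.
context_vowels = [(700.0, 1200.0), (300.0, 2300.0)]
print(mean_dispersion(context_vowels))  # 1250.0
```

Under the hypothesis stated above, lines containing high-dominance words would be expected to show a lower average value than lines containing low-dominance words.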
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
AI-ready dataset for business partner analytics and visualization
This dataset provides standardized, enriched information on 2,000 of the largest global companies, based on publicly available data. It is designed to support both analytical workflows and AI applications, thanks to its comprehensive scope and semantically well-documented data fields.
Included are granular business partner attributes such as:
The data is structured according to the CDQ Business Partner Data Model, which ensures semantic consistency and traceability across jurisdictions.
This dataset demonstrates the power of structured, high-quality data in enabling business partner insights, AI-based enrichment, and compliance use cases. Created and shared by CDQ, a leading provider of trusted business partner data with a global knowledge base of over 200 million company records.