100+ datasets found
  1. Data from: CommonForms

    • huggingface.co
    Updated Oct 13, 2025
    Cite
    Joe Barrow (2025). CommonForms [Dataset]. https://huggingface.co/datasets/jbarrow/CommonForms
    Explore at:
    Dataset updated
    Oct 13, 2025
    Authors
    Joe Barrow
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    CommonForms: A Large, Diverse Dataset for Form Field Detection

    This repository hosts the CommonForms dataset, a web-scale dataset for form field detection, introduced in the paper CommonForms: A Large, Diverse Dataset for Form Field Detection. CommonForms casts the problem of form field detection as object detection: given an image of a page, predict the location and type (Text Input, Choice Button, Signature) of form fields. Key Features:

    Scale: Roughly 55,000 documents comprising… See the full description on the dataset page: https://huggingface.co/datasets/jbarrow/CommonForms.

  2. Replication data for: Dynamical systems implementation of intrinsic sentence...

    • dataverse.no
    • dataverse.azure.uit.no
    • +1 more
    png, txt
    Updated Sep 29, 2025
    Cite
    Hermann Moisl; Hermann Moisl (2025). Replication data for: Dynamical systems implementation of intrinsic sentence meaning [Dataset]. http://doi.org/10.18710/BNZRRU
    Explore at:
    Available download formats: txt(210), txt(125), txt(228), txt(7515), txt(15621), txt(15739), txt(208), txt(8215), txt(1732), png(68500), txt(22865), png(56982), txt(17395), txt(15401), txt(13302), png(67078), txt(12863), png(62987), png(62446), png(70286), txt(149), txt(82)
    Dataset updated
    Sep 29, 2025
    Dataset provided by
    DataverseNO
    Authors
    Hermann Moisl; Hermann Moisl
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The submitted data relate to sections 2.3 and 2.4 of: H. Moisl (2022) Dynamical systems implementation of intrinsic sentence meaning, Minds and Machines 32 (2022), which describe the processing architecture of the model of intrinsic sentence meaning proposed there. Six separate programs are used to generate the results presented in the article, whose interrelationships are described in the above-cited sections. The paper with which the data are associated proposes a model for implementation of intrinsic natural language sentence meaning in a physical language understanding system, where 'intrinsic' is understood as 'independent of meaning ascription by system-external observers'. The proposal is that intrinsic meaning can be implemented as a point attractor in the state space of a nonlinear dynamical system with feedback which is generated by temporally sequenced inputs. It is motivated by John Searle's well known (1980) critique of the then-standard and currently still influential Computational Theory of Mind (CTM), the essence of which was that CTM representations lack intrinsic meaning because that meaning is dependent on ascription by an observer. The proposed dynamical model comprises a collection of interacting artificial neural networks, and constitutes a radical simplification of the principle of compositional phrase structure which is at the heart of the current standard view of sentence semantics because it is computationally interpretable as a finite state machine.

  3. Success.ai | 150M+ B2B Employee Contact Data – Full Verified Profiles, 170M...

    • datarade.ai
    Updated Oct 12, 2024
    + more versions
    Cite
    Success.ai (2024). Success.ai | 150M+ B2B Employee Contact Data – Full Verified Profiles, 170M Work Emails & Phone Numbers, Global Dataset, Price & Quality Guarantee [Dataset]. https://datarade.ai/data-products/success-ai-150m-b2b-employee-contact-data-full-verified-success-ai
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Oct 12, 2024
    Dataset provided by
    Area covered
    France, Venezuela (Bolivarian Republic of), Togo, Turks and Caicos Islands, Philippines, Madagascar, Tajikistan, Slovenia, Bhutan, Andorra
    Description

    Success.ai offers a comprehensive, enterprise-ready B2B leads data solution, ideal for businesses seeking access to over 150 million verified employee profiles and 170 million work emails. Our data empowers organizations across industries to target key decision-makers, optimize recruitment, and fuel B2B marketing efforts. Whether you're looking for UK B2B data, B2B marketing data, or global B2B contact data, Success.ai provides the insights you need with pinpoint accuracy.

    Tailored for B2B Sales, Marketing, Recruitment and more: Our B2B contact data and B2B email data solutions are designed to enhance your lead generation, sales, and recruitment efforts. Build hyper-targeted lists based on job title, industry, seniority, and geographic location. Whether you’re reaching mid-level professionals or C-suite executives, Success.ai delivers the data you need to connect with the right people.

    Key Categories Served:

    • B2B sales leads – Identify decision-makers in key industries
    • B2B marketing data – Target professionals for your marketing campaigns
    • Recruitment data – Source top talent efficiently and reduce hiring times
    • CRM enrichment – Update and enhance your CRM with verified, updated data
    • Global reach – Coverage across 195 countries, including the United States, United Kingdom, Germany, India, Singapore, and more

    Global Coverage with Real-Time Accuracy: Success.ai’s dataset spans a wide range of industries such as technology, finance, healthcare, and manufacturing. With continuous real-time updates, your team can rely on the most accurate data available:

    • 150M+ Employee Profiles: Access professional profiles worldwide with insights including full name, job title, seniority, and industry.
    • 170M Verified Work Emails: Reach decision-makers directly with verified work emails, available across industries and geographies, including Singapore and UK B2B data.
    • GDPR-Compliant: Our data is fully compliant with GDPR and other global privacy regulations, ensuring safe and legal use of B2B marketing data.

    Key Data Points for Every Employee Profile: Every profile in Success.ai’s database includes over 20 critical data points, providing the information needed to power B2B sales and marketing campaigns: Full Name, Job Title, Company, Work Email, Location, Phone Number, LinkedIn Profile, Experience, Education, Technographic Data, Languages, Certifications, Industry, Publications & Awards.

    Use Cases Across Industries: Success.ai’s B2B data solution is incredibly versatile and can support various enterprise use cases, including:

    • B2B Marketing Campaigns: Reach high-value professionals in industries such as technology, finance, and healthcare.
    • Enterprise Sales Outreach: Build targeted B2B contact lists to improve sales efforts and increase conversions.
    • Talent Acquisition: Accelerate hiring by sourcing top talent with accurate and updated employee data, filtered by job title, industry, and location.
    • Market Research: Gain insights into employment trends and company profiles to enrich market research.
    • CRM Data Enrichment: Ensure your CRM stays accurate by integrating updated B2B contact data.
    • Event Targeting: Create lists for webinars, conferences, and product launches by targeting professionals in key industries.

    Use Cases for Success.ai's Contact Data:

    • Targeted B2B Marketing: Create precise campaigns by targeting key professionals in industries like tech and finance.
    • Sales Outreach: Build focused sales lists of decision-makers and C-suite executives for faster deal cycles.
    • Recruiting Top Talent: Easily find and hire qualified professionals with updated employee profiles.
    • CRM Enrichment: Keep your CRM current with verified, accurate employee data.
    • Event Targeting: Create attendee lists for events by targeting relevant professionals in key sectors.
    • Market Research: Gain insights into employment trends and company profiles for better business decisions.
    • Executive Search: Source senior executives and leaders for headhunting and recruitment.
    • Partnership Building: Find the right companies and key people to develop strategic partnerships.

    Why Choose Success.ai’s Employee Data? Success.ai is the top choice for enterprises looking for comprehensive and affordable B2B data solutions. Here’s why:

    • Unmatched Accuracy: Our AI-powered validation process ensures 99% accuracy across all data points, resulting in higher engagement and fewer bounces.
    • Global Scale: With 150M+ employee profiles and 170M verified work emails, Success.ai provides extensive coverage for UK B2B data, B2B marketing data, and global contacts.
    • Competitive Pricing: We offer the most competitive rates on the market, undercutting major competitors like Lusha, Cognism, and ZoomInfo.
    • Tailored Solutions: Our white-glove service ensures we deliver exactly what you need, in the format that suits your workflow (CSV, Excel, etc.).
    • Real-Time Updates: Our data is continuously updated, so you always have the latest information, unlike static da...

  4. Worldwide Fulltext Usage of Data Astrophysics Data System in 2010

    • dataverse.harvard.edu
    • data.niaid.nih.gov
    Updated Oct 22, 2013
    + more versions
    Cite
    SAO/NASA Astrophysics Data System (2013). Worldwide Fulltext Usage of Data Astrophysics Data System in 2010 [Dataset]. http://doi.org/10.7910/DVN/22950
    Explore at:
    Dataset updated
    Oct 22, 2013
    Dataset provided by
    Harvard Dataverse
    Authors
    SAO/NASA Astrophysics Data System
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/22950

    Dataset funded by
    NASA (http://nasa.gov/)
    Description

    The data contained in these files (one in Excel, the other in JSON format) consist of full text download numbers through the ADS during the year 2010. Every row is a journal, indicated by the journal name and the ADS abbreviation ("bibstem", see: http://adsabs.harvard.edu/abs_doc/journals2.html). For each journal, we present the download numbers split up by publication year (with the first data column being the range "pre 1998"). Full text downloads within the ADS service are defined as 'clicks' on any of the links within an ADS record that provide access to full text in one form or another. Specifically, these are the 'E', 'F', 'L', 'G' or 'X' links (see the definitions at http://doc.adsabs.harvard.edu/abs_doc/help_pages/results.html#List_of_Links). The data contained in these files have been released under the CC-BY License (see: http://creativecommons.org/licenses/by/3.0/us/). Please acknowledge the ADS in any publication that makes use of these data with the phrase: "This research has made use of NASA's Astrophysics Data System."

  5. Synthetic AR Medical Dataset with Realistic Denial

    • kaggle.com
    zip
    Updated Aug 31, 2025
    Cite
    Abuthahir1998 (2025). Synthetic AR Medical Dataset with Realistic Denial [Dataset]. https://www.kaggle.com/datasets/abuthahir1998/synthetic-ar-medical-dataset-with-realistic-denial
    Explore at:
    Available download formats: zip(13843 bytes)
    Dataset updated
    Aug 31, 2025
    Authors
    Abuthahir1998
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Subtitle

    A fully synthetic dataset simulating real-world medical billing scenarios, including claim status, denials, team allocation, and AR follow-up logic.

    Description

    This dataset represents a synthetic Account Receivable (AR) data model for medical billing, created using realistic healthcare revenue cycle management (RCM) workflows. It is designed for data analysis, machine learning modeling, automation testing, and process simulation in the healthcare billing domain.

    The dataset includes realistic business logic, mimicking the actual process of claim submission, denial management, follow-ups, and payment tracking. This is especially useful for:

    ✔ Medical billing training
    ✔ Predictive modeling (claim outcomes, denial prediction, payment forecasting)
    ✔ RCM process automation and AI research
    ✔ Data visualization and dashboard creation

    Key Features of This Dataset

    Patient & Claim Information:

    • Visit ID: Unique alphanumeric ID in the format XXXXXZXXXXXX
    • Patient Name: Randomly generated names
    • Date of Service (DOS): In MM/DD/YYYY format
    • Aging Days: Calculated as Today - DOS
    • Aging Bucket: Categorized as 0-30, 31-60, 61-90, 91-120, 120+
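
The aging rules in this list can be sketched as two small helpers. This is a minimal illustration, not the dataset author's code: the function names `aging_days` and `aging_bucket` are hypothetical, and because the listed buckets "91-120" and "120+" overlap at exactly 120 days, the sketch assumes 120 falls in "91-120".

```python
from datetime import date, datetime


def aging_days(dos: str, today: date) -> int:
    """Days elapsed between the Date of Service (MM/DD/YYYY) and `today`."""
    service_date = datetime.strptime(dos, "%m/%d/%Y").date()
    return (today - service_date).days


def aging_bucket(days: int) -> str:
    """Map an aging-day count to one of the buckets named in the listing."""
    if days < 0:
        raise ValueError("Date of Service is in the future")
    if days <= 30:
        return "0-30"
    if days <= 60:
        return "31-60"
    if days <= 90:
        return "61-90"
    if days <= 120:  # assumption: exactly 120 days belongs to "91-120"
        return "91-120"
    return "120+"
```

Passing `today` explicitly, rather than calling `date.today()` inside the helper, keeps the calculation reproducible when the data is re-analyzed later.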

    Claim Status & Denial Logic:

    • Status Column: Indicates whether response received or not
    • If No Response → Simulates a follow-up call → Claim may result in denial
    • Status Code: Reflects actual denial reason (e.g., Dx inconsistent with CPT)
    • Action Code: Required follow-up action (e.g., Need Coding Assistance)
    • Team Allocation: Based on denial type

      • Coding-related denial → Coding Team
      • Submission/Claim-related denial → Billing Team
      • Payment-related denial → Payment Team
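
One way to express this allocation rule is a pair of lookup tables. The sketch below is an assumption-laden illustration: the grouping of denial reasons uses only the example reasons named in this description, the category keys and the `allocate_team` function are hypothetical, and the real dataset may contain status codes not covered here.

```python
from typing import Optional

# Example denial reasons from the description, grouped by denial type (assumed mapping).
DENIAL_CATEGORY = {
    "Dx inconsistent with CPT": "coding",
    "Missing Modifier": "coding",
    "Duplicate Claim": "claim",
    "Invalid Subscriber ID": "claim",
    "Allowed Amount Paid": "payment",
    "No Coverage": "payment",
}

# Denial type to responsible team, as stated in the listing.
TEAM_FOR_CATEGORY = {
    "coding": "Coding Team",
    "claim": "Billing Team",
    "payment": "Payment Team",
}


def allocate_team(status_code: str) -> Optional[str]:
    """Return the responsible team for a status code, or None if it is unknown."""
    category = DENIAL_CATEGORY.get(status_code)
    if category is None:
        return None
    return TEAM_FOR_CATEGORY[category]
```

Returning `None` for unrecognized codes makes gaps in the mapping visible instead of silently routing a claim to a default team.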

    Realistic Denial Scenarios Covered:

    • Coding Errors (Dx inconsistent with CPT, Missing Modifier)
    • Claim Issues (Duplicate Claim, Invalid Subscriber ID)
    • Payment Issues (Allowed Amount Paid, No Coverage)

    Other Important Columns:

    • Claim Amount, Paid Amount, Balance
    • Insurance Details (Primary, Secondary, Tertiary)
    • Notes explaining denial or next steps

    Columns in the Dataset

    | Column Name | Description |
    |:---|:---|
    | Client | Name of the client/provider |
    | State | US State where service provided |
    | Visit ID# | Unique alphanumeric ID (XXXXXZXXXXXX) |
    | Patient Name | Patient’s full name |
    | DOS | Date of Service (MM/DD/YYYY) |
    | Aging Days | Days from DOS to today |
    | Aging Bucket | Aging category |
    | Claim Amount | Original claim billed |
    | Paid Amount | Amount paid so far |
    | Balance | Remaining balance |
    | Status | Initial claim status (No Response, Paid, etc.) |
    | Status Code | Actual reason (e.g., Dx inconsistent with CPT) |
    | Action Code | Next step (e.g., Need Coding Assistance) |
    | Team Allocation | Responsible team (Coding, Billing, Payment) |
    | Notes | Follow-up notes |

    Data Generation Rules Applied

    • Date format: MM/DD/YYYY
    • Aging Days: Calculated dynamically based on DOS
    • Visit ID: Always follows the XXXXXZXXXXXX format
    • Denial Workflow:

      • If claim denied → Status Code & Action Code updated
      • Team allocation based on denial type
    • Payments: Realistic logic where payment may be partial, full, or none

    • Insurance Flow: Balance moves from primary → secondary → tertiary → patient responsibility

    Use Cases

    • Predictive modeling for claim outcome
    • Identifying high-risk claims for early intervention
    • Denial pattern analysis for improving first-pass resolution rate
    • Building RCM dashboards and AR management tools

    License

    CC BY 4.0 – Free to use, modify, and share with attribution.

  6. Boat Registration Data Access Form

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jan 25, 2025
    + more versions
    Cite
    data.austintexas.gov (2025). Boat Registration Data Access Form [Dataset]. https://catalog.data.gov/dataset/boat-registration-data-access-form
    Explore at:
    Dataset updated
    Jan 25, 2025
    Dataset provided by
    data.austintexas.gov
    Description

    Texas Parks and Wildlife Department boat registration data access form

  7. Full information/data on DFID aid projects - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated May 30, 2012
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2012). Full information/data on DFID aid projects - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/info-dfid-aid-projects
    Explore at:
    Dataset updated
    May 30, 2012
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    DFID publish the following information on a monthly basis: Details on international development projects including budgets and financial transactions. Information on sectors, geographical location (including sub-national geolocations). All core project documentation, including business cases, annual reviews, completion reports, and evaluations. In addition we will publish country plans and formal agreements with partners. Project summaries are published and translated into major local languages of the relevant countries. Further data is being gathered on sub-national locations and results, and will be released as it becomes available. The data is published in an open, accessible format, using the International Aid Transparency Initiative (IATI) open standard, in XML format. Data is also presented in a more visual form on the Development Tracker. A small number of exclusions will apply to sensitive information, based on the key principles of the UK’s Freedom of Information Act.

  8. Application Form - Dataset - Open Government Data Portal

    • opendata.gov.jo
    Updated Jun 1, 2023
    Cite
    (2023). Application Form - Dataset - Open Government Data Portal [Dataset]. https://opendata.gov.jo/dataset/application-form-2527-2023
    Explore at:
    Dataset updated
    Jun 1, 2023
    Description

    Application Form

  9. Small Business Contact Data | North American Small Business Owners |...

    • datarade.ai
    Updated Oct 27, 2021
    Cite
    Success.ai (2021). Small Business Contact Data | North American Small Business Owners | Verified Contact Details from 170M Profiles | Best Price Guaranteed [Dataset]. https://datarade.ai/data-products/small-business-contact-data-north-american-small-business-o-success-ai
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Oct 27, 2021
    Dataset provided by
    Success.ai
    Area covered
    Honduras, Costa Rica, Guatemala, United States of America, Panama, Greenland, Saint Pierre and Miquelon, Mexico, Bermuda, Belize
    Description

    Access B2B Contact Data for North American Small Business Owners with Success.ai—your go-to provider for verified, high-quality business datasets. This dataset is tailored for businesses, agencies, and professionals seeking direct access to decision-makers within the small business ecosystem across North America. With over 170 million professional profiles, it’s an unparalleled resource for powering your marketing, sales, and lead generation efforts.

    Key Features of the Dataset:

    Verified Contact Details

    Includes accurate and up-to-date email addresses and phone numbers to ensure you reach your targets reliably.

    AI-validated for 99% accuracy, eliminating errors and reducing wasted efforts.

    Detailed Professional Insights

    Comprehensive data points include job titles, skills, work experience, and education to enable precise segmentation and targeting.

    Enriched with insights into decision-making roles, helping you connect directly with small business owners, CEOs, and other key stakeholders.

    Business-Specific Information

    Covers essential details such as industry, company size, location, and more, enabling you to tailor your campaigns effectively. Ideal for profiling and understanding the unique needs of small businesses.

    Continuously Updated Data

    Our dataset is maintained and updated regularly to ensure relevance and accuracy in fast-changing market conditions. New business contacts are added frequently, helping you stay ahead of the competition.

    Why Choose Success.ai?

    At Success.ai, we understand the critical importance of high-quality data for your business success. Here’s why our dataset stands out:

    Tailored for Small Business Engagement: Focused specifically on North American small business owners, this dataset is an invaluable resource for building relationships with SMEs (Small and Medium Enterprises). Whether you’re targeting startups, local businesses, or established small enterprises, our dataset has you covered.

    Comprehensive Coverage Across North America: Spanning the United States, Canada, and Mexico, our dataset ensures wide-reaching access to verified small business contacts in the region.

    Categories Tailored to Your Needs: Includes highly relevant categories such as Small Business Contact Data, CEO Contact Data, B2B Contact Data, and Email Address Data to match your marketing and sales strategies.

    Customizable and Flexible: Choose from a wide range of filtering options to create datasets that meet your exact specifications, including filtering by industry, company size, geographic location, and more.

    Best Price Guaranteed: We pride ourselves on offering the most competitive rates without compromising on quality. When you partner with Success.ai, you receive superior data at the best value.

    Seamless Integration: Delivered in formats that integrate effortlessly with your CRM, marketing automation, or sales platforms, so you can start acting on the data immediately.

    Use Cases: This dataset empowers you to:

    • Drive Sales Growth: Build and refine your sales pipeline by connecting directly with decision-makers in small businesses.
    • Optimize Marketing Campaigns: Launch highly targeted email and phone outreach campaigns with verified contact data.
    • Expand Your Network: Leverage the dataset to build relationships with small business owners and other key figures within the B2B landscape.
    • Improve Data Accuracy: Enhance your existing databases with verified, enriched contact information, reducing bounce rates and increasing ROI.

    Industries Served: Whether you're in B2B SaaS, digital marketing, consulting, or any field requiring accurate and targeted contact data, this dataset serves industries of all kinds. It is especially useful for professionals focused on:

    • Lead Generation
    • Business Development
    • Market Research
    • Sales Outreach
    • Customer Acquisition

    What’s Included in the Dataset: Each profile provides:

    • Full Name
    • Verified Email Address
    • Phone Number (where available)
    • Job Title
    • Company Name
    • Industry
    • Company Size
    • Location
    • Skills and Professional Experience
    • Education Background

    With over 170 million profiles, you can tap into a wealth of opportunities to expand your reach and grow your business.

    Why High-Quality Contact Data Matters: Accurate, verified contact data is the foundation of any successful B2B strategy. Reaching small business owners and decision-makers directly ensures your message lands where it matters most, reducing costs and improving the effectiveness of your campaigns. By choosing Success.ai, you ensure that every contact in your pipeline is a genuine opportunity.

    Partner with Success.ai for Better Data, Better Results: Success.ai is committed to delivering premium-quality B2B data solutions at scale. With our small business owner dataset, you can unlock the potential of North America's dynamic small business market.

    Get Started Today Request a sample or customize your dataset to fit your unique...

  10. Data from: Form Table Dataset

    • universe.roboflow.com
    zip
    Updated May 29, 2024
    + more versions
    Cite
    Raijin (2024). Form Table Dataset [Dataset]. https://universe.roboflow.com/raijin/form-table
    Explore at:
    Available download formats: zip
    Dataset updated
    May 29, 2024
    Dataset authored and provided by
    Raijin
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Form Table Bounding Boxes
    Description

    Form Table

    ## Overview
    
    Form Table is a dataset for object detection tasks - it contains Form Table annotations for 2,106 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  11. State map 1:5000 new form vector data - Rájec 1-8

    • data.europa.eu
    + more versions
    Cite
    State map 1:5000 new form vector data - Rájec 1-8 [Dataset]. https://data.europa.eu/data/datasets/cz-cuzk-sm5-v-raje18
    Explore at:
    Description

    The product is a new design of the State Map at a scale of 1:5,000 (SM 5) in vector form, whose advantages are its currency and colour rendering. The map contains planimetry based on the cadastral map, altimetry adopted from the altimetry part of ZABAGED, and map lettering based on the Geonames database of geographic names together with feature-type abbreviations derived from the attributes of selected ZABAGED features. This new design of the SM 5 is regenerated once a year for the part of the Czech territory where the vector form of the cadastral map is available. As a result, some export units (map sheets of SM 5) do not have full coverage, and the price of such an export unit is reduced proportionally.

  12. Dataset-classified-data

    • kaggle.com
    zip
    Updated Jun 24, 2023
    Cite
    Kapil Nahariya (2023). Dataset-classified-data [Dataset]. https://www.kaggle.com/datasets/kapilnahariya/dataset-for-unknown-data
    Explore at:
    Available download formats: zip(91451 bytes)
    Dataset updated
    Jun 24, 2023
    Authors
    Kapil Nahariya
    Description

    You've been given a classified data set from a company! They've hidden the feature column names but have given you the data and the target classes.

    We'll try to use ML to create a model that directly predicts a class for a new data point based off of the features.

    Let's grab it and use it!

  13. BOREAS AES Five-day Averaged Surface Meteorological and Upper Air Data

    • data.nasa.gov
    • data.globalchange.gov
    • +9 more
    Updated Apr 1, 2025
    + more versions
    Cite
    nasa.gov (2025). BOREAS AES Five-day Averaged Surface Meteorological and Upper Air Data [Dataset]. https://data.nasa.gov/dataset/boreas-aes-five-day-averaged-surface-meteorological-and-upper-air-data-223f7
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The Canadian Atmospheric Environment Service (AES) provided BOREAS with hourly and daily surface meteorological data from 23 of the AES meteorological stations located across Canada and upper air data from 1 station at The Pas, Manitoba. Due to copyright restrictions on the full resolution surface meteorological data, this data set contains 5-day average values for the surface parameters. The upper air data are provided in their full resolution form. The 5-day averaging was performed in order to create a data set that could be publicly distributed at no cost. Temporally, the surface meteorological data cover the period of January 1975 to December 1996 and the upper air data cover the period of January 1961 to November 1996.
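
The 5-day averaging described above can be illustrated with a short sketch. The description does not say how the averaging windows are aligned or how missing observations are handled, so the code below assumes consecutive, non-overlapping blocks of five daily values and drops any incomplete trailing block. The function name `five_day_means` is hypothetical.

```python
from statistics import fmean


def five_day_means(daily_values, window=5):
    """Average consecutive, non-overlapping blocks of `window` daily values.

    Assumption: an incomplete trailing block is dropped rather than averaged.
    """
    means = []
    for start in range(0, len(daily_values) - window + 1, window):
        block = daily_values[start:start + window]
        means.append(fmean(block))
    return means
```

For example, eleven daily temperatures yield two 5-day means, and the eleventh value is ignored under the stated assumption.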

  14. US Industry Data by State, by Industry

    • kaggle.com
    zip
    Updated Jan 15, 2023
    Cite
    The Devastator (2023). US Industry Data by State, by Industry [Dataset]. https://www.kaggle.com/datasets/thedevastator/2012-us-industry-data-by-state-by-industry
    Explore at:
    Available download formats: zip(53066 bytes)
    Dataset updated
    Jan 15, 2023
    Authors
    The Devastator
    Area covered
    United States
    Description

    US Industry Data by State, by Industry

    Number of Establishments, Sales, Payroll, and Employees

    By Gary Hoover [source]

    About this dataset

    This data set provides a detailed look into the US economy. It includes information on establishments and nonemployer businesses, as well as sales revenue, payrolls, and the number of employees. Gleaned from the Economic Census done every five years, this data is a valuable resource to anyone curious about where the nation was economically at the time. With columns including geographic area name, North American Industry Classification System (NAICS) codes for industries, descriptions of those codes, the meaning of each type of operation or tax status code, and annual payroll, this information-rich dataset contains what you need to track economic trends over time. Whether you’re a researcher studying industry patterns or an entrepreneur looking for market insight, this dataset has what you’re looking for!

    How to use the dataset

    This dataset provides detailed US industry data by state, including the number of establishments, value of sales, payroll, and number of employees. All the data is based on the North American Industry Classification System (NAICS) code for each specific industry. This will allow you to easily analyze and compare industries across different states or regions.

    Research Ideas

    • Analyzing the economic impact of a new business or industry trends in different states: Comparing the change in the number of establishments, payroll, and employees over time can give insight into how a state is affected by a new industry trend or introduction of a new service or product.
    • Estimating customer sales potential for businesses: This dataset can be used to estimate the potential customer base for businesses in different geographic areas. Analyzing the total business done by nonemployers in an area, along with its estimated population, can help estimate how much overall sales potential exists for a given region.
    • Tracking competitor performance: By looking at shipments, receipts, and value of business done across industries in different regions or even cities, companies can track their competitors’ performance and compare it to their own to better assess their strategies going forward.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: Dataset copyright by authors

    You are free to:
    • Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt: remix, transform, and build upon the material for any purpose, even commercially.

    You must:
    • Give appropriate credit.
    • Provide a link to the license, and indicate if changes were made.
    • ShareAlike: You must distribute your contributions under the same license as the original.
    • Keep intact all notices that refer to this license, including copyright notices.

    Columns

    File: 2012 Industry Data by Industry and State.csv

    | Column name | Description |
    |:------------|:------------|
    | Geographic area name | The name of the geographic area the data is for. (String) |
    | NAICS code | The North American Industry Classification System (NAICS) code for the industry. (String) |
    | Meaning of NAICS code | The description of the NAICS code. (String) |
    | Meaning of Type of operation or tax status code | The description of the type of operation or tax status code. (String) |

    ...

  15. Railroad Equipment Accident/Incident Source Data (Form 54)

    • data.virginia.gov
    • data.transportation.gov
    • +1more
    csv, json, rdf, xsl
    Updated Oct 15, 2025
    Cite
    U.S Department of Transportation (2025). Railroad Equipment Accident/Incident Source Data (Form 54) [Dataset]. https://data.virginia.gov/dataset/railroad-equipment-accident-incident-source-data-form-54
    Available download formats: xsl, rdf, csv, json
    Dataset updated
    Oct 15, 2025
    Dataset provided by
    Federal Railroad Administration (http://www.fra.dot.gov/)
    Authors
    U.S Department of Transportation
    Description

    Rail equipment accidents/incidents, collisions, derailments, fires, explosions, acts of God, or other events involving the operation of railroad on-track equipment (standing or moving) and causing reportable damages greater than the reporting threshold for the year in which the accident/incident occurred, must be reported by railroads to the FRA on Form FRA 6180.54 - Rail Equipment Accident/Incident.

    This dataset is the source dataset and contains raw data values. It replaced the legacy data download (https://safetydata.fra.dot.gov/OfficeofSafety/publicsite/on_the_fly_download.aspx). To download the data in a user-friendly, human-readable format, see https://data.transportation.gov/Railroads/Rail-Equipment-Accident-Incident-Data/85tf-25kj.

    The data dictionary can be found here: https://datahub.transportation.gov/api/views/aqxq-n5hy/files/ea00a728-94b0-43e7-8c11-8481f13170a7?download=true&filename=accfile_EFFECTIVE_060111%20(8).pdf.

    For information on how to filter and export data, please visit: https://data.transportation.gov/stories/s/Download-Export-and-Print-User-Guide/s8hj-vns8/.

    To view the data release schedule, please visit: https://data.transportation.gov/stories/s/Data-Release-Schedule/qfc9-tapk/.

  16. Data from: AuTexTification Dataset (Full data)

    • data.niaid.nih.gov
    • portalcientifico.universidadeuropea.com
    • +1more
    Updated May 22, 2023
    Cite
    Areg Sarvazyan; José Ángel González; Marc Franco; Francisco Manuel Rangel; María Alberta Chulvi; Paolo Rosso (2023). AuTexTification Dataset (Full data) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7956206
    Dataset updated
    May 22, 2023
    Dataset provided by
    Universitat Politècnica de València
    Symanto
    Authors
    Areg Sarvazyan; José Ángel González; Marc Franco; Francisco Manuel Rangel; María Alberta Chulvi; Paolo Rosso
    Description

    Datasets of the AuTexTification shared task at IberLEF 2023. This task aims to boost research on the detection of text generated automatically by text generation models. Participants must develop models that exploit clues about linguistic form and meaning to distinguish automatically generated text from human text.

    This dataset includes the training and test splits with labels for all the subtasks and languages. Additionally, each file includes the domain, the model and the prompt used to generate each sample. The model label mapping for subtask 2 is: {"A": "bloom-1b7", "B": "bloom-3b", "C": "bloom-7b1", "D": "babbage", "E": "curie", "F": "text-davinci-003"}
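As a minimal sketch, the subtask 2 label mapping quoted above can be kept as a Python dictionary for decoding labels when inspecting the data. The helper name `decode_model_label` is hypothetical and not part of the dataset or the shared task tooling.

```python
# Label mapping for subtask 2, copied from the dataset description.
MODEL_LABELS = {
    "A": "bloom-1b7",
    "B": "bloom-3b",
    "C": "bloom-7b1",
    "D": "babbage",
    "E": "curie",
    "F": "text-davinci-003",
}


def decode_model_label(label: str) -> str:
    """Return the generator model name for a subtask 2 label (hypothetical helper)."""
    try:
        return MODEL_LABELS[label]
    except KeyError:
        raise ValueError(f"Unknown subtask 2 label: {label!r}") from None


print(decode_model_label("C"))
```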

  17. Form 71 Data Downloads

    • catalog.data.gov
    • odgavaprod.ogopendata.com
    • +1more
    Updated Nov 22, 2025
    Cite
    Federal Railroad Administration (2025). Form 71 Data Downloads [Dataset]. https://catalog.data.gov/dataset/form-71-data-downloads
    Dataset updated
    Nov 22, 2025
    Dataset provided by
    Federal Railroad Administration (http://www.fra.dot.gov/)
    Description

    This is the landing page for Form 6180.71 US DOT Crossing Inventory data.

  18. B2B Technographic Data in Iran

    • kaggle.com
    zip
    Updated Sep 13, 2024
    Cite
    Techsalerator (2024). B2B Technographic Data in Iran [Dataset]. https://www.kaggle.com/datasets/techsalerator/b2b-technographic-data-in-iran
    Available download formats: zip (12108 bytes)
    Dataset updated
    Sep 13, 2024
    Authors
    Techsalerator
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Area covered
    Iran
    Description

    Techsalerator’s Business Technographic Data for Iran: Unlocking Insights into Iran's Technology Landscape

    Techsalerator’s Business Technographic Data for Iran offers a comprehensive and detailed dataset crucial for businesses, market analysts, and technology vendors aiming to understand and engage with companies operating in Iran. This dataset provides in-depth insights into the technological environment, capturing and organizing information related to technology stacks, digital tools, and IT infrastructure used by businesses across the country.

    Please reach out to us at info@techsalerator.com or visit Techsalerator Contact.

    Top 5 Most Utilized Data Fields

    • Company Name: This field lists the names of companies in Iran, allowing technology vendors to identify potential clients and enabling analysts to assess technology adoption trends within specific businesses.

    • Technology Stack: This field details the technologies and software solutions utilized by a company, such as ERP systems, CRM software, and cloud services. Understanding a company's technology stack is crucial for evaluating its digital maturity and operational requirements.

    • Deployment Status: This field indicates whether the technology is currently in use, planned for future implementation, or under evaluation. Vendors can use this information to gauge the level of technology adoption and interest among companies in Iran.

    • Industry Sector: This field specifies the industry in which the company operates, such as oil and gas, manufacturing, or finance. Knowledge of the industry helps vendors tailor their products to sector-specific needs and emerging trends in Iran.

    • Geographic Location: This field identifies the company's headquarters or primary operations within Iran. Geographic information supports regional analysis and helps understand localized technology adoption patterns across the country.

    Top 5 Technology Trends in Iran

    • Oil and Gas Technology: Given Iran's significant role in the global oil and gas industry, there is a strong focus on advanced technologies such as exploration and production tools, seismic analysis software, and energy management systems.

    • Fintech Innovations: The financial technology sector is experiencing rapid growth, with businesses adopting digital payment solutions, mobile banking apps, and blockchain technologies to enhance financial transactions and services.

    • E-commerce Growth: The e-commerce sector in Iran is expanding, with companies increasingly leveraging online marketplaces, digital payment gateways, and logistics technology to improve customer reach and operational efficiency.

    • Cybersecurity: With the rise in digital transactions and online activities, there is a heightened emphasis on cybersecurity. Companies in Iran are investing in data protection solutions, encryption technologies, and secure communication systems to protect against cyber threats.

    • Smart Manufacturing: The push towards Industry 4.0 is evident in Iran, with companies adopting smart manufacturing technologies such as IoT-enabled machinery, automated production systems, and advanced data analytics to enhance operational efficiency.

    Top 5 Companies with Notable Technographic Data in Iran

    • National Iranian Oil Company (NIOC): As a major player in the oil and gas sector, NIOC utilizes advanced exploration and production technologies, digital asset management, and energy management solutions.

    • Bank Melli Iran: A leading financial institution, Bank Melli Iran is implementing digital banking services, mobile apps, and fintech solutions to enhance customer experience and streamline operations.

    • Digikala: Iran's largest e-commerce platform, Digikala, leverages sophisticated online shopping technologies, digital payment systems, and logistics solutions to serve a growing customer base.

    • Iran Telecommunications Company (TCI): TCI plays a critical role in providing telecommunication services, focusing on expanding its network infrastructure, improving connectivity, and investing in next-generation technologies.

    • Khorasan Industrial Group: A significant player in the manufacturing sector, Khorasan Industrial Group is adopting smart manufacturing technologies, automation, and data analytics to optimize production processes and improve product quality.

    Accessing Techsalerator’s Business Technographic Data

    For those interested in accessing Techsalerator’s Business Technographic Data for Iran, please contact info@techsalerator.com with your specific requirements. Techsalerator offers customized quotes based on the number of data fields and records needed, with datasets available for delivery within 24 hours. Ongoing access options can also be arranged upon request.

    Included Data Fields

    • Company Name
    • Technology Stack
    • Deployment Status
    • Ind...
  19. Data from: Contiguity-based sound iconicity: The meaning of words resonates...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 16, 2019
    Cite
    Auracher, Jan; Scharinger, Mathias; Menninghaus, Winfried (2019). Contiguity-based sound iconicity: The meaning of words resonates with phonetic properties of their immediate verbal contexts [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000109877
    Dataset updated
    May 16, 2019
    Authors
    Auracher, Jan; Scharinger, Mathias; Menninghaus, Winfried
    Description

    We tested the hypothesis that phonosemantic iconicity (i.e., a motivated resonance of sound and meaning) might be found not only at the level of individual words or entire texts, but also in word combinations, such that the meaning of a target word is iconically expressed, or highlighted, in the phonetic properties of its immediate verbal context. To this end, we extracted single lines from German poems that all include a word designating high or low dominance, such as large or small, strong or weak, etc. Based on insights from previous studies, we expected to find more vowels with a relatively short distance between the first two formants (low formant dispersion) in the immediate context of words expressing high physical or social dominance than in the context of words expressing low dominance. Our findings support this hypothesis, suggesting that neighboring words can form iconic dyads in which the meaning of one word is sound-iconically reflected in the phonetic properties of adjacent words. The construct of a contiguity-based phono-semantic iconicity opens many avenues for future research well beyond lines extracted from poems.

  20. Global 2000 Companies (2025)

    • kaggle.com
    zip
    Updated Jul 17, 2025
    Cite
    CDQ AG (2025). Global 2000 Companies (2025) [Dataset]. https://www.kaggle.com/datasets/cdq-ag/global-2000-companies-2025/code
    Available download formats: zip (1090091 bytes)
    Dataset updated
    Jul 17, 2025
    Dataset authored and provided by
    CDQ AG
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    AI-ready dataset for business partner analytics and visualization

    This dataset provides standardized, enriched information on 2,000 of the largest global companies, based on publicly available data. It is designed to support both analytical workflows and AI applications, thanks to its comprehensive scope and semantically well-documented data fields.

    Included are granular business partner attributes such as:

    • Legal and international names
    • Legal form and identifiers (e.g., EIN, LEI)
    • Registered and headquarter addresses with geographic coordinates
    • Standardized industry classifications (NACE, NAICS, SIC)
    • Key financials for 2024 (Revenue, Profit, Assets, Market Capitalization)
    • Narrative company profiles

    The data is structured according to the CDQ Business Partner Data Model, which ensures semantic consistency and traceability across jurisdictions.

    This dataset demonstrates the power of structured, high-quality data in enabling business partner insights, AI-based enrichment, and compliance use cases. Created and shared by CDQ, a leading provider of trusted business partner data with a global knowledge base of over 200 million company records.

Cite
Joe Barrow (2025). CommonForms [Dataset]. https://huggingface.co/datasets/jbarrow/CommonForms

Data from: CommonForms

jbarrow/CommonForms

91 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Oct 13, 2025
Authors
Joe Barrow
License

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

Description

CommonForms: A Large, Diverse Dataset for Form Field Detection

This repository hosts the CommonForms dataset, a web-scale dataset for form field detection, introduced in the paper CommonForms: A Large, Diverse Dataset for Form Field Detection. CommonForms casts the problem of form field detection as object detection: given an image of a page, predict the location and type (Text Input, Choice Button, Signature) of form fields. Key Features:

Scale: Roughly 55,000 documents comprising… See the full description on the dataset page: https://huggingface.co/datasets/jbarrow/CommonForms.
