100+ datasets found
  1. Open Data Documentation

    • data.ca.gov
    • data.cnra.ca.gov
    • +3 more
    pdf
    Updated Apr 26, 2021
    Cite
    California Department of Parks and Recreation (2021). Open Data Documentation [Dataset]. https://data.ca.gov/dataset/open-data-documentation
    Explore at:
    Available download formats: pdf
    Dataset updated
    Apr 26, 2021
    Dataset provided by
    California State Parks: https://www.parks.ca.gov/
    Authors
    California Department of Parks and Recreation
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Useful information and links for navigating this site and for understanding and using Open Data.

  2. Invasive Plant Inventory at San Pablo Bay National Wildlife Refuge- Data...

    • catalog.data.gov
    • datasets.ai
    Updated Nov 25, 2025
    + more versions
    Cite
    U.S. Fish and Wildlife Service (2025). Invasive Plant Inventory at San Pablo Bay National Wildlife Refuge- Data Documentation [Dataset]. https://catalog.data.gov/dataset/invasive-plant-inventory-at-san-pablo-bay-national-wildlife-refuge-data-documentation
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset provided by
    U.S. Fish and Wildlife Service: http://www.fws.gov/
    Description

    In 2013, an inventory of priority invasive plant species in priority areas was conducted at San Pablo Bay National Wildlife Refuge. Results from this effort will inform the development of invasive plant management objectives and strategies, and serve as a baseline for assessing change in the status of invasive plant distribution or abundance over time.

  3. The Clarity Software Documentation Dataset

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    Updated Jan 6, 2022
    Cite
    Anonymous Authors (2022). The Clarity Software Documentation Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5821839
    Explore at:
    Dataset updated
    Jan 6, 2022
    Dataset provided by
    Anonymous
    Authors
    Anonymous Authors
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository holds the Clarity Dataset, a companion to the SANER'22 paper entitled "An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation". The dataset consists of 45,998 captions, 10,204 GUI screenshots, and XML metadata files (akin to the "HTML" that specifies GUIs) of Android applications. The natural-language (NL) captions were obtained from human labelers, underwent several quality control mechanisms, and contain both high-level (screen) and low-level (component) descriptions of screen functionality. This dataset is meant as a new source of data to augment techniques for software documentation that can take advantage of the rich pixel-based information contained within screenshots.

  4. Radio Science Documentation Bundle - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Radio Science Documentation Bundle - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/radio-science-documentation-bundle
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA: http://nasa.gov/
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    This bundle contains documentation about data products that are collected using radio science and supporting equipment. With one exception, each member collection contains one or more versions of a single Software Interface Specification (SIS) or an equivalent document. A SIS describes the format and content of a data file at a granularity sufficient for use: typically byte-level, but sometimes bit-level. Examples of products and descriptions of their use may also be included in a collection, as appropriate. The exception is the DOCUMENT collection, which contains supporting material, usually journal publications, technical reports, or other documents that describe investigations, analysis methods, and/or data, but not at the level of a SIS. Members of the DOCUMENT collection were usually released once, whereas a SIS often evolves over many years.

  5. Invasive Plant Inventory at Ruby Lake National Wildlife Refuge- Data...

    • datasets.ai
    • s.cnmilf.com
    • +1 more
    57
    Updated Jun 1, 2023
    + more versions
    Cite
    Department of the Interior (2023). Invasive Plant Inventory at Ruby Lake National Wildlife Refuge- Data Documentation [Dataset]. https://datasets.ai/datasets/invasive-plant-inventory-at-ruby-lake-national-wildlife-refuge-data-documentation-a241d
    Explore at:
    Available download formats: 57
    Dataset updated
    Jun 1, 2023
    Dataset authored and provided by
    Department of the Interior
    Description

    In 2019, an inventory of priority invasive plant species in priority areas was conducted at Ruby Lake National Wildlife Refuge. Results from this effort will inform the development of invasive plant management objectives and strategies, and serve as a baseline for assessing change in the status of invasive plant distribution or abundance over time. This report holds the data documenting this effort.

  6. Documentation and Metadata

    • dataverse.harvard.edu
    • dataverse.lib.virginia.edu
    Updated May 22, 2015
    Cite
    Harvard Dataverse (2015). Documentation and Metadata [Dataset]. http://doi.org/10.7910/DVN/8KN41O
    Explore at:
    Available download formats: application/x-download(21383), pptx(3299456), doc(71680), application/x-download(30506), xlsx(67819), application/x-download(33870), pdf(286050), doc(72192)
    Dataset updated
    May 22, 2015
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data Documentation and Metadata session from the 2015 Virginia Data Management Bootcamp. Introduces unstructured (data dictionaries, README files, codebooks) and structured (XML schemas) ways to document research data.

  7. Data Documentation Initiative Vocabulary

    • bioregistry.io
    Updated Aug 15, 2025
    Cite
    (2025). Data Documentation Initiative Vocabulary [Dataset]. https://bioregistry.io/ddi
    Explore at:
    Dataset updated
    Aug 15, 2025
    Description

    A set of controlled vocabularies in the Data Documentation Initiative, each of which has its own code.

  8. Company Documents Dataset

    • kaggle.com
    zip
    Updated May 23, 2024
    + more versions
    Cite
    Ayoub Cherguelaine (2024). Company Documents Dataset [Dataset]. https://www.kaggle.com/datasets/ayoubcherguelaine/company-documents-dataset
    Explore at:
    Available download formats: zip(9789538 bytes)
    Dataset updated
    May 23, 2024
    Authors
    Ayoub Cherguelaine
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Overview

    This dataset contains a collection of over 2,000 company documents, categorized into four main types: invoices, inventory reports, purchase orders, and shipping orders. Each document is provided in PDF format, accompanied by a CSV file that includes the text extracted from these documents, their respective labels, and the word count of each document. This dataset is ideal for various natural language processing (NLP) tasks, including text classification, information extraction, and document clustering.

    Dataset Content

    PDF Documents: The dataset includes 2,677 PDF files, each representing a unique company document. These documents are derived from the Northwind dataset, which is commonly used for demonstrating database functionalities.

    The document types are:

    • Invoices: Detailed records of transactions between a buyer and a seller.
    • Inventory Reports: Records of inventory levels, including items in stock and units sold.
    • Purchase Orders: Requests made by a buyer to a seller to purchase products or services.
    • Shipping Orders: Instructions for the delivery of goods to specified recipients.

    Example Entries

    Here are a few example entries from the CSV file:

    Shipping Order:

    • Order ID: 10718
    • Shipping Details: "Ship Name: Königlich Essen, Ship Address: Maubelstr. 90, Ship City: ..."
    • Word Count: 120

    Invoice:

    • Order ID: 10707
    • Customer Details: "Customer ID: Arout, Order Date: 2017-10-16, Contact Name: Th..."
    • Word Count: 66

    Purchase Order:

    • Order ID: 10892
    • Order Details: "Order Date: 2018-02-17, Customer Name: Catherine Dewey, Products: Product ..."
    • Word Count: 26

    Applications

    This dataset can be used for:

    • Text Classification: Train models to classify documents into their respective categories.
    • Information Extraction: Extract specific fields and details from the documents.
    • Document Clustering: Group similar documents together based on their content.
    • OCR and Text Mining: Improve OCR (Optical Character Recognition) models and text mining techniques using real-world data.
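
    As a rough illustration of working with the CSV described above, the sketch below groups documents by label and averages their word counts. The column names (`text`, `label`, `word_count`) and the placeholder rows are assumptions made for this example; the actual file may use different headers. The word counts mirror the three example entries listed earlier.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layout: the description mentions extracted text, a label,
# and a word count, but does not give the actual column names.
SAMPLE_CSV = """text,label,word_count
placeholder shipping order text,shipping_order,120
placeholder invoice text,invoice,66
placeholder purchase order text,purchase_order,26
"""


def summarize_by_label(csv_text):
    """Return {label: (document_count, mean_word_count)}."""
    word_counts = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        word_counts[row["label"]].append(int(row["word_count"]))
    return {
        label: (len(counts), sum(counts) / len(counts))
        for label, counts in word_counts.items()
    }


summary = summarize_by_label(SAMPLE_CSV)
for label, (n_docs, mean_words) in sorted(summary.items()):
    print(f"{label}: {n_docs} document(s), mean word count {mean_words:.1f}")
```

    A per-label summary like this is a cheap sanity check before training a classifier, since it exposes class imbalance and unusually short documents.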
  9. Certificates and Documents Documentation - Dataset - Open Government Data...

    • opendata.gov.jo
    Updated Jan 31, 2024
    Cite
    (2024). Certificates and Documents Documentation - Dataset - Open Government Data Portal [Dataset]. https://opendata.gov.jo/dataset/certificates-and-documents-documentation-2915-2023
    Explore at:
    Dataset updated
    Jan 31, 2024
    Description

    Certificates and Documents Documentation

  10. ESTRAM: data documentation

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    • +1 more
    Updated Nov 9, 2023
    Cite
    Schlemminger, Marlon (2023). ESTRAM: data documentation [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_10089520
    Explore at:
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Peterssen, Florian
    Lohr, Clemens
    Schlemminger, Marlon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This documentation offers an overview of data employed in energy system optimization models built with the ESTRAM framework by a research group of the Leibniz University Hannover and the Institute for Solar Energy Research Hamelin (ISFH). It is important to note that specific models may utilize distinct data as indicated in their respective studies.

  11. RBDC Open Data Documentation

    • communautaire-esrica-apps.hub.arcgis.com
    • data.torontopolice.on.ca
    • +1 more
    Updated Nov 10, 2022
    + more versions
    Cite
    Toronto Police Service (2022). RBDC Open Data Documentation [Dataset]. https://communautaire-esrica-apps.hub.arcgis.com/datasets/TorontoPS::rbdc-open-data-documentation
    Explore at:
    Dataset updated
    Nov 10, 2022
    Dataset authored and provided by
    Toronto Police Service
    Description

    Documentation describing the Race and Identity-Based Data Collection Strategy data tables released as open data, including table descriptions, metadata, and glossary of terms.

  12. Priority Resources of Concern for Stillwater National Wildlife Refuge...

    • catalog.data.gov
    • datasets.ai
    Updated Nov 14, 2025
    Cite
    U.S. Fish and Wildlife Service (2025). Priority Resources of Concern for Stillwater National Wildlife Refuge Complex - Data Documentation [Dataset]. https://catalog.data.gov/dataset/priority-resources-of-concern-for-stillwater-national-wildlife-refuge-complex-data-documen
    Explore at:
    Dataset updated
    Nov 14, 2025
    Dataset provided by
    U.S. Fish and Wildlife Service: http://www.fws.gov/
    Description

    A collection of data serving as documentation of Stillwater National Wildlife Refuge Complex priority resources of concern.

  13. USDA - Food Environment Atlas - Data Access and Documentation

    • data.virginia.gov
    html
    Updated Feb 3, 2024
    Cite
    Other (2024). USDA - Food Environment Atlas - Data Access and Documentation [Dataset]. https://data.virginia.gov/dataset/usda-food-environment-atlas-data-access-and-documentation
    Explore at:
    Available download formats: html
    Dataset updated
    Feb 3, 2024
    Dataset authored and provided by
    Other
    Description

    Please find attached the data documentation. The Atlas is based on 2010 census tract polygons. To use the underlying Atlas data in a GIS, the data from this spreadsheet needs to be joined to a census tract boundary file. With ESRI software, users should have access to the tract layer on ESRI's "Data and Maps" data distribution. For users of other software, tract boundaries can be downloaded directly from the Census Bureau's Cartographic Boundary Files. The underlying map services used in the Food Access Research Atlas are also available for both developers and GIS users. See the Geospatial API documentation for more information.
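
    The join described above can be sketched in plain Python as an attribute join on a shared tract identifier. Everything named here is hypothetical: the `tract_id` key, the `example_indicator` column, and the placeholder geometries stand in for whatever the Atlas spreadsheet and the tract boundary file actually contain. A GIS package would normally perform this step.

```python
import csv
import io

# Hypothetical stand-in for the Atlas spreadsheet: one row per census tract.
# The real column names are not given in the description above.
ATLAS_CSV = """tract_id,example_indicator
T001,12.5
T002,7.0
"""

# Hypothetical stand-in for a tract boundary file: each feature carries the
# same tract identifier plus a geometry placeholder.
TRACT_FEATURES = [
    {"tract_id": "T001", "geometry": "polygon A"},
    {"tract_id": "T002", "geometry": "polygon B"},
    {"tract_id": "T003", "geometry": "polygon C"},
]


def join_attributes(features, rows, key="tract_id"):
    """Attach each row's attributes to the feature sharing its key."""
    rows_by_key = {row[key]: row for row in rows}
    joined = []
    for feature in features:
        match = rows_by_key.get(feature[key])
        if match is not None:  # keep only tracts present in both inputs
            joined.append({**feature, **match})
    return joined


atlas_rows = list(csv.DictReader(io.StringIO(ATLAS_CSV)))
joined = join_attributes(TRACT_FEATURES, atlas_rows)
for record in joined:
    print(record["tract_id"], record["geometry"], record["example_indicator"])
```

    This sketch drops tracts that have no matching spreadsheet row (an inner join); keeping them with empty attributes would be the alternative if every boundary must appear on the map.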

  14. OCR image data for Thai documents

    • kaggle.com
    zip
    Updated Jun 25, 2025
    Cite
    Appen Limited (2025). OCR image data for Thai documents [Dataset]. https://www.kaggle.com/datasets/appenlimited/ocr-image-data-for-thai-documents
    Explore at:
    Available download formats: zip(26285828 bytes)
    Dataset updated
    Jun 25, 2025
    Authors
    Appen Limited
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    For the complete dataset or more information, please email commercialproduct@appen.com

    The dataset product can be used in many AI pilot projects and can supplement production models with other data. It can improve model performance cost-effectively, and it is an excellent solution when time and budget are limited. The Appen database team can provide a large number of database products, such as ASR, TTS, video, text, and image data. The team is also constantly building new datasets to expand its resources, and always strives to deliver as soon as possible to meet the needs of global customers.

    This OCR database consists of image data in Korean, Vietnamese, Spanish, French, Thai, Japanese, Indonesian, Tamil, and Burmese, as well as handwritten images in both Chinese and English (including annotations). On average, each image contains 30 to 40 frames, including text in various languages, special characters, and numbers. The required accuracy rate is over 99% (both position and content must be correct). The images include the following categories:

    • RECEIPT
    • IDCARD
    • TRADE
    • TABLE
    • WHITEBOARD
    • NEWSPAPER
    • THESIS
    • CARD
    • NOTE
    • CONTRACT
    • BOOKCONTENT
    • HANDWRITING

    1. Data Specification

    Usage Cases: Image label recognition training
    Collecting device: Mobile phone / Camera
    Collecting environment: Multiple lights environments

    Document OCR Images, quantity by category and language:

    Category     Korean  Vietnamese  Spanish  French  Thai  Japanese  Indonesian  Tamil  Burmese
    RECEIPT      1500    337         1500     300     1500  1586      1500        356    300
    IDCARD       500     100         500      100     500   500       500         98     100
    TRADE        1012    227         1000     200     1000  1000      1003        475    200
    TABLE        512     100         500      100     537   552       500         532    117
    WHITEBOARD   500     111         500      100     500   500       501         501    110
    NEWSPAPER    500     100         500      100     500   500       502         500    108
    THESIS       500     100         500      103     500   509       500         500    102
    CARD         500     100         500      100     500   500       500         500    100
    NOTE         499     100         500      100     500   500       500         501    120
    CONTRACT     501     105         500      100     500   500       500         500    100
    BOOKCONTENT  500     700         500      700     500   500       500         500    761
    TOTAL        7024    2080        7000     2003    7037  7147      7006        4963   2118

    Handwritten Datasets (category HANDWRITING):

    English      2278
    Chinese      11118

    1. Information provided by database
    2. Data Format: JPG
  15. Invasive Plant Prioritization for Inventory and Early Detection at Marin...

    • datasets.ai
    • catalog.data.gov
    1, 8
    Updated Jun 1, 2023
    + more versions
    Cite
    Department of the Interior (2023). Invasive Plant Prioritization for Inventory and Early Detection at Marin Islands National Wildlife Refuge - Data Documentation [Dataset]. https://datasets.ai/datasets/invasive-plant-prioritization-for-inventory-and-early-detection-at-marin-islands-national-
    Explore at:
    Available download formats: 1, 8
    Dataset updated
    Jun 1, 2023
    Dataset authored and provided by
    Department of the Interior
    Description

    In 2017, invasive plant species and area priorities for baseline inventory and early detection were identified for Marin Islands National Wildlife Refuge. Results from this effort will inform a future inventory, and guide development of invasive plant management objectives and strategies. This record holds the data documenting this effort.

  16. University of Twente : course research and data documentation

    • dataverse.nl
    pdf
    Updated Feb 4, 2018
    Cite
    DataverseNL (2018). University of Twente : course research and data documentation [Dataset]. http://doi.org/10.34894/Z34BRA
    Explore at:
    Available download formats: pdf(1215972), pdf(223990), pdf(1827940)
    Dataset updated
    Feb 4, 2018
    Dataset provided by
    DataverseNL
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Twente
    Description

    Educational materials used in the course 'Research and data documentation', University of Twente, Enschede, the Netherlands

  17. Data from: DEEP IMPACT/EPOXI DOCUMENTATION SET V3.0

    • data.nasa.gov
    • s.cnmilf.com
    • +2 more
    + more versions
    Cite
    nasa.gov, DEEP IMPACT/EPOXI DOCUMENTATION SET V3.0 [Dataset]. https://data.nasa.gov/dataset/deep-impact-epoxi-documentation-set-v3-0-4d8de
    Explore at:
    Dataset provided by
    NASA: http://nasa.gov/
    Description

    This data set contains version 3.0 of the updated collection of documentation for the raw and calibrated science data sets for the Deep Impact and EPOXI missions. This data set supersedes version 2.0.

  18. Data from: Methods documentation

    • redivis.com
    • stanford.redivis.com
    Updated Oct 1, 2025
    + more versions
    Cite
    Stanford Center for Population Health Sciences (2025). Methods documentation [Dataset]. https://redivis.com/datasets/6f7e-cxanam2b8
    Explore at:
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Stanford Center for Population Health Sciences
    Description

    This is an auto-generated index table corresponding to a folder of files in this dataset with the same name. This table can be used to extract a subset of files based on their metadata, which can then be used for further analysis. You can view the contents of specific files by navigating to the "cells" tab and clicking on an individual file_id.

  19. Audit Documentation and Appendices - Datasets - data.wa.gov.au

    • catalogue.data.wa.gov.au
    Updated Jun 20, 2025
    + more versions
    Cite
    (2025). Audit Documentation and Appendices - Datasets - data.wa.gov.au [Dataset]. https://catalogue.data.wa.gov.au/dataset/audit-documentation-and-appendices
    Explore at:
    Dataset updated
    Jun 20, 2025
    Area covered
    Western Australia
    Description

    The data and interpretations presented are based on firsthand experience, being compiled by the Department of Conservation and Land Management’s regional nature conservation staff between July 2001 and January 2002. Note: to access the data, select the data source link located on the right-hand side.

  20. MAR 2.0 Data Dictionary

    • hub.arcgis.com
    • opendata.dc.gov
    • +2 more
    Updated Jun 30, 2022
    + more versions
    Cite
    City of Washington, DC (2022). MAR 2.0 Data Dictionary [Dataset]. https://hub.arcgis.com/documents/130778ae88bb433cb0024298c478ab46
    Explore at:
    Dataset updated
    Jun 30, 2022
    Dataset authored and provided by
    City of Washington, DC
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    The Master Address Repository (MAR) 2.0 is the successor to the Master Address Repository. The Master Address Repository is a complex, widely used database that a growing number of DC Government applications access. It is important to have high-quality documentation readily accessible for such widely used databases. This document contains the column (field) definitions for the most important views, tables, and feature classes within the MAR 2.0.

Cite
California Department of Parks and Recreation (2021). Open Data Documentation [Dataset]. https://data.ca.gov/dataset/open-data-documentation
Open Data Documentation

Explore at:
42 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: pdf
Dataset updated
Apr 26, 2021
Dataset provided by
California State Parks: https://www.parks.ca.gov/
Authors
California Department of Parks and Recreation
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Useful information and links for navigating this site and for understanding and using Open Data.
