100+ datasets found

Open Data Documentation
data.ca.gov
data.cnra.ca.gov
+3more
pdf
Updated Apr 26, 2021
Cite
California Department of Parks and Recreation (2021). Open Data Documentation [Dataset]. https://data.ca.gov/dataset/open-data-documentation
Explore at:
Available download formats: pdf
Dataset updated
Apr 26, 2021
Dataset provided by
California State Parks, https://www.parks.ca.gov/
Authors
California Department of Parks and Recreation
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Useful information and links for navigating this site, understanding and utilizing Open Data
Invasive Plant Inventory at San Pablo Bay National Wildlife Refuge- Data...
catalog.data.gov
datasets.ai
Updated Nov 25, 2025
+ more versions
Cite
U.S. Fish and Wildlife Service (2025). Invasive Plant Inventory at San Pablo Bay National Wildlife Refuge- Data Documentation [Dataset]. https://catalog.data.gov/dataset/invasive-plant-inventory-at-san-pablo-bay-national-wildlife-refuge-data-documentation
Explore at:
Dataset updated
Nov 25, 2025
Dataset provided by
U.S. Fish and Wildlife Service, http://www.fws.gov/
Description
In 2013, an invasive plant inventory of priority invasive plant species in priority areas was conducted at San Pablo Bay National Wildlife Refuge. Results from this effort will inform the development of invasive plant management objectives and strategies and serve as a baseline for assessing change in the status of invasive plant distribution or abundance over time.
The Clarity Software Documentation Dataset
data.niaid.nih.gov
data-staging.niaid.nih.gov
+1more
Updated Jan 6, 2022
Cite
Anonymous Authors (2022). The Clarity Software Documentation Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5821839
Explore at:
Dataset updated
Jan 6, 2022
Dataset provided by
Anonymous
Authors
Anonymous Authors
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository holds the Clarity Dataset, a companion to the SANER'22 paper entitled "An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation". The dataset consists of 45,998 captions, 10,204 GUI screenshots, and XML metadata files (akin to the "html" for stipulating GUIs) of Android applications. The NL captions were obtained from human labelers, underwent several quality control mechanisms, and contain both high-level (screen) and low-level (component) descriptions of screen functionality. This dataset is meant as a new source of data to augment techniques for software documentation that can take advantage of the rich pixel-based information contained within screenshots.
Radio Science Documentation Bundle - Dataset - NASA Open Data Portal
data.nasa.gov
Updated Mar 31, 2025
Cite
nasa.gov (2025). Radio Science Documentation Bundle - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/radio-science-documentation-bundle
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASA, http://nasa.gov/
License
U.S. Government Works, https://www.usa.gov/government-works
License information was derived automatically
Description
This bundle contains documentation about data products that are collected using radio science and supporting equipment. With one exception, each member collection contains one or more versions of a single Software Interface Specification (SIS) or an equivalent document. A SIS describes the format and content of a data file at a granularity sufficient for use -- typically byte-level, but sometimes bit-level. Examples of products and descriptions of their use may also be included in a collection, as appropriate. The exception is the DOCUMENT collection, which contains supporting material -- usually journal publications, technical reports, or other documents that describe investigations, analysis methods, and/or data but not at the level of a SIS. Members of the DOCUMENT collection were usually released once, whereas a SIS often evolves over many years.
Invasive Plant Inventory at Ruby Lake National Wildlife Refuge- Data...
datasets.ai
s.cnmilf.com
+1more
57
Updated Jun 1, 2023
+ more versions
Cite
Department of the Interior (2023). Invasive Plant Inventory at Ruby Lake National Wildlife Refuge- Data Documentation [Dataset]. https://datasets.ai/datasets/invasive-plant-inventory-at-ruby-lake-national-wildlife-refuge-data-documentation-a241d
Explore at:
Available download formats: 57
Dataset updated
Jun 1, 2023
Dataset authored and provided by
Department of the Interior
Description
In 2019, an invasive plant inventory of priority invasive plant species in priority areas was conducted at Ruby Lake National Wildlife Refuge. Results from this effort will inform the development of invasive plant management objectives and strategies and serve as a baseline for assessing change in the status of invasive plant distribution or abundance over time. This report holds the data documenting this effort.
Documentation and Metadata
dataverse.harvard.edu
dataverse.lib.virginia.edu
Updated May 22, 2015
Cite
Harvard Dataverse (2015). Documentation and Metadata [Dataset]. http://doi.org/10.7910/DVN/8KN41O
Explore at:
Available download formats: application/x-download (21383), pptx (3299456), doc (71680), application/x-download (30506), xlsx (67819), application/x-download (33870), pdf (286050), doc (72192)
Unique identifier
https://doi.org/10.7910/DVN/8KN41O
Dataset updated
May 22, 2015
Dataset provided by
Harvard Dataverse
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Data Documentation and Metadata session from the 2015 Virginia Data Management Bootcamp. Introduces unstructured (data dictionaries, readme files, codebooks) and structured (XML schemas) ways to document research data.
Data Documentation Initiative Vocabulary
bioregistry.io
Updated Aug 15, 2025
Cite
(2025). Data Documentation Initiative Vocabulary [Dataset]. https://bioregistry.io/ddi
Explore at:
Dataset updated
Aug 15, 2025
Description
A set of controlled vocabularies in the Data Documentation Initiative, each of which has its own code.
Company Documents Dataset
kaggle.com
zip
Updated May 23, 2024
+ more versions
Cite
Ayoub Cherguelaine (2024). Company Documents Dataset [Dataset]. https://www.kaggle.com/datasets/ayoubcherguelaine/company-documents-dataset
Explore at:
Available download formats: zip (9789538 bytes)
Dataset updated
May 23, 2024
Authors
Ayoub Cherguelaine
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Overview

This dataset contains a collection of over 2,000 company documents, categorized into four main types: invoices, inventory reports, purchase orders, and shipping orders. Each document is provided in PDF format, accompanied by a CSV file that includes the text extracted from these documents, their respective labels, and the word count of each document. This dataset is ideal for various natural language processing (NLP) tasks, including text classification, information extraction, and document clustering.

Dataset Content

PDF Documents: The dataset includes 2,677 PDF files, each representing a unique company document. These documents are derived from the Northwind dataset, which is commonly used for demonstrating database functionalities.

The document types are:

Invoices: Detailed records of transactions between a buyer and a seller.

Inventory Reports: Records of inventory levels, including items in stock and units sold.

Purchase Orders: Requests made by a buyer to a seller to purchase products or services.

Shipping Orders: Instructions for the delivery of goods to specified recipients.

Example Entries

Here are a few example entries from the CSV file:

Shipping Order:

Order ID: 10718

Shipping Details: "Ship Name: Königlich Essen, Ship Address: Maubelstr. 90, Ship City: ..."

Word Count: 120

Invoice:

Order ID: 10707

Customer Details: "Customer ID: Arout, Order Date: 2017-10-16, Contact Name: Th..."

Word Count: 66

Purchase Order:

Order ID: 10892

Order Details: "Order Date: 2018-02-17, Customer Name: Catherine Dewey, Products: Product ..."

Word Count: 26

Applications

This dataset can be used for:

Text Classification: Train models to classify documents into their respective categories.

Information Extraction: Extract specific fields and details from the documents.

Document Clustering: Group similar documents together based on their content.

OCR and Text Mining: Improve OCR (Optical Character Recognition) models and text mining techniques using real-world data.
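
As a rough illustration of working with the CSV described above, the sketch below parses a few rows, tallies documents per label, and checks the stored word counts. The column names `label`, `text`, and `word_count` and the sample rows are assumptions made for illustration; the listing does not give the real header names.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real file's column names and contents may differ.
SAMPLE_CSV = """label,text,word_count
invoice,Order ID: 10707 Customer ID: Arout,6
purchase_order,Order ID: 10892 Order Date: 2018-02-17,6
shipping_order,Order ID: 10718 Ship Name: Königlich Essen,7
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per document."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def count_by_label(rows):
    """Tally how many documents carry each label."""
    return Counter(row["label"] for row in rows)


def word_count_mismatches(rows):
    """Return rows whose stored word_count differs from a whitespace-split count."""
    return [r for r in rows if len(r["text"].split()) != int(r["word_count"])]


rows = load_rows(SAMPLE_CSV)
print(count_by_label(rows))
print(word_count_mismatches(rows))
```

A per-label tally like this is a common first check before training a classifier, since it shows whether the categories are balanced.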
Certificates and Documents Documentation - Dataset - Open Government Data...
opendata.gov.jo
Updated Jan 31, 2024
Cite
(2024). Certificates and Documents Documentation - Dataset - Open Government Data Portal [Dataset]. https://opendata.gov.jo/dataset/certificates-and-documents-documentation-2915-2023
Explore at:
Dataset updated
Jan 31, 2024
Description
Certificates and Documents Documentation
ESTRAM: data documentation
nde-dev.biothings.io
data.niaid.nih.gov
+1more
Updated Nov 9, 2023
Cite
Schlemminger, Marlon (2023). ESTRAM: data documentation [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_10089520
Explore at:
Dataset updated
Nov 9, 2023
Dataset provided by
Peterssen, Florian
Lohr, Clemens
Schlemminger, Marlon
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This documentation offers an overview of data employed in energy system optimization models built with the ESTRAM framework by a research group of the Leibniz University Hannover and the Institute for Solar Energy Research Hamelin (ISFH). It is important to note that specific models may utilize distinct data as indicated in their respective studies.
RBDC Open Data Documentation
communautaire-esrica-apps.hub.arcgis.com
data.torontopolice.on.ca
+1more
Updated Nov 10, 2022
+ more versions
Cite
Toronto Police Service (2022). RBDC Open Data Documentation [Dataset]. https://communautaire-esrica-apps.hub.arcgis.com/datasets/TorontoPS::rbdc-open-data-documentation
Explore at:
Dataset updated
Nov 10, 2022
Dataset authored and provided by
Toronto Police Service
Description
Documentation describing the Race and Identity-Based Data Collection Strategy data tables released as open data, including table descriptions, metadata, and glossary of terms.
Priority Resources of Concern for Stillwater National Wildlife Refuge...
catalog.data.gov
datasets.ai
Updated Nov 14, 2025
Cite
U.S. Fish and Wildlife Service (2025). Priority Resources of Concern for Stillwater National Wildlife Refuge Complex - Data Documentation [Dataset]. https://catalog.data.gov/dataset/priority-resources-of-concern-for-stillwater-national-wildlife-refuge-complex-data-documen
Explore at:
Dataset updated
Nov 14, 2025
Dataset provided by
U.S. Fish and Wildlife Service, http://www.fws.gov/
Description
A collection of data serving as documentation of Stillwater National Wildlife Refuge Complex priority resources of concern.
USDA - Food Environment Atlas - Data Access and Documentation
data.virginia.gov
html
Updated Feb 3, 2024
Cite
Other (2024). USDA - Food Environment Atlas - Data Access and Documentation [Dataset]. https://data.virginia.gov/dataset/usda-food-environment-atlas-data-access-and-documentation
Explore at:
Available download formats: html
Dataset updated
Feb 3, 2024
Dataset authored and provided by
Other
Description
Please find attached the data documentation. The Atlas is based on 2010 census tract polygons. To use the underlying Atlas data in a GIS, the data from this spreadsheet needs to be joined to a census tract boundary file. With ESRI software, users should have access to the tract layer on ESRI's "Data and Maps" data distribution. For users of other software, tract boundaries can be downloaded directly from the Census Bureau's Cartographic Boundary Files. The underlying map services used in the Food Access Research Atlas are also available for both developers and GIS users. See the Geospatial API documentation for more information.
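
The join described above can be sketched in plain Python, outside any GIS package. The field names `tract_id` and `indicator` and the sample records are hypothetical; the actual Atlas spreadsheet and Census boundary files use their own column names.

```python
def join_by_tract(atlas_rows, tract_features, key="tract_id"):
    """Attach Atlas attributes to each tract boundary record that shares the same ID."""
    atlas_by_id = {row[key]: row for row in atlas_rows}
    joined = []
    for feature in tract_features:
        attrs = atlas_by_id.get(feature[key])
        if attrs is not None:
            # Merge boundary fields with the matching Atlas attributes.
            joined.append({**feature, **attrs})
    return joined


# Hypothetical records; IDs are kept as strings so leading zeros survive.
atlas_rows = [{"tract_id": "01001020100", "indicator": 1}]
tract_features = [
    {"tract_id": "01001020100", "geometry": "POLYGON(...)"},
    {"tract_id": "01001020200", "geometry": "POLYGON(...)"},
]

result = join_by_tract(atlas_rows, tract_features)
print(result)
```

Tracts without a matching Atlas row are dropped here (an inner join); keeping them with empty attributes would be an equally reasonable choice depending on the analysis.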
OCR image data for Thai documents
kaggle.com
zip
Updated Jun 25, 2025
Cite
Appen Limited (2025). OCR image data for Thai documents [Dataset]. https://www.kaggle.com/datasets/appenlimited/ocr-image-data-for-thai-documents
Explore at:
Available download formats: zip (26285828 bytes)
Dataset updated
Jun 25, 2025
Authors
Appen Limited
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
For the complete dataset or more information, please email commercialproduct@appen.com

This dataset product can be used in many AI pilot projects and can supplement production models alongside other data. It can improve model performance cost-effectively and is a good option when time and budget are limited. The Appen database team provides a large number of database products, such as ASR, TTS, video, text, and image data, and is constantly building new datasets to expand its resources. The team strives to deliver as quickly as possible to meet the needs of global customers. This OCR database consists of image data in Korean, Vietnamese, Spanish, French, Thai, Japanese, Indonesian, Tamil, and Burmese, as well as handwritten images in both Chinese and English (including annotations). On average, each image contains 30 to 40 frames, including text in various languages, special characters, and numbers. The required accuracy rate is over 99% (both position and content must be correct). The images fall into the following categories: RECEIPT, IDCARD, TRADE, TABLE, WHITEBOARD, NEWSPAPER, THESIS, CARD, NOTE, CONTRACT, BOOKCONTENT, HANDWRITING.

Data Specification

Usage Cases: Image label recognition training

Collecting device: Mobile phone / Camera

Collecting environment: Multiple lighting environments

Database Name: Category Quantity

Korean Document OCR Images: RECEIPT 1,500; IDCARD 500; TRADE 1,012; TABLE 512; WHITEBOARD 500; NEWSPAPER 500; THESIS 500; CARD 500; NOTE 499; CONTRACT 501; BOOKCONTENT 500; TOTAL 7,024

Vietnamese Document OCR Images: RECEIPT 337; IDCARD 100; TRADE 227; TABLE 100; WHITEBOARD 111; NEWSPAPER 100; THESIS 100; CARD 100; NOTE 100; CONTRACT 105; BOOKCONTENT 700; TOTAL 2,080

Spanish Document OCR Images: RECEIPT 1,500; IDCARD 500; TRADE 1,000; TABLE 500; WHITEBOARD 500; NEWSPAPER 500; THESIS 500; CARD 500; NOTE 500; CONTRACT 500; BOOKCONTENT 500; TOTAL 7,000

French Document OCR Images: RECEIPT 300; IDCARD 100; TRADE 200; TABLE 100; WHITEBOARD 100; NEWSPAPER 100; THESIS 103; CARD 100; NOTE 100; CONTRACT 100; BOOKCONTENT 700; TOTAL 2,003

Thai Document OCR Images: RECEIPT 1,500; IDCARD 500; TRADE 1,000; TABLE 537; WHITEBOARD 500; NEWSPAPER 500; THESIS 500; CARD 500; NOTE 500; CONTRACT 500; BOOKCONTENT 500; TOTAL 7,037

Japanese Document OCR Images: RECEIPT 1,586; IDCARD 500; TRADE 1,000; TABLE 552; WHITEBOARD 500; NEWSPAPER 500; THESIS 509; CARD 500; NOTE 500; CONTRACT 500; BOOKCONTENT 500; TOTAL 7,147

Indonesian Document OCR Images: RECEIPT 1,500; IDCARD 500; TRADE 1,003; TABLE 500; WHITEBOARD 501; NEWSPAPER 502; THESIS 500; CARD 500; NOTE 500; CONTRACT 500; BOOKCONTENT 500; TOTAL 7,006

Tamil Document OCR Images: RECEIPT 356; IDCARD 98; TRADE 475; TABLE 532; WHITEBOARD 501; NEWSPAPER 500; THESIS 500; CARD 500; NOTE 501; CONTRACT 500; BOOKCONTENT 500; TOTAL 4,963

Burmese Document OCR Images: RECEIPT 300; IDCARD 100; TRADE 200; TABLE 117; WHITEBOARD 110; NEWSPAPER 108; THESIS 102; CARD 100; NOTE 120; CONTRACT 100; BOOKCONTENT 761; TOTAL 2,118

English Handwritten Datasets: HANDWRITING 2,278

Chinese Handwritten Datasets: HANDWRITING 11,118

Information provided by database

Data Format: JPG
Invasive Plant Prioritization for Inventory and Early Detection at Marin...
datasets.ai
catalog.data.gov
1, 8
Updated Jun 1, 2023
+ more versions
Cite
Department of the Interior (2023). Invasive Plant Prioritization for Inventory and Early Detection at Marin Islands National Wildlife Refuge - Data Documentation [Dataset]. https://datasets.ai/datasets/invasive-plant-prioritization-for-inventory-and-early-detection-at-marin-islands-national-
Explore at:
Available download formats: 1, 8
Dataset updated
Jun 1, 2023
Dataset authored and provided by
Department of the Interior
Description
In 2017, invasive plant species and area priorities for baseline inventory and early detection were identified for Marin Islands National Wildlife Refuge. Results from this effort will inform a future inventory, and guide development of invasive plant management objectives and strategies. This record holds the data documenting this effort.
University of Twente : course research and data documentation
dataverse.nl
pdf
Updated Feb 4, 2018
Cite
DataverseNL (2018). University of Twente : course research and data documentation [Dataset]. http://doi.org/10.34894/Z34BRA
Explore at:
Available download formats: pdf (1215972), pdf (223990), pdf (1827940)
Unique identifier
https://doi.org/10.34894/Z34BRA
Dataset updated
Feb 4, 2018
Dataset provided by
DataverseNL
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Twente
Description
Educational materials used in the course 'Research and data documentation', University of Twente, Enschede, the Netherlands
Data from: DEEP IMPACT/EPOXI DOCUMENTATION SET V3.0
data.nasa.gov
s.cnmilf.com
+2more
+ more versions
Cite
nasa.gov, DEEP IMPACT/EPOXI DOCUMENTATION SET V3.0 [Dataset]. https://data.nasa.gov/dataset/deep-impact-epoxi-documentation-set-v3-0-4d8de
Explore at:
Dataset provided by
NASA, http://nasa.gov/
Description
This data set contains version 3.0 of the updated collection of documentation for the raw and calibrated science data sets for the Deep Impact and EPOXI missions. This data set supersedes version 2.0.
Data from: Methods documentation
redivis.com
stanford.redivis.com
Updated Oct 1, 2025
+ more versions
Cite
Stanford Center for Population Health Sciences (2025). Methods documentation [Dataset]. https://redivis.com/datasets/6f7e-cxanam2b8
Explore at:
Dataset updated
Oct 1, 2025
Dataset authored and provided by
Stanford Center for Population Health Sciences
Description
This is an auto-generated index table corresponding to a folder of files in this dataset with the same name. This table can be used to extract a subset of files based on their metadata, which can then be used for further analysis. You can view the contents of specific files by navigating to the "cells" tab and clicking on an individual file_id.
Audit Documentation and Appendices - Datasets - data.wa.gov.au
catalogue.data.wa.gov.au
Updated Jun 20, 2025
+ more versions
Cite
(2025). Audit Documentation and Appendices - Datasets - data.wa.gov.au [Dataset]. https://catalogue.data.wa.gov.au/dataset/audit-documentation-and-appendices
Explore at:
Dataset updated
Jun 20, 2025
Area covered
Western Australia
Description
The data and interpretations presented are based on firsthand experience, being compiled by the Department of Conservation and Land Management’s regional nature conservation staff between July 2001 and January 2002. Note: to access the data, select the data source link located on the right-hand side.
MAR 2.0 Data Dictionary
hub.arcgis.com
opendata.dc.gov
+2more
Updated Jun 30, 2022
+ more versions
Cite
City of Washington, DC (2022). MAR 2.0 Data Dictionary [Dataset]. https://hub.arcgis.com/documents/130778ae88bb433cb0024298c478ab46
Explore at:
Dataset updated
Jun 30, 2022
Dataset authored and provided by
City of Washington, DC
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
The Master Address Repository (MAR) 2.0 is the successor to the Master Address Repository. The Master Address Repository is a complex and widely accessed database that is increasingly being accessed by many DC Government applications. It is important to have high quality documentation readily accessible for such widely used databases. This document contains the column (field) definitions for the most important views, tables and feature classes within the MAR 2.0.

Open Data Documentation (California Department of Parks and Recreation, 2021): 42 scholarly articles cite this dataset (per Google Scholar).