The data sets provide the text and detailed numeric information in all financial statements and their notes extracted from exhibits to corporate financial reports filed with the Commission using eXtensible Business Reporting Language (XBRL).
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
We created a dataset of clinical action items annotated over MIMIC-III. This dataset, which we call CLIP, is annotated by physicians and covers 718 discharge summaries, representing 107,494 sentences. Annotations were collected as character-level spans over discharge summaries after applying surrogate generation to fill in the anonymized templates from MIMIC-III text with fake data. We release these spans, their aggregation into sentence-level labels, and the sentence tokenizer used to aggregate the spans and label sentences. We also release the surrogate data generator, and the document IDs used for training, validation, and test splits, to enable reproduction. The spans are annotated with 0 or more labels of 7 different types, representing the different actions that may need to be taken: Appointment, Lab, Procedure, Medication, Imaging, Patient Instructions, and Other. We encourage the community to use this dataset to develop methods for automatically extracting clinical action items from discharge summaries.
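A minimal sketch, not the released code, of how character-level spans might be aggregated into sentence-level labels; the function and variable names are illustrative only:

```python
LABELS = ["Appointment", "Lab", "Procedure", "Medication",
          "Imaging", "Patient Instructions", "Other"]

def sentence_labels(sentence_offsets, spans):
    """sentence_offsets: list of (start, end) character offsets per sentence;
    spans: list of (start, end, label) character-level annotations."""
    labels = [set() for _ in sentence_offsets]
    for span_start, span_end, label in spans:
        for i, (start, end) in enumerate(sentence_offsets):
            # A sentence receives a label if it overlaps the span at all.
            if span_start < end and span_end > start:
                labels[i].add(label)
    return labels
```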
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
FINGAP07 NUMBER OF FINANCIAL STATEMENTS AND NOTES TO ACCOUNTS PRODUCED
This dataset contains public medical text records (progress notes) written in Japanese. Researchers can use this dataset without privacy concerns. CC BY-NC 4.0
crowd.zip: 9,756 pseudo progress notes written by crowd workers
crowd_evaluated.zip: 83 pseudo progress notes of authentic quality written by crowd workers
MD.zip: 19 pseudo progress notes written by medical doctors
Reference: Kagawa, R., Baba, Y., & Tsurushima, H. (2021, December). A practical and universal framework for generating publicly available medical notes of authentic quality via the power of crowds. In 2021 IEEE International Conference on Big Data (Big Data) (pp. 3534-3543). IEEE. http://hdl.handle.net/2241/0002002333
The supplemental files of the paper are here: https://github.com/rinabouk/HMData2021
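A minimal sketch for inspecting the archives; that each zip holds one UTF-8 text note per file is an assumption for illustration:

```python
import zipfile

for archive in ["crowd.zip", "crowd_evaluated.zip", "MD.zip"]:
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        print(f"{archive}: {len(names)} files")
        # Read the first note as UTF-8 text (Japanese content).
        print(zf.read(names[0]).decode("utf-8")[:200])
```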
Financial statements: Notes to Financial Statements
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Data includes information regarding session notes from sixty-three institutions, including blank session note forms, data sets of completed session notes, and survey data about how session notes are conceived of, and used, in writing centers.
Meeting notes from Interagency Data Team meetings. These are a best attempt to capture notable comments and questions from attendees; notes are paraphrased. Please reference the presentation or contact open.data@dc.gov with questions. The Interagency Data Team is a community of data analysts, or agency liaisons, who convene regularly with representation from DC agencies of all persuasions. Participants engage in discussions regarding the team’s core mission and priorities for a better kind of data culture – collection, application, sharing, classification and governance, to name a few. The team is coordinated by the Office of the Chief Technology Officer (OCTO), led by the Chief Data Officer (CDO), and directly supports the District of Columbia's Data Policy. Related presentations: Role of the DC State Data Center; Enterprise Dataset Inventory, Lessons from Pilot Agencies; Enterprise Dataset Inventory, A General Counsel’s Perspective.
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
The advent of large, open access text databases has driven advances in state-of-the-art model performance in natural language processing (NLP). The relatively limited amount of clinical data available for NLP has been cited as a significant barrier to the field's progress. Here we describe MIMIC-IV-Note: a collection of deidentified free-text clinical notes for patients included in the MIMIC-IV clinical database. MIMIC-IV-Note contains 331,794 deidentified discharge summaries from 145,915 patients admitted to the hospital and emergency department at the Beth Israel Deaconess Medical Center in Boston, MA, USA. The database also contains 2,321,355 deidentified radiology reports for 237,427 patients. All notes have had protected health information removed in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. All notes are linkable to MIMIC-IV providing important context to the clinical data therein. The database is intended to stimulate research in clinical natural language processing and associated areas.
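A minimal sketch of that linkage, assuming the released CSV layout (discharge.csv.gz keyed by subject_id and hadm_id); adjust paths to your local copies:

```python
import pandas as pd

notes = pd.read_csv("mimic-iv-note/discharge.csv.gz")
admissions = pd.read_csv("mimic-iv/hosp/admissions.csv.gz")

# Link each discharge summary to its hospital admission for clinical context.
linked = notes.merge(admissions, on=["subject_id", "hadm_id"], how="left")
print(linked[["subject_id", "hadm_id", "admittime", "text"]].head())
```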
https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the Turkish Sticky Notes Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Turkish language.
Dataset Content & Diversity: Containing more than 2,000 images, this Turkish OCR dataset offers a wide distribution of different types of sticky note images. Within this dataset, you'll discover a variety of handwritten text, including quotes, sentences, and individual words on sticky notes. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.
To ensure diversity and robustness in training your OCR model, we limit each contributor's handwriting to fewer than three unique images. This ensures the dataset covers diverse handwriting styles for training your OCR model. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image at least 80% of the space contains visible Turkish text.
The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.
All these sticky notes were written, and the images captured, by native Turkish speakers to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
Metadata: In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.
This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Turkish text recognition models.
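As an illustration, a minimal sketch for exploring that metadata; the exact CSV column names (orientation, device) and file name are assumptions, not the published schema:

```python
import pandas as pd

meta = pd.read_csv("turkish_sticky_notes_metadata.csv")
# Images are named to match rows in the metadata file.
print(meta["orientation"].value_counts())    # portrait vs. landscape balance
print(meta["device"].value_counts().head())  # capture-device distribution
```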
Update & Custom Collection: We are committed to continually expanding this dataset by adding more images with the help of our native Turkish crowd community.
If you require a customized OCR dataset containing sticky note images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.
Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.
License: This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Leverage this sticky notes image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Turkish language. Your journey to improved language understanding and processing begins here.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
iEEG and EEG data from 5 centers are organized in our study, with a total of 100 subjects. We publish the datasets from 4 of the centers here due to data-sharing restrictions.
Acquisitions include ECoG and SEEG. Each run is a different snapshot of EEG data from that subject's session. For seizure sessions, this means that each run is an EEG snapshot around a different seizure event.
For additional clinical metadata about each subject, refer to the clinical Excel table in the publication.
NIH, JHH, UMMC, and UMF agreed to share their data; Cleveland Clinic did not, so its data requires an additional data use agreement (DUA).
All data except Cleveland Clinic's were approved by their centers to be de-identified and shared. All data in this dataset contain no PHI or other identifiers associated with the patients. To access the Cleveland Clinic data, please forward all requests to Amber Sours, SOURSA@ccf.org:
Amber Sours, MPH Research Supervisor | Epilepsy Center Cleveland Clinic | 9500 Euclid Ave. S3-399 | Cleveland, OH 44195 (216) 444-8638
You will need to sign a data use agreement (DUA).
For each subject, there was a raw EDF file, which was converted into the BrainVision format with mne_bids.
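A minimal sketch of such a conversion with MNE-BIDS; the file name and BIDS entities are placeholders, not the actual source files:

```python
import mne
from mne_bids import BIDSPath, write_raw_bids

raw = mne.io.read_raw_edf("sub01_seizure01.edf", preload=True)
# Mark channels as intracranial so they match the ieeg datatype.
raw.set_channel_types({ch: "seeg" for ch in raw.ch_names})

bids_path = BIDSPath(subject="01", session="presurgery", task="ictal",
                     run="01", datatype="ieeg", root="bids_dataset")
# format="BrainVision" writes .vhdr/.vmrk/.eeg instead of copying the EDF;
# converting a preloaded Raw requires allow_preload=True.
write_raw_bids(raw, bids_path, format="BrainVision",
               allow_preload=True, overwrite=True)
```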
Each subject with an SEEG implantation also has an Excel table, called electrode_layout.xlsx, which outlines where the clinicians marked each electrode anatomically. Note that no rigorous atlas was applied, so the main labels of interest are: WM, GM, VENTRICLE, CSF, and OUT, which represent white matter, gray matter, ventricle, cerebrospinal fluid, and outside the brain, respectively. Channels labeled WM, VENTRICLE, CSF, and OUT were removed from further analysis; they are marked in the corresponding BIDS channels.tsv sidecar file as status=bad.
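A minimal sketch of propagating those anatomical labels into the channels.tsv sidecar; the Excel column names here (name, label) are assumptions for illustration:

```python
import pandas as pd

layout = pd.read_excel("electrode_layout.xlsx")  # channel -> anatomical label
channels = pd.read_csv("sub-01_ses-01_channels.tsv", sep="\t")

bad_labels = {"WM", "VENTRICLE", "CSF", "OUT"}
bad_names = set(layout.loc[layout["label"].isin(bad_labels), "name"])

# Flag non-brain channels so downstream tools exclude them.
channels.loc[channels["name"].isin(bad_names), "status"] = "bad"
channels.to_csv("sub-01_ses-01_channels.tsv", sep="\t", index=False)
```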
The dataset uploaded to openneuro.org does not contain the sourcedata directory, since an extra anonymization step occurred when fully converting to BIDS.
Derivatives include:
* fragility analysis
* frequency analysis
* graph metrics analysis
* figures
These can be computed by following the paper: Neural Fragility as an EEG Marker of the Seizure Onset Zone.
Within each EDF file, there are event markers annotated by clinicians, which may inform you of specific clinical events occurring in time, or of when they saw seizure onset and offset (clinical and electrographic).
During a seizure event, the event markers may follow this time course:
* eeg onset, or clinical onset - the onset of a seizure that is either marked electrographically, or by clinical behavior. Note that the clinical onset may not always be present, since some seizures manifest without clinical behavioral changes.
* Marker/Mark On - these are annotations, present in some cases, where a health practitioner injects a chemical marker for use in ICTAL SPECT imaging after a seizure occurs. This is commonly done to see which portions of the brain are metabolically active.
* Marker/Mark Off - This is when the ICTAL SPECT stops imaging.
* eeg offset, or clinical offset - this is the offset of the seizure, as determined either electrographically, or by clinical symptoms.
Other events included may help you understand the time course of each seizure. Note that ICTAL SPECT occurs in all Cleveland Clinic data. Note also that seizure markers are not consistent in their description naming, so one might encode specific regular-expression rules to consistently capture seizure onset/offset markers across all datasets (a sketch follows below). In the case of UMMC data, all onset and offset markers were provided by the clinicians in an Excel sheet instead of via the EDF file, so we added the annotations manually to each EDF file.
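A minimal sketch of such regex rules; the marker strings are illustrative, not an inventory of the actual annotations:

```python
import re

ONSET_RE = re.compile(r"(eeg|clin(ical)?)[\s_-]*onset|sz[\s_-]*onset",
                      re.IGNORECASE)
OFFSET_RE = re.compile(r"(eeg|clin(ical)?)[\s_-]*offset|sz[\s_-]*(end|offset)",
                       re.IGNORECASE)

def classify_marker(description):
    """Map a free-text annotation to 'onset'/'offset', else None."""
    if ONSET_RE.search(description):
        return "onset"
    if OFFSET_RE.search(description):
        return "offset"
    return None
```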
Seizures are present in various recordings; generally there is only one seizure per EDF file. When seizures are present, they are marked electrographically (and clinically, if present) via standard approaches in the epilepsy clinical workflow.
Clinical onset is simply the manifestation of the seizure as clinical symptoms; sometimes this marker may not be present.
What is actually important in evaluating these datasets is the clinical annotation of the localization hypotheses for the seizure onset zone.
These generally include:
* early onset: the earliest electrodes that clinicians saw participating in the seizure onset
* early/late spread (optional): the electrodes that showed epileptic spread activity after seizure onset. Not all seizures have spread contacts annotated.
For patients with a post-surgical MRI available, the segmentation process outlined above tells us which electrodes were within the surgically removed brain region.
Otherwise, clinicians give us their best estimate of which electrodes were resected/ablated, based on their surgical notes.
For surgical patients whose postoperative medical records did not explicitly indicate specific resected or ablated contacts, manual visual inspection was performed to determine the approximate contacts located in later-resected/ablated tissue. Postoperative T1 MRI scans were compared against post-SEEG-implantation CT scans, or against CURRY coregistrations of preoperative MRI and post-SEEG CT scans. Contacts of interest in and around the area of the reported resection were selected individually, and the corresponding slice was navigated to on the CT scan or CURRY coregistration. After identifying landmarks in that slice (e.g. skull shape and features, the shape of prominent brain structures like the ventricles, central sulcus, or superior temporal gyrus), the location of the contact relative to these landmarks, and the location of the slice along the axial plane, the corresponding slice in the postoperative MRI scan was navigated to. The resected tissue within that slice was then visually inspected and compared against the distinct landmarks identified in the CT scans; if brain tissue was not present in the corresponding location of the contact, the contact was marked as resected/ablated. This process was repeated for each contact of interest.
Adam Li, Chester Huynh, Zachary Fitzgerald, Iahn Cajigas, Damian Brusko, Jonathan Jagid, Angel Claudio, Andres Kanner, Jennifer Hopp, Stephanie Chen, Jennifer Haagensen, Emily Johnson, William Anderson, Nathan Crone, Sara Inati, Kareem Zaghloul, Juan Bulacio, Jorge Gonzalez-Martinez, Sridevi V. Sarma. Neural Fragility as an EEG Marker of the Seizure Onset Zone. bioRxiv 862797; doi: https://doi.org/10.1101/862797
Appelhoff, S., Sanderson, M., Brooks, T., Vliet, M., Quentin, R., Holdgraf, C., Chaumon, M., Mikulan, E., Tavabi, K., Höchenberger, R., Welke, D., Brunner, C., Rockhill, A., Larson, E., Gramfort, A. and Jas, M. (2019). MNE-BIDS: Organizing electrophysiological data into the BIDS format and facilitating their analysis. Journal of Open Source Software 4: (1896). https://doi.org/10.21105/joss.01896
Holdgraf, C., Appelhoff, S., Bickel, S., Bouchard, K., D'Ambrosio, S., David, O., … Hermes, D. (2019). iEEG-BIDS, extending the Brain Imaging Data Structure specification to human intracranial electrophysiology. Scientific Data, 6, 102. https://doi.org/10.1038/s41597-019-0105-7
Pernet, C. R., Appelhoff, S., Gorgolewski, K. J., Flandin, G., Phillips, C., Delorme, A., Oostenveld, R. (2019). EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Scientific Data, 6, 103. https://doi.org/10.1038/s41597-019-0104-8
https://www.usa.gov/government-works/
This dataset is from the SEC's Financial Statements and Notes Data Set.
It was a personal project to see if I could make the queries efficient.
It's just been collecting dust ever since, maybe someone will make good use of it.
Data is up to about early-2024.
It doesn't differ from the source, other than it's compiled - so maybe you can try it out, then compile your own (with the link below).
Dataset was created using SEC Files and SQL Server on Docker.
For details on the SQL Server database this came from, see: "dataset-previous-life-info" folder, which will contain:
- Row Counts
- Primary/Foreign Keys
- SQL Statements to recreate database tables
- Example queries on how to join the data tables (a sketch follows below the list).
- A pretty picture of the table associations.
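If you'd rather preview joins before standing up SQL Server, here is a minimal pandas sketch under the assumption of the standard SEC file layout (tab-delimited sub.txt and num.txt, joined on the adsh accession number):

```python
import pandas as pd

sub = pd.read_csv("sub.txt", sep="\t", low_memory=False)  # one row per filing
num = pd.read_csv("num.txt", sep="\t", low_memory=False)  # numeric facts

# Join numeric facts to their filing metadata via the accession number.
facts = num.merge(sub[["adsh", "name", "form", "period"]], on="adsh")
print(facts.head())
```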
Source: https://www.sec.gov/data-research/financial-statement-notes-data-sets
Happy coding!
https://fred.stlouisfed.org/legal/#copyright-citation-required
Graph and download economic data for Liabilities: Notes in Circulation: Federal Reserve Notes in Actual Circulation (LNCFRNC) from 1914-11-20 to 2018-04-11 about actual, notes, liabilities, and USA.
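A minimal sketch for pulling this series programmatically with the community fredapi package (an API key from FRED is required; the package is not affiliated with this dataset):

```python
from fredapi import Fred

fred = Fred(api_key="YOUR_API_KEY")
series = fred.get_series("LNCFRNC")  # Federal Reserve Notes in Actual Circulation
print(series.tail())
```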
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MeSH-CZ-2025-notes - training dataset
Czech translation of Medical Subject Headings, version 2025. Download more MeSH-CZ data at nlk.cz.
License
MeSH-CZ-2025 - training dataset © 2025 by National Medical Library is licensed under Creative Commons Attribution 4.0 International
Structure
"text1","text2","value","category" "term1","definition/note","0.5","cat1|cat2"
category - multiple values (codes) separated by a pipe… See the full description on the dataset page: https://huggingface.co/datasets/NLK-NML/MeSH-CZ-2025-notes.
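A minimal sketch for loading pairs in this structure; the local file name is hypothetical:

```python
import pandas as pd

df = pd.read_csv("mesh_cz_2025_notes.csv")       # hypothetical local file name
df["category"] = df["category"].str.split("|")   # pipe-separated codes -> list
print(df[["text1", "text2", "value", "category"]].head())
```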
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Current Effective Date: 0901Z 17 Apr 2025 to 0901Z 12 Jun 2025. Note data provides additional information for Enroute chart production. It is provided in geospatial vector file formats and is depicted on Enroute charts. Note data information is published every eight weeks by the U.S. Department of Transportation, Federal Aviation Administration, Aeronautical Information Services.
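A minimal sketch for reading such a vector file with GeoPandas; the file name and format are placeholders for whatever the release actually ships:

```python
import geopandas as gpd

notes = gpd.read_file("enroute_notes.shp")  # placeholder file name
print(notes.crs)      # coordinate reference system
print(notes.head())   # note attributes and geometries
```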
https://pacific-data.sprep.org/resource/private-data-license-agreement-0
Project Idea Notes based on the developed SoE and NEMS
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classifying free-text from historical databases into research-compatible formats is a barrier for clinicians undertaking audit and research projects. The aim of this study was to (a) develop an interactive active machine-learning model training methodology using readily available software that was (b) easily adaptable to a wide range of natural language databases and allowed customised researcher-defined categories, and then (c) evaluate the accuracy and speed of this model for classifying free text from two unique and unrelated sets of clinical notes into coded data. A user interface for medical experts to train and evaluate the algorithm was created. The data requiring coding took the form of two independent databases of free-text clinical notes, each with a unique natural language structure. Medical experts defined categories relevant to research projects and performed ‘label-train-evaluate’ loops on the training data set (a sketch of one such loop follows below). A separate dataset was used for validation, with the medical experts blinded to the label given by the algorithm. The first dataset was 32,034 death certificate records from Northern Territory Births Deaths and Marriages, coded into 3 categories: haemorrhagic stroke, ischaemic stroke or no stroke. The second dataset was 12,039 recorded episodes of aeromedical retrieval from two prehospital and retrieval services in the Northern Territory, Australia, coded into 5 categories: medical, surgical, trauma, obstetric or psychiatric. For the first dataset, macro-accuracy of the algorithm was 94.7%; for the second, macro-accuracy was 92.4%. The time taken to develop and train the algorithm was 124 minutes for the death certificate coding and 144 minutes for the aeromedical retrieval coding. This machine-learning training method was able to classify free-text clinical notes quickly and accurately from two different health datasets into categories of relevance to clinicians undertaking health service research.
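A minimal sketch of one ‘label-train-evaluate’ loop iteration using scikit-learn, not the authors' software; the seed notes and categories are invented for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled_texts = ["retrieval for chest pain and dyspnoea",
                 "fractured femur after motor vehicle accident"]
labels = ["medical", "trauma"]  # researcher-defined categories
unlabeled_texts = ["query appendicitis, transferred for surgery",
                   "34 weeks pregnant, contractions in flight"]

# Fit the vocabulary on all notes so labeled and unlabeled share features.
vectorizer = TfidfVectorizer().fit(labeled_texts + unlabeled_texts)
model = LogisticRegression(max_iter=1000).fit(
    vectorizer.transform(labeled_texts), labels)

# Surface the least-confident unlabeled notes for the expert to label next;
# newly labeled notes join the training pool and the loop repeats.
confidence = model.predict_proba(vectorizer.transform(unlabeled_texts)).max(axis=1)
for i in np.argsort(confidence):
    print(f"label me next ({confidence[i]:.2f}): {unlabeled_texts[i]}")
```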
https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the Thai Sticky Notes Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Thai language.
Dataset Content & Diversity: Containing more than 2,000 images, this Thai OCR dataset offers a wide distribution of different types of sticky note images. Within this dataset, you'll discover a variety of handwritten text, including quotes, sentences, and individual words on sticky notes. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.
To ensure diversity and robustness in training your OCR model, we limit each contributor's handwriting to fewer than three unique images. This ensures the dataset covers diverse handwriting styles for training your OCR model. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image at least 80% of the space contains visible Thai text.
The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.
All these sticky notes were written, and the images captured, by native Thai speakers to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats (see the sketch below for reading HEIC files).
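Since HEIC is not universally supported, a minimal sketch for reading those files via the pillow-heif package; the file name is a placeholder:

```python
from PIL import Image
import pillow_heif

# Register a Pillow opener so Image.open can read .heic files.
pillow_heif.register_heif_opener()

img = Image.open("thai_sticky_note_0001.heic").convert("RGB")
img.save("thai_sticky_note_0001.jpg")  # convert for OCR pipelines
```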
Metadata: In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.
This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Thai text recognition models.
Update & Custom Collection: We are committed to continually expanding this dataset by adding more images with the help of our native Thai crowd community.
If you require a customized OCR dataset containing sticky note images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.
Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.
License: This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Leverage this sticky notes image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Thai language. Your journey to improved language understanding and processing begins here.
Meeting notes from the Interagency Data Team meeting. The Interagency Data Team is a community of data analysts, or agency liaisons, who convene regularly with representation from DC agencies of all persuasions. Participants engage in discussions regarding the team’s core mission and priorities for a better kind of data culture – collection, application, sharing, classification and governance, to name a few. The team is coordinated by the Office of the Chief Technology Officer (OCTO), led by the Chief Data Officer (CDO), and directly supports the District of Columbia's Data Policy. Related items: Office of the Chief Technology Officer Data Team FY 2023 Projects; 2023 Enterprise Dataset Inventory is now OpenData Report.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DSH COVID-19 Patient Data reports on patient positives and testing counts at the facility level for DSH. The table reports on the following data fields:
Total patients that tested positive for COVID-19 since 5/16/2020
Patients newly positive for COVID-19 in the last 14 days
Patient deaths while patient was positive for COVID-19 since 5/30/2020
Total number of tests administered since 3/23/2020
COVID-19 test results for patients include DSH patients who are tested while receiving treatment at an outside medical facility. Data have been de-identified in accordance with CalHHS Data De-identification Guidelines: counts between 1 and 10 are masked as "<11" (see the sketch below). Testing counts include Patients Under Investigation (PUIs) testing and proactive testing of asymptomatic patients for surveillance of geriatric, medically fragile, and skilled nursing facility units, and for patients upon admission, re-admission, or discharge. Deaths include all individuals who were positive for COVID-19 at the time of death, regardless of underlying health conditions or whether the cause of death has been confirmed to be COVID-19-related illness. Metro-Norwalk is additional COVID-19 surge space and technically a branch location that is part of DSH Metropolitan Hospital.
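A minimal sketch of that small-count masking rule:

```python
def mask_count(n):
    """Report counts from 1 to 10 as '<11', per the de-identification rule."""
    return "<11" if 1 <= n <= 10 else str(n)

print([mask_count(n) for n in [0, 3, 10, 11, 42]])
# ['0', '<11', '<11', '11', '42']
```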
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
The outbreak of Coronavirus (COVID-19) in the last quarter of 2019-20 has led to unprecedented changes in the work and behaviour of GP practices, and consequently the data in this publication may have been impacted. As such, caution should be taken in drawing any conclusions from this data without due consideration of the circumstances both locally and nationally, and we would recommend that any use of this data is accompanied by an appropriate caveat.

The Statement of Fitness for Work (the Med3 form or 'fit note') was introduced in April 2010 across England, Wales and Scotland. It enables healthcare professionals to give advice to their patients about the impact of their health condition on their fitness for work, and is used to provide medical evidence for employers or to support a claim to health-related benefits through the Department for Work and Pensions (DWP). A fit note is issued after the first seven days of sickness absence (when patients can self-certify) if the healthcare professional assesses that the patient’s health affects their fitness for work. The healthcare professional can decide the patient is 'unfit for work' or 'may be fit for work subject to the following advice...' with accompanying notes on suggested adjustments or adaptations to the job role or workplace.

In 2012, DWP funded a project to provide general practices with the ability to produce computer-generated fit notes (eMed3), and this included the capability to collect the aggregated data generated. Fit notes are issued to patients by doctors, nurses, physiotherapists, occupational therapists and pharmacists following an assessment of their fitness for work. While they can be written by hand, most fit notes provided by general practice are now computer-generated.

This quarterly statistical publication is produced by NHS England in collaboration with The Work and Health Unit, jointly sponsored by the Department for Work and Pensions and the Department of Health. It presents data on electronic fit notes issued in general practices in England for a given period. This is a ‘cumulative’ data collection: weekly data collected will continue to be added to existing data, and all data for all reporting periods is updated in each quarterly publication. From April 2019 all publications contain data from practices that have TPP as their system supplier (which was not previously available), accounting for one third of practices in England; consequently, publications from this date may not be comparable to previous publications. All GP practices are mapped using current NHS geographies, and recent changes may have resulted in a small number of practices not being mapped historically. These are shown as 'Unallocated' but are included in the England total. NHS England publishes data on a quarterly basis in October, January, April and July.

11/07/2024: the summary Excel file, table 13, was updated with missing text; no data was changed or impacted.