The ICDAR 2013 dataset consists of 229 training images and 233 testing images, with word-level annotations provided. It is the standard benchmark dataset for evaluating near-horizontal text detection.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
META
https://github.com/open-mmlab/mmocr/blob/main/dataset_zoo/icdar2013/metafile.yml

Name: 'Incidental Scene Text IC13'
Paper:
  Title: ICDAR 2013 Robust Reading Competition
  URL: https://www.imlab.jp/publication_data/1352/icdar_competition_report.pdf
  Venue: ICDAR
  Year: '2013'
  BibTeX: '@inproceedings{karatzas2013icdar, title={ICDAR 2013 robust reading competition}, author={Karatzas, Dimosthenis and Shafait, Faisal and Uchida, Seiichi and Iwamura, Masakazu and i Bigorda…

See the full description on the dataset page: https://huggingface.co/datasets/MiXaiLL76/ICDAR2013_OCR.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The aim of this competition is to evaluate the performance of state-of-the-art methods for table detection (TRACK A) and table recognition (TRACK B). For the first track, document images containing one or several tables are provided. For TRACK B, two subtracks exist: the first subtrack (B.1) provides the table region, so only the table structure recognition must be performed. The second subtrack (B.2) provides no a priori information, meaning both table region detection and table structure recognition have to be performed. The ground truth is provided in a similar format as for the ICDAR 2013 competition (see [2]):
<document filename='filename.jpg'>
<table id='Table_1540517170416_3'>
<cell id='TableCell_1540517477147_58' start-row='0' start-col='0' end-row='1' end-col='2'>
<Coords points="180,160 177,456 614,456 615,163"/>
...
...
The difference from Göbel et al. [2] is the Coords tag, which defines a table/cell as a polygon specified by a list of coordinates. For B.1, the table and its coordinates are given together with the input image.
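As an illustration, cell polygons in this format can be read with a short script (a minimal sketch using only Python's standard library; the element and attribute names follow the excerpt above, and the truncated elements are closed here so the snippet parses):

```python
import xml.etree.ElementTree as ET

# Minimal ground-truth excerpt in the format shown above.
xml_text = """
<document filename='filename.jpg'>
  <table id='Table_1540517170416_3'>
    <cell id='TableCell_1540517477147_58' start-row='0' start-col='0' end-row='1' end-col='2'>
      <Coords points="180,160 177,456 614,456 615,163"/>
    </cell>
  </table>
</document>
"""

def parse_cells(doc):
    """Yield (cell id, attribute dict, polygon vertex list) for every cell."""
    root = ET.fromstring(doc)
    for cell in root.iter('cell'):
        points = cell.find('Coords').get('points')
        # "x1,y1 x2,y2 ..." -> [(x1, y1), (x2, y2), ...]
        polygon = [tuple(map(int, p.split(','))) for p in points.split()]
        yield cell.get('id'), cell.attrib, polygon

for cell_id, attrs, polygon in parse_cells(xml_text):
    print(cell_id, attrs['start-row'], attrs['end-col'], polygon)
```

The same parsing applies to both tracks, since table regions use the identical Coords convention.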
Important Note:
For the modern dataset, the convex hull of the content describes a cell region. For the historical dataset, the output region of a cell is requested to be the cell boundary. This is necessary due to the characteristics of handwritten text, which often overlaps with neighbouring cells.
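For the modern dataset, such a convex-hull region could be derived from the coordinates of a cell's content, for example glyph or component corner points (a hypothetical sketch; the point list is invented for illustration, and the hull is computed with Andrew's monotone-chain algorithm rather than any tool from the competition):

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull; returns vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Cross product of vectors o->a and o->b (positive = left turn).
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Concatenate, dropping the duplicated endpoints.
    return lower[:-1] + upper[:-1]

# Invented content coordinates: four outer corners plus one interior point.
content = [(10, 10), (80, 12), (78, 40), (12, 38), (45, 25)]
print(convex_hull(content))  # interior point (45, 25) is excluded
```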
See also: http://sac.founderit.com/tasks.html
The evaluation tool is available at github: https://github.com/cndplab-founder/ctdar_measurement_tool
The ICDAR-2013 competition dataset contains 3755 handwritten Chinese characters.
CC0 1.0 Universal (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Loïc Lauréote
Released under CC0: Public Domain
DIBCO 2013 is the international Document Image Binarization Contest organized in the context of ICDAR 2013 conference. The general objective of the contest is to identify current advances in document image binarization for both machine-printed and handwritten document images using evaluation performance measures that conform to document image analysis and recognition.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
The CVL Single Digit dataset consists of 7000 single digits (700 digits per class) written by approximately 60 different writers. The validation set has the same size but different writers. The validation set may be used for parameter estimation and validation but not for supervised training. The CVL Digit Strings dataset uses 10 different digit strings from a total of about 120 writers resulting in 1262 training images. The digits from the CVL Single Digit dataset were extracted from these strings.
This database may be used for non-commercial research purpose only. If you publish material based on this database, we request you to include a reference to:
Markus Diem, Stefan Fiel, Angelika Garz, Manuel Keglevic, Florian Kleber and Robert Sablatnig, ICDAR 2013 Competition on Handwritten Digit Recognition (HDRC 2013), In Proc. of the 12th Int. Conference on Document Analysis and Recognition (ICDAR) 2013, pp. 1454-1459, 2013.
ICDAR 2013 consists of 229 training images and 233 testing images and, similar to ICDAR 2015, also provides "Strong", "Weak" and "Generic" lexicons for the text spotting task. Unlike the datasets above, it contains only horizontal text.
The ICDAR2003 dataset is a dataset for scene text recognition. It contains 507 natural scene images (including 258 training images and 249 test images) in total. The images are annotated at character level. Characters and words can be cropped from the images.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The dataset comes from Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, and Kavita Sultanpure - creators of CascadeTabNet.
Depending on the dataset version downloaded, the images will include annotations for 'borderless' tables, 'bordered' tables, and 'cells'. Borderless tables are those in which no cell has a border. Bordered tables are those in which every cell has a border and the table itself is bordered. Cells are the individual data points within the table.
A subset of the full dataset, the ICDAR Table Cells Dataset, was extracted and imported to Roboflow to create this hosted version of the Cascade TabNet project. All the additional dataset components used in the full project are available here: All Files.
For the versions below, a preprocessing step of Resize (416x416, Fit within, white edges) was added along with additional augmentations to increase the size of the training set and to make the images more uniform. Preprocessing applies to all images, whereas augmentations apply only to training set images.
- Version 3, augmented-FAST-model: 818 raw images of tables. Trained from scratch (no transfer learning) with the "Fast" model from Roboflow Train. 3X augmentation (generated images).
- Version 4, augmented-ACCURATE-model: 818 raw images of tables. Trained from scratch with the "Accurate" model from Roboflow Train. 3X augmentation.
- Version 5, tableBordersOnly-augmented-FAST-model: 818 raw images of tables. 'Cell' class omitted with Modify Classes. Trained from scratch with the "Fast" model from Roboflow Train. 3X augmentation.
- Version 6, tableBordersOnly-augmented-ACCURATE-model: 818 raw images of tables. 'Cell' class omitted with Modify Classes. Trained from scratch with the "Accurate" model from Roboflow Train. 3X augmentation.
Example Image from the Dataset: https://i.imgur.com/ruizSQN.png
Cascade TabNet in Action: https://i.imgur.com/nyn98Ue.png
CascadeTabNet is an automatic table recognition method for interpretation of tabular data in document images. We present an improved deep-learning-based end-to-end approach for solving both problems of table detection and structure recognition using a single Convolutional Neural Network (CNN) model. CascadeTabNet is a Cascade Mask Region-based CNN High-Resolution Network (Cascade Mask R-CNN HRNet) based model that detects the regions of tables and recognizes the structural body cells from the detected tables at the same time. We evaluate our results on the ICDAR 2013, ICDAR 2019 and TableBank public datasets. We achieved 3rd rank in the ICDAR 2019 post-competition results for table detection while attaining the best accuracy results for the ICDAR 2013 and TableBank datasets. We also attain the highest accuracy results on the ICDAR 2019 table structure recognition dataset.
If you find this work useful for your research, please cite our paper:

@misc{cascadetabnet2020,
  title={CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents},
  author={Devashish Prasad and Ayan Gadpal and Kshitij Kapadni and Manish Visave and Kavita Sultanpure},
  year={2020},
  eprint={2004.12629},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
A new release of the dataset used in the ICDAR 2015 HTR competition in which all Page XML files are based on the same 2013-07-15 schema. It only contains page level images, Page XML files for train and test (including the ground truth transcripts for the test and train batch 1) and plain text files for train batch 2 that have the page level ground truth transcripts. The original version of this dataset can be found at http://doi.org/10.5281/zenodo.248733
The CVL Database is a public database for writer retrieval, writer identification and word spotting. The database consists of 7 different handwritten texts (1 German and 6 English texts). In total, 310 writers participated in the dataset: 27 of them wrote 7 texts and 283 wrote 5 texts. For each text, an RGB colour image (300 dpi) comprising the handwritten text and the printed sample text is available, as well as a cropped version (handwriting only). A unique ID identifies the writer, and the bounding boxes for each single word are stored in an XML file. The CVL Database consists of images with cursively handwritten German and English texts chosen from literary works. All pages have the unique writer ID and the text number (separated by a dash) at the upper right corner, followed by the printed sample text. The text is placed between two horizontal separators. Beneath the printed text, individuals were asked to write the text using a ruled undersheet to prevent curled text lines. The layout follows the style of the IAM database. The database was updated on 12/09/2013 since one writer ID (265/266) was wrong; the version number was changed to 1.1. Samples of the following texts have been used: Edwin A. Abbott – Flatland: A Romance of Many Dimensions (92 words); William Shakespeare – Macbeth (49 words); Wikipedia – Mailüfterl (73 words, under the CC Attribution-ShareAlike License); Charles Darwin – Origin of Species (52 words); Johann Wolfgang von Goethe – Faust. Eine Tragödie (50 words); Oscar Wilde – The Picture of Dorian Gray (66 words); Edgar Allan Poe – The Fall of the House of Usher (78 words). This database may be used for non-commercial research purposes only. If you publish material based on this database, we request that you include a reference to: Florian Kleber, Stefan Fiel, Markus Diem and Robert Sablatnig, CVL-Database: An Off-line Database for Writer Retrieval, Writer Identification and Word Spotting, In Proc. of the 12th Int. Conference on Document Analysis and Recognition (ICDAR) 2013, pp. 560-564, 2013.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
The CVL ruling dataset was synthetically generated to allow for comparing different ruling removal methods. It is based on the ICDAR 2013 Handwriting Segmentation database [1]. It was generated by synthetically adding four different ruling images, resulting in a total of 600 test images. The pixel values are:
For processing, a binary image must be generated which sets to 0 all pixels that are not 255. For evaluation, the line GT image can be obtained by setting all pixels having value 155 to one (e.g. linImg = img == 155). The text GT image can be extracted by setting all values below 155 to one (e.g. txtImg = img < 155). Then, true positives (tp), false positives (fp) and false negatives (fn) are defined as:
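The mask extraction described above can be sketched with NumPy on a tiny synthetic image (the prediction array and the pixel-wise tp/fp/fn formulas below are assumptions for illustration, since the contest's exact definitions are not reproduced here):

```python
import numpy as np

# Synthetic grayscale GT image: 255 = background, 155 = ruling line,
# values below 155 = text (pixel codes as described above).
img = np.array([[255, 255, 100],
                [155, 155, 155],
                [255,  50, 255]], dtype=np.uint8)

# Input for processing: all pixels that are not 255 set to 0.
binary = np.where(img != 255, 0, 1).astype(np.uint8)

# Ground-truth masks, exactly as in the text:
lin_img = (img == 155)   # ruling-line GT
txt_img = (img < 155)    # text GT

# Invented method output: suppose a ruling-removal method labels
# these pixels as ruling (for illustration only).
pred_lin = np.zeros_like(lin_img)
pred_lin[1, :2] = True

# Standard pixel-wise counts (assumed definitions):
tp = int(np.sum(pred_lin & lin_img))    # ruling pixels correctly detected
fp = int(np.sum(pred_lin & ~lin_img))   # pixels wrongly labelled as ruling
fn = int(np.sum(~pred_lin & lin_img))   # ruling pixels missed
print(tp, fp, fn)
```

With these three counts, the usual precision/recall/F-measure follow directly.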
The database ships with a Matlab script that produces the evaluation results once all images have been processed.