Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description:
This dataset contains images of handwritten mathematical expressions paired with their corresponding textual representations and answers. The expressions include various arithmetic operations such as addition (+), subtraction (-), multiplication (*), division (÷), and parentheses for grouping operations. The dataset is designed to support tasks such as Optical Character Recognition (OCR), handwritten text recognition, and sequence modeling for solving mathematical expressions.
Expression: The mathematical expression in text form.
Answer: The evaluated result of the expression.
This dataset serves as a valuable resource for researchers and practitioners working on handwriting recognition and mathematical problem-solving automation.
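As a rough illustration of how an Expression string relates to its Answer, the sketch below evaluates an expression restricted to the operators listed above. It is only a sketch under assumptions: the function name `evaluate` is hypothetical, division is assumed to be ordinary real-valued division, and the dataset's actual answer formatting (integers, rounding) is not stated in the description.

```python
import ast
import operator

# Operators named in the dataset description; "÷" is mapped to "/" before parsing.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,  # assumption: true division, not integer division
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate(expression: str):
    """Evaluate an arithmetic expression using only +, -, *, ÷ and parentheses."""
    tree = ast.parse(expression.replace("÷", "/"), mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Anything else (names, calls, other operators) is rejected.
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(tree)
```

Walking the syntax tree instead of calling `eval` keeps the check limited to the arithmetic the dataset describes, which matters when the expression text comes from a recognition model rather than a trusted source.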
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Handwritten Math Equation Solver is a dataset for object detection tasks - it contains Handwritten Math Numbers annotations for 420 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Băngf Hải
Released under Apache 2.0
https://choosealicense.com/licenses/cdla-sharing-1.0/
Aida Calculus Math Handwriting Recognition Dataset (Downsampled Image-to-LaTeX Version)
Synthetic handwritten calculus math expressions with downsampled images for LaTeX OCR and handwriting recognition tasks.
Dataset Summary
This is a processed version of the original Aida Calculus Math Handwriting Recognition Dataset, tailored specifically for image-to-LaTeX modeling. The dataset comprises synthetic handwritten calculus expressions, with each image annotated by a ground… See the full description on the dataset page: https://huggingface.co/datasets/deepcopy/Aida-Calculus-Math-Handwriting.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
I’ve modified the 100K image dataset of handwritten math equations from the AidaV7 Dataset to improve its usability for training models. The original dataset was divided into 10 folders, each containing 10K images, which made it challenging to train models that require a large volume of data simultaneously. I combined all the images into a single folder to address this. Additionally, I restructured the annotations, which were originally spread across multiple JSON files and stored as an array of dictionaries. The annotations weren’t in order, requiring repeated iterations through the array to locate the correct annotation for each image. To streamline this process, I merged all the JSON files into one and converted the data into a CSV file. In this CSV, each row represents an image filename, and the columns contain the corresponding annotations, making annotation retrieval faster during training.
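The restructuring described above can be sketched roughly as follows. This is not the author's actual script: the function name `merge_annotations`, the `filename` key, and the directory layout are hypothetical, and the sketch assumes each JSON file holds an array of dictionaries with one dictionary per image.

```python
import csv
import json
from pathlib import Path


def merge_annotations(json_dir, csv_path, filename_key="filename"):
    """Merge per-batch JSON annotation arrays into one CSV, one row per image.

    `filename_key` is a hypothetical key name; adjust it to the real schema.
    Returns the number of rows written.
    """
    records = {}
    for json_file in sorted(Path(json_dir).glob("*.json")):
        with open(json_file, encoding="utf-8") as handle:
            for entry in json.load(handle):
                # Index by filename so later lookups need no repeated scans.
                records[entry[filename_key]] = entry

    other_keys = {key for entry in records.values() for key in entry}
    columns = [filename_key] + sorted(other_keys - {filename_key})

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        for name in sorted(records):
            # Nested values (lists, dicts) are stored as JSON strings in the cell.
            row = {
                key: json.dumps(value) if isinstance(value, (list, dict)) else value
                for key, value in records[name].items()
            }
            writer.writerow(row)
    return len(records)
```

Indexing by filename first is what removes the repeated iteration the paragraph mentions: each annotation is located once during the merge, and the resulting CSV can be loaded into a lookup table in a single pass at training time.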
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Handwritten Maths Operators 2 is a dataset for object detection tasks - it contains Equal 6zFb annotations for 853 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
These are the datasets collected for the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME), held in the competition session of ICDAR 2023.
Three tasks are proposed, each with a different modality: on-line, off-line and bi-modal.
For the on-line task, we provide .inkml files (containing trace information, MathML and a LaTeX string) together with a symbol-level label graph (SymLG) as ground truth. In addition to the new data and the previous CROHME data, we also provide a large amount of artificial on-line data in the training set.
For the off-line task, .png images (scanned from paper or rendered from InkML) and the symbol-level label graph (SymLG) are provided. In addition to the new data and the previous CROHME data, we use off-line images from OffHME to increase the size of the training set.
For the bi-modal task, both the .inkml files and the .png images are provided as two input channels, with SymLG as ground truth.
All three tasks inherit the data collected for the previous six CROHME competitions, together with the new 2023 collection from three sites: Nantes (France), Luleå (Sweden) and Tokyo (Japan).
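To make the on-line format more concrete, here is a minimal sketch of reading strokes and the LaTeX ground truth from an InkML document. It rests on assumptions that the description above does not confirm: that the files use the standard W3C InkML namespace, that each `trace` element holds comma-separated "x y" points, and that the LaTeX string is stored in an `annotation` element with `type="truth"`. The function name `parse_inkml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: files declare the standard W3C InkML namespace.
INKML_NS = {"ink": "http://www.w3.org/2003/InkML"}


def parse_inkml(xml_text):
    """Return (strokes, truth) where strokes is a list of [(x, y), ...] lists."""
    root = ET.fromstring(xml_text)

    strokes = []
    for trace in root.findall("ink:trace", INKML_NS):
        points = []
        for point in (trace.text or "").strip().split(","):
            coords = point.split()
            if len(coords) >= 2:
                # Extra channels (for example time), if present, are ignored here.
                points.append((float(coords[0]), float(coords[1])))
        strokes.append(points)

    truth = None
    for annotation in root.findall("ink:annotation", INKML_NS):
        # Assumption: the LaTeX string is tagged type="truth".
        if annotation.get("type") == "truth":
            truth = (annotation.text or "").strip()
    return strokes, truth
```

For the bi-modal task, the stroke lists returned here would be paired with the matching .png image; how the two inputs are aligned is left to the model and is not specified in the description.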
deepcopy/fastmath-handwritten-math-to-latex dataset hosted on Hugging Face and contributed by the HF Datasets community
Explore the Persian Handwritten Math Solutions Dataset with images and JSON annotations for formula recognition.
https://cdla.io/sharing-1-0/
The Aida Calculus Math Handwriting Recognition Dataset consists of 100,000 images in 10 batches. Each image contains a photo of a handwritten calculus math expression (specifically within the topic of limits) written with a dark utensil on plain paper. Each image is accompanied by ground truth math expression in LaTeX as well as bounding boxes and pixel-level masks per character. All images are synthetically generated.
The complexity of handwriting recognition for math expressions can be decomposed into the following sources of variability:
Image of Math = Math Expression x Math Characters x Location of Math Characters x Visual Qualities of the Math Characters (fonts, color) x Noise of Image (backgrounds, stray marks)
It is the job of the recognition model to take the Image of Math as input and predict the Math Expression.
Typical approaches to handwritten recognition tasks involve collecting and tagging of large amounts of data, on which many iterations of models are trained. The "one dataset, many models" paradigm has specific drawbacks within the context of product development. As product requirements evolve, such as the addition of a new mathematical character into the prediction space, a new data collection and tagging effort must be undertaken. The cycle of adapting the handwriting recognition capability to new requirements is long and does not support agile product development.
Here, we take a different approach by iteratively building a complex, synthetically generated dataset towards specific requirements. The generation process gives the developer exact control over the distribution of math expressions, characters, location of characters, specific visual qualities of the math, image noise, and image augmentations. The developer controls every aspect of the data, down to each pixel. In many ways, the data synthesis runs backwards to the handwriting recognition model, creating visual complexity that the model must then untangle to uncover the ground truth math expression. Thus, we can arrive at a "many datasets, one model" paradigm in which, as product requirements change, the data can quickly iterate and adapt on agile cycles.
In addition to affording more control over the product development process, synthetic data allows for 100% correct pixel by pixel tagging that opens the door for new modeling possibilities. Every image is tagged with the ground truth LaTeX for the expressions, bounding boxes per math character, and exact pixel masks for each character.
Our goal in releasing this dataset is to provide the data science and machine learning community with resources for undertaking the challenging computer vision task of extracting math expressions from images. The data offers something to all levels, from beginners building simple character recognition models to experts who wish to predict pixel-by-pixel masks and decode the complex structure of math expressions.
The images contain math expressions of limits, a topic typically encountered by students learning Calculus I in the United States. Features of the writing such as font, writing utensils (type, color, pressure, consistency), angle and distance of photo, and size of writing are all simulated. Background features include shadows, various plain paper types, bleed-through, other distortions, and noise typical of students taking photos of their math.
The strategy in defining the populations from which images are synthesized is to be a superset of what we expect students to submit. Therefore, the math expressions are not in themselves pedagogical, but aim to encompass the potential variety of student submissions, both mathematically correct and incorrect. The image features and augmentations are similarly designed to cover the range of possible student handwriting qualities.
Data consis...
Dataset Card for MathWriting
Dataset Summary
The MathWriting dataset contains online handwritten mathematical expressions collected through a prompted interface and rendered to RGB images. It consists of 230,000 human-written expressions, each paired with its corresponding LaTeX string. The dataset is intended to support research in online and offline handwritten mathematical expression (HME) recognition. Key features:
Online handwriting converted to rendered RGB images.… See the full description on the dataset page: https://huggingface.co/datasets/deepcopy/MathWriting-human.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Handwritten Maths Operators is a dataset for object detection tasks - it contains Equal annotations for 853 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset was created by Carlos Espa
The HWRT database of handwritten symbols contains on-line data of handwritten symbols such as all alphanumeric characters, arrows, Greek characters and mathematical symbols like the integral symbol. The database can be downloaded in the form of bzip2-compressed tar files. Each tar file contains:
symbols.csv: A CSV file with the columns symbol_id, latex, training_samples and test_samples. The symbol id is an integer, the column latex contains the LaTeX code of the symbol, and the columns training_samples and test_samples contain integers giving the number of labeled samples.
train-data.csv: A CSV file with the columns symbol_id, user_id, user_agent and data.
test-data.csv: A CSV file with the columns symbol_id, user_id, user_agent and data.
All CSV files use ";" as delimiter and "'" as quotechar. The data field is given in YAML format as a list of lists of dictionaries. Each dictionary has the keys "x", "y" and "time"; (x, y) are coordinates and time is the UNIX time.
About 90% of the data was made available by Daniel Kirsch via github.com/kirel/detexify-data. Thank you very much, Daniel!
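A minimal sketch of reading train-data.csv or test-data.csv with the delimiter and quote character stated above. Two assumptions are made here and should be checked against the real files: that the first line is a header naming the columns, and that the stroke data is written in JSON-compatible flow style (which is also valid YAML), so the standard `json` module can parse it. If the data uses other YAML syntax, a YAML parser would be needed instead. The function name `read_hwrt_rows` is hypothetical.

```python
import csv
import json


def read_hwrt_rows(handle):
    """Yield one dict per recording from an open HWRT data CSV file."""
    # Delimiter and quote character as stated in the dataset description.
    reader = csv.DictReader(handle, delimiter=";", quotechar="'")
    for row in reader:
        # Assumption: the data column is JSON-compatible YAML
        # (a list of strokes, each a list of {"x", "y", "time"} dicts).
        strokes = json.loads(row["data"])
        yield {
            "symbol_id": int(row["symbol_id"]),
            "user_id": row["user_id"],
            "strokes": [
                [(point["x"], point["y"], point["time"]) for point in stroke]
                for stroke in strokes
            ],
        }
```

A typical call would open the file with `newline=""` and pass the handle in, for example `with open("train-data.csv", newline="", encoding="utf-8") as f: rows = list(read_hwrt_rows(f))`.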
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Handwritten Maths Operators 3 is a dataset for object detection tasks - it contains Equal 6zFb Tq9W annotations for 754 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset was created by CODER B
Released under Data files © Original Authors
https://creativecommons.org/licenses/zero/1.0
The dataset contains samples of handwritten digits (0–9) and basic mathematical symbols: +, -, ÷, ×, (, ).
Total number of samples: 2,183.
This dataset was created by CuteDeadu
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Math Education Tools: The model could be integrated into educational software to help students learn and visualize math problems. It can recognize and interpret handwritten equations and convert them into digital form, allowing students to solve them more easily.
Handwritten Data Digitization: This can be used in institutions like banks, where many data entries are still done by hand. This tool could transcribe these handwritten entries into digital numbers, helping to automate the digitization process.
Automated Marking System: The model can be used to auto-grade written numerical assignments or exam answers, reducing redundancy for teachers and providing objective scoring.
Invoice Processing: Companies dealing with large numbers of handwritten invoices could use the model to accurately transcribe these documents into a digital system for easy tracking and management.
Handwriting Recognition in Health Sector: In healthcare, doctors' handwritten notes or prescriptions often cause issues. This model could digitize those notes, ensuring that errors due to illegibility are minimized.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
ICDAR 2019 Competition on Recognition of Handwritten Mathematical Expressions and Typeset Formula Detection (ICDAR2019-CROHME-TDF) - With temporal classification labeled data (generated from Label Graph)
\cite{Mouchère, ICDAR 2019 Competition on Recognition of Handwritten Mathematical Expressions and Typeset Formula Detection (ICDAR2019-CROHME-TDF) ,1,ID:ICDAR2019-CROHME-TDF_1,URL:https://tc11.cvc.uab.es/datasets/ICDAR2019-CROHME-TDF_1}