6 datasets found

synth-text-classification
huggingface.co
Updated Oct 9, 2024
Cite
Daniel Vila (2024). synth-text-classification [Dataset]. https://huggingface.co/datasets/dvilasuero/synth-text-classification
Dataset updated
Oct 9, 2024
Authors
Daniel Vila
Description
Dataset Card for synth-text-classification

This dataset has been created with distilabel. The pipeline script was uploaded to easily reproduce the dataset: text_classification.py. It can be run directly using the CLI: distilabel pipeline run --script "https://huggingface.co/datasets/dvilasuero/synth-text-classification/raw/main/text_classification.py"
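The CLI call quoted above can also be assembled programmatically. The following is a minimal sketch that only builds and prints the command; it assumes nothing beyond the command and URL given in the dataset card, and the helper name `build_distilabel_command` is hypothetical.

```python
import shlex

# URL copied verbatim from the dataset card.
SCRIPT_URL = (
    "https://huggingface.co/datasets/dvilasuero/"
    "synth-text-classification/raw/main/text_classification.py"
)


def build_distilabel_command(script_url: str) -> list[str]:
    """Return the argv list for `distilabel pipeline run --script <url>`.

    Hypothetical helper: it only assembles the arguments and does not
    check that distilabel is installed or that the URL is reachable.
    """
    return ["distilabel", "pipeline", "run", "--script", script_url]


command = build_distilabel_command(SCRIPT_URL)

if __name__ == "__main__":
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(command))
```

Assuming distilabel is installed in the active environment, the list in `command` could be handed to `subprocess.run(command, check=True)` to launch the pipeline.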

Dataset Summary

This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that… See the full description on the dataset page: https://huggingface.co/datasets/dvilasuero/synth-text-classification.
Data from: Synthetic Data for Text Localisation in Natural Images
academictorrents.com
bittorrent
Updated Nov 15, 2021
Cite
Ankush Gupta and Andrea Vedaldi and Andrew Zisserman (2021). Synthetic Data for Text Localisation in Natural Images [Dataset]. https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c
Available download formats: bittorrent (73499997703)
Dataset updated
Nov 15, 2021
Dataset authored and provided by
Ankush Gupta and Andrea Vedaldi and Andrew Zisserman
License
https://academictorrents.com/nolicensespecified
Description
This is a synthetically generated dataset, in which word instances are placed in natural scene images, while taking into account the scene layout. The dataset consists of 800 thousand images with approximately 8 million synthetic word instances. Each text instance is annotated with its text-string, word-level and character-level bounding-boxes.
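As a rough illustration of the annotation described above, one word instance could be modelled as follows. This is a sketch under stated assumptions: the class name, field names, and the `(x_min, y_min, x_max, y_max)` box layout are hypothetical and are not taken from the dataset's actual file format. Only the image and word counts come from the description.

```python
from dataclasses import dataclass, field

# Hypothetical box layout (x_min, y_min, x_max, y_max); the real files may
# store boxes differently, for example as four corner points.
Box = tuple[float, float, float, float]


@dataclass
class WordInstance:
    """One synthetic word placed in a scene image (illustrative schema)."""

    text: str                      # the text-string annotation
    word_box: Box                  # word-level bounding box
    char_boxes: list[Box] = field(default_factory=list)  # one box per character

    def has_box_per_character(self) -> bool:
        """Check that the character-level boxes line up with the string."""
        return len(self.char_boxes) == len(self.text)


# Figures quoted in the description.
NUM_IMAGES = 800_000
NUM_WORD_INSTANCES = 8_000_000  # approximate
words_per_image = NUM_WORD_INSTANCES / NUM_IMAGES

# Made-up example values, for illustration only.
example = WordInstance(
    text="hi",
    word_box=(10.0, 20.0, 50.0, 40.0),
    char_boxes=[(10.0, 20.0, 30.0, 40.0), (30.0, 20.0, 50.0, 40.0)],
)
```

With the quoted counts, `words_per_image` works out to roughly ten word instances per image.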
CaptionedSynthText
huggingface.co
Updated Aug 23, 2024
Cite
Chris Wendler (2024). CaptionedSynthText [Dataset]. https://huggingface.co/datasets/wendlerc/CaptionedSynthText
Dataset updated
Aug 23, 2024
Authors
Chris Wendler
Description
This dataset has been created by Stability AI and LAION. SynthText is a popular OCR dataset, where random texts are rendered into random locations in images based on depth maps. In this dataset, we additionally computed image captions using BLIP2.

Caption: "a close up of a leopard's face with a blurry background"
SynthText TF Records
kaggle.com
Updated Dec 19, 2023
Cite
Gopi chandu (2023). SynthText TF Records [Dataset]. https://www.kaggle.com/datasets/gopichandu1/synthtext-tf-records/data
Dataset updated
Dec 19, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Gopi chandu
Description
Dataset

This dataset was created by Gopi chandu

synthtext
huggingface.co
Updated Jun 17, 2025
Cite
madehua (2025). synthtext [Dataset]. https://huggingface.co/datasets/mdh98/synthtext
Dataset updated
Jun 17, 2025
Authors
madehua
Description
mdh98/synthtext dataset hosted on Hugging Face and contributed by the HF Datasets community
Clova Deep Text LMDB Dataset
kaggle.com
zip
Updated Oct 4, 2020
Cite
Lalu Erfandi Maula Yusnu (2020). Clova Deep Text LMDB Dataset [Dataset]. https://www.kaggle.com/nunenuh/clova-deeptext
Available download formats: zip (0 bytes)
Dataset updated
Oct 4, 2020
Authors
Lalu Erfandi Maula Yusnu
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Lalu Erfandi Maula Yusnu

Released under CC0: Public Domain

Contents

test