Dataset Card for synth-text-classification
This dataset has been created with distilabel. The pipeline script was uploaded to easily reproduce the dataset: text_classification.py. It can be run directly using the CLI: distilabel pipeline run --script "https://huggingface.co/datasets/dvilasuero/synth-text-classification/raw/main/text_classification.py"
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that… See the full description on the dataset page: https://huggingface.co/datasets/dvilasuero/synth-text-classification.
https://academictorrents.com/nolicensespecified
This is a synthetically generated dataset, in which word instances are placed in natural scene images, while taking into account the scene layout. The dataset consists of 800 thousand images with approximately 8 million synthetic word instances. Each text instance is annotated with its text-string, word-level and character-level bounding-boxes.
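The annotation layout described above (a text string plus word-level and character-level bounding boxes) can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `WordInstance`, `char_boxes`, and `word_box`, and the axis-aligned `(x_min, y_min, x_max, y_max)` box convention, are hypothetical and do not describe the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box convention (an assumption, not the dataset's format):
# axis-aligned (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class WordInstance:
    """One synthetic word placed in a scene image (illustrative only)."""

    text: str              # the rendered text string
    char_boxes: List[Box]  # one box per character, in reading order

    def __post_init__(self) -> None:
        # Each character of the text string should have exactly one box.
        if len(self.text) != len(self.char_boxes):
            raise ValueError("expected one character box per character")

    def word_box(self) -> Box:
        """Word-level box: the tightest box enclosing all character boxes."""
        x_mins, y_mins, x_maxs, y_maxs = zip(*self.char_boxes)
        return (min(x_mins), min(y_mins), max(x_maxs), max(y_maxs))


# Made-up example values, not taken from the dataset.
word = WordInstance(
    text="cat",
    char_boxes=[(10, 20, 18, 34), (19, 21, 27, 34), (28, 19, 35, 33)],
)
print(word.word_box())  # (10, 19, 35, 34)
```

Deriving the word-level box from the character-level boxes keeps the two annotation levels consistent in this sketch; a real loader would read both from the dataset's own annotation files instead.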
This dataset has been created by Stability AI and LAION. SynthText is a popular OCR dataset, where random texts are rendered into random locations in images based on depth maps. In this dataset, we additionally computed image captions using BLIP2.
Caption: "a close up of a leopard's face with a blurry background"
This dataset was created by Gopi chandu
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Lalu Erfandi Maula Yusnu
Released under CC0: Public Domain