6 datasets found
  1. synth-text-classification

    • huggingface.co
    Updated Oct 9, 2024
    Cite
    Daniel Vila (2024). synth-text-classification [Dataset]. https://huggingface.co/datasets/dvilasuero/synth-text-classification
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 9, 2024
    Authors
    Daniel Vila
    Description

    Dataset Card for synth-text-classification

    This dataset was created with distilabel. The pipeline script, text_classification.py, was uploaded so that the dataset can be reproduced easily. Run it directly with the distilabel CLI: distilabel pipeline run --script "https://huggingface.co/datasets/dvilasuero/synth-text-classification/raw/main/text_classification.py"

    Dataset Summary

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that… See the full description on the dataset page: https://huggingface.co/datasets/dvilasuero/synth-text-classification.
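
For readers who would rather launch the reproduction from Python than type the command by hand, the sketch below assembles the same `distilabel pipeline run --script` invocation quoted in the description. The helper names `raw_script_url` and `distilabel_command` are illustrative assumptions and not part of distilabel; the URL pattern is simply the one that appears in the dataset card.

```python
from typing import List

# Base URL for dataset repositories, as seen in the dataset card's links.
HUB_DATASETS = "https://huggingface.co/datasets"


def raw_script_url(repo_id: str,
                   script: str = "text_classification.py",
                   revision: str = "main") -> str:
    """Build the raw-file URL of a script in a dataset repo (hypothetical helper)."""
    return f"{HUB_DATASETS}/{repo_id}/raw/{revision}/{script}"


def distilabel_command(repo_id: str,
                       script: str = "text_classification.py") -> List[str]:
    """Return the CLI invocation from the dataset card as an argv list."""
    return ["distilabel", "pipeline", "run", "--script",
            raw_script_url(repo_id, script)]


cmd = distilabel_command("dvilasuero/synth-text-classification")
print(" ".join(cmd))

# To actually run it (assumes distilabel is installed and the network is reachable):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting issues if it is later passed to `subprocess.run`.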

  2. Data from: Synthetic Data for Text Localisation in Natural Images

    • academictorrents.com
    bittorrent
    Updated Nov 15, 2021
    Cite
    Ankush Gupta and Andrea Vedaldi and Andrew Zisserman (2021). Synthetic Data for Text Localisation in Natural Images [Dataset]. https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c
    Explore at:
    bittorrent (73499997703)
    Dataset updated
    Nov 15, 2021
    Dataset authored and provided by
    Ankush Gupta and Andrea Vedaldi and Andrew Zisserman
    License

    https://academictorrents.com/nolicensespecified

    Description

    This is a synthetically generated dataset in which word instances are placed in natural scene images, taking the scene layout into account. The dataset consists of 800 thousand images with approximately 8 million synthetic word instances. Each text instance is annotated with its text string and with word-level and character-level bounding boxes.
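
To make the annotation layout concrete, here is a minimal sketch of how one annotated word instance could be represented in Python. The class and field names (`Box`, `WordInstance`, `char_boxes`) and the axis-aligned box convention are assumptions for illustration only; the listing does not describe the dataset's actual file format. The last lines work out the average number of word instances per image implied by the stated totals.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical convention)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, other: "Box") -> bool:
        """True if `other` lies entirely inside this box."""
        return (self.x_min <= other.x_min and self.y_min <= other.y_min
                and self.x_max >= other.x_max and self.y_max >= other.y_max)


@dataclass(frozen=True)
class WordInstance:
    """One word: its text string, word-level box, and one box per character."""
    text: str
    word_box: Box
    char_boxes: List[Box]

    def is_consistent(self) -> bool:
        """One character box per character, each inside the word box."""
        return (len(self.char_boxes) == len(self.text)
                and all(self.word_box.contains(c) for c in self.char_boxes))


# Made-up example values, not taken from the dataset.
word = WordInstance(
    text="cat",
    word_box=Box(10, 20, 40, 32),
    char_boxes=[Box(10, 20, 20, 32), Box(20, 20, 30, 32), Box(30, 20, 40, 32)],
)
print(word.is_consistent())

# Totals quoted in the description (the word count is approximate).
NUM_IMAGES = 800_000
NUM_WORDS = 8_000_000
print(NUM_WORDS / NUM_IMAGES)  # average word instances per image
```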

  3. CaptionedSynthText

    • huggingface.co
    Updated Aug 23, 2024
    Cite
    Chris Wendler (2024). CaptionedSynthText [Dataset]. https://huggingface.co/datasets/wendlerc/CaptionedSynthText
    Explore at:
    Croissant
    Dataset updated
    Aug 23, 2024
    Authors
    Chris Wendler
    Description

    This dataset was created by Stability AI and LAION. SynthText is a popular OCR dataset in which random text is rendered at random locations in images, with placement based on depth maps. For this dataset, we additionally computed image captions using BLIP2.

    Caption: "a close up of a leopard's face with a blurry background"

  4. SynthText TF Records

    • kaggle.com
    Updated Dec 19, 2023
    Cite
    Gopi chandu (2023). SynthText TF Records [Dataset]. https://www.kaggle.com/datasets/gopichandu1/synthtext-tf-records/data
    Explore at:
    Croissant
    Dataset updated
    Dec 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gopi chandu
    Description

    Dataset

    This dataset was created by Gopi chandu


  5. synthtext

    • huggingface.co
    Updated Jun 17, 2025
    Cite
    madehua (2025). synthtext [Dataset]. https://huggingface.co/datasets/mdh98/synthtext
    Explore at:
    Dataset updated
    Jun 17, 2025
    Authors
    madehua
    Description

    mdh98/synthtext dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. Clova Deep Text LMDB Dataset

    • kaggle.com
    zip
    Updated Oct 4, 2020
    Cite
    Lalu Erfandi Maula Yusnu (2020). Clova Deep Text LMDB Dataset [Dataset]. https://www.kaggle.com/nunenuh/clova-deeptext
    Explore at:
    zip (0 bytes)
    Dataset updated
    Oct 4, 2020
    Authors
    Lalu Erfandi Maula Yusnu
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Lalu Erfandi Maula Yusnu

    Released under CC0: Public Domain

    Contents

    test


