100+ datasets found
  1. English Shopping List OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). English Shopping List OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/english-shopping-list-ocr-image-dataset
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the English Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the English language.

    Dataset Content & Diversity:

    Containing more than 2000 images, this English OCR dataset offers a wide distribution of different types of shopping list images. Within this dataset, you'll discover a variety of handwritten text on shopping lists, including sentences, individual item names, quantities, and comments. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.

    To ensure diversity and robustness in training your OCR model, we limit each handwriting to fewer than three unique images. This gives your OCR model diverse types of handwriting to train on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible English text.

    The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.

    All these shopping lists were written, and their images captured, by native English speakers to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.

    This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of English text recognition models.
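
    As a rough illustration of how the CSV metadata might be paired with the image files, the sketch below indexes metadata rows by file name and reports images that lack a row. The column names (`image_name`, `orientation`, `country`, `language`, `device`) and the sample rows are assumptions made up for this example; the delivered CSV may use a different header.

```python
import csv
import io

# Made-up sample rows with hypothetical column names; check the header row
# of the CSV that actually ships with the dataset.
SAMPLE_CSV = """image_name,orientation,country,language,device
list_0001.jpg,portrait,US,English,iPhone 13
list_0002.heic,landscape,UK,English,Pixel 6
"""


def load_metadata(csv_text):
    """Return a dict mapping image file name to its metadata row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["image_name"]: row for row in reader}


def find_unmatched(image_names, metadata):
    """Return, in sorted order, the image names that have no metadata row."""
    return sorted(name for name in image_names if name not in metadata)


metadata = load_metadata(SAMPLE_CSV)
images_on_disk = ["list_0001.jpg", "list_0002.heic", "list_0003.jpg"]

print(metadata["list_0001.jpg"]["orientation"])
print(find_unmatched(images_on_disk, metadata))
```

    Running a check like this before training can surface files that were renamed or dropped during transfer.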

    Update & Custom Collection:

    We are committed to continually expanding this dataset by adding more images with the help of our native English crowd community.

    If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.

    Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the English language. Your journey to improved language understanding and processing begins here.

  2. Finnish Product Image OCR Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Finnish Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/finnish-product-image-ocr-dataset
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Finnish Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Finnish language.

    Dataset Content & Diversity:

    Containing a total of 2000 images, this Finnish OCR dataset offers a diverse distribution across different types of product front images. In this dataset, you'll find a variety of text that includes product names, taglines, logos, company names, addresses, product content, etc. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure the diversity of the dataset and to build a robust text recognition model, we limit each source to fewer than five unique images. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Finnish text.

    Images have been captured under varying lighting conditions, both day and night, along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.

    All these images were captured by native Finnish speakers to ensure text quality and to avoid toxic content and PII text. We used the latest iOS and Android mobile devices with cameras above 5MP to capture all these images and maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
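
    Because the images arrive in two formats, a loading pipeline may want to tell them apart without trusting file extensions. The sketch below is one way to do that from the first 12 bytes of a file: JPEG files begin with the bytes FF D8 FF, and HEIC files carry an `ftyp` box at byte offset 4 followed by a brand code. The brand list and the function names are assumptions for illustration, and the list is not exhaustive.

```python
# Brand codes that may appear in the "ftyp" box of HEIF/HEIC files.
# This set is an illustrative assumption and is not exhaustive.
HEIF_BRANDS = {b"heic", b"heix", b"hevc", b"hevx", b"mif1", b"msf1"}


def sniff_image_format(header: bytes) -> str:
    """Classify an image from its leading bytes as 'jpeg', 'heic', or 'unknown'."""
    # JPEG: start-of-image marker FF D8, followed by another FF marker byte.
    if header[:3] == b"\xff\xd8\xff":
        return "jpeg"
    # HEIC: 4-byte box size, then "ftyp", then a 4-byte major brand.
    if len(header) >= 12 and header[4:8] == b"ftyp" and header[8:12] in HEIF_BRANDS:
        return "heic"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_image_format(fh.read(12))
```

    Files classified as `heic` usually need a decoder that supports HEIF before they can be fed to an OCR pipeline that expects JPEG input.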

    Metadata:

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata such as image orientation, country, language, and device information. Each image is named to correspond with its metadata record.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Finnish text recognition models.

    Update & Custom Collection:

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native Finnish crowd community.

    If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, our crowd community can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific project requirements.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Finnish language. Your journey to enhanced language understanding and processing starts here.

  3. French Newspaper, Magazine, and Books OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). French Newspaper, Magazine, and Books OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/french-newspaper-book-magazine-ocr-image-dataset
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French
    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the French Newspaper, Books, and Magazine Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the French language.

    Dataset Content & Diversity:

    Containing a total of 5000 images, this French OCR dataset offers an equal distribution across newspapers, books, and magazines. Within, you'll find a diverse collection of content, including articles, advertisements, cover pages, headlines, callouts, and author sections from a variety of newspapers, books, and magazines. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure the diversity of the dataset and to build a robust text recognition model, we limit each source to fewer than five unique images. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible French text.

    Images have been captured under varying lighting conditions, both day and night, along with different capture angles and backgrounds, further enhancing dataset diversity. The collection features images in portrait and landscape modes.

    All these images were captured by native French speakers to ensure text quality and to avoid toxic content and PII text. We used the latest iOS and Android mobile devices with cameras above 5MP to capture all these images and maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata such as device information, source type (newspaper, magazine, or book), and image type (portrait or landscape). Each image is named to correspond with its metadata record.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of French text recognition models.
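
    One use of the source-type field is to confirm the stated equal split across newspapers, books, and magazines in whatever subset you end up training on. The sketch below tallies rows per source type and reports the gap between the largest and smallest class. The column names (`image_name`, `source_type`, `image_type`, `device`) and the sample rows are hypothetical, chosen only for this example.

```python
import csv
import io
from collections import Counter

# Made-up sample rows with hypothetical column names.
SAMPLE_CSV = """image_name,source_type,image_type,device
fr_0001.jpg,newspaper,portrait,iPhone 13
fr_0002.jpg,book,landscape,Pixel 6
fr_0003.heic,magazine,portrait,iPhone 12
fr_0004.jpg,newspaper,landscape,Galaxy S21
"""


def count_by_source(csv_text):
    """Count metadata rows per source type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["source_type"] for row in reader)


def imbalance(counts):
    """Largest class count minus smallest; 0 means a perfectly equal split."""
    return max(counts.values()) - min(counts.values())


counts = count_by_source(SAMPLE_CSV)
print(dict(counts))
print(imbalance(counts))
```

    The same tally works for the image type column if a balanced portrait and landscape split matters for your model.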

    Update & Custom Collection:

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native French crowd community.

    If you require a custom dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, our crowd community can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific requirements.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this image dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the French language. Your journey to enhanced language understanding and processing starts here.

  4. Thai Newspaper, Magazine, and Books OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Thai Newspaper, Magazine, and Books OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/thai-newspaper-book-magazine-ocr-image-dataset
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Thai Newspaper, Books, and Magazine Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Thai language.

    Dataset Content & Diversity:

    Containing a total of 5000 images, this Thai OCR dataset offers an equal distribution across newspapers, books, and magazines. Within, you'll find a diverse collection of content, including articles, advertisements, cover pages, headlines, callouts, and author sections from a variety of newspapers, books, and magazines. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure the diversity of the dataset and to build a robust text recognition model, we limit each source to fewer than five unique images. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Thai text.

    Images have been captured under varying lighting conditions, both day and night, along with different capture angles and backgrounds, further enhancing dataset diversity. The collection features images in portrait and landscape modes.

    All these images were captured by native Thai speakers to ensure text quality and to avoid toxic content and PII text. We used the latest iOS and Android mobile devices with cameras above 5MP to capture all these images and maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata such as device information, source type (newspaper, magazine, or book), and image type (portrait or landscape). Each image is named to correspond with its metadata record.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Thai text recognition models.

    Update & Custom Collection:

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native Thai crowd community.

    If you require a custom dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, our crowd community can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific requirements.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this image dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Thai language. Your journey to enhanced language understanding and processing starts here.

  5. Italian Shopping List OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Italian Shopping List OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/italian-shopping-list-ocr-image-dataset
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Italian Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Italian language.

    Dataset Content & Diversity:

    Containing more than 2000 images, this Italian OCR dataset offers a wide distribution of different types of shopping list images. Within this dataset, you'll discover a variety of handwritten text on shopping lists, including sentences, individual item names, quantities, and comments. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.

    To ensure diversity and robustness in training your OCR model, we limit each handwriting to fewer than three unique images. This gives your OCR model diverse types of handwriting to train on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Italian text.
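
    If you want to enforce a similar per-writer cap on your own collection, or re-check it after merging several deliveries, a simple count per writer is enough. The sketch below is a minimal version of that check. The writer identifiers are hypothetical: the metadata fields listed for this dataset do not mention a writer ID, so such a field would have to come from your own records.

```python
from collections import Counter

# "Less than three" unique images per handwriting means at most two.
MAX_IMAGES_PER_WRITER = 2


def writers_over_cap(writer_ids, cap=MAX_IMAGES_PER_WRITER):
    """Return {writer_id: image_count} for writers exceeding the cap."""
    counts = Counter(writer_ids)
    return {writer: n for writer, n in counts.items() if n > cap}


# One hypothetical writer ID per image, in any order.
sample_writer_ids = ["w01", "w02", "w01", "w03", "w01"]
print(writers_over_cap(sample_writer_ids))
```

    An empty result means every writer is within the cap.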

    The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.

    All these shopping lists were written, and their images captured, by native Italian speakers to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.

    This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Italian text recognition models.

    Update & Custom Collection:

    We are committed to continually expanding this dataset by adding more images with the help of our native Italian crowd community.

    If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.

    Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Italian language. Your journey to improved language understanding and processing begins here.

  6. Chinese Product Image OCR Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Chinese Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/chinese-product-image-ocr-dataset
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Chinese Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Chinese language.

    Dataset Content & Diversity:

    Containing a total of 2000 images, this Chinese OCR dataset offers a diverse distribution across different types of product front images. In this dataset, you'll find a variety of text that includes product names, taglines, logos, company names, addresses, product content, etc. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure the diversity of the dataset and to build a robust text recognition model, we limit each source to fewer than five unique images. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Chinese text.

    Images have been captured under varying lighting conditions, both day and night, along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.

    All these images were captured by native Chinese speakers to ensure text quality and to avoid toxic content and PII text. We used the latest iOS and Android mobile devices with cameras above 5MP to capture all these images and maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata such as image orientation, country, language, and device information. Each image is named to correspond with its metadata record.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Chinese text recognition models.

    Update & Custom Collection:

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native Chinese crowd community.

    If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, our crowd community can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific project requirements.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Chinese language. Your journey to enhanced language understanding and processing starts here.

  7. Korean Shopping List OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Korean Shopping List OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/korean-shopping-list-ocr-image-dataset
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Korean Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Korean language.

    Dataset Content & Diversity:

    Containing more than 2000 images, this Korean OCR dataset offers a wide distribution of different types of shopping list images. Within this dataset, you'll discover a variety of handwritten text on shopping lists, including sentences, individual item names, quantities, and comments. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.

    To ensure diversity and robustness in training your OCR model, we limit each handwriting to fewer than three unique images. This gives your OCR model diverse types of handwriting to train on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Korean text.

    The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.

    All these shopping lists were written, and their images captured, by native Korean speakers to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.

    This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Korean text recognition models.

    Update & Custom Collection:

    We are committed to continually expanding this dataset by adding more images with the help of our native Korean crowd community.

    If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.

    Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Korean language. Your journey to improved language understanding and processing begins here.

  8. Handwritten Sticky Notes OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Handwritten Sticky Notes OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/arabic-sticky-notes-ocr-image-dataset
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Arabic Sticky Notes Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Arabic language.

    Dataset Content & Diversity:

    Containing more than 2000 images, this Arabic OCR dataset offers a wide distribution of different types of sticky note images. Within this dataset, you'll discover a variety of handwritten text, including quotes, sentences, and individual words on sticky notes. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.

    To ensure diversity and robustness in training your OCR model, we limit each handwriting to fewer than three unique images. This gives your OCR model diverse types of handwriting to train on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Arabic text.

    The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.

    All these sticky notes were written, and their images captured, by native Arabic speakers to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.

    This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Arabic text recognition models.

    Update & Custom Collection:

    We are committed to continually expanding this dataset by adding more images with the help of our native Arabic crowd community.

    If you require a customized OCR dataset containing sticky note images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.

    Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage this sticky notes image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Arabic language. Your journey to improved language understanding and processing begins here.

  9. Japanese Newspaper, Magazine, and Books OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Japanese Newspaper, Magazine, and Books OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/japanese-newspaper-book-magazine-ocr-image-dataset
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Japanese Newspaper, Books, and Magazine Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Japanese language.

    Dataset Content & Diversity:

    Containing a total of 5000 images, this Japanese OCR dataset offers an equal distribution across newspapers, books, and magazines. Within, you'll find a diverse collection of content, including articles, advertisements, cover pages, headlines, callouts, and author sections from a variety of newspapers, books, and magazines. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure dataset diversity and to build a robust text recognition model, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the space in each image contains visible Japanese text.

    Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, further enhancing dataset diversity. The collection features images in portrait and landscape modes.

    All these images were captured by native Japanese people to ensure text quality and to avoid toxic content and PII text. To maintain image quality, we used the latest iOS and Android mobile devices with cameras above 5MP. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes fields such as device information, source type (newspaper, magazine, or book), and image type (portrait or landscape). Each image is renamed to correspond with the metadata.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Japanese text recognition models.
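
    As a rough illustration of how such metadata could be used, the sketch below counts images per source type and per image type. The column names (file_name, device, source_type, image_type) and the sample rows are assumptions made for this example only; the actual CSV header delivered with the dataset may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the delivered metadata CSV.
# The column names and values below are assumptions, not the real schema.
SAMPLE_CSV = """file_name,device,source_type,image_type
img_0001.jpg,iPhone 12,newspaper,portrait
img_0002.heic,iPhone 12,magazine,landscape
img_0003.jpg,Pixel 6,book,portrait
img_0004.jpg,Pixel 6,newspaper,landscape
"""


def count_by(rows, column):
    """Count how many metadata rows share each value of `column`."""
    return Counter(row[column] for row in rows)


# Parse the CSV text into a list of dicts keyed by the header row.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

source_counts = count_by(rows, "source_type")
orientation_counts = count_by(rows, "image_type")

print(dict(source_counts))
print(dict(orientation_counts))
```

    With the real file, the io.StringIO wrapper would be replaced by an opened file handle, and the counts could be compared against the equal distribution across newspapers, books, and magazines described above.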

    Update & Custom Collection:

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native Japanese crowd community.

    If you require a custom dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific requirements using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this image dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Japanese language. Your journey to enhanced language understanding and processing starts here.

  10. F

    English Newspaper, Magazine, and Books OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). English Newspaper, Magazine, and Books OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/english-newspaper-book-magazine-ocr-image-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the English Newspaper, Books, and Magazine Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the English language.

    Dataset Content & Diversity:

    Containing a total of 5000 images, this English OCR dataset offers an equal distribution across newspapers, books, and magazines. Within, you'll find a diverse collection of content, including articles, advertisements, cover pages, headlines, callouts, and author sections from a variety of newspapers, books, and magazines. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure dataset diversity and to build a robust text recognition model, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the space in each image contains visible English text.

    Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, further enhancing dataset diversity. The collection features images in portrait and landscape modes.

    All these images were captured by native English-speaking people to ensure text quality and to avoid toxic content and PII text. To maintain image quality, we used the latest iOS and Android mobile devices with cameras above 5MP. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes fields such as device information, source type (newspaper, magazine, or book), and image type (portrait or landscape). Each image is renamed to correspond with the metadata.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of English text recognition models.

    Update & Custom Collection:

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native English language crowd community.

    If you require a custom dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific requirements using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this image dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the English language. Your journey to enhanced language understanding and processing starts here.

  11. F

    German Product Image OCR Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). German Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/german-product-image-ocr-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the German Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the German language.

    Dataset Content & Diversity:

    Containing a total of 2000 images, this German OCR dataset offers a diverse distribution across different types of product front images. In this dataset, you'll find a variety of text, including product names, taglines, logos, company names, addresses, product content, and more. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure dataset diversity and to build a robust text recognition model, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the space in each image contains visible German text.

    Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.

    All these images were captured by native German people to ensure text quality and to avoid toxic content and PII text. To maintain image quality, we used the latest iOS and Android mobile devices with cameras above 5MP. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes fields such as image orientation, country, language, and device information. Each image is renamed to correspond with the metadata.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of German text recognition models.

    Update & Custom Collection:

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native German crowd community.

    If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific project requirements using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the German language. Your journey to enhanced language understanding and processing starts here.

  12. F

    Filipino Newspaper, Magazine, and Books OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Filipino Newspaper, Magazine, and Books OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/filipino-newspaper-book-magazine-ocr-image-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Filipino Newspaper, Books, and Magazine Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Filipino language.

    Dataset Content & Diversity:

    Containing a total of 5000 images, this Filipino OCR dataset offers an equal distribution across newspapers, books, and magazines. Within, you'll find a diverse collection of content, including articles, advertisements, cover pages, headlines, callouts, and author sections from a variety of newspapers, books, and magazines. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure dataset diversity and to build a robust text recognition model, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the space in each image contains visible Filipino text.

    Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, further enhancing dataset diversity. The collection features images in portrait and landscape modes.

    All these images were captured by native Filipino people to ensure text quality and to avoid toxic content and PII text. To maintain image quality, we used the latest iOS and Android mobile devices with cameras above 5MP. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes fields such as device information, source type (newspaper, magazine, or book), and image type (portrait or landscape). Each image is renamed to correspond with the metadata.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Filipino text recognition models.

    Update & Custom Collection:

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native Filipino crowd community.

    If you require a custom dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific requirements using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this image dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Filipino language. Your journey to enhanced language understanding and processing starts here.

  13. F

    Thai Shopping List OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Thai Shopping List OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/thai-shopping-list-ocr-image-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Thai Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Thai language.

    Dataset Content & Diversity:

    Containing more than 2000 images, this Thai OCR dataset offers a wide distribution of different types of shopping list images. Within this dataset, you'll discover a variety of handwritten text on shopping lists, including sentences, individual item names, quantities, and comments. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.

    To ensure diversity and robustness in training your OCR model, we allow only a limited number (fewer than three) of unique images per handwriting. This ensures we have diverse types of handwriting to train your OCR model on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the space in each image contains visible Thai text.

    The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.

    All these shopping lists were written and images were captured by native Thai people to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.

    This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Thai text recognition models.
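
    Because each image is named to correspond with the metadata, a quick consistency check can flag images that have no metadata row. The sketch below is only an illustration: the column names (file_name, orientation, country, language, device), the sample rows, and the file names are assumptions, and the real naming scheme may differ.

```python
import csv
import io
from pathlib import PurePath

# Hypothetical metadata rows; the header and values are assumptions.
SAMPLE_CSV = """file_name,orientation,country,language,device
list_0001,portrait,Thailand,Thai,iPhone 13
list_0002,landscape,Thailand,Thai,Galaxy S21
"""

# Hypothetical image file names as they might appear in a download folder.
IMAGE_FILES = ["list_0001.jpg", "list_0002.heic", "list_0003.jpg"]


def unmatched_images(image_files, metadata_rows, key="file_name"):
    """Return image files whose base name has no matching metadata row."""
    # Compare on the stem so that .jpg and .heic files match the same key.
    known = {PurePath(row[key]).stem for row in metadata_rows}
    return sorted(f for f in image_files if PurePath(f).stem not in known)


metadata_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing = unmatched_images(IMAGE_FILES, metadata_rows)
print(missing)
```

    Matching on the file stem rather than the full name is a design choice for this sketch, made because the images come in two formats; if the metadata stores full file names with extensions, the comparison still works since the stem is taken on both sides.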

    Update & Custom Collection:

    We are committed to continually expanding this dataset by adding more images with the help of our native Thai crowd community.

    If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.

    Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Thai language. Your journey to improved language understanding and processing begins here.

  14. F

    Punjabi Product Image OCR Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Punjabi Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/punjabi-product-image-ocr-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Punjabi Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Punjabi language.

    Dataset Content & Diversity:

    Containing a total of 2000 images, this Punjabi OCR dataset offers a diverse distribution across different types of product front images. In this dataset, you'll find a variety of text, including product names, taglines, logos, company names, addresses, product content, and more. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure dataset diversity and to build a robust text recognition model, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the space in each image contains visible Punjabi text.

    Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.

    All these images were captured by native Punjabi people to ensure text quality and to avoid toxic content and PII text. To maintain image quality, we used the latest iOS and Android mobile devices with cameras above 5MP. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes fields such as image orientation, country, language, and device information. Each image is renamed to correspond with the metadata.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Punjabi text recognition models.

    Update & Custom Collection:

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native Punjabi crowd community.

    If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific project requirements using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Punjabi language. Your journey to enhanced language understanding and processing starts here.

  15. F

    Japanese Handwritten Sticky Notes OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Japanese Handwritten Sticky Notes OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/japanese-sticky-notes-ocr-image-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Japanese Sticky Notes Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Japanese language.

    Dataset Content & Diversity:

    Containing more than 2000 images, this Japanese OCR dataset offers a wide distribution of different types of sticky note images. Within this dataset, you'll discover a variety of handwritten text, including quotes, sentences, and individual words on sticky notes. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.

    To ensure diversity and robustness in training your OCR model, we allow only a limited number (fewer than three) of unique images per handwriting. This ensures we have diverse types of handwriting to train your OCR model on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the space in each image contains visible Japanese text.

    The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.

    All these sticky notes were written and images were captured by native Japanese people to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
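
    Since the images come in both JPEG and HEIC formats, a loading pipeline may need to route each file to a suitable decoder. The following is a minimal sketch that only groups file names by extension; the file names are hypothetical, and decoding HEIC itself would need an additional library that is not shown here.

```python
from collections import defaultdict
from pathlib import PurePath

# Extensions treated as JPEG or HEIC in this sketch (an assumption about
# how the delivered files are named).
JPEG_SUFFIXES = {".jpg", ".jpeg"}
HEIC_SUFFIXES = {".heic"}


def group_by_format(file_names):
    """Group file names into 'jpeg', 'heic', or 'other' by extension."""
    groups = defaultdict(list)
    for name in file_names:
        # Lower-case the suffix so that .JPG and .jpg are treated alike.
        suffix = PurePath(name).suffix.lower()
        if suffix in JPEG_SUFFIXES:
            groups["jpeg"].append(name)
        elif suffix in HEIC_SUFFIXES:
            groups["heic"].append(name)
        else:
            groups["other"].append(name)
    return dict(groups)


# Hypothetical file names used only to exercise the function.
files = ["note_001.JPG", "note_002.heic", "note_003.jpeg", "metadata.csv"]
groups = group_by_format(files)
print(groups)
```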

    Metadata:

    In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.

    This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Japanese text recognition models.

    Update & Custom Collection:

    We are committed to continually expanding this dataset by adding more images with the help of our native Japanese crowd community.

    If you require a customized OCR dataset containing sticky note images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.

    Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage this sticky notes image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Japanese language. Your journey to improved language understanding and processing begins here.

  16. F

    Tamil Shopping List OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Tamil Shopping List OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/tamil-shopping-list-ocr-image-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Tamil Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Tamil language.

    Dataset Content & Diversity:

    Containing more than 2000 images, this Tamil OCR dataset offers a wide distribution of different types of shopping list images. Within this dataset, you'll discover a variety of handwritten text on shopping lists, including sentences, individual item names, quantities, and comments. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.

    To ensure diversity and robustness in training your OCR model, we allow only a limited number (fewer than three) of unique images per handwriting. This ensures we have diverse types of handwriting to train your OCR model on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the space in each image contains visible Tamil text.

    The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.

    All these shopping lists were written and images were captured by native Tamil people to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.

    This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Tamil text recognition models.

    Update & Custom Collection:

    We are committed to continually expanding this dataset by adding more images with the help of our native Tamil crowd community.

    If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.

    Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Tamil language. Your journey to improved language understanding and processing begins here.

  17. F

    Spanish Product Image OCR Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Spanish Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/spanish-product-image-ocr-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Spanish Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Spanish language.

    Dataset Contain & Diversity:

    Containing a total of 2000 images, this Spanish OCR dataset offers diverse distribution across different types of front images of Products. In this dataset, you'll find a variety of text that includes product names, taglines, logos, company names, addresses, product content, etc. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure the diversity of the dataset and to build a robust text recognition model we allow limited (less than five) unique images from a single resource. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of space contains visible Spanish text.

    Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.

    All these images were captured by native Spanish speakers to ensure text quality and to avoid toxic content and PII. We used the latest iOS and Android mobile devices with cameras above 5 MP to capture all these images and maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes fields such as image orientation, country, language, and device information. Each image file is named to correspond with its metadata record.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Spanish text recognition models.

    Update & Custom Collection:

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native Spanish crowd community.

    If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, with the help of our crowd community, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific project requirements.

    License:

    This Image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Spanish language. Your journey to enhanced language understanding and processing starts here.

  18. F

    Arabic Newspaper, Magazine, and Books OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Arabic Newspaper, Magazine, and Books OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/arabic-newspaper-book-magazine-ocr-image-dataset
    Explore at:
    Available download formats
    wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Arabic Newspaper, Books, and Magazine Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Arabic language.

    Dataset Contents & Diversity:

    Containing a total of 5000 images, this Arabic OCR dataset offers an equal distribution across newspapers, books, and magazines. Within it, you'll find a diverse collection of content, including articles, advertisements, cover pages, headlines, callouts, and author sections from a variety of newspapers, books, and magazines. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure the diversity of the dataset and to build a robust text recognition model, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of each image's area contains visible Arabic text.

    Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, further enhancing dataset diversity. The collection features images in portrait and landscape modes.

    All these images were captured by native Arabic speakers to ensure text quality and to avoid toxic content and PII. We used the latest iOS and Android mobile devices with cameras above 5 MP to capture all these images and maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes fields such as device information, source type (newspaper, magazine, or book), and image orientation (portrait or landscape). Each image file is named to correspond with its metadata record.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Arabic text recognition models.

    Update & Custom Collection:

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native Arabic crowd community.

    If you require a custom dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, with the help of our crowd community, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific requirements.

    License:

    This Image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this image dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Arabic language. Your journey to enhanced language understanding and processing starts here.

  19. F

    Ukrainian Handwritten Sticky Notes OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Ukrainian Handwritten Sticky Notes OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/ukrainian-sticky-notes-ocr-image-dataset
    Explore at:
    Available download formats
    wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Ukrainian Sticky Notes Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Ukrainian language.

    Dataset Contents & Diversity:

    Containing more than 2000 images, this Ukrainian OCR dataset offers a wide distribution of different types of sticky note images. Within this dataset, you'll discover a variety of handwritten text, including quotes, sentences, and individual words on sticky notes. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.

    To ensure diversity and robustness in training your OCR model, we allow only a limited number (fewer than three) of unique images per handwriting, so that the dataset covers many different handwriting styles. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of each image's area contains visible Ukrainian text.

    The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.

    All these sticky notes were written and images were captured by native Ukrainian people to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.

    This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Ukrainian text recognition models.

    Update & Custom Collection:

    We are committed to continually expanding this dataset by adding more images with the help of our native Ukrainian crowd community.

    If you require a customized OCR dataset containing sticky note images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.

    Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage this sticky notes image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Ukrainian language. Your journey to improved language understanding and processing begins here.

  20. F

    Punjabi Newspaper, Magazine, and Books OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Punjabi Newspaper, Magazine, and Books OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/punjabi-newspaper-book-magazine-ocr-image-dataset
    Explore at:
    Available download formats
    wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Punjabi Newspaper, Books, and Magazine Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Punjabi language.

    Dataset Contents & Diversity:

    Containing a total of 5000 images, this Punjabi OCR dataset offers an equal distribution across newspapers, books, and magazines. Within it, you'll find a diverse collection of content, including articles, advertisements, cover pages, headlines, callouts, and author sections from a variety of newspapers, books, and magazines. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure the diversity of the dataset and to build a robust text recognition model, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of each image's area contains visible Punjabi text.

    Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, further enhancing dataset diversity. The collection features images in portrait and landscape modes.

    All these images were captured by native Punjabi speakers to ensure text quality and to avoid toxic content and PII. We used the latest iOS and Android mobile devices with cameras above 5 MP to capture all these images and maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes fields such as device information, source type (newspaper, magazine, or book), and image orientation (portrait or landscape). Each image file is named to correspond with its metadata record.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Punjabi text recognition models.

    Update & Custom Collection:

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native Punjabi crowd community.

    If you require a custom dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, with the help of our crowd community, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific requirements.

    License:

    This Image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this image dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Punjabi language. Your journey to enhanced language understanding and processing starts here.

Cite
FutureBee AI (2022). English Shopping List OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/english-shopping-list-ocr-image-dataset

English Shopping List OCR Image Dataset

English Handwritten Shopping List OCR dataset

Explore at:
Available download formats
wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License

https://www.futurebeeai.com/policies/ai-data-license-agreement

Dataset funded by
FutureBeeAI
Description

What’s Included

Introducing the English Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the English language.

Dataset Contents & Diversity:

Containing more than 2000 images, this English OCR dataset offers a wide distribution of different types of shopping list images. Within this dataset, you'll discover a variety of handwritten text on shopping lists, including sentences, individual item names, quantities, and comments. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.

To ensure diversity and robustness in training your OCR model, we allow only a limited number (fewer than three) of unique images per handwriting, so that the dataset covers many different handwriting styles. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of each image's area contains visible English text.

The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.

All these shopping lists were written and images were captured by native English people to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
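
Because the images ship in two formats, a loader may need to separate them before decoding, since HEIC typically requires an extra decoder. The following is a minimal sketch of that split; the file names are hypothetical and the actual naming scheme of the dataset is not documented here.

```python
from pathlib import PurePath

# Hypothetical file names, used only for illustration.
FILES = ["list_0001.jpg", "list_0002.HEIC", "list_0003.jpeg", "metadata.csv"]

JPEG_EXTENSIONS = {".jpg", ".jpeg"}
HEIC_EXTENSIONS = {".heic"}


def split_by_format(names):
    """Group file names into JPEG, HEIC, and other, ignoring extension case."""
    groups = {"jpeg": [], "heic": [], "other": []}
    for name in names:
        extension = PurePath(name).suffix.lower()
        if extension in JPEG_EXTENSIONS:
            groups["jpeg"].append(name)
        elif extension in HEIC_EXTENSIONS:
            groups["heic"].append(name)
        else:
            groups["other"].append(name)
    return groups


groups = split_by_format(FILES)
print(groups)
```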

Metadata:

In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.

This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of English text recognition models.
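
As a rough illustration of how such a metadata file might be used, the sketch below parses CSV text and counts images per orientation. The column names (`file_name`, `orientation`, `country`, `language`, `device`) and the sample rows are assumptions made for this example, not the schema actually delivered with the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: column names and values are assumptions for illustration.
SAMPLE_CSV = """file_name,orientation,country,language,device
list_0001.jpg,portrait,United Kingdom,English,iPhone 12
list_0002.heic,landscape,United States,English,Pixel 6
list_0003.jpg,portrait,United States,English,Galaxy S21
"""


def load_metadata(text):
    """Parse metadata CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by(rows, column):
    """Count how many images share each value of the given column."""
    return Counter(row[column] for row in rows)


rows = load_metadata(SAMPLE_CSV)
orientation_counts = count_by(rows, "orientation")
print(dict(orientation_counts))
```

The same `count_by` helper could be applied to the device or country column to check how balanced a training split is, assuming those columns exist under the names used here.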

Update & Custom Collection:

We are committed to continually expanding this dataset by adding more images with the help of our native English crowd community.

If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.

Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.

License:

This image dataset, created by FutureBeeAI, is now available for commercial use.

Conclusion:

Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the English language. Your journey to improved language understanding and processing begins here.
