100+ datasets found

Filipino Product Image OCR Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). Filipino Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/filipino-product-image-ocr-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the Filipino Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Filipino language.
Dataset Content & Diversity:
This Filipino OCR dataset contains 2,000 images distributed across different types of product front images. The text in these images includes product names, taglines, logos, company names, addresses, product content, etc. The images showcase distinct fonts, writing formats, colors, designs, and layouts.
To keep the dataset diverse and to support robust text recognition models, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the space in each image contains visible Filipino text.
Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.
All images were captured by native Filipino speakers to ensure text quality and to avoid toxic content and PII text. To maintain image quality, all images were taken with the latest iOS and Android mobile devices with cameras above 5 MP. Images in this training dataset are available in both JPEG and HEIC formats.
Metadata:
Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata like image orientation, country, language, and device information. Each image is renamed to match its metadata entry.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Filipino text recognition models.
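As a rough illustration of how per-image CSV metadata of this kind might be used, the sketch below indexes metadata rows by image file name and counts images per orientation. The column names (`image_name`, `orientation`, `country`, `language`, `device`) and the sample rows are assumptions made for this example; the vendor's actual schema may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical metadata sample: column names and values are assumptions
# for illustration, not the vendor's actual schema.
SAMPLE_METADATA = """\
image_name,orientation,country,language,device
fil_0001.jpg,portrait,Philippines,Filipino,iPhone 13
fil_0002.heic,landscape,Philippines,Filipino,Galaxy S21
fil_0003.jpg,portrait,Philippines,Filipino,Pixel 6
"""


def load_metadata(csv_text):
    """Return a dict mapping each image file name to its metadata row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["image_name"]: row for row in reader}


metadata = load_metadata(SAMPLE_METADATA)

# Count how many images fall into each orientation.
orientation_counts = Counter(row["orientation"] for row in metadata.values())
print(orientation_counts)
```

Because the images are described as being renamed to correspond to the metadata, keying the lookup on the file name is one plausible way to join the two; with a real delivery, the key column would need to be checked against the actual CSV header.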
Update & Custom Collection:
We're committed to expanding this dataset by continuously adding more images with the assistance of our native Filipino crowd community.
If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, our crowd community can annotate the images with bounding boxes or transcribe the text in each image to align with your specific project requirements.
License:
This Image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Filipino language. Your journey to enhanced language understanding and processing starts here.
Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata
datarade.ai
.csv
Updated Jul 18, 2023
Cite
WIRESTOCK (2023). Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata [Dataset]. https://datarade.ai/data-products/wirestock-s-ai-ml-image-training-data-4-5m-files-with-metadata-wirestock
Available download formats: .csv
Dataset updated
Jul 18, 2023
Dataset provided by
Wirestock, Inc.
Authors
WIRESTOCK
Area covered
Belarus, Peru, Estonia, Georgia, Jersey, New Caledonia, Chile, Pakistan, Swaziland, Sudan
Description
Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata: This data product is a unique offering in the realm of AI/ML training data. What sets it apart is the sheer volume and diversity of the dataset, which includes 4.5 million files spanning across 20 different categories. These categories range from Animals/Wildlife and The Arts to Technology and Transportation, providing a rich and varied dataset for AI/ML applications.

The data is sourced from Wirestock's platform, where creators upload and sell their photos, videos, and AI art online. This means that the data is not only vast but also constantly updated, ensuring a fresh and relevant dataset for your AI/ML needs. The data is collected in a GDPR-compliant manner, ensuring the privacy and rights of the creators are respected.

The primary use-cases for this data product are numerous. It is ideal for training machine learning models for image recognition, improving computer vision algorithms, and enhancing AI applications in various industries such as retail, healthcare, and transportation. The diversity of the dataset also means it can be used for more niche applications, such as training AI to recognize specific objects or scenes.

This data product fits into Wirestock's broader data offering as a key resource for AI/ML training. Wirestock is a platform for creators to sell their work, and this dataset is a collection of that work. It represents the breadth and depth of content available on Wirestock, making it a valuable resource for any company working with AI/ML.

The core benefits of this dataset are its volume, diversity, and quality. With 4.5 million files, it provides a vast resource for AI training. The diversity of the dataset, spanning 20 categories, ensures a wide range of images for training purposes. The quality of the images is also high, as they are sourced from creators selling their work on Wirestock.

In terms of how the data is collected, creators upload their work to Wirestock, where it is then sold on various marketplaces. This means the data is sourced directly from creators, ensuring a diverse and unique dataset. The data includes both the images themselves and associated metadata, providing additional context for each image.

The different image categories included in this dataset are Animals/Wildlife, The Arts, Backgrounds/Textures, Beauty/Fashion, Buildings/Landmarks, Business/Finance, Celebrities, Education, Emotions, Food Drinks, Holidays, Industrial, Interiors, Nature Parks/Outdoor, People, Religion, Science, Signs/Symbols, Sports/Recreation, Technology, Transportation, Vintage, Healthcare/Medical, Objects, and Miscellaneous. This wide range of categories ensures a diverse dataset that can cater to a variety of AI/ML applications.
Ultra beauty products dataset
crawlfeeds.com
csv, zip
Updated May 20, 2025
Cite
Crawl Feeds (2025). Ultra beauty products dataset [Dataset]. https://crawlfeeds.com/datasets/ultra-beauty-products-dataset
Available download formats: csv, zip
Dataset updated
May 20, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description

Ultra Beauty Products Dataset offers detailed insights into Ulta Beauty's wide range of beauty products. This dataset includes product URLs, titles, SKUs, high-quality images, pricing, availability, and customer reviews, making it an essential resource for e-commerce analytics, competitor research, and digital marketing.

Leverage this dataset to optimize product listings, analyze market trends, and enhance customer engagement.

For access to more beauty and cosmetics datasets, explore the Beauty and Cosmetics Data Collection and power your data-driven strategies today.

Fields:

url: Product URL on the Ulta Beauty website.

title: Name of the beauty product.

sku: Stock Keeping Unit identifier.

productID: Unique identifier for each product.

main_image: URL of the main product image.

price: Retail price of the product.

currency: Currency used for the price (e.g., USD).

product_variant: Variant of the product (e.g., size, color).

summary: Short description or summary of the product.

raw_summary: Raw, unprocessed summary text of the product.

availability: Availability status (e.g., in stock, out of stock).

primary_category: Main category of the product.

category1: Primary sub-category.

category2: Secondary sub-category (if applicable).

breads: Breadth of product categories (if applicable).

images: Additional product images (URLs).

description: Detailed description of the product.

ingredients: Ingredients used in the product.

raw_ingredients: Raw, unprocessed ingredients list.

details: Additional product details.

raw_details: Raw, unprocessed details.

how_to_use: Instructions on how to use the product.

raw_how_to_use: Raw, unprocessed usage instructions.

avg_rating: Average customer rating (out of 5 stars).

review_count: Number of customer reviews.

reviews: Customer reviews (text).

uniq_id: Unique identifier for the record.

scraped_at: Timestamp when the data was scraped.
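To make the field list more concrete, here is a minimal sketch that parses a few of the fields above from CSV text and filters for in-stock products above a rating threshold. The sample rows are invented for illustration, and the exact value formats (for example, `in stock` as the availability string and plain decimal prices) are assumptions rather than guarantees about the delivered file.

```python
import csv
import io

# Hypothetical rows using a subset of the documented field names.
# All values are made up for illustration.
SAMPLE_ROWS = """\
title,sku,price,currency,availability,avg_rating,review_count
Example Lip Balm,1000001,8.50,USD,in stock,4.6,120
Example Face Serum,1000002,42.00,USD,out of stock,4.8,75
Example Shampoo,1000003,15.00,USD,in stock,3.9,40
"""


def parse_products(csv_text):
    """Parse CSV text and convert the numeric fields to numbers."""
    products = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["price"] = float(row["price"])
        row["avg_rating"] = float(row["avg_rating"])
        row["review_count"] = int(row["review_count"])
        products.append(row)
    return products


def in_stock_top_rated(products, min_rating=4.0):
    """Keep products that are in stock and rated at least min_rating."""
    return [
        p for p in products
        if p["availability"].strip().lower() == "in stock"
        and p["avg_rating"] >= min_rating
    ]


products = parse_products(SAMPLE_ROWS)
top = in_stock_top_rated(products)
print([p["title"] for p in top])
```

In a real file, fields such as `price` or `avg_rating` may be empty or formatted differently, so the numeric conversions would likely need guarding.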

Source:

The dataset is sourced from Ulta Beauty's product catalog.

Usage:

Ideal for analyzing product trends, pricing strategies, customer preferences, and market research in the beauty industry.
German Product Image OCR Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). German Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/german-product-image-ocr-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the German Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the German language.
Dataset Content & Diversity:
This German OCR dataset contains 2,000 images distributed across different types of product front images. The text in these images includes product names, taglines, logos, company names, addresses, product content, etc. The images showcase distinct fonts, writing formats, colors, designs, and layouts.
To keep the dataset diverse and to support robust text recognition models, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the space in each image contains visible German text.
Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.
All images were captured by native German speakers to ensure text quality and to avoid toxic content and PII text. To maintain image quality, all images were taken with the latest iOS and Android mobile devices with cameras above 5 MP. Images in this training dataset are available in both JPEG and HEIC formats.
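Since the images come in two formats, a training pipeline may want to separate them before decoding, for example to route HEIC files through a converter. The sketch below groups file names by extension; the file names and the `group_by_format` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_format(file_names):
    """Group image file names into 'jpeg', 'heic', or 'other' by extension."""
    groups = defaultdict(list)
    for name in file_names:
        suffix = PurePath(name).suffix.lower()
        if suffix in (".jpg", ".jpeg"):
            groups["jpeg"].append(name)
        elif suffix in (".heic", ".heif"):
            groups["heic"].append(name)
        else:
            groups["other"].append(name)
    return dict(groups)


# Hypothetical file names for illustration only.
files = ["de_0001.JPG", "de_0002.heic", "de_0003.jpeg", "metadata.csv"]
groups = group_by_format(files)
print(groups)
```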
Metadata:
Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata like image orientation, country, language, and device information. Each image is renamed to match its metadata entry.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of German text recognition models.
Update & Custom Collection:
We're committed to expanding this dataset by continuously adding more images with the assistance of our native German crowd community.
If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, our crowd community can annotate the images with bounding boxes or transcribe the text in each image to align with your specific project requirements.
License:
This Image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the German language. Your journey to enhanced language understanding and processing starts here.
ScrapeHero Data Cloud - Free and Easy to use
datarade.ai
.json, .csv
Updated Apr 11, 2022
Cite
Scrapehero (2022). ScrapeHero Data Cloud - Free and Easy to use [Dataset]. https://datarade.ai/data-products/scrapehero-data-cloud-free-and-easy-to-use-scrapehero
Available download formats: .json, .csv
Dataset updated
Apr 11, 2022
Dataset provided by
ScrapeHero
Authors
Scrapehero
Area covered
Bhutan, Dominica, Chad, Bahamas, Slovakia, Anguilla, Portugal, Ghana, Niue, Bahrain
Description
The Easiest Way to Collect Data from the Internet
Download anything you see on the internet into spreadsheets within a few clicks using our ready-made web crawlers, or with a few lines of code using our APIs.

We have made it as simple as possible to collect data from websites.

Easy to Use Crawlers

Amazon Product Details and Pricing Scraper: Get product information, pricing, FBA, best seller rank, and much more from Amazon.

Google Maps Search Results: Get details like place name, phone number, address, website, ratings, and open hours from Google Maps or Google Places search results.

Twitter Scraper: Get tweets, Twitter handle, content, number of replies, number of retweets, and more. All you need to provide is a URL to a profile, hashtag, or an advanced search URL from Twitter.

Amazon Product Reviews and Ratings: Get customer reviews for any product on Amazon and get details like product name, brand, reviews and ratings, and more from Amazon.

Google Reviews Scraper: Scrape Google reviews and get details like business or location name, address, review, ratings, and more for businesses and places.

Walmart Product Details & Pricing: Get the product name, pricing, number of ratings, reviews, product images, URL, and other product-related data from Walmart.

Amazon Search Results Scraper: Get product search rank, pricing, availability, best seller rank, and much more from Amazon.

Amazon Best Sellers: Get the bestseller rank, product name, pricing, number of ratings, rating, product images, and more from any Amazon Bestseller List.

Google Search Scraper: Scrape Google search results and get details like search rank, paid and organic results, knowledge graph, related search results, and more.

Walmart Product Reviews & Ratings: Get customer reviews for any product on Walmart.com and get details like product name, brand, reviews, and ratings.

Scrape Emails and Contact Details: Get emails, addresses, contact numbers, and social media links from any website.

Walmart Search Results Scraper: Get product details such as pricing, availability, reviews, ratings, and more from Walmart search results and categories.

Glassdoor Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Glassdoor.

Indeed Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Indeed.

LinkedIn Jobs Scraper (Premium): Scrape job listings on LinkedIn and extract job details such as job title, job description, location, company name, number of reviews, and more.

Redfin Scraper (Premium): Scrape real estate listings from Redfin. Extract property details such as address, price, mortgage, Redfin estimate, broker name, and more.

Yelp Business Details Scraper: Scrape business details such as phone number, address, website, and more from Yelp search and business details pages.

Zillow Scraper (Premium): Scrape real estate listings from Zillow. Extract property details such as address, price, Broker, broker name, and more.

Amazon Product Offers and Third Party Sellers: Get product pricing, delivery details, FBA, seller details, and much more from the Amazon offer listing page.

Realtor Scraper (Premium): Scrape real estate listings from Realtor.com. Extract property details such as address, price, area, broker, and more.

Target Product Details & Pricing: Get product details from search results and category pages such as pricing, availability, rating, reviews, and 20+ data points from Target.

Trulia Scraper (Premium): Scrape real estate listings from Trulia. Extract property details such as address, price, area, mortgage, and more.

Amazon Customer FAQs: Get FAQs for any product on Amazon and get details like the question, answer, answered user name, and more.

Yellow Pages Scraper: Get details like business name, phone number, address, website, ratings, and more from Yellow Pages search results.
Google SERP Data, Web Search Data, Google Images Data | Real-Time API
datarade.ai
.json, .csv
Updated May 17, 2024
Cite
OpenWeb Ninja (2024). Google SERP Data, Web Search Data, Google Images Data | Real-Time API [Dataset]. https://datarade.ai/data-products/openweb-ninja-google-data-google-image-data-google-serp-d-openweb-ninja
Available download formats: .json, .csv
Dataset updated
May 17, 2024
Dataset authored and provided by
OpenWeb Ninja
Area covered
Panama, Ireland, South Georgia and the South Sandwich Islands, Grenada, Barbados, Uganda, Burundi, Tokelau, Virgin Islands (U.S.), Uruguay
Description
OpenWeb Ninja's Google Images Data (Google SERP Data) API provides real-time image search capabilities for images sourced from all public sources on the web.

The API enables you to search and access more than 100 billion images from across the web including advanced filtering capabilities as supported by Google Advanced Image Search. The API provides Google Images Data (Google SERP Data) including details such as image URL, title, size information, thumbnail, source information, and more data points. The API supports advanced filtering and options such as file type, image color, usage rights, creation time, and more. In addition, any Advanced Google Search operators can be used with the API.
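As an illustration of the search-operator support mentioned above, the sketch below composes a query string using common Google operators (`site:`, `filetype:`, and `-` for exclusion). The `build_query` helper is hypothetical, and the sketch deliberately stops at building the string: how the query is actually passed to the API (endpoint, parameter names, authentication) is not described in this listing and is not assumed here.

```python
def build_query(terms, site=None, filetype=None, exclude=()):
    """Compose a search query string with optional Google search operators."""
    parts = [terms]
    if site:
        parts.append(f"site:{site}")          # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict results by file type
    # Prefix each excluded word with '-' to filter it out of the results.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


query = build_query(
    "ceramic mug",
    site="example.com",
    filetype="png",
    exclude=["logo"],
)
print(query)
```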

OpenWeb Ninja's Google Images Data & Google SERP Data API common use cases:

Creative Media Production: Enhance digital content with a vast array of real-time images, ensuring engaging and brand-aligned visuals for blogs, social media, and advertising.

AI Model Enhancement: Train and refine AI models with diverse, annotated images, improving object recognition and image classification accuracy.

Trend Analysis: Identify emerging market trends and consumer preferences through real-time visual data, enabling proactive business decisions.

Innovative Product Design: Inspire product innovation by exploring current design trends and competitor products, ensuring market-relevant offerings.

Advanced Search Optimization: Improve search engines and applications with enriched image datasets, providing users with accurate, relevant, and visually appealing search results.

OpenWeb Ninja's Annotated Imagery Data & Google SERP Data Stats & Capabilities:

100B+ Images: Access an extensive database of over 100 billion images.

Images Data from all Public Sources (Google SERP Data): Benefit from a comprehensive aggregation of image data from various public websites, ensuring a wide range of sources and perspectives.

Extensive Search and Filtering Capabilities: Utilize advanced search operators and filters to refine image searches by file type, color, usage rights, creation time, and more, making it easy to find exactly what you need.

Rich Data Points: Each image comes with more than 10 data points, including URL, title (annotation), size information, thumbnail, and source information, providing a detailed context for each image.
Finnish Product Image OCR Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). Finnish Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/finnish-product-image-ocr-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the Finnish Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Finnish language.
Dataset Content & Diversity:
This Finnish OCR dataset contains 2,000 images distributed across different types of product front images. The text in these images includes product names, taglines, logos, company names, addresses, product content, etc. The images showcase distinct fonts, writing formats, colors, designs, and layouts.
To keep the dataset diverse and to support robust text recognition models, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the space in each image contains visible Finnish text.
Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.
All images were captured by native Finnish speakers to ensure text quality and to avoid toxic content and PII text. To maintain image quality, all images were taken with the latest iOS and Android mobile devices with cameras above 5 MP. Images in this training dataset are available in both JPEG and HEIC formats.
Metadata:
Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata like image orientation, country, language, and device information. Each image is renamed to match its metadata entry.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Finnish text recognition models.
Update & Custom Collection:
We're committed to expanding this dataset by continuously adding more images with the assistance of our native Finnish crowd community.
If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, our crowd community can annotate the images with bounding boxes or transcribe the text in each image to align with your specific project requirements.
License:
This Image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Finnish language. Your journey to enhanced language understanding and processing starts here.
340K+ Jewelry Images | AI Training Data | Object Detection Data | Annotated...
datarade.ai
Cite
Data Seeds, 340K+ Jewelry Images | AI Training Data | Object Detection Data | Annotated imagery data | Global Coverage [Dataset]. https://datarade.ai/data-products/200k-jewelry-images-ai-training-data-object-detection-da-data-seeds
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset authored and provided by
Data Seeds
Area covered
Madagascar, Swaziland, Bahrain, Denmark, Turkey, Bhutan, Venezuela (Bolivarian Republic of), Tokelau, Equatorial Guinea, Saint Vincent and the Grenadines
Description
This dataset features over 340,000 high-quality images of jewelry sourced from photographers worldwide. Designed to support AI and machine learning applications, it provides a richly detailed and carefully annotated collection of jewelry imagery across styles, materials, and contexts.

Key Features:

1. Comprehensive Metadata: the dataset includes full EXIF data, detailing camera settings such as aperture, ISO, shutter speed, and focal length. Each image is pre-annotated with object and scene detection metadata, including jewelry type, material, and context—ideal for tasks like object detection, style classification, and fine-grained visual analysis. Popularity metrics, derived from engagement on our proprietary platform, are also included.

2. Unique Sourcing Capabilities: the images are collected through a proprietary gamified platform for photographers. Competitions focused on jewelry photography ensure high-quality, well-lit, and visually appealing submissions. Custom datasets can be sourced on-demand within 72 hours to meet specific requirements such as jewelry category (rings, necklaces, bracelets, etc.), material type, or presentation style (worn vs. product shots).

3. Global Diversity: photographs have been submitted by contributors in over 100 countries, offering an extensive range of cultural styles, design traditions, and jewelry aesthetics. The dataset includes handcrafted and luxury items, traditional and contemporary pieces, and representations across diverse ethnic and regional fashions.

4. High-Quality Imagery: the dataset includes high-resolution images suitable for detailed product analysis. Both studio-lit commercial shots and lifestyle/editorial photography are included, allowing models to learn from various presentation styles and settings.

5. Popularity Scores: each image is assigned a popularity score based on its performance in GuruShots competitions. This metric offers insight into aesthetic appeal and global consumer preferences, aiding AI models focused on trend analysis or user engagement.

6. AI-Ready Design: this dataset is optimized for training AI in jewelry classification, attribute tagging, visual search, and recommendation systems. It integrates easily into retail AI workflows and supports model development for e-commerce and fashion platforms.

7. Licensing & Compliance: the dataset complies fully with data privacy and IP standards, offering transparent licensing for commercial and academic purposes.

Use Cases:

1. Training AI for visual search and recommendation engines in jewelry e-commerce.
2. Enhancing product recognition, classification, and tagging systems.
3. Powering AR/VR applications for virtual try-ons and 3D visualization.
4. Supporting fashion analytics, trend forecasting, and cultural design research.

This dataset offers a diverse, high-quality resource for training AI and ML models in the jewelry and fashion space. Customizations are available to meet specific product or market needs. Contact us to learn more!
Amazon Beauty Products Dataset with Ingredients (47K Records)
crawlfeeds.com
csv, zip
Updated Jun 28, 2025
Cite
Crawl Feeds (2025). Amazon Beauty Products Dataset with Ingredients (47K Records) [Dataset]. https://crawlfeeds.com/datasets/amazon-beauty-products-dataset-with-ingredients-47k-records
Available download formats: csv, zip
Dataset updated
Jun 28, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Gain insights into Amazon’s beauty and personal care market with this comprehensive Amazon Beauty Products Dataset. Covering 47,000 records across skincare, haircare, and makeup, this dataset provides full ingredient lists, product descriptions, pricing, and availability. Ideal for researchers and businesses focused on ingredient transparency, beauty trend analysis, and competitive market insights. Perfect for applications in ingredient research, product development, and e-commerce analysis.

Access a rich Amazon Beauty & Cosmetics dataset with more than 200,000 product records, including detailed ingredients.
Explore more on our Beauty & Cosmetics Data page or view the full Amazon Beauty Dataset.

A Walmart product dataset featuring detailed ingredient information across categories like beauty, food, personal care, and more is also available.

The dataset includes the following fields:

ASIN: Unique Amazon product identifier.

Product Name and Description: Full titles and descriptions of each product.

Price and Availability: Current pricing and stock status.

Categories: Product type classification (e.g., skincare, haircare, makeup).

Ingredients: Complete ingredient lists, ensuring transparency about product composition.

Images: High-quality product images.

Brand and Manufacturer Information: Details of the brand and manufacturer.

Customer Ratings and Reviews: User-generated content for understanding product popularity and performance.

This dataset is invaluable for:

Ingredient Analysis: Understanding popular ingredients in beauty products.

Market Research: Analyzing trends in beauty products, such as ingredient types and product categories.

Competitive Analysis: Assessing product offerings by brand, price, and ingredients.
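The ingredient analysis use case can be sketched as a simple frequency count: for each record, split the ingredient list and count how many products contain each ingredient. The records below are invented, and the assumption that the `Ingredients` field is a comma-separated string is for illustration only; real ingredient lists may use other separators or nesting.

```python
from collections import Counter

# Hypothetical records; ASINs and ingredient lists are made up.
SAMPLE_RECORDS = [
    {"ASIN": "B000000001", "Ingredients": "Water, Glycerin, Niacinamide"},
    {"ASIN": "B000000002", "Ingredients": "Water, Glycerin, Fragrance"},
    {"ASIN": "B000000003", "Ingredients": "water , Shea Butter"},
]


def ingredient_counts(records):
    """Count how many products list each ingredient (case-insensitive)."""
    counts = Counter()
    for record in records:
        # Use a set so an ingredient repeated within one product counts once.
        names = {
            part.strip().lower()
            for part in record["Ingredients"].split(",")
            if part.strip()
        }
        counts.update(names)
    return counts


counts = ingredient_counts(SAMPLE_RECORDS)
print(counts.most_common(2))
```

Counting per product rather than per mention keeps a long label that repeats an ingredient from inflating its popularity.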

Whether you’re focused on skincare, haircare, makeup, or other beauty categories, this dataset provides in-depth information for deep analysis. For any custom requirements or additional data needs, please feel free to reach out.
Spanish Product Image OCR Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). Spanish Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/spanish-product-image-ocr-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the Spanish Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Spanish language.
Dataset Content & Diversity:
This Spanish OCR dataset contains 2,000 images distributed across different types of product front images. The text in these images includes product names, taglines, logos, company names, addresses, product content, etc. The images showcase distinct fonts, writing formats, colors, designs, and layouts.
To keep the dataset diverse and to support robust text recognition models, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the space in each image contains visible Spanish text.
Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.
All images were captured by native Spanish speakers to ensure text quality and to avoid toxic content and PII text. To maintain image quality, all images were taken with the latest iOS and Android mobile devices with cameras above 5 MP. Images in this training dataset are available in both JPEG and HEIC formats.
Metadata:
Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata such as image orientation, country, language, and device information. Each image file is named to match its metadata record.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Spanish text recognition models.
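As a rough illustration of how such a metadata file might be used, the sketch below parses a small CSV and tallies images by orientation. The column names (`file_name`, `orientation`, `country`, `language`, `device`) and the sample rows are made up for the example; the actual CSV header is not documented in this listing and may differ.

```python
import csv
import io
from collections import Counter

# Made-up sample rows with hypothetical column names; the real
# metadata header is not documented in the listing and may differ.
SAMPLE_CSV = """file_name,orientation,country,language,device
img_0001.jpg,portrait,ES,Spanish,iPhone 12
img_0002.heic,landscape,MX,Spanish,Pixel 6
img_0003.jpg,portrait,ES,Spanish,Galaxy S21
"""


def load_metadata(text):
    """Parse metadata CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by(rows, column):
    """Count how many images share each value of the given column."""
    return Counter(row[column] for row in rows)


rows = load_metadata(SAMPLE_CSV)
orientation_counts = count_by(rows, "orientation")

# Select the file names of portrait images, e.g. for a training split.
portrait_files = [r["file_name"] for r in rows if r["orientation"] == "portrait"]
```

The same `count_by` helper could be pointed at the device or country column to check how balanced a delivery is before training.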
Update & Custom Collection:
We're committed to expanding this dataset by continuously adding more images with the assistance of our native Spanish crowd community.
If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, we can annotate or label the images with bounding boxes, or transcribe the text in the images, to align with your specific project requirements using our crowd community.
License:
This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Spanish language. Your journey to enhanced language understanding and processing starts here.
Stock Images Market Analysis, Size, and Forecast 2025-2029: North America...
technavio.com
Updated Dec 15, 2024
Cite
Technavio (2024). Stock Images Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada), Europe (Germany, UK, Italy, France), APAC (China, India, Japan), South America (Brazil), Middle East & Africa [Dataset]. https://www.technavio.com/report/stock-images-market-industry-analysis
Explore at:
Dataset updated
Dec 15, 2024
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2021 - 2025
Area covered
Global
Description

Stock Images Market Size 2025-2029

The stock images market size is forecast to increase by USD 1.28 billion, at a CAGR of 5.3% between 2024 and 2029. The market is experiencing significant growth, driven by the increasing popularity of visual content in digital and social media marketing.

Major Market Trends & Insights

North America dominated the market and accounted for a 43% share in 2023. The market is also expected to grow significantly in Europe over the forecast period. By application, the editorial segment led the market and was valued at USD 2.14 billion in 2023. By product, the still images segment accounted for the largest share of market revenue in 2023.

Market Size & Forecast

Market Opportunities: USD 4.34 Billion
Future Opportunities: USD 1.28 Billion
CAGR (2024-2029): 5.3%
North America: Largest market in 2023
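The forecast figures can be cross-checked with a short calculation. The sketch below assumes the USD 1.28 billion increase accrues over five compounding years (2024 to 2029) at the stated 5.3% CAGR, and solves for the starting market size that this implies. The function name and the five-year assumption are illustrative and are not taken from the report.

```python
def implied_base(increment, cagr, years):
    """Starting value that grows by `increment` over `years` at rate `cagr`.

    Solves base * ((1 + cagr) ** years - 1) = increment for base.
    """
    growth = (1 + cagr) ** years - 1
    return increment / growth


# Assumption: five compounding years between 2024 and 2029.
base_2024 = implied_base(increment=1.28, cagr=0.053, years=5)
size_2029 = base_2024 + 1.28

print(f"Implied 2024 size: USD {base_2024:.2f} billion")
print(f"Implied 2029 size: USD {size_2029:.2f} billion")
```

Under these assumptions the implied 2024 base is about USD 4.34 billion, which is close to the "Market Opportunities" figure in the summary, although the listing does not say how that figure was derived.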

Businesses are investing heavily in related portfolios to enhance their online presence and engage customers effectively. However, this trend comes with challenges. Profit margins are declining due to the increasing competition and availability of free or low-cost stock images. Companies must navigate this competitive landscape by offering high-quality, unique, and exclusive images to differentiate themselves and maintain profitability. To capitalize on this market, businesses should focus on creating a strong brand identity through visually appealing content and leveraging advanced image search technologies to cater to specific customer needs.

Additionally, exploring niche markets and offering customized solutions can provide opportunities for growth and differentiation. Overall, the market presents both opportunities and challenges, requiring strategic planning and innovative approaches to succeed.

What will be the Size of the Stock Images Market during the forecast period?

Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.

The market continues to evolve, driven by advancements in technology and shifting consumer preferences. Image format conversion, a key trend, enables businesses to adapt their visual content for various platforms and devices. Image database solutions, equipped with semantic image search, facilitate efficient content discovery. Digital asset management systems, enhanced by AI-powered image tagging and metadata extraction, streamline content organization and access. Image manipulation detection ensures authenticity and trust in visual content. Metadata tagging systems and photographic licensing models enable effective rights management. High-resolution imaging and image editing software cater to the demand for visually appealing content. Large-scale image storage solutions address the increasing volume of visual data. The commercial segment is the second-largest application segment and was valued at USD 1.99 billion in 2023.

Image resolution scaling, panoramic image stitching, and image compression algorithms optimize content for efficient transmission and display. Visual search technology and 360-degree image creation offer innovative ways to engage consumers. Image enhancement filters, image recognition software, vector graphics optimization, and image archival systems ensure content remains relevant and accessible. Industry growth is expected to reach 12% annually, reflecting the continuous demand for visual content in various sectors. For instance, a leading e-commerce company reported a 25% increase in sales after implementing AI-driven image tagging and metadata management. This underscores the importance of optimizing visual content for discoverability and accessibility.

How is this Stock Images Industry segmented?

The stock images industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

Application: Editorial, Commercial
Product: Still images, Footage
Type: Free, Paid
Geography: North America (US, Canada); Europe (France, Germany, Italy, UK); APAC (China, India, Japan, South Korea); Rest of World (ROW)

By Application Insights

The editorial segment is estimated to witness significant growth during the forecast period. The segment was valued at USD 2.14 billion in 2023 and remained the largest segment, at a CAGR of 4.01%.

The market is witnessing significant growth, driven by the editorial segment's increasing demand. In this sector, stock images serve primarily to enhance storytelling in publishing and media. These images, designated for editorial use, are restricted to non-commercial applications. Publishing houses, which produce books, newspapers, and mag
Ecommerce Data - Product data, Seller data, Market data, Pricing data|...
datarade.ai
Updated Jan 29, 2024
Cite
APISCRAPY (2024). Ecommerce Data - Product data, Seller data, Market data, Pricing data| Scrape all publicly available eCommerce data| 50% Cost Saving | Free Sample [Dataset]. https://datarade.ai/data-products/apiscrapy-mobile-app-data-api-scraping-service-app-intel-apiscrapy
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Jan 29, 2024
Dataset authored and provided by
APISCRAPY
Area covered
Ukraine, Switzerland, Bosnia and Herzegovina, United States of America, Isle of Man, China, Åland Islands, Norway, Malta, Spain
Description
Note: Only publicly available data can be processed.

In today's ever-evolving Ecommerce landscape, success hinges on the ability to harness the power of data. APISCRAPY is your strategic ally, dedicated to providing a comprehensive solution for extracting critical Ecommerce data, including Ecommerce market data, Ecommerce product data, and Ecommerce datasets. With the Ecommerce arena being more competitive than ever, having a data-driven approach is no longer a luxury but a necessity.

APISCRAPY's forte lies in its ability to unearth valuable Ecommerce market data. We recognize that understanding the market dynamics, trends, and fluctuations is essential for making informed decisions.

APISCRAPY's AI-driven ecommerce data scraping service presents several advantages for individuals and businesses seeking comprehensive insights into the ecommerce market. Here are key benefits associated with their advanced data extraction technology:

Ecommerce Product Data: APISCRAPY's AI-driven approach ensures the extraction of detailed Ecommerce Product Data, including product specifications, images, and pricing information. This comprehensive data is valuable for market analysis and strategic decision-making.

Data Customization: APISCRAPY enables users to customize the data extraction process, ensuring that the extracted ecommerce data aligns precisely with their informational needs. This customization option adds versatility to the service.

Efficient Data Extraction: APISCRAPY's technology streamlines the data extraction process, saving users time and effort. The efficiency of the extraction workflow ensures that users can obtain relevant ecommerce data swiftly and consistently.

Real-time Insights: Businesses can gain real-time insights into the dynamic Ecommerce Market by accessing rapidly extracted data. This real-time information is crucial for staying ahead of market trends and making timely adjustments to business strategies.

Scalability: The technology behind APISCRAPY allows scalable extraction of ecommerce data from various sources, accommodating evolving data needs and handling increased volumes effortlessly.

Beyond the broader market, a deeper dive into specific products can provide invaluable insights. APISCRAPY excels in collecting Ecommerce product data, enabling businesses to analyze product performance, pricing strategies, and customer reviews.

To navigate the complexities of the Ecommerce world, you need access to robust datasets. APISCRAPY's commitment to providing comprehensive Ecommerce datasets ensures businesses have the raw materials required for effective decision-making.

Our primary focus is on Amazon data, offering businesses a wealth of information to optimize their Amazon presence. By doing so, we empower our clients to refine their strategies, enhance their products, and make data-backed decisions.

[Tags: Ecommerce data, Ecommerce Data Sample, Ecommerce Product Data, Ecommerce Datasets, Ecommerce market data, Ecommerce Market Datasets, Ecommerce Sales data, Ecommerce Data API, Amazon Ecommerce API, Ecommerce scraper, Ecommerce Web Scraping, Ecommerce Data Extraction, Ecommerce Crawler, Ecommerce data scraping, Amazon Data, Ecommerce web data]
Myntra products dataset with images
crawlfeeds.com
json, zip
Updated May 29, 2025
Cite
Crawl Feeds (2025). Myntra products dataset with images [Dataset]. https://crawlfeeds.com/datasets/myntra-products-dataset-with-images
Explore at:
Available download formats: zip, json
Dataset updated
May 29, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Myntra is a major Indian fashion e-commerce company. The Crawl Feeds team extracted 110K+ records, along with images, for research and analysis purposes.

Total images count: 120K+

The dataset includes a JSON file along with images in JPG format.

This clothing image dataset uses a product schema that includes the path to each image file, which makes it easy to gather the images related to a product.
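A minimal sketch of how a product record could be linked to its image file is shown below. The field names `title` and `image_path`, the sample records, and the `myntra_dataset` folder name are all hypothetical; the actual schema is not published in this listing.

```python
import json
from pathlib import Path

# Made-up records with hypothetical field names; the real product
# schema is not shown in the listing and may differ.
SAMPLE_JSON = """[
  {"title": "Cotton Kurta", "image_path": "images/0001.jpg"},
  {"title": "Denim Jacket", "image_path": "images/0002.jpg"}
]"""


def resolve_image_paths(records, root):
    """Map each product title to the full path of its image file."""
    root = Path(root)
    return {rec["title"]: root / rec["image_path"] for rec in records}


records = json.loads(SAMPLE_JSON)
# "myntra_dataset" stands in for wherever the archive was extracted.
paths = resolve_image_paths(records, "myntra_dataset")

for title, path in paths.items():
    print(title, "->", path)
```

Checking `path.exists()` for each entry would be a simple way to spot records whose images are missing from a download.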
Pinterest Fashion Compatibility
cseweb.ucsd.edu
beta.data.urbandatacentre.ca
json
Cite
UCSD CSE Research Project, Pinterest Fashion Compatibility [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
This dataset contains images (scenes) containing fashion products, which are labeled with bounding boxes and links to the corresponding products.

Metadata includes

product IDs

bounding boxes

Basic Statistics:

Scenes: 47,739

Products: 38,111

Scene-Product Pairs: 93,274
Bahasa Product Image OCR Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Bahasa Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/bahasa-product-image-ocr-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the Bahasa Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Bahasa language.
Dataset Contents & Diversity:
This Bahasa OCR dataset contains a total of 2,000 images, distributed across different types of product front images. The text in these images includes product names, taglines, logos, company names, addresses, product content, etc. The images showcase distinct fonts, writing formats, colors, designs, and layouts.
To keep the dataset diverse and to support robust text recognition models, fewer than five unique images are allowed from any single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that visible Bahasa text covers at least 80% of each image.
Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.
All images were captured by native Bahasa speakers to ensure text quality and to avoid toxic content and PII. To maintain image quality, they were taken with recent iOS and Android mobile devices with cameras above 5 MP. The images are available in both JPEG and HEIC formats.
Metadata:
Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata such as image orientation, country, language, and device information. Each image file is named to match its metadata record.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Bahasa text recognition models.
Update & Custom Collection:
We're committed to expanding this dataset by continuously adding more images with the assistance of our native Bahasa crowd community.
If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, we can annotate or label the images with bounding boxes, or transcribe the text in the images, to align with your specific project requirements using our crowd community.
License:
This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Bahasa language. Your journey to enhanced language understanding and processing starts here.
Selfie photo Dataset | 10M+ images | Global Coverage | Face Detection |...
datarade.ai
.jpeg, .jpg, .png
Updated Jul 25, 2025
Cite
FileMarket (2025). Selfie photo Dataset | 10M+ images | Global Coverage | Face Detection | Computer vision Data [Dataset]. https://datarade.ai/data-products/selfie-photo-dataset-10m-images-global-coverage-face-d-filemarket
Explore at:
Available download formats: .jpeg, .jpg, .png
Dataset updated
Jul 25, 2025
Dataset authored and provided by
FileMarket
Area covered
Turkmenistan, Fiji, Nauru, Nigeria, Myanmar, Christmas Island, Sweden, French Polynesia, British Indian Ocean Territory, Guernsey
Description
Total Users: 10,229,822
Total Pictures: 10M+ (mostly 1 per ID)

Gender: Male 60%, Female 40%

Ethnicity: Asian 9%, African Descent 13%, East Indian 3%, Latino Hispanic 28%, Caucasian 47%

Age Group: 0-17 3%, 18-24 62%, 25-34 21%, 35-44 10%, 45-54 3%, 55+ 1%

Top Phone Models: iPhone 6s 9%, iPhone XR 6%, iPhone 6 6%, iPhone 7 (US/CDMA) 6%, iPhone 11 5%, iPhone 8 (US/CDMA) 4% (141 devices in total)

Top Countries: US 48.84%, GB 10.57%, CA 4.26%, AU 3.48%, FR 2.80%, SA 2.17% (131 countries in total)

Average resolution: 576×1024 px.

All photos are collected with the consent of users.
Polish Product Image OCR Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Polish Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/polish-product-image-ocr-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the Polish Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Polish language.
Dataset Contents & Diversity:
This Polish OCR dataset contains a total of 2,000 images, distributed across different types of product front images. The text in these images includes product names, taglines, logos, company names, addresses, product content, etc. The images showcase distinct fonts, writing formats, colors, designs, and layouts.
To keep the dataset diverse and to support robust text recognition models, fewer than five unique images are allowed from any single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that visible Polish text covers at least 80% of each image.
Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.
All images were captured by native Polish speakers to ensure text quality and to avoid toxic content and PII. To maintain image quality, they were taken with recent iOS and Android mobile devices with cameras above 5 MP. The images are available in both JPEG and HEIC formats.
Metadata:
Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata such as image orientation, country, language, and device information. Each image file is named to match its metadata record.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Polish text recognition models.
Update & Custom Collection:
We're committed to expanding this dataset by continuously adding more images with the assistance of our native Polish crowd community.
If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, we can annotate or label the images with bounding boxes, or transcribe the text in the images, to align with your specific project requirements using our crowd community.
License:
This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Polish language. Your journey to enhanced language understanding and processing starts here.
600K+ Household Object Images | AI Training Data | Object Detection Data |...
datarade.ai
Cite
Data Seeds, 600K+ Household Object Images | AI Training Data | Object Detection Data | Annotated imagery data | Global Coverage [Dataset]. https://datarade.ai/data-products/500k-household-object-images-ai-training-data-object-det-data-seeds
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset authored and provided by
Data Seeds
Area covered
Serbia, Austria, Ecuador, United Republic of, Brunei Darussalam, Congo, New Caledonia, Ukraine, Saint Kitts and Nevis, Kiribati
Description
This dataset features over 600,000 high-quality images of household objects sourced from photographers worldwide. Designed to support AI and machine learning applications, it offers an extensively annotated and highly diverse collection of everyday indoor items across cultural and functional contexts.

Key Features:

1. Comprehensive Metadata: the dataset includes full EXIF data such as aperture, ISO, shutter speed, and focal length. Each image is annotated with object labels, room context, material types, and functional categories, making it well suited to training models in object detection, classification, and scene understanding. Popularity metrics based on platform engagement are also included.

2. Unique Sourcing Capabilities: images are gathered through a proprietary gamified platform featuring competitions focused on home environments and still life. This ensures a rich flow of authentic, high-quality submissions. Custom datasets can be created on demand within 72 hours, targeting specific object categories, use cases (e.g., kitchenware, electronics, decor), or room types.

3. Global Diversity: contributions from over 100 countries showcase household items from a wide range of cultures, economic settings, and design aesthetics. The dataset includes everything from modern appliances and utensils to traditional tools and furnishings, captured in kitchens, bedrooms, bathrooms, living rooms, and utility spaces.

4. High-Quality Imagery: includes images from standard to ultra-high-definition, covering both staged product-like photos and natural usage contexts. This variety supports robust training for real-world applications in cluttered or dynamic environments.

5. Popularity Scores: each image has a popularity score based on its performance in GuruShots competitions. These scores provide valuable input for training models focused on product appeal, consumer trend detection, or aesthetic evaluation.

6. AI-Ready Design: optimized for use in smart home applications, inventory systems, assistive technologies, and robotics. Fully compatible with major machine learning frameworks and annotation workflows.

7. Licensing & Compliance: all data is compliant with global privacy and content use regulations, with transparent licensing for both commercial and academic applications.

Use Cases:

1. Training AI for home inventory and recognition in smart devices and AR tools.
2. Powering assistive technologies for accessibility and elder care.
3. Enhancing e-commerce recommendation and visual search systems.
4. Supporting robotics for home navigation, object grasping, and task automation.

This dataset provides a comprehensive, high-quality resource for training AI across smart living, retail, and assistive domains. Custom requests are welcome. Contact us to learn more!
Product Exchange/Bartering Data
cseweb.ucsd.edu
json
Cite
UCSD CSE Research Project, Product Exchange/Bartering Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
These datasets contain peer-to-peer trades from various recommendation platforms.

Metadata includes

peer-to-peer trades

have and want lists

image data (tradesy)
Images Dataset
brightdata.com
.json, .csv, .xlsx
Updated May 8, 2024
Cite
Bright Data (2024). Images Dataset [Dataset]. https://brightdata.com/products/datasets/image
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
May 8, 2024
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
We'll tailor an Images dataset to meet your unique needs, encompassing image titles, tags, categories, photographer information, download metrics, licensing details, and other pertinent metrics.

Leverage our Images datasets for diverse applications to bolster strategic planning and market analysis. Scrutinizing these datasets enables organizations to understand trends in visual media consumption, facilitating nuanced content creation and marketing strategies. Customize your access to the entire dataset or specific subsets as per your business requisites.

Popular use cases involve using engagement insights to optimize visual content strategies, enhancing decision-making through targeted content segmentation based on trends and usage patterns, and identifying and forecasting trends in photography and digital media to stay ahead in the competitive landscape.

