License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for CommonCatalog CC-BY
This dataset is a large collection of high-resolution Creative Commons images (composed of different licenses; see Table 1 in the paper's Appendix), collected in 2014 from users of Yahoo Flickr. The dataset contains images of up to 4K resolution, making this one of the highest-resolution captioned image datasets.
Dataset Details
Dataset Description
We provide synthetic captions for approximately 100 million high… See the full description on the dataset page: https://huggingface.co/datasets/common-canvas/commoncatalog-cc-by.
License: Community Data License Agreement Sharing 1.0, https://cdla.io/sharing-1-0/
The dataset is a captivating ensemble of images sourced from two distinct channels: web scraping and AI-generated content. The content covers many subjects; however, special emphasis was placed on these topics: people, animals, portraits, scenery, and psychedelics.
Key Features:
Web-Scraped Images: These images are harvested from various online sources across the web. Spanning landscapes, paintings, psychedelic imagery, and portraits, the web-scraped images offer a glimpse into the vast spectrum of digital imagery available online.
Projects and Applications:
Image Classification and Recognition: Researchers and developers can leverage the dataset to train machine learning models for image classification and recognition tasks. By incorporating both web-scraped and AI-generated images, models can learn to identify and categorize objects, scenes, and concepts across diverse domains with greater accuracy and generalization.
Artistic Exploration and Creative Synthesis: Artists, designers, and creative enthusiasts can draw inspiration from the dataset to explore new avenues of artistic expression and experimentation. They can use AI-generated imagery as a canvas for artistic reinterpretation, blending traditional techniques with computational aesthetics to produce captivating artworks and multimedia installations.
Data Visualization and Exploratory Analysis: Data scientists and researchers can analyze the dataset to uncover insights into visual trends, patterns, and correlations.
Have fun!
License: CC0 1.0 Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This data was provided by the website https://place.thatguyalex.com/ and pulled using the code available at https://github.com/ProstoSanja/place-2022.
From Wikipedia: On March 28, 2022, a reboot of Place was announced. It began on April 1, 2022, and lasted for four days. Unlike in 2017, individual subreddits immediately began to coordinate pixel art, in large part due to this announcement and the increased number of users on Reddit between the two experiments. Different communities collaborated and formed alliances through Discord as well as subreddits. Large streamers on Twitch also participated by instructing thousands of their viewers to place their logos and symbols. In response to the project's popularity, Reddit doubled the canvas's size and expanded the color palette on days 2 and 3. On the fourth day, the palette was reduced to white only, with which the entire canvas was blanked.
The dataset contains PNG snapshots of the r/place canvas taken throughout the four days it was online. The current version is final_v1, provided by https://place.thatguyalex.com/.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. Such a test cannot currently be completed successfully by computer systems; only humans can pass it. It is applied in several contexts to distinguish machines from humans, and the most common kind found on websites is the text-based CAPTCHA. A CAPTCHA is made up of a series of letters or numbers linked together in a certain order, with the image distorted by random lines, blocks, grids, rotations, and other sorts of noise.

Because the majority of the characters in these protected CAPTCHA scripts are English, it is difficult for rural residents who speak only their local languages to pass the test. Machine identification of Devanagari characters is significantly more challenging than of English characters and numeral-based CAPTCHAs, due to their higher character complexity. The vast majority of official Indian websites provide content exclusively in Devanagari; regretfully, these websites do not employ CAPTCHAs in Devanagari. Because of this, we have developed a brand-new text-based CAPTCHA using Devanagari script.

A drawing canvas was created using Python. This canvas code was distributed to more than one hundred (100+) native Devanagari speakers of all ages, including both left- and right-handed computer users. Each user writes 440 characters (44 characters multiplied by 10) on the canvas and saves them on their computer; all user data is then gathered and compiled. The characters are drawn in black on a white background, and the absence of noise in the image is a benefit of using the canvas. The final data set contains a total of 44,000 digitized images: 10,000 numerals, 4,000 vowels, and 30,000 consonants.
This dataset was published for research scholars for recognition and other applications on Mendeley (Mendeley Data, DOI: 10.17632/yb9rmfjzc2.1, dated October 5, 2022) and the IEEE DataPort (DOI: 10.21227/9zpv-3194, dated October 6, 2022). We designed our own algorithm to generate the handwritten Devanagari CAPTCHAs, using the handwritten character set created above. General CAPTCHA-generation principles are used to add noise to the images via digital image processing techniques. The size of each CAPTCHA image is 250 x 90 pixels. Three (03) types of character sets are used: handwritten alphabets, handwritten digits, and handwritten alphabets and digits combined. At 9 classes x 10,000 images each, a Devanagari CAPTCHA data set of 90,000 images was created using Python. All images are stored in CSV format for easy use by researchers. The noise makes the CAPTCHA images harder to recognize and not easily broken: passing a test that requires identifying Devanagari alphabets is difficult. The dataset is beneficial to researchers investigating CAPTCHA recognition in this area, and is helpful for designing OCR systems to recognize, and attempt to break, Devanagari CAPTCHAs. If you are able to successfully bypass the CAPTCHA, please acknowledge us by sending an email to sanjayepate@gmail.com.
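The noise step can be sketched in plain Python. This is only an illustrative stand-in (the published pipeline applies digital image processing with lines, blocks, grids, and rotations), with the canvas modeled as a nested list of grayscale values:

```python
import random

def add_line_noise(canvas, n_lines=5, value=0, seed=None):
    """Overlay n random straight line segments (via simple integer
    interpolation) onto a 2D grayscale canvas. A minimal stand-in for
    the noise step; the real pipeline also uses blocks, grids, and
    rotations."""
    rng = random.Random(seed)
    h, w = len(canvas), len(canvas[0])
    for _ in range(n_lines):
        x0, y0 = rng.randrange(w), rng.randrange(h)
        x1, y1 = rng.randrange(w), rng.randrange(h)
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for t in range(steps + 1):
            x = x0 + (x1 - x0) * t // steps
            y = y0 + (y1 - y0) * t // steps
            canvas[y][x] = value  # draw the noise pixel
    return canvas
```

For a 250 x 90 image as described above, the canvas would be 90 rows of 250 values.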
License: CC0 1.0 Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Dive into a realm where pixels awaken with magic! 🚀✨ This captivating dataset unveils the enchanting journey of a Deep Convolutional Generative Adversarial Network (DCGAN) as it breathes life into mesmerizing images on the MNIST canvas.
🌌 GIF Chronicles: Behold time capsules of GIFs, each a magical journey through epochs. Watch as the pixels evolve, dance, and transform, revealing the growth and artistry of our wizardly model.
📸 Snapshot Diaries: Explore meticulously collected snapshots, capturing the essence of every 1000 steps across 10 enchanting epochs. Each image tells a tale of the model's evolution, from its tentative steps to the grandeur of mastery.
🧙‍♂️ Genesis Moments: Step back to the humble beginnings, where the random generator forged base generations, setting the stage for the grand symphony of creativity.
🎩 Crowning Achievements: Marvel at the final generator's crowning glory—the synthesis of captivating, realistic images. Each pixel is a stroke of genius, a testament to the magic of PyTorch and the artistry of our DCGAN.
May your coding be enchanting, and your images spellbinding! 🪄🌈
May the Pixels Be Ever in Your Favor! 🖼💫
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Zero-Shot Sketch-based Inter-Modal Object Retrieval Scheme for Remote Sensing Images
With the advancement in sensor technology, huge amounts of data are being collected from various satellites, and the task of target-based data retrieval and acquisition has become exceedingly challenging. Existing satellites essentially scan vast overlapping regions of the Earth using various sensing techniques, such as multi-spectral, hyperspectral, Synthetic Aperture Radar (SAR), video, and compressed sensing. With increasing complexity and different sensing techniques at our disposal, it is of primary interest to design efficient algorithms that retrieve data across multiple modalities, given the complementary information captured by different sensors. This type of problem is referred to as inter-modal data retrieval. In remote sensing (RS), there are primarily two important problem types: land-cover classification and object detection. In this work, we focus on target-based object retrieval, which falls under the realm of object detection in RS. Object retrieval requires high-resolution imagery so that objects are distinctly visible in the image. The main challenge with the conventional retrieval approach over large-scale databases is that, quite often, no query image sample of the target class is available: the target of interest exists solely as a perception in the user's mind, in the form of an imprecise sketch. In such situations, where a photo query is absent, it can be immensely useful to promptly make a quick hand-drawn sketch of the target. Sketches are a highly symbolic, hieroglyphic representation of data, and this minimalistic representation can be exploited in a sketch-based image retrieval (SBIR) framework.
While dealing with satellite images, it is imperative to collect as many image samples as possible for each object class in order to achieve a high recognition success rate. In general, however, there is a considerable number of classes for which we seldom have any training samples. For such classes, we can use the zero-shot learning (ZSL) strategy: ZSL aims to solve a task without receiving any example of that task during the training phase, making the network capable of handling samples from unseen (new) classes obtained at inference time, after deployment. Hence, we propose an aerial sketch-image dataset, the Earth on Canvas dataset. Classes in this dataset: Airplane, Baseball Diamond, Buildings, Freeway, Golf Course, Harbor, Intersection, Mobile home park, Overpass, Parking lot, River, Runway, Storage tank, Tennis court.
Terms: https://www.archivemarketresearch.com/privacy-policy
Discover the booming AI painting tools market! This report reveals a $1.5 billion market in 2025, projected to grow at 25% CAGR until 2033. Learn about key players like Nvidia and Google, market segments, and regional trends. Get insights into the future of AI art generation.
The goal of this dataset is to create a custom dataset for multi-digit recognition tasks by concatenating pairs of digits from the MNIST dataset into single 128x128 pixel images and assigning labels that represent two-digit numbers from '00' to '99'.
Image (128 x 128 pixel NumPy array): The dataset contains images of size 128 x 128 pixels. Each image is a composition of a pair of MNIST digits; each digit occupies a 28 x 28 pixel region within the larger 128 x 128 pixel canvas. The digits are randomly placed within the canvas to simulate real-world scenarios.
Label (Int): The labels represent two-digit numbers ranging from '00' to '99'. These labels are assigned based on the digits present in the image and their order. For example, an image with 7 and 2 as the first and second digits would be labeled 72 (7 * 10 + 2). Leading zeros are added to ensure that all labels are two characters in length.
Training Data: 60,000 data points Test Data: 10,000 data points
Data Generation: To create this dataset, you start with the MNIST dataset, which contains single-digit images of handwritten digits from '0' to '9'. For each data point in the new dataset, you randomly select two digits from MNIST and place them on a 128 x 128 canvas. The digits are placed at random positions, and their order can also vary. After creating the multi-digit image, you assign a label by concatenating the labels of the individual digits, padded to two characters in length.
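The recipe above can be sketched with the standard library alone. The bitmap representation (28x28 nested lists of intensities) and the placement policy (tens digit in the left half, units digit in the right half, so reading order matches the label) are assumptions for illustration; the actual generator may randomize the ordering as noted:

```python
import random

def compose_pair(digit_a, digit_b, label_a, label_b, size=128, cell=28, seed=None):
    """Paste two 28x28 digit bitmaps at random positions on a blank
    size x size canvas and build the zero-padded two-character label.
    digit_a/digit_b are nested lists of pixel intensities (0-255)."""
    rng = random.Random(seed)
    canvas = [[0] * size for _ in range(size)]
    # Left half for the first (tens) digit, right half for the second
    # (units) digit, so reading order matches the label.
    for bitmap, x0 in ((digit_a, rng.randrange(0, size // 2 - cell)),
                       (digit_b, rng.randrange(size // 2, size - cell))):
        y0 = rng.randrange(0, size - cell)
        for r in range(cell):
            for c in range(cell):
                canvas[y0 + r][x0 + c] = bitmap[r][c]
    label = label_a * 10 + label_b      # e.g. 7 and 2 -> 72
    return canvas, f"{label:02d}"       # zero-padded, e.g. "05"
```

Repeating this 60,000 times over the MNIST training split and 10,000 times over its test split yields the stated dataset sizes.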
Key Features of the 2-Digit Classification Dataset:
Multi-Digit Images: This dataset consists of multi-digit images, each containing two handwritten digits. The inclusion of multiple digits in a single image presents a unique and challenging classification task.
Labeling Complexity: Labels are represented as two-digit numbers, adding complexity to the classification problem. The labels range from '00' to '99,' encompassing a wide variety of possible combinations.
Diverse Handwriting Styles: The dataset captures diverse handwriting styles, making it suitable for testing the robustness and generalization capabilities of machine learning models.
128x128 Pixel Images: Images are provided in a high-resolution format of 128x128 pixels, allowing for fine-grained analysis and leveraging the increased image information.
Large-Scale Training and Test Sets: With 60,000 training data points and 10,000 test data points, this dataset provides ample data for training and evaluating classification models.
Potential Use Cases:
Multi-Digit Recognition: The dataset is ideal for developing and evaluating machine learning models that can accurately classify multi-digit sequences, which find applications in reading house numbers, license plates, and more.
OCR (Optical Character Recognition) Systems: Researchers and developers can use this dataset to train and benchmark OCR systems for recognizing handwritten multi-digit numbers.
Real-World Document Processing: In scenarios where documents contain multiple handwritten numbers, such as invoices, receipts, and forms, this dataset can be valuable for automating data extraction.
Address Parsing: It can be used to build systems capable of parsing handwritten addresses and extracting postal codes or other important information.
Authentication and Security: Multi-digit classification models can contribute to security applications by recognizing handwritten PINs, passwords, or access codes.
Education and Handwriting Analysis: Educational institutions can use this dataset to create handwriting analysis tools and assess the difficulty of recognizing different handwritten number combinations.
Benchmarking Deep Learning Models: Data scientists and machine learning practitioners can use this dataset as a benchmark for testing and improving deep learning models' performance in multi-digit classification tasks.
Data Augmentation: Researchers can employ data augmentation techniques to generate even more training data by introducing variations in digit placement and size.
Model Explainability: Developing models for interpreting and explaining the reasoning behind classifying specific multi-digit combinations can have applications in AI ethics and accountability.
Visualizations and Data Exploration: Researchers can use this dataset to explore visualizations and data analysis techniques to gain insights into the characteristics of handwritten multi-digit numbers.
In summary, the 2-Digit Classification Dataset offers a unique opportunity to work on a challenging multi-digit recognition problem with real-world applications, making it a valuable resource for researchers, developers, and data scientists.
Note: Creating this dataset would require a considerable amount of preprocessing and image manipulation. ...
Terms: https://www.technavio.com/content/privacy-notice
Photo Printing Market Size 2025-2029
The photo printing market size is forecast to increase by USD 11.54 billion, at a CAGR of 6.7% between 2024 and 2029.
The market is experiencing significant growth, driven by the increasing popularity of gifting culture and the rising preference for organic and natural pigments in photo ink. The gifting trend is fueled by the desire to create tangible memories, making photo prints an enduring choice for personal and professional celebrations. Furthermore, the shift towards eco-friendly and sustainable products is influencing the adoption of organic and natural pigments, which offer superior image quality and reduced environmental impact. Another trend shaping the market is the increasing applications of artificial intelligence (AI) in the photo editing process.
Companies must adapt to this shift by offering unique value propositions, such as high-quality prints, personalization, and specialized finishes, to differentiate themselves and retain customers. Additionally, investments in research and development to create innovative products and services that cater to evolving consumer preferences will be essential for market success. However, the market faces challenges as the digitalization of photography continues to gain momentum. Advanced technologies, including high-resolution printing, color management, and digital asset management, ensure superior print quality.
What will be the Size of the Photo Printing Market during the forecast period?
Get Key Insights on Market Forecast (PDF) Request Free Sample
The market continues to evolve, driven by advancements in technology and shifting consumer preferences. Image file formats and print quality assessment are paramount in ensuring superior results. Calibration of photo printers is crucial for maintaining consistent print accuracy. Matte and glossy photo printing cater to diverse tastes, with matte gaining popularity for its subtle, elegant finish. Print speed, measured in pages per minute (ppm), is a significant consideration for professional photo printers. Inkjet technology, utilizing UV resistant inks, dominates the market, offering borderless printing and excellent print longevity. Printing workflow optimization, including print driver settings and print media handling, is essential for efficient production.
Dye sublimation printing and canvas printing expand the market's reach, catering to unique applications. Photo paper types, each with varying color gamut accuracy and print resolution, influence the final output. Print longevity tests and image sharpening techniques further enhance the overall quality. Professional photo printers invest in color management systems and photo editing software to ensure precise color reproduction. Ink drying time and ink cartridge capacity are essential factors in operational costs. Printhead technology and image processing software advancements continue to push the boundaries of print quality. The photo printing industry is projected to grow at a robust rate of 5% annually, reflecting the continuous innovation and demand for high-quality printed photos.
For instance, a leading professional photo lab reported a 30% increase in sales due to the implementation of advanced image processing software and color profile ICC management, improving overall print quality and customer satisfaction.
How is this Photo Printing Industry segmented?
The photo printing industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Distribution Channel
Offline
Online
Product
Photo gifts
Cards
Photo books
Calendars
Others
Material
Paper
Canvas
Metal
Fabric
Geography
North America
US
Canada
Europe
France
Germany
Italy
UK
APAC
China
India
Japan
South Korea
Rest of World (ROW)
By Distribution Channel Insights
The Offline segment is estimated to witness significant growth during the forecast period. In the dynamic and evolving market, various image file formats continue to shape consumer preferences and industry trends. For instance, matte photo printing has experienced a significant rise in popularity, accounting for approximately 35% of total sales in 2021. Furthermore, print quality assessment plays a pivotal role in ensuring customer satisfaction, with inkjet photo printers dominating the market due to their ability to produce high-quality images. Print speed, another essential factor, has seen impressive improvements, with professional photo printers boasting print speeds of up to 15 prints per minute (ppm). UV resistant inks are another key development, ensuring print longevity and contributing to around 40% of the market share. The market trend
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Virtual Reality (VR) technology uses computers to simulate the real world comprehensively. VR has been widely used in college teaching and has huge application prospects. To better apply computer-aided instruction technology in music teaching, a music teaching system based on VR technology is proposed. First, a virtual piano is developed, using the HTC Vive kit and a Leap Motion sensor fixed on the helmet as the hardware platform, and Unity3D with the related SteamVR and Leap Motion plug-ins as the software platform. Then, a gesture recognition algorithm is proposed and implemented. Specifically, a Dual Channel Convolutional Neural Network (DCCNN) is adopted to recognize the user's gesture commands: dual-size convolution kernels extract feature information from the images and gesture commands from the video. Red-Green-Blue (RGB) images and optical flow images, carrying the spatial and temporal information respectively, are input into the DCCNN, and the two streams' prediction results are merged to obtain the final recognition result. The experimental results reveal that the recognition accuracy of the DCCNN for the Curwen gestures is as high as 96%, and that the accuracy varies with the size of the convolution kernel. Combining convolution kernels of size 5×5 and 7×7 improves the recognition accuracy to 98%. These results can be applied to VR piano teaching and other VR music-education products, with extensive potential for popularization and application.
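The final step, merging the two streams' predictions, can be illustrated with a small late-fusion sketch. Averaging the per-stream softmax outputs is an assumption here; the description says the prediction results are merged without specifying the operator:

```python
import math

def fuse_predictions(rgb_logits, flow_logits):
    """Late fusion of a two-stream classifier: softmax each stream's
    logits, average the two probability vectors, and return the argmax
    class. The logits stand in for the outputs the DCCNN would produce
    from RGB frames and optical-flow images respectively."""
    def softmax(xs):
        m = max(xs)                          # subtract max for stability
        exps = [math.exp(x - m) for x in xs]
        s = sum(exps)
        return [e / s for e in exps]
    p_rgb, p_flow = softmax(rgb_logits), softmax(flow_logits)
    fused = [(a + b) / 2 for a, b in zip(p_rgb, p_flow)]
    return fused.index(max(fused))
```

When both streams agree, fusion simply reinforces the shared class; when they disagree, the more confident stream dominates.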
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Layout2 is a dataset for object detection tasks - it contains Canvas annotations for 288 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: Attribution-NoDerivs 4.0 (CC BY-ND 4.0), https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
All captchas in the set were generated by a custom Python library, chaptcha, using a variety of fonts, backgrounds, and text positioning, with the addition of framing. Created with OCR training in mind.
Uses two RGB palettes:

grayscale_bright = [(255, 255, 255), (240, 240, 240), (230, 230, 230), (220, 220, 220), (210, 210, 210), (200, 200, 200), (190, 190, 190), (180, 180, 180), (170, 170, 170), (160, 160, 160), (150, 150, 150)] (white, then grays from 240 down to 150 in steps of 10)

grayscale_dark = [(140, 140, 140), (125, 125, 125), (110, 110, 110), (95, 95, 95), (80, 80, 80), (65, 65, 65), (50, 50, 50), (35, 35, 35), (20, 20, 20), (5, 5, 5)] (dark grays from 140 down to near-black in steps of 15)
Each PNG file is accompanied by a TXT file of the same name with the following structure (the hashtag comments here are explanatory and not present in the files):
TEXT=Op269P2RkU #captcha text
TLEN=10 #captcha string len
FNAME=GOUDOS.TTF #used font
FSIZE=45 #font size before rescaling the image
FCOLOR=(35, 35, 35) #font color
DISTORTED=False #applied distortions to font
CANVAS=(200, 40) #size of the image file
CAPTIOGSIZE=(225, 34) #original size at which captcha was generated*
MARCOLOR=(160, 160, 160) #margins color **
*The captchas are generated at various readable sizes, then rescaled and pasted onto a fixed-size canvas; this helps prepare the model for seeing cut-out text with various proportions of background.
**No background color is saved, as by default random cut-outs from image files are used as backgrounds. This is a simplified set to begin training on, before moving on to full-difficulty captchas.
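A minimal parser for these metadata files might look like the following. The tolerance for trailing '#' comments is included only because comments appear in the annotated example above; the field names are taken verbatim from it:

```python
def parse_captcha_meta(text):
    """Parse one KEY=value-per-line metadata file into a dict, ignoring
    trailing '#' comments and blank lines. Tuple-valued fields such as
    FCOLOR and CANVAS are kept as strings; callers can
    ast.literal_eval them if numeric tuples are needed."""
    meta = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop any comment
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        meta[key.strip()] = value.strip()
    return meta
```

Pairing each PNG with the dict parsed from its matching TXT file gives ready-made (image, label) records for OCR training.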
I will soon be launching other master sets of similar sizes, including:
- easy_grayscale_with_distortions_system_fonts <- random grayscale colors from palettes
- easy_colors_plain_no_distortions_system_fonts <- random colors from palettes
- easy_colors_with_with_distortion_system_fonts <- random colors from palettes
- easy_any_colors_with_distortion_system_fonts <- random colors generated on the spot
- easy_text_color_background_image_no_distortion_system_fonts <- random text color, backgrounds based on an image library (smaller set, 5-10k captchas per sub-set)
- easy_text_color_background_image_with_distortion_system_fonts <- random text color, backgrounds based on an image library (smaller set, 5-10k captchas per sub-set)
Final sets preview: [preview image omitted]
If you need to train your OCR on particular fonts or types of backgrounds or colors, or any other specific conditions, get in touch with me and we can collaborate. I'm glad to contribute to the development of these technologies.
I've put a few days into testing and confirming the quality of the dataset.
I did manually inspect the results; however, I can't be 100% sure that there aren't a few dozen files using fonts that don't support specific symbols, in the captchas that use all symbols (folders with "specials" in the name). If you spot any of those, please let me know and I will update the dataset. To clear them out yourself whenever you spot any, write a script that finds every text file whose FNAME matches the faulty font name, then delete that text file and the PNG with the same name. But please do let me know as well, so I can correct it for the rest of us :)
For example:
```
import os

def remove_faulty_fonts(folder_path='.', faulty_fonts=['Font Name1']):
    """Delete every captcha whose metadata lists a faulty font:
    both the .txt metadata file and the .png image sharing its name."""
    for root, dirs, files in os.walk(folder_path):
        for file in files:
            if not file.endswith('.txt'):
                continue
            txt_path = os.path.join(root, file)
            with open(txt_path, 'r') as f:
                lines = f.readlines()
            for line in lines:
                if line.startswith('FNAME='):
                    font_name = os.path.splitext(line.split('=')[1].strip())[0]
                    if font_name in faulty_fonts:
                        os.remove(txt_path)
                        png_path = os.path.splitext(txt_path)[0] + '.png'
                        if os.path.exists(png_path):
                            os.remove(png_path)
                    break
```
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Nucleic acid microarray photolithography combines density, throughput, and positional control in DNA synthesis. These surface-bound sequence libraries are conventionally used in large-scale hybridization assays against fluorescently labeled, perfect-match DNA strands. Here, we introduce another layer of control for in situ microarray synthesis, hybridization affinity, to precisely modulate fluorescence intensity upon duplex formation. Using a combination of Cy3-, Cy5-, and fluorescein-labeled targets and an ensemble of truncated DNA probes, we organize 256 shades of red, green, and blue intensities that can be superimposed and merged. In so doing, hybridization alone is able to produce a large palette of 16 million colors, or 24-bit color depth. Digital images can be reproduced with high fidelity at the micrometer scale using a simple process that assigns a sequence to any RGB value. Largely automated, this approach can be seen as miniaturized DNA-based painting.
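The 24-bit claim is simple arithmetic: with 256 intensity shades per dye channel, packing three channels yields 256^3 = 16,777,216 distinct colors. A sketch of the packing (the bit layout is the standard RGB convention, not something specified by the study):

```python
def rgb_to_24bit(r, g, b):
    """Pack three 8-bit channel intensities (the 256 shades per dye)
    into a single 24-bit color value: 256**3 = 16,777,216 colors."""
    assert all(0 <= c <= 255 for c in (r, g, b))
    return (r << 16) | (g << 8) | b
```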
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Cover design tests for the 33rd Edition of the Edinburgh Architecture Research journal. The 2013 publication EAR Vol. 33 focused on the theme: Methodologies for Sustainable Projects. The images used for the four cover designs aimed to address the theme through the projection of light or images onto paper, wool, skin and canvas. The four covers were submitted for competition. The cover 'Jungenfeld_EAR33_CoverDesign_2.jpg' was selected for publication.
License: Community Data License Agreement Permissive 2.0, https://cdla.dev/permissive-2-0/
The Crello dataset is compiled for the study of vector graphic documents. The dataset contains document metadata, such as canvas size, and pre-rendered elements, such as images and text boxes. The original templates were collected from crello.com (now create.vista.com) and converted to a low-resolution format suitable for machine learning analysis.
r/Place was a wildly successful April Fools' joke perpetrated by Reddit over the course of 72 hours, April 1-3, 2017. The rules of Place, quoting u/Drunken_Economist, were:
1.2 million redditors used these premises to build the largest collaborative art project in history, painting (and often re-painting) a million-pixel canvas with 16.5 million tiles in 16 colors.
The canvas started out completely blank, and ended looking like this:
[final canvas image omitted]
How did that happen?
This dataset is the full tile placement history for r/place over time. Each record is a single move: one user changing one pixel to one of 16 colors.
This data was published as-is by Reddit.
Users were heavily rate-limited in their ability to place pixels, so this dataset shows what happens when users of similar stripes "band together" to build something greater than themselves. With a pixel-by-pixel history, what can you tell about the relative popularity of different regions in the figure? Can you use image analysis techniques to segment the image into different regions, and measure what happens to them over time?
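As a starting point for such analyses, here is a stdlib sketch that tallies placements per color from the move history. The column name "color" is an assumption for illustration, so check the header of the published CSV before using it:

```python
import csv
from collections import Counter

def color_usage(csv_path, color_field="color"):
    """Count how many placements used each color across the full tile
    history. The column name is an assumption; inspect the file header
    of the published dataset and adjust color_field accordingly."""
    counts = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row[color_field]] += 1
    return counts
```

The same pattern, grouping by (x, y) instead of color, gives per-pixel contention, a first step toward measuring the relative popularity of regions.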
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
🌌 The Tree Oil Painting – Master Reference Index
The World’s Largest Torque-Based AI & Scientific Forensic Universe for a Single Painting
🖼️ Central Image: The Tree Oil Painting
This dataset contains a single image: the full canvas of The Tree Oil Painting. Around this painting, an entire universe of comparative research has formed. This file acts as the anchor point, the visual core of a 10-year investigation spanning AI, brushstroke forensics, pigment science, and… See the full description on the dataset page: https://huggingface.co/datasets/HaruthaiAi/TreeOil_VanGogh_TheForgottenMasterwork_GlobalForensicIndex.