This is a dataset of richly tagged and labeled artwork depicting characters from Japanese anime. The data comes from two image boards, Danbooru and moeimouto. This data can be used in a variety of interesting ways, from classification to generative modeling. Please note that while all of the images in this dataset have been tagged as SFW (non-explicit), the source websites do not ban explicit or pornographic images, so mislabeled images may still be present in the dataset.
The first set of data comes from the imageboard Danbooru. The entire corpus of Danbooru images was scraped from the site with permission and was collected into a dataset. The zip files included here have the full metadata for these images as well as a subset of 300,000 of the images in normalized 512px x 512px form. Full information about this dataset is available here:
https://www.gwern.net/Danbooru2017
From the article:
Deep learning for computer vision relies on large annotated datasets. Classification/categorization has benefited from the creation of ImageNet, which classifies 1m photos into 1000 categories. But classification/categorization is a coarse description of an image which limits application of classifiers, and there is no comparably large dataset of images with many tags or labels which would allow learning and detecting much richer information about images. Such a dataset would ideally be >1m images with at least 10 descriptive tags each which can be publicly distributed to all interested researchers, hobbyists, and organizations. There are currently no such public datasets, as ImageNet, Birds, Flowers, and MS COCO fall short either on image or tag count or restricted distribution. I suggest that the image boorus be used. The image boorus are longstanding web databases which host large numbers of images which can be tagged or labeled with an arbitrary number of textual descriptions; they were developed for and are most popular among fans of anime, who provide detailed annotations.
The best known booru, with a focus on quality, is Danbooru. We create & provide a torrent which contains ~1.9tb of 2.94m images with 77.5m tag instances (of 333k defined tags, ~26.3/image) covering Danbooru from 24 May 2005 through 31 December 2017 (final ID: #2,973,532), providing the image files & a JSON export of the metadata. We also provide a smaller torrent of SFW images downscaled to 512x512px JPG (241GB; 2,232,462 images) for convenience.
Our hope is that a Danbooru2017 dataset can be used for rich large-scale classification/tagging & learned embeddings, test out the transferability of existing computer vision techniques (primarily developed using photographs) to illustration/anime-style images, provide an archival backup for the Danbooru community, feed back metadata improvements & corrections, and serve as a testbed for advanced techniques such as conditional image generation or style transfer.
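As a quick illustration, the JSON metadata export can be scanned to pull out images carrying a particular tag. This is only a sketch under assumptions: the file name metadata.json and the field names id and tags are placeholders and may not match the actual export layout.

    import json

    # Hypothetical sketch: scan a JSON-lines metadata export and collect the IDs
    # of images carrying the "solo" tag. Field names ("id", "tags") are assumptions.
    solo_ids = []
    with open("metadata.json", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            tag_names = {t["name"] for t in record.get("tags", [])}
            if "solo" in tag_names:
                solo_ids.append(record["id"])
    print(len(solo_ids), "images tagged 'solo'")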
The second set of data included in this dataset is a little more manageable than the first; it includes a number of cropped illustrated faces from the now-defunct site moeimouto. This dataset has been used in GAN work in the past. The data comes from:
http://www.nurs.or.jp/~nagadomi/animeface-character-dataset/
More information:
http://www.nurs.or.jp/~nagadomi/animeface-character-dataset/README.html
If you are interested in creating more face data (potentially from the Danbooru data), here is a helpful resource: https://github.com/nagadomi/lbpcascade_animeface
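A minimal OpenCV sketch of how that cascade can be applied to crop faces (the cascade XML must be downloaded from the repository above; the image path is a placeholder):

    import cv2

    # Load the anime-face LBP cascade (lbpcascade_animeface.xml from the repository above).
    cascade = cv2.CascadeClassifier("lbpcascade_animeface.xml")

    image = cv2.imread("example.jpg")  # placeholder path
    gray = cv2.equalizeHist(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))

    # Detect faces and write each crop to disk.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(24, 24))
    for i, (x, y, w, h) in enumerate(faces):
        cv2.imwrite(f"face_{i}.png", image[y:y + h, x:x + w])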
If you are looking for something a little easier to crack into, check out this other great anime image booru dataset: https://www.kaggle.com/alamson/safebooru
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about artists. It has 1 row and is filtered where the artwork is Large Drawing #5. It features 9 columns including birth date, death date, country, and gender.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game "Quick, Draw!". The drawings were captured as timestamped vectors, tagged with metadata including what the player was asked to draw and in which country the player was located.
Example drawings: https://raw.githubusercontent.com/googlecreativelab/quickdraw-dataset/master/preview.jpg
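A small sketch of reading the vector drawings, assuming the ndjson export in which each line is one drawing with word, countrycode, and drawing fields (the drawing being a list of strokes, each a pair of x and y point lists); the file name is a placeholder:

    import json

    # Hypothetical sketch: read one category file and report strokes per drawing.
    with open("cat.ndjson", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            strokes = record["drawing"]  # list of [x_points, y_points] per stroke
            print(record["word"], record["countrycode"], len(strokes), "strokes")
            break  # first record only, for illustration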
FloorPlanCAD is a large-scale real-world CAD drawing dataset containing over 15,000 floor plans, ranging from residential to commercial buildings.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Plasmode is a term coined several years ago to describe data sets that are derived from real data but for which some truth is known. Omic techniques, most especially microarray and genomewide association studies, have catalyzed a new zeitgeist of data sharing that is making data and data sets publicly available on an unprecedented scale. Coupling such data resources with a science of plasmode use would allow statistical methodologists to vet proposed techniques empirically (as opposed to only theoretically) and with data that are by definition realistic and representative. We illustrate the technique of empirical statistics by consideration of a common task when analyzing high dimensional data: the simultaneous testing of hundreds or thousands of hypotheses to determine which, if any, show statistical significance warranting follow-on research. The now-common practice of multiple testing in high dimensional experiment (HDE) settings has generated new methods for detecting statistically significant results. Although such methods have heretofore been subject to comparative performance analysis using simulated data, simulating data that realistically reflect data from an actual HDE remains a challenge. We describe a simulation procedure using actual data from an HDE where some truth regarding parameters of interest is known. We use the procedure to compare estimates for the proportion of true null hypotheses, the false discovery rate (FDR), and a local version of FDR obtained from 15 different statistical methods.
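For readers unfamiliar with FDR control, here is a minimal sketch of the standard Benjamini-Hochberg procedure applied to a vector of p-values; it illustrates the kind of method being compared and is not the plasmode simulation procedure itself.

    import numpy as np

    def benjamini_hochberg(pvalues, alpha=0.05):
        """Return a boolean mask of hypotheses rejected at FDR level alpha."""
        p = np.asarray(pvalues, dtype=float)
        m = len(p)
        order = np.argsort(p)
        thresholds = alpha * np.arange(1, m + 1) / m
        below = p[order] <= thresholds
        rejected = np.zeros(m, dtype=bool)
        if below.any():
            k = np.nonzero(below)[0].max()  # largest rank whose p-value meets its threshold
            rejected[order[:k + 1]] = True
        return rejected

    # Example: apply BH at FDR level 0.05 to five p-values.
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))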
WikiArt contains paintings from 195 different artists. The dataset has 42129 images for training and 10628 images for testing.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Each cartoon face in these sets is composed of 16 components that vary in 10 artwork attributes, 4 color attributes, and 4 proportion attributes. Colors are chosen from a discrete set of RGB values. The number of options per attribute category ranges from 3, for short/medium/long chin length, to 111, for the largest category, hairstyle. See below for a description of each attribute.
Each of these components and their variations were drawn by the same artist, Shiraz Fuman, resulting in approximately 250 cartoon component artworks and ~10^13 possible combinations. The artwork components are divided into a fixed set of layers that define a Z-ordering for rendering. For instance, the face shape is defined on a layer below the eyes and glasses, so that the artworks are rendered in the correct order. Hair style is a more complex case and needs to be defined on two layers, one behind the face and one in front. There are 8 total layers: hair back, face, hair front, eyes, eyebrows, mouth, facial hair, and glasses.
The mapping from attribute to artwork is also defined by the artist such that any random selection of attributes produces a visually appealing cartoon without any misaligned artwork; this sometimes involves handling interaction between attributes. For example, the proper way to display a “short beard” changes for different face shapes, which required the artist to create a “short beard” artwork for each face shape.
Artwork
chin_length 3 Length of chin (below mouth region)
eye_angle 3 Tilt of the eye inwards or outwards
eye_lashes 2 Whether or not eyelashes are visible
eye_lid 2 Appearance of the eyelids
eyebrow_shape 14 Shape of eyebrows
eyebrow_weight 2 Line weight of eyebrows
face_shape 7 Overall shape of the face
facial_hair 15 Type of facial hair (type 14 is no hair)
glasses 12 Type of glasses (type 11 is no glasses)
hair 111 Type of head hair
Colors
eye_color 5 Color of the eye irises
face_color 11 Color of the face skin
glasses_color 7 Color of the glasses, if present
hair_color 10 Color of the hair, facial hair, and eyebrows
Proportions
eye_eyebrow_distance 3 Distance between the eye and eyebrows
eye_slant 3 Similar to eye_angle, but rotates the eye and does not change artwork
eyebrow_thickness 4 Vertical scaling of the eyebrows
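As an illustration of how this attribute space can be sampled, a small sketch that draws one random option index per attribute (the option counts are copied from the table above; rendering and layer compositing are not shown):

    import random

    # Option counts per attribute, as listed above.
    ATTRIBUTE_OPTIONS = {
        "chin_length": 3, "eye_angle": 3, "eye_lashes": 2, "eye_lid": 2,
        "eyebrow_shape": 14, "eyebrow_weight": 2, "face_shape": 7,
        "facial_hair": 15, "glasses": 12, "hair": 111,
        "eye_color": 5, "face_color": 11, "glasses_color": 7, "hair_color": 10,
        "eye_eyebrow_distance": 3, "eye_slant": 3, "eyebrow_thickness": 4,
    }

    def sample_cartoon():
        """Draw a random option index for every attribute; rendering happens elsewhere."""
        return {name: random.randrange(count) for name, count in ATTRIBUTE_OPTIONS.items()}

    print(sample_cartoon())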
Dataset has been copied from https://google.github.io/cartoonset/ with no modifications (other than tgz being replaced with zip for compression)
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Danbooru2023: A Large-Scale Crowdsourced and Tagged Anime Illustration Dataset
Danbooru2023 is a large-scale anime image dataset with over 5 million images contributed and annotated in detail by an enthusiast community. Image tags cover aspects like characters, scenes, copyrights, artists, etc., with an average of 30 tags per image. Danbooru is a veteran anime image board with high-quality images and extensive tag metadata. The dataset can be used to train image classification… See the full description on the dataset page: https://huggingface.co/datasets/jpft/danbooru2023.
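A minimal sketch of pulling a few records through the Hugging Face datasets library, assuming the dataset is available there in streamable form; the split name and field names depend on the actual dataset schema:

    from datasets import load_dataset

    # Stream instead of downloading the multi-terabyte dataset in full.
    ds = load_dataset("jpft/danbooru2023", split="train", streaming=True)
    for i, example in enumerate(ds):
        print(example.keys())  # inspect the available fields
        if i >= 2:
            break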
The Met dataset is a large-scale dataset for Instance-Level Recognition (ILR) in the artwork domain. It relies on the open access collection from the Metropolitan Museum of Art (The Met) in New York to form the training set, which consists of about 400k images from more than 224k classes, with artworks of world-level geographic coverage and chronological periods dating back to the Paleolithic period. Each museum exhibit corresponds to a unique artwork, and defines its own class. The training set exhibits a long-tail distribution with more than half of the classes represented by a single image, making it a special case of few-shot learning.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 3 rows and is filtered where the book is The larger illustrated guide to birds of Southern Africa. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 2 rows and is filtered where the book is The illustrated encyclopedia of major aircraft of World War II. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Data sets used to prepare illustrative figures for the overview article “Multiscale Modeling of Background Ozone”.

Overview: The CMAQ model output datasets used to create illustrative figures for this overview article were generated by scientists in EPA/ORD/CEMM and EPA/OAR/OAQPS. The EPA/ORD/CEMM-generated dataset consisted of hourly CMAQ output from two simulations. The first simulation was performed for July 1 – 31 over a 12 km modeling domain covering the Western U.S. The simulation was configured with the Integrated Source Apportionment Method (ISAM) to estimate the contributions from 9 source categories to modeled ozone. ISAM source contributions for July 17 – 31 averaged over all grid cells located in Colorado were used to generate the illustrative pie chart in the overview article. The second simulation was performed for October 1, 2013 – August 31, 2014 over a 108 km modeling domain covering the northern hemisphere. This simulation was also configured with ISAM to estimate the contributions from non-US anthropogenic sources, natural sources, stratospheric ozone, and other sources to ozone concentrations. Ozone ISAM results from this simulation were extracted along a boundary curtain of the 12 km modeling domain specified over the Western U.S. for the time period January 1, 2014 – July 31, 2014 and used to generate the illustrative time-height cross-sections in the overview article. The EPA/OAR/OAQPS-generated dataset consisted of hourly gridded CMAQ output for surface ozone concentrations for the year 2016. The CMAQ simulations were performed over the northern hemisphere at a horizontal resolution of 108 km. NO2 and O3 data for July 2016 were extracted from these simulations to generate the vertically-integrated column densities shown in the illustrative comparison to satellite-derived column densities.

CMAQ Model Data: The data from the CMAQ model simulations used in this research effort are very large (several terabytes) and cannot be uploaded to ScienceHub due to size restrictions. The model simulations are stored on the /asm archival system accessible through the atmos high-performance computing (HPC) system. Due to data management policies, files on /asm are subject to expiry depending on the template of the project. Files not requested for extension after the expiry date are deleted permanently from the system. The format of the files used in this analysis and listed below is ioapi/netcdf. Documentation of this format, including definitions of the geographical projection attributes contained in the file headers, is available at https://www.cmascenter.org/ioapi/. Documentation on the CMAQ model, including a description of the output file format and output model species, can be found on the CMAQ GitHub site at https://github.com/USEPA/CMAQ. This dataset is associated with the following publication: Hogrefe, C., B. Henderson, G. Tonnesen, R. Mathur, and R. Matichuk. Multiscale Modeling of Background Ozone: Research Needs to Inform and Improve Air Quality Management. EM Magazine. Air and Waste Management Association, Pittsburgh, PA, USA, 1-6, (2020).
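A small sketch of opening one of those ioapi/netcdf files with the netCDF4 Python library; the file name and the variable name "O3" are assumptions and should be checked against the actual file header:

    from netCDF4 import Dataset

    # Open an hourly CMAQ output file (hypothetical name) and inspect ozone.
    nc = Dataset("CCTM_CONC_20160701.nc")
    print(list(nc.variables))            # list available model species
    o3 = nc.variables["O3"][:]           # assumed dimensions: (time, layer, row, col)
    print("surface-layer O3 shape:", o3[:, 0, :, :].shape)
    nc.close()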
PASS is a large-scale image dataset, containing 1.4 million images, that does not include any humans and which can be used for high-quality pretraining while significantly reducing privacy concerns.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Danbooru2017 is a large-scale anime image database with 2.9m+ images annotated with 77.5m+ tags; it can be useful for machine learning purposes such as image recognition and generation
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Personal Protective Equipment Dataset (PPED)
This dataset serves as a benchmark for PPE detection in chemical plants. We provide the dataset and experimental results.
We produced a data set based on the actual needs and relevant regulations in chemical plants. The standard GB 39800.1-2020 formulated by the Ministry of Emergency Management of the People’s Republic of China defines the protective requirements for plants and chemical laboratories. The complete dataset is contained in the folder PPED/data.
1.1. Image collection
We took more than 3,300 pictures, varying the following characteristics: environment, distance, lighting conditions, angle, and the number of people photographed.
Backgrounds: There are 4 backgrounds, including office, near machines, factory and regular outdoor scenes.
Scale: By taking pictures from different distances, the captured PPE items are classified into small, medium, and large scales.
Light: Good lighting conditions and poor lighting conditions were studied.
Diversity: Some images contain a single person, and some contain multiple people.
Angle: The pictures we took can be divided into front and side.
In total, more than 3,300 raw photos were taken under all conditions. All images are located in the folder “PPED/data/JPEGImages”.
1.2. Label
We used LabelImg as the labeling tool and annotated in the PASCAL-VOC format. YOLO uses the txt format; trans_voc2yolo.py can be used to convert the PASCAL-VOC XML files to txt files. Annotations are stored in the folder PPED/data/Annotations.
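trans_voc2yolo.py ships with the dataset; as an illustration of what such a conversion does, here is a minimal sketch (the class names and paths are placeholders, not the dataset's actual label set) that turns one PASCAL-VOC XML annotation into YOLO-format lines:

    import xml.etree.ElementTree as ET

    CLASSES = ["helmet", "goggles", "gloves"]  # placeholder class list

    def voc_to_yolo(xml_path):
        """Convert one PASCAL-VOC XML file into YOLO lines: class cx cy w h (normalized)."""
        root = ET.parse(xml_path).getroot()
        width = float(root.find("size/width").text)
        height = float(root.find("size/height").text)
        lines = []
        for obj in root.iter("object"):
            cls = CLASSES.index(obj.find("name").text)
            box = obj.find("bndbox")
            xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
            xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
            cx, cy = (xmin + xmax) / 2 / width, (ymin + ymax) / 2 / height
            w, h = (xmax - xmin) / width, (ymax - ymin) / height
            lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
        return lines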
1.3. Dataset Features
The pictures were taken by us according to the different conditions mentioned above. The file PPED/data/feature.csv is a CSV file that records the characteristics of every image, including lighting conditions, angles, backgrounds, number of people, and scale.
1.4. Dataset Division
The dataset is split into a training set and a test set at a 9:1 ratio.
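A minimal sketch of such a 9:1 random split over the image list (the seed and paths are placeholders):

    import random
    from pathlib import Path

    images = sorted(Path("PPED/data/JPEGImages").glob("*.jpg"))
    random.seed(0)  # reproducible split
    random.shuffle(images)
    cut = int(0.9 * len(images))
    train, test = images[:cut], images[cut:]
    print(len(train), "training images,", len(test), "test images")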
We provide baseline results with five models, namely Faster R-CNN (R), Faster R-CNN (M), SSD, YOLOv3-spp, and YOLOv5. All code and results are given in the folder PPED/experiment.
2.1. Environment and Configuration:
Intel Core i7-8700 CPU
NVIDIA GTX1060 GPU
16 GB of RAM
Python: 3.8.10
pytorch: 1.9.0
pycocotools: pycocotools-win
Windows 10
2.2. Applied Models
The source code and results of the applied models are given in the folder PPED/experiment, with sub-folders corresponding to the model names.
2.2.1. Faster R-CNN
Faster R-CNN
backbone: resnet50+fpn
We downloaded the pre-training weights from https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth.
We modified the dataset path, training classes and training parameters including batch size.
We run train_res50_fpn.py to start training.
Then, the weights are trained by the training set.
Finally, we validate the results on the test set.
backbone: mobilenetv2
The same training method as for resnet50+fpn was used, but the results were not as good as with resnet50+fpn, so this backbone was not pursued further.
The Faster R-CNN source code used in our experiment is given in the folder PPED/experiment/Faster R-CNN. The weights of the fully-trained Faster R-CNN (R) and Faster R-CNN (M) models are stored in the files PPED/experiment/trained_models/resNetFpn-model-19.pth and mobile-model.pth. The performance measurements of Faster R-CNN (R) and Faster R-CNN (M) are stored in the folders PPED/experiment/results/Faster RCNN(R) and Faster RCNN(M).
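For orientation, a condensed sketch of the resnet50+fpn fine-tuning setup described above, using the torchvision detection API (the dataset provides its own script, train_res50_fpn.py; the class count, hyperparameters, and data loading here are placeholders):

    import torch
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    # COCO-pretrained Faster R-CNN with a resnet50+fpn backbone.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

    # Replace the box predictor head for our own classes (+1 for background).
    num_classes = 1 + 6  # placeholder PPE class count
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4)
    # Build a VOC-style dataset/dataloader, then per batch:
    #   loss_dict = model(images, targets); sum(loss_dict.values()).backward(); optimizer.step()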
2.2.2. SSD
backbone: resnet50
We downloaded pre-training weights from https://download.pytorch.org/models/resnet50-19c8e357.pth.
The same training method as Faster R-CNN is applied.
The SSD source code used in our experiment is given in folder PPED/experiment/ssd. The weights of the fully-trained SSD model are stored in file PPED/experiment/trained_models/SSD_19.pth. The performance measurements of SSD are stored in folder PPED/experiment/results/SSD.
2.2.3. YOLOv3-spp
backbone: DarkNet53
We modified the type information of the XML file to match our application.
We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.
The weights used are: yolov3-spp-ultralytics-608.pt.
The YOLOv3-spp source code used in our experiment is given in folder PPED/experiment/YOLOv3-spp. The weights of the fully-trained YOLOv3-spp model are stored in file PPED/experiment/trained_models/YOLOvspp-19.pt. The performance measurements of YOLOv3-spp are stored in folder PPED/experiment/results/YOLOv3-spp.
2.2.4. YOLOv5
backbone: CSP_DarkNet
We modified the type information of the XML file to match our application.
We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.
The weights used are: yolov5s.
The YOLOv5 source code used in our experiment is given in folder PPED/experiment/yolov5. The weights of the fully-trained YOLOv5 model are stored in file PPED/experiment/trained_models/YOLOv5.pt. The performance measurements of YOLOv5 are stored in folder PPED/experiment/results/YOLOv5.
2.3. Evaluation
The computed evaluation metrics as well as the code needed to compute them from our dataset are provided in the folder PPED/experiment/eval.
Faster R-CNN (R and M)
official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/faster_rcnn.py
SSD
official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssd.py
YOLOv3-spp
YOLOv5
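For reference, detection metrics of this kind (AP/AR) can be computed with pycocotools along the following lines, assuming COCO-format ground-truth and detection JSON files (the file names are placeholders, not the dataset's own evaluation script):

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO("instances_test.json")         # ground-truth annotations (placeholder)
    coco_dt = coco_gt.loadRes("detections.json")  # model detections (placeholder)

    evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()  # prints the AP / AR table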
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Large ornithopod footprint classification validated by Díaz-Martínez et al. [3].
By Gove Allen [source]
The Paintings Dataset is a rich and diverse collection of various paintings from different artists spanning across multiple time periods. It includes a wide range of art styles, techniques, and subjects, providing an extensive resource for art enthusiasts, historians, researchers, and anyone interested in exploring the world of visual arts.
This dataset aims to capture the essence of artistic expression through its vast array of paintings. From classical masterpieces to contemporary works, it offers a comprehensive perspective on the evolution of artistic creativity throughout history.
Each record in this dataset represents an individual painting with detailed information such as artist's name, artwork title (if applicable), genre/style classification (e.g., landscape, portrait), medium (e.g., oil on canvas), dimensions (height and width), and provenance details if available. Additionally, some records may include additional metadata like the year or era in which the artwork was created.
By providing such comprehensive data about each painting included within this dataset, it enables users to study various aspects of art history. Researchers can analyze trends across different time periods or explore specific artistic movements by filtering the dataset based on genre or style categories. Art enthusiasts can also use this dataset to discover new artists or artworks that align with their interests.
This valuable collection appeals not only to those seeking knowledge or inspiration from renowned artworks but also encourages exploration into lesser-known pieces that may have been overlooked in mainstream discourse. It fosters engagement with cultural heritage while promoting diversity and inclusivity within the realm of visual arts.
Whether you are interested in studying classical works by universally acclaimed painters like Leonardo da Vinci or exploring modern expressions by emerging contemporary artists, this Paintings Dataset has something for everyone who appreciates aesthetics and enjoys unraveling stories through brushstrokes on canvas.
How to Use the Paintings Dataset
Welcome to the Paintings Dataset! This dataset is a comprehensive collection of various paintings from different artists and time periods. It contains information about the artist, title, genre, style, and medium of each painting. Whether you are an art enthusiast, researcher, or just curious about paintings, this guide will help you navigate through this dataset easily.
1. Understanding the Columns
This dataset consists of several columns that provide detailed information about each painting. Here is a brief description of each column:
- Artist: The name of the artist who created the painting.
- Title: The title or name given to the artwork by the artist.
- Genre: The artistic category or subject matter depicted in the painting.
- Style: The specific artistic style or movement associated with the painting.
- Medium: The materials and techniques used by the artist to create the artwork.
2. Exploring Artists and Their Paintings
One interesting way to use this dataset is to explore individual artists and their artworks. You can filter by a specific artist's name in order to retrieve all their paintings included in this collection.
For example: If you are interested in exploring all paintings by Leonardo da Vinci, simply filter on Leonardo da Vinci in the Artist column using your preferred data analysis tool.
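With pandas, for instance (assuming the dataset is provided as a CSV file; the file name is a placeholder):

    import pandas as pd

    paintings = pd.read_csv("paintings.csv")  # placeholder file name
    da_vinci = paintings[paintings["Artist"] == "Leonardo da Vinci"]
    print(da_vinci[["Title", "Genre", "Style", "Medium"]])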
3. Analyzing Painting Genres
The genre column allows you to analyze different categories within this collection of paintings. You can examine popular genres or compare them across different eras.
To analyze genres:
- Get unique values for the Genre column.
- Count the frequency of each genre value.
- Visualize the results using bar charts or other graphical representations.
You might discover which genres were more predominant during certain periods or which artists were known for specific subjects!
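Those steps translate directly into pandas, for example (column names as described above; the CSV file name is a placeholder):

    import matplotlib.pyplot as plt
    import pandas as pd

    paintings = pd.read_csv("paintings.csv")  # placeholder file name
    genre_counts = paintings["Genre"].value_counts()
    print(genre_counts.head(10))  # most common genres
    genre_counts.head(10).plot(kind="bar", title="Top painting genres")
    plt.show()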
4. Investigating Artistic Styles
Similar to genres, artistic styles also play an essential role in the world of painting. This dataset includes various styles like Impressionism, Cubism, Realism, etc. By analyzing the artistic styles column, you can explore trends and shifts in artistic movements.
To investigate styles:
- Get unique values for the Style column.
- Count the frequency of each style value.
- Visualize the results using bar charts or other graphical...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Geographic distribution of the large ornithopod dinosaur track data used in this study.
We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation. It comprises 60,000 images of artwork from 10 distinctive artistic styles, with 5,000 training images and 1,000 testing images per style. ArtBench-10 has several advantages over previous artwork datasets. Firstly, it is class-balanced, while most previous artwork datasets suffer from long-tail class distributions. Secondly, the images are of high quality with clean annotations. Thirdly, ArtBench-10 is created with standardized data collection, annotation, filtering, and preprocessing procedures. We provide three versions of the dataset with different resolutions (32×32, 256×256, and original image size), formatted in a way that is easy to incorporate into popular machine learning frameworks.
We introduce the Million Song Dataset, a freely-available collection of audio features and metadata for a million contemporary popular music tracks. We describe its creation process, its content, and its possible uses. Attractive features of the Million Song Database include the range of existing resources to which it is linked, and the fact that it is the largest current research dataset in our field. As an illustration, we present year prediction as an example application, a task that has, until now, been difficult to study owing to the absence of a large set of suitable data. We show positive results on year prediction, and discuss more generally the future development of the dataset.