99 datasets found
  1. Cu dataset – A copper ore labeled images dataset for segmentation training...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 16, 2021
    Cite
    Otávio da Fonseca Martins Gomes; Sidnei Paciornik; Michel Pedro Filippo; Gilson Alexandre Ostwald Pedro da Costa; Guilherme Lucio Abelha Mota (2021). Cu dataset – A copper ore labeled images dataset for segmentation training and testing [Dataset]. http://doi.org/10.5281/zenodo.5020566
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 16, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Otávio da Fonseca Martins Gomes; Sidnei Paciornik; Michel Pedro Filippo; Gilson Alexandre Ostwald Pedro da Costa; Guilherme Lucio Abelha Mota
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is composed of 121 pairs of correlated images. Each pair contains one image of a copper ore sample acquired through reflected light microscopy (RGB, 24-bit), and the corresponding binary reference image (8-bit), in which the pixels are labeled as belonging to one of two classes: ore (0) or embedding resin (255).
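
    As a quick orientation, each pair can be loaded and checked in a few lines of Python. A minimal sketch follows, with placeholder file names, since the archive's exact naming scheme is not listed here:

    # Minimal sketch: load one RGB/reference pair and compute class fractions.
    # File names are placeholders; adjust them to the archive's actual layout.
    import numpy as np
    from PIL import Image

    rgb = np.asarray(Image.open("cu_pair_001_rgb.png"))  # reflected light image (RGB, 24-bit)
    ref = np.asarray(Image.open("cu_pair_001_ref.png"))  # binary reference image (8-bit)

    ore = ref == 0      # pixels labeled as ore
    resin = ref == 255  # pixels labeled as embedding resin
    print(f"ore: {ore.mean():.1%}, resin: {resin.mean():.1%}")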

    The sample came from a copper ore from Yauri Cusco (Peru) with a complex mineralogy, mainly composed of sulfides, oxides, silicates, and native copper. It was classified by size. The fraction +74-100 μm was cold mounted with epoxy resin and subsequently ground and polished.

    Correlative microscopy was employed for image acquisition. Thus, 121 fields were imaged on a reflected light microscope with a 20× (NA 0.40) objective lens and on a scanning electron microscope (SEM). The image pairs were then registered, resulting in images of 1017×753 pixels with a resolution of 0.53 µm/pixel. Some images (No. 2, 3, 24, 25, 46, 47, 69, 91, and 113) are slightly smaller because they were cropped during the registration procedure to correct co-localization errors of the order of a few pixels. Finally, the images from SEM were thresholded to generate the reference images.

    Further description of this sample and its imaging procedure can be found in the work by Gomes and Paciornik (2012).

    This dataset was created for developing and testing deep learning models on semantic segmentation tasks. The paper of Filippo et al. (2021) presented a variant of the DeepLabv3+ model (Chen et al., 2018) that reached mean values of 90.56% and 92.12% for overall accuracy and F1 score, respectively, for 5 rounds of experiments (training and testing), each with a different, random initialization of network weights.

    For further questions and suggestions, please do not hesitate to contact us.

    Contact email: ogomes@gmail.com

    If you use this dataset in your own work, please cite this DOI: 10.5281/zenodo.5020566

    Please also cite this paper, which provides additional details about the dataset:

    Michel Pedro Filippo, Otávio da Fonseca Martins Gomes, Gilson Alexandre Ostwald Pedro da Costa, Guilherme Lucio Abelha Mota. Deep learning semantic segmentation of opaque and non-opaque minerals from epoxy resin in reflected light microscopy images. Minerals Engineering, Volume 170, 2021, 107007, https://doi.org/10.1016/j.mineng.2021.107007.

  2. visuAAL Skin Segmentation Dataset

    • observatorio-cientifico.ua.es
    • data.niaid.nih.gov
    • +2 more
    Updated 2022
    Cite
    Hashemifard, Kooshan; Florez-Revuelta, Francisco (2022). visuAAL Skin Segmentation Dataset [Dataset]. https://observatorio-cientifico.ua.es/documentos/668fc45eb9e7c03b01bdb3c6
    Explore at:
    Dataset updated
    2022
    Authors
    Hashemifard, Kooshan; Florez-Revuelta, Francisco
    Description

    The visuAAL Skin Segmentation Dataset contains 46,775 high-quality images divided into a training set with 45,623 images and a validation set with 1,152 images. Skin areas have been obtained automatically from the FashionPedia garment dataset. The process to extract the skin areas is explained in detail in the paper 'From Garment to Skin: The visuAAL Skin Segmentation Dataset'. If you use the visuAAL Skin Segmentation Dataset, please cite: https://doi.org/10.5281/zenodo.6973396 and https://doi.org/10.1007/978-3-031-13321-3_6

    How to use (a pairing sketch follows below):

    1. Download the FashionPedia dataset from https://fashionpedia.github.io/home/Fashionpedia_download.html
    2. Download the visuAAL Skin Segmentation Dataset. It consists of two folders, train_masks and val_masks, corresponding to the training and validation sets of the original FashionPedia dataset.
    3. After extracting the images from FashionPedia, for each image in the visuAAL Skin Segmentation Dataset, the original image can be found under the same name (file_name in the annotations file).

    A sample of image data in the FashionPedia dataset is:

    {'id': 12305, 'width': 680, 'height': 1024, 'file_name': '064c8022b32931e787260d81ed5aafe8.jpg', 'license': 4, 'time_captured': 'March-August, 2018', 'original_url': 'https://farm2.staticflickr.com/1936/8607950470_9d9d76ced7_o.jpg', 'isstatic': 1, 'kaggle_id': '064c8022b32931e787260d81ed5aafe8'}

    NOTE: Not all images in the FashionPedia dataset have a corresponding skin mask in the visuAAL Skin Segmentation Dataset, as some images show only garment parts and no people; these were removed when creating the visuAAL Skin Segmentation Dataset. However, every instance in the visuAAL Skin Segmentation Dataset has its corresponding match in the FashionPedia dataset.
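
    Under those assumptions, the pairing step can be scripted directly. A minimal sketch, where the annotation file name is assumed from the public FashionPedia release:

    # Sketch: pair each visuAAL skin mask with its FashionPedia source image
    # via file_name. Paths and the annotation file name are assumptions;
    # adjust them to your local layout.
    import json
    from pathlib import Path

    with open("instances_attributes_train2020.json") as f:
        fashionpedia = json.load(f)

    mask_dir = Path("train_masks")
    pairs = [
        (info["file_name"], mask_dir / info["file_name"])
        for info in fashionpedia["images"]
        if (mask_dir / info["file_name"]).exists()  # not every image has a skin mask
    ]
    print(f"{len(pairs)} image/mask pairs found")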

  3. FeM dataset – An iron ore labeled images dataset for segmentation training...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    Updated Jul 16, 2021
    Cite
    Gomes, Otávio da Fonseca Martins; Paciornik, Sidnei; Filippo, Michel Pedro; da Costa, Gilson Alexandre Ostwald Pedro; Mota, Guilherme Lucio Abelha (2021). FeM dataset – An iron ore labeled images dataset for segmentation training and testing [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5014699
    Explore at:
    Dataset updated
    Jul 16, 2021
    Dataset provided by
    Dept. of Chemical and Materials Engineering, PUC-Rio
    Dept. of Informatics and Computer Science, Rio de Janeiro State University (UERJ)
    Centre for Mineral Technology
    Postgraduate Program in Computational Sciences, Rio de Janeiro State University (UERJ)
    Authors
    Gomes, Otávio da Fonseca Martins; Paciornik, Sidnei; Filippo, Michel Pedro; da Costa, Gilson Alexandre Ostwald Pedro; Mota, Guilherme Lucio Abelha
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is composed of 81 pairs of correlated images. Each pair contains one image of an iron ore sample acquired through reflected light microscopy (RGB, 24-bit), and the corresponding binary reference image (8-bit), in which the pixels are labeled as belonging to one of two classes: ore (0) or embedding resin (255).

    The sample came from an itabiritic iron ore concentrate from Quadrilátero Ferrífero (Brazil) mainly composed of hematite and quartz, with little magnetite and goethite. It was classified by size and concentrated with a dense liquid. Then, the fraction -149+105 μm with density greater than 3.2 was cold mounted with epoxy resin and subsequently ground and polished.

    Correlative microscopy was employed for image acquisition. Thus, 81 fields were imaged on a reflected light microscope with a 10× (NA 0.20) objective lens and on a scanning electron microscope (SEM). The image pairs were then registered, resulting in images of 999×756 pixels with a resolution of 1.05 µm/pixel. Finally, the images from SEM were thresholded to generate the reference images.

    Further description of this sample and its imaging procedure can be found in the work by Gomes and Paciornik (2012).

    This dataset was created for developing and testing deep learning models on semantic segmentation tasks. The paper of Filippo et al. (2021) presented a variant of the DeepLabv3+ model that reached mean values of 91.43% and 93.13% for overall accuracy and F1 score, respectively, for 5 rounds of experiments (training and testing), each with a different, random initialization of network weights.

    For further questions and suggestions, please do not hesitate to contact us.

    Contact email: ogomes@gmail.com

    If you use this dataset in your own work, please cite this DOI: 10.5281/zenodo.5014700

    Please also cite this paper, which provides additional details about the dataset:

    Michel Pedro Filippo, Otávio da Fonseca Martins Gomes, Gilson Alexandre Ostwald Pedro da Costa, Guilherme Lucio Abelha Mota. Deep learning semantic segmentation of opaque and non-opaque minerals from epoxy resin in reflected light microscopy images. Minerals Engineering, Volume 170, 2021, 107007, https://doi.org/10.1016/j.mineng.2021.107007.

  4. Annotated Ultrasound Liver images Dataset

    • kaggle.com
    zip
    Updated Apr 2, 2025
    Cite
    Orvile (2025). Annotated Ultrasound Liver images Dataset [Dataset]. https://www.kaggle.com/datasets/orvile/annotated-ultrasound-liver-images-dataset
    Explore at:
    Available download formats: zip (70388588 bytes)
    Dataset updated
    Apr 2, 2025
    Authors
    Orvile
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Annotated Ultrasound Liver Images

    This dataset contains a collection of annotated ultrasound images of the liver, designed to aid in the development of computer vision models for liver analysis, segmentation, and disease detection. The annotations include outlines of the liver and liver mass regions, as well as classifications into benign, malignant, and normal cases.

    Creators: Xu Yiming, Zheng Bowen, Liu Xiaohong, Wu Tao, Ju Jinxiu, Wang Shijie, Lian Yufan, Zhang Hongjun, Liang Tong, Sang Ye, Jiang Rui, Wang Guangyu, Ren Jie, Chen Ting

    Published: November 2, 2022
    Version: v1
    DOI: 10.5281/zenodo.7272660

    Dataset Overview

    This dataset provides ultrasound images of the liver with detailed annotations. The annotations highlight the liver itself and any liver mass regions present. The images are categorized into three classes:

    • Benign: Images showing benign liver conditions.
    • Malignant: Images showing malignant liver conditions.
    • Normal: Images of healthy livers.

    Files Included

    The dataset is organized into three zip files (a checksum-verification sketch follows the list):

    • Benign.zip (16.9 MB): Contains ultrasound images classified as benign. (md5: c37fef0cb2730236a79ef57e5315995e)
    • Malignant.zip (46.9 MB): Contains ultrasound images classified as malignant. (md5: 63894a9e5654a69c3b94bda84071dfb0)
    • Normal.zip (6.6 MB): Contains ultrasound images of normal livers. (md5: a7e16299b2cf12ca4a6c3468d2e4978f)
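
    Before unpacking, the downloads can be verified against the md5 sums above. A minimal sketch, assuming the three archives sit in the working directory:

    # Verify the downloaded archives against the published md5 checksums.
    import hashlib

    expected = {
        "Benign.zip": "c37fef0cb2730236a79ef57e5315995e",
        "Malignant.zip": "63894a9e5654a69c3b94bda84071dfb0",
        "Normal.zip": "a7e16299b2cf12ca4a6c3468d2e4978f",
    }
    for name, md5 in expected.items():
        with open(name, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        print(name, "OK" if digest == md5 else f"MISMATCH: {digest}")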

    Annotations

    The ultrasound images have been annotated to show:

    • Outlines of the liver.
    • Regions of liver masses (where applicable).

    These annotations make the dataset suitable for tasks such as segmentation of the liver and liver masses, as well as classification of liver conditions.

    Potential Uses

    This dataset can be valuable for a variety of applications, including:

    • Training and evaluating deep learning models for liver disease detection.
    • Developing algorithms for automatic segmentation of the liver and liver masses in ultrasound images.
    • Research in medical image analysis and computer-aided diagnosis.
    • Educational purposes in medical imaging and related fields.

    Copyright and Citation

    This dataset is subject to copyright. Any use of the data must include appropriate acknowledgement and credit. Please contact the authors of the published data and cite the publication and the provided URL.

    Citation:

    Xu Yiming, Zheng Bowen, Liu Xiaohong, Wu Tao, Ju Jinxiu, Wang Shijie, Lian Yufan, Zhang Hongjun, Liang Tong, Sang Ye, Jiang Rui, Wang Guangyu, Ren Jie, & Chen Ting. (2022). Annotated Ultrasound Liver images [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7272660

    APA Style Citation:

    Xu, Y., Bowen, Z., Xiaohong, L., Tao, W., Jinxiu, J., Shijie, W., Yufan, L., Hongjun, Z., Tong, L., Ye, S., Rui, J., Guangyu, W., Jie, R., & Ting, C. (2022). Annotated Ultrasound Liver images [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7272660

    License

    Creative Commons Attribution 4.0 International

    We hope this dataset is helpful for your research and projects!

    🙏 If you find this dataset useful, please consider giving it an upvote! 👍 Thank you! 😊

  5. CODEBRIM: COncrete DEfect BRidge IMage Dataset

    • zenodo.org
    • data-staging.niaid.nih.gov
    • +2 more
    bin, zip
    Updated Jan 24, 2020
    Cite
    Martin Mundt; Sagnik Majumder; Sreenivas Murali; Panagiotis Panetsos; Visvanathan Ramesh (2020). CODEBRIM: COncrete DEfect BRidge IMage Dataset [Dataset]. http://doi.org/10.5281/zenodo.2620293
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Martin Mundt; Sagnik Majumder; Sreenivas Murali; Panagiotis Panetsos; Visvanathan Ramesh
    Description

    CODEBRIM: COncrete DEfect BRidge IMage Dataset for multi-target multi-class concrete defect classification in computer vision and machine learning.

    Dataset as presented and detailed in our CVPR 2019 publication: http://openaccess.thecvf.com/content_CVPR_2019/html/Mundt_Meta-Learning_Convolutional_Neural_Architectures_for_Multi-Target_Concrete_Defect_Classification_With_CVPR_2019_paper.html or https://arxiv.org/abs/1904.08486. If you make use of the dataset, please cite it as follows:

    "Martin Mundt, Sagnik Majumder, Sreenivas Murali, Panagiotis Panetsos, Visvanathan Ramesh. Meta-learning Convolutional Neural Architectures for Multi-target Concrete Defect Classification with the COncrete DEfect BRidge IMage Dataset. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019"

    We offer a supplementary GitHub repository with code to reproduce the paper and data loaders: https://github.com/ccc-frankfurt/meta-learning-CODEBRIM

    For ease of use we provide the dataset in multiple different versions.

    Files contained:
    * CODEBRIM_original_images: contains the original full-resolution images and bounding box annotations
    * CODEBRIM_cropped_dataset: contains the extracted crops/patches with corresponding class labels from the bounding boxes
    * CODEBRIM_classification_dataset: contains the cropped patches with corresponding class labels split into training, validation and test sets for machine learning
    * CODEBRIM_classification_balanced_dataset: similar to "CODEBRIM_classification_dataset" but with the exact replication of training images to balance the dataset in order to reproduce results obtained in the paper.

  6. Mars orbital image (HiRISE) labeled data set

    • data-staging.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    You Lu; Kiri Wagstaff (2020). Mars orbital image (HiRISE) labeled data set [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_1048300
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Jet Propulsion Laboratory
    Authors
    You Lu; Kiri Wagstaff
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This data set contains 3820 landmarks that were extracted from 168 HiRISE images. The landmarks were detected in HiRISE browse images. For each landmark, we cropped a square bounding box that included the full extent of the landmark plus a 30-pixel margin to the left, right, top, and bottom. Each cropped image was then resized to 227x227 pixels (a sketch of this step follows below).
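
    A sketch of that crop-and-resize step, where the bounding box (x0, y0, x1, y1) stands in for output from the landmark detector:

    # Square crop covering the landmark plus a 30-pixel margin, resized to 227x227.
    import numpy as np
    from PIL import Image

    MARGIN, OUT = 30, 227

    def crop_landmark(img, x0, y0, x1, y1):
        half = (max(x1 - x0, y1 - y0) + 2 * MARGIN) // 2  # half-side of the square box
        cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
        crop = img[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
        return np.asarray(Image.fromarray(crop).resize((OUT, OUT)))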

    Contents:

    map-proj/: Directory containing individual cropped landmark images

    labels-map-proj.txt: Class labels (ids) for each landmark image

    landmark_mp.py: Python dictionary that maps class ids to semantic names

    Attribution:

    If you use this data set in your own work, please cite this DOI: 10.5281/zenodo.1048301

    Please also cite this paper, which provides additional details about the data set.

    Kiri L. Wagstaff, You Lu, Alice Stanboli, Kevin Grimes, Thamme Gowda, and Jordan Padams. "Deep Mars: CNN Classification of Mars Imagery for the PDS Imaging Atlas." Proceedings of the Thirtieth Annual Conference on Innovative Applications of Artificial Intelligence, 2018.

  7. 🪙 Coin Image Dataset

    • kaggle.com
    zip
    Updated Sep 11, 2023
    Cite
    mexwell (2023). 🪙 Coin Image Dataset [Dataset]. https://www.kaggle.com/datasets/mexwell/coin-image-dataset
    Explore at:
    Available download formats: zip (342484543 bytes)
    Dataset updated
    Sep 11, 2023
    Authors
    mexwell
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The coin image dataset is a dataset of 60 classes of Roman Republican coins. Each class is represented by three coin images of the reverse side, acquired at the Coin Cabinet of the Museum of Fine Arts in Vienna, Austria.

    Technical Details

    The image filenames have the following syntax: class[classid]_image[1-3].png. The dataset also contains a CSV file, "classes.csv", which maps the class IDs to the reference numbers defined by Crawford's standard reference book [2]. A parsing sketch follows below.
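
    A minimal parsing sketch; the CSV's column order (class ID, then Crawford reference) is an assumption, since only its purpose is stated above:

    # Parse the filename syntax and join it with classes.csv.
    import csv
    import re
    from pathlib import Path

    pattern = re.compile(r"class(\d+)_image([123])\.png")

    with open("classes.csv", newline="") as f:
        class_map = {row[0]: row[1] for row in csv.reader(f)}  # class ID -> Crawford no. (assumed order)

    for path in sorted(Path(".").glob("class*_image*.png")):
        m = pattern.fullmatch(path.name)
        if m:
            print(path.name, "->", class_map.get(m.group(1), "unknown"))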

    References

    [1] Zambanini S., Kampel M. "Coarse-to-Fine Correspondence Search for Classifying Ancient Coins", 2nd ACCV Workshop on e-Heritage, pp. 25-36, Daejeon, South Korea, November 2012.
    [2] Crawford, M.H.: "Roman Republican Coinage", 2 vols., Cambridge University Press, 1974.

    Citation

    Sebastian Zambanini. (2014). Coin Image Dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4454549

    Original Data

    Acknowledgement

    Photo by Priyansh Patidar on Unsplash

  8. Data from: On the Role of Images for Analyzing Claims in Social Media

    • data.europa.eu
    unknown
    Updated Mar 9, 2021
    Cite
    Zenodo (2021). On the Role of Images for Analyzing Claims in Social Media [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4592249?locale=de
    Explore at:
    Available download formats: unknown
    Dataset updated
    Mar 9, 2021
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    Description

    This is a multimodal dataset used in the paper "On the Role of Images for Analyzing Claims in Social Media", accepted at CLEOPATRA-2021 (2nd International Workshop on Cross-lingual Event-centric Open Analytics), co-located with The Web Conference 2021. The four datasets are curated for two different tasks that broadly come under fake news detection. Originally, the datasets were released as part of challenges or papers for text-based NLP tasks and are further extended here with corresponding images.

    1. clef_en and clef_ar are English and Arabic Twitter datasets for claim check-worthiness detection released in CLEF CheckThat! 2020 by Barrón-Cedeño et al. [1].
    2. lesa is an English Twitter dataset for claim detection released by Gupta et al. [2].
    3. mediaeval is an English Twitter dataset for conspiracy detection released in the MediaEval 2020 Workshop by Pogorelov et al. [3].

    The dataset details, such as the data curation and annotation process, can be found in the cited papers. The datasets released here with corresponding images are relatively smaller than the original text-based tweet datasets. The data statistics are as follows:

    1. clef_en: 281
    2. clef_ar: 2571
    3. lesa: 1395
    4. mediaeval: 1724

    Each folder has two sub-folders and a JSON file data.json that contains the crawled tweets. The two sub-folders are:

    1. images: contains the crawled images, each with the same name as its tweet-id in data.json.
    2. splits: contains the 5-fold splits used for training and evaluation in our paper. Each file in this folder is a CSV with two columns

  9. Pre-processed (in Detectron2 and YOLO format) planetary images and boulder...

    • data.europa.eu
    • data-staging.niaid.nih.gov
    • +1 more
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). Pre-processed (in Detectron2 and YOLO format) planetary images and boulder labels collected during the BOULDERING Marie Skłodowska-Curie Global fellowship [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-14250874?locale=no
    Explore at:
    Available download formats: unknown (601409488)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodohttp://zenodo.org/
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database contains 4976 planetary images of boulder fields located on Earth, Mars and the Moon. The data was collected during the BOULDERING Marie Skłodowska-Curie Global fellowship between October 2021 and 2024. The data is already split into train, validation and test datasets, but feel free to re-organize the labels at your convenience. For each image, all of the boulder outlines within the image were carefully mapped in QGIS. More information about the labelling procedure can be found in the following manuscript (https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2023JE008013).

    This dataset differs from the previous dataset included along with the manuscript (https://zenodo.org/records/8171052), as it contains more mapped images, especially of boulder populations around young impact structures on the Moon (cold spots). In addition, the boulder outlines were pre-processed so that they can be ingested directly in YOLOv8. A description of each file is given in the README.txt file (including how to load the custom datasets in Detectron2 and YOLO). Most of the other files are self-explanatory. Please see the previous dataset or the manuscript for more information.

    If you want more information about specific lunar and martian planetary images, the IDs of the images are still available in the file names. Use this ID to find more information (e.g., for M121118602_00875_image.png, the ID M121118602 can be used on https://pilot.wr.usgs.gov/). I will also upload the raw data from which this pre-processed dataset was generated (see https://zenodo.org/records/14250970).

    Thanks to this database, you can easily train Detectron2 Mask R-CNN or YOLO instance segmentation models to automatically detect boulders; a minimal training sketch follows below.

    How to cite: Please refer to the "how to cite" section of the readme file of https://github.com/astroNils/YOLOv8-BeyondEarth.

    Structure:

    .
    └── boulder2024/
        ├── jupyter-notebooks/
        │   └── REGISTERING_BOULDER_DATASET_IN_DETECTRON2.ipynb
        ├── test/
        │   └── images/
        │       ├──
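
    Since the labels are provided in YOLO format, a minimal training sketch with the Ultralytics YOLOv8 API might look as follows; "boulder.yaml" is a placeholder for a dataset config pointing at the train/validation/test folders above, not a file shipped with the dataset:

    # Train a YOLOv8 instance-segmentation model on the boulder labels.
    from ultralytics import YOLO

    model = YOLO("yolov8n-seg.pt")  # pretrained segmentation checkpoint
    model.train(data="boulder.yaml", epochs=100, imgsz=640)
    metrics = model.val()  # evaluate on the validation split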

  10. Data from: Ichthyofauna (Osteichthyes, Actinopterygii) from tributaries of...

    • demo.gbif.org
    Updated Oct 3, 2025
    Cite
    BioFresh (2025). Ichthyofauna (Osteichthyes, Actinopterygii) from tributaries of the Beni and Mamoré rivers in the Llanos de Moxos wetland of the Bolivian Amazon [Dataset]. http://doi.org/10.15468/sjkfca
    Explore at:
    Dataset updated
    Oct 3, 2025
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    BioFresh
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Sep 16, 2023 - Sep 30, 2023
    Area covered
    Description

    This dataset is an update to Supplemental Material Table S1 from the paper: Yunoki T, Echeverria AR, Cholima RB, Miranda Ch. G, Moreno FA (2025). Ichthyofauna (Osteichthyes, Actinopterygii) from tributaries of the Beni and Mamoré rivers in the Llanos de Moxos wetland of the Bolivian Amazon. Check List 21: 318–346.

    Specimen images associated with the occurrence records have been deposited in Zenodo across several archived sets. Each image reference in the associatedMedia field includes:

    A direct URL linking to the individual image file hosted on Zenodo (e.g., https://zenodo.org/records/.../files/image.JPG)

    The DOI of the complete dataset in which the image is archived (e.g., https://doi.org/...)

    This format ensures both persistent citation via dataset DOIs and straightforward access to individual images. The current version improves traceability between image files and their corresponding occurrence records.

    Image sets are available at the following DOIs:

    https://doi.org/10.5281/zenodo.15748915

    https://doi.org/10.5281/zenodo.15749855

    https://doi.org/10.5281/zenodo.15750205

    https://doi.org/10.5281/zenodo.15750430

    https://doi.org/10.5281/zenodo.15750726

    https://doi.org/10.5281/zenodo.15754796

    https://doi.org/10.5281/zenodo.15755252

    https://doi.org/10.5281/zenodo.15756212

    https://doi.org/10.5281/zenodo.15756460

  11. The Klarna Product-Page Dataset

    • researchdata.se
    • demo.researchdata.se
    • +2 more
    Updated Nov 7, 2024
    + more versions
    Cite
    Alexandra Hotti; Riccardo Sven Risuleo; Stefan Magureanu; Aref Moradi; Jens Lagergren (2024). The Klarna Product-Page Dataset [Dataset]. http://doi.org/10.5281/zenodo.12605480
    Explore at:
    Dataset updated
    Nov 7, 2024
    Dataset provided by
    KTH Royal Institute of Technology
    Authors
    Alexandra Hotti; Riccardo Sven Risuleo; Stefan Magureanu; Aref Moradi; Jens Lagergren
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The Klarna Product Page Dataset is a dataset of publicly available pages corresponding to products sold online on various e-commerce websites. The dataset contains offline snapshots of 51,701 product pages collected from 8,175 distinct merchants across 8 different markets (US, GB, SE, NL, FI, NO, DE, AT) between 2018 and 2019. On each page, analysts labelled 5 elements of interest: the price of the product, its image, its name and the add-to-cart and go-to-cart buttons (if found). These labels are present in the HTML code as an attribute called klarna-ai-label taking one of the values: Price, Name, Main picture, Add to cart and Cart.
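
    Because the labels live in the HTML itself, they can be extracted with any HTML parser. A minimal sketch using BeautifulSoup on one snapshot's rendered HTML (unpacking the MHTML container is omitted, and the path is a placeholder):

    # List every labelled element in a snapshot's HTML.
    from bs4 import BeautifulSoup

    with open("product_page.html", encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")

    for el in soup.find_all(attrs={"klarna-ai-label": True}):
        print(el["klarna-ai-label"], "->", el.name)  # e.g. "Price -> span"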

    The snapshots are available in 3 formats: as MHTML files (~24GB), as WebTraversalLibrary (WTL) snapshots (~7.4GB), and as screenshots (~8.9GB). The MHTML format is the least lossy: a browser can render these pages, though any JavaScript on the page is lost. The WTL snapshots are produced by loading the MHTML pages into a Chromium-based browser. To keep the WTL dataset compact, the screenshots of the rendered MHTML are provided separately; here we provide the HTML of the rendered DOM tree and additional page and element metadata with rendering information (bounding boxes of elements, font sizes, etc.). The folder structure of the screenshot dataset is identical to that of the WTL dataset and can be used to complete the WTL snapshots with image information. For convenience, the datasets are provided with a train/test split in which no merchants in the test set are present in the training set.

    Corresponding Publication

    For more information about the contents of the datasets (statistics etc.), please refer to the TMLR paper cited below.

    GitHub Repository

    The code needed to re-run the experiments in the publication accompanying the dataset can be accessed here.

    Citing

    If you found this dataset useful in your research, please cite the paper as follows:

    @article{hotti2024the, title={The Klarna Product Page Dataset: Web Element Nomination with Graph Neural Networks and Large Language Models}, author={Alexandra Hotti and Riccardo Sven Risuleo and Stefan Magureanu and Aref Moradi and Jens Lagergren}, journal={Transactions on Machine Learning Research}, issn={2835-8856}, year={2024}, url={https://openreview.net/forum?id=zz6FesdDbB}, note={} }

  12. Data from: Computational 3D resolution enhancement for optical coherence...

    • data.niaid.nih.gov
    Updated Jul 12, 2024
    Cite
    Jos de Wit; George-Othon Glentis; Jeroen Kalkman (2024). Computational 3D resolution enhancement for optical coherence tomography with a narrowband visible light source [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7870794
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    University of Peloponnese, Tripolis, Greece
    Delft University of Technology, Delft, The Netherlands
    Authors
    Jos de Wit; George-Othon Glentis; Jeroen Kalkman
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the code and data underlying the publication "Computational 3D resolution enhancement for optical coherence tomography with a narrowband visible light source" in Biomedical Optics Express 14, 3532-3554 (2023) (doi.org/10.1364/BOE.487345).

    The reader is free to use the scripts and data in this repository, as long as the manuscript is correctly cited in their work. For further questions, please contact the corresponding author.

    Description of the code and datasets

    Table 1 describes all the Matlab and Python scripts in this repository. Table 2 describes the datasets. The input datasets are the phase-corrected datasets, as the raw data is large and phase correction using a coverslip as reference is rather straightforward. Processed datasets are also included in the repository to allow running only a limited number of scripts, or to obtain, for example, the aberration-corrected data without the need to use Python. Note that the simulation input data (input_simulations_pointscatters_SLDshape_98zf_noise75.mat) was generated with random noise, so if it is overwritten the results may vary slightly. Likewise, the aberration correction is done with random apertures, so the processed aberration-corrected data (exp_pointscat_image_MIAA_ISAM_CAO.mat and exp_leaf_image_MIAA_ISAM_CAO.mat) will also change slightly if the aberration correction script is run anew. The current processed datasets are the basis for the figures in the publication. For details on the implementation we refer to the publication.

    Table 1: The Matlab and Python scripts with their description

    MIAA_ISAM_processing.m — Performs the DFT, RFIAA and MIAA processing of the phase-corrected data that can be loaded from the datasets. Afterwards it applies ISAM to the DFT and MIAA data and plots the results in a figure (via the scripts plot_figure3, plot_figure5 and plot_simulationdatafigure).

    resolution_analysis_figure4.m — Loads the data from the point scatterers (absolute amplitude data), locates the point scatterers and fits them to obtain the resolution data. Finally it plots figure 4 of the publication.

    fiaa_oct_c1.m, oct_iaa_c1.m, rec_fiaa_oct_c1.m, rfiaa_oct_c1.m — Four functions used to apply fast IAA and MIAA. See the script MIAA_ISAM_processing.m for their usage.

    viridis.m, morgenstemning.m — Define the colormaps for the figures.

    plot_figure3.m, plot_figure5.m, plot_simulationdatafigure.m — Plot figures 3 and 5 and a figure with simulation data. They are executed at the end of the script MIAA_ISAM_processing.m.

    computational_adaptive_optics_script.py — Python script that applies computational adaptive optics to obtain the data for figure 6 of the manuscript.

    zernike_functions2.py — Python script that gives the values and Cartesian derivatives of the Zernike polynomials.

    figure6_ComputationalAdaptiveOptics.m — Loads the CAO data that was saved in Python, analyzes the resolution, and plots figure 6.

    OCTsimulations_3D_script2.py — Python script that simulates OCT data, adds noise and saves it as a .mat file for use in the Matlab scripts above.

    OCTsimulations2.py — Module containing a Python class that can be used to simulate 3D OCT datasets based on a Gaussian beam.

    Matlab toolbox DIPimage 2.9.zip — DIPimage is used in the scripts. The toolbox can be downloaded online or this zip can be used.

    Table 2: The datasets in this Zenodo repository

    input_leafdisc_phasecorrected.mat — Phase-corrected input image of the leaf disc (used in figure 5).

    input_TiO2gelatin_004_phasecorrected.mat — Phase-corrected input image of the TiO2-in-gelatin sample.

    input_simulations_pointscatters_SLDshape_98zf_noise75.mat — Input simulation data that, once processed, is used in figure 4.

    exp_pointscat_image_DFT.mat, exp_pointscat_image_DFT_ISAM.mat, exp_pointscat_image_RFIAA.mat, exp_pointscat_image_MIAA_ISAM.mat, exp_pointscat_image_MIAA_ISAM_CAO.mat — Processed experimental amplitude data for the TiO2 point scattering sample with, respectively, DFT, DFT+ISAM, RFIAA, MIAA+ISAM and MIAA+ISAM+CAO. These datasets are used for fitting in figure 4 (except for CAO); MIAA_ISAM and MIAA_ISAM_CAO are used for figure 6.

    simu_pointscat_image_DFT.mat, simu_pointscat_image_RFIAA.mat, simu_pointscat_image_DFT_ISAM.mat, simu_pointscat_image_MIAA_ISAM.mat — Processed amplitude data from the simulation dataset, used in the script for figure 4 for the resolution analysis.

    exp_leaf_image_MIAA_ISAM.mat, exp_leaf_image_MIAA_ISAM_CAO.mat — Processed amplitude data from the leaf sample, with and without aberration correction, used to produce figure 6.

    exp_leaf_zernike_coefficients_CAO_normal_wmaf.mat, exp_pointscat_zernike_coefficients_CAO_normal_wmaf.mat — Estimated Zernike coefficients and their weighted moving average, used for the computational aberration correction. Some of this data is plotted in figure 6 of the manuscript.

    input_zernike_modes.mat — The reference Zernike modes corresponding to the loaded data, used to give the modes their proper names.

    exp_pointscat_MIAA_ISAM_complex.mat, exp_leaf_MIAA_ISAM_complex.mat — Complex MIAA+ISAM processed data used as input for the computational aberration correction.
    
  13. Aerial Water Buoys Dataset

    • data.europa.eu
    • zenodo.org
    unknown
    Updated Jul 3, 2025
    + more versions
    Cite
    Zenodo (2025). Aerial Water Buoys Dataset [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-7288444?locale=hr
    Explore at:
    Available download formats: unknown (4)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodohttp://zenodo.org/
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Aerial Water Buoys Dataset: Over the past few years, a plethora of advancements in Unmanned Aerial Vehicle (UAV) technologies have made advanced UAV-based search and rescue operations possible, with transformative impact on the outcome of critical life-saving missions. This dataset aims to help with the challenging task of multi-castaway tracking and following using a single UAV. Due to the difficulty and data-protection constraints of capturing footage of people in the sea, we captured a dataset of buoys in order to conduct experiments on multi-castaway tracking and following. A paper on the technical details and experiments of multi-castaway tracking and following using this dataset will be published soon.

    The dataset consists of top-view images of buoys taken from various altitudes on the coasts of Larnaca and Protaras in Cyprus. Images were captured at different altitudes in order to challenge object detectors to detect smaller objects, for the case where a UAV tracks multiple targets and therefore flies at a higher altitude. A single class, labeled 'buoy', is annotated on all images. Additionally, all annotations were converted into VOC and COCO formats for training in numerous frameworks. The dataset consists of the following images and detection objects (buoys):

    Subset      Images  Buoys
    Training    10814   14811
    Validation  1350    1865
    Testing     1352    1827

    It is advised to further enhance the dataset by probabilistically applying random augmentations to each image before adding it to the training batch (a sketch follows below). Possible transformations include geometric ones (rotations, translations, horizontal-axis mirroring, cropping, and zooming) as well as image manipulations (illumination changes, color shifting, blurring, sharpening, and shadowing).

    NOTE: If you use this dataset in your research/publication, please cite us as follows: Antreas Anastasiou, Rafael Makrigiorgis, & Panayiotis Kolios. (2022). Aerial Water Buoys Dataset (1.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7288444
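
    One possible implementation of that augmentation advice, using the albumentations library (the library choice and probabilities are ours, not prescribed by the dataset); bounding boxes are transformed together with the image so the buoy labels stay aligned:

    # On-the-fly augmentation pipeline for images with COCO-format boxes.
    import albumentations as A

    transform = A.Compose(
        [
            A.HorizontalFlip(p=0.5),            # horizontal-axis mirroring
            A.ShiftScaleRotate(p=0.5),          # translations, zooming, rotations
            A.RandomBrightnessContrast(p=0.5),  # illumination changes
            A.Blur(p=0.2),                      # blurring
        ],
        bbox_params=A.BboxParams(format="coco", label_fields=["labels"]),
    )
    # Per sample: out = transform(image=image, bboxes=bboxes, labels=labels)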

  14. Seatizen Atlas image dataset

    • data.niaid.nih.gov
    Updated Jan 15, 2025
    Cite
    Matteo Contini; Julien Barde; Sylvain Bonhommeau; Victor Illien; Alexis Joly (2025). Seatizen Atlas image dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12819156
    Explore at:
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    UMR Marbec, IRD, France
    INRIA Zenith, Montpellier, France
    Ifremer DOI, La Réunion, France
    Authors
    Matteo Contini; Julien Barde; Sylvain Bonhommeau; Victor Illien; Alexis Joly
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Seatizen Atlas image dataset

    This repository contains the resources and tools for accessing and utilizing the annotated images within the Seatizen Atlas dataset, as described in the paper Seatizen Atlas: a collaborative dataset of underwater and aerial marine imagery.

    Download the Dataset

    This annotated dataset is part of a bigger dataset composed of labeled and unlabeled images. To access information about the whole dataset, please visit the Zenodo repository and follow the download instructions provided.

    Scientific Publication

    If you use this dataset in your research, please consider citing the associated paper:

    @article{Contini2025, author = {Matteo Contini and Victor Illien and Mohan Julien and Mervyn Ravitchandirane and Victor Russias and Arthur Lazennec and Thomas Chevrier and Cam Ly Rintz and Léanne Carpentier and Pierre Gogendeau and César Leblanc and Serge Bernard and Alexandre Boyer and Justine Talpaert Daudon and Sylvain Poulain and Julien Barde and Alexis Joly and Sylvain Bonhommeau}, doi = {10.1038/s41597-024-04267-z}, issn = {2052-4463}, issue = {1}, journal = {Scientific Data}, pages = {67}, title = {Seatizen Atlas: a collaborative dataset of underwater and aerial marine imagery}, volume = {12}, url = {https://doi.org/10.1038/s41597-024-04267-z}, year = {2025},}

    For detailed information about the dataset and experimental results, please refer to the previous paper.

    Overview

    The Seatizen Atlas dataset includes 14,492 multilabel and 1,200 instance segmentation annotated images. These images are useful for training and evaluating AI models for marine biodiversity research. The annotations follow standards from the Global Coral Reef Monitoring Network (GCRMN).

    Annotation Details

    Annotation Types:

    Multilabel Convention: Identifies all observed classes in an image.

    Instance Segmentation: Highlights contours of each instance for each class.

    List of Classes

    Algae

    Algal Assemblage

    Algae Halimeda

    Algae Coralline

    Algae Turf

    Coral

    Acropora Branching

    Acropora Digitate

    Acropora Submassive

    Acropora Tabular

    Bleached Coral

    Dead Coral

    Gorgonian

    Living Coral

    Non-acropora Millepora

    Non-acropora Branching

    Non-acropora Encrusting

    Non-acropora Foliose

    Non-acropora Massive

    Non-acropora Coral Free

    Non-acropora Submassive

    Seagrass

    Syringodium Isoetifolium

    Thalassodendron Ciliatum

    Habitat

    Rock

    Rubble

    Sand

    Other Organisms

    Thorny Starfish

    Sea Anemone

    Ascidians

    Giant Clam

    Fish

    Other Starfish

    Sea Cucumber

    Sea Urchin

    Sponges

    Turtle

    Custom Classes

    Blurred

    Homo Sapiens

    Human Object

    Trample

    Useless

    Waste

    These classes reflect the biodiversity and variety of habitats captured in the Seatizen Atlas dataset, providing valuable resources for training AI models in marine biodiversity research.

    Usage Notes

    The annotated images are available for non-commercial use. Users are requested to cite the related publication in any resulting works. A GitHub repository has been set up to facilitate data reuse and sharing: GitHub Repository.

    Code Availability

    All related codes for data processing, downloading, and AI model training can be found in the following GitHub repositories:

    Plancha Workflow

    Zenodo Tools

    DinoVdeau Model

    Acknowledgements

    This dataset and associated research have been supported by several organizations, including the Seychelles Islands Foundation, Réserve Naturelle Marine de la Réunion, and Monaco Explorations, among others.

    For any questions or collaboration inquiries, please contact seatizen.ifremer@gmail.com.

  15. Data from: RT-BENE: A Dataset and Baselines for Real-Time Blink Estimation...

    • data.europa.eu
    • data.niaid.nih.gov
    • +1 more
    unknown
    Updated Feb 25, 2020
    Cite
    Zenodo (2020). RT-BENE: A Dataset and Baselines for Real-Time Blink Estimation in Natural Environments [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-3685316?locale=fi
    Explore at:
    Available download formats: unknown (1615)
    Dataset updated
    Feb 25, 2020
    Dataset authored and provided by
    Zenodohttp://zenodo.org/
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The RT-BENE dataset is licensed under CC BY-NC-SA 4.0. Commercial usage is not permitted. If you use our blink estimation code or dataset, please cite the relevant paper:

    @inproceedings{CortaceroICCV2019W, author={Kevin Cortacero and Tobias Fischer and Yiannis Demiris}, booktitle = {Proceedings of the IEEE International Conference on Computer Vision Workshops}, title = {RT-BENE: A Dataset and Baselines for Real-Time Blink Estimation in Natural Environments}, year = {2019}, }

    More information can be found on the Personal Robotics Lab's website: https://www.imperial.ac.uk/personal-robotics/software/.

    Overview: We manually annotated images contained in the "noglasses" part of the RT-GENE dataset with blink annotations. This dataset contains the extracted eye image patches and associated annotations. In particular, rt_bene_subjects.csv is an overview CSV file with the following columns: subject id, subject CSV file, path to left eye images, path to right eye images, training/validation/discarded category, and fold id for the 3-fold evaluation. Each individual "blink_labels" CSV file (s000_blink_labels.csv to s016_blink_labels.csv) contains two columns: the image file name and the label, where 0.0 is the annotation for open eyes, 1.0 for blinks, and 0.5 for annotator disagreement (these images are discarded). A loading sketch follows below.

    Associated code: Please see the code repository for code to train and evaluate a deep neural network based on the RT-BENE dataset. The code repository also links to pre-trained models and code for real-time inference.
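
    A minimal loading sketch with pandas; the column names are assumptions (the CSV is described only as an image file name plus a label), so adjust header handling to the actual files:

    # Read one subject's blink labels and drop annotator-disagreement rows.
    import pandas as pd

    labels = pd.read_csv("s000_blink_labels.csv", names=["image", "label"])
    labels = labels[labels["label"] != 0.5]  # 0.5 = disagreement, discarded
    print((labels["label"] == 0.0).sum(), "open-eye images,",
          (labels["label"] == 1.0).sum(), "blink images")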

  16. Data from: Night and Day Instance Segmented Park (NDISPark) Dataset: a...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +2 more
    Updated Sep 11, 2023
    Cite
    Ciampi, Luca; Santiago, Carlos; Costeira, Joao Paulo; Gennaro, Claudio; Amato, Giuseppe (2023). Night and Day Instance Segmented Park (NDISPark) Dataset: a Collection of Images taken by Day and by Night for Vehicle Detection, Segmentation and Counting in Parking Areas [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_6560822
    Explore at:
    Dataset updated
    Sep 11, 2023
    Dataset provided by
    Instituto Superior Técnico (LARSyS/IST), Lisbon, Portugal
    Institute of Information Science and Technologies (ISTI-CNR), Pisa, Italy
    Authors
    Ciampi, Luca; Santiago, Carlos; Costeira, Joao Paulo; Gennaro, Claudio; Amato, Giuseppe
    License

    Open Data Commons Attribution License (ODC-By) v1.0, https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    The Dataset

    A collection of images of parking lots for vehicle detection, segmentation, and counting. Each image is manually labeled with pixel-wise masks and bounding boxes localizing vehicle instances. The dataset includes about 250 images depicting several parking areas describing most of the problematic situations that we can find in a real scenario: seven different cameras capture the images under various weather conditions and viewing angles. Another challenging aspect is the presence of partial occlusion patterns in many scenes such as obstacles (trees, lampposts, other cars) and shadowed cars. The main peculiarity is that images are taken during the day and the night, showing utterly different lighting conditions.

    We suggest a three-way split (train-validation-test). The train split contains images taken during the daytime while validation and test splits include images gathered at night. In line with these splits we provide some annotation files:

    train_coco_annotations.json and val_coco_annotations.json --> JSON files that follow the standard MS COCO data format (for more info see https://cocodataset.org/#format-data) for the training and the validation splits, respectively. All the vehicles are labeled with the COCO category 'car'. These files are suitable for vehicle detection and instance segmentation.

    train_dot_annotations.csv and val_dot_annotations.csv --> CSV files that contain xy coordinates of the centroids of the vehicles for the training and the validation splits, respectively. Dot annotation is commonly used for the visual counting task.

    ground_truth_test_counting.csv --> CSV file that contains the number of vehicles present in each image. It is only suitable for testing vehicle counting solutions.
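
    Since the JSON files follow the MS COCO format, they can be read with standard tooling. A minimal sketch with pycocotools that rebuilds the pixel-wise mask of each vehicle in the first image:

    # Load the training annotations and reconstruct per-vehicle masks.
    from pycocotools.coco import COCO

    coco = COCO("train_coco_annotations.json")
    first_image_id = coco.getImgIds()[0]
    for ann in coco.loadAnns(coco.getAnnIds(imgIds=first_image_id)):
        mask = coco.annToMask(ann)  # binary mask for one vehicle instance
        print(ann["category_id"], int(mask.sum()), "mask pixels")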

    Citing our work

    If you found this dataset useful, please cite the following paper

    @inproceedings{Ciampi_visapp_2021, doi = {10.5220/0010303401850195}, url = {https://doi.org/10.5220%2F0010303401850195}, year = 2021, publisher = {{SCITEPRESS} - Science and Technology Publications}, author = {Luca Ciampi and Carlos Santiago and Joao Costeira and Claudio Gennaro and Giuseppe Amato}, title = {Domain Adaptation for Traffic Density Estimation}, booktitle = {Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications} }

    and this Zenodo Dataset

    @dataset{ciampi_ndispark_6560823, author = {Luca Ciampi and Carlos Santiago and Joao Costeira and Claudio Gennaro and Giuseppe Amato}, title = {{Night and Day Instance Segmented Park (NDISPark) Dataset: a Collection of Images taken by Day and by Night for Vehicle Detection, Segmentation and Counting in Parking Areas}}, month = may, year = 2022, publisher = {Zenodo}, version = {1.0.0}, doi = {10.5281/zenodo.6560823}, url = {https://doi.org/10.5281/zenodo.6560823} }

    Contact Information

    If you would like further information about the dataset or if you experience any issues downloading files, please contact us at luca.ciampi@isti.cnr.it

  17. VIPPrint: A Large Scale Dataset for Colored Printed Documents Authentication...

    • data.europa.eu
    • data.niaid.nih.gov
    unknown
    Updated Feb 14, 2021
    Cite
    Zenodo (2021). VIPPrint: A Large Scale Dataset for Colored Printed Documents Authentication and Source Linking [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4454971?locale=cs
    Explore at:
    Available download formats: unknown (677)
    Dataset updated
    Feb 14, 2021
    Dataset authored and provided by
    Zenodohttp://zenodo.org/
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The possibility of carrying out a meaningful forensics analysis on printed and scanned images plays a major role in many applications. First of all, printed documents are often associated with criminal activities, such as terrorist plans, child pornography pictures, and even fake packages. Additionally, printing and scanning can be used to hide the traces of image manipulation and even the synthetic nature of images, since the artifacts commonly found in manipulated and synthetic images are gone after the images are printed and scanned. A problem hindering research in this area is the lack of large scale reference datasets to be used for algorithm development and benchmarking. Motivated by this issue, we share a new dataset composed of a large number of synthetic and natural printed face images. Such a dataset can be used with several computer vision and machine learning approaches for two tasks: pinpointing the printer source of a document and detecting printed pictures generated by deep fakes. When using the dataset, don't forget to cite our paper: @Article{jimaging7030050, AUTHOR = {Ferreira, Anselmo and Nowroozi, Ehsan and Barni, Mauro}, TITLE = {VIPPrint: Validating Synthetic Image Detection and Source Linking Methods on a Large Scale Dataset of Printed Documents}, JOURNAL = {Journal of Imaging}, VOLUME = {7}, YEAR = {2021}, NUMBER = {3}, ARTICLE-NUMBER = {50}, URL = {https://www.mdpi.com/2313-433X/7/3/50}, ISSN = {2313-433X}, DOI = {10.3390/jimaging7030050} }

  18. Data from: RV-TMO: Large-Scale Dataset for Subjective Quality Assessment of...

    • data.europa.eu
    • data.niaid.nih.gov
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). RV-TMO: Large-Scale Dataset for Subjective Quality Assessment of Tone Mapped Images [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-8328145?locale=pl
    Explore at:
    Available download formats: unknown (2804)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodohttp://zenodo.org/
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Tone mapping operators (TMO) are functions that map high dynamic range (HDR) images to a standard dynamic range (SDR), while aiming to preserve the perceptual cues of a scene that govern its visual quality. Despite the increasing number of studies on quality assessment of tone mapped images, current subjective quality datasets have relatively small numbers of images and subjective opinions. Moreover, existing challenges in transferring laboratory experiments to crowdsourcing platforms put a barrier for collecting large-scale datasets through crowdsourcing. We address these challenges and propose the RealVision-TMO (RV-TMO), a large-scale tone mapped image quality dataset. RV-TMO contains 250 unique HDR images, their tone mapped versions obtained using four TMOs and pairwise comparison results from seventy unique observers for each pair. This dataset is published as part of the Journal paper titled as " RV-TMO: Large-Scale Dataset for Subjective Quality Assessment of Tone Mapped Images". If you are using this dataset in your work, please cite the paper below: @ARTICLE{9872141, author={Ak, Ali and Goswami, Abhishek and Hauser, Wolf and Le Callet, Patrick and Dufaux, Frederic}, journal={IEEE Transactions on Multimedia}, title={RV-TMO: Large-Scale Dataset for Subjective Quality Assessment of Tone Mapped Images}, year={2022}, pages={1-12}, doi={10.1109/TMM.2022.3203211}}

  19. NASA Mars Landmarks Classification

    • kaggle.com
    zip
    Updated Jun 23, 2022
    Cite
    kaizen (2022). NASA Mars Landmarks Classification [Dataset]. https://www.kaggle.com/datasets/sshikamaru/mars-landmarks/discussion
    Explore at:
    Available download formats: zip (985710335 bytes)
    Dataset updated
    Jun 23, 2022
    Authors
    kaizen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mars orbital image (HiRISE) labeled data set version 3

    Authors: Kiri L. Wagstaff, Steven Lu, Gary Doran, Lukas Mandrake Contact: you.lu@jpl.nasa.gov

    This data set contains a total of 73,031 landmarks. 10,433 landmarks were detected and extracted from 180 HiRISE browse images, and 62,598 landmarks were augmented from the 10,433 original landmarks. For each original landmark, we cropped a square bounding box that includes the full extent of the landmark plus a 30-pixel margin to the left, right, top and bottom. Each cropped landmark was resized to 227x227 pixels, and was then augmented to generate 6 additional landmarks using the following methods (a NumPy sketch follows the list):

    1. 90 degrees clockwise rotation
    2. 180 degrees clockwise rotation
    3. 270 degrees clockwise rotation
    4. Horizontal flip
    5. Vertical flip
    6. Random brightness adjustment
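
    The six augmentations can be reproduced with plain NumPy; the brightness range below is an assumption, as the exact adjustment used is not stated:

    # Generate the six augmented variants of one 227x227 landmark image.
    import numpy as np

    def augment(img, rng):
        bright = np.clip(img * rng.uniform(0.7, 1.3), 0, 255).astype(img.dtype)
        return [
            np.rot90(img, k=3),  # 1. 90 degrees clockwise rotation
            np.rot90(img, k=2),  # 2. 180 degrees clockwise rotation
            np.rot90(img, k=1),  # 3. 270 degrees clockwise rotation
            np.fliplr(img),      # 4. horizontal flip
            np.flipud(img),      # 5. vertical flip
            bright,              # 6. random brightness adjustment (assumed range)
        ]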

    Contents:
    - map-proj-v3/: Directory containing individual cropped landmark images
    - labels-map-proj-v3.txt: Class labels (ids) for each landmark image
    - landmarks_map-proj-v3_classmap.csv: Dictionary that maps class ids to semantic names

    Attribution: If you use this data set in your own work, please cite this DOI: 10.5281/zenodo.2538136

    https://zenodo.org/record/2538136#.YrN7KezMJ9I

  20. Three Annotated Anomaly Detection Datasets for Line-Scan Algorithms

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 29, 2024
    Cite
    Garske, Samuel; Mao, Yiwei (2024). Three Annotated Anomaly Detection Datasets for Line-Scan Algorithms [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13370799
    Explore at:
    Dataset updated
    Aug 29, 2024
    Dataset provided by
    University of Sydney
    Authors
    Garske, Samuel; Mao, Yiwei
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary

    This dataset contains two hyperspectral anomaly detection images and one multispectral one, together with their corresponding binary pixel masks. They were initially used for real-time anomaly detection in line-scanning, but they can be used for any anomaly detection task.

    They are in .npy file format (will add tiff or geotiff variants in the future), with the image datasets being in the order of (height, width, channels). The SNP dataset was collected using sentinelhub, and the Synthetic dataset was collected from AVIRIS. The Python code used to analyse these datasets can be found at: https://github.com/WiseGamgee/HyperAD

    How to Get Started

    All that is needed to load these datasets is Python (preferably 3.8+) and the NumPy package. Example code for loading the Beach Dataset, assuming it is placed in a folder called "data" next to the Python script:

    import numpy as np

    # Load the image file
    hsi_array = np.load("data/beach_hsi.npy")
    n_pixels, n_lines, n_bands = hsi_array.shape
    print(f"This dataset has {n_pixels} pixels, {n_lines} lines, and {n_bands} bands.")

    # Load the corresponding anomaly mask
    mask_array = np.load("data/beach_mask.npy")
    m_pixels, m_lines = mask_array.shape
    print(f"The corresponding anomaly mask is {m_pixels} pixels by {m_lines} lines.")

    Citing the Datasets

    If you use any of these datasets, please cite the following paper:

    @article{garske2024erx, title={ERX - a Fast Real-Time Anomaly Detection Algorithm for Hyperspectral Line-Scanning}, author={Garske, Samuel and Evans, Bradley and Artlett, Christopher and Wong, KC}, journal={arXiv preprint arXiv:2408.14947}, year={2024},}

    If you use the beach dataset please cite the following paper as well (original source):

    @article{mao2022openhsi, title={OpenHSI: A complete open-source hyperspectral imaging solution for everyone}, author={Mao, Yiwei and Betters, Christopher H and Evans, Bradley and Artlett, Christopher P and Leon-Saval, Sergio G and Garske, Samuel and Cairns, Iver H and Cocks, Terry and Winter, Robert and Dell, Timothy}, journal={Remote Sensing}, volume={14}, number={9}, pages={2244}, year={2022}, publisher={MDPI} }
