    BOSQUE Test set

    • dataverse.harvard.edu
    Updated Jul 10, 2025
    Cite
    Alejandra Jaramillo Arboleda; Maria Juliana Sanchez Zapata; LILI JOHANA RUEDA JAIME; Andrés Morales-Forero; Samuel Bassetto (2025). BOSQUE Test set [Dataset]. http://doi.org/10.7910/DVN/AQEPIN
    Dataset updated
    Jul 10, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Alejandra Jaramillo Arboleda; Maria Juliana Sanchez Zapata; LILI JOHANA RUEDA JAIME; Andrés Morales-Forero; Samuel Bassetto
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    BOSQUE Test Set: A Dermoscopic Image Dataset from Colombian Patients with Diverse Skin Phototypes

    The BOSQUE Test Set is a curated dataset of 151 dermoscopic images of pigmented skin lesions, collected from dermatology consultations and outreach campaigns in Bogotá, Colombia. Each image is accompanied by expert-verified metadata including histological diagnosis, patient demographic details, anatomical site, and skin phototype. The dataset is intended to support machine learning research in dermatology with a particular focus on skin tone diversity and fairness in diagnostic algorithms.

    The dataset was developed under the guidance of Universidad El Bosque, whose name inspired the acronym BOSQUE. It responds to the global underrepresentation of darker skin phototypes in existing dermoscopic image collections such as HAM10000, and aims to improve diagnostic equity through inclusive data curation.

    Key Features

    • 151 dermoscopic images acquired in real-world clinical settings
    • Captured using polarized light dermatoscopes (DermLite 4 + iPhone)
    • Inclusive population:
        - Sex: 97 Female, 54 Male
        - Age groups: from 0–29 to 90+, categorized into clinically relevant bins
        - Fitzpatrick skin phototypes: ranging from II to VI
            - Type II (fair, burns easily): 11 patients
            - Type III (light brown, mild burns): 94 patients
            - Type IV (moderate brown, rarely burns): 34 patients
            - Type V (dark brown, very rarely burns): 7 patients
            - Type VI (deeply pigmented, never burns): 5 patients
    • Lesion characteristics:
        - Nature: benign or malignant (histopathologically confirmed)
        - Size: categorized as ≤5mm, 6–10mm, 11–20mm, >20mm
        - Evolution time: grouped into <1y, 1y, 2y, 3–4y, 5–9y, and 10y+ categories
        - Anatomical site: head/neck, trunk, limbs, or acral areas
    • Histopathological diagnosis: 7-class ISIC-style labels (akiec, bcc, bkl, df, mel, nv, vasc)
    • Clinical label: melanocytic vs. non-melanocytic (from clinical diagnosis)
    • Clinical context: includes personal history of NMSC and use of photosensitizing drugs
    • Image naming: pseudonymized file names encode diagnosis label and image ID
    • Ethics: all data anonymized and collected under IRB-approved protocol in Colombia

    Included Files

    • BOSQUE_test_set.zip: Folder containing 151 dermoscopic image files (JPG)
    • BOSQUE_metadata.csv: Metadata for each image, including:
        - Patient sex, age group, skin phototype
        - Anatomical site of the lesion
        - Lesion nature (benign/malignant)
        - Lesion size and evolution time (binned)
        - Histological diagnosis (7-class)
        - Clinical label (melanocytic / non-melanocytic)

    Use Cases

    This dataset is intended for:

    • Benchmarking AI models for dermoscopic image classification
    • Fairness analysis across skin tones, sex, and age groups
    • Medical education and clinical training on diverse skin phototypes
    • Comparison against HAM10000 or ISIC datasets in research

    Ethical Statement

    All patients provided informed consent for the capture and use of clinical and dermoscopic images, the collection of relevant clinical metadata, and the performance of skin biopsies for diagnostic confirmation. The study protocol was reviewed and approved by the Institutional Ethics Committee at Subred Integrada de Servicios de Salud Norte E.S.E and Universidad El Bosque (Bogotá, Colombia). All data were anonymized in compliance with Colombian health data privacy regulations and international ethical standards (e.g., Declaration of Helsinki). No personally identifiable information is included in the metadata or image files. Access to data was restricted to authorized investigators, and patients were informed about the research and educational use of their anonymized data.

    Suggested Citation

    [Author(s)]. (2025). BOSQUE Test Set: A Dermoscopic Image Dataset from Colombian Patients with Diverse Skin Phototypes [Data set]. Harvard Dataverse. https://doi.org/xxxxx
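The fairness use case described for this dataset (comparing model performance across skin phototypes) could be approached along the following lines. This is only a minimal sketch: the column names `phototype` and `diagnosis` are assumptions, not the documented header of BOSQUE_metadata.csv, and the rows and predictions below are made-up placeholders rather than real BOSQUE records.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; check the header of BOSQUE_metadata.csv before use.
PHOTOTYPE_COL = "phototype"
LABEL_COL = "diagnosis"


def accuracy_by_group(rows, predictions, group_col=PHOTOTYPE_COL, label_col=LABEL_COL):
    """Return {group: (n_correct, n_total, accuracy)} for each value of group_col."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row, pred in zip(rows, predictions):
        group = row[group_col]
        total[group] += 1
        if pred == row[label_col]:
            correct[group] += 1
    return {g: (correct[g], total[g], correct[g] / total[g]) for g in total}


# Tiny made-up sample standing in for the real CSV (not actual BOSQUE records).
sample_csv = """phototype,diagnosis
III,nv
III,mel
IV,bcc
VI,nv
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
predictions = ["nv", "nv", "bcc", "mel"]  # made-up model outputs

results = accuracy_by_group(rows, predictions)
for group, (hit, n, acc) in sorted(results.items()):
    print(f"Phototype {group}: {hit}/{n} correct ({acc:.0%})")
```

Reporting the raw counts next to each accuracy matters here: with only 7 patients in Type V and 5 in Type VI, per-group estimates for the darkest phototypes will be noisy and are best read alongside their sample sizes.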

3 scholarly articles cite this dataset.
