2 datasets found
  1. P

    DataSeeds.AI-Sample-Dataset-DSD Dataset

    • paperswithcode.com
    Updated Jun 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). DataSeeds.AI-Sample-Dataset-DSD Dataset [Dataset]. https://paperswithcode.com/dataset/dataseeds-ai-sample-dataset-dsd
    Explore at:
    Dataset updated
    Jun 5, 2025
    Description

    Dataset Summary The DataSeeds.AI Sample Dataset (DSD) is a high-fidelity, human-curated computer vision-ready dataset comprised of 7,772 peer-ranked, fully annotated photographic images, 350,000+ words of descriptive text, and comprehensive metadata. While the DSD is being released under an open source license, a sister dataset of over 10,000 fully annotated and segmented images is available for immediate commercial licensing, and the broader GuruShots ecosystem contains over 100 million images in its catalog.

    Each image includes multi-tier human annotations and semantic segmentation masks. Generously contributed to the community by the GuruShots photography platform, where users engage in themed competitions, the DSD uniquely captures aesthetic preference signals and high-quality technical metadata (EXIF) across an expansive diversity of photographic styles, camera types, and subject matter. The dataset is optimized for fine-tuning and evaluating multimodal vision-language models, especially in scene description and stylistic comprehension tasks.

    Technical Report - Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery Github Repo - Access the complete weights and code which were used to evaluate the DSD -- https://github.com/DataSeeds-ai/DSD-finetune-blip-llava This dataset is ready for commercial/non-commercial use. Dataset Structure Size: 7,772 images (7,010 train, 762 validation) Format: Apache Parquet files for metadata, with images in JPG format Total Size: ~4.1GB Languages: English (annotations) Annotation Quality: All annotations were verified through a multi-tier human-in-the-loop process

  2. h

    DataSeeds.AI-Sample-Dataset-DSD

    • huggingface.co
    Updated Jun 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataseeds AI (2025). DataSeeds.AI-Sample-Dataset-DSD [Dataset]. https://huggingface.co/datasets/Dataseeds/DataSeeds.AI-Sample-Dataset-DSD
    Explore at:
    Dataset updated
    Jun 10, 2025
    Dataset authored and provided by
    Dataseeds AI
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    DataSeeds.AI Sample Dataset (DSD)

      Dataset Summary
    

    The DataSeeds.AI Sample Dataset (DSD) is a high-fidelity, human-curated computer vision-ready dataset comprised of 7,772 peer-ranked, fully annotated photographic images, 350,000+ words of descriptive text, and comprehensive metadata. While the DSD is being released under an open source license, a sister dataset of over 10,000 fully annotated and segmented images is available for immediate commercial licensing, and the… See the full description on the dataset page: https://huggingface.co/datasets/Dataseeds/DataSeeds.AI-Sample-Dataset-DSD.

  3. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
(2025). DataSeeds.AI-Sample-Dataset-DSD Dataset [Dataset]. https://paperswithcode.com/dataset/dataseeds-ai-sample-dataset-dsd

DataSeeds.AI-Sample-Dataset-DSD Dataset

Explore at:
Dataset updated
Jun 5, 2025
Description

Dataset Summary The DataSeeds.AI Sample Dataset (DSD) is a high-fidelity, human-curated computer vision-ready dataset comprised of 7,772 peer-ranked, fully annotated photographic images, 350,000+ words of descriptive text, and comprehensive metadata. While the DSD is being released under an open source license, a sister dataset of over 10,000 fully annotated and segmented images is available for immediate commercial licensing, and the broader GuruShots ecosystem contains over 100 million images in its catalog.

Each image includes multi-tier human annotations and semantic segmentation masks. Generously contributed to the community by the GuruShots photography platform, where users engage in themed competitions, the DSD uniquely captures aesthetic preference signals and high-quality technical metadata (EXIF) across an expansive diversity of photographic styles, camera types, and subject matter. The dataset is optimized for fine-tuning and evaluating multimodal vision-language models, especially in scene description and stylistic comprehension tasks.

Technical Report - Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery Github Repo - Access the complete weights and code which were used to evaluate the DSD -- https://github.com/DataSeeds-ai/DSD-finetune-blip-llava This dataset is ready for commercial/non-commercial use. Dataset Structure Size: 7,772 images (7,010 train, 762 validation) Format: Apache Parquet files for metadata, with images in JPG format Total Size: ~4.1GB Languages: English (annotations) Annotation Quality: All annotations were verified through a multi-tier human-in-the-loop process

Search
Clear search
Close search
Google apps
Main menu