23 datasets found
  1.

    webui-test

    • huggingface.co
    Updated Nov 1, 2024
    Cite
    Big Lab (2024). webui-test [Dataset]. https://huggingface.co/datasets/biglab/webui-test
    Dataset updated
    Nov 1, 2024
    Dataset authored and provided by
    Big Lab
    License

    https://choosealicense.com/licenses/other/

    Description

    This data accompanies the WebUI project (https://dl.acm.org/doi/abs/10.1145/3544548.3581158). For more information, check out the project website: https://uimodeling.github.io/

    To download this dataset, first install the huggingface-hub package:

        pip install huggingface-hub

    Then use snapshot_download:

        from huggingface_hub import snapshot_download
        snapshot_download(repo_id="biglab/webui-test", repo_type="dataset")

    IMPORTANT

    Before downloading and using, please review the copyright info here:… See the full description on the dataset page: https://huggingface.co/datasets/biglab/webui-test.

  2.

    MaternKernel_compositionality

    • huggingface.co
    Updated Feb 5, 2025
    Cite
    Seok Hoan (Kevin) Choi (2025). MaternKernel_compositionality [Dataset]. https://huggingface.co/datasets/shc443/MaternKernel_compositionality
    Dataset updated
    Feb 5, 2025
    Authors
    Seok Hoan (Kevin) Choi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    You can load the dataset as follows:

        from huggingface_hub import snapshot_download

        snapshot_download(repo_id="shc443/MaternKernel_compositionality", repo_type="dataset")

    For more information about the data-generating process, please refer to our paper or GitHub page.

  3.

    fastmap_sfm

    • huggingface.co
    Updated May 7, 2025
    Cite
    Haochen Wang (2025). fastmap_sfm [Dataset]. https://huggingface.co/datasets/whc/fastmap_sfm
    Dataset updated
    May 7, 2025
    Authors
    Haochen Wang
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Fastmap evaluation suite.

    You only need the databases to run fastmap. Download the images if you want to produce colored point clouds. Download the subset of data you want to your local directory:

        huggingface-cli download whc/fastmap_sfm --repo-type dataset --local-dir ./ --include 'databases/tnt_*' 'ground_truths/tnt_*'

    Or use the Python interface:

        from huggingface_hub import hf_hub_download, snapshot_download
        snapshot_download( repo_id="whc/fastmap_sfm", repo_type='dataset'… See the full description on the dataset page: https://huggingface.co/datasets/whc/fastmap_sfm.
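Because the Python snippet in the listing is cut off, here is a minimal sketch of how the same subset selection as the CLI command might look in Python. It assumes that the allow_patterns argument of snapshot_download accepts the same glob-style patterns as --include. The file names in the examples list are hypothetical and only illustrate the filtering, and fnmatch is used here as an approximation of the library's matching.

```python
from fnmatch import fnmatch

REPO_ID = "whc/fastmap_sfm"
# Same glob patterns as the --include flag of the CLI command.
PATTERNS = ["databases/tnt_*", "ground_truths/tnt_*"]


def matches_any(path, patterns=PATTERNS):
    """Return True if a repo-relative path matches at least one pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def download_subset(local_dir="./"):
    """Download only the files matching PATTERNS (needs network access)."""
    # Imported lazily so matches_any works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=PATTERNS,
    )


# Hypothetical file names, used only to illustrate the filtering.
examples = ["databases/tnt_barn.db", "images/tnt_barn/0001.jpg", "README.md"]
selected = [path for path in examples if matches_any(path)]
```

Calling download_subset() performs the actual download; nothing is fetched when the module is merely imported.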

  4.

    SoccerNet-BallActionSpotting-Videos

    • huggingface.co
    Updated Nov 20, 2024
    Cite
    OpenSportsLab (2024). SoccerNet-BallActionSpotting-Videos [Dataset]. https://huggingface.co/datasets/OpenSportsLab/SoccerNet-BallActionSpotting-Videos
    Dataset updated
    Nov 20, 2024
    Dataset authored and provided by
    OpenSportsLab
    License

    https://choosealicense.com/licenses/gpl-3.0/

    Description

    Download the SoccerNet Ball Action Spotting dataset in the OSL Action Spotting JSON format

        from huggingface_hub import snapshot_download
        snapshot_download(repo_id="OpenSportsLab/SoccerNet-BallActionSpotting-Videos", repo_type="dataset", revision="main", local_dir="SoccerNet-BallActionSpotting-Videos")

    Download specific subsets

    Download 224p/720p versions

    from huggingface_hub import snapshot_download

    Download the 224p… See the full description on the dataset page: https://huggingface.co/datasets/OpenSportsLab/SoccerNet-BallActionSpotting-Videos.

  5.

    job_embedding

    • huggingface.co
    Updated Jun 21, 2024
    Cite
    Pulsifi Pte. Ltd. (2024). job_embedding [Dataset]. https://huggingface.co/datasets/Pulsifi/job_embedding
    Dataset updated
    Jun 21, 2024
    Dataset authored and provided by
    Pulsifi Pte. Ltd.
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Steps to download the data

    Installing necessary library

    poetry add huggingface_hub

    Using the library to download the data into desired directory

        from huggingface_hub import snapshot_download

        path = "./data"  # path where you want to store your data
        snapshot_download("Pulsifi/job_embedding", repo_type="dataset", local_dir=path)

  6.

    MIDI-Images

    • huggingface.co
    Updated Sep 1, 2024
    Cite
    Alex (2024). MIDI-Images [Dataset]. https://huggingface.co/datasets/asigalov61/MIDI-Images
    Dataset updated
    Sep 1, 2024
    Authors
    Alex
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    A dataset of MIDI images designed for use with diffusion models for music generation, music classification, text-to-music and other purposes

    🤗 Check out Imagen MIDI Images LIVE demo on Hugging Face Spaces 🤗

    Installation

        from huggingface_hub import snapshot_download

        repo_id = "asigalov61/MIDI-Images"
        repo_type = 'dataset'

        local_dir = "./MIDI-Images"

        snapshot_download(repo_id, repo_type=repo_type, local_dir=local_dir)

      MIDI Images… See the full description on the dataset page: https://huggingface.co/datasets/asigalov61/MIDI-Images.
    
  7.

    bird-dataset-vector

    • huggingface.co
    Updated Mar 30, 2025
    Cite
    HDR Imageomics Institute (2025). bird-dataset-vector [Dataset]. https://huggingface.co/datasets/imageomics/bird-dataset-vector
    Dataset updated
    Mar 30, 2025
    Dataset authored and provided by
    HDR Imageomics Institute
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Vector Database

    This is a ChromaDB-based vector database containing indexed embeddings from the Hugging Face dataset "Somnath01/Birds_Species".

    Usage

        import os

        import chromadb
        from huggingface_hub import snapshot_download

        vector_db_path = snapshot_download( repo_id=vector_dataset.hf_dataset_path, repo_type="dataset" )

        client = chromadb.PersistentClient( path=os.path.join(vector_db_path… See the full description on the dataset page: https://huggingface.co/datasets/imageomics/bird-dataset-vector.

  8.

    BLIP3o-Pretrain-JourneyDB

    • huggingface.co
    Updated May 27, 2025
    Cite
    BLIP3o (2025). BLIP3o-Pretrain-JourneyDB [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-JourneyDB
    Dataset updated
    May 27, 2025
    Dataset authored and provided by
    BLIP3o
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    BLIP3o Pretrain JourneyDB Dataset

    This collection contains 4 million JourneyDB images.

      Download
    

    from huggingface_hub import snapshot_download

    snapshot_download( repo_id="BLIP3o/BLIP3o-Pretrain-JourneyDB", repo_type="dataset" )

      Load Dataset without Extracting
    

    You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 datasets instead:

        from datasets import load_dataset
        import glob

        data_files = glob.glob("/your/data/path/*.tar")… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-JourneyDB.
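The loading snippet is truncated in this listing, so the following is only a sketch of one way the .tar shards might be read without extraction. It assumes the "webdataset" builder of the datasets library and streaming mode. The helper names find_shards and load_shards are hypothetical, and the dataset card may use different arguments.

```python
import glob


def find_shards(data_dir):
    """Return the sorted .tar shards located directly inside data_dir."""
    return sorted(glob.glob(f"{data_dir}/*.tar"))


def load_shards(data_dir):
    """Stream samples from the shards without unpacking them."""
    # Imported lazily; requires `pip install datasets`.
    from datasets import load_dataset

    data_files = find_shards(data_dir)
    if not data_files:
        raise FileNotFoundError(f"no .tar shards found in {data_dir}")
    # "webdataset" is the datasets builder for WebDataset-style .tar shards.
    return load_dataset(
        "webdataset",
        data_files=data_files,
        split="train",
        streaming=True,
    )
```

With streaming=True the samples are read lazily from the archives, which avoids building a second on-disk copy of a multi-million-image collection.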

  9.

    SOS-SFC-200K

    • huggingface.co
    Updated May 23, 2025
    Cite
    Weikai Huang (2025). SOS-SFC-200K [Dataset]. https://huggingface.co/datasets/weikaih/SOS-SFC-200K
    Dataset updated
    May 23, 2025
    Authors
    Weikai Huang
    Description

    SOS-SFC-200K Dataset Splits

    These are the dataset splits for the paper SOS: Synthetic Object Segments Improve Detection, Segmentation, and Grounding. The SOS-SFC-200K collection contains 200k samples, each with:

    Segmentation masks
    Bounding boxes

    All annotations are provided in COCO format.

      Download & Extraction
    

    Clone or download the entire repository.

    from huggingface_hub import snapshot_download

    snapshot_download( repo_id="weikaih/SOS-SFC-200K"… See the full description on the dataset page: https://huggingface.co/datasets/weikaih/SOS-SFC-200K.

  10.

    SOS-FC-1M

    • huggingface.co
    Updated May 23, 2025
    Cite
    Weikai Huang (2025). SOS-FC-1M [Dataset]. https://huggingface.co/datasets/weikaih/SOS-FC-1M
    Dataset updated
    May 23, 2025
    Authors
    Weikai Huang
    Description

    SOS-FC-1M Dataset Splits

    These are the dataset splits for the paper SOS: Synthetic Object Segments Improve Detection, Segmentation, and Grounding. The SOS-FC-1M collection contains one million samples, each with:

    Segmentation masks
    Bounding boxes
    Referring expressions

    All annotations are provided in COCO format.

      Download & Extraction
    

    Clone or download the entire repository.

    from huggingface_hub import snapshot_download

    snapshot_download(… See the full description on the dataset page: https://huggingface.co/datasets/weikaih/SOS-FC-1M.

  11.

    disease

    • huggingface.co
    Cite
    Aashi tuli, disease [Dataset]. https://huggingface.co/datasets/aashituli/disease
    Authors
    Aashi tuli
    Description

        from huggingface_hub import snapshot_download

        snapshot_download( repo_id="wambugu71/crop_leaf_diseases_vit", local_dir="crop_leaf_model", local_dir_use_symlinks=False )

  12.

    BLIP3o-Pretrain-Long-Caption

    • huggingface.co
    Updated May 17, 2025
    Cite
    BLIP3o (2025). BLIP3o-Pretrain-Long-Caption [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption
    Dataset updated
    May 17, 2025
    Dataset authored and provided by
    BLIP3o
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    BLIP3o Pretrain Long-Caption Dataset

    This collection contains 27 million images, each paired with a long (~120 token) caption generated by Qwen/Qwen2.5-VL-7B-Instruct.

      Download
    

    from huggingface_hub import snapshot_download

    snapshot_download( repo_id="BLIP3o/BLIP3o-Pretrain-Long-Caption", repo_type="dataset" )

      Load Dataset without Extracting
    

    You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 datasets instead: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption.

  13.

    ACDC

    • huggingface.co
    Updated Jun 3, 2025
    Cite
    ACDC [Dataset]. https://huggingface.co/datasets/mathpluscode/ACDC
    Dataset updated
    Jun 3, 2025
    Authors
    Yunguan Fu
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    ACDC Dataset

    This is a pre-processed version of the original Automated Cardiac Diagnosis Challenge (ACDC) dataset. Short-axis view images have been resampled to 1 mm x 1 mm x 10 mm and center-cropped at the center of the left ventricle mask. Each slice is 192 x 192 pixels.

    Download the dataset using the following command:

        from huggingface_hub import snapshot_download
        data_dir = snapshot_download(repo_id="mathpluscode/ACDC", allow_patterns=["*.nii.gz", "*.csv"]… See the full description on the dataset page: https://huggingface.co/datasets/mathpluscode/ACDC.
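Since the download call is restricted to "*.nii.gz" and "*.csv" files, a quick check of what actually landed in data_dir can be useful. The sketch below uses only the standard library. The helper names are hypothetical, and nothing is assumed about the folder layout of the dataset beyond the two file extensions named in the description.

```python
from pathlib import Path


def list_downloaded(data_dir):
    """Group the files under data_dir by the two downloaded extensions."""
    root = Path(data_dir)
    return {
        "nifti": sorted(root.rglob("*.nii.gz")),
        "csv": sorted(root.rglob("*.csv")),
    }


def summarize(data_dir):
    """Return a one-line count of the files that were found."""
    files = list_downloaded(data_dir)
    return f"{len(files['nifti'])} NIfTI volumes, {len(files['csv'])} CSV files"
```

Passing the path returned by snapshot_download to summarize gives a quick sanity check before any image loading code is written.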

  14.

    SOS-FC-Object-Segments-10M

    • huggingface.co
    Updated May 23, 2025
    Cite
    Weikai Huang (2025). SOS-FC-Object-Segments-10M [Dataset]. https://huggingface.co/datasets/weikaih/SOS-FC-Object-Segments-10M
    Dataset updated
    May 23, 2025
    Authors
    Weikai Huang
    Description

    SOS-FC-Object-Segments-10M

    These are the dataset splits for the paper SOS: Synthetic Object Segments Improve Detection, Segmentation, and Grounding. This dataset contains over 10M object segments in Frequency-Category (FC) splits.

      Download & Extraction
    

    Clone or download the entire repository.

    from huggingface_hub import snapshot_download

    snapshot_download( repo_id="weikaih/SOS-FC-Object-Segments-10M", repo_type="dataset"… See the full description on the dataset page: https://huggingface.co/datasets/weikaih/SOS-FC-Object-Segments-10M.

  15.

    Matter-0.1

    • huggingface.co
    Updated Mar 24, 2024
    Cite
    Ram (2024). Matter-0.1 [Dataset]. https://huggingface.co/datasets/0-hero/Matter-0.1
    Dataset updated
    Mar 24, 2024
    Authors
    Ram
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Matter 0.1

    Curated top-quality records from 35 other datasets, extracted from prompt-perfect. This is just a consolidation of all the score 5s. Fine-tuning models with various subsets and combinations to create a best-performing v1 dataset.

      ~1.4B Tokens, ~2.5M records
    

    The dataset has been deduplicated and decontaminated with the bagel script from Jon Durbin.

    Download using the command below to avoid unnecessary files:

        from huggingface_hub import snapshot_download… See the full description on the dataset page: https://huggingface.co/datasets/0-hero/Matter-0.1.

  16.

    TunSwitch

    • huggingface.co
    Updated May 4, 2024
    Cite
    Tunisia.AI (2024). TunSwitch [Dataset]. https://huggingface.co/datasets/tunis-ai/TunSwitch
    Dataset updated
    May 4, 2024
    Dataset provided by
    Tunisia.AI
    Description

    The original dataset was acquired from the following link: https://zenodo.org/records/8370566
    The dataset is not cleaned yet, and any contributions are welcome 🤗

    Download instructions

        from huggingface_hub import snapshot_download
        snapshot_download(repo_id="tunis-ai/TunSwitch", repo_type="dataset", local_dir=".")

    Information

    This repo contains the data used to develop and test the Tunisian Arabic Automatic Speech Recognition model developed in the following paper: A.… See the full description on the dataset page: https://huggingface.co/datasets/tunis-ai/TunSwitch.

  17.

    BLIP3o-60k

    • huggingface.co
    Updated May 13, 2025
    Cite
    BLIP3o (2025). BLIP3o-60k [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-60k
    Dataset updated
    May 13, 2025
    Dataset authored and provided by
    BLIP3o
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is the BLIP3o-60k text-to-image instruction-tuning dataset distilled from GPT-4o, covering the following categories:

    JourneyDB
    Human (including MSCOCO with human caption, human gestures, occupations)
    Dalle3
    Geneval (no overlap with test set)
    Common objects
    Simple text

    The following code downloads the tar files:

        from huggingface_hub import snapshot_download
        snapshot_download(repo_id='BLIP3o/BLIP3o-60k', repo_type='dataset')

    And you can use huggingface datasets to read the tar… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-60k.

  18.

    CLEVR-BT-DB

    • huggingface.co
    Updated Sep 15, 2023
    Cite
    Andrey Borevsky (2023). CLEVR-BT-DB [Dataset]. https://huggingface.co/datasets/Aborevsky01/CLEVR-BT-DB
    Dataset updated
    Sep 15, 2023
    Authors
    Andrey Borevsky
    Description

    How to install?

        !pip install datasets -q
        from huggingface_hub import snapshot_download
        import pandas as pd
        import matplotlib.pyplot as plt

    First step: download the entire dataset

        snapshot_download(repo_id="Aborevsky01/CLEVR-BT-DB", repo_type="dataset", local_dir='path-to-your-local-dir')

    Second step: unarchive the images for VQA

        !unzip [path-to-your-local-dir]/[type-of-task]/images.zip

    Example of the triplet (image - question -… See the full description on the dataset page: https://huggingface.co/datasets/Aborevsky01/CLEVR-BT-DB.
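The !pip and !unzip lines are notebook shell escapes. In a plain Python script the unarchiving step could be done with the standard-library zipfile module, as in the sketch below. The helper name extract_images is hypothetical, the task folder name must be supplied by the reader, and extracting next to the archive (rather than into the current working directory, as unzip does) is a choice made here.

```python
import zipfile
from pathlib import Path


def extract_images(local_dir, task):
    """Extract <local_dir>/<task>/images.zip into the folder that holds it."""
    archive = Path(local_dir) / task / "images.zip"
    target = archive.parent
    with zipfile.ZipFile(archive) as zf:
        # Unpack every member of the archive next to the archive itself.
        zf.extractall(target)
    return target
```

The function returns the folder that now contains the extracted images, so it can be passed straight to whatever code reads them.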

  19.

    GenRef-CoT

    • huggingface.co
    Updated Apr 23, 2025
    Cite
    Diffusion CoT (2025). GenRef-CoT [Dataset]. https://huggingface.co/datasets/diffusion-cot/GenRef-CoT
    Dataset updated
    Apr 23, 2025
    Dataset authored and provided by
    Diffusion CoT
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    GenRef-CoT

    We provide 227K high-quality CoT reflections which were used to train our Qwen-based reflection generation model in ReflectionFlow [1]. To know the details of the dataset creation pipeline, please refer to Section 3.2 of [1].

      Dataset loading
    

    We provide the dataset in the webdataset format for fast dataloading and streaming. We recommend downloading the repository locally for faster I/O:

        from huggingface_hub import snapshot_download

        local_dir =… See the full description on the dataset page: https://huggingface.co/datasets/diffusion-cot/GenRef-CoT.

  20.

    Annotated-MIDI-Dataset

    • huggingface.co
    Updated Mar 27, 2025
    Cite
    Alex (2025). Annotated-MIDI-Dataset [Dataset]. https://huggingface.co/datasets/asigalov61/Annotated-MIDI-Dataset
    Dataset updated
    Mar 27, 2025
    Authors
    Alex
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Annotated MIDI Dataset

    Comprehensive annotated MIDI dataset with original lyrics, lyrics summaries, lyrics sentiments, music descriptions, illustrations, pre-trained MIDI classification model and helper Python code

    Annotated MIDI Dataset LIVE demos

    Music Sentence Transformer
    Advanced MIDI Classifier
    Descriptive Music Transformer

      Installation
    

    from huggingface_hub import snapshot_download

    repo_id = "asigalov61/Annotated-MIDI-Dataset"… See the full description on the dataset page: https://huggingface.co/datasets/asigalov61/Annotated-MIDI-Dataset.

Cite
Big Lab (2024). webui-test [Dataset]. https://huggingface.co/datasets/biglab/webui-test

webui-test

biglab/webui-test

12 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Nov 1, 2024
Dataset authored and provided by
Big Lab
License

https://choosealicense.com/licenses/other/

