64 datasets found
  1. DL3DV-ALL-2K

    • huggingface.co
    Updated Mar 13, 2024
    Cite
    DL3DV (2024). DL3DV-ALL-2K [Dataset]. https://huggingface.co/datasets/DL3DV/DL3DV-ALL-2K
    Dataset updated
    Mar 13, 2024
    Dataset authored and provided by
    DL3DV
    Description

    DL3DV-Dataset

    This repo has all the 2K frames with camera poses of the DL3DV-10K Dataset. We are working hard to review the entire dataset and remove sensitive information. Thank you for your patience.

      Download
    

    If you have enough space, you can use git to download a dataset from Hugging Face. See this link. The 480P/960P versions should satisfy most needs. If you do not have enough space, we also provide a download script here to download a subset. Usage:… See the full description on the dataset page: https://huggingface.co/datasets/DL3DV/DL3DV-ALL-2K.

  2. fastmap_sfm

    • huggingface.co
    Updated May 7, 2025
    Cite
    Haochen Wang (2025). fastmap_sfm [Dataset]. https://huggingface.co/datasets/whc/fastmap_sfm
    Dataset updated
    May 7, 2025
    Authors
    Haochen Wang
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Fastmap evaluation suite.

    You only need the databases to run fastmap. Download the images if you want to produce a colored point cloud. Download the subset of data you want to your local directory:

    huggingface-cli download whc/fastmap_sfm --repo-type dataset --local-dir ./ --include 'databases/tnt_*' 'ground_truths/tnt_*'

    Or use the Python interface:

    from huggingface_hub import hf_hub_download, snapshot_download
    snapshot_download(
        repo_id="whc/fastmap_sfm",
        repo_type='dataset'… See the full description on the dataset page: https://huggingface.co/datasets/whc/fastmap_sfm.
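The `--include` patterns above decide which repository files are fetched. A minimal sketch of that selection, assuming the patterns follow Unix shell-style wildcard semantics as implemented by Python's `fnmatch` (our understanding of how huggingface_hub filters files, not something the card states); `select_files` and the file listing are hypothetical.

```python
from fnmatch import fnmatch

def select_files(paths, include_patterns):
    """Keep the paths that match at least one shell-style include pattern."""
    return [p for p in paths if any(fnmatch(p, pattern) for pattern in include_patterns)]

# Hypothetical repository listing; the real file names may differ.
repo_files = [
    "databases/tnt_barn.db",
    "databases/eth3d_courtyard.db",
    "ground_truths/tnt_barn/poses.txt",
    "images/tnt_barn.zip",
]
print(select_files(repo_files, ["databases/tnt_*", "ground_truths/tnt_*"]))
# ['databases/tnt_barn.db', 'ground_truths/tnt_barn/poses.txt']
```

Note that `*` in `fnmatch` also matches `/`, so `ground_truths/tnt_*` picks up files nested in per-scene subfolders.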

  3. github-code

    • huggingface.co
    Cite
    CodeParrot, github-code [Dataset]. https://huggingface.co/datasets/codeparrot/github-code
    Dataset provided by
    Good Engineering, Inc
    Authors
    CodeParrot
    License

    Other: https://choosealicense.com/licenses/other/

    Description

    The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totalling 1 TB of text data. The dataset was created from the GitHub dataset on BigQuery.
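As a rough scale check, the stated totals imply an average file size of under 9 KB. A small worked calculation, assuming "1 TB" means 10^12 bytes (the card does not say whether decimal or binary units are meant):

```python
def average_size_kb(total_bytes: float, num_files: float) -> float:
    """Mean file size in kilobytes (1 KB = 1,000 bytes)."""
    return total_bytes / num_files / 1e3

avg_kb = average_size_kb(total_bytes=1e12, num_files=115e6)  # 1 TB, 115M files
print(f"Average file size: {avg_kb:.1f} KB")  # Average file size: 8.7 KB
```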

  4. LAV-DF

    • huggingface.co
    Updated Jul 11, 2023
    Cite
    ControlNet (2023). LAV-DF [Dataset]. https://huggingface.co/datasets/ControlNet/LAV-DF
    Dataset updated
    Jul 11, 2023
    Authors
    ControlNet
    License

    Creative Commons: https://choosealicense.com/licenses/cc/

    Description

    Localized Audio Visual DeepFake Dataset (LAV-DF)

    This repo is the dataset for the DICTA paper Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization (Best Award), and the journal paper "Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization submitted to CVIU.

      LAV-DF Dataset
    
      Download
    

    To use this LAV-DF dataset, you should… See the full description on the dataset page: https://huggingface.co/datasets/ControlNet/LAV-DF.

  5. cloud-adapter-datasets

    • huggingface.co
    Updated Nov 23, 2024
    Cite
    XavierJiezou (2024). cloud-adapter-datasets [Dataset]. https://huggingface.co/datasets/XavierJiezou/cloud-adapter-datasets
    Dataset updated
    Nov 23, 2024
    Authors
    XavierJiezou
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Cloud-Adapter-Datasets

    This dataset card describes the datasets used in Cloud-Adapter: a collection of high-resolution satellite images and semantic segmentation masks for cloud detection and related tasks.

      Install
    

    pip install huggingface-hub

      Usage
    

    Step 1: Download datasets

    huggingface-cli download --repo-type dataset XavierJiezou/cloud-adapter-datasets --local-dir data --include hrc_whu.zip
    huggingface-cli download --repo-type dataset… See the full description on the dataset page: https://huggingface.co/datasets/XavierJiezou/cloud-adapter-datasets.

  6. the-stack-v2

    • huggingface.co
    Updated Mar 1, 2024
    Cite
    BigCode (2024). the-stack-v2 [Dataset]. https://huggingface.co/datasets/bigcode/the-stack-v2
    Dataset updated
    Mar 1, 2024
    Dataset authored and provided by
    BigCode
    License

    Other: https://choosealicense.com/licenses/other/

    Description

    The Stack v2

    The dataset consists of 4 versions:

    - bigcode/the-stack-v2: the full "The Stack v2" dataset (this repository)
    - bigcode/the-stack-v2-dedup: based on bigcode/the-stack-v2, but further near-deduplicated
    - bigcode/the-stack-v2-train-full-ids: based on the bigcode/the-stack-v2-dedup dataset, but further filtered with heuristics and spanning 600+ programming languages. The data is grouped into repositories.
    - bigcode/the-stack-v2-train-smol-ids: based on the… See the full description on the dataset page: https://huggingface.co/datasets/bigcode/the-stack-v2.

  7. VLATrainingDataset

    • huggingface.co
    Updated May 31, 2025
    Cite
    zhou wei (2025). VLATrainingDataset [Dataset]. https://huggingface.co/datasets/WeiChow/VLATrainingDataset
    Dataset updated
    May 31, 2025
    Authors
    zhou wei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Open X-Embodiment Dataset (unofficial)

    RLDS dataset for training VLA (vision-language-action) models

      Use this dataset

    Download the dataset from Hugging Face.

      Prepare it yourself

    The code is modified from rlds_dataset_mod. We uploaded the processed dataset to this repository ❤… See the full description on the dataset page: https://huggingface.co/datasets/WeiChow/VLATrainingDataset.

  8. tiny-shakespeare

    • huggingface.co
    Cite
    Trelis, tiny-shakespeare [Dataset]. https://huggingface.co/datasets/Trelis/tiny-shakespeare
    Dataset authored and provided by
    Trelis
    Description

    Data source

    Downloaded via Andrej Karpathy's nanoGPT repo from this link

      Data Format
    

    The entire dataset is split into train (90%) and test (10%). Every row is at most 1024 tokens long, as counted by the Llama 2 tokenizer, and rows are split cleanly so that sentences are whole and unbroken.
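The card does not say how the rows were built. One way to get rows that are both capped at a token budget and made of whole sentences is to pack sentences greedily until the next one would overflow, then cut a 90/10 split. A minimal sketch under stated assumptions: the sentence splitter is a naive punctuation rule, `word_count` is a hypothetical stand-in for the Llama 2 tokenizer, and summing per-sentence counts is only approximate for a real subword tokenizer.

```python
import re

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return re.split(r"(?<=[.!?])\s+", text.strip())

def pack_sentences(text, max_tokens, count_tokens):
    """Greedily pack whole sentences into rows of at most max_tokens tokens.

    A single sentence longer than max_tokens becomes its own (oversized) row
    rather than being cut.
    """
    rows, current, current_len = [], [], 0
    for sentence in split_sentences(text):
        n = count_tokens(sentence)
        if current and current_len + n > max_tokens:
            rows.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        rows.append(" ".join(current))
    return rows

def train_test_split(rows, train_fraction=0.9):
    """Deterministic head/tail split, e.g. 90% train and 10% test."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def word_count(s):
    # Stand-in for the Llama 2 tokenizer (an assumption for this example).
    return len(s.split())

text = "To be, or not to be. That is the question. Whether 'tis nobler in the mind to suffer."
rows = pack_sentences(text, max_tokens=10, count_tokens=word_count)
print(rows)
```

With a 10-token budget, the first two sentences (6 + 4 words) share a row and the third sentence starts a new one instead of being split.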

  9. indonesian-youtube

    • huggingface.co
    Updated Apr 18, 2024
    Cite
    Malaysia AI (2024). indonesian-youtube [Dataset]. https://huggingface.co/datasets/malaysia-ai/indonesian-youtube
    Dataset updated
    Apr 18, 2024
    Dataset authored and provided by
    Malaysia AI
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Indonesian Youtube

    Source code at https://github.com/mesolitica/malaysian-dataset/tree/master/speech/indonesian-youtube

      How to download


    huggingface-cli download --repo-type dataset \
      --include '*.z*' \
      --local-dir './' \
      malaysia-ai/indonesian-youtube

    wget https://www.7-zip.org/a/7z2301-linux-x64.tar.xz
    tar -xf 7z2301-linux-x64.tar.xz
    ~/7zz x mp3-16k.zip -y -mmt40
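The `--include '*.z*'` pattern pulls every piece of a split ZIP archive, and 7-Zip needs all pieces side by side before it can extract mp3-16k.zip. A minimal pre-extraction check, assuming Info-ZIP split naming (`.z01`, `.z02`, and so on, with `.zip` as the final piece), which the listing itself does not confirm; `missing_zip_parts` and the example file names are hypothetical.

```python
import re

def missing_zip_parts(filenames, base):
    """List the pieces of a split archive base.zip that are not in filenames.

    Assumes Info-ZIP split naming: base.z01, base.z02, ... plus base.zip as
    the final piece. A gap after the highest .zNN present cannot be detected.
    """
    names = set(filenames)
    part_re = re.compile(re.escape(base) + r"\.z(\d{2})")
    numbers = {int(m.group(1)) for name in names if (m := part_re.fullmatch(name))}
    missing = [] if f"{base}.zip" in names else [f"{base}.zip"]
    if numbers:
        missing += [f"{base}.z{n:02d}" for n in range(1, max(numbers) + 1) if n not in numbers]
    return missing

files = ["mp3-16k.z01", "mp3-16k.z03", "mp3-16k.zip", "README.md"]
print(missing_zip_parts(files, "mp3-16k"))  # ['mp3-16k.z02']
```

An empty result means every piece up to the highest one found is present, so extraction can start.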

      Licensing
    

    All the videos, songs, images, and graphics used in the video belong to their… See the full description on the dataset page: https://huggingface.co/datasets/malaysia-ai/indonesian-youtube.

  10. crag-mm-image-search-images

    • huggingface.co
    Updated Apr 26, 2025
    Cite
    Daoyu Wang (2025). crag-mm-image-search-images [Dataset]. https://huggingface.co/datasets/Melmaphother/crag-mm-image-search-images
    Dataset updated
    Apr 26, 2025
    Authors
    Daoyu Wang
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Download script to avoid the rate limit:

    #!/bin/bash

    # Download command
    COMMAND="huggingface-cli download --repo-type dataset Melmaphother/crag-mm-image-search-images --local-dir crag-mm-image-search-images"

    # Loop until the command is executed successfully
    while true; do
      echo "Attempting to download/resume: $COMMAND"
      # Execute download command
      $COMMAND

      EXIT_STATUS=$?

      if [ $EXIT_STATUS -eq 0 ]; then
        echo "Download completed successfully."
        break
      else
        echo… See the full description on the dataset page: https://huggingface.co/datasets/Melmaphother/crag-mm-image-search-images.
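The shell loop above retries until the download succeeds, with no upper bound. A bounded variant in Python, sketched under the assumption that the download step is wrapped in a callable that raises on failure; the exponential backoff and the `retry` helper are added choices, not part of the original script.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry(action: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call action() until it succeeds or max_attempts is reached.

    Waits base_delay * 2**k seconds after the k-th failure (k starting at 0).
    The last failure is re-raised so the caller sees the real error.
    """
    for attempt in range(max_attempts):
        try:
            return action()
        except Exception as exc:  # any failure counts as retryable here
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:g}s")
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

To drive the huggingface-cli command above, pass a lambda that calls `subprocess.run` with the command's argument list and `check=True`; a non-zero exit status then raises `CalledProcessError`, which triggers the next attempt.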
    
  11. HealthyCT

    • huggingface.co
    Updated Mar 28, 2024
    Cite
    Qi Chen (2024). HealthyCT [Dataset]. https://huggingface.co/datasets/qicq1c/HealthyCT
    Dataset updated
    Mar 28, 2024
    Authors
    Qi Chen
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Summary

    Healthy CT data of abdominal organs (liver, pancreas, and kidney), filtered from public data.

      Downloading Instructions
    
      1- Install the Hugging Face library:
    

    pip install -U "huggingface_hub[cli]"

      2- Download the dataset:
    

    mkdir HealthyCT
    cd HealthyCT
    huggingface-cli download qicq1c/HealthyCT --repo-type dataset --local-dir . --cache-dir ./cache

    [Optional] Resume downloading

    In case you had a previous interrupted download… See the full description on the dataset page: https://huggingface.co/datasets/qicq1c/HealthyCT.

  12. IQA-PyTorch-Datasets

    • huggingface.co
    Updated Feb 18, 2024
    Cite
    Chaofeng Chen (2024). IQA-PyTorch-Datasets [Dataset]. https://huggingface.co/datasets/chaofengc/IQA-PyTorch-Datasets
    Dataset updated
    Feb 18, 2024
    Authors
    Chaofeng Chen
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This is the dataset repository used in the pyiqa toolbox. Please refer to Awesome Image Quality Assessment for details of each dataset. Example command-line script with huggingface-cli:

    huggingface-cli download chaofengc/IQA-PyTorch-Datasets live.tgz --local-dir ./datasets --repo-type dataset
    cd datasets
    tar -xzvf live.tgz
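The `tar -xzvf live.tgz` step can also be done from Python with the standard-library `tarfile` module, for example on systems without a tar binary. A minimal sketch; `extract_tgz` is a hypothetical helper, and the safety filter is applied only on Python versions that provide `tarfile.data_filter`.

```python
import tarfile
from pathlib import Path

def extract_tgz(archive, dest):
    """Extract a gzip-compressed tar archive into dest; return member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, links escaping dest, etc.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

For the archive above, `extract_tgz("datasets/live.tgz", "datasets")` mirrors the shell commands.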

      Disclaimer for This Dataset Collection
    

    This collection of datasets is compiled and maintained for academic, research, and educational… See the full description on the dataset page: https://huggingface.co/datasets/chaofengc/IQA-PyTorch-Datasets.

  13. tamil-youtube

    • huggingface.co
    Updated Dec 21, 2024
    Cite
    Malaysia AI (2024). tamil-youtube [Dataset]. https://huggingface.co/datasets/malaysia-ai/tamil-youtube
    Dataset updated
    Dec 21, 2024
    Dataset authored and provided by
    Malaysia AI
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Tamil Youtube

    Selected channels from https://www.youtube.com, found with the keyword 'tamil podcast'. The dataset contains 121,347 audio files totalling 11,292.83 hours.
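As a quick sanity check on those totals, the mean clip length works out to roughly five and a half minutes per file, assuming the 11,292.83 hours cover all 121,347 files:

```python
total_hours = 11292.83
num_files = 121_347
avg_seconds = total_hours * 3600 / num_files
print(f"{avg_seconds:.1f} s per file (about {avg_seconds / 60:.1f} min)")
# 335.0 s per file (about 5.6 min)
```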

      How to download


    huggingface-cli download --repo-type dataset \
      --include '*.z*' \
      --local-dir './' \
      malaysia-ai/tamil-youtube

    Download the extraction script from https://gist.githubusercontent.com/huseinzol05/2e26de4f3b29d99e993b349864ab6c10/raw/9b2251f3ff958770215d70c8d82d311f82791b78/unzip.py, then run:

    python3 unzip.py

      Licensing
    

    All the videos, songs, images… See the full description on the dataset page: https://huggingface.co/datasets/malaysia-ai/tamil-youtube.

  14. OCRMT30K-refine

    • huggingface.co
    Updated Mar 1, 2025
    Cite
    Jingheng Pan (2025). OCRMT30K-refine [Dataset]. https://huggingface.co/datasets/p1k0/OCRMT30K-refine
    Dataset updated
    Mar 1, 2025
    Authors
    Jingheng Pan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    To download the dataset, use:

    huggingface-cli download --repo-type dataset --resume-download p1k0/OCRMT30K-refine --local-dir OCRMT30K-refine

    original_data: original annotations
    whole_image_v2.zip: image files

  15. sbsfigures_imgs

    • huggingface.co
    Cite
    Kwangrok Ryoo, sbsfigures_imgs [Dataset]. https://huggingface.co/datasets/Ryoo72/sbsfigures_imgs
    Authors
    Kwangrok Ryoo
    Description

    This repository is a collection of images from sbsfigures.

      How to use this repo

    1. Download: huggingface-cli download Ryoo72/sbsfigures_imgs --repo-type dataset

    2. Unzip:
       cat partial-imgs* > imgs.tar.gz
       tar -zxvf imgs.tar.gz

    3. Use it with the following datasets:
       Ryoo72/sbsfigures_qa
       Ryoo72/sbsfigures_extract

      How I uploaded this repo

    1. Split: split -b 20G -d --suffix-length=2 imgs.tar.gz partial-imgs.

    2. Upload:
       from huggingface_hub import HfApi
       import glob… See the full description on the dataset page: https://huggingface.co/datasets/Ryoo72/sbsfigures_imgs.
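The split/cat round trip above can be reproduced in Python, for instance to confirm that the parts reassemble into the original archive. A minimal standard-library sketch; `split_file` and `join_parts` are hypothetical names, and the part naming mirrors the `split -d --suffix-length=2` flags shown (two-digit numeric suffixes, so at most 100 parts).

```python
import shutil
from pathlib import Path

BUFFER_SIZE = 1 << 20  # copy 1 MiB at a time so multi-GB parts never sit in memory

def split_file(path, part_size, prefix):
    """Split path into prefix00, prefix01, ... of at most part_size bytes each."""
    parts = []
    with open(path, "rb") as src:
        while True:
            chunk = src.read(min(BUFFER_SIZE, part_size))
            if not chunk:
                break
            if len(parts) == 100:
                raise ValueError("more than 100 parts; increase part_size")
            part = Path(f"{prefix}{len(parts):02d}")
            with open(part, "wb") as dst:
                written = 0
                while chunk:
                    dst.write(chunk)
                    written += len(chunk)
                    remaining = part_size - written
                    chunk = src.read(min(BUFFER_SIZE, remaining)) if remaining else b""
            parts.append(part)
    return parts

def join_parts(parts, out_path):
    """Concatenate parts in name order, like `cat partial-imgs* > imgs.tar.gz`."""
    with open(out_path, "wb") as dst:
        for part in sorted(parts, key=lambda p: Path(p).name):
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst, BUFFER_SIZE)
```

Zero-padded suffixes make plain name order match part order, which is why the shell glob `partial-imgs*` concatenates the pieces correctly.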

  16. audiocaps

    • huggingface.co
    • opendatalab.com
    Updated Jul 1, 2025
    Cite
    Dmitry Balobin (2025). audiocaps [Dataset]. https://huggingface.co/datasets/d0rj/audiocaps
    Dataset updated
    Jul 1, 2025
    Authors
    Dmitry Balobin
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    audiocaps

    Hugging Face mirror of the official data repository.

  17. TSpec-LLM

    • huggingface.co
    Updated Jun 1, 2024
    Cite
    Rasoul (2024). TSpec-LLM [Dataset]. https://huggingface.co/datasets/rasoul-nikbakht/TSpec-LLM
    Dataset updated
    Jun 1, 2024
    Authors
    Rasoul
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Model Card for the TSpec-LLM Dataset

    Demo:

      Dataset Description
    
      Abstract
    

    This dataset contains processed documentation files from the 3GPP (3rd Generation Partnership Project) standards, converted to markdown and docx formats. It is intended for use in telecommunications research, natural language processing, and machine learning applications, particularly those focusing on telecommunications standards and technologies.

      🚀 Dataset Update: Now Up-to-Date… See the full description on the dataset page: https://huggingface.co/datasets/rasoul-nikbakht/TSpec-LLM.
    
  18. world_model_tokenized_data

    • huggingface.co
    Updated Jun 20, 2024
    Cite
    1X (2024). world_model_tokenized_data [Dataset]. https://huggingface.co/datasets/1x-technologies/world_model_tokenized_data
    Dataset updated
    Jun 20, 2024
    Dataset authored and provided by
    1X
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    World
    Description

    1X World Model Compression Challenge Dataset

    This repository hosts the dataset for the 1X World Model Compression Challenge. Download it with:

    huggingface-cli download 1x-technologies/worldmodel --repo-type dataset --local-dir data

      Updates Since v1.1
    

    - Train/Val v2.0 (~100 hours), replacing v1.1
    - Test v2.0 dataset for the Compression Challenge
    - Faces blurred for privacy
    - New raw video dataset (CC-BY-NC-SA 4.0) at worldmodel_raw_data
    - Example scripts now split into: cosmos_video_decoder.py —… See the full description on the dataset page: https://huggingface.co/datasets/1x-technologies/world_model_tokenized_data.

  19. farfetch_singapore_images

    • huggingface.co
    Cite
    Thanh Hau Nguyen, farfetch_singapore_images [Dataset]. https://huggingface.co/datasets/thanhhau097/farfetch_singapore_images
    Authors
    Thanh Hau Nguyen
    License

    Boost Software License 1.0: https://choosealicense.com/licenses/bsl-1.0/

    Area covered
    Singapore
    Description

    pip install diffusers transformers para-attn numpy pandas hf_transfer

    HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download thanhhau097/farfetch_singapore_images --repo-type dataset --local-dir .

    cat farfetch_singapore_images.zip.* > farfetch_singapore_images.zip
    unzip -qq farfetch_singapore_images.zip

    unzip -qq farfetch_masks_and_denseposes.zip

    rm *.zip

    pip install sentencepiece
    HF_TOKEN= python create_farfetch_mask_free_data.py --k 1 --gpu_id 0 --root_folder ./

  20. world_model_raw_data

    • huggingface.co
    Updated Nov 6, 2024
    Cite
    1X (2024). world_model_raw_data [Dataset]. https://huggingface.co/datasets/1x-technologies/world_model_raw_data
    Dataset updated
    Nov 6, 2024
    Dataset authored and provided by
    1X
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    Raw dataset for the 1X World Model Sampling Challenge. Download with:

    huggingface-cli download 1x-technologies/worldmodel_raw_data --repo-type dataset --local-dir data

      Train/Val v2.0
    

    The training dataset is sharded into 100 independent shards. The definitions are as follows:

    - video_{shard}.mp4: raw video with a resolution of 512x512.
    - segment_idx_{shard}.bin: maps each frame i to its corresponding segment index. You may want to use this to separate non-contiguous frames from… See the full description on the dataset page: https://huggingface.co/datasets/1x-technologies/world_model_raw_data.
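Because segment_idx_{shard}.bin maps every frame to a segment index, frames whose indices differ should not be stitched into one training clip. A minimal sketch, assuming the file has already been loaded into a flat sequence with one integer per frame (the on-disk dtype is not stated here, so loading is left out); `contiguous_runs` and `valid_window_starts` are hypothetical helpers.

```python
from itertools import groupby

def contiguous_runs(segment_ids):
    """Return (segment_id, first_frame, last_frame) for each run of equal ids."""
    runs = []
    frame = 0
    for seg, group in groupby(segment_ids):
        length = sum(1 for _ in group)
        runs.append((seg, frame, frame + length - 1))
        frame += length
    return runs

def valid_window_starts(runs, window):
    """Start frames whose `window` consecutive frames stay inside one run."""
    starts = []
    for _, first, last in runs:
        starts.extend(range(first, last - window + 2))
    return starts

runs = contiguous_runs([0, 0, 0, 1, 1, 2])
print(runs)                          # [(0, 0, 2), (1, 3, 4), (2, 5, 5)]
print(valid_window_starts(runs, 2))  # [0, 1, 3]
```

A sampler that draws clip start frames only from `valid_window_starts` never mixes frames from two segments.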
