DL3DV-Dataset
This repo contains all the 2K frames with camera poses from the DL3DV-10K Dataset. We are working hard to review the entire dataset and remove sensitive information. Thank you for your patience.
Download
If you have enough space, you can use git to download the dataset from Hugging Face; see this link. The 480P/960P versions should satisfy most needs. If you do not have enough space, we also provide a download script here to download a subset. The usage:… See the full description on the dataset page: https://huggingface.co/datasets/DL3DV/DL3DV-ALL-2K.
MIT License: https://opensource.org/licenses/MIT
Fastmap evaluation suite.
You only need the databases to run fastmap. Download the images if you want to produce a colored point cloud. Download the subset of data you want to your local directory:
huggingface-cli download whc/fastmap_sfm --repo-type dataset --local-dir ./ --include 'databases/tnt_*' 'ground_truths/tnt_*'
Or use the Python interface:
from huggingface_hub import hf_hub_download, snapshot_download
snapshot_download( repo_id="whc/fastmap_sfm", repo_type='dataset'… See the full description on the dataset page: https://huggingface.co/datasets/whc/fastmap_sfm.
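For reference, a minimal sketch of what the completed Python call might look like, using the same include patterns as the CLI example above (the allow_patterns mapping is an assumption, not taken from the card):

from huggingface_hub import snapshot_download

# Mirror the CLI --include globs with allow_patterns (assumed equivalence).
snapshot_download(
    repo_id="whc/fastmap_sfm",
    repo_type="dataset",
    local_dir="./",
    allow_patterns=["databases/tnt_*", "ground_truths/tnt_*"],
)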
License: https://choosealicense.com/licenses/other/
The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totalling 1TB of text data. The dataset was created from the GitHub dataset on BigQuery.
License: https://choosealicense.com/licenses/cc/
Localized Audio Visual DeepFake Dataset (LAV-DF)
This repo is the dataset for the DICTA paper Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization (Best Award), and the journal paper "Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization submitted to CVIU.
LAV-DF Dataset
Download
To use this LAV-DF dataset, you should… See the full description on the dataset page: https://huggingface.co/datasets/ControlNet/LAV-DF.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
Cloud-Adapter-Datasets
This dataset card aims to describe the datasets used in the Cloud-Adapter, a collection of high-resolution satellite images and semantic segmentation masks for cloud detection and related tasks.
Install
pip install huggingface-hub
Usage
huggingface-cli download --repo-type dataset XavierJiezou/cloud-adapter-datasets --local-dir data --include hrc_whu.zip
huggingface-cli download --repo-type dataset… See the full description on the dataset page: https://huggingface.co/datasets/XavierJiezou/cloud-adapter-datasets.
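For a single subset, a Python sketch along these lines should work (only hrc_whu.zip comes from the example above; the extraction path is an assumption):

import zipfile
from huggingface_hub import hf_hub_download

# Fetch one archive from the dataset repo, then unpack it locally.
path = hf_hub_download(
    repo_id="XavierJiezou/cloud-adapter-datasets",
    filename="hrc_whu.zip",
    repo_type="dataset",
    local_dir="data",
)
with zipfile.ZipFile(path) as zf:
    zf.extractall("data/hrc_whu")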
License: https://choosealicense.com/licenses/other/
The Stack v2
The dataset consists of 4 versions:
bigcode/the-stack-v2: the full "The Stack v2" dataset <-- you are here
bigcode/the-stack-v2-dedup: based on bigcode/the-stack-v2 but further near-deduplicated
bigcode/the-stack-v2-train-full-ids: based on the bigcode/the-stack-v2-dedup dataset but further filtered with heuristics and spanning 600+ programming languages. The data is grouped into repositories.
bigcode/the-stack-v2-train-smol-ids: based on the… See the full description on the dataset page: https://huggingface.co/datasets/bigcode/the-stack-v2.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Open X-Embodiment Dataset (unofficial)
RLDS dataset for training VLA models.
Use this dataset
Download the dataset via Hugging Face:
Prepare it yourself
The code is modified from rlds_dataset_mod. We upload the processed dataset in this repository ❤… See the full description on the dataset page: https://huggingface.co/datasets/WeiChow/VLATrainingDataset.
Data source
Downloaded via Andrej Karpathy's nanogpt repo from this link
Data Format
The entire dataset is split into train (90%) and test (10%). All rows are at most 1024 tokens, using the Llama 2 tokenizer. All rows are split cleanly so that sentences are whole and unbroken.
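To sanity-check the 1024-token bound, a sketch like the following can be used; the tokenizer checkpoint is a stand-in (hf-internal-testing/llama-tokenizer, an ungated copy of the Llama tokenizer family), not a name taken from this card:

from transformers import AutoTokenizer

# Stand-in for the Llama 2 tokenizer; swap in the gated original if you have access.
tok = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

def within_limit(text: str, limit: int = 1024) -> bool:
    # Rows in this dataset are claimed to be at most `limit` tokens long.
    return len(tok(text)["input_ids"]) <= limit

print(within_limit("A short example sentence."))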
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
Indonesian Youtube
Source code at https://github.com/mesolitica/malaysian-dataset/tree/master/speech/indonesian-youtube
how to download
huggingface-cli download --repo-type dataset \
  --include '*.z*' \
  --local-dir './' \
  malaysia-ai/indonesian-youtube
wget https://www.7-zip.org/a/7z2301-linux-x64.tar.xz
tar -xf 7z2301-linux-x64.tar.xz
~/7zz x mp3-16k.zip -y -mmt40
Licensing
All the videos, songs, images, and graphics used in the video belong to their… See the full description on the dataset page: https://huggingface.co/datasets/malaysia-ai/indonesian-youtube.
MIT License: https://opensource.org/licenses/MIT
Download script to avoid the rate limit:
COMMAND="huggingface-cli download --repo-type dataset Melmaphother/crag-mm-image-search-images --local-dir crag-mm-image-search-images"
while true; do
  echo "Attempting to download/resume: $COMMAND"
  # Execute download command
  $COMMAND
  EXIT_STATUS=$?
  if [ $EXIT_STATUS -eq 0 ]; then
    echo "Download completed successfully."
    break
  else
    echo… See the full description on the dataset page: https://huggingface.co/datasets/Melmaphother/crag-mm-image-search-images.
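The same retry-until-success idea in Python (a sketch; the 30-second backoff is an arbitrary choice, not from the original script):

import time
from huggingface_hub import snapshot_download

while True:
    try:
        # Re-running the call resumes a partially completed download.
        snapshot_download(
            repo_id="Melmaphother/crag-mm-image-search-images",
            repo_type="dataset",
            local_dir="crag-mm-image-search-images",
        )
        print("Download completed successfully.")
        break
    except Exception as err:
        print(f"Download failed ({err}); retrying in 30 s")
        time.sleep(30)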
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
Dataset Summary
Healthy CT scans of abdominal organs (liver, pancreas, and kidney), filtered from public datasets.
Downloading Instructions
1- Install the Hugging Face library:
pip install -U "huggingface_hub[cli]"
2- Download the dataset:
mkdir HealthyCT
cd HealthyCT
huggingface-cli download qicq1c/HealthyCT --repo-type dataset --local-dir . --cache-dir ./cache
[Optional] Resume downloading
In case you had a previous interrupted download… See the full description on the dataset page: https://huggingface.co/datasets/qicq1c/HealthyCT.
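A Python equivalent of the CLI command above (a sketch): an interrupted download resumes automatically when the same call is re-run, because files already completed are skipped.

from huggingface_hub import snapshot_download

# Mirrors the mkdir/cd/huggingface-cli sequence above, run from the parent directory.
snapshot_download(
    repo_id="qicq1c/HealthyCT",
    repo_type="dataset",
    local_dir="HealthyCT",
    cache_dir="./cache",
)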
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Description
This is the dataset repository used in the pyiqa toolbox. Please refer to Awesome Image Quality Assessment for details of each dataset. Example command-line script with huggingface-cli:
huggingface-cli download chaofengc/IQA-PyTorch-Datasets live.tgz --local-dir ./datasets --repo-type dataset
cd datasets
tar -xzvf live.tgz
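A rough Python equivalent of that command-line example (a sketch; the extraction directory mirrors the --local-dir value):

import tarfile
from huggingface_hub import hf_hub_download

# Download live.tgz from the dataset repo, then unpack it.
path = hf_hub_download(
    repo_id="chaofengc/IQA-PyTorch-Datasets",
    filename="live.tgz",
    repo_type="dataset",
    local_dir="./datasets",
)
with tarfile.open(path, "r:gz") as tar:
    tar.extractall("./datasets")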
Disclaimer for This Dataset Collection
This collection of datasets is compiled and maintained for academic, research, and educational… See the full description on the dataset page: https://huggingface.co/datasets/chaofengc/IQA-PyTorch-Datasets.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
Tamil Youtube
Channels selected from https://www.youtube.com using the keyword 'tamil podcast'. In total: 121,347 audio files, 11,292.83 hours.
how to download
huggingface-cli download --repo-type dataset \
  --include '*.z*' \
  --local-dir './' \
  malaysia-ai/tamil-youtube
wget https://gist.githubusercontent.com/huseinzol05/2e26de4f3b29d99e993b349864ab6c10/raw/9b2251f3ff958770215d70c8d82d311f82791b78/unzip.py
python3 unzip.py
Licensing
All the videos, songs, images… See the full description on the dataset page: https://huggingface.co/datasets/malaysia-ai/tamil-youtube.
MIT License: https://opensource.org/licenses/MIT
To download the dataset, use:
huggingface-cli download --repo-type dataset --resume-download p1k0/OCRMT30K-refine --local-dir OCRMT30K-refine
original_data: original annotations
whole_image_v2.zip: image files
This repository is a collection of images from sbsfigures.
How to use this repo.
Download:
huggingface-cli download Ryoo72/sbsfigures_imgs --repo-type dataset
Unzip:
cat partial-imgs* > imgs.tar.gz
tar -zxvf imgs.tar.gz
Use it with the following datasets.
Ryoo72/sbsfigures_qa Ryoo72/sbsfigures_extract
How I uploaded this repo.
Split:
split -b 20G -d --suffix-length=2 imgs.tar.gz partial-imgs.
Upload:
from huggingface_hub import HfApi
import glob… See the full description on the dataset page: https://huggingface.co/datasets/Ryoo72/sbsfigures_imgs.
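The upload snippet above is truncated; a hedged sketch of what such a loop might look like follows (an assumption built from the split command above, not the author's actual script):

import glob
from huggingface_hub import HfApi

api = HfApi()
# Upload each 20G part produced by the split command above.
for part in sorted(glob.glob("partial-imgs.*")):
    api.upload_file(
        path_or_fileobj=part,
        path_in_repo=part,
        repo_id="Ryoo72/sbsfigures_imgs",
        repo_type="dataset",
    )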
MIT License: https://opensource.org/licenses/MIT
audiocaps
Hugging Face mirror of the official data repo.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
Model Card for the TSpec-LLM Dataset
Dataset Description
Abstract
This dataset contains processed documentation files from the 3GPP (3rd Generation Partnership Project) standards, converted to markdown and docx formats. It is intended for use in telecommunications research, natural language processing, and machine learning applications, particularly those focusing on telecommunications standards and technologies.
🚀 Dataset Update: Now Up-to-Date… See the full description on the dataset page: https://huggingface.co/datasets/rasoul-nikbakht/TSpec-LLM.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
1X World Model Compression Challenge Dataset
This repository hosts the dataset for the 1X World Model Compression Challenge.
huggingface-cli download 1x-technologies/worldmodel --repo-type dataset --local-dir data
Updates Since v1.1
Train/Val v2.0 (~100 hours), replacing v1.1
Test v2.0 dataset for the Compression Challenge
Faces blurred for privacy
New raw video dataset (CC-BY-NC-SA 4.0) at worldmodel_raw_data
Example scripts now split into: cosmos_video_decoder.py —… See the full description on the dataset page: https://huggingface.co/datasets/1x-technologies/world_model_tokenized_data.
License: https://choosealicense.com/licenses/bsl-1.0/
pip install diffusers transformers para-attn numpy pandas hf_transfer
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download thanhhau097/farfetch_singapore_images --repo-type dataset --local-dir .
cat farfetch_singapore_images.zip.* > farfetch_singapore_images.zip
unzip -qq farfetch_singapore_images.zip
unzip -qq farfetch_masks_and_denseposes.zip
rm *.zip
pip install sentencepiece
HF_TOKEN= python create_farfetch_mask_free_data.py --k 1 --gpu_id 0 --root_folder ./
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
Raw dataset for the 1X World Model Sampling Challenge. Download with:
huggingface-cli download 1x-technologies/worldmodel_raw_data --repo-type dataset --local-dir data
Train/Val v2.0
The training dataset is sharded into 100 independent shards. The definitions are as follows:
video_{shard}.mp4: Raw video with a resolution of 512x512.
segment_idx_{shard}.bin: Maps each frame i to its corresponding segment index. You may want to use this to separate non-contiguous frames from… See the full description on the dataset page: https://huggingface.co/datasets/1x-technologies/world_model_raw_data.
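A sketch of how segment_idx_{shard}.bin might be used to recover contiguous runs of frames; the dtype is an assumption (the card does not state it), so adjust it to whatever the release actually uses:

import numpy as np

# Assumed dtype; verify against the release before relying on this.
seg = np.fromfile("segment_idx_0.bin", dtype=np.int32)

# Positions where the segment index changes mark boundaries between
# non-contiguous recordings; split the frame indices at those points.
boundaries = np.flatnonzero(np.diff(seg)) + 1
segments = np.split(np.arange(len(seg)), boundaries)
print(f"{len(segments)} contiguous segments in shard 0")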