48 datasets found

3dgs-dissolve
huggingface.co
Updated Jan 28, 2025
Cite
Finetrainers (2025). 3dgs-dissolve [Dataset]. https://huggingface.co/datasets/finetrainers/3dgs-dissolve
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 28, 2025
Dataset authored and provided by
Finetrainers
Description
Captioned version of dylanebert/3dgs-dissolve-videos. Captioning script:

caption.py

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch
import os
from pathlib import Path
from huggingface_hub import snapshot_download
from torchvision import io

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
processor =… See the full description on the dataset page: https://huggingface.co/datasets/finetrainers/3dgs-dissolve.
3dgs-dissolve-wan-1.3b
huggingface.co
Cite
Will Lin, 3dgs-dissolve-wan-1.3b [Dataset]. https://huggingface.co/datasets/wlsaidhi/3dgs-dissolve-wan-1.3b
Authors
Will Lin
Description
wlsaidhi/3dgs-dissolve-wan-1.3b dataset hosted on Hugging Face and contributed by the HF Datasets community
faquad
huggingface.co
opendatalab.com
Updated Sep 13, 2023
Cite
Eraldo R. Fernandes (2023). faquad [Dataset]. https://huggingface.co/datasets/eraldoluis/faquad
Dataset updated
Sep 13, 2023
Authors
Eraldo R. Fernandes
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Academic secretaries and faculty members of higher education institutions face a common problem: the abundance of questions sent by academics whose answers are found in available institutional documents. The official documents produced by Brazilian public universities are vast and dispersed, which discourages students from searching further for answers in such sources. In order to lessen this problem, we present FaQuAD: a novel machine reading comprehension dataset in the domain of Brazilian higher education institutions. FaQuAD follows the format of SQuAD (Stanford Question Answering Dataset) [Rajpurkar et al. 2016]. It comprises 900 questions about 249 reading passages (paragraphs), which were taken from 18 official documents of a computer science college from a Brazilian federal university and 21 Wikipedia articles related to the Brazilian higher education system. As far as we know, this is the first Portuguese reading comprehension dataset in this format.
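Since the description says FaQuAD follows the SQuAD format, a minimal sketch of that layout may help. It assumes the SQuAD v1.1 JSON nesting (data, paragraphs, qas, answers with a character offset in `answer_start`). The record below is hypothetical and written in English for illustration; it is not taken from FaQuAD, and the helper name `iter_answers` is likewise an assumption.

```python
# Hypothetical passage and answer, invented for illustration (not FaQuAD data).
context = "Enrollment requests are accepted during the first week of each semester."
answer_text = "during the first week of each semester"

# SQuAD v1.1-style nesting: data -> paragraphs -> qas -> answers.
squad_like = {
    "version": "1.1",
    "data": [
        {
            "title": "Example document",
            "paragraphs": [
                {
                    "context": context,
                    "qas": [
                        {
                            "id": "example-0001",
                            "question": "When are enrollment requests accepted?",
                            "answers": [
                                {
                                    "text": answer_text,
                                    # Character offset of the answer inside the context.
                                    "answer_start": context.index(answer_text),
                                }
                            ],
                        }
                    ],
                }
            ],
        }
    ],
}


def iter_answers(dataset):
    """Yield (question, answer text, span_matches) for every answer in the dataset."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            passage = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    span = passage[start:start + len(answer["text"])]
                    yield qa["question"], answer["text"], span == answer["text"]


rows = list(iter_answers(squad_like))
for question, text, ok in rows:
    print(question, "->", text, "(offset consistent)" if ok else "(offset mismatch)")
```

Checking that `answer_start` really points at the answer text is a cheap sanity test to run before training an extractive reader on any file in this layout.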
first-impressions-dataset
huggingface.co
Updated Mar 28, 2024
Cite
Training Data (2024). first-impressions-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/first-impressions-dataset
Explore at:
Croissant
Dataset updated
Mar 28, 2024
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
First Impressions Dataset

The dataset contains 20,000 images of people. For each person, a first impression of them was created. The first impression is a text consisting of several sentences.

💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset

Content

The dataset includes a folder with images of 20,000 people. The .csv file consists of columns:

image_id - the… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/first-impressions-dataset.
pile-of-law
huggingface.co
opendatalab.com
Updated Jul 10, 2022
Cite
Pile of Law (2022). pile-of-law [Dataset]. https://huggingface.co/datasets/pile-of-law/pile-of-law
Dataset updated
Jul 10, 2022
Dataset authored and provided by
Pile of Law
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
We curate a large corpus of legal and administrative data. The utility of this data is twofold: (1) to aggregate legal and administrative data sources that demonstrate different norms and legal standards for data filtering; (2) to collect a dataset that can be used in the future for pretraining legal-domain language models, a key direction in access-to-justice initiatives.
ocr-generated-machine-readable-zone-mrz-text-detection
huggingface.co
Updated Sep 15, 2023
Cite
Training Data (2023). ocr-generated-machine-readable-zone-mrz-text-detection [Dataset]. https://huggingface.co/datasets/TrainingDataPro/ocr-generated-machine-readable-zone-mrz-text-detection
Explore at:
Croissant
Dataset updated
Sep 15, 2023
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
OCR GENERATED Machine-Readable Zone (MRZ) Text Detection

The dataset includes a collection of GENERATED photos containing Machine Readable Zones (MRZ) commonly found on identification documents such as passports, visas, and ID cards. Each photo in the dataset is accompanied by text detection and Optical Character Recognition (OCR) results.

💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/ocr-generated-machine-readable-zone-mrz-text-detection.
vua20_metaphor
huggingface.co
Updated Sep 25, 2023
Cite
Creative Language ToolKit (2023). vua20_metaphor [Dataset]. https://huggingface.co/datasets/CreativeLang/vua20_metaphor
Explore at:
Croissant
Dataset updated
Sep 25, 2023
Dataset authored and provided by
Creative Language ToolKit
License
Attribution 2.0 (CC BY 2.0) https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
Description
VUA20

Dataset Summary

Creative Language Toolkit (CLTK) Metadata

CL Type: Metaphor
Task Type: detection
Size: 200k
Created time: 2020

VUA20 is (perhaps) the largest metaphor detection dataset, used in the Figlang2020 workshop. For the details of this dataset, we refer you to the release paper. The annotation method of VUA20 is elaborated in the MIP paper.

Citation Information

If you find this dataset helpful, please cite: @inproceedings{Leong2020ARO… See the full description on the dataset page: https://huggingface.co/datasets/CreativeLang/vua20_metaphor.
dpr-nq-reader-v2
huggingface.co
+ more versions
Cite
NLP Connect, dpr-nq-reader-v2 [Dataset]. https://huggingface.co/datasets/nlpconnect/dpr-nq-reader-v2
Explore at:
Croissant
Dataset authored and provided by
NLP Connect
Description
nlpconnect/dpr-nq-reader-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community
HunSimpleNews
huggingface.co
Cite
ELTE Department of Digital Humanities, HunSimpleNews [Dataset]. https://huggingface.co/datasets/ELTE-DH/HunSimpleNews
Explore at:
Croissant
Dataset authored and provided by
ELTE Department of Digital Humanities
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
HunSimpleNews

The majority of the text simplification literature in NLP focuses on simplifying sentences. From a theoretical standpoint, this approach is not entirely valid as it misses vital context required to dissolve ambiguities in meaning, references and the process itself. HunSimpleNews is the first Hungarian text simplification corpus that includes the standard and simplified versions of whole documents. The corpus contains news article pairs taken from the Serbian Hungarian… See the full description on the dataset page: https://huggingface.co/datasets/ELTE-DH/HunSimpleNews.
facial-hair-classification-dataset
huggingface.co
Updated Oct 18, 2023
Cite
Training Data (2023). facial-hair-classification-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/facial-hair-classification-dataset
Explore at:
Croissant
Dataset updated
Oct 18, 2023
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Facial Hair Classification & Object Detection dataset

The Facial Hair Classification Dataset is a comprehensive collection of high-resolution images showcasing individuals with and without a beard. The dataset includes a diverse range of individuals of various ages, ethnicities, and genders.

💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset

The dataset also contains… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/facial-hair-classification-dataset.
chest-x-rays-dataset
huggingface.co
Updated Feb 6, 2024
Cite
Training Data (2024). chest-x-rays-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/chest-x-rays-dataset
Explore at:
Croissant
Dataset updated
Feb 6, 2024
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Chest X-ray

The dataset consists of .dcm files containing X-ray images of the thorax. The images are labeled by the doctors and accompanied by corresponding annotations in JSON format. The annotations provide detailed information about the organ structures present in the chest X-ray images.

💴 For Commercial Usage: Full version of the dataset includes 400+ chest x-rays of people with different conditions, leave a request on TrainingData to buy the dataset Types… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/chest-x-rays-dataset.
asos-e-commerce-dataset
huggingface.co
Updated Mar 11, 2023
Cite
Training Data (2023). asos-e-commerce-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/asos-e-commerce-dataset
Explore at:
Croissant
Dataset updated
Mar 11, 2023
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Asos

Using web scraping, we collected information on over 30,845 clothing items from the Asos website. The dataset can be applied in E-commerce analytics in the fashion industry. The dataset is similar to SheIn E-Commerce Dataset.

💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset

Dataset Info

For each item, we extracted:

url - link to the item on the… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/asos-e-commerce-dataset.
magpie
huggingface.co
Updated Aug 12, 2022
+ more versions
Cite
Gabriele Sarti (2022). magpie [Dataset]. https://huggingface.co/datasets/gsarti/magpie
Explore at:
Croissant
Dataset updated
Aug 12, 2022
Authors
Gabriele Sarti
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The MAGPIE corpus is a large sense-annotated corpus of potentially idiomatic expressions (PIEs), based on the British National Corpus (BNC). Potentially idiomatic expressions are like idiomatic expressions, but the term also covers literal uses of idiomatic expressions, such as 'I leave work at the end of the day.' for the idiom 'at the end of the day'. This version of the dataset reflects the filtered subset used by Dankers et al. (2022) in their investigation on how PIEs are represented by NMT models. Authors use 37k samples annotated as fully figurative or literal, for 1482 idioms that contain nouns, numerals or adjectives that are colours (which they refer to as keywords). Because idioms show syntactic and morphological variability, the focus is mostly put on nouns. PIEs and their context are separated using the original corpus’s word-level annotations.
Mannequin_Dataset_Anti_Spoofing
huggingface.co
Updated May 14, 2024
Cite
AxonLabs (2024). Mannequin_Dataset_Anti_Spoofing [Dataset]. https://huggingface.co/datasets/AxonData/Mannequin_Dataset_Anti_Spoofing
Explore at:
Croissant
Dataset updated
May 14, 2024
Authors
AxonLabs
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
3D Mannequin Face Dataset for Liveness Detection (1K+ pictures)

Explore 3D mannequins for anti-spoofing models (1000+ images)

Share your feedback - receive additional samples for free!😊 Full version of dataset is available for commercial usage - leave a request on our website Axon Labs to purchase the dataset 💰

Our 3D Mannequin Anti-Spoofing Dataset provides a comprehensive collection of mannequin images, optimized for enhancing liveness detection models in… See the full description on the dataset page: https://huggingface.co/datasets/AxonData/Mannequin_Dataset_Anti_Spoofing.
test-cot
huggingface.co
Updated Jul 17, 2024
Cite
Hugging Face H4 (2024). test-cot [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/test-cot
Explore at:
Croissant
Dataset updated
Jul 17, 2024
Dataset provided by
Hugging Face (https://huggingface.co/)
Authors
Hugging Face H4
Description
HuggingFaceH4/test-cot dataset hosted on Hugging Face and contributed by the HF Datasets community
bald-women
huggingface.co
Updated Apr 25, 2024
Cite
Training Data (2024). bald-women [Dataset]. https://huggingface.co/datasets/TrainingDataPro/bald-women
Explore at:
Croissant
Dataset updated
Apr 25, 2024
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Bald Women Dataset

The dataset contains images of women with various stages of hair loss. Each person is represented by 5 images showcasing their condition. The alopecia dataset encompasses diverse demographics, ages, and ethnicities. Shooting angles in the dataset:

💴 For Commercial Usage: Full version of the dataset includes 1000+ photos of people with different stages of hair loss, leave a request on TrainingData to buy the dataset

The balding dataset is a valuable… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/bald-women.
roads-segmentation-dataset
huggingface.co
Updated Sep 16, 2023
Cite
Training Data (2023). roads-segmentation-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/roads-segmentation-dataset
Explore at:
Croissant
Dataset updated
Sep 16, 2023
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Road Segmentation Dataset

This dataset comprises a collection of images captured through DVRs (Digital Video Recorders) showcasing roads. Each image is accompanied by segmentation masks demarcating different entities (road surface, cars, road signs, marking and background) within the scene.

💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset

The dataset can be utilized… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/roads-segmentation-dataset.
JAZZMUS_staffLevel
huggingface.co
Updated Jun 17, 2025
Cite
Pattern Recognition and Artificial Intelligence Group (2025). JAZZMUS_staffLevel [Dataset]. https://huggingface.co/datasets/PRAIG/JAZZMUS_staffLevel
Dataset updated
Jun 17, 2025
Dataset authored and provided by
Pattern Recognition and Artificial Intelligence Group
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Optical Music Recognition of Jazz Lead Sheets - Staff Level Dataset

We provide musical scores for 163 unique jazz standards in MusicXML and Humdrum **kern format. The latter is widely used in systems that output musical scores, because it is a compact and easy-to-handle format. The MusicXML scores are sourced from the Wikifonia database (discontinued in 2013) and have been partially corrected. We also leave the lyrics, if present in the original files, as they could be helpful for… See the full description on the dataset page: https://huggingface.co/datasets/PRAIG/JAZZMUS_staffLevel.
Reasoner-1o1-v0.3-HQ
huggingface.co
Updated Sep 16, 2024
Cite
Yassine Ennaour (2024). Reasoner-1o1-v0.3-HQ [Dataset]. https://huggingface.co/datasets/Lyte/Reasoner-1o1-v0.3-HQ
Explore at:
Croissant
Dataset updated
Sep 16, 2024
Authors
Yassine Ennaour
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Details

Dataset Description

This dataset was created using a chain of thought with a verifier and varying difficulty levels. The questions were generated using a higher temperature setting to encourage better creativity across random categories. I did not hand-curate this dataset, but please feel free to review it and leave a comment with any details you may want to discuss. Made using Meta-Llama-3.1-405B-Instruct and Meta-Llama-3.1-70B-Instruct. I planned to… See the full description on the dataset page: https://huggingface.co/datasets/Lyte/Reasoner-1o1-v0.3-HQ.
bald-men
huggingface.co
Updated Apr 25, 2024
Cite
Training Data (2024). bald-men [Dataset]. https://huggingface.co/datasets/TrainingDataPro/bald-men
Explore at:
Croissant
Dataset updated
Apr 25, 2024
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Bald Men Image Dataset

The dataset contains images of men with various stages of hair loss. Each person is represented by 5 images showcasing their condition. The alopecia dataset encompasses diverse demographics, ages, and ethnicities. Each case of hair loss is labeled by the Norwood scale. Shooting angles in the dataset:

💴 For Commercial Usage: Full version of the dataset includes 1000+ photos of people with different stages of hair loss, leave a request on TrainingData… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/bald-men.
