Captioned version of dylanebert/3dgs-dissolve-videos. Captioning script:
caption.py
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch
import os
from pathlib import Path
from huggingface_hub import snapshot_download
from torchvision import io

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
processor =…

See the full description on the dataset page: https://huggingface.co/datasets/finetrainers/3dgs-dissolve.
wlsaidhi/3dgs-dissolve-wan-1.3b dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Academic secretaries and faculty members of higher education institutions face a common problem: the abundance of questions sent by academics whose answers are found in available institutional documents. The official documents produced by Brazilian public universities are vast and dispersed, which discourages students from searching such sources for answers. To lessen this problem, we present FaQuAD: a novel machine reading comprehension dataset in the domain of Brazilian higher education institutions. FaQuAD follows the format of SQuAD (Stanford Question Answering Dataset) [Rajpurkar et al. 2016]. It comprises 900 questions about 249 reading passages (paragraphs), which were taken from 18 official documents of a computer science college at a Brazilian federal university and 21 Wikipedia articles related to the Brazilian higher education system. As far as we know, this is the first Portuguese reading comprehension dataset in this format.
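Since the description says FaQuAD follows the SQuAD format, a short sketch may help show what such a record looks like. The example below assumes the flattened layout commonly seen when SQuAD is loaded through the Hugging Face datasets library (a `context`, a `question`, and an `answers` dictionary with parallel `text` and `answer_start` lists); the field names actually used by FaQuAD are an assumption here, and the passage, question, and helper function `answer_spans` are invented for illustration and are not taken from the dataset.

```python
def answer_spans(record):
    """Return (start, end, text) for each answer, checking it aligns with the context."""
    context = record["context"]
    answers = record["answers"]
    spans = []
    for text, start in zip(answers["text"], answers["answer_start"]):
        end = start + len(text)
        # In SQuAD-style data, answer_start is a character offset into the context.
        if context[start:end] != text:
            raise ValueError(f"answer {text!r} does not match context at offset {start}")
        spans.append((start, end, text))
    return spans


# Hypothetical record (not from FaQuAD): the offset is computed rather than hard-coded.
context = (
    "Enrollment requests must be submitted to the academic office "
    "by the end of the first week of the semester."
)
answer = "the academic office"
record = {
    "context": context,
    "question": "Where must enrollment requests be submitted?",
    "answers": {"text": [answer], "answer_start": [context.index(answer)]},
}

spans = answer_spans(record)
for start, end, text in spans:
    print(start, end, text)
```

Checking that each answer text matches the slice of the context at its offset is a cheap sanity test worth running before training an extractive reader, since misaligned offsets silently corrupt the span labels.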
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
First Impressions Dataset
The dataset contains 20,000 images of people. For each person, a first impression of them was created. The first impression is a text consisting of several sentences.
💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset
Content
The dataset includes a folder with images of 20,000 people. The .csv file consists of columns:
image_id - the… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/first-impressions-dataset.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
We curate a large corpus of legal and administrative data. The utility of this data is twofold: (1) to aggregate legal and administrative data sources that demonstrate different norms and legal standards for data filtering; (2) to collect a dataset that can be used in the future for pretraining legal-domain language models, a key direction in access-to-justice initiatives.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
OCR GENERATED Machine-Readable Zone (MRZ) Text Detection
The dataset includes a collection of GENERATED photos containing Machine Readable Zones (MRZ) commonly found on identification documents such as passports, visas, and ID cards. Each photo in the dataset is accompanied by text detection and Optical Character Recognition (OCR) results.
💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/ocr-generated-machine-readable-zone-mrz-text-detection.
Attribution 2.0 (CC BY 2.0) https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
VUA20
Dataset Summary
Creative Language Toolkit (CLTK) Metadata
CL Type: Metaphor
Task Type: detection
Size: 200k
Created time: 2020
VUA20 is (perhaps) the largest metaphor detection dataset and was used in the Figlang2020 workshop. For details of the dataset, see the release paper. The annotation method of VUA20 is elaborated in the MIP paper.
Citation Information
If you find this dataset helpful, please cite: @inproceedings{Leong2020ARO… See the full description on the dataset page: https://huggingface.co/datasets/CreativeLang/vua20_metaphor.
nlpconnect/dpr-nq-reader-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
HunSimpleNews
The majority of the text simplification literature in NLP focuses on simplifying sentences. From a theoretical standpoint, this approach is not entirely valid as it misses vital context required to dissolve ambiguities in meaning, references and the process itself. HunSimpleNews is the first Hungarian text simplification corpus that includes the standard and simplified versions of whole documents. The corpus contains news article pairs taken from the Serbian Hungarian… See the full description on the dataset page: https://huggingface.co/datasets/ELTE-DH/HunSimpleNews.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Facial Hair Classification & Object Detection dataset
The Facial Hair Classification Dataset is a comprehensive collection of high-resolution images showcasing individuals with and without a beard. The dataset includes a diverse range of individuals of various ages, ethnicities, and genders.
💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset
The dataset also contains… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/facial-hair-classification-dataset.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Chest X-ray
The dataset consists of .dcm files containing X-ray images of the thorax. The images are labeled by the doctors and accompanied by corresponding annotations in JSON format. The annotations provide detailed information about the organ structures present in the chest X-ray images.
💴 For Commercial Usage: The full version of the dataset includes 400+ chest X-rays of people with different conditions. Leave a request on TrainingData to buy the dataset
Types… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/chest-x-rays-dataset.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Asos
Using web scraping, we collected information on over 30,845 clothing items from the Asos website. The dataset can be applied in E-commerce analytics in the fashion industry. The dataset is similar to SheIn E-Commerce Dataset.
💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset
Dataset Info
For each item, we extracted:
url - link to the item on the… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/asos-e-commerce-dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MAGPIE corpus is a large sense-annotated corpus of potentially idiomatic expressions (PIEs), based on the British National Corpus (BNC). Potentially idiomatic expressions are like idiomatic expressions, but the term also covers literal uses of idiomatic expressions, such as 'I leave work at the end of the day.' for the idiom 'at the end of the day'. This version of the dataset reflects the filtered subset used by Dankers et al. (2022) in their investigation of how PIEs are represented by NMT models. The authors use 37k samples annotated as fully figurative or literal, for 1482 idioms that contain nouns, numerals or adjectives that are colours (which they refer to as keywords). Because idioms show syntactic and morphological variability, the focus is mostly on nouns. PIEs and their context are separated using the original corpus’s word-level annotations.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
3D Mannequin Face Dataset for Liveness Detection (1K+ pictures)
Explore 3D mannequins for anti-spoofing models (1000+ images)
Share your feedback - receive additional samples for free!😊
The full version of the dataset is available for commercial usage - leave a request on our website Axon Labs to purchase the dataset 💰
Our 3D Mannequin Anti-Spoofing Dataset provides a comprehensive collection of mannequin images, optimized for enhancing liveness detection models in… See the full description on the dataset page: https://huggingface.co/datasets/AxonData/Mannequin_Dataset_Anti_Spoofing.
HuggingFaceH4/test-cot dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Bald Women Dataset
The dataset contains images of women with various stages of hair loss. Each person is represented by 5 images showcasing their condition. The alopecia dataset encompasses diverse demographics, age and ethnicities. Shooting angles in the dataset:
💴 For Commercial Usage: The full version of the dataset includes 1000+ photos of people with different stages of hair loss. Leave a request on TrainingData to buy the dataset
The balding dataset is a valuable… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/bald-women.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Road Segmentation Dataset
This dataset comprises a collection of images captured through DVRs (Digital Video Recorders) showcasing roads. Each image is accompanied by segmentation masks demarcating different entities (road surface, cars, road signs, marking and background) within the scene.
💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset
The dataset can be utilized… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/roads-segmentation-dataset.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Optical Music Recognition of Jazz Lead Sheets - Staff Level Dataset
We provide musical scores for 163 unique jazz standards in MusicXML and Humdrum **kern format. The latter is widely used in systems that output musical scores, because it is a compact and easy-to-handle format. The MusicXML scores are sourced from the Wikifonia database (discontinued in 2013) and have been partially corrected. We also leave the lyrics, if present in the original files, as they could be helpful for… See the full description on the dataset page: https://huggingface.co/datasets/PRAIG/JAZZMUS_staffLevel.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Details
Dataset Description
This dataset was created using a chain of thought with a verifier and varying difficulty levels. The questions were generated using a higher temperature setting to encourage more creativity across random categories. I did not hand-curate this dataset, but please feel free to review it and leave a comment with any details you may want to discuss. Made using Meta-Llama-3.1-405B-Instruct and Meta-Llama-3.1-70B-Instruct. I planned to… See the full description on the dataset page: https://huggingface.co/datasets/Lyte/Reasoner-1o1-v0.3-HQ.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Bald Men Image Dataset
The dataset contains images of men with various stages of hair loss. Each person is represented by 5 images showcasing their condition. The alopecia dataset encompasses diverse demographics, age and ethnicities. Each case of hair loss is labeled by the Norwood scale. Shooting angles in the dataset:
💴 For Commercial Usage: Full version of the dataset includes 1000+ photos of people with different stages of hair loss, leave a request on TrainingData… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/bald-men.