48 datasets found
  1. 3dgs-dissolve

    • huggingface.co
    Updated Jan 28, 2025
    Finetrainers (2025). 3dgs-dissolve [Dataset]. https://huggingface.co/datasets/finetrainers/3dgs-dissolve
    Dataset updated
    Jan 28, 2025
    Dataset authored and provided by
    Finetrainers
    Description

    Captioned version of dylanebert/3dgs-dissolve-videos. Captioning script:

    caption.py

    from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
    import torch
    import os
    from pathlib import Path
    from huggingface_hub import snapshot_download
    from torchvision import io

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2.5-VL-7B-Instruct",
        device_map="auto",
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
    )
    processor =…

    See the full description on the dataset page: https://huggingface.co/datasets/finetrainers/3dgs-dissolve.

  2. 3dgs-dissolve-wan-1.3b

    • huggingface.co
    Will Lin, 3dgs-dissolve-wan-1.3b [Dataset]. https://huggingface.co/datasets/wlsaidhi/3dgs-dissolve-wan-1.3b
    Authors
    Will Lin
    Description

    wlsaidhi/3dgs-dissolve-wan-1.3b dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. faquad

    • huggingface.co
    • opendatalab.com
    Updated Sep 13, 2023
    Eraldo R. Fernandes (2023). faquad [Dataset]. https://huggingface.co/datasets/eraldoluis/faquad
    Dataset updated
    Sep 13, 2023
    Authors
    Eraldo R. Fernandes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Academic secretaries and faculty members of higher education institutions face a common problem: the abundance of questions sent by academics whose answers are found in available institutional documents. The official documents produced by Brazilian public universities are vast and dispersed, which discourages students from searching such sources for answers. To lessen this problem, we present FaQuAD: a novel machine reading comprehension dataset in the domain of Brazilian higher education institutions. FaQuAD follows the format of SQuAD (Stanford Question Answering Dataset) [Rajpurkar et al. 2016]. It comprises 900 questions about 249 reading passages (paragraphs), which were taken from 18 official documents of a computer science college at a Brazilian federal university and 21 Wikipedia articles related to the Brazilian higher education system. As far as we know, this is the first Portuguese reading comprehension dataset in this format.
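
Since the description says FaQuAD follows the SQuAD format, the sketch below shows what a record in the SQuAD v1.1 JSON layout looks like and how to walk it. The record text, the `id` value, and the helper names `iter_examples` and `spans_are_consistent` are illustrative assumptions, not taken from FaQuAD, and the copy hosted on Hugging Face may expose a different (for example, flattened) schema.

```python
# Hypothetical record in the SQuAD v1.1 layout; the text below is made up
# for illustration and does not come from FaQuAD.
CONTEXT = "The enrollment period opens in March."
ANSWER = "in March"

squad_like = {
    "version": "1.1",
    "data": [
        {
            "title": "Example document",
            "paragraphs": [
                {
                    "context": CONTEXT,
                    "qas": [
                        {
                            "id": "q1",
                            "question": "When does the enrollment period open?",
                            # answer_start is a character offset into the context.
                            "answers": [
                                {"text": ANSWER, "answer_start": CONTEXT.index(ANSWER)}
                            ],
                        }
                    ],
                }
            ],
        }
    ],
}


def iter_examples(dataset):
    """Yield (context, question, answers) for every question in the dataset."""
    for doc in dataset["data"]:
        for para in doc["paragraphs"]:
            for qa in para["qas"]:
                yield para["context"], qa["question"], qa["answers"]


def spans_are_consistent(dataset):
    """Check that each answer text matches the context slice at answer_start."""
    for context, _question, answers in iter_examples(dataset):
        for ans in answers:
            start = ans["answer_start"]
            if context[start:start + len(ans["text"])] != ans["text"]:
                return False
    return True


if __name__ == "__main__":
    for context, question, answers in iter_examples(squad_like):
        print(question, "->", [a["text"] for a in answers])
    print("spans consistent:", spans_are_consistent(squad_like))
```

A span check like `spans_are_consistent` is a cheap sanity test to run on any extractive question answering file before training, because a wrong offset silently corrupts the labels.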

  4. first-impressions-dataset

    • huggingface.co
    Updated Mar 28, 2024
    Training Data (2024). first-impressions-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/first-impressions-dataset
    Dataset updated
    Mar 28, 2024
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    First Impressions Dataset

    The dataset contains 20,000 images of people. For each person, a first impression of them was created. The first impression is a text consisting of several sentences.

      💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset
    
      Content
    

    The dataset includes a folder with images of 20,000 people. The .csv file consists of columns:

    image_id - the… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/first-impressions-dataset.

  5. pile-of-law

    • huggingface.co
    • opendatalab.com
    Updated Jul 10, 2022
    Pile of Law (2022). pile-of-law [Dataset]. https://huggingface.co/datasets/pile-of-law/pile-of-law
    Dataset updated
    Jul 10, 2022
    Dataset authored and provided by
    Pile of Law
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    We curate a large corpus of legal and administrative data. The utility of this data is twofold: (1) to aggregate legal and administrative data sources that demonstrate different norms and legal standards for data filtering; (2) to collect a dataset that can be used in the future for pretraining legal-domain language models, a key direction in access-to-justice initiatives.

  6. ocr-generated-machine-readable-zone-mrz-text-detection

    • huggingface.co
    Updated Sep 15, 2023
    Training Data (2023). ocr-generated-machine-readable-zone-mrz-text-detection [Dataset]. https://huggingface.co/datasets/TrainingDataPro/ocr-generated-machine-readable-zone-mrz-text-detection
    Dataset updated
    Sep 15, 2023
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    OCR GENERATED Machine-Readable Zone (MRZ) Text Detection

    The dataset includes a collection of GENERATED photos containing Machine Readable Zones (MRZ) commonly found on identification documents such as passports, visas, and ID cards. Each photo in the dataset is accompanied by text detection and Optical Character Recognition (OCR) results.

      💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/ocr-generated-machine-readable-zone-mrz-text-detection.
    
  7. vua20_metaphor

    • huggingface.co
    Updated Sep 25, 2023
    Creative Language ToolKit (2023). vua20_metaphor [Dataset]. https://huggingface.co/datasets/CreativeLang/vua20_metaphor
    Dataset updated
    Sep 25, 2023
    Dataset authored and provided by
    Creative Language ToolKit
    License

    Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
    License information was derived automatically

    Description

    VUA20

      Dataset Summary
    

    Creative Language Toolkit (CLTK) Metadata

    CL Type: Metaphor
    Task Type: detection
    Size: 200k
    Created time: 2020

    VUA20 is (perhaps) the largest metaphor detection dataset and was used in the Figlang2020 workshop. For details of this dataset, see the release paper. The annotation method of VUA20 is elaborated in the MIP paper.

      Citation Information
    

    If you find this dataset helpful, please cite: @inproceedings{Leong2020ARO… See the full description on the dataset page: https://huggingface.co/datasets/CreativeLang/vua20_metaphor.

  8. dpr-nq-reader-v2

    • huggingface.co
    + more versions
    NLP Connect, dpr-nq-reader-v2 [Dataset]. https://huggingface.co/datasets/nlpconnect/dpr-nq-reader-v2
    Dataset authored and provided by
    NLP Connect
    Description

    nlpconnect/dpr-nq-reader-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. HunSimpleNews

    • huggingface.co
    ELTE Department of Digital Humanities, HunSimpleNews [Dataset]. https://huggingface.co/datasets/ELTE-DH/HunSimpleNews
    Dataset authored and provided by
    ELTE Department of Digital Humanities
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    HunSimpleNews

    The majority of the text simplification literature in NLP focuses on simplifying sentences. From a theoretical standpoint, this approach is not entirely valid as it misses vital context required to dissolve ambiguities in meaning, references and the process itself. HunSimpleNews is the first Hungarian text simplification corpus that includes the standard and simplified versions of whole documents. The corpus contains news article pairs taken from the Serbian Hungarian… See the full description on the dataset page: https://huggingface.co/datasets/ELTE-DH/HunSimpleNews.

  10. facial-hair-classification-dataset

    • huggingface.co
    Updated Oct 18, 2023
    Training Data (2023). facial-hair-classification-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/facial-hair-classification-dataset
    Dataset updated
    Oct 18, 2023
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Facial Hair Classification & Object Detection dataset

    The Facial Hair Classification Dataset is a comprehensive collection of high-resolution images showcasing individuals with and without a beard. The dataset includes a diverse range of individuals of various ages, ethnicities, and genders.

      💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset
    

    The dataset also contains… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/facial-hair-classification-dataset.

  11. chest-x-rays-dataset

    • huggingface.co
    Updated Feb 6, 2024
    Training Data (2024). chest-x-rays-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/chest-x-rays-dataset
    Dataset updated
    Feb 6, 2024
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Chest X-ray

    The dataset consists of .dcm files containing X-ray images of the thorax. The images are labeled by the doctors and accompanied by corresponding annotations in JSON format. The annotations provide detailed information about the organ structures present in the chest X-ray images.

      💴 For Commercial Usage: Full version of the dataset includes 400+ chest x-rays of people with different conditions, leave a request on TrainingData to buy the dataset
    
      Types… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/chest-x-rays-dataset.
    
  12. asos-e-commerce-dataset

    • huggingface.co
    Updated Mar 11, 2023
    Training Data (2023). asos-e-commerce-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/asos-e-commerce-dataset
    Dataset updated
    Mar 11, 2023
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Asos

    Using web scraping, we collected information on over 30,845 clothing items from the Asos website. The dataset can be applied in E-commerce analytics in the fashion industry. The dataset is similar to SheIn E-Commerce Dataset.

      💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset
    
      Dataset Info
    

    For each item, we extracted:

    url - link to the item on the… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/asos-e-commerce-dataset.

  13. magpie

    • huggingface.co
    Updated Aug 12, 2022
    + more versions
    Gabriele Sarti (2022). magpie [Dataset]. https://huggingface.co/datasets/gsarti/magpie
    Dataset updated
    Aug 12, 2022
    Authors
    Gabriele Sarti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MAGPIE corpus is a large sense-annotated corpus of potentially idiomatic expressions (PIEs), based on the British National Corpus (BNC). Potentially idiomatic expressions are like idiomatic expressions, but the term also covers literal uses of idiomatic expressions, such as 'I leave work at the end of the day.' for the idiom 'at the end of the day'. This version of the dataset reflects the filtered subset used by Dankers et al. (2022) in their investigation of how PIEs are represented by NMT models. The authors use 37k samples annotated as fully figurative or literal, for 1482 idioms that contain nouns, numerals or adjectives that are colours (which they refer to as keywords). Because idioms show syntactic and morphological variability, the focus is mostly put on nouns. PIEs and their context are separated using the original corpus’s word-level annotations.

  14. Mannequin_Dataset_Anti_Spoofing

    • huggingface.co
    Updated May 14, 2024
    AxonLabs (2024). Mannequin_Dataset_Anti_Spoofing [Dataset]. https://huggingface.co/datasets/AxonData/Mannequin_Dataset_Anti_Spoofing
    Dataset updated
    May 14, 2024
    Authors
    AxonLabs
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    3D Mannequin Face Dataset for Liveness Detection (1K+ pictures)

    Explore 3D mannequins for anti-spoofing models (1000+ images)

      Share your feedback - receive additional samples for free!😊
    
      The full version of the dataset is available for commercial usage - leave a request on our website Axon Labs to purchase the dataset 💰
    

    Our 3D Mannequin Anti-Spoofing Dataset provides a comprehensive collection of mannequin images, optimized for enhancing liveness detection models in… See the full description on the dataset page: https://huggingface.co/datasets/AxonData/Mannequin_Dataset_Anti_Spoofing.

  15. test-cot

    • huggingface.co
    Updated Jul 17, 2024
    Hugging Face H4 (2024). test-cot [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/test-cot
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Hugging Face: https://huggingface.co/
    Authors
    Hugging Face H4
    Description

    HuggingFaceH4/test-cot dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. bald-women

    • huggingface.co
    Updated Apr 25, 2024
    Training Data (2024). bald-women [Dataset]. https://huggingface.co/datasets/TrainingDataPro/bald-women
    Dataset updated
    Apr 25, 2024
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Bald Women Dataset

    The dataset contains images of women with various stages of hair loss. Each person is represented by 5 images showcasing their condition. The alopecia dataset encompasses diverse demographics, age and ethnicities. Shooting angles in the dataset:

      💴 For Commercial Usage: Full version of the dataset includes 1000+ photos of people with different stages of hair loss, leave a request on TrainingData to buy the dataset
    

    The balding dataset is a valuable… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/bald-women.

  17. roads-segmentation-dataset

    • huggingface.co
    Updated Sep 16, 2023
    Training Data (2023). roads-segmentation-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/roads-segmentation-dataset
    Dataset updated
    Sep 16, 2023
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Road Segmentation Dataset

    This dataset comprises a collection of images captured through DVRs (Digital Video Recorders) showcasing roads. Each image is accompanied by segmentation masks demarcating different entities (road surface, cars, road signs, marking and background) within the scene.

      💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset
    

    The dataset can be utilized… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/roads-segmentation-dataset.

  18. JAZZMUS_staffLevel

    • huggingface.co
    Updated Jun 17, 2025
    Pattern Recognition and Artificial Intelligence Group (2025). JAZZMUS_staffLevel [Dataset]. https://huggingface.co/datasets/PRAIG/JAZZMUS_staffLevel
    Dataset updated
    Jun 17, 2025
    Dataset authored and provided by
    Pattern Recognition and Artificial Intelligence Group
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Optical Music Recognition of Jazz Lead Sheets - Staff Level Dataset

    We provide musical scores for 163 unique jazz standards in MusicXML and Humdrum **kern format. The latter is widely used in systems that output musical scores, because it is a compact and easy-to-handle format. The MusicXML scores are sourced from the Wikifonia database (discontinued in 2013) and have been partially corrected. We also leave the lyrics, if present in the original files, as they could be helpful for… See the full description on the dataset page: https://huggingface.co/datasets/PRAIG/JAZZMUS_staffLevel.

  19. Reasoner-1o1-v0.3-HQ

    • huggingface.co
    Updated Sep 16, 2024
    Yassine Ennaour (2024). Reasoner-1o1-v0.3-HQ [Dataset]. https://huggingface.co/datasets/Lyte/Reasoner-1o1-v0.3-HQ
    Dataset updated
    Sep 16, 2024
    Authors
    Yassine Ennaour
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Details

      Dataset Description
    

    This dataset was created using a chain of thought with a verifier and varying difficulty levels. The questions were generated using a higher temperature setting to encourage better creativity across random categories. I did not hand-curate this dataset, but please feel free to review it and leave a comment with any details you may want to discuss. Made using Meta-Llama-3.1-405B-Instruct and Meta-Llama-3.1-70B-Instruct. I planned to… See the full description on the dataset page: https://huggingface.co/datasets/Lyte/Reasoner-1o1-v0.3-HQ.

  20. bald-men

    • huggingface.co
    Updated Apr 25, 2024
    Training Data (2024). bald-men [Dataset]. https://huggingface.co/datasets/TrainingDataPro/bald-men
    Dataset updated
    Apr 25, 2024
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Bald Men Image Dataset

    The dataset contains images of men with various stages of hair loss. Each person is represented by 5 images showcasing their condition. The alopecia dataset encompasses diverse demographics, age and ethnicities. Each case of hair loss is labeled by the Norwood scale. Shooting angles in the dataset:

      💴 For Commercial Usage: Full version of the dataset includes 1000+ photos of people with different stages of hair loss, leave a request on TrainingData… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/bald-men.
    