41 datasets found
  1. SALT

    • huggingface.co
    Cite
    SAP AI Research, SALT [Dataset]. https://huggingface.co/datasets/sap-ai-research/SALT
    Dataset provided by
    SAP (http://sap.com/)
    Authors
    SAP AI Research
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    SALT: Sales Autocompletion Linked Business Tables Dataset

    Dataset for our paper SALT: Sales Autocompletion Linked Business Tables Dataset presented at NeurIPS'24 Table Representation Workshop.

      News

    • 07/10/2025: 🎉🎉🎉 Dataset is now integrated into RelBench 🎉🎉🎉
    • 01/11/2025: Updated paper (some results changed due to minor dataset changes; screenshots added to the appendix)
    • 12/19/2024: Train/test splits released
    • 12/15/2024: Preliminary dataset now also available on…
    See the full description on the dataset page: https://huggingface.co/datasets/sap-ai-research/SALT.

  2. LLaVAR

    • huggingface.co
    Updated Jan 27, 2021
    Cite
    Social And Language Technology Lab (2021). LLaVAR [Dataset]. https://huggingface.co/datasets/SALT-NLP/LLaVAR
    Dataset authored and provided by
    Social And Language Technology Lab
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    LLaVAR Data: Enhanced Visual Instruction Data with Text-Rich Images

    More information is available on the LLaVAR project page, GitHub repo, and paper.

      Training Data
    

    Based on the LAION dataset, we collect 422K pretraining examples based on OCR results. For finetuning, we collect 16K high-quality instruction-following examples by interacting with language-only GPT-4. Note that we also release a larger and more diverse finetuning dataset (20K), which contains the 16K examples used for the paper. The… See the full description on the dataset page: https://huggingface.co/datasets/SALT-NLP/LLaVAR.

  3. spotify_podcast_ASR

    • huggingface.co
    Updated Dec 20, 2024
    Cite
    Social And Language Technology Lab (2024). spotify_podcast_ASR [Dataset]. https://huggingface.co/datasets/SALT-NLP/spotify_podcast_ASR
    Dataset authored and provided by
    Social And Language Technology Lab
    Description

    SALT-NLP/spotify_podcast_ASR dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. wiki-balance-natural

    • huggingface.co
    Updated Jun 4, 2024
    Cite
    Social And Language Technology Lab (2024). wiki-balance-natural [Dataset]. https://huggingface.co/datasets/SALT-NLP/wiki-balance-natural
    Dataset authored and provided by
    Social And Language Technology Lab
    Description

    SALT-NLP/wiki-balance-natural dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. ProtectAndServe

    • huggingface.co
    Updated May 1, 2025
    Cite
    Social And Language Technology Lab (2025). ProtectAndServe [Dataset]. https://huggingface.co/datasets/SALT-NLP/ProtectAndServe
    Dataset authored and provided by
    Social And Language Technology Lab
    Description

    SALT-NLP/ProtectAndServe dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. CoQA_AppE

    • huggingface.co
    Updated Jun 13, 2024
    Cite
    Social And Language Technology Lab (2024). CoQA_AppE [Dataset]. https://huggingface.co/datasets/SALT-NLP/CoQA_AppE
    Dataset authored and provided by
    Social And Language Technology Lab
    Description

    SALT-NLP/CoQA_AppE dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. MultiModalInstructionFollowing

    • huggingface.co
    Cite
    Social And Language Technology Lab, MultiModalInstructionFollowing [Dataset]. https://huggingface.co/datasets/SALT-NLP/MultiModalInstructionFollowing
    Dataset authored and provided by
    Social And Language Technology Lab
    Description

    SALT-NLP/MultiModalInstructionFollowing dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. salt-translation-test-set

    • huggingface.co
    Updated Jun 16, 2025
    Cite
    Sunbird AI (2025). salt-translation-test-set [Dataset]. https://huggingface.co/datasets/Sunbird/salt-translation-test-set
    Dataset authored and provided by
    Sunbird AI
    Description

    Sunbird/salt-translation-test-set dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. salt-summarisation

    • huggingface.co
    Updated May 28, 2024
    Cite
    John Quinn (2024). salt-summarisation [Dataset]. https://huggingface.co/datasets/jq/salt-summarisation
    Authors
    John Quinn
    Description

    jq/salt-summarisation dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. salt-asr-correction

    • huggingface.co
    Updated Jun 22, 2024
    Cite
    John Quinn (2024). salt-asr-correction [Dataset]. https://huggingface.co/datasets/jq/salt-asr-correction
    Authors
    John Quinn
    Description

    jq/salt-asr-correction dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. salt-translation-leaderboard

    • huggingface.co
    Updated Jun 13, 2025
    Cite
    Sunbird AI (2025). salt-translation-leaderboard [Dataset]. https://huggingface.co/datasets/Sunbird/salt-translation-leaderboard
    Dataset authored and provided by
    Sunbird AI
    Description

    Sunbird/salt-translation-leaderboard dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. salt-corrected-asr-data-transcriptions

    • huggingface.co
    Updated May 3, 2024
    Cite
    Nafula Evelyn (2024). salt-corrected-asr-data-transcriptions [Dataset]. https://huggingface.co/datasets/evie-8/salt-corrected-asr-data-transcriptions
    Authors
    Nafula Evelyn
    Description

    evie-8/salt-corrected-asr-data-transcriptions dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. Sunbird-salt-with-gender

    • huggingface.co
    Updated Jun 26, 2025
    Cite
    Uganda Open Source AI Lab (USOAL) (2025). Sunbird-salt-with-gender [Dataset]. https://huggingface.co/datasets/USOAL/Sunbird-salt-with-gender
    Dataset authored and provided by
    Uganda Open Source AI Lab (USOAL)
    Description

    USOAL/Sunbird-salt-with-gender dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. salt-multispeaker-eng

    • huggingface.co
    Updated Dec 30, 2024
    Cite
    Abdul Waheed (2024). salt-multispeaker-eng [Dataset]. https://huggingface.co/datasets/macabdul9/salt-multispeaker-eng
    Authors
    Abdul Waheed
    Description

    macabdul9/salt-multispeaker-eng dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. huberman_lab_Using_Salt_to_Optimize_Mental_Physical_Performance

    • huggingface.co
    Updated Aug 19, 2024
    Cite
    Masa (2024). huberman_lab_Using_Salt_to_Optimize_Mental_Physical_Performance [Dataset]. https://huggingface.co/datasets/MasaFoundation/huberman_lab_Using_Salt_to_Optimize_Mental_Physical_Performance
    Dataset authored and provided by
    Masa
    Description

    MasaFoundation/huberman_lab_Using_Salt_to_Optimize_Mental_Physical_Performance dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. wiki-balance-natural-qrels

    • huggingface.co
    Updated Jun 4, 2024
    Cite
    Social And Language Technology Lab (2024). wiki-balance-natural-qrels [Dataset]. https://huggingface.co/datasets/SALT-NLP/wiki-balance-natural-qrels
    Dataset authored and provided by
    Social And Language Technology Lab
    Description

    SALT-NLP/wiki-balance-natural-qrels dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. salt-test

    • huggingface.co
    Updated Jun 17, 2025
    Cite
    harsha lingampally (2025). salt-test [Dataset]. https://huggingface.co/datasets/harshav17/salt-test
    Authors
    harsha lingampally
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset was created using LeRobot.

      Dataset Structure
    

    meta/info.json: { "codebase_version": "v2.1", "robot_type": "so100_follower", "total_episodes": 2, "total_frames": 3016, "total_tasks": 1, "total_videos": 2, "total_chunks": 1, "chunks_size": 1000, "fps": 30, "splits": { "train": "0:2" }, "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet", "video_path":… See the full description on the dataset page: https://huggingface.co/datasets/harshav17/salt-test.
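    The `data_path` entry in `meta/info.json` is a Python format-string template. The sketch below shows how that template might be filled in for each episode. It rests on two assumptions: an episode's chunk is `episode_index // chunks_size`, which matches LeRobot's v2.1 layout as far as we know but should be checked against the dataset's own metadata, and the split string `"0:2"` denotes the half-open episode range [0, 2). The helper names `episode_data_path` and `train_episode_indices` are hypothetical and are not part of the LeRobot API.

```python
# Hypothetical helpers (not LeRobot API): resolve per-episode parquet paths
# from the meta/info.json values shown above.
# Assumption: episode_chunk = episode_index // chunks_size.

DATA_PATH = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
CHUNKS_SIZE = 1000  # "chunks_size" from meta/info.json


def episode_data_path(episode_index: int, chunks_size: int = CHUNKS_SIZE) -> str:
    """Fill the data_path template for a single episode."""
    if episode_index < 0:
        raise ValueError("episode_index must be non-negative")
    episode_chunk = episode_index // chunks_size  # assumed chunking rule
    return DATA_PATH.format(episode_chunk=episode_chunk, episode_index=episode_index)


def train_episode_indices(split: str) -> range:
    """Parse a split string such as "0:2" into a half-open range of episodes."""
    start, stop = (int(part) for part in split.split(":"))
    return range(start, stop)


if __name__ == "__main__":
    # Print the parquet path of every episode in the "train" split.
    for ep in train_episode_indices("0:2"):
        print(episode_data_path(ep))
```

    With `chunks_size` set to 1000, both episodes of this dataset fall in `chunk-000`. An episode index of 1234 would map to `data/chunk-001/episode_001234.parquet`.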

  18. did-Salt

    • huggingface.co
    Cite
    Ahmed Elshabrawy, did-Salt [Dataset]. https://huggingface.co/datasets/ashabrawy/did-Salt
    Authors
    Ahmed Elshabrawy
    Description

    ashabrawy/did-Salt dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. silent_signals

    • huggingface.co
    Updated Sep 13, 2024
    Cite
    Social And Language Technology Lab (2024). silent_signals [Dataset]. https://huggingface.co/datasets/SALT-NLP/silent_signals
    Dataset authored and provided by
    Social And Language Technology Lab
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Silent Signals

    A dataset of dogwhistle use cases in informal and formal discourse. A dogwhistle is a form of coded communication that carries a secondary meaning for specific audiences and is often weaponized for racial and socioeconomic discrimination. Dogwhistling originated in United States politics, but in recent years it has taken root on social media as a means of evading hate speech detection systems and maintaining plausible deniability. We developed an approach… See the full description on the dataset page: https://huggingface.co/datasets/SALT-NLP/silent_signals.

  20. DeepDialogue-orpheus

    • huggingface.co
    Updated Mar 20, 2025
    Cite
    Speech Audio and Language Technologies (2025). DeepDialogue-orpheus [Dataset]. https://huggingface.co/datasets/SALT-Research/DeepDialogue-orpheus
    Dataset authored and provided by
    Speech Audio and Language Technologies
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    DeepDialogue-orpheus

    DeepDialogue-orpheus is a large-scale multimodal dataset containing 40,150 high-quality multi-turn dialogues spanning 41 domains and incorporating 20 distinct emotions with coherent emotional progressions. This repository contains the Orpheus variant of the dataset, where speech is generated using Orpheus, a state-of-the-art TTS model that infers emotional expressions implicitly from text.

      🚨 Important Notice
    

    This dataset is large (~180GB) due to… See the full description on the dataset page: https://huggingface.co/datasets/SALT-Research/DeepDialogue-orpheus.
