Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
SALT: Sales Autocompletion Linked Business Tables Dataset
Dataset for our paper "SALT: Sales Autocompletion Linked Business Tables Dataset", presented at the NeurIPS'24 Table Representation Workshop.
News
07/10/2025: Dataset is now integrated into RelBench
01/11/2025: Updated paper (some results changed due to minor dataset changes, screenshots added to appendix)
12/19/2024: Train/test splits released
12/15/2024: Preliminary dataset now also available on…
See the full description on the dataset page: https://huggingface.co/datasets/sap-ai-research/SALT.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
LLaVAR Data: Enhanced Visual Instruction Data with Text-Rich Images
More info at the LLaVAR project page, GitHub repo, and paper.
Training Data
Based on the LAION dataset, we collect 422K pretraining examples based on OCR results. For finetuning data, we collect 16K high-quality instruction-following examples by interacting with language-only GPT-4. Note that we also release a larger and more diverse finetuning dataset below (20K), which contains the 16K we used for the paper. The… See the full description on the dataset page: https://huggingface.co/datasets/SALT-NLP/LLaVAR.
SALT-NLP/spotify_podcast_ASR dataset hosted on Hugging Face and contributed by the HF Datasets community
SALT-NLP/wiki-balance-natural dataset hosted on Hugging Face and contributed by the HF Datasets community
SALT-NLP/ProtectAndServe dataset hosted on Hugging Face and contributed by the HF Datasets community
SALT-NLP/CoQA_AppE dataset hosted on Hugging Face and contributed by the HF Datasets community
SALT-NLP/MultiModalInstructionFollowing dataset hosted on Hugging Face and contributed by the HF Datasets community
Sunbird/salt-translation-test-set dataset hosted on Hugging Face and contributed by the HF Datasets community
jq/salt-summarisation dataset hosted on Hugging Face and contributed by the HF Datasets community
jq/salt-asr-correction dataset hosted on Hugging Face and contributed by the HF Datasets community
Sunbird/salt-translation-leaderboard dataset hosted on Hugging Face and contributed by the HF Datasets community
evie-8/salt-corrected-asr-data-transcriptions dataset hosted on Hugging Face and contributed by the HF Datasets community
USOAL/Sunbird-salt-with-gender dataset hosted on Hugging Face and contributed by the HF Datasets community
macabdul9/salt-multispeaker-eng dataset hosted on Hugging Face and contributed by the HF Datasets community
MasaFoundation/huberman_lab_Using_Salt_to_Optimize_Mental_Physical_Performance dataset hosted on Hugging Face and contributed by the HF Datasets community
SALT-NLP/wiki-balance-natural-qrels dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created using LeRobot.
Dataset Structure
meta/info.json: { "codebase_version": "v2.1", "robot_type": "so100_follower", "total_episodes": 2, "total_frames": 3016, "total_tasks": 1, "total_videos": 2, "total_chunks": 1, "chunks_size": 1000, "fps": 30, "splits": { "train": "0:2" }, "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet", "video_path":… See the full description on the dataset page: https://huggingface.co/datasets/harshav17/salt-test.
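As a rough illustration of how the "data_path" template in the excerpt above might resolve to concrete file names, here is a minimal Python sketch. The template string and the chunks_size value are copied from the excerpt. The helper name episode_data_path is hypothetical, and two things are assumptions rather than facts stated in the excerpt: that the chunk number is the episode index integer-divided by chunks_size, and that the split "0:2" is a half-open range covering episodes 0 and 1.

```python
# Path template and chunk size copied from the meta/info.json excerpt.
DATA_PATH = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
CHUNKS_SIZE = 1000


def episode_data_path(episode_index: int, chunks_size: int = CHUNKS_SIZE) -> str:
    """Fill in the path template for one episode (hypothetical helper)."""
    # Assumption: episodes are grouped into chunks of `chunks_size`,
    # so the chunk number is the integer quotient of the episode index.
    episode_chunk = episode_index // chunks_size
    return DATA_PATH.format(
        episode_chunk=episode_chunk, episode_index=episode_index
    )


if __name__ == "__main__":
    # Assumption: the "train" split "0:2" means episodes 0 and 1.
    for idx in range(2):
        print(episode_data_path(idx))
```

With total_episodes equal to 2 and chunks_size equal to 1000, both episodes would fall into chunk 000 under these assumptions, which is consistent with "total_chunks": 1 in the excerpt.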
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Silent Signals
A dataset of dogwhistle use cases in informal and formal discourse. A dogwhistle is a form of coded communication that carries a secondary meaning to specific audiences and is often weaponized for racial and socioeconomic discrimination. Dog whistling historically originated from United States politics, but in recent years has taken root in social media as a means of evading hate speech detection systems and maintaining plausible deniability. We developed an approach… See the full description on the dataset page: https://huggingface.co/datasets/SALT-NLP/silent_signals.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
DeepDialogue-orpheus
DeepDialogue-orpheus is a large-scale multimodal dataset containing 40,150 high-quality multi-turn dialogues spanning 41 domains and incorporating 20 distinct emotions with coherent emotional progressions. This repository contains the Orpheus variant of the dataset, where speech is generated using Orpheus, a state-of-the-art TTS model that infers emotional expressions implicitly from text.
Important Notice
This dataset is large (~180GB) due to… See the full description on the dataset page: https://huggingface.co/datasets/SALT-Research/DeepDialogue-orpheus.