saikrishna759/convAI dataset hosted on Hugging Face and contributed by the HF Datasets community
causal-lm/ConvAI dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for EPO-RL-data
This is the official data collection for the paper "EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning". Please see the paper and code for more information:
paper: https://arxiv.org/abs/2502.12486
code: https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/EPO
Uses
This dataset was used to train the strategic reasoning model (Llama-3-8B-Instruct) via RL, as reported in our paper. Note that dump.rdb was… See the full description on the dataset page: https://huggingface.co/datasets/Tongyi-ConvAI/EPO-RL-data.
Dataset Card for "oasst1-guanaco-damo-convai-pro"
More Information needed
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Nadi_Indic466K_Instruct Dataset
The Nadi_Indic466K_Instruct dataset is the world's first coding dataset supporting 18 Indian languages, with 466k rows and 142 million total tokens. Developers can use it to build Indian-language coding large language models (LLMs) for various programming languages. Q-LoRA-based SFT/PPO/DPO fine-tuning can be performed on the dataset with LLAMA-2, Mistral, or any open-source LLM for text generation. The dataset was carefully curated such that the coding part… See the full description on the dataset page: https://huggingface.co/datasets/convaiinnovations/Nadi_Indic466k_Instruct.