1024m/PHI-4-Hindi-Instruct-Data dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for Evaluation run of microsoft/phi-4
Dataset automatically created during the evaluation run of model microsoft/phi-4. The dataset is composed of 116 configurations, each one corresponding to one of the evaluated tasks. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results. An additional configuration… See the full description on the dataset page: https://huggingface.co/datasets/OALL/details_microsoft_phi-4_v2.
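As a rough sketch of how one of these evaluation datasets can be inspected (assuming only the structure described above, with one configuration per task and a "train" split holding the latest run), the available configurations can be listed and one of them loaded with the Hugging Face datasets library:

```python
from datasets import get_dataset_config_names, load_dataset

repo = "OALL/details_microsoft_phi-4_v2"

# Each evaluated task is stored as its own configuration.
configs = get_dataset_config_names(repo)
print(f"{len(configs)} configurations, e.g. {configs[:3]}")

# Timestamped splits hold individual runs; "train" points to the latest results.
details = load_dataset(repo, configs[0], split="train")
print(details)
```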
Raw responses generated by Phi-4, with questions taken from alamios/Mistral-Small-24B-Instruct-2501-Conversations. I made it for use with the QwenPhi 0.5B Draft model, but the finetune did not yield much improvement; since the dataset had already been generated, here is the raw data in the hope that it is useful to someone.
Dataset Card for Evaluation run of microsoft/Phi-4-mini-instruct
Dataset automatically created during the evaluation run of model microsoft/Phi-4-mini-instruct. The dataset is composed of 1 configuration, corresponding to the evaluated task. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results. An… See the full description on the dataset page: https://huggingface.co/datasets/OALL/details_microsoft_Phi-4-mini-instruct_v2_alrage.
R0bfried/RAGAS-BASE-phi-4-eval2 dataset hosted on Hugging Face and contributed by the HF Datasets community
mlfoundations-dev/Phi-4-reasoning-plus_eval_2693 dataset hosted on Hugging Face and contributed by the HF Datasets community
Malthe/phi4-hallucination-embeddings dataset hosted on Hugging Face and contributed by the HF Datasets community
ykarout/code-reasoning-phi4-template dataset hosted on Hugging Face and contributed by the HF Datasets community
FineTome-Phi4-Format
This dataset contains 15000 samples from the mlabonne/FineTome-100k dataset, formatted specifically for Phi-4 models with the system prompt set to "detailed thinking off".
Format
Each example follows the Phi chat format:
<|im_start|>system<|im_sep|>detailed thinking off<|im_end|>
<|im_start|>user<|im_sep|>[User content]<|im_end|>
<|im_start|>assistant<|im_sep|>[Assistant response]<|im_end|>
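As a minimal sketch (not taken from the dataset's own preprocessing code), a conversation can be rendered into this template with a small helper like the one below; the function name and inputs are illustrative:

```python
def to_phi4_format(user_content: str, assistant_content: str) -> str:
    """Render one user/assistant turn pair into the Phi-4 chat template
    with the fixed 'detailed thinking off' system prompt described above."""
    return (
        "<|im_start|>system<|im_sep|>detailed thinking off<|im_end|>"
        f"<|im_start|>user<|im_sep|>{user_content}<|im_end|>"
        f"<|im_start|>assistant<|im_sep|>{assistant_content}<|im_end|>"
    )

print(to_phi4_format("What is 2 + 2?", "2 + 2 = 4."))
```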
Source
The original data comes from… See the full description on the dataset page: https://huggingface.co/datasets/ykarout/finetome-phi4-format.
alex-miller/crs-2014-2023-housing-labeled-phi4 dataset hosted on Hugging Face and contributed by the HF Datasets community
ykarout/nemotron-phi4-format-data dataset hosted on Hugging Face and contributed by the HF Datasets community
Nemotron Phi-4 Format Dataset
This dataset contains examples extracted from NVIDIA/Llama-Nemotron-Post-Training-Dataset, filtered and formatted for Phi-4 fine-tuning.
Dataset Details
Source: NVIDIA/Llama-Nemotron-Post-Training-Dataset
Splits: code, math, science
Filter: system_prompt = "detailed thinking on", output length between 8000 and 10000 characters
Format: Phi-4 chat template
Size: 20000 examples
Format
Examples follow the Phi-4 chat template:… See the full description on the dataset page: https://huggingface.co/datasets/ykarout/nvidia-nemotron-sampled-phi4-format.
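A hedged sketch of the filtering described above, using the Hugging Face datasets library; the split layout and the column names ("system_prompt", "output") are assumptions based on the card and may differ from the actual schema of NVIDIA/Llama-Nemotron-Post-Training-Dataset:

```python
from datasets import load_dataset, concatenate_datasets

# Assumed layout: the source exposes "code", "math" and "science" splits directly.
splits = ["code", "math", "science"]
parts = [
    load_dataset("nvidia/Llama-Nemotron-Post-Training-Dataset", split=name)
    for name in splits
]
pool = concatenate_datasets(parts)

# Filter as described on the card: reasoning turned on, medium-length outputs.
filtered = pool.filter(
    lambda ex: ex["system_prompt"] == "detailed thinking on"
    and 8000 <= len(ex["output"]) <= 10000
)

# Sample up to 20,000 examples before applying the Phi-4 chat template.
n = min(20_000, len(filtered))
sampled = filtered.shuffle(seed=42).select(range(n))
```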
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Phi-4 Math Chat Dataset (10k)
This dataset contains 10,000 carefully selected math reasoning samples formatted using the Phi-4 chat template. Each entry consists of:
A user message: the math question
An assistant message: a final, concise answer extracted from deeper reasoning chains
Format
Each example follows the Phi-4 chat format:
<|im_start|>user<|im_sep|>question<|im_end|>
<|im_start|>assistant<|im_sep|>answer<|im_end|>
Usage
This dataset is suitable… See the full description on the dataset page: https://huggingface.co/datasets/ykarout/phi4-math-chat-10k.
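For instance (a minimal, assumed usage sketch; the split name may differ), the dataset can be pulled down and inspected before fine-tuning:

```python
from datasets import load_dataset

# Load the math chat data; "train" is assumed to be the only split.
ds = load_dataset("ykarout/phi4-math-chat-10k", split="train")
print(len(ds))   # expected: 10,000 examples
print(ds[0])     # one user/assistant pair in the Phi-4 chat format
```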
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GhulamShabbirKhan/BankFAQs-Phi4 dataset hosted on Hugging Face and contributed by the HF Datasets community
gx-ai-architect/numinamath-178k-phi4-bon-verified-dpo-trl-40k-old-r1-format dataset hosted on Hugging Face and contributed by the HF Datasets community
R0bfried/RAGAS-INSTRUCT-phi-4-eval2 dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for Evaluation run of FINGU-AI/Phi-4-RRStock
Dataset automatically created during the evaluation run of model FINGU-AI/Phi-4-RRStock. The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated tasks. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results. An additional… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/FINGU-AI_Phi-4-RRStock-details.
kadirnar/Emilia-All-Ja-Phi-4-mini-instruct dataset hosted on Hugging Face and contributed by the HF Datasets community
mlfoundations-dev/Phi-4-reasoning-plus_eval_5554
Precomputed model outputs for evaluation.
Evaluation Results
Summary
Accuracy by metric:
AIME24: 76.0
AMC23: 96.2
MATH500: 84.0
MMLUPro: 14.6
JEEBench: 83.5
GPQADiamond: 66.8
LiveCodeBench: 0.8
CodeElo: 2.4
CodeForces: 3.5
HLE: 7.1
HMMT: 53.0
AIME25: 68.0
LiveCodeBenchv5: 0.5
AIME24
Average Accuracy: 76.00% ± 1.23%
Number of Runs: 10
Run | Accuracy | Questions Solved | Total Questions
1… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations-dev/Phi-4-reasoning-plus_eval_5554.
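The reported "average ± error" figures can be reproduced from per-run accuracies as a mean plus a standard error; the snippet below is an illustration with placeholder values rather than data taken from this dataset, and the standard-error formula is an assumption about how the uncertainty was computed:

```python
import statistics

# Placeholder per-run accuracies (10 runs), NOT the actual values from this dataset.
run_accuracies = [76.7, 73.3, 76.7, 80.0, 73.3, 76.7, 76.7, 73.3, 76.7, 76.7]

mean = statistics.mean(run_accuracies)
stderr = statistics.stdev(run_accuracies) / len(run_accuracies) ** 0.5  # assumed: standard error of the mean
print(f"Average Accuracy: {mean:.2f}% ± {stderr:.2f}%")
```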
Dataset Card for Evaluation run of bunnycore/Phi-4-Model-Stock-v4
Dataset automatically created during the evaluation run of model bunnycore/Phi-4-Model-Stock-v4. The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated tasks. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results.… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/bunnycore_Phi-4-Model-Stock-v4-details.