Dataset Card for "COFFEE-Dataset"
This is the official dataset for "COFFEE: Boost Your Code LLMs by Fixing Bugs with Feedback". The COFFEE dataset is built for training a critic that generates natural-language feedback for erroneous code.

Filtering statistics:
Overall filtered ratio: 12.65%
Short Feedback: 0.00% (0 samples)
stdin readline present: 1.37% (639 samples)
Low Diff Score: 7.79% (3622 samples)
Low Variable Overlap: 1.75% (813 samples)
Variable Name: 1.74% (807 samples)

The number of problem… See the full description on the dataset page: https://huggingface.co/datasets/DLI-Lab/COFFEE-Dataset.
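As a quick sanity check on the filtering statistics, the sketch below re-adds the per-criterion figures listed above. It assumes (this is not stated on the card) that each filtered sample is attributed to exactly one criterion, so the per-criterion percentages should sum to the overall ratio. The helper names (`summed_ratio`, `total_filtered`, `implied_pool_size`) are hypothetical, and the pool-size estimate is approximate because the card rounds percentages to two decimals.

```python
# Per-criterion filtering statistics copied from the dataset card:
# criterion -> (percentage of samples, number of samples)
criteria = {
    "Short Feedback": (0.00, 0),
    "stdin readline present": (1.37, 639),
    "Low Diff Score": (7.79, 3622),
    "Low Variable Overlap": (1.75, 813),
    "Variable Name": (1.74, 807),
}


def summed_ratio(stats):
    """Hypothetical helper: sum the per-criterion percentages, rounded to 2 decimals."""
    return round(sum(pct for pct, _ in stats.values()), 2)


def total_filtered(stats):
    """Hypothetical helper: sum the per-criterion sample counts.

    Assumes the criteria are disjoint (each sample counted once).
    """
    return sum(n for _, n in stats.values())


def implied_pool_size(stats, name):
    """Hypothetical helper: estimate the pre-filtering pool size from one criterion.

    Only approximate, since the card's percentages are rounded.
    """
    pct, n = stats[name]
    if pct == 0:
        raise ValueError(f"cannot estimate pool size from a 0% criterion: {name}")
    return round(n / (pct / 100))


if __name__ == "__main__":
    print(summed_ratio(criteria))    # 12.65, matching the overall filtered ratio
    print(total_filtered(criteria))  # 5881
    print(implied_pool_size(criteria, "Low Diff Score"))  # roughly 46.5k
```

Under the disjointness assumption, the percentages add up exactly to the reported 12.65%, and each nonzero criterion implies a starting pool of roughly 46 to 47 thousand samples.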