CodeParrot 🦜 Dataset
What is it?
This is the full CodeParrot dataset. It contains the Python files used to train the code generation model in Chapter 10, "Training Transformers from Scratch", of the NLP with Transformers book. The full code is available in the accompanying GitHub repository.
Creation
It was created from the GitHub dataset available via Google's BigQuery. It contains approximately 22 million Python files and is 180 GB in size (50 GB compressed). The… See the full description on the dataset page: https://huggingface.co/datasets/transformersbook/codeparrot.
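As a rough back-of-envelope check, the approximate figures above imply an average file size and a compression ratio. The sketch below computes both. It assumes decimal gigabytes (10^9 bytes), and because the inputs are approximate, the results are estimates rather than exact properties of the dataset. The constant and function names are illustrative only.

```python
# Approximate figures quoted in the dataset description.
NUM_FILES = 22_000_000
UNCOMPRESSED_GB = 180
COMPRESSED_GB = 50
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes, not GiB


def average_file_size_bytes(num_files, total_gb, bytes_per_gb=BYTES_PER_GB):
    """Average size of one file in bytes, given the total size in GB."""
    return total_gb * bytes_per_gb / num_files


def compression_ratio(uncompressed_gb, compressed_gb):
    """How many times smaller the compressed dataset is than the raw one."""
    return uncompressed_gb / compressed_gb


if __name__ == "__main__":
    avg = average_file_size_bytes(NUM_FILES, UNCOMPRESSED_GB)
    print(f"Average file size: {avg / 1000:.1f} KB")
    print(f"Compression ratio: {compression_ratio(UNCOMPRESSED_GB, COMPRESSED_GB):.1f}x")
```

Under these assumptions, the average Python file is roughly 8 KB, and the compressed copy is about 3.6 times smaller than the raw data.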