2 datasets found
  1. codeparrot

    • huggingface.co
    Updated Sep 1, 2021
    Natural Language Processing with Transformers (2021). codeparrot [Dataset]. https://huggingface.co/datasets/transformersbook/codeparrot
    Dataset authored and provided by
    Natural Language Processing with Transformers
    Description

    CodeParrot 🦜 Dataset

      What is it?

    This is the full CodeParrot dataset. It contains the Python files used to train the code generation model in Chapter 10, "Training Transformers from Scratch", of the NLP with Transformers book. The full code is in the accompanying GitHub repository.

      Creation

    The dataset was created from the GitHub dataset available via Google's BigQuery. It contains approximately 22 million Python files and is 180 GB in size (50 GB compressed). The… See the full description on the dataset page: https://huggingface.co/datasets/transformersbook/codeparrot.
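As a rough sanity check on the figures quoted above, the short sketch below derives the average file size and the compression ratio. It uses only the approximate numbers from the description and assumes decimal gigabytes (1 GB = 10^9 bytes), which the description does not specify; the function names are illustrative.

```python
# Figures quoted in the dataset description (all approximate).
NUM_FILES = 22_000_000
UNCOMPRESSED_GB = 180
COMPRESSED_GB = 50

# Assumption: decimal gigabytes (1 GB = 10**9 bytes); the source does not say.
BYTES_PER_GB = 10**9


def average_file_size_bytes(total_gb: float, num_files: int) -> float:
    """Mean size of one file in bytes, given the total size in GB."""
    return total_gb * BYTES_PER_GB / num_files


def compression_ratio(uncompressed_gb: float, compressed_gb: float) -> float:
    """How many times smaller the compressed dataset is."""
    return uncompressed_gb / compressed_gb


avg_bytes = average_file_size_bytes(UNCOMPRESSED_GB, NUM_FILES)
ratio = compression_ratio(UNCOMPRESSED_GB, COMPRESSED_GB)

print(f"average file size: about {avg_bytes / 1000:.1f} kB")
print(f"compression ratio: about {ratio:.1f}x")
```

Under these assumptions the average Python file is roughly 8 kB and the archive compresses by a factor of about 3.6, so both figures are only as precise as the rounded numbers in the description.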

  2. NLPProj

    • huggingface.co
    Updated May 11, 2025
    Ali Siddiqui (2025). NLPProj [Dataset]. https://huggingface.co/datasets/alisid03/NLPProj
    Authors
    Ali Siddiqui
    Description

    ✈️ Flight Booking Chatbot

    A conversational Python chatbot that helps users search and book flights. It integrates OpenAI's GPT model for intelligent query parsing and uses SQLite to store flight and booking data.

      🧠 Features

    • Natural language understanding powered by GPT (via OpenAI API)
    • Flight search with flexible date ranges and route parsing
    • SQLite database for storing flights and user bookings
    • Interactive CLI chatbot interface
    • View existing bookings by name and…

    See the full description on the dataset page: https://huggingface.co/datasets/alisid03/NLPProj.



codeparrot

transformersbook/codeparrot

8 scholarly articles cite this dataset (View in Google Scholar)
