100+ datasets found
  1. Dataset_Python_Question_Answer

    • kaggle.com
    zip
    Updated Mar 29, 2024
    Cite
    Chinmaya (2024). Dataset_Python_Question_Answer [Dataset]. https://www.kaggle.com/datasets/chinmayadatt/dataset-python-question-answer
    Available download formats: zip (189137 bytes)
    Dataset updated
    Mar 29, 2024
    Authors
    Chinmaya
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    This dataset is about Python programming. The questions and answers were generated using Gemma. It contains more than four hundred questions about Python programming, each with a corresponding answer.

    The questions range from basic concepts such as data types, variables, and keywords to regular expressions and threading.

    I have used this dataset here

    The code used to generate the dataset is available here

  2. Data from: KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle

    • zenodo.org
    • dataon.kisti.re.kr
    • +1 more
    bin, bz2, pdf
    Updated Jul 19, 2024
    Cite
    Luigi Quaranta; Fabio Calefato; Filippo Lanubile (2024). KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle [Dataset]. http://doi.org/10.5281/zenodo.4468523
    Available download formats: bz2, pdf, bin
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Luigi Quaranta; Fabio Calefato; Filippo Lanubile
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    KGTorrent is a dataset of Python Jupyter notebooks from the Kaggle platform.

    The dataset is accompanied by a MySQL database containing metadata about the notebooks and the activity of Kaggle users on the platform. The information to build the MySQL database has been derived from Meta Kaggle, a publicly available dataset containing Kaggle metadata.

    In this package, we share the complete KGTorrent dataset (consisting of the dataset itself plus its companion database), as well as the specific version of Meta Kaggle used to build the database.

    More specifically, the package comprises the following three compressed archives:

    1. KGT_dataset.tar.bz2, the dataset of Jupyter notebooks;

    2. KGTorrent_dump_10-2020.sql.tar.bz2, the dump of the MySQL companion database;

    3. MetaKaggle27Oct2020.tar.bz2, a copy of the Meta Kaggle version used to build the database.

    Moreover, we include KGTorrent_logical_schema.pdf, the logical schema of the KGTorrent MySQL database.

  3. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Nov 27, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Available download formats: zip (167219625372 bytes)
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
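As a rough sketch of the join described above, the snippet below pairs code files with KernelVersions rows by treating each file name (without its extension) as a version id. The `Id` and `TotalVotes` column names, the sample rows, and the helper names are assumptions made for illustration; check them against the actual KernelVersions csv file.

```python
import csv
import io
from pathlib import Path


def index_kernel_versions(csv_file):
    """Map version id -> metadata row. The 'Id' column name is an assumption."""
    return {int(row["Id"]): row for row in csv.DictReader(csv_file)}


def join_files(paths, versions_by_id):
    """Pair each code file with its metadata row, skipping ids not in the index."""
    joined = []
    for path in map(Path, paths):
        version_id = int(path.stem)  # file name without extension = version id
        row = versions_by_id.get(version_id)
        if row is not None:
            joined.append((path, row))
    return joined


# Tiny in-memory stand-in for KernelVersions.csv (made-up values).
sample_csv = io.StringIO("Id,TotalVotes\n123456789,7\n123456790,0\n")
versions = index_kernel_versions(sample_csv)
pairs = join_files(["123/456/123456789.ipynb", "123/456/999.py"], versions)
for path, row in pairs:
    print(path.name, row["TotalVotes"])
```

Only the first file is paired here, because the second id does not appear in the sample metadata.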

    File organization

    The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have far fewer than 1 thousand files, because versions from private and interactive sessions are not included.
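The folder arithmetic can be sketched as follows. The function name is hypothetical, and it returns plain integers on purpose: how the folder numbers are rendered as directory names (for example, whether they are zero-padded) is not stated here and should be checked against the actual archive.

```python
def folder_indices(version_id: int) -> tuple[int, int]:
    """Return (top-level folder number, subfolder number) for a version id."""
    top = version_id // 1_000_000        # millions part, e.g. 123
    sub = (version_id // 1_000) % 1_000  # thousands part within the top folder, e.g. 456
    return top, sub


print(folder_indices(123_456_789))  # (123, 456)
```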

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  4. Python and Javascript Code

    • kaggle.com
    zip
    Updated Nov 27, 2023
    + more versions
    Cite
    Jordan Tantuico (2023). Python and Javascript Code [Dataset]. https://www.kaggle.com/datasets/jordantantuico/python-and-javascript-code
    Available download formats: zip (62697 bytes)
    Dataset updated
    Nov 27, 2023
    Authors
    Jordan Tantuico
    Description

    Dataset

    This dataset was created by Jordan Tantuico

    Contents

  5. Kaggle's Most Used Packages & Method Calls

    • kaggle.com
    zip
    Updated Jun 13, 2025
    Cite
    TheItCrow (2025). Kaggle's Most Used Packages & Method Calls [Dataset]. https://www.kaggle.com/datasets/kevinbnisch/kaggles-most-used-packages-and-method-calls
    Available download formats: zip (2405388375 bytes)
    Dataset updated
    Jun 13, 2025
    Authors
    TheItCrow
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset enriches the Meta Kaggle dataset by using Meta Kaggle Code to extract all imports (for both R and Python) and method calls (Python only) as lists, which are then added to the KernelVersions.csv file as the columns Imports and MethodCalls.

    [Figures: Most Imported R Packages; Most Imported Python Packages]


    We perform this extraction using the following three regex patterns:

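    import re
    # "from X import ..." captures X in group 1; "import X" captures X in group 2.
    PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
    # PYTHON_METHOD_REGEX is omitted here; the author notes that including it breaks the Kaggle page.
    # library(X) or require(X), with optional quotes around X.
    R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')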
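A minimal sketch of how the two published import patterns might be applied to a source string. The helper names and sample snippets are hypothetical, and the dataset's actual post-processing (for example deduplication, or reducing dotted names to top-level packages) is not described here, so none is attempted.

```python
import re

# Patterns copied from the dataset description above.
PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')


def extract_python_imports(source: str) -> list[str]:
    # findall yields (group 1, group 2) tuples; exactly one group is non-empty per match.
    return [from_name or import_name
            for from_name, import_name in PYTHON_IMPORT_REGEX.findall(source)]


def extract_r_imports(source: str) -> list[str]:
    # A single capture group, so findall returns the package names directly.
    return R_IMPORT_REGEX.findall(source)


py_source = "import numpy as np\nfrom sklearn.model_selection import train_test_split\n"
r_source = 'library(ggplot2)\nrequire("dplyr")\n'
print(extract_python_imports(py_source))
print(extract_r_imports(r_source))
```

Note that a pattern of this shape captures only the first name in a statement such as `import os, sys`.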
    

    This dataset was created on 06-06-2025. Because the required computation is very resource-intensive and cannot be run on a Kaggle kernel, the dataset is not updated on a schedule. A notebook demonstrating how to create this dataset and what insights it provides can be found here.

  6. AVATAR: Java-Python Program Translation Dataset

    • kaggle.com
    zip
    Updated Dec 3, 2022
    Cite
    makeitsimple (2022). AVATAR: Java-Python Program Translation Dataset [Dataset]. https://www.kaggle.com/datasets/hetulvpatel/avatar-javapython-program-translation-dataset
    Available download formats: zip (8699485 bytes)
    Dataset updated
    Dec 3, 2022
    Authors
    makeitsimple
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    What is AVATAR?

    Paper | Code

    • AVATAR stands for jAVA-pyThon progrAm tRanslation.
    • AVATAR is a corpus of 9,515 programming problems and their solutions written in Java and Python.

    Files Description

    • {{language}}_programms_{{split}}.tfrecord: programs for unsupervised pretraining in Java and Python, divided into train, valid, and test splits.

      Keys: code (the source code) and language (the language name).
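To make the file-name template concrete, this small sketch expands it for every language and split. The exact lowercase values substituted for `{{language}}` and `{{split}}` are assumptions based on the description; the spelling `programms` is kept as it appears in the template.

```python
from itertools import product

LANGUAGES = ("java", "python")       # assumed values for {{language}}
SPLITS = ("train", "valid", "test")  # assumed values for {{split}}


def pretraining_files() -> list[str]:
    """Expand the {{language}}_programms_{{split}}.tfrecord template."""
    return [f"{language}_programms_{split}.tfrecord"
            for language, split in product(LANGUAGES, SPLITS)]


for name in pretraining_files():
    print(name)
```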

  7. pyVips: python & deb 📦package

    • kaggle.com
    Updated Oct 23, 2023
    Cite
    Jirka Borovec (2023). pyVips: python & deb 📦package [Dataset]. https://www.kaggle.com/datasets/jirkaborovec/pyvips-python-and-deb-package
    Available format: Croissant, a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Oct 23, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jirka Borovec
    Description

    This dataset contains both the Python and Debian packages, downloaded for offline use. Their creation and usage are described in https://www.kaggle.com/code/jirkaborovec/pip-pkg-pyvips-download-4-offline

    How to use:

    1. Click "Add Data" in your own notebook
    2. Search for the dataset pyVips: python & deb package
    3. Run the installation lines below:
    !ls /kaggle/input/pyvips-python-and-deb-package
    # install the deb packages
    !dpkg -i --force-depends /kaggle/input/pyvips-python-and-deb-package/linux_packages/archives/*.deb
    # install the python wrapper
    !pip install pyvips -f /kaggle/input/pyvips-python-and-deb-package/python_packages/ --no-index
    !pip list | grep pyvips
    
  8. Python IPL Data Project

    • kaggle.com
    zip
    Updated Jan 27, 2023
    Cite
    Pawan Kumar (2023). Python IPL Data Project [Dataset]. https://www.kaggle.com/datasets/pawankumar19/python-ipl-data-project
    Available download formats: zip (161013 bytes)
    Dataset updated
    Jan 27, 2023
    Authors
    Pawan Kumar
    Description

    Dataset

    This dataset was created by Pawan Kumar

    Contents

  9. Quasi-experimental Methods

    • kaggle.com
    zip
    Updated Jun 2, 2022
    Cite
    Harry Wang (2022). Quasi-experimental Methods [Dataset]. https://www.kaggle.com/datasets/harrywang/propensity-score-matching
    Available download formats: zip (24985 bytes)
    Dataset updated
    Jun 2, 2022
    Authors
    Harry Wang
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is joint work with Gang Wang.

    smoker.csv is a simple simulated dataset on treatment results for patients, who may or may not be smokers.

    • smoker: 1 yes, 0 no
    • treatment: 1 treated, 0 not treated (control)
    • outcome: 1 dead, 0 not dead
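One simple way to look at such data is to compare death rates between treated and control patients within each smoker group, so that smoking status is held fixed. The sketch below does this with the three columns listed above; the sample rows are made up, and this is only an illustration, not necessarily the analysis the authors intend.

```python
import csv
import io
from collections import defaultdict


def outcome_rates(rows):
    """Return {(smoker, treatment): share of rows with outcome == 1}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [deaths, total]
    for row in rows:
        key = (int(row["smoker"]), int(row["treatment"]))
        counts[key][0] += int(row["outcome"])
        counts[key][1] += 1
    return {key: deaths / total for key, (deaths, total) in counts.items()}


# Made-up rows standing in for smoker.csv.
sample = io.StringIO(
    "smoker,treatment,outcome\n"
    "1,1,1\n1,1,0\n1,0,1\n1,0,1\n"
    "0,1,0\n0,1,0\n0,0,1\n0,0,0\n"
)
rates = outcome_rates(csv.DictReader(sample))
for (smoker, treatment), rate in sorted(rates.items()):
    print(f"smoker={smoker} treatment={treatment} death rate={rate:.2f}")
```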

    groupon.csv is a dataset of Groupon deals collected by Gang and used in his research paper.

    • deal_id: the ID of the deal
    • start_date: the starting date of the deal
    • min_req: minimal number of orders for the deal to work
    • treatment: 1 if has min_req, 0 otherwise
    • prom_length: the length of the deal
    • price: unit price of the item
    • discount_pct: discount percentage
    • coupon_duration: coupon duration
    • featured: whether the deal is featured or not
    • limited_supply: whether the supply of the item is limited or not
    • fb_likes: Facebook likes received
    • quantity_sold: quantity sold
    • revenue: revenue of the deal

    employment.csv is adapted from the dataset in Card and Krueger (1994), which estimates the causal effect of an increase in the state minimum wage on employment.

    • On April 1, 1992, New Jersey raised the state minimum wage from $4.25 to $5.05, while the minimum wage in Pennsylvania stayed the same at $4.25.
    • Data on employment at fast-food restaurants in NJ (0) and PA (1) were collected in February 1992 and in November 1992.
    • The dataset covers 384 restaurants in total after removing null values.
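A before/after design with a comparison state like this one is commonly analyzed with a difference-in-differences estimate. The sketch below shows only the arithmetic, taking New Jersey as the treated group and Pennsylvania as the control; the function name and the input numbers are made up and are not taken from employment.csv.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    return (treated_after - treated_before) - (control_after - control_before)


# Made-up mean employment per restaurant (not values from the dataset).
estimate = diff_in_diff(
    treated_before=20.0, treated_after=21.0,  # NJ: February vs. November 1992
    control_before=23.0, control_after=21.5,  # PA: February vs. November 1992
)
print(estimate)
```

With these invented inputs, the treated group rises by 1.0 while the control group falls by 1.5, giving an estimate of 2.5.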
  10. Python Questions and Answers

    • kaggle.com
    zip
    Updated Apr 14, 2024
    Cite
    Oscar Wang (2024). Python Questions and Answers [Dataset]. https://www.kaggle.com/datasets/orionai/python-questions-and-answers
    Available download formats: zip (3099 bytes)
    Dataset updated
    Apr 14, 2024
    Authors
    Oscar Wang
    Description

    Dataset

    This dataset was created by Oscar Wang

    Contents

  11. python-box

    • kaggle.com
    zip
    Updated Aug 27, 2022
    Cite
    HyeongChan Kim (2022). python-box [Dataset]. https://www.kaggle.com/datasets/kozistr/pythonbox
    Available download formats: zip (94465 bytes)
    Dataset updated
    Aug 27, 2022
    Authors
    HyeongChan Kim
    Description

    Dataset

    This dataset was created by HyeongChan Kim

    Contents

  12. Automobile Dataset For EDA Python And R

    • kaggle.com
    zip
    Updated Nov 15, 2023
    Cite
    Anubhav Kumar Gupta (2023). Automobile Dataset For EDA Python And R [Dataset]. https://www.kaggle.com/datasets/anubhavkumargupta/automobile-dataset-for-eda-python-and-r
    Available download formats: zip (4923 bytes)
    Dataset updated
    Nov 15, 2023
    Authors
    Anubhav Kumar Gupta
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Anubhav Kumar Gupta

    Released under Apache 2.0

    Contents

  13. Codenetpy

    • kaggle.com
    zip
    Updated May 18, 2023
    Cite
    Alex Jercan (2023). Codenetpy [Dataset]. https://www.kaggle.com/datasets/alexjercan/codenetpy
    Available download formats: zip (35078290 bytes)
    Dataset updated
    May 18, 2023
    Authors
    Alex Jercan
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Machine learning tasks on source code have become important given the growing demand for software production. Our main goal with this dataset is to support bug detection and repair.

    Content

    The dataset is based on the CodeNet project and contains Python code submissions for online coding competitions. The data is obtained by selecting consecutive attempts by a single user that resulted in fixing a buggy submission. The data therefore consists of code pairs, each annotated with the diff and the error for each changed instruction. We have already tokenized all the source code files and kept the same format as in the original dataset.
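To illustrate what a diff between a buggy and a fixed submission looks like, here is a minimal sketch using Python's standard `difflib`. The function name and the two code snippets are hypothetical, and the dataset's own annotation format may differ from a unified diff.

```python
import difflib


def annotate_pair(buggy: str, fixed: str) -> list[str]:
    """Return a line-level unified diff between two versions of a submission."""
    return list(difflib.unified_diff(
        buggy.splitlines(), fixed.splitlines(),
        fromfile="buggy", tofile="fixed", lineterm="",
    ))


# Made-up example pair: the fix converts the input to an integer.
buggy = "n = input()\nprint(n + 1)"
fixed = "n = int(input())\nprint(n + 1)"
diff = annotate_pair(buggy, fixed)
print("\n".join(diff))
```

Lines starting with `-` come from the buggy version and lines starting with `+` from the fixed one.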

    Acknowledgements

    CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks

    Inspiration

    Our goal is to create a bug detection and repair pipeline for online coding competition problems.

    • What are the most common mistakes (input, output, solving the problem)?
    • Is there any correlation between using libraries and mistakes in function calls?
    • What type of instruction is labeled as buggy the most (function call, for loop, if statement, binary operations)?
  14. Python FAQ Dataset

    • kaggle.com
    zip
    Updated Apr 10, 2024
    Cite
    William Alabi (2024). Python FAQ Dataset [Dataset]. https://www.kaggle.com/datasets/williamalabi/python-faq-dataset
    Available download formats: zip (36158 bytes)
    Dataset updated
    Apr 10, 2024
    Authors
    William Alabi
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Attention! Potential Scraped Python FAQs Inside

    This document holds a compilation of frequently asked questions (FAQs) regarding Python, presumably gathered from the authoritative source for all things Python – the official website, python.org. However, a word of caution:

    Beware of the Scrape!

    Since this collection stems from a scraping process, there's a chance the information might not be current or might lack the necessary context to be fully understood. For the most dependable and comprehensive details about Python, it's always recommended to consult the official Python documentation, which is meticulously maintained and guaranteed to be fresh.

    But what if this snippet of scraped FAQs sparks your curiosity?

    Well, fret not! This collection can serve as a springboard for further exploration. Look through the questions and if any pique your interest, use them as stepping stones to delve deeper into the official Python resources.

    Here are some ways to leverage these FAQs effectively:

    • Identify areas you'd like to learn more about: If a specific question resonates with you, head over to the official Python documentation and search for that exact topic or its close equivalent.
    • Gauge your existing Python knowledge: Review the FAQs and see how many you can answer comfortably. This can help you assess your current understanding of Python.
    • Form a foundation for further learning: These FAQs, although potentially outdated, can provide a basic framework of Python concepts. Use them as a starting point to build your knowledge with the help of the official documentation and other reliable Python learning resources.
    

    Remember, while scraped data can be a handy starting point, official sources are the gold standard for accurate and up-to-date information. So, use this collection with a critical eye and leverage it to springboard your Pythonic journey!

  15. Python Practical Exam

    • kaggle.com
    Updated Mar 3, 2024
    Cite
    Allan Kirwa (2024). Python Practical Exam [Dataset]. https://www.kaggle.com/datasets/allankirwa/python-practical-exam
    Available format: Croissant, a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Mar 3, 2024
    Dataset provided by
    Kaggle
    Authors
    Allan Kirwa
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Allan Kirwa

    Released under Apache 2.0

    Contents

  16. Seaborn (Flights, Iris, Tips)

    • kaggle.com
    zip
    Updated Jan 3, 2024
    Cite
    Mohan Pradhan (2024). Seaborn (Flights, Iris, Tips) [Dataset]. https://www.kaggle.com/datasets/mohanpradhan42/seaborn-flights-iris-tips
    Available download formats: zip (3639 bytes)
    Dataset updated
    Jan 3, 2024
    Authors
    Mohan Pradhan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Mohan Pradhan

    Released under Apache 2.0

    Contents

  17. Exploratory Data Analysis on Automobile Dataset

    • kaggle.com
    zip
    Updated Sep 12, 2022
    Cite
    Monis Ahmad (2022). Exploratory Data Analysis on Automobile Dataset [Dataset]. https://www.kaggle.com/datasets/monisahmad/automobile
    Available download formats: zip (4915 bytes)
    Dataset updated
    Sep 12, 2022
    Authors
    Monis Ahmad
    Description

    Dataset

    This dataset was created by Monis Ahmad

    Contents

  18. Practical ML implementations in Python

    • kaggle.com
    zip
    Updated Nov 21, 2025
    Cite
    Rajat Gupta (2025). Practical ML implementations in Python [Dataset]. https://www.kaggle.com/datasets/rajat95gupta/practical-ml-implementations-in-python
    Available download formats: zip (4531794 bytes)
    Dataset updated
    Nov 21, 2025
    Authors
    Rajat Gupta
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset contains implementations of different use-cases in the Machine Learning life cycle - from data extraction through deployment. There are paper implementations from scratch, and examples of file handling, model conversion, web scraping, deployment using APIs etc.

  19. Python for Data science

    • kaggle.com
    zip
    Updated Dec 9, 2024
    Cite
    Valde Junior (2024). Python for Data science [Dataset]. https://www.kaggle.com/datasets/valdejuinior/python-for-data-science/code
    Available download formats: zip (849652 bytes)
    Dataset updated
    Dec 9, 2024
    Authors
    Valde Junior
    Description

    Dataset

    This dataset was created by Valde Junior

    Contents

  20. Data set python

    • kaggle.com
    zip
    Updated Jul 13, 2023
    Cite
    Kolluri Nithin (2023). Data set python [Dataset]. https://www.kaggle.com/datasets/kollurinithin/data-set-python/code
    Available download formats: zip (309360 bytes)
    Dataset updated
    Jul 13, 2023
    Authors
    Kolluri Nithin
    Description

    Dataset

    This dataset was created by Kolluri Nithin

    Contents
