Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
This dataset is about Python programming. Questions and answers were generated using Gemma. There are more than four hundred questions with their corresponding answers, ranging from concepts such as data types, variables, and keywords to regular expressions and threading.
I have used this dataset here.
The code used to generate the dataset is available here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
KGTorrent is a dataset of Python Jupyter notebooks from the Kaggle platform.
The dataset is accompanied by a MySQL database containing metadata about the notebooks and the activity of Kaggle users on the platform. The information to build the MySQL database has been derived from Meta Kaggle, a publicly available dataset containing Kaggle metadata.
In this package, we share the complete KGTorrent dataset (consisting of the dataset itself plus its companion database), as well as the specific version of Meta Kaggle used to build the database.
More specifically, the package comprises the following three compressed archives:
KGT_dataset.tar.bz2, the dataset of Jupyter notebooks;
KGTorrent_dump_10-2020.sql.tar.bz2, the dump of the MySQL companion database;
MetaKaggle27Oct2020.tar.bz2, a copy of the Meta Kaggle version used to build the database.
Moreover, we include KGTorrent_logical_schema.pdf, the logical schema of the KGTorrent MySQL database.
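A minimal getting-started sketch: once the dump has been restored into MySQL, the companion database can be inspected from Python. The database name, credentials, and connector below are assumptions of this illustration, not part of the package.

import mysql.connector  # pip install mysql-connector-python

# Assumes the dump has already been restored, e.g. into a local database
# named "kgtorrent" (the name is arbitrary); adjust credentials to your setup.
conn = mysql.connector.connect(host="localhost", user="root", password="***", database="kgtorrent")
cur = conn.cursor()
cur.execute("SHOW TABLES")  # the tables are documented in KGTorrent_logical_schema.pdf
for (table,) in cur.fetchall():
    print(table)
conn.close()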
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.
By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.
Meta Kaggle Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle, which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between the Kaggle and Stack Overflow communities, and more.
The best part is that Meta Kaggle enriches Meta Kaggle Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code's author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!
While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.
The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions CSV file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; e.g., folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; e.g., 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
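As a minimal sketch of the layout described above (my own illustration; the exact folder naming and the per-file extension are assumptions, not documented here):

def kernel_version_path(version_id: int, ext: str = "py") -> str:
    # Top-level folder holds a block of one million ids, the subfolder a
    # block of one thousand, and the file name is the KernelVersions id.
    top = version_id // 1_000_000        # e.g. 123 for id 123,456,789
    sub = (version_id // 1_000) % 1_000  # e.g. 456 -> folder 123/456
    return f"{top}/{sub}/{version_id}.{ext}"

print(kernel_version_path(123_456_789))  # -> 123/456/123456789.py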
The .ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of .ipynb files with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket, which means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
We love feedback! Let us know in the Discussion tab.
Happy Kaggling!
This dataset was created by Jordan Tantuico
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset enriches the Meta Kaggle dataset using Meta Kaggle Code by extracting all imports (for both R and Python) and method calls (Python only) as lists, which are then added to the KernelVersions.csv file as the columns Imports and MethodCalls.
[Table: Most Imported R Packages | Most Imported Python Packages]
We perform this extraction using the following three regex patterns:
PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
PYTHON_METHOD_REGEX = *I wish I could add the regex here but kaggle kinda breaks if I do lol*
R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')
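As a minimal sketch of how the two patterns above can be applied (my own illustration; the dataset's actual extraction code differs in detail):

import re

PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')

def python_imports(source):
    # Each match captures either the "from X import ..." module (group 1)
    # or the "import X" module (group 2); keep whichever is non-empty.
    return [a or b for a, b in PYTHON_IMPORT_REGEX.findall(source)]

def r_imports(source):
    return R_IMPORT_REGEX.findall(source)

print(python_imports("import numpy as np\nfrom pandas import DataFrame"))  # ['numpy', 'pandas']
print(r_imports('library(ggplot2)\nrequire("dplyr")'))  # ['ggplot2', 'dplyr']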
This dataset was created on 06-06-2025. Since the computation required for this process is very resource-intensive and cannot be run in a Kaggle kernel, the dataset is not on an update schedule. A notebook demonstrating how to create this dataset and what insights it provides can be found here.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
{{language}}_programms_{{split}}.tfrecord: programs for unsupervised pretraining for the java and python languages, divided into train, valid, and test splits.
Keys: code (the source code) and language (the language name).
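A minimal reading sketch, assuming both keys are stored as byte-string features (the exact feature spec is an assumption of this illustration):

import tensorflow as tf

feature_spec = {
    "code": tf.io.FixedLenFeature([], tf.string),      # source code
    "language": tf.io.FixedLenFeature([], tf.string),  # language name
}

def parse(record):
    return tf.io.parse_single_example(record, feature_spec)

ds = tf.data.TFRecordDataset("python_programms_train.tfrecord").map(parse)
for example in ds.take(1):
    print(example["language"].numpy().decode(), len(example["code"].numpy()))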
Downloaded both Python and Debian packages for offline use. The creation and usage are described in https://www.kaggle.com/code/jirkaborovec/pip-pkg-pyvips-download-4-offline
# list the bundled package files
!ls /kaggle/input/pyvips-python-and-deb-package
# install the Debian packages
!dpkg -i --force-depends /kaggle/input/pyvips-python-and-deb-package/linux_packages/archives/*.deb
# install the Python wrapper
!pip install pyvips -f /kaggle/input/pyvips-python-and-deb-package/python_packages/ --no-index
# confirm the package is visible to pip
!pip list | grep pyvips
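A quick sanity check after the install (a minimal sketch; it just exercises the libvips binding end to end):

import pyvips

img = pyvips.Image.black(16, 16)  # creates a tiny all-black image via libvips
print(pyvips.__version__, img.width, img.height)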
This dataset was created by Pawan Kumar
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is joint work with Gang Wang.
smoker.csv is a simple simulated dataset on treatment results for patients who may or may not be smokers.
groupon.csv is a dataset of Groupon deals collected by Gang and used in his research paper.
employment.csv is adapted from the dataset in Card and Krueger (1994), which estimates the causal effect of an increase in the state minimum wage on employment.
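A minimal loading sketch (the input path below is hypothetical; adjust it to wherever the dataset is mounted):

import pandas as pd

base = "/kaggle/input/<dataset-slug>"  # hypothetical path, not part of the dataset
smoker = pd.read_csv(f"{base}/smoker.csv")
groupon = pd.read_csv(f"{base}/groupon.csv")
employment = pd.read_csv(f"{base}/employment.csv")
print(smoker.head())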
This dataset was created by Oscar Wang
This dataset was created by HyeongChan Kim
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Anubhav Kumar Gupta
Released under Apache 2.0
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Machine learning tasks on source code have become increasingly important with the growing demand for software production. Our main goal with this dataset is to support bug detection and repair.
The dataset is based on the CodeNet project and contains Python code submissions to online coding competitions. The data was obtained by selecting consecutive attempts by a single user that resulted in fixing a buggy submission. The data is thus represented as code pairs, annotated with the diff and the error of each changed instruction. We have already tokenized all the source code files and kept the same format as in the original dataset.
CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks
Our goal is to create a bug detection and repair pipeline for online coding competition problems.
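As a minimal illustration of the kind of pairing the dataset encodes (my own example, not taken from the data): a buggy submission, its fixed successor, and the line diff between them.

import difflib

buggy = "n = int(input())\nprint(n / 2)\n"   # float division: wrong answer
fixed = "n = int(input())\nprint(n // 2)\n"  # integer division: accepted

for line in difflib.unified_diff(buggy.splitlines(), fixed.splitlines(), lineterm=""):
    print(line)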
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Attention! Potential Scraped Python FAQs Inside
This document holds a compilation of frequently asked questions (FAQs) regarding Python, presumably gathered from the authoritative source for all things Python: the official website, python.org. However, a word of caution:
Beware of the Scrape!
Since this collection stems from a scraping process, there's a chance the information might not be current or might lack the context needed to be fully understood. For the most dependable and comprehensive details about Python, it's always recommended to consult the official Python documentation, which is meticulously maintained and kept up to date.
But what if this snippet of scraped FAQs sparks your curiosity?
Well, fret not! This collection can serve as a springboard for further exploration. Look through the questions, and if any pique your interest, use them as stepping stones to delve deeper into the official Python resources.
Here are some ways to leverage these FAQs effectively:
Identify areas you'd like to learn more about: If a specific question resonates with you, head over to the official Python documentation and search for that exact topic or its close equivalent.
Gauge your existing Python knowledge: Review the FAQs and see how many you can answer comfortably. This can help you assess your current understanding of Python.
Form a foundation for further learning: These FAQs, although potentially outdated, can provide a basic framework of Python concepts. Use them as a starting point to build your knowledge with the help of the official documentation and other reliable Python learning resources.
Remember, while scraped data can be a handy starting point, official sources are the gold standard for accurate and up-to-date information. So, use this collection with a critical eye and let it be a springboard for your Pythonic journey!
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Allan Kirwa
Released under Apache 2.0
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Mohan Pradhan
Released under Apache 2.0
This dataset was created by Monis Ahmad
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains implementations of different use cases in the machine learning life cycle, from data extraction through deployment. There are paper implementations from scratch, as well as examples of file handling, model conversion, web scraping, deployment using APIs, and more.
This dataset was created by Valde Junior
This dataset was created by Kolluri Nithin