82 datasets found

Python Code Instruction
kaggle.com
Updated Nov 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). Python Code Instruction [Dataset]. https://www.kaggle.com/datasets/thedevastator/python-code-instruction-dataset/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 30, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Python Code Instruction

Training Data with Instruction, Input, Output, and Prompt Columns

By Tarun Bisht (From Huggingface) [source]

About this dataset

The python_code_instructions_18k_alpaca dataset is a comprehensive training dataset specifically curated for researchers and developers involved in the analysis and comprehension of Python code instructions. It contains a vast collection of Python code snippets along with their corresponding instruction, input, output, and prompt information. By utilizing this dataset, users can gain valuable insights into various Python programming concepts and techniques.

The dataset is organized into columns to facilitate easy access to the required information. The instruction column holds the specific task or instruction that the Python code snippet is designed to perform. This allows users to understand the purpose or requirement of each code snippet at a glance.

The input column contains all necessary input data or parameters that are required for executing the Python code snippet accurately. These inputs provide context and enable users to comprehend how different variables or values impact the overall functioning of each code snippet.

Likewise, the output column presents expected results or outcomes that should be produced when executing each Python code snippet with its specified input values. This allows for validation and verification purposes, ensuring that each code snippet performs as intended.

In addition to instruction, input, and output details, this dataset also includes prompts. The prompt column provides additional context or information intended to assist users in better understanding the purpose or requirements of each particular Python code snippet.

By leveraging this comprehensive python_code_instructions_18k_alpaca training dataset, researchers and developers can delve into numerous real-world examples of Python programming challenges - helping them enhance their coding skills while gaining invaluable knowledge about effective implementation techniques across various domains

Research Ideas

Code Instruction Analysis: This dataset can be used to analyze different types of Python code instructions and identify patterns or common practices. Researchers or developers can use this dataset to gain insights into effective ways of writing code instructions.

Code Output Prediction: With the given input and instruction, this dataset can be used to train models for predicting the expected output of a Python code snippet. This can be useful in automating the testing process or verifying the correctness of the code.

Prompt Generation: Developers often struggle with providing clear and concise prompts for their code snippets. This dataset can serve as a resource for generating prompts by analyzing existing examples and extracting key information or requirements from them

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv | Column name | Description | |:----------------|:------------------------------------------------------------------------------------------------------------------| | instruction | Specific tasks or instructions assigned to each Python code snippet. (Text) | | input | The input data or parameters required for executing the code instruction. (Text) | | output | The expected result or output that should be produced when executing the code instruction. (Text) | | prompt | Additional information or context to help understand the purpose or requirements of each code instruction. (Text) |

Acknowledgements

If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Tarun Bisht (From Huggingface).
Python for Machine Learning - Crash Course
kaggle.com
Updated Jun 24, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kunaal Naik (2023). Python for Machine Learning - Crash Course [Dataset]. https://www.kaggle.com/datasets/funxexcel/python-for-machine-learning-crash-course/data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 24, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Kunaal Naik
Description
Dataset

This dataset was created by Kunaal Naik

Contents
truncated slab data for python course
kaggle.com
zip
Updated Nov 4, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Katja Rodionova (2018). truncated slab data for python course [Dataset]. https://www.kaggle.com/kpkatja/truncated-slab-data-for-python-course
Explore at:
zip(3292 bytes)Available download formats
Dataset updated
Nov 4, 2018
Authors
Katja Rodionova
Description
Dataset

This dataset was created by Katja Rodionova

Contents
Data from: Code4ML: a Large-scale Dataset of annotated Machine Learning Code...
zenodo.org
Updated May 18, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ekaterina Trofimova; Ekaterina Trofimova; Emil Sataev; Anastasia Drozdova; Polina Guseva; Anna Scherbakova; Andrey Ustyuzhanin; Anastasia Gorodilova; Valeriy Berezovskiy; Emil Sataev; Anastasia Drozdova; Polina Guseva; Anna Scherbakova; Andrey Ustyuzhanin; Anastasia Gorodilova; Valeriy Berezovskiy (2024). Code4ML: a Large-scale Dataset of annotated Machine Learning Code [Dataset]. http://doi.org/10.5281/zenodo.11213783
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.11213783
Dataset updated
May 18, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Ekaterina Trofimova; Ekaterina Trofimova; Emil Sataev; Anastasia Drozdova; Polina Guseva; Anna Scherbakova; Andrey Ustyuzhanin; Anastasia Gorodilova; Valeriy Berezovskiy; Emil Sataev; Anastasia Drozdova; Polina Guseva; Anna Scherbakova; Andrey Ustyuzhanin; Anastasia Gorodilova; Valeriy Berezovskiy
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is an enriched version of Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python code snippets, competition summaries, and data descriptions from Kaggle. The initial corpus consists of ≈ 2.5 million snippets of ML code collected from ≈ 100 thousand Jupyter notebooks. A representative fraction of the snippets is annotated by human assessors through a user-friendly interface specially designed for that purpose.

The data is organized as a set of tables in CSV format. It includes several central entities: raw code blocks collected from Kaggle (code_blocks.csv), kernels (kernels_meta.csv) and competitions meta information (competitions_meta.csv). Manually annotated code blocks are presented as a separate table (murkup_data.csv). As this table contains the numeric id of the code block semantic type, we also provide a mapping from the id to semantic class and subclass (vertices.csv).

Snippets information (code_blocks.csv) can be mapped with kernels meta-data via kernel_id. Kernels metadata is linked to Kaggle competitions information through comp_name. To ensure the quality of the data kernels_meta.csv includes only notebooks with an available Kaggle score.

Automatic classification of code_blocks are stored in data_with_preds.csv. The mapping of this table with code_blocks.csv can be doe through code_blocks_index column, which corresponds to code_blocks indices.

The updated Code4ML 2.0 corpus includes kernels retrieved from Code Kaggle Meta. These kernels correspond to the kaggle competitions launched since 2020. The natural descriptions of the competitions are retrieved with the aim of LLM.

kernels_meta2.csv may contain kernels without Kaggle score, but with the place in the leader board (rank).

Code4ML 2.0 dataset can be used for various purposes, including training and evaluating models for code generation, code understanding, and natural language processing tasks.
R
Custom Yolov7 On Kaggle On Custom Dataset
universe.roboflow.com
zip
Updated Jan 29, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Owais Ahmad (2023). Custom Yolov7 On Kaggle On Custom Dataset [Dataset]. https://universe.roboflow.com/owais-ahmad/custom-yolov7-on-kaggle-on-custom-dataset-rakiq
Explore at:
zipAvailable download formats
Dataset updated
Jan 29, 2023
Dataset authored and provided by
Owais Ahmad
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Person Car Bounding Boxes
Description
Custom Training with YOLOv7 🔥

Some Important links

Model Inference🤖

🚀Training Yolov7 on Kaggle

Weight and Biases 🐝

HuggingFace 🤗 Model Repo

Contact Information

Name - Owais Ahmad

Phone - +91-9515884381

Email - owaiskhan9654@gmail.com

Portfolio - https://owaiskhan9654.github.io/

Objective

To Showcase custom Object Detection on the Given Dataset to train and Infer the Model using newly launched YoloV7.

Data Acquisition

The goal of this task is to train a model that can localize and classify each instance of Person and Car as accurately as possible.

Link to the Downloadable Dataset

from IPython.display import Markdown, display display(Markdown("../input/Car-Person-v2-Roboflow/README.roboflow.txt"))

Custom Training with YOLOv7 🔥

In this Notebook, I have processed the images with RoboFlow because in COCO formatted dataset was having different dimensions of image and Also data set was not splitted into different Format. To train a custom YOLOv7 model we need to recognize the objects in the dataset. To do so I have taken the following steps:

Export the dataset to YOLOv7

Train YOLOv7 to recognize the objects in our dataset

Evaluate our YOLOv7 model's performance

Run test inference to view performance of YOLOv7 model at work

📦 YOLOv7

https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/car-person-2.PNG" width=800>

Image Credit - jinfagang

Step 1: Install Requirements

!git clone https://github.com/WongKinYiu/yolov7 # Downloading YOLOv7 repository and installing requirements %cd yolov7 !pip install -qr requirements.txt !pip install -q roboflow

Downloading YOLOV7 starting checkpoint

!wget "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt"

import os import glob import wandb import torch from roboflow import Roboflow from kaggle_secrets import UserSecretsClient from IPython.display import Image, clear_output, display # to display images print(f"Setup complete. Using torch {torch._version_} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")

https://camo.githubusercontent.com/dd842f7b0be57140e68b2ab9cb007992acd131c48284eaf6b1aca758bfea358b/68747470733a2f2f692e696d6775722e636f6d2f52557469567a482e706e67">

I will be integrating W&B for visualizations and logging artifacts and comparisons of different models!

YOLOv7-Car-Person-Custom

try: user_secrets = UserSecretsClient() wandb_api_key = user_secrets.get_secret("wandb_api") wandb.login(key=wandb_api_key) anonymous = None except: wandb.login(anonymous='must') print('To use your W&B account, Go to Add-ons -> Secrets and provide your W&B access token. Use the Label name as WANDB. Get your W&B access token from here: https://wandb.ai/authorize') wandb.init(project="YOLOvR",name=f"7. YOLOv7-Car-Person-Custom-Run-7")

Step 2: Assemble Our Dataset

https://uploads-ssl.webflow.com/5f6bc60e665f54545a1e52a5/615627e5824c9c6195abfda9_computer-vision-cycle.png" alt="">

In order to train our custom model, we need to assemble a dataset of representative images with bounding box annotations around the objects that we want to detect. And we need our dataset to be in YOLOv7 format.

In Roboflow, We can choose between two paths:

Convert an existing Coco dataset to YOLOv7 format. In Roboflow it supports over 30 formats object detection formats for conversion.

Uploading only these raw images and annotate them in Roboflow with Roboflow Annotate.

Version v2 Aug 12, 2022 Looks like this.

https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/Roboflow.PNG" alt="">

user_secrets = UserSecretsClient() roboflow_api_key = user_secrets.get_secret("roboflow_api")

rf = Roboflow(api_key=roboflow_api_key) project = rf.workspace("owais-ahmad").project("custom-yolov7-on-kaggle-on-custom-dataset-rakiq") dataset = project.version(2).download("yolov7")

Step 3: Training Custom pretrained YOLOv7 model

Here, I am able to pass a number of arguments: - img: define input image size - batch: determine
Python-course-gemet-english-2023
kaggle.com
Updated May 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maria Skeppstedt (2024). Python-course-gemet-english-2023 [Dataset]. https://www.kaggle.com/datasets/mariaskeppstedt/python-course-gemet-english-2023/data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 18, 2024
Dataset provided by
Kaggle
Authors
Maria Skeppstedt
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset

This dataset was created by Maria Skeppstedt

Released under CC BY-SA 4.0

Contents

Code4ML 2.0

zenodo.org

csv, txt

Updated May 19, 2025

Facebook

Twitter

Click to copy link

Link copied

Cite

Anonimous authors; Anonimous authors (2025). Code4ML 2.0 [Dataset]. http://doi.org/10.5281/zenodo.15465737

Explore at:

csv, txtAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.15465737

Dataset updated

May 19, 2025

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Anonimous authors; Anonimous authors

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This is an enriched version of the Code4ML dataset, a large-scale corpus of annotated Python code snippets, competition summaries, and data descriptions sourced from Kaggle. The initial release includes approximately 2.5 million snippets of machine learning code extracted from around 100,000 Jupyter notebooks. A portion of these snippets has been manually annotated by human assessors through a custom-built, user-friendly interface designed for this task.

The original dataset is organized into multiple CSV files, each containing structured data on different entities:

code_blocks.csv: Contains raw code snippets extracted from Kaggle.
kernels_meta.csv: Metadata for the notebooks (kernels) from which the code snippets were derived.
competitions_meta.csv: Metadata describing Kaggle competitions, including information about tasks and data.
markup_data.csv: Annotated code blocks with semantic types, allowing deeper analysis of code structure.
vertices.csv: A mapping from numeric IDs to semantic types and subclasses, used to interpret annotated code blocks.

Table 1. code_blocks.csv structure

Column	Description
code_blocks_index	Global index linking code blocks to markup_data.csv.
kernel_id	Identifier for the Kaggle Jupyter notebook from which the code block was extracted.
code_block_id	Position of the code block within the notebook.
code_block	The actual machine learning code snippet.

Table 2. kernels_meta.csv structure

Column	Description
kernel_id	Identifier for the Kaggle Jupyter notebook.
kaggle_score	Performance metric of the notebook.
kaggle_comments	Number of comments on the notebook.
kaggle_upvotes	Number of upvotes the notebook received.
kernel_link	URL to the notebook.
comp_name	Name of the associated Kaggle competition.

Table 3. competitions_meta.csv structure

Column	Description
comp_name	Name of the Kaggle competition.
description	Overview of the competition task.
data_type	Type of data used in the competition.
comp_type	Classification of the competition.
subtitle	Short description of the task.
EvaluationAlgorithmAbbreviation	Metric used for assessing competition submissions.
data_sources	Links to datasets used.
metric type	Class label for the assessment metric.

Table 4. markup_data.csv structure

Column	Description
code_block	Machine learning code block.
too_long	Flag indicating whether the block spans multiple semantic types.
marks	Confidence level of the annotation.
graph_vertex_id	ID of the semantic type.

The dataset allows mapping between these tables. For example:

code_blocks.csv can be linked to kernels_meta.csv via the kernel_id column.
kernels_meta.csv is connected to competitions_meta.csv through comp_name. To maintain quality, kernels_meta.csv includes only notebooks with available Kaggle scores.

In addition, data_with_preds.csv contains automatically classified code blocks, with a mapping back to code_blocks.csvvia the code_blocks_index column.

Code4ML 2.0 Enhancements

The updated Code4ML 2.0 corpus introduces kernels extracted from Meta Kaggle Code. These kernels correspond to the kaggle competitions launched since 2020. The natural descriptions of the competitions are retrieved with the aim of LLM.

Notebooks in kernels_meta2.csv may not have a Kaggle score but include a leaderboard ranking (rank), providing additional context for evaluation.

competitions_meta_2.csv is enriched with data_cards, decsribing the data used in the competitions.

Applications

The Code4ML 2.0 corpus is a versatile resource, enabling training and evaluation of models in areas such as:

Code generation
Code understanding
Natural language processing of code-related tasks

BoolArt-Python-Course
kaggle.com
Updated Sep 11, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
KKY (2022). BoolArt-Python-Course [Dataset]. https://www.kaggle.com/datasets/evilpsycho42/boolartpythonplaygroud
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 11, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
KKY
License
http://www.gnu.org/licenses/lgpl-3.0.htmlhttp://www.gnu.org/licenses/lgpl-3.0.html
Description
Dataset

This dataset was created by KKY

Released under GNU Lesser General Public License 3.0

Contents
Z
DECIMER Image classifier dataset
data.niaid.nih.gov
Updated Jul 9, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
M. Isabel agea (2022). DECIMER Image classifier dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6670745
Explore at:
Dataset updated
Jul 9, 2022
Dataset authored and provided by
M. Isabel agea
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Images dataset divided into train (10905114 images), validation (2115528 images) and test (544946 images) folders containing a balanced number of images for two classes (chemical structures and non-chemical structures).

The chemical structures were generated using RanDepict to random picked compounds from the ChEMBL30 database and the COCONUT database.

The non-chemical structures were generated using Python or they were retrieved from several public datasets:

COCO dataset, MIT Places-205 dataset, Visual Genome dataset, Google Open labeled Images, MMU-OCR-21 (kaggle), HandWritten_Character (kaggle), CoronaHack -Chest X-Ray-dataset (kaggle), PANDAS Augmented Images (kaggle), Bacterial_Colony (kaggle), Ceylon Epigraphy Periods (kaggle), Chinese Calligraphy Styles by Calligraphers (kaggle), Graphs Dataset (kaggle), Function_Graphs Polynomial (kaggle), sketches (kaggle), Person Face Sketches (kaggle), Art Pictograms (kaggle), Russian handwritten letters (kaggle), Handwritten Russian Letters (kaggle), Covid-19 Misinformation Tweets Labeled Dataset (kaggle) and grapheme-imgs-224x224 (kaggle).

This data was used to build a CNN classification model using as a base model EfficienNetB0 and fine tuning it. The model is available on Github.
Best Books Ever Dataset
zenodo.org
csv
Updated Nov 10, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lorena Casanova Lozano; Sergio Costa Planells; Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4265096
Dataset updated
Nov 10, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Lorena Casanova Lozano; Sergio Costa Planells; Lorena Casanova Lozano; Sergio Costa Planells
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The dataset has been collected in the frame of the Prac1 of the subject Tipology and Data Life Cycle of the Master's Degree in Data Science of the Universitat Oberta de Catalunya (UOC).

The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the larges list on the site).

Original code used to retrieve the dataset can be found on github repository: github.com/scostap/goodreads_bbe_dataset

The data was retrieved in two sets, the first 30000 books and then the remainig 22478. Dates were not parsed and reformated on the second chunk so publishDate and firstPublishDate are representet in a mm/dd/yyyy format for the first 30000 records and Month Day Year for the rest.

Book cover images can be optionally downloaded from the url in the 'coverImg' field. Python code for doing so and an example can be found on the github repo.

The 25 fields of the dataset are:

| Attributes | Definition | Completeness | | ------------- | ------------- | ------------- | | bookId | Book Identifier as in goodreads.com | 100 | | title | Book title | 100 | | series | Series Name | 45 | | author | Book's Author | 100 | | rating | Global goodreads rating | 100 | | description | Book's description | 97 | | language | Book's language | 93 | | isbn | Book's ISBN | 92 | | genres | Book's genres | 91 | | characters | Main characters | 26 | | bookFormat | Type of binding | 97 | | edition | Type of edition (ex. Anniversary Edition) | 9 | | pages | Number of pages | 96 | | publisher | Editorial | 93 | | publishDate | publication date | 98 | | firstPublishDate | Publication date of first edition | 59 | | awards | List of awards | 20 | | numRatings | Number of total ratings | 100 | | ratingsByStars | Number of ratings by stars | 97 | | likedPercent | Derived field, percent of ratings over 2 starts (as in GoodReads) | 99 | | setting | Story setting | 22 | | coverImg | URL to cover image | 99 | | bbeScore | Score in Best Books Ever list | 100 | | bbeVotes | Number of votes in Best Books Ever list | 100 | | price | Book's price (extracted from Iberlibro) | 73 |
A
‘COVID-19 dataset in Japan’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘COVID-19 dataset in Japan’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-covid-19-dataset-in-japan-2665/latest
Explore at:
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Japan
Description
Analysis of ‘COVID-19 dataset in Japan’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/lisphilar/covid19-dataset-in-japan on 28 January 2022.

--- Dataset description provided by original source is as follows ---

1. Context

This is a COVID-19 dataset in Japan. This does not include the cases in Diamond Princess cruise ship (Yokohama city, Kanagawa prefecture) and Costa Atlantica cruise ship (Nagasaki city, Nagasaki prefecture). - Total number of cases in Japan - The number of vaccinated people (New/experimental) - The number of cases at prefecture level - Metadata of each prefecture

Note: Lisphilar (author) uploads the same files to https://github.com/lisphilar/covid19-sir/tree/master/data

This dataset can be retrieved with CovsirPhy (Python library).

pip install covsirphy --upgrade

import covsirphy as cs data_loader = cs.DataLoader() japan_data = data_loader.japan() # The number of cases (Total/each province) clean_df = japan_data.cleaned() # Metadata meta_df = japan_data.meta()

Please refer to CovsirPhy Documentation: Japan-specific dataset.

Note: Before analysing the data, please refer to Kaggle notebook: EDA of Japan dataset and COVID-19: Government/JHU data in Japan. The detailed explanation of the build process is discussed in Steps to build the dataset in Japan. If you find errors or have any questions, feel free to create a discussion topic.

1.1 Total number of cases in Japan

covid_jpn_total.csv Cumulative number of cases: - PCR-tested / PCR-tested and positive - with symptoms (to 08May2020) / without symptoms (to 08May2020) / unknown (to 08May2020) - discharged - fatal

The number of cases: - requiring hospitalization (from 09May2020) - hospitalized with mild symptoms (to 08May2020) / severe symptoms / unknown (to 08May2020) - requiring hospitalization, but waiting in hotels or at home (to 08May2020)

In primary source, some variables were removed on 09May2020. Values are NA in this dataset from 09May2020.

Manually collected the data from Ministry of Health, Labour and Welfare HP:
厚生労働省 HP (in Japanese)
Ministry of Health, Labour and Welfare HP (in English)

The number of vaccinated people: - Vaccinated_1st: the number of vaccinated persons for the first time on the date - Vaccinated_2nd: the number of vaccinated persons with the second dose on the date - Vaccinated_3rd: the number of vaccinated persons with the third dose on the date

Data sources for vaccination: - To 09Apr2021: 厚生労働省 HP 新型コロナワクチンの接種実績(in Japanese) - 首相官邸新型コロナワクチンについて - From 10APr2021: Twitter: 首相官邸（新型コロナワクチン情報）

1.2 The number of cases at prefecture level

covid_jpn_prefecture.csv Cumulative number of cases: - PCR-tested / PCR-tested and positive - discharged - fatal

The number of cases: - requiring hospitalization (from 09May2020) - hospitalized with severe symptoms (from 09May2020)

Using pdf-excel converter, manually collected the data from Ministry of Health, Labour and Welfare HP:
厚生労働省 HP (in Japanese)
Ministry of Health, Labour and Welfare HP (in English)

Note: covid_jpn_prefecture.groupby("Date").sum() does not match covid_jpn_total. When you analyse total data in Japan, please use covid_jpn_total data.

1.3 Metadata of each prefecture

covid_jpn_metadata.csv - Population (Total, Male, Female): 厚生労働省厚生統計要覧（2017年度）第１－５表 - Area (Total, Habitable): Wikipedia 都道府県の面積一覧 (2015)

Hospital_bed: With the primary data of 厚生労働省感染症指定医療機関の指定状況（平成31年4月1日現在）, 厚生労働省第二種感染症指定医療機関の指定状況（平成31年4月1日現在）, 厚生労働省医療施設動態調査（令和２年１月末概数）, 厚生労働省感染症指定医療機関について and secondary data of COVID-19 Japan 都道府県別感染症病床数,

Specific: Hospital beds of medical institutions designated for specific infectious diseases

Type-I: Hospital beds of medical institutions designated for type I infectious diseases

Type-II: Hospital beds of medical institutions designated for type II infectious diseases

Tuberculosis: Hospital beds of medical institutions designated for tuberculosis (outpatient care)

Care: long term care bed of hospitals

Total: Beds of all hospitals

Clinic_bed: With the primary data of 医療施設動態調査（令和２年１月末概数） ,

Care: long term care beds of clinics

Total: Beds of all clinics

Location: Data is from LinkData 都道府県庁所在地 (Public Domain) (secondary data).

Latitude

Longitude

Admin

Capital: Prefectural capital city. Data is from LinkData 都道府県庁所在地 (Public Domain) (secondary data).

Region: Region name. Data is from WIkipedia (secondary data). "Kyushu-Okinawa region" was separated to "Kyushu" and "Okinawa" by this datasets' author.

Num: Prefecture code (JIS X 0401: Hokkaido=1,...Okinawa=47). Data is from 国土交通省 GIS HP Pref code. cf. (not source) Japan VIsitor: Japan Prefectures Map.

2. Acknowledgements

To create this dataset, edited and transformed data of the following sites was used.

厚生労働省 Ministry of Health, Labour and Welfare, Japan:
厚生労働省 HP (in Japanese)
Ministry of Health, Labour and Welfare HP (in English) 厚生労働省 HP 利用規約・リンク・著作権等 CC BY 4.0 (in Japanese)

国土交通省 Ministry of Land, Infrastructure, Transport and Tourism, Japan: 国土交通省 HP (in Japanese) 国土交通省 HP (in English) 国土交通省 HP 利用規約・リンク・著作権等 CC BY 4.0 (in Japanese)

Code for Japan / COVID-19 Japan: Code for Japan COVID-19 Japan Dashboard (CC BY 4.0) COVID-19 Japan 都道府県別感染症病床数 (CC BY)

Wikipedia: Wikipedia

LinkData: LinkData (Public Domain)

Inspiration

Changes in number of cases over time

Percentage of patients without symptoms / mild or severe symptoms

What to do next to prevent outbreak

License and how to cite

Kindly cite this dataset under CC BY-4.0 license as follows. - Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan, or - Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, Kaggle Dataset, https://www.kaggle.com/lisphilar/covid19-dataset-in-japan

--- Original source retains full ownership of the source dataset ---
Vezora/Tested-188k-Python-Alpaca: Functional
kaggle.com
Updated Nov 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). Vezora/Tested-188k-Python-Alpaca: Functional [Dataset]. https://www.kaggle.com/datasets/thedevastator/vezora-tested-188k-python-alpaca-functional-pyth
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 30, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Vezora/Tested-188k-Python-Alpaca: Functional Python Code Dataset

188k Functional Python Code Samples

By Vezora (From Huggingface) [source]

About this dataset

The Vezora/Tested-188k-Python-Alpaca dataset is a comprehensive collection of functional Python code samples, specifically designed for training and analysis purposes. With 188,000 samples, this dataset offers an extensive range of examples that cater to the research needs of Python programming enthusiasts.

This valuable resource consists of various columns, including input, which represents the input or parameters required for executing the Python code sample. The instruction column describes the task or objective that the Python code sample aims to solve. Additionally, there is an output column that showcases the resulting output generated by running the respective Python code.

By utilizing this dataset, researchers can effectively study and analyze real-world scenarios and applications of Python programming. Whether for educational purposes or development projects, this dataset serves as a reliable reference for individuals seeking practical examples and solutions using Python

How to use the dataset

The Vezora/Tested-188k-Python-Alpaca dataset is a comprehensive collection of functional Python code samples, containing 188,000 samples in total. This dataset can be a valuable resource for researchers and programmers interested in exploring various aspects of Python programming.

Contents of the Dataset

The dataset consists of several columns:

output: This column represents the expected output or result that is obtained when executing the corresponding Python code sample.

instruction: It provides information about the task or instruction that each Python code sample is intended to solve.

input: The input parameters or values required to execute each Python code sample.

Exploring the Dataset

To make effective use of this dataset, it is essential to understand its structure and content properly. Here are some steps you can follow:

Importing Data: Load the dataset into your preferred environment for data analysis using appropriate tools like pandas in Python.

import pandas as pd # Load the dataset df = pd.read_csv('train.csv')

Understanding Column Names: Familiarize yourself with the column names and their meanings by referring to the provided description.

# Display column names print(df.columns)

Sample Exploration: Get an initial understanding of the data structure by examining a few random samples from different columns.

# Display random samples from 'output' column print(df['output'].sample(5))

Analyzing Instructions: Analyze different instructions or tasks present in the 'instruction' column to identify specific areas you are interested in studying or learning about.

# Count unique instructions and display top ones with highest occurrences instruction_counts = df['instruction'].value_counts() print(instruction_counts.head(10))

Potential Use Cases

The Vezora/Tested-188k-Python-Alpaca dataset can be utilized in various ways:

Code Analysis: Analyze the code samples to understand common programming patterns and best practices.

Code Debugging: Use code samples with known outputs to test and debug your own Python programs.

Educational Purposes: Utilize the dataset as a teaching tool for Python programming classes or tutorials.

Machine Learning Applications: Train machine learning models to predict outputs based on given inputs.

Remember that this dataset provides a plethora of diverse Python coding examples, allowing you to explore different

Research Ideas

Code analysis: Researchers and developers can use this dataset to analyze various Python code samples and identify patterns, best practices, and common mistakes. This can help in improving code quality and optimizing performance.

Language understanding: Natural language processing techniques can be applied to the instruction column of this dataset to develop models that can understand and interpret natural language instructions for programming tasks.

Code generation: The input column of this dataset contains the required inputs for executing each Python code sample. Researchers can build models that generate Python code based on specific inputs or task requirements using the examples provided in this dataset. This can be useful in automating repetitive programming tasks o...
A
‘Waiter's Tips Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Waiter's Tips Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-waiter-s-tips-dataset-b284/latest
Explore at:
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Waiter's Tips Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/aminizahra/tips-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

One waiter recorded information about each tip he received over a period of a few months working in one restaurant. In all he recorded 244 tips.

Acknowledgements

The data was reported in a collection of case studies for business statistics.

Bryant, P. G. and Smith, M (1995) Practical Data Analysis: Case Studies in Business Statistics. Homewood, IL: Richard D. Irwin Publishing

The dataset is also available through the Python package Seaborn.

Hint

Of course, this database has additional columns compared to other tips datasets.

Dataset info

RangeIndex: 244 entries, 0 to 243

Data columns (total 11 columns):

# Column Non-Null Count Dtype

0 total_bill 244 non-null float64

1 tip 244 non-null float64

2 sex 244 non-null object

3 smoker 244 non-null object

4 day 244 non-null object

5 time 244 non-null object

6 size 244 non-null int64

7 price_per_person 244 non-null float64

8 Payer Name 244 non-null object

9 CC Number 244 non-null int64

10 Payment ID 244 non-null object

dtypes: float64(3), int64(2), object(6)

Some details

total_bill a numeric vector, the bill amount (dollars)

tip a numeric vector, the tip amount (dollars)

sex a factor with levels Female Male, gender of the payer of the bill

Smoker a factor with levels No Yes, whether the party included smokers

day a factor with levels Friday Saturday Sunday Thursday, day of the week

time a factor with levels Day Night, rough time of day

size a numeric vector, number of ppartyeople in

--- Original source retains full ownership of the source dataset ---
T
fashion_mnist
tensorflow.org
opendatalab.com
+3more
Updated Jun 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). fashion_mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/fashion_mnist
Explore at:
Dataset updated
Jun 1, 2024
Description
Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('fashion_mnist', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.

https://storage.googleapis.com/tfds-data/visualization/fig/fashion_mnist-3.0.1.png" alt="Visualization" width="500px">
A
‘Secondary Mushroom (Date Donated 2021)’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Secondary Mushroom (Date Donated 2021)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-secondary-mushroom-date-donated-2021-057c/12557338/?iid=017-114&v=presentation
Explore at:
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Secondary Mushroom (Date Donated 2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yasserhessein/secondary-mushroom-dataset on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context

The given information is about the Secondary Mushroom Dataset, the Primary Mushroom Dataset used for the simulation and the respective metadata can be found in the zip.

Content

This dataset includes 61069 hypothetical mushrooms with caps based on 173 species (353 mushrooms per species). Each mushroom is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended (the latter class was combined with the poisonous class).

Acknowledgements

Donor: D. Wagner, dwagner93 '@' gmx.de Product of bachelor thesis at Philipps-UniversitÃ¤t Marburg, Bioinformatics Division, supervised by Dr. G. Hattab. Repository containing the related Python scripts and all the data sets: https://mushroom.mathematik.uni-marburg.de/files/ Inspired by the Mushroom Data Set of J. Schlimmer: url:https://archive.ics.uci.edu/ml/datasets/Mushroom.

Inspiration

The related Python project contains a Python module secondary_data_generation.py used to generate this data based on primary_data_edited.csv also found in the repository. Both nominal and metrical variables are a result of randomization. The simulated and ordered by species version is found in secondary_data_generated.csv. The randomly shuffled version is found in secondary_data_shuffled.csv.

Donor: D. Wagner, dwagner93 '@' gmx.de Product of bachelor thesis at Philipps-UniversitÃ¤t Marburg, Bioinformatics Division, supervised by Dr. G. Hattab. Repository containing the related Python scripts and all the data sets: https://mushroom.mathematik.uni-marburg.de/files/ Inspired by the Mushroom Data Set of J. Schlimmer: url:https://archive.ics.uci.edu/ml/datasets/Mushroom.

--- Original source retains full ownership of the source dataset ---
Answers of the Questions of the DataCamp DS Course
kaggle.com
Updated Mar 12, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Paul Driessens (2023). Answers of the Questions of the DataCamp DS Course [Dataset]. https://www.kaggle.com/datasets/pauleyd/answers-of-the-questions-of-the-datacamp-ds-course/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 12, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Paul Driessens
Description
Dataset

This dataset was created by Paul Driessens

Contents
A
‘COVID-19: Holidays of countries’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘COVID-19: Holidays of countries’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-covid-19-holidays-of-countries-d8bd/e5a9e831/
Explore at:
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘COVID-19: Holidays of countries’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/vbmokin/covid19-holidays-of-countries on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

This research is devoted to the analysis of the impact of holidays on the statistics of confirmed coronavirus diseases. The Prophet using the holidays library with holidays of countries and their regions. As of 30 June 2020, only 62 countries (some with regions) are available in the holidays library:

['AR', 'AT', 'AU', 'BD', 'BE', 'BG', 'BR', 'BY', 'CA', 'CH', 'CL', 'CN', 'CO', 'CZ', 'DE', 'DK', 'DO', 'EE', 'EG', 'ES', 'FI', 'FR', 'GB', 'GR', 'HN', 'HR', 'HU', 'ID', 'IE', 'IL', 'IN', 'IS', 'IT', 'JP', 'KE', 'KR', 'LT', 'LU', 'MX', 'MY', 'NG', 'NI', 'NL', 'NO', 'NZ', 'PE', 'PH', 'PK', 'PL', 'PT', 'PY', 'RS', 'RU', 'SE', 'SG', 'SI', 'SK', 'TH', 'TR', 'UA', 'US', 'ZA'] or ['Argentina', 'Australia', 'Austria', 'Bangladesh', 'Belarus', 'Belgium', 'Brazil', 'Bulgaria', 'Canada', 'Chile', 'China', 'Colombia', 'Croatia', 'Czechia', 'Denmark', 'Dominican Republic', 'Egypt', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Ireland', 'Israel', 'Italy', 'Japan', 'Kenya', 'Korea, Republic of', 'Lithuania', 'Luxembourg', 'Malaysia', 'Mexico', 'Netherlands', 'New Zealand', 'Nicaragua', 'Nigeria', 'Norway', 'Pakistan', 'Paraguay', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Russian Federation', 'Serbia', 'Singapore', 'Slovakia', 'Slovenia', 'South Africa', 'Spain', 'Sweden', 'Switzerland', 'Thailand', 'Turkey', 'Ukraine', 'United Kingdom', 'United States']

I will note at once that the list of available countries in the description of the holidays library contains a lot of mistakes, which I wrote to the authors.

When I asked if this list would expand, the Prophet team made it clear that they were waiting for help from the community with holidays library expand.

As of Jan 2021 (version 8.4.1), 67 countries (some with regions) are available in the holidays library: a number of data have been refined and countries ['BI', 'LV', 'MA', 'RO', 'VN' - two-letter country codes or alpha_2 of the country (ISO 3166)] added.

Unfortunately, the format of the holidays library is not very suitable for coronavirus problems, as it has a number of disadvantages. First, the names of the countries are given in one word, which makes it difficult for many of them to identify them according to their common names (ISO 3166). It is best that the dataset contains the common name and two-letter abbreviation in English according to ISO 3166 (see pycountry). Second, the dates are not adapted to the potential impact of the holidays on coronavirus statistics. It is known that after the moment of infection, the active manifestation of symptoms occurs with a delay of 4-10 days, that is a person is likely to get into the statistics on the number of diseases only after 4-7 days. Therefore, it is advisable to use the dates window of impacts: ``` Lower_window = [4, 7] Upper_window = [7, 10]

`Lower_window <= 0` But my [request](https://github.com/facebook/prophet/issues/1588#issue-661098613) to allow positive numbers in this parameter [was refused](https://github.com/facebook/prophet/issues/1588#issuecomment-661984730) by the Prophet team and [advised](https://github.com/facebook/prophet/issues/1588#issuecomment-661984730) to simply move the dates themselves. Therefore, it is advisable to shift the holiday dates by 7 days. If the researcher thinks that 7 is too much and enough is 4 days, then he simply indicates "Lower" of the window in -3. Actually, by default, it makes sense to specify parameters:

Lower_window = -3 Upper_window = 3

If necessary, these settings are easy to change ### Content This dataset: 1. Contains ISO codes, ISO names (common and official) (ISO 3166) of **70** countries (3 European countries **['Albania' - 'AL', 'Georgia' - 'GE', 'Moldova' - 'MD']** have been added). 2. Contains imported dates from the holidays library for 2020-01-20-2021-12-31 (all countries from holidays library as of Jan 2021), and the same dates, but moved 7 days forward. 3. Holidays of countries that are not in the list of holidays of the library, but which are in the data of the World Health Organization and on which considerable statistics of diseases on coronavirus are already collected. 4. Parameters for Prophet model: `lower_window, upper_window, prior_scale` If you find errors, please write to the [Discussion](https://www.kaggle.com/vbmokin/covid19-holidays-of-countries/discussion). It is planned to periodically update (and, if necessary, correct) this dataset. ### Acknowledgements Thanks to the authors of the information resources * [https://github.com/dr-prodigy/python-holidays](https://github.com/dr-prodigy/python-holidays) * [https://en.wikipedia.org/wiki/List_of_holidays_by_country](https://en.wikipedia.org/wiki/List_of_holidays_by_country) about the dates and names of holidays in different countries, which I used. Thanks for the image to <a href="https://pixabay.com/ru/users/iXimus-2352783/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=5062659">iXimus</a> from <a href="https://pixabay.com/ru/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=5062659">Pixabay</a> ### Inspiration The main task for which this dataset was created is to study the impact of holidays on the accuracy of predicting coronavirus diseases, identifying new patterns, and forming optimal solutions to counteract or minimize its spread. Tasks that need to be solved to improve this dataset in order to increase the accuracy of modeling the impact of holidays on the number of coronavirus patients: 1) Expanding the list of countries 2) Clarification of holiday dates 3) Clarification of parameters `lower_window, upper_window, prior_scale` they must be unique for each country and each holiday. Also, it is advisable to carry out similar work for each region of countries, but this will not be done in this dataset. --- Original source retains full ownership of the source dataset ---

Blog-1K

zenodo.org
data.niaid.nih.gov

application/gzip

Updated Dec 21, 2022

Facebook

Twitter

Click to copy link

Link copied

Cite

Haining Wang; Haining Wang (2022). Blog-1K [Dataset]. http://doi.org/10.5281/zenodo.7455623

Explore at:

application/gzipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.7455623

Dataset updated

Dec 21, 2022

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Haining Wang; Haining Wang

License

https://www.isc.org/downloads/software-support-policy/isc-license/https://www.isc.org/downloads/software-support-policy/isc-license/

Description

The Blog-1K corpus is a redistributable authorship identification testbed for contemporary English prose. It has 1,000 candidate authors, 16K+ posts, and a pre-defined data split (train/dev/test proportional to ca. 8:1:1). It is a subset of the Blog Authorship Corpus from Kaggle. The MD5 for Blog-1K is '0a9e38740af9f921b6316b7f400acf06'.

1. Preprocessing

We first filter out texts shorter than 1,000 characters. Then we select one thousand authors whose writings meet the following criteria:
- accumulatively at least 10,000 characters,
- accumulatively at most 49,410 characters,
- accumulatively at least 16 posts,
- accumulatively at most 40 posts, and
- each text has at least 50 function words found in the Koppel512 list (to filter out non-English prose).

Blog-1K has three columns: 'id', 'text', and 'split', where 'id' corresponds to its parent corpus.

2. Statistics

Its creation and statistics can be found in the Jupyter Notebook.

Split	# Authors	# Posts	# Characters	Avg. Characters Per Author (Std.)	Avg. Characters Per Post (Std.)
Train	1,000	16,132	30,092,057	30,092 (5,884)	1,865 (1,007)
Validation	935	2,017	3,755,362	4,016 (2,269)	1,862 (999)
Test	924	2,017	3,732,448	4,039 (2,188)	1,850 (936)

3. Usage

import pandas as pd

df = pd.read_csv('blog1000.csv.gz', compression='infer')

# read in training data
train_text, train_label = zip(*df.loc[df.split=='train'][['text', 'id']].itertuples(index=False))

4. License
All the materials is licensed under the ISC License.

5. Contact
Please contact its maintainer for questions.

Car Prices Poland
kaggle.com
Updated Jan 20, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aleksandr Glotov (2022). Car Prices Poland [Dataset]. https://www.kaggle.com/datasets/aleksandrglotov/car-prices-poland
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 20, 2022
Dataset provided by
Kaggle
Authors
Aleksandr Glotov
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Hello I am a Python developer and train the ability to parse, analyze data and create neural networks. I decided to share the dataset I'm currently working with. Maybe it will be useful to someone. Why a car? I love cars! As well as programming. If you are interested in me as a specialist, please contact me glotoffalexandr@gmail.com (I'm looking for a new job)

The dataset was assembled in January 2022. Data from a well-known car sale site in Poland (which is public). Selenium and request were used for parsing (python of course)

The dataset contains information about the make, model, generation, year of production, mileage, engine type and volume, localization and price

I have ideas for expanding the model: add body type, configuration, color, power, etc.

I see many ways to use models created on the basis of this dataset, I will describe them in notebooks
Python Statistics Essential Training
kaggle.com
Updated Nov 17, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MD.Tanvir Rahman (2023). Python Statistics Essential Training [Dataset]. https://www.kaggle.com/datasets/tanvirrahmanornob/python-statistics-essential-training
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 17, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
MD.Tanvir Rahman
Description
Dataset

This dataset was created by MD.Tanvir Rahman

Contents

Facebook

Twitter

Click to copy link

Link copied

Cite

The Devastator (2023). Python Code Instruction [Dataset]. https://www.kaggle.com/datasets/thedevastator/python-code-instruction-dataset/code

Python Code Instruction

Training Data with Instruction, Input, Output, and Prompt Columns

Explore at:

25 scholarly articles cite this dataset (View in Google Scholar)

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Nov 30, 2023

Dataset provided by

Kagglehttp://kaggle.com/

Authors

The Devastator

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

Python Code Instruction

Training Data with Instruction, Input, Output, and Prompt Columns

By Tarun Bisht (From Huggingface) [source]

About this dataset

The python_code_instructions_18k_alpaca dataset is a comprehensive training dataset specifically curated for researchers and developers involved in the analysis and comprehension of Python code instructions. It contains a vast collection of Python code snippets along with their corresponding instruction, input, output, and prompt information. By utilizing this dataset, users can gain valuable insights into various Python programming concepts and techniques.

The dataset is organized into columns to facilitate easy access to the required information. The instruction column holds the specific task or instruction that the Python code snippet is designed to perform. This allows users to understand the purpose or requirement of each code snippet at a glance.

The input column contains all necessary input data or parameters that are required for executing the Python code snippet accurately. These inputs provide context and enable users to comprehend how different variables or values impact the overall functioning of each code snippet.

Likewise, the output column presents expected results or outcomes that should be produced when executing each Python code snippet with its specified input values. This allows for validation and verification purposes, ensuring that each code snippet performs as intended.

In addition to instruction, input, and output details, this dataset also includes prompts. The prompt column provides additional context or information intended to assist users in better understanding the purpose or requirements of each particular Python code snippet.

By leveraging this comprehensive python_code_instructions_18k_alpaca training dataset, researchers and developers can delve into numerous real-world examples of Python programming challenges - helping them enhance their coding skills while gaining invaluable knowledge about effective implementation techniques across various domains

Research Ideas

Code Instruction Analysis: This dataset can be used to analyze different types of Python code instructions and identify patterns or common practices. Researchers or developers can use this dataset to gain insights into effective ways of writing code instructions.

Code Output Prediction: With the given input and instruction, this dataset can be used to train models for predicting the expected output of a Python code snippet. This can be useful in automating the testing process or verifying the correctness of the code.

Prompt Generation: Developers often struggle with providing clear and concise prompts for their code snippets. This dataset can serve as a resource for generating prompts by analyzing existing examples and extracting key information or requirements from them

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv | Column name | Description | |:----------------|:------------------------------------------------------------------------------------------------------------------| | instruction | Specific tasks or instructions assigned to each Python code snippet. (Text) | | input | The input data or parameters required for executing the code instruction. (Text) | | output | The expected result or output that should be produced when executing the code instruction. (Text) | | prompt | Additional information or context to help understand the purpose or requirements of each code instruction. (Text) |

Acknowledgements

If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Tarun Bisht (From Huggingface).

Clear search

Close search

Google apps

Main menu

Python Code Instruction

Python Code Instruction

Training Data with Instruction, Input, Output, and Prompt Columns

About this dataset

Research Ideas

Acknowledgements

License

Columns

Acknowledgements

Python for Machine Learning - Crash Course

Dataset

Contents

truncated slab data for python course

Dataset

Contents

Data from: Code4ML: a Large-scale Dataset of annotated Machine Learning Code...

Custom Yolov7 On Kaggle On Custom Dataset

Custom Training with YOLOv7 🔥

Some Important links

Contact Information

Objective

To Showcase custom Object Detection on the Given Dataset to train and Infer the Model using newly launched YoloV7.

Data Acquisition

Custom Training with YOLOv7 🔥

📦 YOLOv7

Step 1: Install Requirements

Downloading YOLOV7 starting checkpoint

Step 2: Assemble Our Dataset

Version v2 Aug 12, 2022 Looks like this.

Step 3: Training Custom pretrained YOLOv7 model

Python-course-gemet-english-2023

Dataset

Contents

Code4ML 2.0

Code4ML 2.0 Enhancements

Applications

BoolArt-Python-Course

Dataset

Contents

DECIMER Image classifier dataset

Best Books Ever Dataset

‘COVID-19 dataset in Japan’ analyzed by Analyst-2

1. Context

1.1 Total number of cases in Japan

1.2 The number of cases at prefecture level

1.3 Metadata of each prefecture

2. Acknowledgements

Inspiration

License and how to cite

Vezora/Tested-188k-Python-Alpaca: Functional

Vezora/Tested-188k-Python-Alpaca: Functional Python Code Dataset

188k Functional Python Code Samples

About this dataset

How to use the dataset

Contents of the Dataset

Exploring the Dataset

Potential Use Cases

Research Ideas

‘Waiter's Tips Dataset’ analyzed by Analyst-2

Context

Acknowledgements

Hint

Dataset info

Some details

fashion_mnist

‘Secondary Mushroom (Date Donated 2021)’ analyzed by Analyst-2

Context

Content

Acknowledgements

Inspiration

Answers of the Questions of the DataCamp DS Course

Dataset

Contents

‘COVID-19: Holidays of countries’ analyzed by Analyst-2

Context

Blog-1K

Car Prices Poland

Python Statistics Essential Training

Dataset

Contents