100+ datasets found
  1. Dataset_Python_Question_Answer

    • kaggle.com
    zip
    Updated Mar 29, 2024
    Cite
    Chinmaya (2024). Dataset_Python_Question_Answer [Dataset]. https://www.kaggle.com/datasets/chinmayadatt/dataset-python-question-answer
    Explore at:
    zip (189137 bytes)
    Dataset updated
    Mar 29, 2024
    Authors
    Chinmaya
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0)https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    This dataset is about Python programming. Questions and answers were generated using Gemma. There are more than four hundred questions and their corresponding answers about Python programming.

    Questions range from concepts such as data types, variables, and keywords to regular expressions and threading.

    I have used this dataset here.

    The code used to generate the dataset is available here.

  2. Exploratory Data Analysis on Automobile Dataset

    • kaggle.com
    zip
    Updated Sep 12, 2022
    Cite
    Monis Ahmad (2022). Exploratory Data Analysis on Automobile Dataset [Dataset]. https://www.kaggle.com/datasets/monisahmad/automobile
    Explore at:
    zip (4915 bytes)
    Dataset updated
    Sep 12, 2022
    Authors
    Monis Ahmad
    Description

    Dataset

    This dataset was created by Monis Ahmad

    Contents

  3. rlvr-code-data-python-r1

    • huggingface.co
    Updated Jun 30, 2025
    + more versions
    Cite
    Nathan Lambert (2025). rlvr-code-data-python-r1 [Dataset]. https://huggingface.co/datasets/natolambert/rlvr-code-data-python-r1
    Explore at:
    Dataset updated
    Jun 30, 2025
    Authors
    Nathan Lambert
    Description

    natolambert/rlvr-code-data-python-r1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. Datasets for manuscript "A data engineering framework for chemical flow...

    • catalog.data.gov
    • gimi9.com
    Updated Nov 7, 2021
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Datasets for manuscript "A data engineering framework for chemical flow analysis of industrial pollution abatement operations" [Dataset]. https://catalog.data.gov/dataset/datasets-for-manuscript-a-data-engineering-framework-for-chemical-flow-analysis-of-industr
    Explore at:
    Dataset updated
    Nov 7, 2021
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    The EPA GitHub repository PAU4Chem, as described in its README.md file, contains Python scripts written to build the PAU dataset modules (technologies, capital and operating costs, and chemical prices) for tracking chemical flow transfers, estimating releases, and identifying potential occupational exposure scenarios in pollution abatement units (PAUs). These PAUs are employed for on-site chemical end-of-life management. The folder datasets contains the outputs for each framework step, and the file Chemicals_in_categories.csv contains the chemicals for the TRI chemical categories.

    The EPA GitHub repository PAU_case_study, as described in its readme.md entry, contains the Python scripts to run the manuscript case study for designing the PAUs, the data-driven models, and the decision-making module for chemicals of concern and tracking flow transfers at the end-of-life stage. The data was obtained by means of data engineering using different publicly-available databases. The properties of chemicals were obtained using the GitHub repository Properties_Scraper, while the PAU dataset was built using the repository PAU4Chem.

    Finally, the EPA GitHub repository Properties_Scraper contains a Python script to massively gather information about exposure limits and physical properties from different publicly-available sources: EPA, NOAA, OSHA, and the Institute for Occupational Safety and Health of the German Social Accident Insurance (IFA). All three GitHub repositories describe the Python libraries required for running their code, how to use them, the output files obtained after running the Python script modules, and the corresponding EPA Disclaimer.

    This dataset is associated with the following publication: Hernandez-Betancur, J.D., M. Martin, and G.J. Ruiz-Mercado. A data engineering framework for on-site end-of-life industrial operations. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA, 327: 129514, (2021).

  5. Data from: ManyTypes4Py: A benchmark Python Dataset for Machine...

    • data.europa.eu
    unknown
    Updated Feb 28, 2021
    Cite
    Zenodo (2021). ManyTypes4Py: A benchmark Python Dataset for Machine Learning-Based Type Inference [Dataset]. https://data.europa.eu/88u/dataset/oai-zenodo-org-4571228
    Explore at:
    unknown (395470535)
    Dataset updated
    Feb 28, 2021
    Dataset authored and provided by
    Zenodohttp://zenodo.org/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset was gathered on Sep. 17th, 2020. It has more than 5.4K Python repositories that are hosted on GitHub. Check out the file ManyTypes4PyDataset.spec for repository URLs and their commit SHAs. The dataset is also de-duplicated using the CD4Py tool; the list of duplicate files is provided in the duplicate_files.txt file. All of its Python projects are processed into JSON-formatted files. They contain a seq2seq representation of each file, type-related hints, and information for machine learning models. The structure of the JSON-formatted files is described in the JSONOutput.md file. The dataset is split into train, validation, and test sets by source code files. The list of files and their corresponding set is provided in the dataset_split.csv file. Notable changes to each version of the dataset are documented in CHANGELOG.md.
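
    A minimal sketch of consuming the split file in Python (the column layout of dataset_split.csv is an assumption here, so inspect a few rows before relying on it):

        import csv

        # dataset_split.csv maps each source file to its train/validation/test
        # set; the exact column order is an assumption.
        with open("dataset_split.csv") as f:
            rows = list(csv.reader(f))

        print(rows[:3])  # inspect the actual layout first
        train_rows = [r for r in rows if "train" in r]  # crude filter on the set label
        print(len(train_rows), "files assigned to the training set")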

  6. Python code used to download U.S. Census Bureau data for public-supply water...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 19, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Python code used to download U.S. Census Bureau data for public-supply water service areas [Dataset]. https://catalog.data.gov/dataset/python-code-used-to-download-u-s-census-bureau-data-for-public-supply-water-service-areas
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    U.S. Geological Survey
    Description

    This child item describes Python code used to query census data from the TigerWeb Representational State Transfer (REST) services and the U.S. Census Bureau Application Programming Interface (API). These data were needed as input feature variables for a machine learning model to predict public supply water use for the conterminous United States. Census data were retrieved for public-supply water service areas, but the census data collector could be used to retrieve data for other areas of interest. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000-2020. Data retrieved by the census data collector code were used as input features in the public supply delivery and water use machine learning models. This page includes the following file: census_data_collector.zip - a zip file containing the census data collector Python code used to retrieve data from the U.S. Census Bureau and a README file.
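
    The census data collector itself ships in census_data_collector.zip; purely as an illustration of the underlying U.S. Census Bureau API, a direct query looks roughly like the sketch below (the dataset, variable, and geography parameters are assumptions for demonstration, not necessarily those used by the USGS code):

        import requests

        # Example: total population (B01001_001E) from the 2020 ACS 5-year
        # estimates for every county. An API key is optional for light use.
        url = "https://api.census.gov/data/2020/acs/acs5"
        params = {"get": "NAME,B01001_001E", "for": "county:*"}
        rows = requests.get(url, params=params, timeout=30).json()

        header, data = rows[0], rows[1:]  # the first row is the column header
        print(header)
        print(data[:3])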

  7. Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
    Explore at:
    txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts at filtering those projects to curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file that contains the list of the project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

    GitHub page: https://github.com/soarsmu/NICHE
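
    A minimal sketch of exploring the file with pandas (the label column name below is a hypothetical placeholder; check the actual header first):

        import pandas as pd

        niche = pd.read_csv("NICHE.csv")
        print(niche.columns.tolist())          # inspect the real schema

        # Count engineered vs. non-engineered projects; per the description
        # this should come out to 441 vs. 131. "Label" is an assumed name.
        print(niche["Label"].value_counts())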

  8. Ecommerce Dataset for Data Analysis

    • kaggle.com
    zip
    Updated Sep 19, 2024
    Cite
    Shrishti Manja (2024). Ecommerce Dataset for Data Analysis [Dataset]. https://www.kaggle.com/datasets/shrishtimanja/ecommerce-dataset-for-data-analysis/code
    Explore at:
    zip (2028853 bytes)
    Dataset updated
    Sep 19, 2024
    Authors
    Shrishti Manja
    Description

    This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.

    About the Dataset:
    • CID (Customer ID): A unique identifier for each customer.
    • TID (Transaction ID): A unique identifier for each transaction.
    • Gender: The gender of the customer, categorized as Male or Female.
    • Age Group: Age group of the customer, divided into several ranges.
    • Purchase Date: The timestamp of when the transaction took place.
    • Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
    • Discount Availed: Indicates whether the customer availed any discount (Yes/No).
    • Discount Name: Name of the discount applied (e.g., FESTIVE50).
    • Discount Amount (INR): The amount of discount availed by the customer.
    • Gross Amount: The total amount before applying any discount.
    • Net Amount: The final amount after applying the discount.
    • Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
    • Location: The city where the purchase took place.

    Use Cases:
    1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
    2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
    3. Data Visualization: Use tools like Python’s Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
    4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.

    This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.

    This is not a real dataset; it was generated using Python's Faker library for the sole purpose of learning.
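
    For flavor, here is a minimal sketch of how similar synthetic transactions can be generated with Faker (the field choices below are simplified assumptions, not the actual generator behind this dataset):

        import random
        from faker import Faker

        fake = Faker("en_IN")  # Indian locale, since amounts are in INR

        def make_transaction():
            gross = round(random.uniform(100.0, 50000.0), 2)
            discount = random.choice([0.0, round(gross * 0.1, 2)])
            return {
                "CID": fake.uuid4(),
                "TID": fake.uuid4(),
                "Gender": random.choice(["Male", "Female"]),
                "Purchase Date": fake.date_time_this_decade(),
                "Product Category": random.choice(["Electronics", "Apparel"]),
                "Discount Availed": "Yes" if discount else "No",
                "Discount Amount (INR)": discount,
                "Gross Amount": gross,
                "Net Amount": round(gross - discount, 2),
                "Location": fake.city(),
            }

        rows = [make_transaction() for _ in range(1000)]  # small sample
        print(rows[0])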

  9. Pandas Practice Dataset

    • kaggle.com
    zip
    Updated Jan 27, 2023
    Cite
    Mrityunjay Pathak (2023). Pandas Practice Dataset [Dataset]. https://www.kaggle.com/datasets/themrityunjaypathak/pandas-practice-dataset/discussion
    Explore at:
    zip (493 bytes)
    Dataset updated
    Jan 27, 2023
    Authors
    Mrityunjay Pathak
    License

    Public Domain Dedication (CC0 1.0)https://creativecommons.org/publicdomain/zero/1.0/

    Description

    What is Pandas?

    Pandas is a Python library used for working with data sets.

    It has functions for analyzing, cleaning, exploring, and manipulating data.

    The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was created by Wes McKinney in 2008.

    Why Use Pandas?

    Pandas allows us to analyze big data and make conclusions based on statistical theories.

    Pandas can clean messy data sets and make them readable and relevant.

    Relevant data is very important in data science.

    What Can Pandas Do?

    Pandas gives you answers about the data, such as:

    Is there a correlation between two or more columns?

    What is the average value?

    What is the max value?

    What is the min value?
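
    In pandas each of these questions is a one-liner; a minimal sketch (the file and column names are placeholders):

        import pandas as pd

        df = pd.read_csv("data.csv")        # placeholder file name

        print(df.corr(numeric_only=True))   # correlation between numeric columns
        print(df["value"].mean())           # average value of a column
        print(df["value"].max())            # max value
        print(df["value"].min())            # min value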

  10. Project Python- Data Cleaning - EDA- Visualization

    • kaggle.com
    zip
    Updated Dec 10, 2023
    Cite
    Hussein Al Chami (2023). Project Python- Data Cleaning - EDA- Visualization [Dataset]. https://www.kaggle.com/datasets/husseinalchami/project-python-data-cleaning-eda-visualization
    Explore at:
    zip (322085 bytes)
    Dataset updated
    Dec 10, 2023
    Authors
    Hussein Al Chami
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Hussein Al Chami

    Released under MIT

    Contents

  11. 150k Python Dataset V1.0

    • kaggle.com
    zip
    Updated Dec 4, 2024
    + more versions
    Cite
    Ailurophile (2024). 150k Python Dataset V1.0 [Dataset]. https://www.kaggle.com/datasets/veeralakrishna/150k-python-dataset-v1-0/code
    Explore at:
    zip (1065036252 bytes)
    Dataset updated
    Dec 4, 2024
    Authors
    Ailurophile
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    150k Python Dataset

    This dataset is released as part of the Machine Learning for Programming project, which aims to create new kinds of programming tools and techniques based on machine learning and statistical models learned over massive codebases. For more information about the project, tools, and other resources, please visit the main project page.

    Overview

    We provide a dataset consisting of parsed ASTs that were used to train and evaluate the DeepSyn tool. The Python programs were collected from GitHub repositories by removing duplicate files, removing project forks (copies of other existing repositories), keeping only programs that parse and have at most 30'000 nodes in the AST, and removing obfuscated files where possible. Furthermore, we only used repositories with permissive and non-viral licenses such as MIT, BSD and Apache. For parsing, we used the Python AST parser included in Python 2.7, which we also include as part of the dataset. The dataset is split into two parts -- 100'000 files used for training and 50'000 files used for evaluation.

    Download

    Version 1.0 [526.6MB]

    An archive of the dataset

    Download

    The archive contains the following files:

    • parse_python.py -- The parser that we used to obtain JSON from each Python source code that we used to obtain this dataset.
    • python100k_train.json -- Parsed ASTs in JSON format. This is a dataset for training.
    • python50k_eval.json -- Parsed ASTs in JSON format. This is a dataset for evaluation.

    Redistributable Version [June 2020]

    Redistributable version containing subset of the original repositories and files.

    Download - Source Files

    Version 1.0 Files [190MB]

    An archive of source files used to generate the py150 Dataset

    Download

    The archive contains the following files:

    • data.tar.gz -- Archive containing all the source files
    • python100k_train.txt -- List of files used in the training dataset.
    • python50k_eval.txt -- List of files used in the evaluation dataset.
    • github_repos.txt -- List of GitHub repositories and their revisions used to obtain the dataset.

    Note that the order of python100k_train.txt and python100k_train.json (containing the ASTs of the parsed files) is the same. That is, parsing the n-th file from python100k_train.txt produces the n-th AST in python100k_train.json.

    Published research using this dataset may cite the following paper:

    Raychev, V., Bielik, P., and Vechev, M. Probabilistic Model for Code with Decision Trees. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (2016), OOPSLA ’16, ACM

    Format

    Now we briefly explain the JSON format in which each AST is stored. The python100k_train.json and python50k_eval.json files include one such JSON per line. As an example, given a simple program:

    x = 7
    print x+1

    The serialized AST is as follows (here we show it pretty-printed, but the entire JSON is on a single line in the data):

    [ {"type":"Module","children":[1,4]}, {"type":"Assign","children":[2,3]}, {"type":"NameStore","value":"x"}, {"type":"Num","value":"7"}, {"type":"Print","children":[5]}, {"type":"BinOpAdd","children":[6,7]}, {"type":"NameLoad","value":"x"}, {"type":"Num","value":"1"} ]

    As can be seen, the JSON contains an array of objects. Each object contains several name/value pairs (a short sketch of reading one of these ASTs in Python follows the list):

    • (Required) type: string containing type of current AST node.
    • (Optional) value: string containing value (if any) of the current AST node.
    • (Optional) children: array of integers denoting indices of children (if any) of the current AST node. Indices are 0-based starting from the first node in the JSON.
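
    A minimal sketch of reading and traversing one such per-line AST (illustrative only; the file name comes from the archive listing above):

        import json

        # Each line of python100k_train.json is one JSON array of AST nodes.
        with open("python100k_train.json") as f:
            nodes = json.loads(f.readline())

        def walk(idx, depth=0):
            node = nodes[idx]
            label = node["type"]
            if "value" in node:
                label += ": " + node["value"]
            print("  " * depth + label)
            for child in node.get("children", []):  # indices are 0-based
                walk(child, depth + 1)

        walk(0)  # node 0 is the Module root
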
  12. Vector datasets for workshop "Introduction to Geospatial Raster and Vector...

    • figshare.com
    Updated Oct 5, 2022
    Cite
    Ryan Avery (2022). Vector datasets for workshop "Introduction to Geospatial Raster and Vector Data with Python" [Dataset]. http://doi.org/10.6084/m9.figshare.21273837.v1
    Explore at:
    application/x-sqlite3
    Dataset updated
    Oct 5, 2022
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Ryan Avery
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cadastre data from PDOK, used to illustrate the use of geopandas and shapely, geospatial Python packages for manipulating vector data. The brpgewaspercelen_definitief_2020.gpkg file has been subsetted to make the download manageable for workshops. Other datasets are copies of those available from PDOK.
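
    A minimal sketch of loading the subsetted file with geopandas (this reads the default layer; adjust if the GeoPackage contains several):

        import geopandas as gpd

        # Read the parcels GeoPackage shipped with the workshop
        parcels = gpd.read_file("brpgewaspercelen_definitief_2020.gpkg")

        print(parcels.crs)                        # coordinate reference system
        print(parcels.head())                     # first few features
        parcels["area"] = parcels.geometry.area   # shapely-backed ops, in CRS units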

  13. COVID-19 Data Visualization Using Python

    • kaggle.com
    zip
    Updated Apr 21, 2023
    Cite
    Adithya Wijesinghe (2023). COVID-19 Data Visualization Using Python [Dataset]. https://www.kaggle.com/datasets/adithyawijesinghe/covid-19-data
    Explore at:
    zip (1291081 bytes)
    Dataset updated
    Apr 21, 2023
    Authors
    Adithya Wijesinghe
    License

    U.S. Government Workshttps://www.usa.gov/government-works/

    Description

    Data visualization using Python (Pandas, Plotly).

    The data was used to visualize the infection rate and the death rate from 01/20 to 04/22.

    The data was made available on Github: https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv
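
    A minimal sketch of the pipeline described (the column names Date, Country, Confirmed, and Deaths, and the "US" country label, are assumed from the file's published schema):

        import pandas as pd
        import plotly.express as px

        url = ("https://raw.githubusercontent.com/datasets/covid-19/"
               "master/data/countries-aggregated.csv")
        df = pd.read_csv(url, parse_dates=["Date"])  # assumed column names

        # Plot confirmed cases and deaths for one country.
        us = df[df["Country"] == "US"]
        fig = px.line(us, x="Date", y=["Confirmed", "Deaths"],
                      title="US infections and deaths over time")
        fig.show()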

  14. Storage and Transit Time Data and Code

    • zenodo.org
    zip
    Updated Oct 29, 2024
    + more versions
    Cite
    Andrew Felton; Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. http://doi.org/10.5281/zenodo.14009758
    Explore at:
    zip
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Andrew Felton; Andrew Felton
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Andrew J. Felton
    Date: 10/29/2024

    This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:

    "Global estimates of the storage and transit time of water through vegetation"

    Please note that 'turnover' and 'transit' are used interchangeably. Also please note that this R project has been updated multiple times as the analysis has evolved.

    Data information:

    The data folder contains key data sets used for analysis. In particular:

    "data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.

    Code information:

    Python scripts can be found in the "supporting_code" folder.

    Each R script in this project has a role:

    "01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).

    "02_functions.R": This script contains custom functions. Load this using the
    `source()` function in the 01_start.R script.

    "03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
    `source()` function in the 01_start.R script.

    "04_figures_tables.R": This is the main workhouse for figure/table production and
    supporting analyses. This script generates the key figures and summary statistics
    used in the study that then get saved in the manuscript_figures folder. Note that all
    maps were produced using Python code found in the "supporting_code"" folder.

    "supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.

    "supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.

  15. Python ETL Update Test Dataset

    • data.cambridgema.gov
    csv, xlsx, xml
    Updated Nov 18, 2025
    Cite
    (2025). Python ETL Update Test Dataset [Dataset]. https://data.cambridgema.gov/dataset/Python-ETL-Update-Test-Dataset/tqxn-z38b
    Explore at:
    xml, xlsx, csv
    Dataset updated
    Nov 18, 2025
    Description

    Script we use to test the Python ETL update process on milo. Keep it private, but please do not delete.

  16. Python Data Science Handbook Dataset MD

    • kaggle.com
    zip
    Updated Apr 8, 2024
    Cite
    sita berete (2024). Python Data Science Handbook Dataset MD [Dataset]. https://www.kaggle.com/datasets/sitaberete/python-datascience-handbook-dataset-md/versions/1
    Explore at:
    zip (10781434 bytes)
    Dataset updated
    Apr 8, 2024
    Authors
    sita berete
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset is the Jupyter notebooks of https://github.com/jakevdp/PythonDataScienceHandbook converted into Markdown for better RAG (retrieval-augmented generation). Of course, you can use it for purposes other than RAG as long as they don't violate the LICENSE terms.

    The repository https://github.com/jakevdp/PythonDataScienceHandbook contains the entire Python Data Science Handbook in the form of Jupyter notebooks.
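
    The description doesn't say how the conversion was performed; one common approach is nbconvert's Markdown exporter, sketched here with placeholder file names:

        import nbformat
        from nbconvert import MarkdownExporter

        # Read a notebook and export its Markdown body
        nb = nbformat.read("notebook.ipynb", as_version=4)
        body, resources = MarkdownExporter().from_notebook_node(nb)

        with open("notebook.md", "w", encoding="utf-8") as f:
            f.write(body)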

  17. Data of Python

    • kaggle.com
    zip
    Updated Sep 18, 2020
    Cite
    rangeen almezory (2020). Data of Python [Dataset]. https://www.kaggle.com/datasets/rangeenalmezory/data-of-python
    Explore at:
    zip (3138 bytes)
    Dataset updated
    Sep 18, 2020
    Authors
    rangeen almezory
    Description

    Dataset

    This dataset was created by rangeen almezory

    Contents

  18. Dataset of book subjects that contain Python data science handbook :...

    • workwithdata.com
    Updated Nov 7, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain Python data science handbook : essential tools for working with data [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Python+data+science+handbook+:+essential+tools+for+working+with+data&j=1&j0=books
    Explore at:
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects. It has 2 rows and is filtered to the book Python data science handbook : essential tools for working with data. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

  19. Data Cleaning, Translation & Split of the Dataset for the Automatic...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 8, 2022
    Cite
    Köhler, Juliane (2022). Data Cleaning, Translation & Split of the Dataset for the Automatic Classification of Documents for the Classification System for the Berliner Handreichungen zur Bibliotheks- und Informationswissenschaft [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6957841
    Explore at:
    Dataset updated
    Aug 8, 2022
    Authors
    Köhler, Juliane
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cleaned_Dataset.csv – The combined CSV files of all scraped documents from DABI, e-LiS, o-bib and Springer.

    Data_Cleaning.ipynb – The Jupyter Notebook with Python code for the analysis and cleaning of the original dataset.

    ger_train.csv – The German training set as CSV file.

    ger_validation.csv – The German validation set as CSV file.

    en_test.csv – The English test set as CSV file.

    en_train.csv – The English training set as CSV file.

    en_validation.csv – The English validation set as CSV file.

    splitting.py – The Python code for splitting a dataset into train, test and validation sets.

    DataSetTrans_de.csv – The final German dataset as a CSV file.

    DataSetTrans_en.csv – The final English dataset as a CSV file.

    translation.py – The Python code for translating the cleaned dataset.
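
    splitting.py itself is included above; a generic train/validation/test split of the same shape can be sketched with scikit-learn (the ratios below are assumptions, not necessarily those used in the original script):

        import pandas as pd
        from sklearn.model_selection import train_test_split

        df = pd.read_csv("Cleaned_Dataset.csv")

        # Carve out the test set first, then split the rest into train/validation.
        train_val, test = train_test_split(df, test_size=0.2, random_state=42)
        train, val = train_test_split(train_val, test_size=0.25, random_state=42)
        # Resulting ratios: 60% train, 20% validation, 20% test.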

  20. Pokkemon Dataset_csv

    • kaggle.com
    zip
    Updated Aug 18, 2021
    Cite
    BHARGAV NATH (2021). Pokkemon Dataset_csv [Dataset]. https://www.kaggle.com/datasets/bhargavnath/new-dataset-fr-pandas
    Explore at:
    zip (13955 bytes)
    Dataset updated
    Aug 18, 2021
    Authors
    BHARGAV NATH
    Description

    Dataset

    This dataset was created by BHARGAV NATH

    Contents
