100+ datasets found
  1. Pandas Practice Dataset

    • kaggle.com
    zip
    Updated Jan 27, 2023
    Cite
    Mrityunjay Pathak (2023). Pandas Practice Dataset [Dataset]. https://www.kaggle.com/datasets/themrityunjaypathak/pandas-practice-dataset/discussion
    Explore at:
    zip (493 bytes). Available download formats
    Dataset updated
    Jan 27, 2023
    Authors
    Mrityunjay Pathak
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    What is Pandas?

    Pandas is a Python library used for working with data sets.

    It has functions for analyzing, cleaning, exploring, and manipulating data.

    The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was created by Wes McKinney in 2008.

    Why Use Pandas?

    Pandas allows us to analyze big data and make conclusions based on statistical theories.

    Pandas can clean messy data sets, and make them readable and relevant.

    Relevant data is very important in data science.

    What Can Pandas Do?

    Pandas gives you answers about the data, such as:

    Is there a correlation between two or more columns?

    What is the average value?

    What is the max value?

    What is the min value?
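
    For example, with a small made-up DataFrame (not part of this dataset), those questions map onto a few Pandas one-liners:

      import pandas as pd

      # Tiny made-up DataFrame for illustration
      df = pd.DataFrame({"hours": [1, 2, 3, 4], "score": [52, 61, 70, 83]})

      print(df.corr())            # correlation between columns
      print(df["score"].mean())   # average value
      print(df["score"].max())    # max value
      print(df["score"].min())    # min value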

  2. Python Codes for Data Analysis of The Impact of COVID-19 on Technical...

    • dataverse.harvard.edu
    • figshare.com
    Updated Mar 21, 2022
    Cite
    Elizabeth Szkirpan (2022). Python Codes for Data Analysis of The Impact of COVID-19 on Technical Services Units Survey Results [Dataset]. http://doi.org/10.7910/DVN/SXMSDZ
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 21, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Elizabeth Szkirpan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Copies of Anaconda 3 Jupyter Notebooks and Python script for holistic and clustered analysis of "The Impact of COVID-19 on Technical Services Units" survey results. Data was analyzed holistically using cleaned and standardized survey results and by library type clusters. To streamline data analysis in certain locations, an off-shoot CSV file was created so data could be standardized without compromising the integrity of the parent clean file. Three Jupyter Notebooks/Python scripts are available in relation to this project: COVID_Impact_TechnicalServices_HolisticAnalysis (a holistic analysis of all survey data) and COVID_Impact_TechnicalServices_LibraryTypeAnalysis (a clustered analysis of impact by library type, clustered files available as part of the Dataverse for this project).

  3. "module-utilities": A Python package for simplify creating python modules.

    • catalog.data.gov
    • s.cnmilf.com
    Updated Apr 11, 2024
    + more versions
    Cite
    National Institute of Standards and Technology (2024). "module-utilities": A Python package for simplify creating python modules. [Dataset]. https://catalog.data.gov/dataset/module-utilities-a-python-package-for-simplify-creating-python-modules
    Explore at:
    Dataset updated
    Apr 11, 2024
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    "module-utilities" is a python package of utilities to simplify working with python packages.The main features of module-utilities are as follows: "cached" module: A module to cache class attributes and methods. Right now, this uses a standard python dictionary for storage. Future versions will hopefully be more robust to threading and shared cache."docfiller" module: A module to share documentation. This is adapted from the pandas doc decorator. There are a host of utilities build around this."docinhert": An interface to "docstring-inheritance" module. This can be combined with "docfiller" to make creating related function/class documentation easy.

  4. Python code used to determine average yearly and monthly tourism per 1000...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 12, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Python code used to determine average yearly and monthly tourism per 1000 residents for public-supply water service areas [Dataset]. https://catalog.data.gov/dataset/python-code-used-to-determine-average-yearly-and-monthly-tourism-per-1000-residents-for-pu
    Explore at:
    Dataset updated
    Nov 12, 2025
    Dataset provided by
    U.S. Geological Survey
    Description

    This child item describes Python code used to estimate average yearly and monthly tourism per 1000 residents within public-supply water service areas. Increases in population due to tourism may impact amounts of water used by public-supply water systems. This data release contains model input datasets, Python code used to develop the tourism information, and output estimates of tourism. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000-2020. Output from this code was used as an input feature in the public supply delivery and water use machine learning models. This page includes the following files:

    • tourism_input_data.zip - a zip file containing input data sets used by the tourism Python code
    • tourism_output.zip - a zip file with output produced by the tourism Python code
    • README.txt - a README file describing the data files and code requirements
    • tourism_study_code.zip - a zip file containing the Python code used to create the tourism feature variable

  5. Python scripts used to generate the figures in "An algorithm to identify...

    • catalog.data.gov
    • datasets.ai
    • +3more
    Updated Jul 29, 2022
    Cite
    National Institute of Standards and Technology (2022). Python scripts used to generate the figures in "An algorithm to identify vapor-liquid-liquid equilibria of binary mixtures from vapor-liquid equilibria" [Dataset]. https://catalog.data.gov/dataset/python-scripts-used-to-generate-the-figures-in-an-algorithm-to-identify-vapor-liquid-liqui
    Explore at:
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The files in this repository can be used to generate the complete set of figures in the paper "An algorithm to identify vapor-liquid-liquid equilibria from vapor-liquid equilibria". The zip file, when expanded, includes a conda environment to populate the dependencies, and a set of python scripts. Running make_figures.py will regenerate all the figures, demonstrating how to use the algorithm.

  6. Replication Package: Unboxing Default Argument Breaking Changes in 1 + 2...

    • zenodo.org
    application/gzip
    Updated Jul 15, 2024
    Cite
    João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi (2024). Replication Package: Unboxing Default Argument Breaking Changes in 1 + 2 Data Science Libraries in Python [Dataset]. http://doi.org/10.5281/zenodo.11584961
    Explore at:
    application/gzip. Available download formats
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Replication Package

    This repository contains data and source files needed to replicate our work described in the paper "Unboxing Default Argument Breaking Changes in Scikit Learn".

    Requirements

    We recommend the following requirements to replicate our study:

    1. Internet access
    2. At least 100GB of space
    3. Docker installed
    4. Git installed

    Package Structure

    We relied on Docker containers to provide a working environment that is easier to replicate. Specifically, we configure the following containers:

    • data-analysis, an R-based Container we used to run our data analysis.
    • data-collection, a Python Container we used to collect Scikit's default arguments and detect them in client applications.
    • database, a Postgres Container we used to store clients' data, obtained from Grotov et al.
    • storage, a directory used to store the data processed in data-analysis and data-collection. This directory is shared in both containers.
    • docker-compose.yml, the Docker file that configures all containers used in the package.

    In the remainder of this document, we describe how to set up each container properly.

    Using VSCode to Setup the Package

    We selected VSCode as the IDE of choice because its extensions allow us to implement our scripts directly inside the containers. In this package, we provide configuration parameters for both the data-analysis and data-collection containers. This way you can directly access and run each container inside VSCode without any specific configuration.

    You first need to set up the containers:

    $ cd /replication/package/folder
    $ docker-compose build
    $ docker-compose up
    # Wait for Docker to create and run all containers
    

    Then, you can open them in Visual Studio Code:

    1. Open VSCode in project root folder
    2. Access the command palette and select "Dev Container: Reopen in Container"
      1. Select either Data Collection or Data Analysis.
    3. Start working

    If you want/need a more customized organization, the remainder of this file describes it in detail.

    Longest Road: Manual Package Setup

    Database Setup

    The database container will automatically restore the dump in dump_matroskin.tar on its first launch. To set up and run the container, you should:

    Build an image:

    $ cd ./database
    $ docker build --tag 'dabc-database' .
    $ docker image ls
    REPOSITORY      TAG       IMAGE ID       CREATED          SIZE
    dabc-database   latest    b6f8af99c90d   50 minutes ago   18.5GB
    

    Create and enter inside the container:

    $ docker run -it --name dabc-database-1 dabc-database
    $ docker exec -it dabc-database-1 /bin/bash
    root# psql -U postgres -h localhost -d jupyter-notebooks
    jupyter-notebooks=# \dt
            List of relations
     Schema |       Name        | Type  | Owner
    --------+-------------------+-------+-------
     public | Cell              | table | root
     public | Code_cell         | table | root
     public | Md_cell           | table | root
     public | Notebook          | table | root
     public | Notebook_features | table | root
     public | Notebook_metadata | table | root
     public | repository        | table | root
    

    If you get the table list as above, your database is properly set up.

    It is important to mention that this database is extended from the one provided by Grotov et al. Basically, we added three columns to the table Notebook_features (API_functions_calls, defined_functions_calls, and other_functions_calls) containing the function calls performed by each client in the database.

    Data Collection Setup

    This container is responsible for collecting the data to answer our research questions. It has the following structure:

    • dabcs.py, extract DABCs from Scikit Learn source code, and export them to a CSV file.
    • dabcs-clients.py, extract function calls from clients and export them to a CSV file. We rely on a modified version of Matroskin to leverage the function calls. You can find the tool's source code in the `matroskin` directory.
    • Makefile, commands to set up and run both dabcs.py and dabcs-clients.py
    • matroskin, the directory containing the modified version of matroskin tool. We extended the library to collect the function calls performed on the client notebooks of Grotov's dataset.
    • storage, a docker volume where the data-collection should save the exported data. This data will be used later in Data Analysis.
    • requirements.txt, Python dependencies adopted in this module.

    Note that the container will automatically configure this module for you, e.g., install dependencies, configure matroskin, download scikit learn source code, etc. For this, you must run the following commands:

    $ cd ./data-collection
    $ docker build --tag "data-collection" .
    $ docker run -it -d --name data-collection-1 -v $(pwd)/:/data-collection -v $(pwd)/../storage/:/data-collection/storage/ data-collection
    $ docker exec -it data-collection-1 /bin/bash
    $ ls
    Dockerfile Makefile config.yml dabcs-clients.py dabcs.py matroskin storage requirements.txt utils.py
    

    If you see project files, it means the container is configured accordingly.

    Data Analysis Setup

    We use this container to conduct the analysis over the data produced by the Data Collection container. It has the following structure:

    • dependencies.R, an R script containing the dependencies used in our data analysis.
    • data-analysis.Rmd, the R notebook we used to perform our data analysis
    • datasets, a docker volume pointing to the storage directory.

    Execute the following commands to run this container:

    $ cd ./data-analysis
    $ docker build --tag "data-analysis" .
    $ docker run -it -d --name data-analysis-1 -v $(pwd)/:/data-analysis -v $(pwd)/../storage/:/data-collection/datasets/ data-analysis
    $ docker exec -it data-analysis-1 /bin/bash
    $ ls
    data-analysis.Rmd datasets dependencies.R Dockerfile figures Makefile
    

    If you see project files, it means the container is configured accordingly.

    A note on storage shared folder

    As mentioned, the storage folder is mounted as a volume and shared between the data-collection and data-analysis containers. We compressed the content of this folder due to space constraints. Therefore, before starting work on Data Collection or Data Analysis, make sure you have extracted the compressed files. You can do this by running the Makefile inside the storage folder.

    $ make unzip # extract files
    $ ls
    clients-dabcs.csv clients-validation.csv dabcs.csv Makefile scikit-learn-versions.csv versions.csv
    $ make zip # compress files
    $ ls
    csv-files.tar.gz Makefile
  7. Creating Curve Number Grid using PyQGIS through Jupyter Notebook in mygeohub...

    • hydroshare.org
    • beta.hydroshare.org
    • +1more
    zip
    Updated Apr 28, 2020
    Cite
    Sayan Dey; Shizhang Wang; Venkatesh Merwade (2020). Creating Curve Number Grid using PyQGIS through Jupyter Notebook in mygeohub [Dataset]. http://doi.org/10.4211/hs.abf67aad0eb64a53bf787d369afdcc84
    Explore at:
    zip (105.5 MB). Available download formats
    Dataset updated
    Apr 28, 2020
    Dataset provided by
    HydroShare
    Authors
    Sayan Dey; Shizhang Wang; Venkatesh Merwade
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    This resource serves as a template for creating a curve number grid raster file, which can be used to create corresponding maps or for further analysis. Soil data and reclassified land-use raster files are created along the way. The user has to provide or connect a set of shapefiles, including the watershed boundary, soil data and land-use data covering the watershed, a land-use reclassification table, and a curve number lookup table. The script contained in this resource mainly uses PyQGIS through a Jupyter Notebook for the majority of the processing, with a touch of Pandas for data manipulation. A detailed description of the procedure is given as comments in the script.

  8. Data from: Project to create snake game by using python

    • kaggle.com
    zip
    Updated Jul 9, 2025
    Cite
    Kamal Acharya (2025). Project to create snake game by using python [Dataset]. https://www.kaggle.com/datasets/acharyakamal/project-to-create-snake-game-by-using-python
    Explore at:
    zip (700832 bytes). Available download formats
    Dataset updated
    Jul 9, 2025
    Authors
    Kamal Acharya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    It’s the most famous game we used to play in our childhood before the advent of smartphones. This is a very simple project in which the snake eats the food when its mouth touches it. The length of the snake keeps increasing after it eats the food, and if the snake touches the edge of the screen or itself, the game is over.
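
    For illustration only, here is a minimal, text-based sketch of the movement, growth and game-over rules described above; the actual project's implementation (e.g., with a graphics library) may differ.

      from collections import deque

      GRID = 20  # hypothetical board size; the actual game may differ

      def step(snake, direction, food):
          """Advance the snake one cell; return (alive, ate_food)."""
          head_x, head_y = snake[0]
          new_head = (head_x + direction[0], head_y + direction[1])
          # Game over if the snake leaves the screen or touches itself.
          if not (0 <= new_head[0] < GRID and 0 <= new_head[1] < GRID) or new_head in snake:
              return False, False
          snake.appendleft(new_head)
          if new_head == food:
              return True, True   # tail is kept, so the snake grows by one cell
          snake.pop()             # normal move: drop the tail
          return True, False

      snake = deque([(5, 5), (4, 5), (3, 5)])  # head first
      print(step(snake, (1, 0), food=(6, 5)))  # (True, True): the snake ate and grew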

  9. Creating_simple_Sintetic_dataset

    • kaggle.com
    zip
    Updated Jan 20, 2025
    Cite
    Lala Ibadullayeva (2025). Creating_simple_Sintetic_dataset [Dataset]. https://www.kaggle.com/datasets/lalaibadullayeva/creating-simple-sintetic-dataset
    Explore at:
    zip (476698 bytes). Available download formats
    Dataset updated
    Jan 20, 2025
    Authors
    Lala Ibadullayeva
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Description

    Overview: This dataset contains three distinct fake datasets generated using the Faker and Mimesis libraries. These libraries are commonly used for generating realistic-looking synthetic data for testing, prototyping, and data science projects. The datasets were created to simulate real-world scenarios while ensuring no sensitive or private information is included.

    Data Generation Process: The data creation process is documented in the accompanying notebook, Creating_simple_Sintetic_data.ipynb. This notebook showcases the step-by-step procedure for generating synthetic datasets with customizable structures and fields using the Faker and Mimesis libraries.
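
    As a rough illustration of this kind of generation (hypothetical columns, not the schema of the three datasets provided here), assuming the faker and mimesis packages are installed:

      import csv
      from faker import Faker
      from mimesis import Person

      fake = Faker()
      person = Person()

      # Write a small synthetic CSV with made-up, non-sensitive records
      with open("synthetic_sample.csv", "w", newline="") as f:
          writer = csv.writer(f)
          writer.writerow(["name", "email", "city", "occupation"])
          for _ in range(100):
              writer.writerow([fake.name(), fake.email(), fake.city(), person.occupation()])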

    File Contents:

    • Datasets: CSV files containing the three synthetic datasets.
    • Notebook: Creating_simple_Sintetic_data.ipynb, detailing the data generation process and the code used to create these datasets.

  10. Data from: Improving access and use of climate projections for ecological...

    • search.dataone.org
    • data.niaid.nih.gov
    • +2more
    Updated Jul 27, 2025
    Cite
    Andrea Paz; Thomas Lauber; Thomas W. Crowther; Johan van den Hoogen (2025). Improving access and use of climate projections for ecological research through the use of a new Python tool [Dataset]. http://doi.org/10.5061/dryad.3r2280gph
    Explore at:
    Dataset updated
    Jul 27, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Andrea Paz; Thomas Lauber; Thomas W. Crowther; Johan van den Hoogen
    Time period covered
    Jan 1, 2024
    Description

    Over the past decade, the use of future climate projections from the Coupled Model Intercomparison Project (CMIP) has become central in biodiversity science. Pre-packaged datasets containing future projections of the widely used bioclimatic variables, for different times and socio-economic pathways, have contributed immensely to the study of climate change implications for biodiversity. However, these datasets lack the flexibility to obtain projections for other target years, and the use of raw data requires coding and spatial information systems expertise. The Python tool chelsa-cmip6, developed by Karger et al. (2023), provides the needed flexibility by allowing users to generate bioclimatic variables for the time of their choice, provided the selected general circulation model and socioeconomic pathway combination exists. This is a fantastic step forward in bringing flexibility to the use of climate datasets in biodiversity and will allow for more widespread use of the data provided by CMIP6. This archive contains a Jupyter notebook that allows the user to check if the desired combination of GCM and SSP is available for the selected time and area, and then uses the chelsa-cmip6 tool to create bioclimatic variables based on the user-selected scenarios. The notebook was created to facilitate use of the chelsa-cmip6 Python tool by Karger et al. (2023).

    # Improving access and use of climate projections for ecological research through the use of a new Python tool

    ---

    Brief summary of contents

    Data archive for:

    Improving access and use of climate projections for ecological research through the use of a new Python tool

    by Andrea Paz, Thomas Lauber, Thomas W. Crowther and Johan van den Hoogen

    This archive contains an example notebook script that allows the user to check if the desired combination of GCM and SSP is available for the selected time and area, and then uses the chelsa-cmip6 Python tool (Karger et al 2023) to create bioclimatic variables based on the user-selected scenarios. The bioclimatic variables can then be converted to GeoTiffs, downloaded and used in different applications through R, Python, Google Earth Engine, or others.

    This Notebook can be opened locally or uploaded to Google Colab, Deepnote, et...

  11. 14.2 Python for Everyone

    • hub.arcgis.com
    Updated Mar 4, 2017
    Cite
    Iowa Department of Transportation (2017). 14.2 Python for Everyone [Dataset]. https://hub.arcgis.com/documents/7a3a96a7f9fe46b3bc853691621ccf7d
    Explore at:
    Dataset updated
    Mar 4, 2017
    Dataset authored and provided by
    Iowa Department of Transportation (https://iowadot.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Do you spend a lot of time repeating workflows, such as copying data, editing files, and setting up map documents? Did you know that you can use Python to automate data reproduction, data management, map document display, and many of your other daily tasks in ArcGIS? This course provides the building blocks needed to use Python. You will create and run scripts using these building blocks and can apply them directly inside of ArcGIS and to your own workflows. After completing this course, you will be able to:

    • Determine where to write and run a Python script.
    • Differentiate Python language elements and determine where to apply them.
    • Follow a script workflow.
    • Develop a Python script to run statements and functions.
    • Solve common syntax errors.

  12. Omega-Prime: Data Model, Data Format and Python Library for Handling Ground...

    • doi.org
    • zenodo.org
    zip
    Updated Jul 29, 2025
    Cite
    Sven Tarlowski (2025). Omega-Prime: Data Model, Data Format and Python Library for Handling Ground Truth Traffic Data [Dataset]. http://doi.org/10.5281/zenodo.16565213
    Explore at:
    zip. Available download formats
    Dataset updated
    Jul 29, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sven Tarlowski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Model, Format and Python Library for ground truth data containing information on dynamic objects, map and environmental factors optimized for representing urban traffic. The repository contains:

    Data Model and Specification

    see ./docs/omega_prime_specification.md

    • 🌍 Data Model: What signals exist and how these are defined.
    • 🧾 Data Format Specification: How to exchange and store those signals.

    Python Library

    • 🔨 Create omega-prime files from many sources (see ./tutorial.ipynb):
    • 🗺️ Map Association: Associate Object Location with Lanes from OpenDRIVE or OSI Maps (see tutorial_locator.ipynb)
    • 📺 Plotting of data: interactive top view plots using altair
    • Validation of data: check if your data conforms to the omega-prime specification (e.g., correct yaw) using pandera
    • 📐 Interpolation of data: bring your data into a fixed frequency
    • 📈 Metrics: compute interaction metrics like PET, TTC, THW (see tutorial_metrics.ipynb)
    • 🚀 Fast Processing directly on DataFrames using polars, polars-st

    The data model and format utilize ASAM OpenDRIVE and ASAM Open-Simulation-Interface GroundTruth messages. omega-prime sets requirements on presence and quality of ASAM OSI GroundTruth messages and ASAM OpenDRIVE files and defines a file format for the exchange and storage of these.

    Omega-Prime is the successor of the OMEGAFormat. It has the benefit that its definition is directly based on the established standards ASAM OSI and ASAM OpenDRIVE and carries over the data quality requirements and the data tooling from OMEGAFormat. Therefore, it should be easier to incorporate omega-prime into existing workflows and tooling.

    To learn more about the example data, read example_files/README.md. The example data was taken and created from esmini.

  13. S&T Project 22071 Code: Figure Generation Python Code

    • data.usbr.gov
    Updated Apr 15, 2025
    + more versions
    Cite
    United States Bureau of Reclamation (2025). S&T Project 22071 Code: Figure Generation Python Code [Dataset]. https://data.usbr.gov/catalog/8075/item/128825
    Explore at:
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    United States Bureau of Reclamation (http://www.usbr.gov/)
    Description

    Code written in Python to create spatial plots of publicly available atmospheric and snow data. The user must have access to publicly available ERA5-Interim atmospheric data as well as SNOTEL snowfall timeseries data. Additionally, the plotting code is intended to be used on some of the outputs from the Weather Typing Algorithm written by Andreas Prein and owned by NCAR.

  14. Dataset Sales - Aleatory Data - by python numpy

    • kaggle.com
    zip
    Updated Jul 4, 2020
    Cite
    Italo Marcelo (2020). Dataset Sales - Aleatory Data - by python numpy [Dataset]. https://kaggle.com/italomarcelo/dataset-sales-aleatory-data-by-python-numpy
    Explore at:
    zip (4756152 bytes). Available download formats
    Dataset updated
    Jul 4, 2020
    Authors
    Italo Marcelo
    Description

    Dataset

    This dataset was created by Italo Marcelo

    Contents

    It contains the following files:

  15. Python Data Science Handbook

    • kaggle.com
    zip
    Updated Dec 20, 2021
    Cite
    Timo Bozsolik (2021). Python Data Science Handbook [Dataset]. https://www.kaggle.com/timoboz/python-data-science-handbook
    Explore at:
    zip (16028316 bytes). Available download formats
    Dataset updated
    Dec 20, 2021
    Authors
    Timo Bozsolik
    Description

    Python Data Science Handbook

    This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks.


    How to Use this Book

    About

    The book was written and tested with Python 3.5, though other Python versions (including Python 2.7) should work in nearly all cases.

    The book introduces the core libraries essential for working with data in Python: particularly IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and related packages. Familiarity with Python as a language is assumed; if you need a quick introduction to the language itself, see the free companion project, A Whirlwind Tour of Python: it's a fast-paced introduction to the Python language aimed at researchers and scientists.

    See Index.ipynb for an index of the notebooks available to accompany the text.

    Software

    The code in the book was tested with Python 3.5, though most (but not all) will also work correctly with Python 2.7 and other older Python versions.

    The packages I used to run the code in the book are listed in requirements.txt (Note that some of these exact version numbers may not be available on your platform: you may have to tweak them for your own use). To install the requirements using conda, run the following at the command-line:

    $ conda install --file requirements.txt
    

    To create a stand-alone environment named PDSH with Python 3.5 and all the required package versions, run the following:

    $ conda create -n PDSH python=3.5 --file requirements.txt
    

    You can read more about using conda environments in the Managing Environments section of the conda documentation.

    License

    Code

    The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.

    Text

    The text content of the book is released under the CC-BY-NC-ND license. Read more at Creative Commons.

  16. Python for ArcGIS - Working with ArcGIS Notebooks

    • edu.hub.arcgis.com
    Updated Oct 8, 2024
    Cite
    Education and Research (2024). Python for ArcGIS - Working with ArcGIS Notebooks [Dataset]. https://edu.hub.arcgis.com/documents/16fbaf21dc7b41c187ebcfd9f6ea1d58
    Explore at:
    Dataset updated
    Oct 8, 2024
    Dataset authored and provided by
    Education and Research
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This resource was created by Esri Canada Education and Research. To browse our full collection of higher-education learning resources, please visit https://hed.esri.ca/resourcefinder/. This tutorial introduces you to using Python code in a Jupyter Notebook, an open source web application that enables you to create and share documents that contain rich text, equations and multimedia, alongside executable code and visualization of analysis outputs. The tutorial begins by stepping through the basics of setting up and being productive with Python notebooks. You will be introduced to ArcGIS Notebooks, which are Python Notebooks that are well-integrated within the ArcGIS platform. Finally, you will be guided through a series of ArcGIS Notebooks that illustrate how to create compelling notebooks for data science that integrate your own Python scripts using the ArcGIS API for Python and ArcPy in combination with thousands of open source Python libraries to enhance your analysis and visualization.

    To download the dataset Labs, click the Open button to the top right. This will automatically download a ZIP file containing all files and data required. You can also clone the tutorial documents and datasets from this GitHub repo: https://github.com/highered-esricanada/arcgis-notebooks-tutorial.git.

    Software & Solutions Used:

    • Required: This tutorial was last tested on August 27th, 2024, using ArcGIS Pro 3.3. If you're using a different version of ArcGIS Pro, you may encounter different functionality and results.
    • Recommended: ArcGIS Online subscription account with permissions to use advanced Notebooks and GeoEnrichment
    • Optional: Notebook Server for ArcGIS Enterprise 11.3+

    Time to Complete: 2 h (excludes processing time)
    File Size: 196 MB
    Date Created: January 2022
    Last Updated: August 27, 2024

  17. Data from: Decoding Wayfinding: Analyzing Wayfinding Processes in the...

    • researchdata.tuwien.at
    html, pdf, zip
    Updated Mar 19, 2025
    Cite
    Negar Alinaghi; Ioannis Giannopoulos (2025). Decoding Wayfinding: Analyzing Wayfinding Processes in the Outdoor Environment [Dataset]. http://doi.org/10.48436/m2ha4-t1v92
    Explore at:
    html, zip, pdf. Available download formats
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    TU Wien
    Authors
    Negar Alinaghi; Ioannis Giannopoulos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    How To Cite?

    Alinaghi, N., Giannopoulos, I., Kattenbeck, M., & Raubal, M. (2025). Decoding wayfinding: analyzing wayfinding processes in the outdoor environment. International Journal of Geographical Information Science, 1–31. https://doi.org/10.1080/13658816.2025.2473599

    Link to the paper: https://www.tandfonline.com/doi/full/10.1080/13658816.2025.2473599

    Folder Structure

    The folder named “submission” contains the following:

    1. “pythonProject”: This folder contains all the Python files and subfolders needed for analysis.
    2. ijgis.yml: This file lists all the Python libraries and dependencies required to run the code.

    Setting Up the Environment

    1. Use the ijgis.yml file to create a Python project and environment. Ensure you activate the environment before running the code.
    2. The pythonProject folder contains several .py files and subfolders, each with specific functionality as described below.

    Subfolders

    1. Data_4_IJGIS

    • This folder contains the data used for the results reported in the paper.
    • Note: The data analysis that we explain in this paper already begins with the synchronization and cleaning of the recorded raw data. The published data is already synchronized and cleaned. Both the cleaned files and the merged files with features extracted for them are given in this directory. If you want to perform the segmentation and feature extraction yourself, run the respective Python files; if not, you can use the “merged_…csv” files as input for the training.

    2. results_[DateTime] (e.g., results_20240906_15_00_13)

    • This folder will be generated when you run the code and will store the output of each step.
    • The current folder contains results created during code debugging for the submission.
    • When you run the code, a new folder with fresh results will be generated.

    Python Files

    1. helper_functions.py

    • Contains reusable functions used throughout the analysis.
    • Each function includes a description of its purpose and the input parameters required.

    2. create_sanity_plots.py

    • Generates scatter plots like those in Figure 3 of the paper.
    • Although the code has been run for all 309 trials, it can be used to check the sample data provided.
    • Output: A .png file for each column of the raw gaze and IMU recordings, color-coded with logged events.
    • Usage: Run this file to create visualizations similar to Figure 3.

    3. overlapping_sliding_window_loop.py

    • Implements overlapping sliding window segmentation and generates plots like those in Figure 4.
    • Output:
      • Two new subfolders, “Gaze” and “IMU”, will be added to the Data_4_IJGIS folder.
      • Segmented files (default: 2–10 seconds with a 1-second step size) will be saved as .csv files.
      • A visualization of the segments, similar to Figure 4, will be automatically generated.

    4. gaze_features.py & imu_features.py (Note: there has been an update to the IDT function implementation in the gaze_features.py on 19.03.2025.)

    • These files compute features as explained in Tables 1 and 2 of the paper, respectively.
    • They process the segmented recordings generated by the overlapping_sliding_window_loop.py.
    • Usage: To see how the features are calculated, run these files after the sliding-window segmentation to compute the features from the segmented data.

    5. training_prediction.py

    • This file contains the main machine learning analysis of the paper: all the code for training the model, evaluating it, and using it for inference on the “monitoring part”. It covers the following steps:
    a. Data Preparation (corresponding to Section 5.1.1 of the paper)
    • Prepares the data according to the research question (RQ) described in the paper. Since this data was collected with several RQs in mind, we remove parts of the data that are not related to the RQ of this paper.
    • A function named plot_labels_comparison(df, save_path, x_label_freq=10, figsize=(15, 5)) in line 116 visualizes the data preparation results. As this visualization is not used in the paper, the line is commented out, but if you want to see visually what has been changed compared to the original data, you can comment out this line.
    b. Training/Validation/Test Split
    • Splits the data for machine learning experiments (an explanation can be found in Section 5.1.1. Preparation of data for training and inference of the paper).
    • Make sure that you follow the instructions in the comments to the code exactly.
    • Output: The split data is saved as .csv files in the results folder.
    c. Machine and Deep Learning Experiments

    This part contains three main code blocks:

    • MLP Network (Commented Out): This code was used for classification with the MLP network, and the results shown in Table 3 are from this code. If you wish to use this model, please comment out the following blocks accordingly.
    • XGBoost without Hyperparameter Tuning: If you want to run the code but do not want to spend time on the full training with hyperparameter tuning (as was done for the paper), just uncomment this part. This will give you a simple, untuned model with which you can achieve at least some results.
    • XGBoost with Hyperparameter Tuning: If you want to train the model the way we trained it for the analysis reported in the paper, use this block (the plots in Figure 7 are from this block). We ran this block with different feature sets and different segmentation files and created a simple bar chart from the saved results, shown in Figure 6. A generic sketch of this kind of tuned XGBoost run is shown after the file list below.

    Note: Please read the instructions for each block carefully to ensure that the code works smoothly. Regardless of which block you use, you will get the classification results (in the form of scores) for unseen data. The way we empirically calculated the confidence threshold of the model (explained in the paper in Section 5.2. Part II: Decoding surveillance by sequence analysis) is given in this block in lines 361 to 380.

    d. Inference (Monitoring Part)
    • Final inference is performed using the monitoring data. This step produces a .csv file containing inferred labels.
    • Figure 8 in the paper is generated using this part of the code.

    6. sequence_analysis.py

    • Performs analysis on the inferred data, producing Figures 9 and 10 from the paper.
    • This file reads the inferred data from the previous step and performs sequence analysis as described in Sections 5.2.1 and 5.2.2.
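
    For orientation, the sketch below illustrates the kind of tuned XGBoost run referred to above. It uses scikit-learn's GridSearchCV on made-up toy data and is not the authors' training_prediction.py pipeline; the hyperparameters, features, and split are placeholders.

      import numpy as np
      from sklearn.model_selection import GridSearchCV, train_test_split
      from xgboost import XGBClassifier

      # Toy data standing in for the extracted gaze/IMU feature table
      rng = np.random.default_rng(0)
      X = rng.normal(size=(300, 10))
      y = rng.integers(0, 2, size=300)

      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

      # Small, illustrative hyperparameter grid
      param_grid = {"max_depth": [3, 5], "n_estimators": [100, 300], "learning_rate": [0.05, 0.1]}
      search = GridSearchCV(XGBClassifier(), param_grid, cv=3)
      search.fit(X_train, y_train)

      print("best params:", search.best_params_)
      print("test accuracy:", search.best_estimator_.score(X_test, y_test))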

    Licenses

    The data is licensed under CC-BY; the code is licensed under MIT.

  18. DustNet - structured data and Python code to reproduce the model,...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Jul 7, 2024
    Cite
    Nowak, T. E.; Augousti, Andy T.; Simmons, Benno I.; Siegert, Stefan (2024). DustNet - structured data and Python code to reproduce the model, statistical analysis and figures [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10631953
    Explore at:
    Dataset updated
    Jul 7, 2024
    Dataset provided by
    University of Exeter
    Kingston University
    Authors
    Nowak, T. E.; Augousti, Andy T.; Simmons, Benno I.; Siegert, Stefan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and Python code used for AOD prediction with the DustNet model - a Machine Learning/AI-based forecasting approach.

    Model input data and code

    Processed MODIS AOD data (from Aqua and Terra) and selected ERA5 variables*, ready to reproduce the DustNet model results or for similar forecasting with Machine Learning. These long-term daily timeseries (2003-2022) are provided as n-dimensional NumPy arrays. The Python code to handle the data and run the DustNet model** is included as the Jupyter Notebook ‘DustNet_model_code.ipynb’. A subfolder with data normalised and split into training/validation/testing sets is also provided, along with Python code for two additional ML-based models** used for comparison (U-NET and Conv2D). Pre-trained models are also archived here as TensorFlow files.
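
    Loading the archived arrays and a pre-trained model might look roughly like this; the file names below are hypothetical placeholders, so check the archive and the notebook for the actual paths.

      import numpy as np
      import tensorflow as tf

      # Hypothetical file names -- see the archive/notebook for the real ones
      aod = np.load("modis_aod_2003_2022.npy")        # daily MODIS AOD time series
      era5 = np.load("era5_variables_2003_2022.npy")  # selected ERA5 predictors
      model = tf.keras.models.load_model("dustnet_pretrained")  # archived TensorFlow model

      print(aod.shape, era5.shape)
      forecast = model.predict(era5[:1])  # one-sample sanity check (shapes are illustrative)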

    Model output data and code

    This dataset was constructed by running ‘DustNet_model_code.ipynb’ (see above). It consists of 1095 days of forecast AOD data (2020-2022) from CAMS, the DustNet model, a naïve prediction (persistence) and gridded climatology. The ground truth raw AOD data from MODIS is provided for comparison and statistical analysis of the predictions. It is intended for a quick reproduction of the figures and statistical analysis presented in the paper introducing DustNet.

    *datasets are NumPy arrays (v1.23) created in Python v3.8.18.

    **all ML models were created with Keras in Python v3.10.10.

  19. Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

    • zenodo.org
    application/gzip, bin +2
    Updated Aug 2, 2024
    + more versions
    Cite
    Marat Valiev; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788
    Explore at:
    bin, application/gzip, zip, text/x-python. Available download formats
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marat Valiev; Bogdan Vasilescu; James Herbsleb
    License

    https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description
    Replication pack, FSE2018 submission #164:
    ------------------------------------------
    
    **Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
    A Case Study of the PyPI Ecosystem
    
    **Note:** link to data artifacts is already included in the paper. 
    Link to the code will be included in the Camera Ready version as well.
    
    
    Content description
    ===================
    
    - **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
     described below
    - **settings.py** - settings template for the code archive.
    - **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
     This dataset only includes stats aggregated by the ecosystem (PyPI)
    - **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
     statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages
     themselves, which take around 2TB.
    - **build_model.r, helpers.r** - R files to process the survival data 
      (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
      `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
      **dataset_full_Jan_2018.tgz**)
    - **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
    - LICENSE - text of GPL v3, under which this dataset is published
    - INSTALL.md - replication guide (~2 pages)
    Replication guide
    =================
    
    Step 0 - prerequisites
    ----------------------
    
    - Unix-compatible OS (Linux or OS X)
    - Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
    - R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)
    
    Depending on detalization level (see Step 2 for more details):
    - up to 2Tb of disk space (see Step 2 detalization levels)
    - at least 16Gb of RAM (64 preferable)
    - a few hours to a few months of processing time
    
    Step 1 - software
    ----------------
    
    - unpack **ghd-0.1.0.zip**, or clone from gitlab:
    
       git clone https://gitlab.com/user2589/ghd.git
       git checkout 0.1.0
     
     `cd` into the extracted folder. 
     All commands below assume it as a current directory.
      
    - copy `settings.py` into the extracted folder. Edit the file:
      * set `DATASET_PATH` to some newly created folder path
      * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
    - install docker. For Ubuntu Linux, the command is 
      `sudo apt-get install docker-compose`
    - install libarchive and headers: `sudo apt-get install libarchive-dev`
    - (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
     Without this dependency, you might get an error on the next step, 
     but it's safe to ignore.
    - install Python libraries: `pip install --user -r requirements.txt` . 
    - disable all APIs except GitHub (Bitbucket and Gitlab support were
     not yet implemented when this study was in progress): edit
     `scraper/init.py`, comment out everything except GitHub support
     in `PROVIDERS`.
    
    Step 2 - obtaining the dataset
    -----------------------------
    
    The ultimate goal of this step is to get output of the Python function 
    `common.utils.survival_data()` and save it into a CSV file:
    
      # copy and paste into a Python console
      from common import utils
      survival_data = utils.survival_data('pypi', '2008', smoothing=6)
      survival_data.to_csv('survival_data.csv')
    
    Since full replication will take several months, here are some ways to speedup
    the process:
    
    ####Option 2.a, difficulty level: easiest
    
    Just use the precomputed data. Step 1 is not necessary under this scenario.
    
    - extract **dataset_minimal_Jan_2018.zip**
    - get `survival_data.csv`, go to the next step
    
    ####Option 2.b, difficulty level: easy
    
    Use precomputed longitudinal feature values to build the final table.
    The whole process will take 15..30 minutes.
    
    - create a folder `
  20. Data and Python scripts to produce the results in "Summer 2024 in northern...

    • zenodo.org
    bin, text/x-python
    Updated Apr 30, 2025
    Cite
    Mika Rantanen (2025). Data and Python scripts to produce the results in "Summer 2024 in northern Fennoscandia was very likely the warmest in 2,000 years" [Dataset]. http://doi.org/10.5281/zenodo.15179676
    Explore at:
    text/x-python, bin. Available download formats
    Dataset updated
    Apr 30, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mika Rantanen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    Tree-ring reconstruction data and Python scripts for reproducing the figures and results included in the manuscript "Summer 2024 in northern Fennoscandia was very likely the warmest in 2,000 years".
    The manuscript is published in npj Climate and Atmospheric Science:

    Rantanen, M., Helama, S., Räisänen, J. et al. Summer 2024 in northern Fennoscandia was very likely the warmest in 2000 years. npj Clim Atmos Sci 8, 158 (2025). https://doi.org/10.1038/s41612-025-01046-4
    Please contact Mika Rantanen (mika.rantanen@fmi.fi) for further information.