100+ datasets found

Pandas Practice Dataset
kaggle.com
zip
Updated Jan 27, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mrityunjay Pathak (2023). Pandas Practice Dataset [Dataset]. https://www.kaggle.com/datasets/themrityunjaypathak/pandas-practice-dataset/discussion
Explore at:
zip(493 bytes)Available download formats
Dataset updated
Jan 27, 2023
Authors
Mrityunjay Pathak
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
What is Pandas?

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was created by Wes McKinney in 2008.

Why Use Pandas?

Pandas allows us to analyze big data and make conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

What Can Pandas Do?

Pandas gives you answers about the data. Like:

Is there a correlation between two or more columns?

What is average value?

Max value?

Min value?
H
Python Codes for Data Analysis of The Impact of COVID-19 on Technical...
dataverse.harvard.edu
figshare.com
Updated Mar 21, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Elizabeth Szkirpan (2022). Python Codes for Data Analysis of The Impact of COVID-19 on Technical Services Units Survey Results [Dataset]. http://doi.org/10.7910/DVN/SXMSDZ
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/SXMSDZ
Dataset updated
Mar 21, 2022
Dataset provided by
Harvard Dataverse
Authors
Elizabeth Szkirpan
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Copies of Anaconda 3 Jupyter Notebooks and Python script for holistic and clustered analysis of "The Impact of COVID-19 on Technical Services Units" survey results. Data was analyzed holistically using cleaned and standardized survey results and by library type clusters. To streamline data analysis in certain locations, an off-shoot CSV file was created so data could be standardized without compromising the integrity of the parent clean file. Three Jupyter Notebooks/Python scripts are available in relation to this project: COVID_Impact_TechnicalServices_HolisticAnalysis (a holistic analysis of all survey data) and COVID_Impact_TechnicalServices_LibraryTypeAnalysis (a clustered analysis of impact by library type, clustered files available as part of the Dataverse for this project).
"module-utilities": A Python package for simplify creating python modules.
catalog.data.gov
s.cnmilf.com
Updated Apr 11, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Institute of Standards and Technology (2024). "module-utilities": A Python package for simplify creating python modules. [Dataset]. https://catalog.data.gov/dataset/module-utilities-a-python-package-for-simplify-creating-python-modules
Explore at:
Dataset updated
Apr 11, 2024
Dataset provided by
National Institute of Standards and Technologyhttp://www.nist.gov/
Description
"module-utilities" is a python package of utilities to simplify working with python packages.The main features of module-utilities are as follows: "cached" module: A module to cache class attributes and methods. Right now, this uses a standard python dictionary for storage. Future versions will hopefully be more robust to threading and shared cache."docfiller" module: A module to share documentation. This is adapted from the pandas doc decorator. There are a host of utilities build around this."docinhert": An interface to "docstring-inheritance" module. This can be combined with "docfiller" to make creating related function/class documentation easy.
d
Python code used to determine average yearly and monthly tourism per 1000...
catalog.data.gov
data.usgs.gov
+1more
Updated Nov 12, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). Python code used to determine average yearly and monthly tourism per 1000 residents for public-supply water service areas [Dataset]. https://catalog.data.gov/dataset/python-code-used-to-determine-average-yearly-and-monthly-tourism-per-1000-residents-for-pu
Explore at:
Dataset updated
Nov 12, 2025
Dataset provided by
U.S. Geological Survey
Description
This child item describes Python code used to estimate average yearly and monthly tourism per 1000 residents within public-supply water service areas. Increases in population due to tourism may impact amounts of water used by public-supply water systems. This data release contains model input datasets, Python code used to develop the tourism information, and output estimates of tourism. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000-2020. Output from this code was used as an input feature in the public supply delivery and water use machine learning models. This page includes the following files: tourism_input_data.zip - a zip file containing input data sets used by the tourism Python code tourism_output.zip - a zip file with output produced by the tourism Python code README.txt - a README file describing the data files and code requirements tourism_study_code.zip - a zip file containing the Python code used to create the tourism feature variable
Python scripts used to generate the figures in "An algorithm to identify...
catalog.data.gov
datasets.ai
+3more
Updated Jul 29, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Institute of Standards and Technology (2022). Python scripts used to generate the figures in "An algorithm to identify vapor-liquid-liquid equilibria of binary mixtures from vapor-liquid equilibria" [Dataset]. https://catalog.data.gov/dataset/python-scripts-used-to-generate-the-figures-in-an-algorithm-to-identify-vapor-liquid-liqui
Explore at:
Dataset updated
Jul 29, 2022
Dataset provided by
National Institute of Standards and Technologyhttp://www.nist.gov/
Description
The files in this repository can be used to generate the complete set of figures in the paper "An algorithm to identify vapor-liquid-liquid equilibria from vapor-liquid equilibria". The zip file, when expanded, includes a conda environment to populate the dependencies, and a set of python scripts. Running make_figures.py will regenerate all the figures, demonstrating how to use the algorithm.
Replication Package: Unboxing Default Argument Breaking Changes in 1 + 2...
zenodo.org
application/gzip
Updated Jul 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi; João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi (2024). Replication Package: Unboxing Default Argument Breaking Changes in 1 + 2 Data Science Libraries in Python [Dataset]. http://doi.org/10.5281/zenodo.11584961
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.11584961
Dataset updated
Jul 15, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi; João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Replication Package

This repository contains data and source files needed to replicate our work described in the paper "Unboxing Default Argument Breaking Changes in Scikit Learn".

Requirements

We recommend the following requirements to replicate our study:

Internet access

At least 100GB of space

Docker installed

Git installed

Package Structure

We relied on Docker containers to provide a working environment that is easier to replicate. Specifically, we configure the following containers:

data-analysis, an R-based Container we used to run our data analysis.

data-collection, a Python Container we used to collect Scikit's default arguments and detect them in client applications.

database, a Postgres Container we used to store clients' data, obtainer from Grotov et al.

storage, a directory used to store the data processed in data-analysis and data-collection. This directory is shared in both containers.

docker-compose.yml, the Docker file that configures all containers used in the package.

In the remainder of this document, we describe how to set up each container properly.

Using VSCode to Setup the Package

We selected VSCode as the IDE of choice because its extensions allow us to implement our scripts directly inside the containers. In this package, we provide configuration parameters for both data-analysis and data-collection containers. This way you can directly access and run each container inside it without any specific configuration.

You first need to set up the containers

$ cd /replication/package/folder $ docker-compose build $ docker-compose up # Wait docker creating and running all containers

Then, you can open them in Visual Studio Code:

Open VSCode in project root folder

Access the command palette and select "Dev Container: Reopen in Container"

Select either Data Collection or Data Analysis.

Start working

If you want/need a more customized organization, the remainder of this file describes it in detail.

Longest Road: Manual Package Setup

Database Setup

The database container will automatically restore the dump in dump_matroskin.tar in its first launch. To set up and run the container, you should:

Build an image:

$ cd ./database $ docker build --tag 'dabc-database' . $ docker image ls REPOSITORY TAG IMAGE ID CREATED SIZE dabc-database latest b6f8af99c90d 50 minutes ago 18.5GB

Create and enter inside the container:

$ docker run -it --name dabc-database-1 dabc-database $ docker exec -it dabc-database-1 /bin/bash root# psql -U postgres -h localhost -d jupyter-notebooks jupyter-notebooks=# \dt List of relations Schema | Name | Type | Owner --------+-------------------+-------+------- public | Cell | table | root public | Code_cell | table | root public | Md_cell | table | root public | Notebook | table | root public | Notebook_features | table | root public | Notebook_metadata | table | root public | repository | table | root

If you got the tables list as above, your database is properly setup.

It is important to mention that this database is extended from the one provided by Grotov et al.. Basically, we added three columns in the table Notebook_features (API_functions_calls, defined_functions_calls, andother_functions_calls) containing the function calls performed by each client in the database.

Data Collection Setup

This container is responsible for collecting the data to answer our research questions. It has the following structure:

dabcs.py, extract DABCs from Scikit Learn source code, and export them to a CSV file.

dabcs-clients.py, extract function calls from clients and export them to a CSV file. We rely on a modified version of Matroskin to leverage the function calls. You can find the tool's source code in the `matroskin`` directory.

Makefile, commands to set up and run both dabcs.py and dabcs-clients.py

matroskin, the directory containing the modified version of matroskin tool. We extended the library to collect the function calls performed on the client notebooks of Grotov's dataset.

storage, a docker volume where the data-collection should save the exported data. This data will be used later in Data Analysis.

requirements.txt, Python dependencies adopted in this module.

Note that the container will automatically configure this module for you, e.g., install dependencies, configure matroskin, download scikit learn source code, etc. For this, you must run the following commands:

$ cd ./data-collection $ docker build --tag "data-collection" . $ docker run -it -d --name data-collection-1 -v $(pwd)/:/data-collection -v $(pwd)/../storage/:/data-collection/storage/ data-collection $ docker exec -it data-collection-1 /bin/bash $ ls Dockerfile Makefile config.yml dabcs-clients.py dabcs.py matroskin storage requirements.txt utils.py

If you see project files, it means the container is configured accordingly.

Data Analysis Setup

We use this container to conduct the analysis over the data produced by the Data Collection container. It has the following structure:

dependencies.R, an R script containing the dependencies used in our data analysis.

data-analysis.Rmd, the R notebook we used to perform our data analysis

datasets, a docker volume pointing to the storage directory.

Execute the following commands to run this container:

$ cd ./data-analysis $ docker build --tag "data-analysis" . $ docker run -it -d --name data-analysis-1 -v $(pwd)/:/data-analysis -v $(pwd)/../storage/:/data-collection/datasets/ data-analysis $ docker exec -it data-analysis-1 /bin/bash $ ls data-analysis.Rmd datasets dependencies.R Dockerfile figures Makefile

If you see project files, it means the container is configured accordingly.

A note on storage shared folder

As mentioned, the storage folder is mounted as a volume and shared between data-collection and data-analysis containers. We compressed the content of this folder due to space constraints. Therefore, before starting working on Data Collection or Data Analysis, make sure you extracted the compressed files. You can do this by running the Makefile inside storage folder.

$ make unzip # extract files $ ls clients-dabcs.csv clients-validation.csv dabcs.csv Makefile scikit-learn-versions.csv versions.csv $ make zip # compress files $ ls csv-files.tar.gz Makefile
H
Creating Curve Number Grid using PyQGIS through Jupyter Notebook in mygeohub...
hydroshare.org
beta.hydroshare.org
+1more
zip
Updated Apr 28, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sayan Dey; Shizhang Wang; Venkatesh Merwade (2020). Creating Curve Number Grid using PyQGIS through Jupyter Notebook in mygeohub [Dataset]. http://doi.org/10.4211/hs.abf67aad0eb64a53bf787d369afdcc84
Explore at:
zip(105.5 MB)Available download formats
Unique identifier
https://doi.org/10.4211/hs.abf67aad0eb64a53bf787d369afdcc84
Dataset updated
Apr 28, 2020
Dataset provided by
HydroShare
Authors
Sayan Dey; Shizhang Wang; Venkatesh Merwade
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
This resource serves as a template for creating a curve number grid raster file which could be used to create corresponding maps or for further utilization, soil data and reclassified land-use raster files are created along the process, user has to provided or connect to a set of shape-files including boundary of watershed, soil data and land-use containing this watershed, land-use reclassification and curve number look up table. Script contained in this resource mainly uses PyQGIS through Jupyter Notebook for majority of the processing with a touch of Pandas for data manipulation. Detailed description of procedure are commented in the script.
Data from: Project to create snake game by using python
kaggle.com
zip
Updated Jul 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kamal Acharya (2025). Project to create snake game by using python [Dataset]. https://www.kaggle.com/datasets/acharyakamal/project-to-create-snake-game-by-using-python
Explore at:
zip(700832 bytes)Available download formats
Dataset updated
Jul 9, 2025
Authors
Kamal Acharya
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
It’s the most famous game we used to play in our childhood before the advent of smartphones. This is a very simple project in which the snake will eat the food when its mouth touches the food. Furthermore, the length of the snake will keep on increasing after eating the food and if the snake touches the screen or itself the game will be over.
Creating_simple_Sintetic_dataset
kaggle.com
zip
Updated Jan 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lala Ibadullayeva (2025). Creating_simple_Sintetic_dataset [Dataset]. https://www.kaggle.com/datasets/lalaibadullayeva/creating-simple-sintetic-dataset
Explore at:
zip(476698 bytes)Available download formats
Dataset updated
Jan 20, 2025
Authors
Lala Ibadullayeva
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Description

Overview: This dataset contains three distinct fake datasets generated using the Faker and Mimesis libraries. These libraries are commonly used for generating realistic-looking synthetic data for testing, prototyping, and data science projects. The datasets were created to simulate real-world scenarios while ensuring no sensitive or private information is included.

Data Generation Process: The data creation process is documented in the accompanying notebook, Creating_simple_Sintetic_data.ipynb. This notebook showcases the step-by-step procedure for generating synthetic datasets with customizable structures and fields using the Faker and Mimesis libraries.

File Contents:

Datasets: CSV files containing the three synthetic datasets. Notebook: Creating_simple_Sintetic_data.ipynb detailing the data generation process and the code used to create these datasets.
d
Data from: Improving access and use of climate projections for ecological...
search.dataone.org
data.niaid.nih.gov
+2more
Updated Jul 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrea Paz; Thomas Lauber; Thomas W. Crowther; Johan van den Hoogen (2025). Improving access and use of climate projections for ecological research through the use of a new Python tool [Dataset]. http://doi.org/10.5061/dryad.3r2280gph
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.3r2280gph
Dataset updated
Jul 27, 2025
Dataset provided by
Dryad Digital Repository
Authors
Andrea Paz; Thomas Lauber; Thomas W. Crowther; Johan van den Hoogen
Time period covered
Jan 1, 2024
Description
Over the past decade, the use of future climate projections from the Coupled Model Intercomparison Project (CMIP) has become central in biodiversity science. Pre-packaged datasets containing future projections of the widely used bioclimatic variables, for different times and socio-economic pathways, have contributed immensely to the study of climate change implications for biodiversity. However, these datasets lack the flexibility to obtain projections to other target years, and the use of raw data requires coding and spatial information systems expertise. The Python tool, chelsa-cmip6, developed by Karger et. al 2023 provides the flexibility needed by allowing users to generate bioclimatic variables for the time of their choice provided the selected general circulation model and socioeconomic pathway combination exists. This is a fantastic step forward in bringing flexibility to the use of climate datasets in biodiversity and will allow for more widespread use of data provided by CMIP6..., This is a jupiter notebook that allows the user to check if the desired combination of GCM and SSP is available for the selected time and area, and then uses the chelsa-cmip6 tool to create bioclimatic variables based on the user-selected scenarios. This notebook was created to facilitate use of the chelsa-cmip6 Python tool by Karger et al 2023., , # Improving access and use of climate projections for ecological research through the use of a new Python tool

# Improving access and use of climate projections for ecological research through the use of a new Python tool

---

Brief summary of contents

Data archive for:

Improving access and use of climate projections for ecological research through the use of a new Python tool

by Andrea Paz, Thomas Lauber, Thomas W. Crowther and Johan van den Hoogen

This archive contains an example notebook script that allows the user to check if the desired combination of GCM and SSP is available for the selected time and area, and then uses the chelsa-cmip6 Python tool (Karger et al 2023) to create bioclimatic variables based on the user-selected scenarios. The bioclimatic variables can then be converted to GeoTiffs, downloaded and used in different applications through R, Python, Google Earth Engine, or others.

This Notebook can be opened locally or uploaded to Google Colab, Deepnote, et...
14.2 Python for Everyone
hub.arcgis.com
Updated Mar 4, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Iowa Department of Transportation (2017). 14.2 Python for Everyone [Dataset]. https://hub.arcgis.com/documents/7a3a96a7f9fe46b3bc853691621ccf7d
Explore at:
Dataset updated
Mar 4, 2017
Dataset authored and provided by
Iowa Department of Transportationhttps://iowadot.gov/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Do you spend a lot of time repeating workflows, such as copying data, editing files, and setting up map documents? Did you know that you can use Python to automate data reproduction, data management, map document display, and many of your other daily tasks in ArcGIS?This course provides the building blocks needed to use Python. You will create and run scripts using these building blocks and can apply them directly inside of ArcGIS and to your own workflows.After completing this course, you will be able to:Determine where to write and run a Python script.Differentiate Python language elements and determine where to apply them.Follow a script workflow.Develop a Python script to run statements and functions.Solve common syntax errors.
Omega-Prime: Data Model, Data Format and Python Library for Handling Ground...
doi.org
zenodo.org
zip
Updated Jul 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sven Tarlowski; Sven Tarlowski (2025). Omega-Prime: Data Model, Data Format and Python Library for Handling Ground Truth Traffic Data [Dataset]. http://doi.org/10.5281/zenodo.16565213
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.16565213
Dataset updated
Jul 29, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Sven Tarlowski; Sven Tarlowski
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data Model, Format and Python Library for ground truth data containing information on dynamic objects, map and environmental factors optimized for representing urban traffic. The repository contains:

Data Model and Specification

see ./docs/omega_prime_specification.md

🌍 Data Model: What signals exist and how these are defined.

🧾 Data Format Specification: How to exchange and store those signals.

Python Library

🔨 Create omega-prime files from many sources (see ./tutorial.ipynb):

ASAM OSI GroundTruth trace (e.g., output of esmini)

Table of moving object data (e.g., csv data)

ASAM OpenDRIVE map

LevelXData datasets through lxd-io

extend yourself by subclassing DatasetConverter

🗺️ Map Association: Associate Object Location with Lanes from OpenDRIVE or OSI Maps (see tutorial_locator.ipynb)

📺 Plotting of data: interactive top view plots using altair

✅ Validation of data: check if your data conforms to the omega-prime specification (e.g., correct yaw) using pandera

📐 Interpolation of data: bring your data into a fixed frequency

📈 Metrics: compute interaction metrics like PET, TTC, THW (see tutorial_metrics.ipynb)

🚀 Fast Processing directly on DataFrames using polars, polars-st

The data model and format utilize ASAM OpenDRIVE and ASAM Open-Simulation-Interface GroundTruth messages. omega-prime sets requirements on presence and quality of ASAM OSI GroundTruth messages and ASAM OpenDRIVE files and defines a file format for the exchange and storage of these.

Omega-Prime is the successor of the OMEGAFormat. It has the benefit that its definition is directly based on the established standards ASAM OSI and ASAM OpenDRIVE and carries over the data quality requirements and the data tooling from OMEGAFormat. Therefore, it should be easier to incorporate omega-prime into existing workflows and tooling.

To learn more about the example data read example_files/README.md. Example data was taken and created from esmini.
S&T Project 22071 Code: Figure Generation Python Code
data.usbr.gov
Updated Apr 15, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
United States Bureau of Reclamation (2025). S&T Project 22071 Code: Figure Generation Python Code [Dataset]. https://data.usbr.gov/catalog/8075/item/128825
Explore at:
Dataset updated
Apr 15, 2025
Dataset authored and provided by
United States Bureau of Reclamationhttp://www.usbr.gov/
Description
Code written in Python to create spatial plots of publicly available atmospheric and snow data. The user must have access to publicly available ERA5-Interim atmospheric data as well as SNOTEL snowfall timeseries data. Additionally, the plotting code is intended to be used on some of the outputs from the Weather Typing Algorithm written by Andreas Prein and owned by NCAR
Dataset Sales - Aleatory Data - by python numpy
kaggle.com
zip
Updated Jul 4, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Italo Marcelo (2020). Dataset Sales - Aleatory Data - by python numpy [Dataset]. https://kaggle.com/italomarcelo/dataset-sales-aleatory-data-by-python-numpy
Explore at:
zip(4756152 bytes)Available download formats
Dataset updated
Jul 4, 2020
Authors
Italo Marcelo
Description
Dataset

This dataset was created by Italo Marcelo

Contents

It contains the following files:
Python Data Science Handbook
kaggle.com
zip
Updated Dec 20, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Timo Bozsolik (2021). Python Data Science Handbook [Dataset]. https://www.kaggle.com/timoboz/python-data-science-handbook
Explore at:
zip(16028316 bytes)Available download formats
Dataset updated
Dec 20, 2021
Authors
Timo Bozsolik
Description
Python Data Science Handbook

This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks.

How to Use this Book

Read the book in its entirety online at https://jakevdp.github.io/PythonDataScienceHandbook/

Run the code using the Jupyter notebooks available in this repository's notebooks directory.

Launch executable versions of these notebooks using Google Colab:

Launch a live notebook server with these notebooks using binder:

Buy the printed book through O'Reilly Media

About

The book was written and tested with Python 3.5, though other Python versions (including Python 2.7) should work in nearly all cases.

The book introduces the core libraries essential for working with data in Python: particularly IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and related packages. Familiarity with Python as a language is assumed; if you need a quick introduction to the language itself, see the free companion project, A Whirlwind Tour of Python: it's a fast-paced introduction to the Python language aimed at researchers and scientists.

See Index.ipynb for an index of the notebooks available to accompany the text.

Software

The code in the book was tested with Python 3.5, though most (but not all) will also work correctly with Python 2.7 and other older Python versions.

The packages I used to run the code in the book are listed in requirements.txt (Note that some of these exact version numbers may not be available on your platform: you may have to tweak them for your own use). To install the requirements using conda, run the following at the command-line:

$ conda install --file requirements.txt

To create a stand-alone environment named PDSH with Python 3.5 and all the required package versions, run the following:

$ conda create -n PDSH python=3.5 --file requirements.txt

You can read more about using conda environments in the Managing Environments section of the conda documentation.

License

Code

The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.

Text

The text content of the book is released under the CC-BY-NC-ND license. Read more at Creative Commons.
a
Python for ArcGIS - Working with ArcGIS Notebooks
edu.hub.arcgis.com
Updated Oct 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Education and Research (2024). Python for ArcGIS - Working with ArcGIS Notebooks [Dataset]. https://edu.hub.arcgis.com/documents/16fbaf21dc7b41c187ebcfd9f6ea1d58
Explore at:
Dataset updated
Oct 8, 2024
Dataset authored and provided by
Education and Research
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This resource was created by Esri Canada Education and Research. To browse our full collection of higher-education learning resources, please visit https://hed.esri.ca/resourcefinder/.This tutorial introduces you to using Python code in a Jupyter Notebook, an open source web application that enables you to create and share documents that contain rich text, equations and multimedia, alongside executable code and visualization of analysis outputs. The tutorial begins by stepping through the basics of setting up and being productive with Python notebooks. You will be introduced to ArcGIS Notebooks, which are Python Notebooks that are well-integrated within the ArcGIS platform. Finally, you will be guided through a series of ArcGIS Notebooks that illustrate how to create compelling notebooks for data science that integrate your own Python scripts using the ArcGIS API for Python and ArcPy in combination with thousands of open source Python libraries to enhance your analysis and visualization.To download the dataset Labs, click the Open button to the top right. This will automatically download a ZIP file containing all files and data required.You can also clone the tutorial documents and datasets for this GitHub repo: https://github.com/highered-esricanada/arcgis-notebooks-tutorial.git.Software & Solutions Used: Required: This tutorial was last tested on August 27th, 2024, using ArcGIS Pro 3.3. If you're using a different version of ArcGIS Pro, you may encounter different functionality and results.Recommended: ArcGIS Online subscription account with permissions to use advanced Notebooks and GeoEnrichmentOptional: Notebook Server for ArcGIS Enterprise 11.3+Time to Complete: 2 h (excludes processing time)File Size: 196 MBDate Created: January 2022Last Updated: August 27, 2024
t
Data from: Decoding Wayfinding: Analyzing Wayfinding Processes in the...
researchdata.tuwien.at
html, pdf, zip
Updated Mar 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Negar Alinaghi; Ioannis Giannopoulos; Ioannis Giannopoulos; Negar Alinaghi; Negar Alinaghi; Negar Alinaghi (2025). Decoding Wayfinding: Analyzing Wayfinding Processes in the Outdoor Environment [Dataset]. http://doi.org/10.48436/m2ha4-t1v92
Explore at:
html, zip, pdfAvailable download formats
Unique identifier
https://doi.org/10.48436/m2ha4-t1v92
Dataset updated
Mar 19, 2025
Dataset provided by
TU Wien
Authors
Negar Alinaghi; Ioannis Giannopoulos; Ioannis Giannopoulos; Negar Alinaghi; Negar Alinaghi; Negar Alinaghi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
How To Cite?

Alinaghi, N., Giannopoulos, I., Kattenbeck, M., & Raubal, M. (2025). Decoding wayfinding: analyzing wayfinding processes in the outdoor environment. International Journal of Geographical Information Science, 1–31. https://doi.org/10.1080/13658816.2025.2473599

Link to the paper: https://www.tandfonline.com/doi/full/10.1080/13658816.2025.2473599

Folder Structure

The folder named “submission” contains the following:

“pythonProject”: This folder contains all the Python files and subfolders needed for analysis.

ijgis.yml: This file lists all the Python libraries and dependencies required to run the code.

Setting Up the Environment

Use the ijgis.yml file to create a Python project and environment. Ensure you activate the environment before running the code.

The pythonProject folder contains several .py files and subfolders, each with specific functionality as described below.

Subfolders

1. Data_4_IJGIS

This folder contains the data used for the results reported in the paper.

Note: The data analysis that we explain in this paper already begins with the synchronization and cleaning of the recorded raw data. The published data is already synchronized and cleaned. Both the cleaned files and the merged files with features extracted for them are given in this directory. If you want to perform the segmentation and feature extraction yourself, you should run the respective Python files yourself. If not, you can use the “merged_…csv” files as input for the training.

2. results_[DateTime] (e.g., results_20240906_15_00_13)

This folder will be generated when you run the code and will store the output of each step.

The current folder contains results created during code debugging for the submission.

When you run the code, a new folder with fresh results will be generated.

Python Files

1. helper_functions.py

Contains reusable functions used throughout the analysis.

Each function includes a description of its purpose and the input parameters required.

2. create_sanity_plots.py

Generates scatter plots like those in Figure 3 of the paper.

Although the code has been run for all 309 trials, it can be used to check the sample data provided.

Output: A .png file for each column of the raw gaze and IMU recordings, color-coded with logged events.

Usage: Run this file to create visualizations similar to Figure 3.

3. overlapping_sliding_window_loop.py

Implements overlapping sliding window segmentation and generates plots like those in Figure 4.

Output:

Two new subfolders, “Gaze” and “IMU”, will be added to the Data_4_IJGIS folder.

Segmented files (default: 2–10 seconds with a 1-second step size) will be saved as .csv files.

A visualization of the segments, similar to Figure 4, will be automatically generated.

4. gaze_features.py & imu_features.py (Note: there has been an update to the IDT function implementation in the gaze_features.py on 19.03.2025.)

These files compute features as explained in Tables 1 and 2 of the paper, respectively.

They process the segmented recordings generated by the overlapping_sliding_window_loop.py.

Usage: Just to know how the features are calculated, you can run this code after the segmentation with the sliding window and run these files to calculate the features from the segmented data.

5. training_prediction.py

This file contains the main machine learning analysis of the paper. This file contains all the code for the training of the model, its evaluation, and its use for the inference of the “monitoring part”. It covers the following steps:

a. Data Preparation (corresponding to Section 5.1.1 of the paper)

Prepares the data according to the research question (RQ) described in the paper. Since this data was collected with several RQs in mind, we remove parts of the data that are not related to the RQ of this paper.

A function named plot_labels_comparison(df, save_path, x_label_freq=10, figsize=(15, 5)) in line 116 visualizes the data preparation results. As this visualization is not used in the paper, the line is commented out, but if you want to see visually what has been changed compared to the original data, you can comment out this line.

b. Training/Validation/Test Split

Splits the data for machine learning experiments (an explanation can be found in Section 5.1.1. Preparation of data for training and inference of the paper).

Make sure that you follow the instructions in the comments to the code exactly.

Output: The split data is saved as .csv files in the results folder.

c. Machine and Deep Learning Experiments

This part contains three main code blocks:

iii. One for the XGboost code with correct hyperparameter tuning:
Please read the instructions for each block carefully to ensure that the code works smoothly. Regardless of which block you use, you will get the classification results (in the form of scores) for unseen data. The way we empirically test the confidence threshold of

MLP Network (Commented Out): This code was used for classification with the MLP network, and the results shown in Table 3 are from this code. If you wish to use this model, please comment out the following blocks accordingly.

XGBoost without Hyperparameter Tuning: If you want to run the code but do not want to spend time on the full training with hyperparameter tuning (as was done for the paper), just uncomment this part. This will give you a simple, untuned model with which you can achieve at least some results.

XGBoost with Hyperparameter Tuning: If you want to train the model the way we trained it for the analysis reported in the paper, use this block (the plots in Figure 7 are from this block). We ran this block with different feature sets and different segmentation files and created a simple bar chart from the saved results, shown in Figure 6.

Note: Please read the instructions for each block carefully to ensure that the code works smoothly. Regardless of which block you use, you will get the classification results (in the form of scores) for unseen data. The way we empirically calculated the confidence threshold of the model (explained in the paper in Section 5.2. Part II: Decoding surveillance by sequence analysis) is given in this block in lines 361 to 380.

d. Inference (Monitoring Part)

Final inference is performed using the monitoring data. This step produces a .csv file containing inferred labels.

Figure 8 in the paper is generated using this part of the code.

6. sequence_analysis.py

Performs analysis on the inferred data, producing Figures 9 and 10 from the paper.

This file reads the inferred data from the previous step and performs sequence analysis as described in Sections 5.2.1 and 5.2.2.

Licenses

The data is licensed under CC-BY, the code is licensed under MIT.
Z
DustNet - structured data and Python code to reproduce the model,...
data.niaid.nih.gov
data-staging.niaid.nih.gov
+1more
Updated Jul 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nowak, T. E.; Augousti, Andy T.; Simmons, Benno I.; Siegert, Stefan (2024). DustNet - structured data and Python code to reproduce the model, statistical analysis and figures [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10631953
Explore at:
Dataset updated
Jul 7, 2024
Dataset provided by
University of Exeter
Kingston University
Authors
Nowak, T. E.; Augousti, Andy T.; Simmons, Benno I.; Siegert, Stefan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data and Python code used for AOD prediction with DustNet model - a Machine Learning/AI based forecasting.

Model input data and code

Processed MODIS AOD data (from Aqua and Terra) and selected ERA5 variables* ready to reproduce the DustNet model results or for similar forecasting with Machine Learning. These long-term daily timeseries (2003-2022) are provided as n-dimensional NumPy arrays. The Python code to handle the data and run the DustNet model** is included as Jupyter Notebook ‘DustNet_model_code.ipynb’. A subfolder with normalised and split data into training/validation/testing sets is also provided with Python code for two additional ML based models** used for comparison (U-NET and Conv2D). Pre-trained models are also archived here as TensorFlow files.

Model output data and code

This dataset was constructed by running the ‘DustNet_model_code.ipynb’ (see above). It consists of 1095 days of forecased AOD data (2020-2022) by CAMS, DustNet model, naïve prediction (persistence) and gridded climatology. The ground truth raw AOD data form MODIS is provided for comparison and statystical analysis of predictions. It is intended for a quick reproduction of figures and statystical analysis presented in DustNet introducing paper.

*datasets are NumPy arrays (v1.23) created in Python v3.8.18.

**all ML models were created with Keras in Python v3.10.10.

Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

zenodo.org

application/gzip, bin +2

Updated Aug 2, 2024

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

Marat Valiev; Marat Valiev; Bogdan Vasilescu; James Herbsleb; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788

Explore at:

bin, application/gzip, zip, text/x-pythonAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.1419788

Dataset updated

Aug 2, 2024

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Marat Valiev; Marat Valiev; Bogdan Vasilescu; James Herbsleb; Bogdan Vasilescu; James Herbsleb

License

https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.htmlhttps://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

Description

Replication pack, FSE2018 submission #164:
------------------------------------------

**Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
A Case Study of the PyPI Ecosystem

**Note:** link to data artifacts is already included in the paper. 
Link to the code will be included in the Camera Ready version as well.


Content description
===================

- **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
 described below
- **settings.py** - settings template for the code archive.
- **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
 This dataset only includes stats aggregated by the ecosystem (PyPI)
- **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
 statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages
 themselves, which take around 2TB.
- **build_model.r, helpers.r** - R files to process the survival data 
  (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
  `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
  **dataset_full_Jan_2018.tgz**)
- **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
- LICENSE - text of GPL v3, under which this dataset is published
- INSTALL.md - replication guide (~2 pages)

Replication guide
=================

Step 0 - prerequisites
----------------------

- Unix-compatible OS (Linux or OS X)
- Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
- R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)

Depending on detalization level (see Step 2 for more details):
- up to 2Tb of disk space (see Step 2 detalization levels)
- at least 16Gb of RAM (64 preferable)
- few hours to few month of processing time

Step 1 - software
----------------

- unpack **ghd-0.1.0.zip**, or clone from gitlab:

   git clone https://gitlab.com/user2589/ghd.git
   git checkout 0.1.0
 
 `cd` into the extracted folder. 
 All commands below assume it as a current directory.
  
- copy `settings.py` into the extracted folder. Edit the file:
  * set `DATASET_PATH` to some newly created folder path
  * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
- install docker. For Ubuntu Linux, the command is 
  `sudo apt-get install docker-compose`
- install libarchive and headers: `sudo apt-get install libarchive-dev`
- (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
 Without this dependency, you might get an error on the next step, 
 but it's safe to ignore.
- install Python libraries: `pip install --user -r requirements.txt` . 
- disable all APIs except GitHub (Bitbucket and Gitlab support were
 not yet implemented when this study was in progress): edit
 `scraper/init.py`, comment out everything except GitHub support
 in `PROVIDERS`.

Step 2 - obtaining the dataset
-----------------------------

The ultimate goal of this step is to get output of the Python function 
`common.utils.survival_data()` and save it into a CSV file:

  # copy and paste into a Python console
  from common import utils
  survival_data = utils.survival_data('pypi', '2008', smoothing=6)
  survival_data.to_csv('survival_data.csv')

Since full replication will take several months, here are some ways to speedup
the process:

####Option 2.a, difficulty level: easiest

Just use the precomputed data. Step 1 is not necessary under this scenario.

- extract **dataset_minimal_Jan_2018.zip**
- get `survival_data.csv`, go to the next step

####Option 2.b, difficulty level: easy

Use precomputed longitudinal feature values to build the final table.
The whole process will take 15..30 minutes.

- create a folder `

Data and Python scripts to produce the results in "Summer 2024 in northern...
zenodo.org
bin, text/x-python
Updated Apr 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mika Rantanen; Mika Rantanen (2025). Data and Python scripts to produce the results in "Summer 2024 in northern Fennoscandia was very likely the warmest in 2,000 years" [Dataset]. http://doi.org/10.5281/zenodo.15179676
Explore at:
text/x-python, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.15179676
Dataset updated
Apr 30, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Mika Rantanen; Mika Rantanen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Tree-ring reconstruction data and Python scripts for reproducing the figures and results included in the manuscript "Summer 2024 in northern Fennoscandia was very likely the warmest in 2,000 years".

The manuscript is published in npj Climate and Atmospheric Science:

Rantanen, M., Helama, S., Räisänen, J. et al. Summer 2024 in northern Fennoscandia was very likely the warmest in 2000 years. npj Clim Atmos Sci 8, 158 (2025). https://doi.org/10.1038/s41612-025-01046-4

Please contact Mika Rantanen (mika.rantanen@fmi.fi) for further information.

Facebook

Twitter

Click to copy link

Link copied

Cite

Mrityunjay Pathak (2023). Pandas Practice Dataset [Dataset]. https://www.kaggle.com/datasets/themrityunjaypathak/pandas-practice-dataset/discussion

Pandas Practice Dataset

Dataset to Practice Your Pandas Skill's

Explore at:

4 scholarly articles cite this dataset (View in Google Scholar)

zip(493 bytes)Available download formats

Dataset updated

Jan 27, 2023

Authors

Mrityunjay Pathak

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

What is Pandas?

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was created by Wes McKinney in 2008.

Why Use Pandas?

Pandas allows us to analyze big data and make conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

What Can Pandas Do?

Pandas gives you answers about the data. Like:

Is there a correlation between two or more columns?

What is average value?

Max value?

Min value?

Clear search

Close search

Google apps

Main menu

Pandas Practice Dataset

Python Codes for Data Analysis of The Impact of COVID-19 on Technical...

"module-utilities": A Python package for simplify creating python modules.

Python code used to determine average yearly and monthly tourism per 1000...

Python scripts used to generate the figures in "An algorithm to identify...

Replication Package: Unboxing Default Argument Breaking Changes in 1 + 2...

Creating Curve Number Grid using PyQGIS through Jupyter Notebook in mygeohub...

Data from: Project to create snake game by using python

Creating_simple_Sintetic_dataset

Data from: Improving access and use of climate projections for ecological...

14.2 Python for Everyone

Omega-Prime: Data Model, Data Format and Python Library for Handling Ground...

Data Model and Specification

Python Library

S&T Project 22071 Code: Figure Generation Python Code

Dataset Sales - Aleatory Data - by python numpy

Dataset

Contents

Python Data Science Handbook

Python Data Science Handbook

How to Use this Book

About

Software

License

Code

Text

Python for ArcGIS - Working with ArcGIS Notebooks

Data from: Decoding Wayfinding: Analyzing Wayfinding Processes in the...

How To Cite?

Folder Structure

Setting Up the Environment

Subfolders

1. Data_4_IJGIS

2. results_[DateTime] (e.g., results_20240906_15_00_13)

Python Files

1. helper_functions.py

2. create_sanity_plots.py

3. overlapping_sliding_window_loop.py

4. gaze_features.py & imu_features.py (Note: there has been an update to the IDT function implementation in the gaze_features.py on 19.03.2025.)

5. training_prediction.py

a. Data Preparation (corresponding to Section 5.1.1 of the paper)

b. Training/Validation/Test Split

c. Machine and Deep Learning Experiments

d. Inference (Monitoring Part)

6. sequence_analysis.py

Licenses

DustNet - structured data and Python code to reproduce the model,...

Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

Data and Python scripts to produce the results in "Summer 2024 in northern...

Pandas Practice Dataset

Dataset to Practice Your Pandas Skill's