This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks.
Read the book in its entirety online at https://jakevdp.github.io/PythonDataScienceHandbook/
Run the code using the Jupyter notebooks available in this repository's notebooks directory.
Launch executable versions of these notebooks using Google Colab:
Launch a live notebook server with these notebooks using binder:
Buy the printed book through O'Reilly Media
The book was written and tested with Python 3.5, though other Python versions (including Python 2.7) should work in nearly all cases.
The book introduces the core libraries essential for working with data in Python: particularly IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and related packages. Familiarity with Python as a language is assumed; if you need a quick introduction to the language itself, see the free companion project, A Whirlwind Tour of Python: it's a fast-paced introduction to the Python language aimed at researchers and scientists.
See Index.ipynb for an index of the notebooks available to accompany the text.
The code in the book was tested with Python 3.5, though most (but not all) will also work correctly with Python 2.7 and other older Python versions.
The packages I used to run the code in the book are listed in requirements.txt (Note that some of these exact version numbers may not be available on your platform: you may have to tweak them for your own use). To install the requirements using conda, run the following at the command-line:
$ conda install --file requirements.txt
To create a stand-alone environment named PDSH with Python 3.5 and all the required package versions, run the following:
$ conda create -n PDSH python=3.5 --file requirements.txt
You can read more about using conda environments in the Managing Environments section of the conda documentation.
The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.
The text content of the book is released under the CC-BY-NC-ND license. Read more at Creative Commons.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and makes their results hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.
This repository contains two files:
The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.
The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:
In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.
Reproducing the Analysis
This section shows how to load the data into the database and run the analysis notebooks. In the analysis, we used the following environment:
Ubuntu 18.04.1 LTS
PostgreSQL 10.6
Conda 4.5.11
Python 3.7.2
PdfCrop 2012/11/02 v1.38
First, download dump.tar.bz2 and extract it:
tar -xjf dump.tar.bz2
It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:
psql jupyter < db2019-03-13.dump
It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
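As a quick check that the connection string works, here is a minimal sketch using SQLAlchemy from Python; the SELECT 1 query is only a connectivity test, not part of the analysis scripts:
import os
from sqlalchemy import create_engine, text

# Open a connection to the restored "jupyter" database using the variable set above.
engine = create_engine(os.environ["JUP_DB_CONNECTION"])
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())  # prints 1 if the connection works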
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Create a conda environment with Python 3.7:
conda create -n analyses python=3.7
conda activate analyses
Go to the analyses folder and install all the dependencies listed in requirements.txt:
cd jupyter_reproducibility/analyses
pip install -r requirements.txt
To reproduce the analyses, run Jupyter in this folder:
jupyter notebook
Execute the notebooks in this order:
Reproducing or Expanding the Collection
The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.
Requirements
This time, we have extra requirements:
All the analysis requirements
lbzip2 2.5
gcc 7.3.0
Github account
Gmail account
Environment
First, set the following environment variables:
export JUP_MACHINE="db"; # machine identifier
export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
export JUP_COMPRESSION="lbzip2"; # compression program
export JUP_VERBOSE="5"; # verbose level
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
export JUP_GITHUB_USERNAME="github_username"; # your github username
export JUP_GITHUB_PASSWORD="github_password"; # your github password
export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
export JUP_WITH_EXECUTION="1"; # execute python notebooks
export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
export JUP_EXECUTION_MODE="-1"; # run following the execution order
export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
# Frequency of log reports
export JUP_ASTROID_FREQUENCY="5";
export JUP_IPYTHON_FREQUENCY="5";
export JUP_NOTEBOOKS_FREQUENCY="5";
export JUP_REQUIREMENT_FREQUENCY="5";
export JUP_CRAWLER_FREQUENCY="1";
export JUP_CLONE_FREQUENCY="1";
export JUP_COMPRESS_FREQUENCY="5";
export JUP_DB_IP="localhost"; # postgres database IP
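For illustration only, a hypothetical sketch of how configuration like this is typically read from the environment in Python; the variable names match the exports above, but this is not the actual archaeology package code:
import os

verbose = int(os.environ.get("JUP_VERBOSE", "0"))        # verbosity level
max_size_gb = float(os.environ.get("JUP_MAX_SIZE", "0"))  # repository storage budget in GB
base_dir = os.environ.get("JUP_BASE_DIR", ".")            # where repositories are stored
print(f"verbose={verbose}, max_size_gb={max_size_gb}, base_dir={base_dir}")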
Then, configure the file ~/oauth2_creds.json according to the yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf
Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories. The second one should umount it. You can leave the scripts blank, but this is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.
Scripts
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Create 5 conda environments and 5 Anaconda environments, one pair for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):
Conda 2.7
conda create -n raw27 python=2.7 -y
conda activate raw27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 2.7
conda create -n py27 python=2.7 anaconda -y
conda activate py27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.4
It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.
conda create -n raw34 python=3.4 -y
conda activate raw34
conda install jupyter -c conda-forge -y
conda uninstall jupyter -y
pip install --upgrade pip
pip install jupyter
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
pip install pathlib2
Anaconda 3.4
conda create -n py34 python=3.4 anaconda -y
conda activate py34
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.5
conda create -n raw35 python=3.5 -y
conda activate raw35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.5
It requires the manual installation of other anaconda packages.
conda create -n py35 python=3.5 anaconda -y
conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
conda activate py35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.6
conda create -n raw36 python=3.6 -y
conda activate raw36
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.6
conda create -n py36 python=3.6 anaconda -y
conda activate py36
conda install -y anaconda-navigator jupyterlab_server navigator-updater
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.7
conda create -n raw37 python=3.7 -y
conda activate raw37
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
https://github.com/microsoft/DataScienceProblems/blob/main/LICENSE.txt
Evaluate a natural language code generation model on real data science pedagogical notebooks! Data Science Problems (DSP) includes well-posed data science problems in Markdown along with unit tests to verify correctness and a Docker environment for reproducible execution. About 1/3 of the notebooks in this benchmark also include data dependencies, so the benchmark can not only test a model's ability to chain together complex tasks, but also evaluate the solutions on real data! See our paper Training and Evaluating a Jupyter Notebook Data Science Assistant (https://arxiv.org/abs/2201.12901) for more details about state-of-the-art results and other properties of the dataset.
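As an illustration of the evaluation idea only (not the actual DSP harness; the solution and test below are hypothetical), a generated solution can be executed together with a problem's unit test in a shared namespace:
import pandas as pd

# Hypothetical generated solution and unit test, both given as source strings.
solution_src = "def add_total(df):\n    return df.assign(total=df['a'] + df['b'])\n"
test_src = (
    "def test_add_total():\n"
    "    df = pd.DataFrame({'a': [1], 'b': [2]})\n"
    "    assert add_total(df)['total'].iloc[0] == 3\n"
)

# Execute the solution and its test in one namespace, then run the test.
namespace = {"pd": pd}
exec(solution_src, namespace)
exec(test_src, namespace)
namespace["test_add_total"]()
print("unit test passed")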
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and makes their results hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.
This repository contains two files:
The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.
The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:
In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.
Reproducing the Analysis
This section shows how to load the data into the database and run the analysis notebooks. In the analysis, we used the following environment:
Ubuntu 18.04.1 LTS
PostgreSQL 10.6
Conda 4.5.1
Python 3.6.8
PdfCrop 2012/11/02 v1.38
First, download dump.tar.bz2 and extract it:
tar -xjf dump.tar.bz2
It extracts the file db2019-01-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:
psql jupyter < db2019-01-13.dump
It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Create a conda environment with Python 3.6:
conda create -n py36 python=3.6
conda activate py36
Go to the analyses folder and install all the dependencies listed in requirements.txt:
cd jupyter_reproducibility/analyses
pip install -r requirements.txt
To reproduce the analyses, run Jupyter in this folder:
jupyter notebook
Execute the notebooks in this order:
Reproducing or Expanding the Collection
The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.
Requirements
This time, we have extra requirements:
All the analysis requirements
lbzip2 2.5
gcc 7.3.0
Github account
Gmail account
Environment
First, set the following environment variables:
export JUP_MACHINE="db"; # machine identifier
export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
export JUP_COMPRESSION="lbzip2"; # compression program
export JUP_VERBOSE="5"; # verbose level
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
export JUP_GITHUB_USERNAME="github_username"; # your github username
export JUP_GITHUB_PASSWORD="github_password"; # your github password
export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
export JUP_WITH_EXECUTION="1"; # execute python notebooks
export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
export JUP_EXECUTION_MODE="-1"; # run following the execution order
export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
# Frequency of log reports
export JUP_ASTROID_FREQUENCY="5";
export JUP_IPYTHON_FREQUENCY="5";
export JUP_NOTEBOOKS_FREQUENCY="5";
export JUP_REQUIREMENT_FREQUENCY="5";
export JUP_CRAWLER_FREQUENCY="1";
export JUP_CLONE_FREQUENCY="1";
export JUP_COMPRESS_FREQUENCY="5";
export JUP_DB_IP="localhost"; # postgres database IP
Then, configure the file ~/oauth2_creds.json according to the yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf
Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories. The second one should umount it. You can leave the scripts blank, but this is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.
Scripts
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Create 5 conda environments and 5 Anaconda environments, one pair for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):
Conda 2.7
conda create -n raw27 python=2.7 -y
conda activate raw27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 2.7
conda create -n py27 python=2.7 anaconda -y
conda activate py27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.4
It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.
conda create -n raw34 python=3.4 -y
conda activate raw34
conda install jupyter -c conda-forge -y
conda uninstall jupyter -y
pip install --upgrade pip
pip install jupyter
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
pip install pathlib2
Anaconda 3.4
conda create -n py34 python=3.4 anaconda -y
conda activate py34
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.5
conda create -n raw35 python=3.5 -y
conda activate raw35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.5
It requires the manual installation of other anaconda packages.
conda create -n py35 python=3.5 anaconda -y
conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
conda activate py35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.6
conda create -n raw36 python=3.6 -y
conda activate raw36
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.6
conda create -n py36 python=3.6 anaconda -y
conda activate py36
conda install -y anaconda-navigator jupyterlab_server navigator-updater
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.7
conda create -n raw37 python=3.7 -y
conda activate raw37
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.7
When we
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we utilized the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio extension for xarray; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to read the GeoTIFF data and save it as NetCDF. These procedures resulted in the creation of three HydroShare resources (HS 3, HS 4, and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, a commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.
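A minimal sketch of the GeoTIFF-to-NetCDF step described above, assuming a hypothetical state-scale GeoTIFF file name (the actual workflow notebooks are not reproduced here):
import rioxarray

da = rioxarray.open_rasterio("state_les.tif")    # read the GeoTIFF as an xarray DataArray
ds = da.to_dataset(name="les")                   # wrap it in a Dataset
ds.attrs["title"] = "State-scale LES dataset"    # add metadata before writing
ds.to_netcdf("state_les.nc")                     # save as NetCDF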
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study investigates the extent to which data science projects follow code standards. In particular, which standards are followed, which are ignored, and how does this differ from traditional software projects? We compare a corpus of 1048 open-source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity.
results.tar.gz: Extracted data for each project, including raw logs of all detected code violations.
notebooks_out.tar.gz: Tables and figures generated by notebooks.
source_code_anonymized.tar.gz: Anonymized source code (at time of publication) to identify, clone, and analyse the projects. Also includes Jupyter notebooks used to produce figures in the paper.
The latest source code can be found at: https://github.com/a2i2/mining-data-science-repositories
Published in ESEM 2020: https://doi.org/10.1145/3382494.3410680
Preprint: https://arxiv.org/abs/2007.08978
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the outputs of the notebook "Detecting floating objects using Deep Learning and Sentinel-2 imagery" published in the ocean modelling section of The Environmental Data Science Book.
Contributions
Notebook
Jamila Mifdal (author), European Space Agency Φ-lab, @jmifdal
Raquel Carmo (author), European Space Agency Φ-lab, @raquelcarmo
Alejandro Coca-Castro (reviewer), The Alan Turing Institute, @acocac
Modelling codebase
Jamila Mifdal (author), European Space Agency Φ-lab, @jmifdal
Raquel Carmo (author), European Space Agency Φ-lab, @raquelcarmo
Marc Rußwurm (author), EPFL-ECEO, @marccoru
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset consists of the Jupyter notebooks from https://github.com/jakevdp/PythonDataScienceHandbook converted into Markdown for better RAG. Of course, you can use it for purposes other than RAG as long as they don't violate the LICENSE terms.
The repository at https://github.com/jakevdp/PythonDataScienceHandbook contains the entire Python Data Science Handbook, in the form of Jupyter notebooks.
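For illustration, a minimal sketch of converting a notebook to Markdown with nbconvert; the exact pipeline used for this dataset is not specified, and the file name below is a placeholder:
import nbformat
from nbconvert import MarkdownExporter

nb = nbformat.read("notebook.ipynb", as_version=4)            # any handbook notebook
markdown, resources = MarkdownExporter().from_notebook_node(nb)

with open("notebook.md", "w", encoding="utf-8") as f:
    f.write(markdown)                                          # Markdown version for RAG indexing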
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Data Science Notebook Platform market size reached USD 820 million in 2024, driven by the accelerating adoption of advanced analytics and machine learning tools across industries. The market is projected to expand at a robust CAGR of 21.6% from 2025 to 2033, reaching a forecasted value of USD 6.06 billion by 2033. This remarkable growth is underpinned by the increasing demand for collaborative, scalable, and cloud-based data science solutions as organizations prioritize data-driven decision-making and digital transformation initiatives.
One of the primary growth factors propelling the Data Science Notebook Platform market is the rapid digitalization of enterprises and the proliferation of big data. As organizations generate and collect massive volumes of structured and unstructured data, there is a pressing need for platforms that enable seamless data exploration, analysis, and visualization. Data science notebook platforms, with their interactive and user-friendly interfaces, empower data scientists, analysts, and business users to collaborate in real-time, streamline workflows, and accelerate the development of machine learning models. The increasing integration of these platforms with cloud-based data storage and processing solutions further enhances their scalability, flexibility, and accessibility, making them indispensable tools for modern data-driven enterprises.
Another significant driver is the growing adoption of artificial intelligence (AI) and machine learning (ML) across various sectors such as BFSI, healthcare, retail, and manufacturing. These industries are leveraging data science notebook platforms to develop, test, and deploy sophisticated ML algorithms that can deliver actionable insights, optimize operations, and personalize customer experiences. The ability of these platforms to support a wide range of programming languages, libraries, and frameworks—such as Python, R, TensorFlow, and PyTorch—enables organizations to innovate rapidly and stay ahead of the competition. Moreover, the rising emphasis on open-source technologies and community-driven development is fostering a vibrant ecosystem around data science notebooks, driving further innovation and adoption.
Furthermore, the shift towards remote and hybrid work models has amplified the need for collaborative data science tools that can bridge geographical and functional silos. Data science notebook platforms offer integrated collaboration features, version control, and secure sharing capabilities, enabling distributed teams to work together efficiently on complex data projects. The growing focus on democratizing data science and empowering business users with self-service analytics tools is also expanding the user base of these platforms beyond traditional data scientists to include business analysts, domain experts, and citizen data scientists. This trend is expected to continue, fueling sustained demand for versatile and user-friendly data science notebook solutions.
From a regional perspective, North America currently dominates the Data Science Notebook Platform market, accounting for the largest revenue share in 2024, thanks to its mature technology infrastructure, high concentration of data-driven enterprises, and strong presence of leading platform vendors. However, the Asia Pacific region is poised for the fastest growth over the forecast period, driven by rapid digital transformation, increasing investments in AI and analytics, and expanding talent pools in countries such as China, India, and Japan. Europe also represents a significant market, characterized by stringent data privacy regulations and a growing focus on responsible AI and ethical data practices. Meanwhile, Latin America and the Middle East & Africa are witnessing gradual adoption, supported by government initiatives and the rising penetration of cloud-based solutions.
The Data Science Notebook Platform market is segmented by component into Software and Services, each playing a critical role in the overall ecosystem. The software segment encompasses a broad range of notebook solutions, including open-source platforms like Jupyter and proprietary offerings from major technology vendors. These platforms are designed to provide interactive development environments where users can write, execute, and visualize code, making it easier
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the inputs of the notebook "Met Office UKV high-resolution atmosphere model data" published in The Environmental Data Science Book.
The input data refer to a subset: a single sample data file for 1.5 m temperature, part of the Met Office contribution to the COVID-19 modelling effort.
The full dataset was available for download from Met Office Azure (https://metdatasa.blob.core.windows.net/covid19-response-non-commercial/) under terms restricting use to non-commercial purposes.
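A minimal sketch for inspecting a sample NetCDF file with xarray; the file name is a placeholder, not the actual name used in the dataset:
import xarray as xr

ds = xr.open_dataset("ukv_1p5m_temperature_sample.nc")  # placeholder file name
print(ds)            # dimensions, coordinates, and variables
print(ds.data_vars)  # should include the 1.5 m air temperature variable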
Contributions
Notebook
Samantha V. Adams (author), Met Office Informatics Lab, @svadams
Alejandro Coca-Castro (reviewer), The Alan Turing Institute, @acocac
Dataset originator/creator
Met Office Informatics Lab (creator)
Microsoft (support)
European Regional Development Fund (support)
Dataset authors
Met Office
Dataset documentation
Theo McCaie. Met office and partners offer data and compute platform for covid-19 researchers. URL: https://medium.com/informatics-lab/met-office-and-partners-offer-data-and-compute-platform-for-covid-19-researchers-83848ac55f5f.
Note this data should be used only for non-commercial purposes.
This resource contains Jupyter Notebooks with examples that illustrate how to work with SQLite databases in Python, including database creation and viewing and querying with SQL. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.
This resource consists of 3 example notebooks and a SQLite database.
Notebooks:
1. Example 1: Querying databases using SQL in Python
2. Example 2: Python functions to query SQLite databases
3. Example 3: SQL join, aggregate, and subquery functions
Data files: These examples use a SQLite database that uses the Observations Data Model structure and is pre-populated with Logan River temperature data.
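In the spirit of the examples described above, a minimal sketch of querying a SQLite database from Python; the database file, table, and column names are assumptions for illustration and the actual ODM schema names may differ:
import sqlite3
import pandas as pd

conn = sqlite3.connect("ODM.sqlite")  # placeholder name for the bundled database file
df = pd.read_sql_query(
    "SELECT LocalDateTime, DataValue FROM DataValues LIMIT 10;", conn
)
print(df)
conn.close()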
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the outputs of the notebook "Tree crown detection using DeepForest" published in The Environmental Data Science Book.
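A minimal usage sketch of the DeepForest package for crown detection on an RGB image; the image path is a placeholder and API details may differ across DeepForest versions:
from deepforest import main

model = main.deepforest()
model.use_release()                                # load the prebuilt release model
boxes = model.predict_image(path="rgb_tile.png")   # predicted crown bounding boxes as a DataFrame
print(boxes.head())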
Contributions
Notebook
Sebastian H. M. Hickman (author), University of Cambridge, @shmh40
Alejandro Coca-Castro (reviewer), The Alan Turing Institute, @acocac
Modelling codebase
Sebastian H. M. Hickman (author), University of Cambridge @shmh40
James G. C. Ball (contributor), University of Cambridge @PatBall1
David A. Coomes (contributor), University of Cambridge
Toby Jackson (contributor), University of Cambridge
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the outputs of the notebook "Met Office UKV high-resolution atmosphere model data" published in the urban sensors section of The Environmental Data Science Book.
Contributions
Notebook
Samantha V. Adams (author), Met Office Informatics Lab, @svadams
Alejandro Coca-Castro (reviewer), The Alan Turing Institute, @acocac
Dataset originator/creator
Met Office Informatics Lab (creator)
Microsoft (support)
European Regional Development Fund (support)
Dataset authors
Met Office
Dataset documentation
Theo McCaie. Met office and partners offer data and compute platform for covid-19 researchers. URL: https://medium.com/informatics-lab/met-office-and-partners-offer-data-and-compute-platform-for-covid-19-researchers-83848ac55f5f.
Dataset containing the raw data and results from analyses, along with a supporting Jupyter notebook that shows the processing of data in the following paper: Big Data Analytics for Scanning Transmission Electron Microscopy Ptychography, S. Jesse, M. Chi, A. Belianinov, C. Beekman, S. V. Kalinin, A. Y. Borisevich & A. R. Lupini, Scientific Reports, volume 6, Article number: 26348 (2016), https://www.nature.com/articles/srep26348
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the outputs of the notebook "MODIS MOD021KM and FIRMS" published in The Environmental Data Science Book.
Contributions
Notebook
Samuel Jackson (author), Science & Technology Facilities Council, @samueljackson92
Alejandro Coca-Castro (reviewer), The Alan Turing Institute, @acocac
Dataset originator/creator
MOD021KM
MODIS Characterization Support Team (MCST)
MODIS Adaptive Processing System (MODAPS)
FIRMS
University of Maryland
Dataset authors
MOD021KM
MODIS Science Data Support Team (SDST)
FIRMS
NASA’s Applied Sciences Program
Dataset documentation
Louis Giglio, Wilfrid Schroeder, Joanne V. Hall, and Christopher O. Justice. MODIS Collection 6 Active Fire Product User’s Guide Revision B. Technical Report, NASA, 2018. URL: https://modis-fire.umd.edu/files/MODIS_C6_Fire_User_Guide_B.pdf.
Our GeoThermalCloud framework is designed to process geothermal datasets using a novel toolbox for unsupervised and physics-informed machine learning called SmartTensors. More information about GeoThermalCloud can be found at the GeoThermalCloud GitHub Repository. More information about SmartTensors can be found at the SmartTensors GitHub Repository and the SmartTensors page at LANL.gov. Links to these pages are included in this submission. GeoThermalCloud.jl is a repository containing all the data and codes required to demonstrate applications of machine learning methods for geothermal exploration. GeoThermalCloud.jl includes:
- site data
- simulation scripts
- jupyter notebooks
- intermediate results
- code outputs
- summary figures
- readme markdown files
GeoThermalCloud.jl showcases the machine learning analyses performed for the following geothermal sites:
- Brady: geothermal exploration of the Brady geothermal site, Nevada
- SWNM: geothermal exploration of the Southwest New Mexico (SWNM) region
- GreatBasin: geothermal exploration of the Great Basin region, Nevada
Reports, research papers, and presentations summarizing these machine learning analyses are also available and will be posted soon.
Arcade is a collection of natural language to code problems on interactive data science notebooks. Each problem features an NL intent as problem specification, a reference code solution, and preceding notebook context (Markdown or code cells). Arcade can be used to evaluate the accuracy of large language models for code in generating data science programs given natural language instructions. Please read our paper for more details.
Note👉 This Kaggle dataset only contains the dataset files of Arcade. Refer to our main Github repository for detailed instructions to use this dataset.
Below is the structure of its content:
└── ./
├── existing_tasks/ # Problems derived from existing data science notebooks on Github
│ ├── metadata.json # Metadata by `build_existing_tasks_split.py` to reproduce this split.
│ ├── artifacts/ # Folder that stores dependent ML datasets to execute the problems, created by running `build_existing_tasks_split.py`
│ └── derived_datasets/ # Folder for preprocessed datasets used for prompting experiments.
├── new_tasks/
│ ├── dataset.json # Original, unprocessed dataset
│ ├── kaggle_dataset_provenance.csv # Metadata of the Kaggle datasets used to build this split.
│ ├── artifacts/ # Folder that stores dependent ML Kaggle datasets to execute the problems, created by running `build_new_tasks_split.py`
│ └── derived_datasets/ # Folder for preprocessed datasets used for prompting experiments.
└── checksums.txt # Table of MD5 checksums of dataset files.
All the dataset '*.json' files follow the same structure. Each dataset file is a JSON-serialized list of Episodes. Each episode corresponds to a notebook annotated with NL-to-code problems. The structure of an episode is documented below:
{
"notebook_name": "Name of the notebook.",
"work_dir": "Path to the dependent data artifacts (e.g., ML datasets) to execute the notebook.",
"annotator": "Anonymized annotator Id."
"turns": [
# A list of natural language to code examples using the current notebook context.
{
"input": "Prompt to a code generation model.",
"turn": {
"intent": {
"value": "Annotated NL intent for the current turn.",
"is_cell_intent": "Metadata used for the existing tasks split to indicate if the code solution is only part of an existing code cell.",
"cell_idx": "Index of the intent Markdown cell.",
"line_span": "Line span of the intent.",
"not_sure": "Annotation confidence.",
"output_variables": "List of variable names denoting the output. If None, use the output of the last line of code as the output of the problem.",
},
"code": {
"value": "Reference code solution.",
"cell_idx": "Cell index of the code cell containing the solution.",
"num_lines": "Number of lines in the reference solution.",
"line_span": "Line span.",
},
"code_context": "Context code (all code cells before this problem) that need to be executed before executing the reference/predicted programs.",
"delta_code_context": "Delta context code between the last problem in this notebook and the current problem, useful for incremental execution.",
"metadata": {
"annotator_id": "Annotator Id",
"num_code_lines": "Metadata, please ignore.",
"utterance_without_output_spec": "Annotated NL intent without output specification. Refer to the paper for details.",
},
},
"notebook": "Field intended to store the Json-serialized Jupyter notebook. Not used for now since the notebook can be reconstructed from other metadata in this file.",
"metadata": {
# A dict of metadata of this turn.
"context_cells": [ # A list of context cells before the problem.
{
"cell_type": "code|markdown",
"source": "Cell content."
},
],
"delta_cell_num": "Number of preceding context cells between the prior turn and the current turn.",
# The following fields only occur in datasets inlined with schema descriptions.
"context_cell_num": "Number of context cells in the prompt after inserting schema descriptions and left-truncation.",
"inten...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the inputs of the notebook "Cosmos-UK soil moisture" published in The Environmental Data Science Book.
The input data refer to a subset of the public 2013-2019 COSMOS-UK dataset: daily and subhourly observations and metadata for four stations, WYTH1, WADDN, SHEEP and CHIMN. These stations were the first sites to prototype COSMOS sensors in the UK (see further details in Evans et al., 2016). They are situated in human-intervened areas (grassland and cropland), except for one at a woodland land cover site.
Data from COSMOS-UK up to the end of 2019 are available for download from the UKCEH Environmental Information Data Centre (EIDC). The data are accompanied by documentation that describes the site-specific instrumentation, data and processing including quality control. The full dataset is available for download under the terms of the Open Government License.
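A minimal sketch for loading a daily observations file with pandas; the file name is a placeholder and the actual EIDC file names and columns differ:
import pandas as pd

daily = pd.read_csv("COSMOS-UK_WYTH1_daily_2013-2019.csv")  # placeholder file name
print(daily.columns.tolist())  # inspect available variables
print(daily.head())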
Contributions
Notebook
Alejandro Coca-Castro (author), The Alan Turing Institute, @acocac
Doran Khamis (reviewer), UK Centre for Ecology & Hydrology, @dorankhamis
Matt Fry (reviewer), UK Centre for Ecology & Hydrology, @mattfry-ceh
Dataset originator/creator
UK Centre for Ecology & Hydrology (creator)
Natural Environment Research Council (support)
Dataset reference and documentation
S. Stanley, V. Antoniou, A. Askquith-Ellis, L.A. Ball, E.S. Bennett, J.R. Blake, D.B. Boorman, M. Brooks, M. Clarke, H.M. Cooper, N. Cowan, A. Cumming, J.G. Evans, P. Farrand, M. Fry, O.E. Hitt, W.D. Lord, R. Morrison, G.V. Nash, D. Rylett, P.M. Scarlett, O.D. Swain, M. Szczykulska, J.L. Thornton, E.J. Trill, A.C. Warwick, and B. Winterbourn. Daily and sub-daily hydrometeorological and soil data (2013-2019) [cosmos-uk]. 2021. URL: https://doi.org/10.5285/b5c190e4-e35d-40ea-8fbe-598da03a1185, doi:10.5285/b5c190e4-e35d-40ea-8fbe-598da03a1185.
Further references
Jonathan G. Evans, H. C. Ward, J. R. Blake, E. J. Hewitt, R. Morrison, M. Fry, L. A. Ball, L. C. Doughty, J. W. Libre, O. E. Hitt, D. Rylett, R. J. Ellis, A. C. Warwick, M. Brooks, M. A. Parkes, G. M.H. Wright, A. C. Singer, D. B. Boorman, and A. Jenkins. Soil water content in southern england derived from a cosmic-ray soil moisture observing system – cosmos-uk. Hydrological Processes, 30:4987–4999, 12 2016. doi:10.1002/hyp.10929.
M. Zreda, W. J. Shuttleworth, X. Zeng, C. Zweck, D. Desilets, T. Franz, and R. Rosolem. Cosmos: the cosmic-ray soil moisture observing system. Hydrology and Earth System Sciences, 16(11):4079–4099, 2012. URL: https://hess.copernicus.org/articles/16/4079/2012/, doi:10.5194/hess-16-4079-2012.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the outputs of the notebook "Deep learning and variational inversion to quantify and attribute climate change (CIRC23)" published in The Environmental Data Science Book.
https://www.datainsightsmarket.com/privacy-policy
The Python IDE market is booming, projected to reach $1.08B by 2033 at an 8.1% CAGR. Explore key trends, leading companies (PyCharm, Eclipse, AWS Cloud9), and regional market analysis in this comprehensive report. Discover the impact of cloud-based IDEs and the future of Python development tools.