71 datasets found
  1. Python Data Science Handbook

    • kaggle.com
    zip
    Updated Dec 20, 2021
    Cite
    Timo Bozsolik (2021). Python Data Science Handbook [Dataset]. https://www.kaggle.com/timoboz/python-data-science-handbook
    Explore at:
    Available download formats: zip (16028316 bytes)
    Dataset updated
    Dec 20, 2021
    Authors
    Timo Bozsolik
    Description

    Python Data Science Handbook

    This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks.

    cover image

    How to Use this Book

    About

    The book was written and tested with Python 3.5, though other Python versions (including Python 2.7) should work in nearly all cases.

    The book introduces the core libraries essential for working with data in Python: particularly IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and related packages. Familiarity with Python as a language is assumed; if you need a quick introduction to the language itself, see the free companion project, A Whirlwind Tour of Python: it's a fast-paced introduction to the Python language aimed at researchers and scientists.

    See Index.ipynb for an index of the notebooks available to accompany the text.

    Software

    The code in the book was tested with Python 3.5, though most (but not all) will also work correctly with Python 2.7 and other older Python versions.

    The packages I used to run the code in the book are listed in requirements.txt (Note that some of these exact version numbers may not be available on your platform: you may have to tweak them for your own use). To install the requirements using conda, run the following at the command-line:

    $ conda install --file requirements.txt
    

    To create a stand-alone environment named PDSH with Python 3.5 and all the required package versions, run the following:

    $ conda create -n PDSH python=3.5 --file requirements.txt
    

    You can read more about using conda environments in the Managing Environments section of the conda documentation.
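
    Once the environment exists, a typical session looks like the following (a minimal sketch, not part of the original instructions; it assumes Jupyter is among the installed requirements, and older conda versions use "source activate PDSH" instead):

    $ conda activate PDSH
    $ jupyter notebook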

    License

    Code

    The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.

    Text

    The text content of the book is released under the CC-BY-NC-ND license. Read more at Creative Commons.

  2. Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...

    • zenodo.org
    bz2
    Updated Mar 15, 2021
    + more versions
    Cite
    João Felipe; Leonardo; Vanessa; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2592524
    Explore at:
    Available download formats: bz2
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    João Felipe; Leonardo; Vanessa; Juliana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and makes their results hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

    Paper: https://2019.msrconf.org/event/msr-2019-papers-a-large-scale-study-about-quality-and-reproducibility-of-jupyter-notebooks

    This repository contains two files:

    • dump.tar.bz2
    • jupyter_reproducibility.tar.bz2

    The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

    The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

    • analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.
    • archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.
    • paper: initially empty; the notebook analyses/N12.To.Paper.ipynb moves data into it

    In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.

    Reproducing the Analysis

    This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:

    Ubuntu 18.04.1 LTS
    PostgreSQL 10.6
    Conda 4.5.11
    Python 3.7.2
    PdfCrop 2012/11/02 v1.38

    First, download dump.tar.bz2 and extract it:

    tar -xjf dump.tar.bz2

    It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

    psql jupyter < db2019-03-13.dump
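
    The target database must exist before restoring; it can be created with the standard PostgreSQL client tools (a minimal sketch, assuming your database user has createdb rights):

    createdb jupyter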

    Restoring the dump populates the database. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
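
    A quick way to verify the connection string before running the notebooks (a minimal sketch; psql accepts the same postgresql:// URI, assuming the server is reachable from your machine):

    psql "$JUP_DB_CONNECTION" -c "SELECT 1;"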

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create a conda environment with Python 3.7:

    conda create -n analyses python=3.7
    conda activate analyses

    Go to the analyses folder and install all the dependencies listed in requirements.txt:

    cd jupyter_reproducibility/analyses
    pip install -r requirements.txt

    For reproducing the analyses, run jupyter in this folder:

    jupyter notebook

    Execute the notebooks in this order (the sketch after the list shows one way to run them non-interactively):

    • Index.ipynb
    • N0.Repository.ipynb
    • N1.Skip.Notebook.ipynb
    • N2.Notebook.ipynb
    • N3.Cell.ipynb
    • N4.Features.ipynb
    • N5.Modules.ipynb
    • N6.AST.ipynb
    • N7.Name.ipynb
    • N8.Execution.ipynb
    • N9.Cell.Execution.Order.ipynb
    • N10.Markdown.ipynb
    • N11.Repository.With.Notebook.Restriction.ipynb
    • N12.To.Paper.ipynb
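
    To run them without the Jupyter web interface, one option is nbconvert (a minimal sketch, not part of the original instructions; it executes each notebook in place and stops at the first failure):

    for nb in Index.ipynb N0.Repository.ipynb N1.Skip.Notebook.ipynb N2.Notebook.ipynb N3.Cell.ipynb N4.Features.ipynb N5.Modules.ipynb N6.AST.ipynb N7.Name.ipynb N8.Execution.ipynb N9.Cell.Execution.Order.ipynb N10.Markdown.ipynb N11.Repository.With.Notebook.Restriction.ipynb N12.To.Paper.ipynb; do
        jupyter nbconvert --to notebook --execute --inplace "$nb" || break
    done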

    Reproducing or Expanding the Collection

    The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

    Requirements

    This time, we have extra requirements:

    All the analysis requirements
    lbzip2 2.5
    gcc 7.3.0
    Github account
    Gmail account

    Environment

    First, set the following environment variables:

    export JUP_MACHINE="db"; # machine identifier
    export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
    export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
    export JUP_COMPRESSION="lbzip2"; # compression program
    export JUP_VERBOSE="5"; # verbose level
    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
    export JUP_GITHUB_USERNAME="github_username"; # your github username
    export JUP_GITHUB_PASSWORD="github_password"; # your github password
    export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
    export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
    export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
    export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
    export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
    export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
    export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
    export JUP_WITH_EXECUTION="1"; # execute python notebooks
    export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
    export JUP_EXECUTION_MODE="-1"; # run following the execution order
    export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
    export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
    export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
    export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
    export JUP_NOTEBOOK_TIMEOUT="300"; # timeout for the extraction


    # Frequency of log reports
    export JUP_ASTROID_FREQUENCY="5";
    export JUP_IPYTHON_FREQUENCY="5";
    export JUP_NOTEBOOKS_FREQUENCY="5";
    export JUP_REQUIREMENT_FREQUENCY="5";
    export JUP_CRAWLER_FREQUENCY="1";
    export JUP_CLONE_FREQUENCY="1";
    export JUP_COMPRESS_FREQUENCY="5";
    
    export JUP_DB_IP="localhost"; # postgres database IP

    Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf

    Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories; the second one should unmount it. You can leave the scripts blank, but that is not advisable: the reproducibility study runs arbitrary code on your machine and you may lose your data.
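
    A minimal sketch of what these two scripts might look like (illustrative only; the device name /dev/sdb1 is an assumption, and the mount point matches the JUP_BASE_DIR used above):

    # mount_ghstudy.sh -- attach the storage backing the repositories directory
    mount /dev/sdb1 /mnt/jupyter

    # umount_ghstudy.sh -- detach it again
    umount /mnt/jupyter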

    Scripts

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Install 5 conda environments and 5 anaconda environments, one for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):

    Conda 2.7

    conda create -n raw27 python=2.7 -y
    conda activate raw27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 2.7

    conda create -n py27 python=2.7 anaconda -y
    conda activate py27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    

    Conda 3.4

    It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.

    conda create -n raw34 python=3.4 -y
    conda activate raw34
    conda install jupyter -c conda-forge -y
    conda uninstall jupyter -y
    pip install --upgrade pip
    pip install jupyter
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    pip install pathlib2

    Anaconda 3.4

    conda create -n py34 python=3.4 anaconda -y
    conda activate py34
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.5

    conda create -n raw35 python=3.5 -y
    conda activate raw35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.5

    It requires the manual installation of other anaconda packages.

    conda create -n py35 python=3.5 anaconda -y
    conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
    conda activate py35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.6

    conda create -n raw36 python=3.6 -y
    conda activate raw36
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.6

    conda create -n py36 python=3.6 anaconda -y
    conda activate py36
    conda install -y anaconda-navigator jupyterlab_server navigator-updater
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.7


  3. Data from: Data Science Problems

    • github.com
    • opendatalab.com
    Updated Feb 8, 2022
    Cite
    (2022). Data Science Problems [Dataset]. https://github.com/microsoft/DataScienceProblems
    Explore at:
    Dataset updated
    Feb 8, 2022
    License

    https://github.com/microsoft/DataScienceProblems/blob/main/LICENSE.txt

    Description

    Evaluate a natural language code generation model on real data science pedagogical notebooks! Data Science Problems (DSP) includes well-posed data science problems in Markdown along with unit tests to verify correctness and a Docker environment for reproducible execution. About 1/3 of the notebooks in this benchmark also include data dependencies, so this benchmark can not only test a model's ability to chain together complex tasks, but also evaluate the solutions on real data! See our paper Training and Evaluating a Jupyter Notebook Data Science Assistant (https://arxiv.org/abs/2201.12901) for more details about state-of-the-art results and other properties of the dataset.

  4. Reproducibility in Practice: Dataset of a Large-Scale Study of Jupyter...

    • zenodo.org
    bz2
    Updated Mar 15, 2021
    Cite
    Anonymous (2021). Reproducibility in Practice: Dataset of a Large-Scale Study of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2546834
    Explore at:
    Available download formats: bz2
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and makes their results hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

    This repository contains two files:

    • dump.tar.bz2
    • jupyter_reproducibility.tar.bz2

    The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

    The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

    • analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.
    • archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.
    • paper: initially empty; the notebook analyses/N11.To.Paper.ipynb moves data into it

    In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.

    Reproducing the Analysis

    This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:

    Ubuntu 18.04.1 LTS
    PostgreSQL 10.6
    Conda 4.5.1
    Python 3.6.8
    PdfCrop 2012/11/02 v1.38

    First, download dump.tar.bz2 and extract it:

    tar -xjf dump.tar.bz2

    It extracts the file db2019-01-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

    psql jupyter < db2019-01-13.dump

    It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create a conda environment with Python 3.6:

    conda create -n py36 python=3.6

    Go to the analyses folder and install all the dependencies listed in requirements.txt:

    cd jupyter_reproducibility/analyses
    pip install -r requirements.txt

    For reproducing the analyses, run jupyter in this folder:

    jupyter notebook

    Execute the notebooks in this order:

    • N0.Index.ipynb
    • N1.Repository.ipynb
    • N2.Notebook.ipynb
    • N3.Cell.ipynb
    • N4.Features.ipynb
    • N5.Modules.ipynb
    • N6.AST.ipynb
    • N7.Name.ipynb
    • N8.Execution.ipynb
    • N9.Cell.Execution.Order.ipynb
    • N10.Markdown.ipynb
    • N11.To.Paper.ipynb

    Reproducing or Expanding the Collection

    The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

    Requirements

    This time, we have extra requirements:

    All the analysis requirements
    lbzip2 2.5
    gcc 7.3.0
    Github account
    Gmail account

    Environment

    First, set the following environment variables:

    export JUP_MACHINE="db"; # machine identifier
    export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
    export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
    export JUP_COMPRESSION="lbzip2"; # compression program
    export JUP_VERBOSE="5"; # verbose level
    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
    export JUP_GITHUB_USERNAME="github_username"; # your github username
    export JUP_GITHUB_PASSWORD="github_password"; # your github password
    export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
    export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
    export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
    export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
    export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
    export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
    export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
    export JUP_WITH_EXECUTION="1"; # execute python notebooks
    export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
    export JUP_EXECUTION_MODE="-1"; # run following the execution order
    export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
    export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
    export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
    export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
    export JUP_NOTEBOOK_TIMEOUT="300"; # timeout for the extraction


    # Frequency of log reports
    export JUP_ASTROID_FREQUENCY="5";
    export JUP_IPYTHON_FREQUENCY="5";
    export JUP_NOTEBOOKS_FREQUENCY="5";
    export JUP_REQUIREMENT_FREQUENCY="5";
    export JUP_CRAWLER_FREQUENCY="1";
    export JUP_CLONE_FREQUENCY="1";
    export JUP_COMPRESS_FREQUENCY="5";
    
    export JUP_DB_IP="localhost"; # postgres database IP

    Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf

    Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories; the second one should unmount it. You can leave the scripts blank, but that is not advisable: the reproducibility study runs arbitrary code on your machine and you may lose your data.

    Scripts

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Install 5 conda environments and 5 anaconda environments, one for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):

    Conda 2.7

    conda create -n raw27 python=2.7 -y
    conda activate raw27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 2.7

    conda create -n py27 python=2.7 anaconda -y
    conda activate py27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    

    Conda 3.4

    It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.

    conda create -n raw34 python=3.4 -y
    conda activate raw34
    conda install jupyter -c conda-forge -y
    conda uninstall jupyter -y
    pip install --upgrade pip
    pip install jupyter
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    pip install pathlib2

    Anaconda 3.4

    conda create -n py34 python=3.4 anaconda -y
    conda activate py34
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.5

    conda create -n raw35 python=3.5 -y
    conda activate raw35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.5

    It requires the manual installation of other anaconda packages.

    conda create -n py35 python=3.5 anaconda -y
    conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
    conda activate py35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.6

    conda create -n raw36 python=3.6 -y
    conda activate raw36
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.6

    conda create -n py36 python=3.6 anaconda -y
    conda activate py36
    conda install -y anaconda-navigator jupyterlab_server navigator-updater
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.7

    conda create -n raw37 python=3.7 -y
    conda activate raw37
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.7

    When we

  5. (HS 2) Automate Workflows using Jupyter notebook to create Large Extent...

    • hydroshare.org
    • search.dataone.org
    zip
    Updated Oct 15, 2024
    + more versions
    Cite
    Young-Don Choi (2024). (HS 2) Automate Workflows using Jupyter notebook to create Large Extent Spatial Datasets [Dataset]. http://doi.org/10.4211/hs.a52df87347ef47c388d9633925cde9ad
    Explore at:
    Available download formats: zip (2.4 MB)
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    HydroShare
    Authors
    Young-Don Choi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we utilized the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio extension for xarray; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save GeoTIFF as NetCDF. These procedures resulted in the creation of three HydroShare resources (HS 3, HS 4 and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.

  6. Data from: A large-scale comparative analysis of Coding Standard conformance...

    • figshare.com
    application/x-gzip
    Updated Oct 4, 2021
    Cite
    Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa (2021). A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects [Dataset]. http://doi.org/10.6084/m9.figshare.12377237.v3
    Explore at:
    Available download formats: application/x-gzip
    Dataset updated
    Oct 4, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study investigates the extent to which data science projects follow code standards. In particular, which standards are followed, which are ignored, and how does this differ from traditional software projects? We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity.

    • results.tar.gz: Extracted data for each project, including raw logs of all detected code violations.
    • notebooks_out.tar.gz: Tables and figures generated by notebooks.
    • source_code_anonymized.tar.gz: Anonymized source code (at time of publication) to identify, clone, and analyse the projects. Also includes Jupyter notebooks used to produce figures in the paper.

    The latest source code can be found at: https://github.com/a2i2/mining-data-science-repositories

    Published in ESEM 2020: https://doi.org/10.1145/3382494.3410680

    Preprint: https://arxiv.org/abs/2007.08978

  7. Outputs of the Jupyter Notebook - Detecting floating objects using Deep...

    • data.niaid.nih.gov
    Updated Jan 28, 2022
    Cite
    Environmental Data Science Community (2022). Outputs of the Jupyter Notebook - Detecting floating objects using Deep Learning and Sentinel-2 imagery [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5911142
    Explore at:
    Dataset updated
    Jan 28, 2022
    Authors
    Environmental Data Science Community
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains the outputs of the notebook "Detecting floating objects using Deep Learning and Sentinel-2 imagery" published in the ocean modelling section of The Environmental Data Science Book.

    Contributions

    Notebook

    Jamila Mifdal (author), European Space Agency Φ-lab, @jmifdal

    Raquel Carmo (author), European Space Agency Φ-lab, @raquelcarmo

    Alejandro Coca-Castro (reviewer), The Alan Turing Institute, @acocac

    Modelling codebase

    Jamila Mifdal (author), European Space Agency Φ-lab, @jmifdal

    Raquel Carmo (author), European Space Agency Φ-lab, @raquelcarmo

    Marc Rußwurm (author), EPFL-ECEO, @marccoru

  8. Python Data Science Handbook Dataset MD

    • kaggle.com
    zip
    Updated Apr 8, 2024
    Cite
    sita berete (2024). Python Data Science Handbook Dataset MD [Dataset]. https://www.kaggle.com/datasets/sitaberete/python-datascience-handbook-dataset-md/versions/1
    Explore at:
    Available download formats: zip (10781434 bytes)
    Dataset updated
    Apr 8, 2024
    Authors
    sita berete
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset is the Jupyter notebooks of https://github.com/jakevdp/PythonDataScienceHandbook converted into Markdown for better RAG. Of course, you can use it for purposes other than RAG as long as they don't violate the LICENSE terms.

    The https://github.com/jakevdp/PythonDataScienceHandbook repository contains the entire Python Data Science Handbook, in the form of Jupyter notebooks.

  9. Data Science Notebook Platform Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Data Science Notebook Platform Market Research Report 2033 [Dataset]. https://dataintelo.com/report/data-science-notebook-platform-market
    Explore at:
    Available download formats: pdf, pptx, csv
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Science Notebook Platform Market Outlook



    According to our latest research, the global Data Science Notebook Platform market size reached USD 820 million in 2024, driven by the accelerating adoption of advanced analytics and machine learning tools across industries. The market is projected to expand at a robust CAGR of 21.6% from 2025 to 2033, reaching a forecasted value of USD 6.06 billion by 2033. This remarkable growth is underpinned by the increasing demand for collaborative, scalable, and cloud-based data science solutions as organizations prioritize data-driven decision-making and digital transformation initiatives.




    One of the primary growth factors propelling the Data Science Notebook Platform market is the rapid digitalization of enterprises and the proliferation of big data. As organizations generate and collect massive volumes of structured and unstructured data, there is a pressing need for platforms that enable seamless data exploration, analysis, and visualization. Data science notebook platforms, with their interactive and user-friendly interfaces, empower data scientists, analysts, and business users to collaborate in real-time, streamline workflows, and accelerate the development of machine learning models. The increasing integration of these platforms with cloud-based data storage and processing solutions further enhances their scalability, flexibility, and accessibility, making them indispensable tools for modern data-driven enterprises.




    Another significant driver is the growing adoption of artificial intelligence (AI) and machine learning (ML) across various sectors such as BFSI, healthcare, retail, and manufacturing. These industries are leveraging data science notebook platforms to develop, test, and deploy sophisticated ML algorithms that can deliver actionable insights, optimize operations, and personalize customer experiences. The ability of these platforms to support a wide range of programming languages, libraries, and frameworks—such as Python, R, TensorFlow, and PyTorch—enables organizations to innovate rapidly and stay ahead of the competition. Moreover, the rising emphasis on open-source technologies and community-driven development is fostering a vibrant ecosystem around data science notebooks, driving further innovation and adoption.




    Furthermore, the shift towards remote and hybrid work models has amplified the need for collaborative data science tools that can bridge geographical and functional silos. Data science notebook platforms offer integrated collaboration features, version control, and secure sharing capabilities, enabling distributed teams to work together efficiently on complex data projects. The growing focus on democratizing data science and empowering business users with self-service analytics tools is also expanding the user base of these platforms beyond traditional data scientists to include business analysts, domain experts, and citizen data scientists. This trend is expected to continue, fueling sustained demand for versatile and user-friendly data science notebook solutions.




    From a regional perspective, North America currently dominates the Data Science Notebook Platform market, accounting for the largest revenue share in 2024, thanks to its mature technology infrastructure, high concentration of data-driven enterprises, and strong presence of leading platform vendors. However, the Asia Pacific region is poised for the fastest growth over the forecast period, driven by rapid digital transformation, increasing investments in AI and analytics, and expanding talent pools in countries such as China, India, and Japan. Europe also represents a significant market, characterized by stringent data privacy regulations and a growing focus on responsible AI and ethical data practices. Meanwhile, Latin America and the Middle East & Africa are witnessing gradual adoption, supported by government initiatives and the rising penetration of cloud-based solutions.



    Component Analysis



    The Data Science Notebook Platform market is segmented by component into Software and Services, each playing a critical role in the overall ecosystem. The software segment encompasses a broad range of notebook solutions, including open-source platforms like Jupyter and proprietary offerings from major technology vendors. These platforms are designed to provide interactive development environments where users can write, execute, and visualize code, making it easier

  10. Inputs of the Jupyter Notebook - Met Office UKV high-resolution atmosphere...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 18, 2022
    Cite
    Environmental Data Science Community (2022). Inputs of the Jupyter Notebook - Met Office UKV high-resolution atmosphere model data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7087008
    Explore at:
    Dataset updated
    Sep 18, 2022
    Authors
    Environmental Data Science Community
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains the inputs of the notebook "Met Office UKV high-resolution atmosphere model data" published in The Environmental Data Science Book.

    The input data refer to a subset: a single sample data file for 1.5 m temperature, part of the Met Office contribution to the COVID-19 modelling effort.

    The full dataset was available for download from the Met Office Azure storage (https://metdatasa.blob.core.windows.net/covid19-response-non-commercial/) under terms restricting use to non-commercial purposes.

    Contributions

    Notebook

    Samantha V. Adams (author), Met Office Informatics Lab, @svadams

    Alejandro Coca-Castro (reviewer), The Alan Turing Institute, @acocac

    Dataset originator/creator

    Met Office Informatics Lab (creator)

    Microsoft (support)

    European Regional Development Fund (support)

    Dataset authors

    Met Office

    Dataset documentation

    Theo McCaie. Met office and partners offer data and compute platform for covid-19 researchers. URL: https://medium.com/informatics-lab/met-office-and-partners-offer-data-and-compute-platform-for-covid-19-researchers-83848ac55f5f.

    Note this data should be used only for non-commercial purposes.

  11. Hydroinformatics Instruction Module Example Code: Databases and SQL in...

    • search.dataone.org
    • beta.hydroshare.org
    • +1more
    Updated Dec 30, 2023
    Cite
    Amber Spackman Jones; Jeffery S. Horsburgh; Camilo J. Bastidas Pacheco (2023). Hydroinformatics Instruction Module Example Code: Databases and SQL in Python [Dataset]. https://search.dataone.org/view/sha256%3A2f7a187ad86e4d584cd35755a67398ffa67d6ebfc81dc1ec01539b85ccd827dc
    Explore at:
    Dataset updated
    Dec 30, 2023
    Dataset provided by
    Hydroshare
    Authors
    Amber Spackman Jones; Jeffery S. Horsburgh; Camilo J. Bastidas Pacheco
    Description

    This resource contains Jupyter Notebooks with examples that illustrate how to work with SQLite databases in Python, including database creation, viewing, and querying with SQL. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.

    This resource consists of 3 example notebooks and a SQLite database.

    Notebooks:
    1. Example 1: Querying databases using SQL in Python
    2. Example 2: Python functions to query SQLite databases
    3. Example 3: SQL join, aggregate, and subquery functions

    Data files: These examples use a SQLite database that uses the Observations Data Model structure and is pre-populated with Logan River temperature data.
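
    For a quick look at the database outside the notebooks, the sqlite3 command-line shell can be used (a minimal sketch; the file name odm.sqlite is illustrative, substitute the database file shipped with this resource, whose tables follow the Observations Data Model, e.g. DataValues):

    sqlite3 odm.sqlite ".tables"
    sqlite3 odm.sqlite "SELECT COUNT(*) FROM DataValues;"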

  12. Outputs of the Jupyter Notebook - Tree crown delineation using detectreeRGB

    • data.niaid.nih.gov
    Updated Mar 28, 2022
    + more versions
    Cite
    Environmental Data Science Community (2022). Outputs of the Jupyter Notebook - Tree crown delineation using detectreeRGB [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6387952
    Explore at:
    Dataset updated
    Mar 28, 2022
    Authors
    Environmental Data Science Community
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains the outputs of the notebook "Tree crown detection using DeepForest" published in The Environmental Data Science Book.

    Contributions

    Notebook

    Sebastian H. M. Hickman (author), University of Cambridge, @shmh40

    Alejandro Coca-Castro (reviewer), The Alan Turing Institute, @acocac

    Modelling codebase

    Sebastian H. M. Hickman (author), University of Cambridge @shmh40

    James G. C. Ball (contributor), University of Cambridge @PatBall1

    David A. Coomes (contributor), University of Cambridge

    Toby Jackson (contributor), University of Cambridge

  13. Outputs of the Jupyter Notebook - Met Office UKV high-resolution atmosphere...

    • data.niaid.nih.gov
    Updated Jul 17, 2024
    Cite
    Environmental Data Science Community (2024). Outputs of the Jupyter Notebook - Met Office UKV high-resolution atmosphere model data [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_5984712
    Explore at:
    Dataset updated
    Jul 17, 2024
    Authors
    Environmental Data Science Community
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains the outputs of the notebook "Met Office UKV high-resolution atmosphere model data" published in the urban sensors section of The Environmental Data Science Book.

    Contributions

    Notebook

    Samantha V. Adams (author), Met Office Informatics Lab, @svadams

    Alejandro Coca-Castro (reviewer), The Alan Turing Institute, @acocac

    Dataset originator/creator

    Met Office Informatics Lab (creator)

    Microsoft (support)

    European Regional Development Fund (support)

    Dataset authors

    Met Office

    Dataset documentation

    Theo McCaie. Met office and partners offer data and compute platform for covid-19 researchers. URL: https://medium.com/informatics-lab/met-office-and-partners-offer-data-and-compute-platform-for-covid-19-researchers-83848ac55f5f.

  14. Big Data Analytics for Scanning Transmission Electron Microscopy...

    • osti.gov
    Updated Aug 9, 2018
    Cite
    Beekman, Christianne; Belianinov, Alex; Borisevich, Albina Y; Chi, Miaofang; Jesse, Stephen; Kalinin, Sergei V; Lupini, Andrew R; Somnath, Suhas (2018). Big Data Analytics for Scanning Transmission Electron Microscopy Ptychography [Dataset]. https://www.osti.gov/dataexplorer/biblio/1463599-big-data-analytics-scanning-transmission-electron-microscopy-ptychography
    Explore at:
    Dataset updated
    Aug 9, 2018
    Dataset provided by
    Office of Science (http://www.er.doe.gov/)
    Oak Ridge Leadership Computing Facility; Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
    Authors
    Beekman, Christianne; Belianinov, Alex; Borisevich, Albina Y; Chi, Miaofang; Jesse, Stephen; Kalinin, Sergei V; Lupini, Andrew R; Somnath, Suhas
    Description

    Dataset containing the raw data and results from analyses, along with a supporting Jupyter notebook that shows the processing of data in the following paper: Big Data Analytics for Scanning Transmission Electron Microscopy Ptychography. S. Jesse, M. Chi, A. Belianinov, C. Beekman, S. V. Kalinin, A. Y. Borisevich & A. R. Lupini. Scientific Reports, volume 6, Article number: 26348 (2016). https://www.nature.com/articles/srep26348

  15. Outputs of the Jupyter Notebook - MODIS MOD021KM and FIRMS

    • data.niaid.nih.gov
    Updated Jul 17, 2024
    Cite
    Environmental Data Science Community (2024). Outputs of the Jupyter Notebook - MODIS MOD021KM and FIRMS [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6065610
    Explore at:
    Dataset updated
    Jul 17, 2024
    Authors
    Environmental Data Science Community
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains the outputs of the notebook "MODIS MOD021KM and FIRMS" published in The Environmental Data Science Book.

    Contributions

    Notebook

    Samuel Jackson (author), Science & Technology Facilities Council, @samueljackson92

    Alejandro Coca-Castro (reviewer), The Alan Turing Institute, @acocac

    Dataset originator/creator

    MOD021KM

    MODIS Characterization Support Team (MCST)

    MODIS Adaptive Processing System (MODAPS)

    Firms

    University of Maryland

    Dataset authors

    MOD021KM

    MODIS Science Data Support Team (SDST)

    Firms

    NASA’s Applied Sciences Program

    Dataset documentation

    Louis Giglio, Wilfrid Schroeder, Joanne V. Hall, and Christopher O. Justice. MODIS Collection 6 Active Fire Product User’s Guide Revision B. Technical Report, NASA, 2018. URL: https://modis-fire.umd.edu/files/MODIS_C6_Fire_User_Guide_B.pdf.

  16. Data from: GeoThermalCloud framework for fusion of big data and...

    • catalog.data.gov
    • gdr.openei.org
    • +2more
    Updated Jan 20, 2025
    + more versions
    Cite
    Los Alamos National Laboratory (2025). GeoThermalCloud framework for fusion of big data and multi-physics models in Nevada and Southwest New Mexico [Dataset]. https://catalog.data.gov/dataset/geothermalcloud-framework-for-fusion-of-big-data-and-multi-physics-models-in-nevada-and-so-31a4e
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    Los Alamos National Laboratory
    Area covered
    New Mexico
    Description

    Our GeoThermalCloud framework is designed to process geothermal datasets using a novel toolbox for unsupervised and physics-informed machine learning called SmartTensors. More information about GeoThermalCloud can be found at the GeoThermalCloud GitHub Repository. More information about SmartTensors can be found at the SmartTensors GitHub Repository and the SmartTensors page at LANL.gov. Links to these pages are included in this submission.

    GeoThermalCloud.jl is a repository containing all the data and codes required to demonstrate applications of machine learning methods for geothermal exploration. GeoThermalCloud.jl includes:
    • site data
    • simulation scripts
    • jupyter notebooks
    • intermediate results
    • code outputs
    • summary figures
    • readme markdown files

    GeoThermalCloud.jl showcases the machine learning analyses performed for the following geothermal sites:
    • Brady: geothermal exploration of the Brady geothermal site, Nevada
    • SWNM: geothermal exploration of the Southwest New Mexico (SWNM) region
    • GreatBasin: geothermal exploration of the Great Basin region, Nevada

    Reports, research papers, and presentations summarizing these machine learning analyses are also available and will be posted soon.

  17. Arcade Natural Language to Code Challenge

    • kaggle.com
    zip
    Updated Feb 22, 2023
    Cite
    Google AI (2023). Arcade Natural Language to Code Challenge [Dataset]. https://www.kaggle.com/datasets/googleai/arcade-nl2code-dataset
    Explore at:
    Available download formats: zip (3921922 bytes)
    Dataset updated
    Feb 22, 2023
    Dataset authored and provided by
    Google AI
    Description

    Arcade: Natural Language to Code Generation in Interactive Computing Notebooks

    Arcade is a collection of natural language to code problems on interactive data science notebooks. Each problem features an NL intent as problem specification, a reference code solution, and preceding notebook context (Markdown or code cells). Arcade can be used to evaluate the accuracies of code large language models in generating data science programs given natural language instructions. Please read our paper for more details.

    Note👉 This Kaggle dataset only contains the dataset files of Arcade. Refer to our main Github repository for detailed instructions to use this dataset.

    Folder Structure

    Below is the structure of its content:

    └── ./
      ├── existing_tasks/ # Problems derived from existing data science notebooks on Github
      │  ├── metadata.json # Metadata by `build_existing_tasks_split.py` to reproduce this split.
      │  ├── artifacts/ # Folder that stores dependent ML datasets to execute the problems, created by running `build_existing_tasks_split.py`
      │  └── derived_datasets/ # Folder for preprocessed datasets used for prompting experiments.
      ├── new_tasks/
      │  ├── dataset.json # Original, unprocessed dataset
      │  ├── kaggle_dataset_provenance.csv # Metadata of the Kaggle datasets used to build this split.
      │  ├── artifacts/ # Folder that stores dependent ML Kaggle datasets to execute the problems, created by running `build_new_tasks_split.py`
      │  └── derived_datasets/ # Folder for preprocessed datasets used for prompting experiments.
      └── checksums.txt # Table of MD5 checksums of dataset files.
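
    Before working with the files, the download can be verified against the bundled checksum table (a minimal sketch, assuming the standard md5sum utility and that checksums.txt lists paths relative to the dataset root):

    md5sum -c checksums.txt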
    

    Dataset File Structure

    All the dataset '*.json' files follow the same structure. Each dataset file is a Json-serialized list of Episodes. Each episode corresponds to a notebook annotated with NL-to-code problems. The structure of an episode is documented below:

    {
      "notebook_name": "Name of the notebook.",
      "work_dir": "Path to the dependent data artifacts (e.g., ML datasets) to execute the notebook.",
      "annotator": "Anonymized annotator Id."
      "turns": [
        # A list of natural language to code examples using the current notebook context.
        {
          "input": "Prompt to a code generation model.",
          "turn": {
            "intent": {
              "value": "Annotated NL intent for the current turn.",
              "is_cell_intent": "Metadata used for the existing tasks split to indicate if the code solution is only part of an existing code cell.",
              "cell_idx": "Index of the intent Markdown cell.",
              "line_span": "Line span of the intent.",
              "not_sure": "Annotation confidence.",
              "output_variables": "List of variable names denoting the output. If None, use the output of the last line of code as the output of the problem.",
            },
            "code": {
              "value": "Reference code solution.",
              "cell_idx": "Cell index of the code cell containing the solution.",
              "num_lines": "Number of lines in the reference solution.",
              "line_span": "Line span.",
            },
            "code_context": "Context code (all code cells before this problem) that need to be executed before executing the reference/predicted programs.",
            "delta_code_context": "Delta context code between the last problem in this notebook and the current problem, useful for incremental execution.",
            "metadata": {
              "annotator_id": "Annotator Id",
              "num_code_lines": "Metadata, please ignore.",
              "utterance_without_output_spec": "Annotated NL intent without output specification. Refer to the paper for details.",
            },
          },
          "notebook": "Field intended to store the Json-serialized Jupyter notebook. Not used for now since the notebook can be reconstructed from other metadata in this file.",
          "metadata": {
            # A dict of metadata of this turn.
            "context_cells": [ # A list of context cells before the problem.
              {
                "cell_type": "code|markdown",
                "source": "Cell content."
              },
            ],
            "delta_cell_num": "Number of preceding context cells between the prior turn and the current turn.",
            # The following fields only occur in datasets inlined with schema descriptions.
            "context_cell_num": "Number of context cells in the prompt after inserting schema descriptions and left-truncation.",
            "inten...
    
  18. Inputs of the Jupyter Notebook - Cosmos-UK soil moisture

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 7, 2022
    Cite
    Environmental Data Science Community (2022). Inputs of the Jupyter Notebook - Cosmos-UK soil moisture [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6562981
    Explore at:
    Dataset updated
    Dec 7, 2022
    Authors
    Environmental Data Science Community
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    The dataset contains the inputs of the notebook "Cosmos-UK soil moisture" published in The Environmental Data Science Book.

    The input data refer to a subset of the public 2013-2019 COSMOS-UK dataset: daily and subhourly observations and metadata for four stations: WYTH1, WADDN, SHEEP and CHIMN. These stations represent the first sites to prototype COSMOS sensors in the UK (see further details in Evans et al., 2016). They are situated in human-intervened areas (grassland and cropland), except for one at a woodland land cover site.

    Data from COSMOS-UK up to the end of 2019 are available for download from the UKCEH Environmental Information Data Centre (EIDC). The data are accompanied by documentation that describes the site-specific instrumentation, data and processing including quality control. The full dataset is available for download under the terms of the Open Government License.

    Contributions

    Notebook

    Alejandro Coca-Castro (author), The Alan Turing Institute, @acocac

    Doran Khamis (reviewer), UK Centre for Ecology & Hydrology, @dorankhamis

    Matt Fry (reviewer), UK Centre for Ecology & Hydrology, @mattfry-ceh

    Dataset originator/creator

    UK Centre for Ecology & Hydrology (creator)

    Natural Environment Research Council (support)

    Dataset reference and documentation

    S. Stanley, V. Antoniou, A. Askquith-Ellis, L.A. Ball, E.S. Bennett, J.R. Blake, D.B. Boorman, M. Brooks, M. Clarke, H.M. Cooper, N. Cowan, A. Cumming, J.G. Evans, P. Farrand, M. Fry, O.E. Hitt, W.D. Lord, R. Morrison, G.V. Nash, D. Rylett, P.M. Scarlett, O.D. Swain, M. Szczykulska, J.L. Thornton, E.J. Trill, A.C. Warwick, and B. Winterbourn. Daily and sub-daily hydrometeorological and soil data (2013-2019) [cosmos-uk]. 2021. URL: https://doi.org/10.5285/b5c190e4-e35d-40ea-8fbe-598da03a1185, doi:10.5285/b5c190e4-e35d-40ea-8fbe-598da03a1185.

    Further references

    Jonathan G. Evans, H. C. Ward, J. R. Blake, E. J. Hewitt, R. Morrison, M. Fry, L. A. Ball, L. C. Doughty, J. W. Libre, O. E. Hitt, D. Rylett, R. J. Ellis, A. C. Warwick, M. Brooks, M. A. Parkes, G. M.H. Wright, A. C. Singer, D. B. Boorman, and A. Jenkins. Soil water content in southern england derived from a cosmic-ray soil moisture observing system – cosmos-uk. Hydrological Processes, 30:4987–4999, 12 2016. doi:10.1002/hyp.10929.

    M. Zreda, W. J. Shuttleworth, X. Zeng, C. Zweck, D. Desilets, T. Franz, and R. Rosolem. Cosmos: the cosmic-ray soil moisture observing system. Hydrology and Earth System Sciences, 16(11):4079–4099, 2012. URL: https://hess.copernicus.org/articles/16/4079/2012/, doi:10.5194/hess-16-4079-2012.

  19. Outputs of the Jupyter Notebook - Deep learning and variational inversion to...

    • data.niaid.nih.gov
    Updated Aug 25, 2023
    Cite
    Environmental Data Science book Community (2023). Outputs of the Jupyter Notebook - Deep learning and variational inversion to quantify and attribute climate change (CIRC23) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8279574
    Explore at:
    Dataset updated
    Aug 25, 2023
    Authors
    Environmental Data Science book Community
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains the outputs of the notebook "Deep learning and variational inversion to quantify and attribute climate change (CIRC23)" published in The Environmental Data Science Book.

  20. Python Integrated Development Environment (IDE) Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 18, 2025
    Cite
    Data Insights Market (2025). Python Integrated Development Environment (IDE) Software Report [Dataset]. https://www.datainsightsmarket.com/reports/python-integrated-development-environment-ide-software-1971834
    Explore at:
    Available download formats: ppt, pdf, doc
    Dataset updated
    May 18, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Python IDE market is booming, projected to reach $1.08B by 2033 at an 8.1% CAGR. Explore key trends, leading companies (PyCharm, Eclipse, AWS Cloud9), and regional market analysis in this comprehensive report. Discover the impact of cloud-based IDEs and the future of Python development tools.
