91 datasets found
  1. Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Richard Ferrers; Speedtest Global Index (2023). Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus ALC - 2020, 2022 [Dataset]. http://doi.org/10.6084/m9.figshare.13621169.v24
    Explore at:
    txt (available download formats)
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Richard Ferrers; Speedtest Global Index
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset compares FIXED-line broadband internet speeds across four international cities plus Alice Springs:
    • Melbourne, AU
    • Bangkok, TH
    • Shanghai, CN
    • Los Angeles, US
    • Alice Springs, AU

    ERRATA: 1. Data is for Q3 2020, but some files are incorrectly labelled as 02-20 or June 20. They should all read Sept 20, or 09-20, i.e. Q3 20 rather than Q2. Will rename and reload. Amended in v7.

    2. LAX file was named 0320 when it should be Q320. Amended in v8.

    *Lines of data for each geojson file; a line equates to a 600m^2 location, including total tests, devices used, and average upload and download speed:
    • MEL: 16181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
    • SHG: 31745 lines => 0.65M speedtests (2.5/100pp)
    • BKK: 29296 lines => 1.5M speedtests (14.3/100pp)
    • LAX: 15899 lines => 1.3M speedtests (10.4/100pp)
    • ALC: 76 lines => 500 speedtests (2/100pp)

    Geojsons of these 2° by 2° extracts for MEL, BKK and SHG are now added; LAX was added in v6 and Alice Springs in v15.

    This dataset unpacks, geospatially, data summaries provided in Speedtest Global Index (linked below). See Jupyter Notebook (*.ipynb) to interrogate geo data. See link to install Jupyter.

    ** TO DO: Add Google Map versions so everyone can see the data without installing Jupyter.
    • Link to Google Map (BKK) added below. Key: Green > 100Mbps (Superfast); Black > 500Mbps (Ultrafast). CSV provided. Code is in the Speedtestv1.1.ipynb Jupyter Notebook.
    • Community (Whirlpool) was surprised [Link: https://whrl.pl/RgAPTl] that Melbourne has 20% of locations at or above 100Mbps. Suggestion: plot the Top 20% on a map for the community. Google Map link now added (and tweet).

    ** Python
    melb = au_tiles.cx[144:146, -39:-37]  # Lat/Lon extract
    shg = tiles.cx[120:122, 30:32]        # Lat/Lon extract
    bkk = tiles.cx[100:102, 13:15]        # Lat/Lon extract
    lax = tiles.cx[-118:-120, 33:35]      # Lat/Lon extract
    ALC = tiles.cx[132:134, -22:-24]      # Lat/Lon extract
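
    For context, here is a minimal sketch of how such a bounding-box extract can be reproduced with geopandas, assuming the fixed-line tiles have been downloaded as GeoJSON; the file name and the avg_d_kbps column are assumptions based on the Speedtest open-data schema, not files in this record.

    import geopandas as gpd

    # Hypothetical file name; use the fixed-line tiles extract from opendata.aws
    tiles = gpd.read_file("fixed_tiles.geojson")

    # .cx selects rows whose geometry falls inside the bounding box (lon slice, lat slice)
    melb = tiles.cx[144:146, -39:-37]
    print(len(melb), "tiles; mean download (kbps):", melb["avg_d_kbps"].mean())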

    Histograms (v9) and data visualisations (v3, 5, 9, 11) are provided. Data source: this is an extract of the Speedtest open data available at Amazon AWS (link below - opendata.aws).

    ** VERSIONS
    v24: Add tweet and Google Map of Top 20% (over 100Mbps locations) in MEL Q3 22. Add v1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below).
    v23: Add graph of 2022 broadband distribution, and compare 2020 - 2022. Updated v1.4 Jupyter notebook.
    v22: Add Import ipynb; workflow-import-4cities.
    v21: Add Q3 2022 data; five cities inc ALC. Geojson files. (2020: 4.3M tests; 2022: 2.9M tests)

    MEL: 14784 lines, avg download 69.4 Mbps, 0.39M tests
    SHG: 31207 lines, avg download 233.7 Mbps, 0.56M tests
    ALC: 113 lines, avg download 51.5 Mbps, 1092 tests
    BKK: 29684 lines, avg download 215.9 Mbps, 1.2M tests
    LAX: 15505 lines, avg download 218.5 Mbps, 0.74M tests

    v20: Speedtest - Five Cities inc ALC.
    v19: Add ALC2.ipynb.
    v18: Add ALC line graph.
    v17: Added ipynb for ALC. Added ALC to title.
    v16: Load Alice Springs Data Q2 21 - csv. Added Google Map link of ALC.
    v15: Load Melb Q1 2021 data - csv.
    v14: Added Melb Q1 2021 data - geojson.
    v13: Added Twitter link to pics.
    v12: Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb).
    v11: Add Line-Compare pic, plotting Four Cities on a graph.
    v10: Add Four Histograms in one pic.
    v9: Add Histogram for Four Cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook).
    v8: Renamed LAX file to Q3, rather than 03.
    v7: Amended file names of BKK files to correctly label as Q3, not Q2 or 06.
    v6: Added LAX file.
    v5: Add screenshot of BKK Google Map.
    v4: Add BKK Google map (link below), and BKK csv mapping files.
    v3: Replaced MEL map with big key version. Previous key was very tiny in top right corner.
    v2: Uploaded MEL, SHG, BKK data and Jupyter Notebook.
    v1: Metadata record.

    ** LICENCE: The AWS data licence on Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be non-commercial (NC) and reuse must be share-alike (SA), i.e. apply the same licence. This restricts the standard CC-BY Figshare licence.

    ** Other uses of Speedtest Open Data: see link at Speedtest below.

  2. Jupyter Notebook Activity Dataset (rsds-20241113)

    • zenodo.org
    application/gzip, zip
    Updated Jan 18, 2025
    Cite
    Tomoki Nakamaru; Tomomasa Matsunaga; Tetsuro Yamazaki (2025). Jupyter Notebook Activity Dataset (rsds-20241113) [Dataset]. http://doi.org/10.5281/zenodo.13357570
    Explore at:
    zip, application/gzip (available download formats)
    Dataset updated
    Jan 18, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tomoki Nakamaru; Tomomasa Matsunaga; Tetsuro Yamazaki
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    List of data

    • rsds-20241113.zip: Collection of SQLite database files
    • image.tar.gz: Docker image provided in our data collection experiment
    • redspot-341ffa5.zip: Redspot source code (redspot@341ffa5)

    Extended version of Section 2D of our paper

    Redspot is a Jupyter extension (i.e., a Python package) that records activity signals. It also offers interfaces to read the recorded signals. The following shows the most basic usage of its command-line interface:
    redspot replay


    This command generates snapshots (.ipynb files) restored from the signal records. Note that it does not produce a snapshot for every signal: since the change represented by a single signal is typically minimal (e.g., one keystroke), generating a snapshot for each signal would result in a meaninglessly large number of snapshots. However, signal-level snapshots may still be wanted for some analyses. In such cases, one can analyze the signals using the application programming interfaces:

    from redspot import database
    from redspot.notebook import Notebook

    nbk = Notebook()
    for signal in database.get("path-to-db"):
        time, panel, kind, args = signal
        nbk.apply(kind, args)  # apply change
        print(nbk)             # print notebook

    To record activities, one needs to run the Redspot command in the recording mode as follows:

    redspot record

    This command launches Jupyter Notebook with Redspot enabled. Activities made in the launched environment are stored in an SQLite file named "redspot.db" under the current path.

    To launch the environment we provided to the participants, one first needs to download and import the image (image.tar.gz). One can then run the image with the following command:

    docker run --rm -it -p8888:8888

    Note that the SQLite file is generated in the running container. The file can be downloaded into the host machine via the file viewer of Jupyter Notebook.

  3. Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...

    • zenodo.org
    • explore.openaire.eu
    bz2
    Updated Mar 15, 2021
    + more versions
    Cite
    João Felipe; Leonardo; Vanessa; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2592524
    Explore at:
    bz2 (available download formats)
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    João Felipe; Leonardo; Vanessa; Juliana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are used leads to unexpected behavior, encourages poor coding practices, and produces results that can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

    Paper: https://2019.msrconf.org/event/msr-2019-papers-a-large-scale-study-about-quality-and-reproducibility-of-jupyter-notebooks

    This repository contains two files:

    • dump.tar.bz2
    • jupyter_reproducibility.tar.bz2

    The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

    The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

    • analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.
    • archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.
    • paper: empty. The notebook analyses/N12.To.Paper.ipynb moves data to it

    In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.

    Reproducing the Analysis

    This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:

    Ubuntu 18.04.1 LTS
    PostgreSQL 10.6
    Conda 4.5.11
    Python 3.7.2
    PdfCrop 2012/11/02 v1.38

    First, download dump.tar.bz2 and extract it:

    tar -xjf dump.tar.bz2

    It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

    psql jupyter < db2019-03-13.dump

    It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
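
    For orientation, here is a minimal sketch of how the analysis notebooks can pick up this connection string with sqlalchemy; the query is illustrative only, and the actual table names come from the restored dump.

    import os
    from sqlalchemy import create_engine, text

    engine = create_engine(os.environ["JUP_DB_CONNECTION"])
    with engine.connect() as conn:
        print(conn.execute(text("SELECT 1")).scalar())  # simple connectivity check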

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create a conda environment with Python 3.7:

    conda create -n analyses python=3.7
    conda activate analyses

    Go to the analyses folder and install all the dependencies listed in requirements.txt:

    cd jupyter_reproducibility/analyses
    pip install -r requirements.txt

    To reproduce the analyses, run Jupyter in this folder:

    jupyter notebook

    Execute the notebooks in this order:

    • Index.ipynb
    • N0.Repository.ipynb
    • N1.Skip.Notebook.ipynb
    • N2.Notebook.ipynb
    • N3.Cell.ipynb
    • N4.Features.ipynb
    • N5.Modules.ipynb
    • N6.AST.ipynb
    • N7.Name.ipynb
    • N8.Execution.ipynb
    • N9.Cell.Execution.Order.ipynb
    • N10.Markdown.ipynb
    • N11.Repository.With.Notebook.Restriction.ipynb
    • N12.To.Paper.ipynb

    Reproducing or Expanding the Collection

    The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

    Requirements

    This time, we have extra requirements:

    All the analysis requirements
    lbzip2 2.5
    gcc 7.3.0
    Github account
    Gmail account

    Environment

    First, set the following environment variables:

    export JUP_MACHINE="db"; # machine identifier
    export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
    export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
    export JUP_COMPRESSION="lbzip2"; # compression program
    export JUP_VERBOSE="5"; # verbose level
    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
    export JUP_GITHUB_USERNAME="github_username"; # your github username
    export JUP_GITHUB_PASSWORD="github_password"; # your github password
    export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
    export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
    export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
    export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
    export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
    export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
    export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
    export JUP_WITH_EXECUTION="1"; # execute python notebooks
    export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
    export JUP_EXECUTION_MODE="-1"; # run following the execution order
    export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
    export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
    export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
    export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
    export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
    
    
    # Frequency of log reports
    export JUP_ASTROID_FREQUENCY="5";
    export JUP_IPYTHON_FREQUENCY="5";
    export JUP_NOTEBOOKS_FREQUENCY="5";
    export JUP_REQUIREMENT_FREQUENCY="5";
    export JUP_CRAWLER_FREQUENCY="1";
    export JUP_CLONE_FREQUENCY="1";
    export JUP_COMPRESS_FREQUENCY="5";
    
    export JUP_DB_IP="localhost"; # postgres database IP

    Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf

    Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories; the second one should unmount it. You can leave the scripts blank, but that is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.

    Scripts

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Install 5 conda environments and 5 anaconda environments, one pair for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):

    Conda 2.7

    conda create -n raw27 python=2.7 -y
    conda activate raw27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 2.7

    conda create -n py27 python=2.7 anaconda -y
    conda activate py27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    

    Conda 3.4

    It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.

    conda create -n raw34 python=3.4 -y
    conda activate raw34
    conda install jupyter -c conda-forge -y
    conda uninstall jupyter -y
    pip install --upgrade pip
    pip install jupyter
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    pip install pathlib2

    Anaconda 3.4

    conda create -n py34 python=3.4 anaconda -y
    conda activate py34
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.5

    conda create -n raw35 python=3.5 -y
    conda activate raw35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.5

    It requires the manual installation of other anaconda packages.

    conda create -n py35 python=3.5 anaconda -y
    conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
    conda activate py35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.6

    conda create -n raw36 python=3.6 -y
    conda activate raw36
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.6

    conda create -n py36 python=3.6 anaconda -y
    conda activate py36
    conda install -y anaconda-navigator jupyterlab_server navigator-updater
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.7

    conda create -n raw37 python=3.7 -y
    conda activate raw37
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

  4. Reporting behavior from WHO COVID-19 public data

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Nov 29, 2023
    Cite
    Auss Abbood (2023). Reporting behavior from WHO COVID-19 public data [Dataset]. http://doi.org/10.5061/dryad.9s4mw6mmb
    Explore at:
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Auss Abbood
    Time period covered
    Dec 16, 2022
    Description

    Objective: Daily COVID-19 data reported by the World Health Organization (WHO) may provide the basis for political ad hoc decisions including travel restrictions. Data reported by countries, however, is heterogeneous, and metrics to evaluate its quality are scarce. In this work, we analyzed COVID-19 case counts provided by WHO and developed tools to evaluate country-specific reporting behaviors.

    Methods: In this retrospective cross-sectional study, COVID-19 data reported daily to WHO from 3rd January 2020 until 14th June 2021 were analyzed. We proposed the concepts of binary reporting rate and relative reporting behavior and performed descriptive analyses for all countries with these metrics. We developed a score to evaluate the consistency of incidence and binary reporting rates. Further, we performed spectral clustering of the binary reporting rate and relative reporting behavior to identify salient patterns in these metrics.

    Results: Our final analysis included 222 countries and regions....

    Data collection: COVID-19 data was downloaded from WHO. Using a public repository, we added the countries' full names to the WHO data set, using the two-letter abbreviations for each country to merge both data sets. The provided COVID-19 data covers January 2020 until June 2021. We uploaded the final data set used for the analyses of this paper.

    Data processing: We processed data using a Jupyter Notebook with a Python kernel and publicly available external libraries. This upload contains the required Jupyter Notebook (reporting_behavior.ipynb) with all analyses and some additional work, a README, and the conda environment yml (env.yml).

    Any text editor, including Microsoft Excel and its free alternatives, can open the uploaded CSV file. Any web browser and some code editors (like the freely available Visual Studio Code) can show the uploaded Jupyter Notebook if the required Python environment is set up correctly.
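
    As an illustration of the country-name merge described above, here is a minimal pandas sketch; the file and column names are hypothetical and not taken from the uploaded files.

    import pandas as pd

    # Hypothetical file and column names, for illustration only
    who = pd.read_csv("who_covid19_daily.csv")    # assumed to carry a two-letter "Country_code" column
    names = pd.read_csv("country_codes.csv")      # assumed to map "Country_code" to "Country_name"

    merged = who.merge(names, on="Country_code", how="left")
    print(merged.head())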

  5. (HS 2) Automate Workflows using Jupyter notebook to create Large Extent...

    • search.dataone.org
    • hydroshare.org
    Updated Oct 19, 2024
    + more versions
    Cite
    Young-Don Choi (2024). (HS 2) Automate Workflows using Jupyter notebook to create Large Extent Spatial Datasets [Dataset]. http://doi.org/10.4211/hs.a52df87347ef47c388d9633925cde9ad
    Explore at:
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    Hydroshare
    Authors
    Young-Don Choi
    Description

    We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we utilized the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio-based xarray extension; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save the GeoTIFF as NetCDF. These procedures resulted in the creation of three HydroShare resources (HS 3, HS 4 and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, a commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.
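
    A minimal sketch of the GeoTIFF-to-NetCDF step described above; the file names are illustrative, and the actual notebooks in this resource may differ.

    import rioxarray  # registers the .rio accessor on xarray objects

    # Illustrative file names, not taken from this resource
    da = rioxarray.open_rasterio("state_les.tif")    # load the GeoTIFF as an xarray DataArray
    da.name = "les"                                  # name the variable for NetCDF output
    da.attrs["source"] = "state-scale LES GeoTIFF"   # attach metadata before export
    da.to_netcdf("state_les.nc")                     # write the NetCDF file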

  6. Jupyter Notebooks for the ERA5 Data Component

    • hydroshare.org
    • beta.hydroshare.org
    • +1more
    zip
    Updated Apr 30, 2025
    Cite
    Tian Gan (2025). Jupyter Notebooks for the ERA5 Data Component [Dataset]. https://www.hydroshare.org/resource/765e4e8bebab4eea8d72ee447a27c2fa
    Explore at:
    zip, 29.0 KB (available download formats)
    Dataset updated
    Apr 30, 2025
    Dataset provided by
    HydroShare
    Authors
    Tian Gan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This resource includes two Jupyter Notebooks as a quick start tutorial for the ERA5 Data Component of the PyMT modeling framework (https://pymt.readthedocs.io/) developed by Community Surface Dynamics Modeling System (CSDMS https://csdms.colorado.edu/).

    The bmi_era5 package is an implementation of the Basic Model Interface (BMI, https://bmi.readthedocs.io/en/latest/) for the ERA5 dataset (https://confluence.ecmwf.int/display/CKB/ERA5). This package uses the cdsapi (https://cds.climate.copernicus.eu/api-how-to) to download the ERA5 dataset and wraps the dataset with BMI for data control and query (it currently supports 3-dimensional ERA5 datasets). This package is not intended to be used directly; it is the key element that helps convert the ERA5 dataset into a data component for the PyMT modeling framework.

    The pymt_era5 package is implemented for people to use as a reusable, plug-and-play ERA5 data component for the PyMT modeling framework. This package uses the BMI implementation from the bmi_era5 package and allows the ERA5 datasets to be easily coupled with other datasets or models that expose a BMI.
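
    As a rough, hedged illustration of the BMI-style workflow these packages expose, here is a sketch in which the class name, config file, and variable handling are assumptions for illustration; see the included bmi_era5.ipynb and pymt_era5.ipynb notebooks for the actual usage.

    # Hypothetical sketch only; names may differ from the real bmi_era5 package.
    import numpy as np
    from bmi_era5 import BmiEra5   # assumed import path

    model = BmiEra5()
    model.initialize("era5_config.yaml")          # assumed config describing the ERA5 request
    print(model.get_output_var_names())           # standard BMI call: list exposed variables

    var = model.get_output_var_names()[0]
    grid = model.get_var_grid(var)                # standard BMI grid lookup
    values = np.empty(model.get_grid_size(grid))  # allocate a destination buffer
    model.get_value(var, values)                  # fill the buffer with ERA5 data
    model.finalize()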

    HydroShare users can test and run the Jupyter Notebooks (bmi_era5.ipynb, pymt_era5.ipynb) directly through the "CUAHSI JupyterHub" web app with the following steps:
    • New users of the CUAHSI JupyterHub should first request to join the "CUAHSI Cloud Computing Group" (https://www.hydroshare.org/group/156). After approval, the user will gain access to launch the CUAHSI JupyterHub.
    • Click on the "Open with" button (in the top right corner of the page).
    • Select "CUAHSI JupyterHub".
    • Select the "CSDMS Workbench" server option. (Make sure to select the right server option; otherwise, the notebook won't run correctly.)

    If there is any question or suggestion about the ERA5 data component, please create a github issue at https://github.com/gantian127/bmi_era5/issues

  7. Python Code for Visualizing COVID-19 data

    • borealisdata.ca
    • search.dataone.org
    Updated Dec 16, 2023
    Cite
    Ryan Chartier; Geoffrey Rockwell (2023). Python Code for Visualizing COVID-19 data [Dataset]. http://doi.org/10.5683/SP3/PYEQL0
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Borealis
    Authors
    Ryan Chartier; Geoffrey Rockwell
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The purpose of this code is to produce a line graph visualization of COVID-19 data. This Jupyter notebook was built and run on Google Colab. This code will serve mostly as a guide and will need to be adapted where necessary to be run locally. The separate COVID-19 datasets uploaded to this Dataverse can be used with this code. This upload is made up of the IPYNB and PDF files of the code.
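
    For orientation, a minimal, generic sketch of the kind of line-graph visualization described; the file and column names are hypothetical and not taken from the associated datasets.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical file and column names, for illustration only
    df = pd.read_csv("covid19_cases.csv", parse_dates=["date"])
    df.plot(x="date", y="daily_cases", kind="line", title="Daily COVID-19 cases")
    plt.tight_layout()
    plt.savefig("daily_cases.png")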

  8. Replication Package for ML-EUP Conversational Agent Study

    • zenodo.org
    pdf
    Updated Jul 12, 2024
    Cite
    Anonymous; Anonymous (2024). Replication Package for ML-EUP Conversational Agent Study [Dataset]. http://doi.org/10.5281/zenodo.7780223
    Explore at:
    pdf (available download formats)
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Replication Package Files

    • 1. Forms.zip: contains the forms used to collect data for the experiment

    • 2. Experiments.zip: contains the participants’ and sandboxers’ experimental task workflow with Newton.

    • 3. Responses.zip: contains the responses collected from participants during the experiments.

    • 4. Analysis.zip: contains the data analysis scripts and results of the experiments.

    • 5. newton.zip: contains the tool we used for the WoZ experiment.

    • TutorialStudy.pdf: script used in the experiment with and without Newton to be consistent with all participants.

    • Woz_Script.pdf: script wizard used to maintain consistent Newton responses among the participants.

    1. Forms.zip

    The forms zip contains the following files:

    • Demographics.pdf: a PDF form used to collect demographic information from participants before the experiments

    • Post-Task Control (without the tool).pdf: a PDF form used to collect data from participants about challenges and interactions when performing the task without Newton

    • Post-Task Newton (with the tool).pdf: a PDF form used to collect data from participants after the task with Newton.

    • Post-Study Questionnaire.pdf: a PDF form used to collect data from the participant after the experiment.

    2. Experiments.zip

    The experiments zip contains two types of folders:

    • exp[participant’s number]-c[number of dataset used for control task]e[number of dataset used for experimental task]. Example: exp1-c2e1 (experiment participant 1 - control used dataset 2, experimental used dataset 1)

    • sandboxing[sandboxer’s number]. Example: sandboxing1 (experiment with sandboxer 1)

    Every experiment subfolder contains:

    • warmup.json: a JSON file with the results of Newton-Participant interactions in the chat for the warmup task.

    • warmup.ipynb: a Jupyter notebook file with the participant’s results from the code provided by Newton in the warmup task.

    • sample1.csv: Death Event dataset.

    • sample2.csv: Heart Disease dataset.

    • tool.ipynb: a Jupyter notebook file with the participant’s results from the code provided by Newton in the experimental task.

    • python.ipynb: a Jupyter notebook file with the participant’s results from the code they tried during the control task.

    • results.json: a JSON file with the results of Newton-Participant interactions in the chat for the task with Newton.

    To load an experiment chat log into Newton, add the following code to the notebook:

    import anachat
    import json

    with open("results.json", "r") as f:
        anachat.comm.COMM.history = json.load(f)

    Then, click on the notebook name inside the Newton chat.

    Note 1: the subfolder for P6 is exp6-e2c1-serverdied because the experiment server died before we were able to save the logs. We reconstructed them using the notebook newton_remake.ipynb based on the video recording.

    Note 2: The sandboxing occurred during the development of Newton. We did not collect all the files, and the format of JSON files is different than the one supported by the attached version of Newton.

    3. Responses.zip

    The responses zip contains the following files:

    • demographics.csv: a CSV file containing the responses collected from participants using the demographics form

    • task_newton.csv: a CSV file containing the responses collected from participants using the post-task newton form.

    • task_control.csv: a CSV file containing the responses collected from participants using the post-task control form.

    • post_study.csv: a CSV file containing the responses collected from participants using the post-study control form.

    4. Analysis.zip

    The analysis zip contains the following files:

    • 1.Challenge.ipynb: a Jupyter notebook file where the perceptions of challenges figure was created.

    • 2.Interactions.py: a Python file where the participants’ JSON files were created.

    • 3.Interactions.Graph.ipynb: a Jupyter notebook file where the participant’s interaction figure was created.

    • 4.Interactions.Count.ipynb: a Jupyter notebook file that counts participants’ interaction with each figure.

    • config_interactions.py: this file contains the definitions of interaction colors and grouping

    • interactions.json: a JSON file with the interactions during the Newton task of each participant based on the categorization.

    • requirements.txt: dependencies required to run the code to generate the graphs and json analysis.

    To run the analyses, install the dependencies on Python 3.10 with the following command and execute the scripts and notebooks in order:

    pip install -r requirements.txt

    5. newton.zip

    The newton zip contains the source code of the Jupyter Lab extension we used in the experiments. Read the README.md file inside it for instructions on how to install and run it.

  9. Cognitive Fatigue

    • figshare.com
    csv
    Updated Jun 4, 2025
    Cite
    Rui Varandas; Inês Silveira; Hugo Gamboa (2025). Cognitive Fatigue [Dataset]. http://doi.org/10.6084/m9.figshare.28188143.v3
    Explore at:
    csv (available download formats)
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    figshare
    Authors
    Rui Varandas; Inês Silveira; Hugo Gamboa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    1. Cognitive Fatigue

    2.1. Experimental design
    Cognitive fatigue (CF) is a phenomenon that arises following prolonged engagement in mentally demanding cognitive tasks. We therefore developed an experimental procedure that involved three demanding tasks: a digital lesson in Jupyter Notebook format, three repetitions of a Corsi-Block task, and two repetitions of a concentration test. Before the Corsi-Block task and after the concentration task there were baseline periods of two minutes. In our analysis, the first baseline period, although not explicitly present in the dataset, was designated as representing no CF, whereas the final baseline period was designated as representing the presence of CF. Between repetitions of the Corsi-Block task, there were baseline periods of 15 s after the task and of 30 s before the beginning of each repetition of the task.

    2.2. Data recording
    A data sample of 10 volunteer participants (4 females) aged between 22 and 48 years old (M = 28.2, SD = 7.6) took part in this study. All volunteers were recruited at NOVA School of Science and Technology, were fluent in English and right-handed, and none reported suffering from psychological disorders or taking regular medication. Written informed consent was obtained before participation, and all ethical procedures approved by the Ethics Committee of NOVA University of Lisbon were thoroughly followed. We omitted the data from one participant due to the insufficient duration of data acquisition.

    2.3. Data labelling
    The labels easy, difficult, very difficult and repeat found in the ECG_lesson_answers.txt files represent the subjects' opinion of the content read in the ECG lesson. The repeat label represents the most difficult level; it is called repeat because, when pressed, the answer to the question is shown again. This system is based on the Anki system, which has been proposed and used to memorise information effectively. In addition, the PB description JSON files include timestamps indicating the start and end of cognitive tasks, baseline periods, and other events, which are useful for defining CF states as described in 2.1.

    2.4. Data description
    Biosignals include EEG, fNIRS (not converted to oxy- and deoxy-Hb), ECG, EDA, respiration (RIP), accelerometer (ACC), and push-button (PB) data. All signals have already been converted to physical units. In each biosignal file, the first column corresponds to the timestamps. HCI features encompass keyboard, mouse, and screenshot data. Below is a Python code snippet for extracting screenshot files from the screenshots CSV file.

    import base64
    from os import mkdir
    from os.path import join

    file = '...'
    with open(file, 'r') as f:
        lines = f.readlines()

    mkdir('screenshot')  # create the output directory once, before the loop
    for line in lines[1:]:
        timestamp = line.split(',')[0]
        code = line.split(',')[-1][:-2]
        imgdata = base64.b64decode(code)
        filename = str(timestamp) + '.jpeg'
        with open(join('screenshot', filename), 'wb') as f:
            f.write(imgdata)

    A characterization file containing age and gender information for all subjects in each dataset is provided within the respective dataset folder (e.g., D2_subject-info.csv). Other complementary files include (i) a description of the pushbuttons to help segment the signals (e.g., D2_S2_PB_description.json) and (ii) labelling (e.g., D2_S2_ECG_lesson_results.txt). The files D2_Sx_results_corsi-block_board_1.json and D2_Sx_results_corsi-block_board_2.json show the results for the first and second iterations of the Corsi-Block task, where, for example, row_0_1 = 12 means that the subject got 12 pairs right in the first row of the first board, and row_0_2 = 12 means that the subject got 12 pairs right in the first row of the second board.
  10. Time series

    • data.open-power-system-data.org
    csv, sqlite, xlsx
    Updated Oct 6, 2020
    + more versions
    Cite
    Jonathan Muehlenpfordt (2020). Time series [Dataset]. http://doi.org/10.25832/time_series/2020-10-06
    Explore at:
    csv, sqlite, xlsx (available download formats)
    Dataset updated
    Oct 6, 2020
    Dataset provided by
    Open Power System Data
    Authors
    Jonathan Muehlenpfordt
    Time period covered
    Jan 1, 2015 - Oct 1, 2020
    Variables measured
    utc_timestamp, DE_wind_profile, DE_solar_profile, DE_wind_capacity, DK_wind_capacity, SE_wind_capacity, CH_solar_capacity, DE_solar_capacity, DK_solar_capacity, AT_price_day_ahead, and 290 more
    Description

    Load, wind and solar, prices in hourly resolution. This data package contains different kinds of timeseries data relevant for power system modelling, namely electricity prices, electricity consumption (load) as well as wind and solar power generation and capacities. The data is aggregated either by country, control area or bidding zone. Geographical coverage includes the EU and some neighbouring countries. All variables are provided in hourly resolution. Where original data is available in higher resolution (half-hourly or quarter-hourly), it is provided in separate files. This package version only contains data provided by TSOs and power exchanges via ENTSO-E Transparency, covering the period 2015-mid 2020. See previous versions for historical data from a broader range of sources. All data processing is conducted in Python/pandas and has been documented in the Jupyter notebooks linked below.
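
    As a quick orientation, a minimal sketch of loading one of the hourly CSV files with pandas; the file name is illustrative, while column names such as utc_timestamp and DE_solar_capacity follow the variable list above.

    import pandas as pd

    # Illustrative file name; use the hourly CSV from this data package
    ts = pd.read_csv("time_series.csv", index_col="utc_timestamp", parse_dates=True)

    # Columns follow a <zone>_<variable> pattern, e.g. DE_solar_capacity
    print(ts[["DE_solar_capacity", "DE_wind_capacity"]].describe())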

  11. Data and code from: Cultivation and dynamic cropping processes...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Data and code from: Cultivation and dynamic cropping processes impart land-cover heterogeneity within agroecosystems: a metrics-based case study in the Yazoo-Mississippi Delta (USA) [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-cultivation-and-dynamic-cropping-processes-impart-land-cover-heterogene-f5f78
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Area covered
    Mississippi Delta, United States, Mississippi
    Description

    This dataset contains data and code from the manuscript: Heintzman, L.J., McIntyre, N.E., Langendoen, E.J., & Read, Q.D. (2024). Cultivation and dynamic cropping processes impart land-cover heterogeneity within agroecosystems: a metrics-based case study in the Yazoo-Mississippi Delta (USA). Landscape Ecology 39, 29. https://doi.org/10.1007/s10980-024-01797-0

    There are 14 rasters of land use and land cover data for the study region, in .tif format with associated auxiliary files, two shapefiles with county boundaries and study area extent, a CSV file with summary information derived from the rasters, and a Jupyter notebook containing Python code. The rasters included here represent an intermediate data product. Original unprocessed rasters from NASS CropScape are not included here, nor is the code to process them.

    List of files

    • MS_Delta_maps.zip
    • MSDeltaCounties_UTMZone15N.shp: Depiction of the 19 counties (labeled) that intersect the Mississippi Alluvial Plain in western Mississippi.
    • MS_Delta_MAP_UTMZone15N.shp: Depiction of the study area extent.
    • mf8h_20082021.zip
    • mf8h_XXXX.tif: Yearly, reclassified and majority-filtered LULC data used to build comboall1.csv, derived from USDA NASS CropScape. There are 14 .tif files in total for years 2008-2021. Each .tif file includes auxiliary files with the same file name and the following extensions: .tfw, .tif.aux.xml, .tif.ovr, .tif.vat.cpg, .tif.vat.dbf.
    • comboall1.csv: Combined dataset of LULC information for all 14 years in the study period.
    • analysis.ipynb_.txt: Jupyter Notebook used to analyze comboall1.csv. Convert to .ipynb format to open with Jupyter.

    This research was conducted under USDA Agricultural Research Service, National Program 211 (Water Availability and Watershed Management).
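
    For orientation, a minimal sketch of tabulating land-cover class counts from one of the yearly rasters with rasterio; the file name is illustrative, and the included analysis notebook may work differently.

    import numpy as np
    import rasterio

    # Illustrative file name; any of the yearly mf8h_XXXX.tif rasters would work
    with rasterio.open("mf8h_2008.tif") as src:
        lulc = src.read(1)  # read the first band of the LULC raster

    # Count pixels per land-cover class code
    classes, counts = np.unique(lulc, return_counts=True)
    for c, n in zip(classes, counts):
        print(f"class {c}: {n} pixels")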

  12. OpenAXES Example Robot Dataset

    • data.uni-hannover.de
    zip,csv
    Updated Jul 24, 2023
    + more versions
    Cite
    Institut fuer Mikroelektronische Systeme (2023). OpenAXES Example Robot Dataset [Dataset]. https://data.uni-hannover.de/dataset/openaxes-example-robot-dataset
    Explore at:
    zip, csv (423808715) (available download formats)
    Dataset updated
    Jul 24, 2023
    Dataset authored and provided by
    Institut fuer Mikroelektronische Systeme
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    This is an example dataset recorded using version 1.0 of the open-source-hardware OpenAXES IMU. Please see the github repository for more information on the hardware and firmware. Please find the most up-to-date version of this document in the repository

    This dataset was recorded using four OpenAXES IMUs mounted on the segments of a robot arm (UR5 by Universal Robots). The robot arm was programmed to perform a calibration movement, then trace a 2D circle or triangle in the air with its tool center point (TCP), and return to its starting position, at four different speeds from 100 mm/s to 250 mm/s. This results in a total of 8 different scenarios (2 shapes times 4 speeds). The ground truth joint angle and TCP position values were obtained from the robot controller. The calibration movement at the beginning of the measurement allows for calculating the exact orientation of the sensors on the robot arm.

    The IMUs were configured to send the raw data from the three gyroscope axes and the six accelerometer axes to a PC via BLE with 16 bit resolution per axis and 100 Hz sample rate. Since no data packets were lost during this process, this dataset allows comparing and tuning different sensor fusion algorithms on the recorded raw data while using the ground truth robot data as a reference.

    In order to visualize the results, the quaternion sequences from the IMUs were applied to the individual segments of a 3D model of the robot arm. The end of this kinematic chain represents the TCP of the virtual model, which should ideally move along the same trajectory as the ground truth, barring the accuracy of the IMUs. Since the raw sensor data of these measurements is available, the calibration coefficients can also be applied ex-post.

    Since there are 6 joints but only 4 IMUs, some redundancy must be exploited. The redundancy comes from the fact that each IMU has 3 rotational degrees of freedom, but each joint has only one:

    • The data for q0 and q1 are both derived from the orientation of the "humerus" IMU.
    • q2 is the difference between the orientation of the "humerus" and "radius" IMUs.
    • q3 is the difference between the orientation of the "radius" and "carpus" IMUs.
    • q4 is the difference between the orientation of the "carpus" and "digitus" IMUs.
    • The joint q5 does not influence the position of the TCP, only its orientation, so it is ignored in the evaluation.
    • Of course, difference here means not the subtraction of the quaternions but the rotational difference, which is R1 * inv(R0) for two quaternions (or rotations) R0 and R1. The actual code works a bit differently, but this describes the general principle (see the sketch after this list).
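
    As a hedged illustration of that rotational-difference idea, here is a small sketch using scipy; the quaternion values are made up and not taken from the dataset.

    from scipy.spatial.transform import Rotation as R

    # Made-up example orientations in (x, y, z, w) quaternion form
    r0 = R.from_quat([0.0, 0.0, 0.0, 1.0])        # e.g. "humerus" IMU
    r1 = R.from_quat([0.0, 0.0, 0.7071, 0.7071])  # e.g. "radius" IMU

    r_rel = r1 * r0.inv()                         # rotational difference R1 * inv(R0)
    print(r_rel.as_euler("xyz", degrees=True))    # relative rotation between the two IMUs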

    Data

    • Data recorded from the IMUs is in the directory measure_raw-2022-09-15/, one folder per scenario. In those folders, there is one CSV file per IMU.
    • Data recorded from the robot arm is in the directory measure_raw-2022-09-15/robot/, one CSV and MAT file per scenario.
    • Some photos and videos of the recording process can be found in Media. Videos are stored in git lfs.

    Evaluation

    The file openaxes-example-robot-dataset.ipynb is provided to play around with the data in the dataset and demonstrate how the files are read and interpreted. To use the notebook, set up a Python 3 virtual environment and install the necessary packages in it with pip install -r requirements.txt. In order to view the graphs contained in the ipynb file, you will most likely have to trust the notebook beforehand, using the following command:

    jupyter trust openaxes-example-robot-dataset.ipynb
    

    Beware: This notebook is not a comprehensive evaluation and any results and plots shown in the file are not necessarily scientifically sound evidence of anything.

    The notebook will store intermediate files in the measure_raw-2022-09-15 directory, like the quaternion files calculated by the different filters, or the files containing the reconstructed TCP positions. All intermediate files should be ignored by the file measure_raw-2022-09-15/.gitignore.

    The generated intermediate files are also provided in the file measure_raw-2022-09-15.tar.bz2, in case you want to inspect the generated files without running the notebook.

    Tools

    A number of tools are used in the evaluation notebook. Below is a short overview, but not a complete specification. If you need to understand the input and output formats for each tool, please read the code.

    • The file calculate-quaternions.py is used in the evaluation notebook to compute different attitude estimation filters like Madgwick or VQF on the raw accelerometer and gyroscope measurements at 100 Hz.
    • The directory madgwick-filter contains a small C program that applies the original Madgwick filter to a CSV file containing raw measurements and prints the results. It is used by calculate-quaternions.py.
    • The file calculate-robot-quaternions.py calculates a CSV file of quaternions equivalent to the IMU quaternions from a CSV file containing the joint angles of the robot.
    • The program dsense_vis mentioned in the notebook is used to calculate the 3D model of the robot arm from quaternions and determine the mounting orientations of the IMUs on the robot arm. This program will be released at a future date. In the meantime, the output files of dsense_vis are provided in the file measure_raw-2022-09-15.tar.bz2, which contains the complete content of the measure_raw-2022-09-15 directory after executing the whole notebook. Just unpack this archive and merge its contents with the measure_raw-2022-09-15 directory. This allows you to explore the reconstructed TCP files for the filters implemented at the time of publication.
  13. Reproducibility in Practice: Dataset of a Large-Scale Study of Jupyter...

    • zenodo.org
    bz2
    Updated Mar 15, 2021
    Cite
    Anonymous; Anonymous (2021). Reproducibility in Practice: Dataset of a Large-Scale Study of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2546834
    Explore at:
    bz2 (available download formats)
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are used leads to unexpected behavior, encourages poor coding practices, and produces results that can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

    This repository contains two files:

    • dump.tar.bz2
    • jupyter_reproducibility.tar.bz2

    The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

    The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

    • analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.
    • archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.
    • paper: empty. The notebook analyses/N11.To.Paper.ipynb moves data to it

    In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.

    Reproducing the Analysis

    This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:

    Ubuntu 18.04.1 LTS
    PostgreSQL 10.6
    Conda 4.5.1
    Python 3.6.8
    PdfCrop 2012/11/02 v1.38

    First, download dump.tar.bz2 and extract it:

    tar -xjf dump.tar.bz2

    It extracts the file db2019-01-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

    psql jupyter < db2019-01-13.dump

    It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create a conda environment with Python 3.6:

    conda create -n py36 python=3.6

    Go to the analyses folder and install all the dependencies listed in requirements.txt:

    cd jupyter_reproducibility/analyses
    pip install -r requirements.txt

    To reproduce the analyses, run Jupyter in this folder:

    jupyter notebook

    Execute the notebooks in this order:

    • N0.Index.ipynb
    • N1.Repository.ipynb
    • N2.Notebook.ipynb
    • N3.Cell.ipynb
    • N4.Features.ipynb
    • N5.Modules.ipynb
    • N6.AST.ipynb
    • N7.Name.ipynb
    • N8.Execution.ipynb
    • N9.Cell.Execution.Order.ipynb
    • N10.Markdown.ipynb
    • N11.To.Paper.ipynb

    Reproducing or Expanding the Collection

    The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

    Requirements

    This time, we have extra requirements:

    All the analysis requirements
    lbzip2 2.5
    gcc 7.3.0
    Github account
    Gmail account

    Environment

    First, set the following environment variables:

    export JUP_MACHINE="db"; # machine identifier
    export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
    export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
    export JUP_COMPRESSION="lbzip2"; # compression program
    export JUP_VERBOSE="5"; # verbose level
    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
    export JUP_GITHUB_USERNAME="github_username"; # your github username
    export JUP_GITHUB_PASSWORD="github_password"; # your github password
    export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
    export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
    export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
    export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
    export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
    export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
    export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
    export JUP_WITH_EXECUTION="1"; # execute python notebooks
    export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
    export JUP_EXECUTION_MODE="-1"; # run following the execution order
    export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
    export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
    export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
    export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
    export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
    
    
    # Frequency of log reports
    export JUP_ASTROID_FREQUENCY="5";
    export JUP_IPYTHON_FREQUENCY="5";
    export JUP_NOTEBOOKS_FREQUENCY="5";
    export JUP_REQUIREMENT_FREQUENCY="5";
    export JUP_CRAWLER_FREQUENCY="1";
    export JUP_CLONE_FREQUENCY="1";
    export JUP_COMPRESS_FREQUENCY="5";
    
    export JUP_DB_IP="localhost"; # postgres database IP

    Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf

    Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories; the second one should unmount it. You can leave the scripts blank, but that is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.

    Scripts

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Install 5 conda environments and 5 anaconda environments, one pair for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):

    Conda 2.7

    conda create -n raw27 python=2.7 -y
    conda activate raw27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 2.7

    conda create -n py27 python=2.7 anaconda -y
    conda activate py27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    

    Conda 3.4

    It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.

    conda create -n raw34 python=3.4 -y
    conda activate raw34
    conda install jupyter -c conda-forge -y
    conda uninstall jupyter -y
    pip install --upgrade pip
    pip install jupyter
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    pip install pathlib2

    Anaconda 3.4

    conda create -n py34 python=3.4 anaconda -y
    conda activate py34
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.5

    conda create -n raw35 python=3.5 -y
    conda activate raw35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.5

    It requires the manual installation of other anaconda packages.

    conda create -n py35 python=3.5 anaconda -y
    conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
    conda activate py35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.6

    conda create -n raw36 python=3.6 -y
    conda activate raw36
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.6

    conda create -n py36 python=3.6 anaconda -y
    conda activate py36
    conda install -y anaconda-navigator jupyterlab_server navigator-updater
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.7

    conda create -n raw37 python=3.7 -y
    conda activate raw37
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.7

    When we

  14. Toward a Reproducible Research Data Repository

    • data.depositar.io
    mp4, pdf
    Updated Jan 26, 2024
    Cite
    depositar (2024). Toward a Reproducible Research Data Repository [Dataset]. https://data.depositar.io/dataset/reproducible-research-data-repository
    Explore at:
    pdf (212638), pdf (2586248), pdf (627064), mp4 (22141307) (available download formats)
    Dataset updated
    Jan 26, 2024
    Dataset provided by
    depositar
    Description

    Collected in this dataset are the slideset and abstract for a presentation on Toward a Reproducible Research Data Repository by the depositar team at International Symposium on Data Science 2023 (DSWS 2023), hosted by the Science Council of Japan in Tokyo on December 13-15, 2023. The conference was organized by the Joint Support-Center for Data Science Research (DS), Research Organization of Information and Systems (ROIS) and the Committee of International Collaborations on Data Science, Science Council of Japan. The conference programme is also included as a reference.

    Title

    Toward a Reproducible Research Data Repository

    Author(s)

    Cheng-Jen Lee, Chia-Hsun Ally Wang, Ming-Syuan Ho, and Tyng-Ruey Chuang

    Affiliation of presenter

    Institute of Information Science, Academia Sinica, Taiwan

    Summary of Abstract

    The depositar (https://data.depositar.io/) is a research data repository at Academia Sinica (Taiwan) open to researchers worldwide for the deposit, discovery, and reuse of datasets. The depositar software itself is open source and builds on top of CKAN. CKAN, an open source project initiated by the Open Knowledge Foundation and sustained by an active user community, is a leading data management system for building data hubs and portals. In addition to CKAN's out-of-the-box features such as JSON data API and in-browser preview of uploaded data, we have added several features to the depositar, including sourcing from Wikidata as dataset keywords, a citation snippet for datasets, in-browser Shapefile preview, and a persistent identifier system based on ARK (Archival Resource Keys). At the same time, the depositar team faces an increasing demand for interactive computing (e.g. Jupyter Notebook) which facilitates not just data analysis, but also the replication and demonstration of scientific studies. Recently, we have provided a JupyterHub service (a multi-tenancy JupyterLab) to some of the depositar's users. However, it still requires users to first download the data files (or copy the URLs of the files) from the depositar, then upload the data files (or paste the URLs) to the Jupyter notebooks for analysis. Furthermore, a JupyterHub deployed on a single server is limited by its processing power which may lower the service level to the users. To address the above issues, we are integrating the BinderHub into the depositar. BinderHub (https://binderhub.readthedocs.io/) is a Kubernetes-based service that allows users to create interactive computing environments from code repositories. Once the integration is complete, users will be able to launch Jupyter Notebooks to perform data analysis and visualization without leaving the depositar by clicking the BinderHub buttons on the datasets. In this presentation, we will first briefly introduce the depositar and BinderHub along with their relationship, then we will share our experiences in incorporating interactive computation in a data repository. We shall also evaluate the possibility of integrating the depositar with other automation frameworks (e.g. the Snakemake workflow management system) in order to enable users to reproduce data analysis.
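
    For illustration, the manual download/upload step described above can also be scripted against the standard CKAN Action API that the depositar builds on. The minimal sketch below is an assumption-level example, not part of the presentation; the dataset id is taken from the citation URL of this record, and any depositar dataset id would work.

    import requests

    # Ask the depositar's CKAN Action API for a dataset record and list the
    # download URLs of its resources, so files can be read directly in a notebook.
    api = "https://data.depositar.io/api/3/action/package_show"
    resp = requests.get(api, params={"id": "reproducible-research-data-repository"}, timeout=30)
    resp.raise_for_status()
    for resource in resp.json()["result"]["resources"]:
        print(resource.get("format"), resource["url"])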

    Keywords

    BinderHub, CKAN, Data Repositories, Interactive Computing, Reproducible Research

  15. Galaxy Training Material for the 'Use Jupyter notebooks in Galaxy' tutorial

    • zenodo.org
    csv
    Updated Apr 22, 2025
    Cite
    Delphine Lariviere; Delphine Lariviere; Teresa Müller; Teresa Müller (2025). Galaxy Training Material for the 'Use Jupyter notebooks in Galaxy' tutorial [Dataset]. http://doi.org/10.5281/zenodo.15263830
    Explore at:
    csvAvailable download formats
    Dataset updated
    Apr 22, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Delphine Lariviere; Delphine Lariviere; Teresa Müller; Teresa Müller
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was originally curated by Software Carpentry, a branch of The Carpentries non-profit organization, and is based on data from the Gapminder Foundation. It consists of six tabular CSV files containing GDP data for various countries across different years. The dataset was initially prepared for the Software Carpentry tutorial "Plotting and Programming in Python" and is also reused in the Galaxy Training Network (GTN) tutorial "Use Jupyter Notebooks in Galaxy."

    This GTN tutorial provides an introduction to launching a Jupyter Notebook in Galaxy, installing dependencies, and importing and exporting data. It serves as a setup guide for a Jupyter Notebook environment that can be used to follow the Software Carpentry tutorial "Plotting and Programming in Python."
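
    As a small illustration of the notebook workflow the tutorial sets up, one of the GDP tables can be loaded and plotted with pandas. The file name and the 'country' index column in this sketch are assumptions based on the Software Carpentry lesson; adjust them to the CSVs actually shipped in this dataset.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Load one of the six GDP tables (illustrative file name) and plot
    # GDP per capita over the years, one line per country.
    gdp = pd.read_csv("gapminder_gdp_oceania.csv", index_col="country")
    gdp.T.plot()
    plt.ylabel("GDP per capita")
    plt.show()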

  16. Data from: Long-Term Tracing of Indoor Solar Harvesting

    • zenodo.org
    • data.niaid.nih.gov
    bin, pdf, tar
    Updated Jul 22, 2024
    Cite
    Lukas Sigrist; Lukas Sigrist; Andres Gomez; Andres Gomez; Lothar Thiele; Lothar Thiele (2024). Long-Term Tracing of Indoor Solar Harvesting [Dataset]. http://doi.org/10.5281/zenodo.3346976
    Explore at:
    pdf, tar, binAvailable download formats
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Lukas Sigrist; Lukas Sigrist; Andres Gomez; Andres Gomez; Lothar Thiele; Lothar Thiele
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Information

    This dataset presents long-term indoor solar harvesting traces, jointly monitored with the ambient conditions. The data is recorded at 6 indoor positions with diverse characteristics at our institute at ETH Zurich in Zurich, Switzerland.

    The data is collected with a measurement platform [3] consisting of a solar panel (AM-5412) connected to a bq25505 energy harvesting chip that stores the harvested energy in a virtual battery circuit. Two TSL45315 light sensors placed on opposite sides of the solar panel monitor the illuminance level and a BME280 sensor logs ambient conditions like temperature, humidity and air pressure.

    The dataset contains the measurements of the energy flow at the input and the output of the bq25505 harvesting circuit, as well as the illuminance, temperature, humidity and air pressure measurements of the ambient sensors. The following timestamped data columns are available in the raw measurement format, as well as in the preprocessed and filtered HDF5 datasets:

    • V_in - Converter input/solar panel output voltage, in volt
    • I_in - Converter input/solar panel output current, in ampere
    • V_bat - Battery voltage (emulated through circuit), in volt
    • I_bat - Net Battery current, in/out flowing current, in ampere
    • Ev_left - Illuminance left of solar panel, in lux
    • Ev_right - Illuminance right of solar panel, in lux
    • P_amb - Ambient air pressure, in pascal
    • RH_amb - Ambient relative humidity, unit-less between 0 and 1
    • T_amb - Ambient temperature, in degrees Celsius
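
    A minimal sketch of working with one of the processed HDF5 files, in the spirit of processing_python.ipynb, is shown below. The file name is a placeholder (see the file lists described further down for the real names), and the sketch assumes each file stores a single pandas object; pass key= to read_hdf otherwise.

    import pandas as pd

    # Load one processed measurement file (placeholder name) and compute
    # column-wise statistics plus the mean harvested power at the converter input.
    df = pd.read_hdf("processed/pos01_power.h5")
    print(df.describe())
    power_in = df["V_in"] * df["I_in"]   # watt
    print("mean harvested power [W]:", power_in.mean())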

    The following publication presents an overview of the dataset and more details on the deployment used for data collection. A copy of the abstract is included in this dataset, see the file abstract.pdf.

    L. Sigrist, A. Gomez, and L. Thiele. Dataset: Tracing Indoor Solar Harvesting. In Proceedings of the 2nd Workshop on Data Acquisition To Analysis (DATA '19), 2019. [under submission]

    Folder Structure and Files

    • processed/ - This folder holds the imported, merged and filtered datasets of the power and sensor measurements. The datasets are stored in HDF5 format and split by measurement position (posXX) and by power and ambient sensor measurements. The files belonging to this folder are contained in archives named yyyy_mm_processed.tar, where yyyy and mm represent the year and month the data was published. A separate file lists the exact content of each archive (see below).
    • raw/ - This folder holds the raw measurement files recorded with the RocketLogger [1, 2] and using the measurement platform available at [3]. The files belonging to this folder are contained in archives named yyyy_mm_raw.tar, where yyyy and mm represent the year and month the data was published. A separate file lists the exact content of each archive (see below).
    • LICENSE - License information for the dataset.
    • README.md - The README file containing this information.
    • abstract.pdf - A copy of the above mentioned abstract submitted to the DATA '19 Workshop, introducing this dataset and the deployment used to collect it.
    • raw_import.ipynb [open in nbviewer] - Jupyter Python notebook to import, merge, and filter the raw dataset from the raw/ folder. This is the exact code used to generate the processed dataset and store it in the HDF5 format in the processed/ folder.
    • raw_preview.ipynb [open in nbviewer] - This Jupyter Python notebook imports the raw dataset directly and plots a preview of the full power trace for all measurement positions.
    • processing_python.ipynb [open in nbviewer] - Jupyter Python notebook demonstrating the import and use of the processed dataset in Python. Calculates column-wise statistics, includes more detailed power plots and the simple energy predictor performance comparison included in the abstract.
    • processing_r.ipynb [open in nbviewer] - Jupyter R notebook demonstrating the import and use of the processed dataset in R. Calculates column-wise statistics and extracts and plots the energy harvesting conversion efficiency included in the abstract. Furthermore, the harvested power is analyzed as a function of the ambient light level.

    Dataset File Lists

    Processed Dataset Files

    The list of the processed datasets included in the yyyy_mm_processed.tar archive is provided in yyyy_mm_processed.files.md. The markdown formatted table lists the name of all files, their size in bytes, as well as the SHA-256 sums.

    Raw Dataset Files

    A list of the raw measurement files included in the yyyy_mm_raw.tar archive(s) is provided in yyyy_mm_raw.files.md. The markdown formatted table lists the name of all files, their size in bytes, as well as the SHA-256 sums.

    Dataset Revisions

    v1.0 (2019-08-03)

    Initial release.
    Includes the data collected from 2017-07-27 to 2019-08-01. The dataset archive files related to this revision are 2019_08_raw.tar and 2019_08_processed.tar.
    For position pos06, the measurements from 2018-01-06 00:00:00 to 2018-01-10 00:00:00 are filtered (data inconsistency in file indoor1_p27.rld).

    Dataset Authors, Copyright and License

    References

    [1] L. Sigrist, A. Gomez, R. Lim, S. Lippuner, M. Leubin, and L. Thiele. Measurement and validation of energy harvesting IoT devices. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.

    [2] ETH Zurich, Computer Engineering Group. RocketLogger Project Website, https://rocketlogger.ethz.ch/.

    [3] L. Sigrist. Solar Harvesting and Ambient Tracing Platform, 2019. https://gitlab.ethz.ch/tec/public/employees/sigristl/harvesting_tracing

  17. PV and Wind power dataset for Europe

    • wdc-climate.de
    Updated Nov 29, 2023
    Cite
    Ho, Linh; Fiedler, Stephanie; Wahl, Sebastian (2023). PV and Wind power dataset for Europe [Dataset]. https://www.wdc-climate.de/ui/entry?acronym=DKRZ_LTA_1198_ds00003
    Explore at:
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    DKRZ
    World Data Center for Climate (WDCC) at DKRZ
    Authors
    Ho, Linh; Fiedler, Stephanie; Wahl, Sebastian
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1995 - Dec 31, 2017
    Description

    (1) Output of the Renewable Energy Model (REM) as described in Insights into weather-driven extremes in Europe’s resources for renewable energy (Ho and Fiedler, 2024), last modified on 30.10.2023 by Linh Ho, named year_PV_wind_generation_v2.nc, covering the 23 years from 1995 to 2017. REM includes one simulation of photovoltaic (PV) power production and one simulation of wind power production across a European domain, with a horizontal resolution of 48 km and hourly output for the period 1995--2017.

    The output has a European domain with the same size as in the reanalysis dataset COSMO-REA6. This is a rotated grid with the coordinates of the rotated North Pole −162.0, 39.25, and of the lower left corner −23.375, −28.375. See Bollmeyer et al. (2014, http://doi.org/10.1002/qj.2486). Data downloaded from https://opendata.dwd.de/climate_environment/REA/COSMO_REA6/

    (2) Daily weather pattern classification for Europe from 1995 to April 2020, named EGWL_LegacyGWL.txt, from James (2007, http://doi.org/10.1007/s00704-006-0239-3)

    (3) The installation data of PV and wind power in Europe for one scenario in 2050 from the CLIMIX model, processed to have the same horizontal resolution as in REM, named installed_capacity_PV_wind_power_from_CLIMIX_final.nc. Original data were provided at 0.11 degree resolution, acquired from personal communication with the author from Jerez et al. (2015, http://doi.org/10.1016/j.rser.2014.09.041)

    (4) Python scripts of REM, including:
    • model_PV_wind_complete_v2.py: the main script to produce the REM output
    • model_PV_wind_potential_v2.py: produces the potential (capacity factor) of PV and wind power for model evaluations, e.g. against CDS and Renewables Ninja data, as described in Ho and Fiedler (2024)
    • model_PV_wind_complete_v1_ONLYyear2000.py: a separate Python script to produce the REM output only for the year 2000. Note that the data for 2000 from COSMO-REA6 were read with a different approach (using cfgrib), probably due to the time stamp changes at the beginning of the millennium, which also explains the larger size of the final output
    • utils_LH_archive_Oct2022.py: contains the Python functions needed to run the other scripts

    (5) Jupyter notebook files to reproduce the figures in Ho and Fiedler (2024), named Paper1_Fig*_**.ipynb

    (6) Time series of European-aggregated PV and wind power production, hourly for the period 1995--2017, processed from dataset (1) to facilitate the reproduction of the figures, for the two installation scenarios scale-2019 and scenario-2050:
    • Timeseries_all_hourly_1995_2017_GW_scale2019.csv
    • Timeseries_all_hourly_1995_2017_GW_scen2050.csv
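
    For illustration, the aggregated time series in (6) can be inspected directly with pandas. The column names in this sketch are assumptions; check the CSV header for the exact labels.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Read the hourly European-aggregated production (scale-2019 installation)
    # and plot one winter week. The first column is assumed to hold timestamps.
    ts = pd.read_csv("Timeseries_all_hourly_1995_2017_GW_scale2019.csv",
                     index_col=0, parse_dates=True)
    ax = ts.loc["2012-02-01":"2012-02-07"].plot()
    ax.set_ylabel("Production [GW]")
    plt.show()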

  18. Dataset for 'Unveiling the charge distribution of a GaAs-based...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 16, 2024
    Cite
    Birner, Stefan (2024). Dataset for 'Unveiling the charge distribution of a GaAs-based nanoelectronic device' [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6498343
    Explore at:
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Bäuerle, Christopher
    Trellakis, Alex
    Mazzella, Lucas
    Lacerda-Santos, Antonio
    Wang, Junliang
    Grange, Thomas
    Waintal, Xavier
    Cecilia da Silva Figueira, Maria
    Chatzikyriakou, Eleni
    Birner, Stefan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the raw experimental data associated with the manuscript "Unveiling the charge distribution of a GaAs-based nanoelectronic device: A large experimental data-set approach" by Eleni Chatzikyriakou, Junliang Wang et al. See arXiv:2205.00846 for more details.

    Content of the different data files

    The data are stored in 5 different files in the csv format.

    • data_1D_4K.csv: current versus gate voltage data for all the samples at 4K. The same gate voltage is applied to the top and bottom gates.

    For samples X1Y3, X2Y3, X5Y3 and X6Y3, some additional measurements were performed:

    • data_1D_mk.csv : top and bottom gates of each QPC have been swept at the same time at 50 mK temperature.

    • data_2D_4K_TB.csv : top gate has been swept for different values of the bottom gate at 4K.

    • data_2D_mK_TB.csv : top gate has been swept for different values of the bottom gate at 50 mK temperature.

    • data_2D_mK_BT.csv : bottom gate has been swept for different values of the top gate at 50 mK temperature.

    Format of the csv files

    --- All the data files are in the following format.

    • A given curve "current versus gate voltage" is stored in two consecutive rows. The first contains the values of the measured current, the second the values of the applied gate voltages.

    • A 2D measurement "current versus top gate and bottom gate" is stored in three consecutive rows. The first contains the values of the measured current, the second the list of voltages applied to one of the gates, and the third the list of voltages applied to the other gate.

    --- Each row of a csv file has the following format:

    • The first column identifies the quantum point contact and the quantity. The format is Xx_Yy_QpcNb_Meas.

      • Xx is the column on the chip (from X1 to X6).
      • Yy is the row on the chip (Y1, Y2 or Y3).
      • QpcNb is the number of the qpc (from 1 to 8 or from 9 to 16).
      • Meas is the reported quantity, either the (measured) "current" or the (applied) "voltage" or the "TopVoltage" or "BotVoltage" for 2D scans.
    • The second column is the unit (A or V).

    • The third column is the number of sweeps (1, 2 or 3) performed. For some measurements, the same sweep has been done multiple times.

    • The fourth column is the design of the quantum point contact (A, B, C, D or E).

    • All the following columns contain the measured values. For 2D scans, the different values of the gate corresponding to the third row are placed one after the other.

    Python scripts for data analysis

    For convenience, we provide example Python scripts that can be used to load and plot the data.

    • extract.py: extracts the data into a dictionary and plots the I-V characteristics
    • extract.ipynb: Jupyter notebook using the different functions of extract.py

    type -h for help
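
    As a complement to extract.py, the following minimal sketch reads the paired current/voltage rows of data_1D_4K.csv as documented above and plots the first I-V curve. It assumes the file has no header row; adjust the row offset if it does.

    import csv
    import matplotlib.pyplot as plt

    # Rows come in consecutive pairs: measured current first, applied voltage second.
    # Columns 0-3 hold the identifier, unit, sweep number and design; values follow.
    with open("data_1D_4K.csv", newline="") as f:
        rows = list(csv.reader(f))

    current_row, voltage_row = rows[0], rows[1]
    current = [float(x) for x in current_row[4:] if x]
    voltage = [float(x) for x in voltage_row[4:] if x]

    plt.plot(voltage, current)
    plt.xlabel("Gate voltage (V)")
    plt.ylabel("Current (A)")
    plt.title(current_row[0])   # e.g. X1_Y1_1_current
    plt.show()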

  19. Australian Employee Salary/Wages DATAbase by detailed occupation, location...

    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Richard Ferrers; Australian Taxation Office (2023). Australian Employee Salary/Wages DATAbase by detailed occupation, location and year (2002-14); (plus Sole Traders) [Dataset]. http://doi.org/10.6084/m9.figshare.4522895.v5
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Richard Ferrers; Australian Taxation Office
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ATO (Australian Tax Office) made a dataset openly available (see links) showing all the Australian Salary and Wages (2002, 2006, 2010, 2014) by detailed occupation (around 1,000) and over 100 SA4 regions. Sole Trader sales and earnings are also provided. This open data (csv) is now packaged into a database (*.sql) with 45 sample SQL queries (backupSQL[date]_public.txt). See more description at the related Figshare #datavis record.

    Versions: V5: Following a #datascience course, I have made the main data (individual salary and wages) available as csv and Jupyter Notebook. Checksum matches #dataTotals. In 209,xxx rows. Also provided Jobs and SA4 (Locations) description files as csv. More details at: Where are jobs growing/shrinking? Figshare DOI: 4056282 (linked below). Noted 1% discrepancy ($6B) in 2010 wages total - to follow up.

    #dataTotals - Salary and Wages
    Year   Workers (M)   Earnings ($B)
    2002   8.5           285
    2006   9.4           372
    2010   10.2          481
    2014   10.3          584

    #dataTotal - Sole Traders
    Year   Workers (M)   Sales ($B)   Earnings ($B)
    2002   0.9           61           13
    2006   1.0           88           19
    2010   1.1           112          26
    2014   1.1           96           30

    #links See the ATO request for data at the ideascale link below. See the original csv open data set (CC-BY) at the data.gov.au link below. This database was used to create maps of change in regional employment - see the Figshare link below (m9.figshare.4056282).

    #package This file package contains a database (analysing the open data) as an SQL dump plus sample SQL text interrogating the DB. DB name: test. There are 20 queries relating to Salary and Wages.

    #analysis The database was analysed and outputs provided on Nectar(.org.au) resources at: http://118.138.240.130 (offline). This was only resourced for a maximum of 1 year, from July 2016, so expired in June 2017; hence the filing here. The sample home page is provided here (and pdf), but not all the supporting files, which may be packaged and added later. Until then all files are available at the Nectar URL. Nectar URL now offline - server files attached as package (html_backup[date].zip), including php scripts, html, csv, jpegs.

    #install IMPORT: DB SQL dump e.g. test_2016-12-20.sql (14.8Mb)
    1. Start MAMP on OSX.
    1.1 Go to PhpMyAdmin.
    2. New Database.
    3. Import: Choose file: test_2016-12-20.sql -> Go (about 15-20 seconds on MacBook Pro 16Gb, 2.3 GHz i5).
    4. Four tables appear: jobTitles 3,208 rows | salaryWages 209,697 rows | soleTrader 97,209 rows | stateNames 9 rows; plus views e.g. deltahair, Industrycodes, states.
    5. Run the test query under #sampleSQL; Sum of Salary by SA4 e.g. 101 $4.7B, 102 $6.9B.

    #sampleSQL
    select sa4,
      (select sum(count) from salaryWages where year = '2014' and sa4 = sw.sa4) as thisYr14,
      (select sum(count) from salaryWages where year = '2010' and sa4 = sw.sa4) as thisYr10,
      (select sum(count) from salaryWages where year = '2006' and sa4 = sw.sa4) as thisYr06,
      (select sum(count) from salaryWages where year = '2002' and sa4 = sw.sa4) as thisYr02
    from salaryWages sw
    group by sa4
    order by sa4

  20. Multi-task Deep Learning for Water Temperature and Streamflow Prediction...

    • catalog.data.gov
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Multi-task Deep Learning for Water Temperature and Streamflow Prediction (ver. 1.1, June 2022) [Dataset]. https://catalog.data.gov/dataset/multi-task-deep-learning-for-water-temperature-and-streamflow-prediction-ver-1-1-june-2022
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Description

    This item contains data and code used in experiments that produced the results for Sadler et al. (2022) (see below for full reference). We ran five experiments for the analysis, Experiment A, Experiment B, Experiment C, Experiment D, and Experiment AuxIn. Experiment A tested multi-task learning for predicting streamflow with 25 years of training data and using a different model for each of 101 sites. Experiment B tested multi-task learning for predicting streamflow with 25 years of training data and using a single model for all 101 sites. Experiment C tested multi-task learning for predicting streamflow with just 2 years of training data. Experiment D tested multi-task learning for predicting water temperature with over 25 years of training data. Experiment AuxIn used water temperature as an input variable for predicting streamflow. These experiments and their results are described in detail in the WRR paper. Data from a total of 101 sites across the US were used for the experiments. The model input data and streamflow data were from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset (Newman et al. 2014, Addor et al. 2017). The water temperature data were gathered from the National Water Information System (NWIS) (U.S. Geological Survey, 2016). The contents of this item are broken into 13 files or groups of files aggregated into zip files:

    1. input_data_processing.zip: A zip file containing the scripts used to collate the observations, input weather drivers, and catchment attributes for the multi-task modeling experiments
    2. flow_observations.zip: A zip file containing collated daily streamflow data for the sites used in multi-task modeling experiments. The streamflow data were originally accessed from the CAMELS dataset. The data are stored in csv and Zarr formats (see the loading sketch after this list).
    3. temperature_observations.zip: A zip file containing collated daily water temperature data for the sites used in multi-task modeling experiments. The data were originally accessed via NWIS. The data are stored in csv and Zarr formats.
    4. temperature_sites.geojson: Geojson file of the locations of the water temperature and streamflow sites used in the analysis.
    5. model_drivers.zip: A zip file containing the daily input weather driver data for the multi-task deep learning models. These data are from the Daymet drivers and were collated from the CAMELS dataset. The data are stored in csv and Zarr formats.
    6. catchment_attrs.csv: Catchment attributes collated from the CAMELS dataset. These data are used for the Random Forest modeling. For full metadata regarding these data, see the CAMELS dataset.
    7. experiment_workflow_files.zip: A zip file containing workflow definitions used to run multi-task deep learning experiments. These are Snakemake workflows. To run a given experiment, one would run (for experiment A) 'snakemake -s expA_Snakefile --configfile expA_config.yml'
    8. river-dl-paper_v0.zip: A zip file containing python code used to run multi-task deep learning experiments. This code was called by the Snakemake workflows contained in 'experiment_workflow_files.zip'.
    9. random_forest_scripts.zip: A zip file containing Python code and a Python Jupyter Notebook used to prepare data for, train, and visualize feature importance of a Random Forest model.
    10. plotting_code.zip: A zip file containing python code and Snakemake workflow used to produce figures showing the results of multi-task deep learning experiments.
    11. results.zip: A zip file containing results of multi-task deep learning experiments. The results are stored in csv and netcdf formats. The netcdf files were used by the plotting libraries in 'plotting_code.zip'. These files are for five experiments, 'A', 'B', 'C', 'D', and 'AuxIn'. These experiment names are shown in the file name.
    12. sample_scripts.zip: A zip file containing scripts for creating sample output to demonstrate how the modeling workflow was executed.
    13. sample_output.zip: A zip file containing sample output data. Similar files are created by running the sample scripts provided.
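
    As referenced in item 2, the following is a minimal sketch of opening the collated streamflow observations after unzipping flow_observations.zip. The Zarr store name is a placeholder; the archive also provides the same data in csv format.

    import xarray as xr

    # Open the Zarr-formatted streamflow observations (placeholder path) and
    # print a summary plus basic statistics.
    flow = xr.open_zarr("flow_observations/streamflow.zarr")
    print(flow)
    print(flow.to_dataframe().describe())
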
    A. Newman; K. Sampson; M. P. Clark; A. Bock; R. J. Viger; D. Blodgett, 2014. A large-sample watershed-scale hydrometeorological dataset for the contiguous USA. Boulder, CO: UCAR/NCAR. https://dx.doi.org/10.5065/D6MW2F4D

    N. Addor, A. Newman, M. Mizukami, and M. P. Clark, 2017. Catchment attributes for large-sample studies. Boulder, CO: UCAR/NCAR. https://doi.org/10.5065/D6G73C3Q

    Sadler, J. M., Appling, A. P., Read, J. S., Oliver, S. K., Jia, X., Zwart, J. A., & Kumar, V. (2022). Multi-Task Deep Learning of Daily Streamflow and Water Temperature. Water Resources Research, 58(4), e2021WR030138. https://doi.org/10.1029/2021WR030138

    U.S. Geological Survey, 2016, National Water Information System data available on the World Wide Web (USGS Water Data for the Nation), accessed Dec. 2020.
