Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset compares fixed-line broadband internet speeds across five cities:
- Melbourne, AU
- Bangkok, TH
- Shanghai, CN
- Los Angeles, US
- Alice Springs, AU
ERRATA: 1. Data is for Q3 2020, but some files were incorrectly labelled as Q2-20 or June 20. They should all read Sept 20 (09-20), i.e. Q3 20, rather than Q2. Files will be renamed and reloaded. Amended in v7.
* Lines of data for each geojson file; a line equates to a 600m^2 location, inc. total tests, devices used, and average upload and download speed:
- MEL: 16,181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
- SHG: 31,745 lines => 0.65M speedtests (2.5/100pp)
- BKK: 29,296 lines => 1.5M speedtests (14.3/100pp)
- LAX: 15,899 lines => 1.3M speedtests (10.4/100pp)
- ALC: 76 lines => 500 speedtests (2/100pp)
GeoJSONs of these 2° by 2° extracts for MEL, BKK, and SHG are now added; LAX was added in v6 and Alice Springs in v15.
This dataset unpacks, geospatially, data summaries provided in the Speedtest Global Index (linked below). See the Jupyter Notebook (*.ipynb) to interrogate the geo data. See the link below to install Jupyter.
** To Do: Will add Google Map versions so everyone can see the data without installing Jupyter.
- Link to Google Map (BKK) added below. Key: green > 100 Mbps (Superfast); black > 500 Mbps (Ultrafast). CSV provided. Code in the Speedtestv1.1.ipynb Jupyter Notebook.
- The community (Whirlpool) was surprised [Link: https://whrl.pl/RgAPTl] that Melbourne has 20% of locations at or above 100 Mbps. Suggestion: plot the top 20% on a map for the community. Google Map link now added (and tweet).
** Python

melb = au_tiles.cx[144:146, -39:-37]  # Lat/Lon extract
shg = tiles.cx[120:122, 30:32]        # Lat/Lon extract
bkk = tiles.cx[100:102, 13:15]        # Lat/Lon extract
lax = tiles.cx[-118:-120, 33:35]      # Lat/Lon extract
ALC = tiles.cx[132:134, -22:-24]      # Lat/Lon extract
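For context, a minimal sketch of how such an extract could be taken with geopandas; the input GeoJSON file name below is hypothetical, and the dataset's own files are named per city:

import geopandas as gpd

# Hypothetical tiles file; .cx slices a GeoDataFrame by bounding box [xmin:xmax, ymin:ymax].
tiles = gpd.read_file("speedtest_fixed_tiles.geojson")
melb = tiles.cx[144:146, -39:-37]
print(len(melb), "tiles in the Melbourne extract")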
Histograms (v9) and data visualisations (v3, 5, 9, 11) will be provided. Data sourced from: this is an extract of Speedtest Open Data, available at Amazon Web Services (link below - opendata.aws).
** VERSIONS
v24: Add tweet and Google Map of Top 20% (over 100 Mbps locations) in MEL Q3 22. Add v1.5 MEL-Superfast notebook and CSV of results (now on Google Map; link below).
v23: Add graph of 2022 broadband distribution, and compare 2020 - 2022. Updated v1.4 Jupyter notebook.
v22: Add import ipynb; workflow-import-4cities.
v21: Add Q3 2022 data; five cities inc. ALC. GeoJSON files. (2020: 4.3M tests; 2022: 2.9M tests.)
v20: Speedtest - Five Cities inc. ALC.
v19: Add ALC2.ipynb.
v18: Add ALC line graph.
v17: Added ipynb for ALC. Added ALC to title.
v16: Load Alice Springs data Q2 21 - csv. Added Google Map link of ALC.
v15: Load Melb Q1 2021 data - csv.
v14: Added Melb Q1 2021 data - geojson.
v13: Added Twitter link to pics.
v12: Add line-compare pic (fastest 1000 locations) inc. Jupyter (nbn-intl-v1.2.ipynb).
v11: Add line-compare pic, plotting four cities on a graph.
v10: Add four histograms in one pic.
v9: Add histogram for four cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook).
v8: Renamed LAX file to Q3, rather than 03.
v7: Amended file names of BKK files to correctly label as Q3, not Q2 or 06.
v6: Added LAX file.
v5: Add screenshot of BKK Google Map.
v4: Add BKK Google Map (link below) and BKK csv mapping files.
v3: Replaced MEL map with big-key version. Previous key was very tiny in the top right corner.
v2: Uploaded MEL, SHG, BKK data and Jupyter Notebook.
v1: Metadata record.
** LICENCE: The AWS data licence on the Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be non-commercial (NC) and reuse must be share-alike (SA) (i.e. add the same licence). This restricts the standard CC BY Figshare licence.
** Other uses of Speedtest Open Data: see the link at Speedtest below.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
redspot replay
from redspot import database
from redspot.notebook import Notebook
nbk = Notebook()
for signal in database.get("path-to-db"):
    time, panel, kind, args = signal
    nbk.apply(kind, args)  # apply change
print(nbk) # print notebook
redspot record
docker run --rm -it -p8888:8888
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.
This repository contains two files:
The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.
The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:
In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.
Reproducing the Analysis
This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:
Ubuntu 18.04.1 LTS
PostgreSQL 10.6
Conda 4.5.11
Python 3.7.2
PdfCrop 2012/11/02 v1.38
First, download dump.tar.bz2 and extract it:
tar -xjf dump.tar.bz2
It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:
psql jupyter < db2019-03-13.dump
It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
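As a quick sanity check (not part of the original instructions), the connection string can be verified from Python before running the notebooks:

import os
from sqlalchemy import create_engine

# Reads the JUP_DB_CONNECTION variable exported above and opens a test connection.
engine = create_engine(os.environ["JUP_DB_CONNECTION"])
with engine.connect() as connection:
    print("Connected to database:", engine.url.database)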
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Create a conda environment with Python 3.7:
conda create -n analyses python=3.7
conda activate analyses
Go to the analyses folder and install all the dependencies listed in requirements.txt:
cd jupyter_reproducibility/analyses
pip install -r requirements.txt
For reproducing the analyses, run jupyter on this folder:
jupyter notebook
Execute the notebooks in this order:
Reproducing or Expanding the Collection
The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.
Requirements
This time, we have extra requirements:
All the analysis requirements
lbzip2 2.5
gcc 7.3.0
Github account
Gmail account
Environment
First, set the following environment variables:
export JUP_MACHINE="db"; # machine identifier
export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
export JUP_COMPRESSION="lbzip2"; # compression program
export JUP_VERBOSE="5"; # verbose level
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
export JUP_GITHUB_USERNAME="github_username"; # your github username
export JUP_GITHUB_PASSWORD="github_password"; # your github password
export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
export JUP_OAUTH_FILE="~/oauth2_creds.json"; # oauth2 authentication file
export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
export JUP_WITH_EXECUTION="1"; # execute python notebooks
export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
export JUP_EXECUTION_MODE="-1"; # run following the execution order
export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
# Frequency of log reports
export JUP_ASTROID_FREQUENCY="5";
export JUP_IPYTHON_FREQUENCY="5";
export JUP_NOTEBOOKS_FREQUENCY="5";
export JUP_REQUIREMENT_FREQUENCY="5";
export JUP_CRAWLER_FREQUENCY="1";
export JUP_CLONE_FREQUENCY="1";
export JUP_COMPRESS_FREQUENCY="5";
export JUP_DB_IP="localhost"; # postgres database IP
Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf
Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories. The second one should umount it. You can leave the scripts blank, but it is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.
Scripts
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Install 5 conda environments and 5 anaconda environments, one of each per Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):
Conda 2.7
conda create -n raw27 python=2.7 -y
conda activate raw27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 2.7
conda create -n py27 python=2.7 anaconda -y
conda activate py27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.4
It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.
conda create -n raw34 python=3.4 -y
conda activate raw34
conda install jupyter -c conda-forge -y
conda uninstall jupyter -y
pip install --upgrade pip
pip install jupyter
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
pip install pathlib2
Anaconda 3.4
conda create -n py34 python=3.4 anaconda -y
conda activate py34
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.5
conda create -n raw35 python=3.5 -y
conda activate raw35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.5
It requires the manual installation of other anaconda packages.
conda create -n py35 python=3.5 anaconda -y
conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
conda activate py35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.6
conda create -n raw36 python=3.6 -y
conda activate raw36
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.6
conda create -n py36 python=3.6 anaconda -y
conda activate py36
conda install -y anaconda-navigator jupyterlab_server navigator-updater
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.7
conda create -n raw37 python=3.7 -y
conda activate raw37
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Objective: Daily COVID-19 data reported by the World Health Organization (WHO) may provide the basis for political ad hoc decisions including travel restrictions. Data reported by countries, however, is heterogeneous and metrics to evaluate its quality are scarce. In this work, we analyzed COVID-19 case counts provided by WHO and developed tools to evaluate country-specific reporting behaviors.
Methods: In this retrospective cross-sectional study, COVID-19 data reported daily to WHO from 3rd January 2020 until 14th June 2021 were analyzed. We proposed the concepts of binary reporting rate and relative reporting behavior and performed descriptive analyses for all countries with these metrics. We developed a score to evaluate the consistency of incidence and binary reporting rates. Further, we performed spectral clustering of the binary reporting rate and relative reporting behavior to identify salient patterns in these metrics.
Results: Our final analysis included 222 countries and regions (...).
Data collection: COVID-19 data was downloaded from WHO. Using a public repository, we have added the countries' full names to the WHO data set, using the two-letter abbreviations for each country to merge both data sets. The provided COVID-19 data covers January 2020 until June 2021. We uploaded the final data set used for the analyses of this paper.
Data processing: We processed data using a Jupyter Notebook with a Python kernel and publicly available external libraries. This upload contains the required Jupyter Notebook (reporting_behavior.ipynb) with all analyses and some additional work, a README, and the conda environment yml (env.yml).
Any text editor, including Microsoft Excel and its free alternatives, can open the uploaded CSV file. Any web browser and some code editors (like the freely available Visual Studio Code) can show the uploaded Jupyter Notebook if the required Python environment is set up correctly.
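For illustration only, one plausible reading of the binary reporting rate (the share of days on which a country reported new cases) can be computed with pandas; the file and column names below are assumptions, not the paper's actual code:

import pandas as pd

# Hypothetical file and column names for the WHO daily case counts.
df = pd.read_csv("who_covid19_daily.csv", parse_dates=["Date_reported"])
reported = df["New_cases"].notna() & (df["New_cases"] != 0)
binary_rate = reported.groupby(df["Country"]).mean()  # fraction of days with a report, per country
print(binary_rate.sort_values().head())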
We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we utilized the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio-based xarray extension. Rasterio is a Python library to read and write GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save GeoTIFF as NetCDF. These procedures resulted in the creation of three HydroShare resources (HS 3, HS 4 and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, a commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.
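A minimal sketch of the GeoTIFF-to-NetCDF conversion step described above, using rioxarray; the file names are placeholders, not the actual workflow files:

import rioxarray

# Open a GeoTIFF as an xarray.DataArray (rioxarray wraps rasterio for this).
les = rioxarray.open_rasterio("state_les.tif")
les.attrs["title"] = "State-scale LES dataset"  # metadata can be added through xarray
les.to_netcdf("state_les.nc")                   # write the array out as NetCDF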
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource includes two Jupyter Notebooks as a quick start tutorial for the ERA5 Data Component of the PyMT modeling framework (https://pymt.readthedocs.io/) developed by Community Surface Dynamics Modeling System (CSDMS https://csdms.colorado.edu/).
The bmi_era5 package is an implementation of the Basic Model Interface (BMI https://bmi.readthedocs.io/en/latest/) for the ERA5 dataset (https://confluence.ecmwf.int/display/CKB/ERA5). This package uses the cdsapi (https://cds.climate.copernicus.eu/api-how-to) to download the ERA5 dataset and wraps the dataset with BMI for data control and query (it currently supports 3-dimensional ERA5 datasets). This package is not intended to be used directly by end users; rather, it is the key element that converts the ERA5 dataset into a data component for the PyMT modeling framework.
The pymt_era5 package is implemented for people to use as a reusable, plug-and-play ERA5 data component for the PyMT modeling framework. This package uses the BMI implementation from the bmi_era5 package and allows the ERA5 datasets to be easily coupled with other datasets or models that expose a BMI.
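As a rough, unverified sketch of how a BMI-wrapped data component is typically driven (the class name BmiEra5 and the configuration file name are assumptions; see the bmi_era5 and pymt_era5 documentation for the actual usage):

import numpy as np
from bmi_era5 import BmiEra5  # assumed class name

data_comp = BmiEra5()
data_comp.initialize("config_file.yaml")             # standard BMI entry point
var_name = data_comp.get_output_var_names()[0]       # pick one exposed ERA5 variable
grid_id = data_comp.get_var_grid(var_name)
values = np.empty(data_comp.get_grid_size(grid_id))  # buffer for get_value
data_comp.get_value(var_name, values)                # copy the current field into the buffer
data_comp.update()                                   # advance one time step
data_comp.finalize()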
HydroShare users can test and run the Jupyter Notebooks (bmi_era5.ipynb, pymt_era5.ipynb) directly through the "CUAHSI JupyterHub" web app with the following steps:
- New users of the CUAHSI JupyterHub should first make a request to join the "CUAHSI Cloud Computing Group" (https://www.hydroshare.org/group/156). After approval, the user will gain access to launch the CUAHSI JupyterHub.
- Click on the "Open with" button (in the top right corner of the page).
- Select "CUAHSI JupyterHub".
- Select the "CSDMS Workbench" server option. (Make sure to select the right server option; otherwise, the notebook won't run correctly.)
If there is any question or suggestion about the ERA5 data component, please create a github issue at https://github.com/gantian127/bmi_era5/issues
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The purpose of this code is to produce a line graph visualization of COVID-19 data. This Jupyter notebook was built and run on Google Colab. This code will serve mostly as a guide and will need to be adapted where necessary to be run locally. The separate COVID-19 datasets uploaded to this Dataverse can be used with this code. This upload is made up of the IPYNB and PDF files of the code.
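A minimal stand-in for the kind of line graph the notebook produces; the column names here are hypothetical and should be adapted to the actual COVID-19 CSV used:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical column names for a date-indexed case count file.
df = pd.read_csv("covid19_cases.csv", parse_dates=["date"]).sort_values("date")
plt.plot(df["date"], df["cases"])
plt.xlabel("Date")
plt.ylabel("Reported cases")
plt.title("COVID-19 cases over time")
plt.show()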
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication Package Files
1. Forms.zip: contains the forms used to collect data for the experiment
2. Experiments.zip: contains the participants’ and sandboxers’ experimental task workflow with Newton.
3. Responses.zip: contains the responses collected from participants during the experiments.
4. Analysis.zip: contains the data analysis scripts and results of the experiments.
5. newton.zip: contains the tool we used for the WoZ experiment.
TutorialStudy.pdf: script used in the experiments with and without Newton, kept consistent across all participants.
Woz_Script.pdf: script used by the wizard to maintain consistent Newton responses among the participants.
1. Forms.zip
The forms zip contains the following files:
Demographics.pdf: a PDF form used to collect demographic information from participants before the experiments
Post-Task Control (without the tool).pdf: a PDF form used to collect data from participants about challenges and interactions when performing the task without Newton
Post-Task Newton (with the tool).pdf: a PDF form used to collect data from participants after the task with Newton.
Post-Study Questionnaire.pdf: a PDF form used to collect data from the participant after the experiment.
2. Experiments.zip
The experiments zip contains two types of folders:
exp[participant’s number]-c[number of dataset used for control task]e[number of dataset used for experimental task]. Example: exp1-c2e1 (experiment participant 1 - control used dataset 2, experimental used dataset 1)
sandboxing[sandboxer’s number]. Example: sandboxing1 (experiment with sandboxer 1)
Every experiment subfolder contains:
warmup.json: a JSON file with the results of Newton-Participant interactions in the chat for the warmup task.
warmup.ipynb: a Jupyter notebook file with the participant’s results from the code provided by Newton in the warmup task.
sample1.csv: Death Event dataset.
sample2.csv: Heart Disease dataset.
tool.ipynb: a Jupyter notebook file with the participant’s results from the code provided by Newton in the experimental task.
python.ipynb: a Jupyter notebook file with the participant’s results from the code they tried during the control task.
results.json: a JSON file with the results of Newton-Participant interactions in the chat for the task with Newton.
To load an experiment chat log into Newton, add the following code to the notebook:
import anachat
import json
with open("result.json", "r") as f:
anachat.comm.COMM.history = json.load(f)
Then, click on the notebook name inside the Newton chat.
Note 1: the subfolder for P6 is exp6-e2c1-serverdied because the experiment server died before we were able to save the logs. We reconstructed them using the notebook newton_remake.ipynb based on the video recording.
Note 2: The sandboxing occurred during the development of Newton. We did not collect all the files, and the format of JSON files is different than the one supported by the attached version of Newton.
3. Responses.zip
The responses zip contains the following files:
demographics.csv: a CSV file containing the responses collected from participants using the demographics form
task_newton.csv: a CSV file containing the responses collected from participants using the post-task newton form.
task_control.csv: a CSV file containing the responses collected from participants using the post-task control form.
post_study.csv: a CSV file containing the responses collected from participants using the post-study control form.
4. Analysis.zip
The analysis zip contains the following files:
1.Challenge.ipynb: a Jupyter notebook file where the perceptions of challenges figure was created.
2.Interactions.py: a Python file where the participants’ JSON files were created.
3.Interactions.Graph.ipynb: a Jupyter notebook file where the participant’s interaction figure was created.
4.Interactions.Count.ipynb: a Jupyter notebook file that counts participants’ interaction with each figure.
config_interactions.py: this file contains the definitions of interaction colors and grouping
interactions.json: a JSON file with the interactions during the Newton task of each participant based on the categorization.
requirements.txt: dependencies required to run the code to generate the graphs and json analysis.
To run the analyses, install the dependencies on Python 3.10 with the following command and execute the scripts and notebooks in order:
pip install -r requirements.txt
5. newton.zip
The newton zip contains the source code of the Jupyter Lab extension we used in the experiments. Read the README.md file inside it for instructions on how to install and run it.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Load, wind and solar, prices in hourly resolution. This data package contains different kinds of timeseries data relevant for power system modelling, namely electricity prices, electricity consumption (load) as well as wind and solar power generation and capacities. The data is aggregated either by country, control area or bidding zone. Geographical coverage includes the EU and some neighbouring countries. All variables are provided in hourly resolution. Where original data is available in higher resolution (half-hourly or quarter-hourly), it is provided in separate files. This package version only contains data provided by TSOs and power exchanges via ENTSO-E Transparency, covering the period 2015-mid 2020. See previous versions for historical data from a broader range of sources. All data processing is conducted in Python/pandas and has been documented in the Jupyter notebooks linked below.
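A short sketch of loading one of the hourly CSV files with pandas; the file and column names follow the package's usual naming scheme but should be checked against the actual download:

import pandas as pd

# File and column names are assumptions based on the package's naming scheme.
ts = pd.read_csv("time_series_60min_singleindex.csv",
                 index_col="utc_timestamp", parse_dates=["utc_timestamp"])
daily_load = ts["DE_load_actual_entsoe_transparency"].resample("D").mean()
print(daily_load.head())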
This dataset contains data and code from the manuscript: Heintzman, L.J., McIntyre, N.E., Langendoen, E.J., & Read, Q.D. (2024). Cultivation and dynamic cropping processes impart land-cover heterogeneity within agroecosystems: a metrics-based case study in the Yazoo-Mississippi Delta (USA). Landscape Ecology 39, 29 (2024). https://doi.org/10.1007/s10980-024-01797-0
There are 14 rasters of land use and land cover data for the study region, in .tif format with associated auxiliary files, two shape files with county boundaries and study area extent, a CSV file with summary information derived from the rasters, and a Jupyter notebook containing Python code. The rasters included here represent an intermediate data product. Original unprocessed rasters from NASS CropScape are not included here, nor is the code to process them.
List of files:
- MS_Delta_maps.zip
- MSDeltaCounties_UTMZone15N.shp: Depiction of the 19 counties (labeled) that intersect the Mississippi Alluvial Plain in western Mississippi.
- MS_Delta_MAP_UTMZone15N.shp: Depiction of the study area extent.
- mf8h_20082021.zip
- mf8h_XXXX.tif: Yearly, reclassified and majority-filtered LULC data used to build comboall1.csv, derived from USDA NASS CropScape. There are 14 .tif files total for years 2008-2021. Each .tif file includes auxiliary files with the same file name and the following extensions: .tfw, .tif.aux.xml, .tif.ovr, .tif.vat.cpg, .tif.vat.dbf.
- comboall1.csv: Combined dataset of LULC information for all 14 years in the study period.
- analysis.ipynb_.txt: Jupyter Notebook used to analyze comboall1.csv. Convert to .ipynb format to open with Jupyter.
This research was conducted under USDA Agricultural Research Service, National Program 211 (Water Availability and Watershed Management).
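A brief sketch of tallying land-cover classes from one of the yearly rasters with rasterio and numpy; the year below is just an example, and this is not the notebook's code:

import numpy as np
import rasterio

# Example year; files are named mf8h_XXXX.tif for 2008-2021.
with rasterio.open("mf8h_2008.tif") as src:
    lulc = src.read(1)  # band 1 holds the reclassified LULC codes

values, counts = np.unique(lulc, return_counts=True)
for value, count in zip(values, counts):
    print(value, count)  # pixel count per land-cover class (including any nodata value)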
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This is an example dataset recorded using version 1.0 of the open-source-hardware OpenAXES IMU. Please see the github repository for more information on the hardware and firmware. Please find the most up-to-date version of this document in the repository.
This dataset was recorded using four OpenAXES IMUs mounted on the segments of a robot arm (UR5 by Universal Robots). The robot arm was programmed to perform a calibration movement, then trace a 2D circle or triangle in the air with its tool center point (TCP), and return to its starting position, at four different speeds from 100 mm/s to 250 mm/s. This results in a total of 8 different scenarios (2 shapes times 4 speeds). The ground truth joint angle and TCP position values were obtained from the robot controller. The calibration movement at the beginning of the measurement allows for calculating the exact orientation of the sensors on the robot arm.
The IMUs were configured to send the raw data from the three gyroscope axes and the six accelerometer axes to a PC via BLE with 16 bit resolution per axis and 100 Hz sample rate. Since no data packets were lost during this process, this dataset allows comparing and tuning different sensor fusion algorithms on the recorded raw data while using the ground truth robot data as a reference.
In order to visualize the results, the quaternion sequences from the IMUs were applied to the individual segments of a 3D model of the robot arm. The end of this kinematic chain represents the TCP of the virtual model, which should ideally move along the same trajectory as the ground truth, barring the accuracy of the IMUs. Since the raw sensor data of these measurements is available, the calibration coefficients can also be applied ex-post.
Since there are 6 joints but only 4 IMUs, some redundancy must be exploited. The redundancy comes from the fact that each IMU has 3 rotational degrees of freedom, but each joint has only one:
- q0 and q1 are both derived from the orientation of the "humerus" IMU.
- q2 is the difference† between the orientation of the "humerus" and "radius" IMUs.
- q3 is the difference between the orientation of the "radius" and "carpus" IMUs.
- q4 is the difference between the orientation of the "carpus" and "digitus" IMUs.
- q5 does not influence the position of the TCP, only its orientation, so it is ignored in the evaluation.

† Here, "difference" means R1 * inv(R0) for two quaternions (or rotations) R0 and R1. The actual code works a bit differently, but this describes the general principle.

The dataset is organized as follows:
- measure_raw-2022-09-15/: one folder per scenario. In those folders, there is one CSV file per IMU.
- measure_raw-2022-09-15/robot/: one CSV and MAT file per scenario.
- Media: videos are stored in git lfs.

The file openaxes-example-robot-dataset.ipynb is provided to play around with the data in the dataset and demonstrate how the files are read and interpreted.
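As an aside, a minimal sketch (not the dataset's actual code) of the joint-angle-from-two-IMUs idea described above, using scipy; the quaternion values below are made up for illustration:

import numpy as np
from scipy.spatial.transform import Rotation as R

# Hypothetical orientation quaternions of two adjacent segments, in scipy's (x, y, z, w) order.
q_humerus = np.array([0.0, 0.0, 0.383, 0.924])  # roughly 45 degrees about z
q_radius = np.array([0.0, 0.0, 0.707, 0.707])   # roughly 90 degrees about z

r0 = R.from_quat(q_humerus)
r1 = R.from_quat(q_radius)
r_rel = r1 * r0.inv()                  # "difference" R1 * inv(R0) between the two orientations
print(np.degrees(r_rel.magnitude()))   # rotation angle of the relative rotation, about 45 degrees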
To use the notebook, set up a Python 3 virtual environment and install the necessary packages therein with pip install -r requirements.txt.
In order to view the graphs contained in the ipynb file, you will most likely have to trust the notebook beforehand, using the following command:
jupyter trust openaxes-example-robot-dataset.ipynb
Beware: This notebook is not a comprehensive evaluation and any results and plots shown in the file are not necessarily scientifically sound evidence of anything.
The notebook will store intermediate files in the measure_raw-2022-09-15 directory, like the quaternion files calculated by the different filters, or the files containing the reconstructed TCP positions. All intermediate files should be ignored by the file measure_raw-2022-09-15/.gitignore. The generated intermediate files are also provided in the file measure_raw-2022-09-15.tar.bz2, in case you want to inspect the generated files without running the notebook.
A number of tools are used in the evaluation notebook. Below is a short overview, but not a complete specification. If you need to understand the input and output formats for each tool, please read the code.
- calculate-quaternions.py is used in the evaluation notebook to compute different attitude estimation filters like Madgwick or VQF on the raw accelerometer and gyroscope measurements at 100 Hz.
- madgwick-filter contains a small C program that applies the original Madgwick filter to a CSV file containing raw measurements and prints the results. It is used by calculate-quaternions.py.
- calculate-robot-quaternions.py calculates a CSV file of quaternions equivalent to the IMU quaternions from a CSV file containing the joint angles of the robot.
- dsense_vis, mentioned in the notebook, is used to calculate the 3D model of the robot arm from quaternions and determine the mounting orientations of the IMUs on the robot arm. This program will be released at a future date. In the meantime, the output files of dsense_vis are provided in the file measure_raw-2022-09-15.tar.bz2, which contains the complete content of the measure_raw-2022-09-15 directory after executing the whole notebook. Just unpack this archive and merge its contents with the measure_raw-2022-09-15 directory. This allows you to explore the reconstructed TCP files for the filters implemented at the time of publication.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.
This repository contains two files:
The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.
The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:
In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.
Reproducing the Analysis
This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:
Ubuntu 18.04.1 LTS
PostgreSQL 10.6
Conda 4.5.1
Python 3.6.8
PdfCrop 2012/11/02 v1.38
First, download dump.tar.bz2 and extract it:
tar -xjf dump.tar.bz2
It extracts the file db2019-01-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:
psql jupyter < db2019-01-13.dump
It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Create a conda environment with Python 3.6:
conda create -n py36 python=3.6
Go to the analyses folder and install all the dependencies listed in requirements.txt:
cd jupyter_reproducibility/analyses
pip install -r requirements.txt
For reproducing the analyses, run jupyter on this folder:
jupyter notebook
Execute the notebooks in this order:
Reproducing or Expanding the Collection
The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.
Requirements
This time, we have extra requirements:
All the analysis requirements
lbzip2 2.5
gcc 7.3.0
Github account
Gmail account
Environment
First, set the following environment variables:
export JUP_MACHINE="db"; # machine identifier
export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
export JUP_COMPRESSION="lbzip2"; # compression program
export JUP_VERBOSE="5"; # verbose level
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
export JUP_GITHUB_USERNAME="github_username"; # your github username
export JUP_GITHUB_PASSWORD="github_password"; # your github password
export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
export JUP_OAUTH_FILE="~/oauth2_creds.json"; # oauth2 authentication file
export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
export JUP_WITH_EXECUTION="1"; # execute python notebooks
export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
export JUP_EXECUTION_MODE="-1"; # run following the execution order
export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
# Frequency of log reports
export JUP_ASTROID_FREQUENCY="5";
export JUP_IPYTHON_FREQUENCY="5";
export JUP_NOTEBOOKS_FREQUENCY="5";
export JUP_REQUIREMENT_FREQUENCY="5";
export JUP_CRAWLER_FREQUENCY="1";
export JUP_CLONE_FREQUENCY="1";
export JUP_COMPRESS_FREQUENCY="5";
export JUP_DB_IP="localhost"; # postgres database IP
Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf
Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories. The second one should umount it. You can leave the scripts blank, but it is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.
Scripts
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Install 5 conda environments and 5 anaconda environments, one of each per Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):
Conda 2.7
conda create -n raw27 python=2.7 -y
conda activate raw27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 2.7
conda create -n py27 python=2.7 anaconda -y
conda activate py27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.4
It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.
conda create -n raw34 python=3.4 -y
conda activate raw34
conda install jupyter -c conda-forge -y
conda uninstall jupyter -y
pip install --upgrade pip
pip install jupyter
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
pip install pathlib2
Anaconda 3.4
conda create -n py34 python=3.4 anaconda -y
conda activate py34
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.5
conda create -n raw35 python=3.5 -y
conda activate raw35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.5
It requires the manual installation of other anaconda packages.
conda create -n py35 python=3.5 anaconda -y
conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
conda activate py35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.6
conda create -n raw36 python=3.6 -y
conda activate raw36
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.6
conda create -n py36 python=3.6 anaconda -y
conda activate py36
conda install -y anaconda-navigator jupyterlab_server navigator-updater
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.7
conda create -n raw37 python=3.7 -y
conda activate raw37
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.7
When we
Collected in this dataset are the slideset and abstract for a presentation on Toward a Reproducible Research Data Repository by the depositar team at International Symposium on Data Science 2023 (DSWS 2023), hosted by the Science Council of Japan in Tokyo on December 13-15, 2023. The conference was organized by the Joint Support-Center for Data Science Research (DS), Research Organization of Information and Systems (ROIS) and the Committee of International Collaborations on Data Science, Science Council of Japan. The conference programme is also included as a reference.
Toward a Reproducible Research Data Repository
Cheng-Jen Lee, Chia-Hsun Ally Wang, Ming-Syuan Ho, and Tyng-Ruey Chuang
Institute of Information Science, Academia Sinica, Taiwan
The depositar (https://data.depositar.io/) is a research data repository at Academia Sinica (Taiwan) open to researchers worldwide for the deposit, discovery, and reuse of datasets. The depositar software itself is open source and builds on top of CKAN. CKAN, an open source project initiated by the Open Knowledge Foundation and sustained by an active user community, is a leading data management system for building data hubs and portals. In addition to CKAN's out-of-the-box features such as JSON data API and in-browser preview of uploaded data, we have added several features to the depositar, including sourcing from Wikidata as dataset keywords, a citation snippet for datasets, in-browser Shapefile preview, and a persistent identifier system based on ARK (Archival Resource Keys). At the same time, the depositar team faces an increasing demand for interactive computing (e.g. Jupyter Notebook) which facilitates not just data analysis, but also the replication and demonstration of scientific studies. Recently, we have provided a JupyterHub service (a multi-tenancy JupyterLab) to some of the depositar's users. However, it still requires users to first download the data files (or copy the URLs of the files) from the depositar, then upload the data files (or paste the URLs) to the Jupyter notebooks for analysis. Furthermore, a JupyterHub deployed on a single server is limited by its processing power, which may lower the service level to the users. To address the above issues, we are integrating BinderHub into the depositar. BinderHub (https://binderhub.readthedocs.io/) is a kubernetes-based service that allows users to create interactive computing environments from code repositories. Once the integration is completed, users will be able to launch Jupyter Notebooks to perform data analysis and visualization without leaving the depositar by clicking the BinderHub buttons on the datasets. In this presentation, we will first make a brief introduction to the depositar and BinderHub along with their relationship, then we will share our experiences in incorporating interactive computation in a data repository. We shall also evaluate the possibility of integrating the depositar with other automation frameworks (e.g. the Snakemake workflow management system) in order to enable users to reproduce data analysis.
BinderHub, CKAN, Data Repositories, Interactive Computing, Reproducible Research
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was originally curated by Software Carpentry, a branch of The Carpentries non-profit organization, and is based on data from the Gapminder Foundation. It consists of six tabular CSV files containing GDP data for various countries across different years. The dataset was initially prepared for the Software Carpentry tutorial "Plotting and Programming in Python" and is also reused in the Galaxy Training Network (GTN) tutorial "Use Jupyter Notebooks in Galaxy."
This GTN tutorial provides an introduction to launching a Jupyter Notebook in Galaxy, installing dependencies, and importing and exporting data. It serves as a setup guide for a Jupyter Notebook environment that can be used to follow the Software Carpentry tutorial "Plotting and Programming in Python."
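For reference, reading one of the Gapminder CSVs with pandas looks roughly like this; the file name follows the Software Carpentry lesson's convention and may differ in this copy of the dataset:

import pandas as pd

# File name per the Software Carpentry lesson; adjust to the CSV imported into Galaxy.
data = pd.read_csv("gapminder_gdp_oceania.csv", index_col="country")
print(data.loc["Australia"])  # GDP per capita values, one column per year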
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Information
This dataset presents long-term indoor solar harvesting traces, jointly monitored with the ambient conditions. The data is recorded at 6 indoor positions with diverse characteristics at our institute at ETH Zurich in Zurich, Switzerland.
The data is collected with a measurement platform [3] consisting of a solar panel (AM-5412) connected to a bq25505 energy harvesting chip that stores the harvested energy in a virtual battery circuit. Two TSL45315 light sensors placed on opposite sides of the solar panel monitor the illuminance level and a BME280 sensor logs ambient conditions like temperature, humidity and air pressure.
The dataset contains the measurement of the energy flow at the input and the output of the bq25505 harvesting circuit, as well as the illuminance, temperature, humidity and air pressure measurements of the ambient sensors. The following timestamped data columns are available in the raw measurement format, as well as in the preprocessed and filtered HDF5 datasets:
- V_in: Converter input/solar panel output voltage, in volt
- I_in: Converter input/solar panel output current, in ampere
- V_bat: Battery voltage (emulated through circuit), in volt
- I_bat: Net battery current, in/out flowing current, in ampere
- Ev_left: Illuminance left of solar panel, in lux
- Ev_right: Illuminance right of solar panel, in lux
- P_amb: Ambient air pressure, in pascal
- RH_amb: Ambient relative humidity, unit-less between 0 and 1
- T_amb: Ambient temperature, in degrees Celsius

The following publication presents an overview of the dataset and more details on the deployment used for data collection. A copy of the abstract is included in this dataset, see the file abstract.pdf.
L. Sigrist, A. Gomez, and L. Thiele. Dataset: Tracing Indoor Solar Harvesting. In Proceedings of the 2nd Workshop on Data Acquisition To Analysis (DATA '19), 2019. [under submission]
Folder Structure and Files
- processed/: This folder holds the imported, merged and filtered datasets of the power and sensor measurements. The datasets are stored in HDF5 format and split by measurement position posXX and by power and ambient sensor measurements. The files belonging to this folder are contained in archives named yyyy_mm_processed.tar, where yyyy and mm represent the year and month the data was published. A separate file lists the exact content of each archive (see below).
- raw/: This folder holds the raw measurement files recorded with the RocketLogger [1, 2] and using the measurement platform available at [3]. The files belonging to this folder are contained in archives named yyyy_mm_raw.tar, where yyyy and mm represent the year and month the data was published. A separate file lists the exact content of each archive (see below).
- LICENSE: License information for the dataset.
- README.md: The README file containing this information.
- abstract.pdf: A copy of the above mentioned abstract submitted to the DATA '19 Workshop, introducing this dataset and the deployment used to collect it.
- raw_import.ipynb [open in nbviewer]: Jupyter Python notebook to import, merge, and filter the raw dataset from the raw/ folder. This is the exact code used to generate the processed dataset and store it in the HDF5 format in the processed/ folder.
- raw_preview.ipynb [open in nbviewer]: This Jupyter Python notebook imports the raw dataset directly and plots a preview of the full power trace for all measurement positions.
- processing_python.ipynb [open in nbviewer]: Jupyter Python notebook demonstrating the import and use of the processed dataset in Python. Calculates column-wise statistics, includes more detailed power plots and the simple energy predictor performance comparison included in the abstract.
- processing_r.ipynb [open in nbviewer]: Jupyter R notebook demonstrating the import and use of the processed dataset in R. Calculates column-wise statistics and extracts and plots the energy harvesting conversion efficiency included in the abstract. Furthermore, the harvested power is analyzed as a function of the ambient light level.

Dataset File Lists
Processed Dataset Files
The list of the processed datasets included in the yyyy_mm_processed.tar archive is provided in yyyy_mm_processed.files.md. The markdown formatted table lists the name of all files, their size in bytes, as well as the SHA-256 sums.
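For orientation, a processed HDF5 file could be inspected with pandas roughly as follows; the file path and key used here are placeholders, and the processing_python.ipynb notebook shows the actual access pattern:

import pandas as pd

# Placeholder path and key; consult yyyy_mm_processed.files.md for the real file names.
df = pd.read_hdf("2019_08_processed/pos01_power.h5", key="data")
print(df.describe())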
Raw Dataset Files
A list of the raw measurement files included in the yyyy_mm_raw.tar archive(s) is provided in yyyy_mm_raw.files.md. The markdown formatted table lists the name of all files, their size in bytes, as well as the SHA-256 sums.
Dataset Revisions
v1.0 (2019-08-03)
Initial release.
Includes the data collected from 2017-07-27 to 2019-08-01. The dataset archive files related to this revision are 2019_08_raw.tar and 2019_08_processed.tar.
For position pos06, the measurements from 2018-01-06 00:00:00 to 2018-01-10 00:00:00 are filtered (data inconsistency in file indoor1_p27.rld).
Dataset Authors, Copyright and License
References
[1] L. Sigrist, A. Gomez, R. Lim, S. Lippuner, M. Leubin, and L. Thiele. Measurement and validation of energy harvesting IoT devices. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.
[2] ETH Zurich, Computer Engineering Group. RocketLogger Project Website, https://rocketlogger.ethz.ch/.
[3] L. Sigrist. Solar Harvesting and Ambient Tracing Platform, 2019. https://gitlab.ethz.ch/tec/public/employees/sigristl/harvesting_tracing
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
(1) Output of the Renewable Energy Model (REM) as described in "Insights into weather-driven extremes in Europe's resources for renewable energy" (Ho and Fiedler, 2024), last modified on 30.10.2023 by Linh Ho, named year_PV_wind_generation_v2.nc, covering 23 years from 1995 to 2017. REM includes one simulation of photovoltaic (PV) power production and one simulation of wind power production across a European domain, with a horizontal resolution of 48 km and hourly output for the period 1995--2017.
The output has a European domain with the same size as in the reanalysis dataset COSMO-REA6. This is a rotated grid with the coordinates of the rotated North Pole −162.0, 39.25, and of the lower left corner −23.375, −28.375. See Bollmeyer et al. (2014, http://doi.org/10.1002/qj.2486). Data downloaded from https://opendata.dwd.de/climate_environment/REA/COSMO_REA6/
(2) Weather pattern classification daily for Europe from 1995 to April 2020, named EGWL_LegacyGWL.txt, from James (2007, http://doi.org/10.1007/s00704-006-0239-3)
(3) The installation data of PV and wind power in Europe for one scenario in 2050 from the CLIMIX model, processed to have the same horizontal resolution as in REM, named installed_capacity_PV_wind_power_from_CLIMIX_final.nc. Original data were provided at 0.11 degree resolution, acquired from personal communication with the author from Jerez et al. (2015, http://doi.org/10.1016/j.rser.2014.09.041)
(4) Python scripts of REM, including:
- model_PV_wind_complete_v2.py: the main script to produce REM output
- model_PV_wind_potential_v2.py: produces the potential (capacity factor) of PV and wind power for model evaluations, e.g., against CDS and Renewables Ninja data, as described in Ho and Fiedler (2024)
- model_PV_wind_complete_v1_ONLYyear2000.py: a separate Python script to produce REM output only for the year 2000. Note that the data for 2000 from COSMO-REA6 were read with a different approach (using cfgrib), probably due to the time stamp changes at the beginning of the millennium, which also explains the larger size of the final output
- utils_LH_archive_Oct2022.py: contains the necessary Python functions to run the other scripts
(5) Jupyter notebook files to reproduce the figures in Ho and Fiedler (2024), named Paper1_Fig*_**.ipynb
(6) Time series of European-aggregated PV and wind power production, hourly during the period 1995--2017, processed from dataset (1) to facilitate the reproduction of the figures, including two installation scenarios, scale-2019 and scenario-2050:
- Timeseries_all_hourly_1995_2017_GW_scale2019.csv
- Timeseries_all_hourly_1995_2017_GW_scen2050.csv
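A small sketch of opening the REM output named above with xarray; the variable and dimension names used here are assumptions, so inspect the file to get the real ones:

import xarray as xr

ds = xr.open_dataset("year_PV_wind_generation_v2.nc")
print(ds)  # list variables and the rotated-grid coordinates

# Hypothetical variable/dimension names for a domain-aggregated hourly series.
pv_total = ds["PV_generation"].sum(dim=["rlat", "rlon"])
print(pv_total.isel(time=slice(0, 24)).values)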
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the raw experimental data associated with the manuscript "Unveiling the charge distribution of a GaAs-based nanoelectronic device: A large experimental data-set approach" by Eleni Chatzikyriakou, Junliang Wang et al. See Arxiv:2205.00846 for more details.
Content of the different data files
The data are stored in 5 different files in the csv format.
For samples X1Y3, X2Y3, X5Y3 and X6Y3, some additional measurements have been carried out:
data_1D_mk.csv : top and bottom gates of each QPC have been swept at the same time at 50 mK temperature.
data_2D_4K_TB.csv : top gate has been swept for different values of the bottom gate at 4K.
data_2D_mK_TB.csv : top gate has been swept for different values of the bottom gate at 50 mK temperature.
data_2D_mK_BT.csv : bottom gate has been swept for different values of the top gate at 50 mK temperature.
Format of the csv files
--- All the data files are in the following format.
A given curve "current versus gate voltage" is stored in two consecutive raws. The first one contains the value of the measured current, the second one contains the values of the applied gate voltages.
A 2D measurement "current versus top gate and bottom gate" is stored in three consecutive raws. The first one contains the value of the measured current, the second one contains the list of values of voltage applied on one of the gate. The third third raw contains the list of values of voltage applied on the other gate.
--- Each row of a csv file has the following format:
The first column identifies the quantum point contact and the quantity. The format is Xx_Yy_QpcNb_Meas.
The second column is the unit (A or V).
The third column is the number of sweeps (1, 2 or 3) performed. For some measurements, the same sweep has been done multiple times.
The fourth column is the design of the quantum point contact (A, B, C, D or E).
All the following columns contain the measured values. For 2D scans, the different values of the gate corresponding to the third row are placed one after the other.
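For illustration, a rough sketch of reading the paired rows of a 1D sweep file under the format described above; the provided extract.py is the authoritative reference:

import csv

curves = {}
with open("data_1D_mk.csv", newline="") as f:
    rows = list(csv.reader(f))

# Rows come in pairs: measured current (A) followed by applied gate voltage (V).
for current_row, voltage_row in zip(rows[0::2], rows[1::2]):
    label = current_row[0]                              # Xx_Yy_QpcNb_Meas identifier
    current = [float(v) for v in current_row[4:] if v]  # measured values start at the fifth column
    voltage = [float(v) for v in voltage_row[4:] if v]
    curves[label] = (voltage, current)

print(len(curves), "I-V curves loaded")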
Python scripts for data analysis
For convenience, we provide example Python scripts that can be used to load the data and plot them.
extract.py: extracts the data into a dictionary and plots the I-V characteristics.
extract.ipynb: Jupyter notebook using the different functions of extract.py.
type -h for help
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ATO (Australian Tax Office) made a dataset openly available (see links) showing all the Australian Salary and Wages (2002, 2006, 2010, 2014) by detailed occupation (around 1,000) and over 100 SA4 regions. Sole Trader sales and earnings are also provided. This open data (csv) is now packaged into a database (*.sql) with 45 sample SQL queries (backupSQL[date]_public.txt). See more description at the related Figshare #datavis record.

Versions:
V5: Following #datascience course, I have made the main data (individual salary and wages) available as csv and Jupyter Notebook. Checksum matches #dataTotals. In 209,xxx rows. Also provided Jobs and SA4 (Locations) description files as csv. More details at: Where are jobs growing/shrinking? Figshare DOI: 4056282 (linked below). Noted 1% discrepancy ($6B) in 2010 wages total - to follow up.

#dataTotals - Salary and Wages
Year | Workers (M) | Earnings ($B)
2002 | 8.5 | 285
2006 | 9.4 | 372
2010 | 10.2 | 481
2014 | 10.3 | 584

#dataTotal - Sole Traders
Year | Workers (M) | Sales ($B) | Earnings ($B)
2002 | 0.9 | 61 | 13
2006 | 1.0 | 88 | 19
2010 | 1.1 | 112 | 26
2014 | 1.1 | 96 | 30

#links
See ATO request for data at the ideascale link below. See the original csv open data set (CC-BY) at the data.gov.au link below. This database was used to create maps of change in regional employment - see Figshare link below (m9.figshare.4056282).

#package
This file package contains a database (analysing the open data) as an SQL package and sample SQL text interrogating the DB. DB name: test. There are 20 queries relating to Salary and Wages.

#analysis
The database was analysed and outputs provided on Nectar(.org.au) resources at: http://118.138.240.130 (offline). This is only resourced for max 1 year, from July 2016, so will expire in June 2017. Hence the filing here. The sample home page is provided here (and pdf), but not all the supporting files, which may be packaged and added later. Until then all files are available at the Nectar URL. Nectar URL now offline - server files attached as package (html_backup[date].zip), including php scripts, html, csv, jpegs.

#install
IMPORT: DB SQL dump e.g. test_2016-12-20.sql (14.8Mb)
1. Started MAMP on OSX. 1.1 Go to PhpMyAdmin.
2. New Database.
3. Import: Choose file: test_2016-12-20.sql -> Go (about 15-20 seconds on MacBookPro 16Gb, 2.3 GHz i5).
4. Four tables appeared: jobTitles 3,208 rows | salaryWages 209,697 rows | soleTrader 97,209 rows | stateNames 9 rows, plus views e.g. deltahair, Industrycodes, states.
5. Run the test query under #sampleSQL: Sum of Salary by SA4, e.g. 101 $4.7B, 102 $6.9B.

#sampleSQL
select sa4,
(select sum(count) from salaryWages where year = '2014' and sa4 = sw.sa4) as thisYr14,
(select sum(count) from salaryWages where year = '2010' and sa4 = sw.sa4) as thisYr10,
(select sum(count) from salaryWages where year = '2006' and sa4 = sw.sa4) as thisYr06,
(select sum(count) from salaryWages where year = '2002' and sa4 = sw.sa4) as thisYr02
from salaryWages sw
group by sa4
order by sa4
This item contains data and code used in experiments that produced the results for Sadler et al. (2022) (see below for full reference). We ran five experiments for the analysis: Experiment A, Experiment B, Experiment C, Experiment D, and Experiment AuxIn. Experiment A tested multi-task learning for predicting streamflow with 25 years of training data and using a different model for each of 101 sites. Experiment B tested multi-task learning for predicting streamflow with 25 years of training data and using a single model for all 101 sites. Experiment C tested multi-task learning for predicting streamflow with just 2 years of training data. Experiment D tested multi-task learning for predicting water temperature with over 25 years of training data. Experiment AuxIn used water temperature as an input variable for predicting streamflow. These experiments and their results are described in detail in the WRR paper. Data from a total of 101 sites across the US was used for the experiments. The model input data and streamflow data were from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset (Newman et al. 2014; Addor et al. 2017). The water temperature data were gathered from the National Water Information System (NWIS) (U.S. Geological Survey, 2016). The contents of this item are broken into 13 files or groups of files aggregated into zip files: