License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
This dataset compares fixed-line broadband internet speeds across five cities:
- Melbourne, AU
- Bangkok, TH
- Shanghai, CN
- Los Angeles, US
- Alice Springs, AU
ERRATA: 1. Data is for Q3 2020, but some files are labelled incorrectly as 06-20 or June 20. They should all read Sept 20 (09-20, i.e. Q3 20) rather than Q2. Will rename and reload. Amended in v7.
* Lines of data for each geojson file; a line equates to a 600m x 600m location, including total tests, devices used, and average upload and download speed:
- MEL: 16,181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
- SHG: 31,745 lines => 0.65M speedtests (2.5/100pp)
- BKK: 29,296 lines => 1.5M speedtests (14.3/100pp)
- LAX: 15,899 lines => 1.3M speedtests (10.4/100pp)
- ALC: 76 lines => 500 speedtests (2/100pp)
Geojsons of these 2° by 2° extracts for MEL, BKK, SHG now added; LAX added v6; Alice Springs added v15.
This dataset unpacks, geospatially, data summaries provided in Speedtest Global Index (linked below). See Jupyter Notebook (*.ipynb) to interrogate geo data. See link to install Jupyter.
** To Do
Will add Google Map versions so everyone can see without installing Jupyter.
- Link to Google Map (BKK) added below. Key: Green > 100Mbps (Superfast); Black > 500Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook.
- Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melb has 20% at or above 100Mbps. Suggest plotting the Top 20% on a map for the community. Google Map link now added (and tweet).
** Python

```python
melb = au_tiles.cx[144:146, -39:-37]  # Lat/Lon extract
shg = tiles.cx[120:122, 30:32]        # Lat/Lon extract
bkk = tiles.cx[100:102, 13:15]        # Lat/Lon extract
lax = tiles.cx[-118:-120, 33:35]      # Lat/Lon extract
ALC = tiles.cx[132:134, -22:-24]      # Lat/Lon extract
```
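The extracts above assume the Speedtest tiles have already been loaded into a GeoDataFrame. A minimal sketch of that step is below; the file name and the AU bounding box are illustrative assumptions, not the exact paths used in this dataset (see the opendata.aws link below for the actual tile files).

```python
import geopandas as gpd

# Illustrative file name; actual quarterly tile files are listed at opendata.aws.
tiles = gpd.read_file("2020-07-01_performance_fixed_tiles.shp")

# Rough Australia-wide bounding box (assumed), used for the melb extract above.
au_tiles = tiles.cx[112:154, -44:-10]
```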
Histograms (v9) and data visualisations (v3, 5, 9, 11) are provided. Data sourced from: this is an extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).
** VERSIONS
- v24: Add tweet and Google map of Top 20% (over 100Mbps locations) in Mel Q322. Add v1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below).
- v23: Add graph of 2022 broadband distribution, and compare 2020 - 2022. Updated v1.4 Jupyter notebook.
- v22: Add import ipynb; workflow-import-4cities.
- v21: Add Q3 2022 data; five cities inc ALC. Geojson files. (2020: 4.3M tests; 2022: 2.9M tests.)
- v20: Speedtest - Five Cities inc ALC.
- v19: Add ALC2.ipynb.
- v18: Add ALC line graph.
- v17: Added ipynb for ALC. Added ALC to title.
- v16: Load Alice Springs data Q221 - csv. Added Google Map link of ALC.
- v15: Load Melb Q1 2021 data - csv.
- v14: Added Melb Q1 2021 data - geojson.
- v13: Added Twitter link to pics.
- v12: Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb).
- v11: Add Line-Compare pic, plotting four cities on a graph.
- v10: Add four histograms in one pic.
- v9: Add histogram for four cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook).
- v8: Renamed LAX file to Q3, rather than 03.
- v7: Amended file names of BKK files to correctly label as Q3, not Q2 or 06.
- v6: Added LAX file.
- v5: Add screenshot of BKK Google Map.
- v4: Add BKK Google map (link below), and BKK csv mapping files.
- v3: Replaced MEL map with big-key version. Previous key was very tiny in top right corner.
- v2: Uploaded MEL, SHG, BKK data and Jupyter Notebook.
- v1: Metadata record.
** LICENCE
AWS data licence on Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be:
- non-commercial (NC)
- share-alike (SA) (reuse must add the same licence)
This restricts the standard CC-BY Figshare licence.
** Other uses of Speedtest Open Data: see link at Speedtest below.
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
This dataset was originally curated by Software Carpentry, a branch of The Carpentries non-profit organization, and is based on data from the Gapminder Foundation. It consists of six tabular CSV files containing GDP data for various countries across different years. The dataset was initially prepared for the Software Carpentry tutorial "Plotting and Programming in Python" and is also reused in the Galaxy Training Network (GTN) tutorial "Use Jupyter Notebooks in Galaxy."
This GTN tutorial provides an introduction to launching a Jupyter Notebook in Galaxy, installing dependencies, and importing and exporting data. It serves as a setup guide for a Jupyter Notebook environment that can be used to follow the Software Carpentry tutorial "Plotting and Programming in Python."
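As a quick orientation, the files can be read directly with pandas. The file name below follows the naming used in the Software Carpentry lesson; treat it as an assumption if your copy is organized differently.

```python
import pandas as pd

# One of the six Gapminder GDP tables from the Software Carpentry lesson
# (file name per that lesson; adjust the path to where you saved the CSVs).
data = pd.read_csv("gapminder_gdp_oceania.csv", index_col="country")
print(data.head())
```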
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
This archive reproduces a table titled "Table 3.1 Boone county population size, 1990 and 2000" from Wang and vom Hofe (2007, p. 58). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses the Census API to retrieve data, reproduce the table, and ensure reproducibility for anyone accessing this archive.

The Python code was developed in Google Colaboratory (Google Colab for short), which is an Integrated Development Environment (IDE) for JupyterLab that streamlines package installation, code collaboration and management. The Census API is used to obtain population counts from the 1990 and 2000 Decennial Census (Summary File 1, 100% data). All downloaded data are maintained in the notebook's temporary working directory while in use. The data are also stored separately with this archive.

The notebook features extensive explanations, comments, code snippets, and code output. The notebook can be viewed in PDF format or downloaded and opened in Google Colab. References to external resources are also provided for the various functional components. The notebook features code to perform the following functions:
- install/import necessary Python packages
- introduce a Census API query
- download Census data via the Census API
- manipulate Census tabular data
- calculate absolute change and percent change
- format numbers
- export the table to csv

The notebook can be modified to perform the same operations for any county in the United States by changing the State and County FIPS code parameters for the Census API downloads. The notebook could also be adapted for use in other environments (i.e., Jupyter Notebook), as well as reading and writing files to a local or shared drive, or a cloud drive (i.e., Google Drive).
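For a flavor of the query pattern, here is a hedged sketch of one Census API call; the endpoint and variable name are assumptions based on the public Census API documentation, not necessarily the exact calls in the archived notebook.

```python
import requests

# Assumed endpoint/variable per the public Census API docs (2000 Decennial
# Census SF1, total population P001001); FIPS 21/015 is Boone County, KY.
url = "https://api.census.gov/data/2000/dec/sf1"
params = {"get": "NAME,P001001", "for": "county:015", "in": "state:21"}
resp = requests.get(url, params=params, timeout=30)
resp.raise_for_status()
print(resp.json())  # header row, then one data row for Boone County
```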
Iris
The following code can be used to load the dataset from its stored location at NERSC. You may also access this code via a NERSC-hosted Jupyter notebook here.
```python
import pandas as pd

iris_dat = pd.read_csv('/global/cfs/cdirs/dasrepo/www/ai_ready_datasets/iris/data/iris.csv')
```
If you would like to download the data, visit the following link: https://portal.nersc.gov/cfs/dasrepo/ai_ready_datasets/iris/data
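If you prefer to work outside NERSC, pandas can also read the file straight from the portal. The exact file URL below is an assumption, composed from the portal directory above and the iris.csv file name used in the code:

```python
import pandas as pd

# Assumed direct file URL (portal directory + file name); verify via the link above.
iris_dat = pd.read_csv("https://portal.nersc.gov/cfs/dasrepo/ai_ready_datasets/iris/data/iris.csv")
print(iris_dat.head())
```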
This resource contains Jupyter Python notebooks which are intended to be used to learn about the U.S. National Water Model (NWM). These notebooks explore NWM forecasts in various ways. Notebooks 1, 2, and 3 access NWM forecasts directly from the NOAA NOMADS file sharing system; Notebook 4 accesses NWM forecasts from Google Cloud Platform (GCP) storage in addition to NOMADS. A brief summary of what each notebook does is included below:
Notebook 1 (NWM1_Visualization) focuses on visualization. It includes functions for downloading and extracting time series forecasts for any of the 2.7 million stream reaches of the U.S. NWM. It also demonstrates ways to visualize forecasts using Python packages like matplotlib.
Notebook 2 (NWM2_Xarray) explores methods for slicing and dicing NWM NetCDF files using the Python library xarray.
Notebook 3 (NWM3_Subsetting) is focused on subsetting NWM forecasts and NetCDF files for specified reaches and exporting NWM forecast data to CSV files.
Notebook 4 (NWM4_Hydrotools) uses Hydrotools, a new suite of tools for evaluating NWM data, to retrieve NWM forecasts both from NOMADS and from Google Cloud Platform storage where older NWM forecasts are cached. This notebook also briefly covers visualizing, subsetting, and exporting forecasts retrieved with Hydrotools.
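As a flavor of what Notebooks 2 and 3 do, a channel-route forecast file can be sliced by reach with xarray. This is a hedged sketch: the file name is illustrative of NOMADS naming, and the feature_id is a placeholder, not a real reach.

```python
import xarray as xr

# Illustrative NWM short-range channel_rt file name; actual files come from NOMADS/GCP.
ds = xr.open_dataset("nwm.t00z.short_range.channel_rt.f001.conus.nc")

# 'streamflow' is indexed by NWM reach identifier (feature_id); placeholder ID below.
flow = ds["streamflow"].sel(feature_id=12345678)
flow.to_dataframe().to_csv("reach_12345678_f001.csv")  # export to CSV, as in Notebook 3
```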
The notebooks are part of a NWM learning module on HydroLearn.org. When the associated learning module is complete, the link to it will be added here. It is recommended that these notebooks be opened through the CUAHSI JupyterHub App on Hydroshare. This can be done via the 'Open With' button at the top of this resource page.
This resource supports the work published in Strauch et al. (2018), "A hydroclimatological approach to predicting regional landslide probability using Landlab", Earth Surf. Dynam., 6, 1-26. It demonstrates a hydroclimatological approach to modeling regional shallow landslide initiation based on the infinite slope stability model coupled with a steady-state subsurface flow representation. The model component is available as the LandslideProbability component in Landlab, an open-source, Python-based landscape earth systems modeling environment described in Hobley et al. (2017, Earth Surf. Dynam., 5, 21–46, https://doi.org/10.5194/esurf-5-21-2017). The model operates on a digital elevation model (DEM) grid to which local field parameters, such as cohesion and soil depth, are attached. A Monte Carlo approach is used to account for parameter uncertainty and to calculate the probability of shallow landsliding, as well as the probability of soil saturation, based on annual maximum recharge. The model is demonstrated in a steep mountainous region in northern Washington, U.S.A., using 30-m grid resolution over 2,700 km2.
This resource contains a 1) User Manual that describes the Landlab LandslideProbability Component design, parameters, and step-by-step guidance on using the component in a model, and 2) two Landlab driver codes (notebooks) and customized component code to run Landlab's LandslideProbability component for 2a) synthetic recharge and 2b) modeled recharge published in Strauch et al., (2018). The Jupyter Notebooks use HydroShare code libraries to import data located at this resource: https://www.hydroshare.org/resource/a5b52c0e1493401a815f4e77b09d352b/.
The Synthetic Recharge Jupyter Notebook
The Modeled Recharge Jupyter Notebook
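To make the underlying calculation concrete, here is a minimal NumPy sketch of the Monte Carlo infinite-slope computation that the component performs; it is not the Landlab LandslideProbability API itself, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # Monte Carlo iterations

# Illustrative parameters for one grid cell (SI units); the component draws
# these from field distributions attached to the Landlab grid.
theta = np.radians(30.0)                      # slope angle
soil_depth = 1.0                              # m
rho_s, rho_w, g = 1800.0, 1000.0, 9.81        # densities (kg/m^3), gravity (m/s^2)
cohesion = rng.uniform(10e3, 20e3, n)         # total cohesion, Pa
phi = np.radians(rng.uniform(30.0, 40.0, n))  # internal friction angle
wetness = np.clip(rng.uniform(0.0, 1.5, n), 0.0, 1.0)  # relative wetness

# Infinite slope factor of safety.
fs = (cohesion / (rho_s * g * soil_depth * np.sin(theta) * np.cos(theta))
      + (1.0 - wetness * rho_w / rho_s) * np.tan(phi) / np.tan(theta))

print("P(failure) =", np.mean(fs < 1.0))  # fraction of iterations with FS < 1
```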
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
These are the underlying data sets needed to build the kriging maps and calculate dissemination block cumulative hazard indices described in the paper. There are three data sets:
"Sampling location names and coordinates.csv": locations and IDs of the low-cost sensors and the regulatory monitoring stations used in this work. [NOTE: latitudes and longitudes for the sensor deployments have been intentionally rounded to protect the location of volunteer sensor hosts.]
"Dissemination Block Populations.csv": These are the relevant dissemination blocks in the study domain and their associated populations. This information was originally extracted from: https://censusmapper.ca/#13/49.2430/-123.1252
"Daily average concentrations by site and pollutant.csv": This contains the PM2.5, NO2 and O3 daily averages for the entire study period across all low-cost sensor sites and regulatory monitoring stations. Refer to "Sampling location names and coordinates.csv" to parse the labels in this data set.
There is also sample code in Python to construct the kriging maps, provided in 2 formats. [NOTE: we have intentionally excluded uploading the exact data sets imported by this code; our original data contains exact locations of sensor host volunteers and thus cannot be shared.]
"Jain et al - GeoHealth - Kriging Script.ipynb": A Jupyter notebook script to import the data, build kriging maps, calculate CHIs, and export the data.
"Jain et al - GeoHealth - Kriging Script.pdf": A PDF export of the Jupyter notebook so that you can read the Python scripts even if you are not a Jupyter notebooks user.
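Since the exact input files are withheld, here is a hedged, self-contained sketch of ordinary kriging with pykrige (one common choice; the archive's script may use different libraries or variogram settings, and all coordinates/values below are synthetic stand-ins):

```python
import numpy as np
from pykrige.ok import OrdinaryKriging

# Synthetic stand-ins for sensor coordinates and one day of PM2.5 averages.
lon = np.array([-123.10, -123.05, -123.20, -123.15])
lat = np.array([49.20, 49.25, 49.22, 49.28])
pm25 = np.array([5.1, 7.3, 6.2, 8.0])

ok = OrdinaryKriging(lon, lat, pm25, variogram_model="spherical")
grid_lon = np.linspace(-123.25, -123.00, 50)
grid_lat = np.linspace(49.18, 49.30, 50)
z, ss = ok.execute("grid", grid_lon, grid_lat)  # kriged surface + kriging variance
```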
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
Gaia EDR3 Catalogs of Machine-Learned Radial Velocities
Spatially complete Test-Set and Machine-Learned Radial Velocity (ML-RV) Catalogs described in Dropulic et al., arXiv:2205.12278. The spatially complete Test-Set Catalog contains a total of 4,332,657 stars, while the spatially complete ML-RV Catalog contains 91,840,346 stars. We provide Gaia EDR3 Source IDs, the network-predicted line-of-sight velocity in km/s, and the network-predicted uncertainty in km/s.
We have included a simple Jupyter notebook demonstrating how to import the data and make a simple histogram with it.
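In that spirit, a hedged sketch of the histogram step (the file name and column names below are assumptions for illustration; use the names documented in the included notebook):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed file/column names; the shipped notebook documents the actual ones.
cat = pd.read_csv("mlrv_catalog.csv")
plt.hist(cat["v_los_pred"], bins=100)
plt.xlabel("Predicted line-of-sight velocity [km/s]")
plt.ylabel("Stars")
plt.show()
```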
If you find this catalog useful in your work, please cite Dropulic et al. arXiv:2205.12278, as well as Dropulic et al. ApJL 915, L14 (2021) arXiv:2103.14039.
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
This resource contains Jupyter Notebooks with examples for accessing USGS NWIS data via web services and performing subsequent analysis related to drought, with particular focus on sites in Utah and the southwestern United States (the examples could be modified for any USGS sites). The code uses the Python DataRetrieval package. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.
This resource consists of 6 example notebooks:
1. Example 1: Import and plot daily flow data
2. Example 2: Import and plot instantaneous flow data for multiple sites
3. Example 3: Perform analyses with USGS annual statistics data
4. Example 4: Retrieve data and find daily flow percentiles
5. Example 5: Further examination of drought year flows
6. Coding challenge: Assess drought severity
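For orientation, the DataRetrieval call pattern used throughout looks roughly like this; the site number and date range are placeholders (parameter code 00060 is USGS daily discharge):

```python
from dataretrieval import nwis

# Placeholder USGS gage ID and date range; 00060 = discharge.
df = nwis.get_record(sites="09380000", service="dv",
                     start="2020-01-01", end="2020-12-31",
                     parameterCd="00060")
print(df.head())
```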
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
Dataset Information
This dataset presents long-term indoor solar harvesting traces, jointly monitored with the ambient conditions. The data is recorded at 6 indoor positions with diverse characteristics at our institute at ETH Zurich in Zurich, Switzerland.
The data is collected with a measurement platform [3] consisting of a solar panel (AM-5412) connected to a bq25505 energy harvesting chip that stores the harvested energy in a virtual battery circuit. Two TSL45315 light sensors placed on opposite sides of the solar panel monitor the illuminance level and a BME280 sensor logs ambient conditions like temperature, humidity and air pressure.
The dataset contains the measurement of the energy flow at the input and the output of the bq25505 harvesting circuit, as well as the illuminance, temperature, humidity and air pressure measurements of the ambient sensors. The following timestamped data columns are available in the raw measurement format, as well as preprocessed and filtered HDF5 datasets:
V_in - Converter input/solar panel output voltage, in volt
I_in - Converter input/solar panel output current, in ampere
V_bat - Battery voltage (emulated through circuit), in volt
I_bat - Net Battery current, in/out flowing current, in ampere
Ev_left - Illuminance left of solar panel, in lux
Ev_right - Illuminance right of solar panel, in lux
P_amb - Ambient air pressure, in pascal
RH_amb - Ambient relative humidity, unit-less between 0 and 1
T_amb - Ambient temperature, in centigrade Celsius
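For example, the converter input power can be derived directly from these columns. A minimal sketch, assuming a processed HDF5 file (the file name and HDF5 key below are illustrative; the actual names are listed in yyyy_mm_processed.files.md):

```python
import pandas as pd

# Illustrative path/key; consult yyyy_mm_processed.files.md for real file names.
df = pd.read_hdf("processed/pos01_power.h5", key="data")
p_in = df["V_in"] * df["I_in"]  # converter input power, in watt
print("mean harvested input power [W]:", p_in.mean())
```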
The following publication presents an overview of the dataset and more details on the deployment used for data collection. A copy of the abstract is included in this dataset; see the file abstract.pdf.
L. Sigrist, A. Gomez, and L. Thiele. "Dataset: Tracing Indoor Solar Harvesting." In Proceedings of the 2nd Workshop on Data Acquisition To Analysis (DATA '19), 2019.
Folder Structure and Files
processed/ - This folder holds the imported, merged and filtered datasets of the power and sensor measurements. The datasets are stored in HDF5 format and split by measurement position posXX and by power and ambient sensor measurements. The files belonging to this folder are contained in archives named yyyy_mm_processed.tar, where yyyy and mm represent the year and month the data was published. A separate file lists the exact content of each archive (see below).
raw/ - This folder holds the raw measurement files recorded with the RocketLogger [1, 2] and using the measurement platform available at [3]. The files belonging to this folder are contained in archives named yyyy_mm_raw.tar, where yyyy and mm represent the year and month the data was published. A separate file lists the exact content of each archive (see below).
LICENSE - License information for the dataset.
README.md - The README file containing this information.
abstract.pdf - A copy of the above mentioned abstract submitted to the DATA '19 Workshop, introducing this dataset and the deployment used to collect it.
raw_import.ipynb [open in nbviewer] - Jupyter Python notebook to import, merge, and filter the raw dataset from the raw/ folder. This is the exact code used to generate the processed dataset and store it in HDF5 format in the processed/ folder.
raw_preview.ipynb [open in nbviewer] - This Jupyter Python notebook imports the raw dataset directly and plots a preview of the full power trace for all measurement positions.
processing_python.ipynb [open in nbviewer] - Jupyter Python notebook demonstrating the import and use of the processed dataset in Python. Calculates column-wise statistics, includes more detailed power plots and the simple energy predictor performance comparison included in the abstract.
processing_r.ipynb [open in nbviewer] - Jupyter R notebook demonstrating the import and use of the processed dataset in R. Calculates column-wise statistics and extracts and plots the energy harvesting conversion efficiency included in the abstract. Furthermore, the harvested power is analyzed as a function of the ambient light level.
Dataset File Lists
Processed Dataset Files
The list of the processed datasets included in the yyyy_mm_processed.tar archive is provided in yyyy_mm_processed.files.md. The markdown formatted table lists the name of all files, their size in bytes, as well as the SHA-256 sums.
Raw Dataset Files
A list of the raw measurement files included in the yyyy_mm_raw.tar archive(s) is provided in yyyy_mm_raw.files.md. The markdown formatted table lists the name of all files, their size in bytes, as well as the SHA-256 sums.
Dataset Revisions
v1.0 (2019-08-03)
Initial release. Includes the data collected from 2017-07-27 to 2019-08-01. The dataset archive files related to this revision are 2019_08_raw.tar and 2019_08_processed.tar. For position pos06, the measurements from 2018-01-06 00:00:00 to 2018-01-10 00:00:00 are filtered (data inconsistency in file indoor1_p27.rld).
v1.1 (2019-09-09)
Revision of the processed dataset v1.0 and addition of the final dataset abstract. Updated processing scripts reduce the timestamp drift in the processed dataset, the archive 2019_08_processed.tar has been replaced. For position pos06, the measurements from 2018-01-06 16:00:00 to 2018-01-10 00:00:00 are filtered (indoor1_p27.rld data inconsistency).
v2.0 (2020-03-20)
Addition of new data. Includes the raw data collected from 2019-08-01 to 2020-03-16. The processed data is updated with full coverage from 2017-07-27 to 2020-03-16. The dataset archive files related to this revision are 2020_03_raw.tar and 2020_03_processed.tar.
Dataset Authors, Copyright and License
Authors: Lukas Sigrist, Andres Gomez, and Lothar Thiele
Contact: Lukas Sigrist (lukas.sigrist@tik.ee.ethz.ch)
Copyright: (c) 2017-2019, ETH Zurich, Computer Engineering Group
License: Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/)
References
[1] L. Sigrist, A. Gomez, R. Lim, S. Lippuner, M. Leubin, and L. Thiele. Measurement and validation of energy harvesting IoT devices. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.
[2] ETH Zurich, Computer Engineering Group. RocketLogger Project Website, https://rocketlogger.ethz.ch/.
[3] L. Sigrist. Solar Harvesting and Ambient Tracing Platform, 2019. https://gitlab.ethz.ch/tec/public/employees/sigristl/harvesting_tracing
License: CC0 1.0 Universal (Public Domain Dedication) - https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
The Open-Orca Augmented FLAN Collection is a revolutionary dataset that unlocks new levels of language understanding and machine learning model performance. This dataset was created to support research on natural language processing, machine learning models, and language understanding through leveraging the power of reasoning trace-enhancement techniques. By enabling models to understand complex relationships between words, phrases, and even entire sentences in a more robust way than ever before, this dataset provides researchers expanded opportunities for furthering the progress of linguistics research. With its unique combination of features including system prompts, questions from users and responses from systems, this dataset opens up exciting possibilities for deeper exploration into the cutting edge concepts underlying advanced linguistics applications. Experience a new level of accuracy and performance - explore Open-Orca Augmented FLAN Collection today!
This guide provides an introduction to the Open-Orca Augmented FLAN Collection dataset and outlines how researchers can utilize it for their language understanding and natural language processing (NLP) work. The Open-Orca dataset includes system prompts, questions posed by users, and responses from the system.
Getting Started: The first step is to download the data set from Kaggle at https://www.kaggle.com/openai/open-orca-augmented-flan and save it in a project directory of your choice on your computer or in cloud storage. Once you have downloaded the data set, launch the Jupyter Notebook or Google Colab environment you want to work with.
Exploring & Preprocessing Data: To get a better understanding of the features in this dataset, import them into a Pandas DataFrame as shown below. You can use other libraries as per your need:

```python
import pandas as pd  # library used for importing datasets into Python

df = pd.read_csv('train.csv')  # import the train CSV file into a DataFrame
df[['system_prompt', 'question', 'response']].head()  # view top 5 rows of these columns
```

After importing, check each feature using basic descriptive statistics such as Pandas value_counts/groupby statements, which give greater clarity over the elements present in each feature. The command below shows the count of each element in the system_prompt column of the train CSV file:

```python
df['system_prompt'].value_counts().head()  # counts of each distinct system prompt
```

Data Transformation: After inspecting and exploring the different features, you may want or need certain changes that best suit your needs before training modeling algorithms on this dataset. A common transformation step is removing punctuation marks: since punctuation may not add value to computation operations, it can be removed with a regex replace such as .str.replace('[^A-Za-z ]+', '', regex=True), as shown below.
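A hedged end-to-end version of that cleaning step (column names follow the guide above; the derived column name is illustrative):

```python
import pandas as pd

df = pd.read_csv('train.csv')
# Strip everything except letters and spaces from the question text.
df['question_clean'] = df['question'].str.replace('[^A-Za-z ]+', '', regex=True)
print(df[['question', 'question_clean']].head())
```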
- Automated Question Answering: Leverage the dataset to train and develop question answering models that can provide tailored answers to specific user queries while retaining language understanding abilities.
- Natural Language Understanding: Use the dataset as an exploratory tool for fine-tuning natural language processing applications, such as sentiment analysis, document categorization, parts-of-speech tagging and more.
- Machine Learning Optimizations: The dataset can be used to build highly customized machine learning pipelines that allow users to harness the power of conditioning data with pre-existing rules or models for improved accuracy and performance in automated tasks
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. [See Other Information](ht...
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
Read me file for the data repository

This repository has raw data for the publication "Enhancing Carrier Mobility In Monolayer MoS2 Transistors With Process Induced Strain". We arrange the data following the figure in which it first appeared. For all electrical transfer measurements, we provide the up-sweep and down-sweep data, with voltage units in V and conductance units in S. All Raman modes have units of cm^-1.

How to use this dataset: All data in this dataset is stored in binary NumPy array format as .npy files. To read a .npy file, use the NumPy module of the Python language and the np.load() command. For example, suppose the filename is example_data.npy. To load it into a Python program, open a Jupyter notebook (or, in your Python program) run:

```python
import numpy as np

data = np.load("example_data.npy")
```

The example file is then stored in the data object.
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
The dataset includes the complete workflow for the PI-MAMCO method implementation, demonstrated on a POBC integration case study. The files are structured as follows:
NOTEBOOKS Folder: This folder contains the Jupyter notebooks that constitute the entire workflow of our case study. These notebooks import necessary model equations, data, and functions from the "MODEL" folder. Running these notebooks with the correct Python package versions will reproduce the exact results presented in our submitted article.
MODEL Folder: This folder includes all model equations, data, and functions required by the Jupyter notebooks. The scripts in this folder are integral to executing the PI-MAMCO method as outlined in our research.
OUTPUTS Folder: The results generated by running the notebooks are saved in this folder. This includes all outputs reported in our manuscript as well as additional results not included in the paper.
OUTPUTS_GENERATED Folder: The files contained in this folder are the expected results from executing the case study notebook.
Reproducibility: To ensure the reproducibility of our results, please ensure that you are using the correct versions of the required Python packages. Detailed instructions for setting up the Python environment are provided within the notebook and the readme.txt.
Also contained are outputs from the IRenE procedure used for the literature review (01_extracted_keywords.xlsx; 02_sampling_results.csv).
This data originates from the Crossref API. It contains metadata on the articles in the Data Citation Corpus for citation pairs whose dataset identifier is a DOI.
How to recreate this dataset in Jupyter Notebook:
1) Prepare the list of articles to query:

```python
import pandas as pd

CITATIONS_PARQUET = "data_citation_corpus_filtered_v4.1.parquet"

citation_pairs = pd.read_parquet(CITATIONS_PARQUET)

# Drop non-DOI web links and figshare entries, then keep DOI-based pairs only.
citation_pairs = citation_pairs[
    ~((citation_pairs['dataset'].str.contains("https"))
      & (~citation_pairs['dataset'].str.contains("doi.org")))
]
citation_pairs = citation_pairs[~citation_pairs['dataset'].str.contains("figshare")]
citation_pairs['is_doi'] = citation_pairs['dataset'].str.contains('doi.org', na=False)
citation_pairs_doi = citation_pairs[citation_pairs['is_doi'] == True].copy()

articles = list(set(citation_pairs_doi['publication'].to_list()))
articles = [doi.replace("_", "/") for doi in articles]

with open("articles.txt", "w") as f:
    for article in articles:
        f.write(f"{article}\n")
```
2) Query articles from the CrossRef API:

```python
%%writefile enrich.py
#!pip install -q aiolimiter
import sys, pathlib, asyncio, aiohttp, orjson, sqlite3, time
from aiolimiter import AsyncLimiter

# ---------- config ----------
HEADERS = {"User-Agent": "ForDataCiteEnrichment (mailto:your_email)"}  # Put your email here
MAX_RPS = 45          # polite pool limit (50), leave head-room
BATCH_SIZE = 10_000   # rows per INSERT
DB_PATH = pathlib.Path("crossref.sqlite").resolve()
ARTICLES = pathlib.Path("articles.txt")
# -----------------------------

# ---- platform tweak: prefer selector loop on Windows ----
if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

# ---- read the DOI list ----
with ARTICLES.open(encoding="utf-8") as f:
    DOIS = [line.strip() for line in f if line.strip()]

# ---- make sure DB & table exist BEFORE the async part ----
DB_PATH.parent.mkdir(parents=True, exist_ok=True)
with sqlite3.connect(DB_PATH) as db:
    db.execute("""
        CREATE TABLE IF NOT EXISTS works (
            doi TEXT PRIMARY KEY,
            json TEXT
        )
    """)
    db.execute("PRAGMA journal_mode=WAL;")  # better concurrency

# ---------- async section ----------
limiter = AsyncLimiter(MAX_RPS, 1)   # 45 req / second
sem = asyncio.Semaphore(100)         # cap overall concurrency

async def fetch_one(session, doi: str):
    url = f"https://api.crossref.org/works/{doi}"
    async with limiter, sem:
        try:
            async with session.get(url, headers=HEADERS, timeout=10) as r:
                if r.status == 404:       # common "not found"
                    return doi, None
                r.raise_for_status()      # propagate other 4xx/5xx
                return doi, await r.json()
        except Exception:
            return doi, None              # log later, don't crash

async def main():
    start = time.perf_counter()
    db = sqlite3.connect(DB_PATH)                  # KEEP ONE connection
    db.execute("PRAGMA synchronous = NORMAL;")     # speed tweak
    async with aiohttp.ClientSession(json_serialize=orjson.dumps) as s:
        for chunk_start in range(0, len(DOIS), BATCH_SIZE):
            slice_ = DOIS[chunk_start:chunk_start + BATCH_SIZE]
            tasks = [asyncio.create_task(fetch_one(s, d)) for d in slice_]
            results = await asyncio.gather(*tasks)  # all tuples, no exceptions

            good_rows, bad_dois = [], []
            for doi, payload in results:
                if payload is None:
                    bad_dois.append(doi)
                else:
                    good_rows.append((doi, orjson.dumps(payload).decode()))

            if good_rows:
                db.executemany(
                    "INSERT OR IGNORE INTO works (doi, json) VALUES (?, ?)",
                    good_rows,
                )
                db.commit()

            if bad_dois:  # append for later retry
                with open("failures.log", "a", encoding="utf-8") as fh:
                    fh.writelines(f"{d}\n" for d in bad_dois)

            done = chunk_start + len(slice_)
            rate = done / (time.perf_counter() - start)
            print(f"{done:,}/{len(DOIS):,} ({rate:,.1f} DOI/s)")
    db.close()

if __name__ == "__main__":
    asyncio.run(main())
```
Then run:

```python
!python enrich.py
```
3) Finally, extract the necessary fields:

```python
import sqlite3
import orjson
i...
```
In our NFDITalks, scientists from different disciplines present exciting topics around NFDI and research data management. In this episode, Björn Hagemeier will talk about "Jupyter4NFDI - a central Jupyter Hub for the NFDI".
Jupyter Notebooks are widespread across scientific disciplines today. However, their deployment across the various NFDI consortia currently occurs through individual JupyterHubs, resulting in access barriers to computational and data resources. Whereas some of these services are widely available, others are barricaded within VPNs or otherwise inaccessible to a wider audience. Our ambition is to improve the user experience by offering a centralized service to extend the reach of Jupyter to a broader audience within the NFDI and beyond. The technical foundation for our service will be the versatile configuration frontend that has been proven to meet user needs for the past seven years at JSC. It is continuously extended and supports an ever-growing set of backend resources, ranging from cloud-based, small-scale JupyterLabs to full-scale remote desktop environments on high-performance computing systems such as Germany's highest-ranked TOP500 system, JUWELS Booster.
Importantly, the centralized system will not only simplify access but also support the import of projects along with their necessary dependencies, fostering an ecosystem conducive to creating reproducible FAIR Digital Objects (FDOs), possibly along with notebook identifiers supported by PID4NFDI.
In this talk, we'll revisit the history of the current solution, the landscape in which we intend to make it available, and give an outlook on the future of the service.
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
This resource contains Jupyter Notebooks with examples that are an introduction to machine learning classification based on residential water use data. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.
This resource consists of 4 example notebooks and a data file.
Notebooks: 1. Example 1: Data import and exploration 2. Example 2: Implementing a first machine learning model 3. Example 3: Comparing multiple machine learning models 4. Example 4: Model optimization by hyperparameter tuning
Data files: The data is contained in a flat file and is a record of water use data from a single residential property with manually applied labels to classify the water uses. Columns are:
- StartTime: Start date and time of each individual event. Format: 'YYYY-MM-DD HH:MM:SS'
- EndTime: End date and time of each individual event. Format: 'YYYY-MM-DD HH:MM:SS'
- Duration: Duration of each individual event (end time - start time). Units: minutes
- Volume: Volume of water used in each individual event. Units: gallons
- FlowRate: Average flow rate of each individual event. Units: gallons per minute
- Peak: Maximum value observed in each 4-second period within each event. Units: gallons
- Mode: Most frequent value observed in an event. Units: gallons
- Label: Event classification. Values: faucet, toilet, shower, clotheswasher, bathtub
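As a flavor of what the notebooks build up to, here is a hedged classification sketch (the file name is a placeholder; feature and label column names follow the list above):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder file name; use the flat file shipped with this resource.
events = pd.read_csv("water_use_events.csv")
X = events[["Duration", "Volume", "FlowRate", "Peak", "Mode"]]
y = events["Label"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```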
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
This is an Australian extract of Speedtest Open data available at Amazon WS (link below - opendata.aws). AWS data licence is "CC BY-NC-SA 4.0", so use of this data must be:
- non-commercial (NC)
- share-alike (SA) (reuse must add the same licence)
This restricts the standard CC-BY Figshare licence.

A world Speedtest open data file was downloaded (>400MB, 7M lines of data). An extract of Australia's location (lat, long) revealed 88,000 lines of data (attached as csv). A Jupyter notebook of the extract process is attached. See Binder version at Github - https://github.com/areff2000/speedtestAU.
- Install: 173 packages | Downgrade: 1 package | Total download: 432MB. Build container time: approx; load time 25 secs.
- Error: times out - unable to load global data file (6.6M lines).
- Error: overflows the 8GB RAM container provided with the global data file (3GB).
- On local JupyterLab (M2 MBP): loads in 6 mins.
Added Binder from ARDC service: https://binderhub.rc.nectar.org.au (docs: https://ardc.edu.au/resource/fair-for-jupyter-notebooks-a-practical-guide/).

A link to a Twitter thread of outputs is provided. A link to a Data tutorial is provided (GitHub), including a Jupyter Notebook to analyse World Speedtest data, selecting one US State.

Data shows (Q220):
- 3.1M speedtests | 762,000 devices
- 88,000 grid locations (600m * 600m), summarised as a point
- average speed 33.7Mbps (down), 12.4Mbps (up) | max speed 724Mbps
- data is for 600m * 600m grids, showing average speed up/down, number of tests, and number of users (IP). Added centroid, and now lat/long. See tweet of image of centroids, also attached.

NB: Discrepancy Q2-21: Speedtest Global shows June AU average speedtest at 80Mbps, whereas the Q2 mean here is 52Mbps (v17; Q1 45Mbps; v14). Dec 20 Speedtest Global has AU at 59Mbps. Could be a timing difference, or spatial anonymising masking/shaping the highest speeds, or else the data may be inconsistent between the national average and the geospatial detail. Check in upcoming quarters.

Next steps: Histogram - compare Q220, Q121, Q122, per v1.4.ipynb.

Versions:
- v43: Added revised NZ vs AUS graph for Q325 (NZ: Q225), since NZ data was available from Github (link below). Calculated using the PlayNZ.ipynb notebook. See images on Twitter - https://x.com/ValueMgmt/status/1981607615496122814
- v42: Added AUS Q325 (97.6k lines; avg d/l 165.5 Mbps, median d/l 150.8 Mbps; u/l 28.08 Mbps). Imported using v2 Jupyter notebook (MBP 16GB). Mean tests: 24.5. Mean devices: 6.02. Download, extract and publish time: not measured. Download avg is double Q423. Noting: NBN increased d/l speeds from Sept '25 (100 -> 500, 250 -> 750); for 1Gbps, upload speed only increased from 50Mbps to 100Mbps; new 2Gbps services introduced on FTTP and HFC networks.
- v41: Added AUS Q225 (96k lines; avg d/l 130.5 Mbps, median d/l 108.4 Mbps; u/l 22.45 Mbps). Imported using v2 Jupyter notebook (MBP 16GB). Mean tests: 17.2. Mean devices: 5.11. Download, extract and publish: 20 mins. Download avg is double Q422.
- v40: Added AUS Q125 (93k lines; avg d/l 116.6 Mbps, u/l 21.35 Mbps). Imported using v2 Jupyter notebook (MBP 16GB). Mean tests: 16.9. Mean devices: 5.13. Download, extract and publish: 14 mins.
- v39: Added AUS Q424 (95k lines; avg d/l 110.9 Mbps, u/l 21.02 Mbps). Imported using v2 Jupyter notebook (MBP 16GB). Mean tests: 17.2. Mean devices: 5.24. Download, extract and publish: 14 mins.
- v38: Added AUS Q324 (92k lines; avg d/l 107.0 Mbps, u/l 20.79 Mbps). Imported using v2 Jupyter notebook (iMac 32GB). Mean tests: 17.7. Mean devices: 5.33. Added github speedtest-workflow-importv2vis.ipynb; added datavis code to colour-code the national map (per Binder on Github; link below).
- v37: Added AUS Q224 (91k lines; avg d/l 97.40 Mbps, u/l 19.88 Mbps). Imported using the speedtest-workflow-importv2 Jupyter notebook. Mean tests: 18.1. Mean devices: 5.4.
- v36: Load UK data Q1-23 and compare to AUS and NZ Q123 data. Add compare image (au-nz-ukQ123.png); calculated with PlayNZUK.ipynb; data loaded with import-UK.ipynb. UK data is a bit rough and ready as it uses a rectangle to mark out the UK, which includes some of EIRE and FR; indicative only - definitive use needs a geo-clean to exclude neighbouring countries.
- v35: Load Melb geo-maps of speed quartiles (0-25, 25-50, 50-75, 75-100, 100+). Avg in 2020: 41Mbps; avg in 2023: 86Mbps. MelbQ323.png, MelbQ320.png. Calculated with Speedtest-incHist.ipynb; needed to conda install mapclassify. ax=melb.plot(column=..., classification_kwds=dict(bins=[25,50,75,100]))
- v34: Added AUS Q124 (93k lines; avg d/l 87.00 Mbps, u/l 18.86 Mbps). Imported using the speedtest-workflow-importv2 Jupyter notebook. Mean tests: 18.3. Mean devices: 5.5.
- v33: Added AUS Q423 (92k lines; avg d/l 82.62 Mbps). Imported using the speedtest-workflow-importv2 Jupyter notebook. Mean tests: 18.0. Mean devices: 5.6. Added link to Github.
- v32: Recalc AU vs NZ for upload performance; added image, using the PlayNZ Jupyter notebook. NZ approx 40% of locations at or above 100Mbps. Aus
This resource was created for the 2024 New Zealand Hydrological Society Data Workshop in Queenstown, NZ. It contains Jupyter Notebooks with examples for conducting quality control post-processing for in situ aquatic sensor data. The code uses the Python pyhydroqc package to detect anomalies. This resource consists of 4 example notebooks and associated data files. For more information, see the original resource from which this was derived: http://www.hydroshare.org/resource/451c4f9697654b1682d87ee619cd7924.
Notebooks: 1. Example 1: Import and plot data 2. Example 2: Perform rules-based quality control 3. Example 3: Perform model-based quality control (ARIMA) 4. Example 4: Model-based quality control (ARIMA) with user data
Data files: Data files are available for 6 aquatic sites in the Logan River Observatory. Each file contains data for one site for a single year. The files are named according to monitoring site (FranklinBasin, TonyGrove, WaterLab, MainStreet, Mendon, BlackSmithFork) and year. The files were sourced by querying the Logan River Observatory relational database; equivalent data could be obtained from the LRO website or on HydroShare. Additional information on sites, variables, and methods can be found on the LRO website (http://lrodata.usu.edu/tsa/) or HydroShare (https://www.hydroshare.org/search/?q=logan%20river%20observatory). Each file has the same structure: indexed with a datetime column (mountain standard time), with three columns corresponding to each variable. Variable abbreviations and units are:
- temp: water temperature, degrees C
- cond: specific conductance, μS/cm
- ph: pH, standard units
- do: dissolved oxygen, mg/L
- turb: turbidity, NTU
- stage: stage height, cm
For each variable, there are 3 columns:
- Raw data value measured by the sensor (column header is the variable abbreviation).
- Technician quality controlled (corrected) value (column header is the variable abbreviation appended with '_cor').
- Technician labels/qualifiers (column header is the variable abbreviation appended with '_qual').
There is also a file "data.csv" for use with Example 4. If any user wants to bring their own data file, they should structure it similarly to this file with a single column of datetime values and a single column of numeric observations labeled "raw".
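For users bringing their own data, a minimal pandas sketch of shaping a file to match "data.csv" (the input file and column names below are placeholders for whatever your export uses):

```python
import pandas as pd

# Placeholder input file/column names; the target layout is one datetime
# column plus one numeric column named 'raw', per the description above.
df = pd.read_csv("my_sensor_export.csv", parse_dates=["timestamp"])
df = df.rename(columns={"timestamp": "datetime", "value": "raw"})
df[["datetime", "raw"]].to_csv("data.csv", index=False)
```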
This resource contains Jupyter Notebooks with examples for conducting quality control post-processing for in situ aquatic sensor data. The code uses the Python pyhydroqc package. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.
This resource consists of 3 example notebooks and associated data files.
Notebooks: 1. Example 1: Import and plot data 2. Example 2: Perform rules-based quality control 3. Example 3: Perform model-based quality control (ARIMA)
Data files: Data files are available for 6 aquatic sites in the Logan River Observatory. Each file contains data for one site for a single year. The files are named according to monitoring site (FranklinBasin, TonyGrove, WaterLab, MainStreet, Mendon, BlackSmithFork) and year. The files were sourced by querying the Logan River Observatory relational database; equivalent data could be obtained from the LRO website or on HydroShare. Additional information on sites, variables, and methods can be found on the LRO website (http://lrodata.usu.edu/tsa/) or HydroShare (https://www.hydroshare.org/search/?q=logan%20river%20observatory). Each file has the same structure: indexed with a datetime column (mountain standard time), with three columns corresponding to each variable. Variable abbreviations and units are:
- temp: water temperature, degrees C
- cond: specific conductance, μS/cm
- ph: pH, standard units
- do: dissolved oxygen, mg/L
- turb: turbidity, NTU
- stage: stage height, cm
For each variable, there are 3 columns:
- Raw data value measured by the sensor (column header is the variable abbreviation).
- Technician quality controlled (corrected) value (column header is the variable abbreviation appended with '_cor').
- Technician labels/qualifiers (column header is the variable abbreviation appended with '_qual').
License: custom dataset licence - https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-3881
The datasets and codes provided here are associated with our article entitled "Data-driven analysis of structural instabilities in electroactive polymer bilayers based on a variational saddle-point principle". The main idea of the work is to develop surrogate models using the concepts of machine learning (ML) to predict the onset of wrinkling instabilities in dielectric elastomer (DE) bilayers as a function of their tunable geometric and material parameters. The required datasets for building the surrogate models are generated using a finite-element-based framework for structural stability analysis of DE specimens that is rooted in a saddle-point-based variational principle. For a detailed description of this finite-element framework, the sampling of data points for the training/test sets and some brief notes regarding our implementation of the ML-based surrogates, kindly refer to our article mentioned above.

The datasets 'training_set.xlsx' and 'test_set.xlsx' contain the values of the critical buckling load (critical electric-charge density) and critical wrinkle count for the DE bilayer for the sampled data points, where each data point represents a unique set of four tunable input-feature values. The article above provides a description of these features, their physical units and their considered domain of values.

The individual Jupyter notebooks import the training dataset and develop ML models for the different problems that are described in the article. The developed models are cross-validated and then tested on the test dataset. Extensive comments describing the ML workflow are included in the notebooks for the user's reference. The conda environment containing all the necessary packages and dependencies for the execution of the Jupyter notebooks is provided in the file 'de_instabilities.yml'.
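A hedged sketch of the surrogate-modelling pattern the notebooks follow (the regressor choice and the feature/target column names are placeholders; the notebooks document the actual ones):

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor

train = pd.read_excel("training_set.xlsx")
test = pd.read_excel("test_set.xlsx")

features = list(train.columns[:4])  # the four tunable input features
target = "critical_load"            # placeholder target column name

model = GradientBoostingRegressor(random_state=0)
print("CV R^2:", cross_val_score(model, train[features], train[target], cv=5).mean())
model.fit(train[features], train[target])
print("test R^2:", model.score(test[features], test[target]))
```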