The nbedit extension for CKAN allows users to create, edit, and run Jupyter Notebooks directly within the CKAN environment. It enables users to integrate data exploration and analysis workflows alongside their data management activities and streamlines work with datasets by providing an edit view dedicated to notebook editing.
Key Features:
Create Notebook Edit View: Lets users construct edit views in CKAN configured specifically for editing Jupyter Notebooks, laying the foundation for fully integrating analytical workflows with data management.
Jupyter Notebook Server Integration: The extension manages starting and stopping a Jupyter Notebook server, potentially simplifying the technical challenges of deploying such tools within the CKAN ecosystem.
User Authentication and Authorization: The extension creates a corresponding JupyterHub user for each CKAN user and dynamically requests API tokens to manage access, so users can open notebooks with tokens tied to their current CKAN sessions.
Project-Based Notebook Management: The extension maps CKAN projects to corresponding JupyterHub groups, enabling better administration and reporting across datasets and notebook-related activities.
Fullscreen Editing: Provides an option to open a notebook in fullscreen mode, giving complete focus to data analysis and code development within the CKAN environment.
Technical Integration: The nbedit extension uses an API token set up as a service account for JupyterHub. It automatically creates a corresponding JupyterHub user for each CKAN user and automatically creates groups with administrative reporting capabilities in JupyterHub. The extension likely interfaces with the CKAN resource view system and triggers backend processes to start/stop notebook servers and manage user authentication. It interacts with the JupyterHub API, requesting tokens for users on behalf of the current CKAN session. The extension requires configuration settings in the CKAN configuration file (e.g., /etc/ckan/default/development.ini).
Benefits & Impact: By integrating Jupyter Notebook functionality, this extension allows users to explore, analyze, and edit data in CKAN. The Jupyter Notebook environment can easily be set up to edit and view datasets, and the integration increases productivity by removing the need to switch between CKAN and external Jupyter Notebook servers.
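The user-and-token flow described under Technical Integration maps onto the JupyterHub REST API. The sketch below is only an illustration of that flow and is not taken from the nbedit source: the hub URL, service token, token note, and lifetime are placeholders, and exact endpoint behaviour can vary between JupyterHub versions.

import requests

HUB_API = "http://jupyterhub:8000/hub/api"            # placeholder hub URL
SERVICE_TOKEN = "replace-with-service-account-token"  # token configured for the CKAN service account
HEADERS = {"Authorization": f"token {SERVICE_TOKEN}"}

def ensure_user_and_token(ckan_username):
    # Create the JupyterHub user for this CKAN user; 409 is assumed to mean "already exists".
    resp = requests.post(f"{HUB_API}/users/{ckan_username}", headers=HEADERS)
    if resp.status_code not in (201, 409):
        resp.raise_for_status()
    # Request a short-lived API token on behalf of the user for the current CKAN session.
    resp = requests.post(
        f"{HUB_API}/users/{ckan_username}/tokens",
        headers=HEADERS,
        json={"note": "ckan nbedit session", "expires_in": 3600},
    )
    resp.raise_for_status()
    return resp.json()["token"]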
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource includes two Jupyter Notebooks as a quick start tutorial for the ERA5 Data Component of the PyMT modeling framework (https://pymt.readthedocs.io/) developed by Community Surface Dynamics Modeling System (CSDMS https://csdms.colorado.edu/).
The bmi_era5 package is an implementation of the Basic Model Interface (BMI https://bmi.readthedocs.io/en/latest/) for the ERA5 dataset (https://confluence.ecmwf.int/display/CKB/ERA5). This package uses the cdsapi (https://cds.climate.copernicus.eu/api-how-to) to download the ERA5 dataset and wraps it with BMI for data control and query (currently, 3-dimensional ERA5 datasets are supported). This package is not implemented for people to use directly but is the key element that helps convert the ERA5 dataset into a data component for the PyMT modeling framework.
The pymt_era5 package is implemented for people to use as a reusable, plug-and-play ERA5 data component for the PyMT modeling framework. This package uses the BMI implementation from the bmi_era5 package and allows the ERA5 datasets to be easily coupled with other datasets or models that expose a BMI.
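A data component that exposes a BMI is normally driven through the standard BMI calls (initialize, get_value, update, finalize). The sketch below shows that generic pattern only; the BmiEra5 class name, the configuration file name, and the variable name are assumptions and should be checked against the bmi_era5 documentation and the notebooks in this resource.

import numpy as np
from bmi_era5 import BmiEra5   # assumed import path; see the bmi_era5 README

data_comp = BmiEra5()
data_comp.initialize("config_file.yaml")   # YAML file describing the cdsapi request (assumed name)

# Inspect the exposed variables and pull one of them into a NumPy buffer.
print(data_comp.get_output_var_names())
var_name = "2 metre temperature"           # placeholder variable name
grid_id = data_comp.get_var_grid(var_name)
dest = np.empty(data_comp.get_grid_size(grid_id), dtype=data_comp.get_var_type(var_name))
data_comp.get_value(var_name, dest)

data_comp.finalize()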
HydroShare users can test and run the Jupyter Notebooks (bmi_era5.ipynb, pymt_era5.ipynb) directly through the "CUAHSI JupyterHub" web app with the following steps:
- New users of the CUAHSI JupyterHub should first request to join the "CUAHSI Cloud Computing Group" (https://www.hydroshare.org/group/156). After approval, the user will gain access to launch the CUAHSI JupyterHub.
- Click on the "Open with" button (at the top right corner of the page).
- Select "CUAHSI JupyterHub".
- Select the "CSDMS Workbench" server option. (Make sure to select the right server option; otherwise, the notebook won't run correctly.)
If there is any question or suggestion about the ERA5 data component, please create a GitHub issue at https://github.com/gantian127/bmi_era5/issues
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource includes two Jupyter Notebooks as a quick start tutorial for the ROMS data component of the PyMT modeling framework (https://pymt.readthedocs.io/) developed by Community Surface Dynamics Modeling System (CSDMS https://csdms.colorado.edu/).
The bmi_roms package is an implementation of the Basic Model Interface (BMI https://bmi.readthedocs.io/en/latest/) for the ROMS model (https://www.myroms.org/) datasets. This package downloads the datasets and wraps them with BMI for data control and query. This package is not implemented for people to use directly but is the key element to convert the ROMS model output dataset into a data component for the PyMT modeling framework.
The pymt_roms package is implemented for people to use as a reusable, plug-and-play ROMS data component for the PyMT modeling framework. This package uses the BMI implementation from the bmi_roms package and allows the ROMS datasets to be easily coupled with other datasets or models that expose a BMI.
If there is any question or suggestion about the ROMS data component, please create a GitHub issue at https://github.com/gantian127/bmi_roms/issues
Objective: Daily COVID-19 data reported by the World Health Organization (WHO) may provide the basis for political ad hoc decisions including travel restrictions. Data reported by countries, however, is heterogeneous and metrics to evaluate its quality are scarce. In this work, we analyzed COVID-19 case counts provided by WHO and developed tools to evaluate country-specific reporting behaviors.
Methods: In this retrospective cross-sectional study, COVID-19 data reported daily to WHO from 3rd January 2020 until 14th June 2021 were analyzed. We proposed the concepts of binary reporting rate and relative reporting behavior and performed descriptive analyses for all countries with these metrics. We developed a score to evaluate the consistency of incidence and binary reporting rates. Further, we performed spectral clustering of the binary reporting rate and relative reporting behavior to identify salient patterns in these metrics.
Results: Our final analysis included 222 countries and regions...
Data collection: COVID-19 data was downloaded from WHO. Using a public repository, we added the countries' full names to the WHO data set, using the two-letter abbreviations for each country to merge both data sets. The provided COVID-19 data covers January 2020 until June 2021. We uploaded the final data set used for the analyses of this paper.
Data processing: We processed the data using a Jupyter Notebook with a Python kernel and publicly available external libraries. This upload contains the required Jupyter Notebook (reporting_behavior.ipynb) with all analyses and some additional work, a README, and the conda environment yml (env.yml).
Any text editor, including Microsoft Excel and its free alternatives, can open the uploaded CSV file. Any web browser and some code editors (such as the freely available Visual Studio Code) can display the uploaded Jupyter Notebook if the required Python environment is set up correctly.
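The binary reporting rate mentioned above can be read as the share of days on which a country actually filed a report; the authoritative definition is in reporting_behavior.ipynb. The pandas sketch below only illustrates that reading and assumes a long-format WHO export with Country, Date_reported, and New_cases columns (the real file and column names may differ).

import pandas as pd

# Assumed long-format WHO export: one row per country and day (file name is a placeholder).
df = pd.read_csv("WHO-COVID-19-global-data.csv", parse_dates=["Date_reported"])

# Binary reporting indicator: did the country report a case count on that day?
df["reported"] = df["New_cases"].notna().astype(int)

# Binary reporting rate per country: fraction of days in the study period with a report.
binary_reporting_rate = df.groupby("Country")["reported"].mean()
print(binary_reporting_rate.sort_values().head())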
For the automated workflows, we created Jupyter notebooks for each state. In these workflows, the GIS processing to merge, extract, and project GeoTIFF data was the most important step. For this process, we used ArcPy, a Python package for geographic data analysis, data conversion, and data management in ArcGIS (Toms, 2015). After creating state-scale LSS datasets in GeoTIFF format, we converted GeoTIFF to NetCDF using the xarray and rioxarray Python packages. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio xarray extension; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. We used xarray to adjust data types and add metadata to the NetCDF file, and rioxarray to save GeoTIFF data in NetCDF format. Through these procedures, we created three composite HydroShare resources to share the state-scale LSS datasets. Due to the licensing limitations of ArcGIS Pro, which is commercial GIS software, we developed this Jupyter notebook on Windows OS.
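A minimal sketch of the GeoTIFF-to-NetCDF step described above, using rioxarray; the file names and metadata attributes are placeholders, and the ArcPy merge/extract/project steps are assumed to have already produced the input GeoTIFF.

import rioxarray

# Open the state-scale GeoTIFF produced by the ArcPy workflow (placeholder file name).
da = rioxarray.open_rasterio("state_lss.tif", masked=True)

# Adjust the data type and attach metadata with xarray before writing NetCDF.
da = da.astype("float32")
da.attrs.update({
    "long_name": "state-scale LSS dataset",  # placeholder metadata
    "units": "unknown",                      # placeholder metadata
})

# The CRS travels along as a spatial_ref coordinate, so the NetCDF stays georeferenced.
da.to_netcdf("state_lss.nc")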
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset compares fixed-line broadband internet speeds across five cities: - Melbourne, AU - Bangkok, TH - Shanghai, CN - Los Angeles, US - Alice Springs, AU
ERRATA: 1. Data is for Q3 2020, but some files are incorrectly labelled as 02-20 or June 20. They should all read Sept 20, or 09-20, i.e. Q3 20 rather than Q2. Will rename and reload. Amended in v7.
*Lines of data for each geojson file; a line equates to a 600m^2 location, inc total tests, devices used, and average upload and download speed:
- MEL 16181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
- SHG 31745 lines => 0.65M speedtests (2.5/100pp)
- BKK 29296 lines => 1.5M speedtests (14.3/100pp)
- LAX 15899 lines => 1.3M speedtests (10.4/100pp)
- ALC 76 lines => 500 speedtests (2/100pp)
Geojsons of these 2° by 2° extracts for MEL, BKK, SHG now added, LAX added in v6, and Alice Springs added in v15.
This dataset unpacks, geospatially, data summaries provided in Speedtest Global Index (linked below). See Jupyter Notebook (*.ipynb) to interrogate geo data. See link to install Jupyter.
** To Do: Will add Google Map versions so everyone can see without installing Jupyter. - Link to Google Map (BKK) added below. Key: Green > 100Mbps (Superfast), Black > 500Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook. - Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melb has 20% at or above 100Mbps. Suggest plotting the Top 20% on a map for the community. Google Map link now added (and tweet).
** Python
melb = au_tiles.cx[144:146, -39:-37]   # Lat/Lon extract
shg = tiles.cx[120:122, 30:32]         # Lat/Lon extract
bkk = tiles.cx[100:102, 13:15]         # Lat/Lon extract
lax = tiles.cx[-118:-120, 33:35]       # Lat/Lon extract
ALC = tiles.cx[132:134, -22:-24]       # Lat/Lon extract
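For context, the tile variables used in the extracts above would typically come from loading the GeoJSON files with geopandas; the sketch below is illustrative only, the file name is a placeholder, and the column names follow the Ookla open-data schema and may differ in these extracts.

import geopandas as gpd

# Load a city extract (placeholder file name); .cx is geopandas' coordinate-based indexer.
au_tiles = gpd.read_file("melbourne_tiles_q3_2020.geojson")
melb = au_tiles.cx[144:146, -39:-37]   # lon/lat bounding box, as in the snippet above

print(melb[["avg_d_kbps", "avg_u_kbps", "tests", "devices"]].head())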
Histograms (v9) and data visualisations (v3, 5, 9, 11) will be provided. Data sourced from: this is an extract of Speedtest Open Data available at Amazon WS (link below - opendata.aws).
**VERSIONS
v24. Add tweet and Google Map of Top 20% (over 100Mbps locations) in Mel Q3 22. Add v1.5 MEL-Superfast notebook and CSV of results (now on Google Map; link below).
v23. Add graph of 2022 broadband distribution and compare 2020 - 2022. Updated v1.4 Jupyter notebook.
v22. Add Import ipynb; workflow-import-4cities.
v21. Add Q3 2022 data; five cities inc ALC. Geojson files. (2020: 4.3M tests; 2022: 2.9M tests)
v20. Speedtest - Five Cities inc ALC.
v19. Add ALC2.ipynb.
v18. Add ALC line graph.
v17. Added ipynb for ALC. Added ALC to title.
v16. Load Alice Springs data Q2 21 - csv. Added Google Map link of ALC.
v15. Load Melb Q1 2021 data - csv.
v14. Added Melb Q1 2021 data - geojson.
v13. Added Twitter link to pics.
v12. Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb).
v11. Add Line-Compare pic, plotting four cities on a graph.
v10. Add four histograms in one pic.
v9. Add histogram for four cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook).
v8. Renamed LAX file to Q3, rather than 03.
v7. Amended file names of BKK files to correctly label as Q3, not Q2 or 06.
v6. Added LAX file.
v5. Add screenshot of BKK Google Map.
v4. Add BKK Google Map (link below), and BKK csv mapping files.
v3. Replaced MEL map with big-key version. Previous key was very tiny in top right corner.
v2. Uploaded MEL, SHG, BKK data and Jupyter Notebook.
v1. Metadata record.
** LICENCE: The AWS data licence on the Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be: - non-commercial (NC) - share-alike (SA) on reuse (apply the same licence). This restricts the standard CC-BY Figshare licence.
** Other uses of Speedtest Open Data; - see link at Speedtest below.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all comments (comments and replies) of the YouTube vision video "Tunnels" by "The Boring Company", fetched on 2020-10-13 using the YouTube API. The comments were classified manually by three persons. We performed a single-class labeling of the video comments regarding their relevance for requirements engineering (RE) (ham/spam) and their polarity (positive/neutral/negative). Furthermore, we performed a multi-class labeling of the comments regarding their intention (feature request and problem report) and their topic (efficiency and safety). While a comment can only be relevant or not relevant and have only one polarity, a comment can have one or more intentions and also one or more topics.
For the replies, one person also classified them regarding their relevance for RE. However, the investigation of the replies is ongoing and future work.
Remark: For 126 comments and 26 replies, we could not determine the date and time since they were no longer accessible on YouTube at the time this data set was created. In the case of a missing date and time, we inserted "NULL" in the corresponding cell.
This data set includes the following files:
Dataset.xlsx contains the raw and labeled video comments and replies:
For each comment, the data set contains:
ID: An identification number generated by YouTube for the comment
Date: The date and time of the creation of the comment
Author: The username of the author of the comment
Likes: The number of likes of the comment
Replies: The number of replies to the comment
Comment: The written comment
Relevance: Label indicating the relevance of the comment for RE (ham = relevant, spam = irrelevant)
Polarity: Label indicating the polarity of the comment
Feature request: Label indicating that the comment requests a feature
Problem report: Label indicating that the comment reports a problem
Efficiency: Label indicating that the comment deals with the topic efficiency
Safety: Label indicating that the comment deals with the topic safety
For each reply, the data set contains:
ID: The identification number of the comment to which the reply belongs
Date: The date and time of the creation of the reply
Author: The username of the author of the reply
Likes: The number of likes of the reply
Comment: The written reply
Relevance: Label indicating the relevance of the reply for RE (ham = relevant, spam = irrelevant)
Detailed analysis results.xlsx contains the detailed results of the ten-times-repeated 10-fold cross-validation analyses for each of the considered combinations of machine learning algorithms and features
Guide Sheet - Multi-class labeling.pdf describes the coding task, defines the categories, and lists examples to reduce inconsistencies and increase the quality of manual multi-class labeling
Guide Sheet - Single-class labeling.pdf describes the coding task, defines the categories, and lists examples to reduce inconsistencies and increase the quality of manual single-class labeling
Python scripts for analysis.zip contains the scripts (as jupyter notebooks) and prepared data (as csv-files) for the analyses
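A small pandas sketch for getting started with Dataset.xlsx, assuming the comments live on their own worksheet with the column names listed above and that the label columns are binary; the sheet name is a guess and should be checked in the workbook.

import pandas as pd

# Sheet name is an assumption; adjust after inspecting the workbook.
comments = pd.read_excel("Dataset.xlsx", sheet_name="Comments")

# Relevance uses ham = relevant, spam = irrelevant (see the column description above).
relevant = comments[comments["Relevance"] == "ham"]
print(f"{len(relevant)} of {len(comments)} comments are relevant for RE")

# Multi-class labels: a comment may be both a feature request and a problem report
# (assumes 0/1 label columns).
print(relevant[["Feature request", "Problem report", "Efficiency", "Safety"]].sum())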
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Git archive containing Python modules and resources used to generate machine-learning models used in the "Applications of Machine Learning Techniques to Geothermal Play Fairway Analysis in the Great Basin Region, Nevada" project. This software is licensed as free to use, modify, and distribute with attribution. Full license details are included within the archive. See "documentation.zip" for setup instructions and file trees annotated with module descriptions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource includes two Jupyter Notebooks as a quick start tutorial for the NWIS Data Component of the PyMT modeling framework (https://pymt.readthedocs.io/) developed by Community Surface Dynamics Modeling System (CSDMS https://csdms.colorado.edu/).
The bmi_nwis package is an implementation of the Basic Model Interface (BMI https://bmi.readthedocs.io/en/latest/) for the USGS NWIS dataset (https://waterdata.usgs.gov/nwis). This package uses the dataretrieval package (https://github.com/USGS-python/dataretrieval) to download the NWIS dataset and wraps the dataset with BMI for data control and query. This package is not implemented for people to use but is the key element to convert the NWIS dataset into a data component for the PyMT modeling framework.
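The dataretrieval download step that bmi_nwis builds on can also be exercised on its own. The sketch below follows the USGS dataretrieval package's nwis.get_record interface; the site number, service, and date range are placeholders.

from dataretrieval import nwis

# Download daily-value records for a placeholder USGS site and date range.
df = nwis.get_record(
    sites="03339000",   # placeholder site number
    service="dv",       # daily values
    start="2021-01-01",
    end="2021-01-31",
)
print(df.head())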
The pymt_nwis package is implemented for people to use as a reusable, plug-and-play NWIS data component for the PyMT modeling framework. This package uses the BMI implementation from the bmi_nwis package and allows the NWIS datasets to be easily coupled with other datasets or models that expose a BMI.
HydroShare users can test and run the Jupyter Notebooks (bmi_nwis.ipynb, pymt_nwis.ipynb) directly through the "CUAHSI JupyterHub" web app with the following steps:
- New users of the CUAHSI JupyterHub should first request to join the "CUAHSI Cloud Computing Group" (https://www.hydroshare.org/group/156). After approval, the user will gain access to launch the CUAHSI JupyterHub.
- Click on the "Open with" button (at the top right corner of the page).
- Select "CUAHSI JupyterHub".
- Select the "CSDMS Workbench" server option. (Make sure to select the right server option; otherwise, the notebook won't run correctly.)
If there is any question or suggestion about the NWIS data component, please create a GitHub issue at https://github.com/gantian127/bmi_nwis/issues
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication Package
This repository contains data and source files needed to replicate our work described in the paper "Unboxing Default Argument Breaking Changes in Scikit Learn".
Requirements
We recommend the following requirements to replicate our study:
Package Structure
We relied on Docker containers to provide a working environment that is easier to replicate. Specifically, we configure the following containers:
data-analysis, an R-based container we used to run our data analysis.
data-collection, a Python container we used to collect Scikit's default arguments and detect them in client applications.
database, a Postgres container we used to store clients' data, obtained from Grotov et al.
storage, a directory used to store the data processed in data-analysis and data-collection. This directory is shared by both containers.
docker-compose.yml, the Docker file that configures all containers used in the package.
In the remainder of this document, we describe how to set up each container properly.
Using VSCode to Setup the Package
We selected VSCode as the IDE of choice because its extensions allow us to implement our scripts directly inside the containers. In this package, we provide configuration parameters for both the data-analysis and data-collection containers. This way, you can directly open and work inside each container without any specific configuration.
You first need to set up the containers:
$ cd /replication/package/folder
$ docker-compose build
$ docker-compose up
# Wait for Docker to create and run all containers
Then, you can open them in Visual Studio Code:
If you want/need a more customized organization, the remainder of this file describes it in detail.
Longest Road: Manual Package Setup
Database Setup
The database container will automatically restore the dump in dump_matroskin.tar on its first launch. To set up and run the container, you should:
Build an image:
$ cd ./database
$ docker build --tag 'dabc-database' .
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
dabc-database latest b6f8af99c90d 50 minutes ago 18.5GB
Create and enter inside the container:
$ docker run -it --name dabc-database-1 dabc-database
$ docker exec -it dabc-database-1 /bin/bash
root# psql -U postgres -h localhost -d jupyter-notebooks
jupyter-notebooks=# \dt
List of relations
Schema | Name | Type | Owner
--------+-------------------+-------+-------
public | Cell | table | root
public | Code_cell | table | root
public | Md_cell | table | root
public | Notebook | table | root
public | Notebook_features | table | root
public | Notebook_metadata | table | root
public | repository | table | root
If you got the table list as above, your database is properly set up.
It is important to mention that this database is extended from the one provided by Grotov et al. Basically, we added three columns to the table Notebook_features (API_functions_calls, defined_functions_calls, and other_functions_calls) containing the function calls performed by each client in the database.
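A hedged sketch of how the added columns could be inspected from Python with psycopg2, reusing the connection parameters from the psql example above; the password is a placeholder, and the quoting assumes the mixed-case table and column names shown in the \dt output (if they were created unquoted, use lowercase names without quotes).

import psycopg2

conn = psycopg2.connect(
    dbname="jupyter-notebooks", user="postgres", host="localhost",
    password="postgres",  # placeholder
)
with conn, conn.cursor() as cur:
    cur.execute(
        'SELECT "API_functions_calls", "defined_functions_calls", "other_functions_calls" '
        'FROM "Notebook_features" LIMIT 5'
    )
    for row in cur.fetchall():
        print(row)
conn.close()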
Data Collection Setup
This container is responsible for collecting the data to answer our research questions. It has the following structure:
dabcs.py, extracts DABCs from Scikit Learn source code and exports them to a CSV file.
dabcs-clients.py, extracts function calls from clients and exports them to a CSV file. We rely on a modified version of Matroskin to leverage the function calls. You can find the tool's source code in the matroskin directory.
Makefile, commands to set up and run both dabcs.py and dabcs-clients.py.
matroskin, the directory containing the modified version of the matroskin tool. We extended the library to collect the function calls performed in the client notebooks of Grotov's dataset.
storage, a docker volume where data-collection should save the exported data. This data will be used later in Data Analysis.
requirements.txt, Python dependencies adopted in this module.
Note that the container will automatically configure this module for you, e.g., install dependencies, configure matroskin, download the Scikit Learn source code, etc. For this, you must run the following commands:
$ cd ./data-collection
$ docker build --tag "data-collection" .
$ docker run -it -d --name data-collection-1 -v $(pwd)/:/data-collection -v $(pwd)/../storage/:/data-collection/storage/ data-collection
$ docker exec -it data-collection-1 /bin/bash
$ ls
Dockerfile Makefile config.yml dabcs-clients.py dabcs.py matroskin storage requirements.txt utils.py
If you see the project files, it means the container is configured correctly.
Data Analysis Setup
We use this container to conduct the analysis over the data produced by the Data Collection container. It has the following structure:
dependencies.R, an R script containing the dependencies used in our data analysis.
data-analysis.Rmd, the R notebook we used to perform our data analysis.
datasets, a docker volume pointing to the storage directory.
Execute the following commands to run this container:
$ cd ./data-analysis
$ docker build --tag "data-analysis" .
$ docker run -it -d --name data-analysis-1 -v $(pwd)/:/data-analysis -v $(pwd)/../storage/:/data-collection/datasets/ data-analysis
$ docker exec -it data-analysis-1 /bin/bash
$ ls
data-analysis.Rmd datasets dependencies.R Dockerfile figures Makefile
If you see the project files, it means the container is configured correctly.
A note on the shared storage folder
As mentioned, the storage folder is mounted as a volume and shared between the data-collection and data-analysis containers. We compressed the content of this folder due to space constraints. Therefore, before starting work on Data Collection or Data Analysis, make sure you have extracted the compressed files. You can do this by running the Makefile inside the storage folder.
$ make unzip # extract files
$ ls
clients-dabcs.csv clients-validation.csv dabcs.csv Makefile scikit-learn-versions.csv versions.csv
$ make zip # compress files
$ ls
csv-files.tar.gz Makefile
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Purpose: These are a collection of supplementary files that are to be included in my dissertation. They include, but are not limited to, small IPython notebooks, extra figures, and data sets that are too large to publish in the main document, such as full ortholog lists and other primary data.
Viewing IPython notebooks (ipynb files): To view an IPython notebook, right-click its download link and select "Copy link address". Then navigate to the free notebook viewer at http://nbviewer.ipython.org/. Finally, paste the copied link to the ipynb file into the URL form on the nbviewer page and click "Go".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: This data was collected and processed as part of the CRC 985 INF project. It was used to create an overview of the data-producing methods available and employed throughout the project and their associated file types. This information was used as a basis for the associated manuscript (see related identifiers). The Jupyter Notebook used to create the figures in the manuscript is included within this dataset. Furthermore, the surveys give insight into research data management practices within this project and large, interdisciplinary projects in general. Method: Survey
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data was collected and processed as part of the CRC 985 INF project. It was used to create an overview of the data-producing methods available and employed throughout the project and their associated file types. This information was used as a basis for the associated manuscript (see related identifiers). The Jupyter Notebook used to create the figures in the manuscript is included within this dataset. Furthermore, the surveys give insight into research data management practices within this project and large, interdisciplinary projects in general.
This data repository contains the data sets and python scripts associated with the manuscript 'Machine learning isotropic g values of radical polymers'. Electron paramagnetic resonance measurements allow for obtaining experimental g values of radical polymers. Analogous to chemical shifts, g values give insight into the identity and environment of the paramagnetic center. In this work, machine-learning-based prediction of g values is explored as a viable alternative to computationally expensive density functional theory (DFT) methods.
Description of folder contents (switch to tree view):
Datasets: Contains PTMA polymer structures from the TR, TE-1, and TE-2 data sets transformed using a molecular descriptor (SOAP, MBTR or DAD) and the corresponding DFT-calculated g values. Filenames contain 'PTMA_X', where X denotes the number of monomers which are radicals. Structure data sets have 'structure_data' in the title; DFT-calculated g values have 'giso_DFT_data' in the title. The files are in .npy (NumPy) format.
Models: ERT models trained on SOAP, MBTR and DAD feature vectors.
Scripts: Contains scripts which can be used to predict g values from XYZ files of PTMA structures with 6 monomer units and varying radical density. The script 'prediction_functions.py' contains the functions which transform the XYZ coordinates into the appropriate feature vector which the trained model uses to predict. Descriptions of the individual functions are also given as docstrings (Python documentation strings) in the code. The folder also contains additional files needed for the ERT-DAD model in .pkl format.
XYZ_files: Contains atomic coordinates of PTMA structures in XYZ format. Two subfolders, WSD and TE-2, correspond to structures present in the whole structure data set and the TE-2 test data set (see the main text in the manuscript for details). Filenames in the folder 'XYZ_files/TE-2/PTMA-X/' are of the type 'chainlength_6ptma_Y'_Y''.xyz', where 'chainlength_6ptma' denotes the length of the polymer chain (6 monomers), Y' denotes the proportion of monomers which are radicals (for instance, Y' = 50 means 3 out of 6 monomers are radicals), and Y'' denotes the order of the MD time frame. Actual time frame values of Y'' in ps are given in the manuscript.
PTMA-ML.ipynb: Jupyter notebook detailing the workflow of generating the trained model. The file includes steps to load data sets, transform XYZ files using molecular descriptors, optimise hyperparameters, train the model, cross-validate using the training data set, and evaluate the model.
PTMA-ML.pdf: PTMA-ML.ipynb in PDF format.
List of abbreviations:
PTMA: poly(2,2,6,6-tetramethyl-1-piperidinyloxy-4-yl methacrylate)
TR: Training data set
TE-1: Test data set 1
TE-2: Test data set 2
ERT: Extremely randomized trees
WSD: Whole structure data set
SOAP: Smooth overlap of atomic positions
MBTR: Many-body tensor representation
DAD: Distances-Angles-Dihedrals
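A minimal sketch of how the transformed data sets and trained ERT models described above could be loaded and used for prediction; the file names are illustrative, and the intended entry point for end-to-end prediction from XYZ files is prediction_functions.py in the Scripts folder.

import pickle
import numpy as np

# Load a descriptor-transformed structure data set and its DFT g values (illustrative names).
X = np.load("Datasets/PTMA_3_structure_data.npy")
y_dft = np.load("Datasets/PTMA_3_giso_DFT_data.npy")

# Load a trained ERT model stored as a pickle (illustrative name).
with open("Models/ERT_SOAP.pkl", "rb") as fh:
    model = pickle.load(fh)

# Predict isotropic g values and compare against the DFT reference.
y_pred = model.predict(X)
print("mean absolute error:", np.mean(np.abs(y_pred - y_dft)))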
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset corresponding to the journal article "Mitigating the effect of errors in source parameters on seismic (waveform) inversion" by Blom, Hardalupas and Rawlinson, accepted for publication in Geophysical Journal International. In this paper, we demonstrate the effect of errors in source parameters on seismic tomography, with a particular focus on (full) waveform tomography. We study the effect both on forward modelling (i.e. comparing waveforms and measurements resulting from a perturbed vs. unperturbed source) and on seismic inversion (i.e. using a source which contains an (erroneous) perturbation to invert for Earth structure). These data were obtained using Salvus, a state-of-the-art (though proprietary) 3-D solver that can be used for wave propagation simulations (Afanasiev et al., GJI 2018).
This dataset contains:
The entire Salvus project. This project was prepared using Salvus version 0.11.x and 0.12.2 and should be fully compatible with the latter.
A number of Jupyter notebooks used to create all the figures, set up the project and do the data processing.
A number of Python scripts that are used in above notebooks.
Two conda environment .yml files: one with the complete environment as used to produce this dataset, and one with the environment as supplied by Mondaic (the Salvus developers), on top of which I installed basemap and cartopy.
An overview of the inversion configurations used for each inversion experiment and the names of the corresponding figures: inversion_runs_overview.ods / .csv .
Datasets corresponding to the different figures.
One dataset for Figure 1, showing the effect of a source perturbation in a real-world setting, as previously used by Blom et al., Solid Earth 2020
One dataset for Figure 2, showing how different methodologies and assumptions can lead to significantly different source parameters, notably including systematic shifts. This dataset was kindly supplied by Tim Craig (Craig, 2019).
A number of datasets (stored as pickled Pandas dataframes) derived from the Salvus project. We have computed:
travel-time arrival predictions from every source to all stations (df_stations...pkl)
misfits for different metrics for both P-wave centered and S-wave centered windows for all components on all stations, in each case comparing waveforms from a reference source against waveforms from a perturbed source (df_misfits_cc.28s.pkl)
addition of synthetic waveforms for different (perturbed) moment tensors. All waveforms are stored in HDF5 (.h5) files of the ASDF (adaptable seismic data format) type
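A hedged sketch of how the pickled dataframes and the ASDF waveform files could be opened in Python; the misfit file name follows the pattern given above, the waveform file name is a placeholder, and reading the .h5 files assumes the pyasdf package, which is not part of this dataset.

import pandas as pd
import pyasdf

# Pickled Pandas dataframe with misfits derived from the Salvus project.
df_misfits = pd.read_pickle("df_misfits_cc.28s.pkl")
print(df_misfits.head())

# ASDF (adaptable seismic data format) waveform file; the name is a placeholder.
with pyasdf.ASDFDataSet("synthetics_reference.h5", mode="r") as ds:
    print(ds.waveforms.list())  # stations available in the file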
How to use this dataset:
To set up the conda environment:
make sure you have anaconda/miniconda
make sure you have access to Salvus functionality. This is not absolutely necessary, but most of the functionality within this dataset relies on salvus. You can do the analyses and create the figures without, but you'll have to hack around in the scripts to build workarounds.
Set up Salvus / create a conda environment. This is best done following the instructions on the Mondaic website. Check the changelog for breaking changes, in that case download an older salvus version.
Additionally in your conda env, install basemap and cartopy:
conda-env create -n salvus_0_12 -f environment.yml
conda install -c conda-forge basemap
conda install -c conda-forge cartopy
Install LASIF (https://github.com/dirkphilip/LASIF_2.0) and test. The project uses some lasif functionality.
To recreate the figures: This is extremely straightforward. Every figure has a corresponding Jupyter Notebook; it suffices to run the notebook in its entirety.
Figure 1: separate notebook, Fig1_event_98.py
Figure 2: separate notebook, Fig2_TimCraig_Andes_analysis.py
Figures 3-7: Figures_perturbation_study.py
Figures 8-10: Figures_toy_inversions.py
To recreate the dataframes in DATA: This can be done using the example notebooks Create_perturbed_thrust_data_by_MT_addition.py and Misfits_moment_tensor_components.M66_M12.py. The same can easily be extended to the position shift and other perturbations you might want to investigate.
To recreate the complete Salvus project: This can be done using:
the notebook Prepare_project_Phil_28s_absb_M66.py (setting up project and running simulations)
the notebooks Moment_tensor_perturbations.py and Moment_tensor_perturbation_for_NS_thrust.py
For the inversions: using the notebook Inversion_SS_dip.M66.28s.py as an example. See the overview table inversion_runs_overview.ods (or .csv) as to naming conventions.
References:
Michael Afanasiev, Christian Boehm, Martin van Driel, Lion Krischer, Max Rietmann, Dave A May, Matthew G Knepley, Andreas Fichtner, Modular and flexible spectral-element waveform modelling in two and three dimensions, Geophysical Journal International, Volume 216, Issue 3, March 2019, Pages 1675–1692, https://doi.org/10.1093/gji/ggy469
Nienke Blom, Alexey Gokhberg, and Andreas Fichtner, Seismic waveform tomography of the central and eastern Mediterranean upper mantle, Solid Earth, Volume 11, Issue 2, 2020, Pages 669–690, 2020, https://doi.org/10.5194/se-11-669-2020
Tim J. Craig, Accurate depth determination for moderate-magnitude earthquakes using global teleseismic data. Journal of Geophysical Research: Solid Earth, 124, 2019, Pages 1759– 1780. https://doi.org/10.1029/2018JB016902
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Introduction
This Dataverse entry contains replication data for our journal article "Static polarizabilities at the basis set limit: A benchmark of 124 species", published in the Journal of Chemical Theory and Computation. It contains highly precise static polarizabilities computed in a multiwavelet basis in combination with density functional theory (DFT, PBE functional). In addition, the data set contains analysis tools (Jupyter Notebooks with Python3 code) for generating the figures in the journal article.
How to use
Because our multiwavelet data is guaranteed to be at the complete basis set limit (to within the specified limit), it is suitable as a benchmark reference in studies of static polarizabilities where basis set convergence is important. With multiwavelets we don't have to assume that the computed property is at the basis set limit, as is the case with Gaussian type orbital (GTO) basis sets, and it is therefore possible to confirm whether the property of interest computed in a given basis is sufficiently converged with respect to the complete basis set limit. Our benchmark reference can also be used in the development of new methodology that requires accurate training data.
Running the Jupyter Notebooks
The Anaconda Python distribution is usually recommended for obtaining Jupyter Notebook. It can be downloaded from here: https://www.anaconda.com/distribution/ The simplest way to run the notebooks is to download all files in this DataverseNO dataset. That will preserve the directory structure, which is absolutely necessary to avoid errors. Then start your Jupyter Notebook session, navigate to the data set directory, and open the desired notebook.
Journal article
Brakestad et al. "Static polarizabilities at the basis set limit: A benchmark of 124 species". J. Chem. Theory Comput. (2020)
Abstract from journal article
Benchmarking molecular properties with Gaussian-type orbital (GTO) basis sets can be challenging, because one has to assume that the computed property is at the complete basis set (CBS) limit, without a robust measure of the error. Multiwavelet (MW) bases can be systematically improved with a controllable error, which eliminates the need for such assumptions. In this work, we have used MWs within Kohn–Sham density functional theory to compute static polarizabilities for a set of 92 closed-shell and 32 open-shell species. The results are compared to recent benchmark calculations employing the GTO-type aug-pc4 basis set. We observe discrepancies between GTO and MW results for several species, with open-shell systems showing the largest deviations. Based on linear response calculations, we show that these discrepancies originate from artefacts caused by the field strength, and that several polarizabilities from a previous study were contaminated by higher order responses (hyperpolarizabilities). Based on our MW benchmark results, we can affirm that aug-pc4 is able to provide results close to the CBS limit, as long as finite-difference effects can be controlled. However, we suggest that a better approach is to use MWs, which are able to yield precise finite-difference polarizabilities even with small field strengths.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contents
Set of simulation data, supplementary to a paper submitted to the Fire Safety Journal (published 15 June 2019), with the title "Application Cases of Inverse Modelling with the PROPTI Framework". See also our project at ResearchGate.
This repository contains the complete input data for each IMP run of the mass loss calorimeter shown in this paper. This comprises the experimental data files, the templates for the simulation models, and the input file for PROPTI.
The database files are provided. These include the original ones created by PROPTI during the runs, as well as the cleaned database files used to create the plots, and the extracted best parameter sets per generation. Plots created during the IMP runs as a means of monitoring progress are also included.
Furthermore, the repository contains a small collection of Jupyter notebooks which have been used to process the data base files and create the plots presented in this paper.
The full factorial simulations were set up from within a Jupyter notebook. This notebook and the conducted simulations are also part of this repository.
Data of the various TGA simulations are provided within a very similar repository, linked to a conference paper (ESFSS 2018, Nancy, France).
Finally, the simulation input files, PROPTI input, as well as the custom script for file handling in concert with OpenFOAM, are provided.
Technical Information
Each ZIP archive represents a sub-directory of the original directory. For the analysis scripts (the Jupyter notebooks) to work properly out of the box, it is necessary to keep this structure. Thus, simply extract all archives into the same directory.
Note: Size on disc, after extraction, is about 4.1 GB. Version 2 adds about 5.1 GB.
Version 2:
Version 2 contains new IMP runs that address an error in determining the normalised residual mass (see the Jupyter Notebook "RevisedTargetAssessment.ipynb"), as well as input from the reviewers. The new IMP runs are denoted by "08" after the optimisation algorithm label, e.g. "MLC_FSCABC_08_new_75kw_Ins".
This dataset contains simulation input files in GROMACS format accompanying the mentioned publication. Structure, topology, and simulation parameter files (directory mdp) are provided for bulk simulations of pure dioxane and formic acid, as well as for a mixture of both in pore and bulk simulations. The pore simulation is divided into three steps: an energy minimization, an NVT equilibration, and an NVT production run. The bulk simulations introduce an NpT step after the first equilibration step and an NpT production run instead of an NVT production run. The provided structure files are of an already equilibrated system. Object files are supplied which can be used to load the generated pores into PoreMS for later alteration and analysis. Results for the density of the pore systems are provided in HDF5 format to be processed with the PoreAna Python package, and Jupyter notebooks to load and display the data with PoreAna are provided. YAML files containing the densities of the pure and mixture bulk simulations are also added to the data set, together with an accompanying Jupyter notebook to read them.
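A minimal sketch, independent of the PoreAna-specific notebooks, for peeking at the YAML density files and the HDF5 results mentioned above; the file names are placeholders, and the actual keys and group names should be taken from the provided notebooks.

import yaml
import h5py

# Bulk densities stored as YAML (placeholder file name).
with open("density_bulk_mixture.yml") as fh:
    print(yaml.safe_load(fh))

# Pore-system density results stored as HDF5 (placeholder file name); list the top-level groups.
with h5py.File("density_pore.h5", "r") as h5:
    print(list(h5.keys()))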
In addition, the data set contains data from IR and NMR experiments.
We recommend viewing the data by choosing the option "Tree".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This archive reproduces a table titled "Table 3.1 Boone county population size, 1990 and 2000" from Wang and vom Hofe (2007, p.58). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses the Census API to retrieve data, reproduce the table, and ensure reproducibility for anyone accessing this archive. The Python code was developed in Google Colaboratory (Google Colab for short), an Integrated Development Environment (IDE) of JupyterLab that streamlines package installation, code collaboration, and management. The Census API is used to obtain population counts from the 1990 and 2000 Decennial Census (Summary File 1, 100% data). All downloaded data are maintained in the notebook's temporary working directory while in use. The data are also stored separately with this archive. The notebook features extensive explanations, comments, code snippets, and code output. The notebook can be viewed in PDF format or downloaded and opened in Google Colab. References to external resources are also provided for the various functional components. The notebook features code to perform the following functions:
install/import necessary Python packages
introduce a Census API query
download Census data via the Census API
manipulate Census tabular data
calculate absolute change and percent change
format numbers
export the table to csv
The notebook can be modified to perform the same operations for any county in the United States by changing the State and County FIPS code parameters for the Census API downloads. The notebook could be adapted for use in other environments (e.g., Jupyter Notebook), as well as reading and writing files to a local or shared drive or a cloud drive (e.g., Google Drive).
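A hedged sketch of the change calculations and CSV export listed above; the population counts used here are placeholders rather than the actual Boone County values, and the Census API download itself is left to the archived notebook.

import pandas as pd

# Placeholder counts only; the real values come from the Census API calls in the notebook.
table = pd.DataFrame(
    {"population_1990": [50000], "population_2000": [60000]},
    index=["Boone County"],
)

# Absolute change and percent change between the two Decennial Censuses.
table["absolute_change"] = table["population_2000"] - table["population_1990"]
table["percent_change"] = 100 * table["absolute_change"] / table["population_1990"]

# Format numbers for display and export the table to CSV.
print(table.round(1).to_string())
table.to_csv("table_3_1_boone_county.csv")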
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hydrologic models are growing in complexity: spatial representations, model coupling, process representations, software structure, etc. New and emerging datasets are growing, supporting even more detailed modeling use cases. This complexity is leading to the reproducibility crisis in hydrologic modeling and analysis. We argue that moving hydrologic modeling to the cloud can help to address this reproducibility crisis. We create two notebooks:
1. The first notebook demonstrates the process of collecting and manipulating GIS and time-series data using GRASS GIS, Python and R to create RHESsys model input.
2. The second notebook demonstrates the process of model compilation, parallel simulation, and visualization.
The first notebook includes:
The second notebook includes: