44 datasets found

Python Import Data India – Buyers & Importers List
seair.co.in
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Seair Exim, Python Import Data India – Buyers & Importers List [Dataset]. https://www.seair.co.in
Explore at:
.bin, .xml, .csv, .xlsAvailable download formats
Dataset provided by
Authors
Seair Exim
Area covered
India
Description
Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.
e
Eximpedia Export Import Trade
eximpedia.app
Updated Jan 9, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Seair Exim (2025). Eximpedia Export Import Trade [Dataset]. https://www.eximpedia.app/
Explore at:
.bin, .xml, .csv, .xlsAvailable download formats
Dataset updated
Jan 9, 2025
Dataset provided by
Eximpedia Export Import Trade Data
Eximpedia PTE LTD
Authors
Seair Exim
Area covered
Moldova (Republic of), Mozambique, Austria, Curaçao, Gambia, Haiti, Comoros, El Salvador, Switzerland, Fiji
Description
Python Llc Company Export Import Records. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
e
Eximpedia Export Import Trade
eximpedia.app
Updated Jan 9, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Seair Exim (2025). Eximpedia Export Import Trade [Dataset]. https://www.eximpedia.app/
Explore at:
.bin, .xml, .csv, .xlsAvailable download formats
Dataset updated
Jan 9, 2025
Dataset provided by
Eximpedia Export Import Trade Data
Eximpedia PTE LTD
Authors
Seair Exim
Area covered
Burundi, Senegal, Malaysia, Cook Islands, Mali, Jordan, Switzerland, Hungary, Vanuatu, Bahrain
Description
Python Logistics Llc Company Export Import Records. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Reproducibility in Practice: Dataset of a Large-Scale Study of Jupyter...
zenodo.org
bz2
Updated Mar 15, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anonymous; Anonymous (2021). Reproducibility in Practice: Dataset of a Large-Scale Study of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2538877
Explore at:
bz2Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.2538877
Dataset updated
Mar 15, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Anonymous; Anonymous
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourage poor coding practices and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

This repository contains two files:

dump.tar.bz2

jupyter_reproducibility.tar.bz2

The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.

archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.

paper: empty. The notebook analyses/N11.To.Paper.ipynb moves data to it

In the remaining of this text, we give instructions for reproducing the analyses, by using the data provided in the dump and reproducing the collection, by collecting data from GitHub again.

Reproducing the Analysis

This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:

Ubuntu 18.04.1 LTS
PostgreSQL 10.6
Conda 4.5.1
Python 3.6.8
PdfCrop 2012/11/02 v1.38

First, download dump.tar.bz2 and extract it:

tar -xjf dump.tar.bz2

It extracts the file db2019-01-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

psql jupyter < db2019-01-13.dump

It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTTION:

export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";

Download and extract jupyter_reproducibility.tar.bz2:

tar -xjf jupyter_reproducibility.tar.bz2

Create a conda environment with Python 3.6:

conda create -n py36 python=3.6

Go to the analyses folder and install all the dependencies of the requirements.txt

cd jupyter_reproducibility/analyses pip install -r requirements.txt

For reproducing the analyses, run jupyter on this folder:

jupyter notebook

Execute the notebooks on this order:

N0.Index.ipynb

N1.Repository.ipynb

N2.Notebook.ipynb

N3.Cell.ipynb

N4.Features.ipynb

N5.Modules.ipynb

N6.AST.ipynb

N7.Name.ipynb

N8.Execution.ipynb

N9.Cell.Execution.Order.ipynb

N10.Markdown.ipynb

N11.To.Paper.ipynb

Reproducing or Expanding the Collection

The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

Requirements

This time, we have extra requirements:

All the analysis requirements
lbzip2 2.5
gcc 7.3.0
Github account
Gmail account

Environment

First, set the following environment variables:

export JUP_MACHINE="db"; # machine identifier export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories export JUP_LOGS_DIR="/home/jupyter/logs"; # log files export JUP_COMPRESSION="lbzip2"; # compression program export JUP_VERBOSE="5"; # verbose level export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlchemy connection export JUP_GITHUB_USERNAME="github_username"; # your github username export JUP_GITHUB_PASSWORD="github_password"; # your github password export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB) export JUP_FIRST_DATE="2013-01-01"; # initial date to query github export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address export JUP_EMAIL_TO="target@email.com"; # email that receives notifications export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 auhentication file export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it in blank export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it in blank export JUP_WITH_EXECUTION="1"; # run execute python notebooks export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependnecies export JUP_EXECUTION_MODE="-1"; # run following the execution order export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction # Frequenci of log report export JUP_ASTROID_FREQUENCY="5"; export JUP_IPYTHON_FREQUENCY="5"; export JUP_NOTEBOOKS_FREQUENCY="5"; export JUP_REQUIREMENT_FREQUENCY="5"; export JUP_CRAWLER_FREQUENCY="1"; export JUP_CLONE_FREQUENCY="1"; export JUP_COMPRESS_FREQUENCY="5"; export JUP_DB_IP="localhost"; # postgres database IP

Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf

Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories. The second one should umount it. You can leave the scripts in blank, but it is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.

Scripts

Download and extract jupyter_reproducibility.tar.bz2:

tar -xjf jupyter_reproducibility.tar.bz2

Install 5 conda environments and 5 anaconda environments, for each python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (Note that it is a local package that has not been published to pypi. Make sure to use the -e option):

Conda 2.7

conda create -n raw27 python=2.7 -y conda activate raw27 pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Anaconda 2.7

conda create -n py27 python=2.7 anaconda -y conda activate py27 pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Conda 3.4

It requires a manual jupyter and pathlib2 installation due to some incompatibilities found on the default installation.

conda create -n raw34 python=3.4 -y conda activate raw34 conda install jupyter -c conda-forge -y conda uninstall jupyter -y pip install --upgrade pip pip install jupyter pip install pipenv pip install -e jupyter_reproducibility/archaeology pip install pathlib2

Anaconda 3.4

conda create -n py34 python=3.4 anaconda -y conda activate py34 pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Conda 3.5

conda create -n raw35 python=3.5 -y conda activate raw35 pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Anaconda 3.5

It requires the manual installation of other anaconda packages.

conda create -n py35 python=3.5 anaconda -y conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator conda activate py35 pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Conda 3.6

conda create -n raw36 python=3.6 -y conda activate raw36 pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Anaconda 3.6

conda create -n py36 python=3.6 anaconda -y conda activate py36 conda install -y anaconda-navigator jupyterlab_server navigator-updater pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Conda 3.7

conda create -n raw37 python=3.7 -y conda activate raw37 pip install --upgrade pip pip install pipenv pip install -e
e
Ballroom Python South | See Full Import/Export Data | Eximpedia
eximpedia.app
Updated Jan 8, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Seair Exim (2025). Ballroom Python South | See Full Import/Export Data | Eximpedia [Dataset]. https://www.eximpedia.app/
Explore at:
.bin, .xml, .csv, .xlsAvailable download formats
Dataset updated
Jan 8, 2025
Dataset provided by
Eximpedia Export Import Trade Data
Eximpedia PTE LTD
Authors
Seair Exim
Area covered
El Salvador, Eritrea, Iceland, State of, Mayotte, Croatia, Guyana, Luxembourg, Myanmar, Zambia
Description
Ballroom Python South Company Export Import Records. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
S
FLUNT simulated trapezoidal PCHE flow and heat transfer data set
scidb.cn
Updated Sep 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
zhao zi yan (2024). FLUNT simulated trapezoidal PCHE flow and heat transfer data set [Dataset]. http://doi.org/10.57760/sciencedb.hjs.00106
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.hjs.00106
Dataset updated
Sep 16, 2024
Dataset provided by
Science Data Bank
Authors
zhao zi yan
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Build a trapezoidal PCHE two channel model using FLUNT software as a unit for processing, simulate by changing different input conditions, obtain corresponding results, export them as CSV files, use Python for data processing, remove unnecessary information columns, and combine certain information from each file to form a snapshot matrix CSV file. After processing the snapshot matrix CSV file in Python, import it into MATLAB for prediction, and finally export the MATLAB results as a result CSV file.
q
Export bone healing simulation data for Matlab...
data.researchdatafinder.qut.edu.au
Updated Feb 1, 2002
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2002). Export bone healing simulation data for Matlab... [Dataset]. https://data.researchdatafinder.qut.edu.au/dataset/abaqus-finite-element/resource/49105b75-533e-45cd-bede-457a917559e9
Explore at:
Dataset updated
Feb 1, 2002
License
http://researchdatafinder.qut.edu.au/display/n4066http://researchdatafinder.qut.edu.au/display/n4066
Description
Exports required model results as text files, for reading into Matlab. Must be run from command line (after loading Abaqus module) thus: abaqus python ExportModelEndState02a.py Output databases... QUT Research Data Respository Dataset Resource available for download
Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...
zenodo.org
explore.openaire.eu
bz2
Updated Mar 15, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2592524
Explore at:
bz2Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.2592524
Dataset updated
Mar 15, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourage poor coding practices and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

Paper: https://2019.msrconf.org/event/msr-2019-papers-a-large-scale-study-about-quality-and-reproducibility-of-jupyter-notebooks

This repository contains two files:

dump.tar.bz2

jupyter_reproducibility.tar.bz2

The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.

archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.

paper: empty. The notebook analyses/N12.To.Paper.ipynb moves data to it

In the remaining of this text, we give instructions for reproducing the analyses, by using the data provided in the dump and reproducing the collection, by collecting data from GitHub again.

Reproducing the Analysis

This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:

Ubuntu 18.04.1 LTS
PostgreSQL 10.6
Conda 4.5.11
Python 3.7.2
PdfCrop 2012/11/02 v1.38

First, download dump.tar.bz2 and extract it:

tar -xjf dump.tar.bz2

It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

psql jupyter < db2019-03-13.dump

It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTTION:

export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";

Download and extract jupyter_reproducibility.tar.bz2:

tar -xjf jupyter_reproducibility.tar.bz2

Create a conda environment with Python 3.7:

conda create -n analyses python=3.7 conda activate analyses

Go to the analyses folder and install all the dependencies of the requirements.txt

cd jupyter_reproducibility/analyses pip install -r requirements.txt

For reproducing the analyses, run jupyter on this folder:

jupyter notebook

Execute the notebooks on this order:

Index.ipynb

N0.Repository.ipynb

N1.Skip.Notebook.ipynb

N2.Notebook.ipynb

N3.Cell.ipynb

N4.Features.ipynb

N5.Modules.ipynb

N6.AST.ipynb

N7.Name.ipynb

N8.Execution.ipynb

N9.Cell.Execution.Order.ipynb

N10.Markdown.ipynb

N11.Repository.With.Notebook.Restriction.ipynb

N12.To.Paper.ipynb

Reproducing or Expanding the Collection

The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

Requirements

This time, we have extra requirements:

All the analysis requirements
lbzip2 2.5
gcc 7.3.0
Github account
Gmail account

Environment

First, set the following environment variables:

export JUP_MACHINE="db"; # machine identifier export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories export JUP_LOGS_DIR="/home/jupyter/logs"; # log files export JUP_COMPRESSION="lbzip2"; # compression program export JUP_VERBOSE="5"; # verbose level export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlchemy connection export JUP_GITHUB_USERNAME="github_username"; # your github username export JUP_GITHUB_PASSWORD="github_password"; # your github password export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB) export JUP_FIRST_DATE="2013-01-01"; # initial date to query github export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address export JUP_EMAIL_TO="target@email.com"; # email that receives notifications export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 auhentication file export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it in blank export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it in blank export JUP_WITH_EXECUTION="1"; # run execute python notebooks export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependnecies export JUP_EXECUTION_MODE="-1"; # run following the execution order export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction # Frequenci of log report export JUP_ASTROID_FREQUENCY="5"; export JUP_IPYTHON_FREQUENCY="5"; export JUP_NOTEBOOKS_FREQUENCY="5"; export JUP_REQUIREMENT_FREQUENCY="5"; export JUP_CRAWLER_FREQUENCY="1"; export JUP_CLONE_FREQUENCY="1"; export JUP_COMPRESS_FREQUENCY="5"; export JUP_DB_IP="localhost"; # postgres database IP

Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf

Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories. The second one should umount it. You can leave the scripts in blank, but it is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.

Scripts

Download and extract jupyter_reproducibility.tar.bz2:

tar -xjf jupyter_reproducibility.tar.bz2

Install 5 conda environments and 5 anaconda environments, for each python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (Note that it is a local package that has not been published to pypi. Make sure to use the -e option):

Conda 2.7

conda create -n raw27 python=2.7 -y conda activate raw27 pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Anaconda 2.7

conda create -n py27 python=2.7 anaconda -y conda activate py27 pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Conda 3.4

It requires a manual jupyter and pathlib2 installation due to some incompatibilities found on the default installation.

conda create -n raw34 python=3.4 -y conda activate raw34 conda install jupyter -c conda-forge -y conda uninstall jupyter -y pip install --upgrade pip pip install jupyter pip install pipenv pip install -e jupyter_reproducibility/archaeology pip install pathlib2

Anaconda 3.4

conda create -n py34 python=3.4 anaconda -y conda activate py34 pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Conda 3.5

conda create -n raw35 python=3.5 -y conda activate raw35 pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Anaconda 3.5

It requires the manual installation of other anaconda packages.

conda create -n py35 python=3.5 anaconda -y conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator conda activate py35 pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Conda 3.6

conda create -n raw36 python=3.6 -y conda activate raw36 pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Anaconda 3.6

conda create -n py36 python=3.6 anaconda -y conda activate py36 conda install -y anaconda-navigator jupyterlab_server navigator-updater pip install --upgrade pip pip install pipenv pip install -e jupyter_reproducibility/archaeology

Conda 3.7

<code
d
Data Management Project for Collaborative Groundwater Research
search.dataone.org
hydroshare.org
Updated Apr 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abbygael Johnson; Collins Stephenson; Brett Safely; Brooklyn Taylor (2025). Data Management Project for Collaborative Groundwater Research [Dataset]. https://search.dataone.org/view/sha256%3A1fa221662b0ba8fc29abb240dba6a668cd9388334e34eda28a18b4a84e2c51ab
Explore at:
Dataset updated
Apr 26, 2025
Dataset provided by
Hydroshare
Authors
Abbygael Johnson; Collins Stephenson; Brett Safely; Brooklyn Taylor
Description
This project developed a comprehensive data management system designed to support collaborative groundwater research across institutions by establishing a centralized, structured database for hydrologic time series data. Built on the Observations Data Model (ODM), the system stores time series data and metadata in a relational SQLite database. Key project components included database construction, automation of data formatting and importation, development of analytical and visualization tools, and integration with ArcGIS for geospatial representation. The data import workflow standardizes and validates diverse .csv datasets by aligning them with ODM formatting. A Python-based module was created to facilitate data retrieval, analysis, visualization, and export, while an interactive map feature enables users to explore site-specific data availability. Additionally, a custom ArcGIS script was implemented to generate maps that incorporate stream networks, site locations, and watershed boundaries using DEMs from USGS sources. The system was tested using real-world datasets from groundwater wells and surface water gages across Utah, demonstrating its flexibility in handling diverse formats and parameters. The relational structure enabled efficient querying and visualization, and the developed tools promoted accessibility and alignment with FAIR principles.
Z
Geographic Diversity in Public Code Contributions — Replication Package
data.niaid.nih.gov
explore.openaire.eu
Updated Mar 31, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Davide Rossi (2022). Geographic Diversity in Public Code Contributions — Replication Package [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6390354
Explore at:
Dataset updated
Mar 31, 2022
Dataset provided by
Stefano Zacchiroli
Davide Rossi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Geographic Diversity in Public Code Contributions - Replication Package

This document describes how to replicate the findings of the paper: Davide Rossi and Stefano Zacchiroli, 2022, Geographic Diversity in Public Code Contributions - An Exploratory Large-Scale Study Over 50 Years. In 19th International Conference on Mining Software Repositories (MSR ’22), May 23-24, Pittsburgh, PA, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3524842.3528471

This document comes with the software needed to mine and analyze the data presented in the paper.

Prerequisites

These instructions assume the use of the bash shell, the Python programming language, the PosgreSQL DBMS (version 11 or later), the zstd compression utility and various usual *nix shell utilities (cat, pv, …), all of which are available for multiple architectures and OSs. It is advisable to create a Python virtual environment and install the following PyPI packages:

click==8.0.4 cycler==0.11.0 fonttools==4.31.2 kiwisolver==1.4.0 matplotlib==3.5.1 numpy==1.22.3 packaging==21.3 pandas==1.4.1 patsy==0.5.2 Pillow==9.0.1 pyparsing==3.0.7 python-dateutil==2.8.2 pytz==2022.1 scipy==1.8.0 six==1.16.0 statsmodels==0.13.2

Initial data

swh-replica, a PostgreSQL database containing a copy of Software Heritage data. The schema for the database is available at https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/sql/. We retrieved these data from Software Heritage, in collaboration with the archive operators, taking an archive snapshot as of 2021-07-07. We cannot make these data available in full as part of the replication package due to both its volume and the presence in it of personal information such as user email addresses. However, equivalent data (stripped of email addresses) can be obtained from the Software Heritage archive dataset, as documented in the article: Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli, The Software Heritage Graph Dataset: Public software development under one roof. In proceedings of MSR 2019: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Pages 138-142, IEEE 2019. http://dx.doi.org/10.1109/MSR.2019.00030. Once retrieved, the data can be loaded in PostgreSQL to populate swh-replica.

names.tab - forenames and surnames per country with their frequency

zones.acc.tab - countries/territories, timezones, population and world zones

c_c.tab - ccTDL entities - world zones matches

Data preparation

Export data from the swh-replica database to create commits.csv.zst and authors.csv.zst

sh> ./export.sh

Run the authors cleanup script to create authors--clean.csv.zst

sh> ./cleanup.sh authors.csv.zst

Filter out implausible names and create authors--plausible.csv.zst

sh> pv authors--clean.csv.zst | unzstd | ./filter_names.py 2> authors--plausible.csv.log | zstdmt > authors--plausible.csv.zst

Zone detection by email

Run the email detection script to create author-country-by-email.tab.zst

sh> pv authors--plausible.csv.zst | zstdcat | ./guess_country_by_email.py -f 3 2> author-country-by-email.csv.log | zstdmt > author-country-by-email.tab.zst

Database creation and initial data ingestion

Create the PostgreSQL DB

sh> createdb zones-commit

Notice that from now on when prepending the psql> prompt we assume the execution of psql on the zones-commit database.

Import data into PostgreSQL DB

sh> ./import_data.sh

Zone detection by name

Extract commits data from the DB and create commits.tab, that is used as input for the zone detection script

sh> psql -f extract_commits.sql zones-commit

Run the world zone detection script to create commit_zones.tab.zst

sh> pv commits.tab | ./assign_world_zone.py -a -n names.tab -p zones.acc.tab -x -w 8 | zstdmt > commit_zones.tab.zst Use ./assign_world_zone.py --help if you are interested in changing the script parameters.

Ingest zones assignment data into the DB

psql> \copy commit_zone from program 'zstdcat commit_zones.tab.zst | cut -f1,6 | grep -Ev ''\s$'''

Extraction and graphs

Run the script to execute the queries to extract the data to plot from the DB. This creates commit_zones_7120.tab, author_zones_7120_t5.tab, commit_zones_7120.grid and author_zones_7120_t5.grid. Edit extract_data.sql if you whish to modify extraction parameters (start/end year, sampling, …).

sh> ./extract_data.sh

Run the script to create the graphs from all the previously extracted tabfiles.

sh> ./create_stackedbar_chart.py -w 20 -s 1971 -f commit_zones_7120.grid -f author_zones_7120_t5.grid -o chart.pdf
o
Population Distribution Workflow using Census API in Jupyter Notebook:...
openicpsr.org
delimited
Updated Jul 23, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cooper Goodman; Nathanael Rosenheim; Wayne Day; Donghwan Gu; Jayasaree Korukonda (2020). Population Distribution Workflow using Census API in Jupyter Notebook: Dynamic Map of Census Tracts in Boone County, KY, 2000 [Dataset]. http://doi.org/10.3886/E120382V1
Explore at:
delimitedAvailable download formats
Unique identifier
https://doi.org/10.3886/E120382V1
Dataset updated
Jul 23, 2020
Dataset provided by
Texas A&M University
Authors
Cooper Goodman; Nathanael Rosenheim; Wayne Day; Donghwan Gu; Jayasaree Korukonda
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2000
Area covered
Boone County
Description
This archive reproduces a figure titled "Figure 3.2 Boone County population distribution" from Wang and vom Hofe (2007, p.60). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses the Census API to retrieve data, reproduce the figure, and ensure reproducibility for anyone accessing this archive.The Python code was developed in Google Colaboratory, or Google Colab for short, which is an Integrated Development Environment (IDE) of JupyterLab and streamlines package installation, code collaboration, and management. The Census API is used to obtain population counts from the 2000 Decennial Census (Summary File 1, 100% data). Shapefiles are downloaded from the TIGER/Line FTP Server. All downloaded data are maintained in the notebook's temporary working directory while in use. The data and shapefiles are stored separately with this archive. The final map is also stored as an HTML file.The notebook features extensive explanations, comments, code snippets, and code output. The notebook can be viewed in a PDF format or downloaded and opened in Google Colab. References to external resources are also provided for the various functional components. The notebook features code that performs the following functions:install/import necessary Python packagesdownload the Census Tract shapefile from the TIGER/Line FTP Serverdownload Census data via CensusAPI manipulate Census tabular data merge Census data with TIGER/Line shapefileapply a coordinate reference systemcalculate land area and population densitymap and export the map to HTMLexport the map to ESRI shapefileexport the table to CSVThe notebook can be modified to perform the same operations for any county in the United States by changing the State and County FIPS code parameters for the TIGER/Line shapefile and Census API downloads. The notebook can be adapted for use in other environments (i.e., Jupyter Notebook) as well as reading and writing files to a local or shared drive, or cloud drive (i.e., Google Drive).
H
Data from: We Just Ran Twenty-Three Million Queries of the World Bank's Web...
dataverse.harvard.edu
Updated Apr 27, 2014
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sarah Dykstra; Benjamin Dykstra; Justin Sandefur (2014). We Just Ran Twenty-Three Million Queries of the World Bank's Web Site [Dataset]. http://doi.org/10.7910/DVN/25492
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/25492
Dataset updated
Apr 27, 2014
Dataset provided by
Harvard Dataverse
Authors
Sarah Dykstra; Benjamin Dykstra; Justin Sandefur
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
1977 - 2012
Area covered
World
Description
This study provides data from the World Bank's PovcalNet on the distribution of household income and consumption across populations for 942 country-years, organized in dta and csv files by region. Each distribution contains 10,000 data points, one for each 0.01 incremental increase in percent of people living in households at or below a given income or consumption level. In addition, a data set containing the estimated parameters of the Beta and General Quadratic Lorenz curves is provided. For reference, we also provide the Python scripts used to query the PovcalNet online tool and export data from the Mongo database used to store results of these queries, along with all do files used to clean and construct the final data sets and summary statistics.
Worldwide Gender Differences in Public Code Contributions - Replication...
zenodo.org
data.niaid.nih.gov
bin, html, zip
Updated Feb 9, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Davide Rossi; Stefano Zacchiroli; Stefano Zacchiroli; Davide Rossi (2022). Worldwide Gender Differences in Public Code Contributions - Replication Package [Dataset]. http://doi.org/10.5281/zenodo.6020475
Explore at:
bin, html, zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6020475
Dataset updated
Feb 9, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Davide Rossi; Stefano Zacchiroli; Stefano Zacchiroli; Davide Rossi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Worldwide Gender Differences in Public Code Contributions - Replication Package

This document describes how to replicate the findings of the paper: Davide Rossi and Stefano Zacchiroli, 2022, Worldwide Gender Differences in Public Code Contributions. In Software Engineering in Society (ICSE-SEIS'22), May 21-29, 2022, Pittsburgh, PA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3510458.3513011

This document comes with the software needed to mine and analyze the data presented in the paper.

Prerequisites

These instructions assume the use of the bash shell, the Python programming language, the PosgreSQL DBMS (version 11 or later), the zstd compression utility and various usual *nix shell utilities (cat, pv, ...), all of which are available for multiple architectures and OSs.
It is advisable to create a Python virtual environment and install the following PyPI packages: click==8.0.3 cycler==0.10.0 gender-guesser==0.4.0 kiwisolver==1.3.2 matplotlib==3.4.3 numpy==1.21.3 pandas==1.3.4 patsy==0.5.2 Pillow==8.4.0 pyparsing==2.4.7 python-dateutil==2.8.2 pytz==2021.3 scipy==1.7.1 six==1.16.0 statsmodels==0.13.0

Initial data

swh-replica, a PostgreSQL database containing a copy of Software Heritage data. The schema for the database is available at https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/sql/.
We retrieved these data from Software Heritage, in collaboration with the archive operators, taking an archive snapshot as of 2021-07-07. We cannot make these data available in full as part of the replication package due to both its volume and the presence in it of personal information such as user email addresses. However, equivalent data (stripped of email addresses) can be obtained from the Software Heritage archive dataset, as documented in the article: Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli, The Software Heritage Graph Dataset: Public software development under one roof. In proceedings of MSR 2019: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Pages 138-142, IEEE 2019. http://dx.doi.org/10.1109/MSR.2019.00030.
Once retrieved, the data can be loaded in PostgreSQL to populate swh-replica.

names.tab - forenames and surnames per country with their frequency

zones.acc.tab - countries/territories, timezones, population and world zones

c_c.tab - ccTDL entities - world zones matches

Data preparation

Export data from the swh-replica database to create commits.csv.zst and authors.csv.zst sh> ./export.sh

Run the authors cleanup script to create authors--clean.csv.zst sh> ./cleanup.sh authors.csv.zst

Filter out implausible names and create authors--plausible.csv.zst sh> pv authors--clean.csv.zst | unzstd | ./filter_names.py 2> authors--plausible.csv.log | zstdmt > authors--plausible.csv.zst

Gender detection

Run the gender guessing script to create author-fullnames-gender.csv.zst sh> pv authors--plausible.csv.zst | unzstd | ./guess_gender.py --fullname --field 2 | zstdmt > author-fullnames-gender.csv.zst

Database creation and data ingestion

Create the PostgreSQL DB sh> createdb gender-commit Notice that from now on when prepending the psql> prompt we assume the execution of psql on the gender-commit database.

Import data into PostgreSQL DB sh> ./import_data.sh

Zone detection

Extract commits data from the DB and create commits.tab, that is used as input for the gender detection script
sh> psql -f extract_commits.sql gender-commit

Run the world zone detection script to create commit_zones.tab.zst sh> pv commits.tab | ./assign_world_zone.py -a -n names.tab -p zones.acc.tab -x -w 8 | zstdmt > commit_zones.tab.zst Use ./assign_world_zone.py --help if you are interested in changing the script parameters.

Read zones assignment data from the file into the DB
psql> \copy commit_culture from program 'zstdcat commit_zones.tab.zst | cut -f1,6 | grep -Ev ''\s$'''

Extraction and graphs

Run the script to execute the queries to extract the data to plot from the DB. This creates commits_tz.tab, authors_tz.tab, commits_zones.tab, authors_zones.tab, and authors_zones_1620.tab.
Edit extract_data.sql if you whish to modify extraction parameters (start/end year, sampling, ...). sh> ./extract_data.sh

Run the script to create the graphs from all the previously extracted tabfiles. This will generate commits_tzs.pdf, authors_tzs.pdf, commits_zones.pdf, authors_zones.pdf, and authors_zones_1620.pdf. sh> ./create_charts.sh

Additional graphs

This package also includes some already-made graphs

authors_zones_1.pdf: stacked graphs showing the ratio of female authors per world zone through the years, considering all authors with at least one commit per period

authors_zones_2.pdf: ditto with at least two commits per period

authors_zones_10.pdf: ditto with at least ten commits per period
Input/output data for blackwater modelling using pyDODOC (python model for...
researchdata.edu.au
data.csiro.au
datadownload
Updated Mar 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ashmita Sengupta; Catherine Ticehurst; Andrew Freebairn; Klaus Joehnk; Fazlul Karim; Klaus Jöhnk; Fazlul Karim; Catherine Ticehurst; Andrew Freebairn (2023). Input/output data for blackwater modelling using pyDODOC (python model for Dissolved Oxygen, Dissolved Organic Carbon) model [Dataset]. http://doi.org/10.25919/H9XT-ZD03
Explore at:
datadownloadAvailable download formats
Unique identifier
https://doi.org/10.25919/H9XT-ZD03
Dataset updated
Mar 20, 2023
Dataset provided by
CSIROhttp://www.csiro.au/
Authors
Ashmita Sengupta; Catherine Ticehurst; Andrew Freebairn; Klaus Joehnk; Fazlul Karim; Klaus Jöhnk; Fazlul Karim; Catherine Ticehurst; Andrew Freebairn
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2010 - Dec 31, 2020
Area covered

Description
Flow data and model parameters to configure and run the pyDODOC model for estimating dissolved organic carbon (DOC) export from the Barmah floodplain in the Murray-Darling Basin (MDB) for different temperature, litter load concentration and flooding scenarios. Model outputs including DOC leaching, DOC-consumed and net DOC export from the floodplain are included in the output file for each scenario. The data were produced as a part of the Murray-Darling Basin Ecosystem Function (MDB-EF) project. Lineage: Measured flow and water quality data (e.g. DO, DOC) were obtained from the water monitoring agencies of New South Wales (NSW) and Victoria (Vic). The outputs were produced by executing the pyDODOC model for different temperatures and leaf litter concentrations.
Z
Data from: Research data and example scripts for the paper "Bayesian...
data.niaid.nih.gov
Updated Mar 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Schneider, Philipp-Immanuel (2024). Research data and example scripts for the paper "Bayesian Target-Vector Optimization for Efficient Parameter Reconstruction" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6359593
Explore at:
Dataset updated
Mar 6, 2024
Dataset provided by
Schneider, Philipp-Immanuel
Burger, Sven
Plock, Matthias
Andrle, Kas
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Bayesian Target-Vector Optimization for Efficient Parameter Reconstruction

This publication contains the research data and example scripts for the paper “Bayesian Target-Vector Optimization for Efficient Parameter Reconstruction” [1]. The research data is found in the directory research_data, the example scripts are found in the directory example_scripts.

The research data contains all necessary information to be able to reconstruct the figures and values given in the paper, as well as all result figures shown. Where possible, the directories contain the necessary scripts to recreate the results themselves, up to stochastic variations.

The example scripts are intended to show how one can (i), perform a least-square type optimization of a model function (here we focus our efforts on the analytical model functions MGH17 and Gauss3, as described in the paper) using various methods (BTVO, LM, BO, L-BFGS-B, NM, including using derivative information when applicable), and (ii), perform Markov chain Monte Carlo (MCMC) sampling around the found maximum likelihood estimate (MLE) to estimate the uncertainties of the MLE parameter (both using a surrogate model of the actual model function, as well as using the actual model function directly).

Research data

Contained are directories for the experimental problem GIXRF, and the two analytical model functions MGH17 and Gauss3. What follows is a listing of directories and the contents:

gauss3_optimization: Optimization logs for the Gauss3 model function for BTVO, LM, BO, L-BFGS-B, NM (with derivatives when applicable), .npy files used for generating the plots, a benchmark.py file used for the generation of the data, as well as the plots shown in the paper.

mgh17_optimization: Optimization logs for the MGH17 model function for BTVO, LM, BO, L-BFGS-B, NM (with derivatives when applicable), .npy files used for generating the plots, a benchmark.py file used for the generation of the data, as well as the plots shown in the paper.

mgh17_mcmc_analytical: Scripts for the creation of the plots (does not use an optimization log), as well as plots shown in the paper. This uses the model function directly to perform the MCMC sampling.

mgh17_mcmc_surrogate: Optimization log of the MGH17 function used for the creation of the MCMC plots, scripts for the creation of the plots (use the optimization log), as well as plots shown in the paper. This uses a surrogate model to perform the MCMC sampling.

gixrf_optimization: benchmark.py file to perform the optimization, the optimization logs for the various methods (BTVO, LM, BO, L-BFGS-B, NM), .npy files and scripts used for the creation of the plots, and the plots shown in the paper.

gixrf_mcmc_supplement: optimization log used for the creation of the plot, pickle file used for the creation of the plot, script to create the MCMC plot.

gixrf_optimum_difference_supplement: optimization logs of BTVO optimization of the GIXRF problem, scripts to create the difference/error plots shown for the GIXRF problem in the supplement, and the plots themselves.

Employed software for creating the research data

The software used in the creation is:

JCMsuite Analysis and Optimization toolkit, development version, commit d55e99b (the closest commercial release is found in JCMsuite version 5.0.2)

A list of Python packages installed (excerpt from conda list, name and version)

corner 2.1.0

emcee 3.0.2

jax 0.2.22

jaxlib 0.1.72

matplotlib 3.2.1

numba 0.40.1

numpy 1.18.1

pandas 0.24.1

python 3.7.11

scikit-optimize 0.7.4

scipy 1.7.1

tikzplotlib 0.9.9

JCMsuite 4.6.3 for the evaluation of the experimental model

Example scripts

This directory contains a few sample files that show how parameter reconstructions can be performed using the JCMsuite analysis and optimization toolbox, with a particular focus on the Bayesian target-vector optimization method shown in the paper.

It also contains example files that show how an uncertainty quantification can be performed using MCMC, both directly using a model function, as well as using a surrogate model of the model function.

What follows is a listing of the contents of the directory:

mcmc_mgh17_analytical.py: performs a MCMC analysis of the MGH17 model function directly, without constructing a surrogate model. Uses emcee.

mcmc_mgh17_surrogate.py: performs a MCMC analysis of the MGH17 model function by constructing a surrogate model of the model function. Uses the JCMsuite analysis and optimization toolbox.

opt_gauss3.py: performs a parameter reconstruction of the Gauss3 model function using various methods (BTVO, LM, BO, L-BFGS-B, NM, with derivatives when applicable).

opt_mgh17.py: performs a parameter reconstruction of the MGH17 model function using various methods (BTVO, LM, BO, L-BFGS-B, NM, with derivatives when applicable).

util/model_functions.py: contains the MGH17 and Gauss3 model functions, their (automatic) derivatives, and objective functions used in the optimizations.

Requirements to execute the example scripts

These scripts have been developed and tested under Linux, Debian 10. We have tried to make sure that they would also work in a Windows environment, but can unfortunately give no guarantees for that.

We mainly use Python to run the reconstructions. To execute the files, a few Python packages have to be installed. In addition to the usual scientific Python stack (NumPy, SciPy, matplotlib, pandas, etc.), the packages jax and jaxlib (for automatic differentiation of Python/NumPy functions), emcee and corner (for MCMC sampling and subsequent plotting of the results) have to be installed.

This can be achieved for example using pip, e.g.

pip install -r requirements.txt

Additionally, JCMsuite has to be installed. For this you can visit [2] and download a free trial version.

On Linux, the installation has to be added to the PATH, e.g. by adding the following to your .bashrc file:

export JCMROOT=/FULL/PATH/TO/BASE/DIRECTORY export PATH=$JCMROOT/bin:$PATH export PYTHONPATH=$JCMROOT/ThirdPartySupport/Python:$PYTHONPATH

Bibliography

[1] M. Plock, K. Andrle, S. Burger, P.-I. Schneider, Bayesian Target-Vector Optimization for Efficient Parameter Reconstruction. Adv. Theory Simul. 5, 2200112 (2022).

[2] https://jcmwave.com/
Democracy and the Market for Culture: Institutional Determinants of Cultural...
zenodo.org
bin, csv
Updated Jul 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anon Anon; Anon Anon (2025). Democracy and the Market for Culture: Institutional Determinants of Cultural Goods Exports [Dataset]. http://doi.org/10.5281/zenodo.16417376
Explore at:
bin, csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.16417376
Dataset updated
Jul 25, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Anon Anon; Anon Anon
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset accompanies the research article titled Democracy and the Market for Culture, which investigates how different dimensions of democratic governance influence international trade in cultural goods. Drawing from UNESCO cultural exports data, World Bank GDP per capita, and the V-Dem Core v15 democracy indicators, the study uses fixed-effects OLS regression to analyze the institutional determinants of cultural export performance across countries from 2004 to 2019.

The dataset includes:

culture_exports.csv: UNESCO cultural goods export data (USD, by country-year).

GDPPC.csv: World Bank GDP per capita data (log-transformed).

V-Dem-CY-Core-v15.csv: Country-year democracy indicators from V-Dem v15.

Three Colab-ready Python notebooks (v2x_libdem.ipynb, v2x_partipdem.ipynb, v2x_polyarchy.ipynb) with full replication code for each regression model.

🔧 How to Use in Google Colab

Open Google Colab:
Go to https://colab.research.google.com/

Upload Files to Colab:

You can drag the files into the file panel (left sidebar) or use this code to upload:

python

CopiarEditar

from google.colab import files uploaded = files.upload()

Install Required Libraries (if needed):
Most notebooks use pandas, statsmodels, and matplotlib:

python

CopiarEditar

!pip install pandas statsmodels matplotlib

Run a Notebook:

Open v2x_libdem.ipynb, v2x_partipdem.ipynb, or v2x_polyarchy.ipynb in Colab.

Click "Runtime" > "Run all" or run each cell individually to replicate the regression results.

Customize or Extend:

You can plug in other democracy or economic indicators from the V-Dem dataset to test additional hypotheses.

Modify the regression specification or add visualization outputs using seaborn or plotly.

License: Creative Commons Attribution 4.0 (CC BY 4.0)

Citation:
Anon Annotator. (2025). Democracy and the Market for Culture: Institutional Determinants of Cultural Goods Exports [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.16417376
d
Hydroshare-GoogleEarthEngine
dataone.org
hydroshare.org
Updated Dec 5, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alfonso Torres-Rua (2021). Hydroshare-GoogleEarthEngine [Dataset]. https://dataone.org/datasets/sha256%3A0ddeef1d398d807e3b20a6525e819074588141b0f4021865c8366d7c76e8917f
Explore at:
Dataset updated
Dec 5, 2021
Dataset provided by
Hydroshare
Authors
Alfonso Torres-Rua
Description
A first trial for merging the Google Earth Engine API into HydroShare.

Google EarthEngine API http://href="https://developers.google.com/earth-engine/api_docs

The code here only provides connection to Earth Engine API yet. Does not interact with HS data or functions yet.

It uses git code published by Erik Tyler href="https://github.com/tylere/eeus2017-python" https://github.com/tylere/eeus2017-python%3C/a" rel="nofollow">rel="nofollow">https://github.com/tylere/eeus2017-python> to demonstrate adequate installation and setup

To Do:

Import /Export data from HS to Earth Engine Permanent storing of Google EE key Fix weird behavior of leaflet python module Separate code in folders
Replication Package: Unboxing Default Argument Breaking Changes in Scikit...
zenodo.org
application/gzip
Updated Aug 23, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Ghizlane El Boussaidi; Marco Tulio Valente; João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Ghizlane El Boussaidi; Marco Tulio Valente (2023). Replication Package: Unboxing Default Argument Breaking Changes in Scikit Learn [Dataset]. http://doi.org/10.5281/zenodo.8132450
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.8132450
Dataset updated
Aug 23, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Ghizlane El Boussaidi; Marco Tulio Valente; João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Ghizlane El Boussaidi; Marco Tulio Valente
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Replication Package

This repository contains data and source files needed to replicate our work described in the paper "Unboxing Default Argument Breaking Changes in Scikit Learn".

Requirements

We recommend the following requirements to replicate our study:

Internet access

At least 100GB of space

Docker installed

Git installed

Package Structure

We relied on Docker containers to provide a working environment that is easier to replicate. Specifically, we configure the following containers:

data-analysis, an R-based Container we used to run our data analysis.

data-collection, a Python Container we used to collect Scikit's default arguments and detect them in client applications.

database, a Postgres Container we used to store clients' data, obtainer from Grotov et al.

storage, a directory used to store the data processed in data-analysis and data-collection. This directory is shared in both containers.

docker-compose.yml, the Docker file that configures all containers used in the package.

In the remainder of this document, we describe how to set up each container properly.

Using VSCode to Setup the Package

We selected VSCode as the IDE of choice because its extensions allow us to implement our scripts directly inside the containers. In this package, we provide configuration parameters for both data-analysis and data-collection containers. This way you can directly access and run each container inside it without any specific configuration.

You first need to set up the containers

$ cd /replication/package/folder $ docker-compose build $ docker-compose up # Wait docker creating and running all containers

Then, you can open them in Visual Studio Code:

Open VSCode in project root folder

Access the command palette and select "Dev Container: Reopen in Container"

Select either Data Collection or Data Analysis.

Start working

If you want/need a more customized organization, the remainder of this file describes it in detail.

Longest Road: Manual Package Setup

Database Setup

The database container will automatically restore the dump in dump_matroskin.tar in its first launch. To set up and run the container, you should:

Build an image:

$ cd ./database $ docker build --tag 'dabc-database' . $ docker image ls REPOSITORY TAG IMAGE ID CREATED SIZE dabc-database latest b6f8af99c90d 50 minutes ago 18.5GB

Create and enter inside the container:

$ docker run -it --name dabc-database-1 dabc-database $ docker exec -it dabc-database-1 /bin/bash root# psql -U postgres -h localhost -d jupyter-notebooks jupyter-notebooks=# \dt List of relations Schema | Name | Type | Owner --------+-------------------+-------+------- public | Cell | table | root public | Code_cell | table | root public | Md_cell | table | root public | Notebook | table | root public | Notebook_features | table | root public | Notebook_metadata | table | root public | repository | table | root

If you got the tables list as above, your database is properly setup.

It is important to mention that this database is extended from the one provided by Grotov et al.. Basically, we added three columns in the table Notebook_features (API_functions_calls, defined_functions_calls, andother_functions_calls) containing the function calls performed by each client in the database.

Data Collection Setup

This container is responsible for collecting the data to answer our research questions. It has the following structure:

dabcs.py, extract DABCs from Scikit Learn source code, and export them to a CSV file.

dabcs-clients.py, extract function calls from clients and export them to a CSV file. We rely on a modified version of Matroskin to leverage the function calls. You can find the tool's source code in the `matroskin`` directory.

Makefile, commands to set up and run both dabcs.py and dabcs-clients.py

matroskin, the directory containing the modified version of matroskin tool. We extended the library to collect the function calls performed on the client notebooks of Grotov's dataset.

storage, a docker volume where the data-collection should save the exported data. This data will be used later in Data Analysis.

requirements.txt, Python dependencies adopted in this module.

Note that the container will automatically configure this module for you, e.g., install dependencies, configure matroskin, download scikit learn source code, etc. For this, you must run the following commands:

$ cd ./data-collection $ docker build --tag "data-collection" . $ docker run -it -d --name data-collection-1 -v $(pwd)/:/data-collection -v $(pwd)/../storage/:/data-collection/storage/ data-collection $ docker exec -it data-collection-1 /bin/bash $ ls Dockerfile Makefile config.yml dabcs-clients.py dabcs.py matroskin storage requirements.txt utils.py

If you see project files, it means the container is configured accordingly.

Data Analysis Setup

We use this container to conduct the analysis over the data produced by the Data Collection container. It has the following structure:

dependencies.R, an R script containing the dependencies used in our data analysis.

data-analysis.Rmd, the R notebook we used to perform our data analysis

datasets, a docker volume pointing to the storage directory.

Execute the following commands to run this container:

$ cd ./data-analysis $ docker build --tag "data-analysis" . $ docker run -it -d --name data-analysis-1 -v $(pwd)/:/data-analysis -v $(pwd)/../storage/:/data-collection/datasets/ data-analysis $ docker exec -it data-analysis-1 /bin/bash $ ls data-analysis.Rmd datasets dependencies.R Dockerfile figures Makefile

If you see project files, it means the container is configured accordingly.

A note on storage shared folder

As mentioned, the storage folder is mounted as a volume and shared between data-collection and data-analysis containers. We compressed the content of this folder due to space constraints. Therefore, before starting working on Data Collection or Data Analysis, make sure you extracted the compressed files. You can do this by running the Makefile inside storage folder.

$ make unzip # extract files $ ls clients-dabcs.csv clients-validation.csv dabcs.csv Makefile scikit-learn-versions.csv versions.csv $ make zip # compress files $ ls csv-files.tar.gz Makefile
The Kubernetes Security Landscape: Full Open Research Archive
figshare.com
zip
Updated May 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
J. Alexander Curtis (2025). The Kubernetes Security Landscape: Full Open Research Archive [Dataset]. http://doi.org/10.6084/m9.figshare.25522459.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.25522459.v1
Dataset updated
May 30, 2025
Dataset provided by
Figsharehttp://figshare.com/
Authors
J. Alexander Curtis
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Kubernetes Security Landscape: AI-Driven Insights from a Comprehensive Analysis of Developer DiscussionsThis zip archive is organized into four main directories, plus a handful of supporting files. It contains everything you need to recreate the results of our research under the Open Research initiative.assets/: Images and compiled assets used in the research paperdata/: Exported raw data sources, including the data dump, CSV export, and Pickle filesresearch/: Jupyter Notebooks demonstrating how the data was processed within the paperscripts/: Python scripts for scraping - exposed by cli.pycli.py: Python script which exposes a command-line-interface (CLI) that allows for easier interaction with the scraping scripts in the scripts/ directorypoetry.lock: Dependency package manifest for using with the poetry install command and poetry package manager.requirements.txt: Python/Pip project configuration file if you prefer to use Pip instead of Poetry.README.md:
d
Bathymetry of the Main Pool of Lake Calumet, Cook County, Illinois, July...
catalog.data.gov
data.usgs.gov
+1more
Updated Jul 24, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). Bathymetry of the Main Pool of Lake Calumet, Cook County, Illinois, July 2023 [Dataset]. https://catalog.data.gov/dataset/bathymetry-of-the-main-pool-of-lake-calumet-cook-county-illinois-july-2023
Explore at:
Dataset updated
Jul 24, 2025
Dataset provided by
U.S. Geological Survey
Area covered
Cook County, Lake Calumet, Illinois
Description
These data are single-beam bathymetry points compiled in comma separated values (CSV) file format, generated from a hydrographic survey of the northern portion of Lake Calumet in Cook County, Illinois. Hydrographic data were collected July 18-19, 2023, using a single-beam echosounder (SBES) integrated with a Global Navigation Satellite System (GNSS) mounted on a marine survey vessel. Surface water elevation data were collected July 18 utilizing a single-base real-time kinematic (RTK)/GNSS unit. Bathymetric data points were collected as the vessel traversed the northern portions of the lake along overlapping survey lines. The SBES internally collected and stored the depth data from the echosounder and the horizontal and vertical position data of the vessel from the GNSS in real time. Data processing required specialized computer software to export bathymetry data from the raw data files. A Python script was written to calculate the lakebed elevations and identify outliers in the dataset. These data are provided in comma separated values (CSV) format as LakeCalumet_SBES_20230718.csv. Data points are stored as a series of x (longitude), y (latitude), and z (elevation or depth) points along with variable length records specific to the data transects.