44 datasets found
  1. Python Import Data India – Buyers & Importers List

    • seair.co.in
    Cite
    Seair Exim, Python Import Data India – Buyers & Importers List [Dataset]. https://www.seair.co.in
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset provided by
    Authors
    Seair Exim
    Area covered
    India
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.

  2. Eximpedia Export Import Trade

    • eximpedia.app
    Updated Jan 9, 2025
    + more versions
    Cite
    Seair Exim (2025). Eximpedia Export Import Trade [Dataset]. https://www.eximpedia.app/
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Jan 9, 2025
    Dataset provided by
    Eximpedia Export Import Trade Data
    Eximpedia PTE LTD
    Authors
    Seair Exim
    Area covered
    Moldova (Republic of), Mozambique, Austria, Curaçao, Gambia, Haiti, Comoros, El Salvador, Switzerland, Fiji
    Description

    Export and import records for the company Python LLC. Follow the Eximpedia platform for HS codes, importer-exporter records, and customs shipment details.

  3. Eximpedia Export Import Trade

    • eximpedia.app
    Updated Jan 9, 2025
    + more versions
    Cite
    Seair Exim (2025). Eximpedia Export Import Trade [Dataset]. https://www.eximpedia.app/
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Jan 9, 2025
    Dataset provided by
    Eximpedia Export Import Trade Data
    Eximpedia PTE LTD
    Authors
    Seair Exim
    Area covered
    Burundi, Senegal, Malaysia, Cook Islands, Mali, Jordan, Switzerland, Hungary, Vanuatu, Bahrain
    Description

    Export and import records for the company Python Logistics LLC. Follow the Eximpedia platform for HS codes, importer-exporter records, and customs shipment details.

  4. Reproducibility in Practice: Dataset of a Large-Scale Study of Jupyter...

    • zenodo.org
    bz2
    Updated Mar 15, 2021
    Cite
    Anonymous; Anonymous (2021). Reproducibility in Practice: Dataset of a Large-Scale Study of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2538877
    Explore at:
    Available download formats: bz2
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and produces results that can be hard to reproduce. To understand the good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

    This repository contains two files:

    • dump.tar.bz2
    • jupyter_reproducibility.tar.bz2

    The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

    The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

    • analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.
    • archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.
    • paper: empty. The notebook analyses/N11.To.Paper.ipynb moves data to it

    In the remainder of this text, we give instructions for reproducing the analyses, using the data provided in the dump, and for reproducing the collection, by collecting data from GitHub again.

    Reproducing the Analysis

    This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:

    Ubuntu 18.04.1 LTS
    PostgreSQL 10.6
    Conda 4.5.1
    Python 3.6.8
    PdfCrop 2012/11/02 v1.38

    First, download dump.tar.bz2 and extract it:

    tar -xjf dump.tar.bz2

    It extracts the file db2019-01-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

    psql jupyter < db2019-01-13.dump

    It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
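
    The analysis notebooks read this connection string through sqlalchemy. As a quick sanity check that the restored database is reachable, a minimal sketch (assuming sqlalchemy and pandas are installed; this is not part of the original scripts) could be:

    import os
    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connectivity check; the actual queries live in the
    # notebooks under jupyter_reproducibility/analyses.
    engine = create_engine(os.environ["JUP_DB_CONNECTION"])
    with engine.connect() as conn:
        print(pd.read_sql("SELECT 1 AS ok", conn))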

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create a conda environment with Python 3.6:

    conda create -n py36 python=3.6

    Go to the analyses folder and install all the dependencies from requirements.txt:

    cd jupyter_reproducibility/analyses
    pip install -r requirements.txt

    To reproduce the analyses, run Jupyter Notebook in this folder:

    jupyter notebook

    Execute the notebooks in this order:

    • N0.Index.ipynb
    • N1.Repository.ipynb
    • N2.Notebook.ipynb
    • N3.Cell.ipynb
    • N4.Features.ipynb
    • N5.Modules.ipynb
    • N6.AST.ipynb
    • N7.Name.ipynb
    • N8.Execution.ipynb
    • N9.Cell.Execution.Order.ipynb
    • N10.Markdown.ipynb
    • N11.To.Paper.ipynb

    Reproducing or Expanding the Collection

    The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

    Requirements

    This time, we have extra requirements:

    All the analysis requirements
    lbzip2 2.5
    gcc 7.3.0
    Github account
    Gmail account

    Environment

    First, set the following environment variables:

    export JUP_MACHINE="db"; # machine identifier
    export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
    export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
    export JUP_COMPRESSION="lbzip2"; # compression program
    export JUP_VERBOSE="5"; # verbose level
    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
    export JUP_GITHUB_USERNAME="github_username"; # your github username
    export JUP_GITHUB_PASSWORD="github_password"; # your github password
    export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
    export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
    export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
    export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
    export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
    export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
    export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
    export JUP_WITH_EXECUTION="1"; # execute python notebooks
    export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
    export JUP_EXECUTION_MODE="-1"; # run following the execution order
    export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
    export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
    export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
    export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
    export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
    
    
    # Frequency of log reports
    export JUP_ASTROID_FREQUENCY="5";
    export JUP_IPYTHON_FREQUENCY="5";
    export JUP_NOTEBOOKS_FREQUENCY="5";
    export JUP_REQUIREMENT_FREQUENCY="5";
    export JUP_CRAWLER_FREQUENCY="1";
    export JUP_CLONE_FREQUENCY="1";
    export JUP_COMPRESS_FREQUENCY="5";
    
    export JUP_DB_IP="localhost"; # postgres database IP
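
    The collection scripts are configured entirely through these JUP_* variables. A minimal sketch of how such settings might be read at startup (variable names from the block above; the defaults here are illustrative assumptions, not the ones used by the original scripts):

    import os

    def jup_env(name, default=""):
        # Read a JUP_* setting, falling back to a default when unset.
        return os.environ.get(name, default)

    DB_CONNECTION = jup_env("JUP_DB_CONNECTION")
    BASE_DIR = jup_env("JUP_BASE_DIR", "/mnt/jupyter/github")
    VERBOSE = int(jup_env("JUP_VERBOSE", "0"))
    MAX_SIZE_GB = float(jup_env("JUP_MAX_SIZE", "8000.0"))
    WITH_EXECUTION = jup_env("JUP_WITH_EXECUTION", "1") == "1"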

    Then, configure the file ~/oauth2_creds.json according to the yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf

    Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the repository directories; the second one should unmount it. You can leave the scripts blank, but that is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.

    Scripts

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create 5 Conda environments and 5 Anaconda environments, one of each per Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI, so make sure to use the -e option):

    Conda 2.7

    conda create -n raw27 python=2.7 -y
    conda activate raw27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 2.7

    conda create -n py27 python=2.7 anaconda -y
    conda activate py27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    

    Conda 3.4

    It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.

    conda create -n raw34 python=3.4 -y
    conda activate raw34
    conda install jupyter -c conda-forge -y
    conda uninstall jupyter -y
    pip install --upgrade pip
    pip install jupyter
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    pip install pathlib2

    Anaconda 3.4

    conda create -n py34 python=3.4 anaconda -y
    conda activate py34
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.5

    conda create -n raw35 python=3.5 -y
    conda activate raw35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.5

    It requires the manual installation of other anaconda packages.

    conda create -n py35 python=3.5 anaconda -y
    conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
    conda activate py35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.6

    conda create -n raw36 python=3.6 -y
    conda activate raw36
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.6

    conda create -n py36 python=3.6 anaconda -y
    conda activate py36
    conda install -y anaconda-navigator jupyterlab_server navigator-updater
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.7

    conda create -n raw37 python=3.7 -y
    conda activate raw37
    pip install --upgrade pip
    pip install pipenv
    pip install -e

  5. Ballroom Python South | See Full Import/Export Data | Eximpedia

    • eximpedia.app
    Updated Jan 8, 2025
    + more versions
    Cite
    Seair Exim (2025). Ballroom Python South | See Full Import/Export Data | Eximpedia [Dataset]. https://www.eximpedia.app/
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Jan 8, 2025
    Dataset provided by
    Eximpedia Export Import Trade Data
    Eximpedia PTE LTD
    Authors
    Seair Exim
    Area covered
    El Salvador, Eritrea, Iceland, State of, Mayotte, Croatia, Guyana, Luxembourg, Myanmar, Zambia
    Description

    Export and import records for the company Ballroom Python South. Follow the Eximpedia platform for HS codes, importer-exporter records, and customs shipment details.

  6. FLUNT simulated trapezoidal PCHE flow and heat transfer data set

    • scidb.cn
    Updated Sep 16, 2024
    Cite
    zhao zi yan (2024). FLUNT simulated trapezoidal PCHE flow and heat transfer data set [Dataset]. http://doi.org/10.57760/sciencedb.hjs.00106
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 16, 2024
    Dataset provided by
    Science Data Bank
    Authors
    zhao zi yan
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A trapezoidal PCHE two-channel model was built in FLUNT software as the processing unit and simulated under different input conditions to obtain the corresponding results, which were exported as CSV files. Python was used for data processing: unnecessary information columns were removed, and selected information from each file was combined to form a snapshot-matrix CSV file. After further processing in Python, the snapshot matrix was imported into MATLAB for prediction, and the MATLAB results were finally exported as a result CSV file.
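
    The snapshot-matrix step described above is a straightforward CSV merge; a minimal sketch with pandas (directory, file names, and column choices are assumptions for illustration, not taken from the dataset):

    import glob
    import pandas as pd

    # Read every exported result CSV, drop the columns that are not needed,
    # and stack the remaining columns side by side into one snapshot matrix.
    columns_to_keep = ["temperature", "pressure"]  # hypothetical column names
    snapshots = []
    for path in sorted(glob.glob("results/*.csv")):
        df = pd.read_csv(path)
        snapshots.append(df[columns_to_keep].add_prefix(f"{path}_"))
    snapshot_matrix = pd.concat(snapshots, axis=1)
    snapshot_matrix.to_csv("snapshot_matrix.csv", index=False)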

  7. Export bone healing simulation data for Matlab...

    • data.researchdatafinder.qut.edu.au
    Updated Feb 1, 2002
    Cite
    (2002). Export bone healing simulation data for Matlab... [Dataset]. https://data.researchdatafinder.qut.edu.au/dataset/abaqus-finite-element/resource/49105b75-533e-45cd-bede-457a917559e9
    Explore at:
    Dataset updated
    Feb 1, 2002
    License

    http://researchdatafinder.qut.edu.au/display/n4066

    Description

    Exports required model results as text files for reading into Matlab. Must be run from the command line (after loading the Abaqus module) thus: abaqus python ExportModelEndState02a.py Output databases... QUT Research Data Repository dataset resource available for download.

  8. Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...

    • zenodo.org
    • explore.openaire.eu
    bz2
    Updated Mar 15, 2021
    Cite
    João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2592524
    Explore at:
    Available download formats: bz2
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and produces results that can be hard to reproduce. To understand the good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

    Paper: https://2019.msrconf.org/event/msr-2019-papers-a-large-scale-study-about-quality-and-reproducibility-of-jupyter-notebooks

    This repository contains two files:

    • dump.tar.bz2
    • jupyter_reproducibility.tar.bz2

    The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

    The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

    • analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.
    • archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.
    • paper: empty. The notebook analyses/N12.To.Paper.ipynb moves data to it

    In the remainder of this text, we give instructions for reproducing the analyses, using the data provided in the dump, and for reproducing the collection, by collecting data from GitHub again.

    Reproducing the Analysis

    This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:

    Ubuntu 18.04.1 LTS
    PostgreSQL 10.6
    Conda 4.5.11
    Python 3.7.2
    PdfCrop 2012/11/02 v1.38

    First, download dump.tar.bz2 and extract it:

    tar -xjf dump.tar.bz2

    It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

    psql jupyter < db2019-03-13.dump

    It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create a conda environment with Python 3.7:

    conda create -n analyses python=3.7
    conda activate analyses

    Go to the analyses folder and install all the dependencies from requirements.txt:

    cd jupyter_reproducibility/analyses
    pip install -r requirements.txt

    To reproduce the analyses, run Jupyter Notebook in this folder:

    jupyter notebook

    Execute the notebooks in this order (a sketch for running them non-interactively follows the list):

    • Index.ipynb
    • N0.Repository.ipynb
    • N1.Skip.Notebook.ipynb
    • N2.Notebook.ipynb
    • N3.Cell.ipynb
    • N4.Features.ipynb
    • N5.Modules.ipynb
    • N6.AST.ipynb
    • N7.Name.ipynb
    • N8.Execution.ipynb
    • N9.Cell.Execution.Order.ipynb
    • N10.Markdown.ipynb
    • N11.Repository.With.Notebook.Restriction.ipynb
    • N12.To.Paper.ipynb
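
    A sketch for running the notebooks above non-interactively, in the listed order, using nbconvert's execute API (a convenience suggestion, assuming jupyter/nbconvert is installed; it is not part of the original package):

    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    ORDER = [
        "Index.ipynb", "N0.Repository.ipynb", "N1.Skip.Notebook.ipynb",
        "N2.Notebook.ipynb", "N3.Cell.ipynb", "N4.Features.ipynb",
        "N5.Modules.ipynb", "N6.AST.ipynb", "N7.Name.ipynb",
        "N8.Execution.ipynb", "N9.Cell.Execution.Order.ipynb",
        "N10.Markdown.ipynb", "N11.Repository.With.Notebook.Restriction.ipynb",
        "N12.To.Paper.ipynb",
    ]

    for name in ORDER:
        nb = nbformat.read(name, as_version=4)
        # Execute all cells in place, with no per-cell timeout.
        ExecutePreprocessor(timeout=None).preprocess(nb, {"metadata": {"path": "."}})
        nbformat.write(nb, name)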

    Reproducing or Expanding the Collection

    The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

    Requirements

    This time, we have extra requirements:

    All the analysis requirements
    lbzip2 2.5
    gcc 7.3.0
    Github account
    Gmail account

    Environment

    First, set the following environment variables:

    export JUP_MACHINE="db"; # machine identifier
    export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
    export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
    export JUP_COMPRESSION="lbzip2"; # compression program
    export JUP_VERBOSE="5"; # verbose level
    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
    export JUP_GITHUB_USERNAME="github_username"; # your github username
    export JUP_GITHUB_PASSWORD="github_password"; # your github password
    export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
    export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
    export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
    export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
    export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
    export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
    export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
    export JUP_WITH_EXECUTION="1"; # execute python notebooks
    export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
    export JUP_EXECUTION_MODE="-1"; # run following the execution order
    export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
    export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
    export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
    export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
    export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
    
    
    # Frequency of log reports
    export JUP_ASTROID_FREQUENCY="5";
    export JUP_IPYTHON_FREQUENCY="5";
    export JUP_NOTEBOOKS_FREQUENCY="5";
    export JUP_REQUIREMENT_FREQUENCY="5";
    export JUP_CRAWLER_FREQUENCY="1";
    export JUP_CLONE_FREQUENCY="1";
    export JUP_COMPRESS_FREQUENCY="5";
    
    export JUP_DB_IP="localhost"; # postgres database IP

    Then, configure the file ~/oauth2_creds.json according to the yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf

    Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the repository directories; the second one should unmount it. You can leave the scripts blank, but that is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.

    Scripts

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create 5 Conda environments and 5 Anaconda environments, one of each per Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI, so make sure to use the -e option):

    Conda 2.7

    conda create -n raw27 python=2.7 -y
    conda activate raw27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 2.7

    conda create -n py27 python=2.7 anaconda -y
    conda activate py27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    

    Conda 3.4

    It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.

    conda create -n raw34 python=3.4 -y
    conda activate raw34
    conda install jupyter -c conda-forge -y
    conda uninstall jupyter -y
    pip install --upgrade pip
    pip install jupyter
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    pip install pathlib2

    Anaconda 3.4

    conda create -n py34 python=3.4 anaconda -y
    conda activate py34
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.5

    conda create -n raw35 python=3.5 -y
    conda activate raw35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.5

    It requires the manual installation of other anaconda packages.

    conda create -n py35 python=3.5 anaconda -y
    conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
    conda activate py35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.6

    conda create -n raw36 python=3.6 -y
    conda activate raw36
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.6

    conda create -n py36 python=3.6 anaconda -y
    conda activate py36
    conda install -y anaconda-navigator jupyterlab_server navigator-updater
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.7


  9. Data Management Project for Collaborative Groundwater Research

    • search.dataone.org
    • hydroshare.org
    Updated Apr 26, 2025
    Cite
    Abbygael Johnson; Collins Stephenson; Brett Safely; Brooklyn Taylor (2025). Data Management Project for Collaborative Groundwater Research [Dataset]. https://search.dataone.org/view/sha256%3A1fa221662b0ba8fc29abb240dba6a668cd9388334e34eda28a18b4a84e2c51ab
    Explore at:
    Dataset updated
    Apr 26, 2025
    Dataset provided by
    Hydroshare
    Authors
    Abbygael Johnson; Collins Stephenson; Brett Safely; Brooklyn Taylor
    Description

    This project developed a comprehensive data management system designed to support collaborative groundwater research across institutions by establishing a centralized, structured database for hydrologic time series data. Built on the Observations Data Model (ODM), the system stores time series data and metadata in a relational SQLite database. Key project components included database construction, automation of data formatting and importation, development of analytical and visualization tools, and integration with ArcGIS for geospatial representation. The data import workflow standardizes and validates diverse .csv datasets by aligning them with ODM formatting. A Python-based module was created to facilitate data retrieval, analysis, visualization, and export, while an interactive map feature enables users to explore site-specific data availability. Additionally, a custom ArcGIS script was implemented to generate maps that incorporate stream networks, site locations, and watershed boundaries using DEMs from USGS sources. The system was tested using real-world datasets from groundwater wells and surface water gages across Utah, demonstrating its flexibility in handling diverse formats and parameters. The relational structure enabled efficient querying and visualization, and the developed tools promoted accessibility and alignment with FAIR principles.
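
    As an illustration of how an ODM-structured SQLite database like this can be queried from Python, a minimal sketch (the file name, site code, and ODM table/column names such as DataValues and Sites are assumptions; check the delivered database for the exact schema):

    import sqlite3
    import pandas as pd

    # Pull the time series for one monitoring site from the ODM DataValues table.
    conn = sqlite3.connect("odm_database.sqlite")
    query = """
        SELECT dv.LocalDateTime, dv.DataValue
        FROM DataValues AS dv
        JOIN Sites AS s ON s.SiteID = dv.SiteID
        WHERE s.SiteCode = ?
        ORDER BY dv.LocalDateTime
    """
    series = pd.read_sql_query(query, conn, params=("EXAMPLE-SITE-01",))
    conn.close()
    print(series.head())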

  10. Geographic Diversity in Public Code Contributions — Replication Package

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Mar 31, 2022
    Cite
    Davide Rossi (2022). Geographic Diversity in Public Code Contributions — Replication Package [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6390354
    Explore at:
    Dataset updated
    Mar 31, 2022
    Dataset provided by
    Stefano Zacchiroli
    Davide Rossi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Geographic Diversity in Public Code Contributions - Replication Package

    This document describes how to replicate the findings of the paper: Davide Rossi and Stefano Zacchiroli, 2022, Geographic Diversity in Public Code Contributions - An Exploratory Large-Scale Study Over 50 Years. In 19th International Conference on Mining Software Repositories (MSR ’22), May 23-24, Pittsburgh, PA, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3524842.3528471

    This document comes with the software needed to mine and analyze the data presented in the paper.

    Prerequisites

    These instructions assume the use of the bash shell, the Python programming language, the PostgreSQL DBMS (version 11 or later), the zstd compression utility, and various usual *nix shell utilities (cat, pv, …), all of which are available for multiple architectures and OSs. It is advisable to create a Python virtual environment and install the following PyPI packages:

    click==8.0.4 cycler==0.11.0 fonttools==4.31.2 kiwisolver==1.4.0 matplotlib==3.5.1 numpy==1.22.3 packaging==21.3 pandas==1.4.1 patsy==0.5.2 Pillow==9.0.1 pyparsing==3.0.7 python-dateutil==2.8.2 pytz==2022.1 scipy==1.8.0 six==1.16.0 statsmodels==0.13.2

    Initial data

    swh-replica, a PostgreSQL database containing a copy of Software Heritage data. The schema for the database is available at https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/sql/. We retrieved these data from Software Heritage, in collaboration with the archive operators, taking an archive snapshot as of 2021-07-07. We cannot make these data available in full as part of the replication package due to both its volume and the presence in it of personal information such as user email addresses. However, equivalent data (stripped of email addresses) can be obtained from the Software Heritage archive dataset, as documented in the article: Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli, The Software Heritage Graph Dataset: Public software development under one roof. In proceedings of MSR 2019: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Pages 138-142, IEEE 2019. http://dx.doi.org/10.1109/MSR.2019.00030. Once retrieved, the data can be loaded in PostgreSQL to populate swh-replica.

    names.tab - forenames and surnames per country with their frequency

    zones.acc.tab - countries/territories, timezones, population and world zones

    c_c.tab - ccTLD entities - world zones matches

    Data preparation

    Export data from the swh-replica database to create commits.csv.zst and authors.csv.zst

    sh> ./export.sh

    Run the authors cleanup script to create authors--clean.csv.zst

    sh> ./cleanup.sh authors.csv.zst

    Filter out implausible names and create authors--plausible.csv.zst

    sh> pv authors--clean.csv.zst | unzstd | ./filter_names.py 2> authors--plausible.csv.log | zstdmt > authors--plausible.csv.zst

    Zone detection by email

    Run the email detection script to create author-country-by-email.tab.zst

    sh> pv authors--plausible.csv.zst | zstdcat | ./guess_country_by_email.py -f 3 2> author-country-by-email.csv.log | zstdmt > author-country-by-email.tab.zst

    Database creation and initial data ingestion

    Create the PostgreSQL DB

    sh> createdb zones-commit

    Notice that, from now on, commands shown with the psql> prompt are assumed to be executed in psql connected to the zones-commit database.

    Import data into PostgreSQL DB

    sh> ./import_data.sh

    Zone detection by name

    Extract commits data from the DB and create commits.tab, that is used as input for the zone detection script

    sh> psql -f extract_commits.sql zones-commit

    Run the world zone detection script to create commit_zones.tab.zst

    sh> pv commits.tab | ./assign_world_zone.py -a -n names.tab -p zones.acc.tab -x -w 8 | zstdmt > commit_zones.tab.zst

    Use ./assign_world_zone.py --help if you are interested in changing the script parameters.

    Ingest zones assignment data into the DB

    psql> \copy commit_zone from program 'zstdcat commit_zones.tab.zst | cut -f1,6 | grep -Ev ''\s$'''

    Extraction and graphs

    Run the script to execute the queries that extract the data to plot from the DB. This creates commit_zones_7120.tab, author_zones_7120_t5.tab, commit_zones_7120.grid, and author_zones_7120_t5.grid. Edit extract_data.sql if you wish to modify extraction parameters (start/end year, sampling, …).

    sh> ./extract_data.sh

    Run the script to create the graphs from all the previously extracted tabfiles.

    sh> ./create_stackedbar_chart.py -w 20 -s 1971 -f commit_zones_7120.grid -f author_zones_7120_t5.grid -o chart.pdf

  11. Population Distribution Workflow using Census API in Jupyter Notebook:...

    • openicpsr.org
    delimited
    Updated Jul 23, 2020
    + more versions
    Cite
    Cooper Goodman; Nathanael Rosenheim; Wayne Day; Donghwan Gu; Jayasaree Korukonda (2020). Population Distribution Workflow using Census API in Jupyter Notebook: Dynamic Map of Census Tracts in Boone County, KY, 2000 [Dataset]. http://doi.org/10.3886/E120382V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    Jul 23, 2020
    Dataset provided by
    Texas A&M University
    Authors
    Cooper Goodman; Nathanael Rosenheim; Wayne Day; Donghwan Gu; Jayasaree Korukonda
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2000
    Area covered
    Boone County
    Description

    This archive reproduces a figure titled "Figure 3.2 Boone County population distribution" from Wang and vom Hofe (2007, p.60). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses the Census API to retrieve data, reproduce the figure, and ensure reproducibility for anyone accessing this archive.

    The Python code was developed in Google Colaboratory, or Google Colab for short, which is an Integrated Development Environment (IDE) of JupyterLab and streamlines package installation, code collaboration, and management. The Census API is used to obtain population counts from the 2000 Decennial Census (Summary File 1, 100% data). Shapefiles are downloaded from the TIGER/Line FTP Server. All downloaded data are maintained in the notebook's temporary working directory while in use. The data and shapefiles are stored separately with this archive. The final map is also stored as an HTML file.

    The notebook features extensive explanations, comments, code snippets, and code output. The notebook can be viewed in a PDF format or downloaded and opened in Google Colab. References to external resources are also provided for the various functional components. The notebook features code that performs the following functions:

    • install/import necessary Python packages
    • download the Census Tract shapefile from the TIGER/Line FTP Server
    • download Census data via the Census API
    • manipulate Census tabular data
    • merge Census data with the TIGER/Line shapefile
    • apply a coordinate reference system
    • calculate land area and population density
    • map and export the map to HTML
    • export the map to an ESRI shapefile
    • export the table to CSV

    The notebook can be modified to perform the same operations for any county in the United States by changing the State and County FIPS code parameters for the TIGER/Line shapefile and Census API downloads. The notebook can be adapted for use in other environments (i.e., Jupyter Notebook) as well as for reading and writing files to a local or shared drive, or a cloud drive (i.e., Google Drive).
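
    For orientation, the core Census API call in such a workflow can be reproduced with a few lines of Python; the endpoint, the variable code (P001001, total population), and the FIPS codes below (21 = Kentucky, 015 = Boone County) are assumptions based on standard Census API conventions rather than code copied from the notebook:

    import requests
    import pandas as pd

    # 2000 Decennial Census, Summary File 1: total population per census tract.
    url = ("https://api.census.gov/data/2000/dec/sf1"
           "?get=P001001&for=tract:*&in=state:21%20county:015")
    rows = requests.get(url, timeout=30).json()
    tracts = pd.DataFrame(rows[1:], columns=rows[0])
    print(tracts.head())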

  12. Data from: We Just Ran Twenty-Three Million Queries of the World Bank's Web...

    • dataverse.harvard.edu
    Updated Apr 27, 2014
    Cite
    Sarah Dykstra; Benjamin Dykstra; Justin Sandefur (2014). We Just Ran Twenty-Three Million Queries of the World Bank's Web Site [Dataset]. http://doi.org/10.7910/DVN/25492
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 27, 2014
    Dataset provided by
    Harvard Dataverse
    Authors
    Sarah Dykstra; Benjamin Dykstra; Justin Sandefur
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    1977 - 2012
    Area covered
    World
    Description

    This study provides data from the World Bank's PovcalNet on the distribution of household income and consumption across populations for 942 country-years, organized in dta and csv files by region. Each distribution contains 10,000 data points, one for each 0.01 incremental increase in percent of people living in households at or below a given income or consumption level. In addition, a data set containing the estimated parameters of the Beta and General Quadratic Lorenz curves is provided. For reference, we also provide the Python scripts used to query the PovcalNet online tool and export data from the Mongo database used to store results of these queries, along with all do files used to clean and construct the final data sets and summary statistics.
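
    The distribution files are simple country-year percentile tables; a sketch of loading one regional CSV with pandas and pulling out a single country-year distribution (the file name and column names here are hypothetical, since the exact layout is documented with the data):

    import pandas as pd

    # Hypothetical file/column names; each country-year holds 10,000 points,
    # one per 0.01 increment of the population share.
    dist = pd.read_csv("region_south_asia.csv")
    one = dist[(dist["country"] == "IND") & (dist["year"] == 2010)]
    print(one[["percentile", "consumption_level"]].describe())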

  13. Worldwide Gender Differences in Public Code Contributions - Replication...

    • zenodo.org
    • data.niaid.nih.gov
    bin, html, zip
    Updated Feb 9, 2022
    Cite
    Davide Rossi; Stefano Zacchiroli; Stefano Zacchiroli; Davide Rossi (2022). Worldwide Gender Differences in Public Code Contributions - Replication Package [Dataset]. http://doi.org/10.5281/zenodo.6020475
    Explore at:
    Available download formats: bin, html, zip
    Dataset updated
    Feb 9, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Davide Rossi; Stefano Zacchiroli; Stefano Zacchiroli; Davide Rossi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Worldwide Gender Differences in Public Code Contributions - Replication Package

    This document describes how to replicate the findings of the paper: Davide Rossi and Stefano Zacchiroli, 2022, Worldwide Gender Differences in Public Code Contributions. In Software Engineering in Society (ICSE-SEIS'22), May 21-29, 2022, Pittsburgh, PA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3510458.3513011

    This document comes with the software needed to mine and analyze the data presented in the paper.

    Prerequisites

    These instructions assume the use of the bash shell, the Python programming language, the PostgreSQL DBMS (version 11 or later), the zstd compression utility, and various usual *nix shell utilities (cat, pv, ...), all of which are available for multiple architectures and OSs.
    It is advisable to create a Python virtual environment and install the following PyPI packages: click==8.0.3 cycler==0.10.0 gender-guesser==0.4.0 kiwisolver==1.3.2 matplotlib==3.4.3 numpy==1.21.3 pandas==1.3.4 patsy==0.5.2 Pillow==8.4.0 pyparsing==2.4.7 python-dateutil==2.8.2 pytz==2021.3 scipy==1.7.1 six==1.16.0 statsmodels==0.13.0

    Initial data

    • swh-replica, a PostgreSQL database containing a copy of Software Heritage data. The schema for the database is available at https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/sql/.
      We retrieved these data from Software Heritage, in collaboration with the archive operators, taking an archive snapshot as of 2021-07-07. We cannot make these data available in full as part of the replication package due to both its volume and the presence in it of personal information such as user email addresses. However, equivalent data (stripped of email addresses) can be obtained from the Software Heritage archive dataset, as documented in the article: Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli, The Software Heritage Graph Dataset: Public software development under one roof. In proceedings of MSR 2019: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Pages 138-142, IEEE 2019. http://dx.doi.org/10.1109/MSR.2019.00030.
      Once retrieved, the data can be loaded in PostgreSQL to populate swh-replica.
    • names.tab - forenames and surnames per country with their frequency
    • zones.acc.tab - countries/territories, timezones, population and world zones
    • c_c.tab - ccTLD entities - world zones matches

    Data preparation

    • Export data from the swh-replica database to create commits.csv.zst and authors.csv.zst
      sh> ./export.sh
    • Run the authors cleanup script to create authors--clean.csv.zst
      sh> ./cleanup.sh authors.csv.zst
    • Filter out implausible names and create authors--plausible.csv.zst
      sh> pv authors--clean.csv.zst | unzstd | ./filter_names.py 2> authors--plausible.csv.log | zstdmt > authors--plausible.csv.zst

    Gender detection

    • Run the gender guessing script to create author-fullnames-gender.csv.zst (a toy illustration of the underlying gender-guesser library follows this step)
      sh> pv authors--plausible.csv.zst | unzstd | ./guess_gender.py --fullname --field 2 | zstdmt > author-fullnames-gender.csv.zst
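
    For illustration only, the gender-guesser package listed in the prerequisites exposes an API along these lines (this is a toy example, not the authors' guess_gender.py):

    import gender_guesser.detector as gender

    # Classify a few first names; possible results include 'male', 'female',
    # 'mostly_male', 'mostly_female', 'andy' (ambiguous) and 'unknown'.
    detector = gender.Detector(case_sensitive=False)
    for name in ["Ada", "Linus", "Sasha"]:
        print(name, detector.get_gender(name))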

    Database creation and data ingestion

    • Create the PostgreSQL DB
      sh> createdb gender-commit
      Notice that, from now on, commands shown with the psql> prompt are assumed to be executed in psql connected to the gender-commit database.

    • Import data into PostgreSQL DB
      sh> ./import_data.sh

    Zone detection

    • Extract commits data from the DB and create commits.tab, that is used as input for the gender detection script
      sh> psql -f extract_commits.sql gender-commit
    • Run the world zone detection script to create commit_zones.tab.zst
      sh> pv commits.tab | ./assign_world_zone.py -a -n names.tab -p zones.acc.tab -x -w 8 | zstdmt > commit_zones.tab.zst
      Use ./assign_world_zone.py --help if you are interested in changing the script parameters.
    • Read zones assignment data from the file into the DB
      psql> \copy commit_culture from program 'zstdcat commit_zones.tab.zst | cut -f1,6 | grep -Ev ''\s$'''

    Extraction and graphs

    • Run the script to execute the queries that extract the data to plot from the DB. This creates commits_tz.tab, authors_tz.tab, commits_zones.tab, authors_zones.tab, and authors_zones_1620.tab.
      Edit extract_data.sql if you wish to modify extraction parameters (start/end year, sampling, ...).
      sh> ./extract_data.sh
    • Run the script to create the graphs from all the previously extracted tabfiles. This will generate commits_tzs.pdf, authors_tzs.pdf, commits_zones.pdf, authors_zones.pdf, and authors_zones_1620.pdf.
      sh> ./create_charts.sh

    Additional graphs

    This package also includes some already-made graphs

    • authors_zones_1.pdf: stacked graphs showing the ratio of female authors per world zone through the years, considering all authors with at least one commit per period
    • authors_zones_2.pdf: ditto with at least two commits per period
    • authors_zones_10.pdf: ditto with at least ten commits per period
  14. Input/output data for blackwater modelling using pyDODOC (python model for...

    • researchdata.edu.au
    • data.csiro.au
    datadownload
    Updated Mar 20, 2023
    Cite
    Ashmita Sengupta; Catherine Ticehurst; Andrew Freebairn; Klaus Joehnk; Fazlul Karim; Klaus Jöhnk; Fazlul Karim; Catherine Ticehurst; Andrew Freebairn (2023). Input/output data for blackwater modelling using pyDODOC (python model for Dissolved Oxygen, Dissolved Organic Carbon) model [Dataset]. http://doi.org/10.25919/H9XT-ZD03
    Explore at:
    Available download formats: datadownload
    Dataset updated
    Mar 20, 2023
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Ashmita Sengupta; Catherine Ticehurst; Andrew Freebairn; Klaus Joehnk; Fazlul Karim; Klaus Jöhnk; Fazlul Karim; Catherine Ticehurst; Andrew Freebairn
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2010 - Dec 31, 2020
    Area covered
    Description

    Flow data and model parameters to configure and run the pyDODOC model for estimating dissolved organic carbon (DOC) export from the Barmah floodplain in the Murray-Darling Basin (MDB) for different temperature, litter load concentration and flooding scenarios. Model outputs including DOC leaching, DOC-consumed and net DOC export from the floodplain are included in the output file for each scenario. The data were produced as a part of the Murray-Darling Basin Ecosystem Function (MDB-EF) project. Lineage: Measured flow and water quality data (e.g. DO, DOC) were obtained from the water monitoring agencies of New South Wales (NSW) and Victoria (Vic). The outputs were produced by executing the pyDODOC model for different temperatures and leaf litter concentrations.

  15. Data from: Research data and example scripts for the paper "Bayesian...

    • data.niaid.nih.gov
    Updated Mar 6, 2024
    Cite
    Schneider, Philipp-Immanuel (2024). Research data and example scripts for the paper "Bayesian Target-Vector Optimization for Efficient Parameter Reconstruction" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6359593
    Explore at:
    Dataset updated
    Mar 6, 2024
    Dataset provided by
    Schneider, Philipp-Immanuel
    Burger, Sven
    Plock, Matthias
    Andrle, Kas
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bayesian Target-Vector Optimization for Efficient Parameter Reconstruction

    This publication contains the research data and example scripts for the paper “Bayesian Target-Vector Optimization for Efficient Parameter Reconstruction” [1]. The research data is found in the directory research_data, the example scripts are found in the directory example_scripts.

    The research data contains all necessary information to be able to reconstruct the figures and values given in the paper, as well as all result figures shown. Where possible, the directories contain the necessary scripts to recreate the results themselves, up to stochastic variations.

    The example scripts are intended to show how one can (i), perform a least-square type optimization of a model function (here we focus our efforts on the analytical model functions MGH17 and Gauss3, as described in the paper) using various methods (BTVO, LM, BO, L-BFGS-B, NM, including using derivative information when applicable), and (ii), perform Markov chain Monte Carlo (MCMC) sampling around the found maximum likelihood estimate (MLE) to estimate the uncertainties of the MLE parameter (both using a surrogate model of the actual model function, as well as using the actual model function directly).

    Research data

    Contained are directories for the experimental problem GIXRF, and the two analytical model functions MGH17 and Gauss3. What follows is a listing of directories and the contents:

    gauss3_optimization: Optimization logs for the Gauss3 model function for BTVO, LM, BO, L-BFGS-B, NM (with derivatives when applicable), .npy files used for generating the plots, a benchmark.py file used for the generation of the data, as well as the plots shown in the paper.

    mgh17_optimization: Optimization logs for the MGH17 model function for BTVO, LM, BO, L-BFGS-B, NM (with derivatives when applicable), .npy files used for generating the plots, a benchmark.py file used for the generation of the data, as well as the plots shown in the paper.

    mgh17_mcmc_analytical: Scripts for the creation of the plots (does not use an optimization log), as well as plots shown in the paper. This uses the model function directly to perform the MCMC sampling.

    mgh17_mcmc_surrogate: Optimization log of the MGH17 function used for the creation of the MCMC plots, scripts for the creation of the plots (use the optimization log), as well as plots shown in the paper. This uses a surrogate model to perform the MCMC sampling.

    gixrf_optimization: benchmark.py file to perform the optimization, the optimization logs for the various methods (BTVO, LM, BO, L-BFGS-B, NM), .npy files and scripts used for the creation of the plots, and the plots shown in the paper.

    gixrf_mcmc_supplement: optimization log used for the creation of the plot, pickle file used for the creation of the plot, script to create the MCMC plot.

    gixrf_optimum_difference_supplement: optimization logs of BTVO optimization of the GIXRF problem, scripts to create the difference/error plots shown for the GIXRF problem in the supplement, and the plots themselves.

    Employed software for creating the research data

    The software used in the creation is:

    JCMsuite Analysis and Optimization toolkit, development version, commit d55e99b (the closest commercial release is found in JCMsuite version 5.0.2)

    A list of Python packages installed (excerpt from conda list, name and version):

    • corner 2.1.0
    • emcee 3.0.2
    • jax 0.2.22
    • jaxlib 0.1.72
    • matplotlib 3.2.1
    • numba 0.40.1
    • numpy 1.18.1
    • pandas 0.24.1
    • python 3.7.11
    • scikit-optimize 0.7.4
    • scipy 1.7.1
    • tikzplotlib 0.9.9

    JCMsuite 4.6.3 for the evaluation of the experimental model

    Example scripts

    This directory contains a few sample files that show how parameter reconstructions can be performed using the JCMsuite analysis and optimization toolbox, with a particular focus on the Bayesian target-vector optimization method shown in the paper.

    It also contains example files that show how an uncertainty quantification can be performed using MCMC, both directly using a model function, as well as using a surrogate model of the model function.

    What follows is a listing of the contents of the directory:

    mcmc_mgh17_analytical.py: performs a MCMC analysis of the MGH17 model function directly, without constructing a surrogate model. Uses emcee.

    mcmc_mgh17_surrogate.py: performs a MCMC analysis of the MGH17 model function by constructing a surrogate model of the model function. Uses the JCMsuite analysis and optimization toolbox.

    opt_gauss3.py: performs a parameter reconstruction of the Gauss3 model function using various methods (BTVO, LM, BO, L-BFGS-B, NM, with derivatives when applicable).

    opt_mgh17.py: performs a parameter reconstruction of the MGH17 model function using various methods (BTVO, LM, BO, L-BFGS-B, NM, with derivatives when applicable).

    util/model_functions.py: contains the MGH17 and Gauss3 model functions, their (automatic) derivatives, and objective functions used in the optimizations.

    Requirements to execute the example scripts

    These scripts have been developed and tested under Linux, Debian 10. We have tried to make sure that they would also work in a Windows environment, but can unfortunately give no guarantees for that.

    We mainly use Python to run the reconstructions. To execute the files, a few Python packages have to be installed. In addition to the usual scientific Python stack (NumPy, SciPy, matplotlib, pandas, etc.), the packages jax and jaxlib (for automatic differentiation of Python/NumPy functions), emcee and corner (for MCMC sampling and subsequent plotting of the results) have to be installed.

    This can be achieved, for example, using pip:

    pip install -r requirements.txt
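
    As a quick check that the MCMC-related Python dependencies (emcee in particular) are importable, a toy sampling run, unrelated to the paper's models, might look like this:

    import numpy as np
    import emcee

    # Sample a 2-D standard normal distribution with a small walker ensemble.
    def log_prob(theta):
        return -0.5 * np.sum(theta ** 2)

    ndim, nwalkers = 2, 8
    p0 = np.random.randn(nwalkers, ndim)
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
    sampler.run_mcmc(p0, 500)
    print(sampler.get_chain(flat=True).shape)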

    Additionally, JCMsuite has to be installed. For this you can visit [2] and download a free trial version.

    On Linux, the installation has to be added to the PATH, e.g. by adding the following to your .bashrc file:

    export JCMROOT=/FULL/PATH/TO/BASE/DIRECTORY
    export PATH=$JCMROOT/bin:$PATH
    export PYTHONPATH=$JCMROOT/ThirdPartySupport/Python:$PYTHONPATH

    Bibliography

    [1] M. Plock, K. Andrle, S. Burger, P.-I. Schneider, Bayesian Target-Vector Optimization for Efficient Parameter Reconstruction. Adv. Theory Simul. 5, 2200112 (2022).

    [2] https://jcmwave.com/

  16. Democracy and the Market for Culture: Institutional Determinants of Cultural...

    • zenodo.org
    bin, csv
    Updated Jul 25, 2025
    Cite
    Anon Anon; Anon Anon (2025). Democracy and the Market for Culture: Institutional Determinants of Cultural Goods Exports [Dataset]. http://doi.org/10.5281/zenodo.16417376
    Explore at:
    Available download formats: bin, csv
    Dataset updated
    Jul 25, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anon Anon; Anon Anon
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset accompanies the research article titled Democracy and the Market for Culture, which investigates how different dimensions of democratic governance influence international trade in cultural goods. Drawing from UNESCO cultural exports data, World Bank GDP per capita, and the V-Dem Core v15 democracy indicators, the study uses fixed-effects OLS regression to analyze the institutional determinants of cultural export performance across countries from 2004 to 2019.

    The dataset includes:

    • culture_exports.csv: UNESCO cultural goods export data (USD, by country-year).

    • GDPPC.csv: World Bank GDP per capita data (log-transformed).

    • V-Dem-CY-Core-v15.csv: Country-year democracy indicators from V-Dem v15.

    • Three Colab-ready Python notebooks (v2x_libdem.ipynb, v2x_partipdem.ipynb, v2x_polyarchy.ipynb) with full replication code for each regression model.

    🔧 How to Use in Google Colab

    1. Open Google Colab:
      Go to https://colab.research.google.com/

    2. Upload Files to Colab:

      • You can drag the files into the file panel (left sidebar) or use this code to upload:

        from google.colab import files
        uploaded = files.upload()
    3. Install Required Libraries (if needed):
      Most notebooks use pandas, statsmodels, and matplotlib:

      !pip install pandas statsmodels matplotlib
    4. Run a Notebook:

      • Open v2x_libdem.ipynb, v2x_partipdem.ipynb, or v2x_polyarchy.ipynb in Colab.

      • Click "Runtime" > "Run all" or run each cell individually to replicate the regression results.

    5. Customize or Extend:

      • You can plug in other democracy or economic indicators from the V-Dem dataset to test additional hypotheses.

      • Modify the regression specification (a sketch of a fixed-effects OLS appears after these steps) or add visualization outputs using seaborn or plotly.
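
    For orientation, a fixed-effects OLS of the kind described above can be expressed with statsmodels as follows; the merged panel file and column names (cultural_exports, gdppc, v2x_polyarchy, country, year) are assumptions for illustration, and the notebooks define the actual specification:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical merged country-year panel built from the three CSVs above.
    panel = pd.read_csv("merged_panel.csv")
    panel["log_exports"] = np.log(panel["cultural_exports"])
    panel["log_gdppc"] = np.log(panel["gdppc"])

    # Two-way fixed effects via country and year dummies, clustered by country.
    model = smf.ols(
        "log_exports ~ v2x_polyarchy + log_gdppc + C(country) + C(year)",
        data=panel,
    ).fit(cov_type="cluster", cov_kwds={"groups": panel["country"]})
    print(model.summary())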

    License: Creative Commons Attribution 4.0 (CC BY 4.0)

    Citation:
    Anon Annotator. (2025). Democracy and the Market for Culture: Institutional Determinants of Cultural Goods Exports [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.16417376

  17. Hydroshare-GoogleEarthEngine

    • dataone.org
    • hydroshare.org
    Updated Dec 5, 2021
    Cite
    Alfonso Torres-Rua (2021). Hydroshare-GoogleEarthEngine [Dataset]. https://dataone.org/datasets/sha256%3A0ddeef1d398d807e3b20a6525e819074588141b0f4021865c8366d7c76e8917f
    Explore at:
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    Alfonso Torres-Rua
    Description
  18. Replication Package: Unboxing Default Argument Breaking Changes in Scikit...

    • zenodo.org
    application/gzip
    Updated Aug 23, 2023
    Cite
    João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Ghizlane El Boussaidi; Marco Tulio Valente; João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Ghizlane El Boussaidi; Marco Tulio Valente (2023). Replication Package: Unboxing Default Argument Breaking Changes in Scikit Learn [Dataset]. http://doi.org/10.5281/zenodo.8132450
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Aug 23, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Ghizlane El Boussaidi; Marco Tulio Valente; João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Ghizlane El Boussaidi; Marco Tulio Valente
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Replication Package

    This repository contains data and source files needed to replicate our work described in the paper "Unboxing Default Argument Breaking Changes in Scikit Learn".

    Requirements

    We recommend the following requirements to replicate our study:

    1. Internet access
    2. At least 100GB of space
    3. Docker installed
    4. Git installed

    Package Structure

    We relied on Docker containers to provide a working environment that is easier to replicate. Specifically, we configure the following containers:

    • data-analysis, an R-based Container we used to run our data analysis.
    • data-collection, a Python Container we used to collect Scikit's default arguments and detect them in client applications.
    • database, a Postgres Container we used to store clients' data, obtained from Grotov et al.
    • storage, a directory used to store the data processed in data-analysis and data-collection. This directory is shared in both containers.
    • docker-compose.yml, the Docker file that configures all containers used in the package.

    In the remainder of this document, we describe how to set up each container properly.

    Using VSCode to Setup the Package

    We selected VSCode as the IDE of choice because its extensions allow us to implement our scripts directly inside the containers. In this package, we provide configuration parameters for both the data-analysis and data-collection containers, so you can open and run each container directly in VSCode without any extra configuration.

    You first need to set up the containers:

    $ cd /replication/package/folder
    $ docker-compose build
    $ docker-compose up
    # Wait for Docker to build and start all containers
    

    Then, you can open them in Visual Studio Code:

    1. Open VSCode in project root folder
    2. Access the command palette and select "Dev Container: Reopen in Container"
      1. Select either Data Collection or Data Analysis.
    3. Start working

    If you want/need a more customized organization, the remainder of this file describes it in detail.

    Longest Road: Manual Package Setup

    Database Setup

    The database container will automatically restore the dump in dump_matroskin.tar on its first launch. To set up and run the container, you should:

    Build an image:

    $ cd ./database
    $ docker build --tag 'dabc-database' .
    $ docker image ls
    REPOSITORY      TAG       IMAGE ID       CREATED          SIZE
    dabc-database   latest    b6f8af99c90d   50 minutes ago   18.5GB
    

    Create and enter inside the container:

    $ docker run -it --name dabc-database-1 dabc-database
    $ docker exec -it dabc-database-1 /bin/bash
    root# psql -U postgres -h localhost -d jupyter-notebooks
    jupyter-notebooks=# \dt
                 List of relations
     Schema |       Name        | Type  | Owner
    --------+-------------------+-------+-------
     public | Cell              | table | root
     public | Code_cell         | table | root
     public | Md_cell           | table | root
     public | Notebook          | table | root
     public | Notebook_features | table | root
     public | Notebook_metadata | table | root
     public | repository        | table | root
    

    If you get the table list shown above, your database is properly set up.

    It is important to mention that this database extends the one provided by Grotov et al. Specifically, we added three columns to the Notebook_features table (API_functions_calls, defined_functions_calls, and other_functions_calls) containing the function calls performed by each client in the database.
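
    For illustration only (this is not part of the original replication scripts), the added columns can be inspected from Python roughly as in the sketch below. It assumes psycopg2 is available and that the database is reachable with the credentials used in the psql command above; the host name and the identifier casing are assumptions taken from the text above.

    # Minimal sketch, assuming psycopg2 is installed and the Postgres service is
    # reachable as "database" (the docker-compose service name; adjust as needed).
    import psycopg2

    conn = psycopg2.connect(dbname="jupyter-notebooks", user="postgres", host="database")
    with conn, conn.cursor() as cur:
        # Mixed-case identifiers are quoted so Postgres does not fold them to lowercase.
        cur.execute('SELECT "API_functions_calls", "defined_functions_calls", '
                    '"other_functions_calls" FROM "Notebook_features" LIMIT 5')
        for row in cur.fetchall():
            print(row)
    conn.close()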

    Data Collection Setup

    This container is responsible for collecting the data to answer our research questions. It has the following structure:

    • dabcs.py, extracts DABCs from the Scikit Learn source code and exports them to a CSV file.
    • dabcs-clients.py, extracts function calls from client notebooks and exports them to a CSV file. We rely on a modified version of Matroskin to extract the function calls; you can find the tool's source code in the `matroskin` directory.
    • Makefile, commands to set up and run both dabcs.py and dabcs-clients.py
    • matroskin, the directory containing the modified version of the matroskin tool. We extended the library to collect the function calls performed in the client notebooks of Grotov's dataset.
    • storage, a Docker volume where the data-collection container saves the exported data. This data is used later in Data Analysis.
    • requirements.txt, Python dependencies adopted in this module.

    Note that the container automatically configures this module for you, e.g., installs dependencies, configures matroskin, and downloads the Scikit Learn source code. For this, you must run the following commands:

    $ cd ./data-collection
    $ docker build --tag "data-collection" .
    $ docker run -it -d --name data-collection-1 -v $(pwd)/:/data-collection -v $(pwd)/../storage/:/data-collection/storage/ data-collection
    $ docker exec -it data-collection-1 /bin/bash
    $ ls
    Dockerfile Makefile config.yml dabcs-clients.py dabcs.py matroskin storage requirements.txt utils.py
    

    If you see the project files listed, the container is configured correctly.

    Data Analysis Setup

    We use this container to conduct the analysis over the data produced by the Data Collection container. It has the following structure:

    • dependencies.R, an R script containing the dependencies used in our data analysis.
    • data-analysis.Rmd, the R notebook we used to perform our data analysis
    • datasets, a docker volume pointing to the storage directory.

    Execute the following commands to run this container:

    $ cd ./data-analysis
    $ docker build --tag "data-analysis" .
    $ docker run -it -d --name data-analysis-1 -v $(pwd)/:/data-analysis -v $(pwd)/../storage/:/data-collection/datasets/ data-analysis
    $ docker exec -it data-analysis-1 /bin/bash
    $ ls
    data-analysis.Rmd datasets dependencies.R Dockerfile figures Makefile
    

    If you see the project files listed, the container is configured correctly.

    A note on storage shared folder

    As mentioned, the storage folder is mounted as a volume and shared between the data-collection and data-analysis containers. We compressed the contents of this folder due to space constraints, so before you start working on Data Collection or Data Analysis, make sure you have extracted the compressed files. You can do this by running the Makefile inside the storage folder.

    $ make unzip # extract files
    $ ls
    clients-dabcs.csv clients-validation.csv dabcs.csv Makefile scikit-learn-versions.csv versions.csv
    $ make zip # compress files
    $ ls
    csv-files.tar.gz Makefile
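
    After make unzip, a quick sanity check is to inspect the exported CSVs with pandas. The sketch below only assumes the file names shown in the listing above; it prints shapes and headers rather than relying on any particular column layout, and the path should be adjusted to wherever the storage volume is mounted in your container.

    # Minimal sketch: run from inside the storage folder (or adjust the paths).
    import pandas as pd

    for name in ["dabcs.csv", "clients-dabcs.csv", "clients-validation.csv",
                 "scikit-learn-versions.csv", "versions.csv"]:
        df = pd.read_csv(name)
        print(f"{name}: {df.shape[0]} rows x {df.shape[1]} columns")
        print(list(df.columns))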
  19. The Kubernetes Security Landscape: Full Open Research Archive

    • figshare.com
    zip
    Updated May 30, 2025
    Cite
    J. Alexander Curtis (2025). The Kubernetes Security Landscape: Full Open Research Archive [Dataset]. http://doi.org/10.6084/m9.figshare.25522459.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 30, 2025
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    J. Alexander Curtis
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Kubernetes Security Landscape: AI-Driven Insights from a Comprehensive Analysis of Developer Discussions

    This zip archive is organized into four main directories, plus a handful of supporting files. It contains everything you need to recreate the results of our research under the Open Research initiative.

    • assets/: Images and compiled assets used in the research paper
    • data/: Exported raw data sources, including the data dump, CSV export, and Pickle files
    • research/: Jupyter Notebooks demonstrating how the data was processed within the paper
    • scripts/: Python scripts for scraping, exposed by cli.py
    • cli.py: Python script which exposes a command-line interface (CLI) that allows for easier interaction with the scraping scripts in the scripts/ directory
    • poetry.lock: Dependency package manifest for use with the poetry install command and the Poetry package manager
    • requirements.txt: Python/Pip project configuration file if you prefer to use Pip instead of Poetry
    • README.md:
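
    As an illustration of how the exports in data/ can be loaded, the sketch below uses pandas. The file names are hypothetical placeholders (the archive's exact file names are not listed here) and should be replaced with the actual CSV and Pickle exports.

    # Minimal sketch with hypothetical file names; replace them with the actual
    # exports found in the data/ directory of the archive.
    import pandas as pd

    csv_df = pd.read_csv("data/discussions.csv")      # CSV export (name assumed)
    pkl_obj = pd.read_pickle("data/discussions.pkl")  # Pickle export (name assumed)

    print(csv_df.shape)
    print(type(pkl_obj))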

  20. Bathymetry of the Main Pool of Lake Calumet, Cook County, Illinois, July 2023

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Jul 24, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Bathymetry of the Main Pool of Lake Calumet, Cook County, Illinois, July 2023 [Dataset]. https://catalog.data.gov/dataset/bathymetry-of-the-main-pool-of-lake-calumet-cook-county-illinois-july-2023
    Explore at:
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Cook County, Lake Calumet, Illinois
    Description

    These data are single-beam bathymetry points compiled in comma separated values (CSV) file format, generated from a hydrographic survey of the northern portion of Lake Calumet in Cook County, Illinois. Hydrographic data were collected July 18-19, 2023, using a single-beam echosounder (SBES) integrated with a Global Navigation Satellite System (GNSS) mounted on a marine survey vessel. Surface water elevation data were collected July 18 utilizing a single-base real-time kinematic (RTK)/GNSS unit. Bathymetric data points were collected as the vessel traversed the northern portions of the lake along overlapping survey lines. The SBES internally collected and stored the depth data from the echosounder and the horizontal and vertical position data of the vessel from the GNSS in real time. Data processing required specialized computer software to export bathymetry data from the raw data files. A Python script was written to calculate the lakebed elevations and identify outliers in the dataset. These data are provided in comma separated values (CSV) format as LakeCalumet_SBES_20230718.csv. Data points are stored as a series of x (longitude), y (latitude), and z (elevation or depth) points along with variable length records specific to the data transects.
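
    As a rough illustration of the processing described above (deriving lakebed elevations and screening for outliers), the sketch below loads the published CSV with pandas. The column names x, y, and z are assumptions based on the description and may differ from the actual headers.

    # Minimal sketch, assuming columns named "x", "y", "z" as described above;
    # adjust to the actual headers in LakeCalumet_SBES_20230718.csv.
    import pandas as pd

    pts = pd.read_csv("LakeCalumet_SBES_20230718.csv")

    # Simple stand-in for the outlier screening mentioned in the description:
    # flag elevations more than 3 standard deviations from the mean.
    z = pts["z"]
    outliers = pts[(z - z.mean()).abs() > 3 * z.std()]
    print(f"{len(outliers)} of {len(pts)} points flagged as potential elevation outliers")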
