44 datasets found
  1. PostgreSQL

    • neuinfo.org
    • scicrunch.org
    • +2 more
    Updated Jan 29, 2022
    Cite
    (2022). PostgreSQL [Dataset]. http://identifiers.org/RRID:SCR_021067
    Dataset updated
    Jan 29, 2022
    Description

    Open-source object-relational database system that uses and extends the SQL language, combined with many features that safely store and scale the most complicated data workloads. PostgreSQL runs on all major operating systems.

  2. Data from: Atlas of European Eel Distribution (Anguilla anguilla) in...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2024
    Cite
    Mateo, Maria; Drouineau, Hilaire; Pella, Herve; Beaulaton, Laurent; Amilhat, Elsa; Bardonnet, Agnès; Domingos, Isabel; Fernández-Delgado, Carlos; De Miguel Rubio, Ramon; Herrera, Mercedes; Korta, Maria; Zamora, Lluis; Díaz, Estibalitz; Briand, Cédric (2024). Atlas of European Eel Distribution (Anguilla anguilla) in Portugal, Spain and France [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6021837
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    FCUL/MARE
    INRAe
    University of Girona
    EPTB-Vilaine
    OFB
    AZTI
    University of Perpignan
    University of Córdoba
    Authors
    Mateo, Maria; Drouineau, Hilaire; Pella, Herve; Beaulaton, Laurent; Amilhat, Elsa; Bardonnet, Agnès; Domingos, Isabel; Fernández-Delgado, Carlos; De Miguel Rubio, Ramon; Herrera, Mercedes; Korta, Maria; Zamora, Lluis; Díaz, Estibalitz; Briand, Cédric
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    France, Portugal, Spain
    Description

    DESCRIPTION

    VERSIONS

    version 1.0.1 fixes a problem with functions

    version 1.0.2 adds table dbeel_rivers.rn_rivermouth with GEREM basin, distance to Gibraltar and link to CCM.

    version 1.0.3 fixes a problem with functions

    version 1.0.4 adds views rn_rna and rn_rne to the database

    The SUDOANG project aims at providing common tools to managers to support eel conservation in the SUDOE area (Spain, France and Portugal). VISUANG is the SUDOANG Interactive Web Application that hosts all these tools. The application consists of an eel distribution atlas (GT1), assessments of mortalities caused by turbines and an atlas showing obstacles to migration (GT2), estimates of recruitment and exploitation rate (GT3), and escapement (chosen as a target by the EC for the Eel Management Plans) (GT4). In addition, it includes an interactive map showing sampling results from the pilot basin network produced by GT6.

    The eel abundance for the eel atlas and escapement has been obtained using the Eel Density Analysis model (EDA, GT4's product). EDA extrapolates the abundance of eel in sampled river segments to other segments taking into account how the abundance, sex and size of the eels change depending on different parameters. Thus, EDA requires two main data sources: those related to the river characteristics and those related to eel abundance and characteristics.

    However, in both cases, data availability was uneven in the SUDOE area. In addition, this information was dispersed among several managers and in different formats due to different sampling sources: Water Framework Directive (WFD), Community Framework for the Collection, Management and Use of Data in the Fisheries Sector (EUMAP), Eel Management Plans, research groups, scientific papers and technical reports. Therefore, the first step towards having eel abundance estimations covering the whole SUDOE area was to build a joint river and eel database. In this report we describe the database corresponding to the rivers' characteristics in the SUDOE area and the eel abundances and their characteristics.

    In the case of rivers, two types of information have been collected:

    River topology (RN table): a compilation of data on rivers and their topological and hydrographic characteristics in the three countries.

    River attributes (RNA table): contains physical attributes that have fed the SUDOANG models.

    The estimation of eel abundance and characteristics (size, biomass, sex-ratio and silver) distribution at different scales (river segment, basin, Eel Management Unit (EMU), and country) in the SUDOE area, obtained with the EDA2.3 model, has been compiled in the RNE table (eel predictions).

    CURRENT ACTIVE PROJECT

    The project is currently active here: gitlab forgemia

    TECHNICAL DESCRIPTION TO BUILD THE POSTGRES DATABASE

    1. Build the database in PostgreSQL.

    All tables are in EPSG:3035 (European LAEA). The format is a PostgreSQL database. You can download other formats (shapefiles, csv) here: SUDOANG gt1 database.

    Initial commands

    Open a shell (on Windows, with the command CMD) and move to the folder where you downloaded the files:

    cd c:/path/to/my/folder

    Note: psql must be accessible. On Windows you can add the PostgreSQL bin folder to the path; otherwise you need to prefix the commands with the full path to the bin folder (see link to instructions below).

    createdb -U postgres eda2.3
    psql -U postgres eda2.3

    This opens a prompt (#) where you can launch the commands in the next box.

    Within the psql shell

    create extension "postgis";
    create extension "dblink";
    create extension "ltree";
    create extension "tablefunc";
    create schema dbeel_rivers;
    create schema france;
    create schema spain;
    create schema portugal;
    -- type \q to quit the psql shell

    Now the database is ready to receive the different dumps. The dump files are large; you might not need the parts including unit basins or waterbodies. All the tables except waterbodies and unit basins are described in the Atlas. You might need to understand what inheritance is in a database: https://www.postgresql.org/docs/12/tutorial-inheritance.html
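
    As a quick illustration of table inheritance (not from the original instructions): a query on the parent table dbeel_rivers.rn also returns the rows stored in france.rn, spain.rn and portugal.rn, while SELECT ... FROM ONLY restricts to the parent. A minimal Python sketch, assuming the eda2.3 database restored as above, a local server, and the psycopg2 driver installed:

        # Hedged sketch: compare row counts with and without inheritance.
        # Connection settings are assumptions; adjust user/host to your setup.
        import psycopg2

        conn = psycopg2.connect(dbname="eda2.3", user="postgres", host="localhost")
        with conn, conn.cursor() as cur:
            # Includes rows of the inheriting country tables.
            cur.execute("SELECT count(*) FROM dbeel_rivers.rn;")
            print("all countries:", cur.fetchone()[0])
            # ONLY excludes the child tables.
            cur.execute("SELECT count(*) FROM ONLY dbeel_rivers.rn;")
            print("parent only:", cur.fetchone()[0])
        conn.close()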

    2. RN (river segments)

    These layers contain the topology (see Atlas for detail)

    dbeel_rivers.rn

    france.rn

    spain.rn

    portugal.rn

    Columns (see Atlas): gid, idsegment, source, target, lengthm, nextdownidsegment, path, isfrontier, issource, seaidsegment, issea, geom, isendoreic, isinternational, country

    dbeel_rivers.rn_rivermouth

    Columns: seaidsegment, geom (polygon), gerem_zone_3, gerem_zone_4 (used in EDA), gerem_zone_5, ccm_wso_id, country, emu_name_short, geom_outlet (point), name_basin, dist_from_gibraltar_km, name_coast, basin_name

    dbeel_rivers.rn is mandatory: it is the table at the international level from which the other tables inherit, so download it first even if you don't want to use other countries (in many cases you should ... there are transboundary catchments). The rn network must be restored first; tables rne and rna refer to it by foreign keys.

    pg_restore -U postgres -d eda2.3 "dbeel_rivers.rn.backup"

    france

    pg_restore -U postgres -d eda2.3 "france.rn.backup"

    spain

    pg_restore -U postgres -d eda2.3 "spain.rn.backup"

    portugal

    pg_restore -U postgres -d eda2.3 "portugal.rn.backup"

    Rivermouths and basins: this file contains GEREM basins, distance to Gibraltar, and the link to the CCM id for each basin flowing to the sea.

    pg_restore -U postgres -d eda2.3 "dbeel_rivers.rn_rivermouth.backup"

    With the schema you will probably want to use the functions, but launch this only after restoring rna in the next step:

    psql -U postgres -d eda2.3 -f "function_dbeel_rivers.sql"

    3. RNA (Attributes)

    This corresponds to tables

    dbeel_rivers.rna

    france.rna

    spain.rna

    portugal.rna

    Columns (see Atlas): idsegment, altitudem, distanceseam, distancesourcem, cumnbdam, medianflowm3ps, surfaceunitbvm2, surfacebvm2, strahler, shreeve, codesea, name, pfafriver, pfafsegment, basin, riverwidthm, temperature, temperaturejan, temperaturejul, wettedsurfacem2, wettedsurfaceotherm2, lengthriverm, emu, cumheightdam, riverwidthmsource, slope, dis_m3_pyr_riveratlas, dis_m3_pmn_riveratlas, dis_m3_pmx_riveratlas, drought, drought_type_calc

    Code:

    pg_restore -U postgres -d eda2.3 "dbeel_rivers.rna.backup"
    pg_restore -U postgres -d eda2.3 "france.rna.backup"
    pg_restore -U postgres -d eda2.3 "spain.rna.backup"
    pg_restore -U postgres -d eda2.3 "portugal.rna.backup"

    4. RNE (eel predictions)

    These layers contain eel data (see Atlas for detail)

    dbeel_rivers.rne

    france.rne

    spain.rne

    portugal.rne

    Columns (see Atlas): idsegment, surfaceunitbvm2, surfacebvm2, delta, gamma, density, neel, beel, peel150, peel150300, peel300450, peel450600, peel600750, peel750, nsilver, bsilver, psilver150300, psilver300450, psilver450600, psilver600750, psilver750, psilver, pmale150300, pmale300450, pmale450600, pfemale300450, pfemale450600, pfemale600750, pfemale750, pmale, pfemale, sex_ratio, cnfemale300450, cnfemale450600, cnfemale600750, cnfemale750, cnmale150300, cnmale300450, cnmale450600, cnsilver150300, cnsilver300450, cnsilver450600, cnsilver600750, cnsilver750, cnsilver, delta_tr, gamma_tr, type_fit_delta_tr, type_fit_gamma_tr, density_tr, density_pmax_tr, neel_pmax_tr, nsilver_pmax_tr, density_wd, neel_wd, beel_wd, nsilver_wd, bsilver_wd, sector_tr, year_tr, is_current_distribution_area, is_pristine_distribution_area_1985

    Code for restoration:

    pg_restore -U postgres -d eda2.3 "dbeel_rivers.rne.backup"
    pg_restore -U postgres -d eda2.3 "france.rne.backup"
    pg_restore -U postgres -d eda2.3 "spain.rne.backup"
    pg_restore -U postgres -d eda2.3 "portugal.rne.backup"

    5. Unit basins

    Unit basins are not described in the Atlas. They correspond to the following tables:

    dbeel_rivers.basinunit_bu

    france.basinunit_bu

    spain.basinunit_bu

    portugal.basinunit_bu

    france.basinunitout_buo

    spain.basinunitout_buo

    portugal.basinunitout_buo

    A unit basin is the simple basin that surrounds a segment. It corresponds to the topographic unit from which unit segments have been calculated (EPSG:3035). Tables bu_unitbv and bu_unitbvout inherit from dbeel_rivers.unit_bv. The first table intersects with a segment; the second does not, and corresponds to basin polygons which do not have a river segment.

    Source :

    Portugal

    https://sniambgeoviewer.apambiente.pt/Geodocs/gml/inspire/HY_PhysicalWaters_DrainageBasinGeoCod.zip

    France

    In France, the unit bv corresponds to the RHT (Pella et al., 2012)

    Spain

    http://www.mapama.gob.es/ide/metadatos/index.html?srv=metadata.show&uuid=898f0ff8-f06c-4c14-88f7-43ea90e48233

    pg_restore -U postgres -d eda2.3 'dbeel_rivers.basinunit_bu.backup'

    france

    pg_restore -U postgres -d eda2.3

  3. Classicmodels

    • kaggle.com
    zip
    Updated Dec 15, 2024
    Cite
    Javier Landaeta (2024). Classicmodels [Dataset]. https://www.kaggle.com/datasets/javierlandaeta/classicmodels
    Explore at: zip (65751 bytes)
    Dataset updated
    Dec 15, 2024
    Authors
    Javier Landaeta
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Abstract

    This project presents a comprehensive analysis of a company's annual sales, using the classic classicmodels dataset as the database. Python is used as the main programming language, along with the Pandas, NumPy and SQLAlchemy libraries for data manipulation and analysis, and PostgreSQL as the database management system.

    The main objective of the project is to answer key questions related to the company's sales performance, such as: Which were the most profitable products and customers? Were sales goals met? The results obtained serve as input for strategic decision making in future sales campaigns.

    Methodology

    1. Data Extraction:

    • A connection is established with the PostgreSQL database to extract the relevant data from the orders, orderdetails, customers, products and employees tables.
    • A reusable function is created to read each table and load it into a Pandas DataFrame (see the sketch below).
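
    A minimal sketch of such a reusable read function (the connection string is a placeholder, not from the original project; requires pandas, SQLAlchemy and a PostgreSQL driver):

        # Hedged sketch of the extraction step: one reusable function for all tables.
        import pandas as pd
        from sqlalchemy import create_engine

        engine = create_engine("postgresql://user:password@localhost:5432/classicmodels")

        def read_table(name: str) -> pd.DataFrame:
            # Read one table of the classicmodels schema into a DataFrame.
            return pd.read_sql_table(name, engine)

        tables = ["orders", "orderdetails", "customers", "products", "employees"]
        frames = {name: read_table(name) for name in tables}
        print({name: df.shape for name, df in frames.items()})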

    2. Data Cleansing and Transformation:

    • An exploratory analysis of the data is performed to identify missing values, inconsistencies, and outliers.
    • New variables are calculated, such as the total value of each sale, cost, and profit.
    • Different DataFrames are joined using primary and foreign keys to obtain a complete view of sales (see the sketch after step 3).

    3. Exploratory Data Analysis (EDA):

    • Key metrics such as total sales, number of unique customers, and average order value are calculated.
    • Data is grouped by different dimensions (products, customers, dates) to identify patterns and trends.
    • Results are visualized using relevant graphics (histograms, bar charts, etc.).
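
    A hedged sketch of steps 2 and 3, reusing the frames dict from the extraction sketch; the column names (productCode, quantityOrdered, priceEach, buyPrice, productName) follow the standard classicmodels schema and should be verified against your copy of the database:

        # Join order details with products, derive total, cost and profit (step 2),
        # then compute a key metric and a per-product grouping (step 3).
        details = frames["orderdetails"].merge(frames["products"], on="productCode")
        details["total"] = details["quantityOrdered"] * details["priceEach"]
        details["cost"] = details["quantityOrdered"] * details["buyPrice"]
        details["profit"] = details["total"] - details["cost"]

        print("total sales:", details["total"].sum())
        top_products = (details.groupby("productName")["total"]
                        .sum().sort_values(ascending=False).head(10))
        print(top_products)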

    4. Modeling and Prediction:

    • Although the main focus of the project is descriptive, predictive modeling techniques (e.g., time series) could be explored to forecast future sales.

    5. Report Generation:

    • Detailed reports are created in Pandas DataFrames format that answer specific business questions.
    • These reports are stored in new PostgreSQL tables for further analysis and visualization (see the sketch below).
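
    A minimal sketch of that last step, continuing the example above (the table name report_top_products is illustrative, not from the project):

        # Persist a report DataFrame back into PostgreSQL for later visualization.
        top_products.rename("revenue").to_frame().to_sql(
            "report_top_products", engine, if_exists="replace")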

    Results

    • Identification of top products and customers: The best-selling products and the customers that generate the most revenue are identified.
    • Analysis of sales trends: Sales trends over time are analyzed and possible factors that influence sales behavior are identified.
    • Calculation of key metrics: Metrics such as average profit margin and sales growth rate are calculated.

    Conclusions

    This project demonstrates how Python and PostgreSQL can be effectively used to analyze large data sets and obtain valuable insights for business decision making. The results obtained can serve as a starting point for future research and development in the area of sales analysis.

    Technologies Used

    • Python: Pandas, NumPy, SQLAlchemy, Matplotlib/Seaborn
    • Database: PostgreSQL
    • Tools: Jupyter Notebook
    • Keywords: data analysis, Python, PostgreSQL, Pandas, NumPy, SQLAlchemy, EDA, sales, business intelligence

  4. Technographic Data | 22M Records | Refreshed 2x/Mo | Delivery Hourly via...

    • datarade.ai
    .json, .csv, .sql
    Updated Jan 1, 2023
    Cite
    Forager.ai (2023). Technographic Data | 22M Records | Refreshed 2x/Mo | Delivery Hourly via CSV/JSON/PostgreSQL DB Delivery | B2B Data [Dataset]. https://datarade.ai/data-products/technographic-data-22m-records-refreshed-2x-mo-delivery-forager-ai
    Explore at: .json, .csv, .sql
    Dataset updated
    Jan 1, 2023
    Dataset provided by
    Forager.ai
    Area covered
    State of, French Southern Territories, Lithuania, Canada, Togo, Liechtenstein, Guernsey, South Georgia and the South Sandwich Islands, Botswana, Netherlands
    Description

    The Forager.ai Global Install Base Dataset is a leading source of firmographic data, backed by advanced AI and offering the highest refresh rate in the industry.

    | Volume and Stats |

    • Over 22M total records, the highest volume in the industry today.
    • Every company record refreshed twice a month, offering an unparalleled update frequency.
    • Delivery is made every hour, ensuring you have the latest data at your fingertips.
    • Each record is the result of an advanced AI-driven process, ensuring high-quality, accurate data.

    | Use Cases |

    Sales Platforms, ABM and Intent Data Platforms, Identity Platforms, Data Vendors:

    Example applications include:

    1. Uncover trending technologies or tools gaining popularity.

    2. Pinpoint lucrative business prospects by identifying similar solutions utilized by a specific company.

    3. Study a company's tech stacks to understand the technical capability and skills available within that company.

    B2B Tech Companies:

    • Enrich leads that sign up through the Company Search API (available separately).
    • Identify and map every company that fits your core personas and ICP.
    • Build audiences to target, using key fields like location, company size, industry, and description.

    Venture Capital and Private Equity:

    • Discover new investment opportunities using company descriptions and industry-level data.
    • Review the growth of private companies and benchmark their strength against competitors.
    • Create high-level views of companies competing in popular verticals for investment.

    | Delivery Options |

    • Flat files via S3 or GCP
    • PostgreSQL Shared Database
    • PostgreSQL Managed Database
    • API
    • Other options available upon request, depending on the scale required

    Our dataset provides a unique blend of volume, freshness, and detail that is perfect for Sales Platforms, B2B Tech, VCs & PE firms, Marketing Automation, ABM & Intent. It stands as a cornerstone in our broader data offering, ensuring you have the information you need to drive decision-making and growth.

    Tags: Company Data, Company Profiles, Employee Data, Firmographic Data, AI-Driven Data, High Refresh Rate, Company Classification, Private Market Intelligence, Workforce Intelligence, Public Companies.

  5. PostgreSQL Price Prediction Data

    • coinbase.com
    Updated Dec 3, 2025
    Cite
    (2025). PostgreSQL Price Prediction Data [Dataset]. https://www.coinbase.com/price-prediction/base-postgresql
    Dataset updated
    Dec 3, 2025
    Variables measured
    Growth Rate, Predicted Price
    Measurement technique
    User-defined projections based on compound growth. This is not a formal financial forecast.
    Description

    This dataset contains the predicted prices of the asset PostgreSQL over the next 16 years. The prediction is initially calculated using a default 5 percent annual growth rate; after page load, a sliding-scale component lets the user adjust the growth rate to their own positive or negative projections, from a minimum of -100 percent to a maximum of +100 percent.
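
    The compound-growth model described above reduces to one line of arithmetic; a minimal sketch, where the 40.0 starting price is a made-up example value:

        # Hedged sketch of the compound-growth projection described above.
        def predicted_price(current_price: float, annual_growth: float, years: int) -> float:
            # annual_growth is a fraction, e.g. 0.05 for the default 5 percent.
            return current_price * (1 + annual_growth) ** years

        # A hypothetical starting price projected 16 years out at 5 percent growth.
        print(round(predicted_price(40.0, 0.05, 16), 2))  # 40 * 1.05**16 ≈ 87.31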

  6. Papyrus dataset postgres dump

    • zenodo.org
    • data.niaid.nih.gov
    tar
    Updated Jul 21, 2022
    Cite
    Rachael Skyner; Ben Tehan; Rachael Skyner; Ben Tehan (2022). Papyrus dataset postgres dump [Dataset]. http://doi.org/10.5281/zenodo.6866697
    Explore at: tar
    Dataset updated
    Jul 21, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rachael Skyner; Ben Tehan; Rachael Skyner; Ben Tehan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset that the database dump was created from is described here: 10.33774/chemrxiv-2021-1rxhk

    A dump of the postgres database created from the code in the 'postgres' directory (Papyrus-scripts/src/papyrus_scripts/postgres/) of Rachael Skyner's fork (https://github.com/reskyner/Papyrus-scripts) of Olivier Bequignon's Papyrus-scripts GitHub repository (https://github.com/OlivierBeq/Papyrus-scripts). The database was created by:

    1. Download the Papyrus csv files from Olivier's code using the download functionality

    2. Spin up a 'papyrus' container using the docker-compose.yml file in Rachael's fork (running on a machine with access to the postgres instance you want to add the database to)

    3. Start a shell in the papyrus container with docker exec -it papyrus /bin/bash

    4. Start a jupyter notebook server with jupyter notebook --ip 0.0.0.0 --allow-root --no-browser

    5. Run the two notebooks (1-insert_molecule_data.ipynb and 2-insert_activities.ipynb) in order

    6. Create a dump of the database

  7. Magnetique: input data and PostgreSQL database

    • data.niaid.nih.gov
    Updated Oct 5, 2022
    Cite
    Thiago Britto-Borges; Annekathrin Ludt; Etienne Boileau; Enio Gjerga; Federico Marini; Christoph Dieterich (2022). Magnetique: input data and PostgreSQL database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6854307
    Dataset updated
    Oct 5, 2022
    Dataset provided by
    University Hospital Heidelberg
    University Medical Center of the Johannes Gutenberg University Mainz
    Authors
    Thiago Britto-Borges; Annekathrin Ludt; Etienne Boileau; Enio Gjerga; Federico Marini; Christoph Dieterich
    Description

    Magnetique: An interactive web application to explore transcriptome signatures of heart failure

    Supplementary dataset.

    Refer to https://shiny.dieterichlab.org/app/magnetique or contact the authors for details.

  8. PostgreSQL Dump of IMDB Data for JOB Workload

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Sep 24, 2019
    Cite
    Ryan Marcus (2019). PostgreSQL Dump of IMDB Data for JOB Workload [Dataset]. http://doi.org/10.7910/DVN/2QYZBT
    Explore at: Croissant
    Croissant is a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Sep 24, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Ryan Marcus
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a dump generated by pg_dump -Fc of the IMDb data used in the "How Good are Query Optimizers, Really?" paper. PostgreSQL compatible SQL queries and scripts to automatically create a VM with this dataset can be found here: https://git.io/imdb
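
    Since -Fc produces a custom-format archive, it is restored with pg_restore rather than psql; once restored, the data can be queried like any PostgreSQL database. A minimal sketch, where the database name imdb and the table title are assumptions based on the JOB schema:

        # Hedged sketch: query the restored IMDb database with pandas.
        # Database name and table are assumptions; adjust to your restore.
        import pandas as pd
        from sqlalchemy import create_engine

        engine = create_engine("postgresql://postgres@localhost:5432/imdb")
        print(pd.read_sql_query("SELECT count(*) FROM title;", engine))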

  9. Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...

    • zenodo.org
    bz2
    Updated Mar 15, 2021
    Cite
    João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2592524
    Explore at: bz2
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and makes results hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

    Paper: https://2019.msrconf.org/event/msr-2019-papers-a-large-scale-study-about-quality-and-reproducibility-of-jupyter-notebooks

    This repository contains two files:

    • dump.tar.bz2
    • jupyter_reproducibility.tar.bz2

    The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

    The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

    • analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.
    • archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.
    • paper: empty. The notebook analyses/N12.To.Paper.ipynb moves data to it

    In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.

    Reproducing the Analysis

    This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:

    Ubuntu 18.04.1 LTS
    PostgreSQL 10.6
    Conda 4.5.11
    Python 3.7.2
    PdfCrop 2012/11/02 v1.38

    First, download dump.tar.bz2 and extract it:

    tar -xjf dump.tar.bz2

    It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

    psql jupyter < db2019-03-13.dump

    It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
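
    Before running the notebooks, it can help to sanity-check the connection string with the same library the notebooks use; a minimal sketch (not part of the original instructions):

        # Hedged sketch: verify that JUP_DB_CONNECTION points at a reachable database.
        import os
        from sqlalchemy import create_engine, text

        engine = create_engine(os.environ["JUP_DB_CONNECTION"])
        with engine.connect() as conn:
            print(conn.execute(text("SELECT 1")).scalar())  # any round-trip will do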

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create a conda environment with Python 3.7:

    conda create -n analyses python=3.7
    conda activate analyses

    Go to the analyses folder and install all the dependencies of the requirements.txt

    cd jupyter_reproducibility/analyses
    pip install -r requirements.txt

    For reproducing the analyses, run jupyter on this folder:

    jupyter notebook

    Execute the notebooks in this order:

    • Index.ipynb
    • N0.Repository.ipynb
    • N1.Skip.Notebook.ipynb
    • N2.Notebook.ipynb
    • N3.Cell.ipynb
    • N4.Features.ipynb
    • N5.Modules.ipynb
    • N6.AST.ipynb
    • N7.Name.ipynb
    • N8.Execution.ipynb
    • N9.Cell.Execution.Order.ipynb
    • N10.Markdown.ipynb
    • N11.Repository.With.Notebook.Restriction.ipynb
    • N12.To.Paper.ipynb

    Reproducing or Expanding the Collection

    The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

    Requirements

    This time, we have extra requirements:

    All the analysis requirements
    lbzip2 2.5
    gcc 7.3.0
    Github account
    Gmail account

    Environment

    First, set the following environment variables:

    export JUP_MACHINE="db"; # machine identifier
    export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
    export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
    export JUP_COMPRESSION="lbzip2"; # compression program
    export JUP_VERBOSE="5"; # verbose level
    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlchemy connection
    export JUP_GITHUB_USERNAME="github_username"; # your github username
    export JUP_GITHUB_PASSWORD="github_password"; # your github password
    export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
    export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
    export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
    export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
    export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
    export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
    export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
    export JUP_WITH_EXECUTION="1"; # run execute python notebooks
    export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
    export JUP_EXECUTION_MODE="-1"; # run following the execution order
    export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
    export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
    export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
    export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
    export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
    
    
    # Frequency of log reports
    export JUP_ASTROID_FREQUENCY="5";
    export JUP_IPYTHON_FREQUENCY="5";
    export JUP_NOTEBOOKS_FREQUENCY="5";
    export JUP_REQUIREMENT_FREQUENCY="5";
    export JUP_CRAWLER_FREQUENCY="1";
    export JUP_CLONE_FREQUENCY="1";
    export JUP_COMPRESS_FREQUENCY="5";
    
    export JUP_DB_IP="localhost"; # postgres database IP

    Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf
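
    As a rough sketch of how the credentials end up being used (the subject and contents are illustrative; yagmail reads the oauth2 file configured above):

        # Hedged sketch: send a notification with the study's email settings.
        import yagmail

        yag = yagmail.SMTP("gmail@gmail.com", oauth2_file="~/oauth2_creds.json")
        yag.send(to="target@email.com", subject="ghstudy: job finished", contents="done")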

    Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories; the second one should unmount it. You can leave the scripts blank, but it is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.

    Scripts

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Install 5 Conda environments and 5 Anaconda environments, one pair for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):

    Conda 2.7

    conda create -n raw27 python=2.7 -y
    conda activate raw27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 2.7

    conda create -n py27 python=2.7 anaconda -y
    conda activate py27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    

    Conda 3.4

    It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.

    conda create -n raw34 python=3.4 -y
    conda activate raw34
    conda install jupyter -c conda-forge -y
    conda uninstall jupyter -y
    pip install --upgrade pip
    pip install jupyter
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    pip install pathlib2

    Anaconda 3.4

    conda create -n py34 python=3.4 anaconda -y
    conda activate py34
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.5

    conda create -n raw35 python=3.5 -y
    conda activate raw35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.5

    It requires the manual installation of other anaconda packages.

    conda create -n py35 python=3.5 anaconda -y
    conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
    conda activate py35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.6

    conda create -n raw36 python=3.6 -y
    conda activate raw36
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.6

    conda create -n py36 python=3.6 anaconda -y
    conda activate py36
    conda install -y anaconda-navigator jupyterlab_server navigator-updater
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.7

    conda create -n raw37 python=3.7 -y
    conda activate raw37
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

  10. Music Store Data Analysis Project using SQL

    • kaggle.com
    zip
    Updated Jun 30, 2023
    Cite
    Aetik (2023). Music Store Data Analysis Project using SQL [Dataset]. https://www.kaggle.com/datasets/adimadapalageetika/music-store-data-analysis-project-using-sql/discussion
    Explore at: zip (1748 bytes)
    Dataset updated
    Jun 30, 2023
    Authors
    Aetik
    Description

    I completed a PostgreSQL project to hone my SQL abilities. Following a tutorial video, I worked on a music store data analysis. In the project, I used SQL to answer several questions about the music store company.

  11. RUST-POSTGRES github.com/sfackler/RUST-POSTGRES Price Prediction Data

    • coinbase.com
    Updated Dec 1, 2025
    Cite
    (2025). RUST-POSTGRES github.com/sfackler/RUST-POSTGRES Price Prediction Data [Dataset]. https://www.coinbase.com/price-prediction/base-rust-postgres-githubcomsfacklerrust-postgres-5515
    Dataset updated
    Dec 1, 2025
    Variables measured
    Growth Rate, Predicted Price
    Measurement technique
    User-defined projections based on compound growth. This is not a formal financial forecast.
    Description

    This dataset contains the predicted prices of the asset RUST-POSTGRES github.com/sfackler/RUST-POSTGRES over the next 16 years. The prediction is initially calculated using a default 5 percent annual growth rate; after page load, a sliding-scale component lets the user adjust the growth rate to their own positive or negative projections, from a minimum of -100 percent to a maximum of +100 percent.

  12. SQL Databases for Students and Educators

    • zenodo.org
    • data-staging.niaid.nih.gov
    • +1 more
    bin, html
    Updated Oct 28, 2020
    Cite
    Mauricio Vargas Sepúlveda; Mauricio Vargas Sepúlveda (2020). SQL Databases for Students and Educators [Dataset]. http://doi.org/10.5281/zenodo.4136985
    Explore at: bin, html
    Dataset updated
    Oct 28, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mauricio Vargas Sepúlveda; Mauricio Vargas Sepúlveda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Publicly accessible databases often impose query limits or require registration. Even when I maintain public and limit-free APIs, I never wanted to host a public database because I tend to think that the connection strings are a problem for the user.

    I've decided to host different light/medium-size databases using PostgreSQL, MySQL and SQL Server backends (in strict descending order of preference!).

    Why 3 database backends? I think there are a ton of small edge cases when moving between DB backends, so testing a lot with live databases is quite valuable. With this resource you can benchmark speed, compression, and DDL types.

    Please send me a tweet if you need the connection strings for your lectures or workshops. My Twitter username is @pachamaltese. See the SQL dumps on each section to have the data locally.

  13. 🏪🏬 Pagila (PostgreSQL Sample Database)

    • kaggle.com
    zip
    Updated Aug 17, 2025
    Cite
    Alexander Kapturov (2025). 🏪🏬 Pagila (PostgreSQL Sample Database) [Dataset]. https://www.kaggle.com/datasets/kapturovalexander/pagila-postgresql-sample-database/discussion
    Explore at: zip (1926924 bytes)
    Dataset updated
    Aug 17, 2025
    Authors
    Alexander Kapturov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    DVD rental database to demonstrate the features of PostgreSQL.

    There are 15 tables in the DVD Rental database:

    • actor – stores actors data including first name and last name.
    • film – stores film data such as title, release year, length, rating, etc.
    • film_actor – stores the relationships between films and actors.
    • category – stores film’s categories data.
    • film_category – stores the relationships between films and categories.
    • store – contains the store data including manager staff and address.
    • inventory – stores inventory data.
    • rental – stores rental data.
    • payment – stores customer’s payments.
    • staff – stores staff data.
    • customer – stores customer data.
    • address – stores address data for staff and customers
    • city – stores city names.
    • country – stores country names.
    • language – stores the languages of films.


    Launch the pagila-schema.sql script in PgAdmin 4 and then launch pagila-insert-data.sql.

    Don't forget to switch on auto-commit mode.
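
    Once loaded, the database can be queried directly; a minimal sketch in Python (connection details are placeholders, and the column names follow the standard Pagila schema):

        # Hedged sketch: top five customers by total payments.
        import pandas as pd
        from sqlalchemy import create_engine

        engine = create_engine("postgresql://postgres@localhost:5432/pagila")
        query = """
            SELECT c.first_name, c.last_name, sum(p.amount) AS total_paid
            FROM payment p
            JOIN customer c USING (customer_id)
            GROUP BY c.customer_id, c.first_name, c.last_name
            ORDER BY total_paid DESC
            LIMIT 5;
        """
        print(pd.read_sql_query(query, engine))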

  14. Most popular database management systems worldwide 2024

    • statista.com
    Updated Jun 15, 2024
    Cite
    Statista (2024). Most popular database management systems worldwide 2024 [Dataset]. https://www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems/
    Dataset updated
    Jun 15, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 2024
    Area covered
    Worldwide
    Description

    As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive. Database Management Systems As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMS are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.

  15. PostgreSQL

    • rrid.site
    Updated Nov 30, 2025
    Cite
    (2025). PostgreSQL [Dataset]. http://identifiers.org/RRID:SCR_021067
    Dataset updated
    Nov 30, 2025
    Description

    Open-source object-relational database system that uses and extends the SQL language, combined with many features that safely store and scale the most complicated data workloads. PostgreSQL runs on all major operating systems.

  16. Most commonly used database technologies among developers worldwide 2023

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Most commonly used database technologies among developers worldwide 2023 [Dataset]. https://www.statista.com/statistics/794187/united-states-developer-survey-most-wanted-used-database-technologies/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 8, 2023 - May 19, 2023
    Area covered
    Worldwide
    Description

    In 2023, over ** percent of surveyed software developers worldwide reported using PostgreSQL, the highest share of any database technology. Other popular database tools among developers included MySQL and SQLite.

  17. Managed PostgreSQL Services Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Dataintelo (2025). Managed PostgreSQL Services Market Research Report 2033 [Dataset]. https://dataintelo.com/report/managed-postgresql-services-market
    Explore at: pptx, pdf, csv
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Managed PostgreSQL Services Market Outlook



    According to our latest research, the global Managed PostgreSQL Services market size reached USD 1.68 billion in 2024, with a robust year-on-year growth trajectory. The market is projected to expand at a CAGR of 17.2% during the forecast period, reaching approximately USD 7.09 billion by 2033. This remarkable growth is primarily driven by increasing enterprise adoption of open-source databases, the rising demand for scalable and secure data management solutions, and heightened digital transformation initiatives across industries. As organizations seek to streamline operations and enhance data-driven decision-making, managed PostgreSQL services are emerging as a critical enabler for modern IT infrastructures.




    One of the most significant growth factors for the Managed PostgreSQL Services market is the rapid acceleration of cloud adoption across enterprises of all sizes. Businesses are increasingly migrating their workloads to the cloud to capitalize on the agility, scalability, and cost efficiencies it offers. PostgreSQL, as a leading open-source relational database, has gained immense popularity due to its flexibility, extensibility, and strong community support. Managed service providers are leveraging these attributes by offering fully managed PostgreSQL solutions that reduce the operational burden on IT teams, ensure high availability, and provide automated backup and disaster recovery capabilities. This shift to managed services not only reduces infrastructure management complexities but also enables organizations to focus on core business innovation, further fueling market growth.




    Another pivotal factor propelling the managed PostgreSQL services market is the heightened emphasis on data security, compliance, and regulatory requirements. As data privacy regulations such as GDPR, CCPA, and others become increasingly stringent, organizations are compelled to adopt secure and compliant database management solutions. Managed PostgreSQL service providers are investing heavily in advanced security features such as encryption at rest and in transit, automated patch management, and comprehensive access controls to address these concerns. Additionally, the integration of monitoring and performance optimization tools ensures that enterprises can proactively manage database health, minimize downtime, and meet service level agreements (SLAs). This growing need for robust security and compliance frameworks is expected to sustain strong demand for managed PostgreSQL services in the coming years.




    The ongoing digital transformation across various industry verticals, including BFSI, healthcare, IT & telecommunications, and retail, is also a major contributor to the market's expansion. These sectors are experiencing exponential data growth, necessitating highly reliable, scalable, and cost-effective database management solutions. Managed PostgreSQL services offer seamless integration with modern application architectures, including microservices and containerized environments, supporting the rapid deployment and scaling of mission-critical applications. Furthermore, the proliferation of artificial intelligence, machine learning, and analytics-driven workloads is creating new opportunities for managed PostgreSQL services, as organizations seek to leverage real-time insights from their data assets.




    From a regional perspective, North America continues to dominate the managed PostgreSQL services market, driven by the presence of major cloud service providers, advanced IT infrastructure, and a high concentration of technology-driven enterprises. Europe is following closely, propelled by strict data protection regulations and increasing investments in digital transformation initiatives. Meanwhile, the Asia Pacific region is witnessing the fastest growth, fueled by the rapid expansion of digital economies, increasing cloud adoption among SMEs, and government-led initiatives to modernize IT infrastructure. Latin America and the Middle East & Africa are also demonstrating steady growth, albeit at a comparatively moderate pace, as enterprises in these regions gradually embrace managed database solutions.



    Service Type Analysis



    The service type segment of the managed PostgreSQL services market encompasses a range of offerings, including database administration, backup & recovery, security & compliance, monitoring & performance optimization, and other specialized ser

  18. Managed PostgreSQL Services Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Sep 1, 2025
    Cite
    Growth Market Reports (2025). Managed PostgreSQL Services Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/managed-postgresql-services-market
    Explore at: pptx, csv, pdf
    Dataset updated
    Sep 1, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Managed PostgreSQL Services Market Outlook



    As per our latest research, the global managed PostgreSQL services market size reached USD 2.1 billion in 2024, reflecting robust adoption across industries seeking scalable, reliable, and secure database management solutions. With a strong compound annual growth rate (CAGR) of 16.8% projected from 2025 to 2033, the market is expected to climb to USD 8.1 billion by 2033. This impressive growth trajectory is underpinned by increasing digital transformation initiatives, the proliferation of cloud-native applications, and the critical need for streamlined database administration and security in a data-driven world.




    The managed PostgreSQL services market is witnessing significant momentum due to the surging demand for cost-effective, scalable, and high-performance database solutions among enterprises of all sizes. Organizations are increasingly migrating to managed database services to alleviate the burdens of in-house database maintenance, reduce operational costs, and ensure business continuity. The integration of advanced automation, artificial intelligence, and machine learning capabilities into managed PostgreSQL offerings further enhances performance optimization, predictive maintenance, and intelligent monitoring, making these services indispensable for modern enterprises. Additionally, the rise of DevOps and agile development methodologies has necessitated robust, flexible, and managed database solutions that can seamlessly integrate with CI/CD pipelines, further fueling market expansion.




    Another key driver propelling the managed PostgreSQL services market is the heightened focus on data security, compliance, and regulatory requirements, particularly in highly regulated industries such as BFSI, healthcare, and government. Managed service providers offer comprehensive security frameworks, including encryption, access controls, and automated backup and disaster recovery, which are vital for organizations handling sensitive and mission-critical data. The growing complexity of cyber threats and the need for adherence to standards like GDPR, HIPAA, and PCI DSS have made managed PostgreSQL services an attractive proposition for enterprises seeking to mitigate risks and ensure regulatory compliance without incurring the costs and complexities of managing these aspects internally.




    Furthermore, the increasing adoption of cloud computing and hybrid IT environments has accelerated the shift towards managed PostgreSQL services. As organizations embrace multi-cloud and hybrid strategies to enhance agility, scalability, and resilience, managed PostgreSQL solutions offer seamless integration, centralized management, and cross-platform compatibility. The ability to deploy, monitor, and optimize PostgreSQL databases across diverse environments, including public, private, and hybrid clouds, positions managed services as a strategic enabler of digital transformation. These services also support business continuity and disaster recovery strategies, ensuring minimal downtime and rapid recovery in the event of system failures or cyber incidents.




    From a regional perspective, North America continues to dominate the managed PostgreSQL services market, driven by the presence of leading technology companies, early adoption of cloud technologies, and a mature digital ecosystem. Europe follows closely, with strong demand from banking, finance, healthcare, and government sectors. The Asia Pacific region is emerging as a high-growth market, fueled by rapid digitalization, a burgeoning start-up ecosystem, and increasing investments in IT infrastructure. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as organizations in these regions accelerate their cloud adoption and digital transformation journeys.



    Managed Presto Services have emerged as a complementary solution to managed PostgreSQL services, particularly for organizations that require high-performance, distributed SQL query engines for big data analytics. As businesses increasingly rely on data-driven insights to inform strategic decisions, the integration of Presto with PostgreSQL enables real-time querying and analysis of vast datasets across diverse data sources. This synergy allows enterprises to harness the power of

  19. Serverless Postgres Market Research Report 2033

    • researchintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Research Intelo (2025). Serverless Postgres Market Research Report 2033 [Dataset]. https://researchintelo.com/report/serverless-postgres-market
    Explore at: csv, pdf, pptx
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Research Intelo
    License

    https://researchintelo.com/privacy-and-policy

    Time period covered
    2024 - 2033
    Area covered
    Global
    Description

    Serverless Postgres Market Outlook



    According to our latest research, the Global Serverless Postgres market size was valued at $1.2 billion in 2024 and is projected to reach $7.8 billion by 2033, expanding at a robust CAGR of 23.6% during 2024–2033. The primary growth driver for the Serverless Postgres market globally is the rising demand for scalable, cost-efficient, and highly available database solutions that eliminate the need for complex infrastructure management. As organizations accelerate their digital transformation journeys, the adoption of serverless databases, particularly Serverless Postgres, is surging due to its ability to seamlessly handle dynamic workloads, reduce operational overhead, and optimize resource allocation. This shift is further fueled by the proliferation of cloud-native applications and the increasing need for real-time data analytics across industries.



    Regional Outlook



    North America currently holds the largest share of the global Serverless Postgres market, accounting for nearly 38% of total revenue in 2024. This dominance is attributed to the region’s mature cloud infrastructure, high concentration of technology companies, and early adoption of advanced database technologies. The presence of major cloud providers and a robust ecosystem of managed service vendors has facilitated rapid deployment and integration of serverless database solutions among enterprises. Furthermore, favorable regulatory frameworks, a strong focus on data security, and significant investments in AI and big data analytics have contributed to the region’s leadership position. The United States, in particular, continues to be the epicenter of innovation, with organizations leveraging Serverless Postgres for mission-critical applications in finance, healthcare, and e-commerce.



    Asia Pacific is emerging as the fastest-growing region in the Serverless Postgres market, projected to register an impressive CAGR of 28.9% over the forecast period. The rapid digitalization of economies such as China, India, and Southeast Asian countries, coupled with increasing cloud adoption among SMEs and large enterprises, is fueling market expansion. Governments across the region are investing heavily in cloud infrastructure, data sovereignty, and smart city initiatives, which are further accelerating the adoption of serverless database technologies. Additionally, the rise of mobile applications, IoT deployments, and e-commerce platforms is creating new opportunities for Serverless Postgres providers to cater to diverse, high-growth use cases.



    In emerging economies across Latin America, the Middle East, and Africa, the Serverless Postgres market is witnessing steady growth, albeit from a smaller base. Adoption challenges persist due to limited cloud infrastructure, skills shortages, and concerns over data localization and regulatory compliance. However, localized demand for cost-effective, scalable database solutions is rising, particularly among startups and government agencies aiming to modernize their IT environments. Policy reforms, such as incentives for cloud adoption and digital transformation, are gradually creating a more conducive environment for serverless technologies. As connectivity improves and awareness grows, these regions are expected to contribute increasingly to global market revenues.

    Report Scope

    Attributes            Details
    Report Title          Serverless Postgres Market Research Report 2033
    By Deployment Type    Public Cloud, Private Cloud, Hybrid Cloud
    By Component          Database, Tools & Services
    By Organization Size  Small and Medium Enterprises, Large Enterprises
    By Application        Data Analytics, Web Applications, Mobile Applications, IoT, Others
    By End-User           …

  20. Reproducibility in Practice: Dataset of a Large-Scale Study of Jupyter Notebooks

    • zenodo.org
    bz2
    Updated Mar 15, 2021
    Cite
    Anonymous; Anonymous (2021). Reproducibility in Practice: Dataset of a Large-Scale Study of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2546834
    Explore at:
    Available download formats: bz2
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and produces results that can be hard to reproduce. To understand the good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

    This repository contains two files:

    • dump.tar.bz2
    • jupyter_reproducibility.tar.bz2

    The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

    The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

    • analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.
    • archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.
    • paper: initially empty; the notebook analyses/N11.To.Paper.ipynb moves data into it

    In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.

    Reproducing the Analysis

    This section shows how to load the data into the database and run the analysis notebooks. In the analysis, we used the following environment:

    • Ubuntu 18.04.1 LTS
    • PostgreSQL 10.6
    • Conda 4.5.1
    • Python 3.6.8
    • PdfCrop 2012/11/02 v1.38

    First, download dump.tar.bz2 and extract it:

    tar -xjf dump.tar.bz2

    It extracts the file db2019-01-13.dump. Next, create a database in PostgreSQL (we call it "jupyter").
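    If the database does not exist yet, one way to create it (a sketch, assuming a local PostgreSQL instance where your user is allowed to create databases) is the standard createdb utility:

    createdb jupyter

    Then use psql to restore the dump: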

    psql jupyter < db2019-01-13.dump

    It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
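    As a quick sanity check (a sketch, not part of the original instructions; it assumes SQLAlchemy is installed in the current environment), try opening a connection with the same string:

    python -c "import os, sqlalchemy; sqlalchemy.create_engine(os.environ['JUP_DB_CONNECTION']).connect().close()"

    If this exits silently, the dump was restored and the connection string works.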

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create a conda environment with Python 3.6:

    conda create -n py36 python=3.6 -y
    conda activate py36

    Go to the analyses folder and install all the dependencies listed in requirements.txt:

    cd jupyter_reproducibility/analyses
    pip install -r requirements.txt

    To reproduce the analyses, run Jupyter in this folder:

    jupyter notebook

    Execute the notebooks in this order (a headless alternative using nbconvert is sketched after the list):

    • N0.Index.ipynb
    • N1.Repository.ipynb
    • N2.Notebook.ipynb
    • N3.Cell.ipynb
    • N4.Features.ipynb
    • N5.Modules.ipynb
    • N6.AST.ipynb
    • N7.Name.ipynb
    • N8.Execution.ipynb
    • N9.Cell.Execution.Order.ipynb
    • N10.Markdown.ipynb
    • N11.To.Paper.ipynb
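
    To run them without opening the notebook UI, a sketch using jupyter nbconvert (not part of the original instructions) is:

    for nb in N0.Index N1.Repository N2.Notebook N3.Cell N4.Features N5.Modules \
              N6.AST N7.Name N8.Execution N9.Cell.Execution.Order N10.Markdown N11.To.Paper; do
        # --execute runs the notebook; --inplace saves the outputs back into the same file
        jupyter nbconvert --to notebook --execute --inplace "$nb.ipynb"
    done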

    Reproducing or Expanding the Collection

    The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

    Requirements

    This time, we have extra requirements:

    • All the analysis requirements
    • lbzip2 2.5
    • gcc 7.3.0
    • GitHub account
    • Gmail account

    Environment

    First, set the following environment variables:

    export JUP_MACHINE="db"; # machine identifier
    export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
    export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
    export JUP_COMPRESSION="lbzip2"; # compression program
    export JUP_VERBOSE="5"; # verbosity level
    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
    export JUP_GITHUB_USERNAME="github_username"; # your github username
    export JUP_GITHUB_PASSWORD="github_password"; # your github password
    export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
    export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
    export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
    export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
    export JUP_OAUTH_FILE="~/oauth2_creds.json"; # oauth2 authentication file
    export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine; leave it blank
    export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine; leave it blank
    export JUP_WITH_EXECUTION="1"; # whether to execute the python notebooks
    export JUP_WITH_DEPENDENCY="0"; # whether to run notebooks with and without declared dependencies
    export JUP_EXECUTION_MODE="-1"; # run following the execution order
    export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
    export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
    export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
    export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to unmount base dir
    export JUP_NOTEBOOK_TIMEOUT="300"; # timeout for the extraction
    
    
    # Frequency of log reports
    export JUP_ASTROID_FREQUENCY="5";
    export JUP_IPYTHON_FREQUENCY="5";
    export JUP_NOTEBOOKS_FREQUENCY="5";
    export JUP_REQUIREMENT_FREQUENCY="5";
    export JUP_CRAWLER_FREQUENCY="1";
    export JUP_CLONE_FREQUENCY="1";
    export JUP_COMPRESS_FREQUENCY="5";
    
    export JUP_DB_IP="localhost"; # postgres database IP

    Then, configure the file ~/oauth2_creds.json according to the yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf

    Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the repositories; the second one should unmount it. You can leave the scripts blank, but that is not advisable: the reproducibility study runs arbitrary code on your machine, and you may lose your data.
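    These scripts are site-specific; a minimal sketch (the device and mount point below are assumptions, adapt them to your setup) could be:

    #!/bin/bash
    # mount_ghstudy.sh -- mounts the volume that stores the repositories (assumed device/path)
    sudo mount /dev/sdb1 /mnt/jupyter

    #!/bin/bash
    # umount_ghstudy.sh -- unmounts it again
    sudo umount /mnt/jupyter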

    Scripts

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Install 5 conda environments and 5 Anaconda environments, one pair for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI, so make sure to use the -e option):
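    The five "raw" environments all follow the same pattern, so they can also be created in a loop (a sketch; Python 3.4 and 3.5 still need the extra steps shown in their sections below):

    for v in 2.7 3.4 3.5 3.6 3.7; do
        # creates raw27, raw34, raw35, raw36, raw37 without the anaconda metapackage
        conda create -n "raw${v/./}" python="$v" -y
    done

    The explicit per-version commands follow.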

    Conda 2.7

    conda create -n raw27 python=2.7 -y
    conda activate raw27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 2.7

    conda create -n py27 python=2.7 anaconda -y
    conda activate py27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    

    Conda 3.4

    It requires a manual jupyter and pathlib2 installation due to incompatibilities found in the default installation.

    conda create -n raw34 python=3.4 -y
    conda activate raw34
    conda install jupyter -c conda-forge -y
    conda uninstall jupyter -y
    pip install --upgrade pip
    pip install jupyter
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    pip install pathlib2

    Anaconda 3.4

    conda create -n py34 python=3.4 anaconda -y
    conda activate py34
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.5

    conda create -n raw35 python=3.5 -y
    conda activate raw35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.5

    It requires the manual installation of other anaconda packages.

    conda create -n py35 python=3.5 anaconda -y
    conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
    conda activate py35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.6

    conda create -n raw36 python=3.6 -y
    conda activate raw36
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.6

    conda create -n py36 python=3.6 anaconda -y
    conda activate py36
    conda install -y anaconda-navigator jupyterlab_server navigator-updater
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.7

    conda create -n raw37 python=3.7 -y
    conda activate raw37
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.7

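    The source entry is truncated here; by analogy with the Anaconda environments above, the setup is presumably:

    conda create -n py37 python=3.7 anaconda -y
    conda activate py37
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology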
