44 datasets found
  1. PostgreSQL

    • neuinfo.org
    • scicrunch.org
    • +2 more
    Updated Jan 29, 2022
    Cite
    (2022). PostgreSQL [Dataset]. http://identifiers.org/RRID:SCR_021067
    Dataset updated
    Jan 29, 2022
    Description

    Open-source object-relational database system that uses and extends the SQL language, combined with many features that safely store and scale the most complicated data workloads. PostgreSQL runs on all major operating systems.

  2. Data from: Atlas of European Eel Distribution (Anguilla anguilla) in...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2024
    Cite
    Mateo, Maria; Drouineau, Hilaire; Pella, Herve; Beaulaton, Laurent; Amilhat, Elsa; Bardonnet, Agnès; Domingos, Isabel; Fernández-Delgado, Carlos; De Miguel Rubio, Ramon; Herrera, Mercedes; Korta, Maria; Zamora, Lluis; Díaz, Estibalitz; Briand, Cédric (2024). Atlas of European Eel Distribution (Anguilla anguilla) in Portugal, Spain and France [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6021837
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    FCUL/MARE
    INRAe
    University of Girona
    EPTB-Vilaine
    OFB
    AZTI
    University of Perpignan
    University of Córdoba
    Authors
    Mateo, Maria; Drouineau, Hilaire; Pella, Herve; Beaulaton, Laurent; Amilhat, Elsa; Bardonnet, Agnès; Domingos, Isabel; Fernández-Delgado, Carlos; De Miguel Rubio, Ramon; Herrera, Mercedes; Korta, Maria; Zamora, Lluis; Díaz, Estibalitz; Briand, Cédric
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    France, Portugal, Spain
    Description

    DESCRIPTION

    VERSIONS

    version 1.0.1 fixes a problem with functions

    version 1.0.2 adds table dbeel_rivers.rn_rivermouth with GEREM basin, distance to Gibraltar and link to CCM.

    version 1.0.3 fixes a problem with functions

    version 1.0.4 adds views rn_rna and rn_rne to the database

    The SUDOANG project aims at providing common tools to managers to support eel conservation in the SUDOE area (Spain, France and Portugal). VISUANG is the SUDOANG Interactive Web Application that hosts all these tools. The application consists of an eel distribution atlas (GT1), assessments of mortalities caused by turbines and an atlas showing obstacles to migration (GT2), estimates of recruitment and exploitation rate (GT3), and escapement (chosen as a target by the EC for the Eel Management Plans) (GT4). In addition, it includes an interactive map showing sampling results from the pilot basin network produced by GT6.

    The eel abundance for the eel atlas and escapement has been obtained using the Eel Density Analysis model (EDA, GT4's product). EDA extrapolates the abundance of eel in sampled river segments to other segments taking into account how the abundance, sex and size of the eels change depending on different parameters. Thus, EDA requires two main data sources: those related to the river characteristics and those related to eel abundance and characteristics.

    However, in both cases, data availability was uneven in the SUDOE area. In addition, this information was dispersed among several managers and in different formats due to different sampling sources: Water Framework Directive (WFD), Community Framework for the Collection, Management and Use of Data in the Fisheries Sector (EUMAP), Eel Management Plans, research groups, scientific papers and technical reports. Therefore, the first step towards having eel abundance estimations covering the whole SUDOE area was to build a joint river and eel database. In this report we describe the database corresponding to the rivers' characteristics in the SUDOE area and the eel abundances and their characteristics.

    In the case of rivers, two types of information have been collected:

    River topology (RN table): a compilation of data on rivers and their topological and hydrographic characteristics in the three countries.

    River attributes (RNA table): contains physical attributes that have fed the SUDOANG models.

    The estimation of eel abundance and characteristics (size, biomass, sex-ratio and silver) distribution at different scales (river segment, basin, Eel Management Unit (EMU), and country) in the SUDOE area, obtained with the EDA2.3 model, has been compiled in the RNE table (eel predictions).

    CURRENT ACTIVE PROJECT

    The project is currently active here: gitlab forgemia

    TECHNICAL DESCRIPTION TO BUILD THE POSTGRES DATABASE

    1. Build the database in PostgreSQL.

    All tables are in EPSG:3035 (European LAEA). The format is a PostgreSQL database. You can download other formats (shapefiles, csv) here: SUDOANG gt1 database.

    Initial commands

    Open a shell (on Windows, with the command CMD) and move to the folder where you downloaded the files:

    cd c:/path/to/my/folder

    Note: psql must be accessible. On Windows you can add the PostgreSQL bin folder to the path; otherwise you need to prefix the commands with the full path to the bin folder (see link to instructions below).

    createdb -U postgres eda2.3
    psql -U postgres eda2.3

    This opens a prompt (#) where you can launch the commands in the next box.

    Within the psql shell

    create extension "postgis";
    create extension "dblink";
    create extension "ltree";
    create extension "tablefunc";
    create schema dbeel_rivers;
    create schema france;
    create schema spain;
    create schema portugal;
    -- type \q to quit the psql shell

    Now the database is ready to receive the different dumps. The dump files are large; you might not need the parts including unit basins or waterbodies. All the tables except waterbodies and unit basins are described in the Atlas. You might need to understand what inheritance is in a database: https://www.postgresql.org/docs/12/tutorial-inheritance.html
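
    As a quick illustration of table inheritance (not from the original instructions): a query on the parent table dbeel_rivers.rn also returns the rows stored in france.rn, spain.rn and portugal.rn, while SELECT ... FROM ONLY restricts to the parent. A minimal Python sketch, assuming the eda2.3 database restored as above, a local server, and the psycopg2 driver installed:

        # Hedged sketch: compare row counts with and without inheritance.
        # Connection settings are assumptions; adjust user/host to your setup.
        import psycopg2

        conn = psycopg2.connect(dbname="eda2.3", user="postgres", host="localhost")
        with conn, conn.cursor() as cur:
            # Includes rows of the inheriting country tables.
            cur.execute("SELECT count(*) FROM dbeel_rivers.rn;")
            print("all countries:", cur.fetchone()[0])
            # ONLY excludes the child tables.
            cur.execute("SELECT count(*) FROM ONLY dbeel_rivers.rn;")
            print("parent only:", cur.fetchone()[0])
        conn.close()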

    2. RN (river segments)

    These layers contain the topology (see Atlas for detail)

    dbeel_rivers.rn

    france.rn

    spain.rn

    portugal.rn

    Columns (see Atlas): gid, idsegment, source, target, lengthm, nextdownidsegment, path, isfrontier, issource, seaidsegment, issea, geom, isendoreic, isinternational, country

    dbeel_rivers.rn_rivermouth

    Columns: seaidsegment, geom (polygon), gerem_zone_3, gerem_zone_4 (used in EDA), gerem_zone_5, ccm_wso_id, country, emu_name_short, geom_outlet (point), name_basin, dist_from_gibraltar_km, name_coast, basin_name

    dbeel_rivers.rn is mandatory: it is the table at the international level from which the other tables inherit, so download it first even if you don't want to use other countries (in many cases you should ... there are transboundary catchments). The rn network must be restored first; tables rne and rna refer to it by foreign keys.

    pg_restore -U postgres -d eda2.3 "dbeel_rivers.rn.backup"

    france

    pg_restore -U postgres -d eda2.3 "france.rn.backup"

    spain

    pg_restore -U postgres -d eda2.3 "spain.rn.backup"

    portugal

    pg_restore -U postgres -d eda2.3 "portugal.rn.backup"

    Rivermouths and basins: this file contains GEREM basins, distance to Gibraltar, and the link to the CCM id for each basin flowing to the sea.

    pg_restore -U postgres -d eda2.3 "dbeel_rivers.rn_rivermouth.backup"

    With the schema you will probably want to use the functions, but launch this only after restoring rna in the next step:

    psql -U postgres -d eda2.3 -f "function_dbeel_rivers.sql"

    3. RNA (Attributes)

    This corresponds to tables

    dbeel_rivers.rna

    france.rna

    spain.rna

    portugal.rna

    Columns (see Atlas): idsegment, altitudem, distanceseam, distancesourcem, cumnbdam, medianflowm3ps, surfaceunitbvm2, surfacebvm2, strahler, shreeve, codesea, name, pfafriver, pfafsegment, basin, riverwidthm, temperature, temperaturejan, temperaturejul, wettedsurfacem2, wettedsurfaceotherm2, lengthriverm, emu, cumheightdam, riverwidthmsource, slope, dis_m3_pyr_riveratlas, dis_m3_pmn_riveratlas, dis_m3_pmx_riveratlas, drought, drought_type_calc

    Code:

    pg_restore -U postgres -d eda2.3 "dbeel_rivers.rna.backup"
    pg_restore -U postgres -d eda2.3 "france.rna.backup"
    pg_restore -U postgres -d eda2.3 "spain.rna.backup"
    pg_restore -U postgres -d eda2.3 "portugal.rna.backup"

    4. RNE (eel predictions)

    These layers contain eel data (see Atlas for detail)

    dbeel_rivers.rne

    france.rne

    spain.rne

    portugal.rne

    Columns (see Atlas): idsegment, surfaceunitbvm2, surfacebvm2, delta, gamma, density, neel, beel, peel150, peel150300, peel300450, peel450600, peel600750, peel750, nsilver, bsilver, psilver150300, psilver300450, psilver450600, psilver600750, psilver750, psilver, pmale150300, pmale300450, pmale450600, pfemale300450, pfemale450600, pfemale600750, pfemale750, pmale, pfemale, sex_ratio, cnfemale300450, cnfemale450600, cnfemale600750, cnfemale750, cnmale150300, cnmale300450, cnmale450600, cnsilver150300, cnsilver300450, cnsilver450600, cnsilver600750, cnsilver750, cnsilver, delta_tr, gamma_tr, type_fit_delta_tr, type_fit_gamma_tr, density_tr, density_pmax_tr, neel_pmax_tr, nsilver_pmax_tr, density_wd, neel_wd, beel_wd, nsilver_wd, bsilver_wd, sector_tr, year_tr, is_current_distribution_area, is_pristine_distribution_area_1985

    Code for restoration:

    pg_restore -U postgres -d eda2.3 "dbeel_rivers.rne.backup"
    pg_restore -U postgres -d eda2.3 "france.rne.backup"
    pg_restore -U postgres -d eda2.3 "spain.rne.backup"
    pg_restore -U postgres -d eda2.3 "portugal.rne.backup"

    5. Unit basins

    Unit basins are not described in the Atlas. They correspond to the following tables:

    dbeel_rivers.basinunit_bu

    france.basinunit_bu

    spain.basinunit_bu

    portugal.basinunit_bu

    france.basinunitout_buo

    spain.basinunitout_buo

    portugal.basinunitout_buo

    A unit basin is the simple basin that surrounds a segment. It corresponds to the topographic unit from which unit segments have been calculated (EPSG:3035). Tables bu_unitbv and bu_unitbvout inherit from dbeel_rivers.unit_bv. The first table intersects with a segment; the second does not, and corresponds to basin polygons which do not have a river segment.

    Source :

    Portugal

    https://sniambgeoviewer.apambiente.pt/Geodocs/gml/inspire/HY_PhysicalWaters_DrainageBasinGeoCod.zip

    France

    In France, the unit bv corresponds to the RHT (Pella et al., 2012)

    Spain

    http://www.mapama.gob.es/ide/metadatos/index.html?srv=metadata.show&uuid=898f0ff8-f06c-4c14-88f7-43ea90e48233

    pg_restore -U postgres -d eda2.3 'dbeel_rivers.basinunit_bu.backup'

    france

    pg_restore -U postgres -d eda2.3

  3. Classicmodels

    • kaggle.com
    zip
    Updated Dec 15, 2024
    Cite
    Javier Landaeta (2024). Classicmodels [Dataset]. https://www.kaggle.com/datasets/javierlandaeta/classicmodels
    Explore at: zip (65751 bytes)
    Dataset updated
    Dec 15, 2024
    Authors
    Javier Landaeta
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Abstract

    This project presents a comprehensive analysis of a company's annual sales, using the classic classicmodels dataset as the database. Python is used as the main programming language, along with the Pandas, NumPy and SQLAlchemy libraries for data manipulation and analysis, and PostgreSQL as the database management system.

    The main objective of the project is to answer key questions related to the company's sales performance, such as: Which were the most profitable products and customers? Were sales goals met? The results obtained serve as input for strategic decision making in future sales campaigns.

    Methodology

    1. Data Extraction:

    • A connection is established with the PostgreSQL database to extract the relevant data from the orders, orderdetails, customers, products and employees tables.
    • A reusable function is created to read each table and load it into a Pandas DataFrame (see the sketch below).
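
    A minimal sketch of such a reusable read function (the connection string is a placeholder, not from the original project; requires pandas, SQLAlchemy and a PostgreSQL driver):

        # Hedged sketch of the extraction step: one reusable function for all tables.
        import pandas as pd
        from sqlalchemy import create_engine

        engine = create_engine("postgresql://user:password@localhost:5432/classicmodels")

        def read_table(name: str) -> pd.DataFrame:
            # Read one table of the classicmodels schema into a DataFrame.
            return pd.read_sql_table(name, engine)

        tables = ["orders", "orderdetails", "customers", "products", "employees"]
        frames = {name: read_table(name) for name in tables}
        print({name: df.shape for name, df in frames.items()})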

    2. Data Cleansing and Transformation:

    • An exploratory analysis of the data is performed to identify missing values, inconsistencies, and outliers.
    • New variables are calculated, such as the total value of each sale, cost, and profit.
    • Different DataFrames are joined using primary and foreign keys to obtain a complete view of sales (see the sketch after step 3).

    3. Exploratory Data Analysis (EDA):

    • Key metrics such as total sales, number of unique customers, and average order value are calculated.
    • Data is grouped by different dimensions (products, customers, dates) to identify patterns and trends.
    • Results are visualized using relevant graphics (histograms, bar charts, etc.).
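
    A hedged sketch of steps 2 and 3, reusing the frames dict from the extraction sketch; the column names (productCode, quantityOrdered, priceEach, buyPrice, productName) follow the standard classicmodels schema and should be verified against your copy of the database:

        # Join order details with products, derive total, cost and profit (step 2),
        # then compute a key metric and a per-product grouping (step 3).
        details = frames["orderdetails"].merge(frames["products"], on="productCode")
        details["total"] = details["quantityOrdered"] * details["priceEach"]
        details["cost"] = details["quantityOrdered"] * details["buyPrice"]
        details["profit"] = details["total"] - details["cost"]

        print("total sales:", details["total"].sum())
        top_products = (details.groupby("productName")["total"]
                        .sum().sort_values(ascending=False).head(10))
        print(top_products)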

    4. Modeling and Prediction:

    • Although the main focus of the project is descriptive, predictive modeling techniques (e.g., time series) could be explored to forecast future sales.

    5. Report Generation:

    • Detailed reports are created in Pandas DataFrames format that answer specific business questions.
    • These reports are stored in new PostgreSQL tables for further analysis and visualization (see the sketch below).
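
    A minimal sketch of that last step, continuing the example above (the table name report_top_products is illustrative, not from the project):

        # Persist a report DataFrame back into PostgreSQL for later visualization.
        top_products.rename("revenue").to_frame().to_sql(
            "report_top_products", engine, if_exists="replace")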

    Results

    • Identification of top products and customers: The best-selling products and the customers that generate the most revenue are identified.
    • Analysis of sales trends: Sales trends over time are analyzed and possible factors that influence sales behavior are identified.
    • Calculation of key metrics: Metrics such as average profit margin and sales growth rate are calculated.

    Conclusions

    This project demonstrates how Python and PostgreSQL can be effectively used to analyze large data sets and obtain valuable insights for business decision making. The results obtained can serve as a starting point for future research and development in the area of sales analysis.

    Technologies Used

    • Python: Pandas, NumPy, SQLAlchemy, Matplotlib/Seaborn
    • Database: PostgreSQL
    • Tools: Jupyter Notebook
    • Keywords: data analysis, Python, PostgreSQL, Pandas, NumPy, SQLAlchemy, EDA, sales, business intelligence

  4. Technographic Data | 22M Records | Refreshed 2x/Mo | Delivery Hourly via...

    • datarade.ai
    .json, .csv, .sql
    Updated Jan 1, 2023
    Cite
    Forager.ai (2023). Technographic Data | 22M Records | Refreshed 2x/Mo | Delivery Hourly via CSV/JSON/PostgreSQL DB Delivery | B2B Data [Dataset]. https://datarade.ai/data-products/technographic-data-22m-records-refreshed-2x-mo-delivery-forager-ai
    Explore at: .json, .csv, .sql
    Dataset updated
    Jan 1, 2023
    Dataset provided by
    Forager.ai
    Area covered
    State of, French Southern Territories, Lithuania, Canada, Togo, Liechtenstein, Guernsey, South Georgia and the South Sandwich Islands, Botswana, Netherlands
    Description

    The Forager.ai Global Install Base Dataset is a leading source of firmographic data, backed by advanced AI and offering the highest refresh rate in the industry.

    | Volume and Stats |

    • Over 22M total records, the highest volume in the industry today.
    • Every company record refreshed twice a month, offering an unparalleled update frequency.
    • Delivery is made every hour, ensuring you have the latest data at your fingertips.
    • Each record is the result of an advanced AI-driven process, ensuring high-quality, accurate data.

    | Use Cases |

    Sales Platforms, ABM and Intent Data Platforms, Identity Platforms, Data Vendors:

    Example applications include:

    1. Uncover trending technologies or tools gaining popularity.

    2. Pinpoint lucrative business prospects by identifying similar solutions utilized by a specific company.

    3. Study a company's tech stacks to understand the technical capability and skills available within that company.

    B2B Tech Companies:

    • Enrich leads that sign up through the Company Search API (available separately).
    • Identify and map every company that fits your core personas and ICP.
    • Build audiences to target, using key fields like location, company size, industry, and description.

    Venture Capital and Private Equity:

    • Discover new investment opportunities using company descriptions and industry-level data.
    • Review the growth of private companies and benchmark their strength against competitors.
    • Create high-level views of companies competing in popular verticals for investment.

    | Delivery Options |

    • Flat files via S3 or GCP
    • PostgreSQL Shared Database
    • PostgreSQL Managed Database
    • API
    • Other options available upon request, depending on the scale required

    Our dataset provides a unique blend of volume, freshness, and detail that is perfect for Sales Platforms, B2B Tech, VCs & PE firms, Marketing Automation, ABM & Intent. It stands as a cornerstone in our broader data offering, ensuring you have the information you need to drive decision-making and growth.

    Tags: Company Data, Company Profiles, Employee Data, Firmographic Data, AI-Driven Data, High Refresh Rate, Company Classification, Private Market Intelligence, Workforce Intelligence, Public Companies.

  5. PostgreSQL Price Prediction Data

    • coinbase.com
    Updated Dec 3, 2025
    Cite
    (2025). PostgreSQL Price Prediction Data [Dataset]. https://www.coinbase.com/price-prediction/base-postgresql
    Dataset updated
    Dec 3, 2025
    Variables measured
    Growth Rate, Predicted Price
    Measurement technique
    User-defined projections based on compound growth. This is not a formal financial forecast.
    Description

    This dataset contains the predicted prices of the asset PostgreSQL over the next 16 years. The prediction is initially calculated using a default 5 percent annual growth rate; after page load, a sliding-scale component lets the user adjust the growth rate to their own positive or negative projections, from a minimum of -100 percent to a maximum of +100 percent.
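
    The compound-growth model described above reduces to one line of arithmetic; a minimal sketch, where the 40.0 starting price is a made-up example value:

        # Hedged sketch of the compound-growth projection described above.
        def predicted_price(current_price: float, annual_growth: float, years: int) -> float:
            # annual_growth is a fraction, e.g. 0.05 for the default 5 percent.
            return current_price * (1 + annual_growth) ** years

        # A hypothetical starting price projected 16 years out at 5 percent growth.
        print(round(predicted_price(40.0, 0.05, 16), 2))  # 40 * 1.05**16 ≈ 87.31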

  6. Papyrus dataset postgres dump

    • zenodo.org
    • data.niaid.nih.gov
    tar
    Updated Jul 21, 2022
    Cite
    Rachael Skyner; Ben Tehan; Rachael Skyner; Ben Tehan (2022). Papyrus dataset postgres dump [Dataset]. http://doi.org/10.5281/zenodo.6866697
    Explore at: tar
    Dataset updated
    Jul 21, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rachael Skyner; Ben Tehan; Rachael Skyner; Ben Tehan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset that the database dump was created from is described here: 10.33774/chemrxiv-2021-1rxhk

    A dump of the postgres database created from the code in the 'postgres' directory (Papyrus-scripts/src/papyrus_scripts/postgres/) of Rachael Skyner's fork (https://github.com/reskyner/Papyrus-scripts) of Olivier Bequignon's Papyrus-scripts GitHub repository (https://github.com/OlivierBeq/Papyrus-scripts). The database was created by:

    1. Download the Papyrus csv files from Olivier's code using the download functionality

    2. Spin up a 'papyrus' container using the docker-compose.yml file in Rachael's fork (running on a machine with access to the postgres instance you want to add the database to)

    3. Start a shell in the papyrus container with docker exec -it papyrus /bin/bash

    4. Start a jupyter notebook server with jupyter notebook --ip 0.0.0.0 --allow-root --no-browser

    5. Run the two notebooks (1-insert_molecule_data.ipynb and 2-insert_activities.ipynb) in order

    6. Create a dump of the database

  7. Magnetique: input data and PostgreSQL database

    • data.niaid.nih.gov
    Updated Oct 5, 2022
    Cite
    Thiago Britto-Borges; Annekathrin Ludt; Etienne Boileau; Enio Gjerga; Federico Marini; Christoph Dieterich (2022). Magnetique: input data and PostgreSQL database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6854307
    Dataset updated
    Oct 5, 2022
    Dataset provided by
    University Hospital Heidelberg
    University Medical Center of the Johannes Gutenberg University Mainz
    Authors
    Thiago Britto-Borges; Annekathrin Ludt; Etienne Boileau; Enio Gjerga; Federico Marini; Christoph Dieterich
    Description

    Magnetique: An interactive web application to explore transcriptome signatures of heart failure

    Supplementary dataset.

    Refer to https://shiny.dieterichlab.org/app/magnetique or contact the authors for details.

  8. PostgreSQL Dump of IMDB Data for JOB Workload

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Sep 24, 2019
    Cite
    Ryan Marcus (2019). PostgreSQL Dump of IMDB Data for JOB Workload [Dataset]. http://doi.org/10.7910/DVN/2QYZBT
    Explore at: Croissant
    Croissant is a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Sep 24, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Ryan Marcus
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a dump generated by pg_dump -Fc of the IMDb data used in the "How Good are Query Optimizers, Really?" paper. PostgreSQL compatible SQL queries and scripts to automatically create a VM with this dataset can be found here: https://git.io/imdb
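
    Since -Fc produces a custom-format archive, it is restored with pg_restore rather than psql; once restored, the data can be queried like any PostgreSQL database. A minimal sketch, where the database name imdb and the table title are assumptions based on the JOB schema:

        # Hedged sketch: query the restored IMDb database with pandas.
        # Database name and table are assumptions; adjust to your restore.
        import pandas as pd
        from sqlalchemy import create_engine

        engine = create_engine("postgresql://postgres@localhost:5432/imdb")
        print(pd.read_sql_query("SELECT count(*) FROM title;", engine))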

  9. Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...

    • zenodo.org
    bz2
    Updated Mar 15, 2021
    Cite
    João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2592524
    Explore at: bz2
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and makes results hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

    Paper: https://2019.msrconf.org/event/msr-2019-papers-a-large-scale-study-about-quality-and-reproducibility-of-jupyter-notebooks

    This repository contains two files:

    • dump.tar.bz2
    • jupyter_reproducibility.tar.bz2

    The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

    The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

    • analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.
    • archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.
    • paper: empty. The notebook analyses/N12.To.Paper.ipynb moves data to it

    In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.

    Reproducing the Analysis

    This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:

    Ubuntu 18.04.1 LTS
    PostgreSQL 10.6
    Conda 4.5.11
    Python 3.7.2
    PdfCrop 2012/11/02 v1.38

    First, download dump.tar.bz2 and extract it:

    tar -xjf dump.tar.bz2

    It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

    psql jupyter < db2019-03-13.dump

    It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
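
    Before running the notebooks, it can help to sanity-check the connection string with the same library the notebooks use; a minimal sketch (not part of the original instructions):

        # Hedged sketch: verify that JUP_DB_CONNECTION points at a reachable database.
        import os
        from sqlalchemy import create_engine, text

        engine = create_engine(os.environ["JUP_DB_CONNECTION"])
        with engine.connect() as conn:
            print(conn.execute(text("SELECT 1")).scalar())  # any round-trip will do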

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create a conda environment with Python 3.7:

    conda create -n analyses python=3.7
    conda activate analyses

    Go to the analyses folder and install all the dependencies of the requirements.txt

    cd jupyter_reproducibility/analyses
    pip install -r requirements.txt

    For reproducing the analyses, run jupyter on this folder:

    jupyter notebook

    Execute the notebooks in this order:

    • Index.ipynb
    • N0.Repository.ipynb
    • N1.Skip.Notebook.ipynb
    • N2.Notebook.ipynb
    • N3.Cell.ipynb
    • N4.Features.ipynb
    • N5.Modules.ipynb
    • N6.AST.ipynb
    • N7.Name.ipynb
    • N8.Execution.ipynb
    • N9.Cell.Execution.Order.ipynb
    • N10.Markdown.ipynb
    • N11.Repository.With.Notebook.Restriction.ipynb
    • N12.To.Paper.ipynb

    Reproducing or Expanding the Collection

    The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

    Requirements

    This time, we have extra requirements:

    All the analysis requirements
    lbzip2 2.5
    gcc 7.3.0
    Github account
    Gmail account

    Environment

    First, set the following environment variables:

    export JUP_MACHINE="db"; # machine identifier
    export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
    export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
    export JUP_COMPRESSION="lbzip2"; # compression program
    export JUP_VERBOSE="5"; # verbose level
    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlchemy connection
    export JUP_GITHUB_USERNAME="github_username"; # your github username
    export JUP_GITHUB_PASSWORD="github_password"; # your github password
    export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
    export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
    export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
    export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
    export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
    export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
    export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
    export JUP_WITH_EXECUTION="1"; # run execute python notebooks
    export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
    export JUP_EXECUTION_MODE="-1"; # run following the execution order
    export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
    export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
    export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
    export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
    export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
    
    
    # Frequency of log reports
    export JUP_ASTROID_FREQUENCY="5";
    export JUP_IPYTHON_FREQUENCY="5";
    export JUP_NOTEBOOKS_FREQUENCY="5";
    export JUP_REQUIREMENT_FREQUENCY="5";
    export JUP_CRAWLER_FREQUENCY="1";
    export JUP_CLONE_FREQUENCY="1";
    export JUP_COMPRESS_FREQUENCY="5";
    
    export JUP_DB_IP="localhost"; # postgres database IP

    Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf
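
    As a rough sketch of how the credentials end up being used (the subject and contents are illustrative; yagmail reads the oauth2 file configured above):

        # Hedged sketch: send a notification with the study's email settings.
        import yagmail

        yag = yagmail.SMTP("gmail@gmail.com", oauth2_file="~/oauth2_creds.json")
        yag.send(to="target@email.com", subject="ghstudy: job finished", contents="done")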

    Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories; the second one should unmount it. You can leave the scripts blank, but it is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.

    Scripts

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Install 5 Conda environments and 5 Anaconda environments, one pair for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):

    Conda 2.7

    conda create -n raw27 python=2.7 -y
    conda activate raw27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 2.7

    conda create -n py27 python=2.7 anaconda -y
    conda activate py27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    

    Conda 3.4

    It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.

    conda create -n raw34 python=3.4 -y
    conda activate raw34
    conda install jupyter -c conda-forge -y
    conda uninstall jupyter -y
    pip install --upgrade pip
    pip install jupyter
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    pip install pathlib2

    Anaconda 3.4

    conda create -n py34 python=3.4 anaconda -y
    conda activate py34
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.5

    conda create -n raw35 python=3.5 -y
    conda activate raw35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.5

    It requires the manual installation of other anaconda packages.

    conda create -n py35 python=3.5 anaconda -y
    conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
    conda activate py35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.6

    conda create -n raw36 python=3.6 -y
    conda activate raw36
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.6

    conda create -n py36 python=3.6 anaconda -y
    conda activate py36
    conda install -y anaconda-navigator jupyterlab_server navigator-updater
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.7

    conda create -n raw37 python=3.7 -y
    conda activate raw37
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

  10. Music Store Data Analysis Project using SQL

    • kaggle.com
    zip
    Updated Jun 30, 2023
    Cite
    Aetik (2023). Music Store Data Analysis Project using SQL [Dataset]. https://www.kaggle.com/datasets/adimadapalageetika/music-store-data-analysis-project-using-sql/discussion
    Explore at: zip (1748 bytes)
    Dataset updated
    Jun 30, 2023
    Authors
    Aetik
    Description

    I completed a PostgreSQL project to hone my SQL abilities. Following a tutorial video, I worked on a music store data analysis. In the project, I used SQL to answer several questions about the music store company.

  11. RUST-POSTGRES github.com/sfackler/RUST-POSTGRES Price Prediction Data

    • coinbase.com
    Updated Dec 1, 2025
    Cite
    (2025). RUST-POSTGRES github.com/sfackler/RUST-POSTGRES Price Prediction Data [Dataset]. https://www.coinbase.com/price-prediction/base-rust-postgres-githubcomsfacklerrust-postgres-5515
    Dataset updated
    Dec 1, 2025
    Variables measured
    Growth Rate, Predicted Price
    Measurement technique
    User-defined projections based on compound growth. This is not a formal financial forecast.
    Description

    This dataset contains the predicted prices of the asset RUST-POSTGRES github.com/sfackler/RUST-POSTGRES over the next 16 years. The prediction is initially calculated using a default 5 percent annual growth rate; after page load, a sliding-scale component lets the user adjust the growth rate to their own positive or negative projections, from a minimum of -100 percent to a maximum of +100 percent.

  12. SQL Databases for Students and Educators

    • zenodo.org
    • data-staging.niaid.nih.gov
    • +1 more
    bin, html
    Updated Oct 28, 2020
    Cite
    Mauricio Vargas Sepúlveda; Mauricio Vargas Sepúlveda (2020). SQL Databases for Students and Educators [Dataset]. http://doi.org/10.5281/zenodo.4136985
    Explore at: bin, html
    Dataset updated
    Oct 28, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mauricio Vargas Sepúlveda; Mauricio Vargas Sepúlveda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Publicly accessible databases often impose query limits or require registration. Even when I maintain public and limit-free APIs, I never wanted to host a public database because I tend to think that the connection strings are a problem for the user.

    I've decided to host different light/medium-size databases using PostgreSQL, MySQL and SQL Server backends (in strict descending order of preference!).

    Why 3 database backends? I think there are a ton of small edge cases when moving between DB backends, so testing a lot with live databases is quite valuable. With this resource you can benchmark speed, compression, and DDL types.

    Please send me a tweet if you need the connection strings for your lectures or workshops. My Twitter username is @pachamaltese. See the SQL dumps on each section to have the data locally.

  13. 🏪🏬 Pagila (PostgreSQL Sample Database)

    • kaggle.com
    zip
    Updated Aug 17, 2025
    Cite
    Alexander Kapturov (2025). 🏪🏬 Pagila (PostgreSQL Sample Database) [Dataset]. https://www.kaggle.com/datasets/kapturovalexander/pagila-postgresql-sample-database/discussion
    Explore at: zip (1926924 bytes)
    Dataset updated
    Aug 17, 2025
    Authors
    Alexander Kapturov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    DVD rental database to demonstrate the features of PostgreSQL.

    There are 15 tables in the DVD Rental database:

    • actor – stores actors data including first name and last name.
    • film – stores film data such as title, release year, length, rating, etc.
    • film_actor – stores the relationships between films and actors.
    • category – stores film’s categories data.
    • film_category – stores the relationships between films and categories.
    • store – contains the store data including manager staff and address.
    • inventory – stores inventory data.
    • rental – stores rental data.
    • payment – stores customer’s payments.
    • staff – stores staff data.
    • customer – stores customer data.
    • address – stores address data for staff and customers
    • city – stores city names.
    • country – stores country names.
    • language – stores the languages of films.


    Launch the pagila-schema.sql script in PgAdmin 4 and then launch pagila-insert-data.sql.

    Don't forget to switch on auto-commit mode.
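
    Once loaded, the database can be queried directly; a minimal sketch in Python (connection details are placeholders, and the column names follow the standard Pagila schema):

        # Hedged sketch: top five customers by total payments.
        import pandas as pd
        from sqlalchemy import create_engine

        engine = create_engine("postgresql://postgres@localhost:5432/pagila")
        query = """
            SELECT c.first_name, c.last_name, sum(p.amount) AS total_paid
            FROM payment p
            JOIN customer c USING (customer_id)
            GROUP BY c.customer_id, c.first_name, c.last_name
            ORDER BY total_paid DESC
            LIMIT 5;
        """
        print(pd.read_sql_query(query, engine))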

  14. Most popular database management systems worldwide 2024

    • statista.com
    Updated Jun 15, 2024
    Cite
    Statista (2024). Most popular database management systems worldwide 2024 [Dataset]. https://www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems/
    Dataset updated
    Jun 15, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 2024
    Area covered
    Worldwide
    Description

    As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive. Database Management Systems As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMS are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.

  15. PostgreSQL

    • rrid.site
    Updated Nov 30, 2025
    Cite
    (2025). PostgreSQL [Dataset]. http://identifiers.org/RRID:SCR_021067
    Dataset updated
    Nov 30, 2025
    Description

    Open-source object-relational database system that uses and extends the SQL language, combined with many features that safely store and scale the most complicated data workloads. PostgreSQL runs on all major operating systems.

  16. Most commonly used database technologies among developers worldwide 2023

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Most commonly used database technologies among developers worldwide 2023 [Dataset]. https://www.statista.com/statistics/794187/united-states-developer-survey-most-wanted-used-database-technologies/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 8, 2023 - May 19, 2023
    Area covered
    Worldwide
    Description

    In 2023, over ** percent of surveyed software developers worldwide reported using PostgreSQL, the highest share of any database technology. Other popular database tools among developers included MySQL and SQLite.

  17. Managed PostgreSQL Services Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Dataintelo (2025). Managed PostgreSQL Services Market Research Report 2033 [Dataset]. https://dataintelo.com/report/managed-postgresql-services-market
    Explore at: pptx, pdf, csv
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Managed PostgreSQL Services Market Outlook



    According to our latest research, the global Managed PostgreSQL Services market size reached USD 1.68 billion in 2024, with a robust year-on-year growth trajectory. The market is projected to expand at a CAGR of 17.2% during the forecast period, reaching approximately USD 7.09 billion by 2033. This remarkable growth is primarily driven by increasing enterprise adoption of open-source databases, the rising demand for scalable and secure data management solutions, and heightened digital transformation initiatives across industries. As organizations seek to streamline operations and enhance data-driven decision-making, managed PostgreSQL services are emerging as a critical enabler for modern IT infrastructures.




    One of the most significant growth factors for the Managed PostgreSQL Services market is the rapid acceleration of cloud adoption across enterprises of all sizes. Businesses are increasingly migrating their workloads to the cloud to capitalize on the agility, scalability, and cost efficiencies it offers. PostgreSQL, as a leading open-source relational database, has gained immense popularity due to its flexibility, extensibility, and strong community support. Managed service providers are leveraging these attributes by offering fully managed PostgreSQL solutions that reduce the operational burden on IT teams, ensure high availability, and provide automated backup and disaster recovery capabilities. This shift to managed services not only reduces infrastructure management complexities but also enables organizations to focus on core business innovation, further fueling market growth.




    Another pivotal factor propelling the managed PostgreSQL services market is the heightened emphasis on data security, compliance, and regulatory requirements. As data privacy regulations such as GDPR, CCPA, and others become increasingly stringent, organizations are compelled to adopt secure and compliant database management solutions. Managed PostgreSQL service providers are investing heavily in advanced security features such as encryption at rest and in transit, automated patch management, and comprehensive access controls to address these concerns. Additionally, the integration of monitoring and performance optimization tools ensures that enterprises can proactively manage database health, minimize downtime, and meet service level agreements (SLAs). This growing need for robust security and compliance frameworks is expected to sustain strong demand for managed PostgreSQL services in the coming years.




    The ongoing digital transformation across various industry verticals, including BFSI, healthcare, IT & telecommunications, and retail, is also a major contributor to the market's expansion. These sectors are experiencing exponential data growth, necessitating highly reliable, scalable, and cost-effective database management solutions. Managed PostgreSQL services offer seamless integration with modern application architectures, including microservices and containerized environments, supporting the rapid deployment and scaling of mission-critical applications. Furthermore, the proliferation of artificial intelligence, machine learning, and analytics-driven workloads is creating new opportunities for managed PostgreSQL services, as organizations seek to leverage real-time insights from their data assets.




    From a regional perspective, North America continues to dominate the managed PostgreSQL services market, driven by the presence of major cloud service providers, advanced IT infrastructure, and a high concentration of technology-driven enterprises. Europe is following closely, propelled by strict data protection regulations and increasing investments in digital transformation initiatives. Meanwhile, the Asia Pacific region is witnessing the fastest growth, fueled by the rapid expansion of digital economies, increasing cloud adoption among SMEs, and government-led initiatives to modernize IT infrastructure. Latin America and the Middle East & Africa are also demonstrating steady growth, albeit at a comparatively moderate pace, as enterprises in these regions gradually embrace managed database solutions.



    Service Type Analysis



    The service type segment of the managed PostgreSQL services market encompasses a range of offerings, including database administration, backup & recovery, security & compliance, monitoring & performance optimization, and other specialized ser

  18. Managed PostgreSQL Services Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Sep 1, 2025
    Cite
    Growth Market Reports (2025). Managed PostgreSQL Services Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/managed-postgresql-services-market
    Explore at: pptx, csv, pdf
    Dataset updated
    Sep 1, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Managed PostgreSQL Services Market Outlook



    As per our latest research, the global managed PostgreSQL services market size reached USD 2.1 billion in 2024, reflecting robust adoption across industries seeking scalable, reliable, and secure database management solutions. With a strong compound annual growth rate (CAGR) of 16.8% projected from 2025 to 2033, the market is expected to climb to USD 8.1 billion by 2033. This impressive growth trajectory is underpinned by increasing digital transformation initiatives, the proliferation of cloud-native applications, and the critical need for streamlined database administration and security in a data-driven world.




    The managed PostgreSQL services market is witnessing significant momentum due to the surging demand for cost-effective, scalable, and high-performance database solutions among enterprises of all sizes. Organizations are increasingly migrating to managed database services to alleviate the burdens of in-house database maintenance, reduce operational costs, and ensure business continuity. The integration of advanced automation, artificial intelligence, and machine learning capabilities into managed PostgreSQL offerings further enhances performance optimization, predictive maintenance, and intelligent monitoring, making these services indispensable for modern enterprises. Additionally, the rise of DevOps and agile development methodologies has necessitated robust, flexible, and managed database solutions that can seamlessly integrate with CI/CD pipelines, further fueling market expansion.




    Another key driver propelling the managed PostgreSQL services market is the heightened focus on data security, compliance, and regulatory requirements, particularly in highly regulated industries such as BFSI, healthcare, and government. Managed service providers offer comprehensive security frameworks, including encryption, access controls, and automated backup and disaster recovery, which are vital for organizations handling sensitive and mission-critical data. The growing complexity of cyber threats and the need for adherence to standards like GDPR, HIPAA, and PCI DSS have made managed PostgreSQL services an attractive proposition for enterprises seeking to mitigate risks and ensure regulatory compliance without incurring the costs and complexities of managing these aspects internally.




    Furthermore, the increasing adoption of cloud computing and hybrid IT environments has accelerated the shift towards managed PostgreSQL services. As organizations embrace multi-cloud and hybrid strategies to enhance agility, scalability, and resilience, managed PostgreSQL solutions offer seamless integration, centralized management, and cross-platform compatibility. The ability to deploy, monitor, and optimize PostgreSQL databases across diverse environments, including public, private, and hybrid clouds, positions managed services as a strategic enabler of digital transformation. These services also support business continuity and disaster recovery strategies, ensuring minimal downtime and rapid recovery in the event of system failures or cyber incidents.




    From a regional perspective, North America continues to dominate the managed PostgreSQL services market, driven by the presence of leading technology companies, early adoption of cloud technologies, and a mature digital ecosystem. Europe follows closely, with strong demand from banking, finance, healthcare, and government sectors. The Asia Pacific region is emerging as a high-growth market, fueled by rapid digitalization, a burgeoning start-up ecosystem, and increasing investments in IT infrastructure. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as organizations in these regions accelerate their cloud adoption and digital transformation journeys.



    Managed Presto Services have emerged as a complementary solution to managed PostgreSQL services, particularly for organizations that require high-performance, distributed SQL query engines for big data analytics. As businesses increasingly rely on data-driven insights to inform strategic decisions, the integration of Presto with PostgreSQL enables real-time querying and analysis of vast datasets across diverse data sources. This synergy allows enterprises to harness the power of

  19. Serverless Postgres Market Research Report 2033

    • researchintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Research Intelo (2025). Serverless Postgres Market Research Report 2033 [Dataset]. https://researchintelo.com/report/serverless-postgres-market
    Explore at: csv, pdf, pptx
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Research Intelo
    License

    https://researchintelo.com/privacy-and-policy

    Time period covered
    2024 - 2033
    Area covered
    Global
    Description

    Serverless Postgres Market Outlook



    According to our latest research, the Global Serverless Postgres market size was valued at $1.2 billion in 2024 and is projected to reach $7.8 billion by 2033, expanding at a robust CAGR of 23.6% during 2024–2033. The primary growth driver for the Serverless Postgres market globally is the rising demand for scalable, cost-efficient, and highly available database solutions that eliminate the need for complex infrastructure management. As organizations accelerate their digital transformation journeys, the adoption of serverless databases, particularly Serverless Postgres, is surging due to its ability to seamlessly handle dynamic workloads, reduce operational overhead, and optimize resource allocation. This shift is further fueled by the proliferation of cloud-native applications and the increasing need for real-time data analytics across industries.



    Regional Outlook



    North America currently holds the largest share of the global Serverless Postgres market, accounting for nearly 38% of total revenue in 2024. This dominance is attributed to the region’s mature cloud infrastructure, high concentration of technology companies, and early adoption of advanced database technologies. The presence of major cloud providers and a robust ecosystem of managed service vendors has facilitated rapid deployment and integration of serverless database solutions among enterprises. Furthermore, favorable regulatory frameworks, a strong focus on data security, and significant investments in AI and big data analytics have contributed to the region’s leadership position. The United States, in particular, continues to be the epicenter of innovation, with organizations leveraging Serverless Postgres for mission-critical applications in finance, healthcare, and e-commerce.



    Asia Pacific is emerging as the fastest-growing region in the Serverless Postgres market, projected to register an impressive CAGR of 28.9% over the forecast period. The rapid digitalization of economies such as China, India, and Southeast Asian countries, coupled with increasing cloud adoption among SMEs and large enterprises, is fueling market expansion. Governments across the region are investing heavily in cloud infrastructure, data sovereignty, and smart city initiatives, which are further accelerating the adoption of serverless database technologies. Additionally, the rise of mobile applications, IoT deployments, and e-commerce platforms is creating new opportunities for Serverless Postgres providers to cater to diverse, high-growth use cases.



    In emerging economies across Latin America, the Middle East, and Africa, the Serverless Postgres market is witnessing steady growth, albeit from a smaller base. Adoption challenges persist due to limited cloud infrastructure, skills shortages, and concerns over data localization and regulatory compliance. However, localized demand for cost-effective, scalable database solutions is rising, particularly among startups and government agencies aiming to modernize their IT environments. Policy reforms, such as incentives for cloud adoption and digital transformation, are gradually creating a more conducive environment for serverless technologies. As connectivity improves and awareness grows, these regions are expected to contribute increasingly to global market revenues.

    Report Scope

    Attributes            Details
    Report Title          Serverless Postgres Market Research Report 2033
    By Deployment Type    Public Cloud, Private Cloud, Hybrid Cloud
    By Component          Database, Tools & Services
    By Organization Size  Small and Medium Enterprises, Large Enterprises
    By Application        Data Analytics, Web Applications, Mobile Applications, IoT, Others
    By End-User           …

  20. Reproducibility in Practice: Dataset of a Large-Scale Study of Jupyter Notebooks

    • zenodo.org
    bz2
    Updated Mar 15, 2021
    Cite
    Anonymous; Anonymous (2021). Reproducibility in Practice: Dataset of a Large-Scale Study of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2546834
    Explore at:
    Available download formats: bz2
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and produces results that can be hard to reproduce. To understand the good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

    This repository contains two files:

    • dump.tar.bz2
    • jupyter_reproducibility.tar.bz2

    The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

    The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

    • analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.
    • archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.
    • paper: initially empty; the notebook analyses/N11.To.Paper.ipynb moves data into it

    In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.

    Reproducing the Analysis

    This section shows how to load the data into the database and run the analysis notebooks. In the analysis, we used the following environment:

    • Ubuntu 18.04.1 LTS
    • PostgreSQL 10.6
    • Conda 4.5.1
    • Python 3.6.8
    • PdfCrop 2012/11/02 v1.38

    First, download dump.tar.bz2 and extract it:

    tar -xjf dump.tar.bz2

    It extracts the file db2019-01-13.dump. Next, create a database in PostgreSQL (we call it "jupyter").
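    If the database does not exist yet, one way to create it (a sketch, assuming a local PostgreSQL instance where your user is allowed to create databases) is the standard createdb utility:

    createdb jupyter

    Then use psql to restore the dump: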

    psql jupyter < db2019-01-13.dump

    It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
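    As a quick sanity check (a sketch, not part of the original instructions; it assumes SQLAlchemy is installed in the current environment), try opening a connection with the same string:

    python -c "import os, sqlalchemy; sqlalchemy.create_engine(os.environ['JUP_DB_CONNECTION']).connect().close()"

    If this exits silently, the dump was restored and the connection string works.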

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create a conda environment with Python 3.6:

    conda create -n py36 python=3.6 -y
    conda activate py36

    Go to the analyses folder and install all the dependencies listed in requirements.txt:

    cd jupyter_reproducibility/analyses
    pip install -r requirements.txt

    To reproduce the analyses, run Jupyter in this folder:

    jupyter notebook

    Execute the notebooks in this order (a headless alternative using nbconvert is sketched after the list):

    • N0.Index.ipynb
    • N1.Repository.ipynb
    • N2.Notebook.ipynb
    • N3.Cell.ipynb
    • N4.Features.ipynb
    • N5.Modules.ipynb
    • N6.AST.ipynb
    • N7.Name.ipynb
    • N8.Execution.ipynb
    • N9.Cell.Execution.Order.ipynb
    • N10.Markdown.ipynb
    • N11.To.Paper.ipynb
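
    To run them without opening the notebook UI, a sketch using jupyter nbconvert (not part of the original instructions) is:

    for nb in N0.Index N1.Repository N2.Notebook N3.Cell N4.Features N5.Modules \
              N6.AST N7.Name N8.Execution N9.Cell.Execution.Order N10.Markdown N11.To.Paper; do
        # --execute runs the notebook; --inplace saves the outputs back into the same file
        jupyter nbconvert --to notebook --execute --inplace "$nb.ipynb"
    done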

    Reproducing or Expanding the Collection

    The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

    Requirements

    This time, we have extra requirements:

    • All the analysis requirements
    • lbzip2 2.5
    • gcc 7.3.0
    • GitHub account
    • Gmail account

    Environment

    First, set the following environment variables:

    export JUP_MACHINE="db"; # machine identifier
    export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
    export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
    export JUP_COMPRESSION="lbzip2"; # compression program
    export JUP_VERBOSE="5"; # verbosity level
    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
    export JUP_GITHUB_USERNAME="github_username"; # your github username
    export JUP_GITHUB_PASSWORD="github_password"; # your github password
    export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
    export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
    export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
    export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
    export JUP_OAUTH_FILE="~/oauth2_creds.json"; # oauth2 authentication file
    export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine; leave it blank
    export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine; leave it blank
    export JUP_WITH_EXECUTION="1"; # whether to execute the python notebooks
    export JUP_WITH_DEPENDENCY="0"; # whether to run notebooks with and without declared dependencies
    export JUP_EXECUTION_MODE="-1"; # run following the execution order
    export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
    export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
    export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
    export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to unmount base dir
    export JUP_NOTEBOOK_TIMEOUT="300"; # timeout for the extraction
    
    
    # Frequency of log reports
    export JUP_ASTROID_FREQUENCY="5";
    export JUP_IPYTHON_FREQUENCY="5";
    export JUP_NOTEBOOKS_FREQUENCY="5";
    export JUP_REQUIREMENT_FREQUENCY="5";
    export JUP_CRAWLER_FREQUENCY="1";
    export JUP_CLONE_FREQUENCY="1";
    export JUP_COMPRESS_FREQUENCY="5";
    
    export JUP_DB_IP="localhost"; # postgres database IP

    Then, configure the file ~/oauth2_creds.json according to the yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf

    Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the repositories; the second one should unmount it. You can leave the scripts blank, but that is not advisable: the reproducibility study runs arbitrary code on your machine, and you may lose your data.
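    These scripts are site-specific; a minimal sketch (the device and mount point below are assumptions, adapt them to your setup) could be:

    #!/bin/bash
    # mount_ghstudy.sh -- mounts the volume that stores the repositories (assumed device/path)
    sudo mount /dev/sdb1 /mnt/jupyter

    #!/bin/bash
    # umount_ghstudy.sh -- unmounts it again
    sudo umount /mnt/jupyter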

    Scripts

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Install 5 conda environments and 5 Anaconda environments, one pair for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI, so make sure to use the -e option):
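    The five "raw" environments all follow the same pattern, so they can also be created in a loop (a sketch; Python 3.4 and 3.5 still need the extra steps shown in their sections below):

    for v in 2.7 3.4 3.5 3.6 3.7; do
        # creates raw27, raw34, raw35, raw36, raw37 without the anaconda metapackage
        conda create -n "raw${v/./}" python="$v" -y
    done

    The explicit per-version commands follow.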

    Conda 2.7

    conda create -n raw27 python=2.7 -y
    conda activate raw27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 2.7

    conda create -n py27 python=2.7 anaconda -y
    conda activate py27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    

    Conda 3.4

    It requires a manual jupyter and pathlib2 installation due to incompatibilities found in the default installation.

    conda create -n raw34 python=3.4 -y
    conda activate raw34
    conda install jupyter -c conda-forge -y
    conda uninstall jupyter -y
    pip install --upgrade pip
    pip install jupyter
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    pip install pathlib2

    Anaconda 3.4

    conda create -n py34 python=3.4 anaconda -y
    conda activate py34
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.5

    conda create -n raw35 python=3.5 -y
    conda activate raw35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.5

    It requires the manual installation of other anaconda packages.

    conda create -n py35 python=3.5 anaconda -y
    conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
    conda activate py35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.6

    conda create -n raw36 python=3.6 -y
    conda activate raw36
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.6

    conda create -n py36 python=3.6 anaconda -y
    conda activate py36
    conda install -y anaconda-navigator jupyterlab_server navigator-updater
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.7

    conda create -n raw37 python=3.7 -y
    conda activate raw37
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.7

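    The source entry is truncated here; by analogy with the Anaconda environments above, the setup is presumably:

    conda create -n py37 python=3.7 anaconda -y
    conda activate py37
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology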
