39 datasets found

Additional file 1: of VarGenius executes cohort-level DNA-seq variant...
springernature.figshare.com
txt
Updated Jun 1, 2023
Cite
F. Musacchia; A. Ciolfi; M. Mutarelli; A. Bruselles; R. Castello; M. Pinelli; S. Basu; S. Banfi; G. Casari; M. Tartaglia; V. Nigro (2023). Additional file 1: of VarGenius executes cohort-level DNA-seq variant calling and annotation and allows to manage the resulting data through a PostgreSQL database [Dataset]. http://doi.org/10.6084/m9.figshare.7460612.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.7460612.v1
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
F. Musacchia; A. Ciolfi; M. Mutarelli; A. Bruselles; R. Castello; M. Pinelli; S. Basu; S. Banfi; G. Casari; M. Tartaglia; V. Nigro
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
An example sample sheet containing the sample information used to start an analysis in VarGenius. (TSV, 330 bytes)
Bike Store Relational Database | SQL
kaggle.com
zip
Updated Aug 21, 2023
Cite
Dillon Myrick (2023). Bike Store Relational Database | SQL [Dataset]. https://www.kaggle.com/datasets/dillonmyrick/bike-store-sample-database
Explore at:
Available download formats: zip (94412 bytes)
Dataset updated
Aug 21, 2023
Authors
Dillon Myrick
Description
This is the sample database from sqlservertutorial.net. This is a great dataset for learning SQL and practicing querying relational databases.

Database Diagram:

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4146319%2Fc5838eb006bab3938ad94de02f58c6c1%2FSQL-Server-Sample-Database.png?generation=1692609884383007&alt=media

Terms of Use

The sample database is copyrighted and cannot be used for commercial purposes. This includes, but is not limited to:
- Selling
- Including in paid courses
PostgreSQL Dump of IMDB Data for JOB Workload
dataverse.harvard.edu
search.dataone.org
Updated Sep 24, 2019
Cite
Ryan Marcus (2019). PostgreSQL Dump of IMDB Data for JOB Workload [Dataset]. http://doi.org/10.7910/DVN/2QYZBT
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/2QYZBT
Dataset updated
Sep 24, 2019
Dataset provided by
Harvard Dataverse
Authors
Ryan Marcus
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This is a dump generated by pg_dump -Fc of the IMDb data used in the "How Good are Query Optimizers, Really?" paper. PostgreSQL compatible SQL queries and scripts to automatically create a VM with this dataset can be found here: https://git.io/imdb
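A dump written with pg_dump -Fc is in PostgreSQL's custom archive format, which is read back with pg_restore rather than psql. The sketch below only assembles such a command line without running it; the file name imdb.dump, the database name imdb, and the helper name pg_restore_argv are placeholders chosen for illustration and are not taken from the dataset.

```python
import shlex


def pg_restore_argv(dump_path, dbname, user="postgres", jobs=1):
    """Build (but do not run) a pg_restore command for a custom-format dump."""
    # --no-owner skips restoring object ownership, useful when the roles
    # from the original server do not exist locally.
    argv = ["pg_restore", "-U", user, "-d", dbname, "--no-owner"]
    if jobs > 1:
        # -j enables parallel restore, which the custom format supports.
        argv += ["-j", str(jobs)]
    argv.append(dump_path)
    return argv


# Hypothetical file and database names, for illustration only.
command = pg_restore_argv("imdb.dump", "imdb", jobs=4)
print(shlex.join(command))
```

The target database has to exist before the restore starts (for example, created with createdb).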
Data from: Atlas of European Eel Distribution (Anguilla anguilla) in...
data.niaid.nih.gov
zenodo.org
Updated Jul 12, 2024
Cite
Mateo, Maria; Drouineau, Hilaire; Pella, Herve; Beaulaton, Laurent; Amilhat, Elsa; Bardonnet, Agnès; Domingos, Isabel; Fernández-Delgado, Carlos; De Miguel Rubio, Ramon; Herrera, Mercedes; Korta, Maria; Zamora, Lluis; Díaz, Estibalitz; Briand, Cédric (2024). Atlas of European Eel Distribution (Anguilla anguilla) in Portugal, Spain and France [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6021837
Explore at:
Dataset updated
Jul 12, 2024
Dataset provided by
University of Córdoba
EPTB-Vilaine
OFB
FCUL/MARE
INRAe
University of Perpignan
University of Girona
AZTI
Authors
Mateo, Maria; Drouineau, Hilaire; Pella, Herve; Beaulaton, Laurent; Amilhat, Elsa; Bardonnet, Agnès; Domingos, Isabel; Fernández-Delgado, Carlos; De Miguel Rubio, Ramon; Herrera, Mercedes; Korta, Maria; Zamora, Lluis; Díaz, Estibalitz; Briand, Cédric
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Spain, France, Portugal
Description
DESCRIPTION

VERSIONS

version1.0.1 fixes problem with functions

version1.0.2 added table dbeel_rivers.rn_rivermouth with GEREM basin, distance to Gibraltar and link to CCM.

version1.0.3 fixes problem with functions

version1.0.4 adds views rn_rna and rn_rne to the database

The SUDOANG project aims at providing common tools to managers to support eel conservation in the SUDOE area (Spain, France and Portugal). VISUANG is the SUDOANG Interactive Web Application that hosts all these tools. The application consists of an eel distribution atlas (GT1), assessments of mortalities caused by turbines and an atlas showing obstacles to migration (GT2), estimates of recruitment and exploitation rate (GT3) and escapement (chosen as a target by the EC for the Eel Management Plans) (GT4). In addition, it includes an interactive map showing sampling results from the pilot basin network produced by GT6.

The eel abundance for the eel atlas and escapement has been obtained using the Eel Density Analysis model (EDA, GT4's product). EDA extrapolates the abundance of eel in sampled river segments to other segments taking into account how the abundance, sex and size of the eels change depending on different parameters. Thus, EDA requires two main data sources: those related to the river characteristics and those related to eel abundance and characteristics.

However, in both cases, data availability was uneven in the SUDOE area. In addition, this information was dispersed among several managers and in different formats due to different sampling sources: Water Framework Directive (WFD), Community Framework for the Collection, Management and Use of Data in the Fisheries Sector (EUMAP), Eel Management Plans, research groups, scientific papers and technical reports. Therefore, the first step towards eel abundance estimates covering the whole SUDOE area was to build a joint river and eel database. In this report we describe the database corresponding to the rivers' characteristics in the SUDOE area and the eel abundances and their characteristics.

In the case of rivers, two types of information have been collected:

River topology (RN table): a compilation of data on rivers and their topological and hydrographic characteristics in the three countries.

River attributes (RNA table): contains physical attributes that have fed the SUDOANG models.

The estimation of eel abundance and characteristic (size, biomass, sex-ratio and silver) distribution at different scales (river segment, basin, Eel Management Unit (EMU), and country) in the SUDOE area obtained with the implementation of the EDA2.3 model has been compiled in the RNE table (eel predictions).

CURRENT ACTIVE PROJECT

The project is currently active here : gitlab forgemia

TECHNICAL DESCRIPTION TO BUILD THE POSTGRES DATABASE

Build the database in postgres.

All tables are in EPSG:3035 (European LAEA). The data are provided as a PostgreSQL database. You can download other formats (shapefiles, csv) here: SUDOANG gt1 database.

Initial command

Open a shell (CMD on Windows).

Move to the folder where you downloaded the files using the following command:

cd c:/path/to/my/folder

Note: psql must be accessible. On Windows you can add the postgres bin folder to the PATH; otherwise you need to give the full path to the postgres bin folder (see link to instructions below).

createdb -U postgres eda2.3
psql -U postgres eda2.3

This opens a psql prompt ending in # where you can launch the commands in the next box.

Within the psql command

create extension "postgis";
create extension "dblink";
create extension "ltree";
create extension "tablefunc";
create schema dbeel_rivers;
create schema france;
create schema spain;
create schema portugal;
-- type \q to quit the psql shell

Now the database is ready to receive the different dumps. The dump files are large. You might not need the part including unit basins or waterbodies. All the tables except waterbodies and unit basins are described in the Atlas. You might need to understand what inheritance is in a database: https://www.postgresql.org/docs/12/tutorial-inheritance.html

RN (riversegments)

These layers contain the topology (see Atlas for detail)

dbeel_rivers.rn

france.rn

spain.rn

portugal.rn

Columns (see Atlas)

gid idsegment source target lengthm nextdownidsegment path isfrontier issource seaidsegment issea geom isendoreic isinternational country

dbeel_rivers.rn_rivermouth

seaidsegment geom (polygon) gerem_zone_3 gerem_zone_4 (used in EDA) gerem_zone_5 ccm_wso_id country emu_name_short geom_outlet (point) name_basin dist_from_gibraltar_km name_coast basin_name

dbeel_rivers.rn is mandatory: it is the table at the international level from which the other tables inherit. Download it first, even if you don't want to use other countries (in many cases you should, because there are transboundary catchments).

The rn network must be restored first! Tables rne and rna refer to it by foreign keys.

pg_restore -U postgres -d eda2.3 "dbeel_rivers.rn.backup"

france

pg_restore -U postgres -d eda2.3 "france.rn.backup"

spain

pg_restore -U postgres -d eda2.3 "spain.rn.backup"

portugal

pg_restore -U postgres -d eda2.3 "portugal.rn.backup"

Rivermouth and basins: this file contains GEREM basins, distance to Gibraltar, and the link to the CCM id for each basin flowing to the sea.

pg_restore -U postgres -d eda2.3 "dbeel_rivers.rn_rivermouth.backup"

With the schema you will probably want to be able to use the functions, but launch this only after restoring rna in the next step.

psql -U postgres -d eda2.3 -f "function_dbeel_rivers.sql"

RNA (Attributes)

This corresponds to tables

dbeel_rivers.rna

france.rna

spain.rna

portugal.rna

Columns (See Atlas)

idsegment altitudem distanceseam distancesourcem cumnbdam medianflowm3ps surfaceunitbvm2 surfacebvm2 strahler shreeve codesea name pfafriver pfafsegment basin riverwidthm temperature temperaturejan temperaturejul wettedsurfacem2 wettedsurfaceotherm2 lengthriverm emu cumheightdam riverwidthmsource slope dis_m3_pyr_riveratlas dis_m3_pmn_riveratlas dis_m3_pmx_riveratlas drought drought_type_calc

Code :

pg_restore -U postgres -d eda2.3 "dbeel_rivers.rna.backup"
pg_restore -U postgres -d eda2.3 "france.rna.backup"
pg_restore -U postgres -d eda2.3 "spain.rna.backup"
pg_restore -U postgres -d eda2.3 "portugal.rna.backup"

RNE (eel predictions)

These layers contain eel data (see Atlas for detail)

dbeel_rivers.rne

france.rne

spain.rne

portugal.rne

Columns (see Atlas)

idsegment surfaceunitbvm2 surfacebvm2 delta gamma density neel beel peel150 peel150300 peel300450 peel450600 peel600750 peel750 nsilver bsilver psilver150300 psilver300450 psilver450600 psilver600750 psilver750 psilver pmale150300 pmale300450 pmale450600 pfemale300450 pfemale450600 pfemale600750 pfemale750 pmale pfemale sex_ratio cnfemale300450 cnfemale450600 cnfemale600750 cnfemale750 cnmale150300 cnmale300450 cnmale450600 cnsilver150300 cnsilver300450 cnsilver450600 cnsilver600750 cnsilver750 cnsilver delta_tr gamma_tr type_fit_delta_tr type_fit_gamma_tr density_tr density_pmax_tr neel_pmax_tr nsilver_pmax_tr density_wd neel_wd beel_wd nsilver_wd bsilver_wd sector_tr year_tr is_current_distribution_area is_pristine_distribution_area_1985

Code for restoration

pg_restore -U postgres -d eda2.3 "dbeel_rivers.rne.backup"
pg_restore -U postgres -d eda2.3 "france.rne.backup"
pg_restore -U postgres -d eda2.3 "spain.rne.backup"
pg_restore -U postgres -d eda2.3 "portugal.rne.backup"
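The restore order matters here: rn has to go in before rna and rne because they reference it by foreign keys, and within each group the dbeel_rivers parent table comes before the country tables that inherit from it. A minimal sketch that generates the pg_restore commands for the rn, rna and rne dumps in that order is given below. It only prints the commands, it leaves out the rn_rivermouth dump and the function script, and the helper name restore_commands is an illustrative choice.

```python
import shlex

DB = "eda2.3"
# Parent schema first: the country tables inherit from dbeel_rivers.
SCHEMAS = ["dbeel_rivers", "france", "spain", "portugal"]
# rn first: rna and rne refer to it by foreign keys.
TABLES = ["rn", "rna", "rne"]


def restore_commands(db=DB, user="postgres"):
    """Return pg_restore argument lists in a dependency-safe order."""
    commands = []
    for table in TABLES:
        for schema in SCHEMAS:
            backup = f"{schema}.{table}.backup"
            commands.append(["pg_restore", "-U", user, "-d", db, backup])
    return commands


for argv in restore_commands():
    print(shlex.join(argv))
```

Each printed line can be pasted into the shell, or passed to subprocess.run once the backup files are in the working directory.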

Unit basins

Unit basins are not described in the Atlas. They correspond to the following tables:

dbeel_rivers.basinunit_bu

france.basinunit_bu

spain.basinunit_bu

portugal.basinunit_bu

france.basinunitout_buo

spain.basinunitout_buo

portugal.basinunitout_buo

The unit basin is the simple basin that surrounds a segment. It corresponds to the topographic unit from which the unit segments have been calculated. EPSG 3035. Tables bu_unitbv and bu_unitbvout inherit from dbeel_rivers.unit_bv. The first table intersects with a segment; the second does not, and corresponds to basin polygons which do not have a river segment.

Source :

Portugal

https://sniambgeoviewer.apambiente.pt/Geodocs/gml/inspire/HY_PhysicalWaters_DrainageBasinGeoCod.zip

France

In France, unit bv corresponds to the RHT (Pella et al., 2012)

Spain

http://www.mapama.gob.es/ide/metadatos/index.html?srv=metadata.show&uuid=898f0ff8-f06c-4c14-88f7-43ea90e48233

pg_restore -U postgres -d eda2.3 'dbeel_rivers.basinunit_bu.backup'

france

pg_restore -U postgres -d eda2.3
🏪🏬 Pagila (PostgreSQL Sample Database)
kaggle.com
zip
Updated Aug 17, 2025
Cite
Alexander Kapturov (2025). 🏪🏬 Pagila (PostgreSQL Sample Database) [Dataset]. https://www.kaggle.com/datasets/kapturovalexander/pagila-postgresql-sample-database/discussion
Explore at:
Available download formats: zip (1926924 bytes)
Dataset updated
Aug 17, 2025
Authors
Alexander Kapturov
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
DVD rental database to demonstrate the features of PostgreSQL.

There are 15 tables in the DVD Rental database:

actor – stores actors data including first name and last name.

film – stores film data such as title, release year, length, rating, etc.

film_actor – stores the relationships between films and actors.

category – stores film’s categories data.

film_category – stores the relationships between films and categories.

store – contains the store data including manager staff and address.

inventory – stores inventory data.

rental – stores rental data.

payment – stores customer’s payments.

staff – stores staff data.

customer – stores customer data.

address – stores address data for staff and customers.

city – stores city names.

country – stores country names.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10074224%2F428950174ca8917693d9a125242c9a02%2F2.png?generation=1688974937835056&alt=media

Run pagila-schema.sql in pgAdmin 4, then run pagila-insert-data.sql.

Don't forget to switch on auto-commit mode.
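To show how the film_actor link table ties actor and film together, here is a small self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, a cut-down version of the three tables, and invented rows; the column names (actor_id, film_id, first_name, last_name, title) are assumptions based on the table descriptions above, not a copy of the real schema.

```python
import sqlite3

# In-memory stand-in for the real database; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE film_actor (
        actor_id INTEGER REFERENCES actor (actor_id),
        film_id INTEGER REFERENCES film (film_id),
        PRIMARY KEY (actor_id, film_id)
    );
    INSERT INTO actor VALUES (1, 'Ann', 'Example'), (2, 'Bob', 'Sample');
    INSERT INTO film VALUES (10, 'First Film'), (20, 'Second Film');
    INSERT INTO film_actor VALUES (1, 10), (1, 20), (2, 20);
    """
)


def films_per_actor(connection):
    """Count films per actor by joining through the link table."""
    query = """
        SELECT a.first_name, a.last_name, COUNT(fa.film_id) AS n_films
        FROM actor AS a
        JOIN film_actor AS fa ON fa.actor_id = a.actor_id
        GROUP BY a.actor_id
        ORDER BY n_films DESC, a.actor_id
    """
    return connection.execute(query).fetchall()


for first_name, last_name, n_films in films_per_actor(conn):
    print(first_name, last_name, n_films)
```

The same SELECT should run against the loaded sample database if its columns match these assumed names.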
Technographic Data | 22M Records | Refreshed 2x/Mo | Delivery Hourly via...
datarade.ai
.json, .csv, .sql
Updated Jan 1, 2023
Cite
Forager.ai (2023). Technographic Data | 22M Records | Refreshed 2x/Mo | Delivery Hourly via CSV/JSON/PostgreSQL DB Delivery | B2B Data [Dataset]. https://datarade.ai/data-products/technographic-data-22m-records-refreshed-2x-mo-delivery-forager-ai
Explore at:
Available download formats: .json, .csv, .sql
Dataset updated
Jan 1, 2023
Dataset provided by
Forager.ai
Area covered
State of, Netherlands, Lithuania, Canada, French Southern Territories, South Georgia and the South Sandwich Islands, Botswana, Guernsey, Togo, Liechtenstein
Description
The Forager.ai Global Install Base Data set is a leading source of firmographic data, backed by advanced AI and offering the highest refresh rate in the industry.

| Volume and Stats |

Over 22M total records, the highest volume in the industry today.

Every company record refreshed twice a month, offering an unparalleled update frequency.

Delivery is made every hour, ensuring you have the latest data at your fingertips.

Each record is the result of an advanced AI-driven process, ensuring high-quality, accurate data.

| Use Cases |

Sales Platforms, ABM and Intent Data Platforms, Identity Platforms, Data Vendors:

Example applications include:

Uncover trending technologies or tools gaining popularity.

Pinpoint lucrative business prospects by identifying similar solutions utilized by a specific company.

Study a company's tech stacks to understand the technical capability and skills available within that company.

B2B Tech Companies:

Enrich leads that sign-up through the Company Search API (available separately).

Identify and map every company that fits your core personas and ICP.

Build audiences to target, using key fields like location, company size, industry, and description.

Venture Capital and Private Equity:

Discover new investment opportunities using company descriptions and industry-level data.

Review the growth of private companies and benchmark their strength against competitors.

Create high-level views of companies competing in popular verticals for investment.

| Delivery Options |

Flat files via S3 or GCP

PostgreSQL Shared Database

PostgreSQL Managed Database

API

Other options available upon request, depending on the scale required

Our dataset provides a unique blend of volume, freshness, and detail that is perfect for Sales Platforms, B2B Tech, VCs & PE firms, Marketing Automation, ABM & Intent. It stands as a cornerstone in our broader data offering, ensuring you have the information you need to drive decision-making and growth.

Tags: Company Data, Company Profiles, Employee Data, Firmographic Data, AI-Driven Data, High Refresh Rate, Company Classification, Private Market Intelligence, Workforce Intelligence, Public Companies.
Serverless PostgreSQL Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Oct 7, 2025
Cite
Growth Market Reports (2025). Serverless PostgreSQL Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/serverless-postgresql-market
Explore at:
Available download formats: pdf, csv, pptx
Dataset updated
Oct 7, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Serverless PostgreSQL Market Outlook

According to our latest research, the global serverless PostgreSQL market size reached USD 1.25 billion in 2024, reflecting robust adoption across industries. The market is poised to expand at a CAGR of 22.1% from 2025 to 2033, projecting a significant rise to USD 8.82 billion by 2033. This rapid growth is primarily driven by the increasing demand for scalable, cost-efficient, and low-maintenance database solutions, as enterprises accelerate their cloud migration and digital transformation journeys.
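The size and growth figures in this paragraph are linked by the standard compound-growth formula, end = start × (1 + rate)^years. The sketch below implements that formula and its inverse so the quoted numbers can be checked; the description does not say which year the report compounds from, so the number of years is left as a parameter, and the function names are illustrative.

```python
def project(value, rate, years):
    """Compound a starting value forward at a constant annual rate."""
    return value * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that takes `start` to `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the description; the year count is an assumption
# the reader has to choose (2024 to 2033 would be 9 compounding periods).
start_usd_bn = 1.25
end_usd_bn = 8.82
stated_rate = 0.221

print(project(start_usd_bn, stated_rate, 9))
print(implied_cagr(start_usd_bn, end_usd_bn, 9))
```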

A key growth factor for the serverless PostgreSQL market is the compelling need for operational agility and cost optimization in database management. Traditional database systems require significant upfront investments in hardware, software, and skilled personnel for maintenance and scaling. In contrast, serverless PostgreSQL solutions eliminate the burden of infrastructure management, allowing organizations to focus on application development and innovation. The pay-as-you-go pricing model and automated scaling capabilities are particularly attractive for businesses with fluctuating workloads, enabling them to optimize resource utilization and reduce total cost of ownership. This paradigm shift is further fueled by the proliferation of cloud-native application architectures and the growing adoption of DevOps practices, which emphasize agility, automation, and continuous delivery.

Another critical driver is the rising demand for real-time data analytics and the integration of advanced technologies such as artificial intelligence, machine learning, and the Internet of Things (IoT). Serverless PostgreSQL offers seamless scalability and high availability, making it an ideal choice for data-intensive applications that require rapid ingestion, processing, and analysis of large data volumes. As organizations increasingly leverage data-driven insights to gain a competitive edge, the need for robust, flexible, and easily manageable database solutions continues to surge. Additionally, the open-source nature of PostgreSQL fosters innovation and customization, enabling enterprises to tailor their database environments to specific business requirements without vendor lock-in.

Furthermore, the expanding ecosystem of cloud service providers and managed database platforms is accelerating the adoption of serverless PostgreSQL on a global scale. Leading cloud vendors are continuously enhancing their offerings with advanced features such as automated backups, security compliance, multi-region replication, and integrated monitoring tools. These advancements simplify database operations and enhance reliability, security, and performance, making serverless PostgreSQL a preferred choice for mission-critical applications across diverse industry verticals. The growing emphasis on digital transformation, coupled with the rising trend of remote work and distributed teams, is expected to sustain the momentum of market growth in the coming years.

From a regional perspective, North America continues to dominate the serverless PostgreSQL market, accounting for the largest revenue share in 2024. This leadership is attributed to the presence of major cloud service providers, early adoption of advanced technologies, and a mature IT infrastructure. However, the Asia Pacific region is witnessing the fastest growth, driven by rapid digitalization, increasing cloud investments, and a burgeoning startup ecosystem. Europe also represents a significant market, supported by stringent data protection regulations and a growing focus on cloud-based innovation. Latin America and the Middle East & Africa are gradually catching up, propelled by government initiatives and rising awareness of cloud benefits, though their market shares remain relatively modest compared to the leading regions.

Deployment Type Analysis

The deployment type segment of the serverless PostgreSQL market is categorized into public cloud, private cloud, and hybrid cloud. The public
Location of Ryanodine Receptor Type 2 Associated Catecholaminergic...
data.niaid.nih.gov
explore.openaire.eu
Updated Jan 14, 2025
Cite
Chang, Alexander; Beqaj, Halil; Sittenfeld, Leah; Miotto, Marco; Dridi, Haikel; Willson, Gloria; Jorge Martinez, Carolyn; Altosaar Li, Jaan; Reiken, Steven; Liu, Yang; Dai, Zonglin; Tchagou, Carl Christopher; Marks, Andrew (2025). Location of Ryanodine Receptor Type 2 Associated Catecholaminergic Polymorphic Ventricular Tachycardia Variants Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8277761
Explore at:
Dataset updated
Jan 14, 2025
Dataset provided by
Columbia University Irving Medical Center
Columbia University
One Fact Foundation
Authors
Chang, Alexander; Beqaj, Halil; Sittenfeld, Leah; Miotto, Marco; Dridi, Haikel; Willson, Gloria; Jorge Martinez, Carolyn; Altosaar Li, Jaan; Reiken, Steven; Liu, Yang; Dai, Zonglin; Tchagou, Carl Christopher; Marks, Andrew
Description
Location of RYR2 Associated CPVT Variants Dataset

Catecholaminergic polymorphic ventricular tachycardia (CPVT) is a rare inherited arrhythmia caused by pathogenic RYR2 variants. CPVT is characterized by exercise/stress-induced syncope and cardiac arrest in the absence of resting ECG and structural cardiac abnormalities.

Here, we present a database collected from 221 clinical papers, published from 2001-October 2020, about CPVT associated RYR2 variants. 1342 patients, both with and without CPVT, with RYR2 variants are in the database. There are a total of 964 CPVT patients or suspected CPVT patients in the database. The database includes information regarding genetic diagnosis, location of the RYR2 variant(s), clinical history and presentation, and treatment strategies for each patient. Patients will have a varying depth of information in each of the provided fields.

Database website: https://cpvtdb.port5000.com/

Dataset Information

This dataset includes:

all_data.xlsx

Tabular version of the database

Most relevant tables in the PostgreSQL database regarding patient sex, conditions, treatments, family history, and variant information were joined to create this database

Views calculating the affected RYR2 exons, domains and subdomains have been joined to patient information

m-n tables for patient's conditions and treatments have been converted to pivot tables - every condition and treatment that has at least 1 person with that condition or treatment is a column.

NOTE: This was created using a LEFT JOIN of individuals and individual_variants tables. Individuals with more than 1 recorded variant will be listed on multiple rows.

There is only 1 patient in this database with multiple recorded variants (all intronic)

20241219-dd040736b518.sql.gz

PostgreSQL database dump

Expands to about 200MB after loading the database dump

The database includes two schemas:

public: Includes all information in patients and variants

Also includes all RYR2 variants in ClinVar

uta: Contains the rows from biocommons/uta database required to make the hgvs Python package validate RYR2 variants

See https://github.com/biocommons/uta for more information

NOTE: It is recommended to use this version of the database only for development or analysis purposes

database_tables.pdf

Contains information on most of the database tables and columns in the public schema

00_globals.sql

Required to load the PostgreSQL database dump
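The note on all_data.xlsx says a LEFT JOIN of the individuals and individual_variants tables lists an individual on several rows when more than one variant is recorded. The sketch below reproduces that effect with Python's built-in sqlite3 module and invented rows. The two table names come from the note, while the column names (individual_id, sex, variant_id) are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in; the rows are invented and the columns are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE individuals (individual_id INTEGER PRIMARY KEY, sex TEXT);
    CREATE TABLE individual_variants (individual_id INTEGER, variant_id INTEGER);
    INSERT INTO individuals VALUES (1, 'F'), (2, 'M'), (3, 'F');
    -- Individual 2 has two variants, individual 3 has none.
    INSERT INTO individual_variants VALUES (1, 100), (2, 200), (2, 201);
    """
)

rows = conn.execute(
    """
    SELECT i.individual_id, v.variant_id
    FROM individuals AS i
    LEFT JOIN individual_variants AS v ON v.individual_id = i.individual_id
    ORDER BY i.individual_id, v.variant_id
    """
).fetchall()

n_rows = len(rows)
n_individuals = len({individual_id for individual_id, _ in rows})
print(rows)
print(n_rows, n_individuals)
```

When counting patients in the tabular export, it is therefore safer to count distinct individual identifiers than to count rows.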

How To Load Database Using Docker

First, download the 00_globals.sql and _.gz.sql files and move them into a directory. The default postgres image will load files from the /docker-entrypoint-initdb.d directory if the database is empty. See Docker Hub for more information. Mount the directory with the files into /docker-entrypoint-initdb.d.

Example using docker compose with pgadmin and a volume to persist the data.

Use postgres/example user/password credentials

volumes:
  mydatabasevolume: null

services:

  db:
    image: postgres:16
    restart: always
    environment:
      POSTGRES_PASSWORD: mysecretpassword
      POSTGRES_USER: postgres
    volumes:
      - ':/docker-entrypoint-initdb.d/'
      - 'mydatabasevolume:/var/lib/postgresql/data'
    ports:
      - 5432:5432

  pgadmin:
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: user@domain.com
      PGADMIN_DEFAULT_PASSWORD: SuperSecret
    ports:
      - 8080:80
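Once the containers from the compose file above are running, a client on the host can reach the server through the published port 5432. The sketch below builds a libpq-style connection URI from the compose settings. The host localhost and the database name postgres (the image's default when only POSTGRES_USER is set) are assumptions; the dump may create and use a different database name.

```python
from urllib.parse import quote


def postgres_uri(user, password, host="localhost", port=5432, dbname="postgres"):
    """Build a postgresql:// URI, percent-encoding user, password and dbname."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


# Credentials copied from the compose file; host and dbname are assumptions.
uri = postgres_uri("postgres", "mysecretpassword")
print(uri)
```

The resulting string can be handed to psql or to any driver that accepts connection URIs.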

Analysis Code

See https://github.com/alexdaiii/cpvt_database_analysis for source code to create the xlsx file and analysis of the data.

Changelist

v0.3.0

Removed inaccessible publications

Updated publications to include information on what type of publication it is (e.g. Original Article, Abstract, Review, etc)

v0.2.1

Updated all_patients.xlsx -> all_data.xlsx

Corrected how the data from all the patient's conditions, diseases, treatments, and the patients' variants tables are joined
Steam Dataset 2025: Multi-Modal Gaming Analytics
kaggle.com
zip
Updated Oct 7, 2025
Cite
CrainBramp (2025). Steam Dataset 2025: Multi-Modal Gaming Analytics [Dataset]. https://www.kaggle.com/datasets/crainbramp/steam-dataset-2025-multi-modal-gaming-analytics
Explore at:
Available download formats: zip (12478964226 bytes)
Dataset updated
Oct 7, 2025
Authors
CrainBramp
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Steam Dataset 2025: Multi-Modal Gaming Analytics Platform

The first multi-modal Steam dataset with semantic search capabilities. 239,664 applications collected from official Steam Web APIs with PostgreSQL database architecture, vector embeddings for content discovery, and comprehensive review analytics.

Made by a lifelong gamer for the gamer in all of us. Enjoy!🎮

GitHub Repository https://github.com/vintagedon/steam-dataset-2025

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F28514182%2F4b7eb73ac0f2c3cc9f0d57f37321b38f%2FScreenshot%202025-10-18%20180450.png?generation=1760825194507387&alt=media
1024-dimensional game embeddings projected to 2D via UMAP reveal natural genre clustering in semantic space

What Makes This Different

Unlike traditional flat-file Steam datasets, this is built as an analytically-native database optimized for advanced data science workflows:

☑️ Semantic Search Ready - 1024-dimensional BGE-M3 embeddings enable content-based game discovery beyond keyword matching

☑️ Multi-Modal Architecture - PostgreSQL + JSONB + pgvector in unified database structure

☑️ Production Scale - 239K applications vs typical 6K-27K in existing datasets

☑️ Complete Review Corpus - 1,048,148 user reviews with sentiment and metadata

☑️ 28-Year Coverage - Platform evolution from 1997-2025

☑️ Publisher Networks - Developer and publisher relationship data for graph analysis

☑️ Complete Methodology & Infrastructure - Full work logs document every technical decision and challenge encountered, while my API collection scripts, database schemas, and processing pipelines enable you to update the dataset, fork it for customized analysis, learn from real-world data engineering workflows, or critique and improve the methodology
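Content-based discovery with embeddings comes down to ranking items by vector similarity. The sketch below shows the idea with cosine similarity in plain Python, using invented 3-dimensional vectors in place of the 1024-dimensional embeddings; the game names and values are made up, and in the database itself this ranking would more likely be done with pgvector's distance operators.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, catalog, k=2):
    """Return the k catalog entries closest to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings, invented for illustration (real ones have 1024 dimensions).
catalog = {
    "space_shooter": [0.9, 0.1, 0.0],
    "farming_sim": [0.0, 0.2, 0.9],
    "arcade_shooter": [0.8, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

top = most_similar(query, catalog)
print([name for name, _ in top])
```

Cosine similarity ignores vector length, so two descriptions pointing in the same semantic direction score highly even if their embedding norms differ.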

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F28514182%2F649e9f7f46c6ce213101d0948c89e8ac%2F4_price_distribution_by_top_10_genres.png?generation=1760824835918620&alt=media
Market segmentation and pricing strategy analysis across top 10 genres

What's Included

Core Data (CSV Exports):
- 239,664 Steam applications with complete metadata
- 1,048,148 user reviews with scores and statistics
- 13 normalized relational tables for pandas/SQL workflows
- Genre classifications, pricing history, platform support
- Hardware requirements (min/recommended specs)
- Developer and publisher portfolios

Advanced Features (PostgreSQL):
- Full database dump with optimized indexes
- JSONB storage preserving complete API responses
- Materialized columns for sub-second query performance
- Vector embeddings table (pgvector-ready)

Documentation:
- Complete data dictionary with field specifications
- Database schema documentation
- Collection methodology and validation reports

Example Analysis: Published Notebooks (v1.0)

Three comprehensive analysis notebooks demonstrate dataset capabilities. All notebooks render directly on GitHub with full visualizations and output:

📊 Platform Evolution & Market Landscape

28 years of Steam's growth, genre evolution, and pricing strategies.

🔍 Semantic Game Discovery

Content-based recommendations using vector embeddings across genre boundaries.

🎯 The Semantic Fingerprint

Genre prediction from game descriptions - demonstrates text analysis capabilities.

Notebooks render with full output on GitHub. Kaggle-native versions planned for v1.1 release. CSV data exports included in dataset for immediate analysis.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F28514182%2F4079e43559d0068af00a48e2c31f0f1d%2FScreenshot%202025-10-18%20180214.png?generation=1760824950649726&alt=media
*Steam platfor...
Most popular database management systems worldwide 2024
statista.com
Updated Jun 15, 2024
Cite
Statista (2024). Most popular database management systems worldwide 2024 [Dataset]. https://www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems/
Explore at:
Dataset updated
Jun 15, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jun 2024
Area covered
Worldwide
Description
As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.

Database Management Systems

As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.
ParkingDB HCMCity PostgreSQL
kaggle.com
zip
Updated Dec 26, 2024
Cite
Nghĩa Trung (2024). ParkingDB HCMCity PostgreSQL [Dataset]. https://www.kaggle.com/datasets/ren294/parkingdb-hcmcity-postgres
Explore at:
Available download formats: zip (504763600 bytes)
Dataset updated
Dec 26, 2024
Authors
Nghĩa Trung
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description

This database supports the "SmartTraffic_Lakehouse_for_HCMC" project, designed to improve traffic management in Ho Chi Minh City by leveraging big data and modern lakehouse architecture.

This database manages operations for a parking lot system in Ho Chi Minh City, Vietnam, tracking everything from parking records to customer feedback. The database contains operational data for managing parking facilities, including vehicle tracking, payment processing, customer management, and staff scheduling. It's an excellent example of a comprehensive system for managing a modern parking infrastructure, handling different vehicle types (cars, motorbikes, and bicycles) and various payment methods.

Database diagram (ParkingTransaction.png): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13779146%2F40c8bbd9fd27a7b9fbe7c77598512cf2%2FParkingTransaction.png?generation=1735218498592627&alt=media

The parking management database includes sample data for the following:
- Owner: Customer information including contact details, enabling personalized service and feedback tracking
- Vehicle: Detailed vehicle information linked to owners, including license plates, types, colors, and brands
- ParkingLot: Information about different parking facilities at shopping malls, including capacity management for different vehicle types and hourly rates
- ParkingRecord: Tracks vehicle entry/exit times and calculated parking fees
- Payment: Records payment transactions with various payment methods (Cash, E-Wallet)
- Feedback: Stores customer ratings and comments about parking services
- Promotion: Manages promotional campaigns with discount rates and valid periods
- Staff: Manages parking facility employees, including roles, contact information, and shift schedules

The design reflects real-world requirements for managing complex parking operations in a busy metropolitan area. The system can track occupancy rates, process payments, manage staff schedules, and handle customer relations across multiple locations.

Note: This database is part of the SmartTraffic_Lakehouse_for_HCMC project, designed to improve urban mobility management in Ho Chi Minh City. All data contained within is simulated for demonstration and development purposes. The project was created by Nguyen Trung Nghia (ren294) and is available on GitHub.

About my project:
- Project: SmartTraffic_Lakehouse_for_HCMC
- Author: Nguyen Trung Nghia (ren294)
- Contact: trungnghia294@gmail.com
- GitHub: Ren294
Hydrocam Sample Data and Processed Results
hydroshare.org
zip
Updated Aug 3, 2025
Cite
Sajan Neupane; Jeffery S. Horsburgh (2025). Hydrocam Sample Data and Processed Results [Dataset]. https://www.hydroshare.org/resource/5d872ecf37684244a12e729c2790a0c3
Explore at:
Available download formats: zip (406.4 MB)
Dataset updated
Aug 3, 2025
Dataset provided by
HydroShare
Authors
Sajan Neupane; Jeffery S. Horsburgh
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Dec 12, 2024
Area covered

Description
This HydroShare resource provides a complete example of camera-based streamflow monitoring data collection and automated segmentation processing for the Blacksmith Fork site, demonstrated on one day of real image data. It includes both example image inputs and compute outputs generated using a containerized cloud-based inference pipeline. The processing workflow uses the Segment Anything deep learning model, deployed in a serverless environment with AWS Lambda and S3. Each image is segmented to identify regions of interest (ROIs) and calculate water-relevant pixel statistics. Ground truth comparison supports quality assurance using Intersection over Union (IoU) scores. Results are automatically uploaded and stored in a PostgreSQL database for hydrologic analysis. This dataset supports the reproducibility of the modeling approaches described in submitted manuscripts to Environmental Modelling & Software, offering transparency into the full data processing pipeline from raw image ingestion to output storage. It serves as a reference implementation for camera-based environmental monitoring at scale.
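The Intersection over Union (IoU) score mentioned above can be illustrated with a minimal sketch. It assumes binary masks stored as nested lists of 0/1 values with identical shapes; the function name `iou` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not the resource's actual implementation.

```python
def iou(pred, truth):
    """Intersection over Union of two binary masks of identical shape.

    Masks are nested lists of 0/1 values (an assumed representation).
    """
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


predicted = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [1, 0]]
score = iou(predicted, ground_truth)  # one shared pixel out of three covered
```

A score near 1 indicates that the segmented region closely matches the ground truth; a score near 0 indicates little overlap.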
Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...
zenodo.org
application/gzip
Updated Mar 16, 2021
+ more versions
Cite
João Felipe; Leonardo; Vanessa; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks / Understanding and Improving the Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.3519618
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.3519618
Dataset updated
Mar 16, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
João Felipe; Leonardo; Vanessa; Juliana
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub. Based on the results, we proposed and evaluated Julynter, a linting tool for Jupyter Notebooks.

Papers:

PIMENTEL, J. F.; MURTA, L.; BRAGANHOLO, V.; FREIRE, J.; A large-scale study about quality and reproducibility of jupyter notebooks. In: International Conference on Mining Software Repositories (MSR), 2019, Montreal, Canada.

PIMENTEL, J. F.; MURTA, L.; BRAGANHOLO, V.; FREIRE, J.; Understanding and Improving the Quality and Reproducibility of Jupyter Notebooks. Empirical Software Engineering, 2021 (in press)

This repository contains three files:

db2020-09-22.dump.gz

sample.tar.gz

julynter_reproducility.tar.gz

Reproducing the Notebook Study

The db2020-09-22.dump.gz file contains a PostgreSQL dump of the database, with all the data we extracted from notebooks. For loading it, run:

gunzip -c db2020-09-22.dump.gz | psql jupyter

Note that this file contains only the database with the extracted data. The actual repositories are available in a Google Drive folder, which also contains the Docker images we used in the reproducibility study. The repositories are stored as content/{hash_dir1}/{hash_dir2}.tar.bz2, where hash_dir1 and hash_dir2 are columns of repositories in the database.
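As a small illustration of the storage layout described above, the following sketch builds the archive path for one repository row. The function name `repository_archive_path`, the `root` default, and the sample hash values are hypothetical; only the content/{hash_dir1}/{hash_dir2}.tar.bz2 pattern comes from the description.

```python
from pathlib import PurePosixPath


def repository_archive_path(hash_dir1, hash_dir2, root="content"):
    """Build the archive path for a repository from its two hash columns."""
    return str(PurePosixPath(root) / hash_dir1 / f"{hash_dir2}.tar.bz2")


# Hypothetical values standing in for the hash_dir1 and hash_dir2 columns.
path = repository_archive_path("ab", "cdef0123")
```

In practice the two values would be read from the repositories table after loading the dump.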

For scripts, notebooks, and detailed instructions on how to analyze or reproduce the data collection, please check the instructions on the Jupyter Archaeology repository (tag 1.0.0)

The sample.tar.gz file contains the repositories obtained during the manual sampling.

Reproducing the Julynter Experiment

The julynter_reproducility.tar.gz file contains all the data collected in the Julynter experiment and the analysis notebooks. Reproducing the analysis is straightforward:

Uncompress the file: $ tar zxvf julynter_reproducibility.tar.gz

Install the dependencies: $ pip install -r julynter/requirements.txt

Run the notebooks in order: J1.Data.Collection.ipynb; J2.Recommendations.ipynb; J3.Usability.ipynb.

The collected data is stored in the julynter/data folder.

Changelog

2019/01/14 - Version 1 - Initial version
2019/01/22 - Version 2 - Update N8.Execution.ipynb to calculate the rate of failure for each reason
2019/03/13 - Version 3 - Update package for camera ready. Add columns to db to detect duplicates, change notebooks to consider them, and add N1.Skip.Notebook.ipynb and N11.Repository.With.Notebook.Restriction.ipynb.
2021/03/15 - Version 4 - Add Julynter experiment; Update database dump to include new data collected for the second paper; remove scripts and analysis notebooks from this package (moved to GitHub), add a link to Google Drive with collected repository files
Up-to-date mapping of COVID-19 treatment and vaccine development...
data.niaid.nih.gov
zenodo.org
Updated Jul 19, 2024
Cite
Wagner, Tomáš; Mišová, Ivana; Frankovský, Ján (2024). Up-to-date mapping of COVID-19 treatment and vaccine development (covid19-help.org data dump) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4601445
Explore at:
Dataset updated
Jul 19, 2024
Dataset provided by
Direct Impact s.r.o.
Authors
Wagner, Tomáš; Mišová, Ivana; Frankovský, Ján
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The free database mapping COVID-19 treatment and vaccine development based on the global scientific research is available at https://covid19-help.org/.

Files provided here are curated partial data exports in the form of .csv files, or a full data export as a .sql script generated with pg_dump from our PostgreSQL 12 database. This repository also contains a .png file with the ER diagram of the tables in the .sql file.

Structure of CSV files

*On our site, compounds are named as substances

compounds.csv

Id - Unique identifier in our database (unsigned integer)

Name - Name of the Substance/Compound (string)

Marketed name - The marketed name of the Substance/Compound (string)

Synonyms - Known synonyms (string)

Description - Description (HTML code)

Dietary sources - Dietary sources where the Substance/Compound can be found (string)

Dietary sources URL - Dietary sources URL (string)

Formula - Compound formula (HTML code)

Structure image URL - Url to our website with the structure image (string)

Status - Status of approval (string)

Therapeutic approach - Approach in which Substance/Compound works (string)

Drug status - Availability of Substance/Compound (string)

Additional data - Additional data in stringified JSON format with data as prescribing information and note (string)

General information - General information about Substance/Compound (HTML code)

references.csv

Id - Unique identifier in our database (unsigned integer)

Impact factor - Impact factor of the scientific article (string)

Source title - Title of the scientific article (string)

Source URL - URL link of the scientific article (string)

Tested on species - What testing model was used for the study (string)

Published at - Date of publication of the scientific article (Date in ISO 8601 format)

clinical-trials.csv

Id - Unique identifier in our database (unsigned integer)

Title - Title of the clinical trial study (string)

Acronym title - Acronym of title of the clinical trial study (string)

Source id - Unique identifier in the source database

Source id optional - Optional identifier in other databases (string)

Interventions - Description of interventions (string)

Study type - Type of the conducted study (string)

Study results - Has results? (string)

Phase - Current phase of the clinical trial (string)

Url - URL to clinical trial study page on clinicaltrials.gov (string)

Status - Status in which study currently is (string)

Start date - Date at which study was started (Date in ISO 8601 format)

Completion date - Date at which study was completed (Date in ISO 8601 format)

Additional data - Additional data in the form of stringified JSON with data as locations of study, study design, enrollment, age, outcome measures (string)
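Because the "Additional data" column holds stringified JSON, it needs a second decoding step after the CSV is parsed. The sketch below shows one way to do that, assuming a comma-delimited file with a header row that uses the column names listed above; the function name `read_rows`, the sample row, and the `enrollment` key are made up for illustration.

```python
import csv
import io
import json


def read_rows(text, json_column="Additional data"):
    """Parse CSV text and decode the stringified JSON column into Python objects."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        raw = row.get(json_column) or ""
        # Empty cells become None instead of raising a JSON error.
        row[json_column] = json.loads(raw) if raw.strip() else None
        rows.append(row)
    return rows


# Hypothetical sample; real files contain many more columns.
sample = 'Id,Title,Additional data\n1,Example trial,"{""enrollment"": 120}"\n'
rows = read_rows(sample)
```

All other columns stay as strings, so numeric fields such as Id still need explicit conversion.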

compound-reference-relations.csv

Reference id - Id of a reference in our DB (unsigned integer)

Compound id - Id of a substance in our DB (unsigned integer)

Note - Id of a substance in our DB (unsigned integer)

Is supporting - Is evidence supporting or contradictory (Boolean, true if supporting)

compound-clinical-trial.csv

Clinical trial id - Id of a clinical trial in our DB (unsigned integer)

Compound id - Id of a Substance/Compound in our DB (unsigned integer)

tags.csv

Id - Unique identifier in our database (unsigned integer)

Name - Name of the tag (string)

tags-entities.csv

Tag id - Id of a tag in our DB (unsigned integer)

Reference id - Id of a reference in our DB (unsigned integer)

API Specification

Our project also has an Open API that gives you access to our data in a format suitable for processing, particularly in JSON format.

https://covid19-help.org/api-specification

Services are split into five endpoints:

Substances - /api/substances

References - /api/references

Substance-reference relations - /api/substance-reference-relations

Clinical trials - /api/clinical-trials

Clinical trials-substances relations - /api/clinical-trials-substances

Method of providing data

All dates are text strings formatted in compliance with ISO 8601 as YYYY-MM-DD
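A client may want to check that date strings really follow the YYYY-MM-DD form before using them. The following is a minimal sketch of such a check; the function name `parse_api_date` is an assumption, and returning `None` for non-conforming values is just one possible policy.

```python
import re
from datetime import date

_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_api_date(text):
    """Return a date for a strict YYYY-MM-DD string, or None if it does not conform."""
    if not _DATE_RE.fullmatch(text):
        return None
    year, month, day = (int(part) for part in text.split("-"))
    try:
        return date(year, month, day)
    except ValueError:
        # Well-formed pattern but impossible calendar date, e.g. month 22.
        return None
```

With this check, a value such as 2020-22-02 is rejected because 22 is not a valid month.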

If the request syntax is incorrect (missing or incorrectly formatted parameters), an HTTP 400 Bad Request response will be returned. The body of the response may include an explanation.

Data updated_at (used for querying changed-from) refers only to a particular entity and not its logical relations. Example: If a new substance reference relation is added, but the substance detail has not changed, this is reflected in the substance reference relation endpoint where a new entity with id and current dates in created_at and updated_at fields will be added, but in substances or references endpoint nothing has changed.

The recommended way of sequential download

During the first download, all data can be obtained by entering a sufficiently old date in the changed-from parameter, for example: changed-from=2020-01-01. It is important to write down the date on which the download was initiated, let's say 2020-10-20.

For repeated data downloads, it is sufficient to receive only the records in which something has changed. These can be requested with the parameter changed-from=2020-10-20 (the date from the previous step). Again, it is important to write down the date when the updates were downloaded (e.g. 2020-10-20). This date will be used in the next update (refresh) of the data.
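The bookkeeping described in these two steps can be sketched as a small helper that decides which changed-from value to send and which date to remember for the next run. The function name `next_sync` and the idea of persisting the returned date only after a successful download are assumptions, not part of the API specification.

```python
from datetime import date


def next_sync(last_sync, today, initial="2020-01-01"):
    """Return (changed_from, date_to_store) for one download run.

    last_sync is the stored date string from the previous run, or None on the first run.
    """
    changed_from = last_sync if last_sync is not None else initial
    # The run's start date becomes the changed-from value of the following run.
    return changed_from, today.isoformat()


first_run = next_sync(None, date(2020, 10, 20))
second_run = next_sync(first_run[1], date(2020, 11, 5))
```

Because changed-from matches entities modified on the given date or later, reusing the start date of the previous run may return some records twice, which is harmless when records are stored by id.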

Services for entities

List of endpoint URLs:

/api/substances

/api/references

/api/substance-reference-relations

/api/clinical-trials

/api/clinical-trials-substances

Format of the request

All endpoints have these parameters in common:

changed-from - a parameter to return only the entities that have been modified on a given date or later.

continue-after-id - a parameter to return only the entities that have a larger ID than specified in the parameter.

limit - a parameter to return only the number of records specified (up to 1000). The preset number is 100.

Request example:

/api/references?changed-from=2020-01-01&continue-after-id=1&limit=100
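A request path like the one above can be assembled programmatically. This is a minimal sketch using only the three documented parameters; the function name `build_request_path` and the client-side check of the limit are illustrative assumptions.

```python
from urllib.parse import urlencode

MAX_LIMIT = 1000  # upper bound stated in the parameter description


def build_request_path(endpoint, changed_from=None, continue_after_id=None, limit=100):
    """Build an endpoint path with the common query parameters."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    params = {}
    if changed_from is not None:
        params["changed-from"] = changed_from
    if continue_after_id is not None:
        params["continue-after-id"] = continue_after_id
    params["limit"] = limit
    return f"{endpoint}?{urlencode(params)}"


path = build_request_path("/api/references", "2020-01-01", 1, 100)
```

The resulting path would be appended to the site's base URL before sending the request.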

Format of the response

The response format is the same for all endpoints.

number_of_remaining_ids - the number of remaining entities that meet the specified criteria but are not displayed on the page. An integer of virtually unlimited size.

entities - an array of entity details in JSON format.

Response example:

{
  "number_of_remaining_ids": 100,
  "entities": [
    {
      "id": 3,
      "url": "https://www.ncbi.nlm.nih.gov/pubmed/32147628",
      "title": "Discovering drugs to treat coronavirus disease 2019 (COVID-19).",
      "impact_factor": "Discovering drugs to treat coronavirus disease 2019 (COVID-19).",
      "tested_on_species": "in silico",
      "publication_date": "2020-22-02",
      "created_at": "2020-30-03",
      "updated_at": "2020-31-03",
      "deleted_at": null
    },
    {
      "id": 4,
      "url": "https://www.ncbi.nlm.nih.gov/pubmed/32157862",
      "title": "CT Manifestations of Novel Coronavirus Pneumonia: A Case Report",
      "impact_factor": "CT Manifestations of Novel Coronavirus Pneumonia: A Case Report",
      "tested_on_species": "Patient",
      "publication_date": "2020-06-03",
      "created_at": "2020-30-03",
      "updated_at": "2020-30-03",
      "deleted_at": null
    }
  ]
}
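The response fields suggest a simple paging loop: keep requesting with continue-after-id set to the largest id seen so far until number_of_remaining_ids reaches zero. The sketch below assumes entities arrive in ascending id order and that the remaining count excludes the current page; `fetch_all` and `fake_fetch_page` are hypothetical names, and the fake fetcher stands in for a real HTTP call.

```python
def fetch_all(fetch_page, limit=100):
    """Collect all entities by paging with continue-after-id.

    fetch_page(continue_after_id, limit) must return a dict shaped like the
    response example: 'number_of_remaining_ids' and 'entities'.
    """
    entities = []
    after_id = None
    while True:
        page = fetch_page(after_id, limit)
        batch = page["entities"]
        entities.extend(batch)
        if not batch or page["number_of_remaining_ids"] <= 0:
            return entities
        after_id = max(entity["id"] for entity in batch)


def fake_fetch_page(after_id, limit):
    """Stand-in for an HTTP request, serving seven entities with ids 1 to 7."""
    ids = [i for i in range(1, 8) if after_id is None or i > after_id]
    return {
        "number_of_remaining_ids": max(len(ids) - limit, 0),
        "entities": [{"id": i} for i in ids[:limit]],
    }


result = fetch_all(fake_fetch_page, limit=3)
```

Passing the fetch function in as a parameter keeps the loop testable without network access.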

Endpoint details

Substances

URL: /api/substances

Substances endpoint returns data in the format specified in Response example as an array of entities in JSON format specified in the entity format section.

Entity format:

id - Unique identifier in our database (unsigned integer)

name - Name of the Substance (string)

description - Description (HTML code)

phase_of_research - Phase of research (string)

how_it_helps - How it helps (string)

drug_status - Drug status (string)

general_information - General information (HTML code)

synonyms - Synonyms (string)

marketed_as - "Marketed as" (string)

dietary_sources - Dietary sources name (string)

dietary_sources_url - Dietary sources URL (string)

prescribing_information - Prescribing information as an array of JSON objects with description and URL attributes as strings

formula - Formula (HTML code)

created_at - Date when the entity was added to our database (Date in ISO 8601 format)

updated_at - Date when the entity was last updated in our database (Date in ISO 8601 format)

deleted_at - Date when the entity was deleted in our database (Date in ISO 8601 format)

References

URL: /api/references

References endpoint returns data in the format specified in Response example as an array of entities in JSON format specified in the entity format section.

Entity format:

id - Unique identifier in our database (unsigned integer)

url - URL link of the scientific article (string)

title - Title of the scientific article (string)

impact_factor - Impact factor of the scientific article (string)

tested_on_species - What testing model was used for the study (string)

publication_date - Date of publication of the scientific article (Date in ISO 8601 format)

created_at - Date when the entity was added to our database (Date in ISO 8601 format)

updated_at - Date when the entity was last updated in our database (Date in ISO 8601
Most popular database management systems in software companies in Russia...
statista.com
Updated Aug 18, 2022
Cite
Statista (2022). Most popular database management systems in software companies in Russia 2022 [Dataset]. https://www.statista.com/statistics/1330732/most-popular-dbms-in-software-companies-russia/
Explore at:
Dataset updated
Aug 18, 2022
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Feb 2022 - May 2022
Area covered
Russia
Description
Approximately ** percent of the surveyed software companies in Russia mentioned PostgreSQL, making it the most popular database management system (DBMS) in the period between February and May 2022. MS SQL and MySQL followed, having been mentioned by ** percent and ** percent of respondents, respectively.
Data from: SQL Injection Attack Netflow
data.niaid.nih.gov
portalcienciaytecnologia.jcyl.es
+3more
Updated Sep 28, 2022
+ more versions
Cite
Ignacio Crespo; Adrián Campazas (2022). SQL Injection Attack Netflow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6907251
Explore at:
Dataset updated
Sep 28, 2022
Authors
Ignacio Crespo; Adrián Campazas
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Introduction

These datasets contain SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union Query SQL injection and Blind SQL injection. The SQLMAP tool was used to perform the attacks.

NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for the collection and monitoring of network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

Datasets

The first dataset (D1) was collected to train the detection models. The second (D2) was collected using different attacks than those used in training, to test the models and ensure their generalization.

The datasets contain both benign and malicious traffic. All collected datasets are balanced.

The version of NetFlow used to build the datasets is 5.

Dataset  Aim       Samples  Benign-malicious traffic ratio
D1       Training  400,003  50%
D2       Test      57,239   50%

Infrastructure and implementation

Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has a sensor ipt_netflow installed. The sensor consists of a module for the Linux kernel using Iptables, which processes the packets and converts them to NetFlow flows.

DOROTHEA is configured to use NetFlow V5 and to export a flow after it has been inactive for 15 seconds or after it has been active for 1800 seconds (30 minutes).
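The two export conditions can be made concrete with a short sketch. It only illustrates the timeout rule stated above and is not the ipt_netflow implementation; the function name `should_export` and the use of plain timestamps in seconds are assumptions.

```python
INACTIVE_TIMEOUT_S = 15    # export after this many seconds without packets
ACTIVE_TIMEOUT_S = 1800    # export after the flow has lasted this long (30 minutes)


def should_export(first_packet_ts, last_packet_ts, now):
    """Decide whether a flow record should be exported at time `now` (seconds)."""
    idle_for = now - last_packet_ts
    active_for = now - first_packet_ts
    return idle_for >= INACTIVE_TIMEOUT_S or active_for >= ACTIVE_TIMEOUT_S


short_idle = should_export(first_packet_ts=0, last_packet_ts=10, now=20)
long_idle = should_export(first_packet_ts=0, last_packet_ts=10, now=25)
long_lived = should_export(first_packet_ts=0, last_packet_ts=1795, now=1800)
```

Under this rule a long-lived connection is split into several flow records, one per 30-minute window.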

Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. Such tasks run as Python scripts. Users may customize them or even incorporate their own. The network traffic is managed by a gateway that performs two main tasks. On the one hand, it routes packets to the Internet. On the other hand, it sends it to a NetFlow data generation node (this process is carried out similarly to packets received from the Internet).

The malicious traffic collected (SQLI attacks) was performed using SQLMAP. SQLMAP is a penetration tool used to automate the process of detecting and exploiting SQL injection vulnerabilities.

The attacks were executed on 16 nodes, each launching SQLMAP with the parameters in the following table.

Parameters | Description
'--banner', '--current-user', '--current-db', '--hostname', '--is-dba', '--users', '--passwords', '--privileges', '--roles', '--dbs', '--tables', '--columns', '--schema', '--count', '--dump', '--comments', '--schema' | Enumerate users, password hashes, privileges, roles, databases, tables and columns
--level=5 | Increase the probability of a false positive identification
--risk=3 | Increase the probability of extracting data
--random-agent | Select the User-Agent randomly
--batch | Never ask for user input, use the default behavior
--answers="follow=Y" | Predefined answers to yes

Every node executed SQLIA on 200 victim nodes. The victim nodes had deployed a web form vulnerable to Union-type injection attacks, which was connected to the MySQL or SQLServer database engines (50% of the victim nodes deployed MySQL and the other 50% deployed SQLServer).

The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24. The malicious traffic in the test sets was collected under different conditions. For D1, SQLIA was performed using Union attacks on the MySQL and SQLServer databases.

However, for D2, BlindSQL SQLIAs were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for benign and malicious traffic generating nodes and 140.30.20.1/24 for victim nodes.

To run the MySQL server we ran MariaDB version 10.4.12. Microsoft SQL Server 2017 Express and PostgreSQL version 13 were used.
A Holocene relative sea-level database for the Baltic Sea
dataservices.gfz-potsdam.de
Updated 2021
Cite
Alar Rosentau; Volker Klemann; Ole Bennike; Holger Steffen; Jasmin Wehr; Milena Latinović; Meike Bagge; Antti Ojala; Mikael Berglund; Gustaf Peterson Becher; Kristian Schoning; Anton Hansson; Lars Nielsen; Lars B. Clemmensen; Mikkel U. Hede; Aart Kroon; Morten Pejrup; Lasse Sander; Karl Stattegger; Klaus Schwarzer; Reinhard Lampe; Matthias Lampe; Szymon Uścinowicz; Albertas Bitinas; Ieva Grudzinska; Jüri Vassiljev; Triine Nirgi; Yuriy Kublitskiy; Dmitry Subetto (2021). A Holocene relative sea-level database for the Baltic Sea [Dataset]. http://doi.org/10.5880/gfz.1.3.2020.003
Explore at:
Unique identifier
https://doi.org/10.5880/gfz.1.3.2020.003
Dataset updated
2021
Dataset provided by
GFZ Data Services
datacite
Authors
Alar Rosentau; Volker Klemann; Ole Bennike; Holger Steffen; Jasmin Wehr; Milena Latinović; Meike Bagge; Antti Ojala; Mikael Berglund; Gustaf Peterson Becher; Kristian Schoning; Anton Hansson; Lars Nielsen; Lars B. Clemmensen; Mikkel U. Hede; Aart Kroon; Morten Pejrup; Lasse Sander; Karl Stattegger; Klaus Schwarzer; Reinhard Lampe; Matthias Lampe; Szymon Uścinowicz; Albertas Bitinas; Ieva Grudzinska; Jüri Vassiljev; Triine Nirgi; Yuriy Kublitskiy; Dmitry Subetto
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered

Description
We present a compilation and analysis of 1099 Holocene relative shore-level (RSL) indicators including 867 relative sea-level data points and 232 data points from the Ancylus Lake and the following transitional phase from 10.7 to 8.5 ka BP located around the Baltic Sea. The spatial distribution covers the Baltic Sea and near-coastal areas fairly well, but some gaps remain mainly in Sweden. RSL data follow the standardized HOLSEA format and, thus, are ready for spatially comprehensive applications in, e.g., glacial isostatic adjustment (GIA) modelling.

Sampling method

The data set is a compilation of rather different samples from geological, geomorphological and archaeological studies. Most of the data was already published in different formats. In this compilation we homogenized the meta information of the available information according to the HOLSEA database format, https://www.holsea.org/archive-your-data, which is a modification of the recommendations given in Hijma et al. (2015). In addition to the reformatting, the majority of samples with radiocarbon dating were recalibrated with oxcal-software using the calib13 and marine13 curves. Furthermore, all sample descriptions were critically checked for consistency in positioning, levelling and indicative meaning by experts of the respective geographic region, see Supplement 2.

Analytical method

In principle, it is a compilation, recalibration and revision of already published data.

Data Processing

Data of individual compilations were revised and imported into a relational database system. Therein, the data was transferred into the HOLSEA format by specified rules. By this procedure, a homogeneous categorisation was achieved without losing the original data. Also this is stored in the relational database system allowing for later updates of the transfer procedure or a recalibration of the data.

Description of data table HOLSEA-baltic-yymmdd.xlsx

The workbook in Excel format contains 5 sheets, see https://www.holsea.org/archive-your-data:
- Long-form, containing the complete information available for each sample
- Short-form, a subset of attributes of the Long-form sheet
- Radiocarbon, containing the radiocarbon dating information of the respective samples
- U-series, a corresponding table containing the respective information of Uranium dating
- References, a complete reference list of the primary publications in which the individual data sampling is described.

All online sources for the compilation are included in the metadata. A full list of source references is provided in the data description file.
Startup Data | 249 Countries Coverage | +95% Email and Phone Data Accuracy |...
datarade.ai
.json, .csv
Updated Jan 1, 2023
Cite
Forager.ai (2023). Startup Data | 249 Countries Coverage | +95% Email and Phone Data Accuracy | Bi-weekly Refresh Rate | 50+ Data Points [Dataset]. https://datarade.ai/data-products/startup-data-company-data-refreshed-2x-mo-delivery-hour-forager-ai
Explore at:
Available download formats: .json, .csv
Dataset updated
Jan 1, 2023
Dataset provided by
Forager.ai
Area covered
Angola, Somalia, Bangladesh, Saint Vincent and the Grenadines, Oman, Swaziland, Cameroon, Northern Mariana Islands, Dominica, New Zealand
Description
The Forager.ai Global Dataset is a leading source of firmographic data, backed by advanced AI and offering the highest refresh rate in the industry.

| Volume and Stats |

Over 70M total records, the highest volume in the industry today.

Every company record refreshed twice a month, offering an unparalleled update frequency.

Delivery is made every hour, ensuring you have the latest data at your fingertips.

Each record is the result of an advanced AI-driven process, ensuring high-quality, accurate data.

| Use Cases |

Sales Platforms, ABM and Intent Data Platforms, Identity Platforms, Data Vendors:

Example applications include:

Uncover trending technologies or tools gaining popularity.

Pinpoint lucrative business prospects by identifying similar solutions utilized by a specific company.

Study a company's tech stacks to understand the technical capability and skills available within that company.

B2B Tech Companies:

Enrich leads that sign-up through the Company Search API (available separately).

Identify and map every company that fits your core personas and ICP.

Build audiences to target, using key fields like location, company size, industry, and description.

Venture Capital and Private Equity:

Discover new investment opportunities using company descriptions and industry-level data.

Review the growth of private companies and benchmark their strength against competitors.

Create high-level views of companies competing in popular verticals for investment.

| Delivery Options |

Flat files via S3 or GCP

PostgreSQL Shared Database

PostgreSQL Managed Database

API

Other options available upon request, depending on the scale required

Our dataset provides a unique blend of volume, freshness, and detail that is perfect for Sales Platforms, B2B Tech, VCs & PE firms, Marketing Automation, ABM & Intent. It stands as a cornerstone in our broader data offering, ensuring you have the information you need to drive decision-making and growth.

Tags: Company Data, Company Profiles, Employee Data, Firmographic Data, AI-Driven Data, High Refresh Rate, Company Classification, Private Market Intelligence, Workforce Intelligence, Public Companies.
CHINOOK Music
kaggle.com
zip
Updated Sep 19, 2024
Cite
willian oliveira (2024). CHINOOK Music [Dataset]. https://www.kaggle.com/datasets/willianoliveiragibin/chinook-music
Explore at:
Available download formats: zip (9603 bytes)
Dataset updated
Sep 19, 2024
Authors
willian oliveira
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The Chinook Database is a sample database designed for use with multiple database platforms, such as SQL Server, Oracle, MySQL, and others. It can be easily set up by running a single SQL script, making it a convenient alternative to the popular Northwind database. Chinook is widely used in demos and testing environments, particularly for Object-Relational Mapping (ORM) tools that target both single and multiple database servers.

Supported Database Servers

Chinook supports several database servers, including:

DB2

MySQL

Oracle

PostgreSQL

SQL Server

SQL Server Compact

SQLite

Download Instructions

You can download the SQL scripts for each supported database server from the latest release assets. The appropriate SQL script file(s) for your database vendor are provided and can be executed using your preferred database management tool.

Data Model

The Chinook Database represents a digital media store, containing tables that include:

Artists

Albums

Media tracks

Invoices

Customers

Sample Data

The media data in Chinook is derived from a real iTunes Library, providing a realistic dataset for users. Additionally, users can generate their own SQL scripts using their personal iTunes Library by following specific instructions. Customer and employee details in the database were manually crafted with fictitious names, addresses (mappable via Google Maps), and well-structured contact information such as phone numbers, faxes, and emails. Sales data is auto-generated with random values and spans a four-year period.
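The data model described above can be explored with a short join across artists, albums, and tracks. The following is a minimal sketch, not the official Chinook script: the schema is a simplified, hypothetical subset (table and column names such as `Artist`, `Album`, `Track`, and `UnitPrice` are assumed for illustration), and the rows are made-up demo data loaded into an in-memory SQLite database.

```python
import sqlite3

# Simplified, hypothetical subset of a Chinook-style schema.
# This is an assumption for illustration, not the official SQL script.
SCHEMA = """
CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Album  (AlbumId INTEGER PRIMARY KEY, Title TEXT,
                     ArtistId INTEGER REFERENCES Artist(ArtistId));
CREATE TABLE Track  (TrackId INTEGER PRIMARY KEY, Name TEXT,
                     AlbumId INTEGER REFERENCES Album(AlbumId),
                     UnitPrice REAL);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Artist VALUES (1, 'Demo Artist')")
    conn.executemany(
        "INSERT INTO Album VALUES (?, ?, ?)",
        [(1, "First Album", 1), (2, "Second Album", 1)],
    )
    conn.executemany(
        "INSERT INTO Track VALUES (?, ?, ?, ?)",
        [(1, "Intro", 1, 0.99), (2, "Outro", 1, 0.99), (3, "Single", 2, 0.99)],
    )
    conn.commit()
    return conn


def tracks_per_album(conn):
    """Return (artist name, album title, track count) for each album."""
    query = """
        SELECT Artist.Name, Album.Title, COUNT(Track.TrackId)
        FROM Artist
        JOIN Album ON Album.ArtistId = Artist.ArtistId
        JOIN Track ON Track.AlbumId = Album.AlbumId
        GROUP BY Album.AlbumId
        ORDER BY Album.AlbumId
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in tracks_per_album(build_demo_db()):
        print(row)
```

Against a real Chinook database, the same kind of query should work once the table and column names are checked against the script for your database vendor.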

Why is it Called Chinook?

The Chinook Database's name is a nod to its predecessor, the Northwind database. Chinooks are warm, dry winds found in the interior regions of North America, particularly over southern Alberta in Canada, where the Canadian Prairies meet mountain ranges. This natural phenomenon inspired the choice of name, reflecting the idea that Chinook serves as a refreshing alternative to the Northwind database.
Data from: Eurasian Modern Pollen Database (former European Modern Pollen...
doi.pangaea.de
html, tsv
Updated Nov 26, 2019
Manuel Chevalier; Basil A S Davis; Philipp S Sommer; Marco Zanon; Vachel A Carter; Leanne N Phelps; Achille Mauri; Walter Finsinger (2019). Eurasian Modern Pollen Database (former European Modern Pollen Database) [Dataset]. http://doi.org/10.1594/PANGAEA.909130
Available download formats: html, tsv
Unique identifier
https://doi.org/10.1594/PANGAEA.909130
Dataset updated
Nov 26, 2019
Dataset provided by
PANGAEA
Authors
Manuel Chevalier; Basil A S Davis; Philipp S Sommer; Marco Zanon; Vachel A Carter; Leanne N Phelps; Achille Mauri; Walter Finsinger
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Eurasia
Variables measured
LATITUDE; ELEVATION; LONGITUDE; Sample ID; Event label; Precipitation, May; Precipitation, July; Precipitation, June; Precipitation, April; Precipitation, March; and 29 more
Description
The Eurasian Modern Pollen Database (EMPD) contains modern pollen data (raw counts) for the entire Eurasian continent. Derived from the European Modern Pollen Database, the dataset contains many more samples west of the Ural Mountains. We provide this dataset in three different formats: (1) an Excel spreadsheet, (2) a PostgreSQL dump, and (3) a portable SQLite3 database. All three versions are strictly equivalent. For download, see "Original Version".
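For the SQLite3 version, a first step is usually to check which tables the file contains before writing any queries. The sketch below uses only Python's standard `sqlite3` module and reads the table names from SQLite's built-in `sqlite_master` catalog. The file name `empd.sqlite` is a hypothetical placeholder, and nothing is assumed about the table names in the EMPD file.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path, sorted."""
    # Note: sqlite3.connect creates an empty database if the file is missing,
    # in which case this function returns an empty list.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # 'empd.sqlite' is a placeholder; use the path of the file you downloaded.
    print(list_tables("empd.sqlite"))
```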


