Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains: rainfall, humidity, temperature, global solar radiation, wind velocity and wind direction ten-minute data from 150 stations of the Meteogalicia network between 1-jan-2000 and 31-dec-2018.
Version installed: postgresql 9.1
Extension installed: postgis 1.5.3-1
Instructions to restore the database:
createdb -E UTF8 -O postgres -U postgres template_postgis
createlang plpgsql -d template_postgis -U postgres
psql -d template_postgis -U postgres -f /usr/share/postgresql/9.1/contrib/postgis-1.5/postgis.sql
psql -d template_postgis -U postgres -f /usr/share/postgresql/9.1/contrib/postgis-1.5/spatial_ref_sys.sql
psql -d template_postgis -U postgres -f /usr/share/postgresql/9.1/contrib/postgis_comments.sql
createdb -U postgres -T template_postgis MeteoGalicia
cat Meteogalicia* | psql MeteoGalicia
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DESCRIPTION
VERSIONS
version1.0.1 fixes problem with functions
version1.0.2 added table dbeel_rivers.rn_rivermouth with GEREM basin, distance to Gibraltar and link to CCM.
version1.0.3 fixes problem with functions
version1.0.4 adds views rn_rna and rn_rne to the database
The SUDOANG project aims at providing common tools to managers to support eel conservation in the SUDOE area (Spain, France and Portugal). VISUANG is the SUDOANG Interactive Web Application that hosts all these tools. The application consists of an eel distribution atlas (GT1), assessments of mortalities caused by turbines and an atlas showing obstacles to migration (GT2), estimates of recruitment and exploitation rate (GT3) and escapement (chosen as a target by the EC for the Eel Management Plans) (GT4). In addition, it includes an interactive map showing sampling results from the pilot basin network produced by GT6.
The eel abundance for the eel atlas and escapement has been obtained using the Eel Density Analysis model (EDA, GT4's product). EDA extrapolates the abundance of eel in sampled river segments to other segments taking into account how the abundance, sex and size of the eels change depending on different parameters. Thus, EDA requires two main data sources: those related to the river characteristics and those related to eel abundance and characteristics.
However, in both cases, data availability was uneven in the SUDOE area. In addition, this information was dispersed among several managers and held in different formats because of the different sampling sources: Water Framework Directive (WFD), Community Framework for the Collection, Management and Use of Data in the Fisheries Sector (EUMAP), Eel Management Plans, research groups, scientific papers and technical reports. Therefore, the first step towards eel abundance estimates covering the whole SUDOE area was to build a joint river and eel database. This report describes the database of river characteristics in the SUDOE area and of eel abundances and their characteristics.
In the case of rivers, two types of information have been collected:
River topology (RN table): a compilation of data on rivers and their topological and hydrographic characteristics in the three countries.
River attributes (RNA table): contains physical attributes that have fed the SUDOANG models.
The estimates of eel abundance and characteristics (size, biomass, sex-ratio and silver) distribution at different scales (river segment, basin, Eel Management Unit (EMU), and country) in the SUDOE area, obtained with the EDA2.3 model, have been compiled in the RNE table (eel predictions).
CURRENT ACTIVE PROJECT
The project is currently active here: gitlab forgemia
TECHNICAL DESCRIPTION TO BUILD THE POSTGRES DATABASE
All tables are in EPSG:3035 (European LAEA). The format is a PostgreSQL database. Other formats (shapefiles, csv) can be downloaded here: SUDOANG gt1 database.
Initial command
cd c:/path/to/my/folder
createdb -U postgres eda2.3
psql -U postgres eda2.3
Within the psql command
create extension "postgis";
create extension "dblink";
create extension "ltree";
create extension "tablefunc";
create schema dbeel_rivers;
create schema france;
create schema spain;
create schema portugal;
-- type \q to quit the psql shell
Now the database is ready to receive the different dumps. The dump files are large. You might not need the part including unit basins or waterbodies. All the tables except waterbodies and unit basins are described in the Atlas. You might need to understand how inheritance works in a database: https://www.postgresql.org/docs/12/tutorial-inheritance.html
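As a minimal sketch of what inheritance means here, the function below prints two queries: one on the parent table and one restricted to the parent with ONLY. It assumes that france.rn, spain.rn and portugal.rn inherit from dbeel_rivers.rn (suggested by the layout of the tables, not stated explicitly), and it can only be run once the rn dumps listed further down have been restored. The function name is illustrative.

```sh
# Print SQL that contrasts a query on the parent table with a query on the
# parent ONLY. Assumption: the country tables inherit from dbeel_rivers.rn.
inheritance_check_sql() {
  cat <<'SQL'
-- Rows seen through the parent table, including rows stored in child tables
SELECT country, count(*) AS n_segments
FROM dbeel_rivers.rn
GROUP BY country
ORDER BY country;

-- Rows physically stored in the parent table itself
SELECT count(*) AS n_parent_only
FROM ONLY dbeel_rivers.rn;
SQL
}

# Usage (requires the restored database):
#   inheritance_check_sql | psql -U postgres -d eda2.3
```

If the assumption holds, the first query should return rows for each country, while the second counts only rows that are stored directly in the parent table.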
These layers contain the topology (see Atlas for detail)
dbeel_rivers.rn
france.rn
spain.rn
portugal.rn
Columns (see Atlas)
gid
idsegment
source
target
lengthm
nextdownidsegment
path
isfrontier
issource
seaidsegment
issea
geom
isendoreic
isinternational
country
dbeel_rivers.rn_rivermouth
seaidsegment
geom (polygon)
gerem_zone_3
gerem_zone_4 (used in EDA)
gerem_zone_5
ccm_wso_id
country
emu_name_short
geom_outlet (point)
name_basin
dist_from_gibraltar_km
name_coast
basin_name
pg_restore -U postgres -d eda2.3 "dbeel_rivers.rn.backup"
pg_restore -U postgres -d eda2.3 "france.rn.backup"
pg_restore -U postgres -d eda2.3 "spain.rn.backup"
pg_restore -U postgres -d eda2.3 "portugal.rn.backup"
for each basin flowing to the sea.
pg_restore -U postgres -d eda2.3 "dbeel_rivers.rn_rivermouth.backup"
psql -U postgres -d eda2.3 -f "function_dbeel_rivers.sql"
This corresponds to tables
dbeel_rivers.rna
france.rna
spain.rna
portugal.rna
Columns (See Atlas)
idsegment
altitudem
distanceseam
distancesourcem
cumnbdam
medianflowm3ps
surfaceunitbvm2
surfacebvm2
strahler
shreeve
codesea
name
pfafriver
pfafsegment
basin
riverwidthm
temperature
temperaturejan
temperaturejul
wettedsurfacem2
wettedsurfaceotherm2
lengthriverm
emu
cumheightdam
riverwidthmsource
slope
dis_m3_pyr_riveratlas
dis_m3_pmn_riveratlas
dis_m3_pmx_riveratlas
drought
drought_type_calc
Code:
pg_restore -U postgres -d eda2.3 "dbeel_rivers.rna.backup"
pg_restore -U postgres -d eda2.3 "france.rna.backup"
pg_restore -U postgres -d eda2.3 "spain.rna.backup"
pg_restore -U postgres -d eda2.3 "portugal.rna.backup"
These layers contain eel data (see Atlas for detail)
dbeel_rivers.rne
france.rne
spain.rne
portugal.rne
Columns (see Atlas)
idsegment
surfaceunitbvm2
surfacebvm2
delta
gamma
density
neel
beel
peel150
peel150300
peel300450
peel450600
peel600750
peel750
nsilver
bsilver
psilver150300
psilver300450
psilver450600
psilver600750
psilver750
psilver
pmale150300
pmale300450
pmale450600
pfemale300450
pfemale450600
pfemale600750
pfemale750
pmale
pfemale
sex_ratio
cnfemale300450
cnfemale450600
cnfemale600750
cnfemale750
cnmale150300
cnmale300450
cnmale450600
cnsilver150300
cnsilver300450
cnsilver450600
cnsilver600750
cnsilver750
cnsilver
delta_tr
gamma_tr
type_fit_delta_tr
type_fit_gamma_tr
density_tr
density_pmax_tr
neel_pmax_tr
nsilver_pmax_tr
density_wd
neel_wd
beel_wd
nsilver_wd
bsilver_wd
sector_tr
year_tr
is_current_distribution_area
is_pristine_distribution_area_1985
Code for restoration
pg_restore -U postgres -d eda2.3 "dbeel_rivers.rne.backup"
pg_restore -U postgres -d eda2.3 "france.rne.backup"
pg_restore -U postgres -d eda2.3 "spain.rne.backup"
pg_restore -U postgres -d eda2.3 "portugal.rne.backup"
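The restore commands for the rn, rna and rne layers all follow the same pattern, so they can be generated in a loop. The sketch below only prints the commands (a dry run); the function name is hypothetical, and the database name and file naming are taken from the commands above. It does not cover dbeel_rivers.rn_rivermouth or function_dbeel_rivers.sql, which still need to be loaded separately.

```sh
# Print the pg_restore commands for every layer and schema.
# The parent schema (dbeel_rivers) comes first within each layer,
# matching the order used in the commands above.
restore_commands() {
  db=${1:-eda2.3}
  for layer in rn rna rne; do
    for schema in dbeel_rivers france spain portugal; do
      printf 'pg_restore -U postgres -d %s "%s.%s.backup"\n' "$db" "$schema" "$layer"
    done
  done
}

# Dry run:  restore_commands
# Execute:  restore_commands | sh
```

Printing first makes it easy to check the file names against the downloaded dumps before anything is written to the database.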
Unit basins are not described in the Atlas. They correspond to the following tables:
dbeel_rivers.basinunit_bu
france.basinunit_bu
spain.basinunit_bu
portugal.basinunit_bu
france.basinunitout_buo
spain.basinunitout_buo
portugal.basinunitout_buo
A unit basin is the simple basin that surrounds a segment. It corresponds to the topographic unit from which the unit segments were calculated. The projection is EPSG:3035. Tables bu_unitbv and bu_unitbvout inherit from dbeel_rivers.unit_bv. The first table intersects with a segment; the second does not, and corresponds to basin polygons that have no river segment.
Source :
Portugal
France
In France, the unit bv corresponds to the RHT (Pella et al., 2012)
Spain
pg_restore -U postgres -d eda2.3 'dbeel_rivers.basinunit_bu.backup'
pg_restore -U postgres -d eda2.3
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub. Based on the results, we proposed and evaluated Julynter, a linting tool for Jupyter Notebooks.
Papers:
This repository contains three files:
Reproducing the Notebook Study
The db2020-09-22.dump.gz file contains a PostgreSQL dump of the database, with all the data we extracted from notebooks. For loading it, run:
gunzip -c db2020-09-22.dump.gz | psql jupyter
Note that this file contains only the database with the extracted data. The actual repositories are available in a Google Drive folder, which also contains the Docker images we used in the reproducibility study. The repositories are stored as content/{hash_dir1}/{hash_dir2}.tar.bz2, where hash_dir1 and hash_dir2 are columns of the repositories table in the database.
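The path convention above can be sketched as a small helper that turns a pair of hash_dir1 and hash_dir2 values into the archive path. The function name and the example hash values are made up for illustration; only the path pattern and the column and table names come from the description.

```sh
# Build the archive path of a repository from its hash_dir1 and hash_dir2 values.
repo_archive_path() {
  printf 'content/%s/%s.tar.bz2\n' "$1" "$2"
}

# Example with made-up hash values:
#   repo_archive_path ab cdef0123
#
# Possible use against the loaded database (unaligned, tuples-only output):
#   psql jupyter -At -F ' ' -c 'SELECT hash_dir1, hash_dir2 FROM repositories LIMIT 5' |
#   while read -r d1 d2; do repo_archive_path "$d1" "$d2"; done
```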
For scripts, notebooks, and detailed instructions on how to analyze or reproduce the data collection, please check the instructions on the Jupyter Archaeology repository (tag 1.0.0)
The sample.tar.gz file contains the repositories obtained during the manual sampling.
Reproducing the Julynter Experiment
The julynter_reproducility.tar.gz file contains all the data collected in the Julynter experiment and the analysis notebooks. Reproducing the analysis is straightforward:
The collected data is stored in the julynter/data folder.
Changelog
2019/01/14 - Version 1 - Initial version
2019/01/22 - Version 2 - Update N8.Execution.ipynb to calculate the rate of failure for each reason
2019/03/13 - Version 3 - Update package for camera ready. Add columns to db to detect duplicates, change notebooks to consider them, and add N1.Skip.Notebook.ipynb and N11.Repository.With.Notebook.Restriction.ipynb.
2021/03/15 - Version 4 - Add Julynter experiment; Update database dump to include new data collected for the second paper; remove scripts and analysis notebooks from this package (moved to GitHub), add a link to Google Drive with collected repository files
Spider 2.0 is a comprehensive code generation agent task that includes 632 examples. The agent has to interactively explore various types of databases, such as BigQuery, Snowflake, Postgres, ClickHouse, DuckDB, and SQLite. It is required to engage with complex SQL workflows, process extensive contexts, perform intricate reasoning, and generate multiple SQL queries with diverse operations, often exceeding 100 lines across multiple interactions.
Location of RYR2 Associated CPVT Variants Dataset
Catecholaminergic polymorphic ventricular tachycardia (CPVT) is a rare inherited arrhythmia caused by pathogenic RYR2 variants. CPVT is characterized by exercise/stress-induced syncope and cardiac arrest in the absence of resting ECG and structural cardiac abnormalities.
Here, we present a database collected from 221 clinical papers, published from 2001 to October 2020, about CPVT-associated RYR2 variants. The database contains 1342 patients with RYR2 variants, both with and without CPVT, of whom 964 are CPVT patients or suspected CPVT patients. The database includes information regarding genetic diagnosis, location of the RYR2 variant(s), clinical history and presentation, and treatment strategies for each patient. The depth of information in each of the provided fields varies between patients.
Database website: https://cpvtdb.port5000.com/
Dataset Information
This dataset includes:
all_data.xlsx
Tabular version of the database
Most relevant tables in the PostgreSQL database regarding patient sex, conditions, treatments, family history, and variant information were joined to create this database
Views calculating the affected RYR2 exons, domains and subdomains have been joined to patient information
Many-to-many (m-n) tables for patients' conditions and treatments have been converted to pivot tables: every condition and treatment recorded for at least 1 person is a column.
NOTE: This was created using a LEFT JOIN of individuals and individual_variants tables. Individuals with more than 1 recorded variant will be listed on multiple rows.
There is only 1 patient in this database with multiple recorded variants (all intronic)
20241219-dd040736b518.sql.gz
PostgreSQL database dump
Expands to about 200MB after loading the database dump
The database includes two schemas:
public: Includes all information in patients and variants
Also includes all RYR2 variants in ClinVar
uta: Contains the rows from biocommons/uta database required to make the hgvs Python package validate RYR2 variants
See https://github.com/biocommons/uta for more information
NOTE: It is recommended to use this version of the database only for development or analysis purposes
database_tables.pdf
Contains information on most of the database tables and columns in the public schema
00_globals.sql
Required to load the PostgreSQL database dump
How To Load Database Using Docker
First, download the 00_globals.sql and _.gz.sql files and move them into a directory. The default postgres image will load files from the /docker-entrypoint-initdb.d directory if the database is empty. See Docker Hub for more information. Mount the directory with the files into /docker-entrypoint-initdb.d.
Example using docker compose with pgadmin and a volume to persist the data.
volumes:
  mydatabasevolume: null
services:
  db:
    image: postgres:16
    restart: always
    environment:
      POSTGRES_PASSWORD: mysecretpassword
      POSTGRES_USER: postgres
    volumes:
      - ':/docker-entrypoint-initdb.d/'
      - 'mydatabasevolume:/var/lib/postgresql/data'
    ports:
      - 5432:5432
  pgadmin:
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: user@domain.com
      PGADMIN_DEFAULT_PASSWORD: SuperSecret
    ports:
      - 8080:80
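The postgres image runs the *.sql, *.sql.gz and *.sh files it finds in /docker-entrypoint-initdb.d in sorted name order, which is why 00_globals.sql is loaded before the dated dump. As a rough sketch, the helper below previews that order for a local directory before it is mounted. The function name is hypothetical, and sorting with the C locale is an assumption that may differ from the container's locale for unusual file names.

```sh
# List the files the postgres image would pick up from an init directory,
# sorted by name. Other files (for example notes or PDFs) are skipped.
initdb_load_order() {
  dir=${1:-.}
  for f in "$dir"/*; do
    case "$f" in
      *.sql|*.sql.gz|*.sh) basename "$f" ;;
    esac
  done | LC_ALL=C sort
}

# Usage: initdb_load_order /path/to/directory/with/dump/files
```

If 00_globals.sql is not listed first, the dump would likely be loaded before the global objects it depends on exist.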
Analysis Code
See https://github.com/alexdaiii/cpvt_database_analysis for source code to create the xlsx file and analysis of the data.
Changelist
v0.3.0
Removed inaccessible publications
Updated publications to include information on what type of publication it is (e.g. Original Article, Abstract, Review, etc.)
v0.2.1
Updated all_patients.xlsx -> all_data.xlsx
Corrected how the data from all the patient's conditions, diseases, treatments, and the patients' variants tables are joined
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset for the research paper "Anthropogenic Specular Interference in the Operational GOES-R Fire Product".
Large reflective structures like solar power plants and commercial greenhouses sometimes reflect sunlight directly into GOES-R sensors. These anthropogenic specular reflections, or "sparkles", cause commission errors in operational GOES-R ABI products like the Fire Detection and Characterization Algorithm (FDCA). Using the abi-sparkle library for Python (Dove-Robinson, 2023), we generated a dataset containing both detected anthropogenic specular reflection pixels and the coincident FDCA commission errors caused by them for the GOES-16 CONUS domain during the 2020 calendar year.
The dataset consists of two exported PostgreSQL tables: sparkle_pixels_g16_abi_conus_2020, which contains the detected anthropogenic specular reflection pixels at 500 m resolution, and fdca_commission_error_clusters_g16_abi_conus_2020, which contains clustered FDCA false alarm fire pixels caused by anthropogenic specular reflection at 2 km resolution. The FDCA pixels were only processed for fire mask codes 10-15 and 30-35; see Table 3.11 in Schmidt et al., 2013 for fire code definitions.
Each row in sparkle_pixels_g16_abi_conus_2020 is a detected specular reflection pixel in a GOES-16 CONUS image from the 2020 calendar year with a unique numeric ID sparkle_id and associated metadata from the detection algorithm abi-sparkle. The column sparkle_geom is a PostGIS geometry ST_Point object that can be used to plot the pixels on a map.
FDCA fire pixels at 2 km resolution were clustered based on their connectivity in a 3x3 pixel kernel and assigned a UUID fire_cluster_id in the table fdca_commission_error_clusters_g16_abi_conus_2020. Only the fire clusters that overlapped with sparkle pixels in time and space were retained in the table. In this way, each row of fdca_commission_error_clusters_g16_abi_conus_2020 is a unique cluster of errant FDCA fire pixels caused by anthropogenic specular reflection in every available scan start time for the GOES-16 CONUS domain in 2020. Every fire cluster centroid has a PostGIS geometry object fire_cluster_centroid_geom that can be used to plot the errant fire pixel clusters on a map.
The two tables are related through the column sparkle_ids in fdca_commission_error_clusters_g16_abi_conus_2020, which is an array of overlapping sparkle IDs from the sparkle_pixels_g16_abi_conus_2020 table. The combined dataset may therefore be generated with a simple SQL INNER JOIN:
SELECT * FROM fdca_commission_error_clusters_g16_abi_conus_2020 fcecgac
INNER JOIN sparkle_pixels_g16_abi_conus_2020 spgac ON spgac.sparkle_id = ANY(fcecgac.sparkle_ids);
The tables can be imported into a PostgreSQL database version 12 or newer with PostGIS extensions installed. For example, to import the tables into a database in a Linux environment, run the following commands:
gunzip -c sparkle_pixels_g16_abi_conus_2020.sql.gz | psql -d your_database_name
gunzip -c fdca_commission_error_clusters_g16_abi_conus_2020.sql.gz | psql -d your_database_name
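If PostGIS is installed on the server but not yet enabled in the target database, the extension has to be created before the import. The sketch below prints the full command sequence as a dry run; the function name is hypothetical, and it assumes the dumps do not enable the extension themselves (CREATE EXTENSION IF NOT EXISTS is harmless if they do).

```sh
# Print the commands to enable PostGIS and import both table dumps.
# "your_database_name" is the placeholder used above; pass a real name as $1.
sparkle_import_commands() {
  db=${1:-your_database_name}
  printf 'psql -d %s -c "CREATE EXTENSION IF NOT EXISTS postgis;"\n' "$db"
  for table in sparkle_pixels_g16_abi_conus_2020 \
               fdca_commission_error_clusters_g16_abi_conus_2020; do
    printf 'gunzip -c %s.sql.gz | psql -d %s\n' "$table" "$db"
  done
}

# Dry run:  sparkle_import_commands mydb
# Execute:  sparkle_import_commands mydb | sh
```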