33 datasets found
  1. Data from: Meteogalicia PostgreSQL Database (2000 - 2018)

    • zenodo.org
    • portalinvestigacion.udc.gal
    bin
    Updated Sep 9, 2024
    Cite
    Jose Vidal-Paz; Jose Vidal-Paz (2024). Meteogalicia PostgreSQL Database (2000 - 2018) [Dataset]. http://doi.org/10.5281/zenodo.11915325
    Explore at:
    Available download formats: bin
    Dataset updated
    Sep 9, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jose Vidal-Paz; Jose Vidal-Paz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database contains ten-minute data on rainfall, humidity, temperature, global solar radiation, wind velocity and wind direction from 150 stations of the MeteoGalicia network between 1 January 2000 and 31 December 2018.

    Version installed: postgresql 9.1

    Extension installed: postgis 1.5.3-1

    Instructions to restore the database:

    1. Create template:

      createdb -E UTF8 -O postgres -U postgres template_postgis

    2. Activate PL/pgSQL language:

      createlang plpgsql -d template_postgis -U postgres

    3. Load definitions of PostGIS:

      psql -d template_postgis -U postgres -f /usr/share/postgresql/9.1/contrib/postgis-1.5/postgis.sql

      psql -d template_postgis -U postgres -f /usr/share/postgresql/9.1/contrib/postgis-1.5/spatial_ref_sys.sql

      psql -d template_postgis -U postgres -f /usr/share/postgresql/9.1/contrib/postgis_comments.sql

    4. Create database with "MeteoGalicia" name with PostGIS extension:

      createdb -U postgres -T template_postgis MeteoGalicia

    5. Restore backup:

      cat Meteogalicia* | psql MeteoGalicia
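    After the restore, a quick sanity check can be run; a minimal sketch (it only verifies that the database answers and that PostGIS is active, since the table names inside the dump are not listed here):

      psql -d MeteoGalicia -U postgres -c "SELECT postgis_full_version();"
      psql -d MeteoGalicia -U postgres -c "\dt"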

  2. Data from: Atlas of European Eel Distribution (Anguilla anguilla) in...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Jul 12, 2024
    Cite
    Domingos, Isabel (2024). Atlas of European Eel Distribution (Anguilla anguilla) in Portugal, Spain and France [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6021837
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Drouineau, Hilaire
    Mateo, Maria
    Amilhat, Elsa
    Fernández-Delgado, Carlos
    De Miguel Rubio, Ramon
    Pella, Herve
    Korta, Maria
    Briand, Cédric
    Zamora, Lluis
    Herrera, Mercedes
    Beaulaton, Laurent
    Díaz, Estibalitz
    Domingos, Isabel
    Bardonnet, Agnès
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Spain, France, Portugal
    Description

    DESCRIPTION

    VERSIONS

    Version 1.0.1 fixes a problem with functions.

    Version 1.0.2 adds table dbeel_rivers.rn_rivermouth with the GEREM basin, distance to Gibraltar and link to CCM.

    Version 1.0.3 fixes a problem with functions.

    Version 1.0.4 adds views rn_rna and rn_rne to the database.

    The SUDOANG project aims at providing managers with common tools to support eel conservation in the SUDOE area (Spain, France and Portugal). VISUANG is the SUDOANG Interactive Web Application that hosts all these tools. The application consists of an eel distribution atlas (GT1), assessments of mortalities caused by turbines and an atlas showing obstacles to migration (GT2), estimates of recruitment and exploitation rate (GT3) and escapement (chosen as a target by the EC for the Eel Management Plans) (GT4). In addition, it includes an interactive map showing sampling results from the pilot basin network produced by GT6.

    The eel abundance for the eel atlas and escapement has been obtained using the Eel Density Analysis model (EDA, GT4's product). EDA extrapolates the abundance of eel in sampled river segments to other segments taking into account how the abundance, sex and size of the eels change depending on different parameters. Thus, EDA requires two main data sources: those related to the river characteristics and those related to eel abundance and characteristics.

    However, in both cases, data availability was uneven in the SUDOE area. In addition, this information was dispersed among several managers and in different formats due to different sampling sources: Water Framework Directive (WFD), Community Framework for the Collection, Management and Use of Data in the Fisheries Sector (EUMAP), Eel Management Plans, research groups, scientific papers and technical reports. Therefore, the first step towards having eel abundance estimations including the whole SUDOE area, was to have a joint river and eel database. In this report we will describe the database corresponding to the river’s characteristics in the SUDOE area and the eel abundances and their characteristics.

    In the case of rivers, two types of information have been collected:

    River topology (RN table): a compilation of data on rivers and their topological and hydrographic characteristics in the three countries.

    River attributes (RNA table): contains physical attributes that have fed the SUDOANG models.

    The estimation of eel abundance and characteristics (size, biomass, sex-ratio and silver) distribution at different scales (river segment, basin, Eel Management Unit (EMU), and country) in the SUDOE area, obtained with the implementation of the EDA2.3 model, has been compiled in the RNE table (eel predictions).

    CURRENT ACTIVE PROJECT

    The project is currently active here: gitlab forgemia.

    TECHNICAL DESCRIPTION TO BUILD THE POSTGRES DATABASE

    1. Build the database in postgres.

    All tables are in EPSG:3035 (European LAEA). The format is a PostgreSQL database. You can download other formats (shapefiles, csv) here: SUDOANG gt1 database.

    Initial command

    Open a shell (on Windows, the CMD command prompt).

    Move to the place where you have downloaded the file using the following command

    cd c:/path/to/my/folder

    Note: psql must be accessible. On Windows you can add the PostgreSQL bin folder to the PATH; otherwise you need to use the full path to the PostgreSQL bin folder (see link to instructions below).

      createdb -U postgres eda2.3
      psql -U postgres eda2.3

    This will open a psql prompt (#) where you can launch the commands in the next box.

    Within the psql command

    create extension "postgis"; create extension "dblink"; create extension "ltree"; create extension "tablefunc"; create schema dbeel_rivers; create schema france; create schema spain; create schema portugal; -- type \q to quit the psql shell

    Now the database is ready to receive the different dumps. The dump files are large; you might not need the parts including unit basins or waterbodies. All the tables except waterbodies and unit basins are described in the Atlas. You might need to understand what inheritance is in a database: https://www.postgresql.org/docs/12/tutorial-inheritance.html
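    Since the country schemas inherit from the dbeel_rivers tables, a tiny illustration of PostgreSQL inheritance may help; the table and column names below are invented for the example and are not part of the SUDOANG schema:

      psql -U postgres -d eda2.3 -c "
        -- a parent table and a child table that inherits its columns
        CREATE TABLE demo_parent (idsegment text, lengthm numeric);
        CREATE TABLE demo_child () INHERITS (demo_parent);
        INSERT INTO demo_child VALUES ('seg-1', 123.4);
        SELECT * FROM demo_parent;   -- rows inserted into the child are also returned here
        DROP TABLE demo_child, demo_parent;
      "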

    2. RN (riversegments)

    These layers contain the topology (see Atlas for detail)

    dbeel_rivers.rn

    france.rn

    spain.rn

    portugal.rn

    Columns (see Atlas)

        gid, idsegment, source, target, lengthm, nextdownidsegment, path, isfrontier, issource, seaidsegment, issea, geom, isendoreic, isinternational, country

    dbeel_rivers.rn_rivermouth

        seaidsegment, geom (polygon), gerem_zone_3, gerem_zone_4 (used in EDA), gerem_zone_5, ccm_wso_id, country, emu_name_short, geom_outlet (point), name_basin, dist_from_gibraltar_km, name_coast, basin_name

    dbeel_rivers.rn is mandatory: it is the table at the international level from which the other tables inherit, so download and restore it first even if you do not want to use the other countries (in many cases you should, since there are transboundary catchments). The rn network must be restored first; the rne and rna tables refer to it by foreign keys.

      pg_restore -U postgres -d eda2.3 "dbeel_rivers.rn.backup"

    france

    pg_restore -U postgres -d eda2.3 "france.rn.backup"

    spain

    pg_restore -U postgres -d eda2.3 "spain.rn.backup"

    portugal

    pg_restore -U postgres -d eda2.3 "portugal.rn.backup"

    Rivermouth and basins: this file contains the GEREM basins, the distance to Gibraltar and the link to the CCM id for each basin flowing to the sea.

      pg_restore -U postgres -d eda2.3 "dbeel_rivers.rn_rivermouth.backup"

    With the schema you will probably want to be able to use the functions, but launch this only after restoring rna in the next step:

      psql -U postgres -d eda2.3 -f "function_dbeel_rivers.sql"

    3. RNA (Attributes)

    This corresponds to tables

    dbeel_rivers.rna

    france.rna

    spain.rna

    portugal.rna

    Columns (See Atlas)

        idsegment, altitudem, distanceseam, distancesourcem, cumnbdam, medianflowm3ps, surfaceunitbvm2, surfacebvm2, strahler, shreeve, codesea, name, pfafriver, pfafsegment, basin, riverwidthm, temperature, temperaturejan, temperaturejul, wettedsurfacem2, wettedsurfaceotherm2, lengthriverm, emu, cumheightdam, riverwidthmsource, slope, dis_m3_pyr_riveratlas, dis_m3_pmn_riveratlas, dis_m3_pmx_riveratlas, drought, drought_type_calc

    Code:

      pg_restore -U postgres -d eda2.3 "dbeel_rivers.rna.backup"
      pg_restore -U postgres -d eda2.3 "france.rna.backup"
      pg_restore -U postgres -d eda2.3 "spain.rna.backup"
      pg_restore -U postgres -d eda2.3 "portugal.rna.backup"

    4. RNE (eel predictions)

    These layers contain eel data (see Atlas for detail)

    dbeel_rivers.rne

    france.rne

    spain.rne

    portugal.rne

    Columns (see Atlas)

        idsegment, surfaceunitbvm2, surfacebvm2, delta, gamma, density, neel, beel, peel150, peel150300, peel300450, peel450600, peel600750, peel750, nsilver, bsilver, psilver150300, psilver300450, psilver450600, psilver600750, psilver750, psilver, pmale150300, pmale300450, pmale450600, pfemale300450, pfemale450600, pfemale600750, pfemale750, pmale, pfemale, sex_ratio, cnfemale300450, cnfemale450600, cnfemale600750, cnfemale750, cnmale150300, cnmale300450, cnmale450600, cnsilver150300, cnsilver300450, cnsilver450600, cnsilver600750, cnsilver750, cnsilver, delta_tr, gamma_tr, type_fit_delta_tr, type_fit_gamma_tr, density_tr, density_pmax_tr, neel_pmax_tr, nsilver_pmax_tr, density_wd, neel_wd, beel_wd, nsilver_wd, bsilver_wd, sector_tr, year_tr, is_current_distribution_area, is_pristine_distribution_area_1985

    Code for restoration:

      pg_restore -U postgres -d eda2.3 "dbeel_rivers.rne.backup"
      pg_restore -U postgres -d eda2.3 "france.rne.backup"
      pg_restore -U postgres -d eda2.3 "spain.rne.backup"
      pg_restore -U postgres -d eda2.3 "portugal.rne.backup"

    5. Unit basins

    Unit basins are not described in the Atlas. They correspond to the following tables:

    dbeel_rivers.basinunit_bu

    france.basinunit_bu

    spain.basinunit_bu

    portugal.basinunit_bu

    france.basinunitout_buo

    spain.basinunitout_buo

    portugal.basinunitout_buo

    A unit basin is the simple basin that surrounds a segment. It corresponds to the topographic unit from which the unit segments have been calculated (EPSG:3035). Tables bu_unitbv and bu_unitbvout inherit from dbeel_rivers.unit_bv. The first table intersects with a segment; the second does not and corresponds to basin polygons which do not have a river segment.

    Source :

    Portugal

    https://sniambgeoviewer.apambiente.pt/Geodocs/gml/inspire/HY_PhysicalWaters_DrainageBasinGeoCod.zip

    France

    In France the unit bv corresponds to the RHT (Pella et al., 2012).

    Spain

    http://www.mapama.gob.es/ide/metadatos/index.html?srv=metadata.show&uuid=898f0ff8-f06c-4c14-88f7-43ea90e48233

    pg_restore -U postgres -d eda2.3 'dbeel_rivers.basinunit_bu.backup'

    france

    pg_restore -U postgres -d eda2.3

  3. Additional file 1: of VarGenius executes cohort-level DNA-seq variant...

    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    + more versions
    Cite
    F. Musacchia; A. Ciolfi; M. Mutarelli; A. Bruselles; R. Castello; M. Pinelli; S. Basu; S. Banfi; G. Casari; M. Tartaglia; V. Nigro (2023). Additional file 1: of VarGenius executes cohort-level DNA-seq variant calling and annotation and allows to manage the resulting data through a PostgreSQL database [Dataset]. http://doi.org/10.6084/m9.figshare.7460612.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    F. Musacchia; A. Ciolfi; M. Mutarelli; A. Bruselles; R. Castello; M. Pinelli; S. Basu; S. Banfi; G. Casari; M. Tartaglia; V. Nigro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An example sample sheet containing sample information that is used to start an analysis in VarGenius. (TSV 330 bytes)

  4. LinkDB - a Postgresql database of close to 500M public global LinkedIn...

    • datarade.ai
    .sql
    Updated Jan 27, 2023
    Cite
    Nubela (2023). LinkDB - a Postgresql database of close to 500M public global LinkedIn profiles [Dataset]. https://datarade.ai/data-products/linkdb-a-postgresql-database-of-more-than-400m-public-linke-nubela
    Explore at:
    Available download formats: .sql
    Dataset updated
    Jan 27, 2023
    Dataset authored and provided by
    Nubela
    Area covered
    Andorra, Moldova (Republic of), Myanmar, French Guiana, Guinea, Greenland, Cayman Islands, British Indian Ocean Territory, Pitcairn, Montserrat
    Description

    LinkDB is an exhaustive dataset of publicly accessible LinkedIn people and company profiles, containing close to 500M profiles by region.

    LinkDB is updated at a rate of up to millions of profiles daily at the point of purchase. Post-purchase, you can keep LinkDB updated quarterly for a nominal fee.

    Data is shipped in Apache Parquet, a column-oriented data file format.

    Our data and procedures meet major legal compliance requirements such as GDPR and CCPA. We help you be compliant too.

  5. Global Private Equity (PE) Funding Data | Refreshed 2x/Mo | Delivery Hourly...

    • datarade.ai
    .json, .csv, .sql
    Cite
    Forager.ai, Global Private Equity (PE) Funding Data | Refreshed 2x/Mo | Delivery Hourly via CSV/JSON/PostgreSQL DB Delivery | Company Data [Dataset]. https://datarade.ai/data-products/global-private-equity-pe-funding-data-refreshed-2x-mo-d-forager-ai
    Explore at:
    Available download formats: .json, .csv, .sql
    Dataset provided by
    Forager.ai
    Area covered
    Albania, Bermuda, Jamaica, Bosnia and Herzegovina, Barbados, Bouvet Island, Iceland, Andorra, Côte d'Ivoire, Liechtenstein
    Description

    The Forager.ai Global Private Equity (PE) Funding Data Set is a leading source of firmographic data, backed by advanced AI and offering the highest refresh rate in the industry.

    | Volume and Stats |

    • Every company record refreshed twice a month, offering an unparalleled update frequency.
    • Delivery is made every hour, ensuring you have the latest data at your fingertips.
    • Each record is the result of an advanced AI-driven process, ensuring high-quality, accurate data.

    | Use Cases |

    Sales Platforms, ABM and Intent Data Platforms, Identity Platforms, Data Vendors:

    Example applications include:

    1. Uncover trending technologies or tools gaining popularity.

    2. Pinpoint lucrative business prospects by identifying similar solutions utilized by a specific company.

    3. Study a company's tech stacks to understand the technical capability and skills available within that company.

    B2B Tech Companies:

    • Enrich leads that sign-up through the Company Search API (available separately).
    • Identify and map every company that fits your core personas and ICP.
    • Build audiences to target, using key fields like location, company size, industry, and description.

    Venture Capital and Private Equity:

    • Discover new investment opportunities using company descriptions and industry-level data.
    • Review the growth of private companies and benchmark their strength against competitors.
    • Create high-level views of companies competing in popular verticals for investment.

    | Delivery Options |

    • Flat files via S3 or GCP
    • PostgreSQL Shared Database
    • PostgreSQL Managed Database
    • API
    • Other options available upon request, depending on the scale required

    Our dataset provides a unique blend of volume, freshness, and detail that is perfect for Sales Platforms, B2B Tech, VCs & PE firms, Marketing Automation, ABM & Intent. It stands as a cornerstone in our broader data offering, ensuring you have the information you need to drive decision-making and growth.

    Tags: Company Data, Company Profiles, Employee Data, Firmographic Data, AI-Driven Data, High Refresh Rate, Company Classification, Private Market Intelligence, Workforce Intelligence, Public Companies.

  6. PostGIS integration in CyberGIS-Jupyter for Water (CJW) platform

    • search.dataone.org
    • hydroshare.org
    • +1more
    Updated Apr 15, 2022
    Cite
    Weiye Chen; Shaohua Wang (2022). PostGIS integration in CyberGIS-Jupyter for Water (CJW) platform [Dataset]. https://search.dataone.org/view/sha256%3Acb0742b2847d905f742211f4f9e50f2232a0b8352b09b8e55c4778aafc6a44be
    Explore at:
    Dataset updated
    Apr 15, 2022
    Dataset provided by
    Hydroshare
    Authors
    Weiye Chen; Shaohua Wang
    Description

    This example demonstrates how to use PostGIS capabilities in the CyberGIS-Jupyter notebook environment. Modified from a notebook by Weiye Chen (weiyec2@illinois.edu).

    PostGIS is an extension to the PostgreSQL object-relational database system which allows GIS (Geographic Information Systems) objects to be stored in the database. PostGIS includes support for GiST-based R-Tree spatial indices, and functions for analysis and processing of GIS objects.

    Resources for PostGIS:

    Manual: https://postgis.net/docs/. In this demo, we use PostGIS 3.0; note that significant API changes have been made compared to version 2.x. This demo assumes that you have basic knowledge of SQL.
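    As a flavour of what the notebook exercises, a minimal PostGIS 3.0 query could look like the following; the connection parameters and database name are assumptions, not part of this resource:

      # buffer a point by 1000 m on the geography type and return the result as WKT
      psql -h localhost -U demo -d gisdb -c \
        "SELECT ST_AsText(ST_Buffer(ST_GeomFromText('POINT(0 0)', 4326)::geography, 1000));"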

  7. Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...

    • zenodo.org
    application/gzip
    Updated Mar 16, 2021
    + more versions
    Cite
    João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks / Understanding and Improving the Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.3519618
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Mar 16, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub. Based on the results, we proposed and evaluated Julynter, a linting tool for Jupyter Notebooks.

    Papers:

    This repository contains three files:

    Reproducing the Notebook Study

    The db2020-09-22.dump.gz file contains a PostgreSQL dump of the database, with all the data we extracted from notebooks. For loading it, run:

    gunzip -c db2020-09-22.dump.gz | psql jupyter
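    The target database has to exist before the dump is piped in; a minimal sketch of the full sequence, assuming a local server reachable as the postgres user and the database name jupyter used above:

      createdb -U postgres jupyter
      gunzip -c db2020-09-22.dump.gz | psql -U postgres jupyter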

    Note that this file contains only the database with the extracted data. The actual repositories are available in a google drive folder, which also contains the docker images we used in the reproducibility study. The repositories are stored as content/{hash_dir1}/{hash_dir2}.tar.bz2, where hash_dir1 and hash_dir2 are columns of repositories in the database.

    For scripts, notebooks, and detailed instructions on how to analyze or reproduce the data collection, please check the instructions on the Jupyter Archaeology repository (tag 1.0.0)

    The sample.tar.gz file contains the repositories obtained during the manual sampling.

    Reproducing the Julynter Experiment

    The julynter_reproducibility.tar.gz file contains all the data collected in the Julynter experiment and the analysis notebooks. Reproducing the analysis is straightforward:

    • Uncompress the file: $ tar zxvf julynter_reproducibility.tar.gz
    • Install the dependencies: $ pip install -r julynter/requirements.txt
    • Run the notebooks in order: J1.Data.Collection.ipynb; J2.Recommendations.ipynb; J3.Usability.ipynb.

    The collected data is stored in the julynter/data folder.

    Changelog

    2019/01/14 - Version 1 - Initial version
    2019/01/22 - Version 2 - Update N8.Execution.ipynb to calculate the rate of failure for each reason
    2019/03/13 - Version 3 - Update package for camera ready. Add columns to db to detect duplicates, change notebooks to consider them, and add N1.Skip.Notebook.ipynb and N11.Repository.With.Notebook.Restriction.ipynb.
    2021/03/15 - Version 4 - Add Julynter experiment; Update database dump to include new data collected for the second paper; remove scripts and analysis notebooks from this package (moved to GitHub), add a link to Google Drive with collected repository files

  8. Technographic Data | B2B Data | 22M Records | Refreshed 2x/Mo | Delivery...

    • datarade.ai
    .json, .csv, .sql
    Updated Sep 30, 2024
    + more versions
    Cite
    Forager.ai (2024). Technographic Data | B2B Data | 22M Records | Refreshed 2x/Mo | Delivery Hourly via CSV/JSON/PostgreSQL DB Delivery [Dataset]. https://datarade.ai/data-products/technographic-data-b2b-data-22m-records-refreshed-2x-mo-forager-ai
    Explore at:
    Available download formats: .json, .csv, .sql
    Dataset updated
    Sep 30, 2024
    Dataset provided by
    Forager.ai
    Area covered
    Barbados, United Kingdom, Czech Republic, Singapore, Denmark, Uzbekistan, Brazil, Congo, Uganda, Anguilla
    Description

    The Forager.ai Global Dataset is a leading source of firmographic data, backed by advanced AI and offering the highest refresh rate in the industry.

    | Volume and Stats |

    • Over 22M total records, the highest volume in the industry today.
    • Every company record refreshed twice a month, offering an unparalleled update frequency.
    • Delivery is made every hour, ensuring you have the latest data at your fingertips.
    • Each record is the result of an advanced AI-driven process, ensuring high-quality, accurate data.

    | Use Cases |

    Sales Platforms, ABM and Intent Data Platforms, Identity Platforms, Data Vendors:

    Example applications include:

    1. Uncover trending technologies or tools gaining popularity.

    2. Pinpoint lucrative business prospects by identifying similar solutions utilized by a specific company.

    3. Study a company's tech stacks to understand the technical capability and skills available within that company.

    B2B Tech Companies:

    • Enrich leads that sign-up through the Company Search API (available separately).
    • Identify and map every company that fits your core personas and ICP.
    • Build audiences to target, using key fields like location, company size, industry, and description.

    Venture Capital and Private Equity:

    • Discover new investment opportunities using company descriptions and industry-level data.
    • Review the growth of private companies and benchmark their strength against competitors.
    • Create high-level views of companies competing in popular verticals for investment.

    | Delivery Options |

    • Flat files via S3 or GCP
    • PostgreSQL Shared Database
    • PostgreSQL Managed Database
    • API
    • Other options available upon request, depending on the scale required

    Our dataset provides a unique blend of volume, freshness, and detail that is perfect for Sales Platforms, B2B Tech, VCs & PE firms, Marketing Automation, ABM & Intent. It stands as a cornerstone in our broader data offering, ensuring you have the information you need to drive decision-making and growth.

    Tags: Company Data, Company Profiles, Employee Data, Firmographic Data, AI-Driven Data, High Refresh Rate, Company Classification, Private Market Intelligence, Workforce Intelligence, Public Companies.

  9. Up-to-date mapping of COVID-19 treatment and vaccine development...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, png
    Updated Jul 19, 2024
    Cite
    Tomáš Wagner; Ivana Mišová; Ivana Mišová; Ján Frankovský; Ján Frankovský; Tomáš Wagner (2024). Up-to-date mapping of COVID-19 treatment and vaccine development (covid19-help.org data dump) [Dataset]. http://doi.org/10.5281/zenodo.4601446
    Explore at:
    Available download formats: csv, png, bin
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tomáš Wagner; Ivana Mišová; Ivana Mišová; Ján Frankovský; Ján Frankovský; Tomáš Wagner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The free database mapping COVID-19 treatment and vaccine development based on the global scientific research is available at https://covid19-help.org/.

    Files provided here are curated partial data exports in the form of .csv files, or a full data export as a .sql script generated with pg_dump from our PostgreSQL 12 database. You can also find a .png file with the ER diagram of the tables contained in the .sql file in this repository.
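    Restoring the full .sql export could look like the following; a hedged sketch in which the database name and the dump file name are placeholders, since the exact file name is not listed here:

      createdb -U postgres covid19_help
      psql -U postgres -d covid19_help -f covid19-help-dump.sql   # placeholder file name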

    Structure of CSV files

    *On our site, compounds are named as substances

    compounds.csv

    1. Id - Unique identifier in our database (unsigned integer)

    2. Name - Name of the Substance/Compound (string)

    3. Marketed name - The marketed name of the Substance/Compound (string)

    4. Synonyms - Known synonyms (string)

    5. Description - Description (HTML code)

    6. Dietary sources - Dietary sources where the Substance/Compound can be found (string)

    7. Dietary sources URL - Dietary sources URL (string)

    8. Formula - Compound formula (HTML code)

    9. Structure image URL - Url to our website with the structure image (string)

    10. Status - Status of approval (string)

    11. Therapeutic approach - Approach in which Substance/Compound works (string)

    12. Drug status - Availability of Substance/Compound (string)

    13. Additional data - Additional data in stringified JSON format with data as prescribing information and note (string)

    14. General information - General information about Substance/Compound (HTML code)

    references.csv

    1. Id - Unique identifier in our database (unsigned integer)

    2. Impact factor - Impact factor of the scientific article (string)

    3. Source title - Title of the scientific article (string)

    4. Source URL - URL link of the scientific article (string)

    5. Tested on species - What testing model was used for the study (string)

    6. Published at - Date of publication of the scientific article (Date in ISO 8601 format)

    clinical-trials.csv

    1. Id - Unique identifier in our database (unsigned integer)

    2. Title - Title of the clinical trial study (string)

    3. Acronym title - Acronym of title of the clinical trial study (string)

    4. Source id - Unique identifier in the source database

    5. Source id optional - Optional identifier in other databases (string)

    6. Interventions - Description of interventions (string)

    7. Study type - Type of the conducted study (string)

    8. Study results - Has results? (string)

    9. Phase - Current phase of the clinical trial (string)

    10. Url - URL to clinical trial study page on clinicaltrials.gov (string)

    11. Status - Status in which study currently is (string)

    12. Start date - Date at which study was started (Date in ISO 8601 format)

    13. Completion date - Date at which study was completed (Date in ISO 8601 format)

    14. Additional data - Additional data in the form of stringified JSON with data as locations of study, study design, enrollment, age, outcome measures (string)

    compound-reference-relations.csv

    1. Reference id - Id of a reference in our DB (unsigned integer)

    2. Compound id - Id of a substance in our DB (unsigned integer)

    3. Note - Id of a substance in our DB (unsigned integer)

    4. Is supporting - Is evidence supporting or contradictory (Boolean, true if supporting)

    compound-clinical-trial.csv

    1. Clinical trial id - Id of a clinical trial in our DB (unsigned integer)

    2. Compound id - Id of a Substance/Compound in our DB (unsigned integer)

    tags.csv

    1. Id - Unique identifier in our database (unsigned integer)

    2. Name - Name of the tag (string)

    tags-entities.csv

    1. Tag id - Id of a tag in our DB (unsigned integer)

    2. Reference id - Id of a reference in our DB (unsigned integer)

    API Specification

    Our project also has an Open API that gives you access to our data in a format suitable for processing, particularly in JSON format.

    https://covid19-help.org/api-specification

    Services are split into five endpoints:

    • Substances - /api/substances

    • References - /api/references

    • Substance-reference relations - /api/substance-reference-relations

    • Clinical trials - /api/clinical-trials

    • Clinical trials-substances relations - /api/clinical-trials-substances

    Method of providing data

    • All dates are text strings formatted in compliance with ISO 8601 as YYYY-MM-DD

    • If the syntax request is incorrect (missing or incorrectly formatted parameters) an HTTP 400 Bad Request response will be returned. The body of the response may include an explanation.

    • Data updated_at (used for querying changed-from) refers only to a particular entity and not its logical relations. Example: If a new substance reference relation is added, but the substance detail has not changed, this is reflected in the substance reference relation endpoint where a new entity with id and current dates in created_at and updated_at fields will be added, but in substances or references endpoint nothing has changed.

    The recommended way of sequential download

    • During the first download, it is possible to obtain all data by entering an old enough date in the changed-from parameter, for example changed-from=2020-01-01. It is important to write down the date on which receiving the data was initiated, let's say 2020-10-20.

    • For repeated data downloads, it is sufficient to receive only the records in which something has changed. It can therefore be requested with the parameter changed-from=2020-10-20 (example from the previous bullet). Again, it is important to write down the date when the updates were downloaded (e.g. 2020-10-20). This date will be used in the next update (refresh) of the data. This procedure is sketched below.
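    • A minimal sketch of this sequential download, using curl and jq; the start date, page size and output file are assumptions, not part of the API specification:

      BASE="https://covid19-help.org/api"
      CHANGED_FROM="2020-01-01"      # on later runs, use the date of the previous download
      LAST_ID=0
      while : ; do
        PAGE=$(curl -s "$BASE/references?changed-from=$CHANGED_FROM&continue-after-id=$LAST_ID&limit=1000")
        echo "$PAGE" | jq -c '.entities[]' >> references.jsonl   # one entity per line
        REMAINING=$(echo "$PAGE" | jq '.number_of_remaining_ids')
        LAST_ID=$(echo "$PAGE" | jq '.entities[-1].id')
        [ "$REMAINING" -gt 0 ] || break
      done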

    Services for entities

    List of endpoint URLs:

    Format of the request

    All endpoints have these parameters in common:

    • changed-from - a parameter to return only the entities that have been modified on a given date or later.

    • continue-after-id - a parameter to return only the entities that have a larger ID than specified in the parameter.

    • limit - a parameter to return only the number of records specified (up to 1000). The preset number is 100.

    Request example:

    /api/references?changed-from=2020-01-01&continue-after-id=1&limit=100

    Format of the response

    The response format is the same for all endpoints.

    • number_of_remaining_ids - the number of remaining entities that meet the specified criteria but are not displayed on the page. An integer of virtually unlimited size.

    • entities - an array of entity details in JSON format.

    Response example:

    {
      "number_of_remaining_ids": 100,
      "entities": [
        {
          "id": 3,
          "url": "https://www.ncbi.nlm.nih.gov/pubmed/32147628",
          "title": "Discovering drugs to treat coronavirus disease 2019 (COVID-19).",
          "impact_factor": "Discovering drugs to treat coronavirus disease 2019 (COVID-19).",
          "tested_on_species": "in silico",
          "publication_date": "2020-22-02",
          "created_at": "2020-30-03",
          "updated_at": "2020-31-03",
          "deleted_at": null
        },
        {
          "id": 4,
          "url": "https://www.ncbi.nlm.nih.gov/pubmed/32157862",
          "title": "CT Manifestations of Novel Coronavirus Pneumonia: A Case Report",
          "impact_factor": "CT Manifestations of Novel Coronavirus Pneumonia: A Case Report",
          "tested_on_species": "Patient",
          "publication_date": "2020-06-03",
          "created_at": "2020-30-03",
          "updated_at": "2020-30-03",
          "deleted_at": null
        }
      ]
    }

    Endpoint details

    Substances

    URL: /api/substances

    Substances

  10. Salmonid management around the Channel (SAMARCH). Estimating salmonid growth...

    • b2find.eudat.eu
    Updated Aug 22, 2024
    Cite
    (2024). Salmonid management around the Channel (SAMARCH). Estimating salmonid growth and survival at sea: a database to manage samples, data, and results of analyses. - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/1b5acd49-d7cc-52d2-998b-b44027e1c2b0
    Explore at:
    Dataset updated
    Aug 22, 2024
    Description

    In order for the data on salmon and sea trout produced by SAMARCH to be FAIR (Findable, Accessible, Interoperable, Reusable), all data were put together in file formats that can be read by anyone without computer skills and in international standards. Internally, the data were stored in a PostgreSQL database or in Excel files. We made interfaces or extractions in .csv format in order to make the data available to the scientific community. The data concern the samples used and the analyses performed in the SAMARCH project. In total, 17133 biological samples were used to obtain 3 types of results (growth, sex and genetic characteristics). 14756 growth analyses, 12633 sex analyses, 1182 genetic analyses and 13682 photos were produced using 5 different protocols (scale reading and growth measurement, genetic sexing, genotyping, tracking and acoustic).

    Samples

    As part of the SAMARCH project, 14756 scales of salmon and sea trout were used for age determination, growth measurement and sexing. In addition, 1099 fin clips were preserved in alcohol and used for genetic analysis (Figure 1). The samples are stored and managed by the organisations that collected them. Some of them are managed by the Colisa Biological Resource Centre (Marchand et al., 2018), which makes them visible and available through an online catalogue. This online catalogue has been improved with the SAMARCH funding and displays all the samples collected in France (Figure 2). A first home page gives access to the description of Colisa and a summary of the number of samples per species and per type of tissue. Access to more detailed information on the samples and to the request form is possible after registering on the website. Finally, thanks to the interoperability of the data and in order to widen access to the samples, data are also integrated into the international databases of the Global Biodiversity Information Facility (GBIF) and the Global Genome Biodiversity Network (GGBN).

    Images and analysis

    From the samples, different variables could be measured, and the value of a variable is defined as the result of an analysis. Some results (age and sex) are made available immediately in an Excel file. This file is built from scripts that were used in the SAMARCH project and can be reused in future research programmes. Depending on the type of variable, the results can be compiled directly into a "master" file or from links to other files stored in a directory linked to the file. The raw data files are very large (71 GB) and are therefore not stored online. Therefore, the name of a contact person is provided for each sample. Also, some results (e.g. genetics) are only accessible upon request to the contact person.

    File description

    The file is composed of two tabs, one for the different fields describing the samples and a second one to make the link between the sample and the associated data files.

    1st tab:

    • Index: Unique id linking the analysis performed with the growth or image data files (second tab).
    • The first 3 fields (study site, sample type and sample code) guarantee the uniqueness of the sample code, because the different partners may use the same code for different samples. This makes it possible to find the analyses carried out on a unique sample of interest.
    • Site: The different study sites correspond to an internal nomenclature and correspond to the study sites of the ORE DiaPFC located in Brittany and Normandy (Bresle, Oir tributary of the Selune and Scorff), to the Centre d'interprétation des captures de Salmonidés (CNICS) and to the English study sites (‘Autres’).
    • Type of sample: fin clip or scale.
    • Sample code: sample code defined by each partner.
    • Phenotype observed: Atlantic Salmon, Brown trout and Sea trout.
    • Catch number: This is used to link different samples from the same catch operation.
    • Catch date: this is the date when the sample was collected.
    • Catch site: Watercourse where the fish was caught.
    • Size (mm): Total length of the fish for the CNICS study sites (fish caught by anglers) and fork length for the other study sites. Measurement is in millimetres.
    • Weight (g): Weight of the fish, in grams.
    • Individual tagging: Individual mark identifier, when available.
    • Type of marking: Pit tag, RFID, Carlin tag, Floytag and visible implant, when available.
    • Protocol: Protocol of analysis that was carried out on the sample or on the fish from which the sample was taken: scale reading and growth measurement, genetic sexing, genotyping, acoustic tracking.
    • Result: Value of the result of the analysis for the variable of interest.
    • Contact: Person to contact for more information about the sample.

    Second tab: Attachments

    • Index: Unique id linking the sample to the analysis performed (first tab).
    • File: Link to the corresponding file.

  11. Long-term tree inventory dataset from the permanent sampling plot in the...

    • gbif.org
    Updated Aug 20, 2021
    Cite
    Olga V. Smirnova; Maxim V. Bobrovsky; Roman V. Popadiouk; Maxim P. Shashkov; Larisa G. Khanina; Natalya V. Ivanova; Vladimir N. Shanin; Miroslav N. Stamenov; Sergey I. Chumachenko; Olga V. Smirnova; Maxim V. Bobrovsky; Roman V. Popadiouk; Maxim P. Shashkov; Larisa G. Khanina; Natalya V. Ivanova; Vladimir N. Shanin; Miroslav N. Stamenov; Sergey I. Chumachenko (2021). Long-term tree inventory dataset from the permanent sampling plot in the broadleaved forest of European Russia [Dataset]. http://doi.org/10.15468/mu99hf
    Explore at:
    Dataset updated
    Aug 20, 2021
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    State Nature Reserve "Kaluzhskie Zaseki"
    Authors
    Olga V. Smirnova; Maxim V. Bobrovsky; Roman V. Popadiouk; Maxim P. Shashkov; Larisa G. Khanina; Natalya V. Ivanova; Vladimir N. Shanin; Miroslav N. Stamenov; Sergey I. Chumachenko; Olga V. Smirnova; Maxim V. Bobrovsky; Roman V. Popadiouk; Maxim P. Shashkov; Larisa G. Khanina; Natalya V. Ivanova; Vladimir N. Shanin; Miroslav N. Stamenov; Sergey I. Chumachenko
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    This occurrence dataset provides primary data on repeated tree measurement of two inventories on the permanent sampling plot (8.8 ha) established in the old-growth polydominant broadleaved forest stand in the “Kaluzhskie Zaseki” State Nature Reserve (center of the European part of Russian Federation). The time span between the inventories was 30 years, and a total of more than 11 000 stems were included in the study (11 tree species and 3 genera). During the measurements, the tree species (for some trees only genus was determined), stem diameter at breast height of 1.3 m (DBH), and life status were recorded for every individual stem, and some additional attributes were determined for some trees. Field data were digitized and compiled into the PostgreSQL database. Deep data cleaning and validation (with documentation of changes) has been performed before data standardization according to the Darwin Core standard.

    Primary data are presented from two tree inventories carried out on a permanent sampling plot (8.8 ha) established in an old-growth polydominant broadleaved forest in the "Kaluzhskie Zaseki" reserve. The inventories were carried out 30 years apart, and in total more than 11 000 recording units were studied (trees of 11 species and 3 genera). For each recording unit the species, the diameter at a height of 1.3 m and the status were determined; for some trees additional characteristics were also measured. All field data were digitized and organized into a database in PostgreSQL. Before standardizing the data according to Darwin Core, a thorough check was performed, and all changes made were documented.

  12. Location of Ryanodine Receptor Type 2 Associated Catecholaminergic...

    • explore.openaire.eu
    • data.niaid.nih.gov
    • +1more
    Updated Jul 19, 2024
    Cite
    Alexander Chang; Halil Beqaj; Leah Sittenfeld; Marco Miotto; Haikel Dridi; Gloria Willson; Carolyn Jorge Martinez; Jaan Altosaar Li; Steven Reiken; Yang Liu; Zonglin Dai; Andrew Marks (2024). Location of Ryanodine Receptor Type 2 Associated Catecholaminergic Polymorphic Ventricular Tachycardia Variants Dataset [Dataset]. http://doi.org/10.5281/zenodo.12786084
    Explore at:
    Dataset updated
    Jul 19, 2024
    Authors
    Alexander Chang; Halil Beqaj; Leah Sittenfeld; Marco Miotto; Haikel Dridi; Gloria Willson; Carolyn Jorge Martinez; Jaan Altosaar Li; Steven Reiken; Yang Liu; Zonglin Dai; Andrew Marks
    Description

    Location of RYR2 Associated CPVT Variants Dataset

    Catecholaminergic polymorphic ventricular tachycardia (CPVT) is a rare inherited arrhythmia caused by pathogenic RYR2 variants. CPVT is characterized by exercise/stress-induced syncope and cardiac arrest in the absence of resting ECG and structural cardiac abnormalities. Here, we present a database collected from 225 clinical papers, published from 2001 to October 2020, about CPVT-associated RYR2 variants. 1355 patients, both with and without CPVT, with RYR2 variants are in the database. There are a total of 968 CPVT patients or suspected CPVT patients in the database. The database includes information regarding genetic diagnosis, location of the RYR2 variant(s), clinical history and presentation, and treatment strategies for each patient. Patients will have a varying depth of information in each of the provided fields.

    Database website: https://cpvtdb.port5000.com/

    Dataset Information

    This dataset includes:

    • eTable2.xlsx: Tabular version of the database. The most relevant tables in the PostgreSQL database regarding patient sex, conditions, treatments, family history, and variant information were joined to create this table. Views calculating the affected RYR2 exons, domains and subdomains have been joined to the patient information, and the m-n tables for patients' conditions and treatments have been converted to pivot tables: every condition and treatment that has at least 1 person with that condition or treatment is a column. NOTE: This was created using a LEFT JOIN of the individuals and individual_variants tables. Individuals with more than 1 recorded variant will be listed on multiple rows; there is only 1 person in this database, as of the current version, with multiple recorded variants.

    • _.gz.sql: PostgreSQL database dump. Expands to about 4.1 GB after loading the database dump. The database includes two schemas. public: includes all information on patients and variants, and also includes all RYR2 variants in ClinVar. uta: contains the biocommons/uta database required to make the hgvs Python package work locally (see https://github.com/biocommons/uta for more information). NOTE: It is recommended to use this version of the database only for development or analysis purposes.

    • database_tables.pdf: Contains information on most of the database tables in the public schema.

    • 00_globals.sql: Required to load the PostgreSQL database dump. Creates a user named anonymous for the uta schema.

    How To Load Database Using Docker

    First, download the 00_globals.sql and _.gz.sql files and move them into a directory. The default postgres image will load files from the /docker-entrypoint-initdb.d directory if the database is empty. See Docker Hub for more information. Example using docker compose with pgadmin and a volume to persist the data:

      # Use postgres/example user/password credentials
      version: '3.9'
      volumes:
        mydatabasevolume: null
      services:
        db:
          image: postgres:16
          restart: always
          environment:
            POSTGRES_PASSWORD: mysecretpassword
            POSTGRES_USER: postgres
          volumes:
            - ':/docker-entrypoint-initdb.d/'
            - 'mydatabasevolume:/var/lib/postgresql/data'
        pgadmin:
          image: dpage/pgadmin4
          environment:
            PGADMIN_DEFAULT_EMAIL: user@domain.com
            PGADMIN_DEFAULT_PASSWORD: SuperSecret

    Creating the Database from Scratch

    See https://github.com/alexdaiii/cpvt-database-loader for source code to create the database from scratch.
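    A hedged usage note, assuming the compose file above is saved as docker-compose.yml and that the host directory holding 00_globals.sql and the .gz.sql dump is the one mounted as /docker-entrypoint-initdb.d (the host path is elided in the compose file above):

      # start the stack; on first start the postgres entrypoint runs the SQL files it finds
      # in /docker-entrypoint-initdb.d and loads the dump into the empty database
      docker compose up -d
      # follow the database logs to watch the restore (it can take a while for ~4 GB)
      docker compose logs -f db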

  13. Data Base Management Systems market size was USD 50.5 billion in 2022 !

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Jun 15, 2024
    Cite
    Cognitive Market Research (2024). Data Base Management Systems market size was USD 50.5 billion in 2022 ! [Dataset]. https://www.cognitivemarketresearch.com/data-base-management-systems-market-report
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jun 15, 2024
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    The global Data Base Management Systems market was valued at USD 50.5 billion in 2022 and is projected to reach USD 120.6 billion by 2030, registering a CAGR of 11.5% for the forecast period 2023-2030.

    Factors Affecting Data Base Management Systems Market Growth

    Growing inclination of organizations towards adoption of advanced technologies like cloud-based technology favours the growth of global DBMS market
    

    Cloud-based database management system solutions give organizations the ability to scale their database infrastructure up or down as required. In a demanding business environment, data volume can vary over time; the cloud allows organizations to allocate resources dynamically and systematically, ensuring optimal performance without underutilization. In addition, these cloud-based solutions are cost-efficient: they eliminate the need for companies to maintain and invest in physical infrastructure and hardware, reducing ongoing operational costs and upfront capital expenditures. Organizations can choose pay-as-you-go pricing models, where they pay only for the resources they consume, which makes this a cost-efficient option for both smaller businesses and large enterprises. Moreover, cloud-based DBMS platforms usually come with management tools that streamline administrative tasks such as backup, provisioning, recovery, and monitoring, allowing IT teams to concentrate on more strategic tasks rather than routine maintenance and thereby enhancing operational efficiency. Cloud-based database management systems also allow remote access and collaboration among teams irrespective of their physical locations, which suits today's distributed and remote workforces: authorized personnel can access and update data in real time, enabling collaboration and better decision-making. Owing to all the above factors, the rising adoption of advanced technologies like cloud-based DBMS is favouring the market growth.

    Availability of open-source solutions is likely to restrain the global data base management systems market growth
    

    Open-source database management system solutions such as PostgreSQL, MongoDB, and MySQL offer strong functionality at minimal or no licensing cost, which makes them an attractive option for companies, especially start-ups or smaller businesses with limited budgets. As these open-source solutions offer capabilities similar to many commercial DBMS offerings, organizations may opt for them in order to save costs. Open-source solutions also benefit from active developer communities that contribute to their development, enhancement, and maintenance; this collaborative environment supports continuous innovation and improvement, resulting in solutions that are competitive with commercial offerings in terms of performance and features. Thus, while open-source solutions create competition for the commercial DBMS market, commercial vendors thrive by offering unique value propositions, addressing the needs of organizations that prioritize professional support, seamless integration into complex IT ecosystems, and advanced features.

    Introduction of Data Base Management Systems

    A Database Management System (DBMS) is software specifically designed to organize and manage data in a structured manner. It allows users to create, modify, and query a database, and to manage the security and access controls for that database. The DBMS offers tools for creating and modifying data models, which define the structure and relationships of data in a database. It is also responsible for storing and retrieving data from the database, and provides several methods for searching and querying the data. The DBMS also offers mechanisms to control concurrent access to the database, to ensure that a number of users may access the data safely. It provides tools to enforce security constraints and data integrity, such as constraints on the values of data and access controls that restrict who can access the data, and it provides mechanisms for recovering and backing up the data when a system failure occurs....

  14. Data from: SQL Injection Attack Netflow

    • zenodo.org
    • portalcientifico.unileon.es
    • +1more
    Updated Sep 28, 2022
    + more versions
    Cite
    Ignacio Crespo; Ignacio Crespo; Adrián Campazas; Adrián Campazas (2022). SQL Injection Attack Netflow [Dataset]. http://doi.org/10.5281/zenodo.6907252
    Explore at:
    Dataset updated
    Sep 28, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ignacio Crespo; Ignacio Crespo; Adrián Campazas; Adrián Campazas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    These datasets contain SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union query SQL injection and Blind SQL injection. To perform the attacks, the SQLMAP tool was used.

    NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for the collection and monitoring of network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

    Datasets

    The first dataset was collected to train the detection models (D1), and the other was collected using different attacks than those used in training, in order to test the models and ensure their generalization (D2).

    The datasets contain both benign and malicious traffic. All collected datasets are balanced.

    The version of NetFlow used to build the datasets is 5.

    Dataset   Aim        Samples    Benign-malicious traffic ratio
    D1        Training   400,003    50%
    D2        Test       57,239     50%

    Infrastructure and implementation

    Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has a sensor ipt_netflow installed. The sensor consists of a module for the Linux kernel using Iptables, which processes the packets and converts them to NetFlow flows.

    DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or after it has been active for 1800 seconds (30 minutes).
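
    The dataset description does not state how these timers are configured; if they are set through the ipt_netflow sensor, they would roughly correspond to its flow-timeout parameters. A sketch under that assumption (the sysctl names may differ across ipt_netflow versions):

      # assumed ipt_netflow timeouts matching the 15 s inactive / 1800 s active export policy
      sysctl net.netflow.inactive_timeout=15
      sysctl net.netflow.active_timeout=1800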

    Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. These tasks run as Python scripts; users may customize them or even incorporate their own. The network traffic is managed by a gateway that performs two main tasks: on the one hand, it routes packets to the Internet; on the other hand, it sends them to a NetFlow data generation node (packets received from the Internet are processed in a similar way).

    The malicious traffic collected (SQLI attacks) was generated using SQLMAP. SQLMAP is a penetration testing tool used to automate the process of detecting and exploiting SQL injection vulnerabilities.

    The attacks were executed on 16 nodes, each of which launched SQLMAP with the parameters listed in the following table (a sample invocation is sketched after the table).

    Parameter | Description
    '--banner', '--current-user', '--current-db', '--hostname', '--is-dba', '--users', '--passwords', '--privileges', '--roles', '--dbs', '--tables', '--columns', '--schema', '--count', '--dump', '--comments' | Enumerate users, password hashes, privileges, roles, databases, tables and columns
    --level=5 | Increase the probability of a false positive identification
    --risk=3 | Increase the probability of extracting data
    --random-agent | Select the User-Agent randomly
    --batch | Never ask for user input, use the default behavior
    --answers="follow=Y" | Predefined answers to yes

    Every node executed SQLIA against 200 victim nodes. The victim nodes had deployed a web form vulnerable to Union-type injection attacks, connected to either the MySQL or the SQL Server database engine (50% of the victim nodes deployed MySQL and the other 50% deployed SQL Server).

    The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For the victim nodes, the address space was 126.52.30.0/24.
    The malicious traffic in the two datasets was collected under different conditions. For D1, SQLIA was performed using Union attacks against the MySQL and SQL Server databases.

    However, for D2, Blind SQL injection attacks were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for the benign and malicious traffic-generating nodes and 140.30.20.1/24 for the victim nodes.

    The MySQL role was served by MariaDB version 10.4.12; Microsoft SQL Server 2017 Express and PostgreSQL version 13 were used for the other database engines.

  15. m

    Vessel Density in Irish Waters (2019)

    • data.marine.ie
    ogc:wms +1
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    European Marine Observation and Data Network (EMODnet) (2023). Vessel Density in Irish Waters (2019) [Dataset]. https://data.marine.ie/geonetwork/srv/api/records/ie.marine.data:dataset.3991
    Explore at:
    ogc:wms, www:link-1.0-http--linkAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Marine Institute
    Authors
    European Marine Observation and Data Network (EMODnet)
    Time period covered
    Jan 1, 2017 - Dec 31, 2017
    Description

    The EMODnet Vessel Density Maps were created by Cogea in 2019 in the framework of EMODnet Human Activities, an initiative funded by the EU Commission. The maps are based on AIS data purchased from CLS and show shipping density in 1km*1km cells of a grid covering all EU waters (and some neighbouring areas). Density is expressed as hours per square kilometre per month. A set of AIS data had to be purchased from CLS, a commercial provider. The data consist of messages sent by automatic tracking systems installed on board ships and received by terrestrial and satellite receivers alike. The dataset covers the whole of 2017 for an area covering all EU waters.

    A partial pre-processing of the data was carried out by CLS: (i) only the AIS messages relevant for assessing shipping activities were delivered (AIS messages 1, 2, 3, 18 and 19); (ii) the AIS data were down-sampled to 3 minutes; (iii) duplicate signals were removed; (iv) wrong MMSI signals were removed; (v) special characters and diacritics were removed; (vi) signals with erroneous speed over ground (SOG) were removed (negative values or more than 80 knots); (vii) signals with erroneous course over ground (COG) were removed (negative values or more than 360 degrees); (viii) a Kalman filter was applied to remove satellite noise. The Kalman filter was based on a correlated random walk fine-tuned for ship behaviour; the consistency of a new observation with the modelled position is checked against key performance indicators such as innovation, likelihood and speed. (ix) A footprint filter was applied to check satellite AIS data for consistency: all positions that were not compliant with ship-satellite co-visibility were flagged as invalid.

    The AIS data were converted from their original format (NMEA) to CSV and split into 12 files, each corresponding to a month of 2017. Overall, the pre-processed dataset included about 1.9 billion records. When importing the data into a database, it emerged that some messages still contained invalid characters; by running a series of commands from a Linux shell, all invalid characters were removed. The data were then imported into a PostgreSQL relational database. By querying the database it emerged that some MMSI numbers are associated with more than one ship type during the year. To cope with this issue, a unique MMSI/ship-type register was created, attributing to each MMSI its most recurring ship type. The admissible ship types reported in the AIS messages were grouped into macro categories: 0 Other, 1 Fishing, 2 Service, 3 Dredging or underwater ops, 4 Sailing, 5 Pleasure Craft, 6 High speed craft, 7 Tug and towing, 8 Passenger, 9 Cargo, 10 Tanker, 11 Military and Law Enforcement, 12 Unknown and All ship types.

    The subsequent step consisted of creating points representing ship positions from the AIS messages. This was done through a custom-made script for ArcGIS developed by Lovell Johns. Another custom-made script reconstructed ship routes (lines) from the points, using the MMSI number as a unique identifier of a ship. The script created a line for every two consecutive positions of a ship and, for each line, calculated its length (in km) and its duration (in hours), appending both as attributes to the line. If the distance between two consecutive positions of a ship was longer than 30 km or if the time interval was longer than 6 hours, no line was created.

    Both datasets (points and lines) were projected into the ETRS89/ETRS-LAEA coordinate reference system, used for statistical mapping at all scales where true area representation is required (EPSG: 3035). The lines obtained through the ArcGIS script were then intersected with a custom-made 1km*1km grid polygon (21 million cells) based on the EEA's grid and covering the whole area of interest (all EU sea basins). Because each line had length and duration as attributes, it was possible to calculate how much time each ship spent in a given cell over a month by intersecting line records with grid cell records in another dedicated PostgreSQL database. Using the PostGIS Intersect tool, for each cell of the grid the time values of the 'segments' falling in it were summed, thus obtaining the density value associated with that cell, stored in PostGIS raster tables. Density is thus expressed in hours per square kilometre per month. The final step consisted of creating raster files (TIFF file format) with QuantumGIS from the PostgreSQL vessel density tables. Annual average rasters by ship type were also created. The dataset was clipped according to the National Marine Planning Framework (NMPF) assessment area.
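
    As a rough illustration of the per-cell aggregation step described above, the query sketched below sums segment durations by grid cell; the database, table, and column names are invented for the example and are not taken from the EMODnet workflow.

      # hypothetical psql call reproducing the "hours per cell" aggregation with PostGIS
      psql -d vessel_density -c "
        SELECT g.cell_id, SUM(l.duration_hours) AS hours_per_cell
        FROM grid_1km g
        JOIN ship_lines l ON ST_Intersects(g.geom, l.geom)
        GROUP BY g.cell_id;"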

  16. Neotoma Database Snapshot 2021-06-08

    • figshare.com
    tar
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Data Backup Neotoma Paleoecological Database (2023). Neotoma Database Snapshot 2021-06-08 [Dataset]. http://doi.org/10.6084/m9.figshare.14750697.v1
    Explore at:
    tarAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Data Backup Neotoma Paleoecological Database
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Neotoma Database snapshot. It can be restored from the command line using pg_restore (https://www.postgresql.org/docs/current/app-pgrestore.html). Current as of June 8, 2021.
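
    A minimal restore sketch, assuming the snapshot is the tar-format dump attached to this record (the file name below is a placeholder):

      # create an empty database and restore the snapshot into it
      createdb -U postgres neotoma
      pg_restore -U postgres -d neotoma neotoma_snapshot_2021-06-08.tar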

  17. Z

    MoreFixes: Largest CVE dataset with fixes

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Oct 23, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Akhoundali, Jafar (2024). MoreFixes: Largest CVE dataset with fixes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11199119
    Explore at:
    Dataset updated
    Oct 23, 2024
    Dataset provided by
    Rahim Nouri, Sajad
    Rietveld, Kristian F. D.
    GADYATSKAYA, Olga
    Akhoundali, Jafar
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    In our work, we have designed and implemented a novel workflow with several heuristic methods to combine state-of-the-art methods related to CVE fix commit gathering. As a consequence of our improvements, we have been able to gather the largest programming-language-independent real-world dataset of CVE vulnerabilities with the associated fix commits. Our dataset, containing 29,203 unique CVEs coming from 7,238 unique GitHub projects, is, to the best of our knowledge, by far the biggest CVE vulnerability dataset with fix commits available today. These CVEs are associated with 35,276 unique commits stored as SQL and 39,931 patch commit files that fixed those vulnerabilities (some patch files could not be saved as SQL for several technical reasons). Our larger dataset thus substantially improves over the current real-world vulnerability datasets and enables further progress in research on vulnerability detection and software security. We used the NVD (nvd.nist.gov) and the GitHub Security Advisory Database as the main sources of our pipeline.

    We release to the community a 16GB PostgreSQL database that contains information on CVEs up to 2024-09-26, CWEs of each CVE, files and methods changed by each commit, and repository metadata. Additionally, patch files related to the fix commits are available as a separate package. Furthermore, we make our dataset collection tool also available to the community.

    The cvedataset-patches.zip file contains the fix patches, and postgrescvedumper.sql.zip contains a PostgreSQL dump of the fixes, together with several other fields such as CVEs, CWEs, repository metadata, commit data, file changes, methods changed, etc.

    The MoreFixes data-storage strategy is based on CVEFixes to store CVE fix commits from open-source repositories, and it uses a modified version of Prospector (part of Project KB from SAP) as a module to detect the fix commits of a CVE. Our full methodology is presented in the paper titled "MoreFixes: A Large-Scale Dataset of CVE Fix Commits Mined through Enhanced Repository Discovery", which will be published at the PROMISE conference (2024).

    For more information about usage and sample queries, visit the GitHub repository: https://github.com/JafarAkhondali/Morefixes

    If you are using this dataset, please be aware that the repositories we mined are under different licenses and you are responsible for handling any licensing issues. The same applies to CVEFixes.

    This product uses the NVD API but is not endorsed or certified by the NVD.

    This research was partially supported by the Dutch Research Council (NWO) under the project NWA.1215.18.008 Cyber Security by Integrated Design (C-SIDe).

    To restore the dataset, you can use the docker-compose file available at the GitHub repository. Dataset default credentials after restoring the dump:

    POSTGRES_USER=postgrescvedumper POSTGRES_DB=postgrescvedumper POSTGRES_PASSWORD=a42a18537d74c3b7e584c769152c3d
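
    Assuming the docker-compose file exposes PostgreSQL on localhost at the default port (an assumption, not something stated here), connecting with the credentials above might look like:

      # start the database service defined in the repository's docker-compose file
      docker compose up -d    # or docker-compose up -d, depending on your Docker version

      # connect with the default credentials; host and port are assumptions
      PGPASSWORD=a42a18537d74c3b7e584c769152c3d psql -h localhost -p 5432 -U postgrescvedumper -d postgrescvedumper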

    Please use this for citation:

    @inproceedings{morefixes2024,
     title={MoreFixes: A large-scale dataset of CVE fix commits mined through enhanced repository discovery},
     author={Akhoundali, Jafar and Nouri, Sajad Rahim and Rietveld, Kristian and Gadyatskaya, Olga},
     booktitle={Proceedings of the 20th International Conference on Predictive Models and Data Analytics in Software Engineering},
     pages={42--51},
     year={2024}
    }
    
  18. d

    Small Business Contact Data | Global Coverage | +95% Email and Phone Data...

    • datarade.ai
    .json, .csv
    Updated Feb 27, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Forager.ai (2024). Small Business Contact Data | Global Coverage | +95% Email and Phone Data Accuracy | Bi-weekly Refresh Rate | 50+ Data Points [Dataset]. https://datarade.ai/data-products/small-business-contact-data-bi-weekly-updates-linkedin-in-forager-ai
    Explore at:
    .json, .csvAvailable download formats
    Dataset updated
    Feb 27, 2024
    Dataset provided by
    Forager.ai
    Area covered
    Cayman Islands, Colombia, Oman, Belgium, Vanuatu, Namibia, Slovenia, Japan, Macedonia (the former Yugoslav Republic of), Virgin Islands (British)
    Description

    Forager.ai's Small Business Contact Data set is a comprehensive collection of over 695M professional profiles. With an unmatched 2x/month refresh rate, we ensure the most current and dynamic data in the industry today. We deliver this data via JSONL flat-files or PostgreSQL database delivery, capturing publicly available information on each profile.

    | Volume and Stats |

    Every single record refreshed 2x per month, setting industry standards. First-party data curation powering some of the most renowned sales and recruitment platforms. Delivery frequency is hourly (fastest in the industry today). Additional datapoints and linkages available. Delivery formats: JSONL, PostgreSQL, CSV.

    | Datapoints |

    Over 150+ unique datapoints available! Key fields like Current Title, Current Company, Work History, Educational Background, Location, Address, and more. Unique linkage data to other social networks or contact data available.

    | Use Cases |

    Sales Platforms, ABM Vendors, Intent Data Companies, AdTech and more:

    Deliver the best end-customer experience with our people feed powering your solution! Be the first to know when someone changes jobs and share that with end-customers. Industry-leading data accuracy. Connect our professional records to your existing database, find new connections to other social networks, and contact data. Hashed records also available for advertising use-cases.

    Venture Capital and Private Equity:

    Track every company and employee with a publicly available profile. Keep track of your portfolio's founders, employees and ex-employees, and be the first to know when they move or start up. Keep an eye on the pulse by following the most influential people in the industries and segments you care about. Provide your portfolio companies with the best data for recruitment and talent sourcing. Review departmental headcount growth of private companies and benchmark their strength against competitors.

    HR Tech, ATS Platforms, Recruitment Solutions, as well as Executive Search Agencies:

    Build products for industry-specific and industry-agnostic candidate recruiting platforms. Track person job changes and immediately refresh profiles to avoid stale data. Identify ideal candidates through work experience and education history. Keep ATS systems and candidate profiles constantly updated. Link data from this dataset into GitHub, LinkedIn, and other social networks.

    | Delivery Options |

    Flat files via S3 or GCP. PostgreSQL Shared Database. PostgreSQL Managed Database. REST API. Other options available at request, depending on scale required.

    | Other key features |

    Over 120M US Professional Profiles. 150+ Data Fields (available upon request). Free data samples, and evaluation.

    Tags: Professionals Data, People Data, Work Experience History, Education Data, Employee Data, Workforce Intelligence, Identity Resolution, Talent, Candidate Database, Sales Database, Contact Data, Account Based Marketing, Intent Data.

  19. 4

    Visualizing Collaboration with Superstars - Replication Package

    • data.4tu.nl
    zip
    Updated Jul 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Preston Hull; Yuanze Xiong; Filip Plonka; Filip Marchidan (2024). Visualizing Collaboration with Superstars - Replication Package [Dataset]. http://doi.org/10.4121/4243dace-9bc2-4ca2-a343-3d481d6a9316.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 4, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Preston Hull; Yuanze Xiong; Filip Plonka; Filip Marchidan
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains various source code files used in the process of replicating the Visualizing Collaboration with Superstars Bachelor's Thesis by Preston Hull.


    The included source code files comprise SQL files (intended to be executed on PostgreSQL 16) to create the databases, Python 3.11 scripts to process the S2AG dataset into a database, and various samples from the dataset (from Semantic Scholar). To execute the scripts, the psycopg2 library is required, as well as proper configuration within the scripts to connect to the database.
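
    As a rough setup sketch (the file names are placeholders and the exact steps depend on how the scripts are configured):

      # install the PostgreSQL driver required by the Python 3.11 scripts
      pip install psycopg2-binary    # or psycopg2 built from source

      # create the databases from the provided SQL files (placeholder file name)
      psql -U postgres -f create_databases.sql

      # run a processing script after editing its database connection settings
      python3 process_s2ag_dataset.py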

  20. m

    Help Desk Tickets

    • data.mendeley.com
    Updated Apr 22, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mohammad Abdellatif (2025). Help Desk Tickets [Dataset]. http://doi.org/10.17632/btm76zndnt.1
    Explore at:
    Dataset updated
    Apr 22, 2025
    Authors
    Mohammad Abdellatif
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These datasets were created as part of a study involving an experiment with a helpdesk team at an international software company. The goal was to implement an automated performance appraisal model that evaluates the team based on issue reports and key features derived from classifying the messages exchanged with customers using Dialog Acts. The data was extracted from a PostgreSQL database and curated to present aggregated views of helpdesk tickets reported between January 2016 and March 2023. Certain fields have been anonymized (masked) to protect the data owner’s privacy while preserving the overall meaning of the information. The datasets are:
    - issues.csv: Holds information for all reported tickets, showing the category, priority, who reported the issue, the related project, who was assigned to resolve the ticket, the start time, the resolution time, and how many seconds the ticket spent in each resolution step.
    - issues_change_history.csv: Shows when the ticket assignee and status were changed. This dataset helps calculate the time spent on each step.
    - issues_snapshots.csv: Contains the same records as issues.csv but duplicates the tickets that multiple assignees handled; each record is the processing cycle per assignee.
    - scored_issues_snapshot_sample.xlsx: A stratified and representative sample extracted from the tickets and then handed to an annotator (the help-desk manager) to appraise the resolution performance against three targets, where 5 is the highest score and 1 is the lowest.
    - sample_utterances.csv: Contains the messages (comments) exchanged between the reporters and the helpdesk team. This dataset only contains the curated messages for the issues listed in scored_issues_snapshot_sample.xlsx, as those were the focus of the initial study.

    The following files are guidelines on how to work with and interpret the datasets:
    - FEATURES.md: Describes the dataset features (fields).
    - EXAMPLE.md: Shows an example of an issue in all datasets so the reader can understand the relations between them.
    - process-flow.png: A demonstration of the steps followed by the helpdesk team to resolve an issue.

    These datasets are valuable for many other experiments, such as:
    - Count predictions
    - Regression
    - Association rule mining
    - Natural Language Processing
    - Classification
    - Clustering
