77 datasets found
  1. Number sequences for self-supervised learning

    • kaggle.com
    zip
    Updated Jan 21, 2023
    Cite
    stepheniota (2023). Number sequences for self-supervised learning [Dataset]. https://www.kaggle.com/datasets/stepheniota/integer-sequences-for-representation-learning
    Available download formats: zip (36109567 bytes)
    Dataset updated
    Jan 21, 2023
    Authors
    stepheniota
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Number sequences for self-supervised learning

    The On-Line Encyclopedia of Integer Sequences (OEIS) is a well-known database of number sequences with interesting and unique mathematical properties. Some sequences are easily recognizable, like A000045 (the Fibonacci numbers) or A000108 (the Catalan numbers). Others are more abstract and require complicated formulas to validate (e.g. A000289, a non-linear recurrence).

    In this dataset you will find the complete set of number sequences from the OEIS database. There are two files, sequences.csv and metadata.txt. Each row of sequences.csv starts with a sequence's A-number, followed by the comma-separated terms of the sequence. metadata.txt maps each sequence's A-number to its definition.
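
    The row layout described above could be read along the following lines. This is a minimal sketch, not the dataset author's loader: the helper name parse_sequence_row is hypothetical, the sample row is illustrative rather than copied from sequences.csv, and the handling of empty cells is an assumption about how rows might end.

```python
import csv
import io


def parse_sequence_row(row):
    """Split one CSV row into (a_number, terms).

    Assumes the first cell is the A-number and every following
    non-empty cell is one integer term of the sequence.
    """
    a_number = row[0].strip()
    # Skip empty cells, e.g. those produced by a trailing comma.
    terms = [int(cell) for cell in row[1:] if cell.strip()]
    return a_number, terms


# Illustrative input: the first terms of A000045 (Fibonacci numbers).
sample = "A000045,0,1,1,2,3,5,8,13,21,34\n"
for row in csv.reader(io.StringIO(sample)):
    a_number, terms = parse_sequence_row(row)
    print(a_number, terms[:5])
```

    Python integers have arbitrary precision, so very large terms do not overflow when parsed this way.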

    References

    OEIS Foundation Inc. (2023), The On-Line Encyclopedia of Integer Sequences, Published electronically at https://oeis.org.

  2. Trusted Research Environments: Analysis of Characteristics and Data...

    • researchdata.tuwien.ac.at
    bin, csv
    Updated Jun 25, 2024
    Cite
    Martin Weise; Andreas Rauber (2024). Trusted Research Environments: Analysis of Characteristics and Data Availability [Dataset]. http://doi.org/10.48436/cv20m-sg117
    Available download formats: bin, csv
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    TU Wien
    Authors
    Martin Weise; Andreas Rauber
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Trusted Research Environments (TREs) enable analysis of sensitive data under strict security assertions that protect the data with technical, organizational, and legal measures from (accidentally) being leaked outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, the descriptions of their building blocks, and their slight technical variations. To shed light on these problems, we give an overview of existing, publicly described TREs and a bibliography linking to the system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices make available the majority of the sensitive data records included in this study.

    Methodology

    We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, Science Direct), a computer science library (dblp.org), Google, and grey literature, focusing on retrieving the following source material:

    • Peer-reviewed articles where available,
    • TRE websites,
    • TRE metadata catalogs.

    The goal of this literature study is to discover existing TREs and analyze their characteristics and data availability, giving an overview of the available infrastructure for sensitive data research, as many European initiatives have emerged in recent months.

    Technical details

    This dataset consists of five comma-separated values (.csv) files describing our inventory:

    • countries.csv: Table of countries with columns id (number), name (text) and code (text, in ISO 3166-A3 encoding, optional)
    • tres.csv: Table of TREs with columns id (number), name (text), countryid (number, referring to column id of table countries), structureddata (bool, optional), datalevel (one of [1=de-identified, 2=pseudonymized, 3=anonymized], optional), outputcontrol (bool, optional), inceptionyear (date, optional), records (number, optional), datatype (one of [1=claims, 2=linked records], optional), statistics_office (bool), size (number, optional), source (text, optional), comment (text, optional)
    • access.csv: Table of access modes of TREs with columns id (number), suf (bool, optional), physical_visit (bool, optional), external_physical_visit (bool, optional), remote_visit (bool, optional)
    • inclusion.csv: Table of TREs included in the literature study with columns id (number), included (bool), exclusion reason (one of [peer review, environment, duplicate], optional), comment (text, optional)
    • major_fields.csv: Table of data categorization into the major research fields with columns id (number), life_sciences (bool, optional), physical_sciences (bool, optional), arts_and_humanities (bool, optional), social_sciences (bool, optional).

    Additionally, a MariaDB (10.5 or higher) schema definition .sql file is needed to model the database schema:

    • schema.sql: Schema definition file to create the tables and views used in the analysis.

    The analysis was done in a Jupyter Notebook, which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb
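
    To illustrate how the tables in the technical details relate, the sketch below joins tres.csv to countries.csv through the countryid column and counts TREs per country. It is only an assumption-laden example: the sample rows are invented, the files are assumed to start with a header row using the column names listed above, and the function name tres_per_country is hypothetical. It does not reproduce the authors' notebook.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real files contain the full inventory.
COUNTRIES_CSV = """id,name,code
1,Austria,AUT
2,Germany,DEU
"""

TRES_CSV = """id,name,countryid
1,Example TRE A,1
2,Example TRE B,2
3,Example TRE C,1
"""


def tres_per_country(countries_text, tres_text):
    """Count TREs per country name by resolving tres.countryid -> countries.id."""
    countries = {
        row["id"]: row["name"]
        for row in csv.DictReader(io.StringIO(countries_text))
    }
    counts = Counter()
    for tre in csv.DictReader(io.StringIO(tres_text)):
        # Fall back to "unknown" if a countryid has no matching country row.
        counts[countries.get(tre["countryid"], "unknown")] += 1
    return dict(counts)


print(tres_per_country(COUNTRIES_CSV, TRES_CSV))
```

    With the real files, the two strings would be replaced by the contents of countries.csv and tres.csv.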

  3. Macedonia Number Dataset

    • listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Cite
    List to Data (2025). Macedonia Number Dataset [Dataset]. https://listtodata.com/macedonia-dataset
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jul 17, 2025
    Authors
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Namibia, Estonia, Jamaica, Tajikistan, Bahrain, Mozambique, Guatemala, Saint Lucia, Jersey, India
    Variables measured
    phone numbers, Email Address, full name, Address, City, State, gender, age, income, ip address
    Description

    Macedonia number dataset is a collection of phone numbers from people living in Macedonia. You can filter the data by gender, age, and relationship status. This flexibility helps you connect with the right audience. If you want to reach young adults or families, you can quickly find the right numbers. This makes your communication more effective and targeted. List to Data helps find phone numbers for your business. Additionally, the Macedonia number dataset follows GDPR rules. These rules protect people’s privacy and ensure that all data usage is legal. You can remove invalid data, keeping only active, accurate numbers. This helps update your list as numbers change. With this database, you have access to information that is not only reliable but also respectful of privacy.

    Macedonia phone data refers to a database of phone numbers that is 100% correct and valid. We carefully check every number in this database to ensure it works. This means businesses can call these numbers confidently, knowing they will reach real people. If you find a number that doesn’t work, you have a replacement guarantee. This means the company will give you a new number for free. Therefore, your contact list stays fresh and reliable. Furthermore, all phone numbers in this Macedonia phone data are included on a customer permission basis. This means each person agreed to have their number included in the database and knows that their information is used safely and ethically. You can trust this data for marketing or outreach efforts. Overall, phone data from Macedonia provides a strong foundation for any outreach campaign.

    Macedonia phone number list is a valuable tool that allows you to filter information based on specific needs. This list is helpful for businesses and organizations that want to reach out to people in this country. The phone numbers come from trusted sources, meaning companies gather data from reliable sources. You can also check the source URLs to see where the information comes from. Moreover, the Macedonia phone number list follows an opt-in process. This means that everyone on the Macedonia list agreed to share their phone number. They understand how their information will be used and permit it. This ensures the data is legal and respectful of people’s privacy. Businesses can use the list without worrying about breaking any rules.

  4. Peatland Decomposition Database (1.1.0)

    • data.niaid.nih.gov
    Updated Mar 5, 2025
    Cite
    Teickner, Henning; Knorr, Klaus-Holger (2025). Peatland Decomposition Database (1.1.0) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11276064
    Dataset updated
    Mar 5, 2025
    Dataset provided by
    University of Münster
    Authors
    Teickner, Henning; Knorr, Klaus-Holger
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1 Introduction

    The Peatland Decomposition Database (PDD) stores data from published litterbag experiments related to peatlands. Currently, the database focuses on northern peatlands and Sphagnum litter and peat, but it also contains data from some vascular plant litterbag experiments. Currently, the database contains entries from 34 studies, 2,160 litterbag experiments, and 7,297 individual samples with 117,841 measurements for various attributes (e.g. relative mass remaining, N content, holocellulose content, mesh size). The aim is to provide a harmonized data source that can be useful to re-analyse existing data and to plan future litterbag experiments.

    The Peatland Productivity and Decomposition Parameter Database (PPDPD) (Bona et al. 2018) is similar to the Peatland Decomposition Database (PDD) in that both contain data from peatland litterbag experiments. The differences are that both databases partly contain different data, that PPDPD additionally contains information on vegetation productivity, which PDD does not, and that PDD provides more information and metadata on litterbag experiments, and also measurement errors.

    2 Updates

    Compared to version 1.0.0, this version has a new structure for table experimental_design_format, contains additional metadata on the experimental design (these were omitted in version 1.0.0), and contains the scripts that were used to import the data into the database.

    3 Methods

    3.1 Data collection

    Data for the database was collected from published litterbag studies, by extracting published data from figures, tables, or other data sources, and by contacting the authors of the studies to obtain raw data. All data processing was done with R (R version 4.2.0 (2022-04-22)) (R Core Team 2022).

    Studies were identified via a Scopus search with the search string (TITLE-ABS-KEY ( peat* AND ( "litter bag" OR "decomposition rate" OR "decay rate" OR "mass loss")) AND NOT ("tropic*")) (2022-12-17). These studies were further screened to exclude those which do not contain litterbag data or which recycle data from other studies that have already been considered. Additional studies with litterbag experiments in northern peatlands that we were aware of, but which were not identified in the literature search, were added to the list of publications. For studies not older than 10 years, authors were contacted to obtain raw data; however, this was successful only in a few cases. To date, the database focuses on Sphagnum litterbag experiments, and data from not all studies identified by the literature search have been included in the database yet.

    Data from figures were extracted using the package ‘metaDigitise’ (1.0.1) (Pick, Nakagawa, and Noble 2018). Data from tables were extracted manually.

    Data from the following studies are currently included: Farrish and Grigal (1985), Bartsch and Moore (1985), Farrish and Grigal (1988), Vitt (1990), Hogg, Lieffers, and Wein (1992), Sanger, Billett, and Cresser (1994), Hiroki and Watanabe (1996), Szumigalski and Bayley (1996), Prevost, Belleau, and Plamondon (1997), Arp, Cooper, and Stednick (1999), Robbert A. Scheffer and Aerts (2000), R. A. Scheffer, Van Logtestijn, and Verhoeven (2001), Limpens and Berendse (2003), Waddington, Rochefort, and Campeau (2003), Asada, Warner, and Banner (2004), Thormann, Bayley, and Currah (2001), Trinder, Johnson, and Artz (2008), Breeuwer et al. (2008), Trinder, Johnson, and Artz (2009), Bragazza and Iacumin (2009), Hoorens, Stroetenga, and Aerts (2010), Straková et al. (2010), Straková et al. (2012), Orwin and Ostle (2012), Lieffers (1988), Manninen et al. (2016), Johnson and Damman (1991), Bengtsson, Rydin, and Hájek (2018a), Bengtsson, Rydin, and Hájek (2018b), Asada and Warner (2005), Bengtsson, Granath, and Rydin (2017), Bengtsson, Granath, and Rydin (2016), Hagemann and Moroni (2015), Hagemann and Moroni (2016), B. Piatkowski et al. (2021), B. T. Piatkowski et al. (2021), Mäkilä et al. (2018), Golovatskaya and Nikonova (2017), Golovatskaya and Nikonova (2017).

    4 Database records

    The database is a ‘MariaDB’ database and the database schema was designed to store data and metadata following the Ecological Metadata Language (EML) (Jones et al. 2019). Descriptions of the tables are shown in Tab. 1.

    The database contains general metadata relevant for litterbag experiments (e.g., geographical, temporal, and taxonomic coverage, mesh sizes, experimental design). However, it does not contain a detailed description of sample handling, sample preprocessing methods, site descriptions, because there currently are no discipline-specific metadata and reporting standards.

    Table 1: Description of the individual tables in the database.

    Name | Description
    attributes | Defines the attributes of the database and the values in column attribute_name in table data.
    citations | Stores bibtex entries for references and data sources.
    citations_to_datasets | Links entries in table citations with entries in table datasets.
    custom_units | Stores custom units.
    data | Stores measured values for samples, for example remaining masses.
    datasets | Lists the individual datasets.
    experimental_design_format | Stores information on the experimental design of litterbag experiments.
    measurement_scales, measurement_scales_date_time, measurement_scales_interval, measurement_scales_nominal, measurement_scales_ordinal, measurement_scales_ratio | Defines data value types.
    missing_value_codes | Defines how missing values are encoded.
    samples | Stores information on individual samples.
    samples_to_samples | Links samples to other samples, for example litter samples collected in the field to litter samples collected during the incubation of the litterbags.
    units, unit_types | Stores information on measurement units.

    5 Attributes

    Table 2: Definition of attributes in the Peatland Decomposition Database and entries in the column attribute_name in table data.

    Name | Definition | Example value | Unit | Measurement scale | Number type | Minimum value | Maximum value | String format
    4_hydroxyacetophenone_mass_absolute | A numeric value representing the content of 4-hydroxyacetophenone, as described in Straková et al. (2010). | 0.26 | g | ratio | real | 0 | Inf | NA
    4_hydroxyacetophenone_mass_relative_mass | A numeric value representing the content of 4-hydroxyacetophenone, as described in Straková et al. (2010). | 0.26 | g/g | ratio | real | 0 | 1 | NA
    4_hydroxybenzaldehyde_mass_absolute | A numeric value representing the content of 4-hydroxybenzaldehyde, as described in Straková et al. (2010). | 0.26 | g | ratio | real | 0 | Inf | NA
    4_hydroxybenzaldehyde_mass_relative_mass | A numeric value representing the content of 4-hydroxybenzaldehyde, as described in Straková et al. (2010). | 0.26 | g/g | ratio | real | 0 | 1 | NA
    4_hydroxybenzoic_acid_mass_absolute | A numeric value representing the content of 4-hydroxybenzoic acid, as described in Straková et al. (2010). | 0.26 | g | ratio | real | 0 | Inf | NA
    4_hydroxybenzoic_acid_mass_relative_mass | A numeric value representing the content of 4-hydroxybenzoic acid, as described in Straková et al. (2010). | 0.26 | g/g | ratio | real | 0 | 1 | NA
    abbreviation | In table custom_units: A string representing an abbreviation for the custom unit. | gC | NA | nominal | NA | NA | NA | NA
    acetone_extractives_mass_absolute | A numeric value representing the content of acetone extractives, as described in Straková et al. (2010). | 0.26 | g | ratio | real | 0 | Inf | NA
    acetone_extractives_mass_relative_mass | A numeric value representing the content of acetone extractives, as described in Straková et al. (2010). | 0.26 | g/g | ratio | real | 0 | 1 | NA
    acetosyringone_mass_absolute | A numeric value representing the content of acetosyringone, as described in Straková et al. (2010). | 0.26 | g | ratio | real | 0 | Inf | NA
    acetosyringone_mass_relative_mass | A numeric value representing the content of acetosyringone, as described in Straková et al. (2010). | 0.26 | g/g | ratio | real | 0 | 1 | NA
    acetovanillone_mass_absolute | A numeric value representing the content of acetovanillone, as described in Straková et al. (2010). | 0.26 | g | ratio | real | 0 | Inf | NA
    acetovanillone_mass_relative_mass | A numeric value representing the content of acetovanillone, as described in Straková et al. (2010). | 0.26 | g/g | ratio | real | 0 | 1 | NA
    arabinose_mass_absolute | A numeric value representing the content of arabinose, as described in Straková et al. (2010). | 0.26 | g | ratio | real | 0 | Inf | NA
    arabinose_mass_relative_mass | A numeric value representing the content of arabinose, as described in Straková et al. (2010). | 0.26 | g/g | ratio | real | 0 | 1 | NA
    ash_mass_absolute | A numeric value representing the content of ash (after burning at 550°C). | 4 | g | ratio | real | 0 | Inf | NA
    ash_mass_relative_mass | A numeric value representing the content of ash (after burning at 550°C). | 0.05 | g/g | ratio | real | 0 | Inf | NA
    attribute_definition | A free text field with a textual description of the meaning of attributes in the dpeatdecomposition database. | NA | NA | nominal | NA | NA | NA | NA
    attribute_name | A string describing the names of the attributes in all tables of the dpeatdecomposition database. | attribute_name | NA | nominal | NA | NA | NA | NA
    bibtex | A string representing the bibtex code used for a literature reference throughout the dpeatdecomposition database. | Galka.2021 | NA | nominal | NA | NA | NA | NA
    bounds_maximum | A numeric value representing the minimum possible value for a numeric attribute. | 0 | NA | interval | real | Inf | Inf | NA
    bounds_minimum | A numeric value representing the maximum possible value for a numeric attribute. | INF | NA | interval | real | Inf | Inf | NA
    bulk_density | A numeric value representing the bulk density of the sample [g cm-3]. | 0,2 | g/cm^3 | ratio | real | 0 | Inf | NA
    C_absolute | The absolute mass of C in the sample. | 1 | g | ratio | real | 0 | Inf | NA
    C_relative_mass | The absolute mass of C in the sample. | 1 | g/g | ratio | real | 0 | Inf | NA
    C_to_N | A numeric value representing the C to N ratio of the sample. | 35 | g/g | ratio | real | 0 | Inf | NA
    C_to_P | A numeric value representing the C to P ratio of the sample. | 35 | g/g | ratio | real | 0 | Inf | NA

    Ca_absolute The

  5. Spain Phone Number Data

    • listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Cite
    List to Data (2025). Spain Phone Number Data [Dataset]. https://listtodata.com/spain-number-data
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jul 17, 2025
    Authors
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Spain
    Variables measured
    phone numbers, Email Address, full name, Address, City, State, gender, age, income, ip address
    Description

    Spain number data is a valuable tool that allows you to filter information based on specific needs. You can filter the data by gender, age, and relationship status. This flexibility helps you connect with the right audience. If you want to reach young adults or families, you can quickly find the right numbers. This makes your communication more effective and targeted. Additionally, the database follows GDPR rules. These rules protect people’s privacy and ensure that all data usage is legal. With this data, you have access to information that is not only reliable but also respectful of privacy.

    Spain phone number data refers to a database of phone numbers that is 100% correct and valid. Every number in this database is checked carefully to ensure it works. This means businesses can call these numbers confidently, knowing they will reach real people. If you find a number that doesn’t work, you have a replacement guarantee. This means the company will give you a new number at no extra cost. This keeps your contact list fresh and reliable, helping you connect with the right people. You can find important numbers easily on our website, List to Data.

  6. Data from: SWOT River Database (SWORD)

    • data-staging.niaid.nih.gov
    • zenodo.org
    Updated Jan 23, 2025
    Cite
    Elizabeth H. Altenau; Tamlin M. Pavelsky; Michael T. Durand; Xiao Yang; Renato P. d. M. Frasson; Liam Bendezu (2025). SWOT River Database (SWORD) [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_3898569
    Dataset updated
    Jan 23, 2025
    Dataset provided by
    University of North Carolina at Chapel Hill
    The Ohio State University
    Jet Propulsion Laboratory, California Institute of Technology
    Authors
    Elizabeth H. Altenau; Tamlin M. Pavelsky; Michael T. Durand; Xiao Yang; Renato P. d. M. Frasson; Liam Bendezu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ** IMPORTANT UPDATE: **

    Until now, the project and public versions of SWORD have been kept separate while algorithms were being developed in preparation for SWOT launch. Now that the SWOT mission is here, we have decided to publish the project version of SWORD, which is why the version numbers jump after v2. The primary difference between the project and public versions of SWORD is the extra "filler" variables in the NetCDF format that will be used for calculating discharge. Everything else (reach definitions, attribute values, etc.) is the same between the two versions. For details on the filler variables, please reference the Product Description Document provided with the downloads.

    If you use the SWORD Database in your work, please cite: Altenau et al., (2021) The Surface Water and Ocean Topography (SWOT) Mission River Database (SWORD): A Global River Network for Satellite Data Products. Water Resources Research. https://doi.org/10.1029/2021WR030054

    You can also visit www.swordexplorer.com to explore the current version of SWORD before downloading.

    1. Summary:

    The upcoming Surface Water and Ocean Topography (SWOT) satellite mission, planned to launch in 2022, will vastly expand observations of river water surface elevation (WSE), width, and slope. In order to facilitate a wide range of new analyses with flexibility, the SWOT mission will provide a range of relevant data products. One of these is a set of river vector products stored in shapefile format for each SWOT overpass (JPL Internal Document, 2020b). The SWOT vector data products will be most broadly useful if they allow multitemporal analysis of river nodes and reaches covering the same river areas. Doing so requires defining SWOT reaches and nodes a priori, so that SWOT data can be assigned to them. The SWOT River Database (SWORD) combines multiple global river- and satellite-related datasets to define the nodes and reaches that will constitute SWOT river vector data products. SWORD provides high-resolution river nodes (200 m) and reaches (~10 km) in shapefile and netCDF formats with attached hydrologic variables (WSE, width, slope, etc.) as well as a consistent topological system for global rivers 30 m wide and greater.

    2. Data Formats:

    The SWORD database is provided in netCDF, geopackage, and shapefile formats. All files start with a two-letter continent identifier ("af" – Africa, "as" – Asia / Siberia, "eu" – Europe / Middle East, "na" – North America, "oc" – Oceania, "sa" – South America). File syntax denotes the regional information for each file and varies slightly between netCDF and shapefile formats.

    NetCDF files are structured in 3 groups: centerlines, nodes, and reaches. The centerline group contains location information and associated reach and node ids along the original GRWL 30 m centerlines (Allen and Pavelsky, 2018). Node and reach groups contain hydrologic attributes at the ~200 m node and ~10 km reach locations (see description of attributes below). NetCDFs are distributed at continental scales with a filename convention as follows: [continent]_sword_v17.nc (e.g. na_sword_v17.nc).

    SWORD shapefiles consist of four main files (.dbf, .prj, .shp, .shx). There are separate shapefiles for nodes and reaches, where nodes are represented as ~200 m spaced points and reaches are represented as polylines. All shapefiles are in geographic (latitude/longitude) projection, referenced to datum WGS84. Shapefiles are split into HydroBASINS (Lehner and Grill, 2013) Pfafstetter level 2 basins (hbXX) for each continent with a naming convention as follows: [continent]_sword_[nodes/reaches]_hb[XX]_v17.shp (e.g. na_sword_nodes_hb74_v17.shp; na_sword_reaches_hb74_v17.shp).

    SWORD geopackage files are split into two files for nodes and reaches per continental region, where nodes are represented as 200 m spaced points and reaches are represented as polylines. All geopackage files are in geographic (latitude/longitude) projection, referenced to datum WGS84. Geopackage files are distributed at continental scales and their names are defined by a two-letter identifier (Table 2): [continent]_sword_[nodes/reaches]_v17.gpkg (e.g. na_sword_nodes_v17.gpkg; na_sword_reaches_v17.gpkg).
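
    The naming conventions above can be expressed as small helper functions. This is a sketch only: the function names are hypothetical, the default version of 17 is taken from the example filenames, and the HydroBASINS basin number is assumed to be formatted as two digits as in hb74.

```python
CONTINENTS = {"af", "as", "eu", "na", "oc", "sa"}
KINDS = {"nodes", "reaches"}


def _check(continent, kind=None):
    """Reject identifiers that are not listed in the description above."""
    if continent not in CONTINENTS:
        raise ValueError(f"unknown continent identifier: {continent!r}")
    if kind is not None and kind not in KINDS:
        raise ValueError(f"kind must be 'nodes' or 'reaches', got {kind!r}")


def netcdf_name(continent, version=17):
    """[continent]_sword_v17.nc"""
    _check(continent)
    return f"{continent}_sword_v{version}.nc"


def shapefile_name(continent, kind, basin, version=17):
    """[continent]_sword_[nodes/reaches]_hb[XX]_v17.shp"""
    _check(continent, kind)
    return f"{continent}_sword_{kind}_hb{basin:02d}_v{version}.shp"


def geopackage_name(continent, kind, version=17):
    """[continent]_sword_[nodes/reaches]_v17.gpkg"""
    _check(continent, kind)
    return f"{continent}_sword_{kind}_v{version}.gpkg"


print(netcdf_name("na"))
print(shapefile_name("na", "nodes", 74))
print(geopackage_name("na", "reaches"))
```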

    3. Attribute Description:

    This list contains the primary attributes contained in the SWORD database.

    x: Longitude of the node or reach ranging from 180°E to 180°W (units: decimal degrees).

    y: Latitude of the node or reach ranging from 90°S to 90°N (units: decimal degrees).

    node_id: ID of each node. The format of the id is as follows: CBBBBBRRRRNNNT where C = Continent (the first number of the Pfafstetter basin code), B = Remaining Pfafstetter basin code up to level 6, R = Reach number (assigned sequentially within a level 6 basin starting at the downstream end working upstream), N = Node number (assigned sequentially within a reach starting at the downstream end working upstream), T = Type (1 – river, 3 – lake on river, 4 – dam or waterfall, 5 – unreliable topology, 6 – ghost node).

    node_length (node files only): Node length measured along the GRWL centerline points (units: meters).

    reach_id: ID of each reach. The format of the id is as follows: CBBBBBRRRRT where C = Continent (the first number of the Pfafstetter basin code), B = Remaining Pfafstetter basin codes up to level 6, R = Reach number (assigned sequentially within a level 6 basin starting at the downstream end working upstream), T = Type (1 – river, 3 – lake on river, 4 – dam or waterfall, 5 – unreliable topology, 6 – ghost reach).

    reach_length (reach files only): Reach length measured along the GRWL centerline points (units: meters).

    wse: Average water surface elevation (WSE) value for a node or reach. WSEs are extracted from the MERIT Hydro dataset (Yamazaki et al., 2019) and referenced to the EGM96 geoid (units: meters).

    wse_var: WSE variance along the GRWL centerline points used to calculate the average WSE for each node or reach (units: square meters).

    width: Average width for a node or reach (units: meters).

    width_var: Width variance along the GRWL centerline points used to calculate the average width for each node or reach (units: square meters).

    max_width: Maximum width value across the channel for each node or reach that includes island and bar areas (units: meters).

    facc: Maximum flow accumulation value for a node or reach. Flow accumulation values are extracted from the MERIT Hydro dataset (Yamazaki et al., 2019) (units: square kilometers).

    n_chan_max: Maximum number of channels for each node or reach.

    n_chan_mod: Mode of the number of channels for each node or reach.

    obstr_type: Type of obstruction for each node or reach based on the Global River Obstruction Database (GROD, Whittemore et al., 2020) and HydroFALLS data (http://wp.geog.mcgill.ca/hydrolab/hydrofalls). Obstr_type values: 0 - No Dam, 1 - Dam, 2 - Channel Dam, 3 - Lock, 4 - Low Permeable Dam, 5 - Waterfall.

    grod_id: The unique GROD ID for each node or reach with obstr_type values 1-4.

    hfalls_id: The unique HydroFALLS ID for each node or reach with obstr_type value 5.

    dist_out: Distance from the river outlet for each node or reach (units: meters).

    type: Type identifier for a node or reach: 1 – river, 2 – lake off river, 3 – lake on river, 4 – dam or waterfall, 5 – unreliable topology, 6 – ghost reach/node.

    lakeflag: GRWL water body identifier for each reach: 0 – river, 1 – lake/reservoir, 2 – canal, 3 – tidally influenced river.

    manual_add (node files only): Binary flag indicating whether the node was manually added to the public GRWL centerlines (Allen and Pavelsky, 2018). These nodes were originally given a width = 1, but have since been updated to have the reach width values.

    meand_len (node files only): Length of the meander that a node belongs to, measured from beginning of the meander to its end in meters. For nodes longer than one meander, the meander length will represent the average length of all meanders belonging to the node (units: meters).

    sinuosity (node files only): The total reach length the node belongs to divided by the Euclidean distance between the reach end points.

    slope (reach files only): Reach average slope calculated along the GRWL centerline points. Slopes are calculated using a linear regression (units: meters/kilometer).

    n_nodes (reach files only): Number of nodes associated with each reach.

    n_rch_up (reach files only): Number of upstream reaches for each reach.

    n_rch_down (reach files only): Number of downstream reaches for each reach.

    rch_id_up (reach files only): Reach IDs of the upstream neighboring reaches.

    rch_id_dn (reach files only): Reach IDs of the downstream neighboring reaches.

    swot_obs (reach files only): The maximum number of SWOT passes to intersect each reach during the 21 day orbit cycle.

    swot_orbits (reach files only): A list of the SWOT orbit tracks that intersect each reach during the 21 day orbit cycle.

    river_name: All river names associated with a node or reach. If there are multiple names for a node or reach they are listed in alphabetical order and separated by a semicolon.

    edit_flag: Numerical flag indicating the type of update applied to SWORD nodes or reaches from the previous version. Flag descriptions are listed in the Product Description Documentation included with the file downloads.

    trib_flag: Binary flag indicating if a large tributary not represented in SWORD is entering a node or reach. 0 - no tributary, 1 - tributary.
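
    As a rough illustration of the sinuosity attribute defined above (reach length divided by the Euclidean distance between the reach end points), the sketch below computes the ratio for a hypothetical centerline. The function name and the planar (x, y) coordinates in meters are assumptions made for the example; they are not part of the SWORD product or its tooling.

```python
import math


def sinuosity(points):
    """Along-channel length divided by straight-line distance between end points.

    `points` is a hypothetical list of planar (x, y) coordinates in meters.
    """
    if len(points) < 2:
        raise ValueError("need at least two centerline points")
    # Sum the lengths of the consecutive segments along the centerline.
    path_length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    straight = math.dist(points[0], points[-1])
    if straight == 0:
        raise ValueError("end points coincide; sinuosity is undefined")
    return path_length / straight


# Hypothetical three-point centerline: two 5 m segments spanning 6 m end to end.
centerline = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
print(sinuosity(centerline))
```

    A perfectly straight reach gives a value of 1; larger values indicate a more meandering channel.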

    1. References:

    Allen, G. H., & Pavelsky, T. M. (2018). Global extent of rivers and streams. Science, 361(6402), 585-588.

    Altenau, E. H., Pavelsky, T. M., Durand, M. T., Yang, X., Frasson, R. P. d. M., & Bendezu, L. (2021). The Surface Water and Ocean Topography (SWOT) Mission River Database (SWORD): A global river network for satellite data products. Water Resources Research.

    Biancamaria, S., Lettenmaier, D. P., & Pavelsky, T. M. (2016). The SWOT mission and its

  7. Lithuania Number Dataset

    • listtodata.com
    • hmn.listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Cite
    List to Data (2025). Lithuania Number Dataset [Dataset]. https://listtodata.com/lithuania-dataset
    Explore at:
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jul 17, 2025
    Authors
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Lithuania
    Variables measured
    phone numbers, email address, full name, address, city, state, gender, age, income, IP address
    Description

    Lithuania number dataset is a database of phone numbers collected from trusted sources. This means the numbers come from reliable places like government records, websites, or phone companies. The companies that provide this data work hard to ensure it is correct. They even offer source URLs, so you can see where the data came from. Moreover, you get 24/7 support, so if you have questions, help is always available. List to Data is a helpful website for finding important cell numbers quickly. Additionally, the phone numbers in the Lithuania number dataset follow an opt-in system. This means people agreed to share their phone numbers. This system is important because it keeps the data legal. It ensures that you are only contacting people who have given permission. Number data in Lithuania makes it easy to connect with the right people.

    Lithuania phone data is a special set of phone numbers that you can filter to meet your needs. You can easily filter the list by gender, age, and relationship status. For example, you can quickly sort the data to contact older adults or young singles easily. This flexibility makes it easier to communicate with the right audience. Therefore, you can connect with the people you want to reach. Also, the Lithuanian phone data follows strict GDPR rules. These rules protect people’s privacy and make sure their information stays safe. We collect and use the database of Lithuania in ways that respect everyone’s rights. Additionally, it removes any invalid numbers. You can find important phone numbers easily on our website, List to Data.

    Lithuania phone number list is a collection of phone numbers from people living in Lithuania. This list is completely correct and valid, meaning all numbers work properly. Companies check every phone number to ensure it is accurate. If you find a number that doesn’t work, you can get a new one for free. Moreover, Lithuania phone number list is about all numbers from authorized customers. People on this list agreed to share their numbers. As a result, you can use the data without worrying about legal issues. This makes the phonebook safe and useful for businesses that want to connect with people in Lithuania.

  8. Saudi Arabia Phone Number Data

    • listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Cite
    List to Data (2025). Saudi Arabia Phone Number Data [Dataset]. https://listtodata.com/saudi-arabia-number-data
    Explore at:
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jul 17, 2025
    Authors
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Saudi Arabia
    Variables measured
    phone numbers, email address, full name, address, city, state, gender, age, income, IP address
    Description

    Saudi Arabia phone number data is another important collection of phone numbers. These numbers come from trusted sources. We carefully check every number. This means you only get real numbers from reliable places. Furthermore, this data includes source URLs. You can use these URLs to find out where the numbers came from. This adds transparency to the data. If you have questions, you can get help anytime. Support is available 24/7. Moreover, the phone data has an opt-in feature. With customer support always on hand to help, you can feel confident using this data.

    Saudi Arabia number data is a special collection of phone numbers. Besides, this list includes numbers from people living in Saudi Arabia. Each number in this database has verification for accuracy. If you ever find a number that does not work, there is a replacement guarantee. This means any invalid number gets replaced with a valid one at no extra cost. The data comes from people who have given permission. Thus, this respect for privacy makes it a great tool for businesses. At List to Data, we help you find important phone numbers easily and quickly.

  9. Mass Killings in America, 2006 - present

    • data.world
    csv, zip
    Updated Dec 1, 2025
    Cite
    The Associated Press (2025). Mass Killings in America, 2006 - present [Dataset]. https://data.world/associatedpress/mass-killings-public
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Dec 1, 2025
    Authors
    The Associated Press
    Time period covered
    Jan 1, 2006 - Nov 29, 2025
    Area covered
    Description

    THIS DATASET WAS LAST UPDATED AT 7:11 AM EASTERN ON DEC. 1

    OVERVIEW

    2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.

    In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.

    A total of 229 people died in mass killings in 2019.

    The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.

    One-third of the offenders died at the scene of the killing or soon after, half from suicides.

    About this Dataset

    The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.

    The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.

    This data will be updated periodically and can be used as an ongoing resource to help cover these events.

    Using this Dataset

    To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:

    Mass killings by year

    Mass shootings by year

    To get these counts just for your state:

    Filter killings by state

    Definition of "mass murder"

    Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.

    This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”

    Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
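
    The inclusion criteria above can be read as a simple predicate. The sketch below is only an illustration of that reading: the `Incident` fields are hypothetical names chosen for the example, not the database's actual schema, and the seven-day rule for additional spree victims is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # All field names are hypothetical; they do not mirror the real database columns.
    victims_killed: int        # deaths, excluding the offender(s) and unborn children
    hours_span: float          # time between the first and last killing
    intentional: bool          # False for negligent homicides (e.g. DUI, accidental fires)
    in_us_states_or_dc: bool   # True if within the 50 states or Washington D.C.


def meets_definition(incident):
    """Return True if the incident satisfies the stated definition of mass murder."""
    return (
        incident.victims_killed >= 4
        and incident.hours_span <= 24
        and incident.intentional
        and incident.in_us_states_or_dc
    )


# Hypothetical examples, not real records.
print(meets_definition(Incident(4, 2.0, True, True)))    # four victims within 24 hours
print(meets_definition(Incident(5, 1.0, False, True)))   # negligent, so excluded
```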

    Methodology

    Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.

    Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.

    In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.

    Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.

    Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.

    This project started at USA TODAY in 2012.

    Contacts

    Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.

  10. Newcastle Libraries online resources usage - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Sep 20, 2023
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2023). Newcastle Libraries online resources usage - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/newcastle-libraries-online-resources-usage1
    Explore at:
    Dataset updated
    Sep 20, 2023
    Dataset provided by
    CKAN (https://ckan.org/)
    Description

    Monthly usage figures for online resources including databases and e-book platforms when available, for January 2005 to present.

    Additional information

    Blank means no data available.
    In 2020, all library buildings closed from 19 March included due to the coronavirus outbreak.

    Resources included : description {minimum dates of subscription} What the figure is

    • 19th Century British Library Newspapers : digital newspaper archive {May 2007 - present} Number of sessions
    • Access to Research : online journals {April 2014 - present} Number of pages viewed
    • Ancestry : family history {October 2008 - present} Number of sessions until May 2015; number of content pages viewed from June 2015
    • Britannica Online : encyclopedia {January 2005? - present} Number of searches conducted, until June 2014; number of sessions from July 2014
    • British Standards {March 2005 - April 2017; November 2017 - present} Number of content pages viewed
    • British Way of Life : information to help asylum seekers, refugees and migrants in getting settled in the UK {October 2016 - January 2023} Number of sessions - subscription ceased January 2023
    • Citizens Advice Notes : UK law made understandable {March 2007 - March 2016} Number of pages viewed
    • COBRA : business information fact sheets and business sector profiles {October 2005 - present} Number of pages viewed
    • Corporate researcher / Market IQ : company information database {January 2008 - 2015} Number of "reports viewed"
    • EISODOS : information for foreigners coming to live in the UK {October 2008 - October 2013} Information on meaning of figure lost
    • Enquire : "ask a librarian" online chat service {2005 - March 2016} Number of chats started by users in the Newcastle area
    • Find my past : family history {April 2011 - present} Number of sessions (or so we seem to remember when we had access to usage figures)
    • Go Citizen : replaces Life in Great Britain, citizenship test preparing for UK citizenship. {September 2023 - present} Number of tests taken
    • IBISWorld : market research {January 2017 - present} Number of pages viewed
    • Key Note : company information and market research {April 2011 - October 2018} Number of reports viewed
    • Kompass : business information {2006 - July 2011} Information on meaning of figure lost
    • Know UK : current reference information {January 2007 - June 2011} Information on meaning of figure lost
    • Life in Great Britain : self-learn course to prepare for the Life in the UK citizenship test {January 2010 - January 2023} Number of sessions - subscription ceased January 2023
    • Local Data Online : business (retail sector) information {November 2013 - July 2015?} Number of queries per month. No longer receive stats on this as of July 2024.
    • Mint UK & Mint Global : company information databases {March 2014 - 2015} Information on meaning of figure lost
    • Mintel : market reports {2006? - April 2010; June 2013 - present} Number of reports viewed
    • Newsstand : online newspapers {January 2011 - March 2014} Information on meaning of figure lost
    • Onesource / Avention : company information database (changed name over the years) {March 2012 - October 2013; July 2015 - present} Number of searches conducted - Subscription ceased June 2024
    • News UK : newspaper articles {January 2007 - October 2010?} Information on meaning of figure lost
    • Oxford English Dictionary {May 2006 - present} Number of sessions
    • Oxford Art Online {March 2006 - present} Number of sessions
    • Oxford Dictionaries {February 2015 - present} Number of sessions
    • Oxford Dictionary of National Biography {January 2006 - present} Number of sessions
    • Oxford Music Online {March 2006 - present} Number of sessions
    • Oxford Reference Online {March 2006 - present} Number of sessions
    • Safari Select : online books (to read online, as opposed to the e-books you can download and read offline) {May 2009 - March 2014} Number of books viewed
    • Times Digital Archive : digitised newspapers {January 2005 - present} Number of sessions
    • Theory Test Pro : practice questions for the driving theory test {August 2010 - present} Number of sessions
    • Transparent language online / Byki : language courses {January 2011 - November 2012} Number of courses accessed
    • Universal Skills : learn basic computer skills and how to use Universal Job Match {November 2014 - present} Number of users
    • Newcastle Library App (devices) : number of devices the app is on {2013 - present}
    • Newcastle Library App (launches) : number of times the app has been used {2013 - present}
    • Bibliotheca Cloud Library : e-books and e-audiobooks {February 2016 - March 2018} Number of items borrowed
    • Bolinda : e-audiobooks collection {2012 - February 2016} Number of items borrowed (figures only from April 2015)
    • Bolinda BorrowBox e-books {February 2018 - present} Number of items borrowed
    • Bolinda BorrowBox e-audiobooks {February 2018 - present} Number of items borrowed
    • ComicsPlus : e-comic books {March 2017} Number of items borrowed - no longer record this, not sure when subscription ceased
    • OneClick / RB Digital (e-audiobooks) : e-audiobooks collection (became RB Digital in... 2017?) {May 2015} Number of items borrowed - no longer record this, not sure when subscription ceased
    • Overdrive (e-audiobooks) {2011 - May 2016} Number of items borrowed (figures only from April 2015) - subscription ceased January 2023016} Number of items borrowed (figures only from April 2015) - subscription ceased March 2023
    • Public Library Online : e-books collection {April 2016 - February 2018} Number of items borrowed
    • Zinio / RB Digital (magazines) : digital magazines (the Zinio service became integrated with the other RB Digital content in 2017) {May 2015 - present} Number of magazines downloaded (figures only from January 2016)

  11. Elevation, Flow Accumulation, Flow Direction, and Stream Definition Data in...

    • data.usgs.gov
    • datasets.ai
    • +2more
    Updated Dec 8, 2023
    + more versions
    Cite
    Lindsey Schafer; Jennifer Sharpe (2023). Elevation, Flow Accumulation, Flow Direction, and Stream Definition Data in Support of the Illinois StreamStats Upgrade to the Basin Delineation Database [Dataset]. http://doi.org/10.5066/P9YIAUZQ
    Explore at:
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Lindsey Schafer; Jennifer Sharpe
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Time period covered
    2023
    Area covered
    Illinois
    Description

    The U.S. Geological Survey (USGS), in cooperation with the Illinois Center for Transportation and the Illinois Department of Transportation, prepared hydro-conditioned geographic information systems (GIS) layers for use in the Illinois StreamStats application. These data were used to delineate drainage basins and compute basin characteristics for updated peak flow and flow duration regression equations for Illinois. This dataset consists of raster grid files for elevation (dem), flow accumulation (fac), flow direction (fdr), and stream definition (str900) for each 8-digit Hydrologic Unit Code (HUC) area in Illinois merged into a single dataset. There are 51 full or partial HUC 8s represented by this data set: 04040002, 05120108, 05120109, 05120111, 05120112, 05120113, 05120114, 05120115, 05140202, 05140203, 05140204, 05140206, 07060005, 07080101, 07080104, 07090001, 07090002, 07090003, 07090004, 07090005, 07090006, 07090007, 07110001, 07110004, 07110009, 07120001, 07120002, 071200 ...

  12. Dataset: A Systematic Literature Review on the topic of High-value datasets

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 23, 2023
    Cite
    Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424
    Explore at:
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    Gdańsk University of Technology
    University of the Aegean
    University of Tartu
    University of Zagreb
    Authors
    Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (a pre-print is available in Open Access at https://arxiv.org/abs/2305.10234) and so that other researchers can use these data in their own work.

    The protocol is intended for the systematic literature review (SLR) on the topic of high-value datasets, with the aim of gathering information on how the topic of high-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected as a result of the SLR over Scopus, Web of Science, and Digital Government Research library (DGRL) in 2023.

    Methodology

    To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out by searching digital libraries covered by Scopus, Web of Science (WoS), and Digital Government Research library (DGRL).

    These databases were queried for keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the number of papers to those, where these objects were primary research objects rather than mentioned in the body, e.g., as a future work. After deduplication, 11 articles were found unique and were further checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.
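
    The boolean query above can be approximated locally, for example when re-screening exported records. The sketch below is an assumption-laden illustration: the function name and the three text fields are hypothetical, and the regular expressions only approximate the phrase and wildcard matching that Scopus, WoS, or DGRL actually perform.

```python
import re

# ("open data" OR "open government data")
GROUP_A = re.compile(r"\bopen (government )?data\b", re.IGNORECASE)
# ("high-value data*" OR "high value data*"); \w* stands in for the trailing wildcard
GROUP_B = re.compile(r"\bhigh[- ]value data\w*", re.IGNORECASE)


def matches_query(title, keywords, abstract):
    """Return True if both keyword groups occur in the title, keywords, or abstract."""
    # Join with newlines so a phrase cannot accidentally span two fields.
    text = "\n".join([title, keywords, abstract])
    return bool(GROUP_A.search(text)) and bool(GROUP_B.search(text))


# Hypothetical records, not taken from the actual search results.
print(matches_query("Identifying high-value datasets", "open data; policy", ""))
print(matches_query("Open data portals", "usability", "A usability study."))
```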

    To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.

    Test procedure

    Each study was independently examined by at least two authors. After an in-depth examination of the full text of the article, the structured protocol was filled in for each study. The structure of the survey is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by a third researcher.

    Description of the data in this data set

    Protocol_HVD_SLR provides the structure of the protocol.
    Spreadsheet #1 provides the filled protocol for relevant studies.
    Spreadsheet #2 provides the list of results after the search over three indexing databases, i.e. before filtering out irrelevant studies.

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information

    Descriptive information
    1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
    2) Complete reference - the complete source information to refer to the study
    3) Year of publication - the year in which the study was published
    4) Journal article / conference paper / book chapter - the type of the paper {journal article, conference paper, book chapter}
    5) DOI / Website - a link to the website where the study can be found
    6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
    7) Availability in OA - availability of an article in the Open Access
    8) Keywords - keywords of the paper as indicated by the authors
    9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

    Approach- and research design-related information
    10) Objective / RQ - the research objective / aim, established research questions
    11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR etc.)
    12) Contributions - the contributions of the study
    13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach?
    14) Availability of the underlying research data - whether there is a reference to the publicly available underlying research data e.g., transcriptions of interviews, collected data, or explanation why these data are not shared?
    15) Period under investigation - period (or moment) in which the study was conducted
    16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?

    Quality- and relevance-related information
    17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)?
    18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, secondary - mentioned but not studied (e.g., as part of discussion, future work etc.))

    HVD determination-related information
    19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
    20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, "input -> output")
    21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
    22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
    23) Data - what data do HVD cover?
    24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)

    Format of the file .xls, .csv (for the first spreadsheet only), .odt, .docx

    Licenses or restrictions CC-BY

    For more info, see README.txt

  13. Dictionary of English Words and Definitions

    • kaggle.com
    zip
    Updated Sep 22, 2024
    + more versions
    Cite
    AnthonyTherrien (2024). Dictionary of English Words and Definitions [Dataset]. https://www.kaggle.com/datasets/anthonytherrien/dictionary-of-english-words-and-definitions
    Explore at:
    Available download formats: zip (6401928 bytes)
    Dataset updated
    Sep 22, 2024
    Authors
    AnthonyTherrien
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Dataset Overview

    This dataset consists of 42,052 English words and their corresponding definitions. It is a comprehensive collection of words ranging from common terms to more obscure vocabulary. The dataset is ideal for Natural Language Processing (NLP) tasks, educational tools, and various language-related applications.

    Key Features:

    • Words: A diverse set of English words, including both rare and frequently used terms.
    • Definitions: Each word is accompanied by a detailed definition that explains its meaning and contextual usage.

    Total Number of Words: 42,052

    Applications

    This dataset is well-suited for a range of use cases, including:

    • Natural Language Processing (NLP): Enhance text understanding models by providing contextual meaning and word associations.
    • Vocabulary Building: Create educational tools or games that help users expand their vocabulary.
    • Lexical Studies: Perform academic research on word usage, trends, and lexical semantics.
    • Dictionary and Thesaurus Development: Serve as a resource for building dictionary or thesaurus applications, where users can search for words and definitions.

    Data Structure

    • Word: The column containing the English word.
    • Definition: The column providing a comprehensive definition of the word.

    Potential Use Cases

    • Language Learning: This dataset can be used to develop applications or tools aimed at enhancing vocabulary acquisition for language learners.
    • NLP Model Training: Useful for tasks such as word embeddings, definition generation, and contextual learning.
    • Research: Analyze word patterns, rare vocabulary, and trends in the English language.


  14. A database of semantic structures for analogy

    • kaggle.com
    zip
    Updated Mar 7, 2025
    Cite
    Edmund Dantes (2025). A database of semantic structures for analogy [Dataset]. https://www.kaggle.com/datasets/mtatlas/atlas-analogy-structures
    Explore at:
    Available download formats: zip (36943640 bytes)
    Dataset updated
    Mar 7, 2025
    Authors
    Edmund Dantes
    License

    https://cdla.io/sharing-1-0/

    Description

    This database, structured as a text-file with one line per structural entry per concept, associates symbolic structures with the names of lexical concepts such as scientist, art, war, politics, gun, etc.

    Each line specifies an "open-form" structure which uses an open-ended set of predicates, and one or more canonical rewritings of this structure using a small, closed set of predicates (essentially semantic primitives).

    For each open-form and closed-form structure, the symbol marked with an asterisk denotes the entity that the structure is about. The same structure that relates scientist to science may be associated with "scientist" and with "science", but the asterisk will be on a different symbol in each case. During analogical structure-mapping, a symbol marked with an asterisk can only be mapped to another symbol that is so marked; this prevents a structure about art or magic, say, being mapped to a structure about scientists; rather, scientists should be mapped to artists or magicians while science is mapped to art or magic.

    For each open-form and closed-form structure we also provide an abstraction form in which non-predicate entities are replaced with numeric variables. Two different structures, for scientist and artist say, will be superficially different but they may have the same abstraction form, which means they can be structure-mapped consistently. Abstract forms can be used as keys in a hash map that maps to all of the specific structures that instantiate those abstractions. In this way, retrieving possible analogues becomes very efficient.

    Each abstraction form has a number that indicates the number of specific forms it is associated with. If this number is 1, it means that the form is unique; if greater than 1, it indicates there are other, different specific forms that have the same abstract form (and can thus be retrieved as a potential analogue).
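
    The abstraction-form lookup described above can be sketched as follows. This is only an illustration: the on-disk line format is not given here, so the tuple representation, the predicate names, and the helper names `abstract_form` and `build_index` are all assumptions, and the asterisk marking is not modelled.

```python
from collections import defaultdict


def abstract_form(structure):
    """Replace each distinct non-predicate entity with a numeric variable.

    `structure` is a hypothetical in-memory form: a list of tuples whose
    first element is a predicate and whose remaining elements are entities.
    """
    variables = {}
    abstracted = []
    for predicate, *entities in structure:
        slots = []
        for entity in entities:
            # Number entities in order of first appearance.
            if entity not in variables:
                variables[entity] = len(variables)
            slots.append(variables[entity])
        abstracted.append((predicate, *slots))
    return tuple(abstracted)  # tuples are hashable, so usable as dict keys


def build_index(structures):
    """Map each abstraction form to the concepts whose structures share it."""
    index = defaultdict(list)
    for concept, structure in structures.items():
        index[abstract_form(structure)].append(concept)
    return index


# Invented example structures (not taken from the dataset).
scientist = [("practise", "scientist", "science"), ("produce", "scientist", "theory")]
artist = [("practise", "artist", "art"), ("produce", "artist", "painting")]
soldier = [("fight_in", "soldier", "war")]

index = build_index({"scientist": scientist, "artist": artist, "soldier": soldier})
analogues = index[abstract_form(scientist)]
```

    Here `len(index[key])` plays the role of the count attached to each abstraction form: a value of 1 means the form is unique, and a larger value means other structures can be retrieved as candidate analogues.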

  15. Confusion matrix of K-Means clustering results on dataset 6.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Shaobin Huang; Yuan Cheng; Dapeng Lang; Ronghua Chi; Guofeng Liu (2023). Confusion matrix of K-Means clustering results on dataset 6. [Dataset]. http://doi.org/10.1371/journal.pone.0090109.t006
    Explore at:
    xls
    Available download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Shaobin Huang; Yuan Cheng; Dapeng Lang; Ronghua Chi; Guofeng Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Confusion matrix of K-Means clustering results on dataset 6.

  16. Data from: Thunderstorm outflows in the Mediterranean Sea area

    • zenodo.org
    txt, zip
    Updated Apr 30, 2024
    Cite
    Federico Canepa; Massimiliano Burlando; Maria Pia Repetto (2024). Thunderstorm outflows in the Mediterranean Sea area [Dataset]. http://doi.org/10.5281/zenodo.10688746
    Explore at:
    txt, zip
    Available download formats
    Dataset updated
    Apr 30, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Federico Canepa; Massimiliano Burlando; Maria Pia Repetto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Mediterranean Sea
    Description

    In the context of the European projects “Wind and Ports” (grant No. B87E09000000007) and “Wind, Ports and Sea” (grant No. B82F13000100005), an extensive in-situ wind monitoring network was installed in the main ports of the Northern Mediterranean Sea. An unprecedented number of wind records has been acquired and systematically analyzed. Among these, a considerable number of records presented non-stationary and non-Gaussian characteristics that are completely different from those of synoptic extra-tropical cyclones, which are widely known in the atmospheric science and wind engineering communities. Cross-checking with meteorological information makes it possible to identify which of these events can be defined as thunderstorm winds, i.e., downbursts and gust fronts.

    The scientific literature of the last few decades has demonstrated that downbursts, and especially microbursts, are extremely dangerous for the natural and built environment. Furthermore, recent trends in climate change seem to point to drastic future scenarios in terms of increased intensity and frequency of this type of extreme event. However, the limited extent of thunderstorm outflows in space and time still makes them difficult to measure in nature and, consequently, makes it difficult to build physically reliable and easily applicable models of the kind available for extra-tropical cyclones. For these reasons, the collection and publication of events of this type represents a unique opportunity for the scientific community.

    The dataset presented here was built in the context of the activities of the project THUNDERR “Detection, simulation, modelling and loading of thunderstorm outflows to design wind-safer and cost-efficient structures”, financed by the European Research Council (ERC), Advanced Grant 2016 (grant No. 741273, P.I. Prof. Giovanni Solari, University of Genoa). It collects 29 thunderstorm downbursts that occurred between 2010 and 2015 in the Italian ports of Genoa (GE) (4), Livorno (LI) (14), and La Spezia (SP) (11), and were recorded by means of ultrasonic anemometers (Gill WindObserver II in Genoa and La Spezia, Gill WindMaster Pro in Livorno). All thunderstorm events included in the database were verified by means of meteorological information, such as radar (CIMA Research Foundation is gratefully acknowledged for providing most of the radar images), satellite, and lightning data. In fact, (i) high and localized clouds typical of thunderstorm cumulonimbus, (ii) precipitation, and (iii) lightning are reliable indicators of the occurrence of a thunderstorm event.

    Some events were recorded by multiple anemometers in the same port area – the total number of signals included in the database is 99. Despite the limited number of points (anemometers), this will allow the user to perform cross-correlation analysis in time and space to eventually retrieve size, position, trajectory of the storm, etc.

    The ASCII tab-delimited file ‘Anemometers_location.txt’ reports the specifications of the anemometers used in this monitoring study: port code (Port code – Genoa-GE, Livorno-LI, La Spezia-SP); anemometer code (Anemometer code); latitude (Lat.) and longitude (Lon.) in decimal degrees WGS84; height above ground level (h a.g.l.) in meters; and instrument type. Bi-axial anemometers were used in the ports of Genoa and La Spezia, recording the two horizontal wind speed components (u, v). Three-axial ultrasonic anemometers were used in the port of Livorno, also providing the vertical wind speed component w (except for the bi-axial anemometers LI06 and LI07). All anemometers acquired velocity data at a sampling frequency of 10 Hz with a sensitivity of 0.01 m/s (except anemometers LI06 and LI07, with a sensitivity of 0.1 m/s) and were installed at heights ranging from 13.0 to 75.0 m, as reported in the file ‘Anemometers_location.txt’.

    The ASCII tab-delimited file ‘List_DBevents.txt’ lists all downburst records included in the database, in terms of: event and record number (Event | record no.); port code (Port code); date of event occurrence (Date) in the format yyyy-mm-dd; approximate time of occurrence of the velocity peak (Time [UTC]) in the format HH:MM; anemometer code (Anemometer code).

    The database is presented as a zip file (‘DB-records.zip’). The events are divided based on the port of occurrence (three folders GE, LI, and SP). Within each folder, the downburst events that were recorded in that specific port are reported as subfolders (name format ‘[port code]_yyyy-mm-dd’) and contain the individual anemometer signals as TAB-delimited text files (name format ‘[port and anemometer code]_yyyy-mm-dd.txt’).

    Each file contains 3 (or 4) columns and 360,000 rows, i.e., 10 hours sampled at 10 Hz. The first column shows the 10-h time vector (t, ISO format) in UTC, while the remaining 2 (or 3) columns report the 10-h time series of 10-Hz instantaneous horizontal (zonal west-to-east u, meridional south-to-north v) and, where available, vertical (positive upward w) wind speed components, centred around the time of maximum horizontal wind speed (vector sum of u and v). Representing the wind speed over a long time interval (10 hours) allows the user to perform a more comprehensive and detailed analysis of the event by also taking into account the wind conditions before and after the onset of the downburst phenomenon.

    'Not-a-Number' (‘NaN’) values are reported in wind velocity signals when the instrument did not record valid data. Some wind speed records show noise in discrete intervals of the signal, which is reflected in an increase of the wind speed standard deviation. A modified Hampel filter was employed to remove measurement outliers. For each wind speed signal, every data sample was considered in ascending order, along with its ten adjacent samples (five on each side). The median and an estimate of the standard deviation, derived from the median absolute deviation, were calculated within each sampling window. Elements deviating from the median by more than six standard deviations were identified and replaced with 'NaN'. Tuning the filter parameters involved finding a balance between overly aggressive and insufficient removal of outliers. Residual outliers were subsequently removed manually through meticulous qualitative inspection. The complexity and subjectivity of this operation leave room for users to explore alternative approaches. Consequently, the published dataset includes two versions: an initial version (v1) comprising the original raw data with no filtering applied, and a second "cleaned" version (v2).
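
    The outlier-removal step described above can be approximated with a textbook Hampel filter. The sketch below is not the authors' modified filter: the function name `hampel_filter`, the handling of window edges and existing NaN values, and the Gaussian scale factor 1.4826 are assumptions, while the window of five samples on each side and the six-standard-deviation threshold come from the description.

```python
import math
from statistics import median

# Assumed scale factor: converts the median absolute deviation (MAD) into a
# standard-deviation estimate for normally distributed data.
MAD_TO_STD = 1.4826


def hampel_filter(signal, half_window=5, n_sigmas=6.0):
    """Return a copy of `signal` with detected outliers replaced by NaN."""
    cleaned = list(signal)
    for i, value in enumerate(signal):
        if math.isnan(value):
            continue  # already marked as invalid by the instrument
        lo = max(0, i - half_window)
        hi = min(len(signal), i + half_window + 1)
        # Ignore existing NaN values when computing window statistics.
        window = [x for x in signal[lo:hi] if not math.isnan(x)]
        med = median(window)
        mad = median(abs(x - med) for x in window)
        sigma = MAD_TO_STD * mad
        if abs(value - med) > n_sigmas * sigma:
            cleaned[i] = math.nan
    return cleaned


# Invented wind speed samples (m/s) with one obvious spike at index 5.
u = [5.0, 5.1, 4.9, 5.2, 5.0, 25.0, 5.1, 4.8, 5.0, 5.2, 4.9, 5.1]
u_clean = hampel_filter(u)
```

    Note that in a window where more than half of the samples are identical the MAD is zero, so any differing sample is flagged; a production filter would likely need a floor on the scale estimate.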

    The presented database can be further used by researchers to validate and calibrate experimental and numerical simulations, as well as analytical models, of downburst winds. It will also be an important resource for the scientific community working in wind engineering, meteorology and atmospheric sciences, as well as in risk management and the reduction of losses related to thunderstorm events (e.g., for insurance companies).

  17. China CN: Total R&D Personnel: Compound Annual Growth Rate

    • ceicdata.com
    Cite
    CEICdata.com, China CN: Total R&D Personnel: Compound Annual Growth Rate [Dataset]. https://www.ceicdata.com/en/china/number-of-researchers-and-personnel-on-research-and-development-non-oecd-member-annual
    Explore at:
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2011 - Dec 1, 2022
    Area covered
    China
    Description

    CN: Total R&D Personnel: Compound Annual Growth Rate data was reported at 11.148 % in 2022. This is an increase from the previous figure of 9.205 % for 2021. The data is updated yearly, with a median of 8.624 % from Dec 1992 to 2022 over 29 observations. The data reached an all-time high of 18.409 % in 2005 and a record low of -9.143 % in 1998. The series remains in active status in CEIC and is reported by the Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s China – Table CN.OECD.MSTI: Number of Researchers and Personnel on Research and Development: Non OECD Member: Annual.

    The national breakdown by source of funds does not fully match with the classification defined in the Frascati Manual. The R&D financed by the government, business enterprises, and by the rest of the world can be retrieved but part of the expenditure has no specific source of financing, i.e. self-raised funding (in particular for independent research institutions), the funds from the higher education sector and left-over government grants from previous years.

    The government and higher education sectors cover all fields of NSE and SSH, while the business enterprise sector only covers the fields of NSE. There are only a few organisations in the private non-profit sector, hence no R&D survey has been carried out in this sector and the data are not available.

    From 2009, researcher data are collected according to the Frascati Manual definition of researcher. Beforehand, this was only the case for independent research institutions, while for the other sectors data were collected according to the UNESCO concept of “scientist and engineer”.

    In 2009, the survey coverage in the business and the government sectors was expanded.

    Before 2000, all of the personnel data and 95% of the expenditure data in the business enterprise sector are for large and medium-sized enterprises only. Since 2000 however, the survey covers almost all industries and all enterprises above a certain threshold. In 2000 and 2004, a census of all enterprises was held, while in the intermediate years data for small enterprises are estimated.

    Due to the reform of the S&T system some government institutions have become enterprises, and their R&D data have been reflected in the Business Enterprise sector since 2000.

  18. Directory of Important Wetlands Spatial Database including Wetlands Type and...

    • data.ozcoasts.org.au
    • researchdata.edu.au
    Updated Oct 15, 2008
    Cite
    Australian Government Department of the Environment (2008). Directory of Important Wetlands Spatial Database including Wetlands Type and Criteria [Dataset]. https://data.ozcoasts.org.au/geonetwork/srv/api/records/%7B0377A251-4E6C-48DF-95FC-720048F879B6%7D
    Explore at:
    ogc:wms-1.3.0-http-get-map, www:link-1.0-http--link
    Available download formats
    Dataset updated
    Oct 15, 2008
    Dataset provided by
    Australian Government: http://www.australia.gov.au/
    Authors
    Australian Government Department of the Environment
    Area covered
    Description

    This is a polygon coverage representing the wetlands cited in "A Directory of Important Wetlands in Australia", Third Edition (EA, 2001), plus various additions for wetlands listed after 2001. This dataset includes attribute information showing the wetland type and criteria for listing for each wetland. This coverage is a compilation of various data sources and has been collected using a variety of methods. This dataset should therefore be used only as an indicative guide to wetland boundaries and locations.

    The data has been collated by the Australian Government Department of the Environment from various datasets, including those supplied by the relevant State agencies. State agency contributors include the Queensland Environmental Protection Authority, NSW Department of Environment and Conservation and the Victorian Department of Sustainability and Environment. For the identification of wetland boundaries or locations in regard to the compliance of activities with relevant State legislation, the relevant State authority should be contacted to obtain the most recent and accurate wetland boundary information available.

    The definition of a wetland used in this dataset is that adopted by the Ramsar Convention, namely: "areas of marsh, fen, peatland or water, whether natural or artificial, permanent or temporary, with water that is static or flowing, fresh, brackish or salt, including areas of marine water the depth of which at low tide does not exceed six meters."

    Attributes in the dataset include:

    WNAME: the name of the wetland site as listed in the Directory.
    REFCODE: an individual reference number including a cross reference to the State in which it occurs. The first 2-3 characters relate to the State or Territory of origin, followed by the 3-digit sequential wetland numeric code (e.g. "NSW001": NSW=New South Wales; 001=wetland number).
    WET_TYPE: the wetland type code. Definitions are shown below.
    CRITERIA: the criteria for listing code. Definitions are shown below.

    WETLAND TYPE CODES:

    A-Marine and Coastal Zone wetlands
      1. Marine waters-permanent shallow waters less than six metres deep at low tide; includes sea bays, straits
      2. Subtidal aquatic beds; includes kelp beds, seagrasses, tropical marine meadows
      3. Coral reefs
      4. Rocky marine shores; includes rocky offshore islands, sea cliffs
      5. Sand, shingle or pebble beaches; includes sand bars, spits, sandy islets
      6. Estuarine waters; permanent waters of estuaries and estuarine systems of deltas
      7. Intertidal mud, sand or salt flats
      8. Intertidal marshes; includes saltmarshes, salt meadows, saltings, raised salt marshes, tidal brackish and freshwater marshes
      9. Intertidal forested wetlands; includes mangrove swamps, nipa swamps, tidal freshwater swamp forests
      10. Brackish to saline lagoons and marshes with one or more relatively narrow connections with the sea
      11. Freshwater lagoons and marshes in the coastal zone
      12. Non-tidal freshwater forested wetlands

    B-Inland wetlands
      1. Permanent rivers and streams; includes waterfalls
      2. Seasonal and irregular rivers and streams
      3. Inland deltas (permanent)
      4. Riverine floodplains; includes river flats, flooded river basins, seasonally flooded grassland, savanna and palm savanna
      5. Permanent freshwater lakes (more than 8 ha); includes large oxbow lakes
      6. Seasonal/intermittent freshwater lakes (more than 8 ha), floodplain lakes
      7. Permanent saline/brackish lakes
      8. Seasonal/intermittent saline lakes
      9. Permanent freshwater ponds (< 8 ha)
      10. Ponds, including farm ponds, stock ponds, small tanks (generally
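
    The REFCODE layout described above (a 2-3 character State or Territory prefix followed by a 3-digit wetland number) can be split with a small helper. This is a sketch only: the function name `parse_refcode` is hypothetical, the assumption that the prefix is always upper-case letters is not stated in the description, and "WA012" is an invented code used for illustration.

```python
import re

# Assumed layout: 2-3 upper-case letters, then exactly three digits.
REFCODE_PATTERN = re.compile(r"^([A-Z]{2,3})(\d{3})$")


def parse_refcode(refcode):
    """Split a REFCODE such as "NSW001" into (state prefix, wetland number)."""
    match = REFCODE_PATTERN.match(refcode.strip())
    if match is None:
        raise ValueError(f"unexpected REFCODE format: {refcode!r}")
    state, number = match.groups()
    return state, int(number)


state, wetland_number = parse_refcode("NSW001")
```

    Codes that do not fit the assumed pattern raise `ValueError` rather than being silently truncated, which makes any deviation in the real attribute table visible early.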

  19. Global Terrorism Dataset

    • kaggle.com
    zip
    Updated Mar 7, 2023
    Cite
    Parth (2023). Global Terrorism Dataset [Dataset]. https://www.kaggle.com/parthdevrani/global-terrorism-dataset
    Explore at:
    zip (30077034 bytes)
    Available download formats
    Dataset updated
    Mar 7, 2023
    Authors
    Parth
    Description

    Content

    Geography: Worldwide

    Time period: 1970-2017, except 1993

    Unit of analysis: Attack

    Variables: >100 variables on location, tactics, perpetrators, targets, and outcomes

    Sources: Unclassified media articles (Note: Please interpret changes over time with caution. Global patterns are driven by diverse trends in particular regions, and data collection is influenced by fluctuations in access to media coverage over both time and place.)

    Definition of terrorism:

    "The threatened or actual use of illegal force and violence by a non-state actor to attain a political, economic, religious, or social goal through fear, coercion, or intimidation."

    See the GTD Codebook for important details on data collection methodology, definitions, and coding schema.

    Acknowledgements

    The Global Terrorism Database is funded through START, by the US Department of State (Contract Number: SAQMMA12M1292) and the US Department of Homeland Security Science and Technology Directorate’s Office of University Programs (Award Number 2012-ST-061-CS0001, CSTAB 3.1). The coding decisions and classifications contained in the database are determined independently by START researchers and should not be interpreted as necessarily representing the official views or policies of the United States Government.

    GTD Team

    Publications

    The GTD has been leveraged extensively in scholarly publications, reports, and media articles. "Putting Terrorism in Context: Lessons from the Global Terrorism Database", by GTD principal investigators LaFree, Dugan, and Miller, investigates patterns of terrorism and provides perspective on the challenges of data collection and analysis. The GTD's data collection manager, Michael Jensen, discusses important "Benefits and Drawbacks of Methodological Advancements in Data Collection and Coding".

    Terms of Use

    Use of the data signifies your agreement to the following terms and conditions.

    END USER LICENSE AGREEMENT WITH UNIVERSITY OF MARYLAND

    IMPORTANT – THIS IS A LEGAL AGREEMENT BETWEEN YOU ("You") AND THE UNIVERSITY OF MARYLAND, a public agency and instrumentality of the State of Maryland, by and through the National Consortium for the Study of Terrorism and Responses to Terrorism (“START,” “US,” “WE” or “University”). PLEASE READ THIS END USER LICENSE AGREEMENT (“EULA”) BEFORE ACCESSING THE Global Terrorism Database (“GTD”). THE TERMS OF THIS EULA GOVERN YOUR ACCESS TO AND USE OF THE GTD WEBSITE, THE DATA, THE CODEBOOK, AND ANY AUXILIARY MATERIALS. BY ACCESSING THE GTD, YOU SIGNIFY THAT YOU HAVE READ, UNDERSTAND, ACCEPT, AND AGREE TO ABIDE BY THESE TERMS AND CONDITIONS. IF YOU DO NOT ACCEPT THE TERMS OF THIS EULA, DO NOT ACCESS THE GTD.

    TERMS AND CONDITIONS

    GTD means Global Terrorism Database data and the online user interface (www.start.umd.edu/gtd) produced and maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START). This includes the data and codebook, any auxiliary materials present, and the user interface by which the data are presented.

    LICENSE GRANT. University hereby grants You a revocable, non-exclusive, non-transferable right and license to access the GTD and use the data, the codebook, and any auxiliary materials solely for non-commercial research and analysis.

    RESTRICTIONS. You agree to NOT:
      a. publicly post or display the data, the codebook, or any auxiliary materials without express written permission by University of Maryland (this excludes publication of analysis or visualization of the data for non-commercial purposes);
      b. sell, license, sublicense, or otherwise distribute the data, the codebook, or any auxiliary materials to third parties for cash or other considerations;
      c. modify, hide, delete or interfere with any notices that are included on the GTD or the codebook, or any auxiliary materials;
      d. use the GTD to draw conclusions about the official legal status or criminal record of an individual, or the status of a criminal or civil investigation;
      e. interfere with or disrupt the GTD website or servers and networks connected to the GTD website; or
      f. use robots, spiders, crawlers, automated devices and similar technologies to screen-scrape the site or to engage in data aggregation or indexing of the data, the codebook, or any auxiliary materials other than in accordance with the site’s robots.txt file.

    YOUR RESPONSIBILITIES:
      a. All information sourced from the GTD should be acknowledged and cited as follows: "National Consortium for the Study of Terrorism and Responses to Terrorism (START), University of Maryland. (2018). The Global Terrorism Database (GTD) [Data file]. Retrieved from https://www.start.umd.edu/gtd"
      b. You agree to acknowledge any copyrightable materials with a copyright notice “Copyright University of Maryland 2018.”
      c. Any modifications You make to the GTD for published analysis must be clearly documented and must not misrepresent an...

  20. Russia RU: Total Business Enterprise R&D Personnel: Per Thousand Employment...

    • ceicdata.com
    Updated Feb 1, 2001
    Cite
    CEICdata.com (2001). Russia RU: Total Business Enterprise R&D Personnel: Per Thousand Employment In Industry [Dataset]. https://www.ceicdata.com/en/russia/number-of-researchers-and-personnel-on-research-and-development-non-oecd-member-annual/ru-total-business-enterprise-rd-personnel-per-thousand-employment-in-industry
    Explore at:
    Dataset updated
    Feb 1, 2001
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2009 - Dec 1, 2020
    Area covered
    Russia
    Description

    Russia RU: Total Business Enterprise R&D Personnel: Per Thousand Employment In Industry data was reported at 7.086 Per 1000 in 2020. This is a decrease from the previous figure of 7.356 Per 1000 for 2019. The data is updated yearly, with a median of 8.988 Per 1000 from Dec 1998 to 2020 over 23 observations. The data reached an all-time high of 13.599 Per 1000 in 1998 and a record low of 6.724 Per 1000 in 2018. The series remains in active status in CEIC and is reported by the Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s Russian Federation – Table RU.OECD.MSTI: Number of Researchers and Personnel on Research and Development: Non OECD Member: Annual.

    In response to Russia's large-scale aggression against Ukraine, the OECD Council decided on 8 March 2022 to immediately suspend the participation of Russia and Belarus in OECD bodies. In view of this decision, the OECD suspended its solicitation of official statistics on R&D from Russian authorities, leading to the absence of more recent R&D statistics for this country in the OECD database, while previously compiled data are still available.

    The business enterprise sector includes all organisations and enterprises whose main activity is connected with the production of goods and services for sale, including those owned by the state, and private non-profit institutions serving the above-mentioned organisations. In practice, however, R&D performed in this sector is carried out mostly by industrial research institutes other than enterprises. This particularity reflects the traditional organisation of Russian R&D.

    Headcount data include full-time personnel only, and hence are underestimated, while data in full-time equivalents (FTE) are calculated on the basis of both full-time and part-time personnel. This explains why the FTE data are greater than the headcount data.

    New budgetary procedures introduced in 2005 have resulted in items previously classified as GBARD being attributed to other headings and have affected the coverage and breakdown by socio-economic objective.

    Definition of MSTI variables 'Value Added of Industry' and 'Industrial Employment':

    R&D data are typically expressed as a percentage of GDP to allow cross-country comparisons. When compiling such indicators for the business enterprise sector, one may wish to exclude, from GDP measures, economic activities for which the Business R&D (BERD) is null or negligible by definition. By doing so, the adjusted denominator (GDP, or Value Added, excluding non-relevant industries) better corresponds to the numerator (BERD) with which it is compared.

    The MSTI variable 'Value added in industry' is used to this end:

    It is calculated as the total Gross Value Added (GVA) excluding 'real estate activities' (ISIC rev.4 68) where the 'imputed rent of owner-occupied dwellings', specific to the framework of the System of National Accounts, represents a significant share of total GVA and has no R&D counterpart. Moreover, the R&D performed by the community, social and personal services is mainly driven by R&D performers other than businesses.

    Consequently, the following service industries are also excluded: ISIC rev.4 84 to 88 and 97 to 98. GVA data are presented at basic prices except for the People's Republic of China, Japan and New Zealand (expressed at producers' prices).

    In the same way, some indicators on R&D personnel in the business sector are expressed as a percentage of industrial employment. The latter corresponds to total employment excluding ISIC rev.4 68, 84 to 88 and 97 to 98.
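
    The adjusted denominator described above can be illustrated with a short sketch. The excluded ISIC rev.4 divisions (68, 84 to 88, 97 to 98) come from the description; the function names, the dictionary layout keyed by ISIC division, and all figures are invented for illustration and are not OECD data.

```python
# ISIC rev.4 divisions excluded from 'Value added in industry'.
EXCLUDED_DIVISIONS = {68} | set(range(84, 89)) | set(range(97, 99))


def value_added_in_industry(gva_by_division):
    """Sum gross value added over all divisions that are not excluded."""
    return sum(
        gva
        for division, gva in gva_by_division.items()
        if division not in EXCLUDED_DIVISIONS
    )


def berd_intensity(berd, gva_by_division):
    """Business R&D as a percentage of value added in industry."""
    return 100.0 * berd / value_added_in_industry(gva_by_division)


# Hypothetical GVA by ISIC division, in arbitrary currency units.
gva = {10: 200.0, 62: 150.0, 68: 120.0, 84: 80.0, 97: 5.0}

industry_gva = value_added_in_industry(gva)  # divisions 68, 84 and 97 are dropped
intensity = berd_intensity(7.0, gva)
```

    The same exclusion set would apply to the employment denominator, replacing GVA values with employment counts per division.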

Cite
stepheniota (2023). Number sequences for self-supervised learning [Dataset]. https://www.kaggle.com/datasets/stepheniota/integer-sequences-for-representation-learning

Number sequences for self-supervised learning

Learn mathematical properties of the Integers through modern embedding methods

Explore at:
zip (36109567 bytes)
Available download formats
Dataset updated
Jan 21, 2023
Authors
stepheniota
License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Description

Number sequences for self-supervised learning

The On-Line Encyclopedia of Integer Sequences (OEIS) is a well-known database of number sequences with interesting and unique mathematical properties. Some sequences are easily recognizable, like A000045 (the Fibonacci numbers) or A000108 (the Catalan numbers). Others are more abstract and require complicated formulas to validate (e.g. A000289, a nonlinear recurrence).

In this dataset you will find the complete set of number sequences from the OEIS database. There are two files, sequences.csv and metadata.txt. Each row of sequences.csv starts with a sequence's A-number, followed by the comma-separated terms of the sequence. metadata.txt maps each sequence's A-number to its definition.
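
The row layout described above (an A-number followed by comma-separated terms) can be read with a few lines of Python. This is a minimal sketch under assumptions: the helper name `read_sequences` is hypothetical, the in-memory sample stands in for the real sequences.csv, and rows that do not start with an A-number are assumed to be skippable headers or comments.

```python
import csv
import io

# Two short sample rows standing in for the real file.
SAMPLE = "A000045,0,1,1,2,3,5,8,13\nA000108,1,1,2,5,14,42,132\n"


def read_sequences(handle):
    """Map each A-number to its list of integer terms."""
    sequences = {}
    for row in csv.reader(handle):
        # Skip blank lines and anything that is not a data row (assumption).
        if not row or not row[0].startswith("A"):
            continue
        a_number, *terms = row
        # Ignore empty fields, e.g. from a trailing comma.
        sequences[a_number] = [int(term) for term in terms if term.strip()]
    return sequences


sequences = read_sequences(io.StringIO(SAMPLE))
fibonacci = sequences["A000045"]
```

For the real file, the same function could be called on `open("sequences.csv", newline="")`, provided the layout matches the assumptions above.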

References

OEIS Foundation Inc. (2023), The On-Line Encyclopedia of Integer Sequences, Published electronically at https://oeis.org.
