100+ datasets found
  1. Established databases included in DISCOVER CKD.

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Cite
    Supriya Kumar; Matthew Arnold; Glen James; Rema Padman (2023). Established databases included in DISCOVER CKD. [Dataset]. http://doi.org/10.1371/journal.pone.0274131.t001
    Available download formats
    xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Supriya Kumar; Matthew Arnold; Glen James; Rema Padman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Established databases included in DISCOVER CKD.

  2. Standardizing Definitions and Data Collection Approach for Black-owned Businesses and Black Entrepreneurs

    • borealisdata.ca
    Updated Aug 7, 2025
    Cite
    Dozie Okoye; Emmanuel U. Nwugo (2025). Standardizing Definitions and Data Collection Approach for Black-owned Businesses and Black Entrepreneurs | Normalisation des définitions et de l'approche de la collecte de données pour les entreprises et les entrepreneurs noirs [Dataset]. http://doi.org/10.5683/SP3/OKGGWS
    Available download formats
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 7, 2025
    Dataset provided by
    Borealis
    Authors
    Dozie Okoye; Emmanuel U. Nwugo
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Introduction Black entrepreneurship in Canada makes important contributions to the Canadian economy, from fostering innovation to creating employment and building generational wealth. At least 1.3% of Black adults in Canada are business owners, relative to the 2.3% that are business owners from the entire Canadian population (Business Development Bank of Canada (BDC), 2025). Similarly, Black people account for 2.4% of all business owners in the country even though they only represent 4.3% of the entire population. Women account for 33% of these Black businesses, as compared to their 20% share in the country’s total business ownership (Diversity Institute, 2024), highlighting the potential of Black entrepreneurship for economic empowerment. However, Black businesses are faced with some systemic barriers that impede their ability to thrive, including but not limited to underrepresentation among entrepreneurs in Canada and limited access to finance, restricted networking opportunities, and insufficient specialized support programs (Gueye et al., 2022; Gueye, 2023; Diversity Institute, 2024). These challenges stem in part from a lack of comprehensive and reliable data on Black businesses and the absence of standardized definitions for key concepts such as Black entrepreneurs, Black enterprises, and Black entrepreneurship. Without a clear understanding of who constitutes a Black entrepreneur and the scale of their contributions, policymakers and stakeholders struggle to provide the necessary support and resources to advance this community. In fact, the development of policies and initiatives for Black businesses faces difficulties because current data about Black entrepreneurship remains fragmented and inconsistent, with different sources reporting different numbers (Grekou et al., 2021; Gueye, 2023). 
These discrepancies highlight the urgent need for a unified approach to data collection and analysis: accurate and comprehensive data are critical to understanding the size, scope, and needs of Black entrepreneurs, enabling targeted policy interventions and resource allocation. Fragmented data combined with non-standardized definitions mean that Black business owners are frequently ignored, misclassified, or omitted (Coletto et al., 2021). The significance of this research, therefore, lies in its ability to address these systemic barriers through an improved representation of Black entrepreneurs. It aims to harmonise disparate data points and set specific criteria, establishing sound tools for policymakers, researchers, and community groups who want to better assist Black entrepreneurs. In doing so, support for Black-owned businesses can be strengthened through targeted policies and programs that foster sustainable growth for these businesses in Canada. The main objectives of this study are threefold. First, the research seeks to reconcile disparate Black entrepreneurship statistics from Afrobiz.ca, Canadian Black Chamber of Commerce records, and Statistics Canada databases. Second, it seeks to develop unified criteria for defining Black business owners and their enterprises, improving both data collection precision and reporting consistency. Lastly, it will establish procedures for building a standardized database of Black entrepreneurs by integrating existing data sources and ensuring that both formal and informal businesses are properly represented. These efforts will establish fundamental principles for developing an inclusive and equitable entrepreneurial ecosystem throughout Canada.

  3. OneNet Cross-Platform Services

    • data.europa.eu
    • zenodo.org
    unknown
    Cite
    Zenodo, OneNet Cross-Platform Services [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-8329051?locale=en
    Available download formats
    unknown (199476)
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The goal of the OneNet System is to facilitate data exchange among existing platforms, services, applications, and devices through interoperability techniques. To ensure that system requirements are technically implementable and widely adopted, internationally standardized file formats, metadata, vocabularies, and identifiers are required. The OneNet “Cross-Platform Access” pattern is the fundamental characteristic of an interoperable ecosystem, and it leads to the definition of the list of OneNet Cross-Platform Services (CPS) exposed here. The pattern entails that an application accesses services or resources (information or functions) from multiple platforms through the same interface. For example, a “grid monitoring” application gathers information on different grid indicators provided by different platforms that conduct measurements or state estimations. The challenge of realizing this pattern lies in allowing applications or services within one platform to interact with relevant services or applications on other platforms (possibly from different providers) via the same interface and data formats. This enables reuse and composition of services as well as easy integration of data from different platforms. Based on the defined concept for CPS, an extensive analysis has been performed of the data exchange patterns and roles involved in system use cases (SUCs) from other H2020 projects and the OneNet demo clusters. This resulted in a first list of CPS, which was then taxonomized into 10 categories. Each entry is defined by a set of classes such as service description, indicative data producer/consumer, etc., and each CPS can be assigned multiple business objects describing its context. For a specific set of CPS widely used by the demos, formal semantic definitions are provided in the "CrossPlatformServices-Semantic" Excel worksheet.
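    The Cross-Platform Access pattern described above can be sketched in a few lines. The platform classes, indicator names, and values below are hypothetical; the sketch only illustrates one application consuming resources from multiple platforms through a single shared interface.

```python
from abc import ABC, abstractmethod

class PlatformInterface(ABC):
    """The single interface every platform exposes (hypothetical)."""
    @abstractmethod
    def get_indicator(self, name: str) -> float: ...

class MeasurementPlatform(PlatformInterface):
    """A platform that conducts measurements."""
    def get_indicator(self, name: str) -> float:
        return {"voltage_deviation": 0.02}[name]

class StateEstimationPlatform(PlatformInterface):
    """A platform that conducts state estimations."""
    def get_indicator(self, name: str) -> float:
        return {"line_loading": 0.75}[name]

def grid_monitoring(sources):
    """Gather grid indicators from several platforms via the same interface."""
    return {name: platform.get_indicator(name) for platform, name in sources}

report = grid_monitoring([
    (MeasurementPlatform(), "voltage_deviation"),
    (StateEstimationPlatform(), "line_loading"),
])
```

    The application never depends on a platform-specific API, which is what makes services reusable and composable across providers.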

  4. Secondary data and baseline covariates of patients included in DISCOVER CKD.

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Cite
    Supriya Kumar; Matthew Arnold; Glen James; Rema Padman (2023). Secondary data and baseline covariates of patients included in DISCOVER CKD. [Dataset]. http://doi.org/10.1371/journal.pone.0274131.t002
    Available download formats
    xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Supriya Kumar; Matthew Arnold; Glen James; Rema Padman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Secondary data and baseline covariates of patients included in DISCOVER CKD.

  5. World Bank's Global Data🌎🌏🌍

    • kaggle.com
    Updated Jan 11, 2025
    Cite
    Vijay Veer Singh (2025). World Bank's Global Data🌎🌏🌍 [Dataset]. https://www.kaggle.com/datasets/vijayveersingh/world-banks-global-indicator-data
    Available download formats
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 11, 2025
    Dataset provided by
    Kaggle
    Authors
    Vijay Veer Singh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    *The World Development Indicators (WDI) is a premier compilation of cross-country comparable data about development. It provides a broad range of economic, social, environmental, and governance indicators to support analysis and decision-making for development policies. The dataset includes indicators from different countries, spanning multiple decades, enabling researchers and policymakers to understand trends and progress in development goals such as poverty reduction, education, healthcare, and infrastructure.*

    *The dataset is a collection of multiple CSV files providing information on global indicators, countries, and time-series data. It is structured as follows:*

    1. series:
    Contains metadata for various indicators, including their descriptions, definitions, and other relevant information. This file acts as a reference for understanding what each indicator represents.

    2. country_series:
    Establishes relationships between countries and specific indicators. It provides additional metadata, such as contextual descriptions of indicator usage for particular countries.

    3. countries:
    Includes detailed information about countries, such as country codes, region classifications, income levels, and other geographical or socio-economic attributes.

    4. footnotes:
    Provides supplementary notes and additional context for specific data points in the main dataset. These notes clarify exceptions, limitations, or other special considerations for particular entries.

    5. main_data:
    The core dataset containing the actual indicator values for countries across different years. This file forms the backbone of the dataset and is used for analysis.

    6. series_time:
    Contains time-related metadata for indicators, such as their start and end years or periods of data availability.

    *This dataset is ideal for analyzing global development trends, comparing country-level statistics, and studying the relationships between different socio-economic indicators over time.*
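    As a sketch of how the files fit together, the observations in main_data can be enriched with indicator metadata from series by joining on the series code. The column names and values below are illustrative stand-ins, not the exact schema of this upload:

```python
import pandas as pd

# Toy stand-ins for the series and main_data files (illustrative values only).
series = pd.DataFrame({
    "Series Code": ["AG.LND.AGRI.K2"],
    "Indicator Name": ["Agricultural land (sq. km)"],
    "Topic": ["Environment: Land use"],
})
main_data = pd.DataFrame({
    "Country Code": ["AAA", "AAA"],
    "Series Code": ["AG.LND.AGRI.K2", "AG.LND.AGRI.K2"],
    "Year": [2019, 2020],
    "Value": [100.0, 101.0],
})

# Attach indicator metadata to each observation, as one would after
# loading the two CSV files.
enriched = main_data.merge(series, on="Series Code", how="left")
```

    The same join pattern extends to the countries and footnotes files via their respective keys.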

    Columns and Examples:

    Series Code: Unique code identifying the data series. Example: AG.LND.AGRI.K2 (Agricultural land, sq. km).
    Topic: Category under which the indicator is classified. Example: Environment: Land use.
    Indicator Name: Full name describing what the indicator measures. Example: Agricultural land (sq. km).
    Short Definition: A brief explanation of the indicator (if available). Example: Not applicable for all indicators.
    Long Definition: Detailed explanation of the indicator’s meaning and methodology. Example: "Agricultural land refers to the share of land area that is arable, under permanent crops, or under permanent pastures."
    Unit of Measure: Unit in which the data is expressed. Example: Square kilometers.
    Periodicity: How frequently the data is collected or reported. Example: Annual.
    Base Period: The reference period used for comparison, if applicable. Example: Often not specified.
    Other Notes: Additional context or remarks about the data. Example: "Data for former states are included in successor states."
    Aggregation Method: Method used to combine data for groups (e.g., regions). Example: Weighted average.
    Limitations and Exceptions: Constraints or exceptions in the data. Example: "Data may not be directly comparable across countries due to different definitions."
    Notes from Original Source: Remarks provided by the data source. Example: Not specified for all indicators.
    General Comments: Broad remarks about the dataset or indicator. Example: Not available in all cases.
    Source: Organization providing the data. Example: Food and Agriculture Organization.
    Statistical Concept and Methodology: Explanation of how the data was generated. Example: "Agricultural land is calculated based on land area classified as arable."
    Development Relevance: Importance of the indicator for development. Example: "Agricultural land availability impacts food security and rural livelihoods."
    Related Source Links: URLs to related information sources (if any). Example: Not specified.
    Other Web Links: Additional web resources. Example: Not specified.
    Related Indicators: Indicators conceptually related...

  6. Data from: NetFlow data collected with different packet sampling rates

    • data.niaid.nih.gov
    • portalcientifico.unileon.es
    • +2more
    Updated Feb 24, 2022
    Cite
    Adrián Campazas (2022). NetFlow data collected with different packet sampling rates [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6243335
    Dataset updated
    Feb 24, 2022
    Dataset provided by
    Ignacio Crespo
    Adrián Campazas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NetFlow traffic generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

    NetFlow flows have been captured with different sampling rates at the packet level. A sampling rate of 1-in-X means that 1 out of every X packets is selected for flow generation, while the remaining packets are discarded.

    The version of NetFlow used to build the datasets is 5.
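    A minimal sketch of deterministic 1-in-X packet sampling as described above (the actual sampler used by DOROTHEA may differ, e.g. it could select packets randomly rather than periodically):

```python
def sample_packets(packets, x):
    """Keep 1 out of every x packets; the discarded packets never
    contribute to a flow."""
    return [pkt for i, pkt in enumerate(packets) if i % x == 0]

# 1-in-5 sampling over ten packets keeps packets 0 and 5.
kept = sample_packets(list(range(10)), 5)
```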

  7. MIMIC-IV

    • physionet.org
    Updated Oct 11, 2024
    Cite
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
    Dataset updated
    Oct 11, 2024
    Authors
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
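    The modular organization means disparate tables combine through shared keys. A hedged sketch, using synthetic rows standing in for the patients and admissions tables of the hosp module (which link on subject_id in the real release):

```python
import pandas as pd

# Synthetic stand-ins for two MIMIC-IV hosp-module tables.
patients = pd.DataFrame({
    "subject_id": [1, 2],
    "anchor_age": [63, 47],
})
admissions = pd.DataFrame({
    "subject_id": [1, 1, 2],
    "hadm_id": [100, 101, 102],
    "admission_type": ["EW EMER.", "ELECTIVE", "URGENT"],
})

# Combine modules via the shared patient key.
cohort = admissions.merge(patients, on="subject_id", how="left")
```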

  8. ERA5 hourly data on single levels from 1940 to present

    • cds.climate.copernicus.eu
    grib
    Updated Aug 11, 2025
    Cite
    ECMWF (2025). ERA5 hourly data on single levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.adbb2d47
    Available download formats
    grib
    Dataset updated
    Aug 11, 2025
    Dataset provided by
    European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
    Authors
    ECMWF
    License

    https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf

    Time period covered
    Jan 1, 1940 - Aug 5, 2025
    Description

    ERA5 is the fifth generation ECMWF reanalysis of the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called the analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations and, when going further back in time, to ingest improved versions of the original observations, all of which benefits the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system, which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days. If serious flaws are detected in this early release (called ERA5T), the data may differ from the final release 2 to 3 months later; users are notified if this occurs.
    The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main subsets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land-surface quantities). The present entry is "ERA5 hourly data on single levels from 1940 to present".
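    Single-level ERA5 fields are typically retrieved through the CDS API (the cdsapi package). The sketch below only assembles a request; the dataset name, request keys, and the commented-out retrieve call follow common cdsapi usage but should be checked against the current CDS documentation, and valid credentials in ~/.cdsapirc are assumed.

```python
def build_request(variable, year, month, day, time):
    """Assemble a CDS API request for one single-level variable at one hour."""
    return {
        "product_type": "reanalysis",
        "variable": variable,
        "year": year,
        "month": month,
        "day": day,
        "time": time,
        "format": "grib",  # matches the download format listed above
    }

request = build_request("2m_temperature", "2020", "01", "01", "12:00")

# With credentials configured, the actual download would look like:
# import cdsapi
# client = cdsapi.Client()
# client.retrieve("reanalysis-era5-single-levels", request, "era5_t2m.grib")
```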

  9. New Oxford Dictionary of English, 2nd Edition

    • live.european-language-grid.eu
    • catalog.elra.info
    Updated Dec 6, 2005
    Cite
    (2005). New Oxford Dictionary of English, 2nd Edition [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/2276
    Dataset updated
    Dec 6, 2005
    License

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    This is Oxford University Press's most comprehensive single-volume dictionary, with 170,000 entries covering all varieties of English worldwide. The NODE data set constitutes a fully integrated range of formal data types suitable for language engineering and NLP applications, and is available in XML or SGML.

    - Source dictionary data. The NODE data set includes all the information present in the New Oxford Dictionary of English itself, such as definition text, example sentences, grammatical indicators, and encyclopaedic material.
    - Morphological data. Each NODE lemma (both headwords and subentries) has a full listing of all possible syntactic forms (e.g. plurals for nouns, inflections for verbs, comparatives and superlatives for adjectives), tagged to show their syntactic relationships. Each form has an IPA pronunciation. Full morphological data is also given for spelling variants (e.g. typical American variants), and a system of links enables straightforward correlation of variant forms to standard forms. The data set thus provides robust support for all look-up routines, and is equally viable for applications dealing with American and British English.
    - Phrases and idioms. The NODE data set provides a rich and flexible codification of over 10,000 phrasal verbs and other multi-word phrases. It features comprehensive lexical resources enabling applications to identify a phrase not only in the form listed in the dictionary but also in a range of real-world variations, including alternative wording, variable syntactic patterns, inflected verbs, optional determiners, etc.
    - Subject classification. Using a categorization scheme of 200 key domains, over 80,000 words and senses have been associated with particular subject areas, from aeronautics to zoology. As well as facilitating the extraction of subject-specific sub-lexicons, this also provides an extensive resource for document categorization and information retrieval.
    - Semantic relationships. The relationships between every noun and noun sense in the dictionary are being codified using an extensive semantic taxonomy on the model of the Princeton WordNet project. (Mapping to WordNet 1.7 is supported.) This structure allows elements of the basic lexical database to function as a formal knowledge database, enabling functionality such as sense disambiguation and logical inference.

    Derived from the detailed and authoritative corpus-based research of Oxford University Press's lexicographic team, the NODE data set is a powerful asset for any task dealing with real-world contemporary English usage. By integrating a number of different data types into a single structure, it creates a coherent resource which can be queried along numerous axes, allowing open-ended exploitation by many kinds of language-related applications.

  10. The definition data service for traffic congestion levels in Taoyuan City

    • data.gov.tw
    xml
    Updated Feb 25, 2022
    Cite
    Department of Transportation, Taoyuan (2022). The definition data service for traffic congestion levels in Taoyuan City [Dataset]. https://data.gov.tw/en/datasets/149613
    Available download formats
    xml
    Dataset updated
    Feb 25, 2022
    Dataset authored and provided by
    Department of Transportation, Taoyuan
    License

    https://data.gov.tw/license

    Area covered
    Taoyuan
    Description

    This service explains the basic data definitions of congestion levels used in Taoyuan City's real-time traffic information. Congestion levels are grouped according to road characteristics, and each group is further subdivided to describe different degrees of congestion.

  11. Pedestrian Stops - Contact Cards

    • data.cincinnati-oh.gov
    application/rdfxml +5
    Updated Mar 31, 2025
    Cite
    (2025). Pedestrian Stops - Contact Cards [Dataset]. https://data.cincinnati-oh.gov/Safety/Pedestrian-Stops-Contact-Cards/swrz-ak2i
    Available download formats
    application/rssxml, csv, json, xml, application/rdfxml, tsv
    Dataset updated
    Mar 31, 2025
    Description

    Data Description: This dataset captures all Cincinnati Police Department (CPD) stops of pedestrians. The data includes the time of the incident, officer assignment, race/sex of the stop subject, and outcome of the stop ("Action taken"). Individual pedestrian stops may populate multiple data rows to account for multiple outcomes: "Instance_ID" is the unique identifier for every one (1) pedestrian stop.

    NOTE: CPD transitioned to a new Record Management System on 6/3/2024. The data before this date may have a different structure than the data after this date.

    Data Creation: This data is created when CPD completes a pedestrian stop and logs the interview via Contact Cards. Contact Cards are a result of the Collaborative Agreement. Contact Cards are manually entered and may experience lags in data entry.

    Data Created by: This data is created by the Cincinnati Police Department.

    Refresh Frequency: This data is updated daily.

    CincyInsights: The City of Cincinnati maintains an interactive dashboard portal, CincyInsights, in addition to its Open Data in an effort to increase access and usage of city data. This data set has an associated dashboard available here: https://insights.cincinnati-oh.gov/stories/s/gw5q-kjng

    Data Dictionary: A data dictionary providing definitions of columns and attributes is available as an attachment to this dataset.

    Processing: The City of Cincinnati is committed to providing the most granular and accurate data possible. In that pursuit, the Office of Performance and Data Analytics applies standard processing to most raw data prior to publication. Processing includes, but is not limited to: address verification, geocoding, decoding attributes, and addition of administrative areas (i.e. Census, neighborhoods, police districts, etc.).

    Data Usage: For directions on downloading and using open data please visit our How-to Guide: https://data.cincinnati-oh.gov/dataset/Open-Data-How-To-Guide/gdr9-g3ad

    Disclaimer: In compliance with privacy laws, all Public Safety datasets are anonymized and appropriately redacted prior to publication on the City of Cincinnati’s Open Data Portal. This means that for all public safety datasets: (1) the last two digits of all addresses have been replaced with “XX,” and in cases where there is a single digit street address, the entire address number is replaced with "X"; and (2) Latitude and Longitude have been randomly skewed to represent values within the same block area (but not the exact location) of the incident.
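    The address redaction rule stated in the disclaimer can be expressed as a short function. This is a sketch of the published rule, not the city's actual implementation:

```python
import re

def redact_address(address: str) -> str:
    """Replace the last two digits of the street number with "XX";
    a single-digit street number becomes "X" (per the stated rule)."""
    match = re.match(r"(\d+)\s", address)
    if not match:
        return address  # no leading street number to redact
    number = match.group(1)
    redacted = "X" if len(number) == 1 else number[:-2] + "XX"
    return redacted + address[len(number):]

redact_address("1234 Main St")  # "12XX Main St"
redact_address("7 Elm St")      # "X Elm St"
```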

  12. fdata-02-00040_Challenges and Legal Gaps of Genetic Profiling in the Era of Big Data.xml

    • frontiersin.figshare.com
    bin
    Updated Jun 3, 2023
    Cite
    Murat Sariyar; Irene Schlünder (2023). fdata-02-00040_Challenges and Legal Gaps of Genetic Profiling in the Era of Big Data.xml [Dataset]. http://doi.org/10.3389/fdata.2019.00040.s002
    Available download formats
    bin
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Murat Sariyar; Irene Schlünder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Profiling of individuals based on inborn, acquired, and assigned characteristics is central to decision making in health care. In the era of omics and big smart data, it becomes urgent to differentiate between the data governance affordances of different profiling activities. Typically, diagnostic profiling is the focus of researchers and physicians, and other types are regarded as undesired side effects, for example in connection with health care insurance risk calculations. Profiling in a legal sense is addressed, for example, by EU data protection law. It is defined in the General Data Protection Regulation as automated decision making. This term does not correspond fully to profiling in biomedical research and healthcare, and its impact on privacy has hardly ever been examined. But profiling is also an issue concerning the fundamental right of non-discrimination, whenever profiles are used in a way that has a discriminatory effect on individuals. Here, we focus on genetic profiling, define related notions (as legal and subject-matter definitions frequently differ), and discuss the ethical and legal challenges.

  13. Data from: Modeling software processes from different domains using SPEM and...

    • data.niaid.nih.gov
    Updated Jul 12, 2024
    Cite
    Carla Bezerra (2024). Modeling software processes from different domains using SPEM and BPMN notations: An experience report of teaching software processes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7674964
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Carla Bezerra
    Emanuel Coutinho
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the current application development scenario, spanning different environments, technologies, and contexts such as IoT, Blockchain, Machine Learning, and Cloud Computing, there is a need for particular solutions for domain-specific software development processes. The proper definition of software processes requires understanding the involved teams' and organization's particularities, as well as specialized technical knowledge in Software Engineering. Although it is an essential part of Software Engineering, many university curricula do not dedicate much effort to teaching software processes, focusing more on the basic principles of Software Engineering, such as requirements, architecture, and programming languages. Another important aspect of software processes is modeling. The modeling of a software process provides a basis for managing, automating, and supporting software process improvement. In this context, teaching software process modeling becomes challenging, mainly due to the great emphasis on theory and few practices. This work presents an experience report on teaching the definition and modeling of software processes in different domains. In the software processes discipline, we applied a practice for defining and modeling processes in various application domains, such as IoT, cloud, mobile, critical systems, self-adaptive systems, machine learning, blockchain, and games. The processes were modeled in the Software & Systems Process Engineering Metamodel (SPEM) and Business Process Model and Notation (BPMN) notations, based on references from the literature for each domain. We evaluated the process modeling practice with SPEM and BPMN in 3 classes of the software processes discipline and compared the use of the two notations applied to the different domains. We concluded that the modeling tool and maturity in the domain are essential for good process performance.

  14. Geophysical surveys and geospatial data for Bob Kidd Lake, Washington...

    • catalog.data.gov
    • data.usgs.gov
    • +3more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Geophysical surveys and geospatial data for Bob Kidd Lake, Washington County, Arkansas [Dataset]. https://catalog.data.gov/dataset/geophysical-surveys-and-geospatial-data-for-bob-kidd-lake-washington-county-arkansas
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Washington County, Arkansas, Bob Kidd Lake
    Description

    This data release consists of three types of data: direct current (DC) resistivity profiles, frequency domain electromagnetic (FDEM) survey data, and global navigation satellite system (GNSS) coordinate data of the geophysical measurement locations. A data dictionary is included along with the data and defines all of the table headings, definitions, and units. Earthen dams are common on lakes and ponds, but characteristics of these structures such as construction history, composition, and integrity are often unknown for older dams. Geophysical surveying techniques provide a non-invasive method of mapping their lithology and structure. In particular, DC resistivity and FDEM methods can, when properly processed, provide the information necessary to construct a lithologic model of an earthen dam without having to trench or core through the shell of the dam itself. In September 2016, the U.S. Geological Survey (USGS) conducted geophysical surveys at Bob Kidd Lake, an 81-hectare lake in northwestern Arkansas, to help determine the composition of the earthen dam and guide any potential geotechnical investigations. A series of DC resistivity surveys were conducted along, parallel, and perpendicular to the axis of the crest of the dam to identify the soil-bedrock interface and any variations in the composition of the earthen dam. A dense survey using a multi-frequency electromagnetic sensor was used to map the shallow materials comprising the dam at a higher resolution. Resistivity measurements were made by transmitting a known current through two electrodes (transmitter) and measuring the voltage potential across two other electrodes (receiver). The multiple channels on the resistivity meter allow for voltage measurements to be made at 10 receivers simultaneously following a current injection. The configuration of the transmitter relative to the receiver(s) is referred to as an array.
For these surveys, a Reciprocal Schlumberger array was used, which positions the transmitting pair of electrodes toward the center of the array and the receiving pairs extending away from the transmitter (Loke, 2000; Zonge and others, 2005). The electrical resistance was calculated by dividing the measured voltage by the applied current. The apparent resistivity was determined by multiplying the electrical resistance by a geometric factor. Apparent resistivity is not the true resistivity, but rather a volume-averaged estimate of the true resistivity distribution, because a homogeneous, isotropic subsurface is assumed. To estimate the true resistivity of the heterogeneous and/or anisotropic subsurface, the apparent resistivity data were processed using an inverse modeling software program. The FDEM method complements the two-dimensional (2-D) DC resistivity method and was used to extend the depth of subsurface characterization obtained with resistivity profiles. The FDEM method uses multiple current frequencies to measure bulk electric conductivity values (the inverse of resistivity values) of the earth at different depths (Lucius and others, 2007). For this project FDEM data were collected with a GEM-2, a broadband, multifrequency, fixed-coil electromagnetic induction unit (Geophex, 2015). In addition to the geophysical surveys a concurrent Global Navigation Satellite System (GNSS) survey was conducted using a Real Time Kinematic system (RTK). All electrode locations on the DC resistivity profiles, all measurement locations in the FDEM survey, as well as a point-cloud survey were collected and are included in the dataset. These data were used to geo-reference the geophysical data and may be used to create a Digital Elevation Model (DEM) of the dam surface.
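The apparent-resistivity arithmetic described above (electrical resistance V/I scaled by a geometric factor) can be sketched as follows. This is a minimal illustration, not part of the data release: the geometric-factor expression is the standard textbook formula for a Schlumberger array, and all numeric values are invented.

```python
import math

def schlumberger_geometric_factor(ab_half, mn_half):
    """Geometric factor K for a Schlumberger array (standard textbook form).

    ab_half: current-electrode half-spacing AB/2, in meters.
    mn_half: potential-electrode half-spacing MN/2, in meters.
    """
    return math.pi * (ab_half**2 - mn_half**2) / (2.0 * mn_half)

def apparent_resistivity(voltage, current, k):
    """Apparent resistivity (ohm-m): resistance V/I multiplied by K, per the text."""
    return k * voltage / current

# Illustrative numbers only (not from the Bob Kidd Lake survey)
k = schlumberger_geometric_factor(ab_half=10.0, mn_half=1.0)
rho_a = apparent_resistivity(voltage=0.05, current=0.5, k=k)
```

As the description notes, this apparent resistivity is only a volume-averaged estimate; recovering the true resistivity distribution requires inverse modeling.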

  15. 2023 American Community Survey: DP02 | Selected Social Characteristics in...

    • data.census.gov
    • test.data.census.gov
    Updated Oct 6, 2022
    + more versions
    Cite
    ACS (2022). 2023 American Community Survey: DP02 | Selected Social Characteristics in the United States (ACS 1-Year Estimates Data Profiles) [Dataset]. https://data.census.gov/cedsci/table?q=DP02
    Explore at:
    Dataset updated
    Oct 6, 2022
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    ACS
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2023
    Area covered
    United States
    Description

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, the decennial census is the official source of population totals for April 1st of each decennial year. In between censuses, the Census Bureau's Population Estimates Program produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units and the group quarters population for states and counties.

    Information about the American Community Survey (ACS) can be found on the ACS website. Supporting documentation including code lists, subject definitions, data accuracy, and statistical testing, and a full list of ACS tables and table shells (without estimates) can be found on the Technical Documentation section of the ACS website. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Source: U.S. Census Bureau, 2023 American Community Survey 1-Year Estimates.

    ACS data generally reflect the geographic boundaries of legal and statistical areas as of January 1 of the estimate year. For more information, see Geography Boundaries by Year.

    Users must consider potential differences in geographic boundaries, questionnaire content or coding, or other methodological issues when comparing ACS data from different years. Statistically significant differences shown in ACS Comparison Profiles, or in data users' own analysis, may be the result of these differences and thus might not necessarily reflect changes to the social, economic, housing, or demographic characteristics being compared. For more information, see Comparing ACS Data.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

    Ancestry listed in this table refers to the total number of people who responded with a particular ancestry; for example, the estimate given for German represents the number of people who listed German as either their first or second ancestry. This table lists only the largest ancestry groups; see the Detailed Tables for more categories. Race and Hispanic origin groups are not included in this table because data for those groups come from the Race and Hispanic origin questions rather than the ancestry question (see Demographic Table).

    Data for year of entry of the native population reflect the year of entry into the U.S. by people who were born in Puerto Rico or U.S. Island Areas or born outside the U.S. to a U.S. citizen parent and who subsequently moved to the U.S.

    The category "with a broadband Internet subscription" refers to those who said "Yes" to at least one of the following types of Internet subscriptions: Broadband such as cable, fiber optic, or DSL; a cellular data plan; satellite; a fixed wireless subscription; or other non-dial up subscription types.

    An Internet "subscription" refers to a type of service that someone pays for to access the Internet such as a cellular data plan, broadband such as cable, fiber optic or DSL, or other type of service. This will normally refer to a service that someone is billed for directly for Internet alone or sometimes as part of a bundle.

    "With a computer" includes those who said "Yes" to at least one of the following types of computers: Desktop or laptop; smartphone; tablet or other portable wireless computer; or some other type of computer.

    Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on 2020 Census data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    Explanation of Symbols:
    "-" The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
    "N" The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
    "(X)" The estimate or margin of error is not applicable or not available.
    "median-" ...
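The margin-of-error arithmetic in the description above can be sketched in a few lines. This is an illustrative sketch: the 1.645 z-score for a 90 percent level follows the ACS convention of publishing 90 percent margins of error, and the numbers below are invented, not ACS estimates.

```python
def confidence_bounds(estimate, moe):
    """90 percent confidence interval: estimate minus/plus the published MOE."""
    return estimate - moe, estimate + moe

def standard_error(moe, z90=1.645):
    """Recover the standard error from a published 90 percent margin of error."""
    return moe / z90

# Illustrative only: an estimate of 12,000 people with a published MOE of 350
lower, upper = confidence_bounds(12_000, 350)
se = standard_error(350)
```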

  16. Data from: Teaching software processes from different application domains

    • zenodo.org
    bin, pdf
    Updated Jul 16, 2024
    Cite
    Carla Bezerra; Carla Bezerra; Emanuel Coutinho; Emanuel Coutinho (2024). Teaching software processes from different application domains [Dataset]. http://doi.org/10.5281/zenodo.7068357
    Explore at:
    Available download formats: bin, pdf
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Carla Bezerra; Carla Bezerra; Emanuel Coutinho; Emanuel Coutinho
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the current application development scenario, spanning different environments, technologies, and contexts such as IoT, Blockchain, Machine Learning, and Cloud Computing, there is a need for particular solutions for domain-specific software development processes. The proper definition of software processes requires understanding the involved teams' and organization's particularities, as well as specialized technical knowledge in Software Engineering. Although it is an essential part of Software Engineering, many university curricula do not dedicate much effort to teaching software processes, focusing more on the basic principles of Software Engineering, such as requirements, architecture, and programming languages. Another important aspect of software processes is modeling. The modeling of a software process provides a basis for managing, automating, and supporting software process improvement. In this context, teaching software process modeling becomes challenging, mainly due to the great emphasis on theory and few practices. This work presents an experience report on teaching the definition and modeling of software processes in different domains. In the software processes discipline, we applied a practice for defining and modeling processes in various application domains, such as IoT, cloud, mobile, critical systems, self-adaptive systems, and games. The processes were modeled in the EPF Composer tool, based on references from the literature for each domain. In the end, we evaluated the process modeling practice with the students. We concluded that the modeling tool and maturity in the domain are essential for good process performance.

  17. Enterprise Survey 2009-2019, Panel Data - Slovenia

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Aug 6, 2020
    Cite
    World Bank Group (WBG) (2020). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://microdata.worldbank.org/index.php/catalog/3762
    Explore at:
    Dataset updated
    Aug 6, 2020
    Dataset provided by
    World Bank (http://worldbank.org/)
    European Investment Bank (http://eib.org/)
    European Bank for Reconstruction and Development (http://ebrd.com/)
    Time period covered
    2008 - 2019
    Area covered
    Slovenia
    Description

    Abstract

    The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

    The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

    As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

    Geographic coverage

    National

    Analysis unit

    The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

    Universe

    As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample for Slovenia ES 2009, 2013, 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for Slovenia 2009 ES and for Slovenia 2013 ES, and in the Sampling Note for 2019 Slovenia ES.

    Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information of the industries and regions chosen are included in the attached Excel file (Sampling Report.xls.) for Slovenia 2009 ES. For Slovenia 2013 and 2019 ES, specific information of the industries and regions chosen is described in the "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.

    For the Slovenia 2009 ES, industry stratification was designed in the way that follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries sample sizes were inflated by about 17% to account for potential non-response cases when requesting sensitive financial data and also because of likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals) sample sizes were inflated by about 12% to account for under sampling in firms in service industries.

    For Slovenia 2013 ES, industry stratification was designed in the way that follows: the universe was stratified into one manufacturing industry, and two service industries (retail, and other services).

    Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).

    For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.

    For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

    For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.

    Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    Questionnaires have common questions (core module) and additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing-specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail-specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.

    Response rate

    Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

    Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.

    For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.

    For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in Slovenia may be selection bias and not frame inaccuracy.

    For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.

    Finally, for 2019, the number of interviews per contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.

  18. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

    • data.niaid.nih.gov
    Updated Oct 20, 2022
    + more versions
    Cite
    Yfantidou, Sofia (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6826682
    Explore at:
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    Ferrari, Elena
    Marchioro, Thomas
    Giakatos, Dimitrios Panteleimon
    Yfantidou, Sofia
    Karagianni, Christina
    Kazlouski, Andrei
    Palotti, Joao
    Girdzijauskas, Šarūnas
    Efstathiou, Stefanos
    Vakali, Athena
    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

    The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

    Data Import: Reading CSV

    For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
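As a quick sketch of that workflow: the column names and values below are invented for illustration (the real column layout is documented with the release), and an inline sample stands in for a downloaded CSV file.

```python
import io

import pandas as pd

# A tiny inline stand-in for one of the released daily-granularity CSVs;
# a real file is read the same way with pd.read_csv on its path.
sample = io.StringIO(
    "id,date,steps\n"
    "user_1,2021-06-01,8450\n"
    "user_1,2021-06-02,10231\n"
)
df = pd.read_csv(sample, parse_dates=["date"])

# Typical first steps: inspect the frame and aggregate a column
mean_steps = df["steps"].mean()
```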

    Data Import: Setting up a MongoDB (Recommended)

    To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

    To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.

    For the Fitbit data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c fitbit

    For the SEMA data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c sema

    For surveys data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c surveys

    If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.

    Data Availability

    The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

    { "_id": <ObjectId>, "id" (or "user_id"): <user ID>, "type": <data type>, "data": <embedded object> }

    Each document consists of four fields: _id, id (also found as user_id in the sema and surveys collections), type, and data. The _id field is the MongoDB-defined primary key and can be ignored. The id field refers to a user-specific ID used to uniquely identify each user across all collections. The type field refers to the specific data type within the collection, e.g., steps, heart rate, calories, etc. The data field contains the actual information about the document, e.g., the steps count for a specific timestamp, in the form of an embedded object. The contents of the data object are type-dependent, meaning that the fields within the data object differ between different types of data. As mentioned previously, all times are stored in local time, and user IDs are common across the different collections. For more information on the available data types, see the related publication.
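A minimal sketch of navigating one such document in Python; the document below is fabricated to follow the schema just described (all field values are invented, not taken from the dataset).

```python
# Fabricated document following the stated schema: _id, id, type, data
doc = {
    "_id": "64a1f0c2e4b0d1a2b3c4d5e6",  # MongoDB primary key; can be ignored
    "id": "user_42",                    # user-specific ID, shared across collections
    "type": "steps",                    # data type within the collection
    "data": {"dateTime": "2021-06-01 10:00:00", "value": 734},  # type-dependent payload
}

# The fields inside "data" differ by type, so access them through the payload
steps_value = doc["data"]["value"]
```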

    Surveys Encoding

    BREQ2

    Why do you engage in exercise?

    Code | Text
    engage[SQ001] | I exercise because other people say I should
    engage[SQ002] | I feel guilty when I don’t exercise
    engage[SQ003] | I value the benefits of exercise
    engage[SQ004] | I exercise because it’s fun
    engage[SQ005] | I don’t see why I should have to exercise
    engage[SQ006] | I take part in exercise because my friends/family/partner say I should
    engage[SQ007] | I feel ashamed when I miss an exercise session
    engage[SQ008] | It’s important to me to exercise regularly
    engage[SQ009] | I can’t see why I should bother exercising
    engage[SQ010] | I enjoy my exercise sessions
    engage[SQ011] | I exercise because others will not be pleased with me if I don’t
    engage[SQ012] | I don’t see the point in exercising
    engage[SQ013] | I feel like a failure when I haven’t exercised in a while
    engage[SQ014] | I think it is important to make the effort to exercise regularly
    engage[SQ015] | I find exercise a pleasurable activity
    engage[SQ016] | I feel under pressure from my friends/family to exercise
    engage[SQ017] | I get restless if I don’t exercise regularly
    engage[SQ018] | I get pleasure and satisfaction from participating in exercise
    engage[SQ019] | I think exercising is a waste of time

    PANAS

    Indicate the extent you have felt this way over the past week

    Code | Text
    P1[SQ001] | Interested
    P1[SQ002] | Distressed
    P1[SQ003] | Excited
    P1[SQ004] | Upset
    P1[SQ005] | Strong
    P1[SQ006] | Guilty
    P1[SQ007] | Scared
    P1[SQ008] | Hostile
    P1[SQ009] | Enthusiastic
    P1[SQ010] | Proud
    P1[SQ011] | Irritable
    P1[SQ012] | Alert
    P1[SQ013] | Ashamed
    P1[SQ014] | Inspired
    P1[SQ015] | Nervous
    P1[SQ016] | Determined
    P1[SQ017] | Attentive
    P1[SQ018] | Jittery
    P1[SQ019] | Active
    P1[SQ020] | Afraid

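    These twenty adjectives split into Positive Affect and Negative Affect subscales under the standard PANAS keying; a minimal scoring sketch (the 1-5 response range is an assumption):

    ```python
    # Sum the P1[SQ0xx] items into PANAS Positive and Negative Affect scores.
    # The PA/NA split follows the standard PANAS keying; responses assumed 1..5.
    PA = ["SQ001", "SQ003", "SQ005", "SQ009", "SQ010",
          "SQ012", "SQ014", "SQ016", "SQ017", "SQ019"]
    NA = ["SQ002", "SQ004", "SQ006", "SQ007", "SQ008",
          "SQ011", "SQ013", "SQ015", "SQ018", "SQ020"]

    def score_panas(responses):
        """responses maps e.g. 'P1[SQ001]' -> 1..5; returns (PA, NA), each in 10..50."""
        pa = sum(responses[f"P1[{q}]"] for q in PA)
        na = sum(responses[f"P1[{q}]"] for q in NA)
        return pa, na
    ```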
    Personality

    How Accurately Can You Describe Yourself?

    Code           Text
    ipip[SQ001]    Am the life of the party.
    ipip[SQ002]    Feel little concern for others.
    ipip[SQ003]    Am always prepared.
    ipip[SQ004]    Get stressed out easily.
    ipip[SQ005]    Have a rich vocabulary.
    ipip[SQ006]    Don't talk a lot.
    ipip[SQ007]    Am interested in people.
    ipip[SQ008]    Leave my belongings around.
    ipip[SQ009]    Am relaxed most of the time.
    ipip[SQ010]    Have difficulty understanding abstract ideas.
    ipip[SQ011]    Feel comfortable around people.
    ipip[SQ012]    Insult people.
    ipip[SQ013]    Pay attention to details.
    ipip[SQ014]    Worry about things.
    ipip[SQ015]    Have a vivid imagination.
    ipip[SQ016]    Keep in the background.
    ipip[SQ017]    Sympathize with others' feelings.
    ipip[SQ018]    Make a mess of things.
    ipip[SQ019]    Seldom feel blue.
    ipip[SQ020]    Am not interested in abstract ideas.
    ipip[SQ021]    Start conversations.
    ipip[SQ022]    Am not interested in other people's problems.
    ipip[SQ023]    Get chores done right away.
    ipip[SQ024]    Am easily disturbed.
    ipip[SQ025]    Have excellent ideas.
    ipip[SQ026]    Have little to say.
    ipip[SQ027]    Have a soft heart.
    ipip[SQ028]    Often forget to put things back in their proper place.
    ipip[SQ029]    Get upset easily.
    ipip[SQ030]    Do not have a good imagination.
    ipip[SQ031]    Talk to a lot of different people at parties.
    ipip[SQ032]    Am not really interested in others.
    ipip[SQ033]    Like order.
    ipip[SQ034]    Change my mood a lot.
    ipip[SQ035]    Am quick to understand things.
    ipip[SQ036]    Don't like to draw attention to myself.
    ipip[SQ037]    Take time out for others.
    ipip[SQ038]    Shirk my duties.
    ipip[SQ039]    Have frequent mood swings.
    ipip[SQ040]    Use difficult words.
    ipip[SQ041]    Don't mind being the centre of attention.
    ipip[SQ042]    Feel others' emotions.
    ipip[SQ043]    Follow a schedule.
    ipip[SQ044]    Get irritated easily.
    ipip[SQ045]    Spend time reflecting on things.
    ipip[SQ046]    Am quiet around strangers.
    ipip[SQ047]    Make people feel at ease.
    ipip[SQ048]    Am exacting in my work.
    ipip[SQ049]    Often feel blue.
    ipip[SQ050]    Am full of ideas.
    

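    A hedged sketch of how the fifty item codes group into the Big Five, assuming they follow the standard IPIP-50 ordering, which cycles through the five traits in a fixed sequence; the scoring key (including which items are reverse-keyed) is omitted here and would need to be confirmed against the instrument:

    ```python
    # ASSUMPTION: items cycle Extraversion, Agreeableness, Conscientiousness,
    # Emotional Stability, Intellect/Openness, as in the standard IPIP-50.
    # Reverse-keyed items are NOT handled; a real scorer must apply the key.
    TRAITS = ["extraversion", "agreeableness", "conscientiousness",
              "emotional_stability", "intellect"]

    def items_by_trait():
        """Map each trait to its ten item codes under the cyclic ordering."""
        groups = {t: [] for t in TRAITS}
        for i in range(1, 51):
            groups[TRAITS[(i - 1) % 5]].append(f"ipip[SQ{i:03d}]")
        return groups
    ```

    Under this ordering, SQ001 ("Am the life of the party") falls under extraversion and SQ005 ("Have a rich vocabulary") under intellect, consistent with the item wording above.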
    STAI

    Indicate how you feel right now

    Code           Text
    STAI[SQ001]    I feel calm
    STAI[SQ002]    I feel secure
    STAI[SQ003]    I am tense
    STAI[SQ004]    I feel strained
    STAI[SQ005]    I feel at ease
    STAI[SQ006]    I feel upset
    STAI[SQ007]    I am presently worrying over possible misfortunes
    STAI[SQ008]    I feel satisfied
    STAI[SQ009]    I feel frightened
    STAI[SQ010]    I feel comfortable
    STAI[SQ011]    I feel self-confident
    STAI[SQ012]    I feel nervous
    STAI[SQ013]    I am jittery
    STAI[SQ014]    I feel indecisive
    STAI[SQ015]    I am relaxed
    STAI[SQ016]    I feel content
    STAI[SQ017]    I am worried
    STAI[SQ018]    I feel confused
    STAI[SQ019]    I feel steady
    STAI[SQ020]    I feel pleasant
    

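    A minimal scoring sketch, assuming 1-4 responses and the usual convention that anxiety-absent items (e.g. "I feel calm") are reverse-scored; the reversal set below is inferred from the item wording, not stated by the dataset:

    ```python
    # ASSUMPTION: the anxiety-absent items are reverse-scored (5 - response).
    REVERSED = {"SQ001", "SQ002", "SQ005", "SQ008", "SQ010",
                "SQ011", "SQ015", "SQ016", "SQ019", "SQ020"}

    def score_stai(responses):
        """responses maps e.g. 'STAI[SQ001]' -> 1..4; returns a total in 20..80."""
        total = 0
        for i in range(1, 21):
            code = f"SQ{i:03d}"
            r = responses[f"STAI[{code}]"]
            total += (5 - r) if code in REVERSED else r
        return total
    ```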
    TTM

    Do you engage in regular physical activity according to the definition above? How frequently did each event or experience occur in the past month?

    Code                Text
    processes[SQ002]    I read articles to learn more about physical activity
    
  19. Predicting Coupon Redemption_Feature Selection

    • kaggle.com
    zip
    Updated Nov 17, 2019
    + more versions
    Cite
    vasudeva (2019). Predicting Coupon Redemption_Feature Selection [Dataset]. https://www.kaggle.com/vasudeva009/predicting-coupon-redemption-feature-selection
    Explore at:
    zip(65337333 bytes)Available download formats
    Dataset updated
    Nov 17, 2019
    Authors
    vasudeva
    Description

    Problem Statement

    Predicting Coupon Redemption

    XYZ Credit Card company regularly helps its merchants understand their data better and make key business decisions accurately by providing machine learning and analytics consulting. ABC is an established brick-and-mortar retailer that frequently runs marketing campaigns for its diverse product range. As a merchant of XYZ, ABC has asked XYZ to assist with its discount marketing process using machine learning.

    Discount marketing and coupon usage are widely used promotional techniques for attracting new customers and for retaining and reinforcing the loyalty of existing customers. Measuring a consumer’s propensity towards coupon usage and predicting redemption behaviour are crucial in assessing the effectiveness of a marketing campaign.

    ABC promotions are shared across various channels, including email and notifications. A number of these campaigns offer coupon discounts for a specific product or range of products. The retailer would like the ability to predict whether customers will redeem the coupons received across channels, which will enable the marketing team to design coupon constructs accurately and to develop more precise, targeted marketing strategies.

    The data available in this problem contains the following information, including the details of a sample of campaigns and coupons used in previous campaigns -

    User Demographic Details

    Campaign and coupon Details

    Product details

    Previous transactions

    Based on transaction and performance data from the previous 18 campaigns, predict, for each coupon and customer combination in the test set covering the next 10 campaigns, the probability that the customer will redeem the coupon.

    Dataset Description

    Here is the schema for the different data tables available. The detailed data dictionary is provided next.

    You are provided with the following files:

    train.csv: Train data containing the coupons offered to the given customers under the 18 campaigns

    Variable            Definition
    id                  Unique id for coupon customer impression
    campaign_id         Unique id for a discount campaign
    coupon_id           Unique id for a discount coupon
    customer_id         Unique id for a customer
    redemption_status   Target (0 - coupon not redeemed, 1 - coupon redeemed)

    campaign_data.csv: Campaign information for each of the 28 campaigns

    Variable        Definition
    campaign_id     Unique id for a discount campaign
    campaign_type   Anonymised campaign type (X/Y)
    start_date      Campaign start date
    end_date        Campaign end date

    coupon_item_mapping.csv: Mapping of coupon and items valid for discount under that coupon

    Variable    Definition
    coupon_id   Unique id for a discount coupon (no order)
    item_id     Unique id for items for which the given coupon is valid (no order)

    customer_demographics.csv: Customer demographic information for some customers

    Variable         Definition
    customer_id      Unique id for a customer
    age_range        Age range of customer family in years
    marital_status   Married/Single
    rented           0 - not rented accommodation, 1 - rented accommodation
    family_size      Number of family members
    no_of_children   Number of children in the family
    income_bracket   Label-encoded income bracket (higher income corresponds to higher number)

    customer_transaction_data.csv: Transaction data for all customers for the duration of the campaigns in the train data

    Variable          Definition
    date              Date of transaction
    customer_id       Unique id for a customer
    item_id           Unique id for item
    quantity          Quantity of item bought
    selling_price     Sales value of the transaction
    other_discount    Discount from other sources such as manufacturer coupon/loyalty card
    coupon_discount   Discount availed from retailer coupon

    item_data.csv: Item information for each item sold by the retailer

    Variable     Definition
    item_id      Unique id for item
    brand        Unique id for item brand
    brand_type   Brand type (Local/Established)
    category     Item category

    test.csv: Contains the coupon customer combination for which redemption status is to be predicted

    Variable      Definition
    id            Unique id for coupon customer impression
    campaign_id   Unique id for a discount campaign
    coupon_id     Unique id for a discount coupon
    customer_id   Unique id for a customer

    To summarise the entire process:

    • Customers receive coupons under various campaigns and may choose to redeem them.
    • A coupon can be redeemed for any product valid under that coupon, as per the coupon-item mapping, within the period between the campaign start and end dates.
    • When a customer redeems a coupon for an item at the retailer store, the redemption is reflected in the coupon_discount column of the transaction table.

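    The tables described above can be assembled into a single modelling frame with a few joins; a minimal pandas sketch using the column names from the data dictionary (the per-customer aggregation choices are illustrative assumptions, not part of the problem statement):

    ```python
    import pandas as pd

    def build_features(train, campaigns, demographics, transactions):
        """Join the data-dictionary tables into one frame, one row per coupon impression."""
        # Per-customer behaviour features summarised from past transactions.
        cust_feats = transactions.groupby("customer_id").agg(
            total_spend=("selling_price", "sum"),
            coupon_discount_sum=("coupon_discount", "sum"),
            n_transactions=("item_id", "count"),
        ).reset_index()
        return (train
                .merge(campaigns, on="campaign_id", how="left")
                .merge(demographics, on="customer_id", how="left")
                .merge(cust_feats, on="customer_id", how="left"))
    ```

    Left joins keep every train row even when demographics are missing, which matters here because customer_demographics.csv covers only some customers.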
    Public and Private Split

    • Test data is further randomly divided into Public (40%) and Private (60%) sets.
    • Your initial responses will be checked and scored on the Public data.
    • The final rankings will be based on your Private score, which will be published once the competition is over.

    Note

    • AV_amex_lgb_folds_v28.csv: Private Score of 92.50 (submitted)
    • AV_amex_stack2_folds_v28.csv: Private Score of 92.811 (best of all; mean of CB and LGBM)
    • Stacking always works; don't ignore it based on whatever the Public LB says
    • Kaggle link to the best kernel: v31
  20. Data Virtualization Market By Deployment Mode (On-Premises, Cloud-Based),...

    • verifiedmarketresearch.com
    Updated Jul 25, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Data Virtualization Market By Deployment Mode (On-Premises, Cloud-Based), Component Outlook (Standalone Software, Data Integration Solution, Application Tool Solution), Organization Size (Small and Medium-sized Enterprises (SMEs), Large Enterprises), Verticals (Banking, Financial Services, and Insurance (BFSI), Healthcare, Retail and eCommerce, Telecom and IT, Manufacturing), Region for 2024-2031 [Dataset]. https://www.verifiedmarketresearch.com/product/data-virtualization-market/
    Explore at:
    Dataset updated
    Jul 25, 2024
    Dataset provided by
    Verified Market Researchhttps://www.verifiedmarketresearch.com/
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    Data Virtualization Market size was valued at USD 4.05 Billion in 2023 and is projected to reach USD 15.55 Billion By 2031, growing at a CAGR of 20.20% during the forecast period 2024 to 2031.

    Data Virtualization Market: Definition/ Overview

    Data virtualization is an advanced technique in data management that streamlines access to information from various sources, offering a seamless and unified view of data despite its diverse locations and formats. This approach acts as an intermediary layer, enabling users to interact with data as if it were consolidated in a single repository. By abstracting the underlying complexities of different data sources, data virtualization simplifies the user's experience and eliminates the necessity of understanding the specifics of each individual source.

    One of the primary benefits of data virtualization is its ability to provide near real-time access to information. Unlike traditional data integration methods that rely on duplicating data, data virtualization allows users to retrieve and query data in its original location. This real-time capability ensures that users have the most current and accurate data available for decision-making.

    Additionally, data virtualization can significantly enhance system performance. By optimizing queries and minimizing the movement of data across networks, it reduces the overhead typically associated with data integration processes. This efficiency not only accelerates data retrieval and processing times but also improves the overall responsiveness of the system. From a financial perspective, data virtualization offers substantial cost savings. It eliminates the need for complex and costly data integration projects that involve extensive data extraction, transformation, and loading (ETL) processes. By reducing the dependency on physical data consolidation, organizations can allocate resources more effectively and decrease the total cost of ownership for their data infrastructure.
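    As a toy illustration of the intermediary-layer idea described above, the sketch below exposes one unified record shape over two differently structured sources without copying either; all names and shapes here are hypothetical:

    ```python
    # Two heterogeneous sources: a SQL-like row set and a REST-like payload.
    # Hypothetical data standing in for real backends.
    sql_rows = [{"cust_id": 1, "name": "Ada"}]
    api_payload = {"customers": [{"id": 2, "full_name": "Lin"}]}

    def virtual_customers():
        """Yield a unified record shape, resolving each source lazily on access
        rather than materialising a consolidated copy."""
        for r in sql_rows:
            yield {"id": r["cust_id"], "name": r["name"], "source": "sql"}
        for r in api_payload["customers"]:
            yield {"id": r["id"], "name": r["full_name"], "source": "api"}

    # Consumers query the unified view as if it were one table.
    names = [c["name"] for c in virtual_customers()]
    ```

    Because records are produced on demand, the "view" always reflects the current state of each source, which is the near real-time property the paragraph above describes.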
