100+ datasets found
  1. Established databases included in DISCOVER CKD.

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Cite
    Supriya Kumar; Matthew Arnold; Glen James; Rema Padman (2023). Established databases included in DISCOVER CKD. [Dataset]. http://doi.org/10.1371/journal.pone.0274131.t001
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Supriya Kumar; Matthew Arnold; Glen James; Rema Padman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Established databases included in DISCOVER CKD.

  2. OneNet Cross-Platform Services

    • data.europa.eu
    • zenodo.org
    unknown
    Cite
    Zenodo, OneNet Cross-Platform Services [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-8329051?locale=en
    Available download formats: unknown (199476)
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The goal of the OneNet System is to facilitate data exchange among existing platforms, services, applications, and devices through interoperability techniques. To ensure that system requirements are technically implementable and widely adopted, internationally standardized file formats, metadata, vocabularies, and identifiers are required. The OneNet “Cross-Platform Access” pattern is the fundamental characteristic of an interoperable ecosystem, and it leads to the definition of the list of OneNet Cross-Platform Services (CPS) exposed here. The pattern entails that an application accesses services or resources (information or functions) from multiple platforms through the same interface. For example, a “grid monitoring” application gathers information on different grid indicators provided by different platforms that conduct measurements or state estimations. The challenge of realizing this pattern lies in allowing applications or services within one platform to interact, via the same interface and data formats, with relevant services or applications on other platforms (possibly from different providers). This enables reuse and composition of services as well as easy integration of data from different platforms.

    Based on the defined concept for CPS, an extensive analysis of data exchange patterns and of the roles involved was performed for system use cases (SUCs) from other H2020 projects and from the OneNet demo clusters. This resulted in a first list of CPS, which was then organized into 10 categories. Each entry is defined by a set of attributes such as the service description and the indicative data producer/consumer. Each CPS can be assigned multiple business objects describing its context. For a specific set of CPS widely used by the demos, formal semantic definitions are provided in the "CrossPlatformServices-Semantic" Excel worksheet.
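    Purely as an illustration (this sketch is not from the OneNet project, and every class and method name in it is hypothetical), the Cross-Platform Access pattern amounts to one interface implemented by adapters for different platforms:

    # Hypothetical sketch of the Cross-Platform Access pattern in Python.
    from abc import ABC, abstractmethod

    class PlatformAdapter(ABC):
        """Uniform interface through which an application reaches any platform."""

        @abstractmethod
        def get_indicator(self, name: str) -> float:
            """Return the current value of a grid indicator, e.g. 'voltage'."""

    class DsoPlatform(PlatformAdapter):  # hypothetical provider
        def get_indicator(self, name: str) -> float:
            return 0.98  # stand-in for a real measurement service call

    class TsoPlatform(PlatformAdapter):  # hypothetical provider
        def get_indicator(self, name: str) -> float:
            return 1.02  # stand-in for a state-estimation service call

    def grid_monitoring(platforms: list[PlatformAdapter], indicator: str) -> dict:
        # The application consumes the same interface regardless of the provider.
        return {type(p).__name__: p.get_indicator(indicator) for p in platforms}

    print(grid_monitoring([DsoPlatform(), TsoPlatform()], "voltage"))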

  3. Secondary data and baseline covariates of patients included in DISCOVER CKD.

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Cite
    Supriya Kumar; Matthew Arnold; Glen James; Rema Padman (2023). Secondary data and baseline covariates of patients included in DISCOVER CKD. [Dataset]. http://doi.org/10.1371/journal.pone.0274131.t002
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Supriya Kumar; Matthew Arnold; Glen James; Rema Padman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Secondary data and baseline covariates of patients included in DISCOVER CKD.

  4. Data from: NetFlow data collected with different packet sampling rates

    • data.niaid.nih.gov
    • portalcientifico.unileon.es
    • +2 more
    Updated Feb 24, 2022
    + more versions
    Cite
    Adrián Campazas (2022). NetFlow data collected with different packet sampling rates [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6243335
    Dataset updated
    Feb 24, 2022
    Dataset provided by
    Ignacio Crespo
    Adrián Campazas
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NetFlow traffic generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

    The NetFlow flows were captured with different sampling rates at the packet level. A sampling rate of 1-in-X means that 1 out of every X packets is selected for flow accounting, while the remaining packets are not considered (see the sketch below).

    The version of NetFlow used to build the datasets is 5.
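    As a rough illustration of the sampling scheme described above (a deterministic 1-in-X selection; DOROTHEA's actual sampler may differ):

    # Keep 1 out of every x packets; the rest are ignored for flow accounting.
    def sample_packets(packets, x):
        for i, packet in enumerate(packets):
            if i % x == 0:
                yield packet

    packets = range(100)                           # stand-ins for captured packets
    print(len(list(sample_packets(packets, 10))))  # 10 of 100 kept at 1-in-10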

  5. MIMIC-IV

    • physionet.org
    Updated Oct 11, 2024
    Cite
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
    Dataset updated
    Oct 11, 2024
    Authors
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.

  6. ERA5 hourly data on single levels from 1940 to present

    • cds.climate.copernicus.eu
    grib
    Updated Aug 9, 2025
    + more versions
    Cite
    ECMWF (2025). ERA5 hourly data on single levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.adbb2d47
    Available download formats: grib
    Dataset updated
    Aug 9, 2025
    Dataset provided by
    European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
    Authors
    ECMWF
    License

    https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf

    Time period covered
    Jan 1, 1940 - Aug 3, 2025
    Description

    ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades, with data available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution, to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations and, when going further back in time, to ingest improved versions of the original observations, all of which benefits the quality of the reanalysis product.

    ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals, and ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system, which has evolved considerably over time; they also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days. If serious flaws are detected in this early release (called ERA5T), the data could differ from the final release 2 to 3 months later; users are notified when this occurs.

    The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access, and it should satisfy the requirements for most common applications. An overview of all ERA5 datasets and information on access to ERA5 data at native resolution are provided in the Climate Data Store documentation. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main subsets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land-surface quantities). The present entry is "ERA5 hourly data on single levels from 1940 to present".
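    A minimal sketch of reading a downloaded GRIB file with Python, assuming the xarray and cfgrib packages are installed; the file name and the "t2m" (2 m temperature) variable are assumptions that depend on what was requested from the Climate Data Store:

    import xarray as xr

    ds = xr.open_dataset("era5_single_levels.grib", engine="cfgrib")  # assumed file name
    print(ds)          # inspect the variables actually present
    t2m = ds["t2m"]    # hourly 2 m temperature on the 0.25-degree grid
    print(t2m.sel(latitude=50.0, longitude=0.0, method="nearest").mean().values)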

  7. New Oxford Dictionary of English, 2nd Edition

    • live.european-language-grid.eu
    • catalog.elra.info
    Updated Dec 6, 2005
    Cite
    (2005). New Oxford Dictionary of English, 2nd Edition [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/2276
    Dataset updated
    Dec 6, 2005
    License

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    This is Oxford University Press's most comprehensive single-volume dictionary, with 170,000 entries covering all varieties of English worldwide. The NODE data set constitutes a fully integrated range of formal data types suitable for language engineering and NLP applications, and is available in XML or SGML.

    - Source dictionary data. The NODE data set includes all the information present in the New Oxford Dictionary of English itself, such as definition text, example sentences, grammatical indicators, and encyclopaedic material.
    - Morphological data. Each NODE lemma (both headwords and subentries) has a full listing of all possible syntactic forms (e.g. plurals for nouns, inflections for verbs, comparatives and superlatives for adjectives), tagged to show their syntactic relationships. Each form has an IPA pronunciation. Full morphological data is also given for spelling variants (e.g. typical American variants), and a system of links enables straightforward correlation of variant forms to standard forms. The data set thus provides robust support for all look-up routines, and is equally viable for applications dealing with American and British English.
    - Phrases and idioms. The NODE data set provides a rich and flexible codification of over 10,000 phrasal verbs and other multi-word phrases. It features comprehensive lexical resources enabling applications to identify a phrase not only in the form listed in the dictionary but also in a range of real-world variations, including alternative wording, variable syntactic patterns, inflected verbs, optional determiners, etc.
    - Subject classification. Using a categorization scheme of 200 key domains, over 80,000 words and senses have been associated with particular subject areas, from aeronautics to zoology. As well as facilitating the extraction of subject-specific sub-lexicons, this also provides an extensive resource for document categorization and information retrieval.
    - Semantic relationships. The relationships between every noun and noun sense in the dictionary are being codified using an extensive semantic taxonomy on the model of the Princeton WordNet project. (Mapping to WordNet 1.7 is supported.) This structure allows elements of the basic lexical database to function as a formal knowledge database, enabling functionality such as sense disambiguation and logical inference.

    Derived from the detailed and authoritative corpus-based research of Oxford University Press's lexicographic team, the NODE data set is a powerful asset for any task dealing with real-world contemporary English usage. By integrating a number of different data types into a single structure, it creates a coherent resource which can be queried along numerous axes, allowing open-ended exploitation by many kinds of language-related applications.

  8. The definition data service for traffic congestion levels in Taoyuan City

    • data.gov.tw
    xml
    Updated Feb 25, 2022
    Cite
    Department of Transportation, Taoyuan (2022). The definition data service for traffic congestion levels in Taoyuan City [Dataset]. https://data.gov.tw/en/datasets/149613
    Available download formats: xml
    Dataset updated
    Feb 25, 2022
    Dataset authored and provided by
    Department of Transportation, Taoyuan
    License

    https://data.gov.tw/license

    Area covered
    Taoyuan
    Description

    This service defines the basic congestion-level categories used in Taoyuan City's real-time traffic information. Roads are grouped according to their characteristics, and each group is further subdivided into levels describing different degrees of congestion.

  9. Pedestrian Stops - Contact Cards

    • data.cincinnati-oh.gov
    application/rdfxml +5
    Updated Mar 31, 2025
    + more versions
    Cite
    (2025). Pedestrian Stops - Contact Cards [Dataset]. https://data.cincinnati-oh.gov/Safety/Pedestrian-Stops-Contact-Cards/swrz-ak2i
    Available download formats: application/rssxml, csv, json, xml, application/rdfxml, tsv
    Dataset updated
    Mar 31, 2025
    Description

    Data Description: This dataset captures all Cincinnati Police Department (CPD) stops of pedestrians. The data includes the time of the incident, officer assignment, race/sex of the stop subject, and the outcome of the stop ("Action Taken"). Individual pedestrian stops may populate multiple data rows to account for multiple outcomes; "Instance_ID" is the unique identifier for each one (1) pedestrian stop.

    NOTE: CPD transitioned to a new Record Management System on 6/3/2024. The data before this date may have a different structure than the data after this date.

    Data Creation: This data is created when CPD completes a pedestrian stop and logs the interview via Contact Cards. Contact Cards are a result of the Collaborative Agreement. Contact Cards are manually entered and may experience lags in data entry.

    Data Created by: This data is created by the Cincinnati Police Department.

    Refresh Frequency: This data is updated daily.

    CincyInsights: The City of Cincinnati maintains an interactive dashboard portal, CincyInsights, in addition to our Open Data in an effort to increase access and usage of city data. This dataset has an associated dashboard available here: https://insights.cincinnati-oh.gov/stories/s/gw5q-kjng

    Data Dictionary: A data dictionary providing definitions of columns and attributes is available as an attachment to this dataset.

    Processing: The City of Cincinnati is committed to providing the most granular and accurate data possible. In that pursuit, the Office of Performance and Data Analytics applies standard processing to most raw data prior to publication. Processing includes, but is not limited to: address verification, geocoding, decoding attributes, and addition of administrative areas (i.e. Census, neighborhoods, police districts, etc.).

    Data Usage: For directions on downloading and using open data please visit our How-to Guide: https://data.cincinnati-oh.gov/dataset/Open-Data-How-To-Guide/gdr9-g3ad

    Disclaimer: In compliance with privacy laws, all Public Safety datasets are anonymized and appropriately redacted prior to publication on the City of Cincinnati’s Open Data Portal. This means that for all public safety datasets: (1) the last two digits of all addresses have been replaced with “XX,” and in cases where there is a single digit street address, the entire address number is replaced with "X"; and (2) Latitude and Longitude have been randomly skewed to represent values within the same block area (but not the exact location) of the incident.
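    Purely as an illustration of the stated redaction rule (this is a reconstruction, not the city's actual code):

    # Mask the last two digits of a street number, or the whole number if it
    # has a single digit, per the disclaimer above.
    def redact_address(addr: str) -> str:
        number, _, street = addr.partition(" ")
        if len(number) == 1:
            return "X " + street
        return number[:-2] + "XX " + street

    print(redact_address("1234 Main St"))  # -> 12XX Main St
    print(redact_address("7 Elm Ave"))     # -> X Elm Ave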

  10. fdata-02-00040_Challenges and Legal Gaps of Genetic Profiling in the Era of Big Data.xml

    • frontiersin.figshare.com
    bin
    Updated Jun 3, 2023
    + more versions
    Cite
    Murat Sariyar; Irene Schlünder (2023). fdata-02-00040_Challenges and Legal Gaps of Genetic Profiling in the Era of Big Data.xml [Dataset]. http://doi.org/10.3389/fdata.2019.00040.s002
    Available download formats: bin
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Murat Sariyar; Irene Schlünder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Profiling of individuals based on inborn, acquired, and assigned characteristics is central for decision making in health care. In the era of omics and big smart data, it becomes urgent to differentiate between different data governance affordances for different profiling activities. Typically, diagnostic profiling is the focus of researchers and physicians, and other types are regarded as undesired side effects, for example in connection with health care insurance risk calculations. Profiling in a legal sense is addressed, for example, by EU data protection law. It is defined in the General Data Protection Regulation as automated decision making. This term does not correspond fully with profiling in biomedical research and healthcare, and the impact on privacy has hardly ever been examined. But profiling is also an issue concerning the fundamental right of non-discrimination whenever profiles are used in a way that has a discriminatory effect on individuals. Here, we focus on genetic profiling, define related notions (as legal and subject-matter definitions frequently differ), and discuss the ethical and legal challenges.

  11. Data from: Modeling software processes from different domains using SPEM and BPMN notations: An experience report of teaching software processes

    • data.niaid.nih.gov
    Updated Jul 12, 2024
    Cite
    Carla Bezerra (2024). Modeling software processes from different domains using SPEM and BPMN notations: An experience report of teaching software processes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7674964
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Carla Bezerra
    Emanuel Coutinho
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the current application development scenario, spanning different environments, technologies, and contexts such as IoT, Blockchain, Machine Learning, and Cloud Computing, there is a need for particular solutions for domain-specific software development processes. The proper definition of software processes requires understanding the particularities of the teams and organization involved, as well as specialized technical knowledge in Software Engineering. Although it is an essential part of Software Engineering, many university curricula do not dedicate much effort to teaching software processes, focusing more on the basic principles of Software Engineering, such as requirements, architecture, and programming languages. Another important aspect of software processes is modeling: modeling a software process provides a basis for managing, automating, and supporting software process improvement. In this context, teaching software process modeling becomes challenging, mainly due to the great emphasis on theory and the few practices. This work presents an experience report on teaching the definition and modeling of software processes in different domains. In the software processes course, we applied a practice of defining and modeling processes in various application domains: IoT, cloud, mobile, critical systems, self-adaptive systems, machine learning, blockchain, and games. The processes were modeled in the Software & Systems Process Engineering Metamodel (SPEM) and Business Process Model and Notation (BPMN) notations, based on references from the literature for each domain. We evaluated the process modeling practice with SPEM and BPMN in 3 classes of the software processes course and compared the use of the two notations across the different domains. We concluded that the modeling tool and maturity in the domain are essential for good process-modeling performance.

  12. Geophysical surveys and geospatial data for Bob Kidd Lake, Washington County, Arkansas

    • catalog.data.gov
    • data.usgs.gov
    • +3 more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Geophysical surveys and geospatial data for Bob Kidd Lake, Washington County, Arkansas [Dataset]. https://catalog.data.gov/dataset/geophysical-surveys-and-geospatial-data-for-bob-kidd-lake-washington-county-arkansas
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Arkansas, Washington County, Bob Kidd Lake
    Description

    This data release consists of three types of data: direct current (DC) resistivity profiles, frequency domain electromagnetic (FDEM) survey data, and global navigation satellite system (GNSS) coordinate data for the geophysical measurement locations. A data dictionary is included along with the data and defines all of the table headings, definitions, and units. Earthen dams are common on lakes and ponds, but characteristics of these structures such as construction history, composition, and integrity are often unknown for older dams. Geophysical surveying techniques provide a non-invasive method of mapping their lithology and structure. In particular, DC resistivity and FDEM methods can, when properly processed, provide the information necessary to construct a lithologic model of an earthen dam without having to trench or core through the shell of the dam itself. In September 2016, the U.S. Geological Survey (USGS) conducted geophysical surveys at Bob Kidd Lake, an 81-hectare lake in northwestern Arkansas, to help determine the composition of the earthen dam and guide any potential geotechnical investigations. A series of DC resistivity surveys were conducted along, parallel, and perpendicular to the axis of the crest of the dam to identify the soil-bedrock interface and any variations in the composition of the earthen dam. A dense survey using a multi-frequency electromagnetic sensor was used to map the shallow materials comprising the dam at a higher resolution. Resistivity measurements were made by transmitting a known current through two electrodes (transmitter) and measuring the voltage potential across two other electrodes (receiver). The multiple channels on the resistivity meter allow voltage measurements to be made at 10 receivers simultaneously following a current injection. The configuration of the transmitter relative to the receiver(s) is referred to as an array. For these surveys, a Reciprocal Schlumberger array was used, which positions the transmitting pair of electrodes toward the center of the array and the receiving pairs extending away from the transmitter (Loke, 2000; Zonge and others, 2005). The electrical resistance was calculated by dividing the measured voltage by the applied current, and the apparent resistivity was determined by multiplying the electrical resistance by a geometric factor (a small worked example follows this description). Apparent resistivity is not the true resistivity, but rather a volume-averaged estimate of the true resistivity distribution, because a homogeneous, isotropic subsurface is assumed. To estimate the true resistivity of the heterogeneous and/or anisotropic subsurface, the apparent resistivity data were processed using an inverse modeling software program. The FDEM method complements the two-dimensional (2-D) DC resistivity method and was used to extend the depth of subsurface characterization obtained with resistivity profiles. The FDEM method uses multiple current frequencies to measure bulk electric conductivity values (the inverse of resistivity values) of the earth at different depths (Lucius and others, 2007). For this project, FDEM data were collected with a GEM-2, a broadband, multifrequency, fixed-coil electromagnetic induction unit (Geophex, 2015). In addition to the geophysical surveys, a concurrent GNSS survey was conducted using a Real Time Kinematic (RTK) system.

    All electrode locations on the DC resistivity profiles and all measurement locations in the FDEM survey, as well as a point-cloud survey, were collected and are included in the dataset. These data were used to geo-reference the geophysical data and may be used to create a Digital Elevation Model (DEM) of the dam surface.
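    The worked example referenced above, with made-up survey values (the geometric factor K depends on the electrode array geometry):

    # Resistance = measured voltage / applied current;
    # apparent resistivity = geometric factor K * resistance (ohm-metres).
    def apparent_resistivity(voltage_v: float, current_a: float, k: float) -> float:
        return k * (voltage_v / current_a)

    print(apparent_resistivity(voltage_v=0.25, current_a=0.5, k=78.5))  # 39.25 ohm-m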

  13. 2023 American Community Survey: DP02 | Selected Social Characteristics in the United States (ACS 1-Year Estimates Data Profiles)

    • data.census.gov
    • test.data.census.gov
    Updated Oct 6, 2022
    + more versions
    Cite
    ACS (2022). 2023 American Community Survey: DP02 | Selected Social Characteristics in the United States (ACS 1-Year Estimates Data Profiles) [Dataset]. https://data.census.gov/cedsci/table?q=DP02
    Dataset updated
    Oct 6, 2022
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    ACS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2023
    Area covered
    United States
    Description

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, the decennial census is the official source of population totals for April 1st of each decennial year. In between censuses, the Census Bureau's Population Estimates Program produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns, and estimates of housing units and the group quarters population for states and counties.

    Information about the American Community Survey (ACS) can be found on the ACS website. Supporting documentation, including code lists, subject definitions, data accuracy and statistical testing, and a full list of ACS tables and table shells (without estimates), can be found in the Technical Documentation section of the ACS website. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Source: U.S. Census Bureau, 2023 American Community Survey 1-Year Estimates.

    ACS data generally reflect the geographic boundaries of legal and statistical areas as of January 1 of the estimate year. For more information, see Geography Boundaries by Year. Users must consider potential differences in geographic boundaries, questionnaire content or coding, or other methodological issues when comparing ACS data from different years. Statistically significant differences shown in ACS Comparison Profiles, or in data users' own analysis, may be the result of these differences and thus might not necessarily reflect changes to the social, economic, housing, or demographic characteristics being compared. For more information, see Comparing ACS Data.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value (a short worked example follows this description). In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

    Ancestry listed in this table refers to the total number of people who responded with a particular ancestry; for example, the estimate given for German represents the number of people who listed German as either their first or second ancestry. This table lists only the largest ancestry groups; see the Detailed Tables for more categories. Race and Hispanic origin groups are not included in this table because data for those groups come from the Race and Hispanic origin questions rather than the ancestry question (see Demographic Table).

    Data for year of entry of the native population reflect the year of entry into the U.S. by people who were born in Puerto Rico or U.S. Island Areas, or born outside the U.S. to a U.S. citizen parent, and who subsequently moved to the U.S.

    The category "with a broadband Internet subscription" refers to those who said "Yes" to at least one of the following types of Internet subscriptions: broadband such as cable, fiber optic, or DSL; a cellular data plan; satellite; a fixed wireless subscription; or other non-dial-up subscription types. An Internet "subscription" refers to a type of service that someone pays for to access the Internet, such as a cellular data plan, broadband such as cable, fiber optic or DSL, or another type of service; this will normally refer to a service that someone is billed for directly, for Internet alone or sometimes as part of a bundle. "With a computer" includes those who said "Yes" to at least one of the following types of computers: desktop or laptop; smartphone; tablet or other portable wireless computer; or some other type of computer.

    Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on 2020 Census data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    Explanation of symbols:
    - "-": The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
    - "N": The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
    - "(X)": The estimate or margin of error is not applicable or not available.
    - "median-": ...
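    The worked example referenced above, with made-up numbers (dividing a 90 percent MOE by 1.645 to recover the standard error is the usual ACS convention):

    estimate, moe_90 = 12_345, 678   # hypothetical published estimate and its 90% MOE

    lower, upper = estimate - moe_90, estimate + moe_90
    standard_error = moe_90 / 1.645

    print(f"90% CI: [{lower}, {upper}]; SE is roughly {standard_error:.1f}")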

  14. Data from: Teaching software processes from different application domains

    • zenodo.org
    bin, pdf
    Updated Jul 16, 2024
    Cite
    Carla Bezerra; Emanuel Coutinho (2024). Teaching software processes from different application domains [Dataset]. http://doi.org/10.5281/zenodo.7068357
    Available download formats: bin, pdf
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Carla Bezerra; Emanuel Coutinho
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the current application development scenario, spanning different environments, technologies, and contexts such as IoT, Blockchain, Machine Learning, and Cloud Computing, there is a need for particular solutions for domain-specific software development processes. The proper definition of software processes requires understanding the particularities of the teams and organization involved, as well as specialized technical knowledge in Software Engineering. Although it is an essential part of Software Engineering, many university curricula do not dedicate much effort to teaching software processes, focusing more on the basic principles of Software Engineering, such as requirements, architecture, and programming languages. Another important aspect of software processes is modeling: modeling a software process provides a basis for managing, automating, and supporting software process improvement. In this context, teaching software process modeling becomes challenging, mainly due to the great emphasis on theory and the few practices. This work presents an experience report on teaching the definition and modeling of software processes in different domains. In the software processes course, we applied a practice of defining and modeling processes in various application domains: IoT, cloud, mobile, critical systems, self-adaptive systems, and games. The processes were modeled in the EPF Composer tool, based on references from the literature for each domain. In the end, we evaluated the process modeling practice with the students. We concluded that the modeling tool and maturity in the domain are essential for good process-modeling performance.

  15. Enterprise Survey 2009-2019, Panel Data - Slovenia

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Aug 6, 2020
    Cite
    World Bank Group (WBG) (2020). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://microdata.worldbank.org/index.php/catalog/3762
    Dataset updated
    Aug 6, 2020
    Dataset provided by
    World Bank (http://worldbank.org/)
    European Investment Bank (http://eib.org/)
    European Bank for Reconstruction and Development (http://ebrd.com/)
    Time period covered
    2008 - 2019
    Area covered
    Slovenia
    Description

    Abstract

    The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

    The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

    As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

    Geographic coverage

    National

    Analysis unit

    The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

    Universe

    As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The samples for the Slovenia ES 2009, 2013, and 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for the Slovenia 2009 ES and the Slovenia 2013 ES, and in the Sampling Note for the 2019 Slovenia ES.

    Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information on the industries and regions chosen are included in the attached Excel file (Sampling Report.xls) for the Slovenia 2009 ES. For the Slovenia 2013 and 2019 ES, specific information on the industries and regions chosen is described in "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.

    For the Slovenia 2009 ES, industry stratification was designed as follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries, sample sizes were inflated by about 17% to account for potential non-response when requesting sensitive financial data and for likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals), sample sizes were inflated by about 12% to account for undersampling of firms in service industries.

    For the Slovenia 2013 ES, industry stratification was designed as follows: the universe was stratified into one manufacturing industry and two service industries (retail, and other services).

    Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).

    For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.

    For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

    For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.

    Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).
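    A minimal sketch (not the World Bank's actual sampling code) of stratified random sampling over the three strata used here; the universe data frame and the per-stratum quota below are made up:

    import pandas as pd

    universe = pd.DataFrame({
        "industry": ["Manufacturing", "Retail", "Other Services"] * 100,
        "size":     ["small"] * 150 + ["medium"] * 100 + ["large"] * 50,
        "region":   ["Eastern", "Western"] * 150,
    })

    # Draw up to n establishments at random within every stratum.
    n = 5
    sample = (universe
              .groupby(["industry", "size", "region"], group_keys=False)
              .apply(lambda g: g.sample(min(len(g), n), random_state=1)))
    print(sample.groupby(["industry", "size", "region"]).size())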

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    Questionnaires have common questions (core module) and, respectively, additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (which includes the core module plus manufacturing-specific questions). Retail firms have been interviewed using the Services questionnaire (which includes the core module plus retail-specific questions), and the residual eligible services have been covered using the Services questionnaire (which includes the core module). Each variation of the questionnaire is identified by the index variable, a0.

    Response rate

    Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

    Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.

    For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.

    For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in Slovenia may be selection bias and not frame inaccuracy.

    For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.

    Finally, for 2019, the number of interviews per contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.

  16. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild

    • data.niaid.nih.gov
    Updated Oct 20, 2022
    + more versions
    Cite
    Yfantidou, Sofia (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6826682
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    Girdzijauskas, Šarūnas
    Marchioro, Thomas
    Karagianni, Christina
    Ferrari, Elena
    Giakatos, Dimitrios Panteleimon
    Yfantidou, Sofia
    Kazlouski, Andrei
    Efstathiou, Stefanos
    Palotti, Joao
    Vakali, Athena
    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

    The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

    Data Import: Reading CSV

    For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
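    For instance (the file name below is a placeholder for whichever LifeSnaps CSV you downloaded):

    import pandas as pd

    df = pd.read_csv("lifesnaps_daily.csv")  # placeholder file name
    print(df.shape)
    print(df.head())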

    Data Import: Setting up a MongoDB (Recommended)

    To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

    To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.

    For the Fitbit data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c fitbit

    For the SEMA data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c sema

    For surveys data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c surveys

    If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.

    Data Availability

    The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

    {
        _id: <MongoDB-defined primary key>,
        id (or user_id): <user-specific ID>,
        type: <data type, e.g. "steps">,
        data: <embedded object with type-dependent fields>
    }

    Each document consists of four fields: _id, id (also found as user_id in the sema and surveys collections), type, and data. The _id field is the MongoDB-defined primary key and can be ignored. The id field refers to a user-specific ID used to uniquely identify each user across all collections. The type field refers to the specific data type within the collection, e.g., steps, heart rate, calories, etc. The data field contains the actual information about the document, e.g., the step count for a specific timestamp for the steps type, in the form of an embedded object. The contents of the data object are type-dependent, meaning that the fields within the data object differ between data types. As mentioned previously, all times are stored in local time, and user IDs are common across the different collections. For more information on the available data types, see the related publication.
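    For example, a query like the following (assuming the database was restored as shown above, and using the "steps" type mentioned earlier; the user ID is a placeholder):

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    db = client["rais_anonymized"]

    # Fetch a few step documents for one user; IDs are common across collections.
    for doc in db.fitbit.find({"id": "some-user-id", "type": "steps"}).limit(5):
        print(doc["data"])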

    Surveys Encoding

    BREQ2

    Why do you engage in exercise?

    Code            Text
    engage[SQ001]   I exercise because other people say I should
    engage[SQ002]   I feel guilty when I don’t exercise
    engage[SQ003]   I value the benefits of exercise
    engage[SQ004]   I exercise because it’s fun
    engage[SQ005]   I don’t see why I should have to exercise
    engage[SQ006]   I take part in exercise because my friends/family/partner say I should
    engage[SQ007]   I feel ashamed when I miss an exercise session
    engage[SQ008]   It’s important to me to exercise regularly
    engage[SQ009]   I can’t see why I should bother exercising
    engage[SQ010]   I enjoy my exercise sessions
    engage[SQ011]   I exercise because others will not be pleased with me if I don’t
    engage[SQ012]   I don’t see the point in exercising
    engage[SQ013]   I feel like a failure when I haven’t exercised in a while
    engage[SQ014]   I think it is important to make the effort to exercise regularly
    engage[SQ015]   I find exercise a pleasurable activity
    engage[SQ016]   I feel under pressure from my friends/family to exercise
    engage[SQ017]   I get restless if I don’t exercise regularly
    engage[SQ018]   I get pleasure and satisfaction from participating in exercise
    engage[SQ019]   I think exercising is a waste of time

    PANAS

    Indicate the extent you have felt this way over the past week

    Code        Text
    P1[SQ001]   Interested
    P1[SQ002]   Distressed
    P1[SQ003]   Excited
    P1[SQ004]   Upset
    P1[SQ005]   Strong
    P1[SQ006]   Guilty
    P1[SQ007]   Scared
    P1[SQ008]   Hostile
    P1[SQ009]   Enthusiastic
    P1[SQ010]   Proud
    P1[SQ011]   Irritable
    P1[SQ012]   Alert
    P1[SQ013]   Ashamed
    P1[SQ014]   Inspired
    P1[SQ015]   Nervous
    P1[SQ016]   Determined
    P1[SQ017]   Attentive
    P1[SQ018]   Jittery
    P1[SQ019]   Active
    P1[SQ020]   Afraid

    Personality

    How Accurately Can You Describe Yourself?

    Code          Text
    ipip[SQ001]   Am the life of the party.
    ipip[SQ002]   Feel little concern for others.
    ipip[SQ003]   Am always prepared.
    ipip[SQ004]   Get stressed out easily.
    ipip[SQ005]   Have a rich vocabulary.
    ipip[SQ006]   Don't talk a lot.
    ipip[SQ007]   Am interested in people.
    ipip[SQ008]   Leave my belongings around.
    ipip[SQ009]   Am relaxed most of the time.
    ipip[SQ010]   Have difficulty understanding abstract ideas.
    ipip[SQ011]   Feel comfortable around people.
    ipip[SQ012]   Insult people.
    ipip[SQ013]   Pay attention to details.
    ipip[SQ014]   Worry about things.
    ipip[SQ015]   Have a vivid imagination.
    ipip[SQ016]   Keep in the background.
    ipip[SQ017]   Sympathize with others' feelings.
    ipip[SQ018]   Make a mess of things.
    ipip[SQ019]   Seldom feel blue.
    ipip[SQ020]   Am not interested in abstract ideas.
    ipip[SQ021]   Start conversations.
    ipip[SQ022]   Am not interested in other people's problems.
    ipip[SQ023]   Get chores done right away.
    ipip[SQ024]   Am easily disturbed.
    ipip[SQ025]   Have excellent ideas.
    ipip[SQ026]   Have little to say.
    ipip[SQ027]   Have a soft heart.
    ipip[SQ028]   Often forget to put things back in their proper place.
    ipip[SQ029]   Get upset easily.
    ipip[SQ030]   Do not have a good imagination.
    ipip[SQ031]   Talk to a lot of different people at parties.
    ipip[SQ032]   Am not really interested in others.
    ipip[SQ033]   Like order.
    ipip[SQ034]   Change my mood a lot.
    ipip[SQ035]   Am quick to understand things.
    ipip[SQ036]   Don't like to draw attention to myself.
    ipip[SQ037]   Take time out for others.
    ipip[SQ038]   Shirk my duties.
    ipip[SQ039]   Have frequent mood swings.
    ipip[SQ040]   Use difficult words.
    ipip[SQ041]   Don't mind being the centre of attention.
    ipip[SQ042]   Feel others' emotions.
    ipip[SQ043]   Follow a schedule.
    ipip[SQ044]   Get irritated easily.
    ipip[SQ045]   Spend time reflecting on things.
    ipip[SQ046]   Am quiet around strangers.
    ipip[SQ047]   Make people feel at ease.
    ipip[SQ048]   Am exacting in my work.
    ipip[SQ049]   Often feel blue.
    ipip[SQ050]   Am full of ideas.

    STAI

    Indicate how you feel right now

        Code           Text
        STAI[SQ001]    I feel calm
        STAI[SQ002]    I feel secure
        STAI[SQ003]    I am tense
        STAI[SQ004]    I feel strained
        STAI[SQ005]    I feel at ease
        STAI[SQ006]    I feel upset
        STAI[SQ007]    I am presently worrying over possible misfortunes
        STAI[SQ008]    I feel satisfied
        STAI[SQ009]    I feel frightened
        STAI[SQ010]    I feel comfortable
        STAI[SQ011]    I feel self-confident
        STAI[SQ012]    I feel nervous
        STAI[SQ013]    I am jittery
        STAI[SQ014]    I feel indecisive
        STAI[SQ015]    I am relaxed
        STAI[SQ016]    I feel content
        STAI[SQ017]    I am worried
        STAI[SQ018]    I feel confused
        STAI[SQ019]    I feel steady
        STAI[SQ020]    I feel pleasant
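
    The 20 state items mix anxiety-present wording ("I am tense") with anxiety-absent wording ("I feel calm"); under the usual state-scale scoring convention, the ten anxiety-absent items are reverse-scored before summing. A minimal sketch follows, assuming 1-4 responses in columns named after the codes above; confirm the keying against the licensed STAI manual before relying on it.

        import pandas as pd

        # Anxiety-absent items, reverse-scored under the usual state-scale key (assumption).
        REVERSED = {1, 2, 5, 8, 10, 11, 15, 16, 19, 20}

        def score_stai_state(responses: pd.DataFrame) -> pd.Series:
            """Sum 1-4 responses in STAI[SQ001]..STAI[SQ020]; totals range 20-80."""
            total = 0
            for i in range(1, 21):
                col = responses[f"STAI[SQ{i:03d}]"]
                total = total + ((5 - col) if i in REVERSED else col)  # 5 - x reverses a 1-4 item
            return total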
    

    TTM

    Do you engage in regular physical activity according to the definition above? How frequently did each event or experience occur in the past month?

        Code                Text
        processes[SQ002]    I read articles to learn more about physical
    
  17. Predicting Coupon Redemption_Feature Selection

    • kaggle.com
    zip
    Updated Nov 17, 2019
    + more versions
    Cite
    vasudeva (2019). Predicting Coupon Redemption_Feature Selection [Dataset]. https://www.kaggle.com/vasudeva009/predicting-coupon-redemption-feature-selection
    Explore at:
    zip (65337333 bytes). Available download formats
    Dataset updated
    Nov 17, 2019
    Authors
    vasudeva
    Description

    Problem Statement

    Predicting Coupon Redemption

    XYZ Credit Card company regularly helps its merchants understand their data and take key business decisions by providing machine learning and analytics consulting. ABC is an established brick-and-mortar retailer that frequently runs marketing campaigns for its diverse product range. As a merchant of XYZ, ABC has asked XYZ to assist with its discount marketing process using machine learning.

    Discount marketing and coupon usage are widely used promotional techniques for attracting new customers and for retaining and reinforcing the loyalty of existing customers. Measuring a consumer's propensity for coupon usage and predicting redemption behaviour are crucial for assessing the effectiveness of a marketing campaign.

    ABC's promotions are shared across various channels, including email and notifications. Many of these campaigns include coupon discounts offered for a specific product or range of products. The retailer wants to predict whether customers will redeem the coupons received across channels, so that its marketing team can design coupon constructs accurately and develop more precise, targeted marketing strategies.

    The data available for this problem contains the following information, including details of a sample of campaigns and coupons used in previous campaigns:

    User Demographic Details

    Campaign and coupon Details

    Product details

    Previous transactions

    Based on previous transaction and performance data from the last 18 campaigns, predict, for each coupon and customer combination in the next 10 campaigns (the test set), the probability that the customer will redeem the coupon.

    Dataset Description

    Here is the schema for the different data tables available. The detailed data dictionary is provided next.

    You are provided with the following files:

    train.csv: Train data containing the coupons offered to the given customers under the 18 campaigns

    Variable             Definition
    id                   Unique id for coupon customer impression
    campaign_id          Unique id for a discount campaign
    coupon_id            Unique id for a discount coupon
    customer_id          Unique id for a customer
    redemption_status    Target: 0 - coupon not redeemed, 1 - coupon redeemed

    campaign_data.csv: Campaign information for each of the 28 campaigns

    Variable         Definition
    campaign_id      Unique id for a discount campaign
    campaign_type    Anonymised campaign type (X/Y)
    start_date       Campaign start date
    end_date         Campaign end date

    coupon_item_mapping.csv: Mapping of coupon and items valid for discount under that coupon

    Variable     Definition
    coupon_id    Unique id for a discount coupon (no order)
    item_id      Unique id for items for which the given coupon is valid (no order)

    customer_demographics.csv: Customer demographic information for some customers

    Variable          Definition
    customer_id       Unique id for a customer
    age_range         Age range of customer family in years
    marital_status    Married/Single
    rented            0 - not rented accommodation, 1 - rented accommodation
    family_size       Number of family members
    no_of_children    Number of children in the family
    income_bracket    Label-encoded income bracket (a higher number corresponds to higher income)

    customer_transaction_data.csv: Transaction data for all customers for the duration of the campaigns in the train data

    Variable           Definition
    date               Date of transaction
    customer_id        Unique id for a customer
    item_id            Unique id for item
    quantity           Quantity of item bought
    selling_price      Sales value of the transaction
    other_discount     Discount from other sources such as manufacturer coupon/loyalty card
    coupon_discount    Discount availed from retailer coupon

    item_data.csv: Item information for each item sold by the retailer

    Variable      Definition
    item_id       Unique id for item
    brand         Unique id for item brand
    brand_type    Brand type (Local/Established)
    category      Item category

    test.csv: Contains the coupon-customer combinations for which redemption status is to be predicted

    Variable       Definition
    id             Unique id for coupon customer impression
    campaign_id    Unique id for a discount campaign
    coupon_id      Unique id for a discount coupon
    customer_id    Unique id for a customer

    To summarise the entire process:

    • Customers receive coupons under various campaigns and may choose to redeem them.
    • A coupon can be redeemed for any product valid for it under the coupon-item mapping, within the campaign's start and end dates.
    • When a customer redeems a coupon for an item at the retailer store, the redemption is reflected in the transaction table in the coupon_discount column (a minimal join sketch follows below).
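
    The assembly step above can be pictured with a short pandas sketch that joins the files described in the data dictionary into one modelling table and derives a single behavioural feature. The file names match the dataset description; the feature itself is an illustrative assumption, not part of any competition pipeline.

        import pandas as pd

        train = pd.read_csv("train.csv")
        campaigns = pd.read_csv("campaign_data.csv", parse_dates=["start_date", "end_date"])
        demographics = pd.read_csv("customer_demographics.csv")   # available for some customers only
        transactions = pd.read_csv("customer_transaction_data.csv", parse_dates=["date"])

        # Attach campaign and demographic attributes to each coupon-customer impression.
        df = (train
              .merge(campaigns, on="campaign_id", how="left")
              .merge(demographics, on="customer_id", how="left"))

        # Illustrative feature: share of a customer's past transactions carrying a
        # retailer coupon discount (non-zero coupon_discount, per the summary above).
        coupon_rate = (transactions.assign(used=transactions["coupon_discount"] != 0)
                                   .groupby("customer_id")["used"].mean()
                                   .rename("past_coupon_rate")
                                   .reset_index())
        df = df.merge(coupon_rate, on="customer_id", how="left")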

    Public and Private Split

    • Test data is randomly divided into Public (40%) and Private (60%) subsets.
    • Your initial responses are checked and scored on the Public data.
    • The final rankings are based on your Private score, which is published once the competition is over.

    Note

    • AV_amex_lgb_folds_v28.csv: Private Score 92.50 (submitted)
    • AV_amex_stack2_folds_v28.csv: Private Score 92.811 (best of all; a mean of CB and LGBM)
    • Stacking always works; don't ignore it because of whatever the Public LB says.
    • Kaggle link, best kernel: v31
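
    The best score noted above came from averaging two models' predictions. A minimal version of that blend, assuming two submission files with id and redemption_status probability columns (the file names are placeholders):

        import pandas as pd

        lgbm = pd.read_csv("sub_lgbm.csv")   # columns: id, redemption_status (a probability)
        cb = pd.read_csv("sub_cb.csv")

        blend = lgbm.copy()
        blend["redemption_status"] = (lgbm["redemption_status"] + cb["redemption_status"]) / 2
        blend.to_csv("sub_blend.csv", index=False)

    When the metric is rank-based (such as AUC), averaging per-file ranks instead of raw probabilities is a common variant, since it is insensitive to the two models being calibrated differently.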
  18. Data Virtualization Market By Deployment Mode (On-Premises, Cloud-Based),...

    • verifiedmarketresearch.com
    Updated Jul 25, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Data Virtualization Market By Deployment Mode (On-Premises, Cloud-Based), Component Outlook (Standalone Software, Data Integration Solution, Application Tool Solution), Organization Size (Small and Medium-sized Enterprises (SMEs), Large Enterprises), Verticals (Banking, Financial Services, and Insurance (BFSI), Healthcare, Retail and eCommerce, Telecom and IT, Manufacturing), Region for 2024-2031 [Dataset]. https://www.verifiedmarketresearch.com/product/data-virtualization-market/
    Explore at:
    Dataset updated
    Jul 25, 2024
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    Data Virtualization Market size was valued at USD 4.05 Billion in 2023 and is projected to reach USD 15.55 Billion by 2031, growing at a CAGR of 20.20% over the 2024-2031 forecast period.

    Data Virtualization Market: Definition/ Overview

    Data virtualization is an advanced technique in data management that streamlines access to information from various sources, offering a seamless and unified view of data despite its diverse locations and formats. This approach acts as an intermediary layer, enabling users to interact with data as if it were consolidated in a single repository. By abstracting the underlying complexities of different data sources, data virtualization simplifies the user's experience and eliminates the necessity of understanding the specifics of each individual source.

    One of the primary benefits of data virtualization is its ability to provide near real-time access to information. Unlike traditional data integration methods that rely on duplicating data, data virtualization allows users to retrieve and query data in its original location. This real-time capability ensures that users have the most current and accurate data available for decision-making.

    Additionally, data virtualization can significantly enhance system performance. By optimizing queries and minimizing the movement of data across networks, it reduces the overhead typically associated with data integration processes. This efficiency not only accelerates data retrieval and processing times but also improves the overall responsiveness of the system.

    From a financial perspective, data virtualization offers substantial cost savings. It eliminates the need for complex and costly data integration projects that involve extensive data extraction, transformation, and loading (ETL) processes. By reducing the dependency on physical data consolidation, organizations can allocate resources more effectively and decrease the total cost of ownership for their data infrastructure.
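
    As a concrete, if simplified, illustration of querying data in place rather than copying it through an ETL pipeline: the snippet below uses DuckDB, one of several engines with such federation features, to expose a CSV file and a Parquet file behind a single SQL view. The file names and columns are invented for the sketch, and both sources are assumed to share the same layout.

        import duckdb

        con = duckdb.connect()  # in-memory engine; neither file is copied or loaded up front

        # One virtual view over two physically separate sources (illustrative paths;
        # both are assumed to expose region and amount columns).
        con.execute("""
            CREATE VIEW sales AS
            SELECT * FROM read_csv_auto('regional_sales.csv')
            UNION ALL
            SELECT * FROM read_parquet('warehouse_sales.parquet')
        """)

        # Consumers query the unified view; reads are pushed down to each source.
        print(con.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())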

  19. AdBlue / DEF Gas Station Data and Price Data Europe – 44k+ Stations, Truck...

    • datarade.ai
    .json, .xml, .csv
    + more versions
    Cite
    xavvy, AdBlue / DEF Gas Station Data and Price Data Europe – 44k+ Stations, Truck and Car Pumps – API & Datasets [Dataset]. https://datarade.ai/data-products/xavvy-adblue-def-station-poi-and-price-data-europe-44k-xavvy
    Explore at:
    .json, .xml, .csv. Available download formats
    Dataset authored and provided by
    xavvy
    Area covered
    Czech Republic, Germany, Lithuania, Slovakia, Sweden, Romania, Estonia, Finland, Croatia, United States of America, Europe
    Description

    Xavvy fuel is the leading source for fuel station POI and price data worldwide, specializing in data quality and enrichment. We provide high-quality POI data on gas stations and different fuel types for all European countries. One-time or regular data delivery, push or pull services, and any data format – we adjust to our customer's needs. The total number of stations per country or region, the distribution of market share among competitors, or the perfect location for new AdBlue stations or truck pumps: our data answers a range of questions and offers a solid foundation for in-depth analyses and statistics. In this way it helps customers from various industries gain valuable insights into the fuel market and its development, providing a strong basis for strategic decisions such as business development, competitive positioning, or expansion. In addition, our data can improve the consistency and quality of an existing dataset: simply map our data against yours to check for accuracy and correct erroneous records. More than 130 sources, including governments, petroleum companies, fuel card providers, and crowdsourcing, enable xavvy to provide comprehensive information about AdBlue / DEF stations in Europe.

    Especially when displaying information about AdBlue stations on a map or in an application, high data quality is crucial for an excellent customer experience. Processing procedures are therefore continuously improved to increase data quality:
    • regular quality controls (e.g. via monitoring dashboards)
    • geocoding systems correct and refine geocoordinates
    • data sets are cleaned and standardized
    • current developments and mergers are taken into account
    • the number of data sources is constantly expanded so that different sources can be mapped against each other
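
    Mapping one dataset against another to catch coordinate errors, as described above, can be as simple as flagging stations whose positions in two sources disagree by more than some tolerance. A self-contained sketch follows; the input structure, id scheme, and 100 m threshold are illustrative assumptions.

        import math

        def haversine_m(lat1, lon1, lat2, lon2):
            """Great-circle distance in metres between two WGS84 points."""
            r = 6371000.0  # mean Earth radius in metres
            p1, p2 = math.radians(lat1), math.radians(lat2)
            dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
            a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
            return 2 * r * math.asin(math.sqrt(a))

        def flag_mismatches(own, reference, max_m=100.0):
            """Yield station ids whose coordinates in `own` differ from `reference`
            by more than max_m metres; both map station_id -> (lat, lon)."""
            for sid, (lat, lon) in own.items():
                if sid in reference:
                    rlat, rlon = reference[sid]
                    if haversine_m(lat, lon, rlat, rlon) > max_m:
                        yield sid

        # Example: ~55 m apart, so within the 100 m tolerance - nothing is flagged.
        own = {"de-001": (52.5200, 13.4050)}
        ref = {"de-001": (52.5205, 13.4049)}
        print(list(flag_mismatches(own, ref)))  # []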

    Check out our other data offerings and gain more valuable market insights on gas stations and AdBlue distribution directly from the experts!

  20. SES Water Domestic Consumption

    • hub.arcgis.com
    Updated Apr 26, 2024
    + more versions
    Cite
    dpararajasingam_ses (2024). SES Water Domestic Consumption [Dataset]. https://hub.arcgis.com/maps/f2cdc1248fcf4fd289ac1d3f25e75b3b_0/about
    Explore at:
    Dataset updated
    Apr 26, 2024
    Dataset authored and provided by
    dpararajasingam_ses
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset offers insights into yearly domestic water consumption across Lower Super Output Areas (LSOAs) or Data Zones, together with the count of water meters within each area. It supports analysis of residential water use patterns, water conservation efforts, and infrastructure development and policy making at a localised level.

    Key Definitions

    • Aggregation: The process of summarising or grouping data to obtain a single or reduced set of information, often for analysis or reporting purposes.
    • AMR Meter: Automatic meter reading (AMR) is the technology of automatically collecting consumption, diagnostic, and status data from a water meter remotely and periodically.
    • Dataset: A structured and organised collection of related elements, often stored digitally, used for analysis and interpretation in various fields.
    • Data Zone: Data zones are the key geography for the dissemination of small area statistics in Scotland.
    • Dumb Meter: A dumb (analogue) meter is read manually and has no external connectivity.
    • Granularity: A measure of the level of detail in a data structure; in time-series data, for example, measurements might be at intervals of years, months, weeks, days, or hours.
    • ID: Identification; any means of verifying the unique identifier assigned to each asset for tracking, management, and maintenance.
    • LSOA: Lower Layer Super Output Areas are a geographic hierarchy designed to improve the reporting of small area statistics in England and Wales.
    • Open Data Triage: The process carried out by a Data Custodian to determine whether there is any evidence of sensitivities associated with Data Assets, their associated Metadata, and the Software Scripts used to process them when released as Open Data.
    • Schema: The structure for organising and handling data within a dataset, defining the attributes, their data types, and the relationships between entities; it ensures data integrity and consistency by specifying permissible data types and constraints for each attribute.
    • Smart Meter: An electronic device that records consumption and communicates it to the consumer and the supplier; unlike AMR, it enables two-way communication between the meter and the supplier.
    • Units: Standard measurements used to quantify and compare different physical quantities.
    • Water Meter: A device that measures the volume of water used by residential and commercial building units supplied by a public water supply system.

    Data History

    Data Origin: Domestic consumption is recorded by water meters, and the readings are sent back to the water companies; this dataset is extracted from those companies' records.

    Data Triage Considerations

    This section covers the careful handling of data to maintain anonymity and the challenges associated with data updates, such as identifying household changes or meter replacements.

    • Identification of Critical Infrastructure: Not applicable; the dataset covers domestic water consumption only and contains no information revealing critical infrastructure.
    • Individual Identification Risks: If consumption is normally updated on a fixed cadence (e.g. every 6 months), an out-of-cycle update (e.g. after 2 months) could signal a change in occupancy or ownership; such patterns need careful handling to avoid accidental exposure of sensitive information.
    • Meter and Property Association: When a meter is replaced but the property remains the same, historical data integrity must be maintained without revealing personal information.
    • Interpretation of Null Consumption: Null consumption may indicate missing data rather than no water use; distinguishing the two is vital to prevent misleading conclusions.
    • Meter Re-reads: The dataset must account for meters that are read multiple times for accuracy.
    • Joint Supplies & Multiple Meters per Household: Households with several meters, and multiple households sharing one meter, complicate aggregation and require special consideration.

    Schema Consistency with the Energy Industry: After a thorough assessment of the privacy factors above, the dataset's schema was aligned with the standards established in the energy industry, which has established practice in managing similar risks with smart meters. Note, however, that the energy sector has a much higher density of meters, especially smart meters. This alignment ensures a high level of data integrity and privacy protection.

    Aggregation to Mitigate Risks: The dataset employs an elevated level of aggregation to minimise the risk of individual identification; the aggregation level is chosen to remove identifiable risks without excluding valuable data, balancing data utility with privacy.

    Data Freshness: The dataset reflects historical consumption patterns, not real-time data.

    Publish Frequency: Annually.

    Data Triage Review Frequency: An annual review ensures the dataset's relevance and accuracy, with adjustments made for specific requests or evolving data trends.

    Data Specifications

    • Each release records domestic water consumption as measured and reported by the data publisher; commercial consumption is excluded.
    • Where consumption must be estimated, estimates are calculated from actual meter readings.
    • Meters of all types (smart, dumb, AMR) are included.
    • The dataset is updated and published annually; historical data may be made available for trend analysis and comparative studies, although it is not mandatory for each release.

    Context

    • Do not use the dataset for immediate operational decisions about water supply management.
    • Interpret the data in light of potential seasonal and weather-related influences on consumption.
    • The geographical data does not pinpoint the locations of water meters within an LSOA.
    • The dataset aims to cover the full spectrum of households, from single-meter homes to those with multiple meters, to reflect the diversity of water use within an LSOA.
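
    The aggregation-with-suppression approach described above can be pictured with a short pandas sketch: per-meter annual consumption is rolled up to LSOA level, and areas with too few meters are suppressed to reduce re-identification risk. The column names and threshold are illustrative assumptions, not the publisher's actual rules.

        import pandas as pd

        MIN_METERS = 5  # assumed suppression threshold, not the publisher's real value

        def aggregate_to_lsoa(readings: pd.DataFrame) -> pd.DataFrame:
            """readings: one row per meter with lsoa_code, meter_id, annual_consumption_m3."""
            out = (readings
                   .groupby("lsoa_code")
                   .agg(total_consumption_m3=("annual_consumption_m3", "sum"),
                        meter_count=("meter_id", "nunique"))
                   .reset_index())
            # Suppress totals where the meter count is small enough to risk re-identification.
            out.loc[out["meter_count"] < MIN_METERS, "total_consumption_m3"] = float("nan")
            return out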
