Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Established databases included in DISCOVER CKD.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Introduction Black entrepreneurship in Canada makes important contributions to the Canadian economy, from fostering innovation to creating employment and building generational wealth. At least 1.3% of Black adults in Canada are business owners, compared with 2.3% of the overall Canadian population (Business Development Bank of Canada (BDC), 2025). Similarly, Black people account for only 2.4% of all business owners in the country even though they represent 4.3% of the total population. Women account for 33% of these Black-owned businesses, compared with their 20% share of the country's total business ownership (Diversity Institute, 2024), highlighting the potential of Black entrepreneurship for economic empowerment. However, Black businesses face systemic barriers that impede their ability to thrive, including but not limited to underrepresentation among entrepreneurs in Canada, limited access to finance, restricted networking opportunities, and insufficient specialized support programs (Gueye et al., 2022; Gueye, 2023; Diversity Institute, 2024). These challenges stem in part from a lack of comprehensive and reliable data on Black businesses and the absence of standardized definitions for key concepts such as Black entrepreneurs, Black enterprises, and Black entrepreneurship. Without a clear understanding of who constitutes a Black entrepreneur and the scale of their contributions, policymakers and stakeholders struggle to provide the support and resources needed to advance this community. In fact, developing policies and initiatives for Black businesses is difficult because current data on Black entrepreneurship remain fragmented and inconsistent, with different sources reporting different numbers (Grekou et al., 2021; Gueye, 2023).
These discrepancies highlight the urgent need for a unified approach to data collection and analysis, as accurate and comprehensive data are critical to understanding the size, scope, and needs of Black entrepreneurs, enabling targeted policy interventions and resource allocation. Data fragmentation, combined with non-standardized definitions, means that Black business owners are frequently overlooked, inaccurately classified, or omitted (Coletto et al., 2021). The significance of this research, therefore, lies in its ability to address these systemic barriers through an improved representation of Black entrepreneurs. This research aims to harmonize disparate data sources and set specific criteria, establishing sound tools for policymakers, researchers, and community groups who want to better assist Black entrepreneurs. In doing so, support for Black-owned businesses will be strengthened through targeted policies and programs that foster sustainable growth for these businesses in Canada. The main objectives of this study are threefold. First, the research seeks to reconcile disparate Black entrepreneurship statistics from Afrobiz.ca, Canadian Black Chamber of Commerce records, and Statistics Canada databases. Second, it seeks to develop unified criteria for defining Black business owners and their enterprises to improve both data collection precision and reporting consistency. Lastly, the research will establish procedures for building a standardized database of Black entrepreneurs by integrating present data sources and ensuring that both formal and informal businesses receive proper representation. These research efforts will establish fundamental principles for developing an inclusive and equitable entrepreneurial ecosystem throughout Canada.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The goal of the OneNet System is to facilitate data exchanges among existing platforms, services, applications, and devices through interoperability techniques. To ensure that system requirements are technically implementable and widely adopted, internationally standardized file formats, metadata, vocabularies, and identifiers are required. The OneNet "Cross-Platform Access" pattern is the fundamental characteristic of an interoperable ecosystem, leading to the definition of the exposed list of OneNet Cross-Platform Services (CPS). The pattern entails that an application accesses services or resources (information or functions) from multiple platforms through the same interface. For example, a "grid monitoring" application gathers information on different grid indicators provided by different platforms that conduct measurements or state estimations. The challenge in realizing this pattern lies in allowing applications or services within one platform to interact with relevant services or applications on other platforms (potentially from different providers) via the same interface and data formats. This enables reuse and composition of services as well as easy integration of data from different platforms. Based on the defined concept for CPS, an extensive analysis has been performed on the data exchange patterns and roles involved in system use cases (SUCs) from other H2020 projects and the OneNet demo clusters. This resulted in a first list of CPS, which has subsequently been taxonomized into 10 categories. The different entries have been defined with a set of classes such as service description, indicative data producer/consumer, etc. Each CPS can be assigned multiple business objects describing its context. For a specific set of CPS widely used by the demos, formal semantic definitions are provided in the "CrossPlatformServices-Semantic" Excel worksheet.
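The "Cross-Platform Access" pattern described above can be sketched as a single client-side interface backed by multiple platform adapters. The class and method names below are illustrative assumptions, not part of the OneNet specification:

```python
from abc import ABC, abstractmethod

# Hypothetical common interface: every platform exposes grid indicators
# through the same method signature and data format (a plain dict).
class PlatformAdapter(ABC):
    @abstractmethod
    def get_indicator(self, name: str) -> dict:
        ...

class MeasurementPlatform(PlatformAdapter):
    def get_indicator(self, name: str) -> dict:
        return {"indicator": name, "value": 49.98, "source": "measurement"}

class StateEstimationPlatform(PlatformAdapter):
    def get_indicator(self, name: str) -> dict:
        return {"indicator": name, "value": 50.01, "source": "state-estimation"}

# A "grid monitoring" application gathers the same indicator from several
# platforms through one interface, enabling reuse and easy integration.
def monitor(platforms: list[PlatformAdapter], indicator: str) -> list[dict]:
    return [p.get_indicator(indicator) for p in platforms]

readings = monitor([MeasurementPlatform(), StateEstimationPlatform()], "frequency_hz")
```

Because every adapter returns the same data shape, new platforms can be integrated without changing the consuming application.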
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Secondary data and baseline covariates of patients included in DISCOVER CKD.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*The World Development Indicators (WDI) is a premier compilation of cross-country comparable data about development. It provides a broad range of economic, social, environmental, and governance indicators to support analysis and decision-making for development policies. The dataset includes indicators from different countries, spanning multiple decades, enabling researchers and policymakers to understand trends and progress in development goals such as poverty reduction, education, healthcare, and infrastructure.*
*The dataset is a collection of multiple CSV files providing information on global indicators, countries, and time-series data. It is structured as follows:*
1. series: Contains metadata for various indicators, including their descriptions, definitions, and other relevant information. This file acts as a reference for understanding what each indicator represents.
2. country_series: Establishes relationships between countries and specific indicators. It provides additional metadata, such as contextual descriptions of indicator usage for particular countries.
3. countries: Includes detailed information about countries, such as country codes, region classifications, income levels, and other geographical or socio-economic attributes.
4. footnotes: Provides supplementary notes and additional context for specific data points in the main dataset. These notes clarify exceptions, limitations, or other special considerations for particular entries.
5. main_data: The core dataset containing the actual indicator values for countries across different years. This file forms the backbone of the dataset and is used for analysis.
6. series_time: Contains time-related metadata for indicators, such as their start and end years or periods of data availability.
*This dataset is ideal for analyzing global development trends, comparing country-level statistics, and studying the relationships between different socio-economic indicators over time.*
- Series Code: Unique code identifying the data series. Example: AG.LND.AGRI.K2 (Agricultural land, sq. km).
- Topic: Category under which the indicator is classified. Example: Environment: Land use.
- Indicator Name: Full name describing what the indicator measures. Example: Agricultural land (sq. km).
- Short Definition: A brief explanation of the indicator (if available). Not applicable for all indicators.
- Long Definition: Detailed explanation of the indicator's meaning and methodology. Example: "Agricultural land refers to the share of land area that is arable, under permanent crops, or under permanent pastures."
- Unit of Measure: Unit in which the data is expressed. Example: Square kilometers.
- Periodicity: How frequently the data is collected or reported. Example: Annual.
- Base Period: The reference period used for comparison, if applicable. Often not specified.
- Other Notes: Additional context or remarks about the data. Example: "Data for former states are included in successor states."
- Aggregation Method: Method used to combine data for groups (e.g., regions). Example: Weighted average.
- Limitations and Exceptions: Constraints or exceptions in the data. Example: "Data may not be directly comparable across countries due to different definitions."
- Notes from Original Source: Remarks provided by the data source. Not specified for all indicators.
- General Comments: Broad remarks about the dataset or indicator. Not available in all cases.
- Source: Organization providing the data. Example: Food and Agriculture Organization.
- Statistical Concept and Methodology: Explanation of how the data was generated. Example: "Agricultural land is calculated based on land area classified as arable."
- Development Relevance: Importance of the indicator for development. Example: "Agricultural land availability impacts food security and rural livelihoods."
- Related Source Links: URLs to related information sources (if any). Not specified.
- Other Web Links: Additional web resources. Not specified.
- Related Indicators: Indicators conceptually related...
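The file structure described above lends itself to key-based joins between the indicator values and their metadata. A minimal pandas sketch, using inline stand-ins for the CSV files (the exact file names and column headers are assumptions based on the listing above):

```python
import pandas as pd
from io import StringIO

# Inline stand-ins for the WDI files; in practice these would be
# read from files such as "series.csv" and "main_data.csv" (assumed names).
series_csv = StringIO(
    "Series Code,Indicator Name,Unit of measure\n"
    "AG.LND.AGRI.K2,Agricultural land (sq. km),Square kilometers\n"
)
main_csv = StringIO(
    "Country Code,Series Code,Year,Value\n"
    "FRA,AG.LND.AGRI.K2,2020,286946\n"
    "FRA,AG.LND.AGRI.K2,2021,286354\n"
)

series = pd.read_csv(series_csv)
main_data = pd.read_csv(main_csv)

# Join indicator values with their metadata via the shared Series Code key.
merged = main_data.merge(series, on="Series Code", how="left")
```

The same pattern extends to the countries and footnotes files, joining on country codes or (country, series, year) keys respectively.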
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NetFlow traffic generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.
NetFlow flows have been captured with different sampling rates at the packet level. A sampling rate of X means that 1 out of every X packets is selected to form flows, while the rest of the packets are ignored.
The version of NetFlow used to build the datasets is 5.
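The 1-in-X packet sampling described above can be sketched as simple deterministic selection (an illustration of the concept; DOROTHEA's actual sampling mechanism may differ):

```python
def sample_packets(packets, rate):
    """Deterministic 1-in-`rate` sampling: keep every rate-th packet;
    the remaining packets are ignored for flow construction."""
    return [pkt for i, pkt in enumerate(packets) if i % rate == 0]

packets = list(range(10))             # stand-in for captured packets
sampled = sample_packets(packets, 5)  # 1 out of every 5 packets
```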
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days. In case that serious flaws are detected in this early release (called ERA5T), this data could be different from the final release 2 to 3 months later. 
If this occurs, users are notified. The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access, and it should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data have been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main subsets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land surface quantities). The present entry is "ERA5 hourly data on single levels from 1940 to present".
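As a sketch, a request for this entry through the CDS API might look like the following. The `cdsapi` package and the dataset name `reanalysis-era5-single-levels` are the publicly documented ones; the specific variable, date, and output file name are illustrative choices:

```python
# Sketch of a CDS API request for "ERA5 hourly data on single levels".
# Actually running the retrieve call requires a CDS account and
# credentials in ~/.cdsapirc, so it is shown commented out.
request = {
    "product_type": "reanalysis",
    "variable": "2m_temperature",   # illustrative variable choice
    "year": "2020",
    "month": "01",
    "day": "01",
    "time": "12:00",
    "format": "netcdf",
}

# import cdsapi
# cdsapi.Client().retrieve("reanalysis-era5-single-levels", request, "era5_t2m.nc")
```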
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
This is Oxford University Press's most comprehensive single-volume dictionary, with 170,000 entries covering all varieties of English worldwide. The NODE data set constitutes a fully integrated range of formal data types suitable for language engineering and NLP applications: It is available in XML or SGML. - Source dictionary data. The NODE data set includes all the information present in the New Oxford Dictionary of English itself, such as definition text, example sentences, grammatical indicators, and encyclopaedic material. - Morphological data. Each NODE lemma (both headwords and subentries) has a full listing of all possible syntactic forms (e.g. plurals for nouns, inflections for verbs, comparatives and superlatives for adjectives), tagged to show their syntactic relationships. Each form has an IPA pronunciation. Full morphological data is also given for spelling variants (e.g. typical American variants), and a system of links enables straightforward correlation of variant forms to standard forms. The data set thus provides robust support for all look-up routines, and is equally viable for applications dealing with American and British English. - Phrases and idioms. The NODE data set provides a rich and flexible codification of over 10,000 phrasal verbs and other multi-word phrases. It features comprehensive lexical resources enabling applications to identify a phrase not only in the form listed in the dictionary but also in a range of real-world variations, including alternative wording, variable syntactic patterns, inflected verbs, optional determiners, etc. - Subject classification. Using a categorization scheme of 200 key domains, over 80,000 words and senses have been associated with particular subject areas, from aeronautics to zoology. As well as facilitating the extraction of subject-specific sub-lexicons, this also provides an extensive resource for document categorization and information retrieval. - Semantic relationships. 
The relationships between every noun and noun sense in the dictionary are being codified using an extensive semantic taxonomy on the model of the Princeton WordNet project. (Mapping to WordNet 1.7 is supported.) This structure allows elements of the basic lexical database to function as a formal knowledge database, enabling functionality such as sense disambiguation and logical inference. - Derived from the detailed and authoritative corpus-based research of Oxford University Press's lexicographic team, the NODE data set is a powerful asset for any task dealing with real-world contemporary English usage. By integrating a number of different data types into a single structure, it creates a coherent resource which can be queried along numerous axes, allowing open-ended exploitation by many kinds of language-related applications.
https://data.gov.tw/license
This dataset explains the basic data definitions for congestion levels in Taoyuan City's real-time traffic information. Congestion-level groups are defined according to different road characteristics, and each group is further subdivided to describe different levels of congestion.
Data Description: This dataset captures all Cincinnati Police Department stops of pedestrians. The data include the time of the incident, officer assignment, race/sex of the stop subject, and outcome of the stop ("Action taken"). Individual pedestrian stops may populate multiple data rows to account for multiple outcomes; "Instance_ID" is the unique identifier for each pedestrian stop.
NOTE: CPD transitioned to a new Record Management System on 6/3/2024. The data before this date may have a different structure than the data after this date.
Data Creation: This data is created when CPD completes a pedestrian stop and logs the interview via Contact Cards. Contact Cards are a result of the Collaborative Agreement. Contact Cards are manually entered and may experience lags in data entry.
Data Created by: This data is created by the Cincinnati Police Department.
Refresh Frequency: This data is updated daily.
CincyInsights: The City of Cincinnati maintains an interactive dashboard portal, CincyInsights, in addition to our Open Data portal, in an effort to increase access to and usage of city data. This dataset has an associated dashboard available here: https://insights.cincinnati-oh.gov/stories/s/gw5q-kjng
Data Dictionary: A data dictionary providing definitions of columns and attributes is available as an attachment to this dataset.
Processing: The City of Cincinnati is committed to providing the most granular and accurate data possible. In that pursuit, the Office of Performance and Data Analytics applies standard processing to most raw data prior to publication. Processing includes, but is not limited to: address verification, geocoding, decoding attributes, and addition of administrative areas (i.e., Census, neighborhoods, police districts, etc.).
Data Usage: For directions on downloading and using open data please visit our How-to Guide: https://data.cincinnati-oh.gov/dataset/Open-Data-How-To-Guide/gdr9-g3ad
Disclaimer: In compliance with privacy laws, all Public Safety datasets are anonymized and appropriately redacted prior to publication on the City of Cincinnati’s Open Data Portal. This means that for all public safety datasets: (1) the last two digits of all addresses have been replaced with “XX,” and in cases where there is a single digit street address, the entire address number is replaced with "X"; and (2) Latitude and Longitude have been randomly skewed to represent values within the same block area (but not the exact location) of the incident.
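The address redaction rule stated above can be sketched as a small helper (a hypothetical illustration of the stated rule, not the City's actual code):

```python
import re

def redact_address(address: str) -> str:
    """Apply the stated rule: replace the last two digits of the street
    number with "XX"; a single-digit street number becomes "X"."""
    def _mask(m: re.Match) -> str:
        number = m.group(0)
        if len(number) == 1:
            return "X"
        return number[:-2] + "XX"
    # Mask only the leading street number of the address string.
    return re.sub(r"^\d+", _mask, address)

redacted = redact_address("1234 Main St")
```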
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Profiling of individuals based on inborn, acquired, and assigned characteristics is central to decision making in health care. In the era of omics and big smart data, it becomes urgent to differentiate between different data governance affordances for different profiling activities. Typically, diagnostic profiling is the focus of researchers and physicians, and other types are regarded as undesired side effects, for example in connection with health care insurance risk calculations. Profiling in a legal sense is addressed, for example, by EU data protection law. It is defined in the General Data Protection Regulation as automated decision making. This term does not correspond fully with profiling in biomedical research and healthcare, and the impact on privacy has hardly ever been examined. But profiling is also an issue concerning the fundamental right to non-discrimination, whenever profiles are used in a way that has a discriminatory effect on individuals. Here, we focus on genetic profiling, define related notions, as legal and subject-matter definitions frequently differ, and discuss the ethical and legal challenges.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the current application development scenario, spanning different environments, technologies and contexts such as IoT, Blockchain, Machine Learning and Cloud Computing, there is a need for particular solutions for domain-specific software development processes. The proper definition of software processes requires understanding the particularities of the teams and organizations involved, along with specialized technical knowledge in Software Engineering. Although it is an essential part of Software Engineering, many university curricula do not dedicate much effort to teaching software processes, focusing more on basic principles such as requirements, architecture and programming languages. Another important aspect of software processes is modeling. Modeling a software process provides a basis for managing, automating and supporting software process improvement. In this context, teaching software process modeling becomes challenging, mainly due to the great emphasis on theory and scarcity of practice. This work presents an experience report on teaching the definition and modeling of software processes in different domains. In the software processes course, we applied a practice for defining and modeling processes in various application domains: IoT, cloud, mobile, critical systems, self-adaptive systems, machine learning, blockchain and games. The processes were modeled in the Software & Systems Process Engineering Metamodel (SPEM) and Business Process Model and Notation (BPMN) notations, based on references from the literature for each domain. We evaluated the process modeling practice with SPEM and BPMN in 3 classes of the software processes course and compared the use of the two notations across the different domains. We concluded that the modeling tool and maturity in the domain are essential for good process performance.
This data release consists of three different types of data: direct current (DC) resistivity profiles, frequency domain electromagnetic (FDEM) survey data, and global navigation satellite system (GNSS) coordinate data for the geophysical measurement locations. A data dictionary is included with the data and defines all of the table headings, definitions, and units. Earthen dams are common on lakes and ponds, but characteristics of these structures such as construction history, composition, and integrity are often unknown for older dams. Geophysical surveying techniques provide a non-invasive method of mapping their lithology and structure. In particular, DC resistivity and FDEM methods can, when properly processed, provide the information necessary to construct a lithologic model of an earthen dam without having to trench or core through the shell of the dam itself. In September 2016, the U.S. Geological Survey (USGS) conducted geophysical surveys at Bob Kidd Lake, an 81-hectare lake in northwestern Arkansas, to help determine the composition of the earthen dam and guide any potential geotechnical investigations. A series of DC resistivity surveys were conducted along, parallel, and perpendicular to the axis of the crest of the dam to identify the soil-bedrock interface and any variations in the composition of the earthen dam. A dense survey using a multi-frequency electromagnetic sensor was used to map the shallow materials comprising the dam at a higher resolution. Resistivity measurements were made by transmitting a known current through two electrodes (transmitter) and measuring the voltage potential across two other electrodes (receiver). The multiple channels on the resistivity meter allow voltage measurements to be made at 10 receivers simultaneously following a current injection. The configuration of the transmitter relative to the receiver(s) is referred to as an array.
For these surveys, a Reciprocal Schlumberger array was used, which positions the transmitting pair of electrodes toward the center of the array and the receiving pairs extending away from the transmitter (Loke, 2000; Zonge and others, 2005). The electrical resistance was calculated by dividing the measured voltage by the applied current. The apparent resistivity was determined by multiplying the electrical resistance by a geometric factor. Apparent resistivity is not the true resistivity, but rather a volume-averaged estimate of the true resistivity distribution, because a homogeneous, isotropic subsurface is assumed. To estimate the true resistivity of the heterogeneous and/or anisotropic subsurface, the apparent resistivity data were processed using an inverse modeling software program. The FDEM method complements the two-dimensional (2-D) DC resistivity method and was used to extend the depth of subsurface characterization obtained with resistivity profiles. The FDEM method uses multiple current frequencies to measure bulk electric conductivity values (the inverse of resistivity values) of the earth at different depths (Lucius and others, 2007). For this project FDEM data were collected with a GEM-2, a broadband, multifrequency, fixed-coil electromagnetic induction unit (Geophex, 2015). In addition to the geophysical surveys a concurrent Global Navigation Satellite System (GNSS) survey was conducted using a Real Time Kinematic system (RTK). All electrode locations on the DC resistivity profiles, all measurement locations in the FDEM survey, as well as a point-cloud survey were collected and are included in the dataset. These data were used to geo-reference the geophysical data and may be used to create a Digital Elevation Model (DEM) of the dam surface.
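The resistance and apparent-resistivity calculation described above can be sketched as follows. The general four-electrode geometric factor is the standard textbook form; the electrode spacing and measured values are illustrative inputs, not data from this survey:

```python
from math import pi

def geometric_factor(AM, BM, AN, BN):
    """General geometric factor K for a four-electrode array, from the
    distances between current electrodes (A, B) and potential
    electrodes (M, N), all in meters."""
    return 2 * pi / (1 / AM - 1 / BM - 1 / AN + 1 / BN)

def apparent_resistivity(voltage, current, K):
    """rho_a = K * (V / I): the measured resistance (V/I) scaled by the
    geometric factor. This is a volume-averaged estimate assuming a
    homogeneous, isotropic subsurface, not the true resistivity."""
    resistance = voltage / current
    return K * resistance

# Illustrative Wenner-style geometry with unit spacing (a = 1 m): K = 2*pi*a.
K = geometric_factor(1.0, 2.0, 2.0, 1.0)
rho_a = apparent_resistivity(0.05, 0.25, K)  # 50 mV measured, 250 mA injected
```

Inverse modeling software then takes many such apparent-resistivity values and estimates the true resistivity distribution of the heterogeneous subsurface.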
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, the decennial census is the official source of population totals for April 1st of each decennial year. In between censuses, the Census Bureau's Population Estimates Program produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns, and estimates of housing units and the group quarters population for states and counties.

Information about the American Community Survey (ACS) can be found on the ACS website. Supporting documentation including code lists, subject definitions, data accuracy, and statistical testing, and a full list of ACS tables and table shells (without estimates) can be found in the Technical Documentation section of the ACS website. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Source: U.S. Census Bureau, 2023 American Community Survey 1-Year Estimates.

ACS data generally reflect the geographic boundaries of legal and statistical areas as of January 1 of the estimate year. For more information, see Geography Boundaries by Year.

Users must consider potential differences in geographic boundaries, questionnaire content or coding, or other methodological issues when comparing ACS data from different years. Statistically significant differences shown in ACS Comparison Profiles, or in data users' own analysis, may be the result of these differences and thus might not necessarily reflect changes to the social, economic, housing, or demographic characteristics being compared. For more information, see Comparing ACS Data.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

Ancestry listed in this table refers to the total number of people who responded with a particular ancestry; for example, the estimate given for German represents the number of people who listed German as either their first or second ancestry. This table lists only the largest ancestry groups; see the Detailed Tables for more categories. Race and Hispanic origin groups are not included in this table because data for those groups come from the Race and Hispanic origin questions rather than the ancestry question (see Demographic Table).

Data for year of entry of the native population reflect the year of entry into the U.S. by people who were born in Puerto Rico or U.S. Island Areas or born outside the U.S. to a U.S. citizen parent and who subsequently moved to the U.S.

The category "with a broadband Internet subscription" refers to those who said "Yes" to at least one of the following types of Internet subscriptions: broadband such as cable, fiber optic, or DSL; a cellular data plan; satellite; a fixed wireless subscription; or other non-dial-up subscription types. An Internet "subscription" refers to a type of service that someone pays for to access the Internet, such as a cellular data plan, broadband such as cable, fiber optic or DSL, or another type of service. This will normally refer to a service that someone is billed for directly for Internet alone, or sometimes as part of a bundle.

"With a computer" includes those who said "Yes" to at least one of the following types of computers: desktop or laptop; smartphone; tablet or other portable wireless computer; or some other type of computer.

Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on 2020 Census data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

Explanation of symbols:
"-" The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
"N" The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
"(X)" The estimate or margin of error is not applicable or not available.
"median-" ...
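The margin-of-error interpretation above lends itself to a small worked example. The sketch below uses made-up numbers (the estimate and MOE are not from any real ACS table) to show the 90 percent confidence bounds, and derives the standard error by dividing the 90 percent MOE by 1.645, as the ACS technical documentation describes:

```python
# Illustrative arithmetic only; the estimate and margin of error are made up.
estimate = 12_345   # hypothetical ACS estimate
moe_90 = 678        # its published 90 percent margin of error

# Lower and upper bounds of the 90 percent confidence interval.
lower, upper = estimate - moe_90, estimate + moe_90

# The ACS documentation derives the standard error from the 90 percent MOE
# by dividing by 1.645 (the 90 percent z-value used by the ACS).
standard_error = moe_90 / 1.645

print(lower, upper, round(standard_error, 1))  # 11667 13023 412.2
```

The same standard error can then be used to build intervals at other confidence levels, per the ACS handbooks.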
In current application development scenarios spanning different environments, technologies and contexts, such as IoT, Blockchain, Machine Learning and Cloud Computing, domain-specific software development processes require tailored solutions. Properly defining software processes requires understanding the particularities of the teams and organization involved, as well as specialized technical knowledge in Software Engineering. Although software processes are an essential part of Software Engineering, many university curricula dedicate little effort to teaching them, focusing instead on basic principles such as requirements, architecture and programming languages. Another important aspect of software processes is modeling: a model of a software process provides a basis for managing, automating and supporting software process improvement. In this context, teaching software process modeling becomes challenging, mainly due to the great emphasis on theory and the scarcity of practice. This work presents an experience report on teaching the definition and modeling of software processes in different domains. In a software processes course, we applied a practice for defining and modeling processes in various application domains: IoT, cloud, mobile, critical systems, self-adaptive systems and games. The processes were modeled in the EPF Composer tool based on references from the literature for each domain. In the end, we evaluated the process modeling practice with the students, and concluded that the modeling tool and maturity in the domain are essential for good performance of the practice.
The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.
The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.
As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).
Sample survey data [ssd]
The samples for the Slovenia ES 2009, 2013, and 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for the Slovenia 2009 ES and the Slovenia 2013 ES, and in the Sampling Note for the 2019 Slovenia ES.
Three levels of stratification were used in this country: industry, establishment size, and region. The original sample designs with specific information on the industries and regions chosen are included in the attached Excel file (Sampling Report.xls) for the Slovenia 2009 ES. For the Slovenia 2013 and 2019 ES, specific information on the industries and regions chosen is described in "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.
For the Slovenia 2009 ES, industry stratification was designed as follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries, sample sizes were inflated by about 17% to account for potential non-response when requesting sensitive financial data and for likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals), sample sizes were inflated by about 12% to account for undersampling of firms in service industries.
For the Slovenia 2013 ES, industry stratification was designed as follows: the universe was stratified into one manufacturing industry and two service industries (retail, and other services).
Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).
For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.
For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.
For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.
Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).
Computer Assisted Personal Interview [capi]
Questionnaires have common questions (the core module) and, respectively, additional manufacturing- and services-specific questions. The eligible manufacturing industries were surveyed using the Manufacturing questionnaire (the core module plus manufacturing-specific questions). Retail firms were interviewed using the Services questionnaire (the core module plus retail-specific questions), and the residual eligible services were covered using the Services questionnaire (the core module only). Each variation of the questionnaire is identified by the index variable, a0.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.
For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.
For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates for Slovenia may be selection bias and not frame inaccuracy.
For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.
Finally, for 2019, the number of interviews per contacted establishment was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
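The three waves report their contact statistics in different forms (2009 as contacts per interview, 2013 and 2019 as interviews per contact). Purely as illustrative arithmetic, the figures can be converted to a common "interviews per contacted establishment" rate:

```python
# Convert the reported contact statistics to a common rate.
contacts_per_interview_2009 = 6.18   # 2009: contacted establishments per realized interview
rate_2009 = 1 / contacts_per_interview_2009  # invert to interviews per contact
rate_2013 = 0.25                     # 2013: reported directly as 25%
rate_2019 = 0.097                    # 2019: reported directly as 9.7%

print(f"2009: {rate_2009:.1%}, 2013: {rate_2013:.0%}, 2019: {rate_2019:.1%}")
# 2009: 16.2%, 2013: 25%, 2019: 9.7%
```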
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in-the-wild, large-scale physical activity patterns, sleep, stress, and overall health on the one hand, and behavioral patterns and psychological measurements on the other, due to challenges in collecting and releasing such datasets, including waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset containing a plethora of anthropological data, collected unobtrusively over more than 4 months by n=71 participants under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types, from second-level to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data openly available to empower future research. We envision that releasing this large-scale dataset of multi-modal, real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
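As a minimal sketch of that workflow (the file name and columns below are invented for illustration; an in-memory buffer stands in for a real file so the snippet runs as-is):

```python
import io
import pandas as pd

# Hypothetical sample standing in for one of the daily-granularity CSV files;
# pd.read_csv accepts a file path, e.g. pd.read_csv("daily_fitbit.csv"), the same way.
sample_csv = io.StringIO(
    "id,date,steps\n"
    "user01,2021-05-24,8432\n"
    "user01,2021-05-25,10110\n"
)

# parse_dates converts the date column to proper timestamps on load.
df = pd.read_csv(sample_csv, parse_dates=["date"])
print(df.shape)  # (2, 3)
```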
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend using the raw, complete data by importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:
{ _id: <ObjectId>, id (or user_id): <user-specific ID>, type: <data type>, data: { <type-dependent fields> } }
Each document consists of four fields: _id, id (also found as user_id in the sema and surveys collections), type, and data. The _id field is the MongoDB-defined primary key and can be ignored. The id field refers to a user-specific ID used to uniquely identify each user across all collections. The type field refers to the specific data type within the collection, e.g., steps, heart rate, calories, etc. The data field contains the actual information about the document, e.g., the step count for a specific timestamp for the steps type, in the form of an embedded object. The contents of the data object are type-dependent, meaning that the fields within the data object differ between different types of data. All times are stored in local time, and user IDs are common across the different collections. For more information on the available data types, see the related publication.
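The schema above can be illustrated with two made-up documents. The sketch below groups documents by their type field, the kind of post-processing one might do after fetching a collection (e.g., with pymongo's find); the values are invented and only the field layout follows the documented format:

```python
# Two made-up documents following the {_id, id, type, data} schema described above.
docs = [
    {"_id": "a1", "id": "user01", "type": "steps",
     "data": {"dateTime": "2021-05-24 00:00:00", "value": 8432}},
    {"_id": "a2", "id": "user01", "type": "heart_rate",
     "data": {"dateTime": "2021-05-24 00:00:00", "value": 64}},
]

# Group the embedded data objects by their type field.
by_type = {}
for doc in docs:
    by_type.setdefault(doc["type"], []).append(doc["data"])

print(sorted(by_type))  # ['heart_rate', 'steps']
```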
Surveys Encoding
BREQ2
Why do you engage in exercise?
Code | Text |
---|---|
engage[SQ001] | I exercise because other people say I should |
engage[SQ002] | I feel guilty when I don’t exercise |
engage[SQ003] | I value the benefits of exercise |
engage[SQ004] | I exercise because it’s fun |
engage[SQ005] | I don’t see why I should have to exercise |
engage[SQ006] | I take part in exercise because my friends/family/partner say I should |
engage[SQ007] | I feel ashamed when I miss an exercise session |
engage[SQ008] | It’s important to me to exercise regularly |
engage[SQ009] | I can’t see why I should bother exercising |
engage[SQ010] | I enjoy my exercise sessions |
engage[SQ011] | I exercise because others will not be pleased with me if I don’t |
engage[SQ012] | I don’t see the point in exercising |
engage[SQ013] | I feel like a failure when I haven’t exercised in a while |
engage[SQ014] | I think it is important to make the effort to exercise regularly |
engage[SQ015] | I find exercise a pleasurable activity |
engage[SQ016] | I feel under pressure from my friends/family to exercise |
engage[SQ017] | I get restless if I don’t exercise regularly |
engage[SQ018] | I get pleasure and satisfaction from participating in exercise |
engage[SQ019] | I think exercising is a waste of time |
PANAS
Indicate the extent you have felt this way over the past week
Code | Text |
---|---|
P1[SQ001] | Interested |
P1[SQ002] | Distressed |
P1[SQ003] | Excited |
P1[SQ004] | Upset |
P1[SQ005] | Strong |
P1[SQ006] | Guilty |
P1[SQ007] | Scared |
P1[SQ008] | Hostile |
P1[SQ009] | Enthusiastic |
P1[SQ010] | Proud |
P1[SQ011] | Irritable |
P1[SQ012] | Alert |
P1[SQ013] | Ashamed |
P1[SQ014] | Inspired |
P1[SQ015] | Nervous |
P1[SQ016] | Determined |
P1[SQ017] | Attentive |
P1[SQ018] | Jittery |
P1[SQ019] | Active |
P1[SQ020] | Afraid |
Personality
How Accurately Can You Describe Yourself?
Code | Text |
---|---|
ipip[SQ001] | Am the life of the party. |
ipip[SQ002] | Feel little concern for others. |
ipip[SQ003] | Am always prepared. |
ipip[SQ004] | Get stressed out easily. |
ipip[SQ005] | Have a rich vocabulary. |
ipip[SQ006] | Don't talk a lot. |
ipip[SQ007] | Am interested in people. |
ipip[SQ008] | Leave my belongings around. |
ipip[SQ009] | Am relaxed most of the time. |
ipip[SQ010] | Have difficulty understanding abstract ideas. |
ipip[SQ011] | Feel comfortable around people. |
ipip[SQ012] | Insult people. |
ipip[SQ013] | Pay attention to details. |
ipip[SQ014] | Worry about things. |
ipip[SQ015] | Have a vivid imagination. |
ipip[SQ016] | Keep in the background. |
ipip[SQ017] | Sympathize with others' feelings. |
ipip[SQ018] | Make a mess of things. |
ipip[SQ019] | Seldom feel blue. |
ipip[SQ020] | Am not interested in abstract ideas. |
ipip[SQ021] | Start conversations. |
ipip[SQ022] | Am not interested in other people's problems. |
ipip[SQ023] | Get chores done right away. |
ipip[SQ024] | Am easily disturbed. |
ipip[SQ025] | Have excellent ideas. |
ipip[SQ026] | Have little to say. |
ipip[SQ027] | Have a soft heart. |
ipip[SQ028] | Often forget to put things back in their proper place. |
ipip[SQ029] | Get upset easily. |
ipip[SQ030] | Do not have a good imagination. |
ipip[SQ031] | Talk to a lot of different people at parties. |
ipip[SQ032] | Am not really interested in others. |
ipip[SQ033] | Like order. |
ipip[SQ034] | Change my mood a lot. |
ipip[SQ035] | Am quick to understand things. |
ipip[SQ036] | Don't like to draw attention to myself. |
ipip[SQ037] | Take time out for others. |
ipip[SQ038] | Shirk my duties. |
ipip[SQ039] | Have frequent mood swings. |
ipip[SQ040] | Use difficult words. |
ipip[SQ041] | Don't mind being the centre of attention. |
ipip[SQ042] | Feel others' emotions. |
ipip[SQ043] | Follow a schedule. |
ipip[SQ044] | Get irritated easily. |
ipip[SQ045] | Spend time reflecting on things. |
ipip[SQ046] | Am quiet around strangers. |
ipip[SQ047] | Make people feel at ease. |
ipip[SQ048] | Am exacting in my work. |
ipip[SQ049] | Often feel blue. |
ipip[SQ050] | Am full of ideas. |
STAI
Indicate how you feel right now
Code | Text |
---|---|
STAI[SQ001] | I feel calm |
STAI[SQ002] | I feel secure |
STAI[SQ003] | I am tense |
STAI[SQ004] | I feel strained |
STAI[SQ005] | I feel at ease |
STAI[SQ006] | I feel upset |
STAI[SQ007] | I am presently worrying over possible misfortunes |
STAI[SQ008] | I feel satisfied |
STAI[SQ009] | I feel frightened |
STAI[SQ010] | I feel comfortable |
STAI[SQ011] | I feel self-confident |
STAI[SQ012] | I feel nervous |
STAI[SQ013] | I am jittery |
STAI[SQ014] | I feel indecisive |
STAI[SQ015] | I am relaxed |
STAI[SQ016] | I feel content |
STAI[SQ017] | I am worried |
STAI[SQ018] | I feel confused |
STAI[SQ019] | I feel steady |
STAI[SQ020] | I feel pleasant |
TTM
Do you engage in regular physical activity according to the definition above? How frequently did each event or experience occur in the past month?
Code | Text |
---|---|
processes[SQ002] | I read articles to learn more about physical activity |
XYZ Credit Card company regularly helps its merchants understand their data better and take key business decisions accurately by providing machine learning and analytics consulting. ABC is an established Brick & Mortar retailer that frequently conducts marketing campaigns for its diverse product range. As a merchant of XYZ, they have sought XYZ to assist them in their discount marketing process using the power of machine learning.
Discount marketing and coupon usage are very widely used promotional techniques to attract new customers and to retain & reinforce loyalty of existing customers. The measurement of a consumer’s propensity towards coupon usage and the prediction of the redemption behaviour are crucial parameters in assessing the effectiveness of a marketing campaign.
ABC promotions are shared across various channels including email, notifications, etc. A number of these campaigns include coupon discounts that are offered for a specific product/range of products. The retailer would like the ability to predict whether customers redeem the coupons received across channels, which will enable the retailer’s marketing team to accurately design coupon construct, and develop more precise and targeted marketing strategies.
The data available in this problem contains the following information, including the details of a sample of campaigns and coupons used in previous campaigns:
User Demographic Details
Campaign and coupon Details
Product details
Previous transactions
Based on previous transaction and performance data from the last 18 campaigns, predict, for each coupon and customer combination in the test set covering the next 10 campaigns, the probability that the customer will redeem the coupon.
Here is the schema for the different data tables available. The detailed data dictionary is provided next.
You are provided with the following files:
train.csv: Train data containing the coupons offered to the given customers under the 18 campaigns
Variable | Definition |
---|---|
id | Unique id for coupon customer impression |
campaign_id | Unique id for a discount campaign |
coupon_id | Unique id for a discount coupon |
customer_id | Unique id for a customer |
redemption_status | (target) (0 - Coupon not redeemed, 1 - Coupon redeemed) |
campaign_data.csv: Campaign information for each of the 28 campaigns
Variable | Definition |
---|---|
campaign_id | Unique id for a discount campaign |
campaign_type | Anonymised Campaign Type (X/Y) |
start_date | Campaign Start Date |
end_date | Campaign End Date |
coupon_item_mapping.csv: Mapping of coupon and items valid for discount under that coupon
Variable | Definition |
---|---|
coupon_id | Unique id for a discount coupon (no order) |
item_id | Unique id for items for which given coupon is valid (no order) |
customer_demographics.csv: Customer demographic information for some customers
Variable | Definition |
---|---|
customer_id | Unique id for a customer |
age_range | Age range of customer family in years |
marital_status | Married/Single |
rented | 0 - not rented accommodation, 1 - rented accommodation |
family_size | Number of family members |
no_of_children | Number of children in the family |
income_bracket | Label Encoded Income Bracket (Higher income corresponds to higher number) |
customer_transaction_data.csv: Transaction data for all customers for duration of campaigns in the train data
Variable | Definition |
---|---|
date | Date of Transaction |
customer_id | Unique id for a customer |
item_id | Unique id for item |
quantity | quantity of item bought |
selling_price | Sales value of the transaction |
other_discount | Discount from other sources such as manufacturer coupon/loyalty card |
coupon_discount | Discount availed from retailer coupon |
item_data.csv: Item information for each item sold by the retailer
Variable | Definition |
---|---|
item_id | Unique id for item |
brand | Unique id for item brand |
brand_type | Brand Type (local/Established) |
category | Item Category |
test.csv: Contains the coupon customer combination for which redemption status is to be predicted
Variable | Definition |
---|---|
id | Unique id for coupon customer impression |
campaign_id | Unique id for a discount campaign |
coupon_id | Unique id for a discount coupon |
customer_id | Unique id for a customer |
To summarise the entire process: using the transaction and performance data from the last 18 campaigns, build a model that predicts, for each coupon and customer combination in the test set, the probability that the customer will redeem the coupon across the next 10 campaigns.
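As a starting point, the files above can be combined into one modeling table by joining on their shared keys. The sketch below uses tiny synthetic DataFrames standing in for train.csv, campaign_data.csv, and customer_demographics.csv (column names follow the data dictionary; the values are made up); left joins keep every impression even when demographics are missing:

```python
import pandas as pd

# Synthetic miniatures of the real files; values are invented for illustration.
train = pd.DataFrame({
    "id": [1, 2],
    "campaign_id": [10, 11],
    "coupon_id": [100, 101],
    "customer_id": [1000, 1001],
    "redemption_status": [0, 1],
})
campaigns = pd.DataFrame({
    "campaign_id": [10, 11],
    "campaign_type": ["X", "Y"],
})
demographics = pd.DataFrame({
    "customer_id": [1000],   # demographics exist only for some customers
    "age_range": ["26-35"],
    "income_bracket": [5],
})

# Left joins preserve every coupon-customer impression from train.
model_table = (
    train
    .merge(campaigns, on="campaign_id", how="left")
    .merge(demographics, on="customer_id", how="left")
)
print(model_table.shape)  # (2, 8)
```

The same pattern extends to coupon_item_mapping.csv, item_data.csv, and aggregates of customer_transaction_data.csv before fitting a classifier.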
Data Virtualization Market size was valued at USD 4.05 Billion in 2023 and is projected to reach USD 15.55 Billion By 2031, growing at a CAGR of 20.20% during the forecast period 2024 to 2031.
Data Virtualization Market: Definition/Overview
Data virtualization is an advanced technique in data management that streamlines access to information from various sources, offering a seamless and unified view of data despite its diverse locations and formats. This approach acts as an intermediary layer, enabling users to interact with data as if it were consolidated in a single repository. By abstracting the underlying complexities of different data sources, data virtualization simplifies the user's experience and eliminates the necessity of understanding the specifics of each individual source.
One of the primary benefits of data virtualization is its ability to provide near real-time access to information. Unlike traditional data integration methods that rely on duplicating data, data virtualization allows users to retrieve and query data in its original location. This real-time capability ensures that users have the most current and accurate data available for decision-making.
Additionally, data virtualization can significantly enhance system performance. By optimizing queries and minimizing the movement of data across networks, it reduces the overhead typically associated with data integration processes. This efficiency not only accelerates data retrieval and processing times but also improves the overall responsiveness of the system. From a financial perspective, data virtualization offers substantial cost savings. It eliminates the need for complex and costly data integration projects that involve extensive data extraction, transformation, and loading (ETL) processes. By reducing the dependency on physical data consolidation, organizations can allocate resources more effectively and decrease the total cost of ownership for their data infrastructure.