100+ datasets found

f
Established databases included in DISCOVER CKD.
plos.figshare.com
xls
Updated Jun 16, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Supriya Kumar; Matthew Arnold; Glen James; Rema Padman (2023). Established databases included in DISCOVER CKD. [Dataset]. http://doi.org/10.1371/journal.pone.0274131.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0274131.t001
Dataset updated
Jun 16, 2023
Dataset provided by
PLOS ONE
Authors
Supriya Kumar; Matthew Arnold; Glen James; Rema Padman
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Established databases included in DISCOVER CKD.
B
Standardizing Definitions and Data Collection Approach for Black-owned...
borealisdata.ca
Updated Aug 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dozie Okoye; Emmanuel U. Nwugo (2025). Standardizing Definitions and Data Collection Approach for Black-owned Businesses and Black Entrepreneurs | Normalisation des définitions et de l'approche de la collecte de données pour les entreprises et les entrepreneurs noirs [Dataset]. http://doi.org/10.5683/SP3/OKGGWS
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/OKGGWS
Dataset updated
Aug 7, 2025
Dataset provided by
Borealis
Authors
Dozie Okoye; Emmanuel U. Nwugo
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Introduction Black entrepreneurship in Canada makes important contributions to the Canadian economy, from fostering innovation to creating employment and building generational wealth. At least 1.3% of Black adults in Canada are business owners, relative to the 2.3% that are business owners from the entire Canadian population (Business Development Bank of Canada (BDC), 2025). Similarly, Black people account for 2.4% of all business owners in the country even though they only represent 4.3% of the entire population. Women account for 33% of these Black businesses, as compared to their 20% share in the country’s total business ownership (Diversity Institute, 2024), highlighting the potential of Black entrepreneurship for economic empowerment. However, Black businesses are faced with some systemic barriers that impede their ability to thrive, including but not limited to underrepresentation among entrepreneurs in Canada and limited access to finance, restricted networking opportunities, and insufficient specialized support programs (Gueye et al., 2022; Gueye, 2023; Diversity Institute, 2024). These challenges stem in part from a lack of comprehensive and reliable data on Black businesses and the absence of standardized definitions for key concepts such as Black entrepreneurs, Black enterprises, and Black entrepreneurship. Without a clear understanding of who constitutes a Black entrepreneur and the scale of their contributions, policymakers and stakeholders struggle to provide the necessary support and resources to advance this community. In fact, the development of policies and initiatives for Black businesses faces difficulties because current data about Black entrepreneurship remains fragmented and inconsistent, with different sources reporting different numbers (Grekou et al., 2021; Gueye, 2023). These discrepancies highlight the urgent need for a unified approach to data collection and analysis, as accurate and comprehensive data are critical to understanding the size, scope, and needs of Black entrepreneurs, enabling targeted policy interventions and resource allocation. Current data fragmentation problems combined with non-standardized definitions create a situation where Black business owners are frequently ignored or inaccurately classified or omitted (Coletto et al., 2021). The significance of this research, therefore, lies in its ability to resolve systemic barriers through an improved representation of Black entrepreneurs. This research aims to harmonise missing data points and set specific criteria to establish sound tools for policymakers, researchers, and community groups who want to better assist Black entrepreneurs. With this, Black-owned business support will be strengthened through targeted policies and programs that develop sustainable growth for these businesses in Canada. The main objectives of this study are threefold. The research seeks to reconcile disparate Black entrepreneurship statistics from Afrobiz.ca alongside Canadian Black Chamber of Commerce records and Statistics Canada databases. Also, the research seeks to develop unified criteria to define Black business owners together with their enterprises to improve both data collection precision and reporting consistency. Lastly, the research will establish procedures to build a standardized database of Black entrepreneurs by integrating present data sources and making sure both formal and informal businesses receive proper representation. These research efforts will establish fundamental principles for developing an inclusive and equal entrepreneurial system throughout Canada. Introduction L'entrepreneuriat noir au Canada apporte d'importantes contributions à l'économie canadienne, qu'il s'agisse de favoriser l'innovation, de créer des emplois ou de constituer un patrimoine générationnel. Au moins 1,3 % des adultes noirs au Canada sont propriétaires d'une entreprise, contre 2,3 % pour l'ensemble de la population canadienne (Banque de développement du Canada (BDC), 2025). De même, les Noirs représentent 2,4 % de tous les propriétaires d'entreprise du pays, alors qu'ils ne représentent que 4,3 % de la population totale. Les femmes représentent 33 % de ces entreprises noires, alors qu'elles représentent 20 % de l'ensemble des entreprises du pays (Diversity Institute, 2024), ce qui souligne le potentiel de l'entrepreneuriat noir en matière d'émancipation économique. Toutefois, les entreprises noires sont confrontées à certains obstacles systémiques qui entravent leur capacité à prospérer, notamment la sousreprésentation des entrepreneurs au Canada et l'accès limité au financement, les possibilités de réseautage restreintes et l'insuffisance des programmes de soutien spécialisés (Gueye et al., 2022 ; Gueye, 2023 ; Diversity Institute, 2024). Ces défis découlent en partie d'un manque de données complètes et fiables sur les entreprises noires et de l'absence de définitions normalisées pour des concepts clés tels que les entrepreneurs...
OneNet Cross-Platform Services
data.europa.eu
zenodo.org
unknown
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zenodo, OneNet Cross-Platform Services [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-8329051?locale=en
Explore at:
unknown(199476)Available download formats
Dataset authored and provided by
Zenodohttp://zenodo.org/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The goal of the OneNet System is to facilitate data exchanges among existing platforms, services, applications, and devices by the power of interoperability techniques. To ensure that system requirements are technically -implementable and widely adopted, internationally standardized file formats, metadata, vocabularies and identifiers - are required. The OneNet “Cross-Platform Access” pattern is the fundamental characteristic of an interoperable ecosystem, leading to the definition of the exposed list OneNet Cross-Platform services (CPS). The pattern entails that an application accesses services or resources (information or functions) from multiple platforms through the same interface. For example, a “grid monitoring” application gathers information on different grid indicators provided by different platforms that conduct measurements or state estimations. The challenge of realizing this pattern lies in allowing applications or services within one platform to interact with other platforms (eventually from different providers) with relevant services or applications via the same interface and data formats. Thereby, reuse and composition of services as well as easy integration of data from different platforms are enabled. Based on the defined concept for CPS, an extensive analysis has been performed regarding data exchange patterns and roles involved for system use cases (SUCs) from other H2020 projects and the OneNet demo clusters. This has resulted into a first list of CPS, that has been thereafter taxonomized into 10 categories. The different entries have been defined providing a set of classes such as service description, indicative data producer/consumer etc. Each CPS can be assigned with multiple business objects describing the context of it. For a specific set of widely used by the Demo CPS, there have been formal semantic definitions provided in the "CrossPlatformServices-Semantic" excel worksheet.
f
Secondary data and baseline covariates of patients included in DISCOVER CKD....
plos.figshare.com
xls
Updated Jun 16, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Supriya Kumar; Matthew Arnold; Glen James; Rema Padman (2023). Secondary data and baseline covariates of patients included in DISCOVER CKD. [Dataset]. http://doi.org/10.1371/journal.pone.0274131.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0274131.t002
Dataset updated
Jun 16, 2023
Dataset provided by
PLOS ONE
Authors
Supriya Kumar; Matthew Arnold; Glen James; Rema Padman
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Secondary data and baseline covariates of patients included in DISCOVER CKD.
World Bank's Global Data🌎🌏🌍
kaggle.com
Updated Jan 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vijay Veer Singh (2025). World Bank's Global Data🌎🌏🌍 [Dataset]. https://www.kaggle.com/datasets/vijayveersingh/world-banks-global-indicator-data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 11, 2025
Dataset provided by
Kaggle
Authors
Vijay Veer Singh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description:

*The World Development Indicators (WDI) is a premier compilation of cross-country comparable data about development. It provides a broad range of economic, social, environmental, and governance indicators to support analysis and decision-making for development policies. The dataset includes indicators from different countries, spanning multiple decades, enabling researchers and policymakers to understand trends and progress in development goals such as poverty reduction, education, healthcare, and infrastructure.*

*The dataset is a collection of multiple CSV files providing information on global indicators, countries, and time-series data. It is structured as follows:*

1. series:
Contains metadata for various indicators, including their descriptions, definitions, and other relevant information. This file acts as a reference for understanding what each indicator represents.

2. country_series:
Establishes relationships between countries and specific indicators. It provides additional metadata, such as contextual descriptions of indicator usage for particular countries.

3. countries:
Includes detailed information about countries, such as country codes, region classifications, income levels, and other geographical or socio-economic attributes.

4. footnotes:
Provides supplementary notes and additional context for specific data points in the main dataset. These notes clarify exceptions, limitations, or other special considerations for particular entries.

5. main_data:
The core dataset containing the actual indicator values for countries across different years. This file forms the backbone of the dataset and is used for analysis.

6. series_time:
Contains time-related metadata for indicators, such as their start and end years or periods of data availability.

*This dataset is ideal for analyzing global development trends, comparing country-level statistics, and studying the relationships between different socio-economic indicators over time.*

Columns and Examples:

Series Code:

Description: Unique code identifying the data series.

Example: AG.LND.AGRI.K2 (Agricultural land, sq. km).

Topic:

Description: Category under which the indicator is classified.

Example: Environment: Land use.

Indicator Name:

Description: Full name describing what the indicator measures.

Example: Agricultural land (sq. km).

Short Definition:

Description: A brief explanation of the indicator (if available).

Example: Not applicable for all indicators.

Long Definition:

Description: Detailed explanation of the indicator’s meaning and methodology.

Example: "Agricultural land refers to the share of land area that is arable, under permanent crops, or under permanent pastures."

Unit of Measure:

Description: Unit in which the data is expressed.

Example: Square kilometers.

Periodicity:

Description: How frequently the data is collected or reported.

Example: Annual.

Base Period:

Description: The reference period used for comparison, if applicable.

Example: Often not specified.

Other Notes:

Description: Additional context or remarks about the data.

Example: "Data for former states are included in successor states."

Aggregation Method:

Description: Method used to combine data for groups (e.g., regions).

Example: Weighted average.

Limitations and Exceptions:

Description: Constraints or exceptions in the data.

Example: "Data may not be directly comparable across countries due to different definitions."

Notes from Original Source:

Description: Remarks provided by the data source.

Example: Not specified for all indicators.

General Comments:

Description: Broad remarks about the dataset or indicator.

Example: Not available in all cases.

Source:

Description: Organization providing the data.

Example: Food and Agriculture Organization.

Statistical Concept and Methodology:

Description: Explanation of how the data was generated.

Example: "Agricultural land is calculated based on land area classified as arable."

Development Relevance:

Description: Importance of the indicator for development.

Example: "Agricultural land availability impacts food security and rural livelihoods."

Related Source Links:

Description: URLs to related information sources (if any).

Example: Not specified.

Other Web Links:

Description: Additional web resources.

Example: Not specified.

Related Indicators:

Description: Indicators conceptually related...
Z
Data from: NetFlow data collected with different packet sampling rates
data.niaid.nih.gov
portalcientifico.unileon.es
+2more
Updated Feb 24, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Adrián Campazas (2022). NetFlow data collected with different packet sampling rates [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6243335
Explore at:
Dataset updated
Feb 24, 2022
Dataset provided by
Ignacio Crespo
Adrián Campazas
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
NetFlow traffic generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic) NetFlow is a network protocol developed by Cisco for the collection and monitoring of network traffic flow data generated. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

NetFlow flows have been captured with different sampling at the packet level. A sampling means that 1 out of every X packets is selected to be flow while the rest of the packets are not valued.

The version of NetFlow used to build the datasets is 5.
p
MIMIC-IV
physionet.org
Updated Oct 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
Explore at:
Unique identifier
https://doi.org/10.13026/kpb9-mt58
Dataset updated
Oct 11, 2024
Authors
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
License
https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
ERA5 hourly data on single levels from 1940 to present
cds.climate.copernicus.eu
grib
Updated Aug 11, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ECMWF (2025). ERA5 hourly data on single levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.adbb2d47
Explore at:
gribAvailable download formats
Unique identifier
https://doi.org/10.24381/cds.adbb2d47
Dataset updated
Aug 11, 2025
Dataset provided by
European Centre for Medium-Range Weather Forecastshttp://ecmwf.int/
Authors
ECMWF
License
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdfhttps://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf
Time period covered
Jan 1, 1940 - Aug 5, 2025
Description
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days. In case that serious flaws are detected in this early release (called ERA5T), this data could be different from the final release 2 to 3 months later. In case that this occurs users are notified. The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main sub sets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land surface quantities). The present entry is "ERA5 hourly data on single levels from 1940 to present".
E
New Oxford Dictionary of English, 2nd Edition
live.european-language-grid.eu
catalog.elra.info
Updated Dec 6, 2005
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2005). New Oxford Dictionary of English, 2nd Edition [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/2276
Explore at:
Dataset updated
Dec 6, 2005
License
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdfhttp://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
This is Oxford University Press's most comprehensive single-volume dictionary, with 170,000 entries covering all varieties of English worldwide. The NODE data set constitutes a fully integrated range of formal data types suitable for language engineering and NLP applications: It is available in XML or SGML. - Source dictionary data. The NODE data set includes all the information present in the New Oxford Dictionary of English itself, such as definition text, example sentences, grammatical indicators, and encyclopaedic material. - Morphological data. Each NODE lemma (both headwords and subentries) has a full listing of all possible syntactic forms (e.g. plurals for nouns, inflections for verbs, comparatives and superlatives for adjectives), tagged to show their syntactic relationships. Each form has an IPA pronunciation. Full morphological data is also given for spelling variants (e.g. typical American variants), and a system of links enables straightforward correlation of variant forms to standard forms. The data set thus provides robust support for all look-up routines, and is equally viable for applications dealing with American and British English. - Phrases and idioms. The NODE data set provides a rich and flexible codification of over 10,000 phrasal verbs and other multi-word phrases. It features comprehensive lexical resources enabling applications to identify a phrase not only in the form listed in the dictionary but also in a range of real-world variations, including alternative wording, variable syntactic patterns, inflected verbs, optional determiners, etc. - Subject classification. Using a categorization scheme of 200 key domains, over 80,000 words and senses have been associated with particular subject areas, from aeronautics to zoology. As well as facilitating the extraction of subject-specific sub-lexicons, this also provides an extensive resource for document categorization and information retrieval. - Semantic relationships. The relationships between every noun and noun sense in the dictionary are being codified using an extensive semantic taxonomy on the model of the Princeton WordNet project. (Mapping to WordNet 1.7 is supported.) This structure allows elements of the basic lexical database to function as a formal knowledge database, enabling functionality such as sense disambiguation and logical inference. - Derived from the detailed and authoritative corpus-based research of Oxford University Press's lexicographic team, the NODE data set is a powerful asset for any task dealing with real-world contemporary English usage. By integrating a number of different data types into a single structure, it creates a coherent resource which can be queried along numerous axes, allowing open-ended exploitation by many kinds of language-related applications.
d
The definition data service for traffic congestion levels in Taoyuan City
data.gov.tw
xml
Updated Feb 25, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Department of Transportation, Taoyuan (2022). The definition data service for traffic congestion levels in Taoyuan City [Dataset]. https://data.gov.tw/en/datasets/149613
Explore at:
xmlAvailable download formats
Dataset updated
Feb 25, 2022
Dataset authored and provided by
Department of Transportation, Taoyuan
License
https://data.gov.tw/licensehttps://data.gov.tw/license
Area covered
Taoyuan
Description
For the real-time traffic information of Taoyuan City, explain the basic data definition of congestion levels. According to different road characteristics, different groups are defined for congestion levels, and each group is further subdivided to describe different congestion levels.
T
Pedestrian Stops - Contact Cards
data.cincinnati-oh.gov
application/rdfxml +5
Updated Mar 31, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Pedestrian Stops - Contact Cards [Dataset]. https://data.cincinnati-oh.gov/Safety/Pedestrian-Stops-Contact-Cards/swrz-ak2i
Explore at:
application/rssxml, csv, json, xml, application/rdfxml, tsvAvailable download formats
Dataset updated
Mar 31, 2025
Description
Data Description: This dataset captures all Cincinnati Police Department stops of pedestrians. This data includes time of incident, officer assignment, race/sex of stop subject, and outcome of the stop ("Action taken").. Individual pedestrian stops may populate multiple data rows to account for multiple outcomes: "Instance_ID" is the unique identifier for every one (1) pedestrian stop.

NOTE: CPD transitioned to a new Record Management System on 6/3/2024. The data before this date may have a different structure than the data after this date.

Data Creation: This data is created when CPD completes a pedestrian stop and logs the interview via Contact Cards. Contact Cards are a result of the Collaborative Agreement. Contact Cards are manually entered and may experience lags in data entry.

Data Created by: This data is created by the Cincinnati Police Department.

Refresh Frequency: This data is updated daily.

CincyInsights: The City of Cincinnati maintains an interactive dashboard portal, CincyInsights in addition to our Open Data in an effort to increase access and usage of city data. This data set has an associated dashboard available here: https://insights.cincinnati-oh.gov/stories/s/gw5q-kjng

Data Dictionary: A data dictionary providing definitions of columns and attributes is available as an attachment to this dataset.

Processing: The City of Cincinnati is committed to providing the most granular and accurate data possible. In that pursuit the Office of Performance and Data Analytics facilitates standard processing to most raw data prior to publication. Processing includes but is not limited: address verification, geocoding, decoding attributes, and addition of administrative areas (i.e. Census, neighborhoods, police districts, etc.).

Data Usage: For directions on downloading and using open data please visit our How-to Guide: https://data.cincinnati-oh.gov/dataset/Open-Data-How-To-Guide/gdr9-g3ad

Disclaimer: In compliance with privacy laws, all Public Safety datasets are anonymized and appropriately redacted prior to publication on the City of Cincinnati’s Open Data Portal. This means that for all public safety datasets: (1) the last two digits of all addresses have been replaced with “XX,” and in cases where there is a single digit street address, the entire address number is replaced with "X"; and (2) Latitude and Longitude have been randomly skewed to represent values within the same block area (but not the exact location) of the incident.
f
fdata-02-00040_Challenges and Legal Gaps of Genetic Profiling in the Era of...
frontiersin.figshare.com
bin
Updated Jun 3, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Murat Sariyar; Irene Schlünder (2023). fdata-02-00040_Challenges and Legal Gaps of Genetic Profiling in the Era of Big Data.xml [Dataset]. http://doi.org/10.3389/fdata.2019.00040.s002
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.3389/fdata.2019.00040.s002
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers
Authors
Murat Sariyar; Irene Schlünder
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Profiling of individuals based on inborn, acquired, and assigned characteristics is central for decision making in health care. In the era of omics and big smart data, it becomes urgent to differentiate between different data governance affordances for different profiling activities. Typically, diagnostic profiling is in the focus of researchers and physicians, and other types are regarded as undesired side-effects; for example, in the connection of health care insurance risk calculations. Profiling in a legal sense is addressed, for example, by the EU data protection law. It is defined in the General Data Protection Regulation as automated decision making. This term does not correspond fully with profiling in biomedical research and healthcare, and the impact on privacy has hardly ever been examined. But profiling is also an issue concerning the fundamental right of non-discrimination, whenever profiles are used in a way that has a discriminatory effect on individuals. Here, we will focus on genetic profiling, define related notions as legal and subject-matter definitions frequently differ, and discuss the ethical and legal challenges.
Z
Data from: Modeling software processes from different domains using SPEM and...
data.niaid.nih.gov
Updated Jul 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Carla Bezerra (2024). Modeling software processes from different domains using SPEM and BPMN notations: An experience report of teaching software processes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7674964
Explore at:
Dataset updated
Jul 12, 2024
Dataset provided by
Carla Bezerra
Emanuel Coutinho
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In a current application development scenario in different environments, technologies and contexts, such as IoT, Blockchain, Machine Learning and Cloud Computing, there is a need for particular solutions for domain-specific software development processes. The proper definition of software processes requires understanding the involved teams and organization’s particularities and specialized technical knowledge in Software Engineering. Although it is an essential part of Software Engineering, many university curricula do not dedicate as much effort to teach software processes, focusing more on the basic principles of Software Engineering, such as requirements, architecture and programming languages. Another important aspect of software processes is modeling. The modeling of a software process provides a basis for managing, automating and supporting the software process improvement. In this context, teaching software process modeling becomes challenging, mainly due to the great emphasis on theory and few practices. This work presents an experience report teaching the definition and modeling of software processes in different domains. We applied in the discipline of software processes a practice for defining and modeling processes in various application domains, such as: IoT, cloud, mobile, critical systems, self-adaptive systems, machine learning, blockchain and games. The processes were modeled in the Software & Systems Process Engineering Metamodel (SPEM) and Business Process Model and Notation (BPMN) notations based on references from the literature for each domain. We evaluated the process modeling practice with the SPEM and BPMN in 3 classes of the software processes discipline and compared the use of the two notations applied to the different domains. We concluded that the modeling tool and the maturity in the domain are essential for the excellent performance of the process.
d
Geophysical surveys and geospatial data for Bob Kidd Lake, Washington...
catalog.data.gov
data.usgs.gov
+3more
Updated Jul 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). Geophysical surveys and geospatial data for Bob Kidd Lake, Washington County, Arkansas [Dataset]. https://catalog.data.gov/dataset/geophysical-surveys-and-geospatial-data-for-bob-kidd-lake-washington-county-arkansas
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Area covered
Washington County, Arkansas, Bob Kidd Lake
Description
This data release consists of three different types of data: including direct current (DC) resistivity profiles, frequency domain electromagnetic (FDEM) survey data, and global navigation satellite system (GNSS) coordinate data of the geophysical measurement locations. A data dictionary is included along with the data and defines all of the table headings, definitions, and units. Earthen dams are common on lakes and ponds, but characteristics of these structures such as construction history, composition, and integrity are often unknown for older dams. Geophysical surveying techniques provide a non-invasive method of mapping their lithology and structure. In particular, DC resistivity and FDEM methods can, when properly processed, provide the information necessary to construct a lithologic model of an earthen dam without having to trench or core through the shell of the dam itself. In September, 2016 the U.S. Geological Survey (USGS) conducted geophysical surveys at Bob Kidd Lake, an 81-hectare lake, in northwestern Arkansas to help determine the composition of the earthen dam and guide any potential geotechnical investigations. A series of DC resistivity surveys were conducted along, parallel, and perpendicular to the axis of the crest of the dam to identify the soil-bedrock interface and any variations in the composition of the earthen dam. A dense survey using a multi-frequency electromagnetic sensor was used to map the shallow materials comprising the dam at a higher resolution. Resistivity measurements were made by transmitting a known current through two electrodes (transmitter) and measuring the voltage potential across two other electrodes (receiver). The multiple channels on the resistivity meter allow for voltage measurements to be made at 10 receivers simultaneously following a current injection. The configuration of the transmitter relative to the receiver(s) is referred to as an array. For these surveys, a Reciprocal Schlumberger array was used, which positions the transmitting pair of electrodes toward the center of the array and the receiving pairs extending away from the transmitter (Loke, 2000; Zonge and others, 2005). The electrical resistance was calculated by dividing the measured voltage by the applied current. The apparent resistivity was determined by multiplying the electrical resistance by a geometric factor. Apparent resistivity is not the true resistivity, but rather a volume-averaged estimate of the true resistivity distribution, because a homogeneous, isotropic subsurface is assumed. To estimate the true resistivity of the heterogeneous and/or anisotropic subsurface, the apparent resistivity data were processed using an inverse modeling software program. The FDEM method complements the two-dimensional (2-D) DC resistivity method and was used to extend the depth of subsurface characterization obtained with resistivity profiles. The FDEM method uses multiple current frequencies to measure bulk electric conductivity values (the inverse of resistivity values) of the earth at different depths (Lucius and others, 2007). For this project FDEM data were collected with a GEM-2, a broadband, multifrequency, fixed-coil electromagnetic induction unit (Geophex, 2015). In addition to the geophysical surveys a concurrent Global Navigation Satellite System (GNSS) survey was conducted using a Real Time Kinematic system (RTK). All electrode locations on the DC resistivity profiles, all measurement locations in the FDEM survey, as well as a point-cloud survey were collected and are included in the dataset. These data were used to geo-reference the geophysical data and may be used to create a Digital Elevation Model (DEM) of the dam surface.
2023 American Community Survey: DP02 | Selected Social Characteristics in...
data.census.gov
test.data.census.gov
Updated Oct 6, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ACS (2022). 2023 American Community Survey: DP02 | Selected Social Characteristics in the United States (ACS 1-Year Estimates Data Profiles) [Dataset]. https://data.census.gov/cedsci/table?q=DP02
Explore at:
Dataset updated
Oct 6, 2022
Dataset provided by
United States Census Bureauhttp://census.gov/
Authors
ACS
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
2023
Area covered
United States
Description
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, the decennial census is the official source of population totals for April 1st of each decennial year. In between censuses, the Census Bureau's Population Estimates Program produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units and the group quarters population for states and counties..Information about the American Community Survey (ACS) can be found on the ACS website. Supporting documentation including code lists, subject definitions, data accuracy, and statistical testing, and a full list of ACS tables and table shells (without estimates) can be found on the Technical Documentation section of the ACS website.Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section..Source: U.S. Census Bureau, 2023 American Community Survey 1-Year Estimates.ACS data generally reflect the geographic boundaries of legal and statistical areas as of January 1 of the estimate year. For more information, see Geography Boundaries by Year..Users must consider potential differences in geographic boundaries, questionnaire content or coding, or other methodological issues when comparing ACS data from different years. Statistically significant differences shown in ACS Comparison Profiles, or in data users' own analysis, may be the result of these differences and thus might not necessarily reflect changes to the social, economic, housing, or demographic characteristics being compared. For more information, see Comparing ACS Data..Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables..Ancestry listed in this table refers to the total number of people who responded with a particular ancestry; for example, the estimate given for German represents the number of people who listed German as either their first or second ancestry. This table lists only the largest ancestry groups; see the Detailed Tables for more categories. Race and Hispanic origin groups are not included in this table because data for those groups come from the Race and Hispanic origin questions rather than the ancestry question (see Demographic Table)..Data for year of entry of the native population reflect the year of entry into the U.S. by people who were born in Puerto Rico or U.S. Island Areas or born outside the U.S. to a U.S. citizen parent and who subsequently moved to the U.S..The category "with a broadband Internet subscription" refers to those who said "Yes" to at least one of the following types of Internet subscriptions: Broadband such as cable, fiber optic, or DSL; a cellular data plan; satellite; a fixed wireless subscription; or other non-dial up subscription types..An Internet "subscription" refers to a type of service that someone pays for to access the Internet such as a cellular data plan, broadband such as cable, fiber optic or DSL, or other type of service. This will normally refer to a service that someone is billed for directly for Internet alone or sometimes as part of a bundle.."With a computer" includes those who said "Yes" to at least one of the following types of computers: Desktop or laptop; smartphone; tablet or other portable wireless computer; or some other type of computer..Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on 2020 Census data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization..Explanation of Symbols:- The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.N The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area. (X) The estimate or margin of error is not applicable or not available.median- ...
Data from: Teaching software processes from different application domains
zenodo.org
bin, pdf
Updated Jul 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Carla Bezerra; Carla Bezerra; Emanuel Coutinho; Emanuel Coutinho (2024). Teaching software processes from different application domains [Dataset]. http://doi.org/10.5281/zenodo.7068357
Explore at:
bin, pdfAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7068357
Dataset updated
Jul 16, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Carla Bezerra; Carla Bezerra; Emanuel Coutinho; Emanuel Coutinho
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In a current application development scenario in different environments, technologies and contexts, such as IoT, Blockchain, Machine Learning and Cloud Computing, there is a need for particular solutions for domain-specific software development processes. The proper definition of software processes requires understanding the involved teams and organization’s particularities and specialized technical knowledge in Software Engineering. Although it is an essential part of Software Engineering, many university curricula do not dedicate as much effort to teaching software processes, focusing more on the basic principles of Software Engineering, such as requirements, architecture and programming languages. Another important aspect of software processes is modeling. The modeling of a software process provides a basis for managing, automating and supporting the software processes improvement. In this context, teaching software processes modeling becomes challenging, mainly due to the great emphasis on theory and few practices. This work presents an experience report teaching the definition and modeling of software processes in different domains. We apply in the discipline of software processes a practice for defining and modeling processes in various application domains, such as: IoT, cloud, mobile, critical systems, self-adaptive systems and games. The processes were modeled in the EPF composer tool based on references from the literature for each domain. In the end, we evaluated the process modeling practice with the students. We concluded that the modeling tool and the maturity in the domain are essential for the good performance of the process.
Enterprise Survey 2009-2019, Panel Data - Slovenia
microdata.worldbank.org
catalog.ihsn.org
Updated Aug 6, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
World Bank Group (WBG) (2020). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://microdata.worldbank.org/index.php/catalog/3762
Explore at:
Dataset updated
Aug 6, 2020
Dataset provided by
World Bankhttp://worldbank.org/
European Investment Bankhttp://eib.org/
European Bank for Reconstruction and Developmenthttp://ebrd.com/
Time period covered
2008 - 2019
Area covered
Slovenia
Description
Abstract

The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

Geographic coverage

National

Analysis unit

The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

Universe

As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

Kind of data

Sample survey data [ssd]

Sampling procedure

The sample for Slovenia ES 2009, 2013, 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for Slovenia 2009 ES and for Slovenia 2013 ES, and in the Sampling Note for 2019 Slovenia ES.

Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information of the industries and regions chosen are included in the attached Excel file (Sampling Report.xls.) for Slovenia 2009 ES. For Slovenia 2013 and 2019 ES, specific information of the industries and regions chosen is described in the "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.

For the Slovenia 2009 ES, industry stratification was designed in the way that follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries sample sizes were inflated by about 17% to account for potential non-response cases when requesting sensitive financial data and also because of likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals) sample sizes were inflated by about 12% to account for under sampling in firms in service industries.

For Slovenia 2013 ES, industry stratification was designed in the way that follows: the universe was stratified into one manufacturing industry, and two service industries (retail, and other services).

Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).

For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.

For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.

Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).

Mode of data collection

Computer Assisted Personal Interview [capi]

Research instrument

Questionnaires have common questions (core module) and respectfully additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.

Response rate

Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.

For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.

For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in the Slovenia may be selection bias and not frame inaccuracy.

For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.

Finally, for 2019, the number of interviews per contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.

Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

data.niaid.nih.gov

Updated Oct 20, 2022

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

Yfantidou, Sofia (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6826682

Explore at:

Dataset updated

Oct 20, 2022

Dataset provided by

Ferrari, Elena
Marchioro, Thomas
Giakatos, Dimitrios Panteleimon
Yfantidou, Sofia
Karagianni, Christina
Kazlouski, Andrei
Palotti, Joao
Girdzijauskas, Šarūnas
Efstathiou, Stefanos
Vakali, Athena

Description

LifeSnaps Dataset Documentation

Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

Data Import: Reading CSV

For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.

Data Import: Setting up a MongoDB (Recommended)

To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have MongoDB Database Tools installed from here.

For the Fitbit data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c fitbit

For the SEMA data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c sema

For surveys data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c surveys

If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.

Data Availability

The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

{ _id: id (or user_id): type: data: }

Each document consists of four fields: id (also found as user_id in sema and survey collections), type, and data. The _id field is the MongoDB-defined primary key and can be ignored. The id field refers to a user-specific ID used to uniquely identify each user across all collections. The type field refers to the specific data type within the collection, e.g., steps, heart rate, calories, etc. The data field contains the actual information about the document e.g., steps count for a specific timestamp for the steps type, in the form of an embedded object. The contents of the data object are type-dependent, meaning that the fields within the data object are different between different types of data. As mentioned previously, all times are stored in local time, and user IDs are common across different collections. For more information on the available data types, see the related publication.

Surveys Encoding

BREQ2

Why do you engage in exercise?

    Code
    Text


    engage[SQ001]
    I exercise because other people say I should


    engage[SQ002]
    I feel guilty when I don’t exercise


    engage[SQ003]
    I value the benefits of exercise


    engage[SQ004]
    I exercise because it’s fun


    engage[SQ005]
    I don’t see why I should have to exercise


    engage[SQ006]
    I take part in exercise because my friends/family/partner say I should


    engage[SQ007]
    I feel ashamed when I miss an exercise session


    engage[SQ008]
    It’s important to me to exercise regularly


    engage[SQ009]
    I can’t see why I should bother exercising


    engage[SQ010]
    I enjoy my exercise sessions


    engage[SQ011]
    I exercise because others will not be pleased with me if I don’t


    engage[SQ012]
    I don’t see the point in exercising


    engage[SQ013]
    I feel like a failure when I haven’t exercised in a while


    engage[SQ014]
    I think it is important to make the effort to exercise regularly


    engage[SQ015]
    I find exercise a pleasurable activity


    engage[SQ016]
    I feel under pressure from my friends/family to exercise


    engage[SQ017]
    I get restless if I don’t exercise regularly


    engage[SQ018]
    I get pleasure and satisfaction from participating in exercise


    engage[SQ019]
    I think exercising is a waste of time

PANAS

Indicate the extent you have felt this way over the past week

    P1[SQ001]
    Interested


    P1[SQ002]
    Distressed


    P1[SQ003]
    Excited


    P1[SQ004]
    Upset


    P1[SQ005]
    Strong


    P1[SQ006]
    Guilty


    P1[SQ007]
    Scared


    P1[SQ008]
    Hostile


    P1[SQ009]
    Enthusiastic


    P1[SQ010]
    Proud


    P1[SQ011]
    Irritable


    P1[SQ012]
    Alert


    P1[SQ013]
    Ashamed


    P1[SQ014]
    Inspired


    P1[SQ015]
    Nervous


    P1[SQ016]
    Determined


    P1[SQ017]
    Attentive


    P1[SQ018]
    Jittery


    P1[SQ019]
    Active


    P1[SQ020]
    Afraid

Personality

How Accurately Can You Describe Yourself?

    Code
    Text


    ipip[SQ001]
    Am the life of the party.


    ipip[SQ002]
    Feel little concern for others.


    ipip[SQ003]
    Am always prepared.


    ipip[SQ004]
    Get stressed out easily.


    ipip[SQ005]
    Have a rich vocabulary.


    ipip[SQ006]
    Don't talk a lot.


    ipip[SQ007]
    Am interested in people.


    ipip[SQ008]
    Leave my belongings around.


    ipip[SQ009]
    Am relaxed most of the time.


    ipip[SQ010]
    Have difficulty understanding abstract ideas.


    ipip[SQ011]
    Feel comfortable around people.


    ipip[SQ012]
    Insult people.


    ipip[SQ013]
    Pay attention to details.


    ipip[SQ014]
    Worry about things.


    ipip[SQ015]
    Have a vivid imagination.


    ipip[SQ016]
    Keep in the background.


    ipip[SQ017]
    Sympathize with others' feelings.


    ipip[SQ018]
    Make a mess of things.


    ipip[SQ019]
    Seldom feel blue.


    ipip[SQ020]
    Am not interested in abstract ideas.


    ipip[SQ021]
    Start conversations.


    ipip[SQ022]
    Am not interested in other people's problems.


    ipip[SQ023]
    Get chores done right away.


    ipip[SQ024]
    Am easily disturbed.


    ipip[SQ025]
    Have excellent ideas.


    ipip[SQ026]
    Have little to say.


    ipip[SQ027]
    Have a soft heart.


    ipip[SQ028]
    Often forget to put things back in their proper place.


    ipip[SQ029]
    Get upset easily.


    ipip[SQ030]
    Do not have a good imagination.


    ipip[SQ031]
    Talk to a lot of different people at parties.


    ipip[SQ032]
    Am not really interested in others.


    ipip[SQ033]
    Like order.


    ipip[SQ034]
    Change my mood a lot.


    ipip[SQ035]
    Am quick to understand things.


    ipip[SQ036]
    Don't like to draw attention to myself.


    ipip[SQ037]
    Take time out for others.


    ipip[SQ038]
    Shirk my duties.


    ipip[SQ039]
    Have frequent mood swings.


    ipip[SQ040]
    Use difficult words.


    ipip[SQ041]
    Don't mind being the centre of attention.


    ipip[SQ042]
    Feel others' emotions.


    ipip[SQ043]
    Follow a schedule.


    ipip[SQ044]
    Get irritated easily.


    ipip[SQ045]
    Spend time reflecting on things.


    ipip[SQ046]
    Am quiet around strangers.


    ipip[SQ047]
    Make people feel at ease.


    ipip[SQ048]
    Am exacting in my work.


    ipip[SQ049]
    Often feel blue.


    ipip[SQ050]
    Am full of ideas.

STAI

Indicate how you feel right now

    Code
    Text


    STAI[SQ001]
    I feel calm


    STAI[SQ002]
    I feel secure


    STAI[SQ003]
    I am tense


    STAI[SQ004]
    I feel strained


    STAI[SQ005]
    I feel at ease


    STAI[SQ006]
    I feel upset


    STAI[SQ007]
    I am presently worrying over possible misfortunes


    STAI[SQ008]
    I feel satisfied


    STAI[SQ009]
    I feel frightened


    STAI[SQ010]
    I feel comfortable


    STAI[SQ011]
    I feel self-confident


    STAI[SQ012]
    I feel nervous


    STAI[SQ013]
    I am jittery


    STAI[SQ014]
    I feel indecisive


    STAI[SQ015]
    I am relaxed


    STAI[SQ016]
    I feel content


    STAI[SQ017]
    I am worried


    STAI[SQ018]
    I feel confused


    STAI[SQ019]
    I feel steady


    STAI[SQ020]
    I feel pleasant

TTM

Do you engage in regular physical activity according to the definition above? How frequently did each event or experience occur in the past month?

    Code
    Text


    processes[SQ002]
    I read articles to learn more about physical

Predicting Coupon Redemption_Feature Selection

kaggle.com

zip

Updated Nov 17, 2019

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

vasudeva (2019). Predicting Coupon Redemption_Feature Selection [Dataset]. https://www.kaggle.com/vasudeva009/predicting-coupon-redemption-feature-selection

Explore at:

zip(65337333 bytes)Available download formats

Dataset updated

Nov 17, 2019

Authors

vasudeva

Description

Problem Statement

Predicting Coupon Redemption

XYZ Credit Card company regularly helps its merchants understand their data better and take key business decisions accurately by providing machine learning and analytics consulting. ABC is an established Brick & Mortar retailer that frequently conducts marketing campaigns for its diverse product range. As a merchant of XYZ, they have sought XYZ to assist them in their discount marketing process using the power of machine learning.

Discount marketing and coupon usage are very widely used promotional techniques to attract new customers and to retain & reinforce loyalty of existing customers. The measurement of a consumer’s propensity towards coupon usage and the prediction of the redemption behaviour are crucial parameters in assessing the effectiveness of a marketing campaign.

ABC promotions are shared across various channels including email, notifications, etc. A number of these campaigns include coupon discounts that are offered for a specific product/range of products. The retailer would like the ability to predict whether customers redeem the coupons received across channels, which will enable the retailer’s marketing team to accurately design coupon construct, and develop more precise and targeted marketing strategies.

The data available in this problem contains the following information, including the details of a sample of campaigns and coupons used in previous campaigns -

User Demographic Details

Campaign and coupon Details

Product details

Previous transactions

Based on previous transaction & performance data from the last 18 campaigns, predict the probability for the next 10 campaigns in the test set for each coupon and customer combination, whether the customer will redeem the coupon or not?

Dataset Description

Here is the schema for the different data tables available. The detailed data dictionary is provided next.

You are provided with the following files:

train.csv: Train data containing the coupons offered to the given customers under the 18 campaigns

Variable	Definition
id	Unique id for coupon customer impression
campaign_id	Unique id for a discount campaign
coupon_id	Unique id for a discount coupon
customer_id	Unique id for a customer
redemption_status	(target) (0 - Coupon not redeemed, 1 - Coupon redeemed)

campaign_data.csv: Campaign information for each of the 28 campaigns

Variable	Definition
campaign_id	Unique id for a discount campaign
campaign_type	Anonymised Campaign Type (X/Y)
start_date	Campaign Start Date
end_date	Campaign End Date

coupon_item_mapping.csv: Mapping of coupon and items valid for discount under that coupon

Variable	Definition
coupon_id	Unique id for a discount coupon (no order)
item_id	Unique id for items for which given coupon is valid (no order)

customer_demographics.csv: Customer demographic information for some customers

Variable	Definition
customer_id	Unique id for a customer
age_range	Age range of customer family in years
marital_status	Married/Single
rented	0 - not rented accommodation, 1 - rented accommodation
family_size	Number of family members
no_of_children	Number of children in the family
income_bracket	Label Encoded Income Bracket (Higher income corresponds to higher number)

customer_transaction_data.csv: Transaction data for all customers for duration of campaigns in the train data

Variable	Definition
date	Date of Transaction
customer_id	Unique id for a customer
item_id	Unique id for item
quantity	quantity of item bought
selling_price	Sales value of the transaction
other_discount	Discount from other sources such as manufacturer coupon/loyalty card
coupon_discount	Discount availed from retailer coupon

item_data.csv: Item information for each item sold by the retailer

Variable	Definition
item_id	Unique id for itemv
brand	Unique id for item brand
brand_type	Brand Type (local/Established)
category	Item Category

test.csv: Contains the coupon customer combination for which redemption status is to be predicted

Variable	Definition
id	Unique id for coupon customer impression
campaign_id	Unique id for a discount campaign
coupon_id	Unique id for a discount coupon
customer_id	Unique id for a customer

To summarise the entire process:

Customers receive coupons under various campaigns and may choose to redeem it.
They can redeem the given coupon for any valid product for that coupon as per coupon item mapping within the duration between campaign start date and end date
Next, the customer will redeem the coupon for an item at the retailer store and that will reflect in the transaction table in the column coupon_discount.

Public and Private Split

Test data is further randomly divided into Public (40%) and Private data (60%)
Your initial responses will be checked and scored on the Public data.
The final rankings would be based on your private score which will be published once the competition is over.

Note

AV_amex_lgb_folds_v28.csv Private Score of 92.50 (Submitted)
AV_amex_stack2_folds_v28.csv Private Score 92.811 (Best out of all - mean of CB and LGBM)
Stacking always works, dont ignore whatever Public LB says
Kaggle Link Best Kernel -**v31**

Data Virtualization Market By Deployment Mode (On-Premises, Cloud-Based),...
verifiedmarketresearch.com
Updated Jul 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
VERIFIED MARKET RESEARCH (2024). Data Virtualization Market By Deployment Mode (On-Premises, Cloud-Based), Component Outlook (Standalone Software, Data Integration Solution, Application Tool Solution), Organization Size (Small and Medium-sized Enterprises (SMEs), Large Enterprises), Verticals (Banking, Financial Services, and Insurance (BFSI), Healthcare, Retail and eCommerce, Telecom and IT, Manufacturing), Region for 2024-2031 [Dataset]. https://www.verifiedmarketresearch.com/product/data-virtualization-market/
Explore at:
Dataset updated
Jul 25, 2024
Dataset provided by
Verified Market Researchhttps://www.verifiedmarketresearch.com/
Authors
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2024 - 2031
Area covered
Global
Description
Data Virtualization Market size was valued at USD 4.05 Billion in 2023 and is projected to reach USD 15.55 Billion By 2031, growing at a CAGR of 20.20% during the forecast period 2024 to 2031.

Data Virtualization Market: Definition/ Overview

Data virtualization is an advanced technique in data management that streamlines access to information from various sources, offering a seamless and unified view of data despite its diverse locations and formats. This approach acts as an intermediary layer, enabling users to interact with data as if it were consolidated in a single repository. By abstracting the underlying complexities of different data sources, data virtualization simplifies the user's experience and eliminates the necessity of understanding the specifics of each individual source.

One of the primary benefits of data virtualization is its ability to provide near real-time access to information. Unlike traditional data integration methods that rely on duplicating data, data virtualization allows users to retrieve and query data in its original location. This real-time capability ensures that users have the most current and accurate data available for decision-making.

Additionally, data virtualization can significantly enhance system performance. By optimizing queries and minimizing the movement of data across networks, it reduces the overhead typically associated with data integration processes. This efficiency not only accelerates data retrieval and processing times but also improves the overall responsiveness of the system. From a financial perspective, data virtualization offers substantial cost savings. It eliminates the need for complex and costly data integration projects that involve extensive data extraction, transformation, and loading (ETL) processes. By reducing the dependency on physical data consolidation, organizations can allocate resources more effectively and decrease the total cost of ownership for their data infrastructure.

Facebook

Twitter

Click to copy link

Link copied

Cite

Supriya Kumar; Matthew Arnold; Glen James; Rema Padman (2023). Established databases included in DISCOVER CKD. [Dataset]. http://doi.org/10.1371/journal.pone.0274131.t001

Established databases included in DISCOVER CKD.

Explore at:

xlsAvailable download formats

Unique identifier

https://doi.org/10.1371/journal.pone.0274131.t001

Dataset updated

Jun 16, 2023

Dataset provided by

PLOS ONE

Authors

Supriya Kumar; Matthew Arnold; Glen James; Rema Padman

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Established databases included in DISCOVER CKD.

Clear search

Close search

Google apps

Main menu

Established databases included in DISCOVER CKD.

Standardizing Definitions and Data Collection Approach for Black-owned...

OneNet Cross-Platform Services

Secondary data and baseline covariates of patients included in DISCOVER CKD....

World Bank's Global Data🌎🌏🌍

Description:

Columns and Examples:

Series Code:

Topic:

Indicator Name:

Short Definition:

Long Definition:

Unit of Measure:

Periodicity:

Base Period:

Other Notes:

Aggregation Method:

Limitations and Exceptions:

Notes from Original Source:

General Comments:

Source:

Statistical Concept and Methodology:

Development Relevance:

Related Source Links:

Other Web Links:

Related Indicators:

Data from: NetFlow data collected with different packet sampling rates

MIMIC-IV

ERA5 hourly data on single levels from 1940 to present

New Oxford Dictionary of English, 2nd Edition

The definition data service for traffic congestion levels in Taoyuan City

Pedestrian Stops - Contact Cards

fdata-02-00040_Challenges and Legal Gaps of Genetic Profiling in the Era of...

Data from: Modeling software processes from different domains using SPEM and...

Geophysical surveys and geospatial data for Bob Kidd Lake, Washington...

2023 American Community Survey: DP02 | Selected Social Characteristics in...

Data from: Teaching software processes from different application domains

Enterprise Survey 2009-2019, Panel Data - Slovenia

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Response rate

Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

Predicting Coupon Redemption_Feature Selection

Problem Statement

Predicting Coupon Redemption

Dataset Description

Public and Private Split

Note

Data Virtualization Market By Deployment Mode (On-Premises, Cloud-Based),...

Established databases included in DISCOVER CKD.