97 datasets found
  1. Dataset: A Systematic Literature Review on the topic of High-value datasets

    • data.niaid.nih.gov
    Updated Jun 23, 2023
    Cite
    Charalampos Alexopoulos (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    Andrea Miletič
    Charalampos Alexopoulos
    Magdalena Ciesielska
    Nina Rizun
    Anastasija Nikiforova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (a pre-print is available in Open Access at https://arxiv.org/abs/2305.10234) and so that other researchers may use these data in their own work.

    The protocol is intended for the Systematic Literature Review on the topic of High-value Datasets, with the aim to gather information on how the topic of High-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected as the result of the SLR over Scopus, Web of Science, and the Digital Government Research library (DGRL) in 2023.

    Methodology

    To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and the Digital Government Research library (DGRL).

    These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the results to papers where these objects were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 unique articles were found and checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.
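    As an illustration only (this is not the authors' screening code), a minimal Python sketch of applying such a keyword query to exported records might look as follows; the record fields (title, keywords, abstract) and the sample entries are hypothetical:

    ```python
    import re

    # The SLR query: ("open data" OR "open government data")
    # AND ("high-value data*" OR "high value data*")
    OPEN_DATA = re.compile(r"open (government )?data", re.IGNORECASE)
    HVD = re.compile(r"high[- ]value data\w*", re.IGNORECASE)

    def matches_query(record: dict) -> bool:
        """Check title, keywords, and abstract against both query clauses."""
        text = " ".join(record.get(f, "") for f in ("title", "keywords", "abstract"))
        return bool(OPEN_DATA.search(text) and HVD.search(text))

    records = [  # hypothetical export
        {"title": "Identifying high-value datasets in open government data portals",
         "keywords": "open data; HVD", "abstract": ""},
        {"title": "Open data adoption in municipalities",
         "keywords": "", "abstract": "No relation to the query."},
    ]
    print(sum(matches_query(r) for r in records))  # -> 1
    ```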

    To attain the objective of our study, we developed a protocol in which the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) HVD determination-related information.

    Test procedure: Each study was independently examined by at least two authors; after an in-depth examination of the full text of the article, the structured protocol was filled in for each study. The structure of the protocol is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by a third researcher.

    Description of the data in this data set

    Protocol_HVD_SLR provides the structure of the protocol.
    Spreadsheet #1 provides the filled protocol for relevant studies.
    Spreadsheet #2 provides the list of results after the search over the three indexing databases, i.e. before filtering out irrelevant studies.

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) HVD determination-related information.

    Descriptive information
    1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
    2) Complete reference - the complete source information to refer to the study
    3) Year of publication - the year in which the study was published
    4) Journal article / conference paper / book chapter - the type of the paper: {journal article, conference paper, book chapter}
    5) DOI / Website - a link to the website where the study can be found
    6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
    7) Availability in OA - availability of an article in the Open Access
    8) Keywords - keywords of the paper as indicated by the authors
    9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

    Approach- and research design-related information
    10) Objective / RQ - the research objective / aim, established research questions
    11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR etc.)
    12) Contributions - the contributions of the study
    13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach?
    14) Availability of the underlying research data - whether there is a reference to the publicly available underlying research data, e.g., transcriptions of interviews, collected data, or explanation why these data are not shared?
    15) Period under investigation - period (or moment) in which the study was conducted
    16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?

    Quality- and relevance-related information
    17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)?
    18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around HVD determination; secondary - mentioned but not studied (e.g., as part of discussion, future work etc.))

    HVD determination-related information
    19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
    20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, "input -> output")
    21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
    22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
    23) Data - what data do HVD cover?
    24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)

    Format of the file: .xls, .csv (for the first spreadsheet only), .odt, .docx

    Licenses or restrictions: CC-BY

    For more info, see README.txt

  2. Big data and business analytics revenue worldwide 2015-2022

    • statista.com
    Updated Nov 22, 2023
    Cite
    Statista (2023). Big data and business analytics revenue worldwide 2015-2022 [Dataset]. https://www.statista.com/statistics/551501/worldwide-big-data-business-analytics-revenue/
    Dataset updated
    Nov 22, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The global big data and business analytics (BDA) market was valued at 168.8 billion U.S. dollars in 2018 and is forecast to grow to 215.7 billion U.S. dollars by 2021. In 2021, more than half of BDA spending will go towards services. IT services is projected to make up around 85 billion U.S. dollars, and business services will account for the remainder.

    Big data

    High volume, high velocity and high variety: one or more of these characteristics is used to define big data, the kind of data sets that are too large or too complex for traditional data processing applications. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets. For example, connected IoT devices are projected to generate 79.4 ZBs of data in 2025.

    Business analytics

    Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate business insights. The size of the business intelligence and analytics software application market is forecast to reach around 16.5 billion U.S. dollars in 2022. Growth in this market is driven by a focus on digital transformation, a demand for data visualization dashboards, and an increased adoption of cloud services.

  3. Conceptualization of public data ecosystems

    • data.niaid.nih.gov
    Updated Sep 26, 2024
    Cite
    Martin Lnenicka (2024). Conceptualization of public data ecosystems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13842001
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    Anastasija Nikiforova
    Martin Lnenicka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).

    As there is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems and how these elements form and shape the development of these ecosystems over time, which can lead to misguided efforts to develop future public data ecosystems, the aim of the study is: (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, this study presents a typology of public data ecosystems, develops a conceptual model of the elements and formative characteristics that contribute most to value-adding public data ecosystems, and proposes a six-generation model of their evolution, the Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.

    This dataset is being made public both to act as supplementary data for "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" (Telematics and Informatics) and to support the Systematic Literature Review component that informs the study.

    Description of the data in this data set

    PublicDataEcosystem_SLR provides the structure of the protocol.

    Spreadsheet #1 provides the list of results after the search over three indexing databases and filtering out irrelevant studies.

    Spreadsheet #2 provides the protocol structure.

    Spreadsheet #3 provides the filled protocol for relevant studies.

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) public data ecosystem-related information.

    Descriptive Information

    Article number

    A study number, corresponding to the study number assigned in an Excel worksheet

    Complete reference

    The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.

    Year of publication

    The year in which the study was published.

    Journal article / conference paper / book chapter

    The type of the paper, i.e., journal article, conference paper, or book chapter.

    Journal / conference / book

    The journal, conference, or book where the paper is published.

    DOI / Website

    A link to the website where the study can be found.

    Number of words

    The number of words of the study.

    Number of citations in Scopus and WoS

    The number of citations of the paper in Scopus and WoS digital libraries.

    Availability in Open Access

    Availability of a study in the Open Access or Free / Full Access.

    Keywords

    Keywords of the paper as indicated by the authors (in the paper).

    Relevance for our study (high / medium / low)

    What is the relevance level of the paper for our study?

    Approach- and research design-related information

    Objective / Aim / Goal / Purpose & Research Questions

    The research objective and established RQs.

    Research method (including unit of analysis)

    The methods used to collect data in the study, including the unit of analysis that refers to the country, organisation, or other specific unit that has been analysed such as the number of use-cases or policy documents, number and scope of the SLR etc.

    Study’s contributions

    The study's contribution as defined by the authors.

    Qualitative / quantitative / mixed method

    Whether the study uses a qualitative, quantitative, or mixed methods approach?

    Availability of the underlying research data

    Whether the paper has a reference to the public availability of the underlying research data e.g., transcriptions of interviews, collected data etc., or explains why these data are not openly shared?

    Period under investigation

    Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)

    Use of theory / theoretical concepts / approaches? If yes, specify them

    Does the study mention any theory / theoretical concepts / approaches? If yes, what theory / concepts / approaches? If any theory is mentioned, how is theory used in the study? (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested theory, theory mentioned in the future research section).

    Quality-related information

    Quality concerns

    Whether there are any quality concerns (e.g., limited information about the research methods used)?

    Public Data Ecosystem-related information

    Public data ecosystem definition

    How is the public data ecosystem defined in the paper, or what equivalent term (most often infrastructure) is used instead? If an alternative term is used, what is the public data ecosystem called in the paper?

    Public data ecosystem evolution / development

    Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?

    What constitutes a public data ecosystem?

    What constitutes a public data ecosystem (components & relationships) - their "FORM / OUTPUT" presented in the paper (general description with more detailed answers to further additional questions).

    Components and relationships

    What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).

    Stakeholders

    What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?

    Actors and their roles

    What actors does the public data ecosystem involve? What are their roles?

    Data (data types, data dynamism, data categories etc.)

    What data does the public data ecosystem cover (or: what data is it intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static data, dynamic, real-time data, stream), prevailing data categories / domains / topics etc.

    Processes / activities / dimensions, data lifecycle phases

    What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?

    Level (if relevant)

    What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).

    Other elements or relationships (if any)

    What other elements or relationships does the public data ecosystem consist of?

    Additional comments

    Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).

    New papers

    Does the study refer to any other potentially relevant papers?

    Additional references to potentially relevant papers that were found in the analysed paper (snowballing).

    Format of the file: .xls, .csv (for the first spreadsheet only), .docx

    Licenses or restrictions: CC-BY

    For more info, see README.txt

  4. Company Datasets for Business Profiling

    • datarade.ai
    Updated Feb 23, 2017
    Cite
    Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
    Available download formats: .json, .xml, .csv, .xls
    Dataset updated
    Feb 23, 2017
    Dataset authored and provided by
    Oxylabs
    Area covered
    Canada, Isle of Man, Moldova (Republic of), Tunisia, British Indian Ocean Territory, Andorra, Nepal, Bangladesh, Taiwan, Northern Mariana Islands
    Description

    Company Datasets for valuable business insights!

    Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

    These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

    • Owler: Gain valuable business insights and competitive intelligence.
    • AngelList: Receive fresh startup data transformed into actionable insights.
    • CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.
    • Craft.co: Make data-informed business decisions with Craft.co's company datasets.
    • Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

    We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

    • Company name;
    • Size;
    • Founding date;
    • Location;
    • Industry;
    • Revenue;
    • Employee count;
    • Competitors.

    You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

    Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.
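    As an illustration of one such delivery path (CSV files on S3), the sketch below loads a delivered file with boto3 and pandas. The bucket name, object key, and column names are hypothetical, not Oxylabs specifics:

    ```python
    import io

    import boto3
    import pandas as pd

    BUCKET = "example-company-datasets"  # hypothetical bucket
    KEY = "companies/companies.csv"      # hypothetical key

    # Fetch the delivered CSV from S3 and load it into a DataFrame.
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=BUCKET, Key=KEY)
    companies = pd.read_csv(io.BytesIO(obj["Body"].read()))

    # Shortlist prospects by industry and size (column names assumed).
    prospects = companies[(companies["industry"] == "Software")
                          & (companies["employee_count"] >= 50)]
    print(prospects[["company_name", "revenue"]].head())
    ```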

    With Oxylabs Datasets, you can count on:

    • Fresh and accurate data collected and parsed by our expert web scraping team.
    • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
    • A customized approach tailored to your specific business needs.
    • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

    Pricing Options:

    Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

  5. fdata-02-00040_Challenges and Legal Gaps of Genetic Profiling in the Era of Big Data

    • figshare.com
    Updated Jun 3, 2023
    Cite
    Murat Sariyar; Irene Schlünder (2023). fdata-02-00040_Challenges and Legal Gaps of Genetic Profiling in the Era of Big Data.xml [Dataset]. http://doi.org/10.3389/fdata.2019.00040.s002
    Available download formats: bin
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Murat Sariyar; Irene Schlünder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Profiling of individuals based on inborn, acquired, and assigned characteristics is central for decision making in health care. In the era of omics and big smart data, it becomes urgent to differentiate between different data governance affordances for different profiling activities. Typically, diagnostic profiling is the focus of researchers and physicians, and other types are regarded as undesired side effects, for example in connection with health care insurance risk calculations. Profiling in a legal sense is addressed, for example, by EU data protection law; it is defined in the General Data Protection Regulation as automated decision making. This term does not correspond fully with profiling in biomedical research and healthcare, and the impact on privacy has hardly ever been examined. But profiling is also an issue concerning the fundamental right of non-discrimination, whenever profiles are used in a way that has a discriminatory effect on individuals. Here, we will focus on genetic profiling, define related notions (as legal and subject-matter definitions frequently differ), and discuss the ethical and legal challenges.

  6. World Health Survey 2003 - Belgium

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Oct 17, 2013
    Cite
    World Health Organization (WHO) (2013). World Health Survey 2003 - Belgium [Dataset]. https://microdata.worldbank.org/index.php/catalog/1694
    Dataset updated
    Oct 17, 2013
    Dataset provided by
    World Health Organization (https://who.int/)
    Authors
    World Health Organization (WHO)
    Time period covered
    2003
    Area covered
    Belgium
    Description

    Abstract

    Different countries have different health outcomes that are in part due to the way respective health systems perform. Regardless of the type of health system, individuals will have health and non-health expectations in terms of how the institution responds to their needs. In many countries, however, health systems do not perform effectively and this is in part due to lack of information on health system performance, and on the different service providers.

    The aim of the WHO World Health Survey is to provide empirical data to national health information systems so that there is better monitoring of the health of the people, the responsiveness of health systems, and the measurement of health-related parameters.

    The overall aim of the survey is to examine the way populations report their health, understand how people value health states, measure the performance of health systems in relation to responsiveness, and gather information on modes and extents of payment for health encounters, through a nationally representative population-based community survey. In addition, it addresses various areas such as health care expenditures, adult mortality, birth history, various risk factors, assessment of main chronic health conditions and the coverage of health interventions, in specific additional modules.

    The objectives of the survey programme are to:
    1. develop a means of providing valid, reliable and comparable information, at low cost, to supplement the information provided by routine health information systems;
    2. build the evidence base necessary for policy-makers to monitor if health systems are achieving the desired goals, and to assess if additional investment in health is achieving the desired outcomes;
    3. provide policy-makers with the evidence they need to adjust their policies, strategies and programmes as necessary.

    Geographic coverage

    The survey sampling frame must cover 100% of the country's eligible population, meaning that the entire national territory must be included. This does not mean that every province or territory need be represented in the survey sample but, rather, that all must have a chance (known probability) of being included in the survey sample.

    There may be exceptional circumstances that preclude 100% national coverage. Certain areas in certain countries may be impossible to include due to reasons such as accessibility or conflict. All such exceptions must be discussed with WHO sampling experts. If any region must be excluded, it must constitute a coherent area, such as a particular province or region. For example if ¾ of region D in country X is not accessible due to war, the entire region D will be excluded from analysis.

    Analysis unit

    Households and individuals

    Universe

    The WHS will include all male and female adults (18 years of age and older) who are not out of the country during the survey period. It should be noted that this includes the population who may be institutionalized for health reasons at the time of the survey: all persons who would have fit the definition of household member at the time of their institutionalisation are included in the eligible population.

    If the randomly selected individual is institutionalized short-term (e.g. a 3-day stay at a hospital) the interviewer must return to the household when the individual will have come back to interview him/her. If the randomly selected individual is institutionalized long term (e.g. has been in a nursing home the last 8 years), the interviewer must travel to that institution to interview him/her.

    The target population includes any adult, male or female age 18 or over living in private households. Populations in group quarters, on military reservations, or in other non-household living arrangements will not be eligible for the study. People who are in an institution due to a health condition (such as a hospital, hospice, nursing home, home for the aged, etc.) at the time of the visit to the household are interviewed either in the institution or upon their return to their household if this is within a period of two weeks from the first visit to the household.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    SAMPLING GUIDELINES FOR WHS

    Surveys in the WHS program must employ a probability sampling design. This means that every single individual in the sampling frame has a known and non-zero chance of being selected into the survey sample. While a Single Stage Random Sample is ideal if feasible, it is recognized that most sites will carry out Multi-stage Cluster Sampling.

    The WHS sampling frame should cover 100% of the eligible population in the surveyed country. This means that every eligible person in the country has a chance of being included in the survey sample. It also means that particular ethnic groups or geographical areas may not be excluded from the sampling frame.

    The sample size of the WHS in each country is 5000 persons (exceptions considered on a by-country basis). An adequate number of persons must be drawn from the sampling frame to account for an estimated amount of non-response (refusal to participate, empty houses etc.). The highest estimate of potential non-response and empty households should be used to ensure that the desired sample size is reached at the end of the survey period. This is very important because if, at the end of data collection, the required sample size of 5000 has not been reached additional persons must be selected randomly into the survey sample from the sampling frame. This is both costly and technically complicated (if this situation is to occur, consult WHO sampling experts for assistance), and best avoided by proper planning before data collection begins.
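    As a back-of-the-envelope illustration of this planning step (the 20 percent loss rate below is hypothetical, not a WHS figure):

    ```python
    import math

    target_completes = 5000   # WHS target sample size per country
    anticipated_loss = 0.20   # highest estimate of non-response + empty households

    # Inflate the draw so the target is still reached after losses.
    persons_to_draw = math.ceil(target_completes / (1 - anticipated_loss))
    print(persons_to_draw)  # -> 6250
    ```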

    All steps of sampling, including justification for stratification, cluster sizes, probabilities of selection, weights at each stage of selection, and the computer program used for randomization, must be communicated to WHO.

    STRATIFICATION

    Stratification is the process by which the population is divided into subgroups. Sampling will then be conducted separately in each subgroup. Strata or subgroups are chosen because evidence is available that they are related to the outcome (e.g. health, responsiveness, mortality, coverage etc.). The strata chosen will vary by country and reflect local conditions. Some examples of factors that can be stratified on are geography (e.g. North, Central, South), level of urbanization (e.g. urban, rural), socio-economic zones, provinces (especially if health administration is primarily under the jurisdiction of provincial authorities), or presence of health facility in area. Strata to be used must be identified by each country and the reasons for selection explicitly justified.

    Stratification is strongly recommended at the first stage of sampling. Once the strata have been chosen and justified, all stages of selection will be conducted separately in each stratum. We recommend stratifying on 3-5 factors. It is optimum to have half as many strata (note the difference between stratifying variables, which may be such variables as gender, socio-economic status, province/region etc. and strata, which are the combination of variable categories, for example Male, High socio-economic status, Xingtao Province would be a stratum).

    Strata should be as homogenous as possible within and as heterogeneous as possible between. This means that strata should be formulated in such a way that individuals belonging to a stratum should be as similar to each other with respect to key variables as possible and as different as possible from individuals belonging to a different stratum. This maximises the efficiency of stratification in reducing sampling variance.
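    A minimal sketch of selection within strata in this spirit is shown below; the frame, the stratifying variables (region, urbanization), and the sampling fraction are all hypothetical:

    ```python
    import pandas as pd

    # Hypothetical sampling frame of 10,000 persons.
    frame = pd.DataFrame({
        "person_id": range(1, 10_001),
        "region": (["North", "Central", "South"] * 3334)[:10_000],
        "urbanization": ["urban"] * 5_000 + ["rural"] * 5_000,
    })

    # Each region x urbanization combination is one stratum;
    # selection is carried out separately in each stratum.
    sample = (frame
              .groupby(["region", "urbanization"], group_keys=False)
              .sample(frac=0.05, random_state=42))
    print(sample.groupby(["region", "urbanization"]).size())
    ```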

    MULTI-STAGE CLUSTER SELECTION

    A cluster is a naturally occurring unit or grouping within the population (e.g. enumeration areas, cities, universities, provinces, hospitals etc.); it is a unit for which the administrative level has clear, nonoverlapping boundaries. Cluster sampling is useful because it avoids having to compile exhaustive lists of every single person in the population. Clusters should be as heterogeneous as possible within and as homogenous as possible between (note that this is the opposite criterion as that for strata). Clusters should be as small as possible (i.e. large administrative units such as Provinces or States are not good clusters) but not so small as to be homogenous.

    In cluster sampling, a number of clusters are randomly selected from a list of clusters. Then, either all members of the chosen cluster or a random selection from among them are included in the sample. Multistage sampling is an extension of cluster sampling where a hierarchy of clusters are chosen going from larger to smaller.

    In order to carry out multi-stage sampling, one needs to know only the population sizes of the sampling units. For the smallest sampling unit above the elementary unit however, a complete list of all elementary units (households) is needed; in order to be able to randomly select among all households in the TSU, a list of all those households is required. This information may be available from the most recent population census. If the last census was >3 years ago or the information furnished by it was of poor quality or unreliable, the survey staff will have the task of enumerating all households in the smallest randomly selected sampling unit. It is very important to budget for this step if it is necessary and ensure that all households are properly enumerated in order that a representative sample is obtained.

    It is always best to have as many clusters in the PSU as possible. The reason for this is that the fewer the number of respondents in each PSU, the lower will be the clustering effect which

  7. Enterprise Survey 2009-2019, Panel Data - Slovenia

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Aug 6, 2020
    Cite
    European Bank for Reconstruction and Development (EBRD) (2020). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://microdata.worldbank.org/index.php/catalog/3762
    Dataset updated
    Aug 6, 2020
    Dataset provided by
    European Bank for Reconstruction and Development (http://ebrd.com/)
    World Bank (http://worldbank.org/)
    World Bank Group (http://www.worldbank.org/)
    European Investment Bank (EIB)
    Time period covered
    2008 - 2019
    Area covered
    Slovenia
    Description

    Abstract

    The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

    The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

    As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

    Geographic coverage

    National

    Analysis unit

    The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

    Universe

    As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The samples for the Slovenia ES 2009, 2013, and 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for the Slovenia 2009 ES and the Slovenia 2013 ES, and in the Sampling Note for the 2019 Slovenia ES.

    Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information on the industries and regions chosen are included in the attached Excel file (Sampling Report.xls) for the Slovenia 2009 ES. For the Slovenia 2013 and 2019 ES, specific information on the industries and regions chosen is described in "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.

    For the Slovenia 2009 ES, industry stratification was designed as follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries, sample sizes were inflated by about 17% to account for potential non-response when requesting sensitive financial data and for likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals), sample sizes were inflated by about 12% to account for undersampling of firms in service industries.

    For Slovenia 2013 ES, industry stratification was designed in the way that follows: the universe was stratified into one manufacturing industry, and two service industries (retail, and other services).

    Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).

    For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.

    For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

    For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.

    Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    Questionnaires have common questions (core module) and, respectively, additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (which includes the core module plus manufacturing-specific questions). Retail firms have been interviewed using the Services questionnaire (which includes the core module plus retail-specific questions), and the residual eligible services have been covered using the Services questionnaire (which includes the core module). Each variation of the questionnaire is identified by the index variable a0.

    Response rate

    Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

    Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.

    For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.

    For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in the Slovenia may be selection bias and not frame inaccuracy.

    For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.

    Finally, for 2019, the number of interviews per contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
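    Note that the 2009 figure and the 2013/2019 figures use inverse conventions (contacts per realized interview vs. realized interviews per contact). A trivial sketch with the reported numbers makes them comparable:

    ```python
    # 2009 is reported as contacts per realized interview.
    contacts_per_interview_2009 = 6.18
    print(f"2009: {1 / contacts_per_interview_2009:.1%} of contacts yielded an interview")

    # 2013 and 2019 are reported as interviews per contact.
    for year, rate in {"2013": 0.25, "2019": 0.097}.items():
        print(f"{year}: one interview per {1 / rate:.1f} contacted establishments")
    ```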

  8. Forecast revenue big data market worldwide 2011-2027

    • statista.com
    Updated Feb 13, 2024
    Cite
    Statista (2024). Forecast revenue big data market worldwide 2011-2027 [Dataset]. https://www.statista.com/statistics/254266/global-big-data-market-forecast/
    Dataset updated
    Feb 13, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The global big data market is forecast to grow to 103 billion U.S. dollars by 2027, more than double its expected market size in 2018. With a share of 45 percent, the software segment would become the largest big data market segment by 2027.

    What is Big data?

    Big data is a term that refers to the kind of data sets that are too large or too complex for traditional data processing applications. It is defined as having one or some of the following characteristics: high volume, high velocity or high variety. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets.

    Big data analytics

    Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate new business insights. The global big data and business analytics market was valued at 169 billion U.S. dollars in 2018 and is expected to grow to 274 billion U.S. dollars in 2022. As of November 2018, 45 percent of professionals in the market research industry reportedly used big data analytics as a research method.

  9. 2021 American Community Survey: B28001 | TYPES OF COMPUTERS IN HOUSEHOLD...

    • data.census.gov
    Cite
    ACS, 2021 American Community Survey: B28001 | TYPES OF COMPUTERS IN HOUSEHOLD (ACS 5-Year Estimates Selected Population Detailed Tables) [Dataset]. https://data.census.gov/table/ACSDT5YSPT2021.B28001
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    ACS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2021
    Description

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns, and estimates of housing units for states and counties.

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Source: U.S. Census Bureau, 2017-2021 American Community Survey 5-Year Estimates.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

    Data about computer and Internet use were collected by asking respondents to select "Yes" or "No" to each type of computer and each type of Internet subscription. Therefore, respondents were able to select more than one type of computer and more than one type of Internet subscription.

    The category "Has one or more types of computing devices" refers to those who said "Yes" to at least one of the following types of computers: desktop or laptop; smartphone; tablet or other portable wireless computer; or some other type of computer. The category "No computer" consists of those who said "No" to all of these types of computers.

    Desktop or laptop refers to those who selected that category regardless of whether or not they indicated they also had another type of computer. However, "Desktop or laptop with no other type of computing device" refers to those who said "Yes" to owning or using a desktop or laptop and "No" to smartphone, tablet or other wireless computer, and other computer. Similarly, the same holds true for "Smartphone" compared to "Smartphone with no other type of computing device", "Tablet or other portable wireless computer" compared to "Tablet or other portable wireless computer with no other type of computing device", and "Other computer" compared to "Other computer with no other type of computing device".

    The 2017-2021 American Community Survey (ACS) data generally reflect the March 2020 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances, the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineation lists due to differences in the effective dates of the geographic entities.

    Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    Explanation of Symbols:
    - "-": The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
    - "N": The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
    - "(X)": The estimate or margin of error is not applicable or not available.
    - "median-": The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
    - "median+": The median falls in the highest interval of an open-ended distribution (for example "250,000+").
    - "**": The margin of error could not be computed because there were an insufficient number of sample observations.
    - "***": The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
    - "*****": A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
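    As a worked illustration of the margin-of-error arithmetic described above (the estimate and MOE are hypothetical, not values from this table):

    ```python
    estimate = 12_345  # published ACS estimate
    moe_90 = 678       # published 90 percent margin of error

    # The 90 percent confidence bounds are estimate -/+ MOE.
    lower, upper = estimate - moe_90, estimate + moe_90
    print(f"90% interval: [{lower}, {upper}]")

    # Per ACS guidance, a 90% MOE converts to a 95% MOE via the ratio of
    # normal critical values 1.960 / 1.645.
    print(f"approx. 95% MOE: {moe_90 * 1.960 / 1.645:.0f}")
    ```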

  10. Google Analytics Sample

    • kaggle.com
    Updated Sep 19, 2019
    Cite
    Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 19, 2019
    Dataset provided by
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

    Content

    The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. It includes the following kinds of information:

    Traffic source data: information about where website visitors originate, including organic traffic, paid search traffic, display traffic, etc.

    Content data: information about the behavior of users on the site, such as the URLs of pages that visitors look at and how they interact with content.

    Transactional data: information about the transactions that occur on the Google Merchandise Store website.

    Fork this kernel to get started.

    Acknowledgements

    Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

    Banner Photo by Edho Pratama from Unsplash.

    Inspiration

    What is the total number of transactions generated per device browser in July 2017?

    The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

    What was the average number of product pageviews for users who made a purchase in July 2017?

    What was the average number of product pageviews for users who did not make a purchase in July 2017?

    What was the average total transactions per user that made a purchase in July 2017?

    What is the average amount of money spent per session in July 2017?

    What is the sequence of pages viewed?
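    A minimal sketch answering the first of these questions against the public table cited under Acknowledgements above, assuming the google-cloud-bigquery client library and configured GCP credentials:

    ```python
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()  # assumes a configured GCP project/credentials

    # Total transactions per device browser in July 2017.
    query = """
        SELECT device.browser AS browser,
               SUM(totals.transactions) AS total_transactions
        FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
        GROUP BY browser
        ORDER BY total_transactions DESC
    """
    for row in client.query(query).result():
        print(row.browser, row.total_transactions)
    ```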

  11. SAS code used to analyze data and a datafile with metadata glossary

    • catalog.data.gov
    • data.amerigeoss.org
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). SAS code used to analyze data and a datafile with metadata glossary [Dataset]. https://catalog.data.gov/dataset/sas-code-used-to-analyze-data-and-a-datafile-with-metadata-glossary
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    We compiled macroinvertebrate assemblage data collected from 1995 to 2014 from the St. Louis River Area of Concern (AOC) of western Lake Superior. Our objective was to define depth-adjusted cutoff values for benthos condition classes (poor, fair, reference) to provide a tool useful for assessing progress toward achieving removal targets for the degraded benthos beneficial use impairment in the AOC. The relationship between depth and benthos metrics was wedge-shaped. We therefore used quantile regression to model the limiting effect of depth on selected benthos metrics, including taxa richness, percent non-oligochaete individuals, combined percent Ephemeroptera, Trichoptera, and Odonata individuals, and density of ephemerid mayfly nymphs (Hexagenia). We created a scaled trimetric index from the first three metrics. Metric values at or above the 90th percentile quantile regression model prediction were defined as reference condition for that depth. We set the cutoff between poor and fair condition as the 50th percentile model prediction. We examined sampler type, exposure, geographic zone of the AOC, and substrate type for confounding effects. Based on these analyses we combined data across sampler type and exposure classes and created separate models for each geographic zone. We used the resulting condition class cutoff values to assess the relative benthic condition for three habitat restoration project areas. The depth-limited pattern of ephemerid abundance we observed in the St. Louis River AOC also occurred elsewhere in the Great Lakes. We provide tabulated model predictions for application of our depth-adjusted condition class cutoff values to new sample data. This dataset is associated with the following publication: Angradi, T., W. Bartsch, A. Trebitz, V. Brady, and J. Launspach. A depth-adjusted ambient distribution approach for setting numeric removal targets for a Great Lakes Area of Concern beneficial use impairment: Degraded benthos. JOURNAL OF GREAT LAKES RESEARCH. International Association for Great Lakes Research, Ann Arbor, MI, USA, 43(1): 108-120, (2017).
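    The published analysis used SAS; purely to illustrate the technique, a Python sketch of quantile regression at the 90th and 50th percentiles (with hypothetical data, using statsmodels) might look like this:

    ```python
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical depth/metric data exhibiting a wedge-shaped relationship.
    df = pd.DataFrame({
        "depth_m": [1, 2, 3, 5, 8, 10, 12, 15, 18, 20],
        "taxa_richness": [28, 30, 24, 22, 18, 15, 14, 10, 8, 6],
    })

    # 90th percentile model: reference-condition cutoff at a given depth;
    # 50th percentile model: cutoff between poor and fair condition.
    q90 = smf.quantreg("taxa_richness ~ depth_m", df).fit(q=0.90)
    q50 = smf.quantreg("taxa_richness ~ depth_m", df).fit(q=0.50)

    new = pd.DataFrame({"depth_m": [4, 16]})
    print("reference cutoffs:", q90.predict(new).round(1).tolist())
    print("poor/fair cutoffs:", q50.predict(new).round(1).tolist())
    ```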

  12. Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov"

    • figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Laura Miron; Rafael Gonçalves; Mark A. Musen (2023). Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov" [Dataset]. http://doi.org/10.6084/m9.figshare.12743939.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    figshare
    Authors
    Laura Miron; Rafael Gonçalves; Mark A. Musen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This fileset provides supporting data and corpora for the empirical study described in: Laura Miron, Rafael S. Goncalves and Mark A. Musen. Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov.

    Description of files

    Original data files:
    • AllPublicXml.zip contains the set of all public XML records in ClinicalTrials.gov (protocols and summary results information), on which all remaining analyses are based. The set contains 302,091 records downloaded on April 3, 2019.
    • public.xsd is the XML schema downloaded from ClinicalTrials.gov on April 3, 2019, used to validate records in AllPublicXML.

    BioPortal API query results:
    • condition_matches.csv contains the results of querying the BioPortal API for all ontology terms that are an 'exact match' to each condition string scraped from the ClinicalTrials.gov XML. Columns = {filename, condition, url, bioportal term, cuis, tuis}.
    • intervention_matches.csv contains BioPortal API query results for all interventions scraped from the ClinicalTrials.gov XML. Columns = {filename, intervention, url, bioportal term, cuis, tuis}.

    Data element definitions:
    • supplementary_table_1.xlsx contains a mapping of element names, element types, and whether elements are required in the ClinicalTrials.gov data dictionaries, the ClinicalTrials.gov XML schema declaration for records (public.XSD), the Protocol Registration System (PRS), FDAAA801, and the WHO required data elements for clinical trial registrations.

    Column and value definitions:
    • CT.gov Data Dictionary Section: section heading for a group of data elements in the ClinicalTrials.gov data dictionary (https://prsinfo.clinicaltrials.gov/definitions.html)
    • CT.gov Data Dictionary Element Name: name of an element/field according to the ClinicalTrials.gov data dictionaries (https://prsinfo.clinicaltrials.gov/definitions.html and https://prsinfo.clinicaltrials.gov/expanded_access_definitions.html)
    • CT.gov Data Dictionary Element Type: "Data" if the element is a field for which the user provides a value; "Group Heading" if the element is a group heading for several sub-fields but is not in itself associated with a user-provided value
    • Required for CT.gov for Interventional Records: "Required" if the element is required for interventional records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017 (the effective date of the FDAAA801 Final Rule), "-" if the element is not applicable to interventional records (only observational or expanded access)
    • Required for CT.gov for Observational Records: "Required" if the element is required for observational records according to the data dictionary, "CR" if conditionally required, "Jan 2017" if required for studies starting on or after January 18, 2017, "-" if the element is not applicable to observational records (only interventional or expanded access)
    • Required in CT.gov for Expanded Access Records?: "Required" if the element is required for expanded access records according to the data dictionary, "CR" if conditionally required, "Jan 2017" if required for studies starting on or after January 18, 2017, "-" if the element is not applicable to expanded access records (only interventional or observational)
    • CT.gov XSD Element Definition: abbreviated xpath to the corresponding element in the ClinicalTrials.gov XSD (public.XSD). The full xpath includes 'clinical_study/' as a prefix to every element. (There is a single top-level element called "clinical_study" for all other elements.)
    • Required in XSD?: "Yes" if the element is required according to public.XSD, "No" if the element is optional, "-" if the element is not made public or included in the XSD
    • Type in XSD: "text" if the XSD type was "xs:string" or "textblock", the name of the enum if the type was an enum, "integer" if the type was "xs:integer" or "xs:integer" extended with the "type" attribute, "struct" if the type was a struct defined in the XSD
    • PRS Element Name: name of the corresponding entry field in the PRS system
    • PRS Entry Type: entry type in the PRS system. This column contains some free-text explanations/observations.
    • FDAAA801 Final Rule Field Name: name of the corresponding required field in the FDAAA801 Final Rule (https://www.federalregister.gov/documents/2016/09/21/2016-22129/clinical-trials-registration-and-results-information-submission). This column contains many empty values where elements in ClinicalTrials.gov do not correspond to a field required by the FDA.
    • WHO Field Name: name of the corresponding field required by the WHO Trial Registration Data Set (v 1.3.1) (https://prsinfo.clinicaltrials.gov/trainTrainer/WHO-ICMJE-ClinTrialsgov-Cross-Ref.pdf)

    Analytical results:
    • EC_human_review.csv contains the results of a manual review of a random sample of eligibility criteria from 400 CT.gov records. The table gives the filename, the criteria, and whether manual review determined the criteria to contain criteria for "multiple subgroups" of participants.
    • completeness.xlsx contains counts and percentages of interventional records missing fields required by FDAAA801 and its Final Rule.
    • industry_completeness.xlsx contains percentages of interventional records missing required fields, broken up by agency class of the trial's lead sponsor ("NIH", "US Fed", "Industry", or "Other"), and before and after the effective date of the Final Rule.
    • location_completeness.xlsx contains percentages of interventional records missing required fields, broken up by whether the record listed at least one location in the United States or only international locations (excluding trials with no listed location), and before and after the effective date of the Final Rule.

    Intermediate results:
    • cache.zip contains pickle and csv files of pandas dataframes with values scraped from the XML records in AllPublicXML. Downloading these files greatly speeds up running the analysis steps from the jupyter notebooks in our github repository.
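
    As an illustration of how the XML corpus can be processed, here is a minimal sketch that scrapes condition strings from the extracted records with Python's standard library (the local directory layout is an assumption; tag names follow the clinical_study schema described above):

    import glob
    import xml.etree.ElementTree as ET

    conditions = []
    # Assumes AllPublicXml.zip has been extracted to ./AllPublicXML
    for path in glob.glob("AllPublicXML/**/*.xml", recursive=True):
        root = ET.parse(path).getroot()  # single top-level element: clinical_study
        conditions.extend(el.text for el in root.findall("condition"))

    print(len(conditions), "condition strings scraped")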

  13. Data from: 3DHD CityScenes: High-Definition Maps in High-Density Point Clouds

    • zenodo.org
    • data.niaid.nih.gov
    bin, pdf
    Updated Jul 16, 2024
    Cite
    Christopher Plachetka; Benjamin Sertolli; Jenny Fricke; Marvin Klingner; Tim Fingscheidt; Christopher Plachetka; Benjamin Sertolli; Jenny Fricke; Marvin Klingner; Tim Fingscheidt (2024). 3DHD CityScenes: High-Definition Maps in High-Density Point Clouds [Dataset]. http://doi.org/10.5281/zenodo.7085090
    Explore at:
    Available download formats: bin, pdf
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christopher Plachetka; Benjamin Sertolli; Jenny Fricke; Marvin Klingner; Tim Fingscheidt; Christopher Plachetka; Benjamin Sertolli; Jenny Fricke; Marvin Klingner; Tim Fingscheidt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    3DHD CityScenes is the most comprehensive, large-scale high-definition (HD) map dataset to date, annotated in the three spatial dimensions of globally referenced, high-density LiDAR point clouds collected in urban domains. Our HD map covers 127 km of road sections of the inner city of Hamburg, Germany, including 467 km of individual lanes. In total, our map comprises 266,762 individual items.

    Our corresponding paper (published at ITSC 2022) is available here.
    Further, we have applied 3DHD CityScenes to map deviation detection here.

    Moreover, we release code to facilitate the application of our dataset and the reproducibility of our research. Specifically, our 3DHD_DevKit comprises:

    • Python tools to read, generate, and visualize the dataset,
    • 3DHDNet deep learning pipeline (training, inference, evaluation) for
      map deviation detection and 3D object detection.

    The DevKit is available here:

    https://github.com/volkswagen/3DHD_devkit.

    The dataset and DevKit have been created by Christopher Plachetka as project lead during his PhD period at Volkswagen Group, Germany.

    When using our dataset, you are welcome to cite:

    @INPROCEEDINGS{9921866,
      author={Plachetka, Christopher and Sertolli, Benjamin and Fricke, Jenny and Klingner, Marvin and 
      Fingscheidt, Tim},
      booktitle={2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC)}, 
      title={3DHD CityScenes: High-Definition Maps in High-Density Point Clouds}, 
      year={2022},
      pages={627-634}}

    Acknowledgements

    We thank the following interns for their exceptional contributions to our work.

    • Benjamin Sertolli: Major contributions to our DevKit during his master thesis
    • Niels Maier: Measurement campaign for data collection and data preparation

    The European large-scale project Hi-Drive (www.Hi-Drive.eu) supports the publication of 3DHD CityScenes and encourages the general publication of information and databases facilitating the development of automated driving technologies.

    The Dataset

    After downloading, the 3DHD_CityScenes folder provides five subdirectories, which are explained briefly in the following.

    1. Dataset

    This directory contains the training, validation, and test set definition (train.json, val.json, test.json) used in our publications. Respective files contain samples that define a geolocation and the orientation of the ego vehicle in global coordinates on the map.

    During dataset generation (done by our DevKit), samples are used to take crops from the larger point cloud. Also, map elements in reach of a sample are collected. Both modalities can then be used, e.g., as input to a neural network such as our 3DHDNet.

    To read any JSON-encoded data provided by 3DHD CityScenes in Python, you can use the following code snippet as an example.

    import json
    
    json_path = r"E:\3DHD_CityScenes\Dataset\train.json"
    with open(json_path) as jf:
      data = json.load(jf)
    print(data)

    2. HD_Map

    Map items are stored as lists of items in JSON format. In particular, we provide:

    • traffic signs,
    • traffic lights,
    • pole-like objects,
    • construction site locations,
    • construction site obstacles (point-like such as cones, and line-like such as fences),
    • line-shaped markings (solid, dashed, etc.),
    • polygon-shaped markings (arrows, stop lines, symbols, etc.),
    • lanes (ordinary and temporary),
    • relations between elements (only for construction sites, e.g., sign to lane association).

    3. HD_Map_MetaData

    Our high-density point cloud used as the basis for annotating the HD map is split into 648 tiles. This directory contains the geolocation for each tile as a polygon on the map. You can view the respective tile definitions using QGIS. Alternatively, we also provide the respective polygons as lists of UTM coordinates in JSON.

    Files with the ending .dbf, .prj, .qpj, .shp, and .shx belong to the tile definition as “shape file” (commonly used in geodesy) that can be viewed using QGIS. The JSON file contains the same information provided in a different format used in our Python API.

    4. HD_PointCloud_Tiles

    The high-density point cloud tiles are provided in global UTM32N coordinates and are encoded in a proprietary binary format. The first 4 bytes (integer) encode the number of points contained in that file. Subsequently, all point cloud values are provided as arrays. First all x-values, then all y-values, and so on. Specifically, the arrays are encoded as follows.

    • x-coordinates: 4 byte integer
    • y-coordinates: 4 byte integer
    • z-coordinates: 4 byte integer
    • intensity of reflected beams: 2 byte unsigned integer
    • ground classification flag: 1 byte unsigned integer

    After reading, respective values have to be unnormalized. As an example, you can use the following code snippet to read the point cloud data. For visualization, you can use the pptk package, for instance.

    import numpy as np
    import pptk
    
    file_path = r"E:\3DHD_CityScenes\HD_PointCloud_Tiles\HH_001.bin"
    key_list = ['x', 'y', 'z', 'intensity', 'is_ground']
    # dtypes follow the format description above (assumed little-endian):
    # 4-byte int for x/y/z, 2-byte unsigned for intensity, 1 byte for the flag.
    type_list = ['<i4', '<i4', '<i4', '<u2', '<u1']
    
    with open(file_path, 'rb') as f:
        num_points = np.fromfile(f, dtype='<i4', count=1)[0]
        pc_dict = {key: np.fromfile(f, dtype=dt, count=num_points)
                   for key, dt in zip(key_list, type_list)}
    
    pptk.viewer(np.column_stack((pc_dict['x'], pc_dict['y'], pc_dict['z'])))

    5. Trajectories

    We provide 15 real-world trajectories recorded during a measurement campaign covering the whole HD map. Trajectory samples are provided at approx. 30 Hz and are encoded in JSON.

    These trajectories were used to provide the samples in train.json, val.json, and test.json with realistic geolocations and orientations of the ego vehicle.

    • OP1 – OP5 cover the majority of the map with 5 trajectories.
    • RH1 – RH10 cover the majority of the map with 10 trajectories.

    Note that OP5 is split into three separate parts, a-c. RH9 is split into two parts, a-b. Moreover, OP4 mostly equals OP1 (thus, we speak of 14 trajectories in our paper). For completeness, however, we provide all recorded trajectories here.

  14. Community Survey: 2021 Random Sample Results

    • catalog.data.gov
    • data.bloomington.in.gov
    Updated May 20, 2023
    Cite
    data.bloomington.in.gov (2023). Community Survey: 2021 Random Sample Results [Dataset]. https://catalog.data.gov/dataset/community-survey-2021-random-sample-results-69942
    Explore at:
    Dataset updated
    May 20, 2023
    Dataset provided by
    data.bloomington.in.gov
    Description

    A random sample of households was invited to participate in this survey. In the dataset, you will find the respondent-level data in each row, with the questions in each column. The numbers represent a scale option from the survey, such as 1=Excellent, 2=Good, 3=Fair, 4=Poor. The question stem, response option, and scale information for each field can be found in the "variable labels" and "value labels" sheets.

    VERY IMPORTANT NOTE: The scientific survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more "weight" and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to the City of Bloomington. The easiest way to replicate these results is likely to create pivot tables and use the sum of the "wt" field rather than a count of responses.
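
    For example, a weighted response distribution can be computed with pandas by summing "wt" per scale option. A minimal sketch (the file name and the question column "q1" are hypothetical; "wt" is the weight field described above):

    import pandas as pd

    df = pd.read_csv("community_survey_2021.csv")

    # Sum the weights per scale option instead of counting rows.
    weighted = df.groupby("q1")["wt"].sum()
    print(weighted / weighted.sum())  # weighted proportions per response option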

  15. Peatland Decomposition Database (1.1.0)

    • data.niaid.nih.gov
    Updated Mar 5, 2025
    Cite
    Teickner, Henning (2025). Peatland Decomposition Database (1.1.0) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11276064
    Explore at:
    Dataset updated
    Mar 5, 2025
    Dataset provided by
    Teickner, Henning
    Knorr, Klaus-Holger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1 Introduction

    The Peatland Decomposition Database (PDD) stores data from published litterbag experiments related to peatlands. Currently, the database focuses on northern peatlands and Sphagnum litter and peat, but it also contains data from some vascular plant litterbag experiments. At present, it contains entries from 34 studies, 2,160 litterbag experiments, and 7,297 individual samples with 117,841 measurements for various attributes (e.g. relative mass remaining, N content, holocellulose content, mesh size). The aim is to provide a harmonized data source that can be useful to re-analyse existing data and to plan future litterbag experiments.

    The Peatland Productivity and Decomposition Parameter Database (PPDPD) (Bona et al. 2018) is similar to the Peatland Decomposition Database (PDD) in that both contain data from peatland litterbag experiments. The differences are that the two databases partly contain different data, that PPDPD additionally contains information on vegetation productivity, which PDD does not, and that PDD provides more information and metadata on litterbag experiments, including measurement errors.

    2 Updates

    Compared to version 1.0.0, this version has a new structure for table experimental_design_format, contains additional metadata on the experimental design (these were omitted in version 1.0.0), and contains the scripts that were used to import the data into the database.

    3 Methods

    3.1 Data collection

    Data for the database was collected from published litterbag studies, by extracting published data from figures, tables, or other data sources, and by contacting the authors of the studies to obtain raw data. All data processing was done with R (R version 4.2.0 (2022-04-22)) (R Core Team 2022).

    Studies were identified via a Scopus search with the search string (TITLE-ABS-KEY ( peat* AND ( "litter bag" OR "decomposition rate" OR "decay rate" OR "mass loss")) AND NOT ("tropic*")) (2022-12-17). These studies were further screened to exclude those which do not contain litterbag data or which recycle data from other studies that have already been considered. Additional studies with litterbag experiments in northern peatlands we were aware of, but which were not identified in the literature search, were added to the list of publications. For studies not older than 10 years, authors were contacted to obtain raw data; however, this was successful only in a few cases. To date, the database focuses on Sphagnum litterbag experiments, and not all data from the studies identified by the literature search have been included in the database yet.

    Data from figures were extracted using the package ‘metaDigitise’ (1.0.1) (Pick, Nakagawa, and Noble 2018). Data from tables were extracted manually.

    Data from the following studies are currently included: Farrish and Grigal (1985), Bartsch and Moore (1985), Farrish and Grigal (1988), Vitt (1990), Hogg, Lieffers, and Wein (1992), Sanger, Billett, and Cresser (1994), Hiroki and Watanabe (1996), Szumigalski and Bayley (1996), Prevost, Belleau, and Plamondon (1997), Arp, Cooper, and Stednick (1999), Robbert A. Scheffer and Aerts (2000), R. A. Scheffer, Van Logtestijn, and Verhoeven (2001), Limpens and Berendse (2003), Waddington, Rochefort, and Campeau (2003), Asada, Warner, and Banner (2004), Thormann, Bayley, and Currah (2001), Trinder, Johnson, and Artz (2008), Breeuwer et al. (2008), Trinder, Johnson, and Artz (2009), Bragazza and Iacumin (2009), Hoorens, Stroetenga, and Aerts (2010), Straková et al. (2010), Straková et al. (2012), Orwin and Ostle (2012), Lieffers (1988), Manninen et al. (2016), Johnson and Damman (1991), Bengtsson, Rydin, and Hájek (2018a), Bengtsson, Rydin, and Hájek (2018b), Asada and Warner (2005), Bengtsson, Granath, and Rydin (2017), Bengtsson, Granath, and Rydin (2016), Hagemann and Moroni (2015), Hagemann and Moroni (2016), B. Piatkowski et al. (2021), B. T. Piatkowski et al. (2021), Mäkilä et al. (2018), Golovatskaya and Nikonova (2017), Golovatskaya and Nikonova (2017).

    4 Database records

    The database is a ‘MariaDB’ database and the database schema was designed to store data and metadata following the Ecological Metadata Language (EML) (Jones et al. 2019). Descriptions of the tables are shown in Tab. 1.

    The database contains general metadata relevant for litterbag experiments (e.g., geographical, temporal, and taxonomic coverage, mesh sizes, experimental design). However, it does not contain detailed descriptions of sample handling, sample preprocessing methods, or study sites, because there currently are no discipline-specific metadata and reporting standards.

    Table 1: Description of the individual tables in the database.

    Name - Description

    attributes - Defines the attributes of the database and the values in column attribute_name in table data.

    citations - Stores bibtex entries for references and data sources.

    citations_to_datasets - Links entries in table citations with entries in table datasets.

    custom_units - Stores custom units.

    data - Stores measured values for samples, for example remaining masses.

    datasets - Lists the individual datasets.

    experimental_design_format - Stores information on the experimental design of litterbag experiments.

    measurement_scales, measurement_scales_date_time, measurement_scales_interval, measurement_scales_nominal, measurement_scales_ordinal, measurement_scales_ratio - Defines data value types.

    missing_value_codes - Defines how missing values are encoded.

    samples - Stores information on individual samples.

    samples_to_samples - Links samples to other samples, for example litter samples collected in the field to litter samples collected during the incubation of the litterbags.

    units, unit_types - Stores information on measurement units.
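
    As a sketch of how these tables can be queried once the MariaDB dump is loaded (connection details are placeholders; the table data and its column attribute_name are documented in Tab. 1 and 2, all other identifiers are assumptions):

    import pymysql

    conn = pymysql.connect(host="localhost", user="user",
                           password="password", database="pdd")
    with conn.cursor() as cur:
        # Count measurements per attribute, e.g. relative mass remaining.
        cur.execute("SELECT attribute_name, COUNT(*) "
                    "FROM data GROUP BY attribute_name")
        for name, n in cur.fetchall():
            print(name, n)
    conn.close()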

    5 Attributes

    Table 2: Definition of attributes in the Peatland Decomposition Database and entries in the column attribute_name in table data.

    Name Definition Example value Unit Measurement scale Number type Minimum value Maximum value String format

    4_hydroxyacetophenone_mass_absolute A numeric value representing the content of 4-hydroxyacetophenone, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA

    4_hydroxyacetophenone_mass_relative_mass A numeric value representing the content of 4-hydroxyacetophenone, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA

    4_hydroxybenzaldehyde_mass_absolute A numeric value representing the content of 4-hydroxybenzaldehyde, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA

    4_hydroxybenzaldehyde_mass_relative_mass A numeric value representing the content of 4-hydroxybenzaldehyde, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA

    4_hydroxybenzoic_acid_mass_absolute A numeric value representing the content of 4-hydroxybenzoic acid, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA

    4_hydroxybenzoic_acid_mass_relative_mass A numeric value representing the content of 4-hydroxybenzoic acid, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA

    abbreviation In table custom_units: A string representing an abbreviation for the custom unit. gC NA nominal NA NA NA NA

    acetone_extractives_mass_absolute A numeric value representing the content of acetone extractives, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA

    acetone_extractives_mass_relative_mass A numeric value representing the content of acetone extractives, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA

    acetosyringone_mass_absolute A numeric value representing the content of acetosyringone, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA

    acetosyringone_mass_relative_mass A numeric value representing the content of acetosyringone, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA

    acetovanillone_mass_absolute A numeric value representing the content of acetovanillone, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA

    acetovanillone_mass_relative_mass A numeric value representing the content of acetovanillone, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA

    arabinose_mass_absolute A numeric value representing the content of arabinose, as described in Straková et al. (2010). 0.26 g ratio real 0 Inf NA

    arabinose_mass_relative_mass A numeric value representing the content of arabinose, as described in Straková et al. (2010). 0.26 g/g ratio real 0 1 NA

    ash_mass_absolute A numeric value representing the content of ash (after burning at 550°C). 4 g ratio real 0 Inf NA

    ash_mass_relative_mass A numeric value representing the content of ash (after burning at 550°C). 0.05 g/g ratio real 0 Inf NA

    attribute_definition A free text field with a textual description of the meaning of attributes in the dpeatdecomposition database. NA NA nominal NA NA NA NA

    attribute_name A string describing the names of the attributes in all tables of the dpeatdecomposition database. attribute_name NA nominal NA NA NA NA

    bibtex A string representing the bibtex code used for a literature reference throughout the dpeatdecomposition database. Galka.2021 NA nominal NA NA NA NA

    bounds_maximum A numeric value representing the maximum possible value for a numeric attribute. 0 NA interval real Inf Inf NA

    bounds_minimum A numeric value representing the minimum possible value for a numeric attribute. INF NA interval real Inf Inf NA

    bulk_density A numeric value representing the bulk density of the sample [g cm-3]. 0.2 g/cm^3 ratio real 0 Inf NA

    C_absolute The absolute mass of C in the sample. 1 g ratio real 0 Inf NA

    C_relative_mass The mass of C in the sample relative to the total sample mass. 1 g/g ratio real 0 Inf NA

    C_to_N A numeric value representing the C to N ratio of the sample. 35 g/g ratio real 0 Inf NA

    C_to_P A numeric value representing the C to P ratio of the sample. 35 g/g ratio real 0 Inf NA

    Ca_absolute The

  16. Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028

    • statista.com
    Updated Nov 21, 2024
    Cite
    Statista (2024). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Nov 21, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept, though, as just two percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.

  17. 2016-17 - 2020-21 End-of-Year Borough Attendance and Chronic Absenteeism Data

    • catalog.data.gov
    • data.cityofnewyork.us
    Updated Nov 29, 2024
    Cite
    data.cityofnewyork.us (2024). 2016-17 - 2020-21 End-of-Year Borough Attendance and Chronic Absenteeism Data [Dataset]. https://catalog.data.gov/dataset/2016-17-2020-21-end-of-year-borough-attendance-and-chronic-absenteeism-data
    Explore at:
    Dataset updated
    Nov 29, 2024
    Dataset provided by
    data.cityofnewyork.us
    Description

    Overall attendance data include students in Districts 1-32 and 75 (Special Education). Students in District 79 (Alternative Schools & Programs), charter schools, home schooling, and home and hospital instruction are excluded. Pre-K data do not include NYC Early Education Centers or District Pre-K Centers; therefore, Pre-K data are limited to those who attend K-12 schools that offer Pre-K. Transfer schools are included in citywide, borough, and district counts but removed from school-level files. Attendance is attributed to the school the student attended at the time. If a student attends multiple schools in a school year, the student will contribute data towards multiple schools.

    Starting in 2020-21, the NYC DOE transitioned to NYSED's definition of chronic absenteeism. Students are considered chronically absent if they have an attendance of 90 percent or less (i.e. students who are absent 10 percent or more of the total days). In order to be included in chronic absenteeism calculations, students must be enrolled for at least 10 days (regardless of whether present or absent) and must have been present for at least 1 day. The NYSED chronic absenteeism definition is applied to all prior years in the report. School-level chronic absenteeism data reflect chronic absenteeism at a particular school. In order to eliminate double-counting students in chronic absenteeism counts, calculations at the district, borough, and citywide levels include all attendance data that contribute to the given geographic category. For example, if a student was chronically absent at one school but not at another, the student would only be counted once in the citywide calculation. For this reason, chronic absenteeism counts will not align across files.

    All demographic data are based on a student's most recent record in a given year. Students With Disabilities (SWD) data do not include Pre-K students since Pre-K students are screened for IEPs only at the parents' request. English language learner (ELL) data do not include Pre-K students since the New York State Education Department only begins administering assessments to be identified as an ELL in Kindergarten. Only grades PK-12 are shown, but calculations for "All Grades" also include students missing a grade level, so PK-12 may not add up to "All Grades". Data include students missing a gender, but are not shown due to small cell counts. Data for Asian students include Native Hawaiian or Other Pacific Islanders. Multi-racial and Native American students, as well as students missing ethnicity/race data, are included in the "Other" ethnicity category.

    In order to comply with the Family Educational Rights and Privacy Act (FERPA) regulations on public reporting of education outcomes, rows with five or fewer students are suppressed and have been replaced with an "s". Using total days of attendance as a proxy, rows with 900 or fewer total days are suppressed. In addition, other rows have been replaced with an "s" when they could reveal, through addition or subtraction, the underlying numbers that have been redacted. Chronic absenteeism values are suppressed, regardless of total days, if the number of students who contribute at least 20 days is five or fewer.

    Due to the COVID-19 pandemic and the resulting shift to remote learning in March 2020, 2019-20 attendance data was only available for September 2019 through March 13, 2020. Interactions data from the spring of 2020 are reported on a separate tab. Interactions were reported by schools during remote learning, from April 6, 2020 through June 26, 2020 (a total of 57 instructional days, excluding special professional development days of June 4 and June 9). Schools were required to indicate any student from their roster that did not have an interaction on a given day. Schools were able to define interactions in a way that made sense for their students and families. Definitions of an interaction included: • Student submission of an assignment or completion of an
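
    The chronic-absenteeism rule above translates directly into a small pandas check. A minimal sketch (the student-level file and its column names are hypothetical):

    import pandas as pd

    df = pd.read_csv("attendance.csv")

    # Eligibility: enrolled at least 10 days and present at least 1 day.
    eligible = (df["days_enrolled"] >= 10) & (df["days_present"] >= 1)
    rate = df["days_present"] / df["days_enrolled"]

    # Chronically absent: attendance of 90 percent or less.
    df.loc[eligible, "chronically_absent"] = rate[eligible] <= 0.90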

  18. FOI 26605 - Datasets - Open Data Portal

    • opendata.nhsbsa.net
    Updated Nov 24, 2022
    Cite
    (2022). FOI 26605 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-26605
    Explore at:
    Dataset updated
    Nov 24, 2022
    License

    Open Government Licence 2.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/
    License information was derived automatically

    Description

    Further to the original Enterprise Application request, the contract below has expired. Please provide the current status: Finance - Capita; CRM - Trustmarque Solutions Ltd.

    I'd like to apologise for the length of this request, and how tedious it may be to handle. That being said, please make an effort to provide all of this information. The information I'm requesting is regarding the software contracts that the organisation uses, for the following fields:

    • Enterprise Resource Planning Software Solution (ERP)
    • Primary Customer Relationship Management Solution (CRM): for example, Salesforce, Lagan CRM, Microsoft Dynamics; software of this nature
    • Primary Human Resources (HR) and Payroll Software Solution: for example, iTrent, ResourceLink, HealthRoster; software of this nature
    • The organisation's primary corporate Finance Software Solution: for example, Agresso, Integra, Sapphire Systems; software of this nature

    For each contract, please provide:

    • Name of Supplier: the software provider for each contract
    • The brand of the software: the actual name of the software, not the supplier name
    • Description of the contract: detailed information about the contract, stating whether upgrade, maintenance and support are included, and listing the software modules covered
    • Number of Users/Licenses: the total number of users/licenses for the contract
    • Annual Spend: the annual average spend for the contract
    • Contract Duration: the duration of the contract, including any available extensions
    • Contract Start Date: the start date of the contract (month and year, DD-MM-YY or MM-YY)
    • Contract Expiry: the expiry date of the contract (month and year, DD-MM-YY or MM-YY)
    • Contract Review Date: the review date of the contract (month and year); if this cannot be provided, please estimate when the contract is likely to be reviewed
    • Contact Details: the full contact details of the person within the organisation responsible for this particular software contract (name, job title, email, contact number)

  19. InterAgencyFirePerimeterHistory All Years View - Dataset - CKAN

    • nationaldataplatform.org
    Updated Feb 28, 2024
    Cite
    (2024). InterAgencyFirePerimeterHistory All Years View - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/interagencyfireperimeterhistory-all-years-view
    Explore at:
    Dataset updated
    Feb 28, 2024
    Description

    Historical Fires

    Last updated on 06/17/2022

    Overview

    The national fire history perimeter data layer of conglomerated agency-authoritative perimeters was developed in support of the WFDSS application and wildfire decision support for the 2021 fire season. The layer encompasses the final fire perimeter datasets of the USDA Forest Service, US Department of Interior Bureau of Land Management, Bureau of Indian Affairs, Fish and Wildlife Service, and National Park Service, the Alaska Interagency Fire Center, CalFire, and WFIGS History. Perimeters are included through the 2021 fire season. Requirements for fire perimeter inclusion, such as minimum acreage requirements, are set by the contributing agencies. WFIGS, NPS and CALFIRE data now include Prescribed Burns.

    Data Input

    Several data sources were used in the development of this layer:

    • Alaska fire history
    • USDA FS Regional Fire History Data
    • BLM Fire Planning and Fuels
    • National Park Service - includes Prescribed Burns
    • Fish and Wildlife Service
    • Bureau of Indian Affairs
    • CalFire FRAS - includes Prescribed Burns
    • WFIGS - BLM & BIA and other S&L

    Data Limitations

    Fire perimeter data are often collected at the local level, and fire management agencies have differing guidelines for submitting fire perimeter data. Often data are collected by agencies only once annually. If you do not see your fire perimeters in this layer, they were not present in the sources used to create the layer at the time the data were submitted. A companion service for perimeters entered into the WFDSS application is also available; if a perimeter is found in the WFDSS service that is missing in this agency-authoritative service, or a perimeter is missing in both services, please contact the appropriate agency Fire GIS Contact listed in the table below.

    Attributes

    This dataset implements the NWCG Wildland Fire Perimeters (polygon) data standard: https://www.nwcg.gov/sites/default/files/stds/WildlandFirePerimeters_definition.pdf

    IRWINID - Primary key for linking to the IRWIN Incident dataset. The origin of this GUID is the wildland fire locations point data layer. (This unique identifier may NOT replace the GeometryID core attribute.)

    INCIDENT - The name assigned to an incident; assigned by the responsible land management unit. (IRWIN required). Officially recorded name.

    FIRE_YEAR (alias) - Calendar year in which the fire started. Example: 2013. Value is of type integer (FIRE_YEAR_INT).

    AGENCY - Agency assigned for this fire; should be based on jurisdiction at origin.

    SOURCE - System/agency source of record from which the perimeter came.

    DATE_CUR - The last edit, update, or other valid date of this GIS record. Example: mm/dd/yyyy.

    MAP_METHOD - Controlled vocabulary defining how the geospatial feature was derived; map method may help define data quality. Values: GPS-Driven; GPS-Flight; GPS-Walked; GPS-Walked/Driven; GPS-Unknown Travel Method; Hand Sketch; Digitized-Image; Digitized-Topo; Digitized-Other; Image Interpretation; Infrared Image; Modeled; Mixed Methods; Remote Sensing Derived; Survey/GCDB/Cadastral; Vector; Other

    GIS_ACRES - GIS-calculated acres within the fire perimeter. Not adjusted for unburned areas within the fire perimeter. Total should include 1 decimal place. (ArcGIS: Precision=10; Scale=1). Example: 23.9

    UNQE_FIRE_ - Unique fire identifier: Year-Unit Identifier-Local Incident Identifier (yyyy-SSXXX-xxxxxx), where SS = state or international code, XXX or XXXX = a code assigned to an organizational unit, and xxxxx = alphanumeric with hyphens or periods. The unit identifier portion corresponds to the POINT OF ORIGIN RESPONSIBLE AGENCY UNIT IDENTIFIER (POOResponsibleUnit) from the responsible unit's corresponding fire report. Example: 2013-CORMP-000001

    LOCAL_NUM - Local incident identifier (dispatch number). A number or code that uniquely identifies an incident for a particular local fire management organization within a particular calendar year. Field is a string to allow for leading zeros when the local incident identifier is less than 6 characters. (IRWIN required). Example: 123456.

    UNIT_ID - NWCG unit identifier of the landowner/jurisdictional agency unit at the point of origin of a fire. (NFIRS ID should be used only when no NWCG unit identifier exists.) Example: CORMP

    COMMENTS - Additional information describing the feature. Free text.

    FEATURE_CA - Type of wildland fire polygon: Wildfire (represents the final fire perimeter or last daily fire perimeter available), Prescribed Fire, or Unknown.

    GEO_ID - Primary key for linking geospatial objects with other database systems. Required for every feature. This field may be renamed for each standard to fit the feature. Globally unique identifier (GUID).

    Cross-walk from sources (GeoID) and other processing notes

    • AK: GEOID = OBJECTID of the provided file geodatabase (4,580 records through 2021); other federal sources for AK data removed.
    • CA: GEOID = OBJECTID of the downloaded file geodatabase (12,776 records; federal fires removed; includes RX).
    • FWS: GEOID = OBJECTID of the service download, combined history 2005-2021 (2,052 records). A handful of WFIGS fires (11) were added that were not in the FWS record.
    • BIA: GEOID = "FireID" for the provided 2017/2018 data (416 records) or WFDSS PID (415 records). An additional 917 fires from WFIGS were added; GEOID = GLOBALID in source.
    • NPS: GEOID = EVENT ID (IRWINID or FRM_ID from FOD); 29,943 records, includes RX.
    • BLM: GEOID = GUID from BLM FPER and GLOBALID from WFIGS. Date Current = best available of modify_date, create_date, fire_cntrl_dt, or fire_dscvr_dt to reduce the number of 9999 entries in FireYear. Sources: FPER (25,389 features), WFIGS (5,357 features).
    • USFS: GEOID = GLOBALID in source; 46,574 features. Date Current fixed to the best available of perimeterdatetime, revdate, discoverydatetime, or dbsourcedate to reduce the number of 1899 entries in FireYear.

    Relevant Websites and References

    • Alaska Fire Service: https://afs.ak.blm.gov/
    • CALFIRE: https://frap.fire.ca.gov/mapping/gis-data
    • BIA: data prior to 2017 from WFDSS; 2017-2018 agency provided; 2019 and after from WFIGS
    • BLM: https://gis.blm.gov/arcgis/rest/services/fire/BLM_Natl_FirePerimeter/MapServer
    • NPS: new dataset provided by NPS Fire & Aviation GIS, cross-checked against WFIGS for any missing perimeters in 2021: https://nifc.maps.arcgis.com/home/item.html?id=098ebc8e561143389ca3d42be3707caa
    • FWS: https://services.arcgis.com/QVENGdaPbd4LUkLV/arcgis/rest/services/USFWS_Wildfire_History_gdb/FeatureServer
    • USFS: https://apps.fs.usda.gov/arcx/rest/services/EDW/EDW_FireOccurrenceAndPerimeter_01/MapServer

    Agency Fire GIS Contacts

    • RD&A Data Manager: VACANT
    • Susan McClendon, WFM RD&A GIS Specialist: 208-258-4244
    • Jill Kuenzi, USFS-NIFC: 208.387.5283
    • Joseph Kafka, BIA-NIFC: 208.387.5572
    • Cameron Tongier, USFWS-NIFC: 208.387.5712
    • Skip Edel, NPS-NIFC: 303.969.2947
    • Julie Osterkamp, BLM-NIFC: 208.258.0083
    • Jennifer L. Jenkins, Alaska Fire Service: 907.356.5587
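
    A minimal sketch of filtering the layer by the attributes above with geopandas (the local file name is hypothetical; field names follow the NWCG standard listed above):

    import geopandas as gpd

    perimeters = gpd.read_file("InterAgencyFirePerimeterHistory.gpkg")

    # Final wildfire perimeters from the 2021 fire season
    # (FIRE_YEAR is described as integer-valued via FIRE_YEAR_INT).
    fires_2021 = perimeters[(perimeters["FIRE_YEAR"] == 2021)
                            & (perimeters["FEATURE_CA"] == "Wildfire")]
    print(fires_2021[["INCIDENT", "AGENCY", "GIS_ACRES"]].head())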

  20. Data from: The Berth Allocation Problem with Channel Restrictions - Datasets

    • researchdatafinder.qut.edu.au
    • researchdata.edu.au
    Updated Jun 22, 2018
    Cite
    Dr Paul Corry (2018). The Berth Allocation Problem with Channel Restrictions - Datasets [Dataset]. https://researchdatafinder.qut.edu.au/individual/n4992
    Explore at:
    Dataset updated
    Jun 22, 2018
    Dataset provided by
    Queensland University of Technology (QUT)
    Authors
    Dr Paul Corry
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These datasets relate to the computational study presented in the paper The Berth Allocation Problem with Channel Restrictions, authored by Paul Corry and Christian Bierwirth. They consist of all the randomly generated problem instances along with the computational results presented in the paper.

    Results across all problem instances assume ship separation parameters of [delta_1, delta_2, delta_3] = [0.25, 0, 0.5].

    Excel Workbook Organisation:

    The data is organised into separate Excel files for each table in the paper, as indicated by the file description. Within each file, each row of data presented in the corresponding table (aggregating 10 replications) is captured in two worksheets: one with the problem instance data, and the other with solution data generated by several solution methods (described in the paper). For example, row 3 of Tab. 2 will have data for 10 problem instances on worksheet T2R3, and corresponding solution data on T2R3X.

    Problem Instance Data Format:

    On each problem instance worksheet (e.g. T2R3), each row of data corresponds to a different problem instance, and there are 10 replications on each worksheet.

    The first column provides a replication identifier which is referenced on the corresponding solution worksheet (e.g. T2R3X).

    Following this, there are n*(2c+1) columns (n = number of ships, c = number of channel segments) with headers p(i)_(j).(k)., where i references the operation (channel transit/berth visit) id, j references the ship id, and k references the index of the operation within the ship. All indexing starts at 0. These columns define the transit or dwell times on each segment. A value of -1 indicates a segment on which a berth allocation must be applied, and hence the dwell time is unknown.

    There are then a further n columns with headers r(j), defining the release times of each ship.

    For ChSP problems, there are a final n columns with headers b(j), defining the berth to be visited by each ship. ChSP problems with fixed berth sequencing enforced have an additional n columns with headers toa(j), indicating the order in which ship j sits within its berth sequence. For BAP-CR problems, these columns are not present but are replaced by n*m columns (m = number of berths) with headers p(j).(b), defining the berth processing time of ship j if allocated to berth b.
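
    The p(i)_(j).(k). headers can be unpacked programmatically. A minimal sketch with pandas and a regular expression (the file and sheet names are hypothetical):

    import re
    import pandas as pd

    inst = pd.read_excel("Tab2.xlsx", sheet_name="T2R3")

    header = re.compile(r"p\((\d+)\)_\((\d+)\)\.\((\d+)\)\.?")
    times = {}
    for col in inst.columns:
        m = header.match(str(col))
        if m:
            op_id, ship, k = map(int, m.groups())
            # Transit/dwell times; -1 marks a berth whose dwell time is unknown.
            times[(op_id, ship, k)] = inst[col]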

    Solution Data Format:

    Each row of data corresponds to a different solution.

    Column A references the replication identifier (from the corresponding instance worksheet) that the solution refers to.

    Column B defines the algorithm that was used to generate the solution.

    Column C shows the objective function value (total waiting and excess handling time) obtained.

    Column D shows the CPU time consumed in generating the solution, rounded to the nearest second.

    Column E shows the optimality gap as a proportion. A value of -1 or an empty value indicates that the optimality gap is unknown.

    From column F onwards, there are n*(2c+1) columns with the previously described p(i)_(j).(k). headers. The values in these columns define the entry times at each segment.

    For BAP-CR problems only, following this there are a further 2n columns. For each ship j, there will be columns titled b(j) and p.b(j) defining the berth that was allocated to ship j, and the processing time on that berth respectively.

