73 datasets found

Z
Dataset: A Systematic Literature Review on the topic of High-value datasets
data.niaid.nih.gov
Updated Jun 23, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anastasija Nikiforova (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424
Explore at:
Dataset updated
Jun 23, 2023
Dataset provided by
Andrea Miletič
Charalampos Alexopoulos
Magdalena Ciesielska
Nina Rizun
Anastasija Nikiforova
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb) It being made public both to act as supplementary data for "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (pre-print is available in Open Access here -> https://arxiv.org/abs/2305.10234) and in order for other researchers to use these data in their own work.

The protocol is intended for the Systematic Literature review on the topic of High-value Datasets with the aim to gather information on how the topic of High-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected in the result of the SLR over Scopus, Web of Science, and Digital Government Research library (DGRL) in 2023.

Methodology

To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out to by searching digital libraries covered by Scopus, Web of Science (WoS), Digital Government Research library (DGRL).

These databases were queried for keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the number of papers to those, where these objects were primary research objects rather than mentioned in the body, e.g., as a future work. After deduplication, 11 articles were found unique and were further checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.

To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.

Test procedure Each study was independently examined by at least two authors, where after the in-depth examination of the full-text of the article, the structured protocol has been filled for each study. The structure of the survey is available in the supplementary file available (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx) The data collected for each study by two researchers were then synthesized in one final version by the third researcher.

Description of the data in this data set

Protocol_HVD_SLR provides the structure of the protocol Spreadsheets #1 provides the filled protocol for relevant studies. Spreadsheet#2 provides the list of results after the search over three indexing databases, i.e. before filtering out irrelevant studies

The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information

Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet 2) Complete reference - the complete source information to refer to the study 3) Year of publication - the year in which the study was published 4) Journal article / conference paper / book chapter - the type of the paper -{journal article, conference paper, book chapter} 5) DOI / Website- a link to the website where the study can be found 6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science 7) Availability in OA - availability of an article in the Open Access 8) Keywords - keywords of the paper as indicated by the authors 9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

Approach- and research design-related information 10) Objective / RQ - the research objective / aim, established research questions 11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analy-sis (country, organisation, specific unit that has been ana-lysed, e.g., the number of use-cases, scope of the SLR etc.) 12) Contributions - the contributions of the study 13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach? 14) Availability of the underlying research data- whether there is a reference to the publicly available underly-ing research data e.g., transcriptions of interviews, collected data, or explanation why these data are not shared? 15) Period under investigation - period (or moment) in which the study was conducted 16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?

Quality- and relevance- related information
17) Quality concerns - whether there are any quality concerns (e.g., limited infor-mation about the research methods used)? 18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, sec-ondary - mentioned but not studied (e.g., as part of discus-sion, future work etc.))

HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term? 20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, “input -> output") 21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the rela-tionships between these components? (detailed description) 22) Stakeholders and their roles - what stakeholders or actors does HVD determination in-volve? What are their roles? 23) Data - what data do HVD cover? 24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)

Format of the file .xls, .csv (for the first spreadsheet only), .odt, .docx

Licenses or restrictions CC-BY

For more info, see README.txt
5 elements that define what a thesis statement is
kaggle.com
Updated Feb 15, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shanda Murray (2021). 5 elements that define what a thesis statement is [Dataset]. https://kaggle.com/shandamurray/5-elements-that-define-what-a-thesis-statement-is
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 15, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Shanda Murray
Description
Dataset

This dataset was created by Shanda Murray

Contents
Z
Conceptualization of public data ecosystems
data.niaid.nih.gov
Updated Sep 26, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Martin, Lnenicka (2024). Conceptualization of public data ecosystems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13842001
Explore at:
Dataset updated
Sep 26, 2024
Dataset provided by
Anastasija, Nikiforova
Martin, Lnenicka
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).

As there is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems and how these elements form and shape the development of these ecosystems over time, which can lead to misguided efforts to develop future public data ecosystems, the aim of the study is: (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, this study presents a typology of public data ecosystems and develops a conceptual model of elements and formative characteristics that contribute most to value-adding public data ecosystems, and develops a conceptual model of the evolutionary generation of public data ecosystems represented by six generations called Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.

This dataset is being made public both to act as supplementary data for "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems ", Telematics and Informatics*, and its Systematic Literature Review component that informs the study.

Description of the data in this data set

PublicDataEcosystem_SLR provides the structure of the protocol

Spreadsheet#1 provides the list of results after the search over three indexing databases and filtering out irrelevant studies

Spreadsheets #2 provides the protocol structure.

Spreadsheets #3 provides the filled protocol for relevant studies.

The information on each selected study was collected in four categories:(1) descriptive information,(2) approach- and research design- related information,(3) quality-related information,(4) HVD determination-related information

Descriptive Information

Article number

A study number, corresponding to the study number assigned in an Excel worksheet

Complete reference

The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.

Year of publication

The year in which the study was published.

Journal article / conference paper / book chapter

The type of the paper, i.e., journal article, conference paper, or book chapter.

Journal / conference / book

Journal article, conference, where the paper is published.

DOI / Website

A link to the website where the study can be found.

Number of words

A number of words of the study.

Number of citations in Scopus and WoS

The number of citations of the paper in Scopus and WoS digital libraries.

Availability in Open Access

Availability of a study in the Open Access or Free / Full Access.

Keywords

Keywords of the paper as indicated by the authors (in the paper).

Relevance for our study (high / medium / low)

What is the relevance level of the paper for our study

Approach- and research design-related information

Approach- and research design-related information

Objective / Aim / Goal / Purpose & Research Questions

The research objective and established RQs.

Research method (including unit of analysis)

The methods used to collect data in the study, including the unit of analysis that refers to the country, organisation, or other specific unit that has been analysed such as the number of use-cases or policy documents, number and scope of the SLR etc.

Study’s contributions

The study’s contribution as defined by the authors

Qualitative / quantitative / mixed method

Whether the study uses a qualitative, quantitative, or mixed methods approach?

Availability of the underlying research data

Whether the paper has a reference to the public availability of the underlying research data e.g., transcriptions of interviews, collected data etc., or explains why these data are not openly shared?

Period under investigation

Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)

Use of theory / theoretical concepts / approaches? If yes, specify them

Does the study mention any theory / theoretical concepts / approaches? If yes, what theory / concepts / approaches? If any theory is mentioned, how is theory used in the study? (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested theory, theory mentioned in the future research section).

Quality-related information

Quality concerns

Whether there are any quality concerns (e.g., limited information about the research methods used)?

Public Data Ecosystem-related information

Public data ecosystem definition

How is the public data ecosystem defined in the paper and any other equivalent term, mostly infrastructure. If an alternative term is used, how is the public data ecosystem called in the paper?

Public data ecosystem evolution / development

Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?

What constitutes a public data ecosystem?

What constitutes a public data ecosystem (components & relationships) - their "FORM / OUTPUT" presented in the paper (general description with more detailed answers to further additional questions).

Components and relationships

What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).

Stakeholders

What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?

Actors and their roles

What actors does the public data ecosystem involve? What are their roles?

Data (data types, data dynamism, data categories etc.)

What data do the public data ecosystem cover (is intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static data, dynamic, real-time data, stream), prevailing data categories / domains / topics etc.

Processes / activities / dimensions, data lifecycle phases

What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?

Level (if relevant)

What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).

Other elements or relationships (if any)

What other elements or relationships does the public data ecosystem consist of?

Additional comments

Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).

New papers

Does the study refer to any other potentially relevant papers?

Additional references to potentially relevant papers that were found in the analysed paper (snowballing).

Format of the file.xls, .csv (for the first spreadsheet only), .docx

Licenses or restrictionsCC-BY

For more info, see README.txt
w
Global Urban Definitions
datacatalog.worldbank.org
utf-8, zip
Updated Dec 12, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Benjamin P. Stewart (2018). Global Urban Definitions [Dataset]. https://datacatalog.worldbank.org/search/dataset/0040216
Explore at:
zip, utf-8Available download formats
Dataset updated
Dec 12, 2018
Dataset provided by
Benjamin P. Stewart
License
https://datacatalog.worldbank.org/public-licenses?fragment=cchttps://datacatalog.worldbank.org/public-licenses?fragment=cc
Description
There are a number of different ways to define what is urban, and this dataset contains the geospatial data used by the World Bank Group to assess various definitions. Specifically, there are the following datasets:
- European Commissions methodology applied to Landscan 2012
- Agglomeration Index

A detailed comparison of the datasets can be found in this paper: Urbanization and Development: Is Latin America and the Caribbean Different from the Rest of the World?
f
What is your definition of Big Data? Researchers’ understanding of the...
plos.figshare.com
pdf
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maddalena Favaretto; Eva De Clercq; Christophe Olivier Schneble; Bernice Simone Elger (2023). What is your definition of Big Data? Researchers’ understanding of the phenomenon of the decade [Dataset]. http://doi.org/10.1371/journal.pone.0228987
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0228987
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Maddalena Favaretto; Eva De Clercq; Christophe Olivier Schneble; Bernice Simone Elger
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The term Big Data is commonly used to describe a range of different concepts: from the collection and aggregation of vast amounts of data, to a plethora of advanced digital techniques designed to reveal patterns related to human behavior. In spite of its widespread use, the term is still loaded with conceptual vagueness. The aim of this study is to examine the understanding of the meaning of Big Data from the perspectives of researchers in the fields of psychology and sociology in order to examine whether researchers consider currently existing definitions to be adequate and investigate if a standard discipline centric definition is possible.MethodsThirty-nine interviews were performed with Swiss and American researchers involved in Big Data research in relevant fields. The interviews were analyzed using thematic coding.ResultsNo univocal definition of Big Data was found among the respondents and many participants admitted uncertainty towards giving a definition of Big Data. A few participants described Big Data with the traditional “Vs” definition—although they could not agree on the number of Vs. However, most of the researchers preferred a more practical definition, linking it to processes such as data collection and data processing.ConclusionThe study identified an overall uncertainty or uneasiness among researchers towards the use of the term Big Data which might derive from the tendency to recognize Big Data as a shifting and evolving cultural phenomenon. Moreover, the currently enacted use of the term as a hyped-up buzzword might further aggravate the conceptual vagueness of Big Data.
N
What Cheer, IA Population Breakdown by Gender and Age Dataset: Male and...
neilsberg.com
csv, json
Updated Feb 24, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2025). What Cheer, IA Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e20a1def-f25d-11ef-8c1b-3860777c1fe6/
Explore at:
json, csvAvailable download formats
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
What Cheer, Iowa
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of What Cheer by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for What Cheer. The dataset can be utilized to understand the population distribution of What Cheer by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in What Cheer. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for What Cheer.

Key observations

Largest age group (population): Male # 5-9 years (56) | Female # 20-24 years (38). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Scope of gender :

Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis.

Variables / Data Columns

Age Group: This column displays the age group for the What Cheer population analysis. Total expected values are 18 and are define above in the age groups section.

Population (Male): The male population in the What Cheer is shown in the following column.

Population (Female): The female population in the What Cheer is shown in the following column.

Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in What Cheer for each age group.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for What Cheer Population by Gender. You can refer the same here
A
‘School Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 30, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘School Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-school-dataset-d13a/8b0564bc/?iid=004-885&v=presentation
Explore at:
Dataset updated
Sep 30, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘School Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/smeilisa07/number of school teacher student class on 30 September 2021.

--- Dataset description provided by original source is as follows ---

Context

This is my first analyst data. This dataset i got from open data Jakarta website (http://data.jakarta.go.id/), so mostly the dataset is in Indonesian. But i have try describe it that you can find it on VARIABLE DESCRIPTION.txt file.

Content

The title of this dataset is jumlah-sekolah-guru-murid-dan-ruang-kelas-menurut-jenis-sekolah-2011-2016, with type is CSV, so you can easily access it. If you not understand, the title means the number of school, teacher, student, and classroom according to the type of school 2011 - 2016. I think, if you just read from the title, you can imagine the contents. So this dataset have 50 observations and 8 variables, taken from 2011 until 2016.

In general, this dataset is about the quality of education in Jakarta, which each year some of school level always decreasing and some is increase, but not significant.

Acknowledgements

This dataset comes from Indonesian education authorities, which is already established in the CSV file by Open Data Jakarta.

Inspiration

Althought this data given from Open Data Jakarta publicly, i want always continue to improve my Data Scientist skill, especially in R programming, because i think R programming is easy to learn and really help me to be always curious about Data Scientist. So, this dataset that I am still struggle with below problem, and i need solution.

Question :

How can i cleaning this dataset ? I have try cleaning this dataset, but i still not sure. You can check on
my_hypothesis.txt file, when i try cleaning and visualize this dataset.

How can i specify the model for machine learning ? What recommended steps i should take ?

How should i cluster my dataset, if i want the label is not number but tingkat_sekolah for every tahun and
jenis_sekolah ? You can check on my_hypothesis.txt file.

--- Original source retains full ownership of the source dataset ---
Data from: Dataset of the manuscript "What is local research? Towards a...
zenodo.org
produccioncientifica.ugr.es
bin
Updated Nov 20, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Victoria Di Césare; Victoria Di Césare; Nicolas Robinson-Garcia; Nicolas Robinson-Garcia (2024). Dataset of the manuscript "What is local research? Towards a multidimensional framework linking theory and methods" [Dataset]. http://doi.org/10.5281/zenodo.14190851
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.14190851
Dataset updated
Nov 20, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Victoria Di Césare; Victoria Di Césare; Nicolas Robinson-Garcia; Nicolas Robinson-Garcia
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset of the manuscript "What is local research? Towards a multidimensional framework linking theory and methods". In this research article we propose a theoretical and empirical framework of local research, a concept of growing importance due to its far-reaching implications for public policy. Our motivation stems from the lack of clarity surrounding the increasing yet uncritical use of the term in both scientific publications and policy documents, where local research is conceptualized and measured in many ways. A clear understanding of it is crucial for informed decision-making when setting research agendas, allocating funds, and evaluating and rewarding scientists. Our twofold aim is (1) to compare the existing approaches that define and measure local research, and (2) to assess the implications of applying one over another. We first review the perspectives and measures used since the 1970s. Drawing on spatial scientometrics and proximities, we then build a framework that splits the concept into several dimensions: locally informed research, locally situated research, locally relevant research, locally bound research, and locally governed research. Each dimension is composed of a definition and a methodological approach, which we test in 10 million publications from the Dimensions database. Our findings reveal that these approaches measure distinct and sometimes unaligned aspects of local research, with varying effectiveness across countries and disciplines. This study highlights the complex, multifaceted nature of local research. We provide a flexible framework that facilitates the analysis of these dimensions and their intersections, in an attempt to contribute to the understanding and assessment of local research and its role within the production, dissemination, and impact of scientific knowledge.
Z
Data set - What Defines Quality of Life for Older Patients Diagnosed with...
data.niaid.nih.gov
data.europa.eu
Updated Oct 5, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Marije E. Hamaker (2022). Data set - What Defines Quality of Life for Older Patients Diagnosed with Cancer? A Qualitative Study [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7062210
Explore at:
Dataset updated
Oct 5, 2022
Dataset provided by
Marije E. Hamaker
Siri Rostoft
Seghers, PAL
Shane O'Hanlon
Jolina A. Kregting
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data set from- What Defines Quality of Life for Older Patients Diagnosed with Cancer? A Qualitative Study

Abstract of the study: The treatment of cancer can have a significant impact on quality of life in older patients and this needs to be taken into account in decision making. However, quality of life can consist of many different components with varying importance between individuals. We set out to assess how older patients with cancer define quality of life and the components that are most significant to them. This was a single-centre, qualitative interview study. Patients aged 70 years or older with cancer were asked to answer open-ended questions: What makes life worthwhile? What does quality of life mean to you? What could affect your quality of life? Subsequently, they were asked to choose the five most important determinants of quality of life from a predefined list: cognition, contact with family or with community, independence, staying in your own home, helping others, having enough energy, emotional well-being, life satisfaction, religion and leisure activities. Afterwards, answers to the open-ended questions were independently categorized by two authors. The proportion of patients mentioning each category in the open-ended questions were compared to the predefined questions. Overall, 63 patients (median age 76 years) were included. When asked, “What makes life worthwhile?”, patients identified social functioning (86%) most frequently. Moreover, to define quality of life, patients most frequently mentioned categories in the domains of physical functioning (70%) and physical health (48%). Maintaining cognition was mentioned in 17% of the open-ended questions and it was the most commonly chosen option from the list of determinants (72% of respondents). In conclusion, physical functioning, social functioning, physical health and cognition are important components in quality of life. When discussing treatment options, the impact of treatment on these aspects should be taken into consideration.

Reference of research paper: Seghers PAL, Kregting JA, van Huis-Tanja LH, Soubeyran P, O'Hanlon S, Rostoft S, Hamaker ME, Portielje JEA. What Defines Quality of Life for Older Patients Diagnosed with Cancer? A Qualitative Study. Cancers. 2022; 14(5):1123. https://doi.org/10.3390/cancers14051123

Content of the data set: The first Tab describes what questions were asked, the second tab shows all individual anonymised answers to the open questions, the fourth shows the definitions that were used to classify all answers. Q1-Q4 show how the answers were categorised.
m
AR-ASAG-Dataset
data.mendeley.com
Updated Jul 1, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Leila Ouahrani (2020). AR-ASAG-Dataset [Dataset]. http://doi.org/10.17632/dj95jh332j.1
Explore at:
Unique identifier
https://doi.org/10.17632/dj95jh332j.1
Dataset updated
Jul 1, 2020
Authors
Leila Ouahrani
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The ARabic Dataset for Automatic Short Answer Grading Evaluation V1. ISLRN 529-005-230-448-6. Our dataset consists of reported evaluations relate to answers submitted for three different exams submitted to three classes of students. The exams were conducted under natural conditions of evaluation. Each test consists of 16 short answer questions (a total of 48 questions). For each question, a model answer is proposed. Students submitted answers to these questions.
The number of answers obtained is different from one question to another. The dataset includes a total of 2133 pairs (Model Answer, student answer). the Dataset encompasses 5 types of questions: • "عرف ": Define? • "إشرح": Explain? • "ما النتائج المترتبة على": What consequences? • "علل": Justify? • "ما الفرق": What is the difference

AR-ASAG Dataset is available in different versions: TXT, XML, XML-MOODLE and Database (.DB).
The .DB format allows making the necessary exports according to specific analysis needs.
The XML-MOODLE format is used on Moodle e-learning Platforms For each pair, two grades (Mark1 and Mark2 ) are associated with a manual Average Gold Score Both manual grades are available in the dataset. Inter-Annotators Agreement: - (Pearson Correlation: r=0.8384) - (Root Mean Square Error : RMSE=0.8381). The Dataset can be also used for essay scoring as the students's answers responses take to reach 4-5 sentences. The Dataset exist in TXT, XML, XML-MOODLE Versions The name of the file is representative of its content. We use the term "Mark" to specify "Grade" For privacy reasons, no student identifiers are used in this Dataset.
The NIST Extensible Resource Data Model (NERDm): JSON schemas for rich...
catalog.data.gov
s.cnmilf.com
+1more
Updated Mar 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Institute of Standards and Technology (2025). The NIST Extensible Resource Data Model (NERDm): JSON schemas for rich description of data resources [Dataset]. https://catalog.data.gov/dataset/the-nist-extensible-resource-data-model-nerdm-json-schemas-for-rich-description-of-data-re
Explore at:
Dataset updated
Mar 14, 2025
Dataset provided by
National Institute of Standards and Technologyhttp://www.nist.gov/
Description
The NIST Extensible Resource Data Model (NERDm) is a set of schemas for encoding in JSON format metadatathat describe digital resources. The variety of digital resources it can describe includes not onlydigital data sets and collections, but also software, digital services, web sites and portals, anddigital twins. It was created to serve as the internal metadata format used by the NIST Public DataRepository and Science Portal to drive rich presentations on the web and to enable discovery; however, itwas also designed to enable programmatic access to resources and their metadata by external users.Interoperability was also a key design aim: the schemas are defined using the JSON Schema standard,metadata are encoded as JSON-LD, and their semantics are tied to community ontologies, with an emphasison DCAT and the US federal Project Open Data (POD) models. Finally, extensibility is also central to itsdesign: the schemas are composed of a central core schema and various extension schemas. New extensionsto support richer metadata concepts can be added over time without breaking existing applications.Validation is central to NERDm's extensibility model. Consuming applications should be able to choosewhich metadata extensions they care to support and ignore terms and extensions they don't support.Furthermore, they should not fail when a NERDm document leverages extensions they don't recognize, evenwhen on-the-fly validation is required. To support this flexibility, the NERDm framework allowsdocuments to declare what extensions are being used and where. We have developed an optional extensionto the standard JSON Schema validation (see ejsonschema below) to support flexible validation: while astandard JSON Schema validater can validate a NERDm document against the NERDm core schema, our extensionwill validate a NERDm document against any recognized extensions and ignore those that are notrecognized.The NERDm data model is based around the concept of resource, semantically equivalent to a schema.orgResource, and as in schema.org, there can be different types of resources, such as data sets andsoftware. A NERDm document indicates what types the resource qualifies as via the JSON-LD "@type"property. All NERDm Resources are described by metadata terms from the core NERDm schema; however,different resource types can be described by additional metadata properties (often drawing on particularNERDm extension schemas). A Resource contains Components of various types (includingDCAT-defined Distributions) that are considered part of the Resource; specifically, these can include downloadable data files, hierachical datacollecitons, links to web sites (like software repositories), software tools, or other NERDm Resources.Through the NERDm extension system, domain-specific metadata can be included at either the resource orcomponent level. The direct semantic and syntactic connections to the DCAT, POD, and schema.org schemasis intended to ensure unambiguous conversion of NERDm documents into those schemas.As of this writing, the Core NERDm schema and its framework stands at version 0.7 and is compatible withthe "draft-04" version of JSON Schema. Version 1.0 is projected to be released in 2025. In thatrelease, the NERDm schemas will be updated to the "draft2020" version of JSON Schema. Other improvementswill include stronger support for RDF and the Linked Data Platform through its support of JSON-LD.
d
Sidewalk to Street "Walkability" Ratio
catalog.data.gov
Updated Jan 24, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Western Pennsylvania Regional Data Center (2023). Sidewalk to Street "Walkability" Ratio [Dataset]. https://catalog.data.gov/dataset/sidewalk-to-street-walkability-ratio
Explore at:
Dataset updated
Jan 24, 2023
Dataset provided by
Western Pennsylvania Regional Data Center
Description
We’ve been asked to create measures of communities that are “walkable” for several projects. While there is no standard definition of what makes a community “walkable”, and the definition of “walkability” can differ from person to person, we thought an indicator that explores the total length of available sidewalks relative to the total length of streets in a community could be a good place to start. In this blog post, we describe how we used open data from SPC and Allegheny County to create a new measure for how “walkable” a community is. We wanted to create a ratio of the length of a community’s sidewalks to the length of a community’s streets as a measure of pedestrian infrastructure. A ratio of 1 would mean that a community has an equal number of linear feet of sidewalks and streets. A ratio of about 2 would mean that a community has two linear feet of sidewalk for every linear foot of street. In other words, every street has a sidewalk on either side of it. In creating a measure of the ratio of streets to sidewalks, we had to do a little bit of data cleanup. Much of this was by trial and error, ground-truthing the data based on our personal experiences walking in different neighborhoods. Since street data was not shared as open data by many counties in our region either on PASDA or through the SPC open data portal, we limited our analysis of “walkability” to Allegheny County. In looking at the sidewalk data table and map, we noticed that trails were included. While nice to have in the data, we wanted to exclude these two features from the ratio. We did this to avoid a situation where a community that had few sidewalks but was in the same blockgroup as a park with trails would get “credit” for being more “walkable” than it actually is according to our definition. We did this by removing all segments where “Trail” was in the “Type_Name” field. We also used a similar tabular selection method to remove crosswalks from the sidewalk data “Type_Name”=”Crosswalk.” We kept the steps in the dataset along with the sidewalks. In the street data obtained from Allegheny County’s GIS department, we felt like we should try to exclude limited-access highway segments from the analysis, since pedestrians are prohibited from using them, and their presence would have reduced the sidewalk/street ratio in communities where they are located. We did this by excluding street segments whose values in the “FCC” field (designating type of street) equaled “A11” or “A63.” We also removed trails from this dataset by excluding those classified as “H10.” Since documentation was sparse, we looked to see how these features were classified in the data to determine which codes to exclude. After running the data initially, we also realized that excluding alleyways from the calculations also could improve the accuracy of our results. Some of the communities with substantial pedestrian infrastructure have alleyways, and including them would make them appear to be less-”walkable” in our indicator. We removed these from the dataset by removing records with a value of “Aly” or “Way” in the “St_Type” field. We also excluded streets where the word “Alley” appeared in the street name, or “St_Name” field. The full methodology used for this dataset is captured in our blog post, and we have also included the sidewalk and street data used to create the ratio here as well.
e
Data from: Analysis of the Dataset for the Terms Novice Programmers Use to...
b2find.eudat.eu
Updated Jun 2, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Analysis of the Dataset for the Terms Novice Programmers Use to Describe Code Snippets in Java [Dataset]. https://b2find.eudat.eu/dataset/0e3aa9e0-79a2-5f7c-9aa0-83c168ca5c34
Explore at:
Dataset updated
Jun 2, 2024
Description
This dataset consists of about 1800 free-text responses in German from 123 students in an introductory programming course. For 15 different code snippets in Java, the participants described how they would explain what the corresponding code snippet does. This dataset includes also the analysis of the responses.
N
What Cheer, IA Population Pyramid Dataset: Age Groups, Male and Female...
neilsberg.com
csv, json
Updated Feb 22, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2025). What Cheer, IA Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/5278cfba-f122-11ef-8c1b-3860777c1fe6/
Explore at:
json, csvAvailable download formats
Dataset updated
Feb 22, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
What Cheer, Iowa
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Total Population for Age Groups, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, and 9 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) male population, (b) female population and (b) total population, we initially analyzed and categorized the data for each of the age groups. For age groups we divided it into roughly a 5 year bucket for ages between 0 and 85. For over 85, we aggregated data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the data for the What Cheer, IA population pyramid, which represents the What Cheer population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.

Key observations

Youth dependency ratio, which is the number of children aged 0-14 per 100 persons aged 15-64, for What Cheer, IA, is 38.7.

Old-age dependency ratio, which is the number of persons aged 65 or over per 100 persons aged 15-64, for What Cheer, IA, is 31.7.

Total dependency ratio for What Cheer, IA is 70.4.

Potential support ratio, which is the number of youth (working age population) per elderly, for What Cheer, IA is 3.2.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Variables / Data Columns

Age Group: This column displays the age group for the What Cheer population analysis. Total expected values are 18 and are define above in the age groups section.

Population (Male): The male population in the What Cheer for the selected age group is shown in the following column.

Population (Female): The female population in the What Cheer for the selected age group is shown in the following column.

Total Population: The total population of the What Cheer for the selected age group is shown in the following column.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for What Cheer Population by Age. You can refer the same here
d
Traffic Crashes - Vehicles
catalog.data.gov
data.cityofchicago.org
+1more
Updated Aug 11, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.cityofchicago.org (2025). Traffic Crashes - Vehicles [Dataset]. https://catalog.data.gov/dataset/traffic-crashes-vehicles
Explore at:
Dataset updated
Aug 11, 2025
Dataset provided by
data.cityofchicago.org
Description
This dataset contains information about vehicles (or units as they are identified in crash reports) involved in a traffic crash. This dataset should be used in conjunction with the traffic Crash and People dataset available in the portal. “Vehicle” information includes motor vehicle and non-motor vehicle modes of transportation, such as bicycles and pedestrians. Each mode of transportation involved in a crash is a “unit” and get one entry here. Each vehicle, each pedestrian, each motorcyclist, and each bicyclist is considered an independent unit that can have a trajectory separate from the other units. However, people inside a vehicle including the driver do not have a trajectory separate from the vehicle in which they are travelling and hence only the vehicle they are travelling in get any entry here. This type of identification of “units” is needed to determine how each movement affected the crash. Data for occupants who do not make up an independent unit, typically drivers and passengers, are available in the People table. Many of the fields are coded to denote the type and location of damage on the vehicle. Vehicle information can be linked back to Crash data using the “CRASH_RECORD_ID” field. Since this dataset is a combination of vehicles, pedestrians, and pedal cyclists not all columns are applicable to each record. Look at the Unit Type field to determine what additional data may be available for that record. The Chicago Police Department reports crashes on IL Traffic Crash Reporting form SR1050. The crash data published on the Chicago data portal mostly follows the data elements in SR1050 form. The current version of the SR1050 instructions manual with detailed information on each data elements is available here. Change 11/21/2023: We have removed the RD_NO (Chicago Police Department report number) for privacy reasons.
d
Police Department Investigated Hate Crimes
catalog.data.gov
data.sfgov.org
+1more
Updated Aug 23, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.sfgov.org (2025). Police Department Investigated Hate Crimes [Dataset]. https://catalog.data.gov/dataset/police-department-investigated-hate-crimes
Explore at:
Dataset updated
Aug 23, 2025
Dataset provided by
data.sfgov.org
Description
A. SUMMARY These data represent hate crimes reported by the SFPD to the California Department of Justice. Read the detailed overview of this dataset here. What is a Hate Crime? A hate crime is a crime against a person, group, or property motivated by the victim's real or perceived protected social group. An individual may be the victim of a hate crime if they have been targeted because of their actual or perceived: (1) disability, (2) gender, (3) nationality, (4) race or ethnicity, (5) religion, (6) sexual orientation, and/or (7) association with a person or group with one or more of these actual or perceived characteristics. Hate crimes are serious crimes that may result in imprisonment or jail time. B. HOW THE DATASET IS CREATED How is a Hate Crime Processed? Not all prejudice incidents including the utterance of hate speech rise to the level of a hate crime. The U.S. Constitution allows hate speech if it does not interfere with the civil rights of others. While these acts are certainly hurtful, they do not rise to the level of criminal violations and thus may not be prosecuted. When a prejudice incident is reported, the reporting officer conducts a preliminary investigation and writes a crime or incident report. Bigotry must be the central motivation for an incident to be determined to be a hate crime. In that report, all facts such as verbatims or statements that occurred before or after the incident and characteristics such as the race, ethnicity, sex, religion, or sexual orientations of the victim and suspect (if known) are included. To classify a prejudice incident, the San Francisco Police Department’s Hate Crimes Unit of the Special Investigations Division conducts an analysis of the incident report to determine if the incident falls under the definition of a “hate crime” as defined by state law. California Penal Code 422.55 - Hate Crime Definition C. UPDATE PROCESS These data are updated monthly. D. HOW TO USE THIS DATASET This dataset includes the following information about each incident: the hate crime offense, bias type, location/time, and the number of hate crime victims and suspects. The data presented mirrors data published by the California Department of Justice, albeit at a higher frequency. The publishing of these data meet requirements set forth in PC 13023. E. RELATED DATASETS California Department of Justice - Hate Crimes Info California Department of Justice - Hate Crimes Data
Art&Emotions Dataset
zenodo.org
data.niaid.nih.gov
+1more
csv, zip
Updated Aug 29, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alessio Bosca; Alessio Bosca (2023). Art&Emotions Dataset [Dataset]. http://doi.org/10.5281/zenodo.8296750
Explore at:
csv, zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.8296750
Dataset updated
Aug 29, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Alessio Bosca; Alessio Bosca
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Art&Emotion experiment description

The Art & Emotions dataset was collected in the scope of EU funded research project SPICE (https://cordis.europa.eu/project/id/870811) with the goal of investigating the relationship between art and emotions and collecting written data (User Generated Content) in the domain of arts in all the languages of the SPICE project (fi, en, es, he, it). The data was collected through a set of Google Forms (one for each language) and it was used in the project (along the other datasets collected by museums in the different project use cases) in order to train and test Emotion Detection Models within the project.

The experiment consists of 12 artworks, chosen from a group of artworks provided by the GAM Museum of Turin (https://www.gamtorino.it/) one of the project partners. Each artwork is presented in a different section of the form; for each of the artworks, the user is asked to answer 5 open questions:

1. What do you see in this picture? Write what strikes you most in this image.

2. What does this artwork make you think about? Write the thoughts and memories that the picture evokes.

3. How does this painting make you feel? Write the feelings and emotions that the picture evokes in you

4. What title would you give to this artwork?

5. Now choose one or more emoji to associate with your feelings looking at this artwork. You can also select "other" and insert other emojis by copying them from this link: https://emojipedia.org/

For each of the artworks, the user can decide whether to skip to the next artwork, if he does not like the one in front of him or go back to the previous artworks and modify the answers. It is not mandatory to fill all the questions for a given artwork.

The question about emotions is left open so as not to force the person to choose emotions from a list of tags which are the tags of a model (e.g. Plutchik), but leaving him free to express the different shades of emotions that can be felt.

Before getting to the heart of the experiment, with the artworks sections, the user is asked to leave some personal information (anonymously), to help us getting an idea of the type of users who participated in the experiment.

The questions are:

Age (open)

Gender (male, female, prefer not to say, other (open))

How would you define your relationship with art?

My job is related to the art world

I am passionate about the art

I am a little interested in art

I am not interested in art

4. Do you like going to museums or art exhibitions?

I like to visit museums frequently

I go occasionally to museums or art exhibitions

I rarely visit museums or art exhibitions

---------------------

Dataset structure:

FI.csv: form data (personal data + open questions) in Finnish (UTF-8)

EN.csv: form data (personal data + open questions) in English (UTF-8)

ES.csv: form data (personal data + open questions) in Spanish (UTF-8)

HE.csv: form data (personal data + open questions) in Hebrew (UTF-8)

IT.csv: form data (personal data + open questions) in Italian (UTF-8)

artworks.csv: the list of artworks including title, author, picture name (the pictures can be found in pictures.zip) and the mapping between the columns in the form data and the questions about that artwork

pictures.zip: the jpeg of the artworks
a
SES Water Reservoir Levels
hub.arcgis.com
streamwaterdata.co.uk
+1more
Updated Apr 26, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
dpararajasingam_ses (2024). SES Water Reservoir Levels [Dataset]. https://hub.arcgis.com/datasets/f8699b39279b4def88ef3eff6ebdc5ab
Explore at:
Dataset updated
Apr 26, 2024
Dataset authored and provided by
dpararajasingam_ses
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
Overview   This dataset provides the measurements of raw water storage levels in reservoirs crucial for public water supply, The reservoirs included in this dataset are natural bodies of water that have been dammed to store untreated water.    Key Definitions   Aggregation  The process of summarizing or grouping data to obtain a single or reduced set of information, often for analysis or reporting purposes.    Capacity The maximum volume of water a reservoir can hold above the natural level of the surrounding land, with thresholds for regulation at 10,000 cubic meters in England, Wales and Northern Ireland and a modified threshold of 25,000 cubic meters in Scotland pending full implementation of the Reservoirs (Scotland) Act 2011. Current Level The present volume of water held in a reservoir measured above a set baseline crucial for safety and regulatory compliance. Current Percentage The current water volume in a reservoir as a percentage of its total capacity, indicating how full the reservoir is at any given time. Dataset  Structured and organized collection of related elements, often stored digitally, used for analysis and interpretation in various fields.   Granularity  Data granularity is a measure of the level of detail in a data structure. In time-series data, for example, the granularity of measurement might be based on intervals of years, months, weeks, days, or hours  ID  Abbreviation for Identification that refers to any means of verifying the unique identifier assigned to each asset for the purposes of tracking, management, and maintenance.   Open Data Triage  The process carried out by a Data Custodian to determine if there is any evidence of sensitivities associated with Data Assets, their associated Metadata and Software Scripts used to process Data Assets if they are used as Open Data.   Reservoir Large natural lake used for storing raw water intended for human consumption. Its volume is measurable, allowing for careful management and monitoring to meet demand for clean, safe water. Reservoir Type The classification of a reservoir based on the method of construction, the purpose it serves or the source of water it stores. Schema  Structure for organizing and handling data within a dataset, defining the attributes, their data types, and the relationships between different entities. It acts as a framework that ensures data integrity and consistency by specifying permissible data types and constraints for each attribute.   Units  Standard measurements used to quantify and compare different physical quantities.     Data History   Data Origin   Reservoir level data is sourced from water companies who may also update this information on their website and government publications such as the Water situation reports provided by the UK government. Data Triage Considerations  Identification of Critical Infrastructure Special attention is given to safeguard data on essential reservoirs in line with the National Infrastructure Act, to mitigate security risks and ensure resilience of public water systems. Currently, it is agreed that only reservoirs with a location already available in the public domain are included in this dataset. Commercial Risks and Anonymisation The risk of personal information exposure is minimal to none since the data concerns reservoir levels, which are not linked to individuals or households. Data Freshness It is not currently possible to make the dataset live. Some companies have digital monitoring, and some are measuring reservoir levels analogically. This dataset may not be used to determine reservoir level in place of visual checks where these are advised. Data Triage Review Frequency   Annually unless otherwise requested  Data Specifications  Data specifications define what is included and excluded in the dataset to maintain clarity and focus. For this dataset: Each dataset covers measurements taken by the publisher. This dataset is published periodically in line with the publisher’s capabilities Historical datasets may be provided for comparison but are not required The location data provided may be a point from anywhere within the body of water or on its boundary. Reservoirs included in the dataset must be: Open bodies of water used to store raw/untreated water Filled naturally Measurable Contain water that may go on to be used for public supply Context  This dataset must not be used to determine the implementation of low supply or high supply measures such as hose pipe bans being put in place or removed. Please await guidance from your water supplier regarding any changes required to your usage of water. Particularly high or low reservoir levels may be considered normal or as expected given the season or recent weather. This dataset does not remove the requirement for visual checks on reservoir level that are in place for caving/pot holing safety. Some water companies calculate the capacity of reservoirs differently than others. The capacity can mean the useable volume of the reservoir or the overall volume that can be held in the reservoir including water below the water table. Data Publish Frequency   Annually
Product sales
kaggle.com
Updated May 14, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sourabh Baldwa (2021). Product sales [Dataset]. https://www.kaggle.com/sourabhbaldwa/commerceiq/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 14, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Sourabh Baldwa
Description
Context

There's a story behind every dataset and here's your opportunity to share yours. sales data of Various products on Amazon are given for three years.

Content

What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.

Acknowledgements

We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered? You need to define pricing strategy of different product on holidays by using time series analysis
b
90 days past due loans by state - small-sized enterprise - Pará - Dataset -...
opendata.bcb.gov.br
Updated Jan 25, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2018). 90 days past due loans by state - small-sized enterprise - Pará - Dataset - Banco Central do Brasil Open Data Portal [Dataset]. https://opendata.bcb.gov.br/dataset/26261-90-days-past-due-loans-by-state---small-sized-enterprise---para
Explore at:
Dataset updated
Jan 25, 2018
License
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Concept: 90 days past due loans by state - small-sized enterprise - Pará Source: Credit Information System 26261-90-days-past-due-loans-by-state---small-sized-enterprise---para 26261-90-days-past-due-loans-by-state---small-sized-enterprise---para 0 0 Feedback Thank you! Close Feedback. Sorry. Tell us what happen. The data is out of date. I was unable to access the dataset (specify the resource). Insufficient documentation to understand the data set. The data contains error or inconsistency. Describe Your assessment will be sent to the e-Ouv system as a complaint. Click here if you want to track your progress. Name Email Your statement was sent to the e-Ouv system. Click here for details. Protocol number: Access code: Send to e-Ouv system Close

Facebook

Twitter

Click to copy link

Link copied

Cite

Anastasija Nikiforova (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424

Dataset: A Systematic Literature Review on the topic of High-value datasets

Explore at:

Dataset updated

Jun 23, 2023

Dataset provided by

Andrea Miletič
Charalampos Alexopoulos
Magdalena Ciesielska
Nina Rizun
Anastasija Nikiforova

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb) It being made public both to act as supplementary data for "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (pre-print is available in Open Access here -> https://arxiv.org/abs/2305.10234) and in order for other researchers to use these data in their own work.

The protocol is intended for the Systematic Literature review on the topic of High-value Datasets with the aim to gather information on how the topic of High-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected in the result of the SLR over Scopus, Web of Science, and Digital Government Research library (DGRL) in 2023.

Methodology

To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out to by searching digital libraries covered by Scopus, Web of Science (WoS), Digital Government Research library (DGRL).

These databases were queried for keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the number of papers to those, where these objects were primary research objects rather than mentioned in the body, e.g., as a future work. After deduplication, 11 articles were found unique and were further checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.

To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.

Test procedure Each study was independently examined by at least two authors, where after the in-depth examination of the full-text of the article, the structured protocol has been filled for each study. The structure of the survey is available in the supplementary file available (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx) The data collected for each study by two researchers were then synthesized in one final version by the third researcher.

Description of the data in this data set

Protocol_HVD_SLR provides the structure of the protocol Spreadsheets #1 provides the filled protocol for relevant studies. Spreadsheet#2 provides the list of results after the search over three indexing databases, i.e. before filtering out irrelevant studies

The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information

Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet 2) Complete reference - the complete source information to refer to the study 3) Year of publication - the year in which the study was published 4) Journal article / conference paper / book chapter - the type of the paper -{journal article, conference paper, book chapter} 5) DOI / Website- a link to the website where the study can be found 6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science 7) Availability in OA - availability of an article in the Open Access 8) Keywords - keywords of the paper as indicated by the authors 9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

Approach- and research design-related information 10) Objective / RQ - the research objective / aim, established research questions 11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analy-sis (country, organisation, specific unit that has been ana-lysed, e.g., the number of use-cases, scope of the SLR etc.) 12) Contributions - the contributions of the study 13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach? 14) Availability of the underlying research data- whether there is a reference to the publicly available underly-ing research data e.g., transcriptions of interviews, collected data, or explanation why these data are not shared? 15) Period under investigation - period (or moment) in which the study was conducted 16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?

Quality- and relevance- related information
17) Quality concerns - whether there are any quality concerns (e.g., limited infor-mation about the research methods used)? 18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, sec-ondary - mentioned but not studied (e.g., as part of discus-sion, future work etc.))

HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term? 20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, “input -> output") 21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the rela-tionships between these components? (detailed description) 22) Stakeholders and their roles - what stakeholders or actors does HVD determination in-volve? What are their roles? 23) Data - what data do HVD cover? 24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)

Format of the file .xls, .csv (for the first spreadsheet only), .odt, .docx

Licenses or restrictions CC-BY

For more info, see README.txt

Clear search

Close search

Google apps

Main menu

Dataset: A Systematic Literature Review on the topic of High-value datasets

5 elements that define what a thesis statement is

Dataset

Contents

Conceptualization of public data ecosystems

Global Urban Definitions

What is your definition of Big Data? Researchers’ understanding of the...

What Cheer, IA Population Breakdown by Gender and Age Dataset: Male and...

About this dataset

Content

Inspiration

Recommended for further research

‘School Dataset’ analyzed by Analyst-2

Context

Content

Acknowledgements

Inspiration

Data from: Dataset of the manuscript "What is local research? Towards a...

Data set - What Defines Quality of Life for Older Patients Diagnosed with...

AR-ASAG-Dataset

The NIST Extensible Resource Data Model (NERDm): JSON schemas for rich...

Sidewalk to Street "Walkability" Ratio

Data from: Analysis of the Dataset for the Terms Novice Programmers Use to...

What Cheer, IA Population Pyramid Dataset: Age Groups, Male and Female...

About this dataset

Content

Inspiration

Recommended for further research

Traffic Crashes - Vehicles

Police Department Investigated Hate Crimes

Art&Emotions Dataset

SES Water Reservoir Levels

Product sales

Context

Content

Acknowledgements

Inspiration

90 days past due loans by state - small-sized enterprise - Pará - Dataset -...

Dataset: A Systematic Literature Review on the topic of High-value datasets