66 datasets found
  1. Dataset: A Systematic Literature Review on the topic of High-value datasets

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 23, 2023
    Cite
    Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    University of Zagreb
    University of the Aegean
    Gdańsk University of Technology
    University of Tartu
    Authors
    Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data collected during the study "Towards High-Value Datasets determination for data-driven development: a systematic literature review", conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the paper "Towards High-Value Datasets determination for data-driven development: a systematic literature review" (the pre-print is available in Open Access at https://arxiv.org/abs/2305.10234) and to allow other researchers to use these data in their own work.

    The protocol is intended for the Systematic Literature Review on the topic of High-value Datasets, with the aim of gathering information on how the topic of High-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected as a result of the SLR conducted over Scopus, Web of Science, and the Digital Government Research library (DGRL) in 2023.

    Methodology

    To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and the Digital Government Research library (DGRL).

    These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), applied to the article title, keywords, and abstract to limit the results to papers in which these concepts were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 unique articles were found and further checked for relevance. As a result, a total of 9 articles were examined in depth. Each study was independently examined by at least two authors.
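    As an illustration, the search string can be assembled programmatically. This is a minimal sketch for the Scopus case only: TITLE-ABS-KEY is Scopus's field code for searching title, abstract, and keywords, while WoS and DGRL use their own field syntax, so the wrapper would differ there.

        # Illustrative only: assembles the SLR search string described above.
        topic_terms = ['"open data"', '"open government data"']
        hvd_terms = ['"high-value data*"', '"high value data*"']

        query = "({}) AND ({})".format(" OR ".join(topic_terms), " OR ".join(hvd_terms))
        scopus_query = f"TITLE-ABS-KEY({query})"
        print(scopus_query)
        # TITLE-ABS-KEY(("open data" OR "open government data") AND ("high-value data*" OR "high value data*"))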

    To attain the objective of our study, we developed a protocol in which the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) HVD determination-related information.

    Test procedure. Each study was independently examined by at least two authors: after an in-depth examination of the full text of the article, the structured protocol was filled in for each study. The structure of the survey is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by a third researcher.

    Description of the data in this data set

    Protocol_HVD_SLR provides the structure of the protocol.

    Spreadsheet #1 provides the filled protocol for relevant studies.

    Spreadsheet #2 provides the list of results after the search over the three indexing databases, i.e., before filtering out irrelevant studies.

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) HVD determination-related information.

    Descriptive information
    1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
    2) Complete reference - the complete source information to refer to the study
    3) Year of publication - the year in which the study was published
    4) Journal article / conference paper / book chapter - the type of the paper {journal article, conference paper, book chapter}
    5) DOI / Website - a link to the website where the study can be found
    6) Number of citations - the number of citations of the article in Google Scholar, Scopus, and Web of Science
    7) Availability in OA - availability of the article in Open Access
    8) Keywords - keywords of the paper as indicated by the authors
    9) Relevance for this study - the relevance level of the article for this study {high / medium / low}

    Approach- and research design-related information
    10) Objective / RQ - the research objective / aim and established research questions
    11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR etc.)
    12) Contributions - the contributions of the study
    13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach
    14) Availability of the underlying research data - whether there is a reference to the publicly available underlying research data, e.g., transcriptions of interviews, collected data, or an explanation of why these data are not shared
    15) Period under investigation - the period (or moment) in which the study was conducted
    16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is it used in the study?

    Quality- and relevance-related information
    17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)
    18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused on HVD determination; secondary - mentioned but not studied (e.g., as part of discussion, future work etc.))

    HVD determination-related information
    19) HVD definition and type of value - how is the HVD defined in the article and / or by any other equivalent term?
    20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, "input -> output")
    21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
    22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
    23) Data - what data do HVD cover?
    24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)
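    As a usage sketch, the filled protocol (Spreadsheet #1, the only sheet shipped as .csv) can be loaded with pandas. The file name and column labels below are hypothetical, since the exact spreadsheet headers are not documented on this page.

        import pandas as pd

        # Hypothetical file name; Spreadsheet #1 is the only sheet shipped as .csv.
        protocol = pd.read_csv("spreadsheet1_filled_protocol.csv")

        # Fields 1)-24) above become columns; these labels are illustrative guesses.
        high_relevance = protocol[protocol["Relevance for this study"].str.lower() == "high"]
        print(high_relevance[["Article number", "Year of publication"]])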

    Format of the files: .xls, .csv (for the first spreadsheet only), .odt, .docx

    Licenses or restrictions: CC-BY

    For more info, see README.txt

  2. Conceptualization of public data ecosystems

    • data.niaid.nih.gov
    Updated Sep 26, 2024
    Cite
    Anastasija Nikiforova; Martin Lnenicka (2024). Conceptualization of public data ecosystems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13842001
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    University of Tartu
    University of Hradec Králové
    Authors
    Anastasija Nikiforova; Martin Lnenicka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).

    As there is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems, and of how these elements form and shape the development of these ecosystems over time, which can lead to misguided efforts to develop future public data ecosystems, the aim of the study is: (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, this study presents a typology of public data ecosystems, develops a conceptual model of the elements and formative characteristics that contribute most to value-adding public data ecosystems, and proposes a conceptual model of the evolution of public data ecosystems across six generations, called the Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.

    This dataset is being made public both to act as supplementary data for the paper "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" (Telematics and Informatics) and to document the Systematic Literature Review component that informs the study.

    Description of the data in this data set

    PublicDataEcosystem_SLR provides the structure of the protocol.

    Spreadsheet#1 provides the list of results after the search over three indexing databases and filtering out irrelevant studies

    Spreadsheet #2 provides the protocol structure.

    Spreadsheet #3 provides the filled protocol for relevant studies.

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) public data ecosystem-related information.

    Descriptive Information

    Article number

    A study number, corresponding to the study number assigned in an Excel worksheet

    Complete reference

    The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.

    Year of publication

    The year in which the study was published.

    Journal article / conference paper / book chapter

    The type of the paper, i.e., journal article, conference paper, or book chapter.

    Journal / conference / book

    The journal, conference, or book in which the paper is published.

    DOI / Website

    A link to the website where the study can be found.

    Number of words

    The number of words in the study.

    Number of citations in Scopus and WoS

    The number of citations of the paper in Scopus and WoS digital libraries.

    Availability in Open Access

    Availability of the study in Open Access or Free / Full Access.

    Keywords

    Keywords of the paper as indicated by the authors (in the paper).

    Relevance for our study (high / medium / low)

    The relevance level of the paper for our study.

    Approach- and research design-related information

    Objective / Aim / Goal / Purpose & Research Questions

    The research objective and established RQs.

    Research method (including unit of analysis)

    The methods used to collect data in the study, including the unit of analysis that refers to the country, organisation, or other specific unit that has been analysed such as the number of use-cases or policy documents, number and scope of the SLR etc.

    Study’s contributions

    The study’s contributions as defined by the authors.

    Qualitative / quantitative / mixed method

    Whether the study uses a qualitative, quantitative, or mixed methods approach?

    Availability of the underlying research data

    Whether the paper has a reference to the public availability of the underlying research data e.g., transcriptions of interviews, collected data etc., or explains why these data are not openly shared?

    Period under investigation

    Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)

    Use of theory / theoretical concepts / approaches? If yes, specify them

    Does the study mention any theory / theoretical concepts / approaches? If yes, what theory / concepts / approaches? If any theory is mentioned, how is theory used in the study? (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested theory, theory mentioned in the future research section).

    Quality-related information

    Quality concerns

    Whether there are any quality concerns (e.g., limited information about the research methods used)?

    Public Data Ecosystem-related information

    Public data ecosystem definition

    How is the public data ecosystem defined in the paper, including any equivalent term used (most often "infrastructure")? If an alternative term is used, what is the public data ecosystem called in the paper?

    Public data ecosystem evolution / development

    Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?

    What constitutes a public data ecosystem?

    What constitutes a public data ecosystem (components & relationships) - their "FORM / OUTPUT" presented in the paper (general description with more detailed answers to further additional questions).

    Components and relationships

    What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).

    Stakeholders

    What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?

    Actors and their roles

    What actors does the public data ecosystem involve? What are their roles?

    Data (data types, data dynamism, data categories etc.)

    What data does the public data ecosystem cover (what is it intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static data, dynamic data, real-time data, streams), prevailing data categories / domains / topics etc.

    Processes / activities / dimensions, data lifecycle phases

    What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?

    Level (if relevant)

    What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).

    Other elements or relationships (if any)

    What other elements or relationships does the public data ecosystem consist of?

    Additional comments

    Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).

    New papers

    Does the study refer to any other potentially relevant papers?

    Additional references to potentially relevant papers that were found in the analysed paper (snowballing).
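    To make this protocol structure concrete, here is a minimal sketch of one coded study as a Python dataclass. The field names paraphrase the categories above and are not the spreadsheet's actual headers.

        from dataclasses import dataclass

        @dataclass
        class ProtocolRecord:
            """One coded study; field names paraphrase the protocol categories above."""
            # Descriptive information
            article_number: int
            complete_reference: str
            year_of_publication: int
            paper_type: str            # journal article / conference paper / book chapter
            relevance: str             # high / medium / low
            # Approach- and research design-related information
            objective: str
            research_method: str
            contributions: str
            # Quality-related information
            quality_concerns: str
            # Public data ecosystem-related information
            pde_definition: str
            components_and_relationships: str
            stakeholders: str
            level: str                 # city / regional / national / supranational / international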

    Format of the files: .xls, .csv (for the first spreadsheet only), .docx

    Licenses or restrictions: CC-BY

    For more info, see README.txt

  3. Metadata description, quantitative data, and supplementary material

    • dataverse.tec.ac.cr
    docx, xlsx
    Updated Oct 27, 2025
    Cite
    Anthony Valverde Abarca; Francisco Sevilla Benavides; Anthony Valverde Abarca; Francisco Sevilla Benavides (2025). Metadata description, quantitative data, and supplementary material [Dataset]. http://doi.org/10.18845/RDA/ORDORO
    Available download formats: docx(187765), xlsx(348466), docx(16674)
    Dataset updated
    Oct 27, 2025
    Dataset provided by
    TECdatos Repository
    Authors
    Anthony Valverde Abarca; Francisco Sevilla Benavides; Anthony Valverde Abarca; Francisco Sevilla Benavides
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    Jul 9, 2024 - Aug 16, 2024
    Dataset funded by
    Instituto Tecnológico de Costa Rica
    Description

    Contains quantitative data from the analysis of sperm kinematics, concentration, viability and morphology in bovine semen using the iSperm® system, together with a metadata description and supplementary material.

  4. Statistical description of quantitative attributes.

    • plos.figshare.com
    xls
    Updated Jun 21, 2019
    Cite
    Luis J. Rodríguez-Muñiz; Ana B. Bernardo; María Esteban; Irene Díaz (2019). Statistical description of quantitative attributes. [Dataset]. http://doi.org/10.1371/journal.pone.0218796.t002
    Available download formats: xls
    Dataset updated
    Jun 21, 2019
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Luis J. Rodríguez-Muñiz; Ana B. Bernardo; María Esteban; Irene Díaz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistical description of quantitative attributes.

  5. Dataframe of Significant Stems for: Big Data and Digital Aesthetic, Arts and...

    • demo-b2find.dkrz.de
    Updated Sep 21, 2025
    Cite
    (2025). Dataframe of Significant Stems for: Big Data and Digital Aesthetic, Arts and Cultural Education: Hot Spots of Current Quantitative Research [Dataset]. http://demo-b2find.dkrz.de/dataset/0bd97871-d19f-5b9b-bfcc-87f133bd9275
    Dataset updated
    Sep 21, 2025
    Description

    Systematic reviews are the method of choice to synthesize research evidence. To identify main topics (so-called hot spots) relevant to large corpora of original publications in need of a synthesis, one must address the “three Vs” of big data (volume, velocity, and variety), especially in loosely defined or fragmented disciplines. For this purpose, text mining and predictive modeling are very helpful. Thus, we applied these methods to a compilation of documents related to digitalization in aesthetic, arts, and cultural education, as a prototypical, loosely defined, fragmented discipline, and particularly to quantitative research within it (QRD-ACE). By broadly querying the abstract and citation database Scopus with terms indicative of QRD-ACE, we identified a corpus of N = 55,553 publications for the years 2013–2017. As the result of an iterative approach of text mining, priority screening, and predictive modeling, we identified n = 8,304 potentially relevant publications of which n = 1,666 were included after priority screening. Analysis of the subject distribution of the included publications revealed video games as a first hot spot of QRD-ACE. Topic modeling resulted in aesthetics and cultural activities on social media as a second hot spot, related to 4 of k = 8 identified topics. This way, we were able to identify current hot spots of QRD-ACE by screening less than 15% of the corpus. We discuss implications for harnessing text mining, predictive modeling, and priority screening in future research syntheses and avenues for future original research on QRD-ACE. Dataset for: Christ, A., Penthin, M., & Kröner, S. (2019). Big Data and Digital Aesthetic, Arts, and Cultural Education: Hot Spots of Current Quantitative Research. Social Science Computer Review, 089443931988845. https://doi.org/10.1177/0894439319888455
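    The combination of text mining, predictive modeling, and priority screening described above is commonly implemented as a relevance classifier that ranks unscreened abstracts so that likely-relevant records are screened first. A minimal sketch under that assumption (the model, features, and toy texts are illustrative, not the authors' exact pipeline):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression

        # Toy stand-ins for already-screened abstracts (1 = included, 0 = excluded).
        labeled_abstracts = [
            "video games in arts education, a quantitative study",
            "history of oil painting in the renaissance",
        ]
        labels = [1, 0]
        unscreened = ["social media aesthetics survey", "medieval sculpture catalogue"]

        vec = TfidfVectorizer(stop_words="english")
        clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(labeled_abstracts), labels)

        # Rank the rest of the corpus by predicted relevance; retrain as labels accrue.
        scores = clf.predict_proba(vec.transform(unscreened))[:, 1]
        priority_order = scores.argsort()[::-1]
        print(priority_order)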

  6. Data from: Establishing performance metrics for quantitative non-targeted...

    • catalog.data.gov
    • datasets.ai
    Updated Feb 17, 2024
    Cite
    U.S. EPA Office of Research and Development (ORD) (2024). Establishing performance metrics for quantitative non-targeted analysis: a demonstration using per- and poly-fluoroalkyl substances [Dataset]. https://catalog.data.gov/dataset/establishing-performance-metrics-for-quantitative-non-targeted-analysis-a-demonstration-us
    Dataset updated
    Feb 17, 2024
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Non-targeted analysis (NTA) is an increasingly popular technique for characterizing undefined chemical analytes. Generating quantitative NTA (qNTA) concentration estimates requires the use of training data from calibration “surrogates”. The use of surrogate training data can yield diminished performance of concentration estimation approaches. In order to evaluate performance differences between targeted and qNTA approaches, we defined new metrics that convey predictive accuracy, uncertainty (using 95% inverse confidence intervals), and reliability (the extent to which confidence intervals contain true values). We calculated and examined these newly defined metrics across five quantitative approaches applied to a mixture of 29 per- and polyfluoroalkyl substances (PFAS). The quantitative approaches spanned from a traditional targeted design using chemical-specific calibration curves to a generalizable qNTA design using bootstrap-sampled calibration values from chemical surrogates. This dataset is associated with the following publication: Pu, S., J. McCord, J. Bangma, and J. Sobus. Establishing performance metrics for quantitative non-targeted analysis: a demonstration using per- and polyfluoroalkyl substances. Analytical and Bioanalytical Chemistry. Springer, New York, NY, USA, 416: 1249-1267, (2024).
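    As a toy illustration of the reported metrics, the sketch below bootstraps calibration values from hypothetical surrogate response factors, derives a 95% interval for one concentration estimate, and checks whether the interval covers the true value (reliability). All numbers and the response-factor model are invented, and the paper's inverse confidence intervals are more involved than this percentile bootstrap.

        import numpy as np

        rng = np.random.default_rng(0)

        # Hypothetical response factors measured on surrogate chemicals.
        surrogate_rf = rng.lognormal(mean=0.0, sigma=0.5, size=50)
        signal, true_conc = 120.0, 100.0          # invented instrument reading / truth

        # Bootstrap-sampled calibration values -> distribution of concentration estimates.
        boot = rng.choice(surrogate_rf, size=(2000, surrogate_rf.size), replace=True)
        est = signal / boot.mean(axis=1)

        lo, hi = np.percentile(est, [2.5, 97.5])
        accuracy = np.median(est) / true_conc     # predictive accuracy (fold-error style)
        covered = lo <= true_conc <= hi           # reliability: interval contains the truth?
        print(f"median estimate {np.median(est):.1f} (accuracy {accuracy:.2f}), "
              f"95% CI [{lo:.1f}, {hi:.1f}], covered={covered}")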

  7. Data from: Improved Accuracy and Reliability in Untargeted Analysis with...

    • acs.figshare.com
    • figshare.com
    xlsx
    Updated Apr 4, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Guillaume Laurent Erny; Julia Nowak; Michał Woźniakiewicz (2025). Improved Accuracy and Reliability in Untargeted Analysis with LC-ESI-QTOF/MS1 by Ensemble Averaging [Dataset]. http://doi.org/10.1021/acs.analchem.4c06078.s002
    Available download formats: xlsx
    Dataset updated
    Apr 4, 2025
    Dataset provided by
    ACS Publications
    Authors
    Guillaume Laurent Erny; Julia Nowak; Michał Woźniakiewicz
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Untargeted liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS) is a powerful tool for comprehensive chemical analysis. Such techniques allow the detection and quantification of thousands of compounds in a sample. However, the complexity and variability of the data can introduce significant errors, impacting the reliability of the results. This study investigates ensemble averaging to mitigate these errors and improve signal-to-noise (S/N) ratios, feature detection, and data quality. In this work, 256 LC-qTOF/MS1 data sets from the analysis of Morning Glory seeds were averaged to generate merged data sets. The number of pooled data sets in the merged files was varied, and the number of features, the S/N ratio, the accuracy and precision of the accurate masses, the relative intensities, and the migration time were examined. Ensemble averaging was shown to increase the S/N by up to a factor of 10, while the relative standard deviation of the accurate masses and retention time decreased by a factor of 10. Moreover, the average number of features mined per data set increased from 1192 ± 129 with the original data sets to 4408 when all data sets were averaged into one. Using known target compounds, the benefits of ensemble averaging for quantitative analysis were investigated. The measured and theoretical relative intensities between the [M+1]+H+, [M+2]+H+, [M+3]+H+ and [M]+H+ isotopes of known alkaloids were used. The standard deviation decreased by up to a factor of 10, and the absolute error between theoretical and experimental relative intensities was below 3%, making the theoretical isotopic pattern a valid criterion for confirming a putative molecular formula. Using a targeted approach to recover quantitative data from the original data sets, guided by information in the merged data sets, provides an accurate quantitative means. Peak lists from the merged data sets and quantitative information from the original data sets were fused to obtain a robust clustering approach that recognizes features (adducts, isotopes, and fragments) generated by a common chemical in the ionization chamber. Two hundred and four clusters were obtained, characterized by two or more features with migration times that differ by less than 0.05 min and with similar response patterns.
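    The S/N gain from ensemble averaging follows from noise scaling down by roughly the square root of the number of averaged scans. The toy sketch below demonstrates only that mechanism on synthetic replicates; the study itself averaged real LC-qTOF/MS1 data sets.

        import numpy as np

        rng = np.random.default_rng(1)

        # One chromatographic-style peak plus 256 noisy replicate scans.
        t = np.linspace(0, 1, 2000)
        signal = np.exp(-0.5 * ((t - 0.5) / 0.01) ** 2)
        scans = signal + rng.normal(scale=0.2, size=(256, t.size))

        def snr(x):
            return x.max() / x[:500].std()        # peak height over baseline noise

        for n in (1, 16, 256):
            print(n, round(snr(scans[:n].mean(axis=0)), 1))
        # S/N grows roughly as sqrt(n): ~4x at n=16, ~16x at n=256.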

  8. Methodology data of "A qualitative and quantitative citation analysis toward...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 8, 2022
    Cite
    Ivan Heibi; Ivan Heibi; Silvio Peroni; Silvio Peroni (2022). Methodology data of "A qualitative and quantitative citation analysis toward retracted articles: a case of study" [Dataset]. http://doi.org/10.5281/zenodo.4323221
    Available download formats: zip
    Dataset updated
    Jul 8, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ivan Heibi; Ivan Heibi; Silvio Peroni; Silvio Peroni
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This document contains the datasets and visualizations generated by applying the methodology defined in our work "A qualitative and quantitative citation analysis toward retracted articles: a case of study". The methodology defines a citation analysis of the Wakefield et al. [1] retracted article from a quantitative and qualitative point of view. The data contained in this repository are based on the first two steps of the methodology. The first step ("Data gathering") builds an annotated dataset of the citing entities; this step is also discussed at length in [2]. The second step ("Topic Modelling") runs a topic modeling analysis on the textual features contained in the dataset generated by the first step.

    Note: the data are all contained inside the "method_data.zip" file. You need to unzip the file to get access to all the files and directories listed below.

    Data gathering

    The data generated by this step are stored in "data/":

    1. "cits_features.csv": a dataset containing all the entities (rows in the CSV) which have cited the Wakefield et al. retracted article, and a set of features characterizing each citing entity (columns in the CSV). The features included are: DOI ("doi"), year of publication ("year"), the title ("title"), the venue identifier ("source_id"), the title of the venue ("source_title"), yes/no value in case the entity is retracted as well ("retracted"), the subject area ("area"), the subject category ("category"), the sections of the in-text citations ("intext_citation.section"), the value of the reference pointer ("intext_citation.pointer"), the in-text citation function ("intext_citation.intent"), the in-text citation perceived sentiment ("intext_citation.sentiment"), and a yes/no value to denote whether the in-text citation context mentions the retraction of the cited entity ("intext_citation.section.ret_mention").
      Note: this dataset is licensed under a Creative Commons public domain dedication (CC0).
    2. "cits_text.csv": this dataset stores the abstract ("abstract") and the in-text citations context ("intext_citation.context") for each citing entity identified using the DOI value ("doi").
      Note: the data keep their original license (the one provided by their publisher). This dataset is provided in order to favor the reproducibility of the results obtained in our work.

    Topic modeling
    We ran a topic modeling analysis on the textual features gathered (i.e., abstracts and citation contexts). The results are stored inside the "topic_modeling/" directory. The topic modeling was done using MITAO, a tool for mashing up automatic text analysis tools and creating a completely customizable visual workflow [3]. The topic modeling results for each textual feature are separated into two different folders: "abstracts/" for the abstracts and "intext_cit/" for the in-text citation contexts. Both directories contain the following directories/files:

    1. "mitao_workflows/": the workflows of MITAO. These are JSON files that could be reloaded in MITAO to reproduce the results following the same workflows.

    2. "corpus_and_dictionary/": it contains the dictionary and the vectorized corpus given as inputs for the LDA topic modeling.

    3. "coherence/coherence.csv": the coherence score of several topic models trained on a number of topics from 1 - 40.

    4. "datasets_and_views/": the datasets and visualizations generated using MITAO.

    References

    1. Wakefield, A., Murch, S., Anthony, A., Linnell, J., Casson, D., Malik, M., Berelowitz, M., Dhillon, A., Thomson, M., Harvey, P., Valentine, A., Davies, S., & Walker-Smith, J. (1998). RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. The Lancet, 351(9103), 637–641. https://doi.org/10.1016/S0140-6736(97)11096-0
    2. Heibi, I., & Peroni, S. (2020). A methodology for gathering and annotating the raw-data/characteristics of the documents citing a retracted article v1 (protocols.io.bdc4i2yw) [Data set]. In protocols.io. ZappyLab, Inc. https://doi.org/10.17504/protocols.io.bdc4i2yw

    3. Ferri, P., Heibi, I., Pareschi, L., & Peroni, S. (2020). MITAO: A User Friendly and Modular Software for Topic Modelling [JD]. PuntOorg International Journal, 5(2), 135–149. https://doi.org/10.19245/25.05.pij.5.2.3

  9. Optimized parameter values for play detection.

    • plos.figshare.com
    xls
    Updated Apr 18, 2024
    Cite
    Jonas Bischofberger; Arnold Baca; Erich Schikuta (2024). Optimized parameter values for play detection. [Dataset]. http://doi.org/10.1371/journal.pone.0298107.t004
    Available download formats: xls
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jonas Bischofberger; Arnold Baca; Erich Schikuta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With recent technological advancements, quantitative analysis has become an increasingly important area within professional sports. However, the manual process of collecting data on relevant match events like passes, goals and tacklings comes with considerable costs and limited consistency across providers, affecting both research and practice. In football, while automatic detection of events from positional data of the players and the ball could alleviate these issues, it is not entirely clear what accuracy current state-of-the-art methods realistically achieve because there is a lack of high-quality validations on realistic and diverse data sets. This paper adds context to existing research by validating a two-step rule-based pass and shot detection algorithm on four different data sets using a comprehensive validation routine that accounts for the temporal, hierarchical and imbalanced nature of the task. Our evaluation shows that pass and shot detection performance is highly dependent on the specifics of the data set. In accordance with previous studies, we achieve F-scores of up to 0.92 for passes, but only when there is an inherent dependency between event and positional data. We find a significantly lower accuracy with F-scores of 0.71 for passes and 0.65 for shots if event and positional data are independent. This result, together with a critical evaluation of existing methodologies, suggests that the accuracy of current football event detection algorithms operating on positional data is currently overestimated. Further analysis reveals that the temporal extraction of passes and shots from positional data poses the main challenge for rule-based approaches. Our results further indicate that the classification of plays into shots and passes is a relatively straightforward task, achieving F-scores between 0.83 and 0.91 for rule-based classifiers and up to 0.95 for machine learning classifiers. We show that there exist simple classifiers that accurately differentiate shots from passes in different data sets using a low number of human-understandable rules. Operating on basic spatial features, our classifiers provide a simple, objective event definition that can be used as a foundation for more reliable event-based match analysis.
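    The paper's actual features and thresholds are not reproduced here; the sketch below only illustrates the kind of low-rule, human-understandable classifier the description refers to, with invented spatial features and cutoffs.

        # Hypothetical rule-based play classifier over basic spatial features.
        def classify_play(distance_to_goal_m: float, angle_to_goal_deg: float,
                          ball_speed_ms: float) -> str:
            """Label a play as 'shot' or 'pass'; features and thresholds are invented."""
            if distance_to_goal_m < 25 and angle_to_goal_deg < 30 and ball_speed_ms > 15:
                return "shot"
            if distance_to_goal_m < 12 and angle_to_goal_deg < 45:
                return "shot"
            return "pass"

        print(classify_play(18, 12, 22))   # -> shot
        print(classify_play(40, 70, 10))   # -> pass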

  10. Quantitative Data SPSS

    • search.dataone.org
    • dataone.org
    Updated Apr 2, 2016
    Cite
    Andrew Kliskey (2016). Quantitative Data SPSS [Dataset]. http://doi.org/10.18739/A2KW3J
    Dataset updated
    Apr 2, 2016
    Dataset provided by
    Arctic Data Center
    Authors
    Andrew Kliskey
    Description

    No description is available. Visit https://dataone.org/datasets/doi%3A10.18739%2FA2KW3J for complete metadata about this dataset.

  11. Quantitative description of data sets: M is the number of points, H[σ] is...

    • figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Sandipan Sikdar; Animesh Mukherjee; Matteo Marsili (2023). Quantitative description of data sets: M is the number of points, H[σ] is the entropy of the ground truth classification. [Dataset]. http://doi.org/10.1371/journal.pone.0239331.t001
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Sandipan Sikdar; Animesh Mukherjee; Matteo Marsili
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    d1–d2 represent the conformity among the different goodness metrics (purity, NMI and ARI) in terms of Kendall’s and Spearman’s rank correlation (see text). The last column reports the Kendall’s τ and Spearman’s ρ rank correlations with the majority ranking of similarity to the ground truth (see text).

  12. Brain-wide quantitative data on parvalbumin positive neurons in the mouse

    • search.kg.ebrains.eu
    Updated Jun 9, 2020
    Cite
    Ingvild E. Bjerke; Sharon C. Yates; Maja A. Puchades; Jan G. Bjaalie; Trygve Brauns Leergaard (2020). Brain-wide quantitative data on parvalbumin positive neurons in the mouse [Dataset]. http://doi.org/10.25493/BT8X-FN9
    Dataset updated
    Jun 9, 2020
    Authors
    Ingvild E. Bjerke; Sharon C. Yates; Maja A. Puchades; Jan G. Bjaalie; Trygve Brauns Leergaard
    Description

    This dataset contains quantitative data extracted from four sets of parvalbumin-stained sections covering the whole brain of normal, adult PVCre X Rosa26eYFP mice. Section images from the dataset “Distribution of parvalbumin-positive interneurons in the normal adult mouse brain” (Laja et al., 2020) were analysed using the QUINT workflow. The dataset includes numbers and densities of cells for brain regions defined according to the Allen Mouse Brain Common Coordinate Framework (CCF) and point coordinate data representing labelled cells across the brain. A detailed description of the number and distribution of parvalbumin neurons across brain regions is given in the related publication. To facilitate re-use of the data, the dataset furthermore includes atlas maps and segmented images in .png format and .nut files used to run the analyses.

  13. xPeerdMSv1.0

    • huggingface.co
    Updated Sep 20, 2025
    Cite
    Khalid Saqr (2025). xPeerdMSv1.0 [Dataset]. https://huggingface.co/datasets/saqure/xPeerdMSv1.0
    Dataset updated
    Sep 20, 2025
    Authors
    Khalid Saqr
    Description

    xPeerd Analysis Pipeline

    This repository contains a comprehensive Python script designed for analyzing peer review reports. The pipeline processes a CSV file of review data, extracts structured information, classifies each review into an academic supergroup, performs statistical analysis, and generates a series of publication-quality visualizations.

    Overview

    The core functionality of this script is to transform unstructured peer review text into quantitative data and… See the full description on the dataset page: https://huggingface.co/datasets/saqure/xPeerdMSv1.0.
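    The repository's actual script is not reproduced on this page. A minimal sketch of the described CSV-in, statistics-and-figures-out shape, with an invented schema and a placeholder classification rule:

        import pandas as pd
        import matplotlib.pyplot as plt

        # Hypothetical column names; the real schema lives on the dataset page.
        reviews = pd.read_csv("peer_reviews.csv")

        # Placeholder supergroup classifier (keyword rule standing in for the real one).
        def supergroup(field: str) -> str:
            return "STEM" if field.lower() in {"physics", "engineering", "biology"} else "non-STEM"

        reviews["supergroup"] = reviews["field"].map(supergroup)

        # Simple statistical summary and one figure.
        summary = reviews.groupby("supergroup")["word_count"].describe()
        print(summary)
        summary["mean"].plot.bar(title="Mean review length by supergroup")
        plt.tight_layout()
        plt.savefig("review_length.png")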

  14. Transforming Universities for a Changing Climate: Qualitative and...

    • datacatalogue.ukdataservice.ac.uk
    Updated Jan 13, 2025
    Cite
    McCowan, T, UCL; Blitz, B, UCL; Brandli, L, University of Passo Fundo; Frediani, A, International Institute for Environment and Development; Kitagawa, K, UCL; Lagi, R, University of the South Pacific; Langa, P, Eduardo Mondlane University; Nussey, C, UCL; Nyerere, J, Kenyatta University; Rolleston, C, UCL; Wright, A, Association of Commonwealth Universities (2025). Transforming Universities for a Changing Climate: Qualitative and Quantitative Data, 2021-2023 [Dataset]. http://doi.org/10.5255/UKDA-SN-856168
    Dataset updated
    Jan 13, 2025
    Authors
    McCowan, T, UCL; Blitz, B, UCL; Brandli, L, University of Passo Fundo; Frediani, A, International Institute for Environment and Development; Kitagawa, K, UCL; Lagi, R, University of the South Pacific; Langa, P, Eduardo Mondlane University; Nussey, C, UCL; Nyerere, J, Kenyatta University; Rolleston, C, UCL; Wright, A, Association of Commonwealth Universities
    Area covered
    United Kingdom, Kenya, Fiji, Brazil, Mozambique
    Description

    Higher education has a crucial role to play in responding to the climate crisis, not only through carrying out research, but also through teaching, community engagement and public awareness. The Transforming Universities for a Changing Climate (Climate-U) project aimed to strengthen the contribution of universities to addressing the causes and impacts of climate change in lower-income contexts. In doing so, it contributed to the broader task of understanding the role of education in achieving the full set of Sustainable Development Goals (SDGs). Starting in 2020, it focused on five countries: Brazil, Fiji, Kenya, Mozambique and the UK. The project sought to answer two main research questions in these countries: What are the effects of locally-generated university initiatives on actions and ideas relating to climate change?; and How do they inform our understandings of the role of higher education in sustainable development? The qualitative and quantitative collections of data deposited here contribute to an analysis that answers these questions.

    We start with a description of the qualitative data collection. A case study design was adopted to guide the research. The focus of the case studies was variously on community engagement, curriculum and campus greening activities. The collaborations and partnerships that exist between the university and external organisations on climate action were also examined during the study. Interviews and focus groups were conducted with a range of key informants (community members, academics, students and non-government organisations). The broad aim of the interviews and focus groups was to establish respondents' views on the role of universities in responding to climate change through and beyond the teaching, research, community engagement and public awareness functions. This was in order to determine the extent to which universities can themselves be transformed in order to respond to the climate crisis, as well as transform the marginalised communities surrounding universities.

    The qualitative case studies formed part of the broader research method for the project – participatory action research (PAR). Not all of the participating universities undertook formal collection of interview and focus group data as part of the PAR. Qualitative data from four of the participating institutions are included in this dataset.

    We now turn to a description of the quantitative data collection. A survey on climate change was conducted in twelve universities in Brazil, Fiji, Kenya and Mozambique. The survey examined the experiences of students, their engagement in climate action and their attitudes towards environmental issues. It responded to the overall aim of the project, which was to generate insights into how to maximise the contribution of universities to the mitigation and adaptation challenges of climate change, and to understand how universities might contribute to climate justice. To this end, the survey aimed to assess students’ perceptions and experiences regarding climate change and their universities, and their environmental attitudes. It was designed to be internationally comparable and to draw on existing work and questions, so a number of previous surveys and studies were reviewed in the process of drafting our questionnaire.

    Climate change is widely recognised as the most critical challenge of our age, with the recent Intergovernmental Panel on Climate Change report suggesting that to avoid devastating effects, the world must move entirely to renewables by 2050. This project aims to strengthen the contribution of universities in lower-income countries to addressing this challenge.

    The role of research and innovation in this task is widely acknowledged, and universities around the world are closely involved in the tasks of monitoring, interpreting and responding to the process and effects of global warming. Yet the broader role of universities in addressing the climate crisis is as yet under-researched. How do courses provided by universities address the question of climate change, and what forms of climate-related learning do students engage with on campus and beyond? What impacts do universities have on climate change through community engagement activities, in fostering public debate on the issue and in the way they embody the principles of sustainability in their own institutional forms?

    These roles of universities beyond knowledge production are critical in addressing climate change, given the deep social, political and economic roots of the crisis, and the need to engage with professional development, civic action and public awareness. At the same time, it is clear that despite the potentialities of universities in this regard, much more could be done. This is particularly the case in low and middle-income countries in which there is disproportionate impact of the most devastating effects of climate change.

    This project addresses these questions in the context of the higher education systems of Brazil, Fiji, Kenya and Mozambique. These countries have been selected on account of the vulnerability of their populations to climate-related disasters, but also because of the potentialities of their higher education systems for responding to the challenges, and in generating learning that can be utilised in other contexts. The countries have distinct features in relation to their culture, politics, economics and geography, as well as in their higher education systems, which will allow for significant possibilities of learning across the countries and with the UK.

    The research started with a survey of the state of play as regards universities' coverage of climate change issues within their teaching, research and community engagement. Participatory action research groups were then created in 12 universities across the participating countries, including representatives of students, lecturers, senior management and local communities. These groups designed, implemented and monitored initiatives to address local challenges, in line with their own priorities. Interventions included new modules for students, community-based projects on disaster preparedness, and developing sustainable campuses.

    The learning generated from these diverse experiences contributed to theory building and understanding of the relationship between education and sustainable development, and of the role of higher education in achieving the Sustainable Development Goals (SDGs). There was a strong emphasis on South-South collaboration and learning, and insights generated from interaction and comparison across high/middle/low income countries, between Anglophone and Lusophone higher education systems, and between Africa, the Pacific and Latin America.

    While most acknowledge that education has some role to play in achieving the SDGs, much closer attention is needed to the institutional forms and practices that are most conducive. This project grapples with this question in the context of diverse countries in the Global South, with significant lessons for the broader global community.

  15. Comprehensive Films Dataset for Analysis

    • kaggle.com
    Updated Sep 21, 2025
    Cite
    eman fatima (2025). Comprehensive Films Dataset for Analysis [Dataset]. https://www.kaggle.com/datasets/emanfatima2025/comprehensive-films-dataset-for-analysis
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    eman fatima
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset is a thorough compilation of data about movies that includes both qualitative and quantitative characteristics. Every record reflects a distinct film and offers information about its identity, genre, release date, country of production, financial performance, audience reaction, and major players.

    Important points to note are:

    Identifiers and Metadata: Unique movie IDs, titles, genres, and release details.

    Financial Indicators: Early sales performance, domestic (U.S.) and international box office performance, and budget.

    Audience and Critical Reaction: Rotten Tomatoes scores, IMDb ratings, and the number of votes that go along with them.

    Contributors: Details about the director and the main actor.

    This dataset works well with:

    Exploratory Data Analysis (EDA): Finding trends in the popularity, profitability, and creation of films.

    Predictive Modeling: Forecasting financial success, ratings, or audience engagement.

    Comparative Studies: Examining patterns across genres, countries, and years of release.

    Recommendation Systems: Using votes and ratings to suggest films.

    This dataset offers a solid basis for data analysis, visualization, and machine learning applications in the film industry by providing a well-balanced combination of descriptive, financial, and evaluative information.
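    As a starting point for the EDA and comparative studies mentioned above, a short pandas sketch; the file name and column names are hypothetical stand-ins for the fields described.

        import pandas as pd

        # Hypothetical schema for the films table described above.
        films = pd.read_csv("films.csv")

        # Profitability: combined box office relative to budget.
        films["profit"] = films["domestic_gross"] + films["international_gross"] - films["budget"]

        # EDA examples: median profit by genre, mean IMDb rating by decade.
        print(films.groupby("genre")["profit"].median().sort_values(ascending=False).head())
        films["decade"] = (films["release_year"] // 10) * 10
        print(films.groupby("decade")["imdb_rating"].mean())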

  16. Diary Study Database

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Oct 21, 2024
    Cite
    Teresa Amabile (2024). Diary Study Database [Dataset]. http://doi.org/10.7910/DVN/25463
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 21, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Teresa Amabile
    License

    Custom dataset license: https://dataverse.harvard.edu/api/datasets/:persistentId/versions/3.4/customlicense?persistentId=doi:10.7910/DVN/25463

    Description

    The Diary Study (also known as The T.E.A.M. Study or The Progress Principle Study) was carried out in the late 1990s to early 2000s in order to probe the everyday work experiences of professionals working on important innovation projects within their companies. Teresa Amabile was the principal investigator. The database contains quantitative data and detailed categorical coding of qualitative data (not the verbatim qualitative data itself). Data were collected daily from the 238 professionals in 26 project teams who participated in this study throughout the entire course of a project (or discrete project phase) that required creativity – novel, useful ideas – in order to be successful. Many of the projects involved new product development. To the extent possible, daily data collection with a given team began on the first day of the project and continued until the last day. A large body of additional data on the individuals and their performance was collected at various other points during the study. The 26 teams were recruited from seven different companies in three industries: high tech, chemicals, and consumer products. Five of the companies had four teams that participated; one company had five teams; and one company had one team. (Please see the Metadata tab, below, for a full description.)

    The primary source of data is the Daily Questionnaire (DQ) diary form that each participant was emailed each workday, Monday through Friday, throughout the course of the project on which the participant’s team was working. Participants were asked to return the completed diary, which took most people 5-10 minutes to complete, shortly before the end of their workday. Most did complete the diary on the day that the diary referred to, but some habitually completed the diary early the next day. The overall response rate was 75%, yielding a total of 11,637 individual daily diary entries. The DQ, which was identical for each day, contained questions calling for Likert-scale responses to questions about psychological state that day: (a) emotions; (b) motivation; and (c) perceptions of the project supervisor, the project team, the work environment, and the work itself. In addition, participants completed an open-ended question asking them to describe one event that stood out in their minds from the day that was relevant to the work in some way – the “Event Description” (ED) – and then answered additional Likert-scale questions about the event. The DQ included a few additional quantitative items.

    Although the DQ forms collected both quantitative and qualitative data (the EDs), the raw qualitative data are not included in this database. All included data have been de-identified, and it was not possible to adequately disguise the qualitative data. However, this database contains codes from several different coding schemes that prior researchers using this database created to categorize the events (and attributes of events) that participants reported in their EDs. Of the two primary coding schemes, the Detailed Event Narrative Analysis (DENA) scheme is extremely detailed; the Broad Event Narrative Analysis (BENA) scheme is considerably less detailed. In addition, several LIWC (Linguistic Inquiry and Word Count) analyses of the EDs are included in this database.

    A great deal of additional quantitative data was collected from all participants at various points in the study of their teams, including: demographics; personality; job satisfaction; cognitive style; motivational orientation; broad perceptions of the work environment, the project team, and the project; and monthly assessments of the performance of themselves and each of their teammates. Data were also collected from multiple managers in the participant’s area of the organization, who were broadly familiar with projects in that area. These managers completed monthly surveys assessing each of the participating projects, as well as a set of comparable but non-participating projects, on several dimensions.

    The book, The Progress Principle (Amabile, T. & Kramer, S., 2011, Harvard Business Publishing), reports a number of findings derived from quantitative and qualitative analyses of this database. The Research Appendix of this book contains descriptions (written in non-technical terms) of the Diary Study companies, participants, procedure, data collection instruments, data, and primary analyses conducted by Amabile and her colleagues. The Dataverse record lists several papers that used this database. Like the book, they can be used for additional information about the data collection methods and instruments as well as findings. Approval is required for use of these data. To apply for access, fill out the Diary Study application for use; make sure you already have a Dataverse account first.

  17. URBANWASTE - Dataset 1 URBAN_METABOLISM_DATA

    • data.europa.eu
    • data.niaid.nih.gov
    unknown
    Updated Nov 8, 2017
    Cite
    Zenodo (2017). URBANWASTE - Dataset 1 URBAN_METABOLISM_DATA [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-1035097?locale=da
    Available download formats: unknown(647951)
    Dataset updated
    Nov 8, 2017
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

General description
Within URBANWASTE Work Package 2, the data from the pilot cases needed to perform the metabolic analysis were collected: mainly data on waste generation and management, tourism (accommodation capacity, tourist flows, tourism economy), and socio-economic data for each pilot. The indicator sets finally collected were first cross-checked with the 11 URBANWASTE pilot cases regarding data availability at the pilot case scale, to ensure their suitability and practicability for answering specific URBANWASTE questions. This cross-checking was done by means of a "Survey on data availability" within Task 2.3.

Origin, nature and scale of data
For data collection, all pilot case partners received an empty Excel database divided into three thematic areas (waste-related data, socio-economic data and tourism-related data). Where the pilot case partners did not have access to the requested information, they contacted other organisations such as local municipal departments, waste management companies, tourism associations and national statistical agencies for support in data provision. The data collected with these databases mainly represent statistical data. For transparency reasons, data sources were to be specified as well. The spatial scale of the collected data was supposed to be the pilot case area (i.e., the whole city, municipality or metropolitan area). As data at this small scale were not available for all data sets, some of the provided data are at regional or even national level. To ensure transparency, the spatial scale had to be specified for each data set. Depending on the type of indicator, the temporal scale varies from annual to monthly data. For selected data sets, annual time series were collected for the period 2000–2015. For some selected data sets (e.g. waste quantities, tourist arrivals & overnight stays), monthly time series were additionally collected for the period 2013–2015.

Data format
The database prepared to collect the data needed for performing the metabolic analysis was divided into three thematic areas, which are further divided into categories as indicated below:

Waste related data
- Waste generation and waste quantities [number]
- Waste prevention [text] [number]
- Waste management [number] [%] [€]

Socio-economic data
- Description of the pilot case [number] [km²]
- Economy [number] [%] [€]
- Society [number] [%]
- Building statistics [%]

Tourism related data
- Tourism economy [€]
- Accommodation capacity [number]
- Tourist flows [number]
- Other tourism related information [number]

Each category contains numerous indicators, each identified by a data ID, a unit and a spatial scale; data sources had to be specified as well (a minimal sketch of such an indicator record follows at the end of this description). Where needed, definitions of these indicators were added directly in the database template. In total, 48 data sets (some of them further divided into sub-sets) were collected. Most of the collected data are quantitative, in the format [number], [%], [€] or [km²]. The data on urban metabolism received from the pilot cases are stored in one Excel database.

Further information and contact
The data on waste generation and management, socio-economic data and tourism data used for all the assessments performed within Work Package 2 and presented in this report were provided by the URBANWASTE pilot cases. More detailed information is contained within the database.
In case of questions related to this database, please contact abf@boku.ac.at. For more information on the URBANWASTE project, please visit http://www.urban-waste.eu/
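
To make the structure of the collected records concrete, the following minimal sketch models one indicator entry as a Python dataclass. All field names and example values are illustrative assumptions, not the actual column layout of the URBANWASTE Excel template.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IndicatorRecord:
    # Hypothetical record mirroring the fields described above
    # (data ID, unit, spatial scale, source); names are illustrative.
    data_id: str                 # e.g. "W-01" for a waste-related indicator
    thematic_area: str           # "waste", "socio-economic" or "tourism"
    category: str                # e.g. "Waste generation and waste quantities"
    unit: str                    # e.g. "number", "%", "EUR", "km2"
    spatial_scale: str           # "pilot case area", "regional" or "national"
    year: int                    # annual series cover 2000-2015
    month: Optional[int] = None  # monthly series (2013-2015) only
    value: Optional[float] = None
    source: str = ""             # data source, required for transparency

# Example entry for a monthly waste-quantity observation
record = IndicatorRecord(
    data_id="W-01", thematic_area="waste",
    category="Waste generation and waste quantities",
    unit="number", spatial_scale="pilot case area",
    year=2014, month=7, value=1250.0, source="municipal waste company")
print(record)
```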

  18. c

    Standardization in Quantitative Imaging: A Multi-center Comparison of...

    • cancerimagingarchive.net
    n/a, nifti and zip +1
    Updated Jun 9, 2020
    Cite
    The Cancer Imaging Archive (2020). Standardization in Quantitative Imaging: A Multi-center Comparison of Radiomic Feature Values [Dataset]. http://doi.org/10.7937/tcia.2020.9era-gg29
    Explore at:
    xlsx, n/a, nifti and zip (available download formats)
    Dataset updated
    Jun 9, 2020
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Jun 9, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    This dataset was used by the NCI's Quantitative Imaging Network (QIN) PET-CT Subgroup for their project titled: Multi-center Comparison of Radiomic Features from Different Software Packages on Digital Reference Objects and Patient Datasets. The purpose of this project was to assess the agreement among radiomic features computed by several groups using different software packages under very tightly controlled conditions, which included common image data sets and standardized feature definitions. The image datasets (and Volumes of Interest, VOIs) provided here are the same ones used in that project and reported in the publication listed below (ISSN 2379-1381, https://doi.org/10.18383/j.tom.2019.00031). In addition, we have provided detailed information about the software packages used (Table 1 in that publication) as well as the individual feature values for each image dataset and each software package that were used to create the summary tables (Tables 2, 3 and 4) in that publication. For that project, nine common quantitative imaging features were selected for comparison, including features that describe morphology, intensity, shape, and texture; these are described in detail in the Image Biomarker Standardisation Initiative reference (IBSI, https://arxiv.org/abs/1612.07003) and publication (Zwanenburg A, Vallières M, et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology. 2020 May;295(2):328-338. doi: https://doi.org/10.1148/radiol.2020191145). There are three datasets provided: two image datasets and one dataset consisting of four Excel spreadsheets containing feature values.

    1. The first image dataset is a set of three Digital Reference Objects (DROs) used in the project: (a) a sphere with uniform intensity, (b) a sphere with intensity variation, and (c) a nonspherical (but mathematically defined) object with uniform intensity. These DROs were created by the team at Stanford University, are described in (Jaggi A, Mattonen SA, McNitt-Gray M, Napel S. Stanford DRO Toolkit: digital reference objects for standardization of radiomic features. Tomography. 2019;6:–.), and are a subset of the DROs described in the DRO Toolkit. Each DRO is represented in both DICOM and NIfTI format, and the VOI was provided in each format as well (DICOM Segmentation Object (DSO) as well as NIfTI segmentation boundary).
    2. The second image dataset is the set of 10 patient CT scans, originating from the LIDC-IDRI dataset, that were used in the QIN multi-site collection of Lung CT data with Nodule Segmentations project (https://doi.org/10.7937/K9/TCIA.2015.1BUVFJR7). In that QIN study, a single lesion from each case was identified for analysis, and nine VOIs were then generated using three repeat runs of three segmentation algorithms (one from each of three academic institutions) on each lesion. To eliminate one source of variability in our project, only one of the VOIs previously created for each lesion was identified, and all sites used that same VOI definition. The specific VOI chosen for each lesion was the first run of the first algorithm (algorithm 1, run 1). DICOM images were provided for each dataset, and the VOI was provided in both DICOM Segmentation Object (DSO) and NIfTI segmentation formats.
    3. The third dataset is a collection of four Excel spreadsheets, each of which contains detailed information corresponding to one of the four tables in the cited publication (https://doi.org/10.18383/j.tom.2019.00031), including the raw feature values behind the summary Tables 2, 3 and 4. These spreadsheets are:
    Software Package details: detailed information about the software packages used in the study (listed in Table 1 in the publication), including version number and any parameters specified in the calculation of the reported features.
    DRO results: the original feature values obtained by each software package for each DRO, as well as the table summarizing results across software packages (Table 2 in the publication).
    Patient Dataset results: the original feature values for each software package for each patient dataset (one lesion per case), as well as the table summarizing results across software packages and patient datasets (Table 3 in the publication).
    Harmonized GLCM Entropy results: the values of the "harmonized" GLCM entropy feature for each patient dataset and each software package, as well as the summary across software packages (Table 4 in the publication); a toy illustration of the GLCM entropy computation follows this list.
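
As a rough illustration of the kind of texture feature compared in this project, the sketch below computes a single-offset GLCM entropy for a 2D image. The quantization, offset handling, and lack of VOI masking are simplifying assumptions; this does not reproduce any participating package's implementation, and the function name glcm_entropy is ours.

```python
import numpy as np

def glcm_entropy(image, levels=8, offset=(0, 1)):
    """Gray-level co-occurrence matrix (GLCM) entropy for a 2D array.

    Simplified, single-offset illustration; real packages aggregate
    over multiple offsets/directions and apply VOI masks.
    """
    # Quantize intensities into `levels` discrete gray levels.
    edges = np.linspace(image.min(), image.max(), levels + 1)[1:-1]
    q = np.digitize(image, edges)
    dy, dx = offset
    glcm = np.zeros((levels, levels))
    rows, cols = q.shape
    # Count co-occurrences of gray-level pairs at the given offset.
    for y in range(rows - dy):
        for x in range(cols - dx):
            glcm[q[y, x], q[y + dy, x + dx]] += 1
    p = glcm / glcm.sum()   # normalize to joint probabilities
    nz = p[p > 0]           # drop zero entries to avoid log(0)
    return -np.sum(nz * np.log2(nz))

# Example: entropy of a random texture patch
rng = np.random.default_rng(0)
print(glcm_entropy(rng.random((64, 64))))
```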

  19. Z

    Dataset: maturity of transparency of open data ecosystems in 22 smart cities...

    • data.niaid.nih.gov
    Updated Apr 27, 2022
    Cite
    Anastasija Nikiforova; Martin Lnenicka; Mariusz Luterek (2022). Dataset: maturity of transparency of open data ecosystems in 22 smart cities [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6497068
    Explore at:
    Dataset updated
    Apr 27, 2022
    Dataset provided by
    University of Warsaw
    University of Pardubice
    University of Tartu
    Authors
    Anastasija Nikiforova; Martin Lnenicka; Mariusz Luterek
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data collected during a study "Transparency of open data ecosystems in smart cities: Definition and assessment of the maturity of transparency in 22 smart cities" (Sustainable Cities and Society (SCS), vol.82, 103906) conducted by Martin Lnenicka (University of Pardubice), Anastasija Nikiforova (University of Tartu), Mariusz Luterek (University of Warsaw), Otmane Azeroual (German Centre for Higher Education Research and Science Studies), Dandison Ukpabi (University of Jyväskylä), Visvaldis Valtenbergs (University of Latvia), Renata Machova (University of Pardubice).

    This study inspects smart cities’ data portals and assesses their compliance with transparency requirements for open (government) data by means of the expert assessment of 34 portals representing 22 smart cities, with 36 features.

    It is being made public both to act as supplementary data for the paper and to allow other researchers to use these data in their own work, potentially contributing to the improvement of current data ecosystems and the building of sustainable, transparent, citizen-centered, and socially resilient open data-driven smart cities.

    Purpose of the expert assessment The data in this dataset were collected as a result of applying the benchmarking framework for assessing the compliance of open (government) data portals with the principles of transparency-by-design, proposed by Lněnička and Nikiforova (2021)*, to 34 portals that can be considered part of open data ecosystems in smart cities. The portals were assessed by experts against 36 features, which allows ranking them, discussing their maturity levels and, based on the results of the assessment, defining the components and unique models that form the open data ecosystem in the smart city context.

    Methodology Sample selection: the capitals of the Member States of the European Union and of the countries of the European Economic Area were selected to ensure a more coherent political and legal framework. They were cross-referenced with their rank in five smart city rankings: IESE Cities in Motion Index, Top 50 smart city governments (SCG), IMD smart city index (SCI), global cities index (GCI), and sustainable cities index (SCI). A purposive sampling method and a systematic search for portals were then carried out to identify relevant websites for each city, using two complementary techniques: browsing and searching. To evaluate the transparency maturity of data ecosystems in smart cities, we used the transparency-by-design framework (Lněnička & Nikiforova, 2021)*. The benchmarking presupposes the collection of quantitative data. Each sub-dimension/feature was assessed on a six-point Likert scale, where strong agreement is scored with 6 points and strong disagreement with 1 point. Each sub-dimension was supplied with a description to ensure common understanding, a drop-down list to select the level at which the respondent (dis)agrees, and an optional comment field. This formed a protocol to be filled in for every portal. Each website (portal) was evaluated by experts, where a person is considered an expert if working with open (government) data and data portals daily, i.e., it is the key part of their job; this includes public officials, researchers, and members of independent organizations. In other words, compliance with the expert profile according to the International Certification of Digital Literacy (ICDL) and its derivation proposed in Lněnička et al. (2021)* is expected. When all individual protocols were collected, mean values and standard deviations (SD) were calculated, and if statistical contradictions/inconsistencies were found, reassessment took place to ensure individual consistency and interrater reliability among the experts' answers. *Lnenicka, M., & Nikiforova, A. (2021). Transparency-by-design: What is the role of open data portals?. Telematics and Informatics, 61, 101605 *Lněnička, M., Machova, R., Volejníková, J., Linhartová, V., Knezackova, R., & Hub, M. (2021). Enhancing transparency through open government data: the case of data portals and their features and capabilities. Online Information Review.

    Test procedure (1) perform an assessment of each dimension using its sub-dimensions, mapping out the achievement of each indicator; (2) all sub-dimensions in one dimension are aggregated and the average value is calculated based on the number of sub-dimensions – the resulting average stands for the dimension value, giving eight values per portal; (3) the average value over all dimensions is calculated and then mapped to the maturity level – this value of each portal is also used to rank the portals. A minimal sketch of this aggregation appears below.
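
The following is a minimal sketch of steps (1)–(3), assuming illustrative dimension names, invented sub-dimension scores, and a hypothetical maturity-level mapping; the study's actual dimension names and level boundaries are not reproduced here.

```python
from statistics import mean

# Hypothetical protocol scores for one portal: each dimension holds
# sub-dimension scores on the six-point Likert scale (1 = strong
# disagreement, 6 = strong agreement). Dimension names are illustrative.
portal_scores = {
    "data_publication": [5, 6, 4],
    "data_quality": [4, 4, 5, 3],
    # ... remaining six dimensions of the transparency-by-design framework
}

def dimension_values(scores):
    # Step (2): average the sub-dimension scores within each dimension.
    return {dim: mean(vals) for dim, vals in scores.items()}

def overall_value(scores):
    # Step (3): average across dimensions; used to rank the portals.
    return mean(dimension_values(scores).values())

def maturity_level(value):
    # Hypothetical mapping of the overall value to a maturity label;
    # the study's actual level boundaries may differ.
    for threshold, label in [(5, "optimized"), (4, "managed"),
                             (3, "defined"), (2, "initial")]:
        if value >= threshold:
            return label
    return "ad hoc"

v = overall_value(portal_scores)
print(round(v, 2), maturity_level(v))
```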

    Description of the data in this data set Sheet#1 "comparison_overall" provides results by portal. Sheet#2 "comparison_category" provides results by portal and category. Sheet#3 "category_subcategory" provides the list of categories and their elements.

    Format of the file .xls

    Licenses or restrictions CC-BY

    For more info, see README.txt

  20. Overview of training and test sets.

    • plos.figshare.com
    xls
    Updated Apr 18, 2024
    Cite
    Jonas Bischofberger; Arnold Baca; Erich Schikuta (2024). Overview of training and test sets. [Dataset]. http://doi.org/10.1371/journal.pone.0298107.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jonas Bischofberger; Arnold Baca; Erich Schikuta
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With recent technological advancements, quantitative analysis has become an increasingly important area within professional sports. However, the manual process of collecting data on relevant match events like passes, goals and tackles comes with considerable costs and limited consistency across providers, affecting both research and practice. In football, while automatic detection of events from positional data of the players and the ball could alleviate these issues, it is not entirely clear what accuracy current state-of-the-art methods realistically achieve because there is a lack of high-quality validations on realistic and diverse data sets. This paper adds context to existing research by validating a two-step rule-based pass and shot detection algorithm on four different data sets using a comprehensive validation routine that accounts for the temporal, hierarchical and imbalanced nature of the task. Our evaluation shows that pass and shot detection performance is highly dependent on the specifics of the data set. In accordance with previous studies, we achieve F-scores of up to 0.92 for passes, but only when there is an inherent dependency between event and positional data. We find a significantly lower accuracy with F-scores of 0.71 for passes and 0.65 for shots if event and positional data are independent. This result, together with a critical evaluation of existing methodologies, suggests that the accuracy of current football event detection algorithms operating on positional data is currently overestimated. Further analysis reveals that the temporal extraction of passes and shots from positional data poses the main challenge for rule-based approaches. Our results further indicate that the classification of plays into shots and passes is a relatively straightforward task, achieving F-scores between 0.83 and 0.91 for rule-based classifiers and up to 0.95 for machine learning classifiers. We show that there exist simple classifiers that accurately differentiate shots from passes in different data sets using a low number of human-understandable rules. Operating on basic spatial features, our classifiers provide a simple, objective event definition that can be used as a foundation for more reliable event-based match analysis.
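
To give a flavour of what a "low number of human-understandable rules" operating on basic spatial features might look like, here is a toy sketch; the feature names and thresholds are invented for illustration and are not the classifiers validated in the paper.

```python
from dataclasses import dataclass

@dataclass
class Play:
    # Basic spatial features of a detected play; names and units are
    # illustrative assumptions, not the paper's actual feature set.
    distance_to_goal: float  # metres from ball release point to goal centre
    angle_to_goal: float     # degrees between ball direction and goal centre
    ball_speed: float        # m/s at release

def classify_play(play: Play) -> str:
    """Toy rule-based classifier distinguishing shots from passes."""
    if (play.distance_to_goal < 30.0
            and abs(play.angle_to_goal) < 20.0
            and play.ball_speed > 15.0):
        return "shot"
    return "pass"

# Example: a fast ball played close to and towards the goal
print(classify_play(Play(distance_to_goal=18.0, angle_to_goal=5.0,
                         ball_speed=22.0)))
```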

Dataset: A Systematic Literature Review on the topic of High-value datasets


Description of the data in this data set

Protocol_HVD_SLR provides the structure of the protocol. Spreadsheet #1 provides the filled protocol for the relevant studies. Spreadsheet #2 provides the list of results after the search over the three indexing databases, i.e., before filtering out irrelevant studies.

The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information

Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
2) Complete reference - the complete source information to refer to the study
3) Year of publication - the year in which the study was published
4) Journal article / conference paper / book chapter - the type of the paper: {journal article, conference paper, book chapter}
5) DOI / Website - a link to the website where the study can be found
6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
7) Availability in OA - availability of the article in Open Access
8) Keywords - keywords of the paper as indicated by the authors
9) Relevance for this study - the relevance level of the article for this study: {high / medium / low}

Approach- and research design-related information
10) Objective / RQ - the research objective / aim, established research questions
11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR, etc.)
12) Contributions - the contributions of the study
13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach
14) Availability of the underlying research data - whether there is a reference to publicly available underlying research data, e.g., transcriptions of interviews, collected data, or an explanation why these data are not shared
15) Period under investigation - period (or moment) in which the study was conducted
16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is it used in the study?

Quality- and relevance- related information
17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)
18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around HVD determination; secondary - mentioned but not studied (e.g., as part of discussion, future work, etc.))

HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, "input -> output")
21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
23) Data - what data do HVD cover?
24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)
A minimal machine-readable sketch of such a protocol record follows below.
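
For readers who want to process the filled protocol programmatically, here is a minimal sketch of one study's record with the 24 fields grouped into the four categories. The keys paraphrase the field names above and the values are invented; the authoritative structure is given in Protocol_HVD_SLR.odt/.docx.

```python
# Minimal sketch of one filled-protocol row; keys paraphrase the fields
# listed above, grouped by category. Values are invented examples.
study = {
    "descriptive": {
        "article_number": 1,
        "year": 2021,
        "type": "journal article",  # {journal article, conference paper, book chapter}
        "open_access": True,
        "relevance": "high",        # {high, medium, low}
    },
    "approach_design": {
        "method": "qualitative",    # {qualitative, quantitative, mixed}
        "underlying_data_available": False,
    },
    "quality": {
        "quality_concerns": None,
        "hvd_primary_object": True,  # primary vs. secondary research object
    },
    "hvd_determination": {
        "hvd_definition": "...",
        "indicators": ["demand", "impact"],
        "level": "national",
    },
}

# Example: keep only highly relevant studies where HVD is the primary object
def is_core_study(s):
    return (s["descriptive"]["relevance"] == "high"
            and s["quality"]["hvd_primary_object"])

print(is_core_study(study))
```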

Format of the file .xls, .csv (for the first spreadsheet only), .odt, .docx

Licenses or restrictions CC-BY

For more info, see README.txt
