100+ datasets found
  1. Dataset: A Systematic Literature Review on the topic of High-value datasets

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 23, 2023
    Cite
    Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    Gdańsk University of Technology
    University of the Aegean
    University of Tartu
    University of Zagreb
    Authors
    Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the paper "Towards High-Value Datasets determination for data-driven development: a systematic literature review" (the pre-print is available in Open Access at https://arxiv.org/abs/2305.10234) and to allow other researchers to use these data in their own work.

    The protocol is intended for the systematic literature review (SLR) on the topic of high-value datasets (HVD). It aims to gather information on how HVD and their determination have been reflected in the literature over the years and what these studies have found to date, including the indicators used in them, the stakeholders involved, data-related aspects, and frameworks. The data in this dataset were collected as a result of the SLR over Scopus, Web of Science, and the Digital Government Research Library (DGRL) in 2023.

    Methodology

    To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic was studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and the Digital Government Research Library (DGRL).

    These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), applied to the article title, keywords, and abstract. This limited the results to papers in which these objects were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 unique articles were found and checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.
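
    As an illustration only, the keyword query and the deduplication step could be sketched in Python as follows. The record fields (`title`, `keywords`, `abstract`, `doi`) and the sample records are hypothetical; the actual searches were run in the databases' own query interfaces, and the deduplication rule used by the authors is not stated, so matching on DOI or title is an assumption.

```python
import re

# Wildcard terms from the query: "high-value data*" OR "high value data*"
HVD = re.compile(r"\bhigh[- ]value data\w*", re.IGNORECASE)
# "open data" OR "open government data"
OPEN_DATA = re.compile(r"\bopen (?:government )?data\b", re.IGNORECASE)


def matches_query(record):
    """Apply the query to title, keywords, and abstract only (not the body)."""
    text = " ".join(record.get(f, "") for f in ("title", "keywords", "abstract"))
    return bool(OPEN_DATA.search(text)) and bool(HVD.search(text))


def deduplicate(records):
    """Keep the first record per DOI (assumed rule), falling back to the title."""
    seen, unique = set(), []
    for rec in records:
        key = (rec.get("doi") or rec["title"]).strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical records, as if exported from two different databases.
records = [
    {"title": "Identifying high-value datasets in open government data", "doi": "10.0000/a"},
    {"title": "Identifying High-Value Datasets in Open Government Data", "doi": "10.0000/A"},
    {"title": "Open data portals", "abstract": "HVD are left for future work."},
]
hits = deduplicate([r for r in records if matches_query(r)])
```

    Here the first two records collapse into one hit, and the third is excluded because it never spells out the high-value data term in the searched fields.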

    To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.

    Test procedure

    Each study was independently examined by at least two authors: after an in-depth examination of the full text of the article, the structured protocol was filled in for each study. The structure of the survey is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by a third researcher.

    Description of the data in this data set

    Protocol_HVD_SLR provides the structure of the protocol.
    Spreadsheet #1 provides the filled protocol for relevant studies.
    Spreadsheet #2 provides the list of results after the search over three indexing databases, i.e., before filtering out irrelevant studies.

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information

    Descriptive information
    1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
    2) Complete reference - the complete source information to refer to the study
    3) Year of publication - the year in which the study was published
    4) Journal article / conference paper / book chapter - the type of the paper {journal article, conference paper, book chapter}
    5) DOI / Website - a link to the website where the study can be found
    6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
    7) Availability in OA - availability of an article in the Open Access
    8) Keywords - keywords of the paper as indicated by the authors
    9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

    Approach- and research design-related information
    10) Objective / RQ - the research objective / aim, established research questions
    11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR etc.)
    12) Contributions - the contributions of the study
    13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach
    14) Availability of the underlying research data - whether there is a reference to the publicly available underlying research data, e.g., transcriptions of interviews, collected data, or an explanation of why these data are not shared
    15) Period under investigation - period (or moment) in which the study was conducted
    16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?

    Quality- and relevance- related information
    17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)
    18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused on HVD determination; secondary - mentioned but not studied, e.g., as part of discussion, future work etc.)

    HVD determination-related information
    19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
    20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, "input -> output")
    21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
    22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
    23) Data - what data do HVD cover?
    24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)

    Format of the file: .xls, .csv (for the first spreadsheet only), .odt, .docx

    Licenses or restrictions: CC-BY

    For more info, see README.txt

  2. Data Use in Academia Dataset

    • datacatalog.worldbank.org
    csv, utf-8
    Updated Nov 27, 2023
    Cite
    Semantic Scholar Open Research Corpus (S2ORC) (2023). Data Use in Academia Dataset [Dataset]. https://datacatalog.worldbank.org/search/dataset/0065200/data_use_in_academia_dataset
    Available download formats: utf-8, csv
    Dataset updated
    Nov 27, 2023
    Dataset provided by
    Semantic Scholar Open Research Corpus (S2ORC)
    Brian William Stacy
    License

    https://datacatalog.worldbank.org/public-licenses?fragment=cc

    Description

    This dataset contains metadata (title, abstract, date of publication, field, etc) for around 1 million academic articles. Each record contains additional information on the country of study and whether the article makes use of data. Machine learning tools were used to classify the country of study and data use.


    Our data source of academic articles is the Semantic Scholar Open Research Corpus (S2ORC) (Lo et al. 2020). The corpus contains more than 130 million English language academic papers across multiple disciplines. The papers included in the Semantic Scholar corpus are gathered directly from publishers, from open archives such as arXiv or PubMed, and crawled from the internet.


    We placed some restrictions on the articles to make them usable and relevant for our purposes. First, only articles with an abstract and a parsed PDF or LaTeX file are included in the analysis. The full text of the abstract is necessary to classify the country of study and whether the article uses data. The parsed PDF or LaTeX file is needed to extract information such as the date of publication and field of study. This restriction eliminated a large number of articles in the original corpus. Around 30 million articles remain after keeping only articles with a parsable (i.e., suitable for digital processing) PDF, and around 26% of those 30 million are eliminated when removing articles without an abstract. Second, only articles from the year 2000 to 2020 were considered. This restriction eliminated an additional 9% of the remaining articles. Finally, articles from the following fields of study were excluded, as we aim to focus on fields that are likely to use data produced by countries' national statistical systems: Biology, Chemistry, Engineering, Physics, Materials Science, Environmental Science, Geology, History, Philosophy, Math, Computer Science, and Art. Fields that are included are: Economics, Political Science, Business, Sociology, Medicine, and Psychology. This third restriction eliminated around 34% of the remaining articles. From an initial corpus of 136 million articles, this resulted in a final corpus of around 10 million articles.


    Due to the intensive computer resources required, a set of 1,037,748 articles were randomly selected from the 10 million articles in our restricted corpus as a convenience sample.


    The empirical approach employed in this project utilizes text mining with Natural Language Processing (NLP). The goal of NLP is to extract structured information from raw, unstructured text. In this project, NLP is used to extract the country of study and whether the paper makes use of data. We will discuss each of these in turn.


    To determine the country or countries of study in each academic article, two approaches are employed based on information found in the title, abstract, or topic fields. The first approach uses regular expression searches based on the presence of ISO3166 country names. A defined set of country names is compiled, and the presence of these names is checked in the relevant fields. This approach is transparent, widely used in social science research, and easily extended to other languages. However, there is a potential for exclusion errors if a country’s name is spelled non-standardly.
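
    A minimal sketch of the regular-expression approach is given below, assuming a small hand-picked subset of ISO 3166 short names rather than the full list used in the project. The function name `countries_in` and the sample title and abstract are hypothetical.

```python
import re

# Small illustrative subset; a full list would cover all ISO 3166 country names.
COUNTRY_NAMES = ["Kenya", "Brazil", "Viet Nam", "India"]

# Longest names first so that multi-word names are tried before shorter ones.
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(COUNTRY_NAMES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)
CANONICAL = {n.lower(): n for n in COUNTRY_NAMES}


def countries_in(*fields):
    """Return the set of listed country names found in the given text fields."""
    found = set()
    for text in fields:
        for match in PATTERN.finditer(text or ""):
            found.add(CANONICAL[match.group(1).lower()])
    return found


title = "Household survey evidence from Kenya and Brazil"
abstract = "We also compare results with Vietnam."
result = countries_in(title, abstract)
```

    In this example `result` contains Kenya and Brazil but not Viet Nam, because the abstract uses the spelling "Vietnam". This is the kind of exclusion error described above for non-standard spellings.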


    The second approach is based on Named Entity Recognition (NER), which uses machine learning to identify objects from text, utilizing the spaCy Python library. The Named Entity Recognition algorithm splits text into named entities, and NER is used in this project to identify countries of study in the academic articles. SpaCy supports multiple languages and has been trained on multiple spellings of countries, overcoming some of the limitations of the regular expression approach. If a country is identified by either the regular expression search or NER, it is linked to the article. Note that one article can be linked to more than one country.


    The second task is to classify whether the paper uses data. A supervised machine learning approach is employed, where 3500 publications were first randomly selected and manually labeled by human raters using the Mechanical Turk service (Paszke et al. 2019).[1] To make sure the human raters had a similar and appropriate definition of data in mind, they were given the following instructions before seeing their first paper:


    Each of these documents is an academic article. The goal of this study is to measure whether a specific academic article is using data and from which country the data came.

    There are two classification tasks in this exercise:

    1. identifying whether an academic article is using data from any country

    2. Identifying from which country that data came.

    For task 1, we are looking specifically at the use of data. Data is any information that has been collected, observed, generated or created to produce research findings. As an example, a study that reports findings or analysis using survey data uses data. Some clues that a study does use data include whether a survey or census is described, a statistical model is estimated, or a table of means or summary statistics is reported.

    After an article is classified as using data, please note the type of data used. The options are population or business census, survey data, administrative data, geospatial data, private sector data, and other data. If no data is used, then mark "Not applicable". In cases where multiple data types are used, please click multiple options.[2]

    For task 2, we are looking at the country or countries that are studied in the article. In some cases, no country may be applicable. For instance, if the research is theoretical and has no specific country application. In some cases, the research article may involve multiple countries. In these cases, select all countries that are discussed in the paper.

    We expect between 10 and 35 percent of all articles to use data.


    The median amount of time that a worker spent on an article, measured as the time between when the article was accepted to be classified by the worker and when the classification was submitted, was 25.4 minutes. If human raters were used exclusively rather than machine learning tools, the corpus of 1,037,748 articles examined in this study would take around 50 years of human work time to review at a cost of $3,113,244, assuming the $3 per article that was paid to MTurk workers.
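
    The cost and time figures can be reproduced with a few lines of arithmetic. The sketch below assumes that "years of human work time" means continuous time (24 hours a day, 365 days a year), which is the reading under which the reported figure of around 50 years comes out; the variable names are illustrative.

```python
ARTICLES = 1_037_748          # size of the sampled corpus
MINUTES_PER_ARTICLE = 25.4    # median review time per article
COST_PER_ARTICLE = 3          # USD paid per article to MTurk workers

total_cost = ARTICLES * COST_PER_ARTICLE
total_hours = ARTICLES * MINUTES_PER_ARTICLE / 60

# Assumption: continuous work, 24 hours a day and 365 days a year.
years_continuous = total_hours / (24 * 365)

print(f"cost: ${total_cost:,}")
print(f"time: about {years_continuous:.0f} years")
```

    Under a different assumption, such as 8-hour working days, the same number of hours would correspond to a much longer calendar period.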


    A model is next trained on the 3,500 labelled articles. We use a distilled version of the BERT (Bidirectional Encoder Representations from Transformers) model to encode raw text into a numeric format suitable for predictions (Devlin et al. 2018). BERT is pre-trained on a large corpus comprising the Toronto Book Corpus and Wikipedia. The distilled version (DistilBERT) is a compressed model that is 60% the size of BERT, retains 97% of its language understanding capabilities, and is 60% faster (Sanh, Debut, Chaumond, Wolf 2019). We use PyTorch to produce a model to classify articles based on the labeled data. Of the 3,500 articles that were hand coded by the MTurk workers, 900 are fed to the machine learning model. 900 articles were selected because of computational limitations in training the NLP model. A classification of "uses data" was assigned if the model predicted an article used data with at least 90% confidence.
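
    The 90% decision rule can be sketched without the model itself. The example below assumes that "confidence" is the softmax probability of the "uses data" class computed from two output scores (logits); that interpretation, the function names, and the sample scores are assumptions for illustration, not details taken from the project.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uses_data(logits, threshold=0.90):
    """Label an article "uses data" only if that class reaches the threshold.

    `logits` is assumed to be [score_no_data, score_uses_data].
    """
    return softmax(logits)[1] >= threshold


# Hypothetical model outputs for two articles.
confident = uses_data([-1.0, 2.0])   # probability of "uses data" is about 0.95
uncertain = uses_data([0.0, 1.0])    # probability of "uses data" is about 0.73
```

    With a threshold this high, articles the model is unsure about are left unlabelled as "uses data", which trades some recall for fewer false positives.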


    The performance of the models classifying articles to countries and as using data or not can be compared to the classification by the human raters. We consider the human raters as giving us the ground truth. This may underestimate the model performance if the workers at times got the allocation wrong in a way that would not apply to the model. For instance, a human rater could mistake the Republic of Korea for the Democratic People's Republic of Korea. If both humans and the model make the same kinds of errors, then the performance reported here will be overestimated.


    The model was able to predict whether an article made use of data with 87% accuracy evaluated on the set of articles held out of the model training. The correlation between the number of articles written about each country using data estimated under the two approaches is given in the figure below. The number of articles represents an aggregate total of

  3. Search strategy for data sources.

    • plos.figshare.com
    xls
    Updated Sep 8, 2025
    Cite
    Vinicius Costa Maia Monteiro; Ísis de Siqueira Silva; Pedro Bezerra Xavier; Felipe Magdiel Bandeira Montenegro; Josemario de Abreu Silva; Severina Alice da Costa Uchoa (2025). Search strategy for data sources. [Dataset]. http://doi.org/10.1371/journal.pone.0331902.t004
    Available download formats: xls
    Dataset updated
    Sep 8, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Vinicius Costa Maia Monteiro; Ísis de Siqueira Silva; Pedro Bezerra Xavier; Felipe Magdiel Bandeira Montenegro; Josemario de Abreu Silva; Severina Alice da Costa Uchoa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Aim: To describe the protocol for a scoping review on digital health technologies in Primary Health Care in rural territories, with a view to evaluating their impact on the attributes of Primary Health Care and identifying barriers and facilitators for its implementation.

    Background: Rural populations face significant barriers in accessing health care, and digital health emerges as a promising strategy to overcome challenges. Nonetheless, there is a gap in the literature regarding the systematic evaluation of the impact of these technologies on rural Primary Health Care, which justifies this scoping review.

    Method: Scoping review protocol, conducted according to the guidelines of the JBI Manual for Evidence Synthesis. It was duly registered in the Open Science Framework platform, with information on the conduct of the study in nine stages, following the PRISMA-ScR. The search strategy includes relevant databases and gray literature to ensure a broad mapping of scientific production on the topic. The analysis of quantitative variables will be carried out by simple descriptive statistics, with absolute and relative frequencies, while qualitative data will undergo thematic content analysis, following the stages of preparation, organization and reporting.

    Expected results: By evaluating the impact of digital technologies on rural Primary Health Care services, as well as identifying barriers and facilitators in their implementation, information is sought to improve access and quality of these services in rural territories through digital health.

    Conclusion: This review will have great practical relevance for managers, public policy makers, health professionals and researchers, as it is intended to map scientific evidence that can support decision-making and the development of strategies for the implementation of digital health in the studied context.

  4. AI Dataset Search Platform Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 21, 2025
    Cite
    Growth Market Reports (2025). AI Dataset Search Platform Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/ai-dataset-search-platform-market
    Available download formats: pptx, pdf, csv
    Dataset updated
    Aug 21, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    AI Dataset Search Platform Market Outlook



    According to our latest research, the global AI Dataset Search Platform market size is valued at USD 1.18 billion in 2024, with a robust year-over-year expansion driven by the escalating demand for high-quality datasets to fuel artificial intelligence and machine learning initiatives across industries. The market is expected to grow at a CAGR of 22.6% from 2025 to 2033, reaching an estimated USD 9.62 billion by 2033. This exponential growth is primarily attributed to the increasing recognition of data as a strategic asset, the proliferation of AI applications across sectors, and the need for efficient, scalable, and secure platforms to discover, curate, and manage diverse datasets.



    One of the primary growth factors propelling the AI Dataset Search Platform market is the exponential surge in AI adoption across both public and private sectors. Businesses and institutions are increasingly leveraging AI to gain competitive advantages, enhance operational efficiencies, and deliver personalized experiences. However, the effectiveness of AI models is fundamentally reliant on the quality and diversity of training datasets. As organizations strive to accelerate their AI initiatives, the need for platforms that can efficiently search, aggregate, and validate datasets from disparate sources has become paramount. This has led to a significant uptick in investments in AI dataset search platforms, as they enable faster data discovery, reduce development cycles, and ensure compliance with data governance standards.



    Another key driver for the market is the growing complexity and volume of data generated from emerging technologies such as IoT, edge computing, and connected devices. The sheer scale and heterogeneity of data sources necessitate advanced search platforms equipped with intelligent indexing, semantic search, and metadata management capabilities. These platforms not only facilitate the identification of relevant datasets but also support data annotation, labeling, and preprocessing, which are critical for building robust AI models. Furthermore, the integration of AI-powered search algorithms within these platforms enhances the accuracy and relevance of search results, thereby improving the overall efficiency of data scientists and AI practitioners.



    Additionally, regulatory pressures and the increasing emphasis on ethical AI have underscored the importance of transparent and auditable data sourcing. Organizations are compelled to demonstrate the provenance and integrity of the datasets used in their AI models to mitigate risks related to bias, privacy, and compliance. AI dataset search platforms address these challenges by providing traceability, version control, and access management features, ensuring that only authorized and compliant datasets are utilized. This not only reduces legal and reputational risks but also fosters trust among stakeholders, further accelerating market adoption.



    From a regional perspective, North America dominates the AI Dataset Search Platform market in 2024, accounting for over 38% of the global revenue. This leadership is driven by the presence of major technology providers, a mature AI ecosystem, and substantial investments in research and development. Europe follows closely, benefiting from stringent data privacy regulations and strong government support for AI innovation. The Asia Pacific region is experiencing the fastest growth, propelled by rapid digital transformation, expanding AI research communities, and increasing government initiatives to foster AI adoption. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as organizations in these regions gradually embrace AI-driven solutions.





    Component Analysis



    The AI Dataset Search Platform market by component is segmented into platforms and services, each playing a pivotal role in the ecosystem. The platform segment encompasses the core software infrastructure that enables users to search, index, curate, and manage datasets. This segmen

  5. Conceptualization of public data ecosystems

    • data.niaid.nih.gov
    Updated Sep 26, 2024
    Cite
    Anastasija, Nikiforova; Martin, Lnenicka (2024). Conceptualization of public data ecosystems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13842001
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    University of Tartu
    University of Hradec Králové
    Authors
    Anastasija, Nikiforova; Martin, Lnenicka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).

    As there is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems and how these elements form and shape the development of these ecosystems over time, which can lead to misguided efforts to develop future public data ecosystems, the aim of the study is: (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, this study presents a typology of public data ecosystems and develops a conceptual model of elements and formative characteristics that contribute most to value-adding public data ecosystems, and develops a conceptual model of the evolutionary generation of public data ecosystems represented by six generations called Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.

    This dataset is being made public to act as supplementary data for "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems", Telematics and Informatics, and its Systematic Literature Review component that informs the study.

    Description of the data in this data set

    PublicDataEcosystem_SLR provides the structure of the protocol

    Spreadsheet #1 provides the list of results after the search over three indexing databases and filtering out irrelevant studies.

    Spreadsheet #2 provides the protocol structure.

    Spreadsheet #3 provides the filled protocol for relevant studies.

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) public data ecosystem-related information

    Descriptive Information

    Article number

    A study number, corresponding to the study number assigned in an Excel worksheet

    Complete reference

    The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.

    Year of publication

    The year in which the study was published.

    Journal article / conference paper / book chapter

    The type of the paper, i.e., journal article, conference paper, or book chapter.

    Journal / conference / book

    The journal, conference, or book in which the paper is published.

    DOI / Website

    A link to the website where the study can be found.

    Number of words

    The number of words in the study.

    Number of citations in Scopus and WoS

    The number of citations of the paper in Scopus and WoS digital libraries.

    Availability in Open Access

    Availability of a study in the Open Access or Free / Full Access.

    Keywords

    Keywords of the paper as indicated by the authors (in the paper).

    Relevance for our study (high / medium / low)

    What is the relevance level of the paper for our study?

    Approach- and research design-related information

    Objective / Aim / Goal / Purpose & Research Questions

    The research objective and established RQs.

    Research method (including unit of analysis)

    The methods used to collect data in the study, including the unit of analysis that refers to the country, organisation, or other specific unit that has been analysed such as the number of use-cases or policy documents, number and scope of the SLR etc.

    Study’s contributions

    The study’s contribution as defined by the authors

    Qualitative / quantitative / mixed method

    Whether the study uses a qualitative, quantitative, or mixed methods approach?

    Availability of the underlying research data

    Whether the paper has a reference to the public availability of the underlying research data e.g., transcriptions of interviews, collected data etc., or explains why these data are not openly shared?

    Period under investigation

    Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)

    Use of theory / theoretical concepts / approaches? If yes, specify them

    Does the study mention any theory / theoretical concepts / approaches? If yes, what theory / concepts / approaches? If any theory is mentioned, how is theory used in the study? (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested theory, theory mentioned in the future research section).

    Quality-related information

    Quality concerns

    Whether there are any quality concerns (e.g., limited information about the research methods used)?

    Public Data Ecosystem-related information

    Public data ecosystem definition

    How is the public data ecosystem, or any equivalent term (most often "infrastructure"), defined in the paper? If an alternative term is used, what is the public data ecosystem called in the paper?

    Public data ecosystem evolution / development

    Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?

    What constitutes a public data ecosystem?

    What constitutes a public data ecosystem (components & relationships) - their "FORM / OUTPUT" presented in the paper (general description with more detailed answers to further additional questions).

    Components and relationships

    What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).

    Stakeholders

    What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?

    Actors and their roles

    What actors does the public data ecosystem involve? What are their roles?

    Data (data types, data dynamism, data categories etc.)

    What data does the public data ecosystem cover (is intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static data, dynamic, real-time data, stream), prevailing data categories / domains / topics etc.

    Processes / activities / dimensions, data lifecycle phases

    What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?

    Level (if relevant)

    What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).

    Other elements or relationships (if any)

    What other elements or relationships does the public data ecosystem consist of?

    Additional comments

    Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).

    New papers

    Does the study refer to any other potentially relevant papers?

    Additional references to potentially relevant papers that were found in the analysed paper (snowballing).

    Format of the file: .xls, .csv (for the first spreadsheet only), .docx

    Licenses or restrictions: CC-BY

    For more info, see README.txt

  6. News Events Data in Asia ( Techsalerator)

    • datarade.ai
    Updated Jul 9, 2024
    Cite
    Techsalerator (2024). News Events Data in Asia ( Techsalerator) [Dataset]. https://datarade.ai/data-products/news-events-data-in-asia-techsalerator-techsalerator
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Jul 9, 2024
    Dataset provided by
    Techsalerator LLC
    Authors
    Techsalerator
    Area covered
    Uzbekistan, Timor-Leste, Brunei Darussalam, Kazakhstan, Kyrgyzstan, United Arab Emirates, Iran (Islamic Republic of), Hong Kong, Maldives, China
    Description

    Techsalerator’s News Event Data in Asia offers a detailed and expansive dataset designed to provide businesses, analysts, journalists, and researchers with comprehensive insights into significant news events across the Asian continent. This dataset captures and categorizes major events reported from a diverse range of news sources, including press releases, industry news sites, blogs, and PR platforms, offering valuable perspectives on regional developments, economic shifts, political changes, and cultural occurrences.

    Key Features of the Dataset:

    Extensive Coverage:
    The dataset aggregates news events from a wide range of sources such as company press releases, industry-specific news outlets, blogs, PR sites, and traditional media. This broad coverage ensures a diverse array of information from multiple reporting channels.

    Categorization of Events:
    News events are categorized into various types including business and economic updates, political developments, technological advancements, legal and regulatory changes, and cultural events. This categorization helps users quickly find and analyze information relevant to their interests or sectors.

    Real-Time Updates:
    The dataset is updated regularly to include the most current events, ensuring users have access to the latest news and can stay informed about recent developments as they happen.

    Geographic Segmentation:
    Events are tagged with their respective countries and regions within Asia. This geographic segmentation allows users to filter and analyze news events based on specific locations, facilitating targeted research and analysis.

    Event Details:
    Each event entry includes comprehensive details such as the date of occurrence, source of the news, a description of the event, and relevant keywords. This thorough detailing helps users understand the context and significance of each event.

    Historical Data:
    The dataset includes historical news event data, enabling users to track trends and perform comparative analysis over time. This feature supports longitudinal studies and provides insights into the evolution of news events.

    Advanced Search and Filter Options:
    Users can search and filter news events based on criteria such as date range, event type, location, and keywords. This functionality allows for precise and efficient retrieval of relevant information.

    Asian Countries and Territories Covered:
    • Central Asia: Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, Uzbekistan
    • East Asia: China, Hong Kong (Special Administrative Region of China), Japan, Mongolia, North Korea, South Korea, Taiwan
    • South Asia: Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, Sri Lanka
    • Southeast Asia: Brunei, Cambodia, East Timor (Timor-Leste), Indonesia, Laos, Malaysia, Myanmar (Burma), Philippines, Singapore, Thailand, Vietnam
    • Western Asia (Middle East): Armenia, Azerbaijan, Bahrain, Cyprus, Georgia, Iraq, Israel, Jordan, Kuwait, Lebanon, Oman, Palestine, Qatar, Saudi Arabia, Syria, Turkey (partly in Europe, but often included in Asia contextually), United Arab Emirates, Yemen

    Benefits of the Dataset:
    • Strategic Insights: Businesses and analysts can use the dataset to gain insights into significant regional developments, economic conditions, and political changes, aiding in strategic decision-making and market analysis.
    • Market and Industry Trends: The dataset provides valuable information on industry-specific trends and events, helping users understand market dynamics and identify emerging opportunities.
    • Media and PR Monitoring: Journalists and PR professionals can track relevant news across Asia, enabling them to monitor media coverage, identify emerging stories, and manage public relations efforts effectively.
    • Academic and Research Use: Researchers can utilize the dataset for longitudinal studies, trend analysis, and academic research on various topics related to Asian news and events.

    Techsalerator’s News Event Data in Asia is a crucial resource for accessing and analyzing significant news events across the continent. By offering detailed, categorized, and up-to-date information, it supports effective decision-making, research, and media monitoring across diverse sectors.

  7. News Events Data in Latin America( Techsalerator)

    • datarade.ai
    Updated Mar 20, 2024
    Cite
    Techsalerator (2024). News Events Data in Latin America( Techsalerator) [Dataset]. https://datarade.ai/data-products/news-events-data-in-latin-america-techsalerator-techsalerator
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Mar 20, 2024
    Dataset provided by
    Techsalerator LLC
    Authors
    Techsalerator
    Area covered
    Chile, Cuba, Argentina, Aruba, Falkland Islands (Malvinas), Martinique, Montserrat, Dominican Republic, French Guiana, Ecuador, Latin America, Americas
    Description

    Techsalerator’s News Event Data in Latin America offers a detailed and extensive dataset designed to provide businesses, analysts, journalists, and researchers with an in-depth view of significant news events across the Latin American region. This dataset captures and categorizes key events reported from a wide array of news sources, including press releases, industry news sites, blogs, and PR platforms, offering valuable insights into regional developments, economic changes, political shifts, and cultural events.

    Key Features of the Dataset:

    Comprehensive Coverage:
    The dataset aggregates news events from numerous sources such as company press releases, industry news outlets, blogs, PR sites, and traditional news media. This broad coverage ensures a wide range of information from multiple reporting channels.

    Categorization of Events:
    News events are categorized into various types including business and economic updates, political developments, technological advancements, legal and regulatory changes, and cultural events. This categorization helps users quickly locate and analyze information relevant to their interests or sectors.

    Real-Time Updates:
    The dataset is updated regularly to include the most recent events, ensuring users have access to the latest news and can stay informed about current developments.

    Geographic Segmentation:
    Events are tagged with their respective countries and regions within Latin America. This geographic segmentation allows users to filter and analyze news events based on specific locations, facilitating targeted research and analysis.

    Event Details:
    Each event entry includes comprehensive details such as the date of occurrence, source of the news, a description of the event, and relevant keywords. This thorough detailing helps in understanding the context and significance of each event.

    Historical Data:
    The dataset includes historical news event data, enabling users to track trends and perform comparative analysis over time. This feature supports longitudinal studies and provides insights into how news events evolve.

    Advanced Search and Filter Options:
    Users can search and filter news events based on criteria such as date range, event type, location, and keywords. This functionality allows for precise and efficient retrieval of relevant information.

    Latin American Countries Covered:
    • South America: Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Guyana, Paraguay, Peru, Suriname, Uruguay, Venezuela
    • Central America: Belize, Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua, Panama
    • Caribbean: Cuba, Dominican Republic, Haiti (Note: Primarily French-speaking but included due to geographic and cultural ties), Jamaica, Trinidad and Tobago

    Benefits of the Dataset:
    • Strategic Insights: Businesses and analysts can use the dataset to gain insights into significant regional developments, economic conditions, and political changes, aiding in strategic decision-making and market analysis.
    • Market and Industry Trends: The dataset provides valuable information on industry-specific trends and events, helping users understand market dynamics and emerging opportunities.
    • Media and PR Monitoring: Journalists and PR professionals can track relevant news across Latin America, enabling them to monitor media coverage, identify emerging stories, and manage public relations efforts effectively.
    • Academic and Research Use: Researchers can utilize the dataset for longitudinal studies, trend analysis, and academic research on various topics related to Latin American news and events.

    Techsalerator’s News Event Data in Latin America is a crucial resource for accessing and analyzing significant news events across the region. By providing detailed, categorized, and up-to-date information, it supports effective decision-making, research, and media monitoring across diverse sectors.

  8. News Events Data in North America ( Techsalerator)

    • datarade.ai
    Updated Jun 25, 2024
    Cite
    Techsalerator (2024). News Events Data in North America ( Techsalerator) [Dataset]. https://datarade.ai/data-products/news-events-data-in-north-america-techsalerator-techsalerator
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    Techsalerator LLC
    Authors
    Techsalerator
    Area covered
    Canada, United States
    Description

    Techsalerator’s News Event Data in North America offers a comprehensive and detailed dataset designed to provide businesses, analysts, journalists, and researchers with a thorough view of significant news events across North America. This dataset captures and categorizes major events reported from a diverse range of news sources, including press releases, industry news sites, blogs, and PR platforms, providing valuable insights into regional developments, economic shifts, political changes, and cultural events.

    Key Features of the Dataset:

    Extensive Coverage:
    The dataset aggregates news events from a wide array of sources, including company press releases, industry-specific news outlets, blogs, PR sites, and traditional media. This broad coverage ensures a diverse range of information from multiple reporting channels.

    Categorization of Events:
    News events are categorized into various types such as business and economic updates, political developments, technological advancements, legal and regulatory changes, and cultural events. This categorization helps users quickly find and analyze information relevant to their interests or sectors.

    Real-Time Updates:
    The dataset is updated regularly to include the most current events, ensuring that users have access to up-to-date news and can stay informed about recent developments as they happen.

    Geographic Segmentation:
    Events are tagged with their respective countries and territories within North America. This geographic segmentation allows users to filter and analyze news events based on specific locations, facilitating targeted research and analysis.

    Event Details:
    Each event entry includes comprehensive details such as the date of occurrence, source of the news, a description of the event, and relevant keywords. This thorough detailing helps users understand the context and significance of each event.

    Historical Data:
    The dataset includes historical news event data, enabling users to track trends and conduct comparative analysis over time. This feature supports longitudinal studies and provides insights into how news events evolve.

    Advanced Search and Filter Options:
    Users can search and filter news events based on criteria such as date range, event type, location, and keywords. This functionality allows for precise and efficient retrieval of relevant information.

    North American Countries and Territories Covered:
    • Countries: Canada, Mexico, United States
    • Territories:
      American Samoa (U.S. territory)
      French Polynesia (French overseas collectivity; included for regional relevance)
      Guam (U.S. territory)
      New Caledonia (French special collectivity; included for regional relevance)
      Northern Mariana Islands (U.S. territory)
      Puerto Rico (U.S. territory)
      Saint Pierre and Miquelon (French overseas territory; geographically close to North America and included for regional comprehensiveness)
      Wallis and Futuna (French overseas collectivity; included for regional relevance)

    Benefits of the Dataset:
    • Strategic Insights: Businesses and analysts can use the dataset to gain insights into significant regional developments, economic conditions, and political changes, aiding in strategic decision-making and market analysis.
    • Market and Industry Trends: The dataset provides valuable information on industry-specific trends and events, helping users understand market dynamics and identify emerging opportunities.
    • Media and PR Monitoring: Journalists and PR professionals can track relevant news across North America, enabling them to monitor media coverage, identify emerging stories, and manage public relations efforts effectively.
    • Academic and Research Use: Researchers can utilize the dataset for longitudinal studies, trend analysis, and academic research on various topics related to North American news and events.

    Techsalerator’s News Event Data in North America is a crucial resource for accessing and analyzing significant news events across the continent. By providing detailed, categorized, and up-to-date information, it supports effective decision-making, research, and media monitoring across diverse sectors.

  9. PHOENIX Dataset for each Pilot 2022

    • zenodo.org
    • data.europa.eu
    Updated Jun 5, 2023
    + more versions
    Cite
    Loes Bouman; Dimitris Ballas; Elena Tarsi; Andrea Testi; Cassandra Fontana; Iacopo Zetti; Maddalena Rossi (2023). PHOENIX Dataset for each Pilot 2022 [Dataset]. http://doi.org/10.5281/zenodo.7124141
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Loes Bouman; Dimitris Ballas; Elena Tarsi; Andrea Testi; Cassandra Fontana; Iacopo Zetti; Maddalena Rossi
    Description

    Deliverable D2.5 Datasets for each Pilot.

    This deliverable is the result of joint efforts from the PHOENIX consortium. This executive report describes the purpose and content of the deliverable.

    Deliverable D2.5 Datasets for each Pilot is exactly what the title suggests: the deliverable consists of a vast collection of environmental and socio-economic datasets from the EU at national and local levels of territories where local partners and pilots are operating. Please note that this deliverable consists of data only; no interpretations or analyses have been performed with these data, as this step will be executed in other deliverables.

    What data does the deliverable consist of? First, there is a dataset comprising over 100 indicators that give insight into a wide range of themes that are relevant to PHOENIX, including population structures, socio-economic conditions, information about marginalised groups, energy poverty and environmental/ecological conditions in the respective territories. These data mostly come from large international databases such as Eurostat or from national statistical databases. Second, the deliverable includes various relevant secondary datasets with information on current opinions, social attitudes, values and ‘green’ behaviours that are the product of international collaborations and initiatives such as the European Social Survey (ESS) and Eurobarometer. Third, the deliverable compiles data at local levels (cities and regions) collected from censuses of population and digital boundary data sources. The aim is to understand dynamics on a much more local scale and to provide input data for some of the next steps in the PHOENIX project (including spatial microsimulation, agent-based modelling and geo-visualisations).

    In particular, Deliverable D2.5 was originally developed to support WP3, and specifically task 3.3, to develop the Tangram’s methodologies and tools to investigate cornerstone democratic innovations and estimate their success in citizens’ readiness to change for climate change, and tailor and test these across pilots. Yet the usability of these data is expected to be of interest across all work packages within the consortium.

    In this report, the first chapter outlines a short description with instructions on how all the different datasets are organised and stored and how one can find, obtain and use the data. Chapters two to eight describe available data per territory where the various pilots will be operating. It will be apparent that there is overlap between these chapters, only with some different details about the data formats.

    PHOENIX adheres to up-to-date data management standards and regulations such as the GDPR. For details on our data management organisation and policies, please consult our dedicated deliverable D 1.2 Data Management Plan.

    Overall, this deliverable provides the basis for following steps in PHOENIX and at the same time it is formatted in a way that can support all partners in their search for relevant information for their work packages and tasks (and pilot area case studies) that they work on.

  10. News Events Data in Europe ( Techsalerator)

    • datarade.ai
    Updated Jun 20, 2024
    Cite
    Techsalerator (2024). News Events Data in Europe ( Techsalerator) [Dataset]. https://datarade.ai/data-products/news-events-data-in-europe-techsalerator-techsalerator
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Jun 20, 2024
    Dataset provided by
    Techsalerator LLC
    Authors
    Techsalerator
    Area covered
    Kosovo, Macedonia (the former Yugoslav Republic of), Netherlands, Malta, Andorra, Jersey, Belarus, Czech Republic, Latvia, Bulgaria, Europe
    Description

    Techsalerator’s News Event Data in Europe is a comprehensive and meticulously curated dataset designed to provide businesses, analysts, journalists, and researchers with an extensive view of significant news events across Europe. This dataset captures and categorizes key events reported from a variety of news sources, offering valuable insights into industry developments, economic changes, political shifts, and other noteworthy occurrences throughout the continent.

    Key Features of the Dataset:

    Extensive Coverage:
    The dataset aggregates news events from a wide range of sources including press releases, industry news sites, blogs, PR platforms, and traditional news outlets. This broad coverage ensures that users receive a diverse array of information from multiple reporting channels.

    Categorization of Events:
    News events are meticulously categorized into various types such as business and financial updates, political developments, technological advancements, legal and regulatory changes, and cultural events. This categorization helps users quickly locate and analyze information relevant to specific interests or sectors.

    Real-Time Updates:
    Data is updated regularly to include the most current events. This ensures that users have access to the latest information and can stay informed about recent developments as they unfold.

    Geographic Segmentation:
    Events are tagged with their respective countries and regions within Europe. This geographic segmentation allows users to filter and analyze news events based on specific locations, facilitating targeted research and analysis.

    Event Details:
    Each event entry includes detailed information such as the date of occurrence, source of the news, event description, and relevant keywords. This comprehensive detail aids in understanding the context and significance of each event.

    Historical Data:
    The dataset includes historical news event data, enabling users to track trends and analyze changes over time. This feature supports longitudinal studies and comparative analysis of historical and recent events.

    Advanced Search and Filter Options:
    Users can search and filter news events based on various criteria such as date range, event type, location, and keywords. This functionality allows for precise and efficient retrieval of relevant information.

    European Countries Covered:
    Austria, Belgium, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden

    Benefits of the Dataset:
    • Informed Decision-Making: Businesses and analysts can leverage the dataset to stay updated on key developments that may impact their operations, market conditions, or strategic decisions.
    • Market and Industry Analysis: The dataset provides valuable insights into industry trends, economic changes, and political events, helping users analyze market dynamics and make informed decisions.
    • Media and PR Monitoring: Journalists and PR professionals can track relevant news and events across Europe, allowing them to monitor media coverage, identify emerging stories, and manage public relations efforts effectively.
    • Academic and Research Purposes: Researchers can use the dataset for longitudinal studies, trend analysis, and academic research on various topics related to European news and events.

    Techsalerator’s News Event Data in Europe is a vital resource for accessing and analyzing significant news events across the continent. By offering detailed, categorized, and up-to-date information, it supports effective decision-making, research, and media monitoring across diverse sectors.

  11. Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...

    • data.niaid.nih.gov
    Updated Mar 1, 2023
    Cite
    Haak, Fabian; Schaer, Philipp (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7682914
    Dataset updated
    Mar 1, 2023
    Dataset provided by
    Technische Hochschule Köln
    Authors
    Haak, Fabian; Schaer, Philipp
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

    Fabian Haak and Philipp Schaer. 2023. 𝑄𝑏𝑖𝑎𝑠 - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

    Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

    The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022 as presented in our publication. AllSides balanced news features three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label (left, right, or neutral) by four expert annotators based on the expressed political partisanship. AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

    To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

    Dataset 2: Search Query Suggestions (suggestions.csv)

    The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

    The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions and their respective positions as returned by the search engines at the given time of search ("datetime"). We scraped our data from a US server, whose location is saved in "location".
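
    As a rough illustration of how the columns named above might be used, the following sketch groups suggestions by search engine and orders them by rank. Only the column names come from the description; the column order, the value formats (e.g., "google", the timestamp, "US"), the ranks, and the helper name suggestions_by_engine are assumptions made for the example, and the two sample rows are made up rather than taken from suggestions.csv.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows. Only the column names are taken from the dataset
# description; the values and their formats are assumptions for illustration.
SAMPLE = """root_term,query_input,query_suggestion,rank,search_engine,datetime,location
democrats,democrats a,democrats and recession,1,google,2022-11-01 12:00:00,US
democrats,democrats a,democrats and recession,2,bing,2022-11-01 12:00:00,US
"""


def suggestions_by_engine(csv_file):
    """Group (rank, suggestion) pairs by search engine, sorted by rank."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row["search_engine"]].append(
            (int(row["rank"]), row["query_suggestion"])
        )
    for pairs in grouped.values():
        pairs.sort()
    return dict(grouped)


result = suggestions_by_engine(io.StringIO(SAMPLE))
for engine, pairs in result.items():
    print(engine, pairs)
```

    To try this on the real file, one would pass an opened suggestions.csv instead of the in-memory sample, after checking the actual header and value formats.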

    We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
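
    The query expansion described above can be sketched as follows. This is a minimal illustration, not the authors' scraper: the function name expand_root_query is hypothetical, and the sketch only builds the query inputs (the root term plus the root term extended by each letter a to z) without contacting any search engine. With ten suggestions per input, 27 inputs give the stated maximum of 270 suggestions per topic and search engine.

```python
import string

# Ten suggestions per query input, as stated in the description.
SUGGESTIONS_PER_INPUT = 10


def expand_root_query(root_term):
    """Return the root term plus the root term extended by each letter a-z."""
    extended = [f"{root_term} {letter}" for letter in string.ascii_lowercase]
    return [root_term] + extended


inputs = expand_root_query("democrats")
max_suggestions = len(inputs) * SUGGESTIONS_PER_INPUT

print(inputs[:3])       # ['democrats', 'democrats a', 'democrats b']
print(len(inputs))      # 27
print(max_suggestions)  # 270
```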

    AllSides Scraper

    At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles from the AllSides balanced news headlines.

    We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, which scrapes all available AllSides news articles and gathers the available information. By providing the scraper, we facilitate access to a recent version of the dataset for other researchers.

  12. Summary of co-design limitations and challenges and potential strategies to...

    • plos.figshare.com
    Updated Feb 14, 2024
    Cite
    Hardeep Singh; Natasha Benn; Agnes Fung; Kristina M. Kokorelias; Julia Martyniuk; Michelle L. A. Nelson; Heather Colquhoun; Jill I. Cameron; Sarah Munce; Marianne Saragosa; Kian Godhwani; Aleena Khan; Paul Yejong Yoo; Kerry Kuluski (2024). Summary of co-design limitations and challenges and potential strategies to overcome these limitations and challenges. [Dataset]. http://doi.org/10.1371/journal.pone.0297162.t002
    Available download formats: xls
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Hardeep Singh; Natasha Benn; Agnes Fung; Kristina M. Kokorelias; Julia Martyniuk; Michelle L. A. Nelson; Heather Colquhoun; Jill I. Cameron; Sarah Munce; Marianne Saragosa; Kian Godhwani; Aleena Khan; Paul Yejong Yoo; Kerry Kuluski
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary of co-design limitations and challenges and potential strategies to overcome these limitations and challenges.

  13. NESP MB Project B3 - Enhancing access to relevant marine information –...

    • gimi9.com
    Updated Apr 10, 2016
    + more versions
    Cite
    (2016). NESP MB Project B3 - Enhancing access to relevant marine information – developing a service for searching, aggregating and filtering collections of linked open marine data | gimi9.com [Dataset]. https://gimi9.com/dataset/au_nesp-mb-project-b3-enhancing-access-to-relevant-marine-information-developing-a-service-for-sea/
    Dataset updated
    Apr 10, 2016
    Description

    This record provides an overview of the scope and research output of NESP Marine Biodiversity Hub Project B3 - "Enhancing access to relevant marine information – developing a service for searching, aggregating and filtering collections of linked open marine data". For specific data outputs from this project, please see child records associated with this metadata.

    This project aims to improve the searchability and delivery of sources of linked open data, and to provide the ability to forward collections of discovered data to web services for subsequent processing through the development of a linked open data search tool. This work will improve access to existing data collections, and facilitate the development of new applications by acting as an aggregator of links to streams of marine data. The work will benefit managers (i.e. Department of the Environment staff) by providing fast and simple access to a wide range of marine information products, and offering a means of quickly synthesizing and aggregating multiple sources of information.

    Planned Outputs
    • Delivery of open source code to perform the search functions described above.
    • A simple initial web interface for performing the search and retrieval of results.
    • Expanded collections of data holdings available in linked open format, including the use of semantic mark-up to enable fully-automated data aggregation and web services. In particular, addition of linked-open data capability to a pilot collection of existing data sets (GA, CERF and NERP data sets).

  14. d

    Knowledge Management (Normalized)

    • search.dataone.org
    • datasetcatalog.nlm.nih.gov
    Updated Oct 29, 2025
    Cite
    Anez, Diomar; Anez, Dimar (2025). Knowledge Management (Normalized) [Dataset]. http://doi.org/10.7910/DVN/BAPIEP
    Explore at:
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Anez, Diomar; Anez, Dimar
    Description

    This dataset provides processed and normalized/standardized indices for the management tool 'Knowledge Management' (KM), including related concepts like Intellectual Capital Management and Knowledge Transfer. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding KM dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

    Data Files and Processing Methodologies:

    • Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI). Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "knowledge management" + "knowledge management organizational". Processing: None. Utilizes the original base-100 normalized Google Trends index. Output Metric: Monthly Normalized RSI (Base 100). Frequency: Monthly.
    • Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency. Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Knowledge Management + Intellectual Capital Management + Knowledge Transfer. Processing: Annual relative frequency series normalized (peak year = 100). Output Metric: Annual Normalized Relative Frequency Index (Base 100). Frequency: Annual.
    • Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index. Input Data: Absolute monthly publication counts matching KM-related keywords [("knowledge management" OR ...) AND (...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications. Deduplicated via DOIs. Processing: Monthly relative share calculated (KM Count / Total Count). Monthly relative share series normalized (peak month's share = 100). Output Metric: Monthly Normalized Relative Publication Share Index (Base 100). Frequency: Monthly.
    • Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index. Input Data: Original usability percentages (%) from Bain surveys for specific years: Knowledge Management (1999, 2000, 2002, 2004, 2006, 2008, 2010). Note: Not reported after 2010. Processing: Normalization: Original usability percentages normalized relative to its historical peak (Max % = 100). Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak). Frequency: Biennial (Approx.).
    • Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index. Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years: Knowledge Management (1999-2010). Note: Not reported after 2010. Processing: Standardization (Z-scores): Using Z = (X - 3.0) / 0.891609. Index Scale Transformation: Index = 50 + (Z * 22). Output Metric: Biennial Standardized Satisfaction Index (Center=50, Range ≈ [1,100]). Frequency: Biennial (Approx.).

    File Naming Convention: Files generally follow the pattern: PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding KM dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
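    The two transformations described for these files (peak normalization to a base of 100, and the Z-score-based satisfaction index) can be illustrated with a minimal sketch. The function names and the sample values are hypothetical and chosen only for illustration; the constants 3.0, 0.891609, 50 and 22 are taken from the description above. This is not the authors' processing code.

```python
def peak_normalize(series):
    """Scale a series so that its maximum value maps to 100 (peak = 100)."""
    peak = max(series)
    return [100.0 * value / peak for value in series]


def satisfaction_index(score, center=3.0, sd=0.891609, scale=22.0):
    """Apply Z = (X - 3.0) / 0.891609, then Index = 50 + (Z * 22)."""
    z = (score - center) / sd
    return 50.0 + z * scale


# Hypothetical usability percentages, not values from the dataset.
usability = [20.0, 40.0, 10.0]
print(peak_normalize(usability))

# The scale midpoint (3.0) maps to the index center of 50.
print(satisfaction_index(3.0))

# The scale endpoints 1 and 5 map to roughly 0.65 and 99.35.
print(round(satisfaction_index(1.0), 2), round(satisfaction_index(5.0), 2))
```

    Under these constants, the endpoints of the 1-5 satisfaction scale land close to 1 and 100, which is consistent with the approximate index range stated in the description.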

  15. Z

    Dataset related to the manuscript: "An open-source integrated framework for...

    • data-staging.niaid.nih.gov
    Updated Mar 3, 2022
    Cite
    D'Ambrosio Angelo (2022). Dataset related to the manuscript: "An open-source integrated framework for the automation of citation collection and screening in systematic reviews" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_6323360
    Explore at:
    Dataset updated
    Mar 3, 2022
    Dataset provided by
    Freiburg University Hospital
    Authors
    D'Ambrosio Angelo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset related to the manuscript: “An open-source integrated framework for the automation of citation collection and screening in systematic reviews”, to be used together with the code stored at https://github.com/AD-Papers-Material/BART_SystReviewClassifier to reproduce the results.

    There are three datasets:
    - The record data collected from the online scientific databases;
    - The session journal, which describes the search session, i.e., how many records were collected and from which source, for each query/session pair;
    - The session data, which is the outcome of the classification and review tasks.

  16. National Health and Nutrition Examination Survey

    • kaggle.com
    zip
    Updated Jan 12, 2023
    Cite
    The Devastator (2023). National Health and Nutrition Examination Survey [Dataset]. https://www.kaggle.com/datasets/thedevastator/national-health-and-nutrition-examination-survey/code
    Explore at:
    Available download formats: zip (183217 bytes)
    Dataset updated
    Jan 12, 2023
    Authors
    The Devastator
    Description

    National Health and Nutrition Examination Survey (NHANES) Data

    Health Indicators for Different Locations

    By Centers for Disease Control and Prevention [source]

    About this dataset

    This dataset offers an in-depth look into the National Health and Nutrition Examination Survey (NHANES), which provides valuable insights on various health indicators throughout the United States. It includes important information such as the year when data was collected, location of the survey, data source and value, priority areas of focus, category and topic related to the survey, break out categories of data values, geographic location coordinates and other key indicators. Discover patterns in mortality rates from cardiovascular disease or analyze if pregnant women are more likely to report poor health than those who are not expecting with this NHANES dataset, a powerful collection for understanding personal health behaviors.


    How to use the dataset

    Step 1: Understand the Data Format - Before beginning to work with NHANES data, you should become familiar with the different columns in the dataset. Each column contains a specific type of information about the data such as year collected, geographic location abbreviations and descriptions, sources used for collecting data, priority areas assigned by researchers or institutions associated with understanding health trends in a given area or population group as well as indicator values related to nutrition/health.

    Step 2: Choose an Indicator - Once you understand what is included in each column and what type of values correspond to each field, it is time to select which indicator(s) you would like to plot or visualize against the demographic/geographical characteristics represented in the NHANES data. Selecting an appropriate indicator helps narrow down your search criteria when conducting analyses of health/nutrition trends over time in different locations or amongst different demographic groups.

    Step 3: Utilizing Subsets - When narrowing down your search criteria, it may be beneficial to break up large datasets into smaller subsets that focus on a single area or topic for study (e.g., nutrition trends among rural communities). This lets you drill down on topics that are relevant to your research objectives without losing the broader context of the overall dataset, which contains all available fields for all locations examined by NHANES over many years. The dataset is accessed through Kaggle with a registered user account. After downloading and extracting the files, consider keeping a second backup copy to mitigate file corruption concerns.

    Research Ideas

    • Creating a health calculator to help people measure their health risk. The indicator and data value fields can be used to create an algorithm that will generate a personalized label for each user's health status.
    • Developing a visual representation of the nutritional habits of different populations based on the DataSource, LocationAbbr, and PriorityArea fields from this dataset.
    • Employing machine learning to discern patterns in the data or predict potential health risks in different regions or populations by using the GeoLocation field as inputs for geographic analysis.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    **Unknown License - Please check the dataset description for more information....

  17. d

    Mission and Vision Statements (Normalized)

    • search.dataone.org
    • datasetcatalog.nlm.nih.gov
    Updated Oct 29, 2025
    Cite
    Anez, Diomar; Anez, Dimar (2025). Mission and Vision Statements (Normalized) [Dataset]. http://doi.org/10.7910/DVN/SFKSW0
    Explore at:
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Anez, Diomar; Anez, Dimar
    Description

    This dataset provides processed and normalized/standardized indices for the management tool group focused on 'Mission and Vision Statements', including related concepts like Purpose Statements. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Mission/Vision dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

    Data Files and Processing Methodologies:

    • Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI). Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "mission statement" + "vision statement" + "mission and vision corporate". Processing: None. Utilizes the original base-100 normalized Google Trends index. Output Metric: Monthly Normalized RSI (Base 100). Frequency: Monthly.
    • Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency. Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Mission Statements + Vision Statements + Purpose Statements + Mission and Vision. Processing: Annual relative frequency series normalized (peak year = 100). Output Metric: Annual Normalized Relative Frequency Index (Base 100). Frequency: Annual.
    • Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index. Input Data: Absolute monthly publication counts matching Mission/Vision-related keywords [("mission statement" OR ...) AND (...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications. Deduplicated via DOIs. Processing: Monthly relative share calculated (Mission/Vision Count / Total Count). Monthly relative share series normalized (peak month's share = 100). Output Metric: Monthly Normalized Relative Publication Share Index (Base 100). Frequency: Monthly.
    • Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index. Input Data: Original usability percentages (%) from Bain surveys for specific years: Mission/Vision (1993); Mission Statements (1996); Mission and Vision Statements (1999-2017); Purpose, Mission, and Vision Statements (2022). Processing: Semantic Grouping: Data points across the different naming conventions were treated as a single conceptual series. Normalization: Combined series normalized relative to its historical peak (Max % = 100). Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak). Frequency: Biennial (Approx.).
    • Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index. Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years (same names/years as Usability). Processing: Semantic Grouping: Data points treated as a single conceptual series. Standardization (Z-scores): Using Z = (X - 3.0) / 0.891609. Index Scale Transformation: Index = 50 + (Z * 22). Output Metric: Biennial Standardized Satisfaction Index (Center=50, Range ≈ [1,100]). Frequency: Biennial (Approx.).

    File Naming Convention: Files generally follow the pattern: PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Mission/Vision dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.

  18. Daily Trends Data from blogspot.com

    • kaggle.com
    zip
    Updated Jan 15, 2023
    Cite
    The Devastator (2023). Daily Trends Data from blogspot.com [Dataset]. https://www.kaggle.com/datasets/thedevastator/worldwide-daily-trends-data-from-primearchive-bl
    Explore at:
    Available download formats: zip (28034217 bytes)
    Dataset updated
    Jan 15, 2023
    Authors
    The Devastator
    Description

    Daily Trends Data from blogspot.com

    Analyzing Global Searches, Buzz, Topics, and Popularity by Country and Type

    By Jeffrey Mvutu Mabilama [source]

    About this dataset

    This dataset contains daily top trends from around the world and is a great source of knowledge to help discover new business opportunities, fuel creativity, and improve international business relationships. It contains data collected from a blog that was dedicated to analyzing search result trends in various countries before the blog itself was discontinued.


    How to use the dataset

    Heads up! Whether you’re a marketer, business-owner, student or researcher, this dataset can give you some valuable insights and ideas.

    Let’s check out how you can use the Worldwide Daily Trend dataset from primearchive.blogspot.com to your advantage:

    • Business – Understand the desires and preferences of different populations around the world by analyzing their top trends and search terms. This data can be used to create products tailored to their needs, build better international relationships with people who have an interest in what your country offers, or support collaborative efforts between countries in the same field or industry (e.g., tech startups).
    • Marketing – Create campaigns that will inspire people from different parts of the world, using this data as insight into their interests, habits and ways of communication (tone of voice). Take what’s trending to get ideas on how to effectively reach those audiences with your message!
    • Social – Keep tabs on what your potential customers are interested in so you know where they are spending their time online. This makes it easier to interact with them at appropriate times about topics that matter to them, to use small talk when networking abroad, and to take a data-backed approach when evaluating customer segmentation/census initiatives, all thanks to understanding local trends!
    • Studies – Conduct research about buzz formation and longevity by keeping track of global trends across countries. Does something quickly go viral? Is there any pattern for long-lasting trends? Are there any similarities between countries in terms of correlations? What does their trajectory look like over time? Questions like these can be answered using this comprehensive data source, which allows trend measurement from one place!

    Research Ideas

    • Using this dataset as a starting point to understand the impact/popularity of international TV programs and movies across countries, which can inform decisions on advertising/marketing campaigns.
    • Studying population habits across different countries by analyzing the most popular search terms and topics, which can be used to create better services and products tailored to local needs.
    • Cross-referencing this dataset with other datasets (e.g., census data) to study individual social behavior in different countries, which can help marketers tailor ads better according to personal characteristics

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: primearchive.blogspot.com_detailled-trends_swiss.csv

    | Column name    | Description                                             |
    |:---------------|:--------------------------------------------------------|
    | country        | The country the trend is from. (String)                 |
    | type           | The type of trend (business or social). (String)        |
    | title          | The title of the trend. (String)                        |
    | date           | The date the trend was published. (Date)                |
    | url            | The URL of the trend. (String)                          |
    | no             | The number of the trend. (Integer)                      |
    | name           | The name of the trend. (String)                         |
    | traffic        | The amount of traffic the trend has received. (Integer) |
    | publishDate    | The date the trend was published. (Date)                |
    | relatedKeyword | Related keywords associated with the trend. ...

  19. Overview of 89 studies analyzed in this review that explicitly used the term...

    • figshare.com
    xls
    Updated Feb 14, 2024
    Cite
    Hardeep Singh; Natasha Benn; Agnes Fung; Kristina M. Kokorelias; Julia Martyniuk; Michelle L. A. Nelson; Heather Colquhoun; Jill I. Cameron; Sarah Munce; Marianne Saragosa; Kian Godhwani; Aleena Khan; Paul Yejong Yoo; Kerry Kuluski (2024). Overview of 89 studies analyzed in this review that explicitly used the term co-design. [Dataset]. http://doi.org/10.1371/journal.pone.0297162.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Hardeep Singh; Natasha Benn; Agnes Fung; Kristina M. Kokorelias; Julia Martyniuk; Michelle L. A. Nelson; Heather Colquhoun; Jill I. Cameron; Sarah Munce; Marianne Saragosa; Kian Godhwani; Aleena Khan; Paul Yejong Yoo; Kerry Kuluski
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview of 89 studies analyzed in this review that explicitly used the term co-design.

  20. d

    Data from: Remote sensing and landcover in ring-necked pheasant research: A...

    • search.dataone.org
    • datadryad.org
    Updated Jul 15, 2025
    Cite
    Megan Baldissara; Allison Barg; Andrew Little; Zhenghong Tang; Brian Wardlow; Daniel Uden (2025). Remote sensing and landcover in ring-necked pheasant research: A review of data sources and scales [Dataset]. http://doi.org/10.5061/dryad.xsj3tx9rb
    Explore at:
    Dataset updated
    Jul 15, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Megan Baldissara; Allison Barg; Andrew Little; Zhenghong Tang; Brian Wardlow; Daniel Uden
    Description

    Documenting wildlife–habitat relationships at multiple scales is essential for conservation. Remote sensing datasets and their derivatives (e.g., landcover data) enable efficient multi-scale assessment of ring-necked pheasant (Phasianus colchicus) habitat, albeit with trade-offs among their thematic, spatial, temporal, and/or spectral grains and extents. For example, the National Agriculture Imagery Program provides fine spatial but coarse spectral grain imagery, both important for identifying pheasant habitats. Spatial technologies and datasets relevant to pheasant research are advancing, yet the information on the data sources utilized in research to date is limited. Remote sensing and landcover datasets surveys in pheasant research could help fill information gaps in pheasant–habitat relationships. In this systematic review we filtered 1,110 peer-reviewed pheasant habitat studies to 65 from the Central U.S.A. Temporal trends were tested in the broad use of remote sensing and the sele..., Research selection and classification criteria Two authors of this paper independently collected and filtered articles following the same protocols. The search was performed in May 2024. The result from each author was then compared to check that the process was performed correctly. The data collection methods closely align with the ones employed by Barg et al (forthcoming 2024; doi: 10.5061/dryad.j3tx95xr4). The review followed the PRISMA guidelines for ecology and evolution (O’Dea et al., 2021) utilizing all databases and collections in the Web of Science. The final search query returned papers with titles or abstracts containing keywords related to pheasants and their habitats. The search initially returned 1100 papers, which we filtered to 174 peer-reviewed articles based in the U.S.A. or Canada. 
These papers were further manually filtered to exclude (1) papers that were not about the ring-necked pheasant (Phasianus colchicus), (2) non-original primary peer-reviewed research article..., , # Pheasant research temporal trends analysis

    https://doi.org/10.5061/dryad.xsj3tx9rb

    Description of the data and file structure

    This dataset includes the data and code needed to reproduce the data summary and analysis from the associated publication. The data was collected to review the ring-necked pheasant (Phasianus colchicus) scale research from a remote sensing perspective. The ultimate aim is to understand this declining species' habitat needs at multiple scales. The data was collected from the Web of Science using keywords relating to pheasants and habitats, and then the data was manually filtered to select relevant studies, mainly on the Great Plains. The Excel file lists the studies that were considered and selected or disregarded depending on our chosen criteria. Sixty-five studies were found researching pheasant habitat in the study area, 26 of which used remote sensing. The study methodology, remote sensing data types and pla...,

Cite
Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424

Dataset: A Systematic Literature Review on the topic of High-value datasets

Explore at:
Dataset updated
Jun 23, 2023
Dataset provided by
Gdańsk University of Technology
University of the Aegean
University of Tartu
University of Zagreb
Authors
Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (a pre-print is available in Open Access at https://arxiv.org/abs/2305.10234) and in order for other researchers to use these data in their own work.

The protocol is intended for the Systematic Literature Review (SLR) on the topic of High-value Datasets, with the aim of gathering information on how the topic of High-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected as a result of the SLR over Scopus, Web of Science, and Digital Government Research library (DGRL) in 2023.

Methodology

To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out by searching digital libraries covered by Scopus, Web of Science (WoS), and Digital Government Research library (DGRL).

These databases were queried for keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the number of papers to those where these objects were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 unique articles remained and were further checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.
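The screening logic described above (a boolean keyword query applied to title, keywords, and abstract, followed by deduplication across databases) can be sketched as follows. This is only an illustration under stated assumptions: the record fields (`title`, `keywords`, `abstract`, `doi`), the function names, and the sample records are hypothetical, and the study itself used the search interfaces of the indexing databases rather than this code.

```python
import re

# ("open data" OR "open government data")
OPEN_DATA = re.compile(r"open (government )?data", re.IGNORECASE)
# ("high-value data*" OR "high value data*"); \w* stands in for the * wildcard
HVD = re.compile(r"high[- ]value data\w*", re.IGNORECASE)


def matches_query(record):
    """Check both query groups against title, keywords, and abstract only."""
    text = " ".join(record.get(field, "") for field in ("title", "keywords", "abstract"))
    return bool(OPEN_DATA.search(text)) and bool(HVD.search(text))


def deduplicate(records):
    """Keep the first record per DOI (falling back to the title), case-insensitively."""
    seen = set()
    unique = []
    for record in records:
        key = (record.get("doi") or record["title"]).strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records with made-up DOIs, not entries from the dataset.
sample = [
    {"title": "Determining high-value datasets", "keywords": "open government data",
     "abstract": "", "doi": "10.0000/example.1"},
    {"title": "Determining High-Value Datasets", "keywords": "open government data",
     "abstract": "", "doi": "10.0000/EXAMPLE.1"},  # same paper from a second database
    {"title": "Smart city sensors", "keywords": "IoT",
     "abstract": "We study sensor placement.", "doi": "10.0000/example.2"},
]

unique = deduplicate(sample)
relevant = [record for record in unique if matches_query(record)]
print(len(sample), len(unique), len(relevant))
```

Restricting the match to title, keywords, and abstract mirrors the stated intent of excluding papers that mention HVD only in the body text.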

To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.

Test procedure

Each study was independently examined by at least two authors; after an in-depth examination of the full text of the article, the structured protocol was filled in for each study. The structure of the survey is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by a third researcher.

Description of the data in this data set

Protocol_HVD_SLR provides the structure of the protocol.
Spreadsheet #1 provides the filled protocol for relevant studies.
Spreadsheet #2 provides the list of results after the search over three indexing databases, i.e., before filtering out irrelevant studies.

The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information

Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
2) Complete reference - the complete source information to refer to the study
3) Year of publication - the year in which the study was published
4) Journal article / conference paper / book chapter - the type of the paper {journal article, conference paper, book chapter}
5) DOI / Website - a link to the website where the study can be found
6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
7) Availability in OA - availability of an article in the Open Access
8) Keywords - keywords of the paper as indicated by the authors
9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

Approach- and research design-related information
10) Objective / RQ - the research objective / aim, established research questions
11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR etc.)
12) Contributions - the contributions of the study
13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach?
14) Availability of the underlying research data - whether there is a reference to the publicly available underlying research data e.g., transcriptions of interviews, collected data, or explanation why these data are not shared?
15) Period under investigation - period (or moment) in which the study was conducted
16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?

Quality- and relevance- related information
17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)?
18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, secondary - mentioned but not studied (e.g., as part of discussion, future work etc.))

HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, "input -> output")
21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
23) Data - what data do HVD cover?
24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)

Format of the file: .xls, .csv (for the first spreadsheet only), .odt, .docx

Licenses or restrictions: CC-BY

For more info, see README.txt
