Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The study addresses the use of the Semantic Web and Linked Data principles proposed by the World Wide Web Consortium for the development of a Web application for the semantic management of scanned documents. The main goal is to record scanned documents and describe them in a way that machines can understand and process, filtering content and assisting users in searching for such documents when a decision-making process is under way. To this end, machine-understandable metadata, created through the use of reference Linked Data ontologies, are associated with the documents, creating a knowledge base. To further enrich the process, a (semi)automatic mashup of these metadata with data from the Web of Linked Data is carried out, considerably increasing the scope of the knowledge base and making it possible to extract new data related to the content of stored documents from the Web and combine them, without the user making any effort or perceiving the complexity of the whole process.
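As a loose illustration of the kind of metadata association described above, the sketch below attaches Dublin Core statements to a scanned document with rdflib; the namespace, document URI and property values are hypothetical and are not taken from the study.

```python
# A minimal sketch, assuming rdflib and purely hypothetical URIs; it is not the
# application described in the study.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, FOAF, RDF

EX = Namespace("http://example.org/documents/")  # hypothetical namespace

g = Graph()
doc = EX["scan-0001"]  # hypothetical URI of a scanned document
g.add((doc, RDF.type, FOAF.Document))
g.add((doc, DCTERMS.title, Literal("Meeting minutes, 2014-03-12", lang="en")))
g.add((doc, DCTERMS.format, Literal("image/tiff")))
# A link into the wider Web of Linked Data (hypothetical target resource).
g.add((doc, DCTERMS.subject, URIRef("http://dbpedia.org/resource/Decision-making")))

print(g.serialize(format="turtle"))
```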
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data sets accompanying the paper "The FAIR Assessment Conundrum: Reflections on Tools and Metrics", an analysis of a comprehensive set of FAIR assessment tools and the metrics used by these tools for the assessment.

The data set "metrics.csv" consists of the metrics collected from several sources linked to the analysed FAIR assessment tools. It is structured into 11 columns: (i) tool_id, (ii) tool_name, (iii) metric_discarded, (iv) metric_fairness_scope_declared, (v) metric_fairness_scope_observed, (vi) metric_id, (vii) metric_text, (viii) metric_technology, (ix) metric_approach, (x) last_accessed_date, and (xi) provenance. The columns tool_id and tool_name are used for the identifier we assigned to each analysed tool and the full name of the tool, respectively. The metric_discarded column refers to the selection we applied to the collected metrics, since we excluded the metrics created for testing purposes or written in a language other than English. The possible values are boolean; we assigned TRUE if the metric was discarded. The columns metric_fairness_scope_declared and metric_fairness_scope_observed are used for the declared intent of the metrics with respect to the FAIR principle assessed, and the intent we observed, respectively. Possible values are: (a) a letter of the FAIR acronym (for the metrics without a declared link to a specific FAIR principle), (b) one or more identifiers of the FAIR principles (F1, F2…), (c) n/a, if no FAIR references were declared, or (d) none, if no FAIR references were observed. The metric_id and metric_text columns are used for the identifiers of the metrics and the textual, human-oriented content of the metrics, respectively. The column metric_technology is used for enumerating the technologies (a term used in its broadest sense) mentioned or used by the metrics for the specific assessment purpose. Such technologies include very diverse typologies, ranging from (meta)data formats to standards, semantic technologies, protocols, and services. For tools implementing automated assessments, the technologies listed also take into consideration the available code and documentation, not just the metric text. The column metric_approach is used for identifying the type of implementation observed in the assessments. The identification of the implementation types followed a bottom-up approach applied to the metrics organised by their metric_fairness_scope_declared values. Consequently, while the labels used for creating the implementation type strings are the same, their combination and specialisation vary based on the characteristics of the actual set of metrics analysed. The main labels used are: (a) 3rd party service-based, (b) documentation-centred, (c) format-centred, (d) generic, (e) identifier-centred, (f) policy-centred, (g) protocol-centred, (h) metadata element-centred, (i) metadata schema-centred, (j) metadata value-centred, (k) service-centred, and (l) na. The columns provenance and last_accessed_date are used for the main source of information about each metric (at least with regard to the text) and the date we last accessed it, respectively.

The data set "classified_technologies.csv" consists of the technologies mentioned or used by the metrics for the specific assessment purpose. It is structured into 3 columns: (i) technology, (ii) class, and (iii) discarded. The column technology is used for the names of the different technologies mentioned or used by the metrics. The column class is used for specifying the type of technology used. Possible values are: (a) application programming interface, (b) format, (c) identifier, (d) library, (e) licence, (f) protocol, (g) query language, (h) registry, (i) repository, (j) search engine, (k) semantic artefact, and (l) service. The discarded column refers to the exclusion of the value 'linked data' from the accepted technologies, since it is too generic. The possible values are boolean; we assigned TRUE if the technology was discarded.
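A minimal sketch of how these two tables could be loaded and summarised, assuming pandas and the column names listed above; the file paths are hypothetical.

```python
# Load the two data sets and summarise the kept (non-discarded) metrics per tool
# and per declared FAIRness scope. File names are hypothetical local paths.
import pandas as pd

metrics = pd.read_csv("metrics.csv")
technologies = pd.read_csv("classified_technologies.csv")

# Keep only the metrics that were not excluded (metric_discarded is boolean).
kept = metrics[~metrics["metric_discarded"]]

summary = (kept.groupby(["tool_name", "metric_fairness_scope_declared"])
               .size()
               .rename("n_metrics")
               .reset_index())
print(summary.head())
```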
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT The exponential increase of published data and the diversity of systems require the adoption of good practices to achieve quality indexes that enable discovery, access, and reuse. To identify good practices, an integrative review was used, together with procedures from the ProKnow-C methodology. After applying the ProKnow-C procedures to the documents retrieved from the Web of Science, Scopus and Library, Information Science & Technology Abstracts databases, an analysis of 31 items was performed. This analysis showed that in the last 20 years the guidelines for publishing open government data had a great impact on the implementation of the Linked Data model in several domains, and that currently the FAIR principles and the Data on the Web Best Practices are the most highlighted in the literature. These guidelines provide guidance on various aspects of data publication in order to contribute to the optimization of quality, regardless of the context in which they are applied. The CARE and FACT principles, on the other hand, although they were not formulated with the same objective as FAIR and the Best Practices, represent great challenges for information and technology scientists regarding the ethics, responsibility, confidentiality, impartiality, security, and transparency of data.
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on January 11, 2023. The Linked Clinical Trials (LinkedCT) project aims at publishing the first open Semantic Web data source for clinical trials data. The data exposed by LinkedCT is generated by (1) transforming existing data sources of clinical trials into RDF, and (2) discovering links between the records in the trials data and several other data sources. You may download static data dumps. The LinkedCT data space is published according to the principles of publishing Linked Data. These principles greatly enhance the adaptability and usability of data on the web. Each entity in LinkedCT is identified by a unique HTTP dereferenceable Uniform Resource Identifier (URI). When the URI is looked up, related RDF statements about the entity are returned in HTML or RDF/XML, depending on the user's agent. Moreover, a SPARQL endpoint is provided as the standard access method for RDF data.
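The dereferenceable-URI pattern described above can be illustrated with simple HTTP content negotiation; the sketch below uses the requests library, the entity URI is hypothetical, and the LinkedCT service itself is no longer online, so this only shows the general mechanism.

```python
# Ask a Linked Data server for RDF/XML instead of HTML via the Accept header.
# The URI below is hypothetical; the original LinkedCT endpoints are offline.
import requests

uri = "http://linkedct.org/resource/trials/NCT00000000"

response = requests.get(uri, headers={"Accept": "application/rdf+xml"}, timeout=10)
print(response.status_code, response.headers.get("Content-Type"))
print(response.text[:500])  # first part of the returned RDF statements
```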
Bringing together data on the United Kingdom's railway network under linked data principles.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Abstract: The speed and accuracy of new scientific discoveries – be it by humans or artificial intelligence – depend on the quality of the underlying data and on the technology to connect, search and share the data efficiently. In recent years, we have seen the rise of graph databases and semi-formal data models such as knowledge graphs to facilitate software approaches to scientific discovery. These approaches extend work based on formalised models, such as the Semantic Web. In this paper, we present our developments to connect, search and share data about genome-scale knowledge networks (GSKN). We have developed a simple application ontology based on OWL/RDF with mappings to standard schemas. We are employing the ontology to power data access services like resolvable URIs, SPARQL endpoints, JSON-LD web APIs and Neo4j-based knowledge graphs. We demonstrate how the proposed ontology and graph databases considerably improve search and access to interoperable and reusable biological knowledge (i.e. the FAIR data principles).
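A minimal sketch of querying a SPARQL endpoint of the kind mentioned above, using SPARQLWrapper; the endpoint URL, prefixes and property names are hypothetical and do not come from the paper.

```python
# Query a (hypothetical) GSKN SPARQL endpoint for labelled nodes.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://example.org/gskn/sparql")  # hypothetical endpoint
endpoint.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?node ?label
    WHERE { ?node rdfs:label ?label }
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["node"]["value"], row["label"]["value"])
```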
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data collected during the study "Smarter open government data for Society 5.0: are your open data smart enough" (Sensors. 2021; 21(15):5204) conducted by Anastasija Nikiforova (University of Latvia). It is being made public both to act as supplementary data for the "Smarter open government data for Society 5.0: are your open data smart enough" paper and to allow other researchers to use these data in their own work.
The data in this dataset were collected as a result of an inspection of 60 countries and their OGD portals (a total of 51 OGD portals in May 2021), conducted to find out whether they meet the trends of Society 5.0 and Industry 4.0.
Each portal was studied starting with a search for data sets of interest, i.e. "real-time", "sensor" and "covid-19", followed by a list of additional questions. These questions were formulated on the basis of a combination of (1) crucial open (government) data-related aspects, including open data principles, success factors, recent studies on the topic, the PSI Directive etc., (2) trends and features of Society 5.0 and Industry 4.0, and (3) elements of the Technology Acceptance Model (TAM) and the Unified Theory of Acceptance and Use of Technology (UTAUT).
The method used belongs to the typical, daily tasks of open data portals, sometimes called a "usability test": keywords related to a research question are used to filter data sets, i.e. "real-time", "real time", "sensor", "covid", "covid-19", "corona", "coronavirus", "virus". In most cases, the "real-time", "sensor" and "covid" keywords were sufficient.
The examination of the respective aspects for less user-friendly portals was adapted to the particular case based on the portal or data set specifics, by checking:
1. are the open data related to the topic under question ({sensor; real-time; Covid-19}) published, i.e. available?
2. are these data available in a machine-readable format?
3. are these data current, i.e. regularly updated? The currency criterion depends on the nature of the data, e.g. Covid-19 data on the number of cases per day are expected to be updated daily, which would not be sufficient for real-time data, as the title suggests, etc.
4. is an API provided for these data? This is most important for real-time and sensor data;
5. have they been published in a timely manner? This was verified mainly for Covid-19-related data. Timeliness was assessed by comparing the date of the first case identified in a given country with the date of the first release of open data on this topic.
6. what is the total number of available data sets?
7. does the open government data portal provide use cases / showcases?
8. does the open government portal provide an opportunity to gain insight into the popularity of the data, i.e. does the portal provide statistics of this nature, such as the number of views, downloads, reuses, ratings etc.?
9. is there an opportunity to provide feedback, a comment, a suggestion or a complaint?
10. (9a) is the artifact, i.e. feedback, comment, suggestion or complaint, visible to other users?
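To make the keyword-based check above concrete, here is a minimal sketch for portals that expose a CKAN-style search API; the portal URL is hypothetical, and many of the inspected portals run on other platforms, so this is purely illustrative.

```python
# Count data sets matching each research keyword on a (hypothetical) CKAN portal
# via the standard package_search action.
import requests

PORTAL = "https://data.example.gov"                 # hypothetical OGD portal
KEYWORDS = ["real-time", "sensor", "covid-19"]

for keyword in KEYWORDS:
    r = requests.get(f"{PORTAL}/api/3/action/package_search",
                     params={"q": keyword, "rows": 0}, timeout=10)
    count = r.json()["result"]["count"]
    print(f"{keyword}: {count} data set(s) found")
```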
Format of the file: .xls, .ods, .csv (for the first spreadsheet only)
Licenses or restrictions: CC-BY
For more info, see README.txt
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A patent is a set of exclusive rights granted to an inventor by a sovereign state for a solution, be it a product or a process, to a particular technological problem. The United States Patent and Trademark Office (USPTO) is the part of the US Department of Commerce that provides patents to businesses and inventors for their inventions, in addition to the registration of products and intellectual property identification. Each year, the USPTO grants over 150,000 patents to individuals and companies all over the world. As of December 2011, 8,743,423 patents had been issued and 16,020,302 applications had been received. USPTO patents are accepted in electronic form and are filed as PDF documents. However, the indexing is not perfect and it is cumbersome to search through the PDF documents. Additionally, Google has made all the patents available for download in XML format, albeit only for the years 2002 to 2015. Thus, we converted this bulk of data (spanning 13 years) from XML to RDF to conform to the Linked Data principles.
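A minimal sketch of the XML-to-RDF conversion step described here, applied to a deliberately simplified record; the element names, namespace and properties are hypothetical and do not reflect the actual USPTO/Google XML schema or the published RDF vocabulary.

```python
# Parse a toy patent XML record and emit RDF triples with rdflib.
import xml.etree.ElementTree as ET
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, RDF

PATENT = Namespace("http://example.org/patent/")   # hypothetical namespace

xml_record = """<patent number="US1234567">
  <title>Example invention</title>
  <inventor>Jane Doe</inventor>
</patent>"""

root = ET.fromstring(xml_record)
g = Graph()
subject = PATENT[root.attrib["number"]]
g.add((subject, RDF.type, PATENT.Patent))
g.add((subject, DCTERMS.title, Literal(root.findtext("title"))))
g.add((subject, DCTERMS.creator, Literal(root.findtext("inventor"))))
print(g.serialize(format="turtle"))
```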
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Knowledge graphs are able to capture, enrich and disseminate research data objects so that the FAIR and Linked Data principles are fulfilled. How can knowledge graphs improve the domain-specific (BERD) and cross-domain (NFDI) research data infrastructures? The answer is based on the use cases in BERD@NFDI and on the activities of the NFDI working group "Knowledge Graphs". First, we describe the architecture, knowledge graphs and use cases in BERD@NFDI. Then, we present the NFDI working group "Knowledge Graphs", its work plan and potential base services.
The ckanext-data-depositario extension customizes CKAN specifically for the depositar research data repository. It contains most of the instance-specific modifications, providing a tailored user experience. Functioning alongside other extensions like ckanext-depositartheme, ckanext-wikidatakeyword, and ckanext-citation, this central extension manages the core customizations required by the depositar instance.
Key Features:
Core Depositar Customizations: Centralizes the major site-specific modifications for the depositar CKAN instance. This implies handling unique data structures, workflows, or validation rules tailored to the repository's research data focus.
Extension Dependency: Operates in conjunction with other specialized extensions, indicating a modular design. This layering enables focused development and maintenance of related features.
Theming Support (via ckanext-depositartheme): Integration with a dedicated theming extension allows consistent branding and user interface customization specific to depositar. This ensures the visual identity aligns with the repository's goals.
Wikidata Integration (via ckanext-wikidatakeyword): Enables the use of Wikidata for keyword management, which enriches metadata with linked data principles and improves discoverability by linking datasets to Wikidata concepts.
Citation Management (via ckanext-citation): Facilitates the display and export of dataset citations, acknowledging the research effort in creating and sharing data. This feature supports academic standards and ensures proper data attribution.
Technical Integration: While detailed integration steps are available in the linked documentation, the extension likely uses CKAN's plugin architecture to modify various aspects of the platform. This includes:
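As a generic illustration of the CKAN plugin architecture mentioned above, here is a minimal plugin skeleton using CKAN's standard interfaces; the class name and directory names are hypothetical, and this is not code taken from ckanext-data-depositario.

```python
# Minimal CKAN plugin skeleton: registers extension templates and static assets.
import ckan.plugins as plugins
import ckan.plugins.toolkit as toolkit


class ExampleDepositPlugin(plugins.SingletonPlugin):
    plugins.implements(plugins.IConfigurer)

    def update_config(self, config_):
        # Make the extension's templates and public files known to CKAN.
        toolkit.add_template_directory(config_, "templates")
        toolkit.add_public_directory(config_, "public")
```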
Web application to discover resources available at participating networked universities. This distributed platform for creating and sharing semantically rich data is built around semantic web technologies and follows linked open data principles.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT In scientometric studies, measuring scientific indicators is a complex task due to the challenges associated with data collection, organization and linking, especially on the Web, where data are distributed across various sources and in incompatible formats. These problems can be tackled with technological and methodological techniques based on the Linked Open Data principles. These principles cover a set of best practices from the fields of the Semantic Web and Open Data for organizing, publishing and interlinking data on the Web. With the use of those best practices, the data can be accessed and consumed without restrictions in many applications. This paper addresses the availability of a Qualis historical dataset according to the mentioned principles. In scientometric studies, this effort is important for data reuse, taking into account: measuring the evolution of scientific journals; assisting the production of qualitative and quantitative measures of scientific publications; or obtaining relevant information by interlinking and exploring other scientific indicators. The availability of the Qualis dataset is verified through three use cases. As a result, the Qualis index (historical series 2005-2013) is shared via a web interface for: (i) furthering data reuse and integration; and (ii) supporting the interoperability and computational processability of the available resources.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Graffiti is an urban phenomenon that is increasingly attracting the interest of the sciences. To the best of our knowledge, no suitable data corpora have been available for systematic research until now. The Information System Graffiti in Germany project (INGRID) closes this gap by dealing with graffiti image collections that have been made available to the project for public use. Within INGRID, the graffiti images are collected, digitized and annotated. With this work, we aim to support rapid access to a comprehensive data source on graffiti via INGRID, targeted especially at researchers. In particular, we present INGRIDKG, an RDF knowledge graph of annotated graffiti that abides by the Linked Data and FAIR principles. We update INGRIDKG weekly by adding newly annotated graffiti to our knowledge graph. Our generation pipeline applies RDF data conversion, link discovery and data fusion approaches to the original data. The current version of INGRIDKG contains 460,640,154 triples and is linked to 3 other knowledge graphs by over 200,000 links. In our use case studies, we demonstrate the usefulness of our knowledge graph for different applications. INGRIDKG is publicly available under the Creative Commons Attribution 4.0 International license.
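A minimal sketch of how the size of such a knowledge graph could be checked over SPARQL, assuming SPARQLWrapper; the endpoint URL is hypothetical and not taken from this description.

```python
# Count all triples in a knowledge graph via a SPARQL aggregate query.
# The endpoint URL below is hypothetical, not an official INGRIDKG endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://example.org/ingridkg/sparql")
endpoint.setQuery("SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }")
endpoint.setReturnFormat(JSON)

result = endpoint.query().convert()
print(result["results"]["bindings"][0]["triples"]["value"])
```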
CC0 1.0 https://creativecommons.org/publicdomain/zero/1.0/
This repository contains a dataset of higher education institutions in the United States of America. This dataset was compiled in response to cybersecurity research on American higher education institutions' websites [1]. The data is being made publicly available to promote open science principles [2].
The data includes the following fields for each institution:
The dataset was obtained from the Integrated Postsecondary Education Data System (IPEDS) website [3], which is administered by the National Center for Education Statistics (NCES). NCES serves as the primary federal entity for collecting and analyzing education-related data in the United States. The data was collected on February 2, 2023.
The initial list of institutions was derived from the IPEDS database using the following criteria: (1) US institutions only, (2) degree-granting institutions, primarily bachelor's or higher, and (3) industry classification, which includes: public 4-year or above, private not-for-profit 4-year or above, private for-profit 4-year or above, public 2-year, private not-for-profit 2-year, private for-profit 2-year, public less than 2-year, private not-for-profit less than 2-year, and private for-profit less than 2-year.
The following variables were added to the list of institutions: control of the institution, state abbreviation, degree-granting status, status of the institution, and the institution's internet website address. This resulted in a report with 1,979 institutions.
The institution's status was labeled with the following values: A (Active), N (New), R (Restored), M (Closed in the current year), C (Combined with another institution), D (Deleted, out of business), I (Inactive due to hurricane-related issues), O (Outside IPEDS scope), P (Potential new/add institution), Q (Potential institution reestablishment), W (Potential addition outside IPEDS scope), X (Potential restoration outside the scope of IPEDS) and G (Perfect Children's Campus).
A filter was applied to the report to retain only institutions with an A, N, or R status, resulting in 1,978 institutions. Finally, a data cleaning process was applied, which involved removing the whitespace at the beginning and end of cell content and duplicate whitespace. The final data were compiled into the dataset included in this repository.
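A minimal sketch of the whitespace clean-up step described above, assuming the report was loaded into a pandas DataFrame; the file names are hypothetical.

```python
# Trim leading/trailing whitespace and collapse duplicate whitespace in all
# text columns, then write the cleaned table back out.
import pandas as pd

df = pd.read_csv("institutions_raw.csv")   # hypothetical input file

for col in df.select_dtypes(include="object").columns:
    df[col] = (df[col].str.strip()
                      .str.replace(r"\s+", " ", regex=True))

df.to_csv("institutions_clean.csv", index=False)
```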
This data is available under the Creative Commons Zero (CC0) license and can be used for any purpose, including academic research purposes. We encourage the sharing of knowledge and the advancement of research in this field by adhering to open science principles [2].
If you use this data in your research, please cite the source and include a link to this repository. To properly attribute this data, please use the following DOI: 10.5281/zenodo.7614862
If you have any updates or corrections to the data, please feel free to open a pull request or contact us directly. Let's work together to keep this data accurate and up-to-date.
We would like to acknowledge the support of the Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF), within the project "Cybers SeC IP" (NORTE-01-0145-FEDER-000044). This study was also developed as part of the Master in Cybersecurity Program at the Instituto Politécnico de Viana do Castelo, Portugal.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The World Hydrological Observing System (WHOS), operating under the World Meteorological Organization (WMO) Data Policy, serves as a global gateway for the standardized exchange of hydrological, meteorological, and climate-related environmental data. Designed to uphold principles of open access and transparency, WHOS eliminates the need for centralized data storage by dynamically linking users to original data providers—such as national hydrometeorological agencies, research institutions, and monitoring networks—through its advanced Discovery and Access Broker (DAB) technology. This middleware framework harmonizes disparate data formats and protocols (e.g., OGC WaterML 2.0, ISO metadata standards), enabling seamless interoperability across geographic and institutional boundaries. Users gain real-time access to critical datasets, including river discharge, groundwater levels, and precipitation trends, while adhering to strict Terms of Use that prohibit unauthorized commercial exploitation, mandate attribution to source agencies in publications or downstream services, and require acknowledgment of inherent risks (e.g., data latency, sensor inaccuracies).
The WMO explicitly disclaims liability for decisions or damages arising from data use, emphasizing user responsibility to verify data quality and applicability. Terms are subject to change, potentially altering access permissions or usage rights, necessitating regular policy reviews by stakeholders. By prioritizing decentralized governance and FAIR (Findable, Accessible, Interoperable, Reusable) data principles, WHOS empowers global collaboration in addressing water-related challenges, from transboundary basin management to climate adaptation strategies, while safeguarding data sovereignty and intellectual property rights of contributing entities.
README
This file explains all the variables and provides full references for the data in each of the datasets that accompany: Portalier S., Fussmann G. F., Loreau M. & Cherif M., 2018ms, The mechanics of predator-prey interactions: first principles of physics predict predator-prey size ratios.
Predator-prey species-based data: provides average body masses for predators and prey, across a wide range of sizes and different life media. File: Portalier_etal_2018_Predator_Prey_Species_Based_Data.csv
Predator-prey individual-based data: provides individual body masses of predators and prey in marine food webs. File: Portalier_etal_2018_Predator_Prey_Individual_Based_Data.csv
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: According to the FAIR principles (Findable, Accessible, Interoperable, and Reusable), scientific research data should be findable, accessible, interoperable, and reusable. The COVID-19 pandemic has led to massive research activities and an unprecedented number of topical publications in a short time. However, no evaluation has assessed whether this COVID-19-related research data has complied with FAIR principles (or FAIRness).
Objective: Our objective was to investigate the availability of open data in COVID-19-related research and to assess compliance with FAIRness.
Methods: We conducted a comprehensive search and retrieved all open-access articles related to COVID-19 from journals indexed in PubMed, available in the Europe PubMed Central database, published from January 2020 through June 2023, using the metareadr package. Using rtransparent, a validated automated tool, we identified articles with links to their raw data hosted in a public repository. We then screened the links and included those repositories that contained data specifically for the pertaining paper. Subsequently, we automatically assessed the adherence of the repositories to the FAIR principles using the FAIRsFAIR Research Data Object Assessment Service (F-UJI) and the rfuji package. The FAIR scores ranged from 1 to 22 and had four components. We reported descriptive analyses for each article type, journal category, and repository. We used linear regression models to find the most influential factors on the FAIRness of data.
Results: 5,700 URLs were included in the final analysis, sharing their data in a general-purpose repository. The mean (standard deviation, SD) level of compliance with FAIR metrics was 9.4 (4.88). The percentages of moderate or advanced compliance were as follows: Findability: 100.0%, Accessibility: 21.5%, Interoperability: 46.7%, and Reusability: 61.3%. The overall and component-wise monthly trends were consistent over the follow-up. Reviews (9.80, SD = 5.06, n = 160), articles in dental journals (13.67, SD = 3.51, n = 3) and Harvard Dataverse (15.79, SD = 3.65, n = 244) had the highest mean FAIRness scores, whereas letters (7.83, SD = 4.30, n = 55), articles in neuroscience journals (8.16, SD = 3.73, n = 63), and those deposited in GitHub (4.50, SD = 0.13, n = 2,152) showed the lowest scores. Regression models showed that the repository was the most influential factor on FAIRness scores (R2 = 0.809).
Conclusion: This paper underscores the potential for improvement across all facets of FAIR principles, specifically emphasizing Interoperability and Reusability in the data shared within general repositories during the COVID-19 pandemic.
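A minimal sketch, with hypothetical column and file names, of the kind of linear regression described above (FAIR score regressed on repository, article type and journal category); it is not the authors' analysis code.

```python
# Fit an OLS model of FAIR score on categorical study factors with statsmodels.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical input: one row per article with its FAIR score and metadata.
df = pd.read_csv("fair_scores.csv")

model = smf.ols(
    "fair_score ~ C(repository) + C(article_type) + C(journal_category)",
    data=df,
).fit()

print(model.rsquared)   # share of variance explained by the factors
print(model.summary())
```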
The EPA Office of Water's Watershed Assessment, Tracking and Environmental Results system (WATERS) integrates water-related information by linking it to the NHDPlus stream network. The National Hydrography Dataset Plus (NHDPlus) provides the underlying geospatial hydrologic framework that supports a variety of network-based capabilities, including upstream/downstream search and watershed delineation. The WATERS GeoViewer provides easy access to these data and capabilities via the Internet on any desktop or mobile device. It implements the concepts and principles of the Open Water Data Initiative, including the hydrologic Network Linked Data Index.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The KEES (Knowledge Exchange Engine Schema) ontology describes a knowledge base configuration in terms of ABox and TBox statements together with their accrual and reasoning policies. This vocabulary is designed to drive automatic data ingestion in a graph database according to the KEES and Linked (Open) Data principles.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A manually curated registry of standards, split into three types - Terminology Artifacts (ontologies, e.g. Gene Ontology), Models and Formats (conceptual schema, formats, data models, e.g. FASTA), and Reporting Guidelines (e.g. the ARRIVE guidelines for in vivo animal testing). These are linked to the databases that implement them and the funder and journal publisher data policies that recommend or endorse their use.