4 datasets found
  1. PheKnowLator Builds -- CERLIB Challenge

    • zenodo.org
    txt
    Updated Oct 30, 2023
    Cite
    Zenodo (2023). PheKnowLator Builds -- CERLIB Challenge [Dataset]. http://doi.org/10.5281/zenodo.10052203
    Available download formats: txt
    Dataset updated: Oct 30, 2023
    Dataset provided by: Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PHENOTYPE KNOWLEDGE TRANSLATOR (PHEKNOWLATOR)

    2021 Continuous Evaluation of Relational Learning in Biomedicine (CERLIB)

    OVERVIEW

    • Introduction
    • Knowledge Graph Builds
    • Challenge Data
    • Challenge Relations
    • Updates

    INTRODUCTION

    PheKnowLator (Phenotype Knowledge Translator) is a Python 3 library that constructs semantically rich, large-scale biomedical knowledge graphs under different semantic models. PheKnowLator is also a data-sharing hub, providing downloadable versions of prebuilt knowledge graphs. For this challenge, the PheKnowLator knowledge graphs were designed to model mechanisms of human disease and were built from 12 open biomedical ontologies, 24 linked open datasets, and results from two large-scale, experimentally derived datasets. For additional information, see the associated GitHub wiki: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0. For a visual representation of the resources used in the PheKnowLator knowledge graphs (and their relationships), see the link below.

    KNOWLEDGE GRAPH BUILDS

    PheKnowLator was designed to generate knowledge graphs under different semantic models and to give users full flexibility throughout the construction process. At its core, PheKnowLator builds on a set of Open Biomedical Ontologies (OBOs), which are extended with external data sources using different knowledge models. The software lets users customize the following parameters:

    1. Construction Approach: The semantic model utilized when integrating ontology and non-ontology data. The two available models are instance and subclass (details here: https://bit.ly/3p0ZNgg). We are providing an instance-based build for the challenge.
    2. Relations: A single relation can be added (relations_only) or each relation and its inverse can be added (inverse_relations). We are providing a knowledge graph built with inverse relations for the challenge (a small illustration of what this means follows this list).
    3. OWL Decoding: An OWL-decoded version of the full semantic knowledge graph. The method we currently provide is OWL-NETS (details here: https://bit.ly/35XCP2g), which decodes the triples that exist only to support OWL expressivity and that, on their own, are not biologically meaningful.
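
    The following is a minimal, illustrative sketch (plain Python, not PheKnowLator code) of what the inverse_relations option means for the edge list: every asserted triple is accompanied by a triple under the relation's inverse. The relation labels and node identifiers are hypothetical placeholders chosen for illustration only.

        # Sketch only: illustrates the inverse_relations build option described above.
        # The inverse mapping and example identifiers are hypothetical placeholders.
        inverse_of = {"has_phenotype": "phenotype_of", "causes": "caused_by"}

        def with_inverses(triples):
            """Yield each triple plus its inverse, when an inverse relation is known."""
            for s, p, o in triples:
                yield (s, p, o)
                if p in inverse_of:
                    yield (o, inverse_of[p], s)

        edges = [("disease:D1", "has_phenotype", "phenotype:P1")]
        print(list(with_inverses(edges)))
        # [('disease:D1', 'has_phenotype', 'phenotype:P1'),
        #  ('phenotype:P1', 'phenotype_of', 'disease:D1')]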

    CHALLENGE DATA

    With this information in mind, the Google Cloud Storage Bucket includes the data files listed below. Additional information on each file type can be found here: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0#knowledge-graph-output. A brief, hedged loading sketch follows each file group.

    Knowledge Graph Data

    • PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_OWLNETS.nt
    • PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_OWLNETS_NetworkxMultiDiGraph.gpickle
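
    A minimal sketch of loading the NetworkX gpickle file, assuming only that it is a pickled networkx MultiDiGraph as the file name suggests; the glob pattern stands in for the versioned file name.

        import glob
        import pickle

        # Resolve the versioned file name (the wildcard matches the build version).
        [gpickle_path] = glob.glob(
            "PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_OWLNETS_NetworkxMultiDiGraph.gpickle"
        )

        # networkx must be installed for the pickled graph object to load.
        with open(gpickle_path, "rb") as f:
            kg = pickle.load(f)

        print(kg.number_of_nodes(), "nodes,", kg.number_of_edges(), "edges")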

    Edge Lists

    • PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_Triples_Identifiers.txt
    • PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_Triples_Integers.txt
    • PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_Triples_Integer_Identifier_Map.json
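
    A hedged sketch of reading the integer edge list together with the integer-to-identifier map. The whitespace-delimited triple layout and the orientation of the JSON map (integer key, identifier value) are assumptions; adjust after inspecting the files.

        import glob
        import json

        [triples_path] = glob.glob(
            "PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_Triples_Integers.txt"
        )
        [map_path] = glob.glob(
            "PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_Triples_Integer_Identifier_Map.json"
        )

        with open(map_path) as f:
            int_to_id = json.load(f)  # assumed: {"<integer>": "<identifier>", ...}

        with open(triples_path) as f:
            for line in f:
                parts = line.split()
                if len(parts) != 3:  # skip any header or malformed line
                    continue
                s, p, o = (int_to_id.get(x, x) for x in parts)
                print(s, p, o)
                break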

    Metadata

    • Node and relation metadata, including labels, synonyms, and definitions, provided in the files below:
      • PheKnowLator_v2.*.*_full_instance_inverseRelations_noOWL_NodeLabels.txt
      • node_metadata_dict.pkl
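
    A minimal sketch of opening the pickled metadata dictionary; the internal key structure noted in the comment is an assumption, not documented here.

        import pickle

        with open("node_metadata_dict.pkl", "rb") as f:
            node_metadata = pickle.load(f)

        # Assumed shape (verify against the file): a dict keyed by node/relation
        # identifier, with per-entry fields such as label, description, and synonyms.
        print(type(node_metadata), len(node_metadata))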

    CHALLENGE RELATIONS

    We will evaluate predictions on 15 Relation Ontology (RO) relations utilized in 34 distinct edge types. Additional details on these edge types can be found here: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0#edge-data. The 15 RO relations and their associated edge types are shown in the table below.

    BUILD UPDATES

    Below we note important updates to each build. For additional information on each build please see the project Wiki (https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0) and for more information on the data sources that are used for each build see: https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources.

    JANUARY 2021

    • No data were found for the pathway-gomf edge type; this was traced to a change in the input data file downloaded from Reactome

    APRIL 2021

    • Significant updates were made to the workflow for building the graphs; this should have only a marginal impact on the resulting knowledge graphs
    • Changes were made to the edge types listed in the table above; chemical-rna edge types are no longer supported
    • The filtering applied to the input data when constructing the edges was updated to reduce potential variance in the quality of the resulting edges. Please see the descriptions for each data source (here: https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources) for additional information

    MAY 2021

    • pathway-gomf edges have returned; Reactome appears to have resolved the errors we discovered in January 2021
    • disease-phenotype edge data may change slightly, as the HPO changed the phenotype annotation file and its formatting
    • the gene-gene edge count has decreased drastically; the cause has been identified as changes GeneMANIA made to their data (source data change dated 04/27/2021)

    JUNE 2021

    • Build successful. No issues to report

    JULY 2021

    • Build successful. No issues to report

    AUGUST 2021

    • Build successful. No issues to report

    SEPTEMBER 2021

    • Build successful. No issues to report

    OCTOBER 2021

    NOVEMBER 2021

    • Build successful. No issues to report

  2. Autonomous Knowledge Extractor | gimi9.com

    • gimi9.com
    Updated Mar 24, 2023
    Cite
    (2023). Autonomous Knowledge Extractor | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_https-dane-gov-pl-pl-dataset-3071-autonomiczny-ekstraktor-wiedzy
    Dataset updated: Mar 24, 2023
    Description

    Industrial research: Task No. 1 - Development of algorithms for extracting objects from data

    This task covers industrial work on developing algorithms for extracting objects from data. The basic assumption of the semantic web is to operate on objects that have specific attributes and on the relations between them. Since input data to the system usually have a weak structure (textual or structured documents with only general attributes, e.g. title, creator, etc.), methods must be developed for extracting objects of basic types that represent typical real-world concepts, such as people, institutions, places, dates, and so on. Tasks of this type are performed by natural language processing and entity extraction algorithms. The main technological challenge of this stage was to develop an algorithm that extracts entities as effectively as possible from weakly structured documents, in the extreme case plain text documents. To this end, incoming documents had to be converted into a shared internal representation so that entities could be extracted in a generalized way, regardless of the source form of the document.

    Detailed tasks (milestones):

    1. Development of an algorithm for data pre-processing and a generalized internal document representation. Methods will be selected for pre-processing documents from various sources and formats into a common form on which subsequent algorithms will operate. Assumed inputs: text documents (PDF, Word, etc.), scans of printed documents (handwriting is not covered), web documents (HTML pages), other databases (relational tables), CSV/XLS files, and XML files.
    2. Development of an algorithm for extracting simple attributes from documents - extraction of simple scalar attributes, such as dates and numbers, from the processed documents, taking into account metadata that exists in the source systems and document templates for groups of documents with a similar structure.
    3. Development of an entity extraction algorithm for basic object classes - extraction of entities from unstructured text documents using NLP techniques, based on a language corpus developed for Polish and English (extensible to other languages) and covering the basic types of real-world objects (places, people, institutions, events, etc.).

    Industrial research: Task No. 2 - Development of algorithms for automatic ontology creation

    Task 2 covers the development of algorithms for automatic ontology creation. Reducing the influence of the human factor on data-organization processes requires algorithms that largely automate the classification and organization of data imported into the system. This calls for advanced knowledge-modelling techniques such as ontology extraction and topic modelling. These algorithms are usually based on text statistics, and the quality of their output depends heavily on the quality of the input data. This creates the risk that the models produced by the algorithms may differ from the expert models used by domain experts, so this risk must be taken into account in the architecture of the solution.

    Detailed tasks (milestones):

    1. Development of an algorithm for organizing objects into dictionaries and deduplicating entities in those dictionaries - the goal is an algorithm that organizes the objects identified by the previously developed algorithms so that objects representing the same concepts are not duplicated and the appropriate relations between nodes of the semantic network can be represented.
    2. Development of an extraction algorithm for a domain ontological model - requires sophisticated methods for analysing the accumulated document corpus to identify domain-specific concepts and objects. The task will be carried out by a research unit experienced in building ontological models.
    3. Development of a semantic tagging algorithm - requires topic modelling methods. The task will be carried out by a research unit experienced in building ontological models.
    4. Development of a method for representing the semantic model in a database - the goal is to encode the information produced by the earlier algorithms so that it can be stored scalably in a suitable database.

    Experimental development work: Task No. 3 - System prototype

    The purpose of this task was to create an application prototype that validates both the feasibility of deploying the application at real-world scale (millions of documents) and its functional usability for end users. A problem faced by researchers working on semantic modelling is that they often work with theoretical models expressed in languages that are optimal for mathematical modelling but do not scale to production use. It was therefore necessary to design an architecture that allows the developed algorithms to scale to large data sets. Another aspect of semantic solutions is usability for end users: they are based on advanced concepts, which forces a complex internal system structure and complicated data access. To keep the project usable, a user interface had to be developed that offers advanced data operations to ordinary users.

    Detailed tasks (milestones):

    1. Development of methods for obtaining data from various sources - the goal is an architecture and processing pipelines for data obtained from heterogeneous sources and formats, so that it can be gathered in a consistent form in a central knowledge repository. This requires an ETL/ESB-style architecture based on a queuing system and distributed processing.
    2. Development of an architecture for large-scale processing of data by the developed algorithms - the goal is a deployment architecture that allows the developed algorithms to run at large scale, e.g. on distributed processing systems such as Apache Spark.
    3. Development of scalable data storage methods - the goal is to select a storage environment that can effectively represent knowledge as a semantic network. A graph database engine or a database supporting the RDF format will be required.
    4. Development of an API enabling data mining - the goal is an API that lets downstream algorithms for data processing, machine learning, and artificial intelligence use the semantic knowledge accumulated in the system. A probable solution is an interface based on the SPARQL standard (a hedged query sketch follows this list).
    5. Development of a prototype user interface for data mining - the goal is an ergonomic interface that allows domain users to explore and analyse the collected data. This requires generating an interface that automatically adapts to the kind of data collected in the system, enabling exploration through Query By Example queries, faceted search, and traversal of relations between entities in the semantic model.
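
    As a hedged illustration of the SPARQL-based access mentioned in milestone 4, the sketch below builds a tiny in-memory RDF graph with rdflib and queries it; the namespace, class names, and example entities are hypothetical, not part of the project.

        from rdflib import Graph, Literal, Namespace
        from rdflib.namespace import RDF, RDFS

        EX = Namespace("http://example.org/akw/")  # hypothetical namespace

        g = Graph()
        g.add((EX.doc1, RDF.type, EX.Document))
        g.add((EX.doc1, EX.mentions, EX.org1))
        g.add((EX.org1, RDF.type, EX.Institution))
        g.add((EX.org1, RDFS.label, Literal("Example Institution")))

        # Which documents mention an institution, and what is it called?
        query = """
            SELECT ?doc ?label WHERE {
                ?doc ex:mentions ?entity .
                ?entity a ex:Institution ;
                        rdfs:label ?label .
            }
        """
        for doc, label in g.query(query, initNs={"ex": EX, "rdfs": RDFS}):
            print(doc, label)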

  3. American Sign Language dataset for semantic communications

    • ieee-dataport.org
    • zenodo.org
    Updated Jan 12, 2025
    Cite
    Vasileios Kouvakis (2025). American Sign Language dataset for semantic communications [Dataset]. http://doi.org/10.21227/2c1z-8j21
    Dataset updated: Jan 12, 2025
    Dataset provided by: IEEE Dataport
    Authors: Vasileios Kouvakis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    The dataset was developed as part of the NANCY project (https://nancy-project.eu/) to support computer vision tasks. It is specifically designed for sign language recognition, focusing on representing joints and finger positions. The dataset comprises images of hands representing the American Sign Language (ASL) alphabet, with the exception of the letters "J" and "Z", since these involve motion and the dataset is limited to static images. A distinctive feature of the dataset is its color coding: each finger is associated with a distinct color, which makes it easier to extract features and distinguish between fingers than in traditional grayscale datasets such as MNIST (a small color-masking sketch follows). The images are RGB, which supports more effective learning and high recognition performance even with a relatively modest amount of training data. Although RGB images introduce additional complexity, such as larger data representation and storage requirements, the gains in accuracy and feature extraction make them a worthwhile choice. The dataset is well suited to gesture recognition, sign language interpretation, and other tasks requiring detailed analysis of joint and finger positions. The NANCY project has received funding from the Smart Networks and Services Joint Undertaking (SNS JU) under the European Union's Horizon Europe research and innovation programme, Grant Agreement No 101096456.
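
    A minimal sketch of exploiting the color coding to isolate a single finger. The file name and the HSV range are illustrative assumptions; the actual per-finger colors must be taken from the dataset documentation.

        import cv2
        import numpy as np

        img = cv2.imread("asl_letter_A.png")          # hypothetical file name
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

        # Assumed example: one finger is rendered in a reddish hue.
        lower = np.array([0, 120, 80])
        upper = np.array([10, 255, 255])
        mask = cv2.inRange(hsv, lower, upper)          # binary mask for that color

        ys, xs = np.nonzero(mask)
        print("pixels matching the assumed finger color:", xs.size)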

  4. Autonomiczny Ekstraktor Wiedzy (Autonomous Knowledge Extractor)

    • dane.gov.pl
    txt, xml
    Updated Mar 19, 2024
    Cite
    Business Online Services Sp. z o.o. (2024). Autonomiczny Ekstraktor Wiedzy [Dataset]. https://dane.gov.pl/en/dataset/3071
    Available download formats: txt, xml
    Dataset updated: Mar 19, 2024
    Dataset authored and provided by
    Business Online Services Sp. z o.o.
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Industrial research: Task No. 1 - Development of algorithms for extracting objects from data

    This task covers industrial work on developing algorithms for extracting objects from data. The basic assumption of a semantic network is that it operates on objects that have specific attributes and on the relations between them. Since input data to the system usually have a weak structure (textual or structured documents with only general attributes, e.g. title, creator, etc.), methods must be developed for extracting objects of basic types that represent typical real-world concepts, such as people, institutions, places, dates, and so on. Tasks of this type are performed by natural language processing and entity extraction algorithms. The main technological challenge of this stage was therefore to develop an algorithm that extracts entities as effectively as possible from weakly structured documents, in the extreme case plain text documents. To this end, incoming documents had to be converted into a shared internal representation so that entities could be extracted in a generalized way, regardless of the source form of the document.

    Detailed tasks (milestones):

    1. Development of an algorithm for data pre-processing and a generalized internal document representation. Methods will be selected for pre-processing documents from various sources and formats into a common form on which subsequent algorithms will operate. Assumed inputs: text documents (PDF, Word, etc.), scans of printed documents (handwriting is not covered), web documents (HTML pages), other databases (relational tables), CSV/XLS files, and XML files.
    2. Development of an algorithm for extracting simple attributes from documents - extraction of simple scalar attributes, such as dates and numbers, from the processed documents, taking into account metadata that exists in the source systems and document templates for groups of documents with a similar structure.
    3. Development of an entity extraction algorithm for basic object classes - extraction of entities from unstructured text documents using NLP techniques, based on a language corpus developed for Polish and English (extensible to other languages) and covering the basic types of real-world objects (places, people, institutions, events, etc.); a hedged extraction sketch follows this list.
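
    As a hedged illustration of the entity extraction described in milestone 3, the sketch below runs spaCy's pretrained Polish pipeline over an invented sample sentence; it assumes the pl_core_news_sm model is installed, and the entity label names depend on the model used.

        import spacy

        # Assumes: python -m spacy download pl_core_news_sm has been run.
        nlp = spacy.load("pl_core_news_sm")
        doc = nlp("Jan Kowalski podpisał umowę z Uniwersytetem Warszawskim 12 maja 2021 r. w Warszawie.")

        for ent in doc.ents:
            # Prints detected people, organizations, places, dates, etc.
            print(ent.text, ent.label_)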

    Industrial research: Task No. 2 - Development of algorithms for automatic ontology creation

    Task 2 covers the development of algorithms for automatic ontology creation. Reducing the influence of the human factor on data-organization processes requires algorithms that largely automate the classification and organization of data imported into the system. This calls for advanced knowledge-modelling techniques such as ontology extraction and topic modelling. These algorithms are usually based on text statistics, and the quality of their output depends heavily on the quality of the input data. This creates the risk that the models produced by the algorithms may differ from the expert models used by domain experts, so this risk must be taken into account in the architecture of the solution.

    Detailed tasks (milestones):

    1. Development of an algorithm for organizing objects into dictionaries and deduplicating entities in those dictionaries - the goal is an algorithm that organizes the objects identified by the previously developed algorithms so that objects representing the same concepts are not duplicated and the appropriate relations between nodes of the semantic network can be represented.
    2. Development of an extraction algorithm for a domain ontological model - requires sophisticated methods for analysing the accumulated document corpus to identify domain-specific concepts and objects. The task will be carried out by a research unit experienced in building ontological models.
    3. Development of a semantic tagging algorithm - requires topic modelling methods. The task will be carried out by a research unit experienced in building ontological models.
    4. Development of a method for representing the semantic model in a database - the goal is to encode the information produced by the earlier algorithms so that it can be stored scalably in a suitable database; a hedged RDF sketch follows this list.
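
    As a hedged illustration of milestone 4, the sketch below encodes a few extracted entities and relations as RDF with rdflib and serializes them to Turtle, a form that graph or RDF databases can ingest. The namespace and example resources are hypothetical.

        from rdflib import Graph, Literal, Namespace
        from rdflib.namespace import RDF, RDFS

        AKW = Namespace("http://example.org/akw/")  # hypothetical namespace

        g = Graph()
        g.add((AKW.inst_001, RDF.type, AKW.Institution))
        g.add((AKW.inst_001, RDFS.label, Literal("Uniwersytet Warszawski", lang="pl")))
        g.add((AKW.doc_042, AKW.mentions, AKW.inst_001))

        # Serialize to a portable format that a triple store can load.
        g.serialize("semantic_model.ttl", format="turtle")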

    Experimental development work: Task No. 3 - System prototype

    The purpose of this task was to create an application prototype that validates both the feasibility of deploying the application at real-world scale (millions of documents) and its functional usability for end users. A problem faced by researchers working on semantic modelling is that they often work with theoretical models expressed in languages that are optimal for mathematical modelling but do not scale to production use. It was therefore necessary to design an architecture that allows the developed algorithms to scale to large data sets. Another aspect of semantic solutions is usability for end users: they are based on advanced concepts, which forces a complex internal system structure and complicated data access. To keep the project usable, a user interface had to be developed that offers advanced data operations to ordinary users.

    Detailed tasks (milestones):

    1. Development of methods for obtaining data from various sources - the goal is an architecture and processing pipelines for data obtained from heterogeneous sources and formats, so that it can be gathered in a consistent form in a central knowledge repository. This requires an ETL/ESB-style architecture based on a queuing system and distributed processing.
    2. Development of an architecture for large-scale processing of data by the developed algorithms - the goal is a deployment architecture that allows the developed algorithms to run at large scale, e.g. on distributed processing systems such as Apache Spark.
    3. Development of scalable data storage methods - the goal is to select a storage environment that can effectively represent knowledge as a semantic network. A graph database engine or a database supporting the RDF format will be required.
    4. Development of an API enabling data exploration - the goal is an API that lets downstream algorithms for data processing, machine learning, and artificial intelligence use the semantic knowledge accumulated in the system. A probable solution is an interface based on the SPARQL standard.
    5. Development of a prototype user interface for data exploration - the goal is an ergonomic interface that allows domain users to explore and analyse the collected data. This requires generating an interface that automatically adapts to the kind of data collected in the system, enabling exploration through Query By Example queries, faceted search, and traversal of relations between entities in the semantic model.