4 datasets found

pareto-ogbn-arxiv
huggingface.co
Updated Feb 14, 2024
Cite
Saurav Maheshkar (2024). pareto-ogbn-arxiv [Dataset]. https://huggingface.co/datasets/SauravMaheshkar/pareto-ogbn-arxiv
Dataset updated
Feb 14, 2024
Authors
Saurav Maheshkar
License
https://choosealicense.com/licenses/cc/
Description
Dataset Information

Nodes	Edges	Features
169,343	1,166,243	128

Pre-processed as per the official codebase of https://arxiv.org/abs/2210.02016

Citations

@article{ju2023multi, title={Multi-task Self-supervised Graph Neural Networks Enable Stronger Task Generalization}, author={Ju, Mingxuan and Zhao, Tong and Wen, Qianlong and Yu, Wenhao and Shah, Neil and Ye, Yanfang and Zhang, Chuxu}, booktitle={International Conference on Learning… See the full description on the dataset page: https://huggingface.co/datasets/SauravMaheshkar/pareto-ogbn-arxiv.
OGBN-Proteins (Processed for PyG)
kaggle.com
zip
Updated Feb 27, 2021
Cite
Redao da Taupl (2021). OGBN-Proteins (Processed for PyG) [Dataset]. https://www.kaggle.com/dataup1/ogbn-proteins
Available download formats: zip (677,947,148 bytes)
Dataset updated
Feb 27, 2021
Authors
Redao da Taupl
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
OGBN-Proteins

Webpage: https://ogb.stanford.edu/docs/nodeprop/#ogbn-proteins

Usage in Python

import os.path as osp

import datatable as dt  # used by dt.fread below; missing from the original snippet
import pandas as pd
import torch
import torch_geometric.transforms as T
from ogb.io.read_graph_raw import read_nodesplitidx_split_hetero
from ogb.nodeproppred import PygNodePropPredDataset


class PygOgbnProteins(PygNodePropPredDataset):
    def __init__(self, meta_csv=None):
        # Point the OGB loader at the pre-processed copy under /kaggle/input.
        root, name, transform = '/kaggle/input', 'ogbn-proteins', T.ToSparseTensor()
        if meta_csv is None:
            meta_csv = osp.join(root, name, 'ogbn-master.csv')
        master = pd.read_csv(meta_csv, index_col=0)
        meta_dict = master[name]
        meta_dict['dir_path'] = osp.join(root, name)
        super().__init__(name=name, root=root, transform=transform, meta_dict=meta_dict)

    def get_idx_split(self, split_type=None):
        if split_type is None:
            split_type = self.meta_info['split']
        path = osp.join(self.root, 'split', split_type)

        # Prefer a cached split dictionary if one exists.
        if osp.isfile(osp.join(path, 'split_dict.pt')):
            return torch.load(osp.join(path, 'split_dict.pt'))

        if self.is_hetero:
            train_idx_dict, valid_idx_dict, test_idx_dict = read_nodesplitidx_split_hetero(path)
            for nodetype in train_idx_dict.keys():
                train_idx_dict[nodetype] = torch.from_numpy(train_idx_dict[nodetype]).to(torch.long)
                valid_idx_dict[nodetype] = torch.from_numpy(valid_idx_dict[nodetype]).to(torch.long)
                test_idx_dict[nodetype] = torch.from_numpy(test_idx_dict[nodetype]).to(torch.long)
            return {'train': train_idx_dict, 'valid': valid_idx_dict, 'test': test_idx_dict}
        else:
            # Each CSV holds one node index per row.
            train_idx = dt.fread(osp.join(path, 'train.csv'), header=None).to_numpy().T[0]
            train_idx = torch.from_numpy(train_idx).to(torch.long)
            valid_idx = dt.fread(osp.join(path, 'valid.csv'), header=None).to_numpy().T[0]
            valid_idx = torch.from_numpy(valid_idx).to(torch.long)
            test_idx = dt.fread(osp.join(path, 'test.csv'), header=None).to_numpy().T[0]
            test_idx = torch.from_numpy(test_idx).to(torch.long)
            return {'train': train_idx, 'valid': valid_idx, 'test': test_idx}


dataset = PygOgbnProteins()
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
graph = dataset[0]  # PyG Graph object

Description

Graph: The ogbn-proteins dataset is an undirected, weighted, and typed (according to species) graph. Nodes represent proteins, and edges indicate different types of biologically meaningful associations between proteins, e.g., physical interactions, co-expression or homology [1,2]. All edges come with 8-dimensional features, where each dimension represents the strength of a single association type and takes values between 0 and 1 (the larger the value is, the stronger the association is). The proteins come from 8 species.

Prediction task: The task is to predict the presence of protein functions in a multi-label binary classification setup, where there are 112 kinds of labels to predict in total. The performance is measured by the average of ROC-AUC scores across the 112 tasks.
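As a rough illustration of the metric described above, the sketch below computes ROC-AUC separately for each label column and then averages the results. It is a standard-library toy rather than the OGB Evaluator itself; the function names `roc_auc` and `mean_roc_auc` and the toy data are assumptions made for this example, and raising an error on a single-class column is a choice of this sketch, not documented OGB behavior.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC for one binary task, computed as the fraction of
    (positive, negative) pairs that the scores rank correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC is undefined without both classes")
    # A correctly ordered pair counts as 1, a tie counts as 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_roc_auc(y_true, y_score) -> float:
    """Average the per-task ROC-AUC over all label columns."""
    num_tasks = len(y_true[0])
    per_task = []
    for k in range(num_tasks):
        col_true = [row[k] for row in y_true]
        col_score = [row[k] for row in y_score]
        per_task.append(roc_auc(col_true, col_score))
    return sum(per_task) / num_tasks


# Toy data: 4 nodes and 2 label columns (the real task has 112 columns).
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.1, 0.4], [0.8, 0.7], [0.3, 0.6]]
result = mean_roc_auc(labels, scores)
print(result)
```

In the toy data the first column is ranked perfectly (1.0) and the second gets three of four pairs right (0.75), so the average is 0.875.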

Dataset splitting: The authors split the protein nodes into training/validation/test sets according to the species which the proteins come from. This enables the evaluation of the generalization performance of the model across different species.
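A species-based split can be sketched as follows: every node whose species belongs to a given group goes into that group's index list, so no species is shared between training, validation, and test. The species identifiers ("A", "B", "C"), the function name `split_by_species`, and the assignment of species to splits are all hypothetical; the text above does not say which species fall into which split.

```python
def split_by_species(node_species, train_species, valid_species, test_species):
    """Assign node indices to train/valid/test according to each node's species."""
    groups = {
        "train": set(train_species),
        "valid": set(valid_species),
        "test": set(test_species),
    }
    split = {name: [] for name in groups}
    for node_id, species in enumerate(node_species):
        for name, members in groups.items():
            if species in members:
                split[name].append(node_id)
                break  # each node belongs to exactly one split
    return split


# Toy example with three hypothetical species labels.
node_species = ["A", "A", "B", "C", "B", "A"]
split = split_by_species(
    node_species,
    train_species=["A"],
    valid_species=["B"],
    test_species=["C"],
)
print(split)
```

Because the test nodes come from species never seen during training, the test score reflects how well a model transfers across species rather than how well it interpolates within one.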

Note: For undirected graphs, the loaded graph has twice the number of edges, because each undirected edge is automatically stored as two directed edges, one in each direction.
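The doubling can be seen in a small sketch: storing each undirected edge as two directed edges yields exactly twice as many entries, provided there are no self-loops. The helper name `to_bidirectional` is an assumption for this example, and the arithmetic at the end assumes that the 39,561,252 figure in the summary counts each undirected edge once.

```python
def to_bidirectional(edges):
    """Store each undirected edge (u, v) as two directed edges (u, v) and (v, u)."""
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    return directed


# Toy triangle graph: 3 undirected edges become 6 directed edges.
undirected_edges = [(0, 1), (1, 2), (0, 2)]
directed_edges = to_bidirectional(undirected_edges)
print(len(undirected_edges), len(directed_edges))

# Under the assumption stated above, the loaded ogbn-proteins graph would hold:
expected_loaded_edges = 2 * 39_561_252
print(expected_loaded_edges)
```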

Summary

Package	#Nodes	#Edges	Split Type	Task Type	Metric
ogb>=1.1.1	132,534	39,561,252	Species	Multi-label binary classification	ROC-AUC

Open Graph Benchmark

Website: https://ogb.stanford.edu

The Open Graph Benchmark (OGB) [3] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.

References

[1] Damian Szklarczyk, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, Nadezhda T Doncheva, John H Morris, Peer Bork, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1):D607–D613, 2019.

[2] Gene Ontology Consortium. The gene ontology resource: 20 years and still going strong. Nucleic Acids Research, 47(D1):D330–D338, 2018.

[3] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, pp. 22118–22133, 2020.

Disclaimer

I am NOT the author of this dataset. It was downloaded from its official website. I assume no responsibility or liability for the content in this dataset. For any questions, problems, or issues, please contact the original authors through their website or their GitHub repo.
Cora, Citeseer, PubMed, Ogbn-arxiv, Amazon-Computer, Amazon-Photo - Dataset...
service.tib.eu
Updated Dec 2, 2024
Cite
(2024). Cora, Citeseer, PubMed, Ogbn-arxiv, Amazon-Computer, Amazon-Photo - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/cora--citeseer--pubmed--ogbn-arxiv--amazon-computer--amazon-photo
Dataset updated
Dec 2, 2024
Description
Graph classification and node classification datasets
Multi-labeled node classification performance (AUC-ROC) in ogbn-protein.
plos.figshare.com
xls
Updated Jun 10, 2023
Cite
Junghun Kim; Jinhong Jung; U. Kang (2023). Multi-labeled node classification performance (AUC-ROC) in ogbn-protein. [Dataset]. http://doi.org/10.1371/journal.pone.0256187.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0256187.t004
Dataset updated
Jun 10, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Junghun Kim; Jinhong Jung; U. Kang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The distillations are done from trained teachers with different numbers of GCN layers: 3, 7, 14, 28, and 56. Note that the proposed method Student_MustaD provides the best performance among the student models.