https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains the Python wheel package files for the external libraries of PyTorch Geometric (to install PyG itself, just run pip install torch_geometric). PyTorch Geometric is the PyTorch-based library used to build graph neural networks. For details, please refer to torch_geometric.
Note: These libraries are not required to install PyG. I compiled the wheel files because they take a long time to install. If you want to use a specific version, please refer to this notebook.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Vishal Baraiya
Released under MIT
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
All-atom Diffusion Transformers - QM9 dataset
QM9 dataset from the paper "All-atom Diffusion Transformers: Unified generative modelling of molecules and materials", by Chaitanya K. Joshi, Xiang Fu, Yi-Lun Liao, Vahe Gharakhanyan, Benjamin Kurt Miller, Anuroop Sriram*, and Zachary W. Ulissi* from FAIR Chemistry at Meta (* Joint last author). Original data source: https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.datasets.QM9.html (Adapted from MoleculeNet)… See the full description on the dataset page: https://huggingface.co/datasets/chaitjo/QM9_ADiT.
These datasets are customized Torch Geometric Datasets that contain raw .off polygon meshes as well as preprocessed .pt files needed for training morphVQ models. morphVQ can be found at https://github.com/oothomas/morphVQ.
The original dataset converted to a dataset containing image slices, related features, edge mappings, edge features, etc., which can easily be converted to a torch_geometric dataset.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for CSL
Dataset Summary
The CSL dataset is a synthetic dataset designed to test GNN expressivity.
Supported Tasks and Leaderboards
CSL should be used for binary graph classification (isomorphism or not).
External Use
PyGeometric
To load in PyGeometric, do the following:
from datasets import load_dataset
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader
dataset_hf =… See the full description on the dataset page: https://huggingface.co/datasets/graphs-datasets/CSL.
PyG library files; once imported, they can be installed directly:
!pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv torch_geometric --no-index --find-links=file:///kaggle/input/pyg-packages-torch1121-cu113/pyg-packages
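The offline install command above can also be assembled programmatically, which helps when the wheel directory changes between notebooks. The following is only a sketch: the helper name build_offline_pip_args is hypothetical, and the path and package list are simply copied from the command above. The --no-index and --find-links options are standard pip flags.

```python
from pathlib import PurePosixPath


def build_offline_pip_args(wheel_dir, packages):
    """Return the argument list for an offline pip install from a local wheel directory.

    build_offline_pip_args is a hypothetical helper written for this example.
    """
    find_links = "file://" + str(PurePosixPath(wheel_dir))
    # --no-index stops pip from contacting PyPI; --find-links points it at the local wheels.
    return ["pip", "install", *packages, "--no-index", f"--find-links={find_links}"]


args = build_offline_pip_args(
    "/kaggle/input/pyg-packages-torch1121-cu113/pyg-packages",
    ["pyg_lib", "torch_scatter", "torch_sparse", "torch_cluster",
     "torch_spline_conv", "torch_geometric"],
)
print(" ".join(args))
```

The resulting list could be passed to subprocess.run(args, check=True) in an environment where the wheel directory actually exists.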
https://choosealicense.com/licenses/unknown/
Source Paper: https://arxiv.org/abs/1802.06916
Usage
from torch_geometric.datasets.cornell import CornellTemporalHyperGraphDataset
dataset = CornellTemporalHyperGraphDataset(root = "./", name="tags-ask-ubuntu", split="train")
Citation
@article{Benson-2018-simplicial, author = {Benson, Austin R. and Abebe, Rediet and Schaub, Michael T. and Jadbabaie, Ali and Kleinberg, Jon}, title = {Simplicial closure and higher-order link prediction}, year = {2018}, doi =… See the full description on the dataset page: https://huggingface.co/datasets/SauravMaheshkar/tags-ask-ubuntu.
https://spdx.org/licenses/etalab-2.0.html
Overview
This database contains results from linear elastodynamic simulations performed on 2D “seat” geometries. The dataset comprises 1,800 examples generated using random configurations of holes (round or square), with six different parameterizations (1 round hole, 2 round holes, 3 round holes, 1 square hole, 2 square holes, 3 square holes). The geometries are grouped by sets of six in ascending order of their index. All simulations were carried out with linear T3 finite elements and time integration was done using the Newmark scheme.
Matlab Files: seat_lin_i.mat
Each seat_lin_i.mat file contains the following data:
M: Mass matrix
K: Stiffness matrix
F: Time-dependent loading term
ddlu: Boolean indices indicating free degrees of freedom
lt: List of 400 time steps
Uref: Primal solution field (displacements) over time
The equation solved in these simulations is: M d²Uref/dt² + K Uref = F.
Python Reader: transfer_mat2py.py
A Python script, transfer_mat2py.py, is provided to facilitate reading the .mat files within a Python environment. This script allows users to import the simulation data (mass matrix, stiffness matrix, loading terms, etc.) directly into their Python workflows.
Python Files: seat_i.pt
Each seat_i.pt file is stored in the torch_geometric.data “graph” format and contains:
Node features:
X: Node positions and local contributions of the stiffness matrix
N: Node type (Dirichlet, non-zero Neumann, or zero Neumann)
F: Loading term at each node
Edge features:
edge_attr: Stiffness matrix contributions associated with each edge
edge_index: Graph connectivity
Output fields:
s1: First spatial mode of the primal solution
s2: Second spatial mode of the primal solution
s3: Third spatial mode of the primal solution
This dataset can be used to develop and benchmark methods for reduced-order modeling, machine learning approaches in computational mechanics, or any application that requires detailed finite element simulations of linear elastodynamics on heterogeneous 2D geometries.
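To illustrate the kind of time integration described for this dataset, here is a minimal sketch of the Newmark scheme applied to a single-degree-of-freedom version of the equation of motion, m u'' + k u = f(t). The dataset description does not state the Newmark parameters, so beta = 1/4 and gamma = 1/2 (the average-acceleration variant) are assumptions, as are the function name newmark_sdof and the toy values of m, k and the step size. The real simulations use full mass and stiffness matrices rather than scalars.

```python
import math


def newmark_sdof(m, k, force, u0, v0, dt, n_steps, beta=0.25, gamma=0.5):
    """Integrate m*u'' + k*u = force(t) with the Newmark scheme.

    Returns the displacement and velocity histories as two lists.
    beta and gamma default to the average-acceleration variant (an assumption).
    """
    a = (force(0.0) - k * u0) / m  # initial acceleration from the equation of motion
    u, v = u0, v0
    us, vs = [u], [v]
    for n in range(1, n_steps + 1):
        t = n * dt
        # Predictors: the part of the update that does not depend on the new acceleration.
        u_pred = u + dt * v + dt * dt * (0.5 - beta) * a
        v_pred = v + dt * (1.0 - gamma) * a
        # Solve (m + beta*dt^2*k) * a_new = f(t) - k*u_pred for the new acceleration.
        a = (force(t) - k * u_pred) / (m + beta * dt * dt * k)
        # Correctors: add the contribution of the new acceleration.
        u = u_pred + beta * dt * dt * a
        v = v_pred + gamma * dt * a
        us.append(u)
        vs.append(v)
    return us, vs


# Free vibration of a unit oscillator over one period, split into 400 steps.
n_steps = 400
dt = 2.0 * math.pi / n_steps
us, vs = newmark_sdof(m=1.0, k=1.0, force=lambda t: 0.0,
                      u0=1.0, v0=0.0, dt=dt, n_steps=n_steps)
energy = 0.5 * vs[-1] ** 2 + 0.5 * us[-1] ** 2
print(us[-1], energy)
```

With these parameters and no forcing, the scheme keeps the mechanical energy of the linear oscillator constant up to rounding error, which is one reason this variant is a common default.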
https://choosealicense.com/licenses/unknown/
Source Paper: https://arxiv.org/abs/1802.06916
Usage
from torch_geometric.datasets.cornell import CornellTemporalHyperGraphDataset
dataset = CornellTemporalHyperGraphDataset(root = "./", name="NDC-classes", split="train")
Citation
@article{Benson-2018-simplicial, author = {Benson, Austin R. and Abebe, Rediet and Schaub, Michael T. and Jadbabaie, Ali and Kleinberg, Jon}, title = {Simplicial closure and higher-order link prediction}, year = {2018}, doi =… See the full description on the dataset page: https://huggingface.co/datasets/SauravMaheshkar/NDC-classes.
This dataset provides preprocessed image-based and graph-based drug representations to facilitate research in multimodal learning for drug discovery and interaction prediction.
id2imageembedding.pt
{drug_id: image_embedding}
Details: [CLS] token embedding from a Vision Transformer (ViT).
id2pyg.pt
{drug_id: torch_geometric.data.Data}
Details:
This dataset was created by CurisZhou
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for MNIST
Dataset Summary
The MNIST dataset consists of 55000 images in 10 classes, represented as graphs. It comes from a computer vision dataset.
Supported Tasks and Leaderboards
MNIST should be used for multiclass graph classification.
External Use
PyGeometric
To load in PyGeometric, do the following:
from datasets import load_dataset
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader… See the full description on the dataset page: https://huggingface.co/datasets/graphs-datasets/MNIST.
https://creativecommons.org/publicdomain/zero/1.0/
Webpage: https://ogb.stanford.edu/docs/nodeprop/#ogbn-proteins
import os.path as osp

import datatable as dt  # assumed: `dt` in the original refers to the datatable package
import pandas as pd
import torch
import torch_geometric.transforms as T
# Assumed import path for the heterogeneous split reader used below.
from ogb.io.read_graph_raw import read_nodesplitidx_split_hetero
from ogb.nodeproppred import PygNodePropPredDataset


class PygOgbnProteins(PygNodePropPredDataset):
    def __init__(self, meta_csv=None):
        # Read the dataset from the Kaggle input directory instead of downloading it.
        root, name, transform = '/kaggle/input', 'ogbn-proteins', T.ToSparseTensor()
        if meta_csv is None:
            meta_csv = osp.join(root, name, 'ogbn-master.csv')
        master = pd.read_csv(meta_csv, index_col=0)
        meta_dict = master[name]
        meta_dict['dir_path'] = osp.join(root, name)
        super().__init__(name=name, root=root, transform=transform, meta_dict=meta_dict)

    def get_idx_split(self, split_type=None):
        if split_type is None:
            split_type = self.meta_info['split']
        path = osp.join(self.root, 'split', split_type)
        # Shortcut: reuse a cached split dictionary if one exists.
        split_file = osp.join(path, 'split_dict.pt')
        if osp.isfile(split_file):
            return torch.load(split_file)
        if self.is_hetero:
            train_idx_dict, valid_idx_dict, test_idx_dict = read_nodesplitidx_split_hetero(path)
            for nodetype in train_idx_dict.keys():
                train_idx_dict[nodetype] = torch.from_numpy(train_idx_dict[nodetype]).to(torch.long)
                valid_idx_dict[nodetype] = torch.from_numpy(valid_idx_dict[nodetype]).to(torch.long)
                test_idx_dict[nodetype] = torch.from_numpy(test_idx_dict[nodetype]).to(torch.long)
            return {'train': train_idx_dict, 'valid': valid_idx_dict, 'test': test_idx_dict}
        else:
            # Each CSV file holds one node index per row.
            train_idx = dt.fread(osp.join(path, 'train.csv'), header=None).to_numpy().T[0]
            train_idx = torch.from_numpy(train_idx).to(torch.long)
            valid_idx = dt.fread(osp.join(path, 'valid.csv'), header=None).to_numpy().T[0]
            valid_idx = torch.from_numpy(valid_idx).to(torch.long)
            test_idx = dt.fread(osp.join(path, 'test.csv'), header=None).to_numpy().T[0]
            test_idx = torch.from_numpy(test_idx).to(torch.long)
            return {'train': train_idx, 'valid': valid_idx, 'test': test_idx}


dataset = PygOgbnProteins()
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
graph = dataset[0]  # PyG Graph object
Graph: The ogbn-proteins dataset is an undirected, weighted, and typed (according to species) graph. Nodes represent proteins, and edges indicate different types of biologically meaningful associations between proteins, e.g., physical interactions, co-expression or homology [1,2]. All edges come with 8-dimensional features, where each dimension represents the strength of a single association type and takes values between 0 and 1 (the larger the value is, the stronger the association is). The proteins come from 8 species.
Prediction task: The task is to predict the presence of protein functions in a multi-label binary classification setup, where there are 112 kinds of labels to predict in total. The performance is measured by the average of ROC-AUC scores across the 112 tasks.
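The metric described above, ROC-AUC averaged over the label columns, can be sketched in plain Python as follows. This is an illustration only and not the OGB Evaluator: the function names and the tiny two-task example are assumptions, and the pairwise computation is chosen for readability rather than speed.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_roc_auc(label_matrix, score_matrix):
    """Average the per-task ROC-AUC over the columns of a multi-label problem."""
    n_tasks = len(label_matrix[0])
    aucs = []
    for j in range(n_tasks):
        task_labels = [row[j] for row in label_matrix]
        task_scores = [row[j] for row in score_matrix]
        aucs.append(roc_auc(task_labels, task_scores))
    return sum(aucs) / n_tasks


# Toy example: 4 nodes, 2 binary labels per node (ogbn-proteins has 112).
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.6]]
print(mean_roc_auc(labels, scores))  # task AUCs are 1.0 and 0.75, so the mean is 0.875
```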
Dataset splitting: The authors split the protein nodes into training/validation/test sets according to the species which the proteins come from. This enables the evaluation of the generalization performance of the model across different species.
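A species-based split of this kind can be sketched as below. The species identifiers and the choice of which species go to validation and test are made up for the example; the actual assignment is the one shipped with the dataset's split files.

```python
def split_by_species(node_species, valid_species, test_species):
    """Assign each node index to train, valid or test according to its species id."""
    split = {"train": [], "valid": [], "test": []}
    for node_id, species in enumerate(node_species):
        if species in test_species:
            split["test"].append(node_id)
        elif species in valid_species:
            split["valid"].append(node_id)
        else:
            split["train"].append(node_id)
    return split


# Hypothetical species id per node; species 2 is held out for validation, 3 for testing.
node_species = [0, 0, 1, 2, 1, 3]
split = split_by_species(node_species, valid_species={2}, test_species={3})
print(split)
```

Because whole species are held out, no protein from a validation or test species is seen during training.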
Note: For undirected graphs, the loaded graphs will have double the number of edges because the bidirectional edges are added automatically.
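The edge doubling mentioned in the note can be seen in a small sketch that stores each undirected edge in both directions. The helper name to_bidirectional and the three-edge example are assumptions for illustration; they do not reproduce the loader's internal code.

```python
def to_bidirectional(edges):
    """Store every undirected edge (u, v) as the two directed edges (u, v) and (v, u)."""
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    return directed


undirected_edges = [(0, 1), (1, 2), (0, 2)]
directed_edges = to_bidirectional(undirected_edges)
print(len(undirected_edges), len(directed_edges))  # 3 undirected edges become 6 directed ones
```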
| Package | #Nodes | #Edges | Split Type | Task Type | Metric |
|---|---|---|---|---|---|
| ogb>=1.1.1 | 132,534 | 39,561,252 | Species | Multi-label binary classification | ROC-AUC |
Website: https://ogb.stanford.edu
The Open Graph Benchmark (OGB) [3] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.
[1] Damian Szklarczyk, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, Nadezhda T Doncheva, John H Morris, Peer Bork, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1):D607–D613, 2019.
[2] Gene Ontology Consortium. The gene ontology resource: 20 years and still going strong. Nucleic Acids Research, 47(D1):D330–D338, 2018.
[3] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, pp. 22118–22133, 2020.
I am NOT the author of this dataset. It was downloaded from its official website. I assume no responsibility or liability for the content in this dataset. For any questions, problems, or issues, please contact the original authors at their website or their GitHub repo.
https://choosealicense.com/licenses/unknown/
Source Paper: https://arxiv.org/abs/1802.06916
Usage
from torch_geometric.datasets.cornell import CornellTemporalHyperGraphDataset
dataset = CornellTemporalHyperGraphDataset(root = "./", name="NDC-substances-25", split="train")
Citation
@article{Benson-2018-simplicial, author = {Benson, Austin R. and Abebe, Rediet and Schaub, Michael T. and Jadbabaie, Ali and Kleinberg, Jon}, title = {Simplicial closure and higher-order link prediction}, year = {2018}, doi =… See the full description on the dataset page: https://huggingface.co/datasets/SauravMaheshkar/NDC-substances-25.
https://choosealicense.com/licenses/unknown/
Source Paper: https://arxiv.org/abs/1802.06916
Usage
from torch_geometric.datasets.cornell import CornellTemporalHyperGraphDataset
dataset = CornellTemporalHyperGraphDataset(root = "./", name="contact-high-school", split="train")
Citation
@article{Benson-2018-simplicial, author = {Benson, Austin R. and Abebe, Rediet and Schaub, Michael T. and Jadbabaie, Ali and Kleinberg, Jon}, title = {Simplicial closure and higher-order link prediction}, year = {2018}, doi =… See the full description on the dataset page: https://huggingface.co/datasets/SauravMaheshkar/contact-high-school.
https://choosealicense.com/licenses/unknown/
Source Paper: https://arxiv.org/abs/1802.06916
Usage
from torch_geometric.datasets.cornell import CornellTemporalHyperGraphDataset
dataset = CornellTemporalHyperGraphDataset(root = "./", name="email-Eu-25", split="train")
Citation
@article{Benson-2018-simplicial, author = {Benson, Austin R. and Abebe, Rediet and Schaub, Michael T. and Jadbabaie, Ali and Kleinberg, Jon}, title = {Simplicial closure and higher-order link prediction}, year = {2018}, doi =… See the full description on the dataset page: https://huggingface.co/datasets/SauravMaheshkar/email-Eu-25.
Dataset Card for AIDS
Dataset Summary
The AIDS dataset contains compounds checked for evidence of anti-HIV activity.
Supported Tasks and Leaderboards
AIDS should be used for molecular classification, a binary classification task. The score used is accuracy with cross validation.
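As a rough illustration of "accuracy with cross validation", the sketch below builds contiguous k-fold splits and averages the per-fold accuracy of a majority-class baseline. The number of folds, the toy labels and the baseline are all assumptions made for the example; the dataset card does not specify the evaluation protocol in detail.

```python
def k_fold_indices(n_samples, n_folds):
    """Return (train_idx, test_idx) pairs for contiguous, near-equal folds."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy binary labels and a majority-class baseline standing in for a real model.
labels = [0, 1, 1, 0, 1, 1, 0, 1, 1, 1]
fold_scores = []
for train_idx, test_idx in k_fold_indices(len(labels), n_folds=5):
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    predictions = [majority] * len(test_idx)
    fold_scores.append(accuracy([labels[i] for i in test_idx], predictions))
cv_accuracy = sum(fold_scores) / len(fold_scores)
print(cv_accuracy)
```

In practice the folds would usually be shuffled (and often stratified by class) before training a graph classifier on each training portion.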
External Use
PyGeometric
To load in PyGeometric, do the following:
from datasets import load_dataset
from torch_geometric.data import Data
from… See the full description on the dataset page: https://huggingface.co/datasets/graphs-datasets/AIDS.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for CIFAR10
Dataset Summary
The CIFAR10 dataset consists of 45000 images in 10 classes, represented as graphs.
Supported Tasks and Leaderboards
CIFAR10 should be used for multiclass graph classification.
External Use
PyGeometric
To load in PyGeometric, do the following:
from datasets import load_dataset
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader
dataset_hf =… See the full description on the dataset page: https://huggingface.co/datasets/graphs-datasets/CIFAR10.
https://choosealicense.com/licenses/unknown/
Dataset Card for ZINC
Dataset Summary
The ZINC dataset is a "curated collection of commercially available chemical compounds prepared especially for virtual screening" (Wikipedia).
Supported Tasks and Leaderboards
ZINC should be used for molecular property prediction (aiming to predict the constrained solubility of the molecules), a graph regression task. The score used is the MAE. The associated leaderboard is here: Papers with code leaderboard.… See the full description on the dataset page: https://huggingface.co/datasets/graphs-datasets/ZINC.
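The MAE score named in the ZINC card is the mean absolute difference between predicted and target values. A minimal sketch, with made-up numbers standing in for constrained-solubility targets and model predictions:

```python
def mean_absolute_error(targets, predictions):
    """Average of |target - prediction| over all graphs."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


# Hypothetical regression targets and predictions for three molecules.
targets = [1.0, 2.0, 3.0]
predictions = [1.5, 2.0, 2.0]
mae = mean_absolute_error(targets, predictions)
print(mae)  # (0.5 + 0.0 + 1.0) / 3 = 0.5
```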