43 datasets found
  1. OGB(Open Graph Benchmark)

    • opendatalab.com
    zip
    Updated May 1, 2020
    Cite
    Stanford University (2020). OGB(Open Graph Benchmark) [Dataset]. https://opendatalab.com/OpenDataLab/OGB
    Explore at:
    31 scholarly articles cite this dataset
    Available download formats: zip
    Dataset updated
    May 1, 2020
    Dataset provided by
    Microsoft Research (http://research.microsoft.com/en-us/)
    Technical University of Dortmund
    Harvard University
    Stanford University
    Description

    The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner. OGB is a community-driven initiative in active development.
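
    The description above mentions the OGB Data Loader and Evaluator. A minimal sketch of that pipeline, assuming the ogb pip package is installed and using ogbn-arxiv purely as an illustrative dataset name:

    from ogb.nodeproppred import PygNodePropPredDataset, Evaluator

    # Datasets are downloaded, processed, and split automatically on first use.
    dataset = PygNodePropPredDataset(name='ogbn-arxiv')
    split_idx = dataset.get_idx_split()  # standardized train/valid/test indices
    graph = dataset[0]  # PyG Data object

    evaluator = Evaluator(name='ogbn-arxiv')
    # evaluator.eval expects {'y_true': ..., 'y_pred': ...} of shape (num_nodes, 1)
    # and returns the dataset's official metric, e.g. {'acc': ...} for ogbn-arxiv.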

  2. OGB - Dataset - LDM

    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    (2024). OGB - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/ogb
    Explore at:
    Dataset updated
    Dec 2, 2024
    Description

    OGB is a collection of graph datasets.

  3. ogbg_molpcba

    • tensorflow.org
    Updated Dec 14, 2022
    Cite
    (2022). ogbg_molpcba [Dataset]. https://www.tensorflow.org/datasets/catalog/ogbg_molpcba
    Explore at:
    Dataset updated
    Dec 14, 2022
    Description

    'ogbg-molpcba' is a molecular dataset sampled from PubChem BioAssay. It is a graph prediction dataset from the Open Graph Benchmark (OGB).

    This dataset is experimental, and the API is subject to change in future releases.

    The following description of the dataset is adapted from the OGB paper:

    Input Format

    All the molecules are pre-processed using RDKit ([1]).

    • Each graph represents a molecule, where nodes are atoms, and edges are chemical bonds.
    • Input node features are 9-dimensional, containing atomic number and chirality, as well as other additional atom features such as formal charge and whether the atom is in the ring.
    • Input edge features are 3-dimensional, containing bond type, bond stereochemistry, as well as an additional bond feature indicating whether the bond is conjugated.

    The exact description of all features is available at https://github.com/snap-stanford/ogb/blob/master/ogb/utils/features.py.

    Prediction

    The task is to predict 128 different biological activities (inactive/active). See [2] and [3] for more description about these targets. Not all targets apply to each molecule: missing targets are indicated by NaNs.
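
    Because of the missing targets, losses are typically computed only over the labeled entries. An illustrative sketch (not from the dataset card), masking NaN labels in a multi-task sigmoid cross-entropy with TensorFlow:

    import tensorflow as tf

    def masked_multitask_bce(labels, logits):
      # labels: [batch, 128] floats with NaN where a target is not assigned
      # logits: [batch, 128] raw model outputs
      mask = tf.math.logical_not(tf.math.is_nan(labels))
      safe_labels = tf.where(mask, labels, tf.zeros_like(labels))  # placeholder under the mask
      loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=safe_labels, logits=logits)
      loss = tf.where(mask, loss, tf.zeros_like(loss))  # zero out missing targets
      return tf.reduce_sum(loss) / tf.reduce_sum(tf.cast(mask, loss.dtype))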

    References

    [1]: Greg Landrum, et al. 'RDKit: Open-source cheminformatics'. URL: https://github.com/rdkit/rdkit

    [2]: Bharath Ramsundar, Steven Kearnes, Patrick Riley, Dale Webster, David Konerding and Vijay Pande. 'Massively Multitask Networks for Drug Discovery'. URL: https://arxiv.org/pdf/1502.02072.pdf

    [3]: Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 9(2):513-530, 2018.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('ogbg_molpcba', split='train')
    for ex in ds.take(4):
      print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/ogbg_molpcba-0.1.3.png

  4. Open Graph Benchmark (OGB) - proteins dataset

    • resodate.org
    Updated Dec 16, 2024
    Cite
    Houyi Li; Zhihong Chen; Zhao Li; Qinkai Zheng; Peng Zhang; Shuigeng Zhou (2024). Open Graph Benchmark (OGB) - proteins dataset [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvb3Blbi1ncmFwaC1iZW5jaG1hcmstLW9nYi0tLS1wcm90ZWlucy1kYXRhc2V0
    Explore at:
    Dataset updated
    Dec 16, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Houyi Li; Zhihong Chen; Zhao Li; Qinkai Zheng; Peng Zhang; Shuigeng Zhou
    Description

    Graph representation learning typically aims to learn an informative embedding for each graph node based on the graph topology (link) information.

  5. OGBG-MolHIV (Processed for PyG)

    • kaggle.com
    zip
    Updated Feb 27, 2021
    + more versions
    Cite
    Redao da Taupl (2021). OGBG-MolHIV (Processed for PyG) [Dataset]. https://www.kaggle.com/datasets/dataup1/ogbg-molhiv
    Explore at:
    Available download formats: zip (8,362,512 bytes)
    Dataset updated
    Feb 27, 2021
    Authors
    Redao da Taupl
    Description

    OGBG-MolHIV

    Webpage: https://ogb.stanford.edu/docs/graphprop/#ogbg-mol

    Usage in Python

    import os
    import os.path as osp
    import pandas as pd
    import torch
    from ogb.graphproppred import PygGraphPropPredDataset
    
    class PygOgbgMol(PygGraphPropPredDataset):
      def __init__(self, name, transform = None, pre_transform = None, meta_csv = None):
        root = '../input'
        if meta_csv is None:
          meta_csv = osp.join(root, name, 'ogbg-master.csv')
        master = pd.read_csv(meta_csv, index_col = 0)
        meta_dict = master[name]
        meta_dict['dir_path'] = osp.join(root, name)
        super().__init__(name = name, root = root, transform = transform, pre_transform = pre_transform, meta_dict = meta_dict)
      def get_idx_split(self, split_type = None):
        if split_type is None:
          split_type = self.meta_info['split']
          
        path = osp.join(self.root, 'split', split_type)
    
        # short-cut if split_dict.pt exists
        if os.path.isfile(os.path.join(path, 'split_dict.pt')):
          return torch.load(os.path.join(path, 'split_dict.pt'))
    
        train_idx = pd.read_csv(osp.join(path, 'train.csv'), header = None).values.T[0]
        valid_idx = pd.read_csv(osp.join(path, 'valid.csv'), header = None).values.T[0]
        test_idx = pd.read_csv(osp.join(path, 'test.csv'), header = None).values.T[0]
    
        return {'train': torch.tensor(train_idx, dtype = torch.long), 'valid': torch.tensor(valid_idx, dtype = torch.long), 'test': torch.tensor(test_idx, dtype = torch.long)}
    
    dataset = PygOgbgMol('ogbg-molhiv')
    
    from torch_geometric.data import DataLoader
    
    batch_size = 32
    split_idx = dataset.get_idx_split()
    train_loader = DataLoader(dataset[split_idx['train']], batch_size = batch_size, shuffle = True)
    valid_loader = DataLoader(dataset[split_idx['valid']], batch_size = batch_size, shuffle = False)
    test_loader = DataLoader(dataset[split_idx['test']], batch_size = batch_size, shuffle = False)
    

    Description

    Graph: The ogbg-molhiv and ogbg-molpcba datasets are two molecular property prediction datasets of different sizes: ogbg-molhiv (small) and ogbg-molpcba (medium). They are adopted from the MoleculeNet [1], and are among the largest of the MoleculeNet datasets. All the molecules are pre-processed using RDKit [2]. Each graph represents a molecule, where nodes are atoms, and edges are chemical bonds. Input node features are 9-dimensional, containing atomic number and chirality, as well as other additional atom features such as formal charge and whether the atom is in the ring or not. The full description of the features is provided in code. The script to convert the SMILES string [3] to the above graph object can be found here. Note that the script requires RDKit to be installed. The script can be used to pre-process external molecule datasets so that those datasets share the same input feature space as the OGB molecule datasets. This is particularly useful for pre-training graph models, which has great potential to significantly increase generalization performance on the (downstream) OGB datasets [4].
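
    As a small sketch of that conversion (assuming ogb's smiles2graph utility, which requires RDKit, provides the SMILES-to-graph step described above):

    from ogb.utils import smiles2graph

    graph = smiles2graph('CCO')  # ethanol, as a toy example
    print(graph['node_feat'].shape)  # (num_atoms, 9): the 9-dimensional atom features
    print(graph['edge_feat'].shape)  # (2 * num_bonds, 3): the 3-dimensional bond features
    print(graph['edge_index'].shape)  # (2, 2 * num_bonds): bidirectional bond list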

    Besides the two main datasets, the dataset authors additionally provide 10 smaller datasets from MoleculeNet. They are ogbg-moltox21, ogbg-molbace, ogbg-molbbbp, ogbg-molclintox, ogbg-molmuv, ogbg-molsider, and ogbg-moltoxcast for (multi-task) binary classification, and ogbg-molesol, ogbg-molfreesolv, and ogbg-mollipo for regression. Evaluators are also provided for these datasets. These datasets can be used to stress-test molecule-specific methods or transfer learning [4].

    For encoding these raw input features, the dataset authors prepare simple modules called AtomEncoder and BondEncoder. They can be used as follows to embed raw atom and bond features to obtain atom_emb and bond_emb.

    from ogb.graphproppred.mol_encoder import AtomEncoder, BondEncoder
    atom_encoder = AtomEncoder(emb_dim = 100)
    bond_encoder = BondEncoder(emb_dim = 100)
    
    atom_emb = atom_encoder(x) # x is the input atom feature
    bond_emb = bond_encoder(edge_attr) # edge_attr is the input bond feature
    

    Prediction task: The task is to predict the target molecular properties as accurately as possible, where the molecular properties are cast as binary labels, e.g., whether a molecule inhibits HIV virus replication or not. Note that some datasets (e.g., ogbg-molpcba) can have multiple tasks and can contain NaN values indicating that the corresponding label is not assigned to the molecule. For the evaluation metric, the dataset authors closely follow [2]. Specifically, for ogbg-molhiv, the dataset authors use ROC-AUC for...
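
    A minimal sketch of that evaluation, assuming the OGB Evaluator for ogbg-molhiv (which reports ROC-AUC), with placeholder tensors standing in for real predictions:

    import torch
    from ogb.graphproppred import Evaluator

    evaluator = Evaluator(name='ogbg-molhiv')
    y_true = torch.randint(0, 2, (100, 1))  # placeholder binary labels
    y_pred = torch.rand(100, 1)  # placeholder predicted scores
    print(evaluator.eval({'y_true': y_true, 'y_pred': y_pred}))  # e.g. {'rocauc': ...}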

  6. OGB

    • huggingface.co
    Updated Jan 25, 2024
    + more versions
    Cite
    Zhikai chen (2024). OGB [Dataset]. https://huggingface.co/datasets/zkchen/OGB
    Explore at:
    Available download formats: Croissant. Croissant is a format for machine-learning datasets; learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 25, 2024
    Authors
    Zhikai chen
    Description

    zkchen/OGB dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. OGBN-MAG (Processed for PyG)

    • kaggle.com
    zip
    Updated Feb 27, 2021
    Cite
    Redao da Taupl (2021). OGBN-MAG (Processed for PyG) [Dataset]. https://www.kaggle.com/dataup1/ogbn-mag
    Explore at:
    Available download formats: zip (852,576,506 bytes)
    Dataset updated
    Feb 27, 2021
    Authors
    Redao da Taupl
    License

    Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
    License information was derived automatically

    Description

    OGBN-MAG

    Webpage: https://ogb.stanford.edu/docs/nodeprop/#ogbn-mag

    Usage in Python

    Warning: Currently not usable.

    import torch_geometric
    from ogb.nodeproppred import PygNodePropPredDataset
    
    dataset = PygNodePropPredDataset('ogbn-mag', root = '/kaggle/input')
    split_idx = dataset.get_idx_split()
    train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
    graph = dataset[0] # PyG Graph object
    

    Description

    Graph: The ogbn-mag dataset is a heterogeneous network composed of a subset of the Microsoft Academic Graph (MAG) [1]. It contains four types of entities—papers (736,389 nodes), authors (1,134,649 nodes), institutions (8,740 nodes), and fields of study (59,965 nodes)—as well as four types of directed relations connecting two types of entities—an author is “affiliated with” an institution, an author “writes” a paper, a paper “cites” a paper, and a paper “has a topic of” a field of study. Similar to ogbn-arxiv, each paper is associated with a 128-dimensional word2vec feature vector, and all the other types of entities are not associated with input node features.

    Prediction task: Given the heterogeneous ogbn-mag data, the task is to predict the venue (conference or journal) of each paper, given its content, references, authors, and authors’ affiliations. This is of practical interest as some manuscripts’ venue information is unknown or missing in MAG, due to the noisy nature of Web data. In total, there are 349 different venues in ogbn-mag, making the task a 349-class classification problem.
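
    A minimal sketch of scoring such venue predictions, assuming the OGB Evaluator for ogbn-mag (accuracy over paper nodes), with placeholder tensors standing in for real model outputs:

    import torch
    from ogb.nodeproppred import Evaluator

    evaluator = Evaluator(name='ogbn-mag')
    y_true = torch.randint(0, 349, (1000, 1))  # placeholder ground-truth venue labels
    y_pred = torch.randint(0, 349, (1000, 1))  # placeholder predicted venue labels
    print(evaluator.eval({'y_true': y_true, 'y_pred': y_pred}))  # {'acc': ...}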

    Dataset splitting: The authors of this dataset follow the same time-based strategy as ogbn-arxiv and ogbn-papers100M to split the paper nodes in the heterogeneous graph, i.e., training models to predict venue labels of all papers published before 2018, validating and testing the models on papers published in 2018 and since 2019, respectively.

    Summary

    Package: ogb>=1.2.1 | #Nodes: 1,939,743 | #Edges: 21,111,007 | Split Type: Time | Task Type: Multi-class classification | Metric: Accuracy

    Open Graph Benchmark

    Website: https://ogb.stanford.edu

    The Open Graph Benchmark (OGB) [2] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.

    References

    [1] Kuansan Wang, Zhihong Shen, Chiyuan Huang, Chieh-Han Wu, Yuxiao Dong, and Anshul Kanakia. Microsoft academic graph: When experts are not enough. Quantitative Science Studies, 1(1):396–413, 2020.

    [2] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, pp. 22118–22133, 2020.

    Disclaimer

    I am NOT the author of this dataset. It was downloaded from its official website. I assume no responsibility or liability for the content in this dataset. For any questions, problems, or issues, please contact the original authors via their website or their GitHub repo.

  8. ogb-1.3.6-library

    • kaggle.com
    zip
    Updated Jul 9, 2025
    Cite
    Sajan Gohil (2025). ogb-1.3.6-library [Dataset]. https://www.kaggle.com/datasets/srg9000/ogb-1-3-6-library
    Explore at:
    Available download formats: zip (74,937 bytes)
    Dataset updated
    Jul 9, 2025
    Authors
    Sajan Gohil
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Sajan Gohil

    Released under MIT

    Contents

  9. pip install ogb

    • kaggle.com
    zip
    Updated Jul 21, 2024
    Cite
    NGUYENGN1410 (2024). pip install ogb [Dataset]. https://www.kaggle.com/datasets/nguyengn1410/pip-install-ogb
    Explore at:
    Available download formats: zip (171,327,671 bytes)
    Dataset updated
    Jul 21, 2024
    Authors
    NGUYENGN1410
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by NGUYENGN1410

    Released under MIT

    Contents

  10. Ets Ogb Commerce General Imp Exp Import Export Data & Shipment Details

    • eximpedia.app
    Updated Feb 18, 2025
    + more versions
    Cite
    (2025). Ets Ogb Commerce General Imp Exp Import Export Data & Shipment Details [Dataset]. https://www.eximpedia.app/companies/ets-ogb-commerce-general-imp-exp/33092127
    Explore at:
    Dataset updated
    Feb 18, 2025
    Description

    View Ets Ogb Commerce General Imp Exp import export trade data, including shipment records, HS codes, top buyers, suppliers, trade values, and global market insights.

  11. Real OGB Recent's YouTube Channel Statistics

    • vidiq.com
    + more versions
    Cite
    vidIQ, Real OGB Recent's YouTube Channel Statistics [Dataset]. https://vidiq.com/youtube-stats/channel/UCIc_pliB7EkbWHIrDbyf86Q/
    Explore at:
    Dataset authored and provided by
    vidIQ
    Time period covered
    Jan 1, 2026 - Jan 26, 2026
    Area covered
    YouTube, Worldwide
    Variables measured
    subscribers, video count, video views, engagement rate, upload frequency, estimated earnings
    Description

    Comprehensive YouTube channel statistics for Real OGB Recent, featuring 899,000 subscribers and 110,425,829 total views. This dataset includes detailed performance metrics such as subscriber growth, video views, engagement rates, and estimated revenue. The channel operates in the Entertainment category. Track 191 videos with daily and monthly performance data, including view counts, subscriber changes, and earnings estimates. Analyze growth trends, engagement patterns, and compare performance against similar channels in the same category.

  12. OGBG-Code (Processed for PyG)

    • kaggle.com
    zip
    Updated Feb 27, 2021
    Cite
    Redao da Taupl (2021). OGBG-Code (Processed for PyG) [Dataset]. https://www.kaggle.com/dataup1/ogbg-code
    Explore at:
    Available download formats: zip (1,314,604,183 bytes)
    Dataset updated
    Feb 27, 2021
    Authors
    Redao da Taupl
    Description

    OGBG-Code

    Webpage: https://ogb.stanford.edu/docs/graphprop/#ogbg-code

    Usage in Python

    from torch_geometric.data import DataLoader
    from ogb.graphproppred import PygGraphPropPredDataset
    
    dataset = PygGraphPropPredDataset(name = 'ogbg-code', root = '/kaggle/input') 
    
    batch_size = 32
    split_idx = dataset.get_idx_split()
    train_loader = DataLoader(dataset[split_idx['train']], batch_size = batch_size, shuffle = True)
    valid_loader = DataLoader(dataset[split_idx['valid']], batch_size = batch_size, shuffle = False)
    test_loader = DataLoader(dataset[split_idx['test']], batch_size = batch_size, shuffle = False)
    

    Description

    Graph: The ogbg-code dataset is a collection of Abstract Syntax Trees (ASTs) obtained from approximately 450 thousand Python method definitions. Methods are extracted from a total of 13,587 different repositories across the most popular projects on GitHub. The collection of Python methods originates from GitHub CodeSearchNet, a collection of datasets and benchmarks for machine-learning-based code retrieval. In ogbg-code, the dataset authors contribute an additional feature extraction step, which includes: AST edges, AST nodes, and tokenized method name. Altogether, ogbg-code allows you to capture source code with its underlying graph structure, beyond its token sequence representation.

    Prediction task: The task is to predict the sub-tokens forming the method name, given the Python method body represented by AST and its node features. This task is often referred to as “code summarization”, because the model is trained to find a succinct and precise description (i.e., the method name chosen by the developer) for a complete logical unit (i.e., the method body). Code summarization is a representative task in the field of machine learning for code, not only for its straightforward adoption in developer tools, but also because it is a proxy measure for assessing how well a model captures the code semantics [1]. Following [2,3], the dataset authors use an F1 score to evaluate predicted sub-tokens against ground-truth sub-tokens.
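
    An illustrative sketch of that sub-token F1 computation (plain Python, not the official evaluator):

    def subtoken_f1(pred_tokens, true_tokens):
      # precision/recall/F1 between predicted and ground-truth method-name sub-tokens
      pred, true = set(pred_tokens), set(true_tokens)
      if not pred or not true:
        return 0.0
      tp = len(pred & true)
      precision, recall = tp / len(pred), tp / len(true)
      if precision + recall == 0:
        return 0.0
      return 2 * precision * recall / (precision + recall)

    print(subtoken_f1(['get', 'file', 'name'], ['get', 'filename']))  # partial credit for overlap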

    Dataset splitting: The dataset authors adopt a project split [4], where the ASTs for the train set are obtained from GitHub projects that do not appear in the validation and test sets. This split respects the practical scenario of training a model on a large collection of source code (obtained, for instance, from the popular GitHub projects), and then using it to predict method names on a separate code base. The project split stress-tests the model’s ability to capture code’s semantics, and avoids a model that trivially memorizes the idiosyncrasies of training projects (such as the naming conventions and the coding style of a specific developer) to achieve a high test score.

    Summary

    Package: ogb>=1.2.0 | #Graphs: 452,741 | #Nodes per Graph: 125.2 | #Edges per Graph: 124.2 | Split Type: Project | Task Type: Sub-token prediction | Metric: F1 score

    License: MIT License

    Open Graph Benchmark

    Website: https://ogb.stanford.edu

    The Open Graph Benchmark (OGB) [5] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.

    References

    [1] Miltiadis Allamanis, Earl T Barr, Premkumar Devanbu, and Charles Sutton. A survey of machine learning for big code and naturalness. ACM Computing Surveys, 51(4):1–37, 2018.

    [2] Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. code2seq: Generating sequences from structured representations of code. arXiv preprint arXiv:1808.01400, 2018.

    [3] Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. code2vec: Learning distributed representations of code. Proceedings of the ACM on Programming Languages, 3(POPL):1–29, 2019.

    [4] Miltiadis Allamanis. The adverse effects of code duplication in machine learning models of code. Proceedings of the 2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, pp. 143–153, 2019.

    [5] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, pp. 22118–22133, 2020.

    Disclaimer

    I am NOT the author of this dataset. It was downloaded from its official website. I assume no responsibility or liability for the content in this dataset. For any questions, problems, or issues, please contact the original authors via their website or their GitHub repo.

  13. OGB-LSC (OGB Large-Scale Challenge)

    • opendatalab.com
    zip
    Updated Sep 29, 2022
    Cite
    Facebook AI Research (2022). OGB-LSC (OGB Large-Scale Challenge) [Dataset]. https://opendatalab.com/OpenDataLab/OGB-LSC
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 29, 2022
    Dataset provided by
    Facebook AI Research
    RIKEN Center for Advanced Intelligence Project
    Technical University of Dortmund
    Stanford University
    Description

    OGB Large-Scale Challenge (OGB-LSC) is a collection of three real-world datasets for advancing the state-of-the-art in large-scale graph ML. OGB-LSC provides graph datasets that are orders of magnitude larger than existing ones and covers three core graph learning tasks -- link prediction, graph regression, and node classification. OGB-LSC consists of three datasets: MAG240M-LSC, WikiKG90M-LSC, and PCQM4M-LSC. Each dataset offers an independent task. MAG240M-LSC is a heterogeneous academic graph, and the task is to predict the subject areas of papers situated in the heterogeneous graph (node classification). WikiKG90M-LSC is a knowledge graph, and the task is to impute missing triplets (link prediction). PCQM4M-LSC is a quantum chemistry dataset, and the task is to predict an important molecular property, the HOMO-LUMO gap, of a given molecule (graph regression).
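
    A minimal sketch of accessing one of these datasets, assuming the ogb.lsc package; the class and argument names below follow the first-round OGB-LSC API and may differ in later ogb versions:

    from ogb.lsc import PCQM4MDataset, PCQM4MEvaluator

    # With only_smiles=True each item is a (SMILES string, HOMO-LUMO gap) pair.
    dataset = PCQM4MDataset(root='dataset/', only_smiles=True)
    smiles, gap = dataset[0]

    evaluator = PCQM4MEvaluator()
    # evaluator.eval expects {'y_true': ..., 'y_pred': ...} and returns {'mae': ...}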

  14. Benchmark Data for Chemprop

    • zenodo.org
    application/gzip
    Updated Jul 24, 2023
    + more versions
    Cite
    Esther Heid; Kevin P. Greenman; Yunsie Chung; Shih-Cheng Li; David E. Graff; Florence H. Vermeire; Haoyang Wu; William H. Green; Charles J. McGill (2023). Benchmark Data for Chemprop [Dataset]. http://doi.org/10.5281/zenodo.8174268
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Esther Heid; Kevin P. Greenman; Yunsie Chung; Shih-Cheng Li; David E. Graff; Florence H. Vermeire; Haoyang Wu; William H. Green; Charles J. McGill
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Datasets and splits of the manuscript "Chemprop: Machine Learning Package for Chemical Property Prediction." Train, validation and test splits are located within each folder, as well as additional data necessary for some of the benchmarks. To train Chemprop models, refer to our code repository to obtain ready-to-use scripts to train machine learning models for each of the systems.

    Available benchmarking systems:

    • `hiv` HIV replication inhibition from MoleculeNet and OGB with scaffold splits
    • `pcba_random` Biological activities from MoleculeNet and OGB with random splits
    • `pcba_scaffold` Biological activities from MoleculeNet and OGB with scaffold splits
    • `qm9_multitask` DFT calculated properties from MoleculeNet and OGB, trained as a multi-task model
    • `qm9_u0` DFT calculated properties from MoleculeNet and OGB, trained as a single-task model on the target U0 only
    • `qm9_gap` DFT calculated properties from MoleculeNet and OGB, trained as a single-task model on the target gap only
    • `sampl` Water-octanol partition coefficients, used to predict molecules from the SAMPL6, 7 and 9 challenges
    • `atom_bond_137k` Quantum-mechanical atom and bond descriptors
    • `bde` Bond dissociation enthalpies trained as single-task model
    • `bde_charges` Bond dissociation enthalpies trained as multi-task model together with atomic partial charges
    • `charges_eps_4` Partial charges at a dielectric constant of 4 (in protein)
    • `charges_eps_78` Partial charges at a dielectric constant of 78 (in water)
    • `barriers_e2` Reaction barrier heights of E2 reactions
    • `barriers_sn2` Reaction barrier heights of SN2 reactions
    • `barriers_cycloadd` Reaction barrier heights of cycloaddition reactions
    • `barriers_rdb7` Reaction barrier heights in the RDB7 dataset
    • `barriers_rgd1` Reaction barrier heights in the RGD1-CNHO dataset
    • `multi_molecule` UV/Vis peak absorption wavelengths in different solvents
    • `ir` IR Spectra
    • `pcqm4mv2` HOMO-LUMO gaps of the PCQM4Mv2 dataset
    • `uncertainty_ensemble` Uncertainty estimation using an ensemble using the QM9 gap dataset
    • `uncertainty_evidential` Uncertainty estimation using evidential learning using the QM9 gap dataset
    • `uncertainty_mve` Uncertainty estimation using mean-variance estimation using the QM9 gap dataset
    • `timing` Timing benchmark using subsets of QM9 gap
  15. Calcium time series from OGB labeled V1 neurons in awake or anesthesized...

    • figshare.com
    bin
    Updated Jan 19, 2016
    Cite
    Pieter Goltstein (2016). Calcium time series from OGB labeled V1 neurons in awake or anesthesized mice. [Dataset]. http://doi.org/10.6084/m9.figshare.1287764.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Pieter Goltstein
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Calcium time series from OGB-labeled V1 neurons in awake or anesthetized mice. Data published in: Pieter M. Goltstein, Jorrit S. Montijn, Cyriel M.A. Pennartz. (2015). Effects of isoflurane anesthesia on ensemble patterns of Ca2+ activity in mouse V1: Reduced direction selectivity independent of increased correlations in cellular activity. PLOS ONE.

  16. OGBN-Products (Processed for PyG)

    • kaggle.com
    zip
    Updated Feb 27, 2021
    + more versions
    Cite
    Redao da Taupl (2021). OGBN-Products (Processed for PyG) [Dataset]. https://www.kaggle.com/datasets/dataup1/ogbn-products/code
    Explore at:
    Available download formats: zip (3,699,538,358 bytes)
    Dataset updated
    Feb 27, 2021
    Authors
    Redao da Taupl
    Description

    OGBN-Products

    Webpage: https://ogb.stanford.edu/docs/nodeprop/#ogbn-products

    Usage in Python

    import os.path as osp
    import pandas as pd
    import datatable as dt
    import torch
    import torch_geometric as pyg
    from ogb.nodeproppred import PygNodePropPredDataset
    from ogb.io.read_graph_raw import read_nodesplitidx_split_hetero
    
    class PygOgbnProducts(PygNodePropPredDataset):
      def __init__(self, meta_csv = None):
        root, name, transform = '/kaggle/input', 'ogbn-products', None
        if meta_csv is None:
          meta_csv = osp.join(root, name, 'ogbn-master.csv')
        master = pd.read_csv(meta_csv, index_col = 0)
        meta_dict = master[name]
        meta_dict['dir_path'] = osp.join(root, name)
        super().__init__(name = name, root = root, transform = transform, meta_dict = meta_dict)
      def get_idx_split(self, split_type = None):
        if split_type is None:
          split_type = self.meta_info['split']
        path = osp.join(self.root, 'split', split_type)
        if osp.isfile(osp.join(path, 'split_dict.pt')):
          return torch.load(osp.join(path, 'split_dict.pt'))
        if self.is_hetero:
          train_idx_dict, valid_idx_dict, test_idx_dict = read_nodesplitidx_split_hetero(path)
          for nodetype in train_idx_dict.keys():
            train_idx_dict[nodetype] = torch.from_numpy(train_idx_dict[nodetype]).to(torch.long)
            valid_idx_dict[nodetype] = torch.from_numpy(valid_idx_dict[nodetype]).to(torch.long)
            test_idx_dict[nodetype] = torch.from_numpy(test_idx_dict[nodetype]).to(torch.long)
          return {'train': train_idx_dict, 'valid': valid_idx_dict, 'test': test_idx_dict}
        else:
          train_idx = dt.fread(osp.join(path, 'train.csv'), header = None).to_numpy().T[0]
          train_idx = torch.from_numpy(train_idx).to(torch.long)
          valid_idx = dt.fread(osp.join(path, 'valid.csv'), header = None).to_numpy().T[0]
          valid_idx = torch.from_numpy(valid_idx).to(torch.long)
          test_idx = dt.fread(osp.join(path, 'test.csv'), header = None).to_numpy().T[0]
          test_idx = torch.from_numpy(test_idx).to(torch.long)
          return {'train': train_idx, 'valid': valid_idx, 'test': test_idx}
    
    dataset = PygOgbnProducts()
    split_idx = dataset.get_idx_split()
    train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
    graph = dataset[0] # PyG Graph object
    

    Description

    Graph: The ogbn-products dataset is an undirected and unweighted graph, representing an Amazon product co-purchasing network [1]. Nodes represent products sold on Amazon, and edges between two products indicate that the products are purchased together. The authors follow [2] to process node features and target categories. Specifically, node features are generated by extracting bag-of-words features from the product descriptions followed by a Principal Component Analysis to reduce the dimension to 100.

    Prediction task: The task is to predict the category of a product in a multi-class classification setup, where the 47 top-level categories are used for target labels.

    Dataset splitting: The authors consider a more challenging and realistic dataset splitting that differs from the one used in [2]. Instead of randomly assigning 90% of the nodes for training and 10% of the nodes for testing (without a validation set), the authors use the sales ranking (popularity) to split nodes into training/validation/test sets. Specifically, they sort the products according to their sales ranking and use the top 8% for training, the next 2% for validation, and the rest for testing. This is a more challenging splitting procedure that closely matches the real-world application where labels are first assigned to important nodes in the network and ML models are subsequently used to make predictions on less important ones.

    Note 1: A very small number of self-connecting edges are repeated (see here); you may remove them if necessary.

    Note 2: For undirected graphs, the loaded graphs will have the doubled number of edges because the bidirectional edges will be added automatically.
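
    Relating to the two notes above, a small sketch of cleaning the loaded edge list (assuming a recent PyTorch Geometric; graph is the PyG Data object from the snippet further up):

    from torch_geometric.utils import coalesce, remove_self_loops

    edge_index = coalesce(graph.edge_index)  # deduplicate the repeated edges (Note 1)
    edge_index, _ = remove_self_loops(edge_index)  # optionally drop self-connecting edges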

    Summary

    Package: ogb>=1.1.1 | #Nodes: 2,449,029 | #Edges: 61,859,140 | Split Type: Sales rank | Task Type: Multi-class classification | Metric: Accuracy

    Open Graph Benchmark

    Website: https://ogb.stanford.edu

    The Open Graph Benchmark (OGB) [3] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.

    References

    [1] http://manikvarma.org/downloads/XC/XMLRepository.html

    [2] Wei-Lin Chiang, ...

  17. O G B Company Limited Import Export Data & Shipment Details

    • eximpedia.app
    Updated Nov 2, 2025
    Cite
    (2025). O G B Company Limited Import Export Data & Shipment Details [Dataset]. https://www.eximpedia.app/companies/o-g-b-company-limited/92906644
    Explore at:
    Dataset updated
    Nov 2, 2025
    Description

    View O G B Company Limited import export trade data, including shipment records, HS codes, top buyers, suppliers, trade values, and global market insights.

  18. pjf_llama_instruction_prep

    • huggingface.co
    Updated Mar 15, 2020
    + more versions
    Cite
    ogb (2020). pjf_llama_instruction_prep [Dataset]. https://huggingface.co/datasets/ogbrandt/pjf_llama_instruction_prep
    Explore at:
    Available download formats: Croissant. Croissant is a format for machine-learning datasets; learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 15, 2020
    Authors
    ogb
    Description

    ogbrandt/pjf_llama_instruction_prep dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. Data from: Algorithm and System Co-design for Efficient Subgraph-based Graph...

    • data-staging.niaid.nih.gov
    Updated Apr 10, 2025
    Cite
    Yin, Haoteng; Zhang, Muhan; Wang, Yanbang; Wang, Jianguo; Li, Pan (2025). Algorithm and System Co-design for Efficient Subgraph-based Graph Representation Learning [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_15186012
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Peking University
    Cornell University
    Purdue University West Lafayette
    Authors
    Yin, Haoteng; Zhang, Muhan; Wang, Yanbang; Wang, Jianguo; Li, Pan
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    Following the format of the Open Graph Benchmark (OGB), we design four prediction tasks of relations (mag-write, mag-cite) and higher-order patterns (tags-math, DBLP-coauthor) and construct the corresponding datasets over heterogeneous graphs and hypergraphs [1]. The original ogb-mag dataset only contains features for 'paper'-type nodes. We add the node embedding provided by [2] as raw features for other node types in MAG(P-A)/(P-P). For these four tasks, the model is evaluated by one positive query paired with a certain number of randomly sampled negative queries (1:1000 by default, except for tags-math 1:100).
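
    An illustrative sketch of that evaluation protocol (plain PyTorch, not the authors' code): ranking one positive query against randomly sampled negatives and reporting MRR and Hits@k:

    import torch

    def rank_of_positive(pos_score, neg_scores):
      # 1-based rank of the positive among (1 + num_negatives) candidates
      return 1 + int((neg_scores >= pos_score).sum())

    pos_score = torch.tensor(0.92)  # placeholder score for the positive query
    neg_scores = torch.rand(1000)  # 1:1000 negative sampling, as described above
    rank = rank_of_positive(pos_score, neg_scores)
    mrr = 1.0 / rank
    hits_at_10 = float(rank <= 10)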

  20. Ogb And Partners Limited Import Export Data & Shipment Details

    • eximpedia.app
    Updated Oct 9, 2025
    Cite
    (2025). Ogb And Partners Limited Import Export Data & Shipment Details [Dataset]. https://www.eximpedia.app/companies/ogb-and-partners-limited/67600759
    Explore at:
    Dataset updated
    Oct 9, 2025
    Description

    View Ogb And Partners Limited import export trade data, including shipment records, HS codes, top buyers, suppliers, trade values, and global market insights.
