36 datasets found
  1. OGB-LSC Dataset

    • paperswithcode.com
    Updated Jan 25, 2024
    Cite
    Weihua Hu; Matthias Fey; Hongyu Ren; Maho Nakata; Yuxiao Dong; Jure Leskovec (2024). OGB-LSC Dataset [Dataset]. https://paperswithcode.com/dataset/ogb-lsc
    Explore at:
    Dataset updated
    Jan 25, 2024
    Authors
    Weihua Hu; Matthias Fey; Hongyu Ren; Maho Nakata; Yuxiao Dong; Jure Leskovec
    Description

    OGB Large-Scale Challenge (OGB-LSC) is a collection of three real-world datasets for advancing the state-of-the-art in large-scale graph ML. OGB-LSC provides graph datasets that are orders of magnitude larger than existing ones and covers three core graph learning tasks -- link prediction, graph regression, and node classification.

    OGB-LSC consists of three datasets: MAG240M-LSC, WikiKG90M-LSC, and PCQM4M-LSC. Each dataset offers an independent task.

    MAG240M-LSC is a heterogeneous academic graph, and the task is to predict the subject areas of papers situated in the heterogeneous graph (node classification). WikiKG90M-LSC is a knowledge graph, and the task is to impute missing triplets (link prediction). PCQM4M-LSC is a quantum chemistry dataset, and the task is to predict an important molecular property, the HOMO-LUMO gap, of a given molecule (graph regression).

  2. OGB Dataset

    • paperswithcode.com
    Updated Jul 19, 2021
    Cite
    Weihua Hu; Matthias Fey; Marinka Zitnik; Yuxiao Dong; Hongyu Ren; Bowen Liu; Michele Catasta; Jure Leskovec (2021). OGB Dataset [Dataset]. https://paperswithcode.com/dataset/ogb
    Explore at:
    Dataset updated
    Jul 19, 2021
    Authors
    Weihua Hu; Matthias Fey; Marinka Zitnik; Yuxiao Dong; Hongyu Ren; Bowen Liu; Michele Catasta; Jure Leskovec
    Description

    The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner. OGB is a community-driven initiative in active development.

  3. ogbg_molpcba

    • tensorflow.org
    Updated Dec 14, 2022
    Cite
    (2022). ogbg_molpcba [Dataset]. https://www.tensorflow.org/datasets/catalog/ogbg_molpcba
    Explore at:
    Dataset updated
    Dec 14, 2022
    Description

    'ogbg-molpcba' is a molecular dataset sampled from PubChem BioAssay. It is a graph prediction dataset from the Open Graph Benchmark (OGB).

    This dataset is experimental, and the API is subject to change in future releases.

    The below description of the dataset is adapted from the OGB paper:

    Input Format

    All the molecules are pre-processed using RDKit ([1]).

    • Each graph represents a molecule, where nodes are atoms, and edges are chemical bonds.
    • Input node features are 9-dimensional, containing atomic number and chirality, as well as other additional atom features such as formal charge and whether the atom is in the ring.
    • Input edge features are 3-dimensional, containing bond type, bond stereochemistry, as well as an additional bond feature indicating whether the bond is conjugated.

    The exact description of all features is available at https://github.com/snap-stanford/ogb/blob/master/ogb/utils/features.py.

    Prediction

    The task is to predict 128 different biological activities (inactive/active). See [2] and [3] for more description about these targets. Not all targets apply to each molecule: missing targets are indicated by NaNs.
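Because missing targets are marked with NaNs, a training loss or metric has to skip those entries. The following is a minimal sketch of that masking in plain Python, assuming labels and predicted probabilities are given as nested lists; the function name `masked_bce` and the toy values are hypothetical and not part of OGB or TensorFlow Datasets.

```python
import math

def masked_bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over labelled (non-NaN) entries only."""
    total, count = 0.0, 0
    for row_true, row_prob in zip(y_true, y_prob):
        for t, p in zip(row_true, row_prob):
            if math.isnan(t):
                continue  # target not available for this molecule: skip it
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            count += 1
    return total / count if count else float("nan")

nan = float("nan")
# Toy data: 2 molecules x 3 targets (the real dataset has 128 targets).
y_true = [[1.0, nan, 0.0], [nan, nan, 1.0]]
y_prob = [[0.9, 0.5, 0.2], [0.3, 0.7, 0.6]]
loss = masked_bce(y_true, y_prob)
```

Only the three labelled entries contribute to `loss`; the predictions at NaN positions are ignored.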

    References

    [1]: Greg Landrum, et al. 'RDKit: Open-source cheminformatics'. URL: https://github.com/rdkit/rdkit

    [2]: Bharath Ramsundar, Steven Kearnes, Patrick Riley, Dale Webster, David Konerding and Vijay Pande. 'Massively Multitask Networks for Drug Discovery'. URL: https://arxiv.org/pdf/1502.02072.pdf

    [3]: Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 9(2):513-530, 2018.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split of the dataset.
    ds = tfds.load('ogbg_molpcba', split='train')
    # Print the first four examples.
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/ogbg_molpcba-0.1.3.png

  4. OGB(Open Graph Benchmark)

    • opendatalab.com
    zip
    Updated May 1, 2020
    Cite
    Harvard University (2020). OGB(Open Graph Benchmark) [Dataset]. https://opendatalab.com/OpenDataLab/OGB
    Explore at:
    Available download formats: zip
    Dataset updated
    May 1, 2020
    Dataset provided by
    Microsoft Research (https://www.microsoft.com/research/)
    Harvard University
    Technical University of Dortmund
    Stanford University
    Description

    The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner. OGB is a community-driven initiative in active development.

  5. OGB

    • huggingface.co
    Updated Jan 25, 2024
    Cite
    Zhikai chen (2024). OGB [Dataset]. https://huggingface.co/datasets/zkchen/OGB
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 25, 2024
    Authors
    Zhikai chen
    Description

    zkchen/OGB dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. OGBN-Proteins (Processed for PyG)

    • kaggle.com
    zip
    Updated Feb 27, 2021
    Cite
    Redao da Taupl (2021). OGBN-Proteins (Processed for PyG) [Dataset]. https://www.kaggle.com/dataup1/ogbn-proteins
    Explore at:
    Available download formats: zip (677947148 bytes)
    Dataset updated
    Feb 27, 2021
    Authors
    Redao da Taupl
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    OGBN-Proteins

    Webpage: https://ogb.stanford.edu/docs/nodeprop/#ogbn-proteins

    Usage in Python

    import os.path as osp

    import datatable as dt  # assumption: `dt` below refers to the datatable package
    import pandas as pd
    import torch
    import torch_geometric.transforms as T
    from ogb.io.read_graph_raw import read_nodesplitidx_split_hetero
    from ogb.nodeproppred import PygNodePropPredDataset

    class PygOgbnProteins(PygNodePropPredDataset):
      """ogbn-proteins read from the pre-processed Kaggle input directory."""

      def __init__(self, meta_csv = None):
        root, name, transform = '/kaggle/input', 'ogbn-proteins', T.ToSparseTensor()
        if meta_csv is None:
          meta_csv = osp.join(root, name, 'ogbn-master.csv')
        master = pd.read_csv(meta_csv, index_col = 0)
        meta_dict = master[name]
        meta_dict['dir_path'] = osp.join(root, name)
        super().__init__(name = name, root = root, transform = transform, meta_dict = meta_dict)

      def get_idx_split(self, split_type = None):
        if split_type is None:
          split_type = self.meta_info['split']
        path = osp.join(self.root, 'split', split_type)
        # Use the cached split file when it exists.
        if osp.isfile(osp.join(path, 'split_dict.pt')):
          return torch.load(osp.join(path, 'split_dict.pt'))
        if self.is_hetero:
          train_idx_dict, valid_idx_dict, test_idx_dict = read_nodesplitidx_split_hetero(path)
          for nodetype in train_idx_dict.keys():
            train_idx_dict[nodetype] = torch.from_numpy(train_idx_dict[nodetype]).to(torch.long)
            valid_idx_dict[nodetype] = torch.from_numpy(valid_idx_dict[nodetype]).to(torch.long)
            test_idx_dict[nodetype] = torch.from_numpy(test_idx_dict[nodetype]).to(torch.long)
          # Return only after every node type has been converted.
          return {'train': train_idx_dict, 'valid': valid_idx_dict, 'test': test_idx_dict}
        else:
          # Each CSV holds one node index per row.
          train_idx = dt.fread(osp.join(path, 'train.csv'), header = None).to_numpy().T[0]
          train_idx = torch.from_numpy(train_idx).to(torch.long)
          valid_idx = dt.fread(osp.join(path, 'valid.csv'), header = None).to_numpy().T[0]
          valid_idx = torch.from_numpy(valid_idx).to(torch.long)
          test_idx = dt.fread(osp.join(path, 'test.csv'), header = None).to_numpy().T[0]
          test_idx = torch.from_numpy(test_idx).to(torch.long)
          return {'train': train_idx, 'valid': valid_idx, 'test': test_idx}

    dataset = PygOgbnProteins()
    split_idx = dataset.get_idx_split()
    train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
    graph = dataset[0] # PyG Graph object
    

    Description

    Graph: The ogbn-proteins dataset is an undirected, weighted, and typed (according to species) graph. Nodes represent proteins, and edges indicate different types of biologically meaningful associations between proteins, e.g., physical interactions, co-expression or homology [1,2]. All edges come with 8-dimensional features, where each dimension represents the strength of a single association type and takes values between 0 and 1 (the larger the value is, the stronger the association is). The proteins come from 8 species.

    Prediction task: The task is to predict the presence of protein functions in a multi-label binary classification setup, where there are 112 kinds of labels to predict in total. The performance is measured by the average of ROC-AUC scores across the 112 tasks.
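The metric described above, ROC-AUC averaged over tasks, can be sketched in plain Python as follows. This is an illustrative version only: it uses the pairwise definition of ROC-AUC (the probability that a positive example is scored above a negative one), and it skips tasks where only one class is present, which is an assumption rather than documented OGB Evaluator behavior. The function names and toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when a task has only one class
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_roc_auc(y_true, y_score):
    """Average per-task ROC-AUC; rows are nodes, columns are tasks."""
    aucs = []
    for task in range(len(y_true[0])):
        auc = roc_auc([row[task] for row in y_true], [row[task] for row in y_score])
        if auc is not None:
            aucs.append(auc)
    return sum(aucs) / len(aucs)

# Toy data: 4 nodes x 2 tasks (the real dataset has 112 tasks).
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.3], [0.6, 0.7]]
score = mean_roc_auc(y_true, y_score)
```

The pairwise loop is quadratic in the number of nodes, so it is only suitable for small illustrations; a rank-based implementation would be needed at the scale of ogbn-proteins.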

    Dataset splitting: The authors split the protein nodes into training/validation/test sets according to the species which the proteins come from. This enables the evaluation of the generalization performance of the model across different species.
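A species-based split of this kind can be sketched as below. The species labels ("A", "B", "C") and their assignment to the training, validation, and test sets are hypothetical placeholders; the actual assignment used by the dataset is not given here.

```python
def split_by_species(node_species, train_species, valid_species, test_species):
    """Assign node indices to train/valid/test according to each node's species."""
    split = {"train": [], "valid": [], "test": []}
    for node_id, species in enumerate(node_species):
        if species in train_species:
            split["train"].append(node_id)
        elif species in valid_species:
            split["valid"].append(node_id)
        elif species in test_species:
            split["test"].append(node_id)
        else:
            raise ValueError(f"species {species!r} is not assigned to any split")
    return split

# Hypothetical species label for each of 7 protein nodes.
node_species = ["A", "A", "B", "C", "B", "C", "A"]
split = split_by_species(node_species, {"A"}, {"B"}, {"C"})
```

Because every node of a given species lands in the same set, the test score reflects how well a model transfers to species it never saw during training.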

    Note: For undirected graphs, the loaded graphs will have double the number of edges, because each edge is automatically added in both directions.
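The edge-count doubling in the note can be illustrated with a small sketch. It assumes edges are stored as (source, target) pairs with no self-loops; the helper name `to_bidirectional` is hypothetical and is not an OGB function.

```python
def to_bidirectional(edges):
    """Store each undirected edge (u, v) as two directed edges, (u, v) and (v, u)."""
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    return directed

undirected = [(0, 1), (1, 2), (0, 2)]
directed = to_bidirectional(undirected)
```

Here three undirected edges become six directed ones, which is why edge counts reported by a loaded graph can be twice the number listed for the dataset.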

    Summary

    Package: ogb>=1.1.1
    #Nodes: 132,534
    #Edges: 39,561,252
    Split Type: Species
    Task Type: Multi-label binary classification
    Metric: ROC-AUC

    Open Graph Benchmark

    Website: https://ogb.stanford.edu

    The Open Graph Benchmark (OGB) [3] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.

    References

    [1] Damian Szklarczyk, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, Nadezhda T Doncheva, John H Morris, Peer Bork, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1):D607–D613, 2019.

    [2] Gene Ontology Consortium. The gene ontology resource: 20 years and still going strong. Nucleic Acids Research, 47(D1):D330–D338, 2018.

    [3] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, pp. 22118–22133, 2020.

    Disclaimer

    I am NOT the author of this dataset. It was downloaded from its official website. I assume no responsibility or liability for the content in this dataset. For any questions, problems or issues, please contact the original authors at their website or their GitHub repo.

  7. CORA, Citeseer, Pubmed, OGB arXiv - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). CORA, Citeseer, Pubmed, OGB arXiv - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/cora--citeseer--pubmed--ogb-arxiv
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    CORA, Citeseer, Pubmed, OGB arXiv

  8. Weihua Hu, Matthias Fey, Hongyu Ren, Maho Nakata, Yuxiao Dong, Jure Leskovec...

    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    Weihua Hu, Matthias Fey, Hongyu Ren, Maho Nakata, Yuxiao Dong, Jure Leskovec (2024). Dataset: OGB-LSC. https://doi.org/10.57702/lsm2j4pu [Dataset]. https://service.tib.eu/ldmservice/dataset/ogb-lsc
    Explore at:
    Dataset updated
    Dec 2, 2024
    Description

    OGB-LSC provides the three large-scale realistic benchmark datasets, covering the core graph ML tasks of node classification, link prediction, and graph regression.

  9. Benchmark Data for Chemprop

    • data.niaid.nih.gov
    Updated Nov 9, 2023
    Cite
    Graff, David E. (2023). Benchmark Data for Chemprop [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8174267
    Explore at:
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Chung, Yunsie
    Green, William H.
    Vermeire, Florence H.
    Heid, Esther
    Greenman, Kevin P.
    Li, Shih-Cheng
    McGill, Charles J.
    Wu, Haoyang
    Graff, David E.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets and splits of the manuscript "Chemprop: Machine Learning Package for Chemical Property Prediction." Train, validation and test splits are located within each folder, as well as additional data necessary for some of the benchmarks. To train Chemprop models, refer to our code repository to obtain ready-to-use scripts to train machine learning models for each of the systems. Available benchmarking systems:

    • hiv: HIV replication inhibition from MoleculeNet and OGB with scaffold splits
    • pcba_random: Biological activities from MoleculeNet with random splits (with missing targets filled in with zeros as provided by MoleculeNet)
    • pcba_random_nans: Biological activities from MoleculeNet with random splits and data format to match OGB (with missing targets not filled in with zeros)
    • pcba_scaffold: Biological activities from OGB with scaffold splits
    • qm9_multitask: DFT calculated properties from MoleculeNet and OGB, trained as a multi-task model
    • qm9_u0: DFT calculated properties from MoleculeNet and OGB, trained as a single-task model on the target U0 only
    • qm9_gap: DFT calculated properties from MoleculeNet and OGB, trained as a single-task model on the target gap only
    • sampl: Water-octanol partition coefficients, used to predict molecules from the SAMPL6, 7 and 9 challenges
    • atom_bond_137k: Quantum-mechanical atom and bond descriptors
    • bde: Bond dissociation enthalpies trained as single-task model
    • bde_charges: Bond dissociation enthalpies trained as multi-task model together with atomic partial charges
    • charges_eps_4: Partial charges at a dielectric constant of 4 (in protein)
    • charges_eps_78: Partial charges at a dielectric constant of 78 (in water)
    • barriers_e2: Reaction barrier heights of E2 reactions
    • barriers_sn2: Reaction barrier heights of SN2 reactions
    • barriers_cycloadd: Reaction barrier heights of cycloaddition reactions
    • barriers_rdb7: Reaction barrier heights in the RDB7 dataset
    • barriers_rgd1: Reaction barrier heights in the RGD1-CNHO dataset
    • multi_molecule: UV/Vis peak absorption wavelengths in different solvents
    • ir: IR Spectra
    • pcqm4mv2: HOMO-LUMO gaps of the PCQM4Mv2 dataset
    • uncertainty_ensemble: Uncertainty estimation using an ensemble using the QM9 gap dataset
    • uncertainty_evidential: Uncertainty estimation using evidential learning using the QM9 gap dataset
    • uncertainty_mve: Uncertainty estimation using mean-variance estimation using the QM9 gap dataset
    • timing: Timing benchmark using subsets of QM9 gap

    Version: This version of the dataset (Version 2) is compatible with all versions of Chemprop (supporting the respective functionality). Version 1 of this dataset is compatible with all versions except Chemprop v.1.6.1, which cannot process the charges_eps_4 and charges_eps_78 datasets (all other benchmarks work as expected). We therefore recommend to always use Version 2 of the dataset (with reformatted charges_eps_4 and charges_eps_78 datasets), since it is compatible with all versions of Chemprop. For use with any other ML software, you can use any version.

  10. Ets Ogb Commerce General Imp Exp | See Full Import/Export Data | Eximpedia

    • eximpedia.app
    Updated Feb 18, 2025
    Cite
    Seair Exim (2025). Ets Ogb Commerce General Imp Exp | See Full Import/Export Data | Eximpedia [Dataset]. https://www.eximpedia.app/
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Eximpedia Export Import Trade Data
    Eximpedia PTE LTD
    Authors
    Seair Exim
    Area covered
    Jamaica, Panama, Saint Barthélemy, India, Djibouti, Jersey, Gabon, Switzerland, Micronesia (Federated States of), Libya
    Description

    Eximpedia Export import trade data lets you search trade data and active Exporters, Importers, Buyers, Suppliers, manufacturers exporters from over 209 countries

  11. Vintage Ogb (Name) - Reverse Whois Lookup

    • whoisdatacenter.com
    csv
    Cite
    AllHeart Web Inc, Vintage Ogb (Name) - Reverse Whois Lookup [Dataset]. https://whoisdatacenter.com/name/Vintage-Ogb/
    Explore at:
    Available download formats: csv
    Dataset provided by
    AllHeart Web
    Authors
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Jul 1, 2025
    Description

    Investigate historical ownership changes and registration details by initiating a reverse Whois lookup for the name Vintage Ogb.

  12. pjf-podcast-qa-sharegpt

    • huggingface.co
    Updated Mar 15, 2020
    Cite
    ogb (2020). pjf-podcast-qa-sharegpt [Dataset]. https://huggingface.co/datasets/ogbrandt/pjf-podcast-qa-sharegpt
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 15, 2020
    Authors
    ogb
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Used TheBloke/OpenHermes-2-Mistral-7B-GPTQ to convert chunks into QA pairs used for finetuning

  13. Calcium time series from OGB labeled V1 neurons in awake or anesthesized...

    • figshare.com
    bin
    Updated Jan 19, 2016
    Cite
    Pieter Goltstein (2016). Calcium time series from OGB labeled V1 neurons in awake or anesthesized mice. [Dataset]. http://doi.org/10.6084/m9.figshare.1287764.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    figshare
    Authors
    Pieter Goltstein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Calcium time series from OGB labeled V1 neurons in awake or anesthesized mice. Data published in: Pieter M. Goltstein, Jorrit S. Montijn, Cyriel M.A. Pennartz. (2015). Effects of isoflurane anesthesia on ensemble patterns of Ca2+ activity in mouse V1: Reduced direction selectivity independent of increased correlations in cellular activity. PLOS ONE.

  14. xn--biberciimento-ogb.com - Historical whois Lookup

    • whoisdatacenter.com
    csv
    Updated Feb 23, 2018
    Cite
    AllHeart Web Inc (2018). xn--biberciimento-ogb.com - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/xn--biberciimento-ogb.com/
    Explore at:
    Available download formats: csv
    Dataset updated
    Feb 23, 2018
    Dataset authored and provided by
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Jun 17, 2025
    Description

    Explore the historical Whois records related to xn--biberciimento-ogb.com (Domain). Get insights into ownership history and changes over time.

  15. pjf_llama_instruction_prep

    • huggingface.co
    Updated Mar 15, 2020
    Cite
    ogb (2020). pjf_llama_instruction_prep [Dataset]. https://huggingface.co/datasets/ogbrandt/pjf_llama_instruction_prep
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 15, 2020
    Authors
    ogb
    Description

    ogbrandt/pjf_llama_instruction_prep dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. Amelia Ad1 Llc Importer and Ogb Engineerding Bv Exporter Data to USA

    • seair.co.in
    Updated Feb 18, 2024
    Cite
    Seair Exim (2024). Amelia Ad1 Llc Importer and Ogb Engineerding Bv Exporter Data to USA [Dataset]. https://www.seair.co.in
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Feb 18, 2024
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    United States
    Description

    Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.

  17. xn--digitalebrn-ogb.com - Historical whois Lookup

    • whoisdatacenter.com
    csv
    Updated Feb 24, 2017
    Cite
    AllHeart Web Inc (2017). xn--digitalebrn-ogb.com - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/xn--digitalebrn-ogb.com/
    Explore at:
    Available download formats: csv
    Dataset updated
    Feb 24, 2017
    Dataset provided by
    AllHeart Web
    Authors
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Jul 11, 2025
    Description

    Explore the historical Whois records related to xn--digitalebrn-ogb.com (Domain). Get insights into ownership history and changes over time.

  18. nous-pjf

    • huggingface.co
    Updated Mar 15, 2020
    Cite
    ogb (2020). nous-pjf [Dataset]. https://huggingface.co/datasets/ogbrandt/nous-pjf
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 15, 2020
    Authors
    ogb
    Description

    ogbrandt/nous-pjf dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. Geom3D_data

    • huggingface.co
    Cite
    shengchao, Geom3D_data [Dataset]. https://huggingface.co/datasets/chao1224/Geom3D_data
    Explore at:
    Authors
    shengchao
    Description

    Specifications of Dataset Download in Geom3D

    We provide both the raw and processed data at this HuggingFace link.

      PCQM4Mv2
    

    mkdir -p pcqm4mv2/raw
    cd pcqm4mv2/raw
    wget http://ogb-data.stanford.edu/data/lsc/pcqm4m-v2-train.sdf.tar.gz
    tar -xf pcqm4m-v2-train.sdf.tar.gz

    wget http://ogb-data.stanford.edu/data/lsc/pcqm4m-v2.zip
    unzip pcqm4m-v2.zip
    mv pcqm4m-v2/raw/data.csv.gz .
    rm pcqm4m-v2.zip
    rm -rf pcqm4m-v2

      GEOM
    

    wget… See the full description on the dataset page: https://huggingface.co/datasets/chao1224/Geom3D_data.

  20. gpt4_preference_rlaif

    • huggingface.co
    Updated Feb 18, 2024
    Cite
    ogb (2024). gpt4_preference_rlaif [Dataset]. https://huggingface.co/datasets/ogbrandt/gpt4_preference_rlaif
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 18, 2024
    Authors
    ogb
    Description

    ogbrandt/gpt4_preference_rlaif dataset hosted on Hugging Face and contributed by the HF Datasets community


OGB-LSC Dataset

OGB Large-Scale Challenge

Explore at:
454 scholarly articles cite this dataset (View in Google Scholar)
