7 datasets found
  1. OC20 Dataset

    • paperswithcode.com
    Updated Jul 16, 2023
    Lowik Chanussot; Abhishek Das; Siddharth Goyal; Thibaut Lavril; Muhammed Shuaibi; Morgane Riviere; Kevin Tran; Javier Heras-Domingo; Caleb Ho; Weihua Hu; Aini Palizhati; Anuroop Sriram; Brandon Wood; Junwoong Yoon; Devi Parikh; C. Lawrence Zitnick; Zachary Ulissi (2023). OC20 Dataset [Dataset]. https://paperswithcode.com/dataset/oc20
    Authors
    Lowik Chanussot; Abhishek Das; Siddharth Goyal; Thibaut Lavril; Muhammed Shuaibi; Morgane Riviere; Kevin Tran; Javier Heras-Domingo; Caleb Ho; Weihua Hu; Aini Palizhati; Anuroop Sriram; Brandon Wood; Junwoong Yoon; Devi Parikh; C. Lawrence Zitnick; Zachary Ulissi
    Description

    Open Catalyst 2020 is a dataset for catalysis in chemical engineering. Focusing on molecules that are important in renewable energy applications, the OC20 data set comprises over 1.3 million relaxations of molecular adsorptions onto surfaces, the largest data set of electrocatalyst structures to date.

  2. oc20-s2ef

    • huggingface.co
    Updated Apr 9, 2025
    Nima Shoghi (2025). oc20-s2ef [Dataset]. https://huggingface.co/datasets/nimashoghi/oc20-s2ef
    Authors
    Nima Shoghi
    Description

    nimashoghi/oc20-s2ef dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. OC20_S2EF_train_200K

    • huggingface.co
    ColabFit, OC20_S2EF_train_200K [Dataset]. https://huggingface.co/datasets/colabfit/OC20_S2EF_train_200K
    Dataset authored and provided by
    ColabFit
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cite this dataset: Chanussot, L., Das, A., Goyal, S., Lavril, T., Shuaibi, M., Riviere, M., Tran, K., Heras-Domingo, J., Ho, C., Hu, W., Palizhati, A., Sriram, A., Wood, B., Yoon, J., Parikh, D., Zitnick, C. L., and Ulissi, Z. OC20 S2EF train 200K. ColabFit, 2024. https://doi.org/10.60732/6ccdeb1d

    View on the ColabFit Exchange: https://materials.colabfit.org/id/DS_zdy2xz6y88nl_0

    Dataset Name: OC20 S2EF train 200K

    Description: OC20_S2EF_train_200K is… See the full description on the dataset page: https://huggingface.co/datasets/colabfit/OC20_S2EF_train_200K.

  4. Supporting information for Neural Network Embeddings based Similarity Search...

    • kilthub.cmu.edu
    txt
    Updated Jun 3, 2022
    Yilin Yang; Mingjie Liu; John Kitchin (2022). Supporting information for Neural Network Embeddings based Similarity Search Method for Catalyst Systems [Dataset]. http://doi.org/10.1184/R1/19968323.v1
    Dataset provided by
    Carnegie Mellon University
    Authors
    Yilin Yang; Mingjie Liu; John Kitchin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository includes code to prepare the datasets, train the GemNet model, build and search the faiss index, and visualize the search results, all in the notebook faiss-gemnet-qm9-mp.ipynb. The notebook reproduces the examples in our manuscript for the QM9 and Materials Project datasets. The OC20 data are not included here because of their large size (> 50 GB); the code to process the OC20 dataset is almost the same as the code included in the notebook for the QM9 dataset.

    We include the intermediate data (GemNet checkpoints, lmdb, faiss index, and the search results) for the QM9 and Materials Project datasets in the directory example-data. We also put the GemNet checkpoint for the OC20 dataset in this directory. The training and evaluation of the Gaussian process regression model using the retrieved molecules for the query benzene are demonstrated in the ben-gp-data directory, in which qm9-gp-gemnet-morgan-random-nrg.ipynb can be run on Colab.
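    The build-then-search workflow described above can be illustrated with a minimal sketch. It is not the repository's code: it replaces the faiss index with a brute-force Euclidean search from the Python standard library, and the function names (build_index, search) and the three-dimensional toy vectors are hypothetical stand-ins for real GemNet embeddings.

```python
import math

def build_index(embeddings):
    """Store (id, vector) pairs; a flat index simply keeps every vector."""
    return list(embeddings.items())

def search(index, query, k=2):
    """Return the k ids whose vectors are closest to `query` (Euclidean distance)."""
    scored = sorted((math.dist(vec, query), key) for key, vec in index)
    return [key for _, key in scored[:k]]

# Toy 3-dimensional "embeddings": made-up numbers, not model outputs.
toy_embeddings = {
    "benzene": (0.9, 0.1, 0.0),
    "toluene": (0.8, 0.2, 0.1),
    "water": (0.0, 0.9, 0.3),
    "methane": (0.1, 0.1, 0.9),
}

index = build_index(toy_embeddings)
neighbors = search(index, toy_embeddings["benzene"], k=2)
print(neighbors)
```

    A dedicated index library becomes worthwhile when the number of embeddings is too large for an exhaustive scan like this one.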

  5. AdsMT: Multi-modal Transformer for Predicting Global Minimum Adsorption...

    • zenodo.org
    • figshare.com
    application/gzip
    Updated Jun 20, 2024
    Junwu Chen; Xu Huang; Cheng Hua; Yulian He; Philippe Schwaller (2024). AdsMT: Multi-modal Transformer for Predicting Global Minimum Adsorption Energy [Dataset]. http://doi.org/10.5281/zenodo.12104162
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Junwu Chen; Xu Huang; Cheng Hua; Yulian He; Philippe Schwaller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We built three Global Minimum Adsorption Energy (GMAE) benchmark datasets, named OCD-GMAE, Alloy-GMAE, and FG-GMAE, from the OC20-Dense, Catalysis Hub, and "functional groups" (FG) datasets through strict data cleaning. Each data point represents a unique combination of catalyst surface and adsorbate. These new benchmark datasets can benefit future ML studies on GMAE prediction.

    In addition, a similar data cleaning procedure was employed on the OC20 dataset to create a new dataset named OC20-LMAE, which comprises surface/adsorbate pairings along with their local minimum adsorption energies (LMAE). The OC20-LMAE dataset contains 363,937 data points and serves as an effective resource for model pretraining.
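    The "one data point per surface/adsorbate combination" idea can be sketched as follows. This is only an illustration of taking the lowest energy per pair; it is not the authors' cleaning procedure, which the description does not detail. The function name, the record layout, and all energy values are made up for the example.

```python
def global_minimum_energies(records):
    """Keep the lowest adsorption energy seen for each (surface, adsorbate) pair."""
    best = {}
    for surface, adsorbate, energy in records:
        key = (surface, adsorbate)
        if key not in best or energy < best[key]:
            best[key] = energy
    return best

# Hypothetical relaxed configurations: (surface, adsorbate, energy in eV).
records = [
    ("Cu(111)", "*CO", -0.42),
    ("Cu(111)", "*CO", -0.61),
    ("Pt(111)", "*OH", 0.35),
    ("Pt(111)", "*OH", 0.12),
    ("Cu(111)", "*OH", -0.05),
]

gmae = global_minimum_energies(records)
for pair, energy in sorted(gmae.items()):
    print(pair, energy)
```

    In this toy picture each input record plays the role of a local minimum, and the reduced table holds the global minimum per pair.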

  6. Open MatSci ML Toolkit - DGL Graphs for OpenCatalyst IS2RE Task (OC20)

    • data.niaid.nih.gov
    Updated Feb 11, 2023
    Miret, Santiago (2023). Open MatSci ML Toolkit - DGL Graphs for OpenCatalyst IS2RE Task (OC20) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7411132
    Dataset provided by
    Miret, Santiago
    Lee, Kin Long Kelvin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset accompanying the release of the Open MatSci ML Toolkit, open-source software for developing graph neural networks on the OpenCatalyst project using the Deep Graph Library (DGL).

    For more details about the Open MatSci ML Toolkit, check the associated open-source repository and paper.

    The compressed files are ~8 GB; the uncompressed data is ~80 GB.

  7. OC22-IS2RE-Validation-in-domain

    • huggingface.co
    ColabFit, OC22-IS2RE-Validation-in-domain [Dataset]. https://huggingface.co/datasets/colabfit/OC22-IS2RE-Validation-in-domain
    Dataset authored and provided by
    ColabFit
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset: OC22-IS2RE-Validation-in-domain

    Description: In-domain validation configurations for the initial structure to relaxed total energy (IS2RE) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces, meant to complement OC20, which did not contain oxide surfaces. Additional details are stored in dataset columns prepended with "dataset_".

    Dataset authors: Richard Tran, Janice Lan… See the full description on the dataset page: https://huggingface.co/datasets/colabfit/OC22-IS2RE-Validation-in-domain.
