92 datasets found
  1. qm9

    • tensorflow.org
    Updated Dec 11, 2024
    Cite
    (2024). qm9 [Dataset]. http://doi.org/10.6084/m9.figshare.c.978904.v5
    Explore at:
    Dataset updated
    Dec 11, 2024
    Description

    QM9 consists of computed geometric, energetic, electronic, and thermodynamic properties for 134k stable small organic molecules made up of C, H, O, N, and F. As usual, we remove the uncharacterized molecules and provide the remaining 130,831.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('qm9', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  2. Hessian QM9

    • colabfit.org
    • huggingface.co
    Updated Nov 22, 2025
    + more versions
    Cite
    Nicholas J. Williams; Lara Kabalan; Ljiljana Stojanovic; Viktor Zólyomi; Edward O. Pyzer-Knapp (2025). Hessian QM9 [Dataset]. https://colabfit.org/id/DS_gk9tv5a9498z_0
    Explore at:
    Dataset updated
    Nov 22, 2025
    Dataset provided by
    ColabFit
    Authors
    Nicholas J. Williams; Lara Kabalan; Ljiljana Stojanovic; Viktor Zólyomi; Edward O. Pyzer-Knapp
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Hessian QM9 is the first database of equilibrium configurations and numerical Hessian matrices, consisting of 41,645 molecules from the QM9 dataset at the wB97x/6-31G* level. Molecular Hessians were calculated in vacuum, as well as in water, tetrahydrofuran, and toluene using an implicit solvation model.

  3. QM9-Dataset

    • huggingface.co
    Updated Oct 1, 2024
    + more versions
    Cite
    Reza Hemmati (2024). QM9-Dataset [Dataset]. https://huggingface.co/datasets/HR-machine/QM9-Dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 1, 2024
    Authors
    Reza Hemmati
    Description

    HR-machine/QM9-Dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. qm9

    • huggingface.co
    Updated Dec 16, 2024
    Cite
    Yair Schiff (2024). qm9 [Dataset]. https://huggingface.co/datasets/yairschiff/qm9
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 16, 2024
    Authors
    Yair Schiff
    Description

    Dataset Card for "QM9"

    QM9 dataset from Ruddigkeit et al., 2012; Ramakrishnan et al., 2014. Original data downloaded from: http://quantum-machine.org/datasets. Additional annotations (QED, logP, SA score, NP score, bond and ring counts) added using rdkit library.

    Quick start usage:

    from datasets import load_dataset

    ds = load_dataset("yairschiff/qm9")

    Random train/test splits as recommended by:

    https://moleculenet.org/datasets-1

    test_size = 0.1 seed = 1… See the full description on the dataset page: https://huggingface.co/datasets/yairschiff/qm9.
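
    The description above mentions a seeded random train/test split with test_size = 0.1. A minimal, library-free sketch of that idea follows. The function name `random_split` is hypothetical, and the seed value is a placeholder because the seed is truncated in the description above.

```python
import random


def random_split(n_items, test_size=0.1, seed=0):
    """Return (train_idx, test_idx) index lists for a seeded random split.

    seed=0 is a placeholder, not the value used by the dataset author.
    """
    indices = list(range(n_items))
    # A private Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_items * test_size))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = random_split(1000, test_size=0.1, seed=0)
print(len(train_idx), len(test_idx))
```

    Calling the function again with the same seed gives the same split, which is what makes results comparable across runs.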

  5. Quantum Machine 9, aka QM9

    • kaggle.com
    zip
    Updated Jun 12, 2019
    Cite
    nosound (2019). Quantum Machine 9, aka QM9 [Dataset]. https://www.kaggle.com/zaharch/quantum-machine-9-aka-qm9
    Explore at:
    Available download formats: zip (282580282 bytes)
    Dataset updated
    Jun 12, 2019
    Authors
    nosound
    Description

    downloaded from: http://quantum-machine.org/datasets/

    Abstract

    Computational de novo design of new drugs and materials requires rigorous and unbiased exploration of chemical compound space. However, large uncharted territories persist due to its size scaling combinatorially with molecular size. We report computed geometric, energetic, electronic, and thermodynamic properties for 134k stable small organic molecules made up of CHONF. These molecules correspond to the subset of all 133,885 species with up to nine heavy atoms (CONF) out of the GDB-17 chemical universe of 166 billion organic molecules. We report geometries minimal in energy, corresponding harmonic frequencies, dipole moments, polarizabilities, along with energies, enthalpies, and free energies of atomization. All properties were calculated at the B3LYP/6-31G(2df,p) level of quantum chemistry. Furthermore, for the predominant stoichiometry, C7H10O2, there are 6,095 constitutional isomers among the 134k molecules. We report energies, enthalpies, and free energies of atomization at the more accurate G4MP2 level of theory for all of them. As such, this data set provides quantum chemical properties for a relevant, consistent, and comprehensive chemical space of small organic molecules. This database may serve the benchmarking of existing methods, development of new methods, such as hybrid quantum mechanics/machine learning, and systematic identification of structure-property relationships.

    Download: Available via figshare.

    How to cite: When using this dataset, please make sure to cite the following two papers:

    L. Ruddigkeit, R. van Deursen, L. C. Blum, J.-L. Reymond, Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17, J. Chem. Inf. Model. 52, 2864–2875, 2012.

    R. Ramakrishnan, P. O. Dral, M. Rupp, O. A. von Lilienfeld, Quantum chemistry structures and properties of 134 kilo molecules, Scientific Data 1, 140022, 2014.

  6. Revised QM9 dataset (revQM9)

    • zenodo.org
    bin
    Updated Feb 9, 2025
    Cite
    Danish Khan; Anatole von Lilienfeld (2025). Revised QM9 dataset (revQM9) [Dataset]. http://doi.org/10.5281/zenodo.10689884
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 9, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Danish Khan; Anatole von Lilienfeld
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Revised QM9 dataset with properties calculated using aPBE0 in the cc-pVTZ basis set.

    The atomic coordinates, atomic numbers, chemical symbols, total energies, atomization energies, MO energies, HOMOs, LUMOs, and dipole moment norms are in the arrays "coords", "charges", "elements", "energies", "atomization", "moenergies", "homo", "lumo", and "dipole", respectively.
    Density matrices will be uploaded soon.

    Usage example:

    import numpy as np
    # Load the .npz archive and unpack four of the arrays listed above
    data = np.load('revQM9.npz', allow_pickle=True)
    coords, q, elems, energies = data['coords'], data['charges'], data['elements'], data['energies']

  7. QM9S dataset

    • figshare.com
    txt
    Updated Dec 18, 2023
    Cite
    zihan zou (2023). QM9S dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24235333.v3
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 18, 2023
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    zihan zou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We constructed the QM9Spectra (QM9S) dataset using 130K organic molecules based on the popular QM9 dataset. We first re-optimized the molecular geometries using the Gaussian16 package (B.01 version) at the B3LYP/def-TZVP level of theory. Then the molecular properties, including scalars (energy, NPA charges, etc.), vectors (electric dipole, etc.), 2nd-order tensors (Hessian matrix, quadrupole moment, polarizability, etc.), and 3rd-order tensors (octupole moment, first hyperpolarizability, etc.), were calculated at the same level. Frequency analysis and time-dependent density functional theory (TD-DFT) calculations were carried out at the same level to obtain the infrared, Raman, and UV-Vis spectra. Two versions of the dataset, .pt (torch_geometric version) and .csv, are provided for training and use. In addition, we also provide broadened spectra. When using this dataset, please cite the original article's DOI, https://doi.org/10.1038/s43588-023-00550-y, instead of the DOI provided by figshare.

  8. QM9

    • huggingface.co
    Updated Jun 28, 2025
    Cite
    Gang Liu (2025). QM9 [Dataset]. https://huggingface.co/datasets/liuganghuggingface/QM9
    Explore at:
    Dataset updated
    Jun 28, 2025
    Authors
    Gang Liu
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    liuganghuggingface/QM9 dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. Revised QM9 dataset

    • figshare.com
    zip
    Updated Feb 22, 2024
    Cite
    Danish Khan (2024). Revised QM9 dataset [Dataset]. http://doi.org/10.6084/m9.figshare.25266574.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 22, 2024
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Danish Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Revised QM9 dataset with properties calculated using the aPBE0 functional and cc-pVTZ basis set.

    The atomic coordinates, atomic numbers, chemical symbols, total energies, atomization energies, MO energies, HOMOs, LUMOs, and dipole moments are in the arrays "coordinates", "charges", "elements", "energies", "atomization", "moenergies", "homos", "lumos", and "dipole", respectively.

    Usage example:

    import numpy as np
    data = np.load('revQM9.npz', allow_pickle=True)
    coords, q, elems, energies = data['coordinates'], data['charges'], data['elements'], data['energies']

  10. QM9 - Dataset - LDM

    • service.tib.eu
    • resodate.org
    Updated Nov 25, 2024
    Cite
    (2024). QM9 - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/qm9
    Explore at:
    Dataset updated
    Nov 25, 2024
    Description

    A dataset of small molecules for benchmarking molecule generation methods. The dataset consists of fingerprints of the molecules, and the goal is to predict the original molecule from the fingerprint.

  11. Results of Quantum Chemical and Machine Learning Computations for Molecules...

    • purl.stanford.edu
    Updated Aug 6, 2019
    Cite
    Sinitskiy, Anton V.; Pande, Vijay S. (2019). Results of Quantum Chemical and Machine Learning Computations for Molecules in the QM9 Database [Dataset]. https://purl.stanford.edu/kf921gd3855
    Explore at:
    Dataset updated
    Aug 6, 2019
    Authors
    Sinitskiy, Anton V.; Pande, Vijay S.
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Two types of approaches to modeling molecular systems have demonstrated high practical efficiency. Density functional theory (DFT), the most widely used quantum chemical method, is a physical approach predicting energies and electron densities of molecules. Recently, numerous papers on machine learning (ML) of molecular properties have also been published. ML models greatly outperform DFT in terms of computational costs, and may even reach comparable accuracy, but they are missing physicality - a direct link to Quantum Physics - which limits their applicability. Here, we propose an approach that combines the strong sides of DFT and ML, namely, physicality and low computational cost. We derive general equations for exact electron densities and energies that can naturally guide applications of ML in Quantum Chemistry. Based on these equations, we build a deep neural network that can compute electron densities and energies of a wide range of organic molecules not only much faster, but also closer to exact physical values than current versions of DFT. In particular, we reached a mean absolute error in energies of molecules with up to eight non-hydrogen atoms as low as 0.9 kcal/mol relative to CCSD(T) values, noticeably lower than those of DFT (approaching ~2 kcal/mol) and ML (~1.5 kcal/mol) methods. A simultaneous improvement in the accuracy of predictions of electron densities and energies suggests that the proposed approach describes the physics of molecules better than DFT functionals developed by "human learning" earlier. Thus, physics-based ML offers exciting opportunities for modeling, with high-theory-level quantum chemical accuracy, of much larger molecular systems than currently possible.

    Sinitskiy, A. V., & Pande, V. S. Deep Neural Network Computes Electron Densities and Energies of a Large Set of Organic Molecules Faster than Density Functional Theory (DFT). arXiv:1809.02723 (2018). Available at https://arxiv.org/abs/1809.02723

    Sinitskiy, A. V., & Pande, V. S. Physical machine learning outperforms "human learning" in Quantum Chemistry. arXiv:1908.00971 (2019). Available at https://arxiv.org/abs/1908.00971

  12. QM9

    • huggingface.co
    Updated Jul 1, 2023
    Cite
    Shaoning Li (2023). QM9 [Dataset]. https://huggingface.co/datasets/lisn519010/QM9
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 1, 2023
    Authors
    Shaoning Li
    Description

    Dataset Card for "QM9"

    More Information needed

  13. Accurate GW frontier orbital energies of 134 kilo molecules of the QM9...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Artem Fediai; Patrick Reiser; Jorge Enrique Olivares Peña; Pascal Friederich; Wolfgang Wenzel (2023). Accurate GW frontier orbital energies of 134 kilo molecules of the QM9 dataset. [Dataset]. http://doi.org/10.6084/m9.figshare.21610077.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Artem Fediai; Patrick Reiser; Jorge Enrique Olivares Peña; Pascal Friederich; Wolfgang Wenzel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A dataset of HOMO/LUMO energies of the QM9 dataset computed at GW level of theory.

  14. QM9 Dataset

    • resodate.org
    • service.tib.eu
    Updated Dec 3, 2024
    Cite
    Yogesh Verma; Markus Heinonen; Vikas Garg (2024). QM9 Dataset [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvcW05LWRhdGFzZXQ=
    Explore at:
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Yogesh Verma; Markus Heinonen; Vikas Garg
    Description

    The dataset is used for testing the proposed TopNets architecture on molecular property prediction tasks.

  15. QM9-extended-plus database

    • zenodo.org
    csv
    Updated Nov 29, 2023
    + more versions
    Cite
    Bartłomiej Fliszkiewicz; Marcin Sajdak (2023). QM9-extended-plus database [Dataset]. http://doi.org/10.5281/zenodo.10184793
    Explore at:
    Available download formats: csv
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Bartłomiej Fliszkiewicz; Marcin Sajdak
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    QM9-extended database was further extended with 1781 compounds containing chlorine atoms and 2020 compounds containing bromine atoms.

  16. QM9-XAS database of 56k QM9 small organic molecules labeled with TDDFT X-ray...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 14, 2023
    Cite
    Kotobi, Amir (2023). QM9-XAS database of 56k QM9 small organic molecules labeled with TDDFT X-ray absorption spectra [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8276901
    Explore at:
    Dataset updated
    Sep 14, 2023
    Dataset provided by
    Helmholtz-Zentrum Hereon
    Authors
    Kotobi, Amir
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database for training graph neural network (GNN) models in "Integrating Explainability into Graph Neural Network Models for the Prediction of X-ray Absorption Spectra", by Amir Kotobi, Kanishka Singh, Daniel Höche, Sadia Bari, Robert H. Meißner, and Annika Bande.

    Included:

    qm9_Cedge_xas_56k.npz: the TDDFT XAS spectra of 56k structures from the QM9 dataset, which were employed to label the graph dataset. The dataset contains two key/value entries: spec_stk, a 2D array containing energies and oscillator strengths of XAS spectra, and id, which holds the indices of the QM9 structures. This data was used to create the QM9-XAS graph dataset.

    qm9xas_orca_output.zip: the raw ORCA output of TDDFT calculations for the 56k QM9-XAS dataset consists of excitation energies, densities, molecular orbitals, and other relevant information. This unprocessed output serves as a source to derive ground truth data for explaining the predictions made by GNNs.

    qm9xas_spec_train_val.pt: processed graph train/validation dataset of 50k QM9 structures. It is used as input to GNN models for training and validation.

    qm9xas_spec_test.pt: processed graph test dataset of 6k QM9 structures. It is used to test the performance of trained GNN models.

    Notes on the datasets:

    The QM9-XAS dataset was created using the ORCA electronic structure package [Neese, F., WIREs Computational Molecular Science 2012, 2, 73–78] to calculate carbon K-edge XAS spectra with the time-dependent density functional theory (TDDFT) method [Petersilka, M.; Gossmann, U. J.; Gross, E. K. U., Phys. Rev. Lett. 1996, 76, 1212–1215].

    The molecular structures of QM9-XAS datasets were sourced from the QM9 database [R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. Von Lilienfeld, Sci. Data 1, 1 (2014)].

    Funding:

    This research was funded by HIDA Trainee Network program, HAICU, Helmholtz AI-4-XAS, DASHH and HEIBRiDS graduate schools. For theoretical calculations and model training, computational resources at DESY and JFZ were used.

  17. qm9

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Fu, Tianfan (2023). qm9 [Dataset]. http://doi.org/10.7910/DVN/8ZZZ6J
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Fu, Tianfan
    Description
  18. QM9_molecules

    • kaggle.com
    zip
    Updated Sep 9, 2024
    Cite
    Mario Vozza (2024). QM9_molecules [Dataset]. https://www.kaggle.com/datasets/mariovozza5/qm9-molecules/code
    Explore at:
    Available download formats: zip (135817411 bytes)
    Dataset updated
    Sep 9, 2024
    Authors
    Mario Vozza
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Notebook powered by https://daimoners.eu

    PREDICTING MOLECULAR PROPERTIES WITH MACHINE LEARNING

    Introduction and Objectives

    The computational de novo design of new drugs and materials requires a thorough and unbiased exploration of chemical compound space. However, this space remains largely unexplored due to its combinatorial scaling with molecular size. To address this challenge, a dataset of 134,000 stable small organic molecules composed of carbon (C), hydrogen (H), oxygen (O), nitrogen (N), and fluorine (F) has been meticulously computed. These molecules represent a subset of all 133,885 species with up to nine heavy atoms (C, O, N, F) from the GDB-17 chemical universe, which encompasses 166 billion organic molecules.

    For each molecule, computed geometric, energetic, electronic, and thermodynamic properties are provided, including:

    This dataset offers a relevant, consistent, and comprehensive exploration of chemical space for small organic molecules, providing a valuable resource for benchmarking existing methods, developing new methodologies (such as hybrid quantum mechanics/machine learning approaches), and systematically identifying structure-property relationships [1].

    [1] Ramakrishnan, Raghunathan, et al. "Quantum chemistry structures and properties of 134 kilo molecules." Scientific data 1.1 (2014): 1-7.

    In this notebook, we aim to leverage this dataset (QM9) to predict the molecular properties of these small organic molecules using the Coulomb matrix representation. Specifically, we will focus on using the eigenvalues of the Coulomb matrix, which serve as a crucial descriptor for capturing the electronic structure of molecules for predicting molecular properties.
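
    As a rough illustration of the representation named above, the sketch below builds a Coulomb matrix in pure Python using the commonly cited definition: diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|. The function name `coulomb_matrix` and the toy two-atom input are assumptions for illustration, not code from the notebook. The definition is normally stated in atomic units, so any unit conversion of the coordinates is left to the caller.

```python
import math


def coulomb_matrix(charges, coords):
    """Build the Coulomb matrix for nuclear charges and 3D coordinates.

    Diagonal:     0.5 * Z_i ** 2.4
    Off-diagonal: Z_i * Z_j / distance(R_i, R_j)
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Toy input: two hydrogen-like centres one length unit apart (illustrative only).
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
print(h2)
```

    The matrix is symmetric, so its eigenvalues can then be obtained with a symmetric eigenvalue routine such as `numpy.linalg.eigvalsh`. Sorting them gives a descriptor that does not depend on the order of the atoms.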

    By the end of this notebook, you will have:

    • Explored the dataset and visualized key molecular properties
    • Generated Coulomb matrices for the molecules in the dataset
    • Calculated the eigenvalues of the Coulomb matrices and predicted properties using machine learning models
    • Evaluated the performance of these models in accurately predicting molecular properties

    Let's begin by loading and exploring the dataset.

    Enjoy! ⚛

    Properties in the QM9 Dataset

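    | No. | Property | Unit | Description |
    |-----|----------|------|-------------|
    | 1 | tag | - | ‘gdb9’ string to facilitate extraction |
    | 2 | i | - | Consecutive, 1-based integer identifier |
    | 3 | A | GHz | Rotational constant |
    | 4 | B | GHz | Rotational constant |
    | 5 | C | GHz | Rotational constant |
    | 6 | μ | D | Dipole moment |
    | 7 | α | a₀³ | Isotropic polarizability |
    | 8 | εHOMO | Ha | Energy of HOMO |
    | 9 | εLUMO | Ha | Energy of LUMO |
    | 10 | εgap | Ha | Gap (εLUMO − εHOMO) |
    | 11 | ⟨R²⟩ | a₀² | Electronic spatial extent |
    | 12 | zpve | Ha | Zero point vibrational energy |
    | 13 | U0 | Ha | Internal energy at 0 K |
    | 14 | U | Ha | Internal energy at 298.15 K |
    | 15 | H | Ha | Enthalpy at 298.15 K |
    | 16 | G | Ha | Free energy at 298.15 K |
    | 17 | Cv | cal/mol·K | Heat capacity at 298.15 K |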
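
    Property 10 in the table is defined as εLUMO − εHOMO and, like the orbital energies, is given in Hartree. A small sketch of that arithmetic, with a conversion to electronvolts, is shown below. The function name `gap_ev` and the input values are illustrative assumptions, not data from the dataset.

```python
# 1 Hartree expressed in electronvolts (CODATA 2018 value).
HARTREE_TO_EV = 27.211386245988


def gap_ev(homo_ha, lumo_ha):
    """Return the HOMO-LUMO gap in eV from orbital energies in Hartree."""
    return (lumo_ha - homo_ha) * HARTREE_TO_EV


# Made-up orbital energies, for illustration only.
example_gap = gap_ev(homo_ha=-0.25, lumo_ha=0.05)
print(f"{example_gap:.3f} eV")
```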

    Dataset Structure

    For each molecule, atomic coordinates and calculated properties are stored in a file named dataset_index.xyz. The XYZ format [1] is a widespread plain-text format for encoding Cartesian coordinates of molecules, with no formal specification. It contains a header line specifying the number of atoms n_a, a comment line, and n_a lines containing element type and atomic coordinates, one atom per line. The comment line is used to store all scalar properties; Mulliken charges are added as a fifth column. Harmonic vibrational frequencies, SMILES, and InChI [2] are appended as respective additional lines.

    [1] https://open-babel.readthedocs.io/en/latest/FileFormats/XYZ_cartesian_coordinates_format.html

    [2] https://iupac.org/who-we-are/divisions/division-details/inchi/
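
    The layout described above can be read with a few lines of Python. The following is a minimal sketch, assuming whitespace-separated fields and the line order given in the text. The `QM9Record` class, the `parse_qm9_xyz` function, and the sample record are hypothetical, and the sample values are made up. The sketch also assumes that floats may be written with a `*^` exponent marker (for example `1.0*^-6`), which it rewrites before conversion.

```python
from dataclasses import dataclass


@dataclass
class QM9Record:
    tag: str
    index: int
    properties: list    # scalar properties from the comment line
    elements: list      # element symbols, one per atom
    coords: list        # (x, y, z) tuples, one per atom
    mulliken: list      # Mulliken charge per atom (fifth column)
    frequencies: list   # harmonic vibrational frequencies
    smiles: list        # tokens of the SMILES line
    inchi: list         # tokens of the InChI line


def _to_float(token):
    # Assumption: exponents may appear as "*^" instead of "e".
    return float(token.replace("*^", "e"))


def parse_qm9_xyz(text):
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])                      # header line: atom count
    header = lines[1].split()                    # comment line: tag, index, scalars
    tag, index = header[0], int(header[1])
    properties = [_to_float(t) for t in header[2:]]
    elements, coords, mulliken = [], [], []
    for line in lines[2:2 + n_atoms]:            # one atom per line
        symbol, x, y, z, charge = line.split()
        elements.append(symbol)
        coords.append((_to_float(x), _to_float(y), _to_float(z)))
        mulliken.append(_to_float(charge))
    frequencies = [_to_float(t) for t in lines[2 + n_atoms].split()]
    smiles = lines[3 + n_atoms].split()
    inchi = lines[4 + n_atoms].split()
    return QM9Record(tag, index, properties, elements, coords,
                     mulliken, frequencies, smiles, inchi)


# Made-up record in the described layout (values are illustrative only).
SAMPLE = """3
gdb9 3 157.7 157.7 157.7 1.85 6.31 -0.29 0.07 0.36 19.0 0.021 -76.4 -76.4 -76.4 -76.42 6.0
O 1.0*^-6 0.0 0.1 -0.6
H 0.0 0.76 -0.47 0.3
H 0.0 -0.76 -0.47 0.3
1671.4 3803.6 3907.7
O O
InChI=1S/H2O/h1H2 InChI=1S/H2O/h1H2
"""

record = parse_qm9_xyz(SAMPLE)
print(record.index, record.elements, len(record.properties))
```

    Keeping the scalar properties as a plain list mirrors the comment line. Mapping them to the names in the property table is a natural next step.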

    QM9 xyz format

    | Line | Content | |------|----------------------------------------------------------...

  19. QM9 data for graph2mat

    • data.dtu.dk
    txt
    Updated Aug 6, 2024
    + more versions
    Cite
    Arghya Bhowmik (2024). QM9 data for graph2mat [Dataset]. http://doi.org/10.11583/DTU.26195282.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Aug 6, 2024
    Dataset provided by
    Technical University of Denmark
    Authors
    Arghya Bhowmik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Creators

    • Pol Febrer (pol.febrer@icn2.cat, ORCID 0000-0003-0904-2234)
    • Peter Bjorn Jorgensen (peterbjorgensen@gmail.com, ORCID 0000-0003-4404-7276)
    • Arghya Bhowmik (arbh@dtu.dk, ORCID 0000-0003-3198-5116)

    Related publication

    The dataset is published as part of the paper: "GRAPH2MAT: UNIVERSAL GRAPH TO MATRIX CONVERSION FOR ELECTRON DENSITY PREDICTION" (https://doi.org/10.26434/chemrxiv-2024-j4g21)

    Short description

    This dataset contains the Hamiltonian, Overlap, Density and Energy Density matrices from SIESTA calculations of the QM9 dataset (https://doi.org/10.6084/m9.figshare.c.978904.v5)

    SIESTA 5.0.0 was used to compute the dataset.

    Contents

    The dataset has four directories:

    • basis: Contains the files specifying the basis used for each atom.
    • pseudos: Contains the pseudopotentials used for the calculation (obtained from http://www.pseudo-dojo.org/, type NC SR (ONCVPSP v0.5), PBE, standard accuracy)
    • runs: The results of running the SIESTA simulations. Contents are discussed next.
    • splits: The data splits used in the published paper. Each file "splits_X.json" contains the splits for training size X.

    The "runs" directory contains one directory for each run, named with the index of the run. Each directory contains:

    • RUN.fdf, geom.fdf: The input files used for the SIESTA calculation.
    • RUN.out: The log of the SIESTA run, which apar
    • siesta.TSDE: Contains the Density and Energy Density matrices.
    • siesta.TSHS: Contains the Hamiltonian and Overlap matrices.

    Each matrix can be read using the sisl python package (https://github.com/zerothi/sisl) like:

    import sisl
    
    matrix = sisl.get_sile("RUN.fdf").read_X()
    

    where X is hamiltonian, overlap, density_matrix or energy_density_matrix.

    To reproduce the results presented in the paper, follow the documentation of the graph2mat package (https://github.com/BIG-MAP/graph2mat).

    Cite this data

    https://doi.org/10.11583/DTU.c.7310005 © 2024 Technical University of Denmark

    License

    This dataset is published under the CC BY 4.0 license. This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator.

  20. qm9_conj_OT-w

    • data-staging.niaid.nih.gov
    • nde-dev.biothings.io
    Updated Feb 21, 2025
    Cite
    Yuan, Ruichen (2025). qm9_conj_OT-w [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14906023
    Explore at:
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    Tianjin University
    Authors
    Yuan, Ruichen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database comprises a curated collection of 21,085 conjugated molecules, filtered from the original QM9 dataset. For each molecule, calculations were performed using the LC-ωPBE/6-31G* method. To account for the system dependence of the range-separated parameter ω in the LC-ωPBE approach, the IP method was employed to fine-tune ω, ensuring an optimal value for each molecule. Based on these refined parameters, further calculations were performed to determine molecular properties, extracting key data such as the Hamiltonian matrix, overlap matrix, eigenvalues, and eigenvectors.

    The dataset is stored in a DB file format, with each entry containing the following information:

    i: Index of the molecule

    SMILES: SMILES representation of the molecule

    omega: Value of the range-separated parameter ω

    j2: Equation value constructed when tuning ω using the IP method

    coordinates: Three-dimensional coordinates of atoms in the molecule

    Z: Atomic numbers of the atoms in the molecule

    n_atoms: Number of atoms in the molecule

    hamiltonian: Hamiltonian matrix of the molecule

    eigenvalues: Eigenvalues of the molecule

    overlap: Overlap matrix of the molecular orbitals

    eigenvectors: Eigenvectors of the molecule

    Usage example:

    import sqlite3

    conn = sqlite3.connect('qm9_conj_OT-w.db')
    cursor = conn.cursor()

    cursor.execute("SELECT * FROM qm9_data WHERE i = 0")
    molecule_data = cursor.fetchone()

    smiles = molecule_data[1]
    omega = molecule_data[2]

    conn.close()
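
    The usage example above reads columns by position. A hedged variant that reads them by name is sketched below, using an in-memory database so it runs without the real file. The three-column schema is an assumption based on the field list above (the real table has more fields), the inserted row is made up, and `fetch_molecule` is a hypothetical helper.

```python
import sqlite3


def fetch_molecule(conn, index):
    """Return the row with the given index as a dict, or None if absent."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    # A bound parameter (?) avoids building SQL by string formatting.
    row = conn.execute(
        "SELECT * FROM qm9_data WHERE i = ?", (index,)
    ).fetchone()
    return dict(row) if row is not None else None


# Stand-in for 'qm9_conj_OT-w.db': a tiny table with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE qm9_data (i INTEGER PRIMARY KEY, SMILES TEXT, omega REAL)"
)
conn.execute("INSERT INTO qm9_data VALUES (?, ?, ?)", (0, "C=CC=C", 0.3))

record = fetch_molecule(conn, 0)
missing = fetch_molecule(conn, 99)
conn.close()
print(record)
```

    Reading by name keeps the code working if the column order in the file differs from the order of the field list.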
