13 datasets found
  1. QM9 Dataset

    • paperswithcode.com
    Updated Feb 2, 2021
    (2021). QM9 Dataset [Dataset]. https://paperswithcode.com/dataset/qm9
    Dataset updated
    Feb 2, 2021
    Description

    QM9 provides quantum chemical properties (at DFT level) for a relevant, consistent, and comprehensive chemical space of small organic molecules. This database may serve the benchmarking of existing methods, development of new methods, such as hybrid quantum mechanics/machine learning, and systematic identification of structure-property relationships.

  2. Quantum Machine 9, aka QM9

    • kaggle.com
    zip
    Updated Jun 12, 2019
    nosound (2019). Quantum Machine 9, aka QM9 [Dataset]. https://www.kaggle.com/zaharch/quantum-machine-9-aka-qm9
    Available download formats: zip (282580282 bytes)
    Dataset updated
    Jun 12, 2019
    Authors
    nosound
    Description

    downloaded from: http://quantum-machine.org/datasets/

    Abstract

    Computational de novo design of new drugs and materials requires rigorous and unbiased exploration of chemical compound space. However, large uncharted territories persist due to its size scaling combinatorially with molecular size. We report computed geometric, energetic, electronic, and thermodynamic properties for 134k stable small organic molecules made up of CHONF. These molecules correspond to the subset of all 133,885 species with up to nine heavy atoms (CONF) out of the GDB-17 chemical universe of 166 billion organic molecules. We report geometries minimal in energy, corresponding harmonic frequencies, dipole moments, polarizabilities, along with energies, enthalpies, and free energies of atomization. All properties were calculated at the B3LYP/6-31G(2df,p) level of quantum chemistry. Furthermore, for the predominant stoichiometry, C7H10O2, there are 6,095 constitutional isomers among the 134k molecules. We report energies, enthalpies, and free energies of atomization at the more accurate G4MP2 level of theory for all of them. As such, this data set provides quantum chemical properties for a relevant, consistent, and comprehensive chemical space of small organic molecules. This database may serve the benchmarking of existing methods, development of new methods, such as hybrid quantum mechanics/machine learning, and systematic identification of structure-property relationships.

    Download: Available via figshare.

    How to cite: When using this dataset, please make sure to cite the following two papers:

    L. Ruddigkeit, R. van Deursen, L. C. Blum, J.-L. Reymond, Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17, J. Chem. Inf. Model. 52, 2864–2875, 2012.

    R. Ramakrishnan, P. O. Dral, M. Rupp, O. A. von Lilienfeld, Quantum chemistry structures and properties of 134 kilo molecules, Scientific Data 1, 140022, 2014.

  3. Results of Quantum Chemical and Machine Learning Computations for Molecules...

    • purl.stanford.edu
    Updated Aug 6, 2019
    Sinitskiy, Anton V.; Pande, Vijay S. (2019). Results of Quantum Chemical and Machine Learning Computations for Molecules in the QM9 Database [Dataset]. https://purl.stanford.edu/kf921gd3855
    Dataset updated
    Aug 6, 2019
    Authors
    Sinitskiy, Anton V.; Pande, Vijay S.
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Two types of approaches to modeling molecular systems have demonstrated high practical efficiency. Density functional theory (DFT), the most widely used quantum chemical method, is a physical approach predicting energies and electron densities of molecules. Recently, numerous papers on machine learning (ML) of molecular properties have also been published. ML models greatly outperform DFT in terms of computational costs, and may even reach comparable accuracy, but they are missing physicality - a direct link to Quantum Physics - which limits their applicability. Here, we propose an approach that combines the strong sides of DFT and ML, namely, physicality and low computational cost. We derive general equations for exact electron densities and energies that can naturally guide applications of ML in Quantum Chemistry. Based on these equations, we build a deep neural network that can compute electron densities and energies of a wide range of organic molecules not only much faster, but also closer to exact physical values than current versions of DFT. In particular, we reached a mean absolute error in energies of molecules with up to eight non-hydrogen atoms as low as 0.9 kcal/mol relative to CCSD(T) values, noticeably lower than those of DFT (approaching ~2 kcal/mol) and ML (~1.5 kcal/mol) methods. A simultaneous improvement in the accuracy of predictions of electron densities and energies suggests that the proposed approach describes the physics of molecules better than DFT functionals developed by "human learning" earlier. Thus, physics-based ML offers exciting opportunities for modeling, with high-theory-level quantum chemical accuracy, of much larger molecular systems than currently possible.

  4. QM9 data for graph2mat

    • data.dtu.dk
    txt
    Updated Aug 6, 2024
    Arghya Bhowmik (2024). QM9 data for graph2mat [Dataset]. http://doi.org/10.11583/DTU.26195282.v1
    Available download formats: txt
    Dataset updated
    Aug 6, 2024
    Dataset provided by
    Technical University of Denmark
    Authors
    Arghya Bhowmik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Creators

    • Pol Febrer (pol.febrer@icn2.cat, ORCID 0000-0003-0904-2234)
    • Peter Bjorn Jorgensen (peterbjorgensen@gmail.com, ORCID 0000-0003-4404-7276)
    • Arghya Bhowmik (arbh@dtu.dk, ORCID 0000-0003-3198-5116)

    Related publication

    The dataset is published as part of the paper: "GRAPH2MAT: UNIVERSAL GRAPH TO MATRIX CONVERSION FOR ELECTRON DENSITY PREDICTION" (https://doi.org/10.26434/chemrxiv-2024-j4g21)

    Short description

    This dataset contains the Hamiltonian, Overlap, Density and Energy Density matrices from SIESTA calculations of the QM9 dataset (https://doi.org/10.6084/m9.figshare.c.978904.v5)

    SIESTA 5.0.0 was used to compute the dataset.

    Contents

    The dataset has four directories:

    • basis: Contains the files specifying the basis used for each atom.
    • pseudos: Contains the pseudopotentials used for the calculation (obtained from http://www.pseudo-dojo.org/, type NC SR (ONCVPSP v0.5), PBE, standard accuracy)
    • runs: The results of running the SIESTA simulations. Contents are discussed next.
    • splits: The data splits used in the published paper. Each file "splits_X.json" contains the splits for training size X.

    The "runs" directory contains one directory for each run, named with the index of the run. Each directory contains:

    • RUN.fdf, geom.fdf: The input files used for the SIESTA calculation.
    • RUN.out: The log of the SIESTA run, which apar
    • siesta.TSDE: Contains the Density and Energy Density matrices.
    • siesta.TSHS: Contains the Hamiltonian and Overlap matrices.

    Each matrix can be read using the sisl python package (https://github.com/zerothi/sisl) like:

    import sisl
    
    matrix = sisl.get_sile("RUN.fdf").read_X()
    

    where X is hamiltonian, overlap, density_matrix or energy_density_matrix.
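
    The substitution of X described above can be wrapped in a small helper. This is a minimal sketch rather than part of the dataset's tooling: the function names `reader_name` and `read_matrix` are hypothetical, and the path in the usage comment is only an illustration of the "runs" layout.

```python
# Matrix kinds listed in the dataset description as valid values for "X".
MATRIX_KINDS = ("hamiltonian", "overlap", "density_matrix", "energy_density_matrix")


def reader_name(kind):
    """Return the name of the sisl reader method for a matrix kind."""
    if kind not in MATRIX_KINDS:
        raise ValueError(f"unknown matrix kind {kind!r}; expected one of {MATRIX_KINDS}")
    return f"read_{kind}"


def read_matrix(fdf_path, kind):
    """Read one matrix of a SIESTA run through its RUN.fdf file (hypothetical helper)."""
    import sisl  # imported lazily so reader_name() works without sisl installed

    sile = sisl.get_sile(fdf_path)
    # Equivalent to sile.read_hamiltonian(), sile.read_overlap(), and so on.
    return getattr(sile, reader_name(kind))()


# Usage (illustrative path, assuming a run directory named "0"):
# hamiltonian = read_matrix("runs/0/RUN.fdf", "hamiltonian")
```

    Validating the kind before calling sisl keeps a typo from surfacing as an opaque attribute error.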

    To reproduce the results presented in the paper, follow the documentation of the graph2mat package (https://github.com/BIG-MAP/graph2mat).

    Cite this data

    https://doi.org/10.11583/DTU.c.7310005 © 2024 Technical University of Denmark

    License

    This dataset is published under the CC BY 4.0 license. This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator.

  5. Data from: QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules...

    • zenodo.org
    bin
    Updated Mar 28, 2024
    Haiyang Yu (2024). QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules [Dataset]. http://doi.org/10.5281/zenodo.8274793
    Available download formats: bin
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Haiyang Yu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the official QH9 dataset from the paper 'QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules'. QH9 is a new quantum Hamiltonian dataset providing precise Hamiltonian matrices for 130,831 stable molecular geometries, based on the QM9 dataset. This record holds the QH9Stable dataset, which is used in QH-Stable-iid and QH-Stable-ood.

  6. QM9 Charge Densities and Energies Calculated with VASP

    • data.dtu.dk
    bin
    Updated Jul 10, 2023
    Peter Bjørn Jørgensen; Arghya Bhowmik (2023). QM9 Charge Densities and Energies Calculated with VASP [Dataset]. http://doi.org/10.11583/DTU.16794500.v1
    Available download formats: bin
    Dataset updated
    Jul 10, 2023
    Dataset provided by
    Technical University of Denmark
    Authors
    Peter Bjørn Jørgensen; Arghya Bhowmik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    QM9 molecules calculated with VASP using Atomic Simulation Environment with the following parameters: Vasp(xc='PBE', istart=0, algo='Normal', icharg=2, nelm=180, ispin=1, nelmdl=6, isym=0, lcorr=True, potim=0.1, nelmin=5, kpts=[1,1,1], ismear=0, ediff=0.1E-05, sigma=0.1, nsw=0, ldiag=True, lreal='Auto', lwave=False, lcharg=True, encut=400)

    The resulting CHGCAR files have been compressed with lz4 compression and packed in non-compressed tar archives with up to 1000 structures in each.
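
    Reading such an archive might look like the sketch below. It assumes the members use the standard LZ4 frame format, which the description does not state, and relies on the third-party `lz4` package; the function name `iter_chgcar` and the archive name in the usage comment are hypothetical.

```python
import tarfile


def _lz4_decompress(data):
    # Third-party "lz4" package; LZ4 frame format is an assumption about the files.
    import lz4.frame

    return lz4.frame.decompress(data)


def iter_chgcar(tar_path, decompress=_lz4_decompress):
    """Yield (member_name, decompressed_bytes) for each regular file in the archive."""
    # Mode "r:" opens a plain tar without an outer compression layer.
    with tarfile.open(tar_path, mode="r:") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other special entries
            raw = tar.extractfile(member).read()
            yield member.name, decompress(raw)


# Usage (illustrative archive name):
# for name, chgcar_bytes in iter_chgcar("archive.tar"):
#     print(name, len(chgcar_bytes))
```

    Streaming one member at a time avoids unpacking up to 1000 charge-density files to disk at once.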

    The datasplits json files contain the indices (0-index) of the train, validation and test sets used in the paper "Graph neural networks for fast electron density estimation of molecules, liquids, and solids"
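
    Applying the 0-based indices could be sketched as follows. The layout of the JSON (an object mapping split names to index lists) and the key names are assumptions, since the description only says that train, validation and test indices are stored; `load_split` is a hypothetical helper.

```python
import json


def load_split(json_path, items, split="train"):
    """Return the entries of `items` selected by the 0-based indices stored under `split`."""
    with open(json_path) as fh:
        splits = json.load(fh)  # assumed shape: {"train": [...], "validation": [...], ...}
    return [items[i] for i in splits[split]]


# Usage (illustrative file name and key):
# test_structures = load_split("datasplits.json", all_structures, split="test")
```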

    The QM9 molecule structures were obtained from https://doi.org/10.6084/m9.figshare.c.978904.v5

  7. QM9_ADiT

    • huggingface.co
    Updated May 23, 2025
    Chaitanya K. Joshi (2025). QM9_ADiT [Dataset]. https://huggingface.co/datasets/chaitjo/QM9_ADiT
    Dataset updated
    May 23, 2025
    Authors
    Chaitanya K. Joshi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    All-atom Diffusion Transformers - QM9 dataset

    QM9 dataset from the paper "All-atom Diffusion Transformers: Unified generative modelling of molecules and materials", by Chaitanya K. Joshi, Xiang Fu, Yi-Lun Liao, Vahe Gharakhanyan, Benjamin Kurt Miller, Anuroop Sriram*, and Zachary W. Ulissi* from FAIR Chemistry at Meta (* Joint last author). Original data source: https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.datasets.QM9.html (Adapted from MoleculeNet)… See the full description on the dataset page: https://huggingface.co/datasets/chaitjo/QM9_ADiT.

  8. ANI-1E: An equilibrium database from the ANI-1 database

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 5, 2021
    Meuwly, Markus (2021). ANI-1E: An equilibrium database from the ANI-1 database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4680952
    Dataset updated
    Oct 5, 2021
    Dataset provided by
    Vazquez-Salazar, Luis Itza
    Meuwly, Markus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ANI-1E: An equilibrium database from the ANI-1 database v.2.0

    Authors: Luis Itza Vazquez-Salazar and Markus Meuwly E-mail contact: litzavazquezs@gmail.com and m.meuwly@unibas.ch

    From the SMILES strings provided by Smith et al., initial geometries were generated using OpenBabel. Subsequently, geometries were optimised using PM7 as implemented in MOPAC2016, before a final geometry optimisation and frequency calculation at the ωB97x/6-31G(d) level of theory were performed using Gaussian09. The final results were checked to ensure that they did not contain imaginary frequencies and therefore correspond to a minimum on the potential energy surface, which can be different from the global minimum for the molecule. The total number of molecules is 57455; 7 molecules were unstable for optimisation. The format of the files is .xyz, following the style of the QM9 database, and each file contains the geometry minimal in energy, rotational constants, dipole moments, polarizabilities, along with energies of HOMO and LUMO, electronic spatial extent, zero-point energy, enthalpies, and free energies of atomisation. The header of the .xyz file follows the format given in Table 3 of the QM9 paper with the difference that the TAG is 'ANI-1E'. Additionally, a file with the original SMILES of the ANI-1 dataset and the SMILES of ANI-1E is added. The seven molecules (56176,56177,56213,56214,56215,56216,56217) that do not converge are not included in the new database. The .xyz files of the final structures are available in the folder 'failed'. The output files for all optimizations are available upon reasonable request to the authors.
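
    A minimal reader for such QM9-style .xyz files is sketched below. It assumes the layout used by QM9 (atom count, then a header line starting with the tag, then one line per atom with element and coordinates) and handles the "*^" exponent notation that QM9 files use; whether ANI-1E files contain that notation is not stated. The name `parse_qm9_xyz` and the sample values are hypothetical.

```python
def _to_float(token):
    # QM9 files write some exponents as "1.5*^-6" instead of "1.5e-6".
    return float(token.replace("*^", "e"))


def parse_qm9_xyz(text):
    """Parse the atom count, header line and atom coordinates of a QM9-style xyz file."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    header = lines[1].split()
    atoms = []
    for line in lines[2:2 + n_atoms]:
        element, x, y, z = line.split()[:4]  # a trailing partial-charge column is ignored
        atoms.append((element, _to_float(x), _to_float(y), _to_float(z)))
    return {
        "tag": header[0],  # 'gdb' in QM9, 'ANI-1E' in this database
        "header_values": [_to_float(t) for t in header[1:]],  # index and scalar properties
        "atoms": atoms,
    }


# Usage with made-up numbers (not taken from the database):
sample = "3\nANI-1E 1 0.5 1.0*^-3\nO 0.0 0.0 0.1 -0.4\nH 0.0 0.7 -0.4 0.2\nH 0.0 -0.7 -0.4 0.2\n"
molecule = parse_qm9_xyz(sample)
```

    Lines after the atom block (frequencies, SMILES, InChI in QM9) are left unparsed here.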

    We acknowledge Alfred Andersson and Prof. David van der Spoel for drawing our attention to the problems in the first version of our database.

  9. Data from: Unified theory of atom-centered representations and...

    • staging-archive.materialscloud.org
    • archive.materialscloud.org
    Updated Mar 24, 2022
    Materials Cloud (2022). Unified theory of atom-centered representations and message-passing machine-learning schemes [Dataset]. http://doi.org/10.24435/materialscloud:3f-g3
    Dataset updated
    Mar 24, 2022
    Dataset provided by
    Materials Cloud
    Description

    Data-driven schemes that associate molecular and crystal structures with their microscopic properties share the need for a concise, effective description of the arrangement of their atomic constituents. Many types of models rely on descriptions of atom-centered environments, that are associated with an atomic property or with an atomic contribution to an extensive macroscopic quantity. Frameworks in this class can be understood in terms of atom-centered density correlations (ACDC), that are used as a basis for a body-ordered, symmetry-adapted expansion of the targets. Several other schemes, that gather information on the relationship between neighboring atoms using "message-passing" ideas, cannot be directly mapped to correlations centered around a single atom. We generalize the ACDC framework to include multi-centered information, generating representations that provide a complete linear basis to regress symmetric functions of atomic coordinates, and provides a coherent foundation to systematize our understanding of both atom-centered and message-passing, invariant and equivariant machine-learning schemes.

    This record contains the data and code required to reproduce the results from the corresponding paper, computing message-passing inspired machine learning features built on top of density correlation. The data used in this article is a subset of other existing datasets, which can be found online:

  10. Hydrolysis Datasets from HEPOM paper.

    • figshare.com
    txt
    Updated Nov 19, 2024
    Santiago Vargas; Rishabh Guha (2024). Hydrolysis Datasets from HEPOM paper. [Dataset]. http://doi.org/10.6084/m9.figshare.27851130.v1
    Available download formats: txt
    Dataset updated
    Nov 19, 2024
    Dataset provided by
    figshare
    Authors
    Santiago Vargas; Rishabh Guha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets from the HEPOM paper. Includes train/test sets from the combined, neutral (qm9+alchemy), protonated, and hydroxylated datasets.

  11. Dataset: Real-time interpretation of neutron vibrational spectra with...

    • zenodo.org
    application/gzip
    Updated Feb 4, 2025
    Bowen Han; Yongqiang Cheng (2025). Dataset: Real-time interpretation of neutron vibrational spectra with symmetry-equivariant Hessian matrix prediction [Dataset]. http://doi.org/10.5281/zenodo.14796533
    Available download formats: application/gzip
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bowen Han; Yongqiang Cheng
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains the optimized structures and the corresponding molecular Hessian matrices of selected molecules in the QM8, QM9 and PubChem databases. The data is used to train and validate a machine learning model described in the paper titled "Real-time interpretation of neutron vibrational spectra with symmetry-equivariant Hessian matrix prediction". In the tarball file, the three txt files contain the indexes of molecules in the corresponding datasets. The three folders (for QM8, QM9 and PubChem, respectively) contain the structure (xyz) files and the molecular Hessian (npy) files, which can be loaded with the numpy module in Python.

    The shape of each Hessian matrix, H, is (3n, 3n), where n is the number of atoms in the molecule. The multiplier 3 means that each atom has 3 degrees of freedom: x, y and z. The order of the indexes along each row and column is 1x, 1y, 1z, 2x, 2y, 2z, 3x, 3y, 3z, ..., up to nx, ny, nz. The order from 1 to n corresponds to the order of the atoms in the structure file. The training data can be generated from the data provided here with the script provided with the paper (https://github.com/maplewen4/INS_molecule).
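
    The index ordering described above can be made concrete with a short sketch. The helpers `dof_index` and `atom_block` are hypothetical, and atoms are counted from 0 here, so atom 0 corresponds to "1x, 1y, 1z" in the description. The code uses plain indexing, so it works on nested lists as well as on an array returned by numpy.load.

```python
AXES = {"x": 0, "y": 1, "z": 2}


def dof_index(atom, axis):
    """Row/column index of one degree of freedom; `atom` is 0-based."""
    return 3 * atom + AXES[axis]


def atom_block(hessian, i, j):
    """Return the 3x3 sub-block coupling atoms i and j (0-based) as nested lists."""
    return [[hessian[3 * i + a][3 * j + b] for b in range(3)] for a in range(3)]


# Usage with a made-up 6x6 matrix for a two-atom molecule (n = 2):
H = [[10 * row + col for col in range(6)] for row in range(6)]
block_21 = atom_block(H, 1, 0)        # coupling between the second and first atom
element = H[dof_index(1, "y")][dof_index(0, "x")]  # d2E / d(2y) d(1x)
```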

  12. AIMEl-DB: Atomic Properties for 44K small organic molecules

    • data.niaid.nih.gov
    Updated Aug 29, 2024
    Carpio-Martínez, Pablo (2024). AIMEl-DB: Atomic Properties for 44K small organic molecules [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10610993
    Dataset updated
    Aug 29, 2024
    Dataset provided by
    Ramírez-Palma, David I.
    Cortés-Guzmán, Fernando
    Martinez-Mayorga, Karina
    Carpio-Martínez, Pablo
    Meza-González, Brandon
    Vázquez-Cuevas, David
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AIMEl-DB: Atomic Properties for 44K small organic molecules

    This dataset comprises atomic properties of 44K (44 470) molecules selected from the QM9 database. The file names are based on the same indexing system used for QM9.

    This dataset includes four types of files:

    .com Files: Input files for Gaussian 16. Single-point energy calculations were carried out using the keywords: # B3LYP/6-31G(2df,p) scf=(maxcycle=9999) nosymm output=wfx

    .log Files: Output files from the Gaussian 16 calculations with the aforementioned parameters.

    .wfx Files: Wave function files from the Gaussian 16 calculations. These files were used as inputs for QTAIM calculations.

    .sumviz Files: Output files from the AIMAll software. The keywords used for the calculations were: aimqb -nogui -scp=false -nproc=8 -naat=4 input.wfx

    Each .sumviz file contains more than 30 properties based on the Quantum Theory of Atoms in Molecules (QTAIM).

    .csv Files: These files contain the results of an in-house treatment of the .sumviz data. They contain two calculated atomic properties:

    1. Total magnitude of the dipole moment, |mu|
    2. Total magnitude of the quadrupole moment, |Q|

    and two extracted atomic properties:

    3. Electronic Population, N
    4. Atomic Energy, E

    The aimel_merged_44k.csv presents the concatenation of the 44 470 csv Files.

    Additionally, the aimel_merged_38k.csv presents the concatenation of the 38 876 csv Files. This file corresponds to version 1.0 of the dataset.

    If you find this dataset useful, please cite the original paper:

    Meza-González, B., Ramírez-Palma, D.I., Carpio-Martínez, P. et al. Quantum Topological Atomic Properties of 44K molecules. Sci Data 11, 945 (2024). https://doi.org/10.1038/s41597-024-03723-0

  13. ViBench: vibrational spectrum-to-structure benchmark

    • figshare.com
    bin
    Updated May 7, 2025
    xinyu lu (2025). ViBench: vibrational spectrum-to-structure benchmark [Dataset]. http://doi.org/10.6084/m9.figshare.28579832.v2
    Available download formats: bin
    Dataset updated
    May 7, 2025
    Dataset provided by
    figshare
    Authors
    xinyu lu
    License

    Apache License 2.0: https://www.apache.org/licenses/LICENSE-2.0.html

    Description

    Introduction: This is an official dataset used to develop Vib2Mol. We have established a vibrational spectrum-to-structure benchmark (ViBench, VB), which consists of eight parts: VB-qm9, VB-zinc15, VB-mols, VB-geometry, VB-PAHs, VB-RXN, VB-peptide, and VB-peptide-mod. Details are listed in our paper.

    Density functional theory (DFT) was employed to perform conformational optimization of these molecules and to calculate the corresponding infrared and Raman spectra. All quantum chemical calculations were carried out using the Gaussian 16 program. The geometries were optimized using the B3LYP-D3BJ functional with a 6-311+G** basis set. Frequency calculations were performed at the same level at the optimized geometry.

    Furthermore, to test the model's generalization on experimental spectra, we collected experimentally measured infrared spectra from the public NIST dataset.

    To facilitate calculations, the spectral dimensions were unified to 1024, and molecular structures were all represented using SMILES.

    Funding: This work was supported by the National Natural Science Foundation (Grant No: 22227802, 22021001, 22474117, 22272139) of China, the Fundamental Research Funds for the Central Universities (20720220009), and the Shanghai Innovation Institute.
