71 datasets found

QM9 Dataset
paperswithcode.com
Updated Nov 25, 2021
Cite
(2021). QM9 Dataset [Dataset]. https://paperswithcode.com/dataset/qm9
Dataset updated
Nov 25, 2021
Description
QM9 provides quantum chemical properties (at DFT level) for a relevant, consistent, and comprehensive chemical space of small organic molecules. This database may serve the benchmarking of existing methods, development of new methods, such as hybrid quantum mechanics/machine learning, and systematic identification of structure-property relationships.
qm9
tensorflow.org
Updated Dec 11, 2024
Cite
(2024). qm9 [Dataset]. http://doi.org/10.6084/m9.figshare.c.978904.v5
Unique identifier
https://doi.org/10.6084/m9.figshare.c.978904.v5
Dataset updated
Dec 11, 2024
Description
QM9 consists of computed geometric, energetic, electronic, and thermodynamic properties for 134k stable small organic molecules made up of C, H, O, N, and F. As usual, we remove the uncharacterized molecules and provide the remaining 130,831.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('qm9', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
Quantum Machine 9, aka QM9
kaggle.com
zip
Updated Jun 12, 2019
Cite
nosound (2019). Quantum Machine 9, aka QM9 [Dataset]. https://www.kaggle.com/zaharch/quantum-machine-9-aka-qm9
Available download formats: zip (282580282 bytes)
Dataset updated
Jun 12, 2019
Authors
nosound
Description
Downloaded from: http://quantum-machine.org/datasets/

Abstract

Computational de novo design of new drugs and materials requires rigorous and unbiased exploration of chemical compound space. However, large uncharted territories persist due to its size scaling combinatorially with molecular size. We report computed geometric, energetic, electronic, and thermodynamic properties for 134k stable small organic molecules made up of CHONF. These molecules correspond to the subset of all 133,885 species with up to nine heavy atoms (CONF) out of the GDB-17 chemical universe of 166 billion organic molecules. We report geometries minimal in energy, corresponding harmonic frequencies, dipole moments, polarizabilities, along with energies, enthalpies, and free energies of atomization. All properties were calculated at the B3LYP/6-31G(2df,p) level of quantum chemistry. Furthermore, for the predominant stoichiometry, C7H10O2, there are 6,095 constitutional isomers among the 134k molecules. We report energies, enthalpies, and free energies of atomization at the more accurate G4MP2 level of theory for all of them. As such, this data set provides quantum chemical properties for a relevant, consistent, and comprehensive chemical space of small organic molecules. This database may serve the benchmarking of existing methods, development of new methods, such as hybrid quantum mechanics/machine learning, and systematic identification of structure-property relationships.

Download: available via figshare.

How to cite: When using this dataset, please make sure to cite the following two papers:

L. Ruddigkeit, R. van Deursen, L. C. Blum, J.-L. Reymond, Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17, J. Chem. Inf. Model. 52, 2864–2875, 2012.

R. Ramakrishnan, P. O. Dral, M. Rupp, O. A. von Lilienfeld, Quantum chemistry structures and properties of 134 kilo molecules, Scientific Data 1, 140022, 2014.
QM9-Dataset
huggingface.co
Updated Oct 1, 2024
Cite
Reza Hemmati (2024). QM9-Dataset [Dataset]. https://huggingface.co/datasets/HR-machine/QM9-Dataset
Dataset updated
Oct 1, 2024
Authors
Reza Hemmati
Description
HR-machine/QM9-Dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Revised QM9 dataset (revQM9)
zenodo.org
bin
Updated Feb 9, 2025
Cite
Danish Khan; Anatole von Lilienfeld (2025). Revised QM9 dataset (revQM9) [Dataset]. http://doi.org/10.5281/zenodo.10689884
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.10689884
Dataset updated
Feb 9, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Danish Khan; Anatole von Lilienfeld
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Revised QM9 dataset with properties calculated using aPBE0 in the cc-pVTZ basis set.

The atomic coordinates, atomic numbers, chemical symbols, total energies, atomization energies, MO energies, HOMO energies, LUMO energies, and dipole moment norms are stored in the arrays "coords", "charges", "elements", "energies", "atomization", "moenergies", "homo", "lumo", and "dipole", respectively.
Density matrices will be uploaded soon.

Usage example:

import numpy as np

data = np.load('revQM9.npz', allow_pickle=True)
coords, q, elems, energies = data['coords'], data['charges'], data['elements'], data['energies']
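
As a small illustration of working with the array names described above, the sketch below builds a tiny synthetic archive in memory and derives a HOMO-LUMO gap and per-molecule atom counts. The values are made up, the units are not stated in the description, and the assumption that "elements" is a ragged object array (one list of symbols per molecule) is only a guess at the real layout.

```python
import io
import numpy as np

# Synthetic stand-in for revQM9.npz: two made-up molecules, NOT real data.
# Assumption: per-molecule arrays of varying length are stored as object arrays.
buffer = io.BytesIO()
np.savez(
    buffer,
    elements=np.array([["O", "H", "H"], ["C", "H", "H", "H", "H"]], dtype=object),
    homo=np.array([-0.30, -0.40]),  # hypothetical values, units not specified
    lumo=np.array([0.05, 0.10]),    # hypothetical values, units not specified
)
buffer.seek(0)

# Object arrays are pickled inside the archive, hence allow_pickle=True.
data = np.load(buffer, allow_pickle=True)

# HOMO-LUMO gap per molecule, in whatever units the archive uses.
gap = data["lumo"] - data["homo"]

# Number of atoms per molecule, from the length of each symbol list.
n_atoms = np.array([len(symbols) for symbols in data["elements"]])

print(gap, n_atoms)
```

With the real file, replacing `buffer` by the path to `revQM9.npz` should be the only change needed, provided the stored arrays match these assumptions.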
Accurate GW frontier orbital energies of 134 kilo molecules of the QM9...
figshare.com
txt
Updated May 30, 2023
Cite
Artem Fediai; Patrick Reiser; Jorge Enrique Olivares Peña; Pascal Friederich; Wolfgang Wenzel (2023). Accurate GW frontier orbital energies of 134 kilo molecules of the QM9 dataset. [Dataset]. http://doi.org/10.6084/m9.figshare.21610077.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.21610077.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Artem Fediai; Patrick Reiser; Jorge Enrique Olivares Peña; Pascal Friederich; Wolfgang Wenzel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A dataset of HOMO/LUMO energies of the QM9 dataset computed at GW level of theory.
Results of Quantum Chemical and Machine Learning Computations for Molecules...
purl.stanford.edu
Updated Aug 6, 2019
Cite
Sinitskiy, Anton V.; Pande, Vijay S. (2019). Results of Quantum Chemical and Machine Learning Computations for Molecules in the QM9 Database [Dataset]. https://purl.stanford.edu/kf921gd3855
Dataset updated
Aug 6, 2019
Authors
Sinitskiy, Anton V.; Pande, Vijay S.
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
Two types of approaches to modeling molecular systems have demonstrated high practical efficiency. Density functional theory (DFT), the most widely used quantum chemical method, is a physical approach predicting energies and electron densities of molecules. Recently, numerous papers on machine learning (ML) of molecular properties have also been published. ML models greatly outperform DFT in terms of computational costs, and may even reach comparable accuracy, but they are missing physicality - a direct link to Quantum Physics - which limits their applicability. Here, we propose an approach that combines the strong sides of DFT and ML, namely, physicality and low computational cost. We derive general equations for exact electron densities and energies that can naturally guide applications of ML in Quantum Chemistry. Based on these equations, we build a deep neural network that can compute electron densities and energies of a wide range of organic molecules not only much faster, but also closer to exact physical values than current versions of DFT. In particular, we reached a mean absolute error in energies of molecules with up to eight non-hydrogen atoms as low as 0.9 kcal/mol relative to CCSD(T) values, noticeably lower than those of DFT (approaching ~2 kcal/mol) and ML (~1.5 kcal/mol) methods. A simultaneous improvement in the accuracy of predictions of electron densities and energies suggests that the proposed approach describes the physics of molecules better than DFT functionals developed by "human learning" earlier. Thus, physics-based ML offers exciting opportunities for modeling, with high-theory-level quantum chemical accuracy, of much larger molecular systems than currently possible.
Taxnonomic classifications for all structure in the QM9 dataset
zenodo.org
application/gzip, png
Updated Jul 16, 2024
Cite
Evan Komp (2024). Taxnonomic classifications for all structure in the QM9 dataset [Dataset]. http://doi.org/10.5281/zenodo.6498857
Available download formats: application/gzip, png
Unique identifier
https://doi.org/10.5281/zenodo.6498857
Dataset updated
Jul 16, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Evan Komp
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The classification of molecules according to ClassyFire [1] for the QM9 dataset [2].

The QM9 dataset is a set of nearly 140k organic molecules with no more than 9 C, N, O, and F atoms optimized to a stable structure with DFT.

ClassyFire is a tool and taxonomic library for the labeling of molecules.

1. Djoumbou Feunang, Y. et al. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J. Cheminform. 8, 1–20 (2016).

2. Ramakrishnan, R., Dral, P. O., Rupp, M. & Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 1–7 (2014).

The data directory ('QM9_jsons_classified.tar.gz') contains a `json` file for each structure in the QM9 dataset. The name of the file is the same identifier as from QM9. Data fields include:

- `cf_alternative_parents` : classifications describing the compound that do not fall in the given ancestry

- `cf_ancestors` : classes along the taxonomic branch for the structure

- `cf_class` : ClassyFire given class

- `cf_superclass` : ClassyFire given super class

- `cf_subclass` : ClassyFire given subclass

- `cf_direct_parent` : Class one level above this structure on the taxonomic branch

- `cf_description` : Exposition on the given class

- `cf_identifier` : identifier for the structure in the ClassyFire database

- `cf_intermediate_nodes` : classes connecting branches on taxonomic tree

- `cf_kingdom` : ClassyFire given kingdom

- `cf_molecular_framework` : describes aromaticity and number of cycles

- `cf_predicted_chebi_terms` : terms describing the molecule in the ChEBI framework

- `cf_predicted_lipidmaps_terms` : terms describing the molecule in LIPID MAPS framework

- `cf_smiles` : smiles string given by ClassyFire

- `cf_substituents` : substituent groups in the structure

Many fields contain subfields, seen in the example below for molecule with QM9 id 000123:

{"cf_alternative_parents":[{"name":"Dialkylamines","description":"Organic compounds containing a dialkylamine group, characterized by two alkyl groups bonded to the amino nitrogen.","chemont_id":"CHEMONTID:0002228","url":"http:\/\/classyfire.wishartlab.com\/tax_nodes\/C0002228"},{"name":"Organopnictogen compounds","description":"Compounds containing a bond between carbon a pnictogen atom. Pnictogens are p-block element atoms that are in the group 15 of the periodic table.","chemont_id":"CHEMONTID:0004557","url":"http:\/\/classyfire.wishartlab.com\/tax_nodes\/C0004557"},{"name":"Hydrocarbon derivatives","description":"Derivatives of hydrocarbons obtained by substituting one or more carbon atoms by an heteroatom. They contain at least one carbon atom and heteroatom.","chemont_id":"CHEMONTID:0004150","url":"http:\/\/classyfire.wishartlab.com\/tax_nodes\/C0004150"}],"cf_ancestors":["Alpha-aminonitriles","Amines","Chemical entities","Dialkylamines","Hydrocarbon derivatives","Nitriles","Organic compounds","Organic cyanides","Organic nitrogen compounds","Organonitrogen compounds","Organopnictogen compounds","Secondary amines"],"cf_class":"Organonitrogen compounds","cf_classification_version":"2.1","cf_description":"This compound belongs to the class of organic compounds known as alpha-aminonitriles. These are organonitrogen compounds that contain an amino group located on the carbon at the position alpha to a carbonitrile group. They have the general formula RC(NH2)C#N, where the amine group can be substituted.","cf_direct_parent":{"name":"Alpha-aminonitriles","description":"Organonitrogen compounds that contain an amino group located on the carbon at the position alpha to a carbonitrile group. 
They have the general formula RC(NH2)C#N, where the amine group can be substituted.","chemont_id":"CHEMONTID:0004453","url":"http:\/\/classyfire.wishartlab.com\/tax_nodes\/C0004453"},"cf_external_descriptors":[],"cf_identifier":"Q5198051-1","cf_inchikey":"InChIKey=PVVRRUUMHFWFQV-UHFFFAOYSA-N","cf_intermediate_nodes":[{"name":"Nitriles","description":"Compounds having the structure RC#N; thus C-substituted derivatives of hydrocyanic acid, HC#N.","chemont_id":"CHEMONTID:0000362","url":"http:\/\/classyfire.wishartlab.com\/tax_nodes\/C0000362"}],"cf_kingdom":"Organic compounds","cf_molecular_framework":"Aliphatic acyclic compounds","cf_predicted_chebi_terms":["chemical entity (CHEBI:24431)","organic molecular entity (CHEBI:50860)","organonitrogen compound (CHEBI:35352)","secondary amino compound (CHEBI:50995)","nitrile (CHEBI:18379)","amine (CHEBI:32952)","secondary amine (CHEBI:32863)","cyanides (CHEBI:23424)","organic molecule (CHEBI:72695)","pnictogen molecular entity (CHEBI:33302)","nitrogen molecular entity (CHEBI:51143)"],"cf_predicted_lipidmaps_terms":[],"cf_smiles":"CNCC#N","cf_subclass":"Organic cyanides","cf_substituents":["Alpha-aminonitrile","Secondary amine","Secondary aliphatic amine","Organopnictogen compound","Hydrocarbon derivative","Amine","Aliphatic acyclic compound"],"cf_superclass":"Organic nitrogen compounds"}

A visualization, "qm9_pie_labeled.png", shows the breakdown of superclasses within QM9 down to the subclass level.
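
To make the field list concrete, the following sketch parses a trimmed copy of the example record for molecule 000123 and walks its taxonomy from kingdom down to direct parent. The helper name `taxonomy_path` is hypothetical, and the record is embedded as a string so the example runs without the archive; with the real data one would read each `json` file from the extracted directory instead.

```python
import json

# Trimmed excerpt of the example record shown above (QM9 id 000123).
record_text = """
{
  "cf_kingdom": "Organic compounds",
  "cf_superclass": "Organic nitrogen compounds",
  "cf_class": "Organonitrogen compounds",
  "cf_subclass": "Organic cyanides",
  "cf_direct_parent": {"name": "Alpha-aminonitriles"},
  "cf_smiles": "CNCC#N"
}
"""
record = json.loads(record_text)


def taxonomy_path(rec):
    """Return the taxonomy levels from kingdom to direct parent, skipping missing ones."""
    levels = ["cf_kingdom", "cf_superclass", "cf_class", "cf_subclass"]
    path = [rec[key] for key in levels if rec.get(key)]
    # cf_direct_parent is a nested object; only its name is used here.
    parent = rec.get("cf_direct_parent") or {}
    if parent.get("name"):
        path.append(parent["name"])
    return path


path = taxonomy_path(record)
print(record["cf_smiles"], "->", " > ".join(path))
```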
QM9S dataset
figshare.com
txt
Updated Dec 18, 2023
Cite
zihan zou (2023). QM9S dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24235333.v3
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.24235333.v3
Dataset updated
Dec 18, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
zihan zou
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We constructed the QM9Spectra (QM9S) dataset using 130K organic molecules based on the popular QM9 dataset. We first re-optimized the molecular geometries using the Gaussian16 package (B.01 version) at the B3LYP/def-TZVP level of theory. The molecular properties, including scalars (energy, NPA charges, etc.), vectors (electric dipole, etc.), 2nd-order tensors (Hessian matrix, quadrupole moment, polarizability, etc.), and 3rd-order tensors (octupole moment, first hyperpolarizability, etc.), were then calculated at the same level. Frequency analysis and time-dependent density functional theory (TD-DFT) calculations were carried out at the same level to obtain the infrared, Raman, and UV-Vis spectra. Two versions of the dataset, .pt (torch_geometric version) and .csv, are provided for training and use. In addition, we also provide broadened spectra. When using this dataset, please cite the original article's DOI, https://doi.org/10.1038/s43588-023-00550-y, instead of the DOI provided by figshare.
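
The description mentions broadened spectra without saying how they were produced. Purely as an illustration of the general idea, the sketch below turns a stick spectrum into a continuous one by summing Gaussians; the peak positions, intensities, grid, and width are all made-up values, and the function name `gaussian_broaden` is hypothetical rather than anything shipped with QM9S.

```python
import numpy as np


def gaussian_broaden(positions, intensities, grid, sigma):
    """Sum one Gaussian per stick, centred at its position and scaled by its intensity."""
    offsets = grid[:, None] - positions[None, :]
    lines = intensities[None, :] * np.exp(-0.5 * (offsets / sigma) ** 2)
    return lines.sum(axis=1)


# Hypothetical IR-like stick spectrum: wavenumbers (1/cm) and arbitrary intensities.
positions = np.array([1000.0, 1700.0, 3000.0])
intensities = np.array([20.0, 100.0, 40.0])

# Evaluate the broadened spectrum on a 1 (1/cm) grid with an arbitrary width.
grid = np.linspace(500.0, 3500.0, 3001)
spectrum = gaussian_broaden(positions, intensities, grid, sigma=10.0)

# The strongest band should sit at the most intense stick.
peak_position = float(grid[np.argmax(spectrum)])
print(peak_position)
```

A Lorentzian line shape is an equally common choice; which one matches the spectra distributed with the dataset is not stated here.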
qm9
huggingface.co
Updated Sep 8, 2024
Cite
Nima Shoghi (2024). qm9 [Dataset]. https://huggingface.co/datasets/nimashoghi/qm9
Dataset updated
Sep 8, 2024
Authors
Nima Shoghi
Description
nimashoghi/qm9 dataset hosted on Hugging Face and contributed by the HF Datasets community
Indices of the QM9 molecules which are not present in either of the...
data.dtu.dk
txt
Updated Sep 29, 2023
Cite
Surajit Nandi; Tejs Vegge; Arghya Bhowmik (2023). Indices of the QM9 molecules which are not present in either of the molecular or reaction datasets. [Dataset]. http://doi.org/10.11583/DTU.21028468.v1
Available download formats: txt
Unique identifier
https://doi.org/10.11583/DTU.21028468.v1
Dataset updated
Sep 29, 2023
Dataset provided by
Technical University of Denmark
Authors
Surajit Nandi; Tejs Vegge; Arghya Bhowmik
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This file contains the indices of the QM9 molecules that are not present in either our molecular or reaction datasets. These molecules were excluded because of problems converting their coordinates to SMILES strings.
This item is part of the collection MultiXC-QM9 with DOI: 10.11583/DTU.c.6185986
modelforge curated dataset: QM9
zenodo.org
application/gzip
Updated May 12, 2025
Cite
Christopher Iacovella; John Chodera; Shuai Yan (2025). modelforge curated dataset: QM9 [Dataset]. http://doi.org/10.5281/zenodo.15390593
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.15390593
Dataset updated
May 12, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Christopher Iacovella; John Chodera; Shuai Yan
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Modelforge Curated QM9 Dataset:
- 1000 conformer test set
- Version: nc_1000_v1.1:

This provides a curated hdf5 file for a subset of the QM9 dataset to be used for testing purposes, designed to be compatible with modelforge, an infrastructure to implement and train NNPs. This test dataset contains 1000 configurations, 1 for each unique system.

When applicable, the units of properties are provided in the datafile, encoded as strings compatible with the openff-units package. For more information about the structure of the data file, please see the following:

https://github.com/choderalab/modelforge/wiki/Dataset-and-curation#curation-module
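
The data file reports energies in kilojoule_per_mole and lengths in nanometer, while much of the QM9 literature quotes hartree, kcal/mol, and angstrom. The sketch below shows plain-Python conversions between these; the function names are hypothetical, and it deliberately avoids the openff-units package so that it makes no assumptions about that library's API.

```python
# Conversion constants. The calorie factor is exact by definition; the hartree
# factor is the commonly quoted CODATA-based value, rounded to four decimals.
KJ_PER_MOL_PER_KCAL_PER_MOL = 4.184
KJ_PER_MOL_PER_HARTREE = 2625.4996
ANGSTROM_PER_NANOMETER = 10.0


def kj_per_mol_to_kcal_per_mol(value):
    """Convert an energy from kJ/mol to kcal/mol."""
    return value / KJ_PER_MOL_PER_KCAL_PER_MOL


def kj_per_mol_to_hartree(value):
    """Convert an energy from kJ/mol to hartree (per molecule)."""
    return value / KJ_PER_MOL_PER_HARTREE


def nanometer_to_angstrom(value):
    """Convert a length from nanometer to angstrom."""
    return value * ANGSTROM_PER_NANOMETER


# Example with made-up numbers: a 500 kJ/mol gap and a 0.15 nm bond length.
print(kj_per_mol_to_kcal_per_mol(500.0))
print(kj_per_mol_to_hartree(500.0))
print(nanometer_to_angstrom(0.15))
```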

Properties Included (property: category, unit string):

- atomic_numbers
- positions: "per_atom", "nanometer"
- partial_charges: "per_atom", "elementary_charge"
- polarizability: "per_system", "nanometer ** 3"
- dipole_moment_per_system: "per_system", "elementary_charge * nanometer"
- dipole_moment_scalar_per_system: "per_system", "elementary_charge * nanometer"
- energy_of_homo: "per_system", "kilojoule_per_mole"
- lumo-homo_gap: "per_system", "kilojoule_per_mole"
- zero_point_vibrational_energy: "per_system", "kilojoule_per_mole"
- internal_energy_at_298.15K: "per_system", "kilojoule_per_mole"
- internal_energy_at_0K: "per_system", "kilojoule_per_mole"
- enthalpy_at_298.15K: "per_system", "kilojoule_per_mole"
- free_energy_at_298.15K: "per_system", "kilojoule_per_mole"
- heat_capacity_at_298.15K: "per_system", "kilojoule_per_mole / kelvin"
- rotational_constants: "per_system", "gigahertz"
- harmonic_vibrational_frequencies: "per_system", "1 / centimeter"
- electronic_spatial_extent: "per_system", "nanometer ** 2"
- smiles_gdb-17: "meta_data", "meta_data"
- inchi_corina: "meta_data"
- inchi_b3lyp: "meta_data"
- idx: "meta_data"
- tag: "meta_data"

Original Source:

The QM9 dataset includes 133,885 organic molecules with up to nine heavy atoms (C, O, N, or F; excluding H), originally published by Ramakrishnan et al. Properties in the QM9 dataset were calculated at the B3LYP/6-31G(2df,p) level of quantum chemistry.

Citations:

Original publication:

Ramakrishnan, R., Dral, P., Rupp, M. et al. "Quantum chemistry structures and properties of 134 kilo molecules." Sci Data 1, 140022 (2014). https://doi.org/10.1038/sdata.2014.22

Source dataset, released with CC0 1.0 Universal license:

Ramakrishnan, Raghunathan; Dral, Pavlo; Rupp, Matthias; Anatole von Lilienfeld, O. (2014). Quantum chemistry structures and properties of 134 kilo molecules. figshare. Collection. https://doi.org/10.6084/m9.figshare.c.978904.v5
QM9-XAS database of 56k QM9 small organic molecules labeled with TDDFT X-ray...
data.niaid.nih.gov
zenodo.org
Updated Sep 14, 2023
Cite
Kotobi, Amir (2023). QM9-XAS database of 56k QM9 small organic molecules labeled with TDDFT X-ray absorption spectra [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8276901
Dataset updated
Sep 14, 2023
Dataset authored and provided by
Kotobi, Amir
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Database for training graph neural network (GNN) models in "Integrating Explainability into Graph Neural Network Models for the Prediction of X-ray Absorption Spectra" by Amir Kotobi, Kanishka Singh, Daniel Höche, Sadia Bari, Robert H. Meißner, and Annika Bande.

Included:

qm9_Cedge_xas_56k.npz: the TDDFT XAS spectra of 56k structures from the QM9 dataset, which were used to label the graph dataset. The file contains two key/value entries: spec_stk, a 2D array containing the energies and oscillator strengths of the XAS spectra, and id, which holds the indices of the QM9 structures. This data was used to create the QM9-XAS graph dataset.

qm9xas_orca_output.zip: the raw ORCA output of the TDDFT calculations for the 56k QM9-XAS dataset, consisting of excitation energies, densities, molecular orbitals, and other relevant information. This unprocessed output serves as a source to derive ground truth data for explaining the predictions made by GNNs.

qm9xas_spec_train_val.pt: processed graph train/validation dataset of 50k QM9 structures. It is used as input to GNN models for training and validation.

qm9xas_spec_test.pt: processed graph test dataset of 6k QM9 structures. It is used to test the performance of trained GNN models.
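
As a rough sketch of how the two keys in the .npz file might be used together, the example below builds a small synthetic archive and looks up the spectrum row belonging to a given QM9 index. The assumption that spec_stk holds one row per structure, aligned with id, is not confirmed by the description, and all numbers are placeholders.

```python
import io
import numpy as np

# Synthetic stand-in for qm9_Cedge_xas_56k.npz with three fake structures.
# Assumption: row i of spec_stk belongs to the QM9 index stored in id[i].
buffer = io.BytesIO()
np.savez(
    buffer,
    spec_stk=np.arange(12, dtype=float).reshape(3, 4),  # placeholder spectra
    id=np.array([12, 57, 101]),                         # placeholder QM9 indices
)
buffer.seek(0)

data = np.load(buffer)

# Map each QM9 index to its row so spectra can be fetched by molecule id.
row_of = {int(qm9_id): row for row, qm9_id in enumerate(data["id"])}

spectrum_57 = data["spec_stk"][row_of[57]]
print(spectrum_57)
```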

Notes on the datasets:

The QM9-XAS dataset was created using ORCA electronic structure package [Neese, F., WIREs Computational Molecular Science 2012, 2, 73–78] to calculate carbon K-edge XAS spectra with the time-dependent density functional theory (TDDFT) method [Petersilka, M.; Gossmann, U. J.; Gross, E. K. U., Phys. Rev. Lett. 1996, 76, 1212–1215]

The molecular structures of QM9-XAS datasets were sourced from the QM9 database [R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. Von Lilienfeld, Sci. Data 1, 1 (2014)].

Funding:

This research was funded by HIDA Trainee Network program, HAICU, Helmholtz AI-4-XAS, DASHH and HEIBRiDS graduate schools. For theoretical calculations and model training, computational resources at DESY and JFZ were used.
qm9
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Fu, Tianfan (2023). qm9 [Dataset]. http://doi.org/10.7910/DVN/8ZZZ6J
Unique identifier
https://doi.org/10.7910/DVN/8ZZZ6J
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Fu, Tianfan
Description
qm9 quantum. Visit https://dataone.org/datasets/sha256%3A09d247add3aca18d56b63d1834642e99243abfc143ff716a713c4a88e8bf59c5 for complete metadata about this dataset.
QM9x
figshare.com
hdf
Updated Aug 16, 2022
Cite
Mathias Schreiner (2022). QM9x [Dataset]. http://doi.org/10.6084/m9.figshare.20449701.v2
Available download formats: hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.20449701.v2
Dataset updated
Aug 16, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Mathias Schreiner
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
QM9x is a dataset that contains DFT calculations of energy and forces for all configurations in QM9 recalculated with the wb97x functional and 6-31G(d) basis set. Recalculating the energy and forces causes a slight shift of the potential energy surface which results in forces acting on most configurations in the dataset.

The choice of basis set and functional makes the QM9x compatible with the Transition1x and the ANI1x dataset.

See https://arxiv.org/abs/2207.12858 for a comparison between ANI1x, QM9x and Transition1x.

Dataloaders and example scripts are available at https://gitlab.com/matschreiner/QM9x

Please cite "Transition1x - Force and Energy Calculations of Millions of Near-Transition State Molecular Configurations" and the original QM9 dataset.
QM9 data for graph2mat
data.dtu.dk
txt
Updated Aug 6, 2024
Cite
Arghya Bhowmik (2024). QM9 data for graph2mat [Dataset]. http://doi.org/10.11583/DTU.26195282.v1
Available download formats: txt
Unique identifier
https://doi.org/10.11583/DTU.26195282.v1
Dataset updated
Aug 6, 2024
Dataset provided by
Technical University of Denmark
Authors
Arghya Bhowmik
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Creators

Pol Febrer (pol.febrer@icn2.cat, ORCID 0000-0003-0904-2234)
Peter Bjorn Jorgensen (peterbjorgensen@gmail.com, ORCID 0000-0003-4404-7276)
Arghya Bhowmik (arbh@dtu.dk, ORCID 0000-0003-3198-5116)

Related publication

The dataset is published as part of the paper: "GRAPH2MAT: UNIVERSAL GRAPH TO MATRIX CONVERSION FOR ELECTRON DENSITY PREDICTION" (https://doi.org/10.26434/chemrxiv-2024-j4g21)

Short description

This dataset contains the Hamiltonian, Overlap, Density and Energy Density matrices from SIESTA calculations of the QM9 dataset (https://doi.org/10.6084/m9.figshare.c.978904.v5)

SIESTA 5.0.0 was used to compute the dataset.

Contents

The dataset has four directories:

basis: Contains the files specifying the basis used for each atom.

pseudos: Contains the pseudopotentials used for the calculation (obtained from http://www.pseudo-dojo.org/, type NC SR (ONCVPSP v0.5), PBE, standard accuracy)

runs: The results of running the SIESTA simulations. Contents are discussed next.

splits: The data splits used in the published paper. Each file "splits_X.json" contains the splits for training size X.

The "runs" directory contains one directory for each run, named with the index of the run. Each directory contains:

- RUN.fdf, geom.fdf: The input files used for the SIESTA calculation.
- RUN.out: The log of the SIESTA run, which apar
- siesta.TSDE: Contains the Density and Energy Density matrices.
- siesta.TSHS: Contains the Hamiltonian and Overlap matrices.

Each matrix can be read using the sisl python package (https://github.com/zerothi/sisl) like:

import sisl
matrix = sisl.get_sile("RUN.fdf").read_X()

where X is hamiltonian, overlap, density_matrix or energy_density_matrix.

To reproduce the results presented in the paper, follow the documentation of the graph2mat package (https://github.com/BIG-MAP/graph2mat).

Cite this data

https://doi.org/10.11583/DTU.c.7310005 © 2024 Technical University of Denmark

License

This dataset is published under the CC BY 4.0 license. This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator.
data_qm9.pkl.tar.gz
figshare.com
application/gzip
Updated Jun 1, 2023
Cite
Kuang-Yu Samuel Chang (2023). data_qm9.pkl.tar.gz [Dataset]. http://doi.org/10.6084/m9.figshare.4293959.v2
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.4293959.v2
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Kuang-Yu Samuel Chang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Compressed Python pickle file of the QM9 dataset.
JARVIS-QM9-DGL
huggingface.co
Cite
ColabFit, JARVIS-QM9-DGL [Dataset]. https://huggingface.co/datasets/colabfit/JARVIS-QM9-DGL
Dataset authored and provided by
ColabFit
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cite this dataset

Ramakrishnan, R., Dral, P. O., Rupp, M., and Lilienfeld, O. A. JARVIS-QM9-DGL. ColabFit, 2023. https://doi.org/10.60732/403cd4f2

View on the ColabFit Exchange

https://materials.colabfit.org/id/DS_tat5i46x3hkr_0

Dataset Name

JARVIS-QM9-DGL

Description

The JARVIS-QM9-DGL dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the QM9 dataset… See the full description on the dataset page: https://huggingface.co/datasets/colabfit/JARVIS-QM9-DGL.
QM9_ADiT
huggingface.co
Updated May 23, 2025
Cite
Chaitanya K. Joshi (2025). QM9_ADiT [Dataset]. https://huggingface.co/datasets/chaitjo/QM9_ADiT
Dataset updated
May 23, 2025
Authors
Chaitanya K. Joshi
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
All-atom Diffusion Transformers - QM9 dataset

QM9 dataset from the paper "All-atom Diffusion Transformers: Unified generative modelling of molecules and materials", by Chaitanya K. Joshi, Xiang Fu, Yi-Lun Liao, Vahe Gharakhanyan, Benjamin Kurt Miller, Anuroop Sriram*, and Zachary W. Ulissi* from FAIR Chemistry at Meta (* Joint last author). Original data source: https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.datasets.QM9.html (Adapted from MoleculeNet)… See the full description on the dataset page: https://huggingface.co/datasets/chaitjo/QM9_ADiT.
OD9_0 (union of PC9 and QM9) dataset dictionnary with SMILES and ECFP4...
figshare.com
application/x-gzip
Updated Feb 28, 2023
Cite
Thomas Cauchy (2023). OD9_0 (union of PC9 and QM9) dataset dictionnary with SMILES and ECFP4 connectivity score [Dataset]. http://doi.org/10.6084/m9.figshare.20054339.v2
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.20054339.v2
Dataset updated
Feb 28, 2023
Dataset provided by
figshare
Authors
Thomas Cauchy
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
JSON file of a Python dictionary. Key: SMILES; value: dict {'HAC' # number of heavy atoms, 'swscore_ChEMBL' # % of ECFP4 of the molecule that belong to ChEMBL, 'swscore_ZINC' # % of ECFP4 of the molecule that belong to ZINC or ChEMBL}. Only neutral singlet molecules without any atomic charges (formal or real), composed of H, C, N, O, and F, are included.
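
A dictionary with this shape can be filtered with a few lines of standard-library Python. The sketch below uses a two-entry JSON string with made-up scores in place of the real file, and the thresholds are arbitrary; only the key names come from the description above.

```python
import json

# Made-up stand-in for the real JSON file: SMILES -> properties.
text = """
{
  "CCO":    {"HAC": 3, "swscore_ChEMBL": 100.0, "swscore_ZINC": 100.0},
  "C1CC1F": {"HAC": 4, "swscore_ChEMBL": 50.0,  "swscore_ZINC": 75.0}
}
"""
entries = json.loads(text)

# Keep small molecules whose ECFP4 fragments are well covered by ChEMBL.
# Both thresholds are arbitrary illustration values.
selected = sorted(
    smiles
    for smiles, info in entries.items()
    if info["HAC"] <= 3 and info["swscore_ChEMBL"] >= 90.0
)
print(selected)
```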
