The NIST Chemical Kinetics Database includes essentially all reported kinetics results for thermal gas-phase chemical reactions. The database is designed to be searched for kinetics data based on the specific reactants involved, for reactions resulting in specified products, for all the reactions of a particular species, or for various combinations of these. In addition, the bibliography can be searched by author name or combination of names. The database contains in excess of 38,000 separate reaction records for over 11,700 distinct reactant pairs. These data have been abstracted from over 12,000 papers with literature coverage through early 2000. Rate constant records for a specified reaction are found by searching the Reaction Database. All rate constant records for that reaction are returned, with a link to 'Details' on that record. Each rate constant record contains the following information (as available):
a) Reactants and, if defined, reaction products;
b) Rate parameters: A, n, Ea/R, where k = A (T/298)^n exp[-(Ea/R)/T] and T is the temperature in kelvins;
c) Uncertainty in A, n, and Ea/R, if reported;
d) Temperature range of the experiment, or temperature range of validity of a review or theoretical paper;
e) Pressure range and bulk gas of the experiment;
f) Data type of the record (e.g., experimental, relative rate measurement, theoretical calculation, modeling result). If the result is a relative rate measurement, then the reaction to which the rate is relative is also given;
g) Experimental procedure, including separate fields for the description of the apparatus, the time resolution of the experiment, and the excitation technique.
A majority of contemporary chemical kinetics methods are represented. The Kinetics Database is being expanded to include other resources for the convenience of the users. Presently this includes direct links to the corresponding NIST WebBook page for all substances for which such a link is possible. This is indicated by underlining and highlighting the species. The WebBook provides thermodynamic, spectral, and other data on the species. Note that the link to the WebBook opens in a new frame in your browser.
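The rate expression in item b) can be evaluated directly once A, n, and Ea/R are read from a record. The following is a minimal sketch, reading the exponent n as a power of (T/298); the function name and the parameter values are illustrative assumptions and are not taken from the database.

```python
import math


def rate_constant(A, n, Ea_over_R, T, T_ref=298.0):
    """Evaluate k(T) = A * (T / T_ref)**n * exp(-(Ea/R) / T).

    A         : pre-exponential factor (k is returned in the same units)
    n         : temperature exponent (dimensionless)
    Ea_over_R : activation energy divided by the gas constant, in kelvins
    T         : temperature in kelvins
    """
    if T <= 0:
        raise ValueError("temperature must be positive (kelvins)")
    return A * (T / T_ref) ** n * math.exp(-Ea_over_R / T)


# Hypothetical parameters, chosen only to illustrate the call.
k_500 = rate_constant(A=1.0e-11, n=1.5, Ea_over_R=1200.0, T=500.0)
print(f"k(500 K) = {k_500:.3e}")
```

A record's stated temperature range of validity should be checked before evaluating the expression, since the fit is not meant to be extrapolated outside it.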
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This reaction database is generated along with the manuscript "Comprehensive exploration of graphically defined reaction spaces". RGD1CHNO_AMsmiles.csv contains atom-mapped SMILES, activation energies, and enthalpies of formation for each reaction. RGD1_CHNO.h5 contains the geometry information and can be iterated over with a Python script from GitHub (https://github.com/zhaoqy1996/RGD1/parse_data.py). DFT_reaction_info.csv is supplied to reproduce figures in the article. RandP_smiles.txt is a dictionary that maps the reactant and product SMILES appearing in RGD1_CHNO.h5 to a molecule index (molX). RGD1_RPs.h5 provides xTB- and DFT-optimized geometries of each individual reactant/product molecule. 3D ML models can be trained by combining RGD1_RPs.h5, RGD1_CHNO.h5, and RandP_smiles.txt (see https://github.com/zhaoqy1996/RGD1 for more details). IMPORTANT: We provided an UPDATED VERSION of the RGD1 dataset on Apr 24, 2023. The initially posted version of the dataset reported swapped activation energies for ~24% of the forward/reverse reactions, all of which were corrected in this updated version.
KEGG LIGAND contains knowledge of chemical substances and reactions that are relevant to life. It is a composite database consisting of COMPOUND, GLYCAN, REACTION, RPAIR, and ENZYME databases, whose entries are identified by C, G, R, RP, and EC numbers, respectively. ENZYME is derived from the IUBMB/IUPAC Enzyme Nomenclature, but the others are internally developed and maintained. The primary database of KEGG LIGAND is a relational database with the KegDraw interface, which is used to generate the secondary (flat file) database for DBGET.
The NDRL/NIST Solution Kinetics Database contains data on rate constants for solution-phase chemical reactions. The database is designed to be searched by reactants, products, solvents, or any combination of these. In addition, the bibliography may be searched by author name, title words, journal, page(s), and/or year. This is not the same database as the one at Notre Dame, although both databases share a common data source.
A database based on the SABIO relational database that contains information about biochemical reactions, their kinetic equations with their parameters, and the experimental conditions under which these parameters were measured. It aims to support modelers in the setting-up of models of biochemical networks, but it is also useful for experimentalists or researchers with interest in biochemical reactions and their kinetics. SABIO-RK contains and merges information about reactions such as reactants and modifiers, organism, tissue and cellular location, as well as the kinetic properties of the reactions. The type of the kinetic mechanism, modes of inhibition or activation, and corresponding rate equations are presented together with their parameters and measured values, specifying the experimental conditions under which these were determined. Links to other databases are provided for users to gather further information and to refer to the original publication. Information about reactions and their kinetic data can be exported to an SBML file. The reaction kinetics data are obtained by manual extraction from literature sources and curated.
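As an illustration of the kind of rate equation and parameters such an entry can carry, the sketch below evaluates a Michaelis-Menten rate law with an optional competitive inhibitor. Michaelis-Menten is only one common mechanism type; the function name and the numbers are assumptions for illustration and do not come from SABIO-RK.

```python
def michaelis_menten(s, vmax, km, inhibitor=0.0, ki=None):
    """Return the reaction rate v = vmax * s / (km_app + s).

    With a competitive inhibitor at concentration `inhibitor` and
    inhibition constant `ki`, the apparent Km becomes km * (1 + inhibitor / ki).
    All concentrations must share one unit; v has the unit of vmax.
    """
    if s < 0 or km <= 0 or vmax < 0:
        raise ValueError("s must be >= 0, km > 0, vmax >= 0")
    km_app = km
    if ki is not None:
        if ki <= 0 or inhibitor < 0:
            raise ValueError("ki must be > 0 and inhibitor >= 0")
        km_app = km * (1.0 + inhibitor / ki)
    return vmax * s / (km_app + s)


# Hypothetical parameter values, for illustration only.
print(michaelis_menten(s=2.0, vmax=10.0, km=2.0))                         # uninhibited
print(michaelis_menten(s=2.0, vmax=10.0, km=1.0, inhibitor=1.0, ki=1.0))  # inhibited
```

The parameters only make sense together with the experimental conditions recorded alongside them (organism, temperature, pH, buffer), so those should travel with the values into any model.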
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Optimization of the catalyst structure to simultaneously improve multiple reaction objectives (e.g., yield, enantioselectivity, and regioselectivity) remains a formidable challenge. Herein, we describe a machine learning workflow for the multi-objective optimization of catalytic reactions that employ chiral bisphosphine ligands. This was demonstrated through the optimization of two sequential reactions required in the asymmetric synthesis of an active pharmaceutical ingredient. To accomplish this, a density functional theory-derived database of 550 bisphosphine ligands was constructed, and a designer chemical space mapping technique was established. The protocol used classification methods to identify active catalysts, followed by linear regression to model reaction selectivity. This led to the prediction and validation of significantly improved ligands for all reaction outputs, suggesting a general strategy that can be readily implemented for reaction optimizations where performance is controlled by bisphosphine ligands.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets and splits of the manuscript "Chemprop: Machine Learning Package for Chemical Property Prediction." Train, validation and test splits are located within each folder, as well as additional data necessary for some of the benchmarks. To train Chemprop models, refer to our code repository to obtain ready-to-use scripts to train machine learning models for each of the systems. Available benchmarking systems:
hiv: HIV replication inhibition from MoleculeNet and OGB with scaffold splits
pcba_random: Biological activities from MoleculeNet with random splits (with missing targets filled in with zeros as provided by MoleculeNet)
pcba_random_nans: Biological activities from MoleculeNet with random splits and data format to match OGB (with missing targets not filled in with zeros)
pcba_scaffold: Biological activities from OGB with scaffold splits
qm9_multitask: DFT-calculated properties from MoleculeNet and OGB, trained as a multi-task model
qm9_u0: DFT-calculated properties from MoleculeNet and OGB, trained as a single-task model on the target U0 only
qm9_gap: DFT-calculated properties from MoleculeNet and OGB, trained as a single-task model on the target gap only
sampl: Water-octanol partition coefficients, used to predict molecules from the SAMPL6, 7 and 9 challenges
atom_bond_137k: Quantum-mechanical atom and bond descriptors
bde: Bond dissociation enthalpies, trained as a single-task model
bde_charges: Bond dissociation enthalpies, trained as a multi-task model together with atomic partial charges
charges_eps_4: Partial charges at a dielectric constant of 4 (in protein)
charges_eps_78: Partial charges at a dielectric constant of 78 (in water)
barriers_e2: Reaction barrier heights of E2 reactions
barriers_sn2: Reaction barrier heights of SN2 reactions
barriers_cycloadd: Reaction barrier heights of cycloaddition reactions
barriers_rdb7: Reaction barrier heights in the RDB7 dataset
barriers_rgd1: Reaction barrier heights in the RGD1-CHNO dataset
multi_molecule: UV/Vis peak absorption wavelengths in different solvents
ir: IR spectra
pcqm4mv2: HOMO-LUMO gaps of the PCQM4Mv2 dataset
uncertainty_ensemble: Uncertainty estimation with an ensemble on the QM9 gap dataset
uncertainty_evidential: Uncertainty estimation with evidential learning on the QM9 gap dataset
uncertainty_mve: Uncertainty estimation with mean-variance estimation on the QM9 gap dataset
timing: Timing benchmark using subsets of QM9 gap
Version: This version of the dataset (Version 2) is compatible with all versions of Chemprop (supporting the respective functionality). Version 1 of this dataset is compatible with all versions except Chemprop v.1.6.1, which cannot process the charges_eps_4 and charges_eps_78 datasets (all other benchmarks work as expected). We therefore recommend always using Version 2 of the dataset (with reformatted charges_eps_4 and charges_eps_78 datasets), since it is compatible with all versions of Chemprop. For use with any other ML software, you can use any version.
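For the uncertainty_ensemble benchmark listed above, the basic idea of ensemble-based uncertainty can be sketched in a few lines: several independently trained models predict the same target, and the spread of their predictions serves as the uncertainty estimate. This is a generic illustration with made-up numbers and a hypothetical function name, not Chemprop's own implementation.

```python
from statistics import fmean, pvariance


def ensemble_prediction(member_predictions):
    """Combine per-model predictions for one molecule.

    Returns (mean, variance): the mean is the ensemble prediction and the
    variance across members is used as the uncertainty estimate.
    Population variance is used here; sample variance is an equally
    reasonable choice.
    """
    if len(member_predictions) < 2:
        raise ValueError("an ensemble needs at least two members")
    return fmean(member_predictions), pvariance(member_predictions)


# Hypothetical HOMO-LUMO gap predictions from three models, for illustration.
mean, variance = ensemble_prediction([0.24, 0.26, 0.25])
print(f"prediction = {mean:.3f}, variance = {variance:.2e}")
```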
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Reaction classification has important applications, and many approaches to classification have been applied. Our own algorithm tests all maximum common substructures (MCS) between all reactant and product molecules in order to find an atom mapping containing the minimum chemical distance (MCD). Recent publications have concluded that new MCS algorithms need to be compared with existing methods in a reproducible environment, preferably on a generalized test set, yet the number of test sets available is small, and they are not truly representative of the range of reactions that occur in real reaction databases. We have designed a challenging test set of reactions and are making it publicly available and usable with InfoChem’s software or other classification algorithms. We supply a representative set of example reactions, grouped into different levels of difficulty, from a large number of reaction databases that chemists actually encounter in practice, in order to demonstrate the basic requirements for a mapping algorithm to detect the reaction centers in a consistent way. We invite the scientific community to contribute to the future extension and improvement of this data set, to achieve the goal of a common standard.
The following information is given for each entry in this database: the reference for the data; the reaction studied; the name of the enzyme used and its Enzyme Commission number; the method of measurement; the conditions of measurement (temperature, pH, ionic strength, and the buffer(s) and cofactor(s) used); the data and an evaluation of it; and, sometimes, commentary on the data and on any corrections which have been applied to it. The absence of a piece of information indicates that it was not found in the paper cited.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Updated reaction SMILES dataset (now 733K); each line in the file represents a valid reaction SMILES. The source material is the collection of US patents (2005 - 2016) by Daniel Lowe, with data enhancement. The source material also includes reaction SMILES drawn from the general literature, as well as USPTO data from 2022 and 2023. All SMILES are valid according to RDKit. See also https://kmt.vander-lingen.nl
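Because each line is one reaction SMILES, a line can be taken apart with plain string handling. The sketch below assumes the standard `reactants>agents>products` layout with `.` separating molecules, and it ignores anything after the first whitespace; the function name and the example reaction are illustrative and are not taken from the dataset.

```python
def split_reaction_smiles(line):
    """Split one reaction SMILES line into reactants, agents, and products."""
    fields = line.strip().split()
    if not fields:
        raise ValueError("empty line")
    core = fields[0]  # drop any trailing annotation after whitespace
    parts = core.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected two '>' separators, got {len(parts) - 1}")
    reactants, agents, products = (
        [smi for smi in part.split(".") if smi] for part in parts
    )
    return {"reactants": reactants, "agents": agents, "products": products}


# Illustrative example: acid-catalysed esterification of acetic acid with ethanol.
rxn = split_reaction_smiles("CC(=O)O.OCC>[H+]>CC(=O)OCC.O")
print(rxn["reactants"], rxn["agents"], rxn["products"])
```

This only separates the fields; checking that each molecule is chemically valid still requires a cheminformatics toolkit such as RDKit.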
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
USPTO-LLM is an information-enriched chemical reaction dataset that provides more side information (reaction conditions and reaction steps division) for developing new reaction prediction and retrosynthesis methods and inspires new problems, such as reaction condition prediction. It comprises over 247K chemical reactions extracted from the patent documents of USPTO (United States Patent and Trademark Office), encompassing abundant information on reaction conditions.
We employ large language models to expedite the data collection procedures automatically with a reliable quality control process. The extracted chemical reactions are organized as heterogeneous directed graphs, allowing us to formulate a series of prediction tasks, such as reaction prediction, retrosynthesis, and reaction condition prediction, in a unified graph-filling framework.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The data extract is a series of compressed ASCII text files of the full data set contained in the Canada Vigilance Adverse Reaction Online Database. It is intended for users who are familiar with database structures and setting up their own queries. Find details on the data structure required for the data file in the Canada Vigilance Adverse Reaction Online Database - Data Structure. In order to use the data, the file must be loaded into an existing database or information system provided by the user. The Canada Vigilance Adverse Reaction Online Database contains information about suspected adverse reactions (also known as side effects) to health products, captured from adverse reaction reports submitted to Health Canada by consumers and health professionals, who submit reports voluntarily, as well as by market authorization holders (manufacturers and distributors), who are required to submit reports according to the Food and Drugs Regulations. Information concerning vaccines used for immunization has only been included in the database since January 1, 2011. Indication data has recently been added to the data extract files and the Detailed Adverse Reaction Report. Indication refers to the particular condition for which a health product was taken. For example, diabetes is an indication for insulin. Health products are often authorised for use in treating more than one indication. Note: The database cannot be used on its own to evaluate a health product's safety profile. It does not provide conclusive information on the safety of health products, and is not a substitute for medical advice. Should you have an issue of medical concern, consult a qualified health professional.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the collection associated with list S73 MetXBioDB Metabolite Reaction Database from BioTransformer on the NORMAN Suspect List Exchange.
https://www.norman-network.com/nds/SLE/
This dataset is extracted from the database behind BioTransformer (http://biotransformer.ca/) by Yannick Djoumbou-Feunang, David S. Wishart and colleagues, for addition to the PubChem Transformations section. Change logs and version tracking at the ECI GitLab site.
Please cite the BioTransformer article when using this set: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0324-5
NOTE: This deposition is work in progress ...
Change log: 13 Oct: added InChIKey file. 16 Oct: updated substances with missing CIDs and transformations. 5/11 many bug fixes finally committed, added DTXSIDs. 22/6/2023 adjusted one CID that changed upon PubChem standardization. 15 Nov 2023: fixed typo in reaction description. 26 Feb 2024: corrected name for CID 65564. 6 Aug 2024: fixed many triazine synonyms.
A database of chemicals and reactions inside of US patents (2001 - 2011). SCRIPDB provides the full original patent text, reactions and relationships described within any individual patent, in addition to the molecular files common to structural databases. The patent literature is a rich catalog of biologically relevant chemicals; many public and commercial molecular databases contain the structures disclosed in patent claims. However, patents are an equally rich source of metadata about bioactive molecules, including mechanism of action, disease class, homologous experimental series, structural alternatives, or the synthetic pathways used to produce molecules of interest. Unfortunately, this metadata is discarded when chemical structures are deposited separately in databases. SCRIPDB is a chemical structure database designed to make this metadata accessible. The SCRIPDB information is valuable in medical text mining, chemical image analysis, reaction extraction and in silico pharmaceutical lead optimization. SCRIPDB may be searched by exact chemical structure, substructure or molecular similarity and the results may be restricted to patents describing synthetic routes.
Manually annotated reaction database where all reaction participants (reactants and products) are linked to the ChEBI database (Chemical Entities of Biological Interest) which provides detailed information about structure, formula and charge. Rhea provides built-in validations that ensure both elemental and charge balance of the reactions. The database has been populated with the reactions found in the Enzyme Commission (EC) list (and in the IntEnz and ENZYME databases), extending it with additional known reactions of biological interest. While the main focus of Rhea is enzyme-catalyzed reactions, other biochemical reactions are also included. Rhea is a manually annotated resource and it provides: stable reaction identifiers for each of its reactions; directionality information if the physiological direction of the reaction is known; the possibility to link several reactions together to form overall reactions; extensive cross-references to other resources including enzyme-catalyzed and other metabolic reactions, such as the EC list (in IntEnz), KEGG, MetaCyc and UniPathway; and chemical substructure and similarity searches on compounds in Rhea.
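The elemental and charge balance validation mentioned above can be illustrated with a small check. This is a minimal sketch of the idea, not Rhea's implementation: each participant is given as a hypothetical tuple of (stoichiometric coefficient, element counts, charge), and the two sides are compared.

```python
def side_totals(side):
    """Sum element counts and net charge over one side of a reaction."""
    elements = {}
    charge = 0
    for coeff, formula, q in side:
        for element, count in formula.items():
            elements[element] = elements.get(element, 0) + coeff * count
        charge += coeff * q
    # Drop zero entries so that {'H': 0} and {} compare as equal.
    elements = {el: n for el, n in elements.items() if n != 0}
    return elements, charge


def is_balanced(left, right):
    """True if both sides carry the same atoms and the same net charge."""
    return side_totals(left) == side_totals(right)


# CO2 + H2O = HCO3(-) + H(+)
left = [(1, {"C": 1, "O": 2}, 0), (1, {"H": 2, "O": 1}, 0)]
right = [(1, {"H": 1, "C": 1, "O": 3}, -1), (1, {"H": 1}, +1)]
print(is_balanced(left, right))  # balanced in both atoms and charge
```

Linking every participant to a ChEBI entry is what makes such a check possible, since ChEBI supplies the formula and charge for each compound.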
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Collection of reaction SMILES (reactants, reagents, solvents, products), 1.37M lines in total, from patent literature (USPTO 1976 - 2024) and from academic literature (2.5% of the total). Data were converted from an existing USPTO dataset [1] and generated by custom-designed parsing. Data extraction by OSCAR (semantic) or ChatGPT (LLM); molecule identification by OPSIN and a custom synonym list. All SMILES are RDKit-safe, with duplicate reactions removed. Please note that the data have been collected in a semi-automated process; the dataset is certainly not without errors. More information at https://kmt.vander-lingen.nl.
[1] Chemical reactions from US patents (1976-Sep2016), Daniel Lowe.
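One simple way to remove duplicate reactions from such a collection is to compare reactions by a key that ignores the order of molecules on each side. The sketch below is an assumption about how this could be done, not the procedure used for this dataset; it does not canonicalize the individual SMILES strings, which would need a toolkit such as RDKit.

```python
def reaction_key(rxn):
    """Order-insensitive key: sorted molecules for each of the three fields."""
    parts = rxn.strip().split(">")
    if len(parts) != 3:
        raise ValueError("expected 'reactants>agents>products'")
    return tuple(
        tuple(sorted(smi for smi in part.split(".") if smi)) for part in parts
    )


def drop_duplicates(reactions):
    """Keep the first occurrence of each reaction, preserving input order."""
    seen = set()
    unique = []
    for rxn in reactions:
        key = reaction_key(rxn)
        if key not in seen:
            seen.add(key)
            unique.append(rxn.strip())
    return unique


# Illustrative input: the first two lines differ only in molecule order.
lines = [
    "CCO.CC(=O)O>>CC(=O)OCC.O",
    "CC(=O)O.CCO>>O.CC(=O)OCC",
    "CCO>>CC=O",
]
print(drop_duplicates(lines))
```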
Subset and preprocessed version of Chemical reactions from US patents (1976-Sep2016) by Daniel Lowe. It includes 50K randomly selected reactions that were later classified into 10 reaction classes by Nadine Schneider et al.
Abstract: Reaction Mechanism Generator (RMG) constructs kinetic models composed of elementary chemical reaction steps using a general understanding of how molecules react. Species thermochemistry is estimated through Benson group additivity and reaction rate coefficients are estimated using a database of known rate rules and reaction templates. At its core, RMG relies on two fundamental data structures: graphs and trees. Graphs are used to represent chemical structures, and trees are used to represent ...
Title of program: RMG
Catalogue Id: AEZW_v1_0
Nature of problem: Automatic generation of chemical kinetic mechanisms for molecules containing C, H, O, S, and N.
Versions of this program held in the CPC repository in Mendeley Data: AEZW_v1_0; RMG; 10.1016/j.cpc.2016.02.013
This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2018).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data of the article "General reactive machine learning potentials for CHON elements"