100+ datasets found

NIST Chemical Kinetics Database
catalog.data.gov
gimi9.com
Updated Sep 30, 2025
National Institute of Standards and Technology (2025). NIST Chemical Kinetics Database [Dataset]. https://catalog.data.gov/dataset/nist-chemical-kinetics-database
Dataset updated
Sep 30, 2025
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
The NIST Chemical Kinetics Database includes essentially all reported kinetics results for thermal gas-phase chemical reactions. The database is designed to be searched for kinetics data based on the specific reactants involved, for reactions resulting in specified products, for all the reactions of a particular species, or for various combinations of these. In addition, the bibliography can be searched by author name or combination of names. The database contains in excess of 38,000 separate reaction records for over 11,700 distinct reactant pairs. These data have been abstracted from over 12,000 papers with literature coverage through early 2000.

Rate constant records for a specified reaction are found by searching the Reaction Database. All rate constant records for that reaction are returned, with a link to 'Details' on that record. Each rate constant record contains the following information (as available):
a) Reactants and, if defined, reaction products;
b) Rate parameters A, n, and Ea/R, where k = A (T/298)^n exp[-(Ea/R)/T] and T is the temperature in kelvin;
c) Uncertainty in A, n, and Ea/R, if reported;
d) Temperature range of experiment or temperature range of validity of a review or theoretical paper;
e) Pressure range and bulk gas of the experiment;
f) Data type of the record (e.g., experimental, relative rate measurement, theoretical calculation, modeling result). If the result is a relative rate measurement, then the reaction to which the rate is relative is also given;
g) Experimental procedure, including separate fields for the description of the apparatus, the time resolution of the experiment, and the excitation technique.

A majority of contemporary chemical kinetics methods are represented. The Kinetics Database is being expanded to include other resources for the convenience of the users. Presently this includes direct links to the corresponding NIST WebBook page for all substances for which such a link is possible. This is indicated by underlining and highlighting the species. The WebBook provides thermodynamic, spectral, and other data on the species. Note that the link to the WebBook is opened as a new frame in your browser.
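As a rough illustration of item b), the sketch below evaluates the three-parameter rate expression from the description, reading the exponent on (T/298) as n. The function name and the sample parameter values are made up for the example and are not taken from any database record; units of k follow whatever units A carries.

```python
import math


def rate_constant(A, n, Ea_over_R, T):
    """Evaluate k = A * (T/298)^n * exp(-(Ea/R)/T).

    A          pre-exponential factor (k has the same units as A)
    n          dimensionless temperature exponent
    Ea_over_R  activation energy divided by the gas constant, in kelvin
    T          temperature in kelvin
    """
    if T <= 0:
        raise ValueError("T must be a positive temperature in kelvin")
    return A * (T / 298.0) ** n * math.exp(-Ea_over_R / T)


# Hypothetical parameters, chosen only to show the call shape.
example_k = rate_constant(A=1.0e-11, n=1.5, Ea_over_R=1200.0, T=500.0)
```

With n = 0 the expression reduces to the ordinary Arrhenius form, which is a quick sanity check when transcribing parameters from a record.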
S73 | METXBIODB | Metabolite Reaction Database from BioTransformer
data.niaid.nih.gov
Updated Aug 6, 2024
Djoumbou-Feunang, Yannick; Schymanski, Emma; Zhang, Jeff; Wishart, David S. (2024). S73 | METXBIODB | Metabolite Reaction Database from BioTransformer [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4056560
Dataset updated
Aug 6, 2024
Dataset provided by
University of Alberta
Corteva
LCSB, Uni Luxembourg
NIH/NLM/NCBI
Authors
Djoumbou-Feunang, Yannick; Schymanski, Emma; Zhang, Jeff; Wishart, David S.
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the collection associated with list S73 MetXBioDB Metabolite Reaction Database from BioTransformer on the NORMAN Suspect List Exchange.

https://www.norman-network.com/nds/SLE/

This dataset is extracted from the database behind BioTransformer (http://biotransformer.ca/) by Yannick Djoumbou-Feunang, David S. Wishart and colleagues, for addition to the PubChem Transformations section. Change logs and version tracking at the ECI GitLab site.

Please cite the BioTransformer article when using this set: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0324-5

NOTE: This deposition is work in progress ...

Change log: 13 Oct: added InChIKey file. 16 Oct: updated substances with missing CIDs and transformations. 5/11 many bug fixes finally committed, added DTXSIDs. 22/6/2023 adjusted one CID that changed upon PubChem standardization. 15 Nov 2023: fixed typo in reaction description. 26 Feb 2024: corrected name for CID 65564. 6 Aug 2024: fixed many triazine synonyms.
YARP reaction database
figshare.com
zip
Updated Sep 9, 2022
Qiyuan Zhao (2022). YARP reaction database [Dataset]. http://doi.org/10.6084/m9.figshare.14766624.v7
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.14766624.v7
Dataset updated
Sep 9, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Qiyuan Zhao
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This is a dataset generated by Yet Another Reaction Program (YARP), including pyGSM reaction pathways, Gaussian transition state optimization files, IRC calculation results, etc.

Four systems are provided. 'KHP network' involves reactions of gamma-ketohydroperoxide and its 12 intended products. 'Z-benchmark' involves reactants obtained from the Zimmerman testing set.
Database of Chemical Compounds and Reactions in Biological Pathways
neuinfo.org
scicrunch.org
Updated Dec 23, 2005
(2005). Database of Chemical Compounds and Reactions in Biological Pathways [Dataset]. http://identifiers.org/RRID:SCR_006851
Unique identifier
https://identifiers.org/RRID:SCR_006851
Dataset updated
Dec 23, 2005
Description
KEGG LIGAND contains knowledge of chemical substances and reactions that are relevant to life. It is a composite database consisting of COMPOUND, GLYCAN, REACTION, RPAIR, and ENZYME databases, whose entries are identified by C, G, R, RP, and EC numbers, respectively. ENZYME is derived from the IUBMB/IUPAC Enzyme Nomenclature, but the others are internally developed and maintained. The primary database of KEGG LIGAND is a relational database with the KegDraw interface, which is used to generate the secondary (flat file) database for DBGET.
Data from: SynRoute: A Retrosynthetic Planning Software
acs.figshare.com
txt
Updated Aug 28, 2023
Mario Latendresse; Jeremiah P. Malerich; James Herson; Markus Krummenacker; Judy Szeto; Vi-Anh Vu; Nathan Collins; Peter B. Madrid (2023). SynRoute: A Retrosynthetic Planning Software [Dataset]. http://doi.org/10.1021/acs.jcim.3c00491.s003
Available download formats: txt
Unique identifier
https://doi.org/10.1021/acs.jcim.3c00491.s003
Dataset updated
Aug 28, 2023
Dataset provided by
ACS Publications
Authors
Mario Latendresse; Jeremiah P. Malerich; James Herson; Markus Krummenacker; Judy Szeto; Vi-Anh Vu; Nathan Collins; Peter B. Madrid
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Computer-assisted synthetic planning has seen major advancements that stem from the availability of large reaction databases and artificial intelligence methodologies. SynRoute is a new retrosynthetic planning software tool that uses a relatively small number of general reaction templates, currently 263, along with a literature-based reaction database to find short, practical synthetic routes for target compounds. For each reaction template, a machine learning classifier is trained using data from the Pistachio reaction database to predict whether new computer-generated reactions based on the template are likely to work experimentally in the laboratory. This reaction generation methodology is used together with a vectorized Dijkstra-like search of top-scoring routes organized by synthetic strategies for easy browsing by a synthetic chemist. SynRoute was able to find routes for an average of 83% of compounds based on selection of random subsets of drug-like compounds from the ChEMBL database. Laboratory evaluation of 12 routes produced by SynRoute, to synthesize compounds not from the previous random subsets, demonstrated the ability to produce feasible overall synthetic strategies for all compounds evaluated.
NDRL/NIST Solution Kinetics Database - SRD 40
data.wu.ac.at
s.cnmilf.com
html
Updated Jan 29, 2016
Department of Commerce (2016). NDRL/NIST Solution Kinetics Database - SRD 40 [Dataset]. https://data.wu.ac.at/schema/data_gov/MGYyMDg5MWYtZDgwZi00Y2UzLWIzYzItNTkzMGNiMTMxYzYx
Available download formats: html
Dataset updated
Jan 29, 2016
Dataset provided by
Department of Commerce
Description
The NDRL/NIST Solution Kinetics Database contains data on rate constants for solution-phase chemical reactions. The database is designed to be searched by reactants, products, solvents, or any combination of these. In addition, the bibliography may be searched by author name, title words, journal, page(s), and/or year. This is not the same database as the one at Notre Dame, although both databases share a common data source.
RGD1-CNHO Database
figshare.com
data.niaid.nih.gov
hdf
Updated Nov 26, 2023
Qiyuan Zhao; Brett Savoie; Michael Woulfe; Sai Mahit Vaddadi; Lawal A. Ogunfowora; Sanjay Garimella (2023). RGD1-CNHO Database [Dataset]. http://doi.org/10.6084/m9.figshare.21066901.v9
Available download formats: hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.21066901.v9
Dataset updated
Nov 26, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Qiyuan Zhao; Brett Savoie; Michael Woulfe; Sai Mahit Vaddadi; Lawal A. Ogunfowora; Sanjay Garimella
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This reaction database is generated along with the manuscript "Comprehensive exploration of graphically defined reaction spaces".

RGD1CHNO_AMsmiles.csv contains atom-mapped SMILES, activation energies, and enthalpies of formation for each reaction.
RGD1_CHNO.h5 contains the geometry information and can be iterated over with a Python script from GitHub (https://github.com/zhaoqy1996/RGD1/parse_data.py).
DFT_reaction_info.csv is supplied to reproduce figures in the article.
RandP_smiles.txt is a dictionary that maps the reactant and product SMILES appearing in RGD1_CHNO.h5 to a molecule index (molX).
RGD1_RPs.h5 provides xtb- and DFT-optimized geometries of each individual reactant/product molecule.

3D ML models can be trained by combining RGD1_RPs.h5, RGD1_CHNO.h5, and RandP_smiles.txt (see https://github.com/zhaoqy1996/RGD1 for more details).

IMPORTANT: We provided an UPDATED VERSION of the RGD1 dataset on April 24, 2023. The initially posted version of the dataset reported swapped activation energies for ~24% of the forward/reverse reactions, all of which were corrected in this updated version.
RHEA
rrid.site
dknet.org
Updated Jan 29, 2022
(2022). RHEA [Dataset]. http://identifiers.org/RRID:SCR_004713
Unique identifier
https://identifiers.org/RRID:SCR_004713
Dataset updated
Jan 29, 2022
Description
Manually annotated reaction database where all reaction participants (reactants and products) are linked to the ChEBI database (Chemical Entities of Biological Interest) which provides detailed information about structure, formula and charge. Rhea provides built-in validations that ensure both elemental and charge balance of the reactions. The database has been populated with the reactions found in the Enzyme Commission (EC) list (and in the IntEnz and ENZYME databases), extending it with additional known reactions of biological interest. While the main focus of Rhea is enzyme-catalyzed reactions, other biochemical reactions are also included. Rhea is a manually annotated resource and it provides: stable reaction identifiers for each of its reactions; directionality information if the physiological direction of the reaction is known; the possibility to link several reactions together to form overall reactions; extensive cross-references to other resources including enzyme-catalyzed and other metabolic reactions, such as the EC list (in IntEnz), KEGG, MetaCyc and UniPathway; and chemical substructure and similarity searches on compounds in Rhea.
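To make the idea of an elemental and charge balance check concrete, here is a minimal sketch. It is not Rhea's validation code: the function names, the (coefficient, formula, charge) tuple layout, and the flat formula parser (no parentheses, hydrates, or isotopes) are all assumptions made for illustration. The example reaction is ATP hydrolysis written with the charged species commonly used at physiological pH.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "C10" or "P".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula):
    """Count atoms in a flat molecular formula such as 'C6H12O6'."""
    counts = Counter()
    for element, digits in _TOKEN.findall(formula):
        counts[element] += int(digits) if digits else 1
    return counts


def side_totals(participants):
    """Sum atoms and charge over (coefficient, formula, charge) tuples."""
    atoms, charge = Counter(), 0
    for coefficient, formula, q in participants:
        for element, k in element_counts(formula).items():
            atoms[element] += coefficient * k
        charge += coefficient * q
    return atoms, charge


def is_balanced(left, right):
    """True when both sides agree on every element count and on net charge."""
    return side_totals(left) == side_totals(right)


# ATP(4-) + H2O = ADP(3-) + hydrogenphosphate(2-) + H(+)
atp_left = [(1, "C10H12N5O13P3", -4), (1, "H2O", 0)]
atp_right = [(1, "C10H12N5O10P2", -3), (1, "HO4P", -2), (1, "H", 1)]
balanced = is_balanced(atp_left, atp_right)
```

Checking atoms and charge together matters for biochemical reactions, where a dropped proton leaves the heavy atoms balanced but the net charge off by one.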
Biochemical Pathways Reaction Kinetics Database
rrid.site
neuinfo.org
Updated Oct 22, 2025
(2025). Biochemical Pathways Reaction Kinetics Database [Dataset]. http://identifiers.org/RRID:SCR_002122
Unique identifier
https://identifiers.org/RRID:SCR_002122
Dataset updated
Oct 22, 2025
Description
A database based on the SABIO relational database that contains information about biochemical reactions, their kinetic equations with their parameters, and the experimental conditions under which these parameters were measured. It aims to support modelers in the setting-up of models of biochemical networks, but it is also useful for experimentalists or researchers with interest in biochemical reactions and their kinetics. SABIO-RK contains and merges information about reactions such as reactants and modifiers, organism, tissue and cellular location, as well as the kinetic properties of the reactions. The type of the kinetic mechanism, modes of inhibition or activation, and corresponding rate equations are presented together with their parameters and measured values, specifying the experimental conditions under which these were determined. Links to other databases are provided for users to gather further information and to refer to the original publication. Information about reactions and their kinetic data can be exported to an SBML file. The reaction kinetics data are obtained by manual extraction from literature sources and curated.
Excel, ORD, and NMR spectroscopy files for: Total synthesis of Honokiol via...
datadryad.org
zip
Updated Sep 25, 2025
Emmanuel Moya Cruz; Young Eun Lee; Sun Min Kim; Stanislav Jaracz; Marisa C. Kozlowski (2025). Excel, ORD, and NMR spectroscopy files for: Total synthesis of Honokiol via oxidative phenol coupling [Dataset]. http://doi.org/10.5061/dryad.ghx3ffc2d
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.ghx3ffc2d
Dataset updated
Sep 25, 2025
Dataset provided by
Dryad
Authors
Emmanuel Moya Cruz; Young Eun Lee; Sun Min Kim; Stanislav Jaracz; Marisa C. Kozlowski
Time period covered
Aug 17, 2025
Description
Herein, we disclose the total synthesis of honokiol in six steps with an overall yield of 66%. Two distinct routes were explored, with the key steps being highly efficient and selective cross-couplings of commercially available phenols to construct the main biphenolic backbone. The routes employ inexpensive reagents and are scalable, high-yielding processes. The experimental procedures are reported in the conventional narrative format and in two machine-readable formats.
Reaction network for solids - datasets and results.
data.dtu.dk
zip
Updated Oct 14, 2024
Rasmus Fromsejer; Bjørn Maribo-Mogensen; Georgios Kontogeorgis; Xiaodong Liang (2024). Reaction network for solids - datasets and results. [Dataset]. http://doi.org/10.11583/DTU.25897420.v1
Available download formats: zip
Unique identifier
https://doi.org/10.11583/DTU.25897420.v1
Dataset updated
Oct 14, 2024
Dataset provided by
Technical University of Denmark
Authors
Rasmus Fromsejer; Bjørn Maribo-Mogensen; Georgios Kontogeorgis; Xiaodong Liang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data repository was created by Rasmus Fromsejer (Technical University of Denmark) to supplement the research paper "Accurate Formation Enthalpies of Solids Using Reaction Networks" by Rasmus Fromsejer, Bjørn Maribo-Mogensen, Georgios Kontogeorgis and Xiaodong Liang in npj Computational Materials.

The data repository consists of:
- a directory with results and databases in .csv format, excluding detailed information about the reactions used in the reaction network predictions (.csv/)
- a directory with the results in gzipped .pkl format, including detailed information about the reactions used in the reaction network predictions (.pkl/)

Refer to the READMEs in the aforementioned directories for detailed information about the directories, files and the file contents.
Chemical reactions from US patents (1976-Sep2016)
figshare.com
7z
Updated Jun 13, 2017
Daniel Lowe (2017). Chemical reactions from US patents (1976-Sep2016) [Dataset]. http://doi.org/10.6084/m9.figshare.5104873.v1
Available download formats: 7z
Unique identifier
https://doi.org/10.6084/m9.figshare.5104873.v1
Dataset updated
Jun 13, 2017
Dataset provided by
Figshare (http://figshare.com/)
Authors
Daniel Lowe
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Reactions extracted by text-mining from United States patents published between 1976 and September 2016. The reactions are available as CML or reaction SMILES. Note that the reaction SMILES are derived from the CML. The files can be unzipped using a program like 7-Zip.

The reactions were extracted using an enhanced version of the reaction extraction code described in https://www.repository.cam.ac.uk/handle/1810/244727 with LeadMine (https://www.nextmovesoftware.com/leadmine.html) used for chemical entity recognition.

General tips:
- Duplicate reactions are frequent due to the same or highly similar text occurring in multiple patents. This is especially true when combining the applications and grants datasets, as many reactions from applications will later appear in patent grants.
- Paragraph numbers are only present for 2005+ patent grants and patent applications.
- Multiple reactions can be extracted from the same paragraph.
- Atom maps in the reaction SMILES are derived using Epam's Indigo toolkit. While typically correct, the atom maps are wrong in many cases and hence should not be entirely relied on.

The reactions have been filtered to remove common cases of incorrectly extracted reactions:
- All product atoms must be accounted for by the atom-mapping
- The product(s) must have >8 heavy atoms
- The product must not be charged if it is a single component
- The number of products must be
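Since the description warns that duplicate reactions are frequent, a first-pass de-duplication might look like the sketch below. It works purely on strings of the form reactants>agents>products and ignores component order within each part; it does not canonicalize SMILES, so two spellings of the same molecule, or the same reaction with different atom-map numbers, would still count as distinct. The function names and the sample reactions are hypothetical and not taken from the dataset.

```python
def canonical_key(reaction_smiles):
    """Build an order-insensitive key from a 'reactants>agents>products' string."""
    parts = reaction_smiles.strip().split(">")
    if len(parts) != 3:
        raise ValueError(f"expected two '>' separators: {reaction_smiles!r}")
    # Sort the dot-separated components of each part; an empty part becomes ().
    return tuple(tuple(sorted(part.split("."))) if part else () for part in parts)


def deduplicate(reaction_smiles_list):
    """Keep the first occurrence of each reaction, preserving input order."""
    seen = set()
    unique = []
    for reaction in reaction_smiles_list:
        key = canonical_key(reaction)
        if key not in seen:
            seen.add(key)
            unique.append(reaction)
    return unique


# Made-up examples: the first two differ only in reactant order.
sample = [
    "CCO.CC(=O)O>>CC(=O)OCC",
    "CC(=O)O.CCO>>CC(=O)OCC",
    "CCO>>CC=O",
]
unique_reactions = deduplicate(sample)
```

A stricter pass would canonicalize each molecule with a cheminformatics toolkit before building the key, at the cost of an extra dependency.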
ORD_Ahneman_2018
huggingface.co
Carl Mauro, ORD_Ahneman_2018 [Dataset]. https://huggingface.co/datasets/cmmauro/ORD_Ahneman_2018
Authors
Carl Mauro
Description
BIOINF595 W2025 Bioactivity Project Dataset Author: Carl Mauro The reaction data used in this project is from the following publication, accessed through the Open Reaction Database (https://open-reaction-database.org/). The original data is used under an MIT license, and is under copyright by the original authors (see LICENSE.txt file for details). Ahneman, D. T.; Estrada, J. G.; Lin, S.; Dreher, S. D.; Doyle, A. G. Predicting Reaction Performance in C–N Cross-Coupling Using Machine… See the full description on the dataset page: https://huggingface.co/datasets/cmmauro/ORD_Ahneman_2018.
PlantCyc
neuinfo.org
Updated Jan 29, 2022
(2022). PlantCyc [Dataset]. http://identifiers.org/RRID:SCR_002110
Unique identifier
https://identifiers.org/RRID:SCR_002110
Dataset updated
Jan 29, 2022
Description
Multi species reference database. Comprehensive plant biochemical pathway database, containing curated information from literature and computational analyses about genes, enzymes, compounds, reactions, and pathways involved in primary and secondary metabolism.
RMG-DB-11: Enumerating Reaction Space for Small Molecule Chemistry
data.niaid.nih.gov
Updated Jan 8, 2024
Kevin Spiekermann (2024). RMG-DB-11: Enumerating Reaction Space for Small Molecule Chemistry [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8144352
Dataset updated
Jan 8, 2024
Dataset provided by
MIT
Authors
Kevin Spiekermann
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository presents approximately 750 million atom-mapped reaction SMILES. Reactions are generated by applying templates from the Reaction Mechanism Generator (RMG) database to a subset of the species from GDB11. Thus, we refer to this dataset as RMG-DB-11 i.e., the Reaction Mechanism Generator Database whose species contain up to 11 heavy atoms. All SMILES have been canonicalized by RDKit. All reactions are labeled with their corresponding RMG template.

This data serves as a crucial starting point for quantitative predictive chemistry. Many methods that search for transition state structures require atom-mapped SMILES, which this repository provides. This data is also well-suited for unsupervised pre-training of various machine learning models.

To parse the data with Python, start with import pandas as pd. Reactions with 1-8 heavy atoms can be parsed using the following code snippet: pd.read_csv(). Reactions with 9 heavy atoms can be parsed using pd.read_pickle(, compression='zip'). The file names below include the word "zip" as a helpful hint to use the compression argument. Due to the large number of reactions with 10 and 11 heavy atoms, these are split into smaller chunks. First untar the file using tar -xvf to obtain several zipped pickle files that can each be parsed using the same method as with 9 heavy atoms.
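The file names in the parsing instructions above did not survive in this listing, so the sketch below wraps the two pandas calls in small helpers that take the path as an argument. The helper names are hypothetical and any path passed in is a placeholder for a file from the repository; only `pd.read_csv` and `pd.read_pickle(..., compression='zip')` come from the description.

```python
import pandas as pd


def load_csv_reactions(csv_path):
    """Reactions with 1-8 heavy atoms: read a plain CSV file."""
    return pd.read_csv(csv_path)


def load_zipped_pickle_reactions(pickle_path):
    """Reactions with 9 or more heavy atoms: read a zip-compressed pickle.

    For 10 and 11 heavy atoms, untar the archive first (tar -xvf) and call
    this once per extracted chunk.
    """
    return pd.read_pickle(pickle_path, compression="zip")


def load_chunks(pickle_paths):
    """Concatenate several zipped pickle chunks into one DataFrame."""
    frames = [load_zipped_pickle_reactions(path) for path in pickle_paths]
    return pd.concat(frames, ignore_index=True)
```

With roughly 750 million reactions in total, concatenating every chunk at once may not fit in memory, so processing one chunk at a time is likely the safer default. As with any pickle, only load files from a source you trust.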
Data from: General reactive machine learning potentials for CHON elements
figshare.com
application/x-gzip
Updated Jun 13, 2025
bowen li (2025). General reactive machine learning potentials for CHON elements [Dataset]. http://doi.org/10.6084/m9.figshare.29311196.v1
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.29311196.v1
Dataset updated
Jun 13, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
bowen li
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data of the article "General reactive machine learning potentials for CHON elements"
Pathway Analysis Tool for Integration and Knowledge Acquisition
neuinfo.org
dknet.org
Updated Sep 8, 2024
(2024). Pathway Analysis Tool for Integration and Knowledge Acquisition [Dataset]. http://identifiers.org/RRID:SCR_002100
Unique identifier
https://identifiers.org/RRID:SCR_002100
Dataset updated
Sep 8, 2024
Description
The human pathway database which contains different biological entities and reactions and software tools for analysis. PATIKA Database integrates data from several sources, including Entrez Gene, UniProt, PubChem, GO, IntAct, HPRD, and Reactome. Users can query and access this data using the PATIKAweb query interface. Users can also save their results in XML or export to common picture formats. The BioPAX and SBML exporters can be used as part of this Web service.
EAWAG Biocatalysis/Biodegradation Database reaction
bioregistry.io
Updated Apr 30, 2021
(2021). EAWAG Biocatalysis/Biodegradation Database reaction [Dataset]. https://bioregistry.io/umbbd.reaction
Dataset updated
Apr 30, 2021
Description
The University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) contains information on microbial biocatalytic reactions and biodegradation pathways for primarily xenobiotic, chemical compounds. The goal of the UM-BBD is to provide information on microbial enzyme-catalyzed reactions that are important for biotechnology. This collection refers to reaction information.
Data from: NIST Chemistry WebBook - SRD 69
webbook.nist.gov
data.nist.gov
Updated Oct 9, 2023
National Institute of Standards and Technology (2023). NIST Chemistry WebBook - SRD 69 [Dataset]. http://doi.org/10.18434/T4D303
Unique identifier
https://doi.org/10.18434/T4D303
Dataset updated
Oct 9, 2023
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
License
https://www.nist.gov/open/copyright-fair-use-and-licensing-statements-srd-data-software-and-technical-series-publications#SRD
Description
The NIST Chemistry WebBook provides users with easy access to chemical and physical property data for chemical species through the internet. The data provided in the site are from collections maintained by the NIST Standard Reference Data Program and outside contributors. Data in the WebBook system are organized by chemical species. The WebBook system allows users to search for chemical species by various means. Once the desired species has been identified, the system will display data for the species. Data include thermochemical properties of species and reactions, thermophysical properties of species, and optical, electronic and mass spectra.
Data from: BioRGroup dataset: R-group expansion of ChEBI molecules...
entrepot.recherche.data.gouv.fr
application/x-gzip
Updated Sep 18, 2025
Guillaume GRICOURT; Guillaume GRICOURT; Faulon; Faulon (2025). BioRGroup dataset: R-group expansion of ChEBI molecules referenced in the Rhea database [Dataset]. http://doi.org/10.57745/V3URYA
Available download formats: application/x-gzip (2773751200)
Unique identifier
https://doi.org/10.57745/V3URYA
Dataset updated
Sep 18, 2025
Dataset provided by
Recherche Data Gouv
Authors
Guillaume GRICOURT; Guillaume GRICOURT; Faulon; Faulon
License
https://spdx.org/licenses/etalab-2.0.html
Dataset funded by
Agence nationale de la recherche
Description
This dataset transforms the many generic molecules in the ChEBI ontology—those whose structures contain undefined R‑groups—into fully specified molecular instances. Its purpose is to let cheminformaticians, enzymologists and AI/ML developers treat R‑group–bearing ChEBI entries as ordinary molecules, so they can be indexed, searched and used to augment training sets for tasks such as reaction prediction, bio‑isosteric replacement and retro‑biosynthetic pathway design. In nature, the resource is a gzip‑compressed CSV file produced by a three‑stage RDKit‑based pipeline that: 1. Extracts every ChEBI SMILES that contains at least one R‑group from the Rhea reaction database (release 134). 2. Finds real PubChem compounds whose heavy‑atom core matches the ChEBI scaffold, allowing only the R‑group position to vary. 3. Filters matches so that the final list comprises molecules differing from the template only at the R‑group site, and records their PubChem CIDs for traceability. Each record therefore links a generic ChEBI structure to the enumerated set of concrete PubChem structures that realise it, along with molecular weight, heavy‑atom count and bookkeeping fields that distinguish “exact core” versus “core + extra substituent” matches. The dataset’s scope encompasses all R‑group–containing entries in Rhea/ChEBI that survive atomic filters (≥ 6 heavy atoms and atoms found in living organisms), yielding 12,709 rows and eight columns that summarise: the canonical SMILES, the list of ChEBI IDs sharing that SMILES, computed properties, matched PubChem SMILES/CIDs with and without extra substituents, and provenance metadata. 
By expanding more than a thousand otherwise unusable generic templates into over ten thousand explicit molecules, the dataset bridges a long‑standing gap between curated biochemical ontologies and large‑scale public compound repositories, enabling systematic benchmarking, data augmentation and method development wherever R‑groups once forced researchers to discard valuable reaction data.
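Because the resource is described as a gzip-compressed CSV, and the size figure in the listing suggests a large file if it is in bytes, a streaming reader is one reasonable way to inspect it. The sketch below uses only the Python standard library. The function names are hypothetical, and the column names are read from the file's header row instead of being assumed here.

```python
import csv
import gzip


def iter_rows(path):
    """Yield each CSV record as a dict, decompressing on the fly."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


def summarize(path):
    """Return (row_count, column_names) without loading the whole file."""
    row_count = 0
    columns = []
    for row in iter_rows(path):
        if row_count == 0:
            columns = list(row.keys())
        row_count += 1
    return row_count, columns
```

On the real file, `summarize` would be expected to report the 12,709 rows and eight columns mentioned in the description; a mismatch would point to a different release or a parsing problem such as an unexpected delimiter.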
