100+ datasets found

Disconnection Labelled Reaction Data
zenodo.org
bin, csv
Updated Sep 23, 2022
Cite
Amol Thakkar; Alain Vaucher; Andrea Byekwaso; Philippe Schwaller; Alessandra Toniato; Teodoro Laino (2022). Disconnection Labelled Reaction Data [Dataset]. http://doi.org/10.5281/zenodo.7101695
Explore at:
bin, csv (available download formats)
Unique identifier
https://doi.org/10.5281/zenodo.7101695
Dataset updated
Sep 23, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Amol Thakkar; Alain Vaucher; Andrea Byekwaso; Philippe Schwaller; Alessandra Toniato; Teodoro Laino
Description
Dataset containing reaction centers used to train the disconnection aware model
NIST Chemical Kinetics Database
gimi9.com
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
Updated Feb 1, 2002
Cite
(2002). NIST Chemical Kinetics Database [Dataset]. https://gimi9.com/dataset/data-gov_nist-chemical-kinetics-database-bee86
Explore at:
Dataset updated
Feb 1, 2002
Description
The NIST Chemical Kinetics Database includes essentially all reported kinetics results for thermal gas-phase chemical reactions. The database is designed to be searched for kinetics data based on the specific reactants involved, for reactions resulting in specified products, for all the reactions of a particular species, or for various combinations of these. In addition, the bibliography can be searched by author name or combination of names. The database contains in excess of 38,000 separate reaction records for over 11,700 distinct reactant pairs. These data have been abstracted from over 12,000 papers with literature coverage through early 2000. Rate constant records for a specified reaction are found by searching the Reaction Database. All rate constant records for that reaction are returned, with a link to 'Details' on that record. Each rate constant record contains the following information (as available): a) Reactants and, if defined, reaction products; b) Rate parameters: A, n, Ea/R, where k = A* (T/298)**n exp[-(Ea/R)/T], where T is the temperature in Kelvins; c) Uncertainty in A, n, and Ea/R, if reported; d) Temperature range of experiment or temperature range of validity of a review or theoretical paper; e) Pressure range and bulk gas of the experiment; f) Data type of the record (i.e., experimental, relative rate measurement, theoretical calculation, modeling result, etc.). If the result is a relative rate measurement, then the reaction to which the rate is relative is also given; g) Experimental procedure, including separate fields for the description of the apparatus, the time resolution of the experiment, and the excitation technique. A majority of contemporary chemical kinetics methods are represented. The Kinetics Database is being expanded to include other resources for the convenience of the users. Presently this includes direct links to the corresponding NIST WebBook page for all substances for which such a link is possible. 
This is indicated by underlining and highlighting the species. The WebBook provides thermodynamic, spectral, and other data on the species. Note that the link to the WebBook is opened as a new frame in your browser.
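As a rough illustration of the rate expression quoted in the description, k = A (T/298)**n exp[-(Ea/R)/T], the sketch below evaluates it in Python. The function name and the numeric inputs are hypothetical and are not taken from any database record; k carries whatever units A has.

```python
import math


def rate_constant(A, n, Ea_over_R, T):
    """Evaluate k = A * (T/298)**n * exp(-(Ea/R)/T).

    A          pre-exponential factor (sets the units of k)
    n          temperature exponent (dimensionless)
    Ea_over_R  activation energy divided by the gas constant, in kelvin
    T          temperature in kelvin
    """
    if T <= 0:
        raise ValueError("T must be a positive temperature in kelvin")
    return A * (T / 298.0) ** n * math.exp(-Ea_over_R / T)


# Hypothetical parameters, for illustration only.
k_example = rate_constant(A=1.0e-11, n=0.5, Ea_over_R=1200.0, T=500.0)
print(f"k(500 K) = {k_example:.3e}")
```

Keeping Ea/R as a single argument mirrors how the description lists the rate parameters (A, n, Ea/R), so a record's values could be passed in without converting the activation energy first.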
Yield curation USPTO rsmi/csv datasets
figshare.com
zip
Updated May 31, 2023
Cite
Alexander Minidis (2023). Yield curation USPTO rsmi/csv datasets [Dataset]. http://doi.org/10.6084/m9.figshare.14414039.v1
Explore at:
zip (available download formats)
Unique identifier
https://doi.org/10.6084/m9.figshare.14414039.v1
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Alexander Minidis
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
In 2017 Lowe shared curated and published USPTO based chemical reaction datasets in csv format. Based on this, Schwaller et al. published curated reaction smiles (they in turn used the curated set disclosed by Jin and coworkers). Both versions have the drawback of containing only partially curated yields. In those datasets, two columns are available, TextMinedYield and Calculated yield. Many entries contain no numbers, partial numbers, or incorrect numbers. For certain forms of reaction analysis focusing on yield as the only available correlation, that information becomes essentially useless since there is no correlation to reaction conditions (unless one would data-mine the CML files or original XML). By correcting and merging the yield into a new column, followed by eliminating faulty entries, the noise in the data set is reduced. The new datasets are reduced by nearly 50%. Attached are two kinds of datasets (of each, Lowe and Schwaller): A "cropped" version, containing only the reaction smiles and the curated yield (and an added ID), and only entries with valid yields. Everything else was filtered out. A second type, a "full" version, including the curated yields and all original input columns and entries (no filtration). The latter might come in handy for other applications where one doesn't agree with the applied removal of invalid entries, or to apply further curation. More details can be found on Github containing Python scripts used to procure the attached datasets and a Readme file. For the less adept programmer, a graphical workflow based on the open-source data analysis platform Knime(R) is also available. The latter contains furthermore a proof of concept reaction splitter (data not included here).
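The merge-then-filter idea described above could be sketched roughly as follows. This is not the author's script: the preference for the text-mined value, the 0 to 100 validity range, the column key "CalculatedYield", the output key "CuratedYield", and the sample rows are all assumptions made for illustration.

```python
def parse_yield(raw):
    """Return a yield as a float in [0, 100], or None if missing or implausible."""
    if raw is None:
        return None
    text = str(raw).strip().rstrip("%").strip()
    try:
        value = float(text)
    except ValueError:
        return None
    # NaN and infinities fail this range check and are rejected as well.
    return value if 0.0 <= value <= 100.0 else None


def curate(rows):
    """Merge the two yield columns into one and drop rows without a valid yield.

    Assumed rule: prefer the text-mined yield, fall back to the calculated one.
    """
    curated = []
    for row in rows:
        text_mined = parse_yield(row.get("TextMinedYield"))
        calculated = parse_yield(row.get("CalculatedYield"))
        merged = text_mined if text_mined is not None else calculated
        if merged is not None:
            curated.append({**row, "CuratedYield": merged})
    return curated


# Hypothetical rows, for illustration only.
rows = [
    {"TextMinedYield": "85%", "CalculatedYield": "84.2"},
    {"TextMinedYield": "", "CalculatedYield": "120"},    # no valid yield: dropped
    {"TextMinedYield": None, "CalculatedYield": "40%"},
]
curated_rows = curate(rows)
print([r["CuratedYield"] for r in curated_rows])
```

Returning the original columns alongside the merged one corresponds loosely to the "full" variant; keeping only the reaction string and the merged yield would correspond to the "cropped" one.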
ORD_Ahneman_2018
huggingface.co
Cite
Carl Mauro, ORD_Ahneman_2018 [Dataset]. https://huggingface.co/datasets/cmmauro/ORD_Ahneman_2018
Explore at:
Authors
Carl Mauro
Description
BIOINF595 W2025 Bioactivity Project Dataset Author: Carl Mauro The reaction data used in this project is from the following publication, accessed through the Open Reaction Database (https://open-reaction-database.org/). The original data is used under an MIT license, and is under copyright by the original authors (see LICENSE.txt file for details). Ahneman, D. T.; Estrada, J. G.; Lin, S.; Dreher, S. D.; Doyle, A. G. Predicting Reaction Performance in C–N Cross-Coupling Using Machine… See the full description on the dataset page: https://huggingface.co/datasets/cmmauro/ORD_Ahneman_2018.
ORDerly Transformer Models for chemical tasks
figshare.com
bin
Updated Jul 14, 2025
Cite
Daniel Wigh; Joe Arrowsmith; Kobi Felton; Alexander Pomberger; Alexei A. Lapkin (2025). ORDerly Transformer Models for chemical tasks [Dataset]. http://doi.org/10.6084/m9.figshare.29552543.v4
Explore at:
bin (available download formats)
Unique identifier
https://doi.org/10.6084/m9.figshare.29552543.v4
Dataset updated
Jul 14, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Daniel Wigh; Joe Arrowsmith; Kobi Felton; Alexander Pomberger; Alexei A. Lapkin
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Transformer models trained on tasks in organic chemistry on ORDerly benchmark datasets.
ORDerly_retro: Retrosynthesis prediction (predict reactants given a desired product).
ORDerly_forward_separated: Forward reaction prediction (predict reaction products given reactants, solvents, and agents), with reactants separated by > from the solvents and agents in the reaction string.
ORDerly_forward_mixed: Forward reaction prediction (predict reaction products given reactants, solvents, and agents), with reactants, solvents and agents mixed together in the reaction string.
non-uspto-eval: Evaluation of transformer models trained on USPTO data on non-USPTO data available in the Open Reaction Database.
Full details can be found in our paper: https://chemrxiv.org/engage/chemrxiv/article-details/64ca5d3e4a3f7d0c0d78ca42
NeurIPS workshop paper: https://openreview.net/forum?id=R8FQMsECIS
Code: https://github.com/sustainable-processes/orderly
The supplementary datasets used for this work can be found here: https://doi.org/10.6084/m9.figshare.23502372.v3
Transformer model architecture is from Molecular Transformer: https://pubs.acs.org/doi/10.1021/acscentsci.9b00576
Find the results, models, and checkpoints within MolecularTransformer/experiments. Note that the "wandb" folder was deleted since figshare only allows uploads up to 500 files.
Notes: There's a limit of 500 files in figshare, so I deleted the "docs", "onmt", "OpenNMT_py.egg-info", and "tools" folders from all folders except "ORDerly_retro". I also deleted all wandb-associated files and all checkpoint files. Empty files cannot be uploaded to figshare, so you have to create these yourself, where appropriate (e.g. MolecularTransformer/onmt/tests/__init__.py and non-uspto-eval/MolecularTransformer/experiments/models/ofs_1.pt).
Feel free to email me, Daniel Wigh, at dsw46@cam.ac.uk or daniel@reactwise.com or my supervisor Alexei A. Lapkin.
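To make the "separated" versus "mixed" distinction concrete, here is a small sketch that splits a reaction string on ">" and then on ".". It assumes the conventional reaction SMILES layout reactants>agents>products; the description does not spell out the exact ORDerly string layout, so both the layout and the example reaction are assumptions, and the function name is hypothetical.

```python
def split_reaction_smiles(rxn):
    """Split 'reactants>agents>products' into three lists of molecule SMILES.

    Molecules within each section are separated by '.'; an empty section
    (as in 'A>>B') yields an empty list.
    """
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators in a reaction string")
    reactants, agents, products = (
        [mol for mol in part.split(".") if mol] for part in parts
    )
    return reactants, agents, products


# Illustrative esterification with sulfuric acid written as the agent.
separated = "CCO.CC(=O)O>OS(=O)(=O)O>CCOC(C)=O.O"
reactants, agents, products = split_reaction_smiles(separated)
print(reactants, agents, products)

# A "mixed" input would instead place every input molecule in one section.
mixed_input = ".".join(reactants + agents)
print(mixed_input)
```

In the separated form a model can tell which inputs contribute atoms to the product; in the mixed form it has to infer that on its own.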
Data from: AiZynthFinder: a fast, robust and flexible open-source software...
figshare.com
hdf
Updated May 30, 2023
Cite
Samuel Genheden; Esben Jannik Bjerrum; Amol Thakkar; Jean-Louis Reymond; Veronika Chadimova; Ola Engkvist (2023). AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning [Dataset]. http://doi.org/10.6084/m9.figshare.12334577.v1
Explore at:
hdf (available download formats)
Unique identifier
https://doi.org/10.6084/m9.figshare.12334577.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Samuel Genheden; Esben Jannik Bjerrum; Amol Thakkar; Jean-Louis Reymond; Veronika Chadimova; Ola Engkvist
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This is public data to be used with the aizynthfinder tool for retrosynthesis planning (https://github.com/MolecularAI/aizynthfinder). There are three files available:
* full_uspto_03_05_19_rollout_policy.hdf5 - the Keras neural network model used as rollout policy
* full_uspto_03_05_19_unique_templates.hdf5 - unique template codes that are used together with the policy to generate new precursors in the tree search
* zinc_stock_17_04_20.hdf - stock file made from the ZINC database on 17 April 2020.
Canada Vigilance Adverse Reaction Online Database
ouvert.canada.ca
open.canada.ca
html, json, xml, zip
Updated May 28, 2025
Cite
Health Canada (2025). Canada Vigilance Adverse Reaction Online Database [Dataset]. https://ouvert.canada.ca/data/dataset/9cbaef00-b52c-4a70-9fed-d9aa8263ab74
Explore at:
json, xml, html, zip (available download formats)
Dataset updated
May 28, 2025
Dataset provided by
Health Canada (http://www.hc-sc.gc.ca/)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Area covered
Canada
Description
The data extract is a series of compressed ASCII text files of the full data set contained in the Canada Vigilance Adverse Reaction Online Database. It is intended for users who are familiar with database structures and setting up their own queries. Find details on the data structure required for the data file in the Canada Vigilance Adverse Reaction Online Database - Data Structure. In order to use the data, the file must be loaded into an existing database or information system provided by the user. The Canada Vigilance Adverse Reaction Online Database contains information about suspected adverse reactions (also known as side effects) to health products, captured from adverse reaction reports submitted to Health Canada by consumers and health professionals, who submit reports voluntarily, as well as by market authorization holders (manufacturers and distributors), who are required to submit reports according to the Food and Drugs Regulations. Information concerning vaccines used for immunization have only been included in the database since January 1, 2011. Indication data has recently been added to the data extract files and the Detailed Adverse Reaction Report. Indication refers to the particular condition for which a health product was taken. For example, diabetes is an indication for insulin. Health products are often authorised for use in treating more than one indication. Note: The database cannot be used on its own to evaluate a health product's safety profile. It does not provide conclusive information on the safety of health products, and is not a substitute for medical advice. Should you have an issue of medical concern, consult a qualified health professional.
Data from: Chemical Kinetics Bayesian Inference Toolbox (CKBIT)
data.mendeley.com
Updated May 17, 2021
Cite
Maximilian Cohen (2021). Chemical Kinetics Bayesian Inference Toolbox (CKBIT) [Dataset]. http://doi.org/10.17632/tnzk2jvffs.2
Explore at:
Unique identifier
https://doi.org/10.17632/tnzk2jvffs.2
Dataset updated
May 17, 2021
Authors
Maximilian Cohen
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The robust estimation of chemical kinetic parameters and their associated uncertainty is essential in the field of chemistry and catalysis. The Chemical Kinetics Bayesian Inference Toolbox (CKBIT) is a Python software library introduced to enable users to implement advanced Bayesian inference techniques for kinetic parameter estimation and uncertainty quantification. Leveraging functionalities of other open source Python packages and offering simplified implementation through minimal user-required coding and straightforward Excel input files, CKBIT aspires to make the inference method easily accessible for chemical kinetics. CKBIT provides maximum a posteriori, Markov chain Monte Carlo, and variational inference estimation options. Users may apply these functionalities to estimate activation energies, reaction orders, and pre-exponential terms from chemical reaction data from batch reactors, continuous stirred-tank reactors, and plug flow reactors. The availability of prior distribution specification and the implementation of hierarchical modeling in CKBIT provide a heightened level of accuracy in estimates of kinetic parameters and their uncertainties.
Ames Quantum Chemistry - Dataset - NASA Open Data Portal
data.nasa.gov
Updated Mar 31, 2025
Cite
nasa.gov (2025). Ames Quantum Chemistry - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/ames-quantum-chemistry
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
Ames Quantum Chemistry Dataset collects electronic structure, reaction kinetics, and dynamics data calculated at Ames Research Center. This includes potential energy curves and surfaces as well as the reaction cross sections and rate coefficients.
ROSETTA REACTION WHEEL ENGINEERING DATA - Dataset - NASA Open Data Portal
data.nasa.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
Updated Mar 31, 2025
Cite
nasa.gov (2025). ROSETTA REACTION WHEEL ENGINEERING DATA - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/rosetta-reaction-wheel-engineering-data
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
This CODMAC level 3 data set contains the key housekeeping parameters of the four reaction wheels. In particular, it provides information on reaction wheel friction, measured angular momentum, and wheel direction. It covers the period from launch in 2004 through the three Earth flybys and one Mars flyby, the hibernation phases, and the asteroid flybys, and finally the Prelanding, Comet Escort, and Extension phases at the prime target of the mission, comet 67P/Churyumov-Gerasimenko 1 (1969 R1). Version V1.0 is the first version of this dataset.
Canada Vigilance Adverse Reaction Online Database - Data Structure
ouvert.canada.ca
open.canada.ca
html
Updated Apr 28, 2022
Cite
Health Canada (2022). Canada Vigilance Adverse Reaction Online Database - Data Structure [Dataset]. https://ouvert.canada.ca/data/dataset/786f35a3-6170-4419-92c5-5834f071d8bc
Explore at:
html (available download formats)
Dataset updated
Apr 28, 2022
Dataset provided by
Health Canada (http://www.hc-sc.gc.ca/)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Area covered
Canada
Description
Although the Canada Vigilance Adverse Reaction Online Database is a relational database, there is a requirement to provide the data to users in a common format; therefore the data has been extracted into a flat file format. All files are dollar-sign ($) delimited, with fields enclosed in double quotes.
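A minimal sketch of reading such a file with Python's standard csv module, assuming each field is wrapped in double quotes and fields are separated by a dollar sign. The sample rows and their contents are invented for illustration; the real column layout is defined in the data structure documentation.

```python
import csv
import io


def read_dollar_delimited(text):
    """Parse dollar-delimited, double-quoted text into a list of rows."""
    reader = csv.reader(io.StringIO(text), delimiter="$", quotechar='"')
    return [row for row in reader]


# Hypothetical sample: a '$' inside quotes is data, not a separator.
sample = '"1"$"Drug A"$"costs $5"\n"2"$"Drug B"$""\n'
rows = read_dollar_delimited(sample)
for row in rows:
    print(row)
```

When reading from an actual file rather than a string, the csv documentation recommends opening it with newline="" so that quoted fields containing line breaks are handled correctly.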
Data from: Incomplete evidence: the inadequacy of databases in tracing...
odgavaprod.ogopendata.com
healthdata.gov
html
Updated Jul 23, 2025
Cite
National Institutes of Health (2025). Incomplete evidence: the inadequacy of databases in tracing published adverse drug reactions in clinical trials [Dataset]. https://odgavaprod.ogopendata.com/dataset/incomplete-evidence-the-inadequacy-of-databases-in-tracing-published-adverse-drug-reactions-in-
Explore at:
html (available download formats)
Dataset updated
Jul 23, 2025
Dataset provided by
National Institutes of Health
Description
Background We would expect information on adverse drug reactions in randomised clinical trials to be easily retrievable from specific searches of electronic databases. However, complete retrieval of such information may not be straightforward, for two reasons. First, not all clinical drug trials provide data on the frequency of adverse effects. Secondly, not all electronic records of trials include terms in the abstract or indexing fields that enable us to select those with adverse effects data. We have determined how often automated search methods, using indexing terms and/or textwords in the title or abstract, would fail to retrieve trials with adverse effects data.

Methods We used a sample set of 107 trials known to report frequencies of adverse drug effects, and measured the proportion that (i) were not assigned the appropriate adverse effects indexing terms in the electronic databases, and (ii) did not contain identifiable adverse effects textwords in the title or abstract.

Results Of the 81 trials with records on both MEDLINE and EMBASE, 25 were not indexed for adverse effects in either database. Twenty-six trials were indexed in one database but not the other. Only 66 of the 107 trials reporting adverse effects data mentioned this in the abstract or title of the paper. Simultaneous use of textword and indexing terms retrieved only 82/107 (77%) papers.

Conclusions Specific search strategies based on adverse effects textwords and indexing terms will fail to identify nearly a quarter of trials that report on the rate of drug adverse effects.
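The "nearly a quarter" in the conclusion follows directly from the 82/107 figure in the results. A short check of the arithmetic, using only the counts quoted above (variable names are illustrative):

```python
total_trials = 107   # trials known to report adverse effect frequencies
retrieved = 82       # found by combined textword and indexing-term search

missed = total_trials - retrieved
recall = retrieved / total_trials
missed_fraction = missed / total_trials

print(f"retrieved: {retrieved}/{total_trials} = {recall:.1%}")       # 76.6%
print(f"missed:    {missed}/{total_trials} = {missed_fraction:.1%}")  # 23.4%
```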
Data from: Ring-Opening Reactions of Tetrahydrofuran versus Alkyne...
figshare.com
txt
Updated Jun 1, 2023
Cite
Torsten Beweries; Ulrike Jäger-Fiedler; Marc A. Bach; Vladimir V. Burlakov; Perdita Arndt; Wolfgang Baumann; Anke Spannenberg; Uwe Rosenthal (2023). Ring-Opening Reactions of Tetrahydrofuran versus Alkyne Complexation by Group 4 Metallocene Complexes Leading to General Consequences for Synthesis and Reactions of Metallocene Complexes [Dataset]. http://doi.org/10.1021/om0702173.s006
Explore at:
txt (available download formats)
Unique identifier
https://doi.org/10.1021/om0702173.s006
Dataset updated
Jun 1, 2023
Dataset provided by
ACS Publications
Authors
Torsten Beweries; Ulrike Jäger-Fiedler; Marc A. Bach; Vladimir V. Burlakov; Perdita Arndt; Wolfgang Baumann; Anke Spannenberg; Uwe Rosenthal
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
The reduction of certain group 4 metallocene dichlorides by magnesium or lithium in the presence or absence of Me3SiC2SiMe3 in THF or toluene was investigated, giving in the case of titanium the dinuclear Ti(III) complex [rac-(ebthi)Ti(μ-Cl)]2 (1). For zirconium the 1-oxa-2-zirconacyclohexane 2 was formed by ring-opening reaction of rac-(ebthi)Zr(η2-Me3SiC2SiMe3) with THF. As a byproduct from the synthesis of Cp*2Zr(η2-Me3SiC2SiMe3) starting from Cp*2ZrCl2 another 1-oxa-2-zirconacyclohexane (3) was obtained by ring-opening reaction of THF via the dinuclear complex Cp*2Zr(Cl)-(CH2)4O−Zr(Cl)Cp*2 (4). In the case of hafnium the analogous dinuclear complex Cp*2Hf(Cl)−(CH2)4O−Hf(Cl)Cp*2 (5) and 1-oxa-2-hafnacyclohexane (6) were the main products of the reaction, inhibiting the synthesis of Cp*2Hf(η2-Me3SiC2SiMe3) (7). The tendency for ring opening of THF initiated by metallocenes increases in the series Ti, Zr, Hf, thus leading to consequences for the synthesis of metallocene complexes.
Data from: FoamPi: An open-source Raspberry Pi based apparatus for...
osf.io
Updated Sep 11, 2022
Cite
Harry Wright (2022). FoamPi: An open-source Raspberry Pi based apparatus for monitoring polyurethane foam reactions. [Dataset]. http://doi.org/10.17605/OSF.IO/U3295
Explore at:
Unique identifier
https://doi.org/10.17605/OSF.IO/U3295
Dataset updated
Sep 11, 2022
Dataset provided by
Center for Open Science (https://cos.io/)
Authors
Harry Wright
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Adiabatic temperature rise (ATR) is an important method for determining isocyanate conversion in polyurethane foam reactions as well as many other exothermic chemical reactions. ATR can be used in conjunction with change in height and mass measurements to gain understanding of the blowing and gelling reactions that occur during polyurethane foaming, as well as to give important information on cell morphology. FoamPi is an open-source Raspberry Pi device for monitoring polyurethane foaming reactions. The device monitors temperature rise, change in foam height, and changes in mass during the reaction. Three Python scripts are also presented. The first logs raw data during the reaction. The second corrects temperature data such that it can be used in ATR reactions for calculating isocyanate conversion; additionally, this script reduces noise in all the data and removes erroneous readings. The final script extracts important information from the corrected data, such as maximum temperature change and maximum height change, as well as the time to reach these points. Commercial examples of such equipment exist; however, their price (£10000) makes these systems inaccessible for many research laboratories. The FoamPi build presented is inexpensive (£350).
Strengthening of Calcite Assemblages through Chemical Complexation Reaction...
data.openei.org
catalog.data.gov
image
Updated Feb 4, 2021
Cite
Robert Choens; Jennifer Wilson; Anastasia Ilgen (2021). Strengthening of Calcite Assemblages through Chemical Complexation Reaction - Experimental Data [Dataset]. https://data.openei.org/submissions/4135
Explore at:
image (available download formats)
Dataset updated
Feb 4, 2021
Dataset provided by
Open Energy Data Initiative (OEDI)
Sandia National Laboratories
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
Authors
Robert Choens; Jennifer Wilson; Anastasia Ilgen
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Experimental data for the manuscript "Strengthening of Calcite Assemblages through Chemical Complexation Reaction" by R. C. Choens, J. Wilson, and A. G. Ilgen; Sandia National Laboratories. The data include scanning electron microscope images of various calcite assemblages along with experimental data.
OpenREACT-CHON-EFH — Open REaction Dataset of Atomic ConfiguraTions...
figshare.com
hdf
Updated May 29, 2025
Cite
Austin Rodriguez; Justin S. Smith; Jose L. Mendoza-Cortes (2025). OpenREACT-CHON-EFH — Open REaction Dataset of Atomic ConfiguraTions comprising C, H, O, N with Energies, Forces, and Hessians [Dataset]. http://doi.org/10.6084/m9.figshare.29189858.v4
Explore at:
hdf (available download formats)
Unique identifier
https://doi.org/10.6084/m9.figshare.29189858.v4
Dataset updated
May 29, 2025
Dataset provided by
figshare
Authors
Austin Rodriguez; Justin S. Smith; Jose L. Mendoza-Cortes
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
These datasets were used in the training and testing of Machine Learning Interatomic Potentials (MLIPs) as part of the work represented in the article titled "Does Hessian Data Improve the Performance of Machine Learning Potentials?".

RTP Dataset (Reactant–Transition State–Product Dataset): The RTP dataset forms the core training and evaluation set and consists of 35,087 molecular geometries sampled from 11,961 unique elementary reactions. For each reaction, three critical geometries are included: the optimized reactant, transition state (TS), and product. Each geometry is labeled with its corresponding DFT-computed potential energy, atomic forces, and Hessian matrix, calculated at the wb97xd/6-31g(d) level of theory. This dataset represents stationary points (critical points) on the potential energy surface and serves as the foundation for training the MLIPs to reproduce energies, gradients, and curvatures.

IRC Dataset (Intrinsic Reaction Coordinate Dataset): To assess the extrapolation performance of the trained MLIPs along continuous reaction pathways, a dataset of 34,248 geometries was compiled from 600 Intrinsic Reaction Coordinate (IRC) paths, each corresponding to a distinct elementary reaction in the RTP dataset. These geometries were obtained by following the minimum energy path (MEP) from the transition state to both reactant and product wells using quantum chemistry calculations at the wb97xd/6-31g(d) level of theory. While these geometries are not explicitly used in training, they provide a rigorous benchmark for evaluating the ability of MLIPs to generalize beyond training data and accurately model transition state connectivity and reaction dynamics.

NMS Dataset (Normal Mode Sampling Dataset): To evaluate MLIP robustness on off-equilibrium, perturbed structures, 62,527 geometries were generated via Normal Mode Sampling (NMS). These structures are derived by displacing intermediate IRC geometries along their vibrational modes with random amplitudes, simulating thermal fluctuations and non-equilibrium distortions. The properties of these perturbed structures were calculated at the wb97xd/6-31g(d) level of theory. This dataset allows for testing the model's stability and accuracy in more realistic, noisy molecular environments as encountered in molecular dynamics simulations or under experimental conditions.
Data extracts from the Canada Vigilance adverse reaction online database
open.canada.ca
ouvert.canada.ca
html
Updated Mar 30, 2024
Cite
Health Canada (2024). Data extracts from the Canada Vigilance adverse reaction online database [Dataset]. https://open.canada.ca/data/info/29f39ab3-24fc-4b0a-90b2-3c9f97a88158
Explore at:
html (available download formats)
Dataset updated
Mar 30, 2024
Dataset provided by
Health Canada (http://www.hc-sc.gc.ca/)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Area covered
Canada
Description
The data set is updated on a monthly basis and currently covers the following time period: 1965 to 2023-10-31. The data extract is a series of compressed ASCII text files of the full data set contained in the Canada Vigilance Adverse Reaction Online Database. It is intended for users who are familiar with database structures and setting up their own queries.
Canada Vigilance adverse reaction online database
open.canada.ca
datasets.ai
html
Updated Dec 2, 2024
Cite
Health Canada (2024). Canada Vigilance adverse reaction online database [Dataset]. https://open.canada.ca/data/info/98cad9a3-5b61-4c1e-965d-531804542560
Explore at:
html (available download formats)
Dataset updated
Dec 2, 2024
Dataset provided by
Health Canada (http://www.hc-sc.gc.ca/)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Area covered
Canada
Description
The Canada Vigilance Adverse Reaction Online Database contains information about suspected adverse reactions (also known as side effects) to health products.
Data from: NIST Chemistry WebBook - SRD 69
webbook.nist.gov
data.nist.gov
Updated Oct 9, 2023
Cite
National Institute of Standards and Technology (2023). NIST Chemistry WebBook - SRD 69 [Dataset]. http://doi.org/10.18434/T4D303
Explore at:
Unique identifier
https://doi.org/10.18434/T4D303
Dataset updated
Oct 9, 2023
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
License
https://www.nist.gov/open/copyright-fair-use-and-licensing-statements-srd-data-software-and-technical-series-publications#SRD
Description
The NIST Chemistry WebBook provides users with easy access to chemical and physical property data for chemical species through the internet. The data provided in the site are from collections maintained by the NIST Standard Reference Data Program and outside contributors. Data in the WebBook system are organized by chemical species. The WebBook system allows users to search for chemical species by various means. Once the desired species has been identified, the system will display data for the species. Data include thermochemical properties of species and reactions, thermophysical properties of species, and optical, electronic and mass spectra.
Data from: FOUNTAIN: A JAVA open-source package to assist large sequencing...
catalog.data.gov
Updated Jul 24, 2025
Cite
National Institutes of Health (2025). FOUNTAIN: A JAVA open-source package to assist large sequencing projects [Dataset]. https://catalog.data.gov/dataset/fountain-a-java-open-source-package-to-assist-large-sequencing-projects
Explore at:
Dataset updated
Jul 24, 2025
Dataset provided by
National Institutes of Health
Description
Background Better automation, lower cost per reaction and a heightened interest in comparative genomics have led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences.

Results We describe here a new program, named FOUNTAIN, for the management of large sequencing projects. FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented.

Conclusions A simple, but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. Open source and largely platform and database independent, we wish FOUNTAIN to be improved and extended in a community effort.
