Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Clustering algorithms underpin several technological applications and are fueling the development of rapidly evolving fields such as machine learning. Recently, however, it has become apparent that they face challenges when datasets span more spatial dimensions. In fact, the best-performing clustering algorithms scale linearly in the number of points but quadratically in the local density of points. In this work, we introduce qCLUE, a quantum clustering algorithm that scales linearly in both the number of points and their density. qCLUE is inspired by CLUE, an algorithm developed to meet the demanding time and memory budgets of Event Reconstruction (ER) in future High-Energy Physics experiments. As such, qCLUE combines decades of development with the quadratic speedup provided by quantum computers. We test qCLUE numerically in several scenarios, demonstrating its effectiveness and showing it to be a promising route for handling complex data analysis tasks, especially in high-dimensional datasets with high densities of points.
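The density dependence mentioned in the abstract can be illustrated with a minimal classical sketch of a tile-based local-density count, the kind of step a density-based clustering algorithm performs. This is not the authors' implementation: the function name `local_density`, the cutoff `dc`, and the restriction to 2D points are assumptions made for illustration only.

```python
from collections import defaultdict
from math import floor


def local_density(points, dc):
    """For each 2D point, count the points within distance dc (itself included)."""
    # Bin points into square tiles of side dc, so that only the 3x3 block of
    # tiles around a point has to be searched for its neighbours.
    tiles = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        tiles[(floor(x / dc), floor(y / dc))].append(idx)

    density = [0] * len(points)
    for i, (x, y) in enumerate(points):
        tx, ty = floor(x / dc), floor(y / dc)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # The work done here grows with the occupancy of nearby tiles.
                for j in tiles.get((tx + dx, ty + dy), ()):
                    px, py = points[j]
                    if (px - x) ** 2 + (py - y) ** 2 <= dc ** 2:
                        density[i] += 1
    return density


# Three nearby points and one isolated point (toy input).
demo = local_density([(0, 0), (0.5, 0), (0.4, 0.3), (5, 5)], dc=1.0)
```

In this sketch every point scans all points in its neighbouring tiles, so a tile holding m points costs on the order of m squared comparisons. That inner scan is the part whose cost depends on local density.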
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The PACHQA dataset contains the results of quantum chemical calculations for 3551 molecules: 3417 chlorinated polycyclic aromatic hydrocarbons (Cl-PAHs) with up to 6 rings and varying numbers of chlorine atoms, together with 134 parent polycyclic aromatic hydrocarbons (PAHs). Cl-PAHs, the products of incomplete combustion of organic substances and materials, are hazardous pollutants with carcinogenic and mutagenic activity. Quantum chemistry methods are important for understanding their formation mechanisms and properties. Large-scale calculations at different levels of quantum chemical theory are useful for training machine learning algorithms that aim to correct property values obtained with computationally inexpensive methods to the accuracy of higher levels of theory.

The computational procedure consists of successive steps: optimization in the MMFF94 force field; optimization and calculation of the vibrational frequencies and thermochemical properties with the semiempirical tight-binding GFN2-xTB method (this level is denoted as xtb2); optimization and calculation of the vibrational frequencies and thermochemical properties with the composite DFT method r2SCAN-3c (denoted as r2scan); and single-point energy calculations with the range-separated hybrid ωB97X-D4 functional and the def2-TZVP basis set (denoted as d4tzvp).

The list of molecules and a number of their properties obtained at different theory levels are compiled in the props.csv file (3.8 MB). The complete list of data fields in props.csv is given in the annotation.pdf file (65 kB). The optimized geometries and further calculated properties that may be useful for machine learning tasks are available in the PACHQA1-main.7z (geometries, xtb output, ORCA property reports, 183 MB) and PACHQA2-full_outfiles.7z (full ORCA output files, 343 MB) archives. The file PACHQA3-wfns.7z (57 GB) contains wavefunctions, electron densities, and xtb electrostatic potentials. All other files produced during the calculations, including the outputs of calculations that resulted in imaginary frequencies, are collected in the PACHQA4-other.7z file (7 GB).
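The idea of correcting inexpensive results toward a higher level of theory can be sketched as a simple "delta-learning" fit. The sketch below is an assumption-laden illustration, not the dataset authors' workflow: the numbers are invented toy values (not taken from props.csv), the single feature (number of chlorine atoms) and the names `fit_line` and `corrected` are hypothetical, and a real model would use the fields documented in annotation.pdf.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b for a single feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


# Toy values, NOT from PACHQA: one feature and one property at two levels.
n_cl = [0, 1, 2, 3, 4]                    # hypothetical number of Cl atoms
e_low = [10.0, 12.0, 14.0, 16.0, 18.0]    # inexpensive level (e.g. xtb2)
e_high = [10.5, 13.0, 15.5, 18.0, 20.5]   # higher level (e.g. d4tzvp)

# Learn the difference between the two levels rather than the property itself.
delta = [h - l for h, l in zip(e_high, e_low)]
a, b = fit_line(n_cl, delta)


def corrected(e_cheap, n_chlorine):
    """Inexpensive value plus the learned correction."""
    return e_cheap + a * n_chlorine + b
```

The design choice here is to model only the correction, which is typically smoother than the property itself. For a new molecule, only the inexpensive calculation is then needed, e.g. `corrected(20.0, 5)`.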