3 datasets found
  1. The Harvard Organic Photovoltaics 2015 (HOPV) dataset: An experiment-theory...

    • figshare.com
    txt
    Updated Mar 1, 2016
    Alan Aspuru-Guzik (2016). The Harvard Organic Photovoltaics 2015 (HOPV) dataset: An experiment-theory calibration resource. [Dataset]. http://doi.org/10.6084/m9.figshare.1610063.v4
    Available download formats: txt
    Dataset updated: Mar 1, 2016
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Alan Aspuru-Guzik
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    Standard data sets used for the calibration of computational results have been extremely useful for the development of electronic structure methods and their application to areas such as thermochemistry [1–3] and non-covalent interactions [4,5]. To our knowledge, the field of organic photovoltaics, specifically as it pertains to high-throughput virtual screening, lacks a similar collection of data. Since the relationship between theoretically predicted and experimentally observed properties is often non-trivial, the dissemination of directly comparable data for a well-defined set of molecules can be a great asset to accelerate advances in this field. The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature and corresponding quantum-chemical calculations performed over a range of geometries, each with results obtained using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use both in relating electronic structure calculations to experimental observations through the generation of calibration schemes and in the creation of new semi-empirical methods and the benchmarking of current and future model chemistries in the domain of organic electronics.
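
    As an illustration of what a calibration scheme might look like, the sketch below fits a straight line that maps computed values onto experimental ones by ordinary least squares. This is only one possible scheme and is not taken from the HOPV15 record. The function names and the numbers are hypothetical placeholders, not HOPV15 data.

```python
from statistics import mean


def fit_linear_calibration(computed, observed):
    """Least-squares fit of observed ~ slope * computed + intercept."""
    if len(computed) != len(observed) or len(computed) < 2:
        raise ValueError("need at least two paired values")
    cx, cy = mean(computed), mean(observed)
    sxx = sum((x - cx) ** 2 for x in computed)
    if sxx == 0:
        raise ValueError("computed values must not all be identical")
    sxy = sum((x - cx) * (y - cy) for x, y in zip(computed, observed))
    slope = sxy / sxx
    return slope, cy - slope * cx


def calibrate(value, slope, intercept):
    """Apply a fitted linear calibration to a new computed value."""
    return slope * value + intercept


# Synthetic placeholder values in eV (NOT taken from HOPV15).
computed = [-5.0, -5.2, -5.4, -5.6]
observed = [-5.1, -5.25, -5.4, -5.55]

slope, intercept = fit_linear_calibration(computed, observed)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"calibrated(-5.3) = {calibrate(-5.3, slope, intercept):.3f}")
```

    In practice a separate fit per property and per level of theory would probably be needed, since the relationship between predicted and observed values is described above as non-trivial.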

  2. Datasets with SMILES and Mordred descriptors.

    • figshare.com
    txt
    Updated Mar 29, 2024
    Farshud Sorourifar (2024). Datasets with SMILES and Mordred descriptors. [Dataset]. http://doi.org/10.6084/m9.figshare.25506295.v1
    Available download formats: txt
    Dataset updated: Mar 29, 2024
    Dataset provided by: figshare
    Authors: Farshud Sorourifar
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    The CEP (Clean Energy Project) dataset contains candidate molecules that are suitable for solar cell materials. The power conversion efficiency (PCE) dataset has 29,978 compounds with their corresponding CEP values and Mordred descriptors.
    Data source: Hachmann, J., Olivares-Amaya, R., Atahan-Evrenk, S., Amador-Bedolla, C., Sánchez-Carrera, R.S., Gold-Parker, A., Vogt, L., Brockway, A.M., Aspuru-Guzik, A.: The Harvard Clean Energy Project: large-scale computational screening and design of organic photovoltaics on the World Community Grid. The Journal of Physical Chemistry Letters 2(17), 2241–2251 (2011)

    The Malaria dataset includes experimentally measured half-maximal effective concentration (EC50) values for a sulfide-resistant strain of Plasmodium falciparum, the parasite that causes malaria. The Malaria dataset has 9,998 compounds with their EC50 values and Mordred descriptors.
    Data source: Gamo, F.-J., Sanz, L.M., Vidal, J., Cozar, C., Alvarez, E., Lavandera, J.-L., Vanderwall, D.E., Green, D.V.S., Kumar, V., Hasan, S., Brown, J.R., Peishoff, C.E., Cardon, L.R., Garcia-Bustos, J.F.: Thousands of chemical starting points for anti-malarial lead identification. Nature 465(7296), 305–310 (2010)

    The Lipophilicity dataset contains experimentally measured octanol/water distribution coefficients at pH 7.4. The dataset has 4,200 compounds with their corresponding values and Mordred descriptors.
    Data source: Wen, N., Liu, G., Zhang, J., Zhang, R., Fu, Y., Han, X.: A fingerprints based molecular property prediction method using the BERT model. Journal of Cheminformatics 14(1), 71 (2022). https://doi.org/10.1186/s13321-022-00650-3

    Log P: small commercial molecules with octanol/water values. 250k molecules and SMILES.

  3. Energies of the HOMO and LUMO Orbitals for 111725 Organic Molecules...

    • figshare.com
    xlsx
    Updated Apr 23, 2018
    Joao Aires-de-Sousa; Diogo A.R.S. Latino (2018). Energies of the HOMO and LUMO Orbitals for 111725 Organic Molecules Calculated by DFT B3LYP / 6-31G* [Dataset]. http://doi.org/10.6084/m9.figshare.3384184.v1
    Available download formats: xlsx
    Dataset updated: Apr 23, 2018
    Dataset provided by: figshare
    Authors: Joao Aires-de-Sousa; Diogo A.R.S. Latino
    License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically.

    Description

    HOMO and LUMO orbital energies for 111725 organic molecules calculated at the B3LYP/6-31G*//PM6 or B3LYP/6-31G*//PM7 level of theory.

    Related publication: Florbela Pereira, Kaixia Xiao, Diogo A. R. S. Latino, Chengcheng Wu, Qingyou Zhang and Joao Aires-de-Sousa: Machine Learning Methods to Predict Density Functional Theory B3LYP Energies of HOMO and LUMO Orbitals. J. Chem. Inf. Model. (2017). DOI: 10.1021/acs.jcim.6b00340

    This data set is publicly available at http://dx.doi.org/10.6084/m9.figshare.3384184.v1

    Files
    -----
    frontier_orbitals_111725mols_sdf.tar.gz - 111725 molecules in the MDL SDFile format
    frontier_orbitals_111725mols.xlsx - HOMO and LUMO orbital energies for 111725 neutral organic molecules
    coordinates_111725mols_xyz.zip - atomic coordinates used for the DFT calculation of the 111725 molecules
    PM7_frontier_orbitals.xlsx - HOMO and LUMO energies calculated by the PM7 semi-empirical method

    Molecules
    ---------
    For the database creation, molecular structural motifs were retrieved from organic electronics studies and from collections of dyes, metabolites and electrophiles/nucleophiles [1-5]. The database was populated by retrieving similar examples from the ZINC database [6] and the PubChem database [7], and by computationally combining motifs and lists of substituents with the ChemAxon Reactor software, JChem 15.4.6, 2015, ChemAxon (http://www.chemaxon.com). The structures were standardized with ChemAxon Standardizer (JChem 15.4.6, 2015, ChemAxon, http://www.chemaxon.com) and OpenBabel (Open Babel Package, version 2.3.1, http://openbabel.org) for neutralization and inclusion of all hydrogen atoms. The molecular structures include the elements C, H, B, N, O, F, Si, P, S, Cl, Se, and Br.

    Molecular geometries were relaxed by the PM6 or PM7 semi-empirical methods using the MOPAC software [8], and orbital energies were calculated at those geometries by the GAMESS program [9] with the B3LYP functional and the 6-31G* basis set.

    Format
    ------
    Each molecule is stored in its own file, ending in ".sdf". These are the starting structures, prior to geometry relaxation with the MOPAC program. The format is the standard MDL SDFile generated with ChemAxon Standardizer and OpenBabel.

    The atomic coordinates obtained with the PM6 and PM7 methods are stored in files ending in ".xyz", one for each molecule. Each file comprises a header line specifying the number of atoms n, a line with the id of the structure, and n lines containing the element and atomic coordinates, one atom per line.

    Orbital energies are stored in the frontier_orbitals_111725mols.xlsx file. Two different sheets are used: one for the main database and one for the data set used as the final test set in the related publication. PM7 values are stored in PM7_frontier_orbitals.xlsx in the same format.

    Column  Content of .xlsx files
    ------  ----------------------
    1       Molecule ID (as it appears in the corresponding .sdf file name)
    2       HOMO energy in eV
    3       LUMO energy in eV

    References
    ----------
    [1] Po R, Bianchi G, Carbonera C, Pellegrino A: All that glisters is not gold: an analysis of the synthetic complexity of efficient polymer donors for polymer solar cells. Macromolecules 2015, 48:453-461.
    [2] Hachmann J, Olivares-Amaya R, Atahan-Evrenk S, Amador-Bedolla C, Sanchez-Carrera RS, Gold-Parker A, Vogt L, Brockway AM, Aspuru-Guzik A: The Harvard Clean Energy Project: large-scale computational screening and design of organic photovoltaics on the world community grid. J Phys Chem Lett 2011, 2:2241-2251.
    [3] O'Boyle NM, Campbell CM, Hutchison GR: Computational design and selection of optimal organic photovoltaic materials. J Phys Chem C 2011, 115:16200-16210.
    [4] Mayr H, Ofial AR: Kinetics of electrophile-nucleophile combinations: a general approach to polar organic reactivity. Pure Appl Chem 2005, 77:1807-1821.
    [5] Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 2000, 28:27-30.
    [6] Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG: ZINC: a free tool to discover chemistry for biology. J Chem Inf Model 2012, 52:1757-1768.
    [7] Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH: PubChem Substance and Compound databases. Nucleic Acids Res 2016, 44(D1):D1202-13.
    [8] MOPAC2009 and MOPAC2012, James J. P. Stewart, Stewart Computational Chemistry, Colorado Springs, CO, USA, http://OpenMOPAC.net (2008-2012).
    [9] Schmidt MW, Baldridge KK, Boatz JA, Elbert ST, Gordon MS, Jensen JH, Koseki S, Matsunaga N, Nguyen KA, Su S, Windus TL, Dupuis M, Montgomery JA: General atomic and molecular electronic structure system. J Comput Chem 1993, 14:1347-1363. GAMESS Version 1 May 2013 (R1).
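
    The ".xyz" layout described in this record (a header line with the number of atoms n, a line with the structure id, then n lines with the element and atomic coordinates) can be read with a few lines of Python. The sketch below assumes whitespace-separated fields and three Cartesian coordinates per atom, which the description does not state explicitly. The function name and the sample molecule id are hypothetical.

```python
def parse_xyz(text):
    """Parse one .xyz record: atom count, structure id, then one atom per line.

    Returns (structure_id, atoms), where atoms is a list of
    (element, x, y, z) tuples. Assumes whitespace-separated fields.
    """
    lines = text.strip().splitlines()
    if len(lines) < 2:
        raise ValueError("expected a count line and an id line")
    n_atoms = int(lines[0].split()[0])
    structure_id = lines[1].strip()
    atom_lines = lines[2:2 + n_atoms]
    if len(atom_lines) != n_atoms:
        raise ValueError(
            f"header declares {n_atoms} atoms, found {len(atom_lines)}"
        )
    atoms = []
    for line in atom_lines:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return structure_id, atoms


# Hypothetical example record (a water molecule with a made-up id).
sample = """3
mol_0001
O  0.000  0.000  0.117
H  0.000  0.757 -0.469
H  0.000 -0.757 -0.469
"""

structure_id, atoms = parse_xyz(sample)
print(structure_id, len(atoms), atoms[0])
```

    Checking the declared atom count against the number of coordinate lines catches truncated files early, which may matter when unpacking an archive with one file per molecule.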

3 scholarly articles cite the Harvard Organic Photovoltaics 2015 (HOPV) dataset, according to Google Scholar.