3 datasets found
  1. Judson_Mansouri_Automated_Chemical_Curation_QSAREnvRes_Data

    • catalog.data.gov
    • data.wu.ac.at
    Updated May 2, 2021
    U.S. EPA Office of Research and Development (ORD) (2021). Judson_Mansouri_Automated_Chemical_Curation_QSAREnvRes_Data [Dataset]. https://catalog.data.gov/dataset/judson-mansouri-automated-chemical-curation-qsarenvres-data
    Dataset updated: May 2, 2021
    Dataset provided by: United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Here we describe the development of an automated KNIME workflow to curate and correct errors in the structure and identity of chemicals using the publicly available PHYSPROP physico-chemical properties and environmental fate datasets. The workflow first assembles structure-identity pairs using up to four provided chemical identifiers, including chemical name, CASRNs, SMILES, and MolBlock. Problems detected included errors and mismatches in chemical structure formats, identifiers, and various structure validation issues, including hypervalency and stereochemistry descriptions. Subsequently, a machine learning procedure was applied to evaluate the impact of this curation process. The performance of QSAR models built on only the highest quality subset of the original dataset was compared to that of models built on the larger curated and corrected dataset. The latter showed statistically improved predictive performance. The final workflow was used to curate the full list of PHYSPROP datasets, and is being made publicly available for further use and integration by the scientific community. This dataset is associated with the following publication: Mansouri, K., C. Grulke, A. Richard, R. Judson, and A. Williams. An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modeling. SAR and QSAR in Environmental Research. Taylor & Francis, Inc., Philadelphia, PA, USA, 27(11): 911-937, (2016).

  2. Chemical properties and ecotoxicity database

    • data.gov.au
    zip
    Updated May 18, 2020
    Bioregional Assessment Program (2020). Chemical properties and ecotoxicity database [Dataset]. https://data.gov.au/data/dataset/e2b33c7b-c008-4f3f-89cc-f6b98d91816e
    Available download formats: zip (174605)
    Dataset updated: May 18, 2020
    Dataset provided by: Bioregional Assessment Program
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The dataset was compiled by the Geological and Bioregional Assessment Program from multiple sources referenced within the dataset and/or metadata. The processes undertaken to compile this dataset are described in the History field in this metadata statement.

    Attribution

    Geological and Bioregional Assessment Program

    History

    Physico-chemical data were compiled from the US EPA Estimation Programs Interface (EPI Suite). Estimated properties were based on the Simplified Molecular Input Line-Entry System (SMILES), Biowin models 1-7, and PHYSPROP. Ecotoxicology data were compiled from chemical safety data sheets, eChemPortal, USEPA ECOTOX, OECD SIDS, ECHA assessments, USEPA (2015/6) Reaxys database, NICNAS/IMAP assessments, and ECOSAR 2.0.

  3. ONS Open Melting Point Collection ONSMP029

    • figshare.com
    xlsx
    Updated May 30, 2023
    Andrew Lang; Antony Williams; Jean-Claude Bradley (2023). ONS Open Melting Point Collection ONSMP029 [Dataset]. http://doi.org/10.6084/m9.figshare.93086.v2
    Available download formats: xlsx
    Dataset updated: May 30, 2023
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Andrew Lang; Antony Williams; Jean-Claude Bradley
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset, ONSMP029 (2706 unique compounds, 7413 measurements), is from a project to collect and curate melting points made available as Open Data. This particular collection was selected by applying a threshold intended to favor reliable values. Specifically, the entire range of averaged values for a data point was restricted to 0.01 °C to 5 °C, with at least two different measurements within this range. Measurements were pooled and processed from the following sources: Alfa Aesar, MDPI, Bergstrom, PhysProp, DrugBank, Bell, Oxford MSDS, Hughes, Griffiths and the Chemical Information Validation Spreadsheet. Links to all the information sources and web services are available from the Open Melting Point Resource page: http://onswebservices.wikispaces.com/meltingpoint
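
    The selection threshold described above can be sketched roughly as follows. This is only one possible reading of the description, not the curators' actual procedure: it assumes that "range" means the difference between the highest and lowest distinct measurement for a compound, and that the retained value is the plain mean of all pooled measurements. The function name `select_reliable`, its parameters, and the sample values are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean


def select_reliable(measurements, min_spread=0.01, max_spread=5.0):
    """Keep compounds whose pooled melting points agree closely.

    measurements: iterable of (compound, melting_point_in_celsius) pairs.
    Returns a dict mapping each retained compound to its mean melting point.
    Assumption: "range" is max minus min over the distinct values.
    """
    by_compound = defaultdict(list)
    for compound, melting_point in measurements:
        by_compound[compound].append(melting_point)

    selected = {}
    for compound, values in by_compound.items():
        distinct = set(values)
        # At least two different measurements are required.
        if len(distinct) < 2:
            continue
        spread = max(distinct) - min(distinct)
        # The spread must lie within the stated 0.01 to 5 degree window.
        if min_spread <= spread <= max_spread:
            selected[compound] = mean(values)
    return selected


# Made-up illustrative values, not real measurements from ONSMP029.
sample = [
    ("compound A", 122.0), ("compound A", 122.4), ("compound A", 121.5),
    ("compound B", 80.0), ("compound B", 95.0),  # spread of 15, rejected
    ("compound C", 50.0),                        # single value, rejected
]
result = select_reliable(sample)
```

    With these sample values only "compound A" is retained, because its three measurements differ by less than 5 °C; "compound B" is rejected for disagreeing by 15 °C and "compound C" for having a single measurement.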
