2 datasets found

Judson_Mansouri_Automated_Chemical_Curation_QSAREnvRes_Data
catalog.data.gov
data.wu.ac.at
Updated May 2, 2021
Cite
U.S. EPA Office of Research and Development (ORD) (2021). Judson_Mansouri_Automated_Chemical_Curation_QSAREnvRes_Data [Dataset]. https://catalog.data.gov/dataset/judson-mansouri-automated-chemical-curation-qsarenvres-data
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
Here we describe the development of an automated KNIME workflow to curate and correct errors in the structure and identity of chemicals using the publicly available PHYSPROP physico-chemical properties and environmental fate datasets. The workflow first assembles structure-identity pairs using up to four provided chemical identifiers: chemical name, CASRN, SMILES, and MolBlock. Problems detected included errors and mismatches in chemical structure formats and identifiers, as well as various structure validation issues, including hypervalency and stereochemistry descriptions. Subsequently, a machine learning procedure was applied to evaluate the impact of this curation process. The performance of QSAR models built on only the highest quality subset of the original dataset was compared to that of models built on the larger curated and corrected dataset. The latter showed statistically improved predictive performance. The final workflow was used to curate the full list of PHYSPROP datasets, and is being made publicly available for further usage and integration by the scientific community. This dataset is associated with the following publication: Mansouri, K., C. Grulke, A. Richard, R. Judson, and A. Williams. An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modeling. SAR and QSAR in Environmental Research. Taylor & Francis, Inc., Philadelphia, PA, USA, 27(11): 911-937, (2016).
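As an illustration of the kind of identifier check described above, the following is a minimal Python sketch of one such test: validating the check digit of a CASRN. It is not taken from the KNIME workflow itself; the function names and the record layout (a dict with a "casrn" key) are assumptions made for this example. The check-digit rule is the standard one for CAS Registry Numbers: weight the digits before the check digit by 1, 2, 3, ... from right to left, sum them, and take the result modulo 10.

```python
import re

# A CASRN has 2-7 digits, a hyphen, 2 digits, a hyphen, and 1 check digit.
CASRN_PATTERN = re.compile(r"^([0-9]{2,7})-([0-9]{2})-([0-9])$")


def is_valid_casrn(casrn: str) -> bool:
    """Return True if the string is well formed and its check digit matches."""
    match = CASRN_PATTERN.match(casrn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost body digit.
    weighted_sum = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return weighted_sum % 10 == check_digit


def flag_invalid_casrns(records):
    """Return the records whose 'casrn' field fails validation.

    The record layout (dicts with a 'casrn' key) is hypothetical.
    """
    return [r for r in records if not is_valid_casrn(r["casrn"])]


if __name__ == "__main__":
    sample = [
        {"name": "water", "casrn": "7732-18-5"},
        {"name": "benzene (mistyped)", "casrn": "71-43-3"},
    ]
    for record in flag_invalid_casrns(sample):
        print("Invalid CASRN:", record["name"], record["casrn"])
```

A check like this catches only transcription errors within the CASRN itself; confirming that a valid CASRN actually corresponds to the accompanying name or structure requires cross-referencing the other identifiers, as the description outlines.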
Judson_Mansouri_Automated_Chemical_Curation_QSAREnvRes_Data.
datadiscoverystudio.org
Updated Oct 3, 2017
Cite
(2017). Judson_Mansouri_Automated_Chemical_Curation_QSAREnvRes_Data. [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/e1b0f413e48648e89c921b4bbe8f3a4a/html
Description
Here we describe the development of an automated KNIME workflow to curate and correct errors in the structure and identity of chemicals using the publicly available PHYSPROP physico-chemical properties and environmental fate datasets. The workflow first assembles structure-identity pairs using up to four provided chemical identifiers: chemical name, CASRN, SMILES, and MolBlock. Problems detected included errors and mismatches in chemical structure formats and identifiers, as well as various structure validation issues, including hypervalency and stereochemistry descriptions. Subsequently, a machine learning procedure was applied to evaluate the impact of this curation process. The performance of QSAR models built on only the highest quality subset of the original dataset was compared to that of models built on the larger curated and corrected dataset. The latter showed statistically improved predictive performance. The final workflow was used to curate the full list of PHYSPROP datasets, and is being made publicly available for further usage and integration by the scientific community. This dataset is associated with the following publication: Mansouri, K., C. Grulke, A. Richard, R. Judson, and A. Williams. An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modeling. SAR and QSAR in Environmental Research. Taylor & Francis, Inc., Philadelphia, PA, USA, 27(11): 911-937, (2016).