100+ datasets found
  1. Data from: A surrogate-based approach for post-genomic partner...

    • catalog.data.gov
    • healthdata.gov
    Updated Sep 6, 2025
    Cite
    National Institutes of Health (2025). A surrogate-based approach for post-genomic partner identification [Dataset]. https://catalog.data.gov/dataset/a-surrogate-based-approach-for-post-genomic-partner-identification
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background: Modern drug discovery is concerned with identification and validation of novel protein targets from among the 30,000 or more genes postulated to be present in the human genome. While protein-protein interactions may be central to many disease indications, it has been difficult to identify new chemical entities capable of regulating these interactions as either agonists or antagonists. Results: In this paper, we show that peptide complements (or surrogates) derived from highly diverse random phage display libraries can be used for the identification of the expected natural biological partners for protein and non-protein targets. Our examples include surrogates isolated against both an extracellular secreted protein (TNFβ) and intracellular disease related mRNAs. In each case, surrogates binding to these targets were obtained and found to contain partner information embedded in their amino acid sequences. Furthermore, this information was able to identify the correct biological partners from large human genome databases by rapid and integrated computer based searches. Conclusions: Modified versions of these surrogates should provide agents capable of modifying the activity of these targets and enable one to study their involvement in specific biological processes as a means of target validation for downstream drug discovery.

  2. Surrogate ICR data

    • kaggle.com
    zip
    Updated Aug 12, 2023
    Cite
    Tilii (2023). Surrogate ICR data [Dataset]. https://www.kaggle.com/datasets/tilii7/surrogate-icr-data
    Available download formats: zip (260685 bytes)
    Dataset updated
    Aug 12, 2023
    Authors
    Tilii
    Description

    Dataset

    This dataset was created by Tilii

    Contents

  3. Replication Data for: The Surrogate Index: Combining Short-Term Proxies to...

    • dataverse.harvard.edu
    • search.dataone.org
    • +1more
    Updated Feb 23, 2022
    Cite
    Susan Athey; Raj Chetty; Guido Imbens; Hyunseung Kang (2022). Replication Data for: The Surrogate Index: Combining Short-Term Proxies to Estimate Long-Term Treatment Effects More Rapidly and Precisely [Dataset]. http://doi.org/10.7910/DVN/QCKJYL
    Dataset updated
    Feb 23, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Susan Athey; Raj Chetty; Guido Imbens; Hyunseung Kang
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/QCKJYL

    Description

    This dataset contains replication files for "The Surrogate Index: Combining Short-Term Proxies to Estimate Long-Term Treatment Effects More Rapidly and Precisely" by Susan Athey, Raj Chetty, Guido Imbens, and Hyunseung Kang. For more information, see https://opportunityinsights.org/paper/the-surrogate-index/. A summary of the related publication follows. The impacts of many policies, such as efforts to increase upward income mobility or improve health outcomes, are only observed with long delays. For example, it can take decades to see the effects of early childhood interventions on lifetime earnings. This problem has greatly limited researchers’ and policymakers’ ability to test and improve policies and arises frequently in our own work at Opportunity Insights on the determinants of economic opportunity. In this study, we develop a new method of estimating the long-term impacts of policies more rapidly and precisely using short-term proxies. We predict long-term outcomes (e.g., lifetime earnings) using short-term outcomes (e.g., earnings in early adulthood or test scores). We then show that the causal effects of policies on this predictive index (which we term a “surrogate index”, following terminology in the statistics literature) can help us learn about their long-term impacts more quickly under certain assumptions that are described in the full paper. We apply our method to analyze the long-term impacts of a job training experiment in California. Using short-term employment rates as surrogates, we show that one could have estimated the program’s impact on mean employment rates over a 9 year horizon within 1.5 years, with a 35% reduction in standard errors. The success of the surrogate index in this job training application suggests that our method could be applied to predict the long-term impacts of other programs as well. 
Going forward, we hope to build a public library of early indicators (surrogate indices) for social science by harnessing historical experiments along with the large-scale datasets we have built. If you would like to contribute to this effort by reporting a surrogate index that predicts long-term impacts estimated in an experiment, as in the GAIN program, please contact us.
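
    The two-step idea described above (predict the long-term outcome from short-term outcomes, then measure the treatment effect on that prediction) can be sketched as follows. This is a minimal illustration, assuming a linear predictor fitted by ordinary least squares; the function name, variable names and simulated data are hypothetical and are not taken from the replication files.

```python
import numpy as np

def surrogate_index_effect(s_obs, y_obs, s_exp, treated):
    """Estimate a long-term effect through a linear surrogate index (illustrative)."""
    # Step 1: in a historical sample, regress the long-term outcome on the surrogates.
    X_obs = np.column_stack([np.ones(len(s_obs)), s_obs])
    beta, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
    # Step 2: predict the long-term outcome (the index) in the experimental sample.
    X_exp = np.column_stack([np.ones(len(s_exp)), s_exp])
    index = X_exp @ beta
    # Step 3: compare the mean predicted index between treated and control units.
    treated = np.asarray(treated, dtype=bool)
    return index[treated].mean() - index[~treated].mean()

rng = np.random.default_rng(0)

# Hypothetical historical sample: two surrogates, noiseless long-term outcome.
s_obs = rng.normal(size=(500, 2))
y_obs = 1.0 + s_obs @ np.array([2.0, -1.0])

# Hypothetical experiment: treatment shifts the first surrogate by 0.5.
base = rng.normal(size=(100, 2))
s_exp = np.vstack([base, base + np.array([0.5, 0.0])])
treated = np.repeat([False, True], 100)

effect = surrogate_index_effect(s_obs, y_obs, s_exp, treated)
print(round(effect, 6))
```

    In this toy setup the surrogate coefficient is 2.0 and the shift is 0.5, so the estimated long-term effect is their product. The assumptions under which such an estimate is valid are those described in the full paper, not anything established by this sketch.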

  4. Data from: How Important Is ‘Accuracy’ of Surrogate Decision-Making for...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jan 31, 2013
    Cite
    De Vries, Raymond; Kim, H. Myra; Knopman, David S.; Kim, Scott Y. H.; Appelbaum, Paul S.; Ryan, Kerry A.; Damschroder, Laura (2013). How Important Is ‘Accuracy’ of Surrogate Decision-Making for Research Participation? [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001728762
    Dataset updated
    Jan 31, 2013
    Authors
    De Vries, Raymond; Kim, H. Myra; Knopman, David S.; Kim, Scott Y. H.; Appelbaum, Paul S.; Ryan, Kerry A.; Damschroder, Laura
    Description

    Background: There is a longstanding concern about the accuracy of surrogate consent in representing the health care and research preferences of those who lose their ability to decide for themselves. We sought informed, deliberative views of the older general public (≥50 years old) regarding their willingness to participate in dementia research and to grant leeway to future surrogates to choose an option contrary to their stated wishes. Methodology/Principal Findings: 503 persons aged 50+ recruited by random digit dialing were randomly assigned to one of three groups: deliberation, education, or control. The deliberation group attended an all-day education/peer deliberation session; the education group received written information only. Participants were surveyed at baseline, after the deliberation session (or equivalent time), and one month after the session, regarding their willingness to participate in dementia research and to give leeway to surrogates, regarding studies of varying risk-benefit profiles (a lumbar puncture study, a drug randomized controlled trial, a vaccine randomized controlled trial, and an early phase gene transfer trial). At baseline, 48% (gene transfer scenario) to 92% (drug RCT) were willing to participate in future dementia research. A majority of respondents (57–71% depending on scenario) were willing to give leeway to future surrogate decision-makers. Democratic deliberation increased willingness to participate in all scenarios, to grant leeway in 3 of 4 scenarios (lumbar puncture, vaccine, and gene transfer), and to enroll loved ones in research in all scenarios. On average, respondents were more willing to volunteer themselves for research than to enroll their loved ones. Conclusions/Significance: Most people were willing to grant leeway to their surrogates, and this willingness was either sustained or increased after democratic deliberation, suggesting that the attitude toward leeway is a reliable opinion. 
Eliciting a person’s current preferences about future research participation should also involve eliciting his or her leeway preferences.

  5. Data from: Beyond surrogacy: A multi-taxon approach to conservation...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Sep 8, 2015
    Cite
    Brenda D. Smith-Patten; Michael A. Patten (2015). Beyond surrogacy: A multi-taxon approach to conservation biogeography [Dataset]. http://doi.org/10.5061/dryad.pk27h
    Available download formats: zip
    Dataset updated
    Sep 8, 2015
    Dataset provided by
    Dryad
    Authors
    Brenda D. Smith-Patten; Michael A. Patten
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2015
    Area covered
    East Africa, Eastern Arc Mountains
    Description

    Smith-Patten_Patten_data: Presence/absence data for each taxon used in the study.
    Smith-Patten_Patten_calculator_matrices: Calculator used for pairwise Jaccard’s dissimilarity indices; resulting matrices for all taxa.
    Smith-Patten_Patten_BARRIER_output: Raw BARRIER 2.2 output for all taxa.
    Smith-Patten_Patten_C_Program: C program to create bootstrap matrices.
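
    For readers unfamiliar with the pairwise Jaccard dissimilarity mentioned above, the following is a minimal sketch of how it can be computed from a presence/absence matrix. The function name and the three-site example are hypothetical; this is not the calculator shipped with the dataset.

```python
import numpy as np

def jaccard_dissimilarity(presence):
    """Pairwise Jaccard dissimilarity between rows of a 0/1 presence matrix."""
    p = np.asarray(presence, dtype=bool)
    # Count shared species (intersection) and all species seen (union) per pair.
    inter = (p[:, None, :] & p[None, :, :]).sum(axis=2)
    union = (p[:, None, :] | p[None, :, :]).sum(axis=2)
    # Assumption: two empty rows are treated as identical (dissimilarity 0).
    with np.errstate(divide="ignore", invalid="ignore"):
        similarity = np.where(union > 0, inter / union, 1.0)
    return 1.0 - similarity

# Hypothetical data: rows are sites, columns are species (1 = present).
sites = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
d = jaccard_dissimilarity(sites)
print(d.round(3))
```

    Sites 0 and 1 share one of three species in total, so their dissimilarity is 1 - 1/3; sites with no species in common have dissimilarity 1.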

  6. Database for "Estimation of the Energy Recovery and Emission Potential of...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 7, 2023
    Cite
    Berre, Ivar Tustervatn; Anquetil-Deck, Candy; Schulze-Netzer, Corinna (2023). Database for "Estimation of the Energy Recovery and Emission Potential of Typically Incinerated Norwegian Waste Classes" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8013078
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    NTNU
    Authors
    Berre, Ivar Tustervatn; Anquetil-Deck, Candy; Schulze-Netzer, Corinna
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    A great challenge for waste-to-energy power plants is their uncertain and variable feedstock, which can prevent the plants from running as efficiently as possible, reducing energy output and weakening control of emissions. One way to describe the feedstock is to use surrogates: the hundreds or thousands of different species in a feedstock are modelled using a few surrogate species, which makes modelling the feedstock tractable. The surrogates also provide an estimate of the higher heating value (HHV) and of the fractions of biomass, oil-based waste and inorganics. This thesis formulated surrogates for typically incinerated waste classes, using a linear least-squares solution between available surrogate species and experimental values. Most of the species used were from two existing models in the literature, but three new species were created to improve the representation of some waste classes containing fossil-originated wastes, rubber and PET. These were made by creating reactions based on experimental data from the literature and then testing these reactions under pyrolysis conditions in a stochastic reactor model. The surrogates for the waste classes were formulated by first dividing the waste into components and then finding the surrogate formulation for each component. Surrogates were found for 41 components, which were used to create the surrogate formulations for 30 waste classes. Most of the surrogates modelled the elemental composition accurately compared to experimental values. A statistical overview of the experimental and model data for the waste classes was also created. This overview is relevant for stakeholders in waste management and for other research, such as life-cycle analysis.
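
    The least-squares step described above can be illustrated with a toy problem: find the mixture of surrogate species whose elemental composition best matches a measured composition. This is a sketch under stated assumptions. The two species, their approximate C/H/O mass fractions and the target values are chosen for illustration only and are not taken from the database.

```python
import numpy as np

# Rows: elements (C, H, O). Columns: hypothetical surrogate species.
# Approximate mass fractions for a cellulose-like and a polyethylene-like species.
species_composition = np.array([
    [0.444, 0.857],  # carbon
    [0.062, 0.143],  # hydrogen
    [0.494, 0.000],  # oxygen
])

# Hypothetical "experimental" composition of a waste component,
# constructed here as a 60/40 blend so the answer is known in advance.
target = species_composition @ np.array([0.6, 0.4])

# Linear least-squares fit of the species weights to the target composition.
weights, residuals, rank, _ = np.linalg.lstsq(species_composition, target, rcond=None)
print(weights.round(3))
```

    A real formulation would likely need constraints (for example non-negative weights that sum to one) and more species than this sketch uses; those choices are not specified in the description above.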

  7. Data underlying the publication: Surrogate-guided Optimization in Quantum...

    • data.4tu.nl
    zip
    Updated Jul 17, 2024
    Cite
    Luise Prielinger; Álvaro Gómez Iñesta; Gayane Vardoyan (2024). Data underlying the publication: Surrogate-guided Optimization in Quantum Networks [Dataset]. http://doi.org/10.4121/a07a9e97-f34c-4e7f-9f68-1010bfb857d0.v4
    Available download formats: zip
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Luise Prielinger; Álvaro Gómez Iñesta; Gayane Vardoyan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    2023 - 2025
    Area covered
    The Netherlands, Delft
    Dataset funded by
    Dutch Research Council (NWO)
    Description

    This data is associated with the paper "Surrogate-guided Optimization in Quantum Networks".

    In this work we introduce an efficient optimization workflow using machine-learning models that outperforms traditional techniques, addressing the challenges of complex, computationally demanding simulations in quantum networking. Please find guidelines and more context in the README.md file.

  8. Reimbursing a sperm or ova (egg) donor or a surrogate for expenditures...

    • data.urbandatacentre.ca
    Updated Oct 19, 2025
    Cite
    (2025). Reimbursing a sperm or ova (egg) donor or a surrogate for expenditures related to donation or surrogacy - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-60fd4458-8b35-424a-bdf8-8d8983f367ec
    Dataset updated
    Oct 19, 2025
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Area covered
    Canada
    Description

    In Canada, it is illegal to purchase sperm or eggs from a donor (or person acting on behalf of a donor) or pay a female person to be a surrogate. However, donors and surrogates may be reimbursed for out-of-pocket expenditures incurred because of their donation or surrogacy that are provided for in the regulations.

  9. Data from: Patterns of biodiverse, understudied groups do not mirror those...

    • datadryad.org
    • datasetcatalog.nlm.nih.gov
    • +1more
    zip
    Updated Aug 3, 2018
    Cite
    Jenna E. Dorey; James C. Lendemer; Robert F.C. Naczi (2018). Patterns of biodiverse, understudied groups do not mirror those of the surrogate groups that set conservation priorities: a case study from the Mid-Atlantic Coastal Plain of eastern North America [Dataset]. http://doi.org/10.5061/dryad.5mk37
    Available download formats: zip
    Dataset updated
    Aug 3, 2018
    Dataset provided by
    Dryad
    Authors
    Jenna E. Dorey; James C. Lendemer; Robert F.C. Naczi
    Time period covered
    Aug 2, 2017
    Area covered
    Atlantic Plain, North America, Virginia, Delaware, Maryland, United States, Delmarva
    Description

    data_packet: This is a zipped folder that contains the following spreadsheets used for analyses in this study: 1) Site-by-species matrices for lichens (1 CSV file) and for vascular plants (1 CSV file). 2) Location data for study sites (1 XLSX file). 3) Georeferenced specimen data used to generate species lists for each site (1 XLSX file).

  10. Data from: Surrogate Variable Analysis.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Apr 4, 2013
    Cite
    Benson, Hollie; Monks, Noel; Mooney, Marie; Rusk, Tony; Liang, Winnie; Legendre, Christophe; Cherba, David; Berlinski, Pamela; Webb, Craig Paul; Eugster, Emily; Kamerling, Steve; Simpson, Heather; Marotti, Keith; Bond, Jeffrey; Tembe, Waibhav (2013). Surrogate Variable Analysis. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001648498
    Dataset updated
    Apr 4, 2013
    Authors
    Benson, Hollie; Monks, Noel; Mooney, Marie; Rusk, Tony; Liang, Winnie; Legendre, Christophe; Cherba, David; Berlinski, Pamela; Webb, Craig Paul; Eugster, Emily; Kamerling, Steve; Simpson, Heather; Marotti, Keith; Bond, Jeffrey; Tembe, Waibhav
    Description

    Surrogate Variable[42] loadings by technology.

  11. Data Sheet 1_Association of surrogate adiposity markers with prevalence,...

    • frontiersin.figshare.com
    docx
    Updated Mar 4, 2025
    Cite
    Fan-Shun Guo; Chen Guo; Jia-Hao Dou; Jun-Xiang Wang; Rui-Yun Wu; Shou-Fang Song; Xue-Lu Sun; Yi-Wei Hu; Jin Wei (2025). Data Sheet 1_Association of surrogate adiposity markers with prevalence, all-cause mortality and long-term survival of heart failure: a retrospective study from NHANES database.docx [Dataset]. http://doi.org/10.3389/fendo.2025.1430277.s001
    Available download formats: docx
    Dataset updated
    Mar 4, 2025
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Fan-Shun Guo; Chen Guo; Jia-Hao Dou; Jun-Xiang Wang; Rui-Yun Wu; Shou-Fang Song; Xue-Lu Sun; Yi-Wei Hu; Jin Wei
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Introduction: Obesity, especially abdominal obesity, is more common in patients with heart failure (HF), but body mass index (BMI) cannot accurately describe fat distribution. Several surrogate adiposity markers are available to reflect fat distribution and quantity. The objective of this study was to explore which adiposity marker is most highly correlated with HF prevalence, all-cause mortality and patients’ long-term survival. Methods: The National Health and Nutrition Examination Survey (NHANES) database provided all the data for this study. Logistic regression analyses were adopted to compare the association of each surrogate adiposity marker with the prevalence of HF. Cox proportional hazards models and restricted cubic spline (RCS) analysis were employed to assess the association between surrogate adiposity markers and all-cause mortality in HF patients. The ability of surrogate adiposity markers to predict long-term survival in HF patients was assessed using time-dependent receiver operating characteristic (ROC) curves. Results: A total of 46,257 participants (1,366 HF patients) were included in this retrospective study. The area under the receiver operating characteristic curve (AUC) for the prevalence of HF assessed by weight-adjusted-waist index (WWI) was 0.70 (95% CI: 0.69-0.72). During a median follow-up of 70 months, 700 deaths were recorded among the 1,366 HF patients. The hazard ratio (HR) for all-cause mortality in HF patients was 1.33 (95% CI: 1.06-1.66) in the quartile 4 group of a body shape index (ABSI) and 1.43 (95% CI: 1.13-1.82) in the WWI quartile 4 group, compared with the lowest quartile group. The AUC for predicting 5-year survival of HF patients using the ABSI was 0.647 (95% CI: 0.61-0.68). Conclusions: WWI is strongly correlated with the prevalence of HF. In HF patients, those with higher WWI and ABSI tend to have higher all-cause mortality. ABSI can predict patients’ long-term survival. We recommend the use of WWI and ABSI for assessing obesity in HF patients.
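
    As background for the AUC figures quoted above, an AUC can be read as the probability that a randomly chosen case scores higher on the marker than a randomly chosen non-case. The sketch below computes a plain (not time-dependent) AUC from that definition; the function name and the five data points are hypothetical and unrelated to NHANES.

```python
def auc(scores, labels):
    """Plain ROC AUC: share of (case, non-case) pairs ranked correctly, ties count half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical marker values and outcomes (1 = has the condition).
marker = [0.9, 0.8, 0.4, 0.3, 0.2]
outcome = [1, 0, 1, 0, 0]

value = auc(marker, outcome)
print(round(value, 3))
```

    Here five of the six case/non-case pairs are ordered correctly, so the AUC is 5/6. The time-dependent ROC analysis used in the study additionally accounts for follow-up time and censoring, which this sketch does not attempt.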

  12. Surrogate metrics for real data.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Oct 2, 2017
    Cite
    Zhang, Hui; Graham, Mark S.; Drobnjak, Ivana; Jenkinson, Mark (2017). Surrogate metrics for real data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001796871
    Dataset updated
    Oct 2, 2017
    Authors
    Zhang, Hui; Graham, Mark S.; Drobnjak, Ivana; Jenkinson, Mark
    Description

    Table shows whole-brain-mean intensity differences between AP and LR corrected datasets (units are arbitrary signal units). Errors are the standard deviation of the means over the ten subjects. Metrics show statistically significant differences between all methods at the p<0.001 level.

  13. -values for histograms from trade vs. surrogate data.

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Peter Klimek; Ricardo Hausmann; Stefan Thurner (2023). -values for histograms from trade vs. surrogate data. [Dataset]. http://doi.org/10.1371/journal.pone.0038924.t001
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Peter Klimek; Ricardo Hausmann; Stefan Thurner
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The column ‘’ lists the results as described in the main text. In addition, robustness tests were conducted with results listed in separate columns. ‘  = 200 k USD’ uses the same set of countries and a threshold of 200.000 USD below which trade flows are ignored. In ‘all products with positive exports’ all products are included which have positive world exports in each year of the analysis. The column ‘1989–2000’ decreases the number of years included in analysis. Results excluding the FSU and CEE are listed in ‘Excl. FSU’. The maximal lag is then increased to . The last column reports results using the UN ComTrade dataset, as described in Text S1.

  14. Data from: Surrogate's Court

    • openheritage3d.org
    Updated Apr 20, 2021
    Cite
    Lisa Conte (2021). Surrogate's Court [Dataset]. http://doi.org/10.26301/nted-p659
    Dataset updated
    Apr 20, 2021
    Dataset provided by
    Open Heritage 3D
    Authors
    Lisa Conte
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Spatial Data
    Measurement technique
    Photogrammetry - Terrestrial
    Description

    This data was collected as part of the Map the Moment initiative, a volunteer project to document the artwork and changes to the streetscape following the killing of George Floyd and the demonstrations that followed. This data was collected by Lisa Conte and processed by Joe Graham-Felsen. They used a Canon 5D Mark 3 to scan this data and capture the various murals that appeared throughout the city.

  15. Data from: Debate: The slippery slope of surrogate outcomes

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 6, 2025
    Cite
    National Institutes of Health (2025). Debate: The slippery slope of surrogate outcomes [Dataset]. https://catalog.data.gov/dataset/debate-the-slippery-slope-of-surrogate-outcomes
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Surrogate outcomes are frequently used in cardiovascular disease research. A concern is that changes in surrogate markers may not reflect changes in disease outcomes. Two recent clinical trials (Heart and Estrogen/Progestin Replacement Study [HERS], and the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial [ALLHAT]) underscore this problem since their results contradicted what was expected based on the surrogate outcomes. The current regulatory policy to allow new therapies to be introduced onto the market based solely on surrogate outcomes may need to be reviewed.

  16. Data from: Datasets for Comparison of Surrogate Models to Estimate Pesticide...

    • s.cnmilf.com
    • data.usgs.gov
    • +1more
    Updated Oct 8, 2025
    Cite
    U.S. Geological Survey (2025). Datasets for Comparison of Surrogate Models to Estimate Pesticide Concentrations at Six U.S. Geological Survey National Water Quality Network Sites During Water Years 2013–2018 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/datasets-for-comparison-of-surrogate-models-to-estimate-pesticide-concentrations-at-six-u-
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This data release is comprised of data tables of input variables for seawaveQ and surrogate models used to predict concentrations of select pesticides at six U.S. Geological Survey National Water Quality Network (NWQN) river sites (Fanno Creek at Durham, Oregon; White River at Hazleton, Indiana; Kansas River at DeSoto, Kansas; Little Arkansas River near Sedgwick, Kansas; Missouri River at Hermann, Missouri; Red River of the North at Grand Forks, North Dakota). Each data table includes discrete concentrations of one select pesticide (Atrazine, Azoxystrobin, Bentazon, Bromacil, Imidacloprid, Simazine, or Triclopyr) at one of the NWQN sites; daily mean streamflow; 30-day and 1-day flow anomalies; daily median values of pH and turbidity; daily mean values of dissolved oxygen, specific conductance, and water temperature; and 30-day and 1-day anomalies for pH, turbidity, dissolved oxygen, specific conductance, and water temperature. Two pesticides were modeled at each site with three types of regression models. Also included is a zip file with outputs from seawaveQ model summary. The processes for retrieving and preparing data for regression models followed those outlined in the SEAWAVE-Q R package documentation (Ryberg and Vecchia, 2013; Ryberg and York, 2020). The R package waterData (Ryberg and Vecchia, 2012) was used to import daily mean values for discharge and either daily mean or daily median values for continuous water-quality constituents directly into R depending on what data were available at each site. Pesticide concentration, streamflow, and surrogate data (continuously measured field parameters) were imported from and are available online from the USGS National Water Information System database (USGS, 2020). 
The waterData package was used to screen for missing daily mean discharge values (no missing values were found for the sites) and to calculate short-term (1 day) and mid-term (30 day) anomalies for flow and short-term anomalies (1 day) for each water-quality variable. A mid-term streamflow anomaly, for instance, is the deviation of concurrent daily streamflow from average conditions for the previous 30 days (Vecchia and others, 2008). Anomalies were calculated as additional potential model variables. Pesticide concentrations for select constituents from each site were pulled into R using the dataRetrieval package (De Cicco and others, 2018). Three of the six sites (Kansas River at DeSoto, Kansas; Missouri River at Hermann, Missouri; and White River at Hazleton, Indiana) pulled pesticide data for WY 2013–17 whereas the other three sites (Fanno Creek at Durham, Oregon; Little Arkansas River near Sedgwick, Kansas; and Red River of the North at Grand Forks, North Dakota) pulled pesticide data for WY 2013–18. Discrete pesticide data were matched with daily mean discharge and daily mean or median water-quality constituents and the associated calculated short-term (1-day) and mid-term (30-day) anomalies from the date of sampling. Pesticide concentrations were estimated using the SEAWAVE-Q (with surrogates) model using 19 combinations of surrogate variables (table 2 in the associated SIR, "Comparison of Surrogate Models to Estimate Pesticide Concentrations at Six U.S. Geological Survey National Water Quality Network Sites During Water Years 2013–18.") at each of 12 site-pesticide combinations (table 3 in the associated SIR). Three measures of model performance—the generalized coefficient of determination (R2), Akaike’s Information Criteria (AIC), and scale—were included in the output and used to select best-fit models (Table 4 of the associated SIR). 
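
    The mid-term anomaly defined above (daily streamflow minus the average of the previous 30 days) can be sketched as follows. This follows the sentence as worded and is only an illustration: the function name and flow series are hypothetical, and the waterData package may define the anomaly differently, for example on transformed flow values or with a window that includes the current day.

```python
import numpy as np

def midterm_anomaly(flow, window=30):
    """Deviation of each day's flow from the mean of the preceding `window` days."""
    flow = np.asarray(flow, dtype=float)
    anomaly = np.full(flow.shape, np.nan)  # undefined until a full window exists
    for t in range(window, len(flow)):
        anomaly[t] = flow[t] - flow[t - window:t].mean()
    return anomaly

# Hypothetical series: 30 steady days, one high-flow day, then back to normal.
flow = np.concatenate([np.full(30, 100.0), [130.0, 100.0]])
anom = midterm_anomaly(flow)
print(anom[30], anom[31])
```

    The high-flow day shows a positive anomaly, and the following day shows a small negative one because the spike has entered its 30-day reference window.
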
The three to four best-fit SEAWAVE-Q (with surrogates) models with sample sizes at least five times the number of variables were selected for each site-pesticide combination based on generalized R2 values—the higher, the better. If generalized R2 values were the same, the model with the lower AIC value was used. The standard surrogate regression and base SEAWAVE-Q models were then applied using the same samples that were used for each of the best-fit SEAWAVE-Q (with surrogates) models so that direct comparisons could be made for each site-pesticide-surrogate instance. The input data used to estimate daily pesticide concentrations for each of the best fit models have been included in this data release. An example of one output file for each model type is included in a .zip file named "output_examples.zip". Each of the output files shows the three measures of model performance. (1) The output file for the standard regression model named "HAZ8_Atrazine_Standard_Regression_Output.txt" includes: Pseudo R-square (Allison) of 0.631, Model AIC of 174.0232, and a Scale of 0.961. (2) The output file for the base SEAWAVE-Q model named "HAZ8_Atrazine_Base_Seawave-Q_Output.txt" includes: Generalized r-squared of 0.82, AIC (Akaike's An Information Criterion) of 36.38, and a Scale of 0.288. (3) The output file for the SEAWAVE-Q w/Surrogates model named "HAZ8_Atrazine_Seawave-Q_w_Surrogates_Output.txt" includes: Generalized r-squared of 0.85, AIC (Akaike's An Information Criterion) of 33.76, and a Scale of 0.268. These values match those for Site ID = HAZ, Pesticide = Atrazine, and Surrogate variable group 8 for each model type in Table 4 of the associated SIR.
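
    The selection rule described above (require at least five samples per variable, prefer higher generalized R2, break ties with lower AIC) can be sketched as a simple filter and sort. The class, function and candidate values below are hypothetical and only loosely echo the magnitudes quoted in the text; they are not rows from Table 4 of the SIR.

```python
from dataclasses import dataclass

@dataclass
class Fit:
    name: str
    r2: float        # generalized R-squared
    aic: float       # Akaike information criterion
    n_samples: int
    n_vars: int

def best_fits(fits, keep=3):
    """Keep models with n_samples >= 5 * n_vars, ranked by R2 (desc) then AIC (asc)."""
    eligible = [f for f in fits if f.n_samples >= 5 * f.n_vars]
    return sorted(eligible, key=lambda f: (-f.r2, f.aic))[:keep]

# Hypothetical candidate models for one site-pesticide combination.
candidates = [
    Fit("group_A", 0.85, 33.76, 60, 6),
    Fit("group_B", 0.85, 35.10, 60, 6),  # ties with group_A on R2, loses on AIC
    Fit("group_C", 0.90, 30.00, 20, 6),  # excluded: 20 samples < 5 * 6 variables
    Fit("group_D", 0.80, 31.00, 60, 4),
    Fit("group_E", 0.70, 29.00, 60, 4),
]

chosen = [f.name for f in best_fits(candidates)]
print(chosen)
```

    Note that group_C is dropped despite having the highest R2, because the sample-size requirement is applied before ranking.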

  17. Surrogate waveform model data for black hole binary systems computed in...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Aug 19, 2024
    Cite
    Bachhar, Ritesh; Field, Scott; Giesler, Matthew; Gonzalez-Quesada, Kevin; Hughes, Scott; Islam, Tousif; Khanna, Gaurav; Kidder, Lawrence; Pfeiffer, Harald; Rifat, Nur; Rink, Katie; Scheel, Mark; Varma, Vijay (2024). Surrogate waveform model data for black hole binary systems computed in point-particle black hole perturbation theory [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_3592427
    Dataset updated
    Aug 19, 2024
    Dataset provided by
    The University of Texas at Austin
    Massachusetts Institute of Technology
    California Institute of Technology
    Cornell University
    University of Massachusetts Dartmouth
    Albert Einstein Institute
    University of Rhode Island
    Authors
    Bachhar, Ritesh; Field, Scott; Giesler, Matthew; Gonzalez-Quesada, Kevin; Hughes, Scott; Islam, Tousif; Khanna, Gaurav; Kidder, Lawrence; Pfeiffer, Harald; Rifat, Nur; Rink, Katie; Scheel, Mark; Varma, Vijay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains all publicly available surrogate data for gravitational waveforms produced within the point-particle black hole perturbation theory framework and calibrated to numerical relativity simulations performed with the Spectral Einstein Code (SpEC).

    Several surrogate models are currently available in this catalog:

    BHPTNRSur2dq1e3, for aligned-spin black hole binary systems with mass ratios varying from 3 to 1000 and spins on the larger black hole in the range −0.8 ≤ χ1 ≤ 0.8. This surrogate model is trained on waveform data generated by point-particle black hole perturbation theory (ppBHPT) with calibration to numerical relativity (NR) data. The waveforms include all spin-weighted spherical harmonic modes up to ℓ=4 except the (4,1) and m=0 modes. Model details can be found in Rink et al. 2024. This data file is used to evaluate the surrogate model with either stand-alone Python code hosted by the Black Hole Perturbation Toolkit (Jupyter notebook tutorial) or the GWSurrogate Python package, which can be found on PyPI or conda-forge.

    BHPTNRSur1dq1e4, an updated version of the EMRISur1dq1e4 model described below. The updated version includes better calibration to NR, a smoother transition to plunge model, and more harmonic modes. Model details can be found in Islam et al. 2022. This data file is used to evaluate the surrogate model with either stand-alone Python code hosted by the Black Hole Perturbation Toolkit (Jupyter notebook tutorial) or the GWSurrogate Python package, which can be found on PyPI or conda-forge.

    EMRISur1dq1e4, for non-spinning black hole binary systems with mass-ratios varying from 3 to 10000. This surrogate model is trained on waveform data generated by point-particle black hole perturbation theory (ppBHPT), with the total mass rescaling parameter tuned to NR simulations. Available modes are [(2,2), (2,1), (3,3), (3,2), (3,1), (4,4), (4,3), (4,2), (5,5), (5,4), (5,3)]. The m<0 modes are deduced from the m>0 modes. Model details can be found in Rifat et al. 2019. This data file is used to evaluate the surrogate model with either stand-alone Python code hosted by the Black Hole Perturbation Toolkit (Jupyter notebook tutorial) or the GWSurrogate Python package (Jupyter notebook tutorial), which can be found on PyPI.
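The description says only that the m<0 modes are deduced from the m>0 modes. A minimal sketch of how that is commonly done for non-precessing binaries is shown below, assuming the standard orbital-plane symmetry relation h(ℓ,−m) = (−1)^ℓ · conj(h(ℓ,m)). That relation, the function name `negative_m_mode`, and the sample values are assumptions for illustration and are not confirmed by the dataset description or taken from the GWSurrogate API.

```python
def negative_m_mode(ell, h_lm):
    """Build the (ell, -m) mode from samples of the (ell, m) mode.

    Assumes the symmetry h_{ell,-m} = (-1)^ell * conj(h_{ell,m}), which holds
    for non-precessing binaries; h_lm is a sequence of complex samples.
    """
    sign = -1 if ell % 2 else 1
    return [sign * z.conjugate() for z in h_lm]


# Made-up complex samples standing in for waveform mode time series.
h_22 = [1 + 2j, 0.5 - 1j]
h_33 = [1 + 1j]

h_2m2 = negative_m_mode(2, h_22)  # even ell: plain complex conjugate
h_3m3 = negative_m_mode(3, h_33)  # odd ell: conjugate with a sign flip
print(h_2m2, h_3m3)
```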

  18. Data from: Data augmentation for disruption prediction via robust surrogate...

    • dataverse.harvard.edu
    • osti.gov
    Updated Aug 31, 2024
    Cite
    Katharina Rath, David Rügamer, Bernd Bischl, Udo von Toussaint, Cristina Rea, Andrew Maris, Robert Granetz, Christopher G. Albert (2024). Data augmentation for disruption prediction via robust surrogate models [Dataset]. http://doi.org/10.7910/DVN/FMJCAD
    Dataset updated
    Aug 31, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Katharina Rath, David Rügamer, Bernd Bischl, Udo von Toussaint, Cristina Rea, Andrew Maris, Robert Granetz, Christopher G. Albert
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The goal of this work is to generate large, statistically representative datasets for training machine learning models for disruption prediction from data on only a few existing discharges. Such a comprehensive training database is important for achieving satisfactory and reliable prediction results with artificial neural network classifiers. Here, we aim for a robust augmentation of the training database for multivariate time series data using Student-t process regression. We apply Student-t process regression in a state space formulation via Bayesian filtering to tackle challenges imposed by outliers and noise in the training data set and to reduce the computational complexity. Thus, the method can also be used if the time resolution is high. We use an uncorrelated model for each dimension and impose correlations afterwards via coloring transformations. We demonstrate the efficacy of our approach on plasma diagnostics data of three different disruption classes from the DIII-D tokamak. To evaluate whether the distribution of the generated data is similar to that of the training data, we additionally perform statistical analyses using methods from time series analysis, descriptive statistics, and classic machine learning clustering algorithms.
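The idea of imposing correlations on independently generated dimensions via a coloring transformation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses Gaussian white noise as a stand-in for the per-dimension Student-t process samples, a Cholesky factor as the coloring matrix, and a made-up 2×2 target covariance.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L @ L.T == cov (cov must be positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def color(samples, chol):
    """Apply x = L z to each uncorrelated sample vector z."""
    n = len(chol)
    return [
        [sum(chol[i][j] * z[j] for j in range(i + 1)) for i in range(n)]
        for z in samples
    ]


def sample_covariance(samples):
    """Unbiased sample covariance matrix of a list of vectors."""
    count, n = len(samples), len(samples[0])
    mean = [sum(x[i] for x in samples) / count for i in range(n)]
    return [
        [
            sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples) / (count - 1)
            for j in range(n)
        ]
        for i in range(n)
    ]


random.seed(0)
target_cov = [[1.0, 0.8], [0.8, 1.0]]  # hypothetical target covariance
chol = cholesky(target_cov)

# Uncorrelated unit-variance samples, one independent draw per dimension.
white = [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(10000)]
colored = color(white, chol)
est_cov = sample_covariance(colored)
print(est_cov)
```

The estimated covariance of the colored samples should be close to `target_cov`, while each dimension was generated without reference to the other.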

  19. Elwha River Sediment Monitoring Data from the Elwha Bed Load Monitoring...

    • data.usbr.gov
    Updated Aug 15, 2014
    Cite
    United States Bureau of Reclamation (2014). Elwha River Sediment Monitoring Data from the Elwha Bed Load Monitoring Database [Dataset]. https://data.usbr.gov/catalog/4616
    Dataset updated
    Aug 15, 2014
    Dataset authored and provided by
    United States Bureau of Reclamation (http://www.usbr.gov/)
    Area covered
    Elwha River
    Description

    The files contain continuous bed load flux data, summed across the cross section and averaged over an hour. Data were collected with a calibrated surrogate sediment measurement system consisting of 72 impact plates, mounted adjacent to each other, spanning the cross section of the Elwha River at river kilometer 5 (river mile 3.1). The following Science and Technology project numbers apply to the Elwha bed load data: 0115, 1709, 6209, 4542, 9562, and 6499.

  20. Data from: Surrogate-based optimization using an artificial neural network...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 17, 2021
    Cite
    Pfeil, Markus; Slawig, Thomas (2021). Surrogate-based optimization using an artificial neural network for a parameter identification in a 3D marine ecosystem model [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5643666
    Dataset updated
    Nov 17, 2021
    Dataset provided by
    Kiel University
    Authors
    Pfeil, Markus; Slawig, Thomas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    Parameter identification for marine ecosystem models is important for the assessment and validation of these models against observational data. Surrogate-based optimization (SBO) is a computationally efficient method for optimizing complex models. SBO replaces the computationally expensive (high-fidelity) model with a surrogate constructed from a less accurate but computationally cheaper (low-fidelity) model in combination with an appropriate correction approach, which improves the accuracy of the low-fidelity model. To construct a computationally cheap low-fidelity model, we tested three different approaches to computing an approximation of the annually periodic solution (i.e., a steady annual cycle) of a marine ecosystem model: firstly, a reduced number of spin-up iterations (several decades instead of millennia); secondly, an artificial neural network (ANN) approximating the steady annual cycle; and, finally, a combination of both approaches. Except for the low-fidelity model using only the ANN, the SBO yielded a solution close to the target and reduced the computational effort significantly. If an ANN that appropriately approximates a marine ecosystem model is available, the SBO using this ANN as the low-fidelity model is a promising and computationally efficient method for validation.
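The SBO loop described above can be illustrated with a deliberately small toy problem. Everything in the sketch is an assumption made for illustration: the scalar `high_fidelity` and `low_fidelity` functions stand in for the ecosystem model and its cheap approximation, the correction is a simple multiplicative factor re-aligned at each iterate, and the inner optimizer is a golden-section search. The abstract does not specify the correction approach or optimizer actually used.

```python
import math


def high_fidelity(u):
    """Toy stand-in for the expensive model (hypothetical)."""
    return u + 0.1 * u * u


def low_fidelity(u):
    """Toy stand-in for the cheap, less accurate model (hypothetical)."""
    return u


def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def surrogate_based_optimization(target, u0, lo=0.0, hi=5.0, iterations=10):
    """Find u whose high-fidelity output matches target via corrected surrogates."""
    u = u0
    for _ in range(iterations):
        # One expensive evaluation per iteration aligns the surrogate with the
        # high-fidelity model at the current iterate (multiplicative correction).
        correction = high_fidelity(u) / low_fidelity(u)

        def surrogate_misfit(v, a=correction):
            return (a * low_fidelity(v) - target) ** 2

        # The inner optimization uses only the cheap corrected surrogate.
        u = golden_section_min(surrogate_misfit, lo, hi)
    return u


u_true = 2.0
target = high_fidelity(u_true)  # synthetic "observation" from a known parameter
u_est = surrogate_based_optimization(target, u0=1.0)
print(u_est)
```

Each outer iteration costs a single high-fidelity evaluation, while the many evaluations inside the inner search hit only the cheap model, which is where the computational saving comes from.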

    Content:

    SQLite database including the data of the different optimization runs

    Structure and weights of the used artificial neural network

    Tracer concentrations obtained from the high-fidelity model for the different optimization runs
