100+ datasets found

Data from: A surrogate-based approach for post-genomic partner...
catalog.data.gov
healthdata.gov
Updated Sep 6, 2025
National Institutes of Health (2025). A surrogate-based approach for post-genomic partner identification [Dataset]. https://catalog.data.gov/dataset/a-surrogate-based-approach-for-post-genomic-partner-identification
Dataset updated
Sep 6, 2025
Dataset provided by
National Institutes of Health
Description
Background: Modern drug discovery is concerned with identification and validation of novel protein targets from among the 30,000 or more genes postulated to be present in the human genome. While protein-protein interactions may be central to many disease indications, it has been difficult to identify new chemical entities capable of regulating these interactions as either agonists or antagonists. Results: In this paper, we show that peptide complements (or surrogates) derived from highly diverse random phage display libraries can be used for the identification of the expected natural biological partners for protein and non-protein targets. Our examples include surrogates isolated against both an extracellular secreted protein (TNFβ) and intracellular disease-related mRNAs. In each case, surrogates binding to these targets were obtained and found to contain partner information embedded in their amino acid sequences. Furthermore, this information was able to identify the correct biological partners from large human genome databases by rapid, integrated computer-based searches. Conclusions: Modified versions of these surrogates should provide agents capable of modifying the activity of these targets and enable one to study their involvement in specific biological processes as a means of target validation for downstream drug discovery.
Surrogate ICR data
kaggle.com
zip
Updated Aug 12, 2023
Tilii (2023). Surrogate ICR data [Dataset]. https://www.kaggle.com/datasets/tilii7/surrogate-icr-data
Available download formats: zip (260,685 bytes)
Dataset updated
Aug 12, 2023
Authors
Tilii
Description
Dataset

This dataset was created by Tilii

Replication Data for: The Surrogate Index: Combining Short-Term Proxies to...
dataverse.harvard.edu
search.dataone.org
Updated Feb 23, 2022
Susan Athey; Raj Chetty; Guido Imbens; Hyunseung Kang (2022). Replication Data for: The Surrogate Index: Combining Short-Term Proxies to Estimate Long-Term Treatment Effects More Rapidly and Precisely [Dataset]. http://doi.org/10.7910/DVN/QCKJYL
Croissant metadata available (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Unique identifier
https://doi.org/10.7910/DVN/QCKJYL
Dataset updated
Feb 23, 2022
Dataset provided by
Harvard Dataverse
Authors
Susan Athey; Raj Chetty; Guido Imbens; Hyunseung Kang
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/QCKJYL
Description
This dataset contains replication files for "The Surrogate Index: Combining Short-Term Proxies to Estimate Long-Term Treatment Effects More Rapidly and Precisely" by Susan Athey, Raj Chetty, Guido Imbens, and Hyunseung Kang. For more information, see https://opportunityinsights.org/paper/the-surrogate-index/. A summary of the related publication follows. The impacts of many policies, such as efforts to increase upward income mobility or improve health outcomes, are only observed with long delays. For example, it can take decades to see the effects of early childhood interventions on lifetime earnings. This problem has greatly limited researchers’ and policymakers’ ability to test and improve policies and arises frequently in our own work at Opportunity Insights on the determinants of economic opportunity. In this study, we develop a new method of estimating the long-term impacts of policies more rapidly and precisely using short-term proxies. We predict long-term outcomes (e.g., lifetime earnings) using short-term outcomes (e.g., earnings in early adulthood or test scores). We then show that the causal effects of policies on this predictive index (which we term a “surrogate index”, following terminology in the statistics literature) can help us learn about their long-term impacts more quickly under certain assumptions that are described in the full paper. We apply our method to analyze the long-term impacts of a job training experiment in California. Using short-term employment rates as surrogates, we show that one could have estimated the program’s impact on mean employment rates over a 9-year horizon within 1.5 years, with a 35% reduction in standard errors. The success of the surrogate index in this job training application suggests that our method could be applied to predict the long-term impacts of other programs as well. 
Going forward, we hope to build a public library of early indicators (surrogate indices) for social science by harnessing historical experiments along with the large-scale datasets we have built. If you would like to contribute to this effort by reporting a surrogate index that predicts long-term impacts estimated in an experiment, as in the GAIN program, please contact us.
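A minimal sketch of the two-step idea summarized above, assuming a linear predictive model and purely synthetic data (none of the numbers come from the paper or the GAIN experiment): fit the long-term outcome on short-term surrogates in a historical sample, then compare the mean predicted index between treated and control units of an experiment. The function names `fit_surrogate_index` and `surrogate_index_effect` are hypothetical; the paper's actual estimator and its identifying assumptions are described in the full paper.

```python
import numpy as np

def fit_surrogate_index(S_hist, Y_hist):
    """Least-squares fit of a long-term outcome on short-term surrogates (with intercept)."""
    X = np.column_stack([np.ones(len(S_hist)), S_hist])
    coef, *_ = np.linalg.lstsq(X, Y_hist, rcond=None)
    return coef

def surrogate_index_effect(coef, S_exp, treated):
    """Difference in mean predicted long-term outcome (the surrogate index) between arms."""
    X = np.column_stack([np.ones(len(S_exp)), S_exp])
    index = X @ coef
    treated = np.asarray(treated, dtype=bool)
    return index[treated].mean() - index[~treated].mean()

# Synthetic historical sample: the long-term outcome depends on two short-term surrogates.
rng = np.random.default_rng(0)
S_hist = rng.normal(size=(5000, 2))
Y_hist = 1.0 + 2.0 * S_hist[:, 0] + 0.5 * S_hist[:, 1] + rng.normal(scale=0.1, size=5000)
coef = fit_surrogate_index(S_hist, Y_hist)

# Synthetic experiment: treatment shifts only the first surrogate by 1.0,
# so the implied long-term effect is about 2.0.
treated = rng.integers(0, 2, size=4000).astype(bool)
S_exp = rng.normal(size=(4000, 2)) + np.outer(treated, [1.0, 0.0])
effect = surrogate_index_effect(coef, S_exp, treated)
print(f"estimated long-term effect: {effect:.2f}")
```

The sketch only illustrates the mechanics; whether such an estimate is valid depends on the surrogacy and comparability assumptions discussed in the paper.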
Data from: How Important Is ‘Accuracy’ of Surrogate Decision-Making for...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jan 31, 2013
De Vries, Raymond; Kim, H. Myra; Knopman, David S.; Kim, Scott Y. H.; Appelbaum, Paul S.; Ryan, Kerry A.; Damschroder, Laura (2013). How Important Is ‘Accuracy’ of Surrogate Decision-Making for Research Participation? [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001728762
Dataset updated
Jan 31, 2013
Authors
De Vries, Raymond; Kim, H. Myra; Knopman, David S.; Kim, Scott Y. H.; Appelbaum, Paul S.; Ryan, Kerry A.; Damschroder, Laura
Description
Background: There is a longstanding concern about the accuracy of surrogate consent in representing the health care and research preferences of those who lose their ability to decide for themselves. We sought informed, deliberative views of the older general public (≥50 years old) regarding their willingness to participate in dementia research and to grant leeway to future surrogates to choose an option contrary to their stated wishes. Methodology/Principal Findings: 503 persons aged 50+ recruited by random digit dialing were randomly assigned to one of three groups: deliberation, education, or control. The deliberation group attended an all-day education/peer deliberation session; the education group received written information only. Participants were surveyed at baseline, after the deliberation session (or equivalent time), and one month after the session, regarding their willingness to participate in dementia research and to give leeway to surrogates, for studies of varying risk-benefit profiles (a lumbar puncture study, a drug randomized controlled trial, a vaccine randomized controlled trial, and an early-phase gene transfer trial). At baseline, 48% (gene transfer scenario) to 92% (drug RCT) were willing to participate in future dementia research. A majority of respondents (57–71% depending on scenario) were willing to give leeway to future surrogate decision-makers. Democratic deliberation increased willingness to participate in all scenarios, to grant leeway in 3 of 4 scenarios (lumbar puncture, vaccine, and gene transfer), and to enroll loved ones in research in all scenarios. On average, respondents were more willing to volunteer themselves for research than to enroll their loved ones. Conclusions/Significance: Most people were willing to grant leeway to their surrogates, and this willingness was either sustained or increased after democratic deliberation, suggesting that the attitude toward leeway is a reliable opinion. 
Eliciting a person’s current preferences about future research participation should also involve eliciting his or her leeway preferences.
Data from: Beyond surrogacy: A multi-taxon approach to conservation...
datadryad.org
data.niaid.nih.gov
zip
Updated Sep 8, 2015
Brenda D. Smith-Patten; Michael A. Patten (2015). Beyond surrogacy: A multi-taxon approach to conservation biogeography [Dataset]. http://doi.org/10.5061/dryad.pk27h
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.pk27h
Dataset updated
Sep 8, 2015
Dataset provided by
Dryad
Authors
Brenda D. Smith-Patten; Michael A. Patten
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
2015
Area covered
East Africa, Eastern Arc Mountains
Description
Smith-Patten_Patten_data: Presence/absence data for each taxon used in the study.
Smith-Patten_Patten_calculator_matrices: Calculator used for pairwise Jaccard’s dissimilarity indices; resulting matrices for all taxa.
Smith-Patten_Patten_BARRIER_output: Raw BARRIER 2.2 output for all taxa.
Smith-Patten_Patten_C_Program: C program to create bootstrap matrices.
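The pairwise Jaccard dissimilarity between two areas is 1 − |A ∩ B| / |A ∪ B|, where A and B are the sets of taxa present in each. The sketch below is written in Python rather than reproducing the authors' C program; it assumes a 0/1 matrix with areas as rows and taxa as columns, and it bootstraps by resampling taxa with replacement. Both the layout and the resampling scheme are assumptions, not a description of the deposited files.

```python
import numpy as np

def jaccard_dissimilarity(pa):
    """Pairwise Jaccard dissimilarity between rows (areas) of a 0/1 presence/absence matrix."""
    pa = np.asarray(pa, dtype=bool)
    inter = (pa[:, None, :] & pa[None, :, :]).sum(axis=2)
    union = (pa[:, None, :] | pa[None, :, :]).sum(axis=2)
    # Convention (assumption): two areas with no recorded taxa are treated as identical.
    sim = np.divide(inter, union, out=np.ones(union.shape), where=union > 0)
    return 1.0 - sim

def bootstrap_dissimilarities(pa, n_boot, seed=0):
    """Yield dissimilarity matrices computed on taxa resampled with replacement."""
    rng = np.random.default_rng(seed)
    pa = np.asarray(pa, dtype=bool)
    n_taxa = pa.shape[1]
    for _ in range(n_boot):
        cols = rng.integers(0, n_taxa, size=n_taxa)
        yield jaccard_dissimilarity(pa[:, cols])

# Hypothetical 3-area x 4-taxon matrix.
pa = [[1, 1, 0, 0],
      [1, 0, 1, 0],
      [0, 0, 0, 1]]
D = jaccard_dissimilarity(pa)
boot = list(bootstrap_dissimilarities(pa, n_boot=100))
```

Areas 1 and 2 share one of three taxa recorded between them, so their dissimilarity is 1 − 1/3 = 2/3; area 3 shares no taxa with either and is at dissimilarity 1.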
Database for "Estimation of the Energy Recovery and Emission Potential of...
data.niaid.nih.gov
zenodo.org
Updated Jun 7, 2023
Berre, Ivar Tustervatn; Anquetil-Deck, Candy; Schulze-Netzer, Corinna (2023). Database for "Estimation of the Energy Recovery and Emission Potential of Typically Incinerated Norwegian Waste Classes" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8013078
Dataset updated
Jun 7, 2023
Dataset provided by
NTNU
Authors
Berre, Ivar Tustervatn; Anquetil-Deck, Candy; Schulze-Netzer, Corinna
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A great challenge for waste-to-energy power plants is their uncertain and variable feedstock, which can prevent the plants from running as efficiently as possible, reducing energy output and making emissions harder to control. One way to describe the feedstock is to use surrogates: the hundreds or thousands of different species in a feedstock are modelled using a few surrogate species, which makes the feedstock possible to model. The surrogates also provide an estimate of the higher heating value (HHV) and of the fractions of biomass, oil-based waste and inorganics. This thesis formulated surrogates for typically incinerated waste classes, using a linear least-squares solution between the available surrogate species and experimental values. Most of the species used came from two existing models in the literature, but three new species were created to improve the representation of some waste classes containing fossil-derived wastes, rubber and PET. These were made by creating reactions based on experimental data from the literature and then testing these reactions under pyrolysis conditions in a stochastic reactor model. The surrogates for the waste classes were formulated by first dividing the waste into components and then finding the surrogate formulation for each component. Surrogates were found for 41 components, which were then used to create surrogate formulations for 30 waste classes. Most of the surrogates reproduced the elemental composition accurately compared with experimental values. A statistical overview of the experimental and model data for the waste classes was also created. This overview is relevant for stakeholders in waste management and for other research, such as life-cycle analysis.
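The linear least-squares step can be sketched as follows: each column of `A_elem` holds the elemental mass fractions of one surrogate species, `b_elem` holds the composition of a waste component, and the solution gives the species mass fractions. The two species (cellulose and polyethylene), their rounded C/H/O fractions, the appended sum-to-one row, and the synthetic target are all illustrative assumptions, not values from the thesis, which may also impose constraints (for example non-negativity) not shown here.

```python
import numpy as np

# Approximate C, H, O mass fractions of two illustrative surrogate species (columns).
species = ["cellulose", "polyethylene"]
A_elem = np.array([
    [0.444, 0.856],   # C
    [0.062, 0.144],   # H
    [0.494, 0.000],   # O
])

def fit_surrogate(A_elem, b_elem):
    """Least-squares species mass fractions, with an extra row asking them to sum to one."""
    A = np.vstack([A_elem, np.ones(A_elem.shape[1])])
    b = np.append(b_elem, 1.0)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Synthetic target: the elemental composition of a 70/30 mixture of the two species.
b_elem = A_elem @ np.array([0.7, 0.3])
x = fit_surrogate(A_elem, b_elem)
for name, frac in zip(species, x):
    print(f"{name}: {frac:.3f}")
```

Because the synthetic target is an exact mixture, the fit recovers the 0.7/0.3 split; with real experimental compositions the residual would be nonzero and its size indicates how well the chosen species represent the waste component.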
Data underlying the publication: Surrogate-guided Optimization in Quantum...
data.4tu.nl
zip
Updated Jul 17, 2024
Luise Prielinger; Álvaro Gómez Iñesta; Gayane Vardoyan (2024). Data underlying the publication: Surrogate-guided Optimization in Quantum Networks [Dataset]. http://doi.org/10.4121/a07a9e97-f34c-4e7f-9f68-1010bfb857d0.v4
Available download formats: zip
Unique identifier
https://doi.org/10.4121/a07a9e97-f34c-4e7f-9f68-1010bfb857d0.v4
Dataset updated
Jul 17, 2024
Dataset provided by
4TU.ResearchData
Authors
Luise Prielinger; Álvaro Gómez Iñesta; Gayane Vardoyan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2023 - 2025
Area covered
The Netherlands, Delft
Dataset funded by
Dutch Research Council (NWO)
Description
This data is associated with the paper "Surrogate-guided Optimization in Quantum Networks".
In this work we introduce an efficient optimization workflow using machine-learning models that outperforms traditional techniques, addressing the challenges of complex, computationally demanding simulations in quantum networking. Please find guidelines and more context in the README.md file.
Reimbursing a sperm or ova (egg) donor or a surrogate for expenditures...
data.urbandatacentre.ca
Updated Oct 19, 2025
(2025). Reimbursing a sperm or ova (egg) donor or a surrogate for expenditures related to donation or surrogacy - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-60fd4458-8b35-424a-bdf8-8d8983f367ec
Dataset updated
Oct 19, 2025
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Area covered
Canada
Description
In Canada, it is illegal to purchase sperm or eggs from a donor (or person acting on behalf of a donor) or pay a female person to be a surrogate. However, donors and surrogates may be reimbursed for out-of-pocket expenditures incurred because of their donation or surrogacy that are provided for in the regulations.
Data from: Patterns of biodiverse, understudied groups do not mirror those...
datadryad.org
datasetcatalog.nlm.nih.gov
zip
Updated Aug 3, 2018
Jenna E. Dorey; James C. Lendemer; Robert F.C. Naczi (2018). Patterns of biodiverse, understudied groups do not mirror those of the surrogate groups that set conservation priorities: a case study from the Mid-Atlantic Coastal Plain of eastern North America [Dataset]. http://doi.org/10.5061/dryad.5mk37
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.5mk37
Dataset updated
Aug 3, 2018
Dataset provided by
Dryad
Authors
Jenna E. Dorey; James C. Lendemer; Robert F.C. Naczi
Time period covered
Aug 2, 2017
Area covered
Atlantic Plain, North America, Virginia, Delaware, Maryland, United States, Delmarva
Description
data_packet: This is a zipped folder that contains the following spreadsheets used for analyses in this study: 1) site-by-species matrices for lichens (1 CSV file) and for vascular plants (1 CSV file); 2) location data for study sites (1 XLSX file); 3) georeferenced specimen data used to generate species lists for each site (1 XLSX file).
Data from: Surrogate Variable Analysis.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Apr 4, 2013
Benson, Hollie; Monks, Noel; Mooney, Marie; Rusk, Tony; Liang, Winnie; Legendre, Christophe; Cherba, David; Berlinski, Pamela; Webb, Craig Paul; Eugster, Emily; Kamerling, Steve; Simpson, Heather; Marotti, Keith; Bond, Jeffrey; Tembe, Waibhav (2013). Surrogate Variable Analysis. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001648498
Dataset updated
Apr 4, 2013
Authors
Benson, Hollie; Monks, Noel; Mooney, Marie; Rusk, Tony; Liang, Winnie; Legendre, Christophe; Cherba, David; Berlinski, Pamela; Webb, Craig Paul; Eugster, Emily; Kamerling, Steve; Simpson, Heather; Marotti, Keith; Bond, Jeffrey; Tembe, Waibhav
Description
Surrogate variable [42] loadings by technology.
Data Sheet 1_Association of surrogate adiposity markers with prevalence,...
frontiersin.figshare.com
docx
Updated Mar 4, 2025
Fan-Shun Guo; Chen Guo; Jia-Hao Dou; Jun-Xiang Wang; Rui-Yun Wu; Shou-Fang Song; Xue-Lu Sun; Yi-Wei Hu; Jin Wei (2025). Data Sheet 1_Association of surrogate adiposity markers with prevalence, all-cause mortality and long-term survival of heart failure: a retrospective study from NHANES database.docx [Dataset]. http://doi.org/10.3389/fendo.2025.1430277.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fendo.2025.1430277.s001
Dataset updated
Mar 4, 2025
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Fan-Shun Guo; Chen Guo; Jia-Hao Dou; Jun-Xiang Wang; Rui-Yun Wu; Shou-Fang Song; Xue-Lu Sun; Yi-Wei Hu; Jin Wei
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: Obesity, especially abdominal obesity, is more common in patients with heart failure (HF), but body mass index (BMI) cannot accurately describe fat distribution. Several surrogate adiposity markers are available to reflect fat distribution and quantity. The objective of this study was to explore which adiposity marker is most strongly correlated with HF prevalence, all-cause mortality and patients’ long-term survival. Methods: All data for this study came from the National Health and Nutrition Examination Survey (NHANES) database. Logistic regression analyses were used to compare the association of each surrogate adiposity marker with the prevalence of HF. Cox proportional hazards models and restricted cubic spline (RCS) analysis were used to assess the association between surrogate adiposity markers and all-cause mortality in HF patients. The ability of surrogate adiposity markers to predict long-term survival in HF patients was assessed using time-dependent receiver operating characteristic (ROC) curves. Results: 46,257 participants (1,366 HF patients) were included in this retrospective study. The area under the ROC curve (AUC) for the prevalence of HF assessed by the weight-adjusted-waist index (WWI) was 0.70 (95% CI: 0.69-0.72). During a median follow-up of 70 months, 700 deaths were recorded among the 1,366 HF patients. Compared with the lowest quartile group, the hazard ratio (HR) for all-cause mortality in HF patients was 1.33 (95% CI: 1.06-1.66) in the quartile 4 group of a body shape index (ABSI) and 1.43 (95% CI: 1.13-1.82) in the WWI quartile 4 group. The AUC for predicting 5-year survival of HF patients using the ABSI was 0.647 (95% CI: 0.61-0.68). Conclusions: WWI is strongly correlated with the prevalence of HF. Among HF patients, those with higher WWI and ABSI tend to have higher all-cause mortality. ABSI can predict patients’ long-term survival. We recommend the use of WWI and ABSI for assessing obesity in HF patients.
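For reference, the two markers highlighted in the conclusions are commonly defined as WWI = waist circumference (cm) / √weight (kg) and ABSI = waist circumference (m) / (BMI^(2/3) × height (m)^(1/2)). These standard definitions are not stated in the description above, so treat the sketch as an assumption about how the study computed them; the participant values are hypothetical.

```python
import math

def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2

def wwi(waist_cm, weight_kg):
    """Weight-adjusted-waist index: waist (cm) divided by the square root of weight (kg)."""
    return waist_cm / math.sqrt(weight_kg)

def absi(waist_m, weight_kg, height_m):
    """A body shape index: waist (m) / (BMI^(2/3) * height(m)^(1/2))."""
    return waist_m / (bmi(weight_kg, height_m) ** (2 / 3) * math.sqrt(height_m))

# Hypothetical participant: 1.80 m tall, 81 kg, 90 cm waist.
print(round(bmi(81, 1.80), 1))   # 25.0
print(round(wwi(90, 81), 2))     # 10.0
print(round(absi(0.90, 81, 1.80), 4))
```

Note the unit mismatch between the two indices (centimetres for WWI, metres for ABSI); mixing them up changes WWI by a factor of 100.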
Surrogate metrics for real data.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Oct 2, 2017
Zhang, Hui; Graham, Mark S.; Drobnjak, Ivana; Jenkinson, Mark (2017). Surrogate metrics for real data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001796871
Dataset updated
Oct 2, 2017
Authors
Zhang, Hui; Graham, Mark S.; Drobnjak, Ivana; Jenkinson, Mark
Description
Table shows whole-brain-mean intensity differences between AP and LR corrected datasets (units are arbitrary signal units). Errors are the standard deviation of the means over the ten subjects. Metrics show statistically significant differences between all methods at the p<0.001 level.
-values for histograms from trade vs. surrogate data.
plos.figshare.com
xls
Updated May 30, 2023
Peter Klimek; Ricardo Hausmann; Stefan Thurner (2023). -values for histograms from trade vs. surrogate data. [Dataset]. http://doi.org/10.1371/journal.pone.0038924.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0038924.t001
Dataset updated
May 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Peter Klimek; Ricardo Hausmann; Stefan Thurner
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The column ‘’ lists the results as described in the main text. In addition, robustness tests were conducted, with results listed in separate columns. ‘ = 200 k USD’ uses the same set of countries and a threshold of 200,000 USD below which trade flows are ignored. The column ‘all products with positive exports’ includes all products that have positive world exports in each year of the analysis. The column ‘1989–2000’ reduces the number of years included in the analysis. Results excluding the FSU and CEE are listed in ‘Excl. FSU’. The maximal lag is then increased to . The last column reports results using the UN ComTrade dataset, as described in Text S1.
Data from: Surrogate's Court
openheritage3d.org
Updated Apr 20, 2021
Lisa Conte (2021). Surrogate's Court [Dataset]. http://doi.org/10.26301/nted-p659
Unique identifier
https://doi.org/10.26301/nted-p659
Dataset updated
Apr 20, 2021
Dataset provided by
Open Heritage 3D
Authors
Lisa Conte
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Spatial Data
Measurement technique
Photogrammetry - Terrestrial
Description
This data was collected as part of the Map the Moment initiative, a volunteer project to document the artwork and changes to the streetscape following the killing of George Floyd and the demonstrations that followed. This data was collected by Lisa Conte and processed by Joe Graham-Felsen. They used a Canon 5D Mark 3 to scan this data and capture the various murals that appeared throughout the city.
Data from: Debate: The slippery slope of surrogate outcomes
catalog.data.gov
data.virginia.gov
Updated Sep 6, 2025
National Institutes of Health (2025). Debate: The slippery slope of surrogate outcomes [Dataset]. https://catalog.data.gov/dataset/debate-the-slippery-slope-of-surrogate-outcomes
Dataset updated
Sep 6, 2025
Dataset provided by
National Institutes of Health
Description
Surrogate outcomes are frequently used in cardiovascular disease research. A concern is that changes in surrogate markers may not reflect changes in disease outcomes. Two recent clinical trials (Heart and Estrogen/Progestin Replacement Study [HERS], and the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial [ALLHAT]) underscore this problem since their results contradicted what was expected based on the surrogate outcomes. The current regulatory policy to allow new therapies to be introduced onto the market based solely on surrogate outcomes may need to be reviewed.
Data from: Datasets for Comparison of Surrogate Models to Estimate Pesticide...
s.cnmilf.com
data.usgs.gov
Updated Oct 8, 2025
U.S. Geological Survey (2025). Datasets for Comparison of Surrogate Models to Estimate Pesticide Concentrations at Six U.S. Geological Survey National Water Quality Network Sites During Water Years 2013–2018 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/datasets-for-comparison-of-surrogate-models-to-estimate-pesticide-concentrations-at-six-u-
Dataset updated
Oct 8, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
This data release comprises data tables of input variables for seawaveQ and surrogate models used to predict concentrations of select pesticides at six U.S. Geological Survey National Water Quality Network (NWQN) river sites (Fanno Creek at Durham, Oregon; White River at Hazleton, Indiana; Kansas River at DeSoto, Kansas; Little Arkansas River near Sedgwick, Kansas; Missouri River at Hermann, Missouri; Red River of the North at Grand Forks, North Dakota). Each data table includes discrete concentrations of one select pesticide (Atrazine, Azoxystrobin, Bentazon, Bromacil, Imidacloprid, Simazine, or Triclopyr) at one of the NWQN sites; daily mean streamflow; 30-day and 1-day flow anomalies; daily median values of pH and turbidity; daily mean values of dissolved oxygen, specific conductance, and water temperature; and 30-day and 1-day anomalies for pH, turbidity, dissolved oxygen, specific conductance, and water temperature. Two pesticides were modeled at each site with three types of regression models. Also included is a zip file with outputs from the seawaveQ model summary. The processes for retrieving and preparing data for regression models followed those outlined in the SEAWAVE-Q R package documentation (Ryberg and Vecchia, 2013; Ryberg and York, 2020). The R package waterData (Ryberg and Vecchia, 2012) was used to import daily mean values for discharge and either daily mean or daily median values for continuous water-quality constituents directly into R depending on what data were available at each site. Pesticide concentration, streamflow, and surrogate data (continuously measured field parameters) were imported from and are available online from the USGS National Water Information System database (USGS, 2020). 
The waterData package was used to screen for missing daily mean discharge values (no missing values were found for the sites) and to calculate short-term (1-day) and mid-term (30-day) anomalies for flow and short-term (1-day) anomalies for each water-quality variable. A mid-term streamflow anomaly, for instance, is the deviation of concurrent daily streamflow from average conditions for the previous 30 days (Vecchia and others, 2008). Anomalies were calculated as additional potential model variables. Pesticide concentrations for select constituents from each site were pulled into R using the dataRetrieval package (De Cicco and others, 2018). Pesticide data were pulled for WY 2013–17 for three of the six sites (Kansas River at DeSoto, Kansas; Missouri River at Hermann, Missouri; and White River at Hazleton, Indiana) and for WY 2013–18 for the other three sites (Fanno Creek at Durham, Oregon; Little Arkansas River near Sedgwick, Kansas; and Red River of the North at Grand Forks, North Dakota). Discrete pesticide data were matched with daily mean discharge, daily mean or median water-quality constituents, and the associated calculated short-term (1-day) and mid-term (30-day) anomalies from the date of sampling. Pesticide concentrations were estimated using the SEAWAVE-Q (with surrogates) model with 19 combinations of surrogate variables (table 2 in the associated SIR, "Comparison of Surrogate Models to Estimate Pesticide Concentrations at Six U.S. Geological Survey National Water Quality Network Sites During Water Years 2013–18.") at each of 12 site-pesticide combinations (table 3 in the associated SIR). Three measures of model performance, namely the generalized coefficient of determination (R2), Akaike’s Information Criterion (AIC), and scale, were included in the output and used to select best-fit models (Table 4 of the associated SIR). 
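The quoted definition of a mid-term anomaly (the concurrent daily value minus the average of the previous 30 days) can be written directly. The sketch below applies that literal definition to base-10 log values; the log transform, the NaN handling of the first `window` days, and the function name `trailing_anomaly` are assumptions, and the waterData package's own implementation may differ in detail.

```python
import numpy as np

def trailing_anomaly(daily_values, window, log=True):
    """Value on day t minus the mean of the previous `window` days (NaN until enough history)."""
    x = np.asarray(daily_values, dtype=float)
    if log:
        x = np.log10(x)  # assumption: anomalies computed on log-transformed values
    out = np.full(x.shape, np.nan)
    for t in range(window, len(x)):
        out[t] = x[t] - x[t - window:t].mean()
    return out

# Hypothetical 60-day daily mean streamflow series.
rng = np.random.default_rng(1)
flow = 100 * np.exp(np.cumsum(rng.normal(scale=0.05, size=60)))
mid_term = trailing_anomaly(flow, window=30)   # 30-day (mid-term) anomaly
short_term = trailing_anomaly(flow, window=1)  # 1-day (short-term) anomaly
```

With `window=1` the anomaly reduces to the day-to-day change in (log) flow, which is one plausible reading of the short-term anomaly mentioned above.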
The three to four best-fit SEAWAVE-Q (with surrogates) models with sample sizes at least five times the number of variables were selected for each site-pesticide combination based on generalized R2 values—the higher, the better. If generalized R2 values were the same, the model with the lower AIC value was used. The standard surrogate regression and base SEAWAVE-Q models were then applied using the same samples that were used for each of the best-fit SEAWAVE-Q (with surrogates) models so that direct comparisons could be made for each site-pesticide-surrogate instance. The input data used to estimate daily pesticide concentrations for each of the best fit models have been included in this data release. An example of one output file for each model type is included in a .zip file named "output_examples.zip". Each of the output files shows the three measures of model performance. (1) The output file for the standard regression model named "HAZ8_Atrazine_Standard_Regression_Output.txt" includes: Pseudo R-square (Allison) of 0.631, Model AIC of 174.0232, and a Scale of 0.961. (2) The output file for the base SEAWAVE-Q model named "HAZ8_Atrazine_Base_Seawave-Q_Output.txt" includes: Generalized r-squared of 0.82, AIC (Akaike's An Information Criterion) of 36.38, and a Scale of 0.288. (3) The output file for the SEAWAVE-Q w/Surrogates model named "HAZ8_Atrazine_Seawave-Q_w_Surrogates_Output.txt" includes: Generalized r-squared of 0.85, AIC (Akaike's An Information Criterion) of 33.76, and a Scale of 0.268. These values match those for Site ID = HAZ, Pesticide = Atrazine, and Surrogate variable group 8 for each model type in Table 4 of the associated SIR.
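The model-selection rule described above (keep models with at least five samples per variable, rank by generalized R², break ties with the lower AIC, keep the top three or four) can be sketched as a filter plus a sort. The `Candidate` record, `select_best` function, and example values are hypothetical and are not taken from Table 4 of the SIR.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    r2: float        # generalized R^2 (higher is better)
    aic: float       # Akaike's Information Criterion (lower is better)
    n_samples: int
    n_vars: int

def select_best(candidates, k=3):
    """Keep models with >= 5 samples per variable; rank by R^2, then by lower AIC."""
    eligible = [c for c in candidates if c.n_samples >= 5 * c.n_vars]
    eligible.sort(key=lambda c: (-c.r2, c.aic))
    return eligible[:k]

candidates = [
    Candidate("group A", r2=0.85, aic=33.8, n_samples=60, n_vars=4),
    Candidate("group B", r2=0.85, aic=30.1, n_samples=60, n_vars=4),
    Candidate("group C", r2=0.90, aic=40.0, n_samples=10, n_vars=4),  # too few samples
    Candidate("group D", r2=0.80, aic=20.5, n_samples=50, n_vars=3),
    Candidate("group E", r2=0.70, aic=10.2, n_samples=50, n_vars=2),
]
print([c.name for c in select_best(candidates, k=3)])  # ['group B', 'group A', 'group D']
```

Group C has the highest R² but is excluded by the sample-size rule, and the tie between A and B is broken in favour of B's lower AIC.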
Surrogate waveform model data for black hole binary systems computed in...
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Aug 19, 2024
Bachhar, Ritesh; Field, Scott; Giesler, Matthew; Gonzalez-Quesada, Kevin; Hughes, Scott; Islam, Tousif; Khanna, Gaurav; Kidder, Lawrence; Pfeiffer, Harald; Rifat, Nur; Rink, Katie; Scheel, Mark; Varma, Vijay (2024). Surrogate waveform model data for black hole binary systems computed in point-particle black hole perturbation theory [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_3592427
Dataset updated
Aug 19, 2024
Dataset provided by
The University of Texas at Austin
Massachusetts Institute of Technology
California Institute of Technology
Cornell University
University of Massachusetts Dartmouth
Albert Einstein Institute
University of Rhode Island
Authors
Bachhar, Ritesh; Field, Scott; Giesler, Matthew; Gonzalez-Quesada, Kevin; Hughes, Scott; Islam, Tousif; Khanna, Gaurav; Kidder, Lawrence; Pfeiffer, Harald; Rifat, Nur; Rink, Katie; Scheel, Mark; Varma, Vijay
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains all publicly available surrogate data for gravitational waveforms produced within the point-particle black hole perturbation theory framework and calibrated to numerical relativity simulations performed with the Spectral Einstein Code (SpEC).

Several surrogate models are currently available in this catalog:

BHPTNRSur2dq1e3, for aligned spin black hole binary systems with mass-ratios varying from 3 to 1000 and spins from −0.8≤χ1≤0.8 on the larger black hole. This surrogate model is trained on waveform data generated by point-particle black hole perturbation theory (ppBHPT) with calibration to numerical relativity (NR) data. The waveforms include all spin-weighted spherical harmonic modes up to ℓ=4 except the (4,1) and m=0 modes. Model details can be found in Rink et al. 2024. This data file is used to evaluate the surrogate model with either stand-alone Python code hosted by the Black Hole Perturbation Toolkit (Jupyter notebook tutorial) or the GWSurrogate Python package, which can be found on PyPI or conda-forge.

BHPTNRSur1dq1e4, an updated version of the EMRISur1dq1e4 model described below. The updated version includes better calibration to NR, a smoother transition to plunge model, and more harmonic modes. Model details can be found in Islam et al. 2022. This data file is used to evaluate the surrogate model with either stand-alone Python code hosted by the Black Hole Perturbation Toolkit (Jupyter notebook tutorial) or the GWSurrogate Python package, which can be found on PyPI or conda-forge.

EMRISur1dq1e4, for non-spinning black hole binary systems with mass ratios from 3 to 10000. This surrogate model is trained on waveform data generated by point-particle black hole perturbation theory (ppBHPT), with the total mass rescaling parameter tuned to NR simulations. Available modes are [(2,2), (2,1), (3,3), (3,2), (3,1), (4,4), (4,3), (4,2), (5,5), (5,4), (5,3)]; the m < 0 modes are deduced from the m > 0 modes. Model details can be found in Rifat et al. 2019. This data file is used to evaluate the surrogate model with either the stand-alone Python code hosted by the Black Hole Perturbation Toolkit (Jupyter notebook tutorial) or the GWSurrogate Python package (Jupyter notebook tutorial), which is available on PyPI.
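For non-precessing binaries such as these, negative-m modes are usually obtained from the equatorial reflection symmetry h_(ℓ,−m)(t) = (−1)^ℓ conj(h_(ℓ,m)(t)). The sketch below assumes this is the relation meant by "deduced from the m > 0 modes"; the function name and the dict-of-arrays layout are hypothetical and do not describe the format of the data files.

```python
import numpy as np


def add_negative_m_modes(modes):
    """Return a new dict with (l, -m) modes filled in from the (l, m > 0) modes.

    Assumes the reflection symmetry of non-precessing binaries:
        h_{l,-m}(t) = (-1)**l * conj(h_{l,m}(t)).
    `modes` maps (l, m) -> complex NumPy array sampled on a common time grid.
    """
    full = dict(modes)
    for (ell, m), h in modes.items():
        if m > 0:
            full[(ell, -m)] = (-1) ** ell * np.conj(h)
    return full


# Toy mode data on a short time grid (not real waveform data).
t = np.linspace(0.0, 1.0, 5)
toy = {(2, 2): np.exp(-2j * t), (3, 3): 0.5 * np.exp(-3j * t)}
full = add_negative_m_modes(toy)
print(sorted(full))  # [(2, -2), (2, 2), (3, -3), (3, 3)]
```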
Data from: Data augmentation for disruption prediction via robust surrogate...
dataverse.harvard.edu
osti.gov
Updated Aug 31, 2024
Katharina Rath, David Rügamer, Bernd Bischl, Udo von Toussaint, Cristina Rea, Andrew Maris, Robert Granetz, Christopher G. Albert (2024). Data augmentation for disruption prediction via robust surrogate models [Dataset]. http://doi.org/10.7910/DVN/FMJCAD
Unique identifier
https://doi.org/10.7910/DVN/FMJCAD
Dataset updated
Aug 31, 2024
Dataset provided by
Harvard Dataverse
Authors
Katharina Rath, David Rügamer, Bernd Bischl, Udo von Toussaint, Cristina Rea, Andrew Maris, Robert Granetz, Christopher G. Albert
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The goal of this work is to generate large, statistically representative datasets for training machine learning models for disruption prediction, starting from the data of only a few existing discharges. Such a comprehensive training database is important for achieving satisfactory and reliable prediction results with artificial neural network classifiers. Here, we aim for a robust augmentation of the training database for multivariate time series data using Student-t process regression. We apply Student-t process regression in a state space formulation via Bayesian filtering to handle outliers and noise in the training data set and to reduce the computational complexity, so the method can also be used when the time resolution is high. We use an uncorrelated model for each dimension and impose correlations afterwards via coloring transformations. We demonstrate the efficacy of our approach on plasma diagnostics data of three different disruption classes from the DIII-D tokamak. To evaluate whether the distribution of the generated data is similar to that of the training data, we additionally perform statistical analyses using methods from time series analysis, descriptive statistics, and classic machine learning clustering algorithms.
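The coloring step mentioned above can be illustrated with a Cholesky factor: if the per-dimension samples z are uncorrelated with unit variance and Σ = L Lᵀ is a target covariance, then x = L z has covariance Σ. This is a minimal sketch; the description does not say which coloring matrix or covariance estimate the authors use, and the Student-t process regression that produces the per-dimension samples is not reproduced here. The function name and the example covariance are hypothetical.

```python
import numpy as np


def color_samples(z: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Impose the covariance `cov` on uncorrelated, unit-variance samples `z`.

    z has shape (n_samples, d). With cov = L @ L.T (Cholesky), each sample row
    is mapped to L @ z_i, i.e. x = z @ L.T, so that Cov(x) = L I L.T = cov.
    """
    L = np.linalg.cholesky(cov)
    return z @ L.T


rng = np.random.default_rng(0)
z = rng.standard_normal((200_000, 2))          # independent dimensions
cov = np.array([[1.0, 0.8], [0.8, 2.0]])       # hypothetical target covariance
x = color_samples(z, cov)
print(np.round(np.cov(x, rowvar=False), 2))    # close to cov
```

In practice Σ would be estimated from the training discharges, and per-dimension samples with non-unit variance would be standardized before coloring.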
Elwha River Sediment Monitoring Data from the Elwha Bed Load Monitoring...
data.usbr.gov
Updated Aug 15, 2014
United States Bureau of Reclamation (2014). Elwha River Sediment Monitoring Data from the Elwha Bed Load Monitoring Database [Dataset]. https://data.usbr.gov/catalog/4616
Dataset updated
Aug 15, 2014
Dataset authored and provided by
United States Bureau of Reclamation, http://www.usbr.gov/
Area covered
Elwha River
Description
The files contain continuous bed load flux data. The data are summed across the cross section and averaged over an hour. Data were collected with a calibrated surrogate sediment measurement system of 72 impact plates, mounted adjacent to each other and spanning the cross section of the Elwha River at river kilometer 5 (river mile 3.1). The following Science and Technology project numbers apply to the Elwha bed load data: 0115, 1709, 6209, 4542, 9562, and 6499.
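The aggregation described above (a sum over the plates, then an hourly mean) can be sketched as follows. The sampling interval, units, and file layout are not given here, so the array shape, the function name, and the samples_per_hour parameter are assumptions for illustration only.

```python
import numpy as np


def hourly_cross_section_flux(flux, samples_per_hour):
    """Sum per-plate flux across the cross section, then average each hour.

    flux: array of shape (n_samples, n_plates), e.g. n_plates = 72, sampled at
    a fixed (assumed) interval so that `samples_per_hour` samples span one hour.
    Trailing samples that do not fill a complete hour are dropped.
    """
    flux = np.asarray(flux, dtype=float)
    total = flux.sum(axis=1)                      # cross-section total per sample
    n_hours = total.size // samples_per_hour
    trimmed = total[: n_hours * samples_per_hour]
    return trimmed.reshape(n_hours, samples_per_hour).mean(axis=1)


# Toy example: 4 samples, 2 samples per hour, 72 plates with equal readings.
toy = np.arange(4).reshape(4, 1) * np.ones((4, 72))
print(hourly_cross_section_flux(toy, samples_per_hour=2))  # [ 36. 180.]
```

Because both steps are linear, summing first and then averaging gives the same result as averaging each plate first and then summing, as long as every hour is complete.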
Data from: Surrogate-based optimization using an artificial neural network...
data.niaid.nih.gov
zenodo.org
Updated Nov 17, 2021
Pfeil, Markus; Slawig, Thomas (2021). Surrogate-based optimization using an artificial neural network for a parameter identification in a 3D marine ecosystem model [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5643666
Dataset updated
Nov 17, 2021
Dataset provided by
Kiel University
Authors
Pfeil, Markus; Slawig, Thomas
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract:

Parameter identification is important for assessing and validating marine ecosystem models against observational data. Surrogate-based optimization (SBO) is a computationally efficient method for optimizing complex models. SBO replaces the computationally expensive (high-fidelity) model with a surrogate constructed from a less accurate but computationally cheaper (low-fidelity) model, combined with an appropriate correction approach that improves the accuracy of the low-fidelity model. To construct a computationally cheap low-fidelity model, we tested three different approaches to computing an approximation of the annually periodic solution (i.e., a steady annual cycle) of a marine ecosystem model: first, a reduced number of spin-up iterations (several decades instead of millennia); second, an artificial neural network (ANN) approximating the steady annual cycle; and finally, a combination of both approaches. Except for the low-fidelity model using only the ANN, SBO yielded a solution close to the target and reduced the computational effort significantly. If an ANN that appropriately approximates a marine ecosystem model is available, SBO using this ANN as the low-fidelity model is a promising and computationally efficient method for validation.
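To make the SBO idea above concrete, here is a toy one-dimensional loop in which a cheap low-fidelity function is corrected at each iterate so that it matches the value and slope of the expensive model there. This is a sketch under stated assumptions: the first-order additive correction is one common choice and not necessarily the correction used by the authors, the quadratic test functions are hypothetical stand-ins for the marine ecosystem model and its low-fidelity approximations, and practical SBO usually adds safeguards such as a trust region.

```python
import numpy as np


def central_diff(fun, x, h=1e-4):
    """Central finite-difference derivative of a scalar function at x."""
    return (fun(x + h) - fun(x - h)) / (2.0 * h)


def surrogate_based_optimization(f_high, f_low, x0, bounds, n_iter=5, n_grid=4001):
    """Minimize f_high using corrected surrogates built from the cheap f_low.

    At iterate x_k the surrogate matches f_high in value and slope:
        s_k(x) = f_low(x) + [f_high(x_k) - f_low(x_k)]
                 + [f_high'(x_k) - f_low'(x_k)] * (x - x_k)
    Each surrogate is minimized by brute-force grid search on `bounds`.
    """
    grid = np.linspace(bounds[0], bounds[1], n_grid)
    x_k = float(x0)
    for _ in range(n_iter):
        d0 = f_high(x_k) - f_low(x_k)                              # value mismatch
        d1 = central_diff(f_high, x_k) - central_diff(f_low, x_k)  # slope mismatch
        surrogate = f_low(grid) + d0 + d1 * (grid - x_k)
        x_next = float(grid[np.argmin(surrogate)])
        if abs(x_next - x_k) < 1e-12:                              # converged
            break
        x_k = x_next
    return x_k


# Toy models: the "expensive" optimum is at 2.0, the cheap model's at 1.5.
f_high = lambda x: (x - 2.0) ** 2 + 1.0
f_low = lambda x: (x - 1.5) ** 2
x_opt = surrogate_based_optimization(f_high, f_low, x0=0.5, bounds=(0.0, 4.0))
print(x_opt)  # 2.0
```

Without the correction, minimizing f_low alone would return 1.5; the corrected surrogate recovers the high-fidelity optimum while calling f_high only a few times per iteration.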

Content:

SQLite database including the data of the different optimization runs

Structure and weights of the used artificial neural network

Tracer concentrations obtained from the high-fidelity model for the different optimization runs
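The schema of the SQLite database is not described here, so a reasonable first step is to list its tables and row counts with Python's standard sqlite3 module. The function name and the database path are hypothetical.

```python
import sqlite3
from contextlib import closing


def summarize_sqlite(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every table in a SQLite file."""
    with closing(sqlite3.connect(db_path)) as conn:   # closes the connection on exit
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # safe identifier quoting
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts


# Usage (hypothetical file name):
# print(summarize_sqlite("optimization_runs.db"))
```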

Contents

Replication Data for: The Surrogate Index: Combining Short-Term Proxies to...

Data from: How Important Is ‘Accuracy’ of Surrogate Decision-Making for...

Data from: Beyond surrogacy: A multi-taxon approach to conservation...

Database for "Estimation of the Energy Recovery and Emission Potential of...

Data underlying the publication: Surrogate-guided Optimization in Quantum...

Reimbursing a sperm or ova (egg) donor or a surrogate for expenditures...

Data from: Patterns of biodiverse, understudied groups do not mirror those...

Data from: Surrogate Variable Analysis.

Data Sheet 1_Association of surrogate adiposity markers with prevalence,...

Surrogate metrics for real data.

-values for histograms from trade vs. surrogate data.

Data from: Surrogate's Court

Data from: Debate: The slippery slope of surrogate outcomes

Data from: Datasets for Comparison of Surrogate Models to Estimate Pesticide...

Surrogate waveform model data for black hole binary systems computed in...

Data from: Data augmentation for disruption prediction via robust surrogate...

Elwha River Sediment Monitoring Data from the Elwha Bed Load Monitoring...

Data from: Surrogate-based optimization using an artificial neural network...

Data from: A surrogate-based approach for post-genomic partner identification