5 datasets found
  1. Data from: Machine-Learning-Based Data Analysis Method for Cell-Based...

    • acs.figshare.com
    • figshare.com
    zip
    Updated May 31, 2023
    Cite
    Rui Hou; Chao Xie; Yuhan Gui; Gang Li; Xiaoyu Li (2023). Machine-Learning-Based Data Analysis Method for Cell-Based Selection of DNA-Encoded Libraries [Dataset]. http://doi.org/10.1021/acsomega.3c02152.s005
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    ACS Publications
    Authors
    Rui Hou; Chao Xie; Yuhan Gui; Gang Li; Xiaoyu Li
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    DNA-encoded library (DEL) is a powerful ligand discovery technology that has been widely adopted in the pharmaceutical industry. DEL selections are typically performed with a purified protein target immobilized on a matrix or in solution phase. Recently, DELs have also been used to interrogate the targets in the complex biological environment, such as membrane proteins on live cells. However, due to the complex landscape of the cell surface, the selection inevitably involves significant nonspecific interactions, and the selection data are much noisier than the ones with purified proteins, making reliable hit identification highly challenging. Researchers have developed several approaches to denoise DEL datasets, but it remains unclear whether they are suitable for cell-based DEL selections. Here, we report the proof-of-principle of a new machine-learning (ML)-based approach to process cell-based DEL selection datasets by using a Maximum A Posteriori (MAP) estimation loss function, a probabilistic framework that can account for and quantify uncertainties of noisy data. We applied the approach to a DEL selection dataset, where a library of 7,721,415 compounds was selected against a purified carbonic anhydrase 2 (CA-2) and a cell line expressing the membrane protein carbonic anhydrase 12 (CA-12). The extended-connectivity fingerprint (ECFP)-based regression model using the MAP loss function was able to identify true binders and also reliable structure–activity relationship (SAR) from the noisy cell-based selection datasets. In addition, the regularized enrichment metric (known as MAP enrichment) could also be calculated directly without involving the specific machine-learning model, effectively suppressing low-confidence outliers and enhancing the signal-to-noise ratio. Future applications of this method will focus on de novo ligand discovery from cell-based DEL selections.
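The description does not give the formula behind the MAP enrichment metric, so the following is only a generic sketch of how a maximum a posteriori estimate can regularize an enrichment ratio. It assumes Poisson-distributed read counts and a Gamma prior on each rate. The function names, the prior parameters `alpha` and `beta`, and the `exposure` normalization are hypothetical and are not taken from the dataset or the associated article.

```python
def map_rate(count, exposure, alpha=2.0, beta=1.0):
    """Posterior mode of a Poisson rate under a Gamma(alpha, beta) prior.

    The posterior is Gamma(alpha + count, beta + exposure), whose mode is
    (alpha + count - 1) / (beta + exposure) when alpha + count >= 1.
    alpha > 1 is required here so the estimate is never zero.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must be > 1 so the MAP rate stays positive")
    return (alpha + count - 1.0) / (beta + exposure)


def map_enrichment(sel_count, sel_exposure, ctrl_count, ctrl_exposure,
                   alpha=2.0, beta=1.0):
    """Ratio of MAP rates: selection versus control (illustrative only)."""
    selected = map_rate(sel_count, sel_exposure, alpha, beta)
    control = map_rate(ctrl_count, ctrl_exposure, alpha, beta)
    return selected / control


# Hypothetical counts: same observed rate, very different amounts of evidence.
low_evidence = map_enrichment(3, 1.0, 0, 1.0)
high_evidence = map_enrichment(300, 100.0, 0, 100.0)
print(low_evidence, high_evidence)
```

A naive ratio of counts would be undefined for both compounds because the control count is zero. Under these assumptions the prior pulls the sparsely observed compound toward a modest enrichment, while the compound with many reads keeps a large one, which is the kind of low-confidence outlier suppression the description refers to.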

  2. Dataset: Proportional recovery in mice with cortical stroke

    • ldh.stroke-koeln.imise.uni-leipzig.de
    Updated Nov 4, 2024
    Cite
    Markus Aswendt (2024). Dataset: Proportional recovery in mice with cortical stroke [Dataset]. http://doi.org/10.12751/g-node.gjf2hv
    Dataset updated
    Nov 4, 2024
    Authors
    Markus Aswendt
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Post-Stroke Recovery Data Repository

    This repository contains various resources related to the study on post-stroke recovery in a mouse model, focusing on the application of the Proportional Recovery Rule (PRR).

    Repository Structure

    • code/: Contains all the code used for the analysis in this study. Detailed information is available in the README within the code folder.
    • input/: This folder contains all datasets used in the publication.
    • output/: This directory includes the final results generated for each dataset. Detailed information for each dataset's output can be found in their respective subfolders.
    • docs/: Additional documentation related to this project, including extra resources in the form of a README file within this folder.

    Methodology Overview

    Introduction

    The Fugl-Meyer upper extremity score is a widely used assessment tool in clinical settings to evaluate motor function in stroke patients. With a maximum score of 66, higher values indicate better motor performance, while lower values signify greater deficits.

    The Proportional Recovery Rule (PRR) suggests that the magnitude of recovery from nonsevere upper limb motor impairment after stroke is approximately 0.7 times the initial impairment. This rule, proposed in 2008, has been applied to various motor and nonmotor impairments, leading to inconsistencies in its formulation and application across studies.

    Translating PRR to Deficit Score

    In this study, we translated the Fugl-Meyer upper extremity score into a deficit score suitable for use in a mouse model. The PRR posits that the change in impairment can be predicted as 0.7 times the initial impairment, plus an error term. We adapted this rule by fitting a linear regression model without an intercept to relate the initial impairment to the change in impairment.

    Data Analysis

    1. Initial Impairment Calculation:

      • Initial impairment (d-score) is calculated as the difference between the deficit score at day 3 post-stroke and the baseline deficit score.
    2. Change Observed and Predicted:

      • Change observed: Initial impairment minus deficit score on day 28.
      • Change predicted: 0.7 times the initial impairment plus an error term.
    3. Cluster Analysis:

      • Data were plotted with initial impairment on the x-axis and change observed on the y-axis.
      • A linear fit was applied to generate two lines: one based on the proportional recovery rule and one from the data fit.
      • Subjects were clustered based on their proximity to these lines, iterating the process until convergence.
    4. Outlier Removal:

      • Outliers were identified and removed based on the interquartile range rule both initially and during each iteration of the clustering process.
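The analysis steps above can be sketched as follows. This is a minimal illustration and not the code in the repository's code/ folder. All function names and the example scores are hypothetical, the IQR multiplier of 1.5 is the conventional default rather than a value stated in the description, and only a single assignment step of the clustering is shown.

```python
import statistics

PRR_SLOPE = 0.7  # proportionality constant stated in the description


def initial_impairment(baseline, day3):
    """Deficit score at day 3 post-stroke minus the baseline deficit score."""
    return day3 - baseline


def change_observed(initial, day28):
    """As described above: initial impairment minus the day-28 deficit score."""
    return initial - day28


def change_predicted(initial):
    """PRR prediction without the error term."""
    return PRR_SLOPE * initial


def slope_through_origin(xs, ys):
    """Least-squares slope of a linear fit without an intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def iqr_inliers(values, k=1.5):
    """True where a value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [low <= v <= high for v in values]


def nearest_line(x, y, slopes):
    """Index of the through-origin line vertically closest to (x, y)."""
    return min(range(len(slopes)), key=lambda i: abs(y - slopes[i] * x))


# Hypothetical deficit scores for four animals.
baseline = [0.0, 0.0, 0.0, 0.0]
day3 = [10.0, 8.0, 6.0, 4.0]
day28 = [3.0, 2.4, 1.8, 1.2]

initial = [initial_impairment(b, d) for b, d in zip(baseline, day3)]
observed = [change_observed(i, d) for i, d in zip(initial, day28)]
fitted_slope = slope_through_origin(initial, observed)
print(round(fitted_slope, 3))
```

In the iterative procedure described above, the assignment step (`nearest_line`) and the refit (`slope_through_origin` on the non-PRR cluster) would alternate, with `iqr_inliers` applied on each pass, until cluster membership stops changing.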

    Results

    1. Cluster Characteristics:

      • The final clustering resulted in 65 subjects following the PRR, with a fixed slope of 0.7 and an intercept of -0.42.
      • The other cluster contained 21 subjects with a distinct recovery pattern, characterized by a slope of 0.84.
    2. Statistical Analysis:

      • The slope of the overall linear fit was found to be 0.93.
      • Approximately 75.58% of the subjects adhered to the PRR, indicating the potential relevance of the PRR in the mouse model.
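The reported percentage follows directly from the two cluster sizes given above, as this short check shows (variable names are illustrative):

```python
prr_cluster = 65    # subjects following the PRR
other_cluster = 21  # subjects in the second cluster

share = 100 * prr_cluster / (prr_cluster + other_cluster)
print(f"{share:.2f}% of {prr_cluster + other_cluster} subjects")
```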

    Additional Information

    This structured dataset was created with reference to the following publication:

    DOI:10.1038/s41597-023-02242-8

    If you have any questions or require further assistance, please do not hesitate to reach out to us. Contact us via email at markus.aswendtATuk-koeln.de or aref.kalantari-sarcheshmehATuk-koeln.de.

  3. Data from: Male responses to sperm competition risk when rivals vary in...

    • researchdata.edu.au
    • search.dataone.org
    Updated 2019
    Cite
    Leigh W. Simmons; Joseph L. Tomkins; Samuel J. Lymbery; School of Biological Sciences (2019). Data from: Male responses to sperm competition risk when rivals vary in their number and familiarity [Dataset]. http://doi.org/10.5061/DRYAD.M097580
    Dataset updated
    2019
    Dataset provided by
    The University of Western Australia
    DRYAD
    Authors
    Leigh W. Simmons; Joseph L. Tomkins; Samuel J. Lymbery; School of Biological Sciences
    Description

    Males of many species adjust their reproductive investment to the number of rivals present simultaneously. However, few studies have investigated whether males sum previous encounters with rivals, and the total level of competition has never been explicitly separated from social familiarity. Social familiarity can be an important component of kin recognition and has been suggested as a cue that males use to avoid harming females when competing with relatives. Previous work has succeeded in independently manipulating social familiarity and relatedness among rivals, but experimental manipulations of familiarity are confounded with manipulations of the total number of rivals that males encounter. Using the seed beetle Callosobruchus maculatus we manipulated three factors: familiarity among rival males, the number of rivals encountered simultaneously, and the total number of rivals encountered over a 48-hour period. Males produced smaller ejaculates when exposed to more rivals in total, regardless of the maximum number of rivals they encountered simultaneously. Males did not respond to familiarity. Our results demonstrate that males of this species can sum the number of rivals encountered over separate days, and therefore the confounding of familiarity with the total level of competition in previous studies should not be ignored.

    • Lymbery et al 2018 Full dataset (Lymbery et al Full Dataset.xlsx): Contains all the data used in the statistical analyses for the associated manuscript. The file contains two spreadsheets: one containing the data and one containing a legend relating to column titles.
    • Lymbery et al 2018 Reduced dataset 1 (Lymbery et al Reduced Dataset After 1st Round of Outlier Removal.xlsx): Contains data used in the attached manuscript following the removal of three outliers for the purposes of data distribution, as described in the associated R code. The file contains two spreadsheets: one containing the data and one containing a legend relating to column titles.
    • Lymbery et al 2018 Reduced dataset 2 (Lymbery et al Reduced Dataset After Final Outlier Removal.xlsx): Contains the data used in the statistical analyses for the associated manuscript, after the removal of all outliers stated in the manuscript and associated R code. The file contains two spreadsheets: one containing the data and one containing a legend relating to column titles.
    • Lymbery et al 2018 R Script: Contains all the R code used for statistical analysis in this manuscript, with annotations to aid interpretation.

  4. Model comparison between near-miss to Weber’s law and Weber’s law.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Vladislav Nachev; Kai Petra Stich; York Winter (2023). Model comparison between near-miss to Weber’s law and Weber’s law. [Dataset]. http://doi.org/10.1371/journal.pone.0074144.t002
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Vladislav Nachev; Kai Petra Stich; York Winter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In both models Equation 5 was fitted against observed individual discrimination performances. Lower Akaike Information Criterion (AIC) scores indicate a better fit of a model to the data, after penalizing for the number of estimated parameters. AIC scores can only be compared within rows but not between rows. ΔAIC gives the difference between the AIC scores for the model based on Weber’s law and the model based on the near-miss to Weber’s law. F and p values are based on one-way ANOVAs with 1 df.

    a. The exponent β was fixed with value one in the Weber’s law model and was estimated in the near-miss to Weber’s law model. Values in the middle are average estimates and the values to the left and right are the 95% confidence interval limits.
    b. One outlier was removed from the HIGH data set of Bat 4.
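As a small illustration of the comparison described above, the sketch below computes AIC from a log-likelihood using the standard definition, AIC = 2k - 2 ln(L), and takes the difference between the two models. The log-likelihoods and parameter counts are invented for the example and are not values from the table. The sign convention for ΔAIC (Weber minus near-miss) is also an assumption.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(aic_weber, aic_near_miss):
    """Assumed convention: positive values favour the near-miss model."""
    return aic_weber - aic_near_miss


# Hypothetical fits for one row (one individual and condition).
# The near-miss model estimates the exponent, so it has one more parameter.
aic_weber = aic(log_likelihood=-120.0, n_params=1)
aic_near_miss = aic(log_likelihood=-118.5, n_params=2)
print(aic_weber, aic_near_miss, delta_aic(aic_weber, aic_near_miss))
```

Because AIC depends on the data being fitted, the two scores are only comparable when both models are fitted to the same observations, which is why the description restricts comparisons to within a row.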

  5. Digital Earth Australia Coastlines

    • digital.atlas.gov.au
    Updated Mar 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Digital Atlas of Australia (2025). Digital Earth Australia Coastlines [Dataset]. https://digital.atlas.gov.au/maps/36b0acf3d8a5439199b9a42a06011d20
    Dataset updated
    Mar 13, 2025
    Dataset authored and provided by
    Digital Atlas of Australia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    Digital Earth Australia Coastlines is a continental dataset that includes annual shorelines and rates of coastal change along the entire Australian coastline from 1988 to the present. The product combines satellite data from Geoscience Australia's Digital Earth Australia program with tidal modelling to map the most representative location of the shoreline at mean sea level for each year. The product enables trends of coastal retreat and growth to be examined annually at both a local and continental scale, and for patterns of coastal change to be mapped historically and updated regularly as data continues to be acquired. This allows current rates of coastal change to be compared with those observed in previous years or decades. The ability to map shoreline positions for each year provides valuable insights into whether changes to our coastline are the result of particular events or actions, or a process of more gradual change over time. This information can enable scientists, managers and policy makers to assess impacts from the range of drivers affecting our coastlines and potentially assist planning and forecasting for future scenarios.

    The DEA Coastlines product contains five layers:

    • Annual shorelines
    • Rates of change points
    • Coastal change hotspots (1 km)
    • Coastal change hotspots (5 km)
    • Coastal change hotspots (10 km)

    Annual shorelines

    Annual shoreline vectors that represent the median or ‘most representative’ position of the shoreline at approximately 0 m Above Mean Sea Level for each year since 1988. Dashed shorelines have low certainty.

    Rates of change points

    A point dataset providing robust rates of coastal change for every 30 m along Australia’s non-rocky coastlines. The most recent annual shoreline is used as a baseline for measuring rates of change. Points are shown for locations with statistically significant rates of change (p-value <= 0.01; see sig_time below) and good quality data (certainty = "good"; see certainty below) only. Each point shows annual rates of change (in metres per year; see rate_time below), and an estimate of uncertainty in brackets (95% confidence interval; see se_time). For example, there is a 95% chance that a point with a label -10.0 m (±1.0 m) is retreating at a rate of between -9.0 and -11.0 metres per year.

    Coastal change hotspots (1 km, 5 km, 10 km)

    Three point layers summarising coastal change within moving 1 km, 5 km and 10 km windows along the coastline. These layers are useful for visualising regional or continental-scale patterns of coastal change.

    Currency

    Date modified: August 2023
    Modification frequency: Annually

    Data extent

    Spatial extent: North: -9°, South: -44°, East: 154°, West: 112°
    Temporal extent: From 1988 to Present

    Source information

    • Product description and metadata
    • Digital Earth Australia Coastlines catalog entry
    • Data download
    • Interactive Map

    Lineage statement

    The DEA Coastlines product is under active development. A full and current product description is best sourced from the DEA Coastlines website. For a full summary of changes made in previous versions, refer to Github.

    Data dictionary

    Layer attribute columns

    Annual shorelines

    Attribute name Description

    OBJECTID Automatically generated system ID

    year The year of each annual shoreline

    certainty A column providing important data quality flags for each annual shoreline (see the Quality assurance section of the product description and metadata page for more detail about each data quality flag)

    tide_datum The tide datum of each annual shoreline (e.g. "0 m AMSL")

    id_primary The name of the annual shoreline's Primary sediment compartment from the Australian Coastal Sediment Compartments framework

    Rates of change points and Coastal change hotspots

    Attribute name Description

    OBJECTID Automatically generated system ID

    uid A unique geohash identifier for each point

    rate_time Annual rates of change (in metres per year) calculated by linearly regressing annual shoreline distances against time (excluding outliers). Negative values indicate retreat and positive values indicate growth

    sig_time Significance (p-value) of the linear relationship between annual shoreline distances and time. Small values (e.g. p-value < 0.01 or 0.05) may indicate a coastline is undergoing consistent coastal change through time

    se_time Standard error (in metres) of the linear relationship between annual shoreline distances and time. This can be used to generate confidence intervals around the rate of change given by rate_time (e.g. 95% confidence interval = se_time * 1.96).

    outl_time Individual annual shorelines are noisy estimators of coastline position that can be influenced by environmental conditions (e.g. clouds, breaking waves, sea spray) or modelling issues (e.g. poor tidal modelling results or limited clear satellite observations). To obtain reliable rates of change, outlier shorelines are excluded using a robust Median Absolute Deviation outlier detection algorithm, and recorded in this column

    dist_1990, dist_1991, etc Annual shoreline distances (in metres) relative to the most recent baseline shoreline. Negative values indicate that an annual shoreline was located inland of the baseline shoreline. By definition, the most recent baseline column will always have a distance of 0 m

    angle_mean, angle_std The mean and standard deviation of the angle between the baseline point and all annual shorelines. This data is used to calculate how well shorelines fall along a consistent line; high angular standard deviation indicates that derived rates of change are unlikely to be correct

    valid_obs, valid_span The total number of valid (i.e. non-outliers, non-missing) annual shoreline observations, and the maximum number of years between the first and last valid annual shoreline

    sce Shoreline Change Envelope (SCE). A measure of the maximum change or variability across all annual shorelines, calculated by computing the maximum distance between any two annual shorelines (excluding outliers). This statistic excludes sub-annual shoreline variability like tides, storms and seasonal effects

    nsm Net Shoreline Movement (NSM). The distance between the oldest (1988) and most recent annual shoreline (excluding outliers). Negative values indicate the coastline retreated between the oldest and most recent shoreline; positive values indicate growth. This statistic does not reflect sub-annual shoreline variability, so will underestimate the full extent of variability at any given location

    max_year, min_year The year that annual shorelines were at their maximum (i.e. located furthest towards the ocean) and their minimum (i.e. located furthest inland) respectively (excluding outliers). This statistic excludes sub-annual shoreline variability

    certainty A column providing important data quality flags for each annual shoreline (see the Quality assurance section of the product description and metadata page for more detail about each data quality flag)

    id_primary The name of the point's Primary sediment compartment from the Australian Coastal Sediment Compartments framework
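To show how the rate_time, se_time and outl_time attributes relate to one another, here is a minimal sketch using only the Python standard library. It is not the DEA Coastlines implementation. The modified z-score threshold of 3.5, the 0.6745 scaling constant, the choice to screen raw distances rather than regression residuals, and the reading of se_time as the standard error of the slope are all assumptions made for this example. The distances are invented.

```python
import math
import statistics


def mad_outlier_flags(values, threshold=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


def linear_rate(years, distances):
    """Ordinary least-squares slope (metres per year) and its standard error."""
    n = len(years)
    if n < 3:
        raise ValueError("need at least three observations")
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(years, distances))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, std_err


# Invented distances (metres) relative to the 2009 baseline shoreline;
# the 2004 value stands in for a noisy, outlying shoreline.
years = list(range(2000, 2010))
distances = [18.0, 16.0, 14.0, 12.0, 60.0, 8.0, 6.0, 4.0, 2.0, 0.0]

flags = mad_outlier_flags(distances)
kept = [(y, d) for y, d, bad in zip(years, distances, flags) if not bad]
rate, se = linear_rate([y for y, _ in kept], [d for _, d in kept])
ci_half_width = 1.96 * se  # the 95% interval rule given for se_time
print(f"{rate:.1f} m/yr (±{ci_half_width:.1f} m)")
```

The negative rate indicates retreat, matching the sign convention given for rate_time. The significance value in sig_time would additionally require a p-value for the slope, which is omitted here.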

    Contact Geoscience Australia, clientservices@ga.gov.au

