100+ datasets found
  1. Dataset for: Comparison of Two Correlated ROC Surfaces at a Given Pair of...

    • wiley.figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Leonidas Bantis; Ziding Feng (2023). Dataset for: Comparison of Two Correlated ROC Surfaces at a Given Pair of True Classification Rates [Dataset]. http://doi.org/10.6084/m9.figshare.6527219.v1
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Wiley
    Authors
    Leonidas Bantis; Ziding Feng
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The receiver operating characteristic (ROC) curve is typically employed when one wants to evaluate the discriminatory capability of a continuous or ordinal biomarker in the case where two groups are to be distinguished, commonly the 'healthy' and the 'diseased'. There are cases for which the disease status has three categories. Such cases employ the ROC surface, which is a natural generalization of the ROC curve for three classes. In this paper, we explore new methodologies for comparing two continuous biomarkers that refer to a trichotomous disease status, when both markers are applied to the same patients. Comparisons based on the volume under the surface have been proposed, but that measure is often not clinically relevant. Here, we focus on comparing two correlated ROC surfaces at given pairs of true classification rates, which are more relevant to patients and physicians. We propose delta-based parametric techniques, power transformations to normality, and bootstrap-based smooth nonparametric techniques to investigate the performance of an appropriate test. We evaluate our approaches through an extensive simulation study and apply them to a real data set from prostate cancer screening.

  2. Statistical Comparison of Two ROC Curves

    • figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Yaacov Petscher (2023). Statistical Comparison of Two ROC Curves [Dataset]. http://doi.org/10.6084/m9.figshare.860448.v1
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    figshare
    Authors
    Yaacov Petscher
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This Excel file performs a statistical test of whether two ROC curves differ from each other, based on the area under the curve (AUC). You'll need the coefficient from the table presented in the following article to enter the correct AUC value for the comparison: Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839-843.
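
A minimal sketch of the kind of test described above, assuming the usual form of the Hanley and McNeil (1983) statistic for two AUCs derived from the same cases: z = (AUC1 - AUC2) / sqrt(SE1^2 + SE2^2 - 2 r SE1 SE2). Here r is the correlation coefficient looked up in the table from the article. The function names and the example numbers are hypothetical and are not taken from the spreadsheet.

```python
import math


def hanley_mcneil_z(auc1, se1, auc2, se2, r):
    """z statistic for the difference between two correlated AUCs.

    r is the correlation between the two AUC estimates, which the user
    reads from the table in Hanley & McNeil (1983); it is an input here.
    """
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2
    if var_diff <= 0:
        raise ValueError("variance of the AUC difference must be positive")
    return (auc1 - auc2) / math.sqrt(var_diff)


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    z = hanley_mcneil_z(auc1=0.90, se1=0.03, auc2=0.80, se2=0.04, r=0.5)
    print(f"z = {z:.3f}, p = {two_sided_p(z):.4f}")
```

Setting r to 0 reduces the statistic to the comparison of two independent AUCs; a positive r shrinks the variance of the difference and makes the test more sensitive.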

  3. Input-Output Data Sets Used in the Evaluation of the Two-Layer Soil Moisture...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Mar 3, 2023
    Cite
    U.S. EPA Office of Research and Development (ORD) (2023). Input-Output Data Sets Used in the Evaluation of the Two-Layer Soil Moisture and Flux Model [Dataset]. https://catalog.data.gov/dataset/input-output-data-sets-used-in-the-evaluation-of-the-two-layer-soil-moisture-and-flux-mode
    Dataset updated
    Mar 3, 2023
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The Excel file contains the model input-output data sets that were used to evaluate the two-layer soil moisture and flux dynamics model. The model is original and was developed by Dr. Hantush by integrating the well-known Richards equation over the root layer and the lower vadose zone. The input-output data are used for: 1) verification of the numerical scheme by comparison against the HYDRUS model as a benchmark; 2) model validation by comparison against real site data; and 3) estimation of model predictive uncertainty and sources of modeling errors. This dataset is associated with the following publication: He, J., M.M. Hantush, L. Kalin, and S. Isik. Two-Layer numerical model of soil moisture dynamics: Model assessment and Bayesian uncertainty estimation. JOURNAL OF HYDROLOGY. Elsevier Science Ltd, New York, NY, USA, 613 part A: 128327, (2022).

  4. Data from: Scalable Methods for Multiple Time Series Comparison in Second...

    • tandf.figshare.com
    pdf
    Updated Aug 5, 2024
    Cite
    Lei Jin; Bo Li (2024). Scalable Methods for Multiple Time Series Comparison in Second Order Dynamics [Dataset]. http://doi.org/10.6084/m9.figshare.26496134.v1
    Available download formats: pdf
    Dataset updated
    Aug 5, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Lei Jin; Bo Li
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Statistical comparison of multiple time series in their underlying frequency patterns has many real applications. However, existing methods are only applicable to a small number of mutually independent time series, and empirical results for dependent time series are only limited to comparing two time series. We propose scalable methods based on a new algorithm that enables us to compare the spectral density of a large number of time series. The new algorithm helps us efficiently obtain all pairwise feature differences in frequency patterns between M time series, which plays an essential role in our methods. When all M time series are independent of each other, we derive the joint asymptotic distribution of their pairwise feature differences. The asymptotic dependence structure between the feature differences motivates our proposed test for multiple mutually independent time series. We then adapt this test to the case of multiple dependent time series by partially accounting for the underlying dependence structure. Additionally, we introduce a global test to further enhance the approach. To examine the finite sample performance of our proposed methods, we conduct simulation studies. The new approaches demonstrate the ability to compare a large number of time series, whether independent or dependent, while exhibiting competitive power. Finally, we apply our methods to compare multiple mechanical vibrational time series.

  5. Hydroclimate Projections for Select U.S. Fish and Wildlife Service...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Hydroclimate Projections for Select U.S. Fish and Wildlife Service Properties - Mountain-Prairie Region, 1951-2099 [Dataset]. https://catalog.data.gov/dataset/hydroclimate-projections-for-select-u-s-fish-and-wildlife-service-properties-mountain-1951
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Canadian Prairies
    Description

    This data release contains time series and plots summarizing mean monthly temperature (TAVE), total monthly precipitation (PPT), and runoff (RO) from the U.S. Geological Survey Monthly Water Balance Model at 115 National Wildlife Refuges within the U.S. Fish and Wildlife Service Mountain-Prairie Region (CO, KS, MT, NE, ND, SD, UT, and WY). These three variables are derived from two sets of statistically-downscaled general circulation models from 1951 through 2099. The three variables (TAVE, PPT, and RO for refuge areas) were summarized for comparison across four 19-year periods: historic (1951-1969), baseline (1981-1999), 2050 (2041-2059), and 2080 (2071-2089). For each refuge, mean monthly plots, seasonal box plots, and annual envelope plots were produced for each of the four periods.

  6. Data from: Data set for comparison between two biosignals acquisition...

    • data.niaid.nih.gov
    Updated Jul 22, 2024
    Cite
    Strojny, Paweł (2024). Data set for comparison between two biosignals acquisition systems – BioNomadix and BITalino [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3726610
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Dużmańska-Misiarczyk, Natalia
    Strojny, Paweł
    Lipp, Natalia
    Argasiński, Jan K.
    Giżycka, Barbara
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The data were collected to compare the quality of the signals acquired by two devices: BITalino (Da Silva, Guerreiro, Lourenço, Fred, & Martins, 2014) and BioNomadix (BIOPAC Systems Inc., Goleta, CA, USA).

  7. VineLOGIC: Experimental Data Sets

    • researchdata.edu.au
    • data.csiro.au
    datadownload
    Updated Feb 28, 2023
    Cite
    David Benn; R. J. G. White; D. C. Godwin; Everard Edwards; Peter Clingeleffer; Deidre Blackmore; Anne Pellegrino; Nicola Cooley; Rachel Ashley; Rob Walker (2023). VineLOGIC: Experimental Data Sets [Dataset]. http://doi.org/10.25919/J503-FT52
    Available download formats: datadownload
    Dataset updated
    Feb 28, 2023
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    David Benn; R. J. G. White; D. C. Godwin; Everard Edwards; Peter Clingeleffer; Deidre Blackmore; Anne Pellegrino; Nicola Cooley; Rachel Ashley; Rob Walker
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jul 1, 2000 - Dec 31, 2006
    Description

    Three experimental data sets (WNRA0103, WNRA0305 and WNRA0506) involving three grapevine varieties and a range of deficit irrigation and pruning treatments are described. The purpose for obtaining the data sets was two-fold, (1) to meet the research goals of the Cooperative Research Centre for Viticulture (CRCV) during its tenure 1999-2006, and (2) to test the capacity of the VineLOGIC grapevine growth and development model to predict timing of bud burst, flowering, veraison and harvest, yield and yield components, berry attributes and components of water balance. A test script, included with the VineLOGIC source code publication (https://doi.org/10.25919/5eb3536b6a8a8), enables comparison between model predicted and measured values for key variables. Key references relating to the model and data sets are provided under Related Links. A description of selected terms and outcomes of regression analysis between values predicted by the model and observed values are provided under Supporting Files. Version 3 included the following amendments: (1) to WNRA0103 – alignment of settings for irrigation simulation control and initial soil water contents for soil layers with those in WNRA0305 and WNRA0506, and addition of missing berry anthocyanin data for season 2002-03; (2) to WNRA0305 - minor corrections to values for berry and bunch number and weight, and correction of target Brix value for harvest to 24.5 Brix; (3) minor corrections to some measured berry anthocyanin concentrations as mg/g fresh weight; minor amendments to treatment names for consistency across data sets, and to the name for irrigation type to improve clarity; and (4) update of regression analysis between VineLOGIC-predicted versus observed values for key variables. Version 4 (this version) includes a metadata only amendment with two additions to Related links: ‘VineLOGIC View’ and a recent publication. 

    Lineage: The data sets were obtained at a commercial wine company vineyard in the Mildura region of north western Victoria, Australia. Vines were spaced 2.4 m within rows and 3 m between rows, trained to a two-wire vertical trellis and drip irrigated. The soil was a Nookamka sandy loam.

    Data Set 1 (WNRA0103): An experiment comparing the effects on grapevine growth and development of three pruning treatments, spur, light mechanical hedging and minimal pruning, involving Shiraz on Schwarzmann rootstock, irrigated with industry standard drip irrigation and collected over three seasons 2000-01, 2001-02 and 2002-03. The experiment was established and conducted by Dr Rachel Ashley with input from Peter Clingeleffer (CSIRO), Dr Bob Emmett (Department of Primary Industries, Victoria) and Dr Peter Dry (University of Adelaide). Seasons in the southern hemisphere span two calendar years, with budburst in the second half of the first calendar year and harvest in the first half of the second calendar year.

    Data Set 2 (WNRA0305): An experiment comparing the effects of three irrigation treatments, industry standard drip, Regulated Deficit (RDI) and Prolonged Deficit (PD) irrigation involving Cabernet Sauvignon on own roots and pruned by light mechanical hedging, over three seasons 2002-03, 2003-04 and 2004-05. The RDI treatment involved application of a water deficit in the post-fruit set to pre-veraison period. The PD treatment was initially the same as RDI but with an extended period of extreme deficit (no irrigation) after the RDI stress period until veraison. The experiment was established and conducted by Dr Nicola Cooley with input from Peter Clingeleffer and Dr Rob Walker (CSIRO).

    Data Set 3 (WNRA0506): Compared basic grapevine growth, development and berry maturation post fruit set at three Trial Sites over two seasons 2004-05 and 2005-06. Trial Site one is the same site used to collect Data Set 1. Data were collected from all three pruning treatments in season 2004-05 but only from the spur and light mechanical hedging treatments in season 2005-06. Trial Site two involved comparison of two scions, Chardonnay and Shiraz, both on Schwarzmann rootstock, irrigated with industry standard drip irrigation and pruned using light mechanical hedging. Data were collected in season 2004-05. Trial Site three is the same site used to collect Data Set 2. Data were collected from all three irrigation treatments in season 2004-05 but only from the industry standard drip and PD treatments in 2005-06. Establishment and conduct of experiments at Trial Sites one, two and three was by Dr Anne Pellegrino and Deidre Blackmore with input from Peter Clingeleffer and Dr Rob Walker.

    The decision to develop Data Set 3 followed a mid-term CRCV review and analysis of available Australian data sets and relevant literature, which identified the need to obtain a data set covering all of the required variables necessary to run VineLOGIC and in particular, to obtain data on berry development commencing as soon as possible after fruit set. Most prior data sets were from veraison onwards, which is later than desirable from a modelling perspective.

    Data Set 1, 2 and 3 compilation for VineLOGIC was by Deidre Blackmore with input from Dr Doug Godwin. Review and testing of the Data Sets with VineLOGIC was conducted by David Benn with input from Dr Paul Petrie (South Australian Research and Development Institute), Dr Vinay Pagay (University of Adelaide) and Drs Everard Edwards and Rob Walker (CSIRO). A collaboration agreement with University of Adelaide established in 2017 enabled further input to review of the Data Sets and their testing with VineLOGIC by Dr Sam Culley.

  8. A vigiPoint characterisation of female versus male reports in VigiBase, the...

    • zenodo.org
    • dataone.org
    • +1more
    bin
    Updated Jun 2, 2022
    Cite
    Sarah Watson; Ola Caster (2022). A vigiPoint characterisation of female versus male reports in VigiBase, the WHO global database of individual case safety reports [Dataset]. http://doi.org/10.5061/dryad.8cz8w9gk1
    Available download formats: bin
    Dataset updated
    Jun 2, 2022
    Dataset provided by
    Zenodo
    Authors
    Sarah Watson; Ola Caster
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    General information

    This data is supplementary material to the paper by Watson et al. on sex differences in global reporting of adverse drug reactions [1]. Readers are referred to this paper for a detailed description of the context in which the data was generated. Anyone intending to use this data for any purpose should read the publicly available information on the VigiBase source data [2, 3]. The conditions specified in the caveat document [3] must be adhered to.

    Source dataset

    The dataset published here is based on analyses performed in VigiBase, the WHO global database of individual case safety reports [4]. All reports entered into VigiBase from its inception in 1967 up to 2 January 2018 with patient sex coded as either female or male have been included, except suspected duplicate reports [5]. In total, the source dataset contained 9,056,566 female and 6,012,804 male reports.

    Statistical analysis

    The characteristics of the female reports were compared to those of the male reports using a method called vigiPoint [6]. This is a method for comparing two or more sets of reports (here female and male reports) on a large set of reporting variables, and highlighting any feature in which the sets differ in a statistically and clinically relevant manner. For example, patient age group is a reporting variable, and the different age groups 0 - 27 days, 28 days - 23 months et cetera are features within this variable. The statistical analysis is based on shrinkage log odds ratios computed as a comparison between the two sets of reports for each feature, including all reports without missing information for the variable under consideration. The specific output from vigiPoint is defined precisely below. Here, the results for 18 different variables with a total of 44,486 features are presented. 74 of these features were highlighted as so-called vigiPoint key features, suggesting a statistically and clinically significant difference between female and male reports in VigiBase.

    Description of published dataset

    The dataset is provided in the form of a MS Excel spreadsheet (.xlsx file) with nine columns and 44,486 rows (excluding the header), each corresponding to a specific feature. Below follows a detailed description of the data included in the different columns.

    Variable: This column indicates the reporting variable to which the specific feature belongs. Eight of these variables are described in the original publication by Watson et al.: country of origin, geographical region of origin, type of reporter, patient age group, MedDRA SOC, ATC level 2 of reported drugs, seriousness, and fatality [1]. The remaining 10 are described here:

    • MedDRA HLGT (high-level group term), MedDRA HLT (high-level term) and MedDRA PT (preferred term) are defined analogously to the MedDRA SOC (system organ class) [1], only at lower levels of the MedDRA (Medical Dictionary for Regulatory Activities) hierarchy. Here, MedDRA version 20.1 has been used.
    • ATC level 3 of reported drugs is defined analogously to the variable ATC level 2 of reported drugs [1], only one step further down in the ATC (Anatomical Therapeutic Chemical) hierarchy.
    • The vigiGrade completeness score is a measure of how complete each report is with respect to certain report fields useful for causality assessment [7]. The completeness score has been dichotomised into two features, 'Above or equal to 0.8' and 'Below 0.8'. The maximum possible score for an individual report is 1.0.
    • The date of VigiBase entry is simply the time when a report was entered into VigiBase. This variable is divided into 14 features that are either individual years or ranges of years.
    • The number of reported drugs is the number of unique drugs that are coded on a report as either suspected, interacting, or concomitant. A drug is here defined as an entry at the preferred base (i.e. substance) level of the WHODRUG terminology. The variable is divided into four features: 'One drug', 'Two drugs', '3-5 drugs', and 'More than 5 drugs'.
    • The number of reported MedDRA PTs is the number of unique MedDRA preferred terms that are coded as events on a report. This variable is divided into four features in exactly the same way as the reported drugs.
    • A reported drug is a drug coded on a report as either suspected, interacting, or concomitant. As above, a drug is defined as an entry at the preferred base (i.e. substance) level of the WHODRUG terminology. This variable has almost 23,000 features, one for each drug that occurs in at least one female or one male report.
    • The type of report indicates the type of individual case report. The vast majority belongs to the feature 'Spontaneous', but there are four other possible features for this variable.

    The Variable column can be useful for filtering the data, for example if one is interested in one or a few specific variables.

    Feature: This column contains each of the 44,486 included features. The vast majority should be self-explanatory, or else they have been explained above, or in the original paper [1].

    Female reports and Male reports: These columns show the number of female and male reports, respectively, for which the specific feature is present.

    Proportion among female reports and Proportion among male reports: These columns show the proportions within the female and male reports, respectively, for which the specific feature is present. Comparing these crude proportions is the simplest and most intuitive way to contrast the female and male reports, and a useful complement to the specific vigiPoint output.

    Odds ratio: The odds ratio is a basic measure of association between the classification of reports into female and male reports and a given reporting feature, and hence can be used to compare female and male reports with respect to this feature. It is formally defined as a / (bc / d), where

    • a is the number of female reports with the feature
    • b is the number of female reports without the feature (excluding reports where the variable is missing)
    • c is the number of male reports with the feature
    • d is the number of male reports without the feature (excluding reports where the variable is missing).

    This crude odds ratio can also be computed as (p_female / (1 - p_female)) / (p_male / (1 - p_male)), where p_female and p_male are the proportions described earlier. If the odds ratio is above 1, the feature is more common among the female than the male reports; if below 1, the feature is less common among the female than the male reports. Note that the odds ratio can be mathematically undefined, in which case it is missing in the published data.
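
The two equivalent forms of the crude odds ratio can be checked with a short sketch. The function names are hypothetical, the counts in the example are made up, and returning None for an undefined ratio is an assumption about how one might mirror the missing values in the published spreadsheet.

```python
def crude_odds_ratio(a, b, c, d):
    """Crude odds ratio a / (bc / d), with a, b, c, d as defined above.

    Assumption: the ratio is treated as undefined (None) whenever a
    division by zero would occur, i.e. when b, c or d is zero.
    """
    if b == 0 or c == 0 or d == 0:
        return None
    return a / (b * c / d)


def odds_ratio_from_proportions(p_female, p_male):
    """Same quantity computed from the two crude proportions."""
    if p_female == 1 or p_male == 0 or p_male == 1:
        return None  # undefined, same convention as above
    return (p_female / (1 - p_female)) / (p_male / (1 - p_male))


if __name__ == "__main__":
    # Made-up counts: 30 of 100 female and 20 of 100 male reports
    # contain the feature.
    a, b, c, d = 30, 70, 20, 80
    print(crude_odds_ratio(a, b, c, d))
    print(odds_ratio_from_proportions(a / (a + b), c / (c + d)))
```

Both calls give the same value up to floating-point rounding, since a / (a + b) and c / (c + d) are exactly the proportions among female and male reports.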

    vigiPoint score: This score is defined based on an odds ratio with added statistical shrinkage, defined as (a + k) / ((bc / d) + k), where k is 1% of the total number of female reports, or about 9,000. While the shrinkage adds robustness to the measure of association, it makes interpretation more difficult, which is why the crude proportions and unshrunk odds ratios are also presented. Further, 99% credibility intervals are computed for the shrinkage odds ratios, and these intervals are transformed onto a log2 scale [6]. The vigiPoint score is then defined as the lower endpoint of the interval, if that endpoint is above 0; as the higher endpoint of the interval, if that endpoint is below 0; and otherwise as 0. The vigiPoint score is useful for sorting the features from strongest positive to strongest negative associations, and/or to filter the features according to some user-defined criteria.

    vigiPoint key feature: Features are classified as vigiPoint key features if their vigiPoint score is either above 0.5 or below -0.5. The specific threshold of 0.5 is arbitrary, but chosen to identify features where the two sets of reports (here female and male reports) differ in a clinically significant way.
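
The scoring and classification rules above can be sketched as follows. This is only an illustration under stated assumptions: the description does not say how the 99% credibility interval is computed, so its log2-scale endpoints are taken as inputs, and all function names are hypothetical.

```python
def shrunk_odds_ratio(a, b, c, d, k):
    """Shrinkage odds ratio (a + k) / ((bc / d) + k) as given above.

    k is supplied by the caller; the description sets it to 1% of the
    total number of female reports.
    """
    return (a + k) / ((b * c / d) + k)


def vigipoint_score(lower_log2, upper_log2):
    """Score derived from the interval endpoints on the log2 scale.

    The interval itself is an input: its computation is not described
    in this listing, so it is not reproduced here.
    """
    if lower_log2 > 0:
        return lower_log2  # whole interval above 0
    if upper_log2 < 0:
        return upper_log2  # whole interval below 0
    return 0.0             # interval covers 0


def is_key_feature(score, threshold=0.5):
    """Key feature if the score is above 0.5 or below -0.5."""
    return score > threshold or score < -threshold


if __name__ == "__main__":
    for lower, upper in [(0.6, 1.2), (-1.0, -0.7), (-0.2, 0.3)]:
        score = vigipoint_score(lower, upper)
        print(lower, upper, score, is_key_feature(score))
```

Because the score is the interval endpoint closest to 0, a feature is flagged only when the entire interval lies beyond the threshold on one side.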

    References

    1. Watson S, Caster O, Rochon PA, den Ruijter H. Reported adverse drug reactions in women and men: Aggregated evidence from globally collected individual case reports during half a century. EClinicalMedicine 2019.
    2. Uppsala Monitoring Centre. Guideline for using VigiBase data in studies.
    3. Uppsala Monitoring Centre. Caveat document: Statement of reservations, limitations, and conditions relating to data released from VigiBase, the WHO global database of individual case safety reports (ICSRs).
    4. Lindquist M. VigiBase, the WHO Global ICSR Database System: Basic Facts. The Drug Information Journal 2008; 42(5): 409-19.
    5. Norén GN, Orre R, Bate A, Edwards IR. Duplicate detection in adverse drug reaction surveillance. Data Mining and Knowledge Discovery 2007; 14(3): 305-28.
    6. Juhlin K, Star K, Norén GN. A method for data-driven exploration to pinpoint key features in medical data and facilitate expert review. Pharmacoepidemiology and Drug Safety 2017; 26(10):

  9. Data from: Comparison Data Sets for Benchmarking QSAR Methodologies in Lead...

    • figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Ruchi R. Mittal; Ross A. McKinnon; Michael J. Sorich (2023). Comparison Data Sets for Benchmarking QSAR Methodologies in Lead Optimization [Dataset]. http://doi.org/10.1021/ci900117m.s001
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Ruchi R. Mittal; Ross A. McKinnon; Michael J. Sorich
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    2D and 3D QSAR techniques are widely used in lead optimization-like processes. A compilation of 40 diverse data sets is described. It is proposed that these can be used as a common benchmark sample for comparisons of QSAR methodologies, primarily in terms of predictive ability. Use of this benchmark set will be useful for both assessment of new methods and for optimization of existing methods.

  10. Replication Data for: Exploring Disagreement in Indicators of State...

    • dataverse.harvard.edu
    • dataone.org
    Updated May 30, 2018
    Cite
    Charles Crabtree (2018). Replication Data for: Exploring Disagreement in Indicators of State Repression [Dataset]. http://doi.org/10.7910/DVN/V5LB9K
    Dataset updated
    May 30, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Charles Crabtree
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Until recently, researchers who wanted to examine the determinants of state respect for most specific negative rights needed to rely on data from the CIRI or the Political Terror Scale (PTS). The new V-DEM dataset offers scholars a potential alternative to the individual human rights variables from CIRI. We analyze a set of key Cingranelli-Richards (CIRI) Human Rights Data Project and Varieties of Democracy (V-DEM) negative rights indicators, finding unusual and unexpectedly large patterns of disagreement between the two sets. First, we discuss the new V-DEM dataset by comparing it to the disaggregated CIRI indicators, discussing the history of each project, and describing its empirical domain. Second, we identify a set of disaggregated human rights measures that are similar across the two datasets and discuss each project's measurement approach. Third, we examine how these measures compare to each other empirically, showing that they diverge considerably across both time and space. These findings point to several important directions for future work, such as how conceptual approaches and measurement strategies affect rights scores. For the time being, our findings suggest that researchers should think carefully about using the measures as substitutes.

  11. Database for comparing two algorithms that classified eucalyptus in Landsat...

    • data.mendeley.com
    Updated Sep 20, 2024
    Cite
    Debora Ferraz (2024). Database for comparing two algorithms that classified eucalyptus in Landsat image time series [Dataset]. http://doi.org/10.17632/jdrx42jds9.3
    Dataset updated
    Sep 20, 2024
    Authors
    Debora Ferraz
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Data used in the development of the research entitled: "Comparison between machine learning classification and trajectory-based change detection for identifying eucalyptus areas in Landsat time series"

  12. Pairwise sentence complexity comparison

    • kaggle.com
    Updated Jun 8, 2021
    Cite
    Douglas K.G. Araujo (2021). Pairwise sentence complexity comparison [Dataset]. https://www.kaggle.com/douglaskgaraujo/pairwise-sentence-complexity-comparison
    Dataset updated
    Jun 8, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Douglas K.G. Araujo
    Description

    Dataset creation

    The dataset was created by this notebook: https://www.kaggle.com/douglaskgaraujo/sentence-complexity-comparison-dataset

    Context

    This data is a pairwise comparison of sentences, together with information about their relative complexity. The original dataset is from the CommonLit Readability Prize competition, and interested readers are referred there (especially the competition's discussion forums) for more information on the data itself.

    Important notice! As per that competition's rules, the license is as follows:

    1. COMPETITION DATA. "Competition Data" means the data or datasets available from the Competition Website for the purpose of use in the Competition, including any prototype or executable code provided on the Competition Website. The Competition Data will contain private and public test sets. Which data belongs to which set will not be made available to participants.

    A. Data Access and Use. Competition Use and Non-Commercial & Academic Research: You may access and use the Competition Data for non-commercial purposes only, including for participating in the Competition and on Kaggle.com forums, and for academic research and education. The Competition Sponsor reserves the right to disqualify any participant who uses the Competition Data other than as permitted by the Competition Website and these Rules.

    B. Data Security. You agree to use reasonable and suitable measures to prevent persons who have not formally agreed to these Rules from gaining access to the Competition Data. You agree not to transmit, duplicate, publish, redistribute or otherwise provide or make available the Competition Data to any party not participating in the Competition. You agree to notify Kaggle immediately upon learning of any possible unauthorized transmission of or unauthorized access to the Competition Data and agree to work with Kaggle to rectify any unauthorized transmission or access.

    C. External Data. You may use data other than the Competition Data (“External Data”) to develop and test your Submissions. However, you will ensure the External Data is publicly available and equally accessible to use by all participants of the Competition for purposes of the competition at no cost to the other participants. The ability to use External Data under this Section 7.C (External Data) does not limit your other obligations under these Competition Rules, including but not limited to Section 11 (Winners Obligations).

    Content

    This dataset is a pairwise comparison of each sentence in the CommonLit competition with 500 other randomly matched sentences. Sentences are divided into training and validation datasets before being matched randomly. The relative complexity of each sentence is measured, and the dataset includes features such as the distance between the scores of the two sentences and a column indicating whether or not the first sentence's readability score is greater than or equal to the score of the second sentence.
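    The pairing described above can be sketched roughly as follows. This is a minimal illustration, not the code of the notebook that built the dataset: the function name `build_pairs`, the column names, and the use of an absolute difference as the "distance" are all assumptions made for the example.

```python
import random


def build_pairs(scores, k, seed=0):
    """Pair each sentence id with up to k randomly chosen other sentence ids.

    scores: dict mapping a sentence id to its readability score.
    All names here are hypothetical; the real dataset's columns may differ.
    """
    rng = random.Random(seed)
    ids = sorted(scores)
    rows = []
    for first in ids:
        others = [other for other in ids if other != first]
        # Sample without replacement, so a sentence is never paired twice
        # with the same partner.
        for second in rng.sample(others, min(k, len(others))):
            rows.append({
                "first": first,
                "second": second,
                # Assumption: "distance" is the absolute score difference.
                "distance": abs(scores[first] - scores[second]),
                "first_ge_second": scores[first] >= scores[second],
            })
    return rows


# Toy scores for illustration only; the dataset description uses k = 500.
toy_scores = {"s1": -1.2, "s2": 0.3, "s3": -0.5}
pairs = build_pairs(toy_scores, k=2)
```

    In keeping with the description, such a function would be applied separately to the training and validation splits so that no pair crosses the two.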

    Acknowledgements

    Thank you to the organisers of this competition for providing this dataset.


  13. Excess Deaths Associated with COVID-19

    • datalumos.org
    delimited
    Updated Apr 24, 2025
    + more versions
    Cite
    United States Department of Health and Human Services. Centers for Disease Control and Prevention. National Center for Health Statistics (2025). Excess Deaths Associated with COVID-19 [Dataset]. http://doi.org/10.3886/E227667V1
    Explore at:
    delimited (available download formats)
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    National Center for Health Statistics (https://www.cdc.gov/nchs/)
    Authors
    United States Department of Health and Human Services. Centers for Disease Control and Prevention. National Center for Health Statistics
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2017 - 2023
    Area covered
    United States
    Description

    Estimates of excess deaths can provide information about the burden of mortality potentially related to the COVID-19 pandemic, including deaths that are directly or indirectly attributed to COVID-19. Excess deaths are typically defined as the difference between the observed numbers of deaths in specific time periods and expected numbers of deaths in the same time periods. This visualization provides weekly estimates of excess deaths by the jurisdiction in which the death occurred. Weekly counts of deaths are compared with historical trends to determine whether the number of deaths is significantly higher than expected.

    Counts of deaths from all causes of death, including COVID-19, are presented. As some deaths due to COVID-19 may be assigned to other causes of deaths (for example, if COVID-19 was not diagnosed or not mentioned on the death certificate), tracking all-cause mortality can provide information about whether an excess number of deaths is observed, even when COVID-19 mortality may be undercounted. Additionally, deaths from all causes excluding COVID-19 were also estimated. Comparing these two sets of estimates (excess deaths with and without COVID-19) can provide insight about how many excess deaths are identified as due to COVID-19, and how many excess deaths are reported as due to other causes of death. These deaths could represent misclassified COVID-19 deaths, or potentially could be indirectly related to the COVID-19 pandemic (e.g., deaths from other causes occurring in the context of health care shortages or overburdened health care systems).

    Estimates of excess deaths can be calculated in a variety of ways, and will vary depending on the methodology and assumptions about how many deaths are expected to occur. Estimates of excess deaths presented in this webpage were calculated using Farrington surveillance algorithms (1). A range of values for the number of excess deaths was calculated as the difference between the observed count and one of two thresholds (either the average expected count or the upper bound of the 95% prediction interval), by week and jurisdiction.

    Provisional death counts are weighted to account for incomplete data. However, data for the most recent week(s) are still likely to be incomplete. Weights are based on completeness of provisional data in prior years, but the timeliness of data may have changed in 2020 relative to prior years, so the resulting weighted estimates may be too high in some jurisdictions and too low in others. As more information about the accuracy of the weighted estimates is obtained, further refinements to the weights may be made, which will impact the estimates. Any changes to the methods or weighting algorithm will be noted in the Technical Notes when they occur. More detail about the methods, weighting, data, and limitations can be found in the Technical Notes.

    This visualization includes several different estimates:

    • Number of excess deaths: A range of estimates for the number of excess deaths was calculated as the difference between the observed count and one of two thresholds (either the average expected count or the upper bound threshold), by week and jurisdiction. Negative values, where the observed count fell below the threshold, were set to zero.
    • Percent excess: The percent excess was defined as the number of excess deaths divided by the threshold.
    • Total number of excess deaths: The total number of excess deaths in each jurisdiction was calculated by summing the excess deaths in each week, from February 1, 2020 to present. Similarly, the total number of excess deaths for the US overall was computed as a sum of jurisdiction-specific numbers of excess deaths (with negative values set to zero), and not directly estimated using the Farrington surveillance algorithms.

    Select a dashboard from the menu, then click on “Update Dashboard” to navigate through the different graphics.

    The first dashboard shows the weekly predicted counts of deaths from all causes, and the threshold for the expected number of deaths. Select a jurisdiction from the drop-down menu to show data for that jurisdiction.

    The second dashboard shows the weekly predicted counts of deaths from all causes and the weekly count of deaths from all causes excluding COVID-19. Select a jurisdiction from the drop-down menu to show data for that jurisdiction.

    The th
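    The arithmetic behind these estimates can be illustrated with a short sketch. It takes the thresholds as given inputs and does not implement the Farrington surveillance algorithms that produce them; the function names, the multiplication by 100 to express the ratio as a percentage, and the numbers in the usage lines are all assumptions made for illustration, not CDC code or data.

```python
def weekly_excess(observed, threshold):
    """Return (excess deaths, percent excess) for one week and jurisdiction.

    Negative differences are set to zero, as the description states.
    Assumption: percent excess is the ratio to the threshold times 100.
    """
    excess = max(observed - threshold, 0)
    percent = 100.0 * excess / threshold if threshold > 0 else 0.0
    return excess, percent


def excess_range(observed, expected, upper_bound):
    """Return the (lower, upper) estimate of excess deaths for one week.

    The lower estimate uses the upper bound of the prediction interval as
    the threshold; the upper estimate uses the average expected count.
    """
    lower, _ = weekly_excess(observed, upper_bound)
    upper, _ = weekly_excess(observed, expected)
    return lower, upper


def total_excess(weeks):
    """Sum weekly excess deaths over (observed, threshold) pairs."""
    return sum(weekly_excess(obs, thr)[0] for obs, thr in weeks)


# Hypothetical weekly counts, not taken from the dataset.
print(weekly_excess(1200, 1000))
print(excess_range(1200, 1000, 1100))
print(total_excess([(1200, 1000), (900, 1000), (1050, 1000)]))
```

    Because negative weekly values are zeroed before summing, a total computed this way can exceed the simple difference between total observed and total expected deaths.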

  14. Data sets for phylogenomic analyses in: Ant backbone phylogeny resolved by...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Jan 4, 2024
    Cite
    Chenyang Cai (2024). Data sets for phylogenomic analyses in: Ant backbone phylogeny resolved by modelling compositional heterogeneity among sites in genomic data [Dataset]. http://doi.org/10.5061/dryad.pk0p2ngsj
    Explore at:
    zip (available download formats)
    Dataset updated
    Jan 4, 2024
    Dataset provided by
    Dryad
    Authors
    Chenyang Cai
    Time period covered
    2023
    Description

    This Readme file summarizes the resultant files of my phylogenetic analyses deposited in the DRYAD repository of the paper: Cai, C., 2024. Ant backbone phylogeny resolved by modelling compositional heterogeneity among sites in genomic data. Communications Biology.

    The results of my phylogenetic analyses are listed in two root folders, corresponding to the two previously published studies: Borowiec et al. (2019) and Romiguier et al. (2022).

    1-Borowiec et al. 2019: This folder includes four folders, showing results based on four datasets of Borowiec et al. (2019) under the site-heterogeneous CAT-GTR+G4 model in PhyloBayes.

    1-Full_data_set_unconstrained-7451 NT sites: Full 11-gene matrix (123 taxa, 7,451 nucleotide [NT] sites).

    • D1-bpcomp.bpdiff: bpdiff result using the bpcomp tool in PhyloBayes
    • D1-bpcomp.con.tre: consensus tree of the PhyloBayes analysis

    2-AT-rich_outgr_removed-7451 NT sites: Full matrix with the most AT-rich outgroups excluded (117 taxa, 7,451 NT sites).

    • ...

  15. Temporal and Spatio-Temporal High-Resolution Satellite Data for the...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Temporal and Spatio-Temporal High-Resolution Satellite Data for the Validation of a Landsat Time-Series of Fractional Component Cover Across Western United States (U.S.) Rangelands [Dataset]. https://catalog.data.gov/dataset/temporal-and-spatio-temporal-high-resolution-satellite-data-for-the-validation-of-a-landsa
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Western United States, United States
    Description

    Western U.S. rangelands have been quantified as six fractional cover (0-100%) components over the Landsat archive (1985-2018) at 30-m resolution, termed the “Back-in-Time” (BIT) dataset. Robust validation through space and time is needed to quantify product accuracy. We leverage field data observed concurrently with HRS imagery over multiple years and locations in the Western U.S. to dramatically expand the spatial extent and sample size of validation analysis relative to a direct comparison to field observations and to previous work. We compare HRS and BIT data in the corresponding space and time. Our objectives were to evaluate the temporal and spatio-temporal relationships between HRS and BIT data, and to compare their response to spatio-temporal variation in climate. We hypothesize that strong temporal and spatio-temporal relationships will exist between HRS and BIT data and that they will exhibit similar climate response. We evaluated a total of 42 HRS sites across the western U.S. with 32 sites in Wyoming, and 5 sites each in Nevada and Montana. HRS sites span a broad range of vegetation, biophysical, climatic, and disturbance regimes. Our HRS sites were strategically located to collectively capture the range of biophysical conditions within a region. Field data were used to train 2-m predictions of fractional component cover at each HRS site and year. The 2-m predictions were degraded to 30-m, and some were used to train regional Landsat-scale, 30-m, “base” maps of fractional component cover representing circa 2016 conditions. A Landsat-imagery time-series spanning 1985-2018, excluding 2012, was analyzed for change through time. Pixels and times identified as changed from the base were trained using the base fractional component cover from the pixels identified as unchanged. Changed pixels were labeled with the updated predictions, while the base was maintained in the unchanged pixels. 
The resulting BIT suite includes the fractional cover of the six components described above for 1985-2018. We compare the two datasets, HRS and BIT, in space and time. The two tabular datasets presented here correspond to a temporal and a spatio-temporal validation of the BIT data. First, the temporal data are HRS and BIT component cover and climate variable means by site by year. Second, the spatio-temporal data are HRS and BIT component cover and associated climate variables at individual pixels in a site-year.

  16. Portuguese Comparative Sentences: A Collection of Labeled Sentences on...

    • data.niaid.nih.gov
    • live.european-language-grid.eu
    • +1more
    Updated Apr 19, 2021
    Cite
    Breno Matos (2021). Portuguese Comparative Sentences: A Collection of Labeled Sentences on Twitter and Buscapé [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4124409
    Explore at:
    Dataset updated
    Apr 19, 2021
    Dataset provided by
    Fabrício Benevenuto
    Matheus Barbosa
    Julio C. S. Reis
    Daniel Kansaon
    Michele A. Brandão
    Breno Matos
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    More and more customers turn to online product reviews and comments on the Web when deciding whether to buy one product over another. In this context, sentiment analysis techniques constitute the traditional way to summarize users' opinions that criticize or highlight the positive aspects of a product. Sentiment analysis of reviews usually relies on extracting positive and negative aspects of products, neglecting comparative opinions. Such opinions do not directly express a positive or negative view but contrast aspects of products from different competitors.

    Here, we present the first effort to study comparative opinions in Portuguese, creating two new Portuguese datasets with comparative sentences marked by three humans. This repository consists of three important files: (1) lexicon that contains words frequently used to make a comparison in Portuguese; (2) Twitter dataset with labeled comparative sentences; and (3) Buscapé dataset with labeled comparative sentences.

    The lexicon is a set of 176 words frequently used to express a comparative opinion in the Portuguese language. The lexicon is used as a filter to build two datasets of comparative sentences from two important contexts: (1) online social networks; and (2) product reviews.

    For Twitter, we collected all Portuguese tweets published in Brazil on 2018/01/10 and kept those that contained at least one keyword present in the lexicon, obtaining 130,459 tweets. Our work operates at the sentence level. Thus, all sentences were extracted and a sample of 2,053 sentences was created, which was labeled manually by three humans, reaching 83.2% agreement with Fleiss' Kappa coefficient. For Buscapé, a Brazilian website (https://www.buscape.com.br/) used to compare product prices on the web, the same methodology was applied, creating a set of 2,754 labeled sentences obtained from comments made in 2013. This dataset was labeled by three humans, reaching an agreement of 83.46% with Fleiss' Kappa coefficient.

    The Twitter dataset has 2,053 labeled sentences, of which 918 are comparative. The Buscapé dataset has 2,754 labeled sentences, of which 1,282 are comparative.

    The datasets contain these labeled properties:

    text: the sentence extracted from the review comment.

    entity_s1: the first entity compared in the sentence.

    entity_s2: the second entity compared in the sentence.

    keyword: the comparative keyword used in the sentence to express comparison.

    preferred_entity: the preferred entity.

    id_start: the keyword's initial position in the sentence.

    id_end: the keyword's final position in the sentence.

    type: the sentence label, which specifies whether the phrase is a comparison.

    Additional Information:

    1 - The sentences were separated using a sentence tokenizer.

    2 - If the compared entity is not specified, the field will receive a value: "_".

    3 - The property "type" can take one of five values:

    0: Non-comparative (Não Comparativa).

    1: Non-Equal-Gradable (Gradativa com Predileção).

    2: Equative (Equitativa).

    3: Superlative (Superlativa).

    4: Non-Gradable (Não Gradativa).

    If you use this data, please cite our paper as follows:

    "Daniel Kansaon, Michele A. Brandão, Julio C. S. Reis, Matheus Barbosa, Breno Matos, and Fabrício Benevenuto. 2020. Mining Portuguese Comparative Sentences in Online Reviews. In Brazilian Symposium on Multimedia and the Web (WebMedia ’20), November 30-December 4, 2020, São Luís, Brazil. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3428658.3431081"

    Plus Information:

    We make the raw sentences available in the dataset to allow future work to test different pre-processing steps. Therefore, if you want to obtain the exact sentences used in the paper above, you must reproduce the pre-processing step described in the paper (Figure 2).

    For each sentence with more than one keyword in the dataset:

    You need to extract three words before and three words after the comparative keyword, creating a new sentence that will receive the existing value in the “type” field as a label;

    The original sentence will be divided into n new sentences, where n is the number of keywords in the sentence;

    The stopwords should not be accounted for as part of this range (3 words);

    Note that: the final processed sentence can have more than six words because the stopwords are not counted as part of the range.
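    The pre-processing steps above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `keyword_windows` is hypothetical, the sentence is assumed to be tokenized already, other keywords inside a window are treated as ordinary words, and stopwords are kept in the output but not counted toward the three-word range.

```python
def keyword_windows(tokens, keywords, stopwords, span=3):
    """Build one new sentence per comparative keyword found in tokens.

    Each new sentence holds the keyword plus up to `span` non-stopword
    tokens on each side. Stopwords inside the range are kept but do not
    count toward `span`, so a window can contain more than 2 * span words.
    """
    windows = []
    for i, token in enumerate(tokens):
        if token not in keywords:
            continue

        # Walk left until `span` non-stopwords are collected.
        left, counted, j = [], 0, i - 1
        while j >= 0 and counted < span:
            left.append(tokens[j])
            if tokens[j] not in stopwords:
                counted += 1
            j -= 1
        left.reverse()

        # Walk right in the same way.
        right, counted, j = [], 0, i + 1
        while j < len(tokens) and counted < span:
            right.append(tokens[j])
            if tokens[j] not in stopwords:
                counted += 1
            j += 1

        windows.append(left + [token] + right)
    return windows


# Hypothetical placeholder tokens with two keywords, "more" and "less".
tokens = ["a1", "b2", "c3", "d4", "more", "e5", "f6", "g7", "h8", "less", "i9"]
for window in keyword_windows(tokens, {"more", "less"}, stopwords=set()):
    print(" ".join(window))
```

    Each resulting window would then receive the value of the original sentence's "type" field as its label, as described above.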

  17. Comparison of classifier performance across two data sets.

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Joseph Schlecht; Matthew E. Kaplan; Kobus Barnard; Tatiana Karafet; Michael F. Hammer; Nirav C. Merchant (2023). Comparison of classifier performance across two data sets. [Dataset]. http://doi.org/10.1371/journal.pcbi.1000093.t003
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Joseph Schlecht; Matthew E. Kaplan; Kobus Barnard; Tatiana Karafet; Michael F. Hammer; Nirav C. Merchant
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The top table shows the average classifier performance for cross-validation on the 9-locus public STR data. The bottom table is the performance for the same test, but on a 9-locus subset of our ground-truth training data. While overall performance is lower than the 15-locus cross-validation test on our ground-truth data (Table 1), the two data sets perform similarly here, indicating that increasing the number of markers in the data set can significantly improve performance.

  18. Data from: Hydrodynamic time-series data from two marshes and adjacent...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Aug 17, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Hydrodynamic time-series data from two marshes and adjacent shallows in Northern San Francisco Bay, California, 2022-2023 [Dataset]. https://catalog.data.gov/dataset/hydrodynamic-time-series-data-from-two-marshes-and-adjacent-shallows-in-northern-san-2022-
    Explore at:
    Dataset updated
    Aug 17, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    San Francisco Bay, California
    Description

    Hydrodynamic and sediment transport time-series data, including water depth, velocity, turbidity, conductivity, and temperature, were collected by the U.S. Geological Survey (USGS) Pacific Coastal and Marine Science Center at shallow subtidal and intertidal sites in Corte Madera Bay and San Pablo Bay National Wildlife Refuge (SPNWF) in San Francisco Bay, CA, as well as on the marsh plain of SPNWF marsh and in a tidal creek and on the marsh plain of Corte Madera Marsh, in 2022 and 2023. Data files are grouped by station, San Pablo subtidal, San Pablo intertidal, San Pablo marsh, Corte Madera subtidal, Corte Madera intertidal, Corte Madera marsh, or Corte Madera tidal creek, then by instrument type. At most stations there were periods of low water when sensors were no longer submerged, resulting in spurious data. In addition, most instruments experienced some degree of biofouling, particularly at the subtidal and intertidal stations. The subtidal stations also occasionally show signs of platform rocking or movement due to strong water flow, and/or from accidental fisher/boater interference. Users are advised to assess data quality carefully, and to check the metadata for instrument information, as platform deployment times and data-processing methods varied.

  19. Data release for sensor comparison subset associated with the journal...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Data release for sensor comparison subset associated with the journal article "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States" [Dataset]. https://catalog.data.gov/dataset/data-release-for-sensor-comparison-subset-associated-with-the-journal-article-solar-and-se
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    United States
    Description

    This dataset provides NDVI time series data in comma-delimited format from the phenocam location using five satellite products:

    1) Proba-V L1c product
    2) Landsat 7 SR product
    3) Sentinel-2 Level-1C product
    4) Sentinel 2 Level-2A data product
    5) Suomi National Polar-Orbiting Partnership (S-NPP) NASA Visible Infrared Imaging Radiometer Suite (VIIRS) VNP13A1 data product

    The dataset also includes scripts to download these data from Google Earth Engine. The data are provided in support of the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States". The data and scripts allow users to replicate, test, or further explore results. The comma-delimited csv files are named according to the satellite and product. The JavaScript Google Earth Engine code files (within the folder "Code") are also named by satellite/product, with the Proba and VIIRS time series code combined into a single file, and the other products as separate files. A graph of the data is included as the file 'SensorCompare4SB.jpg' and shows the NDVI time series from the products described above. The data in this graph can also be viewed in figure 10 of the associated journal article.

  20. The banksia plot: a method for visually comparing point estimates and...

    • researchdata.edu.au
    • bridges.monash.edu
    Updated Apr 16, 2024
    Cite
    Simon Turner; Joanne McKenzie; Emily Karahalios; Elizabeth Korevaar (2024). The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets [Dataset]. http://doi.org/10.26180/25286407.V2
    Explore at:
    Dataset updated
    Apr 16, 2024
    Dataset provided by
    Monash University
    Authors
    Simon Turner; Joanne McKenzie; Emily Karahalios; Elizabeth Korevaar
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Companion data for the creation of a banksia plot:

    Background:

    In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.

    Methods:

    The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.
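    The centring and scaling step can be illustrated with a minimal sketch. It is not the Stata or R code distributed with this dataset: the function name `centre_and_scale` and the tuple layout (estimate, lower limit, upper limit) are assumptions made for the example, and the numbers in the usage lines are invented.

```python
def centre_and_scale(reference, comparator):
    """Centre and scale one reference/comparator pair of analyses.

    Each argument is a hypothetical (estimate, lower, upper) tuple.
    The reference estimate is shifted to zero and its confidence interval
    is scaled to a width of one; the comparator is shifted and scaled by
    the same amounts.
    """
    ref_estimate, ref_lower, ref_upper = reference
    width = ref_upper - ref_lower
    if width <= 0:
        raise ValueError("reference confidence interval must have positive width")

    def transform(value):
        return (value - ref_estimate) / width

    return (
        tuple(transform(v) for v in reference),
        tuple(transform(v) for v in comparator),
    )


# Invented example values.
ref, comp = centre_and_scale((10.0, 8.0, 12.0), (11.0, 8.0, 14.0))
print(ref)
print(comp)
```

    After the transformation, a comparator interval wider than one unit indicates a wider confidence interval than the reference analysis, and a comparator estimate away from zero indicates a shifted point estimate.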

    Results:

    In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

    Conclusions:

    The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

    This collection of files allows the user to create the images used in the companion paper and to amend the code to create their own banksia plots using either Stata version 17 or R version 4.3.1.

