100+ datasets found
  1. Dataset for: Comparison of Two Correlated ROC Surfaces at a Given Pair of True Classification Rates

    • wiley.figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Leonidas Bantis; Ziding Feng (2023). Dataset for: Comparison of Two Correlated ROC Surfaces at a Given Pair of True Classification Rates [Dataset]. http://doi.org/10.6084/m9.figshare.6527219.v1
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Wiley
    Authors
    Leonidas Bantis; Ziding Feng
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The receiver operating characteristics (ROC) curve is typically employed when one wants to evaluate the discriminatory capability of a continuous or ordinal biomarker in the case where two groups are to be distinguished, commonly the ’healthy’ and the ’diseased’. There are cases for which the disease status has three categories. Such cases employ the (ROC) surface, which is a natural generalization of the ROC curve for three classes. In this paper, we explore new methodologies for comparing two continuous biomarkers that refer to a trichotomous disease status, when both markers are applied to the same patients. Comparisons based on the volume under the surface have been proposed, but that measure is often not clinically relevant. Here, we focus on comparing two correlated ROC surfaces at given pairs of true classification rates, which are more relevant to patients and physicians. We propose delta-based parametric techniques, power transformations to normality, and bootstrap-based smooth nonparametric techniques to investigate the performance of an appropriate test. We evaluate our approaches through an extensive simulation study and apply them to a real data set from prostate cancer screening.
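
    As a rough, self-contained illustration of the kind of three-class comparison involved (not the authors' delta-based or smooth nonparametric tests, which target a fixed pair of true classification rates), the sketch below contrasts two correlated markers through the empirical volume under the ROC surface, with a paired bootstrap over subjects; all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_vus(x1, x2, x3):
    """Empirical volume under the ROC surface: the proportion of
    (x1, x2, x3) triples that are correctly ordered, P(X1 < X2 < X3)."""
    a = x1[:, None, None] < x2[None, :, None]
    b = x2[None, :, None] < x3[None, None, :]
    return np.mean(a & b)

# two hypothetical correlated markers measured on the same subjects in
# three ordered disease-status groups (e.g. healthy / benign / cancer)
n = 60
base = [rng.normal(m, 1, n) for m in (0.0, 0.7, 1.5)]
marker_a = base
marker_b = [g + rng.normal(0, 0.5, n) for g in base]

observed = empirical_vus(*marker_a) - empirical_vus(*marker_b)

# paired bootstrap over subjects, which preserves the correlation
# between the two markers within each group
boot = []
for _ in range(500):
    idx = rng.integers(0, n, n)
    boot.append(empirical_vus(*[g[idx] for g in marker_a])
                - empirical_vus(*[g[idx] for g in marker_b]))

print("VUS difference:", round(observed, 3),
      "bootstrap 95% CI:", np.percentile(boot, [2.5, 97.5]).round(3))
```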

  2. Supplementary material from "Visual comparison of two data sets: Do people use the means and the variability?"

    • figshare.com
    xlsx
    Updated Mar 14, 2017
    Cite
    Robin Kramer; Caitlin Telfer; Alice Towler (2017). Supplementary material from "Visual comparison of two data sets: Do people use the means and the variability?" [Dataset]. http://doi.org/10.6084/m9.figshare.4751095.v1
    Available download formats: xlsx
    Dataset updated
    Mar 14, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Robin Kramer; Caitlin Telfer; Alice Towler
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    In our everyday lives, we are required to make decisions based upon our statistical intuitions. Often, these involve the comparison of two groups, such as luxury versus family cars and their suitability. Research has shown that the mean difference affects judgements where two sets of data are compared, but the variability of the data has only a minor influence, if any at all. However, prior research has tended to present raw data as simple lists of values. Here, we investigated whether displaying data visually, in the form of parallel dot plots, would lead viewers to incorporate variability information. In Experiment 1, we asked a large sample of people to compare two fictional groups (children who drank ‘Brain Juice’ versus water) in a one-shot design, where only a single comparison was made. Our results confirmed that only the mean difference between the groups predicted subsequent judgements of how much they differed, in line with previous work using lists of numbers. In Experiment 2, we asked each participant to make multiple comparisons, with both the mean difference and the pooled standard deviation varying across data sets they were shown. Here, we found that both sources of information were correctly incorporated when making responses. Taken together, we suggest that increasing the salience of variability information, through manipulating this factor across items seen, encourages viewers to consider this in their judgements. Such findings may have useful applications for best practices when teaching difficult concepts like sampling variation.
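
    The two quantities manipulated in Experiment 2 are standard descriptive statistics. As a minimal, self-contained illustration (the group labels and parameter values below are made up, not taken from the study), this sketch computes the mean difference and pooled standard deviation for two simulated groups:

```python
import numpy as np

rng = np.random.default_rng(1)

def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    return np.sqrt(((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1))
                   / (na + nb - 2))

# hypothetical stimuli in the spirit of the 'Brain Juice' vs. water groups
brain_juice = rng.normal(105, 15, 20)
water = rng.normal(100, 15, 20)

mean_diff = brain_juice.mean() - water.mean()
sd_pooled = pooled_sd(brain_juice, water)
print(f"mean difference = {mean_diff:.1f}, pooled SD = {sd_pooled:.1f}, "
      f"standardized difference = {mean_diff / sd_pooled:.2f}")
```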

  3. Statistical Comparison of Two ROC Curves

    • figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Yaacov Petscher (2023). Statistical Comparison of Two ROC Curves [Dataset]. http://doi.org/10.6084/m9.figshare.860448.v1
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    figshare
    Authors
    Yaacov Petscher
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This Excel file performs a statistical test of whether two ROC curves differ from each other based on the Area Under the Curve (AUC). You'll need the coefficient from the table presented in the following article, along with the correct AUC values, to enter for the comparison: Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839-843.
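
    A minimal Python sketch of the same z-test the spreadsheet implements, assuming the Hanley & McNeil (1982) standard-error formula; the AUCs, sample sizes, and r below are made-up inputs, and r still has to be read from the 1983 table:

```python
from math import sqrt
from scipy.stats import norm

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of a single AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    return sqrt((auc * (1 - auc)
                 + (n_pos - 1) * (q1 - auc ** 2)
                 + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg))

def compare_correlated_aucs(auc1, auc2, n_pos, n_neg, r):
    """z-test for two AUCs measured on the same cases; r is the correlation
    coefficient looked up in the Hanley & McNeil (1983) table."""
    se1 = hanley_mcneil_se(auc1, n_pos, n_neg)
    se2 = hanley_mcneil_se(auc2, n_pos, n_neg)
    z = (auc1 - auc2) / sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    return z, 2 * norm.sf(abs(z))

# made-up example: 50 diseased and 50 healthy cases, r taken from the table
z, p = compare_correlated_aucs(auc1=0.86, auc2=0.80, n_pos=50, n_neg=50, r=0.45)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```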

  4. Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets

    • acs.figshare.com
    • figshare.com
    zip
    Updated Jun 5, 2023
    Cite
    Richard L. Marchese Robinson; Anna Palczewska; Jan Palczewski; Nathan Kidley (2023). Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets [Dataset]. http://doi.org/10.1021/acs.jcim.6b00753.s006
    Available download formats: zip
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    ACS Publications
    Authors
    Richard L. Marchese Robinson; Anna Palczewska; Jan Palczewski; Nathan Kidley
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    The ability to interpret the predictions made by quantitative structure–activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. These programs are the rfFC package (https://r-forge.r-project.org/R/?group_id=1725) for the R statistical programming language and the Python program HeatMapWrapper [https://doi.org/10.5281/zenodo.495163] for heat map generation.
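
    This is not the authors' pipeline (which used the rfFC and HeatMapWrapper tools above); the sketch below only illustrates the general kind of predictivity comparison described, Random Forest versus a linear SVM under identical cross-validation splits, on a synthetic stand-in for a fingerprint matrix:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

# synthetic stand-in for a binary-classification benchmark (e.g. a
# compound-by-fingerprint matrix); the real benchmarks would supply X, y
X, y = make_classification(n_samples=500, n_features=1024, n_informative=40,
                           random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0, n_jobs=-1)
svm = LinearSVC(C=1.0)

print("Random Forest AUC:", cross_val_score(rf, X, y, cv=cv, scoring="roc_auc").mean())
print("Linear SVM AUC:   ", cross_val_score(svm, X, y, cv=cv, scoring="roc_auc").mean())
```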

  5. Data from: Dataset for the comparison of two Computational Thinking (CT) test for upper primary school (grades 3-4): the Beginners' CT test (BCTt) and the competent CT test (cCTt)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 1, 2022
    Cite
    Jessica Dehler Zufferey (2022). Dataset for the comparison of two Computational Thinking (CT) test for upper primary school (grades 3-4) : the Beginners' CT test (BCTt) and the competent CT test (cCTt) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5885033
    Dataset updated
    Dec 1, 2022
    Dataset provided by
    Barbara Bruno
    Pedro Marcelino
    Laila El-Hamamsy
    Jessica Dehler Zufferey
    Estefanía Martín Barroso
    María Zapata-Cáceres
    Marcos Román-González
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains quantitative student data acquired during the administration of two validated Computational Thinking (CT) assessments for upper primary school (grades 3 and 4): the Beginners' CT test (BCTt) [1] and the competent CT test (cCTt) [2].

    To compare the psychometric properties of both instruments, a comparative analysis was conducted with data acquired in schools in Portugal from the same school districts. More specifically, we analyse the results of:

    • the BCTt test administered in March 2020 to 374 students in grades 3-4,

    • the cCTt test administered in April 2021 to 201 different students in grades 3-4.

    These students had no prior experience in Computational Thinking, as this was not part of the national curriculum at the times of administration.

    The detailed psychometric comparison is published in Frontiers in Psychology - Educational Psychology [3] and provides indications regarding the use of both instruments for grades 3-4.

    A README is included and provides additional information regarding:

    • the requirements for re-use.

    • the specific content of the 2 csv files

    The BCTt is available upon request to maria.zapata@urjc.es and the cCTt items are available in [2] with an editable version being available upon request to laila.elhamamsy@epfl.ch.

    In case of other inquiries, please contact: laila.elhamamsy@epfl.ch, maria.zapata@urjc.es or pedro.marcelino@treetree2.org
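
    As one hypothetical example of the kind of psychometric summary that could be computed from the two CSV files (the file names and the item-column naming are assumptions, not documented here), Cronbach's alpha could be calculated for each instrument from its item-response matrix:

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of 0/1 scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

for name in ["BCTt_grades3-4.csv", "cCTt_grades3-4.csv"]:   # assumed file names
    df = pd.read_csv(name)
    item_cols = [c for c in df.columns if c.lower().startswith("q")]  # assumed
    print(name, "Cronbach's alpha =", round(cronbach_alpha(df[item_cols]), 3))
```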

    References

    [1] M. Zapata-Cáceres, E. Martín-Barroso and M. Román-González, "Computational Thinking Test for Beginners: Design and Content Validation," 2020 IEEE Global Engineering Education Conference (EDUCON), 2020, pp. 1905-1914, doi: 10.1109/EDUCON45650.2020.9125368.

    [2] El-Hamamsy, L., Zapata-Cáceres, M., Barroso, E. M., Mondada, F., Zufferey, J. D., & Bruno, B. (2022). The Competent Computational Thinking Test: Development and Validation of an Unplugged Computational Thinking Test for Upper Primary School. Journal of Educational Computing Research, 60(7), 1818–1866. https://doi.org/10.1177/07356331221081753

    [3] Laila El-Hamamsy* , María Zapata-Cáceres, Pedro Marcelino, Jessica Dehler Zufferey, Barbara Bruno, Estefanía Martín-Barroso and Marcos Román-González (2022). Comparing the psychometric properties of two primary school Computational Thinking (CT) assessments for grades 3 and 4: the Beginners' CT test (BCTt) and the competent CT test (cCTt). Front. Psychol. doi:10.3389/fpsyg.2022.1082659

  6. Temporal and Spatio-Temporal High-Resolution Satellite Data for the Validation of a Landsat Time-Series of Fractional Component Cover Across Western United States (U.S.) Rangelands

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Temporal and Spatio-Temporal High-Resolution Satellite Data for the Validation of a Landsat Time-Series of Fractional Component Cover Across Western United States (U.S.) Rangelands [Dataset]. https://catalog.data.gov/dataset/temporal-and-spatio-temporal-high-resolution-satellite-data-for-the-validation-of-a-landsa
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Western United States, United States
    Description

    Western U.S. rangelands have been quantified as six fractional cover (0-100%) components over the Landsat archive (1985-2018) at 30-m resolution, termed the “Back-in-Time” (BIT) dataset. Robust validation through space and time is needed to quantify product accuracy. We leverage field data observed concurrently with high-resolution satellite (HRS) imagery over multiple years and locations in the Western U.S. to dramatically expand the spatial extent and sample size of validation analysis relative to a direct comparison to field observations and to previous work. We compare HRS and BIT data in the corresponding space and time. Our objectives were to evaluate the temporal and spatio-temporal relationships between HRS and BIT data, and to compare their response to spatio-temporal variation in climate. We hypothesize that strong temporal and spatio-temporal relationships will exist between HRS and BIT data and that they will exhibit similar climate response.

    We evaluated a total of 42 HRS sites across the western U.S., with 32 sites in Wyoming and 5 sites each in Nevada and Montana. HRS sites span a broad range of vegetation, biophysical, climatic, and disturbance regimes and were strategically located to collectively capture the range of biophysical conditions within a region. Field data were used to train 2-m predictions of fractional component cover at each HRS site and year. The 2-m predictions were degraded to 30-m, and some were used to train regional Landsat-scale, 30-m, “base” maps of fractional component cover representing circa 2016 conditions. A Landsat-imagery time-series spanning 1985-2018, excluding 2012, was analyzed for change through time. Pixels and times identified as changed from the base were trained using the base fractional component cover from the pixels identified as unchanged. Changed pixels were labeled with the updated predictions, while the base was maintained in the unchanged pixels. The resulting BIT suite includes the fractional cover of the six components described above for 1985-2018.

    We compare the two datasets, HRS and BIT, in space and time. The two tabular datasets presented here correspond to a temporal and a spatio-temporal validation of the BIT data. First, the temporal data are HRS and BIT component cover and climate variable means by site and year. Second, the spatio-temporal data are HRS and BIT component cover and associated climate variables at individual pixels in a site-year.
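
    A hedged sketch of how the temporal comparison could be summarised from the tabular data; the file name, column naming scheme, and component names below are assumptions for illustration, not the dataset's documented schema:

```python
import numpy as np
import pandas as pd

# assumed file of HRS and BIT component-cover means by site and year
df = pd.read_csv("temporal_site_year_means.csv")

# assumed component names; the dataset documentation lists the actual six
components = ["shrub", "sagebrush", "herbaceous",
              "annual_herbaceous", "bare_ground", "litter"]

for comp in components:
    hrs, bit = df[f"hrs_{comp}"], df[f"bit_{comp}"]
    rmse = np.sqrt(np.mean((hrs - bit) ** 2))
    r = np.corrcoef(hrs, bit)[0, 1]
    print(f"{comp:>18}: r = {r:.2f}, RMSE = {rmse:.1f} % cover")
```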

  7. Comparative Analysis of MLB Players wOBA

    • kaggle.com
    Updated Jan 15, 2023
    Cite
    The Devastator (2023). Comparative Analysis of MLB Players wOBA [Dataset]. https://www.kaggle.com/datasets/thedevastator/comparative-analysis-of-mlb-players-2014-woba-st
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 15, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    Comparative Analysis of MLB Players wOBA Statistics

    Analyzing Skillsets in The Old Ballgame

    By Devi Ramanan [source]

    About this dataset

    This dataset features a comprehensive look into the performance of 311 professional Major League Baseball players. It comprises key batting statistics including name, team, age, plate appearances (PA), batting average (AVG), on-base percentage minus batting average (OBP-AVG), isolated power (ISO), stolen bases (SB), and ultimate zone rating per 150 games (UZR/150). Additionally, the dataset contains more detailed and complex metrics for each player, such as weighted values for singles (1Bw), doubles (2Bw), triples (3Bw), home runs (HRw), unintentional walks (uBBw), hit by pitches (HBPw), and stolen bases attempted/successful (SBW/CSW), along with weighted on-base average (wOBA). All of these data points create an effective way to measure offensive performance that is both insightful and objective. Jeff Long's Spira Award-winning article analyzed this very same data to compare MLB players who have more similar skillsets than would otherwise be expected.

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset can be used to analyse the wOBA stats of MLB players with at least 250 plate appearances (PA). This dataset has data on 31 baseball players. The data includes the player's name, their team, age, PA, batting average (AVG), on-base percentage minus batting average (OBP-AVG), isolated power (ISO), stolen bases (SB), ultimate zone rating per 150 games (UZR/150), weighted value of singles (1Bw), weighted value of doubles (2Bw), weighted value of triples (3Bw), weighted value of home runs (HRw), weighted value of unintentional walks (uBBw), weighted value of hit by pitches (HBPw), and stolen base attempt success rate (CSW). Using this dataset you can compare different MLB players' stats in the same year.
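
    A hedged pandas sketch of that comparison, using the file names from the Columns section below; the player names are placeholders and the merge key is an assumption:

```python
import pandas as pd

key_stats = pd.read_csv("Batting Key Stats2.csv")
woba = pd.read_csv("2014 wOBA Stats 3.csv")

# join the two files on player name (assumed to be a usable key)
merged = key_stats.merge(woba, on="Name", suffixes=("", "_woba"))

players = ["Player A", "Player B"]          # placeholder names
pair = merged[merged["Name"].isin(players)]
print(pair[["Name", "PA", "AVG", "ISO", "1Bw", "2Bw", "3Bw", "HRw"]])
```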

    Research Ideas

    • Analyzing and predicting batting performance. With this dataset, researchers could create models to observe correlations between batting metrics such as strikeouts, walks, home runs, stolen bases etc and overall wOBA scores for the players. This could be used to generate insights into the most important batting factors that contribute the greatest benefit for a team's success.
    • Comparing players from different teams in terms of their batting performance. By comparing two players with similar stats (for example two offensive power hitters) across different teams it would be possible to analyze whether certain teams consistently have better offensive players or if they just have higher quantity in particular positions of play.
    • Creating a predictive model for MLB draft prospects or free-agent signing potential based on their stats and year-over-year changes in OBP-AVG or UZR/150. This could provide meaningful insight into which emerging talents are likely to see substantial improvement in their career trajectory over time, compared with aging stars who may gradually decline due to age-related attrition factors such as injury and fatigue.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: Batting Key Stats2.csv

    | Column name | Description |
    |:------------|:---------------------------------------------------|
    | Name | Name of the player. (String) |
    | Team | Team the player is on. (String) |
    | Age | Age of the player. (Integer) |
    | PA | Plate Appearances. (Integer) |
    | AVG | Batting Average. (Float) |
    | OBP-AVG | On-Base Percentage minus Batting Average. (Float) |
    | ISO | Isolated Power. (Float) |
    | SB | Stolen Bases. (Integer) |
    | UZR/150 | Ultimate Zone Rating per 150 games. (Float) |

    File: 2014 wOBA Stats 3.csv

    | Column name | Description |
    |:------------|:-------------------------------------|
    | Name | Name of the player. (String) |
    | Team | Team the player is on. (String) |
    | PA | Plate Appearances. (Integer) |
    | 1Bw | Weighted value of singles. (Float) |
    | 2Bw | Weighted value of doubles. (Float)... |

  8. Compare (Deprecated)

    • noveladata.com
    • data-salemva.opendata.arcgis.com
    Updated Dec 6, 2018
    Cite
    esri_en (2018). Compare (Deprecated) [Dataset]. https://www.noveladata.com/datasets/632dc51300de4ef7b2630d2d303f1440
    Dataset updated
    Dec 6, 2018
    Dataset provided by
    Esri (http://esri.com/)
    Authors
    esri_en
    Description

    Compare is a configurable app template that supports a side-by-side or stacked comparison of two maps or scenes. It can be configured to compare two scenes, two maps, or one of each. The two views can be linked or unlinked depending on if you want to show the same location or not.

    Use Cases
    • Support presentations of different urban development scenarios for one location, or compare the landscape in two locations.
    • Present the results from a variety of different analytic methods.
    • Show the difference between household income in multiple places, or the difference between household income and home values in a single location.

    Configurable Options
    Compare can be configured using the following options:
    • Provide a title and description for the application, as well as the option to describe each view.
    • Select from a set of map options including: search, legend, scalebar, home button, measurement tools, a basemap toggle, and a share dialog.
    • Determine how the views will be compared by selecting from a side-by-side or stacked layout, and if you want to synchronize the pop-ups.

    Supported Devices
    This application is responsively designed to support use in browsers on desktops and tablets.

    Data Requirements
    This application has no data requirements.

    Get Started
    This application can be created in the following ways:
    • Click the Create a Web App button on this page
    • Share a map and choose to Create a Web App
    • On the Content page, click Create - App - From Template
    Click the Download button to access the source code. Do this if you want to host the app on your own server and optionally customize it to add features or change styling.

  9. CT Aerial Imagery Viewer v2

    • data.ct.gov
    • geodata.ct.gov
    application/rdfxml +5
    Updated Jun 27, 2025
    Cite
    (2025). CT Aerial Imagery Viewer v2 [Dataset]. https://data.ct.gov/dataset/CT-Aerial-Imagery-Viewer-v2/khzr-x425
    Available download formats: tsv, xml, application/rssxml, application/rdfxml, csv, json
    Dataset updated
    Jun 27, 2025
    Area covered
    Connecticut
    Description
    This viewer is available through CT ECO, a partnership between CT DEEP and UConn CLEAR. The Aerial Imagery Viewer contains all of Connecticut’s statewide digital aerial imagery, plus some. The collection includes black and white, color, and infrared imagery going as far back as 1934, with varying pixel resolutions (up to 3 inch!), funded by different regional, state, and federal agencies. Refer to the CT Digital Imagery page for descriptions of the datasets available on CT ECO and in the Aerial Imagery Viewer.
    Use
    To use the viewer, zoom in and then use the Layer List (upper right) to turn layers on and off (remember to turn OFF the ones above on the list or they will hide layers below) to compare and explore the area. The swipe tool (lower left) is a fun way to compare two datasets. Be sure at least two items are checked on in the layer list and use the swipe tool to compare. Refer to Viewer Help for more details and tips.
    Tips
    - smaller pixel sizes mean more spatial detail
    - leaf-off imagery has a lot of brown and provides better visibility of features that exist under tree canopies
    - near infrared layers are displayed so that healthy green vegetation is the brightest red
    - near infrared layers provide excellent contrast between vegetated and non-vegetated features
  10. Dataset for training classifiers of comparative sentences

    • live.european-language-grid.eu
    • explore.openaire.eu
    • +1more
    csv
    Updated Apr 19, 2024
    Cite
    (2024). Dataset for training classifiers of comparative sentences [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7607
    Available download formats: csv
    Dataset updated
    Apr 19, 2024
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    As there was no large publicly available cross-domain dataset for comparative argument mining, we created one composed of sentences annotated with BETTER / WORSE markers (the first object is better / worse than the second object) or NONE (the sentence does not contain a comparison of the target objects). BETTER sentences stand for a pro-argument in favor of the first compared object, and WORSE sentences represent a con-argument and favor the second object.

    We aimed to minimize dataset domain-specific biases in order to capture the nature of comparison and not the nature of the particular domains, and thus decided to control the specificity of domains through the selection of comparison targets. We hypothesized, and could confirm in preliminary experiments, that comparison targets usually have a common hypernym (i.e., are instances of the same class), which we utilized for the selection of the compared object pairs. The most specific domain we chose is computer science, with comparison targets like programming languages, database products and technology standards such as Bluetooth or Ethernet. Many computer science concepts can be compared objectively (e.g., on transmission speed or suitability for certain applications). The objects for this domain were manually extracted from "List of"-articles at Wikipedia. In the annotation process, annotators were asked to only label sentences from this domain if they had some basic knowledge in computer science. The second, broader domain is brands. It contains objects of different types (e.g., cars, electronics, and food). As brands are present in everyday life, anyone should be able to label the majority of sentences containing well-known brands such as Coca-Cola or Mercedes. Again, targets for this domain were manually extracted from "List of"-articles at Wikipedia. The third domain is not restricted to any topic: random. For each of 24 randomly selected seed words, 10 similar words were collected based on the distributional similarity API of JoBimText (http://www.jobimtext.org). The seed words were created using randomlists.com: book, car, carpenter, cellphone, Christmas, coffee, cork, Florida, hamster, hiking, Hoover, Metallica, NBC, Netflix, ninja, pencil, salad, soccer, Starbucks, sword, Tolkien, wine, wood, XBox, Yale.

    Especially for brands and computer science, the resulting object lists were large (4493 objects in brands and 1339 in computer science). In a manual inspection, low-frequency and ambiguous objects were removed from all object lists (e.g., RAID (a hardware concept) and Unity (a game engine) are also regularly used nouns). The remaining objects were combined into pairs. For each object type (seed Wikipedia list page or seed word), all possible combinations were created. These pairs were then used to find sentences containing both objects. These approaches to selecting compared object pairs tend to minimize the inclusion of domain-specific data but do not solve the problem fully; we leave extending the dataset with more diverse object pairs, including abstract concepts, for future work.

    For the sentence mining, we used the publicly available index of dependency-parsed sentences from the Common Crawl corpus containing over 14 billion English sentences filtered for duplicates. This index was queried for sentences containing both objects of each pair. For 90% of the pairs, we also added comparative cue words (better, easier, faster, nicer, wiser, cooler, decent, safer, superior, solid, terrific, worse, harder, slower, poorly, uglier, poorer, lousy, nastier, inferior, mediocre) to the query in order to bias the selection towards comparisons, while at the same time admitting comparisons that do not contain any of the anticipated cues. This was necessary as random sampling would have resulted in only a very tiny fraction of comparisons. Note that even sentences containing a cue word do not necessarily express a comparison between the desired targets (dog vs. cat: "He's the best pet that you can get, better than a dog or cat."). It is thus especially crucial to enable a classifier to learn not to rely on the existence of cue words only (very likely in a random sample of sentences with very few comparisons). For our corpus, we kept pairs with at least 100 retrieved sentences.

    From all sentences of those pairs, 2500 for each category were randomly sampled as candidates for a crowdsourced annotation that we conducted on figure-eight.com in several small batches. Each sentence was annotated by at least five trusted workers. We ranked annotations by confidence, which is the figure-eight-internal measure combining annotator trust and voting, and discarded annotations with a confidence below 50%. Of all annotated items, 71% received unanimous votes and for over 85% at least 4 out of 5 workers agreed, rendering the collection procedure aimed at ease of annotation successful.

    The final dataset contains 7199 sentences with 271 distinct object pairs. The majority of sentences (over 72%) are non-comparative despite biasing the selection with cue words; in 70% of the comparative sentences, the favored target is named first.

    You can browse through the data here: https://docs.google.com/spreadsheets/d/1U8i6EU9GUKmHdPnfwXEuBxi0h3aiRCLPRC-3c9ROiOE/edit?usp=sharing

    A full description of the dataset is available in the workshop paper at the ACL 2019 conference. Please cite this paper if you use the data: Franzek, Mirco, Alexander Panchenko, and Chris Biemann. "Categorization of Comparative Sentences for Argument Mining." arXiv preprint arXiv:1809.06152 (2018).

    @inproceedings{franzek2018categorization,
      title={Categorization of Comparative Sentences for Argument Mining},
      author={Panchenko, Alexander and Bondarenko, and Franzek, Mirco and Hagen, Matthias and Biemann, Chris},
      booktitle={Proceedings of the 6th Workshop on Argument Mining at ACL'2019},
      year={2019},
      address={Florence, Italy}
    }
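
    A small stand-in sketch (toy sentences and object pairs, not the Common Crawl index) of the cue-word-biased candidate selection described above:

```python
import re

CUE_WORDS = {"better", "easier", "faster", "nicer", "wiser", "cooler", "decent",
             "safer", "superior", "solid", "terrific", "worse", "harder",
             "slower", "poorly", "uglier", "poorer", "lousy", "nastier",
             "inferior", "mediocre"}

def is_candidate(sentence, obj_a, obj_b, require_cue=True):
    """Keep sentences that mention both objects and, optionally, a cue word."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    has_both = obj_a.lower() in tokens and obj_b.lower() in tokens
    return has_both and (not require_cue or bool(tokens & CUE_WORDS))

sentences = [
    "Python is easier to learn than Java.",
    "Java and Python are both programming languages.",
    "He's the best pet that you can get, better than a dog or cat.",
]
for s in sentences:
    print(is_candidate(s, "Python", "Java"), "|", s)
```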

  11. Dataset from A Two-part, Randomized, Double-blind, Single-dose, Crossover Study to Compare Formulations Produced by Two Methods of Manufacture for Bioequivalence and Dissolution in Healthy Adult Volunteers

    • data.niaid.nih.gov
    Updated Nov 26, 2024
    Cite
    GSK Clinical Trials (2024). Dataset from A Two-part, Randomized, Double-blind, Single-dose, Crossover Study to Compare Formulations Produced by Two Methods of Manufacture for Bioequivalence and Dissolution in Healthy Adult Volunteers [Dataset]. http://doi.org/10.25934/PR00009269
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    GSK plc (http://gsk.com/)
    Authors
    GSK Clinical Trials
    Area covered
    United States
    Variables measured
    Cmax, Half-life, Area Under the Curve, Daprodustat (GSK1278863), Time to Maximum Concentration (Tmax)
    Description

    This study comprises two discrete parts. Part A is a 3-period crossover evaluating relative bioavailability. Part B is a 2-period crossover evaluating bioequivalence. There will be a minimum of a 7-day washout period between treatment periods. Participants will take part in Part A or Part B, but not both. Approximately 200 participants will be included in the study.
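
    The record above does not spell out the statistical analysis plan, so the following is only a generic sketch of the standard average-bioequivalence criterion (90% CI of the test/reference geometric mean ratio within 80-125%) applied to paired log-transformed AUC values with made-up numbers; a full crossover analysis would also model period and sequence effects.

```python
import numpy as np
from scipy import stats

# made-up AUC values for the same subjects under the two formulations
test_auc = np.array([101, 95, 110, 88, 97, 105, 92, 99], dtype=float)
ref_auc = np.array([100, 92, 115, 90, 95, 100, 96, 103], dtype=float)

log_ratio = np.log(test_auc) - np.log(ref_auc)      # within-subject log ratios
mean, sem = log_ratio.mean(), stats.sem(log_ratio)
lo, hi = stats.t.interval(0.90, len(log_ratio) - 1, loc=mean, scale=sem)

gmr_lo, gmr_hi = np.exp(lo), np.exp(hi)
verdict = "within 80-125%" if gmr_lo >= 0.80 and gmr_hi <= 1.25 else "not shown"
print(f"GMR 90% CI: ({gmr_lo:.3f}, {gmr_hi:.3f}) -> {verdict}")
```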

  12. Replication Data for: Exploring Disagreement in Indicators of State Repression

    • dataverse.harvard.edu
    Updated May 30, 2018
    Cite
    Charles Crabtree (2018). Replication Data for: Exploring Disagreement in Indicators of State Repression [Dataset]. http://doi.org/10.7910/DVN/V5LB9K
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 30, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Charles Crabtree
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Until recently, researchers who wanted to examine the determinants of state respect for most specific negative rights needed to rely on data from the CIRI or the Political Terror Scale (PTS). The new V-DEM dataset offers scholars a potential alternative to the individual human rights variables from CIRI. We analyze a set of key Cingranelli-Richards (CIRI) Human Rights Data Project and Varieties of Democracy (V-DEM) negative rights indicators, finding unusual and unexpectedly large patterns of disagreement between the two sets. First, we discuss the new V-DEM dataset by comparing it to the disaggregated CIRI indicators, discussing the history of each project, and describing its empirical domain. Second, we identify a set of disaggregated human rights measures that are similar across the two datasets and discuss each project's measurement approach. Third, we examine how these measures compare to each other empirically, showing that they diverge considerably across both time and space. These findings point to several important directions for future work, such as how conceptual approaches and measurement strategies affect rights scores. For the time being, our findings suggest that researchers should think carefully about using the measures as substitutes.
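
    A sketch only (file and variable names are assumptions) of the outline of such an empirical comparison: merge CIRI-style and V-DEM-style country-year tables and examine how strongly a matched pair of indicators agrees, overall and by year.

```python
import pandas as pd

ciri = pd.read_csv("ciri.csv")      # assumed columns: country, year, ciri_tort
vdem = pd.read_csv("vdem.csv")      # assumed columns: country, year, v2cltort

merged = ciri.merge(vdem, on=["country", "year"], how="inner")

overall = merged["ciri_tort"].corr(merged["v2cltort"], method="spearman")
by_year = merged.groupby("year").apply(
    lambda g: g["ciri_tort"].corr(g["v2cltort"], method="spearman"))

print("overall Spearman rho:", round(overall, 2))
print(by_year.rename("yearly Spearman rho").head())
```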

  13. Portuguese Comparative Sentences: A Collection of Labeled Sentences on Twitter and Buscapé

    • live.european-language-grid.eu
    • data.niaid.nih.gov
    • +1more
    json
    Updated Dec 10, 2023
    Cite
    (2023). Portuguese Comparative Sentences: A Collection of Labeled Sentences on Twitter and Buscapé [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7664
    Available download formats: json
    Dataset updated
    Dec 10, 2023
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    More and more customers turn to online reviews of products and comments on the Web to make decisions about buying one product over another. In this context, sentiment analysis techniques constitute the traditional way to summarize a user's opinions that criticize or highlight the positive aspects of a product. Sentiment analysis of reviews usually relies on extracting positive and negative aspects of products, neglecting comparative opinions. Such opinions do not directly express a positive or negative view but contrast aspects of products from different competitors. Here, we present the first effort to study comparative opinions in Portuguese, creating two new Portuguese datasets with comparative sentences marked by three humans.

    This repository consists of three important files: (1) a lexicon that contains words frequently used to make a comparison in Portuguese; (2) a Twitter dataset with labeled comparative sentences; and (3) a Buscapé dataset with labeled comparative sentences.

    The lexicon is a set of 176 words frequently used to express a comparative opinion in the Portuguese language. The lexicon is aggregated in a filter and used to build two sets of data with comparative sentences from two important contexts: (1) online social networks; and (2) product reviews.

    For Twitter, we collected all Portuguese tweets published in Brazil on 2018/01/10 and filtered all tweets that contained at least one keyword present in the lexicon, obtaining 130,459 tweets. Our work is based on the sentence level; thus, all sentences were extracted and a sample of 2,053 sentences was created, which was labeled by three human annotators, reaching an 83.2% agreement with the Fleiss' Kappa coefficient. For Buscapé, a Brazilian website (https://www.buscape.com.br/) used to compare product prices on the web, the same methodology was applied, creating a set of 2,754 labeled sentences obtained from comments made in 2013. This dataset was labeled by three humans, reaching an agreement of 83.46% with the Fleiss' Kappa coefficient.

    The Twitter dataset has 2,053 labeled sentences, of which 918 are comparative. The Buscapé dataset has 2,754 labeled sentences, of which 1,282 are comparative.

    The datasets contain the following labeled properties:
    • text: the sentence extracted from the review comment.
    • entity_s1: the first entity compared in the sentence.
    • entity_s2: the second entity compared in the sentence.
    • keyword: the comparative keyword used in the sentence to express comparison.
    • preferred_entity: the preferred entity.
    • id_start: the starting position of the keyword in the phrase.
    • id_end: the keyword's final position in the sentence.
    • type: the sentence label, which specifies whether the phrase is a comparison.

    Additional information:
    1 - The sentences were separated using a sentence tokenizer.
    2 - If the compared entity is not specified, the field receives the value "_".
    3 - The type property can contain five different values:
    • 0: Non-comparative (Não Comparativa).
    • 1: Non-Equal-Gradable (Gradativa com Predileção).
    • 2: Equative (Equitativa).
    • 3: Superlative (Superlativa).
    • 4: Non-Equal-Gradable (Não Gradativa).

    If you use this data, please cite our paper as follows: "Daniel Kansaon, Michele A. Brandão, Julio C. S. Reis, Matheus Barbosa, Breno Matos, and Fabrício Benevenuto. 2020. Mining Portuguese Comparative Sentences in Online Reviews. In Brazilian Symposium on Multimedia and the Web (WebMedia '20), November 30-December 4, 2020, São Luís, Brazil. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3428658.3431081"
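
    A small sketch of reading one of the labeled files; the file name and the list-of-objects JSON layout are assumptions, while the field names follow the property list above:

```python
import json
from collections import Counter

with open("twitter_comparative_sentences.json", encoding="utf-8") as fh:  # assumed name
    sentences = json.load(fh)                 # assumed: a list of labeled objects

label_counts = Counter(s["type"] for s in sentences)
comparative = [s for s in sentences if s["type"] != 0]

print("label counts:", dict(label_counts))
print("share comparative:", round(len(comparative) / len(sentences), 3))
print("first keywords and entities:",
      [(s["keyword"], s["entity_s1"], s["entity_s2"]) for s in comparative[:3]])
```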

  14. CT Land Cover Viewer

    • data.ct.gov
    application/rdfxml +5
    Updated Jun 8, 2023
    Cite
    UConn (2023). CT Land Cover Viewer [Dataset]. https://data.ct.gov/w/b7c8-vb26/wqz6-rhce?cur=Uso9-U-2iT7
    Available download formats: csv, xml, application/rdfxml, json, application/rssxml, tsv
    Dataset updated
    Jun 8, 2023
    Dataset authored and provided by
    UConn
    Area covered
    Connecticut
    Description
    This viewer is available on CT ECO from UConn CLEAR. The viewer contains Connecticut's statewide land cover in an easy-to-explore tool.
    The Connecticut Land Cover Viewer contains seven dates of land cover between 1985 and 2015, including change-to and change-from layers. These layers are in the bottom half of the layer list and start with a year, such as 1985 Land Cover.

    For each major land cover class (forest, ag field, developed, turf & grass) there are summary stats by town shown as layers in the viewer with color ramps. The darker the color, the higher the presence. Summary stats include change by town as well, where more change is shown with darker colors. These layers are in the top half of the viewer.

    The viewer also contains forest fragmentation layers from 1985 and 2015.

    More about land cover on the CLEAR website, including Number and Charts data visualizations.

    Use
    To use the viewer, use the Layer List (upper right) to turn on and off layers (remember to turn OFF the ones above on the list or they will hide layers below) to compare and explore the area. The swipe tool (lower left) is a fun way to compare two datasets. Be sure at least two items are checked on in the layer list and use the swipe tool to compare. Refer to Viewer Help for more details and tips.

    Tips
    - compare dates of land cover by turning them on and off in the layer list, or by using the swipe tool
    - for any year of land cover (the layers are called 1985 Land Cover, 1990 Land Cover, etc.), click on the little arrow to the left in the table of contents to see layers that show just main land cover classes. This is a good way to explore just forest land cover or just developed land cover - you get the point!
    - land cover from satellite imagery at this resolution can look "fuzzy" compared to high resolution datasets. It is coarse but is an excellent, and perhaps the only way, to look at change over 30 years.
    - visit the Land Cover FAQs for more information.

  15. Data from: How well does a part represent the whole? A comparison of cranidial shape evolution with exoskeletal character evolution in the trilobite family Pterocephaliidae

    • zenodo.org
    • datadryad.org
    pdf
    Updated Jul 19, 2024
    Cite
    Melanie Jane Hopkins (2024). Data from: How well does a part represent the whole? A comparison of cranidial shape evolution with exoskeletal character evolution in the trilobite family Pterocephaliidae [Dataset]. http://doi.org/10.5061/dryad.3rr40
    Available download formats: pdf
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Melanie Jane Hopkins
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    For taphonomic and practical reasons, our understanding of morphological evolution within and among species is based primarily on measurements taken from one or a few morphological traits. However, patterns can be highly dependent on trait choice, making it difficult to draw conclusions about evolution of species or clades as a whole. In this paper, I test whether patterns of evolutionary change in the shape of a part are coincident with patterns of evolutionary change based on a more comprehensive description of the organism. The former is based on geometric morphometrics of the trilobite cranidium and the latter on discrete character data describing the exoskeleton, collected from species belonging to the Cambrian family Pterocephaliidae. Using these two datasets, I compare the amount of change occurring along phylogenetic branches, as well as changes in morphospace occupation and changes in different measures of disparity. Unlike previous studies, I apply as similar a data treatment as possible to each data set in order to facilitate the comparison and interpretation of discrepancies. Results suggest that cranidial shape is a robust proxy for species-level disparity and rates of evolution in this family of trilobites. This indicates that studies that have relied on data collected from the cranidium may be representative of the patterns that would be detected if more comprehensive description of the specimens had been available.
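
    For illustration only (random numbers, not the Pterocephaliidae data), the sketch below applies one common disparity measure, the sum of variances around the group mean, to a shape-like matrix and a discrete-character matrix, the two kinds of data being contrasted above:

```python
import numpy as np

rng = np.random.default_rng(2)

def disparity(matrix):
    """Sum of per-variable variances around the sample mean."""
    return float(np.sum(np.var(matrix, axis=0, ddof=1)))

shape_coords = rng.normal(size=(30, 40))                          # e.g. Procrustes shape variables
discrete_chars = rng.integers(0, 2, size=(30, 25)).astype(float)  # e.g. 0/1 character states

print("shape disparity:    ", round(disparity(shape_coords), 3))
print("character disparity:", round(disparity(discrete_chars), 3))
```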

  16. Data from: Data Used to Compare Photo-Interpreted and IfSAR-Derived Maps of Polar Bear Denning Habitat for the 1002 Area of the Arctic National Wildlife Refuge, Alaska, 2006-2016

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Data Used to Compare Photo-Interpreted and IfSAR-Derived Maps of Polar Bear Denning Habitat for the 1002 Area of the Arctic National Wildlife Refuge, Alaska, 2006-2016 [Dataset]. https://catalog.data.gov/dataset/data-used-to-compare-photo-interpreted-and-ifsar-derived-maps-of-polar-bear-denning-h-2006
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Arctic, Arctic National Wildlife Refuge
    Description

    These are geospatial data that characterize the distribution of polar bear denning habitat in the 1002 Area of the Arctic National Wildlife Refuge, Alaska. They were generated to compare the efficacy of two different techniques for identifying areas with suitable den habitat: (1) from a previously published study (Durner et al., 2006) that used manual interpretation of aerial photos and (2) from computer interrogation of interferometric synthetic aperture radar (IfSAR) digital terrain models. Two datasets are included in this data package, they are both vector geospatial datasets of putative denning habitat (one dataset each for the manual photo interpretation data and the computer interpreted IfSAR data). Additionally included are: vector data used for sampling and metadata describing the IfSAR-derived digital terrain model (DTM) tiles used to generate the shapefiles. The IfSAR DTM are available for purchase through Intermap Technologies, Inc. All vector data are provided in both ESRI shapefile and Keyhole Markup Language (KML) formats.
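
    A sketch (file names are assumptions) of one way to compare the two habitat maps with geopandas: total mapped area for each method and the area where the photo-interpreted and IfSAR-derived polygons agree.

```python
import geopandas as gpd

photo = gpd.read_file("den_habitat_photo_interpreted.shp")   # assumed file name
ifsar = gpd.read_file("den_habitat_ifsar.shp")               # assumed file name
ifsar = ifsar.to_crs(photo.crs)                # put both layers in the same CRS

overlap = gpd.overlay(photo, ifsar, how="intersection")

to_km2 = 1e-6                                  # assumes a metric projected CRS
print("photo-interpreted area (km2):", photo.geometry.area.sum() * to_km2)
print("IfSAR-derived area (km2):    ", ifsar.geometry.area.sum() * to_km2)
print("agreement area (km2):        ", overlap.geometry.area.sum() * to_km2)
```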

  17. Replication Data for: What the MIPVU protocol doesn’t tell you (even though it really does)

    • dataverse.no
    • dataverse.azure.uit.no
    txt, type/x-r-syntax
    Updated Sep 28, 2023
    Cite
    Susan Nacey; Tina Krennmayr; Aletta G. Dorst; W. Gudrun Reijnierse (2023). Replication Data for: What the MIPVU protocol doesn’t tell you (even though it really does) [Dataset]. http://doi.org/10.18710/F04UW5
    Available download formats: txt (160256), txt (4687), type/x-r-syntax (8464), txt (160856), type/x-r-syntax (8474)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    DataverseNO
    Authors
    Susan Nacey; Tina Krennmayr; Aletta G. Dorst; W. Gudrun Reijnierse
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The two datasets provided here were used to provide inter-rater reliability statistics for the application of a metaphor identification procedure to texts written in English. Three experienced metaphor researchers applied the Metaphor Identification Procedure Vrije Universiteit (MIPVU) to approximately 1500 words of text from two English-language newspaper articles. The dataset Eng1 contains each researcher’s independent analysis of the lexical demarcation and metaphorical status of each word in the sample. The dataset Eng2 contains a second analysis of the same texts by the same three researchers, carried out after a comparison of our responses in Eng1 and a troubleshooting session where we discussed our differences. The accompanying R code was used to produce the three-way and pairwise inter-rater reliability data reported in Section 3.2 of the chapter: How do I determine what comprises a lexical unit? The headings in both datasets are identical, although the order of the columns differs in the two files. In both datasets, each line corresponds to one orthographic word from the newspaper texts.

    Chapter Abstract: The first part of this chapter discusses various ‘nitty-gritty’ practical aspects about the original MIPVU intended for the English language. Our focus in these first three sections is on common pitfalls for novice MIPVU users that we have encountered when teaching the procedure. First, we discuss how to determine what comprises a lexical unit (section 3.2). We then move on to how to determine a more basic meaning of a lexical unit (section 3.3), and subsequently discuss how to compare and contrast contextual and basic senses (section 3.4). We illustrate our points with actual examples taken from some of our teaching sessions, as well as with our own study into inter-rater reliability, conducted for the purposes of this new volume about MIPVU in multiple languages. Section 3.5 shifts to another topic that new MIPVU users ask about – namely, which practical tools they can use to annotate their data in an efficient way. Here we discuss some tools that we find useful, illustrating how we utilized them in our inter-rater reliability study. We close this part with section 3.6, a brief discussion about reliability testing.

    The second part of this chapter adopts more of a bird’s-eye view. Here we leave behind the more technical questions of how to operationalize MIPVU and its steps, and instead respond more directly to the question posed above: Do we really have to identify every metaphor in every bit of our data? We discuss possible approaches for research projects involving metaphor identification, by exploring a number of important questions that all researchers need to ask themselves (preferably before they embark on a major piece of research). Section 3.7 weighs some of the differences between quantitative and qualitative approaches in metaphor research projects, while section 3.8 talks about considerations when it comes to choosing which texts to investigate, as well as possible research areas where metaphor identification can play a useful role. We close this chapter in section 3.9 with a recap of our ‘take-away’ points – that is, a summary of the highlights from our entire discussion.
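
    Not the authors' R code, but a small Python stand-in showing the same kind of three-way (Fleiss) and pairwise (Cohen) agreement statistics on toy metaphor annotations:

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# one row per word, one column per rater; 1 = metaphor-related word, 0 = not
ratings = np.array([[1, 1, 1],
                    [0, 0, 1],
                    [0, 0, 0],
                    [1, 1, 1],
                    [1, 0, 1],
                    [0, 0, 0]])

table, _ = aggregate_raters(ratings)           # words x categories count table
print("Fleiss kappa (three-way):", round(fleiss_kappa(table), 3))

for a, b in combinations(range(ratings.shape[1]), 2):
    kappa = cohen_kappa_score(ratings[:, a], ratings[:, b])
    print(f"Cohen kappa, raters {a + 1} and {b + 1}:", round(kappa, 3))
```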

  18. Replication Data for: "The Governance of Public Budgeting: a Proposal for Comparative Analyses - the Cases of São Paulo and London"

    • dataverse.harvard.edu
    • dataone.org
    Updated Jun 7, 2022
    Cite
    Ursula Dias Peres (2022). Replication Data for: "The Governance of Public Budgeting: a Proposal for Comparative Analyses - the Cases of São Paulo and London" [Dataset]. http://doi.org/10.7910/DVN/2CDWA9
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 7, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Ursula Dias Peres
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/2CDWA9

    Area covered
    London, São Paulo
    Description

    This paper aims at understanding the governance of public budgeting in large metropolises with the use of comparative analysis. The analysis is focused on budgetary governance in London and São Paulo and uses qualitative and quantitative data from 2008 to 2019 to understand whether analytical categories such as incrementalism of expenditures, complexity of budgetary rules, bureaucratic hierarchy, bargaining, and muddling through are useful to compare two metropolises, especially to determine the discretionary power of mayors in making budget allocation decisions. The analytical categories are derived from the studies of theorists of economics and political sociology, notably Wildavsky (1975, 1969), Wildavsky and Caiden (2004), Schick (2009, 1976), Caiden (2010), Lascoumes and Le Galès (2005), Baumgartner and Jones (2005), and Fuchs (2012, 2010). The main argument of the paper is that, despite the administrative and political differences between London and São Paulo, similar dimensions can explain decisions about budget allocation and the political discretionary power of mayors. The study shows that mayors have little discretionary power, particularly in contexts of fiscal austerity; it also highlights the importance of property tax as a means to protect such power.

  19. Data from: Quantifying and comparing phylogenetic evolutionary rates for shape and other high-dimensional phenotypic data

    • search.dataone.org
    • zenodo.org
    • +1more
    Updated Apr 16, 2025
    Cite
    Dean C. Adams (2025). Quantifying and comparing phylogenetic evolutionary rates for shape and other high-dimensional phenotypic data [Dataset]. http://doi.org/10.5061/dryad.41hc4
    Dataset updated
    Apr 16, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Dean C. Adams
    Time period covered
    Jan 1, 2013
    Description

    Many questions in evolutionary biology require the quantification and comparison of rates of phenotypic evolution. Recently, phylogenetic comparative methods have been developed for comparing evolutionary rates on a phylogeny for single, univariate traits (σ2), and evolutionary rate matrices (R) for sets of traits treated simultaneously. However, high-dimensional traits like shape remain under-examined with this framework, because methods suited for such data have not been fully developed. In this article, I describe a method to quantify phylogenetic evolutionary rates for high-dimensional multivariate data (σ2mult), found from the equivalency between statistical methods based on covariance matrices and those based on distance matrices (R-mode and Q-mode methods). I then use simulations to evaluate the statistical performance of hypothesis testing procedures that compare σ2mult for two or more groups of species on a phylogeny. Under both isotropic and non-isotropic conditions, and for d...
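
    A hedged sketch, not the article's implementation (the geomorph R package provides that), under the assumption that the net rate is taken as the per-trait average Brownian-motion rate computed from generalized-least-squares residuals around the estimated root state:

```python
import numpy as np

def net_evolutionary_rate(Y, C):
    """Average the per-trait Brownian rates (y - a)' C^-1 (y - a) / N over
    the p traits, with the root state a estimated by GLS.
    Y: N x p tip data; C: N x N phylogenetic covariance matrix."""
    N, p = Y.shape
    Cinv = np.linalg.inv(C)
    ones = np.ones((N, 1))
    root = np.linalg.solve(ones.T @ Cinv @ ones, ones.T @ Cinv @ Y)   # 1 x p
    resid = Y - root
    per_trait = np.einsum("ij,ik,kj->j", resid, Cinv, resid) / N
    return per_trait.mean()

# toy inputs: 4 species on a star-like phylogeny, 3 traits
C = np.eye(4)
Y = np.array([[0.1, 0.2, -0.1],
              [0.4, -0.3, 0.0],
              [-0.2, 0.1, 0.3],
              [0.0, 0.0, -0.2]])
print("net multivariate rate:", round(net_evolutionary_rate(Y, C), 4))
```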

  20. Esports Performance Rankings and Results

    • kaggle.com
    Updated Dec 12, 2022
    Cite
    The Devastator (2022). Esports Performance Rankings and Results [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-collegiate-esports-performance-with-bu/suggestions
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 12, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Description

    Esports Performance Rankings and Results

    Performance Rankings and Results from Multiple Esports Platforms

    By [source]

    About this dataset

    This dataset provides a detailed look into the world of competitive video gaming in universities. It covers a wide range of topics, from performance rankings and results across multiple esports platforms to the individual team and university rankings within each tournament. With an incredible wealth of data, fans can discover statistics on their favorite teams or explore the challenges placed upon university gamers as they battle it out to be the best. Dive into the information provided and get an inside view into the world of collegiate esports tournaments as you assess all things from Match ID, Team 1, University affiliations, Points earned or lost in each match and special Seeds or UniSeeds for exceptional teams. Of course don't forget about exploring all the great Team Names along with their corresponding websites for further details on stats across tournaments!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    Download Files
    First, make sure you have downloaded the CS_week1, CS_week2, CS_week3 and seeds datasets on Kaggle. You will also need to download the currentRankings file for each week of competition. All files should be saved under their originally assigned names so that your analysis tools can read them properly (e.g., CS_week1.csv).

    Understand File Structure
    Once all the data has been collected and organized into separate files, become familiar with the type of information included in each file. The main folder contains the weekly match files (week1-3) and the seedings. The week1-3 files list teams matched against one another by university, the point score from the match results, and the team name and website URL associated with each university entry; the seedings provide a ranking system for university entries, accompanied by team names, website URLs, etc. An additional currentRankings file contains scores for individual players/teams for a given week of competition (e.g., the first week).

    Analyzing Data
    Now that everything is set up, it's time to explore! You can dig into trends among universities or individual players for specific match performances or overall standings across the weeks of competition, and you can build graphs from the data compiled in the BUECTracker dataset. For example, to compare two universities, say Harvard University versus Cornell University, since the beginning of the event (as sketched below), you would extract their respective points and dates (found under the results), their regions (North America vs. Europe), and general stats such as maps played.
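
    A minimal pandas sketch of that head-to-head comparison; the Points column and the two university names are assumptions for illustration, not guaranteed to match the files exactly:

```python
import pandas as pd

week1 = pd.read_csv("CS_week1.csv")

# total points per university in week 1 (assumes a Points column exists)
totals = week1.groupby("University")["Points"].sum().sort_values(ascending=False)
print(totals.head(10))

# head-to-head summary for two example schools
schools = ["Harvard University", "Cornell University"]
pair = week1[week1["University"].isin(schools)]
print(pair.groupby("University")["Points"].agg(["count", "sum", "mean"]))
```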

    Research Ideas

    • Analyze the performance of teams and identify areas for improvement for better performance in future competitions.
    • Assess which esports platforms are the most popular among gamers.
    • Gain a better understanding of player rankings across different regions, based on the ranking system, to create targeted strategies that could boost individual players' scoring potential or a team's overall success in competitive gaming events.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: CS_week1.csv

    | Column name | Description |
    |:------------|:------------------------------------------------|
    | Match ID | Unique identifier for each match. (Integer) |
    | Team 1 | Name of the first team in the match. (String) |
    | University | University associated with the team. (String) |

    File: CS_week1_currentRankings.csv

    | Column name | Description |
    |:------------|:------------|
    ...
