Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We introduce classifiers based on directional quantiles. We derive theoretical results for selecting optimal quantile levels given a direction, and, conversely, an optimal direction given a quantile level. We also show that the probability of correct classification of the proposed classifier converges to one if population distributions differ by at most a location shift and if the number of directions is allowed to diverge at the same rate as the problem’s dimension. We illustrate the satisfactory performance of our proposed classifiers in both small- and high-dimensional settings via a simulation study and a real data example. The code implementing the proposed methods is publicly available in the R package Qtools. Supplementary materials for this article are available online.
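To make the idea of classifying by a directional quantile concrete, here is a minimal Python sketch. It is not the Qtools implementation and does not select optimal levels or directions: the direction and the quantile level `tau` are supplied by hand, and the rule (assign to the class whose projected quantile gives the smallest check loss) is an assumption chosen for illustration. All function names are hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile (check) loss: tau*u for u >= 0, (tau-1)*u for u < 0."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def fit_directional_quantiles(X, y, direction, tau):
    """Per-class tau-quantile of the training data projected onto a unit direction."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    proj = X @ d
    return d, {label: np.quantile(proj[y == label], tau) for label in np.unique(y)}

def predict(X, d, class_quantiles, tau):
    """Assign each row to the class whose projected quantile gives the smallest check loss."""
    proj = X @ d
    labels = list(class_quantiles)
    losses = np.column_stack(
        [check_loss(proj - class_quantiles[k], tau) for k in labels]
    )
    return np.array(labels)[losses.argmin(axis=1)]

# Hypothetical data: two Gaussian populations that differ only by a location shift.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 1.0, (200, 2))])
y = np.repeat([0, 1], 200)

d, q = fit_directional_quantiles(X, y, direction=[1.0, 1.0], tau=0.5)
accuracy = (predict(X, d, q, tau=0.5) == y).mean()
```

With `tau=0.5` the rule reduces to choosing the class with the nearest projected median; other levels weight the two sides of the quantile asymmetrically.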
The map shows population density in Tioga County NY using a quantile classification with 5 data breaks each rounded to the nearest 10 people. The population data is census block level data from the 2010 U.S. Census.
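A quantile classification with rounded breaks can be sketched as follows. This is an illustration only, not the procedure used to produce the map: the density values are made up, and the function names and the choice of NumPy's default linear-interpolation quantile are assumptions.

```python
import numpy as np

def quantile_breaks(values, n_classes=5, round_to=10):
    """Upper class limits of a quantile classification, rounded to a multiple of round_to."""
    values = np.asarray(values, dtype=float)
    # Cut points at 1/n, 2/n, ..., 1 of the empirical distribution.
    probs = np.linspace(0, 1, n_classes + 1)[1:]
    breaks = np.quantile(values, probs)
    return np.round(breaks / round_to) * round_to

def classify(values, breaks):
    """0-based class index: the first class whose upper limit is >= the value."""
    idx = np.searchsorted(breaks, values, side="left")
    # Rounding can push the top break below the maximum; keep such values in the top class.
    return np.minimum(idx, len(breaks) - 1)

density = np.arange(1, 101)          # hypothetical densities, 1..100
breaks = quantile_breaks(density)    # five rounded upper limits
classes = classify(density, breaks)  # class index per value
```

Because each class is built from equal shares of the sorted data, every class holds roughly the same number of census blocks, which is what distinguishes a quantile scheme from equal-interval breaks.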
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Quantiles of sensitivity, specificity and log posterior for training and validation datasets over all accepted trees, for Bayesian classification trees.
U.S. Census population data for Kansas counties from 1890 through 2010. The choropleth map shows 2010 population based on a quantile classification. Click on any county to see additional information about historic maximums, population loss, and trend in population since 1890.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data presented here were used to produce the following paper:
Archibald, Twine, Mthabini, Stevens (2021) Browsing is a strong filter for savanna tree seedlings in their first growing season. J. Ecology.
The project under which these data were collected is: Mechanisms Controlling Species Limits in a Changing World. NRF/SASSCAL Grant number 118588
For information on the data or analysis please contact Sally Archibald: sally.archibald@wits.ac.za
Description of file(s):
File 1: cleanedData_forAnalysis.csv (required to run the R code: "finalAnalysis_PostClipResponses_Feb2021_requires_cleanData_forAnalysis_.R")
The data represent monthly survival and growth data for ~740 seedlings from 10 species under various levels of clipping.
The data consist of one .csv file with the following column names:
treatment Clipping treatment (1 - 5 months clip plus control unclipped)
plot_rep One of three randomised plots per treatment
matrix_no Where in the plot the individual was placed
species_code First three letters of the genus name, and first three letters of the species name uniquely identifies the species
species Full species name
sample_period Classification of sampling period into time since clip.
status Alive or Dead
standing.height Vertical height above ground (in mm)
height.mm Length of the longest branch (in mm)
total.branch.length Total length of all the branches (in mm)
stemdiam.mm Basal stem diameter (in mm)
maxSpineLength.mm Length of the longest spine
postclipStemNo Number of resprouting stems (only recorded AFTER clipping)
date.clipped date.clipped
date.measured date.measured
date.germinated date.germinated
Age.of.plant Date measured - Date germinated
newtreat Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)
File 2: Herbivory_SurvivalEndofSeason_march2017.csv (required to run the R code: "FinalAnalysisResultsSurvival_requires_Herbivory_SurvivalEndofSeason_march2017.R")
The data consist of one .csv file with the following column names:
treatment Clipping treatment (1 - 5 months clip plus control unclipped)
plot_rep One of three randomised plots per treatment
matrix_no Where in the plot the individual was placed
species_code First three letters of the genus name, and first three letters of the species name uniquely identifies the species
species Full species name
sample_period Classification of sampling period into time since clip.
status Alive or Dead
standing.height Vertical height above ground (in mm)
height.mm Length of the longest branch (in mm)
total.branch.length Total length of all the branches (in mm)
stemdiam.mm Basal stem diameter (in mm)
maxSpineLength.mm Length of the longest spine
postclipStemNo Number of resprouting stems (only recorded AFTER clipping)
date.clipped date.clipped
date.measured date.measured
date.germinated date.germinated
Age.of.plant Date measured - Date germinated
newtreat Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)
genus Genus
MAR Mean Annual Rainfall for that Species distribution (mm)
rainclass High/medium/low
File 3: allModelParameters_byAge.csv (required to run the R code: "FinalModelSeedlingSurvival_June2021_.R")
Consists of a .csv file with the following column headings
Age.of.plant Age in days
species_code Species
pred_SD_mm Predicted stem diameter in mm
pred_SD_up top 75th quantile of stem diameter in mm
pred_SD_low bottom 25th quantile of stem diameter in mm
treatdate date when clipped
pred_surv Predicted survival probability
pred_surv_low Predicted 25th quantile survival probability
pred_surv_high Predicted 75th quantile survival probability
species_code species code
Bite.probability Daily probability of being eaten
max_bite_diam_duiker_mm Maximum bite diameter of a duiker for this species
duiker_sd standard deviation of bite diameter for a duiker for this species
max_bite_diameter_kudu_mm Maximum bite diameter of a kudu for this species
kudu_sd standard deviation of bite diameter for a kudu for this species
mean_bite_diam_duiker_mm mean etc
duiker_mean_sd standard deviation etc
mean_bite_diameter_kudu_mm mean etc
kudu_mean_sd standard deviation etc
genus genus
rainclass low/med/high
File 4: EatProbParameters_June2020.csv (required to run the R code: "FinalModelSeedlingSurvival_June2021_.R")
Consists of a .csv file with the following column headings
shtspec species name
species_code species code
genus genus
rainclass low/medium/high
seed mass mass of seed (g per 1000seeds)
Surv_intercept coefficient of the model predicting survival from age of clip for this species
Surv_slope coefficient of the model predicting survival from age of clip for this species
GR_intercept coefficient of the model predicting stem diameter from seedling age for this species
GR_slope coefficient of the model predicting stem diameter from seedling age for this species
species_code species code
max_bite_diam_duiker_mm Maximum bite diameter of a duiker for this species
duiker_sd standard deviation of bite diameter for a duiker for this species
max_bite_diameter_kudu_mm Maximum bite diameter of a kudu for this species
kudu_sd standard deviation of bite diameter for a kudu for this species
mean_bite_diam_duiker_mm mean etc
duiker_mean_sd standard deviation etc
mean_bite_diameter_kudu_mm mean etc
kudu_mean_sd standard deviation etc
AgeAtEscape_duiker[t] age of plant when its stem diameter is larger than a mean duiker bite
AgeAtEscape_duiker_min[t] age of plant when its stem diameter is larger than a min duiker bite
AgeAtEscape_duiker_max[t] age of plant when its stem diameter is larger than a max duiker bite
AgeAtEscape_kudu[t] age of plant when its stem diameter is larger than a mean kudu bite
AgeAtEscape_kudu_min[t] age of plant when its stem diameter is larger than a min kudu bite
AgeAtEscape_kudu_max[t] age of plant when its stem diameter is larger than a max kudu bite
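How an age-at-escape value relates to the growth coefficients can be sketched as below. The functional form of the growth model is not stated in this description, so the sketch assumes a straight line, stem diameter = GR_intercept + GR_slope x age, purely for illustration. The function name and the numbers are hypothetical and are not taken from the data files.

```python
def age_at_escape(bite_diameter_mm, gr_intercept, gr_slope):
    """Age (days) at which predicted stem diameter first exceeds a bite diameter.

    Assumes a linear growth model: diameter_mm = gr_intercept + gr_slope * age.
    This linear form is an assumption; the dataset's R code defines the actual model.
    """
    if gr_slope <= 0:
        raise ValueError("a non-growing stem never escapes under a linear model")
    return max(0.0, (bite_diameter_mm - gr_intercept) / gr_slope)

# Hypothetical coefficients: 0.5 mm at germination, growing 0.25 mm per day.
escape_mean_bite = age_at_escape(3.0, gr_intercept=0.5, gr_slope=0.25)
escape_max_bite = age_at_escape(4.0, gr_intercept=0.5, gr_slope=0.25)
```

Under this assumption a larger bite diameter always gives a later escape age, which matches the ordering implied by the min, mean and max columns.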
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper formulates a support vector machine with quantile hyper-spheres (QHSVM) for pattern classification. The idea of QHSVM is to build two quantile hyper-spheres with the same center for positive or negative training samples. Every quantile hyper-sphere is constructed by using pinball loss instead of hinge loss, which makes the new classification model insensitive to noise, especially feature noise around the decision boundary. Moreover, the robustness and generalization of QHSVM are strengthened through maximizing the margin between two quantile hyper-spheres, maximizing the inner-class clustering of samples and optimizing the independent quadratic programming for a target class. In addition, this paper proposes a novel local center-based density estimation method. Based on it, ρ-QHSVM with surrounding and clustering samples is given. Under the premise of high accuracy, the execution speed of ρ-QHSVM can be adjusted. The experimental results on artificial, benchmark and strip steel surface defect datasets show that the QHSVM model has distinct advantages in accuracy and that the ρ-QHSVM model is suited to large-scale datasets.
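The difference between hinge loss and pinball loss can be illustrated with a short Python sketch. It uses one common form of the pinball loss for margin-based classifiers, written in terms of the margin y·f(x). The exact formulation that QHSVM applies to hyper-spheres is not given in this abstract and may differ, so treat the functions below as an assumption.

```python
import numpy as np

def hinge_loss(margin):
    """Hinge loss: zero once the margin y*f(x) reaches 1."""
    margin = np.asarray(margin, dtype=float)
    return np.maximum(0.0, 1.0 - margin)

def pinball_loss(margin, tau=0.5):
    """Pinball loss on u = 1 - margin: u if u >= 0, else -tau*u.

    Unlike hinge loss, samples beyond the margin still incur a (smaller) penalty.
    """
    u = 1.0 - np.asarray(margin, dtype=float)
    return np.where(u >= 0, u, -tau * u)

margins = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
hinge = hinge_loss(margins)
pinball = pinball_loss(margins, tau=0.5)
```

Because correctly classified points far from the boundary keep contributing, the fitted boundary depends on a quantile of the sample distances rather than only on the few points nearest the boundary, which is the usual argument for robustness to feature noise.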
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT: Plant growth analyses are important because they generate information on the demand and necessary care for each development stage of a plant. Nonlinear regression models are appropriate for describing growth curves, since they include parameters with practical biological interpretation. However, these models present information in terms of the conditional mean, and their fit can be compromised by outliers or asymmetry in the distribution of the data. Quantile regression can solve these problems, and it allows the estimation of different quantiles, generating more complete and robust results. The objective of this research was to fit a nonlinear quantile regression model for the study of dry matter accumulation in garlic plants (Allium sativum L.) over time, estimating parameters at three different quantiles and classifying each garlic accession according to its growth rate and asymptotic weight. The nonlinear regression model fitted was a Logistic model, and 30 garlic accessions were evaluated. These 30 accessions were divided based on the model with the closest quantile estimates; 12 accessions were classified as of lesser interest for planting, 6 were classified as intermediate, and 12 were classified as of greater interest for planting.
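For orientation, a nonlinear quantile regression with a logistic growth curve can be written as follows. The abstract does not give the parameterization, so the three-parameter form below is an assumption, with $\alpha$ as the asymptotic weight, $\gamma$ as the growth rate and $\beta$ as a location parameter.

```latex
% Assumed logistic growth curve for dry matter y at time t
f(t;\theta) = \frac{\alpha}{1 + \exp(\beta - \gamma t)}, \qquad \theta = (\alpha, \beta, \gamma)

% Quantile regression estimate at level \tau (e.g. three chosen quantiles)
\hat{\theta}(\tau) = \arg\min_{\theta} \sum_{i=1}^{n} \rho_\tau\bigl(y_i - f(t_i;\theta)\bigr),
\qquad \rho_\tau(u) = u\,\bigl(\tau - \mathbb{1}\{u < 0\}\bigr)
```

Replacing the squared-error criterion with the check function $\rho_\tau$ is what moves the fit from the conditional mean to the conditional $\tau$-quantile and reduces the influence of outliers.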
Quantile classification rounded to 100,000. Pop-up graphs show CO2 emissions over time since 1961. Data from the World Bank.
Standard deviation of depth was calculated from the bathymetry surface for each cell using the ArcGIS Spatial Analyst Focal Statistics "STD" parameter. Standard deviation of depth represents the dispersion of depth values (in meters) around the mean depth within a square 3x3 cell window. The 2x2 meter resolution standard deviation of depth GeoTIFF was exported and added as a new map layer to aid in benthic habitat classification. Acoustic imagery was acquired for the VICRNM on two separate missions onboard the NOAA ship, Nancy Foster. The first mission took place from 2/18/04 to 3/5/04. The second mission took place from 2/1/05 to 2/12/05. On both missions, seafloor depths between 14 and 55 m were mapped using a RESON SeaBat 8101 ER (240 kHz) MBES sensor. This pole-mounted system measured water depths across a 150 degree swath consisting of 101 individual 1.5 degree x 1.5 degree beams. The beams to the port and starboard of nadir (i.e., directly underneath the ship) overlapped adjacent survey lines by approximately 10 m. The vessel survey speed was between 5 and 8 kn. In 2004, the ship's location was determined by a Trimble DSM 132 DGPS system, which provided an RTCM differential data stream from the U.S. Coast Guard Continually Operating Reference Station (CORS) at Port Isabel, Puerto Rico. Gyro, heave, pitch and roll correctors were acquired using an Ixsea Octans gyrocompass. In 2005, the ship's positioning and orientation were determined by the Applanix POS/MV 320 V4, which is a GPS aided Inertial Motion Unit (IMU) providing measurements of roll, pitch and heading. The POS/MV obtained its positions from two dual frequency Trimble Zephyr GPS antennae. An auxiliary Trimble DSM 132 DGPS system provided an RTCM differential data stream from the U.S. Coast Guard CORS at Port Isabel, Puerto Rico.
For both years, CTD (conductivity, temperature and depth) measurements were taken approximately every 4 hours using a Seabird Electronics SBE-19 to correct for the changing sound velocities in the water column. In 2004, raw data were logged in .xtf (extended triton format) using Triton ISIS software 6.2. In 2005, raw data were logged in .gsf (generic sensor format) using SAIC ISS 2000 software. Data from 2004 were referenced to the WGS84 UTM 20 N horizontal coordinate system, and data from 2005 were referenced to the NAD83 UTM 20 N horizontal coordinate system. Data from both projects were referenced to the Mean Lower Low Water (MLLW) vertical tidal coordinate system. The 2004 and 2005 MBES bathymetric data were both corrected for sensor offsets, latency, roll, pitch, yaw, static draft, the changing speed of sound in the water column and the influence of tides in CARIS Hips & Sips 5.3 and 5.4, respectively. The 2004 data were then binned to create a 1 x 1 m raster surface, and the 2005 data were binned to create a 2 x 2 m raster surface. After these final surfaces were created, the datum for the 2004 bathymetric surfaces was transformed from WGS84 to NAD83 using the "Project Raster" function in ArcGIS 9.1. The 2004 surface was transformed so that it would have the same datum as the 2005 surface. The 2004 bathymetric surface was then downsampled from 1 x 1 to 2 x 2 m using the "Resample" function in ArcGIS 9.1. The 2004 surface was resampled so it would have the same spatial resolution as the 2005 surface. Having the same coordinate systems and spatial resolutions, the final 2004 and 2005 bathymetry rasters were then merged using the Raster Calculator function "Merge" in ArcGIS's Spatial Analyst Extension to create a seamless bathymetry surface for the entire VICRNM area south of St. John.
For a complete description of the data acquisition and processing parameters, please see the data acquisition and processing reports (DAPRs) for projects: NF-04-06-VI and NF-05-05-VI (Monaco & Rooney, 2004; Battista & Lazar, 2005).
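The focal standard deviation described at the start of this entry (dispersion of depth within a square 3x3 cell window) can be approximated outside ArcGIS with NumPy. This is a sketch, not a reproduction of the Focal Statistics tool. It assumes a population standard deviation (`ddof=0`) and computes edge cells from the neighbours that exist; whether ArcGIS uses the same conventions is not checked here. The function name and the sample grid are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def focal_std(depth, size=3):
    """Standard deviation of depth in a size x size window centred on each cell."""
    depth = np.asarray(depth, dtype=float)
    pad = size // 2
    # Pad with NaN so edge cells use only the neighbours that actually exist.
    padded = np.pad(depth, pad, mode="constant", constant_values=np.nan)
    windows = sliding_window_view(padded, (size, size))
    return np.nanstd(windows, axis=(-2, -1))

# Hypothetical 3 x 3 depth grid in metres (values 0..8 for easy checking).
depth = np.arange(9, dtype=float).reshape(3, 3)
rugosity = focal_std(depth)
flat = focal_std(np.full((4, 4), -20.0))
```

A flat seafloor yields zeros everywhere, while steep or rough areas produce larger values, which is why the layer is useful as an input to benthic habitat classification.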
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results of two group classification methods.
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
This dataset is a subset of the Hunter Riverine landscapes classes to be shown as an augmentation to the modelled river impacts layer.
It contains non-ephemeral landscape classes (low to mod intermittent, mod to highly intermittent and perennial) which are deemed to be potentially subject to hydrological change due to having their headwaters in areas subject to ACRD induced drawdown.
Potential impact is flagged at Q05, Q50 and Q95 levels in the attribute table.
For use in map reports.
Non-ephemeral stream landscape classes were compared with footprints of 0.2 m groundwater ACRD drawdown at the Q05, Q50 and Q95 levels. Streams rising out of and/or intersecting the footprints at the respective quantiles were selected out and tagged accordingly in the attribute table.
Bioregional Assessment Programme (2017) HUN SW Potentially Impacted Reaches by Quantile v01. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/55c568ce-ec90-40ca-9fd6-6c8fa58519e7.
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas (including WA)
Derived From HUN River Perenniality v01
Derived From HUN GW Model code v01
Derived From HUN Landscape Classification v02
Derived From Travelling Stock Route Conservation Values
Derived From HUN GW Model v01
Derived From NSW Wetlands
Derived From Climate Change Corridors Coastal North East NSW
Derived From NSW Office of Water Surface Water Licences Processed for Hunter v1 20140516
Derived From Climate Change Corridors for Nandewar and New England Tablelands
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas
Derived From HUN GW Quantiles Interpolation for IMIA Database v01
Derived From BA ALL Assessment Units 1000m Reference 20160516_v01
Derived From Asset database for the Hunter subregion on 27 August 2015
Derived From Birds Australia - Important Bird Areas (IBA) 2009
Derived From Groundwater Economic Assets Hunter NSW 20150331 PersRem
Derived From Geofabric Surface Network - V2.1.1
Derived From Hunter CMA GDEs (DRAFT DPI pre-release)
Derived From Camerons Gorge Grassy White Box Endangered Ecological Community (EEC) 2008
Derived From Atlas of Living Australia NSW ALA Portal 20140613
Derived From Spatial Threatened Species and Communities (TESC) NSW 20131129
Derived From Estuarine Macrophytes of Hunter Subregion NSW DPI Hunter 2004
Derived From Asset database for the Hunter subregion on 24 February 2016
Derived From Natural Resource Management (NRM) Regions 2010
Derived From Gosford Council Endangered Ecological Communities (Umina woodlands) EEC3906
Derived From NSW Office of Water Surface Water Offtakes - Hunter v1 24102013
Derived From NSW Office of Water Surface Water Entitlements Locations v1_Oct2013
Derived From Australia - Species of National Environmental Significance Database
Derived From Asset list for Hunter - CURRENT
Derived From Northern Rivers CMA GDEs (DRAFT DPI pre-release)
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
Derived From Ramsar Wetlands of Australia
Derived From Bioregional_Assessment_Programme_Catchment Scale Land Use of Australia - 2014
Derived From GEODATA TOPO 250K Series 3
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From Geological Provinces - Full Extent
Derived From Hunter subregion boundary
Derived From Commonwealth Heritage List Spatial Database (CHL)
Derived From Groundwater Economic Elements Hunter NSW 20150520 PersRem v02
Derived From Greater Hunter Native Vegetation Mapping with Classification for Mapping
Derived From Native Vegetation Management (NVM) - Manage Benefits
Derived From Bioregional Assessment areas v03
Derived From HUN Groundwater tables 20170421
Derived From HUN Assessment Units 1000m 20160725 v02
Derived From HUN Landscape Classification v03
Derived From National Heritage List Spatial Database (NHL) (v2.1)
Derived From GW Element Bores with Unknown FTYPE Hunter NSW Office of Water 20150514
Derived From Climate Change Corridors (Dry Habitat) for North East NSW
Derived From Groundwater Entitlement Hunter NSW Office of Water 20150324
Derived From Asset database for the Hunter subregion on 20 July 2015
Derived From Fauna Corridors for North East NSW
Derived From NSW Office of Water combined geodatabase of regulated rivers and water sharing plan regions
Derived From BA ALL Assessment Units 1000m 'super set' 20160516_v01
Derived From NSW Office of Water GW licence extract linked to spatial locations for NorthandSouthSydney v3 13032014
Derived From Asset database for the Hunter subregion on 16 June 2015
Derived From Australia World Heritage Areas
Derived From Asset database for the Hunter subregion on 12 February 2015
Derived From [Lower Hunter
The average classification accuracy (%), standard error, and standard deviation for each sensing modality for test 1.
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
This dataset represents synthetic cell data with distinct clusters. The data simulates the characteristics of cells, with each cell being described by its coordinates on a 2D plane (X-axis and Y-axis) and assigned to a specific cluster.
Dataset Characteristics:
Number of Cells: 300
Number of Genes: 2
Number of Clusters: 3
Cluster Characteristics:
Cluster 1: X Mean: 2, X Standard Deviation: 0.5; Y Mean: 3, Y Standard Deviation: 0.4
Cluster 2: X Mean: 6, X Standard Deviation: 0.7; Y Mean: 7, Y Standard Deviation: 0.8
Cluster 3: X Mean: 10, X Standard Deviation: 0.6; Y Mean: 11, Y Standard Deviation: 0.5
Cluster Proportions:
Cluster 1: 40%
Cluster 2: 30%
Cluster 3: 30%
Visualization: The dataset is visualized on a scatter plot where each point represents a cell. The X-axis and Y-axis represent the coordinates of each cell, and different colors are used to distinguish cells belonging to different clusters. The legend indicates the corresponding cluster for each color.
This synthetic dataset is created for illustrative purposes and showcases distinct clusters with varying characteristics.
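A dataset with these characteristics could be regenerated along the following lines. The description does not say how the points were drawn, so independent normal draws per axis, the random seed and the function name are all assumptions; the means, standard deviations and proportions are the ones listed above.

```python
import numpy as np

# Cluster parameters as listed in the dataset description.
CLUSTERS = [
    {"x_mean": 2, "x_sd": 0.5, "y_mean": 3, "y_sd": 0.4, "share": 0.4},
    {"x_mean": 6, "x_sd": 0.7, "y_mean": 7, "y_sd": 0.8, "share": 0.3},
    {"x_mean": 10, "x_sd": 0.6, "y_mean": 11, "y_sd": 0.5, "share": 0.3},
]

def simulate_cells(n_cells=300, seed=0):
    """Draw x, y coordinates and 1-based cluster labels for n_cells synthetic cells."""
    rng = np.random.default_rng(seed)  # seed is arbitrary, chosen for repeatability
    xs, ys, labels = [], [], []
    for i, c in enumerate(CLUSTERS, start=1):
        n = int(round(n_cells * c["share"]))
        # Assumption: X and Y are independent normal variables within a cluster.
        xs.append(rng.normal(c["x_mean"], c["x_sd"], n))
        ys.append(rng.normal(c["y_mean"], c["y_sd"], n))
        labels.append(np.full(n, i))
    return np.concatenate(xs), np.concatenate(ys), np.concatenate(labels)

x, y, cluster = simulate_cells()
```

The cluster means are several standard deviations apart on both axes, so the three groups barely overlap and any reasonable clustering method should recover them.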
Mean and standard deviation (SD) of the best classification accuracy obtained for the healthy subjects and the patients for each cardinality in the Imagery Trial and in the pre-Communication Trial.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The thermoanalytical technique differential scanning calorimetry (DSC) has been applied to characterize protein denaturation patterns (thermograms) in blood plasma samples and relate these to a subject’s health status. The analysis and classification of thermograms is challenging because of the high-dimensionality of the dataset. There are various methods for group classification using high-dimensional data sets; however, the impact of using high-dimensional data sets for cancer classification has been poorly understood. In the present article, we proposed a statistical approach for data reduction and a parametric method (PM) for modeling of high-dimensional data sets for two- and three- group classification using DSC and demographic data. We compared the PM to the non-parametric classification method K-nearest neighbors (KNN) and the semi-parametric classification method KNN with dynamic time warping (DTW). We evaluated the performance of these methods for multiple two-group classifications: (i) normal versus cervical cancer, (ii) normal versus lung cancer, (iii) normal versus cancer (cervical + lung), (iv) lung cancer versus cervical cancer as well as for three-group classification: normal versus cervical cancer versus lung cancer. In general, performance for two-group classification was high whereas three-group classification was more challenging, with all three methods predicting normal samples more accurately than cancer samples. Moreover, specificity of the PM method was mostly higher or the same as KNN and DTW-KNN with lower sensitivity. The performance of KNN and DTW-KNN decreased with the inclusion of demographic data, whereas similar performance was observed for the PM which could be explained by the fact that the PM uses fewer parameters as compared to KNN and DTW-KNN methods and is thus less susceptible to the risk of overfitting. 
More importantly the accuracy of the PM can be increased by using a greater number of quantile data points and by the inclusion of additional demographic and clinical data, providing a substantial advantage over KNN and DTW-KNN methods.
Mean and standard deviation values of operating characteristics (OC), for different classification algorithms and techniques to cope with data imbalance, and values obtained with SB criteria.
Overview: 142: Areas used for sports, leisure and recreation purposes.
Traceability (lineage): This dataset was produced with a machine learning framework with several input datasets, specified in detail in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3 ).
Scientific methodology: The single-class probability layers were generated with a spatiotemporal ensemble machine learning framework detailed in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3 ). The single-class uncertainty layers were calculated by taking the standard deviation of the three single-class probabilities predicted by the three components of the ensemble. The HCL (hard class) layers represent the class with the highest probability as predicted by the ensemble.
Usability: The HCL layers have a decreasing average accuracy (weighted F1-score) at each subsequent level in the CLC hierarchy. These metrics are 0.83 at level 1 (5 classes), 0.63 at level 2 (14 classes), and 0.49 at level 3 (43 classes). This means that the hard-class maps are more reliable when aggregating classes to a higher level in the hierarchy (e.g. 'Discontinuous Urban Fabric' and 'Continuous Urban Fabric' to 'Urban Fabric'). Some single-class probabilities may more closely represent actual patterns for some classes that were overshadowed by unequal sample point distributions. Users are encouraged to set their own thresholds when postprocessing these datasets to optimize the accuracy for their specific use case.
Uncertainty quantification: Uncertainty is quantified by taking the standard deviation of the probabilities predicted by the three components of the spatiotemporal ensemble model.
Data validation approaches: The LULC classification was validated through spatial 5-fold cross-validation as detailed in the accompanying publication.
Completeness: The dataset has chunks of empty predictions in regions with complex coast lines (e.g. the Zeeland province in the Netherlands and the Mar da Palha bay area in Portugal). These are artifacts that will be avoided in subsequent versions of the LULC product.
Consistency: The accuracy of the predictions was compared per year and per 30km*30km tile across Europe to derive temporal and spatial consistency by calculating the standard deviation. The standard deviation of annual weighted F1-score was 0.135, while the standard deviation of weighted F1-score per tile was 0.150. This means the dataset is more consistent through time than through space: Predictions are notably less accurate along the Mediterranean coast. The accompanying publication contains additional information and visualisations.
Positional accuracy: The raster layers have a resolution of 30m, identical to that of the Landsat data cube used as input features for the machine learning framework that predicted it.
Temporal accuracy: The dataset contains predictions and uncertainty layers for each year between 2000 and 2019.
Thematic accuracy: The maps reproduce the Corine Land Cover classification system, a hierarchical legend that consists of 5 classes at the highest level, 14 classes at the second level, and 44 classes at the third level. Class 523: Oceans was omitted due to computational constraints.
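The uncertainty and hard-class layers described in this entry can be sketched with NumPy as follows. The standard deviation across the three component predictions follows the description. How the ensemble combines its components into a final probability is not stated here, so the simple mean used below is an assumption, and the array layout and function name are hypothetical.

```python
import numpy as np

def uncertainty_and_hard_class(component_probs):
    """Per-class uncertainty and hard-class map from ensemble component predictions.

    component_probs: array of shape (n_components, n_classes, height, width).
    """
    component_probs = np.asarray(component_probs, dtype=float)
    # Uncertainty: standard deviation across the ensemble components, per class and pixel.
    uncertainty = component_probs.std(axis=0)
    # Assumption: ensemble probability taken as the mean of the components.
    ensemble = component_probs.mean(axis=0)
    # Hard class: index of the most probable class at each pixel.
    hard_class = ensemble.argmax(axis=0)
    return uncertainty, hard_class

# Hypothetical example: 3 components, 2 classes, a single pixel.
probs = np.array([
    [[[0.6]], [[0.4]]],
    [[[0.7]], [[0.3]]],
    [[[0.8]], [[0.2]]],
])
uncertainty, hard_class = uncertainty_and_hard_class(probs)
```

A user who wants to apply a custom threshold, as the usability note suggests, could replace the `argmax` step with a comparison of a single class's probability against a chosen cut-off.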
The median of the classification accuracies from the constant position and varied grasped loads tests reported with the interquartile range and the standard deviation.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The zip file contains grid files in UTM 16S resulting from AUV multibeam data processing and a table with descriptions of these grid files. AUV bathymetry data resulted from interpolation of multibeam depth measurements using the IDW algorithm in SAGA GIS. The AUV bathymetric derivatives (Bathymetric Position Index, Concavity, LS factor, and Terrain Ruggedness Index) were calculated in SAGA GIS. The slope derivative was calculated in ArcMap. The AUV backscatter statistics (10th quantile, 90th quantile, mean and mode) were calculated in FMGT Geocoder. The Bayesian classification map was created in SAGA GIS using data from Bayesian classification in Matlab. The ISODATA classification map was created in SAGA GIS using the AUV backscatter statistics, and the Random Forest predictive map was created using the MGET toolbox in ArcMap and the AUV bathymetry, bathymetric derivatives and backscatter statistics data.