Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We introduce classifiers based on directional quantiles. We derive theoretical results for selecting optimal quantile levels given a direction, and, conversely, an optimal direction given a quantile level. We also show that the probability of correct classification of the proposed classifier converges to one if population distributions differ by at most a location shift and if the number of directions is allowed to diverge at the same rate as the problem's dimension. We illustrate the satisfactory performance of our proposed classifiers in both small- and high-dimensional settings via a simulation study and a real data example. The code implementing the proposed methods is publicly available in the R package Qtools. Supplementary materials for this article are available online.
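As a rough illustration of the idea of classifying along a direction at a chosen quantile level, the sketch below projects observations onto a direction `u`, estimates each class's quantile of the projections at level `theta`, and assigns a new point to the class whose quantile is closest under the quantile check loss. The decision rule, the function names, and the toy data are assumptions made for illustration; this is not the Qtools implementation and may differ from the estimator analysed in the article.

```python
import math


def project(x, u):
    """Scalar projection of point x onto direction u (dot product)."""
    return sum(xi * ui for xi, ui in zip(x, u))


def empirical_quantile(sample, theta):
    """Order-statistic estimate of the theta-quantile (0 < theta < 1)."""
    ordered = sorted(sample)
    index = max(0, math.ceil(theta * len(ordered)) - 1)
    return ordered[index]


def check_loss(residual, theta):
    """Quantile check loss: theta * r if r >= 0, (theta - 1) * r otherwise."""
    return residual * (theta - (residual < 0))


def fit(train, u, theta):
    """train maps a class label to a list of points; returns label -> projected quantile."""
    return {
        label: empirical_quantile([project(x, u) for x in points], theta)
        for label, points in train.items()
    }


def predict(x, u, theta, class_quantiles):
    """Assign x to the class whose projected quantile has the smallest check loss."""
    z = project(x, u)
    return min(
        class_quantiles,
        key=lambda label: check_loss(z - class_quantiles[label], theta),
    )


# Hypothetical two-class toy data in two dimensions.
train = {
    "a": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "b": [(10.0, 0.0), (11.0, 0.0), (12.0, 0.0)],
}
direction = (1.0, 0.0)
level = 0.5
model = fit(train, direction, level)
label = predict((3.0, 0.0), direction, level, model)  # "a"
```

With several directions, one option (again an assumption) would be to sum the check losses over directions before taking the minimum.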
The map shows population density in Tioga County NY using a quantile classification with 5 data breaks each rounded to the nearest 10 people. The population data is census block level data from the 2010 U.S. Census.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Quantiles of sensitivity, specificity and log posterior for training and validation datasets over all accepted trees, for Bayesian classification trees.
No description was included in this Dataset collected from the OSF
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data presented here were used to produce the following paper:
Archibald, Twine, Mthabini, Stevens (2021) Browsing is a strong filter for savanna tree seedlings in their first growing season. J. Ecology.
The project under which these data were collected is: Mechanisms Controlling Species Limits in a Changing World. NRF/SASSCAL Grant number 118588
For information on the data or analysis please contact Sally Archibald: sally.archibald@wits.ac.za
Description of file(s):
File 1: cleanedData_forAnalysis.csv (required to run the R code: "finalAnalysis_PostClipResponses_Feb2021_requires_cleanData_forAnalysis_.R")
The data represent monthly survival and growth data for ~740 seedlings from 10 species under various levels of clipping.
The data consist of one .csv file with the following column names:
treatment: Clipping treatment (1 - 5 months clip plus control unclipped)
plot_rep: One of three randomised plots per treatment
matrix_no: Where in the plot the individual was placed
species_code: First three letters of the genus name and first three letters of the species name; uniquely identifies the species
species: Full species name
sample_period: Classification of sampling period into time since clip
status: Alive or Dead
standing.height: Vertical height above ground (in mm)
height.mm: Length of the longest branch (in mm)
total.branch.length: Total length of all the branches (in mm)
stemdiam.mm: Basal stem diameter (in mm)
maxSpineLength.mm: Length of the longest spine
postclipStemNo: Number of resprouting stems (only recorded AFTER clipping)
date.clipped: date.clipped
date.measured: date.measured
date.germinated: date.germinated
Age.of.plant: Date measured - Date germinated
newtreat: Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)
File 2: Herbivory_SurvivalEndofSeason_march2017.csv (required to run the R code: "FinalAnalysisResultsSurvival_requires_Herbivory_SurvivalEndofSeason_march2017.R")
The data consist of one .csv file with the following column names:
treatment: Clipping treatment (1 - 5 months clip plus control unclipped)
plot_rep: One of three randomised plots per treatment
matrix_no: Where in the plot the individual was placed
species_code: First three letters of the genus name and first three letters of the species name; uniquely identifies the species
species: Full species name
sample_period: Classification of sampling period into time since clip
status: Alive or Dead
standing.height: Vertical height above ground (in mm)
height.mm: Length of the longest branch (in mm)
total.branch.length: Total length of all the branches (in mm)
stemdiam.mm: Basal stem diameter (in mm)
maxSpineLength.mm: Length of the longest spine
postclipStemNo: Number of resprouting stems (only recorded AFTER clipping)
date.clipped: date.clipped
date.measured: date.measured
date.germinated: date.germinated
Age.of.plant: Date measured - Date germinated
newtreat: Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)
genus: Genus
MAR: Mean Annual Rainfall for that species' distribution (mm)
rainclass: High/medium/low
File 3: allModelParameters_byAge.csv (required to run the R code: "FinalModelSeedlingSurvival_June2021_.R")
Consists of a .csv file with the following column headings
Age.of.plant: Age in days
species_code: Species
pred_SD_mm: Predicted stem diameter in mm
pred_SD_up: Upper (75th) quantile of stem diameter in mm
pred_SD_low: Lower (25th) quantile of stem diameter in mm
treatdate: Date when clipped
pred_surv: Predicted survival probability
pred_surv_low: Predicted 25th quantile survival probability
pred_surv_high: Predicted 75th quantile survival probability
species_code: Species code
Bite.probability: Daily probability of being eaten
max_bite_diam_duiker_mm: Maximum bite diameter of a duiker for this species
duiker_sd: Standard deviation of bite diameter for a duiker for this species
max_bite_diameter_kudu_mm: Maximum bite diameter of a kudu for this species
kudu_sd: Standard deviation of bite diameter for a kudu for this species
mean_bite_diam_duiker_mm: Mean bite diameter of a duiker for this species
duiker_mean_sd: Standard deviation of the mean bite diameter of a duiker for this species
mean_bite_diameter_kudu_mm: Mean bite diameter of a kudu for this species
kudu_mean_sd: Standard deviation of the mean bite diameter of a kudu for this species
genus: Genus
rainclass: low/med/high
File 4: EatProbParameters_June2020.csv (required to run the R code: "FinalModelSeedlingSurvival_June2021_.R")
Consists of a .csv file with the following column headings
shtspec: Species name
species_code: Species code
genus: Genus
rainclass: low/medium/high
seed mass: Mass of seed (g per 1000 seeds)
Surv_intercept: Coefficient of the model predicting survival from age of clip for this species
Surv_slope: Coefficient of the model predicting survival from age of clip for this species
GR_intercept: Coefficient of the model predicting stem diameter from seedling age for this species
GR_slope: Coefficient of the model predicting stem diameter from seedling age for this species
species_code: Species code
max_bite_diam_duiker_mm: Maximum bite diameter of a duiker for this species
duiker_sd: Standard deviation of bite diameter for a duiker for this species
max_bite_diameter_kudu_mm: Maximum bite diameter of a kudu for this species
kudu_sd: Standard deviation of bite diameter for a kudu for this species
mean_bite_diam_duiker_mm: Mean bite diameter of a duiker for this species
duiker_mean_sd: Standard deviation of the mean bite diameter of a duiker for this species
mean_bite_diameter_kudu_mm: Mean bite diameter of a kudu for this species
kudu_mean_sd: Standard deviation of the mean bite diameter of a kudu for this species
AgeAtEscape_duiker[t]: Age of plant when its stem diameter is larger than a mean duiker bite
AgeAtEscape_duiker_min[t]: Age of plant when its stem diameter is larger than a min duiker bite
AgeAtEscape_duiker_max[t]: Age of plant when its stem diameter is larger than a max duiker bite
AgeAtEscape_kudu[t]: Age of plant when its stem diameter is larger than a mean kudu bite
AgeAtEscape_kudu_min[t]: Age of plant when its stem diameter is larger than a min kudu bite
AgeAtEscape_kudu_max[t]: Age of plant when its stem diameter is larger than a max kudu bite
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Urban heat islands (UHIs) have become one of the most critical issues around the world, especially in the context of rapid urbanization and global climate change. Extensive research has been conducted across disciplines on the factors related to land surface temperature (LST) and how to mitigate the UHI effect. However, there remain deficiencies in the exploration of LST changes across time and their relationship with underlying surfaces in different temperature ranges. To fill this gap, this study compared the LST of each month using the quantile classification method, taking the Landsat 8 images of Nanjing on May 18th, July 21st, and October 9th in 2017 as the subject, and then calculated the differences between July and May as well as between July and October with an intersection tool, taking the LST classes of July as the baseline. Additionally, the spatial pattern of each temperature class and intersection area was analyzed with the help of several landscape metrics, and the land contribution index (LCI) was utilized to better quantify the thermal contribution of each underlying surface to the area. The results indicated that the difference between months was mainly reflected in the medium temperature area, especially between July and October, in which landscape patterns illustrated a trend of fragmentation and decentralization. The proportions of underlying surfaces in different types of intersection revealed the distinction of their warming and cooling degrees over time, in which the warming degree of other rigid pavement was higher in the warming process from May to July, and the cooling degree of buildings was greater in the cooling process from July to October. The LCI of each underlying surface in the entire study area was different from that in each temperature class, indicating that underlying surfaces had distinct thermal contributions in different temperature ranges.
This study is expected to fill the gap in previous studies and provide a new perspective on the mitigation of UHI.
Quantile classification rounded to 100,000. Pop-up graphs show CO2 emissions over time since 1961. Data from the World Bank.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is used to assess the urban heat risk in the city of Riyadh using proxy variables to evaluate the environmental, infrastructural, and social dimensions of the city.
The environmental component was evaluated using the mean values of land surface temperature (LST), air temperature (T2m), and discomfort index (DI) across the districts of Riyadh. These factors, derived from data like MODIS LST and available WRF simulations, represented the degree of heat exposure in different regions.
The infrastructural component of heat risk was evaluated by looking at the city's infrastructure, that is, the building density per district. Buildings can act as "heat traps," so higher building density suggests increased heat risk.
The social component considered demographic factors such as the percentage of the population over 65 years old (OP) and under 14 years old (YP), which can indicate sensitivity to extreme heat conditions.
To map the heat risk, these components were combined into a composite heat risk indicator. To achieve this, each parameter was reclassified into three categories (1-less, 2-moderate, and 3-high) using the quantile classification, a data classification method that distributes a set of values into groups that contain an equal number of values.
Risk class | LST (°C) | DI | T2m (°C) | <14 y.o. (%) | >65 y.o. (%) | Buildings per sq. m. (BD)
1-Less risk | <47.2 | <28 | <40.6 | <23 | <1 | <66
2-Moderate risk | 47.2 ≤ LST ≤ 47.9 | 28 ≤ DI ≤ 28.2 | 40.6 ≤ T2m ≤ 40.8 | 23 ≤ YP ≤ 28 | 1 ≤ OP ≤ 3 | 66 ≤ BD ≤ 109
3-High risk | >47.9 | >28.2 | >40.8 | >28 | >3 | >109
LST: Land Surface Temperature; DI: Discomfort Index; T2m: Air temperature at 2 m height; YP: People under 14 years old; OP: People over 65 years old; BD: Buildings per sq. m.
Since the relative importance of each parameter is unknown, we considered that all parameters contributed equally to the composite heat risk index, and the class values were aggregated arithmetically. The final value for each district was then reclassified into three categories using the quantile classification method, resulting in the final three categories of Urban Heat Risk (Less heat risk, Moderate heat risk, High heat risk).
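The reclassify, aggregate, reclassify procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: the tercile cut points come from Python's `statistics.quantiles` (GIS packages may place breaks slightly differently), aggregation is assumed to be a plain sum of class scores, and the indicator names and district values are hypothetical.

```python
from statistics import quantiles


def tercile_breaks(values):
    """Return the two cut points that split values into three equal-count groups."""
    lower, upper = quantiles(values, n=3, method="inclusive")
    return lower, upper


def risk_class(value, lower, upper):
    """1 = less, 2 = moderate (bounds inclusive), 3 = high, as in the table above."""
    if value < lower:
        return 1
    if value > upper:
        return 3
    return 2


def composite_heat_risk(indicators):
    """indicators maps an indicator name to a list of per-district values.

    Each indicator is reclassified into classes 1-3, the class scores are
    summed with equal weight, and the sums are reclassified into classes 1-3.
    """
    n_districts = len(next(iter(indicators.values())))
    totals = [0] * n_districts
    for values in indicators.values():
        lower, upper = tercile_breaks(values)
        for i, value in enumerate(values):
            totals[i] += risk_class(value, lower, upper)
    lower, upper = tercile_breaks(totals)
    return [risk_class(total, lower, upper) for total in totals]


# Hypothetical values for six districts and two of the six indicators.
indicators = {
    "LST": [46.5, 47.0, 47.3, 47.6, 48.0, 48.4],
    "BD": [50, 60, 70, 100, 110, 130],
}
final_classes = composite_heat_risk(indicators)  # [1, 1, 2, 2, 3, 3]
```

Because every indicator is first reduced to a class score from 1 to 3, indicators measured in different units contribute on the same scale before they are summed.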
Notice: this is not the latest Heat Island Severity image service. For 2023 data, visit https://tpl.maps.arcgis.com/home/item.html?id=db5bdb0f0c8c4b85b8270ec67448a0b6.
This layer contains the relative heat severity for every pixel for every city in the contiguous United States. This 30-meter raster was derived from Landsat 8 imagery band 10 (ground-level thermal sensor) from the summer of 2021, patched with data from 2020 where necessary.
Federal statistics over a 30-year period show extreme heat is the leading cause of weather-related deaths in the United States. Extreme heat exacerbated by urban heat islands can lead to increased respiratory difficulties, heat exhaustion, and heat stroke. These heat impacts significantly affect the most vulnerable: children, the elderly, and those with preexisting conditions.
The purpose of this layer is to show where certain areas of cities are hotter than the average temperature for that same city as a whole. Severity is measured on a scale of 1 to 5, with 1 being a relatively mild heat area (slightly above the mean for the city), and 5 being a severe heat area (significantly above the mean for the city). The absolute heat above mean values are classified into these 5 classes using the Jenks Natural Breaks classification method, which seeks to reduce the variance within classes and maximize the variance between classes. Knowing where areas of high heat are located can help a city government plan for mitigation strategies.
This dataset represents a snapshot in time. It will be updated yearly, but is static between updates. It does not take into account changes in heat during a single day, for example, from building shadows moving. The thermal readings detected by the Landsat 8 sensor are surface-level, whether that surface is the ground or the top of a building. Although there is strong correlation between surface temperature and air temperature, they are not the same. We believe that this is useful at the national level, and for cities that don't have the ability to conduct their own hyper local temperature survey. Where local data is available, it may be more accurate than this dataset.
Dataset Summary
This dataset was developed using proprietary Python code developed at The Trust for Public Land, running on the Descartes Labs platform through the Descartes Labs API for Python. The Descartes Labs platform allows for extremely fast retrieval and processing of imagery, which makes it possible to produce heat island data for all cities in the United States in a relatively short amount of time.
What can you do with this layer?
This layer has query, identify, and export image services available. Since it is served as an image service, it is not necessary to download the data; the service itself is data that can be used directly in any Esri geoprocessing tool that accepts raster data as input.
In order to click on the image service and see the raw pixel values in a map viewer, you must be signed in to ArcGIS Online, then Enable Pop-Ups and Configure Pop-Ups.
Using the Urban Heat Island (UHI) Image Services
The data is made available as an image service. There is a processing template applied that supplies the yellow-to-red or blue-to-red color ramp, but once this processing template is removed (you can do this in ArcGIS Pro or ArcGIS Desktop, or in QGIS), the actual data values come through the service and can be used directly in a geoprocessing tool (for example, to extract an area of interest). Following are instructions for doing this in Pro.
In ArcGIS Pro, in a Map view, in the Catalog window, click on Portal. In the Portal window, click on the far-right icon representing Living Atlas. Search on the acronyms "tpl" and "uhi". The results returned will be the UHI image services. Right click on a result and select "Add to current map" from the context menu.
When the image service is added to the map, right-click on it in the map view, and select Properties. In the Properties window, select Processing Templates. On the drop-down menu at the top of the window, the default Processing Template is either a yellow-to-red ramp or a blue-to-red ramp. Click the drop-down, and select "None", then "OK". Now you will have the actual pixel values displayed in the map, and available to any geoprocessing tool that takes a raster as input.
[Screenshot in the original: ArcGIS Pro with a UHI image service loaded, color ramp removed, and symbology changed back to a yellow-to-red ramp (a classified renderer can also be used).]
Other Sources of Heat Island Information
Please see these websites for valuable information on heat islands and to learn about exciting new heat island research being led by scientists across the country:
- EPA's Heat Island Resource Center
- Dr. Ladd Keith, University of Arizona
- Dr. Ben McMahan, University of Arizona
- Dr. Jeremy Hoffman, Science Museum of Virginia
- Dr. Hunter Jones, NOAA
- Daphne Lundi, Senior Policy Advisor, NYC Mayor's Office of Recovery and Resiliency
Disclaimer/Feedback
With nearly 14,000 cities represented, checking each city's heat island raster for quality assurance would be prohibitively time-consuming, so The Trust for Public Land checked a statistically significant sample size for data quality. The sample passed all quality checks, with about 98.5% of the output cities error-free, but there could be instances where the user finds errors in the data. These errors will most likely take the form of a line of discontinuity where there is no city boundary; this type of error is caused by large temperature differences in two adjacent Landsat scenes, so the discontinuity occurs along scene boundaries (shown in a figure in the original). The Trust for Public Land would appreciate feedback on these errors so that version 2 of the national UHI dataset can be improved. Contact Dale.Watt@tpl.org with feedback.
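The layer description states that the heat-above-mean values are grouped into 5 classes with the Jenks Natural Breaks method, which minimizes variance within classes. The sketch below shows one way to compute such breaks with a small dynamic program over sorted values. It is an illustrative assumption only: it is not The Trust for Public Land's proprietary code, the function names and sample values are hypothetical, and production GIS implementations may differ in detail and are far faster on rasters.

```python
from bisect import bisect_left


def jenks_breaks(values, n_classes):
    """Return the upper bound of each class, chosen to minimize the total
    within-class sum of squared deviations (exact dynamic program)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any slice data[i:j] in O(1).
    sums, squares = [0.0], [0.0]
    for x in data:
        sums.append(sums[-1] + x)
        squares.append(squares[-1] + x * x)

    def ssd(i, j):
        total = sums[j] - sums[i]
        return (squares[j] - squares[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting the first j values into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk back through the stored split points to recover class upper bounds.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]


def severity(value, bounds):
    """1-based class of value given the class upper bounds."""
    return min(bisect_left(bounds, value), len(bounds) - 1) + 1


# Hypothetical heat-above-mean values forming three obvious groups.
sample = [1, 2, 3, 10, 11, 12, 50, 51, 52]
bounds = jenks_breaks(sample, 3)  # [3, 12, 52]
```

Unlike quantile classification, the class sizes here are not forced to be equal; breaks fall where the data have natural gaps.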
U.S. Census population data for Kansas counties from 1890 through 2010. The choropleth map shows 2010 population based on a quantile classification. Click on any county to see additional information about historic maximums, population loss, and trend in population since 1890.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results of three group classification methods.
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
This dataset is a subset of the Hunter Riverine landscapes classes to be shown as an augmentation to the modelled river impacts layer.
It contains non-ephemeral landscape classes (low to mod intermittent, mod to highly intermittent and perennial) which are deemed to be potentially subject to hydrological change due to having their headwaters in areas subject to ACRD induced drawdown.
Potential impact is flagged at Q05, Q50 and Q95 levels in the attribute table.
for use in map reports
Non-ephemeral stream landscape classes were compared with footprints of 0.2 m groundwater ACRD drawdown at the Q05, Q50 and Q95 levels. Streams rising out of and/or intersecting the footprints at the respective quantiles were selected out and tagged accordingly in the attribute table.
Bioregional Assessment Programme (2017) HUN SW Potentially Impacted Reaches by Quantile v01. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/55c568ce-ec90-40ca-9fd6-6c8fa58519e7.
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas (including WA)
Derived From HUN River Perenniality v01
Derived From HUN GW Model code v01
Derived From HUN Landscape Classification v02
Derived From Travelling Stock Route Conservation Values
Derived From HUN GW Model v01
Derived From NSW Wetlands
Derived From Climate Change Corridors Coastal North East NSW
Derived From NSW Office of Water Surface Water Licences Processed for Hunter v1 20140516
Derived From Climate Change Corridors for Nandewar and New England Tablelands
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas
Derived From HUN GW Quantiles Interpolation for IMIA Database v01
Derived From BA ALL Assessment Units 1000m Reference 20160516_v01
Derived From Asset database for the Hunter subregion on 27 August 2015
Derived From Birds Australia - Important Bird Areas (IBA) 2009
Derived From Groundwater Economic Assets Hunter NSW 20150331 PersRem
Derived From Geofabric Surface Network - V2.1.1
Derived From Hunter CMA GDEs (DRAFT DPI pre-release)
Derived From Camerons Gorge Grassy White Box Endangered Ecological Community (EEC) 2008
Derived From Atlas of Living Australia NSW ALA Portal 20140613
Derived From Spatial Threatened Species and Communities (TESC) NSW 20131129
Derived From Estuarine Macrophytes of Hunter Subregion NSW DPI Hunter 2004
Derived From Asset database for the Hunter subregion on 24 February 2016
Derived From Natural Resource Management (NRM) Regions 2010
Derived From Gosford Council Endangered Ecological Communities (Umina woodlands) EEC3906
Derived From NSW Office of Water Surface Water Offtakes - Hunter v1 24102013
Derived From NSW Office of Water Surface Water Entitlements Locations v1_Oct2013
Derived From Australia - Species of National Environmental Significance Database
Derived From Asset list for Hunter - CURRENT
Derived From Northern Rivers CMA GDEs (DRAFT DPI pre-release)
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
Derived From Ramsar Wetlands of Australia
Derived From Bioregional_Assessment_Programme_Catchment Scale Land Use of Australia - 2014
Derived From GEODATA TOPO 250K Series 3
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From Geological Provinces - Full Extent
Derived From Hunter subregion boundary
Derived From Commonwealth Heritage List Spatial Database (CHL)
Derived From Groundwater Economic Elements Hunter NSW 20150520 PersRem v02
Derived From Greater Hunter Native Vegetation Mapping with Classification for Mapping
Derived From Native Vegetation Management (NVM) - Manage Benefits
Derived From Bioregional Assessment areas v03
Derived From HUN Groundwater tables 20170421
Derived From HUN Assessment Units 1000m 20160725 v02
Derived From HUN Landscape Classification v03
Derived From National Heritage List Spatial Database (NHL) (v2.1)
Derived From GW Element Bores with Unknown FTYPE Hunter NSW Office of Water 20150514
Derived From Climate Change Corridors (Dry Habitat) for North East NSW
Derived From Groundwater Entitlement Hunter NSW Office of Water 20150324
Derived From Asset database for the Hunter subregion on 20 July 2015
Derived From Fauna Corridors for North East NSW
Derived From NSW Office of Water combined geodatabase of regulated rivers and water sharing plan regions
Derived From BA ALL Assessment Units 1000m 'super set' 20160516_v01
Derived From NSW Office of Water GW licence extract linked to spatial locations for NorthandSouthSydney v3 13032014
Derived From Asset database for the Hunter subregion on 16 June 2015
Derived From Australia World Heritage Areas
Derived From Asset database for the Hunter subregion on 12 February 2015
Derived From [Lower Hunter
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results of the normality test showing p-values at different temperature points using data transformations for three-group classification.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The thermoanalytical technique differential scanning calorimetry (DSC) has been applied to characterize protein denaturation patterns (thermograms) in blood plasma samples and relate these to a subject's health status. The analysis and classification of thermograms is challenging because of the high dimensionality of the dataset. There are various methods for group classification using high-dimensional data sets; however, the impact of using high-dimensional data sets for cancer classification has been poorly understood. In the present article, we proposed a statistical approach for data reduction and a parametric method (PM) for modeling of high-dimensional data sets for two- and three-group classification using DSC and demographic data. We compared the PM to the non-parametric classification method K-nearest neighbors (KNN) and the semi-parametric classification method KNN with dynamic time warping (DTW). We evaluated the performance of these methods for multiple two-group classifications: (i) normal versus cervical cancer, (ii) normal versus lung cancer, (iii) normal versus cancer (cervical + lung), (iv) lung cancer versus cervical cancer, as well as for three-group classification: normal versus cervical cancer versus lung cancer. In general, performance for two-group classification was high whereas three-group classification was more challenging, with all three methods predicting normal samples more accurately than cancer samples. Moreover, specificity of the PM was mostly higher than or the same as that of KNN and DTW-KNN, with lower sensitivity. The performance of KNN and DTW-KNN decreased with the inclusion of demographic data, whereas similar performance was observed for the PM, which could be explained by the fact that the PM uses fewer parameters than the KNN and DTW-KNN methods and is thus less susceptible to the risk of overfitting.
More importantly, the accuracy of the PM can be increased by using a greater number of quantile data points and by the inclusion of additional demographic and clinical data, providing a substantial advantage over KNN and DTW-KNN methods.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"Classification and Quantification of Strawberry Fruit Shape" is a dataset that includes raw RGB images and binary images of strawberry fruit. These folders contain JPEG images taken from the same experimental units on 2 different harvest dates. Images in each folder are labeled according to the 4 digit plot ID from the field experiment (####_) and the 10 digit individual ID (_##########).
"H1" and "H2" folders contain RGB images of multiple fruits. Each fruit was extracted and binarized to become the images in "H1_indiv" and "H2_indiv".
"H1_indiv" and "H2_indiv" folders contain images of individual fruit. Each fruit is bordered by ten white pixels. There are a total of 6,874 images between these two folders. The images were then resized and scaled to become the images in "ReSized".
"ReSized" contains 6,874 binary images of individual berries. These images are all square images (1000x1000px) with the object represented by black pixels (0) and background represented with white pixels (1). Each image was scaled so that it would take up the maximum number of pixels in a 1000 x 1000px image and would maintain the aspect ratio.
"Fruit_image_data.csv" contains all of the morphometric features extracted from individual images including intermediate values.
All images titled with the form "B##_NA" were discarded prior to any analyses. These images come from the buffer plots, not the experimental units of the study.
"PPKC_Figures.zip" contains all figures (F1-F7) and supplemental figures (S1-S7) from the manuscript. Captions for the main figures are found in the manuscript. Captions for supplemental figures are below.
Fig. S1 Results of PPKC against original cluster assignments. Ordered centroids from k = 2 to k = 8. On the left are the unordered assignments from k-means, and on the right are the ordered assignments following PPKC. Cluster position indicated on the right [1, 8].
Fig. S2 Optimal Value of k. (A) Total within clusters sum of squares. (B) The inverse of the Adjusted R². (C) Akaike information criterion (AIC). (D) Bayesian information criterion (BIC). All metrics were calculated on a random sample of 3,437 images (50%). 10 samples were randomly drawn. The vertical dashed line in each plot represents the optimal value of k. Reported metrics are standardized to be between [0, 1].
Fig. S3 Hierarchical clustering and distance between classes on PC1. The relationship between clusters at each value of k is represented as both a dendrogram and as a bar plot. The labels on the dendrogram (i.e., V1, V2, V3,..., V10) represent the original cluster assignment from k-means. The bar plot to the right of each dendrogram depicts the elements of the eigenvector associated with the largest eigenvalue from PPKC. The labels above each line represent the original cluster assignment.
Fig. S4 BLUPs for 13 selected features. For each plot, the X-axis is the index and the Y-axis is the BLUP value estimated from a linear mixed model. Grey points represent the mean feature value for each individual. Each point is the BLUP for a single genotype.
Fig. S5 Effects of Eigenfruit, Vertical Biomass, and Horizontal Biomass Analyses. (A) Effects of PC [1, 7] from the Eigenfruit analysis on the mean shape (center column). The left column is the mean shape minus 1.5× the standard deviation. Right is the mean shape plus 1.5× the standard deviation. The horizontal axis is the horizontal pixel position. The vertical axis is the vertical pixel position. (B) Effects of PC [1, 3] from the Horizontal Biomass analysis on the mean shape (center column). The left column is the mean shape minus 1.5× the standard deviation. Right is the mean shape plus 1.5× the standard deviation. The horizontal axis is the vertical position from the image (height). The vertical axis is the number of activated pixels (RowSum) at the given vertical position. (C) Effects of PC [1, 3] from the Vertical Biomass analysis on the mean shape (center column). The left column is the mean shape minus 1.5× the standard deviation. Right is the mean shape plus 1.5× the standard deviation. The horizontal axis is the horizontal position from the image (width). The vertical axis is the number of activated pixels (ColSum) at the given horizontal position.
Fig. S6 PPKC with variable sample size. Ordered centroids from k = 2 to k = 5 using different image sets for clustering. For all k = [2, 5], k-means clustering was performed using either 100%, 80%, 50%, or 20% of the total number of images: 6,874, 5,500, 3,437, and 1,374, respectively. Cluster position indicated on the right [1, 5].
Fig. S7 Comparison of scale and continuous features. (A) PPKC 4-unit ordinal scale. (B) Distributions of the selected features with each level of k = 4 from the PPKC 4-unit ordinal scale. The light gray line is cluster 1, the medium gray line is cluster 2, the dark gray line is cluster 3, and the black line is cluster 4.
Sichkar V. N. Effect of various dimension convolutional layer filters on traffic sign classification accuracy. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2019, vol. 19, no. 3, pp. DOI: 10.17586/2226-1494-2019-19-3-546-552 (Full-text available here ResearchGate.net/profile/Valentyn_Sichkar)
Test online with custom Traffic Sign here: https://valentynsichkar.name/mnist.html
Design, Train & Test deep CNN for Image Classification. Join the course & enjoy new opportunities to get deep learning skills: https://www.udemy.com/course/convolutional-neural-networks-for-image-classification/
This is ready-to-use preprocessed data saved into a pickle file.
Preprocessing stages are as follows:
- Normalizing the whole data by dividing by 255.0.
- Dividing the whole data into three datasets: train, validation and test.
- Normalizing the whole data by subtracting the mean image and dividing by the standard deviation.
- Transposing every dataset to make channels come first.
The mean image and standard deviation were calculated from the train dataset and applied to all datasets.
When using a user's image for classification, it first has to be preprocessed in the same way: normalized, with the mean image subtracted, and divided by the standard deviation.
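The preprocessing stages listed above can be sketched in NumPy as follows. This is a minimal illustration, not the dataset author's code: the function names, the channels-last input layout `(N, H, W, C)`, the per-pixel standard deviation, and the small `eps` guard against division by zero are all assumptions made for the example, and the random array only stands in for real images.

```python
import numpy as np


def scale(images):
    """Map uint8 pixel values to floats in [0, 1] by dividing by 255.0."""
    return images.astype(np.float32) / 255.0


def fit_statistics(train_images):
    """Compute the mean image and standard deviation from the train set only."""
    scaled = scale(train_images)
    return scaled.mean(axis=0), scaled.std(axis=0)


def preprocess(images, mean_image, std, eps=1e-8):
    """Apply the same normalization to any split or to a user's image batch.

    `eps` is a hypothetical safeguard for pixels whose std is zero.
    """
    x = (scale(images) - mean_image) / (std + eps)
    # Move channels first: (N, H, W, C) -> (N, C, H, W).
    return np.transpose(x, (0, 3, 1, 2))


# Synthetic stand-in for a small train split (assumed channels-last layout).
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(8, 28, 28, 1), dtype=np.uint8)

mean_image, std = fit_statistics(train)
x_train = preprocess(train, mean_image, std)
```

A user's image would go through `preprocess` with the `mean_image` and `std` obtained from the train split, never with statistics recomputed from the image itself.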
The data is written as a dictionary with the following keys:
x_train: (59000, 1, 28, 28)
y_train: (59000,)
x_validation: (1000, 1, 28, 28)
y_validation: (1000,)
x_test: (1000, 1, 28, 28)
y_test: (1000,)
Contains pretrained weights model_params_ConvNet1.pickle for the model with the following architecture:
Input --> Conv --> ReLU --> Pool --> Affine --> ReLU --> Affine --> Softmax
Parameters:
Pool is 2 and height = width = 2.
Architecture also can be understood as follows:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3400968%2Fc23041248e82134b7d43ed94307b720e%2FModel_1_Architecture_MNIST.png?generation=1563654250901965&alt=media
Initial data is MNIST that was collected by Yann LeCun, Corinna Cortes, Christopher J.C. Burges.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Phase 1: Signboard Detection Dataset
This phase focuses on detecting signboards in street images.
- Total Images: 8,366
- Image Format: JPG (8,366 images)
- Resolution:
  - Minimum: (720, 443)
  - Maximum: (9,280, 8,285)
  - Mean: (4,202, 3,138)
  - Median: (4,032, 3,024)
- Aspect Ratio:
  - Minimum: 0.5625
  - Maximum: 5.7043
  - Mean: 1.3691
  - Most Frequent: 1.3333
  - Standard Deviation: 0.2329
- File Size (KB):
  - Minimum: 88.19 KB
  - Maximum: 41,266.50 KB
  - Mean: 5,796.19 KB
  - Total Dataset Size: 48,490,924.91 KB
- Color Statistics:
  - Color Mode: RGB (8,366 images)
  - Mean Color (RGB): (110.32, 112.77, 118.16)
  - Standard Deviation (RGB): (65.71, 65.36, 65.82)
- Brightness:
  - Average: 114.10
Phase 2: Region of Text Interest (RTI) Detection Dataset
This phase focuses on detecting specific text regions (names and addresses) within signboards.
- Total Images: 8,036
- Image Format: JPG (8,036 images)
- Resolution:
  - Minimum: (552, 156)
  - Maximum: (9,228, 4,682)
  - Mean: (2,753, 808)
  - Median: (2,741, 781)
- Aspect Ratio:
  - Minimum: 0.9615
  - Maximum: 11.3835
  - Mean: 3.6058
  - Most Frequent: 4.0
  - Standard Deviation: 1.2475
- File Size (KB):
  - Minimum: 40.54 KB
  - Maximum: 7,968.94 KB
  - Mean: 653.67 KB
  - Total Dataset Size: 5,252,868.26 KB
- Color Statistics:
  - Color Mode: RGB (8,036 images)
  - Mean Color (RGB): (137.58, 136.29, 144.00)
  - Standard Deviation (RGB): (47.26, 49.73, 50.89)
- Brightness:
  - Average: 138.74
Named Entity Recognition (NER) Dataset
This dataset is used for categorizing extracted text from signboards.
- Total Entries: 42,547
- Unique Categories: 10
- Category Distribution:
  - Religious Sites: 10,641
  - Retail Outlets: 8,275
  - Educational Institutions: 6,826
  - Healthcare Institutions: 4,708
  - Restaurants: 3,868
  - Pharmacies: 3,637
  - Parks: 1,547
  - Banks: 1,121
  - Stations: 1,094
  - Hotels: 830
Word Count Statistics:
- Overall Word Count:
  - Maximum: 18
  - Minimum: 1
  - Mean: 3.82
- Category-Wise Word Count:
  - Banks: Mean: 4.65, Max: 11, Min: 1
  - Educational Institutions: Mean: 4.60, Max: 18, Min: 1
  - Healthcare Institutions: Mean: 4.02, Max: 16, Min: 1
  - Religious Sites: Mean: 4.36, Max: 17, Min: 1
  - Retail Outlets: Mean: 3.08, Max: 15, Min: 1
  - Restaurants: Mean: 3.36, Max: 13, Min: 1
  - Pharmacies: Mean: 2.91, Max: 13, Min: 1
  - Parks: Mean: 3.10, Max: 11, Min: 1
  - Stations: Mean: 3.72, Max: 17, Min: 1
  - Hotels: Mean: 3.12, Max: 12, Min: 1
This dataset is structured for a two-phase object detection pipeline with an additional text classification task to categorize extracted text from detected regions.
In constructing the overall Boulder County Human Services Index (using Census Tracts as the geographical unit), we transform each indicator so that we can measure them on a similar scale. We express the data in standardized (or z-score) form, which indicates how far a Census Tract's raw score is from the mean of all Census Tracts. After the transformation process, all indicators have a mean (µ) of zero (0) and a standard deviation (σ) of one (1). The indicators thus can be expressed in the same units of measurement. The z-scores for each indicator were summed to get a total score for the index. The total score was classified into 5 categories using the natural breaks (Jenks) classification. Then they were ranked from very low to very high based on the natural breaks. We used the same methodology as developed by Dr. Lisa Piscopo, Executive Strategist for Denver Human Services.
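The standardization and summation described above can be sketched as follows. This is an illustrative example only: the function names and the small indicator table are made up, and the use of the population standard deviation (`ddof=0`) is an assumption, since the description does not say which estimator was used.

```python
import numpy as np


def zscores(indicators):
    """Standardize each indicator (column) to mean 0 and standard deviation 1.

    Assumes the population standard deviation (ddof=0).
    """
    values = np.asarray(indicators, dtype=float)
    return (values - values.mean(axis=0)) / values.std(axis=0)


def index_total(indicators):
    """Sum the z-scores across indicators to get one total score per tract."""
    return zscores(indicators).sum(axis=1)


# Hypothetical data: 4 census tracts (rows) x 2 indicators (columns).
raw = np.array([
    [10.0, 200.0],
    [20.0, 180.0],
    [30.0, 260.0],
    [40.0, 160.0],
])

z = zscores(raw)
totals = index_total(raw)
```

The `totals` vector is what would then be passed to a natural breaks (Jenks) classifier to obtain the five categories.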
Notice: this is not the latest Heat Island Severity image service.
This layer contains the relative heat severity for every pixel for every city in the United States, including Alaska, Hawaii, and Puerto Rico. Heat Severity is a reclassified version of the Heat Anomalies raster, which is also published on this site. This data is generated from 30-meter Landsat 8 imagery band 10 (ground-level thermal sensor) from the summer of 2023.
To explore previous versions of the data, visit the links below:
- Heat Severity - USA 2022
- Heat Severity - USA 2021
- Heat Severity - USA 2020
- Heat Severity - USA 2019
Federal statistics over a 30-year period show extreme heat is the leading cause of weather-related deaths in the United States. Extreme heat exacerbated by urban heat islands can lead to increased respiratory difficulties, heat exhaustion, and heat stroke. These heat impacts significantly affect the most vulnerable: children, the elderly, and those with preexisting conditions.
The purpose of this layer is to show where certain areas of cities are hotter than the average temperature for that same city as a whole. Severity is measured on a scale of 1 to 5, with 1 being a relatively mild heat area (slightly above the mean for the city), and 5 being a severe heat area (significantly above the mean for the city). The absolute heat above mean values are classified into these 5 classes using the Jenks Natural Breaks classification method, which seeks to reduce the variance within classes and maximize the variance between classes. Knowing where areas of high heat are located can help a city government plan for mitigation strategies.
This dataset represents a snapshot in time. It will be updated yearly, but is static between updates. It does not take into account changes in heat during a single day, for example, from building shadows moving. The thermal readings detected by the Landsat 8 sensor are surface-level, whether that surface is the ground or the top of a building. Although there is strong correlation between surface temperature and air temperature, they are not the same. We believe that this is useful at the national level, and for cities that don’t have the ability to conduct their own hyper local temperature survey. Where local data is available, it may be more accurate than this dataset.
Dataset Summary
This dataset was developed using proprietary Python code developed at Trust for Public Land, running on the Descartes Labs platform through the Descartes Labs API for Python. The Descartes Labs platform allows for extremely fast retrieval and processing of imagery, which makes it possible to produce heat island data for all cities in the United States in a relatively short amount of time.
What can you do with this layer?
This layer has query, identify, and export image services available. Since it is served as an image service, it is not necessary to download the data; the service itself is data that can be used directly in any Esri geoprocessing tool that accepts raster data as input.
In order to click on the image service and see the raw pixel values in a map viewer, you must be signed in to ArcGIS Online, then Enable Pop-Ups and Configure Pop-Ups.
Using the Urban Heat Island (UHI) Image Services
The data is made available as an image service. There is a processing template applied that supplies the yellow-to-red or blue-to-red color ramp, but once this processing template is removed (you can do this in ArcGIS Pro or ArcGIS Desktop, or in QGIS), the actual data values come through the service and can be used directly in a geoprocessing tool (for example, to extract an area of interest). Following are instructions for doing this in Pro.
In ArcGIS Pro, in a Map view, in the Catalog window, click on Portal. In the Portal window, click on the far-right icon representing Living Atlas. Search on the acronyms “tpl” and “uhi”. The results returned will be the UHI image services. Right click on a result and select “Add to current map” from the context menu. When the image service is added to the map, right-click on it in the map view, and select Properties. In the Properties window, select Processing Templates. On the drop-down menu at the top of the window, the default Processing Template is either a yellow-to-red ramp or a blue-to-red ramp. Click the drop-down, and select “None”, then “OK”. Now you will have the actual pixel values displayed in the map, and available to any geoprocessing tool that takes a raster as input. Below is a screenshot of ArcGIS Pro with a UHI image service loaded, color ramp removed, and symbology changed back to a yellow-to-red ramp (a classified renderer can also be used):
A typical operation at this point is to clip out your area of interest. To do this, add your polygon shapefile or feature class to the map view, and use the Clip Raster tool to export your area of interest as a geoTIFF raster (file extension ".tif"). In the environments tab for the Clip Raster tool, click the dropdown for "Extent" and select "Same as Layer:", and select the name of your polygon. If you then need to convert the output raster to a polygon shapefile or feature class, run the Raster to Polygon tool, and select "Value" as the field.
Other Sources of Heat Island Information
Please see these websites for valuable information on heat islands and to learn about exciting new heat island research being led by scientists across the country:
- EPA’s Heat Island Resource Center
- Dr. Ladd Keith, University of Arizona
- Dr. Ben McMahan, University of Arizona
- Dr. Jeremy Hoffman, Science Museum of Virginia
- Dr. Hunter Jones, NOAA
- Daphne Lundi, Senior Policy Advisor, NYC Mayor's Office of Recovery and Resiliency
Disclaimer/Feedback
With nearly 14,000 cities represented, checking each city's heat island raster for quality assurance would be prohibitively time-consuming, so Trust for Public Land checked a statistically significant sample size for data quality. The sample passed all quality checks, with about 98.5% of the output cities error-free, but there could be instances where the user finds errors in the data. These errors will most likely take the form of a line of discontinuity where there is no city boundary; this type of error is caused by large temperature differences in two adjacent Landsat scenes, so the discontinuity occurs along scene boundaries (see figure below). Trust for Public Land would appreciate feedback on these errors so that version 2 of the national UHI dataset can be improved. Contact Dale.Watt@tpl.org with feedback.
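To illustrate what a natural breaks classification does, the sketch below finds class boundaries for a one-dimensional set of values by minimizing the total within-class sum of squared deviations with dynamic programming. It is not the Trust for Public Land code, which is proprietary, and GIS packages may use different heuristics or sampling. The function name, the toy values, and the use of 3 classes instead of the layer's 5 are assumptions chosen to keep the example small.

```python
import numpy as np


def natural_breaks(values, n_classes):
    """Return the upper bound of each class for an optimal 1-D partition.

    Minimizes the total within-class sum of squared deviations (SSD)
    by dynamic programming over the sorted values.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    # Prefix sums let us compute the SSD of any slice x[i:j] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def ssd(i, j):
        seg = s1[j] - s1[i]
        return (s2[j] - s2[i]) - seg * seg / (j - i)

    # cost[k, j]: best cost of splitting the first j values into k classes.
    cost = np.full((n_classes + 1, n + 1), np.inf)
    split = np.zeros((n_classes + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1, i] + ssd(i, j)
                if c < cost[k, j]:
                    cost[k, j] = c
                    split[k, j] = i

    # Walk back through the split table to recover each class's upper bound.
    uppers = []
    j = n
    for k in range(n_classes, 0, -1):
        uppers.append(x[j - 1])
        j = split[k, j]
    return uppers[::-1]


# Hypothetical "heat above mean" values forming three obvious groups.
heat = np.array([1, 2, 3, 10, 11, 12, 50, 51, 52], dtype=float)
uppers = natural_breaks(heat, 3)
# Class labels start at 1, mirroring the layer's 1-to-5 severity scale.
severity = np.searchsorted(uppers, heat, side="left") + 1
```

The triple loop costs O(k·n²), which is fine for a small sample but would need sampling or a faster variant for a full raster.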
Notice: this is not the latest Heat Island Severity image service. For 2023 data, visit https://tpl.maps.arcgis.com/home/item.html?id=db5bdb0f0c8c4b85b8270ec67448a0b6.
This layer contains the relative heat severity for every pixel for every city in the United States. This 30-meter raster was derived from Landsat 8 imagery band 10 (ground-level thermal sensor) from the summers of 2018 and 2019.
Federal statistics over a 30-year period show extreme heat is the leading cause of weather-related deaths in the United States. Extreme heat exacerbated by urban heat islands can lead to increased respiratory difficulties, heat exhaustion, and heat stroke. These heat impacts significantly affect the most vulnerable: children, the elderly, and those with preexisting conditions.
The purpose of this layer is to show where certain areas of cities are hotter than the average temperature for that same city as a whole. Severity is measured on a scale of 1 to 5, with 1 being a relatively mild heat area (slightly above the mean for the city), and 5 being a severe heat area (significantly above the mean for the city). The absolute heat above mean values are classified into these 5 classes using the Jenks Natural Breaks classification method, which seeks to reduce the variance within classes and maximize the variance between classes. Knowing where areas of high heat are located can help a city government plan for mitigation strategies.
This dataset represents a snapshot in time. It will be updated yearly, but is static between updates. It does not take into account changes in heat during a single day, for example, from building shadows moving. The thermal readings detected by the Landsat 8 sensor are surface-level, whether that surface is the ground or the top of a building. Although there is strong correlation between surface temperature and air temperature, they are not the same. We believe that this is useful at the national level, and for cities that don’t have the ability to conduct their own hyper local temperature survey. Where local data is available, it may be more accurate than this dataset.
Dataset Summary
This dataset was developed using proprietary Python code developed at The Trust for Public Land, running on the Descartes Labs platform through the Descartes Labs API for Python. The Descartes Labs platform allows for extremely fast retrieval and processing of imagery, which makes it possible to produce heat island data for all cities in the United States in a relatively short amount of time.
What can you do with this layer?
This layer has query, identify, and export image services available. Since it is served as an image service, it is not necessary to download the data; the service itself is data that can be used directly in any Esri geoprocessing tool that accepts raster data as input.
Using the Urban Heat Island (UHI) Image Services
The data is made available as an image service. There is a processing template applied that supplies the yellow-to-red or blue-to-red color ramp, but once this processing template is removed (you can do this in ArcGIS Pro or ArcGIS Desktop, or in QGIS), the actual data values come through the service and can be used directly in a geoprocessing tool (for example, to extract an area of interest). Following are instructions for doing this in Pro.
In ArcGIS Pro, in a Map view, in the Catalog window, click on Portal. In the Portal window, click on the far-right icon representing Living Atlas. Search on the acronyms “tpl” and “uhi”. The results returned will be the UHI image services. Right click on a result and select “Add to current map” from the context menu. When the image service is added to the map, right-click on it in the map view, and select Properties. In the Properties window, select Processing Templates. On the drop-down menu at the top of the window, the default Processing Template is either a yellow-to-red ramp or a blue-to-red ramp. Click the drop-down, and select “None”, then “OK”. Now you will have the actual pixel values displayed in the map, and available to any geoprocessing tool that takes a raster as input. Below is a screenshot of ArcGIS Pro with a UHI image service loaded, color ramp removed, and symbology changed back to a yellow-to-red ramp (a classified renderer can also be used):
Other Sources of Heat Island Information
Please see these websites for valuable information on heat islands and to learn about exciting new heat island research being led by scientists across the country:
- EPA’s Heat Island Resource Center
- Dr. Ladd Keith, University of Arizona
- Dr. Ben McMahan, University of Arizona
- Dr. Jeremy Hoffman, Science Museum of Virginia
- Dr. Hunter Jones, NOAA
- Daphne Lundi, Senior Policy Advisor, NYC Mayor's Office of Recovery and Resiliency
Disclaimer/Feedback
With nearly 14,000 cities represented, checking each city's heat island raster for quality assurance would be prohibitively time-consuming, so The Trust for Public Land checked a statistically significant sample size for data quality. The sample passed all quality checks, with about 98.5% of the output cities error-free, but there could be instances where the user finds errors in the data. These errors will most likely take the form of a line of discontinuity where there is no city boundary; this type of error is caused by large temperature differences in two adjacent Landsat scenes, so the discontinuity occurs along scene boundaries (see figure below). The Trust for Public Land would appreciate feedback on these errors so that version 2 of the national UHI dataset can be improved. Contact Dale.Watt@tpl.org with feedback.