62 datasets found
  1. Data from: Directional Quantile Classifiers

    • tandf.figshare.com
    txt
    Updated Jun 4, 2023
    Cite
    Alessio Farcomeni; Marco Geraci; Cinzia Viroli (2023). Directional Quantile Classifiers [Dataset]. http://doi.org/10.6084/m9.figshare.17711340.v2
    Available download formats: txt
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Alessio Farcomeni; Marco Geraci; Cinzia Viroli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce classifiers based on directional quantiles. We derive theoretical results for selecting optimal quantile levels given a direction, and, conversely, an optimal direction given a quantile level. We also show that the probability of correct classification of the proposed classifier converges to one if population distributions differ by at most a location shift and if the number of directions is allowed to diverge at the same rate of the problem’s dimension. We illustrate the satisfactory performance of our proposed classifiers in both small- and high-dimensional settings via a simulation study and a real data example. The code implementing the proposed methods is publicly available in the R package Qtools. Supplementary materials for this article are available online.

  2. Population Density in Tioga County NY

    • tiogatells-tiogacountyny.hub.arcgis.com
    Updated Jun 14, 2019
    Cite
    Tioga County NY (2019). Population Density in Tioga County NY [Dataset]. https://tiogatells-tiogacountyny.hub.arcgis.com/maps/ae0a6e1e4f8144079ba29ed97cb6125c
    Dataset updated
    Jun 14, 2019
    Dataset authored and provided by
    Tioga County NY
    Description

    The map shows population density in Tioga County NY using a quantile classification with 5 data breaks each rounded to the nearest 10 people. The population data is census block level data from the 2010 U.S. Census.
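
    A quantile classification of this kind can be sketched as follows. This is only an illustration, not the county's actual workflow: the density values are hypothetical, the function names `quantile_breaks` and `classify` are invented here, and "5 data breaks" is assumed to mean the upper bounds of five equal-count classes.

```python
import statistics

def quantile_breaks(values, n_classes=5, round_to=10):
    """Return the upper bound of each quantile class, rounded to the nearest `round_to`."""
    # statistics.quantiles returns the n_classes - 1 interior cut points.
    cuts = statistics.quantiles(values, n=n_classes, method="inclusive")
    # Assumption: the last break is the maximum observed value.
    bounds = cuts + [max(values)]
    return [round(b / round_to) * round_to for b in bounds]

def classify(value, breaks):
    """Return the 0-based index of the first class whose upper bound is >= value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    # Values above the (rounded) top break fall into the last class.
    return len(breaks) - 1

# Hypothetical densities (people per unit area) for ten census blocks.
densities = [12, 48, 95, 130, 210, 340, 560, 780, 1250, 2400]
breaks = quantile_breaks(densities)
classes = [classify(d, breaks) for d in densities]
```

    Rounding the breaks makes the legend easier to read, at the cost of class counts that are only approximately equal.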

  3. Quantiles of sensitivity, specificity and log posterior for training and...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1 more
    xls
    Updated Jun 4, 2023
    Cite
    Wenbiao Hu; Rebecca A. O'Leary; Kerrie Mengersen; Samantha Low Choy (2023). Quantiles of sensitivity, specificity and log posterior for training and validation datasets over all accepted trees, for Bayesian classification trees. [Dataset]. http://doi.org/10.1371/journal.pone.0023903.t003
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Wenbiao Hu; Rebecca A. O'Leary; Kerrie Mengersen; Samantha Low Choy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Quantiles of sensitivity, specificity and log posterior for training and validation datasets over all accepted trees, for Bayesian classification trees.

  4. Kansas Population 1890-2020

    • hub.arcgis.com
    Updated Mar 15, 2013
    + more versions
    Cite
    Kansas State University (2013). Kansas Population 1890-2020 [Dataset]. https://hub.arcgis.com/maps/kstate::kansas-population-1890-2020/explore?path=
    Dataset updated
    Mar 15, 2013
    Dataset authored and provided by
    Kansas State University
    Description

    U.S. Census population data for Kansas counties from 1890 through 2010. The choropleth map shows 2010 population based on a quantile classification. Click on any county to see additional information about historic maximums, population loss, and trend in population since 1890.

  5. Data from: Dataset from : Browsing is a strong filter for savanna tree...

    • data.niaid.nih.gov
    Updated Oct 1, 2021
    Cite
    Archibald, Sally; Wayne Twine; Craddock Mthabini; Nicola Stevens (2021). Dataset from : Browsing is a strong filter for savanna tree seedlings in their first growing season [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4972083
    Dataset updated
    Oct 1, 2021
    Dataset provided by
    School of Animal Plant and Environmental Sciences, University of Witwatersrand, Johannesburg, South Africa
    Centre for African Ecology, School of Animal Plant and Environmental Sciences, University of Witwatersrand, Johannesburg, South Africa
    Centre for African Ecology, School of Animal Plant and Environmental Sciences, University of Witwatersrand, Johannesburg, South Africa AND Environmental Change Institute, School of Geography and the Environment, University of Oxford, Oxford OX1 3QY, United Kingdom
    Authors
    Archibald, Sally; Wayne Twine; Craddock Mthabini; Nicola Stevens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data presented here were used to produce the following paper:

    Archibald, Twine, Mthabini, Stevens (2021) Browsing is a strong filter for savanna tree seedlings in their first growing season. J. Ecology.

    The project under which these data were collected is: Mechanisms Controlling Species Limits in a Changing World. NRF/SASSCAL Grant number 118588

    For information on the data or analysis please contact Sally Archibald: sally.archibald@wits.ac.za

    Description of file(s):

    File 1: cleanedData_forAnalysis.csv (required to run the R code: "finalAnalysis_PostClipResponses_Feb2021_requires_cleanData_forAnalysis_.R")

    The data represent monthly survival and growth data for ~740 seedlings from 10 species under various levels of clipping.

    The data consist of one .csv file with the following column names:

    treatment: Clipping treatment (1 - 5 months clip plus control unclipped)
    plot_rep: One of three randomised plots per treatment
    matrix_no: Where in the plot the individual was placed
    species_code: First three letters of the genus name, and first three letters of the species name uniquely identifies the species
    species: Full species name
    sample_period: Classification of sampling period into time since clip
    status: Alive or Dead
    standing.height: Vertical height above ground (in mm)
    height.mm: Length of the longest branch (in mm)
    total.branch.length: Total length of all the branches (in mm)
    stemdiam.mm: Basal stem diameter (in mm)
    maxSpineLength.mm: Length of the longest spine
    postclipStemNo: Number of resprouting stems (only recorded AFTER clipping)
    date.clipped: Date clipped
    date.measured: Date measured
    date.germinated: Date germinated
    Age.of.plant: Date measured - Date germinated
    newtreat: Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)

    File 2: Herbivory_SurvivalEndofSeason_march2017.csv (required to run the R code: "FinalAnalysisResultsSurvival_requires_Herbivory_SurvivalEndofSeason_march2017.R")

    The data consist of one .csv file with the following column names:

    treatment: Clipping treatment (1 - 5 months clip plus control unclipped)
    plot_rep: One of three randomised plots per treatment
    matrix_no: Where in the plot the individual was placed
    species_code: First three letters of the genus name, and first three letters of the species name uniquely identifies the species
    species: Full species name
    sample_period: Classification of sampling period into time since clip
    status: Alive or Dead
    standing.height: Vertical height above ground (in mm)
    height.mm: Length of the longest branch (in mm)
    total.branch.length: Total length of all the branches (in mm)
    stemdiam.mm: Basal stem diameter (in mm)
    maxSpineLength.mm: Length of the longest spine
    postclipStemNo: Number of resprouting stems (only recorded AFTER clipping)
    date.clipped: Date clipped
    date.measured: Date measured
    date.germinated: Date germinated
    Age.of.plant: Date measured - Date germinated
    newtreat: Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)
    genus: Genus
    MAR: Mean Annual Rainfall for that Species distribution (mm)
    rainclass: High/medium/low

    File 3: allModelParameters_byAge.csv (required to run the R code: "FinalModelSeedlingSurvival_June2021_.R")

    Consists of a .csv file with the following column headings:

    Age.of.plant: Age in days
    species_code: Species
    pred_SD_mm: Predicted stem diameter in mm
    pred_SD_up: top 75th quantile of stem diameter in mm
    pred_SD_low: bottom 25th quantile of stem diameter in mm
    treatdate: date when clipped
    pred_surv: Predicted survival probability
    pred_surv_low: Predicted 25th quantile survival probability
    pred_surv_high: Predicted 75th quantile survival probability
    species_code: species code
    Bite.probability: Daily probability of being eaten
    max_bite_diam_duiker_mm: Maximum bite diameter of a duiker for this species
    duiker_sd: standard deviation of bite diameter for a duiker for this species
    max_bite_diameter_kudu_mm: Maximum bite diameter of a kudu for this species
    kudu_sd: standard deviation of bite diameter for a kudu for this species
    mean_bite_diam_duiker_mm: mean etc
    duiker_mean_sd: standard deviation etc
    mean_bite_diameter_kudu_mm: mean etc
    kudu_mean_sd: standard deviation etc
    genus: genus
    rainclass: low/med/high

    File 4: EatProbParameters_June2020.csv (required to run the R code: "FinalModelSeedlingSurvival_June2021_.R")

    Consists of a .csv file with the following column headings:

    shtspec: species name
    species_code: species code
    genus: genus
    rainclass: low/medium/high
    seed mass: mass of seed (g per 1000 seeds)
    Surv_intercept: coefficient of the model predicting survival from age of clip for this species
    Surv_slope: coefficient of the model predicting survival from age of clip for this species
    GR_intercept: coefficient of the model predicting stem diameter from seedling age for this species
    GR_slope: coefficient of the model predicting stem diameter from seedling age for this species
    species_code: species code
    max_bite_diam_duiker_mm: Maximum bite diameter of a duiker for this species
    duiker_sd: standard deviation of bite diameter for a duiker for this species
    max_bite_diameter_kudu_mm: Maximum bite diameter of a kudu for this species
    kudu_sd: standard deviation of bite diameter for a kudu for this species
    mean_bite_diam_duiker_mm: mean etc
    duiker_mean_sd: standard deviation etc
    mean_bite_diameter_kudu_mm: mean etc
    kudu_mean_sd: standard deviation etc
    AgeAtEscape_duiker[t]: age of plant when its stem diameter is larger than a mean duiker bite
    AgeAtEscape_duiker_min[t]: age of plant when its stem diameter is larger than a min duiker bite
    AgeAtEscape_duiker_max[t]: age of plant when its stem diameter is larger than a max duiker bite
    AgeAtEscape_kudu[t]: age of plant when its stem diameter is larger than a mean kudu bite
    AgeAtEscape_kudu_min[t]: age of plant when its stem diameter is larger than a min kudu bite
    AgeAtEscape_kudu_max[t]: age of plant when its stem diameter is larger than a max kudu bite

  6. Support vector machine with quantile hyper-spheres for pattern...

    • plos.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Maoxiang Chu; Xiaoping Liu; Rongfen Gong; Jie Zhao (2023). Support vector machine with quantile hyper-spheres for pattern classification [Dataset]. http://doi.org/10.1371/journal.pone.0212361
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Maoxiang Chu; Xiaoping Liu; Rongfen Gong; Jie Zhao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper formulates a support vector machine with quantile hyper-spheres (QHSVM) for pattern classification. The idea of QHSVM is to build two quantile hyper-spheres with the same center for positive or negative training samples. Every quantile hyper-sphere is constructed by using pinball loss instead of hinge loss, which makes the new classification model insensitive to noise, especially the feature noise around the decision boundary. Moreover, the robustness and generalization of QHSVM are strengthened through maximizing the margin between two quantile hyper-spheres, maximizing the inner-class clustering of samples and optimizing the independent quadratic programming for a target class. In addition, this paper proposes a novel local center-based density estimation method. Based on it, ρ-QHSVM with surrounding and clustering samples is given. Under the premise of high accuracy, the execution speed of ρ-QHSVM can be adjusted. The experimental results on artificial, benchmark and strip steel surface defect datasets show that the QHSVM model has distinct advantages in accuracy and that the ρ-QHSVM model is suited to large-scale datasets.
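
    The contrast between hinge loss and pinball loss mentioned above can be illustrated with a small sketch. It uses the form of pinball loss common in pinball-loss SVMs generally, written on the margin y * f(x); the hyper-sphere formulation specific to QHSVM is not reproduced here, and the parameter name `tau` is an assumption of this sketch.

```python
def hinge_loss(margin):
    """Hinge loss: zero once the margin y * f(x) reaches 1."""
    return max(0.0, 1.0 - margin)

def pinball_loss(margin, tau=0.5):
    """Pinball loss on u = 1 - margin: u if u >= 0, otherwise -tau * u."""
    u = 1.0 - margin
    return u if u >= 0 else -tau * u

# Compare the two losses on a few margins, from misclassified to well inside the class.
margins = [-1.0, 0.0, 1.0, 2.0, 3.0]
comparison = [(m, hinge_loss(m), pinball_loss(m)) for m in margins]
```

    Hinge loss ignores every sample with margin above 1, so the boundary is set only by the points nearest to it. Pinball loss also penalizes samples far inside the class, which ties the boundary to a quantile of the margins rather than to the closest (possibly noisy) points.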

  7. Data from: Quantile regression of nonlinear models to describe different...

    • scielo.figshare.com
    jpeg
    Updated Jun 5, 2023
    Cite
    Guilherme Alves Puiatti; Paulo Roberto Cecon; Moysés Nascimento; Ana Carolina Campana Nascimento; Antônio Policarpo Souza Carneiro; Fabyano Fonseca e Silva; Mário Puiatti; Ana Carolina Ribeiro de Oliveira (2023). Quantile regression of nonlinear models to describe different levels of dry matter accumulation in garlic plants [Dataset]. http://doi.org/10.6084/m9.figshare.5907898.v1
    Available download formats: jpeg
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    SciELO journals
    Authors
    Guilherme Alves Puiatti; Paulo Roberto Cecon; Moysés Nascimento; Ana Carolina Campana Nascimento; Antônio Policarpo Souza Carneiro; Fabyano Fonseca e Silva; Mário Puiatti; Ana Carolina Ribeiro de Oliveira
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT: Plant growth analyses are important because they generate information on the demand and necessary care for each development stage of a plant. Nonlinear regression models are appropriate for the description of curves of growth, since they include parameters with practical biological interpretation. However, these models present information in terms of the conditional mean, and they are subject to problems in the adjustment caused by possible outliers or asymmetry in the distribution of the data. Quantile regression can solve these problems, and it allows the estimation of different quantiles, generating more complete and robust results. The objective of this research was to adjust a nonlinear quantile regression model for the study of dry matter accumulation in garlic plants (Allium sativum L.) over time, estimating parameters at three different quantiles and classifying each garlic accession according to its growth rate and asymptotic weight. The nonlinear regression model fitted was a Logistic model, and 30 garlic accessions were evaluated. These 30 accessions were divided based on the model with the closest quantile estimates; 12 accessions were classified as of lesser interest for planting, 6 were classified as intermediate, and 12 were classified as of greater interest for planting.

  8. Worldwide CO2 Emissions 2007

    • umn.hub.arcgis.com
    Updated Apr 14, 2022
    Cite
    University of Minnesota (2022). Worldwide CO2 Emissions 2007 [Dataset]. https://umn.hub.arcgis.com/maps/8ca5236b4b2444718c4cc9ab824f8962
    Dataset updated
    Apr 14, 2022
    Dataset authored and provided by
    University of Minnesota
    Description

    Quantile classification rounded to 100,000. Pop-up graphs show CO2 emissions over time since 1961. Data from the World Bank.

  9. Depth (Standard Deviation) Layer used to identify, delineate and classify...

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated Mar 22, 2025
    + more versions
    Cite
    Department of Commerce (DOC), National Oceanic and Atmospheric Administration (NOAA), National Ocean Service (NOS), Center for Coastal Monitoring and Assessment (CCMA), Biogeography Branch (Point of Contact) (2025). Depth (Standard Deviation) Layer used to identify, delineate and classify moderate-depth benthic habitats around St. John, USVI [Dataset]. https://catalog.data.gov/dataset/depth-standard-deviation-layer-used-to-identify-delineate-and-classify-moderate-depth-benthic-h4
    Dataset updated
    Mar 22, 2025
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    United States Department of Commerce (http://commerce.gov/)
    Area covered
    Saint John, U.S. Virgin Islands
    Description

    Standard deviation of depth was calculated from the bathymetry surface for each cell using the ArcGIS Spatial Analyst Focal Statistics "STD" parameter. Standard deviation of depth represents the dispersion of depth values (in meters) around the mean depth within a square 3x3 cell window. The 2x2 meter resolution standard deviation of depth GeoTIFF was exported and added as a new map layer to aid in benthic habitat classification. Acoustic imagery was acquired for the VICRNM on two separate missions onboard the NOAA ship, Nancy Foster. The first mission took place from 2/18/04 to 3/5/04. The second mission took place from 2/1/05 to 2/12/05. On both missions, seafloor depths between 14 to 55 m were mapped using a RESON SeaBat 8101 ER (240 kHz) MBES sensor. This pole-mounted system measured water depths across a 150 degree swath consisting of 101 individual 1.5 degree x 1.5 degree beams. The beams to the port and starboard of nadir (i.e., directly underneath the ship) overlapped adjacent survey lines by approximately 10 m. The vessel survey speed was between 5 and 8 kn. In 2004, the ship's location was determined by a Trimble DSM 132 DGPS system, which provided a RTCM differential data stream from the U.S. Coast Guard Continually Operating Reference Station (CORS) at Port Isabel, Puerto Rico. Gyro, heave, pitch and roll correctors were acquired using an Ixsea Octans gyrocompass. In 2005, the ship's positioning and orientation were determined by the Applanix POS/MV 320 V4, which is a GPS aided Inertial Motion Unit (IMU) providing measurements of roll, pitch and heading. The POS/MV obtained its positions from two dual frequency Trimble Zephyr GPS antennae. An auxiliary Trimble DSM 132 DGPS system provided a RTCM differential data stream from the U.S. Coast Guard CORS at Port Isabel, Puerto Rico. 
For both years, CTD (conductivity, temperature and depth) measurements were taken approximately every 4 hours using a Seabird Electronics SBE-19 to correct for the changing sound velocities in the water column. In 2004, raw data were logged in .xtf (extended triton format) using Triton ISIS software 6.2. In 2005, raw data were logged in .gsf (generic sensor format) using SAIC ISS 2000 software. Data from 2004 were referenced to the WGS84 UTM 20 N horizontal coordinate system, and data from 2005 were referenced to the NAD83 UTM 20 N horizontal coordinate system. Data from both projects were referenced to the Mean Lower Low Water (MLLW) vertical tidal coordinate system. The 2004 and 2005 MBES bathymetric data were both corrected for sensor offsets, latency, roll, pitch, yaw, static draft, the changing speed of sound in the water column and the influence of tides in CARIS Hips & Sips 5.3 and 5.4, respectively. The 2004 data were then binned to create a 1 x 1 m raster surface, and the 2005 data were binned to create a 2 x 2 m raster surface. After these final surfaces were created, the datum for the 2004 bathymetric surfaces was transformed from WGS84 to NAD83 using the "Project Raster" function in ArcGIS 9.1. The 2004 surface was transformed so that it would have the same datum as the 2005 surface. The 2004 bathymetric surface was then downsampled from 1 x 1 to 2 x 2 m using the "Resample" function in ArcGIS 9.1. The 2004 surface was resampled so it would have the same spatial resolution as the 2005 surface. Having the same coordinate systems and spatial resolutions, the final 2004 and 2005 bathymetry rasters were then merged using the Raster Calculator function "Merge" in ArcGIS's Spatial Analyst Extension to create a seamless bathymetry surface for the entire VICRNM area south of St. John.
For a complete description of the data acquisition and processing parameters, please see the data acquisition and processing reports (DAPRs) for projects: NF-04-06-VI and NF-05-05-VI (Monaco & Rooney, 2004; Battista & Lazar, 2005).
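
The 3x3 focal standard deviation described above can be approximated in plain Python as a rough sketch. It does not call ArcGIS: the function name `focal_std` and the depth grid are hypothetical, edge cells are assumed to use only the neighbours that exist, and the population form of the standard deviation is assumed.

```python
import statistics

def focal_std(grid, radius=1):
    """Standard deviation of each cell's (2*radius + 1)-square neighbourhood."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the window, clipped at the grid edges (assumption).
            window = [
                grid[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            # Population standard deviation of the depths in the window (assumption).
            out[r][c] = statistics.pstdev(window)
    return out

# Hypothetical depths in meters: a flat seafloor with one raised cell.
depths = [
    [20.0, 20.0, 20.0, 20.0],
    [20.0, 20.0, 26.0, 20.0],
    [20.0, 20.0, 20.0, 20.0],
]
depth_std = focal_std(depths)
```

Cells whose window is entirely flat get a value of zero, while cells near the raised one get a positive value, which is why this layer helps separate smooth from rugged seafloor.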

  10. Results of two group classification methods.

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    + more versions
    Cite
    Shesh N. Rai; Sudhir Srivastava; Jianmin Pan; Xiaoyong Wu; Somesh P. Rai; Chongkham S. Mekmaysy; Lynn DeLeeuw; Jonathan B. Chaires; Nichola C. Garbett (2023). Results of two group classification methods. [Dataset]. http://doi.org/10.1371/journal.pone.0220765.t004
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Shesh N. Rai; Sudhir Srivastava; Jianmin Pan; Xiaoyong Wu; Somesh P. Rai; Chongkham S. Mekmaysy; Lynn DeLeeuw; Jonathan B. Chaires; Nichola C. Garbett
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Results of two group classification methods.

  11. HUN SW Potentially Impacted Reaches by Quantile v01

    • researchdata.edu.au
    • data.gov.au
    • +1 more
    Updated Oct 9, 2018
    Cite
    Bioregional Assessment Program (2018). HUN SW Potentially Impacted Reaches by Quantile v01 [Dataset]. https://researchdata.edu.au/hun-sw-potentially-quantile-v01/2986501
    Dataset updated
    Oct 9, 2018
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Bioregional Assessment Program
    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    This dataset is a subset of the Hunter Riverine landscapes classes to be shown as an augmentation to the modelled river impacts layer.

    It contains non-ephemeral landscape classes (low to mod intermittent, mod to highly intermittent and perennial) which are deemed to be potentially subject to hydrological change due to having their headwaters in areas subject to ACRD induced drawdown.

    Potential impact is flagged at Q05, Q50 and Q95 levels in the attribute table.

    Purpose

    for use in map reports

    Dataset History

    Non-ephemeral stream landscape classes were compared with footprints of 0.2 m groundwater ACRD drawdown at the Q05, Q50 and Q95 levels. Streams rising out of and/or intersecting the footprints at the respective quantiles were selected out and tagged accordingly in the attribute table.

    Dataset Citation

    Bioregional Assessment Programme (2017) HUN SW Potentially Impacted Reaches by Quantile v01. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/55c568ce-ec90-40ca-9fd6-6c8fa58519e7.

    Dataset Ancestors

  12. The average classification accuracy (%), standard error, and standard...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Apr 10, 2025
    Cite
    Sagastume, Giancarlo K.; Schofield, Jonathon S.; Whittle, Richard S.; Hong, Kihun; Young, Peyton R.; Battraw, Marcus A.; Winslow, Eden J. (2025). The average classification accuracy (%), standard error, and standard deviation for each sensing modality for test 1. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002061744
    Dataset updated
    Apr 10, 2025
    Authors
    Sagastume, Giancarlo K.; Schofield, Jonathon S.; Whittle, Richard S.; Hong, Kihun; Young, Peyton R.; Battraw, Marcus A.; Winslow, Eden J.
    Description

    The average classification accuracy (%), standard error, and standard deviation for each sensing modality for test 1.

  13. Simple Classification Playground

    • kaggle.com
    zip
    Updated Aug 22, 2023
    Cite
    Marcin Wierzbiński (2023). Simple Classification Playground [Dataset]. https://www.kaggle.com/datasets/martininf1n1ty/simple-classification-playground
    Available download formats: zip (6023 bytes)
    Dataset updated
    Aug 22, 2023
    Authors
    Marcin Wierzbiński
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    This dataset represents synthetic cell data with distinct clusters. The data simulates the characteristics of cells, with each cell being described by its coordinates on a 2D plane (X-axis and Y-axis) and assigned to a specific cluster.

    Dataset Characteristics:

    Number of Cells: 300
    Number of Genes: 2
    Number of Clusters: 3

    Cluster Characteristics:

    Cluster 1: X Mean: 2, X Standard Deviation: 0.5; Y Mean: 3, Y Standard Deviation: 0.4
    Cluster 2: X Mean: 6, X Standard Deviation: 0.7; Y Mean: 7, Y Standard Deviation: 0.8
    Cluster 3: X Mean: 10, X Standard Deviation: 0.6; Y Mean: 11, Y Standard Deviation: 0.5

    Cluster Proportions:

    Cluster 1: 40%
    Cluster 2: 30%
    Cluster 3: 30%

    Visualization: The dataset is visualized on a scatter plot where each point represents a cell. The X-axis and Y-axis represent the coordinates of each cell, and different colors are used to distinguish cells belonging to different clusters. The legend indicates the corresponding cluster for each color.

    This synthetic dataset is created for illustrative purposes and showcases distinct clusters with varying characteristics.
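
    A dataset with these characteristics could be regenerated along the following lines. This is a sketch, not the author's generator: the listing gives only means and standard deviations, so independent normal distributions per axis are an assumption, as are the function name `make_cells` and the fixed seed.

```python
import random

# (x_mean, x_sd, y_mean, y_sd, proportion) for clusters 1 to 3, as listed above.
CLUSTERS = [
    (2.0, 0.5, 3.0, 0.4, 0.40),
    (6.0, 0.7, 7.0, 0.8, 0.30),
    (10.0, 0.6, 11.0, 0.5, 0.30),
]

def make_cells(n_cells=300, seed=0):
    """Return a list of (x, y, cluster_label) tuples."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    cells = []
    for label, (x_mean, x_sd, y_mean, y_sd, share) in enumerate(CLUSTERS, start=1):
        for _ in range(round(n_cells * share)):
            # Assumption: X and Y are drawn independently from normal distributions.
            cells.append((rng.gauss(x_mean, x_sd), rng.gauss(y_mean, y_sd), label))
    return cells

cells = make_cells()
```

    With the default of 300 cells, the proportions give 120, 90 and 90 cells for clusters 1, 2 and 3.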

  14. Mean and standard deviation (SD) of the best classification accuracy...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 10, 2014
    Cite
    Cappello, Angelo; Mangia, Anna Lisa; Simoncini, Laura; Pirini, Marco (2014). Mean and standard deviation (SD) of the best classification accuracy obtained for the healthy subjects and the patients for each cardinality in the Imagery Trial and in the pre-Communication Trial. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001191782
    Dataset updated
    Jun 10, 2014
    Authors
    Cappello, Angelo; Mangia, Anna Lisa; Simoncini, Laura; Pirini, Marco
    Description

    Mean and standard deviation (SD) of the best classification accuracy obtained for the healthy subjects and the patients for each cardinality in the Imagery Trial and in the pre-Communication Trial.

  15. Multi-group diagnostic classification of high-dimensional data using...

    • plos.figshare.com
    tiff
    Updated Jun 4, 2023
    Cite
    Shesh N. Rai; Sudhir Srivastava; Jianmin Pan; Xiaoyong Wu; Somesh P. Rai; Chongkham S. Mekmaysy; Lynn DeLeeuw; Jonathan B. Chaires; Nichola C. Garbett (2023). Multi-group diagnostic classification of high-dimensional data using differential scanning calorimetry plasma thermograms [Dataset]. http://doi.org/10.1371/journal.pone.0220765
    Available download formats: tiff
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Shesh N. Rai; Sudhir Srivastava; Jianmin Pan; Xiaoyong Wu; Somesh P. Rai; Chongkham S. Mekmaysy; Lynn DeLeeuw; Jonathan B. Chaires; Nichola C. Garbett
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The thermoanalytical technique differential scanning calorimetry (DSC) has been applied to characterize protein denaturation patterns (thermograms) in blood plasma samples and relate these to a subject’s health status. The analysis and classification of thermograms is challenging because of the high-dimensionality of the dataset. There are various methods for group classification using high-dimensional data sets; however, the impact of using high-dimensional data sets for cancer classification has been poorly understood. In the present article, we proposed a statistical approach for data reduction and a parametric method (PM) for modeling of high-dimensional data sets for two- and three- group classification using DSC and demographic data. We compared the PM to the non-parametric classification method K-nearest neighbors (KNN) and the semi-parametric classification method KNN with dynamic time warping (DTW). We evaluated the performance of these methods for multiple two-group classifications: (i) normal versus cervical cancer, (ii) normal versus lung cancer, (iii) normal versus cancer (cervical + lung), (iv) lung cancer versus cervical cancer as well as for three-group classification: normal versus cervical cancer versus lung cancer. In general, performance for two-group classification was high whereas three-group classification was more challenging, with all three methods predicting normal samples more accurately than cancer samples. Moreover, specificity of the PM method was mostly higher or the same as KNN and DTW-KNN with lower sensitivity. The performance of KNN and DTW-KNN decreased with the inclusion of demographic data, whereas similar performance was observed for the PM which could be explained by the fact that the PM uses fewer parameters as compared to KNN and DTW-KNN methods and is thus less susceptible to the risk of overfitting. 
More importantly the accuracy of the PM can be increased by using a greater number of quantile data points and by the inclusion of additional demographic and clinical data, providing a substantial advantage over KNN and DTW-KNN methods.

  16. Mean and standard deviation values of operating characteristics (OC), for...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 24, 2022
    Cite
    Antunes, Marília; Medeiros, Ana Margarida; Alves, Ana Catarina; Bourbon, Mafalda; Albuquerque, João (2022). Mean and standard deviation values of operating characteristics (OC), for different classification algorithms and techniques to cope with data imbalance, and values obtained with SB criteria. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000295569
    Explore at:
    Dataset updated
    Jun 24, 2022
    Authors
    Antunes, Marília; Medeiros, Ana Margarida; Alves, Ana Catarina; Bourbon, Mafalda; Albuquerque, João
    Description

    Mean and standard deviation values of operating characteristics (OC), for different classification algorithms and techniques to cope with data imbalance, and values obtained with SB criteria.

  17. o

    Sport and leisure facilities

    • data.opendatascience.eu
    Updated Jan 2, 2021
    Cite
    (2021). Sport and leisure facilities [Dataset]. https://data.opendatascience.eu/geonetwork/srv/search?type=dataset
    Explore at:
    Dataset updated
    Jan 2, 2021
    Description

    Overview: 142: Areas used for sports, leisure and recreation purposes.
    Traceability (lineage): This dataset was produced with a machine learning framework with several input datasets, specified in detail in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3 ).
    Scientific methodology: The single-class probability layers were generated with a spatiotemporal ensemble machine learning framework detailed in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3 ). The single-class uncertainty layers were calculated by taking the standard deviation of the three single-class probabilities predicted by the three components of the ensemble. The HCL (hard class) layers represent the class with the highest probability as predicted by the ensemble.
    Usability: The HCL layers have a decreasing average accuracy (weighted F1-score) at each subsequent level in the CLC hierarchy. These metrics are 0.83 at level 1 (5 classes), 0.63 at level 2 (14 classes), and 0.49 at level 3 (43 classes). This means that the hard-class maps are more reliable when aggregating classes to a higher level in the hierarchy (e.g. 'Discontinuous Urban Fabric' and 'Continuous Urban Fabric' to 'Urban Fabric'). Some single-class probabilities may more closely represent actual patterns for some classes that were overshadowed by unequal sample point distributions. Users are encouraged to set their own thresholds when postprocessing these datasets to optimize the accuracy for their specific use case.
    Uncertainty quantification: Uncertainty is quantified by taking the standard deviation of the probabilities predicted by the three components of the spatiotemporal ensemble model.
    Data validation approaches: The LULC classification was validated through spatial 5-fold cross-validation as detailed in the accompanying publication.
    Completeness: The dataset has chunks of empty predictions in regions with complex coast lines (e.g. the Zeeland province in the Netherlands and the Mar da Palha bay area in Portugal). These are artifacts that will be avoided in subsequent versions of the LULC product.
    Consistency: The accuracy of the predictions was compared per year and per 30km*30km tile across Europe to derive temporal and spatial consistency by calculating the standard deviation. The standard deviation of annual weighted F1-score was 0.135, while the standard deviation of weighted F1-score per tile was 0.150. This means the dataset is more consistent through time than through space: predictions are notably less accurate along the Mediterranean coast. The accompanying publication contains additional information and visualisations.
    Positional accuracy: The raster layers have a resolution of 30m, identical to that of the Landsat data cube used as input features for the machine learning framework that predicted it.
    Temporal accuracy: The dataset contains predictions and uncertainty layers for each year between 2000 and 2019.
    Thematic accuracy: The maps reproduce the Corine Land Cover classification system, a hierarchical legend that consists of 5 classes at the highest level, 14 classes at the second level, and 44 classes at the third level. Class 523: Oceans was omitted due to computational constraints.
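
The uncertainty and hard-class layers described above can be illustrated with a minimal sketch. It assumes the three ensemble components each output a per-class probability array, that the ensemble probability is their plain mean, and that the standard deviation is the population form (`ddof=0`); none of these details are confirmed by the listing, and `summarize_ensemble` is a hypothetical name.

```python
import numpy as np

def summarize_ensemble(probs):
    """Summarize per-class probabilities from several ensemble components.

    probs: array-like of shape (n_models, n_classes, n_pixels).
    Returns (mean_prob, uncertainty, hard_class).
    """
    probs = np.asarray(probs, dtype=float)
    # Assumption: the ensemble probability is the mean over components.
    mean_prob = probs.mean(axis=0)
    # Uncertainty layer: standard deviation across the components.
    uncertainty = probs.std(axis=0)
    # Hard-class layer: index of the class with the highest probability.
    hard_class = mean_prob.argmax(axis=0)
    return mean_prob, uncertainty, hard_class
```

Thresholding `mean_prob` per class, instead of taking the argmax, is one way to follow the listing's suggestion that users set their own thresholds.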

  18. f

    The median of the classification accuracies from the constant position and...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Apr 10, 2025
    + more versions
    Cite
    Hong, Kihun; Schofield, Jonathon S.; Sagastume, Giancarlo K.; Young, Peyton R.; Whittle, Richard S.; Battraw, Marcus A.; Winslow, Eden J. (2025). The median of the classification accuracies from the constant position and varied grasped loads tests reported with the interquartile range and the standard deviation. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002061782
    Explore at:
    Dataset updated
    Apr 10, 2025
    Authors
    Hong, Kihun; Schofield, Jonathon S.; Sagastume, Giancarlo K.; Young, Peyton R.; Whittle, Richard S.; Battraw, Marcus A.; Winslow, Eden J.
    Description

    The median of the classification accuracies from the constant position and varied grasped loads tests reported with the interquartile range and the standard deviation.

  19. Educational Time Series Data

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite
    Ayman M. (2025). Educational Time Series Data [Dataset]. https://www.kaggle.com/datasets/csmohamedayman/educational-time-series-data
    Explore at:
    zip(3322790 bytes)Available download formats
    Dataset updated
    Nov 29, 2025
    Authors
    Ayman M.
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description
    • This dataset is a feature-engineered time series dataset created from the Tutorial: Tutorial-TSA-EDA-Time Series Data notebook. It includes a wide range of engineered temporal, rolling, statistical, and lag-based features suitable for time-series forecasting, anomaly detection, and exploratory data analysis.

    The dataset contains:

    • Original target variable transformations (lags, differences, rolling statistics, exponential moving averages, etc.)
    • Date-based features (year, month, day, day of year, weekend flags, leap year, season, etc.)
    • Advanced statistical features (volatility, skewness, kurtosis, entropy, Sharpe ratio, drawdown)
    • Trend and detrended components
    • Multiple target encodings (binary_target, multiclass_target)

    This dataset is ideal for practicing:

    • Feature selection for time series
    • Forecasting model training
    • EDA on engineered features
    • Multi-output regression and classification tasks

    Columns Description:

    Primary Core Features

    • date: Daily timestamp (1995-01-01 onward); primary index
    • ts-1: Time series feature #1 (T-1 period); core time series
    • ts-2: Time series feature #2 (T-2 period); core time series
    • ts-3: Time series feature #3 (T-3 period); core time series
    • ts-4: Time series feature #4 (T-4 period); core time series
    • numerical_target: Primary regression target; sum of all 4 time series features (ts-1 + ts-2 + ts-3 + ts-4); calculated target
    • multiclass_target: Multiclass classification target; quantile-based discretization of numerical_target into 4 equal groups (quartiles)
    • binary_target: Binary classification target; derived from multiclass_target (classes 0-1 -> 0, classes 2-3 -> 1)
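
As a rough illustration of how the three targets relate, the sketch below rebuilds them from the four core series with pandas. The function name `add_targets` is hypothetical, and using `pd.qcut` for the quartile split is an assumption; the listing does not say which tool the author used.

```python
import pandas as pd

def add_targets(df):
    """Derive the three target columns from ts-1..ts-4 (illustrative only)."""
    out = df.copy()
    ts_cols = ["ts-1", "ts-2", "ts-3", "ts-4"]
    # Regression target: row-wise sum of the four core series.
    out["numerical_target"] = out[ts_cols].sum(axis=1)
    # Multiclass target: quartile index 0..3 of the regression target.
    # Note: qcut raises an error if heavy ties make bin edges non-unique.
    out["multiclass_target"] = pd.qcut(
        out["numerical_target"], q=4, labels=False
    )
    # Binary target: quartiles 0-1 map to 0, quartiles 2-3 map to 1.
    out["binary_target"] = (out["multiclass_target"] >= 2).astype(int)
    return out
```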

    Lag & Difference Features

    • numerical_target_lag_1: Target value from 1 period ago (lag 1)
    • numerical_target_lag_7: Target value from 7 periods ago (lag 7)
    • numerical_target_diff1: Difference between current and previous target (1-period)
    • numerical_target_diff7: Difference between current target and target 7 periods ago (7-period)
    • numerical_target_pct_change_1: Percentage change from previous period (1-period)
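
A minimal pandas sketch of these lag and difference columns follows. It assumes rows are already sorted by date with one row per period; `add_lag_features` is a hypothetical helper, not the notebook's code.

```python
import pandas as pd

def add_lag_features(df, col="numerical_target"):
    """Add lag, difference and percentage-change columns for `col`."""
    out = df.copy()
    s = out[col]
    out[f"{col}_lag_1"] = s.shift(1)          # value one period ago
    out[f"{col}_lag_7"] = s.shift(7)          # value seven periods ago
    out[f"{col}_diff1"] = s.diff(1)           # current minus previous
    out[f"{col}_diff7"] = s.diff(7)           # current minus 7 periods ago
    out[f"{col}_pct_change_1"] = s.pct_change(1)  # fractional change
    return out
```

The first rows of each new column are NaN because no earlier value exists; whether the published dataset drops or fills those rows is not stated.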

    Rolling Statistics Features

    • numerical_target_roll_mean7: Rolling mean, 7 periods
    • numerical_target_roll_mean30: Rolling mean, 30 periods
    • numerical_target_roll_std7: Rolling standard deviation, 7 periods
    • numerical_target_roll_min7: Rolling minimum, 7 periods
    • numerical_target_roll_min30: Rolling minimum, 30 periods
    • numerical_target_roll_max7: Rolling maximum, 7 periods
    • numerical_target_roll_max30: Rolling maximum, 30 periods
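
The rolling statistics can be sketched with `Series.rolling`. The trailing (right-aligned) window and the default requirement of a full window before producing a value are assumptions, since the listing does not describe window alignment; `add_rolling_features` is a hypothetical name.

```python
import pandas as pd

def add_rolling_features(df, col="numerical_target"):
    """Add trailing-window mean, min, max (7 and 30) and std (7) columns."""
    out = df.copy()
    s = out[col]
    for w in (7, 30):
        r = s.rolling(window=w)  # trailing window ending at the current row
        out[f"{col}_roll_mean{w}"] = r.mean()
        out[f"{col}_roll_min{w}"] = r.min()
        out[f"{col}_roll_max{w}"] = r.max()
    # The listing only includes a 7-period rolling standard deviation.
    out[f"{col}_roll_std7"] = s.rolling(window=7).std()
    return out
```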

    Volatility & Risk Metrics

    • numerical_target_volatility_7: Rolling volatility, 7 periods
    • numerical_target_volatility_30: Rolling volatility, 30 periods
    • numerical_target_sharpe_7: Sharpe ratio (risk-adjusted return), 7 periods
    • numerical_target_sharpe_30: Sharpe ratio, 30 periods
    • numerical_target_drawdown: Maximum drawdown from peak
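
The listing does not give formulas for these metrics, so the sketch below uses common textbook definitions as assumptions: volatility as the rolling standard deviation of one-period returns, the Sharpe ratio as rolling mean return divided by rolling standard deviation (risk-free rate taken as zero, no annualization), and drawdown as the fractional drop from the running peak. The dataset's actual columns may be computed differently, and `add_risk_features` is a hypothetical name.

```python
import pandas as pd

def add_risk_features(df, col="numerical_target", window=7):
    """Add volatility, Sharpe-like ratio and drawdown columns (assumed formulas)."""
    out = df.copy()
    returns = out[col].pct_change(1)          # one-period fractional returns
    roll = returns.rolling(window=window)
    # Assumed definition: volatility = rolling std of returns.
    out[f"{col}_volatility_{window}"] = roll.std()
    # Assumed definition: Sharpe = rolling mean return / rolling std.
    out[f"{col}_sharpe_{window}"] = roll.mean() / roll.std()
    # Drawdown: fractional distance below the highest value seen so far.
    running_peak = out[col].cummax()
    out[f"{col}_drawdown"] = out[col] / running_peak - 1.0
    return out
```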

    Statistical Distribution Features

    • numerical_target_var_7: Variance, 7 periods
    • numerical_target_var_30: Variance, 30 periods
    • numerical_target_skewness: Distribution skewness
    • numerical_target_kurtosis: Distribution kurtosis (tail heaviness)
    • numerical_target_entropy: Information entropy

    Trend & Seasonality Features

    • numerical_target_trend_7: Linear trend component, 7 periods
    • numerical_target_trend_30: Linear trend component, 30 periods
    • numerical_target_detrended_7: Original minus trend (detrended), 7 periods
    • numerical_target_detrended_30: Original minus trend (detrended), 30 periods
    • numerical_target_vs_seasonal_30: Seasonal component, 30 periods

    Date-Based Features

    • date_year: Year (encoded); numerical
    • date_month: Month (encoded); 0-11
    • date_day: Day of month; 1-31
    • date_dayofyear: Day of year; 1-365
    • date_weekofyear: Week number; 1-52
    • date_quarter: Quarter; 1-4
    • date_semester: Semester; 1-2
    • date_season: Season; 1-4
    • date_isweekend: Weekend flag; 0/1
    • date_isleapyear: Leap year flag; 0/1
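
Most of these calendar columns map directly onto the pandas `.dt` accessor, as the sketch below shows. Several choices are assumptions: the year is left unencoded, the month is shifted to the listed 0-11 range, ISO week numbering is used, and `date_season` is omitted because the listing does not say how months map to seasons. `add_date_features` is a hypothetical name.

```python
import pandas as pd

def add_date_features(df, date_col="date"):
    """Add calendar features derived from `date_col` (illustrative only)."""
    out = df.copy()
    d = pd.to_datetime(out[date_col]).dt
    out["date_year"] = d.year                 # raw year; the encoding is unknown
    out["date_month"] = d.month - 1           # shifted to the listed 0-11 range
    out["date_day"] = d.day
    out["date_dayofyear"] = d.dayofyear       # 366 occurs in leap years
    out["date_weekofyear"] = d.isocalendar().week.astype(int)  # ISO week
    out["date_quarter"] = d.quarter
    out["date_semester"] = (d.quarter + 1) // 2   # Q1-Q2 -> 1, Q3-Q4 -> 2
    out["date_isweekend"] = (d.dayofweek >= 5).astype(int)  # Sat=5, Sun=6
    out["date_isleapyear"] = d.is_leap_year.astype(int)
    return out
```

Note that ISO week numbers and leap-year day counts can exceed the 1-52 and 1-365 ranges given in the listing.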

    Smoothing & Transformation Features

    • numerical_target_ewm_0.3: Exponential moving average, alpha=0.3
    • numerical_target_ewm_0.7: Exponential moving average, alpha=0.7
    • numerical_target_ratio_lag_1: Ratio to lag-1 value
    • numerical_target_ratio_lag_7: Ratio to lag-7 value
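
The smoothing and ratio columns can be sketched as follows. The recursive form of the exponential moving average (`adjust=False`, i.e. y_t = (1 - alpha) * y_{t-1} + alpha * x_t) is an assumption, since the listing only states the alpha values; `add_smoothing_features` is a hypothetical name.

```python
import pandas as pd

def add_smoothing_features(df, col="numerical_target"):
    """Add exponential moving averages and ratio-to-lag columns."""
    out = df.copy()
    s = out[col]
    for alpha in (0.3, 0.7):
        # Assumption: recursive EMA, y_t = (1 - alpha) * y_{t-1} + alpha * x_t.
        out[f"{col}_ewm_{alpha}"] = s.ewm(alpha=alpha, adjust=False).mean()
    for lag in (1, 7):
        # Ratio of the current value to the value `lag` periods earlier.
        out[f"{col}_ratio_lag_{lag}"] = s / s.shift(lag)
    return out
```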

    Dataset Structure Summary

    • Target Variables (3):
      • numerical_target - Continuous target for regression
      • multiclass_target - 4-class classification (quartiles of numerical_target)
      • binary_target - 2-class classification (first 2 vs last 2 quartiles)
    • Feature Categories:
      • Core Time Series (5): ts-1 through ts-4 + date
      • Engineered Features (39): Various transformations of numerical_target
      • Total Columns: 47
  20. t

    Grid files from two AUV missions in the DISCOL area during the SONNE cruise...

    • service.tib.eu
    Updated Nov 30, 2024
    + more versions
    Cite
    (2024). Grid files from two AUV missions in the DISCOL area during the SONNE cruise SO242/1 - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-892662
    Explore at:
    Dataset updated
    Nov 30, 2024
    Description

    The zip file contains grid files in UTM 16S resulting from AUV multibeam data processing and a table with descriptions of these grid files. AUV bathymetry data resulted from interpolation of multibeam depth measurements using the IDW algorithm in SAGA GIS. The AUV bathymetric derivatives (Bathymetric Position Index, Concavity, LS factor, and Terrain Ruggedness Index) were calculated in SAGA GIS. The slope derivative was calculated in ArcMap. The AUV backscatter statistics (10th quantile, 90th quantile, mean and mode) were calculated in FMGT Geocoder. The Bayesian classification map was created in SAGA GIS using data from Bayesian classification in Matlab. The ISODATA classification map was created in SAGA GIS using the AUV backscatter statistics, and the Random Forest predictive map was created using the MGET toolbox in ArcMap and the AUV bathymetry, bathymetric derivatives and backscatter statistics data.
