15 datasets found
  1. Relevance and Redundancy ranking: Code and Supplementary material

    • springernature.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Arvind Kumar Shekar; Tom Bocklisch; Patricia Iglesias Sanchez; Christoph Nikolas Straehle; Emmanuel Mueller (2023). Relevance and Redundancy ranking: Code and Supplementary material [Dataset]. http://doi.org/10.6084/m9.figshare.5418706.v1
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Arvind Kumar Shekar; Tom Bocklisch; Patricia Iglesias Sanchez; Christoph Nikolas Straehle; Emmanuel Mueller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the code for Relevance and Redundancy ranking (RaR), an efficient filter-based feature ranking framework that evaluates relevance based on multi-feature interactions, as well as redundancy, on mixed datasets. Source code is in .scala and .sbt format and metadata in .xml, all of which can be opened and edited with standard, openly available text editors. Diagrams are in the openly accessible .png format.

    Supplementary_2.pdf: results of experiments on multiple classifiers, along with parameter settings and a description of how KLD converges to mutual information based on its symmetry.
    dataGenerator.zip: synthetic data generator inspired by the NIPS 2001 Workshop on Variable and Feature Selection (http://www.clopinet.com/isabelle/Projects/NIPS2001/).
    rar-mfs-master.zip: Relevance and Redundancy Framework, containing an overview diagram, example datasets, source code and metadata. Details on installing and running it are provided below.

    Background. Feature ranking helps to gain knowledge about a high-dimensional dataset and to identify its relevant features. However, in many datasets some features by themselves have only a small correlation with the target classes, yet in combination with other features they are strongly correlated with the target. In other words, multiple features interact with each other. Ranking features based on these interactions is necessary for better analysis and classifier performance, but evaluating the interactions on large datasets is computationally challenging. Furthermore, datasets often contain features with redundant information, which hurts both the efficiency and the generalization capability of the classifier. The major challenge is therefore to efficiently rank features by relevance and redundancy on mixed datasets.
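The interaction effect described above can be made concrete with a small Scala sketch that is not part of rar-mfs. Assuming a target defined as the XOR of two binary features, each feature alone carries zero mutual information about the target, while the pair determines it completely. The object and method names are hypothetical.

```scala
// Illustrative sketch (not rar-mfs code): empirical mutual information in bits
// between discrete values and class labels, used to show a feature interaction.
object InteractionDemo {
  private val Ln2 = math.log(2.0)

  // Shannon entropy (bits) of the empirical distribution of xs.
  def entropy[A](xs: Seq[A]): Double = {
    val n = xs.size.toDouble
    xs.groupBy(identity).values.map { group =>
      val p = group.size / n
      -p * math.log(p) / Ln2
    }.sum
  }

  // I(X; Y) = H(X) + H(Y) - H(X, Y), clamped at 0 to absorb rounding noise.
  def mutualInformation[A, B](x: Seq[A], y: Seq[B]): Double =
    math.max(0.0, entropy(x) + entropy(y) - entropy(x.zip(y)))

  def main(args: Array[String]): Unit = {
    // All four combinations of two binary features; the target is f1 XOR f2.
    val f1 = Seq(0, 0, 1, 1)
    val f2 = Seq(0, 1, 0, 1)
    val target = f1.zip(f2).map { case (a, b) => a ^ b }

    println(f"I(f1; y)     = ${mutualInformation(f1, target)}%.3f bits")
    println(f"I(f2; y)     = ${mutualInformation(f2, target)}%.3f bits")
    println(f"I(f1, f2; y) = ${mutualInformation(f1.zip(f2), target)}%.3f bits")
  }
}
```

Running `main` should report 0 bits for each feature on its own and 1 bit for the pair, which is exactly the situation where a univariate filter would discard two features that are jointly decisive.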
In the related publication, we propose a filter-based framework based on Relevance and Redundancy (RaR). RaR computes a single score that quantifies feature relevance while taking into account both interactions between features and redundancy. The top-ranked features of RaR are characterized by maximum relevance and non-redundancy. The evaluation on synthetic and real-world datasets demonstrates that our approach outperforms several state-of-the-art feature selection techniques.

# Relevance and Redundancy Framework (rar-mfs)

rar-mfs is an algorithm for feature selection and can be employed to select features from labelled data sets. The Relevance and Redundancy Framework (RaR), which is the theory behind the implementation, is a novel feature selection algorithm that

- works on large data sets (polynomial runtime),
- can handle differently typed features (e.g. nominal features and continuous features), and
- handles multivariate correlations.

## Installation

The tool is written in Scala and uses the Weka framework to load and handle data sets. You can either run it independently, providing the data as an .arff or .csv file, or include the algorithm as a (Maven / Ivy) dependency in your project. As an example data set we use heart-c.

### Project dependency

The project is published to Maven Central.
To depend on the project use:

- Maven:

```xml
<dependency>
  <groupId>de.hpi.kddm</groupId>
  <artifactId>rar-mfs_2.11</artifactId>
  <version>1.0.2</version>
</dependency>
```

- sbt:

```sbt
libraryDependencies += "de.hpi.kddm" %% "rar-mfs" % "1.0.2"
```

To run the algorithm use:

```scala
import java.io.File
import de.hpi.kddm.rar._

// ...
val dataSet = de.hpi.kddm.rar.Runner.loadCSVDataSet(new File("heart-c.csv"), isNormalized = false, "")

// `config` stands for your own settings object (it is not defined by rar-mfs).
val algorithm = new RaRSearch(
  HicsContrastPramsFA(numIterations = config.samples, maxRetries = 1, alphaFixed = config.alpha, maxInstances = 1000),
  RaRParamsFixed(k = 5, numberOfMonteCarlosFixed = 5000, parallelismFactor = 4))

algorithm.selectFeatures(dataSet)
```

### Command line tool

- EITHER download the prebuilt binary, which only requires a recent Java installation (>= 6):
  1. download the prebuilt jar from the releases tab (latest)
  2. run `java -jar rar-mfs-1.0.2.jar --help`

  Using the prebuilt jar, here is an example usage:

```sh
rar-mfs > java -jar rar-mfs-1.0.2.jar arff --samples 100 --subsetSize 5 --nonorm heart-c.arff
Feature Ranking:
1 - age (12)
2 - sex (8)
3 - cp (11)
...
```

- OR build the repository on your own:
  1. make sure sbt is installed
  2. clone the repository
  3. run `sbt run`

  Simple example using sbt directly after cloning the repository:

```sh
rar-mfs > sbt "run arff --samples 100 --subsetSize 5 --nonorm heart-c.arff"
Feature Ranking:
1 - age (12)
2 - sex (8)
3 - cp (11)
...
```

### [Optional]

To speed up the algorithm, consider using a fast solver such as Gurobi (http://www.gurobi.com/). Install the solver and put the provided gurobi.jar on the Java classpath.

## Algorithm

### Idea

Abstract overview of the different steps of the proposed feature selection algorithm:

![Algorithm Overview](https://github.com/tmbo/rar-mfs/blob/master/docu/images/algorithm_overview.png)

The Relevance and Redundancy ranking framework (RaR) is a method able to handle large-scale data sets and data sets with mixed features. Instead of directly selecting a subset, a feature ranking gives a more detailed overview of the relevance of the features.
The method consists of a multistep approach where we

1. repeatedly sample subsets from the whole feature space and examine their relevance and redundancy: exploration of the search space to gather more and more knowledge about the relevance and redundancy of features
2. deduce scores for features based on the scores of the subsets
3. create the best possible ranking given the sampled insights.

### Parameters

| Parameter | Default value | Description |
| ---------- | ------------- | ------------ |
| m - contrast iterations | 100 | Number of different slices to evaluate while comparing marginal and conditional probabilities |
| alpha - subspace slice size | 0.01 | Fraction of all instances to use as part of a slice, which is used to compare distributions |
| n - sampling iterations | 1000 | Number of different subsets to select in the sampling phase |
| k - sample set size | 5 | Maximum size of the subsets to be selected in the sampling phase |
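The three steps and the parameters m, alpha, n and k can be illustrated with a heavily simplified Scala sketch. It is not the rar-mfs implementation. Assumptions: a subset is scored with a HiCS-style slice contrast (the average KL divergence between the class distribution inside a random slice and the overall class distribution, i.e. conditional versus marginal), step 2 simply averages the scores of the subsets a feature took part in, and redundancy is ignored. The parameter n is called `nSamples` here, and all names are hypothetical.

```scala
import scala.util.Random

// Heavily simplified illustration of sample -> score -> deduce -> rank.
// NOT the rar-mfs implementation: redundancy is ignored and per-feature
// scores are plain averages of subset scores (an assumption).
object SamplingRankSketch {

  // Class distribution over the given rows with add-one smoothing, so that the
  // KL divergence below is always finite.
  def classDist(labels: Array[Int], rows: Iterable[Int], numClasses: Int): Array[Double] = {
    val counts = Array.fill(numClasses)(1.0)
    rows.foreach(r => counts(labels(r)) += 1.0)
    val total = counts.sum
    counts.map(_ / total)
  }

  // KL(p || q) in nats.
  def kl(p: Array[Double], q: Array[Double]): Double =
    p.indices.map(i => p(i) * math.log(p(i) / q(i))).sum

  // Contrast of a feature subset: m times, keep the rows whose values fall into a
  // random contiguous block (by rank) of every feature in the subset, then compare
  // the class distribution in that slice with the overall class distribution.
  def contrast(data: Array[Array[Double]], labels: Array[Int], subset: Seq[Int],
               m: Int, alpha: Double, rnd: Random): Double = {
    val n = data.length
    val numClasses = labels.max + 1
    val overall = classDist(labels, 0 until n, numClasses)
    // Per-feature block size chosen so the joint slice keeps roughly alpha * n rows
    // when the features are independent.
    val block = math.max(1, (math.pow(alpha, 1.0 / subset.size) * n).toInt)
    val orders = subset.map(f => (0 until n).sortBy(i => data(i)(f)))
    val scores = (1 to m).map { _ =>
      val slice = orders.map { order =>
        val start = rnd.nextInt(n - block + 1)
        order.slice(start, start + block).toSet
      }.reduce(_ intersect _)
      if (slice.isEmpty) 0.0 else kl(classDist(labels, slice, numClasses), overall)
    }
    scores.sum / m
  }

  // Steps 1-3: sample nSamples random subsets of size 1..k, score each, average the
  // scores of the subsets each feature took part in, and sort features by that average.
  def rank(data: Array[Array[Double]], labels: Array[Int], nSamples: Int, k: Int,
           m: Int, alpha: Double, seed: Long = 42L): Seq[(Int, Double)] = {
    val rnd = new Random(seed)
    val d = data.head.length
    val sums = Array.fill(d)(0.0)
    val counts = Array.fill(d)(0)
    for (_ <- 1 to nSamples) {
      val size = 1 + rnd.nextInt(math.min(k, d))
      val subset = rnd.shuffle((0 until d).toList).take(size)
      val score = contrast(data, labels, subset, m, alpha, rnd)
      subset.foreach { f => sums(f) += score; counts(f) += 1 }
    }
    (0 until d)
      .map(f => f -> (if (counts(f) == 0) 0.0 else sums(f) / counts(f)))
      .sortBy(-_._2)
  }
}
```

For example, `SamplingRankSketch.rank(data, labels, nSamples = 200, k = 2, m = 20, alpha = 0.2)` on data where only feature 0 separates the classes should put feature 0 first. Averaging is the simplest possible way to deduce feature scores from subset scores. Because it credits a noise feature whenever it is sampled together with a relevant one, the real framework's deduction step is more elaborate.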

  2. Maryland Natural Filters - Nitrogen Buffer Priority

    • data.imap.maryland.gov
    Updated Mar 30, 2016
    Cite
    ArcGIS Online for Maryland (2016). Maryland Natural Filters - Nitrogen Buffer Priority [Dataset]. https://data.imap.maryland.gov/datasets/28eb001a3b7a4daeb004bcb7cebc13aa
    Dataset updated
    Mar 30, 2016
    Dataset authored and provided by
    ArcGIS Online for Maryland
    Description

    The Natural Filter Buffer Priorities for Water Quality (Nitrogen) layers identify priority forest/grass buffer opportunities by subwatershed (MD HUC 8). The Natural Filter Buffer Targeting layers were used as a baseline for suitability rankings. Land use, hydrology, soil, and landscape characteristics were analyzed to rank buffer opportunities with high nitrogen removal potential. This is a MD iMAP hosted service layer. Find more information at https://imap.maryland.gov. Feature Service Layer Link: https://geodata.md.gov/imap/rest/services/Environment/MD_NaturalFilters/MapServer/1

  3. MD iMAP: Maryland Natural Filters - Nitrogen Buffer Priority

    • data.wu.ac.at
    • opendata.maryland.gov
    csv, json, xml
    Updated Jul 13, 2017
    Cite
    ArcGIS Online for Maryland (2017). MD iMAP: Maryland Natural Filters - Nitrogen Buffer Priority [Dataset]. https://data.wu.ac.at/schema/data_maryland_gov/ZnJ0dS11Z2ph
    Available download formats: csv, json, xml
    Dataset updated
    Jul 13, 2017
    Dataset provided by
    ArcGIS Online for Maryland
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Area covered
    Maryland
    Description

    This is a MD iMAP hosted service layer. Find more information at http://imap.maryland.gov. The Natural Filter Buffer Priorities for Water Quality (Nitrogen) layers identify priority forest/grass buffer opportunities by subwatershed (MD HUC 8). The Natural Filter Buffer Targeting layers were used as a baseline for suitability rankings. Land use, hydrology, soil, and landscape characteristics were analyzed to rank buffer opportunities with high nitrogen removal potential. Last Updated: 6/9/2014. Feature Service Layer Link: http://geodata.md.gov/imap/rest/services/Environment/MD_NaturalFilters/MapServer/1

    ADDITIONAL LICENSE TERMS: The Spatial Data and the information therein (collectively "the Data") is provided "as is" without warranty of any kind, either expressed, implied, or statutory. The user assumes the entire risk as to quality and performance of the Data. No guarantee of accuracy is granted, nor is any responsibility for reliance thereon assumed. In no event shall the State of Maryland be liable for direct, indirect, incidental, consequential, or special damages of any kind. The State of Maryland does not accept liability for any damages or misrepresentation caused by inaccuracies in the Data or as a result of changes to the Data, nor is there responsibility assumed to maintain the Data in any manner or form. The Data can be freely distributed as long as the metadata entry is not modified or deleted. Any data derived from the Data must acknowledge the State of Maryland in the metadata.

  4. Maryland Natural Filters - Nutrient Wetland Priority

    • data-maryland.opendata.arcgis.com
    Updated Jun 9, 2014
    Cite
    ArcGIS Online for Maryland (2014). Maryland Natural Filters - Nutrient Wetland Priority [Dataset]. https://data-maryland.opendata.arcgis.com/items/17687b2f33a74a36a1ccd6e21aa3ffe1
    Dataset updated
    Jun 9, 2014
    Dataset authored and provided by
    ArcGIS Online for Maryland
    Description

    The Natural Filter Wetland Priorities for Water Quality layers identify priority wetland restoration opportunities by subwatershed (MD HUC 8). The Natural Filter Wetland Targeting layers were used as a baseline for suitability rankings. Land use, hydrology, soil, and landscape characteristics were analyzed to rank wetland restoration opportunities with high nutrient removal potential. This is a MD iMAP hosted service layer. Find more information at https://imap.maryland.gov. Feature Service Layer Link: https://geodata.md.gov/imap/rest/services/Environment/MD_NaturalFilters/MapServer/2

  5. MD iMAP: Maryland Natural Filters - Nutrient Wetland Priority

    • catalog.data.gov
    • opendata.maryland.gov
    Updated May 10, 2025
    Cite
    opendata.maryland.gov (2025). MD iMAP: Maryland Natural Filters - Nutrient Wetland Priority [Dataset]. https://catalog.data.gov/dataset/md-imap-maryland-natural-filters-nutrient-wetland-priority
    Dataset updated
    May 10, 2025
    Dataset provided by
    opendata.maryland.gov
    Area covered
    Maryland
    Description

    This is a MD iMAP hosted service layer. Find more information at http://imap.maryland.gov. The Natural Filter Wetland Priorities for Water Quality layers identify priority wetland restoration opportunities by subwatershed (MD HUC 8). The Natural Filter Wetland Targeting layers were used as a baseline for suitability rankings. Land use, hydrology, soil, and landscape characteristics were analyzed to rank wetland restoration opportunities with high nutrient removal potential. Last Updated: 6/9/2014. Feature Service Layer Link: https://mdgeodata.md.gov/imap/rest/services/Environment/MD_NaturalFilters/MapServer

    ADDITIONAL LICENSE TERMS: The Spatial Data and the information therein (collectively "the Data") is provided "as is" without warranty of any kind, either expressed, implied, or statutory. The user assumes the entire risk as to quality and performance of the Data. No guarantee of accuracy is granted, nor is any responsibility for reliance thereon assumed. In no event shall the State of Maryland be liable for direct, indirect, incidental, consequential, or special damages of any kind. The State of Maryland does not accept liability for any damages or misrepresentation caused by inaccuracies in the Data or as a result of changes to the Data, nor is there responsibility assumed to maintain the Data in any manner or form. The Data can be freely distributed as long as the metadata entry is not modified or deleted. Any data derived from the Data must acknowledge the State of Maryland in the metadata.

  6. Time complexity.

    • plos.figshare.com
    xls
    Updated Oct 8, 2024
    Cite
    Xiaotong Bai; Yuefeng Zheng; Yang Lu; Yongtao Shi (2024). Time complexity. [Dataset]. http://doi.org/10.1371/journal.pone.0311602.t007
    Available download formats: xls
    Dataset updated
    Oct 8, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Xiaotong Bai; Yuefeng Zheng; Yang Lu; Yongtao Shi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Hybrid feature selection algorithms combine different feature selection methods to overcome the limitations of a single method and to improve the effectiveness and performance of feature selection. In this paper, we propose a new hybrid feature selection algorithm named Tandem Maximum Kendall Minimum Chi-Square and ReliefF Improved Grey Wolf Optimization (TMKMCRIGWO). The algorithm consists of two stages. First, the original features are filtered and ranked with the bivariate filter algorithm Maximum Kendall Minimum Chi-Square (MKMC) to form a candidate feature subset S1. Next, the features in S1 are filtered and sorted with ReliefF in tandem to form a candidate feature subset S2, and finally S2 is passed to the wrapper algorithm, which selects the optimal subset. The wrapper algorithm is an improved Grey Wolf Optimization (IGWO) algorithm based on random disturbance factors, whose parameters are adjusted to vary randomly so that the population remains diverse. Hybrid algorithms that combine filter and wrapper algorithms in tandem perform better than single algorithms on complex problems. Three sets of comparison experiments were conducted to demonstrate the superiority of this algorithm over the others. The experimental results show that the average classification accuracy of TMKMCRIGWO is at least 0.1% higher than that of the other algorithms on 20 datasets, and that the average dimension reduction rate (DRR) reaches 24.76%. The DRR reached 41.04% on 12 low-dimensional datasets and 0.33% on 8 high-dimensional datasets. The results also show that the algorithm improves the generalization ability and performance of the model.
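The "Maximum Kendall" half of the MKMC filter ranks features by their rank correlation with the class. A minimal Scala sketch of that relevance part only, assuming Kendall's tau-b (which corrects for the many ties a class label produces) and ranking by its absolute value; the chi-square redundancy term, ReliefF and the IGWO wrapper are omitted, and the names are hypothetical.

```scala
// Sketch of the relevance half of an MKMC-style filter only: rank features by
// |Kendall tau-b| between each feature column and the class label.
object KendallRanking {
  // O(n^2) Kendall tau-b with tie correction; adequate for small illustrative data.
  def tauB(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.size == y.size, "x and y must have the same length")
    var concordant = 0L
    var discordant = 0L
    var tiedOnlyX = 0L
    var tiedOnlyY = 0L
    for (i <- x.indices; j <- (i + 1) until x.size) {
      val dx = math.signum(x(i) - x(j))
      val dy = math.signum(y(i) - y(j))
      if (dx == 0 && dy == 0) () // tied in both: ignored by tau-b
      else if (dx == 0) tiedOnlyX += 1
      else if (dy == 0) tiedOnlyY += 1
      else if (dx == dy) concordant += 1
      else discordant += 1
    }
    // Denominator: sqrt(pairs not tied in x * pairs not tied in y).
    val notTiedX = concordant + discordant + tiedOnlyY
    val notTiedY = concordant + discordant + tiedOnlyX
    val denom = math.sqrt(notTiedX.toDouble * notTiedY.toDouble)
    if (denom == 0.0) 0.0 else (concordant - discordant) / denom
  }

  /** Feature indices with their tau-b, sorted by decreasing |tau-b| (assumed criterion). */
  def rank(columns: Seq[Seq[Double]], label: Seq[Double]): Seq[(Int, Double)] =
    columns.indices
      .map(f => f -> tauB(columns(f), label))
      .sortBy { case (_, t) => -math.abs(t) }
}
```

In a full MKMC-style filter, this relevance score would be traded off against a chi-square redundancy score with respect to already selected features before S1 is formed.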

  7. Data from: Feature ranking and feature redundancy reduction for prognostic...

    • researchdata.edu.au
    Updated May 5, 2022
    Cite
    Qihua Tan; Mads Thomassen; Kaare Christensen; Torben A. Kruse (2022). Feature ranking and feature redundancy reduction for prognostic microarray study of tumor clinical outcomes [Dataset]. http://doi.org/10.4225/03/5a1372383442b
    Dataset updated
    May 5, 2022
    Dataset provided by
    Monash University
    Authors
    Qihua Tan; Mads Thomassen; Kaare Christensen; Torben A. Kruse
    Description

    Unlike significant gene expression analysis, which looks for all genes that are differentially regulated, feature selection in prognostic gene expression analysis aims to find a subset of informative marker genes that are discriminative for prediction. Unfortunately, feature selection in the microarray literature is dominated by the simple heuristic univariate gene filter paradigm, which selects differentially expressed genes according to their statistical significance. Since the univariate approach does not take into account the correlated or interactive structure among genes, classifiers built on genes selected this way can be less accurate, so more advanced approaches based on multivariate models have to be considered. Here, we introduce a feature ranking method based on forward orthogonal search to assist prognostic gene selection. Application to published gene lists selected by univariate models shows that the feature space can be greatly reduced while improving testing performance. Our results indicate that "significant" features selected using gene-wise approaches can contain irrelevant genes that only serve to complicate model building. Multivariate feature ranking can help to reduce feature redundancy and to select highly informative prognostic marker genes. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
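Forward orthogonal search ranks features greedily. At each step it picks the candidate whose component orthogonal to the already selected features explains the largest share of the outcome, then orthogonalizes the remaining candidates against it. This is why a gene that duplicates an already selected one drops to the bottom. Below is a minimal Scala sketch using Gram-Schmidt and the error reduction ratio (ERR) criterion common in forward orthogonal least squares. Whether the cited study uses exactly this criterion is an assumption, and the names are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

// Greedy forward orthogonal ranking: Gram-Schmidt orthogonalisation plus the
// error reduction ratio (ERR) criterion of forward orthogonal least squares.
object ForwardOrthogonalRanking {
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => a(i) * b(i)).sum

  /** Feature indices in selection order with their ERR values.
    * columns(f) holds feature f over all samples; y is the outcome (ideally centred). */
  def rank(columns: Seq[Array[Double]], y: Array[Double]): Seq[(Int, Double)] = {
    val yy = dot(y, y)
    require(yy > 0, "outcome vector must be non-zero")
    var remaining: Map[Int, Array[Double]] = columns.indices.map(f => f -> columns(f).clone).toMap
    val selected = ArrayBuffer.empty[(Int, Double)]
    while (remaining.nonEmpty) {
      // ERR(w) = (w . y)^2 / ((w . w)(y . y)): share of y's energy explained by w
      // after the already selected directions have been removed from w.
      val errs = remaining.map { case (f, w) =>
        val ww = dot(w, w)
        f -> (if (ww < 1e-12) 0.0 else math.pow(dot(w, y), 2) / (ww * yy))
      }
      val (best, bestErr) = errs.maxBy(_._2)
      selected += (best -> bestErr)
      val q = remaining(best)
      val qq = dot(q, q)
      // Project the chosen direction out of every remaining candidate.
      remaining = (remaining - best).map { case (f, w) =>
        val coef = if (qq < 1e-12) 0.0 else dot(w, q) / qq
        f -> w.indices.map(i => w(i) - coef * q(i)).toArray
      }
    }
    selected.toSeq
  }
}
```

If two columns are exact copies (up to scale), the second one receives an ERR of 0 once the first is selected, which is the redundancy reduction the abstract refers to.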

    Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology; Chetty, Madhu; Ahmad, Shandar; Ngom, Alioune; Teng, Shyh Wei; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia). Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.

  8. Spearman rank correlation for texture features without filtration.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Choong Guen Chee; Young Hoon Kim; Kyoung Ho Lee; Yoon Jin Lee; Ji Hoon Park; Hye Seung Lee; Soyeon Ahn; Bohyoung Kim (2023). Spearman rank correlation for texture features without filtration. [Dataset]. http://doi.org/10.1371/journal.pone.0182883.t004
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Choong Guen Chee; Young Hoon Kim; Kyoung Ho Lee; Yoon Jin Lee; Ji Hoon Park; Hye Seung Lee; Soyeon Ahn; Bohyoung Kim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Spearman rank correlation for texture features without filtration.

  9. Where Will Better Air Filtration Improve Wildfire Resilience?

    • community-climatesolutions.hub.arcgis.com
    Updated Dec 7, 2023
    Cite
    Esri (2023). Where Will Better Air Filtration Improve Wildfire Resilience? [Dataset]. https://community-climatesolutions.hub.arcgis.com/items/b81e16997d804de190b1ad08975515a9
    Dataset updated
    Dec 7, 2023
    Dataset authored and provided by
    Esri
    Description

    Wildfires are a clear and present threat to many communities in the country. That threat continues to grow with continued changes in weather patterns, increasing drought conditions, and development into more rural areas. Climate resilience planning in local communities involves several steps, including assessing vulnerability and risk.

    During a wildfire event, smoke in the air contains a complex mixture of gases and fine particles that may be aggravating and even life-threatening to individuals with chronic heart and lung diseases such as asthma and COPD. The fine particles can also cause health problems such as burning eyes and a runny nose, and illnesses such as bronchitis. Setting up a clean air room in your house can reduce your exposure to wildfire smoke while sheltering indoors. This layer displays census tracts ranked according to which would benefit most from improved access to home air filtration systems such as portable air cleaners and HVAC filters. The ranking is based on a composite index built with the following attributes:

    • Mean Annual Estimated PM2.5 (μg/m3)
    • Current Asthma Crude Prevalence (%)
    • Percent of Housing Units Built before 1970 (%)

    These attribute links take you to the original data sources. Preprocessing was needed to prepare many of these inputs for inclusion in our index. The links are provided for reference only.

    This layer is one of three in a series developed to support local climate resilience planning. Intended as planning tools for policy makers, climate resilience planners, and community members, these layers highlight the areas of the community that are most likely to benefit from the resilience intervention each layer supports.
    Each layer focuses on one specific wildfire resilience intervention that is intended to help mitigate the climate hazard. Improving access to home air filtration systems, either through supply chains or by subsidizing purchases, is vital to reducing exposure to the microscopic particles that can enter your eyes and respiratory system during a wildfire event. Resources to help you before, during, and after a fire can be found on the AirNow.gov website. The wildfire resilience index (WRI) and methodology were developed in collaboration with the US Forest Service's Fire Lab and leverage several assets from the Wildfire Risk to Communities website. Layers in the wildfire hazard intervention series include:

    • Where Will Home Hardening Improve Wildfire Resilience?
    • Where Will Better Air Filtration Improve Wildfire Resilience?
    • Where Will Improved Evacuation Routes Improve Wildfire Resilience?

    Did you know you can build your own climate resilience index, or use ours and customize it? The Customize a climate resilience index tutorial provides more information on the index and walks you through the steps for taking our index and customizing it to your needs, so you can create intervention maps better suited to your location and sourced from your own higher-resolution data. For more information about how Esri enriched the census tracts with exposure, demographic, and environmental data to create composite indices called intervention indices, please read this technical reference. This feature layer was created from the Climate Resilience Planning Census Tracts hosted feature layer view and is one of 18 similar intervention layers, all of which can be found in ArcGIS Living Atlas of the World.
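To illustrate how three attributes can be combined into a single tract ranking, here is a small Scala sketch. The percentile-rank normalization and the equal weights are assumptions made for illustration only; Esri's technical reference describes the actual index construction. The tract IDs and values used with it are hypothetical.

```scala
// Hypothetical composite index: percentile-rank each attribute across tracts,
// average the percentiles with equal weights, and sort tracts by the result.
object CompositeIndexSketch {
  final case class Tract(id: String, pm25: Double, asthmaPct: Double, pre1970Pct: Double)

  // Percentile rank in [0, 1]: share of the other tracts with a strictly lower value.
  private def percentile(values: Seq[Double]): Seq[Double] =
    values.map { v =>
      if (values.size <= 1) 0.0 else values.count(_ < v).toDouble / (values.size - 1)
    }

  def rank(tracts: Seq[Tract]): Seq[(String, Double)] = {
    val p1 = percentile(tracts.map(_.pm25))
    val p2 = percentile(tracts.map(_.asthmaPct))
    val p3 = percentile(tracts.map(_.pre1970Pct))
    tracts.indices
      .map(i => tracts(i).id -> (p1(i) + p2(i) + p3(i)) / 3.0)
      .sortBy(-_._2) // highest composite score = most likely to benefit
  }
}
```

Percentile ranks put PM2.5 (μg/m3) and the two percentages on a common 0 to 1 scale before averaging. Without that step, the attribute with the largest numeric range would dominate the index.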

  10. Phosphorus Sediment Buffer Priority

    • opendata.maryland.gov
    • catalog.data.gov
    Updated Jul 25, 2025
    Cite
    (2025). Phosphorus Sediment Buffer Priority [Dataset]. https://opendata.maryland.gov/dataset/Phosphorus-Sediment-Buffer-Priority/hi3e-fzck
    Available download formats: csv, tsv, xml, application/rdfxml, kmz, application/rssxml, kml, application/geo+json
    Dataset updated
    Jul 25, 2025
    Description

    The Natural Filter Buffer Priorities for Water Quality (Phosphorus / Sediment) layers identify priority forest/grass buffer opportunities by subwatershed (MD HUC 8). The Natural Filter Buffer Targeting layers were used as a baseline for suitability rankings. Land use, hydrology, soil, and landscape characteristics were analyzed to rank buffer opportunities with high phosphorus and sediment removal potential.

  11. Nitrogen Buffer Priority

    • opendata.maryland.gov
    • catalog.data.gov
    Updated Jul 22, 2025
    Cite
    (2025). Nitrogen Buffer Priority [Dataset]. https://opendata.maryland.gov/dataset/Nitrogen-Buffer-Priority/w74g-yixc
    Available download formats: csv, kmz, kml, application/geo+json, application/rdfxml, xml, application/rssxml, tsv
    Dataset updated
    Jul 22, 2025
    Description

    The Natural Filter Buffer Priorities for Water Quality (Nitrogen) layers identify priority forest/grass buffer opportunities by subwatershed (MD HUC 8). The Natural Filter Buffer Targeting layers were used as a baseline for suitability rankings. Land use, hydrology, soil, and landscape characteristics were analyzed to rank buffer opportunities with high nitrogen removal potential.

  12. Nutrient Wetland Priority

    • catalog.data.gov
    • opendata.maryland.gov
    Updated Jul 26, 2025
    Cite
    opendata.maryland.gov (2025). Nutrient Wetland Priority [Dataset]. https://catalog.data.gov/dataset/nutrient-wetland-priority
    Dataset updated
    Jul 26, 2025
    Dataset provided by
    opendata.maryland.gov
    Description

    The Natural Filter Wetland Priorities for Water Quality layers identify priority wetland restoration opportunities by subwatershed (MD HUC 8). The Natural Filter Wetland Targeting layers were used as a baseline for suitability rankings. Land use, hydrology, soil, and landscape characteristics were analyzed to rank wetland restoration opportunities with high nutrient removal potential.

  13. Physiological parameter scores and rankings for different feature selection...

    • figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pengcheng Yang; Taihu Wu; Ming Yu; Feng Chen; Chunchen Wang; Jing Yuan; Jiameng Xu; Guang Zhang (2023). Physiological parameter scores and rankings for different feature selection methods. [Dataset]. http://doi.org/10.1371/journal.pone.0226962.t004
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Pengcheng Yang; Taihu Wu; Ming Yu; Feng Chen; Chunchen Wang; Jing Yuan; Jiameng Xu; Guang Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Physiological parameter scores and rankings for different feature selection methods.

  14. The p-values of the pairwise one-tailed Wilcoxon rank sum tests.

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Chao Bi; Lei Zhang; Miao Qi; Caixia Zheng; Yugen Yi; Jianzhong Wang; Baoxue Zhang (2023). The p-values of the pairwise one-tailed Wilcoxon rank sum tests. [Dataset]. http://doi.org/10.1371/journal.pone.0159084.t008
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Chao Bi; Lei Zhang; Miao Qi; Caixia Zheng; Yugen Yi; Jianzhong Wang; Baoxue Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The p-values of the pairwise one-tailed Wilcoxon rank sum tests.

  15. Hierarchical Gene Selection and Genetic Fuzzy System for Cancer Microarray...

    • plos.figshare.com
    docx
    Updated May 30, 2023
    Cite
    Thanh Nguyen; Abbas Khosravi; Douglas Creighton; Saeid Nahavandi (2023). Hierarchical Gene Selection and Genetic Fuzzy System for Cancer Microarray Data Classification [Dataset]. http://doi.org/10.1371/journal.pone.0120364
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Thanh Nguyen; Abbas Khosravi; Douglas Creighton; Saeid Nahavandi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper introduces a novel approach to gene selection based on a substantial modification of the analytic hierarchy process (AHP). The modified AHP systematically integrates the outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods, namely t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon, and signal-to-noise ratio, are employed to rank genes. These ranked genes are then used as inputs for the modified AHP. Additionally, the paper proposes a method that uses the fuzzy standard additive model (FSAM) for cancer classification based on the genes selected by AHP. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. A genetic algorithm (GA) is incorporated between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional, low-sample-size nature of microarray data and thus enhances the efficiency of the classification. Experiments are carried out on numerous microarray datasets. The results demonstrate that AHP-based gene selection outperforms the single ranking methods. Furthermore, the combination of AHP and FSAM achieves high accuracy in microarray data classification compared to various competing classifiers. The proposed approach is therefore useful for medical practitioners and clinicians as a decision support system that can be implemented in real medical practice.
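Two of the five individual ranking methods listed above have simple closed forms. The following Scala sketch scores a gene by the signal-to-noise ratio (μ1 − μ2)/(σ1 + σ2) and by a t statistic, then ranks genes by absolute score. Assumptions: Welch's unequal-variance form for the t-test, sample standard deviations, and hypothetical names. The sketch stops before the modified AHP step, which integrates such rankings as described in the paper.

```scala
// Two individual filter scores for a gene, given its expression values
// in class 1 and class 2 (each class needs at least two samples).
object GeneFilterScores {
  private def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Sample variance with an (n - 1) denominator.
  private def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / (xs.size - 1)
  }

  /** Signal-to-noise ratio: (mean1 - mean2) / (sd1 + sd2). */
  def snr(c1: Seq[Double], c2: Seq[Double]): Double =
    (mean(c1) - mean(c2)) / (math.sqrt(variance(c1)) + math.sqrt(variance(c2)))

  /** Welch's t statistic: (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2). */
  def welchT(c1: Seq[Double], c2: Seq[Double]): Double =
    (mean(c1) - mean(c2)) / math.sqrt(variance(c1) / c1.size + variance(c2) / c2.size)

  /** Gene indices ordered by decreasing |score|. */
  def rankBy(genes: Seq[(Seq[Double], Seq[Double])],
             score: (Seq[Double], Seq[Double]) => Double): Seq[Int] =
    genes.indices.sortBy(g => -math.abs(score(genes(g)._1, genes(g)._2)))
}
```

Each scoring function yields its own gene ordering. An aggregation step such as the paper's modified AHP is what turns several of these orderings into one selection.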

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Arvind Kumar Shekar; Tom Bocklisch; Patricia Iglesias Sanchez; Christoph Nikolas Straehle; Emmanuel Mueller (2023). Relevance and Redundancy ranking: Code and Supplementary material [Dataset]. http://doi.org/10.6084/m9.figshare.5418706.v1

Relevance and Redundancy ranking: Code and Supplementary material

Related Article
Explore at:
pdfAvailable download formats
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Arvind Kumar Shekar; Tom Bocklisch; Patricia Iglesias Sanchez; Christoph Nikolas Straehle; Emmanuel Mueller
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset contains the code for Relevance and Redundancy ranking; a an efficient filter-based feature ranking framework for evaluating relevance based on multi-feature interactions and redundancy on mixed datasets.Source code is in .scala and .sbt format, metadata in .xml, all of which can be accessed and edited in standard, openly accessible text edit software. Diagrams are in openly accessible .png format.Supplementary_2.pdf: contains the results of experiments on multiple classifiers, along with parameter settings and a description of how KLD converges to mutual information based on its symmetricity.dataGenerator.zip: Synthetic data generator inspired from NIPS: Workshop on variable and feature selection (2001), http://www.clopinet.com/isabelle/Projects/NIPS2001/rar-mfs-master.zip: Relevance and Redundancy Framework containing overview diagram, example datasets, source code and metadata. Details on installing and running are provided below.Background. Feature ranking is benfie cial to gain knowledge and to identify the relevant features from a high-dimensional dataset. However, in several datasets, few features by themselves might have small correlation with the target classes, but by combining these features with some other features, they can be strongly correlated with the target. This means that multiple features exhibit interactions among themselves. It is necessary to rank the features based on these interactions for better analysis and classifier performance. However, evaluating these interactions on large datasets is computationally challenging. Furthermore, datasets often have features with redundant information. Using such redundant features hinders both efficiency and generalization capability of the classifier. The major challenge is to efficiently rank the features based on relevance and redundancy on mixed datasets. 
In the related publication, we propose a filter-based framework based on Relevance and Redundancy (RaR). RaR computes a single score that quantifies feature relevance while taking into account both interactions between features and redundancy. The top-ranked features of RaR are characterized by maximum relevance and non-redundancy. The evaluation on synthetic and real-world datasets demonstrates that our approach outperforms several state-of-the-art feature selection techniques.

# Relevance and Redundancy Framework (rar-mfs)

rar-mfs is an algorithm for feature selection and can be employed to select features from labelled data sets. The Relevance and Redundancy Framework (RaR), which is the theory behind the implementation, is a novel feature selection algorithm that

- works on large data sets (polynomial runtime),
- can handle differently typed features (e.g. nominal features and continuous features), and
- handles multivariate correlations.

## Installation

The tool is written in Scala and uses the Weka framework to load and handle data sets. You can either run it independently, providing the data as an .arff or .csv file, or include the algorithm as a (Maven / Ivy) dependency in your project. As an example data set we use heart-c.

### Project dependency

The project is published to Maven Central.
To depend on the project use:

- Maven:

  ```xml
  <dependency>
    <groupId>de.hpi.kddm</groupId>
    <artifactId>rar-mfs_2.11</artifactId>
    <version>1.0.2</version>
  </dependency>
  ```

- sbt:

  ```sbt
  libraryDependencies += "de.hpi.kddm" %% "rar-mfs" % "1.0.2"
  ```

To run the algorithm use:

```scala
import java.io.File
import de.hpi.kddm.rar._
// ...

// Example settings; adjust to your data.
val samples = 100 // contrast iterations, cf. m in the parameter table (assumed mapping)
val alpha = 0.01  // subspace slice size, cf. alpha in the parameter table

val dataSet = de.hpi.kddm.rar.Runner.loadCSVDataSet(new File("heart-c.csv"), isNormalized = false, "")
val algorithm = new RaRSearch(
  HicsContrastPramsFA(numIterations = samples, maxRetries = 1, alphaFixed = alpha, maxInstances = 1000),
  RaRParamsFixed(k = 5, numberOfMonteCarlosFixed = 5000, parallelismFactor = 4))

algorithm.selectFeatures(dataSet)
```

### Command line tool

- EITHER download the prebuilt binary, which requires only an installation of a recent Java version (>= 6):
  1. download the prebuilt jar from the releases tab (latest)
  2. run `java -jar rar-mfs-1.0.2.jar --help`

  Using the prebuilt jar, here is an example usage:

  ```sh
  rar-mfs > java -jar rar-mfs-1.0.2.jar arff --samples 100 --subsetSize 5 --nonorm heart-c.arff
  Feature Ranking:
    1 - age (12)
    2 - sex (8)
    3 - cp (11)
    ...
  ```

- OR build the repository on your own:
  1. make sure sbt is installed
  2. clone the repository
  3. run `sbt run`

  Simple example using sbt directly after cloning the repository:

  ```sh
  rar-mfs > sbt "run arff --samples 100 --subsetSize 5 --nonorm heart-c.arff"
  Feature Ranking:
    1 - age (12)
    2 - sex (8)
    3 - cp (11)
    ...
  ```

### [Optional]

To speed up the algorithm, consider using a fast solver such as Gurobi (http://www.gurobi.com/). Install the solver and put the provided gurobi.jar on the Java classpath.

## Algorithm

### Idea

Abstract overview of the different steps of the proposed feature selection algorithm:

Algorithm Overview: https://github.com/tmbo/rar-mfs/blob/master/docu/images/algorithm_overview.png

The Relevance and Redundancy ranking framework (RaR) is a method able to handle large-scale data sets and data sets with mixed features. Instead of directly selecting a subset, a feature ranking gives a more detailed overview of the relevance of the features.
The method consists of a multistep approach in which we

1. repeatedly sample subsets from the whole feature space and examine their relevance and redundancy, exploring the search space to gather more and more knowledge about the relevance and redundancy of features;
2. deduce scores for features based on the scores of the subsets;
3. create the best possible ranking given the sampled insights.

### Parameters

| Parameter | Default value | Description |
| ---------- | ------------- | ------------ |
| m - contrast iterations | 100 | Number of different slices to evaluate while comparing marginal and conditional probabilities |
| alpha - subspace slice size | 0.01 | Fraction of all instances (0.01 = 1%) used in a slice when comparing distributions |
| n - sampling iterations | 1000 | Number of different subsets to select in the sampling phase |
| k - sample set size | 5 | Maximum size of the subsets to be selected in the sampling phase |
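The three steps listed above can be sketched as follows. This is a hedged illustration, not the rar-mfs implementation: `SampleDeduceRank`, the caller-supplied `scoreSubset` function and the averaging rule in step 2 are hypothetical stand-ins. RaR scores subsets by relevance and redundancy and deduces feature scores in its own way (the optional Gurobi solver mentioned above suggests that this involves an optimization step), whereas this sketch simply averages the scores of the sampled subsets that contain each feature. The arguments `n` and `k` play the roles of the sampling iterations and the sample set size from the parameter table.

```scala
import scala.util.Random

// Hypothetical sketch of "sample, deduce, rank"; not the rar-mfs code.
object SampleDeduceRank {
  // Step 1: draw n random feature subsets, each with 1..min(k, numFeatures) distinct features.
  def sampleSubsets(numFeatures: Int, n: Int, k: Int, rng: Random): Seq[Set[Int]] = {
    require(numFeatures >= 1 && n >= 1 && k >= 1, "numFeatures, n and k must be positive")
    Seq.fill(n) {
      val size = 1 + rng.nextInt(math.min(k, numFeatures))
      rng.shuffle((0 until numFeatures).toList).take(size).toSet
    }
  }

  // Step 2 (stand-in rule): mean score of the sampled subsets containing each feature;
  // a feature that was never sampled gets 0.0.
  def deduceScores(scored: Seq[(Set[Int], Double)], numFeatures: Int): Map[Int, Double] =
    (0 until numFeatures).map { f =>
      val containing = scored.collect { case (s, score) if s.contains(f) => score }
      f -> (if (containing.isEmpty) 0.0 else containing.sum / containing.size)
    }.toMap

  // Step 3: highest deduced score first; ties are broken by the lower feature index.
  def rank(scores: Map[Int, Double]): Seq[Int] =
    scores.toSeq
      .sortWith { case ((f1, s1), (f2, s2)) => if (s1 != s2) s1 > s2 else f1 < f2 }
      .map(_._1)

  // Runs the three steps with a hypothetical, caller-supplied subset scorer.
  def run(numFeatures: Int, n: Int, k: Int, seed: Long)(scoreSubset: Set[Int] => Double): Seq[Int] = {
    val rng = new Random(seed)
    val scored = sampleSubsets(numFeatures, n, k, rng).map(s => s -> scoreSubset(s))
    rank(deduceScores(scored, numFeatures))
  }
}
```

For example, `SampleDeduceRank.run(numFeatures = 6, n = 1000, k = 5, seed = 42L)(s => if (s.contains(0) && s.contains(1)) 1.0 else 0.0)` rewards only subsets that contain both features 0 and 1, an interaction neither feature shows alone, and should place those two features at the top of the ranking.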
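The roles of m and alpha in the parameter table can be illustrated with a slice-based contrast: repeatedly cut a random slice of the data on the features of a subset and compare the class distribution inside the slice (conditional) with the overall class distribution (marginal). The sketch below assumes HiCS-style slicing, in which each of the |S| features keeps a random contiguous block of ceil(N * alpha^(1/|S|)) instances in its sorted order, so that the intersection holds about an alpha fraction of the instances when the features are independent. It also assumes the Kullback-Leibler divergence in bits as the distance and numeric features only. These choices and the names `SliceContrast` and `contrast` are assumptions, not the rar-mfs code.

```scala
import scala.util.Random

// Hypothetical sketch of a slice-based contrast (HiCS-style slicing assumed); not the rar-mfs code.
object SliceContrast {
  // Relative class frequencies among the given instance indices.
  private def distribution(labels: IndexedSeq[Int], idx: Iterable[Int]): Map[Int, Double] = {
    val total = idx.size.toDouble
    idx.groupBy(i => labels(i)).map { case (c, members) => c -> members.size / total }
  }

  // Kullback-Leibler divergence D(p || q) in bits; q(c) > 0 for every class c in p
  // because a slice is a subset of the data.
  private def klBits(p: Map[Int, Double], q: Map[Int, Double]): Double =
    p.map { case (c, pc) => pc * math.log(pc / q(c)) / math.log(2) }.sum

  // Average contrast over m random slices for the feature columns in `subset`.
  def contrast(data: IndexedSeq[IndexedSeq[Double]], labels: IndexedSeq[Int], subset: Seq[Int],
               alpha: Double, m: Int, rng: Random): Double = {
    require(data.nonEmpty && data.size == labels.size, "need exactly one label per instance")
    require(subset.nonEmpty && m >= 1 && alpha > 0.0 && alpha <= 1.0, "invalid subset, m or alpha")
    val n = data.size
    val marginal = distribution(labels, 0 until n)
    // Per-feature block size chosen so the intersection keeps roughly alpha * n instances.
    val blockSize = math.min(n, math.max(1, math.ceil(n * math.pow(alpha, 1.0 / subset.size)).toInt))
    // Instance indices ordered by each feature's value, computed once per feature.
    val orders = subset.map(f => (0 until n).sortWith((a, b) => data(a)(f) < data(b)(f)))
    val perSlice = (1 to m).map { _ =>
      // One random contiguous block per feature; the slice is their intersection.
      val slice = orders.map { order =>
        val start = rng.nextInt(n - blockSize + 1)
        order.slice(start, start + blockSize).toSet
      }.reduce(_ intersect _)
      if (slice.isEmpty) 0.0 else klBits(distribution(labels, slice), marginal)
    }
    perSlice.sum / m
  }
}
```

A feature whose slices separate the classes yields a high average contrast, while a feature whose slices mirror the overall class mix yields a contrast of zero; larger m averages over more slices at a higher cost.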
