Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the author is Dante J. Scala. It features 7 columns including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book publishers. It has 1 row and is filtered where the book publisher is Scala Arts. It features 5 columns including number of books published, authors, earliest publication date, and latest publication date.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To assess the viability of using synthetic populations as substitutes for real populations, we analyse the predictive performance of various machine-learning models using both real and synthetic data. Each model is tasked with predicting weight (or height) based on age, sex, and height (or weight). Case 1: the model undergoes training and testing using survey data. Case 2: an identical model is trained on synthetic population data and tested on the survey data. This table lists the accuracy of each model in predicting outcomes for linear regression and MLP regression.
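The two cases can be sketched in Scala as follows. This is a minimal illustration, not the study's code: it assumes a single predictor (for example height) instead of age, sex, and height, uses closed-form ordinary least squares, and measures error with RMSE, which is an assumption because the description does not name the accuracy metric. The names `TrainTestSketch`, `fit`, and `rmse` are hypothetical.

```scala
object TrainTestSketch {
  /** A fitted one-variable linear model: y = intercept + slope * x. */
  final case class Model(intercept: Double, slope: Double) {
    def predict(x: Double): Double = intercept + slope * x
  }

  /** Ordinary least squares for a single predictor (closed form). */
  def fit(xs: Seq[Double], ys: Seq[Double]): Model = {
    require(xs.size == ys.size && xs.size >= 2, "need at least two paired observations")
    val meanX = xs.sum / xs.size
    val meanY = ys.sum / ys.size
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    require(sxx > 0, "predictor must not be constant")
    val slope = sxy / sxx
    Model(meanY - slope * meanX, slope)
  }

  /** Root-mean-square error of `model` on a test set. */
  def rmse(model: Model, xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.size == ys.size && xs.nonEmpty, "need a non-empty paired test set")
    val squaredErrors = xs.zip(ys).map { case (x, y) =>
      val e = model.predict(x) - y
      e * e
    }
    math.sqrt(squaredErrors.sum / xs.size)
  }
}
```

For Case 1, `fit` would be called on a training split of the survey data and `rmse` on the held-out survey split; for Case 2, `fit` would be called on the synthetic population and `rmse` on the same survey test split, so the two errors are directly comparable.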
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The fraction of the population that has received at least one vaccine dose (vaccine coverage) and the net seropositivity (both infection- and vaccination-derived) are tabulated at the moment schools are reopened, for simulations with daily vaccination rates (DVR) of 0% (no vaccination), 0.2% and 0.4%. The top table shows results for an initial recovered fraction of 30%, and the bottom table corresponds to an initial recovered fraction of 50%.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the code for Relevance and Redundancy ranking, an efficient filter-based feature ranking framework for evaluating relevance based on multi-feature interactions and redundancy on mixed datasets.

Source code is in .scala and .sbt format and metadata in .xml, all of which can be accessed and edited in standard, openly accessible text-editing software. Diagrams are in openly accessible .png format.

- Supplementary_2.pdf: contains the results of experiments on multiple classifiers, along with parameter settings and a description of how KLD converges to mutual information based on its symmetricity.
- dataGenerator.zip: synthetic data generator inspired by the NIPS Workshop on Variable and Feature Selection (2001), http://www.clopinet.com/isabelle/Projects/NIPS2001/
- rar-mfs-master.zip: Relevance and Redundancy Framework containing overview diagram, example datasets, source code and metadata. Details on installing and running are provided below.

Background. Feature ranking is beneficial for gaining knowledge and for identifying the relevant features in a high-dimensional dataset. However, in several datasets, a few features may each have only a small correlation with the target classes on their own, yet be strongly correlated with the target when combined with other features. This means that multiple features exhibit interactions among themselves. It is necessary to rank the features based on these interactions for better analysis and classifier performance. However, evaluating these interactions on large datasets is computationally challenging. Furthermore, datasets often have features with redundant information. Using such redundant features hinders both the efficiency and the generalization capability of the classifier. The major challenge is to efficiently rank the features based on relevance and redundancy on mixed datasets.
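The feature interaction described in the background can be illustrated with a toy XOR target. The sketch below is only an illustration under stated assumptions: it uses plain empirical mutual information as the dependence measure, which is not necessarily how RaR scores relevance, and the names `InteractionSketch`, `entropy`, and `mutualInformation` are hypothetical.

```scala
object InteractionSketch {
  /** Shannon entropy (in bits) of the empirical distribution of `values`. */
  def entropy[A](values: Seq[A]): Double = {
    val n = values.size.toDouble
    values.groupBy(identity).values.map { group =>
      val p = group.size / n
      -p * math.log(p) / math.log(2)
    }.sum
  }

  /** Empirical mutual information I(X; Y) = H(X) + H(Y) - H(X, Y), in bits. */
  def mutualInformation[A, B](xs: Seq[A], ys: Seq[B]): Double =
    entropy(xs) + entropy(ys) - entropy(xs.zip(ys))

  // Toy data (illustrative only): the target is the XOR of two binary features.
  val f1: Seq[Int] = Seq(0, 0, 1, 1)
  val f2: Seq[Int] = Seq(0, 1, 0, 1)
  val target: Seq[Int] = f1.zip(f2).map { case (a, b) => a ^ b }

  // Each feature alone carries no information about the target ...
  val aloneF1: Double = mutualInformation(f1, target)
  val aloneF2: Double = mutualInformation(f2, target)
  // ... but the pair of features determines the target completely.
  val combined: Double = mutualInformation(f1.zip(f2), target)
}
```

On this toy data the single-feature scores are zero bits while the combined score is one bit, which is why a ranking that only looks at features one at a time would miss both of them.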
In the related publication, we propose a filter-based framework based on Relevance and Redundancy (RaR). RaR computes a single score that quantifies feature relevance by considering interactions between features and redundancy. The top-ranked features of RaR are characterized by maximum relevance and non-redundancy. The evaluation on synthetic and real-world datasets demonstrates that our approach outperforms several state-of-the-art feature selection techniques.

# Relevance and Redundancy Framework (rar-mfs)

rar-mfs is an algorithm for feature selection and can be employed to select features from labelled data sets. The Relevance and Redundancy Framework (RaR), which is the theory behind the implementation, is a novel feature selection algorithm that

- works on large data sets (polynomial runtime),
- can handle differently typed features (e.g. nominal features and continuous features), and
- handles multivariate correlations.

## Installation

The tool is written in Scala and uses the Weka framework to load and handle data sets. You can either run it independently, providing the data as an `.arff` or `.csv` file, or include the algorithm as a (maven / ivy) dependency in your project. As an example data set we use heart-c.

### Project dependency

The project is published to Maven Central. To depend on the project use:

- maven
  ```xml
  <dependency>
    <groupId>de.hpi.kddm</groupId>
    <artifactId>rar-mfs_2.11</artifactId>
    <version>1.0.2</version>
  </dependency>
  ```
- sbt:
  ```sbt
  libraryDependencies += "de.hpi.kddm" %% "rar-mfs" % "1.0.2"
  ```

To run the algorithm use

```scala
import java.io.File
import de.hpi.kddm.rar._
// ...
// `config` is not defined in this snippet; supply your own values for `samples` and `alpha`.
val dataSet = de.hpi.kddm.rar.Runner.loadCSVDataSet(new File("heart-c.csv"), isNormalized = false, "")
val algorithm = new RaRSearch(
  HicsContrastPramsFA(numIterations = config.samples, maxRetries = 1, alphaFixed = config.alpha, maxInstances = 1000),
  RaRParamsFixed(k = 5, numberOfMonteCarlosFixed = 5000, parallelismFactor = 4))

algorithm.selectFeatures(dataSet)
```

### Command line tool

- EITHER download the prebuilt binary, which requires only an installation of a recent Java version (>= 6)
  1. download the prebuilt jar from the releases tab (latest)
  2. run `java -jar rar-mfs-1.0.2.jar --help`

  Using the prebuilt jar, here is an example usage:
  ```sh
  rar-mfs > java -jar rar-mfs-1.0.2.jar arff --samples 100 --subsetSize 5 --nonorm heart-c.arff
  Feature Ranking:
    1 - age (12)
    2 - sex (8)
    3 - cp (11)
    ...
  ```
- OR build the repository on your own:
  1. make sure sbt is installed
  2. clone repository
  3. run `sbt run`

  Simple example using sbt directly after cloning the repository:
  ```sh
  rar-mfs > sbt "run arff --samples 100 --subsetSize 5 --nonorm heart-c.arff"
  Feature Ranking:
    1 - age (12)
    2 - sex (8)
    3 - cp (11)
    ...
  ```

### [Optional]

To speed up the algorithm, consider using a fast solver such as Gurobi (http://www.gurobi.com/). Install the solver and put the provided `gurobi.jar` into the Java classpath.
## Algorithm

### Idea

Abstract overview of the different steps of the proposed feature selection algorithm:

<img src="https://github.com/tmbo/rar-mfs/blob/master/docu/images/algorithm_overview.png" alt="Algorithm Overview">

The Relevance and Redundancy ranking framework (RaR) is a method able to handle large-scale data sets and data sets with mixed features. Instead of directly selecting a subset, a feature ranking gives a more detailed overview of the relevance of the features. The method consists of a multistep approach where we

1. repeatedly sample subsets from the whole feature space and examine their relevance and redundancy: exploration of the search space to gather more and more knowledge about the relevance and redundancy of features
2. deduce scores for features based on the scores of the subsets
3. create the best possible ranking given the sampled insights.

### Parameters

| Parameter | Default value | Description |
| ---------- | ------------- | ------------|
| m - contrast iterations | 100 | Number of different slices to evaluate while comparing marginal and conditional probabilities |
| alpha - subspace slice size | 0.01 | Percentage of all instances to use as part of a slice which is used to compare distributions |
| n - sampling iterations | 1000 | Number of different subsets to select in the sampling phase |
| k - sample set size | 5 | Maximum size of the subsets to be selected in the sampling phase |
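The three steps, together with the roles of `n` and `k`, can be sketched as follows. This is a simplified illustration and not the rar-mfs implementation: the subset score is a caller-supplied placeholder function, and step 2 simply averages the scores of the subsets that contain a feature, a stand-in chosen for brevity. Redundancy handling is omitted, and `RankingSketch` and its methods are hypothetical names.

```scala
import scala.util.Random

object RankingSketch {
  /** Placeholder for a subset relevance score; the real measure is not reproduced here. */
  type SubsetScore = Set[Int] => Double

  /** Step 1: sample `n` random feature subsets of size 1..k and score each one. */
  def sampleSubsets(numFeatures: Int, n: Int, k: Int,
                    score: SubsetScore, rng: Random): Seq[(Set[Int], Double)] = {
    require(numFeatures >= 1 && k >= 1, "need at least one feature and k >= 1")
    Seq.fill(n) {
      val size = 1 + rng.nextInt(k)
      val subset = rng.shuffle((0 until numFeatures).toList).take(size).toSet
      (subset, score(subset))
    }
  }

  /** Step 2 (simplified): a feature's score is the mean score of the subsets containing it. */
  def featureScores(numFeatures: Int, samples: Seq[(Set[Int], Double)]): Map[Int, Double] =
    (0 until numFeatures).map { f =>
      val containing = samples.collect { case (s, v) if s.contains(f) => v }
      f -> (if (containing.isEmpty) 0.0 else containing.sum / containing.size)
    }.toMap

  /** Step 3: order features by descending score, breaking ties by feature index. */
  def rank(scores: Map[Int, Double]): Seq[Int] =
    scores.toSeq
      .sortWith { (a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1) }
      .map(_._1)
}
```

With the defaults from the table, a call would look like `RankingSketch.sampleSubsets(numFeatures, n = 1000, k = 5, score, new Random(42))`, followed by `featureScores` and `rank`. Larger `n` explores more of the search space at a higher cost, and `k` bounds the order of feature interactions that a sampled subset can capture.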
1) Test data for an urban energy system.
2) Refer to the following book: M. La Scala, S. Bruno, C. A. Nucci, S. Lamonaca, and U. Stecchi, From smart grids to smart cities: new challenges in optimizing energy grids, John Wiley & Sons, 2017.
3) Refer to my research work "Adaptive Robust Day-ahead Dispatch for Urban Energy Systems" if published online.