https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Click Mintaka
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT: The aim of this study is to identify homogeneous rainfall zones for the winter crop and the first and second summer crops in the state of Paraná, Brazil. The zones were defined by clustering using the expectation-maximization (EM) algorithm to transform seasonal rainfall series. Monthly average rainfall data collected from 157 weather stations over 20 years (1996 to 2015) were employed. The results show that the number of homogeneous zones varied among growing seasons. The summer crop presented two clusters, with rainfall averages of 1489 and 1925 mm; the second crop presented four clusters, with averages of 1849, 1004, 1454, and 1182 mm; and the winter crop had three clusters, with averages of 969, 1498, and 1171 mm. Clustering was a useful instrument for identifying geographical regions with similar rainfall regimes during different growing seasons in the state of Paraná. Rainfall distribution was more homogeneous in the summer crop. In all crops analyzed, the clusters with the lowest rainfall were located in the northwestern, north-central, and northern pioneer regions of the state of Paraná, whereas the clusters with the highest rainfall were found in the coastal regions.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All the cluster sets generated by the K-means clustering algorithm in Weka, the drug-drug interaction network generation data, and the network results are provided, along with the code that was used to identify the number of clusters.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data were collected from Java code written by the authors and from Weka packages, which were used to execute ECA* on 32 heterogeneous, multi-featured datasets against its counterpart algorithms (KM, KM++, EM, LVQ, and GENCLUST++). Each of these algorithms was run thirty times on each of the 32 benchmark dataset problems to evaluate the performance of ECA* against its competing algorithms.
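The repeated-run protocol described above can be sketched as follows. This is a minimal illustration only: it uses a plain 1-D k-means (Lloyd's algorithm) as a stand-in, not ECA* or any Weka implementation, and the toy data, the function names `kmeans_sse` and `repeated_runs`, and the choice of sum of squared errors (SSE) as the score are all assumptions made for the example.

```python
import random
import statistics


def kmeans_sse(points, k, seed, iterations=20):
    """Run Lloyd's k-means once on 1-D points and return the final SSE."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers (assumed scheme)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group;
        # an empty group keeps its previous center.
        centers = [statistics.fmean(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    # Score: squared distance from each point to its closest center.
    return sum(min((p - c) ** 2 for c in centers) for p in points)


def repeated_runs(points, k, runs=30):
    """Run the algorithm `runs` times with different seeds and summarize."""
    scores = [kmeans_sse(points, k, seed) for seed in range(runs)]
    return {
        "runs": len(scores),
        "mean": statistics.fmean(scores),
        "stdev": statistics.stdev(scores),
        "best": min(scores),
    }


# Toy data, made up for illustration: two loose groups of values.
toy = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]
summary = repeated_runs(toy, k=2)
print(summary)
```

Averaging over seeded runs, rather than reporting a single run, reduces the influence of a lucky or unlucky random initialization on the comparison.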
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Despite the unquestionable advantages of Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry Imaging in visualizing the spatial distribution and the relative abundance of biomolecules directly on-tissue, the resulting data are complex and high-dimensional. Therefore, analysis and interpretation of this huge amount of information is mathematically, statistically, and computationally challenging. Areas covered: This article reviews some of the challenges in data elaboration, with particular emphasis on machine learning techniques employed in clinical applications, and can serve as a general entry point for those who want to study the computational aspects. Several characteristics of data processing are described, highlighting their advantages and disadvantages. Different approaches to data elaboration focused on clinical applications are also provided. A practical tutorial based on the Orange Canvas and Weka software is included to help readers become familiar with the data processing. Expert commentary: Recently, MALDI-MSI has gained considerable attention and has been employed for research and diagnostic purposes, with successful results. Data dimensionality constitutes an important issue, and statistical methods for information-preserving data reduction represent one of the most challenging aspects. The most common data reduction methods are characterized by collecting independent observations into a single table. However, the incorporation of relational information can improve the discriminatory capability of the data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview of Data
The data is provided as a Weka .arff file. It contains 94 independent variables and 1 dependent variable.
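A minimal sketch of how the attribute layout of such a file could be inspected without Weka. It assumes the dependent variable is the last declared attribute, which the description above does not state, and the function name `split_arff_attributes` and the sample header are hypothetical. The sketch does not handle quoted attribute names that contain spaces.

```python
def split_arff_attributes(arff_text):
    """Return (independent_names, dependent_name) from an ARFF header.

    Assumes the dependent variable is the last @attribute declared.
    """
    names = []
    for line in arff_text.splitlines():
        stripped = line.strip()
        lowered = stripped.lower()  # ARFF keywords are case-insensitive
        if lowered.startswith("@data"):
            break  # the header ends where the data section begins
        if lowered.startswith("@attribute"):
            # Declaration format: @attribute <name> <type>
            names.append(stripped.split()[1])
    return names[:-1], names[-1]


# Hypothetical three-attribute header, used only for illustration.
sample = """@relation example
@attribute f1 numeric
@attribute f2 numeric
@attribute label {yes,no}
@data
0.1,0.2,yes
"""
independent, dependent = split_arff_attributes(sample)
print(len(independent), dependent)
```

Applied to the file described above, the same check would be expected to report 94 independent attribute names if the last-attribute assumption holds.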
Paper Abstract
Modern requirements tracing tools employ information retrieval methods to automatically generate candidate links. Due to the inherent trade-off between recall and precision, such methods cannot achieve a high coverage without also retrieving a great number of false positives, causing a significant drop in result accuracy. In this paper, we propose an approach to improving the quality of candidate link generation for the requirements tracing process. We base our research on the cluster hypothesis which suggests that correct and incorrect links can be grouped in high-quality and low-quality clusters respectively. Result accuracy can thus be enhanced by identifying and filtering out low-quality clusters. We describe our approach by investigating three open-source datasets, and further evaluate our work through an industrial study. The results show that our approach outperforms a baseline pruning strategy and that improvements are still possible.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Churn prediction aims to detect customers who intend to leave a service provider. Gaining a new customer costs an organization 5 to 10 times more than retaining an existing one. Predictive models can correctly identify likely churners in the near future so that a retention solution can be offered. This paper presents a new prediction model based on Data Mining (DM) techniques. The proposed model is composed of six steps: identify the problem domain, data selection, investigate the data set, classification, clustering, and knowledge usage. A data set with 23 attributes and 5000 instances is used; 4000 instances were used for training the model and 1000 instances were used as a testing set. The predicted churners are clustered into 3 categories for use in a retention strategy. The data mining techniques used in this paper are Decision Tree, Support Vector Machine, and Neural Network, applied through the open-source software WEKA.
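The 4000/1000 partition mentioned above can be sketched as a simple holdout split. The abstract does not say how instances were assigned to each set, so the seeded random shuffle here is an assumption, as are the function name `split_instances` and the placeholder instances.

```python
import random


def split_instances(instances, n_train, seed=0):
    """Shuffle a copy of the instances and cut it into train and test parts."""
    shuffled = list(instances)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder instances: integer IDs standing in for 5000 customer records.
instances = list(range(5000))
train, test = split_instances(instances, n_train=4000)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so that different classifiers can be compared on the same testing set.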
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes two days of flight trajectory information over the South West (SW) Functional Airspace Block (FAB). The flight trajectories have been decomposed into segments with individual information regarding three-dimensional deviations and a global predictability non-conformance metric. The assignment of segments to 8 clusters, obtained with the MakeDensityBasedClusterer algorithm using SimpleKMeans as the clusterer in Weka 3.9.1, is included in this dataset. The attributes acting as explanatory variables are:
* Segment type
* 2D deviation
* Delta flight level threshold normalised at the destination point
* Delay range at destination point
* Point of origin (A)
* Point of destination (B)
* Diversion point (C)
* Is origin point an airport
* Is destination point an airport
* P_norm (Predictability_normalised)
The attributes acting as response variables are also detailed in this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
24-VNTR cluster ID numbers were assigned according to cluster size (n°1 for the largest). Isolate IDs are those stated in S1 Table. SIT = Short International Type. Major clusters of the 2004–2008 Netherlands RIVM collection (n≥10).