9 datasets found

Clustering Dataset3 using Weka
kaggle.com
Updated Sep 20, 2024
Click Mintaka (2024). Clustering Dataset3 using Weka [Dataset]. https://www.kaggle.com/datasets/muhammadismailo/clustering-dataset-using-weka/code
Available metadata format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 20, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Click Mintaka
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Click Mintaka

Released under CC0: Public Domain

Data from: IDENTIFICATION OF HOMOGENEOUS RAINFALL ZONES DURING GRAIN CROPS...
scielo.figshare.com
Updated Jun 2, 2023
Allan R. Lopes; Jonatas Marcolin; Jerry A. Johann; Márcio A. Vilas Boas; Adilson R. Schuelter (2023). IDENTIFICATION OF HOMOGENEOUS RAINFALL ZONES DURING GRAIN CROPS IN PARANÁ, BRAZIL [Dataset]. http://doi.org/10.6084/m9.figshare.11350766.v1
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.11350766.v1
Dataset updated
Jun 2, 2023
Dataset provided by
SciELO journals
Authors
Allan R. Lopes; Jonatas Marcolin; Jerry A. Johann; Márcio A. Vilas Boas; Adilson R. Schuelter
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
State of Paraná, Brazil
Description
ABSTRACT The aim of this study is to identify homogeneous rainfall zones for the winter crop and the first (summer) and second crops in the state of Paraná, Brazil. The zones were defined by clustering seasonal rainfall series with the expectation-maximization (EM) algorithm. Monthly average rainfall data collected from 157 weather stations over 20 years (1996 to 2015) were used. The number of homogeneous zones varied among growing seasons: the summer crop had two clusters, with rainfall averages of 1489 and 1925 mm; the second crop had four clusters, with averages of 1849, 1004, 1454, and 1182 mm; and the winter crop had three clusters, with averages of 969, 1498, and 1171 mm. Clustering proved to be a useful tool for identifying geographical regions with similar rainfall regimes during different growing seasons in Paraná. Rainfall distribution was most homogeneous during the summer crop. In all crops analyzed, the clusters with the lowest rainfall were located in the northwestern, north-central, and Northern Pioneer (Norte Pioneiro) regions of the state, whereas the clusters with the highest rainfall were found in the coastal regions.
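As a rough illustration of what EM clustering does, the following sketch fits a one-dimensional Gaussian mixture to a single rainfall total per station. This is a simplifying assumption: the study clustered whole seasonal rainfall series rather than one value per station, and the function name `em_gmm_1d`, the quantile-based initialisation and the variance floor are illustrative choices, not the authors' settings.

```python
import math


def em_gmm_1d(xs, k, iters=100, var_floor=1e-6):
    """Cluster 1-D values with a k-component Gaussian mixture fitted by EM.

    Returns (means, weights, labels); labels[i] is the most probable
    component for xs[i]. A simplified illustration, not the study's pipeline.
    """
    n = len(xs)
    s = sorted(xs)
    # Deterministic start: component means at evenly spaced quantiles.
    means = [s[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point,
        # computed in log space so distant points do not underflow to zero.
        resp = []
        for x in xs:
            logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            e = [math.exp(lg - top) for lg in logs]
            tot = sum(e)
            resp.append([ei / tot for ei in e])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, var_floor)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, weights, labels
```

For example, `em_gmm_1d(totals, 2)` on a list of per-station seasonal totals (in mm) would return two component means and a zone label per station. Choosing the number of components (the study reports two to four, depending on the season) is a separate model-selection step.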
Data for: Clustering Based Drug-Drug Interaction Networks for Possible...
data.mendeley.com
Updated May 4, 2018
Anum Munir (2018). Data for: Clustering Based Drug-Drug Interaction Networks for Possible Repositioning of Drugs against EGFR Mutations [Dataset]. http://doi.org/10.17632/ht3n5tyzcy.1
Unique identifier
https://doi.org/10.17632/ht3n5tyzcy.1
Dataset updated
May 4, 2018
Authors
Anum Munir
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
All the cluster sets generated by the k-means clustering algorithm in the Weka tool are provided, together with the data used to generate the drug-drug interaction networks, the resulting networks, and the code that was used to identify the number of clusters.
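A minimal, standard-library sketch of Lloyd's k-means follows. It is not Weka's SimpleKMeans, and the deterministic farthest-point initialisation and the function names are illustrative assumptions. The function also returns the within-cluster sum of squared errors (SSE), which allows different values of k to be compared.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialisation.

    Returns (centroids, labels, sse).
    """
    # Start from the first point, then repeatedly add the point farthest
    # from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse
```

One common heuristic for the number of clusters, the elbow method, runs `kmeans` for k = 1, 2, 3, ... and picks the k after which the SSE stops dropping sharply. The description does not say whether the authors' code used this heuristic.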
Custering Results of evolutionary clustering algorithm star for clustering...
data.mendeley.com
Updated Mar 9, 2021
Tarik A. Rashid (2021). Custering Results of evolutionary clustering algorithm star for clustering heterogeneous datasets [Dataset]. http://doi.org/10.17632/bsn4vh3zv7.2
Unique identifier
https://doi.org/10.17632/bsn4vh3zv7.2
Dataset updated
Mar 9, 2021
Authors
Tarik A. Rashid
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data were collected by running the authors' Java code and Weka packages to execute ECA* on 32 heterogeneous, multi-featured datasets alongside its counterpart algorithms (KM, KM++, EM, LVQ, and GENCLUST++). Each algorithm was run thirty times on each of the 32 benchmark datasets to evaluate the performance of ECA* against its competing algorithms.
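The evaluation protocol described above, with every algorithm run thirty times on each dataset, can be sketched as a loop that collects repeated scores and summarises them by mean and sample standard deviation. The scoring callable `run` is a hypothetical stand-in: the description does not say which performance measure was recorded, and this is not the authors' Java code.

```python
import statistics


def evaluate(algorithms, datasets, run, n_runs=30):
    """Run every algorithm n_runs times on every dataset and summarise the scores.

    `run(algorithm, dataset, seed)` is a caller-supplied (hypothetical) function
    returning one numeric score per run.
    Returns {(algorithm, dataset): (mean, sample_stdev)}.
    """
    summary = {}
    for ds in datasets:
        for algo in algorithms:
            # A different seed per run exposes variation due to random initialisation.
            scores = [run(algo, ds, seed) for seed in range(n_runs)]
            summary[(algo, ds)] = (statistics.mean(scores), statistics.stdev(scores))
    return summary
```

Repeating runs matters for algorithms with random initialisation, such as k-means, because a single run can be unrepresentative of typical performance.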
Data from: Machine learning approaches in MALDI-MSI: clinical applications
tandf.figshare.com
Updated Jun 1, 2023
Manuel Galli; Italo Zoppis; Andrew Smith; Fulvio Magni; Giancarlo Mauri (2023). Machine learning approaches in MALDI-MSI: clinical applications [Dataset]. http://doi.org/10.6084/m9.figshare.3458753.v1
Available download formats: pptx
Unique identifier
https://doi.org/10.6084/m9.figshare.3458753.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Taylor & Francis
Authors
Manuel Galli; Italo Zoppis; Andrew Smith; Fulvio Magni; Giancarlo Mauri
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: Despite the unquestionable advantages of Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry Imaging (MALDI-MSI) in visualizing the spatial distribution and relative abundance of biomolecules directly on tissue, the data it yields are complex and high-dimensional. Analyzing and interpreting this large amount of information is therefore mathematically, statistically and computationally challenging.

Areas covered: This article reviews some of the challenges in data processing, with particular emphasis on machine learning techniques employed in clinical applications, and can serve as a general entry point for those who want to study the computational aspects. Several characteristics of data processing are described, highlighting their advantages and disadvantages. Different approaches to data processing focused on clinical applications are also presented. A practical tutorial based on the Orange Canvas and Weka software is included to help readers become familiar with the data processing.

Expert commentary: MALDI-MSI has recently gained considerable attention and has been employed, with successful results, for research and diagnostic purposes. Data dimensionality is an important issue, and statistical methods for information-preserving data reduction represent one of the most challenging aspects. The most common data reduction methods collect independent observations into a single table; however, incorporating relational information can improve the discriminatory capability of the data.
KC1
zenodo.org
Updated Jan 24, 2020
A. Gunes Koru (2020). KC1 [Dataset]. http://doi.org/10.5281/zenodo.268441
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.268441
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
A. Gunes Koru
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview of Data

The data are provided as a Weka ARFF (.arff) file. It contains 94 independent variables and 1 dependent variable.
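Because the data ship as an ARFF file, a quick sanity check is to read the header and confirm the 94 + 1 attribute count. The parser below is a minimal sketch that extracts only `@attribute` names (quoted or unquoted) and ignores their types. It assumes the dependent variable is the last declared attribute, which is a common Weka convention but is not confirmed by the description.

```python
def arff_attributes(lines):
    """Return attribute names from an ARFF header, in declaration order."""
    names = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()  # ARFF keywords are case-insensitive
        if lower.startswith("@data"):
            break  # the header ends where the data section begins
        if lower.startswith("@attribute"):
            rest = line[len("@attribute"):].strip()
            if rest[0] in "'\"":
                end = rest.index(rest[0], 1)  # closing quote of a quoted name
                names.append(rest[1:end])
            else:
                names.append(rest.split()[0])
    return names
```

For example, `names = arff_attributes(open("kc1.arff"))` (the file name is hypothetical) followed by `independent, dependent = names[:-1], names[-1]` should yield 94 independent attributes if the file matches the description and the class attribute is declared last.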

Paper Abstract

Modern requirements tracing tools employ information retrieval methods to automatically generate candidate links. Because of the inherent trade-off between recall and precision, such methods cannot achieve high coverage without also retrieving a large number of false positives, which causes a significant drop in result accuracy. In this paper, we propose an approach to improving the quality of candidate link generation for the requirements tracing process. We base our research on the cluster hypothesis, which suggests that correct and incorrect links can be grouped into high-quality and low-quality clusters, respectively. Result accuracy can thus be enhanced by identifying and filtering out low-quality clusters. We describe our approach by investigating three open-source datasets and further evaluate it through an industrial study. The results show that our approach outperforms a baseline pruning strategy and that further improvements are still possible.
Data from: A Proposed Churn Prediction Model
figshare.com
Updated Feb 24, 2019
Mona Nasr; Essam Shaaban; Yehia Helmy; Dr. Ayman Khedr (2019). A Proposed Churn Prediction Model [Dataset]. http://doi.org/10.6084/m9.figshare.7763183.v2
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.7763183.v2
Dataset updated
Feb 24, 2019
Dataset provided by
figshare
Authors
Mona Nasr; Essam Shaaban; Yehia Helmy; Dr. Ayman Khedr
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Churn prediction aims to detect customers who intend to leave a service provider. Retaining one customer costs an organization 5 to 10 times less than gaining a new one. Predictive models can identify likely churners in the near future so that a retention solution can be offered. This paper presents a new prediction model based on Data Mining (DM) techniques. The proposed model consists of six steps: identifying the problem domain, data selection, investigating the data set, classification, clustering, and knowledge usage. A data set with 23 attributes and 5000 instances is used: 4000 instances for training the model and 1000 instances as a testing set. The predicted churners are clustered into 3 categories for use in a retention strategy. The data mining techniques used in this paper are Decision Tree, Support Vector Machine and Neural Network, applied through the open-source software WEKA.
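The 4000/1000 division of the 5000 instances is an 80/20 hold-out split. The description does not say how the training instances were chosen, so the sketch below assumes a random split with a fixed seed for reproducibility; `holdout_split` and its parameters are hypothetical.

```python
import random


def holdout_split(rows, n_train, seed=42):
    """Randomly split rows into (train, test) with n_train training rows.

    A copy is shuffled with a seeded generator, so the input list is left
    unchanged and the same seed always gives the same split.
    """
    if not 0 <= n_train <= len(rows):
        raise ValueError("n_train must be between 0 and len(rows)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

For a 5000-instance dataset, `train, test = holdout_split(instances, 4000)` gives 4000 training and 1000 test instances. Weka's Explorer offers a comparable "Percentage split" test option.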
Trajectory segments with detail predictability performance metrics and...
data.mendeley.com
Updated Apr 23, 2018
Rocío Barragán Montes (2018). Trajectory segments with detail predictability performance metrics and clusterisation for two days of traffic over South West Functional Airspace Block [Dataset]. http://doi.org/10.17632/s3m26t7f6v.1
Unique identifier
https://doi.org/10.17632/s3m26t7f6v.1
Dataset updated
Apr 23, 2018
Authors
Rocío Barragán Montes
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset includes two days of flight trajectory information over the South West (SW) Functional Airspace Block (FAB). The flight trajectories have been decomposed into segments, each with individual information on three-dimensional deviations and a global predictability non-conformance metric. The dataset also includes each segment's assignment to one of 8 clusters, obtained in Weka 3.9.1 with the MakeDensityBasedClusterer algorithm using SimpleKMeans as the wrapped clusterer. The attributes acting as explanatory variables are:
* Segment type
* 2D deviation
* Delta flight level threshold normalised at the destination point
* Delay range at destination point
* Point of origin (A)
* Point of destination (B)
* Diversion point (C)
* Is origin point an airport
* Is destination point an airport
* P_norm (Predictability_normalised)

The attributes acting as response variables are also detailed in this dataset.
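MakeDensityBasedClusterer wraps another clusterer (here SimpleKMeans) so that it can return a membership distribution and a density; Weka's documentation describes it as fitting normal and discrete distributions within each cluster produced by the wrapped clusterer. The sketch below imitates that idea for numeric attributes only: given hard cluster labels (for example from a k-means run), it fits a prior and an independent normal per attribute for each cluster, then applies Bayes' rule to obtain membership probabilities. It is an approximation under these assumptions, not Weka's code; the function names and the `min_std` floor are illustrative.

```python
import math


def fit_cluster_densities(points, labels, k, min_std=1e-6):
    """For each cluster 0..k-1, fit a prior and per-attribute normals (mean, std).

    Assumes numeric attributes and that every cluster has at least one member.
    """
    n = len(points)
    model = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            raise ValueError(f"cluster {j} is empty")
        normals = []
        for column in zip(*members):
            mu = sum(column) / len(column)
            sd = math.sqrt(sum((v - mu) ** 2 for v in column) / len(column))
            normals.append((mu, max(sd, min_std)))  # floor avoids zero variance
        model.append((len(members) / n, normals))
    return model


def cluster_probabilities(model, x):
    """Posterior P(cluster | x) via Bayes' rule, computed in log space."""
    log_scores = []
    for prior, normals in model:
        s = math.log(prior)
        for v, (mu, sd) in zip(x, normals):
            s += -math.log(sd * math.sqrt(2 * math.pi)) - (v - mu) ** 2 / (2 * sd * sd)
        log_scores.append(s)
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]
```

With the explanatory attributes listed above, categorical ones such as segment type would need a discrete distribution instead of a normal, which this sketch omits.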
Major clusters of the 2004–2008 Netherlands RIVM collection (n≥10).
plos.figshare.com
Updated May 30, 2023
Jérôme Azé; Christophe Sola; Jian Zhang; Florian Lafosse-Marin; Memona Yasmin; Rubina Siddiqui; Kristin Kremer; Dick van Soolingen; Guislaine Refrégier (2023). Major clusters of the 2004–2008 Netherlands RIVM collection (n≥10). [Dataset]. http://doi.org/10.1371/journal.pone.0130912.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0130912.t002
Dataset updated
May 30, 2023
Dataset provided by
PLOS ONE
Authors
Jérôme Azé; Christophe Sola; Jian Zhang; Florian Lafosse-Marin; Memona Yasmin; Rubina Siddiqui; Kristin Kremer; Dick van Soolingen; Guislaine Refrégier
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
24-VNTR cluster ID numbers were assigned according to cluster size (no. 1 for the largest). Isolate IDs are those given in S1 Table. SIT = Short International Type.
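The numbering rule in the legend (cluster ID 1 for the largest cluster) and the table's n ≥ 10 cut-off can be expressed in a few lines. The sketch assumes each isolate is represented by a string encoding its 24-locus VNTR profile; the profile strings in the test and the function name are hypothetical placeholders, not data from the table.

```python
from collections import Counter


def major_clusters(profiles, min_size=10):
    """Number clusters by size (ID 1 = largest) and keep those with >= min_size isolates.

    profiles: one genotype string per isolate; isolates sharing a string form a cluster.
    Returns {cluster_id: (profile, size)}.
    """
    # most_common() sorts by count, largest first; equal counts keep the
    # order in which profiles were first encountered.
    ranked = Counter(profiles).most_common()
    return {cid: (profile, size)
            for cid, (profile, size) in enumerate(ranked, start=1)
            if size >= min_size}
```

Singleton profiles are not clusters, but because they sort last they do not change the IDs of real clusters. The legend does not state a tie-breaking rule for clusters of equal size, so the first-encountered order used here is an assumption.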