9 datasets found
  1. Clustering Dataset3 using Weka

    • kaggle.com
    Updated Sep 20, 2024
    Cite
    Click Mintaka (2024). Clustering Dataset3 using Weka [Dataset]. https://www.kaggle.com/datasets/muhammadismailo/clustering-dataset-using-weka/code
    Dataset updated
    Sep 20, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Click Mintaka
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Click Mintaka

    Released under CC0: Public Domain

    Contents

  2. Data from: IDENTIFICATION OF HOMOGENEOUS RAINFALL ZONES DURING GRAIN CROPS...

    • scielo.figshare.com
    jpeg
    Updated Jun 2, 2023
    Cite
    Allan R. Lopes; Jonatas Marcolin; Jerry A. Johann; Márcio A. Vilas Boas; Adilson R. Schuelter (2023). IDENTIFICATION OF HOMOGENEOUS RAINFALL ZONES DURING GRAIN CROPS IN PARANÁ, BRAZIL [Dataset]. http://doi.org/10.6084/m9.figshare.11350766.v1
    Available download formats: jpeg
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    SciELO journals
    Authors
    Allan R. Lopes; Jonatas Marcolin; Jerry A. Johann; Márcio A. Vilas Boas; Adilson R. Schuelter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    State of Paraná, Brazil
    Description

    Abstract: The aim of this study is to identify homogeneous rainfall zones in the winter and summer 1st and 2nd crops, in the state of Paraná, Brazil. The zones were defined by clustering using the expectation-maximization (EM) algorithm to transform seasonal rainfall series. Monthly average rainfall data collected from 157 weather stations for 20 years (1996 to 2015) were employed. The results show that the number of homogeneous zones varied among growing seasons. The summer crop presented two clusters, with rainfall averages of 1489 and 1925 mm; the second crop presented four clusters, with averages of 1849, 1004, 1454, and 1182 mm; and the winter crop had three clusters, with averages of 969, 1498, and 1171 mm. Clustering was a useful instrument to identify geographical regions with similar rainfall regimes during different growing seasons in the state of Paraná. Rainfall distribution was more homogeneous in the summer crop. In all crops analyzed, the clusters with the lowest rainfall rate were present in the northwestern, northern center, and northern pioneer of the state of Paraná, whereas the clusters with the highest rainfall rate were found in the coastal regions.

  3. Data for: Clustering Based Drug-Drug Interaction Networks for Possible...

    • data.mendeley.com
    Updated May 4, 2018
    Cite
    Anum Munir (2018). Data for: Clustering Based Drug-Drug Interaction Networks for Possible Repositioning of Drugs against EGFR Mutations [Dataset]. http://doi.org/10.17632/ht3n5tyzcy.1
    Dataset updated
    May 4, 2018
    Authors
    Anum Munir
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    All the cluster sets generated by the K-means clustering algorithm in Weka are provided, together with the data used to generate the drug-drug interaction networks, the network results, and the code that was used to identify the number of clusters.
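
    For readers unfamiliar with K-means, the idea can be sketched in a few lines of Python. This is plain Lloyd's algorithm, not Weka's implementation and not the code shipped with the dataset; the function name `kmeans`, the seed, and the toy points are illustrative assumptions.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initial centroids: k input points picked at random (an assumption;
    # other initialisation schemes exist).
    centroids = rng.sample(points, k)
    assignments = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave the centroid in place if its cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


# Two well-separated toy groups (made-up values).
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_centroids, toy_labels = kmeans(toy_points, k=2)
```

    The cluster labels themselves are arbitrary (0 or 1); only the grouping of points is meaningful.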

  4. Custering Results of evolutionary clustering algorithm star for clustering...

    • data.mendeley.com
    Updated Mar 9, 2021
    + more versions
    Cite
    Tarik A. Rashid (2021). Custering Results of evolutionary clustering algorithm star for clustering heterogeneous datasets [Dataset]. http://doi.org/10.17632/bsn4vh3zv7.2
    Dataset updated
    Mar 9, 2021
    Authors
    Tarik A. Rashid
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data were collected from Java code written by the authors and from Weka packages, which were used to run ECA* on 32 heterogeneous, multi-featured datasets against its counterpart algorithms (KM, KM++, EM, LVQ, and GENCLUST++). Each of these algorithms was run thirty times on each of the 32 benchmark dataset problems to evaluate the performance of ECA* against its competitors.

  5. Data from: Machine learning approaches in MALDI-MSI: clinical applications

    • tandf.figshare.com
    pptx
    Updated Jun 1, 2023
    Cite
    Manuel Galli; Italo Zoppis; Andrew Smith; Fulvio Magni; Giancarlo Mauri (2023). Machine learning approaches in MALDI-MSI: clinical applications [Dataset]. http://doi.org/10.6084/m9.figshare.3458753.v1
    Available download formats: pptx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Manuel Galli; Italo Zoppis; Andrew Smith; Fulvio Magni; Giancarlo Mauri
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Despite the unquestionable advantages of Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry Imaging in visualizing the spatial distribution and the relative abundance of biomolecules directly on-tissue, the resulting data are complex and high-dimensional. Therefore, analysis and interpretation of this huge amount of information is mathematically, statistically and computationally challenging. Areas covered: This article reviews some of the challenges in data elaboration, with particular emphasis on machine learning techniques employed in clinical applications, and can serve as an entry point for those who want to study the computational aspects. Several characteristics of data processing are described, highlighting their advantages and disadvantages. Different approaches for data elaboration focused on clinical applications are also provided. A practical tutorial based on Orange Canvas and Weka software is included to help readers become familiar with the data processing. Expert commentary: Recently, MALDI-MSI has gained considerable attention and has been employed for research and diagnostic purposes, with successful results. Data dimensionality constitutes an important issue, and statistical methods for information-preserving data reduction represent one of the most challenging aspects. The most common data reduction methods are characterized by collecting independent observations into a single table. However, the incorporation of relational information can improve the discriminatory capability of the data.

  6. KC1

    • zenodo.org
    bin
    Updated Jan 24, 2020
    Cite
    A. Gunes Koru; A. Gunes Koru (2020). KC1 [Dataset]. http://doi.org/10.5281/zenodo.268441
    Available download formats: bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    A. Gunes Koru; A. Gunes Koru
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview of Data

    The data is a Weka .arff file. It contains 94 independent variables and 1 dependent variable.
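
    As a rough illustration of how such a file might be read outside Weka, the sketch below parses a simple dense ARFF document and separates the independent variables from the dependent one. It rests on several assumptions: the dependent variable is taken to be the last attribute (the description does not say so), the parser ignores quoted values and sparse rows, and the sample file, attribute names, and function names are all made up for the example.

```python
import io


def load_simple_arff(text):
    """Parse a dense ARFF document into (attribute_names, rows).

    Handles only the simple case: one @attribute per line, comma-separated
    data rows, and no quoted names or values. Values are kept as strings.
    """
    names, rows, in_data = [], [], False
    for raw in io.StringIO(text):
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()  # ARFF keywords are case-insensitive
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])  # second token is the attribute name
        elif lower.startswith("@data"):
            in_data = True
    return names, rows


def split_features_target(rows):
    """Assumption: the dependent variable is the last column of each row."""
    features = [row[:-1] for row in rows]
    target = [row[-1] for row in rows]
    return features, target


# Hypothetical three-attribute file; the real file has 95 attributes.
SAMPLE = """@relation toy
% made-up example data
@attribute size numeric
@attribute complexity numeric
@attribute label {yes,no}
@data
10,2,no
250,17,yes
"""

attribute_names, data_rows = load_simple_arff(SAMPLE)
X, y = split_features_target(data_rows)
```

    For real files, a full ARFF reader is a safer choice than this hand-rolled parser.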

    Paper Abstract

    Modern requirements tracing tools employ information retrieval methods to automatically generate candidate links. Due to the inherent trade-off between recall and precision, such methods cannot achieve a high coverage without also retrieving a great number of false positives, causing a significant drop in result accuracy. In this paper, we propose an approach to improving the quality of candidate link generation for the requirements tracing process. We base our research on the cluster hypothesis which suggests that correct and incorrect links can be grouped in high-quality and low-quality clusters respectively. Result accuracy can thus be enhanced by identifying and filtering out low-quality clusters. We describe our approach by investigating three open-source datasets, and further evaluate our work through an industrial study. The results show that our approach outperforms a baseline pruning strategy and that improvements are still possible.

  7. Data from: A Proposed Churn Prediction Model

    • figshare.com
    pdf
    Updated Feb 24, 2019
    Cite
    Mona Nasr; Essam Shaaban; Yehia Helmy; Dr. Ayman Khedr (2019). A Proposed Churn Prediction Model [Dataset]. http://doi.org/10.6084/m9.figshare.7763183.v2
    Available download formats: pdf
    Dataset updated
    Feb 24, 2019
    Dataset provided by
    figshare
    Authors
    Mona Nasr; Essam Shaaban; Yehia Helmy; Dr. Ayman Khedr
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Churn prediction aims to detect customers who intend to leave a service provider. Gaining a new customer costs an organization 5 to 10 times more than retaining an existing one. Predictive models can correctly identify possible churners in the near future so that a retention solution can be provided. This paper presents a new prediction model based on Data Mining (DM) techniques. The proposed model is composed of six steps: identifying the problem domain, data selection, investigating the data set, classification, clustering, and knowledge usage. A data set with 23 attributes and 5000 instances is used; 4000 instances are used for training the model and 1000 instances as a testing set. The predicted churners are clustered into 3 categories for use in a retention strategy. The data mining techniques used in this paper are Decision Tree, Support Vector Machine, and Neural Network, run in the open-source software WEKA.
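
    The 4000/1000 partition described above can be sketched as follows. The description does not say how instances were assigned to each set, so the random shuffle, the seed, and the function name `train_test_split_indices` are assumptions made for illustration only.

```python
import random


def train_test_split_indices(n_instances, n_train, seed=42):
    """Return (train, test) index lists from a seeded random shuffle.

    Assumption: instances are assigned at random; the original study may
    have split the data differently.
    """
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


# Sizes taken from the description: 5000 instances, 4000 for training.
train_idx, test_idx = train_test_split_indices(5000, 4000)
```

    Fixing the seed keeps the split reproducible, which matters when comparing several classifiers on the same held-out instances.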

  8. Trajectory segments with detail predictability performance metrics and...

    • data.mendeley.com
    Updated Apr 23, 2018
    + more versions
    Cite
    Rocío Barragán Montes (2018). Trajectory segments with detail predictability performance metrics and clusterisation for two days of traffic over South West Functional Airspace Block [Dataset]. http://doi.org/10.17632/s3m26t7f6v.1
    Dataset updated
    Apr 23, 2018
    Authors
    Rocío Barragán Montes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes two days of flight trajectory information over the South West (SW) Functional Airspace Block (FAB). The flight trajectories have been decomposed into segments, each with individual information on three-dimensional deviations and a global predictability non-conformance metric. The dataset also includes the assignment of segments to 8 clusters, obtained in Weka 3.9.1 with the MakeDensityBasedClusterer algorithm using SimpleKMeans as the clusterer. The attributes acting as explanatory variables are:

    * Segment type
    * 2D deviation
    * Delta flight level threshold normalised at the destination point
    * Delay range at destination point
    * Point of origin (A)
    * Point of destination (B)
    * Diversion point (C)
    * Is origin point an airport
    * Is destination point an airport
    * P_norm (Predictability_normalised)

    The attributes acting as response variables are also detailed in this dataset.

  9. Major clusters of the 2004–2008 Netherlands RIVM collection (n≥10).

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Jérôme Azé; Christophe Sola; Jian Zhang; Florian Lafosse-Marin; Memona Yasmin; Rubina Siddiqui; Kristin Kremer; Dick van Soolingen; Guislaine Refrégier (2023). Major clusters of the 2004–2008 Netherlands RIVM collection (n≥10). [Dataset]. http://doi.org/10.1371/journal.pone.0130912.t002
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Jérôme Azé; Christophe Sola; Jian Zhang; Florian Lafosse-Marin; Memona Yasmin; Rubina Siddiqui; Kristin Kremer; Dick van Soolingen; Guislaine Refrégier
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    24-VNTR cluster ID numbers were attributed according to cluster size (no. 1 for the largest). Isolate IDs are those stated in S1 Table. SIT = Short International Type.

