14 datasets found
  1. Iris dataset

    Classify iris plants into three species in this classic dataset.

    • kaggle.com
    Updated Jul 20, 2022
    Cite
    Himanshu Nakrani (2022). Iris dataset [Dataset]. https://www.kaggle.com/datasets/himanshunakrani/iris-dataset
    Dataset updated
    Jul 20, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Himanshu Nakrani
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    It contains three iris species with 50 samples each, along with measurements of each flower (sepal length, sepal width, petal length and petal width). One species is linearly separable from the other two, but the other two are not linearly separable from each other.

    File name: iris.csv
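    A perceptron can check the claim that one species is linearly separable from the other two, because it stops making mistakes only when a separating hyperplane exists. The sketch below is a minimal illustration and is not part of the dataset. The function name perceptron_separates and the toy points are hypothetical. In practice the rows would be read from iris.csv, whose exact column names are not stated here.

```python
from typing import Sequence


def perceptron_separates(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    max_epochs: int = 1000,
) -> bool:
    """Return True if a perceptron finds a hyperplane separating labels +1 and -1.

    By the perceptron convergence theorem the loop ends on linearly separable
    data; reaching max_epochs suggests, but does not prove, that no separating
    hyperplane exists.
    """
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified (or exactly on the boundary)
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: a separating hyperplane was found
            return True
    return False
```

    If the description is accurate, labelling setosa rows +1 and all other rows -1 should return True. Labelling versicolor against virginica would be expected to exhaust max_epochs instead.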

  2. Explanations for each cluster in Iris dataset.

    • plos.figshare.com
    xls
    Updated Oct 27, 2023
    Liang Chen; Caiming Zhong; Zehua Zhang (2023). Explanations for each cluster in Iris dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0292960.t003
    Available download formats: xls
    Dataset updated
    Oct 27, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Liang Chen; Caiming Zhong; Zehua Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Clustering is an unsupervised machine learning technique whose goal is to group unlabeled data. However, traditional clustering methods only output a set of results and do not explain them. A number of decision-tree-based methods have been proposed in the literature to explain clustering results. Most of them have drawbacks, such as too many branches and overly deep leaves, which produce complex explanations that users find difficult to understand. In this paper, a hypercube overlay model based on multi-objective optimization is proposed to achieve succinct explanations of clustering results. The model defines two objective functions, based on the number of hypercubes and the compactness of instances, and then uses multi-objective optimization to find a set of nondominated solutions. Finally, a Utopia point is defined to determine the most suitable solution, in which each cluster is covered by as few hypercubes as possible. Based on these hypercubes, an explanation of each cluster is provided. Experiments on synthetic and real datasets show that the model can provide concise and understandable explanations to users.
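    The last two steps of the abstract can be sketched generically: filtering nondominated solutions, then picking the one closest to a Utopia point. The abstract does not say how compactness is scored or how objectives are normalized. This sketch therefore assumes both objectives are minimized, takes the Utopia point as the per-objective minimum over the front, and uses min-max normalization with Euclidean distance. It is not the authors' implementation, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

# (number of hypercubes, compactness cost); both assumed to be minimized.
Objectives = Tuple[float, float]


def nondominated(solutions: List[Objectives]) -> List[Objectives]:
    """Keep the solutions that no other solution dominates (Pareto front)."""
    front = []
    for a in solutions:
        dominated = any(
            all(bi <= ai for ai, bi in zip(a, b)) and any(bi < ai for ai, bi in zip(a, b))
            for b in solutions
        )
        if not dominated:
            front.append(a)
    return front


def closest_to_utopia(front: List[Objectives]) -> Objectives:
    """Pick the front member nearest the Utopia point (per-objective minima)."""
    n_obj = len(front[0])
    mins = [min(s[i] for s in front) for i in range(n_obj)]
    maxs = [max(s[i] for s in front) for i in range(n_obj)]

    def distance(s: Objectives) -> float:
        # After min-max scaling the Utopia point sits at the origin.
        total = 0.0
        for i in range(n_obj):
            span = maxs[i] - mins[i]
            scaled = (s[i] - mins[i]) / span if span > 0 else 0.0
            total += scaled ** 2
        return math.sqrt(total)

    return min(front, key=distance)
```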

  3. iris-clase

    • huggingface.co
    Updated Apr 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Andrés Eduardo García Herrera (2025). iris-clase [Dataset]. https://huggingface.co/datasets/aegarciaherrera/iris-clase
    Dataset updated
    Apr 5, 2025
    Authors
    Andrés Eduardo García Herrera
    License

    CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for "iris"

    Dataset Summary

    The Iris dataset is one of the most classic datasets in machine learning, often used for classification and clustering tasks. It contains 150 samples of iris flowers, each described by four features: sepal length, sepal width, petal length, and petal width. The task is to classify the samples into one of three species: Iris setosa, Iris versicolor, or Iris virginica. This dataset is especially useful for:

    Supervised learning… See the full description on the dataset page: https://huggingface.co/datasets/aegarciaherrera/iris-clase.
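    The classification task the card describes (four measurements in, one of three species out) can be illustrated with a nearest-centroid baseline. The card does not prescribe this method, which is just one simple choice. The feature order, function names and sample values below are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

# Assumed feature order: sepal length, sepal width, petal length, petal width (cm).
Features = Tuple[float, float, float, float]


def fit_centroids(rows: Sequence[Features], species: Sequence[str]) -> Dict[str, List[float]]:
    """Average the feature vectors of each species."""
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * 4)
    counts: Dict[str, int] = defaultdict(int)
    for x, label in zip(rows, species):
        sums[label] = [s + xi for s, xi in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[str, List[float]], x: Features) -> str:
    """Assign the species whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], x))
```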

  4. FastLloyd Clustering Datasets

    • zenodo.org
    xz
    Updated May 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abdulrahman Diaa; Thomas Humphries; Florian Kerschbaum (2025). FastLloyd Clustering Datasets [Dataset]. http://doi.org/10.5281/zenodo.15530593
    Available download formats: xz
    Dataset updated
    May 28, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Abdulrahman Diaa; Thomas Humphries; Florian Kerschbaum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This artifact bundles the five dataset archives used in our private federated clustering evaluation. They correspond to the real-world benchmarks, scaling experiments, ablation studies and timing performance tests described in the paper.
    • real_datasets.tar.xz includes ten established clustering benchmarks drawn from UCI and the Clustering basic benchmark (DOI: https://doi.org/10.1007/s10489-018-1238-7).
    • scale_datasets.tar.xz contains the SynthNew family, generated with the R clusterGeneration package to assess scalability.
    • ablate_datasets.tar.xz holds the AblateSynth sets, also generated with clusterGeneration, which vary cluster separation for ablation analysis.
    • g2_datasets.tar.xz packages the G2 sets (2048 samples in two Gaussian clusters, across dimensions 2–1024), collected from the Clustering basic benchmark (DOI: https://doi.org/10.1007/s10489-018-1238-7).
    • timing_datasets.tar.xz includes the real s1 and lsun datasets alongside the TimeSynth files (balanced synthetic clusters for timing), following Mohassel et al.'s experimental framework.

    Contents

    1. real_datasets.tar.xz

    Contains ten established benchmark datasets, each formatted as one sample per line with space-separated features:

    • iris.txt: 150 samples, 4 features, 3 classes; classic UCI Iris dataset for petal/sepal measurements.

    • lsun.txt: 400 samples, 2 features, 3 clusters; two-dimensional variant of the LSUN dataset for clustering experiments.

    • s1.txt: 5,000 samples, 2 features, 15 clusters; synthetic benchmark from Fränti’s S1 series.

    • house.txt: 1,837 samples, 3 features, 3 clusters; housing data transformed for clustering tasks.

    • adult.txt: 48,842 samples, 6 features, 3 clusters; UCI Census Income (“Adult”) dataset for income bracket prediction.

    • wine.txt: 178 samples, 13 features, 3 cultivars; UCI Wine dataset with chemical analysis features.

    • breast.txt: 569 samples, 9 features, 2 classes; Wisconsin Diagnostic Breast Cancer dataset.

    • yeast.txt: 1,484 samples, 8 features, 10 localization sites; yeast protein localization data.

    • mnist.txt: 10,000 samples, 784 features (28×28 pixels), 10 digit classes; MNIST handwritten digits.

    • birch2.txt: a random subset of 25,000 of the 100,000 samples, 2 features, 100 clusters; synthetic BIRCH2 dataset for high-cluster-count evaluation.
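    Because every file in this archive is described as one sample per line with space-separated features, a small loader is enough to read them. The sketch below makes only that assumption. It does not treat any column as a label, since the description does not say whether labels are stored in the files. The function names are hypothetical.

```python
from pathlib import Path
from typing import List


def parse_space_separated(text: str) -> List[List[float]]:
    """Parse one sample per line with whitespace-separated numeric features."""
    samples: List[List[float]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        try:
            row = [float(v) for v in fields]
        except ValueError as exc:
            raise ValueError(f"line {line_no}: non-numeric field") from exc
        # Every sample in a file is expected to have the same number of features.
        if samples and len(row) != len(samples[0]):
            raise ValueError(f"line {line_no}: expected {len(samples[0])} features, got {len(row)}")
        samples.append(row)
    return samples


def load_space_separated(path: str) -> List[List[float]]:
    return parse_space_separated(Path(path).read_text())
```

    As a quick sanity check after loading, the row count and feature width can be compared against the figures listed above (for example, 150 samples for iris.txt).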

    2. scale_datasets.tar.xz

    Holds the SynthNew_{k}_{d}_{s}.txt files for scaling experiments, where:

    • $k \in \{2,4,8,16,32\}$ is the number of clusters,

    • $d \in \{2,4,8,16,32,64,128,256,512\}$ is the dimensionality,

    • $s \in \{1,2,3\}$ are different random seeds.

    These are generated with the R clusterGeneration package, with cluster sizes following a $1:2:\dots:k$ ratio. We incorporate a random number (in $[0, 100]$) of randomly sampled outliers and set the cluster separation degrees randomly in $[0.16, 0.26]$, spanning partially overlapping to separated clusters.
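    The grid above yields $5 \times 9 \times 3 = 135$ SynthNew files. With sizes in the ratio $1:2:\dots:k$, cluster $i$ receives a share $i / (k(k+1)/2)$ of the non-outlier samples. The sketch below splits a hypothetical total into such sizes, using largest-remainder rounding so the sizes sum exactly to the total. How clusterGeneration itself rounds sizes is not stated here, so this is an assumption.

```python
from typing import List


def ratio_cluster_sizes(n_total: int, k: int) -> List[int]:
    """Split n_total samples into k clusters with sizes proportional to 1:2:...:k."""
    weight_sum = k * (k + 1) // 2
    exact = [n_total * i / weight_sum for i in range(1, k + 1)]
    sizes = [int(e) for e in exact]
    leftover = n_total - sum(sizes)
    # Hand the remaining samples to the clusters with the largest fractional parts.
    order = sorted(range(k), key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes
```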

    3. ablate_datasets.tar.xz

    Contains the AblateSynth_{k}_{d}_{sep}.txt files for ablation studies, with:

    • $k \in \{2,4,8,16\}$ clusters,

    • $d \in \{2,4,8,16\}$ dimensions,

    • $sep \in \{0.25, 0.5, 0.75\}$ controlling cluster separation degrees.

    Also generated via clusterGeneration.
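    The ablation grid has $4 \times 4 \times 3 = 48$ combinations. The sketch below enumerates the expected file names so that an extracted directory can be checked for missing files. It assumes sep is written as 0.25, 0.5 and 0.75 in the file names, which the description does not confirm. The helper names are hypothetical.

```python
from itertools import product
from typing import List

KS = [2, 4, 8, 16]
DS = [2, 4, 8, 16]
SEPS = [0.25, 0.5, 0.75]


def ablation_filenames() -> List[str]:
    """Expected AblateSynth_{k}_{d}_{sep}.txt names (sep formatting is an assumption)."""
    return [f"AblateSynth_{k}_{d}_{sep}.txt" for k, d, sep in product(KS, DS, SEPS)]


def missing_files(present: List[str]) -> List[str]:
    """Return the expected names that are absent from `present`."""
    have = set(present)
    return [name for name in ablation_filenames() if name not in have]
```

    For example, missing_files(os.listdir("ablate_datasets")) would list any gaps, assuming the archive unpacks into a directory of that name.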

    4. g2_datasets.tar.xz

    Packages the G2 synthetic sets (g2-{dim}-{var}.txt) from the clustering-data benchmarks:

    • $N=2048$ samples, $k=2$ Gaussian clusters,

    • Dimensions $d \in \{1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024\}$

    • Cluster overlap $var \in \{10, 20, 30, 40, 50, 60, 70, 80, 90, 100\}$
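    Since each G2 file encodes its parameters in the name g2-{dim}-{var}.txt, experiments can recover $d$ and $var$ directly from the file name. The parser below assumes both fields are plain integers, as in the lists above. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class G2Params(NamedTuple):
    dim: int  # dimensionality d
    var: int  # overlap parameter var


G2_NAME = re.compile(r"^g2-(\d+)-(\d+)\.txt$")


def parse_g2_name(filename: str) -> G2Params:
    """Extract (dim, var) from a file name such as 'g2-16-30.txt'."""
    match = G2_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a G2 file name: {filename!r}")
    return G2Params(dim=int(match.group(1)), var=int(match.group(2)))
```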

    5. timing_datasets.tar.xz

    Includes:

    • s1.txt, lsun.txt: two real datasets for baseline timing.

    • timesynth_{k}_{d}_{n}.txt: synthetic timing datasets with balanced cluster sizes $C_{avg} = N/k$, varying:

      • $k \in \{2,5\}$

      • $d \in \{2,5\}$

      • $N \in \{10000, 100000\}$

    Generated similarly to the scaling sets, following Mohassel et al.'s timing experiment protocol.
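    For every combination in this grid, $N/k$ is an integer ($10000/2$, $10000/5$, $100000/2$, $100000/5$), so each cluster has exactly $C_{avg}$ samples. The sketch below also covers the general case by letting sizes differ by at most one when $k$ does not divide $N$. That extension goes beyond the description, and the function name is hypothetical.

```python
from typing import List


def balanced_sizes(n_total: int, k: int) -> List[int]:
    """Cluster sizes of N/k each; if k does not divide N, sizes differ by at most one."""
    base, extra = divmod(n_total, k)
    return [base + 1 if i < extra else base for i in range(k)]
```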

    Usage:

    Unpack any archive with tar -xJf <archive>.tar.xz (for example, tar -xJf real_datasets.tar.xz).
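    Where the tar -xJf command is unavailable, Python's standard tarfile module can read .tar.xz archives through mode "r:xz". The sketch below is a rough equivalent, and the helper name extract_xz_archive is hypothetical. It applies the "data" extraction filter when the running Python provides one, which rejects unsafe members such as absolute paths.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_xz_archive(archive: str, dest: str = ".") -> List[str]:
    """Roughly `tar -xJf archive -C dest`; returns the member names."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:xz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, links escaping dest, etc.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names
```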

  5. Ronald Fisher (1936)-IRIS

    • kaggle.com
    Updated Aug 25, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ravi Dutt Ramanujapu (2021). Ronald Fisher (1936)-IRIS [Dataset]. https://www.kaggle.com/raviduttramanujapu/ronald-fisher-1936iris/metadata
    Dataset updated
    Aug 25, 2021
    Dataset provided by
    Kaggle
    Authors
    Ravi Dutt Ramanujapu
    License

    Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description
    1. Title: Iris Plants Database

    2. Sources: (a) Creator: R.A. Fisher (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov) (c) Date: July, 1988

    3. Past Usage:

      • Publications: too many to mention!!! Here are a few.
      • Fisher, R.A. "The use of multiple measurements in taxonomic problems". Annals of Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950).
      • Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis. (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
      • Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System Structure and Classification Rule for Recognition in Partially Exposed Environments". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-2, No. 1, 67-71. Results: very low misclassification rates (0% for the setosa class).
      • Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions on Information Theory, May 1972, 431-433. Results: very low misclassification rates again.
      • See also: 1988 MLC Proceedings, 54-64. Cheeseman et al.'s AUTOCLASS II conceptual clustering system finds 3 classes in the data.
    4. Relevant Information: This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day (see Duda & Hart, for example). The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. Predicted attribute: class of iris plant. This is an exceedingly simple domain. This data differs from the data presented in Fisher's article.

    5. Number of Instances: 150 (50 in each of three classes)

    6. Number of Attributes: 4 numeric, predictive attributes and the class

    7. Attribute Information:

      1. sepal length in cm
      2. sepal width in cm
      3. petal length in cm
      4. petal width in cm
      5. class: Iris Setosa; Iris Versicolour; Iris Virginica
    8. Missing Attribute Values: None

    Summary Statistics:

                    Min   Max   Mean   SD     Class Correlation
    sepal length:   4.3   7.9   5.84   0.83    0.7826
    sepal width:    2.0   4.4   3.05   0.43   -0.4194
    petal length:   1.0   6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1   2.5   1.20   0.76    0.9565  (high!)

    9. Class Distribution: 33.3% for each of 3 classes.
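    The summary statistics above (min, max, mean, SD, class correlation) can be recomputed from the 150 rows. The source does not define "Class Correlation" or say which SD convention it used. This sketch assumes the Pearson correlation between an attribute and the class coded 1, 2, 3 in the order listed, plus the sample standard deviation ($n-1$ denominator). The function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def summarize(values: Sequence[float], class_codes: Sequence[int]) -> Dict[str, float]:
    """Min, max, mean, sample SD and Pearson correlation with numeric class codes."""
    n = len(values)
    mean = sum(values) / n
    sq_dev = sum((v - mean) ** 2 for v in values)
    sd = math.sqrt(sq_dev / (n - 1))  # sample SD; the source's convention is unstated
    c_mean = sum(class_codes) / n
    cov = sum((v - mean) * (c - c_mean) for v, c in zip(values, class_codes))
    c_sq_dev = sum((c - c_mean) ** 2 for c in class_codes)
    corr = cov / math.sqrt(sq_dev * c_sq_dev)
    return {"min": min(values), "max": max(values), "mean": mean, "sd": sd, "class_corr": corr}
```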
  6. Consistency of variables in the VKFCM-K-LP clustering with the imputation of...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Anny K. G. Rodrigues; Raydonal Ospina; Marcelo R. P. Ferreira (2023). Consistency of variables in the VKFCM-K-LP clustering with the imputation of missing values using mean values for the Iris Plant dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0259266.t013
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Anny K. G. Rodrigues; Raydonal Ospina; Marcelo R. P. Ferreira
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Consistency of variables in the VKFCM-K-LP clustering with the imputation of missing values using mean values for the Iris Plant dataset.
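    This table concerns clustering after missing values are imputed with mean values. The entry does not describe the imputation procedure. A common form, assumed here, replaces each missing entry with the mean of the observed values in the same column. The function name is hypothetical, and missing values are represented as None.

```python
from typing import List, Optional, Sequence


def mean_impute(rows: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)] for row in rows]
```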

  7. Iris dataset local result Table for A and B (RA, RB) using purity index.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    WAQAR ISHAQ; ELIYA BUYUKKAYA; MUSHTAQ ALI; ZAKIR KHAN (2023). Iris dataset local result Table for A and B (RA, RB) using purity index. [Dataset]. http://doi.org/10.1371/journal.pone.0244691.t004
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    WAQAR ISHAQ; ELIYA BUYUKKAYA; MUSHTAQ ALI; ZAKIR KHAN
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Iris dataset local result Table for A and B (RA, RB) using purity index.
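    The purity index scores a clustering by the fraction of samples that belong to the majority class of their cluster: $\text{purity} = \frac{1}{N}\sum_{c}\max_{j} |c \cap j|$. The entry does not explain what RA and RB denote. The sketch below is the standard purity definition, not necessarily the authors' exact variant, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def purity(clusters: Sequence[Hashable], classes: Sequence[Hashable]) -> float:
    """Fraction of samples whose class is the majority class of their cluster."""
    if len(clusters) != len(classes) or not clusters:
        raise ValueError("need two non-empty label sequences of equal length")
    members = defaultdict(Counter)
    for cluster, cls in zip(clusters, classes):
        members[cluster][cls] += 1
    # For each cluster, count how many samples carry its most common class.
    majority_total = sum(counter.most_common(1)[0][1] for counter in members.values())
    return majority_total / len(clusters)
```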

  8. Performance of the VKFCM-K-LP clustering algorithm with the WDS, PDS and OCS...

    • plos.figshare.com
    xls
    Updated Jun 8, 2023
    Anny K. G. Rodrigues; Raydonal Ospina; Marcelo R. P. Ferreira (2023). Performance of the VKFCM-K-LP clustering algorithm with the WDS, PDS and OCS strategies for the dataset Iris Plant. [Dataset]. http://doi.org/10.1371/journal.pone.0259266.t001
    Available download formats: xls
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Anny K. G. Rodrigues; Raydonal Ospina; Marcelo R. P. Ferreira
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of the VKFCM-K-LP clustering algorithm with the WDS, PDS and OCS strategies for the dataset Iris Plant.

  9. Data from: Complexity of possibly gapped histogram and analysis of histogram...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated May 31, 2022
    Hsieh Fushing; Tania Roy (2022). Data from: Complexity of possibly gapped histogram and analysis of histogram [Dataset]. http://doi.org/10.5061/dryad.bs632
    Available download formats: zip
    Dataset updated
    May 31, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hsieh Fushing; Tania Roy
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We demonstrate that gaps and distributional patterns embedded within real-valued measurements are inseparable biological and mechanistic information contents of the system. Such patterns are discovered through a data-driven possibly gapped histogram, which further leads to the geometry-based analysis of histogram (ANOHT). Constructing a possibly gapped histogram is a complex problem of statistical mechanics, because the ensemble of candidate histograms is captured by a two-layer Ising model. From the perspective of data compression via uniformity, this construction is also a distinctive problem of information theory. We define a Hamiltonian (or energy) as the sum of the total coding lengths of boundaries and the total decoding errors within bins. Surprisingly, the problem of computing the minimum-energy macroscopic states is then resolved by applying the hierarchical clustering algorithm. Thus, a possibly gapped histogram corresponds to a macro-state. The first phase of ANOHT is developed for simultaneous comparison of multiple treatments. The second phase is based on classical empirical process theory and yields a tree geometry that can check the authenticity of branches of the treatment tree. The well-known Iris data are used to illustrate our technical developments. A large baseball pitching dataset and a heavily right-censored divorce dataset are also analysed to showcase the existence of gaps and the utility of ANOHT.

  10. Iris dataset local result table for A and B (RA, RB) using Davies Bouldin...

    • figshare.com
    xls
    Updated Jun 3, 2023
    WAQAR ISHAQ; ELIYA BUYUKKAYA; MUSHTAQ ALI; ZAKIR KHAN (2023). Iris dataset local result table for A and B (RA, RB) using Davies Bouldin index. [Dataset]. http://doi.org/10.1371/journal.pone.0244691.t005
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    WAQAR ISHAQ; ELIYA BUYUKKAYA; MUSHTAQ ALI; ZAKIR KHAN
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Iris dataset local result table for A and B (RA, RB) using Davies Bouldin index.
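    The Davies-Bouldin index averages, over clusters, the worst ratio of combined within-cluster scatter to between-centroid distance: $DB = \frac{1}{k}\sum_i \max_{j \ne i} \frac{s_i + s_j}{d(c_i, c_j)}$. Here $s_i$ is the mean Euclidean distance of cluster $i$'s points to its centroid $c_i$, and lower values indicate better separation. The table does not state which distance or scatter measure the authors used. The sketch below uses this common Euclidean form, and the function name is hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence


def davies_bouldin(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Davies-Bouldin index with Euclidean distances (lower is better)."""
    clusters: Dict[int, List[Sequence[float]]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the Davies-Bouldin index needs at least two clusters")
    centroids: Dict[int, List[float]] = {}
    scatter: Dict[int, float] = {}
    for label, members in clusters.items():
        dim = len(members[0])
        centroid = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        centroids[label] = centroid
        # s_i: mean distance from the cluster's points to its centroid.
        scatter[label] = sum(dist(m, centroid) for m in members) / len(members)
    keys = list(clusters)
    total = 0.0
    for i in keys:
        total += max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j]) for j in keys if j != i)
    return total / len(keys)
```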

  11. Test for “clustering” in dilute Co$_x$Mg$_{1-x}$O ($x=0.03$):

    • data.isis.stfc.ac.uk
    raw/nexus
    Dr Chris Stock; Dr Paul Sarte; Dr William Buyers; Dr Duc Le. Test for “clustering” in dilute Co$_x$Mg$_{1-x}$O ($x=0.03$): [Dataset]. http://doi.org/10.5286/ISIS.E.RB1520368
    Available download formats: raw/nexus
    Dataset provided by
    Science and Technology Facilities Council (https://stfc.ukri.org/)
    Authors
    Dr Chris Stock; Dr Paul Sarte; Dr William Buyers; Dr Duc Le
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We have been pursuing a study of the magnetic fluctuations in CoO involving a single-crystal experiment on MERLIN. To understand the complex magnetic Hamiltonian involving spin-orbit, structural distortion and spin exchange terms, we have in parallel been studying dilute samples of MgO doped with a small amount of cobalt. As shown previously in a number of studies, these dilute compounds can be understood in terms of comparatively simple dimer physics. This greatly simplifies the Hamiltonian and provides an important constraint on the model used to understand the magnetic excitations in pure CoO. One concern is that our dilute samples show a large amount of clustering of cobalt sites. To test this, we have made a large, homogeneous dilute sample using the sol-gel technique. We request 1 day on IRIS and 1 day on MARI to compare these samples.

  12. Consistency of variables for the dataset Iris Plant.

    • plos.figshare.com
    xls
    Updated Jun 8, 2023
    Anny K. G. Rodrigues; Raydonal Ospina; Marcelo R. P. Ferreira (2023). Consistency of variables for the dataset Iris Plant. [Dataset]. http://doi.org/10.1371/journal.pone.0259266.t006
    Available download formats: xls
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Anny K. G. Rodrigues; Raydonal Ospina; Marcelo R. P. Ferreira
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Consistency of variables for the dataset Iris Plant.

  13. Iris Davies Bouldin measurement.

    • figshare.com
    xls
    Updated Jun 12, 2023
    WAQAR ISHAQ; ELIYA BUYUKKAYA; MUSHTAQ ALI; ZAKIR KHAN (2023). Iris Davies Bouldin measurement. [Dataset]. http://doi.org/10.1371/journal.pone.0244691.t006
    Available download formats: xls
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    PLOS ONE
    Authors
    WAQAR ISHAQ; ELIYA BUYUKKAYA; MUSHTAQ ALI; ZAKIR KHAN
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Iris Davies Bouldin measurement.

  14. K–means clustering statistics from K = 1 to K = 3.

    • plos.figshare.com
    xls
    Updated Jun 11, 2023
    Azra Blythe-Mallett; Karl A. Aiken; Iris Segura-Garcia; Nathan K. Truelove; Mona K. Webber; Marcia E. Roye; Stephen J. Box (2023). K–means clustering statistics from K = 1 to K = 3. [Dataset]. http://doi.org/10.1371/journal.pone.0245703.t006
    Available download formats: xls
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Azra Blythe-Mallett; Karl A. Aiken; Iris Segura-Garcia; Nathan K. Truelove; Mona K. Webber; Marcia E. Roye; Stephen J. Box
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    K–means clustering statistics from K = 1 to K = 3.
