14 datasets found

Iris dataset
kaggle.com
Updated Jul 20, 2022
Cite
Himanshu Nakrani (2022). Iris dataset [Dataset]. https://www.kaggle.com/datasets/himanshunakrani/iris-dataset
Dataset updated
Jul 20, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Himanshu Nakrani
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
It includes three iris species with 50 samples each as well as some properties of each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.

File name: iris.csv
Explanations for each cluster in Iris dataset.
plos.figshare.com
xls
Updated Oct 27, 2023
Cite
Liang Chen; Caiming Zhong; Zehua Zhang (2023). Explanations for each cluster in Iris dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0292960.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0292960.t003
Dataset updated
Oct 27, 2023
Dataset provided by
PLOS ONE
Authors
Liang Chen; Caiming Zhong; Zehua Zhang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Clustering is an unsupervised machine learning technique whose goal is to cluster unlabeled data. But traditional clustering methods only output a set of results and do not provide any explanations of the results. Although in the literature a number of methods based on decision trees have been proposed to explain the clustering results, most of them have some disadvantages, such as too many branches and too deep leaves, which lead to complex explanations and make it difficult for users to understand. In this paper, a hypercube overlay model based on multi-objective optimization is proposed to achieve succinct explanations of clustering results. The model designs two objective functions based on the number of hypercubes and the compactness of instances and then uses multi-objective optimization to find a set of nondominated solutions. Finally, a Utopia point is defined to determine the most suitable solution, in which each cluster can be covered by as few hypercubes as possible. Based on these hypercubes, an explanation of each cluster is provided. Verification on synthetic and real datasets shows that the model can provide concise and understandable explanations to users.
iris-clase
huggingface.co
Updated Apr 5, 2025
Cite
Andrés Eduardo García Herrera (2025). iris-clase [Dataset]. https://huggingface.co/datasets/aegarciaherrera/iris-clase
Dataset updated
Apr 5, 2025
Authors
Andrés Eduardo García Herrera
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Dataset Card for "iris"

Dataset Summary

The Iris dataset is one of the most classic datasets in machine learning, often used for classification and clustering tasks. It contains 150 samples of iris flowers, each described by four features: sepal length, sepal width, petal length, and petal width. The task is to classify the samples into one of three species: Iris setosa, Iris versicolor, or Iris virginica. This dataset is especially useful for:

Supervised learning… See the full description on the dataset page: https://huggingface.co/datasets/aegarciaherrera/iris-clase.
FastLloyd Clustering Datasets
zenodo.org
xz
Updated May 28, 2025
Cite
Abdulrahman Diaa; Thomas Humphries; Florian Kerschbaum (2025). FastLloyd Clustering Datasets [Dataset]. http://doi.org/10.5281/zenodo.15530593
Available download formats: xz
Unique identifier
https://doi.org/10.5281/zenodo.15530593
Dataset updated
May 28, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Abdulrahman Diaa; Thomas Humphries; Florian Kerschbaum
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This artifact bundles the five dataset archives used in our private federated clustering evaluation, corresponding to the real-world benchmarks, scaling experiments, ablation studies, and timing performance tests described in the paper. The real_datasets.tar.xz includes ten established clustering benchmarks drawn from UCI and the Clustering basic benchmark (DOI: https://doi.org/10.1007/s10489-018-1238-7); scale_datasets.tar.xz contains the SynthNew family, generated with the R clusterGeneration package to assess scalability; ablate_datasets.tar.xz holds the AblateSynth sets, which vary cluster separation for ablation analysis and are also generated with clusterGeneration; g2_datasets.tar.xz packages the G2 sets (Gaussian clusters of size 2048 across dimensions 2–1024, with two clusters each), collected from the Clustering basic benchmark (DOI: https://doi.org/10.1007/s10489-018-1238-7); and timing_datasets.tar.xz includes the real s1 and lsun datasets alongside TimeSynth files (balanced synthetic clusters for timing), as per Mohassel et al.’s experimental framework.

Contents

1. real_datasets.tar.xz

Contains ten real-world benchmark datasets, formatted as one sample per line with space-separated features:

iris.txt: 150 samples, 4 features, 3 classes; classic UCI Iris dataset for petal/sepal measurements.

lsun.txt: 400 samples, 2 features, 3 clusters; two-dimensional variant of the LSUN dataset for clustering experiments.

s1.txt: 5,000 samples, 2 features, 15 clusters; synthetic benchmark from Fränti’s S1 series.

house.txt: 1,837 samples, 3 features, 3 clusters; housing data transformed for clustering tasks.

adult.txt: 48,842 samples, 6 features, 3 clusters; UCI Census Income (“Adult”) dataset for income bracket prediction.

wine.txt: 178 samples, 13 features, 3 cultivars; UCI Wine dataset with chemical analysis features.

breast.txt: 569 samples, 9 features, 2 classes; Wisconsin Diagnostic Breast Cancer dataset.

yeast.txt: 1,484 samples, 8 features, 10 localization sites; yeast protein localization data.

mnist.txt: 10,000 samples, 784 features (28×28 pixels), 10 digit classes; MNIST handwritten digits.

birch2.txt: a random subset of 25,000 out of 100,000 samples, 2 features, 100 clusters; synthetic BIRCH2 dataset for high-cluster-count evaluation.
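
The one-sample-per-line layout described for these files can be read with a few lines of Python. This is a minimal sketch under stated assumptions: `parse_samples` is a hypothetical helper, and the rows in the usage lines are illustrative Iris-like values rather than data copied from iris.txt. The listing does not say whether the files carry a label column or a header, so the sketch simply treats every whitespace-separated token as a numeric feature.

```python
from typing import Iterable, List


def parse_samples(lines: Iterable[str]) -> List[List[float]]:
    """Parse one sample per line, with features separated by whitespace."""
    samples = []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        samples.append([float(x) for x in fields])
    return samples


# Illustrative rows only (assumed values, four features each).
rows = parse_samples(["5.1 3.5 1.4 0.2", "", "4.9 3.0 1.4 0.2"])
print(len(rows), len(rows[0]))
```

For a real file, the same function could be applied to an open file handle, for example `parse_samples(open("iris.txt"))`, assuming the archive has been unpacked into the working directory.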

2. scale_datasets.tar.xz

Holds the SynthNew_{k}_{d}_{s}.txt files for scaling experiments, where:

$k \in \{2,4,8,16,32\}$ is the number of clusters,

$d \in \{2,4,8,16,32,64,128,256,512\}$ is the dimensionality,

$s \in \{1,2,3\}$ are different random seeds.

These are generated with the R clusterGeneration package with cluster sizes following a $1:2:...:k$ ratio. We incorporate a random number (in $[0, 100]$) of randomly sampled outliers and set the cluster separation degrees randomly in $[0.16, 0.26]$, spanning partially overlapping to separated clusters.
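
The naming scheme and the size ratio above can be sketched in Python. This is only an illustration: it assumes that every combination of k, d and s is present in the archive, and `ratio_cluster_sizes` with its `unit` parameter (the size of the smallest cluster) is a hypothetical helper, since the description does not state the total sample count per file.

```python
from itertools import product

# Parameter grids as listed in the description above.
K_VALUES = [2, 4, 8, 16, 32]
D_VALUES = [2, 4, 8, 16, 32, 64, 128, 256, 512]
SEEDS = [1, 2, 3]


def synthnew_filenames():
    """List SynthNew_{k}_{d}_{s}.txt names, assuming a full k x d x s grid."""
    return [
        f"SynthNew_{k}_{d}_{s}.txt"
        for k, d, s in product(K_VALUES, D_VALUES, SEEDS)
    ]


def ratio_cluster_sizes(unit, k):
    """Cluster sizes in a 1:2:...:k ratio; `unit` is an assumed base size."""
    return [unit * i for i in range(1, k + 1)]


names = synthnew_filenames()
print(len(names), names[0], names[-1])
print(ratio_cluster_sizes(100, 4))
```

Under the full-grid assumption this yields 5 × 9 × 3 = 135 file names.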

3. ablate_datasets.tar.xz

Contains the AblateSynth_{k}_{d}_{sep}.txt files for ablation studies, with:

$k \in \{2,4,8,16\}$ clusters,

$d \in \{2,4,8,16\}$ dimensions,

$sep \in \{0.25, 0.5, 0.75\}$ controlling cluster separation degrees.

Also generated via clusterGeneration.

4. g2_datasets.tar.xz

Packages the G2 synthetic sets (g2-{dim}-{var}.txt) from the clustering-data benchmarks:

$N=2048$ samples, $k=2$ Gaussian clusters,

Dimensions $d \in \{1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024\}$

Cluster overlap $var \in \{10, 20, 30, 40, 50, 60, 70, 80, 90, 100\}$
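
A file name of the form g2-{dim}-{var}.txt can be mapped back to its parameters with a small parser. The sketch below is an assumption-laden illustration: `parse_g2_name` is a hypothetical helper, and it assumes that dim and var appear in the name as plain decimal integers.

```python
import re

# Allowed values as listed above: dimensions 1, 2, 4, ..., 1024 and overlaps 10..100.
DIMS = {2 ** i for i in range(11)}
VARS = set(range(10, 101, 10))
G2_NAME = re.compile(r"^g2-(\d+)-(\d+)\.txt$")


def parse_g2_name(name):
    """Return (dim, var) for a g2-{dim}-{var}.txt name, or raise ValueError."""
    match = G2_NAME.match(name)
    if match is None:
        raise ValueError(f"not a G2 file name: {name!r}")
    dim, var = int(match.group(1)), int(match.group(2))
    if dim not in DIMS or var not in VARS:
        raise ValueError(f"parameters outside the listed grid: {name!r}")
    return dim, var


print(parse_g2_name("g2-16-30.txt"))
```

With 11 dimensions and 10 overlap settings, the listed grid covers 110 parameter combinations.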

5. timing_datasets.tar.xz

Includes:

s1.txt, lsun.txt: two real datasets for baseline timing.

timesynth_{k}_{d}_{n}.txt: synthetic timing datasets with balanced cluster sizes $C_{avg} = N/k$, varying:

$k \in \{2,5\}$

$d \in \{2,5\}$

$N \in \{10000, 100000\}$

Generated similarly to the scaling sets, following Mohassel et al.’s timing experiment protocol.
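
The timing grid and the balanced cluster size N/k can be enumerated as follows. This is a sketch, not the authors' generator: `timesynth_configs` is a hypothetical helper, and it assumes that N is written into the file name as a plain integer.

```python
from itertools import product


def timesynth_configs():
    """Yield one dict per timing configuration in the listed k, d, N grid."""
    for k, d, n in product([2, 5], [2, 5], [10000, 100000]):
        yield {
            "file": f"timesynth_{k}_{d}_{n}.txt",  # assumed name formatting
            "k": k,
            "d": d,
            "n": n,
            "c_avg": n // k,  # balanced cluster size N / k
        }


configs = list(timesynth_configs())
for cfg in configs:
    print(cfg["file"], cfg["c_avg"])
```

Every listed N is divisible by every listed k, so the integer division loses nothing.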

Usage:

Unpack any archive with tar -xJf
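
For readers working from Python rather than a shell, the standard-library `tarfile` module can unpack the same xz-compressed archives. The sketch below is an illustration only: `unpack_xz` is a hypothetical helper, and the archive name in the comment is taken from the contents list above.

```python
import tarfile
from pathlib import Path


def unpack_xz(archive, dest):
    """Extract a .tar.xz archive into `dest` and return the member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:xz") as tar:
        names = tar.getnames()
        tar.extractall(dest)  # only unpack archives from a trusted source
    return names


# Example call, assuming the archive was downloaded to the working directory:
# unpack_xz("real_datasets.tar.xz", "data/real")
```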
Ronald Fisher (1936)-IRIS
kaggle.com
Updated Aug 25, 2021
Cite
Ravi Dutt Ramanujapu (2021). Ronald Fisher (1936)-IRIS [Dataset]. https://www.kaggle.com/raviduttramanujapu/ronald-fisher-1936iris/metadata
Dataset updated
Aug 25, 2021
Dataset provided by
Kaggle
Authors
Ravi Dutt Ramanujapu
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Title: Iris Plants Database

Sources: (a) Creator: R.A. Fisher (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov) (c) Date: July, 1988

Past Usage:

Publications: too many to mention!!! Here are a few.

Fisher, R.A. "The use of multiple measurements in taxonomic problems" Annals of Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950).

Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis. (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.

Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System Structure and Classification Rule for Recognition in Partially Exposed Environments". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-2, No. 1, 67-71. -- Results: -- very low misclassification rates (0% for the setosa class)

Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions on Information Theory, May 1972, 431-433. -- Results: -- very low misclassification rates again

See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II conceptual clustering system finds 3 classes in the data.

Relevant Information: --- This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. --- Predicted attribute: class of iris plant. --- This is an exceedingly simple domain. --- This data differs from the data presented in Fisher's article

Number of Instances: 150 (50 in each of three classes)

Number of Attributes: 4 numeric, predictive attributes and the class

Attribute Information:

sepal length in cm

sepal width in cm

petal length in cm

petal width in cm

class: -- Iris Setosa -- Iris Versicolour -- Iris Virginica

Missing Attribute Values: None

Summary Statistics:

               Min  Max  Mean  SD    Class Correlation
sepal length:  4.3  7.9  5.84  0.83   0.7826
sepal width:   2.0  4.4  3.05  0.43  -0.4194
petal length:  1.0  6.9  3.76  1.76   0.9490 (high!)
petal width:   0.1  2.5  1.20  0.76   0.9565 (high!)

Class Distribution: 33.3% for each of 3 classes.
Consistency of variables in the VKFCM-K-LP clustering with the imputation of...
figshare.com
plos.figshare.com
xls
Updated Jun 2, 2023
Cite
Anny K. G. Rodrigues; Raydonal Ospina; Marcelo R. P. Ferreira (2023). Consistency of variables in the VKFCM-K-LP clustering with the imputation of missing values using mean values for the Iris Plant dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0259266.t013
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0259266.t013
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Anny K. G. Rodrigues; Raydonal Ospina; Marcelo R. P. Ferreira
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Consistency of variables in the VKFCM-K-LP clustering with the imputation of missing values using mean values for the Iris Plant dataset.
Iris dataset local result Table for A and B (RA, RB) using purity index.
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
WAQAR ISHAQ; ELIYA BUYUKKAYA; MUSHTAQ ALI; ZAKIR KHAN (2023). Iris dataset local result Table for A and B (RA, RB) using purity index. [Dataset]. http://doi.org/10.1371/journal.pone.0244691.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0244691.t004
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
WAQAR ISHAQ; ELIYA BUYUKKAYA; MUSHTAQ ALI; ZAKIR KHAN
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Iris dataset local result Table for A and B (RA, RB) using purity index.
Performance of the VKFCM-K-LP clustering algorithm with the WDS, PDS and OCS...
plos.figshare.com
xls
Updated Jun 8, 2023
Cite
Anny K. G. Rodrigues; Raydonal Ospina; Marcelo R. P. Ferreira (2023). Performance of the VKFCM-K-LP clustering algorithm with the WDS, PDS and OCS strategies for the dataset Iris Plant. [Dataset]. http://doi.org/10.1371/journal.pone.0259266.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0259266.t001
Dataset updated
Jun 8, 2023
Dataset provided by
PLOS ONE
Authors
Anny K. G. Rodrigues; Raydonal Ospina; Marcelo R. P. Ferreira
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance of the VKFCM-K-LP clustering algorithm with the WDS, PDS and OCS strategies for the dataset Iris Plant.
Data from: Complexity of possibly gapped histogram and analysis of histogram...
zenodo.org
data.niaid.nih.gov
zip
Updated May 31, 2022
Cite
Hsieh Fushing; Tania Roy (2022). Data from: Complexity of possibly gapped histogram and analysis of histogram [Dataset]. http://doi.org/10.5061/dryad.bs632
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.bs632
Dataset updated
May 31, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Hsieh Fushing; Tania Roy
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
We demonstrate that gaps and distributional patterns embedded within real-valued measurements are inseparable biological and mechanistic information contents of the system. Such patterns are discovered through a data-driven possibly gapped histogram, which further leads to the geometry-based analysis of histogram (ANOHT). Constructing a possibly gapped histogram is a complex problem of statistical mechanics due to the ensemble of candidate histograms being captured by a two-layer Ising model. This construction is also a distinctive problem of Information Theory from the perspective of data compression via uniformity. By defining a Hamiltonian (or energy) as a sum of total coding lengths of boundaries and total decoding errors within bins, this issue of computing the minimum energy macroscopic states is surprisingly resolved by applying the hierarchical clustering algorithm. Thus, a possibly gapped histogram corresponds to a macro-state. And then the first phase of ANOHT is developed for simultaneous comparison of multiple treatments, while the second phase of ANOHT is developed based on classical empirical process theory for a tree-geometry that can check the authenticity of branches of the treatment tree. The well-known Iris data are used to illustrate our technical developments. Also, a large baseball pitching dataset and a heavily right-censored divorce dataset are analysed to showcase the existential gaps and utilities of ANOHT.
Iris dataset local result table for A and B (RA, RB) using Davies Bouldin...
figshare.com
xls
Updated Jun 3, 2023
Cite
WAQAR ISHAQ; ELIYA BUYUKKAYA; MUSHTAQ ALI; ZAKIR KHAN (2023). Iris dataset local result table for A and B (RA, RB) using Davies Bouldin index. [Dataset]. http://doi.org/10.1371/journal.pone.0244691.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0244691.t005
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
WAQAR ISHAQ; ELIYA BUYUKKAYA; MUSHTAQ ALI; ZAKIR KHAN
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Iris dataset local result table for A and B (RA, RB) using Davies Bouldin index.
Test for “clustering” in dilute CoxMg1-xO (x=0.03):
data.isis.stfc.ac.uk
raw/nexus
Cite
Dr Chris Stock; Dr Paul Sarte; Dr William Buyers; Dr Duc Le, Test for “clustering” in dilute CoxMg1-xO (x=0.03): [Dataset]. http://doi.org/10.5286/ISIS.E.RB1520368
Available download formats: raw/nexus
Unique identifier
https://doi.org/10.5286/ISIS.E.RB1520368
Dataset provided by
Science and Technology Facilities Council (https://stfc.ukri.org/)
Authors
Dr Chris Stock; Dr Paul Sarte; Dr William Buyers; Dr Duc Le
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We have been pursuing a study of the magnetic fluctuations in CoO involving a single crystal experiment on MERLIN. To understand the complex magnetic Hamiltonian involving spin-orbit, structural distortion terms, and spin exchange, we have in parallel been studying dilute samples of MgO doped with a small amount of cobalt. As shown previously in a number of studies, these dilute compounds can be understood in terms of comparatively simple dimer physics. This is a large simplification to the Hamiltonian and provides an important constraint in the model to understand the magnetic excitations in pure CoO. One concern is that our dilute samples have a large amount of clustering of cobalt sites. We propose to test this and have made a large homogeneous dilute sample based on the sol-gel technique. We request 1 day on IRIS and 1 day on MARI to compare these samples.
Consistency of variables for the dataset Iris Plant.
plos.figshare.com
xls
Updated Jun 8, 2023
Cite
Anny K. G. Rodrigues; Raydonal Ospina; Marcelo R. P. Ferreira (2023). Consistency of variables for the dataset Iris Plant. [Dataset]. http://doi.org/10.1371/journal.pone.0259266.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0259266.t006
Dataset updated
Jun 8, 2023
Dataset provided by
PLOS ONE
Authors
Anny K. G. Rodrigues; Raydonal Ospina; Marcelo R. P. Ferreira
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Consistency of variables for the dataset Iris Plant.
Iris Davies Bouldin measurement.
figshare.com
xls
Updated Jun 12, 2023
Cite
WAQAR ISHAQ; ELIYA BUYUKKAYA; MUSHTAQ ALI; ZAKIR KHAN (2023). Iris Davies Bouldin measurement. [Dataset]. http://doi.org/10.1371/journal.pone.0244691.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0244691.t006
Dataset updated
Jun 12, 2023
Dataset provided by
PLOS ONE
Authors
WAQAR ISHAQ; ELIYA BUYUKKAYA; MUSHTAQ ALI; ZAKIR KHAN
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Iris Davies Bouldin measurement.
K–means clustering statistics from K = 1 to K = 3.
plos.figshare.com
xls
Updated Jun 11, 2023
Cite
Azra Blythe-Mallett; Karl A. Aiken; Iris Segura-Garcia; Nathan K. Truelove; Mona K. Webber; Marcia E. Roye; Stephen J. Box (2023). K–means clustering statistics from K = 1 to K = 3. [Dataset]. http://doi.org/10.1371/journal.pone.0245703.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0245703.t006
Dataset updated
Jun 11, 2023
Dataset provided by
PLOS ONE
Authors
Azra Blythe-Mallett; Karl A. Aiken; Iris Segura-Garcia; Nathan K. Truelove; Mona K. Webber; Marcia E. Roye; Stephen J. Box
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
K–means clustering statistics from K = 1 to K = 3.