3 datasets found

Iris Species Dataset and Database
kaggle.com
Updated May 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ghanshyam Saini (2025). Iris Species Dataset and Database [Dataset]. https://www.kaggle.com/datasets/ghnshymsaini/iris-species-dataset-and-database
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 15, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Ghanshyam Saini
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Iris Flower Dataset

This is a classic and very widely used dataset in machine learning and statistics, often serving as a first dataset for classification problems. Introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems," it is a foundational resource for learning classification algorithms.

Overview:

The dataset contains measurements for 150 samples of iris flowers. Each sample belongs to one of three species of iris:

Iris setosa

Iris versicolor

Iris virginica

For each flower, four features were measured:

Sepal length (in cm)

Sepal width (in cm)

Petal length (in cm)

Petal width (in cm)

The goal is typically to build a model that can classify iris flowers into their correct species based on these four features.

File Structure:

The dataset is usually provided as a single CSV (Comma Separated Values) file, often named iris.csv or similar. This file typically contains the following columns:

sepal_length (cm): Numerical. The length of the sepal of the iris flower.

sepal_width (cm): Numerical. The width of the sepal of the iris flower.

petal_length (cm): Numerical. The length of the petal of the iris flower.

petal_width (cm): Numerical. The width of the petal of the iris flower.

species: Categorical. The species of the iris flower (either 'setosa', 'versicolor', or 'virginica'). This is the target variable for classification.

Content of the Data:

The dataset contains an equal number of samples (50) for each of the three iris species. The measurements of the sepal and petal dimensions vary between the species, allowing for their differentiation using machine learning models.

How to Use This Dataset:

Download the iris.csv file.

Load the data using libraries like Pandas in Python.

Explore the data through visualization and statistical analysis to understand the relationships between the features and the different species.

Build classification models (e.g., Logistic Regression, Support Vector Machines, Decision Trees, K-Nearest Neighbors) using the sepal and petal measurements as features and the 'species' column as the target variable.

Evaluate the performance of your model using appropriate metrics (e.g., accuracy, precision, recall, F1-score).

The dataset is small and well-behaved, making it excellent for learning and experimenting with various classification techniques.

Citation:

When using the Iris dataset, it is common to cite Ronald Fisher's original work:

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179-188.

Data Contribution:

Thank you for providing this classic and fundamental dataset to the Kaggle community. The Iris dataset remains an invaluable resource for both beginners learning the basics of classification and experienced practitioners testing new algorithms. Its simplicity and clear class separation make it an ideal starting point for many data science projects.

If you find this dataset description helpful and the dataset itself useful for your learning or projects, please consider giving it an upvote after downloading. Your appreciation is valuable!
Iris dataset
kaggle.com
Updated Jul 20, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Himanshu Nakrani (2022). Iris dataset [Dataset]. https://www.kaggle.com/datasets/himanshunakrani/iris-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 20, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Himanshu Nakrani
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
It includes three iris species with 50 samples each as well as some properties of each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.

FIle name: iris.csv
h
iris
huggingface.co
Updated Sep 23, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
scikit-learn (2022). iris [Dataset]. https://huggingface.co/datasets/scikit-learn/iris
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 23, 2022
Dataset authored and provided by
scikit-learn
License
https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/
Description
Iris Species Dataset

The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository. It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other. The dataset is taken from UCI Machine Learning Repository's… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/iris.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Ghanshyam Saini (2025). Iris Species Dataset and Database [Dataset]. https://www.kaggle.com/datasets/ghnshymsaini/iris-species-dataset-and-database

Iris Species Dataset and Database

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

May 15, 2025

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Ghanshyam Saini

License

MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically

Description

Iris Flower Dataset

This is a classic and very widely used dataset in machine learning and statistics, often serving as a first dataset for classification problems. Introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems," it is a foundational resource for learning classification algorithms.

Overview:

The dataset contains measurements for 150 samples of iris flowers. Each sample belongs to one of three species of iris:

Iris setosa
Iris versicolor
Iris virginica

For each flower, four features were measured:

Sepal length (in cm)
Sepal width (in cm)
Petal length (in cm)
Petal width (in cm)

The goal is typically to build a model that can classify iris flowers into their correct species based on these four features.

File Structure:

The dataset is usually provided as a single CSV (Comma Separated Values) file, often named iris.csv or similar. This file typically contains the following columns:

sepal_length (cm): Numerical. The length of the sepal of the iris flower.
sepal_width (cm): Numerical. The width of the sepal of the iris flower.
petal_length (cm): Numerical. The length of the petal of the iris flower.
petal_width (cm): Numerical. The width of the petal of the iris flower.
species: Categorical. The species of the iris flower (either 'setosa', 'versicolor', or 'virginica'). This is the target variable for classification.

Content of the Data:

The dataset contains an equal number of samples (50) for each of the three iris species. The measurements of the sepal and petal dimensions vary between the species, allowing for their differentiation using machine learning models.

How to Use This Dataset:

Download the iris.csv file.
Load the data using libraries like Pandas in Python.
Explore the data through visualization and statistical analysis to understand the relationships between the features and the different species.
Build classification models (e.g., Logistic Regression, Support Vector Machines, Decision Trees, K-Nearest Neighbors) using the sepal and petal measurements as features and the 'species' column as the target variable.
Evaluate the performance of your model using appropriate metrics (e.g., accuracy, precision, recall, F1-score).
The dataset is small and well-behaved, making it excellent for learning and experimenting with various classification techniques.

Citation:

When using the Iris dataset, it is common to cite Ronald Fisher's original work:

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179-188.

Data Contribution:

Thank you for providing this classic and fundamental dataset to the Kaggle community. The Iris dataset remains an invaluable resource for both beginners learning the basics of classification and experienced practitioners testing new algorithms. Its simplicity and clear class separation make it an ideal starting point for many data science projects.

If you find this dataset description helpful and the dataset itself useful for your learning or projects, please consider giving it an upvote after downloading. Your appreciation is valuable!

Clear search

Close search

Google apps

Main menu