100+ datasets found
  1. Dataset for demonstrating simple linear Regression

    • kaggle.com
    zip
    Updated Jul 3, 2024
    Cite
    Aaditya Gupta (2024). Dataset for demonstrating simple linear Regression [Dataset]. https://www.kaggle.com/datasets/aadityagupta11/data-for-demonstrating-basic-linear-regression
    Available download format: zip (2132 bytes)
    Dataset updated
    Jul 3, 2024
    Authors
    Aaditya Gupta
    Description

    This dataset was created to demonstrate a simple linear regression model. It includes two variables: one independent and one dependent. The data can be used for training, testing, and validating a simple linear regression model, which makes it suitable for educational purposes, tutorials, and basic predictive analysis projects. The dataset consists of 100 observations with no missing values and follows a linear relationship.
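As a rough illustration of what fitting a simple linear regression to two such variables involves, here is a minimal ordinary least-squares sketch. The listing does not give the column names or values, so the `xs` and `ys` lists below are made-up points on the line y = 2x + 1, not rows from the dataset.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative values only (assumption): points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)
```

With real data the points will not lie exactly on a line, and the same function returns the line that minimizes the sum of squared residuals.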

  2. Multiple Linear Regression Dataset

    • kaggle.com
    zip
    Updated Aug 14, 2022
    Cite
    Hussain Nasir Khan (2022). Multiple Linear Regression Dataset [Dataset]. https://www.kaggle.com/datasets/hussainnasirkhan/multiple-linear-regression-dataset/code
    Available download format: zip (367 bytes)
    Dataset updated
    Aug 14, 2022
    Authors
    Hussain Nasir Khan
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This is a very simple multiple linear regression dataset for beginners. This dataset has only three columns and twenty rows. There are only two independent variables and one dependent variable. The independent variables are 'age' and 'experience'. The dependent variable is 'income'.
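A multiple linear regression of 'income' on 'age' and 'experience' could be sketched as below. This assumes NumPy is available; the five rows are hypothetical and constructed so that income is an exact linear function of the two inputs, so they are not values from the dataset.

```python
import numpy as np

# Hypothetical rows for illustration only; NOT taken from the dataset.
age = np.array([25.0, 30.0, 35.0, 40.0, 45.0])
experience = np.array([1.0, 6.0, 5.0, 12.0, 20.0])
# Income is built as exactly 1000 + 500 * age + 2000 * experience.
income = 1000.0 + 500.0 * age + 2000.0 * experience

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(age), age, experience])

# Least-squares solution of X @ coef = income.
coef, *_ = np.linalg.lstsq(X, income, rcond=None)
intercept, b_age, b_experience = coef
print(intercept, b_age, b_experience)
```

Because the illustrative data is noise-free, the fit recovers the constructing coefficients up to floating-point error; with only twenty real rows, estimates would be far less stable.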

  3. Basic Regression Learning Dataset

    • kaggle.com
    zip
    Updated Jan 19, 2025
    Cite
    Artem Cheremuhin (2025). Basic Regression Learning Dataset [Dataset]. https://www.kaggle.com/datasets/artemcheremuhin/basic-regression-learning-dataset
    Available download format: zip (227431 bytes)
    Dataset updated
    Jan 19, 2025
    Authors
    Artem Cheremuhin
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Four dataframes are provided for solving regression problems. The variables of each dataframe are described in the corresponding .docx documents.

  4. doughnut dataset for simple linear regression

    • kaggle.com
    zip
    Updated Oct 29, 2023
    Cite
    Anurag Verma (2023). doughnut dataset for simple linear regression [Dataset]. https://www.kaggle.com/datasets/anurag629/doughnut-dataset-for-simple-linear-regression
    Available download format: zip (2260 bytes)
    Dataset updated
    Oct 29, 2023
    Authors
    Anurag Verma
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Anurag Verma

    Released under Apache 2.0


  5. Basic datasets

    • kaggle.com
    zip
    Updated Apr 1, 2024
    Cite
    Pascal (2024). Basic datasets [Dataset]. https://www.kaggle.com/datasets/pyim59/basic-datasets
    Available download format: zip (2343887 bytes)
    Dataset updated
    Apr 1, 2024
    Authors
    Pascal
    Description

    These datasets are used in Pascal Yim's Machine Learning course at Centrale Lille (image generated with ideogram.ai).

    Regression

    "datareg_xxx_yyy.csv"

    Simple examples for regression. For example, "datareg_cos_300.csv" is a set of 300 points following a noisy cosine, with two columns 'x' and 'y'.
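To give a feel for what a file like "datareg_cos_300.csv" might contain, the sketch below generates 300 (x, y) points along a noisy cosine. The x range, the noise level, and the seed are assumptions chosen for illustration; the listing does not say how the actual file was generated.

```python
import math
import random


def make_noisy_cosine(n_points=300, noise_std=0.1, seed=0):
    """Return a list of (x, y) pairs with y = cos(x) + Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_points):
        # Assumed range: x evenly spaced over [0, 2*pi].
        x = 2 * math.pi * i / (n_points - 1)
        y = math.cos(x) + rng.gauss(0.0, noise_std)
        rows.append((x, y))
    return rows


rows = make_noisy_cosine()
print(len(rows), rows[0])
```

A straight line fits such data poorly, which is what makes it a useful teaching example for nonlinear regression models.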

    "housing.csv"

    Estimating the median home value (MEDV) per neighborhood from several variables:
    • RM: number of rooms
    • LSTAT: a measure of the poverty rate
    • PTRATIO: a measure of the pupil-teacher ratio in schools

    Simplified version of the original UCI dataset

    Source: https://www.kaggle.com/datasets/schirmerchad/bostonhoustingmlnd

    "kc_house_data.csv"

    Predicting house prices around Seattle (King County)

    Source: https://www.kaggle.com/datasets/harlfoxem/housesalesprediction

    "house_prices.csv"

    Predicting house prices (Kaggle competition)

    Source: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data?select=train.csv

    Classification

    "geyser.csv"

    Old Faithful is a cone geyser in Yellowstone National Park in the United States.

    The following were measured:
    • duration: the duration of the eruption
    • waiting: the time interval since the previous eruption
    • kind: a label, 'short' or 'long', for the type of eruption

    "iris.csv"

    Dataset for classifying Iris species

    The following information is available:
    • sepal_length: sepal length (in cm)
    • sepal_width: sepal width
    • petal_length: petal length
    • petal_width: petal width
    • species: 3 iris species: 'setosa', 'versicolor' or 'virginica'

    Source: UCI (http://archive.ics.uci.edu/)

    "iris_basic.csv"

    A simplified version of the iris dataset, with only the petal measurements and 2 species: versicolor (0) and virginica (1)

    "heart.csv"

    Predicting heart attacks (output) from various parameters such as age, cholesterol level, ...

    Source: https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset

    "cancer.csv"

    The goal is to predict whether a tumor is malignant or not, from measurements taken on a biopsy of the tumor

    Source: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

    "penguins.csv"

    A dataset comparable to the Iris one. The goal is to predict the penguin species.

    • species: Adelie, Chinstrap, Gentoo
    • island: Biscoe, Dream, Torgersen
    • bill_length_mm: bill length
    • bill_depth_mm: bill depth
    • flipper_length_mm: flipper length
    • body_mass_g: body mass
    • sex: "male" or "female"

    Source: https://www.kaggle.com/ashkhagan/palmer-penguins-datasetalternative-iris-dataset

    "stars.csv"

    Star classification

    Source: https://www.kaggle.com/datasets/deepu1109/star-dataset

    "mushrooms.csv"

    Predict whether a mushroom is edible or not

    Source: https://www.kaggle.com/uciml/mushroom-classification

    "titanic.csv"

    A very classic dataset on the survivors of the Titanic

    Source: https://www.kaggle.com/c/titanic

    "diabetes.csv"

    The "Pima Indians Diabetes" dataset

    Predicting diabetes in a population of women from the Pima tribe

    Source: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

    "churn-small.csv"

    The goal is to predict which Orange Telecom customers will leave for a competitor (the 'churn' or 'attrition' problem)

    A "churn-big.csv" version with more data is also provided

    Source: https://www.kaggle.com/datasets/mnassrib/telecom-churn-datasets

    "stroke.csv"

    Stroke prediction

    Source: https://www.kaggle.com/datasets/shashwatwork/cerebral-stroke-predictionimbalaced-dataset

    "predictive_maintenance.csv"

    Failure prediction (UCI)

    Source: https://www.kaggle.com/datasets/shivamb/machine-predictive-maintenance-classification/code

  6. Synthetic Regression Dataset! 🤖

    • kaggle.com
    zip
    Updated Feb 26, 2023
    Cite
    Kyle Graupe (2023). Synthetic Regression Dataset! 🤖 [Dataset]. https://www.kaggle.com/datasets/kylegraupe/regression-dataset-great-for-practice
    Available download format: zip (133088 bytes)
    Dataset updated
    Feb 26, 2023
    Authors
    Kyle Graupe
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset is a great way to practice single or multiple regression tasks. It is simple, clean, and pre-split into training and testing sets! Choose your input features and your target label(s) and start predicting! The beauty of this dataset is that you get to choose what to do with it. Like in the real world, we may have raw data, but don't have a clear path forward. Use this dataset to exercise that skill!

  7. Simple Linear Regression Dataset

    • kaggle.com
    Updated Jun 29, 2023
    + more versions
    Cite
    Nitesh Addagatla (2023). Simple Linear Regression Dataset [Dataset]. https://www.kaggle.com/datasets/niteshaddagatla/simple-linear-regression-datasset/versions/1
    Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Jun 29, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nitesh Addagatla
    Description

    Dataset

    This dataset was created by Nitesh Addagatla


  8. Number of Books Read | Simple Linear Regression

    • kaggle.com
    zip
    Updated Aug 8, 2024
    Cite
    Trushil Dhokiya (2024). Number of Books Read | Simple Linear Regression [Dataset]. https://www.kaggle.com/datasets/trushildhokiya/number-of-books-read-simple-linear-regression
    Available download format: zip (4644 bytes)
    Dataset updated
    Aug 8, 2024
    Authors
    Trushil Dhokiya
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Description: Predicting Number of Books Read Based on Weekly Reading Hours

    Overview:

    This dataset aims to predict the number of books read per month based on the number of hours spent reading each week. It provides a practical dataset for linear regression tasks, where you can explore how reading habits impact the number of books completed.

    The dataset was generated from assumptions about people's average weekly reading hours and the average time needed to read a book.

    Features:

    • HoursSpentReading (Feature): The number of hours spent reading per week, ranging from 0 to 20 hours. This feature captures the amount of time an individual dedicates to reading each week.
    • BooksRead (Target): The number of books read per month, with values ranging from 0 to 10 books. This target variable represents the outcome influenced by the amount of weekly reading time.

    Dataset Details:

    • Number of Rows: 300
    • Number of Features: 1
    • Number of Targets: 1
  9. Regression Dataset

    • kaggle.com
    zip
    Updated Jul 8, 2021
    Cite
    Bhavesh Jain (2021). Regression Dataset [Dataset]. https://www.kaggle.com/bhaveshjain1612/regression-dataset
    Available download format: zip (56398 bytes)
    Dataset updated
    Jul 8, 2021
    Authors
    Bhavesh Jain
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Overview

    This dataset comprises 4 features and one target variable.

    Features:
    • Feature1
    • Feature2
    • Feature3
    • Feature4

    Target:
    • Target

    Aim:

    We need to predict the value of Target based on the feature list

  10. Boston Housing Dataset

    • kaggle.com
    zip
    Updated Nov 8, 2025
    Cite
    João Pedro Jerônimo (2025). Boston Housing Dataset [Dataset]. https://www.kaggle.com/datasets/joopedrojernimo/boston-housing-dataset
    Available download format: zip (35513 bytes)
    Dataset updated
    Nov 8, 2025
    Authors
    João Pedro Jerônimo
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    housing.csv: this dataset consists of 506 points in 14 dimensions. Each point represents a house in the Boston area, and the 14 attributes, listed in column order, are the following:
    • CRIM - per capita crime rate by town
    • ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
    • INDUS - proportion of non-retail business acres per town
    • CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
    • NOX - nitric oxides concentration (parts per 10 million)
    • RM - average number of rooms per dwelling
    • AGE - proportion of owner-occupied units built prior to 1940
    • DIS - weighted distances to five Boston employment centres
    • RAD - index of accessibility to radial highways
    • TAX - full-value property-tax rate per $10,000
    • PTRATIO - pupil-teacher ratio by town
    • B - 1000(Bk - 0.63)^2 where Bk is the proportion of Black residents by town
    • LSTAT - % lower status of the population
    • MEDV - median value of owner-occupied homes in $1000's

    This dataset is normally associated with 2 regression tasks: predicting NOX (the nitric oxides concentration) and predicting MEDV (the median value of a home).

    This dataset was also pre-processed and scaled.

  11. Pearson's Height Data 📏 Simple linear regression

    • kaggle.com
    zip
    Updated Aug 17, 2024
    + more versions
    Cite
    MaDiha 🌷 (2024). Pearson's Height Data 📏 Simple linear regression [Dataset]. https://www.kaggle.com/datasets/fundal/pearsons-height-data-simple-linear-regression
    Available download format: zip (3544 bytes)
    Dataset updated
    Aug 17, 2024
    Authors
    MaDiha 🌷
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    The table gives the heights of fathers and their sons, based on a famous experiment by Karl Pearson around 1903. The number of cases is 1078. Random noise was added to the original data to produce heights to the nearest 0.1 inch.

    Objective: Use this dataset to practice simple linear regression.

    Columns:
    • Father height
    • Son height

    Source: Department of Statistics, University of California, Berkeley

    Download TSV source file: Pearson.tsv

  12. polynomial regression

    • kaggle.com
    Updated Jul 5, 2023
    + more versions
    Cite
    Miraj Deep Bhandari (2023). polynomial regression [Dataset]. http://doi.org/10.34740/kaggle/ds/3482232
    Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Jul 5, 2023
    Dataset provided by
    Kaggle
    Authors
    Miraj Deep Bhandari
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    The Ice Cream Selling dataset is a simple and well-suited dataset for beginners in machine learning who are looking to practice polynomial regression. It consists of two columns: temperature and the corresponding number of units of ice cream sold.

    The dataset captures the relationship between temperature and ice cream sales. It serves as a practical example for understanding and implementing polynomial regression, a powerful technique for modeling nonlinear relationships in data.

    The dataset is designed to be straightforward and easy to work with, making it ideal for beginners. The simplicity of the data allows beginners to focus on the fundamental concepts and steps involved in polynomial regression without overwhelming complexity.

    By using this dataset, beginners can gain hands-on experience in preprocessing the data, splitting it into training and testing sets, selecting an appropriate degree for the polynomial regression model, training the model, and evaluating its performance. They can also explore techniques to address potential challenges such as overfitting.

    With this dataset, beginners can practice making predictions of ice cream sales based on temperature inputs and visualize the polynomial regression curve that represents the relationship between temperature and ice cream sales.

    Overall, the Ice Cream Selling dataset provides an accessible and practical learning resource for beginners to grasp the concepts and techniques of polynomial regression in the context of analyzing ice cream sales data.
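The workflow described above (split the data, try several polynomial degrees, train, evaluate on held-out points) might look roughly like the sketch below. It assumes NumPy is available. The data is synthetic stand-in data with an assumed quadratic shape, not the actual Ice Cream Selling values, and `held_out_mse` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): sales follow a quadratic in temperature plus noise.
temperature = np.linspace(-5.0, 5.0, 60)
sales = 3.0 * temperature**2 + 2.0 + rng.normal(0.0, 1.0, temperature.size)

# Hold out every fourth point as a test set; train on the rest.
test_mask = np.arange(temperature.size) % 4 == 0
x_train, y_train = temperature[~test_mask], sales[~test_mask]
x_test, y_test = temperature[test_mask], sales[test_mask]


def held_out_mse(degree):
    """Fit a polynomial of the given degree on the training set; return test-set MSE."""
    coeffs = np.polyfit(x_train, y_train, degree)
    predictions = np.polyval(coeffs, x_test)
    return float(np.mean((predictions - y_test) ** 2))


# Compare candidate degrees by their error on the held-out points.
errors = {degree: held_out_mse(degree) for degree in range(1, 6)}
best_degree = min(errors, key=errors.get)
print(errors, best_degree)
```

A degree that is too low underfits (here, degree 1 cannot follow the curve), while a needlessly high degree can overfit the training points; comparing held-out error is one simple way to choose between them.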

  13. Salary Dataset - ASPDC

    • kaggle.com
    zip
    Updated Sep 5, 2025
    Cite
    Vrajesh Sharma (2025). Salary Dataset - ASPDC [Dataset]. https://www.kaggle.com/datasets/vrajesh0sharma7/salary-dataset-aspdc
    Available download format: zip (457 bytes)
    Dataset updated
    Sep 5, 2025
    Authors
    Vrajesh Sharma
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Salary dataset in CSV format for simple linear regression. It has also been used in the ASPDC series "ML in one month".

    There are two columns:
    1. Experience in years
    2. Salary

  14. Simple Linear Regression - Students Study Data

    • kaggle.com
    zip
    Updated Aug 19, 2024
    Cite
    Souradip Pal (2024). Simple Linear Regression - Students Study Data [Dataset]. https://www.kaggle.com/datasets/souradippal/simple-linear-regression-hours-vs-marks-data
    Available download format: zip (791 bytes)
    Dataset updated
    Aug 19, 2024
    Authors
    Souradip Pal
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset is designed to help you practice linear regression, a fundamental concept in machine learning and statistical analysis. The dataset contains a simulated linear relationship between the number of hours a student studies and the marks they obtain. It is an ideal resource for beginners who want to understand how linear regression works, or for educators looking to provide a simple yet effective example to their students.

  15. Regression fake data

    • kaggle.com
    zip
    Updated Apr 9, 2024
    Cite
    A.Sehwag (2024). Regression fake data [Dataset]. https://www.kaggle.com/datasets/atulsehwag00/regression-fake-data
    Available download format: zip (26398 bytes)
    Dataset updated
    Apr 9, 2024
    Authors
    A.Sehwag
    Description

    Simple data created for practicing regression problems. It consists of three columns: Price, Feature 1, and Feature 2. Try to predict Price using Feature 1 and Feature 2. The data is clean, so no data cleaning is required.

  16. Call Center Simulated Data

    • kaggle.com
    zip
    Updated Mar 28, 2023
    Cite
    Pablo Sebastián Campos Ortiz (2023). Call Center Simulated Data [Dataset]. https://www.kaggle.com/datasets/scss17/call-center-simulated-data
    Available download format: zip (3098 bytes)
    Dataset updated
    Mar 28, 2023
    Authors
    Pablo Sebastián Campos Ortiz
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This data set is intended to be used along with my notebook, Linear Regression Notes, which provides a guideline for applying correlation analysis and linear regression models from a statistical approach.

    A fictional call center is interested in knowing the relationship between the number of personnel and some variables that measure their performance such as average answer time, average calls per hour, and average time per call. Data were simulated to represent 200 shifts.
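The correlation analysis mentioned above typically starts with the Pearson correlation coefficient between two variables. The sketch below is a minimal version; the column names and the five shift values are hypothetical, chosen to form a perfect negative linear relationship, and are not taken from the data set or the notebook.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical shifts (assumption): more personnel, shorter average answer time.
personnel = [2, 4, 6, 8, 10]
avg_answer_time = [50.0, 40.0, 30.0, 20.0, 10.0]
r = pearson_r(personnel, avg_answer_time)
print(r)
```

Values of r near -1 or 1 indicate a strong linear relationship, which is the usual justification for fitting a linear regression next.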

  17. House Price Regression Dataset

    • kaggle.com
    zip
    Updated Sep 6, 2024
    Cite
    Prokshitha Polemoni (2024). House Price Regression Dataset [Dataset]. https://www.kaggle.com/datasets/prokshitha/home-value-insights
    Available download format: zip (27045 bytes)
    Dataset updated
    Sep 6, 2024
    Authors
    Prokshitha Polemoni
    Description

    Home Value Insights: A Beginner's Regression Dataset

    This dataset is designed for beginners to practice regression problems, particularly in the context of predicting house prices. It contains 1000 rows, with each row representing a house and various attributes that influence its price. The dataset is well-suited for learning basic to intermediate-level regression modeling techniques.

    Features:

    1. Square_Footage: The size of the house in square feet. Larger homes typically have higher prices.
    2. Num_Bedrooms: The number of bedrooms in the house. More bedrooms generally increase the value of a home.
    3. Num_Bathrooms: The number of bathrooms in the house. Houses with more bathrooms are typically priced higher.
    4. Year_Built: The year the house was built. Older houses may be priced lower due to wear and tear.
    5. Lot_Size: The size of the lot the house is built on, measured in acres. Larger lots tend to add value to a property.
    6. Garage_Size: The number of cars that can fit in the garage. Houses with larger garages are usually more expensive.
    7. Neighborhood_Quality: A rating of the neighborhood’s quality on a scale of 1-10, where 10 indicates a high-quality neighborhood. Better neighborhoods usually command higher prices.
    8. House_Price (Target Variable): The price of the house, which is the dependent variable you aim to predict.

    Potential Uses:

    1. Beginner Regression Projects: This dataset can be used to practice building regression models such as Linear Regression, Decision Trees, or Random Forests. The target variable (house price) is continuous, making this an ideal problem for supervised learning techniques.

    2. Feature Engineering Practice: Learners can create new features by combining existing ones, such as the price per square foot or age of the house, providing an opportunity to experiment with feature transformations.

    3. Exploratory Data Analysis (EDA): You can explore how different features (e.g., square footage, number of bedrooms) correlate with the target variable, making it a great dataset for learning about data visualization and summary statistics.

    4. Model Evaluation: The dataset allows for various model evaluation techniques such as cross-validation, R-squared, and Mean Absolute Error (MAE). These metrics can be used to compare the effectiveness of different models.
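Two of the evaluation metrics named in the list above, Mean Absolute Error and R-squared, can be computed as in this minimal sketch. The `actual` and `predicted` prices are hypothetical numbers for illustration, not rows or model outputs from the dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical house prices and model predictions (assumption).
actual = [200000.0, 300000.0, 400000.0]
predicted = [210000.0, 290000.0, 400000.0]
mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(mae, r2)
```

MAE is expressed in the same units as the price, while R-squared is unitless, so the two are often reported together when comparing models.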

    Versatility:

    • The dataset is highly versatile for a range of machine learning tasks. You can apply simple linear models to predict house prices based on one or two features, or use more complex models like Random Forest or Gradient Boosting Machines to understand interactions between variables.

    • It can also be used for dimensionality reduction techniques like PCA or to practice handling categorical variables (e.g., neighborhood quality) through encoding techniques like one-hot encoding.

    • This dataset is ideal for anyone wanting to gain practical experience in building regression models while working with real-world features.
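One-hot encoding, mentioned above for the neighborhood quality rating, turns each category into its own 0/1 column. The sketch below is a plain-Python version; the `ratings` list is hypothetical, and treating the 1-10 rating as categorical is only one modeling choice (it could also be left as a numeric feature).

```python
def one_hot(values, categories):
    """Encode each value as a list of 0s with a single 1 at its category's position."""
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return encoded


# Hypothetical Neighborhood_Quality ratings on the 1-10 scale (assumption).
ratings = [1, 3, 3, 10]
categories = list(range(1, 11))
encoded = one_hot(ratings, categories)
for rating, row in zip(ratings, encoded):
    print(rating, row)
```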

  18. Simple Linear Regression - Placement data

    • kaggle.com
    zip
    Updated Nov 15, 2021
    Cite
    Mayur Dalvi (2021). Simple Linear Regression - Placement data [Dataset]. https://www.kaggle.com/mayurdalvi/simple-linear-regression-placement-data
    Available download format: zip (3741 bytes)
    Dataset updated
    Nov 15, 2021
    Authors
    Mayur Dalvi
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    This dataset was built to help understand Simple Linear Regression. Its contents are easy to understand.

    Content

    Contains two columns:

    • CGPA: Aggregate CGPA received
    • Package: Total Package (LPA)

    Thank You !!

    If you like my work, please UPVOTE 🙏🙏

    Happy Learning

  19. Color Polygon Images

    • kaggle.com
    zip
    Updated Apr 7, 2022
    Cite
    Gonzalo Recio (2022). Color Polygon Images [Dataset]. https://www.kaggle.com/datasets/gonzalorecioc/color-polygon-images/discussion
    Available download format: zip (8387599 bytes)
    Dataset updated
    Apr 7, 2022
    Authors
    Gonzalo Recio
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This is a randomly generated image dataset containing 10,000 square images with a single regular polygon each. The procedure to create this dataset can be found in the dataset Code tab.

    This dataset can be used for a number of Machine Learning problems, both classification and regression.

    • The image size is (128, 128)px
    • The possible regular polygons are triangles, squares, pentagons and hexagons. A classifier should be able to predict the shape in an unseen image.
    • The possible colors for foreground and background are ["white", "gray", "black", "red", "green", "blue", "yellow", "orange", "purple", "pink"], but they are slightly altered for every image. These color labels can be predicted using a multiclass classifier model.
    • The bounding circle is always within the image boundaries. A regression model can be trained to predict where this circle is.
    • The rotation can be any angle between 0 and 360 degrees. This can pose an interesting problem for a regression model, since for every regular polygon there is a set of angles that produce the same image.
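The rotation ambiguity noted in the last bullet comes from rotational symmetry: a regular polygon with n sides looks identical after a rotation of 360/n degrees. One possible way to handle it, sketched below, is to reduce each angle to a canonical value before using it as a regression target. The function name `canonical_rotation` is hypothetical, and the sketch assumes a uniformly colored polygon, so that the symmetry is exact.

```python
def canonical_rotation(angle_degrees, n_sides):
    """Map a rotation angle to its representative in [0, 360 / n_sides)."""
    period = 360.0 / n_sides  # rotating by this amount leaves the image unchanged
    return angle_degrees % period


# A square rotated by 100 degrees looks the same as one rotated by 10 degrees.
print(canonical_rotation(100, 4))
# A triangle repeats every 120 degrees, a hexagon every 60 degrees.
print(canonical_rotation(130, 3), canonical_rotation(70, 6))
```

Whether the dataset's stored rotation labels are already reduced this way is not stated in the description.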
  20. Logistic regression

    • kaggle.com
    zip
    Updated Feb 1, 2025
    Cite
    osama shabih (2025). Logistic regression [Dataset]. https://www.kaggle.com/datasets/osama12bin/logistic-regression
    Available download format: zip (2317 bytes)
    Dataset updated
    Feb 1, 2025
    Authors
    osama shabih
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Logistic regression is a statistical method used for binary classification tasks, where the goal is to predict one of two possible outcomes. It's widely used in machine learning for tasks like spam detection, disease diagnosis, and customer churn prediction.

    In logistic regression, the dependent variable (the outcome) is categorical and typically takes on two values (often represented as 0 and 1). The model works by estimating the probability that a given input belongs to a certain class, based on one or more predictor variables (which can be continuous or categorical).

    Key points:

    Sigmoid Function: Logistic regression uses the sigmoid (or logistic) function, which maps any real-valued number to a value between 0 and 1. This is how the model outputs a probability.

    The sigmoid function is given by:

    $$P(y = 1 \mid X) = \frac{1}{1 + e^{-z}}$$

    where $z$ is a linear combination of the input features:

    $$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$

    Here, $\beta_0, \beta_1, \ldots, \beta_n$ are the coefficients, and $x_1, x_2, \ldots, x_n$ are the features.

    Prediction: Once the model is trained, it predicts a probability $P(y = 1 \mid X)$. A threshold (often 0.5) is used to classify the observation as belonging to one class or the other. If the probability is greater than 0.5, it predicts class 1; otherwise, it predicts class 0.
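The sigmoid and thresholding steps described above can be sketched in a few lines. The function names and the example coefficients are illustrative assumptions; they are not fitted values and are not taken from the dataset.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for very negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(features, coefficients, intercept):
    """Return P(y = 1 | X) for one observation."""
    # z is the linear combination: intercept + sum of coefficient * feature.
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(z)


def predict_class(features, coefficients, intercept, threshold=0.5):
    """Predict class 1 if the probability exceeds the threshold, otherwise class 0."""
    return 1 if predict_proba(features, coefficients, intercept) > threshold else 0


# Hypothetical model with a single feature (assumption): z = -1.0 + 1.0 * x.
print(predict_proba([2.0], [1.0], -1.0), predict_class([2.0], [1.0], -1.0))
print(predict_proba([0.0], [1.0], -1.0), predict_class([0.0], [1.0], -1.0))
```

The 0.5 threshold is only a default; a different cutoff may be preferable when the two kinds of misclassification have different costs.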

    Loss Function: Logistic regression typically uses a loss function called log loss (or binary cross-entropy), which measures the difference between the predicted probabilities and the actual class labels.
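The log loss mentioned above averages, over all observations, the negative log of the probability assigned to the true class. The sketch below is a minimal version; the clipping constant `eps` is an assumption added to keep `log(0)` from occurring, and the example labels and probabilities are made up.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy averaged over observations; labels are 0 or 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities away from exactly 0 and 1 so the logarithm stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up labels and predicted probabilities (assumption).
labels = [1, 0]
print(log_loss(labels, [0.5, 0.5]))  # uninformative predictions
print(log_loss(labels, [0.9, 0.1]))  # confident and correct predictions
```

Confident correct predictions give a loss close to 0, while confident wrong predictions are penalized heavily, which is why the loss rewards well-calibrated probabilities.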

    Interpretability: The coefficients in logistic regression can provide insights into the relationship between each feature and the probability of the outcome. For example, a positive coefficient indicates that an increase in the corresponding feature is associated with a higher probability of the outcome being class 1.

    Logistic regression is relatively simple to implement and interpret, which makes it a popular choice for many real-world classification tasks!
