100+ datasets found

Price Prediction -Multiple Linear Regression
kaggle.com
zip
Updated Aug 3, 2022
Cite
Erol Masimov (2022). Price Prediction -Multiple Linear Regression [Dataset]. https://www.kaggle.com/datasets/erolmasimov/price-prediction-multiple-linear-regression
Explore at:
zip (6192 bytes)
Dataset updated
Aug 3, 2022
Authors
Erol Masimov
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
A car company wants to enter a new market and needs to estimate which variables affect car prices. The goals are to determine: (1) which variables are significant in predicting the price of a car, and (2) how well those variables describe the price of a car.
Marketing Linear Multiple Regression
kaggle.com
zip
Updated Apr 24, 2020
Cite
FayeJavad (2020). Marketing Linear Multiple Regression [Dataset]. https://www.kaggle.com/datasets/fayejavad/marketing-linear-multiple-regression
Explore at:
zip (1907 bytes)
Dataset updated
Apr 24, 2020
Authors
FayeJavad
Description
Dataset

This dataset was created by FayeJavad

Energy Consumption Dataset - Linear Regression
kaggle.com
Updated Jan 6, 2025
Cite
GOVINDARAM SRIRAM (2025). Energy Consumption Dataset - Linear Regression [Dataset]. https://www.kaggle.com/datasets/govindaramsriram/energy-consumption-dataset-linear-regression
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 6, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
GOVINDARAM SRIRAM
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description: This dataset is designed for predicting energy consumption based on various building features and environmental factors. It contains data for multiple building types, square footage, the number of occupants, appliances used, average temperature, and the day of the week. The goal is to build a predictive model to estimate energy consumption using these attributes.

The dataset can be used for training machine learning models such as linear regression to forecast energy needs based on the building's characteristics. This is useful for understanding energy demand patterns and optimizing energy consumption in different building types and environmental conditions.
Salary Dataset - Simple linear regression
kaggle.com
zip
Updated Jan 10, 2023
Cite
Allena Venkata Sai Abhishek (2023). Salary Dataset - Simple linear regression [Dataset]. https://www.kaggle.com/datasets/abhishek14398/salary-dataset-simple-linear-regression/code
Explore at:
zip (457 bytes)
Dataset updated
Jan 10, 2023
Authors
Allena Venkata Sai Abhishek
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Description

Salary dataset in CSV format for simple linear regression. It has also been used in the Machine Learning A to Z course of my series.

Columns

- #
- YearsExperience
- Salary
Insurance Dataset - Simple Linear Regression
kaggle.com
zip
Updated Sep 14, 2023
Cite
Taseer Mehboob (2023). Insurance Dataset - Simple Linear Regression [Dataset]. https://www.kaggle.com/datasets/taseermehboob9/insurance-dataset-simple-linear-regression
Explore at:
zip (254 bytes)
Dataset updated
Sep 14, 2023
Authors
Taseer Mehboob
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset has only two columns: the first is Age and the second is Premium. You can use it to practice simple linear regression and prediction.
polynomial regression
kaggle.com
Updated Jul 5, 2023
+ more versions
Cite
Miraj Deep Bhandari (2023). polynomial regression [Dataset]. http://doi.org/10.34740/kaggle/ds/3482232
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/ds/3482232
Dataset updated
Jul 5, 2023
Dataset provided by
Kaggle
Authors
Miraj Deep Bhandari
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The Ice Cream Selling dataset is a simple and well-suited dataset for beginners in machine learning who are looking to practice polynomial regression. It consists of two columns: temperature and the corresponding number of units of ice cream sold.

The dataset captures the relationship between temperature and ice cream sales. It serves as a practical example for understanding and implementing polynomial regression, a powerful technique for modeling nonlinear relationships in data.

The dataset is designed to be straightforward and easy to work with, making it ideal for beginners. The simplicity of the data allows beginners to focus on the fundamental concepts and steps involved in polynomial regression without overwhelming complexity.

By using this dataset, beginners can gain hands-on experience in preprocessing the data, splitting it into training and testing sets, selecting an appropriate degree for the polynomial regression model, training the model, and evaluating its performance. They can also explore techniques to address potential challenges such as overfitting.

With this dataset, beginners can practice making predictions of ice cream sales based on temperature inputs and visualize the polynomial regression curve that represents the relationship between temperature and ice cream sales.

Overall, the Ice Cream Selling dataset provides an accessible and practical learning resource for beginners to grasp the concepts and techniques of polynomial regression in the context of analyzing ice cream sales data.
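
The fitting step described above can be sketched with the Python standard library alone. This is a minimal illustration, not the dataset author's method: `fit_polynomial` and `predict` are hypothetical helper names, and the temperature and sales values are invented placeholders rather than rows from the dataset.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] for c0 + c1*x + c2*x^2 + ...
    """
    n = degree + 1
    # Normal equations A c = b, where A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Placeholder data following y = 2 + 0.5 * x^2 exactly (NOT from the dataset).
temperatures = [-4.0, -2.0, 0.0, 1.0, 3.0, 5.0]
units_sold = [2 + 0.5 * t ** 2 for t in temperatures]

coeffs = fit_polynomial(temperatures, units_sold, degree=2)
```

Choosing the degree is the part that needs care: a higher degree always fits the training points at least as well, so comparing degrees on a held-out test split is one way to guard against the overfitting mentioned above.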
Startup - Multiple Linear Regression
kaggle.com
zip
Updated Jan 29, 2018
Cite
karthickveerakumar (2018). Startup - Multiple Linear Regression [Dataset]. https://www.kaggle.com/datasets/karthickveerakumar/startup-logistic-regression
Explore at:
zip (1330 bytes)
Dataset updated
Jan 29, 2018
Authors
karthickveerakumar
Description
Dataset

This dataset was created by karthickveerakumar

Multiple Linear Regression Dataset for Practice
kaggle.com
zip
Updated Jun 14, 2024
Cite
Muhammad Fawad Ul Hassan Sarim (2024). Multiple Linear Regression Dataset for Practice [Dataset]. https://www.kaggle.com/datasets/fawadsarim/multiple-linear-regression-dataset-for-practice
Explore at:
zip (2248 bytes)
Dataset updated
Jun 14, 2024
Authors
Muhammad Fawad Ul Hassan Sarim
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Muhammad Fawad Ul Hassan Sarim

Released under Apache 2.0

Salary Dataset for Simple Linear regression model
kaggle.com
zip
Updated Dec 4, 2023
Cite
Abhishek Kumar (2023). Salary Dataset for Simple Linear regression model [Dataset]. https://www.kaggle.com/datasets/abhishek121212/salary-dataset-for-simple-linear-regression-model
Explore at:
zip (457 bytes)
Dataset updated
Dec 4, 2023
Authors
Abhishek Kumar
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Abhishek Kumar

Released under Apache 2.0

House Price Regression Dataset
kaggle.com
zip
Updated Sep 6, 2024
Cite
Prokshitha Polemoni (2024). House Price Regression Dataset [Dataset]. https://www.kaggle.com/datasets/prokshitha/home-value-insights
Explore at:
zip (27045 bytes)
Dataset updated
Sep 6, 2024
Authors
Prokshitha Polemoni
Description
Home Value Insights: A Beginner's Regression Dataset

This dataset is designed for beginners to practice regression problems, particularly in the context of predicting house prices. It contains 1000 rows, with each row representing a house and various attributes that influence its price. The dataset is well-suited for learning basic to intermediate-level regression modeling techniques.

Features:

Square_Footage: The size of the house in square feet. Larger homes typically have higher prices.

Num_Bedrooms: The number of bedrooms in the house. More bedrooms generally increase the value of a home.

Num_Bathrooms: The number of bathrooms in the house. Houses with more bathrooms are typically priced higher.

Year_Built: The year the house was built. Older houses may be priced lower due to wear and tear.

Lot_Size: The size of the lot the house is built on, measured in acres. Larger lots tend to add value to a property.

Garage_Size: The number of cars that can fit in the garage. Houses with larger garages are usually more expensive.

Neighborhood_Quality: A rating of the neighborhood’s quality on a scale of 1-10, where 10 indicates a high-quality neighborhood. Better neighborhoods usually command higher prices.

House_Price (Target Variable): The price of the house, which is the dependent variable you aim to predict.

Potential Uses:

Beginner Regression Projects: This dataset can be used to practice building regression models such as Linear Regression, Decision Trees, or Random Forests. The target variable (house price) is continuous, making this an ideal problem for supervised learning techniques.

Feature Engineering Practice: Learners can create new features by combining existing ones, such as the price per square foot or age of the house, providing an opportunity to experiment with feature transformations.

Exploratory Data Analysis (EDA): You can explore how different features (e.g., square footage, number of bedrooms) correlate with the target variable, making it a great dataset for learning about data visualization and summary statistics.

Model Evaluation: The dataset allows for various model evaluation techniques such as cross-validation, R-squared, and Mean Absolute Error (MAE). These metrics can be used to compare the effectiveness of different models.
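
The two metrics named under Model Evaluation can be computed in a few lines. The sketch below assumes plain Python lists of actual and predicted prices; the price values are made up for illustration and are not taken from the dataset.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative values only (NOT from the dataset).
actual_prices = [200_000, 250_000, 300_000, 350_000]
predicted_prices = [210_000, 240_000, 310_000, 340_000]

mae = mean_absolute_error(actual_prices, predicted_prices)
r2 = r_squared(actual_prices, predicted_prices)
```

MAE is expressed in the same units as House_Price, which makes it easy to interpret, while R-squared is unitless and is therefore convenient for comparing models.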

Versatility:

The dataset is highly versatile for a range of machine learning tasks. You can apply simple linear models to predict house prices based on one or two features, or use more complex models like Random Forest or Gradient Boosting Machines to understand interactions between variables.

It can also be used for dimensionality reduction techniques like PCA or to practice handling categorical variables (e.g., neighborhood quality) through encoding techniques like one-hot encoding.

This dataset is ideal for anyone wanting to gain practical experience in building regression models while working with real-world features.
Multiple Linear Regression Dataset
kaggle.com
zip
Updated Aug 14, 2022
Cite
Hussain Nasir Khan (2022). Multiple Linear Regression Dataset [Dataset]. https://www.kaggle.com/datasets/hussainnasirkhan/multiple-linear-regression-dataset/code
Explore at:
zip (367 bytes)
Dataset updated
Aug 14, 2022
Authors
Hussain Nasir Khan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This is a very simple multiple linear regression dataset for beginners. This dataset has only three columns and twenty rows. There are only two independent variables and one dependent variable. The independent variables are 'age' and 'experience'. The dependent variable is 'income'.
Multiple Linear Regression Dataset
kaggle.com
zip
Updated Jul 11, 2025
Cite
Siddant007 (2025). Multiple Linear Regression Dataset [Dataset]. https://www.kaggle.com/datasets/siddant007/multiplelinearregression-outliers-missing-values
Explore at:
zip (1110 bytes)
Dataset updated
Jul 11, 2025
Authors
Siddant007
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This is a synthetic but realistic dataset created for practicing Multiple Linear Regression and feature engineering in a housing price prediction context. The dataset includes common real-world challenges like missing values, outliers, and categorical features.

You can use this dataset to:

- Build a regression model
- Practice data cleaning
- Explore feature scaling and encoding
- Visualize relationships between house characteristics and price
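
As one possible way to handle the missing values and outliers mentioned above, the sketch below fills gaps with the median and flags outliers with the 1.5 × IQR rule. Both techniques are common choices assumed here, not something the dataset prescribes, and the `areas` column and its values are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at k * IQR beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical column with two missing entries and one extreme value.
areas = [1200, 1500, None, 1350, 1600, 1450, 9000, None, 1300]

filled = impute_median(areas)
low, high = iqr_bounds(filled)
outliers = [v for v in filled if v < low or v > high]
```

Whether flagged rows should be dropped, capped, or kept depends on the modeling goal; ordinary least squares is sensitive to extreme values, so it is worth comparing fits with and without them.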
Linear Regression
kaggle.com
zip
Updated Jan 29, 2022
Cite
Fareed Khan (2022). Linear Regression [Dataset]. https://www.kaggle.com/datasets/fareedkhan557/linear-regression
Explore at:
zip (14257572 bytes)
Dataset updated
Jan 29, 2022
Authors
Fareed Khan
Description
Context

This dataset is created using the sources from this dataset.

Content

Single variable regression model:

\[ y = mx + c \]

Both the training dataset and the testing dataset contain 1 million rows.

1) x-values are numbers between 1 and 100.
2) y-values are created using this Excel function: NORMINV(RAND(), x, 3).
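
For readers who want to reproduce similar data outside Excel, here is a rough Python analogue of NORMINV(RAND(), x, 3): it draws a uniform probability and passes it through the inverse normal CDF with mean x and standard deviation 3. The helper name `norminv_rand`, the seed, the sample size of 1000, and the choice of continuous (rather than integer) x-values are all assumptions made for this sketch.

```python
import random
from statistics import NormalDist


def norminv_rand(mean, std_dev, rng):
    """Python analogue of Excel's NORMINV(RAND(), mean, std_dev)."""
    p = rng.random()
    while p == 0.0:  # inv_cdf requires 0 < p < 1
        p = rng.random()
    return NormalDist(mu=mean, sigma=std_dev).inv_cdf(p)


rng = random.Random(0)  # fixed seed, used only to make the sketch reproducible

# Assumption: x is drawn uniformly from [1, 100]; the source only says
# "numbers between 1 and 100".
xs = [rng.uniform(1, 100) for _ in range(1000)]
ys = [norminv_rand(x, 3, rng) for x in xs]
```

Because each y is centered on its own x, a fitted line should come out close to y = x, that is, m near 1 and c near 0 in the model above.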

License: feel free to use
Dataset for Linear Regression with 2 IV and 1 DV
kaggle.com
zip
Updated Mar 25, 2025
Cite
Stable Space (2025). Dataset for Linear Regression with 2 IV and 1 DV [Dataset]. https://www.kaggle.com/datasets/sharmajicoder/dataset-for-linear-regression-with-2-iv-and-1-dv
Explore at:
zip (9351 bytes)
Dataset updated
Mar 25, 2025
Authors
Stable Space
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset for Linear Regression with two Independent variables and one Dependent variable. Focused on Testing, Visualization and Statistical Analysis. The dataset is synthetic and contains 100 instances.
Linear regression with noise
kaggle.com
zip
Updated Nov 16, 2023
Cite
404akhan (2023). Linear regression with noise [Dataset]. https://www.kaggle.com/datasets/akhan404/linear-regression-with-noise
Explore at:
zip (2324 bytes)
Dataset updated
Nov 16, 2023
Authors
404akhan
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset to practice linear regression. We generate synthetic x from a uniform distribution on (-5, 5) and noise as N(0, 1) * 0.5, with a = 1 and b = 2.

The dataset is generated using the formula: y = a + b * x + noise
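
The generating formula can be checked by simulating it and recovering a and b with closed-form ordinary least squares. This is a sketch under stated assumptions: the sample size of 500 and the seed are arbitrary choices (the description does not give the row count), and `fit_line` is a hypothetical helper.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


rng = random.Random(42)  # fixed seed for reproducibility (assumption)
a_true, b_true = 1.0, 2.0

# x ~ Uniform(-5, 5), noise = 0.5 * N(0, 1), as in the description.
xs = [rng.uniform(-5, 5) for _ in range(500)]
ys = [a_true + b_true * x + 0.5 * rng.gauss(0, 1) for x in xs]

a_hat, b_hat = fit_line(xs, ys)
```

With noise this small relative to the spread of x, the estimates should land close to a = 1 and b = 2; increasing the noise scale or shrinking the sample widens the gap.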
Study Hours ,Student Scores for Linear Regression
kaggle.com
Updated Sep 23, 2024
Cite
douaa bennoune (2024). Study Hours ,Student Scores for Linear Regression [Dataset]. https://www.kaggle.com/datasets/douaabennoune/study-hours-student-scores-for-linear-regression
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 23, 2024
Dataset provided by
Kaggle
Authors
douaa bennoune
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains a collection of 100 randomly generated data points representing the relationship between the number of hours a student spends studying and their corresponding performance, measured as a score. The data has been generated to simulate a real-world scenario where study hours are assumed to influence academic outcomes, making it an excellent resource for linear regression analysis and other machine learning tasks.

Each row in the dataset consists of:

- Hours: The number of hours a student dedicates to studying, ranging between 0 and 10 hours.
- Scores: The student's performance score, represented as a percentage, ranging from 0 to 100.

Use Cases: This dataset is particularly useful for:

- Linear Regression: Exploring how study hours influence student performance, fitting a regression line to predict scores based on study time.
- Data Science & Machine Learning: Practicing regression analysis, training models, and applying other predictive algorithms.
- Educational Research: Simulating data-driven insights into student behavior and performance metrics.

Features:

- 100 rows of data.
- Continuous numerical variables suitable for regression tasks.
- Generated for educational purposes, making it ideal for students, teachers, and beginners in machine learning and data science.

Potential Applications:

- Build a linear regression model to predict student scores.
- Investigate the correlation between study time and performance.
- Apply data visualization techniques to better understand the data.
- Use the dataset to experiment with model evaluation metrics like Mean Squared Error (MSE) and R-squared.
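
To investigate the correlation between study time and performance, one option is the Pearson correlation coefficient. The sketch below implements it directly; the `hours` and `scores` values are invented for illustration and are not rows from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only (NOT from the dataset).
hours = [1.0, 2.5, 4.0, 5.5, 7.0, 8.5]
scores = [20.0, 35.0, 50.0, 58.0, 75.0, 88.0]

r = pearson_r(hours, scores)
```

For simple linear regression with a single predictor, the square of this coefficient equals the R-squared of the fitted line, which links the correlation check to the evaluation metrics listed above.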
Pearson's Height Data 📏 Simple linear regression
kaggle.com
zip
Updated Aug 17, 2024
+ more versions
Cite
MaDiha 🌷 (2024). Pearson's Height Data 📏 Simple linear regression [Dataset]. https://www.kaggle.com/datasets/fundal/pearsons-height-data-simple-linear-regression
Explore at:
zip (3544 bytes)
Dataset updated
Aug 17, 2024
Authors
MaDiha 🌷
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The table below gives the heights of fathers and their sons, based on a famous experiment by Karl Pearson around 1903. The number of cases is 1078. Random noise was added to the original data to produce heights to the nearest 0.1 inch.

Objective: Use this dataset to practice simple linear regression.

Columns:

- Father height
- Son height

Source: Department of Statistics, University of California, Berkeley

Download TSV source file: Pearson.tsv
Dataset for demonstrating simple linear Regression
kaggle.com
zip
Updated Jul 3, 2024
Cite
Aaditya Gupta (2024). Dataset for demonstrating simple linear Regression [Dataset]. https://www.kaggle.com/datasets/aadityagupta11/data-for-demonstrating-basic-linear-regression
Explore at:
zip (2132 bytes)
Dataset updated
Jul 3, 2024
Authors
Aaditya Gupta
Description
This dataset was created to demonstrate the use of a simple linear regression model. It includes two variables: an independent variable and a dependent variable. The data can be used for training, testing, and validating a simple linear regression model, making it ideal for educational purposes, tutorials, and basic predictive analysis projects. The dataset consists of 100 observations with no missing values, and it follows a linear relationship.
Simple Linear Regression Dataset
kaggle.com
Updated Jun 29, 2023
+ more versions
Cite
Nitesh Addagatla (2023). Simple Linear Regression Dataset [Dataset]. https://www.kaggle.com/datasets/niteshaddagatla/simple-linear-regression-datasset/versions/1
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 29, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nitesh Addagatla
Description
Dataset

This dataset was created by Nitesh Addagatla

Regression Dataset for Household Income Analysis
kaggle.com
Updated Jun 5, 2024
Cite
Umair Zia (2024). Regression Dataset for Household Income Analysis [Dataset]. https://www.kaggle.com/datasets/stealthtechnologies/regression-dataset-for-household-income-analysis
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 5, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Umair Zia
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This synthetic dataset simulates various demographic and socioeconomic factors that influence annual household income. It can be used for exploratory data analysis, predictive modeling, and understanding the relationships between different features and income levels.

Features:

Age: Age of the primary household member (18 to 70 years).

Education Level: Highest education level attained (High School, Bachelor's, Master's, Doctorate).

Occupation: Type of occupation (Healthcare, Education, Technology, Finance, Others).

Number of Dependents: Number of dependents in the household (0 to 5).

Location: Residential location (Urban, Suburban, Rural).

Work Experience: Years of work experience (0 to 50 years).

Marital Status: Marital status of the primary household member (Single, Married, Divorced).

Employment Status: Employment status of the primary household member (Full-time, Part-time, Self-employed).

Household Size: Total number of individuals living in the household (1 to 7).

Homeownership Status: Homeownership status (Own, Rent).

Type of Housing: Type of housing (Apartment, Single-family home, Townhouse).

Gender: Gender of the primary household member (Male, Female).

Primary Mode of Transportation: Primary mode of transportation used by the household member (Car, Public transit, Biking, Walking).

Annual Household Income: Actual annual household income, derived from a combination of features with added noise. Unit: USD.

This dataset can be used by researchers, analysts, and data scientists to explore the impact of various demographic and socioeconomic factors on household income and to develop predictive models for income estimation.
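
Most of the features listed above are categorical, so they need numeric encoding before a regression model can use them. One common option is one-hot encoding, sketched below for the Location feature using the three levels given in the description. The `one_hot` helper and the sample rows are hypothetical.

```python
def one_hot(values, categories):
    """Encode each value as a list of 0/1 indicators, one per category."""
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # raises KeyError for an unknown category
        rows.append(row)
    return rows


# Levels taken from the feature description; sample rows are made up.
location_levels = ["Urban", "Suburban", "Rural"]
sample_locations = ["Rural", "Urban", "Urban", "Suburban"]

encoded = one_hot(sample_locations, location_levels)
```

For a linear model with an intercept, it is usual to drop one indicator column per feature, since the full set of indicators always sums to 1 and is therefore collinear with the intercept.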

Price Prediction -Multiple Linear Regression

Multiple Linear Regression - Cleaning, Correlation, Dummies, Multicollinearity, OLS

13 scholarly articles cite this dataset (View in Google Scholar)
