https://creativecommons.org/publicdomain/zero/1.0/
The car company wants to enter a new market and needs to know exactly which variables affect car prices. The goal is to determine:
- which variables are significant in predicting the price of a car
- how well those variables describe the price of a car
This dataset was created by FayeJavad
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description: This dataset is designed for predicting energy consumption based on various building features and environmental factors. It contains data for multiple building types, square footage, the number of occupants, appliances used, average temperature, and the day of the week. The goal is to build a predictive model to estimate energy consumption using these attributes.
The dataset can be used for training machine learning models such as linear regression to forecast energy needs based on the building's characteristics. This is useful for understanding energy demand patterns and optimizing energy consumption in different building types and environmental conditions.
https://creativecommons.org/publicdomain/zero/1.0/
Salary dataset in CSV format for simple linear regression. It has also been used in the Machine Learning A to Z course of my series.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset has only two columns: the first is Age and the second is Premium. You can use it in machine learning for simple linear regression and for prediction practice.
https://creativecommons.org/publicdomain/zero/1.0/
The Ice Cream Selling dataset is a simple and well-suited dataset for beginners in machine learning who are looking to practice polynomial regression. It consists of two columns: temperature and the corresponding number of units of ice cream sold.
The dataset captures the relationship between temperature and ice cream sales. It serves as a practical example for understanding and implementing polynomial regression, a powerful technique for modeling nonlinear relationships in data.
The dataset is designed to be straightforward and easy to work with, making it ideal for beginners. The simplicity of the data allows beginners to focus on the fundamental concepts and steps involved in polynomial regression without overwhelming complexity.
By using this dataset, beginners can gain hands-on experience in preprocessing the data, splitting it into training and testing sets, selecting an appropriate degree for the polynomial regression model, training the model, and evaluating its performance. They can also explore techniques to address potential challenges such as overfitting.
With this dataset, beginners can practice making predictions of ice cream sales based on temperature inputs and visualize the polynomial regression curve that represents the relationship between temperature and ice cream sales.
Overall, the Ice Cream Selling dataset provides an accessible and practical learning resource for beginners to grasp the concepts and techniques of polynomial regression in the context of analyzing ice cream sales data.
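The workflow described above (split the data, try several polynomial degrees, fit, and evaluate) can be sketched in plain Python as follows. This is a minimal illustration, not the dataset's own code: the temperature and sales values are made-up placeholders rather than rows from the dataset, and `fit_polynomial`, `predict`, and `mse` are hypothetical helper names. A real project would more likely rely on a numerical library.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] for c0 + c1*x + c2*x^2 + ...
    """
    n = degree + 1
    # Normal equations: a[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i)
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def mse(coeffs, xs, ys):
    """Mean squared error of the fitted polynomial on (xs, ys)."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: a quadratic trend with a small alternating wiggle.
temps = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
sales = [3 * t * t + 2 + (0.5 if i % 2 == 0 else -0.5) for i, t in enumerate(temps)]

# Simple deterministic split: every third point is held out for testing.
train_idx = [i for i in range(len(temps)) if i % 3 != 0]
test_idx = [i for i in range(len(temps)) if i % 3 == 0]
x_train = [temps[i] for i in train_idx]
y_train = [sales[i] for i in train_idx]
x_test = [temps[i] for i in test_idx]
y_test = [sales[i] for i in test_idx]

# Compare held-out error for a few candidate degrees.
test_errors = {}
for degree in (1, 2, 3):
    coeffs = fit_polynomial(x_train, y_train, degree)
    test_errors[degree] = mse(coeffs, x_test, y_test)

for degree, err in test_errors.items():
    print(f"degree {degree}: test MSE = {err:.3f}")
```

Comparing the held-out error across degrees is one simple way to choose a degree; an error that falls on the training set but rises on the test set as the degree grows is the usual sign of overfitting.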
This dataset was created by karthickveerakumar
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Muhammad Fawad Ul Hassan Sarim
Released under Apache 2.0
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Abhishek Kumar
Released under Apache 2.0
This dataset is designed for beginners to practice regression problems, particularly in the context of predicting house prices. It contains 1000 rows, with each row representing a house and various attributes that influence its price. The dataset is well-suited for learning basic to intermediate-level regression modeling techniques.
Beginner Regression Projects: This dataset can be used to practice building regression models such as Linear Regression, Decision Trees, or Random Forests. The target variable (house price) is continuous, making this an ideal problem for supervised learning techniques.
Feature Engineering Practice: Learners can create new features by combining existing ones, such as the price per square foot or age of the house, providing an opportunity to experiment with feature transformations.
Exploratory Data Analysis (EDA): You can explore how different features (e.g., square footage, number of bedrooms) correlate with the target variable, making it a great dataset for learning about data visualization and summary statistics.
Model Evaluation: The dataset allows for various model evaluation techniques such as cross-validation, R-squared, and Mean Absolute Error (MAE). These metrics can be used to compare the effectiveness of different models.
The dataset is highly versatile for a range of machine learning tasks. You can apply simple linear models to predict house prices based on one or two features, or use more complex models like Random Forest or Gradient Boosting Machines to understand interactions between variables.
It can also be used for dimensionality reduction techniques like PCA or to practice handling categorical variables (e.g., neighborhood quality) through encoding techniques like one-hot encoding.
This dataset is ideal for anyone wanting to gain practical experience in building regression models while working with real-world features.
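The feature engineering and encoding ideas mentioned above can be sketched in plain Python. The rows and the column names `price`, `square_feet`, and `neighborhood_quality` are assumptions made for illustration; the actual columns in the dataset may differ, and `add_price_per_sqft` and `one_hot` are hypothetical helpers.

```python
# Made-up rows; the real column names in the dataset may differ.
houses = [
    {"price": 300000, "square_feet": 1500, "neighborhood_quality": "high"},
    {"price": 180000, "square_feet": 1200, "neighborhood_quality": "low"},
    {"price": 240000, "square_feet": 1600, "neighborhood_quality": "medium"},
]


def add_price_per_sqft(row):
    """Return a copy of the row with a derived price-per-square-foot feature."""
    out = dict(row)
    out["price_per_sqft"] = row["price"] / row["square_feet"]
    return out


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        out = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            out[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(out)
    return encoded


prepared = one_hot([add_price_per_sqft(r) for r in houses], "neighborhood_quality")
for row in prepared:
    print(row)
```

Note that price per square foot is computed from the target itself, so it is better suited to exploration and comparison than to use as an input when predicting price.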
https://creativecommons.org/publicdomain/zero/1.0/
This is a very simple multiple linear regression dataset for beginners. This dataset has only three columns and twenty rows. There are only two independent variables and one dependent variable. The independent variables are 'age' and 'experience'. The dependent variable is 'income'.
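A regression with two independent variables can be fitted in closed form by centering the data and solving a 2x2 system. The sketch below assumes made-up ages and experience values, and builds incomes from a known rule purely so the fitted coefficients can be checked; none of these numbers come from the dataset, and `fit_two_predictors` is a hypothetical helper.

```python
def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 (closed form)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n

    # Centered sums of squares and cross-products
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))

    # Solve the 2x2 normal equations with Cramer's rule
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Made-up values, not rows from the dataset.
age = [25, 30, 35, 40, 45, 50]
experience = [1, 5, 4, 12, 10, 20]
# Incomes built from a known rule so the fit can be verified.
income = [10000 + 500 * a + 2000 * e for a, e in zip(age, experience)]

intercept, b_age, b_experience = fit_two_predictors(age, experience, income)
print(intercept, b_age, b_experience)
```

In real data, age and experience tend to be strongly correlated; when they are nearly collinear, `det` approaches zero and the individual coefficients become unstable.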
https://creativecommons.org/publicdomain/zero/1.0/
This is a synthetic but realistic dataset created for practicing Multiple Linear Regression and feature engineering in a housing price prediction context. The dataset includes common real-world challenges like missing values, outliers, and categorical features.
You can use this dataset to:
- Build a regression model
- Practice data cleaning
- Explore feature scaling and encoding
- Visualize relationships between house characteristics and price
This dataset is created using the sources from this dataset.
Single variable regression model:
\[ y = mx + c \]
The training and testing datasets each contain 1 million rows.
1) x-values are numbers between 1 and 100.
2) y-values are created using this Excel function: NORMINV(RAND(), x, 3).
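The generation rule above draws each y from a normal distribution with mean x and standard deviation 3, so the underlying line has m = 1 and c = 0. A rough Python counterpart is sketched below. It assumes x is drawn uniformly from the real interval [1, 100], which the description does not specify, and uses a much smaller sample than the original; `norminv` and `make_rows` are hypothetical helper names.

```python
import random
from statistics import NormalDist, mean, stdev


def norminv(p, mu, sigma):
    """Python counterpart of Excel's NORMINV(p, mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(p)


def make_rows(n, seed=0):
    """Generate n (x, y) pairs with y ~ Normal(mean=x, sd=3)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(1, 100)  # assumption: uniform real values in [1, 100]
        p = rng.random()         # plays the role of RAND()
        while p == 0.0:          # inv_cdf requires 0 < p < 1
            p = rng.random()
        rows.append((x, norminv(p, x, 3)))
    return rows


rows = make_rows(10_000)
residuals = [y - x for x, y in rows]
print("mean of y - x:", mean(residuals))   # should be close to 0
print("sd of y - x:", stdev(residuals))    # should be close to 3
```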
License: feel free to use
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset for linear regression with two independent variables and one dependent variable, focused on testing, visualization, and statistical analysis. The dataset is synthetic and contains 100 instances.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset for practicing linear regression. We generate synthetic x values from a uniform distribution on (-5, 5) and noise as 0.5 * N(0, 1), with a = 1 and b = 2.
The dataset is generated using the formula: y = a + b * x + noise
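The formula above can be reproduced and checked with a short sketch: generate data with the stated a, b, and noise scale, then confirm that an ordinary least-squares line recovers values close to them. The sample size of 1000 and the fixed seed are assumptions, since the description does not state how many points were generated; `generate` and `fit_line` are hypothetical helpers.

```python
import random

A, B = 1.0, 2.0  # intercept a and slope b as stated in the description


def generate(n, seed=0):
    """Draw x ~ Uniform(-5, 5) and y = a + b*x + 0.5 * N(0, 1)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-5, 5) for _ in range(n)]
    ys = [A + B * x + 0.5 * rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


xs, ys = generate(1000)  # assumption: sample size is not given in the source
a_hat, b_hat = fit_line(xs, ys)
print(f"estimated a = {a_hat:.3f}, estimated b = {b_hat:.3f}")
```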
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a collection of 100 randomly generated data points representing the relationship between the number of hours a student spends studying and their corresponding performance, measured as a score. The data has been generated to simulate a real-world scenario where study hours are assumed to influence academic outcomes, making it an excellent resource for linear regression analysis and other machine learning tasks.
Each row in the dataset consists of:
- Hours: The number of hours a student dedicates to studying, ranging between 0 and 10 hours.
- Scores: The student's performance score, represented as a percentage, ranging from 0 to 100.
Use Cases: This dataset is particularly useful for:
- Linear Regression: Exploring how study hours influence student performance, fitting a regression line to predict scores based on study time.
- Data Science & Machine Learning: Practicing regression analysis, training models, and applying other predictive algorithms.
- Educational Research: Simulating data-driven insights into student behavior and performance metrics.
Features:
- 100 rows of data.
- Continuous numerical variables suitable for regression tasks.
- Generated for educational purposes, making it ideal for students, teachers, and beginners in machine learning and data science.
Potential Applications:
- Build a linear regression model to predict student scores.
- Investigate the correlation between study time and performance.
- Apply data visualization techniques to better understand the data.
- Use the dataset to experiment with model evaluation metrics like Mean Squared Error (MSE) and R-squared.
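The two evaluation metrics named above can be computed directly from their definitions. In the sketch below, the actual and predicted scores are made-up numbers chosen to keep the arithmetic easy to follow; they are not taken from the dataset, and `mse` and `r_squared` are hypothetical helper names.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up scores and predictions, for illustration only.
actual_scores = [20, 40, 60, 80]
predicted_scores = [25, 35, 65, 75]

print(mse(actual_scores, predicted_scores))        # 25.0
print(r_squared(actual_scores, predicted_scores))  # approximately 0.95
```

Here every prediction is off by 5 points, so the MSE is 25. The total sum of squares around the mean score of 50 is 2000 and the residual sum of squares is 100, which gives an R-squared of about 0.95.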
https://creativecommons.org/publicdomain/zero/1.0/
Description: The table below gives the heights of fathers and their sons, based on a famous experiment by Karl Pearson around 1903. The number of cases is 1078. Random noise was added to the original data, to produce heights to the nearest 0.1 inch.
Objective: Use this dataset to practice simple linear regression.
Columns:
- Father height
- Son height
Source: Department of Statistics, University of California, Berkeley
Download TSV source file: Pearson.tsv
This dataset has been created to demonstrate the use of a simple linear regression model. It includes two variables: an independent variable and a dependent variable. The data can be used for training, testing, and validating a simple linear regression model, making it ideal for educational purposes, tutorials, and basic predictive analysis projects. The dataset consists of 100 observations with no missing values, and it follows a linear relationship.
This dataset was created by Nitesh Addagatla
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Age: Age of the primary household member (18 to 70 years).
Education Level: Highest education level attained (High School, Bachelor's, Master's, Doctorate).
Occupation: Type of occupation (Healthcare, Education, Technology, Finance, Others).
Number of Dependents: Number of dependents in the household (0 to 5).
Location: Residential location (Urban, Suburban, Rural).
Work Experience: Years of work experience (0 to 50 years).
Marital Status: Marital status of the primary household member (Single, Married, Divorced).
Employment Status: Employment status of the primary household member (Full-time, Part-time, Self-employed).
Household Size: Total number of individuals living in the household (1 to 7).
Homeownership Status: Homeownership status (Own, Rent).
Type of Housing: Type of housing (Apartment, Single-family home, Townhouse).
Gender: Gender of the primary household member (Male, Female).
Primary Mode of Transportation: Primary mode of transportation used by the household member (Car, Public transit, Biking, Walking).
Annual Household Income: Actual annual household income, derived from a combination of features with added noise. Unit: USD.
This dataset can be used by researchers, analysts, and data scientists to explore the impact of various demographic and socioeconomic factors on household income and to develop predictive models for income estimation.