CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
A car company wants to enter a new market and needs to estimate which variables affect car prices. The goals are to determine:
- Which variables are significant in predicting the price of a car
- How well those variables describe the price of a car
This dataset was created by FayeJavad
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description: This dataset is designed for predicting energy consumption based on various building features and environmental factors. It contains data for multiple building types, square footage, the number of occupants, appliances used, average temperature, and the day of the week. The goal is to build a predictive model to estimate energy consumption using these attributes.
The dataset can be used for training machine learning models such as linear regression to forecast energy needs based on the building's characteristics. This is useful for understanding energy demand patterns and optimizing energy consumption in different building types and environmental conditions.
CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset has only two columns: Age and Premium. You can use it to practice simple linear regression and prediction.
CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Salary dataset in CSV format for simple linear regression. It is also used in the Machine Learning A to Z course in my series.
This dataset is designed for beginners to practice regression problems, particularly in the context of predicting house prices. It contains 1000 rows, with each row representing a house and various attributes that influence its price. The dataset is well-suited for learning basic to intermediate-level regression modeling techniques.
Beginner Regression Projects: This dataset can be used to practice building regression models such as Linear Regression, Decision Trees, or Random Forests. The target variable (house price) is continuous, making this an ideal problem for supervised learning techniques.
Feature Engineering Practice: Learners can create new features by combining existing ones, such as the price per square foot or age of the house, providing an opportunity to experiment with feature transformations.
Exploratory Data Analysis (EDA): You can explore how different features (e.g., square footage, number of bedrooms) correlate with the target variable, making it a great dataset for learning about data visualization and summary statistics.
Model Evaluation: The dataset allows for various model evaluation techniques such as cross-validation, R-squared, and Mean Absolute Error (MAE). These metrics can be used to compare the effectiveness of different models.
The dataset is highly versatile for a range of machine learning tasks. You can apply simple linear models to predict house prices based on one or two features, or use more complex models like Random Forest or Gradient Boosting Machines to understand interactions between variables.
It can also be used for dimensionality reduction techniques like PCA or to practice handling categorical variables (e.g., neighborhood quality) through encoding techniques like one-hot encoding.
This dataset is ideal for anyone wanting to gain practical experience in building regression models while working with real-world features.
This dataset was created by karthickveerakumar
CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
The Ice Cream Selling dataset is a simple and well-suited dataset for beginners in machine learning who are looking to practice polynomial regression. It consists of two columns: temperature and the corresponding number of units of ice cream sold.
The dataset captures the relationship between temperature and ice cream sales. It serves as a practical example for understanding and implementing polynomial regression, a powerful technique for modeling nonlinear relationships in data.
The dataset is designed to be straightforward and easy to work with, making it ideal for beginners. The simplicity of the data allows beginners to focus on the fundamental concepts and steps involved in polynomial regression without overwhelming complexity.
By using this dataset, beginners can gain hands-on experience in preprocessing the data, splitting it into training and testing sets, selecting an appropriate degree for the polynomial regression model, training the model, and evaluating its performance. They can also explore techniques to address potential challenges such as overfitting.
With this dataset, beginners can practice making predictions of ice cream sales based on temperature inputs and visualize the polynomial regression curve that represents the relationship between temperature and ice cream sales.
Overall, the Ice Cream Selling dataset provides an accessible and practical learning resource for beginners to grasp the concepts and techniques of polynomial regression in the context of analyzing ice cream sales data.
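The workflow described above (split the data, fit several polynomial degrees, compare them on held-out data) might look roughly like the sketch below. It is an illustration under stated assumptions: the helpers solve, fit_poly, predict and mse are hypothetical, the fit uses the normal equations in plain Python, and the temperature and sales values are synthetic (a made-up quadratic plus noise on a small centred range), not the actual Ice Cream Selling data.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_poly(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] from the normal equations."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    return solve(a, b)

def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

def mse(coeffs, xs, ys):
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic stand-in data (assumption): a quadratic relationship plus noise.
rng = random.Random(0)
temps = [rng.uniform(-3, 3) for _ in range(200)]
sales = [2 + 0.5 * t + 1.5 * t * t + rng.gauss(0, 1) for t in temps]

# The values were drawn in random order, so a simple 80/20 cut is a random split.
train_x, test_x = temps[:160], temps[160:]
train_y, test_y = sales[:160], sales[160:]

test_mse = {}
for degree in range(1, 5):
    coeffs = fit_poly(train_x, train_y, degree)
    test_mse[degree] = mse(coeffs, test_x, test_y)
    print(f"degree {degree}: test MSE = {test_mse[degree]:.3f}")

best_degree = min(test_mse, key=test_mse.get)
print("lowest test error at degree", best_degree)
```

Choosing the degree by test error rather than training error is one simple guard against overfitting. Note that the normal equations become ill-conditioned for high degrees or unscaled inputs, so centring and scaling the temperature first is advisable.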
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Abhishek Kumar
Released under Apache 2.0
CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This is a very simple multiple linear regression dataset for beginners. This dataset has only three columns and twenty rows. There are only two independent variables and one dependent variable. The independent variables are 'age' and 'experience'. The dependent variable is 'income'.
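With only two independent variables, the least-squares coefficients have a short closed form, so a model of the kind described above can be fitted without any library. The sketch below is illustrative only: fit_two_predictors is a hypothetical helper, and the twenty rows are generated from a made-up noise-free rule so that the recovered coefficients can be checked. They are not the values in the dataset.

```python
import random
import statistics

def fit_two_predictors(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by solving the centred 2x2 normal equations."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det  # Cramer's rule
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Hypothetical rows (assumption): income follows an exact, made-up linear rule.
rng = random.Random(1)
age = [rng.randint(22, 60) for _ in range(20)]
experience = [rng.randint(0, a - 21) for a in age]
income = [20_000 + 300 * a + 1_500 * e for a, e in zip(age, experience)]

b0, b1, b2 = fit_two_predictors(age, experience, income)
print(f"income = {b0:.1f} + {b1:.1f} * age + {b2:.1f} * experience")
```

In real data, age and experience tend to be strongly correlated, which makes the determinant small and the individual coefficients unstable. That is worth checking before interpreting them.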
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Muhammad Fawad Ul Hassan Sarim
Released under Apache 2.0
CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This is a synthetic but realistic dataset created for practicing Multiple Linear Regression and feature engineering in a housing price prediction context. The dataset includes common real-world challenges like missing values, outliers, and categorical features.
You can use this dataset to:
- Build a regression model
- Practice data cleaning
- Explore feature scaling and encoding
- Visualize relationships between house characteristics and price
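The three challenges mentioned above (missing values, outliers, and categorical features) each have a simple baseline treatment, sketched here with the standard library. The function names, the column names area and location, and the sample values are all hypothetical. Median imputation, IQR clipping and one-hot encoding are common choices, not necessarily the ones best suited to this dataset.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def clip_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [min(max(v, low), high) for v in values]

def one_hot(labels):
    """Return the sorted category names and one 0/1 row per label."""
    categories = sorted(set(labels))
    rows = [[int(label == c) for c in categories] for label in labels]
    return categories, rows

# Hypothetical columns; the values are made up for illustration.
area = [1200, None, 1500, 1350, 9000, 1100, 1400, 1250]
location = ["urban", "rural", "urban", "suburban",
            "rural", "urban", "suburban", "rural"]

area_clean = clip_iqr(impute_median(area))
categories, location_encoded = one_hot(location)

print(area_clean)
print(categories)
print(location_encoded[0])
```

Clipping keeps every row and only limits extreme values. Dropping outlier rows instead is an equally valid choice when they are likely data-entry errors.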
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The dataset contains 2 .csv files.
File 1st
This file contains various demographic and health-related data for different regions. Here's a brief description of each column:
avganncount: Average number of cancer cases diagnosed annually.
avgdeathsperyear: Average number of deaths due to cancer per year.
target_deathrate: Target death rate due to cancer.
incidencerate: Incidence rate of cancer.
medincome: Median income in the region.
popest2015: Estimated population in 2015.
povertypercent: Percentage of population below the poverty line.
studypercap: Per capita number of cancer-related clinical trials conducted.
binnedinc: Binned median income.
medianage: Median age in the region.
pctprivatecoveragealone: Percentage of population covered by private health insurance alone.
pctempprivcoverage: Percentage of population covered by employee-provided private health insurance.
pctpubliccoverage: Percentage of population covered by public health insurance.
pctpubliccoveragealone: Percentage of population covered by public health insurance only.
pctwhite: Percentage of White population.
pctblack: Percentage of Black population.
pctasian: Percentage of Asian population.
pctotherrace: Percentage of population belonging to other races.
pctmarriedhouseholds: Percentage of married households.
birthrate: Birth rate in the region.
File 2nd
This file contains demographic information about different regions, including details about household size and geographical location. Here's a description of each column:
statefips: The FIPS code representing the state.
countyfips: The FIPS code representing the county or census area within the state.
avghouseholdsize: The average household size in the region.
geography: The geographical location, typically represented as the county or census area name followed by the state name.
Each row in the file represents a specific region, providing details about household size and geographical location. This information can be used for various demographic analyses and studies.
This dataset was created by Ludgerus Darell
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Age: Age of the primary household member (18 to 70 years).
Education Level: Highest education level attained (High School, Bachelor's, Master's, Doctorate).
Occupation: Type of occupation (Healthcare, Education, Technology, Finance, Others).
Number of Dependents: Number of dependents in the household (0 to 5).
Location: Residential location (Urban, Suburban, Rural).
Work Experience: Years of work experience (0 to 50 years).
Marital Status: Marital status of the primary household member (Single, Married, Divorced).
Employment Status: Employment status of the primary household member (Full-time, Part-time, Self-employed).
Household Size: Total number of individuals living in the household (1 to 7).
Homeownership Status: Homeownership status (Own, Rent).
Type of Housing: Type of housing (Apartment, Single-family home, Townhouse).
Gender: Gender of the primary household member (Male, Female).
Primary Mode of Transportation: Primary mode of transportation used by the household member (Car, Public transit, Biking, Walking).
Annual Household Income: Actual annual household income in USD, derived from a combination of features with added noise.
This dataset can be used by researchers, analysts, and data scientists to explore the impact of various demographic and socioeconomic factors on household income and to develop predictive models for income estimation.
CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset was created by Arejet
Released under CC0: Public Domain
This dataset was created by Nitesh Addagatla
This dataset is created using the sources from this dataset.
Single variable regression model:
\[ y = mx + c \]
Both the training and testing datasets contain 1 million rows.
1) x-values are numbers between 1 and 100.
2) y-values are created using this Excel function: NORMINV(RAND(), x, 3), which draws from a normal distribution with mean x and standard deviation 3.
License: feel free to use
This dataset was created by Vain
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset to practice linear regression. Synthetic x values are drawn from a uniform distribution on (-5, 5). The noise is drawn from a normal distribution N(0, 1) and scaled by 0.5. The parameters are a = 1 and b = 2.
The dataset is generated using the formula: y = a + b * x + noise