100+ datasets found

Dataset for demonstrating simple linear Regression
kaggle.com
zip
Updated Jul 3, 2024
Cite
Aaditya Gupta (2024). Dataset for demonstrating simple linear Regression [Dataset]. https://www.kaggle.com/datasets/aadityagupta11/data-for-demonstrating-basic-linear-regression
Explore at:
Available download formats: zip (2132 bytes)
Dataset updated
Jul 3, 2024
Authors
Aaditya Gupta
Description
This dataset has been created to demonstrate the use of a simple linear regression model. It includes two variables: an independent variable and a dependent variable. The data can be used for training, testing, and validating a simple linear regression model, making it ideal for educational purposes, tutorials, and basic predictive analysis projects. The dataset consists of 100 observations with no missing values, and it follows a linear relationship.
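As a rough illustration of the training and testing workflow this description mentions, the sketch below fits a least-squares line in plain Python. The data are a hypothetical stand-in (100 synthetic points around y = 2x + 1), not the Kaggle file, and the function name `fit_simple_ols` and the every-fifth-point split are assumptions made for the example.

```python
import random


def fit_simple_ols(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical stand-in data: 100 observations around y = 2x + 1 with small noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]

# Hold out every fifth observation for testing; train on the rest.
pairs = list(zip(xs, ys))
train = [p for i, p in enumerate(pairs) if i % 5 != 0]
test = [p for i, p in enumerate(pairs) if i % 5 == 0]

slope, intercept = fit_simple_ols([x for x, _ in train], [y for _, y in train])

# Mean squared error on the held-out points.
test_mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)
print(slope, intercept, test_mse)
```

With a real two-column file, the same function could be applied after reading the independent and dependent columns into two lists.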
Insurance Dataset - Simple Linear Regression
kaggle.com
zip
Updated Sep 14, 2023
Cite
Taseer Mehboob (2023). Insurance Dataset - Simple Linear Regression [Dataset]. https://www.kaggle.com/datasets/taseermehboob9/insurance-dataset-simple-linear-regression
Explore at:
Available download formats: zip (254 bytes)
Dataset updated
Sep 14, 2023
Authors
Taseer Mehboob
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset has only two columns: the first is Age and the second is Premium. You can use it in machine learning for simple linear regression and for prediction practice.
Data from: Data for multiple linear regression models for predicting...
catalog.data.gov
data.usgs.gov
+2 more
Updated Nov 19, 2025
+ more versions
Cite
U.S. Geological Survey (2025). Data for multiple linear regression models for predicting microcystin concentration action-level exceedances in selected lakes in Ohio [Dataset]. https://catalog.data.gov/dataset/data-for-multiple-linear-regression-models-for-predicting-microcystin-concentration-action
Explore at:
Dataset updated
Nov 19, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Ohio
Description
Site-specific multiple linear regression models were developed for eight sites in Ohio (six in the Western Lake Erie Basin and two in northeast Ohio on inland reservoirs) to quickly predict action-level exceedances for a cyanotoxin, microcystin, in recreational and drinking waters used by the public. Real-time models include easily or continuously measured factors that do not require that a sample be collected. Real-time models are presented in two categories: (1) six models with continuous monitor data, and (2) three models with on-site measurements. Real-time models commonly included variables such as phycocyanin, pH, specific conductance, and streamflow or gage height. Many of the real-time factors were averages over time periods antecedent to the time the microcystin sample was collected, including water-quality data compiled from continuous monitors. Comprehensive models use a combination of discrete sample-based measurements and real-time factors. Comprehensive models were useful at some sites with lagged variables (< 2 weeks) for cyanobacterial toxin genes, dissolved nutrients, and (or) N to P ratios. Comprehensive models are presented in three categories: (1) three models with continuous monitor data and lagged comprehensive variables, (2) five models with no continuous monitor data and lagged comprehensive variables, and (3) one model with continuous monitor data and same-day comprehensive variables. Funding for this work was provided by the Ohio Water Development Authority and the U.S. Geological Survey Cooperative Water Program.
Data from: An example data set for exploration of Multiple Linear Regression...
catalog.data.gov
data.usgs.gov
Updated Nov 20, 2025
Cite
U.S. Geological Survey (2025). An example data set for exploration of Multiple Linear Regression [Dataset]. https://catalog.data.gov/dataset/an-example-data-set-for-exploration-of-multiple-linear-regression
Explore at:
Dataset updated
Nov 20, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
This data set contains example data for exploration of the theory of regression based regionalization. The 90th percentile of annual maximum streamflow is provided as an example response variable for 293 streamgages in the conterminous United States. Several explanatory variables are drawn from the GAGES-II data base in order to demonstrate how multiple linear regression is applied. Example scripts demonstrate how to collect the original streamflow data provided and how to recreate the figures from the associated Techniques and Methods chapter.
Data from: Data for multiple linear regression models for estimating...
catalog.data.gov
data.usgs.gov
+2 more
Updated Nov 27, 2025
+ more versions
Cite
U.S. Geological Survey (2025). Data for multiple linear regression models for estimating Escherichia coli (E. coli) concentrations or the probability of exceeding the bathing-water standard at recreational sites in Ohio and Pennsylvania as part of the Great Lakes NowCast, 2019 [Dataset]. https://catalog.data.gov/dataset/data-for-multiple-linear-regression-models-for-estimating-escherichia-coli-e-coli-concentr
Explore at:
Dataset updated
Nov 27, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Pennsylvania, The Great Lakes
Description
Site-specific multiple linear regression models were developed for one beach in Ohio (three discrete sampling sites) and one beach in Pennsylvania to estimate concentrations of Escherichia coli (E. coli) or the probability of exceeding the bathing-water standard for E. coli in recreational waters used by the public. Traditional culture-based methods are commonly used to estimate concentrations of fecal indicator bacteria, such as E. coli; however, results are obtained 18 to 24 hours post sampling and do not accurately reflect current water-quality conditions. Beach-specific mathematical models use environmental and water-quality variables that are easily and quickly measured as surrogates to estimate concentrations of fecal-indicator bacteria or to provide the probability that a State recreational water-quality standard will be exceeded. When predictive models are used for beach closure or advisory decisions, they are referred to as “nowcasts”. Software designed for model development by the U.S. Environmental Protection Agency (Virtual Beach) was used. The selected model for each beach was based on a combination of explanatory variables including, most commonly, turbidity, water temperature, change in lake level over 24 hours, and antecedent rainfall. Model results are used by managers to report water-quality conditions to the public through the Great Lakes NowCast in 2019 (https://pa.water.usgs.gov/apps/nowcast/). Model performance in 2019 (sensitivity, specificity, and accuracy) was compared to using the previous day's E. coli concentration (persistence method).
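The comparison described above (sensitivity, specificity, and accuracy of a model against the persistence method) can be sketched as follows. Every number here is invented for illustration: the `THRESHOLD` value, the concentrations, and the model's exceedance calls are hypothetical and are not taken from the data release. Persistence is read here as "predict today's exceedance from the previous day's concentration".

```python
def classification_rates(predicted, actual):
    """Sensitivity, specificity, and accuracy for boolean exceedance calls.

    Assumes `actual` contains at least one exceedance and one non-exceedance.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # exceedance correctly called
    tn = sum(1 for p, a in pairs if not p and not a)  # non-exceedance correctly called
    fp = sum(1 for p, a in pairs if p and not a)      # false alarm
    fn = sum(1 for p, a in pairs if not p and a)      # missed exceedance
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


THRESHOLD = 235.0  # hypothetical standard, chosen only for this example

# Hypothetical daily E. coli concentrations.
observed = [100.0, 300.0, 250.0, 90.0, 80.0, 400.0, 120.0, 260.0]
exceeded = [c > THRESHOLD for c in observed]

# Evaluate days 2..8: persistence uses the previous day's result as the prediction.
actual = exceeded[1:]
persistence_calls = exceeded[:-1]

# Hypothetical exceedance calls from a regression model for the same days.
model_calls = [True, True, False, False, True, True, True]

persistence = classification_rates(persistence_calls, actual)
model = classification_rates(model_calls, actual)
print("persistence:", persistence)
print("model:      ", model)
```

Sensitivity is the share of true exceedances that were flagged, and specificity is the share of non-exceedances correctly left unflagged, so a method can score well on one and poorly on the other.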
Salary Dataset - Simple linear regression
kaggle.com
zip
Updated Jan 10, 2023
Cite
Allena Venkata Sai Abhishek (2023). Salary Dataset - Simple linear regression [Dataset]. https://www.kaggle.com/datasets/abhishek14398/salary-dataset-simple-linear-regression/code
Explore at:
Available download formats: zip (457 bytes)
Dataset updated
Jan 10, 2023
Authors
Allena Venkata Sai Abhishek
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Description

Salary dataset in CSV format for simple linear regression. It has also been used in the Machine Learning A to Z course of my series.

Columns: #, YearsExperience, Salary
Table_1_Application of robust regression in translational neuroscience...
frontiersin.figshare.com
datasetcatalog.nlm.nih.gov
docx
Updated Jan 24, 2024
Cite
Michael Malek-Ahmadi; Stephen D. Ginsberg; Melissa J. Alldred; Scott E. Counts; Milos D. Ikonomovic; Eric E. Abrahamson; Sylvia E. Perez; Elliott J. Mufson (2024). Table_1_Application of robust regression in translational neuroscience studies with non-Gaussian outcome data.DOCX [Dataset]. http://doi.org/10.3389/fnagi.2023.1299451.s001
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fnagi.2023.1299451.s001
Dataset updated
Jan 24, 2024
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Michael Malek-Ahmadi; Stephen D. Ginsberg; Melissa J. Alldred; Scott E. Counts; Milos D. Ikonomovic; Eric E. Abrahamson; Sylvia E. Perez; Elliott J. Mufson
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Linear regression is one of the most used statistical techniques in neuroscience, including the study of the neuropathology of Alzheimer’s disease (AD) dementia. However, the practical utility of this approach is often limited because dependent variables are often highly skewed and fail to meet the assumption of normality. Applying linear regression analyses to highly skewed datasets can generate imprecise results, which lead to erroneous estimates derived from statistical models. Furthermore, the presence of outliers can introduce unwanted bias, which affects estimates derived from linear regression models. Although a variety of data transformations can be utilized to mitigate these problems, these approaches are also associated with various caveats. By contrast, a robust regression approach does not impose distributional assumptions on data, allowing for results to be interpreted in a similar manner to that derived using a linear regression analysis. Here, we demonstrate the utility of applying robust regression to the analysis of data derived from studies of human brain neurodegeneration where the error distribution of a dependent variable does not meet the assumption of normality. We show that the application of a robust regression approach to two independent published human clinical neuropathologic data sets provides reliable estimates of associations. We also demonstrate that results from a linear regression analysis can be biased if the dependent variable is significantly skewed, further indicating robust regression as a suitable alternate approach.
Student Performance (Multiple Linear Regression) Dataset
cubig.ai
zip
Updated May 29, 2025
Cite
CUBIG (2025). Student Performance (Multiple Linear Regression) Dataset [Dataset]. https://cubig.ai/store/products/392/student-performance-multiple-linear-regression-dataset
Explore at:
Available download formats: zip
Dataset updated
May 29, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
Description
1) Data Introduction
• The Student Performance (Multiple Linear Regression) Dataset is designed to analyze the relationship between students’ learning habits and academic performance. Each sample includes key indicators related to learning, such as study hours, sleep duration, previous test scores, and the number of practice exams completed.

2) Data Utilization
(1) Characteristics of the Student Performance (Multiple Linear Regression) Dataset:
• The target variable, Hours Studied, quantitatively represents the amount of time a student has invested in studying. The dataset is structured to allow modeling and inference of learning behaviors based on correlations with other variables.

(2) Applications of the Student Performance (Multiple Linear Regression) Dataset:
• AI-Based Study Time Prediction Models: The dataset can be used to develop regression models that estimate a student’s expected study time based on inputs like academic performance, sleep habits, and engagement patterns.
• Behavioral Analysis and Personalized Learning Strategies: It can be applied to identify students with insufficient study time and design personalized study interventions based on academic and lifestyle patterns.
Data from: Assessing predictive performance of supervised machine learning...
data.niaid.nih.gov
datasetcatalog.nlm.nih.gov
+1 more
zip
Updated May 23, 2023
Cite
Evans Omondi (2023). Assessing predictive performance of supervised machine learning algorithms for a diamond pricing model [Dataset]. http://doi.org/10.5061/dryad.wh70rxwrh
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.wh70rxwrh
Dataset updated
May 23, 2023
Dataset provided by
Strathmore University
Authors
Evans Omondi
License
https://spdx.org/licenses/CC0-1.0.html
Description
The diamond is 58 times harder than any other mineral in the world, and its elegance as a jewel has long been appreciated. Forecasting diamond prices is challenging due to nonlinearity in important features such as carat, cut, clarity, table, and depth. Against this backdrop, the study conducted a comparative analysis of the performance of multiple supervised machine learning models (regressors and classifiers) in predicting diamond prices. Eight supervised machine learning algorithms were evaluated in this work, including Multiple Linear Regression, Linear Discriminant Analysis, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Support Vector Machines, Boosted Regression and Classification Trees, and Multi-Layer Perceptron. The analysis is based on data preprocessing, exploratory data analysis (EDA), training the aforementioned models, assessing their accuracy, and interpreting their results. Based on the performance metrics and analysis, eXtreme Gradient Boosting was found to be the best-performing algorithm in both classification and regression, with an R² score of 97.45% and an accuracy of 74.28%. As a result, eXtreme Gradient Boosting was recommended as the optimal regressor and classifier for forecasting the price of a diamond specimen.

Methods

Kaggle, a data repository with thousands of datasets, was used in the investigation. It is an online community for machine learning practitioners and data scientists, as well as a robust, well-researched, and sufficient resource for analyzing various data sources. On Kaggle, users can search for and publish various datasets. In a web-based data-science environment, they can study datasets and construct models.
Synthetic Dataset for Linear Regression
kaggle.com
zip
Updated Mar 23, 2025
Cite
Stable Space (2025). Synthetic Dataset for Linear Regression [Dataset]. https://www.kaggle.com/datasets/sharmajicoder/synthetic-dataset-for-linear-regression/data
Explore at:
Available download formats: zip (9111 bytes)
Dataset updated
Mar 23, 2025
Authors
Stable Space
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
The file contains a dataset with two variables (x and y), intended for linear regression ML models and for testing purposes. The x variable is the independent variable, and y is the dependent variable. The two variables have a correlation of 0.9981, which makes the dataset well suited to linear models.
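For readers who want to check a correlation figure like the one quoted above, a minimal sketch of the Pearson correlation coefficient in plain Python follows. The function name `pearson_r` and the five sample points are hypothetical; they are not values from the Kaggle file.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical, nearly linear sample (not the dataset's values).
sample_x = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(sample_x, sample_y)
# In simple linear regression, r squared equals the R-squared of the fitted line.
print(r, r ** 2)
```

By the same identity, a correlation of 0.9981 corresponds to an R-squared of roughly 0.996 for a straight-line fit.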
Panel dataset on Brazilian fuel demand
data.mendeley.com
Updated Oct 7, 2024
Cite
Sergio Prolo (2024). Panel dataset on Brazilian fuel demand [Dataset]. http://doi.org/10.17632/hzpwbp7j22.1
Explore at:
Unique identifier
https://doi.org/10.17632/hzpwbp7j22.1
Dataset updated
Oct 7, 2024
Authors
Sergio Prolo
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Brazil
Description
Summary: Fuel demand is shown to be influenced by fuel prices, people's income and motorization rates. We explore the effects of electric vehicle motorization rates on gasoline demand using this panel dataset.

Files:

dataset.csv - Panel dimensions are the Brazilian state ( i ) and year ( t ). The other columns are: gasoline sales per capita (ln_Sg_pc), prices of gasoline (ln_Pg) and ethanol (ln_Pe) and their lags, motorization rates of combustion vehicles (ln_Mi_c) and electric vehicles (ln_Mi_e) and GDP per capita (ln_gdp_pc). All variables are in natural logs, since we use them to calculate demand elasticities in a regression model.

adjacency.csv - The adjacency matrix used in interaction with electric vehicles' motorization rates to calculate spatial effects. At first, it follows a binary adjacency formula: for each pair of states i and j, the cell (i, j) is 0 if the states are not adjacent and 1 if they are. Then, each row is normalized to have sum equal to one.

regression.do - Series of Stata commands used to estimate the regression models of our study. dataset.csv must be imported to work, see comment section.

dataset_predictions.xlsx - Based on the estimations from Stata, we use this Excel file to make average predictions by year and by state. By including years beyond the last panel sample, we also forecast the model into the future and evaluate the effects of different policies that influence gasoline prices (taxation) and EV motorization rates (electrification). This file is primarily used to create images, but can be used to further understand how the forecasting scenarios are set up.

Sources:
Fuel prices and sales: ANP (https://www.gov.br/anp/en/access-information/what-is-anp/what-is-anp)
State population, GDP and vehicle fleet: IBGE (https://www.ibge.gov.br/en/home-eng.html?lang=en-GB)
State EV fleet: Anfavea (https://anfavea.com.br/en/site/anuarios/)
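The construction of adjacency.csv described above (a binary adjacency matrix whose rows are then normalized to sum to one) can be sketched in a few lines. The three-state matrix and the motorization values are hypothetical, and treating the "interaction" as a neighbour-weighted average (a spatial lag) is an assumption about how the matrix is applied, not something the description confirms.

```python
def row_normalize(adjacency):
    """Divide each row by its sum so that every non-empty row sums to one."""
    normalized = []
    for row in adjacency:
        total = sum(row)
        # Assumption: a state with no neighbours keeps an all-zero row.
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


def spatial_lag(weights, values):
    """Neighbour-weighted average of `values` for each state."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


# Hypothetical 3-state example: state 0 borders states 1 and 2.
binary = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
weights = row_normalize(binary)

# Hypothetical log EV motorization rates for the three states.
ln_mi_e = [0.2, 0.4, 0.6]
neighbour_average = spatial_lag(weights, ln_mi_e)
print(weights)
print(neighbour_average)
```

Row normalization makes each state's spatial term an average over its neighbours, so states with many neighbours are not weighted more heavily than states with few.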
Data from: Learning While Learning: Psychology Case Studies for Teaching...
tandf.figshare.com
bin
Updated Apr 1, 2025
Cite
Ciaran Evans; Alex Reinhart; Erin Cooley; William Cipolli (2025). Learning While Learning: Psychology Case Studies for Teaching Regression [Dataset]. http://doi.org/10.6084/m9.figshare.28127458.v2
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.28127458.v2
Dataset updated
Apr 1, 2025
Dataset provided by
Taylor & Francis (https://taylorandfrancis.com/)
Authors
Ciaran Evans; Alex Reinhart; Erin Cooley; William Cipolli
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this article, we explore the use of two published datasets for teaching a wide range of students about regression models, with a particular focus on interaction terms. The two datasets come from recent psychology studies on beliefs about poverty and welfare, and about the dynamics of group projects. Both datasets (and their original research papers) are accessible to students, and because of their context, students can learn about data collection, measurement, and the use of statistics when studying complex social topics, while using the data to learn about regression analysis. We have used these data for a range of in-class activities, journal paper discussions, exams, and extended projects, at the undergraduate, master’s, and doctoral levels. Supplementary materials for this article are available online.
Multiple linear regression models.
datasetcatalog.nlm.nih.gov
figshare.com
Updated Mar 30, 2023
Cite
Trouet, Dominique; Ledeganck, Kristien J.; Vermeiren, Eline; Van Hoorenbeeck, Kim; Verhulst, Stijn; Dotremont, Hilde; Brinker, Marieke den; France, Annick; Van Eyck, Annelies; Wouters, Kristien; De Winter, Benedicte Y. (2023). Multiple linear regression models. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001115981
Explore at:
Dataset updated
Mar 30, 2023
Authors
Trouet, Dominique; Ledeganck, Kristien J.; Vermeiren, Eline; Van Hoorenbeeck, Kim; Verhulst, Stijn; Dotremont, Hilde; Brinker, Marieke den; France, Annick; Van Eyck, Annelies; Wouters, Kristien; De Winter, Benedicte Y.
Description
An increased blood pressure is a known comorbidity of both type 1 diabetes (T1DM) and obesity in children. Increasing evidence suggests a subtle interplay between epidermal growth factor (EGF) and renin along the juxtaglomerular system, regulating the impact of blood pressure on kidney health and the cardiovascular system. In this study, we investigated the relation between urinary EGF, serum renin and blood pressure in children with obesity or T1DM. A total of 147 non-obese children with T1DM and 126 children with obesity were included. Blood pressure was measured and mean arterial pressure (MAP) and the pulse pressure (PP) were calculated. Serum renin and urinary EGF levels were determined with a commercial ELISA kit. Partial Spearman rank correlation coefficients and multiple linear regression models were used to study the association between renin, the urinary EGF/urinary creatinine ratio and blood pressure parameters. The urinary EGF/urinary creatinine ratio is correlated with the SBP and the MAP in boys with obesity as well as in boys with T1DM. Multiple regression analysis showed that sex and pulse pressure in male subjects were found to be independently associated with renin. Sex, the presence of diabetes, age, the glomerular filtration rate and both pulse pressure and mean arterial pressure in male subjects were independently associated with urinary EGF/urinary creatinine. In conclusion, in boys with either obesity or diabetes, pulse pressure and mean arterial pressure are negatively associated with the functional integrity of the nephron, which is reflected by a decreased expression of urinary EGF.
Data from: INTRODUCTION TO NEURAL NETWORKS FOR NON-LINEAR REGRESSIONS:...
scielo.figshare.com
tiff
Updated May 31, 2023
Cite
Eduardo D. Vicentini; Antonio G. Sampaio de Oliveira-Filho (2023). INTRODUCTION TO NEURAL NETWORKS FOR NON-LINEAR REGRESSIONS: POTENTIAL ENERGY SURFACE FITTING [Dataset]. http://doi.org/10.6084/m9.figshare.19945228.v1
Explore at:
Available download formats: tiff
Unique identifier
https://doi.org/10.6084/m9.figshare.19945228.v1
Dataset updated
May 31, 2023
Dataset provided by
SciELO journals
Authors
Eduardo D. Vicentini; Antonio G. Sampaio de Oliveira-Filho
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The present work demonstrates how neural networks are used to do non-linear regressions. The technique is presented in a simple and didactic manner and applied to fit potential energy surfaces for the FeC molecule and for the reaction H + H2. It shows how to do the fitting for single- and multi-variable systems, providing examples and code that can be easily extended to many problems in chemistry. All the code used to perform the fitting and generate the results is available as a Jupyter Notebook, which can be used without installation or configuration.
Linear Regression (Excel) and Cellular Respiration for Biology, Chemistry...
qubeshub.org
Updated Jan 11, 2022
Cite
Irene Corriette; Beatriz Gonzalez; Daniela Kitanska; Henriette Mozsolits; Sheela Vemu (2022). Linear Regression (Excel) and Cellular Respiration for Biology, Chemistry and Mathematics [Dataset]. http://doi.org/10.25334/5PX5-H796
Explore at:
Unique identifier
https://doi.org/10.25334/5PX5-H796
Dataset updated
Jan 11, 2022
Dataset provided by
QUBES
Authors
Irene Corriette; Beatriz Gonzalez; Daniela Kitanska; Henriette Mozsolits; Sheela Vemu
Description
Students typically find linear regression analysis of data sets in a biology classroom challenging. These activities could be used in a Biology, Chemistry, Mathematics, or Statistics course. The collection provides student activity files with Excel instructions and Instructor Activity files with Excel instructions and solutions to problems.

Students will be able to perform linear regression analysis, find the correlation coefficient, create a scatter plot and find the r-square using MS Excel 365. Students will be able to interpret data sets, describe the relationship between biological variables, and predict the value of an output variable based on the input of a predictor variable.
Linear regression analysis for commonly used function domains.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Mar 23, 2022
Cite
Mastbergen, Simon C.; van Helvoort, Eefje M.; Hodgins, D.; Blanco, Fransisco J.; Kloppenburg, M.; Haugen, Ida K.; Lafeber, Floris P. J. G.; Welsing, Paco M. J.; Berenbaum, F.; Marijnissen, Anne C. A. (2022). Linear regression analysis for commonly used function domains. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000281937
Explore at:
Dataset updated
Mar 23, 2022
Authors
Mastbergen, Simon C.; van Helvoort, Eefje M.; Hodgins, D.; Blanco, Fransisco J.; Kloppenburg, M.; Haugen, Ida K.; Lafeber, Floris P. J. G.; Welsing, Paco M. J.; Berenbaum, F.; Marijnissen, Anne C. A.
Description
Linear regression analysis for commonly used function domains.
Ideals Dataset
zenodo.org
zip
Updated Sep 19, 2022
Cite
Ana Cruz; Ana Cruz (2022). Ideals Dataset [Dataset]. http://doi.org/10.5281/zenodo.6939734
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6939734
Dataset updated
Sep 19, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ana Cruz; Ana Cruz
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Generated datasets of multiple ideals distributions used in the research on linear regression and machine learning algorithms for the thesis 'Predicting the performance of Buchberger's algorithm'. Concatenated and concatenated_stats are the datasets with the ideals' exponents and corresponding polynomial additions; these datasets were created specifically for RNNs. features_dataset contains statistics regarding the ideals, and polynomial_additions_dataset contains information about their polynomial additions, created for multiple linear regression models and simple neural networks.
Study Hours ,Student Scores for Linear Regression
kaggle.com
Updated Sep 23, 2024
Cite
douaa bennoune (2024). Study Hours ,Student Scores for Linear Regression [Dataset]. https://www.kaggle.com/datasets/douaabennoune/study-hours-student-scores-for-linear-regression
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 23, 2024
Dataset provided by
Kaggle
Authors
douaa bennoune
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains a collection of 100 randomly generated data points representing the relationship between the number of hours a student spends studying and their corresponding performance, measured as a score. The data has been generated to simulate a real-world scenario where study hours are assumed to influence academic outcomes, making it an excellent resource for linear regression analysis and other machine learning tasks.

Each row in the dataset consists of:

Hours: The number of hours a student dedicates to studying, ranging between 0 and 10 hours.
Scores: The student's performance score, represented as a percentage, ranging from 0 to 100.

Use Cases: This dataset is particularly useful for:

• Linear Regression: Exploring how study hours influence student performance, fitting a regression line to predict scores based on study time.
• Data Science & Machine Learning: Practicing regression analysis, training models, and applying other predictive algorithms.
• Educational Research: Simulating data-driven insights into student behavior and performance metrics.

Features:

• 100 rows of data.
• Continuous numerical variables suitable for regression tasks.
• Generated for educational purposes, making it ideal for students, teachers, and beginners in machine learning and data science.

Potential Applications:

• Build a linear regression model to predict student scores.
• Investigate the correlation between study time and performance.
• Apply data visualization techniques to better understand the data.
• Use the dataset to experiment with model evaluation metrics like Mean Squared Error (MSE) and R-squared.
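The two evaluation metrics named above, Mean Squared Error and R-squared, can be computed directly from their definitions. The sketch below is a minimal illustration; the four scores and predictions are hypothetical and do not come from the dataset.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed scores and model predictions.
scores = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]

mse = mean_squared_error(scores, predicted)
r2 = r_squared(scores, predicted)
print(mse, r2)
```

MSE is expressed in squared units of the score, while R-squared is unitless and compares the model against simply predicting the mean score.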
Data from: MIST: l0 Sparse Linear Regression with Momentum
service.tib.eu
Updated Jan 3, 2025
Cite
(2025). MIST: l0 Sparse Linear Regression with Momentum [Dataset]. https://service.tib.eu/ldmservice/dataset/mist--l0-sparse-linear-regression-with-momentum
Explore at:
Dataset updated
Jan 3, 2025
Description
The dataset used in the paper is a large linear system of equations with a sparse solution. The authors used this dataset to test their Momentumized Iterative Shrinkage Thresholding (MIST) algorithm.
Data from: Data for Regression Models to Estimate Water Use in Providence,...
s.cnmilf.com
data.usgs.gov
+2 more
Updated Oct 1, 2025
+ more versions
Cite
U.S. Geological Survey (2025). Data for Regression Models to Estimate Water Use in Providence, Rhode Island, 2014-2021 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/data-for-regression-models-to-estimate-water-use-in-providence-rhode-island-2014-2021
Explore at:
Dataset updated
Oct 1, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Rhode Island, Providence
Description
This data release contains input data and programs (scripts) used to estimate monthly water demand for retail customers of Providence Water, located in Providence, Rhode Island. Explanatory data and model outputs are from July 2014 through June 2021. Models of per capita (for single-family residential customers) or per connection (for multi-family residential, commercial, and industrial customers) water use were developed using multiple linear regression. The dependent variables, provided by Providence Water, are the monthly number of connections and gallons of water delivered to single- and multi-family residential, commercial, and industrial connections. Potential independent variables (from online sources) are climate variables (temperature and precipitation), economic statistics, and a drought statistic. Not all independent variables were used in all of the models. The data are provided in data tables and model files. The data table RIWaterUseVariableExplanation.csv describes the explanatory variables and their data sources. The data table ProvModelInputData.csv provides the monthly water-use data that are the dependent variables and the monthly climatic and economic data that are the independent variables. The data table DroughtInputData.csv provides the weekly U.S. drought monitor index values that were processed to formulate a potential independent variable. The R script model_water_use.R runs the models that predict water use. The other two R scripts (load_preprocess_input_data.R and model_water_use_functions.R) are not run explicitly but are called from the primary script model_water_use.R. Regression equations produced by the models can be used to predict water demand throughout Rhode Island.
