100+ datasets found

An example data set for exploration of Multiple Linear Regression
data.usgs.gov
catalog.data.gov
Updated Feb 24, 2024
Cite
William Farmer (2024). An example data set for exploration of Multiple Linear Regression [Dataset]. http://doi.org/10.5066/P9T5ZEXV
Explore at:
Unique identifier
https://doi.org/10.5066/P9T5ZEXV
Dataset updated
Feb 24, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
William Farmer
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Time period covered
1956 - 2016
Description
This data set contains example data for exploration of the theory of regression-based regionalization. The 90th percentile of annual maximum streamflow is provided as an example response variable for 293 streamgages in the conterminous United States. Several explanatory variables are drawn from the GAGES-II database in order to demonstrate how multiple linear regression is applied. Example scripts demonstrate how to collect the original streamflow data provided and how to recreate the figures from the associated Techniques and Methods chapter.
Marketing Linear Multiple Regression
kaggle.com
zip
Updated Apr 24, 2020
Cite
FayeJavad (2020). Marketing Linear Multiple Regression [Dataset]. https://www.kaggle.com/datasets/fayejavad/marketing-linear-multiple-regression
Explore at:
Available download formats: zip (1907 bytes)
Dataset updated
Apr 24, 2020
Authors
FayeJavad
Description
Dataset

This dataset was created by FayeJavad

Contents
Data from: Data for multiple linear regression models for predicting...
catalog.data.gov
data.usgs.gov
+2more
Updated Nov 19, 2025
Cite
U.S. Geological Survey (2025). Data for multiple linear regression models for predicting microcystin concentration action-level exceedances in selected lakes in Ohio [Dataset]. https://catalog.data.gov/dataset/data-for-multiple-linear-regression-models-for-predicting-microcystin-concentration-action
Explore at:
Dataset updated
Nov 19, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Ohio
Description
Site-specific multiple linear regression models were developed for eight sites in Ohio (six in the Western Lake Erie Basin and two in northeast Ohio on inland reservoirs) to quickly predict action-level exceedances for a cyanotoxin, microcystin, in recreational and drinking waters used by the public. Real-time models include easily or continuously measured factors that do not require that a sample be collected. Real-time models are presented in two categories: (1) six models with continuous monitor data, and (2) three models with on-site measurements. Real-time models commonly included variables such as phycocyanin, pH, specific conductance, and streamflow or gage height. Many of the real-time factors were averages over time periods antecedent to the time the microcystin sample was collected, including water-quality data compiled from continuous monitors. Comprehensive models use a combination of discrete sample-based measurements and real-time factors. Comprehensive models were useful at some sites with lagged variables (< 2 weeks) for cyanobacterial toxin genes, dissolved nutrients, and (or) N to P ratios. Comprehensive models are presented in three categories: (1) three models with continuous monitor data and lagged comprehensive variables, (2) five models with no continuous monitor data and lagged comprehensive variables, and (3) one model with continuous monitor data and same-day comprehensive variables. Funding for this work was provided by the Ohio Water Development Authority and the U.S. Geological Survey Cooperative Water Program.
Price Prediction -Multiple Linear Regression
kaggle.com
zip
Updated Aug 3, 2022
Cite
Erol Masimov (2022). Price Prediction -Multiple Linear Regression [Dataset]. https://www.kaggle.com/datasets/erolmasimov/price-prediction-multiple-linear-regression
Explore at:
Available download formats: zip (6192 bytes)
Dataset updated
Aug 3, 2022
Authors
Erol Masimov
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The car company wants to enter a new market and needs an estimate of exactly which variables affect car prices. The goal is to answer two questions:
- Which variables are significant in predicting the price of a car?
- How well do those variables describe the price of a car?
Univariate and multiple linear regression analysis.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Jill A. McKay; Alexandra Groom; Catherine Potter; Lisa J. Coneyworth; Dianne Ford; John C. Mathers; Caroline L. Relton (2023). Univariate and multiple linear regression analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0033290.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0033290.t003
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Jill A. McKay; Alexandra Groom; Catherine Potter; Lisa J. Coneyworth; Dianne Ford; John C. Mathers; Caroline L. Relton
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
* Dominant models were applied for these SNPs; hence, coefficients reflect the difference in methylation level for carriers of the minor allele compared to major allele homozygotes (reference group).
† Females were compared to males (reference group).
‡ Additive models were applied for these SNPs; hence, coefficients reflect the difference in methylation level for each additional copy of the minor allele compared to major allele homozygotes (reference group).
Φ Recessive models were applied for these SNPs; hence, coefficients reflect the difference in methylation level for minor allele homozygotes compared to carriers of the major allele (reference group).
ł Reduced numbers in multiple regression models are due to limited maternal genotype data and removal of outliers; consequently, these reduced numbers may in part account for the lack of significance seen with some predictor variables.
Note also that mean methylation levels were used for multiple regression modelling despite not always demonstrating the strongest effect size with individual predictors. Standardised beta coefficients are obtained by first standardizing all variables to have a mean of 0 and a standard deviation of 1; they denote the increase in methylation, in standard deviations, for a one standard deviation increase in the predictor variable. Multiple regression analysis was not performed for ZNT5 associations as mean methylation was not considered across this locus.
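The standardisation step described in the note above can be sketched in a few lines. This is a minimal illustration with a single predictor rather than the multiple regression used in the study, and the numbers are hypothetical, not taken from the dataset. It shows that fitting on z-scored variables gives the same result as rescaling the raw slope by the ratio of standard deviations.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def slope(x, y):
    """Ordinary least squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical values for illustration only.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
methylation = [2.1, 2.9, 4.2, 4.8, 6.0]

raw_beta = slope(predictor, methylation)
# Standardized beta: the slope after both variables are z-scored.
std_beta = slope(standardize(predictor), standardize(methylation))
```

With one predictor the standardized coefficient equals the correlation between the two variables; with several predictors the same z-scoring is applied to every column before fitting.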
Multiple Linear Regression Dataset
kaggle.com
zip
Updated Jul 11, 2025
Cite
Siddant007 (2025). Multiple Linear Regression Dataset [Dataset]. https://www.kaggle.com/datasets/siddant007/multiplelinearregression-outliers-missing-values
Explore at:
Available download formats: zip (1110 bytes)
Dataset updated
Jul 11, 2025
Authors
Siddant007
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This is a synthetic but realistic dataset created for practicing Multiple Linear Regression and feature engineering in a housing price prediction context. The dataset includes common real-world challenges like missing values, outliers, and categorical features.

You can use this dataset to:
- Build a regression model
- Practice data cleaning
- Explore feature scaling and encoding
- Visualize relationships between house characteristics and price
Startup - Multiple Linear Regression
kaggle.com
zip
Updated Jan 29, 2018
Cite
karthickveerakumar (2018). Startup - Multiple Linear Regression [Dataset]. https://www.kaggle.com/datasets/karthickveerakumar/startup-logistic-regression
Explore at:
Available download formats: zip (1330 bytes)
Dataset updated
Jan 29, 2018
Authors
karthickveerakumar
Description
Dataset

This dataset was created by karthickveerakumar

Contents
Data from: Data for multiple linear regression models for estimating...
catalog.data.gov
data.usgs.gov
+2more
Updated Nov 27, 2025
Cite
U.S. Geological Survey (2025). Data for multiple linear regression models for estimating Escherichia coli (E. coli) concentrations or the probability of exceeding the bathing-water standard at recreational sites in Ohio and Pennsylvania as part of the Great Lakes NowCast, 2019 [Dataset]. https://catalog.data.gov/dataset/data-for-multiple-linear-regression-models-for-estimating-escherichia-coli-e-coli-concentr
Explore at:
Dataset updated
Nov 27, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
The Great Lakes, Pennsylvania
Description
Site-specific multiple linear regression models were developed for one beach in Ohio (three discrete sampling sites) and one beach in Pennsylvania to estimate concentrations of Escherichia coli (E. coli) or the probability of exceeding the bathing-water standard for E. coli in recreational waters used by the public. Traditional culture-based methods are commonly used to estimate concentrations of fecal indicator bacteria, such as E. coli; however, results are obtained 18 to 24 hours post sampling and do not accurately reflect current water-quality conditions. Beach-specific mathematical models use environmental and water-quality variables that are easily and quickly measured as surrogates to estimate concentrations of fecal-indicator bacteria or to provide the probability that a State recreational water-quality standard will be exceeded. When predictive models are used for beach closure or advisory decisions, they are referred to as “nowcasts”. Software designed for model development by the U.S. Environmental Protection Agency (Virtual Beach) was used. The selected model for each beach was based on a combination of explanatory variables including, most commonly, turbidity, water temperature, change in lake level over 24 hours, and antecedent rainfall. Model results are used by managers to report water-quality conditions to the public through the Great Lakes NowCast in 2019 (https://pa.water.usgs.gov/apps/nowcast/). Model performance in 2019 (sensitivity, specificity, and accuracy) was compared to using the previous day's E. coli concentration (persistence method).
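The comparison described above (model performance versus the persistence method, scored by sensitivity, specificity, and accuracy) can be sketched as follows. The daily outcomes are hypothetical and the function name is an assumption made for illustration; this is not code from the data release.

```python
def classification_metrics(predicted, observed):
    """Return (sensitivity, specificity, accuracy) for paired boolean lists,
    where True means the bathing-water standard was exceeded."""
    pairs = list(zip(predicted, observed))
    tp = sum(p and o for p, o in pairs)              # correctly predicted exceedances
    tn = sum((not p) and (not o) for p, o in pairs)  # correctly predicted non-exceedances
    fp = sum(p and (not o) for p, o in pairs)        # false alarms
    fn = sum((not p) and o for p, o in pairs)        # missed exceedances
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Hypothetical daily exceedance outcomes, for illustration only.
observed = [False, True, True, False, False, True, False, False]
model = [False, True, False, False, False, True, False, True]

# Persistence method: yesterday's observed outcome is today's prediction,
# so the comparison starts on the second day.
persistence = observed[:-1]
model_scores = classification_metrics(model[1:], observed[1:])
persistence_scores = classification_metrics(persistence, observed[1:])
```

Scoring both approaches on the same days keeps the comparison fair, since the persistence method has no prediction for the first day.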
Student Performance (Multiple Linear Regression) Dataset
cubig.ai
zip
Updated May 29, 2025
Cite
CUBIG (2025). Student Performance (Multiple Linear Regression) Dataset [Dataset]. https://cubig.ai/store/products/392/student-performance-multiple-linear-regression-dataset
Explore at:
Available download formats: zip
Dataset updated
May 29, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
Description
1) Data Introduction • The Student Performance (Multiple Linear Regression) Dataset is designed to analyze the relationship between students’ learning habits and academic performance. Each sample includes key indicators related to learning, such as study hours, sleep duration, previous test scores, and the number of practice exams completed.

2) Data Utilization (1) Characteristics of the Student Performance (Multiple Linear Regression) Dataset: • The target variable, Hours Studied, quantitatively represents the amount of time a student has invested in studying. The dataset is structured to allow modeling and inference of learning behaviors based on correlations with other variables.

(2) Applications of the Student Performance (Multiple Linear Regression) Dataset: • AI-Based Study Time Prediction Models: The dataset can be used to develop regression models that estimate a student’s expected study time based on inputs like academic performance, sleep habits, and engagement patterns. • Behavioral Analysis and Personalized Learning Strategies: It can be applied to identify students with insufficient study time and design personalized study interventions based on academic and lifestyle patterns.
Multiple Linear Regression Dataset
kaggle.com
zip
Updated Aug 14, 2022
Cite
Hussain Nasir Khan (2022). Multiple Linear Regression Dataset [Dataset]. https://www.kaggle.com/datasets/hussainnasirkhan/multiple-linear-regression-dataset/code
Explore at:
Available download formats: zip (367 bytes)
Dataset updated
Aug 14, 2022
Authors
Hussain Nasir Khan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This is a very simple multiple linear regression dataset for beginners. This dataset has only three columns and twenty rows. There are only two independent variables and one dependent variable. The independent variables are 'age' and 'experience'. The dependent variable is 'income'.
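A dataset of this shape (two independent variables and one dependent variable) is small enough to fit by hand. The sketch below solves the normal equations with plain Python; the rows are hypothetical values generated from a known linear rule, not the twenty rows of the actual file, and the helper names are assumptions made for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + b2*x2 + ... by solving (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical (age, experience) rows; income follows a known rule so the fit can be checked.
rows = [(25, 1), (30, 3), (35, 10), (40, 12), (45, 20), (50, 15)]
income = [20000 + 300 * age + 1500 * years for age, years in rows]

b0, b1, b2 = fit_ols(rows, income)  # intercept, age coefficient, experience coefficient
```

Because the example income is an exact linear function of the inputs, the fitted coefficients recover the rule used to generate it. Real data would leave residuals, and a numerical library would normally replace the hand-written solver.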
Subset for multiple regression analysis: socio-demographic data, social...
figshare.com
txt
Updated Jan 19, 2021
Cite
Andrés Aparicio (2021). Subset for multiple regression analysis: socio-demographic data, social distance and the identification of mental health causes [Dataset]. http://doi.org/10.6084/m9.figshare.13607087.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.13607087.v2
Dataset updated
Jan 19, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Andrés Aparicio
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Data collected following the methodology and procedures described in (1,2). The sample consisted of Chilean adults (18 years of age or older) and was stratified by age, gender, and educational level. Five hundred and eighty-three participants began the process to answer the questionnaires either in person or online. Before the analysis, we excluded incomplete records, questionnaires answered by Chilean people living outside of Chile, and foreign people living in Chile for less than 10 years. This article reports the results obtained from 395 participants (68%). The final sample included adults from 18 to 78 years of age with low, middle and high educational levels.
1. Scior K, Potts HW, Furnham AF. Awareness of schizophrenia and intellectual disability and stigma across ethnic groups in the UK. Psychiatry Res [Internet]. 2013 Jul 30 [cited 2019 Jan 5];208(2):125–30. Available from: https://www.sciencedirect.com/science/article/pii/S0165178112005604?via=ihub
2. Scior K, Furnham A. Development and validation of the Intellectual Disability Literacy Scale for assessment of knowledge, beliefs and attitudes to intellectual disability. Res Dev Disabil [Internet]. 2011 Sep [cited 2017 Dec 31];32(5):1530–41. Available from: http://www.ncbi.nlm.nih.gov/pubmed/21377320
multi-output regression datasets
ieee-dataport.org
Updated Nov 20, 2025
Cite
Chunyu Wang (2025). multi-output regression datasets [Dataset]. https://ieee-dataport.org/documents/multi-output-regression-datasets
Explore at:
Dataset updated
Nov 20, 2025
Authors
Chunyu Wang
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
1 ) and there are 16 continuous input variables.
Data from: Solving linear regression without skewness of the residuals’...
tandf.figshare.com
txt
Updated Jun 5, 2023
Cite
Martin Ricker (2023). Solving linear regression without skewness of the residuals’ distribution [Dataset]. http://doi.org/10.6084/m9.figshare.8152901.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.8152901.v1
Dataset updated
Jun 5, 2023
Dataset provided by
Taylor & Francis
Authors
Martin Ricker
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Linear ordinary least squares (OLS) regression assumes an unskewed distribution of the residuals for correct inference and prediction. A proof is given that for Manly’s exponential transformation of the dependent variable, there is always at least one solution for λ, such that the skewness of the standardized residuals’ distribution is zero. Computer code in Mathematica, together with an illustrative example, is provided. Generalized linear models are discussed briefly in comparison.
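As a rough illustration of the idea, the sketch below applies Manly's exponential transformation, y(λ) = (exp(λy) − 1)/λ for λ ≠ 0 and y for λ = 0, and searches for a λ that gives zero skewness. It is written in Python rather than Mathematica and is not the author's code. It assumes an intercept-only model, so the residuals are simply deviations from the mean, and it assumes the skewness changes sign on the chosen bracket; the data values are hypothetical.

```python
import math


def manly(y, lam):
    """Manly's exponential transformation: (exp(lam*v) - 1) / lam, or v when lam == 0."""
    if lam == 0:
        return list(y)
    return [(math.exp(lam * v) - 1.0) / lam for v in y]


def skewness(values):
    """Moment-based skewness m3 / m2**1.5 of the deviations from the mean."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def zero_skew_lambda(y, lo, hi, iterations=200):
    """Bisection for lam in [lo, hi]; assumes skewness has opposite signs at lo and hi."""
    f_lo = skewness(manly(y, lo))
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        f_mid = skewness(manly(y, mid))
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return (lo + hi) / 2.0


# Hypothetical right-skewed sample, for illustration only.
y = [1.0, 2.0, 3.0, 4.0, 10.0]
lam = zero_skew_lambda(y, -2.0, 0.0)
residual_skew = skewness(manly(y, lam))
```

For a model with explanatory variables, the same search would be run on the residuals of the regression refitted at each candidate λ instead of on deviations from the mean.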
Multiple Linear Regression Dataset for Practice
kaggle.com
zip
Updated Jun 14, 2024
Cite
Muhammad Fawad Ul Hassan Sarim (2024). Multiple Linear Regression Dataset for Practice [Dataset]. https://www.kaggle.com/datasets/fawadsarim/multiple-linear-regression-dataset-for-practice
Explore at:
Available download formats: zip (2248 bytes)
Dataset updated
Jun 14, 2024
Authors
Muhammad Fawad Ul Hassan Sarim
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Muhammad Fawad Ul Hassan Sarim

Released under Apache 2.0

Contents
Data and model archive for multiple linear regression models for prediction...
catalog.data.gov
data.usgs.gov
+1more
Updated Nov 27, 2025
Cite
U.S. Geological Survey (2025). Data and model archive for multiple linear regression models for prediction of weighted cyanotoxin mixture concentrations and microcystin concentrations at three recurring bloom sites in Kabetogama Lake in Minnesota [Dataset]. https://catalog.data.gov/dataset/data-and-model-archive-for-multiple-linear-regression-models-for-prediction-of-weighted-cy
Explore at:
Dataset updated
Nov 27, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Kabetogama lake, Minnesota
Description
Multiple linear regression models were developed using data collected in 2016 and 2017 from three recurring bloom sites in Kabetogama Lake in northern Minnesota. These models were developed to predict concentrations of cyanotoxins (anatoxin-a, microcystin, and saxitoxin) that occur within the blooms. Virtual Beach software (version 3.0.6) was used to develop four models: two cyanotoxin mixture (MIX) models and two microcystin (MC) models. Models include those using readily available environmental variables (for example, wind speed and specific conductance) and those using additional comprehensive variables (based on laboratory analyses). Many of the independent variables were averages over a certain time period prior to a sample date, whereas other independent variables were lagged between 4 and 8 days. Funding for this work was provided by the U.S. Geological Survey – National Park Service Partnership and the U.S. Geological Survey Environmental Health Program (Toxic Substance Hydrology and Contaminant Biology). The resulting model equations and final datasets are included in this data release, while an associated child item model archive includes all the files needed to run and develop these VB models.
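The two kinds of derived predictors mentioned above (averages over a period before the sample date, and lagged values) can be sketched as follows. The wind-speed series, the window length, and the function names are hypothetical and chosen only for illustration; they are not taken from the data release.

```python
def antecedent_mean(series, index, window):
    """Mean of the `window` values that precede position `index` (the sample day is excluded)."""
    if index < window:
        return None  # not enough history before the sample day
    past = series[index - window:index]
    return sum(past) / window


def lagged(series, index, lag):
    """Value observed `lag` steps before position `index`, or None if unavailable."""
    return series[index - lag] if index >= lag else None


# Hypothetical daily wind speeds; position 8 is the sample day.
wind = [3.0, 5.0, 4.0, 6.0, 2.0, 7.0, 8.0, 1.0, 9.0]
sample_day = 8

features = {
    "wind_mean_prev3": antecedent_mean(wind, sample_day, 3),  # mean of the 3 days before sampling
    "wind_lag4": lagged(wind, sample_day, 4),                 # value 4 days before sampling
}
```

Each sample date then contributes one row of such features to the regression table.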
Dataset for Multiple Linear Regression-2
kaggle.com
zip
Updated Dec 15, 2024
Cite
Mit Gandhi (2024). Dataset for Multiple Linear Regression-2 [Dataset]. https://www.kaggle.com/datasets/mitgandhi10/dataset-for-multiple-linear-regression-2
Explore at:
Available download formats: zip (89824 bytes)
Dataset updated
Dec 15, 2024
Authors
Mit Gandhi
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This dataset contains detailed information about vehicles, including their engine characteristics, fuel consumption, and CO2 emissions. It is a valuable resource for analyzing the impact of various factors like engine size, transmission type, and fuel type on a vehicle's carbon emissions.

Features:

- Engine Size (L): The engine size of the vehicle in liters.
- Cylinders: Number of cylinders in the engine.
- Fuel Consumption (City, Highway, Combined): Fuel consumption in liters per 100 kilometers for city, highway, and combined driving conditions.
- Fuel Consumption (Combined - MPG): Fuel consumption in miles per gallon for combined driving conditions.
- CO2 Emissions (g/km): Carbon dioxide emissions measured in grams per kilometer.

Categorical Columns:

- Make: Manufacturer of the vehicle.
- Model: Specific model name.
- Vehicle Class: Vehicle category (e.g., sedan, SUV, etc.).
- Transmission: Type of transmission (automatic, manual, etc.).
- Fuel Type: Type of fuel used (e.g., gasoline, diesel, hybrid, etc.).

This dataset is ideal for exploring:

- The correlation between fuel efficiency and CO2 emissions.
- The role of vehicle specifications in determining environmental impact.
- Regression modeling and machine learning applications.
Data from: Assessing predictive performance of supervised machine learning...
data.niaid.nih.gov
datasetcatalog.nlm.nih.gov
+1more
zip
Updated May 23, 2023
Cite
Evans Omondi (2023). Assessing predictive performance of supervised machine learning algorithms for a diamond pricing model [Dataset]. http://doi.org/10.5061/dryad.wh70rxwrh
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.wh70rxwrh
Dataset updated
May 23, 2023
Dataset provided by
Strathmore University
Authors
Evans Omondi
License
https://spdx.org/licenses/CC0-1.0.html
Description
The diamond is 58 times harder than any other mineral in the world, and its elegance as a jewel has long been appreciated. Forecasting diamond prices is challenging due to nonlinearity in important features such as carat, cut, clarity, table, and depth. Against this backdrop, the study conducted a comparative analysis of the performance of multiple supervised machine learning models (regressors and classifiers) in predicting diamond prices. Eight supervised machine learning algorithms were evaluated in this work, including Multiple Linear Regression, Linear Discriminant Analysis, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Support Vector Machines, Boosted Regression and Classification Trees, and Multi-Layer Perceptron. The analysis is based on data preprocessing, exploratory data analysis (EDA), training the aforementioned models, assessing their accuracy, and interpreting their results. Based on the performance metrics and analysis, eXtreme Gradient Boosting was found to be the best-performing algorithm in both classification and regression, with an R2 score of 97.45% and an accuracy of 74.28%. As a result, eXtreme Gradient Boosting was recommended as the regressor and classifier for forecasting the price of a diamond specimen.
Methods
Kaggle, a data repository with thousands of datasets, was used in the investigation. It is an online community for machine learning practitioners and data scientists, as well as a robust, well-researched, and sufficient resource for analyzing various data sources. On Kaggle, users can search for and publish various datasets. In a web-based data-science environment, they can study datasets and construct models.
Results of the multiple linear regression models, predicting each stress...
datasetcatalog.nlm.nih.gov
figshare.com
+1more
Updated Apr 19, 2013
Cite
Greaves-Lord, Kirstin; Evans, Brittany E.; Franken, Ingmar H. A.; Euser, Anja S.; Tulen, Joke H. M.; Huizink, Anja C. (2013). Results of the multiple linear regression models, predicting each stress response in the child sample. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001710782
Explore at:
Dataset updated
Apr 19, 2013
Authors
Greaves-Lord, Kirstin; Evans, Brittany E.; Franken, Ingmar H. A.; Euser, Anja S.; Tulen, Joke H. M.; Huizink, Anja C.
Description
Notes. SES = socioeconomic status; RSA = respiratory sinus arrhythmia; PPS = perceived physiological stress; adjusted R2 reported; sample sizes for each reactivity model: cortisol (n = 336), heart rate (n = 320); RSA (n = 184); PPS (n = 251); F statistics pertain to model results, β statistics refer to standardized coefficients of individual predictors.
Multiple linear regression results in the full ABCD 4.0 sample, controlling...
plos.figshare.com
xlsx
Updated Jun 2, 2023
Cite
Yi Zhou; Narun Pat; Michael C. Neale (2023). Multiple linear regression results in the full ABCD 4.0 sample, controlling for sociodemographic covariates. [Dataset]. http://doi.org/10.1371/journal.pone.0277158.s025
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0277158.s025
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Yi Zhou; Narun Pat; Michael C. Neale
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The partial regression coefficients (Estimate), standard errors (Std.Err), t-values, p-values, significance, model R2 (R^2), Benjamini-Hochberg adjusted p-values (BH adjustment), Bonferroni adjusted p-values, Durbin-Watson statistic (DW_statistic), Breusch-Pagan Chi^2 (BP Chi^2), and Breusch-Pagan (BP) p-values are presented. The ICC is the proportion of variance in rsfMRI connectivity explained by the family structure random effect. (XLSX)
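The two multiple-testing adjustments listed among the columns can be sketched in plain Python. The BH column is taken here to mean the Benjamini-Hochberg step-up procedure, which is what the abbreviation normally denotes; the p-values are hypothetical and the functions are an illustration, not the code used to produce the table.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Step-up false-discovery-rate adjustment, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, for illustration only.
pvals = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
```

The Bonferroni values are never smaller than the BH values for the same inputs, which is why the two columns can lead to different significance calls.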
Data from: Data for Regression Models to Estimate Water Use in Providence,...
s.cnmilf.com
data.usgs.gov
+2more
Updated Oct 1, 2025
Cite
U.S. Geological Survey (2025). Data for Regression Models to Estimate Water Use in Providence, Rhode Island, 2014-2021 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/data-for-regression-models-to-estimate-water-use-in-providence-rhode-island-2014-2021
Explore at:
Dataset updated
Oct 1, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Rhode Island, Providence
Description
This data release contains input data and programs (scripts) used to estimate monthly water demand for retail customers of Providence Water, located in Providence, Rhode Island. Explanatory data and model outputs are from July 2014 through June 2021. Models of per capita (for single-family residential customers) or per connection (for multi-family residential, commercial, and industrial customers) water use were developed using multiple linear regression. The dependent variables, provided by Providence Water, are the monthly number of connections and gallons of water delivered to single- and multi-family residential, commercial, and industrial connections. Potential independent variables (from online sources) are climate variables (temperature and precipitation), economic statistics, and a drought statistic. Not all independent variables were used in all of the models. The data are provided in data tables and model files. The data table RIWaterUseVariableExplanation.csv describes the explanatory variables and their data sources. The data table ProvModelInputData.csv provides the monthly water-use data that are the dependent variables and the monthly climatic and economic data that are the independent variables. The data table DroughtInputData.csv provides the weekly U.S. drought monitor index values that were processed to formulate a potential independent variable. The R script model_water_use.R runs the models that predict water use. The other two R scripts (load_preprocess_input_data.R and model_water_use_functions.R) are not run explicitly but are called from the primary script model_water_use.R. Regression equations produced by the models can be used to predict water demand throughout Rhode Island.
