100+ datasets found
  1. An example data set for exploration of Multiple Linear Regression

    • data.usgs.gov
    • catalog.data.gov
    Updated Feb 24, 2024
    Cite
    William Farmer (2024). An example data set for exploration of Multiple Linear Regression [Dataset]. http://doi.org/10.5066/P9T5ZEXV
    Dataset updated
    Feb 24, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    William Farmer
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Time period covered
    1956 - 2016
    Description

    This data set contains example data for exploring the theory of regression-based regionalization. The 90th percentile of annual maximum streamflow is provided as an example response variable for 293 streamgages in the conterminous United States. Several explanatory variables are drawn from the GAGES-II database to demonstrate how multiple linear regression is applied. Example scripts demonstrate how to collect the original streamflow data provided and how to recreate the figures from the associated Techniques and Methods chapter.

  2. Marketing Linear Multiple Regression

    • kaggle.com
    zip
    Updated Apr 24, 2020
    Cite
    FayeJavad (2020). Marketing Linear Multiple Regression [Dataset]. https://www.kaggle.com/datasets/fayejavad/marketing-linear-multiple-regression
    Available download formats: zip (1907 bytes)
    Dataset updated
    Apr 24, 2020
    Authors
    FayeJavad
    Description

    Dataset

    This dataset was created by FayeJavad


  3. Data from: Data for multiple linear regression models for predicting...

    • catalog.data.gov
    • data.usgs.gov
    • +2 more
    Updated Nov 19, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Data for multiple linear regression models for predicting microcystin concentration action-level exceedances in selected lakes in Ohio [Dataset]. https://catalog.data.gov/dataset/data-for-multiple-linear-regression-models-for-predicting-microcystin-concentration-action
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Ohio
    Description

    Site-specific multiple linear regression models were developed for eight sites in Ohio (six in the Western Lake Erie Basin and two on inland reservoirs in northeast Ohio) to quickly predict action-level exceedances for a cyanotoxin, microcystin, in recreational and drinking waters used by the public. Real-time models include easily or continuously measured factors that do not require that a sample be collected. Real-time models are presented in two categories: (1) six models with continuous monitor data, and (2) three models with on-site measurements. Real-time models commonly included variables such as phycocyanin, pH, specific conductance, and streamflow or gage height. Many of the real-time factors were averages over time periods antecedent to the time the microcystin sample was collected, including water-quality data compiled from continuous monitors. Comprehensive models use a combination of discrete sample-based measurements and real-time factors. Comprehensive models were useful at some sites with lagged variables (< 2 weeks) for cyanobacterial toxin genes, dissolved nutrients, and (or) N to P ratios. Comprehensive models are presented in three categories: (1) three models with continuous monitor data and lagged comprehensive variables, (2) five models with no continuous monitor data and lagged comprehensive variables, and (3) one model with continuous monitor data and same-day comprehensive variables. Funding for this work was provided by the Ohio Water Development Authority and the U.S. Geological Survey Cooperative Water Program.

  4. Price Prediction -Multiple Linear Regression

    • kaggle.com
    zip
    Updated Aug 3, 2022
    Cite
    Erol Masimov (2022). Price Prediction -Multiple Linear Regression [Dataset]. https://www.kaggle.com/datasets/erolmasimov/price-prediction-multiple-linear-regression
    Available download formats: zip (6192 bytes)
    Dataset updated
    Aug 3, 2022
    Authors
    Erol Masimov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The car company wants to enter a new market and needs to estimate exactly which variables affect car prices. The goals are to determine:

    • which variables are significant in predicting the price of a car
    • how well those variables describe the price of a car

  5. Univariate and multiple linear regression analysis.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Jill A. McKay; Alexandra Groom; Catherine Potter; Lisa J. Coneyworth; Dianne Ford; John C. Mathers; Caroline L. Relton (2023). Univariate and multiple linear regression analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0033290.t003
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jill A. McKay; Alexandra Groom; Catherine Potter; Lisa J. Coneyworth; Dianne Ford; John C. Mathers; Caroline L. Relton
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    * Dominant models were applied for these SNPs; hence coefficients reflect the difference in methylation level for carriers of the minor allele compared to major allele homozygotes (reference group).
    † Females were compared to males (reference group).
    ‡ Additive models were applied for these SNPs; hence coefficients reflect the difference in methylation level for each additional copy of the minor allele compared to major allele homozygotes (reference group).
    Φ Recessive models were applied for these SNPs; hence coefficients reflect the difference in methylation level for minor allele homozygotes compared to carriers of the major allele (reference group).
    ł Reduced numbers in multiple regression models are due to limited maternal genotype data and removal of outliers; consequently, these reduced numbers may in part account for the lack of significance seen with some predictor variables.
    Note also that mean methylation levels were utilized for multiple regression modelling despite not always demonstrating the strongest effect size with individual predictors. Standardised beta coefficients are obtained by first standardizing all variables to have a mean of 0 and a standard deviation of 1; they denote the increase in methylation for a standard deviation increase in the predictor variables. Multiple regression analysis was not performed for ZNT5 associations as mean methylation was not considered across this locus.
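As a rough illustration of the standardised beta coefficients described above, the following Python sketch z-scores every variable and then fits ordinary least squares. The data are synthetic and the names `X`, `y`, and `zscore` are hypothetical; nothing here is taken from the study's table or reproduces the authors' analysis.

```python
import numpy as np

# Synthetic data, for illustration only (not the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2)) * np.array([2.0, 10.0])  # two predictors on different scales
y = 1.5 + 0.8 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(scale=0.5, size=50)

def zscore(a):
    """Standardize each column (or a 1-D array) to mean 0 and standard deviation 1."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

# Standardized betas: OLS on z-scored predictors and response.
# No intercept column is needed because every variable is centred.
beta_std, *_ = np.linalg.lstsq(zscore(X), zscore(y), rcond=None)

# Cross-check: rescale the raw-unit slopes by sd(x_j) / sd(y).
design = np.column_stack([np.ones(len(y)), X])
beta_raw, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_check = beta_raw[1:] * X.std(axis=0) / y.std()

print(beta_std, beta_check)
```

The two routes should agree up to floating-point error, which is why a standardized coefficient can be read as the change in the response, in standard deviations, per one standard deviation change in a predictor.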

  6. Multiple Linear Regression Dataset

    • kaggle.com
    zip
    Updated Jul 11, 2025
    Cite
    Siddant007 (2025). Multiple Linear Regression Dataset [Dataset]. https://www.kaggle.com/datasets/siddant007/multiplelinearregression-outliers-missing-values
    Available download formats: zip (1110 bytes)
    Dataset updated
    Jul 11, 2025
    Authors
    Siddant007
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a synthetic but realistic dataset created for practicing Multiple Linear Regression and feature engineering in a housing price prediction context. The dataset includes common real-world challenges like missing values, outliers, and categorical features.

    You can use this dataset to:

    • Build a regression model
    • Practice data cleaning
    • Explore feature scaling and encoding
    • Visualize relationships between house characteristics and price

  7. Startup - Multiple Linear Regression

    • kaggle.com
    zip
    Updated Jan 29, 2018
    Cite
    karthickveerakumar (2018). Startup - Multiple Linear Regression [Dataset]. https://www.kaggle.com/datasets/karthickveerakumar/startup-logistic-regression
    Available download formats: zip (1330 bytes)
    Dataset updated
    Jan 29, 2018
    Authors
    karthickveerakumar
    Description

    Dataset

    This dataset was created by karthickveerakumar


  8. Data from: Data for multiple linear regression models for estimating...

    • catalog.data.gov
    • data.usgs.gov
    • +2 more
    Updated Nov 27, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Data for multiple linear regression models for estimating Escherichia coli (E. coli) concentrations or the probability of exceeding the bathing-water standard at recreational sites in Ohio and Pennsylvania as part of the Great Lakes NowCast, 2019 [Dataset]. https://catalog.data.gov/dataset/data-for-multiple-linear-regression-models-for-estimating-escherichia-coli-e-coli-concentr
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    The Great Lakes, Pennsylvania
    Description

    Site-specific multiple linear regression models were developed for one beach in Ohio (three discrete sampling sites) and one beach in Pennsylvania to estimate concentrations of Escherichia coli (E. coli) or the probability of exceeding the bathing-water standard for E. coli in recreational waters used by the public. Traditional culture-based methods are commonly used to estimate concentrations of fecal-indicator bacteria, such as E. coli; however, results are obtained 18 to 24 hours after sampling and do not accurately reflect current water-quality conditions. Beach-specific mathematical models use environmental and water-quality variables that are easily and quickly measured as surrogates to estimate concentrations of fecal-indicator bacteria or to provide the probability that a State recreational water-quality standard will be exceeded. When predictive models are used for beach closure or advisory decisions, they are referred to as “nowcasts”. Software designed for model development by the U.S. Environmental Protection Agency (Virtual Beach) was used. The selected model for each beach was based on a combination of explanatory variables including, most commonly, turbidity, water temperature, change in lake level over 24 hours, and antecedent rainfall. Model results were used by managers to report water-quality conditions to the public through the Great Lakes NowCast in 2019 (https://pa.water.usgs.gov/apps/nowcast/). Model performance in 2019 (sensitivity, specificity, and accuracy) was compared to using the previous day's E. coli concentration (persistence method).

  9. Student Performance (Multiple Linear Regression) Dataset

    • cubig.ai
    zip
    Updated May 29, 2025
    Cite
    CUBIG (2025). Student Performance (Multiple Linear Regression) Dataset [Dataset]. https://cubig.ai/store/products/392/student-performance-multiple-linear-regression-dataset
    Available download formats: zip
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction
    • The Student Performance (Multiple Linear Regression) Dataset is designed to analyze the relationship between students’ learning habits and academic performance. Each sample includes key indicators related to learning, such as study hours, sleep duration, previous test scores, and the number of practice exams completed.

    2) Data Utilization
    (1) Characteristics of the Student Performance (Multiple Linear Regression) Dataset:
    • The target variable, Hours Studied, quantitatively represents the amount of time a student has invested in studying. The dataset is structured to allow modeling and inference of learning behaviors based on correlations with other variables.

    (2) Applications of the Student Performance (Multiple Linear Regression) Dataset:
    • AI-Based Study Time Prediction Models: The dataset can be used to develop regression models that estimate a student’s expected study time based on inputs like academic performance, sleep habits, and engagement patterns.
    • Behavioral Analysis and Personalized Learning Strategies: It can be applied to identify students with insufficient study time and design personalized study interventions based on academic and lifestyle patterns.

  10. Multiple Linear Regression Dataset

    • kaggle.com
    zip
    Updated Aug 14, 2022
    Cite
    Hussain Nasir Khan (2022). Multiple Linear Regression Dataset [Dataset]. https://www.kaggle.com/datasets/hussainnasirkhan/multiple-linear-regression-dataset/code
    Available download formats: zip (367 bytes)
    Dataset updated
    Aug 14, 2022
    Authors
    Hussain Nasir Khan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a very simple multiple linear regression dataset for beginners. This dataset has only three columns and twenty rows. There are only two independent variables and one dependent variable. The independent variables are 'age' and 'experience'. The dependent variable is 'income'.
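A dataset with this shape (two independent variables, 'age' and 'experience', and one dependent variable, 'income') can be fitted with ordinary least squares in a few lines. The sketch below is only an illustration in Python with NumPy: the numbers are hypothetical values generated from a known linear rule, not the rows of the actual dataset, and `predict_income` is a made-up helper name.

```python
import numpy as np

# Hypothetical values generated from a known linear rule so the fit can be checked.
# These are NOT the rows of the dataset described above.
age = np.array([25, 30, 35, 40, 45, 50], dtype=float)
experience = np.array([1, 3, 8, 9, 15, 20], dtype=float)
income = 1000.0 + 50.0 * age + 200.0 * experience

# Design matrix: a column of ones for the intercept, then the two predictors.
X = np.column_stack([np.ones_like(age), age, experience])

# Least-squares solution of X @ coef ≈ income.
coef, *_ = np.linalg.lstsq(X, income, rcond=None)
intercept, b_age, b_experience = coef

def predict_income(a, e):
    """Predict income from age and experience using the fitted coefficients."""
    return intercept + b_age * a + b_experience * e

print(coef)
print(predict_income(33, 5))
```

Because the hypothetical incomes contain no noise, the fit should recover the generating coefficients; with real data the coefficients would instead be estimates with residual error.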

  11. Subset for multiple regression analysis: socio-demographic data, social...

    • figshare.com
    txt
    Updated Jan 19, 2021
    + more versions
    Cite
    Andrés Aparicio (2021). Subset for multiple regression analysis: socio-demographic data, social distance and the identification of mental health causes [Dataset]. http://doi.org/10.6084/m9.figshare.13607087.v2
    Available download formats: txt
    Dataset updated
    Jan 19, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Andrés Aparicio
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data were collected following the methodology and procedures described in (1,2). The sample consisted of Chilean adults (18 years of age or older) and was stratified by age, gender, and educational level. Five hundred and eighty-three participants began the process to answer the questionnaires either in person or online. Before the analysis, we excluded incomplete records, questionnaires answered by Chilean people living outside of Chile, and foreign people living in Chile for less than 10 years. This article reports the results obtained from 395 participants (68%). The final sample included adults from 18 to 78 years of age with low, middle and high educational levels.

    1. Scior K, Potts HW, Furnham AF. Awareness of schizophrenia and intellectual disability and stigma across ethnic groups in the UK. Psychiatry Res [Internet]. 2013 Jul 30 [cited 2019 Jan 5];208(2):125–30. Available from: https://www.sciencedirect.com/science/article/pii/S0165178112005604?via=ihub
    2. Scior K, Furnham A. Development and validation of the Intellectual Disability Literacy Scale for assessment of knowledge, beliefs and attitudes to intellectual disability. Res Dev Disabil [Internet]. 2011 Sep [cited 2017 Dec 31];32(5):1530–41. Available from: http://www.ncbi.nlm.nih.gov/pubmed/21377320

  12. multi-output regression datasets

    • ieee-dataport.org
    Updated Nov 20, 2025
    Cite
    Chunyu Wang (2025). multi-output regression datasets [Dataset]. https://ieee-dataport.org/documents/multi-output-regression-datasets
    Dataset updated
    Nov 20, 2025
    Authors
    Chunyu Wang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1 ) and there are 16 continuous input variables.

  13. Data from: Solving linear regression without skewness of the residuals’...

    • tandf.figshare.com
    txt
    Updated Jun 5, 2023
    Cite
    Martin Ricker (2023). Solving linear regression without skewness of the residuals’ distribution [Dataset]. http://doi.org/10.6084/m9.figshare.8152901.v1
    Available download formats: txt
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Martin Ricker
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Linear ordinary least squares (OLS) regression assumes an unskewed distribution of the residuals for correct inference and prediction. A proof is given that for Manly’s exponential transformation of the dependent variable, there is always at least one solution for λ such that the skewness of the standardized residuals’ distribution is zero. Mathematica code, together with an illustrative example, is provided. Generalized linear models are discussed briefly in comparison.
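To make the idea concrete, here is a minimal Python sketch (not the author's Mathematica code) that searches for a λ at which the OLS residuals of the Manly-transformed response have zero sample skewness. It assumes the usual form of Manly's transformation, (exp(λy) − 1)/λ for λ ≠ 0 and y for λ = 0, and uses plain bisection on a bracket of [−5, 5]; the data, the bracket, and the function names are all hypothetical choices made for illustration.

```python
import numpy as np

def manly(y, lam):
    """Manly's exponential transformation (assumed form; identity at lam = 0)."""
    if abs(lam) < 1e-12:
        return y.copy()
    return np.expm1(lam * y) / lam

def residual_skewness(x, y, lam):
    """Sample skewness of OLS residuals after transforming y with parameter lam."""
    z = manly(y, lam)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    r = z - X @ beta
    r = r - r.mean()
    # Skewness is scale-invariant, so standardizing the residuals would not change it.
    return np.mean(r**3) / np.mean(r**2) ** 1.5

def zero_skew_lambda(x, y, lo=-5.0, hi=5.0, iters=200):
    """Bisection for a lam in [lo, hi] where the residual skewness changes sign."""
    f_lo = residual_skewness(x, y, lo)
    f_hi = residual_skewness(x, y, hi)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the bracket; widen [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = residual_skewness(x, y, mid)
        if f_lo * f_mid <= 0:
            hi, f_hi = mid, f_mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Hypothetical data: exp(y) is roughly linear in x, with a small deterministic wiggle.
x = np.linspace(0.0, 1.0, 40)
y = np.log(1.0 + 2.0 * x + 0.05 * np.sin(1.7 * np.arange(40)))

lam_hat = zero_skew_lambda(x, y)
print(lam_hat, residual_skewness(x, y, lam_hat))
```

Bisection finds only one root inside the bracket; if several values of λ give zero skewness, a scan over a grid of starting brackets would be needed to list them.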

  14. Multiple Linear Regression Dataset for Practice

    • kaggle.com
    zip
    Updated Jun 14, 2024
    Cite
    Muhammad Fawad Ul Hassan Sarim (2024). Multiple Linear Regression Dataset for Practice [Dataset]. https://www.kaggle.com/datasets/fawadsarim/multiple-linear-regression-dataset-for-practice
    Available download formats: zip (2248 bytes)
    Dataset updated
    Jun 14, 2024
    Authors
    Muhammad Fawad Ul Hassan Sarim
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Muhammad Fawad Ul Hassan Sarim

    Released under Apache 2.0


  15. Data and model archive for multiple linear regression models for prediction...

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Nov 27, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Data and model archive for multiple linear regression models for prediction of weighted cyanotoxin mixture concentrations and microcystin concentrations at three recurring bloom sites in Kabetogama Lake in Minnesota [Dataset]. https://catalog.data.gov/dataset/data-and-model-archive-for-multiple-linear-regression-models-for-prediction-of-weighted-cy
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Kabetogama Lake, Minnesota
    Description

    Multiple linear regression models were developed using data collected in 2016 and 2017 from three recurring bloom sites in Kabetogama Lake in northern Minnesota. These models were developed to predict concentrations of cyanotoxins (anatoxin-a, microcystin, and saxitoxin) that occur within the blooms. Virtual Beach software (version 3.0.6) was used to develop four models: two cyanotoxin mixture (MIX) models and two microcystin (MC) models. Models include those using readily available environmental variables (for example, wind speed and specific conductance) and those using additional comprehensive variables (based on laboratory analyses). Many of the independent variables were averages over a certain time period prior to a sample date, whereas other independent variables were lagged between 4 and 8 days. Funding for this work was provided by the U.S. Geological Survey – National Park Service Partnership and the U.S. Geological Survey Environmental Health Program (Toxic Substance Hydrology and Contaminant Biology). The resulting model equations and final datasets are included in this data release, while an associated child item model archive includes all the files needed to run and develop these VB models.

  16. Dataset for Multiple Linear Regression-2

    • kaggle.com
    zip
    Updated Dec 15, 2024
    Cite
    Mit Gandhi (2024). Dataset for Multiple Linear Regression-2 [Dataset]. https://www.kaggle.com/datasets/mitgandhi10/dataset-for-multiple-linear-regression-2
    Available download formats: zip (89824 bytes)
    Dataset updated
    Dec 15, 2024
    Authors
    Mit Gandhi
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains detailed information about vehicles, including their engine characteristics, fuel consumption, and CO2 emissions. It is a valuable resource for analyzing the impact of various factors like engine size, transmission type, and fuel type on a vehicle's carbon emissions.

    Features:

    • Engine Size (L): The engine size of the vehicle in liters.
    • Cylinders: Number of cylinders in the engine.
    • Fuel Consumption (City, Highway, Combined): Fuel consumption in liters per 100 kilometers for city, highway, and combined driving conditions.
    • Fuel Consumption (Combined - MPG): Fuel consumption in miles per gallon for combined driving conditions.
    • CO2 Emissions (g/km): Carbon dioxide emissions measured in grams per kilometer.

    Categorical Columns:

    • Make: Manufacturer of the vehicle.
    • Model: Specific model name.
    • Vehicle Class: Vehicle category (e.g., sedan, SUV, etc.).
    • Transmission: Type of transmission (automatic, manual, etc.).
    • Fuel Type: Type of fuel used (e.g., gasoline, diesel, hybrid, etc.).

    This dataset is ideal for exploring:

    • The correlation between fuel efficiency and CO2 emissions.
    • The role of vehicle specifications in determining environmental impact.
    • Regression modeling and machine learning applications.

  17. Data from: Assessing predictive performance of supervised machine learning...

    • data.niaid.nih.gov
    • datasetcatalog.nlm.nih.gov
    • +1 more
    zip
    Updated May 23, 2023
    Cite
    Evans Omondi (2023). Assessing predictive performance of supervised machine learning algorithms for a diamond pricing model [Dataset]. http://doi.org/10.5061/dryad.wh70rxwrh
    Available download formats: zip
    Dataset updated
    May 23, 2023
    Dataset provided by
    Strathmore University
    Authors
    Evans Omondi
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The diamond is 58 times harder than any other mineral in the world, and its elegance as a jewel has long been appreciated. Forecasting diamond prices is challenging due to nonlinearity in important features such as carat, cut, clarity, table, and depth. Against this backdrop, the study conducted a comparative analysis of the performance of multiple supervised machine learning models (regressors and classifiers) in predicting diamond prices. Eight supervised machine learning algorithms were evaluated in this work, including Multiple Linear Regression, Linear Discriminant Analysis, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Support Vector Machines, Boosted Regression and Classification Trees, and Multi-Layer Perceptron. The analysis is based on data preprocessing, exploratory data analysis (EDA), training the aforementioned models, assessing their accuracy, and interpreting their results. Based on the performance metrics values and analysis, it was discovered that eXtreme Gradient Boosting was the best-performing algorithm in both classification and regression, with an R2 score of 97.45% and an Accuracy value of 74.28%. As a result, eXtreme Gradient Boosting was recommended as the optimal regressor and classifier for forecasting the price of a diamond specimen.

    Methods

    Kaggle, a data repository with thousands of datasets, was used in the investigation. It is an online community for machine learning practitioners and data scientists, as well as a robust, well-researched, and sufficient resource for analyzing various data sources. On Kaggle, users can search for and publish various datasets. In a web-based data-science environment, they can study datasets and construct models.

  18. Results of the multiple linear regression models, predicting each stress...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    • +1 more
    Updated Apr 19, 2013
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Greaves-Lord, Kirstin; Evans, Brittany E.; Franken, Ingmar H. A.; Euser, Anja S.; Tulen, Joke H. M.; Huizink, Anja C. (2013). Results of the multiple linear regression models, predicting each stress response in the child sample. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001710782
    Dataset updated
    Apr 19, 2013
    Authors
    Greaves-Lord, Kirstin; Evans, Brittany E.; Franken, Ingmar H. A.; Euser, Anja S.; Tulen, Joke H. M.; Huizink, Anja C.
    Description

    Notes. SES = socioeconomic status; RSA = respiratory sinus arrhythmia; PPS = perceived physiological stress; adjusted R2 reported; sample sizes for each reactivity model: cortisol (n = 336), heart rate (n = 320), RSA (n = 184), PPS (n = 251); F statistics pertain to model results; β statistics refer to standardized coefficients of individual predictors.

  19. Multiple linear regression results in the full ABCD 4.0 sample, controlling...

    • plos.figshare.com
    xlsx
    Updated Jun 2, 2023
    + more versions
    Cite
    Yi Zhou; Narun Pat; Michael C. Neale (2023). Multiple linear regression results in the full ABCD 4.0 sample, controlling for sociodemographic covariates. [Dataset]. http://doi.org/10.1371/journal.pone.0277158.s025
    Available download formats: xlsx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yi Zhou; Narun Pat; Michael C. Neale
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The partial regression coefficients (Estimate), standard errors (Std.Err), t-values, p-values, significance, model R2 (R^2), Bonferroni-Hochberg Adjusted p-values (BH adjustment), Bonferroni adjusted p-values, Durbin-Watson statistic (DW_statistic), Breusch-Pagan Chi^2 (BP Chi^2), and Breusch-Pagan (BP) p-values are presented. The ICC is the proportion of variance in rsfMRI connectivity explained by the family structure random effect. (XLSX)

  20. Data from: Data for Regression Models to Estimate Water Use in Providence,...

    • s.cnmilf.com
    • data.usgs.gov
    • +2 more
    Updated Oct 1, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Data for Regression Models to Estimate Water Use in Providence, Rhode Island, 2014-2021 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/data-for-regression-models-to-estimate-water-use-in-providence-rhode-island-2014-2021
    Dataset updated
    Oct 1, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Rhode Island, Providence
    Description

    This data release contains input data and programs (scripts) used to estimate monthly water demand for retail customers of Providence Water, located in Providence, Rhode Island. Explanatory data and model outputs are from July 2014 through June 2021. Models of per capita (for single-family residential customers) or per connection (for multi-family residential, commercial, and industrial customers) water use were developed using multiple linear regression. The dependent variables, provided by Providence Water, are the monthly number of connections and gallons of water delivered to single- and multi-family residential, commercial, and industrial connections. Potential independent variables (from online sources) are climate variables (temperature and precipitation), economic statistics, and a drought statistic. Not all independent variables were used in all of the models. The data are provided in data tables and model files. The data table RIWaterUseVariableExplanation.csv describes the explanatory variables and their data sources. The data table ProvModelInputData.csv provides the monthly water-use data that are the dependent variables and the monthly climatic and economic data that are the independent variables. The data table DroughtInputData.csv provides the weekly U.S. drought monitor index values that were processed to formulate a potential independent variable. The R script model_water_use.R runs the models that predict water use. The other two R scripts (load_preprocess_input_data.R and model_water_use_functions.R) are not run explicitly but are called from the primary script model_water_use.R. Regression equations produced by the models can be used to predict water demand throughout Rhode Island.
