100+ datasets found
  1. Dataset for demonstrating simple linear Regression

    • kaggle.com
    zip
    Updated Jul 3, 2024
    Cite
    Aaditya Gupta (2024). Dataset for demonstrating simple linear Regression [Dataset]. https://www.kaggle.com/datasets/aadityagupta11/data-for-demonstrating-basic-linear-regression
    Explore at:
    Available download formats: zip (2132 bytes)
    Dataset updated
    Jul 3, 2024
    Authors
    Aaditya Gupta
    Description

    This dataset has been created to demonstrate the use of a simple linear regression model. It includes two variables: an independent variable and a dependent variable. The data can be used for training, testing, and validating a simple linear regression model, making it ideal for educational purposes, tutorials, and basic predictive analysis projects. The dataset consists of 100 observations with no missing values, and it follows a linear relationship.
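
    As a rough illustration of how a two-variable dataset like this might be used, the sketch below fits a simple linear regression with the closed-form least-squares formulas and checks it on held-out points. The data are synthetic stand-ins generated in the code (the real file's column names and values are not given here), and the 80/20 split is an arbitrary assumption.

```python
import numpy as np

# Hypothetical stand-in data: 100 (x, y) pairs following a noisy linear trend.
# The actual dataset's columns and values are not reproduced here.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=100)

# Assumed split: first 80 observations for training, last 20 for testing.
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

# Closed-form ordinary least squares for y = slope * x + intercept:
#   slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
dx = x_train - x_train.mean()
dy = y_train - y_train.mean()
slope = float(np.sum(dx * dy) / np.sum(dx ** 2))
intercept = float(y_train.mean() - slope * x_train.mean())

# Mean squared error on the held-out observations.
y_pred = slope * x_test + intercept
test_mse = float(np.mean((y_test - y_pred) ** 2))
```

    With real data, the two arrays would be read from the downloaded file instead of being generated.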

  2. Insurance Dataset - Simple Linear Regression

    • kaggle.com
    zip
    Updated Sep 14, 2023
    Cite
    Taseer Mehboob (2023). Insurance Dataset - Simple Linear Regression [Dataset]. https://www.kaggle.com/datasets/taseermehboob9/insurance-dataset-simple-linear-regression
    Explore at:
    Available download formats: zip (254 bytes)
    Dataset updated
    Sep 14, 2023
    Authors
    Taseer Mehboob
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset has only two columns: the first is Age and the second is Premium. You can use it in machine learning for simple linear regression and for prediction practice.

  3. Data from: Data for multiple linear regression models for predicting...

    • catalog.data.gov
    • data.usgs.gov
    • +2 more
    Updated Nov 19, 2025
    Cite
    U.S. Geological Survey (2025). Data for multiple linear regression models for predicting microcystin concentration action-level exceedances in selected lakes in Ohio [Dataset]. https://catalog.data.gov/dataset/data-for-multiple-linear-regression-models-for-predicting-microcystin-concentration-action
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Ohio
    Description

    Site-specific multiple linear regression models were developed for eight sites in Ohio (six in the Western Lake Erie Basin and two in northeast Ohio on inland reservoirs) to quickly predict action-level exceedances for a cyanotoxin, microcystin, in recreational and drinking waters used by the public. Real-time models include easily- or continuously-measured factors that do not require that a sample be collected. Real-time models are presented in two categories: (1) six models with continuous monitor data, and (2) three models with on-site measurements. Real-time models commonly included variables such as phycocyanin, pH, specific conductance, and streamflow or gage height. Many of the real-time factors were averages over time periods antecedent to the time the microcystin sample was collected, including water-quality data compiled from continuous monitors. Comprehensive models use a combination of discrete sample-based measurements and real-time factors. Comprehensive models were useful at some sites with lagged variables (< 2 weeks) for cyanobacterial toxin genes, dissolved nutrients, and (or) N to P ratios. Comprehensive models are presented in three categories: (1) three models with continuous monitor data and lagged comprehensive variables, (2) five models with no continuous monitor data and lagged comprehensive variables, and (3) one model with continuous monitor data and same-day comprehensive variables. Funding for this work was provided by the Ohio Water Development Authority and the U.S. Geological Survey Cooperative Water Program.

  4. Data from: An example data set for exploration of Multiple Linear Regression...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 20, 2025
    Cite
    U.S. Geological Survey (2025). An example data set for exploration of Multiple Linear Regression [Dataset]. https://catalog.data.gov/dataset/an-example-data-set-for-exploration-of-multiple-linear-regression
    Explore at:
    Dataset updated
    Nov 20, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This data set contains example data for exploration of the theory of regression based regionalization. The 90th percentile of annual maximum streamflow is provided as an example response variable for 293 streamgages in the conterminous United States. Several explanatory variables are drawn from the GAGES-II data base in order to demonstrate how multiple linear regression is applied. Example scripts demonstrate how to collect the original streamflow data provided and how to recreate the figures from the associated Techniques and Methods chapter.

  5. Data from: Data for multiple linear regression models for estimating...

    • catalog.data.gov
    • data.usgs.gov
    • +2 more
    Updated Nov 27, 2025
    Cite
    U.S. Geological Survey (2025). Data for multiple linear regression models for estimating Escherichia coli (E. coli) concentrations or the probability of exceeding the bathing-water standard at recreational sites in Ohio and Pennsylvania as part of the Great Lakes NowCast, 2019 [Dataset]. https://catalog.data.gov/dataset/data-for-multiple-linear-regression-models-for-estimating-escherichia-coli-e-coli-concentr
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Pennsylvania, The Great Lakes
    Description

    Site-specific multiple linear regression models were developed for one beach in Ohio (three discrete sampling sites) and one beach in Pennsylvania to estimate concentrations of Escherichia coli (E. coli) or the probability of exceeding the bathing-water standard for E. coli in recreational waters used by the public. Traditional culture-based methods are commonly used to estimate concentrations of fecal indicator bacteria, such as E. coli; however, results are obtained 18 to 24 hours post sampling and do not accurately reflect current water-quality conditions. Beach-specific mathematical models use environmental and water-quality variables that are easily and quickly measured as surrogates to estimate concentrations of fecal-indicator bacteria or to provide the probability that a State recreational water-quality standard will be exceeded. When predictive models are used for beach closure or advisory decisions, they are referred to as “nowcasts”. Software designed for model development by the U.S. Environmental Protection Agency (Virtual Beach) was used. The selected model for each beach was based on a combination of explanatory variables including, most commonly, turbidity, water temperature, change in lake level over 24 hours, and antecedent rainfall. Model results are used by managers to report water-quality conditions to the public through the Great Lakes NowCast in 2019 (https://pa.water.usgs.gov/apps/nowcast/). Model performance in 2019 (sensitivity, specificity, and accuracy) was compared to using the previous day's E. coli concentration (persistence method).

  6. Salary Dataset - Simple linear regression

    • kaggle.com
    zip
    Updated Jan 10, 2023
    Cite
    Allena Venkata Sai Abhishek (2023). Salary Dataset - Simple linear regression [Dataset]. https://www.kaggle.com/datasets/abhishek14398/salary-dataset-simple-linear-regression/code
    Explore at:
    Available download formats: zip (457 bytes)
    Dataset updated
    Jan 10, 2023
    Authors
    Allena Venkata Sai Abhishek
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Description

    Salary dataset in CSV format for simple linear regression. It has also been used in the Machine Learning A to Z course of my series.

    Columns

    • #
    • YearsExperience
    • Salary
  7. Table_1_Application of robust regression in translational neuroscience...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jan 24, 2024
    Cite
    Michael Malek-Ahmadi; Stephen D. Ginsberg; Melissa J. Alldred; Scott E. Counts; Milos D. Ikonomovic; Eric E. Abrahamson; Sylvia E. Perez; Elliott J. Mufson (2024). Table_1_Application of robust regression in translational neuroscience studies with non-Gaussian outcome data.DOCX [Dataset]. http://doi.org/10.3389/fnagi.2023.1299451.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jan 24, 2024
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Michael Malek-Ahmadi; Stephen D. Ginsberg; Melissa J. Alldred; Scott E. Counts; Milos D. Ikonomovic; Eric E. Abrahamson; Sylvia E. Perez; Elliott J. Mufson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Linear regression is one of the most used statistical techniques in neuroscience, including the study of the neuropathology of Alzheimer’s disease (AD) dementia. However, the practical utility of this approach is often limited because dependent variables are often highly skewed and fail to meet the assumption of normality. Applying linear regression analyses to highly skewed datasets can generate imprecise results, which lead to erroneous estimates derived from statistical models. Furthermore, the presence of outliers can introduce unwanted bias, which affect estimates derived from linear regression models. Although a variety of data transformations can be utilized to mitigate these problems, these approaches are also associated with various caveats. By contrast, a robust regression approach does not impose distributional assumptions on data allowing for results to be interpreted in a similar manner to that derived using a linear regression analysis. Here, we demonstrate the utility of applying robust regression to the analysis of data derived from studies of human brain neurodegeneration where the error distribution of a dependent variable does not meet the assumption of normality. We show that the application of a robust regression approach to two independent published human clinical neuropathologic data sets provides reliable estimates of associations. We also demonstrate that results from a linear regression analysis can be biased if the dependent variable is significantly skewed, further indicating robust regression as a suitable alternate approach.

  8. Student Performance (Multiple Linear Regression) Dataset

    • cubig.ai
    zip
    Updated May 29, 2025
    Cite
    CUBIG (2025). Student Performance (Multiple Linear Regression) Dataset [Dataset]. https://cubig.ai/store/products/392/student-performance-multiple-linear-regression-dataset
    Explore at:
    Available download formats: zip
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction
    • The Student Performance (Multiple Linear Regression) Dataset is designed to analyze the relationship between students’ learning habits and academic performance. Each sample includes key indicators related to learning, such as study hours, sleep duration, previous test scores, and the number of practice exams completed.

    2) Data Utilization
    (1) Characteristics of the Student Performance (Multiple Linear Regression) Dataset:
    • The target variable, Hours Studied, quantitatively represents the amount of time a student has invested in studying. The dataset is structured to allow modeling and inference of learning behaviors based on correlations with other variables.

    (2) Applications of the Student Performance (Multiple Linear Regression) Dataset:
    • AI-Based Study Time Prediction Models: The dataset can be used to develop regression models that estimate a student’s expected study time based on inputs like academic performance, sleep habits, and engagement patterns.
    • Behavioral Analysis and Personalized Learning Strategies: It can be applied to identify students with insufficient study time and design personalized study interventions based on academic and lifestyle patterns.

  9. Data from: Assessing predictive performance of supervised machine learning...

    • data.niaid.nih.gov
    • datasetcatalog.nlm.nih.gov
    • +1 more
    zip
    Updated May 23, 2023
    Cite
    Evans Omondi (2023). Assessing predictive performance of supervised machine learning algorithms for a diamond pricing model [Dataset]. http://doi.org/10.5061/dryad.wh70rxwrh
    Explore at:
    Available download formats: zip
    Dataset updated
    May 23, 2023
    Dataset provided by
    Strathmore University
    Authors
    Evans Omondi
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The diamond is 58 times harder than any other mineral in the world, and its elegance as a jewel has long been appreciated. Forecasting diamond prices is challenging due to nonlinearity in important features such as carat, cut, clarity, table, and depth. Against this backdrop, the study conducted a comparative analysis of the performance of multiple supervised machine learning models (regressors and classifiers) in predicting diamond prices. Eight supervised machine learning algorithms were evaluated in this work, including Multiple Linear Regression, Linear Discriminant Analysis, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Support Vector Machines, Boosted Regression and Classification Trees, and Multi-Layer Perceptron. The analysis is based on data preprocessing, exploratory data analysis (EDA), training the aforementioned models, assessing their accuracy, and interpreting their results. Based on the performance metric values and analysis, eXtreme Gradient Boosting was found to be the best-performing algorithm in both classification and regression, with an R² score of 97.45% and an accuracy of 74.28%. As a result, eXtreme Gradient Boosting was recommended as the optimal regressor and classifier for forecasting the price of a diamond specimen.

    Methods

    Kaggle, a data repository with thousands of datasets, was used in the investigation. It is an online community for machine learning practitioners and data scientists, as well as a robust, well-researched, and sufficient resource for analyzing various data sources. On Kaggle, users can search for and publish various datasets. In a web-based data-science environment, they can study datasets and construct models.

  10. Synthetic Dataset for Linear Regression

    • kaggle.com
    zip
    Updated Mar 23, 2025
    Cite
    Stable Space (2025). Synthetic Dataset for Linear Regression [Dataset]. https://www.kaggle.com/datasets/sharmajicoder/synthetic-dataset-for-linear-regression/data
    Explore at:
    Available download formats: zip (9111 bytes)
    Dataset updated
    Mar 23, 2025
    Authors
    Stable Space
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The file contains a dataset with two variables (x and y) intended for linear regression ML models. The x variable is the independent variable, and y is the dependent variable. The two variables have a correlation of 0.9981, so the dataset is well suited to linear models and can be used for testing purposes.
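
    A correlation figure like the one quoted above is typically a Pearson coefficient. The sketch below shows one way it might be computed; the helper name `pearson_r` and the five sample points are made up for illustration and are not taken from the dataset.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # Covariance term divided by the product of the two spreads.
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Made-up sample values, not rows from the actual file.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
r = pearson_r(x, y)
```

    A value close to 1 indicates that a straight line describes the relationship well, which is why such a dataset suits linear models.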

  11. Panel dataset on Brazilian fuel demand

    • data.mendeley.com
    Updated Oct 7, 2024
    Cite
    Sergio Prolo (2024). Panel dataset on Brazilian fuel demand [Dataset]. http://doi.org/10.17632/hzpwbp7j22.1
    Explore at:
    Dataset updated
    Oct 7, 2024
    Authors
    Sergio Prolo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    Summary: Fuel demand is shown to be influenced by fuel prices, people's income, and motorization rates. We explore the effects of electric vehicle rates on gasoline demand using this panel dataset.

    Files: dataset.csv - Panel dimensions are the Brazilian state ( i ) and year ( t ). The other columns are: gasoline sales per capita (ln_Sg_pc), prices of gasoline (ln_Pg) and ethanol (ln_Pe) and their lags, motorization rates of combustion vehicles (ln_Mi_c) and electric vehicles (ln_Mi_e), and GDP per capita (ln_gdp_pc). All variables are in natural logs, since we use them to calculate demand elasticities in a regression model.

    adjacency.csv - The adjacency matrix used in interaction with electric vehicles' motorization rates to calculate spatial effects. At first, it follows a binary adjacency formula: for each pair of states i and j, the cell (i, j) is 0 if the states are not adjacent and 1 if they are. Then, each row is normalized to have sum equal to one.
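
    The two-step construction described for adjacency.csv (binary adjacency, then row normalization) can be sketched as follows. The three-state example matrix and the function name `row_normalize` are illustrative assumptions, not the contents of the actual file; leaving rows without neighbours as all zeros is also an assumption.

```python
import numpy as np

def row_normalize(adjacency):
    """Scale each row of a binary adjacency matrix so that it sums to one."""
    a = np.asarray(adjacency, dtype=float)
    row_sums = a.sum(axis=1, keepdims=True)
    # Assumption: a row with no neighbours stays all-zero (no division by zero).
    return np.divide(a, row_sums, out=np.zeros_like(a), where=row_sums > 0)

# Hypothetical example: state 0 borders states 1 and 2, which do not border each other.
binary = np.array([
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
])
weights = row_normalize(binary)
```

    After normalization, multiplying the matrix by a vector of state-level values gives, for each state, the average of that value across its neighbours.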

    regression.do - Series of Stata commands used to estimate the regression models of our study. dataset.csv must be imported for the commands to work; see the comment section.

    dataset_predictions.xlsx - Based on the estimations from Stata, we use this Excel file to make average predictions by year and by state. By including years beyond the last panel sample, we also forecast the model into the future and evaluate the effects of different policies that influence gasoline prices (taxation) and EV motorization rates (electrification). This file is primarily used to create images, but can be used to further understand how the forecasting scenarios are set up.

    Sources:
    • Fuel prices and sales: ANP (https://www.gov.br/anp/en/access-information/what-is-anp/what-is-anp)
    • State population, GDP and vehicle fleet: IBGE (https://www.ibge.gov.br/en/home-eng.html?lang=en-GB)
    • State EV fleet: Anfavea (https://anfavea.com.br/en/site/anuarios/)

  12. Data from: Learning While Learning: Psychology Case Studies for Teaching...

    • tandf.figshare.com
    bin
    Updated Apr 1, 2025
    Cite
    Ciaran Evans; Alex Reinhart; Erin Cooley; William Cipolli (2025). Learning While Learning: Psychology Case Studies for Teaching Regression [Dataset]. http://doi.org/10.6084/m9.figshare.28127458.v2
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Ciaran Evans; Alex Reinhart; Erin Cooley; William Cipolli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this article, we explore the use of two published datasets for teaching a wide range of students about regression models, with a particular focus on interaction terms. The two datasets come from recent psychology studies on beliefs about poverty and welfare, and about the dynamics of group projects. Both datasets (and their original research papers) are accessible to students, and because of their context, students can learn about data collection, measurement, and the use of statistics when studying complex social topics, while using the data to learn about regression analysis. We have used these data for a range of in-class activities, journal paper discussions, exams, and extended projects, at the undergraduate, master’s, and doctoral levels. Supplementary materials for this article are available online.

  13. Multiple linear regression models.

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Mar 30, 2023
    Cite
    Trouet, Dominique; Ledeganck, Kristien J.; Vermeiren, Eline; Van Hoorenbeeck, Kim; Verhulst, Stijn; Dotremont, Hilde; Brinker, Marieke den; France, Annick; Van Eyck, Annelies; Wouters, Kristien; De Winter, Benedicte Y. (2023). Multiple linear regression models. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001115981
    Explore at:
    Dataset updated
    Mar 30, 2023
    Authors
    Trouet, Dominique; Ledeganck, Kristien J.; Vermeiren, Eline; Van Hoorenbeeck, Kim; Verhulst, Stijn; Dotremont, Hilde; Brinker, Marieke den; France, Annick; Van Eyck, Annelies; Wouters, Kristien; De Winter, Benedicte Y.
    Description

    An increased blood pressure is a known comorbidity of both type 1 diabetes (T1DM) and obesity in children. Increasing evidence suggests a subtle interplay between epidermal growth factor (EGF) and renin along the juxtaglomerular system, regulating the impact of blood pressure on kidney health and the cardiovascular system. In this study, we investigated the relation between urinary EGF, serum renin and blood pressure in children with obesity or T1DM. 147 non-obese children with T1DM and 126 children with obesity, were included. Blood pressure was measured and mean arterial pressure (MAP) and the pulse pressure (PP) were calculated. Serum renin and urinary EGF levels were determined with a commercial ELISA kit. Partial Spearman rank correlation coefficients and multiple linear regression models were used to study the association between renin, the urinary EGF/urinary creatinine ratio and blood pressure parameters. The urinary EGF/urinary creatinine ratio is correlated with the SBP and the MAP in boys with obesity as well as in boys with T1DM. Multiple regression analysis showed that sex and pulse pressure in male subjects were found to be independently associated with renin. Sex, the presence of diabetes, age, the glomerular filtration rate and both pulse pressure and mean arterial pressure in male subjects were independently associated with urinary EGF/urinary creatinine. In conclusion, in boys with either obesity or diabetes, pulse pressure and mean arterial pressure are negatively associated with the functional integrity of the nephron, which is reflected by a decreased expression of urinary EGF.

  14. Data from: INTRODUCTION TO NEURAL NETWORKS FOR NON-LINEAR REGRESSIONS:...

    • scielo.figshare.com
    tiff
    Updated May 31, 2023
    Cite
    Eduardo D. Vicentini; Antonio G. Sampaio de Oliveira-Filho (2023). INTRODUCTION TO NEURAL NETWORKS FOR NON-LINEAR REGRESSIONS: POTENTIAL ENERGY SURFACE FITTING [Dataset]. http://doi.org/10.6084/m9.figshare.19945228.v1
    Explore at:
    Available download formats: tiff
    Dataset updated
    May 31, 2023
    Dataset provided by
    SciELO journals
    Authors
    Eduardo D. Vicentini; Antonio G. Sampaio de Oliveira-Filho
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The present work demonstrates how neural networks are used to do non-linear regressions. The technique is presented in a simple and didactic manner and applied to fit potential energy surfaces for the FeC molecule and for the reaction H + H2. It shows how to do the fitting for single- and multi-variable systems, providing examples and code that can be easily extended to many problems in chemistry. All the code used to perform the fitting and generate the results is available as a Jupyter Notebook, which can be used without either installation or configuration.

  15. Linear Regression (Excel) and Cellular Respiration for Biology, Chemistry...

    • qubeshub.org
    Updated Jan 11, 2022
    Cite
    Irene Corriette; Beatriz Gonzalez; Daniela Kitanska; Henriette Mozsolits; Sheela Vemu (2022). Linear Regression (Excel) and Cellular Respiration for Biology, Chemistry and Mathematics [Dataset]. http://doi.org/10.25334/5PX5-H796
    Explore at:
    Dataset updated
    Jan 11, 2022
    Dataset provided by
    QUBES
    Authors
    Irene Corriette; Beatriz Gonzalez; Daniela Kitanska; Henriette Mozsolits; Sheela Vemu
    Description

    Students typically find linear regression analysis of data sets in a biology classroom challenging. These activities could be used in a Biology, Chemistry, Mathematics, or Statistics course. The collection provides student activity files with Excel instructions and Instructor Activity files with Excel instructions and solutions to problems.

    Students will be able to perform linear regression analysis, find the correlation coefficient, create a scatter plot, and find the r-square using MS Excel 365. Students will be able to interpret data sets, describe the relationship between biological variables, and predict the value of an output variable based on the input of a predictor variable.

  16. Linear regression analysis for commonly used function domains.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 23, 2022
    Cite
    Mastbergen, Simon C.; van Helvoort, Eefje M.; Hodgins, D.; Blanco, Fransisco J.; Kloppenburg, M.; Haugen, Ida K.; Lafeber, Floris P. J. G.; Welsing, Paco M. J.; Berenbaum, F.; Marijnissen, Anne C. A. (2022). Linear regression analysis for commonly used function domains. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000281937
    Explore at:
    Dataset updated
    Mar 23, 2022
    Authors
    Mastbergen, Simon C.; van Helvoort, Eefje M.; Hodgins, D.; Blanco, Fransisco J.; Kloppenburg, M.; Haugen, Ida K.; Lafeber, Floris P. J. G.; Welsing, Paco M. J.; Berenbaum, F.; Marijnissen, Anne C. A.
    Description

    Linear regression analysis for commonly used function domains.

  17. Ideals Dataset

    • zenodo.org
    zip
    Updated Sep 19, 2022
    Cite
    Ana Cruz; Ana Cruz (2022). Ideals Dataset [Dataset]. http://doi.org/10.5281/zenodo.6939734
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 19, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ana Cruz; Ana Cruz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Generated datasets of multiple ideal distributions used in research on linear regression and machine learning algorithms for the thesis 'Predicting the performance of Buchberger's algorithm'. concatenated and concatenated_stats are the datasets with the ideals' exponents and corresponding polynomial additions; these datasets were created specifically for RNNs. features_dataset contains statistics regarding the ideals, and polynomial_additions_dataset contains information regarding their polynomial additions, created for multiple linear regression models and simple neural networks.

  18. Study Hours ,Student Scores for Linear Regression

    • kaggle.com
    Updated Sep 23, 2024
    Cite
    douaa bennoune (2024). Study Hours ,Student Scores for Linear Regression [Dataset]. https://www.kaggle.com/datasets/douaabennoune/study-hours-student-scores-for-linear-regression
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 23, 2024
    Dataset provided by
    Kaggle
    Authors
    douaa bennoune
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains a collection of 100 randomly generated data points representing the relationship between the number of hours a student spends studying and their corresponding performance, measured as a score. The data has been generated to simulate a real-world scenario where study hours are assumed to influence academic outcomes, making it an excellent resource for linear regression analysis and other machine learning tasks.

    Each row in the dataset consists of:

    • Hours: The number of hours a student dedicates to studying, ranging between 0 and 10 hours.
    • Scores: The student's performance score, represented as a percentage, ranging from 0 to 100.

    Use Cases: This dataset is particularly useful for:

    • Linear Regression: Exploring how study hours influence student performance, fitting a regression line to predict scores based on study time.
    • Data Science & Machine Learning: Practicing regression analysis, training models, and applying other predictive algorithms.
    • Educational Research: Simulating data-driven insights into student behavior and performance metrics.

    Features:

    • 100 rows of data.
    • Continuous numerical variables suitable for regression tasks.
    • Generated for educational purposes, making it ideal for students, teachers, and beginners in machine learning and data science.

    Potential Applications:

    • Build a linear regression model to predict student scores.
    • Investigate the correlation between study time and performance.
    • Apply data visualization techniques to better understand the data.
    • Use the dataset to experiment with model evaluation metrics like Mean Squared Error (MSE) and R-squared.
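
    The evaluation metrics named above (MSE and R-squared) can be sketched as below. The five (hours, scores) pairs are invented for illustration and are not rows from the dataset; the function names `mse` and `r_squared` are likewise just illustrative helpers.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average squared gap between truth and prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Invented sample values standing in for the Hours and Scores columns.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([12.0, 25.0, 31.0, 44.0, 50.0])

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(hours, scores, deg=1)
predicted = slope * hours + intercept

fit_mse = mse(scores, predicted)
fit_r2 = r_squared(scores, predicted)
```

    For a straight-line fit with an intercept, R-squared equals the square of the Pearson correlation between the two variables, which gives a quick sanity check on the result.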

  19. Data from: MIST: l0 Sparse Linear Regression with Momentum

    • service.tib.eu
    Updated Jan 3, 2025
    Cite
    (2025). MIST: l0 Sparse Linear Regression with Momentum [Dataset]. https://service.tib.eu/ldmservice/dataset/mist--l0-sparse-linear-regression-with-momentum
    Explore at:
    Dataset updated
    Jan 3, 2025
    Description

    The dataset used in the paper is a large linear system of equations with a sparse solution. The authors used this dataset to test their Momentumized Iterative Shrinkage Thresholding (MIST) algorithm.

  20. Data from: Data for Regression Models to Estimate Water Use in Providence,...

    • s.cnmilf.com
    • data.usgs.gov
    • +2 more
    Updated Oct 1, 2025
    Cite
    U.S. Geological Survey (2025). Data for Regression Models to Estimate Water Use in Providence, Rhode Island, 2014-2021 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/data-for-regression-models-to-estimate-water-use-in-providence-rhode-island-2014-2021
    Explore at:
    Dataset updated
    Oct 1, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Rhode Island, Providence
    Description

    This data release contains input data and programs (scripts) used to estimate monthly water demand for retail customers of Providence Water, located in Providence, Rhode Island. Explanatory data and model outputs are from July 2014 through June 2021. Models of per capita (for single-family residential customers) or per connection (for multi-family residential, commercial, and industrial customers) water use were developed using multiple linear regression. The dependent variables, provided by Providence Water, are the monthly number of connections and gallons of water delivered to single- and multi-family residential, commercial, and industrial connections. Potential independent variables (from online sources) are climate variables (temperature and precipitation), economic statistics, and a drought statistic. Not all independent variables were used in all of the models. The data are provided in data tables and model files. The data table RIWaterUseVariableExplanation.csv describes the explanatory variables and their data sources. The data table ProvModelInputData.csv provides the monthly water-use data that are the dependent variables and the monthly climatic and economic data that are the independent variables. The data table DroughtInputData.csv provides the weekly U.S. drought monitor index values that were processed to formulate a potential independent variable. The R script model_water_use.R runs the models that predict water use. The other two R scripts (load_preprocess_input_data.R and model_water_use_functions.R) are not run explicitly but are called from the primary script model_water_use.R. Regression equations produced by the models can be used to predict water demand throughout Rhode Island.

