This data set contains example data for exploring the theory of regression-based regionalization. The 90th percentile of annual maximum streamflow is provided as an example response variable for 293 streamgages in the conterminous United States. Several explanatory variables are drawn from the GAGES-II database to demonstrate how multiple linear regression is applied. Example scripts demonstrate how to retrieve the original streamflow data and how to recreate the figures in the associated Techniques and Methods chapter.
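The response variable above is a statistic of a daily streamflow record. The following Python sketch computes annual maxima by water year and then their 90th percentile. The synthetic flows, the water-year convention, and NumPy's default percentile interpolation are assumptions for illustration; the data set's own scripts may compute the statistic differently.

```python
import numpy as np
import pandas as pd

# Hypothetical daily mean streamflow (ft^3/s) for a single streamgage; the
# data set's own scripts retrieve the real records.
rng = np.random.default_rng(0)
dates = pd.date_range("1990-10-01", "2020-09-30", freq="D")
flow = pd.Series(rng.lognormal(mean=5.0, sigma=1.0, size=len(dates)), index=dates)

# Assign each day to a water year (Oct 1 to Sep 30, labeled by the ending year).
water_year = flow.index.year + (flow.index.month >= 10).astype(int)

# Annual maximum flow for each water year.
annual_max = flow.groupby(water_year).max()

# 90th percentile of the annual maxima, using NumPy's default linear interpolation.
q90 = np.percentile(annual_max.to_numpy(), 90)
print(f"{len(annual_max)} water years, 90th percentile = {q90:.1f} ft^3/s")
```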
This dataset contains data on customers who buy clothes online. The store offers in-store style and clothing advice sessions: customers come into the store, meet with a personal stylist, and then go home and order the clothes they want through either the mobile app or the website.
The company is trying to decide whether to focus their efforts on their mobile app experience or their website.
Site-specific multiple linear regression models were developed for eight sites in Ohio (six in the Western Lake Erie Basin and two on inland reservoirs in northeast Ohio) to quickly predict action-level exceedances of the cyanotoxin microcystin in recreational and drinking waters used by the public. Real-time models include easily or continuously measured factors that do not require collecting a sample. Real-time models are presented in two categories: (1) six models with continuous monitor data, and (2) three models with on-site measurements. Real-time models commonly included variables such as phycocyanin, pH, specific conductance, and streamflow or gage height. Many of the real-time factors were averages over time periods antecedent to the time the microcystin sample was collected, including water-quality data compiled from continuous monitors. Comprehensive models use a combination of discrete sample-based measurements and real-time factors. Comprehensive models were useful at some sites with lagged variables (less than 2 weeks) for cyanobacterial toxin genes, dissolved nutrients, and (or) nitrogen-to-phosphorus (N:P) ratios. Comprehensive models are presented in three categories: (1) three models with continuous monitor data and lagged comprehensive variables, (2) five models with no continuous monitor data and lagged comprehensive variables, and (3) one model with continuous monitor data and same-day comprehensive variables. Funding for this work was provided by the Ohio Water Development Authority and the U.S. Geological Survey Cooperative Water Program.
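Many real-time factors are averages over a period antecedent to the sample time. A minimal pandas sketch of such an antecedent average, assuming a hypothetical 15-minute phycocyanin record and windows of 24, 48, and 72 hours (the window lengths actually used in the models are not stated here); the helper name `antecedent_mean` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 15-minute phycocyanin record (RFU) from a continuous monitor,
# steadily increasing over 10 days.
idx = pd.date_range("2019-07-01", periods=4 * 24 * 10, freq="15min")
phyco = pd.Series(np.linspace(1.0, 5.0, len(idx)), index=idx)


def antecedent_mean(series: pd.Series, sample_time: pd.Timestamp, hours: float) -> float:
    """Mean of readings in the `hours` before sample_time (the sample time itself excluded)."""
    start = sample_time - pd.Timedelta(hours=hours)
    window = series[(series.index >= start) & (series.index < sample_time)]
    return float(window.mean())


sample_time = pd.Timestamp("2019-07-08 10:00")
for hours in (24, 48, 72):
    print(hours, round(antecedent_mean(phyco, sample_time, hours), 3))
```

Each averaged value would then enter the regression as one explanatory variable for that sample.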
A simple dataset prepared for learning linear regression. It contains the scores of 61 students in two columns: the duration of the exam and the student's score (grade).
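For a two-column dataset like this one, the least-squares line has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. A minimal sketch using made-up durations and scores rather than the 61-student data; `fit_simple_ols` is a hypothetical helper name.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical exam durations (minutes) and scores; not the 61-student data.
duration = [30, 45, 60, 75, 90]
score = [52, 60, 71, 77, 88]
b0, b1 = fit_simple_ols(duration, score)
print(f"score = {b0:.2f} + {b1:.3f} * duration")
```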
This dataset was created by Muhammad Abiodun SULAIMAN
This dataset was created by Abdul Ali Nawrozie
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Introduction to Primate Data Exploration and Linear Modeling with R was created to train undergraduate biology students in data management and statistical analysis using authentic data from the Cayo Santiago rhesus macaques. Module M.4 introduces simple linear regression analysis in R.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Summary: Fuel demand is known to be influenced by fuel prices, income, and motorization rates. Using this panel dataset, we explore the effect of electric vehicle (EV) motorization rates on gasoline demand.
Files: dataset.csv - Panel whose dimensions are the Brazilian state (i) and year (t). The other columns are gasoline sales per capita (ln_Sg_pc), gasoline (ln_Pg) and ethanol (ln_Pe) prices and their lags, motorization rates of combustion vehicles (ln_Mi_c) and electric vehicles (ln_Mi_e), and GDP per capita (ln_gdp_pc). All variables are in natural logarithms because the regression models are used to estimate demand elasticities.
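Because both sides of the model are in logs, each slope coefficient can be read as an elasticity (percent change in per-capita gasoline sales for a 1 % change in the regressor). The sketch below fits a pooled OLS on a synthetic panel that reuses the column names above. It is not the specification in regression.do, which may include fixed effects, lags, and spatial terms, and the coefficients used to generate the data are invented.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for dataset.csv: a hypothetical 27 x 10 state-year panel.
rng = np.random.default_rng(1)
n = 270
df = pd.DataFrame({
    "ln_Pg": rng.normal(1.5, 0.1, n),
    "ln_gdp_pc": rng.normal(10.0, 0.3, n),
})
# Assumed elasticities (-0.4 for price, 0.6 for income), used only to generate data.
df["ln_Sg_pc"] = 2.0 - 0.4 * df["ln_Pg"] + 0.6 * df["ln_gdp_pc"] + rng.normal(0, 0.02, n)

# Pooled OLS in logs: the slopes estimate the price and income elasticities.
X = np.column_stack([np.ones(n), df["ln_Pg"], df["ln_gdp_pc"]])
beta, *_ = np.linalg.lstsq(X, df["ln_Sg_pc"].to_numpy(), rcond=None)
print(dict(zip(["const", "ln_Pg", "ln_gdp_pc"], beta.round(3))))
```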
adjacency.csv - The adjacency matrix that is interacted with electric vehicle motorization rates to capture spatial effects. It starts as a binary matrix: for each pair of states i and j, cell (i, j) is 1 if the states are adjacent and 0 otherwise. Each row is then normalized to sum to one.
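The row normalization described above, and the spatial lag it produces, can be sketched with NumPy on a hypothetical four-state matrix. How regression.do actually interacts the matrix with ln_Mi_e is not shown here; the plain product W @ x is an assumption.

```python
import numpy as np

# Hypothetical 4-state binary adjacency matrix (1 = the states share a border).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
], dtype=float)

# Row-normalize so each row sums to one (a state with no neighbors keeps a zero row).
row_sums = A.sum(axis=1, keepdims=True)
W = np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)

# Spatial lag: for each state, the average of its neighbors' ln_Mi_e values.
ln_Mi_e = np.array([0.1, 0.4, 0.2, 0.3])
spatial_lag = W @ ln_Mi_e
print(W.sum(axis=1), spatial_lag)
```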
regression.do - Stata commands used to estimate the regression models in our study. dataset.csv must be imported first; see the comment section of the file.
dataset_predictions.xlsx - Using the Stata estimates, this Excel file computes average predictions by year and by state. By adding years beyond the end of the panel sample, it also forecasts the model into the future and evaluates the effects of policies that influence gasoline prices (taxation) and EV motorization rates (electrification). The file is primarily used to create figures, but it also shows how the forecasting scenarios are set up.
Sources:
Fuel prices and sales: ANP (https://www.gov.br/anp/en/access-information/what-is-anp/what-is-anp)
State population, GDP and vehicle fleet: IBGE (https://www.ibge.gov.br/en/home-eng.html?lang=en-GB)
State EV fleet: Anfavea (https://anfavea.com.br/en/site/anuarios/)
MIT License: https://opensource.org/licenses/MIT
Synthetic Linear Regression Dataset
This dataset consists of 1000 synthetic data points for training and evaluating simple linear regression models.
Usage
You can load this dataset manually using pandas:

    import pandas as pd

    df = pd.read_csv('synthetic_linear_data.csv')
    print(df.head())
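Since the dataset is meant for training and evaluating simple linear regression models, a held-out evaluation is a natural next step. This sketch generates its own data because the CSV's column names are not documented here; the names x and y, the 80/20 split, and the generating line y = 3 + 2x are assumptions.

```python
import numpy as np

# Synthetic stand-in for synthetic_linear_data.csv (column names assumed).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 1000)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, 1000)

# Shuffle and hold out 20 % of the points for evaluation.
perm = rng.permutation(len(x))
test, train = perm[:200], perm[200:]

# Fit on the training split only; polyfit returns [slope, intercept] for deg=1.
slope, intercept = np.polyfit(x[train], y[train], deg=1)
pred = intercept + slope * x[test]

# Evaluate on the held-out split.
rmse = np.sqrt(np.mean((y[test] - pred) ** 2))
r2 = 1 - np.sum((y[test] - pred) ** 2) / np.sum((y[test] - y[test].mean()) ** 2)
print(f"slope={slope:.3f} intercept={intercept:.3f} RMSE={rmse:.3f} R^2={r2:.3f}")
```

With the real file, x and y would be replaced by the corresponding columns of df.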
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Student Performance (Multiple Linear Regression) Dataset is designed to analyze the relationship between students’ learning habits and academic performance. Each sample includes key indicators related to learning, such as study hours, sleep duration, previous test scores, and the number of practice exams completed.
2) Data Utilization (1) Characteristics of the Student Performance (Multiple Linear Regression) Dataset: • The target variable, Hours Studied, quantitatively represents the amount of time a student has invested in studying. The dataset is structured to allow modeling and inference of learning behaviors based on correlations with other variables.
(2) Applications of the Student Performance (Multiple Linear Regression) Dataset: • AI-Based Study Time Prediction Models: The dataset can be used to develop regression models that estimate a student’s expected study time based on inputs like academic performance, sleep habits, and engagement patterns. • Behavioral Analysis and Personalized Learning Strategies: It can be applied to identify students with insufficient study time and design personalized study interventions based on academic and lifestyle patterns.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Simple Linear Regression - Placement data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mayurdalvi/simple-linear-regression-placement-data on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This package was built to help understand simple linear regression. The contents of this dataset are easy to understand.
Contains two columns:
CGPA: aggregate CGPA received
Package: total package, in lakhs per annum (LPA)
--- Original source retains full ownership of the source dataset ---
This dataset was created by Hrishikesh_Dutta0078
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Introduction: Intimate partner violence (IPV) is a worldwide public health problem and a major abuse of women's human and legal rights. It affects victims physically, sexually, and psychologically and therefore requires complex, multifaceted interventions. Health providers are responsible for providing essential healthcare services to IPV victims. However, there is little detailed information on whether health providers are ready to identify and manage IPV. This study therefore aimed to assess health providers' readiness to manage IPV, and its associated factors, in public health institutions in Hawassa, Ethiopia.
Method: An institution-based cross-sectional study was conducted on a simple random sample of 424 health providers. Data were collected with an anonymous questionnaire using the Physician Readiness to Manage Intimate Partner Violence Survey (PREMIS) tool. Linear regression analysis was used to examine relationships among variables, and the strength of association was assessed using unstandardized β coefficients with 95% confidence intervals (CIs).
Results: The mean perceived readiness score for managing IPV was 26.18 ± 6.69. Older provider age and higher perceived knowledge were positively associated with perceived readiness to manage IPV, whereas lack of IPV training, absence of an IPV management protocol, and provider attitude were negatively associated with it.
Conclusion and recommendation: This study revealed that health providers had limited perceived readiness to manage IPV. Providing training and developing IPV management protocols have an important role in improving providers' readiness to manage IPV.
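The abstract reports unstandardized β coefficients with 95% CIs. For a single predictor, that calculation can be sketched as follows with hypothetical ages and readiness scores. The study's model included several predictors, and this sketch uses a large-sample normal critical value (about 1.96) in place of the t quantile; with n = 424 the two differ only slightly. `slope_with_ci` is a hypothetical helper name.

```python
from statistics import NormalDist

import numpy as np


def slope_with_ci(x, y, level=0.95):
    """Unstandardized OLS slope with a large-sample (normal) confidence interval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    beta = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    alpha = y.mean() - beta * x.mean()
    resid = y - (alpha + beta * x)
    # Standard error of the slope: sqrt(residual variance / Sxx).
    se = np.sqrt(np.sum(resid ** 2) / (n - 2) / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return beta, (beta - z * se, beta + z * se)


# Hypothetical provider ages and readiness scores (n = 424); not the study's data.
rng = np.random.default_rng(7)
age = rng.uniform(22, 60, 424)
readiness = 20 + 0.15 * age + rng.normal(0, 6, 424)
beta, (lo, hi) = slope_with_ci(age, readiness)
print(f"beta = {beta:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```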
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
There are surveys that gather precise information on an outcome of interest, but measure continuous covariates by a discrete number of intervals, in which case the covariates are interval censored. For applications with a second independent dataset precisely measuring the covariates, but not the outcome, this paper introduces a semiparametrically efficient estimator for the coefficients in a linear regression model. The second sample serves to establish point identification. An empirical application investigating the relationship between income and body mass index illustrates the use of the estimator.
U.S. Government Works: https://www.usa.gov/government-works
A series of simple linear regression models were developed for the U.S. Geological Survey (USGS) streamgage at Rice Creek below Highway 8 in Mounds View, Minnesota (USGS station number 05288580). The simple linear regression models were calibrated using streamflow data to estimate suspended sediment (total, fines, and sands) and bedload. Data were collected during water years 2010, 2011, 2014, 2018, and 2019. The estimates from the simple linear regressions were used to calculate loads for water years 2010 through 2019. The calibrated simple linear regression models were used to improve understanding of sediment transport processes and to increase the accuracy of sediment and load estimates for Rice Creek. Two multidimensional flow and sediment-transport models were developed with the International River Interface Cooperative (iRIC) software and the Flow and Sediment Transport with Morphological Evolution of Channels (FaSTMECH) solver. These models were developed with elevation data from terrestrial laser sc ...
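A common form for a sediment regression on streamflow is a log-log rating curve of concentration on flow, followed by a load calculation. The sketch below assumes that form, uses hypothetical samples, and omits any retransformation bias correction; the USGS models for Rice Creek may be specified differently. The factor 0.0027 converts ft³/s × mg/L to tons per day; the helper names are hypothetical.

```python
import numpy as np

# Hypothetical paired samples: streamflow Q (ft^3/s) and suspended-sediment
# concentration C (mg/L).
Q = np.array([20.0, 45.0, 80.0, 150.0, 300.0, 520.0])
C = np.array([8.0, 15.0, 24.0, 40.0, 75.0, 120.0])

# Simple linear regression in log space: log10(C) = a + b * log10(Q).
b, a = np.polyfit(np.log10(Q), np.log10(C), deg=1)


def predict_concentration(q):
    # Back-transformed without a bias correction (e.g. Duan smearing), an assumption here.
    return 10 ** (a + b * np.log10(q))


def daily_load_tons(q):
    # Load (tons/day) = Q (ft^3/s) * C (mg/L) * 0.0027.
    return q * predict_concentration(q) * 0.0027


daily_q = np.array([35.0, 60.0, 210.0])  # hypothetical daily mean flows
loads = daily_load_tons(daily_q)
print(b.round(3), loads.round(2), loads.sum().round(2))
```

Summing daily loads over a water year gives the annual load.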
https://choosealicense.com/licenses/cc0-1.0/
Dataset Card for Testingdatasetcards
Very Simple Multiple Linear Regression Dataset
Dataset Details
Dataset Description
Curated by: HUSSAIN NASIR KHAN (Kaggle)
Shared by: Maria Murphy
Language(s) (NLP): English
License: CC0: Public Domain
Uses
Intended for practice with linear regression.
Dataset Structure
Contains three columns (age, experience, income) and twenty observations.
This dataset was created by Samratsingh Dikkhat
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
A disadvantage of online clothes shopping is the inability to try on clothing to test the fit. A class project is discussed in which students consult with the CEO of an online menswear clothing company to explore how an online clothing customer can be assured of a superior fit: statistical models based on a shopper's height and weight predict the measurements needed to create a suit that feels custom-made. The dataset is best suited to students who have previously been exposed to simple linear regression, and it can be used to explore multiple regression topics such as interaction terms, influential points, transformations, and polynomial predictors. Discussion points are included for more advanced topics such as canonical correlation, clustering, and dimension reduction.
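Interaction terms and polynomial predictors amount to adding product and power columns to the design matrix. A sketch with hypothetical heights, weights, and a made-up chest measurement; the actual response variables in the class dataset are not specified here, and `design_matrix` is a hypothetical helper name.

```python
import numpy as np


def design_matrix(height, weight):
    """Columns: intercept, height, weight, height*weight interaction, height^2."""
    h = np.asarray(height, dtype=float)
    w = np.asarray(weight, dtype=float)
    return np.column_stack([np.ones_like(h), h, w, h * w, h ** 2])


# Hypothetical shoppers (height in inches, weight in pounds) and a hypothetical
# chest measurement (inches) generated from an assumed model with an interaction.
rng = np.random.default_rng(3)
height = rng.uniform(64, 76, 200)
weight = rng.uniform(140, 240, 200)
chest = 10 + 0.1 * height + 0.12 * weight + 0.0004 * height * weight + rng.normal(0, 0.5, 200)

X = design_matrix(height, weight)
coef, *_ = np.linalg.lstsq(X, chest, rcond=None)
fitted = X @ coef
rmse = np.sqrt(np.mean((chest - fitted) ** 2))
print(coef.round(4), rmse.round(3))
```

Centering height and weight before forming the product and squared terms would reduce the collinearity among these columns without changing the fitted values.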
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Simple linear regression results for STS.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Data set from the PLOS ONE article titled "Western Lowland Gorillas Signal Selectively Using Odor".