100+ datasets found

U
An example data set for exploration of Multiple Linear Regression
data.usgs.gov
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
+1more
Updated Feb 24, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
William Farmer (2024). An example data set for exploration of Multiple Linear Regression [Dataset]. http://doi.org/10.5066/P9T5ZEXV
Explore at:
Unique identifier
https://doi.org/10.5066/P9T5ZEXV
Dataset updated
Feb 24, 2024
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Authors
William Farmer
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Time period covered
1956 - 2016
Description
This data set contains example data for exploration of the theory of regression based regionalization. The 90th percentile of annual maximum streamflow is provided as an example response variable for 293 streamgages in the conterminous United States. Several explanatory variables are drawn from the GAGES-II data base in order to demonstrate how multiple linear regression is applied. Example scripts demonstrate how to collect the original streamflow data provided and how to recreate the figures from the associated Techniques and Methods chapter.
d
Data for multiple linear regression models for predicting microcystin...
catalog.data.gov
data.usgs.gov
+3more
Updated Jul 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). Data for multiple linear regression models for predicting microcystin concentration action-level exceedances in selected lakes in Ohio [Dataset]. https://catalog.data.gov/dataset/data-for-multiple-linear-regression-models-for-predicting-microcystin-concentration-action
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Area covered
Ohio
Description
Site-specific multiple linear regression models were developed for eight sites in Ohio—six in the Western Lake Erie Basin and two in northeast Ohio on inland reservoirs--to quickly predict action-level exceedances for a cyanotoxin, microcystin, in recreational and drinking waters used by the public. Real-time models include easily- or continuously-measured factors that do not require that a sample be collected. Real-time models are presented in two categories: (1) six models with continuous monitor data, and (2) three models with on-site measurements. Real-time models commonly included variables such as phycocyanin, pH, specific conductance, and streamflow or gage height. Many of the real-time factors were averages over time periods antecedent to the time the microcystin sample was collected, including water-quality data compiled from continuous monitors. Comprehensive models use a combination of discrete sample-based measurements and real-time factors. Comprehensive models were useful at some sites with lagged variables (< 2 weeks) for cyanobacterial toxin genes, dissolved nutrients, and (or) N to P ratios. Comprehensive models are presented in three categories: (1) three models with continuous monitor data and lagged comprehensive variables, (2) five models with no continuous monitor data and lagged comprehensive variables, and (3) one model with continuous monitor data and same-day comprehensive variables. Funding for this work was provided by the Ohio Water Development Authority and the U.S. Geological Survey Cooperative Water Program.
Linear Regression example Dataset
kaggle.com
Updated Dec 20, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Çağrı Karadeniz (2021). Linear Regression example Dataset [Dataset]. https://www.kaggle.com/datasets/arkaradeniz/linear-regression-example-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 20, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Çağrı Karadeniz
Description
Dataset

This dataset was created by Çağrı Karadeniz

Contents
m
Panel dataset on Brazilian fuel demand
data.mendeley.com
Updated Oct 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sergio Prolo (2024). Panel dataset on Brazilian fuel demand [Dataset]. http://doi.org/10.17632/hzpwbp7j22.1
Explore at:
Unique identifier
https://doi.org/10.17632/hzpwbp7j22.1
Dataset updated
Oct 7, 2024
Authors
Sergio Prolo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary : Fuel demand is shown to be influenced by fuel prices, people's income and motorization rates. We explore the effects of electric vehicle's rates in gasoline demand using this panel dataset.

Files : dataset.csv - Panel dimensions are the Brazilian state ( i ) and year ( t ). The other columns are: gasoline sales per capita (ln_Sg_pc), prices of gasoline (ln_Pg) and ethanol (ln_Pe) and their lags, motorization rates of combustion vehicles (ln_Mi_c) and electric vehicles (ln_Mi_e) and GDP per capita (ln_gdp_pc). All variables are all under the natural log function, since we use this to calculate demand elasticities in a regression model.

adjacency.csv - The adjacency matrix used in interaction with electric vehicles' motorization rates to calculate spatial effects. At first, it follows a binary adjacency formula: for each pair of states i and j, the cell (i, j) is 0 if the states are not adjacent and 1 if they are. Then, each row is normalized to have sum equal to one.

regression.do - Series of Stata commands used to estimate the regression models of our study. dataset.csv must be imported to work, see comment section.

dataset_predictions.xlsx - Based on the estimations from Stata, we use this excel file to make average predictions by year and by state. Also, by including years beyond the last panel sample, we also forecast the model into the future and evaluate the effects of different policies that influence gasoline prices (taxation) and EV motorization rates (electrification). This file is primarily used to create images, but can be used to further understand how the forecasting scenarios are set up.

Sources: Fuel prices and sales: ANP (https://www.gov.br/anp/en/access-information/what-is-anp/what-is-anp) State population, GDP and vehicle fleet: IBGE (https://www.ibge.gov.br/en/home-eng.html?lang=en-GB) State EV fleet: Anfavea (https://anfavea.com.br/en/site/anuarios/)
Stonybrook_AMS578_Multiple_Regression_Dataset
kaggle.com
Updated Dec 20, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Joseph Chan (2020). Stonybrook_AMS578_Multiple_Regression_Dataset [Dataset]. https://www.kaggle.com/josephchan524/stonybrook-ams578-multiple-regression-dataset/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 20, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Joseph Chan
Description
Context

This is a dataset is a Multiple Regression Project from an Applied Math Science Graduate Level Course at Stony Brook (AMS578 Spring 2020).

The class blackboard has a pdf file of a paper by Caspi et al. that reports a finding of a gene-environment interaction. This paper used multiple regression techniques as the methodology for its findings. You should read it for background, as it is the genesis of the models that you will be given. The data that you are analyzing is synthetic. That is, the TA used a model to generate the data. Your task is to find the model that the TA used for your data. For example, one possible model is

The class blackboard also contains a paper by Risch et al. that uses a larger collection of data to assess the findings in Caspi et al. These researchers confirmed that Caspi et al. calculated their results correctly but that no other dataset had the relation reported in Caspi et al. That is, Caspi et al. seem to have reported a false positive (Type I error). The class blackboard contains a recent paper about the genetics of mental illness and a technical appendix giving the specifics. Together these papers are an example of the response of the research community to studying the genetics of mental illness, which is a notoriously difficult research area.

Content

One file contains the patient identifier and the dependent variable value. The second file contains the patient identifier and values of six environment variables called E1 to E6. The third file contains the patient identifier and the twenty independent indicator variables called G1 to G20. The records may not be in correct order in each file, and cases may be missing in one or more of the files. You can process the data with VMLOOKUP or other data merging software.
u
Data from: Dataset of the paper “Variable selection for linear regression in...
investigacion.ubu.es
Updated 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pacheco Bonrostro, Joaquín; Casado Yusta, Silvia; Pacheco Bonrostro, Joaquín; Casado Yusta, Silvia (2020). Dataset of the paper “Variable selection for linear regression in large databases: exact methods” Applied Intelligence, 51(6), 3736-3756 [Dataset]. https://investigacion.ubu.es/documentos/682afba74c44bf76b28811e1
Explore at:
Dataset updated
2020
Authors
Pacheco Bonrostro, Joaquín; Casado Yusta, Silvia; Pacheco Bonrostro, Joaquín; Casado Yusta, Silvia
Description
The variable selection problem in the context of Linear Regression for large databases is analysed. The problem consists in selecting a small subset of independent variables that can perform the prediction task optimally. This problem has a wide range of applications. One important type of application is the design of composite indicators in various areas (sociology and economics, for example). Other important applications of variable selection in linear regression can be found in fields such as chemometrics, genetics, and climate prediction, among many others. For this problem, we propose a Branch & Bound method. This is an exact method and therefore guarantees optimal solutions. We also provide strategies that enable this method to be applied in very large databases (with hundreds of thousands of cases) in a moderate computation time. A series of computational experiments shows that our method performs well compared with well-known methods in the literature and with commercial software.
d
Data from: Mining Distance-Based Outliers in Near Linear Time
catalog.data.gov
datasets.ai
+1more
Updated Apr 11, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Mining Distance-Based Outliers in Near Linear Time [Dataset]. https://catalog.data.gov/dataset/mining-distance-based-outliers-in-near-linear-time
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
toy_lr
kaggle.com
Updated Oct 9, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David Dragon (2022). toy_lr [Dataset]. https://www.kaggle.com/datasets/daviddragon/toy-lr
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 9, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
David Dragon
Description
A toy dataset for running linear regression! The dataset consists of inputs and targets. Inputs are of shape (1000, 10), where there are 1000 examples and 10 input features. Targets are of shape (1000,), one target per example. Submit learned weights and biases at https://forms.gle/R4gRgrSYcMTPXZUy9 to get a score! Template notebook to get started: https://www.kaggle.com/code/daviddragon/toy-lr-template/notebook
g
Replication data for: Linear Models with Outliers: Choosing between...
datasearch.gesis.org
dataverse.harvard.edu
+1more
Updated Jan 22, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Harden, Jeffrey; Desmarais, Bruce (2020). Replication data for: Linear Models with Outliers: Choosing between Conditional-Mean and Conditional-Median Methods [Dataset]. https://datasearch.gesis.org/dataset/httpsdataverse.unc.eduoai--hdl1902.2911608
Explore at:
Dataset updated
Jan 22, 2020
Dataset provided by
Odum Institute Dataverse Network
Authors
Harden, Jeffrey; Desmarais, Bruce
Description
State politics researchers commonly employ ordinary least squares (OLS) regression or one of its variants to test linear hypotheses. However, OLS is easily influenced by outliers and thus can produce misleading results when the error term distribution has heavy tails. Here we demonstrate that median regression (MR), an alternative to OLS that conditions the median of the dependent variable (rather than the mean) on the independent variables, can be a solution to this problem. Then we propose and validate a hypothesis test that applied researchers can use to select between OLS and MR in a given sample of data. Finally, we present two examples from state politics research in which (1) the test selects MR over OLS and (2) differences in results between the two methods could lead to different substantive inferences. We conclude that MR and the test we propose can improve linear models in state politics research.
SPSS Data Set S1 Logistic Regression Model Data
figshare.com
bin
Updated Jan 19, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Michelle Klailova; Phyllis Lee (2016). SPSS Data Set S1 Logistic Regression Model Data [Dataset]. http://doi.org/10.6084/m9.figshare.1051748.v2
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.1051748.v2
Dataset updated
Jan 19, 2016
Dataset provided by
Figsharehttp://figshare.com/
Authors
Michelle Klailova; Phyllis Lee
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data set from PLOS ONE Article Published Entitled: Western Lowland Gorillas Signal Selectively Using Odor
Dataset for: A comparison of approaches for simultaneous inference of fixed...
wiley.figshare.com
txt
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Signe Marie Jensen; Christian Ritz (2023). Dataset for: A comparison of approaches for simultaneous inference of fixed effects for multiple outcomes using linear mixed models [Dataset]. http://doi.org/10.6084/m9.figshare.5954836.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5954836.v1
Dataset updated
Jun 2, 2023
Dataset provided by
Wileyhttps://www.wiley.com/
Authors
Signe Marie Jensen; Christian Ritz
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Longitudinal studies with multiple outcomes often pose challenges for the statistical analysis. A joint model including all outcomes has the advantage of incorporating the simultaneous behavior but is often difficult to fit due to computational challenges. We consider two alternative approaches in order to quantify and assess the loss in efficiency as compared to joint modelling when evaluating fixed effects. The first approach is pairwise fitting of pseudo-likelihood functions for pairs of outcomes. The second approach recovers correlations between parameter estimates across multiple marginal linear mixed models. The methods are evaluated both in terms of a data example from a study on the effects of milk protein on health in young adolescents and in an extensive simulation study. We find that the two alternatives give similar results in settings where an exchangeability condition is met, but otherwise pairwise fitting shows a larger loss in efficiency than the marginal models approach. Using an alternative to the joint modelling strategy will lead to some but not necessarily a large loss of efficiency for small sample sizes.
Linear Algebra Dataset for Students
kaggle.com
Updated Apr 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aswina Vinod Kumar A (2025). Linear Algebra Dataset for Students [Dataset]. https://www.kaggle.com/datasets/aswinavinod/linear-algebra-dataset-for-students/versions/1
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 9, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Aswina Vinod Kumar A
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains a small set of linear algebra concepts, definitions, and example solutions that I’ve compiled from my personal learning. Topics include determinants, trace, eigenvalues, eigenvectors, and basic matrix properties. Each concept is explained with relevant formulas or proof.
H
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale...
dataverse.harvard.edu
Updated Jul 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Georgios Boumis; Brad Peter (2024). Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends [Dataset]. http://doi.org/10.7910/DVN/ZZDYM9
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/ZZDYM9
Dataset updated
Jul 8, 2024
Dataset provided by
Harvard Dataverse
Authors
Georgios Boumis; Brad Peter
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds with the temporal range 2010–2019 and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart—that cell represents the slope for the temporal range 2004–2014. This publication entry also includes an excel template that produces the same visualizations without a need to interact with any code, though minor modifications will need to be made to accommodate year ranges other than what is provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624 TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006). TSMx R script: # import packages library(dplyr) library(readr) library(ggplot2) library(tibble) library(tidyr) library(forcats) library(Kendall) options(warn = -1) # disable warnings # read data (.csv file with "Year" and "Value" columns) data <- read_csv("EVI.csv") # prepare row/column names for output matrices years <- data %>% pull("Year") r.names <- years[-length(years)] c.names <- years[-1] years <- years[-length(years)] # initialize output matrices sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years)) pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years)) slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years)) # function to return remaining years given a start year getRemain <- function(start.year) { years <- data %>% pull("Year") start.ind <- which(data[["Year"]] == start.year) + 1 remain <- years[start.ind:length(years)] return (remain) } # function to subset data for a start/end year combination splitData <- function(end.year, start.year) { keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year) batch <- data[keep,] return(batch) } # function to fit linear regression and return slope direction fitReg <- function(batch) { trend <- lm(Value ~ Year, data = batch) slope <- coefficients(trend)[[2]] return(sign(slope)) } # function to fit linear regression and return slope magnitude fitRegv2 <- function(batch) { trend <- lm(Value ~ Year, data = batch) slope <- coefficients(trend)[[2]] return(slope) } # function to implement Mann-Kendall (MK) trend test and return significance # the test is implemented only for n>=8 getMann <- function(batch) { if (nrow(batch) >= 8) { mk <- MannKendall(batch[['Value']]) pval <- mk[['sl']] } else { pval <- NA } return(pval) } # function to return slope direction for all combinations given a start year getSign <- function(start.year) { remaining <- getRemain(start.year) combs <- lapply(remaining, splitData, start.year = start.year) signs <- lapply(combs, fitReg) return(signs) } # function to return MK significance for all combinations given a start year getPval <- function(start.year) { remaining <- getRemain(start.year) combs <- lapply(remaining, splitData, start.year = start.year) pvals <- lapply(combs, getMann) return(pvals) } # function to return slope magnitude for all combinations given a start year getMagn <- function(start.year) { remaining <- getRemain(start.year) combs <- lapply(remaining, splitData, start.year = start.year) magns <- lapply(combs, fitRegv2) return(magns) } # retrieve slope direction, MK significance, and slope magnitude signs <- lapply(years, getSign) pvals <- lapply(years, getPval) magns <- lapply(years, getMagn) # fill-in output matrices dimension <- nrow(sign.matrix) for (i in 1:dimension) { sign.matrix[i, i:dimension] <- unlist(signs[i]) pval.matrix[i, i:dimension] <- unlist(pvals[i]) slope.matrix[i, i:dimension] <- unlist(magns[i]) } sign.matrix <-...
f
Data from: Solving linear regression without skewness of the residuals’...
tandf.figshare.com
txt
Updated Jun 5, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Martin Ricker (2023). Solving linear regression without skewness of the residuals’ distribution [Dataset]. http://doi.org/10.6084/m9.figshare.8152901.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.8152901.v1
Dataset updated
Jun 5, 2023
Dataset provided by
Taylor & Francis
Authors
Martin Ricker
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Linear ordinary least squares (OLS) regression assumes an unskewed distribution of the residuals for correct inference and prediction. A proof is given that for Manly’s exponential transformation of the dependent variable, there is always at least one solution for λ, such that the skewness of the standardized residuals’ distribution is zero. A computer code in Mathematica, together with an illustrative example, are provided. Generalized linear models are discussed briefly in comparison.
e
Introduction to spatial statistics - Dataset - B2FIND
b2find.eudat.eu
Updated Nov 24, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Introduction to spatial statistics - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/63f3ba40-a121-5fd4-bc46-ae5c7797c215
Explore at:
Dataset updated
Nov 24, 2024
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset constitutes an introduction to plotting and mapping and the essential concepts of spatial data management and modeling. And data ready for several examples of regression and classification algorithms (Multiple Linear Regression, Generalized Linear Models, CART and Random Forest), also exploring classic interpolation methods (Inverse Distance Weighting and Kriging). R, 4.2.1 This research has led to the development of teaching materials and the improvement of teaching practices.
f
DataSheet1_Repeated Measures Correlation.pdf
frontiersin.figshare.com
pdf
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jonathan Z. Bakdash; Laura R. Marusich (2023). DataSheet1_Repeated Measures Correlation.pdf [Dataset]. http://doi.org/10.3389/fpsyg.2017.00456.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fpsyg.2017.00456.s001
Dataset updated
May 30, 2023
Dataset provided by
Frontiers
Authors
Jonathan Z. Bakdash; Laura R. Marusich
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Repeated measures correlation (rmcorr) is a statistical technique for determining the common within-individual association for paired measures assessed on two or more occasions for multiple individuals. Simple regression/correlation is often applied to non-independent observations or aggregated data; this may produce biased, specious results due to violation of independence and/or differing patterns between-participants versus within-participants. Unlike simple regression/correlation, rmcorr does not violate the assumption of independence of observations. Also, rmcorr tends to have much greater statistical power because neither averaging nor aggregation is necessary for an intra-individual research question. Rmcorr estimates the common regression slope, the association shared among individuals. To make rmcorr accessible, we provide background information for its assumptions and equations, visualization, power, and tradeoffs with rmcorr compared to multilevel modeling. We introduce the R package (rmcorr) and demonstrate its use for inferential statistics and visualization with two example datasets. The examples are used to illustrate research questions at different levels of analysis, intra-individual, and inter-individual. Rmcorr is well-suited for research questions regarding the common linear association in paired repeated measures data. All results are fully reproducible.
e
Simple download service (Atom) of the dataset: Linear entity at the origin...
data.europa.eu
unknown
Updated Jan 26, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). Simple download service (Atom) of the dataset: Linear entity at the origin of the risk of Loison-sub-Lens NPP [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-af5937d8-dd86-424a-a3c8-541b84ba8b68
Explore at:
unknownAvailable download formats
Dataset updated
Jan 26, 2022
Description
The origin of the risk characterises the real-world entity which, through its presence, represents a potential risk. This origin may be characterised by a name and, in some cases, a geographical object locating the actual entity causing the risk. The location of the entity and the knowledge of the hazardous phenomenon are used to define the risk pools, the risk-exposed areas that underpin the RPP.For NPPs, this entity may, for example, correspond to a river, a geologically unstable area.
e
Simple download service (Atom) of the dataset: Linear coverings of the POS...
data.europa.eu
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Simple download service (Atom) of the dataset: Linear coverings of the POS (doc. of 07.04.2000) of the municipality of Ménétréol-sous-Sancerre [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-ef8a5ed6-50a0-4b71-b7f8-71cf4d44783c
Explore at:
inspire download serviceAvailable download formats
Description
The cladding elements are entries in relation to a regulatory provision (way width, odds, names of neighbouring municipalities.) or geometrical surface, linear or point indicative elements, dressing the graphic documents of the PLU or the POS. They are necessary for the paper edition of the applicable graphic documents. This may be, for example, a hold of a detail plan, a frame, a cartridge, a reminder for a writing, a draw to draw a rating, an equipment identification label
t
Linear Separation via Optimism - Dataset - LDM
service.tib.eu
Updated Dec 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Linear Separation via Optimism - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/linear-separation-via-optimism
Explore at:
Dataset updated
Dec 16, 2024
Description
The dataset is a linearly separable dataset with margin γ > 0, where each example pair is described by a vector x(i) ∈ Rd and a label y(i) ∈ {±1}.
e
Simple download service (Atom) of the dataset: Linear entity at the origin...
data.europa.eu
Updated Apr 1, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2019). Simple download service (Atom) of the dataset: Linear entity at the origin of the risk of Mazingarbe PPRN [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-dd4fe46a-c6c0-47ac-8c1c-c33ca06f1db5/embed
Explore at:
inspire download serviceAvailable download formats
Dataset updated
Apr 1, 2019
Description
The origin of the risk characterises the real-world entity which, through its presence, represents a potential risk. This origin may be characterised by a name and, in some cases, a geographical object locating the actual entity causing the risk. The location of the entity and the knowledge of the hazardous phenomenon are used to define the risk pools, the risk-exposed areas that underpin the RPP.For NPPs, this entity may, for example, correspond to a river, a geologically unstable area.

Facebook

Twitter

Click to copy link

Link copied

Cite

William Farmer (2024). An example data set for exploration of Multiple Linear Regression [Dataset]. http://doi.org/10.5066/P9T5ZEXV

An example data set for exploration of Multiple Linear Regression

Explore at:

Unique identifier

https://doi.org/10.5066/P9T5ZEXV

Dataset updated

Feb 24, 2024

Dataset provided by

United States Geological Surveyhttp://www.usgs.gov/

Authors

William Farmer

License

U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically

Time period covered

1956 - 2016

Description

This data set contains example data for exploration of the theory of regression based regionalization. The 90th percentile of annual maximum streamflow is provided as an example response variable for 293 streamgages in the conterminous United States. Several explanatory variables are drawn from the GAGES-II data base in order to demonstrate how multiple linear regression is applied. Example scripts demonstrate how to collect the original streamflow data provided and how to recreate the figures from the associated Techniques and Methods chapter.

Clear search

Close search

Google apps

Main menu

An example data set for exploration of Multiple Linear Regression

Data for multiple linear regression models for predicting microcystin...

Linear Regression example Dataset

Dataset

Contents

Panel dataset on Brazilian fuel demand

Stonybrook_AMS578_Multiple_Regression_Dataset

Context

Content

Data from: Dataset of the paper “Variable selection for linear regression in...

Data from: Mining Distance-Based Outliers in Near Linear Time

toy_lr

Replication data for: Linear Models with Outliers: Choosing between...

SPSS Data Set S1 Logistic Regression Model Data

Dataset for: A comparison of approaches for simultaneous inference of fixed...

Linear Algebra Dataset for Students

Time-Series Matrix (TSMx): A visualization tool for plotting multiscale...

Data from: Solving linear regression without skewness of the residuals’...

Introduction to spatial statistics - Dataset - B2FIND

DataSheet1_Repeated Measures Correlation.pdf

Simple download service (Atom) of the dataset: Linear entity at the origin...

Simple download service (Atom) of the dataset: Linear coverings of the POS...

Linear Separation via Optimism - Dataset - LDM

Simple download service (Atom) of the dataset: Linear entity at the origin...

An example data set for exploration of Multiple Linear Regression