License: CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
The Ice Cream Selling dataset is a simple dataset well suited to machine-learning beginners who want to practice polynomial regression. It consists of two columns: temperature and the corresponding number of ice cream units sold.
The dataset captures the relationship between temperature and ice cream sales. It serves as a practical example for understanding and implementing polynomial regression, a powerful technique for modeling nonlinear relationships in data.
The dataset is designed to be straightforward and easy to work with, making it ideal for beginners. The simplicity of the data allows beginners to focus on the fundamental concepts and steps involved in polynomial regression without overwhelming complexity.
By using this dataset, beginners can gain hands-on experience in preprocessing the data, splitting it into training and testing sets, selecting an appropriate degree for the polynomial regression model, training the model, and evaluating its performance. They can also explore techniques to address potential challenges such as overfitting.
With this dataset, beginners can practice making predictions of ice cream sales based on temperature inputs and visualize the polynomial regression curve that represents the relationship between temperature and ice cream sales.
Overall, the Ice Cream Selling dataset provides an accessible and practical learning resource for beginners to grasp the concepts and techniques of polynomial regression in the context of analyzing ice cream sales data.
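A minimal sketch of that workflow in R, assuming the file and column names ice_cream_selling.csv, temperature, and units_sold (none of which are specified by the dataset description):

```r
# Fit a polynomial regression on a train/test split and report test MSE.
# File and column names are assumptions, not given by the dataset.
df <- read.csv("ice_cream_selling.csv")
set.seed(42)
idx <- sample(nrow(df), size = floor(0.8 * nrow(df)))
train <- df[idx, ]
test <- df[-idx, ]
fit <- lm(units_sold ~ poly(temperature, 3), data = train)  # degree 3 as an example
pred <- predict(fit, newdata = test)
mean((test$units_sold - pred)^2)  # test-set mean squared error
```

Comparing this test error across degrees (2, 3, 4, ...) is one simple way to spot the overfitting the description mentions.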
License: Open Government Licence 3.0, http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The primary objective of this project was to acquire historical shoreline information for all of the Northern Ireland coastline. Having this detailed understanding of the coast's shoreline position and geometry over annual to decadal time periods is essential in any management of the coast.
The historical shoreline analysis was based on all available Ordnance Survey maps and aerial imagery. The analysis looked at position and geometry over annual to decadal time periods, providing a dynamic picture of how the coastline has changed since the early 1800s.
Once all datasets were collated, the data were interrogated using the ArcGIS package Digital Shoreline Analysis System (DSAS). DSAS is a software package that enables a user to calculate rate-of-change statistics from multiple historical shoreline positions. Rate-of-change statistics were calculated at 25 m intervals and displayed both statistically and spatially, allowing areas of retreat/accretion to be identified along any given stretch of coastline.
The DSAS software produces the following rate-of-change statistics:
- Net Shoreline Movement (NSM): the distance between the oldest and the youngest shorelines.
- Shoreline Change Envelope (SCE): a measure of the total change in shoreline movement, considering all available shoreline positions and reporting their distances without reference to their specific dates.
- End Point Rate (EPR): derived by dividing the distance of shoreline movement by the time elapsed between the oldest and the youngest shoreline positions.
- Linear Regression Rate (LRR): determines a rate-of-change statistic by fitting a least-squares regression to all shorelines at specific transects.
- Weighted Linear Regression Rate (WLR): calculates a weighted linear regression of shoreline change on each transect, taking shoreline uncertainty into account by giving more emphasis to shorelines with a smaller error.
The end product provided by Ulster University is an invaluable tool and digital asset that has helped to visualise shoreline change and assess approximate rates of historical change at any given coastal stretch on the Northern Ireland coast.
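The EPR and LRR statistics are simple to reproduce from the definitions above; a minimal sketch in R for a single transect (positions and dates invented for illustration):

```r
# One transect: shoreline distance from a baseline (m) at each survey date.
pos <- data.frame(year = c(1834, 1905, 1957, 2006),
                  dist = c(120, 112, 103, 95))
# End Point Rate: movement between oldest and youngest shorelines / elapsed time
epr <- (pos$dist[nrow(pos)] - pos$dist[1]) / (pos$year[nrow(pos)] - pos$year[1])
# Linear Regression Rate: least-squares slope through all shoreline positions
lrr <- coef(lm(dist ~ year, data = pos))["year"]
c(EPR = epr, LRR = lrr)  # metres per year; negative values indicate retreat
```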
License: CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a collection of 100 randomly generated data points representing the relationship between the number of hours a student spends studying and their corresponding performance, measured as a score. The data has been generated to simulate a real-world scenario where study hours are assumed to influence academic outcomes, making it an excellent resource for linear regression analysis and other machine learning tasks.
Each row in the dataset consists of:
- Hours: The number of hours a student dedicates to studying, ranging between 0 and 10 hours.
- Scores: The student's performance score, represented as a percentage, ranging from 0 to 100.
Use Cases: This dataset is particularly useful for:
- Linear Regression: Exploring how study hours influence student performance, fitting a regression line to predict scores based on study time.
- Data Science & Machine Learning: Practicing regression analysis, training models, and applying other predictive algorithms.
- Educational Research: Simulating data-driven insights into student behavior and performance metrics.
Features: 100 rows of data. Continuous numerical variables suitable for regression tasks. Generated for educational purposes, making it ideal for students, teachers, and beginners in machine learning and data science.
Potential Applications:
- Build a linear regression model to predict student scores.
- Investigate the correlation between study time and performance.
- Apply data visualization techniques to better understand the data.
- Use the dataset to experiment with model evaluation metrics like Mean Squared Error (MSE) and R-squared.
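A minimal sketch of that regression in R; the column names Hours and Scores come from the description above, while the file name is an assumption:

```r
# Simple linear regression of Scores on Hours, with MSE and R-squared.
df <- read.csv("study_scores.csv")  # file name is an assumption
fit <- lm(Scores ~ Hours, data = df)
pred <- predict(fit)
mean((df$Scores - pred)^2)   # Mean Squared Error
summary(fit)$r.squared       # R-squared
```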
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Goodness-of-fit measure for multiple linear regression model.
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
This dataset contains simulated datasets, empirical data, and R scripts described in the paper: "Li, Q. and Kou, X. (2021) WiBB: An integrated method for quantifying the relative importance of predictive variables. Ecography (DOI: 10.1111/ecog.05651)".
A fundamental goal of scientific research is to identify the underlying variables that govern crucial processes of a system. Here we proposed a new index, WiBB, which integrates the merits of several existing methods: a model-weighting method from information theory (Wi), a standardized regression coefficient method measured by β* (B), and the bootstrap resampling technique (B). We applied WiBB to simulated datasets with known correlation structures, for both linear models (LM) and generalized linear models (GLM), to evaluate its performance. We also applied two other methods, relative sum of weights (SWi) and standardized beta (β*), to evaluate their performance relative to the WiBB method in ranking predictor importance under various scenarios. We further applied it to an empirical dataset of the plant genus Mimulus to select bioclimatic predictors of species' presence across the landscape. Results on the simulated datasets showed that the WiBB method outperformed the β* and SWi methods in scenarios with small and large sample sizes, respectively, and that the bootstrap resampling technique significantly improved the discriminant ability. When testing WiBB on the empirical dataset with GLM, it sensibly identified four important predictors with high credibility out of six candidates in modeling the geographical distributions of 71 Mimulus species. This integrated index has great advantages in evaluating predictor importance, and hence in reducing the dimensionality of data, without losing interpretive power. The simplicity of calculating the new metric, compared with more sophisticated statistical procedures, makes it a handy addition to the statistical toolbox.
Methods: To simulate independent datasets (size = 1000), we adopted Galipaud et al.'s (2014) approach with custom modifications of the data.simulation function, which used the multivariate normal distribution function rmvnorm in the R package mvtnorm (v1.0-5, Genz et al. 2016). Each dataset was simulated with a preset correlation structure between a response variable (y) and four predictors (x1, x2, x3, x4). The first three (genuine) predictors were set to be strongly, moderately, and weakly correlated with the response variable, respectively (denoted by large, medium, and small Pearson correlation coefficients, r), while the correlation between the response and the last (spurious) predictor was set to zero. We simulated datasets with three levels of difference in the correlation coefficients of consecutive predictors, Δr = 0.1, 0.2, and 0.3. These three levels of Δr resulted in three correlation structures between the response and the four predictors: (0.3, 0.2, 0.1, 0.0), (0.6, 0.4, 0.2, 0.0), and (0.8, 0.6, 0.3, 0.0), respectively. We repeated the simulation procedure 200 times for each of the three preset correlation structures (600 datasets in total) for later LM fitting. For GLM fitting, we modified the simulation procedures with additional steps, in which we converted the continuous response into binary data O (e.g., occurrence data having 0 for absence and 1 for presence). We tested the WiBB method, along with two other methods, relative sum of weights (SWi) and standardized beta (β*), to evaluate the ability to correctly rank predictor importance under various scenarios. The empirical dataset of 71 Mimulus species was assembled from occurrence coordinates and corresponding values extracted from climatic layers of the WorldClim dataset (www.worldclim.org), and we applied the WiBB method to infer important predictors for their geographical distributions.
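A minimal sketch of one such simulated dataset in R, using the (0.6, 0.4, 0.2, 0.0) structure and, as a simplifying assumption not stated above, mutually uncorrelated predictors:

```r
# Simulate y and x1..x4 with preset correlations r between y and each predictor.
library(mvtnorm)
r <- c(0.6, 0.4, 0.2, 0.0)
sigma <- diag(5)          # correlation matrix, response in row/column 1
sigma[1, 2:5] <- r
sigma[2:5, 1] <- r
set.seed(1)
dat <- as.data.frame(rmvnorm(1000, mean = rep(0, 5), sigma = sigma))
names(dat) <- c("y", "x1", "x2", "x3", "x4")
# GLM case: convert the continuous response into binary occurrence data
dat$O <- as.integer(dat$y > 0)
```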
The dataset contains model coefficients and statistics for the 488 regression models used to estimate streamwater constituent loads for 13 watersheds in Gwinnett County, Georgia, for two calibration periods, water years 2003-2010 and 2010-2020. Model terms were selected from an 11-parameter equation, which was a function of discharge, base flow, season, turbidity, and time (trend), using a forward stepwise ordinary least squares regression approach. Model coefficients were fit using U.S. Geological Survey (USGS) LOADEST load estimation software. Models were fit both with and without turbidity explanatory variables for 12 water-quality constituents: total suspended solids, suspended sediment concentration, total nitrogen, total nitrate plus nitrite, total phosphorus, dissolved phosphorus, total organic carbon, total calcium, total magnesium, total lead, total zinc, and total dissolved solids. The dataset includes a summary of the sample concentrations used to calibrate the models (period of samples collected, number of concentrations, number of censored concentrations, and number of outliers removed), model coefficients, and selected model statistics (concentration and load model R-squared values, estimated residual variance, serial correlation in the model residuals, and the Turnbull-Weiss normality test statistic of the residuals). Portable document format files of LOADEST output are provided for each model in a "zip" file that contains model diagnostic statistics and plots for evaluating model fits.
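LOADEST models are rating-curve regressions of log load on functions of discharge, season, and time; a minimal illustrative sketch in R of that general form (a simplification of the 11-parameter equation described above, with hypothetical column names):

```r
# Simplified LOADEST-style load regression (the actual models add base-flow
# and turbidity terms). Assumed columns: load, q (discharge), dtime (decimal years).
fit <- lm(log(load) ~ log(q) + I(log(q)^2) +
            sin(2 * pi * dtime) + cos(2 * pi * dtime) + dtime,
          data = samples)
summary(fit)$r.squared  # load-model R-squared, one of the reported statistics
```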
License: Database Contents License (DbCL) 1.0, http://opendatacommons.org/licenses/dbcl/1.0/
Project Description:
In this project, I developed a linear regression model to predict car prices based on key features such as fuel tank capacity, width, length, and year of manufacture. The goal was to understand how these factors influence car prices and to assess the effectiveness of the model in making accurate predictions.
Key Features:
- Fuel Tank Capacity: The capacity of the car's fuel tank.
- Width: The width of the car.
- Length: The length of the car.
- Year: The year of manufacture of the car.
Target Variable:
Price: The price of the car, which is the primary variable being predicted.
Methodology:
Data Preparation:
Model Training:
Feature Scaling:
Evaluation:
Visualization:
Results:
Technologies Used:
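A minimal sketch in R of the workflow outlined above (the file and column names are assumptions):

```r
# Scale the features, fit a linear regression for price, and check the fit.
cars <- read.csv("car_prices.csv")  # fuel_tank, width, length, year, price assumed
feats <- as.data.frame(scale(cars[, c("fuel_tank", "width", "length", "year")]))
feats$price <- cars$price
fit <- lm(price ~ fuel_tank + width + length + year, data = feats)
summary(fit)$r.squared                 # goodness of fit
head(data.frame(actual = feats$price,  # actual vs. predicted prices
                predicted = predict(fit)))
```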
This dataset is designed for beginners to practice regression problems, particularly in the context of predicting house prices. It contains 1000 rows, with each row representing a house and various attributes that influence its price. The dataset is well-suited for learning basic to intermediate-level regression modeling techniques.
Beginner Regression Projects: This dataset can be used to practice building regression models such as Linear Regression, Decision Trees, or Random Forests. The target variable (house price) is continuous, making this an ideal problem for supervised learning techniques.
Feature Engineering Practice: Learners can create new features by combining existing ones, such as the price per square foot or age of the house, providing an opportunity to experiment with feature transformations.
Exploratory Data Analysis (EDA): You can explore how different features (e.g., square footage, number of bedrooms) correlate with the target variable, making it a great dataset for learning about data visualization and summary statistics.
Model Evaluation: The dataset allows for various model evaluation techniques such as cross-validation, R-squared, and Mean Absolute Error (MAE). These metrics can be used to compare the effectiveness of different models.
The dataset is highly versatile for a range of machine learning tasks. You can apply simple linear models to predict house prices based on one or two features, or use more complex models like Random Forest or Gradient Boosting Machines to understand interactions between variables.
It can also be used for dimensionality reduction techniques like PCA or to practice handling categorical variables (e.g., neighborhood quality) through encoding techniques like one-hot encoding.
This dataset is ideal for anyone wanting to gain practical experience in building regression models while working with real-world features.
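A minimal sketch in R combining those ideas, with hypothetical column names (sqft, bedrooms, neighborhood_quality, price):

```r
# Feature engineering, categorical encoding via factor(), and a random
# forest baseline evaluated with MAE. Column names are assumptions.
library(randomForest)
houses <- read.csv("house_prices.csv")
houses$price_per_sqft <- houses$price / houses$sqft  # EDA feature only:
                                                     # it contains the target
houses$neighborhood_quality <- factor(houses$neighborhood_quality)
rf <- randomForest(price ~ sqft + bedrooms + neighborhood_quality,
                   data = houses, ntree = 500)
mean(abs(houses$price - predict(rf)))  # out-of-bag Mean Absolute Error
```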
In 1991, the U.S. Geological Survey (USGS) began a study of more than 50 major river basins across the Nation as part of the National Water-Quality Assessment (NAWQA) project. One of the major goals of the NAWQA project was to determine how river water quality has changed over time. To support that goal, long-term consistent and comparable monitoring has been conducted by the USGS on streams and rivers throughout the Nation. Outside of the NAWQA project, the USGS and other Federal, State, and local agencies have also collected long-term water-quality data to support their own assessments of changing water quality. In 2017, data from these multiple sources were combined to support one of the most comprehensive assessments to date of water-quality trends in the United States (Oelsner and others, 2017; De Cicco and others, 2017). This data release updates those water-quality trends, which previously ended in 2012, with five more years of data; the trends now end in 2017. This USGS data release contains all the input and output files necessary to reproduce the results from the Weighted Regressions on Time, Discharge, and Season (WRTDS) models, using data preparation methods described in Oelsner and others, 2017. Models were calibrated for each combination of site and parameter using the screened input data. Models were run on Yeti, the USGS supercomputer, in 3 separate runs, using the scripts in the "Script.zip" folder. See readMe.txt for details on how the files in this data release are related and on the modeling process. "SiteTable.csv" gives information on the sites used in this analysis. Once calibrated, the WRTDS models were initially evaluated using a logistic regression equation that estimated a probability of acceptance for each model (e.g., "a good fit") based on a set of diagnostic metrics derived from the observed, estimated, and residual values from each model and dataset. Each WRTDS model was assigned to one of three categories: "auto-accept," "auto-reject," or "manual evaluation". Models assigned to the latter category were visually evaluated for appropriate model fit using residual and diagnostic plots. Models assigned to the first two categories were automatically included in or rejected from the final results, respectively. Twenty-two water-quality parameters were assessed, including nutrients (ammonia, nitrate, filtered orthophosphate, total nitrogen, total phosphorus, and unfiltered orthophosphate), major ions (calcium, bromide, fluoride, chloride, magnesium, potassium, sodium, and sulfate), salinity indicators (total dissolved solids and specific conductance), sediment (total suspended solids and suspended sediment concentration), carbon (dissolved organic carbon, total organic carbon, and particulate organic carbon), and alkalinity. Trends are reported for six periods: 1972-2017, 1982-2017, 1987-2017, 1992-2017, 2002-2017, and 2007-2017.
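WRTDS itself is implemented in the USGS EGRET R package; a minimal sketch of one site-parameter model, using the Choptank River example site and nitrate parameter code from the EGRET documentation rather than a site from this release:

```r
# Fit a WRTDS model for one site/parameter combination with EGRET.
library(EGRET)
Daily  <- readNWISDaily("01491000", "00060", "1979-10-01", "2011-09-30")
Sample <- readNWISSample("01491000", "00631", "1979-10-01", "2011-09-30")
INFO   <- readNWISInfo("01491000", "00631", interactive = FALSE)
eList  <- mergeReport(INFO, Daily, Sample)
eList  <- modelEstimation(eList)  # calibrate the WRTDS surfaces
plotConcHist(eList)               # history of flow-normalized concentration
```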
Greatest support for best-fit models is indicated by the lowest ΔAIC (Akaike's information criterion) values; all models with ΔAIC < 2 are indicated in bold. Coefficients from the most-supported model are provided, and terms with support for a significant relationship (positive or negative) with whale presence (p-value < 0.05) are noted in bold. Df = degrees of freedom. For Trawler Activity state, Towing was the reference level, and for Fishing Area, Flemish Cap was the reference level.
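A minimal sketch of that ΔAIC ranking in R (the model formulas and data frame are hypothetical placeholders):

```r
# Rank candidate GLMs of whale presence by delta-AIC; support is strongest
# for models within 2 units of the minimum.
models <- list(
  activity      = glm(presence ~ trawler_activity, family = binomial, data = d),
  area          = glm(presence ~ fishing_area, family = binomial, data = d),
  activity_area = glm(presence ~ trawler_activity + fishing_area,
                      family = binomial, data = d))
aic <- sapply(models, AIC)
sort(aic - min(aic))  # models with delta-AIC < 2 have greatest support
```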
This study was initiated to provide baseline data and to determine the utility of stable isotope analysis for evaluating the foraging strategies of an opportunistic reptile predator. Stable isotope ratios of carbon and nitrogen were evaluated from multiple tissues from terrapin populations to determine spatial or temporal variations in resource use within mangrove habitats in Southern Florida. We sampled Diamondback terrapin (Malaclemys terrapin) and potential resources within mainland and island habitats, and evaluated their δ13C and δ15N values. We fit linear regression models to determine the best predictors of isotopic values for both terrapins and their prey, and used SIBER analysis to examine terrapin isotopic niche space and overlap between groups. We identified differences in terrapin δ13C and δ15N values among all sites. Blood and scute tissues revealed different isotopic compositions and niche overlap between sites, suggesting diets or foraging locations may change over time and that the amount of variation is site-specific. Niche overlap between size classes was larger for blood (short-term) than for scute (long-term), suggesting greater variability in food resource use and/or in the isotopic signal of those food resources over short versus long timescales.
Ultra-high-performance liquid chromatography coupled to ion mobility separation and high-resolution mass spectrometry instruments has proven very valuable for screening of emerging contaminants in the aquatic environment. However, when applying suspect or nontarget approaches (i.e., when no reference standards are available), there is no information on retention time (RT) and collision cross-section (CCS) values to facilitate identification. In silico prediction tools for RT and CCS can therefore be of great utility in decreasing the number of candidates to investigate. In this work, Multivariate Adaptive Regression Splines (MARS) were evaluated for the prediction of both RT and CCS. MARS prediction models were developed and validated using a database of 477 protonated molecules, 169 deprotonated molecules, and 249 sodium adducts. Multivariate and univariate models were evaluated, showing a better fit of univariate models to the experimental data. The RT model (R2 = 0.855) showed a deviation between predicted and experimental data of ±2.32 min (95% confidence intervals). The deviation observed for CCS data of protonated molecules using the CCSH model (R2 = 0.966) was ±4.05% with 95% confidence intervals. The CCSH model was also tested for the prediction of deprotonated molecules, resulting in deviations below ±5.86% for 95% of the cases. Finally, a third model was developed for sodium adducts (CCSNa, R2 = 0.954), with deviations below ±5.25% for 95% of the cases. The developed models have been incorporated into an open-access and user-friendly online platform, which represents a great advantage for third-party research laboratories in predicting both RT and CCS data.
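In R, MARS models of this kind are commonly fit with the earth package; a minimal sketch (the descriptor data frames and the rt column are hypothetical):

```r
# Univariate-response MARS model predicting retention time from
# molecular descriptors.
library(earth)
mars_rt <- earth(rt ~ ., data = train_descriptors)  # fit the MARS model
summary(mars_rt)                                     # selected hinge terms, R^2
rt_pred <- predict(mars_rt, newdata = test_descriptors)
```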
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file, "Simulated_Dataset.RData".
Metadata (including data dictionary):
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code Abstract: We provide R statistical software code ("CWVS_LMC.txt") to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code ("Results_Summary.txt") to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.
Description: "CWVS_LMC.txt": This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the "Simulated_Dataset.RData" workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities. "Results_Summary.txt": This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the "CWVS_LMC.txt" code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Required R packages:
• For running "CWVS_LMC.txt": msm (sampling from the truncated normal distribution), mnormt (sampling from the multivariate normal distribution), and BayesLogit (sampling from the Polya-Gamma distribution)
• For running "Results_Summary.txt": plotrix (plotting the posterior means and credible intervals)
Instructions for Use / Reproducibility: What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the "Simulated_Dataset.RData" workspace.
• Run the code contained in "CWVS_LMC.txt".
• Once the "CWVS_LMC.txt" code is complete, run "Results_Summary.txt".
Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set.
Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.
Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This also allows the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
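The replication steps above, as they would be issued in R (the file names are those given in the data release):

```r
# Reproduce the simulated-data analysis from the provided files.
load("Simulated_Dataset.RData")  # loads y, x, z, n, m, p, alpha_true
source("CWVS_LMC.txt")           # fit the CWVS-LMC model (needs msm, mnormt, BayesLogit)
source("Results_Summary.txt")    # plot critical windows (needs plotrix)
```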
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This repository contains various resources related to the study on post-stroke recovery in a mouse model, focusing on the application of the Proportional Recovery Rule (PRR).
- code/: Contains all the code used for the analysis in this study. Detailed information is available in the README within the code folder.
- input/: This folder contains all datasets used in the publication.
- output/: This directory includes the final results generated for each dataset. Detailed information for each dataset's output can be found in the respective subfolders.
- docs/: Additional documentation related to this project, including extra resources in the form of a README file within this folder.
The Fugl-Meyer upper extremity score is a widely used assessment tool in clinical settings to evaluate motor function in stroke patients. With a maximum score of 66, higher values indicate better motor performance, while lower values signify greater deficits.
The Proportional Recovery Rule (PRR) suggests that the magnitude of recovery from nonsevere upper limb motor impairment after stroke is approximately 0.7 times the initial impairment. This rule, proposed in 2008, has been applied to various motor and nonmotor impairments, leading to inconsistencies in its formulation and application across studies.
In this study, we translated the Fugl-Meyer upper extremity score into a deficit score suitable for use in a mouse model. The PRR posits that the change in impairment can be predicted as 0.7 times the initial impairment, plus an error term. We adapted this rule by fitting a linear regression model without an intercept to relate the initial impairment to the change in impairment.
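A minimal sketch of that no-intercept fit in R (the data frame and column names are assumptions):

```r
# PRR fit: regress observed change in impairment on initial impairment,
# with the intercept suppressed. The PRR predicts a slope near 0.7.
prr_fit <- lm(change ~ 0 + initial_impairment, data = recovery)
coef(prr_fit)     # estimated proportional recovery coefficient
predict(prr_fit)  # predicted change in impairment for each animal
```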
Initial Impairment Calculation:
Change Observed and Predicted:
Cluster Analysis:
Outlier Removal:
Cluster Characteristics:
Statistical Analysis:
This structured dataset was created with reference to the following publication:
DOI:10.1038/s41597-023-02242-8
If you have any questions or require further assistance, please do not hesitate to reach out to us. Contact us via email at markus.aswendtATuk-koeln.de or aref.kalantari-sarcheshmehATuk-koeln.de.
The ascii grids represent regional probabilities that groundwater in a particular location will have dissolved oxygen (DO) concentrations less than selected threshold values representing anoxic groundwater conditions, or will have dissolved manganese (Mn) concentrations greater than selected threshold values representing secondary drinking water-quality contaminant levels (SMCL) and health-based screening levels (HBSL). The probability models were constrained by the alluvial boundary of the Central Valley to a depth of approximately 300 meters (m). We utilized prediction modeling methods, specifically boosted regression trees (BRT) with a Bernoulli error distribution, within a statistical learning framework in R (http://www.r-project.org/) to produce two-dimensional probability grids at selected depths throughout the modeling domain. The statistical learning framework seeks to maximize the predictive performance of machine learning methods through model tuning by cross-validation. Models were constructed using measured dissolved oxygen and manganese concentrations sampled from 2,767 wells within the alluvial boundary of the Central Valley and over 60 predictor variables from 7 sources (see metadata), assembled to develop a model that incorporates regional-scale soil properties, soil chemistry, land use, aquifer textures, and aquifer hydrology. Previously developed Central Valley model outputs of textures (Central Valley Textural Model, CVTM; Faunt and others, 2010) and MODFLOW-simulated vertical water fluxes and predicted depth to water table (Central Valley Hydrologic Model, CVHM; Faunt, 2009) were used to represent aquifer textures and groundwater hydraulics, respectively. The wells used in the BRT models described above were attributed to predictor variable values in ArcGIS using a 500-m buffer. The response variable data consisted of measured DO and Mn concentrations from 2,767 wells within the alluvial boundary of the Central Valley. The data were compiled from two sources: the U.S. Geological Survey (USGS) National Water Information System (NWIS) database (all data are publicly available from the USGS at http://waterdata.usgs.gov/ca/nwis/nwis) and the California State Water Resources Control Board Division of Drinking Water (SWRCB-DDW) database (water-quality data are publicly available from the SWRCB at http://geotracker.waterboards.ca.gov/gama/). Only wells with well depth data were selected, and for wells with multiple records, only the most recent sample in the period 1993-2014 that had the required water-quality data was used. Data were available for 932 wells in the NWIS dataset and 1,835 wells in the SWRCB-DDW dataset. Models were trained on the USGS NWIS dataset of 932 wells and evaluated on an independent hold-out dataset of 1,835 wells from the SWRCB-DDW. We used cross-validation to assess the predictive performance of models of varying complexity as a basis for selecting the final models used to create the prediction grids. Trained models were applied to cross-validation testing data and a separate hold-out dataset to evaluate model predictive performance, emphasizing three model fit metrics: Kappa, accuracy, and the area under the receiver operating characteristic (ROC) curve. The final trained models were used for mapping predictions at discrete depths to a depth of approximately 300 m. Trained DO and Mn models had accuracies of 86-100 percent, Kappa values of 0.69-0.99, and ROC values of 0.92-1.0.
Model accuracies for cross-validation testing datasets were 82-95 percent, and ROC values were 0.87-0.91, indicating good predictive performance. Kappa values for the cross-validation testing dataset were 0.30-0.69, indicating fair to substantial agreement between testing observations and model predictions. Hold-out data were available for the manganese model only and indicated accuracies of 89-97 percent, ROC values of 0.73-0.75, and Kappa values of 0.06-0.30. The predictive performance of both the DO and Mn models was reasonable, considering all three of these fit metrics and the low percentages of low-DO and high-Mn events in the data. See the associated journal article (Rosecrans and others, 2017) for a complete summary of BRT modeling methods, model fit metrics, and the relative influence of predictor variables for a given DO or Mn BRT model. The modeled response variables for the DO BRT models were based on measured DO values from wells at the following thresholds: <0.5 milligrams per liter (mg/L), <1.0 mg/L, and <2.0 mg/L; these threshold values were considered anoxic based on literature reviews. The modeled response variables for the Mn BRT models were based on measured Mn values from wells at the following exceedance thresholds: >50 micrograms per liter (µg/L), >150 µg/L, and >300 µg/L. (The 150 µg/L manganese threshold repres... Visit https://dataone.org/datasets/a905afa4-cdf2-4f19-ac0d-42423de2d684 for complete metadata about this dataset.
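A minimal sketch of a Bernoulli BRT of this kind using the gbm package in R (the well data frame, prediction grid, and column names are hypothetical; the response must be coded 0/1):

```r
# Boosted regression trees with a Bernoulli distribution, tuned by
# cross-validation, then used to map exceedance probabilities.
library(gbm)
brt <- gbm(low_do ~ ., data = training_wells, distribution = "bernoulli",
           n.trees = 2000, interaction.depth = 5, shrinkage = 0.01,
           cv.folds = 10)
best_iter <- gbm.perf(brt, method = "cv")  # number of trees chosen by CV
p_anoxic <- predict(brt, newdata = grid_cells, n.trees = best_iter,
                    type = "response")     # probability DO < threshold
```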
A geodatabase was developed to compile Curve Fit (Version 10.1; De Jager and Fox, 2013) regression tool adjusted R-squared outputs for wild celery (Vallisneria americana), wild rice (Zizania aquatica), and arrowhead (one raster for the sum of Sagittaria rigida and Sagittaria latifolia) for pools 4, 8, and 13 on the Upper Mississippi River system from 1998-2019, using mapped abundance raster datasets. Relative abundance, for submersed species and filamentous algae, represents the sum of rake scores across the six subsites divided by the maximum possible rake score (30) at each site, multiplied by 100 (0-100%). Percent cover, for emersed, rooted floating-leaved, and free-floating lifeforms, represents the maximum percent cover for each category (0, 20, 40, 60, 80, 100%). Each explanatory variable (year) was paired with the corresponding raster by pool. Curve Fit was used to estimate the linear relationship between year and pixel value (one relative abundance/percent cover value per year) and to create an output raster containing parameter estimates, model error, and R-squared. The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. Outputs were developed at two temporal scales: 1998-2019 and 2010-2019.
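For reference, the adjustment mentioned above has a simple closed form; a one-line R helper (written here for illustration, not taken from the Curve Fit tool):

```r
# Adjusted R-squared from R-squared, n observations, and p predictors:
# penalizes R-squared for each predictor added to the model.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_r2(0.80, n = 22, p = 1)  # e.g., 22 annual observations, one predictor (year)
```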