https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Sachin Gupta
Released under CC0: Public Domain
This dataset has been created to demonstrate the use of a simple linear regression model. It includes two variables: an independent variable and a dependent variable. The data can be used for training, testing, and validating a simple linear regression model, making it ideal for educational purposes, tutorials, and basic predictive analysis projects. The dataset consists of 100 observations with no missing values, and it follows a linear relationship.
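As a rough illustration of the training and testing workflow described above, the following sketch fits a least-squares line on an 80/20 split. The data here is synthetic (100 noisy points around y = 2x + 1, an assumption made only for this example) and is not the dataset itself; the function names are hypothetical.

```python
import math
import random


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the fitted line to a list of x values."""
    return [slope * xi + intercept for xi in x]


# Synthetic stand-in data (assumption): 100 observations, no missing values.
rng = random.Random(0)
x_all = [float(i) for i in range(100)]
y_all = [2.0 * xi + 1.0 + rng.gauss(0.0, 1.0) for xi in x_all]

# Shuffle the row indices, then train on 80 rows and test on the other 20.
indices = list(range(100))
rng.shuffle(indices)
train_idx, test_idx = indices[:80], indices[80:]

slope, intercept = fit_simple_linear_regression(
    [x_all[i] for i in train_idx], [y_all[i] for i in train_idx]
)

# Root mean squared error on the held-out rows.
test_pred = predict([x_all[i] for i in test_idx], slope, intercept)
test_rmse = math.sqrt(
    sum((p - y_all[i]) ** 2 for p, i in zip(test_pred, test_idx)) / len(test_idx)
)
print(f"slope={slope:.3f} intercept={intercept:.3f} test RMSE={test_rmse:.3f}")
```

With the real file, the two lists would be read from its two columns instead of being generated.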
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is constructed from project activity experience.
Columns:
not done - Whether the project failed to reach completion (0 = done, 1 = not done)
time required - Estimated time to completion, in hours
cost - Cost per hour
This dataset is designed for beginners to practice regression problems, particularly in the context of predicting house prices. It contains 1000 rows, with each row representing a house and various attributes that influence its price. The dataset is well-suited for learning basic to intermediate-level regression modeling techniques.
Beginner Regression Projects: This dataset can be used to practice building regression models such as Linear Regression, Decision Trees, or Random Forests. The target variable (house price) is continuous, making this an ideal problem for supervised learning techniques.
Feature Engineering Practice: Learners can create new features by combining existing ones, such as the price per square foot or age of the house, providing an opportunity to experiment with feature transformations.
Exploratory Data Analysis (EDA): You can explore how different features (e.g., square footage, number of bedrooms) correlate with the target variable, making it a great dataset for learning about data visualization and summary statistics.
Model Evaluation: The dataset allows for various model evaluation techniques such as cross-validation, R-squared, and Mean Absolute Error (MAE). These metrics can be used to compare the effectiveness of different models.
The dataset is highly versatile for a range of machine learning tasks. You can apply simple linear models to predict house prices based on one or two features, or use more complex models like Random Forest or Gradient Boosting Machines to understand interactions between variables.
It can also be used for dimensionality reduction techniques like PCA or to practice handling categorical variables (e.g., neighborhood quality) through encoding techniques like one-hot encoding.
This dataset is ideal for anyone wanting to gain practical experience in building regression models while working with real-world features.
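The model evaluation ideas listed above (cross-validation, R-squared, MAE) can be sketched without any third-party library. The example below runs 5-fold cross-validation of a one-feature linear model. The `square_feet` and `price` values are synthetic and their names are assumptions for illustration; they are not taken from the dataset.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validate(x, y, k=5, seed=0):
    """Return one (MAE, R-squared) pair per fold of a shuffled k-fold split."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        preds = [slope * x[i] + intercept for i in fold]
        truth = [y[i] for i in fold]
        scores.append((mae(truth, preds), r_squared(truth, preds)))
    return scores


# Synthetic stand-in for 1000 houses (assumption, not the real data).
rng = random.Random(42)
square_feet = [rng.uniform(500.0, 4000.0) for _ in range(1000)]
price = [150.0 * s + 50000.0 + rng.gauss(0.0, 20000.0) for s in square_feet]

scores = cross_validate(square_feet, price, k=5)
mean_mae = sum(m for m, _ in scores) / len(scores)
mean_r2 = sum(r for _, r in scores) / len(scores)
print(f"mean MAE={mean_mae:.0f} mean R^2={mean_r2:.3f}")
```

Averaging the per-fold scores gives one figure per metric, which makes it straightforward to compare this baseline against more complex models evaluated on the same folds.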
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
Early cost estimation of construction projects is not an easy task, as such projects involve a high level of inaccuracy and uncertainty. Even in the early stages, errors in the estimates can result in financial loss and jeopardize construction completion. Therefore, the main objective of this research study is to present a framework for building cost estimation models, using the linear regression technique. The framework method is divided into five phases: (1) identifying the model’s requirements, (2) selection of the independent variables, (3) database construction, (4) data modelling and (5) model performance evaluation. A case study was conducted on federal penitentiary construction projects to test the applicability of the framework. Through the case study, two valid models were built, and their margins of error were 23 and 25%. The framework itself is one of the main contributions of this study, and it can be replicated by practitioners to develop models for construction cost estimation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of a total of 90 data points from past Dynamics 365 Finance and Operations projects. The data was collected from three organizations at CMMI level 3, located in Lahore, Pakistan. The dataset consists of:
Organization code and Project code - identify each data point by company and project
9 predictor variables - Number of Fields (nF), Number of Input Parameters (nIP), Number of Data Sources (nDS), Number of Tables (nT), Number of Graphs (nG), Number of Static Visual Elements (nSVE), Number of Report Designs (nRD), Number of Integrations (nI), Number of Business Units (nBU)
Actual Effort in Person Days
Expert Judgement Effort in Person Days
Problem Statement
The data covers mutual funds in the USA. The objective is to predict the ‘basis point spread’ over AAA bonds, i.e. the feature ‘bonds_aaa’, for each Serial Number.
Basis Point Spread indicates the additional return a mutual fund would give over the AAA-rated bonds.
About the Dataset
For this task, only the required columns have been kept and the unnecessary columns dropped. The data has already been cleansed for better analysis.
A zipped file containing the following items is given:
train.csv : Contains 9518 instances with 153 features, including the target feature.
test.csv : Contains 2380 instances with 152 features, excluding the target feature.
sample_submission.csv : Explained under the Submission sub-heading
MutualFundReturnsDataDictionary.csv : Contains the data dictionary, which explains what each feature of the dataset means.
Submission
After training the model on the train.csv data, the learner has to predict the target feature of the test.csv data using the trained model. The learner then has to submit a CSV file with the predicted feature.
A sample submission file (sample_submission.csv) is provided as a reference for the expected submission format.
Evaluation metrics
For this dataset, RMSE is used as the evaluation metric. Submissions will be evaluated based on RMSE.
Your RMSE score: Points earned for the task
RMSE < 16.5: 100% of the available points
16.5 <= RMSE < 20: 80% of the available points
20 <= RMSE < 25: 70% of the available points
RMSE >= 25: No points earned
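To make the scoring rule concrete, here is a small sketch that computes RMSE and maps it to the share of points given by the thresholds above. The function names are hypothetical and the sample values are made up; the actual grading code is not shown in the source.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))


def points_fraction(score):
    """Map an RMSE score to the fraction of available points earned."""
    if score < 16.5:
        return 1.0
    if score < 20:
        return 0.8
    if score < 25:
        return 0.7
    return 0.0


# Made-up target values and predictions, for illustration only.
actual = [10.0, 25.0, 40.0]
predicted = [12.0, 20.0, 43.0]
score = rmse(actual, predicted)
print(f"RMSE={score:.2f} -> {points_fraction(score):.0%} of points")
```

Because RMSE squares each error before averaging, a few large misses raise the score more than many small ones, which matters when a submission sits near a threshold.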
After completing this project, you will have a better understanding of how to apply linear models using GridSearchCV.
Topics covered: Chi-square contingency test, box plot, linear regression, GridSearchCV, Ridge and Lasso regressors
Sandy ocean beaches are a popular recreational destination, often surrounded by communities containing valuable real estate. Development is on the rise despite the fact that coastal infrastructure is subjected to flooding and erosion. As a result, there is an increased demand for accurate information regarding past and present shoreline changes. To meet these national needs, the Coastal and Marine Geology Program of the U.S. Geological Survey (USGS) is compiling existing reliable historical shoreline data along open-ocean sandy shores of the conterminous United States and parts of Alaska and Hawaii under the National Assessment of Shoreline Change project. There is no widely accepted standard for analyzing shoreline change. Existing shoreline data measurements and rate calculation methods vary from study to study and prevent combining results into state-wide or regional assessments. The impetus behind the National Assessment project was to develop a standardized method of measuring changes in shoreline position that is consistent from coast to coast. The goal was to facilitate the process of periodically and systematically updating the results in an internally consistent manner.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These four datasets consist of software projects—specifically student projects from a private university in Lahore, Pakistan—developed using different programming languages and application types, including desktop, command-line, and web applications. Specifically, Dataset #1 comprises 31 C++ desktop GUI applications, Dataset #2 contains 19 Java desktop GUI projects, Dataset #3 includes 11 Java command-line applications, and Dataset #4 features 12 Java web-based systems. Each dataset includes a comprehensive set of metrics derived from Use Case Diagrams (UCD), Analysis Class Diagrams (ACD), and Data Flow Diagrams (DFD), along with the corresponding software size measured in Source Lines of Code (SLOC). These datasets are utilized to compare the effectiveness of metrics derived from these three diagrams for early software size estimation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
The valuation of real estate, which assists in the definition of market value, is an important science with a wide field of action, which includes the collection of taxes, commercial transactions, insurance and judicial expertise. This study presents the construction of a linear regression model to determine the market value (dependent variable) of residential apartments in the city of Fortaleza-CE. The studied database presents 17,493 apartments, divided into 227 plan types in a total of 154 projects launched between the years of 2011 and 2014. The model developed was obtained using Multiple Linear Regression associated with the Ridge Regression technique to solve the existing multicollinearity problem. In the analysis of 30 variables (12 quantitative and 18 dummy type qualitative variables), an equation with 6 variables was reached, which meets the theoretical assumptions for its existence.
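To show why Ridge Regression helps with multicollinearity, the sketch below solves the ridge normal equations for two features in plain Python. It is a toy illustration under stated assumptions, not the study's model: the data is made up, the two features are exact copies of each other (the extreme case of collinearity), the features are not standardized, and the penalty value `lam` is arbitrary.

```python
def ridge_two_features(x1, x2, y, lam):
    """Solve (X'X + lam * I) beta = X'y on centered data for two features.

    Returns (beta1, beta2, intercept). The intercept is not penalized.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]

    # Entries of the 2x2 system; lam is added on the diagonal only.
    a11 = sum(v * v for v in c1) + lam
    a22 = sum(v * v for v in c2) + lam
    a12 = sum(u * v for u, v in zip(c1, c2))
    b1 = sum(u * v for u, v in zip(c1, cy))
    b2 = sum(u * v for u, v in zip(c2, cy))

    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("singular system: features are perfectly collinear")

    # Cramer's rule for the 2x2 system.
    beta1 = (b1 * a22 - a12 * b2) / det
    beta2 = (a11 * b2 - a12 * b1) / det
    intercept = my - beta1 * m1 - beta2 * m2
    return beta1, beta2, intercept


# Made-up data: x2 duplicates x1, and y = 3 * x1 + 2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = list(x1)
y = [3.0 * v + 2.0 for v in x1]

try:
    ridge_two_features(x1, x2, y, lam=0.0)  # ordinary least squares
except ValueError as err:
    print("OLS fails:", err)

beta1, beta2, intercept = ridge_two_features(x1, x2, y, lam=1.0)
print(f"ridge: beta1={beta1:.3f} beta2={beta2:.3f} intercept={intercept:.3f}")
```

With `lam=0.0` the system has no unique solution, while a positive penalty makes it solvable and splits the effect evenly between the duplicated features. In real data collinearity is rarely exact, and the usual symptom is unstable, inflated coefficients rather than an outright failure.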
This dataset consists of short-term (100+ years) linear regression shoreline change rates for the North Shore region of Massachusetts. Rates of short-term shoreline change were computed within a GIS using the Digital Shoreline Analysis System (DSAS) version 4.3, an ArcGIS extension developed by the U.S. Geological Survey. The baseline is used as a reference line for the transects cast by the DSAS software. The transects intersect each shoreline at the measurement points, which are then used to calculate a linear regression rate for the Massachusetts Office of Coastal Zone Management Shoreline Change Project. Short-term linear regression statistics were calculated with all of the historical shorelines compiled for the Massachusetts Office of Coastal Zone Management Shoreline Change Project. Due to continued coastal population growth and increased threats of erosion, current data on trends and rates of shoreline movement are required to inform shoreline and floodplain management. The Massachusetts Office of Coastal Zone Management launched the Shoreline Change Project in 1989 to identify erosion-prone areas of the coast. In 2001, a 1994 shoreline was added to calculate both long- and short-term shoreline change rates at 40-meter intervals along ocean-facing sections of the Massachusetts coast. The Coastal and Marine Geology Program of the U.S. Geological Survey (USGS), in cooperation with the Massachusetts Office of Coastal Zone Management, has compiled reliable historical shoreline data along open-facing sections of the Massachusetts coast under the Massachusetts Shoreline Change Mapping and Analysis Project 2013 Update. Two oceanfront shorelines for Massachusetts (approximately 1,370 km) were (1) delineated using 2008/09 color aerial orthoimagery, and (2) extracted from topographic LIDAR datasets (2007) obtained from NOAA's Ocean Service, Coastal Services Center.
The new shorelines were integrated with existing Massachusetts Office of Coastal Zone Management and USGS historical shoreline data in order to compute long- and short-term rates using the latest version of the Digital Shoreline Analysis System (DSAS).
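The idea behind a linear regression rate at a single transect can be sketched as the least-squares slope of shoreline position against survey date. This is only an illustration of the concept, not the DSAS implementation. The survey years and distances are hypothetical, and the sign convention (negative meaning the shoreline moves toward the baseline) is an assumption of this sketch.

```python
def linear_regression_rate(years, positions_m):
    """Least-squares slope of shoreline position (m) against date (years).

    positions_m holds the distance from the baseline to each shoreline
    intersection along one transect. The result is in meters per year.
    """
    if len(years) != len(positions_m) or len(years) < 2:
        raise ValueError("need at least two (year, position) pairs")
    n = len(years)
    mean_t = sum(years) / n
    mean_d = sum(positions_m) / n
    stt = sum((t - mean_t) ** 2 for t in years)
    if stt == 0:
        raise ValueError("all survey dates are identical")
    std = sum((t - mean_t) * (d - mean_d) for t, d in zip(years, positions_m))
    return std / stt


# Hypothetical transect: five shoreline surveys and their distances
# from the baseline in meters.
survey_years = [1909.0, 1952.0, 1978.0, 1994.0, 2008.0]
positions_m = [120.0, 104.0, 97.0, 88.0, 85.0]

rate = linear_regression_rate(survey_years, positions_m)
print(f"linear regression rate: {rate:.2f} m/yr")
```

Because every shoreline intersection contributes to the fit, the rate is less sensitive to any single survey than a rate computed from only the oldest and newest shorelines.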
This dataset was created by #Feba2005
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
It includes five different datasets. The first four datasets contain student projects collected from different offerings of two undergraduate-level courses – Object-Oriented Analysis and Design (OOAD) and Software Engineering (SE) – taught in a renowned private university in Lahore over a period of six years. The fifth dataset contains real-life industry projects collected from a renowned software house (i.e. member of Pakistan Software Houses Association for IT and ITeS (P@SHA)) in Lahore.
Dataset #1 consists of 31 C++ GUI-based desktop applications.
Dataset #2 consists of 19 Java GUI-based desktop applications.
Dataset #3 consists of 12 Java web applications.
Dataset #4 consists of 31 Java applications covering both categories (GUI-based desktop and web).
Dataset #5 consists of 11 VB.NET GUI-based desktop applications.
Attributes are used as follows:
Project Code – Project ID for identification purposes
NOC – The total number of classes in a class diagram
NOA – The total number of attributes in a class diagram
NOM – The total number of methods/operations in a class diagram
NODep – The total number of dependency relationships in a class diagram
NOAss – The total number of association relationships in a class diagram
NOComp – The total number of composition relationships in a class diagram
NOAgg – The total number of aggregation relationships in a class diagram
NOGen – The total number of generalization relationships in a class diagram
NORR – The total number of realization relationships in a class diagram
NOOM – The total number of one-to-one multiplicity relationships in a class diagram
NOMM – The total number of one-to-many multiplicity relationships in a class diagram
NMMM – The total number of many-to-many multiplicity relationships in a class diagram
OCP – Objective class points
EOCP – Enhanced objective class points
WEOCP – Weighted enhanced objective class points
SLOC – Software size measured in source lines of code