100+ datasets found
  1. An example data set for exploration of Multiple Linear Regression

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). An example data set for exploration of Multiple Linear Regression [Dataset]. https://catalog.data.gov/dataset/an-example-data-set-for-exploration-of-multiple-linear-regression
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This data set contains example data for exploring the theory of regression-based regionalization. The 90th percentile of annual maximum streamflow is provided as an example response variable for 293 streamgages in the conterminous United States. Several explanatory variables are drawn from the GAGES-II database to demonstrate how multiple linear regression is applied. Example scripts demonstrate how to retrieve the original streamflow data that are provided and how to recreate the figures from the associated Techniques and Methods chapter.
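    As a minimal sketch of the technique this entry names (not the USGS example scripts), the following fits an ordinary least-squares multiple linear regression with an intercept using NumPy. The explanatory variable names log_area and log_precip, the log transformation, and the synthetic noise-free response are hypothetical stand-ins, not columns or values from the data set.

```python
import numpy as np


def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Returns the coefficient vector [b0, b1, ..., bk].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Apply fitted coefficients to new explanatory data."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


# Hypothetical explanatory variables for 50 sites (synthetic, not GAGES-II values).
rng = np.random.default_rng(0)
log_area = rng.uniform(1.0, 4.0, size=50)
log_precip = rng.uniform(1.5, 2.5, size=50)
X = np.column_stack([log_area, log_precip])

# Synthetic, noise-free response, so the fit should recover the true coefficients.
log_q90 = 0.5 + 0.8 * log_area + 1.2 * log_precip
coef = fit_mlr(X, log_q90)
print(np.round(coef, 3))
```

    With real streamgage data the response carries noise, so the residuals would need to be examined before the coefficients are interpreted.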

  2. Regression Dataset

    • gts.ai
    json
    Updated Jan 24, 2025
    Cite
    GTS (2025). Regression Dataset [Dataset]. https://gts.ai/dataset-download/regression-dataset/
    Available download formats: json
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Explore the Gender, Height, and Weight Regression Dataset for regression analysis, machine learning, and predictive modeling.

  3. Data for multiple linear regression models for predicting microcystin...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Data for multiple linear regression models for predicting microcystin concentration action-level exceedances in selected lakes in Ohio [Dataset]. https://catalog.data.gov/dataset/data-for-multiple-linear-regression-models-for-predicting-microcystin-concentration-action
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Ohio
    Description

    Site-specific multiple linear regression models were developed for eight sites in Ohio (six in the Western Lake Erie Basin and two on inland reservoirs in northeast Ohio) to quickly predict action-level exceedances of the cyanotoxin microcystin in recreational and drinking waters used by the public. Real-time models include easily or continuously measured factors that do not require a sample to be collected. Real-time models are presented in two categories: (1) six models with continuous monitor data, and (2) three models with on-site measurements. Real-time models commonly included variables such as phycocyanin, pH, specific conductance, and streamflow or gage height. Many of the real-time factors were averages over time periods antecedent to the time the microcystin sample was collected, including water-quality data compiled from continuous monitors. Comprehensive models use a combination of discrete sample-based measurements and real-time factors. Comprehensive models were useful at some sites with lagged variables (< 2 weeks) for cyanobacterial toxin genes, dissolved nutrients, and (or) N to P ratios. Comprehensive models are presented in three categories: (1) three models with continuous monitor data and lagged comprehensive variables, (2) five models with no continuous monitor data and lagged comprehensive variables, and (3) one model with continuous monitor data and same-day comprehensive variables. Funding for this work was provided by the Ohio Water Development Authority and the U.S. Geological Survey Cooperative Water Program.
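    The description notes that many real-time factors were averages over periods antecedent to sample collection. A minimal sketch of that computation follows, assuming a half-open window [sample time - window, sample time); the window lengths and the hourly phycocyanin readings are hypothetical and not taken from the published models.

```python
from datetime import datetime, timedelta


def antecedent_mean(times, values, sample_time, window):
    """Average of the readings taken in [sample_time - window, sample_time).

    Returns None when no reading falls inside the window.
    """
    start = sample_time - window
    selected = [v for t, v in zip(times, values) if start <= t < sample_time]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Hypothetical hourly continuous-monitor readings from 06:00 to 11:00.
sample = datetime(2024, 7, 1, 12, 0)
times = [sample - timedelta(hours=h) for h in range(6, 0, -1)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# 3-hour antecedent window: averages the 09:00, 10:00 and 11:00 readings.
print(antecedent_mean(times, values, sample, timedelta(hours=3)))
```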

  4. Linear Regression E-commerce Dataset

    • kaggle.com
    zip
    Updated Sep 16, 2019
    Cite
    Saurabh Kolawale (2019). Linear Regression E-commerce Dataset [Dataset]. https://www.kaggle.com/datasets/kolawale/focusing-on-mobile-app-or-website
    Available download formats: zip (44169 bytes)
    Dataset updated
    Sep 16, 2019
    Authors
    Saurabh Kolawale
    Description

    This dataset contains data on customers who buy clothes online. The store offers in-store style and clothing advice sessions: customers come into the store, meet with a personal stylist, and can then go home and order the clothes they want on either a mobile app or the website.

    The company is trying to decide whether to focus their efforts on their mobile app experience or their website.

  5. Regression Test Data

    • catalog.data.gov
    • gimi9.com
    Updated Jan 5, 2021
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Regression Test Data [Dataset]. https://catalog.data.gov/dataset/regression-test-data
    Dataset updated
    Jan 5, 2021
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    This dataset is based on the unit and regression testing available on https://github.com/USEPA/Stormwater-Management-Model/actions

  6. 20ml Regression Dataset

    • universe.roboflow.com
    zip
    Updated Jun 26, 2024
    + more versions
    Cite
    20ml (2024). 20ml Regression Dataset [Dataset]. https://universe.roboflow.com/20ml-tkw8a/20ml-regression/model/4
    Available download formats: zip
    Dataset updated
    Jun 26, 2024
    Dataset authored and provided by
    20ml
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Volume
    Description

    20ml Regression

    ## Overview
    
    20ml Regression is a dataset for classification tasks - it contains Volume annotations for 5,683 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  7. Panel dataset on Brazilian fuel demand

    • data.mendeley.com
    Updated Oct 7, 2024
    Cite
    Sergio Prolo (2024). Panel dataset on Brazilian fuel demand [Dataset]. http://doi.org/10.17632/hzpwbp7j22.1
    Dataset updated
    Oct 7, 2024
    Authors
    Sergio Prolo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary: Fuel demand is shown to be influenced by fuel prices, people's income, and motorization rates. We use this panel dataset to explore the effects of electric vehicle motorization rates on gasoline demand.

    Files: dataset.csv - Panel dimensions are the Brazilian state (i) and year (t). The other columns are: gasoline sales per capita (ln_Sg_pc), prices of gasoline (ln_Pg) and ethanol (ln_Pe) and their lags, motorization rates of combustion vehicles (ln_Mi_c) and electric vehicles (ln_Mi_e), and GDP per capita (ln_gdp_pc). All variables are natural-log transformed, since we use them to calculate demand elasticities in a regression model.

    adjacency.csv - The adjacency matrix used in interaction with electric vehicles' motorization rates to calculate spatial effects. It is first built from a binary adjacency rule: for each pair of states i and j, cell (i, j) is 1 if the states are adjacent and 0 if they are not. Each row is then normalized to sum to one.
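    The row normalization described for adjacency.csv can be sketched with NumPy as below. The 4-state matrix and the values standing in for ln_Mi_e are hypothetical. Multiplying the normalized matrix by a variable (W @ x, the average over each state's neighbours) is one common way such a matrix enters a spatial regression; it is assumed here for illustration, not taken from regression.do.

```python
import numpy as np


def row_normalize(adjacency):
    """Scale each row of a binary adjacency matrix so that it sums to one."""
    A = np.asarray(adjacency, dtype=float)
    row_sums = A.sum(axis=1, keepdims=True)
    # Rows with no neighbours are left as zeros instead of dividing by zero.
    return np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums != 0)


# Hypothetical 4-state example (not the actual Brazilian adjacency).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
W = row_normalize(A)

# Hypothetical log EV motorization rates, one per state.
ln_mi_e = np.array([-2.3, -1.6, -1.2, -0.9])
spatial_lag = W @ ln_mi_e  # average of each state's neighbours
print(W)
print(spatial_lag)
```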

    regression.do - Series of Stata commands used to estimate the regression models of our study. dataset.csv must be imported first for the commands to work; see the comment section.

    dataset_predictions.xlsx - Based on the estimations from Stata, we use this Excel file to make average predictions by year and by state. By including years beyond the last panel sample, we also forecast the model into the future and evaluate the effects of different policies that influence gasoline prices (taxation) and EV motorization rates (electrification). This file is primarily used to create images, but it can also help explain how the forecasting scenarios are set up.

    Sources: fuel prices and sales: ANP (https://www.gov.br/anp/en/access-information/what-is-anp/what-is-anp); state population, GDP, and vehicle fleet: IBGE (https://www.ibge.gov.br/en/home-eng.html?lang=en-GB); state EV fleet: Anfavea (https://anfavea.com.br/en/site/anuarios/).

  8. Weighted Linear Regression - Dataset - Open Data NI

    • admin.opendatani.gov.uk
    Updated Oct 9, 2024
    + more versions
    Cite
    (2024). Weighted Linear Regression - Dataset - Open Data NI [Dataset]. https://admin.opendatani.gov.uk/dataset/weighted-linear-regression
    Dataset updated
    Oct 9, 2024
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    The primary objective of this project was to acquire historical shoreline information for the entire Northern Ireland coastline. A detailed understanding of the coast’s shoreline position and geometry over annual to decadal time periods is essential to any management of the coast. The historical shoreline analysis was based on all available Ordnance Survey maps and aerial imagery. Analysis looked at position and geometry over annual to decadal time periods, providing a dynamic picture of how the coastline has changed since the early 1800s. Once all datasets were collated, the data were interrogated using the ArcGIS package Digital Shoreline Analysis System (DSAS), which enables a user to calculate rate-of-change statistics from multiple historical shoreline positions. Rate-of-change statistics were collected at 25 m intervals and displayed both statistically and spatially, allowing areas of retreat or accretion to be identified along any given stretch of coastline. The DSAS software produces the following rate-of-change statistics:

    • Net Shoreline Movement (NSM): the distance between the oldest and the youngest shorelines.
    • Shoreline Change Envelope (SCE): a measure of the total change in shoreline movement considering all available shoreline positions and reporting their distances, without reference to their specific dates.
    • End Point Rate (EPR): the distance of shoreline movement divided by the time elapsed between the oldest and the youngest shoreline positions.
    • Linear Regression Rate (LRR): a rate-of-change statistic obtained by fitting a least-squares regression line to all shorelines at a specific transect.
    • Weighted Linear Regression Rate (WLR): a weighted linear regression of shoreline change on each transect that accounts for shoreline uncertainty, giving more emphasis to shorelines with a smaller error.

    The end product provided by Ulster University is an invaluable tool and digital asset that has helped to visualise shoreline change and assess approximate rates of historical change along any given stretch of the Northern Ireland coast.
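    The five DSAS statistics described for this dataset can be illustrated for a single transect with the NumPy sketch below. It is not DSAS code: the sign convention (positions measured along the transect from a fixed baseline), the inverse-variance weights 1/e² used for WLR, and the example survey years, positions, and uncertainties are all assumptions for illustration.

```python
import numpy as np


def shoreline_stats(years, positions, uncertainties):
    """Rate-of-change statistics for one transect.

    years: survey dates (decimal years); positions: shoreline distance along the
    transect from a fixed baseline (m); uncertainties: positional error of each
    shoreline (m).
    """
    t = np.asarray(years, dtype=float)
    x = np.asarray(positions, dtype=float)
    e = np.asarray(uncertainties, dtype=float)
    oldest, youngest = np.argmin(t), np.argmax(t)

    nsm = x[youngest] - x[oldest]            # Net Shoreline Movement (m)
    sce = x.max() - x.min()                  # Shoreline Change Envelope (m)
    epr = nsm / (t[youngest] - t[oldest])    # End Point Rate (m/yr)
    lrr = np.polyfit(t, x, 1)[0]             # Linear Regression Rate: OLS slope (m/yr)
    # Weighted Linear Regression Rate: np.polyfit applies w to the unsquared
    # residuals, so w = 1/e gives each squared residual a weight of 1/e**2.
    wlr = np.polyfit(t, x, 1, w=1.0 / e)[0]
    return {"NSM": nsm, "SCE": sce, "EPR": epr, "LRR": lrr, "WLR": wlr}


# Hypothetical transect with four shorelines, the older ones less certain.
years = [1850, 1900, 1950, 2000]
positions = [0.0, -5.0, -25.0, -30.0]
uncertainties = [20.0, 10.0, 5.0, 5.0]
for name, value in shoreline_stats(years, positions, uncertainties).items():
    print(f"{name}: {value:.3f}")
```

    EPR uses only the two end shorelines, whereas LRR and WLR use every shoreline, which is why the three rates can differ for the same transect.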

  9. Data from: Data on the Construction Processes of Regression Models

    • jstagedata.jst.go.jp
    jpeg
    Updated Jul 27, 2023
    Cite
    Taichi Kimura; Riko Iwamoto; Mikio Yoshida; Tatsuya Takahashi; Shuji Sasabe; Yoshiyuki Shirakawa (2023). Data on the Construction Processes of Regression Models [Dataset]. http://doi.org/10.50931/data.kona.22180318.v2
    Available download formats: jpeg
    Dataset updated
    Jul 27, 2023
    Dataset provided by
    Hosokawa Powder Technology Foundation
    Authors
    Taichi Kimura; Riko Iwamoto; Mikio Yoshida; Tatsuya Takahashi; Shuji Sasabe; Yoshiyuki Shirakawa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset of CSV files (numbered 1–8) demonstrates the construction processes of regression models built with machine learning methods and is used to plot Figs. 2–7. The CSV file 1.LSM_R^2 (plotting Fig. 2) shows the relationship between estimated and actual values when the least-squares method was used to construct the model. In the CSV file 2.PCR_R^2 (plotting Fig. 3), the number of principal components was varied from 1 to 5 while constructing a model by principal component regression. The data in the CSV file 3.SVR_R^2 (plotting Fig. 4) are the result of construction with support vector regression; the hyperparameters were chosen by evaluating every combination of the listed candidates and selecting the one with the maximum R² value. When a deep neural network was applied to construct the regression model, N_Neur., N_H.L., and N_L.T. were varied. The CSV file 4.DNN_HL (plotting Fig. 5(a)) shows how the relationship between estimated and actual values changes as N_H.L. is varied. Similarly, the CSV files 5.DNN_ Neur (plotting Fig. 5(b)) and 6.DNN_LT (plotting Fig. 5(c)) show the changes when N_Neur. or N_L.T. is varied. The data in the CSV file 7.DNN_R^2 (plotting Fig. 6) are the result obtained with the optimal N_Neur., N_H.L., and N_L.T. values. In the CSV file 8.R^2 (plotting Fig. 7), the validity of each machine learning method is compared by showing the optimal results for each method.

    Experimental conditions:
    • Supply volume of the raw material: 25–125 mL
    • Addition rate of TiO2: 5.0–15.0 wt%
    • Operation time: 1–15 min
    • Rotation speed: 2,200–5,700 min⁻¹
    • Temperature: 295–319 K

    Nomenclature:
    • N_Neur.: the number of neurons
    • N_H.L.: the number of hidden layers
    • N_L.T.: the number of learning times
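    This entry mentions R² comparisons and hyperparameter selection by evaluating every combination of candidates and keeping the one with the highest R². A minimal, library-free sketch of both ideas follows. It assumes the common definition R² = 1 - SS_res/SS_tot (the authors may have used another variant), and the toy linear model with candidate slopes and intercepts is a hypothetical stand-in for the SVR hyperparameters, which the description does not list.

```python
from itertools import product


def r_squared(actual, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    return 1.0 - ss_res / ss_tot


def grid_search(candidates, score):
    """Evaluate every combination of candidate values; keep the highest score."""
    names = list(candidates)
    best_params, best_score = None, float("-inf")
    for combo in product(*(candidates[name] for name in names)):
        params = dict(zip(names, combo))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


# Toy data and model (hypothetical): actual values follow y = 2x + 1 exactly.
xs = [0, 1, 2, 3, 4]
actual = [2 * x + 1 for x in xs]


def score(params):
    estimated = [params["slope"] * x + params["intercept"] for x in xs]
    return r_squared(actual, estimated)


best, best_r2 = grid_search({"slope": [1, 2, 3], "intercept": [0, 1, 2]}, score)
print(best, best_r2)
```

    An exhaustive grid grows multiplicatively with each added hyperparameter, so it suits short candidate lists like the one the description implies.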

  10. Synthesized Dataset of Length-Weight Regression Coefficients for Delta Fish

    • portal.edirepository.org
    • search.dataone.org
    • +1more
    bin, csv
    Updated Apr 29, 2020
    Cite
    Sarah Perry (2020). Synthesized Dataset of Length-Weight Regression Coefficients for Delta Fish [Dataset]. http://doi.org/10.6073/pasta/895d04734af8380b486087d26d56d95b
    Available download formats: csv (11461 bytes), csv (1510 bytes), csv (79491 bytes), bin (3461 bytes)
    Dataset updated
    Apr 29, 2020
    Dataset provided by
    EDI
    Authors
    Sarah Perry
    Time period covered
    Jan 1, 1950 - Dec 31, 2018
    Area covered
    Variables measured
    a, b, n, r2, Ref, Lmax, Lmin, SD_b, RefNo, Gender, and 16 more
    Description

    This dataset is a compilation of length-weight regression coefficients for fish species commonly found in the freshwater tidal habitats of the San Francisco Estuary. This effort was born out of the Delta Smelt Resiliency Strategy Aquatic Weed Control Action study, which needed to calculate individual fish weights from their measured lengths in order to calculate fish biomass. The Aquatic Weed Control study was supported by the Interagency Ecological Program through the Endangered Species Act and is included in the Interagency Ecological Program 2017-2019 workplan. Weight is estimated from length using the power function $W = aL^{b}$, whose parameters can be estimated by linear regression on the log-transformed equation $\log(W) = \log(a) + b\log(L)$. This dataset provides the species-specific a and b parameters. Associated publications and relevant metadata are included. Data were obtained either from a database (fishbase.us) or from peer-reviewed scientific papers.
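    Because the log transform makes the length-weight relationship linear, a and b can be recovered with an ordinary least-squares line through (log L, log W). The sketch below uses natural logarithms and synthetic, noise-free data generated with hypothetical values a = 1e-5 and b = 3 (lengths in mm, weights in g). Published a values depend on the length and weight units used, so they should only be applied with matching units.

```python
import math


def fit_length_weight(lengths, weights):
    """Estimate a and b in W = a * L**b by OLS on log(W) = log(a) + b*log(L)."""
    xs = [math.log(length) for length in lengths]
    ys = [math.log(weight) for weight in weights]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                       # slope of the log-log line
    a = math.exp(mean_y - b * mean_x)   # back-transform the intercept log(a)
    return a, b


def predict_weight(a, b, length):
    """Weight predicted from length with the fitted power function."""
    return a * length ** b


# Synthetic fish: hypothetical a = 1e-5, b = 3 (lengths in mm, weights in g).
lengths = [50.0, 100.0, 150.0, 200.0]
weights = [1e-5 * L ** 3 for L in lengths]
a, b = fit_length_weight(lengths, weights)
print(a, b, predict_weight(a, b, 120.0))
```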

  11. Data from: Assessing predictive performance of supervised machine learning...

    • data.niaid.nih.gov
    • datadryad.org
    • +1more
    zip
    Updated May 23, 2023
    Cite
    Evans Omondi (2023). Assessing predictive performance of supervised machine learning algorithms for a diamond pricing model [Dataset]. http://doi.org/10.5061/dryad.wh70rxwrh
    Available download formats: zip
    Dataset updated
    May 23, 2023
    Dataset provided by
    Strathmore University
    Authors
    Evans Omondi
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    The diamond is 58 times harder than any other mineral in the world, and its elegance as a jewel has long been appreciated. Forecasting diamond prices is challenging due to nonlinearity in important features such as carat, cut, clarity, table, and depth. Against this backdrop, the study conducted a comparative analysis of the performance of multiple supervised machine learning models (regressors and classifiers) in predicting diamond prices. Eight supervised machine learning algorithms were evaluated: Multiple Linear Regression, Linear Discriminant Analysis, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Support Vector Machines, Boosted Regression and Classification Trees, and Multi-Layer Perceptron. The analysis comprised data preprocessing, exploratory data analysis (EDA), training the aforementioned models, assessing their accuracy, and interpreting their results. Based on the performance metrics, eXtreme Gradient Boosting was the best-performing algorithm in both classification and regression, with an R² score of 97.45% and an accuracy of 74.28%. As a result, eXtreme Gradient Boosting was recommended as the optimal regressor and classifier for forecasting the price of a diamond specimen.

    Methods: Kaggle, a data repository with thousands of datasets, was used in the investigation. It is an online community for machine learning practitioners and data scientists, as well as a robust, well-researched, and sufficient resource for analyzing various data sources. On Kaggle, users can search for and publish datasets, and they can study datasets and construct models in a web-based data-science environment.

  12. Training Dataset for FeatureLTE: Regression (Real)

    • figshare.com
    zip
    Updated Mar 19, 2024
    + more versions
    Cite
    Zhaoyang Wang (2024). Training Dataset for FeatureLTE: Regression (Real) [Dataset]. http://doi.org/10.6084/m9.figshare.25434676.v1
    Available download formats: zip
    Dataset updated
    Mar 19, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Zhaoyang Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The training datasets for the regression task of FeatureLTE. These datasets are all real regression problems.

  13. Data from: Stochastic Regression and Peak Delineation with Flow Cytometry...

    • datasets.ai
    • data.nist.gov
    • +1more
    Updated Aug 13, 2024
    + more versions
    Cite
    National Institute of Standards and Technology (2024). Stochastic Regression and Peak Delineation with Flow Cytometry Data [Dataset]. https://datasets.ai/datasets/stochastic-regression-and-peak-delineation-with-flow-cytometry-data
    Dataset updated
    Aug 13, 2024
    Dataset authored and provided by
    National Institute of Standards and Technology
    Description

    This data repository contains original files (.fcs) from flow cytometry experiments. The data were used to demonstrate the use of stochastic regression to quantify subpopulations of cells that have distinctly different genome copies per cell within a heterogeneous population of Escherichia coli (E. coli) cells. This new approach gives estimates of signal and noise; the former is used for analysis, and the latter is used to quantify uncertainty. By separating these two components, the signal and noise can be compared independently to evaluate measurement quality across different experimental conditions. The files contain experiments from a single stock of Escherichia coli cells that was diluted to different concentrations, stained with Hoechst33342, and acquired on a CytoFLEX LX under the same acquisition conditions. "Control_Hoechst" is a biological control sample stained only with Hoechst. "RainbowBeads" is a control of hard-dyed fluorescent beads with 8 distinct peaks of known fluorescent intensities per manufacturer documentation. "Test_double" indicates test samples with double fluorescent probe staining; the fractional number (e.g., 0.7) indicates the dilution factor from the stock, and the integer at the end represents the technical replicate. The downloaded Exp_20230921_1_Cyto-A-journal.zip file contains 14 files in .fcs format, which require suitable software to read and analyze (e.g., FCS Express).

  14. Dataset for Linear regression

    • kaggle.com
    Updated Mar 6, 2025
    Cite
    Vivek (2025). Dataset for Linear regression [Dataset]. https://www.kaggle.com/datasets/vivekk7/dataset-for-linear-regression/data
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vivek
    Description

    Dataset

    This dataset was created by Vivek

    Contents

  15. Data for Regression Models to Estimate Water Use in Providence, Rhode...

    • data.usgs.gov
    • datasets.ai
    • +1more
    Updated Apr 19, 2023
    + more versions
    Cite
    Laura Medalie; Timothy Stagnitta (2023). Data for Regression Models to Estimate Water Use in Providence, Rhode Island, 2014-2021 [Dataset]. http://doi.org/10.5066/P91H5QOY
    Dataset updated
    Apr 19, 2023
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Laura Medalie; Timothy Stagnitta
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Jul 2014 - Jun 2021
    Area covered
    Providence, Rhode Island
    Description

    This data release contains input data and programs (scripts) used to estimate monthly water demand for retail customers of Providence Water, located in Providence, Rhode Island. Explanatory data and model outputs are from July 2014 through June 2021. Models of per capita (for single-family residential customers) or per connection (for multi-family residential, commercial, and industrial customers) water use were developed using multiple linear regression. The dependent variables, provided by Providence Water, are the monthly number of connections and gallons of water delivered to single- and multi-family residential, commercial, and industrial connections. Potential independent variables (from online sources) are climate variables (temperature and precipitation), economic statistics, and a drought statistic. Not all independent variables were used in all of the models. The data are provided in data tables and model files. The data table RIWaterUseVariableExplanation.csv describes ...

  16. Student Performance (Multiple Linear Regression) Dataset

    • cubig.ai
    Updated May 29, 2025
    Cite
    CUBIG (2025). Student Performance (Multiple Linear Regression) Dataset [Dataset]. https://cubig.ai/store/products/392/student-performance-multiple-linear-regression-dataset
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction • The Student Performance (Multiple Linear Regression) Dataset is designed to analyze the relationship between students’ learning habits and academic performance. Each sample includes key indicators related to learning, such as study hours, sleep duration, previous test scores, and the number of practice exams completed.

    2) Data Utilization (1) Characteristics of the Student Performance (Multiple Linear Regression) Dataset: • The target variable, Hours Studied, quantitatively represents the amount of time a student has invested in studying. The dataset is structured to allow modeling and inference of learning behaviors based on correlations with other variables.

    (2) Applications of the Student Performance (Multiple Linear Regression) Dataset: • AI-Based Study Time Prediction Models: The dataset can be used to develop regression models that estimate a student’s expected study time based on inputs like academic performance, sleep habits, and engagement patterns. • Behavioral Analysis and Personalized Learning Strategies: It can be applied to identify students with insufficient study time and design personalized study interventions based on academic and lifestyle patterns.

  17. Data from: Dataset of the paper “Variable selection for linear regression in...

    • investigacion.ubu.es
    Updated 2020
    Cite
    Pacheco Bonrostro, Joaquín; Casado Yusta, Silvia (2020). Dataset of the paper “Variable selection for linear regression in large databases: exact methods” Applied Intelligence, 51(6), 3736-3756 [Dataset]. https://investigacion.ubu.es/documentos/682afba74c44bf76b28811e1
    Dataset updated
    2020
    Authors
    Pacheco Bonrostro, Joaquín; Casado Yusta, Silvia
    Description

    The variable selection problem in the context of linear regression for large databases is analysed. The problem consists of selecting a small subset of independent variables that can perform the prediction task optimally. This problem has a wide range of applications. One important type of application is the design of composite indicators in various areas (sociology and economics, for example). Other important applications of variable selection in linear regression can be found in fields such as chemometrics, genetics, and climate prediction, among many others. For this problem, we propose a Branch & Bound method. This is an exact method and therefore guarantees optimal solutions. We also provide strategies that enable this method to be applied to very large databases (with hundreds of thousands of cases) in a moderate computation time. A series of computational experiments shows that our method performs well compared with well-known methods in the literature and with commercial software.
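    For contrast with the exact Branch & Bound method described in this entry, the simplest exact approach is to enumerate every subset of a given size and keep the one with the lowest residual sum of squares; Branch & Bound reaches the same optimum while pruning subsets that cannot beat the best one found so far. The NumPy sketch below is that plain exhaustive search, not the authors' algorithm, and the synthetic data (in which only columns 1 and 4 matter) are hypothetical. Its cost grows with the binomial coefficient C(p, k), which is why it does not scale to many candidate variables.

```python
from itertools import combinations

import numpy as np


def best_subset(X, y, k):
    """Exhaustively find the k columns of X whose OLS fit (with intercept)
    has the smallest residual sum of squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    best_cols, best_rss = None, np.inf
    for cols in combinations(range(p), k):
        design = np.column_stack([np.ones(n), X[:, list(cols)]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if rss < best_rss:
            best_cols, best_rss = cols, rss
    return best_cols, best_rss


# Hypothetical data: 200 cases, 6 candidate variables, only columns 1 and 4 used.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4]
cols, rss = best_subset(X, y, 2)
print(cols, rss)
```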

  18. Quantile Regression Code and Data

    • figshare.com
    zip
    Updated Sep 10, 2019
    Cite
    Marzieh Shahmandi; Paul Wilson; Mike Thelwall (2019). Quantile Regression Code and Data [Dataset]. http://doi.org/10.6084/m9.figshare.9700466.v1
    Available download formats: zip
    Dataset updated
    Sep 10, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Marzieh Shahmandi; Paul Wilson; Mike Thelwall
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The SAS and R code and the accompanying data relate to the paper "The Advantages of Quantile Regression in Scientometrics".

  19. SPSS Data Set S1 Logistic Regression Model Data

    • figshare.com
    bin
    Updated Jan 19, 2016
    Cite
    Michelle Klailova; Phyllis Lee (2016). SPSS Data Set S1 Logistic Regression Model Data [Dataset]. http://doi.org/10.6084/m9.figshare.1051748.v2
    Available download formats: bin
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Michelle Klailova; Phyllis Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data set from the PLOS ONE article entitled "Western Lowland Gorillas Signal Selectively Using Odor".

  20. Simple Housing Dataset for Regression!

    • kaggle.com
    Updated Apr 19, 2024
    Cite
    Sujith K Mandala (2024). Simple Housing Dataset for Regression! [Dataset]. http://doi.org/10.34740/kaggle/dsv/4577500
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 19, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sujith K Mandala
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A straightforward regression dataset in which every column is of integer (INT) type. Improve your Machine Learning - Regression skills through this dataset!

    The dataset contains these columns:

    • Transaction date INT
    • House Age INT
    • Distance from nearest Metro station (km) INT
    • Number of convenience stores INT
    • latitude INT
    • longitude INT
    • Number of bedrooms INT
    • House size (sqft) INT
    • House price INT

    Happy Learning!
