67 datasets found
  1. Marketing Linear Multiple Regression

    • kaggle.com
    zip
    Updated Apr 24, 2020
    Cite
    FayeJavad (2020). Marketing Linear Multiple Regression [Dataset]. https://www.kaggle.com/datasets/fayejavad/marketing-linear-multiple-regression
    Explore at:
    zip (1907 bytes)
    Dataset updated
    Apr 24, 2020
    Authors
    FayeJavad
    Description

    Dataset

    This dataset was created by FayeJavad

    Contents

  2. Startup - Multiple Linear Regression

    • kaggle.com
    zip
    Updated Jan 29, 2018
    Cite
    karthickveerakumar (2018). Startup - Multiple Linear Regression [Dataset]. https://www.kaggle.com/datasets/karthickveerakumar/startup-logistic-regression
    Explore at:
    zip (1330 bytes)
    Dataset updated
    Jan 29, 2018
    Authors
    karthickveerakumar
    Description

    Dataset

    This dataset was created by karthickveerakumar

    Contents

  3. Cars Dataset

    • kaggle.com
    zip
    Updated Mar 6, 2020
    Cite
    APriyanka (2020). Cars Dataset [Dataset]. https://www.kaggle.com/datasets/apriyanka/cars-dataset
    Explore at:
    zip (1415 bytes)
    Dataset updated
    Mar 6, 2020
    Authors
    APriyanka
    Description

    Dataset

    This dataset was created by APriyanka

    Contents

  4. Multiple Linear Regression using One-Hot Encoding

    • kaggle.com
    zip
    Updated Dec 3, 2021
    Cite
    Azmary (2021). Multiple Linear Regression using One-Hot Encoding [Dataset]. https://www.kaggle.com/datasets/azmary/multiple-linear-regression-using-onehot-encoding
    Explore at:
    zip (4609 bytes)
    Dataset updated
    Dec 3, 2021
    Authors
    Azmary
    Description

    Dataset

    This dataset was created by Azmary

    Contents

  5. Multi Variable-Linear Regression-ML

    • kaggle.com
    zip
    Updated Aug 7, 2022
    Cite
    Parisan Ahmadi (2022). Multi Variable-Linear Regression-ML [Dataset]. https://www.kaggle.com/datasets/parisanahmadi/multi-variablelinear-regressionml
    Explore at:
    zip (427 bytes)
    Dataset updated
    Aug 7, 2022
    Authors
    Parisan Ahmadi
    Description

    Dataset

    This dataset was created by Parisan Ahmadi

    Contents

  6. polynomial regression

    • kaggle.com
    Updated Jul 5, 2023
    Cite
    Miraj Deep Bhandari (2023). polynomial regression [Dataset]. http://doi.org/10.34740/kaggle/ds/3482232
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 5, 2023
    Dataset provided by
    Kaggle
    Authors
    Miraj Deep Bhandari
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Ice Cream Selling dataset is a simple and well-suited dataset for beginners in machine learning who are looking to practice polynomial regression. It consists of two columns: temperature and the corresponding number of units of ice cream sold.

    The dataset captures the relationship between temperature and ice cream sales. It serves as a practical example for understanding and implementing polynomial regression, a powerful technique for modeling nonlinear relationships in data.

    The dataset is designed to be straightforward and easy to work with, making it ideal for beginners. The simplicity of the data allows beginners to focus on the fundamental concepts and steps involved in polynomial regression without overwhelming complexity.

    By using this dataset, beginners can gain hands-on experience in preprocessing the data, splitting it into training and testing sets, selecting an appropriate degree for the polynomial regression model, training the model, and evaluating its performance. They can also explore techniques to address potential challenges such as overfitting.

    With this dataset, beginners can practice making predictions of ice cream sales based on temperature inputs and visualize the polynomial regression curve that represents the relationship between temperature and ice cream sales.

    Overall, the Ice Cream Selling dataset provides an accessible and practical learning resource for beginners to grasp the concepts and techniques of polynomial regression in the context of analyzing ice cream sales data.
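    The workflow described above (split the data, try several polynomial degrees, compare held-out error) can be sketched as follows. This is a minimal illustration using only the Python standard library. The data are synthetic stand-ins generated in the code, not the contents of the Kaggle file, and the helper names (`solve`, `polyfit`, `predict`, `mse`) are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first (normal equations)."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(ata, aty)

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial on the given points."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic stand-in data (assumption): sales are quadratic in temperature plus noise.
rng = random.Random(0)
temps = [rng.uniform(-5, 5) for _ in range(60)]
sales = [3.0 + 0.5 * t + 2.0 * t * t + rng.gauss(0, 1.0) for t in temps]

# Hold out the last 20 points as a test set.
train_x, test_x = temps[:40], temps[40:]
train_y, test_y = sales[:40], sales[40:]

# Fit degrees 1 to 3 on the training set and compare test-set error.
test_errors = {}
for d in (1, 2, 3):
    test_errors[d] = mse(polyfit(train_x, train_y, d), test_x, test_y)
best_degree = min(test_errors, key=test_errors.get)
```

    Comparing error on held-out points rather than on the training points is one simple guard against overfitting: a higher degree always fits the training data at least as well, but not necessarily the test data.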

  7. Data from: Data for Regression Models to Estimate Water Use in Providence,...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Nov 19, 2025
    Cite
    U.S. Geological Survey (2025). Data for Regression Models to Estimate Water Use in Providence, Rhode Island, 2014-2021 [Dataset]. https://catalog.data.gov/dataset/data-for-regression-models-to-estimate-water-use-in-providence-rhode-island-2014-2021
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Rhode Island, Providence
    Description

    This data release contains input data and programs (scripts) used to estimate monthly water demand for retail customers of Providence Water, located in Providence, Rhode Island. Explanatory data and model outputs are from July 2014 through June 2021. Models of per capita (for single-family residential customers) or per connection (for multi-family residential, commercial, and industrial customers) water use were developed using multiple linear regression. The dependent variables, provided by Providence Water, are the monthly number of connections and gallons of water delivered to single- and multi-family residential, commercial, and industrial connections. Potential independent variables (from online sources) are climate variables (temperature and precipitation), economic statistics, and a drought statistic. Not all independent variables were used in all of the models. The data are provided in data tables and model files. The data table RIWaterUseVariableExplanation.csv describes the explanatory variables and their data sources. The data table ProvModelInputData.csv provides the monthly water-use data that are the dependent variables and the monthly climatic and economic data that are the independent variables. The data table DroughtInputData.csv provides the weekly U.S. drought monitor index values that were processed to formulate a potential independent variable. The R script model_water_use.R runs the models that predict water use. The other two R scripts (load_preprocess_input_data.R and model_water_use_functions.R) are not run explicitly but are called from the primary script model_water_use.R. Regression equations produced by the models can be used to predict water demand throughout Rhode Island.

  8. Calibration datasets and model archive summaries for regression models...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Oct 22, 2025
    Cite
    U.S. Geological Survey (2025). Calibration datasets and model archive summaries for regression models developed to estimate metal concentrations at nine sites on the Animas and San Juan Rivers, Colorado, New Mexico, and Utah: U.S. Geological Survey data release, https://doi.org/10.5066/P9THSFE0 [Dataset]. https://catalog.data.gov/dataset/calibration-datasets-and-model-archive-summaries-for-regression-models-developed-to-estima
    Explore at:
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    San Juan River, New Mexico, Utah, Colorado
    Description

    This data release supports the following publication: Mast, M. A., 2018, Estimating metal concentrations with regression analysis and water-quality surrogates at nine sites on the Animas and San Juan Rivers, Colorado, New Mexico, and Utah: U.S. Geological Survey Scientific Investigations Report 2018-5116. The U.S. Geological Survey (USGS), in cooperation with the U.S. Environmental Protection Agency (EPA), developed site-specific regression models to estimate concentrations of selected metals at nine USGS streamflow-gaging stations along the Animas and San Juan Rivers. Multiple linear-regression models were developed by relating metal concentrations in discrete water-quality samples to continuously monitored streamflow and surrogate parameters including specific conductance, pH, turbidity, and water temperature. Models were developed for dissolved and total concentrations of aluminum, arsenic, cadmium, iron, lead, manganese, and zinc using water-quality samples collected during 2005–17 by several agencies, using different collection methods and analytical laboratories. Calibration datasets in comma-separated format (CSV) include the variables of sampling date and time, metal concentrations (in micrograms per liter), stream discharge (in cubic feet per second), specific conductance (in microsiemens per centimeter at 25 degrees Celsius), pH, water temperature (in degrees Celsius), turbidity (in nephelometric turbidity units), and calculated seasonal terms based on Julian day. Surrogate parameters and discrete water-quality samples were used from nine sites including Cement Creek at Silverton, Colo. (USGS station 09358550); Animas River below Silverton, Colo. (USGS station 09359020); Animas River at Durango, Colo. (USGS station 09361500); Animas River near Cedar Hill, N. Mex. (USGS station 09363500); Animas River below Aztec, N. Mex. (USGS station 09364010); San Juan River at Farmington, N. Mex. (USGS station 09365000); San Juan River at Shiprock, N. Mex. (USGS station 09368000); San Juan River at Four Corners, Colo. (USGS station 09371010); and San Juan River near Bluff, Utah (USGS station 09379500). Model archive summaries in PDF format include model statistics, data, and plots and were generated using an R script developed by the USGS Kansas Water Science Center, available at https://patrickeslick.github.io/ModelArchiveSummary/. A description of each USGS streamflow-gaging station, along with information about the calibration datasets, is also provided.
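    The description mentions "calculated seasonal terms based on Julian day" without giving a formula. One common convention, assumed here and not confirmed by the data release, is a sine/cosine pair of the day of year, which lets a linear regression capture an annual cycle. The function name `seasonal_terms` and the 365.25-day period are illustrative choices.

```python
import math
from datetime import date

def seasonal_terms(d, period=365.25):
    """Return (sin, cos) of the annual angle for the day of year of date d.

    Assumption: the period of 365.25 days and the sin/cos form are a common
    convention, not necessarily the one used in the USGS calibration files.
    """
    day_of_year = d.timetuple().tm_yday  # 1 for January 1
    angle = 2.0 * math.pi * day_of_year / period
    return math.sin(angle), math.cos(angle)

# Example: seasonal regressors for a hypothetical sampling date.
s, c = seasonal_terms(date(2015, 8, 5))
```

    Both terms would then enter the regression as ordinary explanatory variables alongside streamflow and the surrogate parameters.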

  9. Datasets used to train and test prediction model to predict scores in terms...

    • data.mendeley.com
    Updated Mar 5, 2025
    Cite
    Jarosław Wątróbski (2025). Datasets used to train and test prediction model to predict scores in terms of SDG 7 realization [Dataset]. http://doi.org/10.17632/6c8fm7s4y2.1
    Explore at:
    Dataset updated
    Mar 5, 2025
    Authors
    Jarosław Wątróbski
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The datasets used in this research work relate to the aims of Sustainable Development Goal 7 (SDG 7). They were used to train and test a machine learning model based on an artificial neural network, as well as other machine learning regression models, for predicting scores for the realization of SDG 7 aims. The train dataset was created from data for 2013 to 2021 and includes 261 samples. The test dataset includes 29 samples. Source data from 2013 to 2022 are available in 10 XLSX and CSV files. The train and test datasets are available in XLSX and CSV files. A detailed description of the data is available in a PDF file.

  10. Cancer Regression

    • kaggle.com
    Updated Apr 14, 2024
    Cite
    Varun Raskar (2024). Cancer Regression [Dataset]. https://www.kaggle.com/datasets/varunraskar/cancer-regression
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 14, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Varun Raskar
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The dataset contains 2 .csv files.

    File 1st

    This file contains various demographic and health-related data for different regions. Here's a brief description of each column:

    avganncount: Average number of cancer cases diagnosed annually.

    avgdeathsperyear: Average number of deaths due to cancer per year.

    target_deathrate: Target death rate due to cancer.

    incidencerate: Incidence rate of cancer.

    medincome: Median income in the region.

    popest2015: Estimated population in 2015.

    povertypercent: Percentage of population below the poverty line.

    studypercap: Per capita number of cancer-related clinical trials conducted.

    binnedinc: Binned median income.

    medianage: Median age in the region.

    pctprivatecoveragealone: Percentage of population covered by private health insurance alone.

    pctempprivcoverage: Percentage of population covered by employee-provided private health insurance.

    pctpubliccoverage: Percentage of population covered by public health insurance.

    pctpubliccoveragealone: Percentage of population covered by public health insurance only.

    pctwhite: Percentage of White population.

    pctblack: Percentage of Black population.

    pctasian: Percentage of Asian population.

    pctotherrace: Percentage of population belonging to other races.

    pctmarriedhouseholds: Percentage of married households.

    birthrate: Birth rate in the region.

    File 2nd

    This file contains demographic information about different regions, including details about household size and geographical location. Here's a description of each column:

    statefips: The FIPS code representing the state.

    countyfips: The FIPS code representing the county or census area within the state.

    avghouseholdsize: The average household size in the region.

    geography: The geographical location, typically represented as the county or census area name followed by the state name.

    Each row in the file represents a specific region, providing details about household size and geographical location. This information can be used for various demographic analyses and studies.

  11. Cleaned NHANES 1988-2018

    • figshare.com
    txt
    Updated Feb 18, 2025
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
    Explore at:
    txt
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data that have considerable potential for studying the health and environmental exposure of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, processing these data is required before deriving new insights through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey:

      + demographics (281 variables),
      + dietary consumption (324 variables),
      + physiological functions (1,040 variables),
      + occupation (61 variables),
      + questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood),
      + medications (29 variables),
      + mortality information linked from the National Death Index (15 variables),
      + survey weights (857 variables),
      + environmental exposure biomarker measurements (598 variables), and
      + chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

    csv Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets involve 20 .csv formatted files, two for each module, with one as the uncleaned version and the other as the cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments.

      + "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES.
      + "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables.
      + "dictionary_drug_codes.csv" contains the dictionary for descriptors on the drug codes.
      + "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

    R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded as a .zip file, which includes an .RData file and an .R file.

      + "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data.
      + "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e. our pipeline) to curate the original NHANES data.

    Example starter codes: The set of starter code to help users conduct exposome analysis consists of four R markdown files (.Rmd). We recommend going through the tutorials in order.

      + "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together.
      + "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model.
      + "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design.
      + "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.

  12. Data for: Identification of hindered internal rotational mode for complex...

    • narcis.nl
    • data.mendeley.com
    Updated Nov 8, 2017
    Cite
    Le, T (via Mendeley Data) (2017). Data for: Identification of hindered internal rotational mode for complex chemical species: A data mining approach with multivariate logistic regression model [Dataset]. http://doi.org/10.17632/d37mzs3b3m.2
    Explore at:
    Dataset updated
    Nov 8, 2017
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Le, T (via Mendeley Data)
    Description

    The "Dataset_HIR" folder contains the data to reproduce the results of the data mining approach proposed in the manuscript titled "Identification of hindered internal rotational mode for complex chemical species: A data mining approach with multivariate logistic regression model".

    More specifically, the folder contains the raw electronic structure calculation input data provided by the domain experts as well as the training and testing dataset with the extracted features.

    The "Dataset_HIR" folder contains the following subfolders namely:

    1. Electronic structure calculation input data: contains the electronic structure calculation input generated by the Gaussian program

      1.1. Training data: contains the raw data of all training species (each is stored in a separate folder) used for extracting the dataset for the training and validation phase.

      1.2. Testing data: contains the raw data of all testing species (each is stored in a separate folder) used for extracting data for the testing phase.

    2. Dataset

      2.1. Training dataset: used to produce the results in Tables 3 and 4 in the manuscript

      + datasetTrain_raw.csv: contains the features for all vibrational modes associated with corresponding labeled species to let the chemists select the Hindered Internal Rotor from the list easily for the training and validation steps.  
      
      + datasetTrain.csv: refines the datasetTrain_raw.csv where the names of the species are all removed to transform the dataset into an appropriate form for the modeling and validation steps.
      

      2.2. Testing dataset: used to produce the results of the data mining approach in Table 5 in the manuscript.

      + datasetTest_raw.csv: contains the features for all vibrational modes of each labeled species to let the chemists select the Hindered Internal Rotor from the list for the testing step.
      
      + datasetTest.csv: refines the datasetTest_raw.csv where the names of the species are all removed to transform the dataset into an appropriate form for the testing step.
      

    Note for the Result feature in the dataset: 1 is for the mode needed to be treated as Hindered Internal Rotor, and 0 otherwise.
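    To make the role of the binary Result label concrete, the following sketch fits a logistic regression by batch gradient descent using only the Python standard library. The two-feature rows are hypothetical values invented for illustration; they are not taken from datasetTrain.csv, and the manuscript's actual features and fitting procedure are not shown here.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            err = p - y  # gradient of the log-loss with respect to the score
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

def predict(w, x):
    """Return 1 (treat as Hindered Internal Rotor) or 0 (otherwise)."""
    score = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical feature rows (assumption) with Result labels: 1 = HIR, 0 = otherwise.
rows = [[0.2, 1.0], [0.4, 0.8], [0.5, 1.2], [1.5, 0.1], [1.8, 0.3], [2.0, 0.2]]
labels = [0, 0, 0, 1, 1, 1]

w = fit_logistic(rows, labels)
preds = [predict(w, x) for x in rows]
```

    With real data, the same 0/1 predictions would be compared against the Result column of the held-out testing dataset rather than the rows used for fitting.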

  13. Data from: Data for Development of Regression Equations to Estimate Flow...

    • gimi9.com
    • data.usgs.gov
    • +1more
    Updated Jul 23, 2025
    Cite
    (2025). Data for Development of Regression Equations to Estimate Flow Durations, Low-Flow Frequencies, and Mean Flows in Connecticut using Data Collected through Water Year 2022 [Dataset]. https://gimi9.com/dataset/data-gov_data-for-development-of-regression-equations-to-estimate-flow-durations-low-flow-frequenci/
    Explore at:
    Dataset updated
    Jul 23, 2025
    Description

    The U.S. Geological Survey (USGS), in cooperation with the Connecticut Department of Energy and Environmental Protection and the Connecticut Department of Transportation, calculated select streamflow statistics and basin characteristics. This U.S. Geological Survey (USGS) data release provides select streamflow statistics and basin characteristics for 118 streamgages in Connecticut and adjacent states compiled using data through water year 2022. A subset of the data from select streamgages with minimally altered flow conditions was used in the development of statewide regression equations to estimate: 1-, 5-, 10-, 25-, 50-, 75-, 90-, 99-percent flow durations; 7-day, 10-year low-flow frequency (7Q10) and 30-day, 2-year low-flow frequency (30Q2); mean flow, spring mean flow, and harmonic mean flow; and monthly and seasonal flow durations ranging from 1- to 99-percent for six bioperiods ⸺ salmonid spawning (November), overwinter (December–February), habitat forming (March-April), clupeid spawning (May), resident spawning (June), and rearing and growth (July–October). The data used in this study were compiled to help provide federal and state governments with the information needed to manage their water resources. This data release contains (1) data files in comma-separated values (.CSV) format of the streamflow statistics and basin characteristics for 118 streamgages, and (2) the USGS Weighted Multiple-Linear Regression (WREG, version 3.0) input and output data files in text format for four sets of regressions (n-day frequency, flow durations, bioperiod flow durations, and mean flows). The WREG input and output files for 47 statewide regression equations are in zipped folders.

  14. Investigating terrestrial isopod abundance in sandplain grassland using a...

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Mckenna-Foster, Andrew (2024). Investigating terrestrial isopod abundance in sandplain grassland using a multiple linear regression [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3841616
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Maria Mitchell Association
    Authors
    Mckenna-Foster, Andrew
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Most North American species of terrestrial isopod (Isopoda) have been introduced from Europe. Sandplain grassland is a globally rare habitat that is abundant on Nantucket Island, Massachusetts, and the abundance of terrestrial isopods in the habitat has never been studied. The objective of this project was to develop a model to explain isopod abundance based on vegetation characteristics within sandplain grassland and use this model to test for land management effects (prescribed burning and mowing) on isopod abundance. I counted terrestrial isopods from 175 pitfall traps set for one week and used multiple linear regression with several selection algorithms to select the best model. The vegetation characteristics I used as regressors do not appear to explain terrestrial isopod abundance well, and the final model contains only the percent grass coverage as a regressor. The model suggests that terrestrial isopods decrease in abundance with increasing grass coverage, and it explains 29 percent of the data. When management effects are incorporated, the model suggests that mowing significantly increases isopod abundance.

    Funding for this project came from the Nantucket Islands Land Bank, Nantucket Land Council, and the Nantucket Biodiversity Initiative.

    Associated vegetation data is in the published "Effects of Sandplain Grassland Management on Spider Richness and Abundance on Nantucket Island" dataset. Sampling methods are in the thesis linked from that dataset.

    allisopodData.csv - isopod counts by trap

    dataDictionary.csv - descriptions of variables

    mckenna-foster_2009.pdf - a report submitted to NBI and used as part of a statistics class at the University of Wisconsin-Green Bay

  15. Summary of datasets used in multiple linear regression models.

    • plos.figshare.com
    txt
    Updated Apr 9, 2024
    Cite
    Ying Wang; Fang Yuan; Yueqian Song; Huaxiang Rao; Lili Xiao; Huilin Guo; Xiaolong Zhang; Mufan Li; Jiayu Wang; Yi zhou Ren; Jie Tian; Jianzhou Yang (2024). Summary of datasets used in multiple linear regression models. [Dataset]. http://doi.org/10.1371/journal.pone.0301420.s004
    Explore at:
    txt
    Dataset updated
    Apr 9, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ying Wang; Fang Yuan; Yueqian Song; Huaxiang Rao; Lili Xiao; Huilin Guo; Xiaolong Zhang; Mufan Li; Jiayu Wang; Yi zhou Ren; Jie Tian; Jianzhou Yang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This data is used to train the Multiple Linear Regression Models. (CSV)

  16. Data_Sheet_2_Correlation between hearing loss and mild cognitive impairment...

    • frontiersin.figshare.com
    • figshare.com
    application/csv
    Updated Jun 7, 2024
    Cite
    Tong Xu; Tao Zong; Jing Liu; Le Zhang; Hai Ge; Rong Yang; Zongtao Liu (2024). Data_Sheet_2_Correlation between hearing loss and mild cognitive impairment in the elderly population: Mendelian randomization and cross-sectional study.CSV [Dataset]. http://doi.org/10.3389/fnagi.2024.1380145.s002
    Explore at:
    application/csv
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    Frontiers
    Authors
    Tong Xu; Tao Zong; Jing Liu; Le Zhang; Hai Ge; Rong Yang; Zongtao Liu
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background: Hearing loss and tinnitus have been linked to mild cognitive impairment (MCI); however, the evidence is limited by ethical and temporal constraints, and few prospective studies have definitively established causation. This study aims to utilize Mendelian randomization (MR) and cross-sectional studies to validate and analyze this association.

    Methods: This study employs a two-step approach. Initially, the genetic data of the European population from the Genome-wide association studies (GWAS) database is utilized to establish the causal relationship between hearing loss and cognitive impairment through Mendelian randomization using the inverse variance weighted (IVW) method. This is achieved by identifying strongly correlated single nucleotide polymorphisms (SNPs), eliminating linkage disequilibrium, and excluding weak instrumental variables. In the second step, 363 elderly individuals from 10 communities in Qingdao, China are assessed and examined using a questionnaire survey and pure tone audiology (PTA). Logistic regression and multiple linear regression were used to analyze the risk factors of MCI in the elderly and to calculate the cutoff values.

    Results: Mendelian randomization studies have shown that hearing loss is a risk factor for MCI in European populations, with a risk ratio of hearing loss to MCI loss of 1.23. The findings of this cross-sectional study indicate that age, tinnitus, and hearing loss emerged as significant risk factors for MCI in univariate logistic regression analysis. Furthermore, multivariate logistic regression analysis identified hearing loss and tinnitus as potential risk factors for MCI. Consistent results were observed in multiple linear regression analysis, revealing that hearing loss and age significantly influenced the development of MCI. Additionally, a notable finding was that the likelihood of MCI occurrence increased by 9% when the hearing threshold exceeded 20 decibels.

    Conclusion: This study provides evidence from genomic and epidemiological investigations indicating that hearing loss may serve as a risk factor for cognitive impairment. While our epidemiological study has found both hearing loss and tinnitus as potential risk factors for cognitive decline, additional research is required to establish a causal relationship, particularly given that tinnitus can manifest as a symptom of various underlying medical conditions.

  17. Energy Consumption Dataset - Linear Regression

    • kaggle.com
    Updated Jan 6, 2025
    Cite
    GOVINDARAM SRIRAM (2025). Energy Consumption Dataset - Linear Regression [Dataset]. https://www.kaggle.com/datasets/govindaramsriram/energy-consumption-dataset-linear-regression
    Dataset updated
    Jan 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    GOVINDARAM SRIRAM
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description: This dataset is designed for predicting energy consumption based on various building features and environmental factors. It contains data for multiple building types, square footage, the number of occupants, appliances used, average temperature, and the day of the week. The goal is to build a predictive model to estimate energy consumption using these attributes.

    The dataset can be used for training machine learning models such as linear regression to forecast energy needs based on the building's characteristics. This is useful for understanding energy demand patterns and optimizing energy consumption in different building types and environmental conditions.
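As a minimal sketch of the linear regression use case described above, the following fits ordinary least squares with the standard library only. The feature names (square footage in thousands, occupants, average temperature), the synthetic rows, and the coefficients used to generate them are assumptions for illustration and are not taken from the dataset. Categorical fields such as building type or day of the week would need one-hot encoding first, which is omitted here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(features, targets):
    """Return [intercept, coef_1, ...] minimizing the squared error."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, feature_row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], feature_row))


# Synthetic rows: (square footage in thousands, occupants, average temperature).
features = [
    (1.0, 10, 20.0), (2.5, 40, 25.0), (4.0, 25, 30.0),
    (3.0, 60, 18.0), (5.5, 80, 22.0), (1.5, 15, 28.0),
]
# Targets generated from an assumed exact linear rule, so the fit recovers it.
targets = [100.0 + 50.0 * s + 2.0 * o + 3.0 * t for s, o, t in features]

coefs = fit_ols(features, targets)
print(coefs)
print(predict(coefs, (2.0, 30, 21.0)))
```

Solving the normal equations directly is fine for a handful of well-scaled features; for real data, a library routine based on QR or SVD is usually the more numerically robust choice.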

  18. Data from: Myoelectric prosthesis control using recurrent convolutional...

    • search.dataone.org
    • resodate.org
    • +1 more
    Updated Jun 10, 2025
    Cite
    Heather E. Williams; Ahmed W. Shehata; Kodi Y. Cheng; Jacqueline S. Hebert; Patrick M. Pilarski (2025). Myoelectric prosthesis control using recurrent convolutional neural network regression mitigates the limb position effect [Dataset]. http://doi.org/10.5061/dryad.rv15dv4ks
    Dataset updated
    Jun 10, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Heather E. Williams; Ahmed W. Shehata; Kodi Y. Cheng; Jacqueline S. Hebert; Patrick M. Pilarski
    Description

    Many myoelectric upper limb prosthesis controllers use pattern recognition, a method that learns and recognizes patterns of electromyographic (EMG) signals produced by the user’s residual limb muscles to predict and execute device movements. Such control becomes unreliable in high limb positions, a problem known as the limb position effect. Pattern recognition often uses a classification algorithm, which is simple to implement but limits user-initiated control to only one device movement at a time, at a single speed. To combat position-related control deficiencies and classification controller constraints, we developed and tested two recurrent convolutional neural network (RCNN) pattern recognition-based solutions: (1) an RCNN classification controller that uses EMG plus positional inertial measurement unit (IMU) signals to offer one-speed, sequential movement control; and (2) an RCNN regression controller that uses the same data capture technique to offer simultaneous control of multiple moveme...

    Myoelectric Prosthesis Control using Recurrent Convolutional Neural Network Regression Mitigates the Limb Position Effect

    Dataset DOI: 10.5061/dryad.rv15dv4ks

    Description of the data and file structure

    This dataset contains calculated metrics from 16 non-disabled participants, each testing two myoelectric prosthesis control strategies: 1) either a recurrent convolutional neural network-based classification model (RCNN-Class) or a recurrent convolutional neural network-based regression model (RCNN-Reg), and 2) a linear discriminant analysis classification baseline (LDA-Baseline).

    Files and variables

    File: RCNN_vs_LDA-Baseline.csv

    Description:

    Variables
    • ParticipantID: Randomly assigned 3-digit participant identification number.
    • ControlStrategy: Which control strategy was used for the given trial: linear discriminant analysis baseline classification (LDA-Baseline) or recurrent convolutional neural network classification wit...

    All participants provided explicit written consent for their de-identified data to be shared in the public domain. Prior to data sharing, all personally identifiable information (PII) was removed to ensure participant anonymity. Electromyographic (EMG) and inertial measurement unit (IMU) data were anonymized by assigning randomized participant codes. The resulting dataset contains only non-identifiable signal data and task labels, ensuring compliance with ethical standards for human subject research and data sharing.

  19. Data from: Using social media and personality traits to assess software...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 20, 2023
    Cite
    Leo Silva; Marília Gurgel de Castro; Miriam Bernardino Silva; Milena Santos; Uirá Kulesza; Margarida Lima; Henrique Madeira (2023). Using social media and personality traits to assess software developers' emotional polarity [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7846995
    Dataset updated
    Apr 20, 2023
    Dataset provided by
    Centre of Informatics and Systems, University of Coimbra, Polo II, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
    Faculty of Psychology and Educational Sciences, University of Coimbra, Colégio Novo Street, 3001-802 Coimbra, Portugal
    Department of Informatics and Applied Mathematics, Federal University of Rio Grande do Norte, 59072-970, Natal, Brazil
    Authors
    Leo Silva; Marília Gurgel de Castro; Miriam Bernardino Silva; Milena Santos; Uirá Kulesza; Margarida Lima; Henrique Madeira
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Companion DATA

    Title: Using social media and personality traits to assess software developers' emotional polarity

    Authors: Leo Moreira Silva, Marília Gurgel Castro, Miriam Bernardino Silva, Milena Santos, Uirá Kulesza, Margarida Lima, Henrique Madeira

    Journal: PeerJ Computer Science

    Github: https://github.com/leosilva/peerj_computer_science_2022

    The folders contain:

    Experiment_Protocol.pdf: document that presents the protocol for recruitment, data collection of public posts from Twitter, criteria for manual analysis, and the assessment of Big Five factors from participants and psychologists. English version.

    /analysis
    • analyzed_tweets_by_psychologists.csv: file containing the manual analysis done by psychologists
    • analyzed_tweets_by_participants.csv: file containing the manual analysis done by participants
    • analyzed_tweets_by_psychologists_solved_divergencies.csv: file containing the manual analysis done by psychologists over 51 divergent tweets' classifications

    /dataset
    • alldata.json: contains the dataset used in the paper

    /ethics_committee
    • committee_response_english_version.pdf: contains the acceptance response of the Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. English version.
    • committee_response_original_portuguese_version: contains the acceptance response of the Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. Portuguese version.
    • committee_submission_form_english_version.pdf: the project submitted to the committee. English version.
    • committee_submission_form_original_portuguese_version.pdf: the project submitted to the committee. Portuguese version.
    • consent_form_english_version.pdf: declaration of free and informed consent completed by participants. English version.
    • consent_form_original_portuguese_version.pdf: declaration of free and informed consent completed by participants. Portuguese version.
    • data_protection_declaration_english_version.pdf: personal data and privacy declaration, according to the European Union General Data Protection Regulation. English version.
    • data_protection_declaration_original_portuguese_version.pdf: personal data and privacy declaration, according to the European Union General Data Protection Regulation. Portuguese version.

    /notebooks
    • General - Charts.ipynb: notebook file containing all charts produced in the study, including those in the paper
    • Statistics - Lexicons and Ensembles.ipynb: notebook file with the statistics for the five lexicons and ensembles used in the study
    • Statistics - Linear Regression.ipynb: notebook file with the multiple linear regression results
    • Statistics - Polynomial Regression.ipynb: notebook file with the polynomial regression results
    • Statistics - Psychologists versus Participants.ipynb: notebook file with the statistics between the psychologists and participants manual analysis
    • Statistics - Working x Non-working.ipynb: notebook file containing the statistical analysis for the tweets posted during work period and those posted outside of working period

    /surveys
    • Demographic_Survey_english_version.pdf: survey inviting participants to enroll in the study. We collect demographic data and participants' authorization to access their public Tweet posts. English version.
    • Demographic_Survey_portuguese_version.pdf: survey inviting participants to enroll in the study. We collect demographic data and participants' authorization to access their public Tweet posts. Portuguese version.
    • Demographic_Survey_answers.xlsx: participants' demographic survey answers
    • ibf_pt_br.doc: the Portuguese version of the Big Five Inventory (BFI) instrument to infer participants' Big Five polarity traits.
    • ibf_en.doc: translation in English of the Portuguese version of the Big Five Inventory (BFI) instrument to infer participants' Big Five polarity traits.
    • ibf_answers.xlsx: participants' and psychologists' answers for BFI

    We have removed any sensitive data from the dataset and from the demographic survey answers to protect participants' privacy and anonymity.

  20. NII Face Mask Dataset

    • data.niaid.nih.gov
    Updated Jan 26, 2022
    Cite
    Trung-Nghia Le; Khanh-Duy Nguyen; Huy H. Nguyen; Junichi Yamagishi; Isao Echizen (2022). NII Face Mask Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5761724
    Dataset updated
    Jan 26, 2022
    Dataset provided by
    National Institute of Informatics, Japan
    University of Information Technology-VNUHCM, Vietnam
    Authors
    Trung-Nghia Le; Khanh-Duy Nguyen; Huy H. Nguyen; Junichi Yamagishi; Isao Echizen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    =====================================================================

    NII Face Mask Dataset v1.0

    =====================================================================

    Authors: Trung-Nghia Le (1), Khanh-Duy Nguyen (2), Huy H. Nguyen (1), Junichi Yamagishi (1), Isao Echizen (1)

    Affiliations: (1) National Institute of Informatics, Japan (2) University of Information Technology-VNUHCM, Vietnam

    National Institute of Informatics Copyright (c) 2021

    Emails: {ltnghia, nhhuy, jyamagis, iechizen}@nii.ac.jp, {khanhd}@uit.edu.vn

    Arxiv: https://arxiv.org/abs/2111.12888
    NII Face Mask Dataset v1.0: https://zenodo.org/record/5761725

    =============================== INTRODUCTION ===============================

    The NII Face Mask Dataset is the first large-scale dataset targeting mask-wearing ratio estimation in street cameras. This dataset contains 581,108 face annotations extracted from 18,088 video frames (1920x1080 pixels) in 17 street-view videos obtained from Rambalac's YouTube channel.

    The videos were taken in multiple places, at various times, before and during the COVID-19 pandemic. The total length of the videos is approximately 56 hours.

    =============================== REFERENCES ===============================

    If you publish using any of the data in this dataset, please cite the following papers:

    Pre-print version

    @article{Nguyen202112888, title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio}, author={Nguyen, Khanh-Duy and Nguyen, Huy H and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao}, archivePrefix={arXiv}, arxivId={2111.12888}, url={https://arxiv.org/abs/2111.12888}, year={2021} }

    Final version

    @INPROCEEDINGS{Nguyen2021EstMaskWearing, author={Nguyen, Khanh-Duy and Nguyen, Huy H. and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao}, booktitle={2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)}, title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio}, year={2021}, pages={1-8}, url={https://ieeexplore.ieee.org/document/9667046}, doi={10.1109/FG52635.2021.9667046}}

    ======================== DATA STRUCTURE ==================================

    1. Directory Structure

    ./NFM
    ├── dataset
    │   ├── train.csv: annotations for the train set.
    │   ├── test.csv: annotations for the test set.
    └── README_v1.0.md

    2. Description of each file in detail.

    We use the same structure for the two CSV files (train.csv and test.csv). Both CSV files have the same columns:
    • <1st column>: video_id (a source video can be found by following the link: https://www.youtube.com/watch?v=)
    • <2nd column>: frame_id (the index of a frame extracted from the source video)
    • <3rd column>: timestamp in milliseconds (the timestamp of a frame extracted from the source video)
    • <4th column>: label (for each annotated face, one of three labels was attached with a bounding box: 'Mask'/'No-Mask'/'Unknown')
    • <5th column>: left
    • <6th column>: top
    • <7th column>: right
    • <8th column>: bottom
    Four coordinates (left, top, right, bottom) were used to denote a face's bounding box.
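The column layout above is enough to sketch how a per-frame mask-wearing ratio could be computed from the annotations. The example below is only an illustration under stated assumptions: it assumes the CSV has no header row, it defines the ratio as Mask / (Mask + No-Mask) and ignores 'Unknown' faces, and the sample rows are made up. The function name `mask_wearing_ratio` is hypothetical and is not part of the dataset's tooling.

```python
import csv
import io
from collections import defaultdict


def mask_wearing_ratio(csv_text):
    """Return {(video_id, frame_id): ratio of 'Mask' among labeled faces}."""
    counts = defaultdict(lambda: {"Mask": 0, "No-Mask": 0})
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 8:
            continue  # skip blank or malformed lines
        video_id, frame_id, label = row[0], row[1], row[3]
        # Assumption: 'Unknown' faces are excluded from the ratio.
        if label in ("Mask", "No-Mask"):
            counts[(video_id, frame_id)][label] += 1
    return {
        key: c["Mask"] / (c["Mask"] + c["No-Mask"])
        for key, c in counts.items()
    }


# Made-up rows in the documented order:
# video_id, frame_id, timestamp, label, left, top, right, bottom
sample = """vidA,0,0,Mask,10,10,50,50
vidA,0,0,No-Mask,60,10,100,50
vidA,0,0,Unknown,110,10,150,50
vidA,1,33,Mask,12,11,52,51
"""

ratios = mask_wearing_ratio(sample)
print(ratios)
```

Frames that contain only 'Unknown' faces never enter the result, which avoids a division by zero; whether such frames should instead be reported with an undefined ratio is a design choice left open here.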

    ============================== COPYING ================================

    This repository is made available under Creative Commons Attribution License (CC-BY).

    Regarding Creative Commons License: Attribution 4.0 International (CC BY 4.0), please see https://creativecommons.org/licenses/by/4.0/

    THIS DATABASE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DATABASE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE

    ====================== ACKNOWLEDGEMENTS ================================

    This research was partly supported by JSPS KAKENHI Grants (JP16H06302, JP18H04120, JP21H04907, JP20K23355, JP21K18023), and JST CREST Grants (JPMJCR20D3, JPMJCR18A6), Japan.

    This dataset is based on Rambalac's YouTube channel: https://www.youtube.com/c/Rambalac
