95 datasets found
  1. Data from: Dataset of the paper “Variable selection for linear regression in large databases: exact methods”

    • investigacion.ubu.es
    Updated 2020
    Cite
    Pacheco Bonrostro, Joaquín; Casado Yusta, Silvia (2020). Dataset of the paper “Variable selection for linear regression in large databases: exact methods”. Applied Intelligence, 51(6), 3736-3756 [Dataset]. https://investigacion.ubu.es/documentos/682afba74c44bf76b28811e1
    Dataset updated
    2020
    Authors
    Pacheco Bonrostro, Joaquín; Casado Yusta, Silvia
    Description

    The variable selection problem in the context of Linear Regression for large databases is analysed. The problem consists in selecting a small subset of independent variables that can perform the prediction task optimally. This problem has a wide range of applications. One important type of application is the design of composite indicators in various areas (sociology and economics, for example). Other important applications of variable selection in linear regression can be found in fields such as chemometrics, genetics, and climate prediction, among many others. For this problem, we propose a Branch & Bound method. This is an exact method and therefore guarantees optimal solutions. We also provide strategies that enable this method to be applied in very large databases (with hundreds of thousands of cases) in a moderate computation time. A series of computational experiments shows that our method performs well compared with well-known methods in the literature and with commercial software.
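    The subset-selection objective this abstract describes can be sketched in a few lines. The brute-force search below is a hypothetical illustration of the problem only, not the paper's Branch & Bound algorithm, which prunes this same search space while preserving the optimality guarantee:

    ```python
    from itertools import combinations

    import numpy as np

    def best_subset(X, y, k):
        """Exhaustively find the k-column OLS model with the lowest residual
        sum of squares. Exponential in the number of columns; shown only to
        define the objective that exact methods such as Branch & Bound
        optimise far more efficiently."""
        best_rss, best_idx = np.inf, None
        for idx in combinations(range(X.shape[1]), k):
            Xs = X[:, idx]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ beta) ** 2))
            if rss < best_rss:
                best_rss, best_idx = rss, idx
        return best_idx, best_rss

    # Toy data: y truly depends only on columns 0 and 2.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)
    idx, rss = best_subset(X, y, 2)  # recovers (0, 2)
    ```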

  2. WaterMelon dataset

    • kaggle.com
    Updated Mar 7, 2023
    Cite
    JunHyeok_Choi (2023). WaterMelon dataset [Dataset]. https://www.kaggle.com/datasets/junhyeokchoi99/watermelon-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 7, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    JunHyeok_Choi
    Description

    This dataset was extracted from the academic data below

    An Econometric Model of the Watermelon Market (Suits, 1955): https://www.jstor.org/stable/1233923?seq=1
    Suits' watermelon model (Stewart, 2018): https://www.uvic.ca/socialsciences/economics/assets/docs/seminars/KenStewartBrownBagFeb28.pdf

    Variables:

    • Year: Year (1930-1951)
    • log q (Q): Total number of watermelons available for harvest (millions)
    • log h (X): Watermelons harvested (millions)
    • log p (P): Average farm price of watermelons ($ per 1,000)
    • log pc (C): Average annual net farm receipts per pound of cotton (dollars)
    • log pv (T): Average farm price of vegetables (index)
    • log w (W): Farm wage rates in the South Atlantic States (index)
    • log n (N): US population (millions)
    • log(y/n) (Y/N): Per capita disposable income ($)
    • log pf (F): Railway freight costs for watermelons (index)

  3. Data for Regression Models to Estimate Water Use in Providence, Rhode Island, 2014-2021

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Data for Regression Models to Estimate Water Use in Providence, Rhode Island, 2014-2021 [Dataset]. https://catalog.data.gov/dataset/data-for-regression-models-to-estimate-water-use-in-providence-rhode-island-2014-2021
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Rhode Island, Providence
    Description

    This data release contains input data and programs (scripts) used to estimate monthly water demand for retail customers of Providence Water, located in Providence, Rhode Island. Explanatory data and model outputs are from July 2014 through June 2021. Models of per capita (for single-family residential customers) or per connection (for multi-family residential, commercial, and industrial customers) water use were developed using multiple linear regression. The dependent variables, provided by Providence Water, are the monthly number of connections and gallons of water delivered to single- and multi-family residential, commercial, and industrial connections. Potential independent variables (from online sources) are climate variables (temperature and precipitation), economic statistics, and a drought statistic. Not all independent variables were used in all of the models. The data are provided in data tables and model files. The data table RIWaterUseVariableExplanation.csv describes the explanatory variables and their data sources. The data table ProvModelInputData.csv provides the monthly water-use data that are the dependent variables and the monthly climatic and economic data that are the independent variables. The data table DroughtInputData.csv provides the weekly U.S. drought monitor index values that were processed to formulate a potential independent variable. The R script model_water_use.R runs the models that predict water use. The other two R scripts (load_preprocess_input_data.R and model_water_use_functions.R) are not run explicitly but are called from the primary script model_water_use.R. Regression equations produced by the models can be used to predict water demand throughout Rhode Island.
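    The kind of multiple linear regression described here can be sketched as follows. This is a hypothetical Python illustration (the release itself uses the R scripts named above), with synthetic temperature and precipitation data standing in for the real explanatory variables:

    ```python
    import numpy as np

    def fit_ols(X, y):
        """Ordinary least squares with an intercept column; returns
        coefficients as [intercept, slope_1, slope_2, ...]."""
        X1 = np.column_stack([np.ones(len(X)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return beta

    # Synthetic monthly data for 84 months (July 2014 - June 2021): water use
    # rises with temperature and falls with precipitation. Coefficients and
    # variable choices here are illustrative, not the study's.
    rng = np.random.default_rng(1)
    temperature = rng.uniform(0.0, 30.0, size=84)
    precipitation = rng.uniform(0.0, 15.0, size=84)
    use = (50.0 + 1.2 * temperature - 0.8 * precipitation
           + rng.normal(scale=1.0, size=84))
    beta = fit_ols(np.column_stack([temperature, precipitation]), use)
    # beta[1] and beta[2] estimate the temperature and precipitation effects.
    ```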

  4. Dataset of book subjects that contain The economics of low pay in Britain : a logistic regression approach

    • workwithdata.com
    Updated Nov 7, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain The economics of low pay in Britain : a logistic regression approach [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=The+economics+of+low+pay+in+Britain+%3A+a+logistic+regression+approach&j=1&j0=books
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    This dataset is about book subjects. It has 1 row and is filtered to the book The economics of low pay in Britain : a logistic regression approach. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

  5. Canada per capita income prediction

    • kaggle.com
    Updated Jul 28, 2023
    Cite
    Anagha K P (2023). Canada per capita income prediction [Dataset]. https://www.kaggle.com/datasets/anaghakp/ml-prediction
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 28, 2023
    Dataset provided by
    Kaggle
    Authors
    Anagha K P
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Canada
    Description

    Canada Per Capita Income Dataset: Contextualizing Economic Growth and Trends. This comprehensive dataset features per capita income data for Canada spanning multiple years, providing valuable insights into the country's economic progression. Sourced from reputable economic databases and governmental records, this dataset serves as a valuable resource for analysts, researchers, and policymakers. Inspired by the need for accessible and reliable economic data on Kaggle, this dataset aims to facilitate informed decision-making and foster a deeper understanding of Canada's income dynamics over time.

  6. Replication data for: Big Data: New Tricks for Econometrics

    • openicpsr.org
    Updated May 1, 2014
    Cite
    Hal R. Varian (2014). Replication data for: Big Data: New Tricks for Econometrics [Dataset]. http://doi.org/10.3886/E113925V1
    Dataset updated
    May 1, 2014
    Dataset provided by
    American Economic Association
    Authors
    Hal R. Varian
    Time period covered
    May 1, 2014
    Description

    Computers are now involved in many economic transactions and can capture data associated with these transactions, which can then be manipulated and analyzed. Conventional statistical and econometric techniques such as regression often work well, but there are issues unique to big datasets that may require different tools. First, the sheer size of the data involved may require more powerful data manipulation tools. Second, we may have more potential predictors than appropriate for estimation, so we need to do some kind of variable selection. Third, large datasets may allow for more flexible relationships than simple linear models. Machine learning techniques such as decision trees, support vector machines, neural nets, deep learning, and so on may allow for more effective ways to model complex relationships. In this essay, I will describe a few of these tools for manipulating and analyzing big data. I believe that these methods have a lot to offer and should be more widely known and used by economists.

  7. Results of univariate gamma regression models for direct costs and per capita direct costs according to results of ASSIST for alcohol, cannabis, and cocaine/crack

    • data.mendeley.com
    • narcis.nl
    Updated Feb 19, 2020
    Cite
    Paula Becker (2020). Results of univariate gamma regression models for direct costs and per capita direct costs according to results of ASSIST for alcohol, cannabis, and cocaine/crack [Dataset]. http://doi.org/10.17632/mnj98x2ghc.1
    Dataset updated
    Feb 19, 2020
    Authors
    Paula Becker
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This material presents the results of the univariate gamma regression model for direct costs, which was the first stage of inferential analysis using the linear regression model, carried out to determine which variables could be interacting with total direct cost per capita. The first table shows these data and precedes the multivariate analysis described in the article. The second table shows a more detailed descriptive analysis of per capita direct costs according to the current drug use pattern (evaluated by ASSIST for alcohol, cannabis, and cocaine/crack), including mean, standard deviation, minimum, maximum, first quartile, median, third quartile, and the p value according to the Kruskal-Wallis test. These data refer to the article by Dr. Paula Becker and Dr. Denise Razzouk called “Relationships between age of onset of drug use, use pattern, and direct health costs in a sample of adults’ drug dependents in treatment at a Brazilian community mental health service”.

  8. Medical_cost_dataset

    • kaggle.com
    Updated Aug 19, 2023
    Cite
    Nandita Pore (2023). Medical_cost_dataset [Dataset]. https://www.kaggle.com/datasets/nanditapore/medical-cost-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nandita Pore
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description


    Explore the intricacies of medical costs and healthcare expenses with our meticulously curated Medical Cost Dataset. This dataset offers valuable insights into the factors influencing medical charges, enabling researchers, analysts, and healthcare professionals to gain a deeper understanding of the dynamics within the healthcare industry.

    Columns:

    1. ID: A unique identifier assigned to each individual record, facilitating efficient data management and analysis.
    2. Age: The age of the patient, providing a crucial demographic factor that often correlates with medical expenses.
    3. Sex: The gender of the patient, offering insights into potential cost variations based on biological differences.
    4. BMI: The Body Mass Index (BMI) of the patient, indicating the relative weight status and its potential impact on healthcare costs.
    5. Children: The number of children or dependents covered under the medical insurance, influencing family-related medical expenses.
    6. Smoker: A binary indicator of whether the patient is a smoker or not, as smoking habits can significantly impact healthcare costs.
    7. Region: The geographic region of the patient, helping to understand regional disparities in healthcare expenditure.
    8. Charges: The medical charges incurred by the patient, serving as the target variable for analysis and predictions.

    Whether you're aiming to uncover patterns in medical billing, predict future healthcare costs, or explore the relationships between different variables and charges, our Medical Cost Dataset provides a robust foundation for your research. Researchers can utilize this dataset to develop data-driven models that enhance the efficiency of healthcare resource allocation, insurers can refine pricing strategies, and policymakers can make informed decisions to improve the overall healthcare system.

    Unlock the potential of healthcare data with our comprehensive Medical Cost Dataset. Gain insights, make informed decisions, and contribute to the advancement of healthcare economics and policy. Start your analysis today and pave the way for a healthier future.

  9. County-level dataset accuracy

    • plos.figshare.com
    xls
    Updated Nov 7, 2023
    Cite
    Xiaofeng Xu; Zhaoyuan Chen; Shixiang Chen (2023). County-level dataset accuracy. [Dataset]. http://doi.org/10.1371/journal.pone.0293303.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Nov 7, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Xiaofeng Xu; Zhaoyuan Chen; Shixiang Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Urban economic competitiveness is a fundamental indicator for assessing the level of urban development and serves as an effective approach for understanding regional disparities. Traditional economic competitiveness research that relies solely on traditional regression models and assumes feature relationship theory tends to fall short in fully exploring the intricate interrelationships and nonlinear associations among features. As a result, the study of urban economic disparities remains limited to a narrow range of urban features, which is insufficient for comprehending cities as complex systems. The ability of deep learning neural networks to automatically construct models of nonlinear relationships among complex features provides a new approach to research in this issue. In this study, a complex urban feature dataset comprising 1008 features was constructed based on statistical data from 283 prefecture-level cities in China. Employing a machine learning approach based on convolutional neural network (CNN), a novel analytical model is constructed to capture the interrelationships among urban features, which is applied to achieve accurate classification of urban economic competitiveness. In addition, considering the limited number of samples in the dataset owing to the fixed number of cities, this study developed a data augmentation approach based on deep convolutional generative adversarial network (DCGAN) to further enhance the accuracy and generalization ability of the model. The performance of the CNN classification model was effectively improved by adding the generated samples to the original sample dataset. This study provides a precise and stable analytical model for investigating disparities in regional development. In the meantime, it offers a feasible solution to the limited sample size issue in the application of deep learning in urban research.

  10. Data from: Research Document: Jaouad Karfali Economic Cycle Analysis with Numerical Time Cycles

    • data.mendeley.com
    Updated Feb 26, 2025
    Cite
    Karfali Jaouad (2025). Research Document: Jaouad Karfali Economic Cycle Analysis with Numerical Time Cycles [Dataset]. http://doi.org/10.17632/wv7dcm5834.1
    Dataset updated
    Feb 26, 2025
    Authors
    Karfali Jaouad
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Description: This dataset contains historical economic data spanning from 1871 to 2024, used in Jaouad Karfali’s research on Economic Cycle Analysis with Numerical Time Cycles. The study aims to improve economic forecasting accuracy through the 9-year cycle model, which demonstrates superior predictive capabilities compared to traditional economic indicators.

    Dataset Contents: The dataset includes a comprehensive range of economic indicators used in the research, such as:

    • USGDP_1871-2024.csv – U.S. Gross Domestic Product (GDP) data.
    • USCPI_cleaned.csv – U.S. Consumer Price Index (CPI), cleaned and processed.
    • USWAGE_1871-2024.csv – U.S. average wages data.
    • EXCHANGEGLOBAL_cleaned.csv – Global exchange rates for the U.S. dollar.
    • EXCHANGEPOUND_cleaned.csv – U.S. dollar to British pound exchange rates.
    • INTERESTRATE_1871-2024.csv – U.S. interest rate data.
    • UNRATE.csv – U.S. unemployment rate statistics.
    • POPTOTUSA647NWDB.csv – U.S. total population data.

    Significance of the Data: This dataset serves as a foundation for a robust economic analysis of the U.S. economy over multiple decades. It was instrumental in testing the 9-year economic cycle model, which demonstrated an 85% accuracy rate in economic forecasting when compared to traditional models such as ARIMA and VAR.

    Applications:

    • Economic Forecasting: Predicts a 1.5% decline in GDP in 2025, followed by a gradual recovery between 2026-2034.
    • Economic Stability Analysis: Used for comparing forecasts with estimates from institutions like the IMF and World Bank.
    • Academic and Institutional Research: Supports studies in economic cycles and long-term forecasting.

    Source & Further Information: For more details on the methodology and research findings, refer to the full paper published on SSRN:

    https://ssrn.com/author=7429208 https://orcid.org/0009-0002-9626-7289

  11. Replication Data for: A Regression Analysis of the probability of a recession and student loan debt utilizing data between 1993-2019

    • dataverse.harvard.edu
    Updated Jul 22, 2020
    Cite
    Pranav Krishnan; Yash Patel (2020). Replication Data for: A Regression Analysis of the probability of a recession and student loan debt utilizing data between 1993-2019 [Dataset]. http://doi.org/10.7910/DVN/WNNWCO
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 22, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Pranav Krishnan; Yash Patel
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Over 44.7 million Americans carry student loan debt, with the total amount valued at approximately $1.31 trillion (Quarterly Report, 2019). Ergo, consumer spending, a factor of GDP, is stifled, which negatively impacts the economy (Frizell, 2014, p. 22). This study examined the relationship between student loan debt and the probability of a recession in the near future, as well as the effects of proposed student loan forgiveness policies, through the use of a created model. The Federal Reserve Bank of St. Louis’s website (FRED) was used to extract data on total GDP per quarter and student loan debt per quarter ("Federal Reserve Economic Data," 2019). By combining the student loan debt per quarter and total GDP per quarter datasets, the percentage of total GDP composed of student loan debt per quarter was calculated and fitted to a logistic curve. Future quarterly values for total GDP and for the percentage of total GDP composed of student loan debt were found through Long Short-Term Memory models and Euler’s method, respectively. Through the creation of a probability-of-recession index, the probability of recession per quarter was compared to the percentage of total GDP composed of student loan debt per quarter to construct an exponential regression model. Utilizing a primarily quantitative method of analysis, the percentage of total GDP composed of student loan debt per quarter was found to be strongly associated (p < 1.26696 × 10^-8) with the probability of recession per quarter (p(R)), with p(R) tending to peak as the percentage of total GDP composed of student loan debt per quarter strayed away from the carrying capacity of the logistic curve. Inputting the student loan debt forgiveness policies of potential congressional bills proposed by lawmakers found that eliminating 49.7% and 36.7% of student loan debt would reduce the recession probabilities to 1.73545 × 10^-29 % and 9.74474 × 10^-25 %, respectively.
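    The projection step described, advancing a fitted logistic curve forward in time with Euler's method, can be sketched as follows. The growth rate r, carrying capacity K, and starting value are illustrative placeholders, not the study's fitted values:

    ```python
    def euler_logistic(p0, r, K, dt, steps):
        """Advance dp/dt = r * p * (1 - p / K) with explicit Euler steps
        and return the whole trajectory."""
        path = [p0]
        for _ in range(steps):
            p = path[-1]
            path.append(p + dt * r * p * (1.0 - p / K))
        return path

    # Illustrative parameters: the series rises from 2 toward capacity K = 8.
    path = euler_logistic(p0=2.0, r=0.3, K=8.0, dt=0.25, steps=120)
    # The trajectory increases monotonically and approaches K from below.
    ```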

  12. Tuned hyperparameters for linear regression

    • plos.figshare.com
    xls
    Updated Jun 21, 2024
    + more versions
    Cite
    Sherin Kularathne; Namal Rathnayake; Madhawa Herath; Upaka Rathnayake; Yukinobu Hoshino (2024). Tuned hyperparameters for linear regression. [Dataset]. http://doi.org/10.1371/journal.pone.0303883.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 21, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Sherin Kularathne; Namal Rathnayake; Madhawa Herath; Upaka Rathnayake; Yukinobu Hoshino
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Rice is a crucial crop in Sri Lanka, influencing both its agricultural and economic landscapes. This study delves into the complex interplay between economic indicators and rice production, aiming to uncover correlations and build prediction models using machine learning techniques. The dataset, spanning from 1960 to 2020, includes key economic variables such as GDP, inflation rate, manufacturing output, population, population growth rate, imports, arable land area, military expenditure, and rice production. The study’s findings reveal the significant influence of economic factors on rice production in Sri Lanka. Machine learning models, including Linear Regression, Support Vector Machines, Ensemble methods, and Gaussian Process Regression, demonstrate strong predictive accuracy in forecasting rice production based on economic indicators. These results underscore the importance of economic indicators in shaping rice production outcomes and highlight the potential of machine learning in predicting agricultural trends. The study suggests avenues for future research, such as exploring regional variations and refining models based on ongoing data collection.

  13. Dataset of publication dates of book series where books equals Behind the model : a constructive critique of economic modeling

    • workwithdata.com
    Updated Nov 25, 2024
    + more versions
    Cite
    Work With Data (2024). Dataset of publication dates of book series where books equals Behind the model : a constructive critique of economic modeling [Dataset]. https://www.workwithdata.com/datasets/book-series?col=book_series%2Cj0-publication_date&f=1&fcol0=j0-book&fop0=%3D&fval0=Behind+the+model+%3A+a+constructive+critique+of+economic+modeling&j=1&j0=books
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book series. It has 1 row and is filtered to the book Behind the model : a constructive critique of economic modeling. It features 2 columns including publication dates.

  14. Regression models based on Effects of Taxes and Benefits data

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated May 15, 2019
    Cite
    Office for National Statistics (2019). Regression models based on Effects of Taxes and Benefits data [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/wellbeing/datasets/regressionmodelsbasedoneffectsoftaxesandbenefitsdata
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 15, 2019
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Personal and economic well-being regression models based on effects of taxes and benefits data for the UK for the period April 2016 to March 2017

  15. CALY-SWE: Discrete choice experiment and time trade-off data for a representative Swedish value set

    • researchdata.se
    • data.europa.eu
    Updated Sep 24, 2024
    Cite
    Kaspar Walter Meili; Lars Lindholm (2024). CALY-SWE: Discrete choice experiment and time trade-off data for a representative Swedish value set [Dataset]. http://doi.org/10.5878/asxy-3p37
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Umeå University
    Authors
    Kaspar Walter Meili; Lars Lindholm
    Time period covered
    Jan 8, 2022 - Apr 18, 2022
    Area covered
    Sweden
    Description

    The data consist of two parts: Time trade-off (TTO) data with one row per TTO question (5 questions), and discrete choice experiment (DCE) data with one row per question (6 questions). The purpose of the data is the calculation of a Swedish value set for the capability-adjusted life years (CALY-SWE) instrument. To protect the privacy of the study participants and to comply with GDPR, access to the data is given upon request.

    The data is provided in 4 .csv files with the names:

    • tto.csv (252 kB)
    • dce.csv (282 kB)
    • weights_final_model.csv (30 kB)
    • coefs_final_model.csv (1 kB)

    The first two files (tto.csv, dce.csv) contain the time trade-off (TTO) answers and discrete choice experiment (DCE) answers of participants. The latter two files (weight_final_model.csv, coefs_final_model.csv) contain the generated value set of CALY-SWE weights, and the pertaining coefficients of the main effects additive model.

    Background:

    CALY-SWE is a capability-based instrument for studying Quality of Life (QoL). It consists of 6 attributes (health, social relations, financial situation & housing, occupation, security, political & civil rights), each answered on 3 levels (Agree, Agree partially, Do not agree). A configuration or state is one of the 3^6 = 729 possible situations that the instrument describes. Here, a config is denoted in the form xxxxxx, one x for each attribute in the order above, where each x is a digit corresponding to the level of the respective attribute, with 3 being the highest (Agree) and 1 the lowest (Do not agree). For example, 222222 encodes a configuration with all attributes on level 2 (Partially agree). The purpose of this dataset is to support the publication of the CALY-SWE value set and to enable reproduction of the calculations (due to privacy concerns we abstain from publishing individual-level characteristics). A value set consists of values on the 0 to 1 scale for all 729 configurations, each of which represents a quality weighting, where 1 is the highest capability-related QoL and 0 the lowest.

    The data contains answers to two types of questions: TTO and DCE.

    In TTO questions, participants iteratively chose a number of years between 1 and 10. A choice of 10 years is equivalent to living 10 years with full capability (state configuration 333333) in the capability state that the TTO question describes. The answer on the 0 to 1 scale is then calculated as x/10. In the DCE questions, participants were given two states and chose the state that they found to be better. We used a hybrid model with a linear regression and a logit model component, where the coefficients were linked through a multiplicative factor, to obtain the weights (weights_final_model.csv). Each weight is calculated as the constant plus the coefficients for the respective configuration. Coefficients for level 3 encode the difference to level 2, and coefficients for level 2 the difference to the constant. For example, the weight for 123112 is calculated as constant + socrel2 + finhou2 + finhou3 + polciv2 (no coefficients for health, occupation, and security are involved, as they are on level 1, which is captured in the constant/intercept).
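    The weight construction described above can be sketched as follows. The coefficient values here are made up for illustration; the real ones are in coefs_final_model.csv:

    ```python
    ATTRS = ["health", "socrel", "finhou", "occu", "secu", "polciv"]

    def weight_from_config(config, constant, coefs):
        """config is a 6-digit string such as '123112'; coefs maps names like
        'socrel2' to increments. Level 2 adds the level-2 increment; level 3
        adds both, since level-3 coefficients encode the difference to
        level 2. Level 1 is captured by the constant."""
        w = constant
        for attr, digit in zip(ATTRS, config):
            level = int(digit)
            if level >= 2:
                w += coefs[attr + "2"]
            if level == 3:
                w += coefs[attr + "3"]
        return w

    # Made-up coefficients: every increment is 0.05.
    coefs = {a + lvl: 0.05 for a in ATTRS for lvl in ("2", "3")}
    # 123112 -> constant + socrel2 + finhou2 + finhou3 + polciv2
    #         = 0.4 + 4 * 0.05 = 0.6
    w = weight_from_config("123112", constant=0.4, coefs=coefs)
    ```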

    To assess the quality of TTO answers, we calculated a score per participant that takes into account inconsistencies in answering the TTO question. We then excluded 20% of participants with the worst score to improve the TTO data quality and signal strength for the model (this is indicated by the 'included' variable in the TTO dataset). Details of the entire survey are described in the preprint “CALY-SWE value set: An integrated approach for a valuation study based on an online-administered TTO and DCE survey” by Meili et al. (2023). Please check this document for updated versions.

    Ids have been randomized with preserved linkage between the DCE and TTO dataset.

    Data files and variables:

    Below is a description of the variables in each CSV file.

    • tto.csv:

    config: 6 numbers representing the attribute levels.
    position: The number of the asked TTO question.
    tto_block: The design block of the TTO question.
    answer: The equivalence value indicated by the participant, ranging from 0.1 to 1 in steps of 0.1.
    included: Whether the answer was included in the data for the model used to generate the value set.
    id: Randomized id of the participant.

    • dce.csv:

    config1: Configuration of the first state in the question.
    config2: Configuration of the second state in the question.
    position: The number of the asked DCE question.
    answer: Whether state 1 or 2 was preferred.
    id: Randomized id of the participant.

    • weights_final_model.csv

    config: 6 numbers representing the attribute levels.
    weight: The weight calculated with the final model.
    ciu: The upper 95% credible interval.
    cil: The lower 95% credible interval.

    • coefs_final_model.csv:

    name: Name of the coefficient, composed of an abbreviation for the attribute and a level number (abbreviations in the same order as above: health, socrel, finhou, occu, secu, polciv).
    value: Continuous; the weight on the 0 to 1 scale.
    ciu: The upper 95% credible interval.
    cil: The lower 95% credible interval.

  16. EVOLVE Project GB 2030 Economic Dispatch Model

    • dtechtive.com
    • find.data.gov.scot
    txt, zip
    Updated Jun 9, 2023
    University of Edinburgh. School of Engineering. Institute of Energy Systems (2023). EVOLVE Project GB 2030 Economic Dispatch Model [Dataset]. http://doi.org/10.7488/ds/7469
    Explore at:
    zip(2.731 MB), txt(0.0047 MB), txt(0.0166 MB)Available download formats
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    University of Edinburgh. School of Engineering. Institute of Energy Systems
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    UNITED KINGDOM
    Description

    This dataset contains the code, input sheets, set-up guide and documentation for the economic dispatch model of Great Britain developed in the EVOLVE research project (https://evolveenergy.eu/). Within this project, a novel modelling framework was developed to quantify the potential benefit of including higher proportions of ocean energy within large-scale electricity systems. Economic dispatch modelling is used to model hourly supply-demand matching for a range of sensitivity runs that adjust the proportion of ocean energy within the generation mix. The framework is applied to a 2030 case study of the power system of Great Britain, testing installed wave or tidal stream capacities ranging from 100 MW to 10 GW. The dataset includes everything required to run the model. The project found that, across all sensitivity runs, ocean energy increases renewable dispatch, reduces dispatch costs, reduces generation required from fossil fuels, reduces system carbon emissions, reduces price volatility, and captures higher market prices. The development of this model, and the analysis of its results, is described in detail in a journal paper (currently in press); a preprint is included within the folder. It can be referenced as: S. Pennock, D.R. Noble, Y. Verdanyan, T. Delahaye and H. Jeffrey (2023). 'A modelling framework to quantify the power system benefits from ocean energy deployments'. Applied Energy, Volume 347, 1 October 2023, 121413 (https://doi.org/10.1016/j.apenergy.2023.121413).
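The core idea of economic dispatch, hourly supply-demand matching at minimum cost, can be illustrated with a simple merit-order sketch. This is not the EVOLVE model itself; the generator names, capacities, and marginal costs below are hypothetical.

```python
def dispatch(demand_mw, generators):
    """Meet demand by dispatching the cheapest generators first (merit order).

    generators: list of (name, capacity_mw, marginal_cost) tuples.
    Returns a dict of dispatched output per generator and the total cost.
    """
    remaining = demand_mw
    schedule, cost = {}, 0.0
    for name, cap_mw, cost_per_mwh in sorted(generators, key=lambda g: g[2]):
        output = min(cap_mw, remaining)
        schedule[name] = output
        cost += output * cost_per_mwh
        remaining -= output
        if remaining <= 0:
            break
    return schedule, cost

# Hypothetical fleet: (name, capacity in MW, marginal cost per MWh).
# Zero-marginal-cost renewables (wind, tidal) displace the gas plant.
gens = [("wind", 5.0, 0.0), ("tidal", 2.0, 0.0), ("ccgt", 10.0, 60.0)]
schedule, total_cost = dispatch(12.0, gens)
```

Adding tidal capacity in this toy setup reduces the output required from the gas plant, which mirrors the direction of the project's findings (more renewable dispatch, lower dispatch cost).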

  17. Summary ouput data - Wasteaware Cities Benchmark Indicators - WABI 2023 -...

    • data.niaid.nih.gov
    Updated Jan 26, 2023
    Andrew Whiteman (2023). Summary ouput data - Wasteaware Cities Benchmark Indicators - WABI 2023 - Global data analytics - Machine learning vs. Non-linear Regression [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7570420
    Explore at:
    Dataset updated
    Jan 26, 2023
    Dataset provided by
    Costas A. Velis
    Yoni Gavish
    Andrew Whiteman
    David C. Wilson
    Sue M. Grimes
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the output dataset for the research publication "Socio-economic development drives solid waste management performance in cities: A global analysis using machine learning". It features:

    • Metadata info used by the R codes

    • Summary of results for the two modelling approaches (machine learning: conditional random forest, and non-linear regression)

    The independent variables dataset analysed here refers to specific indicators of the WABI methodology (https://www.sciencedirect.com/science/article/pii/S0956053X14004905), which generates solid waste management and resource recovery profiles for cities. It was applied here to 40 cities around the world. The input data are available here: 10.5281/zenodo.7570174

  18. ‘Boston House Prices-Advanced Regression Techniques’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 1, 2001
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2001). ‘Boston House Prices-Advanced Regression Techniques’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-boston-house-prices-advanced-regression-techniques-bae0/fd606ebf/?iid=003-577&v=presentation
    Explore at:
    Dataset updated
    Feb 1, 2001
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Boston
    Description

    Analysis of ‘Boston House Prices-Advanced Regression Techniques’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/fedesoriano/the-boston-houseprice-data on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Similar Datasets

    • Gender Pay Gap Dataset: LINK
    • California Housing Prices Data (5 new features!): LINK
    • Company Bankruptcy Prediction: LINK

    Context

    The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978.

    Attribute Information

    Input features in order:
    1) CRIM: per capita crime rate by town
    2) ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
    3) INDUS: proportion of non-retail business acres per town
    4) CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
    5) NOX: nitric oxides concentration (parts per 10 million) [parts/10M]
    6) RM: average number of rooms per dwelling
    7) AGE: proportion of owner-occupied units built prior to 1940
    8) DIS: weighted distances to five Boston employment centres
    9) RAD: index of accessibility to radial highways
    10) TAX: full-value property-tax rate per $10,000 [$/10k]
    11) PTRATIO: pupil-teacher ratio by town
    12) B: the result of the equation B = 1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town
    13) LSTAT: % lower status of the population

    Output variable: 1) MEDV: Median value of owner-occupied homes in $1000's [k$]
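The regression task these columns define, predicting MEDV from the input features, can be sketched with ordinary least squares. The numbers below are synthetic stand-ins, not the actual Boston data.

```python
import numpy as np

# Fit MEDV ~ features + intercept by ordinary least squares on synthetic data.
rng = np.random.default_rng(0)
X = rng.random((50, 3))                  # stand-ins for e.g. CRIM, RM, LSTAT
true_w = np.array([-1.0, 5.0, -2.0])     # made-up coefficients
y = X @ true_w + 1.5                     # "MEDV" with intercept 1.5, noise-free
A = np.hstack([X, np.ones((50, 1))])     # append an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
# coef recovers the three slopes and the intercept
```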

    Source

    StatLib - Carnegie Mellon University

    Relevant Papers

    Harrison, David & Rubinfeld, Daniel. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management. 5. 81-102. 10.1016/0095-0696(78)90006-2. LINK

    Belsley, David A. & Kuh, Edwin. & Welsch, Roy E. (1980). Regression diagnostics: identifying influential data and sources of collinearity. New York: Wiley LINK

    --- Original source retains full ownership of the source dataset ---

  19. Results: Analysis of Correlation Between GDP per Capita and Average Height...

    • explore.openaire.eu
    Updated Apr 19, 2021
    Lea Salome Brugger (2021). Results: Analysis of Correlation Between GDP per Capita and Average Height of Young Adults in 2019 in 164 Countries [Dataset]. http://doi.org/10.5281/zenodo.4699900
    Explore at:
    Dataset updated
    Apr 19, 2021
    Authors
    Lea Salome Brugger
    Description

    These are the results obtained by conducting the experiment "Average Height of 19-year-old Males and Females and GDP per Capita in 2019 for 164 Countries". The CSV file contains the raw data produced by processing, filtering and merging the input datasets. There are two rows for each of the 164 countries. Both rows give the country name, country code and GDP per capita; one row contains the average height of 19-year-old males (indicated by the value 'Boys' in the 'Sex' column), while the other displays the average height of 19-year-old females (indicated by the value 'Girls'). Furthermore, there are two PNG files which display the regression plots for the average height of 19-year-old males and females, respectively. Note that the x-scale (for the GDP per capita) is logarithmic.

    References:

    • The World Bank, "GDP per capita (current US$)", Washington, DC: The World Bank, 2021. Accessed on: Apr. 13, 2021. [Online]. Available: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD

    • NCD Risk Factor Collaboration, "Height - Evolution of adult height over time", 2021. Accessed on: Apr. 18, 2021. [Online]. Available: https://ncdrisc.org/data-downloads-height.html (under "Country-specific data for all countries")
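The kind of fit behind these regression plots, average height regressed on GDP per capita with a logarithmic x-scale, can be sketched as a simple least-squares line on log10(GDP). The data points below are invented for illustration; the real data are in the CSV file.

```python
import math

# Simple linear regression of height on log10(GDP per capita).
gdp = [1_000, 5_000, 20_000, 50_000]     # hypothetical GDP per capita, USD
height = [165.0, 170.0, 174.0, 176.0]    # hypothetical average heights, cm

x = [math.log10(g) for g in gdp]
n = len(x)
mx, my = sum(x) / n, sum(height) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, height))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
# slope > 0: taller average height at higher (log) GDP per capita
```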

  20. Data used in "Insights from adding transportation sector detail into an...

    • gimi9.com
    • s.cnmilf.com
    Updated May 8, 2024
    (2024). Data used in "Insights from adding transportation sector detail into an economy-wide model: The case of the ADAGE CGE model" [Dataset]. https://gimi9.com/dataset/data-gov_data-used-in-insights-from-adding-transportation-sector-detail-into-an-economy-wide-model-
    Explore at:
    Dataset updated
    May 8, 2024
    Description

    In this project, we added transportation modes and representation of alternative fuel technologies to a computable general equilibrium (CGE) model (ADAGE), and illustrated the impact of these transportation sector details using AEO oil price scenarios. This dataset includes the model results supporting the publication, "Insights from adding transportation sector detail into an economy-wide model: The case of the ADAGE CGE model." The dataset includes 3 files. "Adage_oilprice_main_results.xlsx" shows the data presented in the results section of the paper. "Adage_oilprice_fixed_factor.xlsx" shows data from sensitivity scenarios presented in Appendix B of the paper. "Adage_oilprice_alternative_nesting.xlsx" shows data from sensitivity scenarios presented in Appendix C of the paper. Citation information for this dataset can be found in the EDG's Metadata Reference Information section and Data.gov's References section.
