The variable selection problem in the context of Linear Regression for large databases is analysed. The problem consists in selecting a small subset of independent variables that can perform the prediction task optimally. This problem has a wide range of applications. One important type of application is the design of composite indicators in various areas (sociology and economics, for example). Other important applications of variable selection in linear regression can be found in fields such as chemometrics, genetics, and climate prediction, among many others. For this problem, we propose a Branch & Bound method. This is an exact method and therefore guarantees optimal solutions. We also provide strategies that enable this method to be applied in very large databases (with hundreds of thousands of cases) in a moderate computation time. A series of computational experiments shows that our method performs well compared with well-known methods in the literature and with commercial software.
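As a concrete illustration of the idea (not the authors' implementation), the following minimal Python sketch performs exact best-subset selection by branch and bound, pruning with the standard fact that adding regressors can never increase the residual sum of squares; all data and sizes in the example are synthetic, and an intercept is omitted for brevity.

```python
# Minimal, illustrative branch-and-bound best-subset selection for OLS.
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit on the given columns (no intercept)."""
    if not cols:
        return float(np.sum((y - y.mean()) ** 2))
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    resid = y - X[:, cols] @ beta
    return float(resid @ resid)

def best_subset(X, y, k):
    """Return (best_rss, best_cols) over all subsets of exactly k columns."""
    p = X.shape[1]
    best = {"rss": np.inf, "cols": None}

    def search(chosen, available):
        if len(chosen) == k:
            val = rss(X, y, chosen)
            if val < best["rss"]:
                best["rss"], best["cols"] = val, list(chosen)
            return
        if len(chosen) + len(available) < k:
            return  # not enough variables left to reach size k
        # Lower bound: RSS with every still-available variable included.
        if rss(X, y, chosen + available) >= best["rss"]:
            return  # no subset below this node can beat the incumbent
        v, rest = available[0], available[1:]
        search(chosen + [v], rest)   # branch: include v
        search(chosen, rest)         # branch: exclude v

    search([], list(range(p)))
    return best["rss"], best["cols"]

# Toy usage with synthetic data: the true model uses columns 1 and 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 2 * X[:, 1] - 3 * X[:, 4] + rng.normal(size=200)
print(best_subset(X, y, k=2))   # should recover columns [1, 4]
```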
This dataset was extracted from the academic sources below:
An Econometric Model of the Watermelon Market (Suits, 1955): https://www.jstor.org/stable/1233923?seq=1
Suits' watermelon model (Stewart, 2018): https://www.uvic.ca/socialsciences/economics/assets/docs/seminars/KenStewartBrownBagFeb28.pdf
Variable descriptions:
Year: Year (1930-1951)
log q (Q): Total number of watermelons available for harvest (millions)
log h (X): Watermelons harvested (millions)
log p (P): Average farm price of watermelons ($ per 1,000)
log pc (C): Average annual net farm receipts per pound of cotton (dollars)
log pv (T): Average farm price of vegetables (index)
log w (W): Farm wage rates in the South Atlantic States (index)
log n (N): US population (millions)
log(y/n) (Y/N): Per capita disposable income ($)
log pf (F): Railway freight costs for watermelons (index)
This data release contains input data and programs (scripts) used to estimate monthly water demand for retail customers of Providence Water, located in Providence, Rhode Island. Explanatory data and model outputs are from July 2014 through June 2021. Models of per capita (for single-family residential customers) or per connection (for multi-family residential, commercial, and industrial customers) water use were developed using multiple linear regression. The dependent variables, provided by Providence Water, are the monthly number of connections and gallons of water delivered to single- and multi-family residential, commercial, and industrial connections. Potential independent variables (from online sources) are climate variables (temperature and precipitation), economic statistics, and a drought statistic. Not all independent variables were used in all of the models. The data are provided in data tables and model files. The data table RIWaterUseVariableExplanation.csv describes the explanatory variables and their data sources. The data table ProvModelInputData.csv provides the monthly water-use data that are the dependent variables and the monthly climatic and economic data that are the independent variables. The data table DroughtInputData.csv provides the weekly U.S. drought monitor index values that were processed to formulate a potential independent variable. The R script model_water_use.R runs the models that predict water use. The other two R scripts (load_preprocess_input_data.R and model_water_use_functions.R) are not run explicitly but are called from the primary script model_water_use.R. Regression equations produced by the models can be used to predict water demand throughout Rhode Island.
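For readers who do not use R, here is a hedged Python sketch of the kind of multiple linear regression described above; the column names (gallons_per_capita, mean_temp, precip, drought_index) are hypothetical stand-ins rather than the actual variable names, which are documented in RIWaterUseVariableExplanation.csv.

```python
# Hedged sketch: monthly water use regressed on climate and drought predictors.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ProvModelInputData.csv")   # monthly observations
# Column names below are assumptions for illustration only.
model = smf.ols(
    "gallons_per_capita ~ mean_temp + precip + drought_index", data=df
).fit()
print(model.summary())                       # coefficients, R^2, p-values

# The fitted equation can then be applied to new monthly climate/economic
# data to predict water demand, as the data release describes.
```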
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 1 row and is filtered to the book The economics of low pay in Britain: a logistic regression approach. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.
https://creativecommons.org/publicdomain/zero/1.0/
Canada Per Capita Income Dataset: Contextualizing Economic Growth and Trends. This comprehensive dataset features per capita income data for Canada spanning multiple years, providing valuable insights into the country's economic progression. Sourced from reputable economic databases and governmental records, this dataset serves as a valuable resource for analysts, researchers, and policymakers. Inspired by the need for accessible and reliable economic data on Kaggle, this dataset aims to facilitate informed decision-making and foster a deeper understanding of Canada's income dynamics over time.
Computers are now involved in many economic transactions and can capture data associated with these transactions, which can then be manipulated and analyzed. Conventional statistical and econometric techniques such as regression often work well, but there are issues unique to big datasets that may require different tools. First, the sheer size of the data involved may require more powerful data manipulation tools. Second, we may have more potential predictors than appropriate for estimation, so we need to do some kind of variable selection. Third, large datasets may allow for more flexible relationships than simple linear models. Machine learning techniques such as decision trees, support vector machines, neural nets, deep learning, and so on may allow for more effective ways to model complex relationships. In this essay, I will describe a few of these tools for manipulating and analyzing big data. I believe that these methods have a lot to offer and should be more widely known and used by economists.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This material presents the results of a univariate gamma regression model for direct costs, which was the first stage of inferential analysis using the linear regression model, carried out to identify which variables could be interacting with total direct cost per capita. The first table shows these data and precedes the multivariate analysis described in the article. The second table shows a more detailed descriptive analysis of per capita direct costs according to the current drug use pattern (evaluated by ASSIST for alcohol, cannabis and cocaine/crack), including mean, standard deviation, minimum, maximum, first quartile, median, third quartile and the p value according to the Kruskal-Wallis test. These data refer to the article by Dr. Paula Becker and Dr. Denise Razzouk entitled "Relationships between age of onset of drug use, use pattern, and direct health costs in a sample of adults' drug dependents in treatment at a Brazilian community mental health service".
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Explore the intricacies of medical costs and healthcare expenses with our meticulously curated Medical Cost Dataset. This dataset offers valuable insights into the factors influencing medical charges, enabling researchers, analysts, and healthcare professionals to gain a deeper understanding of the dynamics within the healthcare industry.
Columns:
1. ID: A unique identifier assigned to each individual record, facilitating efficient data management and analysis.
2. Age: The age of the patient, providing a crucial demographic factor that often correlates with medical expenses.
3. Sex: The gender of the patient, offering insights into potential cost variations based on biological differences.
4. BMI: The Body Mass Index (BMI) of the patient, indicating the relative weight status and its potential impact on healthcare costs.
5. Children: The number of children or dependents covered under the medical insurance, influencing family-related medical expenses.
6. Smoker: A binary indicator of whether the patient is a smoker or not, as smoking habits can significantly impact healthcare costs.
7. Region: The geographic region of the patient, helping to understand regional disparities in healthcare expenditure.
8. Charges: The medical charges incurred by the patient, serving as the target variable for analysis and predictions.
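As one possible starting point (not part of the dataset itself), the following Python sketch fits an ordinary least squares model to the Charges column; the file name medical_cost.csv is an assumption, and the exact column spellings may differ from those listed above.

```python
# Illustrative sketch: linear regression of charges on the remaining columns,
# with one-hot encoding for the categorical fields.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("medical_cost.csv")          # assumed local file name
X = df[["Age", "Sex", "BMI", "Children", "Smoker", "Region"]]
y = df["Charges"]

pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["Sex", "Smoker", "Region"])],
    remainder="passthrough",                  # keep the numeric columns as-is
)
model = make_pipeline(pre, LinearRegression())

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```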
Whether you're aiming to uncover patterns in medical billing, predict future healthcare costs, or explore the relationships between different variables and charges, our Medical Cost Dataset provides a robust foundation for your research. Researchers can utilize this dataset to develop data-driven models that enhance the efficiency of healthcare resource allocation, insurers can refine pricing strategies, and policymakers can make informed decisions to improve the overall healthcare system.
Unlock the potential of healthcare data with our comprehensive Medical Cost Dataset. Gain insights, make informed decisions, and contribute to the advancement of healthcare economics and policy. Start your analysis today and pave the way for a healthier future.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Urban economic competitiveness is a fundamental indicator for assessing the level of urban development and serves as an effective approach for understanding regional disparities. Traditional economic competitiveness research that relies solely on traditional regression models and assumed relationships between features tends to fall short in fully exploring the intricate interrelationships and nonlinear associations among features. As a result, the study of urban economic disparities remains limited to a narrow range of urban features, which is insufficient for comprehending cities as complex systems. The ability of deep learning neural networks to automatically construct models of nonlinear relationships among complex features provides a new approach to research on this issue. In this study, a complex urban feature dataset comprising 1008 features was constructed based on statistical data from 283 prefecture-level cities in China. Employing a machine learning approach based on convolutional neural networks (CNN), a novel analytical model is constructed to capture the interrelationships among urban features, which is applied to achieve accurate classification of urban economic competitiveness. In addition, considering the limited number of samples in the dataset owing to the fixed number of cities, this study developed a data augmentation approach based on a deep convolutional generative adversarial network (DCGAN) to further enhance the accuracy and generalization ability of the model. The performance of the CNN classification model was effectively improved by adding the generated samples to the original sample dataset. This study provides a precise and stable analytical model for investigating disparities in regional development. At the same time, it offers a feasible solution to the limited sample size issue in the application of deep learning in urban research.
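The published model is not reproduced here, but a minimal PyTorch sketch of the general idea (a 1D convolutional classifier over a long urban feature vector) is shown below; the input width of 1008 features comes from the abstract, while the number of classes, layer sizes, and all other details are assumptions.

```python
# Sketch only: a small 1D CNN that classifies cities from a 1008-long feature vector.
import torch
import torch.nn as nn

class CityCNN(nn.Module):
    def __init__(self, n_features=1008, n_classes=4):   # n_classes is assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),      # collapse the length dimension
            nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):                 # x: (batch, 1008)
        return self.net(x.unsqueeze(1))   # add a channel dimension

model = CityCNN()
x = torch.randn(8, 1008)                  # a dummy batch of 8 "cities"
logits = model(x)
print(logits.shape)                       # torch.Size([8, 4])
```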
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description: This dataset contains historical economic data spanning from 1871 to 2024, used in Jaouad Karfali’s research on Economic Cycle Analysis with Numerical Time Cycles. The study aims to improve economic forecasting accuracy through the 9-year cycle model, which demonstrates superior predictive capabilities compared to traditional economic indicators.
Dataset Contents: The dataset includes a comprehensive range of economic indicators used in the research, such as:
USGDP_1871-2024.csv – U.S. Gross Domestic Product (GDP) data.
USCPI_cleaned.csv – U.S. Consumer Price Index (CPI), cleaned and processed.
USWAGE_1871-2024.csv – U.S. average wages data.
EXCHANGEGLOBAL_cleaned.csv – Global exchange rates for the U.S. dollar.
EXCHANGEPOUND_cleaned.csv – U.S. dollar to British pound exchange rates.
INTERESTRATE_1871-2024.csv – U.S. interest rate data.
UNRATE.csv – U.S. unemployment rate statistics.
POPTOTUSA647NWDB.csv – U.S. total population data.

Significance of the Data: This dataset serves as a foundation for a robust economic analysis of the U.S. economy over multiple decades. It was instrumental in testing the 9-year economic cycle model, which demonstrated an 85% accuracy rate in economic forecasting when compared to traditional models such as ARIMA and VAR.
Applications:
Economic Forecasting: Predicts a 1.5% decline in GDP in 2025, followed by a gradual recovery between 2026 and 2034.
Economic Stability Analysis: Used for comparing forecasts with estimates from institutions like the IMF and World Bank.
Academic and Institutional Research: Supports studies in economic cycles and long-term forecasting.

Source & Further Information: For more details on the methodology and research findings, refer to the full paper published on SSRN:
https://ssrn.com/author=7429208 https://orcid.org/0009-0002-9626-7289
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Over 44.7 million Americans carry student loan debt, with the total amount valued at approximately $1.31 trillion (Quarterly Report, 2019). As a result, consumer spending, a component of GDP, is stifled, which negatively impacts the economy (Frizell, 2014, p. 22). This study examined the relationship between student loan debt and the probability of a recession in the near future, as well as the effects of proposed student loan forgiveness policies, through the use of a created model. The Federal Reserve Bank of St. Louis's website (FRED) was used to extract data regarding total GDP per quarter and student loan debt per quarter ("Federal Reserve Economic Data," 2019). Through the combination of the student loan debt per quarter and total GDP per quarter datasets, the percentage of total GDP composed of student loan debt per quarter was calculated and fitted to a logistic curve. Future quarterly values for total GDP and the percentage of total GDP composed of student loan debt per quarter were found through Long Short-Term Memory (LSTM) models and Euler's method, respectively. Through the creation of a probability of recession index, the probability of recession per quarter was compared to the percentage of total GDP composed of student loan debt per quarter to construct an exponential regression model. Utilizing a primarily quantitative method of analysis, the percentage of total GDP composed of student loan debt per quarter was found to be strongly associated (p < 1.26696 × 10^-8) with the probability of recession per quarter (p(R)), with p(R) tending to peak as the percentage of total GDP composed of student loan debt per quarter strayed away from the carrying capacity of the logistic curve. Inputting the student loan forgiveness policies of potential congressional bills proposed by lawmakers showed that eliminating 49.7% and 36.7% of student loan debt would reduce the recession probabilities to 1.73545 × 10^-29% and 9.74474 × 10^-25%, respectively.
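The study's data and code are not reproduced here, but the following Python sketch with synthetic numbers illustrates the two numerical steps described above: fitting a logistic curve to the share of GDP composed of student loan debt, and projecting it forward with Euler's method applied to the logistic differential equation.

```python
# Illustrative sketch with synthetic data, not the study's actual series.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Standard logistic curve with carrying capacity K."""
    return K / (1.0 + np.exp(-r * (t - t0)))

# Hypothetical quarterly observations of debt as a share of GDP (fractions).
t_obs = np.arange(0, 40)
share_obs = logistic(t_obs, 0.08, 0.25, 20) + np.random.normal(0, 0.002, 40)

(K, r, t0), _ = curve_fit(logistic, t_obs, share_obs, p0=[0.1, 0.1, 20])

# Euler's method on the logistic ODE  ds/dt = r * s * (1 - s / K)
# to project the share forward quarter by quarter.
s, dt, future = share_obs[-1], 1.0, []
for _ in range(12):                      # project 12 quarters ahead
    s = s + dt * r * s * (1.0 - s / K)
    future.append(s)
print(np.round(future, 4))
```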
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Rice is a crucial crop in Sri Lanka, influencing both its agricultural and economic landscapes. This study delves into the complex interplay between economic indicators and rice production, aiming to uncover correlations and build prediction models using machine learning techniques. The dataset, spanning from 1960 to 2020, includes key economic variables such as GDP, inflation rate, manufacturing output, population, population growth rate, imports, arable land area, military expenditure, and rice production. The study’s findings reveal the significant influence of economic factors on rice production in Sri Lanka. Machine learning models, including Linear Regression, Support Vector Machines, Ensemble methods, and Gaussian Process Regression, demonstrate strong predictive accuracy in forecasting rice production based on economic indicators. These results underscore the importance of economic indicators in shaping rice production outcomes and highlight the potential of machine learning in predicting agricultural trends. The study suggests avenues for future research, such as exploring regional variations and refining models based on ongoing data collection.
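A hedged Python sketch of such a model comparison, using scikit-learn counterparts of the named methods, is shown below; the file name and column names are assumptions, not the study's actual data.

```python
# Sketch: comparing several regressors for predicting rice production
# from economic indicators via cross-validated R^2.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("sri_lanka_rice.csv")            # assumed file and columns
X = df[["gdp", "inflation", "manufacturing", "population", "imports"]]
y = df["rice_production"]

models = {
    "Linear Regression": LinearRegression(),
    "SVR": SVR(),
    "Random Forest (ensemble)": RandomForestRegressor(random_state=0),
    "Gaussian Process": GaussianProcessRegressor(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean CV R^2 = {scores.mean():.3f}")
```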
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book series. It has 1 row and is filtered to the book Behind the model: a constructive critique of economic modeling. It features 2 columns, including publication dates.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Personal and economic well-being regression models based on effects of taxes and benefits data for the UK for the period April 2016 to March 2017
The data consist of two parts: Time trade-off (TTO) data with one row per TTO question (5 questions), and discrete choice experiment (DCE) data with one row per question (6 questions). The purpose of the data is the calculation of a Swedish value set for the capability-adjusted life years (CALY-SWE) instrument. To protect the privacy of the study participants and to comply with GDPR, access to the data is given upon request.
The data is provided in 4 .csv files with the names:
The first two files (tto.csv, dce.csv) contain the time trade-off (TTO) answers and discrete choice experiment (DCE) answers of participants. The latter two files (weight_final_model.csv, coefs_final_model.csv) contain the generated value set of CALY-SWE weights, and the pertaining coefficients of the main effects additive model.
Background:
CALY-SWE is a capability-based instrument for studying Quality of Life (QoL). It consists of 6 attributes (health, social relations, financial situation & housing, occupation, security, political & civil rights) and allows answers for each attribute on 3 levels (Agree, Agree partially, Do not agree). A configuration or state is one of the 3^6 = 729 possible situations that the instrument describes. Here, a config is denoted in the form xxxxxx, one x for each attribute in the order above. Each x is a digit corresponding to the level of the respective attribute, with 3 being the highest (Agree) and 1 being the lowest (Do not agree). For example, 222222 encodes a configuration with all attributes on level 2 (Partially agree). The purpose of this dataset is to support the publication of the CALY-SWE value set and to enable reproduction of the calculations (due to privacy concerns we abstain from publishing individual-level characteristics). A value set consists of values on the 0 to 1 scale for all 729 configurations, each of which represents a quality weighting where 1 is the highest capability-related QoL and 0 the lowest capability-related QoL.
The data contains answers to two types of questions: TTO and DCE.
In TTO questions, participants iteratively chose a number of years x between 1 and 10. A choice of x years indicates that living x years with full capability (state configuration 333333) is considered equivalent to living 10 years in the capability state that the TTO question describes. The answer on the 0 to 1 scale is then calculated as x/10. In the DCE questions, participants were given two states and chose the one they found to be better. We used a hybrid model with a linear regression component and a logit model component, with the coefficients linked through a multiplicative factor, to obtain the weights (weight_final_model.csv). Each weight is calculated as the constant plus the coefficients for the respective configuration. Coefficients for level 3 encode the difference to level 2, and coefficients for level 2 the difference to the constant. For example, the weight for 123112 is calculated as constant + socrel2 + finhou2 + finhou3 + polciv2 (no coefficients for health, occupation, and security are involved, as they are on level 1, which is captured in the constant/intercept).
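For illustration, the weight calculation described above can be reproduced from the published coefficients roughly as follows; the sketch assumes the intercept is stored under the name "constant" in coefs_final_model.csv, which should be checked against the actual file.

```python
# Sketch: recompute a CALY-SWE weight for a configuration string such as "123112".
import pandas as pd

ATTRS = ["health", "socrel", "finhou", "occu", "secu", "polciv"]

def weight(config, coefs):
    """Constant + level-2 coefficient (for levels 2 and 3) + level-3 coefficient (for level 3)."""
    total = coefs["constant"]          # intercept name is an assumption
    for attr, level in zip(ATTRS, config):
        if level in ("2", "3"):
            total += coefs[f"{attr}2"]   # difference of level 2 to the constant
        if level == "3":
            total += coefs[f"{attr}3"]   # difference of level 3 to level 2
    return total

coefs = pd.read_csv("coefs_final_model.csv").set_index("name")["value"].to_dict()
print(weight("123112", coefs))  # constant + socrel2 + finhou2 + finhou3 + polciv2
```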
To assess the quality of TTO answers, we calculated a score per participant that takes into account inconsistencies in answering the TTO question. We then excluded 20% of participants with the worst score to improve the TTO data quality and signal strength for the model (this is indicated by the 'included' variable in the TTO dataset). Details of the entire survey are described in the preprint “CALY-SWE value set: An integrated approach for a valuation study based on an online-administered TTO and DCE survey” by Meili et al. (2023). Please check this document for updated versions.
Ids have been randomized with preserved linkage between the DCE and TTO dataset.
Data files and variables:
Below is a description of the variables in each CSV file.

- tto.csv:
  config: 6 numbers representing the attribute levels.
  position: The number of the asked TTO question.
  tto_block: The design block of the TTO question.
  answer: The equivalence value indicated by the participant, ranging from 0.1 to 1 in steps of 0.1.
  included: Whether the answer was included in the data for the model used to generate the value set.
  id: Randomized id of the participant.

- dce.csv:
  config1: Configuration of the first state in the question.
  config2: Configuration of the second state in the question.
  position: The number of the asked DCE question.
  answer: Whether state 1 or 2 was preferred.
  id: Randomized id of the participant.

- weight_final_model.csv:
  config: 6 numbers representing the attribute levels.
  weight: The weight calculated with the final model.
  ciu: The upper 95% credible interval.
  cil: The lower 95% credible interval.

- coefs_final_model.csv:
  name: Name of the coefficient, composed of an abbreviation for the attribute and a level number (abbreviations in the same order as above: health, socrel, finhou, occu, secu, polciv).
  value: Continuous, weight on the 0 to 1 scale.
  ciu: The upper 95% credible interval.
  cil: The lower 95% credible interval.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the code, input sheets, set-up guide and documentation for the EVOLVE research project (https://evolveenergy.eu/) economic dispatch model of Great Britain. Within this research project, a novel modelling framework has been developed to quantify the potential benefit of including higher proportions of ocean energy within large-scale electricity systems. Economic dispatch modelling is utilised to model hourly supply-demand matching for a range of sensitivity runs, adjusting the proportion of ocean energy within the generation mix. The framework is applied to a 2030 case study of the power system of Great Britain, testing installed wave or tidal stream capacities ranging from 100 MW to 10 GW. This dataset contains all of the data, code and documentation required to run this economic dispatch model. The project results found that for all sensitivity runs, ocean energy increases renewable dispatch, reduces dispatch costs, reduces generation required from fossil fuels, reduces system carbon emissions, reduces price volatility, and captures higher market prices. The development of this model, and analysis of the model results, is described in detail in a journal paper (currently in press). A preprint of this paper is included within the folder. It can be referenced as: S. Pennock, D.R. Noble, Y. Verdanyan, T. Delahaye and H. Jeffrey (2023). 'A modelling framework to quantify the power system benefits from ocean energy deployments'. Applied Energy, Volume 347, 1 October 2023, 121413 ( https://doi.org/10.1016/j.apenergy.2023.121413 ).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the output dataset for the research publication "Socio-economic development drives solid waste management performance in cities: A global analysis using machine learning". It features:

- Metadata info used by the R codes
- Summary of results for two modelling approaches (machine learning: conditional random forest; and non-linear regression)
The independent-variables dataset analysed here refers to specific indicators of the WABI methodology (https://www.sciencedirect.com/science/article/pii/S0956053X14004905), which generates solid waste management and resource recovery profiles for cities. It was applied here to 40 cities around the world. The input data are available here: 10.5281/zenodo.7570174
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Boston House Prices-Advanced Regression Techniques’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/fedesoriano/the-boston-houseprice-data on 13 February 2022.
--- Dataset description provided by original source is as follows ---
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978.
Input features in order:
1) CRIM: per capita crime rate by town
2) ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
3) INDUS: proportion of non-retail business acres per town
4) CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
5) NOX: nitric oxides concentration (parts per 10 million) [parts/10M]
6) RM: average number of rooms per dwelling
7) AGE: proportion of owner-occupied units built prior to 1940
8) DIS: weighted distances to five Boston employment centres
9) RAD: index of accessibility to radial highways
10) TAX: full-value property-tax rate per $10,000 [$/10k]
11) PTRATIO: pupil-teacher ratio by town
12) B: the result of the equation B = 1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town
13) LSTAT: % lower status of the population

Output variable:
1) MEDV: Median value of owner-occupied homes in $1000's [k$]
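For illustration only, a minimal Python sketch of fitting an ordinary least squares model for MEDV from the features listed above is shown below; the local file name boston.csv is an assumption.

```python
# Sketch: baseline linear regression on the Boston house-price data.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("boston.csv")           # assumed local copy of the Kaggle CSV
X = df.drop(columns=["MEDV"])
y = df["MEDV"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", reg.score(X_test, y_test))
```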
StatLib - Carnegie Mellon University
Harrison, David & Rubinfeld, Daniel. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81-102. https://doi.org/10.1016/0095-0696(78)90006-2
Belsley, David A., Kuh, Edwin & Welsch, Roy E. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. New York: Wiley.
--- Original source retains full ownership of the source dataset ---
These are the results obtained by conducting the experiment "Average Height of 19-year-old Males and Females and GDP per Capita in 2019 for 164 Countries". The CSV file contains the raw data produced by processing, filtering and merging the input datasets. There are two rows for each of the 164 countries. In both rows, the country name, country code and GDP per capita are given. However, one row contains the average height of 19-year-old males (indicated by the value 'Boys' in the 'Sex' column) whereas the other displays the average height of 19-year-old females (indicated by the value 'Girls'). Furthermore, there are two PNG files which display the regression plots for the average height of 19-year-old males and females, respectively. Note that the x-scale (for the GDP per capita) is logarithmic.

References:
- The World Bank, GDP per capita (current US$), Washington, DC: The World Bank, 2021. Accessed on: Apr. 13, 2021. [Online] Available: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD
- NCD Risk Factor Collaboration, Height – Evolution of adult height over time, NCD Risk Factor Collaboration, 2021. Accessed on: Apr. 18, 2021. [Online] Available: https://ncdrisc.org/data-downloads-height.html (under "Country-specific data for all countries")
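The following Python sketch (not the released scripts) shows how a regression plot of this kind can be reproduced, with height regressed on log10(GDP per capita) for one sex; the file and column names are assumptions about the released CSV.

```python
# Sketch: height of 19-year-old males vs. GDP per capita on a logarithmic x-axis.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("height_gdp_2019.csv")           # assumed file name
boys = df[df["Sex"] == "Boys"]

x = np.log10(boys["GDP per capita"].to_numpy())   # assumed column name
y = boys["Height"].to_numpy()                     # assumed column name
slope, intercept = np.polyfit(x, y, 1)            # simple linear fit on the log scale

xs = np.linspace(x.min(), x.max(), 100)
plt.scatter(boys["GDP per capita"], y, s=10)
plt.plot(10 ** xs, slope * xs + intercept, color="red")
plt.xscale("log")
plt.xlabel("GDP per capita (current US$, log scale)")
plt.ylabel("Average height of 19-year-old males (cm)")
plt.show()
```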
In this project, we added transportation modes and representation of alternative fuel technologies to a computable general equilibrium (CGE) model (ADAGE), and illustrated the impact of these transportation sector details using AEO oil price scenarios. This dataset includes the model results supporting the publication, "Insights from adding transportation sector detail into an economy-wide model: The case of the ADAGE CGE model." The dataset includes 3 files. "Adage_oilprice_main_results.xlsx" shows the data presented in the results section of the paper. "Adage_oilprice_fixed_factor.xlsx" shows data from sensitivity scenarios presented in Appendix B of the paper. "Adage_oilprice_alternative_nesting.xlsx" shows data from sensitivity scenarios presented in Appendix C of the paper. Citation information for this dataset can be found in the EDG's Metadata Reference Information section and Data.gov's References section.