Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication files for David Slichter, "The Employment Effects of the Minimum Wage: A Selection Ratio Approach to Measuring Treatment Effects," Journal of Applied Econometrics, forthcoming
Firstly, I’ve provided a .do file called sr.do which contains general code for implementing the selection ratio approach, with detailed instructions written as comments in the code.
For the minimum wage application, the main data file is mw_final.dta. A .csv version is also provided. Observations are a county in a time period. I have added self-explanatory variable labels for most variables. A few variables warrant a clearer explanation:
adj1-adj14: List of FIPS codes of all counties which are adjacent to the county in question. Each variable holds one adjacent county, and counties with fewer than 14 neighbors will have missing values for some of these variables.
change, logchange: Minimum wage this quarter - minimum wage last quarter, measured either in dollars or in logs.
time, t1-t108: The variable "time" converts years and quarters into a univariate time period, with time=1 in 1990Q1 and time=108 in 2016Q4. t1-t108 are indicators for each of these time periods.
lnemp_1418, lnearnbeg_1418, lnsep_1418, lnhira_1418, lnchurn_1418: Logs of employment, earnings, separations, hires, and churn, respectively, for 14-18 year olds.
gt1-gt6: Dummies for inclusion in each of the six comparisons used for the main (i.e., not spillover-robust) analysis. All treated counties which neighbor a control county take value 1 for each of these variables; all other treated counties take value 0. Among control counties, gt1=1 if the county neighbors a treated county and 0 otherwise, gt2=1 if the county has gt1=0 but neighbors a gt1=1 county, gt3=1 if the county has gt1=gt2=0 but neighbors a gt2=1 county, etc.
h2-h6: Dummies for inclusion in each of the first spillover-robust (i.e., excluding border counties only) comparisons. Among control counties, h2-h6 are equal to gt2-gt6. Among treated counties, h2-h6 are equal to 1 if the treated county has gt1=0 but borders a gt1=1 county, and 0 otherwise.
k3-k6: Dummies for inclusion in each of the second spillover-robust (i.e., excluding two layers) comparisons. Among control counties, these variables are equal to gt3-gt6. Among treated counties, all observations take value 1 except those with gt1=1 or h2=1.
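The time index and the layered control groups described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in main.do or sr.do: the function names, the dictionary-based adjacency structure, and the toy county labels are all hypothetical, and the sketch treats a single time period with a fixed set of treated counties.

```python
def time_index(year, quarter):
    """Map (year, quarter) to the period index: 1990Q1 -> 1, 2016Q4 -> 108."""
    return (year - 1990) * 4 + quarter


def treated_border_counties(adjacency, treated):
    """Treated counties that neighbor at least one control county."""
    return {
        county
        for county in treated
        if any(n not in treated for n in adjacency.get(county, ()))
    }


def control_rings(adjacency, treated, max_ring=6):
    """Assign each control county its ring number.

    Ring 1 neighbors a treated county; ring k neighbors a ring k-1 county
    and belongs to no earlier ring (mirroring gt1, gt2, ... for controls).
    """
    rings = {}
    frontier = set(treated)
    for ring in range(1, max_ring + 1):
        next_frontier = set()
        for county in frontier:
            for neighbor in adjacency.get(county, ()):
                if neighbor not in treated and neighbor not in rings:
                    next_frontier.add(neighbor)
        for county in next_frontier:
            rings[county] = ring
        frontier = next_frontier
    return rings


# Hypothetical toy map: five counties in a row, A-B-C-D-E, with A treated.
toy_adjacency = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
toy_rings = control_rings(toy_adjacency, {"A"})
```

In the toy map, B would correspond to gt1=1, C to gt2=1, and so on. In the actual data the adjacency lists would come from adj1-adj14, and treatment status changes from quarter to quarter.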
The data sources are as follows. The minimum wage law series is taken from David Neumark's website (https://www.economics.uci.edu/~dneumark/datasets.html). The economic variables are taken from the QWI, which I accessed via the Ithaca Virtual RDC. County adjacency files were downloaded from the Census Bureau (https://www.census.gov/geo/reference/county-adjacency.html).
The file main.do then runs the analyses. The resulting output file containing results is results.dta.
For the incumbency application, the main data file is incumb_final.dta. A .csv version is also provided. This file is drawn from Caughey and Sekhon's (2011) data; see their description of most variables here: https://doi.org/10.7910/DVN/8EYYA2
The key added variables are _IDistancea1-_IDistancea50, which are dummies for inclusion in the 50 comparisons used in the paper. Treated observations (i.e., Democratic wins) with margin of victory below 5 points have each of these variables equal to 1. Control observations have these variables equal to 1 if they fall within the margin of victory range, e.g., _IDistancea9=1 for control observations with Republican margin of victory between 8 and 9 points. Note that these variables are redefined by the code for the analyses of treatment effects away from the discontinuity. Lastly, there is a variable called RepWin which is the treatment variable when treatment is defined as a Republican winning.
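As a rough illustration of how the inclusion dummies described above are defined, the following Python sketch builds the 50 indicators for a single observation. It is not taken from sr_incumb.do. The function name and the signed-margin input are hypothetical, the handling of exact boundary values (e.g., a margin of exactly 8 points) is an assumption, and so is the choice to set all dummies to 0 for Democratic wins by 5 points or more.

```python
import math


def distance_dummies(dem_margin, n_bins=50, treated_cutoff=5.0):
    """Return the n_bins inclusion dummies for one observation.

    dem_margin is the Democratic margin of victory in points
    (positive = Democratic win, negative = Republican win).
    """
    if dem_margin > 0:
        # Treated observation: included in every comparison only if the
        # margin is below the cutoff (assumed 0 everywhere otherwise).
        value = 1 if dem_margin < treated_cutoff else 0
        return [value] * n_bins
    # Control observation: included only in the comparison for its bin.
    # Assumption: a Republican margin in (k-1, k] falls in bin k.
    rep_margin = -dem_margin
    k = math.ceil(rep_margin)
    return [1 if i == k else 0 for i in range(1, n_bins + 1)]


# A Republican win by 8.5 points falls in the 8-9 point bin.
example = distance_dummies(-8.5)
```

Here `example[8]` corresponds to _IDistancea9, since Python lists are zero-indexed.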
The file sr_incumb.do then performs the analysis.
Please contact me with any questions at slichter@binghamton.edu.
Abstract copyright UK Data Service and data collection copyright owner.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Calonico, Sebastian, Cattaneo, Matias D., Farrell, Max H., and Titiunik, Rocio, (2019) "Regression Discontinuity Designs Using Covariates." Review of Economics and Statistics 101:3, 442-451.
Review of Economics and Statistics: Forthcoming. Visit https://dataone.org/datasets/sha256%3Ad00937e0e95caca90195351492ee3df98fa25094069700fa52605c182a3a5a0c for complete metadata about this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Boston House Prices-Advanced Regression Techniques’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/fedesoriano/the-boston-houseprice-data on 13 February 2022.
--- Dataset description provided by original source is as follows ---
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978.
Input features in order:
1) CRIM: per capita crime rate by town
2) ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
3) INDUS: proportion of non-retail business acres per town
4) CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
5) NOX: nitric oxides concentration (parts per 10 million) [parts/10M]
6) RM: average number of rooms per dwelling
7) AGE: proportion of owner-occupied units built prior to 1940
8) DIS: weighted distances to five Boston employment centres
9) RAD: index of accessibility to radial highways
10) TAX: full-value property-tax rate per $10,000 [$/10k]
11) PTRATIO: pupil-teacher ratio by town
12) B: The result of the equation B=1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13) LSTAT: % lower status of the population
Output variable: 1) MEDV: Median value of owner-occupied homes in $1000's [k$]
StatLib - Carnegie Mellon University
Harrison, David & Rubinfeld, Daniel. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management. 5. 81-102. 10.1016/0095-0696(78)90006-2.
Belsley, David A. & Kuh, Edwin. & Welsch, Roy E. (1980). Regression diagnostics: identifying influential data and sources of collinearity. New York: Wiley.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This material presents the results of the univariate gamma regression model for direct costs, which was the first stage of the inferential analysis using the linear regression model and was used to analyse which variables could be interacting with total direct cost per capita. The first table shows these results and precedes the multivariate analysis described in the article. The second table shows a more detailed descriptive analysis of per capita direct costs according to the current drug use pattern (evaluated by ASSIST for alcohol, cannabis and cocaine/crack), including mean, standard deviation, minimum, maximum, first quartile, median, third quartile and the p value from the Kruskal-Wallis test. These data refer to the article by Dr. Paula Becker and Dr. Denise Razzouk titled "Relationships between age of onset of drug use, use pattern, and direct health costs in a sample of adults’ drug dependents in treatment at a Brazilian community mental health service".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication package to construct the analysis for "When Do Politicians Appeal Broadly? The Economic Consequences of Electoral Rules in Brazil." It uses datasets constructed from four data sources: 1) Brazil municipal election data; 2) Brazil demographic censuses; 3) Brazil school census; and 4) nighttime lights.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Regression analysis for economic growth by OLS.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The digital economy (DE) has become a major breakthrough in promoting industrial upgrading and an important engine for high-quality economic growth. However, most studies have neglected the important driving effect of regional economic and social (RES) development on DE. In this paper, we discuss the mechanism by which RES development promotes the development of DE, and establish a demand-driven regional DE development model to express the general idea. Using the spatial analysis toolbox in ArcGIS, we explore the spatial development characteristics of DE in the Yangtze River Delta City Cluster (YRDCC). We find that the imbalance of spatial development in YRDCC is very significant at both the provincial level and the city level. Quantitative analysis reveals that there is less than a 1% likelihood that the imbalanced or clustered pattern of DE development in YRDCC could be the result of random chance. Geographically weighted regression (GWR) analysis with a publicly available dataset for YRDCC indicates that RES development significantly promotes the development of DE.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Urban economic competitiveness is a fundamental indicator for assessing the level of urban development and serves as an effective approach for understanding regional disparities. Economic competitiveness research that relies solely on traditional regression models and on assumed theoretical relationships among features tends to fall short in fully exploring the intricate interrelationships and nonlinear associations among features. As a result, the study of urban economic disparities remains limited to a narrow range of urban features, which is insufficient for comprehending cities as complex systems. The ability of deep learning neural networks to automatically construct models of nonlinear relationships among complex features provides a new approach to research on this issue. In this study, a complex urban feature dataset comprising 1008 features was constructed based on statistical data from 283 prefecture-level cities in China. Employing a machine learning approach based on a convolutional neural network (CNN), a novel analytical model is constructed to capture the interrelationships among urban features, which is applied to achieve accurate classification of urban economic competitiveness. In addition, considering the limited number of samples in the dataset owing to the fixed number of cities, this study developed a data augmentation approach based on a deep convolutional generative adversarial network (DCGAN) to further enhance the accuracy and generalization ability of the model. The performance of the CNN classification model was effectively improved by adding the generated samples to the original sample dataset. This study provides a precise and stable analytical model for investigating disparities in regional development. It also offers a feasible solution to the limited sample size issue in the application of deep learning in urban research.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Regression analysis.
Review of Economics and Statistics: Forthcoming. Visit https://dataone.org/datasets/sha256%3A5c6d9a57835ff9097126708e2b53bc66c569250f049fb86a1172478089de8737 for complete metadata about this dataset.
This dataset uses data from Chinese provinces spanning the period from 2004 to 2022. It employs the entropy weighting method and text analysis to construct indicators that analyze the spatial impact of science and technology finance (Sci-tech finance) on high-quality economic development, using a spatial autoregressive model.
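For readers unfamiliar with the entropy weighting method mentioned above, the following Python sketch shows one common textbook formulation. It is not the dataset authors' code: the function name and the toy numbers are hypothetical, and the sketch assumes that every indicator is of the "larger is better" type and is min-max normalized before the entropy is computed.

```python
import math


def entropy_weights(rows):
    """Compute entropy-method weights for the indicator columns of `rows`.

    rows: list of observations (e.g., provinces), each a list of indicator
    values. Indicators with more variation across observations have lower
    entropy and therefore receive a larger weight.
    """
    n = len(rows)
    divergences = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # Min-max normalization; a constant column is treated as uniform.
        scaled = [(x - lo) / span if span else 1.0 for x in col]
        total = sum(scaled)
        shares = [s / total for s in scaled]
        # Shannon entropy scaled to [0, 1]; zero shares contribute nothing.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        divergences.append(max(0.0, 1.0 - entropy))
    total_divergence = sum(divergences)
    if total_divergence == 0:
        # All indicators are uninformative: fall back to equal weights.
        return [1.0 / len(divergences)] * len(divergences)
    return [d / total_divergence for d in divergences]


# Hypothetical toy data: three regions, two indicators.
toy_rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
toy_weights = entropy_weights(toy_rows)
```

In the toy data the second indicator is constant across regions, so essentially all of the weight goes to the first indicator. A composite index would then be the weighted sum of the normalized indicators.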
Problem Statement
Investors and buyers in the real estate market faced challenges in accurately assessing property values and market trends. Traditional valuation methods were time-consuming and lacked precision, making it difficult to make informed investment decisions. A real estate firm sought a predictive analytics solution to provide accurate property price forecasts and market insights.
Challenge
Developing a real estate price prediction system involved addressing the following challenges:
Collecting and processing vast amounts of data, including historical property prices, economic indicators, and location-specific factors.
Accounting for diverse variables such as neighborhood quality, proximity to amenities, and market demand.
Ensuring the model’s adaptability to changing market conditions and economic fluctuations.
Solution Provided
A real estate price prediction system was developed using machine learning regression models and big data analytics. The solution was designed to:
Analyze historical and real-time data to predict property prices accurately.
Provide actionable insights on market trends, enabling better investment strategies.
Identify undervalued properties and potential growth areas for investors.
Development Steps
Data Collection
Collected extensive datasets, including property listings, sales records, demographic data, and economic indicators.
Preprocessing
Cleaned and structured data, removing inconsistencies and normalizing variables such as location, property type, and size.
Model Development
Built regression models using techniques such as linear regression, decision trees, and gradient boosting to predict property prices. Integrated feature engineering to account for location-specific factors, amenities, and market trends.
Validation
Tested the models using historical data and cross-validation to ensure high prediction accuracy and robustness.
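The case study gives no implementation details, so the following Python sketch only illustrates the general idea of validating a regression model with k-fold cross-validation. Everything in it is an assumption made for illustration: a single hypothetical feature (floor area), ordinary least squares with one predictor, and made-up toy numbers rather than real listings.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def k_fold_rmse(xs, ys, k=5):
    """Out-of-sample root mean squared error from k-fold cross-validation."""
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        # Every k-th observation, starting at `fold`, is held out.
        test_idx = set(range(fold, n, k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        slope, intercept = fit_line(train_x, train_y)
        for i in test_idx:
            prediction = slope * xs[i] + intercept
            squared_errors.append((ys[i] - prediction) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Hypothetical toy data: floor area (m^2) and price (thousands).
areas = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140]
prices = [2 * a + 30 for a in areas]
rmse = k_fold_rmse(areas, prices, k=5)
```

Because the toy prices are an exact linear function of area, the cross-validated error is essentially zero. With real data the same loop would report how well the model predicts properties it was not trained on.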
Deployment
Implemented the prediction system as a web-based platform, allowing users to input property details and receive price estimates and market insights.
Continuous Monitoring & Improvement
Established a feedback loop to update models with new data and refine predictions as market conditions evolved.
Results
Increased Prediction Accuracy
The system delivered highly accurate property price forecasts, improving investor confidence and decision-making.
Informed Investment Decisions
Investors and buyers gained valuable insights into market trends and property values, enabling better strategies and reduced risks.
Enhanced Market Insights
The platform provided detailed analytics on neighborhood trends, demand patterns, and growth potential, helping users identify opportunities.
Scalable Solution
The system scaled seamlessly to include new locations, property types, and market dynamics.
Improved User Experience
The intuitive platform design made it easy for users to access predictions and insights, boosting engagement and satisfaction.
Municipal and provincial Italian electoral data 2018, together with economic and immigration data at the same level. Shapefiles dataset for maps and spatial regression models.
Scholars agree that two major issues oriented voting behaviours during the Italian general election of 2018. The first was the state of the economy, which had not yet recovered from the lowest points reached during the Great Recession, but had nevertheless exhibited some marginal improvement. The second issue originated from another crisis, the refugee and asylum emergency, which contributed to increasing the presence of foreigners in Italy and the salience of the migration issue. The article investigates the impact of these two types of problem on the 2018 election results by using aggregated objective data at the municipal level. It finds confirmation of the two issues’ impact on retrospective punishment of the incumbent Democratic Party also when using spatial regression models distinguishing the direct influence and the spill-over effects of the poor state of the economy and an increase in the size of the foreign population.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Snider, Connan, and Williams, Jonathan W., (2015) "Barriers to Entry in the Airline Industry: A Multi-Dimensional Regression-Discontinuity Analysis of AIR-21." Review of Economics and Statistics 97:5, 1002-1022.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The first two tables give detailed descriptions of the definitions used in the report. The latter two tables contain the coefficient results from the regression models run in the analysis.
Review of Economics and Statistics: Forthcoming. Visit https://dataone.org/datasets/sha256%3A6b7c9ddc73f32ceb5e078ce9a3433aae5552b5e5a6fa8dfb7f13d399d7d1aceb for complete metadata about this dataset.
Computers are now involved in many economic transactions and can capture data associated with these transactions, which can then be manipulated and analyzed. Conventional statistical and econometric techniques such as regression often work well, but there are issues unique to big datasets that may require different tools. First, the sheer size of the data involved may require more powerful data manipulation tools. Second, we may have more potential predictors than appropriate for estimation, so we need to do some kind of variable selection. Third, large datasets may allow for more flexible relationships than simple linear models. Machine learning techniques such as decision trees, support vector machines, neural nets, deep learning, and so on may allow for more effective ways to model complex relationships. In this essay, I will describe a few of these tools for manipulating and analyzing big data. I believe that these methods have a lot to offer and should be more widely known and used by economists.
Forests in Washington State generate substantial economic revenue from commercial timber harvesting on private lands. To investigate the rates, causes, and spatial and temporal patterns of forest harvest on private tracts throughout the central Cascade Mountain area, we relied on a new generation of annual land-use/land-cover (LULC) products created from the application of the Continuous Change Detection and Classification (CCDC) algorithm to Landsat satellite imagery collected from 1985 to 2014. We calculated metrics of landscape pattern using patches of intact and harvested forest identified in each annual layer to identify changes throughout the time series. Patch dynamics revealed four distinct eras of logging trends that align with prevailing regulations and economic conditions. We used multiple logistic regression to determine the biophysical and anthropogenic factors that influence fine-scale selection of harvest stands in each time period. Results show that private forestland was significantly reduced and became more fragmented from 1985 to 2014. Variables linked to parameters of site conditions, location, climate, and vegetation greenness consistently distinguished harvest selection for each distinct era. This study demonstrates the utility of annual LULC data for investigating the underlying factors that influence land cover change.