http://opendatacommons.org/licenses/dbcl/1.0/
These features provide valuable information about the characteristics of neighborhoods that can influence housing prices.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Ankush Kashyap
Released under Apache 2.0
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Housing Values in Suburbs of Boston
The medv variable is the target variable.
Data description
The Boston data frame has 506 rows and 14 columns.
This data frame contains the following columns:
crim per capita crime rate by town.
zn proportion of residential land zoned for lots over 25,000 sq.ft.
indus proportion of non-retail business acres per town.
chas Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
nox nitrogen oxides concentration (parts per 10 million).
rm average number of rooms per dwelling.
age proportion of owner-occupied units built prior to 1940.
dis weighted mean of distances to five Boston employment centres.
rad index of accessibility to radial highways.
tax full-value property-tax rate per $10,000.
ptratio pupil-teacher ratio by town.
black 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town.
lstat lower status of the population (percent).
medv median value of owner-occupied homes in $1000s.
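The black column stores a transformation of the proportion Bk rather than the proportion itself. A minimal sketch of the formula as stated in the column description (the function name `black_index` is hypothetical and used only for illustration):

```python
def black_index(bk: float) -> float:
    """Apply the documented transform 1000 * (Bk - 0.63)^2 to a proportion Bk in [0, 1]."""
    if not 0.0 <= bk <= 1.0:
        raise ValueError("bk must be a proportion between 0 and 1")
    return 1000.0 * (bk - 0.63) ** 2


# The transform is a parabola centred on 0.63, so it is not one-to-one:
# proportions equally far below and above 0.63 give the same column value.
low = black_index(0.53)
high = black_index(0.73)
print(low, high)
```

Because of this symmetry, the original proportion cannot be recovered unambiguously from the column value alone.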
Source
Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for clean air. J. Environ. Economics and Management 5, 81–102.
Belsley D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.
This dataset was created by Vishal Kushwaha
The Boston Housing Dataset
The Boston Housing Dataset is derived from information collected by the U.S. Census Service concerning housing in the area of Boston, MA. The following describes the dataset columns:
CRIM - per capita crime rate by town
ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS - proportion of non-retail business acres per town
CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX - nitric oxides concentration (parts per 10 million)
RM - average number of rooms per dwelling
AGE - proportion of owner-occupied units built prior to 1940
DIS - weighted distances to five Boston employment centres
RAD - index of accessibility to radial highways
TAX - full-value property-tax rate per $10,000
PTRATIO - pupil-teacher ratio by town
B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT - % lower status of the population
MEDV - Median value of owner-occupied homes in $1000's
https://creativecommons.org/publicdomain/zero/1.0/
Domain: Real Estate
Difficulty: Easy to Medium
Challenges:
1. Missing value treatment
2. Outlier treatment
3. Understanding which variables drive the price of homes in Boston
Summary: The Boston housing dataset contains 506 observations and 14 variables. The dataset contains missing values.
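The first two challenges can be sketched as follows. This shows one common approach (median imputation, then clipping to the 1.5 × IQR fences), not necessarily the treatment the dataset author intends; the helper names and the sample values are made up for illustration.

```python
import statistics
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_iqr(values: List[float], k: float = 1.5) -> List[float]:
    """Clip values to the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Made-up column with one missing entry and one extreme value (not real data).
column = [0.1, 0.2, None, 0.3, 0.25, 0.15, 9.0]
filled = impute_median(column)
cleaned = clip_iqr(filled)
print(filled)
print(cleaned)
```

Clipping keeps all 506 rows, which matters for a dataset this small; dropping outlier rows instead is an equally valid choice that depends on the model being fitted.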
https://creativecommons.org/publicdomain/zero/1.0/
housing.csv: this dataset consists of 506 points in 14 dimensions. Each point represents a census tract in the Boston area, and the 14 attributes, listed in column order, are the following:
* CRIM - per capita crime rate by town
* ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
* INDUS - proportion of non-retail business acres per town.
* CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
* NOX - nitric oxides concentration (parts per 10 million)
* RM - average number of rooms per dwelling
* AGE - proportion of owner-occupied units built prior to 1940
* DIS - weighted distances to five Boston employment centres
* RAD - index of accessibility to radial highways
* TAX - full-value property-tax rate per $10,000
* PTRATIO - pupil-teacher ratio by town
* B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
* LSTAT - % lower status of the population
* MEDV - Median value of owner-occupied homes in $1000's
This dataset is normally associated with two regression tasks: predicting NOX (the nitric oxides concentration) and predicting MEDV (the median value of a home).
This dataset was also pre-processed and scaled.
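The two regression tasks differ only in which column is treated as the target. A minimal sketch, assuming each row is a dictionary keyed by the column names listed above (the helper `split_target` and the placeholder row are hypothetical):

```python
from typing import Dict, List, Tuple

COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]


def split_target(
    rows: List[Dict[str, float]], target: str
) -> Tuple[List[List[float]], List[float], List[str]]:
    """Split rows into a feature matrix X, a target vector y, and the feature names."""
    if target not in COLUMNS:
        raise KeyError(f"unknown column: {target}")
    feature_names = [c for c in COLUMNS if c != target]
    X = [[row[c] for c in feature_names] for row in rows]
    y = [row[target] for row in rows]
    return X, y, feature_names


# Placeholder row (values are just column positions, not real data).
rows = [{name: float(i) for i, name in enumerate(COLUMNS)}]

X_price, y_price, price_features = split_target(rows, "MEDV")
# Whether MEDV stays in as a feature when predicting NOX is a modelling choice.
X_nox, y_nox, nox_features = split_target(rows, "NOX")
print(len(price_features), len(nox_features))
```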
There are 14 attributes in each case of the dataset. They are:
CRIM - per capita crime rate by town
ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS - proportion of non-retail business acres per town
CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX - nitric oxides concentration (parts per 10 million)
RM - average number of rooms per dwelling
AGE - proportion of owner-occupied units built prior to 1940
DIS - weighted distances to five Boston employment centres
RAD - index of accessibility to radial highways
TAX - full-value property-tax rate per $10,000
PTRATIO - pupil-teacher ratio by town
B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT - % lower status of the population
MEDV - Median value of owner-occupied homes in $1000's
https://creativecommons.org/publicdomain/zero/1.0/
This is a copy of the original Boston Housing Dataset. As of December 2021, the original link doesn't contain the dataset so I'm uploading it if anyone wants to use it. I'll implement a linear regression model to predict the output 'MEDV' variable using PyTorch (check the companion notebook).
I took the data given in this link and processed it to include the column names as well.
https://www.kaggle.com/prasadperera/the-boston-housing-dataset/data
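Adding the column names can be sketched with the standard library alone. This assumes the raw file is whitespace-separated with no header row, which is an assumption about the linked file rather than something stated here; `add_header` and the placeholder row are hypothetical.

```python
import csv
import io

COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]


def add_header(raw_text: str) -> str:
    """Return CSV text with a header row prepended to whitespace-separated rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    for line in raw_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        writer.writerow(fields)
    return out.getvalue()


# Placeholder row with 14 fields (not real data).
raw = " ".join(str(i) for i in range(1, 15)) + "\n"
result = add_header(raw)
print(result)
```

Checking the field count on every row catches files whose records are wrapped across several lines before they silently produce misaligned columns.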
Good luck on your data science career :)
This is a classic dataset for machine learning practice. I do not own it; its sources are listed in the metadata and at the end of this description. I first came across the dataset during the Business Analytics course given by Lund University's School of Economics and Management.
The Boston dataset contains 506 rows and 14 columns.
http://lib.stat.cmu.edu/datasets/boston
Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for clean air. J. Environ. Economics and Management 5, 81–102.
Belsley D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.
This dataset was created by Srushti Kardile
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Hemant Choudhary
Released under MIT
This dataset was created by Devendrasingh J Jhala
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Boston Housing dataset, which is often used for regression analysis and predictive modeling tasks, doesn't typically have an official "subtitle." However, it's commonly referred to as the "Boston Housing dataset" or the "Boston Housing Price dataset" due to its focus on housing-related features and its primary target variable being the median value of owner-occupied homes in Boston suburbs.
Column Description
Columns:
1. CRIM: per capita crime rate by town (numeric)
2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft. (numeric)
3. INDUS: proportion of non-retail business acres per town (numeric)
4. CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise) (categorical)
5. NOX: nitric oxides concentration (parts per 10 million) (numeric)
6. RM: average number of rooms per dwelling (numeric)
7. AGE: proportion of owner-occupied units built prior to 1940 (numeric)
8. DIS: weighted distances to five Boston employment centres (numeric)
9. RAD: index of accessibility to radial highways (numeric)
10. TAX: full-value property-tax rate per $10,000 (numeric)
11. PTRATIO: pupil-teacher ratio by town (numeric)
12. B: 1000(Bk - 0.63)^2 where Bk is the proportion of [people of African American descent] by town (numeric)
13. LSTAT: % lower status of the population (numeric)
14. MEDV: Median value of owner-occupied homes in $1000s (target variable) (numeric)
This dataset was created by Lucas Guttensohn
This dataset was created by Kushal Sharma
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Lakshmi Kruthika
Released under Apache 2.0
All the following text is copied directly from the original dataset used: https://www.kaggle.com/datasets/fedesoriano/the-boston-houseprice-data
The only difference is that features 12 and 13 have been removed for simplicity. See original link for a version with those features in place.
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978.
Input features in order:
1) CRIM: per capita crime rate by town
2) ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
3) INDUS: proportion of non-retail business acres per town
4) CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
5) NOX: nitric oxides concentration (parts per 10 million) [parts/10M]
6) RM: average number of rooms per dwelling
7) AGE: proportion of owner-occupied units built prior to 1940
8) DIS: weighted distances to five Boston employment centres
9) RAD: index of accessibility to radial highways
10) TAX: full-value property-tax rate per $10,000 [$/10k]
11) PTRATIO: pupil-teacher ratio by town
[Original features 12 and 13 have been deliberately removed from this version of the dataset]
Output variable:
1) MEDV: Median value of owner-occupied homes in $1000's [k$]
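To reproduce this reduced version from a full 14-column copy, the two removed features can be dropped by name. A minimal sketch, assuming features 12 and 13 are B and LSTAT as in the usual column order, and that rows are dictionaries keyed by column name (the helper `drop_features` and the placeholder row are hypothetical):

```python
from typing import Dict, Iterable, List

COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

# Assumption: features 12 and 13 in the usual ordering are B and LSTAT.
REMOVED = ("B", "LSTAT")


def drop_features(
    rows: List[Dict[str, float]], removed: Iterable[str] = REMOVED
) -> List[Dict[str, float]]:
    """Return copies of the rows without the named columns."""
    removed_set = set(removed)
    return [{k: v for k, v in row.items() if k not in removed_set} for row in rows]


# Placeholder row (values are just column positions, not real data).
full_rows = [{name: float(i) for i, name in enumerate(COLUMNS)}]
reduced_rows = drop_features(full_rows)
print(list(reduced_rows[0]))
```

The result keeps the 11 input features and the MEDV output, matching the listing above.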
StatLib - Carnegie Mellon University
Harrison, David & Rubinfeld, Daniel. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management. 5. 81-102. 10.1016/0095-0696(78)90006-2. https://www.researchgate.net/profile/Daniel-Rubinfeld/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air/links/5c38ce85458515a4c71e3a64/Hedonic-housing-prices-and-the-demand-for-clean-air.pdf
Belsley, David A. & Kuh, Edwin. & Welsch, Roy E. (1980). Regression diagnostics: identifying influential data and sources of collinearity. New York: Wiley https://www.wiley.com/en-us/Regression+Diagnostics%3A+Identifying+Influential+Data+and+Sources+of+Collinearity-p-9780471691174
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The Boston Housing dataset is a well-known dataset in predictive modeling and statistics. It contains information collected by the U.S. Census Service concerning housing in the area of Boston, Massachusetts.
The dataset includes the following features:
This dataset can be used for:
Details about the dataset and its original source can be found in the following reference:
This dataset was created by Kunaal Naik