License: Open Data Commons Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
These features provide valuable information about the characteristics of neighborhoods that can influence housing prices.
License (information derived automatically): MIT License, https://opensource.org/licenses/MIT
Ever wondered which city in Greater Boston reigns supreme? Look no further! This dataset ranks 141 cities and towns in the Greater Boston area based on a variety of factors, including:
Housing: Where can you actually afford to live? (Trick question, it's Boston.)
Safety: Which towns are as secure as Fort Knox?
Mobility: How easy is it to get around without losing your mind?
Health: Where are the healthiest, happiest Bostonians?
Entertainment: Because life is more than just Dunkin' runs.
Diversity: The melting pot of cultures in each town.
Education: Where future Einsteins are born.
Employment: Who's hiring, and where are the best career opportunities?
This dataset is perfect for data exploration, visualizations, and even some lighthearted city rivalry. Whether you're a data analyst, a real estate enthusiast, or just looking to settle the debate with your friends on which Boston suburb is the best, this dataset has you covered!
Insights await! Can you uncover hidden trends, build an interactive ranking map, or find the best place for your next move? Let's find out!
Example notebook: https://www.kaggle.com/code/michaeldelamaza/boston-city-rankings-linear-regression
License (information derived automatically): ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/
Each year, the City of Boston publishes payroll data for employees. This dataset contains employee names, job details, and earnings information including base salary, overtime, and total compensation for employees of the City.
See the "Payroll Categories" document below for an explanation of what types of earnings are included in each category.
License (information derived automatically): ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/
City of Boston boundary that excludes water.
License (information derived automatically): ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/
Buildings and space owned by the City of Boston
License (information derived automatically): ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/
New Boston City Council districts for 2023-2031 municipal elections. Passed by the City Council on May 24th, 2023. The City Council Districts data layer reflects Chapter 9 of the Ordinances of 2022.
License (information derived automatically): CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This data set contains a series of administrative geographies utilized by the City of Boston, including: traditional neighborhoods defined by the Boston Planning and Development Agency (BPDA Neighborhood Statistical Areas, BPDA Planning Districts); election board regions (City Council Districts, Election Precincts, Election Wards); and districts for City operations (Fire Districts, ISD Neighborhoods, Police Districts, Public Works Districts, and ZIP Codes). For each, we include a shapefile with unique identifiers. These geographic files were obtained from the City of Boston Department of Innovation and Technology's Analyze Boston data hub site: https://data.boston.gov/. Information pertaining to these files can be found in the most recent documentation for the "Geographical Infrastructure for the City of Boston." Note: These geographies are the latest versions available when the most recent Geographical Infrastructure for the City of Boston was released.
License (information derived automatically): ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/
Provides metrics on overall city health based on work done across all facets of the City of Boston.
For more information about CityScore, please refer to this link: https://www.boston.gov/cityscore
Boston, MA city boundary, including water features.
License (information derived automatically): MIT License, https://opensource.org/licenses/MIT
Housing Values in Suburbs of Boston
The medv variable is the target variable.
Data description
The Boston data frame has 506 rows and 14 columns.
This data frame contains the following columns:
crim per capita crime rate by town.
zn proportion of residential land zoned for lots over 25,000 sq.ft.
indus proportion of non-retail business acres per town.
chas Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
nox nitrogen oxides concentration (parts per 10 million).
rm average number of rooms per dwelling.
age proportion of owner-occupied units built prior to 1940.
dis weighted mean of distances to five Boston employment centres.
rad index of accessibility to radial highways.
tax full-value property-tax rate per $10,000.
ptratio pupil-teacher ratio by town.
black 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town.
lstat lower status of the population (percent).
medv median value of owner-occupied homes in $1000s.
Source: Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for clean air. J. Environ. Economics and Management 5, 81–102.
Belsley, D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.
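The `black` column above stores 1000(Bk - 0.63)^2 rather than the proportion Bk itself. That is a parabola with its minimum at Bk = 0.63, so for values below about 1000 × 0.37² = 136.9 two different proportions in [0, 1] produce the same number, and the original proportion cannot be recovered uniquely. A minimal stdlib sketch of the forward transform and its inversion; the function names are illustrative, not part of the dataset or any library.

```python
from __future__ import annotations

import math

CENTER = 0.63  # the constant inside 1000(Bk - 0.63)^2


def b_from_proportion(bk: float) -> float:
    """Forward transform used for the `black` column: 1000 * (Bk - 0.63)^2."""
    if not 0.0 <= bk <= 1.0:
        raise ValueError("Bk must be a proportion between 0 and 1")
    return 1000.0 * (bk - CENTER) ** 2


def candidate_proportions(b: float) -> list[float]:
    """Return every Bk in [0, 1] that produces the value b (one or two values)."""
    if b < 0:
        raise ValueError("b must be non-negative")
    offset = math.sqrt(b / 1000.0)
    # Both roots of the parabola; keep only those that are valid proportions.
    roots = {CENTER - offset, CENTER + offset}
    return sorted(r for r in roots if 0.0 <= r <= 1.0)
```

For example, `b_from_proportion(0.53)` and `b_from_proportion(0.73)` both come out at approximately 10.0, and `candidate_proportions(10.0)` returns approximately `[0.53, 0.73]`.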
License (information derived automatically): Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
The graph shows the citations of ^'s papers published in each year.
License (information derived automatically): Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
Overview
This dataset is a cleaned and updated version of the classic Boston Housing Dataset, originally made available by the U.S. Census and later popularized in machine learning communities. It contains detailed information about housing prices in Boston suburbs, along with environmental, structural, and socio-economic indicators for each neighborhood.
The dataset is widely used as a benchmark for regression tasks and offers an excellent opportunity to explore linear modeling, feature engineering, multicollinearity analysis, bias mitigation, and more.
Context
Originally published by Harrison and Rubinfeld in 1978, this dataset has been widely adopted in the machine learning and statistics communities. It contains 506 observations, each representing a town or neighborhood in the Boston metropolitan area.
However, some features in the dataset, particularly the B column, which encodes race-based information, have become the subject of ethical scrutiny in recent years. Therefore, this version may have undergone data cleaning, feature selection, or modification to make it more appropriate for modern, ethical ML applications.
Features
CRIM: Per capita crime rate by town
ZN: Proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS: Proportion of non-retail business acres per town
CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX: Nitric oxides concentration (parts per 10 million)
RM: Average number of rooms per dwelling
AGE: Proportion of owner-occupied units built before 1940
DIS: Weighted distance to five Boston employment centers
RAD: Index of accessibility to radial highways
TAX: Property tax rate per $10,000
PTRATIO: Pupil-teacher ratio by town
B: 1000(Bk - 0.63)^2 where Bk is the proportion of Black residents
LSTAT: Percentage of lower-status population
MEDV: Median value of owner-occupied homes in $1000s (Target Variable)
Note: Some features (e.g., CHAS, B, or RAD) may have been removed or modified in this version, depending on the ethical preprocessing or cleaning steps applied.
Target Variable
MEDV: Median value of owner-occupied homes (in $1000s). This is the value we aim to predict in regression tasks.
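Because MEDV is recorded in thousands of dollars, a model's raw prediction has to be multiplied by 1,000 to read as a dollar amount. The sketch below fits a one-feature ordinary least squares line (for example, RM against MEDV) using only the Python standard library; the function names and the toy points in the usage note are made up for illustration and are not taken from the dataset.

```python
from __future__ import annotations

from statistics import fmean


def fit_simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Fit y ~ intercept + slope * x by ordinary least squares (closed form)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict_medv_dollars(intercept: float, slope: float, x_new: float) -> float:
    """Predict MEDV (in $1000s) and convert the result to dollars."""
    return (intercept + slope * x_new) * 1000.0
```

With the toy points rooms = [5.0, 6.0, 7.0] and medv = [15.0, 20.0, 25.0], `fit_simple_ols` returns an intercept of -10.0 and a slope of 5.0, and `predict_medv_dollars(-10.0, 5.0, 6.5)` gives 22500.0, that is, $22,500.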
Use Cases
This dataset is ideal for:
Predictive modeling using linear regression or advanced ML techniques
Feature engineering and feature selection
Studying the effects of urban and environmental variables on real estate prices
Analyzing multicollinearity and variable importance
Exploring ethical considerations in machine learning
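One common check for the multicollinearity analysis listed above is the variance inflation factor (VIF). With exactly two predictors, regressing one on the other gives R² = r², so VIF = 1 / (1 - r²) for both. The stdlib sketch below covers only that two-predictor case (with more features, each one would be regressed on all the others); the function names are illustrative.

```python
from __future__ import annotations

import math
from statistics import fmean


def pearson_r(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    a_bar, b_bar = fmean(a), fmean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(a: list[float], b: list[float]) -> float:
    """Variance inflation factor when the model has exactly two predictors.

    Regressing one predictor on the other gives R^2 = r^2,
    so VIF = 1 / (1 - r^2), identical for both predictors.
    """
    r = pearson_r(a, b)
    if abs(r) >= 1.0:
        return math.inf  # perfectly collinear predictors
    return 1.0 / (1.0 - r * r)
```

A VIF of 1.0 means the two features are uncorrelated; the value grows without bound as they approach perfect collinearity.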
Ethical Considerations
The original dataset includes the feature B, which encodes racial information. While historically included for statistical analysis, modern ML best practices recommend caution when using such data to avoid unintended bias or discrimination.
In this version, you may choose to remove or retain the column depending on the intended use and audience.
Always consider the fairness, accountability, and transparency of your ML models.
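If you decide to remove the race-derived column, a minimal stdlib sketch follows. It assumes the header spells the column `B`, as in this version's feature list; other copies of the data name it `black`, so pass whichever name your file uses. The function name and the toy CSV in the usage note are illustrative.

```python
from __future__ import annotations

import csv
import io


def drop_columns(csv_text: str, columns_to_drop: set[str]) -> str:
    """Return a copy of CSV text with the named columns removed (exact header match)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None:  # empty input: no header row at all
        return ""
    kept = [name for name in reader.fieldnames if name not in columns_to_drop]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped keys present in each row dict.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

For example, `drop_columns("CRIM,B,MEDV\n0.5,100.0,20.0\n", {"B"})` returns `"CRIM,MEDV\n0.5,20.0\n"`.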
File Information
Filename: boston_housing_cleaned.csv
Records: 506 rows (observations)
Columns: 13 features + 1 target variable (depending on cleaning)
Missing Values: None (in original); NA if introduced during preprocessing
Source: Based on U.S. Census data (original), sourced from Kaggle and cleaned
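The file notes say the original has no missing values but that NA may appear after preprocessing. A small stdlib sketch to count them per column; also treating empty cells as missing is an assumption of this sketch, not something the description states.

```python
from __future__ import annotations

import csv
import io

# Tokens treated as missing: "NA" follows the file description; "" (empty cell) is an assumption.
MISSING_TOKENS = {"", "NA"}


def count_missing(csv_text: str) -> dict[str, int]:
    """Count missing cells per column in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            value = row.get(name)  # None when a row is shorter than the header
            if value is None or value.strip() in MISSING_TOKENS:
                counts[name] += 1
    return counts
```

To run it on the file named above: `with open("boston_housing_cleaned.csv", newline="") as f: print(count_missing(f.read()))`.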
Tags
housing-prices · regression · real-estate · data-cleaning · ethical-ml · boston · exploratory-data-analysis · feature-engineering
License: https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Resident Population in South Boston City, VA (DISCONTINUED) (VASBOS0POP) from 1970 to 1989 about south, Boston, VA, residents, population, and USA.
License (information derived automatically): ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/
Monthly utility data for all City of Boston accounts. This data comes from Boston's Enterprise Energy Management System. This software tool serves as the system of record for all municipal utility expenditures and energy/water use.
License (information derived automatically): ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/
This data, maintained by the Mayor's Office of Housing (MOH), is an inventory of all income-restricted units in the city. This data includes public housing owned by the Boston Housing Authority (BHA), privately-owned housing built with funding from DND and/or on land that was formerly City-owned, and privately-owned housing built without any City subsidy, e.g., created using Low-Income Housing Tax Credits (LIHTC) or as part of the Inclusionary Development Policy (IDP). Information is gathered from a variety of sources, including the City's IDP list, permitting and completion data from the Inspectional Services Department (ISD), newspaper advertisements for affordable units, Community Economic Development Assistance Corporation's (CEDAC) Expiring Use list, and project lists from the BHA, the Massachusetts Department of Housing and Community Development (DHCD), MassHousing, and the U.S. Department of Housing and Urban Development (HUD), among others. The data is meant to be as exhaustive and up-to-date as possible, but since many units are not required to report data to the City of Boston, MOH is constantly working to verify and update it. See the data dictionary for more information on the structure of the data and important notes.
The database only includes units that have a deed-restriction. It does not include tenant-based (also known as mobile) vouchers, which subsidize rent, but move with the tenant and are not attached to a particular unit. There are over 22,000 tenant-based vouchers in the city of Boston which provide additional affordability to low- and moderate-income households not accounted for here.
The Income-Restricted Housing report can be directly accessed here:
https://www.boston.gov/sites/default/files/file/2023/04/Income%20Restricted%20Housing%202022_0.pdf
Learn more about income-restricted housing (as well as other types of affordable housing) here: https://www.boston.gov/affordable-housing-boston#income-restricted
License (information derived automatically): CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
U.S. Census Bureau QuickFacts statistics for Boston city, Massachusetts. QuickFacts data are derived from: Population Estimates, American Community Survey, Census of Population and Housing, Current Population Survey, Small Area Health Insurance Estimates, Small Area Income and Poverty Estimates, State and County Housing Unit Estimates, County Business Patterns, Nonemployer Statistics, Economic Census, Survey of Business Owners, Building Permits.
License (information derived automatically): Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
The Boston Housing dataset, which is often used for regression analysis and predictive modeling tasks, doesn't typically have an official "subtitle." However, it's commonly referred to as the "Boston Housing dataset" or the "Boston Housing Price dataset" due to its focus on housing-related features and its primary target variable being the median value of owner-occupied homes in Boston suburbs.
Column Description
Columns:
1. CRIM: per capita crime rate by town (numeric)
2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft. (numeric)
3. INDUS: proportion of non-retail business acres per town (numeric)
4. CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise) (categorical)
5. NOX: nitric oxides concentration (parts per 10 million) (numeric)
6. RM: average number of rooms per dwelling (numeric)
7. AGE: proportion of owner-occupied units built prior to 1940 (numeric)
8. DIS: weighted distances to five Boston employment centres (numeric)
9. RAD: index of accessibility to radial highways (numeric)
10. TAX: full-value property-tax rate per $10,000 (numeric)
11. PTRATIO: pupil-teacher ratio by town (numeric)
12. B: 1000(Bk - 0.63)^2 where Bk is the proportion of [people of African American descent] by town (numeric)
13. LSTAT: % lower status of the population (numeric)
14. MEDV: Median value of owner-occupied homes in $1000s (target variable) (numeric)
License: https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Consumer Price Index for All Urban Consumers: Housing in Boston-Cambridge-Newton, MA-NH (CBSA) (CUUSA103SAH) from 1984 to 2024 about ME, Boston, NH, CT, MA, urban, consumer, CPI, housing, inflation, price index, indexes, price, and USA.
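Index levels like this CPI series are usually read as percent changes: (CPI_new / CPI_old - 1) × 100. The stdlib sketch below computes that for a list of observations. `lag` is how many observations back to compare; the right value depends on the series' reporting frequency (for instance, 12 for a monthly series), so check the frequency on the FRED page before choosing it. The function names are illustrative.

```python
from __future__ import annotations


def percent_change(old_index: float, new_index: float) -> float:
    """Percent change from old_index to new_index: (new / old - 1) * 100."""
    if old_index <= 0:
        raise ValueError("index levels must be positive")
    return (new_index / old_index - 1.0) * 100.0


def changes_over_lag(values: list[float], lag: int) -> list[float]:
    """Percent change of each observation versus the one `lag` positions earlier."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [percent_change(values[i - lag], values[i]) for i in range(lag, len(values))]
```

The output list is `lag` entries shorter than the input, because the first `lag` observations have no earlier value to compare against.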
License: https://creativecommons.org/publicdomain/zero/1.0/
housing.csv: this dataset consists of 506 points in 14 dimensions. Each point describes a census tract in the Boston area, and the 14 attributes, one per column in this order, are the following:
* CRIM - per capita crime rate by town
* ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
* INDUS - proportion of non-retail business acres per town.
* CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
* NOX - nitric oxides concentration (parts per 10 million)
* RM - average number of rooms per dwelling
* AGE - proportion of owner-occupied units built prior to 1940
* DIS - weighted distances to five Boston employment centres
* RAD - index of accessibility to radial highways
* TAX - full-value property-tax rate per $10,000
* PTRATIO - pupil-teacher ratio by town
* B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
* LSTAT - % lower status of the population
* MEDV - Median value of owner-occupied homes in $1000's
This dataset is normally associated with two regression tasks: predicting NOX (the nitric oxides concentration) and predicting MEDV (the median value of owner-occupied homes).
This dataset was also pre-processed and scaled.
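The description does not say which scaling was applied. Two common choices are z-score standardization and min-max scaling; the stdlib sketch below shows both, so you can check which one matches the file (standardized columns have a mean near 0 and a population standard deviation near 1, while min-max columns span exactly 0 to 1). The function names are illustrative.

```python
from __future__ import annotations

from statistics import fmean, pstdev


def standardize(values: list[float]) -> list[float]:
    """Z-score scaling: subtract the mean, divide by the population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values, mu)
    if sigma == 0:
        raise ValueError("constant column: z-scores are undefined")
    return [(v - mu) / sigma for v in values]


def min_max(values: list[float]) -> list[float]:
    """Min-max scaling to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant column: min-max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `min_max([2.0, 4.0, 6.0])` returns `[0.0, 0.5, 1.0]`.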
There are 14 attributes in each case of the dataset. They are:
CRIM - per capita crime rate by town
ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS - proportion of non-retail business acres per town.
CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX - nitric oxides concentration (parts per 10 million)
RM - average number of rooms per dwelling
AGE - proportion of owner-occupied units built prior to 1940
DIS - weighted distances to five Boston employment centres
RAD - index of accessibility to radial highways
TAX - full-value property-tax rate per $10,000
PTRATIO - pupil-teacher ratio by town
B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT - % lower status of the population
MEDV - Median value of owner-occupied homes in $1000's