This dataset is designed for beginners to practice regression problems, particularly in the context of predicting house prices. It contains 1000 rows, with each row representing a house and various attributes that influence its price. The dataset is well-suited for learning basic to intermediate-level regression modeling techniques.
Beginner Regression Projects: This dataset can be used to practice building regression models such as Linear Regression, Decision Trees, or Random Forests. The target variable (house price) is continuous, making this an ideal problem for supervised learning techniques.
Feature Engineering Practice: Learners can create new features by combining existing ones, such as the price per square foot or age of the house, providing an opportunity to experiment with feature transformations.
Exploratory Data Analysis (EDA): You can explore how different features (e.g., square footage, number of bedrooms) correlate with the target variable, making it a great dataset for learning about data visualization and summary statistics.
Model Evaluation: The dataset allows for various model evaluation techniques such as cross-validation, R-squared, and Mean Absolute Error (MAE). These metrics can be used to compare the effectiveness of different models.
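As a rough illustration of the evaluation metrics named above, the sketch below computes Mean Absolute Error and R-squared by hand. The prices and predictions are made-up numbers, not rows from the dataset, and the function names are chosen here for illustration only.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical house prices and model predictions (assumed values).
actual_prices = [200_000, 300_000, 400_000, 500_000]
predicted_prices = [210_000, 290_000, 420_000, 480_000]

mae = mean_absolute_error(actual_prices, predicted_prices)
r2 = r_squared(actual_prices, predicted_prices)
print(f"MAE: {mae:,.0f}  R-squared: {r2:.3f}")
```

MAE stays in the same units as the price, which makes it easy to explain, while R-squared is unitless and is often used to compare models fitted to the same target.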
The dataset is highly versatile for a range of machine learning tasks. You can apply simple linear models to predict house prices based on one or two features, or use more complex models like Random Forest or Gradient Boosting Machines to understand interactions between variables.
It can also be used for dimensionality reduction techniques like PCA or to practice handling categorical variables (e.g., neighborhood quality) through encoding techniques like one-hot encoding.
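One-hot encoding, mentioned above, can be sketched in a few lines of plain Python. The category labels below ("high", "medium", "low") are assumptions for illustration; the actual values in the dataset may differ.

```python
def one_hot_encode(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


# Hypothetical neighborhood-quality labels (assumed, not from the dataset).
neighborhood_quality = ["high", "low", "medium", "high"]

categories, encoded = one_hot_encode(neighborhood_quality)
print(categories)
for row in encoded:
    print(row)
```

Each row contains exactly one 1, in the column of its category. With linear models it is common to drop one of the columns to avoid perfectly collinear inputs.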
This dataset is ideal for anyone wanting to gain practical experience in building regression models while working with real-world features.
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains 2000 rows of house-related data, representing various features that could influence house prices. Below, we discuss key aspects of the dataset, which include its structure, the choice of features, and potential use cases for analysis.
The dataset is designed to capture essential attributes for predicting house prices, including:
Area: Square footage of the house, which is generally one of the most important predictors of price.
Bedrooms & Bathrooms: The number of rooms in a house significantly affects its value. Homes with more rooms tend to be priced higher.
Floors: The number of floors in a house could indicate a larger, more luxurious home, potentially raising its price.
Year Built: The age of the house can affect its condition and value. Newly built houses are generally more expensive than older ones.
Location: Houses in desirable locations such as downtown or urban areas tend to be priced higher than those in suburban or rural areas.
Condition: The current condition of the house is critical, as well-maintained houses (in 'Excellent' or 'Good' condition) will attract higher prices compared to houses in 'Fair' or 'Poor' condition.
Garage: Availability of a garage can increase the price due to added convenience and space.
Price: The target variable, representing the sale price of the house, used to train machine learning models to predict house prices based on the other features.
Area Distribution: The area of the houses in the dataset ranges from 500 to 5000 square feet, which allows analysis across different types of homes, from smaller apartments to larger luxury houses.
Bedrooms and Bathrooms: The number of bedrooms varies from 1 to 5, and bathrooms from 1 to 4. This variance enables analysis of homes with different sizes and layouts.
Floors: Houses in the dataset have between 1 and 3 floors. This feature could be useful for identifying the influence of multi-level homes on house prices.
Year Built: The dataset contains houses built from 1900 to 2023, giving a wide range of house ages to analyze the effects of new vs. older construction.
Location: There is a mix of urban, suburban, downtown, and rural locations. Urban and downtown homes may command higher prices due to proximity to amenities.
Condition: Houses are labeled as 'Excellent', 'Good', 'Fair', or 'Poor'. This feature helps model the price differences based on the current state of the house.
Price Distribution: Prices range between $50,000 and $1,000,000, offering a broad spectrum of property values. This range makes the dataset appropriate for predicting a wide variety of housing prices, from affordable homes to luxury properties.
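Given the Area, Year Built, and Price attributes described above, two commonly derived quantities are the age of a house and its price per square foot. The sketch below assumes lowercase keys (`area`, `year_built`, `price`) and a reference year of 2023, the latest build year in the stated range; both are illustrative choices, not names taken from the dataset file.

```python
def add_derived_features(house, reference_year):
    """Return a copy of `house` with house_age and price_per_sqft added."""
    enriched = dict(house)
    enriched["house_age"] = reference_year - house["year_built"]
    enriched["price_per_sqft"] = house["price"] / house["area"]
    return enriched


# Hypothetical record within the ranges described above (assumed values).
house = {"area": 2000, "year_built": 1998, "price": 350_000}

enriched = add_derived_features(house, reference_year=2023)
print(enriched["house_age"], enriched["price_per_sqft"])
```

Price per square foot is computed from the target itself, so it is useful for exploration but should not be fed to a model that predicts Price.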
3. Correlation Between Features
A key area of interest is the relationship between various features and house price:
Area and Price: Typically, a strong positive correlation is expected between the size of the house (Area) and its price. Larger homes are likely to be more expensive.
Location and Price: Location is another major factor. Houses in urban or downtown areas may show a higher price on average compared to suburban and rural locations.
Condition and Price: The condition of the house should show a positive correlation with price. Houses in better condition should be priced higher, as they require less maintenance and repair.
Year Built and Price: Newer houses might command a higher price due to better construction standards, modern amenities, and less wear-and-tear, but some older homes in good condition may retain historical value.
Garage and Price: A house with a garage may be more expensive than one without, as it provides extra storage or parking space.
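The expected Area and Price relationship can be checked with a Pearson correlation coefficient. The following is a minimal sketch using made-up area and price values; it does not reflect the actual correlation in the dataset.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(variance_x * variance_y)


# Hypothetical areas (sq ft) and prices (USD), assumed for illustration.
areas = [800, 1500, 2200, 3000, 4500]
prices = [120_000, 210_000, 330_000, 410_000, 650_000]

r = pearson_correlation(areas, prices)
print(f"Correlation between area and price: {r:.3f}")
```

A value close to 1 indicates a strong positive linear relationship. Categorical features such as Location or Condition need to be encoded numerically before a correlation like this is meaningful.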
The dataset is well-suited for various machine learning and data analysis applications, including:
House Price Prediction: Using regression techniques, this dataset can be used to build a model to predict house prices based on the available features.
Feature Importance Analysis: By using techniques such as feature importance ranking, data scientists can determine which features (e.g., location, area, or condition) have the greatest impact on house prices.
Clustering: Clustering techniques like k-means could help identify patterns in the data, such as grouping houses into segments based on their characteristics (e.g., luxury homes, affordable homes).
Market Segmentation: The dataset can be used to perform segmentation by location, price range, or house type to analyze trends in specific sub-markets, like luxury vs. affordable housing.
Time-Based Analysis: By studying how house prices vary with the year built or the age of the house, analysts can derive insights into the trends of older vs. newer homes.
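For the house price prediction use case, the simplest starting point is a one-feature linear regression of price on area, fitted by ordinary least squares. The sketch below uses made-up, perfectly linear data so the result is easy to check by hand; real data will be noisier and would usually involve more features.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical areas (sq ft) and prices (USD), assumed for illustration.
areas = [1000, 2000, 3000, 4000]
prices = [150_000, 250_000, 350_000, 450_000]

slope, intercept = fit_simple_linear_regression(areas, prices)
predicted_price = intercept + slope * 2500
print(f"price = {intercept:,.0f} + {slope:,.0f} * area")
print(f"Predicted price for 2500 sq ft: {predicted_price:,.0f}")
```

Here the slope reads as the price increase per additional square foot. The same fit can serve as a baseline against which tree-based models are compared.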
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Average house prices are derived from data supplied by the mortgage lending agencies on loans approved by them rather than loans paid. In comparing house price figures from one period to another, account should be taken of the fact that changes in the mix of houses (including apartments) will affect the average figures. The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change. Excluding apartments, measured in €. Figure changed on 27/6/16 as revised data was received from the local authority.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Licensed under: Creative Commons Attribution 4.0
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This report provides common criteria to help identify high value datasets and provides examples of common types of high value datasets. It was based on jurisdictional scans of high value dataset criteria, recent surveys, and international standards.
Redfin is a real estate brokerage that publishes US housing market data on a regular basis. Using this dataset, you can analyze and visualize housing market data for US cities. Timeline: Starting from February 2012 until the present time (data is refreshed and updated on a monthly basis).
The dataset has the following columns:
- period_begin
- period_end
- period_duration
- region_type
- region_type_id
- table_id
- is_seasonally_adjusted (indicates if prices are seasonally adjusted; f represents False)
- region
- city
- state
- state_code
- property_type
- property_type_id
- median_sale_price
- median_sale_price_mom (median sale price changes month over month)
- median_sale_price_yoy (median sale price changes year over year)
- median_list_price
- median_list_price_mom (median list price changes month over month)
- median_list_price_yoy (median list price changes year over year)
- median_ppsf (median sale price per square foot)
- median_ppsf_mom (median sale price per square foot changes month over month)
- median_ppsf_yoy (median sale price per square foot changes year over year)
- median_list_ppsf (median list price per square foot)
- median_list_ppsf_mom (median list price per square foot changes month over month)
- median_list_ppsf_yoy (median list price per square foot changes year over year)
- homes_sold (number of homes sold)
- homes_sold_mom (number of homes sold month over month)
- homes_sold_yoy (number of homes sold year over year)
- pending_sales
- pending_sales_mom
- pending_sales_yoy
- new_listings
- new_listings_mom
- new_listings_yoy
- inventory
- inventory_mom
- inventory_yoy
- months_of_supply
- months_of_supply_mom
- months_of_supply_yoy
- median_dom (median days on market until property is sold)
- median_dom_mom (median days on market changes month over month)
- median_dom_yoy (median days on market changes year over year)
- avg_sale_to_list (average sale price to list price ratio)
- avg_sale_to_list_mom (average sale price to list price ratio changes month over month)
- avg_sale_to_list_yoy (average sale price to list price ratio changes year over year)
- sold_above_list
- sold_above_list_mom
- sold_above_list_yoy
- price_drops
- price_drops_mom
- price_drops_yoy
- off_market_in_two_weeks (number of properties that will be taken off the market within 2 weeks)
- off_market_in_two_weeks_mom (changes in number of properties that will be taken off the market within 2 weeks, month over month)
- off_market_in_two_weeks_yoy (changes in number of properties that will be taken off the market within 2 weeks, year over year)
- parent_metro_region
- parent_metro_region_metro_code
- last_updated
Filetype: gzip (gz) Support for gzip files in Python: https://docs.python.org/3/library/gzip.html
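A gzip-compressed table can be read directly with Python's `gzip` and `csv` modules, without unpacking it first. The sketch below is an assumption-laden illustration: the listing does not state the delimiter, so a tab is assumed, and the tiny sample file with made-up values stands in for the real download so the code runs on its own.

```python
import csv
import gzip
import os
import tempfile


def read_gzip_table(path, delimiter="\t"):
    """Read a gzip-compressed delimited text file into a list of dicts."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


# Build a tiny stand-in file (made-up values) so the sketch is runnable
# without the real data; the tab delimiter is an assumption.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.tsv.gz")
with gzip.open(sample_path, mode="wt", encoding="utf-8", newline="") as handle:
    handle.write("period_begin\tcity\tmedian_sale_price\n")
    handle.write("2012-02-01\tSeattle\t350000\n")

rows = read_gzip_table(sample_path)
print(rows[0]["city"], rows[0]["median_sale_price"])
```

`csv.DictReader` returns every value as a string, so numeric columns such as median_sale_price need to be converted before analysis. For a large file, iterating over the reader row by row avoids loading everything into memory.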
Data Source & Credit: Redfin.com
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time series data for the statistic Consumer Price Index, All items and country United States. Indicator Definition: Consumer Price Index, All items. The indicator "Consumer Price Index, All items" stands at 148.58 as of 8/31/2025, the highest value at least since 2/28/1990, the period currently displayed. Regarding the one-year change of the series, the current value constitutes an increase of 2.92 percent compared to the value the year prior. The 1 year change in percent is 2.92. The 3 year change in percent is 9.39. The 5 year change in percent is 24.65. The 10 year change in percent is 35.94. The series' long term average value is 95.94. Its latest available value, on 8/31/2025, is 54.87 percent higher compared to its long term average value. The series' change in percent from its minimum value, on 1/31/1990, to its latest available value, on 8/31/2025, is +154.30%. The series' change in percent from its maximum value, on 8/31/2025, to its latest available value, on 8/31/2025, is 0.0%.
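The percentage figures in this entry follow the usual percent-change formula, (new - old) / old * 100. As a quick check, the sketch below recomputes how far the latest value (148.58) sits above the long term average (95.94), using only those two published numbers; the function name is chosen here for illustration.

```python
def percent_change(old_value, new_value):
    """Percent change from old_value to new_value."""
    return (new_value - old_value) / old_value * 100


latest_value = 148.58       # index level as of 8/31/2025 (from the entry)
long_term_average = 95.94   # long term average (from the entry)

above_average = percent_change(long_term_average, latest_value)
print(f"{above_average:.2f} percent above the long term average")
```

The result rounds to 54.87, which agrees with the figure quoted in the entry. Small discrepancies are possible for other entries, because the published inputs are themselves rounded.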
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Concept: Average value of credit operations by size of the borrower - microenterprise Source: Credit Information System 9ca1e64e-7fc1-4843-bfee-e936b0cfe7d4 25715-average-value-of-credit-operations-by-size-of-the-borrower---microenterprise
https://creativecommons.org/publicdomain/zero/1.0/
This is a list of 10,000 women's shoes and their product information provided by Datafiniti's Product Database.
The dataset includes shoe name, brand, price, and more. Each shoe will have an entry for each price found for it and some shoes may have multiple entries.
Note that this is a sample of a large dataset. The full dataset is available through Datafiniti.
You can use this data to determine brand markups, pricing strategies, and trends for luxury shoes. E.g.:
- What is the average price of each distinct brand listed?
- Which brands have the highest prices?
- Which ones have the widest distribution of prices?
- Is there a typical price distribution (e.g., normal) across brands or within specific brands?
Further processing data would also let you:
- Correlate specific product features with changes in price.
- You can cross-reference this data with a sample of our Men's Shoe Prices to see if there are any differences between women's brands and men's brands.
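The first brand-level questions above (average price and price spread per brand) amount to a group-by over the records. The sketch below assumes each record exposes `brand` and `price` keys and uses made-up brands and prices; the real field names are defined by the Datafiniti schema and may differ.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_prices_by_brand(records):
    """Return {brand: {"average": ..., "spread": ...}} from price records."""
    prices_by_brand = defaultdict(list)
    for record in records:
        prices_by_brand[record["brand"]].append(record["price"])
    return {
        brand: {"average": mean(prices), "spread": pstdev(prices)}
        for brand, prices in prices_by_brand.items()
    }


# Hypothetical records; "brand" and "price" are assumed field names.
records = [
    {"brand": "BrandA", "price": 50.0},
    {"brand": "BrandA", "price": 70.0},
    {"brand": "BrandB", "price": 120.0},
]

summary = summarize_prices_by_brand(records)
for brand, stats in sorted(summary.items()):
    print(brand, stats["average"], stats["spread"])
```

Because a shoe can appear once per price found for it, it may be worth deduplicating or averaging per product before grouping, so that heavily listed items do not dominate a brand's average.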
A full schema for the data is available in here
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Concept: Average value of credit operations - individual microentrepreneur (MEI) Source: Central Bank of Brazil - Department of Financial Education d58a9ee7-fad6-4699-841e-222a3e73bcaa 27311-average-value-of-credit-operations---individual-microentrepreneur-mei
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time series data for the statistic Consumer Price Index, All items and country Denmark. Indicator Definition: Consumer Price Index, All items. The indicator "Consumer Price Index, All items" stands at 121.70 as of 8/31/2025. Regarding the one-year change of the series, the current value constitutes an increase of 2.01 percent compared to the value the year prior. The 1 year change in percent is 2.01. The 3 year change in percent is 5.92. The 5 year change in percent is 17.47. The 10 year change in percent is 21.70. The series' long term average value is 88.15. Its latest available value, on 8/31/2025, is 38.05 percent higher compared to its long term average value. The series' change in percent from its minimum value, on 1/31/1990, to its latest available value, on 8/31/2025, is +100.08%. The series' change in percent from its maximum value, on 7/31/2025, to its latest available value, on 8/31/2025, is -0.653%.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average House Prices in the United States increased to 534100 USD in August from 478200 USD in July of 2025. This dataset includes a chart with historical data for the United States New Home Average Sales Price.
Open Government Licence 2.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/
License information was derived automatically
Average House Price
This data set contains preliminary monthly sales data for the average price per gram of usable cannabis in both the adult-use cannabis and medical marijuana markets. For the purposes of this dataset, "usable cannabis" includes raw flower in whole, ground, or pre-rolled form, without additional extracted materials. The data reported is compiled at specific points in time and only captures data current at the time the report is generated. Data values may be updated and change over time as updates occur.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time series data for the statistic Consumer Price Index, All items and country Trinidad and Tobago. Indicator Definition: Consumer Price Index, All items. The indicator "Consumer Price Index, All items" stands at 125.70 as of 7/31/2025, the highest value at least since 2/29/2000, the period currently displayed. Regarding the one-year change of the series, the current value constitutes an increase of 1.45 percent compared to the value the year prior. The 1 year change in percent is 1.45. The 3 year change in percent is 6.53. The 5 year change in percent is 15.32. The 10 year change in percent is 23.57. The series' long term average value is 82.93. Its latest available value, on 7/31/2025, is 51.57 percent higher compared to its long term average value. The series' change in percent from its minimum value, on 3/31/2000, to its latest available value, on 7/31/2025, is +231.26%. The series' change in percent from its maximum value, on 7/31/2025, to its latest available value, on 7/31/2025, is 0.0%.
By Josh Murrey
The Books Dataset: Sales, Ratings, and Publication provides comprehensive information on various aspects of books, including their publishing year, author details, ratings given by readers, sales performance data, and genre classification. The dataset consists of several key columns that capture important attributes related to each book.
The Publishing Year column indicates the year in which each book was published. This information helps in understanding the chronological distribution of books in the dataset.
The Book Name column contains the titles of the books. Each book has a unique name that distinguishes it from others in the dataset.
The Author column specifies the name(s) of the author(s) responsible for creating each book. This information is crucial for understanding different authors' contributions and analyzing their impact on sales and ratings.
The language_code column represents a specific code assigned to indicate the language in which each book is written. This code serves as a reference point for language-based analysis within the dataset.
Each author's rating is captured in the Author_Rating column. This rating is based on their previous works and serves as an indicator of their reputation or acclaim among readers.
The average rating given by readers for each book is recorded in the Book_average_rating column. This value reflects how well-received a particular book is by its audience.
The number of ratings given to each book by readers can be found in the Book_ratings_count column. This metric helps gauge reader engagement and provides insights into popular or widely-discussed books within this dataset.
Books are classified into different genres or categories which are mentioned under the genre column. Genre classification allows for analyzing trends across specific literary genres or identifying patterns related to certain types of books.
Sales-related data includes both gross sales revenue (gross sales) generated by each book and publisher revenue (publisher revenue) earned from these sales transactions. These numeric values provide insights into financial performance aspects associated with the book market.
The sale price column denotes the specific price at which each book is sold. This information helps evaluate pricing strategies and their potential impact on sales figures.
Sales performance is further quantified through the sales rank column, which assigns a numerical rank to each book based on its sales performance. This ranking system aids in identifying high-performing books within the dataset.
Lastly, the units sold column captures the number of units of each book that have been sold. This data highlights popular books based on reader demand and serves as a crucial measure of commercial success within the dataset.
Overall, this expansive and comprehensive Books Dataset
Introduction:
Getting Familiar with the Columns: The dataset contains multiple columns that provide different kinds of information:
Book Name: The title of each book.
Author: The name of the author who wrote the book.
language_code: The code representing the language in which the book is written.
Author_Rating: The rating assigned to the author based on their previous works.
Book_average_rating: The average rating given to the book by readers.
Book_ratings_count: The number of ratings given to the book by readers.
genre: The genre or category to which the book belongs.
gross sales: The total sales revenue generated by each book.
publisher revenue: The revenue earned by publishers from selling each book.
sale price: The price at which each copy of a book is sold.
sales rank: A numeric value indicating a book's rank based on its sales performance in comparison to other books within its category (genre).
units sold: The total number of copies sold for each specific title.
Understanding Numeric and Textual Data: Numeric columns in this dataset include Publishing Year, Author_Rating, Book_average_rating, Book_ratings_count, gross sales, publisher revenue, sale price, sales rank, and units sold; these provide quantitative insights that can be used for statistical analysis and comparisons.
Additionally, the columns 'Author', 'Book Name', and 'genre' contain textual data that provides descriptive elements such as authors' names and genre categories.
- Exploring Relationships Between Data Points: By combining different co...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time series data for the statistic Nominal Residential Property Price Index Quarterly and country Finland. Indicator Definition: Nominal Residential Property Price Index Quarterly. The indicator "Nominal Residential Property Price Index Quarterly" stands at 107.03 as of 06/30/2025. Regarding the one-year change of the series, the current value constitutes a decrease of 1.37 percent compared to the value the year prior. The 1 year change in percent is -1.37. The 3 year change in percent is -11.45. The 5 year change in percent is -4.65. The 10 year change in percent is 0.0996. The series' long term average value is 61.26. Its latest available value, on 06/30/2025, is 74.70 percent higher compared to its long term average value. The series' change in percent from its minimum value, on 03/31/1970, to its latest available value, on 06/30/2025, is +1,486.88%. The series' change in percent from its maximum value, on 06/30/2022, to its latest available value, on 06/30/2025, is -11.45%.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time series data for the statistic Consumer Price Index, All items and country Poland. Indicator Definition: Consumer Price Index, All items. The indicator "Consumer Price Index, All items" stands at 194.10 as of 7/31/2025, the highest value at least since 2/28/1990, the period currently displayed. Regarding the one-year change of the series, the current value constitutes an increase of 3.19 percent compared to the value the year prior. The 1 year change in percent is 3.19. The 3 year change in percent is 19.37. The 5 year change in percent is 44.74. The 10 year change in percent is 57.42. The series' long term average value is 99.84. Its latest available value, on 7/31/2025, is 94.41 percent higher compared to its long term average value. The series' change in percent from its minimum value, on 1/31/1990, to its latest available value, on 7/31/2025, is +3,582.72%. The series' change in percent from its maximum value, on 7/31/2025, to its latest available value, on 7/31/2025, is 0.0%.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This page contains selected data from the publication "Dual-mode room temperature self-calibrating photodiodes approaching cryogenic radiometer uncertainty", Marit S Ulset et al 2022 Metrologia 59 035008, DOI 10.1088/1681-7575/ac6a94.
Description of files:
Fig5.txt: Data for Fig 5a. Plotted values of non-equivalence as a function of beam position on the photodiode.
Fig6.txt: Non-equivalence (gamma) in parts per million (ppm) and responsivity in mK/mW as a function of power level P in mW.
Fig7.txt: Spectral directional emissivity determined under 10° with respect to the sample surface normal for Wafer P7 at 20°C. Uncertainty is given as standard uncertainty (k=1).
Fig14.txt: Calculated time constants for the four different steps in a thermal heating cycle (electrical low, optical, electrical high, optical). The average value in the published paper contains an error, as one dataset was used twice. The file also shows the correct average value, obtained when all datasets are used only once. A corrigendum was submitted to Metrologia, but it was considered not necessary, and hence not published.
Fig15.txt: Plotted values for apparent IQD in parts per million (ppm) for three different calculation algorithms. Uncertainty is given as propagated type A standard uncertainty.
Fig16.txt: Plotted values for measured IQD in parts per million (ppm) as a function of absorbed optical power in µW, for two different measurement methods - OC and FB method.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Median price paid for residential property in England and Wales, by property type and administrative geographies. Annual data.