License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
The Ice Cream Selling dataset is a simple and well-suited dataset for beginners in machine learning who are looking to practice polynomial regression. It consists of two columns: temperature and the corresponding number of units of ice cream sold.
The dataset captures the relationship between temperature and ice cream sales. It serves as a practical example for understanding and implementing polynomial regression, a powerful technique for modeling nonlinear relationships in data.
The dataset is deliberately simple, so newcomers can focus on the fundamental concepts and steps of polynomial regression without being distracted by unnecessary complexity.
By using this dataset, beginners can gain hands-on experience in preprocessing the data, splitting it into training and testing sets, selecting an appropriate degree for the polynomial regression model, training the model, and evaluating its performance. They can also explore techniques to address potential challenges such as overfitting.
With this dataset, beginners can practice making predictions of ice cream sales based on temperature inputs and visualize the polynomial regression curve that represents the relationship between temperature and ice cream sales.
Overall, the Ice Cream Selling dataset provides an accessible and practical learning resource for beginners to grasp the concepts and techniques of polynomial regression in the context of analyzing ice cream sales data.
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
Walmart, one of the leading retail chains in the US, would like to predict sales and demand accurately. Certain events and holidays affect sales on each day, and sales data are available for 45 Walmart stores. The business sometimes runs out of stock due to unforeseen demand, a problem attributed to an inadequate forecasting approach. An ideal ML algorithm would predict demand accurately while ingesting factors such as economic conditions, including the CPI and the Unemployment Index.
Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays; the four largest are the Super Bowl, Labor Day, Thanksgiving, and Christmas. The weeks containing these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge is modeling the effect of markdowns during these holiday weeks in the absence of complete, ideal historical data. Historical sales data are available for 45 Walmart stores located in different regions.
The dataset is taken from Kaggle.
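The five-times holiday weighting described above corresponds to a weighted mean absolute error. A minimal sketch (the numbers are invented, not Walmart data):

```python
import numpy as np

def weighted_mae(actual, predicted, is_holiday, holiday_weight=5):
    """Mean absolute error where holiday weeks count holiday_weight times
    as much as regular weeks, matching the evaluation described above."""
    w = np.where(is_holiday, holiday_weight, 1)
    return np.sum(w * np.abs(actual - predicted)) / np.sum(w)

# Four example weeks; the third is a holiday week.
actual = np.array([100.0, 120.0, 300.0, 110.0])
pred   = np.array([ 90.0, 125.0, 250.0, 115.0])
hol    = np.array([False, False, True, False])
score = weighted_mae(actual, pred, hol)
```

Because the holiday week's error of 50 is weighted five-fold, the score (33.75 here) is pulled far above the unweighted MAE, which is exactly why a model must get holiday weeks right on this dataset.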
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposures of the non-institutionalized US population. However, because NHANES data are plagued with multiple inconsistencies, the data must be processed before new insights can be derived through large-scale analyses. We therefore developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey: 1) demographics (281 variables), 2) dietary consumption (324 variables), 3) physiological functions (1,040 variables), 4) occupation (61 variables), 5) questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), 6) medications (29 variables), 7) mortality information linked from the National Death Index (15 variables), 8) survey weights (857 variables), 9) environmental exposure biomarker measurements (598 variables), and 10) chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).
csv Data Record: The curated NHANES datasets and the data dictionaries comprise 23 .csv files and 1 Excel file. The curated NHANES datasets consist of 20 .csv files, two for each module: one uncleaned version and one cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. "dictionary_nhanes.csv" is a dictionary listing the variable name, description, module, category, units, CAS number, comment use, chemical family, shortened chemical family, number of measurements, and cycles available for all 5,078 variables in NHANES. "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables. "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes. "nhanes_inconsistencies_documentation.xlsx" is an Excel file containing the cleaning documentation, which records all inconsistencies for all affected variables and helped curate each of the NHANES modules.
R Data Record: For researchers who want to conduct their analysis in the R programming language, the cleaned NHANES modules and the data dictionaries can be downloaded as a .zip file that includes an .RData file and an .R file. "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts for the customized functions that were written to curate the data. "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e., our pipeline) to curate the original NHANES data.
Example starter code: The set of starter code to help users conduct exposome analyses consists of four R Markdown files (.Rmd). We recommend going through the tutorials in order. "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together. "example_1 - account_for_nhanes_design.Rmd" demonstrates how to fit a linear regression model, a survey-weighted regression model, a Cox proportional hazards model, and a survey-weighted Cox proportional hazards model. "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one or more variables, with and without accounting for the NHANES sampling design. "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
A simple yet challenging project: predict housing prices from factors such as house area, number of bedrooms, furnishing status, proximity to the main road, etc. The dataset is small, yet its complexity arises from strong multicollinearity. Can you overcome these obstacles and build a decent predictive model?
Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for clean air. J. Environ. Economics and Management 5, 81–102. Belsley D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.
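A standard way to quantify the multicollinearity this dataset warns about is the variance inflation factor (VIF): regress each predictor on the others and compute 1/(1 - R²). The sketch below implements that from scratch with numpy on invented columns (none of it comes from the housing data itself); statsmodels' `variance_inflation_factor` computes the same quantity:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X:
    VIF_j = 1 / (1 - R^2) from regressing column j on all other columns."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])   # intercept + others
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1 - ((y - A @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Hypothetical predictors: bedrooms is strongly tied to area (the kind of
# collinearity described above), while age is independent of both.
rng = np.random.default_rng(1)
area = rng.normal(1500, 300, 500)
bedrooms = area / 500 + rng.normal(0, 0.3, 500)
age = rng.normal(20, 5, 500)
vifs = vif(np.column_stack([area, bedrooms, age]))
```

Columns with VIF well above 1 (common rules of thumb flag 5 or 10) inflate coefficient variance; dropping, combining, or regularizing such predictors is the usual remedy.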
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
The advertising dataset captures the sales revenue generated against advertising spend across multiple channels: radio, TV, and newspapers.
The goal is to understand the impact of advertising budgets on overall sales.
The dataset is taken from Kaggle.
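Estimating each channel's impact on sales is a multiple linear regression; the per-channel coefficients are the incremental sales per unit of spend. A minimal sketch with synthetic spend and sales figures (assumed, not taken from the Kaggle file):

```python
import numpy as np

# Synthetic stand-in for the advertising data: spend per channel and sales.
rng = np.random.default_rng(2)
n = 300
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
newspaper = rng.uniform(0, 100, n)
# Assumed ground truth: TV and radio drive sales, newspaper does not.
sales = 3.0 + 0.045 * tv + 0.18 * radio + rng.normal(0, 1.0, n)

# Ordinary least squares via lstsq on a design matrix with an intercept.
X = np.column_stack([np.ones(n), tv, radio, newspaper])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
intercept, b_tv, b_radio, b_news = coef
```

With enough data the fitted newspaper coefficient shrinks toward zero, which is how a regression on this kind of dataset reveals which budgets actually move sales.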
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
The Transportation and Logistics Tracking Dataset comprises multiple datasets related to various aspects of transportation and logistics operations. It includes information on on-time delivery impact, routes by rating, customer ratings, delivery times with and without congestion, weather conditions, and differences between fixed and main delivery times across different regions.
On-Time Delivery Impact: insights into the impact of on-time delivery, categorizing deliveries by impact and counting the occurrences in each category.
Routes by Rating: the relationship between routes and their corresponding ratings, giving a visual representation of route performance across rating categories.
Customer Ratings and On-Time Delivery: the relationship between customer ratings and on-time delivery, comparing delivery counts by customer rating and on-time status.
Delivery Time with and Without Congestion: delivery times in various cities, both with and without congestion, allowing analysis of how congestion affects delivery efficiency.
Weather Conditions: a summary of weather conditions, with counts for conditions such as partly cloudy, patchy light rain with thunder, and sunny.
Difference between Fixed and Main Delivery Times: the differences between fixed and main delivery times across regions, shedding light on regional variations in delivery schedules.
Overall, this dataset offers valuable insight into the transportation and logistics domain, enabling analysis and decision-making to optimize delivery processes and enhance customer satisfaction.
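The congestion comparison described above is a straightforward pivot: average delivery time per city, split by congestion status. A sketch with hypothetical column names and made-up rows (the real files may name these fields differently):

```python
import pandas as pd

# Hypothetical rows mirroring the delivery-time tables described above.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "A", "B"],
    "congestion": [True, False, True, False, False, True],
    "delivery_min": [55, 40, 70, 48, 42, 66],
})

# Mean delivery time per city, with vs. without congestion.
pivot = df.pivot_table(values="delivery_min", index="city",
                       columns="congestion", aggfunc="mean")

# Extra minutes attributable to congestion in each city.
delay = pivot[True] - pivot[False]
```

The resulting per-city delay column is exactly the quantity needed to rank cities by how badly congestion hurts delivery efficiency.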
License: MIT (https://opensource.org/licenses/MIT)
License information was derived automatically
This project is still a work in progress.
Initially, this dataset was just for a Google Data Analytics project, where I was given a task to accomplish with the data in a spreadsheet: look at the table given in the spreadsheet, and see if there's a correlation between temperature and revenue in ice cream sales. Eventually, I did see the pattern: higher temperatures usually meant more revenue, which seems realistic. However, I wanted to dig further into the data and perform a deeper analysis using a visualization, and maybe even a regression. My new questions were, "How strong is this correlation?" and "Can we represent the data using a linear regression?"
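Both questions — how strong the correlation is, and whether a linear regression represents the data — reduce to two numpy calls: the Pearson correlation coefficient and a degree-1 polynomial fit. The temperatures and revenues below are synthetic stand-ins, assuming the positive linear trend observed in the spreadsheet:

```python
import numpy as np

# Invented stand-in for the spreadsheet's temperature/revenue columns.
rng = np.random.default_rng(3)
temp = rng.uniform(10, 35, 60)
revenue = 50 + 12 * temp + rng.normal(0, 30, 60)

# Strength of the linear relationship (Pearson r, between -1 and 1).
r = np.corrcoef(temp, revenue)[0, 1]

# Least-squares line: revenue ≈ slope * temp + intercept.
slope, intercept = np.polyfit(temp, revenue, 1)
```

An r close to 1 confirms "higher temperatures usually meant more revenue" is a strong linear pattern, and the fitted slope quantifies the extra revenue per degree.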
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset provides synthetic yet realistic data for analyzing and forecasting retail store inventory demand. It contains over 73,000 rows of daily data across multiple stores and products, including attributes like sales, inventory levels, pricing, weather, promotions, and holidays.
The dataset is ideal for practicing machine learning tasks such as demand forecasting, dynamic pricing, and inventory optimization. It allows data scientists to explore time series forecasting techniques, study the impact of external factors like weather and holidays on sales, and build advanced models to optimize supply chain performance.
Challenge 1: Time Series Demand Forecasting Predict daily product demand across stores using historical sales and inventory data. Can you build an LSTM-based forecasting model that outperforms classical methods like ARIMA?
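Before reaching for ARIMA or an LSTM, it is worth establishing a seasonal-naive baseline (predict each day from the same weekday one week earlier) that any serious model must beat. A sketch on a synthetic daily series with weekly seasonality (invented numbers, not the dataset's):

```python
import numpy as np

# Synthetic daily demand with weekly seasonality, standing in for one
# store/product series: weekdays quiet, weekends busy.
rng = np.random.default_rng(4)
pattern = np.array([80, 85, 90, 95, 100, 140, 130], dtype=float)
demand = np.tile(pattern, 10) + rng.normal(0, 5, 70)

# Hold out the final week as the test horizon.
train, test = demand[:-7], demand[-7:]

naive = np.full(7, train[-1])        # repeat the last observed value
seasonal_naive = train[-7:]          # repeat the same weekday last week

def mae(pred):
    return np.mean(np.abs(pred - test))
```

On a series with weekly structure the seasonal-naive forecast beats the flat naive forecast by a wide margin; reporting an LSTM's error relative to this baseline (rather than in isolation) makes the comparison in Challenge 1 meaningful.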
Challenge 2: Inventory Optimization Optimize inventory levels by analyzing sales trends and minimizing stockouts while reducing overstock situations.
Challenge 3: Dynamic Pricing Develop a pricing strategy based on demand, competitor pricing, and discounts to maximize revenue.
Date: daily records from [start_date] to [end_date].
Store ID & Product ID: unique identifiers for stores and products.
Category: product categories such as Electronics, Clothing, and Groceries.
Region: geographic region of the store.
Inventory Level: stock available at the beginning of the day.
Units Sold: units sold during the day.
Demand Forecast: predicted demand based on past trends.
Weather Condition: daily weather impacting sales.
Holiday/Promotion: indicators for holidays or promotions.
Exploratory Data Analysis (EDA): analyze sales trends, visualize data, and identify patterns.
Time Series Forecasting: train models like ARIMA, Prophet, or LSTM to predict future demand.
Pricing Analysis: study how discounts and competitor pricing affect sales.