Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data scraped from National Retail Federation webpage for 2020.
https://creativecommons.org/publicdomain/zero/1.0/
These graphs were created in R and Canva:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F1a47e2e6e4836b86b065441359d5c9f0%2Fgraph1.gif?generation=1742159161939732&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F87de025c5703cb69483764c4fc9c58ab%2Fgraph2.gif?generation=1742159169346925&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2Fddf5001438c97c8c030333261685849b%2Fgraph3.png?generation=1742159174793142&alt=media
The dataset offers a comprehensive view of grocery inventory, covering 990 products across multiple categories such as Grains & Pulses, Beverages, Fruits & Vegetables, and more. It includes crucial details about each product, such as its unique identifier (Product_ID), name, category, and supplier information, including Supplier_ID and Supplier_Name. This dataset is particularly valuable for businesses aiming to optimize inventory management, sales tracking, and supply chain efficiency.
Key inventory-related fields include Stock_Quantity, which indicates the current stock level, and Reorder_Level, which determines when a product should be reordered. The Reorder_Quantity specifies how much stock to order when inventory falls below the reorder threshold. Additionally, Unit_Price provides insight into pricing, helping businesses analyze cost trends and profitability.
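The reorder rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset author's code: the sample rows and the `build_reorder_list` helper are hypothetical, and treating "falls below" as a strict comparison is an assumption.

```python
from typing import Dict, List


def build_reorder_list(products: List[Dict]) -> List[Dict]:
    """Return one order line per product whose stock is below its reorder level."""
    orders = []
    for product in products:
        # Assumption: "falls below the reorder threshold" means strictly less than.
        if product["Stock_Quantity"] < product["Reorder_Level"]:
            orders.append({
                "Product_ID": product["Product_ID"],
                "order_units": product["Reorder_Quantity"],
                # Rough cost estimate: units to order times the unit price.
                "estimated_cost": round(
                    product["Reorder_Quantity"] * product["Unit_Price"], 2
                ),
            })
    return orders


# Hypothetical sample rows using the column names from the description.
sample_products = [
    {"Product_ID": "P-001", "Stock_Quantity": 22, "Reorder_Level": 72,
     "Reorder_Quantity": 70, "Unit_Price": 4.50},
    {"Product_ID": "P-002", "Stock_Quantity": 90, "Reorder_Level": 30,
     "Reorder_Quantity": 40, "Unit_Price": 2.00},
]

reorder_list = build_reorder_list(sample_products)
print(reorder_list)
```

Only the first sample product is below its reorder level, so it is the only one that ends up in the list.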
To manage product flow, the dataset includes dates such as Date_Received, which tracks when the product was added to the warehouse, and Last_Order_Date, marking the most recent procurement. For perishable goods, the Expiration_Date column is critical, allowing businesses to minimize waste by monitoring shelf life. The Warehouse_Location specifies where each product is stored, facilitating efficient inventory handling.
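As a rough sketch of shelf-life monitoring, the snippet below flags products whose Expiration_Date falls within a chosen window. The 7-day window, the `expiring_soon` helper, and the sample rows are assumptions for illustration; the description does not state how dates are formatted in the file, so the dates are assumed to be parsed already.

```python
from datetime import date
from typing import Dict, List


def days_until_expiration(expiration_date: date, today: date) -> int:
    """Number of days left before the product expires (negative if already expired)."""
    return (expiration_date - today).days


def expiring_soon(products: List[Dict], today: date, window_days: int = 7) -> List[str]:
    """Return the IDs of products that expire within the next `window_days` days."""
    return [
        product["Product_ID"]
        for product in products
        if 0 <= days_until_expiration(product["Expiration_Date"], today) <= window_days
    ]


# Hypothetical rows; Expiration_Date is assumed to be a parsed date object.
sample_products = [
    {"Product_ID": "P-010", "Expiration_Date": date(2025, 3, 5)},
    {"Product_ID": "P-011", "Expiration_Date": date(2025, 6, 1)},
    {"Product_ID": "P-012", "Expiration_Date": date(2025, 2, 20)},  # already expired
]

at_risk = expiring_soon(sample_products, today=date(2025, 3, 1))
print(at_risk)
```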
Sales and performance metrics are also included. The Sales_Volume column records the total number of units sold, providing insights into consumer demand. Inventory_Turnover_Rate helps businesses assess how quickly a product sells and is replenished, ensuring better stock management. The dataset also tracks the Status of each product, indicating whether it is Active, Discontinued, or Backordered.
The dataset serves multiple purposes in inventory management, sales performance evaluation, supplier analysis, and product lifecycle tracking. Businesses can leverage this data to refine reorder strategies, ensuring optimal stock levels and avoiding stockouts or excessive inventory. Sales analysis can help identify high-demand products and slow-moving items, enabling better decision-making in pricing and promotions. Evaluating suppliers based on their performance, pricing, and delivery efficiency helps streamline procurement and improve overall supply chain operations.
Furthermore, the dataset can support predictive analytics by employing machine learning techniques to estimate reorder quantities, forecast demand, and optimize stock replenishment. Inventory turnover insights can aid in maintaining a balanced supply, preventing unnecessary overstocking or shortages. By tracking trends in sales, businesses can refine their marketing and distribution strategies, ensuring sustained profitability.
This dataset is designed for educational and demonstration purposes, offering fictional data under the Creative Commons Attribution 4.0 International License. Users are free to analyze, modify, and apply the data while providing proper attribution. Additionally, certain products are marked as discontinued or backordered, reflecting real-world inventory dynamics. Businesses dealing with perishable goods should closely monitor expiration and last order dates to avoid losses due to spoilage.
Overall, this dataset provides a versatile resource for those interested in inventory management, sales analysis, and supply chain optimization. By leveraging the structured data, businesses can make data-driven decisions to enhance operational efficiency and maximize profitability.
The provided Python code analyzes sales data for a business: it merges monthly sales files, cleans and augments the resulting dataset, and performs several analytical tasks. Here's a breakdown of the code:
Data Preparation and Merging:
The code begins by importing the necessary libraries and filtering out warnings. It merges sales data from 12 months into a single file named "all_data.csv".
Data Cleaning:
Rows with NaN values are dropped, and any entries starting with 'Or' in the 'Order Date' column are removed. Columns like 'Quantity Ordered' and 'Price Each' are converted to numeric types for further analysis.
Data Augmentation:
Additional columns such as 'Month', 'Sales', and 'City' are added to the dataset. The 'City' column is derived from the 'Purchase Address' column.
Analysis:
Several analyses are conducted, answering questions such as:
- The best month for sales and total earnings.
- The city with the highest number of sales.
- The ideal time for advertisements based on the number of orders per hour.
- Products that are often sold together.
- The best-selling products and their correlation with price.
Visualization:
Bar charts and line plots are used to visualize the analysis results, making it easier to interpret trends and patterns. Matplotlib is employed for creating the visualizations.
Summary:
The code concludes with a visualization that combines the quantity ordered and the average price for each product, shedding light on product performance. The code is structured to offer insights into sales patterns, customer behavior, and product performance, providing useful information for strategic decision-making in the business.
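The cleaning and augmentation steps described above might look roughly like the following. This is a standard-library sketch, not the original notebook's code: the sample rows, the `MM/DD/YY HH:MM` date layout, and the "street, city, state zip" address layout are assumptions made for illustration.

```python
import csv
import io

# Hypothetical excerpt of a merged file, including a repeated header row
# and an empty row of the kind the cleaning step is meant to remove.
RAW = """Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1001,USB-C Cable,2,11.95,04/19/19 08:46,"917 1st St, Dallas, TX 75001"
Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
,,,,,
1002,Headphones,1,99.99,12/07/19 22:30,"682 Elm St, Boston, MA 02215"
"""


def clean_rows(text):
    """Drop empty and repeated-header rows, convert types, add Month, Sales and City."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Drop rows with missing values.
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        # Drop repeated header rows: their 'Order Date' cell starts with 'Or'.
        if row["Order Date"].startswith("Or"):
            continue
        quantity = int(row["Quantity Ordered"])
        price = float(row["Price Each"])
        cleaned.append({
            "Product": row["Product"],
            "Quantity Ordered": quantity,
            "Price Each": price,
            # Assumption: the date starts with a two-digit month.
            "Month": int(row["Order Date"][:2]),
            "Sales": round(quantity * price, 2),
            # Assumption: the city is the second comma-separated address field.
            "City": row["Purchase Address"].split(",")[1].strip(),
        })
    return cleaned


cleaned = clean_rows(RAW)

# Total sales per month, as a starting point for the "best month" question.
sales_by_month = {}
for row in cleaned:
    sales_by_month[row["Month"]] = sales_by_month.get(row["Month"], 0.0) + row["Sales"]
print(sales_by_month)
```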
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains two .csv files that can be used as new benchmark data for solving real-world sales forecasting problems. All data are real and were obtained experimentally in a production environment at one of the biggest retail companies in Bosnia and Herzegovina. The data cover the period from 2014/03/01 to 2021/03/01 and are aggregated on a monthly basis for the 50 top items of one very popular brand in 4 different organizational units.
The Office of Policy and Management maintains a listing of all real estate sales with a sales price of $2,000 or greater that occur between October 1 and September 30 of each year. For each sale record, the file includes: town, property address, date of sale, property type (residential, apartment, commercial, industrial or vacant land), sales price, and property assessment. Data are collected in accordance with Connecticut General Statutes, section 10-261a and 10-261b: https://www.cga.ct.gov/current/pub/chap_172.htm#sec_10-261a and https://www.cga.ct.gov/current/pub/chap_172.htm#sec_10-261b. Annual real estate sales are reported by grand list year (October 1 through September 30 each year). For instance, sales from 2018 GL are from 10/01/2018 through 9/30/2019. Some municipalities may not report data for certain years because when a municipality implements a revaluation, they are not required to submit sales data for the twelve months following implementation.
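The grand list year convention can be expressed as a small helper. The function name `grand_list_year` is hypothetical; the rule itself follows the example in the description (2018 GL covers 10/01/2018 through 9/30/2019).

```python
from datetime import date


def grand_list_year(sale_date: date) -> int:
    """Map a sale date to its grand list (GL) year.

    A GL year runs from October 1 of that year through September 30
    of the following year.
    """
    if sale_date.month >= 10:
        return sale_date.year
    return sale_date.year - 1


print(grand_list_year(date(2018, 10, 1)))  # first day of the 2018 GL
print(grand_list_year(date(2019, 9, 30)))  # last day of the 2018 GL
print(grand_list_year(date(2019, 10, 1)))  # first day of the 2019 GL
```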
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains the ratings and reviews of 1K+ Amazon products, as per their details listed on the official Amazon website. The data was scraped in January 2023 from the official Amazon website.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
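Because the rows in data_rt.csv are ordered by label, a shuffle is needed before any train/test split. A minimal sketch with a fixed seed for reproducibility follows; the tiny in-memory sample and the `shuffle_rows` helper are hypothetical stand-ins for the real file.

```python
import random


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of `rows`; a fixed seed keeps the order reproducible."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical (review, label) pairs laid out like the file:
# all negatives (0) first, then all positives (1).
ordered_rows = [
    ("dull and overlong", 0),
    ("a tedious mess", 0),
    ("never finds its footing", 0),
    ("a warm, funny surprise", 1),
    ("sharp and moving", 1),
    ("beautifully acted", 1),
]

shuffled_rows = shuffle_rows(ordered_rows)
print([label for _, label in shuffled_rows])
```

Shuffling a copy leaves the original order available, which can be handy for checking that no rows were lost.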
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
This is a Power BI report of online sales data.
The data is divided into two tables, Sales.csv and Orders.csv
Sales.csv contains the following columns
Orders.csv contains
Apart from the pbix file (Power BI file), I have attached a PDF version of the report.
This dataset is a merged dataset created from the data provided in the competition "Store Sales - Time Series Forecasting". The other datasets that were provided there apart from train and test (for example holidays_events, oil, stores, etc.) could not be used in the final prediction. According to my understanding, through the EDA of the merged dataset, we will be able to get a clearer picture of the other factors that might also affect the final prediction of grocery sales. Therefore, I created this merged dataset and posted it here for the further scope of analysis.
##### Data Description

Data Field Information (this is a copy of the description as provided in the actual dataset)

Train.csv
- id: store id
- date: date of the sale
- store_nbr: identifies the store at which the products are sold.
- **family**: identifies the type of product sold.
- sales: gives the total sales for a product family at a particular store at a given date. Fractional values are possible since products can be sold in fractional units (1.5 kg of cheese, for instance, as opposed to 1 bag of chips).
- onpromotion: gives the total number of items in a product family that were being promoted at a store on a given date.
- Store metadata, including **city, state, type, and cluster**.
- cluster is a grouping of similar stores.
- Holidays and Events, with metadata. NOTE: Pay special attention to the transferred column. A holiday that is transferred officially falls on that calendar day but was moved to another date by the government. A transferred day is more like a normal day than a holiday. To find the day that it was celebrated, look for the corresponding row where the type is Transfer. For example, the holiday Independencia de Guayaquil was transferred from 2012-10-09 to 2012-10-12, which means it was celebrated on 2012-10-12. Days that are type Bridge are extra days that are added to a holiday (e.g., to extend the break across a long weekend). These are frequently made up by the type Work Day, which is a day not normally scheduled for work (e.g., Saturday) that is meant to pay back the Bridge. Additional holidays are days added to a regular calendar holiday, for example, as typically happens around Christmas (making Christmas Eve a holiday).
- dcoilwtico: Daily oil price. Includes values during both the train and test data timeframes. (Ecuador is an oil-dependent country and its economic health is highly vulnerable to shocks in oil prices.)

**Note:** *There is a transaction column in the training dataset which displays the sales transactions on that particular date.*

Test.csv
- The test data, having the same features as the training data. You will predict the target sales for the dates in this file.
- The dates in the test data are for the 15 days after the last date in the training data.

**Note:** *There is no transaction column in the test dataset, unlike the training dataset. Therefore, while building the model, you might exclude this column and use it only for EDA.*

submission.csv
- A sample submission file in the correct format.
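The note on the transferred column can be turned into a small filter. The sketch below assumes each holiday record is a dict with `date`, `type`, `description`, and `transferred` keys; that layout, the `is_celebrated_holiday` helper, and the sample rows are illustrative assumptions (only the Independencia de Guayaquil dates come from the description above).

```python
def is_celebrated_holiday(row):
    """Return True if the row marks a day that was actually observed as a holiday."""
    # A transferred holiday was moved to another date, so the original
    # calendar day behaves like a normal day.
    if row["transferred"]:
        return False
    # A Work Day row is a make-up working day that pays back a Bridge day.
    if row["type"] == "Work Day":
        return False
    # Holiday, Transfer, Bridge and Additional rows count as days off.
    return True


# Illustrative rows; the Work Day entry and the description strings are hypothetical.
sample_rows = [
    {"date": "2012-10-09", "type": "Holiday",
     "description": "Independencia de Guayaquil", "transferred": True},
    {"date": "2012-10-12", "type": "Transfer",
     "description": "Independencia de Guayaquil (moved)", "transferred": False},
    {"date": "2013-01-05", "type": "Work Day",
     "description": "Make-up working day", "transferred": False},
]

celebrated_dates = [row["date"] for row in sample_rows if is_celebrated_holiday(row)]
print(celebrated_dates)
```

With these rows, only 2012-10-12 is kept, matching the example in which the holiday was celebrated on the date it was transferred to.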
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
High-quality, free real estate dataset covering the whole United States, in CSV format. It contains over 10,000 records relevant to real estate investors, agents, and data scientists. We are working on complete datasets from a wide variety of countries. Don't hesitate to contact us for more information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset containing the 100 latest settled sales in CSV format for Berkeley - Lake Heights - Cringila as at March-2025, data sourced from the NSW Valuer General, geocoded and analyzed by AreaSearch.
Monthly state sales tax collections is an experimental dataset published by the U.S. Census Bureau. It provides data for collections from sales taxes including motor fuel taxes. Data reported for a specific month generally represent sales taxes collected on sales made during the prior month. Tax collections primarily rely on unaudited data collected from existing state reports or state data sources available from and posted on the Internet. Secondarily, states report the data via the Quarterly Survey of State and Local Tax Revenue. Data are updated monthly, but due to differing reporting cycles data for some states may lag.
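When aligning collections with the underlying sales activity, the one-month lag mentioned above can be handled with a small helper. This is a sketch only: the function name `sales_month_for` is hypothetical, and since the description says the lag holds "generally", individual states may differ.

```python
def sales_month_for(report_year: int, report_month: int):
    """Return the (year, month) in which the taxed sales were generally made.

    Assumes the usual one-month lag between sales and reported collections.
    """
    if not 1 <= report_month <= 12:
        raise ValueError("report_month must be between 1 and 12")
    if report_month == 1:
        # January collections generally reflect December sales of the prior year.
        return report_year - 1, 12
    return report_year, report_month - 1


print(sales_month_for(2021, 1))
print(sales_month_for(2021, 7))
```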
The UK House Price Index is a National Statistic.
Download the full UK House Price Index data below, or use our tool to create your own bespoke reports.
Datasets are available as CSV files. Find out about republishing and making use of the data.
Google Chrome is blocking downloads of our UK HPI data files (Chrome 88 onwards). Please use another internet browser while we resolve this issue. We apologise for any inconvenience caused.
This file includes a derived back series for the new UK HPI. Under the UK HPI, data is available from 1995 for England and Wales, 2004 for Scotland and 2005 for Northern Ireland. A longer back series has been derived by using the historic path of the Office for National Statistics HPI to construct a series back to 1968.
Download the full UK HPI background file:
If you are interested in a specific attribute, we have separated them into these CSV files:
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Average-prices-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=average_price&utm_term=9.30_17_11_21 - Average price (CSV, 9.2MB)
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Average-prices-Property-Type-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=average_price_property_price&utm_term=9.30_17_11_21 - Average price by property type (CSV, 27.8MB)
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Sales-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=sales&utm_term=9.30_17_11_21 - Sales (CSV, 4.7MB)
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Cash-mortgage-sales-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=cash_mortgage-sales&utm_term=9.30_17_11_21 - Cash mortgage sales (CSV, 6.2MB)
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/First-Time-Buyer-Former-Owner-Occupied-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=FTNFOO&utm_term=9.30_17_11_21 - First time buyer and former owner occupier (CSV, 5.9MB)
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/New-and-Old-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=new_build&utm_term=9.30_17_11_21 - New build and existing resold property (CSV, 16.9MB)
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Indices-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=index&utm_term=9.30_17_11_21 - Index (CSV, 5.9MB)
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Indices-seasonally-adjusted-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=index_season_adjusted&utm_term=9.30_17_11_21 - Index seasonally adjusted (CSV, 194KB)
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Average-price-seasonally-adjusted-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=average-price_season_adjusted&utm_term=9.30_17_11_21 - Average price seasonally adjusted (CSV, 2
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Big Mart Sales’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/akashdeepkuila/big-mart-sales on 12 November 2021.
--- Dataset description provided by original source is as follows ---
The data scientists at Big Mart have collected 2013 sales data for 1559 products across 10 stores in different cities. Also, certain attributes of each product and store have been defined. The aim is to build a predictive model and predict the sales of each product at a particular outlet.
Using this model, Big Mart will try to understand the properties of products and outlets which play a key role in increasing sales.
Please note that the data may have missing values as some stores might not report all the data due to technical glitches. Hence, it will be required to treat them accordingly.
The dataset provides the product details and the outlet information of the products purchased, with their sales value, split into a train set (8523) and a test set (5681).
Train file: CSV containing the item outlet information with sales value
Test file: CSV containing item outlet combinations for which sales need to be forecasted
ProductID: unique product ID
Weight: weight of products
FatContent: specifies whether the product is low on fat or not
Visibility: percentage of total display area of all products in a store allocated to the particular product
ProductType: the category to which the product belongs
MRP: Maximum Retail Price (listed price) of the products
OutletID: unique store ID
EstablishmentYear: year of establishment of the outlets
OutletSize: the size of the store in terms of ground area covered
LocationType: the type of city in which the store is located
OutletType: specifies whether the outlet is just a grocery store or some sort of supermarket
OutletSales: (target variable) sales of the product in the particular store
Sales of a given product at a retail store can depend both on store attributes as well as product attributes. The dataset is ideal to explore and build a data science model to predict the future sales.
--- Original source retains full ownership of the source dataset ---
Datasets are available as CSV files. Find out about republishing and making use of the data.
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/UK-HPI-full-file-2016-09.csv - UK HPI full file (CSV, 42.5MB)
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Average-prices-2016-09.csv - Average price.csv
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Average-prices-Property-Type-2016-09.csv - Average price by property type.csv
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Sales-2016-09.csv - Sales.csv
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Cash-mortgage-sales-2016-09.csv - Cash mortgage sales.csv
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/First-Time-Buyer-Former-Owner-Occupied-2016-09.csv - First time buyer and former owner occupied.csv
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/New-and-Old-2016-09.csv - New build and existing resold property.csv
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Indices-2016-09.csv - Index.csv
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Indices-seasonally-adjusted-2016-09.csv - Index seasonally adjusted.csv
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Average-price-seasonally-adjusted-2016-09.csv - Average Price seasonally adjusted.csv
http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Repossession-2016-09.csv - Repossessions.csv
This file includes a derived back series for the new UK HPI. Under the UK HPI, data is available from 1995 for England and Wales, 2004 for Scotland and 2005 for Northern Ireland. A longer back series has been derived by using the historic path of the ONS HPI to construct a series back to 1968:
The release calendar shows when the next month’s data will be published.
Create your own reports based on the UK House Price Index data using our tool: http://landregistry.data.gov.uk/app/ukhpi
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset containing the 100 latest settled sales in CSV format for Minto - St Andrews as at April-2025, data sourced from the NSW Valuer General, geocoded and analyzed by AreaSearch.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset containing the 100 latest settled sales in CSV format for Young Surrounds as at March-2025, data sourced from the NSW Valuer General, geocoded and analyzed by AreaSearch.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset containing the 100 latest settled sales in CSV format for Orange as at March-2025, data sourced from the NSW Valuer General, geocoded and analyzed by AreaSearch.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
I created this dataset to extend the data in another dataset I am working on: https://www.kaggle.com/gregorut/videogamesales
The console.csv file has a list of consoles with their manufacturers and sales (from https://en.wikipedia.org/wiki/List_of_best-selling_game_consoles ). I made color.csv for practice; it is a file assigning colors to manufacturers for plotting. The region file refers to the regions in https://www.kaggle.com/gregorut/videogamesales ; I am just assigning colors for plotting.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset containing the 100 latest settled sales in CSV format for Wyoming as at March-2025, data sourced from the NSW Valuer General, geocoded and analyzed by AreaSearch.
https://crawlfeeds.com/privacy_policy
Aldi is a supermarket chain operating over 10,000 stores. The Crawl Feeds team extracted information on more than 11K groceries from Aldi.
Available data format CSV
18 data points
Dataset will update based on request
Last extracted on 17 Jun 2022
---
Site complexity: Difficult