5 datasets found
  1. Evaluating FAIR Models for Rossmann Store Sales Prediction: Insights and...

    • test.researchdata.tuwien.at
    bin, csv, json +1
    Updated Apr 28, 2025
    Cite
    Dilara Çakmak (2025). Evaluating FAIR Models for Rossmann Store Sales Prediction: Insights and Performance Analysis [Dataset]. http://doi.org/10.70124/f5t2d-xt904
    Available download formats: bin, json, text/markdown, csv
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    TU Wien
    Authors
    Dilara Çakmak
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 2025
    Description

    Context and Methodology

    Research Domain:
    The dataset is part of a project focused on retail sales forecasting. Specifically, it is designed to predict daily sales for Rossmann, a chain of over 3,000 drug stores operating across seven European countries. The project falls under the broader domain of time series analysis and machine learning applications for business optimization. The goal is to apply machine learning techniques to forecast future sales based on historical data, which includes factors like promotions, competition, holidays, and seasonal trends.

    Purpose:
    The primary purpose of this dataset is to help Rossmann store managers predict daily sales for up to six weeks in advance. By making accurate sales predictions, Rossmann can improve inventory management, staffing decisions, and promotional strategies. This dataset serves as a training set for machine learning models aimed at reducing forecasting errors and supporting decision-making processes across the company’s large network of stores.

    How the Dataset Was Created:
    The dataset was compiled from several sources, including historical sales data from Rossmann stores, promotional calendars, holiday schedules, and external factors such as competition. Its features include promotion details, holiday indicators, competitor distance, and whether the store was open or closed on a given day. The data was initially created for the Kaggle Rossmann Store Sales competition and is publicly available on platforms like Kaggle. Here it is made accessible via an API for further analysis and modeling, and it is structured so that machine learning models can predict future sales from these input variables.

    Technical Details

    Dataset Structure:

    The dataset consists of three main files, each with its specific role:

    1. Train:
      This file contains the historical sales data, which is used to train machine learning models. It includes daily sales information for each store, as well as various features that could influence the sales (e.g., promotions, holidays, store type, etc.).

      https://handle.test.datacite.org/10.82556/yb6j-jw41
      PID: b1c59499-9c6e-42c2-af8f-840181e809db
    2. Test2:
      The test dataset mirrors the structure of train.csv but does not include the actual sales values (i.e., the target variable). This file is used for making predictions using the trained machine learning models. It is used to evaluate the accuracy of predictions when the true sales data is unknown.

      https://handle.test.datacite.org/10.82556/jerg-4b84
      PID: 7cbb845c-21dd-4b60-b990-afa8754a0dd9
    3. Store:
      This file provides metadata about each store, such as its type, assortment level, nearest-competitor details, and Promo2 participation. This context is essential for interpreting the sales records.

      https://handle.test.datacite.org/10.82556/nqeg-gy34
      PID: 9627ec46-4ee6-4969-b14a-bda555fe34db
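
The relationship between the Train and Store files can be sketched as a join on the shared Store identifier. This is a minimal illustration with pandas, not the project's actual loading code: the two small frames and all values in them are made up, and only a few of the documented columns are included.

```python
import pandas as pd

# Stand-in for the Train file: one row per store and day (values are invented).
train = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": ["2015-07-30", "2015-07-31", "2015-07-31"],
    "Sales": [5020, 5263, 6064],
    "Open": [1, 1, 1],
    "Promo": [1, 1, 1],
})

# Stand-in for the Store file: one row per store (values are invented).
store = pd.DataFrame({
    "Store": [1, 2],
    "StoreType": ["c", "a"],
    "Assortment": ["a", "a"],
    "CompetitionDistance": [1270.0, 570.0],
})

# Parse dates so calendar features (weekday, month) can be derived later.
train["Date"] = pd.to_datetime(train["Date"])

# A left join keeps every daily record; validate raises if a store
# appears more than once in the metadata frame.
merged = train.merge(store, on="Store", how="left", validate="many_to_one")
print(merged.shape)
```

The same join would apply to the test file, which carries the same columns except the Sales target.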

    Data Fields Description:

    • Id: A unique identifier for each (Store, Date) combination within the test set.

    • Store: A unique identifier for each store.

    • Sales: The daily turnover (target variable) for each store on a specific day (this is what you are predicting).

    • Customers: The number of customers visiting the store on a given day.

    • Open: An indicator of whether the store was open (1 = open, 0 = closed).

    • StateHoliday: Indicates if the day is a state holiday, with values like:

      • 'a' = public holiday,

      • 'b' = Easter holiday,

      • 'c' = Christmas,

      • '0' = no holiday.

    • SchoolHoliday: Indicates whether the store is affected by school closures (1 = yes, 0 = no).

    • StoreType: Differentiates between four types of stores: 'a', 'b', 'c', 'd'.

    • Assortment: Describes the level of product assortment in the store:

      • 'a' = basic,

      • 'b' = extra,

      • 'c' = extended.

    • CompetitionDistance: Distance (in meters) to the nearest competitor store.

    • CompetitionOpenSince[Month/Year]: The month and year when the nearest competitor store opened.

    • Promo: Indicates whether the store is running a promotion on a particular day (1 = yes, 0 = no).

    • Promo2: Indicates whether the store is participating in Promo2, a continuing promotion for some stores (1 = participating, 0 = not participating).

    • Promo2Since[Year/Week]: The year and calendar week when the store started participating in Promo2.

    • PromoInterval: Lists the months in which a new round of Promo2 starts, e.g., "Feb,May,Aug,Nov" means a round starts in February, May, August, and November.
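
As one example of turning these fields into model inputs, the sketch below checks whether a Promo2 round starts in the month of a given date. The function name `promo2_round_starts` and the month lookup table are hypothetical helpers, not part of the dataset or the project's notebook; the sketch assumes PromoInterval holds comma-separated English month abbreviations, as in the example above.

```python
from datetime import date

# Hypothetical lookup from three-letter month abbreviation to month number.
MONTH_ABBREVIATIONS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def promo2_round_starts(promo2: int, promo_interval: str, day: date) -> bool:
    """Return True if the store is in Promo2 and a round starts in day's month."""
    if promo2 != 1 or not promo_interval:
        return False
    # Keep only the first three letters so longer spellings still match
    # (an assumption about how the months may be written).
    start_months = {
        MONTH_ABBREVIATIONS[name.strip()[:3]]
        for name in promo_interval.split(",")
    }
    return day.month in start_months


print(promo2_round_starts(1, "Feb,May,Aug,Nov", date(2015, 5, 4)))
```

A flag like this can be added as a column alongside Promo and Promo2 before training.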

    Software Requirements

    To work with this dataset, you will need the following access and software:

    • DBRepo Authorization: This is required to access the datasets via the DBRepo API. You may need to authenticate with an API key or login credentials to retrieve the datasets.

    • Python Libraries: Key libraries for working with the dataset include:

      • pandas for data manipulation,

      • numpy for numerical operations,

      • matplotlib and seaborn for data visualization,

      • scikit-learn for machine learning algorithms.

    Additional Resources

    Several additional resources are available for working with the dataset:

    1. Presentation:
      A presentation summarizing the exploratory data analysis (EDA), feature engineering process, and key insights from the analysis is provided. This presentation also includes visualizations that help in understanding the dataset’s trends and relationships.

    2. Jupyter Notebook:
      A Jupyter notebook, titled Retail_Sales_Prediction_Capstone_Project.ipynb, is provided, which details the entire machine learning pipeline, from data loading and cleaning to model training and evaluation.

    3. Model Evaluation Results:
      The project includes a detailed evaluation of various machine learning models, including their performance metrics like training and testing scores, Mean Absolute Percentage Error (MAPE), and Root Mean Squared Error (RMSE). This allows for a comparison of model effectiveness in forecasting sales.

    4. Trained Models (.pkl files):
      The models trained during the project are saved as .pkl files. These files contain the trained machine learning models (e.g., Random Forest, Linear Regression, etc.) that can be loaded and used to make predictions without retraining the models from scratch.

    5. sample_submission.csv:
      This sample submission file shows the expected format for predictions. It contains predictions made on the test dataset using the trained Random Forest model.
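
The two error metrics named under Model Evaluation Results can be computed as in the sketch below. This is an illustration with numpy, not the project's evaluation code: the sample values are invented, and skipping days with zero actual sales in MAPE is an assumption made here to avoid division by zero (for instance on days when a store is closed).

```python
import numpy as np


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent, ignoring zero-sales days."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = actual != 0  # assumption: rows with zero actual sales are skipped
    return float(np.mean(np.abs((actual[mask] - predicted[mask]) / actual[mask])) * 100)


# Invented example values: the third day has zero sales.
actual = [100.0, 200.0, 0.0, 400.0]
predicted = [110.0, 180.0, 0.0, 400.0]

print(round(rmse(actual, predicted), 3))
print(round(mape(actual, predicted), 3))
```

RMSE is in the same unit as Sales and penalizes large misses more heavily, while MAPE expresses the error relative to each day's turnover.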

    These resources provide a comprehensive guide to implementing and analyzing the sales forecasting model, helping you understand the data, methods, and results in greater detail.
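
How a saved model and the submission file fit together can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: the toy data is invented, a Linear Regression stands in for the trained models, the model is pickled in memory rather than to a .pkl file, the Id and Sales column layout is assumed for the submission, and setting predictions to zero for closed stores is a choice made here.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy training data (invented): two documented features and daily sales.
X_train = pd.DataFrame({"Promo": [0, 1, 0, 1], "SchoolHoliday": [0, 0, 1, 1]})
y_train = np.array([4000.0, 6000.0, 4500.0, 6500.0])

model = LinearRegression().fit(X_train, y_train)

# Serialize and restore the fitted model; writing these bytes to disk
# would produce a .pkl file like the ones shipped with the project.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

# Toy test rows (invented): same features, no Sales column, plus Id and Open.
test = pd.DataFrame({
    "Id": [1, 2, 3],
    "Open": [1, 0, 1],
    "Promo": [1, 0, 0],
    "SchoolHoliday": [0, 0, 1],
})
predictions = restored.predict(test[["Promo", "SchoolHoliday"]])

# Assumption: a closed store sells nothing, so force those rows to zero.
predictions = np.where(test["Open"] == 1, predictions, 0.0)

# Assumed submission layout: one row per test Id with the predicted Sales.
submission = pd.DataFrame({"Id": test["Id"], "Sales": predictions})
print(submission.to_csv(index=False))
```

Unpickling executes code embedded in the file, so .pkl files should only be loaded from a trusted source.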

  2. 2012 Economic Surveys: SB1200CSCB35 | Statistics for All U.S. Firms by...

    • data.census.gov
    Updated Feb 24, 2016
    Cite
    ECN (2016). 2012 Economic Surveys: SB1200CSCB35 | Statistics for All U.S. Firms by Percent of Total Sales of Goods/Services Exported Outside the United States by Receipts Size of Firm, Gender, Ethnicity, Race, and Veteran Status for the U.S.: 2012 (ECNSVY Survey of Business Owners Survey of Business Owners Characteristics of Business) [Dataset]. https://data.census.gov/table/SBOCB2012.SB1200CSCB35?q=GO+WEST+CONSTRUCTION
    Dataset updated
    Feb 24, 2016
    Dataset provided by
    United States Census Bureau http://census.gov/
    Authors
    ECN
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2012
    Area covered
    United States
    Description

    Release Date: 2016-02-23.

    [NOTE: Includes firms with paid employees and firms with no paid employees. Data are based on the 2012 Economic Census, and the estimates of business ownership by gender, ethnicity, race, and veteran status are from the 2012 Survey of Business Owners. Detail may not add to total due to rounding or because a Hispanic firm may be of any race. Moreover, each owner had the option of selecting more than one race and therefore is included in each race selected. Respondent firms include all firms that responded to the characteristic(s) tabulated in this dataset and reported gender, ethnicity, race, or veteran status or that were publicly held or not classifiable by gender, ethnicity, race, and veteran status. Percentages are for respondent firms only and are not recalculated when the dataset is resorted. Percentages are always based on total reporting (defined above) within a gender, ethnicity, race, veteran status, and/or receipts size group for the characteristics tabulated in this dataset. Firms with more than one domestic establishment are counted in each geographic area and industry in which they operate, but only once in the U.S. and state totals for all sectors. For information on confidentiality protection, sampling error, nonsampling error, and definitions, see Survey Methodology.]

    Table Name:
    Statistics for All U.S. Firms by Percent of Total Sales of Goods/Services Exported Outside the United States by Receipts Size of Firm, Gender, Ethnicity, Race, and Veteran Status for the U.S.: 2012.

    Release Schedule:
    The data in this file was released in February 2016.

    Key Table Information:
    This data is related to all other 2012 SBO files. Refer to the Methodology section of the Survey of Business Owners website for additional information.

    Universe:
    The universe for the 2012 Survey of Business Owners (SBO) includes all U.S. firms operating during 2012 with receipts of $1,000 or more which are classified in the North American Industry Classification System (NAICS) sectors 11 through 99, except for NAICS 111, 112, 482, 491, 521, 525, 813, 814, and 92 which are not covered. Firms with more than one domestic establishment are counted in each geographic area and industry in which they operate, but only once in the U.S. total. In this file, "respondent firms" refers to all firms that reported gender, ethnicity, race, or veteran status for at least one owner or returned a survey form with at least one item completed and were publicly held or not classifiable by gender, ethnicity, race, and veteran status.

    Geographic Coverage:
    The data are shown at the U.S. level only.

    Industry Coverage:
    The data are shown for the total of all sectors (NAICS 00).

    Data Items and Other Identifying Records:
    Statistics for All U.S. Firms by Percent of Total Sales of Goods/Services Exported Outside the United States by Receipts Size of Firm, Gender, Ethnicity, Race, and Veteran Status for the U.S.: 2012 contains data on:

      • Number of firms, firms with paid employees, and firms with no paid employees
      • Sales and receipts for all firms, firms with paid employees, and firms with no paid employees
      • Number of employees for firms with paid employees
      • Annual payroll for firms with paid employees
      • Percent of all respondent firms, respondent firms with paid employees, and respondent firms with no paid employees
      • Percent of sales and receipts of all respondent firms, respondent firms with paid employees, and respondent firms with no paid employees
      • Percent of number of employees of respondent firms with paid employees
      • Percent of annual payroll of respondent firms with paid employees

    The data are published by percent of total sales of goods and services that were exported outside the U.S. in 2012 and receipts size of firm and by gender, ethnicity, race, and veteran status.

    Sort Order:
    Data are presented in ascending levels by:

      • Gender, ethnicity, race, and veteran status (CBGROUP)
      • Receipts size of firm (RCPSZFI)
      • Percent of total sales of goods and services that were exported outside the U.S. in 2012 (PEXPORT)

    The data are sorted on underlying control field values, so control fields may not appear in alphabetical order.

    FTP Download:
    Download the entire SB1200CSCB35 table at: https://www2.census.gov/programs-surveys/sbo/data/2012/SB1200CSCB35.zip

    Contact Information:
    To contact the Survey of Business Owners staff:

      • Visit the website at www.census.gov/programs-surveys/sbo.html.
      • Email general, nonsecure, and unencrypted messages to ewd.survey.of.business.owners@census.gov.
      • Call 301.763.3316 between 7 a.m. and 5 p.m. (EST), Monday through Friday.
      • Write to: U.S. Census Bureau, Survey of Business Owners, 4600 Silver Hill Road, ...

  3. 2012 Economic Surveys: SB1200CSCB56 | Statistics for All U.S. Firms That Had...

    • data.census.gov
    Updated Feb 24, 2016
    Cite
    ECN (2016). 2012 Economic Surveys: SB1200CSCB56 | Statistics for All U.S. Firms That Had E-Commerce Sales by Receipts Size of Firm, Gender, Ethnicity, Race, and Veteran Status for the U.S.: 2012 (ECNSVY Survey of Business Owners Survey of Business Owners Characteristics of Business) [Dataset]. https://data.census.gov/table/SBOCB2012.SB1200CSCB56?q=B+E+S+Manufacturing
    Dataset updated
    Feb 24, 2016
    Dataset provided by
    United States Census Bureau http://census.gov/
    Authors
    ECN
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2012
    Area covered
    United States
    Description

    Release Date: 2016-02-23.

    [NOTE: Includes firms with paid employees and firms with no paid employees. Data are based on the 2012 Economic Census, and the estimates of business ownership by gender, ethnicity, race, and veteran status are from the 2012 Survey of Business Owners. Detail may not add to total due to rounding or because a Hispanic firm may be of any race. Moreover, each owner had the option of selecting more than one race and therefore is included in each race selected. Respondent firms include all firms that responded to the characteristic(s) tabulated in this dataset and reported gender, ethnicity, race, or veteran status or that were publicly held or not classifiable by gender, ethnicity, race, and veteran status. Percentages are for respondent firms only and are not recalculated when the dataset is resorted. Percentages are always based on total reporting (defined above) within a gender, ethnicity, race, veteran status, and/or receipts size group for the characteristics tabulated in this dataset. Firms with more than one domestic establishment are counted in each geographic area and industry in which they operate, but only once in the U.S. and state totals for all sectors. For information on confidentiality protection, sampling error, nonsampling error, and definitions, see Survey Methodology.]

    Table Name:
    Statistics for All U.S. Firms That Had E-Commerce Sales by Receipts Size of Firm, Gender, Ethnicity, Race, and Veteran Status for the U.S.: 2012.

    Release Schedule:
    The data in this file was released in February 2016.

    Key Table Information:
    This data is related to all other 2012 SBO files. Refer to the Methodology section of the Survey of Business Owners website for additional information.

    Universe:
    The universe for the 2012 Survey of Business Owners (SBO) includes all U.S. firms operating during 2012 with receipts of $1,000 or more which are classified in the North American Industry Classification System (NAICS) sectors 11 through 99, except for NAICS 111, 112, 482, 491, 521, 525, 813, 814, and 92 which are not covered. Firms with more than one domestic establishment are counted in each geographic area and industry in which they operate, but only once in the U.S. total. In this file, "respondent firms" refers to all firms that reported gender, ethnicity, race, or veteran status for at least one owner or returned a survey form with at least one item completed and were publicly held or not classifiable by gender, ethnicity, race, and veteran status.

    Geographic Coverage:
    The data are shown at the U.S. level only.

    Industry Coverage:
    The data are shown for the total of all sectors (NAICS 00).

    Data Items and Other Identifying Records:
    Statistics for All U.S. Firms That Had E-Commerce Sales by Receipts Size of Firm, Gender, Ethnicity, Race, and Veteran Status for the U.S.: 2012 contains data on:

      • Number of firms, firms with paid employees, and firms with no paid employees
      • Sales and receipts for all firms, firms with paid employees, and firms with no paid employees
      • Number of employees for firms with paid employees
      • Annual payroll for firms with paid employees
      • Percent of all respondent firms, respondent firms with paid employees, and respondent firms with no paid employees
      • Percent of sales and receipts of all respondent firms, respondent firms with paid employees, and respondent firms with no paid employees
      • Percent of number of employees of respondent firms with paid employees
      • Percent of annual payroll of respondent firms with paid employees

    The data are published by percent of total sales of goods and services that were from e-commerce in 2012 and receipts size of firm and by gender, ethnicity, race, and veteran status.

    Sort Order:
    Data are presented in ascending levels by:

      • Gender, ethnicity, race, and veteran status (CBGROUP)
      • Receipts size of firm (RCPSZFI)
      • Percent of total sales of goods and services that were from e-commerce in 2012 (PECOMMRC)

    The data are sorted on underlying control field values, so control fields may not appear in alphabetical order.

    FTP Download:
    Download the entire SB1200CSCB56 table at: https://www2.census.gov/programs-surveys/sbo/data/2012/SB1200CSCB56.zip

    Contact Information:
    To contact the Survey of Business Owners staff:

      • Visit the website at www.census.gov/programs-surveys/sbo.html.
      • Email general, nonsecure, and unencrypted messages to ewd.survey.of.business.owners@census.gov.
      • Call 301.763.3316 between 7 a.m. and 5 p.m. (EST), Monday through Friday.
      • Write to: U.S. Census Bureau, Survey of Business Owners, 4600 Silver Hill Road, Washington, DC 20233.

    Source: U.S. Census Bureau, 2012 Survey of Business Owners.
    Note: The data in this ...

  4. E-commerce revenue worldwide 2017-2030, by segment

    • statista.com
    Updated Jun 3, 2025
    Cite
    Statista Research Department (2025). E-commerce revenue worldwide 2017-2030, by segment [Dataset]. https://www.statista.com/topics/871/online-shopping/
    Dataset updated
    Jun 3, 2025
    Dataset provided by
    Statista http://statista.com/
    Authors
    Statista Research Department
    Description

    Significant fluctuations are estimated for all segments over the forecast period for the revenue. In general, the indicator appears to exhibit a positive trend, with more segments showing increasing values rather than decreasing values until 2030. Among them, the segment Food attains the highest value throughout the entire period, reaching 1.23 trillion U.S. dollars. The Statista Market Insights cover a broad range of additional markets.

  5. Global retail e-commerce sales 2022-2028

    • statista.com
    Updated Jun 24, 2025
    Cite
    Statista (2025). Global retail e-commerce sales 2022-2028 [Dataset]. https://www.statista.com/statistics/379046/worldwide-retail-e-commerce-sales/
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Statista http://statista.com/
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    In 2024, global retail e-commerce sales reached an estimated ************ U.S. dollars. Projections indicate a ** percent growth in this figure over the coming years, with expectations to come close to ************** dollars by 2028.

    World players:
    Among the key players on the world stage, the American marketplace giant Amazon holds the title of the largest e-commerce player globally, with a gross merchandise value of nearly *********** U.S. dollars in 2024. Amazon was also the most valuable retail brand globally, followed by mostly American competitors such as Walmart and the Home Depot.

    Leading e-tailing regions:
    E-commerce is a dominant channel globally, but nowhere has it been as successful as in Asia. In 2024, the e-commerce revenue in that continent alone was measured at nearly ************ U.S. dollars, outperforming the Americas and Europe. That year, the up-and-coming e-commerce markets also centered around Asia. The Philippines and India stood out as the swiftest-growing e-commerce markets based on online sales, anticipating a growth rate surpassing ** percent.


