77 datasets found

Warehouse and Retail Sales
catalog.data.gov
data.montgomerycountymd.gov
+3 more
Updated Aug 11, 2025
Cite
data.montgomerycountymd.gov (2025). Warehouse and Retail Sales [Dataset]. https://catalog.data.gov/dataset/warehouse-and-retail-sales
Explore at:
Dataset updated
Aug 11, 2025
Dataset provided by
data.montgomerycountymd.gov
Description
This dataset contains sales and movement data by item and department, appended monthly. Update frequency: monthly.
Online Sales Dataset - Popular Marketplace Data
kaggle.com
Updated May 25, 2024
Cite
ShreyanshVerma27 (2024). Online Sales Dataset - Popular Marketplace Data [Dataset]. https://www.kaggle.com/datasets/shreyanshverma27/online-sales-dataset-popular-marketplace-data
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 25, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
ShreyanshVerma27
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides a comprehensive overview of online sales transactions across different product categories. Each row represents a single transaction with detailed information such as the order ID, date, category, product name, quantity sold, unit price, total price, region, and payment method.

Columns:

Order ID: Unique identifier for each sales order.

Date: Date of the sales transaction.

Category: Broad category of the product sold (e.g., Electronics, Home Appliances, Clothing, Books, Beauty Products, Sports).

Product Name: Specific name or model of the product sold.

Quantity: Number of units of the product sold in the transaction.

Unit Price: Price of one unit of the product.

Total Price: Total revenue generated from the sales transaction (Quantity * Unit Price).

Region: Geographic region where the transaction occurred (e.g., North America, Europe, Asia).

Payment Method: Method used for payment (e.g., Credit Card, PayPal, Debit Card).

Insights:

1. Analyze sales trends over time to identify seasonal patterns or growth opportunities.

2. Explore the popularity of different product categories across regions.

3. Investigate the impact of payment methods on sales volume or revenue.

4. Identify top-selling products within each category to optimize inventory and marketing strategies.

5. Evaluate the performance of specific products or categories in different regions to tailor marketing campaigns accordingly.
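
As a rough illustration of how the columns above could be checked and summarized, the sketch below verifies the stated relation Total Price = Quantity * Unit Price and totals revenue by category. The sample rows, the helper names (`load_rows`, `inconsistent_orders`, `revenue_by`), and the rounding tolerance are assumptions made for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows that reuse the column names from the description.
SAMPLE_CSV = """Order ID,Date,Category,Product Name,Quantity,Unit Price,Total Price,Region,Payment Method
1,2024-01-01,Electronics,Phone A,2,500.00,1000.00,North America,Credit Card
2,2024-01-02,Books,Novel B,3,15.00,45.00,Europe,PayPal
3,2024-01-03,Electronics,Laptop C,1,1200.00,1200.00,Asia,Debit Card
4,2024-01-04,Books,Guide D,2,20.00,50.00,Europe,Credit Card
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def inconsistent_orders(rows, tolerance=0.01):
    """Return Order IDs whose Total Price differs from Quantity * Unit Price."""
    bad = []
    for row in rows:
        expected = int(row["Quantity"]) * float(row["Unit Price"])
        if abs(expected - float(row["Total Price"])) > tolerance:
            bad.append(row["Order ID"])
    return bad


def revenue_by(rows, column):
    """Sum Total Price grouped by the given column (e.g. Category or Region)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column]] += float(row["Total Price"])
    return dict(totals)


rows = load_rows(SAMPLE_CSV)
print(inconsistent_orders(rows))     # order 4 is deliberately inconsistent
print(revenue_by(rows, "Category"))
```

Grouping by "Region" or "Payment Method" instead of "Category" would give a starting point for the second and third insights listed above.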
Company Datasets for Business Profiling
datarade.ai
Updated Feb 23, 2017
Cite
Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
Explore at:
Available download formats: .json, .xml, .csv, .xls
Dataset updated
Feb 23, 2017
Dataset authored and provided by
Oxylabs
Area covered
Andorra, Taiwan, Nepal, Tunisia, Isle of Man, Canada, Moldova (Republic of), Northern Mariana Islands, British Indian Ocean Territory, Bangladesh
Description
Company Datasets for valuable business insights!

Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

Owler: Gain valuable business insights and competitive intelligence.

AngelList: Receive fresh startup data transformed into actionable insights.

CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.

Craft.co: Make data-informed business decisions with Craft.co's company datasets.

Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

Company name;

Size;

Founding date;

Location;

Industry;

Revenue;

Employee count;

Competitors.

You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

With Oxylabs Datasets, you can count on:

Fresh and accurate data collected and parsed by our expert web scraping team.

Time and resource savings, allowing you to focus on data analysis and achieving your business goals.

A customized approach tailored to your specific business needs.

Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
Tata Motors Sales Analysis (2021-2022)
kaggle.com
Updated Sep 15, 2023
Cite
numen_Vikrant (2023). Tata Motors Sales Analysis (2021-2022) [Dataset]. https://www.kaggle.com/datasets/numenvikrant/tata-motors-sales-analysis-2021-2022
Explore at:
Dataset updated
Sep 15, 2023
Dataset provided by
Kaggle
Authors
numen_Vikrant
Description
I'm excited to share my recent project where I dived deep into the world of data analysis to gain valuable insights into Tata Motors' sales data for the fiscal year 2021-2022. 📈

Project Highlights:

Data Processing and Cleaning: I meticulously cleaned and processed the dataset, ensuring accuracy and reliability in the analysis.

In-Depth Analysis: Through advanced analytical techniques, I uncovered patterns, trends, and key metrics within the data, helping to reveal critical business insights.

Data Visualization: I transformed the complex sales data into clear and insightful visual representations, making it easier for stakeholders to grasp the findings.

Interactive Dashboard: I designed an interactive dashboard that allows users to explore the data dynamically, facilitating a deeper understanding of the sales performance.

Findings: Tata Motors achieved 105% growth in sales, marking an impressive 126% profit increase compared to the year 2021.

This remarkable growth not only showcases the company's resilience but also the effectiveness of their strategies and operations. It's a testament to the hard work and dedication of the entire Tata Motors team.
Data Cleaning Sample
borealisdata.ca
Updated Jul 13, 2023
Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Explore at:
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.
Real-world sales forecasting benchmark data - Extended version
data.4tu.nl
zip
Updated Apr 20, 2021
+ more versions
Cite
Emir Žunić (2021). Real-world sales forecasting benchmark data - Extended version [Dataset]. http://doi.org/10.4121/14406134.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.4121/14406134.v1
Dataset updated
Apr 20, 2021
Dataset provided by
4TU.ResearchData
Authors
Emir Žunić
Description
This dataset contains two .csv files that can be used as new benchmark data for real-world sales forecasting problems. All data are real and were obtained experimentally in the production environment of one of the biggest retail companies in Bosnia and Herzegovina. The data cover the period from 2014/03/01 to 2021/03/01 and are aggregated on a monthly basis for the 50 top items of one very popular brand in 4 different organizational units.
Store Sales - T.S Forecasting...Merged Dataset
kaggle.com
Updated Dec 15, 2021
Cite
Shramana Bhattacharya (2021). Store Sales - T.S Forecasting...Merged Dataset [Dataset]. https://www.kaggle.com/shramanabhattacharya/store-sales-ts-forecastingmerged-dataset/code
Explore at:
Dataset updated
Dec 15, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Shramana Bhattacharya
Description
This dataset is a merged dataset created from the data provided in the competition "Store Sales - Time Series Forecasting". The other datasets that were provided there apart from train and test (for example holidays_events, oil, stores, etc.) could not be used in the final prediction. According to my understanding, through the EDA of the merged dataset, we will be able to get a clearer picture of the other factors that might also affect the final prediction of grocery sales. Therefore, I created this merged dataset and posted it here for the further scope of analysis.

Data Description

Data Field Information (This is a copy of the description as provided in the actual dataset)

Train.csv

id: store id

date: date of the sale

store_nbr: identifies the store at which the products are sold.

family: identifies the type of product sold.

sales: gives the total sales for a product family at a particular store at a given date. Fractional values are possible since products can be sold in fractional units (1.5 kg of cheese, for instance, as opposed to 1 bag of chips).

onpromotion: gives the total number of items in a product family that were being promoted at a store on a given date.

Store metadata, including city, state, type, and cluster.

cluster is a grouping of similar stores.

Holidays and Events, with metadata

NOTE: Pay special attention to the transferred column. A holiday that is transferred officially falls on that calendar day but was moved to another date by the government. A transferred day is more like a normal day than a holiday. To find the day that it was celebrated, look for the corresponding row where the type is Transfer. For example, the holiday Independencia de Guayaquil was transferred from 2012-10-09 to 2012-10-12, which means it was celebrated on 2012-10-12. Days that are type Bridge are extra days that are added to a holiday (e.g., to extend the break across a long weekend). These are frequently made up by the type Work Day, which is a day not normally scheduled for work (e.g., Saturday) that is meant to pay back the Bridge.

Additional holidays are days added to a regular calendar holiday, for example, as typically happens around Christmas (making Christmas Eve a holiday).

dcoilwtico: Daily oil price. Includes values during both the train and test data timeframes. (Ecuador is an oil-dependent country and its economic health is highly vulnerable to shocks in oil prices.)

Note: There is a transaction column in the training dataset which displays the sales transactions on that particular date.

Test.csv

The test data, having the same features as the training data. You will predict the target sales for the dates in this file.

The dates in the test data are for the 15 days after the last date in the training data.

Note: There is no transaction column in the test dataset as there was in the training dataset. Therefore, while building the model, you might exclude this column and may use it only for EDA.

submission.csv

A sample submission file in the correct format.
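
The note on transferred holidays can be turned into a small filter. The sketch below is one possible reading of that note, not the competition's reference logic: the field names (`date`, `type`, `description`, `transferred`), the set of day-off types, and the Work Day row are assumptions for illustration, while the two Guayaquil dates come from the example in the description.

```python
from datetime import date

# Illustrative rows. The two Guayaquil dates follow the example in the text;
# the Work Day row and all field names are hypothetical.
HOLIDAY_ROWS = [
    {"date": date(2012, 10, 9), "type": "Holiday",
     "description": "Independencia de Guayaquil", "transferred": True},
    {"date": date(2012, 10, 12), "type": "Transfer",
     "description": "Independencia de Guayaquil (moved)", "transferred": False},
    {"date": date(2012, 10, 20), "type": "Work Day",
     "description": "Make-up working day", "transferred": False},
]

# Assumed set of row types that mean the day was actually observed as a day off.
DAY_OFF_TYPES = {"Holiday", "Transfer", "Additional", "Bridge"}


def is_day_off(row):
    """Return True when the row marks a day that was really observed as a holiday."""
    if row["transferred"]:
        # The official date was moved, so it behaves like a normal day.
        return False
    return row["type"] in DAY_OFF_TYPES


def observed_days_off(rows):
    """Sorted list of dates that were effectively days off."""
    return sorted(row["date"] for row in rows if is_day_off(row))


print(observed_days_off(HOLIDAY_ROWS))
```

With these rows only 2012-10-12 is reported: the original date is skipped because it was transferred, and the Work Day row is a make-up working day.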
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Cite
Julie R. Campos Arias (repository creator) (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julie R. Campos Arias (repository creator)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for 1K+ Amazon products, as listed on the official Amazon website. The data was scraped in January 2023 from the official Amazon website.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
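
Because the rows are ordered by label, a reproducible shuffle is the usual first step. The following is a minimal sketch using only the standard library; the stand-in rows, the helper name `shuffle_rows`, and the seed value are assumptions, and in practice the rows would be read from data_rt.csv.

```python
import random


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of rows; a fixed seed keeps the order reproducible."""
    shuffled = list(rows)          # copy so the original order is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical stand-in for the file: negatives (0) first, then positives (1).
rows = [("negative review %d" % i, 0) for i in range(5)] + \
       [("positive review %d" % i, 1) for i in range(5)]

shuffled = shuffle_rows(rows)
print([label for _, label in shuffled])
```

Using a dedicated `random.Random` instance avoids changing the global random state, which keeps other parts of an experiment unaffected.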
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
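
The description says the division label is derived from the polarity score but does not state the rule. The sketch below shows one common convention, under loudly stated assumptions: the label names ("positive", "neutral", "negative") and the zero threshold are hypothetical and may differ from what the dataset author used. TextBlob polarity scores lie in the range -1.0 to 1.0.

```python
def division_from_polarity(polarity, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a categorical label.

    The thresholds and label names are assumptions, not the dataset's documented rule.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


for score in (0.5, 0.0, -0.2):
    print(score, division_from_polarity(score))
```

Widening `neutral_band` (for example to 0.1) treats weakly polarized reviews as neutral, which may or may not match the labels stored in the file.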
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews with star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
Real manufacturing sales, orders, inventory owned and inventory to sales...
open.canada.ca
www150.statcan.gc.ca
+1 more
csv, html, xml
Updated Jul 15, 2025
+ more versions
Cite
Statistics Canada (2025). Real manufacturing sales, orders, inventory owned and inventory to sales ratio, 2017 dollars, seasonally adjusted [Dataset]. https://open.canada.ca/data/dataset/7bf43dd1-af41-4c6f-871e-4c653aad27d0
Explore at:
Available download formats: csv, html, xml
Dataset updated
Jul 15, 2025
Dataset provided by
Statistics Canada
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Description
Canadian Sales of goods manufactured (shipments), new orders, unfilled orders, inventories, raw materials, goods or work in process, finished goods, and inventory to sales ratios for durable and non-durable goods by North American Industry Classification System (NAICS) for reference periods January 2002 to the current reference month. Not all combinations are available. Values are in constant dollars.
‘Coffee shop sample data (11.1.3+)’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 3, 2011
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2011). ‘Coffee shop sample data (11.1.3+)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-coffee-shop-sample-data-11-1-3-454e/4f64962f/?iid=015-492&v=presentation
Explore at:
Dataset updated
Jan 3, 2011
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Coffee shop sample data (11.1.3+)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ylchang/coffee-shop-sample-data-1113 on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

This sample data module contains representative retail data from a fictional coffee chain. The source data is contained in an uploaded file named April Sales.zip. Source: IBM.

We have created sample data for a fictional coffee shop chain with three locations in New York City. The chain has purchased IBM Cognos Analytics to identify factors that contribute to their success, and ultimately to make data-informed decisions.

Amber and Sandeep are the co-founders of the coffee chain. They uploaded their data in a series of spreadsheets and created a data module. From that data, they designed an operations dashboard and a marketing dashboard.

Inventory

Amber and Sandeep have created two dashboards and one data module that is based on nine spreadsheets:

Coffee operations: This sample dashboard demonstrates operational data from a fictional coffee chain. Location: Team content > Samples > Dashboards.

Coffee marketing: This sample dashboard demonstrates marketing data from a fictional coffee chain. Location: Team content > Samples > Dashboards.

Coffee sales and marketing: This sample data module contains representative retail data from a fictional coffee chain. Location: Team content > Samples > Data.

April Sales.zip: This sample data contains representative retail data from a fictional coffee chain. This ZIP file contains nine related CSV files. Location: Team content > Samples > Data > Source files > Retail.

Content

Data

The sample data module named Coffee sales and marketing can be found in Team content > Samples > Data. There are nine tables:

Sales Receipts

Pastry Inventory

Sales Targets

Customer

Dates

Product

Sales Outlet

Staff

Generation

Acknowledgements

https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/12/beanie-coffee-1113

--- Original source retains full ownership of the source dataset ---
US Real Estate
zenrows.com
csv
Updated Jun 27, 2021
Cite
ZenRows (2021). US Real Estate [Dataset]. https://www.zenrows.com/datasets/us-real-estate
Explore at:
Available download formats: csv (5.8 MB)
Dataset updated
Jun 27, 2021
Dataset provided by
ZenRows S.L.
Authors
ZenRows
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Area covered
United States
Description
High-quality, free real estate dataset from all around the United States, in CSV format. Over 10,000 records relevant to real estate investors, agents, and data scientists. We are working on complete datasets from a wide variety of countries. Don't hesitate to contact us for more information.
Europe Bike Store Sales
kaggle.com
Updated Mar 21, 2023
Cite
PrepInsta Technologies (2023). Europe Bike Store Sales [Dataset]. https://www.kaggle.com/datasets/prepinstaprime/europe-bike-store-sales
Explore at:
Dataset updated
Mar 21, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
PrepInsta Technologies
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Europe
Description
Using the Europe bikes dataset, extract insights into sales in each country and in each state within those countries using Excel.
Evaluating FAIR Models for Rossmann Store Sales Prediction: Insights and...
test.researchdata.tuwien.ac.at
bin, csv, json +1
Updated Apr 28, 2025
Cite
Dilara Çakmak (2025). Evaluating FAIR Models for Rossmann Store Sales Prediction: Insights and Performance Analysis [Dataset]. http://doi.org/10.70124/f5t2d-xt904
Explore at:
Available download formats: csv, text/markdown, json, bin
Unique identifier
https://doi.org/10.70124/f5t2d-xt904
Dataset updated
Apr 28, 2025
Dataset provided by
TU Wien
Authors
Dilara Çakmak
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Apr 2025
Description
Context and Methodology

Research Domain:
The dataset is part of a project focused on retail sales forecasting. Specifically, it is designed to predict daily sales for Rossmann, a chain of over 3,000 drug stores operating across seven European countries. The project falls under the broader domain of time series analysis and machine learning applications for business optimization. The goal is to apply machine learning techniques to forecast future sales based on historical data, which includes factors like promotions, competition, holidays, and seasonal trends.

Purpose:
The primary purpose of this dataset is to help Rossmann store managers predict daily sales for up to six weeks in advance. By making accurate sales predictions, Rossmann can improve inventory management, staffing decisions, and promotional strategies. This dataset serves as a training set for machine learning models aimed at reducing forecasting errors and supporting decision-making processes across the company’s large network of stores.

How the Dataset Was Created:
The dataset was compiled from several sources, including historical sales data from Rossmann stores, promotional calendars, holiday schedules, and external factors such as competition. The data is split into multiple features, such as the store's location, promotion details, whether the store was open or closed, and weather information. The dataset is publicly available on platforms like Kaggle and was initially created for the Kaggle Rossmann Store Sales competition. The data is made accessible via an API for further analysis and modeling, and it is structured to help machine learning models predict future sales based on various input variables.

Technical Details

Dataset Structure:

The dataset consists of three main files, each with its specific role:

Train:
This file contains the historical sales data, which is used to train machine learning models. It includes daily sales information for each store, as well as various features that could influence the sales (e.g., promotions, holidays, store type, etc.).

https://handle.test.datacite.org/10.82556/yb6j-jw41
PID: b1c59499-9c6e-42c2-af8f-840181e809db

Test2:
The test dataset mirrors the structure of train.csv but does not include the actual sales values (i.e., the target variable). This file is used for making predictions using the trained machine learning models. It is used to evaluate the accuracy of predictions when the true sales data is unknown.

https://handle.test.datacite.org/10.82556/jerg-4b84
PID: 7cbb845c-21dd-4b60-b990-afa8754a0dd9

Store:
This file provides metadata about each store, including information such as the store’s location, type, and assortment level. This data is essential for understanding the context in which the sales data is gathered.

https://handle.test.datacite.org/10.82556/nqeg-gy34
PID: 9627ec46-4ee6-4969-b14a-bda555fe34db

Data Fields Description:

Id: A unique identifier for each (Store, Date) combination within the test set.

Store: A unique identifier for each store.

Sales: The daily turnover (target variable) for each store on a specific day (this is what you are predicting).

Customers: The number of customers visiting the store on a given day.

Open: An indicator of whether the store was open (1 = open, 0 = closed).

StateHoliday: Indicates if the day is a state holiday, with values like:

'a' = public holiday,

'b' = Easter holiday,

'c' = Christmas,

'0' = no holiday.

SchoolHoliday: Indicates whether the store is affected by school closures (1 = yes, 0 = no).

StoreType: Differentiates between four types of stores: 'a', 'b', 'c', 'd'.

Assortment: Describes the level of product assortment in the store:

'a' = basic,

'b' = extra,

'c' = extended.

CompetitionDistance: Distance (in meters) to the nearest competitor store.

CompetitionOpenSince[Month/Year]: The month and year when the nearest competitor store opened.

Promo: Indicates whether the store is running a promotion on a particular day (1 = yes, 0 = no).

Promo2: Indicates whether the store is participating in Promo2, a continuing promotion for some stores (1 = participating, 0 = not participating).

Promo2Since[Year/Week]: The year and calendar week when the store started participating in Promo2.

PromoInterval: Describes the months when Promo2 is active, e.g., "Feb,May,Aug,Nov" means the promotion starts in February, May, August, and November.
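
To make the Promo2 and PromoInterval fields concrete, the sketch below parses an interval string such as "Feb,May,Aug,Nov" into month numbers and checks whether a new Promo2 round starts in a given month. The helper names are hypothetical, and matching on the first three letters of each month name is an assumption added so that longer spellings (for example "Sept") would also resolve.

```python
MONTH_ABBREVIATIONS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                       "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def promo2_start_months(promo_interval):
    """Convert 'Feb,May,Aug,Nov' into [2, 5, 8, 11]; empty or missing values give []."""
    if not promo_interval:
        return []
    return [MONTH_ABBREVIATIONS.index(name.strip()[:3]) + 1
            for name in promo_interval.split(",")]


def promo2_starts_in(month, promo2, promo_interval):
    """True when the store participates in Promo2 and a round starts in `month` (1-12)."""
    return promo2 == 1 and month in promo2_start_months(promo_interval)


print(promo2_start_months("Feb,May,Aug,Nov"))
print(promo2_starts_in(5, 1, "Feb,May,Aug,Nov"))
```

A derived flag like this is one way to feed PromoInterval into a model, since the raw comma-separated string is not directly usable as a numeric feature.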

Software Requirements

To work with this dataset, you will need to have specific software installed, including:

DBRepo Authorization: This is required to access the datasets via the DBRepo API. You may need to authenticate with an API key or login credentials to retrieve the datasets.

Python Libraries: Key libraries for working with the dataset include:

pandas for data manipulation,

numpy for numerical operations,

matplotlib and seaborn for data visualization,

scikit-learn for machine learning algorithms.

Additional Resources

Several additional resources are available for working with the dataset:

Presentation:
A presentation summarizing the exploratory data analysis (EDA), feature engineering process, and key insights from the analysis is provided. This presentation also includes visualizations that help in understanding the dataset’s trends and relationships.

Jupyter Notebook:
A Jupyter notebook, titled Retail_Sales_Prediction_Capstone_Project.ipynb, is provided, which details the entire machine learning pipeline, from data loading and cleaning to model training and evaluation.

Model Evaluation Results:
The project includes a detailed evaluation of various machine learning models, including their performance metrics like training and testing scores, Mean Absolute Percentage Error (MAPE), and Root Mean Squared Error (RMSE). This allows for a comparison of model effectiveness in forecasting sales.
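
For reference, the two error metrics named above can be computed as in the following sketch. It is a plain standard-library illustration, not the project's own evaluation code; the sample values are made up, and skipping rows with zero actual sales in MAPE is an assumption (the percentage error is undefined there, for example on days when a store is closed).

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent, skipping rows where actual is 0."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Made-up daily sales and forecasts for illustration.
actual = [100, 200, 0, 400]
predicted = [110, 180, 5, 400]

print(round(rmse(actual, predicted), 3))
print(round(mape(actual, predicted), 3))
```

RMSE is expressed in the same units as sales and penalizes large misses more heavily, while MAPE is scale-free, which makes it easier to compare stores of different sizes.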

Trained Models (.pkl files):
The models trained during the project are saved as .pkl files. These files contain the trained machine learning models (e.g., Random Forest, Linear Regression, etc.) that can be loaded and used to make predictions without retraining the models from scratch.
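
Loading a saved model follows the usual pickle round trip. The sketch below uses an in-memory buffer and a plain dictionary of made-up coefficients as a stand-in for a trained model, so it does not reflect the project's actual files or model classes; the `predict` helper is likewise hypothetical. Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.

```python
import io
import pickle

# Stand-in for a trained model: hypothetical linear coefficients, not project output.
model = {"intercept": 100.0, "coefficients": {"Promo": 50.0, "SchoolHoliday": -10.0}}


def predict(model, features):
    """Apply the stand-in linear model to a dict of feature values."""
    return model["intercept"] + sum(
        model["coefficients"][name] * value for name, value in features.items()
    )


# Save to an in-memory buffer (a real workflow would open a .pkl file in binary mode).
buffer = io.BytesIO()
pickle.dump(model, buffer)

# Load it back and predict without retraining.
buffer.seek(0)
restored = pickle.load(buffer)
print(predict(restored, {"Promo": 1, "SchoolHoliday": 0}))
```

With real scikit-learn models, the same library version used for training is generally needed when loading, which is worth recording alongside the .pkl files.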

sample_submission.csv:
This file is a sample submission file that demonstrates the format of predictions expected when using the trained model. The sample_submission.csv contains predictions made on the test dataset using the trained Random Forest model. It provides an example of how the output should be structured for submission.

These resources provide a comprehensive guide to implementing and analyzing the sales forecasting model, helping you understand the data, methods, and results in greater detail.
US B2B Contact Data | 200M+ Verified Records | 95% Accuracy | API/CSV/JSON
datarade.ai
.json, .csv
Cite
Forager.ai, US B2B Contact Data | 200M+ Verified Records | 95% Accuracy | API/CSV/JSON [Dataset]. https://datarade.ai/data-products/us-b2b-contact-data-180m-records-bi-weekly-updates-csv-forager-ai
Available download formats: .json, .csv
Dataset provided by
Forager.ai
Area covered
United States of America
Description
US B2B Contact Database | 200M+ Verified Records | 95% Accuracy | API/CSV/JSON

Elevate your sales and marketing efforts with America's most comprehensive B2B contact data, featuring 200M+ verified records of decision-makers, from CEOs to managers, across all industries. Powered by AI and refreshed bi-weekly, this dataset gives you fresh, accurate contact details for effective outreach and engagement.

Key Features & Stats:

200M+ Decision-Makers: Includes C-level executives, VPs, Directors, and Managers.

95% Accuracy: Email & Phone numbers verified for maximum deliverability.

Bi-Weekly Updates: Never waste time on outdated leads with our frequent data refreshes.

50+ Data Points: Comprehensive firmographic, technographic, and contact details.

Core Fields:

Direct Work Emails & Personal Emails for effective outreach.

Mobile Phone Numbers for cold calls and SMS campaigns.

Full Name, Job Title, Seniority for better personalization.

Company Insights: Size, Revenue, Funding data, Industry, and Tech Stack for a complete profile.

Location: HQ and regional offices to target local, national, or international markets.

Top Use Cases:

Cold Email & Calling Campaigns: Target the right people with accurate contact data.

CRM & Marketing Automation Enrichment: Enhance your CRM with enriched data for better lead management.

ABM & Sales Intelligence: Target the right decision-makers and personalize your approach.

Recruiting & Talent Mapping: Access CEO and senior leadership data for executive search.

Instant Delivery Options:

JSON – Bulk downloads via S3 for easy integration.

REST API – Real-time integration for seamless workflow automation.

CRM Sync – Direct integration with your CRM for streamlined lead management.

Enterprise-Grade Quality:

SOC 2 Compliant: Ensuring the highest standards of security and data privacy.

GDPR/CCPA Ready: Fully compliant with global data protection regulations.

Triple-Verification Process: Ensuring the accuracy and deliverability of every record.

Suppression List Management: Eliminate irrelevant or non-opt-in contacts from your outreach.

US Business Contacts | B2B Email Database | Sales Leads | CRM Enrichment | Verified Phone Numbers | ABM Data | CEO Contact Data | US B2B Leads | US prospects data
Dataset of Newly Incorporated Entities | Company Data | API | CSV | JSON |...
datarade.ai
.json, .csv
Updated Dec 24, 2024
Cite
HitHorizons (2024). Dataset of Newly Incorporated Entities | Company Data | API | CSV | JSON | 80M+ Companies | 50 EU Countries | GDPR-Compliant | Monthly Updated | 2025 [Dataset]. https://datarade.ai/data-products/dataset-of-newly-established-companies-api-csv-json-8-hithorizons
Available download formats: .json, .csv
Dataset updated
Dec 24, 2024
Dataset authored and provided by
HitHorizons
Area covered
Turkmenistan, Austria, Portugal, France, Moldova (Republic of), San Marino, Monaco, Luxembourg, Turkey, Latvia, European Union
Description
HitHorizons Newly Established Companies Dataset gives access to aggregated firmographic data on 80M+ companies from the whole of Europe and beyond.

Company registration data: company name; national identifier and its type; registered address: street, postal code, city, state / province, country; business activity: SIC code, local activity code with classification system; year of establishment; company type; location type

Sales and number of employees data: sales in EUR, USD and local currency (with local currency code); total number of employees; sales and number of employees accuracy; local number of employees (in case of multiple branches); companies’ sales and number of employees market position compared to other companies in a country / industry / region

Industry data: size of the whole industry; size of all companies operating within a particular SIC code; benchmarking within a particular country or industry; regional benchmarking (EU 27, state / province)

Contact details: company website; company email domain (without person’s name)

Invoicing details available for selected countries: company name; company address; company VAT number
B2B Marketing Data | API | CSV | JSON | Screener | 80M+ Companies | 50...
datarade.ai
.json, .csv
Updated Aug 8, 2025
Cite
HitHorizons (2025). B2B Marketing Data | API | CSV | JSON | Screener | 80M+ Companies | 50 European Countries | Data Enrichment | GDPR-Compliant | Monthly Updated (Copy) [Dataset]. https://datarade.ai/data-products/b2b-marketing-data-api-csv-json-screener-80m-compa-hithorizons
Available download formats: .json, .csv
Dataset updated
Aug 8, 2025
Dataset authored and provided by
HitHorizons
Area covered
Spain, San Marino, Holy See, Poland, Bulgaria, Greece, Slovenia, Monaco, Austria, Uzbekistan, Europe
Description
Our API provides access to aggregated company information on 80M+ businesses across Europe and beyond.

Company Registration Data: - Company name - National identifier and its type - Registered address: street, postal code, city, state/province, country - Business activity: SIC code, local activity code with classification system - Year of establishment - Company type - Location type

Sales and Number of Employees Data: - Sales in EUR, USD, and local currency (with local currency code) - Total number of employees - Sales and number of employees accuracy - Local number of employees (for multiple branches) - Companies’ sales and number of employees market position compared to others in a country/industry/region

Industry Data: - Size of the whole industry - Size of all companies operating within a particular SIC code - Benchmarking within a particular country or industry - Regional benchmarking (EU 27, state/province)

Contact Details: - Company website - Company email domain (without person’s name)

Invoicing Details (Available for Selected Countries): - Company name - Company address - Company VAT number
Price Paid Data
gov.uk
Updated Jul 28, 2025
Cite
HM Land Registry (2025). Price Paid Data [Dataset]. https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads
Dataset updated
Jul 28, 2025
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
HM Land Registry
Description
Our Price Paid Data includes information on all property sales in England and Wales that are sold for value and are lodged with us for registration.

Get up to date with the permitted use of our Price Paid Data:
check what to consider when using or publishing our Price Paid Data

Using or publishing our Price Paid Data

If you use or publish our Price Paid Data, you must add the following attribution statement:

Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.

Price Paid Data is released under the Open Government Licence (OGL) (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/). You need to make sure you understand the terms of the OGL before using the data.

Under the OGL, HM Land Registry permits you to use the Price Paid Data for commercial or non-commercial purposes. However, OGL does not cover the use of third party rights, which we are not authorised to license.

Price Paid Data contains address data processed against Ordnance Survey’s AddressBase Premium product, which incorporates Royal Mail’s PAF® database (Address Data). Royal Mail and Ordnance Survey permit your use of Address Data in the Price Paid Data:

for personal and/or non-commercial use

to display for the purpose of providing residential property price information services

If you want to use the Address Data in any other way, you must contact Royal Mail. Email address.management@royalmail.com.

Address data

The following fields comprise the address data included in Price Paid Data:

Postcode

PAON Primary Addressable Object Name (typically the house number or name)

SAON Secondary Addressable Object Name – if there is a sub-building, for example, the building is divided into flats, there will be a SAON

Street

Locality

Town/City

District

County
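As a small illustration of how the address fields listed above fit together, the sketch below joins them into a single display line. It assumes each record is available as a dictionary keyed by the field labels used in the list. The ordering (SAON before PAON, postcode last), the helper name `format_address`, and the sample record are all assumptions for illustration, not part of HM Land Registry's documentation.

```python
# Field labels as they appear in the list above; the display order is an assumption.
ADDRESS_FIELDS = [
    "SAON", "PAON", "Street", "Locality",
    "Town/City", "District", "County", "Postcode",
]


def format_address(record):
    """Join the non-empty address fields of one record into a single line."""
    parts = [str(record.get(field, "")).strip() for field in ADDRESS_FIELDS]
    return ", ".join(part for part in parts if part)


# Made-up record, for illustration only (not real Price Paid Data).
example_record = {
    "PAON": "12",
    "SAON": "FLAT 3",
    "Street": "EXAMPLE ROAD",
    "Town/City": "EXAMPLETOWN",
    "Postcode": "ZZ99 9ZZ",
}

address_line = format_address(example_record)
print(address_line)
```

Fields that are missing or blank, such as SAON for a property with no sub-building, are simply left out of the result.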

June 2025 data (current month)

The June 2025 release includes:

the first release of data for June 2025 (transactions received from the first to the last day of the month)

updates to earlier data releases

Standard Price Paid Data (SPPD) and Additional Price Paid Data (APPD) transactions

As we will be adding to the June data in future releases, we would not recommend using it in isolation as an indication of market or HM Land Registry activity. When the full dataset is viewed alongside the data we’ve previously published, it adds to the overall picture of market activity.

Your use of Price Paid Data is governed by conditions and by downloading the data you are agreeing to those conditions.

Google Chrome (Chrome 88 onwards) is blocking downloads of our Price Paid Data. Please use another internet browser while we resolve this issue. We apologise for any inconvenience caused.

We update the data on the 20th working day of each month. You can download the:

current month as a CSV file (CSV, 18.5MB): http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-monthly-update-new-version.csv

current month as a text file (TXT, 17.9MB): http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-monthly-update.txt

Single file

These include standard and additional price paid data transactions received at HM Land Registry from 1 January 1995 to the most current monthly data.

Your use of Price Paid Data is governed by conditions and by downloading the data you are agreeing to those conditions.

The data is updated monthly and the average size of this file is 3.7 GB, you can download:

LinkedIn Data | C-Level Executives Worldwide | Verified Work Emails &...
datarade.ai
Updated Jan 1, 2018
Cite
Success.ai (2018). LinkedIn Data | C-Level Executives Worldwide | Verified Work Emails & Contact Details from 700M+ Dataset | Best Price Guarantee [Dataset]. https://datarade.ai/data-products/linkedin-data-c-level-executives-worldwide-verified-work-success-ai
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Jan 1, 2018
Dataset provided by
Area covered
Malta, Bermuda, Burundi, Latvia, Saint Pierre and Miquelon, Marshall Islands, Cambodia, Palestine, Netherlands, United States Minor Outlying Islands
Description
Success.ai proudly offers our exclusive LinkedIn Data product, targeting C-level executives from around the globe. This premium dataset is meticulously curated to empower your business development, recruitment strategies, and market research efforts with direct access to top-tier professionals.

Global Reach and Detailed Insights: Our LinkedIn Data encompasses profiles of C-level executives worldwide, offering detailed insights that include professional histories, current and past affiliations, as well as direct contact information such as verified work emails and phone numbers. This data spans across industries such as finance, technology, healthcare, manufacturing, and more, ensuring you have comprehensive coverage no matter your sector focus.

Accuracy and Compliance: Accuracy is paramount in executive-level data. Each profile within our dataset undergoes rigorous verification processes, using advanced AI algorithms to ensure data accuracy and reliability. Our datasets are also compliant with global data privacy laws such as GDPR, CCPA, and others, providing you with data you can trust and use with confidence.

Empower Your Business Strategies: Leverage our LinkedIn Data to enhance various business functions:

Sales and Marketing: Directly reach decision-makers, reducing sales cycles and increasing conversion rates.

Recruitment and Talent Acquisition: Identify and engage with potential candidates for executive roles within your organization.

Market Research and Competitive Analysis: Gain insights into competitor leadership and strategic moves by analyzing executive backgrounds and professional networks.

Robust Data Points Include:

Full Names and Titles: Gain access to the full names and current positions of C-level executives.

Professional Emails and Phone Numbers: Direct communication channels to ensure your messages reach the intended audience.

Company Information: Understand the organizational context with details about the company size, industry, and role within the corporation.

Professional History: Detailed career trajectories, highlighting roles, responsibilities, and achievements.

Education and Certifications: Educational backgrounds and certifications that enrich the professional profiles of these executives.

Flexible Delivery and Integration: Our LinkedIn Data is available in multiple formats, including CSV, Excel, and via API, allowing easy integration into your CRM systems or other sales platforms. We provide continuous updates to our datasets, ensuring you always have access to the most current information available.

Competitive Pricing with Best Price Guarantee: Success.ai offers this valuable data at the most competitive rates in the industry, backed by our best price guarantee. We are committed to providing you with the highest quality data at prices that fit your budget, ensuring excellent return on investment.

Sample Data and Custom Solutions: To demonstrate the quality and depth of our LinkedIn Data, we offer a sample dataset for initial evaluation. For specific needs, our team is skilled at creating customized datasets tailored to your exact business requirements.

Client Success Stories: Our clients, from startups to Fortune 500 companies, have successfully leveraged our LinkedIn Data to drive growth and strategic initiatives. We provide case studies and testimonials that showcase the effectiveness of our data in real-world applications.

Engage with Success.ai Today: Connect with us to explore how our LinkedIn Data can transform your strategic initiatives. Our data experts are ready to assist you in leveraging the full potential of this dataset to meet your business goals.

Reach out to Success.ai to access the world of C-level executives and propel your business to new heights with strategic data insights that drive success.
Dataset of Dissolved Companies | Company Registry Data | API | Dataset | CSV...
datarade.ai
.json, .csv
Updated Dec 17, 2024
Cite
HitHorizons (2024). Dataset of Dissolved Companies | Company Registry Data | API | Dataset | CSV | JSON | 80M+ Companies | 50 European Countries | GDPR | Monthly Updated [Dataset]. https://datarade.ai/data-products/dataset-of-dissolved-companies-api-csv-json-80m-comp-hithorizons
Available download formats: .json, .csv
Dataset updated
Dec 17, 2024
Dataset authored and provided by
HitHorizons
Area covered
Greece, San Marino, Spain, Bosnia and Herzegovina, Azerbaijan, Belgium, Switzerland, Ireland, Guernsey, Malta, Europe
Description
HitHorizons Employee Data API gives access to aggregated company data on 80M+ companies from the whole of Europe and beyond.

Company registration data: company name; national identifier and its type; registered address: street, postal code, city, state / province, country; business activity: SIC code, local activity code with classification system; year of establishment; company type; location type

Sales and number of employees data: sales in EUR, USD and local currency (with local currency code); total number of employees; sales and number of employees accuracy; local number of employees (in case of multiple branches); companies’ sales and number of employees market position compared to other companies in a country / industry / region

Industry data: size of the whole industry; size of all companies operating within a particular SIC code; benchmarking within a particular country or industry; regional benchmarking (EU 27, state / province)

Contact details: company website; company email domain (without person’s name)

Invoicing details available for selected countries: company name; company address; company VAT number
Panel dataset on Brazilian fuel demand
data.mendeley.com
Updated Oct 7, 2024
Cite
Sergio Prolo (2024). Panel dataset on Brazilian fuel demand [Dataset]. http://doi.org/10.17632/hzpwbp7j22.1
Explore at:
Unique identifier
https://doi.org/10.17632/hzpwbp7j22.1
Dataset updated
Oct 7, 2024
Authors
Sergio Prolo
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Summary: Fuel demand is shown to be influenced by fuel prices, people's income and motorization rates. We explore the effect of electric vehicle motorization rates on gasoline demand using this panel dataset.

Files: dataset.csv - Panel dimensions are the Brazilian state ( i ) and year ( t ). The other columns are: gasoline sales per capita (ln_Sg_pc), prices of gasoline (ln_Pg) and ethanol (ln_Pe) and their lags, motorization rates of combustion vehicles (ln_Mi_c) and electric vehicles (ln_Mi_e) and GDP per capita (ln_gdp_pc). All variables are expressed in natural logarithms, so that the coefficients of the regression model can be read directly as demand elasticities.
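To see why log-transformed variables yield elasticities, the sketch below fits a one-regressor least-squares line to synthetic data generated with a known price elasticity of -0.5. It is a deliberately simplified illustration, not the panel model estimated in regression.do. The data, the helper `ols_slope`, and the single-regressor setup are all assumptions.

```python
import math


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (single regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


# Synthetic data: sales = 100 * price^(-0.5), i.e. a true elasticity of -0.5.
prices = [4.0, 5.0, 6.0, 7.0, 8.0]
sales = [100.0 * p ** -0.5 for p in prices]

# Taking logs turns the power law into a straight line:
# ln(sales) = ln(100) - 0.5 * ln(price)
ln_price = [math.log(p) for p in prices]
ln_sales = [math.log(s) for s in sales]

elasticity = ols_slope(ln_price, ln_sales)
print(f"Estimated price elasticity: {elasticity:.3f}")
```

In a log-log specification the slope is the percentage change in the dependent variable per one percent change in the regressor, which is the definition of an elasticity.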

adjacency.csv - The adjacency matrix used in interaction with electric vehicles' motorization rates to calculate spatial effects. At first, it follows a binary adjacency formula: for each pair of states i and j, the cell (i, j) is 0 if the states are not adjacent and 1 if they are. Then, each row is normalized to have sum equal to one.
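The row normalization described for adjacency.csv can be sketched as follows, using a made-up three-state example rather than the real Brazilian adjacency matrix. The function name and the handling of a state with no neighbours (its row is left as zeros) are assumptions.

```python
def row_normalize(matrix):
    """Scale each row of a binary adjacency matrix so that it sums to one.

    Assumption: a row with no neighbours (all zeros) is left as zeros,
    since there is nothing to normalize.
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


# Toy example: state 0 borders states 1 and 2; states 1 and 2 border only state 0.
binary_adjacency = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]

weights = row_normalize(binary_adjacency)
for row in weights:
    print(row)
```

After normalization, multiplying the weight matrix by a vector of state-level values gives, for each state, the average of that value across its neighbours.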

regression.do - Series of Stata commands used to estimate the regression models of our study. dataset.csv must be imported for the commands to work; see comment section.

dataset_predictions.xlsx - Based on the estimations from Stata, we use this Excel file to make average predictions by year and by state. By including years beyond the last panel sample, we also forecast the model into the future and evaluate the effects of different policies that influence gasoline prices (taxation) and EV motorization rates (electrification). This file is primarily used to create images, but can be used to further understand how the forecasting scenarios are set up.

Sources:

Fuel prices and sales: ANP (https://www.gov.br/anp/en/access-information/what-is-anp/what-is-anp)

State population, GDP and vehicle fleet: IBGE (https://www.ibge.gov.br/en/home-eng.html?lang=en-GB)

State EV fleet: Anfavea (https://anfavea.com.br/en/site/anuarios/)