MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The dataset contains US retail companies with a company size of 200-500 workers. For each company, all workers were scraped as well.
For more details about the scraping code, you can check my article or GitHub code.
This dataset is a merged dataset created from the data provided in the competition "Store Sales - Time Series Forecasting". The other datasets that were provided there apart from train and test (for example holidays_events, oil, stores, etc.) could not be used in the final prediction. According to my understanding, through the EDA of the merged dataset, we will be able to get a clearer picture of the other factors that might also affect the final prediction of grocery sales. Therefore, I created this merged dataset and posted it here for the further scope of analysis.
##### Data Description

Data Field Information (This is a copy of the description as provided in the actual dataset)

Train.csv
- id: store id
- date: date of the sale
- store_nbr: identifies the store at which the products are sold.
- **family**: identifies the type of product sold.
- sales: gives the total sales for a product family at a particular store at a given date. Fractional values are possible since products can be sold in fractional units (1.5 kg of cheese, for instance, as opposed to 1 bag of chips).
- onpromotion: gives the total number of items in a product family that were being promoted at a store on a given date.
- Store metadata, including **city, state, type, and cluster.**
- cluster is a grouping of similar stores.
- Holidays and Events, with metadata. NOTE: Pay special attention to the transferred column. A holiday that is transferred officially falls on that calendar day but was moved to another date by the government. A transferred day is more like a normal day than a holiday. To find the day that it was celebrated, look for the corresponding row where the type is Transfer. For example, the holiday Independencia de Guayaquil was transferred from 2012-10-09 to 2012-10-12, which means it was celebrated on 2012-10-12. Days that are type Bridge are extra days that are added to a holiday (e.g., to extend the break across a long weekend). These are frequently made up by the type Work Day, which is a day not normally scheduled for work (e.g., Saturday) that is meant to pay back the Bridge. Additional holidays are days added to a regular calendar holiday, for example, as typically happens around Christmas (making Christmas Eve a holiday).
- dcoilwtico: Daily oil price. Includes values during both the train and test data timeframes. (Ecuador is an oil-dependent country and its economic health is highly vulnerable to shocks in oil prices.)

**Note:** *There is a transaction column in the training dataset which displays the sales transactions on that particular date.*

Test.csv
- The test data, having the same features as the training data. You will predict the target sales for the dates in this file.
- The dates in the test data are for the 15 days after the last date in the training data.

**Note:** *There is no transaction column in the test dataset, unlike the training dataset. Therefore, you might exclude this column when building the model and use it only for EDA.*
submission.csv - A sample submission file in the correct format.
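The holiday rules described above can be sketched in a few lines of Python. This is only an illustration: the dictionary keys `date`, `type`, and `transferred`, the helper name `is_celebrated`, and the two sample rows are assumptions built from the Independencia de Guayaquil example, so check them against the actual holidays file before relying on them.

```python
from datetime import date


def is_celebrated(row):
    """Return True if the row marks a day actually observed as a holiday."""
    if row["transferred"]:
        # Officially a holiday, but moved elsewhere: behaves like a normal day.
        return False
    if row["type"] == "Work Day":
        # Make-up working day that pays back a Bridge day.
        return False
    # Remaining types named in the description: Holiday, Transfer, Bridge, Additional.
    return True


# Hypothetical rows mirroring the example in the description.
rows = [
    {"date": date(2012, 10, 9), "type": "Holiday", "transferred": True},
    {"date": date(2012, 10, 12), "type": "Transfer", "transferred": False},
]

celebrated_dates = [r["date"] for r in rows if is_celebrated(r)]
```

A flag derived this way could serve as a feature during EDA, since a transferred day is expected to look like an ordinary trading day.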
Envestnet®| Yodlee®'s Retail Sales Data (Aggregate/Row) Panels consist of de-identified, near-real time (T+1) USA credit/debit/ACH transaction level data – offering a wide view of the consumer activity ecosystem. The underlying data is sourced from end users leveraging the aggregation portion of the Envestnet®| Yodlee®'s financial technology platform.
Envestnet | Yodlee Consumer Panels (Aggregate/Row) include data relating to millions of transactions, including ticket size and merchant location. The dataset includes de-identified credit/debit card and bank transactions (such as a payroll deposit, account transfer, or mortgage payment). Our coverage offers insights into areas such as consumer, TMT, energy, REITs, internet, utilities, ecommerce, MBS, CMBS, equities, credit, commodities, FX, and corporate activity. We apply rigorous data science practices to deliver key KPIs daily that are focused, relevant, and ready to put into production.
We offer free trials. Our team is available to provide support for loading, validation, sample scripts, or other services you may need to generate insights from our data.
Investors, corporate researchers, and corporates can use our data to answer some key business questions such as:
- How much are consumers spending with specific merchants/brands and how is that changing over time?
- Is the share of consumer spend at a specific merchant increasing or decreasing?
- How are consumers reacting to new products or services launched by merchants?
- For loyal customers, how is the share of spend changing over time?
- What is the company’s market share in a region for similar customers?
- Is the company’s loyal user base increasing or decreasing?
- Is the lifetime customer value increasing or decreasing?

Additional Use Cases:
- Use spending data to analyze sales/revenue broadly (sector-wide) or granularly (company-specific). Historically, our tracked consumer spend has correlated above 85% with company-reported data from thousands of firms. Users can sort and filter by many metrics and KPIs, such as sales and transaction growth rates and online or offline transactions, as well as view customer behavior within a geographic market at a state or city level.
- Reveal cohort consumer behavior to decipher long-term behavioral consumer spending shifts. Measure market share, wallet share, loyalty, consumer lifetime value, retention, demographics, and more.
- Study the effects of inflation rates via such metrics as increased total spend, ticket size, and number of transactions.
- Seek out alpha-generating signals or manage your business strategically with essential, aggregated transaction and spending data analytics.
Use Case Categories (Our data supports a very large number of use cases, and we look forward to working with new ones):
1. Market Research: Company Analysis, Company Valuation, Competitive Intelligence, Competitor Analysis, Competitor Analytics, Competitor Insights, Customer Data Enrichment, Customer Data Insights, Customer Data Intelligence, Demand Forecasting, Ecommerce Intelligence, Employee Pay Strategy, Employment Analytics, Job Income Analysis, Job Market Pricing, Marketing, Marketing Data Enrichment, Marketing Intelligence, Marketing Strategy, Payment History Analytics, Price Analysis, Pricing Analytics, Retail, Retail Analytics, Retail Intelligence, Retail POS Data Analysis, and Salary Benchmarking
2. Investment Research: Financial Services, Hedge Funds, Investing, Mergers & Acquisitions (M&A), Stock Picking, Venture Capital (VC)
3. Consumer Analysis: Consumer Data Enrichment, Consumer Intelligence
4. Market Data: Analytics, B2C Data Enrichment, Bank Data Enrichment, Behavioral Analytics, Benchmarking, Customer Insights, Customer Intelligence, Data Enhancement, Data Enrichment, Data Intelligence, Data Modeling, Ecommerce Analysis, Ecommerce Data Enrichment, Economic Analysis, Financial Data Enrichment, Financial Intelligence, Local Economic Forecasting, Location-based Analytics, Market Analysis, Market Analytics, Market Intelligence, Market Potential Analysis, Market Research, Market Share Analysis, Sales, Sales Data Enrichment, Sales Enablement, Sales Insights, Sales Intelligence, Spending Analytics, Stock Market Predictions, and Trend Analysis
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains the ratings and reviews of 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
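Because the rows are ordered by class, a shuffle step is needed before any train/test split. A minimal sketch follows, using only the standard library; the helper name `shuffled`, the seed, and the toy rows standing in for data_rt.csv are assumptions made for illustration.

```python
import random


def shuffled(rows, seed=0):
    """Return a shuffled copy of rows; the input list is left untouched."""
    out = list(rows)
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(out)
    return out


# Toy stand-in for the file layout: negative rows (0) first, positive rows (1) last.
rows = [("bad film", 0)] * 5 + [("good film", 1)] * 5
mixed = shuffled(rows)
```

Fixing the seed makes experiments repeatable, which helps when comparing models on the same split.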
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
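The step from a polarity score to the categorical `division` column can be illustrated as follows. TextBlob polarity scores lie in the range -1.0 to 1.0, but the description does not state which cut-offs or label names were used, so the zero threshold, the label strings, and the function name `division_from_polarity` below are assumptions, not the dataset author's rule.

```python
def division_from_polarity(polarity):
    """Map a polarity score in [-1.0, 1.0] to a categorical label."""
    # Assumed cut-offs: the dataset description does not specify them.
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


labels = [division_from_polarity(p) for p in (0.8, -0.3, 0.0)]
```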
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machine learning models for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, Unix time), reviewTime (time of the review, raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
• 3M+ Contact Profiles
• 5M+ Worldwide eCommerce Brands
• Direct Contact Info for Decision Makers
• Contact Direct Email and Mobile Number
• 15+ eCommerce Platforms
• 20+ Data Points
• Lifetime Support Until You Are 100% Satisfied
Buy eCommerce leads from our eCommerce leads database today. Reach out to eCommerce companies to expand your business. Now is the time to buy eCommerce leads and start running a campaign to attract new customers. We provide current and accurate information that will assist you in achieving your goals.
Our database is made up of highly valuable and interested leads who are ready to make online purchases. You can always filter our data and choose the database that best meets your needs if you need eCommerce leads based on industry.
We have millions of eCommerce data ready to go no matter where you are. We’ve acquired hundreds of clients from all over the world over the years and delivered data that they’re happy with.
We were able to do so by obtaining data from various locations around the world. As a result, our database is widely accessible, and anyone can use it from any location on the planet. Please contact us if you want the best eCommerce leads.
We sell eCommerce leads that can be filtered by industry. We know what you’re going through and what you’ll need for your next project. As a result, we’ve compiled a list of eCommerce leads that are exactly what you require. With the most potential data we provide, you can grow your business and achieve your business goals. All of our eCommerce leads are generated professionally, with real people – not bots – entering data.
We’re a leading brand in the industry because we source data from the most well-known platforms, ensuring that the information you receive from us is accurate and reliable. That’s especially true because we verify each and every piece of information in order to provide you with yet another benefit in your life.
The majority of our customers have had success with the information we’ve provided. That is why they keep contacting us for our services. You can count on our business-to-business eCommerce sales leads. Contact us to work with one of the most effective lead generation companies in the industry, which has already helped thousands of potential members achieve success.
Every month, we update our eCommerce store sales leads in order to provide our clients with the most accurate data possible. We have a team of professionals who strive for excellence when it comes to gathering the right leads to ensure you get the number of sales you need. Our experts also double-check that all of the sales data we receive is genuine and accurate.
The accuracy of our eCommerce database is why the majority of our clients choose us. Furthermore, we offer round-the-clock support to provide on-demand solutions. We take care of everything so you can spend less time evaluating our product database and more time becoming one of them.
The Mikrozensus survey in September 1984 covers the topics of shopping habits and job seeking.

Shopping habits: The question program consists of the following topics:
- local supply, especially important for elderly people and in rural areas
- time needed for shopping, often a problem for working persons
- planning of expenses: Of interest is the consumer behaviour and the planning of the household expenses.
- planned purchases: This question is of special interest in connection with the information gathered in the survey from June (Mikrozensus MZ8402) on the households’ achieved level of equipment with durable consumer goods.

Job seeking: On the topic of job seeking, the last study had been conducted in September 1982 (Mikrozensus MZ8203). The survey was conducted to complement the monthly statistic on persons who filed for unemployment at an unemployment office. It gives additional information on people searching for a job who did not file for unemployment at any unemployment office and who are therefore not included in the monthly statistics. This includes all unemployed people (those who lost a job as well as housewives, students, etc.) who might be looking for a job, as well as those who want to change their workplace because they are dissatisfied with their current job. Apart from providing possibilities for comparing with and complementing the monthly unemployment statistics of the labour market administration, this special survey provides additional information that cannot be found in an administration statistic.

Probability: Stratified: Disproportional

Face-to-face interview
http://opendatacommons.org/licenses/dbcl/1.0/
Coles is a leading supermarket, retail, and customer service brand in Australia, boasting over 800 outlets nationwide and holding a 27% Australian market share.
ColesStoreData
1. Coles_StoreID - is a unique ID that identifies each store.
2. Store_Location - provides information on the store’s location in different states.
3. Customer_Count - is the average customer count accounted for in the store.
4. Staff_Count - is the number of employees who work at the stores.
5. Store_Area - is the store’s size, expressed in square meters.

ColesSalesData
1. Coles_StoreIDNo - is the unique ID that identifies each store.
2. Expec_Revenue - is the Expected Revenue in $M a store is supposed to generate.
3. Gross_Sale - is the Gross Sale in $M a store has generated.
4. Sales_Cost - is the cost incurred in $M after the Sale has occurred.
5. Targeted_Quarter - indicates the quarter within a fiscal year to which the sales data corresponds.
6. Coles_Forecast - provides details about whether the store's sales align with the expected revenue, specifically whether the net sales (computed as Gross_Sale - Sales_Cost) meet the expected goal. The forecasted values are determined based on predefined conditions. If the net sales equal or exceed the Expected Revenue, they are categorized as "On Target." If they fall short of the Expected Revenue, they are categorized as "Below Target".
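The Coles_Forecast rule described above can be written out directly. The sketch below assumes all three inputs are in $M, as the field descriptions state; the function name `coles_forecast` and the sample figures are hypothetical.

```python
def coles_forecast(gross_sale, sales_cost, expec_revenue):
    """Classify a store-quarter using the rule from the field description."""
    # Net sales are computed as Gross_Sale - Sales_Cost.
    net_sales = gross_sale - sales_cost
    # Meeting or exceeding the expected revenue counts as on target.
    return "On Target" if net_sales >= expec_revenue else "Below Target"


# Hypothetical values in $M.
on_target = coles_forecast(gross_sale=12.0, sales_cost=2.0, expec_revenue=10.0)
below_target = coles_forecast(gross_sale=12.0, sales_cost=3.0, expec_revenue=10.0)
```

Recomputing the label this way is a quick consistency check of the Coles_Forecast column against the three numeric columns.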
This synthetic dataset can be used for a variety of purposes, including market analysis, sales forecasting, and training machine learning models. It provides a representative sample of data that can be analyzed and used to make informed business decisions without exposing real customer information.
Welcome to the North America Data: Your Gateway to Strategic Connections Across the Americas
In today’s fast-evolving business landscape, having the right data at your fingertips is crucial for success. North America Data offers an unmatched resource designed to empower businesses by providing access to key decision-makers across the vast and diverse markets of North and South America. Our meticulously curated database serves as the cornerstone of your strategic outreach efforts, enabling you to connect with the right people in the right places at the right time.
What Makes Our Data Unique?
Depth and Precision
Our database is more than just a collection of names and contact details—it’s a gateway to deep, actionable insights about the people who shape industries. We go beyond basic data points to offer a nuanced understanding of top executives, owners, founders, and influencers. Whether you're looking to connect with a CEO of a Fortune 500 company or a founder of a dynamic startup, our data provides the precision you need to identify and engage the most relevant decision-makers.
Our Data Sourcing Excellence
Reliability and Integrity
Our data is sourced from a variety of authoritative channels, ensuring that every entry is both reliable and relevant. We draw from respected business directories, publicly available records, and proprietary research methodologies. Each piece of data undergoes a rigorous vetting process, meticulously checked for accuracy, to ensure that you can trust the integrity of the information you receive.
Primary Use-Cases and Industry Verticals
Versatility Across Sectors
The North America Data is a versatile tool designed to meet the needs of a wide range of industries. Whether you're in finance, manufacturing, technology, healthcare, retail, hospitality, energy, transportation, or any other sector, our database provides the critical insights necessary to drive your business forward. Use our data to:
Conduct Market Research: Gain a deeper understanding of industry trends and dynamics.
Seamless Integration with Broader Data Solutions
Comprehensive Business Intelligence
Our North America Data is not an isolated resource; it’s a vital component of our comprehensive business intelligence suite. When combined with our global datasets, it provides a holistic view of the global business landscape. This integrated approach enables businesses to make well-informed decisions, tapping into insights that span across continents and sectors.
Geographical Coverage Across the Americas
Pan-American Reach
Our database covers the entirety of North and South America, offering a robust range of contacts across numerous countries, including but not limited to:
And many more
Extensive Industry Coverage
Tailored to Your Sector
We cater to a vast array of industries, ensuring that no matter your focus, our database has the coverage you need. Key industries include:
Transportation: Engage with decision-makers in logistics, shipping, and infrastructure.
Comprehensive Employee Size and Revenue Data
Insights Across Business Sizes
Our database doesn’t just provide contact information—it also includes detailed data on employee size and revenue. Whether you’re targeting small startups, mid-sized enterprises, or large multinational corporations, our database has the depth to accommodate your needs. We offer insights into:
Revenue Size: Covering companies ranging from early-stage startups to global giants in the Fortune 500.
Empower Your Business with Unmatched Data Access
Unlock Opportunities Across the Americas
With the North America Data, you gain access to a powerful resource designed to unlock endless opportunities for growth and success. Whether you’re looking to break into new markets, establish strong business relationships, or enhance your market intelligence, our database equips you with the tools you need to excel.
Explore the North America Data tod...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A curated list of food prices in South Africa, reported monthly on http://www.pacsa.org.za.

"What is the PACSA Food Basket? The PACSA Food Basket is an index for food price inflation. It provides insight into the affordability of food and other essential household requirements for working class households in a context of low wages, social grants and high levels of unemployment. The PACSA Food Basket tracks the prices of a basket of 36 basic foods which working class poor households, with 7 members, said they buy every month (based on conversations with women). The food basket is not nutritionally complete; it is a reflection of reality - what people are buying. Data is collected on the same day between the 21st and 24th of each month from six different retail stores which service the lower-income market in Pietermaritzburg, KwaZulu-Natal. Women have told us that they base their purchasing decisions on price and whether the quality of the food is not too poor. Women are savvy shoppers and so foods and their prices in each store are selected on this basis. The PACSA Food Basket tracks the foods working class households buy, in the quantities they buy them in and from the supermarkets they buy them from. PACSA has been tracking the price of the basket since 2006. We release our Food Price Barometer monthly and consolidate the data for an annual report to coincide with World Food Day annually on the 16th October." - PACSA website
Due to changes in the collection and availability of data on COVID-19, this website will no longer be updated. The webpage will no longer be available as of 11 May 2023. On-going, reliable sources of data for COVID-19 are available via the COVID-19 dashboard and the UKHSA
Since March 2020, London has seen many different levels of restrictions - including three separate lockdowns and many other tiers/levels of restrictions, as well as easing of restrictions and even measures to actively encourage people to go to work, their high streets and local restaurants. This report gathers data from a number of sources, including Google, Apple, Citymapper, Purple WiFi and OpenTable, to assess the extent to which these levels of restrictions have translated into reductions in Londoners' movements.
The data behind the charts below come from different sources. None of these data represent a direct measure of how well people are adhering to the lockdown rules - nor do they provide an exhaustive data set. Rather, they are measures of different aspects of mobility, which together offer an overall impression of how Londoners are moving around the capital. The information is broken down by use of public transport, pedestrian activity, retail and leisure, and homeworking.
For the transport measures, we have included data from Google, Apple, Citymapper and Transport for London. They measure different aspects of public transport usage, depending on the data source. Each of the lines in the chart below represents a percentage of a pre-pandemic baseline.
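As a rough illustration of what "a percentage of a pre-pandemic baseline" means, the sketch below divides an observed value by the median of baseline observations for the same day of the week. Each provider defines its baseline differently, so the median-based baseline, the function name `percent_of_baseline`, and all the numbers are assumptions chosen for illustration only.

```python
from statistics import median


def percent_of_baseline(value, baseline_values):
    """Express value as a percentage of the median baseline observation."""
    baseline = median(baseline_values)
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * value / baseline


# Hypothetical trip counts for five pre-pandemic Mondays and one lockdown Monday.
baseline_mondays = [980, 1000, 1020, 990, 1010]
lockdown_monday = 204
index = percent_of_baseline(lockdown_monday, baseline_mondays)
```

With these made-up inputs the lockdown Monday sits at roughly a fifth of its baseline, which is the kind of figure reported in the minimum-value columns.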
| Activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3 |
|---|---|---|---|---|---|---|
| Citymapper | Citymapper mobility index | 2021-09-05 | Compares trips planned and trips taken within its app to a baseline of the four weeks from 6 Jan 2020 | 7.9% | 28% | 19% |
| Google | Google Mobility Report | 2022-10-15 | Location data shared by users of Android smartphones, compared time and duration of visits to locations to the median values on the same day of the week in the five weeks from 3 Jan 2020 | 20.4% | 40% | 27% |
| TfL Bus | Transport for London | 2022-10-30 | Bus journey 'taps' on the TfL network compared to same day of the week in four weeks starting 13 Jan 2020 | - | 34% | 24% |
| TfL Tube | Transport for London | 2022-10-30 | Tube journey 'taps' on the TfL network compared to same day of the week in four weeks starting 13 Jan 2020 | - | 30% | 21% |

Pedestrian activity
With the data we currently have it's harder to estimate pedestrian activity and high street busyness. A few indicators can give us information on how people are making trips out of the house:
| Activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3 |
|---|---|---|---|---|---|---|
| Walking | Apple Mobility Index | 2021-11-09 | Estimates the frequency of trips made on foot compared to baseline of 13 Jan '20 | 22% | 47% | 36% |
| Parks | Google Mobility Report | 2022-10-15 | Frequency of trips to parks. Changes in the weather mean this varies a lot. Compared to baseline of 5 weeks from 3 Jan '20 | 30% | 55% | 41% |
| Retail & Rec | Google Mobility Report | 2022-10-15 | Estimates frequency of trips to shops/leisure locations. Compared to baseline of 5 weeks from 3 Jan '20 | 30% | 55% | 41% |

Retail and recreation
In this section, we focus on estimated footfall to shops, restaurants, cafes, shopping centres and so on.
| Activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3 |
|---|---|---|---|---|---|---|
| Grocery/pharmacy | Google Mobility Report | 2022-10-15 | Estimates frequency of trips to grocery shops and pharmacies. Compared to baseline of 5 weeks from 3 Jan '20 | 32% | 55.00% | 45.000% |

Retail/rec <a href="https://ww
Number of employees by North American Industry Classification System (NAICS) and type of employee, last 5 years.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
# Brazilian E-Commerce Public Dataset by Olist
Welcome! This is a Brazilian ecommerce public dataset of orders made at Olist Store. The dataset has information on 100k orders from 2016 to 2018 made at multiple marketplaces in Brazil. Its features allow viewing an order from multiple dimensions: from order status, price, payment and freight performance to customer location, product attributes and finally reviews written by customers. We also released a geolocation dataset that relates Brazilian zip codes to lat/lng coordinates.
This is real commercial data, it has been anonymised, and references to the companies and partners in the review text have been replaced with the names of Game of Thrones great houses.
# Join it With the Marketing Funnel by Olist
We have also released a Marketing Funnel Dataset. You may join both datasets and see an order from Marketing perspective now!
Instructions on joining are available on this Kernel.
# Context
This dataset was generously provided by Olist, the largest department store in Brazilian marketplaces. Olist connects small businesses from all over Brazil to channels without hassle and with a single contract. Those merchants are able to sell their products through the Olist Store and ship them directly to the customers using Olist logistics partners. See more on our website: www.olist.com
After a customer purchases the product from Olist Store, a seller gets notified to fulfill that order. Once the customer receives the product, or the estimated delivery date is due, the customer gets a satisfaction survey by email where they can rate the purchase experience and write down some comments.
# Attention
An order might have multiple items.
Each item might be fulfilled by a distinct seller.
All text identifying stores and partners was replaced by the names of Game of Thrones great houses.
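Since an order can contain several items and each item can come from a different seller, item-level rows usually need to be grouped per order before analysis. The sketch below shows one way to do that with the standard library; the field names `order_id`, `seller_id`, and `price`, the helper `summarize_orders`, and the sample rows are assumptions for illustration, so check the data schema for the real column names.

```python
from collections import defaultdict

# Hypothetical item-level rows: one row per item in an order.
order_items = [
    {"order_id": "o1", "seller_id": "s1", "price": 10.0},
    {"order_id": "o1", "seller_id": "s2", "price": 5.5},
    {"order_id": "o2", "seller_id": "s1", "price": 20.0},
]


def summarize_orders(items):
    """Aggregate item rows into per-order item count, seller set, and total price."""
    summary = defaultdict(lambda: {"items": 0, "sellers": set(), "total": 0.0})
    for item in items:
        entry = summary[item["order_id"]]
        entry["items"] += 1
        entry["sellers"].add(item["seller_id"])
        entry["total"] += item["price"]
    return dict(summary)


orders = summarize_orders(order_items)
```

Counting distinct sellers per order makes it easy to separate single-seller orders from split shipments when studying delivery performance.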
Example of a product listing on a marketplace
Product listing image: https://i.imgur.com/JuJMns1.png
# Data Schema
The data is divided into multiple datasets for better understanding and organization. Please refer to the following data schema when working with it:

Data schema image: https://i.imgur.com/HRhd2Y0.png

# Classified Dataset
We had previously released a classified dataset, but we removed it at Version 6. We intend to release it again as a new dataset with a new data schema. Until we finish it, you may use the classified dataset available at Version 5 or earlier.

# Inspiration
Here is some inspiration for possible outcomes from this dataset.
NLP:
This dataset offers a supreme environment to parse out the reviews text through its multiple dimensions.
Clustering:
Some customers didn't write a review. But why are they happy or mad?
Sales Prediction:
With purchase date information you'll be able to predict future sales.
Delivery Performance:
You will also be able to work through delivery performance and find ways to optimize delivery times.
Product Quality:
Enjoy discovering the product categories that are more prone to customer dissatisfaction.
Feature Engineering:
Create features from this rich dataset or attach some external public information to it.
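As a starting point for the delivery performance idea, the sketch below compares the delivery date with the estimated delivery date. The record layout, the field names `estimated` and `delivered`, and the sample dates are assumptions made for illustration; they are not the dataset's actual column names.

```python
from datetime import date

# Hypothetical order records; field names and dates are assumptions.
orders = [
    {"order_id": "A", "estimated": date(2018, 3, 10), "delivered": date(2018, 3, 8)},
    {"order_id": "B", "estimated": date(2018, 3, 10), "delivered": date(2018, 3, 13)},
]


def days_late(order):
    """Days past the estimate: positive when late, negative when early."""
    return (order["delivered"] - order["estimated"]).days


def late_share(rows):
    """Fraction of orders delivered after their estimated date."""
    if not rows:
        return 0.0
    return sum(days_late(row) > 0 for row in rows) / len(rows)


for order in orders:
    print(order["order_id"], days_late(order))
print(late_share(orders))
```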
Thanks to Olist for releasing this dataset.
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This data set provides an in-depth look into the ordering, invoicing and sales processes at a supermarket. With information ranging from customers' meal choices to the value of their orders and whether they were converted into sales, this dataset opens up endless possibilities to uncover consumer behavior trends and engagement within the business. From understanding who is exchanging with the company and when, to seeing what types of meals are most popular with consumers, this rich collection of data will allow us to gain priceless insights into consumer actions and habits that can inform strategic decisions. Dive deep into big data now by exploring Invoices.csv, OrderLeads.csv and SalesTeam.csv for invaluable knowledge about your customers!
This dataset provides an in-depth look into the ordering and invoicing processes of a supermarket, as well as how consumers are engaging with it. This dataset can be used to analyze and gain insights into consumer purchasing behaviors and preferences at the store.
The first step in analyzing this dataset is to familiarize yourself with its content. The dataset contains three CSV files: Invoices.csv, OrderLeads.csv, and SalesTeam.csv. They have different features, such as Date of Meal, Participants, Meal Price, Type of Meal, Company Name and Order Value. Each file contains a list of columns holding the data for each feature.
Once you understand what types of information are included in each table, it will be easier to start drawing conclusions about customer preferences and trends from the store's data. You can use statistical models such as regression analysis or cluster analysis to gain further insight into customers' behavior within the store setting. Additionally, you could use machine learning algorithms such as K-Nearest Neighbors (KNN) or Support Vector Machines (SVM) if your goal is to improve targeting strategy or to recognize patterns in customer purchases over time.
These techniques can help you determine which promotional tactics work best for attracting customers and promoting sales through marketing campaigns at this supermarket chain. They can also shed light on how customers engage with products across categories on different days, weeks and months according to their individual purchasing habits, which in turn can help management improve its marketing strategies.
Overall, this dataset offers considerable potential for advancing the understanding of retail behaviour. It gives access to specific transactions that occurred within a given time frame, and therefore detailed insight into customer behavior trends. Combined with software packages for manipulating these metrics, it can help identify strategies for increasing revenue more efficiently.
- Identifying the most profitable customer segment based on order value and converted sales.
- Leveraging trends in participant size to suggest meal packages for different types of meals.
- Analyzing the conversion rate of orders over time to optimize promotional strategies and product offerings accordingly.
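The last idea, tracking the conversion rate of orders over time, can be sketched as a monthly aggregation. The input layout is an assumption: each lead is represented as a hypothetical `(date, converted)` pair, and the sample values are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order leads as (date, converted) pairs; values are made up.
leads = [
    (date(2018, 1, 5), True),
    (date(2018, 1, 20), False),
    (date(2018, 2, 3), True),
    (date(2018, 2, 14), True),
]


def monthly_conversion_rate(rows):
    """Return {(year, month): converted leads / all leads} in date order."""
    totals = defaultdict(int)
    converted = defaultdict(int)
    for day, was_converted in rows:
        key = (day.year, day.month)
        totals[key] += 1
        converted[key] += int(was_converted)
    return {key: converted[key] / totals[key] for key in sorted(totals)}


for month, rate in monthly_conversion_rate(leads).items():
    print(month, rate)
```

Grouping by `(year, month)` rather than by month alone keeps the same month in different years apart, which matters once the data spans more than one year.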
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: Invoices.csv

| Column name | Description |
|:-----------------|:-------------------------------------------------------------|
| Date | The date the order was placed. (Date) |
| Date of Meal | The date the meal was served. (Date) |
| Participants | The number of people who participated in the meal. (Integer) |
| Meal Price | The cost of the meal. (Float) |
| Type of Meal | The...
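A minimal sketch of reading a file with these columns, using only the Python standard library. The column names come from the table above; the sample rows, the date format and the meal type values are assumptions, since the description is truncated.

```python
import csv
import io
from collections import defaultdict

# Made-up sample using the column names listed above.
# The date format and the "Type of Meal" values are assumptions.
SAMPLE = """Date,Date of Meal,Participants,Meal Price,Type of Meal
2018-01-02,2018-01-05,4,100.0,Lunch
2018-01-03,2018-01-06,2,60.0,Dinner
2018-01-04,2018-01-07,3,200.0,Lunch
"""


def mean_price_by_meal_type(csv_file):
    """Average the "Meal Price" column for each "Type of Meal" value."""
    prices = defaultdict(list)
    for row in csv.DictReader(csv_file):
        prices[row["Type of Meal"]].append(float(row["Meal Price"]))
    return {meal: sum(values) / len(values) for meal, values in prices.items()}


result = mean_price_by_meal_type(io.StringIO(SAMPLE))
print(result)
```

To run the same function on the real file, pass an open file handle instead of the in-memory sample, for example `open("Invoices.csv", newline="")`, after checking that the header names match.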
Online shopping sales across India amounted to around ** billion U.S. dollars in 2021. The e-commerce market is likely to grow to over *** billion U.S. dollars by 2025. The e-commerce market in India is the fastest-growing market in the world.
Online retail segments
In fiscal year 2017, the retail market was led by electronics with a penetration rate of about ** percent. However, in terms of groceries, local offline vendors or kiranas continued to be the preferred choice for daily groceries due to the ease of bargaining and the benefit of the 'old-customer' designation, with extra rations as a gesture from the vendor. Nevertheless, the number of online shoppers in the country was estimated to increase to over *** million in 2025, up from around ** million in 2017.
Impact of COVID-19 on the market
The coronavirus outbreak in March 2020 caused a surge in prices across e-commerce platforms. Panic purchasing resulted in a shortage of sanitary and food items online as well as in physical stores across the country. As online consumption continued to increase, unscrupulous sellers jacked up the prices of certain items. Amazon and Flipkart, the two e-commerce market leaders in India, urged sellers to exercise responsible pricing and even blocked certain products. Manufacturers increased production in order to keep up with the demand for fast-moving items. With the uncertainty surrounding the impact of COVID-19, manufacturers and retailers will presumably have to work in unison to keep track of an unprecedented demand and supply scenario.
The collected datasets come from the computer system of a multi-branch store. The data covers stocking, sales, sales statistics and characteristics of products sold from January 2018 to December 2018.
The store opened in 2009 and is located in Poland. The shop area is 120 m². We offer general food, basic household chemicals and hygiene articles. We have fresh bread from 4 different bakers, sweets, local vegetables, dairy, basic meat (ham, sausages), newspapers, household chemicals etc. The interior is basic.
Location: The shop is located in a city with a population of around 28,000 people. It is in the middle of a housing estate (blocks of flats), near a sports field. The store is open every day: Monday-Saturday from 06:00 to 22:00, Sunday from 10:00 to 20:00. The store has 4 employees. Work is organized in 3 shifts: first 06:00-12:00, second 10:00-16:00/18:00 and third 16:00-22:00.
The nearest competition: There is another grocery store nearby (30 m). The second store is smaller: also a delicatessen, but half the size. They offer similar products for daily use: bread, dairy, some meat and general foods. I'm not sure about alcohol or how wide their offer is. However, our store's offer is richer (bread is delivered from 4 different bakers). To know exactly what the differences are, I need to check the details.
Grocery stores in the town:
1 hypermarket
8 supermarkets
25 grocery stores
Shopping trends in Poland connected to our location: People tend to do general food shopping in supermarkets. If they need fresh daily items, something is missing or they need a special product (not available at the supermarket), they shop at grocery stores like ours. People in Poland still prefer to go to a shop in the neighborhood to shop more quickly, talk to people, or just throw out the rubbish and shop in one trip. For bigger shopping they go by car to the supermarket, e.g. after work or on the weekend.
Online shopping: E-commerce accounts for 1% of the sales of the FMCG goods market in Poland. It is starting to be popular in bigger cities like Warsaw, Krakow etc. It is not popular in our city.
Health trend: Three-quarters of Polish consumers agree with the statement that "you are what you eat". Therefore, we pay more attention to what we eat and do not save on food products.
Convenience trend: According to the expert, buyers' habits will not change quickly, and the fact is that Poles like to shop flat: Polish shoppers visit 4 shops a month on average. The vast majority of them also tend to make smaller purchases, which confirms the most popular shopping mission, replenishing stocks. However, a pleasant shopping experience ranks third among buyers' motivations when selecting a store: 8 out of 10 buyers prefer to shop in a well-organized store with a nice atmosphere. This is one of the reasons for the development of the convenience channel. It also responds very well to other needs of Polish consumers: Poles have less and less time, so shopping must be fast and convenient. In this situation, price is not the most important factor: 30% of Polish buyers declare that anything that saves their time is worth a higher price.
Our customers live in the neighborhood, in the housing estate (blocks of flats). During events at the sports field, our opening hours are adjusted to attract more customers from the event. Moreover, during trade-free Sundays we have customers from the city. Some customers work abroad and come to our shop when they are at home, placing special orders, e.g. cigarette packages.
The average age of residents is 40 years. The gender split is equal between men and women. The majority of the population (60%) is married, and the city has a positive natural increase. The unemployment rate is low and similar to the national rate, around 7%. The average monthly salary is around 3800 PLN gross. This is between the minimum and average salary in Poland (the minimum wage in Poland is 2250 PLN gross and the average wage is 4272 PLN gross). The occupation split is: 40% industry and construction, 30% agricultural sector, 11% service sector and other. Companies in the city are micro and small (only a few big companies). The city is not a tourist destination. In general, the situation in the city is good: budget revenues are growing year on year. Additionally, the Polish government has provided social funds for every second child since 2017, and now in 2019 this is going to be extended to every child, without limits. This should boost the economy.
In general, customers in the city have good shopping conditions.