58 datasets found

Gen Z Money Spending Dataset
kaggle.com
Updated Jan 31, 2025
Cite
Anand Kumar (2025). Gen Z Money Spending Dataset [Dataset]. https://www.kaggle.com/datasets/manandkumar/gen-z-money-spending-dataset
Dataset updated
Jan 31, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Anand Kumar
License
CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides insights into the spending habits of Gen Z (ages 18-27) across various categories such as rent, groceries, entertainment, education, savings, and more. It contains 1700 records and 15 financial attributes, making it a valuable resource for financial trend analysis, budgeting studies, and machine learning applications in personal finance.
Fake News Prediction Dataset
kaggle.com
Updated Nov 3, 2023
Cite
Rajat Kumar (2023). Fake News Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/rajatkumar30/fake-news
Dataset updated
Nov 3, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Rajat Kumar
License
Open Data Commons Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
Description

Fake news or hoax news is false or misleading information presented as news. Fake news often has the aim of damaging the reputation of a person or entity, or making money through advertising revenue.

This dataset contains both fake and real news.

The dataset has the following columns:

1) Title -> Title of the News

2) Text -> Text or Content of the News

3) Label -> Whether the news is Fake or Real
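A minimal sketch of how these three columns could feed a baseline classifier: a multinomial naive Bayes model with add-one smoothing in plain Python. It assumes Title and Text are concatenated into one string and that Label holds the strings "Fake" and "Real"; the training rows below are made-up examples rather than records from the dataset, and the actual label encoding in the file may differ.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(doc):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training rows; in practice each doc could be Title + " " + Text.
train_docs = [
    "Shocking miracle cure doctors hate",
    "Parliament passes annual budget bill",
    "Celebrity secretly an alien, insiders claim",
    "Central bank holds interest rates steady",
]
train_labels = ["Fake", "Real", "Fake", "Real"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("Insiders claim miracle cure"))  # Fake
```

Naive Bayes is only a quick baseline here; it ignores word order entirely.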
‘Hotel Prices - Beginner Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Hotel Prices - Beginner Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-hotel-prices-beginner-dataset-6aca/74a157b1/?iid=000-816&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Hotel Prices - Beginner Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sveneschlbeck/hotel-prices-beginner-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

This dataset is aimed at data science students and beginners who want to dive into regression or clustering without having to clean the data first.

Content

This dataset consists of a pre-cleaned .csv table that has been translated from German to English.

There are four columns in this dataset:

Profit (how much money the hotel makes per year)

Price in Millions (€)

Square Meter (Hotel Area)

City

Here, "Hotel Prices" does not refer to the cost of spending a night at these hotels but to the price of buying them. This data would interest someone who wants to buy a hotel and needs to judge whether they are overpaying or getting a great deal compared with similar properties in other comparable cities.

--- Original source retains full ownership of the source dataset ---
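One simple way to make the "overpaying or great deal" comparison concrete is to normalize each purchase price by floor area and compare a candidate with the median of its own city. The sketch below assumes rows of (Profit, Price in Millions (€), Square Meter, City); the city names and numbers are hypothetical, and price per square meter is only one possible yardstick.

```python
from statistics import median

# Hypothetical rows: (profit, price in millions of EUR, square meters, city).
# Profit is carried along but not used by this particular comparison.
hotels = [
    (1_200_000, 12.0, 3000, "Berlin"),
    (900_000, 9.5, 2500, "Berlin"),
    (1_500_000, 16.0, 3200, "Berlin"),
    (700_000, 6.0, 2400, "Leipzig"),
    (650_000, 5.5, 2000, "Leipzig"),
]


def price_per_sqm(price_millions: float, sqm: float) -> float:
    """Convert a price in millions of EUR into EUR per square meter."""
    return price_millions * 1_000_000 / sqm


def ratio_to_city_median(price_millions: float, sqm: float, city: str, rows) -> float:
    """Candidate's EUR/m² divided by the median EUR/m² of hotels in the same city.

    Above 1.0 suggests paying more per square meter than a typical comparable
    hotel; below 1.0 suggests a relatively cheap offer.
    """
    peers = [price_per_sqm(p, s) for _, p, s, c in rows if c == city]
    if not peers:
        raise ValueError(f"no comparable hotels in {city!r}")
    return price_per_sqm(price_millions, sqm) / median(peers)


print(ratio_to_city_median(10.0, 2000, "Berlin", hotels))  # 1.25
```

The median is used instead of the mean so that a single unusually expensive hotel does not shift the benchmark much.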
Realistic Sales Revenue Dataset
kaggle.com
Updated Jul 16, 2025
Cite
Shoukat khan (2025). Realistic Sales Revenue Dataset [Dataset]. https://www.kaggle.com/datasets/drisrarahmad/realistic-sales-revenue-dataset
Dataset updated
Jul 16, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Shoukat khan
License
CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
Description
📄 Description: This synthetic dataset is designed for practising regression tasks, particularly for predicting Sales Revenue based on product, market, and economic factors. It contains both categorical (nominal) and numerical features, simulating real-world sales data across various product categories and regions.

📌 Dataset Summary: Rows: 2000

Columns: 12 features + 1 target (SalesRevenue)

🏷️ Columns Description:
ProductCategory (Categorical): Type of product: Electronics, Clothing, Furniture, Toys
Region (Categorical): Sales region: North, South, East, West
CustomerSegment (Categorical): Customer income group: Low, Middle, High
IsPromotionApplied (Categorical): Whether promotion was applied: Yes/No
ProductionCost (Numerical): Cost to produce the product
MarketingSpend (Numerical): Money spent on marketing
SeasonalDemandIndex (Numerical): Factor representing seasonal demand
CompetitorPrice (Numerical): Average price of competing products
CustomerRating (Numerical): Average customer rating (out of 5)
EconomicIndex (Numerical): Indicator of overall economic conditions
StoreCount (Numerical): Number of stores selling the product
OnlinePresence (Numerical): Online presence score of the product
SalesRevenue (Numerical): Target variable; revenue from product sales
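Because four of the predictors are categorical, a linear regression on SalesRevenue needs them encoded as numbers first. A minimal one-hot encoding sketch, assuming the cell values are spelled exactly as listed above; it drops the first level of each column so the indicators are not perfectly collinear with a regression intercept.

```python
# Categorical columns and their levels, as listed in the column description.
CATEGORIES = {
    "ProductCategory": ["Electronics", "Clothing", "Furniture", "Toys"],
    "Region": ["North", "South", "East", "West"],
    "CustomerSegment": ["Low", "Middle", "High"],
    "IsPromotionApplied": ["Yes", "No"],
}


def one_hot(row: dict) -> dict:
    """Expand each categorical field into 0/1 indicator columns.

    The first level of each column is dropped and acts as the baseline;
    numerical fields are copied through unchanged.
    """
    encoded = {}
    for key, value in row.items():
        if key not in CATEGORIES:
            encoded[key] = value
            continue
        levels = CATEGORIES[key]
        if value not in levels:
            raise ValueError(f"unexpected level {value!r} for {key}")
        for level in levels[1:]:
            encoded[f"{key}_{level}"] = int(value == level)
    return encoded


example = {"ProductCategory": "Toys", "Region": "North", "CustomerSegment": "High",
           "IsPromotionApplied": "Yes", "MarketingSpend": 1500.0}
print(one_hot(example))
```

CustomerSegment (Low, Middle, High) is ordered, so an ordinal encoding (0, 1, 2) is a reasonable alternative for that one column.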
‘🌡 Weather Check’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘🌡 Weather Check’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-weather-check-aa67/dfd26413/?iid=001-955&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘🌡 Weather Check’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/weather-checke on 28 January 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

See Readme for more details.
This repository contains a selection of the data -- and the data-processing scripts -- behind the articles, graphics and interactives at FiveThirtyEight.

We hope you'll use it to check our work and to create stories and visualizations of your own. The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know (andrei.scheinkman@fivethirtyeight.com).

Source: https://github.com/fivethirtyeight/data

This dataset was created by FiveThirtyEight and contains around 900 samples with features such as "What Is Your Gender?", "Age", and technical information, along with others such as:
- How Much Total Combined Money Did All Members Of Your Household Earn Last Year?
- If You Had A Smartwatch (like The Soon To Be Released Apple Watch), How Likely Or Unlikely Would You Be To Check The Weather On That Device?
- and more.

How to use this dataset

Analyze "How Do You Typically Check The Weather?" in relation to "Do You Typically Check A Daily Weather Report?"

Study the influence of "Us Region" on "A Specific Website Or App (please Provide The Answer)"

Acknowledgements

If you use this dataset in your research, please credit FiveThirtyEight

--- Original source retains full ownership of the source dataset ---
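The first suggested analysis ("How Do You Typically Check The Weather?" versus "Do You Typically Check A Daily Weather Report?") amounts to a contingency table of two survey questions. A stdlib sketch, assuming each response is a dict keyed by the question text as quoted above; the answers shown are invented, and the real CSV headers or answer options may be spelled differently.

```python
from collections import Counter

HOW = "How Do You Typically Check The Weather?"
DAILY = "Do You Typically Check A Daily Weather Report?"


def crosstab(rows: list[dict], row_key: str, col_key: str) -> Counter:
    """Count how often each (row_key answer, col_key answer) pair occurs."""
    return Counter((r[row_key], r[col_key]) for r in rows)


def share_yes(table: Counter, how: str) -> float:
    """Fraction of respondents using method `how` who answered "Yes"."""
    total = sum(n for (h, _), n in table.items() if h == how)
    return table[(how, "Yes")] / total


# Invented responses; the real answer options may be worded differently.
responses = [
    {HOW: "Phone app", DAILY: "Yes"},
    {HOW: "Local TV", DAILY: "Yes"},
    {HOW: "Phone app", DAILY: "No"},
    {HOW: "Phone app", DAILY: "Yes"},
]
table = crosstab(responses, HOW, DAILY)
for (how, daily), n in sorted(table.items()):
    print(f"{how:<10} {daily:<4} {n}")
print(round(share_yes(table, "Phone app"), 3))  # 0.667
```

Comparing the "Yes" share across methods is one way to see whether, for example, app users check daily reports more often than TV viewers.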
‘Air Passengers Forecast Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Air Passengers Forecast Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-air-passengers-forecast-dataset-dbf6/latest
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Air Passengers Forecast Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yasserh/air-passengers-forecast-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Description:

The "spam" concept is diverse: advertisements for products/websites, make money fast schemes, chain letters, pornography...

Our collection of spam e-mails came from our postmaster and individuals who had filed spam. Our collection of non-spam e-mails came from filed work and personal e-mails, and hence the word 'George' and the area code '650' are indicators of non-spam. These are useful when constructing a personalized spam filter. One would either have to blind such non-spam indicators or get a very wide collection of non-spam to generate a general-purpose spam filter.

The dataset, taken from the UCI ML repository, contains about 4600 emails labelled as spam or ham.

Acknowledgements:

This dataset was sourced from Kaggle.

Objective:

Understand the Dataset & cleanup (if required).

Build classification models to predict whether or not the email is spam.

Also fine-tune the hyperparameters & compare the evaluation metrics of various classification algorithms.

--- Original source retains full ownership of the source dataset ---
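For the objective above of comparing evaluation metrics across spam classifiers, accuracy alone can mislead on an imbalanced spam/ham split. A minimal sketch computing accuracy, precision, recall and F1 from predicted labels; it assumes string labels "spam" and "ham" (the file may encode them differently), and the predictions are made up.

```python
def classification_metrics(y_true: list[str], y_pred: list[str], positive: str = "spam") -> dict:
    """Accuracy, precision, recall and F1, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels and predictions from some classifier.
y_true = ["spam", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "spam", "ham", "ham", "spam"]
print(classification_metrics(y_true, y_pred))
```

For a spam filter, precision deserves particular attention: a false positive sends a legitimate email to the junk folder.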
‘Spam Emails Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Spam Emails Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-spam-emails-dataset-6e4f/a414900c/?iid=003-246&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Spam Emails Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yasserh/spamemailsdataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Description:

The "spam" concept is diverse: advertisements for products/web sites, make money fast schemes, chain letters, pornography...

Our collection of spam e-mails came from our postmaster and individuals who had filed spam. Our collection of non-spam e-mails came from filed work and personal e-mails, and hence the word 'george' and the area code '650' are indicators of non-spam. These are useful when constructing a personalized spam filter. One would either have to blind such non-spam indicators or get a very wide collection of non-spam to generate a general purpose spam filter.

The dataset, taken from the UCI ML repository, contains about 4600 emails labelled as spam or ham.

The dataset can be downloaded here: https://archive.ics.uci.edu/ml/datasets/spambase

Objective:

Understand the Dataset & cleanup (if required).

Build classification models to predict whether or not an email is spam.

Compare the evaluation metrics of various classification algorithms.

--- Original source retains full ownership of the source dataset ---
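Blinding the non-spam indicators described above ('george' and the area code '650') can be done by masking those tokens before feature extraction. A sketch for raw email text, assuming whole-word, case-insensitive matching is what you want; the `<MASK>` placeholder is an arbitrary choice. If you work with the UCI Spambase feature table instead of raw text, the equivalent step is dropping the `word_freq_george` and `word_freq_650` columns.

```python
import re

# The description names the word "george" and the area code "650" as
# indicators of non-spam that reflect who donated the emails.
LEAKY_TOKENS = ("george", "650")


def blind_indicators(text: str, tokens: tuple[str, ...] = LEAKY_TOKENS, mask: str = "<MASK>") -> str:
    """Replace whole-word, case-insensitive occurrences of each token with `mask`."""
    for tok in tokens:
        text = re.sub(rf"\b{re.escape(tok)}\b", mask, text, flags=re.IGNORECASE)
    return text


print(blind_indicators("Call George at 650-555-0100 about the 6500 units"))
# Call <MASK> at <MASK>-555-0100 about the 6500 units
```

The word-boundary anchors keep "6500" intact while still catching "650" at the start of a phone number.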
‘NFT History Sales’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘NFT History Sales’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-nft-history-sales-d36d/latest
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘NFT History Sales’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mathurinache/nft-history-sales on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Make money using NFT + AI | How to get started?

Introduction

By now, you have probably heard of NFTs. If not, you have likely heard about the artwork that sold for $69 million, or at least about record sales involving the NBA and NFTs. This got me curious, so I recently started researching the topic with my data scientist hat on. Data scientists follow the numbers! In this article, I will share what NFTs are and how to get started.

--- Original source retains full ownership of the source dataset ---
‘FAANG- Complete Stock Data’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 30, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘FAANG- Complete Stock Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-faang-complete-stock-data-36c1/9110ef3b/?iid=011-763&v=presentation
Dataset updated
Sep 30, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘FAANG- Complete Stock Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/aayushmishra1512/faang-complete-stock-data on 30 September 2021.

--- Dataset description provided by original source is as follows ---

Context

There are a few companies that are considered revolutionary. These companies also happen to be a dream workplace for many people across the world. They include Facebook, Amazon, Apple, Netflix and Google, also known as FAANG! These companies make a ton of money, and they give others a chance to share in it by investing in their stocks and shares. This data was compiled to track these companies' stock prices.

Content

The data contains information such as the opening price of a stock, the closing price, how many shares were traded, and more. There are 5 CSV files in the data, one for each company.

--- Original source retains full ownership of the source dataset ---
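With opening and closing prices available, two common derived quantities are the day-over-day return and the open-to-close change. A minimal sketch on made-up prices; the description does not list the CSV column names, so mapping them onto these function arguments is left as an assumption.

```python
def daily_returns(closes: list[float]) -> list[float]:
    """Simple returns between consecutive closes: (close_t - close_{t-1}) / close_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(closes, closes[1:])]


def open_to_close(open_price: float, close_price: float) -> float:
    """Fractional change from a day's opening price to its closing price."""
    return (close_price - open_price) / open_price


closes = [100.0, 102.0, 99.96]  # made-up closing prices, oldest first
print([round(r, 4) for r in daily_returns(closes)])  # [0.02, -0.02]
print(open_to_close(50.0, 51.0))  # 0.02
```

Returns, unlike raw prices, are comparable across the five companies regardless of their share price levels.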
Online Sales Data Power BI Dashboard
kaggle.com
Updated Aug 20, 2024
Cite
manjeshkumar05 (2024). Online Sales Data Power BI Dashboard [Dataset]. https://www.kaggle.com/datasets/manjeshkumar05/online-sales-data-power-bi-dashboard
Dataset updated
Aug 20, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
manjeshkumar05
License
CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
Description
Exploring Online Sales Data with Power BI!

Another productive day diving into an online sales dataset! Here's a roundup of the insights I uncovered today:

Revenue by Category: Analyzed revenue distribution across different product categories to identify high-performing sectors.

Revenue by Sub-Category: Drilled down into sub-categories for a more granular view of revenue streams.

Revenue by Payment Mode: Examined revenue patterns based on payment methods to understand customer preferences.

Revenue by State: Mapped out revenue by state to pinpoint geographical strengths and opportunities.

Profit by Category: Evaluated profitability across product categories to assess which categories yield the highest profit margins.

Profit by Sub-Category: Explored profit levels at a sub-category level to identify the most profitable segments.

Profit by Payment Mode: Analyzed profit distribution across different payment methods.

Top 5 States by Revenue and Profit: Highlighted the top 5 states driving the most revenue and profit, offering insights into regional performance.

Sales Map by State: Visualized sales data on a map to provide a geographical perspective on sales distribution.

Total Quantity, Revenue, and Profit: Aggregated data to give an overview of total quantities sold, overall revenue, and total profit.

Filter by Category: Added a filter functionality to focus on specific categories and refine data analysis.
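Most of the visuals listed above are a group-by sum followed by a sort. A plain-Python sketch of "Revenue by State", "Profit by Category" and "Top 5 States by Revenue"; the field names `category`, `state`, `revenue` and `profit` and all values are hypothetical stand-ins, since the dataset's actual columns are not listed.

```python
from collections import defaultdict

# Hypothetical order lines; field names and values are stand-ins, not the dataset's.
orders = [
    {"category": "Electronics", "state": "State A", "revenue": 1200.0, "profit": 150.0},
    {"category": "Clothing", "state": "State B", "revenue": 300.0, "profit": 60.0},
    {"category": "Electronics", "state": "State B", "revenue": 800.0, "profit": -40.0},
    {"category": "Furniture", "state": "State C", "revenue": 500.0, "profit": 20.0},
]


def total_by(rows: list[dict], key: str, value: str) -> dict[str, float]:
    """Sum `value` for each distinct `key` (e.g. revenue by state)."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def top_n(totals: dict[str, float], n: int = 5) -> list[str]:
    """Keys with the largest totals, highest first."""
    return sorted(totals, key=totals.get, reverse=True)[:n]


revenue_by_state = total_by(orders, "state", "revenue")
print(top_n(revenue_by_state))  # ['State A', 'State B', 'State C']
print(total_by(orders, "category", "profit"))  # {'Electronics': 110.0, 'Clothing': 60.0, 'Furniture': 20.0}
```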
‘Student Food Survey’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 30, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Student Food Survey’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-student-food-survey-fc60/f814e5a0/?iid=041-303&v=presentation
Dataset updated
Sep 30, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Student Food Survey’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mlomuscio/student-food-survey on 30 September 2021.

--- Dataset description provided by original source is as follows ---

Purpose

This data came from a survey of students. The purpose of the survey was to identify attitudes and habits regarding food consumption at school and outside of school.

Description of Variables

Gender: What gender do you identify as?

Boarding: Day or Boarding?

Grade: What grade are you in?

Athlete: Do you consider yourself an athlete?

Activities: Are you participating in any of the following activities this season?

DHBreakfast: How many days a week do you eat breakfast in the dining hall (including brunch on Saturday and Sunday)?

NDHBreakfast: How many days a week do you eat breakfast but not in the dining hall?

BClass: Would you be more likely to eat breakfast if you could eat it in class?

DHBoxes: On average, how many boxes of food do you eat per meal at the dining hall?

NDHBoxes: How many boxes do you take from the dining hall to eat later?

NDHMeals: How many meals a week do you eat in the dorm or student center?

Nutrition: On a scale of 0 to 5, how aware are you of the nutritional values of the food you eat?

Money: On average, how much money do you spend on food outside of the dining hall per week?

--- Original source retains full ownership of the source dataset ---
Superstore Dataset
kaggle.com
Updated Feb 22, 2022
Cite
Vivek Chowdhury (2022). Superstore Dataset [Dataset]. https://www.kaggle.com/datasets/vivek468/superstore-dataset-final/discussion
Dataset updated
Feb 22, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Vivek Chowdhury
Description
Context

With growing demand and cut-throat competition in the market, a superstore giant is seeking your knowledge to understand what works best for them. They would like to understand which products, regions, categories and customer segments they should target or avoid.

You can even take this a step further and try to build a regression model to predict Sales or Profit.

Go crazy with the dataset, but also make sure to provide some business insights on how to improve.

Metadata

Row ID => Unique ID for each row.
Order ID => Unique Order ID for each Customer.
Order Date => Order Date of the product.
Ship Date => Shipping Date of the Product.
Ship Mode => Shipping Mode specified by the Customer.
Customer ID => Unique ID to identify each Customer.
Customer Name => Name of the Customer.
Segment => The segment where the Customer belongs.
Country => Country of residence of the Customer.
City => City of residence of the Customer.
State => State of residence of the Customer.
Postal Code => Postal Code of every Customer.
Region => Region where the Customer belongs.
Product ID => Unique ID of the Product.
Category => Category of the product ordered.
Sub-Category => Sub-Category of the product ordered.
Product Name => Name of the Product.
Sales => Sales of the Product.
Quantity => Quantity of the Product.
Discount => Discount provided.
Profit => Profit/Loss incurred.
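Two derived features that help with the "which regions and segments to target" question are shipping time (Ship Date minus Order Date) and profit margin (Profit divided by Sales). A sketch assuming dates are stored as month/day/year strings; check the actual format in the file before relying on it. The example dates and amounts are made up.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumption: US-style month/day/year strings


def days_to_ship(order_date: str, ship_date: str, fmt: str = DATE_FORMAT) -> int:
    """Whole days between Order Date and Ship Date."""
    return (datetime.strptime(ship_date, fmt) - datetime.strptime(order_date, fmt)).days


def profit_margin(sales: float, profit: float) -> float:
    """Profit as a fraction of Sales; negative values mean the order lost money."""
    return profit / sales


print(days_to_ship("11/8/2016", "11/11/2016"))  # 3
print(profit_margin(200.0, -50.0))  # -0.25
```

Averaging these two values per Region, Segment or Category gives a quick first view of where the business is slow or unprofitable.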

Acknowledgements

I do not own this data. I found it on the Tableau website. All credit goes to the original authors/creators. For educational purposes only.
IBM AMLSim Example Dataset
kaggle.com
Updated Jul 20, 2021
Cite
Ansh Ankul (2021). IBM AMLSim Example Dataset [Dataset]. https://www.kaggle.com/anshankul/ibm-amlsim-example-dataset/code
Dataset updated
Jul 20, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ansh Ankul
Description
IBM AMLSim

IBM AMLSim: The AMLSim project is intended to provide a multi-agent based simulator that generates synthetic banking transaction data together with a set of known money laundering patterns - mainly for the purpose of testing machine learning models and graph algorithms.

This dataset is an example dataset generated from IBM AMLSim.

Content

There are 3 datasets included here: alerts, transactions and accounts.

Accounts dataset: Contains the information about all the bank accounts whose transactions are monitored.

Alerts dataset: Contains the transactions which triggered an alert according to AML guidelines.

Transactions dataset: Contains the list of all the transactions with information about sender and receiver accounts.
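Since the transactions file records sender and receiver accounts, it can be read as a directed graph. A minimal sketch of one graph heuristic, flagging fan-in accounts that receive money from many distinct senders; the threshold and the (sender, receiver) tuples are hypothetical, and this is an illustration rather than how AMLSim generates or labels its alerts.

```python
from collections import defaultdict


def fan_in_accounts(transactions: list[tuple[str, str]], min_senders: int = 3) -> list[str]:
    """Receiver accounts that got money from at least `min_senders` distinct senders."""
    senders: dict[str, set[str]] = defaultdict(set)
    for sender, receiver in transactions:
        senders[receiver].add(sender)
    return sorted(acct for acct, s in senders.items() if len(s) >= min_senders)


# Made-up (sender, receiver) pairs; repeated transfers from A1 to X count once.
txns = [("A1", "X"), ("A2", "X"), ("A3", "X"), ("A1", "Y"), ("A1", "X")]
print(fan_in_accounts(txns))  # ['X']
```

Comparing flagged accounts with those in the alerts file is one way to check how much a single simple rule recovers.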

Acknowledgements

Do check out the AML Sim project and generate your own datasets for AML purposes. Link: https://github.com/IBM/AMLSim

License

IBM/AMLSim is licensed under the Apache License 2.0, a permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code. Link: https://github.com/IBM/AMLSim/blob/master/LICENSE
‘Unsupervised Learning on Country Data’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 21, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Unsupervised Learning on Country Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-unsupervised-learning-on-country-data-cd76/5b0fd9e4/?iid=001-487&v=presentation
Dataset updated
Nov 21, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Unsupervised Learning on Country Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rohan0301/unsupervised-learning-on-country-data on 21 November 2021.

--- Dataset description provided by original source is as follows ---

Clustering the Countries by using Unsupervised Learning for HELP International

Objective:

To categorise the countries using socio-economic and health factors that determine the overall development of the country.

About organization:

HELP International is an international humanitarian NGO that is committed to fighting poverty and providing the people of backward countries with basic amenities and relief during the time of disasters and natural calamities.

Problem Statement:

HELP International has raised around $10 million. Now the CEO of the NGO needs to decide how to use this money strategically and effectively, which means choosing the countries that are in the direst need of aid. Hence, your job as a data scientist is to categorise the countries using socio-economic and health factors that determine the overall development of each country, and then to suggest the countries the CEO should focus on most.

--- Original source retains full ownership of the source dataset ---
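A minimal k-means (Lloyd's algorithm) sketch for this clustering task, in plain Python. It standardizes each feature first so that, for example, income in currency units does not swamp child mortality; the three features and four rows are made-up stand-ins for the socio-economic and health factors, and the first-k initialization is a simplification (k-means++ seeding is the usual choice in practice).

```python
import math
from statistics import mean, pstdev


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each column so features on different scales weigh equally."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def nearest(point: list[float], centroids: list[list[float]]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points: list[list[float]], k: int, iters: int = 100):
    """Lloyd's algorithm; the first k points are used as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels: list[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append([mean(c) for c in zip(*members)] if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Made-up rows: [child mortality per 1000, income, life expectancy].
countries = ["P", "Q", "R", "S"]
raw = [[90, 1500, 55], [110, 1200, 52], [5, 40000, 81], [4, 45000, 82]]
labels, _ = kmeans(standardize(raw), k=2)
print(dict(zip(countries, labels)))  # P and Q share one label, R and S the other
```

The cluster whose centroid has high child mortality and low income would be the natural shortlist for the CEO.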
Retail Transactions Dataset
kaggle.com
Updated May 18, 2024
Cite
Prasad Patil (2024). Retail Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/retail-transactions-dataset
Dataset updated
May 18, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Prasad Patil
License
CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:

Context:

Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.

Inspiration:

The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.

Dataset Information:

The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:

Transaction_ID: A unique identifier for each transaction, represented as a 10-digit number. This column is used to uniquely identify each purchase.

Date: The date and time when the transaction occurred. It records the timestamp of each purchase.

Customer_Name: The name of the customer who made the purchase. It provides information about the customer's identity.

Product: A list of products purchased in the transaction. It includes the names of the products bought.

Total_Items: The total number of items purchased in the transaction. It represents the quantity of products bought.

Total_Cost: The total cost of the purchase, in currency. It represents the financial value of the transaction.

Payment_Method: The method used for payment in the transaction, such as credit card, debit card, cash, or mobile payment.

City: The city where the purchase took place. It indicates the location of the transaction.

Store_Type: The type of store where the purchase was made, such as a supermarket, convenience store, department store, etc.

Discount_Applied: A binary indicator (True/False) representing whether a discount was applied to the transaction.

Customer_Category: A category representing the customer's background or age group.

Season: The season in which the purchase occurred, such as spring, summer, fall, or winter.

Promotion: The type of promotion applied to the transaction, such as "None," "BOGO (Buy One Get One)," or "Discount on Selected Items."

Use Cases:

Market Basket Analysis: Discover associations between products and uncover buying patterns.

Customer Segmentation: Group customers based on purchasing behavior.

Pricing Optimization: Optimize pricing strategies and identify opportunities for discounts and promotions.

Retail Analytics: Analyze store performance and customer trends.

Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
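Market basket analysis typically starts from the support and confidence of item pairs. A sketch that computes both for every pair of products, assuming the Product column stores a Python-style list literal such as "['Milk', 'Bread']" (verify this against the file); it is a simplified pair-only version of association rule mining, not a full Apriori implementation, and the baskets are made up.

```python
import ast
from collections import Counter
from itertools import combinations


def parse_products(cell: str) -> list[str]:
    """Parse a Product cell assumed to hold a list literal, e.g. "['Milk', 'Bread']"."""
    return ast.literal_eval(cell)


def pair_rules(baskets: list[list[str]], min_support: float = 0.2):
    """Return (A, B, support, confidence) for every item pair with enough support.

    support = share of baskets containing both A and B
    confidence = share of baskets containing A that also contain B
    """
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(set(b)), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


cells = ["['Milk', 'Bread']", "['Milk', 'Bread', 'Eggs']", "['Milk', 'Eggs']", "['Bread']"]
baskets = [parse_products(c) for c in cells]
for a, b, s, c in pair_rules(baskets, min_support=0.5):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

`ast.literal_eval` only evaluates literals, so it is a safe way to parse list-shaped strings compared with `eval`.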
‘Loan Data’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 4, 2020
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Loan Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-loan-data-faf6/0734d774/?iid=005-759&v=presentation
Dataset updated
Aug 4, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Loan Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/itssuru/loan-data on 28 January 2022.

--- Dataset description provided by original source is as follows ---

About the data and what to do…

Publicly available data from LendingClub.com. Lending Club connects people who need money (borrowers) with people who have money (investors). As an investor, you would want to invest in people whose profile shows a high probability of paying you back.

We will use lending data from 2007-2010 and try to classify and predict whether or not the borrower paid back their loan in full. You can download the data from here.

Here is what the columns represent:

credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.
purpose: The purpose of the loan (takes values "credit_card", "debt_consolidation", "educational", "major_purchase", "small_business", and "all_other").
int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.
installment: The monthly installments owed by the borrower if the loan is funded.
log.annual.inc: The natural log of the self-reported annual income of the borrower.
dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).
fico: The FICO credit score of the borrower.
days.with.cr.line: The number of days the borrower has had a credit line.
revol.bal: The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).
revol.util: The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available).
inq.last.6mths: The borrower's number of inquiries by creditors in the last 6 months.
delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
pub.rec: The borrower's number of derogatory public records (bankruptcy filings, tax liens, or judgments).

--- Original source retains full ownership of the source dataset ---
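Because `int.rate` is stored as a proportion and `log.annual.inc` as a natural log, a couple of conversions recover interpretable values. The sketch below also derives a payment-to-income ratio (monthly `installment` over monthly income); that ratio is an illustrative feature of our own, not one of the dataset's columns, and the example values are made up.

```python
import math


def annual_income(log_annual_inc: float) -> float:
    """Undo the natural log stored in log.annual.inc."""
    return math.exp(log_annual_inc)


def rate_percent(int_rate: float) -> float:
    """int.rate is a proportion (0.11 means 11%); convert it to percent."""
    return int_rate * 100


def payment_to_income(installment: float, log_annual_inc: float) -> float:
    """Illustrative feature: monthly installment divided by monthly income."""
    return installment / (annual_income(log_annual_inc) / 12)


# Made-up borrower values, not a row from the dataset.
row = {"int.rate": 0.11, "installment": 500.0, "log.annual.inc": math.log(60000.0)}
print(round(annual_income(row["log.annual.inc"])))  # 60000
print(round(rate_percent(row["int.rate"]), 2))  # 11.0
print(round(payment_to_income(row["installment"], row["log.annual.inc"]), 3))  # 0.1
```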
Spam Dataset
kaggle.com
Updated Aug 16, 2023
Cite
Arman Aghania (2023). Spam Dataset [Dataset]. https://www.kaggle.com/datasets/armanaghania/spam-dataset/versions/1
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 16, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Arman Aghania
Description
Spam E-mail Data

Description

The data consist of 4601 email items, of which 1813 items were identified as spam.

Usage

spam7

Format

This data frame contains the following columns:
* crl.tot: total length of words in capitals
* dollar: number of occurrences of the $ symbol
* bang: number of occurrences of the ! symbol
* money: number of occurrences of the word ‘money’
* n000: number of occurrences of the string ‘000’
* make: number of occurrences of the word ‘make’
* yesno: outcome variable, a factor with levels n (not spam) and y (spam)
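To make the predictors above concrete, here is a minimal sketch that computes spam7-style features from a raw email body. It follows the column descriptions literally, as raw counts; the values stored in the published data frame may be scaled differently, so treat this as an illustration rather than a reproduction of the dataset. "Total length of words in capitals" is interpreted here as the summed length of uninterrupted runs of capital letters, which is an assumption, and spam7_features is a hypothetical name.

```python
import re


def spam7_features(text: str) -> dict:
    """Count the six spam7-style predictors in an email body.

    Hypothetical helper; counts are raw occurrences, not the scaled
    values that may appear in the published data.
    """
    # crl.tot: summed length of runs of capital letters (an interpretation)
    capital_runs = re.findall(r"[A-Z]+", text)
    lowered = text.lower()
    return {
        "crl.tot": sum(len(run) for run in capital_runs),
        "dollar": text.count("$"),
        "bang": text.count("!"),
        # whole-word, case-insensitive matches for 'money' and 'make'
        "money": len(re.findall(r"\bmoney\b", lowered)),
        # non-overlapping occurrences of the string '000'
        "n000": text.count("000"),
        "make": len(re.findall(r"\bmake\b", lowered)),
    }


feats = spam7_features("MAKE MONEY FAST!!! Win $1000 now, make money!")
print(feats)
```

Features like these would then feed a classifier that predicts the yesno outcome.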
🦈 Shark Tank India dataset 🇮🇳
kaggle.com
Updated Apr 20, 2025
Cite
Satya Thirumani (2025). 🦈 Shark Tank India dataset 🇮🇳 [Dataset]. https://www.kaggle.com/datasets/thirumani/shark-tank-india
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 20, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Satya Thirumani
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Shark Tank India Data set.

Shark Tank India: information on Seasons 1 to 4, with 80 fields/columns and 630+ records.

All seasons/episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcast on SonyLiv OTT/Sony TV.

Here is the data dictionary for the Shark Tank India dataset.

Season Number - Season number

Startup Name - Company name or product name

Episode Number - Episode number within the season

Pitch Number - Overall pitch number

Season Start - Season first aired date

Season End - Season last aired date

Original Air Date - Episode original/first aired date, on OTT/TV

Episode Title - Episode title in SonyLiv

Anchor - Name of the episode presenter/host

Industry - Industry name or type

Business Description - Business Description

Company Website - Company Website URL

Started in - Year in which startup was started/incorporated

Number of Presenters - Number of presenters

Male Presenters - Number of male presenters

Female Presenters - Number of female presenters

Transgender Presenters - Number of transgender/LGBTQ presenters

Couple Presenters - Are the presenters a married couple? 1-yes, 0-no

Pitchers Average Age - Average age of all pitchers: <30 young, 30-50 middle, >50 old

Pitchers City - Pitcher's town/city, or the place where the company head office is located

Pitchers State - Indian state the pitcher hails from, or the state where the company head office is located

Yearly Revenue - Yearly revenue, in lakhs INR, -1 means negative revenue, 0 means pre-revenue

Monthly Sales - Total monthly sales, in lakhs INR

Gross Margin - Gross margin/profit of company, in percentages

Net Margin - Net margin/profit of company, in percentages

EBITDA - Earnings Before Interest, Taxes, Depreciation, and Amortization

Cash Burn - Whether the company is making a loss in the current year and burning cash, i.e. paying out of its own pocket (yes/no)

SKUs - Stock Keeping Units or number of varieties, at the time of pitch

Has Patents - Pitcher has Patents/Intellectual property (filed/granted), at the time of pitch

Bootstrapped - Startup is bootstrapped or not (yes/no)

Part of Match off - Competition between two similar brands pitched at the same time

Original Ask Amount - Original Ask Amount, in lakhs INR

Original Offered Equity - Original Offered Equity, in percentages

Valuation Requested - Valuation Requested, in lakhs INR

Received Offer - Received offer or not, 1-received, 0-not received

Accepted Offer - Accepted offer or not, 1-accepted, 0-rejected

Total Deal Amount - Total Deal Amount, in lakhs INR

Total Deal Equity - Total Deal Equity, in percentages

Total Deal Debt - Total Deal debt/loan amount, in lakhs INR

Debt Interest - Debt interest rate, in percentages

Deal Valuation - Deal Valuation, in lakhs INR

Number of sharks in deal - Number of sharks involved in deal

Deal has conditions - Deal has conditions or not? (yes or no)

Royalty Percentage - Royalty percentage, if it's a royalty deal

Royalty Recouped Amount - Royalty recouped amount, if it's a royalty deal, in lakhs

Advisory Shares Equity - Deal with Advisory shares or equity, in percentages

Namita Investment Amount - Namita Investment Amount, in lakhs INR

Namita Investment Equity - Namita Investment Equity, in percentages

Namita Debt Amount - Namita Debt Amount, in lakhs INR

Vineeta Investment Amount - Vineeta Investment Amount, in lakhs INR

Vineeta Investment Equity - Vineeta Investment Equity, in percentages

Vineeta Debt Amount - Vineeta Debt Amount, in lakhs INR

Anupam Investment Amount - Anupam Investment Amount, in lakhs INR

Anupam Investment Equity - Anupam Investment Equity, in percentages

Anupam Debt Amount - Anupam Debt Amount, in lakhs INR

Aman Investment Amount - Aman Investment Amount, in lakhs INR

Aman Investment Equity - Aman Investment Equity, in percentages

Aman Debt Amount - Aman Debt Amount, in lakhs INR

Peyush Investment Amount - Peyush Investment Amount, in lakhs INR

Peyush Investment Equity - Peyush Investment Equity, in percentages

Peyush Debt Amount - Peyush Debt Amount, in lakhs INR

Ritesh Investment Amount - Ritesh Investment Amount, in lakhs INR

Ritesh Investment Equity - Ritesh Investment Equity, in percentages

Ritesh Debt Amount - Ritesh Debt Amount, in lakhs INR

Amit Investment Amount - Amit Investment Amount, in lakhs INR

Amit Investment Equity - Amit Investment Equity, in percentages

Amit Debt Amount - Amit Debt Amount, in lakhs INR

Guest Investment Amount - Guest Investment Amount, in lakhs INR

Guest Investment Equity - Guest Investment Equity, in percentages

Guest Debt Amount - Guest Debt Amount, in lakhs INR

Invested Guest Name - Name of the guest(s) who invested in the deal

All Guest Names - Names of all guests present in the episode

Namita Present - Whether Namita was present in the episode or not

Vineeta Present - Whether Vineeta was present in the episode or not

Anupam ...
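Several fields in the data dictionary above are linked by simple arithmetic. Amounts are in lakhs (1 lakh = 100,000 rupees, and 100 lakhs = 1 crore), the valuation implied by an ask is the ask amount divided by the fraction of equity offered, Yearly Revenue uses -1 and 0 as sentinel codes, and Pitchers Average Age is described with young/middle/old bands. The sketch below assumes Valuation Requested is derived from Original Ask Amount and Original Offered Equity in this way (the dataset does not confirm it), treats ages of exactly 30 and 50 as "middle" (also an assumption), and uses hypothetical function names.

```python
def implied_valuation_lakhs(ask_lakhs: float, equity_pct: float) -> float:
    """Valuation implied by asking `ask_lakhs` for `equity_pct` percent equity."""
    if equity_pct <= 0:
        raise ValueError("equity_pct must be positive")
    # ask / (equity / 100), written as ask * 100 / equity
    return ask_lakhs * 100 / equity_pct


def lakhs_to_inr(lakhs: float) -> float:
    """Convert lakhs to rupees (1 lakh = 100,000 INR)."""
    return lakhs * 100_000


def describe_revenue(yearly_revenue_lakhs: float) -> str:
    """Decode the documented sentinel values of Yearly Revenue."""
    if yearly_revenue_lakhs == -1:
        return "negative revenue"
    if yearly_revenue_lakhs == 0:
        return "pre-revenue"
    return f"{yearly_revenue_lakhs} lakhs INR"


def age_bucket(average_age: float) -> str:
    """Map a numeric average age onto young/middle/old (30 and 50 count as middle)."""
    if average_age < 30:
        return "young"
    if average_age <= 50:
        return "middle"
    return "old"


valuation = implied_valuation_lakhs(50, 5)  # asking 50 lakhs for 5% equity
print(valuation, lakhs_to_inr(valuation))  # 1000 lakhs, i.e. 10 crore rupees
```

Comparing this implied value with Deal Valuation is one way to see how far the sharks negotiated a valuation down.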
Google Analytics Sample
kaggle.com
Updated Sep 19, 2019
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample
Dataset updated
Sep 19, 2019
Dataset provided by
BigQuery (https://cloud.google.com/bigquery)
Google (http://google.com/)
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.

Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.

Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
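Two of the questions above reduce to simple group-by aggregations: the real bounce rate per traffic source (percentage of visits with a single pageview, as defined above) and total transactions per browser. The sketch below works on session records held in memory; the keys source, browser, pageviews and transactions are assumed stand-ins for the nested export fields trafficSource.source, device.browser, totals.pageviews and totals.transactions, and the sample sessions are made up for illustration. In the export, sessions without a transaction typically have a null transactions value, modeled here as None.

```python
from collections import defaultdict


def bounce_rate_by_source(sessions):
    """Percentage of sessions with exactly one pageview, per traffic source."""
    visits = defaultdict(int)
    bounces = defaultdict(int)
    for s in sessions:
        visits[s["source"]] += 1
        if s["pageviews"] == 1:
            bounces[s["source"]] += 1
    return {src: 100.0 * bounces[src] / visits[src] for src in visits}


def transactions_by_browser(sessions):
    """Total transactions per browser; None (no transaction) counts as 0."""
    totals = defaultdict(int)
    for s in sessions:
        totals[s["browser"]] += s["transactions"] or 0
    return dict(totals)


# Made-up sessions, flattened from the nested export structure
sessions = [
    {"source": "google", "browser": "Chrome", "pageviews": 1, "transactions": None},
    {"source": "google", "browser": "Chrome", "pageviews": 7, "transactions": 1},
    {"source": "(direct)", "browser": "Safari", "pageviews": 1, "transactions": None},
    {"source": "youtube.com", "browser": "Safari", "pageviews": 4, "transactions": 2},
]
print(bounce_rate_by_source(sessions))
print(transactions_by_browser(sessions))
```

Against the real tables, the same aggregations would be written as GROUP BY queries over the July 2017 daily tables.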
X company Data analysis Project
kaggle.com
Updated Sep 6, 2023
Cite
Ahmed Samir (2023). X company Data analysis Project [Dataset]. https://www.kaggle.com/datasets/ahmedsamir11111/x-company-data-analysis-project/versions/1
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 6, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ahmed Samir
Description
About Dataset

The dataset contains information about sales transactions, including details such as the customer's age, gender, and location, and the products sold. It includes both the cost of each product and the revenue generated from its sale, which allows profit and profit margins to be calculated. The customer age and gender fields could be used to analyze purchasing behavior across demographic groups. The dataset includes both numeric and categorical data, which call for different types of analysis and visualization techniques. Overall, it provides a comprehensive view of sales transactions, with potential for analysis at multiple levels, including by product, customer, and location. On its own, however, the raw data does not offer useful insights for decision makers.

- After understanding the dataset, I cleaned it and added some columns and calculations, such as Net profit and Age Status.
- I built a model in Power Pivot, calculated measures such as Total profit, COGS, and Total revenues, and built a KPI model.
- Then I asked the following questions:

About distribution
- What are the total revenues and profits?
- Which country sells best in terms of revenue?
- What are the five best-selling states in terms of revenue?
- What are the five lowest-selling states in terms of revenue?
- How does age relate to revenue?

About profitability
- What are the total revenues and profits?
- How do revenues and profits vary by month?
- How does COGS vary by month?
- Which categories sell best in terms of revenue and profit?
- What are the three best-selling sub-categories in terms of profit?

About KPIs
- How does each salesperson perform against their target?

Then I answered these questions, analyzed the data, and visualized the results in dashboards.
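The measures described above (net profit, profit margin, and performance against a sales target) can be sketched in a few lines. This is a minimal illustration, not the author's Power Pivot model: it assumes net profit is simply revenue minus cost (the author's column may subtract other items), expresses target performance as actual revenue divided by target, and uses a hypothetical Sale record with made-up figures.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    revenue: float
    cost: float  # cost of goods sold (COGS) for this transaction


def net_profit(sales):
    """Total profit, assumed here to be revenue minus cost per sale."""
    return sum(s.revenue - s.cost for s in sales)


def profit_margin_pct(sales):
    """Profit as a percentage of total revenue (0 if there is no revenue)."""
    revenue = sum(s.revenue for s in sales)
    return 100.0 * net_profit(sales) / revenue if revenue else 0.0


def target_achievement_pct(actual_revenue, target):
    """How much of a salesperson's target was reached, in percent."""
    return 100.0 * actual_revenue / target


sales = [Sale(revenue=200, cost=150), Sale(revenue=300, cost=200)]
print(net_profit(sales), profit_margin_pct(sales), target_achievement_pct(500, 400))
```

Values above 100% from target_achievement_pct would mark salespeople who beat their target.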


