58 datasets found
  1. Gen Z Money Spending Dataset

    • kaggle.com
    Updated Jan 31, 2025
    Cite
    Anand Kumar (2025). Gen Z Money Spending Dataset [Dataset]. https://www.kaggle.com/datasets/manandkumar/gen-z-money-spending-dataset
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 31, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anand Kumar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides insights into the spending habits of Gen Z (ages 18-27) across various categories such as rent, groceries, entertainment, education, savings, and more. It contains 1700 records and 15 financial attributes, making it a valuable resource for financial trend analysis, budgeting studies, and machine learning applications in personal finance.

  2. Fake News Prediction Dataset

    • kaggle.com
    Updated Nov 3, 2023
    Cite
    Rajat Kumar (2023). Fake News Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/rajatkumar30/fake-news
    Dataset updated
    Nov 3, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rajat Kumar
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Fake news or hoax news is false or misleading information presented as news. Fake news often has the aim of damaging the reputation of a person or entity, or making money through advertising revenue.

    This dataset contains both fake and real news.

    The columns present in the dataset are:

    1) Title -> Title of the News

    2) Text -> Text or Content of the News

    3) Label -> Labelling the news as Fake or Real

  3. ‘Hotel Prices - Beginner Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Hotel Prices - Beginner Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-hotel-prices-beginner-dataset-6aca/74a157b1/?iid=000-816&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Hotel Prices - Beginner Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sveneschlbeck/hotel-prices-beginner-dataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset is aimed at data science students and beginners who want to dive into regression or clustering without needing to pre-clean the data first.

    Content

    This dataset consists of a pre-cleaned .csv table that has been translated from German to English.

    There are four columns in this dataset:

    • Profit (How much money does this hotel make in a year)
    • Price in Millions (€)
    • Square Meter (Hotel Area)
    • City

    Here, "Hotel Prices" does not refer to the cost of spending a night at those hotels but to the price of buying them. This would be an interesting chart for someone who wants to buy a hotel and needs to judge whether they are overpaying or getting a great deal compared with similar properties in comparable cities.
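
    The overpaying-versus-bargain judgment described above can be sketched as a comparison of price per square meter against comparable hotels. This is a minimal illustration, not the dataset author's method: the dictionary keys `price_m` and `sqm` and all figures are hypothetical stand-ins for the "Price in Millions (€)" and "Square Meter" columns.

```python
from statistics import median


def price_per_sqm(price_millions_eur, square_meters):
    """Convert a purchase price in millions of euros to euros per square meter."""
    return price_millions_eur * 1_000_000 / square_meters


def relative_to_peers(candidate, peers):
    """Ratio of the candidate's price per square meter to the peer median.

    A value above 1 means the candidate is pricier than the typical peer.
    """
    peer_median = median(price_per_sqm(p["price_m"], p["sqm"]) for p in peers)
    return price_per_sqm(candidate["price_m"], candidate["sqm"]) / peer_median


# Hypothetical figures for illustration only; these are not rows from the dataset.
peers = [
    {"price_m": 10.0, "sqm": 2000},
    {"price_m": 12.0, "sqm": 2000},
    {"price_m": 9.0, "sqm": 1500},
]
candidate = {"price_m": 14.0, "sqm": 2000}
ratio = relative_to_peers(candidate, peers)
print(f"Candidate costs {ratio:.2f}x the peer median per square meter")
```

    A fuller comparison would probably also take the Profit column into account, for example by relating purchase price to yearly profit.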

    --- Original source retains full ownership of the source dataset ---

  4. Realistic Sales Revenue Dataset

    • kaggle.com
    Updated Jul 16, 2025
    Cite
    Shoukat khan (2025). Realistic Sales Revenue Dataset [Dataset]. https://www.kaggle.com/datasets/drisrarahmad/realistic-sales-revenue-dataset
    Dataset updated
    Jul 16, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shoukat khan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📄 Description: This synthetic dataset is designed for practising regression tasks, particularly for predicting Sales Revenue based on product, market, and economic factors. It contains both categorical (nominal) and numerical features, simulating real-world sales data across various product categories and regions.

    📌 Dataset Summary: Rows: 2000

    Columns: 12 features + 1 target (SalesRevenue)

    🏷️ Columns Description (column name, type, description):

    • ProductCategory (Categorical): Type of product: Electronics, Clothing, Furniture, Toys
    • Region (Categorical): Sales region: North, South, East, West
    • CustomerSegment (Categorical): Customer income group: Low, Middle, High
    • IsPromotionApplied (Categorical): Whether promotion was applied: Yes/No
    • ProductionCost (Numerical): Cost to produce the product
    • MarketingSpend (Numerical): Money spent on marketing
    • SeasonalDemandIndex (Numerical): Factor representing seasonal demand
    • CompetitorPrice (Numerical): Average price of competing products
    • CustomerRating (Numerical): Average customer rating (out of 5)
    • EconomicIndex (Numerical): Indicator of overall economic conditions
    • StoreCount (Numerical): Number of stores selling the product
    • OnlinePresence (Numerical): Online presence score of the product
    • SalesRevenue (Numerical): Target variable: revenue from product sales
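
    Because the four categorical columns are nominal, a regression model typically needs them converted to numbers first. The sketch below shows one common option, one-hot encoding, using the category values listed in the description. The function name and the example row are hypothetical, and the exact spelling of values in the actual file is an assumption.

```python
# Category values as listed in the column description; the spelling used in
# the actual CSV file is an assumption.
CATEGORIES = {
    "ProductCategory": ["Electronics", "Clothing", "Furniture", "Toys"],
    "Region": ["North", "South", "East", "West"],
    "CustomerSegment": ["Low", "Middle", "High"],
    "IsPromotionApplied": ["Yes", "No"],
}


def one_hot_encode(row):
    """Return a flat list of 0/1 indicators for the categorical columns of one row."""
    encoded = []
    for column, values in CATEGORIES.items():
        if row[column] not in values:
            raise ValueError(f"unexpected value {row[column]!r} for {column}")
        encoded.extend(1 if row[column] == value else 0 for value in values)
    return encoded


# Hypothetical example row; not taken from the dataset.
example = {
    "ProductCategory": "Toys",
    "Region": "East",
    "CustomerSegment": "Middle",
    "IsPromotionApplied": "Yes",
}
encoded = one_hot_encode(example)
print(encoded)
```

    The resulting indicators can be concatenated with the eight numerical features before fitting a regression model on SalesRevenue.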

  5. ‘🌡 Weather Check’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘🌡 Weather Check’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-weather-check-aa67/dfd26413/?iid=001-955&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘🌡 Weather Check’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/weather-checke on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    See Readme for more details.
    This repository contains a selection of the data -- and the data-processing scripts -- behind the articles, graphics and interactives at FiveThirtyEight.

    We hope you'll use it to check our work and to create stories and visualizations of your own. The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know (andrei.scheinkman@fivethirtyeight.com).

    Source: https://github.com/fivethirtyeight/data

    This dataset was created by FiveThirtyEight and contains around 900 samples along with What Is Your Gender?, Age, technical information and other features such as:

    • How Much Total Combined Money Did All Members Of Your Household Earn Last Year?
    • If You Had A Smartwatch (like The Soon To Be Released Apple Watch), How Likely Or Unlikely Would You Be To Check The Weather On That Device?
    • and more.

    How to use this dataset

    • Analyze How Do You Typically Check The Weather? in relation to Do You Typically Check A Daily Weather Report?
    • Study the influence of Us Region on A Specific Website Or App (please Provide The Answer)

    Acknowledgements

    If you use this dataset in your research, please credit FiveThirtyEight

    --- Original source retains full ownership of the source dataset ---

  6. ‘Air Passengers Forecast Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Air Passengers Forecast Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-air-passengers-forecast-dataset-dbf6/latest
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Air Passengers Forecast Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yasserh/air-passengers-forecast-dataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Description:

    The "spam" concept is diverse: advertisements for products/websites, make money fast schemes, chain letters, pornography...

    Our collection of spam e-mails came from our postmaster and individuals who had filed spam. Our collection of non-spam e-mails came from filed work and personal e-mails, and hence the word 'George' and the area code '650' are indicators of non-spam. These are useful when constructing a personalized spam filter. One would either have to blind such non-spam indicators or get a very wide collection of non-spam to generate a general-purpose spam filter.

    The dataset, taken from the UCI ML repository, contains about 4600 emails labelled as spam or ham.

    Acknowledgements:

    This dataset has been referred from Kaggle.

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build classification models to predict whether or not the email is spam.
    • Also fine-tune the hyperparameters & compare the evaluation metrics of various classification algorithms.

    --- Original source retains full ownership of the source dataset ---

  7. ‘Spam Emails Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Spam Emails Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-spam-emails-dataset-6e4f/a414900c/?iid=003-246&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Spam Emails Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yasserh/spamemailsdataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Description:

    The "spam" concept is diverse: advertisements for products/web sites, make money fast schemes, chain letters, pornography...

    Our collection of spam e-mails came from our postmaster and individuals who had filed spam. Our collection of non-spam e-mails came from filed work and personal e-mails, and hence the word 'george' and the area code '650' are indicators of non-spam. These are useful when constructing a personalized spam filter. One would either have to blind such non-spam indicators or get a very wide collection of non-spam to generate a general purpose spam filter.

    The dataset, taken from the UCI ML repository, contains about 4600 emails labelled as spam or ham.

    The dataset can be downloaded here: https://archive.ics.uci.edu/ml/datasets/spambase

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build classification models to predict whether or not the email is spam.
    • Compare the evaluation metrics of various classification algorithms.
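
    To compare classifiers as the objective suggests, each model's predictions can be scored with the same set of metrics. The sketch below computes accuracy, precision, recall and F1 for the "spam" class in plain Python. The function name and the label lists are hypothetical and serve only to illustrate the calculation.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels and predictions; not results from any real model.
y_true = ["spam", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "spam", "ham", "ham", "spam"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

    Running the same function over the predictions of several models gives directly comparable numbers; for a spam filter, precision is often weighted heavily because a false positive hides a legitimate email.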

    --- Original source retains full ownership of the source dataset ---

  8. ‘NFT History Sales’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘NFT History Sales’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-nft-history-sales-d36d/latest
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘NFT History Sales’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mathurinache/nft-history-sales on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Make money using NFT + AI | How to get started?

    Introduction: By now, you have probably at least heard of NFTs. If not, you may have heard about the artwork that sold for $69 million, or at least about record NBA-related NFT sales. This made me curious, so I recently started researching the topic with my data scientist hat on. Data scientists follow wherever there are numbers! In this article, I will share what NFTs are and how to get started.

    --- Original source retains full ownership of the source dataset ---

  9. ‘FAANG- Complete Stock Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘FAANG- Complete Stock Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-faang-complete-stock-data-36c1/9110ef3b/?iid=011-763&v=presentation
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘FAANG- Complete Stock Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/aayushmishra1512/faang-complete-stock-data on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    There are a few companies that are considered to be revolutionary. These companies also happen to be a dream place to work for many people across the world. They are Facebook, Amazon, Apple, Netflix and Google, also known as FAANG! These companies make a ton of money, and they help others too by giving them a chance to invest in the companies via stocks and shares. This dataset was built around these stock prices.

    Content

    The data contains information such as the opening price of a stock, the closing price, how many shares were traded, and more. There are 5 different CSV files in the data, one for each company.

    --- Original source retains full ownership of the source dataset ---

  10. Online Sales Data Power BI Dashboard

    • kaggle.com
    Updated Aug 20, 2024
    Cite
    manjeshkumar05 (2024). Online Sales Data Power BI Dashboard [Dataset]. https://www.kaggle.com/datasets/manjeshkumar05/online-sales-data-power-bi-dashboard
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    manjeshkumar05
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Exploring Online Sales Data with Power BI!

    Another productive day diving into an online sales dataset! Here’s a roundup of the insights I uncovered today:

    Revenue by Category: Analyzed revenue distribution across different product categories to identify high-performing sectors.

    Revenue by Sub-Category: Drilled down into sub-categories for a more granular view of revenue streams.

    Revenue by Payment Mode: Examined revenue patterns based on payment methods to understand customer preferences.

    Revenue by State: Mapped out revenue by state to pinpoint geographical strengths and opportunities.

    Profit by Category: Evaluated profitability across product categories to assess which categories yield the highest profit margins.

    Profit by Sub-Category: Explored profit levels at a sub-category level to identify the most profitable segments.

    Profit by Payment Mode: Analyzed profit distribution across different payment methods.

    Top 5 States by Revenue and Profit: Highlighted the top 5 states driving the most revenue and profit, offering insights into regional performance.

    Sales Map by State: Visualized sales data on a map to provide a geographical perspective on sales distribution.

    Total Quantity, Revenue, and Profit: Aggregated data to give an overview of total quantities sold, overall revenue, and total profit.

    Filter by Category: Added a filter functionality to focus on specific categories and refine data analysis.

  11. ‘Student Food Survey’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Student Food Survey’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-student-food-survey-fc60/f814e5a0/?iid=041-303&v=presentation
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Student Food Survey’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mlomuscio/student-food-survey on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Purpose

    This data came from a survey of students. The purpose of the survey was to identify attitudes and habits regarding food consumption at school and outside of school.

    Description of Variables

    • Gender: What gender do you identify as?
    • Boarding: Day or Boarding?
    • Grade: What grade are you in?
    • Athlete: Do you consider yourself an athlete?
    • Activities: Are you participating in any of the following activities this season?
    • DHBreakfast: How many days a week do you eat breakfast in the dining hall (including brunch on Saturday and Sunday)?
    • NDHBreakfast: How many days a week do you eat breakfast but not in the dining hall?
    • BClass: Would you be more likely to eat breakfast if you could eat it in class?
    • DHBoxes: On average, how many boxes of food do you eat per meal at the dining hall?
    • NDHBoxes: How many boxes do you take from the dining hall to eat later?
    • NDHMeals: How many meals a week do you eat in the dorm or student center?
    • Nutrition: On a scale of 0 to 5, how aware are you of the nutritional values of the food you eat?
    • Money: On average, how much money do you spend on food outside of the dining hall per week?

    --- Original source retains full ownership of the source dataset ---

  12. Superstore Dataset

    • kaggle.com
    Updated Feb 22, 2022
    Cite
    Vivek Chowdhury (2022). Superstore Dataset [Dataset]. https://www.kaggle.com/datasets/vivek468/superstore-dataset-final/discussion
    Dataset updated
    Feb 22, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vivek Chowdhury
    Description

    Context

    With growing demand and cut-throat competition in the market, a Superstore giant is seeking your knowledge to understand what works best for them. They would like to understand which products, regions, categories and customer segments they should target or avoid.

    You can even take this a step further and try and build a Regression model to predict Sales or Profit.

    Go crazy with the dataset, but also make sure to provide some business insights to improve.

    Metadata

    • Row ID => Unique ID for each row.
    • Order ID => Unique Order ID for each Customer.
    • Order Date => Order Date of the Product.
    • Ship Date => Shipping Date of the Product.
    • Ship Mode => Shipping Mode specified by the Customer.
    • Customer ID => Unique ID to identify each Customer.
    • Customer Name => Name of the Customer.
    • Segment => The segment to which the Customer belongs.
    • Country => Country of residence of the Customer.
    • City => City of residence of the Customer.
    • State => State of residence of the Customer.
    • Postal Code => Postal Code of every Customer.
    • Region => Region to which the Customer belongs.
    • Product ID => Unique ID of the Product.
    • Category => Category of the Product ordered.
    • Sub-Category => Sub-Category of the Product ordered.
    • Product Name => Name of the Product.
    • Sales => Sales of the Product.
    • Quantity => Quantity of the Product.
    • Discount => Discount provided.
    • Profit => Profit/Loss incurred.
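
    One way to start on the "which categories to target or avoid" question is to aggregate Sales and Profit per group and compare profit margins. The sketch below uses only the Python standard library. The column names follow the metadata above, while the inline CSV rows and the function name are hypothetical and do not come from the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows for illustration only; the real file has many more columns.
SAMPLE_CSV = """Category,Sub-Category,Sales,Profit
Furniture,Chairs,200.00,20.00
Furniture,Tables,300.00,-50.00
Technology,Phones,500.00,100.00
"""


def profit_margin_by(rows, key):
    """Return total Profit divided by total Sales for each distinct value of `key`."""
    sales = defaultdict(float)
    profit = defaultdict(float)
    for row in rows:
        sales[row[key]] += float(row["Sales"])
        profit[row[key]] += float(row["Profit"])
    # Skip groups with zero sales to avoid dividing by zero.
    return {group: profit[group] / sales[group] for group in sales if sales[group]}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
margins = profit_margin_by(rows, "Category")
for category, margin in sorted(margins.items()):
    print(f"{category}: {margin:.1%}")
```

    Passing "Sub-Category", "Region" or "Segment" as the key gives the same breakdown along the other dimensions mentioned in the context.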

    Acknowledgements

    I do not own this data. I merely found it on the Tableau website. All credits to the original authors/creators. For educational purposes only.

  13. IBM AMLSim Example Dataset

    • kaggle.com
    Updated Jul 20, 2021
    Cite
    Ansh Ankul (2021). IBM AMLSim Example Dataset [Dataset]. https://www.kaggle.com/anshankul/ibm-amlsim-example-dataset/code
    Dataset updated
    Jul 20, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ansh Ankul
    Description

    IBM AMLSim

    IBM AMLSim: The AMLSim project is intended to provide a multi-agent based simulator that generates synthetic banking transaction data together with a set of known money laundering patterns - mainly for the purpose of testing machine learning models and graph algorithms.

    This dataset is an example dataset generated from IBM AMLSim.

    Content

    There are 3 datasets mentioned here: alerts, transactions and accounts.

    1. Accounts dataset: Contains the information about all the bank accounts whose transactions are monitored.
    2. Alerts dataset: Contains the transactions which triggered an alert according to AML guidelines.
    3. Transactions dataset: Contains the list of all the transactions with information about sender and receiver accounts.

    Acknowledgements

    Do check out the AML Sim project and generate your own datasets for AML purposes. Link: https://github.com/IBM/AMLSim

    License

    IBM/AMLSim is licensed under the Apache License 2.0, a permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code. Link: https://github.com/IBM/AMLSim/blob/master/LICENSE

  14. ‘Unsupervised Learning on Country Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 21, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Unsupervised Learning on Country Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-unsupervised-learning-on-country-data-cd76/5b0fd9e4/?iid=001-487&v=presentation
    Dataset updated
    Nov 21, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Unsupervised Learning on Country Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rohan0301/unsupervised-learning-on-country-data on 21 November 2021.

    --- Dataset description provided by original source is as follows ---

    Clustering the Countries by using Unsupervised Learning for HELP International

    Objective:

    To categorise the countries using socio-economic and health factors that determine the overall development of the country.

    About organization:

    HELP International is an international humanitarian NGO that is committed to fighting poverty and providing the people of backward countries with basic amenities and relief during the time of disasters and natural calamities.

    Problem Statement:

    HELP International has been able to raise around $10 million. The CEO of the NGO now needs to decide how to use this money strategically and effectively, which means choosing the countries that are in the direst need of aid. Your job as a data scientist is therefore to categorise the countries using socio-economic and health factors that determine the overall development of each country, and then to suggest the countries the CEO should focus on most.

    --- Original source retains full ownership of the source dataset ---

  15. Retail Transactions Dataset

    • kaggle.com
    Updated May 18, 2024
    Cite
    Prasad Patil (2024). Retail Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/retail-transactions-dataset
    Dataset updated
    May 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Prasad Patil
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:

    Context:

    Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.

    Inspiration:

    The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.

    Dataset Information:

    The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:

    • Transaction_ID: A unique identifier for each transaction, represented as a 10-digit number. This column is used to uniquely identify each purchase.
    • Date: The date and time when the transaction occurred. It records the timestamp of each purchase.
    • Customer_Name: The name of the customer who made the purchase. It provides information about the customer's identity.
    • Product: A list of products purchased in the transaction. It includes the names of the products bought.
    • Total_Items: The total number of items purchased in the transaction. It represents the quantity of products bought.
    • Total_Cost: The total cost of the purchase, in currency. It represents the financial value of the transaction.
    • Payment_Method: The method used for payment in the transaction, such as credit card, debit card, cash, or mobile payment.
    • City: The city where the purchase took place. It indicates the location of the transaction.
    • Store_Type: The type of store where the purchase was made, such as a supermarket, convenience store, department store, etc.
    • Discount_Applied: A binary indicator (True/False) representing whether a discount was applied to the transaction.
    • Customer_Category: A category representing the customer's background or age group.
    • Season: The season in which the purchase occurred, such as spring, summer, fall, or winter.
    • Promotion: The type of promotion applied to the transaction, such as "None," "BOGO (Buy One Get One)," or "Discount on Selected Items."

    Use Cases:

    • Market Basket Analysis: Discover associations between products and uncover buying patterns.
    • Customer Segmentation: Group customers based on purchasing behavior.
    • Pricing Optimization: Optimize pricing strategies and identify opportunities for discounts and promotions.
    • Retail Analytics: Analyze store performance and customer trends.
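
    The market basket analysis use case can be sketched with two standard measures: support (the share of transactions containing a pair of products) and confidence (how often the second product appears given the first). The example below assumes the Product column has already been parsed into Python lists. The baskets, the function name and the 0.5 support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Each frequent pair yields one rule per direction.
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)


# Hypothetical baskets for illustration only; not rows from the dataset.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

    For larger baskets or itemsets beyond pairs, a dedicated algorithm such as Apriori or FP-Growth would likely be more practical than enumerating all pairs.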

    Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.

  16. ‘Loan Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 4, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Loan Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-loan-data-faf6/0734d774/?iid=005-759&v=presentation
    Explore at:
    Dataset updated
    Aug 4, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Loan Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/itssuru/loan-data on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About the data and what to do…

    Publicly available data from LendingClub.com. Lending Club connects people who need money (borrowers) with people who have money (investors). As an investor, you would want to invest in people whose profile indicates a high probability of paying you back.

    We will use lending data from 2007-2010 and try to classify and predict whether or not the borrower paid back their loan in full. You can download the data from here.

    Here are what the columns represent:

    • credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.
    • purpose: The purpose of the loan (takes values "credit_card", "debt_consolidation", "educational", "major_purchase", "small_business", and "all_other").
    • int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.
    • installment: The monthly installments owed by the borrower if the loan is funded.
    • log.annual.inc: The natural log of the self-reported annual income of the borrower.
    • dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).
    • fico: The FICO credit score of the borrower.
    • days.with.cr.line: The number of days the borrower has had a credit line.
    • revol.bal: The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).
    • revol.util: The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available).
    • inq.last.6mths: The borrower's number of inquiries by creditors in the last 6 months.
    • delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
    • pub.rec: The borrower's number of derogatory public records (bankruptcy filings, tax liens, or judgments).
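Several of these columns are stored in an encoded form: `int.rate` is a proportion and `log.annual.inc` is a natural logarithm. The sketch below shows one way to turn a row back into more readable units. The function name `decode_loan_row` and the example row are hypothetical; only the column names and encodings come from the description above.

```python
import math


def decode_loan_row(row):
    """Convert the encoded fields of one loan row into human-readable units."""
    return {
        # int.rate is a proportion, so 0.11 becomes 11 (percent).
        "interest_rate_percent": row["int.rate"] * 100.0,
        # log.annual.inc is a natural log, so exp() recovers the income.
        "annual_income": math.exp(row["log.annual.inc"]),
        # credit.policy is 1 when the underwriting criteria are met, else 0.
        "meets_credit_policy": row["credit.policy"] == 1,
    }


# Hypothetical row, invented for illustration (not taken from the dataset).
example_row = {
    "credit.policy": 1,
    "int.rate": 0.11,
    "log.annual.inc": math.log(60000.0),
}

if __name__ == "__main__":
    print(decode_loan_row(example_row))
```

Because of floating-point rounding, the decoded values may differ from the intended round numbers by a tiny amount, so comparisons should use a tolerance.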

    --- Original source retains full ownership of the source dataset ---

  17. Spam Dataset

    • kaggle.com
    Updated Aug 16, 2023
    Cite
    Arman Aghania (2023). Spam Dataset [Dataset]. https://www.kaggle.com/datasets/armanaghania/spam-dataset/versions/1
    Dataset updated
    Aug 16, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Arman Aghania
    Description

    Spam E-mail Data

    Description

    The data consist of 4601 email items, of which 1813 items were identified as spam.

    Usage

    • spam7

    Format

    This data frame contains the following columns:

    • crl.tot: total length of words in capitals
    • dollar: number of occurrences of the $ symbol
    • bang: number of occurrences of the ! symbol
    • money: number of occurrences of the word ‘money’
    • n000: number of occurrences of the string ‘000’
    • make: number of occurrences of the word ‘make’
    • yesno: outcome variable, a factor with levels n (not spam) and y (spam)
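To make the column descriptions concrete, here is a minimal sketch that derives similar features from a raw email body. It follows the descriptions as written above and is not the procedure used to build the dataset. The function name `spam7_style_features` is hypothetical, and treating "words in capitals" as uninterrupted runs of the letters A-Z is an assumption.

```python
import re


def spam7_style_features(text):
    """Count simple spam indicators in an email body (illustrative only)."""
    lowered = text.lower()
    # Assumption: "words in capitals" means uninterrupted runs of A-Z.
    capital_runs = re.findall(r"[A-Z]+", text)
    return {
        "crl.tot": sum(len(run) for run in capital_runs),
        "dollar": text.count("$"),
        "bang": text.count("!"),
        # Whole-word, case-insensitive matches.
        "money": len(re.findall(r"\bmoney\b", lowered)),
        # str.count counts non-overlapping occurrences of the substring.
        "n000": lowered.count("000"),
        "make": len(re.findall(r"\bmake\b", lowered)),
    }


if __name__ == "__main__":
    print(spam7_style_features("WIN $1000 now! Make money FAST!!"))
```

The outcome column `yesno` is a label, not a derived feature, so it is left out of the sketch.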

  18. 🦈 Shark Tank India dataset 🇮🇳

    • kaggle.com
    Updated Apr 20, 2025
    Cite
    Satya Thirumani (2025). 🦈 Shark Tank India dataset 🇮🇳 [Dataset]. https://www.kaggle.com/datasets/thirumani/shark-tank-india
    Dataset updated
    Apr 20, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Satya Thirumani
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Shark Tank India Data set.

    Shark Tank India - Season 1 to season 4 information, with 80 fields/columns and 630+ records.

    All seasons/episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcast on SonyLiv OTT/Sony TV.

    Here is the data dictionary for the Shark Tank India seasons dataset.

    • Season Number - Season number
    • Startup Name - Company name or product name
    • Episode Number - Episode number within the season
    • Pitch Number - Overall pitch number
    • Season Start - Season first aired date
    • Season End - Season last aired date
    • Original Air Date - Episode original/first aired date, on OTT/TV
    • Episode Title - Episode title in SonyLiv
    • Anchor - Name of the episode presenter/host
    • Industry - Industry name or type
    • Business Description - Business Description
    • Company Website - Company Website URL
    • Started in - Year in which startup was started/incorporated
    • Number of Presenters - Number of presenters
    • Male Presenters - Number of male presenters
    • Female Presenters - Number of female presenters
    • Transgender Presenters - Number of transgender/LGBTQ presenters
    • Couple Presenters - Are the presenters a married couple? 1-yes, 0-no
    • Pitchers Average Age - Average age of all pitchers, <30 young, 30-50 middle, >50 old
    • Pitchers City - Presenter's town/city or place where company head office exists
    • Pitchers State - Indian state pitcher hails from or state where company head office exists
    • Yearly Revenue - Yearly revenue, in lakhs INR, -1 means negative revenue, 0 means pre-revenue
    • Monthly Sales - Total monthly sales, in lakhs
    • Gross Margin - Gross margin/profit of company, in percentages
    • Net Margin - Net margin/profit of company, in percentages
    • EBITDA - Earnings Before Interest, Taxes, Depreciation, and Amortization
    • Cash Burn - Whether the company is running at a loss in the current year and funding operations out of pocket (yes/no)
    • SKUs - Stock Keeping Units or number of varieties, at the time of pitch
    • Has Patents - Pitcher has Patents/Intellectual property (filed/granted), at the time of pitch
    • Bootstrapped - Startup is bootstrapped or not (yes/no)
    • Part of Match off - Competition between two similar brands, pitched at same time
    • Original Ask Amount - Original Ask Amount, in lakhs INR
    • Original Offered Equity - Original Offered Equity, in percentages
    • Valuation Requested - Valuation Requested, in lakhs INR
    • Received Offer - Received offer or not, 1-received, 0-not received
    • Accepted Offer - Accepted offer or not, 1-accepted, 0-rejected
    • Total Deal Amount - Total Deal Amount, in lakhs INR
    • Total Deal Equity - Total Deal Equity, in percentages
    • Total Deal Debt - Total Deal debt/loan amount, in lakhs INR
    • Debt Interest - Debt interest rate, in percentages
    • Deal Valuation - Deal Valuation, in lakhs INR
    • Number of sharks in deal - Number of sharks involved in deal
    • Deal has conditions - Deal has conditions or not? (yes or no)
    • Royalty Percentage - Royalty percentage, if it's royalty deal
    • Royalty Recouped Amount - Royalty recouped amount, if it's royalty deal, in lakhs
    • Advisory Shares Equity - Deal with Advisory shares or equity, in percentages
    • Namita Investment Amount - Namita Investment Amount, in lakhs INR
    • Namita Investment Equity - Namita Investment Equity, in percentages
    • Namita Debt Amount - Namita Debt Amount, in lakhs INR
    • Vineeta Investment Amount - Vineeta Investment Amount, in lakhs INR
    • Vineeta Investment Equity - Vineeta Investment Equity, in percentages
    • Vineeta Debt Amount - Vineeta Debt Amount, in lakhs INR
    • Anupam Investment Amount - Anupam Investment Amount, in lakhs INR
    • Anupam Investment Equity - Anupam Investment Equity, in percentages
    • Anupam Debt Amount - Anupam Debt Amount, in lakhs INR
    • Aman Investment Amount - Aman Investment Amount, in lakhs INR
    • Aman Investment Equity - Aman Investment Equity, in percentages
    • Aman Debt Amount - Aman Debt Amount, in lakhs INR
    • Peyush Investment Amount - Peyush Investment Amount, in lakhs INR
    • Peyush Investment Equity - Peyush Investment Equity, in percentages
    • Peyush Debt Amount - Peyush Debt Amount, in lakhs INR
    • Ritesh Investment Amount - Ritesh Investment Amount, in lakhs INR
    • Ritesh Investment Equity - Ritesh Investment Equity, in percentages
    • Ritesh Debt Amount - Ritesh Debt Amount, in lakhs INR
    • Amit Investment Amount - Amit Investment Amount, in lakhs INR
    • Amit Investment Equity - Amit Investment Equity, in percentages
    • Amit Debt Amount - Amit Debt Amount, in lakhs INR
    • Guest Investment Amount - Guest Investment Amount, in lakhs INR
    • Guest Investment Equity - Guest Investment Equity, in percentages
    • Guest Debt Amount - Guest Debt Amount, in lakhs INR
    • Invested Guest Name - Name of the guest(s) who invested in deal
    • All Guest Names - Name of all guests, who are present in episode
    • Namita Present - Whether Namita was present in the episode or not
    • Vineeta Present - Whether Vineeta was present in the episode or not
    • Anupam ...
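Most monetary fields in this data dictionary are given in lakhs INR (1 lakh = 100,000) and equity in percentages. The sketch below converts lakhs to rupees and computes the valuation implied by an ask amount and an equity stake. The source does not say how Valuation Requested was derived, so the formula here is the conventional implied valuation, stated as an assumption. The function names are hypothetical.

```python
LAKH = 100_000  # 1 lakh = 100,000 in the Indian numbering system


def lakhs_to_inr(amount_lakhs):
    """Convert an amount in lakhs to rupees."""
    return amount_lakhs * LAKH


def implied_valuation_lakhs(ask_amount_lakhs, offered_equity_percent):
    """Valuation implied by asking `ask_amount_lakhs` for `offered_equity_percent`.

    Assumption: valuation = amount / (equity / 100). The dataset may compute
    its valuation columns differently.
    """
    if offered_equity_percent <= 0:
        raise ValueError("offered equity must be a positive percentage")
    return ask_amount_lakhs * 100.0 / offered_equity_percent


if __name__ == "__main__":
    # Hypothetical pitch: asking 50 lakhs for 5% equity.
    valuation = implied_valuation_lakhs(50, 5)
    print(valuation, lakhs_to_inr(valuation))
```

The same helper could be applied to Total Deal Amount and Total Deal Equity to compare against Deal Valuation, keeping in mind that deals with debt or royalty terms may not fit this simple formula.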
  19. Google Analytics Sample

    • kaggle.com
    Updated Sep 19, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample
    Dataset updated
    Sep 19, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

    Content

    The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

    • Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
    • Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
    • Transactional data: information about the transactions that occur on the Google Merchandise Store website.

    Fork this kernel to get started.

    Acknowledgements

    Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

    Banner Photo by Edho Pratama from Unsplash.

    Inspiration

    What is the total number of transactions generated per device browser in July 2017?

    The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

    What was the average number of product pageviews for users who made a purchase in July 2017?

    What was the average number of product pageviews for users who did not make a purchase in July 2017?

    What was the average total transactions per user that made a purchase in July 2017?

    What is the average amount of money spent per session in July 2017?

    What is the sequence of pages viewed?
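The bounce-rate question above defines the real bounce rate as the percentage of visits with a single pageview. A minimal sketch of that calculation per traffic source follows. It works on plain `(source, pageviews)` pairs rather than the actual BigQuery schema; the function name `bounce_rate_by_source` and the sample sessions are invented for illustration.

```python
from collections import defaultdict


def bounce_rate_by_source(sessions):
    """Return {traffic_source: percent of sessions with exactly one pageview}."""
    total = defaultdict(int)
    bounced = defaultdict(int)
    for source, pageviews in sessions:
        total[source] += 1
        if pageviews == 1:
            bounced[source] += 1
    return {source: 100.0 * bounced[source] / total[source] for source in total}


# Invented sessions: (traffic source, number of pageviews in the visit).
SAMPLE_SESSIONS = [
    ("organic", 1),
    ("organic", 3),
    ("paid", 1),
    ("paid", 1),
    ("organic", 1),
    ("organic", 5),
]

if __name__ == "__main__":
    print(bounce_rate_by_source(SAMPLE_SESSIONS))
```

Against the real dataset, the same grouping would normally be expressed as a SQL query over the sessions tables.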

  20. X company Data analysis Project

    • kaggle.com
    Updated Sep 6, 2023
    Cite
    Ahmed Samir (2023). X company Data analysis Project [Dataset]. https://www.kaggle.com/datasets/ahmedsamir11111/x-company-data-analysis-project/versions/1
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ahmed Samir
    Description

    About Dataset

    The dataset contains information about sales transactions, including details such as the customer's age, gender, location, and the products sold. It includes both the cost of each product and the revenue generated from its sale, allowing for calculations of profit and profit margins. The customer age and gender fields could be used to analyze purchasing behavior across different demographic groups. The dataset likely includes both numeric and categorical data, which would require different types of analysis and visualization techniques. Overall, the dataset appears to provide a comprehensive view of sales transactions, with the potential for analysis at multiple levels, including by product, customer, and location. On its own, however, it does not contain any useful information or insights for decision makers.

    • After understanding the dataset, I cleaned it and added some columns and calculations, such as Net profit and Age Status.
    • I built a model in Power Pivot, calculated measures such as Total profit, COGS, and Total revenues, and built a KPI model.
    • I then asked the following questions.

    About distribution:

    • What are the total revenues and profits?
    • What is the best-selling country in terms of revenue?
    • What are the five best-selling states in terms of revenue?
    • What are the five lowest-selling states in terms of revenue?
    • How does age relate to revenues?

    About profitability:

    • What are the total revenues and profits?
    • How does each month rank in terms of revenues and profits?
    • How does each month rank in terms of COGS?
    • What are the top-selling categories in terms of revenue and profit?
    • What are the three best-selling sub-categories in terms of profit?

    About KPIs:

    • How does each salesperson stand relative to their target?

    • I then answered those questions, analyzed the data, and visualized the results with dashboards.
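The description mentions deriving profit and profit margins from product cost and sales revenue. The sketch below shows the usual arithmetic under the assumption that profit is revenue minus cost and margin is profit as a percentage of revenue. The author's exact Power Pivot measures are not given in the source, and the function name `profit_and_margin` is hypothetical.

```python
def profit_and_margin(revenue, cost):
    """Return (profit, margin_percent) for one transaction or an aggregate.

    Assumption: profit = revenue - cost, margin = profit / revenue * 100.
    Margin is None when revenue is zero, since the ratio is undefined.
    """
    profit = revenue - cost
    margin = profit / revenue * 100.0 if revenue else None
    return profit, margin


if __name__ == "__main__":
    # Hypothetical figures: revenue of 200 against a cost of 150.
    print(profit_and_margin(200.0, 150.0))
```

Summing revenue and cost over a month, category, or state before calling the helper would give the aggregate figures the questions above ask about.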