https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides insights into the spending habits of Gen Z (ages 18-27) across various categories such as rent, groceries, entertainment, education, savings, and more. It contains 1700 records and 15 financial attributes, making it a valuable resource for financial trend analysis, budgeting studies, and machine learning applications in personal finance.
http://opendatacommons.org/licenses/dbcl/1.0/
Fake news or hoax news is false or misleading information presented as news. Fake news often has the aim of damaging the reputation of a person or entity, or making money through advertising revenue.
This dataset contains both fake and real news.
The columns in the dataset are:
1) Title -> Title of the news
2) Text -> Text or content of the news
3) Label -> Label marking the news as Fake or Real
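Before training a fake-news classifier it is worth checking how balanced the two labels are. A minimal sketch, assuming the data is a CSV file whose header names are exactly Title, Text and Label; the sample rows below are hypothetical stand-ins for the real file.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file):
    """Count rows per Label value in a CSV with Title, Text and Label columns.

    csv.DictReader handles quoted fields, so article text containing commas
    or line breaks is read correctly.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row["Label"].strip() for row in reader)


# Hypothetical sample; with the real data, open the file with
# open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "Title,Text,Label\n"
    "Moon made of cheese,\"Scientists, allegedly, confirm...\",Fake\n"
    "Budget passed,The parliament voted...,Real\n"
)
counts = label_counts(sample)
print(counts)
```

A strongly skewed count would suggest stratified splitting or class weighting when training.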
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Hotel Prices - Beginner Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sveneschlbeck/hotel-prices-beginner-dataset on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This dataset is aimed at data science students and beginners who want to dive into regression or clustering without having to clean the data first.
It consists of a pre-cleaned .csv table that has been translated from German to English.
There are four columns in this dataset:
Here, "Hotel Prices" does not refer to the cost of spending a night at these hotels but to the price of buying them. This would be a useful dataset for someone who wants to buy a hotel and needs to judge whether they are overpaying or getting a great deal compared with similar properties in other, comparable cities.
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
📄 Description: This synthetic dataset is designed for practising regression tasks, particularly for predicting Sales Revenue based on product, market, and economic factors. It contains both categorical (nominal) and numerical features, simulating real-world sales data across various product categories and regions.
📌 Dataset Summary: Rows: 2000
Columns: 12 features + 1 target (SalesRevenue)
🏷️ Columns Description:
ProductCategory (Categorical): Type of product: Electronics, Clothing, Furniture, Toys
Region (Categorical): Sales region: North, South, East, West
CustomerSegment (Categorical): Customer income group: Low, Middle, High
IsPromotionApplied (Categorical): Whether a promotion was applied: Yes/No
ProductionCost (Numerical): Cost to produce the product
MarketingSpend (Numerical): Money spent on marketing
SeasonalDemandIndex (Numerical): Factor representing seasonal demand
CompetitorPrice (Numerical): Average price of competing products
CustomerRating (Numerical): Average customer rating (out of 5)
EconomicIndex (Numerical): Indicator of overall economic conditions
StoreCount (Numerical): Number of stores selling the product
OnlinePresence (Numerical): Online presence score of the product
SalesRevenue (Numerical): Target variable; revenue from product sales
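Because four of the twelve features are categorical, a regression model generally needs them encoded as numbers first. The sketch below shows drop-first one-hot encoding in plain Python using the category levels listed above; the function name and the drop-first choice are our own assumptions, and the numerical columns would simply be appended unchanged.

```python
# Category levels as listed in the column description above.
CATEGORIES = {
    "ProductCategory": ["Electronics", "Clothing", "Furniture", "Toys"],
    "Region": ["North", "South", "East", "West"],
    "CustomerSegment": ["Low", "Middle", "High"],
    "IsPromotionApplied": ["Yes", "No"],
}


def one_hot(row):
    """Return a flat list of 0/1 indicators for the categorical columns.

    The first level of each column is dropped (drop-first encoding) so the
    indicators are not perfectly collinear with a regression intercept.
    """
    features = []
    for column, levels in CATEGORIES.items():
        value = row[column]
        if value not in levels:
            raise ValueError(f"unexpected {column} value: {value!r}")
        features.extend(1 if value == level else 0 for level in levels[1:])
    return features


row = {"ProductCategory": "Toys", "Region": "North",
       "CustomerSegment": "High", "IsPromotionApplied": "No"}
print(one_hot(row))  # hypothetical row
```

CustomerSegment (Low, Middle, High) has a natural order, so encoding it as a single ordinal 0/1/2 column is a reasonable alternative worth comparing.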
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘🌡 Weather Check’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/weather-checke on 28 January 2022.
--- Dataset description provided by original source is as follows ---
See Readme for more details.
This repository contains a selection of the data (and the data-processing scripts) behind the articles, graphics and interactives at FiveThirtyEight. We hope you'll use it to check our work and to create stories and visualizations of your own. The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know (andrei.scheinkman@fivethirtyeight.com).
Source: https://github.com/fivethirtyeight/data
This dataset was created by FiveThirtyEight and contains around 900 samples with features such as "What is your gender?", "Age", technical information and others, including:
- "How much total combined money did all members of your household earn last year?"
- "If you had a smartwatch (like the soon to be released Apple Watch), how likely or unlikely would you be to check the weather on that device?"
- and more.
Ideas for using this dataset:
- Analyze "How do you typically check the weather?" in relation to "Do you typically check a daily weather report?"
- Study the influence of "US Region" on "A specific website or app (please provide the answer)"
If you use this dataset in your research, please credit FiveThirtyEight.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Air Passengers Forecast Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yasserh/air-passengers-forecast-dataset on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The "spam" concept is diverse: advertisements for products/websites, make money fast schemes, chain letters, pornography...
Our collection of spam e-mails came from our postmaster and individuals who had filed spam. Our collection of non-spam e-mails came from filed work and personal e-mails, and hence the word 'George' and the area code '650' are indicators of non-spam. These are useful when constructing a personalized spam filter. One would either have to blind such non-spam indicators or get a very wide collection of non-spam to generate a general-purpose spam filter.
The dataset, taken from the UCI ML repository, contains about 4600 emails labelled as spam or ham.
This dataset was sourced from Kaggle.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Spam Emails Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yasserh/spamemailsdataset on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The "spam" concept is diverse: advertisements for products/web sites, make money fast schemes, chain letters, pornography...
Our collection of spam e-mails came from our postmaster and individuals who had filed spam. Our collection of non-spam e-mails came from filed work and personal e-mails, and hence the word 'george' and the area code '650' are indicators of non-spam. These are useful when constructing a personalized spam filter. One would either have to blind such non-spam indicators or get a very wide collection of non-spam to generate a general-purpose spam filter.
The dataset, taken from the UCI ML repository, contains about 4600 emails labelled as spam or ham.
The dataset can be downloaded here: https://archive.ics.uci.edu/ml/datasets/spambase
--- Original source retains full ownership of the source dataset ---
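When building a general-purpose filter from raw email text, the description suggests blinding personal non-spam indicators such as 'george' and '650'. A minimal sketch of one way to do that with regular expressions; the placeholder token and the term list are our own assumptions, not part of the dataset (the UCI Spambase files themselves ship precomputed features, not raw text).

```python
import re

# Personal non-spam indicators named in the description; extend as needed.
BLIND_TERMS = ["george", "650"]

# \b word boundaries leave '650' inside e.g. '16500' untouched;
# IGNORECASE also catches 'George' and 'GEORGE'.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, BLIND_TERMS)) + r")\b",
    re.IGNORECASE,
)


def blind(text, placeholder="<BLINDED>"):
    """Replace personal non-spam indicators with a neutral placeholder."""
    return _PATTERN.sub(placeholder, text)


blinded = blind("Hi George, call me at 650-555-0100 about invoice 16500.")
print(blinded)
```

Replacing rather than deleting the terms keeps sentence structure intact, which matters if the filter later uses n-gram features.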
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘NFT History Sales’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mathurinache/nft-history-sales on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Make money using NFT + AI | How to get started?
Introduction
By now, you have probably heard about NFTs. If not, you may have heard about the artwork that sold for $69 million, or at least about record NFT sales involving the NBA. This made me curious, so I recently started researching NFTs with my data scientist hat on; data scientists follow the numbers! In this article, I will share what NFTs are and how to get started.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘FAANG- Complete Stock Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/aayushmishra1512/faang-complete-stock-data on 30 September 2021.
--- Dataset description provided by original source is as follows ---
There are a few companies that are considered revolutionary. They are also dream workplaces for many people around the world. These companies, Facebook, Amazon, Apple, Netflix and Google, are known together as FAANG! They make a lot of money, and they also give others a chance to invest in them through stocks and shares. This data was collected to track these companies' stock prices.
The data contains information such as the opening price of a stock, the closing price, how many shares were traded, and more. There are 5 CSV files, one for each company.
--- Original source retains full ownership of the source dataset ---
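A common first step with per-company price files is computing close-to-close daily returns. The sketch below assumes each CSV has Date and Close columns and is sorted by date in ascending order; the column names are assumptions, so check the actual headers, and the two sample rows are hypothetical.

```python
import csv
import io


def close_to_close_returns(csv_file):
    """Return (date, simple return) pairs from consecutive closing prices.

    The return for a day is Close_today / Close_yesterday - 1, so the first
    row in the file produces no return.
    """
    rows = list(csv.DictReader(csv_file))
    returns = []
    for prev, curr in zip(rows, rows[1:]):
        prev_close = float(prev["Close"])
        curr_close = float(curr["Close"])
        returns.append((curr["Date"], curr_close / prev_close - 1.0))
    return returns


sample = io.StringIO(
    "Date,Open,High,Low,Close,Volume\n"
    "2020-01-02,100,102,99,101,1000\n"
    "2020-01-03,101,103,100,99.99,1200\n"
)
returns = close_to_close_returns(sample)
for date, r in returns:
    print(date, f"{r:.2%}")
```

Using simple returns keeps the arithmetic transparent; log returns (math.log of the price ratio) are the usual alternative when returns will be summed over time.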
https://creativecommons.org/publicdomain/zero/1.0/
Exploring Online Sales Data with Power BI!
Another productive day diving into an online sales dataset! Here’s a roundup of the insights I uncovered today:
Revenue by Category: Analyzed revenue distribution across different product categories to identify high-performing sectors.
Revenue by Sub-Category: Drilled down into sub-categories for a more granular view of revenue streams.
Revenue by Payment Mode: Examined revenue patterns based on payment methods to understand customer preferences.
Revenue by State: Mapped out revenue by state to pinpoint geographical strengths and opportunities.
Profit by Category: Evaluated profitability across product categories to assess which categories yield the highest profit margins.
Profit by Sub-Category: Explored profit levels at a sub-category level to identify the most profitable segments.
Profit by Payment Mode: Analyzed profit distribution across different payment methods.
Top 5 States by Revenue and Profit: Highlighted the top 5 states driving the most revenue and profit, offering insights into regional performance.
Sales Map by State: Visualized sales data on a map to provide a geographical perspective on sales distribution.
Total Quantity, Revenue, and Profit: Aggregated data to give an overview of total quantities sold, overall revenue, and total profit.
Filter by Category: Added a filter to focus on specific categories and refine the analysis.
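Most of the views above are a group-by followed by a sum, plus a ranking for the top-5 view. Outside Power BI the same logic can be sketched in plain Python; the column names Category, State, Revenue and Profit and the order rows are hypothetical, since the dataset's actual headers are not listed here.

```python
from collections import defaultdict


def totals_by(rows, key):
    """Sum Revenue and Profit per value of `key` (e.g. 'Category' or 'State')."""
    totals = defaultdict(lambda: {"Revenue": 0.0, "Profit": 0.0})
    for row in rows:
        totals[row[key]]["Revenue"] += row["Revenue"]
        totals[row[key]]["Profit"] += row["Profit"]
    return dict(totals)


def top_n(totals, measure, n=5):
    """Return the n groups with the highest total for `measure`."""
    return sorted(totals, key=lambda g: totals[g][measure], reverse=True)[:n]


orders = [  # hypothetical order lines
    {"Category": "Electronics", "State": "StateA", "Revenue": 1200.0, "Profit": 150.0},
    {"Category": "Clothing", "State": "StateB", "Revenue": 300.0, "Profit": 60.0},
    {"Category": "Electronics", "State": "StateB", "Revenue": 800.0, "Profit": -20.0},
]
by_state = totals_by(orders, "State")
by_category = totals_by(orders, "Category")
print(top_n(by_state, "Revenue"))
print(by_category)
```

The same `totals_by` call with a payment-mode or sub-category column reproduces the other breakdowns in the list.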
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Student Food Survey’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mlomuscio/student-food-survey on 30 September 2021.
--- Dataset description provided by original source is as follows ---
This data came from a survey of students. The purpose of the survey was to identify attitudes and habits regarding food consumption at school and outside of school.
--- Original source retains full ownership of the source dataset ---
With growing demand and cut-throat competition in the market, a superstore giant is seeking your help in understanding what works best for it. It would like to understand which products, regions, categories and customer segments it should target or avoid.
You can even take this a step further and try to build a regression model to predict Sales or Profit.
Go crazy with the dataset, but also make sure to provide business insights that could help the store improve.
Row ID => Unique ID for each row.
Order ID => Unique order ID for each customer.
Order Date => Order date of the product.
Ship Date => Shipping date of the product.
Ship Mode => Shipping mode specified by the customer.
Customer ID => Unique ID to identify each customer.
Customer Name => Name of the customer.
Segment => The segment the customer belongs to.
Country => Country of residence of the customer.
City => City of residence of the customer.
State => State of residence of the customer.
Postal Code => Postal code of every customer.
Region => Region the customer belongs to.
Product ID => Unique ID of the product.
Category => Category of the product ordered.
Sub-Category => Sub-category of the product ordered.
Product Name => Name of the product.
Sales => Sales of the product.
Quantity => Quantity of the product.
Discount => Discount provided.
Profit => Profit/loss incurred.
I do not own this data; I found it on the Tableau website. All credit goes to the original authors/creators. For educational purposes only.
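Order Date, Ship Date and Ship Mode together let you check how long each shipping mode actually takes, one input into which segments to target. A minimal sketch using the column names listed above; the month/day/year date format and the sample rows are assumptions, so adjust DATE_FORMAT to the actual file.

```python
from collections import defaultdict
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumed format, e.g. 11/8/2016; adjust to the file


def days_to_ship(row):
    """Days between Order Date and Ship Date for one order line."""
    ordered = datetime.strptime(row["Order Date"], DATE_FORMAT)
    shipped = datetime.strptime(row["Ship Date"], DATE_FORMAT)
    return (shipped - ordered).days


def mean_days_by_ship_mode(rows):
    """Average shipping delay in days per Ship Mode."""
    total = defaultdict(int)
    count = defaultdict(int)
    for row in rows:
        total[row["Ship Mode"]] += days_to_ship(row)
        count[row["Ship Mode"]] += 1
    return {mode: total[mode] / count[mode] for mode in total}


rows = [  # hypothetical rows using the column names listed above
    {"Order Date": "11/8/2016", "Ship Date": "11/11/2016", "Ship Mode": "Second Class"},
    {"Order Date": "6/12/2016", "Ship Date": "6/16/2016", "Ship Mode": "Second Class"},
    {"Order Date": "10/11/2015", "Ship Date": "10/18/2015", "Ship Mode": "Standard Class"},
]
print(mean_days_by_ship_mode(rows))
```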
IBM AMLSim: The AMLSim project is intended to provide a multi-agent based simulator that generates synthetic banking transaction data together with a set of known money laundering patterns - mainly for the purpose of testing machine learning models and graph algorithms.
This dataset is an example dataset generated from IBM AMLSim.
This example includes 3 datasets: alerts, transactions and accounts.
Check out the AMLSim project to generate your own datasets for AML purposes. Link: https://github.com/IBM/AMLSim
IBM/AMLSim is licensed under the Apache License 2.0, a permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code. Link: https://github.com/IBM/AMLSim/blob/master/LICENSE
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Unsupervised Learning on Country Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rohan0301/unsupervised-learning-on-country-data on 21 November 2021.
--- Dataset description provided by original source is as follows ---
The goal is to categorise the countries using socio-economic and health factors that determine the overall development of each country.
HELP International is an international humanitarian NGO that is committed to fighting poverty and providing the people of backward countries with basic amenities and relief during the time of disasters and natural calamities.
HELP International has been able to raise around $10 million. Now the CEO of the NGO needs to decide how to use this money strategically and effectively, which means choosing the countries that are in the direst need of aid. Your job as a data scientist is to categorise the countries using socio-economic and health factors that determine the overall development of each country, and then to suggest the countries the CEO should focus on most.
--- Original source retains full ownership of the source dataset ---
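Categorising countries without labels is a clustering task, and k-means is a common first choice. The sketch below standardises the features and then runs Lloyd's algorithm with a deterministic farthest-point initialisation (our own simplification in place of random or k-means++ seeding). The two features and the six country rows are hypothetical, not taken from the dataset.

```python
import math


def standardize(rows):
    """Z-score each column so features on large scales (e.g. income) do not
    dominate the Euclidean distances used by k-means."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster index per input point."""
    # Initialisation: start from the first point, then repeatedly add the
    # point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical rows: [child mortality per 1000 births, income per capita]
countries = {"A": [90, 1500], "B": [100, 1200], "C": [85, 1800],
             "D": [5, 40000], "E": [4, 45000], "F": [6, 38000]}
labels = kmeans(standardize(list(countries.values())), k=2)
print(dict(zip(countries, labels)))
```

The cluster with high child mortality and low income would be the natural candidate group for aid; in practice k is chosen by inspecting several values (for example with an elbow plot) rather than fixed at 2.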
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:
Context:
Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.
Inspiration:
The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.
Dataset Information:
The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:
Use Cases:
Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
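Market basket analysis usually starts from two measures: support, the share of transactions containing an itemset, and confidence, the share of transactions containing A that also contain B. The sketch below computes both for item pairs; it is a pair-level illustration of the measures Apriori-style methods use, not a full Apriori implementation, and it assumes transactions have already been grouped into lists of product names (the baskets shown are hypothetical).

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Return {(A, B): (support, confidence)} for every ordered item pair,
    where support = P(A and B) and confidence = P(B | A)."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        rules[(a, b)] = (support, together / item_counts[a])
        rules[(b, a)] = (support, together / item_counts[b])
    return rules


baskets = [["bread", "milk"], ["bread", "butter"],
           ["bread", "milk", "butter"], ["milk"]]
rules = pair_rules(baskets)
print(rules[("butter", "bread")])
```

Confidence is asymmetric: every basket with butter also has bread, but only two of the three baskets with bread have butter.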
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Loan Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/itssuru/loan-data on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Publicly available data from LendingClub.com. Lending Club connects people who need money (borrowers) with people who have money (investors). As an investor, you would want to invest in people whose profile shows a high probability of paying you back.
We will use lending data from 2007-2010 and try to classify and predict whether or not the borrower paid back their loan in full. You can download the data from here.
Here is what the columns represent:
- credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.
- purpose: The purpose of the loan (takes values "credit_card", "debt_consolidation", "educational", "major_purchase", "small_business", and "all_other").
- int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.
- installment: The monthly installments owed by the borrower if the loan is funded.
- log.annual.inc: The natural log of the self-reported annual income of the borrower.
- dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).
- fico: The FICO credit score of the borrower.
- days.with.cr.line: The number of days the borrower has had a credit line.
- revol.bal: The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).
- revol.util: The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available).
- inq.last.6mths: The borrower's number of inquiries by creditors in the last 6 months.
- delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
- pub.rec: The borrower's number of derogatory public records (bankruptcy filings, tax liens, or judgments).
--- Original source retains full ownership of the source dataset ---
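Two columns are stored in encoded form: log.annual.inc is a natural log and int.rate is a proportion. A minimal sketch that decodes them and derives one extra ratio; the installment_to_income feature is our own hypothetical addition, and the input row is made up apart from the 0.11 rate example given above.

```python
import math


def decode_row(row):
    """Turn the encoded LendingClub columns into more readable quantities."""
    annual_income = math.exp(row["log.annual.inc"])  # column stores ln(income)
    return {
        "annual_income": annual_income,
        "int_rate_pct": row["int.rate"] * 100,  # stored as a proportion
        # Hypothetical derived feature: share of monthly income taken by
        # this loan's installment.
        "installment_to_income": row["installment"] / (annual_income / 12),
    }


row = {"log.annual.inc": math.log(60000), "int.rate": 0.11, "installment": 250.0}
decoded = decode_row(row)
print(round(decoded["annual_income"]), round(decoded["int_rate_pct"], 2),
      round(decoded["installment_to_income"], 3))
```

Tree-based classifiers do not need these transformations, but they make exploratory plots and sanity checks far easier to read.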
The data consist of 4601 email items, of which 1813 items were identified as spam.
This data frame contains the following columns:
- crl.tot: total length of words in capitals
- dollar: number of occurrences of the $ symbol
- bang: number of occurrences of the ! symbol
- money: number of occurrences of the word ‘money’
- n000: number of occurrences of the string ‘000’
- make: number of occurrences of the word ‘make’
- yesno: outcome variable, a factor with levels n (not spam) and y (spam)
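For reference, 1813 spam items out of 4601 is about 39.4%, so a classifier that always predicts "not spam" is already about 60.6% accurate; that is the baseline to beat. The sketch below computes count-based versions of the six feature columns for a raw email. The interpretations are assumptions: words are matched case-insensitively as whole words, and crl.tot is taken as the summed length of runs of consecutive capital letters. The values shipped with the data may be on a different scale (for example relative frequencies), so check before mixing the two.

```python
import re


def email_features(text):
    """Count-based versions of the columns described above for one email."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {
        # Sum of lengths of runs of consecutive capital letters.
        "crl.tot": sum(len(run) for run in re.findall(r"[A-Z]+", text)),
        "dollar": text.count("$"),
        "bang": text.count("!"),
        "money": words.count("money"),
        "n000": text.count("000"),  # non-overlapping occurrences
        "make": words.count("make"),
    }


email = "MAKE MONEY FAST!!! Win $1000 today, make $5000 by Friday!"
features = email_features(email)
print(features)
```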
https://creativecommons.org/publicdomain/zero/1.0/
Shark Tank India: information on seasons 1 to 4, with 80 fields/columns and 630+ records.
All seasons/episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcast on SonyLiv OTT/Sony TV.
Here is the data dictionary for the Shark Tank India dataset.
https://creativecommons.org/publicdomain/zero/1.0/
The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:
Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
Transactional data: information about the transactions that occur on the Google Merchandise Store website.
Banner Photo by Edho Pratama from Unsplash.
What is the total number of transactions generated per device browser in July 2017?
The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?
What was the average number of product pageviews for users who made a purchase in July 2017?
What was the average number of product pageviews for users who did not make a purchase in July 2017?
What was the average total transactions per user that made a purchase in July 2017?
What is the average amount of money spent per session in July 2017?
What is the sequence of pages viewed?
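The real bounce rate question reduces to counting, per traffic source, the visits with exactly one pageview. A minimal sketch over (source, pageviews) pairs; in the Google Analytics 360 BigQuery export these values would, as far as we know, come from trafficSource.source and totals.pageviews, but treat that mapping as an assumption to verify. The sessions below are hypothetical.

```python
from collections import defaultdict


def real_bounce_rate(sessions):
    """Percentage of visits with exactly one pageview, per traffic source.

    `sessions` is an iterable of (source, pageviews) pairs, one per visit.
    """
    visits = defaultdict(int)
    bounces = defaultdict(int)
    for source, pageviews in sessions:
        visits[source] += 1
        if pageviews == 1:
            bounces[source] += 1
    return {s: 100.0 * bounces[s] / visits[s] for s in visits}


sessions = [("google", 1), ("google", 4), ("(direct)", 1),
            ("google", 1), ("(direct)", 3)]
rates = real_bounce_rate(sessions)
print(rates)
```

Grouping by a browser field and summing transactions instead of counting bounces answers the first question in the same way.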
About Dataset
The dataset contains information about sales transactions, including details such as the customer's age, gender and location, and the products sold. It includes both the cost of each product and the revenue generated from its sale, allowing profit and profit margins to be calculated. The customer age and gender information could be used to analyze purchasing behavior across demographic groups. The dataset likely includes both numeric and categorical data, which require different types of analysis and visualization techniques. Overall, the dataset appears to provide a comprehensive view of sales transactions, with potential for analysis at multiple levels, including by product, customer and location. On its own, however, the raw data does not offer ready-made insights for decision makers, so I:
- Explored the dataset to understand it.
- Cleaned it and added columns and calculations such as Net Profit and Age Status.
- Built a data model in Power Pivot, calculated measures such as Total Profit, COGS and Total Revenues, and built a KPI model.
- Then asked the following questions.
About distribution:
- What are the total revenues and profits?
- Which country sells the most in terms of revenue?
- What are the five best-selling states in terms of revenue?
- What are the five lowest-selling states in terms of revenue?
- How does customer age relate to revenue?
About profitability:
- What are the total revenues and profits?
- How do the months compare in terms of revenues and profits?
- How do the months compare in terms of COGS?
- Which categories sell best in terms of revenue and profit?
- What are the three best-selling sub-categories in terms of profit?
About KPIs:
- What is each salesperson's position relative to target?
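Since the data has both cost and revenue per sale, profit and margin follow directly: profit = revenue - cost and margin = profit / revenue. A minimal sketch with hypothetical (revenue, cost) pairs, treating total cost as COGS.

```python
def profit_and_margin(revenue, cost):
    """Profit and profit margin (as a fraction of revenue)."""
    profit = revenue - cost
    margin = profit / revenue if revenue else None  # undefined for zero revenue
    return profit, margin


transactions = [  # hypothetical (revenue, cost) pairs
    (950.0, 600.0),
    (120.0, 150.0),
]
total_revenue = sum(r for r, _ in transactions)
total_cost = sum(c for _, c in transactions)  # COGS
total_profit, overall_margin = profit_and_margin(total_revenue, total_cost)
print(total_revenue, total_cost, total_profit, round(overall_margin, 4))
```

The overall margin is computed from the totals rather than by averaging per-transaction margins; averaging would give a small loss-making sale the same weight as a large profitable one.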