Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description: - This dataset includes all 22 built-in datasets from the Seaborn library, a widely used Python data visualization tool. Seaborn's built-in datasets are essential resources for anyone interested in practicing data analysis, visualization, and machine learning. They span a wide range of topics, from classic datasets like the Iris flower classification to real-world data such as Titanic survival records and diamond characteristics.
This complete collection serves as an excellent starting point for anyone looking to improve their data science skills, offering a wide array of datasets suitable for both beginners and advanced users.
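If you prefer to pull the same built-in datasets straight from Seaborn rather than from this bundled copy, a minimal sketch (note that seaborn's load_dataset fetches the files from its online data repository, so an internet connection is assumed):

```python
import seaborn as sns

# List the names of every built-in dataset bundled with seaborn
print(sns.get_dataset_names())

# Load two of the classics mentioned above as pandas DataFrames
iris = sns.load_dataset("iris")
titanic = sns.load_dataset("titanic")
print(iris.head())
print(titanic.head())
```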
CC0 1.0 Public Domain: https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Muhammet Ikbal Elek
Released under CC0: Public Domain
CC0 1.0 Public Domain: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is about the Seaborn Python library; I was just practising.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Mohan Pradhan
Released under Apache 2.0
pip install numpy pandas scikit-learn matplotlib seaborn

import pandas as pd

# Sample dataset: Age, Salary, and whether they purchased (1 = Yes, 0 = No)
data = {
    'Age': [22, 25, 47, 52, 46, 56, 24, 27, 32, 37],
    'Salary': [20000, 25000, 50000, 60000, 58000, 70000, 22000, 27000, 32000, 37000],
    'Purchased': [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
}
df = pd.DataFrame(data)

# Split dataset into features (X) and target (y)
X = df[['Age', 'Salary']]  # Independent variables
y = df['Purchased']  # …

See the full description on the dataset page: https://huggingface.co/datasets/changamonika/python.
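To round out the workflow above with a model, a minimal sketch assuming scikit-learn; the 80/20 split and the choice of logistic regression are illustrative, not something the dataset page prescribes:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hold out a small test set (split ratio is an arbitrary illustrative choice)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a simple classifier on Age and Salary
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out rows
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```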
This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.
About the Dataset:
- CID (Customer ID): A unique identifier for each customer.
- TID (Transaction ID): A unique identifier for each transaction.
- Gender: The gender of the customer, categorized as Male or Female.
- Age Group: Age group of the customer, divided into several ranges.
- Purchase Date: The timestamp of when the transaction took place.
- Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
- Discount Availed: Indicates whether the customer availed any discount (Yes/No).
- Discount Name: Name of the discount applied (e.g., FESTIVE50).
- Discount Amount (INR): The amount of discount availed by the customer.
- Gross Amount: The total amount before applying any discount.
- Net Amount: The final amount after applying the discount.
- Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
- Location: The city where the purchase took place.
Use Cases:
1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
3. Data Visualization: Use tools like Python's Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.
This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.
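As one illustration of the EDA and visualization use cases above, a hedged sketch; the file name transactions.csv is an assumption, and the column names follow the field list above (the actual CSV headers may differ slightly):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the synthetic transactions (file name assumed)
df = pd.read_csv("transactions.csv", parse_dates=["Purchase Date"])

# Quick structural overview
print(df.info())
print(df["Product Category"].value_counts())

# Net amount by product category, split by whether a discount was availed
plt.figure(figsize=(10, 5))
sns.boxplot(data=df, x="Product Category", y="Net Amount", hue="Discount Availed")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```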
This is not a real dataset. It was generated using Python's Faker library for the sole purpose of learning.
Exploratory Data Analysis of Luxury Watch Prices
Overview
This project analyzes a large dataset of luxury watches to understand which factors influence price. We focus on brand, movement type, case material, size, gender, and production year. All work was done in Python (Pandas, NumPy, Matplotlib/Seaborn) on Google Colab.
Dataset
Rows: ~172,000
Columns: 14
Unit of observation: one watch listing
Main columns
name – watch/listing title
price – listed… See the full description on the dataset page: https://huggingface.co/datasets/yotam22/watches.
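A hedged sketch of the kind of price breakdown described above; the file name and the brand/movement column names are assumptions, since only part of the schema is listed here (see the dataset page for the full column list):

```python
import pandas as pd

# Hypothetical local export of the listings (file and column names assumed)
df = pd.read_csv("watches.csv")

# Median listed price by brand, one of the factors discussed above
median_by_brand = (
    df.groupby("brand")["price"]
      .median()
      .sort_values(ascending=False)
)
print(median_by_brand.head(10))

# Median price by movement type as a second cut
print(df.groupby("movement")["price"].median().sort_values(ascending=False))
```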
CC0 1.0 Public Domain: https://creativecommons.org/publicdomain/zero/1.0/
These are the visuals created from the findings of the Cyclistic bike-share data analysis, using the latest 12 months of data covering January 2022 to December 2022.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Required Libraries
The following libraries are required to run the scripts in this repository. You can install them using `pip`:
```bash
pip install pandas numpy argparse json time random openai copy statistics krippendorff sklearn seaborn matplotlib together anthropic google-generativeai
```
Make sure to also install any other dependencies required by the specific model API if you plan on using models like GPT-4 or Claude:
openai, anthropic, together. All the experiments were done using Python 3.10.11.
For each dataset, we have a folder that contains process.py, heatmap.py, ira_sample.py. The folder also contains the relevant datasets and plots.
File Description:
Commands for datasets (Except Code Summarization):
Generating samples for different models:
python process.py --model gpt-4 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
python process.py --model gpt-3.5-turbo --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
python process.py --model llama3 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
python process.py --model mixtral --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
python process.py --model claude --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
python process.py --model gemini --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
For Figure (1-5):
python heatmap.py
For Figure (7-10):
python ira_sample.py
Commands for datasets (Code Summarization):
python process.py --what accurate --model gpt-4 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
python process.py --what accurate --model gpt-3.5-turbo --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
python process.py --what accurate --model llama3 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
python process.py --what accurate --model mixtral --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
python process.py --what accurate --model claude --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
python process.py --what accurate --model gemini --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
For Figure (1-5):
python heatmap.py
For Figure (7-10):
python ira_sample.py
The --what argument can be one of: "accurate", "adequate", "concise", "similarity"
For Figure 6:
python scatter.py
For Figure 12 & 13, please copy majority.py and probability.py outside the shared folders.
For Figure 12:
python probability.py
For Figure 13:
python majority.py
We also provide sample prompts from all datasets in Prompts.pdf.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The analysed data and complete scripts for the permutation tests and mixed linear regression models (MLRMs) used in the paper 'Identifying Key Drivers of Product Formation in Microbial Electrosynthesis with a Mixed Linear Regression Analysis'.
Python version 3.10.13 with the packages numpy, pandas, os, scipy.optimize, scipy.stats, sklearn.metrics, matplotlib.pyplot, statsmodels.formula.api, and seaborn is required to run the .py files. Ensure all packages are installed before running the scripts. Data files required to run the code (.xlsx and .csv format) are included in the relevant folders.
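Purely as a hedged sketch of how a mixed linear regression can be fitted with statsmodels.formula.api (the library the scripts rely on); the file name, column names, and grouping variable below are placeholders, not the paper's actual variables:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder input file and columns; the real data files live in the dataset folders
df = pd.read_excel("mes_data.xlsx")

# Mixed linear model: fixed effects for two hypothetical predictors,
# with a random intercept per (hypothetical) reactor group
model = smf.mixedlm(
    "product_rate ~ current_density + ph",
    data=df,
    groups=df["reactor_id"],
)
result = model.fit()
print(result.summary())
```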
GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0-standalone.html
Data for 2D Lagrangian Particle tracking and evaluation of their hydrodynamic characteristics

## Abstract
This dataset entails Python code for fluid-mechanic evaluation of Lagrangian Particles with the "Consensus-Based tracking with Selective Rejection of Tracklets" (CSRT) algorithm in the OpenCV library, written by Ryan Rautenbach in the framework of his Master thesis.

## Workflow for Lagrangian Particle tracking and evaluation via OpenCV
In the following, a brief introduction and guide based on the folders in the repository is laid out. More code-specific instructions can be found in the respective codes.

working_env_RMR.yml --> Contains the entire environment, including software versions (here used with the Spyder IDE and Conda), with which the datasets were evaluated.

01 --> The tracking always begins with the same 01_milti[...] folder, in which the Python code with the OpenCV algorithm is located. For the tracking to work, certain directories are required: one in which the raw images are stored (separate from anything else) and one in which the results are saved (not the same directory as the raw data). After tracking is completed for all respective experiments and the results directories are adequately labelled and stored, any of the other code files can be used for the respective analyses. The order of folders beyond the first 01 directory has no relevance to the order of evaluation, but following it can ease the understanding of the evaluated data.

02 --> Evaluation of the number of circulations and the respective circulation time in the experimental vat. (The code can be extended to calculate the circulation time based on the various planes that are artificially set.)

03 --> Code for calculating the number of contacts with the vat floor. The code requires certain visual evaluations based on the LP trajectories, as the plane/barrier for the contact evaluation has to be set manually.

04 --> Contains two codes that can be applied to results data to combine individual results into larger, more processable arrays within Python.

05 --> Contains the code to plot the trajectory of single experiments of Lagrangian particles based on their positional results and velocity at the respective position, highlighting the trajectory over the experiment.

06 --> Codes to create 1D histograms based on the probability density distribution and velocity distributions in cumulative experiments.

07 --> Codes for plotting the 2D probability density distribution (2D histograms) of Lagrangian Particles based on the cumulative experiments. The code provides values for the 2D grid; plotting is conducted in Origin Lab or similar graphing tools. Graphing can also be conducted in Python, whereby the seaborn (matplotlib) library is suggested.

08 --> Contains the code for the dimensionless evaluation of the results based on the respective Stokes number approaches and weighted averages. 2D histograms are also vital to this evaluation, whereby the plotting is again conducted in Origin Lab, as values are only calculated in code.

09 --> Does not contain any Python code; instead it contains the respective Origin Lab files for the graphing, plotting and evaluation of results calculated via Python. Respective tables, histograms and heat maps are given here to be used as templates if necessary.
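The repository's own tracking script lives in the 01 folder; purely as a hedged illustration of the CSRT tracker the workflow is built around (OpenCV's generic API, not the repository's exact code), a minimal sketch on an image sequence:

```python
import glob
import cv2

# Read the raw frames from a directory (path is an assumption)
frames = sorted(glob.glob("raw_images/*.png"))

# Initialise the CSRT tracker on a manually selected bounding box in the first frame.
# Depending on the OpenCV build, the constructor may be cv2.legacy.TrackerCSRT_create.
tracker = cv2.TrackerCSRT_create()
first = cv2.imread(frames[0])
bbox = cv2.selectROI("select particle", first, showCrosshair=True)
tracker.init(first, bbox)

positions = []
for path in frames[1:]:
    frame = cv2.imread(path)
    ok, box = tracker.update(frame)
    if ok:
        x, y, w, h = box
        positions.append((x + w / 2, y + h / 2))  # particle centre in pixels

print(f"Tracked the particle through {len(positions)} frames")
```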
U.S. Government Works: https://www.usa.gov/government-works/
This dataset was created by hardly_human
Released under U.S. Government Works
The bank.csv dataset describes phone calls between customers and customer-care staff working for a Portuguese banking institution. It records whether the customer took up a scheme or product such as a bank term deposit; most of these fields contain 'yes' or 'no' values.
The main goal is to predict if clients will subscribe to a term deposit or not.
Bank Client Data:
1 - age: (numeric)
2 - job: type of job (categorical: admin., blue-collar, entrepreneur, housemaid, management, retired, self-employed, services, student, technician, unemployed, unknown)
3 - marital: marital status (categorical: divorced, married, single, unknown; note: divorced means either divorced or widowed)
4 - education: (categorical: basic.4y, basic.6y, basic.9y, high.school, illiterate, professional.course, university.degree, unknown)
5 - default: has credit in default? (categorical: no, yes, unknown)
6 - housing: has housing loan? (categorical: no, yes, unknown)
7 - loan: has personal loan? (categorical: no, yes, unknown)

Related with the Last Contact of the Current Campaign:
8 - contact: contact communication type (categorical: cellular, telephone)
9 - month: last contact month of year (categorical: jan, feb, mar, ..., nov, dec)
10 - day_of_week: last contact day of the week (categorical: mon, tue, wed, thu, fri)
11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.

Other Attributes:
12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
14 - previous: number of contacts performed before this campaign and for this client (numeric)
15 - poutcome: outcome of the previous marketing campaign (categorical: failure, nonexistent, success)

Social and Economic Context Attributes:
16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
17 - cons.price.idx: consumer price index - monthly indicator (numeric)
18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
20 - nr.employed: number of employees - quarterly indicator (numeric)

Output Variable (Desired Target):
21 - y (deposit): has the client subscribed to a term deposit? (binary: yes, no) -> column title changed from '***y***' to '***deposit***'
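A hedged sketch of the prediction task, dropping duration as the note above advises; scikit-learn is assumed, the target column is taken to be deposit (per the renamed output variable), and depending on the export the CSV may use a semicolon separator:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("bank.csv")

# Drop 'duration' for a realistic model, as recommended above
X = df.drop(columns=["deposit", "duration"])
y = df["deposit"].map({"yes": 1, "no": 0})

# One-hot encode the categorical columns, pass numeric columns through unchanged
categorical = X.select_dtypes(include="object").columns.tolist()
preprocess = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)

clf = Pipeline([("prep", preprocess), ("model", LogisticRegression(max_iter=1000))])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf.fit(X_train, y_train)
print("Held-out accuracy:", clf.score(X_test, y_test))
```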
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Quantitative data of figures and graphing scripts from the thesis titled 'Developing a congestion management scheme to reduce the impact of congestion in mixed traffic LoRaWANs'. The files contain the processed output of simulations conducted with a modified version of the ns-3 plugin lorawan. The processed simulation output consists of Pandas dataframes stored in text files. Software used: ns-3 (version 3.30), Jupyter notebooks, and Python with the packages sem, pandas, seaborn, and a modified version of the lorawan module from signetlabdei. The Python scripts refer to Std and Ex: Std refers to the standard LoRaWAN module, and Ex refers to the extended version of the module with the algorithms presented in the thesis. Text files contain a legend at the top listing all of the fields present in the dataframe.
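A hedged sketch of reading one of the processed dataframe text files and plotting it with seaborn; the file name, separator, and column names are placeholders, since each file documents its own fields in the legend at the top:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder file and parsing options; check each file's legend for the real columns
df = pd.read_csv("results_std.txt", sep=r"\s+", comment="#")

# Example comparison plot with placeholder column names
sns.lineplot(data=df, x="offered_traffic", y="packet_delivery_ratio", hue="scenario")
plt.xlabel("Offered traffic")
plt.ylabel("Packet delivery ratio")
plt.tight_layout()
plt.show()
```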
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective
The primary objective of this study was to analyze CpG dinucleotide dynamics in coronaviruses by comparing Wuhan-Hu-1 with its closest and most distant relatives. Heatmaps were generated to visualize CpG counts and O/E ratios across intergenic regions, providing a clear depiction of conserved and divergent CpG patterns.

Methods
1. Data Collection
Source: The dataset includes CpG counts and O/E ratios for various coronaviruses, extracted from publicly available genomic sequences.
Format: Data was compiled into a CSV file containing columns for intergenic regions, CpG counts, and O/E ratios for each virus.

2. Preprocessing
Data Cleaning: Missing values (NaN), infinite values (inf, -inf), and blank entries were handled using Python's pandas library. Missing values were replaced with column means, and infinite values were capped at a large finite value (1e9).
Reshaping: The data was reshaped into matrices for CpG counts and O/E ratios using pandas' melt() and pivot() functions.

3. Distance Calculation
Euclidean Distance: Pairwise Euclidean distances were calculated between Wuhan-Hu-1 and other viruses using the scipy.spatial.distance.euclidean function. Distances were computed separately for CpG counts and O/E ratios, and the total distance was derived as the sum of both metrics.

4. Identification of Closest and Distant Relatives
The virus with the smallest total distance was identified as the closest relative. The virus with the largest total distance was identified as the most distant relative.

5. Heatmap Generation
Tools: Heatmaps were generated using Python's seaborn library (sns.heatmap) and matplotlib for visualization.
Parameters: Heatmaps were annotated with numerical values for clarity. A color gradient (coolwarm) was used to represent varying CpG counts and O/E ratios. Titles and axis labels were added to describe the comparison between Wuhan-Hu-1 and its relatives.

Results
Closest Relative: The closest relative to Wuhan-Hu-1 was identified based on the smallest Euclidean distance. Heatmaps for CpG counts and O/E ratios show high similarity in specific intergenic regions.
Most Distant Relative: The most distant relative was identified based on the largest Euclidean distance. Heatmaps reveal significant differences in CpG dynamics compared to Wuhan-Hu-1.

Tools and Libraries
Programming Language: Python 3.13
Libraries: pandas (data manipulation and cleaning), numpy (numerical operations and handling missing/infinite values), scipy.spatial.distance (Euclidean distances), seaborn (heatmaps), matplotlib (additional visualization enhancements).
File Formats: Input: CSV files containing CpG counts and O/E ratios. Output: PNG images of heatmaps.

Files Included
CSV File: Contains the raw data of CpG counts and O/E ratios for all viruses.
Heatmap Images: Heatmaps for CpG counts and O/E ratios comparing Wuhan-Hu-1 with its closest and most distant relatives.
Python Script: Full Python code used for data processing, distance calculation, and heatmap generation.

Usage Notes
Researchers can use this dataset to further explore the evolutionary dynamics of CpG dinucleotides in coronaviruses. The Python script can be adapted to analyze other viral genomes or datasets. Heatmaps provide a visual summary of CpG dynamics, aiding in hypothesis generation and experimental design.

Acknowledgments
Special thanks to the open-source community for developing tools like pandas, numpy, seaborn, and matplotlib. This work was conducted as part of an independent research project in molecular biology and bioinformatics.

License
This dataset is shared under the CC BY 4.0 License, allowing others to share and adapt the material as long as proper attribution is given.
DOI: 10.6084/m9.figshare.28736501
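A hedged sketch of the core steps in the methods above (cleaning, reshaping, Euclidean distances, and an annotated heatmap); the CSV layout and column names are assumptions based on the description, not the exact files:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.spatial.distance import euclidean

# Assumed layout: one row per (virus, intergenic region) with CpG count and O/E ratio
df = pd.read_csv("cpg_data.csv")
df = df.replace([np.inf, -np.inf], 1e9)        # cap infinite values at a large finite value
df = df.fillna(df.mean(numeric_only=True))     # fill missing values with column means

# Pivot to a region x virus matrix of O/E ratios
oe = df.pivot(index="region", columns="virus", values="oe_ratio")

# Euclidean distance of each virus to Wuhan-Hu-1
ref = oe["Wuhan-Hu-1"]
distances = {v: euclidean(ref, oe[v]) for v in oe.columns if v != "Wuhan-Hu-1"}
print("Closest:", min(distances, key=distances.get))
print("Most distant:", max(distances, key=distances.get))

# Annotated heatmap with a coolwarm gradient
sns.heatmap(oe, annot=True, cmap="coolwarm")
plt.title("CpG O/E ratios per intergenic region")
plt.tight_layout()
plt.savefig("cpg_oe_heatmap.png")
```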
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This synthetic dataset is designed specifically for practicing data visualization and exploratory data analysis (EDA) using popular Python libraries like Seaborn, Matplotlib, and Pandas.
Unlike most public datasets, this one includes a diverse mix of column types:
- Date columns (for time series and trend plots)
- Numerical columns (for histograms, boxplots, scatter plots)
- Categorical columns (for bar charts, group analysis)
Whether you are a beginner learning how to visualize data or an intermediate user testing new charting techniques, this dataset offers a versatile playground.
Feel free to:
- Create EDA notebooks
- Practice plotting techniques
- Experiment with filtering, grouping, and aggregations

No missing values, no data cleaning needed; just download and start exploring!
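A hedged starter sketch; the file name and the date/numerical/categorical column names below are placeholders for whichever columns the download actually contains:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder file and column names; substitute the real ones after downloading
df = pd.read_csv("synthetic_eda.csv", parse_dates=["date"])

# Trend plot from the date column
sns.lineplot(data=df, x="date", y="sales")
plt.show()

# Distribution of a numerical column split by a categorical column
sns.boxplot(data=df, x="category", y="sales")
plt.show()
```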
Hope you find this helpful. Looking forward to hearing from you all.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
About the Datasets:
- Domain: Marketing
- Project: User Profiling and Segmentation
- Datasets: user_profile_for_ads.csv
- Dataset Type: Excel Data
- Dataset Size: 16k+ records
KPIs:
1. Distribution of Key Demographic Variables:
   a. Count of Age
   b. Count of Gender
   c. Count of Education Level
   d. Count of Income Level
   e. Count of Device Usage
2. Understanding Online Behavior:
   a. Count of Time Spent Online (hrs/Weekday)
   b. Count of Time Spent Online (hrs/Weekend)
3. Ad Interaction Metrics:
   a. Count of Likes and Reactions
   b. Count of Click-Through Rates (CTR)
   c. Count of Conversion Rate
   d. Count of Ad Interaction Time (secs)
   e. Count of Ad Interaction Time by Top Interests
Process: 1. Understanding the problem 2. Data Collection 3. Exploring and analyzing the data 4. Interpreting the results
The accompanying analysis code uses: pandas, matplotlib, seaborn, isnull, set_style, suptitle, countplot, palette, tight_layout, figsize, histplot, barplot, sklearn, StandardScaler, OneHotEncoder, ColumnTransformer, Pipeline, KMeans, cluster_means, groupby, numpy, radar_df.
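Based on the functions listed above (StandardScaler, OneHotEncoder, ColumnTransformer, Pipeline, KMeans), a hedged sketch of the segmentation step; the file name matches the dataset named above, while the feature column names are placeholders:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("user_profile_for_ads.csv")

# Placeholder feature split; adjust to the actual column names
numeric = ["Time Spent Online (hrs/Weekday)", "Time Spent Online (hrs/Weekend)"]
categorical = ["Gender", "Education Level", "Income Level", "Device Usage"]

# Scale numeric features and one-hot encode categorical features
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

pipeline = Pipeline([
    ("prep", preprocess),
    ("kmeans", KMeans(n_clusters=5, random_state=42, n_init=10)),
])

# Assign each user to a segment and summarise the clusters
df["segment"] = pipeline.fit_predict(df)
cluster_means = df.groupby("segment")[numeric].mean()
print(cluster_means)
```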
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
To compare baseball player statistics effectively using visualization, we can create some insightful plots. Below are the steps to accomplish this in Python using libraries like Pandas and Matplotlib or Seaborn.
First, we need to load the judge.csv file into a DataFrame. This will allow us to manipulate and analyze the data easily.
Before creating visualizations, it's good to understand the data structure and identify the columns we want to compare. The relevant columns in your data include pitch_type, release_speed, game_date, and events.
We can create various visualizations, such as:
- A bar chart to compare the average release speed of different pitch types.
- A line plot to visualize trends over time based on game dates.
- A scatter plot to analyze the relationship between release speed and the outcome of the pitches (e.g., strikeouts, home runs).
Here is a sample code to demonstrate how to create these visualizations using Matplotlib and Seaborn:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the data
df = pd.read_csv('judge.csv')
# Display the first few rows of the dataframe
print(df.head())
# Set the style of seaborn
sns.set(style="whitegrid")
# 1. Average Release Speed by Pitch Type
plt.figure(figsize=(12, 6))
avg_speed = df.groupby('pitch_type')['release_speed'].mean().sort_values()
sns.barplot(x=avg_speed.values, y=avg_speed.index, palette="viridis")
plt.title('Average Release Speed by Pitch Type')
plt.xlabel('Average Release Speed (mph)')
plt.ylabel('Pitch Type')
plt.show()
# 2. Trends in Release Speed Over Time
# First, convert the 'game_date' to datetime
df['game_date'] = pd.to_datetime(df['game_date'])
plt.figure(figsize=(14, 7))
sns.lineplot(data=df, x='game_date', y='release_speed', estimator='mean', ci=None)
plt.title('Trends in Release Speed Over Time')
plt.xlabel('Game Date')
plt.ylabel('Average Release Speed (mph)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# 3. Scatter Plot of Release Speed vs. Events
plt.figure(figsize=(12, 6))
sns.scatterplot(data=df, x='release_speed', y='events', hue='pitch_type', alpha=0.7)
plt.title('Release Speed vs. Events')
plt.xlabel('Release Speed (mph)')
plt.ylabel('Event Type')
plt.legend(title='Pitch Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
These visualizations will help you compare player statistics in a meaningful way. You can customize the plots further based on your specific needs, such as filtering data for specific players or seasons. If you have any specific comparisons in mind or additional data to visualize, let me know!
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains coffee shop transaction records, including details about sales, payment type, time of purchase, and customer preferences. It is specifically curated for data visualization, dashboard building, and business analytics projects in tools like Power BI, Tableau, and Python visualization libraries (Matplotlib, Seaborn, Plotly).
With attributes covering time of day, weekdays, months, coffee types, and revenue, this dataset provides a strong foundation for analyzing customer behavior, sales patterns, and business performance trends.
Dataset Structure-
File format: CSV
Columns (features):
hour_of_day – Hour of purchase (0–23)
cash_type – Mode of payment (cash / card)
money – Transaction amount (in local currency)
coffee_name – Type of coffee purchased (e.g., Latte, Americano, Hot Chocolate)
Time_of_Day – Categorized time of purchase (Morning, Afternoon, Night)
Weekday – Day of the week (e.g., Mon, Tue, …)
Month_name – Month of purchase (e.g., Jan, Feb, Mar)
Weekdaysort – Numeric representation for weekday ordering (1 = Mon, 7 = Sun)
Monthsort – Numeric representation for month ordering (1 = Jan, 12 = Dec)
Date – Date of transaction (YYYY-MM-DD)
Time – Exact time of transaction (HH:MM:SS)
Potential Data Visualizations-
This dataset is well-suited for interactive dashboards and visual reports, such as:
Sales by Coffee Type (e.g., top-selling drinks)
Sales by Hour of Day (peak business hours)
Sales by Time of Day (Morning vs Afternoon vs Night trends)
Sales by Weekday & Month (seasonal & weekly demand patterns)
Payment Method Breakdown (cash vs card usage)
Revenue Trends Over Time (daily/monthly growth analysis)
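A hedged Python sketch of two of the charts listed above, using the column names from the structure section; the CSV file name is an assumption:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# File name assumed; columns follow the structure described above
df = pd.read_csv("coffee_sales.csv")

# Revenue by coffee type (top-selling drinks)
by_drink = df.groupby("coffee_name")["money"].sum().sort_values(ascending=False)
sns.barplot(x=by_drink.values, y=by_drink.index)
plt.xlabel("Revenue")
plt.title("Sales by Coffee Type")
plt.tight_layout()
plt.show()

# Transaction counts by hour of day (peak business hours)
sns.countplot(data=df, x="hour_of_day")
plt.title("Sales by Hour of Day")
plt.tight_layout()
plt.show()
```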
Use Cases-
Power BI / Tableau dashboards
Python data visualization (Matplotlib, Seaborn, Plotly)
Data storytelling projects
Business intelligence & decision-making simulations
Training projects for beginners in data visualization & analytics
CC0 1.0 Public Domain: https://creativecommons.org/publicdomain/zero/1.0/
Description: This dataset contains sales records from a global superstore, including details about orders, customers, products, shipping, and profitability. The goal of this analysis is to uncover business insights related to sales performance, regional trends, shipping efficiency, and product profitability.
Key Objectives:
Analyze overall sales, profit, and discount trends
Identify top-performing regions, segments, and categories
Evaluate the impact of shipping mode on delivery and profit
Perform RFM (Recency, Frequency, Monetary) analysis on customers
Visualize key metrics with Matplotlib and Seaborn
Dataset Features:
Order ID – Unique identifier for each order
Order Date – Date when the order was placed
Ship Mode – Mode of shipping used
Customer Name – Full name of the customer
Segment – Customer segment (Consumer, Corporate, Home Office)
Region – Geographical region of the order
Product Category – Category and sub-category of the product
Sales – Sales amount
Quantity – Number of units sold
Profit – Profit earned on the sale
Discount – Discount applied on the product
Tools & Libraries:
Python
Pandas, NumPy – for data manipulation
Matplotlib, Seaborn – for data visualization
Excel – for data import and inspection
Business Impact: By understanding sales and profitability patterns, this analysis helps identify opportunities for cost optimization, product focus, and regional strategy, all essential for scaling business performance.
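As a hedged sketch of the RFM objective listed above, assuming the column names from the features list; the file name is a placeholder:

```python
import pandas as pd

df = pd.read_csv("superstore.csv", parse_dates=["Order Date"])

# Reference date: the day after the last order in the data
snapshot = df["Order Date"].max() + pd.Timedelta(days=1)

# Recency, Frequency, Monetary per customer
rfm = df.groupby("Customer Name").agg(
    Recency=("Order Date", lambda s: (snapshot - s.max()).days),
    Frequency=("Order ID", "nunique"),
    Monetary=("Sales", "sum"),
)

# Quartile-based scores (1 = weakest, 4 = strongest)
rfm["R"] = pd.qcut(rfm["Recency"], 4, labels=[4, 3, 2, 1])
rfm["F"] = pd.qcut(rfm["Frequency"].rank(method="first"), 4, labels=[1, 2, 3, 4])
rfm["M"] = pd.qcut(rfm["Monetary"], 4, labels=[1, 2, 3, 4])
print(rfm.head())
```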