This dataset was created by Deepali Sukhdeve.
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Data Cleaning from Public Nashville Housing Data (a hedged SQL sketch follows the list):
Standardize the Date Format
Populate Property Address data
Break out Addresses into Individual Columns (Address, City, State)
Change Y and N to Yes and No in the "Sold as Vacant" field
Remove Duplicates
Delete Unused Columns
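A minimal sketch of the first two steps, assuming a SQL Server table named NashvilleHousing with SaleDate and PropertyAddress columns (the names are illustrative, not confirmed by the project files):

```sql
-- Standardize the date format by storing SaleDate without its time component.
ALTER TABLE NashvilleHousing ADD SaleDateConverted DATE;
UPDATE NashvilleHousing
SET SaleDateConverted = CONVERT(DATE, SaleDate);

-- Break the property address into separate street and city columns.
ALTER TABLE NashvilleHousing ADD PropertySplitAddress NVARCHAR(255), PropertySplitCity NVARCHAR(255);
UPDATE NashvilleHousing
SET PropertySplitAddress = SUBSTRING(PropertyAddress, 1, CHARINDEX(',', PropertyAddress) - 1),
    PropertySplitCity    = LTRIM(SUBSTRING(PropertyAddress, CHARINDEX(',', PropertyAddress) + 1, LEN(PropertyAddress)))
WHERE CHARINDEX(',', PropertyAddress) > 0;   -- skip rows without a comma-separated city
```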
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Data exploration, cleaning, and arrangement of Covid death and Covid vaccination data, covering the following (a hedged SQL sketch follows the list):
Select the data to be used
Show the likelihood of dying if you contract Covid in your country
Show what percentage of the population got Covid
Look at countries with the highest infection rate compared to the population
Show the country with the highest death count per population
Break things down by continent
Show the continents with the highest death count per population
Look at total population vs. vaccinations
Use a CTE and a temp table
Create a view to store data for later visualizations
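A hedged sketch of the CTE and view steps, assuming tables named CovidDeaths and CovidVaccinations joined on location and date, with population and new_vaccinations columns (names follow the commonly used Our World in Data files but are assumptions here):

```sql
-- CTE: rolling count of people vaccinated per country, then percent of population.
WITH PopVsVac AS (
    SELECT d.continent, d.location, d.date, d.population, v.new_vaccinations,
           SUM(CAST(v.new_vaccinations AS BIGINT)) OVER (
               PARTITION BY d.location ORDER BY d.date
           ) AS rolling_people_vaccinated
    FROM CovidDeaths d
    JOIN CovidVaccinations v
      ON d.location = v.location AND d.date = v.date
    WHERE d.continent IS NOT NULL
)
SELECT *, rolling_people_vaccinated * 100.0 / population AS pct_population_vaccinated
FROM PopVsVac;

-- View: store the same logic for later visualizations.
CREATE VIEW PercentPopulationVaccinated AS
SELECT d.continent, d.location, d.date, d.population, v.new_vaccinations,
       SUM(CAST(v.new_vaccinations AS BIGINT)) OVER (
           PARTITION BY d.location ORDER BY d.date
       ) AS rolling_people_vaccinated
FROM CovidDeaths d
JOIN CovidVaccinations v
  ON d.location = v.location AND d.date = v.date
WHERE d.continent IS NOT NULL;
```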
The dataset contained information on housing data in the Nashville, TN area. I used SQL Server to clean the data and make it easier to use. For example, I converted dates to remove unnecessary timestamps; I populated data for null values; I split the combined address column into separate address, city, and state columns; I made a column that had different representations of the same value consistent; I removed duplicate rows; and I deleted unused columns.
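As one hedged example, the duplicate-removal step can be done with a ROW_NUMBER() CTE in SQL Server; the table name and the columns used to define a duplicate (ParcelID, PropertyAddress, SaleDate, SalePrice, UniqueID) are assumptions here:

```sql
-- Delete duplicate rows, keeping one row per logical record.
WITH RowNumCTE AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY ParcelID, PropertyAddress, SaleDate, SalePrice
               ORDER BY UniqueID
           ) AS row_num
    FROM NashvilleHousing
)
DELETE FROM RowNumCTE
WHERE row_num > 1;
```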
This is an in-depth analysis I created using data pulled from an open-source (ODbL) data project provided on Kaggle:
Pavansubhash. (2017). IBM HR Analytics Employee Attrition & Performance, Version 1. Retrieved August 3rd, 2023 from https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset.
Problem: The VP of People Operations/HR at [Company] wants to better understand what efforts they can make to retain more employees every year.
Question: How do education, job involvement, and work-life balance affect employee attrition?
Metrics
A survey was sent to 2,068 current and past employees, asking a series of clear and consistent questions about different workplace variables. The surveys were anonymous to encourage truthful answers and protect the integrity of the data collected. The rating scales were as follows (a query sketch follows the scales):
Education: 1) Below College 2) Some College 3) Bachelor 4) Master 5) Doctor
Job Involvement: 1) Low 2) Medium 3) High 4) Very High
Work Life Balance: 1) Bad 2) Good 3) Better 4) Best
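A sketch of how these ordinal fields can be related to attrition in SQL; the table name HR_Attrition is hypothetical, and the Education and Attrition column names are assumed from the Kaggle dataset:

```sql
-- Attrition rate by education level (1 = Below College ... 5 = Doctor).
SELECT Education,
       COUNT(*) AS employees,
       SUM(CASE WHEN Attrition = 'Yes' THEN 1 ELSE 0 END) AS leavers,
       ROUND(100.0 * SUM(CASE WHEN Attrition = 'Yes' THEN 1 ELSE 0 END) / COUNT(*), 1) AS attrition_rate_pct
FROM HR_Attrition
GROUP BY Education
ORDER BY Education;
```

The same grouping can be repeated for JobInvolvement and WorkLifeBalance to compare the three factors side by side.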
Title: Practical Exploration of SQL Constraints: Building a Foundation in Data Integrity

Introduction: Welcome to my Data Analysis project, where I focus on mastering SQL constraints, a pivotal aspect of database management. This project centers on hands-on experience with SQL's Data Definition Language (DDL) commands, emphasizing constraints such as PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, and DEFAULT. In this project, I aim to demonstrate my foundational understanding of enforcing data integrity and maintaining a structured database environment.

Purpose: The primary purpose of this project is to showcase my proficiency in implementing and managing SQL constraints for robust data governance. By delving into the realm of constraints, you'll gain insights into my SQL skills and how I utilize constraints to ensure data accuracy, consistency, and reliability within relational databases.

What to Expect: Within this project, you will find a series of exercises that focus on the implementation and utilization of SQL constraints. They highlight my command over the following key constraint types:
NOT NULL: Ensuring the presence of essential data in a column.
PRIMARY KEY: Ensuring unique identification of records for data integrity.
FOREIGN KEY: Establishing relationships between tables to maintain referential integrity.
UNIQUE: Guaranteeing the uniqueness of values within specified columns.
CHECK: Implementing custom conditions to validate data entries.
DEFAULT: Setting default values for columns to enhance data reliability.

Each exercise is accompanied by clear and concise SQL scripts, explanations of the intended outcomes, and practical insights into the application of these constraints. My goal is to showcase how SQL constraints serve as crucial tools for creating a structured and dependable database foundation. I invite you to explore these exercises in detail, where I provide hands-on examples that highlight the importance and utility of SQL constraints. Together, they underscore my commitment to upholding data quality, ensuring data accuracy, and harnessing the power of SQL constraints for informed decision-making in data analysis.

3.1 CONSTRAINT - ENFORCING NOT NULL CONSTRAINT WHILE CREATING A NEW TABLE.
3.2 CONSTRAINT - ENFORCING NOT NULL CONSTRAINT ON AN EXISTING COLUMN.
3.3 CONSTRAINT - ENFORCING PRIMARY KEY CONSTRAINT WHILE CREATING A NEW TABLE.
3.4 CONSTRAINT - ENFORCING PRIMARY KEY CONSTRAINT ON AN EXISTING COLUMN.
3.5 CONSTRAINT - ENFORCING FOREIGN KEY CONSTRAINT WHILE CREATING A NEW TABLE.
3.6 CONSTRAINT - ENFORCING FOREIGN KEY CONSTRAINT ON AN EXISTING COLUMN.
3.7 CONSTRAINT - ENFORCING UNIQUE CONSTRAINT WHILE CREATING A NEW TABLE.
3.8 CONSTRAINT - ENFORCING UNIQUE CONSTRAINT IN AN EXISTING TABLE.
3.9 CONSTRAINT - ENFORCING CHECK CONSTRAINT IN A NEW TABLE.
3.10 CONSTRAINT - ENFORCING CHECK CONSTRAINT IN AN EXISTING TABLE.
3.11 CONSTRAINT - ENFORCING DEFAULT CONSTRAINT IN A NEW TABLE.
3.12 CONSTRAINT - ENFORCING DEFAULT CONSTRAINT IN AN EXISTING TABLE.
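The sketch below illustrates the constraint types listed above in one place; the table and column names are made up for illustration and do not come from the project scripts (SQL Server syntax assumed):

```sql
CREATE TABLE departments (
    dept_id   INT PRIMARY KEY,               -- PRIMARY KEY: unique identification of records
    dept_name VARCHAR(100) NOT NULL UNIQUE   -- NOT NULL + UNIQUE on the same column
);

CREATE TABLE employees (
    emp_id    INT PRIMARY KEY,
    emp_name  VARCHAR(100) NOT NULL,              -- NOT NULL: essential data must be present
    salary    DECIMAL(10,2) CHECK (salary > 0),   -- CHECK: custom validation condition
    hire_date DATE DEFAULT GETDATE(),             -- DEFAULT: fallback value when none is supplied
    dept_id   INT,
    CONSTRAINT fk_emp_dept FOREIGN KEY (dept_id)  -- FOREIGN KEY: referential integrity
        REFERENCES departments (dept_id)
);

-- Enforcing a constraint on an existing table (the pattern used by exercises 3.2, 3.4, 3.6, 3.8, 3.10, and 3.12).
ALTER TABLE employees
    ADD CONSTRAINT uq_emp_name UNIQUE (emp_name);
```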
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
I generated a database of supermarket sales data in order to practice determining KPIs and making data visualizations; a sample KPI query appears after the file descriptions below.
This data set includes:
- Unique sales id for each row.
- Branch of the supermarket (New York, Chicago, and Los Angeles).
- City of the supermarket (New York, Chicago, and Los Angeles).
- Customer type (Member or Normal); members receive reward points.
- Gender (Male or Female).
- Product name of the product sold.
- Product category of the product sold.
- Unit price of each product sold.
- Quantity of the product sold.
- 7% sales tax on each product.
- Total price of the product after tax.
- Reward points (members only).
The Creation Queries.sql file will have the creation query for the Sales table and Insert queries. The data provided here is the same as what is found in the sales.csv file.
The Sales and Revenue KPIs.sql file will have the queries I used to perform my analysis on key performance indicators relating to sales and revenue of this fictional company.
The Customer Behavior KPIs.sql file will have the queries I used to perform my analysis on key performance indicators relating to customer behavior of this fictional company.
The Product Performance KPIs.sql file will have the queries I used to perform my analysis on key performance indicators relating to product performance of this fictional company.
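A hedged example of the kind of query in the Sales and Revenue KPIs.sql file, assuming the table is named Sales with columns sale_id, branch, and total (these names are guesses based on the field list above):

```sql
-- Revenue KPIs by branch: sales count, total revenue, and average sale value.
SELECT branch,
       COUNT(sale_id)       AS number_of_sales,
       SUM(total)           AS total_revenue,
       ROUND(AVG(total), 2) AS avg_sale_value
FROM Sales
GROUP BY branch
ORDER BY total_revenue DESC;
```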
The Practical Exercise in SQL Data Definition Language (DDL) Commands is a hands-on project designed to help you gain a deep understanding of fundamental DDL commands in SQL. It aims to enhance your proficiency in using SQL to create, modify, and manage database structures effectively. The exercises cover the following (a short sketch follows the list):
1.1 DDL - CREATE TABLE
1.2 DDL - ALTER TABLE (ADD COLUMN)
1.3 DDL - ALTER TABLE (RENAME COLUMN)
1.4 DDL - ALTER TABLE (RENAME TABLE)
1.5 DDL - ALTER TABLE (DROP COLUMN)
1.6 DDL - DROP TABLE
1.7 DDL - TRUNCATE TABLE
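A compact sketch of the commands covered by exercises 1.1-1.7, using SQL Server syntax and made-up object names (other dialects rename columns with ALTER TABLE ... RENAME instead of sp_rename):

```sql
CREATE TABLE staff (staff_id INT PRIMARY KEY, full_name VARCHAR(100));  -- 1.1 create a table
ALTER TABLE staff ADD hire_date DATE;                                   -- 1.2 add a column
EXEC sp_rename 'staff.full_name', 'employee_name', 'COLUMN';            -- 1.3 rename a column
EXEC sp_rename 'staff', 'employees';                                    -- 1.4 rename the table
ALTER TABLE employees DROP COLUMN hire_date;                            -- 1.5 drop a column
TRUNCATE TABLE employees;                                               -- 1.7 remove all rows, keep the structure
DROP TABLE employees;                                                   -- 1.6 drop the table entirely
```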
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This project is built on the AdventureWorks dataset, originally provided by Microsoft for SQL Server samples. This comprehensive dataset models a bicycle manufacturer and its sales to global markets, offering a realistic foundation for a data analytics portfolio.
The raw data can be accessed and downloaded directly from the official Microsoft GitHub repository: https://github.com/microsoft/sql-server-samples/tree/master/samples/databases/adventure-works
The work presented in this portfolio project demonstrates my end-to-end data analysis skills, from initial data cleaning and modeling to creating an interactive, insight-driven dashboard. Within this project, you will find examples of various data visualizations and a dashboard layout that follows the F-pattern for optimized user experience.
I encourage you to download the dataset and follow along with my analysis. Feel free to replicate my work, critique my methods, or build upon it with your own creative insights and improvements. Your feedback and engagement are highly welcomed!
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Cyclistic Bike-Share Dataset (2022–2024) – Cleaned & Merged
This dataset contains three full years (2022, 2023, and 2024) of publicly available Cyclistic bike-share trip data. All yearly files have been cleaned, standardized, and merged into a single high-quality master dataset for easy analysis.
🔹 Key Cleaning & Processing Steps
- Removed duplicate records
- Handled missing values
- Standardized column names
- Converted date-time formats
- Created calculated columns (ride length, day, month, etc.)
- Merged yearly datasets into one master CSV file (3.17 GB)

🔹 What You Can Analyze
- Member vs Casual rider behavior
- Peak riding hours and days
- Monthly & seasonal trends
- Trip duration patterns
- Station usage & demand forecasting

The dataset is ideal for data analyst portfolio projects and technical interview preparation; a hedged query sketch follows.
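The sketch below shows a typical summary query on the merged file, assuming columns named started_at, ended_at, and member_casual (these follow the public Cyclistic/Divvy trip files but are not guaranteed here) and a hypothetical table name trips_master; SQL Server syntax:

```sql
-- Average ride length and ride counts by rider type and weekday.
SELECT member_casual,
       DATENAME(WEEKDAY, started_at)                      AS day_of_week,
       COUNT(*)                                           AS rides,
       AVG(DATEDIFF(MINUTE, started_at, ended_at) * 1.0)  AS avg_ride_length_min
FROM trips_master
GROUP BY member_casual, DATENAME(WEEKDAY, started_at)
ORDER BY member_casual, rides DESC;
```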
License: MIT (https://opensource.org/licenses/MIT)
This dataset provides a comprehensive view of retail operations, combining sales transactions, return records, and shipping cost details into one analysis-ready package. It's ideal for data analysts, business intelligence professionals, and students looking to practice Power BI, Tableau, or SQL projects focusing on sales performance, profitability, and operational cost analysis.
Dataset Structure
Orders Table – Detailed transactional data
Row ID
Order ID
Order Date, Ship Date, Delivery Duration
Ship Mode
Customer ID, Customer Name, Segment, Country, City, State, Postal Code, Region
Product ID, Category, Sub-Category, Product Name
Sales, Quantity, Discount, Discount Value, Profit, COGS
Returns Table – Return records by Order ID
Returned (Yes/No)
Order ID
Shipping Cost Table – State-level shipping expenses
State
Shipping Cost Per Unit
Potential Use Cases
Calculate gross vs. net profit after considering returns and shipping costs (see the query sketch after this list).
Perform regional sales and profit analysis.
Identify high-return products and loss-making categories.
Visualize KPIs in Power BI or Tableau.
Build predictive models for returns or shipping costs.
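A hedged sketch of the gross vs. net profit use case, assuming the three tables are loaded as Orders, Returns, and ShippingCosts and that spaces in the field names above become underscores (all object names here are assumptions):

```sql
-- Net profit by region after subtracting profit on returned orders and per-unit shipping cost.
SELECT o.Region,
       SUM(o.Profit)                                              AS gross_profit,
       SUM(CASE WHEN r.Returned = 'Yes' THEN o.Profit ELSE 0 END) AS profit_lost_to_returns,
       SUM(o.Quantity * s.Shipping_Cost_Per_Unit)                 AS shipping_cost,
       SUM(o.Profit)
         - SUM(CASE WHEN r.Returned = 'Yes' THEN o.Profit ELSE 0 END)
         - SUM(o.Quantity * s.Shipping_Cost_Per_Unit)             AS net_profit
FROM Orders o
LEFT JOIN [Returns] r     ON o.Order_ID = r.Order_ID
LEFT JOIN ShippingCosts s ON o.State = s.State
GROUP BY o.Region
ORDER BY net_profit DESC;
```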
Source & Context: The dataset is designed for educational and analytical purposes. It is inspired by retail and e-commerce operations data and was prepared for data analytics portfolio projects.
License: Open for use in learning, analytics projects, and data visualization practice.
License: MIT (https://opensource.org/licenses/MIT)
A complete operational database from a fictional Class 8 trucking company spanning three years. This isn't scraped web data or simplified tutorial content; it's a realistic simulation built from 12 years of real-world logistics experience, designed specifically for analysts transitioning into supply chain and transportation domains.
The dataset contains 85,000+ records across 14 interconnected tables covering everything from driver assignments and fuel purchases to maintenance schedules and delivery performance. Each table maintains proper foreign key relationships, making this ideal for practicing complex SQL queries, building data pipelines, or developing operational dashboards.
SQL Learners: Master window functions, CTEs, and multi-table JOINs using realistic business scenarios rather than contrived examples.
Data Analysts: Build portfolio projects that demonstrate understanding of operational metrics: cost-per-mile analysis, fleet utilization optimization, driver performance scorecards.
Aspiring Supply Chain Analysts: Work with authentic logistics data patterns (seasonal freight volumes, equipment utilization rates, route profitability calculations) without NDA restrictions.
Data Science Students: Develop predictive models for maintenance scheduling, driver retention, or route optimization using time-series data with actual business context.
Career Changers: If you're moving from operations into analytics (like the dataset creator), this provides a bridge: your domain knowledge becomes a competitive advantage rather than a gap to explain.
Most logistics datasets are either proprietary (unavailable) or overly simplified (unrealistic). This fills the gap: operational complexity without confidentiality concerns. The data reflects real industry patterns:
Core Entities (Reference Tables):
- Drivers (150 records) - Demographics, employment history, CDL info
- Trucks (120 records) - Fleet specs, acquisition dates, status
- Trailers (180 records) - Equipment types, current assignments
- Customers (200 records) - Shipper accounts, contract terms, revenue potential
- Facilities (50 records) - Terminals and warehouses with geocoordinates
- Routes (60+ records) - City pairs with distances and rate structures

Operational Transactions:
- Loads (57,000+ records) - Shipment details, revenue, booking type
- Trips (57,000+ records) - Driver-truck assignments, actual performance
- Fuel Purchases (131,000+ records) - Transaction-level data with pricing
- Maintenance Records (6,500+ records) - Service history, costs, downtime
- Delivery Events (114,000+ records) - Pickup/delivery timestamps, detention
- Safety Incidents (114 records) - Accidents, violations, claims

Aggregated Analytics:
- Driver Monthly Metrics (5,400+ records) - Performance summaries
- Truck Utilization Metrics (3,800+ records) - Equipment efficiency
Temporal Coverage: January 2022 through December 2024 (3 years)
Geographic Scope: National operations across 25+ major US cities
Realistic Patterns:
- Seasonal freight fluctuations (Q4 peaks)
- Historical fuel price accuracy
- Equipment lifecycle modeling
- Driver retention dynamics
- Service level variations

Data Quality:
- Complete foreign key integrity
- No orphaned records
- Intentional 2% null rate in driver/truck assignments (reflects reality)
- All timestamps properly sequenced
- Financial calculations verified
Business Intelligence: Create executive dashboards showing revenue per truck, cost per mile, driver efficiency rankings, maintenance spend by equipment age, customer concentration risk.
Predictive Analytics: Build models forecasting equipment failures based on maintenance history, predict driver turnover using performance metrics, estimate route profitability for new lanes.
Operations Optimization: Analyze route efficiency, identify underutilized assets, optimize maintenance scheduling, calculate ideal fleet size, evaluate driver-to-truck ratios.
SQL Mastery: Practice window functions for running totals and rankings, write complex JOINs across 6+ tables, implement CTEs for hierarchical queries, perform cohort analysis on driver retention.
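As a hedged illustration of the cost-per-mile and ranking ideas above, the query below joins trip, fuel, and maintenance totals per truck; the table and column names (trips.miles, fuel_purchases.total_cost, maintenance_records.cost) are assumptions about the schema, not its documented names:

```sql
-- Cost per mile and an efficiency ranking for each truck, using CTEs and a window function.
WITH truck_miles AS (
    SELECT truck_id, SUM(miles) AS total_miles
    FROM trips
    GROUP BY truck_id
),
truck_costs AS (
    SELECT tm.truck_id,
           COALESCE(f.fuel_cost, 0) + COALESCE(m.maint_cost, 0) AS total_cost
    FROM truck_miles tm
    LEFT JOIN (SELECT truck_id, SUM(total_cost) AS fuel_cost
               FROM fuel_purchases GROUP BY truck_id) f ON f.truck_id = tm.truck_id
    LEFT JOIN (SELECT truck_id, SUM(cost) AS maint_cost
               FROM maintenance_records GROUP BY truck_id) m ON m.truck_id = tm.truck_id
)
SELECT tm.truck_id,
       tm.total_miles,
       tc.total_cost,
       ROUND(tc.total_cost * 1.0 / NULLIF(tm.total_miles, 0), 3)              AS cost_per_mile,
       RANK() OVER (ORDER BY tc.total_cost * 1.0 / NULLIF(tm.total_miles, 0)) AS efficiency_rank
FROM truck_miles tm
JOIN truck_costs tc ON tc.truck_id = tm.truck_id
ORDER BY cost_per_mile;
```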
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Data Science Careers in 2025: Jobs and Salary Trends in Pakistan
Data Science is one of the fastest-growing fields, and by 2025, the demand for skilled professionals in Pakistan will only increase. If you're considering a career in Data Science, here's what you need to know about the top jobs and salary trends.
Top Data Science Jobs in 2025
1) Data Scientist - Avg Salary: PKR 1.2M - 2.5M/year (Entry-Level), PKR 3M - 6M/year (Experienced). Skills: Python, R, Machine Learning, Data Visualization
2) Data Analyst - Avg Salary: PKR 800K - 1.5M/year (Entry-Level), PKR 2M - 3.5M/year (Experienced). Skills: SQL, Excel, Tableau, Power BI
3) Machine Learning Engineer - Avg Salary: PKR 1.5M - 3M/year (Entry-Level), PKR 4M - 7M/year (Experienced). Skills: TensorFlow, PyTorch, Deep Learning, NLP
4) Business Intelligence Analyst - Avg Salary: PKR 1M - 2M/year (Entry-Level), PKR 2.5M - 4M/year (Experienced). Skills: Data Warehousing, ETL, Dashboarding
5) AI Research Scientist - Avg Salary: PKR 2M - 4M/year (Entry-Level), PKR 5M - 10M/year (Experienced). Skills: AI Algorithms, Research, Advanced Mathematics
Why Choose Data Science?
High Demand: Every industry in Pakistan needs data professionals.
Attractive Salaries: Competitive pay based on technical expertise.
Growth Opportunities: Unlimited career growth in this field.

Salary Trends
Entry-Level: PKR 800K - 1.5M/year
Mid-Level: PKR 2M - 4M/year
Senior-Level: PKR 5M+ (depending on expertise and industry)

How to Get Started?
Learn Skills: Focus on Python, SQL, Machine Learning, and Data Visualization.
Build Projects: Work on real-world datasets to create a strong portfolio.
Network: Connect with industry professionals and join Data Science communities.
work_year: The year in which the data was recorded. This field indicates the temporal context of the data, important for understanding salary trends over time.
job_title: The specific title of the job role, like 'Data Scientist', 'Data Engineer', or 'Data Analyst'. This column is crucial for understanding the salary distribution across various specialized roles within the data field.
job_category: A classification of the job role into broader categories for easier analysis. This might include areas like 'Data Analysis', 'Machine Learning', 'Data Engineering', etc.
salary_currency: The currency in which the salary is paid, such as USD, EUR, etc. This is important for currency conversion and understanding the actual value of the salary in a global context.
salary: The annual gross salary of the role in the local currency. This raw salary figure is key for direct regional salary comparisons.
salary_in_usd: The annual gross salary converted to United States Dollars (USD). This uniform currency conversion aids in global salary comparisons and analyses.
employee_residence: The country of residence of the employee. This data point can be used to explore geographical salary differences and cost-of-living variations.
experience_level: Classifies the professional experience level of the employee. Common categories might include 'Entry-level', 'Mid-level', 'Senior', and 'Executive', providing insight into how experience influences salary in data-related roles.
employment_type: Specifies the type of employment, such as 'Full-time', 'Part-time', 'Contract', etc. This helps in analyzing how different employment arrangements affect salary structures.
work_setting: The work setting or environment, like 'Remote', 'In-person', or 'Hybrid'. This column reflects the impact of work settings on salary levels in the data industry.
company_location: The country where the company is located. It helps in analyzing how the location of the company affects salary structures.
company_size: The size of the employer company, often categorized into small (S), medium (M), and large (L) sizes. This allows for analysis of how company size influences salary.
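A short sketch using the columns documented above to compare salaries across experience levels and work settings; the table name salaries is an assumption:

```sql
-- Average USD salary by year, experience level, and work setting.
SELECT work_year,
       experience_level,
       work_setting,
       COUNT(*)                     AS roles,
       ROUND(AVG(salary_in_usd), 0) AS avg_salary_usd
FROM salaries
GROUP BY work_year, experience_level, work_setting
ORDER BY work_year, experience_level, avg_salary_usd DESC;
```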