13 datasets found
  1. Data Cleaning Portfolio Project

    • kaggle.com
    zip
    Updated Apr 2, 2024
    Cite
    Deepali Sukhdeve (2024). Data Cleaning Portfolio Project [Dataset]. https://www.kaggle.com/datasets/deepalisukhdeve/data-cleaning-portfolio-project
    Explore at:
    zip (6053781 bytes)
    Dataset updated
    Apr 2, 2024
    Authors
    Deepali Sukhdeve
    Description

    Dataset

    This dataset was created by Deepali Sukhdeve

    Contents

  2. SQL Data Cleaning Portfolio V2

    • kaggle.com
    zip
    Updated Jun 16, 2023
    Cite
    Mohammad Hurairah (2023). SQL Data Cleaning Portfolio V2 [Dataset]. https://www.kaggle.com/datasets/mohammadhurairah/sql-cleaning-portfolio-v2/discussion
    Explore at:
    zip (6054498 bytes)
    Dataset updated
    Jun 16, 2023
    Authors
    Mohammad Hurairah
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data cleaning of the public Nashville housing data:

    1. Standardize the date format

    2. Populate missing Property Address data

    3. Break out addresses into individual columns (Address, City, State)

    4. Change Y and N to Yes and No in the "Sold as Vacant" field

    5. Remove duplicates

    6. Delete unused columns
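The steps above can be sketched end-to-end in miniature. A minimal illustration using Python's `sqlite3` module; the table and column names (`housing`, `parcel_id`, `sold_as_vacant`, etc.) and the sample rows are assumptions for illustration, not the dataset's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-in for the Nashville housing table; columns are assumed.
cur.execute("""CREATE TABLE housing (
    uid INTEGER PRIMARY KEY, parcel_id TEXT, sale_date TEXT,
    property_address TEXT, sold_as_vacant TEXT)""")
cur.executemany(
    "INSERT INTO housing (parcel_id, sale_date, property_address, sold_as_vacant) "
    "VALUES (?, ?, ?, ?)",
    [("007", "2013-04-09 00:00:00", "1808 FOX CHASE DR, GOODLETTSVILLE", "N"),
     ("007", "2013-04-09 00:00:00", None, "N"),   # same parcel, address missing
     ("033", "2014-06-10 00:00:00", "1725 CAMPBELL RD, NASHVILLE", "Y")])

# 1. Standardize the date format: drop the useless time component.
cur.execute("UPDATE housing SET sale_date = date(sale_date)")

# 2. Populate missing addresses from other rows sharing the same parcel ID.
cur.execute("""UPDATE housing SET property_address =
    (SELECT b.property_address FROM housing b
     WHERE b.parcel_id = housing.parcel_id AND b.property_address IS NOT NULL)
    WHERE property_address IS NULL""")

# 3. Break the address out: pull the city into its own column.
cur.execute("ALTER TABLE housing ADD COLUMN split_city TEXT")
cur.execute("""UPDATE housing SET split_city =
    trim(substr(property_address, instr(property_address, ',') + 1))""")

# 4. Change Y/N to Yes/No in the "Sold as Vacant" field.
cur.execute("""UPDATE housing SET sold_as_vacant =
    CASE sold_as_vacant WHEN 'Y' THEN 'Yes' WHEN 'N' THEN 'No'
    ELSE sold_as_vacant END""")

# 5. Remove duplicates: keep the lowest uid per identical record.
cur.execute("""DELETE FROM housing WHERE uid NOT IN (
    SELECT MIN(uid) FROM housing
    GROUP BY parcel_id, sale_date, property_address, sold_as_vacant)""")

rows = cur.execute(
    "SELECT sale_date, property_address, split_city, sold_as_vacant "
    "FROM housing ORDER BY uid").fetchall()
```

Step 6 (deleting unused columns) would be an `ALTER TABLE ... DROP COLUMN` in the same vein.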

  3. SQL Data Exploration COVID Portfolio V1

    • kaggle.com
    zip
    Updated Jun 16, 2023
    Cite
    Mohammad Hurairah (2023). SQL Data Exploration COVID Portfolio V1 [Dataset]. https://www.kaggle.com/datasets/mohammadhurairah/covid-portfolio-project-sql-v1
    Explore at:
    zip (61483158 bytes)
    Dataset updated
    Jun 16, 2023
    Authors
    Mohammad Hurairah
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data exploration, cleaning, and arrangement of the COVID death and COVID vaccination data, covering:

    1. Selecting the data to be used

    2. Showing the likelihood of dying if you contract COVID in your country

    3. Showing what percentage of the population got COVID

    4. Finding the countries with the highest infection rate relative to population

    5. Finding the country with the highest death count per population

    6. Breaking things down by continent

    7. Finding the continents with the highest death count per population

    8. Comparing total population vs. vaccinations

    9. Using a CTE and a temp table

    10. Creating a view to store data for later visualizations
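Steps 2-3 and 8-10 above can be sketched compactly. The table names, columns, and values below are hypothetical stand-ins for the COVID deaths and vaccinations files, shown via Python's `sqlite3`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Tiny stand-ins for the deaths/vaccinations tables (schemas assumed).
cur.execute("CREATE TABLE deaths (location TEXT, date TEXT, population INTEGER, "
            "total_cases INTEGER, total_deaths INTEGER)")
cur.execute("CREATE TABLE vaccinations (location TEXT, date TEXT, new_vaccinations INTEGER)")
cur.executemany("INSERT INTO deaths VALUES (?, ?, ?, ?, ?)", [
    ("Freedonia", "2021-01-01", 1000, 100, 10),
    ("Freedonia", "2021-01-02", 1000, 200, 25)])
cur.executemany("INSERT INTO vaccinations VALUES (?, ?, ?)", [
    ("Freedonia", "2021-01-01", 50),
    ("Freedonia", "2021-01-02", 70)])

# Likelihood of dying if you contract COVID, and percent of population infected.
death_pct, infected_pct = cur.execute(
    "SELECT 100.0 * total_deaths / total_cases, 100.0 * total_cases / population "
    "FROM deaths WHERE location = 'Freedonia' ORDER BY date DESC LIMIT 1").fetchone()

# Total population vs. vaccinations via a CTE with a running total.
rolling = cur.execute("""
    WITH pop_vs_vac AS (
        SELECT d.location, d.date, d.population,
               SUM(v.new_vaccinations) OVER
                   (PARTITION BY d.location ORDER BY d.date) AS rolling_vaccinated
        FROM deaths d JOIN vaccinations v
          ON d.location = v.location AND d.date = v.date)
    SELECT date, 100.0 * rolling_vaccinated / population FROM pop_vs_vac
""").fetchall()

# Persist the same logic as a view for later visualizations.
cur.execute("""CREATE VIEW percent_population_vaccinated AS
    SELECT d.location, d.date,
           SUM(v.new_vaccinations) OVER
               (PARTITION BY d.location ORDER BY d.date) AS rolling_vaccinated
    FROM deaths d JOIN vaccinations v
      ON d.location = v.location AND d.date = v.date""")
```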

  4. Cleaning Data in SQL Portfolio Project

    • kaggle.com
    zip
    Updated Apr 19, 2023
    Cite
    Austin Kennell (2023). Cleaning Data in SQL Portfolio Project [Dataset]. https://www.kaggle.com/austinkennell/cleaning-data-in-sql-portfolio-project
    Explore at:
    zip (6054868 bytes)
    Dataset updated
    Apr 19, 2023
    Authors
    Austin Kennell
    Description

    The dataset contained information on housing in the Nashville, TN area. I used SQL Server to clean the data and make it easier to use: I converted dates to remove unnecessary timestamps, populated missing values, split the combined address field into separate address, city, and state columns, standardized a column that represented the same data in inconsistent ways, removed duplicate rows, and deleted unused columns.
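The duplicate-removal step in projects like this is commonly done by numbering rows over the identifying columns and deleting everything past the first. A minimal sketch via Python's `sqlite3` (the original work used SQL Server, and the columns here are assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE housing (parcel_id TEXT, sale_date TEXT, sale_price INTEGER)")
cur.executemany("INSERT INTO housing VALUES (?, ?, ?)", [
    ("007", "2013-04-09", 240000),
    ("007", "2013-04-09", 240000),   # exact duplicate
    ("033", "2014-06-10", 54000)])

# Tag duplicates with ROW_NUMBER over the identifying columns.
dupes = cur.execute("""
    WITH numbered AS (
        SELECT rowid, ROW_NUMBER() OVER (
            PARTITION BY parcel_id, sale_date, sale_price
            ORDER BY rowid) AS rn
        FROM housing)
    SELECT rowid FROM numbered WHERE rn > 1""").fetchall()

# Delete every row that was not the first of its group.
if dupes:
    cur.execute("DELETE FROM housing WHERE rowid IN (%s)" %
                ",".join(str(r[0]) for r in dupes))
remaining = cur.execute("SELECT COUNT(*) FROM housing").fetchone()[0]
```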

  5. Employee Attrition Case Study

    • kaggle.com
    zip
    Updated Aug 8, 2023
    Cite
    Hunter Gonzalez (2023). Employee Attrition Case Study [Dataset]. https://www.kaggle.com/datasets/huntergonzalez247/employee-attrition-case-study
    Explore at:
    zip (9887 bytes)
    Dataset updated
    Aug 8, 2023
    Authors
    Hunter Gonzalez
    Description

    This is an in-depth analysis I created using data pulled from an open-source (ODbL) data project provided on Kaggle:

    Pavansubhash. (2017). IBM HR Analytics Employee Attrition & Performance, Version 1. Retrieved August 3rd, 2023 from https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset.

    Problem: The VP of People Operations/HR at [Company] wants to better understand what efforts they can make to retain more employees every year.

    Question: How do education, job involvement, and work-life balance affect employee attrition?

    Metrics

    A survey was sent to 2,068 current and past employees, asking a series of clear and consistent questions about different workplace variables. The surveys were anonymous, both to encourage truthful answers and to protect the integrity of the data collected.

    Education: 1) Below College, 2) Some College, 3) Bachelor, 4) Master, 5) Doctor

    Job Involvement: 1) Low, 2) Medium, 3) High, 4) Very High

    Work Life Balance: 1) Bad, 2) Good, 3) Better, 4) Best
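Given coded survey scales like these, an attrition-rate breakdown reduces to a group-by over the rating. A sketch with hypothetical responses (not the actual survey data):

```python
from collections import defaultdict

# Hypothetical responses: (work_life_balance rating 1-4, left_company flag).
responses = [(1, True), (1, True), (1, False), (2, False),
             (3, False), (3, True), (4, False), (4, False)]

labels = {1: "Bad", 2: "Good", 3: "Better", 4: "Best"}

# Attrition rate per work-life-balance rating.
totals, leavers = defaultdict(int), defaultdict(int)
for rating, left in responses:
    totals[rating] += 1
    leavers[rating] += left          # True counts as 1

attrition = {labels[r]: leavers[r] / totals[r] for r in sorted(totals)}
```

The same shape of aggregation applies to the Education and Job Involvement scales.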

  6. SQL Integrity Journey: Unleashing Data Constraints

    • kaggle.com
    zip
    Updated Oct 9, 2023
    Cite
    Radha Gandhi (2023). SQL Integrity Journey: Unleashing Data Constraints [Dataset]. https://www.kaggle.com/datasets/radhagandhi/sql-integrity-journey-unleashing-data-constraints
    Explore at:
    zip (13817 bytes)
    Dataset updated
    Oct 9, 2023
    Authors
    Radha Gandhi
    Description

    Title: Practical Exploration of SQL Constraints: Building a Foundation in Data Integrity

    Introduction: Welcome to my data analysis project, which focuses on mastering SQL constraints, a pivotal aspect of database management. The project centers on hands-on experience with SQL's Data Definition Language (DDL) commands, emphasizing constraints such as PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, and DEFAULT. It aims to demonstrate a foundational understanding of enforcing data integrity and maintaining a structured database environment.

    Purpose: The primary purpose of this project is to showcase proficiency in implementing and managing SQL constraints for robust data governance. It offers insight into my SQL skills and how I use constraints to ensure data accuracy, consistency, and reliability within relational databases.

    What to Expect: The project contains a series of exercises covering the following key constraint types:

    • NOT NULL: Ensuring the presence of essential data in a column.
    • PRIMARY KEY: Ensuring unique identification of records for data integrity.
    • FOREIGN KEY: Establishing relationships between tables to maintain referential integrity.
    • UNIQUE: Guaranteeing the uniqueness of values within specified columns.
    • CHECK: Implementing custom conditions to validate data entries.
    • DEFAULT: Setting default values for columns to enhance data reliability.

    Each exercise is accompanied by clear, concise SQL scripts, explanations of the intended outcomes, and practical insights into applying these constraints. The goal is to show how SQL constraints serve as crucial tools for creating a structured and dependable database foundation. I invite you to explore the exercises in detail; together they underscore a commitment to upholding data quality, ensuring data accuracy, and harnessing SQL constraints for informed decision-making in data analysis.

    3.1 CONSTRAINT - ENFORCING NOT NULL CONSTRAINT WHILE CREATING A NEW TABLE
    3.2 CONSTRAINT - ENFORCING NOT NULL CONSTRAINT ON AN EXISTING COLUMN
    3.3 CONSTRAINT - ENFORCING PRIMARY KEY CONSTRAINT WHILE CREATING A NEW TABLE
    3.4 CONSTRAINT - ENFORCING PRIMARY KEY CONSTRAINT ON AN EXISTING COLUMN
    3.5 CONSTRAINT - ENFORCING FOREIGN KEY CONSTRAINT WHILE CREATING A NEW TABLE
    3.6 CONSTRAINT - ENFORCING FOREIGN KEY CONSTRAINT ON AN EXISTING COLUMN
    3.7 CONSTRAINT - ENFORCING UNIQUE CONSTRAINT WHILE CREATING A NEW TABLE
    3.8 CONSTRAINT - ENFORCING UNIQUE CONSTRAINT IN AN EXISTING TABLE
    3.9 CONSTRAINT - ENFORCING CHECK CONSTRAINT IN A NEW TABLE
    3.10 CONSTRAINT - ENFORCING CHECK CONSTRAINT IN AN EXISTING TABLE
    3.11 CONSTRAINT - ENFORCING DEFAULT CONSTRAINT IN A NEW TABLE
    3.12 CONSTRAINT - ENFORCING DEFAULT CONSTRAINT IN AN EXISTING TABLE
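The constraint types covered by these exercises can be condensed into one short script. A sketch via Python's `sqlite3` with invented tables (note SQLite only enforces foreign keys once the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
cur = conn.cursor()

# Tables exercising NOT NULL, PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, DEFAULT.
cur.execute("CREATE TABLE departments ("
            "dept_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
cur.execute("""CREATE TABLE employees (
    emp_id   INTEGER PRIMARY KEY,
    email    TEXT NOT NULL UNIQUE,
    dept_id  INTEGER NOT NULL REFERENCES departments(dept_id),
    salary   REAL CHECK (salary > 0),
    status   TEXT DEFAULT 'active')""")

cur.execute("INSERT INTO departments (name) VALUES ('Analytics')")
cur.execute("INSERT INTO employees (email, dept_id, salary) VALUES ('a@x.com', 1, 50000)")

# Each of these inserts violates one constraint and is rejected.
violations = []
for sql in (
    "INSERT INTO employees (email, dept_id, salary) VALUES ('a@x.com', 1, 1)",   # UNIQUE
    "INSERT INTO employees (email, dept_id, salary) VALUES ('b@x.com', 99, 1)",  # FOREIGN KEY
    "INSERT INTO employees (email, dept_id, salary) VALUES ('c@x.com', 1, -5)",  # CHECK
):
    try:
        cur.execute(sql)
    except sqlite3.IntegrityError as exc:
        violations.append(type(exc).__name__)

# The DEFAULT constraint filled in the omitted status column.
status = cur.execute("SELECT status FROM employees WHERE email = 'a@x.com'").fetchone()[0]
```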

  7. SuperMarket Sales

    • kaggle.com
    zip
    Updated Dec 17, 2024
    Cite
    Chad Wambles (2024). SuperMarket Sales [Dataset]. https://www.kaggle.com/datasets/chadwambles/supermarket-sales
    Explore at:
    zip (37361 bytes)
    Dataset updated
    Dec 17, 2024
    Authors
    Chad Wambles
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    I generated a database of supermarket sales data in order to practice determining KPIs and making data visualizations.

    This data set includes:

    • A unique sales ID for each row
    • Branch of the supermarket (New York, Chicago, or Los Angeles)
    • City of the supermarket (New York, Chicago, or Los Angeles)
    • Customer type (Member or Normal); Members receive reward points
    • Gender (Male or Female)
    • Product name of the product sold
    • Product category of the product sold
    • Unit price of each product sold
    • Quantity of the product sold
    • 7% sales tax on each product
    • Total price of the product after tax
    • Reward points (Member customer type only)

    The Creation Queries.sql file will have the creation query for the Sales table and Insert queries. The data provided here is the same as what is found in the sales.csv file.

    The Sales and Revenue KPIs.sql file will have the queries I used to perform my analysis on key performance indicators relating to sales and revenue of this fictional company.

    The Customer Behavior KPIs.sql file will have the queries I used to perform my analysis on key performance indicators relating to customer behavior of this fictional company.

    The Product Performance KPIs.sql file will have the queries I used to perform my analysis on key performance indicators relating to product performance of this fictional company.
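A sales-and-revenue KPI of the kind these files compute can be sketched against a miniature version of the Sales table. Column names are assumed from the description above, and the 7% sales tax is applied in the query; shown via Python's `sqlite3`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Minimal Sales table mirroring the columns described above (names assumed).
cur.execute("""CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY, branch TEXT, customer_type TEXT,
    category TEXT, unit_price REAL, quantity INTEGER)""")
cur.executemany(
    "INSERT INTO sales (branch, customer_type, category, unit_price, quantity) "
    "VALUES (?, ?, ?, ?, ?)",
    [("New York", "Member", "Food", 10.0, 3),
     ("New York", "Normal", "Food", 5.0, 2),
     ("Chicago", "Member", "Electronics", 100.0, 1)])

TAX = 0.07  # the 7% sales tax described above

# KPI: after-tax revenue per branch, highest first.
revenue = cur.execute("""
    SELECT branch, ROUND(SUM(unit_price * quantity * (1 + ?)), 2) AS total
    FROM sales GROUP BY branch ORDER BY total DESC""", (TAX,)).fetchall()
```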

  8. Mastering the Essentials:Hands-On DDL Command Prac

    • kaggle.com
    zip
    Updated Sep 25, 2023
    Cite
    Radha Gandhi (2023). Mastering the Essentials:Hands-On DDL Command Prac [Dataset]. https://www.kaggle.com/datasets/radhagandhi/1practical-exercise-in-ddl-commands/code
    Explore at:
    zip (7378 bytes)
    Dataset updated
    Sep 25, 2023
    Authors
    Radha Gandhi
    Description

    The Practical Exercise in SQL Data Definition Language (DDL) Commands is a hands-on project designed to help you gain a deep understanding of fundamental DDL commands in SQL, including:

    • CREATE TABLE
    • ALTER TABLE (ADD, RENAME, DROP)
    • TRUNCATE TABLE

    This project aims to enhance your proficiency in using SQL to create, modify, and manage database structures effectively.

    1.1 DDL-CREATE TABLE

    1.2 DDL-ALTER TABLE(ADD)

    1.3 DDL-ALTER(RENAME COLUMN NAME)

    1.4 DDL-ALTER(RENAME TABLE NAME)

    1.5 DDL-ALTER(DROP COLUMN FROM TABLE)

    1.6 DDL-ALTER(DROP TABLE)

    1.7 DDL- TRUNCATE TABLE
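Exercises 1.1-1.7 map onto a short script. A sketch via Python's `sqlite3` with invented table names (SQLite has no TRUNCATE keyword, so an unqualified DELETE stands in; DROP COLUMN needs SQLite 3.35+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1.1 DDL - CREATE TABLE
cur.execute("CREATE TABLE staff (staff_id INTEGER, full_name TEXT)")
# 1.2 DDL - ALTER TABLE (ADD a column)
cur.execute("ALTER TABLE staff ADD COLUMN hire_date TEXT")
# 1.3 DDL - ALTER (RENAME a column)
cur.execute("ALTER TABLE staff RENAME COLUMN full_name TO name")
# 1.4 DDL - ALTER (RENAME the table)
cur.execute("ALTER TABLE staff RENAME TO employees")
# 1.5/1.6 DDL - ALTER (DROP a column) -- guarded, since it needs SQLite 3.35+
if sqlite3.sqlite_version_info >= (3, 35, 0):
    cur.execute("ALTER TABLE employees DROP COLUMN hire_date")
# 1.7 DDL - TRUNCATE TABLE: SQLite's equivalent is an unqualified DELETE.
cur.execute("INSERT INTO employees (staff_id, name) VALUES (1, 'Ada')")
cur.execute("DELETE FROM employees")

columns = [row[1] for row in cur.execute("PRAGMA table_info(employees)")]
row_count = cur.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```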

  9. Sales Executive Dashboard Report

    • kaggle.com
    zip
    Updated Aug 1, 2025
    Cite
    Jogleen Calipon (2025). Sales Executive Dashboard Report [Dataset]. https://www.kaggle.com/datasets/joelearns/sales-executive-dashboard-report
    Explore at:
    zip (3092158 bytes)
    Dataset updated
    Aug 1, 2025
    Authors
    Jogleen Calipon
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This project is built on the AdventureWorks dataset, originally provided by Microsoft for SQL Server samples. This comprehensive dataset models a bicycle manufacturer and its sales to global markets, offering a realistic foundation for a data analytics portfolio.

    The raw data can be accessed and downloaded directly from the official Microsoft GitHub repository: https://github.com/microsoft/sql-server-samples/tree/master/samples/databases/adventure-works

    Project Overview

    The work presented in this portfolio project demonstrates my end-to-end data analysis skills, from initial data cleaning and modeling to creating an interactive, insight-driven dashboard. Within this project, you will find examples of various data visualizations and a dashboard layout that follows the F-pattern for optimized user experience.

    I encourage you to download the dataset and follow along with my analysis. Feel free to replicate my work, critique my methods, or build upon it with your own creative insights and improvements. Your feedback and engagement are highly welcomed!

  10. cyclistic-bike-share-2022-2024-clean

    • kaggle.com
    zip
    Updated Nov 28, 2025
    Cite
    Chathuranga Sudusinghe (2025). cyclistic-bike-share-2022-2024-clean [Dataset]. https://www.kaggle.com/datasets/indrajithsudusinghe/cyclistic-bike-share-2022-2024-clean
    Explore at:
    zip (579891587 bytes)
    Dataset updated
    Nov 28, 2025
    Authors
    Chathuranga Sudusinghe
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Cyclistic Bike-Share Dataset (2022–2024), Cleaned & Merged

    This dataset contains three full years (2022, 2023, and 2024) of publicly available Cyclistic bike-share trip data. All yearly files have been cleaned, standardized, and merged into a single high-quality master dataset for easy analysis.

    The dataset is ideal for:

    • Data Analysis & Visualization
    • SQL Projects
    • Python (Pandas) Practice
    • Power BI, Tableau Dashboards
    • Machine Learning Feature Engineering

    🔹 Key Cleaning & Processing Steps

    • Removed duplicate records
    • Handled missing values
    • Standardized column names
    • Converted date-time formats
    • Created calculated columns (ride length, day, month, etc.)
    • Merged yearly datasets into one master CSV file (3.17 GB)

    🔹 What You Can Analyze

    • Member vs. Casual rider behavior
    • Peak riding hours and days
    • Monthly & seasonal trends
    • Trip duration patterns
    • Station usage & demand forecasting

    This dataset is especially useful for data analyst portfolio projects and technical interview preparation.
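The cleaning steps listed above (deduplication, date-time conversion, calculated columns) can be sketched on a few hypothetical trip rows; none of the values below come from the dataset itself:

```python
from datetime import datetime

# Hypothetical raw trip rows: (ride_id, started_at, ended_at, member_casual).
raw = [("A1", "2023-07-01 08:00:00", "2023-07-01 08:25:30", "member"),
       ("A1", "2023-07-01 08:00:00", "2023-07-01 08:25:30", "member"),  # duplicate
       ("B2", "2023-07-02 17:10:00", "2023-07-02 17:05:00", "casual"),  # ends before start
       ("C3", "2023-07-03 12:00:00", "2023-07-03 12:40:00", "casual")]

FMT = "%Y-%m-%d %H:%M:%S"
seen, clean = set(), []
for ride_id, start, end, rider in raw:
    if ride_id in seen:            # remove duplicate records
        continue
    seen.add(ride_id)
    s, e = datetime.strptime(start, FMT), datetime.strptime(end, FMT)
    length_min = (e - s).total_seconds() / 60
    if length_min <= 0:            # drop impossible trips
        continue
    # Calculated columns: ride length (minutes), day of week, month.
    clean.append({"ride_id": ride_id,
                  "ride_length_min": round(length_min, 1),
                  "day_of_week": s.strftime("%A"),
                  "month": s.strftime("%B"),
                  "member_casual": rider})
```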

  11. Retail Sales, Returns & Shipping Dataset

    • kaggle.com
    zip
    Updated Aug 15, 2025
    Cite
    kunal malviya (2025). Retail Sales, Returns & Shipping Dataset [Dataset]. https://www.kaggle.com/datasets/kunalmalviya06/retail-sales-returns-and-shipping-dataset
    Explore at:
    zip (632399 bytes)
    Dataset updated
    Aug 15, 2025
    Authors
    kunal malviya
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset provides a comprehensive view of retail operations, combining sales transactions, return records, and shipping cost details into one analysis-ready package. It's ideal for data analysts, business intelligence professionals, and students looking to practice Power BI, Tableau, or SQL projects focusing on sales performance, profitability, and operational cost analysis.

    Dataset Structure

    Orders Table – Detailed transactional data

    Row ID

    Order ID

    Order Date, Ship Date, Delivery Duration

    Ship Mode

    Customer ID, Customer Name, Segment, Country, City, State, Postal Code, Region

    Product ID, Category, Sub-Category, Product Name

    Sales, Quantity, Discount, Discount Value, Profit, COGS

    Returns Table – Return records by Order ID

    Returned (Yes/No)

    Order ID

    Shipping Cost Table – State-level shipping expenses

    State

    Shipping Cost Per Unit

    Potential Use Cases

    Calculate gross vs. net profit after considering returns and shipping costs.

    Perform regional sales and profit analysis.

    Identify high-return products and loss-making categories.

    Visualize KPIs in Power BI or Tableau.

    Build predictive models for returns or shipping costs.
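The first use case, gross vs. net profit after returns and shipping, reduces to a three-table join. A sketch with toy rows via Python's `sqlite3`; column names are simplified from the structure above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Minimal versions of the Orders, Returns, and Shipping Cost tables.
cur.execute("CREATE TABLE orders (order_id TEXT, state TEXT, "
            "sales REAL, profit REAL, quantity INTEGER)")
cur.execute("CREATE TABLE returns (order_id TEXT, returned TEXT)")
cur.execute("CREATE TABLE shipping (state TEXT, cost_per_unit REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
    ("O-1", "Texas", 100.0, 30.0, 2),
    ("O-2", "Texas", 50.0, 10.0, 1),
    ("O-3", "Ohio", 80.0, 20.0, 4)])
cur.execute("INSERT INTO returns VALUES ('O-2', 'Yes')")
cur.executemany("INSERT INTO shipping VALUES (?, ?)", [("Texas", 2.5), ("Ohio", 1.0)])

# Gross vs. net profit: subtract shipping, zero out profit on returned orders.
gross, net = cur.execute("""
    SELECT SUM(o.profit),
           SUM(CASE WHEN r.returned = 'Yes' THEN 0
                    ELSE o.profit - s.cost_per_unit * o.quantity END)
    FROM orders o
    LEFT JOIN returns r ON o.order_id = r.order_id
    JOIN shipping s ON o.state = s.state""").fetchone()
```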

    Source & Context

    The dataset is designed for educational and analytical purposes. It is inspired by retail and e-commerce operations data and was prepared for data analytics portfolio projects.

    License

    Open for use in learning, analytics projects, and data visualization practice.

  12. Logistics Operations Database

    • kaggle.com
    zip
    Updated Nov 23, 2025
    Cite
    Yogape Rodriguez (2025). Logistics Operations Database [Dataset]. https://www.kaggle.com/datasets/yogape/logistics-operations-database
    Explore at:
    zip (15059576 bytes)
    Dataset updated
    Nov 23, 2025
    Authors
    Yogape Rodriguez
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Kaggle Dataset: Synthetic Logistics Operations Database (2022-2024)

    About this Dataset

    What's Inside

    A complete operational database from a fictional Class 8 trucking company spanning three years. This isn't scraped web data or simplified tutorial content—it's a realistic simulation built from 12 years of real-world logistics experience, designed specifically for analysts transitioning into supply chain and transportation domains.

    The dataset contains 85,000+ records across 14 interconnected tables covering everything from driver assignments and fuel purchases to maintenance schedules and delivery performance. Each table maintains proper foreign key relationships, making this ideal for practicing complex SQL queries, building data pipelines, or developing operational dashboards.

    Who This Is For

    SQL Learners: Master window functions, CTEs, and multi-table JOINs using realistic business scenarios rather than contrived examples.

    Data Analysts: Build portfolio projects that demonstrate understanding of operational metrics: cost-per-mile analysis, fleet utilization optimization, driver performance scorecards.

    Aspiring Supply Chain Analysts: Work with authentic logistics data patterns—seasonal freight volumes, equipment utilization rates, route profitability calculations—without NDA restrictions.

    Data Science Students: Develop predictive models for maintenance scheduling, driver retention, or route optimization using time-series data with actual business context.

    Career Changers: If you're moving from operations into analytics (like the dataset creator), this provides a bridge—your domain knowledge becomes a competitive advantage rather than a gap to explain.

    Why This Dataset Exists

    Most logistics datasets are either proprietary (unavailable) or overly simplified (unrealistic). This fills the gap: operational complexity without confidentiality concerns. The data reflects real industry patterns:

    • Fuel prices track the 2022 diesel spike and 2023-2024 decline
    • Driver turnover sits at 15% annually (industry standard)
    • Equipment utilization averages 65% (typical for dry van operations)
    • On-time delivery performance ranges 85-95% (realistic service levels)
    • Maintenance intervals follow Class 8 PM schedules

    Dataset Structure

    Core Entities (Reference Tables):

    • Drivers (150 records): demographics, employment history, CDL info
    • Trucks (120 records): fleet specs, acquisition dates, status
    • Trailers (180 records): equipment types, current assignments
    • Customers (200 records): shipper accounts, contract terms, revenue potential
    • Facilities (50 records): terminals and warehouses with geocoordinates
    • Routes (60+ records): city pairs with distances and rate structures

    Operational Transactions:

    • Loads (57,000+ records): shipment details, revenue, booking type
    • Trips (57,000+ records): driver-truck assignments, actual performance
    • Fuel Purchases (131,000+ records): transaction-level data with pricing
    • Maintenance Records (6,500+ records): service history, costs, downtime
    • Delivery Events (114,000+ records): pickup/delivery timestamps, detention
    • Safety Incidents (114 records): accidents, violations, claims

    Aggregated Analytics:

    • Driver Monthly Metrics (5,400+ records): performance summaries
    • Truck Utilization Metrics (3,800+ records): equipment efficiency

    Key Features

    Temporal Coverage: January 2022 through December 2024 (3 years)

    Geographic Scope: National operations across 25+ major US cities

    Realistic Patterns:

    • Seasonal freight fluctuations (Q4 peaks)
    • Historical fuel price accuracy
    • Equipment lifecycle modeling
    • Driver retention dynamics
    • Service level variations

    Data Quality:

    • Complete foreign key integrity
    • No orphaned records
    • Intentional 2% null rate in driver/truck assignments (reflects reality)
    • All timestamps properly sequenced
    • Financial calculations verified

    Use Case Examples

    Business Intelligence: Create executive dashboards showing revenue per truck, cost per mile, driver efficiency rankings, maintenance spend by equipment age, customer concentration risk.

    Predictive Analytics: Build models forecasting equipment failures based on maintenance history, predict driver turnover using performance metrics, estimate route profitability for new lanes.

    Operations Optimization: Analyze route efficiency, identify underutilized assets, optimize maintenance scheduling, calculate ideal fleet size, evaluate driver-to-truck ratios.

    SQL Mastery: Practice window functions for running totals and rankings, write complex JOINs across 6+ tables, implement CTEs for hierarchical queries, perform cohort analysis on driver retention.
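A window-function exercise of the kind described, a revenue-per-mile ranking plus a per-driver running total, might look like this against a toy trips table (column names are assumptions, not the dataset's schema), shown via Python's `sqlite3`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Tiny stand-in for the Trips table (columns assumed).
cur.execute("CREATE TABLE trips (driver_id INTEGER, trip_date TEXT, "
            "revenue REAL, miles REAL)")
cur.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", [
    (1, "2024-01-05", 1200.0, 400.0),
    (1, "2024-01-12", 900.0, 300.0),
    (2, "2024-01-06", 2000.0, 1000.0)])

# Window functions: per-driver running revenue and a revenue-per-mile ranking.
rows = cur.execute("""
    SELECT driver_id,
           SUM(revenue) OVER (PARTITION BY driver_id
                              ORDER BY trip_date) AS running_rev,
           RANK() OVER (ORDER BY revenue / miles DESC) AS rpm_rank
    FROM trips ORDER BY driver_id, trip_date""").fetchall()
```

Both of driver 1's trips earn 3.0 per mile, so they tie at rank 1 and driver 2's trip drops to rank 3, which is exactly the gap-leaving behavior that distinguishes RANK() from DENSE_RANK().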

    Sample Questions to Explore

    1. Which routes generate the highest profit margin after fuel costs?
    2. How does driver tenure correlate with fuel ef...
  13. 2025 Jobs and Salaries in Data Science

    • kaggle.com
    zip
    Updated Jan 29, 2025
    Cite
    Hina Ismail (2025). 2025 Jobs and Salaries in Data Science [Dataset]. https://www.kaggle.com/datasets/sonialikhan/2025-jobs-and-salaries-in-data-science/versions/1
    Explore at:
    zip (77972 bytes)
    Dataset updated
    Jan 29, 2025
    Authors
    Hina Ismail
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🚀 Data Science Careers in 2025: Jobs and Salary Trends in Pakistan

    Data Science is one of the fastest-growing fields, and by 2025 the demand for skilled professionals in Pakistan will only increase. If you're considering a career in Data Science, here's what you need to know about the top jobs and salary trends.

    šŸ” Top Data Science Jobs in 2025 1) Data Scientist Avg Salary: PKR 1.2M - 2.5M/year (Entry-Level), PKR 3M - 6M/year (Experienced) Skills: Python, R, Machine Learning, Data Visualization

    2) Data Analyst Avg Salary: PKR 800K - 1.5M/year (Entry-Level), PKR 2M - 3.5M/year (Experienced) Skills: SQL, Excel, Tableau, Power BI

    3) Machine Learning Engineer Avg Salary: PKR 1.5M - 3M/year (Entry-Level), PKR 4M - 7M/year (Experienced) Skills: TensorFlow, PyTorch, Deep Learning, NLP

    4)Business Intelligence Analyst Avg Salary: PKR 1M - 2M/year (Entry-Level), PKR 2.5M - 4M/year (Experienced) Skills: Data Warehousing, ETL, Dashboarding

    5) AI Research Scientist Avg Salary: PKR 2M - 4M/year (Entry-Level), PKR 5M - 10M/year (Experienced) Skills: AI Algorithms, Research, Advanced Mathematic

    💡 Why Choose Data Science?

    • High Demand: Every industry in Pakistan needs data professionals.
    • Attractive Salaries: Competitive pay based on technical expertise.
    • Growth Opportunities: Unlimited career growth in this field.

    📈 Salary Trends

    • Entry-Level: PKR 800K - 1.5M/year
    • Mid-Level: PKR 2M - 4M/year
    • Senior-Level: PKR 5M+ (depending on expertise and industry)

    šŸ› ļø How to Get Started? Learn Skills: Focus on Python, SQL, Machine Learning, and Data Visualization. Build Projects: Work on real-world datasets to create a strong portfolio. Network: Connect with industry professionals and join Data Science communities.

    work_year: The year in which the data was recorded. This field indicates the temporal context of the data, important for understanding salary trends over time.

    job_title: The specific title of the job role, like 'Data Scientist', 'Data Engineer', or 'Data Analyst'. This column is crucial for understanding the salary distribution across various specialized roles within the data field.

    job_category: A classification of the job role into broader categories for easier analysis. This might include areas like 'Data Analysis', 'Machine Learning', 'Data Engineering', etc.

    salary_currency: The currency in which the salary is paid, such as USD, EUR, etc. This is important for currency conversion and understanding the actual value of the salary in a global context.

    salary: The annual gross salary of the role in the local currency. This raw salary figure is key for direct regional salary comparisons.

    salary_in_usd: The annual gross salary converted to United States Dollars (USD). This uniform currency conversion aids in global salary comparisons and analyses.

    employee_residence: The country of residence of the employee. This data point can be used to explore geographical salary differences and cost-of-living variations.

    experience_level: Classifies the professional experience level of the employee. Common categories might include 'Entry-level', 'Mid-level', 'Senior', and 'Executive', providing insight into how experience influences salary in data-related roles.

    employment_type: Specifies the type of employment, such as 'Full-time', 'Part-time', 'Contract', etc. This helps in analyzing how different employment arrangements affect salary structures.

    work_setting: The work setting or environment, like 'Remote', 'In-person', or 'Hybrid'. This column reflects the impact of work settings on salary levels in the data industry.

    company_location: The country where the company is located. It helps in analyzing how the location of the company affects salary structures.

    company_size: The size of the employer company, often categorized into small (S), medium (M), and large (L) sizes. This allows for analysis of how company size influences salary.
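The column schema above lends itself to simple group-by summaries, e.g. average salary_in_usd by experience_level. A sketch over hypothetical rows (the values are invented, not taken from the dataset):

```python
from statistics import mean

# Hypothetical rows following the column schema described above.
jobs = [
    {"job_title": "Data Scientist", "experience_level": "Entry-level", "salary_in_usd": 9000},
    {"job_title": "Data Scientist", "experience_level": "Senior", "salary_in_usd": 25000},
    {"job_title": "Data Analyst", "experience_level": "Entry-level", "salary_in_usd": 6000},
    {"job_title": "Data Analyst", "experience_level": "Senior", "salary_in_usd": 14000},
]

# Average salary_in_usd grouped by experience_level.
by_level = {}
for row in jobs:
    by_level.setdefault(row["experience_level"], []).append(row["salary_in_usd"])
avg_by_level = {level: mean(vals) for level, vals in sorted(by_level.items())}
```

The same pattern extends to job_category, company_size, or work_setting as the grouping key.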
