60 datasets found
  1. Netflix Data: Cleaning, Analysis and Visualization

    • kaggle.com
    zip
    Updated Aug 26, 2022
    Cite
    Abdulrasaq Ariyo (2022). Netflix Data: Cleaning, Analysis and Visualization [Dataset]. https://www.kaggle.com/datasets/ariyoomotade/netflix-data-cleaning-analysis-and-visualization
    Available download formats: zip (276607 bytes)
    Dataset updated
    Aug 26, 2022
    Authors
    Abdulrasaq Ariyo
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Netflix is a popular streaming service that offers a vast catalog of movies, TV shows, and original content. This dataset is a cleaned version of the original dataset, which can be found here. The data consists of content added to Netflix from 2008 to 2021; the oldest title was released in 1925 and the newest in 2021. The dataset is cleaned with PostgreSQL and visualized with Tableau. The purpose of this dataset is to test my data cleaning and visualization skills. The cleaned data can be found below and the Tableau dashboard can be found here.

    Data Cleaning

    We are going to:

    1. Treat the nulls
    2. Treat the duplicates
    3. Populate missing rows
    4. Drop unneeded columns
    5. Split columns

    Extra steps and further explanation of the process are given in the code comments.

    --View dataset
    
    SELECT * 
    FROM netflix;
    
    
    --The show_id column is the unique id for the dataset, therefore we are going to check for duplicates
    --Only ids that occur more than once are returned

    SELECT show_id, COUNT(*)
    FROM netflix
    GROUP BY show_id
    HAVING COUNT(*) > 1
    ORDER BY show_id DESC;
    
    --No duplicates
    
    --Check null values across columns
    
    SELECT COUNT(*) FILTER (WHERE show_id IS NULL) AS showid_nulls,
        COUNT(*) FILTER (WHERE type IS NULL) AS type_nulls,
        COUNT(*) FILTER (WHERE title IS NULL) AS title_nulls,
        COUNT(*) FILTER (WHERE director IS NULL) AS director_nulls,
        COUNT(*) FILTER (WHERE movie_cast IS NULL) AS movie_cast_nulls,
        COUNT(*) FILTER (WHERE country IS NULL) AS country_nulls,
        COUNT(*) FILTER (WHERE date_added IS NULL) AS date_added_nulls,
        COUNT(*) FILTER (WHERE release_year IS NULL) AS release_year_nulls,
        COUNT(*) FILTER (WHERE rating IS NULL) AS rating_nulls,
        COUNT(*) FILTER (WHERE duration IS NULL) AS duration_nulls,
        COUNT(*) FILTER (WHERE listed_in IS NULL) AS listed_in_nulls,
        COUNT(*) FILTER (WHERE description IS NULL) AS description_nulls
    FROM netflix;
    
    We can see that there are NULLS. 
    director_nulls = 2634
    movie_cast_nulls = 825
    country_nulls = 831
    date_added_nulls = 10
    rating_nulls = 4
    duration_nulls = 3 
    

    Nulls make up about 30% of the director column, so I will not delete those rows. Instead, I will use another column to populate it. To do that, we first check whether there is a relationship between the movie_cast column and the director column.

    -- Below, we find out if some directors are likely to work with particular cast
    
    WITH cte AS
    (
    SELECT title, CONCAT(director, '---', movie_cast) AS director_cast 
    FROM netflix
    )
    
    SELECT director_cast, COUNT(*) AS count
    FROM cte
    GROUP BY director_cast
    HAVING COUNT(*) > 1
    ORDER BY COUNT(*) DESC;
    
    --With this, we can now populate NULL rows in director
    --using the cast each director usually works with
    
    UPDATE netflix 
    SET director = 'Alastair Fothergill'
    WHERE movie_cast = 'David Attenborough'
    AND director IS NULL ;
    
    --Repeat this step to populate the rest of the director nulls
    --Populate the rest of the NULL in director as "Not Given"
    
    UPDATE netflix 
    SET director = 'Not Given'
    WHERE director IS NULL;
    
    --When I was doing this, I found a less complex and faster way to populate a column which I will use next
    

    As with the director column, I will not delete the nulls in country. Since a title's country is related to its director, we are going to populate the country column using the director column.

    --Populate the country using the director column
    
    --Preview the countries that can be filled in from another title by the same director

    SELECT COALESCE(nt.country, nt2.country)
    FROM netflix AS nt
    JOIN netflix AS nt2
    ON nt.director = nt2.director
    AND nt.show_id <> nt2.show_id
    WHERE nt.country IS NULL;

    --Apply the update

    UPDATE netflix
    SET country = nt2.country
    FROM netflix AS nt2
    WHERE netflix.director = nt2.director
    AND netflix.show_id <> nt2.show_id
    AND netflix.country IS NULL;
    
    
    --To confirm if there are still directors linked to country that refuse to update
    
    SELECT director, country, date_added
    FROM netflix
    WHERE country IS NULL;
    
    --Populate the rest of the NULL in country as "Not Given"
    
    UPDATE netflix 
    SET country = 'Not Given'
    WHERE country IS NULL;
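
The country backfill described above can also be tried on a toy table. The following is a minimal sketch only: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, the three rows and the director names are hypothetical, and a correlated subquery replaces the UPDATE ... FROM form used in the original so that rows with no donor are left for the 'Not Given' step.

```python
import sqlite3


def backfill_country(conn):
    """Fill NULL country values from another title by the same director."""
    conn.execute(
        """
        UPDATE netflix
        SET country = (
            SELECT nt2.country
            FROM netflix AS nt2
            WHERE nt2.director = netflix.director
              AND nt2.show_id <> netflix.show_id
              AND nt2.country IS NOT NULL
            ORDER BY nt2.show_id
            LIMIT 1
        )
        WHERE country IS NULL
        """
    )
    # Rows that are still NULL had no donor row, so label them explicitly.
    conn.execute("UPDATE netflix SET country = 'Not Given' WHERE country IS NULL")


# Hypothetical sample data: s2 can borrow a country from s1, s3 cannot.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE netflix (show_id TEXT PRIMARY KEY, director TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO netflix VALUES (?, ?, ?)",
    [
        ("s1", "Director A", "India"),
        ("s2", "Director A", None),
        ("s3", "Director B", None),
    ],
)
backfill_country(conn)
rows = conn.execute("SELECT show_id, country FROM netflix ORDER BY show_id").fetchall()
print(rows)
```

Picking the donor with ORDER BY and LIMIT 1 makes the result deterministic when a director has titles from several countries; which donor is the right one remains a judgment call.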
    

    Only 10 of the more than 8,000 rows have a null date_added, so deleting them will not materially affect our analysis or visualization.

    --Show date_added nulls
    
    SELECT show_id, date_added
    FROM netflix_clean
    WHERE date_added IS NULL;
    
    --DELETE nulls
    
    DELETE F...
    
  2. Atlix - Data Cleaning to Data Viz

    • kaggle.com
    zip
    Updated Apr 8, 2023
    Cite
    vikram amin (2023). Atlix - Data Cleaning to Data Viz [Dataset]. https://www.kaggle.com/datasets/vikramamin/atlix-data-cleaning-to-data-viz
    Available download formats: zip (177969 bytes)
    Dataset updated
    Apr 8, 2023
    Authors
    vikram amin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The reference for the dataset and the dashboard was the YouTube channel codebasics. I have used a fictitious company called Atlix, where the Sales Director wants the sales data in a proper format that can help in decision making.

    We have a total of 5 tables, namely customers, products, markets, date and transactions. The data is exported from MySQL to Tableau.

    In Tableau, inner joins were used.

    In the transactions table, we notice that some sales amount figures are either negative or zero while the sales quantity is 1 or more. This cannot be right. Therefore, we filter the sales amount in Tableau by setting its minimum value to 1.

    When the currency column from the transactions table was grouped in MySQL, both ‘USD’ and ‘INR’ showed up. Sales data cannot be reported in two currencies, so this was rectified by converting the USD sales amounts into INR at the latest exchange rate of Rs. 81.

    We make the above change in Tableau by creating a new calculated field called ‘Normalised Sales Amount’, which tests the currency rather than the amount: IF [Currency] == 'USD' THEN [Sales Amount] * 81 ELSE [Sales Amount] END.
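
The filter and the currency conversion described above can be sketched outside Tableau as well. This is an illustration only: the field names sales_amount and currency and the three sample transactions are assumptions, and the rate of 81 is the one quoted in the description.

```python
USD_TO_INR = 81  # exchange rate quoted in the description


def normalised_sales_amount(sales_amount, currency, usd_to_inr=USD_TO_INR):
    """Return the sales amount expressed in INR."""
    if currency.strip().upper() == "USD":
        return sales_amount * usd_to_inr
    return sales_amount


# Hypothetical transactions; the last one mimics the zero-amount rows
# that the description filters out.
transactions = [
    {"sales_amount": 500, "currency": "INR"},
    {"sales_amount": 10, "currency": "USD"},
    {"sales_amount": 0, "currency": "INR"},
]

# Keep only rows with a sales amount of at least 1, then convert to INR.
valid = [t for t in transactions if t["sales_amount"] >= 1]
normalised = [normalised_sales_amount(t["sales_amount"], t["currency"]) for t in valid]
print(normalised)  # [500, 810]
```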

    Conclusion: The dashboard is interactive and has filters. For example, clicking on Mumbai under “Sales by Markets” changes the other charts so that they show only the results pertaining to Mumbai. The same can be done by year, month, customer, product etc. A parameter with a filter has also been created for top customers and top products. This produces a slider that can be used to view the top 10 customers and products and adjust the number accordingly.

    The following information can be passed on to the sales team or director.

    Total Sales: Sales from Jun’17 to Feb’20 were INR 12.83 million. Sales revenue dropped by 57% from 2018 to 2019. The year 2020 has not been considered as it only accounts for 2 months of data.

    Markets: Mumbai, the top performing market, accounts for 51% of total sales and has seen a drop in sales of almost 64% from 2018 to 2019.

    Top Customers: In 2018, Path was in 2nd position with 19% of total sales, behind Electricalslytical with 21%. In 2019, Electricalslytical and Path were the 2nd and 4th highest customers by sales respectively.

    By targeting specific markets and customers through new ideas such as promotions, discounts etc., we can look to reverse the trend of decreasing sales.

  3. Data Preparation Tools Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Mar 12, 2025
    Cite
    Data Insights Market (2025). Data Preparation Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/data-preparation-tools-1458728
    Available download formats: ppt, doc, pdf
    Dataset updated
    Mar 12, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Discover the booming Data Preparation Tools market! Learn about its 18.5% CAGR, key players (Microsoft, Tableau, IBM), and regional growth trends from our comprehensive analysis. Explore market segments, drivers, and restraints shaping this crucial sector for businesses of all sizes.

  4. USA Bank Financial Data

    • kaggle.com
    zip
    Updated Jun 28, 2024
    Cite
    VISHAL SINGH SANGRAL (2024). USA Bank Financial Data [Dataset]. https://www.kaggle.com/datasets/vishalsinghsangral/usa-bank-financial-data
    Available download formats: zip (20684 bytes)
    Dataset updated
    Jun 28, 2024
    Authors
    VISHAL SINGH SANGRAL
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Description:

    The myusabank.csv dataset contains daily financial data for a fictional bank (MyUSA Bank) over a two-year period. It includes various key financial metrics such as interest income, interest expense, average earning assets, net income, total assets, shareholder equity, operating expenses, operating income, market share, and stock price. The data is structured to simulate realistic scenarios in the banking sector, including outliers, duplicates, and missing values for educational purposes.

    Potential Student Tasks:

    1. Data Cleaning and Preprocessing:

      • Handle missing values, duplicates, and outliers to ensure data integrity.
      • Normalize or scale data as needed for analysis.
    2. Exploratory Data Analysis (EDA):

      • Visualize trends and distributions of financial metrics over time.
      • Identify correlations between different financial indicators.
    3. Calculating Key Performance Indicators (KPIs):

      • Compute metrics such as Net Interest Margin (NIM), Return on Assets (ROA), Return on Equity (ROE), and Cost-to-Income Ratio using calculated fields.
      • Analyze the financial health and performance of MyUSA Bank based on these KPIs.
    4. Building Tableau Dashboards:

      • Design interactive dashboards to present insights and trends.
      • Include summary cards, bar charts, line charts, and pie charts to visualize financial performance metrics.
    5. Forecasting and Predictive Modeling:

      • Use historical data to forecast future financial performance.
      • Apply regression or time series analysis to predict market share or stock price movements.
    6. Business Insights and Reporting:

      • Interpret findings to derive actionable insights for bank management.
      • Prepare reports or presentations summarizing key findings and recommendations.
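
The KPIs named in task 3 can be computed from the listed metrics with the standard ratio definitions. The sketch below is only an illustration: the dictionary keys are assumed names and may not match the actual column headers in myusabank.csv, and the sample figures are made up.

```python
def bank_kpis(row):
    """Compute common banking KPIs from one record of financial metrics.

    The keys used here are assumed names, not confirmed column headers.
    """
    return {
        # Net Interest Margin: net interest income over average earning assets
        "nim": (row["interest_income"] - row["interest_expense"])
        / row["average_earning_assets"],
        # Return on Assets: net income over total assets
        "roa": row["net_income"] / row["total_assets"],
        # Return on Equity: net income over shareholder equity
        "roe": row["net_income"] / row["shareholder_equity"],
        # Cost-to-Income Ratio: operating expenses over operating income
        "cost_to_income": row["operating_expenses"] / row["operating_income"],
    }


# Made-up figures for a single day, purely for illustration.
sample = {
    "interest_income": 120.0,
    "interest_expense": 40.0,
    "average_earning_assets": 2000.0,
    "net_income": 30.0,
    "total_assets": 3000.0,
    "shareholder_equity": 300.0,
    "operating_expenses": 50.0,
    "operating_income": 100.0,
}
kpis = bank_kpis(sample)
print(kpis)
```

Because the dataset deliberately contains missing values and duplicates, these ratios are best computed after the cleaning step, otherwise a missing or zero denominator will raise an error.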

    Educational Goals:

    The dataset aims to provide hands-on experience in data preprocessing, analysis, and visualization within the context of banking and finance. It encourages students to apply data science techniques to realistic, simulated financial data, enhancing their skills in data-driven decision-making and strategic analysis.

  5. Data Preparation Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Oct 23, 2025
    Cite
    Data Insights Market (2025). Data Preparation Software Report [Dataset]. https://www.datainsightsmarket.com/reports/data-preparation-software-1447211
    Available download formats: pdf, doc, ppt
    Dataset updated
    Oct 23, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Data Preparation Software market is poised for substantial growth, projected to reach an estimated $613 million in 2025 with a compelling Compound Annual Growth Rate (CAGR) of 8.5% through 2033. This robust expansion is fueled by the escalating volume and complexity of data generated across all industries, necessitating efficient tools for cleaning, transforming, and enriching raw data into usable formats for analytics and decision-making. Large enterprises, in particular, are significant adopters, leveraging these solutions to manage vast datasets and derive actionable insights. However, the Small and Medium-sized Enterprises (SMEs) segment is emerging as a key growth driver, as more businesses recognize the competitive advantage that well-prepared data offers, even with limited IT resources. The prevalent trend towards cloud-based solutions further democratizes access to advanced data preparation capabilities, offering scalability and flexibility that are crucial in today's dynamic business environment.

    Key market drivers include the increasing demand for data-driven decision-making, the growing adoption of business intelligence and advanced analytics, and the need for regulatory compliance. Trends such as the integration of AI and machine learning within data preparation tools to automate repetitive tasks, the rise of self-service data preparation for business users, and the focus on data governance and quality are shaping the market landscape.

    While the market exhibits strong growth, potential restraints could include the high initial cost of some sophisticated solutions and the need for skilled personnel to fully leverage their capabilities. Geographically, North America and Europe are expected to continue their dominance, driven by established technological infrastructure and a strong analytics culture. However, the Asia Pacific region is anticipated to witness the fastest growth due to rapid digital transformation and increasing data generation.

    This report provides an in-depth analysis of the global Data Preparation Software market, projecting a robust growth trajectory from a Base Year of 2025 through a Forecast Period of 2025-2033. The Study Period covers 2019-2033, with a particular focus on the Estimated Year of 2025 and the Historical Period of 2019-2024. We project the market to reach substantial valuations, with the global market size estimated to be over $500 million in 2025, and poised for significant expansion in the coming decade.
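
To see what the quoted figures imply year by year, the 2025 estimate can be compounded forward at the stated CAGR. This is a rough sketch that assumes a constant 8.5% rate applied annually to the $613 million estimate; the report itself does not publish this intermediate series.

```python
def project_market_size(base_value, cagr, base_year, end_year):
    """Compound a base-year value forward at a constant annual growth rate."""
    return {
        year: base_value * (1 + cagr) ** (year - base_year)
        for year in range(base_year, end_year + 1)
    }


# Figures quoted in the description: $613 million in 2025, 8.5% CAGR to 2033.
projection = project_market_size(613.0, 0.085, 2025, 2033)
for year, value in projection.items():
    print(f"{year}: ${value:,.0f} million")
```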

  6. IVMOOC 2017 - GloBI Data for Interactive Tableau Map of Spatial and Temporal...

    • data-staging.niaid.nih.gov
    • nde-dev.biothings.io
    • +2more
    Updated Jan 24, 2020
    Cite
    Cains, Mariana; Anand, Srini (2020). IVMOOC 2017 - GloBI Data for Interactive Tableau Map of Spatial and Temporal Distribution of Interactions [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_814911
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Indiana University
    Authors
    Cains, Mariana; Anand, Srini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Global Biotic Interactions (GloBI, www.globalbioticinteractions.org) provides an infrastructure and data service that aggregates and archives known biotic interaction databases to provide easy access to species interaction data. This project explores the coverage of GloBI data against known taxonomic catalogues in order to identify 'gaps' in knowledge of species interactions. We examine the richness of GloBI's datasets using itself as a frame of reference for comparison and explore interaction networks according to geographic regions over time. The resulting analysis and visualizations intend to provide insights that may help to enhance GloBI as a resource for research and education.

    Spatial and temporal biotic interactions data were used in the construction of an interactive Tableau map. The raw data (IVMOOC 2017 GloBI Kingdom Data Extracted 2017 04 17.csv) was extracted from the project-specific SQL database server. The raw data was cleaned and preprocessed (IVMOOC 2017 GloBI Cleaned Tableau Data.csv) for use in the Tableau map. Data cleaning and preprocessing steps are detailed in the companion paper.

    The interactive Tableau map can be found here: https://public.tableau.com/profile/publish/IVMOOC2017-GloBISpatialDistributionofInteractions/InteractionsMapTimeSeries#!/publish-confirm

    The companion paper can be found here: doi.org/10.5281/zenodo.814979

    Complementary high resolution visualizations can be found here: doi.org/10.5281/zenodo.814922

    Project-specific data can be found here: doi.org/10.5281/zenodo.804103 (SQL server database)

  7. Data Preparation Platform Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Jul 23, 2025
    Cite
    Market Research Forecast (2025). Data Preparation Platform Report [Dataset]. https://www.marketresearchforecast.com/reports/data-preparation-platform-531963
    Available download formats: ppt, pdf, doc
    Dataset updated
    Jul 23, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Discover the booming Data Preparation Platform market! Our analysis reveals a projected $30B market by 2033, driven by cloud adoption, AI integration, and growing data volumes. Explore key trends, leading companies (Microsoft, Tableau, Alteryx), and regional insights in this comprehensive market report.

  8. Data Preparation Tools Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jun 25, 2025
    Cite
    Data Insights Market (2025). Data Preparation Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/data-preparation-tools-1968805
    Available download formats: doc, ppt, pdf
    Dataset updated
    Jun 25, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The data preparation tools market is experiencing robust growth, driven by the exponential increase in data volume and velocity across various industries. The rising need for data quality and consistency, coupled with the increasing adoption of advanced analytics and business intelligence solutions, fuels this expansion. A CAGR of, let's assume, 15% (a reasonable estimate given the rapid technological advancements in this space) between 2019 and 2024 suggests a significant market expansion. This growth is further amplified by the increasing demand for self-service data preparation tools that empower business users to access and prepare data without needing extensive technical expertise. Major players like Microsoft, Tableau, and Alteryx are leading the charge, continuously innovating and expanding their offerings to cater to diverse industry needs. The market is segmented based on deployment type (cloud, on-premise), organization size (small, medium, large enterprises), and industry vertical (BFSI, healthcare, retail, etc.), creating lucrative opportunities across various segments.

    However, challenges remain. The complexity of integrating data preparation tools with existing data infrastructures can pose implementation hurdles for certain organizations. Furthermore, the need for skilled professionals to manage and utilize these tools effectively presents a potential restraint to wider adoption. Despite these obstacles, the long-term outlook for the data preparation tools market remains highly positive, with continuous innovation in areas like automated data preparation, machine learning-powered data cleansing, and enhanced collaboration features driving further growth throughout the forecast period (2025-2033). We project a market size of approximately $15 billion in 2025, considering a realistic growth trajectory and the significant investment made by both established players and emerging startups.

  9. Data Preparation Platform Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Sep 20, 2025
    Cite
    Data Insights Market (2025). Data Preparation Platform Report [Dataset]. https://www.datainsightsmarket.com/reports/data-preparation-platform-1368457
    Available download formats: doc, pdf, ppt
    Dataset updated
    Sep 20, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Data Preparation Platform market is poised for substantial growth, estimated to reach $15,600 million by the study's end in 2033, up from $6,000 million in the base year of 2025. This trajectory is fueled by a Compound Annual Growth Rate (CAGR) of approximately 12.5% over the forecast period. The proliferation of big data and the increasing need for clean, usable data across all business functions are primary drivers. Organizations are recognizing that effective data preparation is foundational to accurate analytics, informed decision-making, and successful AI/ML initiatives. This has led to a surge in demand for platforms that can automate and streamline the complex, time-consuming process of data cleansing, transformation, and enrichment. The market's expansion is further propelled by the growing adoption of cloud-based solutions, offering scalability, flexibility, and cost-efficiency, particularly for Small & Medium Enterprises (SMEs). Key trends shaping the Data Preparation Platform market include the integration of AI and machine learning for automated data profiling and anomaly detection, enhanced collaboration features to facilitate teamwork among data professionals, and a growing focus on data governance and compliance. While the market exhibits robust growth, certain restraints may temper its pace. These include the complexity of integrating data preparation tools with existing IT infrastructures, the shortage of skilled data professionals capable of leveraging advanced platform features, and concerns around data security and privacy. Despite these challenges, the market is expected to witness continuous innovation and strategic partnerships among leading companies like Microsoft, Tableau, and Alteryx, aiming to provide more comprehensive and user-friendly solutions to meet the evolving demands of a data-driven world. 
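
The growth rate implied by the two endpoint figures can be checked with the usual CAGR formula, (end / start) ** (1 / years) - 1. The sketch below assumes eight compounding years between the 2025 base year and 2033; with those inputs the implied rate comes out at roughly 12.7%, in the same range as the "approximately 12.5%" quoted in the description.

```python
def implied_cagr(start_value, end_value, years):
    """Return the constant annual growth rate linking two values."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the description: $6,000 million (2025) to $15,600 million (2033).
cagr = implied_cagr(6_000, 15_600, 2033 - 2025)
print(f"Implied CAGR: {cagr:.1%}")
```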

  10. Stock Market Dashboard Build (Python + Tableau)

    • kaggle.com
    zip
    Updated Feb 27, 2025
    Cite
    jackmnob (2025). Stock Market Dashboard Build (Python + Tableau) [Dataset]. https://www.kaggle.com/datasets/jackmnob/stock-market-dashboard-build-python-tableau
    Available download formats: zip (549379249 bytes)
    Dataset updated
    Feb 27, 2025
    Authors
    jackmnob
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Original Credit goes to: Oleh Onyshchak

    Original Owner: https://www.kaggle.com/datasets/jacksoncrow/stock-market-dataset?resource=download

    rawData (.CSVs) Information:

    "This dataset contains historical data of daily prices for each ticker (minus a few incompatible tickers, such as CARR# and UTX#) - currently trading on NASDAQ. The up to date list is available from nasdaqtrader.com.

    The historic data was retrieved from Yahoo finance via yfinance python package."

    Each file contains data from 01/04/2016 to 04/01/2020.

    cleanData (.CSVs) & .ipynb (Python code) Information:

    This edition contains my .ipynb notebook for user replication within JupyterLab and for code transparency via Kaggle. The dataset is cleaned with Python and pandas and then used to create the final Tableau dashboard linked below:

    My Tableau Dashboard: https://public.tableau.com/app/profile/jack3951/viz/TopStocksAnalysisPythonpandas/Dashboard1

    Enjoy!

  11. Data Preparation Tools Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 6, 2025
    Cite
    Archive Market Research (2025). Data Preparation Tools Report [Dataset]. https://www.archivemarketresearch.com/reports/data-preparation-tools-51852
    Available download formats: ppt, doc, pdf
    Dataset updated
    Mar 6, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data Preparation Tools market is booming, projected to reach $3 billion by 2025 with a 17.7% CAGR. Discover key trends, drivers, and restraints shaping this dynamic industry, including regional market share and leading companies like Microsoft, Tableau, and Alteryx. Explore the impact of self-service tools and cloud adoption.

  12. Data Preparation Platform Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 16, 2025
    Cite
    Market Research Forecast (2025). Data Preparation Platform Report [Dataset]. https://www.marketresearchforecast.com/reports/data-preparation-platform-36093
    Available download formats: doc, ppt, pdf
    Dataset updated
    Mar 16, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Discover the booming Data Preparation Platform market! Learn about its $15 billion valuation (2025), 18% CAGR, key drivers, trends, and leading players like Microsoft, Tableau, and Alteryx. Explore regional market share and growth projections to 2033. Get your insights now!

  13. ES Encampment Cleaning Tracking (Tacoma)

    • hub.arcgis.com
    • data.tacoma.gov
    Updated Sep 12, 2024
    Cite
    City of Tacoma GIS (2024). ES Encampment Cleaning Tracking (Tacoma) [Dataset]. https://hub.arcgis.com/datasets/b9e836fe916f4b16baaff37fa415f60b
    Dataset updated
    Sep 12, 2024
    Dataset authored and provided by
    City of Tacoma GIS
    Area covered
    Description

    ES Encampment Cleaning Tracking Public is a hosted layer view intended for sharing encampment cleanup information with the public and for use within Tableau dashboards. Homeless encampment cleanup data is collected by contractors and tracks information related to cleanup efforts within encampments and perimeters. The data informs the Tidy-Up Tacoma Data Dashboard and aids in the analysis of trends. Data is updated daily. For more information contact: Vicky Tirrell, Business Services Analyst, ES SW Operations Support Services, vtirrell@cityoftacoma.org

  14. Data Preparation Tools Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 6, 2025
    Cite
    Archive Market Research (2025). Data Preparation Tools Report [Dataset]. https://www.archivemarketresearch.com/reports/data-preparation-tools-52055
    Available download formats: ppt, pdf, doc
    Dataset updated
    Mar 6, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The booming data preparation tools market, projected to reach $33.2 billion by 2033 with a 15% CAGR, is reshaping data analytics. Learn about key drivers, market segmentation (self-service, data integration, applications), leading vendors (Microsoft, Tableau, Alteryx), and regional trends influencing this rapidly evolving landscape.

  15. Visualizing Chicago Crime Data

    • kaggle.com
    zip
    Updated Jul 1, 2022
    Cite
    Elijah Toumoua (2022). Visualizing Chicago Crime Data [Dataset]. https://www.kaggle.com/datasets/elijahtoumoua/chicago-analysis-of-crime-data-dashboard
    Available download formats: zip (94861784 bytes)
    Dataset updated
    Jul 1, 2022
    Authors
    Elijah Toumoua
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Chicago
    Description

    Prelude

    This dataset is a cleaned version of the Chicago Crime Dataset, which can be found here. All rights for the dataset go to the original owners. The purpose of this dataset is to display my skills in visualizations and creating dashboards. To be specific, I will attempt to create a dashboard that will allow users to see metrics for a specific crime within a given year using filters and metrics. Due to this, there will not be much of a focus on the analysis of the data, but there will be portions discussing the validity of the dataset, the steps I took to clean the data, and how I organized it. The cleaned datasets can be found below, the Query (which utilized BigQuery) can be found here and the Tableau dashboard can be found here.

    About the Dataset

    Important Facts

    The dataset comes directly from the City of Chicago's website, under the page "City Data Catalog." The data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system and is updated daily to keep the information accurate. This means that the record of a crime on a specific date may later be revised to better reflect the case. The dataset covers crimes from 2001 up to seven days before today's date.

    Reliability

    Using the ROCCC method, we can see that:

    • The data has high reliability: The data covers the entirety of Chicago over a little more than two decades. It covers all the wards within Chicago and even gives the street names. While we may not know how big the sample size is, I do believe that the dataset has high reliability since it geographically covers the entirety of Chicago.
    • The data has high originality: The dataset was obtained directly from the Chicago Police Dept. using their database, so we can say this dataset is original.
    • The data is somewhat comprehensive: While we do have important information such as the types of crimes committed and their geographic location, I do not think this gives us proper insights as to why these crimes take place. We can pinpoint the location of the crime, but we are limited by the information we have. How hot was the day of the crime? Did the crime take place in a low-income neighborhood? I believe that the absence of these key factors prevents us from getting proper insights as to why these crimes take place, so I would say that this dataset is subpar in how comprehensive it is.
    • The data is current: The dataset is updated frequently to display crimes that took place up to seven days prior to today's date and may even update past crimes as more information comes to light. Due to the frequent updates, I do believe the data is current.
    • The data is cited: As mentioned prior, the data is collected directly from the police's CLEAR system, so we can say that the data is cited.

    Processing the Data

    Cleaning the Dataset

    The purpose of this step is to clean the dataset such that there are no outliers in the dashboard. To do this, we are going to do the following:

    • Check for any null values and determine whether we should remove them.
    • Update any values where there may be typos.
    • Check for outliers and determine if we should remove them.

    The following steps are explained in the code segments below. (I used BigQuery, so the code follows BigQuery's syntax.)

    -- Examining the dataset
    -- There are over 7.5 million rows of data
    -- Putting a limit so it does not take a long time to run

    SELECT *
    FROM `portfolioproject-350601.ChicagoCrime.Crime`
    LIMIT 1000;

    -- Seeing which points are null
    -- There are 85,000 null points so we can exclude them as it's not a significant amount since it is only ~1.3% of the dataset
    -- Most of the null points are in the lat and long, which we will need later
    -- Because we don't have the full address, we can't estimate the lat and long in SQL so we will have to delete the rows with Null Data

    SELECT *
    FROM `portfolioproject-350601.ChicagoCrime.Crime`
    WHERE unique_key IS NULL OR case_number IS NULL OR date IS NULL OR primary_type IS NULL OR location_description IS NULL OR arrest IS NULL OR longitude IS NULL OR latitude IS NULL;

    -- Deleting all null rows

    DELETE FROM `portfolioproject-350601.ChicagoCrime.Crime`
    WHERE unique_key IS NULL OR case_number IS NULL OR date IS NULL OR primary_type IS NULL OR location_description IS NULL OR arrest IS NULL OR longitude IS NULL OR latitude IS NULL;

    -- Checking for any duplicates in the unique keys
    -- None to be found
    SELECT unique_key, COUNT(unique_key) FROM `portfolioproject-350601.ChicagoCrime....
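
The null check, delete, and duplicate check described above can be tried on a small scale. The following is only a minimal sketch using Python's built-in sqlite3 module rather than BigQuery; the `crime` table, its reduced column set, and the sample rows are hypothetical stand-ins for the real dataset.

```python
import sqlite3

# Hypothetical miniature of the crime table; column names follow the queries above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crime (unique_key INTEGER, case_number TEXT, "
    "primary_type TEXT, latitude REAL, longitude REAL)"
)
conn.executemany(
    "INSERT INTO crime VALUES (?, ?, ?, ?, ?)",
    [
        (1, "HX100", "THEFT", 41.88, -87.63),
        (2, "HX101", "BATTERY", None, None),  # missing coordinates
        (3, "HX102", None, 41.90, -87.65),    # missing crime type
        (4, "HX103", "ASSAULT", 41.75, -87.60),
    ],
)

# One shared predicate so the inspection and the delete cannot drift apart.
NULL_FILTER = (
    "unique_key IS NULL OR case_number IS NULL OR primary_type IS NULL "
    "OR latitude IS NULL OR longitude IS NULL"
)

total_rows = conn.execute("SELECT COUNT(*) FROM crime").fetchone()[0]
null_rows = conn.execute(
    f"SELECT COUNT(*) FROM crime WHERE {NULL_FILTER}"
).fetchone()[0]
null_share = null_rows / total_rows  # share of rows that would be deleted

conn.execute(f"DELETE FROM crime WHERE {NULL_FILTER}")

# Any key appearing more than once would show up here.
duplicate_keys = conn.execute(
    "SELECT unique_key, COUNT(*) FROM crime "
    "GROUP BY unique_key HAVING COUNT(*) > 1"
).fetchall()
remaining_rows = conn.execute("SELECT COUNT(*) FROM crime").fetchone()[0]
print(null_rows, null_share, remaining_rows, duplicate_keys)
```

Computing the share of affected rows before deleting makes it easier to judge whether dropping them is acceptable.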

  16. Tableau Dummy Dataset for Practice

    • kaggle.com
    Updated Aug 21, 2025
    Cite
    Piush Dave (2025). Tableau Dummy Dataset for Practice [Dataset]. https://www.kaggle.com/datasets/piyushdave/tableau-dummy-dataset-for-practice
    Dataset updated
    Aug 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Piush Dave
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Domain-Specific Dataset and Visualization Guide

    This package contains 20 realistic datasets in CSV format across different industries, along with 20 text files suggesting visualization ideas. Each dataset includes about 300 rows of synthetic but domain-appropriate data. They are designed for data analysis, visualization practice, machine learning projects, and dashboard building.

    What’s inside

    • 20 CSV files, one for each domain:

      1. Education
      2. E-Commerce
      3. Healthcare
      4. Finance
      5. Retail
      6. Social Media
      7. Manufacturing
      8. Sports
      9. Transport
      10. Hospitality
      11. Telecom
      12. Banking
      13. Real Estate
      14. Gaming
      15. Agriculture
      16. Automobile
      17. Energy
      18. Insurance
      19. Government
      20. Entertainment

    • 20 TXT files, each listing 10 relevant graphing options for the dataset.

    • MASTER_INDEX.csv, which summarizes all domains with their column names.

    Use cases

    • Practice data cleaning, exploration, and visualization in Excel, Tableau, Power BI, or Python.
    • Build dashboards for specific industries.
    • Train beginner-level machine learning models such as classification and regression.
    • Use in classroom teaching or workshops as ready-made datasets.

    Example

    • Education dataset has columns like StudentName, Class, Subject, Marks, AttendancePercent. Suggested graphs: bar chart of average marks by subject, scatter plot of marks vs attendance percent, line chart of attendance over time.

    • E-Commerce dataset has columns like OrderDate, Product, Category, Price, Quantity, Total. Suggested graphs: line chart of revenue trend, bar chart of revenue by category, pie chart of payment mode share.
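
As an illustration of the first suggested graph, the numbers behind a "bar chart of average marks by subject" could be computed as in the sketch below. It assumes only the column names listed above; the sample rows are hypothetical and are not taken from the actual CSV files.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical rows using the Education column names listed above.
sample_csv = """StudentName,Class,Subject,Marks,AttendancePercent
Student 1,10,Math,80,90
Student 2,10,Math,90,85
Student 3,10,Science,70,95
"""

# Group marks by subject, then average each group.
marks_by_subject = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample_csv)):
    marks_by_subject[row["Subject"]].append(float(row["Marks"]))

average_marks = {subject: mean(marks) for subject, marks in marks_by_subject.items()}
print(average_marks)
```

The resulting dictionary maps each subject to one bar height, which is the shape most charting tools expect.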

  17. Data Prep Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Jun 23, 2025
    Cite
    Market Research Forecast (2025). Data Prep Report [Dataset]. https://www.marketresearchforecast.com/reports/data-prep-547253
    Explore at:
    Available download formats: pdf, doc, ppt
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data Prep market is booming, projected to reach $12 Billion by 2033 with a 13.7% CAGR. Discover key trends, leading companies (Alteryx, Informatica, IBM), and regional insights in this comprehensive market analysis. Learn how self-service tools and cloud solutions are transforming data preparation.

  18. Sales Performance Analysis with Sheet/Tableau

    • kaggle.com
    zip
    Updated May 21, 2025
    Cite
    Victor Agbon Eghoghon (2025). Sales Performance Analysis with Sheet/Tableau [Dataset]. https://www.kaggle.com/datasets/victoragboneghoghon/sales-performance-analysis-with-sheettableau
    Explore at:
    Available download formats: zip (96628 bytes)
    Dataset updated
    May 21, 2025
    Authors
    Victor Agbon Eghoghon
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Project Introduction and Goals

    This project is focused on analyzing a sales dataset using Google Sheets for data cleaning and Tableau for visualizations. The main objective is to uncover actionable insights such as top performing countries, best selling products, and monthly sales trends. I aim to present these findings through an interactive dashboard that can be used by business stakeholders for decision making.

    Process Overview

    1. Data Cleaning (Google Sheets)

      • Removed blank rows and filtered out missing values.
      • Standardized product and region names for consistency.
      • Split combined columns (e.g., date & time) for easier analysis.
      • Replaced missing or incorrect values with relevant estimates (e.g., average or “unknown”).

    2. Exploratory Analysis

      • Calculated total sales by country.
      • Identified the best-selling products and frequent buyers.
      • Tracked monthly sales trends.

    3. Visualization (Tableau)

      • Created a dynamic sales dashboard including:
        • Line chart showing sales over time
        • Pie chart of product categories
        • Bar chart of top 10 customers by revenue
        • Country-wise sales comparison

    4. Conclusion

    The analysis reveals key patterns in sales distribution, highlights top contributors to revenue, and suggests areas needing attention (e.g., low-performing countries). The dashboard enables real-time filtering and deeper insight for users.

    https://public.tableau.com/views/Salesperformanceanalysis_17478415969510/Salesperformanceanalysis?:language=en-GB&:sid=&:display_count=n&:origin=viz_share_link

  19. Dataset – Student & Early-Career Survey on Data-Analytics Tool Adoption and...

    • figshare.com
    xlsx
    Updated Jun 29, 2025
    Cite
    Lev Radman (2025). Dataset – Student & Early-Career Survey on Data-Analytics Tool Adoption and Decision-Making (Uzbekistan, Apr–May 2025) [Dataset]. http://doi.org/10.6084/m9.figshare.29430227.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 29, 2025
    Dataset provided by
    figshare
    Authors
    Lev Radman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose. This dataset contains anonymised raw responses (n = 55, 31 variables) from a cross-sectional survey investigating factors that influence the adoption of data-analytics tools (Excel/Sheets, Power BI/Tableau, Python notebooks, Google Analytics) among graduate students and early-career professionals in Uzbekistan.

    Instrument. Items operationalise seven UTAUT/TAM-based constructs: Performance Expectancy, Effort Expectancy, Behavioural Intention, Familiarity & Usage, Task–Technology Fit, Barriers to Adoption, plus Demographics (age, gender, study programme, prior stats courses, work experience). All Likert items use a five-point scale.

    Collection & cleaning. Data were collected via Google Forms between 02 Apr 2025 and 22 Apr 2025 through university e-mail lists, Telegram study channels, and LinkedIn posts. Five partial records (> 20 % missing) were removed; remaining open-text answers were lower-cased, spell-checked, and stemmed. The file is provided exactly as analysed in the accompanying thesis; no further processing (e.g., recoding) has been performed.

    File contents. survey_responses.xlsx – one worksheet (“Form Responses 1”) with 55 rows × 31 columns. Column A (“Timestamp”) shows submission time in UTC+5. Variable names follow the original question stems for transparency.

    Ethics & privacy. All participants gave informed e-consent; no personal identifiers (names, e-mails, IPs) are included. Ethical approval: Silk Road University REC # 2025-DX-012.

  20. 🎬 IMDB 2020 Top Movies – Tableau Dashboard

    • kaggle.com
    zip
    Updated Jul 30, 2025
    Cite
    Sahil Raj (2025). 🎬 IMDB 2020 Top Movies – Tableau Dashboard [Dataset]. https://www.kaggle.com/datasets/ssrai7/imdb-2020-top-movies-tableau-dashboard
    Explore at:
    Available download formats: zip (499974 bytes)
    Dataset updated
    Jul 30, 2025
    Authors
    Sahil Raj
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    🎬 IMDB 2020 – Tableau Dashboard Project

    Dashboard Preview: https://github.com/ssrAiLab/IMDB-2020-Tableau-Dashboard/blob/main/Dashboard%20Screenshot.png?raw=true

    📊 Project Overview

    The IMDB Top 1000 Movies of 2020 dataset provides a rich canvas for exploring the world of cinema — and this Tableau project transforms that data into stunning visuals and insights.

    I’ve designed a dynamic and visually appealing dashboard using Tableau that highlights movie trends, ratings, genres, and key metrics from 2020’s cinematic landscape.

    🧠 Key Insights Covered

    ✅ Top 20 Movies by IMDB Rating
    ✅ Distribution of Movies by Genre
    ✅ Top Directors with Most Hits
    ✅ Language & Country-wise Movie Count
    ✅ Gross Earnings vs Ratings
    ✅ Runtime Distribution Analysis
    ✅ Certificate-wise Movie Breakdown
    ✅ Year-wise Trend in Popularity

    🛠️ Tools & Technologies Used

    • Tableau Public – for creating the interactive dashboard
    • Excel – for data cleaning and transformation
    • Kaggle & GitHub – for hosting and sharing the project
    • Design Thinking – for dashboard layout and visual balance

    🗂️ Files Included

    File                        Description
    IMDB_2020_Dashboard.twb     Tableau workbook file
    imdb_top_1000.csv           Cleaned dataset used
    Dashboard Screenshot.png    Snapshot of the final dashboard
    archive.zip                 Contains all the files in one place

    🚀 How to Use

    1. Download the .twb file from this dataset
    2. Open it in Tableau Desktop or Tableau Public (free version)
    3. Explore the dashboard and insights interactively
    4. Customize or expand the analysis with your own creativity

    👨‍💻 About the Creator

    Sahil Raj
    Data Analyst | Tableau Storyteller | Movie Enthusiast 🎥
    🔗 LinkedIn | GitHub | Kaggle

    “Cinema is more than entertainment — it’s culture, storytelling, and data waiting to be visualized.”

    ⭐ Show Some Love

    • If you like the project, give it an upvote 💖
    • Share your feedback or forks
    • Connect on LinkedIn or GitHub for collaborations

    📌 This project is for educational and portfolio purposes only. IMDB data is publicly available and curated for non-commercial use.

Cite
Abdulrasaq Ariyo (2022). Netflix Data: Cleaning, Analysis and Visualization [Dataset]. https://www.kaggle.com/datasets/ariyoomotade/netflix-data-cleaning-analysis-and-visualization

Netflix Data: Cleaning, Analysis and Visualization

Cleaning and Visualization with Pgsql and Tableau

Explore at:
Available download formats: zip (276607 bytes)
Dataset updated
Aug 26, 2022
Authors
Abdulrasaq Ariyo
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Netflix is a popular streaming service that offers a vast catalog of movies, TV shows, and original content. This dataset is a cleaned version of the original, which can be found here. The data consists of titles added to Netflix from 2008 to 2021. The oldest title dates from 1925 and the newest from 2021. The dataset is cleaned with PostgreSQL and visualized with Tableau. The purpose of this dataset is to test my data cleaning and visualization skills. The cleaned data can be found below and the Tableau dashboard can be found here.

Data Cleaning

We are going to:

1. Treat the nulls
2. Treat the duplicates
3. Populate missing rows
4. Drop unneeded columns
5. Split columns

Extra steps and further explanation of the process are given in the code comments.

--View dataset

SELECT * 
FROM netflix;

--The show_id column is the unique id for the dataset, therefore we are going to check for duplicates
                                  
SELECT show_id, COUNT(*)                                                                                      
FROM netflix 
GROUP BY show_id                                                                                              
ORDER BY show_id DESC;

--No duplicates
--Check null values across columns

SELECT COUNT(*) FILTER (WHERE show_id IS NULL) AS showid_nulls,
    COUNT(*) FILTER (WHERE type IS NULL) AS type_nulls,
    COUNT(*) FILTER (WHERE title IS NULL) AS title_nulls,
    COUNT(*) FILTER (WHERE director IS NULL) AS director_nulls,
    COUNT(*) FILTER (WHERE movie_cast IS NULL) AS movie_cast_nulls,
    COUNT(*) FILTER (WHERE country IS NULL) AS country_nulls,
    COUNT(*) FILTER (WHERE date_added IS NULL) AS date_added_nulls,
    COUNT(*) FILTER (WHERE release_year IS NULL) AS release_year_nulls,
    COUNT(*) FILTER (WHERE rating IS NULL) AS rating_nulls,
    COUNT(*) FILTER (WHERE duration IS NULL) AS duration_nulls,
    COUNT(*) FILTER (WHERE listed_in IS NULL) AS listed_in_nulls,
    COUNT(*) FILTER (WHERE description IS NULL) AS description_nulls
FROM netflix;
We can see that there are NULLS. 
director_nulls = 2634
movie_cast_nulls = 825
country_nulls = 831
date_added_nulls = 10
rating_nulls = 4
duration_nulls = 3 
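
The per-column null counts can also be generated from a list of column names instead of being written out by hand. The sketch below is a hedged illustration using Python's built-in sqlite3 module, not the PostgreSQL session used here; it swaps the FILTER clause for the portable `COUNT(*) - COUNT(column)` idiom, and the three-column table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical miniature of the netflix table with only three columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE netflix (show_id TEXT, director TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO netflix VALUES (?, ?, ?)",
    [
        ("s1", "Director A", "Country X"),
        ("s2", None, "Country X"),
        ("s3", "Director B", None),
        ("s4", None, None),
    ],
)

columns = ["show_id", "director", "country"]

# COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) is the number of NULLs.
select_list = ", ".join(f"COUNT(*) - COUNT({c}) AS {c}_nulls" for c in columns)
row = conn.execute(f"SELECT COUNT(*), {select_list} FROM netflix").fetchone()

total = row[0]
null_counts = dict(zip(columns, row[1:]))
# Share of NULLs per column, useful for deciding between deleting and filling.
null_share = {c: n / total for c, n in null_counts.items()}
print(null_counts, null_share)
```

Looking at the share rather than the raw count is what motivates keeping a column with many nulls and deleting rows only where very few are affected.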

Nulls make up about 30% of the director column, so I will not delete them. Instead, I will use another column to populate it. To populate the director column, we want to find out whether there is a relationship between the movie_cast column and the director column.

-- Below, we find out if some directors are likely to work with particular cast

WITH cte AS
(
SELECT title, CONCAT(director, '---', movie_cast) AS director_cast 
FROM netflix
)

SELECT director_cast, COUNT(*) AS count
FROM cte
GROUP BY director_cast
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;

With this, we can now populate the NULL rows in director using their records in movie_cast.

UPDATE netflix 
SET director = 'Alastair Fothergill'
WHERE movie_cast = 'David Attenborough'
AND director IS NULL ;

--Repeat this step to populate the rest of the director nulls
--Populate the rest of the NULL in director as "Not Given"

UPDATE netflix 
SET director = 'Not Given'
WHERE director IS NULL;
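
The same idea, finding director and cast pairs that occur more than once and using them to fill missing directors, can be sketched outside SQL. This is only an illustration under stated assumptions: the rows are hypothetical, and unlike the CONCAT query above it ignores pairs in which either value is missing.

```python
from collections import Counter

# Hypothetical (director, movie_cast) rows; None marks a missing director.
rows = [
    ("Director A", "Actor 1"),
    ("Director A", "Actor 1"),
    ("Director B", "Actor 2"),
    (None, "Actor 1"),
    (None, "Actor 3"),
]

# Count how often each complete (director, cast) pair occurs.
pair_counts = Counter(
    (director, cast) for director, cast in rows
    if director is not None and cast is not None
)
repeated_pairs = [(pair, n) for pair, n in pair_counts.most_common() if n > 1]

# Use only repeated pairs as evidence that a cast "belongs" to a director.
cast_to_director = {cast: director for (director, cast), _ in repeated_pairs}

# Fill missing directors from the mapping, falling back to "Not Given".
filled = [
    (director if director is not None else cast_to_director.get(cast, "Not Given"), cast)
    for director, cast in rows
]
print(filled)
```

Restricting the mapping to repeated pairs is a design choice made for this sketch; a single co-occurrence is weaker evidence that a cast and a director usually work together.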

--When I was doing this, I found a less complex and faster way to populate a column which I will use next

Just like the director column, I will not delete the nulls in country. Since the country column is related to director and movie, we are going to populate the country column using the director column.

--Populate the country using the director column

SELECT COALESCE(nt.country,nt2.country) 
FROM netflix AS nt
JOIN netflix AS nt2 
ON nt.director = nt2.director 
AND nt.show_id <> nt2.show_id
WHERE nt.country IS NULL;

UPDATE netflix
SET country = nt2.country
FROM netflix AS nt2
WHERE netflix.director = nt2.director and netflix.show_id <> nt2.show_id 
AND netflix.country IS NULL;
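
To see what filling country from another row by the same director does, the step can be reproduced on a tiny table. The sketch below uses Python's built-in sqlite3 module and a correlated subquery instead of PostgreSQL's UPDATE ... FROM; the sample rows are hypothetical. It also adds an explicit `country IS NOT NULL` condition on the donor row, which is an assumption of this sketch and not part of the query above.

```python
import sqlite3

# Hypothetical miniature of the netflix table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE netflix (show_id TEXT PRIMARY KEY, director TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO netflix VALUES (?, ?, ?)",
    [
        ("s1", "Director A", "Country X"),
        ("s2", "Director A", None),   # can be filled from s1
        ("s3", "Director B", None),   # no other row for this director
        ("s4", "Director C", "Country Y"),
    ],
)

# For each row with a NULL country, borrow the country of another row
# by the same director, if one with a known country exists.
conn.execute(
    """
    UPDATE netflix
    SET country = (
        SELECT nt2.country
        FROM netflix AS nt2
        WHERE nt2.director = netflix.director
          AND nt2.show_id <> netflix.show_id
          AND nt2.country IS NOT NULL
        ORDER BY nt2.show_id
        LIMIT 1
    )
    WHERE country IS NULL
    """
)

# Rows that could not be filled are still NULL and get the placeholder.
conn.execute("UPDATE netflix SET country = 'Not Given' WHERE country IS NULL")

result = dict(conn.execute("SELECT show_id, country FROM netflix ORDER BY show_id"))
print(result)
```

The ORDER BY and LIMIT make the choice of donor row deterministic when a director has titles from several countries.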


--To confirm if there are still directors linked to country that refuse to update

SELECT director, country, date_added
FROM netflix
WHERE country IS NULL;

--Populate the rest of the NULL in country as "Not Given"

UPDATE netflix 
SET country = 'Not Given'
WHERE country IS NULL;

Only 10 of the more than 8,000 rows have a null date_added, so deleting them will not meaningfully affect our analysis or visualization.

--Show date_added nulls

SELECT show_id, date_added
FROM netflix_clean
WHERE date_added IS NULL;

--DELETE nulls

DELETE F...