100+ datasets found
  1. Top 1000 Kaggle Datasets

    • kaggle.com
    zip
    Updated Jan 3, 2022
    Cite
    Trrishan (2022). Top 1000 Kaggle Datasets [Dataset]. https://www.kaggle.com/datasets/notkrishna/top-1000-kaggle-datasets
    Explore at:
    Available download formats: zip (34269 bytes)
    Dataset updated
    Jan 3, 2022
    Authors
    Trrishan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    From wiki

    Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

    Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and artificial intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was the founding chair, succeeded by Max Levchin. Equity was raised in 2011, valuing the company at $25 million. On 8 March 2017, Google announced that it was acquiring Kaggle.[1][2]

    Source: Kaggle

  2. Powerful Data for Power BI

    • kaggle.com
    zip
    Updated Aug 28, 2023
    Cite
    Shiv_D24Coder (2023). Powerful Data for Power BI [Dataset]. https://www.kaggle.com/datasets/shivd24coder/powerful-data-for-power-bi
    Explore at:
    Available download formats: zip (907404 bytes)
    Dataset updated
    Aug 28, 2023
    Authors
    Shiv_D24Coder
    Description

    Explore the world of data visualization with this Power BI dataset containing HR Analytics and Sales Analytics datasets. Gain insights, create impactful reports, and craft engaging dashboards using real-world data from HR and sales domains. Sharpen your Power BI skills and uncover valuable data-driven insights with this powerful dataset. Happy analyzing!

  3. Top 2500 Kaggle Datasets

    • kaggle.com
    Updated Feb 16, 2024
    Cite
    Saket Kumar (2024). Top 2500 Kaggle Datasets [Dataset]. http://doi.org/10.34740/kaggle/dsv/7637365
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Saket Kumar
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset compiles the top 2500 datasets from Kaggle, encompassing a diverse range of topics and contributors. It provides insights into dataset creation, usability, popularity, and more, offering valuable information for researchers, analysts, and data enthusiasts.

    Research Analysis: Researchers can utilize this dataset to analyze trends in dataset creation, popularity, and usability scores across various categories.

    Contributor Insights: Kaggle contributors can explore the dataset to gain insights into factors influencing the success and engagement of their datasets, aiding in optimizing future submissions.

    Machine Learning Training: Data scientists and machine learning enthusiasts can use this dataset to train models for predicting dataset popularity or usability based on features such as creator, category, and file types.

    Market Analysis: Analysts can leverage the dataset to conduct market analysis, identifying emerging trends and popular topics within the data science community on Kaggle.

    Educational Purposes: Educators and students can use this dataset to teach and learn about data analysis, visualization, and interpretation within the context of real-world datasets and community-driven platforms like Kaggle.

    Column Definitions:

    • Dataset Name: Name of the dataset.
    • Created By: Creator(s) of the dataset.
    • Last Updated in number of days: Time elapsed since last update.
    • Usability Score: Score indicating the ease of use.
    • Number of File: Quantity of files included.
    • Type of file: Format of files (e.g., CSV, JSON).
    • Size: Size of the dataset.
    • Total Votes: Number of votes received.
    • Category: Categorization of the dataset's subject matter.

  4. Netflix Data: Cleaning, Analysis and Visualization

    • kaggle.com
    zip
    Updated Aug 26, 2022
    Cite
    Abdulrasaq Ariyo (2022). Netflix Data: Cleaning, Analysis and Visualization [Dataset]. https://www.kaggle.com/datasets/ariyoomotade/netflix-data-cleaning-analysis-and-visualization
    Explore at:
    Available download formats: zip (276607 bytes)
    Dataset updated
    Aug 26, 2022
    Authors
    Abdulrasaq Ariyo
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Netflix is a popular streaming service that offers a vast catalog of movies, TV shows, and original content. This dataset is a cleaned version of the original, which can be found here. The data consists of content added to Netflix from 2008 to 2021; the oldest title dates from 1925 and the newest from 2021. The dataset is cleaned with PostgreSQL and visualized with Tableau. Its purpose is to test my data cleaning and visualization skills. The cleaned data can be found below and the Tableau dashboard can be found here.

    Data Cleaning

    We are going to:

      1. Treat the nulls
      2. Treat the duplicates
      3. Populate missing rows
      4. Drop unneeded columns
      5. Split columns

    Extra steps and further explanation of the process are given in the code comments.

    --View dataset
    
    SELECT * 
    FROM netflix;
    
    
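    --The show_id column is the unique id for the dataset, therefore we are going to check for duplicates
    --Only show_id values that occur more than once are returned; an empty result means no duplicates

    SELECT show_id, COUNT(*)
    FROM netflix
    GROUP BY show_id
    HAVING COUNT(*) > 1
    ORDER BY show_id DESC;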
    
    --No duplicates
    
    --Check null values across columns
    
    SELECT COUNT(*) FILTER (WHERE show_id IS NULL) AS showid_nulls,
        COUNT(*) FILTER (WHERE type IS NULL) AS type_nulls,
        COUNT(*) FILTER (WHERE title IS NULL) AS title_nulls,
        COUNT(*) FILTER (WHERE director IS NULL) AS director_nulls,
        COUNT(*) FILTER (WHERE movie_cast IS NULL) AS movie_cast_nulls,
        COUNT(*) FILTER (WHERE country IS NULL) AS country_nulls,
        COUNT(*) FILTER (WHERE date_added IS NULL) AS date_added_nulls,
        COUNT(*) FILTER (WHERE release_year IS NULL) AS release_year_nulls,
        COUNT(*) FILTER (WHERE rating IS NULL) AS rating_nulls,
        COUNT(*) FILTER (WHERE duration IS NULL) AS duration_nulls,
        COUNT(*) FILTER (WHERE listed_in IS NULL) AS listed_in_nulls,
        COUNT(*) FILTER (WHERE description IS NULL) AS description_nulls
    FROM netflix;
    
    We can see that there are NULLS. 
    director_nulls = 2634
    movie_cast_nulls = 825
    country_nulls = 831
    date_added_nulls = 10
    rating_nulls = 4
    duration_nulls = 3 
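
The null-count step can also be tried without a PostgreSQL server. The sketch below is one possible way to do that with Python's built-in sqlite3 module; it is not the author's method. The three-row table and its values are placeholders invented for illustration (they are not rows from the dataset), count_nulls is a hypothetical helper name, and SUM(col IS NULL) is swapped in for PostgreSQL's COUNT(*) FILTER so the query also runs on older SQLite builds.

```python
import sqlite3

# Hypothetical miniature of the netflix table; the rows are placeholders,
# not real records from the dataset.
ROWS = [
    ("s1", "Title A", "Director X", "Country 1"),
    ("s2", "Title B", None, "Country 2"),
    ("s3", "Title C", None, None),
]
COLUMNS = ["show_id", "title", "director", "country"]


def count_nulls(conn, table, columns):
    """Return {column: number of NULLs} using a single aggregate query."""
    # SUM(col IS NULL) adds 1 for every NULL value; it stands in for
    # PostgreSQL's COUNT(*) FILTER (WHERE col IS NULL).
    select_list = ", ".join(f"SUM({col} IS NULL)" for col in columns)
    row = conn.execute(f"SELECT {select_list} FROM {table}").fetchone()
    return dict(zip(columns, row))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE netflix (show_id TEXT, title TEXT, director TEXT, country TEXT)"
)
conn.executemany("INSERT INTO netflix VALUES (?, ?, ?, ?)", ROWS)

null_counts = count_nulls(conn, "netflix", COLUMNS)
print(null_counts)
```

Building the SELECT list from a column list avoids repeating one line per column, at the cost of interpolating identifiers into the SQL string, so the column names should come from trusted code rather than user input.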
    

    Nulls make up about 30% of the director column, so I will not delete those rows. Instead, I will populate them from another column. To do that, we first check whether there is a relationship between the movie_cast column and the director column.

    -- Below, we find out if some directors are likely to work with particular cast
    
    WITH cte AS
    (
    SELECT title, CONCAT(director, '---', movie_cast) AS director_cast 
    FROM netflix
    )
    
    SELECT director_cast, COUNT(*) AS count
    FROM cte
    GROUP BY director_cast
    HAVING COUNT(*) > 1
    ORDER BY COUNT(*) DESC;
    
    --With this, we can now populate NULL rows in director
    --using the director's record with movie_cast
    
    UPDATE netflix 
    SET director = 'Alastair Fothergill'
    WHERE movie_cast = 'David Attenborough'
    AND director IS NULL ;
    
    --Repeat this step to populate the rest of the director nulls
    --Populate the rest of the NULL in director as "Not Given"
    
    UPDATE netflix 
    SET director = 'Not Given'
    WHERE director IS NULL;
    
    --When I was doing this, I found a less complex and faster way to populate a column which I will use next
    

    As with the director column, I will not delete the nulls in country. Since a title's country is related to its director, we are going to populate the country column using the director column.

    --Populate the country using the director column
    
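    --Preview the country values that can be borrowed from other titles by the same director

    SELECT COALESCE(nt.country,nt2.country)
    FROM netflix AS nt
    JOIN netflix AS nt2
    ON nt.director = nt2.director
    AND nt.show_id <> nt2.show_id
    WHERE nt.country IS NULL;

    --Apply the update; source rows whose own country is NULL are skipped

    UPDATE netflix
    SET country = nt2.country
    FROM netflix AS nt2
    WHERE netflix.director = nt2.director
    AND netflix.show_id <> nt2.show_id
    AND nt2.country IS NOT NULL
    AND netflix.country IS NULL;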
    
    
    --To confirm if there are still directors linked to country that refuse to update
    
    SELECT director, country, date_added
    FROM netflix
    WHERE country IS NULL;
    
    --Populate the rest of the NULL in country as "Not Given"
    
    UPDATE netflix 
    SET country = 'Not Given'
    WHERE country IS NULL;
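
The two-step country fill above (borrow from another title by the same director, then label the remainder) can be sketched end to end as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the author's PostgreSQL script: the rows are invented placeholders, a correlated subquery replaces UPDATE ... FROM so the statement also runs on older SQLite versions, and MIN() is an assumed tie-breaker for directors associated with more than one country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE netflix (show_id TEXT PRIMARY KEY, director TEXT, country TEXT)"
)
# Placeholder rows invented for illustration only.
conn.executemany(
    "INSERT INTO netflix VALUES (?, ?, ?)",
    [
        ("s1", "Director X", "Country 1"),
        ("s2", "Director X", None),  # can borrow "Country 1" from s1
        ("s3", "Director Y", None),  # no other title by Director Y
    ],
)

# Step 1: copy a known country from another title by the same director.
# MIN() makes the choice deterministic when several countries match.
conn.execute(
    """
    UPDATE netflix
    SET country = (
        SELECT MIN(nt2.country)
        FROM netflix AS nt2
        WHERE nt2.director = netflix.director
          AND nt2.show_id <> netflix.show_id
          AND nt2.country IS NOT NULL
    )
    WHERE country IS NULL
    """
)

# Step 2: label whatever is still missing, as the original script does.
conn.execute("UPDATE netflix SET country = 'Not Given' WHERE country IS NULL")

filled = dict(conn.execute("SELECT show_id, country FROM netflix ORDER BY show_id"))
print(filled)
```

Rows whose director has no other title with a known country stay NULL after step 1, which is why the 'Not Given' pass is still needed.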
    

    Only 10 of more than 8,000 rows have a null date_added, so deleting them will not materially affect our analysis or visualization.

    --Show date_added nulls
    
    SELECT show_id, date_added
    FROM netflix_clean
    WHERE date_added IS NULL;
    
    --DELETE nulls
    
    DELETE F...
    
  5. Human Resources Data Set

    • kaggle.com
    zip
    Updated Oct 19, 2020
    Cite
    Dr. Rich (2020). Human Resources Data Set [Dataset]. https://www.kaggle.com/datasets/rhuebner/human-resources-data-set/discussion
    Explore at:
    Available download formats: zip (17041 bytes)
    Dataset updated
    Oct 19, 2020
    Authors
    Dr. Rich
    Description

    Updated 30 January 2023

    Version 14 of Dataset

    License Update:

    There has been some confusion around licensing for this data set. Dr. Carla Patalano and Dr. Rich Huebner are the original authors of this dataset.

    We provide a license to anyone who wishes to use this dataset for learning or teaching. For the purposes of sharing, please follow this license:

    CC-BY-NC-ND This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

    Codebook

    https://rpubs.com/rhuebner/hrd_cb_v14

    PLEASE NOTE -- I recently updated the codebook - please use the above link. A few minor discrepancies were identified between the codebook and the dataset. Please feel free to contact me through LinkedIn (www.linkedin.com/in/RichHuebner) to report discrepancies and make requests.

    Context

    HR data can be hard to come by, and HR professionals generally lag behind with respect to analytics and data visualization competency. Thus, Dr. Carla Patalano and I set out to create our own HR-related dataset, which is used in one of our graduate MSHRM courses called HR Metrics and Analytics, at New England College of Business. We created this data set ourselves. We use the data set to teach HR students how to use and analyze the data in Tableau Desktop - a data visualization tool that's easy to learn.

    This version provides a variety of features that are useful for both data visualization AND creating machine learning / predictive analytics models. We are working on expanding the data set even further by generating even more records and a few additional features. We will be keeping this as one file/one data set for now. There is a possibility of creating a second file perhaps down the road where you can join the files together to practice SQL/joins, etc.

    Note that this dataset isn't perfect. By design, there are some issues that are present. It is primarily designed as a teaching data set - to teach human resources professionals how to work with data and analytics.

    Content

    We have reduced the complexity of the dataset down to a single data file (v14). The CSV revolves around a fictitious company and the core data set contains names, DOBs, age, gender, marital status, date of hire, reasons for termination, department, whether they are active or terminated, position title, pay rate, manager name, and performance score.

    Recent additions to the data include:

    • Absences
    • Most Recent Performance Review Date
    • Employee Engagement Score

    Acknowledgements

    Dr. Carla Patalano provided the baseline idea for creating this synthetic data set, which has been used now by over 200 Human Resource Management students at the college. Students in the course learn data visualization techniques with Tableau Desktop and use this data set to complete a series of assignments.

    Inspiration

    We've included some open-ended questions that you can explore and try to address through creating Tableau visualizations, or R or Python analyses. Good luck and enjoy the learning!

    • Is there any relationship between who a person works for and their performance score?
    • What is the overall diversity profile of the organization?
    • What are our best recruiting sources if we want to ensure a diverse organization?
    • Can we predict who is going to terminate and who isn't? What level of accuracy can we achieve on this?
    • Are there areas of the company where pay is not equitable?

    There are so many other interesting questions that could be addressed through this interesting data set. Dr. Patalano and I look forward to seeing what we can come up with.

    If you have any questions or comments about the dataset, please do not hesitate to reach out to me on LinkedIn: http://www.linkedin.com/in/RichHuebner

    You can also reach me via email at: Richard.Huebner@go.cambridgecollege.edu

  6. Divvy_Trips

    • kaggle.com
    zip
    Updated Oct 29, 2024
    Cite
    FernandoGarciaH24 (2024). Divvy_Trips [Dataset]. https://www.kaggle.com/datasets/fernandogarciah24/divvy-trips/code
    Explore at:
    Available download formats: zip (25635550 bytes)
    Dataset updated
    Oct 29, 2024
    Authors
    FernandoGarciaH24
    Description

    The dataset compiles rides from Q1 2019 and Q1 2020 from a cycling company in Chicago. The data needs to be cleaned and prepared for further analysis, with some visualizations to make it easier to spot trends and form recommendations.

  7. Global Top Chart Searches in 21st Century

    • kaggle.com
    zip
    Updated Apr 16, 2023
    Cite
    Sanjay (2023). Global Top Chart Searches in 21st Century [Dataset]. https://www.kaggle.com/datasets/sanjay277/global-top-chart-searches-in-21st-century
    Explore at:
    Available download formats: zip (2476 bytes)
    Dataset updated
    Apr 16, 2023
    Authors
    Sanjay
    Description

    This Kaggle dataset provides a list of the top global Google searches for the years 2001-2023. It includes the search term and the year in which it was trending. This information can be used for a variety of purposes, including trend analysis, market research, and data visualization. With this dataset, users can gain insights into popular search trends and topics and how they have evolved over time.

  8. All Seaborn Built-in Datasets 📊✨

    • kaggle.com
    zip
    Updated Aug 27, 2024
    Cite
    Abdelrahman Mohamed (2024). All Seaborn Built-in Datasets 📊✨ [Dataset]. https://www.kaggle.com/datasets/abdoomoh/all-seaborn-built-in-datasets
    Explore at:
    Available download formats: zip (1383218 bytes)
    Dataset updated
    Aug 27, 2024
    Authors
    Abdelrahman Mohamed
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset includes all 22 built-in datasets from the Seaborn library, a widely used Python data visualization tool. Seaborn's built-in datasets are essential resources for anyone interested in practicing data analysis, visualization, and machine learning. They span a wide range of topics, from classic datasets like the Iris flower classification to real-world data such as Titanic survival records and diamond characteristics.

    • Included Datasets:
      • Anagrams: Analysis of word anagram patterns.
      • Anscombe: Anscombe's quartet demonstrating the importance of data visualization.
      • Attention: Data on attention span variations in different scenarios.
      • Brain Networks: Connectivity data within brain networks.
      • Car Crashes: US car crash statistics.
      • Diamonds: Data on diamond properties including price, cut, and clarity.
      • Dots: Randomly generated data for scatter plot visualization.
      • Dow Jones: Historical records of the Dow Jones Industrial Average.
      • Exercise: The relationship between exercise and health metrics.
      • Flights: Monthly passenger numbers on flights.
      • FMRI: Functional MRI data capturing brain activity.
      • Geyser: Eruption times of the Old Faithful geyser.
      • Glue: Strength of glue under different conditions.
      • Health Expenditure: Health expenditure statistics across countries.
      • Iris: Famous dataset for classifying Iris species.
      • MPG: Miles per gallon for various vehicles.
      • Penguins: Data on penguin species and their features.
      • Planets: Characteristics of discovered exoplanets.
      • Sea Ice: Measurements of sea ice extent.
      • Taxis: Taxi trips data in a city.
      • Tips: Tipping data collected from a restaurant.
      • Titanic: Survival data from the Titanic disaster.

    This complete collection serves as an excellent starting point for anyone looking to improve their data science skills, offering a wide array of datasets suitable for both beginners and advanced users.

  9. top_500_games

    • kaggle.com
    zip
    Updated Aug 23, 2022
    Cite
    Subhadeepta Sahoo (2022). top_500_games [Dataset]. https://www.kaggle.com/datasets/subhadeeptasahoo/top-500-games
    Explore at:
    Available download formats: zip (11709 bytes)
    Dataset updated
    Aug 23, 2022
    Authors
    Subhadeepta Sahoo
    Description

    The dataset has information about the top 500 games of all time including all genres and all platforms. The data has been scraped from "https://www.pwnrank.com/top-500" using the Python web scraping module BeautifulSoup. The dataset can be used to visualize the top games in each platform and number of games released over the years in each platform etc.

  10. supply chain data set

    • kaggle.com
    Updated Aug 8, 2023
    Cite
    shiva iyer (2023). supply chain data set [Dataset]. https://www.kaggle.com/datasets/shivaiyer129/supply-chain-data-set
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    shiva iyer
    License

    https://cdla.io/sharing-1-0/

    Description

    The dataset contains information related to supply chain operations, including orders, products, inventory, suppliers, logistics, and demand. It aims to optimize supply chain efficiency and improve performance through predictive analytics, inventory management, and logistics optimization.

  11. Sales Data Analysis Project

    • kaggle.com
    zip
    Updated Jun 1, 2024
    Cite
    Stina Tonia (2024). Sales Data Analysis Project [Dataset]. https://www.kaggle.com/datasets/stinatonia/2019-project-on-sales
    Explore at:
    Available download formats: zip (3818151 bytes)
    Dataset updated
    Jun 1, 2024
    Authors
    Stina Tonia
    Description

    This project analyzes sales data to identify trends, top-selling products, and revenue metrics for business decision-making. I took on this project, offered by MeriSKILL, to gain exposure to real-world projects and challenges, build valuable industry experience, and develop my data analysis skills. More on this project is on Medium.

  12. Data Visualization Cheat sheets and Resources

    • kaggle.com
    zip
    Updated May 31, 2022
    Cite
    Kash (2022). Data Visualization Cheat sheets and Resources [Dataset]. https://www.kaggle.com/kaushiksuresh147/data-visualization-cheat-cheats-and-resources
    Explore at:
    Available download formats: zip (133638507 bytes)
    Dataset updated
    May 31, 2022
    Authors
    Kash
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Data Visualization Corpus

    Data Visualization

    Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

    In the world of Big Data, data visualization tools and technologies are essential to analyze massive amounts of information and make data-driven decisions

    The Data Visualization Corpus

    The Data Visualization corpus consists of:

    • 32 cheat sheets: These cover visualization techniques and tricks from A to Z, Python and R visualization cheat sheets, types of charts and their significance, storytelling with data, etc.

    • 32 charts: The corpus also includes substantial information on data visualization charts, along with their Python code, d3.js code, and presentations that explain each chart clearly.

    • Some recommended books on data visualization that every data scientist should read:

      1. Beautiful Visualization by Julie Steele and Noah Iliinsky
      2. Information Dashboard Design by Stephen Few
      3. Knowledge Is Beautiful by David McCandless (Short abstract)
      4. The Functional Art: An Introduction to Information Graphics and Visualization by Alberto Cairo
      5. The Visual Display of Quantitative Information by Edward R. Tufte
      6. Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic
      7. Research paper - Cheat Sheets for Data Visualization Techniques by Zezhong Wang, Lovisa Sundin, Dave Murray-Rust, Benjamin Bach

    Suggestions:

    If you find any books, cheat sheets, or charts missing, or would like to suggest new documents, please let me know in the discussion section!

    Resources:

    Request to kaggle users:

    • A kind request to kaggle users to create notebooks on different visualization charts as per their interest by choosing a dataset of their own as many beginners and other experts could find it useful!

    • To create interactive EDA using animation with a combination of data visualization charts to give an idea about how to tackle data and extract the insights from the data

    Suggestion and queries:

    Feel free to use the discussion platform of this data set to ask questions or any queries related to the data visualization corpus and data visualization techniques

    Kindly upvote the dataset if you find it useful or if you wish to appreciate the effort taken to gather this corpus! Thank you and have a great day!

  13. Top 200 Movies Lifetime Gross (Beginner Friendly)

    • kaggle.com
    zip
    Updated Dec 20, 2023
    Cite
    Oguz Enver Uzan (2023). Top 200 Movies Lifetime Gross (Beginner Friendly) [Dataset]. https://www.kaggle.com/datasets/oguzuzan/top-200-movies-lifetime-gross
    Explore at:
    Available download formats: zip (4581 bytes)
    Dataset updated
    Dec 20, 2023
    Authors
    Oguz Enver Uzan
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains the 200 highest-grossing movies worldwide. If you are new to data analysis and coding, it is a good dataset to start with.

    I converted the data to CSV by web scraping from Box Office.

    There are 4 different columns in the data set:

    Rank: Shows the ranking of the movies between 1-200.

    Title: Names of movies

    Year: The date the movies were released

    Lifetime_Gross: Total Revenues of Movies

  14. 🍿 IMDb Top 100 Movies Dataset (2025 Edition)

    • kaggle.com
    zip
    Updated Oct 7, 2025
    Cite
    Shayan Zulfiqar (2025). 🍿 IMDb Top 100 Movies Dataset (2025 Edition) [Dataset]. https://www.kaggle.com/datasets/shayanzk/imdb-top-100-movies-dataset-2025-edition
    Explore at:
    Available download formats: zip (5609 bytes)
    Dataset updated
    Oct 7, 2025
    Authors
    Shayan Zulfiqar
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    🎬 Overview

    This dataset contains detailed information about the Top 100 highest-rated movies on IMDb (as of 2025). It's designed for data exploration, visualization, and machine learning projects related to cinema, audience preferences, and storytelling trends.

    📊 Features Included

    Each movie entry includes:

    • 🎥 Title: Movie name
    • 📅 Year: Release year
    • ⭐ IMDb Rating: Average rating (out of 10)
    • 🗳️ Votes: Number of IMDb user votes
    • 🎭 Genre: Primary and secondary genres
    • 🎬 Director: Film director
    • 🌟 Stars / Cast: Leading actors or actresses
    • 🕒 Runtime: Duration in minutes

    💡 How to Use This Dataset

    This dataset is perfect for:

    • EDA (Exploratory Data Analysis): Uncover patterns between ratings, genres, and directors
    • Data Visualization Projects: Create dashboards or charts of top genres, decades, or countries
    • Machine Learning: Build simple recommendation systems or predict ratings
    • Storytelling: Analyze what makes a movie timeless

    📈 Example Ideas

    • Compare IMDb ratings across decades
    • Visualize the most common genres among top-rated films
    • Identify directors with multiple entries in the Top 100
    • Correlate number of votes vs rating

    🔖 Inspiration

    This dataset brings together cinematic excellence from around the world, from timeless classics to modern masterpieces, perfect for analysts, developers, and movie enthusiasts alike.

  15. netflix_data

    • kaggle.com
    zip
    Updated Jul 30, 2023
    Cite
    Anvith SV (2023). netflix_data [Dataset]. https://www.kaggle.com/datasets/anvithsv/netflix-data
    Explore at:
    Available download formats: zip (1400865 bytes)
    Dataset updated
    Jul 30, 2023
    Authors
    Anvith SV
    Description

    Dataset

    This dataset was created by Anvith SV

    Contents

  16. Highest-Rated TV Shows Dataset

    • kaggle.com
    zip
    Updated Nov 25, 2025
    Cite
    Raveennimbiwal (2025). Highest-Rated TV Shows Dataset [Dataset]. https://www.kaggle.com/datasets/raveennimbiwal/top-rated-tv-shows-dataset-global-2025
    Explore at:
    Available download formats: zip (321985 bytes)
    Dataset updated
    Nov 25, 2025
    Authors
    Raveennimbiwal
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset features the top 2,000 highest-rated web and TV series from TMDB, along with key details such as titles, genres, ratings, and votes. It's a simple, clean collection for exploring TV trends, understanding audience ratings, or running quick analysis projects.

    The file contains 11 columns:

    • id – Unique numeric identifier for each series.
    • title – International or primary title of the series.
    • original_title – Title in the original language.
    • overview – Short description or summary of the series.
    • premiere_date – First air date in YYYY-MM-DD format.
    • popularity – TMDB popularity score.
    • genre – Comma-separated list of genres.
    • country_origin – Country where the series was produced.
    • original_language – Original spoken language.
    • rating – Average user rating (0–10 scale).
    • votes – Total number of user votes.
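
Because the genre column packs several genres into one comma-separated string, it usually has to be split before genres can be counted. The sketch below shows one way to do that with Python's standard csv module. The inline CSV text is a placeholder invented for illustration (three of the documented columns, with made-up rows), count_genres is a hypothetical helper name, and the exact separator and quoting used in the real file are assumptions.

```python
import csv
import io
from collections import Counter

# Placeholder CSV text using three of the 11 documented columns; the rows
# are invented for illustration and are not taken from the dataset.
SAMPLE_CSV = """title,genre,rating
Show A,"Drama, Crime",8.9
Show B,"Drama",8.7
Show C,"Comedy, Drama",8.5
"""


def count_genres(csv_text):
    """Count how often each genre appears in the comma-separated genre column."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Split on commas and trim the whitespace around each genre name.
        genres = [g.strip() for g in row["genre"].split(",") if g.strip()]
        counts.update(genres)
    return counts


genre_counts = count_genres(SAMPLE_CSV)
print(genre_counts.most_common())
```

To run this against the downloaded file, the StringIO wrapper would be replaced by an open file handle; the file name is not given in the listing, so it is left out here.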

  17. Human Resources Data Set

    • kaggle.com
    zip
    Updated Mar 16, 2021
    Cite
    Lissa Dora (2021). Human Resources Data Set [Dataset]. https://www.kaggle.com/datasets/lissadora/human-resources-data-set
    Explore at:
    Available download formats: zip (71388 bytes)
    Dataset updated
    Mar 16, 2021
    Authors
    Lissa Dora
    Description

    Instruction

    Please create at minimum two Power BI dashboards with filters and drill-down functionality.

    Here are some open-ended questions that you can explore and try to address through creating Power BI visualizations.

    • Is there any relationship between who a person works for and their performance score?
    • What is the overall diversity profile of the organization?
    • What are our best recruiting sources if we want to ensure a diverse organization?
    • What does the turnover look like? What kind of insight can be derived from this data?
    • Are there areas of the company where pay is not equitable?
  18. Visualizing Chicago Crime Data

    • kaggle.com
    zip
    Updated Jul 1, 2022
    Cite
    Elijah Toumoua (2022). Visualizing Chicago Crime Data [Dataset]. https://www.kaggle.com/datasets/elijahtoumoua/chicago-analysis-of-crime-data-dashboard
    Available download formats: zip (94861784 bytes)
    Dataset updated
    Jul 1, 2022
    Authors
    Elijah Toumoua
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Chicago
    Description

    Prelude

    This dataset is a cleaned version of the Chicago Crime Dataset, which can be found here. All rights to the dataset belong to the original owners. The purpose of this dataset is to demonstrate my skills in visualization and dashboard creation. Specifically, I will attempt to create a dashboard that lets users see metrics for a specific crime within a given year by using filters. Because of this, there will not be much focus on analysis of the data, but there will be portions discussing the validity of the dataset, the steps I took to clean the data, and how I organized it. The cleaned datasets can be found below. The query (written for BigQuery) can be found here, and the Tableau dashboard can be found here.

    About the Dataset

    Important Facts

    The dataset comes directly from the City of Chicago's website under the page "City Data Catalog." The data is gathered directly from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system and is updated daily to present the information accurately. This means that the record of a crime on a specific date may be revised later to better reflect the case. The dataset covers crimes from 2001 up to seven days before the current date.

    Reliability

    Using the ROCCC method (Reliable, Original, Comprehensive, Current, Cited), we can see that:

    • The data has high reliability: The data covers the entirety of Chicago over a little more than two decades. It covers all the wards within Chicago and even gives the street names. While we may not know how large the sample size is, I believe the dataset has high reliability since it geographically covers the entirety of Chicago.
    • The data has high originality: The dataset was obtained directly from the Chicago Police Department's database, so we can say this dataset is original.
    • The data is somewhat comprehensive: While we do have important information such as the types of crimes committed and their geographic location, I do not think this gives us proper insight into why these crimes take place. We can pinpoint the location of the crime, but we are limited by the information we have. How hot was the day of the crime? Did the crime take place in a low-income neighborhood? The absence of these key factors prevents us from understanding why these crimes take place, so I would say that this dataset is subpar in terms of comprehensiveness.
    • The data is current: The dataset is updated frequently to include crimes that took place up to seven days before the current date, and past crimes may be updated as more information comes to light. Because of the frequent updates, I believe the data is current.
    • The data is cited: As mentioned earlier, the data is collected directly from the police's CLEAR system, so we can say that the data is cited.

    Processing the Data

    Cleaning the Dataset

    The purpose of this step is to clean the dataset so that there are no outliers in the dashboard. To do this, we are going to do the following:

    • Check for any null values and determine whether we should remove them.
    • Update any values where there may be typos.
    • Check for outliers and determine if we should remove them.

    The following steps are explained in the code segments below. (I used BigQuery, so the code follows BigQuery's syntax.)

    ```
    -- Examining the dataset
    -- There are over 7.5 million rows of data
    -- Putting a limit so it does not take a long time to run
    SELECT *
    FROM `portfolioproject-350601.ChicagoCrime.Crime`
    LIMIT 1000;

    -- Seeing which points are null
    -- There are 85,000 null points, so we can exclude them; it is not a significant amount since it is only ~1.3% of the dataset
    -- Most of the null points are in the lat and long, which we will need later
    -- Because we don't have the full address, we can't estimate the lat and long in SQL, so we will have to delete the rows with null data
    SELECT *
    FROM `portfolioproject-350601.ChicagoCrime.Crime`
    WHERE unique_key IS NULL OR case_number IS NULL OR date IS NULL
       OR primary_type IS NULL OR location_description IS NULL
       OR arrest IS NULL OR longitude IS NULL OR latitude IS NULL;

    -- Deleting all null rows
    DELETE FROM `portfolioproject-350601.ChicagoCrime.Crime`
    WHERE unique_key IS NULL OR case_number IS NULL OR date IS NULL
       OR primary_type IS NULL OR location_description IS NULL
       OR arrest IS NULL OR longitude IS NULL OR latitude IS NULL;

    -- Checking for any duplicates in the unique keys
    -- None to be found
    SELECT unique_key, COUNT(unique_key) FROM `portfolioproject-350601.ChicagoCrime....
    ```
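
The null filter and the duplicate-key check can also be tried outside BigQuery on a small extract. The following is a rough Python sketch under stated assumptions: the rows are made up for illustration, the helper names `drop_null_rows` and `duplicate_keys` are hypothetical, and the duplicate check is only one reading of "checking for any duplicates in the unique keys", since the original query is cut off.

```python
from collections import Counter

# Columns that the SQL above requires to be non-null.
REQUIRED = (
    "unique_key", "case_number", "date", "primary_type",
    "location_description", "arrest", "longitude", "latitude",
)


def drop_null_rows(rows):
    """Keep only rows where every required column is present and not None."""
    return [r for r in rows if all(r.get(c) is not None for c in REQUIRED)]


def duplicate_keys(rows):
    """Return the unique_key values that occur more than once."""
    counts = Counter(r["unique_key"] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def make_row(key, longitude, latitude):
    """Build a made-up row for illustration; values are not from the dataset."""
    return {
        "unique_key": key, "case_number": f"X{key}", "date": "2020-01-01",
        "primary_type": "THEFT", "location_description": "STREET",
        "arrest": False, "longitude": longitude, "latitude": latitude,
    }


rows = [
    make_row(1, -87.6, 41.8),
    make_row(2, None, None),   # missing coordinates, will be dropped
    make_row(1, -87.7, 41.9),  # repeats unique_key 1
]
clean = drop_null_rows(rows)
print(len(clean), duplicate_keys(clean))
```

Note that `arrest` is compared against `None` rather than tested for truthiness, so a legitimate `False` value is not mistaken for a missing one.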

  19. Financial Analysis Dataset

    • kaggle.com
    zip
    Updated Jul 21, 2024
    Cite
    Michael_Dsouza16 (2024). Financial Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/michaeldsouza16/financial-data-analysis
    Available download formats: zip (8886 bytes)
    Dataset updated
    Jul 21, 2024
    Authors
    Michael_Dsouza16
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset contains financial information for the top 500 companies in India, including their market capitalization and quarterly sales. The data is categorized based on market cap and sales quartiles, allowing for detailed analysis and comparison. This dataset can be used to identify trends, patterns, and key metrics that are crucial for understanding the competitive landscape in the Indian market.
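
As an illustration of quartile-based categorization, here is a minimal Python sketch. It is an assumption about how such buckets could be computed, not the dataset author's method: the values are made up, the function name `quartile_labels` is hypothetical, and the cut points come from the standard library's `statistics.quantiles` with its default settings.

```python
from statistics import quantiles


def quartile_labels(values):
    """Label each value Q1..Q4 by its position relative to the quartile cut points."""
    q1, q2, q3 = quantiles(values, n=4)  # three cut points split the data into four groups

    def label(v):
        if v <= q1:
            return "Q1"
        if v <= q2:
            return "Q2"
        if v <= q3:
            return "Q3"
        return "Q4"

    return [label(v) for v in values]


# Made-up market-cap figures for illustration only.
market_caps = [100, 200, 300, 400, 500, 600, 700, 800]
print(quartile_labels(market_caps))
```

The same function can be applied to a sales column to get a second set of labels for cross-comparison.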

  20. Top Programming Guru

    • kaggle.com
    zip
    Updated Jul 15, 2021
    Cite
    code19 (2021). Top Programming Guru [Dataset]. https://www.kaggle.com/datasets/code19/top-programming-guru/suggestions?status=pending&yourSuggestions=true
    Available download formats: zip (6655537 bytes)
    Dataset updated
    Jul 15, 2021
    Authors
    code19
    Description

    Context

    YouTube has been a great tool for me to learn programming. I once saw a video about Calculating the Duration of a Playlist, and it made me curious to learn more about the YouTube API. I wanted to answer questions such as: What are the trending videos on YouTube at a given time and location? Why do some videos get more views than others? This is how the idea for the dataset was born.

    Content

    In this database we collected data from YouTube channels that were named in Top Programming Guru. These YouTube channels have helped people advance their careers in programming. The ranking of these channels is based on community votes. We used the YouTube API to collect statistics about channels, videos and playlists. The dataset has 5 files.
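
Since the project grew out of calculating the duration of a playlist, a small sketch of that calculation may help. The YouTube Data API reports video length as an ISO 8601 duration string (for example `PT4M13S`); the Python code below parses such strings and sums them. It is a minimal sketch: the function names are hypothetical, the sample durations are made up, and it handles only days, hours, minutes and seconds.

```python
import re

# Matches durations such as PT4M13S, PT1H2M3S or P1DT2H
# (days, hours, minutes and seconds only).
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_seconds(text):
    """Convert an ISO 8601 duration string into a number of seconds."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {k: int(v) if v else 0 for k, v in match.groupdict().items()}
    hours = parts["days"] * 24 + parts["hours"]
    return (hours * 60 + parts["minutes"]) * 60 + parts["seconds"]


def playlist_seconds(durations):
    """Total length of a playlist given the duration string of each video."""
    return sum(duration_seconds(d) for d in durations)


# Made-up durations for illustration only.
total = playlist_seconds(["PT4M13S", "PT1H2M3S", "PT45S"])
print(total)
```

Fetching the duration strings themselves requires API calls that are not shown here.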

    Acknowledgements

    If I had to learn only from the official documentation of the YouTube API, it would have taken me forever to finish the project :). Big thanks to Corey Schafer and ClarityCoders for the amazing YouTube videos they made about this API:

    • Python YouTube API Tutorial: Getting Started - Creating an API Key and Querying the API
    • Using Python and YouTube API to Create Analytics on any Channel (https://www.youtube.com/watch?v=2mSwcRb3KjQ)

    Of course, there is always the official YouTube API documentation, where you can run some tests without having to create an API key.

    Inspiration

    My hope is that this data set can be used in some way to help create more YouTube content that programmers are eager to discover. Analyzing this data can help content creators make data-driven decisions about topic, duration, and expected number of views when making new videos.

Top 1000 Kaggle Datasets

Kaggle's most popular datasets

2 scholarly articles cite this dataset (View in Google Scholar)