2 datasets found
  1. Job Descriptions Dataset

    • kaggle.com
    Updated May 12, 2025
    Cite
    Jayakishan Minnekanti (2025). Job Descriptions Dataset [Dataset]. https://www.kaggle.com/datasets/jayakishan225/job-descriptions-dataset
    Dataset updated
    May 12, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jayakishan Minnekanti
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset includes 521 real-world job descriptions for various data analyst roles, compiled solely for educational and research purposes. It was created to support natural language processing (NLP) and skill extraction tasks.

    Each row represents a unique job posting with:
    - Job Title: The role being advertised
    - Description: The full-text job description

    🔍 Use Case:
    This dataset was used in the "Job Skill Analyzer" project, which applies NLP and multi-label classification to extract in-demand skills such as Python, SQL, Tableau, Power BI, Excel, and Communication.

    🎯 Ideal For:
    - NLP-based skill extraction
    - Resume/job description matching
    - EDA on job market skill trends
    - Multi-label text classification projects

    ⚠️ Disclaimer:
    - The job descriptions were collected from publicly available postings across multiple job boards.
    - No logos, branding, or personally identifiable information is included.
    - This dataset is not intended for commercial use.

    License: CC BY-NC-SA 4.0
    Suitable For: NLP, EDA, Job Market Analysis, Skill Mining, Text Classification
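
As a rough illustration of the skill-extraction use case described above, the sketch below tags a job description with every listed skill it mentions. It is a keyword baseline, not the "Job Skill Analyzer" project's actual method, which the description says uses multi-label classification. The skill names come from the description; the regular expressions, the function name `extract_skills`, and the sample posting are assumptions made for this example.

```python
import re

# Skill names are from the dataset description; the patterns are illustrative guesses.
SKILL_PATTERNS = {
    "Python": r"\bpython\b",
    "SQL": r"\bsql\b",
    "Tableau": r"\btableau\b",
    "Power BI": r"\bpower\s*bi\b",
    "Excel": r"\bexcel\b",
    "Communication": r"\bcommunicat\w*",
}


def extract_skills(description: str) -> list[str]:
    """Return every skill whose pattern occurs in the description (multi-label)."""
    text = description.lower()
    return [
        skill
        for skill, pattern in SKILL_PATTERNS.items()
        if re.search(pattern, text)
    ]


# Hypothetical posting text, not taken from the dataset.
sample = (
    "Seeking an analyst with strong SQL and Python skills, "
    "PowerBI dashboards, and excellent communication."
)
print(extract_skills(sample))
```

The word boundaries keep "excellent" from counting as "Excel". Labels produced this way could serve as a weak starting point for training a multi-label classifier on the Description column.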

  2. Netflix Movies & TV Shows dataset

    • kaggle.com
    Updated Oct 3, 2025
    Cite
    Zubaira Maimona (2025). Netflix Movies & TV Shows dataset [Dataset]. https://www.kaggle.com/datasets/zubairamuti/netflix-movies-and-tv-shows-dataset
    Dataset updated
    Oct 3, 2025
    Dataset provided by
    Kaggle
    Authors
    Zubaira Maimona
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Content

    Netflix is among the most popular streaming services for movies and TV shows. It had more than 200 million members worldwide as of mid-2021, and its platform offers over 8,000 movies and TV shows. This tabular dataset lists all the movies and TV shows available on Netflix, along with details such as cast, directors, ratings, duration, and release year.

    Task Ideas for People from Different Backgrounds

    For Data Analysts

    1. Content Trends Over Time - Examine the annual changes in Netflix's movie and TV show counts.
    2. Genre Popularity - Discover the most popular genres and how their popularity changes by location or year.
    3. Country Insights - Find out which countries produce the most titles and what kinds of content they contribute.
    4. Ratings Distribution - Show how the maturity ratings (G, PG, R, TV-MA) are distributed across Netflix content.
    5. Top Directors & Actors - Find the actors or directors who appear on Netflix most often.

    For Data Scientists

    1. Recommendation System Prototype - Create a content-based recommender using genres and title descriptions.
    2. Text Analysis on Descriptions - Apply natural language processing (NLP) to identify trends in the way Netflix describes its content, using terms like "crime," "adventure," and "love."
    3. Classification Models - Use metadata to determine whether a title is a movie or a TV show.
    4. Clustering - Group movies and TV shows into clusters using genres, durations, and descriptions.
    5. Trend Forecasting - Forecast future growth in the Netflix library using time-series analysis.

    For Students (Study Assignments)

    1. Data Cleaning & Preprocessing - Standardize formats and deal with missing values (such as directors or countries).
    2. Exploratory Data Analysis (EDA) - Build notebooks or dashboards with plenty of charts that illustrate Netflix trends.
    3. Data Visualization Practice - Create imaginative graphics such as word clouds or heatmaps using Matplotlib, Seaborn, or Plotly.
    4. Storytelling with Data - Write a data story on how Netflix changed from renting out DVDs to becoming a major worldwide streaming service.
    5. Beginner Machine Learning - Start small: use genre or description to predict maturity rating.

    Approach to the Netflix Dataset

    1. Understand the Data (Initial Exploration)

      • Load the dataset and check its size, columns, and data types.
      • Get a sense of the key fields: title, type, country, release_year, rating, etc.
      • Look for unique values (e.g., how many genres, countries, ratings).
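
The initial exploration step above might be sketched as follows, using only the Python standard library. The two-row CSV sample is hypothetical and stands in for the real file; the column names are the ones mentioned above, and the helper name `summarize` is an assumption for illustration.

```python
import csv
import io

# Hypothetical two-row sample standing in for the real CSV file.
SAMPLE_CSV = """title,type,country,release_year,rating
Example Film,Movie,United States,2019,PG-13
Example Series,TV Show,,2021,TV-MA
"""


def summarize(csv_file):
    """Return the row count, column names, and distinct non-empty values per column."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    columns = reader.fieldnames or []
    unique = {
        col: sorted({row[col] for row in rows if row[col]})
        for col in columns
    }
    return {"n_rows": len(rows), "columns": columns, "unique": unique}


summary = summarize(io.StringIO(SAMPLE_CSV))
print(summary["n_rows"], summary["columns"])
print(summary["unique"]["type"])
```

For the real dataset, the same function could be given a file opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the downloaded CSV was saved.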
    2. Data Cleaning & Preprocessing

      • Handle missing values (some entries don’t have directors or countries).
      • Standardize inconsistent formats (e.g., dates in date_added).
      • Split multi-valued columns (like genres or cast) if needed.
      • Convert durations into numeric values (minutes or seasons).
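
A minimal sketch of the cleaning steps above, under stated assumptions: that durations look like "90 min" or "2 Seasons", that multi-valued cells are comma-separated, and that `date_added` is written like "September 25, 2021". None of these formats is confirmed by the dataset description, and the column names `director`, `cast`, and `duration` as well as all helper names are hypothetical.

```python
import re
from datetime import datetime


def parse_duration(value):
    """Turn '90 min' or '2 Seasons' into (number, unit); None if unparseable."""
    match = re.fullmatch(
        r"\s*(\d+)\s*(min|seasons?)\s*", value or "", flags=re.IGNORECASE
    )
    if not match:
        return None
    unit = "min" if match.group(2).lower() == "min" else "season"
    return int(match.group(1)), unit


def split_multi(value):
    """Split a comma-separated cell into a list of trimmed, non-empty items."""
    return [item.strip() for item in (value or "").split(",") if item.strip()]


def parse_date_added(value):
    """Parse 'Month D, YYYY' (English month names); None if missing or malformed."""
    try:
        return datetime.strptime((value or "").strip(), "%B %d, %Y").date()
    except ValueError:
        return None


def clean_row(row):
    """Apply the helpers to one row, filling missing text fields with 'Unknown'."""
    return {
        "director": (row.get("director") or "").strip() or "Unknown",
        "country": split_multi(row.get("country")) or ["Unknown"],
        "cast": split_multi(row.get("cast")),
        "duration": parse_duration(row.get("duration")),
        "date_added": parse_date_added(row.get("date_added")),
    }


# Hypothetical raw row with typical problems: blanks, stray spaces, mixed units.
raw = {
    "director": "",
    "country": None,
    "cast": "A, B ,C",
    "duration": "2 Seasons",
    "date_added": " August 4, 2017",
}
cleaned = clean_row(raw)
print(cleaned)
```

Returning `None` for values that cannot be parsed, rather than guessing, keeps bad rows visible for later inspection. Note that `%B` depends on the process locale, so month names are assumed to be parsed under an English locale.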
    3. Exploratory Data Analysis (EDA)

      • Compare Movies vs. TV Shows count.
      • Analyze content growth trend by release year or date added.
      • Study genre popularity across different countries.
      • Explore rating distribution (family-friendly vs. mature content).
      • Identify most frequent directors, actors, and countries.
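
The counts behind these EDA questions can be sketched with `collections.Counter`. The four rows below are made up for illustration and are not taken from the dataset; only the column names come from the text above, and treating `country` as a comma-separated field is an assumption.

```python
from collections import Counter

# Hypothetical rows; the values are invented for this example.
titles = [
    {"type": "Movie", "release_year": 2019, "country": "United States, India", "rating": "PG"},
    {"type": "TV Show", "release_year": 2020, "country": "India", "rating": "TV-MA"},
    {"type": "Movie", "release_year": 2020, "country": "", "rating": "R"},
    {"type": "TV Show", "release_year": 2020, "country": "United States", "rating": "TV-MA"},
]

# Movies vs. TV shows.
type_counts = Counter(t["type"] for t in titles)

# Content growth by release year.
titles_per_year = Counter(t["release_year"] for t in titles)

# Rating distribution.
rating_counts = Counter(t["rating"] for t in titles)

# Most frequent countries; a title with several countries counts once for each.
country_counts = Counter(
    name.strip()
    for t in titles
    for name in t["country"].split(",")
    if name.strip()
)

print(type_counts.most_common())
print(sorted(titles_per_year.items()))
print(rating_counts.most_common(1))
print(country_counts.most_common())
```

The same pattern extends to directors and cast members once those multi-valued columns have been split.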
    4. Visualization & Storytelling

      • Create bar charts, pie charts, heatmaps, and timelines.
      • Use word clouds for descriptions and genres.
      • Highlight interesting trends (e.g., rise of international TV shows).
    5. Advanced Analysis / Data Science Tasks

      • Build a recommendation system (based on genres & descriptions).
      • Perform sentiment/keyword analysis on descriptions.
      • Apply clustering to group similar shows/movies.
      • Predict whether a title is a movie or TV show from metadata.
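
One possible shape for the recommendation task above is a content-based recommender that ranks titles by TF-IDF cosine similarity over their genre and description text. The sketch below is a small pure-Python version under stated assumptions: the catalog entries, the stop-word list, and all function names are hypothetical, and a real project would more likely rely on an established library for vectorization.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "and", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Build one {term: weight} dict per document using tf * smoothed idf."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(catalog, title, k=2):
    """Return the k other titles most similar to `title`."""
    names = list(catalog)
    vectors = dict(zip(names, tfidf_vectors([catalog[name] for name in names])))
    scores = [
        (cosine(vectors[title], vectors[name]), name)
        for name in names
        if name != title
    ]
    return [name for _, name in sorted(scores, reverse=True)[:k]]


# Hypothetical catalog: title -> genres plus description.
catalog = {
    "Heist Night": "Crime Thrillers. A crew plans a daring bank robbery.",
    "Cold Case Files": "Crime Docuseries. Detectives reopen unsolved crime cases.",
    "Love in Paris": "Romantic Comedies. Two strangers fall in love on a trip.",
}
print(recommend(catalog, "Heist Night", k=1))
```

The same vectors could also feed the clustering and keyword-analysis tasks in this step, since both start from a numeric representation of the description text.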
    6. Insights & Reporting

      • Summarize key findings (e.g., “TV shows are growing faster than movies,” “US and India dominate Netflix content”).
      • Create dashboards (Tableau, Power BI, or Python libraries like Plotly).
      • Share a story rather than just numbers; make it human and relatable.
Job Descriptions Dataset

Real job descriptions for NLP-based data analyst skill extraction

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 12, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Jayakishan Minnekanti
License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

This dataset includes 521 real-world job descriptions for various data analyst roles, compiled solely for educational and research purposes. It was created to support natural language processing (NLP) and skill extraction tasks.

Each row represents a unique job posting with: - Job Title: The role being advertised - Description: The full-text job description

🔍 Use Case:
This dataset was used in the "Job Skill Analyzer" project, which applies NLP and multi-label classification to extract in-demand skills such as Python, SQL, Tableau, Power BI, Excel, and Communication.

🎯 Ideal For: - NLP-based skill extraction - Resume/job description matching - EDA on job market skill trends - Multi-label text classification projects

⚠️ Disclaimer:
- The job descriptions were collected from publicly available postings across multiple job boards.
- No logos, branding, or personally identifiable information is included.
- This dataset is not intended for commercial use.

License: CC BY-NC-SA 4.0
Suitable For: NLP, EDA, Job Market Analysis, Skill Mining, Text Classification

Search
Clear search
Close search
Google apps
Main menu