3 datasets found
  1. Social Media vs Productivity

    • kaggle.com
    Updated May 15, 2025
    Cite
    Mahdi Mashayekhi (2025). Social Media vs Productivity [Dataset]. https://www.kaggle.com/datasets/mahdimashayekhi/social-media-vs-productivity/code
    Dataset updated
    May 15, 2025
    Dataset provided by
    Kaggle
    Authors
    Mahdi Mashayekhi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    📊 Social Media vs Productivity: Realistic Behavioral Dataset (30,000 Users)

    This dataset explores how daily digital habits, including social media usage, screen time, and notification exposure, relate to individual productivity, stress, and well-being.

    🔍 What's Inside?

    The dataset contains 30,000 real-world-style records simulating behavioral patterns of people with various jobs, social habits, and lifestyle choices. The goal is to understand how different digital behaviors correlate with perceived and actual productivity.

    🧠 Why This Dataset is Valuable

    • ✅ Designed for real-world ML workflows
      Includes missing values, noise, and outliers; ideal for practicing data cleaning and preprocessing.

    • 🔗 High correlation between target features
      The perceived_productivity_score and actual_productivity_score are strongly correlated, making this dataset suitable for experiments in feature selection and multicollinearity.

    • 🛠️ Feature Engineering playground
      Use this dataset to practice feature scaling, encoding, binning, interaction terms, and more.

    • 🧪 Perfect for EDA, regression & classification
      You can model productivity, stress, or satisfaction based on behavior patterns and digital exposure.
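The correlation claim in the list above can be checked with a few lines of code. The following is a minimal sketch, assuming the two score columns have already been loaded as lists of floats with `None` for missing entries. The helper name `pearson` and the toy values are illustrative assumptions, not rows from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Invented toy values, NOT taken from the dataset.
perceived = [3.0, 5.5, None, 7.0, 8.5, 6.0]
actual = [2.5, 5.0, 4.0, None, 8.0, 6.5]

r = pearson(perceived, actual)
print(f"r = {r:.3f}")
```

A value of r close to 1 for the real columns would suggest that using one score as a feature when predicting the other leaks the target, so one of the two is usually dropped from the feature set.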

    🧾 Columns & Feature Info

    Column Name                     Description
    age                             Age of the individual (18–65 years)
    gender                          Gender identity: Male, Female, or Other
    job_type                        Employment sector or status (IT, Education, Student, etc.)
    daily_social_media_time         Average daily time spent on social media (hours)
    social_platform_preference      Most-used social platform (Instagram, TikTok, Telegram, etc.)
    number_of_notifications         Number of mobile/social notifications per day
    work_hours_per_day              Average hours worked each day
    perceived_productivity_score    Self-rated productivity score (scale: 0–10)
    actual_productivity_score       Simulated ground-truth productivity score (scale: 0–10)
    stress_level                    Current stress level (scale: 1–10)
    sleep_hours                     Average hours of sleep per night
    screen_time_before_sleep        Time spent on screens before sleeping (hours)
    breaks_during_work              Number of breaks taken during work hours
    uses_focus_apps                 Whether the user uses digital focus apps (True/False)
    has_digital_wellbeing_enabled   Whether Digital Wellbeing is activated (True/False)
    coffee_consumption_per_day      Number of coffee cups consumed per day
    days_feeling_burnout_per_month  Number of burnout days reported per month
    weekly_offline_hours            Total hours spent offline each week (excluding sleep)
    job_satisfaction_score          Satisfaction with job/life responsibilities (scale: 0–10)
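The ranges stated in the column descriptions can be turned into a simple sanity check. The sketch below is one possible approach, assuming each row is available as a dict keyed by column name. The names `DOCUMENTED_RANGES` and `out_of_range` and the sample record are hypothetical; only the ranges themselves come from the column descriptions.

```python
import math

# Ranges stated in the column descriptions; columns with no stated range are omitted.
DOCUMENTED_RANGES = {
    "age": (18, 65),
    "perceived_productivity_score": (0, 10),
    "actual_productivity_score": (0, 10),
    "stress_level": (1, 10),
    "job_satisfaction_score": (0, 10),
}


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def out_of_range(record):
    """Return {column: value} for values outside their documented range.

    Missing values are skipped, since they are a separate cleaning concern.
    """
    problems = {}
    for column, (low, high) in DOCUMENTED_RANGES.items():
        value = record.get(column)
        if is_missing(value):
            continue
        if not low <= value <= high:
            problems[column] = value
    return problems


# Hypothetical record for illustration, NOT a row from the dataset.
record = {
    "age": 17,
    "stress_level": 4,
    "actual_productivity_score": None,
    "job_satisfaction_score": 11.2,
}
print(out_of_range(record))  # {'age': 17, 'job_satisfaction_score': 11.2}
```

Because the data is described as including noise and outliers, a flagged value is not necessarily an error in the file; the check only reports where a row departs from the documented scale.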

    📌 Notes

    • Contains NaN values in critical columns (productivity, sleep, stress) for data imputation tasks
    • Includes outliers in media usage, coffee intake, and notification count
    • Target columns are strongly correlated for multicollinearity testing
    • Multi-purpose: regression, classification, clustering, visualization
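The missing values and outliers mentioned in these notes can be handled in several ways. As a minimal sketch under stated assumptions, the code below fills gaps with the column median and flags outliers with the common 1.5 × IQR rule. Both techniques are illustrative choices rather than anything the dataset author prescribes, and the helper names and toy values are invented.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the values that are present."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot impute a column with no observed values")
    median = statistics.median(present)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Invented toy columns, NOT taken from the dataset.
sleep_hours = [7.0, None, 6.5, 8.0, None, 7.5]
coffee_per_day = [1, 2, 2, 3, 1, 2, 3, 2, 40]

print(impute_median(sleep_hours))
print(iqr_outliers(coffee_per_day))
```

Median imputation is used here because it is less sensitive to the outliers the notes mention than the mean would be. Whether to drop, cap, or keep a flagged value depends on the modeling task.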

    💡 Use Cases

    • Exploratory Data Analysis (EDA)
    • Feature engineering pipelines
    • Machine learning model benchmarking
    • Statistical hypothesis testing
    • Burnout and mental health prediction projects

    📥 Bonus

    👉 Sample notebook coming soon with data cleaning, visualization, and productivity prediction!

  2. Social Power NBA

    • kaggle.com
    Updated Aug 1, 2017
    Cite
    Noah Gift (2017). Social Power NBA [Dataset]. https://www.kaggle.com/datasets/noahgift/social-power-nba/suggestions
    Dataset updated
    Aug 1, 2017
    Dataset provided by
    Kaggle
    Authors
    Noah Gift
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    This data set contains combined on-court performance data for NBA players in the 2016-2017 season, alongside salary, Twitter engagement, and Wikipedia traffic data.

    Further information can be found in a series of articles for IBM developerWorks: "Explore valuation and attendance using data science and machine learning" and "Exploring the individual NBA players".

    Slides from a talk about this dataset, given at Strata in March 2018, are available at:

    https://www.slideshare.net/noahgift/social-power-andinfluenceinthenba-89807740?qid=3f9f835a-f3d7-4174-8a8c-c97f9c82e614&v=&b=&from_search=1

    Further reading on this dataset is available in Chapter 6 of the book Pragmatic AI: An Introduction to Cloud-Based Machine Learning, and in lesson 9 of Essential Machine Learning and AI with Python and Jupyter Notebook.

    Followup Items

    Acknowledgement

    Data sources include ESPN, Basketball-Reference, Twitter, FiveThirtyEight, and Wikipedia. The source code for this dataset (in Python and R) can be found on GitHub. Links to more writing can be found at noahgift.com.

    Inspiration

    • Do NBA fans know more about who the best players are, or do owners?
    • What is the true worth of the social media presence of athletes in the NBA?

  3. 19 East Gig Schedules

    • kaggle.com
    Updated Jun 21, 2018
    Cite
    Jereme (2018). 19 East Gig Schedules [Dataset]. https://www.kaggle.com/jeremejazz/19-east-gig-schedules-2016-2018/metadata
    Dataset updated
    Jun 21, 2018
    Dataset provided by
    Kaggle
    Authors
    Jereme
    Description

    Context

    Back in 2016, while I was unemployed, I decided to build a mobile app to pass the time. It isn't much, just something to keep me busy. You can find the app here, though; I use it personally, especially to check interesting upcoming schedules.

    Content

    The data consists of schedules from 2016 to June 2018. I'm still exploring data science, but I plan to make a simple analysis out of this scraped data.

    I might post more data later to cover things like the most frequent artists, one-time deals, or social media opinions, since some customers have probably already tweeted about the events.


Social Media vs Productivity

Impact of Social Media Usage on Individual Productivity

