29 datasets found
  1. LinkedIn Profile Data

    • kaggle.com
    zip
    Updated Mar 21, 2020
    Cite
    Om Ashish Mishra (2020). LinkedIn Profile Data [Dataset]. https://www.kaggle.com/omashish/linkedin-profile-data
    Available download formats: zip (2415431 bytes)
    Dataset updated
    Mar 21, 2020
    Authors
    Om Ashish Mishra
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    LinkedIn is a place for building connections and showcasing skills and achievements. This data was collected to help understand features such as promotions, regional patterns, and facial characteristics.

    Content

    The data consists of around 15,000 profiles. The dataset covers many features, such as region, the way profile images are uploaded, the emotions shown in them, and the growth of users over time.

    Let's go through the following attributes:

    User IDs are private and should not be disclosed, although their characteristics can be shared to help understand the behavior patterns of people on LinkedIn.
    c id: A name for each record; it forms the primary key.

    Profession Columns
    • avg time in previous position: The amount of time, in years, spent in the previous position.
    • avg current position length: The average amount of time the user has been in the current position.
    • avg previous position length: The average amount of time the user spent in the previous position.
    • m urn: The user id for each profile.
    • m urn id: The user id reduced to a distinct code.
    • no of promotions: Total number of times the user was promoted.
    • no of previous positions: The number of previous positions the user has held.
    • current position length: The number of months the person has been in the current position.
    • age: The age of the person.
    • gender: Male or female.
    • ethnicity: The percentage of ethnicity.
    • n followers: Number of followers.

    Image Clarity
    • beauty: An index used to analyze beauty.
    • beauty female: Predicts whether the user image is more likely to be female or not.
    • beauty male: Predicts whether the user image is more likely to be male or not.
    • blur: The degree of blurriness of the image.

    Emotion Captured
    • emo anger: The percentage of anger found.
    • emo disgust: The percentage of disgust found.
    • emo fear: The percentage of fear found.
    • emo happiness: The percentage of happiness found.
    • emo neutral: The percentage of neutral expression found.
    • emo sadness: The percentage of sadness found.
    • emo surprise: The percentage of surprise found.

    Orientation & Facial Accessories
    • glass: Whether the person is wearing glasses, sunglasses, or neither.
    • head pitch: The orientation of the head (up or down).
    • head roll: The orientation of the head (sideways rolling; horizontal or vertical).
    • head yaw: The orientation of the head (facing left or right).
    • mouth close: The percentage of closed mouth.
    • mouth mask: The percentage of masked mouth.
    • mouth open: The percentage of open mouth.
    • mouth other: The percentage of other mouth states.
    • skin acne: The percentage of skin tone.
    • skin dark_circle: The percentage of dark circles on the skin.
    • skin health: The skin health percentage.
    • skin stain: The percentage of stains on the skin.
    • smile: The smile percentage.

    Region Columns
    • nationality: The nationality the profile belongs to, followed by the percentage for each of:
      • african
      • celtic english
      • east asian
      • european
      • greek
      • hispanic
      • jewish
      • muslim
      • nordic
      • south asian

    face_quality: The quality of the face recognized.
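    The emotion columns above are per-image percentages, so one natural derived feature is the dominant emotion of each profile picture. The sketch below is only an illustration: it assumes underscore column names such as emo_anger (the description writes them with spaces, so the real CSV headers may differ) and uses made-up rows rather than data from the file.

```python
import pandas as pd

# Hypothetical column names; the description spells them "emo anger", etc.
EMOTION_COLS = [
    "emo_anger", "emo_disgust", "emo_fear", "emo_happiness",
    "emo_neutral", "emo_sadness", "emo_surprise",
]


def dominant_emotion(df: pd.DataFrame) -> pd.Series:
    """Return, for each profile, the emotion with the highest percentage."""
    # idxmax(axis=1) yields the column label holding each row's maximum
    labels = df[EMOTION_COLS].idxmax(axis=1)
    # Strip the "emo_" prefix to keep just the emotion name
    return labels.str.replace("emo_", "", regex=False)


# Toy rows for illustration only (not taken from the dataset)
sample = pd.DataFrame(
    [[0.1, 0.0, 0.2, 90.0, 9.5, 0.1, 0.1],
     [1.0, 0.5, 0.5, 2.0, 95.0, 0.5, 0.5]],
    columns=EMOTION_COLS,
)
print(dominant_emotion(sample).tolist())  # ['happiness', 'neutral']
```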

    Acknowledgements

    We wouldn't be here without the help of Kagglers.

    Inspiration

    I have always wanted to contribute to the data science community, and I am open to questions.

  2. Average daily time spent on social media worldwide 2012-2024

    • statista.com
    • de.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Average daily time spent on social media worldwide 2012-2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    How much time do people spend on social media?

    As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, where online users spend an average of three hours and 49 minutes on social media each day. In comparison, daily time spent on social media in the U.S. was just two hours and 16 minutes.

    Global social media usage

    Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.

    Global impact of social media

    Social media has a wide-reaching and significant impact not only on online activities but also on offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics, and heightened everyday distractions.
    
  3. Restaurant Sales Data

    • kaggle.com
    zip
    Updated Jul 31, 2025
    Cite
    Data Science Lovers (2025). Restaurant Sales Data [Dataset]. https://www.kaggle.com/datasets/rohitgrewal/restaurant-sales-data/code
    Available download formats: zip (2237 bytes)
    Dataset updated
    Jul 31, 2025
    Authors
    Data Science Lovers
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    📹 Project Video available on YouTube - https://youtu.be/dQwnyCEZ-UU

    🖇️Connect with me on LinkedIn - https://www.linkedin.com/in/rohit-grewal

    This is sales data from a restaurant company operating in multiple cities around the world. It contains information about individual sales transactions, customer demographics, and product details. The data is structured in a tabular format, with each row representing a single record and each column representing a specific attribute. This dataset can be used for business intelligence, sales forecasting, and customer behaviour analysis.

    Using this dataset, we answered multiple questions with Python in our Project.

    Q.1) Most preferred payment method?

    Q.2) Best-selling product, by quantity and by revenue?

    Q.3) Which city had the maximum revenue, or which manager earned the maximum revenue?

    Q.4) Date-wise revenue.

    Q.5) Average revenue.

    Q.6) Average revenue for November and December.

    Q.7) Standard deviation of revenue and quantity?

    Q.8) Variance of revenue and quantity?

    Q.9) Is revenue increasing or decreasing over time?

    Q.10) Average quantity sold and average revenue for each product?

    These are the main features/columns available in the dataset:

    1) Order ID: A unique identifier for each sales order. This can be used to track individual transactions.

    2) Order Date: The date when the order was placed. This column is essential for time-series analysis, such as identifying sales trends over time or seasonality.

    3) Product: The name or type of the product sold. This column is crucial for analyzing sales performance by product category.

    4) Price: The unit price of the product. This, along with 'Quantity', is used to calculate the total price of each order.

    5) Quantity: The number of units of the product sold in a single order. This is a key metric for calculating revenue and understanding sales volume.

    6) Purchase Type: Whether the order was made online, in-store, or at the drive-thru.

    7) Payment Method: How the order was paid for.

    8) Manager: Name of the store manager.

    9) City: The location of the store. This can be used for geographical analysis of sales, such as identifying top-performing regions or optimizing logistics.
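    Several of the questions above (Q.2, Q.3, Q.10) reduce to deriving a revenue column as Price × Quantity, as the Price description states, and then grouping. A minimal pandas sketch, using the column names listed above but made-up rows that do not come from the dataset:

```python
import pandas as pd

# Toy transactions; column names follow the list above, values are invented.
sales = pd.DataFrame({
    "Order ID": [1, 2, 3, 4],
    "Product": ["Burger", "Fries", "Burger", "Beverage"],
    "Price": [12.5, 3.5, 12.5, 2.0],
    "Quantity": [10, 40, 6, 30],
    "City": ["London", "London", "Paris", "Paris"],
})

# Revenue of each order = unit price x units sold
sales["Revenue"] = sales["Price"] * sales["Quantity"]

# Q.2: best-selling product by quantity and by revenue
by_product = sales.groupby("Product")[["Quantity", "Revenue"]].sum()
top_by_quantity = by_product["Quantity"].idxmax()
top_by_revenue = by_product["Revenue"].idxmax()

# Q.3: city with the maximum total revenue
top_city = sales.groupby("City")["Revenue"].sum().idxmax()

print(top_by_quantity, top_by_revenue, top_city)  # Fries Burger London
```

    In this toy data the product sold in the largest quantity is not the one that earns the most revenue, which is why Q.2 asks for both rankings.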

  4. HR Dataset (Multinational Company)

    • kaggle.com
    zip
    Updated Aug 23, 2025
    Cite
    Data Science Lovers (2025). HR Dataset (Multinational Company) [Dataset]. https://www.kaggle.com/datasets/rohitgrewal/hr-data-mnc/code
    Available download formats: zip (69930946 bytes)
    Dataset updated
    Aug 23, 2025
    Authors
    Data Science Lovers
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    📹 Project Video available on YouTube - https://youtu.be/fykrwQD3HR4

    🖇️Connect with me on LinkedIn - https://www.linkedin.com/in/rohit-grewal

    Human Resource (HR) Data of a Multi-national Corporation (MNC) - 2 Million Records

    This dataset contains HR information for employees of a multinational corporation (MNC). It includes 2 Million (20 Lakhs) employee records with details about personal identifiers, job-related attributes, performance, employment status, and salary information. The dataset can be used for HR analytics, including workforce distribution, attrition analysis, salary trends, and performance evaluation.

    This data is available as a CSV file. We analyse it using Pandas, and the analysis should be helpful for those working in the HR domain.

    Using this dataset, we answered multiple questions with Python in our Project.

    Q.1) What is the distribution of employee status (Active, Resigned, Retired, Terminated)?

    Q.2) What is the distribution of work modes (On-site, Remote)?

    Q.3) How many employees are there in each department?

    Q.4) What is the average salary by department?

    Q.5) Which job title has the highest average salary?

    Q.6) What is the average salary in different departments based on job title?

    Q.7) How many employees resigned or were terminated in each department?

    Q.8) How does salary vary with years of experience?

    Q.9) What is the average performance rating by department?

    Q.10) Which country has the highest concentration of employees?

    Q.11) Is there a correlation between performance rating and salary?

    Q.12) How has the number of hires changed over time (per year)?

    Q.13) Compare salaries of remote vs. on-site employees: is there a significant difference?

    Q.14) Find the top 10 employees with the highest salary in each department.

    Q.15) Identify departments with the highest attrition rate (Resigned %).

    Enrol in our Udemy courses:
    1. Python Data Analytics Projects - https://www.udemy.com/course/bigdata-analysis-python/?referralCode=F75B5F25D61BD4E5F161
    2. Python For Data Science - https://www.udemy.com/course/python-for-data-science-real-time-exercises/?referralCode=9C91F0B8A3F0EB67FE67
    3. Numpy For Data Science - https://www.udemy.com/course/python-numpy-exercises/?referralCode=FF9EDB87794FED46CBDF

    These are the main features/columns available in the dataset:

    1) Unnamed: 0 – Index column (auto-generated, not useful for analysis, will be deleted).

    2) Employee_ID – Unique identifier assigned to each employee (e.g., EMP0000001).

    3) Full_Name – Full name of the employee.

    4) Department – Department in which the employee works (e.g., IT, HR, Marketing, Operations).

    5) Job_Title – Designation or role of the employee (e.g., Software Engineer, HR Manager).

    6) Hire_Date – The date when the employee was hired by the company.

    7) Location – Geographical location of the employee (city, country).

    8) Performance_Rating – Performance evaluation score (numeric scale, higher is better).

    9) Experience_Years – Number of years of professional experience the employee has.

    10) Status – Current employment status (e.g., Active, Resigned).

    11) Work_Mode – Mode of working (e.g., On-site, Hybrid, Remote).

    12) Salary_INR – Annual salary of the employee in Indian Rupees.
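    Q.15 above can be sketched with a group-wise mean over a boolean flag. This assumes attrition rate means the percentage of a department's employees whose Status is "Resigned" (the "(Resigned %)" wording suggests this, but the project's exact definition is not given), and the rows below are invented for illustration using the listed column names.

```python
import pandas as pd

# Toy records using column names from the list above; values are invented.
hr = pd.DataFrame({
    "Employee_ID": ["EMP0000001", "EMP0000002", "EMP0000003",
                    "EMP0000004", "EMP0000005"],
    "Department": ["IT", "IT", "HR", "HR", "HR"],
    "Status": ["Active", "Resigned", "Resigned", "Resigned", "Active"],
})

# Assumed definition: share of employees with Status == "Resigned",
# as a percentage of each department's headcount.
attrition = (
    hr.assign(resigned=hr["Status"].eq("Resigned"))
      .groupby("Department")["resigned"]
      .mean()          # mean of a boolean column = fraction that is True
      .mul(100)        # convert the fraction to a percentage
      .sort_values(ascending=False)
)
print(attrition.round(1).to_dict())  # {'HR': 66.7, 'IT': 50.0}
```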

  5. DISSERTATION - raw EEG dataset

    • kaggle.com
    zip
    Updated Jul 30, 2021
    + more versions
    Cite
    Alexis Pomares Pastor (2021). DISSERTATION - raw EEG dataset [Dataset]. https://www.kaggle.com/datasets/alexispomares/dissertation-raw
    Available download formats: zip (21481445801 bytes)
    Dataset updated
    Jul 30, 2021
    Authors
    Alexis Pomares Pastor
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Context

    A major shortcoming of medical practice is the lack of an objective measure of conscious level. Impairment of consciousness is common, e.g. following brain injury and seizures, which can also interfere with sensory processing and volitional responses. This is also an important pitfall in neurophysiological methods that infer awareness via command following, e.g. using functional magnetic resonance imaging or electroencephalography (EEG).

    Transcranial electrical stimulation (TES) can be employed to non-invasively stimulate the brain, bypassing sensory inputs, and has already shown promising results in providing reliable indicators of brain state. However, current non-invasive solutions have been limited to transcranial magnetic stimulation, which is not easily translatable to clinical settings.

    In this neurotechnology research study, we demonstrated the feasibility of a framework using Deep Learning (DL) algorithms to classify EEG brain responses evoked by a defined multi-dimensional pattern of TES. We found that delivering transcranial direct current stimulation (tDCS) to posterior cortical areas targeting the angular gyrus elicited an exceptionally reliable brain response. For this particular paradigm, our best Convolutional Neural Network models reached a 92% classification f1-score on Holdout data from participants never seen during training, significantly surpassing an estimated human-level performance at 60-70% accuracy.

    Content

    This dataset contains the raw EEG files (in MFF format) used in our study (see Section 2.5), with the corresponding GitHub repository containing all code & support data. The complementary Kaggle dataset with preprocessed EEG files (CSV format) can be found here.

    Each root folder corresponds to a different approach we followed:
    1. Timeseries: filtered, curated, and epoched EEG time series, resampled to 250Hz. Shape per training example => (250 time samples, 188 good channels)
    2. Features: a set of 37 statistical measures that describe the EEG time series data. Shape per training example => (37 features, 188 good channels)
    3. Concatenated Features: an equivalent version of [2], but with features stacked horizontally. Shape per training example => (1 row, 6956 columns)
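    The relationship between the "Features" and "Concatenated Features" shapes is a flatten: 37 features × 188 channels = 6956 columns. A minimal NumPy sketch with random stand-in values; the row-major (C) ordering is an assumption, and the study's actual column order may differ.

```python
import numpy as np

N_FEATURES, N_CHANNELS = 37, 188

# One hypothetical training example from the "Features" approach:
# a (37 features, 188 channels) matrix of random stand-in values.
example = np.random.default_rng(0).normal(size=(N_FEATURES, N_CHANNELS))

# "Concatenated Features": flatten into a single row of 37 * 188 columns.
# reshape uses row-major order, so feature 0 for all channels comes first.
concatenated = example.reshape(1, -1)
print(concatenated.shape)  # (1, 6956)
```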

    Methodology

    We enrolled 11 healthy, resting, awake participants (4 female; ages 20-37, average 25.0±4.6 years) to conduct 13 separate experimental sessions (subjects P000 and P001 participated twice, after the results from their first sessions were found to be invalid). Participants were instructed to sit awake with eyes open and were blinded to the conditions applied. Following an initial rest period of 120 seconds (including 60 seconds with eyes closed), up to 58 blocks of TES were performed, for a total of up to ~60 minutes per session as tolerated by the participant. EEG was continuously recorded at 1000Hz and later resampled to 250Hz in our MNE preprocessing pipeline.

    For equipment, we used a newly acquired GTEN 200 neuromodulation system (Electrical Geodesics, Inc.) that allows simultaneously delivering TES and recording high-density EEG through the same 256-electrode cap. We delivered tDCS and tACS stimulation across 2 cortical regions: bilateral posterior (targeting the angular gyrus) and bilateral frontal (middle frontal gyrus).

    Acknowledgements

    Thank you to all my participants for playing a crucial part in this study, enabling the creation of two public datasets to be used freely by the DL-EEG research community.

    Special thanks for their expert guidance during this research to:
    Dr. Ines Ribeiro Violante
    Dr. Gregory Scott

    Questions?

    Open a Kaggle Discussion or contact me via LinkedIn.

    Thanks,
    Alexis Pomares

  6. TikTok: distribution of global audiences 2025, by age and gender

    • statista.com
    • de.statista.com
    + more versions
    Cite
    Statista Research Department, TikTok: distribution of global audiences 2025, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    As of February 2025, around 14.1 percent of TikTok's global audience were women aged 18 to 24, while men of the same age group formed approximately 16.6 percent of the platform's audience. The audience of the popular social video platform was further composed of 14.6 percent female users aged 25 to 34 and 20.7 percent male users in the same age group.

  7. Facebook users worldwide 2017-2027

    • statista.com
    • de.statista.com
    Cite
    Stacy Jo Dixon, Facebook users worldwide 2017-2027 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users, a new peak, in 2027. Notably, the number of Facebook users has been increasing continuously over the past years. The user figures for the Facebook platform shown here have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period and count multiple accounts belonging to one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
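    As a rough consistency check (derived here from the two stated figures, not reported by Statista), an increase of 391 million users equal to 14.36 percent implies a 2023 base of about 2.72 billion, which lands on the stated 2027 figure of 3.1 billion:

```latex
\begin{aligned}
N_{2023} &\approx \frac{391\ \text{million}}{0.1436} \approx 2{,}723\ \text{million} \\
N_{2027} &\approx N_{2023} + 391\ \text{million} \approx 3{,}114\ \text{million} \approx 3.1\ \text{billion}
\end{aligned}
```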

  8. data jobs

    • kaggle.com
    zip
    Updated Nov 2, 2025
    + more versions
    Cite
    Ixlem26266 (2025). data jobs [Dataset]. https://www.kaggle.com/datasets/ixlem26266/data-jobs
    Available download formats: zip (66489754 bytes)
    Dataset updated
    Nov 2, 2025
    Authors
    Ixlem26266
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    A dataset of real-world data analytics job postings from 2023, collected and processed by Luke Barousse. I've been collecting data on data job postings since 2022, using a bot to scrape them from Google; the postings come from a variety of sources.

    Columns:
    • job_id: A unique identifier for each job posting. It can be sourced from an external dataset (e.g., LinkedIn) or generated as a surrogate key during ETL so that each job row can be uniquely referenced.
    • job_title_short: A simplified or standardized version of the job title (e.g., “Data Scientist”, “Data Analyst”) used for grouping or classification.
    • job_title: The full, original job title as listed in the posting (e.g., “Senior Data Scientist – Machine Learning”).
    • job_location: The location where the job is based, usually including city and/or state (e.g., “San Francisco, CA”).
    • job_via: The platform, company, or recruitment source through which the job was posted (e.g., “via LinkedIn”, “via Indeed”).
    • job_schedule_type: The type of job schedule, such as “Full-time”, “Part-time”, “Contract”, “Internship”, etc.
    • job_work_from_home: Indicates whether the job allows remote work or work-from-home flexibility (Boolean or categorical: True / False / Hybrid).
    • search_location: The geographic location or area used when searching or scraping for jobs (e.g., “New York”, “London”). Often used to contextualize the job posting.
    • job_posted_date: The date when the job was originally posted or made public by the employer or platform.
    • job_no_degree_mention: A flag indicating whether the job posting explicitly mentions that no degree is required (Boolean: True / False).
    • job_health_insurance: Indicates whether the job listing includes health insurance or similar benefits (Boolean: True / False).
    • job_country: The country in which the job is located (e.g., “USA”, “France”, “Germany”).
    • salary_rate: The unit or frequency of the salary specified.
    • salary_year_avg: The estimated or provided average annual salary for the job, standardized in a yearly format (numeric).
    • salary_hour_avg: The estimated or provided average hourly wage, standardized in an hourly format (numeric).
    • company_name: The name of the hiring company or organization offering the job.
    • job_skills: A list or string of skills required or mentioned in the job description (e.g., “Python, SQL”).
    • job_type_skills: A categorized or grouped skill profile, typically summarizing the type of job based on skill composition.
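    Since job_skills may arrive as a string rather than a real list, counting skill demand usually needs a parsing step first. A minimal sketch, assuming (not confirmed by the description) that each value is a stringified Python list such as "['python', 'sql']" and that missing values mean no skills listed; the rows are invented for illustration.

```python
import ast

import pandas as pd

# Toy postings; the stringified-list format of job_skills is an assumption.
jobs = pd.DataFrame({
    "job_title_short": ["Data Analyst", "Data Scientist", "Data Analyst"],
    "job_skills": ["['sql', 'excel']", "['python', 'sql']", None],
})


def parse_skills(value):
    """Turn a stringified list into a real list; missing values become []."""
    if isinstance(value, str):
        # literal_eval safely parses Python literals without executing code
        return ast.literal_eval(value)
    return []


# One row per (posting, skill); empty lists explode to NaN and are dropped
skills = jobs["job_skills"].map(parse_skills).explode().dropna()
print(skills.value_counts())
```

    If the real column turns out to be comma-separated text instead (as the “Python, SQL” example suggests might happen), swapping literal_eval for a split on commas would be the natural adjustment.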

  9. Newborn Health Monitoring Dataset

    • kaggle.com
    Updated Aug 21, 2025
    Cite
    Arif Miah (2025). Newborn Health Monitoring Dataset [Dataset]. https://www.kaggle.com/datasets/miadul/newborn-health-monitoring-dataset
    Available in Croissant format (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
    Dataset updated
    Aug 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Arif Miah
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📌 Introduction

    This dataset is a synthetic yet realistic simulation of newborn baby health monitoring.
    It is designed for healthcare analytics, machine learning, and app development, especially for early detection of newborn health risks.

    The dataset mimics daily health records of newborn babies, including vital signs, growth parameters, feeding patterns, and risk classification labels.

    🎯 Motivation

    Newborn health is one of the most sensitive areas of healthcare.
    Monitoring newborns can help detect jaundice, infections, dehydration, and respiratory issues early.

    Since real newborn data is private and hard to access, this dataset provides a safe and realistic alternative for researchers, students, and developers to build and test:
    - 📊 Exploratory Data Analysis (EDA)
    - 🤖 Machine Learning classification models
    - 📱 Healthcare monitoring apps (Streamlit, Flask, Django, etc.)
    - 🏥 Predictive healthcare systems

    📂 Dataset Overview

    • Total Babies: 100
    • Monitoring Period: 30 days per baby
    • Total Records: 3,000
    • File Format: CSV
    • Synthetic Data: Generated using Python (pandas, numpy, faker) with medically-informed rules

    📑 Column Description

    🔹 Demographics

    • baby_id → Unique identifier for each baby (e.g., B001).
    • name → Randomly generated baby first name (for realism).
    • gender → Male / Female.
    • gestational_age_weeks → Gestational age at birth (normal: 37–42 weeks).
    • birth_weight_kg → Birth weight (normal range: 2.5–4.5 kg).
    • birth_length_cm → Length at birth (avg: 48–52 cm).
    • birth_head_circumference_cm → Head circumference at birth (avg: 33–35 cm).

    🔹 Daily Monitoring

    • date → Monitoring date.
    • age_days → Age of baby in days since birth.
    • weight_kg → Daily updated weight (growth trend ~25–30 g/day).
    • length_cm → Daily updated body length (slow increase).
    • head_circumference_cm → Daily updated head circumference.
    • temperature_c → Body temperature in °C (normal: 36.5–37.5°C).
    • heart_rate_bpm → Heart rate (normal: 120–160 bpm).
    • respiratory_rate_bpm → Breathing rate (normal: 30–60 breaths/min).
    • oxygen_saturation → SpO₂ level (normal >95%).

    🔹 Feeding & Hydration

    • feeding_type → Breastfeeding / Formula / Mixed.
    • feeding_frequency_per_day → Number of feeds per day (normal: 8–12).
    • urine_output_count → Wet diapers/day (normal: 6–8+).
    • stool_count → Bowel movements per day (0–5 is common).

    🔹 Medical Screening

    • jaundice_level_mg_dl → Bilirubin level (normal <5, mild 5–12, severe >15).
    • apgar_score → 0–10 score at birth (only day 1).
    • immunizations_done → Yes/No (BCG, HepB, OPV on Day 1 & 30).
    • reflexes_normal → Newborn reflex check (Yes/No).

    🔹 Risk Classification

    • risk_level → Automatically assigned health status:
      • ✅ Healthy → All vitals normal.
      • ⚠️ At Risk → Mild abnormalities (e.g., mild jaundice, slight fever, SpO₂ 92–95%).
      • 🚨 Critical → Severe abnormalities (e.g., jaundice >15, SpO₂ <92, HR >180, temp >39°C).
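    The risk labels above follow threshold rules on the vitals. The sketch below is a hypothetical reconstruction, not the author's generator code: it applies the stated Critical thresholds and normal ranges, and, as an assumption, treats anything abnormal but not critical as At Risk (the description leaves the 12–15 mg/dL jaundice band and "slight fever" undefined).

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    temperature_c: float
    heart_rate_bpm: float
    respiratory_rate_bpm: float
    oxygen_saturation: float
    jaundice_level_mg_dl: float


def classify_risk(v: Vitals) -> str:
    # Critical thresholds as listed in the description
    if (v.jaundice_level_mg_dl > 15 or v.oxygen_saturation < 92
            or v.heart_rate_bpm > 180 or v.temperature_c > 39):
        return "Critical"
    # Healthy only if every vital is inside its stated normal range
    normal = (
        36.5 <= v.temperature_c <= 37.5
        and 120 <= v.heart_rate_bpm <= 160
        and 30 <= v.respiratory_rate_bpm <= 60
        and v.oxygen_saturation > 95
        and v.jaundice_level_mg_dl < 5
    )
    # Assumption: any non-critical abnormality counts as "At Risk"
    return "Healthy" if normal else "At Risk"


print(classify_risk(Vitals(37.0, 140, 45, 98, 3)))  # Healthy
print(classify_risk(Vitals(37.8, 150, 40, 94, 8)))  # At Risk
print(classify_risk(Vitals(38.0, 190, 50, 97, 4)))  # Critical
```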

    📊 How Data Was Generated

    The dataset was generated in Python using:
    - numpy and pandas for data simulation.
    - faker for generating baby names and dates.
    - Medically realistic rules for vitals, growth, jaundice progression, and risk classification.

    💡 Potential Applications

    • Machine Learning: Train classification models to predict newborn health risks.
    • Streamlit/Dash Apps: Build real-time newborn monitoring dashboards.
    • Healthcare Research: Study growth and vital sign patterns.
    • Education: Practice EDA, visualization, and predictive modeling on health datasets.

    📬 Author & Contact

    Created by Arif Miah
    I am passionate about AI, Healthcare Analytics, and App Development.
    You can connect with me:

    ⚠️ Disclaimer

    This is a synthetic dataset created for educational and research purposes only.
    It should NOT be used for actual medical diagnosis or treatment decisions.

  10. Instagram: distribution of global audiences 2024, by age group

    • statista.com
    • de.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Instagram: distribution of global audiences 2024, by age group [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, almost 32 percent of global Instagram audiences were aged between 18 and 24 years, and 30.6 percent of users were aged between 25 and 34 years. Overall, 16 percent of users belonged to the 35 to 44 year age group.

    Instagram users

    With roughly one billion monthly active users, Instagram is one of the most popular social networks worldwide. The social photo-sharing app is especially popular in India and the United States, which have 362.9 million and 169.7 million Instagram users, respectively.

    Instagram features

    One of the most popular features of Instagram is Stories. Users can post photos and videos to their Stories stream, and the content stays live for others to view for 24 hours before it disappears. In January 2019, the company reported that there were 500 million daily active Instagram Stories users. Instagram Stories competes directly with Snapchat, another photo-sharing app that initially became famous for its “vanishing photos” feature. As of the second quarter of 2021, Snapchat had 293 million daily active users.
    
  11. Leading social media platforms used by marketers worldwide 2024

    • statista.com
    • de.statista.com
    + more versions
    Cite
    Christopher Ross, Leading social media platforms used by marketers worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Christopher Ross
    Description

    During a 2024 survey among marketers worldwide, around 86 percent reported using Facebook for marketing purposes. Instagram and LinkedIn followed, respectively mentioned by 79 and 65 percent of the respondents.

    The global social media marketing segment

    According to the same study, 59 percent of responding marketers intended to increase their organic use of YouTube for marketing purposes throughout that year. LinkedIn and Instagram followed with similar shares, rounding out the top three social media platforms attracting planned growth in organic use among global marketers in 2024. Marketers' main driver is increased brand exposure and traffic, which led the ranking of benefits of social media marketing worldwide.

    Social media for B2B marketing

    Social media platform adoption rates among business-to-consumer (B2C) and business-to-business (B2B) marketers vary according to each subsegment's focus. While B2C professionals prioritize Facebook and Instagram (both run by Meta Platforms, Inc.) due to their popularity among online audiences, B2B marketers concentrate their efforts on Microsoft-owned LinkedIn, given its goal of connecting people and companies in a corporate context.
    
  12. Instagram accounts with the most followers worldwide 2024

    • statista.com
    • de.statista.com
    Cite
    Stacy Jo Dixon, Instagram accounts with the most followers worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    Cristiano Ronaldo has one of the most popular Instagram accounts as of April 2024.

                  The Portuguese footballer is the most-followed person on the photo sharing app platform with 628 million followers. Instagram's own account was ranked first with roughly 672 million followers.
    
                  How popular is Instagram?
    
                  Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States, and experts projected this figure would surpass 127 million users in 2023.
    
                  Who uses Instagram?
    
                  Instagram audiences are predominantly young: recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media platforms among teens and one of the social networks with the biggest reach among teens in the United States.
    
                  Celebrity influencers on Instagram
                  Many celebrities and athletes are brand spokespeople and generate additional income through social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first here too, with an average media value of 985,441 U.S. dollars per Instagram post.
    
  13. Pupil-Teacher Ratio in Primary Education

    • kaggle.com
    zip
    Updated Dec 19, 2024
    Hafiz Amsal (2024). Pupil-Teacher Ratio in Primary Education [Dataset]. https://www.kaggle.com/datasets/hafizamsal/pupil-teacher-ratio-in-primary-education/data
    Available download formats: zip (2,043,559 bytes)
    Dataset updated
    Dec 19, 2024
    Authors
    Hafiz Amsal
    License

    https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets

    Description

    Kaggle Dataset Description

    Title: Pupil-Teacher Ratio in Primary Education
    Subtitle: Analyzing global trends in educational quality through pupil-teacher ratios.

    Detailed Description:
    This dataset contains data on the pupil-teacher ratio in primary education, which measures the average number of pupils per teacher. It is a critical indicator of educational quality and infrastructure. Sourced from the World Bank, this dataset spans several decades and countries, offering insights into the adequacy of teaching resources worldwide.

    Key Highlights:
    - Annual data for countries worldwide.
    - Metric: Number of pupils per teacher in primary education.
    - Use cases: Analyze trends, compare regional disparities, and study correlations with literacy rates, GDP, and enrollment rates.

    4. Exploratory Data Analysis (EDA)

    Notebook Ideas

    1. Data Cleaning:

      • Handle missing or inconsistent values.
      • Normalize data to compare across countries with varying education systems.
      • Aggregate data by regions or income levels for a comprehensive analysis.
    2. Visualizations:

      • Line Graph: Trends in pupil-teacher ratios over time for selected countries.
      • Heatmap: Regional disparities in pupil-teacher ratios by year.
      • Scatterplot: Correlation between pupil-teacher ratios and literacy rates or GDP.
      • Bar Chart: Top and bottom countries by pupil-teacher ratios for a given year.
    3. Descriptive Analysis:

      • Highlight regions with the lowest and highest pupil-teacher ratios.
      • Analyze changes over time in countries with significant improvements.
      • Explore patterns in regions with higher teacher availability.
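The cleaning, aggregation, and ranking ideas above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the rows are hypothetical, the layout mimics the World Bank's wide format (one row per country, one column per year), a "Region" label is assumed to have already been joined onto each country, and missing values are simply dropped rather than imputed.

```python
from statistics import mean

# Hypothetical wide-format rows: one row per country, one column per year.
# None marks a missing value; "Region" is an assumed, pre-joined label.
rows = [
    {"Country Name": "Country A", "Region": "Region 1", "2018": 25.0, "2019": None, "2020": 23.0},
    {"Country Name": "Country B", "Region": "Region 1", "2018": 40.0, "2019": 38.0, "2020": 37.0},
    {"Country Name": "Country C", "Region": "Region 2", "2018": 15.0, "2019": 14.5, "2020": None},
]
YEARS = ["2018", "2019", "2020"]


def to_long(rows, years):
    """Reshape wide rows into (country, region, year, ratio) records, dropping missing values."""
    records = []
    for row in rows:
        for year in years:
            value = row.get(year)
            if value is not None:
                records.append((row["Country Name"], row["Region"], int(year), value))
    return records


def regional_means(records, year):
    """Unweighted mean pupil-teacher ratio per region for one year."""
    by_region = {}
    for country, region, yr, value in records:
        if yr == year:
            by_region.setdefault(region, []).append(value)
    return {region: mean(values) for region, values in by_region.items()}


def rank_countries(records, year):
    """Countries ordered from lowest (fewest pupils per teacher) to highest ratio."""
    subset = [(value, country) for country, _, yr, value in records if yr == year]
    return [country for value, country in sorted(subset)]


records = to_long(rows, YEARS)
print(regional_means(records, 2018))
print(rank_countries(records, 2020))
```

The regional mean here treats every country equally; weighting by enrollment would be a reasonable alternative if enrollment figures are joined in.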

    5. Predictive Analysis (Optional)

    • Use time-series forecasting (e.g., ARIMA or Prophet) to predict future trends in pupil-teacher ratios.
    • Apply clustering algorithms to group countries with similar trends in educational infrastructure.
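As a much simpler stand-in for ARIMA or Prophet, the sketch below fits a straight-line trend per country by least squares, extrapolates it one year ahead, and groups countries by trend slope with a basic one-dimensional k-means. The series are hypothetical, and the linear model is an assumption chosen for brevity rather than one of the forecasting models named above.

```python
def ols_slope(years, values):
    """Least-squares slope: average change in pupils per teacher per year."""
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(values) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    den = sum((x - x_bar) ** 2 for x in years)
    return num / den


def linear_forecast(years, values, target_year):
    """Extrapolate the fitted straight line to target_year."""
    slope = ols_slope(years, values)
    intercept = sum(values) / len(values) - slope * sum(years) / len(years)
    return intercept + slope * target_year


def kmeans_1d(points, k, iterations=20):
    """Basic Lloyd's k-means on scalars; initial centroids are spread over the sorted points."""
    ordered = sorted(points)
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [ordered[round(i * step)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    labels = [min(range(k), key=lambda i: abs(p - centroids[i])) for p in points]
    return labels, centroids


# Hypothetical pupil-teacher ratios for four countries.
years = [2017, 2018, 2019, 2020]
series = {
    "Country A": [40.0, 38.0, 36.0, 34.0],
    "Country B": [30.0, 29.0, 28.0, 27.0],
    "Country C": [20.0, 20.5, 21.0, 21.5],
    "Country D": [18.0, 18.5, 19.0, 19.5],
}
slopes = {name: ols_slope(years, values) for name, values in series.items()}
labels, centroids = kmeans_1d(list(slopes.values()), k=2)
groups = dict(zip(slopes, labels))
print(linear_forecast(years, series["Country A"], 2021))
print(groups)
```

Clustering on slopes groups countries by direction and speed of change (here, improving versus slightly worsening), which is one possible reading of "similar trends"; clustering on the full series is another.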

    6. Kaggle Notebook

    Create a Kaggle notebook with:
    1. Data Cleaning: Handle missing values or inconsistencies.
    2. EDA: Include compelling visualizations like heatmaps, scatterplots, and line charts.
    3. Insights: Highlight findings, such as regions with the best and worst pupil-teacher ratios or trends over time.
    4. Optional Predictive Modeling: Use regression or time-series models to forecast future ratios.

    7. Call to Action

    For GitHub:

    • Share the GitHub repository link on LinkedIn, Twitter, and forums like Reddit (e.g., r/datascience).
    • Invite collaboration:
      • "Fork this repository and contribute by adding insights or new visualizations!"

    GitHub Link: https://github.com/AmsalAli/Pupil_Teacher_Ratio_Trends

    For Kaggle:

    • Encourage upvotes:
      • "If this dataset helps you, please upvote it to make it more visible for others!"
    • Engage users with questions:
      • "Which regions have made the most progress in reducing pupil-teacher ratios?"
      • "How do pupil-teacher ratios affect literacy and enrollment rates?"

    Kaggle Link: https://www.kaggle.com/datasets/yourusername/pupil-teacher-ratio

    8. LinkedIn Post

    Post Title:
    📚 Global Trends in Pupil-Teacher Ratios in Primary Education 🌍

    Post Body:
    Excited to share my latest dataset on pupil-teacher ratios in primary education, sourced from the World Bank. This dataset measures the average number of pupils per teacher, offering critical insights into global educational quality.

    📂 Explore the Dataset:
    - GitHub Repository: https://github.com/AmsalAli/Pupil_Teacher_Ratio_Trends
    - Kaggle Dataset: https://www.kaggle.com/datasets/yourusername/pupil-teacher-ratio

    Why It Matters:

    The pupil-teacher ratio is a key indicator of educational quality and infrastructure. This dataset is ideal for:
    - Trend Analysis: Study changes in pupil-teacher ratios across decades.
    - Regional Comparisons: Identify disparities in teaching resources globally.
    - Correlations: Explore the impact of pupil-teacher ratios on literacy and enrollment rates.

    📈 Get Involved:
    - Use this dataset for your analyses and visualizations.
    - Share your insights and findings.
    - Upvote on Kaggle to help others discover it!

    What trends or insights can you uncover from this data?
    - Which countries have the most favorable pupil-teacher ratios?
    - What factors drive disparities in teacher availability globally?

    Let me know your thoughts, and feel free to share this resource with others! 🌟

    DataScienc...

  14. Walmart Sales Forecasting

    • kaggle.com
    zip
    Updated Dec 8, 2024
    Anggun Dwi Lestari (2024). Walmart Sales Forecasting [Dataset]. https://www.kaggle.com/datasets/anggundwilestari/walmart-sales-forecasting
    Available download formats: zip (6,261,013 bytes)
    Dataset updated
    Dec 8, 2024
    Authors
    Anggun Dwi Lestari
    Description

    About Dataset: Walmart Sales Forecast

    This dataset focuses on predicting weekly store sales at Walmart by examining holiday effects, temporal patterns, and other influential factors. The goal is to enable efficient stock planning, revenue calculations, and strategic decision-making by understanding patterns related to seasonal sales fluctuations. The machine learning model was developed using resources from the Kaggle competition at https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/overview/evaluation

    Dataset Overview

    1. Test Data: Contains 115,064 rows with the columns Store, Department, Date, and IsHoliday. "IsHoliday" indicates whether the week includes a special holiday. Holiday weeks tend to show higher average sales than non-holiday periods.

    2. Train Data: Also contains 115,064 rows, with Store, Department, Date, IsHoliday, and Weekly Sales. Weekly Sales is the recorded weekly sales for a specific department at a given store.

    3. Features Data: Consists of 8,190 rows with variables such as Temperature, Fuel Price, CPI, Unemployment, Markdown 1-5, and IsHoliday.
    * Temperature: Average temperature (Fahrenheit) in the region.
    * Fuel Price: Can affect consumer spending and sales.
    * Markdowns 1-5: Promotional markdowns (missing values marked as NA).
    * CPI: Consumer Price Index (reflects inflation/deflation).
    * Unemployment: Unemployment rate in the region, which affects consumer spending.

    4. Store Data: Includes details about Walmart stores such as store numbers, store types, and store sizes. The dataset covers 45 stores, categorized into 3 types:
    * Type A: Sizes from 39,690 to 219,622
    * Type B: Sizes from 34,875 to 140,167
    * Type C: Sizes from 39,690 to 42,988

    The main variables for prediction are Weekly Sales (the target), IsHoliday, and Date. The other features are explored to identify patterns and generate insights for building accurate prediction models.
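A natural first step is to attach the Features and Store attributes to each Train row. The description does not list the key columns, so this sketch assumes Features rows are keyed by Store and Date and Store rows by Store, presumably as in the original competition files. All field names and values below are hypothetical.

```python
# Hypothetical miniature tables; field names are assumptions based on the description.
train = [
    {"Store": 1, "Dept": 1, "Date": "2010-02-05", "IsHoliday": False, "Weekly_Sales": 100.0},
    {"Store": 1, "Dept": 1, "Date": "2010-02-12", "IsHoliday": True, "Weekly_Sales": 150.0},
]
features = [
    {"Store": 1, "Date": "2010-02-05", "IsHoliday": False, "Temperature": 40.0, "CPI": 210.0},
    {"Store": 1, "Date": "2010-02-12", "IsHoliday": True, "Temperature": 38.0, "CPI": 210.5},
]
stores = [{"Store": 1, "Type": "A", "Size": 150000}]


def merge_tables(train, features, stores):
    """Left-join Features on (Store, Date) and Store info on Store onto each Train row."""
    feature_index = {(f["Store"], f["Date"]): f for f in features}
    store_index = {s["Store"]: s for s in stores}
    merged = []
    for row in train:
        combined = dict(row)
        extras = (feature_index.get((row["Store"], row["Date"])), store_index.get(row["Store"]))
        for extra in extras:
            if extra:  # a missing match simply adds no columns
                for key, value in extra.items():
                    # Keep the Train value for shared columns such as IsHoliday.
                    combined.setdefault(key, value)
        merged.append(combined)
    return merged


merged = merge_tables(train, features, stores)
print(merged[1])
```

Using `setdefault` means a shared column like IsHoliday is taken from Train rather than silently overwritten by the Features table.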

    Modeling Objective

    The goal is to predict weekly store sales while capturing the impact of holidays. To achieve this, a time series modeling approach was applied: date, weekly sales, IsHoliday, lag features, and rolling averages were used as inputs to an XGBoost model. The evaluation metric was Weighted Mean Absolute Error (WMAE), which gives more weight to periods of higher significance, such as holidays.
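The lag and rolling-average features mentioned above can be built per store-department series. This sketch assumes the weekly sales are already sorted by date; the lag set (1 and 52 weeks) and the 4-week window are illustrative choices, not the author's. The rolling mean uses only past weeks so that the current week's target does not leak into its own features.

```python
def add_lag_and_rolling(sales, lags=(1, 52), window=4):
    """Return one feature dict per week for a chronologically ordered list of weekly sales.

    lag_k is the value k weeks earlier; rolling_mean_<window> is the mean of the
    previous `window` weeks (excluding the current week). None marks unavailable values.
    """
    rows = []
    for t, value in enumerate(sales):
        row = {"week": t, "Weekly_Sales": value}
        for k in lags:
            row[f"lag_{k}"] = sales[t - k] if t - k >= 0 else None
        past = sales[max(0, t - window):t]
        row[f"rolling_mean_{window}"] = sum(past) / len(past) if len(past) == window else None
        rows.append(row)
    return rows


rows = add_lag_and_rolling([10.0, 20.0, 30.0, 40.0, 50.0], lags=(1, 2), window=2)
print(rows[4])
```

Rows whose lags or rolling means are None would typically be dropped or imputed before being passed to a model such as XGBoost.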

    Final Model Metrics:
    * Weighted Mean Absolute Error = 211
    * Error rate relative to average weekly sales = ~1.32%

    The low error percentage highlights the model's accuracy in forecasting weekly sales and assessing seasonal fluctuations.
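The WMAE metric can be written out directly. This sketch assumes the weighting used in the original Kaggle Walmart competition (weight 5 for holiday weeks, 1 otherwise); the author's exact weights are not stated, and the numbers below are hypothetical.

```python
def wmae(y_true, y_pred, is_holiday, holiday_weight=5.0):
    """Weighted mean absolute error: holiday weeks count holiday_weight times, others once."""
    if not (len(y_true) == len(y_pred) == len(is_holiday)):
        raise ValueError("inputs must have the same length")
    weights = [holiday_weight if holiday else 1.0 for holiday in is_holiday]
    weighted_errors = sum(w * abs(t - p) for w, t, p in zip(weights, y_true, y_pred))
    return weighted_errors / sum(weights)


def error_rate_percent(error, mean_weekly_sales):
    """Express an absolute error as a percentage of average weekly sales."""
    return 100.0 * error / mean_weekly_sales


actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
holiday = [False, True, False]
print(wmae(actual, predicted, holiday))  # (10 + 5*10 + 30) / 7 ≈ 12.86
```

With a weight of 1 for every week this reduces to plain MAE (50 / 3 ≈ 16.67 for the same numbers). Read together, the reported WMAE of 211 and error rate of ~1.32% imply average weekly sales of roughly 16,000 (211 / 0.0132).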

    Insights

    • The analysis provides actionable insights by identifying holiday effects on sales trends.
    • This supports better stock planning, strategic financial planning, and risk management.

    📢 Published on: my LinkedIn

  15. London Housing Data

    • kaggle.com
    zip
    Updated Sep 15, 2025
    Data Science Lovers (2025). London Housing Data [Dataset]. https://www.kaggle.com/datasets/rohitgrewal/london-housing-data
    Available download formats: zip (138,862 bytes)
    Dataset updated
    Sep 15, 2025
    Authors
    Data Science Lovers
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Area covered
    London
    Description

    📹Project Video available on YouTube - https://youtu.be/q-Omt6LgRLc

    🖇️Connect with me on LinkedIn - https://www.linkedin.com/in/rohit-grewal

    London Housing Price Dataset

    The dataset contains housing market information for different areas of London over time. It includes details such as average house prices, the number of houses sold, and crime statistics. The data spans multiple years and is organized by date and geographic area.

    The data is available as a CSV file, which we analyze using a pandas DataFrame.

    Using this dataset, we answered multiple questions with Python in our project.

    Q. 1) Convert the data type of the 'Date' column to datetime format.

    Q. 2.A) Add a new column 'year' to the dataframe, containing only the year.

    Q. 2.B) Add a new column 'month' as the 2nd column of the dataframe, containing only the month.

    Q. 3) Remove the columns 'year' and 'month' from the dataframe.

    Q. 4) Show all records where 'No. of Crimes' is 0. How many such records are there?

    Q. 5) What are the maximum and minimum 'average_price' per year in England?

    Q. 6) What are the maximum and minimum No. of Crimes recorded per area?

    Q. 7) Show the count of records for each area where the average price is less than 100000.

    Enrol in our Udemy courses:
    1. Python Data Analytics Projects - https://www.udemy.com/course/bigdata-analysis-python/?referralCode=F75B5F25D61BD4E5F161
    2. Python For Data Science - https://www.udemy.com/course/python-for-data-science-real-time-exercises/?referralCode=9C91F0B8A3F0EB67FE67
    3. Numpy For Data Science - https://www.udemy.com/course/python-numpy-exercises/?referralCode=FF9EDB87794FED46CBDF

    These are the main features (columns) in the dataset:

    1) Date – The month and year when the data was recorded.

    2) Area – The London borough or area for which the housing and crime data is reported.

    3) Average_price – The average house price in the given area during the specified month.

    4) Code – The unique area code (e.g., government statistical code) corresponding to each borough or region.

    5) Houses_sold – The number of houses sold in the given area during the specified month.

    6) No_of_crimes – The number of crimes recorded in the given area during the specified month.
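The seven questions above map onto a handful of pandas operations. This is a sketch, not the project's own notebook: it assumes the column names from the feature list (Date, Area, Average_price, Code, Houses_sold, No_of_crimes) rather than the slightly different spellings used in the questions, assumes England appears as the Area value 'england', and uses a small hypothetical DataFrame in place of the downloaded CSV.

```python
import pandas as pd

nan = float("nan")
# Hypothetical sample; in practice this would be pd.read_csv(<path to the downloaded CSV>).
df = pd.DataFrame({
    "Date": ["1995-01-01", "1995-02-01", "1996-01-01", "1996-02-01", "1996-01-01"],
    "Area": ["england", "england", "england", "england", "area x"],
    "Average_price": [50000.0, 51000.0, 52000.0, 54000.0, 120000.0],
    "Code": ["C1", "C1", "C1", "C1", "C2"],
    "Houses_sold": [1000, 1100, 1050, 1200, 30],
    "No_of_crimes": [nan, nan, nan, nan, 0.0],
})

# Q1: parse the Date column as datetimes.
df["Date"] = pd.to_datetime(df["Date"])

# Q2.A / Q2.B: add 'year' at the end and 'month' as the 2nd column.
df["year"] = df["Date"].dt.year
df.insert(1, "month", df["Date"].dt.month)

# Q3: drop the helper columns again.
df = df.drop(columns=["year", "month"])

# Q4: records with zero crimes, and their count (NaN never compares equal to 0).
zero_crimes = df[df["No_of_crimes"] == 0]
print(len(zero_crimes))

# Q5: maximum and minimum average price per year for England.
england = df[df["Area"] == "england"]
price_range = england.groupby(england["Date"].dt.year)["Average_price"].agg(["max", "min"])

# Q6: maximum and minimum number of crimes per area (all-NaN areas give NaN).
crime_range = df.groupby("Area")["No_of_crimes"].agg(["max", "min"])

# Q7: number of records per area with an average price below 100000.
cheap_counts = df[df["Average_price"] < 100000]["Area"].value_counts()
print(price_range, crime_range, cheap_counts, sep="\n\n")
```

Deriving the year from `Date` inside the groupby means Q5 still works after Q3 has removed the helper columns.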

  16. TikTok global quarterly downloads 2018-2024

    • statista.com
    • de.statista.com
    Statista Research Department, TikTok global quarterly downloads 2018-2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    In the fourth quarter of 2024, TikTok generated around 186 million downloads from users worldwide. Initially launched in China by ByteDance as Douyin, the short-video format was popularized by TikTok and took over the global social media environment in 2020. In the first quarter of 2020, TikTok downloads peaked at over 313.5 million worldwide, up 62.3 percent from the first quarter of 2019.

                  TikTok interactions: is there a magic formula for content success?
    
                  In 2024, TikTok registered an engagement rate of approximately 4.64 percent on video content hosted on its platform. In the same year, content on the app recorded over 1,100 interactions on average. These interactions consisted primarily of likes, with fewer than 20 comments per piece of content on average in 2024.
                  The platform has been actively tackling fake interactions, removing around 236 million fake likes during the first quarter of 2024. Though there is no secret formula for maximizing these metrics, video length may contribute to the success of content on TikTok.
                  As of the first quarter of 2024, the recommended video length for small TikTok accounts with up to 500 followers was around 2.6 minutes, while the ideal duration for large accounts with over 50,000 followers was 7.28 minutes. The average length of TikTok videos posted by creators in 2024 was around 43 seconds.
    
                  What’s trending on TikTok Shop?
    
                  Since its launch in September 2023, TikTok Shop has become one of the most popular online shopping platforms, offering consumers a wide variety of products. In 2023, TikTok shops featuring beauty and personal care items sold over 370 million products worldwide.
                  TikTok shops featuring womenswear and underwear, as well as food and beverages, followed with 285 and 138 million products sold, respectively. Similarly, in the United States market, health and beauty products were the best-selling items, accounting for 85 percent of sales made via TikTok Shop during its first month. In 2023, Indonesia was the market with the largest number of TikTok Shops, hosting over 20 percent of all TikTok Shops. Thailand and Vietnam followed with 18.29 and 17.54 percent of all shops listed on the short-video platform, respectively.
    
  17. TikTok: account removed 2020-2024, by reason

    • statista.com
    • de.statista.com
    Statista Research Department, TikTok: account removed 2020-2024, by reason [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    During the fourth quarter of 2024, approximately 20.6 million TikTok accounts were removed from the platform on suspicion of being operated by users under the age of 13. During the same period, around 185 million fake accounts were removed from TikTok.

  18. Global social network penetration 2019-2028

    • statista.com
    • de.statista.com
    Stacy Jo Dixon, Global social network penetration 2019-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    The global social media penetration rate was forecast to increase continuously between 2024 and 2028, by a total of 11.6 percentage points (+18.19 percent). After the ninth consecutive year of growth, the penetration rate is estimated to reach 75.31 percent in 2028, a new peak. Notably, the global social media penetration rate has been increasing continuously over the past years.
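The two forms of the increase are easy to conflate: 11.6 is an absolute change in percentage points, while +18.19 percent is relative to the 2024 level. A quick check, assuming 11.6 is exactly the 2024-to-2028 difference:

```python
# Figures quoted in the description.
increase_pp = 11.6   # total increase 2024 -> 2028, in percentage points
rate_2028 = 75.31    # forecast penetration rate in 2028, in percent

rate_2024 = rate_2028 - increase_pp            # implied 2024 starting level
relative_increase = 100 * increase_pp / rate_2024

print(f"2024 rate ≈ {rate_2024:.2f}%, relative increase ≈ {relative_increase:.2f}%")
```

This gives a 2024 level of about 63.71 percent and a relative increase of about 18.2 percent; the small gap to the quoted 18.19 percent is consistent with rounding in the published figures.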

  19. Facebook: distribution of global audiences 2024, by age and gender

    • statista.com
    • de.statista.com
    Stacy Jo Dixon, Facebook: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, men aged 25 to 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Facebook's second-largest audience group was men aged 18 to 24 years.

                  Facebook connects the world
    
                  Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world, with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (together known by the acronym GAFA). Facebook is the most popular social network worldwide, and the company also owns three other billion-user properties: the mobile messaging apps WhatsApp and Facebook Messenger, as well as the photo-sharing app Instagram.

                  Facebook users

                  The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size, with 378 million users. The United States, Brazil, and Indonesia also each have more than 100 million Facebook users.
    
  20. Social media revenue of selected companies 2023

    • statista.com
    • de.statista.com
    Stacy Jo Dixon, Social media revenue of selected companies 2023 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    In 2023, Meta Platforms had a total annual revenue of over 134 billion U.S. dollars, up from 116 billion in 2022. LinkedIn reported its highest annual revenue to date, generating over 15 billion USD, whilst Snapchat reported an annual revenue of 4.6 billion USD.

Om Ashish Mishra (2020). LinkedIn Profile Data [Dataset]. https://www.kaggle.com/omashish/linkedin-profile-data

LinkedIn Profile Data

Facial and Regional Data Analysis

115 scholarly articles cite this dataset
Available download formats: zip (2,415,431 bytes)
Dataset updated
Mar 21, 2020
Authors
Om Ashish Mishra
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

LinkedIn is a platform for building connections and showcasing skills and achievements. This dataset was compiled to study features such as promotions, regional patterns, and facial characteristics.

Content

The data consists of around 15,000 profiles. The dataset covers many features, such as region, the way profile images are uploaded, the emotions shown in them, and the growth of users over time.

The attributes are described as follows.

User IDs are private and are not disclosed, but their characteristics are provided to help understand the behavior patterns of people on LinkedIn.
c id: A name for each record; it essentially forms the primary key.

Profession Columns
• avg time in previous position: The time spent, in years, in the previous position
• avg current position length: The average time the user has spent in the current position
• avg previous position length: The average time the user spent in previous positions
• m urn: The user ID for each profile
• m urn id: The user ID reduced to a distinct code
• no of promotions: Total number of times the user was promoted
• no of previous positions: The number of previous positions the user has held
• current position length: The number of months the person has been in the current position
• age: The age of the person
• gender: Male or female
• ethnicity: The percentage of ethnicity
• n followers: Number of followers

Image Clarity
• beauty: An index used to score the beauty of the image
• beauty female: Predicts how likely the user's image is to be female
• beauty male: Predicts how likely the user's image is to be male
• blur: The degree of blurriness of the image

Emotion Captured
• emo anger: The percentage of anger found
• emo disgust: The percentage of disgust found
• emo fear: The percentage of fear found
• emo happiness: The percentage of happiness found
• emo neutral: The percentage of neutral expression
• emo sadness: The percentage of sadness
• emo surprise: The percentage of surprise

Orientation & Facial Accessories
• glass: Whether the person is wearing no glasses, glasses, or sunglasses
• head pitch: The orientation of the head (up or down)
• head roll: The orientation of the head (sideways tilt)
• head yaw: The orientation of the head (turned left or right)
• mouth close: The percentage for a closed mouth
• mouth mask: The percentage for a masked mouth
• mouth open: The percentage for an open mouth
• mouth other: The percentage for a mouth covered by other objects
• skin acne: The percentage of acne on the skin
• skin dark_circle: The percentage of dark circles on the skin
• skin health: The skin health percentage
• skin stain: The percentage of stains on the skin
• smile: The smile percentage

Region Columns
• nationality: The nationality the user belongs to, followed by the percentage for each of:
  • african
  • celtic english
  • east asian
  • european
  • greek
  • hispanic
  • jewish
  • muslim
  • nordic
  • south asian

face_quality: The quality of the face recognized.
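The emotion columns and the nationality columns are each a set of percentages describing one profile, so a common derived feature is the category with the highest share. A minimal sketch, assuming snake_case column names (the description writes them with spaces) and a profile represented as a dict with hypothetical values:

```python
# Column names are assumptions: the description writes "emo anger", "east asian", etc.
EMOTION_COLUMNS = [
    "emo_anger", "emo_disgust", "emo_fear", "emo_happiness",
    "emo_neutral", "emo_sadness", "emo_surprise",
]
NATIONALITY_COLUMNS = [
    "african", "celtic_english", "east_asian", "european", "greek",
    "hispanic", "jewish", "muslim", "nordic", "south_asian",
]


def dominant(row, columns):
    """Return the column with the highest percentage in row, ignoring missing values."""
    scores = {col: row[col] for col in columns if row.get(col) is not None}
    if not scores:
        return None
    return max(scores, key=scores.get)


# Hypothetical profile values.
profile = {"emo_happiness": 80.2, "emo_neutral": 15.1, "emo_sadness": 0.4,
           "east_asian": 12.0, "european": 61.5, "nordic": 20.0}
print(dominant(profile, EMOTION_COLUMNS))      # emo_happiness
print(dominant(profile, NATIONALITY_COLUMNS))  # european
```

On ties, `max` keeps the first column in list order; a real analysis might prefer to flag ties or low-confidence profiles (for example using face_quality) instead.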

Acknowledgements

We wouldn't be here without the help of Kagglers.

Inspiration

I have always wanted to contribute to the data science community, and I am open to questions.
