3 datasets found
  1. Prostate Cancer Risk & Lifestyle Synthetic Dataset

    • kaggle.com
    zip
    Updated Jul 26, 2025
    Cite
    Arif Miah (2025). Prostate Cancer Risk & Lifestyle Synthetic Dataset [Dataset]. https://www.kaggle.com/datasets/miadul/prostate-cancer-risk-and-lifestyle-synthetic-dataset/code
    Explore at:
    Available download formats: zip (12073 bytes)
    Dataset updated
    Jul 26, 2025
    Authors
    Arif Miah
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This synthetic dataset simulates 1,000 individual health profiles focusing on potential risk factors for prostate cancer. The dataset is designed to support public health awareness, machine learning research, and medical decision-support application development.

    Prostate cancer is one of the most common cancers among men globally. While early detection through regular checkups can help prevent fatal outcomes, many lifestyle and behavioral factors contribute to increased risk.

    This dataset includes a variety of features such as age, body mass index (BMI), smoking habits, diet, physical activity, family history of cancer, mental stress levels, and health-check behavior. Each row is labeled with an estimated prostate cancer risk level (Low / Medium / High) based on a rule-based scoring model.

    The dataset is purely synthetic and contains no real patient information. It is safe for educational, research, and development use.

    🔍 Dataset Columns:

    Column Name              Description
    id                       Unique identifier for each individual (synthetic)
    age                      Age of the individual in years (30–80)
    bmi                      Body Mass Index (simulated around average value with some variation)
    smoker                   Indicates if the person is a smoker ("Yes" or "No")
    alcohol_consumption      Level of alcohol intake: None / Moderate / High
    diet_type                Type of diet: Healthy / Fatty / Mixed
    physical_activity_level  Physical activity level: Low / Moderate / High
    family_history           Whether there is a family history of cancer ("Yes" or "No")
    mental_stress_level      Perceived mental stress: Low / Medium / High
    sleep_hours              Average hours of sleep per day (roughly 4.5 to 9)
    regular_health_checkup   Whether the person does routine health checkups ("Yes" or "No")
    prostate_exam_done       Whether a prostate exam was done recently ("Yes" or "No")
    risk_level               Estimated prostate cancer risk: Low / Medium / High
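
The documented value sets lend themselves to a quick sanity check before any analysis. The following is a minimal sketch, assuming each record arrives as a dict of strings keyed by the column names listed above (for example, from `csv.DictReader`); the helper name `validate_row` is hypothetical and not part of the dataset.

```python
# Allowed categorical values, copied from the column descriptions.
ALLOWED = {
    "smoker": {"Yes", "No"},
    "alcohol_consumption": {"None", "Moderate", "High"},
    "diet_type": {"Healthy", "Fatty", "Mixed"},
    "physical_activity_level": {"Low", "Moderate", "High"},
    "family_history": {"Yes", "No"},
    "mental_stress_level": {"Low", "Medium", "High"},
    "regular_health_checkup": {"Yes", "No"},
    "prostate_exam_done": {"Yes", "No"},
    "risk_level": {"Low", "Medium", "High"},
}


def validate_row(row):
    """Return a list of problems found in one record (hypothetical helper)."""
    problems = []
    for column, allowed in ALLOWED.items():
        value = row.get(column)
        if value not in allowed:
            problems.append(f"{column}: unexpected value {value!r}")
    try:
        age = int(row.get("age", ""))
        if not 30 <= age <= 80:
            problems.append(f"age: {age} outside documented range 30-80")
    except (TypeError, ValueError):
        problems.append("age: not an integer")
    return problems
```

An empty list means the record matches the documented categories and age range; `bmi` and `sleep_hours` are left unchecked here because the description gives only approximate ranges for them.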

    🎯 Potential Use Cases:

    Exploratory Data Analysis (EDA) for healthcare

    Classification modeling (predicting prostate cancer risk level)

    AI-based health advisory assistant

    Public health awareness dashboards

    Feature importance analysis and visualization
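
For the classification use case, the categorical columns need a numeric encoding first. The sketch below is one possible approach, not the author's method: it assumes the level orderings implied by the names (for example Low < Medium < High), one-hot encodes `diet_type`, and uses a hypothetical helper `encode_row` on a dict keyed by the documented column names.

```python
# Ordered levels, assumed from the documented value names.
LEVELS = {
    "alcohol_consumption": ["None", "Moderate", "High"],
    "physical_activity_level": ["Low", "Moderate", "High"],
    "mental_stress_level": ["Low", "Medium", "High"],
}
YES_NO = ["smoker", "family_history", "regular_health_checkup", "prostate_exam_done"]
DIET_TYPES = ["Healthy", "Fatty", "Mixed"]  # nominal, so one-hot encoded
RISK_LEVELS = ["Low", "Medium", "High"]     # target classes


def encode_row(row):
    """Turn one record into (feature list, integer label). Hypothetical helper."""
    features = [float(row["age"]), float(row["bmi"]), float(row["sleep_hours"])]
    for column, levels in LEVELS.items():
        features.append(float(levels.index(row[column])))
    for column in YES_NO:
        features.append(1.0 if row[column] == "Yes" else 0.0)
    features.extend(1.0 if row["diet_type"] == d else 0.0 for d in DIET_TYPES)
    label = RISK_LEVELS.index(row["risk_level"])
    return features, label
```

Each record yields 13 features (3 numeric, 3 ordinal, 4 binary, 3 one-hot) and a label in 0..2. Because the labels come from a rule-based score, a classifier trained on these features would mostly be recovering that rule rather than learning anything clinical.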

    🧠 Target Audience: Data science learners, machine learning engineers, public health researchers, healthcare app developers

    🔒 License: This dataset is distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to use, modify, and share this dataset with proper attribution.

    🔗 Attribution (if published): Dataset created by Arif Miah

  2. GameSphere 🕹️: 2000 App Store Insights & Ratings

    • kaggle.com
    zip
    Updated Feb 17, 2024
    Cite
    Kanchana1990 (2024). GameSphere 🕹️: 2000 App Store Insights & Ratings [Dataset]. https://www.kaggle.com/datasets/kanchana1990/gamesphere-2000-app-store-insights-and-ratings
    Explore at:
    Available download formats: zip (1651892 bytes)
    Dataset updated
    Feb 17, 2024
    Authors
    Kanchana1990
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Dataset Overview:

    • Entries and Columns: The dataset consists of 2,000 entries, each representing a game from the App Store, spread across 14 columns.
    • Completeness: Most columns are fully populated with 2,000 non-null entries. However, the 'price' column has 1942 non-null entries, indicating some games might be free or their pricing data is missing. The 'releaseNotes' column has 1969 non-null entries, suggesting some games might not have release notes available.
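
The non-null counts above translate directly into missing-value counts. A small sketch of the arithmetic, using only the figures stated in the overview (the function name `missing_summary` is illustrative):

```python
TOTAL_ENTRIES = 2000
NON_NULL = {"price": 1942, "releaseNotes": 1969}  # counts quoted in the overview


def missing_summary(total, non_null):
    """Map each column to (missing count, missing percentage)."""
    summary = {}
    for column, count in non_null.items():
        missing = total - count
        summary[column] = (missing, round(100 * missing / total, 2))
    return summary


print(missing_summary(TOTAL_ENTRIES, NON_NULL))
```

So 58 games (2.9%) lack a price and 31 games (1.55%) lack release notes. Whether a missing price means "free" cannot be determined from the description alone, so treating missing prices as unknown is the safer default.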

    Column Analysis:

    1. artistName: Names of the game developers or publishers.
    2. averageUserRating: The average user rating on a scale, with all entries populated, suggesting a comprehensive user feedback representation.
    3. averageUserRatingForCurrentVersion: Average ratings specifically for the current version of the games.
    4. contentAdvisoryRating: Age suitability ratings with values like '4+', '12+', '9+', and '17+', indicating a diverse range of content appropriate for various age groups.
    5. description: Game descriptions, providing insights into the game's theme, gameplay, and features.
    6. fileSizeBytes: The size of the game files in bytes, indicative of the game's scale and complexity.
    7. isGameCenterEnabled: A boolean indicating whether the game is integrated with Apple's Game Center, showing a mix of games with and without Game Center integration.
    8. minimumOsVersion: The minimum required version of the operating system, ensuring compatibility information is available for users.
    9. price: Game pricing information, with some missing values; the range includes free (0.0) to premium games.
    10. primaryGenreId: All games have the same genre ID (6014), suggesting they are from the same primary genre.
    11. releaseDate: The release dates for the games, useful for temporal analysis and trend identification.
    12. trackName: The name of the games as listed on the App Store.
    13. userRatingCount: The number of user ratings, providing a quantitative measure of user engagement.
    14. releaseNotes: Notes regarding game updates and new features, with some entries missing.

    Data Science Applications:

    • Trend Analysis: Utilize 'releaseDate' and 'averageUserRating' to identify trends in game popularity and user satisfaction over time.
    • Content Analysis: Employ NLP techniques on 'description' and 'releaseNotes' to extract themes and features that correlate with higher user ratings.
    • Pricing Strategy: Analyze 'price' alongside 'averageUserRating' and 'userRatingCount' to assess the impact of pricing on user engagement and satisfaction.
    • Demographic Targeting: Use 'contentAdvisoryRating' to understand the target demographics for different types of games.
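
Two of these analyses can be sketched with the standard library alone. The example assumes each game is a dict keyed by the column names above, that `price` is a float or `None` when missing, and that advisory ratings always look like '4+' or '12+'; the helper names and the sample records in the usage note are hypothetical, not taken from the dataset.

```python
def min_age(content_advisory_rating):
    """Convert an advisory rating such as '12+' to the integer 12."""
    return int(content_advisory_rating.rstrip("+"))


def mean_rating_by_pricing(games):
    """Average 'averageUserRating' for free and paid games.

    Games with a missing price are skipped rather than assumed free.
    """
    groups = {"free": [], "paid": []}
    for game in games:
        price = game.get("price")
        if price is None:
            continue
        key = "free" if price == 0.0 else "paid"
        groups[key].append(game["averageUserRating"])
    return {key: sum(vals) / len(vals) for key, vals in groups.items() if vals}
```

For instance, calling `mean_rating_by_pricing` on three made-up records with prices 0.0, 0.0, and 4.99 returns one mean for the two free games and one for the paid game, while a record whose price is `None` contributes to neither group.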

    Ethical Consideration:

    The dataset is ethically mined, ensuring respect for data privacy and integrity. The effort to maintain ethical standards in data collection is commendable, aligning with best practices in data science.

    Acknowledgment:

    A special note of gratitude is extended to the App Store platform for not only providing a rich dataset but also allowing the use of its iconic imagery to enhance the dataset's visual appeal. This collaboration underscores the synergy between data science and digital ecosystems, facilitating a deeper understanding of user preferences and market dynamics.

  3. Steam Games Dataset

    • kaggle.com
    zip
    Updated Apr 26, 2025
    Cite
    Martin Bustos (2025). Steam Games Dataset [Dataset]. https://www.kaggle.com/datasets/fronkongames/steam-games-dataset/discussion
    Explore at:
    Available download formats: zip (252274279 bytes)
    Dataset updated
    Apr 26, 2025
    Authors
    Martin Bustos
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset was created with this code (MIT) and uses the API provided by Steam, the largest gaming platform on PC. Data is also collected from Steam Spy.

    Here is a simple example of how to parse information:

    # Simple parse of the 'games.json' file.
    import os
    import json
    
    dataset = {}
    if os.path.exists('games.json'):
     with open('games.json', 'r', encoding='utf-8') as fin:
      text = fin.read()
      if len(text) > 0:
       dataset = json.loads(text)
    
    for app in dataset:
     appID = app                     # AppID, unique identifier for each app (string).
     game = dataset[app]       
    
     name = game['name']                 # Game name (string).
     releaseDate = game['release_date']         # Release date (string).
     estimatedOwners = game['estimated_owners']     # Estimated owners (string, e.g.: "0 - 20000").
     peakCCU = game['peak_ccu']             # Number of concurrent users, yesterday (int).
     required_age = game['required_age']         # Age required to play, 0 if it is for all audiences (int).
     price = game['price']                # Price in USD, 0.0 if it's free (float).
     dlcCount = game['dlc_count']            # Number of DLCs, 0 if you have none (int).
     longDesc = game['detailed_description']       # Detailed description of the game (string).
     shortDesc = game['short_description']        # Brief description of the game,
                               # does not contain HTML tags (string).
     languages = game['supported_languages']       # Comma-separated enumeration of supporting languages.
     fullAudioLanguages = game['full_audio_languages']  # Comma-separated enumeration of languages with audio support.
     reviews = game['reviews']              #
     headerImage = game['header_image']         # Header image URL in the store (string).
     website = game['website']              # Game website (string).
     supportWeb = game['support_url']          # Game support URL (string).
     supportEmail = game['support_email']        # Game support email (string).
     supportWindows = game['windows']          # Does it support Windows? (bool).
     supportMac = game['mac']              # Does it support Mac? (bool).
     supportLinux = game['linux']            # Does it support Linux? (bool).
     metacriticScore = game['metacritic_score']     # Metacritic score, 0 if it has none (int).
     metacriticURL = game['metacritic_url']       # Metacritic review URL (string).
     userScore = game['user_score']           # Users score, 0 if it has none (int).
     positive = game['positive']             # Positive votes (int).
     negative = game['negative']             # Negative votes (int).
     scoreRank = game['score_rank']           # Score rank of the game based on user reviews (string).
     achievements = game['achievements']         # Number of achievements, 0 if it has none (int).
     recommens = game['recommendations']         # User recommendations, 0 if it has none (int).
     notes = game['notes']                # Extra information about the game content (string).
     averagePlaytime = game['average_playtime_forever'] # Average playtime since March 2009, in minutes (int).
     averageplaytime2W = game['average_playtime_2weeks'] # Average playtime in the last two weeks, in minutes (int).
     medianPlaytime = game['median_playtime_forever']  # Median playtime since March 2009, in minutes (int).
     medianPlaytime2W = game['median_playtime_2weeks']  # Median playtime in the last two weeks, in minutes (int).
    
     packages = game['packages']             # Available packages.
     for pack in packages:      
      title = pack['title']               # Package title (string).
      packDesc = pack['description']          # Package description (string).
    
      subs = pack['subs']                # Subpackages.
      for sub in subs:      
       text = sub['text']               # Subpackage title (string).
       subDesc = sub['description']          # Subpackage description (string).
       subPrice = sub['price']             # Subpackage price in USD (float).
    
     developers = game['developers']           # Game developers.
     for developer in developers:      
      developerName = developer             # Developer name (string).
    
     publishers = game['publishers']           # Game publishers.
     for publisher in publishers:      
      publisherName = publisher             # Publisher name (string).
    
     categories = game['categ...
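
Several of the fields read above are easier to analyze once converted. The following is a minimal sketch, assuming `estimated_owners` always follows the "0 - 20000" format given in the comment above and that `positive` and `negative` are integers; both helper names are hypothetical and are not part of the dataset's own code.

```python
def parse_owner_range(estimated_owners):
    """Split a string like '0 - 20000' into an integer (low, high) pair."""
    low, high = (int(part) for part in estimated_owners.split("-"))
    return low, high


def positive_ratio(positive, negative):
    """Share of positive votes, or None when the game has no votes."""
    total = positive + negative
    if total == 0:
        return None
    return positive / total


# Example with values in the documented formats.
print(parse_owner_range("0 - 20000"))
print(positive_ratio(90, 10))
```

Returning `None` for games without votes avoids a division by zero and keeps unrated games distinguishable from games with a 0% positive share.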
    

