3 datasets found

Prostate Cancer Risk & Lifestyle Synthetic Dataset
kaggle.com
zip
Updated Jul 26, 2025
Cite
Arif Miah (2025). Prostate Cancer Risk & Lifestyle Synthetic Dataset [Dataset]. https://www.kaggle.com/datasets/miadul/prostate-cancer-risk-and-lifestyle-synthetic-dataset/code
Explore at:
Available download formats: zip (12073 bytes)
Dataset updated
Jul 26, 2025
Authors
Arif Miah
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This synthetic dataset simulates 1,000 individual health profiles focusing on potential risk factors for prostate cancer. The dataset is designed to support public health awareness, machine learning research, and medical decision-support application development.

Prostate cancer is one of the most common cancers among men globally. While early detection through regular checkups can help prevent fatal outcomes, many lifestyle and behavioral factors contribute to increased risk.

This dataset includes a variety of features such as age, body mass index (BMI), smoking habits, diet, physical activity, family history of cancer, mental stress levels, and health-check behavior. Each row is labeled with an estimated prostate cancer risk level (Low / Medium / High) based on a rule-based scoring model.
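As a rough illustration of what a rule-based scoring model can look like, the sketch below adds points for a few of the listed factors and maps the total to Low / Medium / High. The point values and thresholds are invented for this example; the listing does not publish the author's actual rules, so this should not be read as the method used to label the data.

```python
# Hypothetical rule-based risk scorer. The points and cut-offs are
# illustrative assumptions, not the rules used to generate the dataset.

def risk_score(profile: dict) -> int:
    """Add one or two points for each risk factor present in the profile."""
    score = 0
    if profile["age"] >= 60:
        score += 2
    if profile["smoker"] == "Yes":
        score += 1
    if profile["family_history"] == "Yes":
        score += 2
    if profile["diet_type"] == "Fatty":
        score += 1
    if profile["physical_activity_level"] == "Low":
        score += 1
    if profile["regular_health_checkup"] == "No":
        score += 1
    return score


def risk_level(score: int) -> str:
    """Map a numeric score to one of the three labels."""
    if score >= 5:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"


# A made-up profile using the documented column names and values.
example = {
    "age": 67,
    "smoker": "Yes",
    "family_history": "Yes",
    "diet_type": "Mixed",
    "physical_activity_level": "Moderate",
    "regular_health_checkup": "Yes",
}
print(risk_level(risk_score(example)))
```

Keeping the score and the label mapping in separate functions makes it easy to adjust the thresholds without touching the individual rules.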

The dataset is purely synthetic and contains no real patient information. It is safe for educational, research, and development use.

🔍 Dataset Columns:

Column Name: Description

id: Unique identifier for each individual (synthetic)

age: Age of the individual in years (30–80)

bmi: Body Mass Index (simulated around average value with some variation)

smoker: Indicates if the person is a smoker ("Yes" or "No")

alcohol_consumption: Level of alcohol intake: None / Moderate / High

diet_type: Type of diet: Healthy / Fatty / Mixed

physical_activity_level: Physical activity level: Low / Moderate / High

family_history: Whether there is a family history of cancer ("Yes" or "No")

mental_stress_level: Perceived mental stress: Low / Medium / High

sleep_hours: Average hours of sleep per day (roughly 4.5 to 9)

regular_health_checkup: Whether the person does routine health checkups ("Yes" or "No")

prostate_exam_done: Whether a prostate exam was done recently ("Yes" or "No")

risk_level: Estimated prostate cancer risk: Low / Medium / High
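The documented value sets above can be checked mechanically before any modeling. The following is a minimal sketch that assumes each row arrives as a dict of strings (as `csv.DictReader` would yield) and that ages are stored as whole numbers; the sample row is made up for illustration.

```python
# Allowed values are taken from the column descriptions above.
ALLOWED = {
    "smoker": {"Yes", "No"},
    "alcohol_consumption": {"None", "Moderate", "High"},
    "diet_type": {"Healthy", "Fatty", "Mixed"},
    "physical_activity_level": {"Low", "Moderate", "High"},
    "family_history": {"Yes", "No"},
    "mental_stress_level": {"Low", "Medium", "High"},
    "regular_health_checkup": {"Yes", "No"},
    "prostate_exam_done": {"Yes", "No"},
    "risk_level": {"Low", "Medium", "High"},
}


def validate_row(row: dict) -> list:
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    for column, allowed in ALLOWED.items():
        if row.get(column) not in allowed:
            problems.append(f"{column}: unexpected value {row.get(column)!r}")
    age = int(row.get("age", -1))  # assumes integer ages; -1 marks a missing value
    if not 30 <= age <= 80:
        problems.append(f"age: {age} outside 30-80")
    return problems


# Made-up sample row that follows the documented schema.
row = {
    "id": "1", "age": "52", "bmi": "26.4", "smoker": "No",
    "alcohol_consumption": "Moderate", "diet_type": "Mixed",
    "physical_activity_level": "High", "family_history": "No",
    "mental_stress_level": "Low", "sleep_hours": "7.1",
    "regular_health_checkup": "Yes", "prostate_exam_done": "No",
    "risk_level": "Low",
}
print(validate_row(row))  # an empty list means no problems were found
```

Note that a loader which converts the string "None" into a missing value would make valid `alcohol_consumption` entries fail this check, so the values should be read as plain strings.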

🎯 Potential Use Cases:

Exploratory Data Analysis (EDA) for healthcare

Classification modeling (predicting prostate cancer risk level)

AI-based health advisory assistant

Public health awareness dashboards

Feature importance analysis and visualization

🧠 Target Audience: Data science learners, machine learning engineers, public health researchers, healthcare app developers

🔒 License: This dataset is distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to use, modify, and share this dataset with proper attribution.

🔗 Attribution (if published): Dataset created by Arif Miah

GameSphere 🕹️: 2000 App Store Insights & Ratings
kaggle.com
zip
Updated Feb 17, 2024
Cite
Kanchana1990 (2024). GameSphere 🕹️: 2000 App Store Insights & Ratings [Dataset]. https://www.kaggle.com/datasets/kanchana1990/gamesphere-2000-app-store-insights-and-ratings
Explore at:
Available download formats: zip (1651892 bytes)
Dataset updated
Feb 17, 2024
Authors
Kanchana1990
License
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
Dataset Overview:

Entries and Columns: The dataset consists of 2,000 entries, each representing a game from the App Store, spread across 14 columns.

Completeness: Most columns are fully populated with 2,000 non-null entries. However, the 'price' column has 1,942 non-null entries (58 missing), indicating that some games might be free or that their pricing data is missing. The 'releaseNotes' column has 1,969 non-null entries (31 missing), suggesting that some games have no release notes available.
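A completeness check of this kind can be reproduced with a short helper. The sketch below assumes the records have been loaded as a list of dicts with missing values represented as `None`; the three sample games are made up and only the column names come from the listing.

```python
from collections import Counter


def non_null_counts(records):
    """Count, per column, how many records hold a value other than None."""
    counts = Counter()
    for record in records:
        for column, value in record.items():
            if value is not None:
                counts[column] += 1
    return dict(counts)


# Made-up records using column names from the dataset description.
sample = [
    {"trackName": "Game A", "price": 0.0, "releaseNotes": "Bug fixes"},
    {"trackName": "Game B", "price": None, "releaseNotes": None},
    {"trackName": "Game C", "price": 4.99, "releaseNotes": "New levels"},
]
print(non_null_counts(sample))
```

A price of 0.0 counts as present here, which keeps free games distinct from games whose price is missing.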

Column Analysis:

artistName: Names of the game developers or publishers.

averageUserRating: The average user rating on a scale, with all entries populated, suggesting a comprehensive user feedback representation.

averageUserRatingForCurrentVersion: Average ratings specifically for the current version of the games.

contentAdvisoryRating: Age suitability ratings with values like '4+', '12+', '9+', and '17+', indicating a diverse range of content appropriate for various age groups.

description: Game descriptions, providing insights into the game's theme, gameplay, and features.

fileSizeBytes: The size of the game files in bytes, indicative of the game's scale and complexity.

isGameCenterEnabled: A boolean indicating whether the game is integrated with Apple's Game Center, showing a mix of games with and without Game Center integration.

minimumOsVersion: The minimum required version of the operating system, ensuring compatibility information is available for users.

price: Game pricing information, with some missing values; the range includes free (0.0) to premium games.

primaryGenreId: All games have the same genre ID (6014), suggesting they are from the same primary genre.

releaseDate: The release dates for the games, useful for temporal analysis and trend identification.

trackName: The name of the games as listed on the App Store.

userRatingCount: The number of user ratings, providing a quantitative measure of user engagement.

releaseNotes: Notes regarding game updates and new features, with some entries missing.

Data Science Applications:

Trend Analysis: Utilize 'releaseDate' and 'averageUserRating' to identify trends in game popularity and user satisfaction over time.

Content Analysis: Employ NLP techniques on 'description' and 'releaseNotes' to extract themes and features that correlate with higher user ratings.

Pricing Strategy: Analyze 'price' alongside 'averageUserRating' and 'userRatingCount' to assess the impact of pricing on user engagement and satisfaction.

Demographic Targeting: Use 'contentAdvisoryRating' to understand the target demographics for different types of games.
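The pricing analysis mentioned above could start from something like the following sketch, which compares average ratings for free and paid games. It assumes records are dicts with 'price' set to `None` when missing; the helper name and sample values are hypothetical.

```python
from statistics import mean


def mean_rating_by_price_tier(games):
    """Average 'averageUserRating' for free and paid games, skipping missing prices."""
    tiers = {"free": [], "paid": []}
    for game in games:
        price = game.get("price")
        if price is None:  # a missing price is not the same as a free game
            continue
        tier = "free" if price == 0 else "paid"
        tiers[tier].append(game["averageUserRating"])
    # Leave out tiers that ended up with no games.
    return {tier: mean(ratings) for tier, ratings in tiers.items() if ratings}


# Made-up records for illustration only.
games = [
    {"price": 0.0, "averageUserRating": 4.0},
    {"price": 0.0, "averageUserRating": 4.5},
    {"price": 2.99, "averageUserRating": 3.0},
    {"price": None, "averageUserRating": 5.0},
]
print(mean_rating_by_price_tier(games))
```

Weighting each rating by 'userRatingCount' would be a natural refinement, since a game with very few ratings says little about user satisfaction.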

Ethical Consideration:

The dataset is ethically mined, ensuring respect for data privacy and integrity. The effort to maintain ethical standards in data collection is commendable, aligning with best practices in data science.

Acknowledgment:

A special note of gratitude is extended to the App Store platform for not only providing a rich dataset but also allowing the use of its iconic imagery to enhance the dataset's visual appeal. This collaboration underscores the synergy between data science and digital ecosystems, facilitating a deeper understanding of user preferences and market dynamics.

Steam Games Dataset

kaggle.com

zip

Updated Apr 26, 2025

Cite

Martin Bustos (2025). Steam Games Dataset [Dataset]. https://www.kaggle.com/datasets/fronkongames/steam-games-dataset/discussion

Explore at:

Available download formats: zip (252274279 bytes)

Dataset updated

Apr 26, 2025

Authors

Martin Bustos

License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

This dataset was created with this code (MIT) and uses the API provided by Steam, the largest gaming platform on PC. Data is also collected from Steam Spy.

Here is a simple example of how to parse information:

# Simple parse of the 'games.json' file.
import os
import json

dataset = {}
if os.path.exists('games.json'):
 with open('games.json', 'r', encoding='utf-8') as fin:
  text = fin.read()
  if len(text) > 0:
   dataset = json.loads(text)

for app in dataset:
 appID = app                     # AppID, unique identifier for each app (string).
 game = dataset[app]       

 name = game['name']                 # Game name (string).
 releaseDate = game['release_date']         # Release date (string).
 estimatedOwners = game['estimated_owners']     # Estimated owners (string, e.g.: "0 - 20000").
 peakCCU = game['peak_ccu']             # Number of concurrent users, yesterday (int).
 required_age = game['required_age']         # Age required to play, 0 if it is for all audiences (int).
 price = game['price']                # Price in USD, 0.0 if it's free (float).
 dlcCount = game['dlc_count']            # Number of DLCs, 0 if you have none (int).
 longDesc = game['detailed_description']       # Detailed description of the game (string).
 shortDesc = game['short_description']        # Brief description of the game,
                           # does not contain HTML tags (string).
 languages = game['supported_languages']       # Comma-separated enumeration of supporting languages.
 fullAudioLanguages = game['full_audio_languages']  # Comma-separated enumeration of languages with audio support.
 reviews = game['reviews']              #
 headerImage = game['header_image']         # Header image URL in the store (string).
 website = game['website']              # Game website (string).
 supportWeb = game['support_url']          # Game support URL (string).
 supportEmail = game['support_email']        # Game support email (string).
 supportWindows = game['windows']          # Does it support Windows? (bool).
 supportMac = game['mac']              # Does it support Mac? (bool).
 supportLinux = game['linux']            # Does it support Linux? (bool).
 metacriticScore = game['metacritic_score']     # Metacritic score, 0 if it has none (int).
 metacriticURL = game['metacritic_url']       # Metacritic review URL (string).
 userScore = game['user_score']           # Users score, 0 if it has none (int).
 positive = game['positive']             # Positive votes (int).
 negative = game['negative']             # Negative votes (int).
 scoreRank = game['score_rank']           # Score rank of the game based on user reviews (string).
 achievements = game['achievements']         # Number of achievements, 0 if it has none (int).
 recommendations = game['recommendations']  # User recommendations, 0 if it has none (int).
 notes = game['notes']                # Extra information about the game content (string).
 averagePlaytime = game['average_playtime_forever'] # Average playtime since March 2009, in minutes (int).
 averagePlaytime2W = game['average_playtime_2weeks'] # Average playtime in the last two weeks, in minutes (int).
 medianPlaytime = game['median_playtime_forever']  # Median playtime since March 2009, in minutes (int).
 medianPlaytime2W = game['median_playtime_2weeks']  # Median playtime in the last two weeks, in minutes (int).

 packages = game['packages']             # Available packages.
 for pack in packages:      
  title = pack['title']               # Package title (string).
  packDesc = pack['description']          # Package description (string).

  subs = pack['subs']                # Subpackages.
  for sub in subs:      
   text = sub['text']               # Subpackage title (string).
   subDesc = sub['description']          # Subpackage description (string).
   subPrice = sub['price']             # Subpackage price in USD (float).

 developers = game['developers']           # Game developers.
 for developer in developers:      
  developerName = developer             # Developer name (string).

 publishers = game['publishers']           # Game publishers.
 for publisher in publishers:      
  publisherName = publisher             # Publisher name (string).

 categories = game['categ...
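Some of the fields read in the snippet above need a little processing before analysis. The sketch below is a minimal example, assuming the "low - high" string format shown for 'estimated_owners' and integer 'positive' / 'negative' vote counts; the helper names and the sample record are hypothetical.

```python
def parse_estimated_owners(value: str) -> tuple:
    """Turn a range string such as "0 - 20000" into (low, high) integers."""
    low, high = value.split("-")
    return int(low.strip()), int(high.strip())


def positive_ratio(game: dict):
    """Share of positive votes, or None when the game has no votes."""
    total = game["positive"] + game["negative"]
    return game["positive"] / total if total else None


# Made-up record using field names from the snippet above.
game = {"estimated_owners": "0 - 20000", "positive": 30, "negative": 10}
print(parse_estimated_owners(game["estimated_owners"]))
print(positive_ratio(game))
```

Returning `None` for games without votes avoids a division by zero and keeps unrated games separate from games with a ratio of 0.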
