Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This synthetic dataset simulates 1,000 individual health profiles focusing on potential risk factors for prostate cancer. The dataset is designed to support public health awareness, machine learning research, and medical decision-support application development.
Prostate cancer is one of the most common cancers among men globally. While early detection through regular checkups can help prevent fatal outcomes, many lifestyle and behavioral factors contribute to increased risk.
This dataset includes a variety of features such as age, body mass index (BMI), smoking habits, diet, physical activity, family history of cancer, mental stress levels, and health-check behavior. Each row is labeled with an estimated prostate cancer risk level (Low / Medium / High) based on a rule-based scoring model.
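The description does not publish the scoring rules themselves. As a rough illustration of what a rule-based labeler of this kind could look like, the Python sketch below adds points for a few risk factors and maps the total to Low / Medium / High. The weights, the age cut-off, and the thresholds are all assumptions invented for this example, not the dataset author's actual model; the field names follow the dataset's column names.

```python
# Hypothetical rule-based scoring sketch.
# All weights and thresholds below are illustrative assumptions,
# not the rules actually used to label the dataset.

def risk_score(profile):
    """Add up points for each risk factor present in the profile."""
    score = 0
    if profile["age"] >= 60:                      # assumed age cut-off
        score += 2
    if profile["family_history"] == "Yes":
        score += 2
    if profile["smoker"] == "Yes":
        score += 1
    if profile["diet_type"] == "Fatty":
        score += 1
    if profile["physical_activity_level"] == "Low":
        score += 1
    if profile["regular_health_checkup"] == "No":
        score += 1
    return score


def risk_level(score):
    """Map a numeric score to a label using assumed thresholds."""
    if score >= 5:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"


# Made-up profile for illustration only (not a row from the dataset).
example_profile = {
    "age": 65,
    "family_history": "Yes",
    "smoker": "Yes",
    "diet_type": "Fatty",
    "physical_activity_level": "Low",
    "regular_health_checkup": "No",
}

print(risk_level(risk_score(example_profile)))  # prints "High"
```

Keeping the scoring and the thresholding in separate functions makes it easy to inspect the raw score distribution before deciding where the label boundaries should sit.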
The dataset is purely synthetic and contains no real patient information. It is safe for educational, research, and development use.
🔍 Dataset Columns:
Column Name: Description
id: Unique identifier for each individual (synthetic)
age: Age of the individual in years (30–80)
bmi: Body Mass Index (simulated around average value with some variation)
smoker: Indicates if the person is a smoker ("Yes" or "No")
alcohol_consumption: Level of alcohol intake: None / Moderate / High
diet_type: Type of diet: Healthy / Fatty / Mixed
physical_activity_level: Physical activity level: Low / Moderate / High
family_history: Whether there is a family history of cancer ("Yes" or "No")
mental_stress_level: Perceived mental stress: Low / Medium / High
sleep_hours: Average hours of sleep per day (between ~4.5 to 9)
regular_health_checkup: Whether the person does routine health checkups ("Yes" or "No")
prostate_exam_done: Whether prostate exam was done recently ("Yes" or "No")
risk_level: Estimated prostate cancer risk: Low / Medium / High
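Because the columns above have small, documented value sets, a loaded row can be checked against them before any analysis. The following is a minimal Python sketch of such a check. The helper name `validate_row` and the sample rows are hypothetical, and values are assumed to arrive as strings, as they would from a CSV reader.

```python
# Allowed categorical values, copied from the column descriptions above.
ALLOWED_VALUES = {
    "smoker": {"Yes", "No"},
    "alcohol_consumption": {"None", "Moderate", "High"},
    "diet_type": {"Healthy", "Fatty", "Mixed"},
    "physical_activity_level": {"Low", "Moderate", "High"},
    "family_history": {"Yes", "No"},
    "mental_stress_level": {"Low", "Medium", "High"},
    "regular_health_checkup": {"Yes", "No"},
    "prostate_exam_done": {"Yes", "No"},
    "risk_level": {"Low", "Medium", "High"},
}


def validate_row(row):
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    for column, allowed in ALLOWED_VALUES.items():
        value = row.get(column)
        if value not in allowed:
            problems.append(f"{column}: unexpected value {value!r}")
    try:
        age = float(row.get("age", ""))
    except (TypeError, ValueError):
        problems.append("age: not a number")
    else:
        # The documented age range is 30-80.
        if not 30 <= age <= 80:
            problems.append(f"age: {age} is outside the documented 30-80 range")
    return problems


# Made-up rows for illustration only (not taken from the dataset).
good_row = {
    "id": "1", "age": "62", "bmi": "27.4", "smoker": "Yes",
    "alcohol_consumption": "Moderate", "diet_type": "Mixed",
    "physical_activity_level": "Low", "family_history": "No",
    "mental_stress_level": "High", "sleep_hours": "6.5",
    "regular_health_checkup": "Yes", "prostate_exam_done": "No",
    "risk_level": "Medium",
}
bad_row = dict(good_row, smoker="Maybe", age="25")

print(validate_row(good_row))  # prints []
print(validate_row(bad_row))
```

The `sleep_hours` and `bmi` columns are left unchecked here because their ranges are only described approximately.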
🎯 Potential Use Cases:
Exploratory Data Analysis (EDA) for healthcare
Classification modeling (predicting prostate cancer risk level)
AI-based health advisory assistant
Public health awareness dashboards
Feature importance analysis and visualization
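For the classification use case, the categorical columns need a numeric encoding before most models can consume them. The sketch below shows one possible encoding in plain Python: ordinal indices for the ordered levels, 0/1 flags for the Yes/No columns, and a one-hot vector for `diet_type`. The encoding choices and the helper name `encode_row` are assumptions for illustration, and the sample row is made up.

```python
# One possible feature encoding for the documented columns (an assumption,
# not a prescribed preprocessing step).
NUMERIC = ["age", "bmi", "sleep_hours"]
BINARY = ["smoker", "family_history", "regular_health_checkup", "prostate_exam_done"]
ORDINAL = {
    "alcohol_consumption": ["None", "Moderate", "High"],
    "physical_activity_level": ["Low", "Moderate", "High"],
    "mental_stress_level": ["Low", "Medium", "High"],
}
DIET_TYPES = ["Healthy", "Fatty", "Mixed"]   # unordered, so one-hot encoded
RISK_LEVELS = ["Low", "Medium", "High"]      # target classes


def encode_row(row):
    """Turn one row (dict of strings) into (feature list, integer label)."""
    features = [float(row[col]) for col in NUMERIC]
    features += [1.0 if row[col] == "Yes" else 0.0 for col in BINARY]
    features += [float(levels.index(row[col])) for col, levels in ORDINAL.items()]
    features += [1.0 if row["diet_type"] == diet else 0.0 for diet in DIET_TYPES]
    label = RISK_LEVELS.index(row["risk_level"])
    return features, label


# Made-up row for illustration only (not taken from the dataset).
sample_row = {
    "age": "62", "bmi": "27.4", "sleep_hours": "6.5",
    "smoker": "Yes", "family_history": "No",
    "regular_health_checkup": "Yes", "prostate_exam_done": "No",
    "alcohol_consumption": "Moderate", "physical_activity_level": "Low",
    "mental_stress_level": "High", "diet_type": "Mixed",
    "risk_level": "Medium",
}

features, label = encode_row(sample_row)
print(len(features), label)  # prints "13 1"
```

Since the labels come from a rule-based model, a classifier trained on these features mostly learns to reproduce those rules, which is worth keeping in mind when reading feature-importance results.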
🧠 Target Audience: Data science learners, machine learning engineers, public health researchers, healthcare app developers
🔒 License: This dataset is distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to use, modify, and share this dataset with proper attribution.
🔗 Attribution (if published): Dataset created by Arif Miah
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
The dataset is ethically mined, ensuring respect for data privacy and integrity. The effort to maintain ethical standards in data collection is commendable, aligning with best practices in data science.
A special note of gratitude is extended to the App Store platform for not only providing a rich dataset but also allowing the use of its iconic imagery to enhance the dataset's visual appeal. This collaboration underscores the synergy between data science and digital ecosystems, facilitating a deeper understanding of user preferences and market dynamics.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created with this code (MIT) and uses the API provided by Steam, the largest gaming platform on PC. Data is also collected from Steam Spy.
Here is a simple example of how to parse information:
# Simple parse of the 'games.json' file.
import os
import json

dataset = {}
if os.path.exists('games.json'):
    with open('games.json', 'r', encoding='utf-8') as fin:
        text = fin.read()
        if len(text) > 0:
            dataset = json.loads(text)

for app in dataset:
    appID = app                                         # AppID, unique identifier for each app (string).
    game = dataset[app]
    name = game['name']                                 # Game name (string).
    releaseDate = game['release_date']                  # Release date (string).
    estimatedOwners = game['estimated_owners']          # Estimated owners (string, e.g.: "0 - 20000").
    peakCCU = game['peak_ccu']                          # Number of concurrent users, yesterday (int).
    required_age = game['required_age']                 # Age required to play, 0 if it is for all audiences (int).
    price = game['price']                               # Price in USD, 0.0 if it's free (float).
    dlcCount = game['dlc_count']                        # Number of DLCs, 0 if you have none (int).
    longDesc = game['detailed_description']             # Detailed description of the game (string).
    shortDesc = game['short_description']               # Brief description of the game,
                                                        # does not contain HTML tags (string).
    languages = game['supported_languages']             # Comma-separated enumeration of supporting languages.
    fullAudioLanguages = game['full_audio_languages']   # Comma-separated enumeration of languages with audio support.
    reviews = game['reviews']
    headerImage = game['header_image']                  # Header image URL in the store (string).
    website = game['website']                           # Game website (string).
    supportWeb = game['support_url']                    # Game support URL (string).
    supportEmail = game['support_email']                # Game support email (string).
    supportWindows = game['windows']                    # Does it support Windows? (bool).
    supportMac = game['mac']                            # Does it support Mac? (bool).
    supportLinux = game['linux']                        # Does it support Linux? (bool).
    metacriticScore = game['metacritic_score']          # Metacritic score, 0 if it has none (int).
    metacriticURL = game['metacritic_url']              # Metacritic review URL (string).
    userScore = game['user_score']                      # Users score, 0 if it has none (int).
    positive = game['positive']                         # Positive votes (int).
    negative = game['negative']                         # Negative votes (int).
    scoreRank = game['score_rank']                      # Score rank of the game based on user reviews (string).
    achievements = game['achievements']                 # Number of achievements, 0 if it has none (int).
    recommens = game['recommendations']                 # User recommendations, 0 if it has none (int).
    notes = game['notes']                               # Extra information about the game content (string).
    averagePlaytime = game['average_playtime_forever']  # Average playtime since March 2009, in minutes (int).
    averageplaytime2W = game['average_playtime_2weeks'] # Average playtime in the last two weeks, in minutes (int).
    medianPlaytime = game['median_playtime_forever']    # Median playtime since March 2009, in minutes (int).
    medianPlaytime2W = game['median_playtime_2weeks']   # Median playtime in the last two weeks, in minutes (int).

    packages = game['packages']                         # Available packages.
    for pack in packages:
        title = pack['title']                           # Package title (string).
        packDesc = pack['description']                  # Package description (string).

        subs = pack['subs']                             # Subpackages.
        for sub in subs:
            text = sub['text']                          # Subpackage title (string).
            subDesc = sub['description']                # Subpackage description (string).
            subPrice = sub['price']                     # Subpackage price in USD (float).

    developers = game['developers']                     # Game developers.
    for developer in developers:
        developerName = developer                       # Developer name (string).

    publishers = game['publishers']                     # Game publishers.
    for publisher in publishers:
        publisherName = publisher                       # Publisher name (string).

    categories = game['categ...
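Several of the fields documented in the snippet above need a little post-processing before they are useful for analysis. As a hedged sketch, the Python below turns the `estimated_owners` string (documented as e.g. "0 - 20000") into a pair of integers and derives a positive-review ratio from the `positive` and `negative` vote counts. The helper names and the sample entry are hypothetical; only the field names and the range format come from the comments above.

```python
def parse_owner_range(estimated_owners):
    """Split a string like "0 - 20000" into (low, high) integers."""
    low, high = (int(part.strip()) for part in estimated_owners.split("-"))
    return low, high


def positive_ratio(game):
    """Share of positive votes, or None when the game has no votes."""
    total = game["positive"] + game["negative"]
    if total == 0:
        return None
    return game["positive"] / total


# Made-up entry for illustration only; real entries carry many more fields.
sample_game = {
    "name": "Example Game",
    "estimated_owners": "0 - 20000",
    "positive": 90,
    "negative": 10,
}

print(parse_owner_range(sample_game["estimated_owners"]))  # prints (0, 20000)
print(positive_ratio(sample_game))                         # prints 0.9
```

Returning `None` for games without votes keeps them distinguishable from games whose votes are all negative, which would otherwise both show up as 0.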