https://creativecommons.org/publicdomain/zero/1.0/
This synthetic dataset simulates user behavior in a fitness application and is designed for predicting the risk of gym membership dropout from attendance patterns and personal attributes. It contains 10,000 realistic user profiles with features that influence gym retention, making it well suited to classification tasks in behavioral analytics.
Feature | Type | Description | Value Range |
---|---|---|---|
user_id | int | Unique user identifier | 1-10000 |
age | int | User's age | 18-60 (peaked at 25-40) |
gender | categorical | User's gender | Male/Female |
sessions_per_week | int | Weekly gym attendance | 0-7 sessions |
avg_session_duration | float | Average workout length in minutes | 10-120 |
progress_score | float | Composite fitness progress metric | 0-100 |
mood_after | categorical | Post-workout emotional state | Energized/Neutral/Fatigued |
injury | categorical | Reported workout injuries | None/Knee/Back/Shoulder |
dropout | binary | Target variable - quit status | 0 (active)/1 (quit) |
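The table above can be turned into a simple row check. The following is a minimal sketch, not part of the dataset itself: the names `RANGES`, `ALLOWED`, and `validate_row` are hypothetical, and it assumes each row has already been parsed into a dict with numeric fields converted from text.

```python
# Numeric bounds copied from the "Value Range" column of the feature table.
RANGES = {
    "user_id": (1, 10000),
    "age": (18, 60),
    "sessions_per_week": (0, 7),
    "avg_session_duration": (10, 120),
    "progress_score": (0, 100),
}

# Allowed labels for the categorical and binary columns.
ALLOWED = {
    "gender": {"Male", "Female"},
    "mood_after": {"Energized", "Neutral", "Fatigued"},
    "injury": {"None", "Knee", "Back", "Shoulder"},
    "dropout": {0, 1},
}


def validate_row(row):
    """Return a list of problems; an empty list means the row fits the table."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = row.get(name)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not low <= value <= high:
            problems.append(f"{name}={value!r} outside [{low}, {high}]")
    for name, allowed in ALLOWED.items():
        if row.get(name) not in allowed:
            problems.append(f"{name}={row.get(name)!r} is not an allowed value")
    return problems
```

Running a check like this over every row before training is a cheap way to catch parsing mistakes, such as the injury label `None` being read as a missing value.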
Data was programmatically generated with:
1. Base distributions matching real gym statistics
2. Logical correlations between features (e.g., more sessions → longer durations)
3. Non-linear relationships in the target variable
4. Controlled noise injection (Gaussian + categorical variability)
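To make the four generation steps concrete, here is a rough sketch of how such a generator could look. It is not the script used to build this dataset: every coefficient, weight, and distribution choice below is an assumption made up for illustration, and `make_user` and `make_dataset` are hypothetical names.

```python
import math
import random


def clamp(value, low, high):
    """Keep a value inside the documented range."""
    return max(low, min(high, value))


def make_user(user_id, rng):
    # 1. Base distributions (assumed: triangular age with its peak at 32).
    age = round(rng.triangular(18, 60, 32))
    sessions = rng.randint(0, 7)
    # 2. Correlation: more weekly sessions lengthen the average session.
    # 4. Gaussian noise is added before clamping to the documented range.
    duration = clamp(30 + 8 * sessions + rng.gauss(0, 12), 10, 120)
    progress = clamp(5 * sessions + 0.4 * duration + rng.gauss(0, 10), 0, 100)
    # 4. Categorical variability through weighted random labels.
    mood = rng.choices(["Energized", "Neutral", "Fatigued"], weights=[5, 3, 2])[0]
    injury = rng.choices(["None", "Knee", "Back", "Shoulder"], weights=[85, 5, 5, 5])[0]
    # 3. Non-linear target: a logistic link with a squared attendance term.
    logit = 1.5 - 0.15 * sessions ** 2 - 0.02 * progress
    logit += 0.8 * (injury != "None") + 0.5 * (mood == "Fatigued")
    p_quit = 1 / (1 + math.exp(-logit))
    return {
        "user_id": user_id,
        "age": age,
        "gender": rng.choice(["Male", "Female"]),
        "sessions_per_week": sessions,
        "avg_session_duration": round(duration, 1),
        "progress_score": round(progress, 1),
        "mood_after": mood,
        "injury": injury,
        "dropout": int(rng.random() < p_quit),
    }


def make_dataset(n=10_000, seed=0):
    """Generate n users reproducibly from a seeded random generator."""
    rng = random.Random(seed)
    return [make_user(i, rng) for i in range(1, n + 1)]
```

Seeding a dedicated `random.Random` instance keeps the output reproducible, which matters when a synthetic dataset is used for benchmarking.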
Suggested evaluation for classification models:
- Precision-Recall curves (class imbalance consideration)
- F1 score
- ROC AUC
- Feature importance analysis
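As an illustration of what the listed metrics measure, the sketch below computes precision, recall, F1, and ROC AUC in plain Python, treating `dropout = 1` as the positive class. The function names are hypothetical. The pairwise AUC is written for clarity rather than speed; in practice a library implementation such as scikit-learn's `f1_score` and `roc_auc_score` would normally be used.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (dropout = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """Chance that a random quitter is scored above a random active user.

    Ties count as half a win. This compares every positive with every
    negative, so it is quadratic in the number of rows.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: predicted dropout probabilities thresholded at 0.5.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
y_pred = [int(s >= 0.5) for s in scores]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Precision, recall, and F1 depend on the chosen threshold, while ROC AUC does not. Sweeping the threshold and recording precision and recall at each step yields the Precision-Recall curve, which is the more informative view when quitters are the minority class.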
CC0: Public Domain (Free to use for any purpose)
Synthetic dataset created for machine learning education and benchmarking purposes. Inspired by real fitness app analytics challenges.